Ten years ago it all seemed so simple! Increase utilization of existing compute resource by hosting multiple virtual machines per physical platform; so consolidating applications onto fewer physical machines. As the virtual machine ‘shields’ its hosted application from the underlying physical environment, this is achieved without changes to the application. As applications may now move runtime location without re-configuration; the idea of virtual machine based ‘Cloud Computing’ was inevitable.
However, there are downsides.
Virtual machine image sprawl is now a well know phrase. If the virtual machine image is the unit of deployment; any software upgrade or configuration change, no matter how small, generates a new image. With a typical size of ~1 Gbyte (see table 2 – http://www.ssrc.ucsc.edu/Papers/ssrctr-10-01.pdf) – this soon adds up! Large virtual environments rapidly consume expensive on-line and off-line data storage resource. This in-turn has driven the use of de-duplication technologies. So increasing storage cost and / or increasing operational complexity.
Once constructed, virtual machine images must be propagated, perhaps many times across the network, to the physical hosts. Also, a small configuration change, which results in a new virtual machine image, which needs to be deployed to many nodes; can generate hundreds of Gbytes of network traffic.
When used as the unit of application deployment; virtualization increases operation complexity, and increases the consumption of expensive physical network and storage resources: both of which are ironically probably more expensive than compute resource which virtualization is attempting to optimize the use of.
We’re not finished!
- Some categories of application simply cannot be de-coupled from the physical environment. Network latency is NOT zero, network bandwidth is NOT infinite and locality of data DOES matter.
- Virtualization complicates and obscures runtime dependencies. If a physical node fails, which virtual machines were lost? More importantly, which services were lost, which business applications were operationally impacted? Companies are now building monitoring systems that attempt to unravel these questions: further operational band-aids!
- Centralized VM management solutions introduce new and operationally significant points of failure.
- As the operational complexity of virtual environments is higher than their physical predecessors; there is an increased the likelihood of catastrophic cascading failure caused by simple human error.
For all these reasons, the idea of re-balancing critical production loads by dynamically migrating virtual machine images, is I suggest a popular Marketing Myth. While many analysts, software vendors, investors and end users continue to see virtualization as the ultimate silver bullet! They are, I believe, deluded.
The move to the ‘virtual enterprise’ has not been without significant cost. The move to the ‘virtual enterprise’ has not addressed fundamental IT issues. Nor will moving to public or private Cloud solutions based on virtualization.
And so the Story Evolves
Acknowledging these issues, a discernible trend has started in the Cloud Computing community. Increasingly the virtual machine image is rejected as the deployment artifact. Rather:
- Virtual machines are used to partition physical resource.
- Software is dynamically installed and configured.
- In more sophisticated solutions, each resource target has a local agent which can act upon an installation command. This agent is able to:
- Resolve runtime installation dependencies implied by the install command.
- Download only the required software artifacts.
- Install, configure and start required ‘services’.
- Should subsequent re-configure or update commands be received; the agent will only download the changed software component, and / or re-configure artifacts that are already cached locally.
Sort of makes sense, doesn’t it!?
The Elephant in the Room
Dynamic deployment and configuration of software artifacts certainly makes more sense than pushing around virtual machine images. But have we actually addressed the fundamental issues that organisations face?
As I’ve referenced on many occasions; Gartner research indicates that software maintenance dominates IT OPEX (http://www.soasymposium.com/pdf_berlin/Anne_Thomas_Manes_Proving_the.pdf). In comparison hardware costs are only ~10% of this OPEX figure.
To put this into perspective; a large organisation’s annual IT OPEX may be ~$2 billion. Gartner’s research implies that, of this, $1.6 billion will be concerned with the management and maintenance of legacy applications. Indeed, one organization recently explained that each line of code changed in an application generated a downstream cost of >$1 million!
The issue isn’t resolved by virtualisation, nor Cloud. Indeed, software vendors, IT end users, IT investors and IT industry analysts have spent the last decade trying to optimize an increasingly insignificant part of the OPEX equation; while at the same time ignoring the elephant in the room.
Modular Systems are Maintainable Systems
If one is to address application maintainability – then modularity is THE fundamental requirement.
Luckily for organizations that are pre-dominantly Java based; help is at hand in the form of OSGi. OSGi specifications and corresponding OSGi implementations provide the industry standards upon which an organisation can being to modularise their portfolio of in-house Java applications; thereby containing the on-going cost of application maintenance. For further detail on the business benefits of OSGi based business systems; see http://www.osgi.org/wiki/uploads/Links/OSGiAndTheEnterpriseBusinessWhitepaper.pdf).
But what are the essential characteristics of a ‘modular Cloud runtime’: characteristics that will ensure a successful OSGi strategy? These may be simply deduced from the following principles:
- The unit of maintenance and the unit of re-use are the same as the unit of deployment. Hence the unit of deployment should be the ‘bundle’.
- Modularity reduces application maintenance for developers. However, this must not be at the expense of increasing runtime complexity for operations. The unit of operational management should be the ‘business system’.
Aren’t these requirements inconsistent? No, not if the ‘business system’ is dynamically assembled from the required ‘bundles’ at runtime. Operations: deploy, re-configure, scale and up-date ‘business systems’. The runtime internally maps these activities to the deployment and re-configuration of the required OSGi bundles.
In addition to these essential characteristics:
- We would still like to leverage the resource partitioning capabilities of virtual machines. But the virtual machine image is no-longer the unit of application deployment. As the runtime dynamically manages the mapping of services to virtual and physical resources; operations need no longer be concerned with this level of detail. From an operational perspective, it is sufficient to know that the ‘business system’ is functional and meeting its SLA.
- Finally, it takes time to achieve a modular enterprise. It would be great if the runtime supported traditional software artifacts including WAR’s, simply POJO deployments and even non-Java artifacts!
Are there any runtime solutions that have such characteristics? Yes, one: the Paremus Service Fabric. A modular Cloud runtime – designed from the ground-up using OSGi; for OSGi based ‘business systems’. The Service Fabric’s unique adaptive, agile and self-assembling runtime behaviors minimizes operational management whilst increasing service robustness. To get you started – the Service Fabric also supports non OSGi artefacts.
A final note: even Paremus occasionally bends to IT fashion :-/ Our imminent Service Fabric 1.8 release will support deployment of virtual machine images: though if you are reading this blog hopefully you will not be too interested in using that capability!