Reality is Wondrously Complex

From the atoms in my coffee (Mmmm Union Roast – actually now my Harveys Elizabethan Ale 😉 – typos courtesy of the latter) to the interactions of individuals that collectively define the earth's eco-system and, on a smaller scale, human societies: reality is wondrously complex. To cope with this we create abstractions. The abstractions we create, and an understanding of the dependencies between them, allow us – to some degree – to make sense of reality. In some quite old posts (2008 no less!) I investigated the relationship between abstraction and complexity; see Complexity part I & Complexity part II. Being a physicist by training I tend to regress back to this world view whenever I'm allowed ;-). However, the arguments are generic and relevant to software and IT operational 'complexity'.

Abstraction is not the same as virtualisation. Abstraction allows us to describe the essential characteristics of entities that we care about (coffee – 'smooth', ale – 'malty') without being troubled by their internal structures. Abstractions 'simplify'. Abstractions encapsulate complexity. Virtualisation, on the other hand, attempts to create an alternative reality: one that is ideally as close as possible to the physical reality which it replaces and which also underpins it. As its purpose is to replicate reality, virtualisation does not encapsulate complexity.

As many now realise, having learnt the hard way, virtualisation does not simplify.

Yet while we interpret the natural world through a self-consistent hierarchy of highly modular structural abstractions, we have been slow to adopt the structural abstractions required to address environmental complexity and the associated challenge of maintaining these environments.

Necessity is the Mother of Invention

The economic realities we face in 2013 will, I suggest, drive some long overdue changes. Fixation with virtualisation and/or the latest and greatest 'Cloud' stack will wane, and organisations will regain some perspective. The ability to virtualise resource will remain a useful tool in the arsenal of enterprise technologies, but don't expect too much of it. Virtualisation has failed to deliver the simplicity, operational savings and business agility promised by vendor marketing messages. In all honesty, virtualisation never could. The post-virtualised world is more complex and risk-prone than its physical predecessor; this is the trade-off made for increasing resource utilisation while shying away from addressing the monolithic applications which are the root cause of most of our ills.

This is why OSGi is so important. An open industry specification, OSGi directly addresses the fundamental issue of structural modularity, encapsulating complexity and providing a powerful dependency management mechanism. And no longer only for Java: activities within the OSGi Alliance have started to address C/C++ and other languages.

The next generation of adaptive business platforms, whether classified as private or public cloud environments, will need to be modular, will need to understand modular applications, will need to manage all forms of runtime dependency, and will need to be OSGi based.

Through our contributions to the OSGi Alliance, our sponsorship of the bndtools project and the OSGi Community and UK Forum, and above all the ongoing development of the Paremus Service Fabric – the industry's first OSGi-based Cloud runtime – Paremus remain committed to delivering this vision, from the developer IDE to the Cloud runtime. We hope that more of you join us in 2013.

In the meantime I’d like to wish you seasonal best wishes and peace, happiness and health for the coming year.

Richard & The Paremus Team.

Why modularity matters more than virtualization.

Ten years ago it all seemed so simple! Increase utilization of existing compute resource by hosting multiple virtual machines per physical platform, thereby consolidating applications onto fewer physical machines. As the virtual machine 'shields' its hosted application from the underlying physical environment, this is achieved without changes to the application. And as applications may now move runtime location without re-configuration, the idea of virtual machine based 'Cloud Computing' was inevitable.

However, there are downsides.

Virtual machine image sprawl is now a well-known phrase. If the virtual machine image is the unit of deployment, any software upgrade or configuration change, no matter how small, generates a new image. With a typical image size of ~1 Gbyte (see table 2 – http://www.ssrc.ucsc.edu/Papers/ssrctr-10-01.pdf), this soon adds up! Large virtual environments rapidly consume expensive on-line and off-line data storage resource, which in turn has driven the use of de-duplication technologies; so increasing storage cost and/or operational complexity.

Once constructed, virtual machine images must be propagated, perhaps many times across the network, to the physical hosts. A small configuration change, which results in a new virtual machine image that needs to be deployed to many nodes, can therefore generate hundreds of Gbytes of network traffic.
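To make that arithmetic explicit (using the ~1 Gbyte image size above, and a node count chosen purely for illustration): a one-line configuration change that produces a new 1 Gbyte image, pushed to 200 hosts, moves roughly 200 Gbytes across the network for a change measured in bytes.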

When used as the unit of application deployment, virtualization increases operational complexity and increases the consumption of expensive physical network and storage resources – both of which, ironically, are probably more expensive than the compute resource whose use virtualization is attempting to optimize.

We’re not finished!

  • Some categories of application simply cannot be de-coupled from the physical environment. Network latency is NOT zero, network bandwidth is NOT infinite and locality of data DOES matter.
  • Virtualization complicates and obscures runtime dependencies. If a physical node fails, which virtual machines were lost? More importantly, which services were lost, which business applications were operationally impacted? Companies are now building monitoring systems that attempt to unravel these questions: further operational band-aids!
  • Centralized VM management solutions introduce new and operationally significant points of failure.
  • As the operational complexity of virtual environments is higher than that of their physical predecessors, there is an increased likelihood of catastrophic cascading failure caused by simple human error.

Feeling comfortable with your virtualization strategy?

For all these reasons, the idea of re-balancing critical production loads by dynamically migrating virtual machine images is, I suggest, a popular Marketing Myth. Many analysts, software vendors, investors and end users continue to see virtualization as the ultimate silver bullet. They are, I believe, deluded.

The move to the ‘virtual enterprise’ has not been without significant cost. The move to the ‘virtual enterprise’ has not addressed fundamental IT issues. Nor will moving to public or private Cloud solutions based on virtualization.

And so the Story Evolves

Acknowledging these issues, a discernible trend has started in the Cloud Computing community. Increasingly the virtual machine image is rejected as the deployment artifact. Rather:

  • Virtual machines are used to partition physical resource.
  • Software is dynamically installed and configured.
  • In more sophisticated solutions, each resource target has a local agent which can act upon an installation command. This agent is able to:
    • Resolve runtime installation dependencies implied by the install command.
    • Download only the required software artifacts.
    • Install, configure and start required ‘services’.
  • Should subsequent re-configure or update commands be received, the agent will only download the changed software components, and/or re-configure artifacts that are already cached locally.

Sort of makes sense, doesn’t it!?
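A minimal sketch of such a node-local agent might look as follows. It is purely illustrative and not the API of any particular product: Resolver, ArtifactCache and Installer are hypothetical interfaces standing in for whichever dependency resolver, artifact cache and install mechanism a real runtime provides.

```java
import java.util.List;

public final class ProvisioningAgent {

    /** Resolves an install command to the concrete artifacts it implies (hypothetical). */
    interface Resolver {
        List<String> resolve(String installCommand);
    }

    /** Local cache of previously downloaded artifacts (hypothetical). */
    interface ArtifactCache {
        boolean contains(String artifactId);
        void download(String artifactId);
    }

    /** Installs, configures and starts a resolved artifact (hypothetical). */
    interface Installer {
        void installConfigureAndStart(String artifactId);
    }

    private final Resolver resolver;
    private final ArtifactCache cache;
    private final Installer installer;

    public ProvisioningAgent(Resolver resolver, ArtifactCache cache, Installer installer) {
        this.resolver = resolver;
        this.cache = cache;
        this.installer = installer;
    }

    /** Act upon an install / update / re-configure command received from the runtime. */
    public void onCommand(String installCommand) {
        // 1. Resolve the runtime installation dependencies implied by the command.
        List<String> required = resolver.resolve(installCommand);

        // 2. Download only the artifacts that are not already cached locally.
        for (String artifact : required) {
            if (!cache.contains(artifact)) {
                cache.download(artifact);
            }
        }

        // 3. Install, configure and start the required services.
        for (String artifact : required) {
            installer.installConfigureAndStart(artifact);
        }
    }
}
```

The point is simply that the unit moved over the network is the individual artifact, not a monolithic machine image.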

The Elephant in the Room

Dynamic deployment and configuration of software artifacts certainly makes more sense than pushing around virtual machine images. But have we actually addressed the fundamental issues that organisations face?

Not really.

As I've referenced on many occasions, Gartner research indicates that software maintenance dominates IT OPEX (http://www.soasymposium.com/pdf_berlin/Anne_Thomas_Manes_Proving_the.pdf). In comparison, hardware costs are only ~10% of this OPEX figure.

“Our virtual cloud strategy sounds awesome: but what are the business benefits again??”

To put this into perspective: a large organisation's annual IT OPEX may be ~$2 billion. Gartner's research implies that, of this, $1.6 billion will be concerned with the management and maintenance of legacy applications. Indeed, one organisation recently explained that each line of code changed in an application generated a downstream cost of >$1 million!

The issue isn't resolved by virtualisation, nor by Cloud. Indeed, software vendors, IT end users, IT investors and IT industry analysts have spent the last decade trying to optimize an increasingly insignificant part of the OPEX equation, while at the same time ignoring the elephant in the room.

Modular Systems are Maintainable Systems

If one is to address application maintainability, then modularity is THE fundamental requirement.

Luckily for organisations that are predominantly Java based, help is at hand in the form of OSGi. OSGi specifications and the corresponding OSGi implementations provide the industry standards upon which an organisation can begin to modularise its portfolio of in-house Java applications, thereby containing the on-going cost of application maintenance. For further detail on the business benefits of OSGi based business systems, see http://www.osgi.org/wiki/uploads/Links/OSGiAndTheEnterpriseBusinessWhitepaper.pdf.

But what are the essential characteristics of a ‘modular Cloud runtime’: characteristics that will ensure a successful OSGi strategy? These may be simply deduced from the following principles:

  • The unit of maintenance and the unit of re-use are the same as the unit of deployment. Hence the unit of deployment should be the ‘bundle’.
  • Modularity reduces application maintenance for developers. However, this must not be at the expense of increasing runtime complexity for operations. The unit of operational management should be the ‘business system’.

Aren't these requirements inconsistent? No, not if the 'business system' is dynamically assembled from the required 'bundles' at runtime. Operations deploy, re-configure, scale and update 'business systems'; the runtime internally maps these activities to the deployment and re-configuration of the required OSGi bundles.
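As a minimal sketch of this mapping – not the Paremus Service Fabric API, just the principle expressed against the standard OSGi framework API, with the business-system-to-bundle resolution assumed to have already happened:

```java
import java.util.List;

import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;
import org.osgi.framework.BundleException;

public final class BusinessSystemDeployer {

    private final BundleContext context;

    public BusinessSystemDeployer(BundleContext context) {
        this.context = context;
    }

    /**
     * Deploy a named 'business system' by installing and starting the bundles it
     * has been resolved to. In a real runtime the bundle locations would come from
     * resolving the business system description; here they are simply passed in.
     */
    public void deploy(String businessSystemName, List<String> bundleLocations) throws BundleException {
        for (String location : bundleLocations) {
            // The unit of deployment remains the bundle...
            Bundle bundle = context.installBundle(location);
            bundle.start();
        }
        // ...but operations only ever refer to the business system by name.
        System.out.println("Business system '" + businessSystemName + "' deployed");
    }
}
```

Operations only ever see the business system; the bundle-level detail stays inside the runtime.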

Simple.

In addition to these essential characteristics:

  • We would still like to leverage the resource partitioning capabilities of virtual machines, but the virtual machine image is no longer the unit of application deployment. As the runtime dynamically manages the mapping of services to virtual and physical resources, operations need no longer be concerned with this level of detail. From an operational perspective, it is sufficient to know that the 'business system' is functional and meeting its SLA.
  • Finally, it takes time to achieve a modular enterprise. It would be great if the runtime also supported traditional software artifacts, including WARs, simple POJO deployments and even non-Java artifacts!

Are there any runtime solutions that have such characteristics? Yes, one: the Paremus Service Fabric – a modular Cloud runtime designed from the ground up using OSGi, for OSGi based 'business systems'. The Service Fabric's unique adaptive, agile and self-assembling runtime behaviors minimize operational management whilst increasing service robustness. And to get you started, the Service Fabric also supports non-OSGi artefacts.

A final note: even Paremus occasionally bends to IT fashion :-/ Our imminent Service Fabric 1.8 release will support deployment of virtual machine images: though if you are reading this blog hopefully you will not be too interested in using that capability!

Zen and the Art of Cloud Computing

Change is inevitable. Change is constant.

Benjamin Disraeli

I used the enclosed “Cloud Computing” slide set to summarize the Paremus position with respect to Cloud at the OSGi Cloud workshop (EclipseCon 2010 – organised by Peter Kriens).

The slides attempted to communicate the following fundamentals:

  • The Inevitability of Change
  • The strong fundamental relationship between Agility and Robustness
  • The need to Simplify through Abstraction

The implications being that:

  • Clouds need to be reactive runtimes, able to dynamically assemble and maintain themselves, as well as the composite business services which run upon them.
  • Modularity through OSGi is the key enabler.

To explore these concepts further I will be publishing a series of short blog articles using the 'Zen and the Art of Cloud Computing' theme. Each article will be concerned with a specific idea, and how it is realized within the context of the Paremus Service Fabric.

Stay tuned….

Teleport or Telegraph?

If this blog entry were chiseled in stone, no currently existing technology would be capable of near instantaneous transportation of that stone. Perhaps quantum entanglement might one day provide the basis for Teleportation – yet much serious physics and engineering would be required to make this more than Science Fiction.

Yet the same information – in a binary format (Morse) – could have been transmitted across a continent at near the speed of light over a hundred years ago.

Both approaches achieve the same result – transmission of information.

Sometimes identifying the correct approach, the correct perspective, is far more important than the amount of engineering effort you throw at a problem.

Which brings me to the following article

So VMware need 2,000 people to build a resource orchestration layer? Certainly, trying to manage a resource landscape so that it appears unchanging to a population of legacy applications is extremely difficult!

The alternative?

Take a different perspective.

Build dynamic / agile applications that adapt to the changing characteristics of their operational environments.

Complexity – Part II: It all depends on the Question you ask!

I previously argued that the apparent complexity of a system varies dramatically with respect to the type of question you ask. The answer to one question may make a given system seem inordinately complex, yet ask another similar question, from a slightly different perspective, and the same system appears very simple.

Hence, it is the question that dictates where the line is drawn separating hidden and exposed system complexity.

Assume I want to deploy a set of services to an Enterprise. These services have specific runtime requirements and interdependencies. The usual question asked is…

  • “What compute resources do I have, what are their individual configurations and capabilities?”

In response, an extensive list of resources and associated configurations/capabilities is presented, which now needs analyzing. Like the positions of nodes in a lattice, the initial question, and the subsequent answer, expose far too much unnecessary information!

In contrast, if I ask,

  • “Out of the cloud of potential resource which may or may not exist, what sub-set of resources currently satisfies the following conditions?”

The response requires no further thought. Whilst I may never know the configuration of everything, I'll always know whether there are resources capable of servicing my stated requirements. As the response to the question is simple, and requires no effort on my part, I have no issue in re-asking the question as many times as required; this is essential, as the one thing I do know is that the environment WILL change!!
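To make the contrast concrete, here is a minimal sketch of the second style of question expressed as a standard OSGi LDAP-style filter (org.osgi.framework.Filter, so the org.osgi.framework API is assumed to be on the classpath). The property names used are purely illustrative assumptions.

```java
import java.util.Dictionary;
import java.util.Hashtable;

import org.osgi.framework.Filter;
import org.osgi.framework.FrameworkUtil;
import org.osgi.framework.InvalidSyntaxException;

public final class ResourceQuery {

    public static void main(String[] args) throws InvalidSyntaxException {
        // "Out of whatever resources currently exist, which satisfy these conditions?"
        Filter requirement = FrameworkUtil.createFilter(
                "(&(cpu.cores>=4)(memory.gb>=8)(region=eu-west))");

        // One advertised resource description; in practice these arrive via discovery,
        // and the filter is simply re-evaluated whenever the environment changes.
        Dictionary<String, Object> resource = new Hashtable<>();
        resource.put("cpu.cores", 8);
        resource.put("memory.gb", 32);
        resource.put("region", "eu-west");

        System.out.println("satisfies requirements: " + requirement.match(resource));
    }
}
```

Re-asking the question is then just re-evaluating the filter against whatever resources are currently advertised.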

Re-visiting the lattice analogy.

Because it is simple to measure emergent macroscopic properties such as pressure, temperature and volume, it is easy to re-measure these and so deduce the relationship between them over time – e.g. Boyle's Law. This would have been a significant challenge if the microscopic quantities of position, mass and velocity for each particle had been used instead!

Abstraction versus Virtualization?

Resource abstraction is different from resource virtualization. Whilst the latter attempts to represent a physical resource with a "virtual" equivalent – this equivalent emulating the attributes of the underlying entity – resource abstraction masks the complexity of the entity (physical or virtual), representing the resource via a simplified description. Resource abstraction and resource virtualization are orthogonal / complementary and interdependent.

To Conclude

  • As systems become increasingly distributed and composed of an ever-increasing number of moving parts, we need to step back from attempting a microscopic description of the environment, and instead describe it in terms of its emergent macroscopic characteristics.
  • We need to intelligently define the boundaries – the point at which microscopic behavior gives way to a more appropriate macroscopic view. Also don’t be surprised if several boundaries exist.
  • Dynamic service discovery / dynamic service provisioning / re-provisioning are fundamental – they are MUST HAVE core behaviors.
  • So avoid all architectures and solutions that assume a static world comprising fixed, immutable resources at known addresses – NB this includes wiring systems together via static, immutable middleware services! Unfortunately that describes the vast majority of current software solutions, and the mindsets of the engineers that built them.

Build dynamic systems and manage them with respect to their macroscopic properties, and the management / complexity issue vanishes. Conversely, if runtime complexity is a serious issue, it's about time you redesigned / rebuilt your systems, as no amount of traditional management software will save you.

Virtualisation: Emperor's New Clothes

Paremus colleagues continuing to flag articles to me, and partially restored vigor (New Year and all that), have conspired to overcome my Q4-07 Blogger's Block.

I'll start by briefly paraphrasing the recent crop of virtualization articles.

Virtualization is great because…

  • You can increase data centre resource utilization.
  • You can simply restart a service on a new physical platform, should the current physical platform fail.

However there may be some dark clouds on the horizon because…

  • Resource / Dependency Management and Security are problematic
  • Operational Risk may be adversely affected

Well, I’ll go to the foot of our stairs!

Isn't it obvious that "visible" runtime complexity is increased by current forms of virtualization? Given this, it is surely no surprise that virtualization can negatively impact manageability, OPEX and, ironically, service availability?

Increasing server utilization at the expense of increased runtime complexity seems like a poor trade; especially if you remember that complexity is proportional to the number of skilled personnel required – and so to OPEX. From a recent survey by Sun Microsystems (Sun Survey) it would appear that many CIOs would agree.

Meanwhile, we’re told that:

“the IT industry will develop a new generation of management tools to address manageability and security issues created by virtualization. A great opportunity for start-ups and large IT companies alike”.

Doubt this? Then check the current datacentre virtualization hype, and the number of VC-funded companies in this market sector. Clearly the adage "Identify the Pain – and sell them the Aspirin" is still in vogue with our VC friends. Unfortunately such strategies are at best simplistic; at worst, they demonstrate both the level of stupidity only achievable via a fully qualified MBA, and the lemming behavior of the IT industry.

This time, the patient (Enterprise IT), really does need more than yet another, in a long sequence, of expensive Aspirins.

But perhaps the established IT vendors will address the problem?

Let's see. How many established vendors, after 15 years of client-server computing, have enterprise management frameworks that are:

  • Simple to Use
  • Cost Effective
  • Simple to deploy
  • Able to address simple requirements like configuration management for software, servers, storage and networks.

It's been a while since I've been involved in this area (i.e. HP OpenView, Tivoli and the like), but I suspect the answer is still the same.

So what real hope is there of extending such solutions to address the new complications posed by service virtualization?

I’ll let the reader come to their own conclusions.

Venture Capitalists embrace Command Economy in preference to Free Market!

A recent article, Interesting Times for Distributed DataCentres by Paul Strong (eBay Distinguished Research Scientist), makes a number of interesting points:

  • For Web2.0 Services to scale, you MUST back-end these onto massively horizontally scaled processing environments.
  • Most Enterprise datacentre environments are moving towards, or could be considered as, primordial Grid-type architectures.
  • What is really missing is the Data Centre MetaOperating System – to provide the resource scheduling and management functions required.

Whilst these arguments are correct, and highlight a real need, the industry and VC response seems entirely inappropriate.

Whilst VCs and major Systems Vendors are happily throwing money into expounding the virtues of loosely coupled business models enabled by Web2.0 and all things WS-SOA, somewhat perplexingly they also continue to invest in management / virtualization / infrastructure solutions which drive tight couplings through the infrastructure stack. Examples include data centre "virtualization" or, as per my previous blog entry on the Complexity Crisis, configuration / deployment management tools.

Hence, industry investment seems to continue to favor the technology equivalent of the "command economy", in which the next generation of distributed Grid data centre is really just one more iteration on today's: a central IT organisation controls, manages and allocates IT resource in a rigid, hierarchical command-and-control structure. The whole environment is viewed as a rigid system which one centrally controls at each layer of the ISO stack; an approach that continues the futile attempt to make distributed environments behave like MainFrames!

What is actually needed is a good dose of Free Market Economics!

  • Business Services dynamically compete for available resources at each point in time,
  • Resources may come and go – as they see fit!
  • Infrastructure and Systems look after their own interests, and optimise their behaviors to ensure overall efficiency within the Business Ecosystem.

Successful next-generation MetaOperating Systems will heavily leverage such principles at the core of their architectures!

You simply cannot beat an efficient Market!

The “Hidden Costs of Virtualization”

An interesting article arguing that, whereas OS virtualisation is sold on the cost savings achieved by higher CPU utilisation of existing resource, several cost factors seem to be overlooked. The most important of these is that operational costs scale with the number of OS instances; it is immaterial whether these instances are real or virtual. The article also points out that commercial OS virtualization software is not cheap – around $20,000 for VMware ESX on a 4-way Intel box – though open source solutions should in due course pull this pricing down.

However, I’d suggest that there are a number of additional considerations.

OS virtualisation, in itself, does nothing to address the inherent complexity issues within modern enterprise environments. Instead of a sprawl of physical machines with poorly managed applications and configurations, one can now extend these complexity and management issues into a virtualized resource space! Obviously, OS virtualization management is needed – and is indeed provided, as commercial products, by the virtualization vendors.

In many respects, are we not back where we started? Sure, we can now drive up CPU utilisation, but the runtime infrastructure is more complex than ever. Meanwhile business applications are still as brittle, as tightly coupled and as change-resistant as they ever were!

Also, is increased CPU utilisation, at the cost of increased complexity, a good trade?

Driving up CPU utilization has got to be good, right? Indeed, many CIOs want to make dramatic OPEX cost savings by driving CPU utilisation to ~100%.

Yet, whilst running a large datacentre's CPU resource at single-figure utilisation levels is an obvious cost issue, what seems to be overlooked are the issues associated with running resources at near-maximum utilisation.

Load volatility is an obvious concern. If you achieve, on average, 80% utilization across your resource population, just how do you cope with peaks that require, say, 50% more resource? The standard response may be to outsource the extra resource requirement to a third-party utility compute supplier. Yet whilst frequently discussed by the industry, I'm not aware of many over-capacity deals. Quite the contrary: early entrants into the Utility Compute Market have recently dismantled facilities due to lack of commercial interest.
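To make the arithmetic explicit (the 80% and 50% figures above are of course illustrative): a peak demanding 50% more than an 80% average load requires 1.5 × 80% = 120% of the capacity you actually own – a 20% shortfall that has to come from somewhere.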

Yet there is a more important issue: namely, operational risk. Evidence suggests that compute resource under excessive load is statistically more likely to experience software failure (reference). Moreover, any complex, tightly coupled system may suffer cascading failures; i.e. an initial component failure cascading into a major system outage.

Hopefully such cascading failures are the exception; however, component failure within a heavily loaded environment will always be more intrusive than for a lightly loaded equivalent, as, depending upon the priority of the lost service/component, other running services may need to be terminated to free sufficient resource.

Hence, Operational Risk and Data Centre Resource Utilization are issues that are fundamentally linked; linked by the sizes of the potential failure domains within the system.

Failure domains may be defined by / mapped to:

  • Physical locality / Physically shared resource.
  • Hardware type
  • Software type / version
  • Management / Security domains

Considering "physical" failure domains, the following usually exist:

  • A data centre facility (complete power failure, halon release, collapse of network infrastructure).
  • A shared PDU (possibly affecting 25% of data centre resource – assuming critical systems like SAN storage and IP networking are wired into at least two PDUs).
  • A network switch failure – perhaps impacting 50 servers if each is single-homed.
  • A cabinet failure – perhaps affecting the 100 processors in that cabinet.
  • A single physical machine.

Without spare usable capacity, re-provisioning the OS instances lost to a single cabinet or even machine failure may prove challenging.

Perhaps we need to think about things in a slightly different way?

Conventional Wisdom:

Large datacentres are running out of space and / or are limited by environmental considerations (power, air-conditioning requirements). Usually there is little possibility of building a secondary large datacentre facility within the Metro area, because the cost can be substantial and suitable real estate is not available. Hence, virtualize compute instances and maximally use the resource that is already there.

The Alternative:

Instead of a single large datacentre facility, adopt modular datacentres, distributed over a larger geographic region. The largest failure domain that we care about is then a datacentre module instance, so for 'N' active modular datacentres we need 'N+1' to allow for the complete failure of any one instance. By modularizing and virtualizing the datacentre, we actually increase the ability to use spare CPU resource per datacentre instance, without impacting operational risk.
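For illustration (the module count is an assumption, not a sizing recommendation): with four active modules plus a fifth to cover failure, the loss of any one module is absorbed by the spare, so each active module can be run at high utilisation – whereas a single monolithic facility must hold back enough headroom to survive the failure of its largest internal failure domain.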

In this respect Sun's Black Box strategy was an interesting marketing move. I say "marketing" as in reality the hardware required to realise a modular datacentre is the easy bit! The difficulty lies in the unstated, but very real, requirement for a distributed and extremely robust composite application resource broker that seamlessly knits these modular resources together into one robust, distributed Enterprise/Utility Service Fabric.