AntiFragile

 

The IT ‘fashion’ industry continues to drive us towards highly fragile environments. A fundamental change is needed and is long overdue.

The theme I continue to labour is that as IT becomes increasingly distributed and interconnected, we simply cannot continue down the same old software engineering path: see my recent posts including ‘Reality is Wondrously Complex’, ‘Cloud: The Wasted Decade’, ‘Complex Systems and Failure’ and ‘Why Modularity Matters more than Virtualisation’.

So I guess I am more predisposed than most to agree with AntiFragile‘s central message. Systems are either:

  • Fragile – their capabilities degrade with random change / environmental volatility.
  • Robust – their capabilities are immune to random change / environmental volatility.
  • AntiFragile – their capabilities improve courtesy of random change / environmental volatility.

Nassim Taleb (the author) correctly argues (IMO) that resource optimisation and prescriptive processes are hallmarks of fragile systems, fragile environments and fragile organisations. Unfortunately most organisations’ current ‘private Cloud’ and / or ‘Virtualisation’ strategies are justified by – yes, you guessed it – resource optimisation and prescriptive, simplistic automation. There is every indication that the industry’s next ‘big’ fashion – the ‘Software Defined Data Centre’ – will compound these errors.

To deal with the unforeseen or unforeseeable, a system needs spare resource; and as the unforeseen event is – well, unforeseen – there can be no pre-determined prescriptive response.

In stark contrast, AntiFragile systems incorporate a large degree of ‘optionality’; meaning the ability to exercise and re-exercise the most attractive option at each point in time.

By trying to understand why biological systems were adaptive and robust whereas software systems were, and are, not, Paremus started mapping such concepts to software systems in 2003. As hoped, this resulted in a set of basic principles which have continued to guide our efforts:

  • Robust software platforms must be based upon dynamic resource and service discovery, and subsequent rediscovery. Why? Things Change.
  • The platform should continually assess the runtime structures against the desired goal. How close is the current structure to this goal? How can the difference be further reduced / removed? Why? Things Change.
  • All runtime units must be loosely coupled. Why? Things change!
  • Finally, and most importantly, a platform should strive to be AntiFragile; to be able to harness new runtime ‘Optionality’. To achieve this the platform MUST be modular – it MUST be dynamically assembled from self-describing units – it MUST leverage emergence and environmental feedback (a minimal sketch of such a dynamically bound unit follows this list).
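To make the ‘self-describing, dynamically assembled’ point concrete, here is a minimal sketch using OSGi Declarative Services annotations (DS 1.3+ field injection). The MetricsSink interface and all names are invented for illustration; the point is simply that the component keeps functioning as service providers appear and disappear at runtime.

    package com.example.monitor;

    import org.osgi.service.component.annotations.Component;
    import org.osgi.service.component.annotations.Reference;
    import org.osgi.service.component.annotations.ReferenceCardinality;
    import org.osgi.service.component.annotations.ReferencePolicy;

    // Invented service interface, purely for the sketch.
    interface MetricsSink {
        void accept(String message);
    }

    @Component
    public class StatusReporter {

        // OPTIONAL + DYNAMIC: the runtime (re)binds whichever provider is currently
        // available. No restart, no prescriptive reconfiguration when things change.
        @Reference(cardinality = ReferenceCardinality.OPTIONAL, policy = ReferencePolicy.DYNAMIC)
        private volatile MetricsSink sink;

        public void report(String message) {
            MetricsSink current = sink;   // may be null if no provider is present right now
            if (current != null) {
                current.accept(message);
            }
        }
    }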

But I digress; back to the book.

Taleb rightly calls out the naivety of assuming that simple abstract mathematical models, i.e. naive reductionism, can be used to predict Complex System behaviour: this illusion helped prime both the Long-Term Capital Management meltdown in 1998 and the current – even more spectacular – banking crisis (see Fault Lines: How Hidden Fractures Still Threaten the World Economy).

Taleb also makes important observations concerning apparent causation. It is very difficult to predict which of the many adjacent possibilities may be chosen by a Complex System. An adjacent state will be ‘more fit’; but there is no guarantee that it is the ‘most fit’. However, looking backwards, a path of causality seems to exist and appears ‘obvious‘: i.e. ‘history‘ provides us with a simple narrative. Taleb also explains how history tends to be distorted (teleological interpretations) with the following example: derivatives were traded (based on intuition / ‘gut’) long before the rocket scientists got involved, yet documented history implies the inverse.

Taleb has clearly reached these insights through his own explorations, shaped by his experiences in the Financial Services industry. Yet while Taleb’s presentation is unique, the underlying themes have been previously explored in a number of different domains. In a recent post, ‘Complex Systems and Failure’, I briefly reviewed the book ‘Adapt: Why Success Always Starts with Failure’. Here the author (Tim Harford) argues for diversity and the need for speculative experimentation and failure. Meanwhile, in ‘The Future of Money’ (2001 no less!), Bernard Lietaer argues, counter to single currency doctrines, that currency diversity is essential for achieving both stability and localised wealth creation. Stuart Kauffman in ‘Investigations’ argues that Complex Systems evolve into adjacent possibilities; Kauffman suggests this behaviour explains the explosion in diversity seen in the natural biosphere and, over the last 3,000 years, in the human economy. Meanwhile ‘Diversity and Complexity’ provides a quantitative exploration of why diversity is a fundamental requirement of robust, evolvable Systems. Finally, for those with a heavy theoretical bias – watch out for ‘Dynamics of Complex Systems: from Glasses to Evolution‘, which is due to be published this year.

I’m still reading AntiFragile, so I’ll reserve final judgement. However the concepts and arguments presented are important. There are some omissions. Taleb has not (so far) addressed the fundamental dilemma that my local AntiFragile behaviour today may result in our collective Fragile behaviour tomorrow; i.e. the tragedy of the commons. To use one of Taleb’s analogies: rather than solely being concerned about winning a street fight, let’s also work on the root issues that trigger such violence?!

Taleb does seem to like the street fighting analogy. Perhaps he’s a natural candidate for speaking at FITEclub in 2013 😉

UPDATE: Completed reading this evening. Contrary to the previous comment, in the closing sections Taleb does directly address my concern, via the ‘skin in the game‘ mechanism!

A comment w.r.t. scientific disciplines – unfortunately Physics isn’t quite the shining beacon portrayed. Fashions and employability are concerns even for theoretical physicists: if you wanted a career over the last decade, you were much more likely to succeed if you worked in String Theory! See Big Bang – or, more seriously, The Trouble with Physics and/or Not Even Wrong. Luckily adherence to ‘String Theory’ (or ‘Loop Gravity’) is unlikely to cause the next economic collapse.

To conclude: +1 for AntiFragile.

Reality is Wondrously Complex

From the atoms in my coffee (Mmmm, Union Roast – actually now my Harveys Elizabethan Ale 😉 – typos courtesy of the latter) to the interactions of individuals that collectively define the earth’s eco-system and, on a smaller scale, human societies; reality is wondrously complex. To cope with this we create abstractions. The abstractions we create, and an understanding of the dependencies between these abstractions, allow us – to some degree – to make sense of reality. In some quite old posts (2008 no less!), I investigated the relationship between abstraction and complexity; see Complexity part I & Complexity part II. Being a physicist by training I tend to regress back to this world view whenever I’m allowed ;-). However the arguments are generic and relevant to software and IT operational ‘complexity’.

Abstraction is not the same as virtualisation. Abstraction allows us to describe the essential characteristics of entities that we care about (coffee – ‘smooth’, ale – ‘malty’); without being troubled by their internal structures. Abstractions ‘simplify’. Abstractions encapsulate complexity. Virtualisation on the other hand attempts to create an alternative reality; one that is ideally as close as possible to the physical reality which it replaces and which also underpins it. As the purpose is to replicate reality, virtualisation does not encapsulate complexity.

As many now realise, having learnt the hard way, virtualisation does not simplify.

Yet while we interpret the natural world through a self-consistent hierarchy of highly modular structural abstractions, we have been slow to adopt the structural abstractions required to address environmental complexity and the associated challenge of maintaining these environments.

Necessity is the Mother of Invention

The economic realities we face in 2013 will, I suggest, drive some long overdue changes. Fixation with virtualisation and/or the latest and greatest ‘Cloud’ stack will wane, and organisations will regain some perspective. The ability to virtualise resource will remain a useful tool in the arsenal of enterprise technologies; but don’t expect too much of it. Virtualisation has failed to deliver the simplicity, operational savings and business agility promised by vendor marketing messages. In all honesty, virtualisation never could. The post-virtualised world is more complex and risk prone than its physical predecessor; this is the trade-off made for increasing resource utilisation while shying away from addressing the monolithic applications which are the root cause of most of our ills.

This is why OSGi is so important. An open industry specification, OSGi directly addresses the fundamental issue of structural modularity; encapsulating complexity and providing a powerful dependency management mechanism. Nor is this limited to Java: activities within the OSGi Alliance have started to address C/C++ and other languages.
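As a flavour of that dependency management, a purely illustrative manifest fragment (the package names are invented): each bundle states exactly what it exports, at what version, and the semantic version range of what it consumes; everything else stays encapsulated behind the bundle boundary.

    Bundle-SymbolicName: com.example.pricing
    Bundle-Version: 1.2.0
    Export-Package: com.example.pricing.api;version="1.2.0"
    Import-Package: com.example.audit.api;version="[2.0,3.0)"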

The next generation of adaptive business platforms, whether classified as private or public cloud environments, will need to be modular, will need to understand modular applications, will need to manage all forms of runtime dependency, and will need to be OSGi based.

Through our contributions to the OSGi Alliance, our sponsorship of the bndtools project and the OSGi Community and UK Forum, and above all the ongoing development of the Paremus Service Fabric – the industry’s first OSGi based Cloud runtime – from the developer IDE to the Cloud runtime, Paremus remains committed to delivering this vision. We hope that more of you join us in 2013.

In the meantime I’d like to wish you seasonal best wishes and peace, happiness and health for the coming year.

Richard & The Paremus Team.

Cloud: The Wasted Decade?

‘Cloud Computing’ remains the darling of the technology industry. ‘Cloud’, we are told, will reduce operational cost while increasing business agility. ‘Cloud’, we are promised, will re-shape the way we think about IT services.

 

The problem is, I don’t understand how these transformational effects are to be achieved. Current 1st-generation ‘Compute Clouds’ result from the synthesis of two underlying technology enablers:
  1. The adoption of coarse grained Service Orientation (WS-* or REST), allowed the decoupling of interconnected coarse grained business services. This is good.
  2. The use of the virtual machine image as the deployment artifact, allowed applications to be deployed ‘unchanged’ into alien runtime environments.
And therein lies the problem.

 

How can applications be more agile, when their internal composition remains unchanged? When they remain monolithic entities with tightly coupled constituent components? Given this, how are spiralling application maintenance costs countered? Exactly how is environmental complexity addressed? Are not 1st-generation public Cloud Compute offerings simply approaches to resource outsourcing? From a resource perspective more flexible than previous approaches, but potentially much more restrictive and ultimately dangerous as business logic is rigidly locked to sets of third party Cloud middleware APIs? Meanwhile, are 1st-generation private ‘Compute Clouds’ not just the latest marketing tag-line for traditional enterprise software deployment products?

After a decade of ‘industry innovation’, I fear Cloud Computing and resource Virtualization have collectively made the world more complex, our business systems more brittle and ultimately more costly to support.

In 2004 Paremus started a project to address what we considered to be the fundamental problem: how to build adaptive and robust application and runtime environments. Despite the ongoing Cambrian explosion of programming languages, Paremus did not believe the answer lay with the adoption of a new or old programming language, e.g. Scala or Erlang! We also felt that the fashion to ‘Virtualize’ was probably – at best – a distraction; a position which placed Paremus at odds with the rest of the industry.

 

Having a predominance of Physicists, Paremus turned to an area of scientific research known as ‘Complex Adaptive Systems’, hoping this would provide the architectural guidance we needed. It did, and we distilled the following principles:
  • Change: Whether you like it or not it’s going to happen. Change must not be viewed as an inconvenient edge case or ‘Black Swan’ event.
  • Necessary Complexity: Systems that do interesting things are by their nature Complex.
  • Loose Coupling: While Complexity isn’t the issue, rigid structural coupling within Complex systems is fatal. Constituent parts must be loosely coupled.
  • Stigmergy: Don’t manage the details; rather, set the overall objective and enable the System’s constituent components to respond appropriately: i.e. Think Globally, Act Locally (see the reconciliation sketch after this list).
  • Structural Modularity: In complex systems structural modularity always exists in a natural hierarchy.
  • Self Describing: Modular systems comprised of self-describing units (each unit describes its requirements and capabilities) may be dynamically assembled.
  • Diversity: Structural modularity enables diversity, which enables agility and evolvability.
  • Accidental Complexity: Structural modularity enables accidental complexity to be reduced over time.
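A minimal sketch of the ‘Think Globally, Act Locally’ idea, with all names invented for illustration: a central tier only publishes the desired target state; each local agent repeatedly compares that target with what it observes locally and acts only on the difference.

    import java.util.HashSet;
    import java.util.Set;

    // Purely illustrative: no central controller micro-manages the details.
    public class LocalAgent {

        private final Set<String> target = new HashSet<>();   // units this host should run
        private final Set<String> running = new HashSet<>();  // units actually running here

        public void setTarget(Set<String> desired) {
            target.clear();
            target.addAll(desired);
        }

        // One reconciliation pass: start what is missing, stop what is surplus.
        public void reconcile() {
            for (String unit : target) {
                if (!running.contains(unit)) {
                    start(unit);
                }
            }
            for (String unit : new HashSet<>(running)) {
                if (!target.contains(unit)) {
                    stop(unit);
                }
            }
        }

        private void start(String unit) { running.add(unit);    /* install & start locally */ }
        private void stop(String unit)  { running.remove(unit); /* stop & uninstall locally */ }
    }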

Creating the illusion of a static environment – via ‘virtualization’ – for rigid, inflexible monolithic applications is a mistake: an easily consumed painkiller that only masks the fundamental illness.

The body of research was both compelling and conclusive.

Modularity is the key-stone. Modularity enables systems to be dynamically assembled and, if required, re-assembled from self-describing components.

Of all the languages available, only one (Java) had a sufficiently sophisticated modularity framework (OSGi) to meet our requirements. A set of open industry specifications created by the OSGi Alliance, OSGi provided many of the characteristics we required:
  • Modules (known as OSGi Bundles) are self-describing entities whose capabilities and requirements are well defined.
  • A powerful resolver capability allows OSGi based applications to be dynamically assembled during development or at runtime. This not only includes bundle dependencies, but also life-cycle and, looking ahead, service and environmental resource dependencies.
  • OSGi provides a well defined and industry standard approach to life-cycle and dynamic configuration.
  • Finally, OSGi provides a powerful micro-Services layer, completing a natural structural hierarchy (Classes ➔ Packages ➔ Bundles ➔ microServices ➔ traditional SOA); a minimal micro-service sketch follows this list.
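For readers new to OSGi micro-Services, a minimal in-VM sketch (the Greeter interface is invented): one bundle publishes an implementation into the service registry, and any bundle may discover and use it for as long as it remains available.

    import org.osgi.framework.BundleActivator;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.ServiceReference;
    import org.osgi.framework.ServiceRegistration;

    public class Activator implements BundleActivator {

        private ServiceRegistration<Greeter> registration;

        @Override
        public void start(BundleContext context) {
            // Publish an implementation into the service registry.
            registration = context.registerService(Greeter.class, name -> "Hello " + name, null);

            // Another bundle (or this one) can discover it dynamically.
            ServiceReference<Greeter> ref = context.getServiceReference(Greeter.class);
            if (ref != null) {
                Greeter greeter = context.getService(ref);
                System.out.println(greeter.greet("OSGi"));
                context.ungetService(ref);
            }
        }

        @Override
        public void stop(BundleContext context) {
            if (registration != null) {
                registration.unregister();
            }
        }
    }

    // Invented interface, purely for the sketch.
    interface Greeter { String greet(String name); }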
For these reasons Paremus adopted OSGi as a cornerstone of our product strategy in 2005. The result of these ongoing efforts is the Service Fabric: the industry’s first distributed OSGi runtime, which enables:
  • Business applications to be dynamically assembled from re-usable components.
  • The dynamic assembly and configuration of these applications with respect to the runtime environment within which they find themselves.
  • Middleware services dynamically installed as required: no Cloud / PaaS API lock-in.
  • Dynamic reconfiguration of all runtime elements (business logic, middleware and the platform itself) in response to unforeseen cascading resource failure.

 

Why now?
Whether a Financial Market crash or a paradigm shift in the IT industry, it is relatively easy to forecast that a change is inevitable, but notoriously difficult to predict precisely when it will occur. It is inevitable that next generation cloud environments, Public or Private, will support dynamically assembled, highly modular applications: these constructed from re-usable industry standard components – meaning OSGi. It is also inevitable that these environments will themselves be OSGi from the ground up. OSGi remains the only open industry standard, and so will be highly influential for those organisations that want to leverage a decade of expertise and avoid proprietary vendor lock-in. The OSGi Alliance has recognised this trend and is extending specification activities to include Cloud Computing and generic Modularity and Life-Cycle management.
But is that time now?

 

For some organizations the ‘Boiling Frog’ analogy is, unfortunately, apt. Such organisations will continue to endure increasing OPEX costs; avoiding the upfront expense and perceived risk of re-orientating their IT strategy to focus on optimizing maintainability, agility and resilience. For such organisations – Cloud and PaaS strategies will consist of virtual machine management and the same old application software stack. Such organisations will see little real business benefit.

 

Increased interest in DevOps tooling (e.g. Puppet, Chef) demonstrates that other organisations have started to look beyond virtual machine centric strategies. While such DevOps tools start to address the issue of dynamic deployment and configuration, this is still with respect to monolithic applications, and so in essence no different from the IBM Tivoli and HP management tools of the 1990s.

 

Meanwhile, organisations that view technology as an enabler rather than a cost are starting to seriously look at development and runtime modularity, in part as a way to realise continuous release and agile development. For Java centric organisations this is driving the adoption of OSGi. One simple metric I can provide is that in the last 12 months Paremus has provided OSGi training to Government, Financial Services and Web companies in the US, Europe and Asia. In most of these organizations the relationship between OSGi, Agile Development and Cloud was a major interest.

 

Despite all our economic woes, is the time now? Quite possibly!

Packager – because OSGi > Java

If you are following the progress of Java 8, then you will be aware that Jigsaw may – yet again – fail to make the release train. Then again, it might: ‘they’ seem just a little uncertain.

In stark contrast, since the appearance of JSR277 in 2005 (yes 2005!), the OSGi Alliance position has remained consistent and coherent. This position was restated yet again this week here. For those interested, several excellent supporting background articles may be found here, here and with Neil Bartlett’s usual humorous perspective here.

Paremus’ view is quite simple. For the record…

Java modularity should be based upon OSGi. Failure to do so will fracture the Java community and market, and so damage the future of the Java language.

However, this isn’t the main focus of this post: Just an interesting aside. 😉

And now for something completely different…

Paremus have been dynamically deploying OSGi based applications into ‘Cloud‘ environments for 7 years! Why would one use static virtual machine images, when your applications can be automatically assembled from re-usable components and automatically configured with respect to their runtime topology and the specifics of the runtime environment?

Surprisingly, this message has taken a while to seed; but seed it has: Paremus is nothing if not persistent. So 7 years on, and the Paremus Service Fabric represents the state-of-the-art ‘Cloud’ platform for those organisations that require something a little bit special: an adaptive, self-managing solution for their next generation of low maintenance modular OSGi based Java or Scala composite applications.

However:

  • Some popular Java software projects remain monolithic and are unlikely to embrace OSGi anytime soon: e.g. Hadoop, ZooKeeper etc.
  • Many Service Fabric customers have a large portfolio of monolithic applications.
  • Some applications are not even Java / JVM based.

So if you want Service Fabric capabilities but face one or more of the above challenges, what are you to do?

OSGi > Java

OSGi provides a set of Modularity, Life-cycle and Service capabilities that transcend any particular language.

The journey starts with the realisation that OSGi specifications encapsulate a number of remarkably useful language neutral concepts:

  1. Metadata for describing software artefacts; including flexible declaration of the artefact’s Capabilities and Requirements [environmental resources, Services and module dependencies], and a powerful semantic versioning scheme for  module dependencies.
  2. Leveraging this metadata; powerful and extremely flexible dynamic dependency resolution capabilities.
  3. A consistent approach to life-cycle: installing, starting, configuring, stopping and uninstalling software artefacts.
  4. A flexible configuration service for configuring installed artefacts.
  5. A powerful Service model, including pluggable RPC implementations for remote services that enable the decoupling of the service interactions from the underlying implementation language. For example, the Paremus Remote Service Administration implementation provides an Avro distribution provider.
While usually implemented in the Java language, these capabilities need not be limited to Java artefacts!
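As a purely illustrative example (the namespaces and attributes below are invented, and this is not Packager’s actual format), the same generic Capability / Requirement headers can describe a non-Java artefact just as readily as an OSGi bundle:

    Provide-Capability: example.database;example.database=postgresql;version:Version="9.2.0"
    Require-Capability: example.os;filter:="(example.os=linux)"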

Introducing Packager

So, pulling these concepts together, Paremus created Packager: Packager allows traditional software artefacts to be installed, deployed and dynamically configured just like OSGi bundles.

Neil Ellis, our lead developer on the Packager project, will post a series of articles over the next few weeks that will:

  • Introduce Packager concepts with a series of examples.
  • Investigate scenarios in which Packager may be of interest in non OSGi environments.
  • Contrast Packager against some of the other software deployment approaches that are available today.

If you are pursuing a modular cloud platform strategy and see the value in a consistent approach to runtime deployment and management – from the smallest of bundles to the LARGEST OF APPLICATIONS – if you understand the pitfalls and complexities caused by reliance on static, bloated virtual machine images, and if you value open industry standards, then we think you will be interested in Packager.

So Stay Tuned … 😉

Complex Systems and Failure

Tim Harford’s ADAPT was one of those spontaneous airport bookshop purchases.

In summary: a good read with a relevant message. Short-termism stifles true innovation. It is only by attempting novel high risk activities that we can hope to make substantive changes and ultimately succeed.

ADAPT provides some advice for putting this philosophy into practice:

  • All interesting systems (ecological, economic, social, political) are Complex.
  • ‘Complexity’ is not the issue:  Tight coupling is the issue.
  • Tight coupling propagates failure; tight coupling must be avoided.
  • Information has context. Lose the context and much of the value of the information is lost.
  • Avoid overly centralised command and control. Rather, delegate the decision making process.
  • Where possible, act locally.

Those interested in ‘Complex Adaptive Systems’ will be aware of the substantive body of background research that underpin ADAPT’s arguments.

Why Complex Systems Fail

And yet these principles are rarely put into practice by the software industry.

Response to Failure: A tightly-coupled system

This is perplexing, as the ‘fail fast‘ mantra is not new: it just seems to have been largely ignored. While Berkeley’s Recovery Oriented Computing program demonstrated these ideas almost a decade ago, we see little evidence of them being incorporated in the latest ‘Cloud’ & ‘Virtualisation’ platform offerings from the dominant software vendors. Indeed, peel beneath the marketing covers, and the usual suspects continue to pursue ‘High Availability’ or ‘Fault-Tolerant‘ approaches.

The folly of this is nicely explained by ‘How Complex Systems Fail‘ (University of Chicago’s Cognitive Technologies Laboratory). This paper covers some of the same ground as ADAPT but explains the problem from an IT Operational perspective. Fault-Tolerance masks component failure, and so, paradoxically, such systems are more vulnerable to severe cascading or systemic failures.

When these cascading events finally do occur, Operations are the only defence: Operations pick-up the pieces!

The author (Richard I. Cook) argues a number of points; but the following two are to my mind the most important.

  • Safety is a characteristic of systems and not of their components.
  • Failure free operations require experience with failure.

Surely such fundamental principles should be at the core of modern ‘cloud’ platform runtimes? Surely failure recovery must be an integral part of any overall solution, with these behaviours exercised as part of normal ongoing runtime activities and not as responses to rare Black Swan events? Finally, surely cloud environments should aim to be truly loosely coupled environments?!

Note: In my book reliance on centralised message brokers, or naive use of rigid ZooKeeper type lock services are part of the problem, not the solution.

 

Stopping Complex Systems from Failing?

Markov Chain Analysis of a loosely-coupled ‘Target State’ driven platform

Thanks to early exposure to adaptive SOA frameworks like Jini, Paremus developed a strong intuition with respect to the requirements for mission critical ‘cloud’ environments. To provide concrete theoretical foundations, in 2005 Paremus used Markov Chain analysis to simulate the availability of traditional HA clusters, and contrasted these with alternative architectures we internally referred to as ‘No Frame of Reference (NFoR)‘. A ‘NFoR‘ architecture had no static control points and could continuously re-allocate software components as required.

To achieve this:

  • Component failure was visible within the runtime environment.
  • Loose coupling at all structural layers ensured that failure was effectively isolated.
  • As the architecture was extremely modular, only the smallest units needed to be replaced and recovery was rapid.
  • Sophisticated ‘Target State Driven‘ dependency management automatically replaced the failed units.

The results of the simulations were clear. A ‘No Frame of Reference‘ runtime platform embodying fail fast and automated repair and recovery behaviours significantly outperformed traditional static high availability alternatives.
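The intuition can be illustrated with a toy calculation (the rates below are invented, and this is far simpler than a full Markov Chain model): steady-state availability of a repairable unit is MTTF / (MTTF + MTTR), so slashing recovery time pays off far more than chasing ever-rarer failures.

    public class AvailabilitySketch {

        // Steady-state availability of a single repairable unit.
        static double availability(double mttfHours, double mttrHours) {
            return mttfHours / (mttfHours + mttrHours);
        }

        public static void main(String[] args) {
            // 'HA cluster': failures are rare but recovery is slow and partly manual.
            double haAvailability = availability(2000.0, 4.0);

            // 'Fail fast, auto-recover': failures are more frequent, but repair is
            // automatic and fast because only a small modular unit is replaced.
            double failFastAvailability = availability(500.0, 0.01);

            System.out.printf("HA cluster:        %.5f%n", haAvailability);        // ~0.99800
            System.out.printf("Fail fast/recover: %.5f%n", failFastAvailability);  // ~0.99998
        }
    }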

 

Even in the most volatile of environments, with multiple failures being rapidly injected, such platforms always settled back into a functional state.
Such solutions, because of their extreme agility, could also be rapidly reconfigured, shut down and re-started by Operations.

Hence, our own experiences were consistent with the advice offered by ADAPT and  Richard I. Cook’s paper.

The following are fundamental requirements.

  • A high degree of structural modularity, as Modular Systems are Maintainable Systems.
  • Loose coupling between interactive software components (locally or network distributed).
  • Loose coupling between components and the underlying resources (physical or virtual).
With the following implications:
  • Structural modularity requires powerful dependency management.
  • Resource abstraction requires sophisticated ‘Target State‘ provisioning / re-provisioning capabilities.

And for Paremus, the OSGi software modularity framework provided a compelling set of industry standards via which these capabilities might be achieved.

It is worth emphasising that such capabilities are not a function of the programming language used. Choice of language does not in itself provide an answer: just the notation you might use to realise an answer. For this reason I see the increasing adoption of the OSGi modularity system as far more significant than recent Java developments or even the emergence of languages like Scala.

Nor is resource ‘virtualisation’ relevant in achieving this goal. Virtualisation is an orthogonal and secondary concern! If you need to partition physical resource – by all means use virtual machines. If you need to partition a data-centre, pursue an SDC (Software Defined Data Centre) strategy. But tread carefully! These solutions do not address your fundamental issues and risk the introduction of yet another complex, tightly coupled management layer.

That’s all for today!

If you are interested in further detail on Service Fabric concepts: see Paremus Service Fabric Concepts and Terminology.

Reflections on GigaOM ‘Structure’ Event…

I’d like to start by thanking the GigaOM team once again for inviting me to participate in the panel session: ‘The Gap Between Applications and Infrastructure’. It is the first time I’ve attended a GigaOM ‘Structure’ event, and it proved a worthwhile experience.

There were many topics, but the recurring theme focused upon the on-going shift towards ‘Cloud Computing’:

  • Cloud ‘Business Models’: How ‘Agile’ organisations should off-load non-critical IT.
  • SaaS Use Cases: Again selling the concept of ‘Agile’ SaaS services for all those ‘Agile’ organisations.
  • Cloud APIs – which is more open? Which is closest to Amazon? The next great lock-in?
  • The role of the current dominant Cloud and Virtual Machine vendors and the rise of ‘Software defined Data Centres’.

All interesting topics. Yet I left the event further convinced that the  IT industry continues to be too ‘fashion driven’. Too much evangelism, too little substance, a near criminal overuse of the word ‘Agile’. Beneath the sound-bites, an absence of vision from the dominant Cloud and Virtual Machine vendors.

However there were some important gems.

The analogy between data and mass was touched upon in one panel session: this is a fundamental, real-world constraint. Quite simply, the more data you amass, the more energy / cost is required to manage it and move it. Frequently information has a time-to-live and only has value within a specific context. Hence aggregating vast amounts of data to drive centralised decision making processes is not necessarily the greatest of ideas: worth considering before embarking on your company’s next ‘must have’ BigData project. For a good popular read which touches upon this area – try ‘Adapt‘.

So what does this imply for ‘BigData’ / ‘Cloud’ convergence?

  • If information can be extracted from data in flight, then do so! When possible use CEP rather than Hadoop!
  • It will frequently make sense to move processing to the data: not data to the processing.
  • Sometimes there is no alternative. A centralised approach which aggregates data from remote sources may be the only approach.
  • Analysis must be on a system by system basis. Data may flow from the Core to the Edge; or the data flow / interaction may oscillate between core and edge.

It seems inevitable that data locality will be a primary forcing factor which will drive the shape of next generation ‘Data Cloud‘ solutions.

A second panel session, ever so briefly, touched on the importance of system modularity. In an animated, amusing and good natured argument between three Cloud Infrastructure vendors, attempts were made to score points via a modularity argument. One of the solutions consisted of multiple independent sub-projects rather than a monolithic whole. Unfortunately that is as far as the argument went; the fundamental importance of modularity at all structural levels was not discussed. Meanwhile the Java vendors who were branded ‘monolithic’ didn’t realise that the most powerful modularisation framework in existence – the OSGi framework – could have provided them, if they were using it, with a devastating response to the criticism.

‘times they are a-Changin’

Back-stage conversations were more interesting. There was an increasing awareness of an imminent inflection point in the industry. Environmental complexity is rapidly increasing as business systems evolve towards an increasingly intertwined ecosystem of services, resources and highly modular / maintainable components. It was increasingly understood that in this new world order, runtime dependency management would be an essential enabler.

The idea that evolvable, adaptive Cloud platforms will be highly modular and must have powerful runtime dependency management – so enabling:

  • Applications to be dynamically assembled as required from fine grained components.
  • In a manner influenced by the characteristics of the local host environment
  • With middleware services provisioned as required.
is not quite as alien as when Paremus first started work upon this vision in 2005.

Service Fabric 1.8 release

As we’re about to release Service Fabric 1.8, the next few blogs will focus on some of the new capabilities introduced with 1.8, and we’ll also re-visit some of the underlying ideas which have always underpinned the Service Fabric.

Why modularity matters more than virtualization.

Ten years ago it all seemed so simple! Increase utilization of existing compute resource by hosting multiple virtual machines per physical platform, so consolidating applications onto fewer physical machines. As the virtual machine ‘shields’ its hosted application from the underlying physical environment, this is achieved without changes to the application. As applications may now move runtime location without re-configuration, the idea of virtual machine based ‘Cloud Computing’ was inevitable.

However, there are downsides.

Virtual machine image sprawl is now a well known phrase. If the virtual machine image is the unit of deployment, any software upgrade or configuration change, no matter how small, generates a new image. With a typical size of ~1 Gbyte (see table 2 – http://www.ssrc.ucsc.edu/Papers/ssrctr-10-01.pdf) – this soon adds up! Large virtual environments rapidly consume expensive on-line and off-line data storage resource. This in turn has driven the use of de-duplication technologies; so increasing storage cost and / or operational complexity.

Once constructed, virtual machine images must be propagated, perhaps many times across the network, to the physical hosts. A small configuration change that results in a new virtual machine image, deployed to many nodes, can generate hundreds of Gbytes of network traffic.
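To put rough, purely illustrative numbers on this: a one-line configuration change baked into a ~1 Gbyte image that must be pushed to 200 hosts generates ~200 Gbytes of network traffic; the equivalent modular update might be a single re-deployed bundle of a few hundred Kbytes.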

When used as the unit of application deployment, virtualization increases operational complexity and increases the consumption of expensive physical network and storage resources: both of which are, ironically, probably more expensive than the compute resource whose utilization virtualization is attempting to optimize.

We’re not finished!

  • Some categories of application simply cannot be de-coupled from the physical environment. Network latency is NOT zero, network bandwidth is NOT infinite and locality of data DOES matter.
  • Virtualization complicates and obscures runtime dependencies. If a physical node fails, which virtual machines were lost? More importantly, which services were lost, which business applications were operationally impacted? Companies are now building monitoring systems that attempt to unravel these questions: further operational band-aids!
  • Centralized VM management solutions introduce new and operationally significant points of failure.
  • As the operational complexity of virtual environments is higher than that of their physical predecessors, there is an increased likelihood of catastrophic cascading failure caused by simple human error.

Feeling comfortable with your virtualization strategy?

For all these reasons, the idea of re-balancing critical production loads by dynamically migrating virtual machine images is, I suggest, a popular Marketing Myth. Many analysts, software vendors, investors and end users continue to see virtualization as the ultimate silver bullet. They are, I believe, deluded.

The move to the ‘virtual enterprise’ has not been without significant cost. The move to the ‘virtual enterprise’ has not addressed fundamental IT issues. Nor will moving to public or private Cloud solutions based on virtualization.

And so the Story Evolves

Acknowledging these issues, a discernible trend has started in the Cloud Computing community. Increasingly the virtual machine image is rejected as the deployment artifact. Rather:

  • Virtual machines are used to partition physical resource.
  • Software is dynamically installed and configured.
  • In more sophisticated solutions, each resource target has a local agent which can act upon an installation command. This agent is able to:
    • Resolve runtime installation dependencies implied by the install command.
    • Download only the required software artifacts.
    • Install, configure and start required ‘services’.
  • Should subsequent re-configure or update commands be received, the agent will only download the changed software components and / or re-configure artifacts that are already cached locally.

Sort of makes sense, doesn’t it!?
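For the Java / OSGi case the essence of such an agent is small. A minimal sketch against the standard OSGi framework API (the locations are invented; real agents add resolution, caching and configuration on top):

    import org.osgi.framework.Bundle;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.BundleException;

    public class InstallAgent {

        private final BundleContext context;

        public InstallAgent(BundleContext context) {
            this.context = context;
        }

        // Install (or update) a single artifact, then start it.
        public void apply(String location) throws BundleException {
            Bundle existing = findByLocation(location);
            if (existing == null) {
                context.installBundle(location).start();   // new artifact: fetch and start
            } else {
                existing.update();                         // changed artifact: refresh in place
                existing.start();
            }
        }

        private Bundle findByLocation(String location) {
            for (Bundle bundle : context.getBundles()) {
                if (location.equals(bundle.getLocation())) {
                    return bundle;
                }
            }
            return null;
        }
    }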

The Elephant in the Room

Dynamic deployment and configuration of software artifacts certainly makes more sense than pushing around virtual machine images. But have we actually addressed the fundamental issues that organisations face?

Not really.

As I’ve referenced on many occasions, Gartner research indicates that software maintenance dominates IT OPEX (http://www.soasymposium.com/pdf_berlin/Anne_Thomas_Manes_Proving_the.pdf). In comparison, hardware costs are only ~10% of this OPEX figure.

“Our virtual cloud strategy sounds awesome: but what are the business benefits again??”

To put this into perspective; a large organisation’s annual IT OPEX may be ~$2 billion. Gartner’s research implies that, of this, $1.6 billion will be concerned with the management and maintenance of legacy applications. Indeed, one organization recently explained that each line of code changed in an application generated a downstream cost of >$1 million!

The issue isn’t resolved by virtualisation, nor Cloud. Indeed, software vendors, IT end users, IT investors and IT industry analysts have spent the last decade trying to optimize an increasingly insignificant part of the OPEX equation; while at the same time ignoring the elephant in the room.

Modular Systems are Maintainable Systems

If one is to address application maintainability – then modularity is THE fundamental requirement.

Luckily for organizations that are predominantly Java based, help is at hand in the form of OSGi. OSGi specifications and corresponding OSGi implementations provide the industry standards upon which an organisation can begin to modularise their portfolio of in-house Java applications, thereby containing the on-going cost of application maintenance. For further detail on the business benefits of OSGi based business systems, see http://www.osgi.org/wiki/uploads/Links/OSGiAndTheEnterpriseBusinessWhitepaper.pdf.

But what are the essential characteristics of a ‘modular Cloud runtime’: characteristics that will ensure a successful OSGi strategy? These may be simply deduced from the following principles:

  • The unit of maintenance and the unit of re-use are the same as the unit of deployment. Hence the unit of deployment should be the ‘bundle’.
  • Modularity reduces application maintenance for developers. However, this must not be at the expense of increasing runtime complexity for operations. The unit of operational management should be the ‘business system’.

Aren’t these requirements inconsistent? No, not if the ‘business system’ is dynamically assembled from the required ‘bundles’ at runtime. Operations deploy, re-configure, scale and update ‘business systems’. The runtime internally maps these activities to the deployment and re-configuration of the required OSGi bundles.

Simple.
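As a purely illustrative sketch of what ‘dynamically assembled’ can look like in practice (bndtools-style syntax; the bundle name is invented, and this is not the Service Fabric’s own descriptor format): operations declare the top-level requirement for the ‘business system’, and the resolver computes and deploys the supporting bundle closure.

    -runrequires: osgi.identity;filter:='(osgi.identity=com.example.payments.system)'
    -runfw: org.apache.felix.framework
    -runee: JavaSE-1.8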

In addition to these essential characteristics:

  • We would still like to leverage the resource partitioning capabilities of virtual machines. But the virtual machine image is no longer the unit of application deployment. As the runtime dynamically manages the mapping of services to virtual and physical resources, operations need no longer be concerned with this level of detail. From an operational perspective, it is sufficient to know that the ‘business system’ is functional and meeting its SLA.
  • Finally, it takes time to achieve a modular enterprise. It would be great if the runtime supported traditional software artifacts including WARs, simple POJO deployments and even non-Java artifacts!

Are there any runtime solutions that have such characteristics? Yes, one: the Paremus Service Fabric. A modular Cloud runtime, designed from the ground up using OSGi, for OSGi based ‘business systems’. The Service Fabric’s unique adaptive, agile and self-assembling runtime behaviours minimise operational management whilst increasing service robustness. To get you started, the Service Fabric also supports non OSGi artefacts.

A final note: even Paremus occasionally bends to IT fashion :-/ Our imminent Service Fabric 1.8 release will support deployment of virtual machine images: though if you are reading this blog hopefully you will not be too interested in using that capability!

Breaking the Karmic Cycle

Paremus News

As you may have guessed from the syndication of Neil Bartlett’s blog, I’m very pleased to announce that Neil has joined Paremus and will be working on a number of interesting OSGi & Service Fabric based projects. Some of this work will appear in our imminent Service Fabric 1.8 release; but more on that in due course. In addition to Service Fabric and Nimble related work, Neil will be spending time advancing the open source BNDTools project, and some interesting new capabilities are already in the pipe-line for the next release.

On a different note: whether you are a battle-hardened ‘OSGi master’, an enthusiastic ‘OSGi initiate’, or just curious, you’ll be interested in the OSGi Community Event in Darmstadt in September. Paremus will be attending this year: I’ll be presenting on Cloud and OSGi, explaining the importance of modularity and why the current generation of virtual machine based Cloud solutions have got it wrong! Should be fun and controversial – so come and heckle 😉

 

Breaking the Cycle

Whilst writing this post I came to the realisation that I was re-iterating a message I posted in 2008: see Impaled on the Horns of an OPEX Dilemma.

Perhaps, if repeated frequently enough, the message concerning the fundamental importance of environmental modularity will seed, grow and be heard over the general cacophony relating to the latest technology and software fads.

With that hope –  once more…

As we  head towards the latest global economic downturn we see the all too predictable response from many organisations whose core business functions are heavily dependent upon technology.

The response goes something like this:

  • We have to deal with spiralling OPEX costs; the dominant component of which is ongoing application maintenance.
  • We have to do something now!
  • If we reduce the number of applications, clearly we reduce our maintenance burden!
  • We can also reduce headcount by paring down local teams and off-shoring support.

These slash and burn programmes are subsequently implemented with varying degrees of success; BUT IN ALL CASES the remaining application portfolio is just as difficult to manage, just as difficult to bug-fix and just as difficult to functionally enhance as ever. Perhaps more so, as during the cost cutting exercise, with each round of redundancies, critical knowledge concerning the environment flows away from the organisation.

Business Agility and Service Availability Suffer

And then we have the economic upturn!

The business demands new functionality – NOW! The IT department cannot adapt or change the existing business systems: they are too monolithic, too high risk, and core expertise has been lost. And so a new generation of applications is created to meet the new business requirements. Developer and Operations headcounts swell. The business systems are ever more complex, less flexible and OPEX is even higher; yet the business doesn’t care: it is making profit!

That is until the next down-turn.

Repeat this short term cyclic behaviour for ~15 to 20 years and you’ll end up with a fundamentally broken organisation.

Maintainable Systems are Modular Systems

There is a better way!

  • Acceptance: Realise that agile and cost effective environments take time to create, and will require some fundamental changes.
  • Enlightenment: Ignore industry fads (cloud, virtualisation, programming language of the day); maintainability is only achieved through ensuring modularity and agility at each layer of your environment.
  • If your business systems are Java based – then you are in luck. OSGi technology provides – if not the elusive silver bullet – the next best thing; an industry backed standards framework for modularity from which one can begin to realise these objectives.
  • In addition to these industry standards – you are going to need some help – so call us!
  • You need to start sometime; so may as well be today!

It may be unpalatable, but if organisations had implemented a long term modularisation strategy in 2008, those same organisations would be well placed today, realising substantive and sustainable OPEX savings.

Wonder if I’ll be repeating the same message in 2015?

EclipseCon 2011

The recent EclipseCon 2011 was the 4th consecutive EclipseCon conference in Santa Clara that we at Paremus have attended; and from my perspective, it was the most exciting yet. One need look no further than Peter Kriens’ Introduction to OSGi session to realise that the Eclipse community’s interest in OSGi continues its rapid growth.

Despite the thundering music (Thus Spake Zarathustra) from the CDO folks next door, which some assumed must be associated with my presentation, my talk on Cloud & OSGi was well attended and seemed to be well received; at least by the individuals that subsequently approached me to discuss this area in some depth.

The OSGi Alliance BoF was well attended with some interesting updates on the 4.3 release and ongoing work in the EEG group; the session concluded on an amusing note with some evil OSGi puzzles concocted by Peter Kriens and BJ Hargrave.

Whilst it was a shame that Neil Bartlett wasn’t able to attend this year, it was great to see Peter Kriens and David Savage rolling up their sleeves (metaphorically speaking) and walking interested parties through BNDtools and SIGIL tooling capabilities; explaining how these features will be combined in the very near future to create a powerful OSGi tooling solution.

For those of you that are interested, copies of the Paremus presentations (including the screen casts), will be posted here in the near future.

 

OSGi? With Nimble? Yes Please!

You’ll need more than logic to persuade people of your case

For believers in rationality, the modern world is often a frustrating and bewildering place.

New Scientist – 10 November 2010

Nevertheless, we keep trying.

Me.

Yesterday Paremus jointly announced with MakeWave the ‘Nimble Distribution‘. I’d like to take this opportunity to thank both the Paremus and MakeWave teams for all the effort and late nights that have gone into making this happen.

Paremus will continue to actively develop Nimble capabilities throughout 2011; with remote services being one of the areas that will receive ongoing attention. To track these developments consider signing up to the Nimble Forum. This will be on-air over the Christmas holidays. All feedback and suggestions are most welcome, so don’t be shy.

However, perhaps of greater importance is the commercial aspect of the Nimble announcement.

For those organisations that understand:

  • The medium term efficiencies and transformative business value gained from modularisation and dynamic system assembly.
  • Their business requirements cannot be met by a WAR file deployed to Tomcat (Sigh).
  • The necessity of strong industry based standards (OSGi) shepherded by a strong democratic standards body (OSGi Alliance).
  • The value of high quality product ready implementations and high quality support.

We hope that the Paremus / MakeWave announcement provides a compelling proposition: a high quality, elegant, agile and operationally simple OSGi runtime, bundled with high quality commercial support, tailored for the most demanding of business requirements and environments.

Finally, complementing our ongoing Nimble & Service Fabric activities; Paremus will be working closely with Neil Bartlett and other BNDtools contributors through 2011 to ensure that OSGi has the highest quality tooling support possible.

In the meantime…

Seasonal Greetings & a Nimble 2011 to you all 😉

Richard