Agility and Structural Modularity – part II

In this second Agility and Structural Modularity post we explore the importance of OSGi™; the central role that OSGi plays in realising Java™ structural modularity and the natural synergy between OSGi and the aims of popular Agile methodologies.

But we are already Modular!

Most developers appreciate that applications should be modular. However, whereas the need for logical modularity was rapidly embraced in the early years of Object Orientated programming (see http://en.wikipedia.org/wiki/Design_Patterns), it has taken significantly longer for the software industry to appreciate the importance of structural modularity; especially the fundamental importance of structural modularity with respect to increasing application maintainability and controlling / reducing  environmental complexity.

Just a Bunch of JARs

In Java Application Architecture, Kirk Knoernschild explores structural modularity and develops a set of best practice structural design patterns. As Knoernschild explains, no modularity framework is required to develop in a modular fashion; for Java the JAR is sufficient.cover-small-229x300

Indeed, it is not uncommon for ‘Agile’ development teams to break an application into a number of smaller JAR’s as the code-base grows. As JAR artifacts increase in size, they are broken down into collections of smaller JAR’s. From a code perspective, especially if Knoernschild’s structural design patterns have been followed, one would correctly conclude that – at one structural layer – the application is modular.

But is it ‘Agile’ ?

From the perspective of the team that created the application, and who are subsequently responsible for its on-going maintenance, the application is more Agile. The team understand the dependencies and the impact of change. However, this knowledge is not explicitly associated with the components. Should team members leave the company, the application and the business are immediately compromised. Also, for a third party (e.g. a different team within the same organisation), the application may as well have remained a monolithic code-base.

While the application has one layer of structural modularity – it is not self-describing. The metadata that describes the inter-relationship between the components is absent; the resultant business system is intrinsically fragile.

What about Maven?

Maven artifacts (Project Object Model – POM) also express dependencies between components. These dependencies are expressed in-terms of the component names.

A Maven based modular application can be simply assembled by any third party. However, as we already know from the first post in this series, the value of name based dependencies is severely limited. As the dependencies between the components are not expressed in terms of Requirements and Capabilities,  third parties are unable to deduce why the dependencies exist and what might be substitutable.

It is debatable whether Maven makes any additional tangible contribution to our goal of application Agility.

The need for OSGi

As Knoernschild demonstrates in his book Java Application Architecture, once structural modularity is achieved, it is trivially easy to move to OSGi – the modularity standard for Java. 

Not only does OSGi help us enforce structural modularity, it provides the necessary metadata to ensure that the Modular Structures we create are also Agile structures

OSGi expresses dependencies in terms of Requirements and Capabilities. It is therefore immediately apparent to a third party which components may be interchanged. As OSGi also uses semantic versioning, it is immediately apparent to a third party whether a change to a component is potentially a breaking change.

OSGi also has a key part to play with respect to structural hierarchy.

At one end of the modularity spectrum we have Service Oriented Architectures, at  the other end of the spectrum we have Java Packages and Classes. However, as explained by Knoernschild, essential layers are missing between these two extremes.

diag6

Figure 1: Structural Hierarchy: The Missing Middle (Kirk Knoernschild – 2012).

The problem, this missing middle, is directly addressed by OSGi.

diag7

Figure 2: Structural Hierarchy: OSGi Services and Bundles

As explained by Knoernschild the modularity layers provided by OSGi address a number of critical considerations:

  • Code Re-Use: Via the concept of the OSGi Bundle, OSGi enables code re-use.
  • Unit of Intra / Inter Process Re-Use: OSGi Services are light-weight Services that are able to dynamically find and bind to each other. OSGi Services may be collocated within the same JVM, or via use of an implementation of OSGi’s remote service specification, distributed across JVM’s separated by a network. Coarse grained business applications may be composed from a number of finer grained OSGi Services.
  • Unit of Deployment: OSGi bundles provide the basis for a natural unit of deployment, update & patch.
  • Unit of Composition: OSGi bundles and Services are essential elements in the composition hierarchy.

Hence OSGi bundles and services, backed by OSGi Alliance’s open specifications, provide Java with essential – and previously missing – layers of structural modularity. In principle, OSGi technologies enable Java based business systems to be ‘Agile – All the Way Down!’.

As we will now see, the OSGi structures (bundles and services) map well to, and help enable, popular Agile Methodologies.

Embracing Agile

The Agile Movement focuses on the ‘Processes’ required to achieve Agile product development and delivery. While a spectrum of Lean & Agile methodologies exist, each tends to be; a variant of, a blend of, or an extension to, the two best known methodologies; namely Scrum and Kanbanhttp://en.wikipedia.org/wiki/Lean_software_development.

To be effective each of these approaches requires some degree of structural modularity.

Scrum

Customers change their minds. Scrum acknowledges the existence of ‘requirement churn’ and adopts an empirical (http://en.wikipedia.org/wiki/Empirical) approach to software delivery. Accepting that the problem cannot be fully understood or defined up front. Scrum’s focus is instead on maximising the team’s ability to deliver quickly and respond to emerging requirements.

Scrum is an iterative and incremental process, with the ‘Sprint’ being the basic unit of development. Each Sprint is a “time-boxed” (http://en.wikipedia.org/wiki/Timeboxing) effort, i.e. it is restricted to a specific duration. The duration is fixed in advance for each Sprint and is normally between one week and one month. A Sprint is preceded by a planning meeting, where the tasks for the Sprint are identified and an estimated commitment for the Sprint goal is made. This is followed by a review or retrospective meeting, where the progress is reviewed and lessons for the next Sprint are identified.

During each Sprint, the team creates finished portions of a product. The set of features that go into a Sprint come from the product backlog, which is an ordered list of requirements (http://en.wikipedia.org/wiki/Requirement).

Scrum attempts to encourage the creation of self-organizing teams, typically by co-location of all team members, and verbal communication between all team members.

Kanban

‘Kanban’ originates from the Japanese word “signboard” and traces back to Toyota, the Japanese automobile manufacturer in the late 1940’s ( see http://en.wikipedia.org/wiki/Kanban ). Kanban encourages teams to have a shared understanding of work, workflow, process, and risk; so enabling the team to build a shared comprehension of a problems and suggest improvements which can be agreed by consensus.

From the perspective of structural modularity, Kanban’s focus on work-in-progress (WIP), limited pull and feedback are probably the most interesting aspects of the methodology:

  1. Work-In-Process (WIP) should be limited at each step of a multi-stage workflow. Work items are “pulled” to the next stage only when there is sufficient capacity within the local WIP limit.
  2. The flow of work through each workflow stage is monitored, measured and reported. By actively managing ‘flow’, the positive or negative impact of continuous, incremental and evolutionary changes to a System can be evaluated.

Hence Kanban encourages small continuous, incremental and evolutionary changes. As the degree of structural modularity increases, pull based flow rates also increase while each smaller artifact spends correspondingly less time in a WIP state.

 

An Agile Maturity Model

Both Scrum and Kanban’s objectives become easier to realize as the level of structural modularity increases. Fashioned after the Capability Maturity Model (see http://en.wikipedia.org/wiki/Capability_Maturity_Model – which allows organisations or projects to measure the improvements on a software development process), the Modularity Maturity Model is an attempt to describe how far along the modularity path an organisation or project might be; this proposed by Dr Graham Charters at the OSGi Community Event 2011. We now extend this concept further, mapping an organisation’s level of Modularity Maturity to its Agility.

Keeping in step with the Modularity Maturity Model we refer to the following six levels.

Ad Hoc – No formal modularity exists. Dependencies are unknown. Java applications have no, or limited, structure. In such environments it is likely that Agile Management Processes will fail to realise business objectives.

Modules – Instead of classes (or JARs of classes), named modules are used with explicit versioning. Dependencies are expressed in terms of module identity (including version). Maven, Ivy and RPM are examples of modularity solutions where dependencies are managed by versioned identities. Organizations will usually have some form of artifact repository; however the value is compromised by the fact that the artifacts are not self-describing in terms of their Capabilities and Requirements.

This level of modularity is perhaps typical for many of today’s in-house development teams. Agile processes such are Scrum are possible, and do deliver some business benefit. However ultimately the effectiveness & scalability of the Scrum management processes remain limited by deficiencies in structural modularity; for example Requirements and Capabilities between the Modules usually being verbally communicated. The ability to realize Continuous Integration (CI) is again limited by ill-defined structural dependencies.

Modularity – Module identity is not the same as true modularity. As we’ve seen Module dependencies should be expressed via contracts (i.e. Capabilities and Requirements), not via artifact names. At this point, dependency resolution of Capabilities and Requirements becomes the basis of a dynamic software construction mechanism. At this level of structural modularity dependencies will also be semantically versioned.

With the adoption of a modularity framework like OSGi the scalability issues associated with the Scrum process are addressed. By enforcing encapsulation and defining dependencies in terms of Capabilities and Requirements, OSGi enables many small development teams to efficiently work independently and in parallel. The efficiency of Scrum management processes correspondingly increases. Sprints can be clearly associated with one or more well defined structural entities i.e. development or refactoring of OSGi bundles. Meanwhile Semantic versioning enables the impact of refactoring is efficiently communicated across team boundaries. As the OSGi bundle provides strong modularity and isolation, parallel teams can safely Sprint on different structural areas of the same application.

Services – Services-based collaboration hides the construction details of services from the users of those services; so allowing clients to be decoupled from the implementations of the providers. Hence, Services encourage loose-coupling. OSGi Services‘ dynamic find and bind behaviours directly enable loose-coupling, enabling the dynamic formation, or assembly of, composite applications. Perhaps of greater import, Services are the basis upon which runtime Agility may be realised; including rapid enhancements to business functionality, or automatic adaption to environmental changes.

Having achieved this level of structural modularity an organization may simply and naturally apply Kanban principles and achieve the objective of Continuous Integration.

Devolution – Artifact ownership is devolved to modularity-aware repositories which encourage collaboration and enable governance. Assets may selected on their stated Capabilities. Advantages include:

  • Greater awareness of existing modules
  • Reduced duplication and increased quality
  • Collaboration and empowerment
  • Quality and operational control

As software artifacts are described in terms of a coherent set of Requirements and Capabilities, developers can communicate changes (breaking and non-breaking) to third parties through the use of semantic versioning. Devolution allows development teams to rapidly find third-party artifacts that meet their Requirements. Hence Devolution enables significantly flexibility with respect to how artifacts are created, allowing distributed parties to interact in a more effective and efficient manner. Artifacts may be produced by other teams within the same organization, or consumed from external third parties. The Devolution stage promotes code re-use and efficient, low risk, out-sourcing, crowd-sourcing, in-sources of the artifact creation process.

Dynamism This level builds upon Modularity, Services & Devolution and is the culminatation of our Agile journey.

  • Business applications are rapidly assembled from modular components.
  • As strong structural modularity is enforced (isolation by the OSGi bundle boundary),  components may be efficiently and effectively created and maintained by a number of small – on-shore, near-shore or off-shore developement teams.
  • As each application is self-describing, even the most sophisticated of business systems is simple to understand, to maintain, to enhance.
  • As semantic versioning is used; the impact of change is efficiently communicated to all interested parties, including Governance & Change Control processes.
  • Software fixes may be hot-deployed into production – without the need to restart the business system.
  • Application capabilities may be rapidly extended applied, also without needing to restart the business system.

Finally, as the dynamic assembly process is aware of the Capabilities of the hosting runtime environment, application structure and behavior may automatically adapt to location; allowing transparent deployment and optimization for public Cloud or traditional private datacentre environments.

diag8

Figure 3: Modularity Maturity Model

An organization’s Modularisation Migration strategy will be defined by the approach taken to traversing these Modularity levels. Mosts organizations will have already moved from an initial Ad- Hoc phase to Modules. Meanwhile organizations that value a high degree of Agility will wish to reach the endpoint; i.e. Dynamism. Each organisation may traverse from Modules to Dynamism via several paths; adapting migration strategy as necessary.

  • To achieve maximum benefit as soon as possible; an organization may choose to move directly to Modularity by refactor the existing code base into OSGi bundles. The benefits of Devolution and Services naturally follow. This is also the obvious strategy for new greenfield applications.
  • For legacy applications an alternative may be to pursue a Services first approach; first expressing coarse grained software components as OSGi Services; then driving code level modularity (i.e. OSGi bundles) on a Service by Service basis. This approach may be easier to initiate within large organizations with extensive legacy environments.
  • Finally, one might move first to limited Devolution by adoption OSGi metadata for existing artifacts. Adoption of Requirements and Capabilities, and the use of semantic versioning, will clarify the existing structure and impact of change to third parties. While structural modularity has not increased, the move to Devolution positions the organisation for subsequent migration to the Modularity and Services levels.

diverse set of choices and the ability to pursue these choices as appropriate, is exactly what one would hope for, expect from, an increasingly Agile environment!

Agility and Structural Modularity – part I

Introduction

Agile development methodologies are increasingly popular. Yet most ‘Agile’ experts and analysts discuss agility in isolation.  This oversight is surprising given that ‘Agility’ is an emergent characteristic; this meaning a property of the underlying entity. For an entity to be ‘Agile’ it must have a high degree of structural modularity.

Perhaps as a result of this, many organisations attempt to invest in ‘Agile’ processes without ever considering the structure of their applications. Alongside the question, ‘How might one realise an Agile system?’, one must also ask, ‘How might one build systems with high degrees of structural modularity?’.

We start this series of blog articles by exploring the relationship between structural modularity and agility.

 

Structure, Modularity & Agility 

Business Managers and Application Developers face many of the same fundamental challenges. Whether a business, or a software application serving a business, the entity must be cost effective to create and maintain. If the entity is to endure, it must also be able to rapidly adapt to unforeseen changes in a cost effective manner.

If we hope to effectively manage a System, we must first understand the System. Once we understand a System, manageable Change and directed Evolution are possible.

Yet we do not need to understand all of the fundamental constituents of the System; we only need to understand the relevant attributes and behaviors for the level of the hierarchy we are responsible for managing.

Services should be Opaque

From an external perspective, we are interested in the exposed behavior; the type of Service provided, and the properties of that Service. For example is the Service reliable? Is it competitively priced relative to alternative options?

 

diag1

Figure 1: A consumer of a Service.

As a consumer of the Service I have no interest in how these characteristics are achieved. I am only interested in the advertised Capabilities, which may or may not meet my Requirements.

To Manage I need to understand Structure

Unlike the consumer, the implementation of the Service is of fundamental importance to the Service provider. To achieve an understanding, we create a conceptual model by breaking the System responsible for providing the Service into a set of smaller interconnected pieces. This graph of components may represent an ‘Organization Chart’ , if the entity is a business, or a mapping of the components used,  if the entity is a software application.

A first simple attempt to understand our abstract System is shown below.

 

diag2Figure 2: The Service provider / System Maintainer

From this simple representation we immediately know the following:

  • The System is composed of 15 Components.
  • The names of the Components.
  • The dependencies that exist between these Components; though we don’t know why those dependencies exist.
  • While we do not know the responsibilities of the individual Components; from the degree on inter-connectedness, we can infer that component ‘Tom’ is probably more important than ‘Dick’.

It is important to note that, we may not have created these Components, we may have no understanding of their internal construction. Just as the consumers of our Service are interested in the Capabilities offered, we, as a consumer of these components, simply Require their Capabilities.

Requirements & Capabilities

At present, we have no idea why the dependencies exist between the Components, just that those dependencies exist. Also, this is a time independent view. What about change over time?

One might initially resort to using versions or version ranges with the named entities; changes in the structure indicated by version changes on the constituents. However, as shown in figure 3, versioned names, while indicating change, fail to explain why Susan 1.0 can work with Tom 2.1, but Susan 2.0 cannot!

Why is this?

diag3

Figure 3: How do we track structural change over time? The earlier System functioned correctly; the later System – with an upgraded Component – fails. Why is this?

It is only when we look at the Capabilities and Requirements of the entities involved that we understand the issue. Tom 2.1 Requires a Manager Capability, a capability that can be provided by Susan 1.0. However, at the later point in time  Susan 2.0, having reflected upon her career, decided to retrain. Susan 2.0  now no longer advertises a Manager Capability, but instead advertises a  Plumber 1.0 Capability.

This simple illustration demonstrates that dependencies need to be expressed in terms of Requirements and Capabilities of the participating entities and not their names.

These descriptions should also be intrinsic to the entities; i.e. components should be self-describing.

diag4

Figure 4: An Organizational Structure: Defined in terms of Capabilities & Requirements with the use of Semantic versioning.

As shown, we can completely describe the System in terms of Requirements and Capabilities, without referencing specific named entities.

Evolution and the role of Semantic Versioning

Capabilities and Requirements are now the primary means via which we understand the structure of our System. However we are still left with the problem of understanding change over time.

  • In an organization chart; to what degree are the dependencies still valid if an employee is promoted (Capabilities enhanced)?
  • In a graph of interconnected software components; to what degree are the dependencies still valid if we refactor one of the components (changing / not changing a public interface)?

By applying simple versioning we can see that changes have occurred; however we do not understand the impact of these changes. However, if instead of simple versioning, semantic versioning is used (see http://www.osgi.org/wiki/uploads/Links/SemanticVersioning.pdf), the potential impact of a change can be communicated.

This is achieved in the following manner:

  • Capabilities are versioned with a major.minor.micro versioning scheme. In addition, we collectively agree that – minor or micro version changes represent non-breaking changes; e.g.  2.7.1 2.8.7. In contrast major version changes; e.g.  2.7.1. 3.0.0. represent breaking changes which may affect the users of our component.
  • Requirements are now specified in terms of a range of acceptable Capabilities. Square brackets ‘[‘ and ‘]‘ are used to indicate inclusive and parentheses ‘(‘ and ‘)‘ to indicate exclusive. Hence a range [2.7.1, 3.0.0) means any Capability with version at or above  2.7.1 is acceptable up to, but not including 3.0.0.

Using this approach we can see that if Joe is substituted for Helen, Tom’s Requirements are still met. However Harry, while having a Manager Capability, cannot meet Tom’s Requirements as Harry’s 1.7 skill set is outside of the  acceptable range for Tom i.e. [2,3).

Via the use of semantic versioning the impact of change can be communicated. Used in conjunction with Requirements and Capabilities we now have sufficient information to be able to substitute components while ensuring that all the structural dependencies continue to be met.

Our job is almost done. Our simple System is Agile & Maintainable!

 

Agile – All the Way Down

The final challenge concerns complexity. What happens when the size and sophistication of the System increases? An increased number of components and a large increase in inter-dependencies? The reader having already noticed a degree of self-similarity arising in the previous examples may have already guessed the answer.

The Consumer of our Service selected our Service because the advertised Capabilities met the consumers Requirements (see figure 1). The implementation of the System which provided this Service is masked from the consumer. This pattern is once again repeated one layer down. The System’s structure is itself described in-terms of the Capabilities and Requirements of the participating components (see figure 4). This time, the internal structure of the components are masked from the System. As shown in figure 5; this pattern may be re-repeated at many logical layers.

diag5

Figure 5: An Agile Hierarchy: Each layer only exposes the necessary information. Each layer is composite with the dependencies between the participating components expressed in-terms of their Requirements and Capabilities.

All truly Agile systems are built this way, consisting of a hierarchy of structural layers. Within each structural layer the components are self-describing: self-describing in terms of information relevant to that layer, with unnecessary detail from lower layers masked.

This pattern is repeated again and again throughout natural and man-made systems. Natural ecosystems build massive structures from nested hierarchies of modular components:

  • The Organism
  • The Organ
  • The Tissue
  • The Cell

For good reason, commercial organizations attempt the same structures:

  • The Organization
  • The Division
  • The Team
  • The Individual

Hence we might expect a complex Agile software systems to also mirror these best practices:

  • The Business Service
  • Coarse grained business components.
  • Fine grained micro-Services.
  • Code level modularity.

This process started in the mid/late 1990‘s as organizations started to adopt coarse grain modularity as embodied by Service Oriented Architectures (SOA) and Enterprise Service Buses (ESB’s). These approaches allowed business applications to be loosely coupled; interacting via well defined service interfaces or message types. SOA advocates promised more ‘Agile’ IT environments as business systems would be easier to upgrade and/or replace.

However, in many cases the core applications never actually changed. Rather the existing application interfaces were simply exposed as SOA Services. When viewed in this light it is not surprising that SOA failed to deliver the promised cost savings and business agility: http://apsblog.burtongroup.com/2009/01/soa-is-dead-long-live-services.html.

Because of the lack of internal modularity, each post-SOA application was as inflexible as its pre-SOA predecessor.

 

To be Agile?

We conclude this section with a brief summary of the arguments developed so far.

To be ‘Agile’ a System will exhibit the following characteristics:

  • A Hierarchical Structure: The System will be hierarchical. Each layer composed from components from the next lower layer.
  • Isolation: For each structural layer; strong isolation will ensure that the internal composition of each participating component will be masked.
  • Abstraction: For each layer; the behavior of participating components is exposed via stated Requirements and Capabilities.
  • Self-Describing: Within each layer the relationship between the participating components will be self-describing; i.e. dependencies will be defined in terms of published Requirements and Capabilities.
  • Impact of Change: Via semantic versioning the impact of a change on dependencies can be expressed.

Systems built upon these principles are:

  • Understandable: The System’s structure may be understood at each layer in the structural hierarchy.
  • Adaptable: At each layer in the hierarchy, structural modularity ensures that changes remains localized to the affect components; the boundaries created by strong structural modularity shielding the rest of the System from these changes.
  • Evolvable: Components within each layer may be substituted; the System supports diversity and is therefore evolvable.

The System achieves Agility through structural modularity.

In the next post in this series we will discover how OSGi™ – the Java™ Modularity framework – meets the requirements of structure modularity, and thereby provides the necessary foundations for popular Agile Methodologies and ultimately, Agile businesses.

AntiFragile

 

The IT ‘fashion’ industry continues to drive us towards highly fragile environments. A fundamental change is needed and is long overdue.

The theme I continue to labour is that as IT becomes increasingly distributed and interconnected, we simply cannot continue down the same old software engineering path: see my recent posts including ‘Reality is Wondrously Complex’‘Cloud the Wasted Decade’‘Complex Systems and Failure’‘Why Modularity Matters more than  Virtualisation’.

9781846141560So I guess I am more than predisposed than most to agree with AntiFragile‘s. central message. Systems are either:

  • Fragile – their capabilities degrade with random change / environmental volatility.
  • Robust – their capabilities are immune to random change / environmental volatility.
  • AntiFragile – their capabilities improve courtesy of random change / environmental volatility.

Nassim Taleb (the author) correctly argues (IMO) that resource optimisation and prescriptive processes are hallmarks of: fragile systems, fragile environments and fragile organisations. Unfortunately most organisations’ current ‘private Cloud’ and / or ‘Virtualisation’ strategies are justified by – yes you guessed it – resource optimisation and prescriptive / simplistic / automation. There is every indication that the industries next ‘big’ fashion –  the ‘Software Defined Data Centre’ – will compound these errors.

To deal with the unforeseen / unforeseeable / a system needs spare resource; and as the unforeseen event is – well unforeseen – there will be no pre-determined prescriptive response.

In stark contrast AntiFragile systems incorporate a large degree of ‘optionality’; meaning the ability to exercise and re-exercise the most attractive option at each point in time.

By trying to understand why biological systems were adaptive and robust whereas software systems were / are / not; Paremus started mapping such concepts to software systems in 2003. As hoped for, this resulted in a set of basic principles which have continued to guide our efforts:

  • Robust software platforms must be based upon dynamic resource and service discovery, and subsequent rediscovery. Why? Things Change.
  • The platform should continually assess the runtime structures against the desired goal. How close is the current structure to this goal? How can the difference be further reduced / removed? Why? Things Change.
  • All runtime units must be loosely coupled. Why? Things change!
  • Finally, and most importantly. A platform should strive to be AntiFragile; to be able to harness new runtime ‘Optionality’. To achieve this the platform MUST be modular – it MUST be dynamically assembled from self-describing units – it MUST  leverage emergence and environmental feedback.

But I digress; back to the book.

Taleb rightly calls out the naivety of assuming that simple abstract mathematical models, i.e. naive reductionism, can be used to predict Complex System behaviour: this illusion helping prime both the Long Term Capital Markets meltdown in 1998 and the current – even more spectacular – banking crisis (see Fault Lines: How Hidden Fractures Still Threaten the World Economy).

Tabel also makes important observations concerning apparent causation. It is very difficult to predict which of the many adjacent possibilities may be chosen by a Complex System. An adjacent state will be ‘more fit’; but there is no guarantee that it is the ‘most fit’. However, looking backwards, a path of causality seems to exist and appears ‘obvious‘: i.e. ‘history‘ provides us with a simple narrative. Tabel also explains how history tends to be distorted (teleological interpretations) with the following example. Derivatives were traded (based on intuition / ‘gut’ ) long before the rocket scientists got involved. Yet documented history implies the inverse.

Taleb has clearly reached these insights through his own explorations; shaped by his experiences in the Financial Services industry. Yet while Taleb’s presentation is unique, the underlying themes have been previously explored in a number of different domains. In a recent post  ‘Complex Systems and Failure’ I briefly reviewed the book  ‘Adapt: Why Success Always Starts with Failure’. Here the author (Tim Harford) argues for diversity and the need for speculative experimentation and failure. Meanwhile, in ‘The Future of Money’ (2001 no less!), Bernard Lietaer argues, counter to single currency doctrines, that currency diversity is essential for achieving both stability and localised wealth creation. Stuart Kauffman in Investigations argues that Complex Systems evolve into adjacent possibilities. Kauffman suggests this behaviour explains the explosion in diversity seen in the natural biosphere; and over the last 3000 years the human economy. Meanwhile ‘Diversity and Complexity’ provides a quantitative exploration of why diversity is a fundamental requirement of robust, evolvable Systems. Finally, for those with a heavy theoretical bias – watch out for ‘Dynamics of Complex Systems: from Glasses to Evolution‘; which is due to be published this year.

I’m still reading AntiFragile, so I’ll reserve final judgement. However the concepts and arguments presented are important. There are some omissions. Taleb has not (so far) addressed the fundamental dilemma — my local AntiFragile behaviour today may result in our collective Fragile behaviour tomorrow; i.e. tragedy of the commons. To use one of Taleb’s analogies — rather than solely being concerned about winning a street fight, lets also work on the root issues that trigger such violence?!

Taleb does seem to like the street fighting analogy. Perhaps he’s a natural candidate for speaking at FITEclub in 2013 😉

UPDATE: Completed reading this evening. Contrary to the previous comment – in the closing section’s Taleb does directly address my concern – this via the ‘skin in the game‘ mechanism!

A comment w.r.t. scientific disciplines – unfortunately Physics isn’t quite the shinning beacon as portrayed. Fashions and employability are concerns even for theoretical physicists. If you want a career you were much more likely to succeed over the last decade if you work in String Theory! See Big Bang – or more seriously – The Trouble with Physics and/or Not Even Wrong). Luckily adherence to ‘String Theory’ (or ‘Loop Gravity’) is unlikely to cause the next economic collapse.

To conclude +1 for AntiFragile

Reality is Wondrously Complex

From the atoms in my coffee (Mmmm Union Roast – actually now my Harveys Elizabethan Ale 😉 — typos courtesy of the latter) to the interactions of individuals that collectively define the earths eco-system, and on a smaller scale human societies; reality is wondrously complex. To cope with this we create abstractions. The abstractions we create, and an understanding of the dependencies between these abstractions allow us – to some degree – make sense of reality. In some quite old posts (2008 no less!), I investigated the relationship between abstraction and complexity; see Complexity part I  & Complexity part II. Being a physicist by training I tend to regress back to this world view whenever I’m allowed ;-). However the arguments are generic and relevant to software and IT operational ‘complexity’.

Abstraction is not the same as virtualisation. Abstraction allows us to describe the essential characteristics of entities that we care about (coffee – ‘smooth’, ale – ‘malty’); without being troubled by their internal structures. Abstractions ‘simplify’. Abstractions encapsulate complexity. Virtualisation on the other hand attempts to create an alternative reality; one that is ideally as close as possible to the physical reality which it replaces and which also underpins it. As the purpose is to replicate reality, virtualisation does not encapsulate complexity.

As many now realise, having learn’t the hard way, virtualisation does not simplify.

Yet while we interpret the natural world through a self-consistent hierarchy of highly modular structural abstractions; we have been slow to adopt the structural abstractions required to address environmental complexity and the associated challenge of maintaining these environments.

Necessity is the Mother of Invention

The economic realities we face in 2013 will, I suggest, drive some long overdue changes. Fixation with virtualisation, and/or the latest and greatest ‘Cloud’ stack will wane, and organisations will regain some perspective. The ability to virtualise resource will remain a useful tool in the arsenal of enterprise technologies; but don’t expect too much of it. Virtualisation has failed to deliver the simplicity, operational savings and business agility promised by vendor marketing messages. In all honesty, virtualisation never could. The post virtualised world is more complex and risk prone than its physical predecessor; this the trade off made for increasing resource utilisation while shying away from addressing monolithic applications which are the root cause of most of our ills.

This is why OSGi is so important. An open industry specification, OSGi directly address the fundamental issue of structural modularity; encapsulating complexity and providing a powerful dependency management mechanism. Increasing not only Java; activities with the OSGi Alliance have started to address C/C++ and other languages.

The next generation of adaptive business platform, whether classified as private or public cloud environments; will need to be modular, will need to understand modularity applications, will need to manage all the forms of run time dependency, will need to be OSGi based.

Through our contributions to the OSGi Alliance, our sponsorship of the bndtools project and the OSGi Community and UK Forum, and above all the ongoing development of the Paremus Service Fabric – the industries first OSGi based Cloud runtime; from the developer IDE to the Cloud runtime, Paremus remain committed to delivering this vision. We hope that more of you join us in 2013.

In the meantime I’d like to wish you seasonal best wishes and peace, happiness and health for the coming year.

Richard & The Paremus Team.

 

 

 

 

Cloud: The Wasted Decade?

‘Cloud Computing’ remains the darling of the technology industry. ‘Cloud’, we are told, will reduce operational cost while increase business agility. ‘Cloud’, we are promised, will re-shape the way we think about IT services.

 

The problem is, I don’t understand how theses transformational effects are achieved? Current 1st-generation ‘Compute Clouds’ result from the synthesis of two underlying technology enablers.
  1. The adoption of coarse grained Service Orientation (WS-* or REST), allowed the decoupling of interconnected coarse grained business services. This is good.
  2. The use of the virtual machine image as the deployment artifact, allowed applications to be deployed ‘unchanged’ into alien runtime environments.
And therein lies the problem.

 

How can applications be more agile, when their internal composition remains unchanged? When they remain monolithic entities with tightly coupled constituent components? Given this, how are spiralling application maintenance costs countered? Exactly how is environmental complexity addressed? Are not 1st-generation public Cloud Compute offerings simple approaches to resource outsourcing? From a resource perspective more flexible than previous approaches; but potentially much more restrictive and ultimately dangerous as business logic is rigidly locked to sets of third party Cloud middleware API’s? Meanwhile, are 1st-generation private ‘Compute Clouds’ not just the latest marketing tag-line for traditional enterprise software deployment products?

After a decade of ‘industry innovation’ –  I fear Cloud Computing and resource Virtualization have collectively made the world more complex, our business systems more brittle and ultimately more costly to support.

In 2004 Paremus started a project to address what we considered to be the fundamental problem: how to build adaptive and robust application and runtime environments. Despite the ongoing Cambrian explosion of programming languages, Paremus did not believe the answer lay with the adoption of new or old programming language; e.g. Scala or Erlang! We also felt that the fashion to ‘Virtualize’ was probably – at best – a distraction; a position which places Paremus at odds with the rest of the industry.

 

Having a predominance of Physicists; Paremus reached to an area of scientific research known as ‘Complex Adaptive Systems’, hoping this would provide the architectural guidance we needed. It did and we distilled the following principles:
  • Change: Whether you like it or not its going to happen. Change must not be viewed as an inconvenient edge case or ‘Black Swan’ event.
  • Necessary Complexity: Systems that do interesting things are by their nature Complex.
  • Loose Coupling: While Complexity isn’t the issue, rigid structural coupling within Complex systems is fatal. Constituent parts must be loosely coupled.
  • Stigmergy: Don’t manage the details, rather set the overall objective and enable the System’s constituent components to respond appropriately: i.e Think Globally, Act locally.
  • Structural Modularity: In complex systems structural modularity always exists in a natural hierarchy.
  • Self Describing: Modular systems comprised from self-describing units (each unit describes it’s requirements and capabilities) may be dynamically assembled.
  • Diversity: Structural modularity enables diversity, which enables agility and evolvability.
  • Accidental Complexity: Structural modularity enables accidental complexity to be reduced over time.

Creating the illusion of a static environment – via ‘virtualization’ – for rigid, inflexible monolithic applications is a mistake. An easily consumed pain killer that only masked the fundamental illness.

The body of research was both compelling and conclusive.

Modularity is the key-stone. Modularity enables systems to be dynamically assembled and, if required, re-assembled from self-describing components.

Of all the languages available, only one (Java) had a sufficiently sophisticated modularity framework (OSGi) to meet our requirements. A set of open industry specifications created by the OSGi Alliance; OSGi provided many of the characteristics we required:
  • Modules (known as OSGi Bundles) are self-describing entities whose capabilities and requirements are well defined.
  • A powerful resolver capability allows OSGi based applications to be dynamically assembled during development or runtime. This not only includes bundle dependencies, but also life-cycle; and looking ahead service and environmental resource dependencies.
  • OSGi provides a well defined and industry standard approach to life-cycle and dynamic configuration.
  • Finally, OSGi provides a powerful micro-Services layer: this completing a natural structural hierarchy ( Class  ➔ Packages ➔ Bundles ➔ microServices ➔ traditional SOA ).
For these reasons Paremus adopted OSGi as a cornerstone of our product strategy in 2005. The result of these ongoing efforts is the Service Fabric: the industries first distributed OSGi runtime which enables:
  • Business applications to be dynamically assembled from re-usable components.
  • The dynamic assembly and configuration of these applications with respect to the runtime environment within which they find themselves.
  • Middleware services dynamically installed as required. No Cloud / PaaS API lock-in
  • Dynamic reconfiguration of all runtime elements (business logic, middleware and the platform itself) in response to unforeseen cascading resource failure.

 

Why now?
Whether a Financial Market crash or a paradigm shift in the IT industry; it is relatively easy to forecast that a change is inevitable; but notoriously difficult to predict precisely when this will occur. It is inevitable that next generation cloud environmentsPublic or Private will support dynamically assembly highly modular applications: these constructed from re-usable industry standard components – meaning OSGi. It is also inevitable that these environments will themselves be OSGi from the ground up. OSGi remains the only open industry standard, and so will be highly influential for those organisations that want to leverage a decade of expertise and avoid propriety vendor lock-in. The OSGi Alliance have recognised this trend and are extending specification activities to include Cloud Computing and generic Modularity and Life-Cycle management.
But is that time now?

 

For some organizations the ‘Boiling Frog’ analogy is, unfortunately, apt. Such organisations will continue to endure increasing OPEX costs; avoiding the upfront expense and perceived risk of re-orientating their IT strategy to focus on optimizing maintainability, agility and resilience. For such organisations – Cloud and PaaS strategies will consist of virtual machine management and the same old application software stack. Such organisations will see little real business benefit.

 

Increased interest in DevOps tooling (e.g. Puppet, Chief) demonstrate that other organisations have started to look beyond virtual machine centric strategies. While such DevOps tools start to address the issue of dynamic deployment and configuration; this is still with respect to monolithic applications, and so in essence no different from the IBM Tivoli and HP management tools of the 1990’s.

 

Meanwhile organisations that view technology as an enabler rather than a cost;  are starting to seriously look at development and runtime modularity. In part as a way to realise continuous release and agile development. For Java centric organisations this is driving the adoption of OSGi. One simple metric I can provide is that in the last 12 months Paremus has provided OSGi training to Government, Financial Services and Web companies in the US, Europe and Asia. In most of these organizations the relationship between OSGi, Agile Development and Cloud was a major interest.

 

Despite all our economic woes, is the time now? Quite possibly!

Complex Systems and Failure

Tim Harford’s ADAPT was one of those spontaneous airport bookshop purchases

In summary a good read with a relevant message. Short-termism stifles true innovation. It is only by attempting novel high risk activities that we can hope to make substantive changes and ultimately succeed.

ADAPT provides some advice for putting this philosophy into practice:

  • All interesting systems (ecological, economic, social, political) are Complex.
  • ‘Complexity’ is not the issue:  Tight coupling is the issue.
  • Tight coupling propagates failure; tight coupling must be avoided.
  • Information has context. Lose the context and much of the value of the information is lost.
  • Avoid overly centralised command and control. Rather, delegate the decision making process.
  • Where possible, act locally.

Those interested in ‘Complex Adaptive Systems’ will be aware of the substantive body of background research that underpin ADAPT’s arguments.

Why Complex System Fail

And yet these principles are rarely put into practices by the software industry.

Response to Failure: A tightly-coupled system

This is perplexing as the ‘fail fast‘ mantra is not new: it just seems to have been largely ignored. While Berkley’s Recovery Oriented Computing program demonstrated these ideas almost a decade ago; we see little evidence of them being incorporated in the latest ‘Cloud’ & ‘Virtualisation’ platform offerings from the dominant software vendors. Indeed, peel beneath the marketing covers, and the usual suspects continue to pursue ‘High Availability’ or ‘Fault-Tolerant‘ approaches.

This folly of this is nicely explained by ‘How Complex System Fail‘ (University of Chicago’s Cognitive technologies Laboratory). This paper covers some of the same ground as ADAPT but  explains the problem from an IT Operational perspective. Fault-Tolerance masks component failure, and so paradoxically, such systems are more vulnerable to severe cascading or systemic failures.

When these cascading events finally do occur, Operations are the only defence: Operations pick-up the pieces!

The author (Richard I. Cook) argues a number of points; but the following two are to my mind the most important.

  • Safety is a characteristic of systems and not of their components.
  • Failure free operations require experience with failure.

Surely such fundamental principles should be at the core of modern ‘cloud’ platform runtimes? Surely failure recovery must be an integral part of any overall solution? These behaviours exercised as part of normal ongoing runtime activities and not as responses to rare Black Swan events? Finally, surely cloud environments should aim to be truly loosely coupled environments?!

Note: In my book reliance on centralised message brokers, or naive use of rigid ZooKeeper type lock services are part of the problem, not the solution.

 

Stopping Complex System from Failing?

Markov Chain Analysis of a loosely-coupled ‘Target State’ driven platform

Thanks to early exposure to adaptive SOA frameworks like Jini; Paremus developed a strong intuition with respect to the requirements for mission critical ‘cloud’ environments. To provide concrete theoretical foundations, Paremus in 2005 used Markov Chain analysis to simulate the availability of traditional HA clusters; and contrasted these to alternative architectures we internally referred to as ‘No Frame of Reference (NFoR)‘. A ‘NFoR‘ architecture  had no static control points and could continuously re-allocate software components as required.

To achieve this:

  • Component failure was visible within the runtime environment.
  • Loose coupling at all structural layers ensured that failure was effectively isolated.
  • As the architecture was extremely modular, only the smallest units need to be replaced & recovery was rapid.
  • Sophisticated  ‘Target State Driven‘ dependency management automatically replaced the failed units.

The results of the simulations where clear. A ‘No Frame of Reference‘ runtime platform embodying fail fast and automated repair and recover behaviours significantly outperformed traditional static high availability alternatives.

 

Even in the most volatile of environments with multiple failures being rapidly injected; such platforms always settled back into a functional state.
Such solutions, because of their extreme agility could also be rapidly reconfigured, shutdown and re-started by Operations.

 

 

 

Hence, our own experiences were consistent with the advice offered by ADAPT and  Richard I. Cook’s paper.

The following are fundamental requirements.

  • A high degree of structural modularity as Modular System are Maintainable Systems.
  • Loose coupling between interactive software components (locally or network distributed).
  • Loose coupling between components and the underlying resources (physical or virtual).
With the following implications:
  • Structural modularity requires powerful dependency management.
  • Resource abstraction requires sophisticated ‘Target State‘ provisioning / re-provisioning capabilities.

And for Paremus, the OSGi software modularity framework provided a compelling set of industry standards via which these capabilities might be achieved.

It is worth emphasising that such capabilities are not a function of the programming language used. Choice of language does not in itself  provide an answer: just the notation you might use to realise an answer. For this reason I see the increasing adoption of the OSGi modularity system as far more significant that recent Java developments or even the emergence of languages like Scala.

Nor is resource ‘virtualisation’ relevant in achieving this goal. Virtualisation is an orthogonal and secondary concern! If you need to partition physical resource – by all means use virtual machines. If you need to partition a data-centre pursue a SDC (Software defined Data Centre) strategy. But tread carefully! These solutions do not address your fundamental issues and risk the introduction of yet another complex tightly coupled management layer.

That’s all for today!

If you are interested in further detail on Service Fabric concepts: see Paremus Service Fabric Concepts and Terminology.

Why modularity matters more than virtualization.

Ten years ago it all seemed so simple! Increase utilization of existing compute resource by hosting multiple virtual machines per physical platform; so consolidating applications onto fewer physical machines. As the virtual machine ‘shields’ its hosted application from the underlying physical environment, this is achieved without changes to the application.  As applications may now move runtime location without re-configuration; the idea of  virtual machine based ‘Cloud Computing’ was inevitable.

However, there are downsides.

Virtual machine image sprawl is now a well know phrase. If the virtual machine image is the unit of deployment; any software upgrade or configuration change, no matter how small, generates a new image. With a typical size of ~1 Gbyte  (see table 2 – http://www.ssrc.ucsc.edu/Papers/ssrctr-10-01.pdf) – this soon adds up! Large virtual environments rapidly consume expensive on-line and off-line data storage resource. This in-turn has driven the use of de-duplication technologies. So increasing storage cost and / or increasing operational complexity.

Once constructed, virtual machine images must be propagated, perhaps many times across the network, to the physical hosts. Also, a small configuration change, which results in a new virtual machine image, which needs to be deployed to many nodes; can generate hundreds of Gbytes of network traffic.

When used as the unit of application deployment; virtualization increases operation complexity, and increases the consumption of expensive physical network and storage resources: both of which are ironically probably more expensive than compute resource which virtualization is attempting to optimize the use of.

We’re not finished!

  • Some categories of application simply cannot be de-coupled from the physical environment. Network latency is NOT zero, network bandwidth is NOT infinite and locality of data DOES matter.
  • Virtualization complicates and obscures runtime dependencies. If a physical node fails, which virtual machines were lost? More importantly, which services were lost, which business applications were operationally impacted? Companies are now building monitoring systems that attempt to unravel these questions: further operational band-aids!
  • Centralized VM management solutions introduce new and operationally significant points of failure.
  • As the operational complexity of virtual environments is higher than their physical predecessors; there is an increased the likelihood of catastrophic cascading failure caused by simple human error.

Feeling comfortable with your virtualization strategy?

 

 

For all these reasons, the idea of re-balancing critical production loads by dynamically migrating virtual machine images, is I suggest a popular Marketing Myth. While many analysts, software vendors, investors and end users continue to see virtualization as the ultimate silver bullet! They are, I believe, deluded.

The move to the ‘virtual enterprise’ has not been without significant cost. The move to the ‘virtual enterprise’ has not addressed fundamental IT issues. Nor will moving to public or private Cloud solutions based on virtualization.

 

 

And so the Story Evolves

Acknowledging these issues, a discernible trend has started in the Cloud Computing community. Increasingly the virtual machine image is rejected as the deployment artifact. Rather:

  • Virtual machines are used to partition physical resource.
  • Software is dynamically installed and configured.
  • In more sophisticated solutions, each resource target has a local agent which can act upon an installation command. This agent is able to:
    • Resolve runtime installation dependencies implied by the install command.
    • Download only the required software artifacts.
    • Install, configure and start required ‘services’.
  • Should subsequent re-configure or update commands be received; the agent will only download the changed software component, and / or re-configure artifacts that are already cached locally.

Sort of makes sense, doesn’t it!?

The Elephant in the Room

Dynamic deployment and configuration of software artifacts certainly makes more sense than pushing around virtual machine images. But have we actually addressed the fundamental issues that organisations face?

Not really.

As I’ve referenced on many occasions; Gartner research indicates that software maintenance dominates IT OPEX (http://www.soasymposium.com/pdf_berlin/Anne_Thomas_Manes_Proving_the.pdf). In comparison hardware costs are only ~10% of this OPEX figure.

 

 

“Our virtual cloud strategy sounds awesome: but what are the business benefits again??”

 

To put this into perspective; a large organisation’s annual IT OPEX may be ~$2 billion. Gartner’s research implies that, of this, $1.6 billion will be concerned with the management and maintenance of legacy applications. Indeed, one organization recently explained that each line of code changed in an application generated a downstream cost of >$1 million!

The issue isn’t resolved by virtualisation, nor Cloud. Indeed, software vendors, IT end users, IT investors and IT industry analysts have spent the last decade trying to optimize an increasingly insignificant part of the OPEX equation; while at the same time ignoring the elephant in the room.

 

 

Modular Systems are Maintainable Systems

If one is to address application maintainability – then modularity is THE fundamental requirement.

Luckily for organizations that are pre-dominantly Java based; help is at hand in the form of OSGi. OSGi specifications and corresponding OSGi implementations provide the industry standards upon which an organisation can being to modularise their portfolio of in-house Java applications; thereby containing the on-going cost of application maintenance. For further detail on the business benefits of OSGi based business systems; see http://www.osgi.org/wiki/uploads/Links/OSGiAndTheEnterpriseBusinessWhitepaper.pdf).

But what are the essential characteristics of a ‘modular Cloud runtime’: characteristics that will ensure a successful OSGi strategy? These may be simply deduced from the following principles:

  • The unit of maintenance and the unit of re-use are the same as the unit of deployment. Hence the unit of deployment should be the ‘bundle’.
  • Modularity reduces application maintenance for developers. However, this must not be at the expense of increasing runtime complexity for operations. The unit of operational management should be the ‘business system’.

Aren’t these requirements inconsistent? No, not if the ‘business system’ is dynamically assembled from the required ‘bundles’ at runtime. Operations: deploy, re-configure, scale and up-date ‘business systems’. The runtime internally maps these activities to the deployment and re-configuration of the required OSGi bundles.

Simple.

In addition to these essential characteristics:

  • We would still like to leverage the resource partitioning capabilities of  virtual machines. But the virtual machine image is no-longer the unit of application deployment. As the runtime dynamically manages the mapping of services to virtual and physical resources; operations need no longer be concerned with this level of detail. From an operational perspective, it is sufficient to know that the ‘business system’ is functional and meeting its SLA.
  • Finally, it takes time to achieve a modular enterprise. It would be great if the runtime supported traditional software artifacts including WAR’s, simply POJO deployments and even non-Java artifacts!

Are there any runtime solutions that have such characteristics? Yes, one: the Paremus Service Fabric. A modular Cloud runtime – designed from the ground-up using OSGi; for OSGi based ‘business systems’. The Service Fabric’s unique adaptive, agile and  self-assembling runtime behaviors minimizes operational management whilst increasing service robustness. To get you started – the Service Fabric also supports non OSGi artefacts.

A final note: even Paremus occasionally bends to IT fashion :-/ Our imminent Service Fabric 1.8 release will support deployment of virtual machine images: though if you are reading this blog hopefully you will not be too interested in using that capability!

Breaking the Karmic Cycle

Paremus News

As you may have guessed from the syndication of Neil Bartlett’s blog; I’m very pleased to announce that Neil has joined Paremus and will be working on a number of interesting OSGi & Service Fabric based projects. Some of this work will appear in our imminent Service Fabric 1.8 release; but more on that in due course. In addition to Service Fabric and Nimble related work; Neil will be spending time advancing the open source BNDTools project and some interesting new capabilities are already in the pipe-line for the next release.

On a different note:  Whether you are a battle-hardened ‘OSGi master’, an enthusiastic ‘OSGi initiate’, or just curious; you’ll be interested in the OSGi Community Event in Darmstadt in September. Paremus will be attending this year: I’ll be presenting on Cloud and OSGi and explaining the importance of modularity and why the current generation of virtual machine based Cloud solutions have got it wrong! Should be fun and controversial – so come and heckle 😉

 

Breaking the Cycle

Whilst writing this post I came to the realisation that I was re-iterating a message I posted in 2008: see Impaled on the Horns of an OPEX Dilemma.

Perhaps,  if repeated frequently enough, the message concerning the fundamental importance of environmental modularity will seed, grow be heard over the general cacophony relating to the latest technology and software  fads.

With that hope –  once more…

As we  head towards the latest global economic downturn we see the all too predictable response from many organisations whose core business functions are heavily dependent upon technology.

The response goes something like this:

  • We have to deal with spiralling OPEX costs; the dominant component of which is ongoing application maintenance.
  • We have to do something now!
  • If we  reduce number of applications; clearly we reduce our maintenance burden!
  • We can also reduce headcount by pairing down local teams and off-shoring support.

These slash and burn programmes are subsequently implemented with varying degrees of success; BUT IN ALL CASES the remaining application portfolio is: just as difficult to manage; just as difficult to bug fix; just as difficult to functionally enhance; as ever. Perhaps more so as during the cost cutting exercise, with each round of redundancies, critical knowledge concerning the environment flows away from the organisation.

Business Agility and Service Availability Suffer

And then we have the economic upturn!

The business demand new functionality – NOW! The IT department cannot adapt or change the existing business systems: they are too monolithic, too high risk and core expertise has been lost. And so new generation of applications are created to meet the new business requirements. Developer and Operations headcount swell. The business systems are ever more complex, less flexible and OPEX is even higher; yet the business don’t care: they are making profit!

That is until the next down-turn.

Repeat this short term cyclic behaviours for ~15 to 20 years and you’ll end up with a fundamentally broken organisation.

Maintainable Systems are Modular Systems

There is a better way!

  • Acceptance: Realise that agile and cost effect environments take time to create; and will require some fundamental changes.
  • Enlightenment: Ignore industry fads (cloud, virtualisation, programming language of the day); maintainability is only achieved through ensuring modularity and agility at each layer of your environment.
  • If your business systems are Java based – then you are in luck. OSGi technology provides – if not the elusive silver bullet – the next best thing; an industry backed standards framework for modularity from which one can begin to realise these objectives.
  • In addition to these industry standards – you are going to need some help – so call us!
  • You need to start sometime; so may as well be today!

It many be unpalatable – but if organisations had implemented a long term modularisation strategy in 2008; those same organisations would be well placed; realising substantive and sustainable OPEX savings today.

Wonder if I’ll be repeating the same message in 2015?

OSGi – The Business Drivers

Software modularization should not be considered in isolation, but rather seen within the context of several inter-related trends that have occurred over the last decade. Indeed, software modularity is merely the latest visible facet of a much larger and more fundamental technology shift, with an impact at least equivalent to the move from mainframes to client server computing in the 1980’s.

These related trends include:

  • Service Oriented Architecture (SOA) – Enabling previously rigidly coupled business systems with proprietary protocols to be expressed as “services” which can be accessed via common protocols. However the business systems themselves remained opaque and monolithic.
  • Cloud Computing –  Decoupling applications from the underlying compute resources upon which they run, allowing more efficient resource utilization and scaling.
  • Software Modularization – Most recently, replacing the opaque monolithic applications and stove-piped middleware with dynamically assembled alternatives composed from re-usable software components.

As commented on in “The Rise of the Stackless Stack” (see http://www.redmonk.com/jgovernor/2008/02/05/osgi-and-the-rise-of-the-stackless-stack-just-in-time/); these trends collectively shift the industry away from rigidly coupled, static, opaque environments; towards adaptive, loosely coupled systems which are dynamically assembled from well-defined software components that run across a fluid set of compute resource.

.

.

Modularity and Assembly – It’s not a new idea!

The concept of assembling a product from a set of well-defined re-usable components is not new. Indeed its roots can be traced back to at least 250BC with emperor Qin Shi Huang and his commissioning of the Terracotta Army (see http://en.wikipedia.org/wiki/Assembly_line). Whatever the product, the driver for modularity and subsequent assembly is to; increase and maintain quality, reduce cost and increase output. The modern archetype for modularity and assembly is the automotive industry, where extensive use of standardization, modularity and assembly results in affordable automobiles.

Likewise, the computer industry already extensively uses hardware modularization in the form of standardised CPUs, memory and disk subsystems; leading to affordable computer hardware. These concepts are also well understood by software engineers. Extracts from http://en.wikipedia.org/wiki/Programming_in_the_large_and_programming_in_the_small include:

“Small programs are typified by being physically small in terms of their source code size, are easy to specify, quick to code and typically perform one task or a few very closely related tasks very well”,

… programming in the large, coding managers place emphasis on partitioning work into modules with precisely-specified interactions…”

 

and

“… one goal of programming in the large involves setting up modules that will not need altering in the event of probable changes. This is achieved by designing modules so they have high cohesion and loose coupling.”

Coarse grained modularization of business processes (via SOA) and the subsequent re-assembly (via BPEL) has been underway for some time, however the individual applications have remained monolithic and opaque. Further progress was not possible until a strong, widely endorsed, industry standard for enterprise software modularity was available. OSGiTM provides this modularization standard.

Since 1999 the OSGi Alliance (see http://www.osgi.org) has provided standards, reference implementations and guidance for modularization and assembly best practices. Recently, the OSGi Alliance has succeeded in:

  • Recruiting the majority of enterprise software vendors.
  • Encouraging those vendors to migrate their product portfolios to OSGi.
  • Producing the OSGi Enterprise standard which provides mechanisms to integrate OSGi based applications with legacy JEE & Spring.

Today OSGi is rapidly becoming a cornerstone for any successful business system re‑engineering effort.

.

.

The Dilemma

While software modularity and dynamic application assembly are inevitable industry trends; OSGi faces the usual adoption challenges: see Innovators Dilemma for an exploration of this theme – http://www.businessweek.com/chapter.christensen.htm.

‘The Business’ will never ask for OSGi based systems, it’s an implementation detail. Yet in the fullness of time ‘The Business’ will complain when the internal IT organization is seen as; too inefficient, too expensive and /or no longer agile enough to meet tactical, let alone strategic, business objectives.

Yet, from an IT management perspective revisiting a complex but working enterprise software stack is an immediate cost and quantifiable risk.[1] Something to be avoided. The alternative, a slow death seems almost preferable, or at least somebody else’s problem. As the environment degrades, make-shift tactical solutions are implemented to ease immediate pain points caused by strategic issues. This ongoing tactical ‘reactive’, behaviour results in an explosion in operational complexity and associated OPEX. The organisation eventually becomes paralysed by its own management’s inability to ‘grasp the nettle’ and identify and address the fundamental issues.

It is suggested that this organizational behavior parallels the well known ‘Tragedy of the Commons’ (see http://en.wikipedia.org/wiki/Tragedy_of_the_commons).

Developer attitudes and/or their working environment may also act as an impediment:

  • Just Get It Done. At each point in time, the developer is pursuing, or is forced to pursue, the shortest / lowest effort / route to the immediate deliverable at the expense of longer term maintainability.
  • The Artisan: The software developer “knows better” and pursues his own approach in defiance of industry standards and accepted best practices.
  • The Luddite: The current opaque, undocumented spaghetti of code within the organisation ensures employment. Increased efficiency, self-describing dependencies and code re-use sound like a recipe for smaller more flexible development teams.

Each can be a powerful brake on change.

Yet history tells us, those who effectively embrace and leverage change, succeed. OSGi migration is neither free, nor in itself, a quick ‘silver bullet’. However OSGi is of fundamental strategic importance to any organisation whose core business involves the processing of information.

.

.

The Benefits

Quite simply, modularity localizes the impact of change, which directly leads to increased maintainability.

  • As long as the module boundaries don’t change, one can change the functionality of the module freely, without concern for breaking the wider system; i.e. the impact of any local change is prevented from leaking into the wider system.
  • Modules that perform a few, or a single function, are much easier to test exhaustively than an entire monolithic system.
  • Smaller pieces of the system can be independently versioned.
  • Specific knowledge is only required for the particular module being worked upon, along with its relationship to other modules in the system; other modules can be treated as “black boxes” that perform specific functions, without worrying about how they perform them.

The large scale structure of the migrated application, which in all likelihood was previously unknown, is now completely defined by the dependencies between the set of versioned modules. Hence:

  • It is understood which modules are and are not required!
  • By replacing or re-wiring modules, the behavior of a composite system may be rapidly changed.

From an ongoing maintenance perspective it is now possible to re-factor individual modules, or even the overall system, to systematically drive out accidental complexity (see http://theit.org/publishing/books/prof-app/19261.cfm) and so contain, or even reverse, design rot.

While these arguments are understood, many organizations require demonstrable real world examples. The challenge is that those organisations that are first movers have no real incentive to act as references, thereby losing competitive advantage.

For several years, Paremus have been working with a number of organizations who are at various stages of OSGi migration. While Paremus cannot discuss specific details, or disclose the organizations by name, we are able to reference some common themes that demonstrate real world benefits for OSGi in general, and the Paremus Service Fabric in particular.

.

.

Resource Utilisation

When working with traditional applications, a lack of information concerning required libraries results in developers having to load every possible library into their IDE. From experience, Paremus have seen this drive memory requirements as high as 40GB per developer desktop. However, once dependencies are understood, and mechanisms are in place so that only the required components are loaded, Paremus have also seen the number of artefacts reduced by an order of magnitude with corresponding machine memory savings.

For an organization with several hundred developers the saving is considerable. The saving in reduced memory footprint in Production is of course correspondingly larger.

.

.

Developer Efficiency

Developing against a monolithic application requires the developer to work and test against the complete code-base. Yet for large applications it may not be possible to test changes in the local IDE, as compile times may be hours. As a result, developers are forced to rely upon unit and integration tests that run during the nightly build cycle. The result is that the whole bug detection and rectification cycle can take days, with an increased likelihood that some issues are not found and leak into production.

In contrast, rapid testing of OSGi modules is easily achieved in the developer IDE, and Paremus have seen this test / fix cycle reduced from days to minutes.

A modular system also lends itself well to many hands being involved in its development and maintenance. It’s not necessary to understand the whole system inside-out, each individual can independently work on  small well-defined and decoupled modules. This directly translates to Increased project delivery success rates as smaller, well-contained projects have a higher success rate than larger and more poorly constrained projects.

.

.

Definitive Dependencies

Most modern build systems allow build-time dependencies to be specified and obtained from various repositories. However, different types of  build system repository specify dependencies with different formats; e.g. pom.xml, ivy.xml.

OSGi provides runtime dependency meta-data in its Require-Bundle and Import-Package bundle manifest headers. This provides definitive  industry standard dependency information that can be consumed regardless of the build system you use. Hence OSGi decouples your runtime from the source build systems; thereby avoiding meta-data lock-in and allowing  different types of build repository to co-exist.

In addition, OSGi supports package-level dependencies. These are finer-grained than the artefact-level dependencies supported by most build systems and can be programmatically determined directly from source code using tools such as Apache Felix Sigil . This avoids the errors that occur when trying to manually maintain dependency data.

Finally, OSGi supports its own repository API: OBR. A current priority for the OSGi Alliance EEG working group; OBR allows searching for package-level dependencies as well as the artefact-level dependencies used by other repositories.

.

.

Production Stability, Availability & Agility

Unless you happen to own applications that never change; stability, availability and agility are closely related concerns. The more agile the business service, the easier it is to change from one well-defined state to the next. This may be to introduce new business functionality, apply a patch, or roll back to a previously good version.

The internal structure of monolithic opaque applications is poorly understood; hence upgrades are complex and high risk. This, coupled with the long developer diagnostic / fix cycle, can result in repeated production outages and instability that spans several working days. Operations typically respond by maintaining a number of isolated horizontal silos; attempting to ensure service availability by releasing new software one silo at a time. This issue isn’t just common, its almost universal.

In contrast; with an OSGi-based runtime like the Paremus Service Fabric, each application is self‑describing; meaning that the application structure is now fully described in-terms of dependencies between versioned modules. A system may be deployed in seconds. A running system may be upgraded on-the-fly (i.e. within seconds); and may be returned to a previous well known-state just as rapidly.

.

.

Operational Risk & Governance

The implication for operational risk should be immediately apparent:

  • From regulatory and business continuity perspectives, the structure of an application is precisely known at each point-in-time; allowing all versions of the application to be rapidly re-constituted.
  • Key structural information is no longer locked within key members of staff. In principle the organisation can employee any OSGi literate developer / systems architect who can rapidly navigate the structure of the organisation’s software systems.

From a governance perspective, it is now simply task to rapidly answer the following types of question:

  • Which software license types are used within which production applications?
  • Which production applications use third party modules with an identified security vulnerability?

This information is readily available courtesy of the metadata embedded in each OSGi module and the dynamic deployment & assembly mechanisms provided by the OSGi runtime environment.

.

.

An OSGi ROI?

Anne Thomas Manes (Gartner) estimates that ongoing maintenance accounts for 92% of the total lifetime cost (TCO) of each application. Whilst hardware accounts for <10% of TCO, software maintenance accounts for ~70% of TCO; the remainder being the initial cost of developing the application; see slides 9 & 10 – SOA Symposium: Berlin, October 2010.

Given this, the current fashion for virtual machine based Cloud computing seems somewhat perplexing. The deployment of traditional opaque software stacks as virtual machine images does  NOTHING to address the issue of application maintainability.

Closer to home, a group of Financial Services engineers recently attempted to quantify the potential return realised by migrating their production environment to the Paremus Service Fabric. They conclude that OPEX savings of 60% were possible. While Paremus do not know the details of this analysis, it is likely that the following three contributing factors were considered.

  • The Paremus Service Fabric is a cost effective replacement for Application Servers, Compute Grids, CMDBs and provisioning tooling and requires less operational resource to manage it.
  • The Paremus Service Fabric, being a Private Cloud, achieves the resource utilisation and efficiency savings alluded to by many virtualisation vendors, but without the management overhead and operational risk associated with ‘virtual machine sprawl’.
  • However, given the Gartner TCO analysis, the bulk of the identified OPEX saving most likely results from the ongoing maintainability of OSGi based business applications running upon the Service Fabric runtime.

A large Financial Service organisations may have 1000+ applications, each with an average annual running cost of ~$1,000,000. This equates to an potential annual saving of $600,000,000! A significant medium term bottom line saving that surely warrants investing in a multi-year OSGi based application transformation program?

.

.

Migration Strategy

Hopefully a compelling set of arguments for adopting OSGi have been presented, but how do you actually migrate an organisation with a decade of legacy applications to a new OSGi based world?

Organisations considering this invariably start with the following questions:

  • How do we determine and then untangle dependencies in our current environment?
  • How do we move to an OSGi-centric build/test/release cycle within minimum impact on the majority of developers?
  • What level of modularity / granularity should we pursue?

In response, Paremus advise an iterative approach; the precise details dependent upon each organisation’s starting point and business objectives.

Stage I

 

1. Assemble a small high-caliber team of engineers with appropriate OSGi / modularity skills. Ensure that this team have senior management backing and representation.

2. Set up tools to determine and monitor dependencies within the organisation’s existing code base. Provide automated reporting, remove superfluous, and fix unsatisfied, dependencies.

At this early stage, the organization may have already achieved as much as an order of magnitude simplification in the code base dependencies. This in itself will improve developer productivity by reducing compile times and help increase the success of production releases.

Stage II – Tooling & Metadata

 

3. Review organisation’s standards for IDE tooling. Does current tooling support OSGi metadata and enable simple management of exposed dependencies? Review organisation’s current repository standards; will these support the OSGi Alliance OBR standard? If required, select new tooling.

4. Set up OSGi metadata for all organisations projects; this to be maintained by project developers. This metadata need not yet be used at runtime, but it can be used during the build process to monitor the progress of the modularisation effort.

 

It should now be possible to run OSGi and standard Java variants of an application side-by-side. This allows migration to progress without creating large parallel branches in source control, which are difficult to subsequently re-merge. CAPEX savings may be realised at this point, as only the required artifacts will be loaded into the developers IDE, decreasing the amount of resources required and increasing testability further.

Stage III – Runtime

5. Select candidate applications for migration based upon agility and re-usability considerations:

  • A set of applications that share a high degree of functionality / code and require frequent functional updates are excellent candidates for early migration.
  • A standalone application, which seldom, if ever changes, and shares little or no functionality with other applications, is a very low priority – and may never be migrated.

6. Create working runtime bundles using existing libraries as the level of granularity.

7. Create integration tests for use during and after migration to assert that parallel development streams do not break any modularisation efforts as the code is migrated.

8. Test OSGi version of candidate application in an OSGi runtime environment; e.g. Paremus Service Fabric.

9. Create integration tests to ensure fixes for hidden gotchas are caught during development rather than in production; i.e. Development and UAT fabrics.

The organization should now be in a good position to deploy OSGi based applications to production and accelerate OSGi migration. The new development, build, test. release lifecycle will enable considerable improvements in developer productivity via further reductions in time to run unit and integration tests.

Stage IV – Iterate and Reward

10. For each application take an iterative approach to modularisation; break down the deliverable into smaller modules (improving modular boundaries), test, deploy, release to production. Then re-visit modularity, break down further if appropriate, test, release.

11. For each migrated application, using the dependency information now available, publish ‘composition reports’; listing degree of re-used of in-house modules, certified third part modules and alerting on uses of non-certified modules.

12. The modularisation process will now be running across many candidate applications in parallel; during which the core OSGi team will continue to advise each participating application group on appropriate levels of modularity and opportunities for re-use.

13. Reward application teams; not just for meeting an initial delivery objectives, but also on:

  • Use of agreed third party OSGi libraries, where appropriate.
  • Achieve a high degree of re-use in a re-factored applications.
  • Delivering and maintaining in-house modules which are re-used in many in-house applications.

The ‘composition reports’ and associated incentives provide the development teams with powerful feedback mechanisms.

In this manner, the organisation typically starts with an isolated application, building the required skills & processes. Driven by initial successes and cost savings, adoption naturally flows from this initial seed point through the organisation.

.

.

To Conclude

Whether OSGi is destined to be the next IT ‘fashion bubble’ (once the current hysteria on ‘Cloud’ has waned) or it will grow organically via initial adoption by the most sophisticated IT organizations, is unclear.

However, the software industry is a manufacturing industry; no more and no less than manufacturers of disk drives, steel or automobiles.  While the raw materials are low cost, the ongoing effort required to craft and maintain flexible, high quality software, is not.

Necessity will drive the software industry, and those organisations with large in-house development teams, towards modularization, dynamic application assembly and so OSGi.


[1]The author has had direct experience of this behaviour in an enterprise environment.

A backup solution that was initially designed for a hand full of servers was nearing collapse. Each time a new backup server was added to address over-runs the reduction in backup time was less than the 1/n expected as loads could not be exactly balanced across backup real estate. The situation degraded to the point where the backups extended to a full 24 hours. The correct solution was to refresh the enterprise network and consolidate backup servers into centralized high speed silos. Yet, despite the operational risk, some management refused to sign off on what was perceived to be a high risk project. Only when no ‘last’ silver bullet could be found was sign-off achieved to progress the correct strategic solution. The situation had by then become so serious that this solution had to be implemented as rapidly as possible.

OSGi: The Value Proposition: Part II

Spiralling OPEX – caused by epidemic levels of complexity and rigid / frail business systems – will ensure the success of enterprise OSGi!

That’s all it takes, really. Pressure and time

The Shawshank Redemption

In OSGi: The Value Proposition – I argued that modularisation was a fundamental requirement if one is to avoid long term system rot and thereby avoid excessive on-going maintenance costs.

Kirk Knoernschild’s keynote presentation at the London JAX conference this week promises a fascinating exploration of this and related themes.

Quoting from Kirk’s latest post concerning his JAX keynote

As a system evolves, its complexity increases unless work is done to maintain or reduce it.

and concludes…

For a long time, a central ingredient has been missing. But not for much longer, because the enterprise will get its OSGi!

Well worth attending if you are able!