Do We Have a Problem Just Describing an Event-Driven System?

Could one of our problems with a software-defined future be as simple as terminology?  When I noted in my blog that terms like “network management system” or “operations support system” were implicitly monolithic, implying a traditional single-structure application, I had a lot of operators contact me to agree.  Certainly, we have a long history of using terms that describe not only the “what” but the “how”, and when we use them, we may be cementing our vision of software into an obsolete model.

To me, the best example of this in the networking space is NFV.  Back in the summer of 2013, as NFV was evolving from a goal (replace devices with software instances hosted on commercial off-the-shelf servers) to an architecture, the NFV ISG took what seemed a logical step and published what they called an “End-to-End” or E2E model of NFV.  By this time, I’d been promoting the idea that NFV was explicitly a cloud computing application and needed to adopt what’s now called a “cloud-native” architecture.  I objected to the model because it described a monolithic application structure, but everyone at the time assured me this was a “functional” description.  Well, we ended up defining interfaces and implementations based on it anyway, and we lost all the cloud innovations (and even the TMF NGOSS Contract innovations) along the way.

Part of the problem with terminology in software-centric processes is that few people are software developers or architects.  We struggle to convey software-based frameworks to people who don’t know programming, and in our struggle we fall back on the “lowest common denominator”, concepts that everyone sort-of-gets.  We know what applications are, whether we’re programmers or not.  Think PowerPoint or Excel or Word.  Thus, we think in applications, and when we do, our examples are monolithic applications.

An “application” is the term we use for a collection of software components that interact with the outside world to perform a useful function.  In a monolithic application vision, the application presents a user interface, accepts requests and changes, displays conditions/results, and performs its function in a very cohesive way.  You can see a word processor as such a monolithic application.

You could also see it another way.  We have this “document”.  We have a place in that document that’s defined as the “cursor position”.  We have keys we can punch, and those keys represent “events” that are processed in the context of the document and the cursor position.  This is much closer to the vision of a word processor that the development team would necessarily have, an event-driven approach, but it’s not easily communicated to the kind of people we’d expect to be using word processors, meaning non-programmers.
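To make that concrete, here’s a minimal sketch of the event-driven view.  The names are hypothetical, chosen for illustration, and not taken from any real word processor: each keystroke is an event processed in the context of the document and the cursor position.

```python
# A minimal sketch of the event-driven view of a word processor.
# All names here are hypothetical, chosen for illustration only.

class Document:
    def __init__(self):
        self.text = []        # the document content
        self.cursor = 0       # the "cursor position" context

    def handle_key(self, key):
        """Process one keystroke event in the context of document + cursor."""
        if key == "BACKSPACE":
            if self.cursor > 0:
                self.cursor -= 1
                self.text.pop(self.cursor)
        else:  # an ordinary character: insert at the cursor
            self.text.insert(self.cursor, key)
            self.cursor += 1

doc = Document()
for event in ["H", "i", "!", "BACKSPACE"]:   # the stream of key events
    doc.handle_key(event)
print("".join(doc.text))   # -> "Hi"
```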

Word processors fit easily into both worlds, descriptively speaking.  You can describe one functionally, as a monolithic document-centric application, or you could describe it as a state/event system.  But a word processor supports a user whose thinking is naturally single-threaded.  There is one source of events, one destination for results, and one context to consider.  Suppose instead that the functionality we’re building has to deal with a large number of asynchronous events and a significant number of contexts.  The implementation is going to get a lot more complicated very quickly, and it will be increasingly difficult for any non-programmer (and even many programmers) to gain a sense of the functional logic from the implementation description.

Traditional (meaning early) event processing tended to be based on a monolithic model for the application, fed by an “event queue” where events were posted as they happened and popped for processing when the application had the resources available to do the work.  The processing had to first identify what the event was, then what the event related to, then the context of the event relative to what the application was doing at the time.  If the system the application was designed to support had multiple independent pieces (two different access services and a core service, for example), there was also the problem of making sure that something you were doing in one area didn’t get stepped on while processing an event in another area.
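A minimal sketch of that traditional pattern might look like the following.  The event types and the managed pieces are hypothetical illustrations, not drawn from any real OSS implementation:

```python
# A sketch of the traditional monolithic event-queue pattern.
# Event types and managed contexts are hypothetical illustrations.
from collections import deque

event_queue = deque()                     # events are posted here as they happen
contexts = {"access-1": "active",         # the independent pieces being managed
            "access-2": "active",
            "core": "active"}

def post(event_type, target):
    event_queue.append((event_type, target))

def run():
    while event_queue:
        event_type, target = event_queue.popleft()   # pop when resources permit
        # Step 1: identify the event; Step 2: what it relates to;
        # Step 3: its context relative to what the application is doing now.
        current_state = contexts[target]
        if event_type == "failure" and current_state == "active":
            contexts[target] = "fault"
        elif event_type == "restore" and current_state == "fault":
            contexts[target] = "active"
        # ...every event/context combination must be sorted out here, and we
        # must ensure work in one area doesn't step on work in another.

post("failure", "access-1")
post("restore", "access-1")
run()
print(contexts)   # access-1 is back to "active"
```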

This model of application is still the basis for initiatives like the various NFV MANO implementations and ONAP.  It works as long as you don’t have too many events and too many interrelated contexts to deal with, so it’s OK for deployment but not at all OK for full service lifecycle automation.  For that, you need a different approach.

The basic technical solution to asynchronous event handling is a finite state machine.  We say that the system our application is managing has a specific set of discrete functional states, like “orderable”, “activating”, “active”, “fault”, and so forth.  It also has a series of events, like “Order”, “Operating”, “Failure”, “Decommission”, and so forth.  For every state/event combination, there is an appropriate set of things that should happen, and in most cases there’s also a “next state” to indicate what state should be entered when those things have been done.  The combination is expressed in a state/event table, and this kind of thing is fundamental to building protocol handlers, for example.
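Here’s a minimal sketch of a state/event table in code, using the states and events named above; the handler functions are hypothetical placeholders, not any standardized processes:

```python
# A sketch of a state/event table using the states and events above.
# Handler functions are hypothetical placeholders.

def activate(svc):    print(f"activating {svc}")
def run_service(svc): print(f"{svc} is now operating")
def repair(svc):      print(f"repairing {svc}")

# (state, event) -> (what should happen, next state)
STATE_EVENT_TABLE = {
    ("orderable",  "Order"):     (activate,    "activating"),
    ("activating", "Operating"): (run_service, "active"),
    ("active",     "Failure"):   (repair,      "fault"),
    ("fault",      "Operating"): (run_service, "active"),
}

def dispatch(state, event, svc):
    handler, next_state = STATE_EVENT_TABLE[(state, event)]
    handler(svc)
    return next_state

state = "orderable"
for event in ["Order", "Operating", "Failure", "Operating"]:
    state = dispatch(state, event, "example-service")
print(state)   # -> "active"
```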

The TMF’s NGOSS Contract innovation, which I’ve mentioned many times in blogs, said that this state/event table was part of a service contract, and that the contract then became the mechanism for steering service events to service processes.  This is a “data-driven” or “model-driven” approach to handling event-driven systems, and it’s a critical step forward.
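A sketch of that data-driven idea, and emphatically not the TMF’s actual specification, might keep the state/event table inside the contract document itself, so the contract (data, not code) steers each event to a process:

```python
# A sketch of the data-driven idea: the state/event table lives in the
# service contract (data), not in the code.  Not the TMF's actual spec,
# just an illustration of contract-steered event handling.

service_contract = {
    "service": "example-vpn",
    "state": "active",
    "state_event_map": {            # part of the contract document itself
        "active/Failure":  {"process": "fault_handler",   "next_state": "fault"},
        "fault/Operating": {"process": "restore_handler", "next_state": "active"},
    },
}

# Hypothetical service processes the contract can steer events to.
PROCESSES = {
    "fault_handler":   lambda c: print(f"{c['service']}: handling fault"),
    "restore_handler": lambda c: print(f"{c['service']}: restored"),
}

def handle_event(contract, event):
    key = f"{contract['state']}/{event}"
    entry = contract["state_event_map"][key]   # the contract steers the event
    PROCESSES[entry["process"]](contract)
    contract["state"] = entry["next_state"]

handle_event(service_contract, "Failure")
handle_event(service_contract, "Operating")
print(service_contract["state"])   # -> "active"
```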

The problem with the finite state machine model is that where the system consists of multiple interrelated but functionally independent pieces (like our access/core network segments), you have a state/event table for each piece, and you then need something to correlate all of this.  In my ExperiaSphere project (both the first phase in 2008 and the second in 2014), I presumed that a “service” was made up of functional assemblies that were themselves decomposed further, to the point where actual resources were committed.  This structural hierarchy, this successive decomposition, was how multiple contexts could be correlated.

If we have a functional element called “Access”, that functional element might decompose into an element called “Access Connection” and another called “Access Termination”.  Each of these might then decompose into something (“Ethernet Connection”, “MM-Wave Connection”) that eventually gets to actually creating a resource commitment.  The state of “Access” is determined by the state of what it decomposes into.  If the Access element, in the orderable state, gets an “Order” event, it would set itself to the “Setup” state and send an Order to its subordinate objects.  When those objects complete their setup, they’d send an Operating event up to Access, which would then set its state to Active.  Events in this approach can be sent only to adjacent elements, superior or subordinate.
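Here’s a minimal sketch of that decomposition in code; the class and element names are illustrative, not anything from ExperiaSphere:

```python
# A sketch of hierarchical state/event correlation, using the Access
# example above.  Names and behavior are illustrative only.

class Element:
    def __init__(self, name, children=None):
        self.name = name
        self.state = "orderable"
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

    def receive(self, event, sender=None):
        if event == "Order" and self.state == "orderable":
            self.state = "Setup"
            if self.children:
                for child in self.children:   # Orders go to subordinates only
                    child.receive("Order", self)
            else:
                # A leaf element commits actual resources, then reports upward.
                self.state = "Active"
                self.parent.receive("Operating", self)
        elif event == "Operating" and self.state == "Setup":
            # Go Active only when every subordinate element is Active.
            if all(c.state == "Active" for c in self.children):
                self.state = "Active"
                if self.parent:               # report to the superior only
                    self.parent.receive("Operating", self)

access = Element("Access", [Element("Access Connection"),
                            Element("Access Termination")])
access.receive("Order")
print(access.state)   # -> "Active"
```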

You can see from this very basic example that an implementation description of such a system would convey nothing to a non-programmer.  That’s almost surely how we ended up with monolithic implementations.  We tried to describe what we wanted in functional terms, because those terms would be more widely understood, but we then translated the functional model, which looks (and is) monolithic, directly into an implementation, and so implemented a monolith.

The cloud community has been advancing toward a true cloud-native model on a broad front, and here again there’s some hope from that quarter.  The advent of “serverless” computing (functions, lambdas, or whatever you’d like to call them) has launched some initiatives in event orchestration, the best known of which is Amazon’s Step Functions.  As they stand, they don’t fit the needs of service lifecycle automation, but they could easily evolve to fit, particularly given that the whole space is only now developing and there’s a lot of interest from all the cloud providers and also from IBM/Red Hat and VMware.

It would be a major advance for cloud-native in general, and telecom software-centric thinking in particular, if the maturation of a technical approach came with a vision of how to describe these kinds of orchestrated-event systems so everyone could grasp the functional details.  It’s not an easy problem to solve, though, and we’ll have to keep an eye on the space to see how things shake out.