Why the Functional Model of a Zero-Touch Solution Is Important

Scope of impact is critical to the success of zero-touch automation, whether we’re talking about the general case of managing application/service lifecycles or the specific case of supporting network operator transformation.  There’s a lot of work involved in deploying and sustaining something useful, and the more pieces that are involved, the more expensive and risky the lifecycle processes become.  If you grab only the low apples, you may not get more than a tiny bite of the pie.

While scope of impact is important, it’s not the only thing that matters.  The second of the trio of zero-touch automation issues I promised to address is the functional model used by the software solution.  The functional model determines just what a zero-touch solution can really do, because it determines what the solution really knows.

Zero-touch automation, as applied to network service lifecycle automation, involves two things.  First, knowledge of the actual lifecycle, as a stepwise progression from nothing to something that’s running, being used, and being paid for.  Second, knowledge of changes in conditions (events) that impact the status of the service in some way.  In these terms, zero-touch automation is the execution of service processes in response to events, filtered by the combination of the current lifecycle state and the “goal state”.
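To make that definition concrete, here’s a minimal sketch of the idea in Python; the state names, event names, and process names are purely illustrative assumptions, not drawn from any standard or product.

```python
# Illustrative only: a zero-touch dispatcher picks the service process to run
# from the combination of current lifecycle state, goal state, and the event.
def select_process(current_state: str, goal_state: str, event: str) -> str:
    # The same event means different things in different lifecycle states.
    if (current_state, event) == ("Orderable", "ServiceOrdered"):
        return "deploy-service"
    if (current_state, event) == ("Deploying", "ResourceFailure"):
        return "redeploy-failed-element"
    if (current_state, event) == ("Active", "ResourceFailure"):
        # Only remediate if the goal is still to keep the service active.
        return "remediate-and-restore" if goal_state == "Active" else "tear-down"
    return "log-and-ignore"        # the event isn't meaningful in this state

# Example: a resource failure during deployment triggers a redeploy process.
print(select_process("Deploying", "Active", "ResourceFailure"))
```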

The popular vision of zero-touch implementation is a kind of Isaac Asimov-ish “I, Robot” or Star Wars R2D2 concept, where the implementation is endowed with human-like intelligence and understands both the goals and the things happening around it.  Perhaps we’ll get to that, but right now we can’t ask R2 to run our networks, so we have to look at more pragmatic approaches.

The software DevOps world has given us two basic ways of thinking about automating software-related tasks.  One, the prescriptive model, says that you do specific things in response to events.  This is what a human operator in an operations center might do: handle the things that are happening.  The other is the descriptive model, which says that there is a goal state, a current state, and a perhaps-implicit event, and the combination indicates what has to be done to get from where you are to where you want to be.  In networking, another concept seems to have taken pride of place in current thinking: intent modeling.
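As a rough illustration (hypothetical function and field names, not any particular tool’s API), the prescriptive style ties actions directly to the event, while the descriptive style derives actions by comparing the current state to the goal state:

```python
# Illustrative only: the two DevOps automation styles described above.

# Prescriptive: specific steps are tied directly to the event.
def handle_link_down_prescriptive(event):
    return [f"reroute traffic around {event['link']}",
            f"open trouble ticket for {event['link']}"]

# Descriptive: the event is implicit; actions fall out of comparing the
# current state against the goal state.
def reconcile_descriptive(current, goal):
    actions = []
    if current["active_links"] < goal["active_links"]:
        actions.append("provision replacement link")
    if current["capacity_gbps"] < goal["capacity_gbps"]:
        actions.append("scale out capacity")
    return actions

print(handle_link_down_prescriptive({"link": "core-7"}))
print(reconcile_descriptive({"active_links": 3, "capacity_gbps": 40},
                            {"active_links": 4, "capacity_gbps": 40}))
```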

In intent-model systems, a functional element is a black box, something that is known by its properties from the outside rather than by what’s inside.  The properties describe what the functional element is supposed to do or offer (its “intent”).  In common practice, intent models are nested to reflect the structure of a service.  You might have a “service” model at the top, below which are “high-level-function” models, each of which is successively decomposed into lower-level elements until you reach the point where a functional element is actually implemented/deployed.
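A sketch of what such a nested model tree might look like as data is below; the service, property, and element names are invented for illustration.  Each node is known only by its externally visible intent, and decomposition stops where a node maps to something actually deployable.

```python
# Illustrative only: a nested intent-model tree. Each node is a black box
# described by its externally visible properties; leaves map to something
# that is actually deployed.
from dataclasses import dataclass, field
from typing import List

@dataclass
class IntentModel:
    name: str
    intent: dict                          # externally visible properties/SLA
    children: List["IntentModel"] = field(default_factory=list)
    deployable: bool = False              # True where decomposition bottoms out

vpn_service = IntentModel("business-vpn", {"sites": 12, "availability": "99.99%"}, [
    IntentModel("access", {"bandwidth_mbps": 100}, [
        IntentModel("vCPE-instance", {"host_pool": "edge"}, deployable=True)]),
    IntentModel("core-transport", {"latency_ms": 20}, [
        IntentModel("MPLS-path", {"provider": "any"}, deployable=True)]),
])
```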

There are two properties of intent models that are important to zero-touch automation.  The first is that each intent model is responsible for fulfilling its intent or notifying the superior element of its failure to do so.  What’s inside is invisible, so nothing outside is able to manage it.  The second is that all implementations of a given intent model are by definition equal and compatible.  If you can’t see inside the black box, you can’t tell what’s different or the same.  That means all implementations of a given intent model relate to the rest of the service the same way, and are managed the same way.

The nice thing about intent models is that you can stick either prescriptive or descriptive behavior inside them, as long as they behave as promised or report their failures.  Arguably, intent modeling is a better high-level way of looking at service or application lifecycles for that reason.  It also means that whatever DevOps tools are available can be exploited in the implementation of intent models.  Hey, it’s a black box.

The missing link here is that while what goes on inside an intent model is invisible, what gets outside one cannot be.  Remember that a given model has to either do its job or report its failure.  The “to whom?” question is answered fairly easily: to whatever model contains it, or to the service/application process that spawned the model tree if it’s the top model in such a tree.  The question of what the report is, and what it does, is more complicated.

We obviously can’t say “X broke; too bad!” and let it go.  The “X broke” event is then up to X’s superior object to handle.  Since the conditions of all its subordinate objects are asynchronous with respect to each other (and everything else in the hierarchy), the most convenient way to address event interpretation is via the current-state-and-event model.  When a service is ordered, it might move from the “Orderable” to the “Deploying” state.  The superior service intent model element might then send an event to its subordinates, “Ordering” them as well and moving them to “Deploying”.  Eventually, if all goes well, the bottom objects would reach the “Active” state, and that would pass up the line, with each superior object reporting “Active” when all its subordinates are active.
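Here’s a hypothetical sketch of that cascade, using the state names from the text; the class and event names are assumptions for illustration.  The ordering event ripples down the tree, and “Active” reports ripple back up once every subordinate of a node is active.

```python
# Illustrative only: per-intent-model state/event handling in a model tree.
# "Order" cascades down; "Active" reports ripple back up when all of a
# node's subordinates are active.
class ModelNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.state = "Orderable"
        self.parent = None
        for child in self.children:
            child.parent = self

    def handle_event(self, event):
        if event == "Order" and self.state == "Orderable":
            self.state = "Deploying"
            if not self.children:          # leaf: real deployment happens here
                self.report_up("Active")
            for child in self.children:
                child.handle_event("Order")
        elif event == "SubordinateActive" and self.state == "Deploying":
            if all(c.state == "Active" for c in self.children):
                self.report_up("Active")

    def report_up(self, new_state):
        self.state = new_state
        if self.parent:
            self.parent.handle_event("SubordinateActive")

service = ModelNode("service", [ModelNode("access", [ModelNode("vCPE")]),
                                ModelNode("core", [ModelNode("MPLS-path")])])
service.handle_event("Order")
print(service.state)    # "Active" once every subordinate has reported active
```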

If a fault occurs somewhere, the intent model that went bad would report “Fault” to its superior and enter a “Fault” state.  The superior object then has the option of redeploying the failed element, rethinking its whole decomposition into subordinates based on the failure (redeploying them all), or reporting “Fault” up the line.  Remediation would involve a similar cascade of events.
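Extending the hypothetical ModelNode sketch above, the fault path might look something like this; the retry policy and function names are assumptions for illustration only.

```python
# Illustrative only, extending the ModelNode sketch above: the fault path.
# The failing element reports "Fault" upward; its superior can retry it,
# rework its decomposition, or pass the fault further up the line.
def report_fault(node):
    node.state = "Fault"
    if node.parent:
        handle_subordinate_fault(node.parent, node)

def handle_subordinate_fault(superior, failed_child, retries=1):
    superior.state = "Deploying"              # superior re-enters remediation
    if retries > 0:
        failed_child.state = "Orderable"
        failed_child.handle_event("Order")    # try redeploying the failed element
    else:
        # Rethinking the whole decomposition would go here; in the simplest
        # case the superior just reports "Fault" up the line.
        report_fault(superior)
```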

The nice thing about this approach is that you can visualize a model hierarchy as a set of data elements, and you can host the state/event processes anywhere you have resources and can read the appropriate data elements.  There doesn’t need to be a process if there’s nothing happening at a given intent-model level; you spin one up as needed, and as many as you need.  Everything is elastic and scalable.
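A minimal sketch of that “models as data, processes on demand” idea follows; the store layout and transition table are illustrative assumptions.  Any stateless worker, hosted anywhere, can pick up an event, read the model’s data element, apply the state/event logic, and write the result back before exiting.

```python
# Illustrative only: the model hierarchy as data, with stateless workers
# spun up per event. Any worker with access to the store can run this.
MODEL_STORE = {
    "service-123": {"state": "Deploying", "parent": None},
}

TRANSITIONS = {
    ("Orderable", "Order"): "Deploying",
    ("Deploying", "AllSubordinatesActive"): "Active",
    ("Active", "Fault"): "Fault",
}

def on_event(model_id, event):
    record = MODEL_STORE[model_id]                    # read the data element
    new_state = TRANSITIONS.get((record["state"], event), record["state"])
    record["state"] = new_state                       # write back and exit
    return new_state

print(on_event("service-123", "AllSubordinatesActive"))   # -> "Active"
```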

All of this is lovely, and in my view the right way of doing things.  There’s also a linear model of the process, which says that there is no per-intent-model state/event process at all; rather, an event is simply handed to a software tool, which then figures out what the event means by exploring the service structure in some way.  There’s no concurrency in this approach, no scalability, no real resiliency.  And yet this is how zero-touch or service lifecycle process management is often visualized, and it’s how the NFV ISG’s E2E model describes it: a set of software components connected by interfaces, not a set of objects controlling state/event process management.

If, as I’ve said, ONAP/ECOMP is the only path forward toward achieving zero-touch automation simply because the scope issue is so complicated that nothing else will be able to catch up, then does ONAP support the right model?  As far as I can tell, the current software is much more linear than state/event in orientation.  It wouldn’t have to stay that way if the software were designed as a series of functional components, and I might be wrong in my assessment (I will try to check with the ONAP people to find out), but in any event the documentation doesn’t describe the models in detail, indicate the state/event relationships, and so forth.  More documentation, and perhaps new work, will be needed.

It should be considered essential to do both, for two reasons.  First, there is a risk that a large number of events would swamp a linear software model.  Without a high degree of scalability, zero-touch automation is a trap waiting to be sprung by a major incident.  Second, ONAP’s credibility could be threatened by even near-term issues associated with limited linear processes, and that would put the only credible path to zero-touch automation at risk.  Sometimes you have to fix things rather than start over, and this is such a time.