Some Further Thoughts on Service Lifecycle Automation

Everyone wants service lifecycle automation, which some describe as a “closed-loop” of event-to-action triggering, versus an open loop where humans have to link conditions to actions.  At one level, the desire for lifecycle automation is driven by the twin goals of reducing opex and improving service agility.  At another level, it’s driven by the exploding complexity of networks and services, complexity that would overwhelm manual processes.  Whatever its basis, the idea is hardly new in concept, but it may have to be new in implementation.

Every network management system deployed in the last fifty years has had at least some capability to trigger actions based on events.  Often these actions took the form of a script, a list of commands resembling the imperative form of DevOps.  Two problems plagued these systems from the start: events could arrive in floods that overwhelmed the management system, and the best response to an event usually required considerable knowledge of network conditions, which made framing a simple “action” script very difficult.

One mechanism proposed to address the problems of implementing closed-loop systems is that of adaptive behavior.  IP networks were designed to dynamically learn about topology, for example, and so to route around problems without specific operations center action.  Adaptive behavior works well for major issues like broken boxes or cable-seeking backhoes, but not so well for subtle issues of traffic engineering for QoS or efficient use of resources.  Much of the SDN movement has been grounded in the desire to gain explicit control of routes and traffic.

Adaptive behavior is logically a subset of autonomous or self-organizing networks.  Network architecture evolution, including the SDN and NFV initiatives, has given rise to two other approaches.  One is policy-based networking, where policies defined centrally and then distributed to various points in the network enforce the goals of the network owner.  The other is intent-modeled service structures, which divide a service into a series of domains, each represented by a model that defines the features it presents to the outside and the SLA it’s prepared to offer.  There are similarities and differences between these approaches, and the jury is still out on which might be best overall.

Policy-based networks presume that there are places in the network where a policy on routing can be applied, and that by coordinating the policies enforced at those places it’s possible to enforce a network-wide policy set.  Changes in policy have to be propagated downward to the enforcement points as needed, and each enforcement point is largely focused on its own local conditions and its own local set of possible actions.  It’s up to higher-level enforcement points to see a bigger picture.
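
To make that division of labor concrete, here’s a minimal sketch in Python of central distribution to scope-limited enforcement points.  The class names, the rule format, and the utilization condition are all illustrative assumptions, not features of any real policy framework:

```python
class EnforcementPoint:
    """Applies locally relevant rules; sees only its own local conditions."""
    def __init__(self, name, scope):
        self.name = name
        self.scope = scope      # set of device names this point controls
        self.rules = []

    def accept(self, rules):
        # Keep only the rules whose target falls inside this point's scope.
        self.rules = [r for r in rules if r["target"] in self.scope]

    def evaluate(self, conditions):
        # Fire the action of every rule whose condition holds locally.
        return [r["action"] for r in self.rules
                if r["condition"](conditions.get(r["target"], {}))]


class PolicyController:
    """Holds the network-wide policy set and propagates changes downward."""
    def __init__(self, points):
        self.points = points

    def distribute(self, rules):
        for point in self.points:
            point.accept(rules)


# Usage: one network-wide rule, pushed to two enforcement points; only the
# point whose scope matches the target keeps it.
edge = EnforcementPoint("edge", {"edge-router-7"})
core = EnforcementPoint("core", {"core-switch-1"})
controller = PolicyController([edge, core])
controller.distribute([{
    "target": "edge-router-7",
    "condition": lambda c: c.get("utilization", 0) > 0.8,
    "action": "shift-traffic-to-secondary-path",
}])
print(edge.evaluate({"edge-router-7": {"utilization": 0.9}}))
# -> ['shift-traffic-to-secondary-path']
```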

Policy enforcement sits at the bottom of policy distribution, and one of the major questions the approach has to address is how to balance the need for “deep manipulation” of infrastructure to bring about change against the fact that the deeper you go, the narrower your scope has to be.  Everybody balances these factors differently, so there is really no standard approach to policy-managed infrastructure; it depends on the equipment and the vendor, not to mention the mission/service.

Intent-modeled services say that both infrastructure and the services created over it can be divided into domains, each representing a set of cooperating elements doing something (the “intent”).  Because these elements represent their capabilities and the SLA they can offer, they have the potential to self-manage according to the modeled behavior.  “Am I working?”  “Yes, if I’m meeting my SLA!”  “If I’m not, take unspecified internal action to meet it.”  I say “unspecified” here because in this kind of system the remediation procedures, like the implementation, are hidden inside a black box.  If the problem isn’t fixed internally, a fault occurs that breaks the SLA and raises a problem in the higher-level model that incorporates the first one, and remediation continues at that level.
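
Here’s an equally minimal sketch of that “am I meeting my SLA?” loop, including the escalation to the incorporating model.  The single-parameter SLA (latency) and all the names are illustrative assumptions, not any standard:

```python
class IntentModel:
    """A black box: presents features and an SLA, hides its implementation."""
    def __init__(self, name, sla_max_latency_ms, parent=None):
        self.name = name
        self.sla_max_latency_ms = sla_max_latency_ms
        self.parent = parent   # the higher-level model that incorporates this one

    def measured_latency_ms(self):
        # A real implementation would query the hidden elements inside the box.
        raise NotImplementedError

    def remediate(self):
        # Unspecified internal action; returns True if the SLA was restored.
        return False

    def check(self):
        if self.measured_latency_ms() <= self.sla_max_latency_ms:
            return True                  # "Yes, I'm meeting my SLA."
        if self.remediate():
            return True                  # Fixed inside the black box.
        if self.parent is not None:
            return self.parent.check()   # Fault escapes; remediation moves up.
        return False                     # Top of the hierarchy; the fault stands.
```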

You can see that there’s a loose structural correspondence between these two approaches.  Both require a kind of hierarchy—policies in one case and intent models in another.  Both presume that “local” problem resolution is tried first, and if it fails the problem is kicked to a successively higher level (of policy, or of intent model).  In both cases, therefore, the success of the approach will likely depend on how effectively this hierarchy of remediation is implemented.  You want any given policy or model domain to encompass the full range of things that could be locally manipulated to fix something, or you end up kicking too many problems upstairs.  But if you have a local domain that’s too big, it has too much to handle and ends up looking like one of those old-fashioned monolithic management systems.

I’m personally not fond of a total-policy-based approach.  Policies may be very difficult to manipulate on a per-application, per-user, or per-service basis.  Most solutions simply don’t have the granularity, and those that do force operators through very complex policy-authoring processes to handle complicated service mixes.  There is also, according to operators, a problem when you try to apply policy control to heterogeneous infrastructure, and in particular to hosted elements of the sort NFV mandates.  Finally, most policy systems don’t have explicit events and triggers from level to level, which makes it harder to coordinate handing a locally recognized problem off to a higher-level structure.

With intent-based systems, it’s all in the implementation, both at the level of the modeling language/approach and in the way it’s applied to a specific service/infrastructure combination.  There’s an art to getting things right, and if it isn’t applied you end up with something that won’t work.  It’s also critical that an intent system define a kind of “class” structure for the modeling, so that five different implementations of a function appear as differences inside a given intent model, not as five different models.  There’s no formalism today to ensure this happens.

You can combine the two approaches; in fact, an intent-model system could envelop a policy system, or a policy system could drive an intent-modeled system.  The combination seems most likely to succeed where infrastructure is made up of a number of different technologies, vendors, and administrative domains.  Combining the approaches is helped along by the fact that inside an intent model there almost has to be an implicit or explicit policy.
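
A sketch of that combination, reusing the illustrative classes from the two earlier sketches: an intent model whose hidden remediation step is simply a policy push.  Again, the names and the single-metric SLA are assumptions for illustration, and measured_latency_ms would still be supplied by the actual implementation inside the box:

```python
class PolicyBackedIntent(IntentModel):
    """An intent model whose hidden remediation is a policy push."""
    def __init__(self, name, sla_max_latency_ms, controller, remedial_rules,
                 parent=None):
        super().__init__(name, sla_max_latency_ms, parent)
        self.controller = controller          # the policy system inside the box
        self.remedial_rules = remedial_rules  # the implicit policy, made explicit

    def remediate(self):
        # Internal action: push remedial policies to the enforcement points,
        # then re-test the SLA. The outside world sees only success or failure.
        self.controller.distribute(self.remedial_rules)
        return self.measured_latency_ms() <= self.sla_max_latency_ms
```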

We’re still some distance from having a fully accepted strategy here.  Variability in the application and implementation of either approach will dilute its effectiveness, forcing operators to change higher-level management definitions and practices because the lower-level pieces don’t work the same way across all vendors and technology choices.  I mentioned in an earlier blog that the first thing that should have been done in NFV, in defining VNFs, was to create a software-development-like class-and-inheritance structure: “VNF” as a superclass is subclassed into “Subnetwork-VNF” and “Chain-VNF”, and the latter perhaps into “Firewall”, “Accelerator”, and so forth.  This would maximize the chances of logical and consistent structuring of intent models, and thus of interoperability.
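
As a sketch of what that structure might look like, using the class names from the text (the methods are illustrative assumptions about what the superclass would standardize):

```python
class VNF:
    """Superclass: what any VNF must present to the modeling layer."""
    def ports(self): ...
    def sla(self): ...


class SubnetworkVNF(VNF):
    """A VNF realized as a multi-device subnetwork."""


class ChainVNF(VNF):
    """A VNF deployed as one element of a service chain."""


class Firewall(ChainVNF):
    """Any firewall: five vendor implementations would differ inside
    this class, not appear as five different models."""


class Accelerator(ChainVNF):
    """Likewise for acceleration functions."""
```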

The biggest question for the moment is whether all the orderly work that needs to be done will come out of something like NFV or SDN, where intent modeling is almost explicit but applications are limited, or from broader service lifecycle automation, where there are plenty of applications to work with but no explicit initiatives.  If we’re going to get service lifecycle automation, it will have to come from somewhere.