Some Top-Down Service Lifecycle Modeling and Orchestration Commentary

Most of you know that I’ve been an advocate of intent-modeled, model-driven networking for almost two decades. This approach divides a service into functional/deployable elements, each represented by a “black box” intent model responsible for meeting an SLA. The approach has some major advantages, in my view, and also a few complications. Operators generally agree with this view.

I believe that a model-based approach, one aligned with seminal work done by the TMF on what was called “NGOSS Contract”, is the only way to make service lifecycle automation, service creation, and service-to-resource virtualization mapping work. I’ve laid out some of my thoughts on this before, but some operators have told me they’d like a somewhat more top-down view, one that’s aligned with current infrastructure and service realities. This is my attempt to respond.

The first and perhaps paramount factor is the relationship between functional division and “administrative control”. For the sake of discussion, let’s say that an API through which it’s possible to exercise operational control over a set of service elements is an administrative control point. Let’s also say that such a control point exposes multiple “functions”, visible features that an operations process can coerce through the control point. Obviously, this control point could represent a current management system or a new management tool.
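
To make that concrete, here’s a minimal Python sketch of the idea, assuming nothing beyond what’s described above; the class and method names are my own illustrative inventions, not any real management API.

```python
from typing import Callable, Dict

class AdministrativeControlPoint:
    """A hypothetical wrapper for a management API: it exposes named
    'functions' (visible features) that operations processes can coerce."""

    def __init__(self, name: str):
        self.name = name
        self._functions: Dict[str, Callable[..., None]] = {}

    def register_function(self, feature: str, handler: Callable[..., None]) -> None:
        # Each registered feature becomes coercible through this control point.
        self._functions[feature] = handler

    def coerce(self, feature: str, **params) -> None:
        # An operations process exercises control by coercing a feature.
        if feature not in self._functions:
            raise KeyError(f"{self.name} does not control feature '{feature}'")
        self._functions[feature](**params)

# Example: an existing management system wrapped as a control point.
cp = AdministrativeControlPoint("metro-ems")
cp.register_function("vpn-endpoint", lambda **p: print("provisioning", p))
cp.coerce("vpn-endpoint", site="1", bandwidth_mbps=20)
```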

My approach to modeling is also based on a division between the logical/functional and the actual/behavioral, meaning that it has a “service domain” and a “resource domain”. The former expresses the relationship between features and services, and the latter expresses how features map to the behavior of actual resources, including both software and hardware. The “bottom” of the service model has to “bind” to the “top” of the resource model in some way, and that binding is the basis for “deployment”. Once deployment has been completed, the fulfillment of the service-level agreement is based on the enforcement of the SLAs passed down the models (service and resource).

The top of the service model, which creates the actual overall SLA, has to “decompose” that SLA into subordinate SLAs for its child model elements, and each of them in turn must do the same. Each model element represents a kind of contract to meet its derived SLA. The model element relies on one of two things to enforce the contract: a notice from a child element that it has failed its own SLA (and by implication, no notice means it hasn’t), or, for a resource model, an event from a resource within that shows a fault that must either be remedied or reported.
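
A minimal sketch of that decomposition, again in Python and again with invented names; the even split of the latency budget below stands in for whatever decomposition policy a real model element would encode.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModelElement:
    """A hypothetical intent-modeled element: a 'contract' to meet its derived SLA."""
    name: str
    children: List["ModelElement"] = field(default_factory=list)
    parent: Optional["ModelElement"] = None
    sla_latency_ms: float = 0.0

    def decompose_sla(self, latency_ms: float) -> None:
        # Accept the SLA handed down from the parent...
        self.sla_latency_ms = latency_ms
        # ...and decompose it into subordinate SLAs for the child elements.
        # A naive even split of the latency budget; real policy would differ.
        if self.children:
            share = latency_ms / len(self.children)
            for child in self.children:
                child.parent = self
                child.decompose_sla(share)

    def report_failure(self) -> None:
        # Silence implies compliance; only a failure generates a notice upward.
        if self.parent:
            print(f"{self.parent.name}: child {self.name} failed its SLA")

svc = ModelElement("vpn", children=[ModelElement("access"), ModelElement("core")])
svc.decompose_sla(40.0)
print([(c.name, c.sla_latency_ms) for c in svc.children])  # each gets 20.0
svc.children[0].report_failure()
```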

The binding between service model and resource model is based on supported resource “behaviors”, which are functionalities or capabilities that the resource administration has committed to support. The resource model for a behavior would then be divided based on the administrative control points through which the behavior was controlled. There might be one such point, or there might be many, depending on just how the collective set of resources was managed.
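
In code terms, the binding could look something like the registry sketched below; the registry itself, and the rule that the binder just takes the first offer, are simplifying assumptions on my part (a real binder would select against the SLA).

```python
from typing import Dict, List

class BehaviorRegistry:
    """Illustrative only: resource administrations advertise 'behaviors'
    (capabilities they commit to support), each reachable through one or
    more administrative control points."""

    def __init__(self):
        self._offers: Dict[str, List[str]] = {}  # behavior -> control points

    def advertise(self, behavior: str, control_point: str) -> None:
        self._offers.setdefault(behavior, []).append(control_point)

    def bind(self, behavior: str) -> str:
        # A bottom-level service element binds to a control point that has
        # committed to the behavior; here we naively take the first offer.
        offers = self._offers.get(behavior)
        if not offers:
            raise LookupError(f"no administration advertises '{behavior}'")
        return offers[0]

registry = BehaviorRegistry()
registry.advertise("ip-vpn", "metro-ems")
registry.advertise("ip-vpn", "core-sdn-controller")
print(registry.bind("ip-vpn"))  # either control point could realize the service
```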

The reason for the domain division is to loosely couple services to resources. With this approach, it would be possible to create a service model that consumed a behavior set that could then be realized by any combination of resource providers, without modification. Anyone could author “service models” to fit a set of behaviors, and anyone who could advertise those behaviors would be able to support the services. This could be used to focus standards activities on behavior definition.

Another feature of this approach is “functional equivalence”. Any implementation of a behavior could be mapped to a service element that consumed it, which means that you could deploy both features based on network behaviors (router networks, for example) and features created by hosting software instances. In fact, a behavior could even be implemented by a totally manual process, so that if actual field provisioning were needed, that too could be reflected in the way the behavior was implemented.
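
Here’s what functional equivalence might look like in code, as a sketch; the behavior and the three implementations are hypothetical, but the point is that the consuming service element neither knows nor cares which one it gets.

```python
from abc import ABC, abstractmethod

class VpnBehavior(ABC):
    """Any implementation of the behavior looks the same to a consumer."""

    @abstractmethod
    def deploy(self, site: str) -> None: ...

class RouterNetworkVpn(VpnBehavior):
    def deploy(self, site: str) -> None:
        print(f"configuring the router network for {site}")

class HostedInstanceVpn(VpnBehavior):
    def deploy(self, site: str) -> None:
        print(f"spinning up a hosted software instance for {site}")

class ManualProvisioningVpn(VpnBehavior):
    def deploy(self, site: str) -> None:
        # Even field provisioning can implement the behavior; the model
        # element would open a work order and wait on its completion event.
        print(f"dispatching a field work order for {site}")

# The consuming service element treats all three as equivalent.
for impl in (RouterNetworkVpn(), HostedInstanceVpn(), ManualProvisioningVpn()):
    impl.deploy("site-1")
```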

To return to SLAs, each model element in both domains has the common responsibility to meet the SLA it commits to, to remedy its performance and get back within the SLA, or to report an SLA failure. During service creation, or “deployment/redeployment”, each model element has a responsibility to select a child element that can meet its SLA requirements on deployment, and to re-select if the previously selected element reports a failure. The SLA would necessarily include three things: the service parameters expected, the SLA terms, and the location(s) where the connections to the element were made and where the SLA would be expected to be enforced. “I need a VPN with 20 Mbps capacity, latency x, packet loss y, at locations 1, 2, and 3”. That “contract” would then be offered to child elements or translated into resource control parameters and actions.
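
The three parts of the contract map naturally onto a simple data structure. A sketch, with field names of my own choosing and arbitrary values standing in for the “latency x, packet loss y” of the example:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class SlaContract:
    """The three things an SLA carries in this scheme."""
    bandwidth_mbps: float         # the service parameters expected
    max_latency_ms: float         # the SLA terms...
    max_packet_loss_pct: float    # ...continued
    locations: Tuple[str, ...]    # where connections are made and the SLA enforced

# "I need a VPN with 20 Mbps capacity, latency x, packet loss y,
#  at locations 1, 2, and 3." (x and y are arbitrary values here.)
offer = SlaContract(bandwidth_mbps=20, max_latency_ms=30,
                    max_packet_loss_pct=0.1, locations=("1", "2", "3"))
# The contract is then offered to child elements, or translated into
# resource control parameters and actions at the binding point.
print(offer)
```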

At one level, this sounds complicated. If, for example, we had a service model that contained a dozen elements, we would have a dozen management processes “running” during the lifecycle of the service. If the resource model contained a half-dozen elements, there would then be 18 such processes. Some could argue that this is a lot of management activity, and it is.

But what is the actual process? It’s a state/event table or graph that references what are almost surely microservices or functions that run only when an event is recognized. A service or resource “architect” who builds the model would either build or identify each process referenced. Many of the processes would be common across all models, particularly in the service domain where deployment/redeployment is based on common factors. I’ve actually built “services” based on this approach, and process creation wasn’t a big deal, but some might think it’s an issue.
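
A toy state/event table, to show how little is actually “running”; the states, events, and stub handlers are invented, but the dispatch pattern is the one described above.

```python
from typing import Callable, Dict, Tuple

# Stub processes; in practice these would be microservices or functions.
def start_deploy(ctx: dict) -> str: print("deploying", ctx); return "deploying"
def confirm_active(ctx: dict) -> str: print("active", ctx); return "active"
def redeploy(ctx: dict) -> str: print("redeploying after fault", ctx); return "deploying"

# The model element's state/event table: (state, event) -> process.
STATE_EVENT_TABLE: Dict[Tuple[str, str], Callable[[dict], str]] = {
    ("ordered",   "activate"):  start_deploy,
    ("deploying", "deployed"):  confirm_active,
    ("active",    "sla-fault"): redeploy,
}

def handle_event(state: str, event: str, ctx: dict) -> str:
    # A process runs only when its (state, event) pair is recognized;
    # nothing is resident between events.
    handler = STATE_EVENT_TABLE.get((state, event))
    if handler is None:
        print(f"ignored: event '{event}' in state '{state}'")
        return state
    return handler(ctx)

state = "ordered"
state = handle_event(state, "activate", {"element": "vpn-core"})
state = handle_event(state, "deployed", {"element": "vpn-core"})
```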

The upside of this approach is that each model element is essentially an autonomous application that’s event-driven and that can be run anywhere. An event-handling process is needed to receive events, consult the state/event reference, and activate the designated process, but even that “process” could be a process set with an instance in multiple places. In my own test implementation, I had a single service-domain process “owned” by the seller of the service, and resource-domain processes “owned” by each administrative domain that offered behaviors. This is possible because my presumption was (and is) that model elements can generate events only to their own parent or child elements.
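
That parent/child restriction is easy to express, and it’s what makes the ownership split workable; a sketch, with invented names:

```python
from typing import Optional, Set

class ElementMailbox:
    """Event routing for one model element: it may generate events only
    to its own parent or its child elements, nothing else."""

    def __init__(self, name: str, parent: Optional[str] = None):
        self.name = name
        self.parent = parent
        self.children: Set[str] = set()

    def send(self, target: str, event: str) -> None:
        # Enforce the scoping rule before anything leaves this element.
        if target != self.parent and target not in self.children:
            raise PermissionError(
                f"{self.name} may not signal {target}: not a parent or child")
        print(f"{self.name} -> {target}: {event}")

core = ElementMailbox("vpn-core", parent="vpn-service")
core.children.add("metro-behavior")
core.send("vpn-service", "sla-fault")  # allowed: its parent
core.send("metro-behavior", "deploy")  # allowed: its child
```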

The place where special considerations are needed is the binding point between the domains. A bottom-level service model element has to exchange events with the top-level behavior element in the resource domain. In my implementation, the binding process was separate and provided for the event exchange between what were two different event queues.
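
A minimal version of that binding process, assuming each domain exposes a simple event queue; the one-way translation here is a simplification, since a real binder would carry events in both directions.

```python
import queue
import threading

# The only place the two domains touch: a binder ferries events between
# the service-domain queue and the resource-domain queue.
service_q: "queue.Queue[str]" = queue.Queue()
resource_q: "queue.Queue[str]" = queue.Queue()

def binder() -> None:
    # Pull events off the service side and hand them to the resource side.
    while True:
        event = service_q.get()
        if event == "shutdown":
            break
        resource_q.put(f"bound:{event}")

t = threading.Thread(target=binder, daemon=True)
t.start()
service_q.put("deploy ip-vpn")
print(resource_q.get())  # -> bound:deploy ip-vpn
service_q.put("shutdown")
t.join()
```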

This approach also conserves lower-level management processes in the resource domain, processes that are likely already in place. All that’s needed is to wrap the API of an administrative control point in an intent model (a resource model element) that can coerce behaviors; those behaviors can then be “advertised” to bind with services. This is possible at any level, and at multiple levels, meaning that if there is some over-arching service management system in place, that system could advertise behaviors on behalf of what it controls, and so could any lower-level control APIs.
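
As a last sketch, here’s what that wrapping might look like; LegacyEms and its set_path method are stand-ins for whatever management API is already in place.

```python
class LegacyEms:
    """Stand-in for an existing management system; the API is invented."""
    def set_path(self, a: str, b: str, mbps: int) -> None:
        print(f"EMS: path {a}-{b} at {mbps} Mbps")

class IntentWrapper:
    """A resource model element that wraps the control point's API,
    coerces behaviors through it, and advertises what it can commit to."""

    def __init__(self, ems: LegacyEms):
        self._ems = ems

    def advertised_behaviors(self) -> list:
        return ["point-to-point-path"]

    def coerce(self, behavior: str, **sla) -> None:
        # Translate the behavior plus its SLA terms into the legacy call.
        if behavior == "point-to-point-path":
            self._ems.set_path(sla["a"], sla["b"], sla["mbps"])

wrapper = IntentWrapper(LegacyEms())
print(wrapper.advertised_behaviors())  # what gets "advertised" to bind with services
wrapper.coerce("point-to-point-path", a="1", b="2", mbps=20)
```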

For those who wonder about ONAP and my negative views on it, I’m hoping this blog helps explain my objections. ONAP is, IMHO, a telco view of how cloud-centric service lifecycle management would work. It’s monolithic and it doesn’t address many of the issues I’ve noted because it doesn’t rely on intent modeling or service/resource models based on intent models. I don’t think that ONAP was architected correctly at the start, I don’t believe they want to fix it (any more than the NFV ISG really wants to fix NFV), and I don’t believe it could be fixed even if they wanted to, without starting over.

I’m not saying that my approach is the only one that would work, just that I believe it would work and that I’ve done some proof-of-concept development to prove out the major points. I’d love to see some vendor or body take up the issue from the top down, as I have, and I’d be happy to chat with a group that takes on that responsibility.