Taking a Model-Side View of Ops Automation

Earlier this week I blogged about virtualization and infrastructure abstraction.  I left the modeling aspect of that to the end because of my intentional focus on the abstraction side.  Now I’d like to look at the same problem of virtualization from the modeling side, to see where we end up and whether the conclusions from that direction support the ones you get when you start with the notion of an “abstraction layer”.

Operations automation, whether we’re talking about applications or service features, is usually based on one of two models that were popularized in the “DevOps” (Development/Operations) automation movement that started about a decade ago.  One model, the older of the two, is an extension of the “shell scripts” or “batch files” routinely used on computer systems to execute a repetitively used series of commands.  This is now called the “prescriptive” or “imperative” model; it tells the system explicitly what to do.  The other, the “declarative” model, describes the goal state of a system and leaves it to the automation to take the (invisible) steps needed to achieve it.
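
To make the contrast concrete, here’s a minimal Python sketch of the two styles.  Everything in it is invented for illustration; it’s not any real tool’s API.  The imperative script spells out every step in order, while the declarative approach states the goal and derives only the steps needed to close the gap:

```python
# Minimal sketch contrasting the two styles; names are illustrative, not a real tool's API.

# Imperative/prescriptive: the script spells out each command, in order.
def deploy_imperative(host_cmds: list) -> None:
    host_cmds.append("install my-router-vnf")
    host_cmds.append("write /etc/vnf/router.conf")
    host_cmds.append("start my-router-vnf")

# Declarative: describe the goal state; the automation derives the steps.
DESIRED = {"installed": True, "configured": True, "running": True}

def reconcile(observed: dict) -> list:
    """Compare observed state to the goal and return only the steps needed."""
    steps = []
    if DESIRED["installed"] and not observed.get("installed"):
        steps.append("install my-router-vnf")
    if DESIRED["configured"] and not observed.get("configured"):
        steps.append("write /etc/vnf/router.conf")
    if DESIRED["running"] and not observed.get("running"):
        steps.append("start my-router-vnf")
    return steps

# A host that's already installed but not configured or running needs only two steps.
print(reconcile({"installed": True}))
```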

What we call “intent modeling” today is an expansion of, or variation on, the declarative approach.  With an intent model, an application or service is divided into functional elements, each of which is represented by a “black box” that has externally visible interfaces and properties but whose interior processes are opaque.  These elements don’t represent the functionality of the pieces in the data plane, but rather the lifecycle processes associated with them, which we could call the management or control plane.  If you tell an intent-modeled element to “deploy”, for example, the interior processes (which, you recall, are invisible) do whatever is needed to accomplish that goal or “intent”.  Any element that meets the intent is equivalent, from the outside, to any other such element, so differences in configuration and infrastructure can be accommodated without impacting the model at the higher level.
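
Here’s a rough sketch of that “black box” idea, again with names I’ve invented for illustration rather than anything from a real standard.  The outside world sees only the intents and the reported state; the steps taken to satisfy an intent stay inside the element:

```python
from abc import ABC, abstractmethod

class IntentElement(ABC):
    """An intent-modeled element: externally visible interface, opaque interior."""

    @abstractmethod
    def deploy(self) -> None: ...      # "make it so"; how is hidden inside

    @abstractmethod
    def state(self) -> str: ...        # externally visible property

class VendorARouterElement(IntentElement):
    """One possible interior: a specific vendor/technology realization."""
    def __init__(self):
        self._deployed = False
    def deploy(self) -> None:
        # Opaque interior process: could be device CLI, an EMS call, a VM spin-up...
        self._deployed = True
    def state(self) -> str:
        return "active" if self._deployed else "idle"

# Any element that satisfies the intent is interchangeable from the outside.
def lifecycle_step(element: IntentElement) -> str:
    element.deploy()
    return element.state()

print(lifecycle_step(VendorARouterElement()))   # 'active'
```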

This raises a very important point about modeling, which is that the model represents the set of operations processes associated with lifecycle automation.  There are process steps that are defined by the model, and others that are embedded within model elements, opaque to the outside.  Model elements in a good operations model could be hierarchical, meaning that inside a given element could be references to other elements.  This hierarchical approach is how a general function (“Access-Element”) might be decomposed into a specific kind/technology/vendor element.  In most cases, the bottom of any hierarchical “inverted tree” would be an element that decomposed into an implementation rather than another model hierarchy.
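
A sketch of what that hierarchical decomposition might look like (all the names here are made up for illustration): a generic “Access-Element” decomposes into either another model layer or a leaf that binds to an implementation:

```python
# Illustrative-only sketch of a hierarchical model; element names are invented.

class LeafElement:
    """Bottom of the tree: decomposes into an implementation, not another model."""
    def __init__(self, name, implementation):
        self.name, self.implementation = name, implementation
    def deploy(self):
        return [f"{self.name} -> {self.implementation}"]

class CompositeElement:
    """An element whose interior is a set of references to other elements."""
    def __init__(self, name, children):
        self.name, self.children = name, children
    def deploy(self):
        steps = []
        for child in self.children:            # decomposition, level by level
            steps.extend(child.deploy())
        return steps

access = CompositeElement("Access-Element", [
    CompositeElement("Access-Element/DSL", [
        LeafElement("DSLAM-Config", "vendor-A element-management API"),
    ]),
    LeafElement("Access-Uplink", "vendor-B service-management API"),
])

for step in access.deploy():
    print(step)
```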

The qualifier “in most cases” here arises from the fact that there’s an intrinsic limit to how low a model hierarchy can go.  A management system API, which is probably the logical bottom layer of a model, could represent anything from the specific control of an individual device to a broad control of features of a device community.  Remember the old OSI management model of element/network/service management layers?  If the API you’re manipulating in the end is a service-level API, then you can’t expect to model devices—they’re inside the bottom-level black box.

This means that there are really two levels of “automation” going on.  One layer is represented by the model itself, which is under the control of the organization that writes and maintains the model.  The other is represented by the opaque processes inside the bottom model element, which are under the control of whoever wrote the implementation of those processes.  In our example, that would be the implementation of the service management API.  Whatever isn’t in the bottom layer has to be in the top.
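
Putting the last two points together, a bottom-level element is essentially a thin wrapper around whatever management API you’ve been handed; everything that API hides belongs to the lower automation, and everything it doesn’t belongs to the model.  A hedged sketch, with hypothetical API names:

```python
class ServiceLevelAPI:
    """Stand-in for a vendor's service-management API: it hides the devices."""
    def create_vpn(self, endpoints):
        # Device selection, configuration, and sequencing happen in here,
        # under the control of whoever implemented this API - not the model.
        return f"vpn({','.join(endpoints)})"

class VpnLeafElement:
    """Bottom-level model element: all it can do is invoke the API's abstraction."""
    def __init__(self, api: ServiceLevelAPI, endpoints):
        self.api, self.endpoints = api, endpoints
    def deploy(self):
        return self.api.create_vpn(self.endpoints)   # the model can't see any deeper

print(VpnLeafElement(ServiceLevelAPI(), ["siteA", "siteB"]).deploy())
```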

Another implication of this point is that if different products or vendors offer different levels of abstraction in their management system APIs, there might be a different modeling requirement for each.  If Vendor A offers only element management access, then services have to be created by manipulating individual elements, which means that the model would have to represent that.  If Vendor B offers full service management, then the elements wouldn’t even be visible in that vendor’s management system, and could not be modeled at the higher level.
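
In sketch form (all names hypothetical), the same “VPN” service might have to be modeled to a different depth depending on which vendor’s API sits underneath:

```python
# Hypothetical sketch: the modeling depth depends on the abstraction the vendor exposes.

def model_for_vendor_a(sites):
    """Vendor A: element management only, so the model must orchestrate devices."""
    steps = []
    for site in sites:
        steps.append(f"configure edge-router at {site}")     # modeled explicitly
        steps.append(f"configure uplink port at {site}")
    steps.append("stitch tunnels between all sites")
    return steps

def model_for_vendor_b(sites):
    """Vendor B: full service management, so devices never appear in the model."""
    return [f"request VPN service covering {', '.join(sites)}"]

print(len(model_for_vendor_a(["A", "B", "C"])))   # 7 modeled steps
print(len(model_for_vendor_b(["A", "B", "C"])))   # 1 modeled step
```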

This same issue occurs if you use a model above a software tool that offers a form of resource abstraction.  Apache Mesos, the technology that’s the preferred enterprise solution for very large-scale virtualization and container deployments, presents a basic single-abstraction API for an arbitrarily large resource pool, and all the deployment and connection decisions are made within the Mesos layer.  Obviously you don’t then describe how those decisions should be made within your higher model layer.  If you don’t use something like Mesos, though, the modeling has to take over those deployment/connection tasks.
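
The point isn’t the Mesos API itself (which I’m not reproducing here), it’s the division of labor that a single resource-pool abstraction creates.  A hypothetical sketch of the two situations:

```python
import random

class ResourcePoolAbstraction:
    """Hypothetical stand-in for a Mesos-like layer: one pool, placement decided inside."""
    def __init__(self, hosts):
        self.hosts = hosts
    def deploy(self, workload: str) -> str:
        host = random.choice(self.hosts)       # placement logic lives below the model
        return f"{workload} placed on {host}"

class BareHosts:
    """No abstraction layer: the model itself must own placement and connection."""
    def __init__(self, hosts):
        self.hosts = hosts
    def deploy(self, workload: str, chosen_host: str) -> str:
        return f"{workload} placed on {chosen_host}"    # the model had to decide

pool = ResourcePoolAbstraction(["host1", "host2", "host3"])
print(pool.deploy("vRouter-image"))          # the model doesn't say where

bare = BareHosts(["host1", "host2", "host3"])
print(bare.deploy("vRouter-image", "host2")) # the model had to say where
```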

The cloud community, who I believe are driving the bus on virtualization and lifecycle automation, have already come down on the resource abstraction side of this debate, which to me means that for service lifecycle automation the network operator community should be doing the same thing.  That would align their work with the real development in the space.  To do otherwise would be to expect that network/service lifecycle automation and application/cloud automation will follow different paths with different tools.

My dispute with the NFV ISG, with ONAP, and with the ETSI ZTA stuff revolves around this point.  When we try to describe little details about how we’d pick the place to put some virtual function, we are discarding the experience of the cloud community and deliberately departing from their development path.  Can the networking community then totally replicate the cloud’s work, in a different way, and sustain that decision in an ever-changing market?  Can they do that when a hosted application component and a hosted virtual function are the same thing in every way except nomenclature?

That so many in the operator world are now infatuated with “cloud-native” functions only makes the separation of network functions and cloud applications sillier.  How cloud-native are you if you decide to build your transformation on developments that are in little or no way related to what’s going on in the cloud you’re trying to be native to?  It makes you wonder whether the term is anything more than fluff, an attempt to give new life to old and irrelevant initiatives.

We are, with developments like containers, Kubernetes, Mesos, Istio, and other cloud-related tools, reaching the point where the cloud has defined the ecosystem within which we should be considering future network operator transformation.  Surely everything above the connection layer of services is more like the cloud than like the network.  Much of what’s inside the connection layer (the hosted elements) is also cloud-like, and the way that connection/network devices relate to each other and to hosted components is totally cloud-like as well.  Why the heck are we not using these tools?  What ETSI should be doing now is not spawning new groups like ZTA, or continuing old groups that are only getting further off-track daily, as the NFV ISG is.  They should be aligning their goals to the cloud, period.

One critical issue here is the tension between modeling and resource abstraction, and in fact abstraction in general.  We could accomplish a lot even in the short term if we could convince vendors in the networking space to think about “abstraction APIs” and how these APIs could integrate with modeling tools like TOSCA.  This kind of thing would benefit not only the network transformation efforts of operators but also the cloud itself.  Resource abstraction is increasingly a part of the evolution of cloud tools, after all.
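
One way to picture what that binding might look like: a TOSCA-style node (sketched here as a plain Python dictionary rather than real TOSCA YAML, with an invented node type) maps onto a single call on the abstraction API, and everything below that call is the vendor’s problem:

```python
# Hypothetical sketch of binding a model node to an abstraction API; not real TOSCA.

tosca_like_node = {
    "type": "example.nodes.HostedFunction",   # invented node type
    "properties": {"image": "vfirewall:1.2", "cpu": 2, "memory_gb": 4},
}

class AbstractionAPI:
    """What we'd like vendors to expose: one call per abstract action."""
    def host(self, image: str, cpu: int, memory_gb: int) -> str:
        return f"hosted {image} ({cpu} vCPU, {memory_gb} GB)"  # interior is opaque

def orchestrate(node: dict, api: AbstractionAPI) -> str:
    props = node["properties"]
    return api.host(props["image"], props["cpu"], props["memory_gb"])

print(orchestrate(tosca_like_node, AbstractionAPI()))
```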

Along the way, the operators can raise some valid issues for the cloud community.  There is already cloud work being done on data-plane connectivity for hosted elements; it should be expanded to address the needs of virtualizing network data-plane elements like routers and switches.  There are constraints on hosting location and on the “proximity” of components to each other or to common failure sources that have long been recognized in networks and are only now becoming visible in cloud services (a sketch of what such a constraint might look like follows below).  This can be a give-and-take, but only if everyone is part of the same initiatives.  Let’s get it together.
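
As a closing illustration of the kind of constraint operators could bring to the table, a placement request might carry proximity and shared-failure-source rules that the abstraction layer has to honor.  This is a hypothetical sketch; the host attributes and constraint names are invented:

```python
# Hypothetical sketch of proximity/failure-domain constraints on placement.

HOSTS = {
    "host1": {"site": "edge-east",  "failure_zone": "power-A"},
    "host2": {"site": "edge-east",  "failure_zone": "power-B"},
    "host3": {"site": "core-metro", "failure_zone": "power-A"},
}

def place(component: str, near_site=None, avoid_zone=None) -> str:
    """Pick the first host that satisfies the stated constraints."""
    for host, attrs in HOSTS.items():
        if near_site and attrs["site"] != near_site:
            continue
        if avoid_zone and attrs["failure_zone"] == avoid_zone:
            continue
        return f"{component} -> {host}"
    return f"{component} -> no host satisfies constraints"

# Keep the data-plane element at the edge, but off the failure zone its mate uses.
print(place("vRouter-primary", near_site="edge-east"))
print(place("vRouter-backup", near_site="edge-east", avoid_zone="power-A"))
```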