5G, Virtualization, and Complexity: The Ops Dimension

One operator offered an interesting view of hosted-function networking: “In some ways I hope it’s different from devices, and in others I hope it isn’t.”  The “I-hope-it-is” position is based on the economic driver for using hosted virtual functions; it would be cheaper than proprietary appliances.  The “I-hope-it-isn’t” position relates to the fear that the network operators don’t have a handle on the operational issues associated with function hosting, including operations cost and complexity and security/compliance.

If you host something instead of buying the same functionality in device form, you’ve generated a second tier or layer of infrastructure, below the functionality.  It’s convenient to use this model as the basis for our discussion here.  We have a “functional” layer that represents the way the functions themselves interact with each other and with their own EMS/NMS/SMS structure.  We have an “infrastructure” layer that represents the interactions involved in sustaining the hosting infrastructure and sustaining the lifecycle of the functions.  Got it?

There are two impacts that this dualism creates, obviously.  The first is that on top of whatever you’d expect to do to manage functional behavior, you now have the additional requirement of managing the infrastructure layer.  Nearly all the operators who have experience with this tell me that infrastructure lifecycle operations are more complex and expensive than functional, meaning device, management.  That shouldn’t be a surprise given the number of things that make up a cloud platform, but somehow it seems to have surprised most operators.  The second impact, at least as often overlooked, is in security and governance.

Both these impacts are important in next-gen networking, and more so where network technology relies explicitly on “virtualization” of devices/functions.  5G is such a place, but hardly the only one.  In this blog, I’m going to talk about operations considerations arising from the functional/infrastructure dualism, and I’ll address security in the next one.

One factor that complicates both issues is that having two “layers” actually means having three.  Yes, we have functional and infrastructure, but we also have what I’ve called a “binding layer”, the relationship between the two that’s established by provisioning applications onto infrastructure.  The relationship between the two layers is important for the obvious reason that you have to host functionality on infrastructure, but also because problems in either layer have to be considered in the context of the other, and perhaps fixed there.

The operational challenge created by the two separate layers has already been noted; you have two layers to operationalize instead of one and that’s pretty easily understood.  The binding layer introduces a different set of issues, issues relating to the relationship between the other two layers.  Since that relationship is at least somewhat transient and dynamic, normal management problems are exacerbated, and since the two layers are in a sense both independent and interdependent, the problems can be profound.

The binding process, the linkage of the layers into a service-based relationship, can be subsumed into the infrastructure layer or treated through a separate piece of “orchestration” software.  The latter seems to be the prevailing practice, so let’s assume it’s in place.  When a service is created, the functional elements are deployed (by orchestration), and any necessary pathways for information (data, control, and management planes) are connected.  How this is done could be very important, but the details may or may not be “remembered” in any useful way.  Whether that hurts depends on which of two broad management approaches is taken.

The first of these options is to assume that the top layer is responsible for the SLA and also for remediation.  The bottom layer is only responsible for fixing failures that aren’t functional failures.  If something in the infrastructure layer breaks and that breaks functionality, the presumption is that the functional layer will “re-bind” the functional element to infrastructure, which fixes the functional problem.  It’s then up to the infrastructure layer to actually fix the real problem.  If the infrastructure breakage can be remedied within the infrastructure layer through some simple replacement or re-parameterization, it never reaches the status of a functional failure at all.  Ships sort-of-in-the-night.
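To make the division of labor in this first option concrete, here is a minimal Python sketch.  Everything in it is an assumption for illustration: the Host and FunctionInstance classes, the rebind and repair helpers, and the names "upf-1", "host-a", and "host-b" are all hypothetical and don't come from any operator's system or orchestration product.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical infrastructure-layer resource."""
    name: str
    healthy: bool = True


@dataclass
class FunctionInstance:
    """Hypothetical functional-layer element bound to one host."""
    name: str
    host: Host


def rebind(function: FunctionInstance, pool: list[Host]) -> bool:
    """Functional layer restores service by moving to another healthy host."""
    for candidate in pool:
        if candidate.healthy and candidate is not function.host:
            function.host = candidate
            return True
    return False


def repair(host: Host) -> None:
    """Infrastructure layer fixes the real problem on its own schedule."""
    host.healthy = True


pool = [Host("host-a"), Host("host-b")]
upf = FunctionInstance("upf-1", pool[0])

pool[0].healthy = False       # infrastructure breakage that breaks the function
rebound = rebind(upf, pool)   # functional layer re-binds to restore the service
repair(pool[0])               # infrastructure layer repairs the failed host later
```

Note that nothing here records how "upf-1" was connected to anything else; the re-bind only works because a single element can be moved in isolation.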

“Orchestration” in this approach sits outside both layers, creating the original binding between the layers but not really using whatever knowledge it might have obtained out of its process of hosting and connecting things.  That means that there’s really no “systemic” vision of what a service is built from.  You can probably replace a component that has failed even without that knowledge, but a broader multi-component failure could leave you with no understanding of how to connect the new with the remaining non-failed elements.

The second option is to assume that the binding layer is the source of remediation, of lifecycle management.  In this option, the binding process and the “binding layer” are the real objective.  We need a map, a model, of infrastructure, stated in service terms.  The presumption is that there’s a “template” for service-building that is followed, a blueprint on how to commit infrastructure to the functional layer.  This is filled in with details, the specific commitments of resources, as deployment progresses.

With this approach, the model/map is available throughout the lifecycle of the service.  It’s possible to use it to rebuild a piece of the service, even a large piece, because the connections between elements are recorded, and where the rebuilding impacts those connections, they can then be rebuilt as well.  If there’s a problem at either the functional or infrastructure level, it’s possible to correlate the impact across the binding layer because we’ve recorded the bindings.
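A rough Python sketch of what such a recorded map might look like follows.  The ServiceModel class and its bind, connect, impacted_by, and rebuild methods are hypothetical names I've made up for illustration, and the function and host names are placeholders; a real model would carry far more detail than a function-to-host dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceModel:
    """Hypothetical service map, filled in as deployment progresses."""
    bindings: dict[str, str] = field(default_factory=dict)          # function -> host
    connections: set[frozenset[str]] = field(default_factory=set)   # connected pairs

    def bind(self, function: str, host: str) -> None:
        self.bindings[function] = host

    def connect(self, a: str, b: str) -> None:
        self.connections.add(frozenset((a, b)))

    def impacted_by(self, host: str) -> list[str]:
        """Correlate an infrastructure fault to the functions bound to it."""
        return sorted(f for f, h in self.bindings.items() if h == host)

    def rebuild(self, failed_host: str, spare_host: str) -> list[tuple[str, ...]]:
        """Re-bind impacted functions and return the connections to re-create."""
        moved = set(self.impacted_by(failed_host))
        for function in moved:
            self.bindings[function] = spare_host
        touched = {c for c in self.connections if c & moved}
        return sorted(tuple(sorted(c)) for c in touched)


model = ServiceModel()
model.bind("firewall", "host-a")
model.bind("router", "host-a")
model.bind("dns", "host-b")
model.connect("firewall", "router")
model.connect("router", "dns")

impacted = model.impacted_by("host-a")     # functions hit by a host-a failure
links = model.rebuild("host-a", "host-c")  # connections that must be rebuilt
```

Because the connections are recorded alongside the bindings, losing "host-a" takes out two functions at once and the model can still say which links have to be re-created, which is exactly what the first option can't do.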

The problem with this approach is that it requires a very sophisticated model, one that doesn’t try to model topology directly, but at some level has to incorporate it in at least virtual terms.  Some modeling approaches, like YANG, are more topology-related and not particularly suited to modeling bindings.  TOSCA can be enhanced to represent/record the structure of a “service” from a binding perspective, and a few firms have been playing with that.  I’ve used XML to do the modeling in the experiments I’ve run because it lets me control the way that relationships are represented without imposing any assumptions by the syntax of the modeling language.
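As a hedged illustration of recording bindings in XML, here is a short Python sketch using the standard library's xml.etree.ElementTree.  The schema is invented for this example and is not the one from my experiments: the service, function, binding, and connection elements and their attributes are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: element and attribute names are illustrative only.
SERVICE_XML = """
<service name="example-service">
  <function id="firewall"><binding host="host-a"/></function>
  <function id="router"><binding host="host-b"/></function>
  <connection from="firewall" to="router" plane="data"/>
</service>
"""

root = ET.fromstring(SERVICE_XML)

# Map each function to the host it is bound to.
bindings = {
    fn.get("id"): fn.find("binding").get("host")
    for fn in root.findall("function")
}

# Collect the recorded connections between functions, with their plane.
connections = [
    (c.get("from"), c.get("to"), c.get("plane"))
    for c in root.findall("connection")
]
```

The point of the sketch is only that the relationships (function to host, function to function) are stated explicitly in the document, rather than being implied by the syntax of the modeling language.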

The take-away here is that virtual-function infrastructure is going to add operations complexity somewhere.  Either you add “technology complexity” to the modeling phase, and carry it through deployment and lifecycle management, or you add complexity to the lifecycle management process when you encounter a condition that crosses the border between the functional and the infrastructure layers.  Right now, it’s my sense that we’re dodging the problem by not recognizing it exists, and if 5G really does commit us to broad-scale virtualization of functions/features, we’ll have to fix this border problem when we try to deploy and exploit it.