Who Will Orchestrate the Orchestrators (and How)

What exactly is “service automation” and who does it?  Those are the two questions that are top of the list for network operators and cloud providers today, and they’re ranking increasingly high on the list of enterprises as well.  As the complexity of networks increases, as technology changes introduce hosted elements in addition to discrete devices, and as cloud computing proliferates, everyone is finding that the cost of manual service operations is rising too fast, and the error rate even faster.  Something obviously needs to be done, but it’s not entirely clear what that something is.

Part of the problem is that we are approaching the future from a number of discrete “pasts”.  Application deployment and lifecycle management have been rolled into “DevOps”, and the DevOps model has been adopted by cloud users.  Network service automation has tended to be supported through network management tools for enterprises and service providers alike, but the latter have also integrated at least some of the work with OSS/BSS systems.  Now we have SDN and NFV, which have introduced the notion of “orchestration”, combining application/feature and network/connection functions in a single process.

Another part of the problem is that the notion of “service” isn’t fully defined.  Network operators tend to see services as retail offerings that are then decomposed into features (the TMF’s “Customer-Facing Services,” or CFSs).  Cloud providers sometimes see the “service” as the ability to provide platforms to execute customer applications, which separates application lifecycle issues from service lifecycle issues.  The trend in cloud services toward “serverless” computing raises the level of the features the provider offers and makes the “service” look more application-like.  Enterprises see services as something they buy from an operator, and in some cases as something they have to provide to cloud/container elements.  Chances are, more definitions will emerge over time.

The third piece of the problem is jurisdictional.  We have a bunch of different standards and specifications bodies out there, and each of them covers a slice of services and infrastructure rather than embracing the whole.  As a result, the more complex the notion of services becomes, the more likely it is that nobody is really handling it at the standards level.  Vendors, owing perhaps to the hype magnetism of standards groups, have tended to follow the standards bodies into disorder.  There are some vendors with a higher-level vision, but most of the articulation at the higher level comes from startups, because the bigger players tend to focus on product-based marketing and sales.

If we had all of the requirements for the service automation of the future before us, and a greenfield opportunity to implement them, we’d surely come up with an integrated model.  We don’t have either of these conditions, and so what’s been emerging is a kind of ad hoc layered approach.  That has advantages and limitations, and balancing the two is already difficult.

The layered model says, in essence, that we already have low-level management processes that do things like configure devices or even networks of devices, deploy stuff, and provide basic fault, configuration, accounting, performance, and security (FCAPS) management.  What needs to be done is to organize these into a mission context.  This reduces the amount of duplication of effort by allowing current management systems to be exploited by the higher layer.
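To make the layered idea concrete, here is a minimal Python sketch, with every name invented for illustration; a real implementation would call the APIs of an actual management system rather than this stand-in class:

```python
# Hypothetical sketch: a higher-layer "mission context" that reuses an
# existing FCAPS-style management system instead of duplicating it.
# DeviceManager and provision_vpn_site are invented names, not a real API.

class DeviceManager:
    """Stands in for an existing low-level management system."""
    def configure(self, device: str, config: dict) -> None:
        print(f"configuring {device} with {config}")

    def health_check(self, device: str) -> bool:
        print(f"checking {device}")
        return True

def provision_vpn_site(nms: DeviceManager, device: str, config: dict) -> None:
    """One mission-level step: drive existing management operations
    in the context of a service goal (here, adding a VPN site)."""
    nms.configure(device, config)
    if not nms.health_check(device):
        raise RuntimeError(f"{device} failed its post-configuration check")

provision_vpn_site(DeviceManager(), "edge-router-7", {"vrf": "customer-42"})
```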

We see something of this in the NFV approach, where we have a management and orchestration (MANO) function that interacts with a virtual infrastructure manager (VIM), presumably made up of a set of APIs that then manage the actual resources involved.  But even in the NFV VIM approach we run into issues with the layered model.

Some, perhaps most, in the NFV community see the VIM as being OpenStack.  That certainly facilitates the testing and deployment of virtual network functions (VNFs) as long as you consider the goal to be one of simply framing the hosting and subnetwork connections associated with a VNF.  What OpenStack doesn’t do (or doesn’t do well) is left to the imagination.  Others, including me, think that there has to be a VIM to represent each of the management domains, those lower-layer APIs that control the real stuff.  These VIMs (or more properly IMs, because not everything they manage is virtual) would then be organized into services using some sort of service model.  The first of these views makes the MANO process very simple, and the second makes it more complicated because you have to model a set of low-level processes to build a service.  However, the second view is much more flexible.
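Here is a rough Python illustration of that second view, offered only as a sketch; the class and domain names are mine, not anything from the ETSI specifications.  Each management domain sits behind a common IM interface, and a simple service model composes them:

```python
# Hypothetical sketch of the "one IM per management domain" view.
# Each IM hides one domain's real control APIs behind a common interface;
# the service model is just an ordered list of per-domain requests.

from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    """Common face of one management domain (hosting, WAN, and so on)."""
    @abstractmethod
    def deploy(self, request: dict) -> str: ...

class HostingIM(InfrastructureManager):
    def deploy(self, request: dict) -> str:
        # In reality this might drive OpenStack or a container platform.
        return f"hosted {request['name']}"

class WanIM(InfrastructureManager):
    def deploy(self, request: dict) -> str:
        # In reality this might drive an SDN controller or a legacy NMS.
        return f"connected {request['name']}"

def deploy_service(model: list, ims: dict) -> list:
    """Walk the service model, handing each piece to its domain's IM."""
    return [ims[domain].deploy(request) for domain, request in model]

ims = {"hosting": HostingIM(), "wan": WanIM()}
model = [("hosting", {"name": "firewall-vnf"}),
         ("wan", {"name": "site-a-to-core"})]
print(deploy_service(model, ims))
```

The point of the abstraction is that MANO never touches a domain’s native APIs directly; adding a new kind of infrastructure means adding an IM, not changing the orchestration logic.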

There are also layers in the cloud itself.  OpenStack does what’s effectively per-component deployment, and there are many alternatives to OpenStack, as well as products designed to overcome some of its basic issues.  To deploy complex things, you would likely use a DevOps tool (Chef, Puppet, Ansible, Kubernetes, etc.).  Kubernetes is the favored DevOps tool for container systems like Docker, which, by the way, does its own subnetwork building and management and also supports “clusters” of components natively.  Some users layer Kubernetes for containers with other DevOps tools, and to make matters even more complex, we have cloud orchestration standards like TOSCA, which is spawning its own set of tools.
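Much of what the DevOps layer adds on top of per-component deployment is dependency ordering across a complex application.  A toy Python sketch of just that idea (the component names are invented, and real tools express the dependencies declaratively rather than in code):

```python
# Toy sketch of what a DevOps-layer tool adds over per-component
# deployment: deploying a complex application in dependency order.

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each component maps to the set of components it depends on.
app = {
    "database": set(),
    "backend": {"database"},
    "frontend": {"backend"},
    "monitoring": {"backend", "database"},
}

def deploy_component(name: str) -> None:
    # Stand-in for a per-component deployer (OpenStack, kubectl, etc.).
    print(f"deploying {name}")

# static_order() yields each component only after its dependencies.
for component in TopologicalSorter(app).static_order():
    deploy_component(component)
```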

What’s emerging here is a host of “automation” approaches, many of them overlapping, and those that don’t overlap covering some specific niche problem, technology, or opportunity.  This is, perhaps, both a good thing and a bad thing.

The good things are parallelism and reuse.  If we visualize deployment and lifecycle management as distributed, partitioned processes, different domains could be doing their thing at the same time, as long as there’s coordination to ensure that everything comes together.  We’d also be able to reuse technology that’s already developed and in many cases fully proven out.

The bad thing is the coordination requirement I just mentioned.  Ships passing in the night is not a helpful vision of the components of a service lifecycle automation process.  ETSI MANO, SDN controllers, and most DevOps tools are “domain” solutions that still have to be fitted into a higher-level context.  That’s something we don’t really have at the moment.  We need a kind of “orchestrator of orchestrators” (OofO) approach, and that is in fact one of the options.  Think of an uber-process that lives at the service level, dispatches work to all of the domains, and then coordinates their work.  That’s probably how the cloud would do it.
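As a minimal sketch of that uber-process (the domain functions here are invented stand-ins, and a production OofO would also need retries, rollback, and persistent state), dispatch work to each domain orchestrator in parallel and then block until all of them report in:

```python
# Hypothetical sketch of an "orchestrator of orchestrators": dispatch
# work to the domain orchestrators in parallel, then coordinate results.

from concurrent.futures import ThreadPoolExecutor, as_completed

def nfv_mano(work: str) -> str:
    return f"MANO deployed {work}"       # stands in for an ETSI MANO stack

def sdn_controller(work: str) -> str:
    return f"SDN connected {work}"       # stands in for an SDN controller

def devops_tool(work: str) -> str:
    return f"DevOps configured {work}"   # stands in for Chef, Ansible, etc.

service_plan = [
    (nfv_mano, "vFirewall"),
    (sdn_controller, "site-to-core path"),
    (devops_tool, "billing application"),
]

with ThreadPoolExecutor() as pool:
    futures = {pool.submit(fn, work): work for fn, work in service_plan}
    for future in as_completed(futures):
        # The coordination point: any domain failure surfaces here, and
        # the service-level process decides what to do about it.
        print(futures[future], "->", future.result())
```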

The cloud, in fact, is contributing a lot of domain-specific solutions that should be used where available, and we should also be thinking about whether the foundation of the OofO I just mentioned should be built in the cloud and not outside it, in NFV or even OSS/BSS.  That’s a topic for my next blog.