Should NFV Adopt “Infrastructure as Code” from the Cloud?

From the first, it was (or should have been) clear that NFV was a cloud application.  Despite this, we aren’t seeing the migration of technologies and concepts from the cloud space into NFV that should logically have followed.  One obvious example is TOSCA, the OASIS Topology and Orchestration Specification for Cloud Applications, which has been quietly increasing its profile in the NFV space even though the ETSI NFV activity didn’t, until recently, so much as recognize it.  But I’ve talked about TOSCA before; today I want to look at “Infrastructure as Code” or IaC.

IaC is a development in the DevOps space that, at first glance, is hard to distinguish from DevOps itself.  Puppet and Chef both talk about it, and Amazon has picked the notion up (along with Chef) in its OpsWorks service.  The explanation of why IaC and DevOps exist as separate concepts is not only useful for understanding IaC, it’s also instructive in how NFV’s own management and orchestration should be expected to work.

Any virtualized environment is a combination of abstraction and instantiation.  You have an abstract something, like an application or virtual function, and you instantiate it, meaning that you commit it to resources.  In software, “DevOps” or “Development/Operations” described an initiative to transfer deployment and later lifecycle management information from the application development process forward into data center operations.  Because both virtualization and DevOps end up with deploying or committing resources, the similarity at that level overwhelms an underlying difference—one is really a layer on the other.

DevOps is about the logical organization of complex (multi-element) applications.  But if you’re doing virtualized resources, the resources have a life separate from that of applications.  A server pool is a server pool, and some VMs within it are the targets of traditional DevOps deployment.  But not only does the existence of a resource pool versus a specific resource complicate the deployment, it also separates the management of resources as a collection from the management of applications.

The DevOps people, especially market leaders Chef and Puppet, were among the first to see this and to reflect it in their products through the addition of resource descriptions.  You could describe what was needed to commission a resource in a pool or independently, just as you could describe what was needed to commission an application.  Rather than trying to tie the two tightly, these evolving changes reflected an interdependence.  They created another side to the DevOps coin, and that other side became known as IaC, to reflect the fact that DevOps-like tools were to be used to commission resources and to handle their lifecycle management.

It’s my view that what makes IaC a critical concept in DevOps and the cloud should also make it critical in NFV, and probably in SDN too.  Resources are always separate from what they’re resources for, meaning separate from the deployment of applications/components or the threading of connections.  The mission that commits them—which we could call a “service” or an “application”—is one deployment and management domain, and the resources themselves are another.  Logical, and perhaps even compelling, but not something we hear about in NFV.

It’s also interesting to note that what the DevOps community seems to be doing (or moving to do) is supporting the “interdependence” I talked about earlier by providing an event-based link between the DevOps and IaC processes.  The two are separate worlds when everything is going normally, but if the IaC operations activities are unable to sustain the resource lifecycle properly, then they have to trigger a DevOps-level activity.

An example here is that of a failed instance of a component or virtual function.  You might have a resource-level process that attempts to recover the lost component by simply reloading or restarting it, or by instantiating a new copy local to the original.  But if you need to spin that copy up in another data center, you need to make connections that are outside the domain of the resource control or IaC processes and you have to kick the problem to another level.
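The decision in that example can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the names (`Failure`, `handle_failure`, `sites_with_capacity`) are hypothetical and come from no NFV or DevOps specification, and it assumes the resource layer already knows whether a restart in place would work.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Tuple


class Outcome(Enum):
    RECOVERED_LOCALLY = auto()
    ESCALATED = auto()


@dataclass
class Failure:
    component: str                  # failed component or virtual function
    home_site: str                  # data center where it was running
    restart_ok: bool                # assumption: a restart in place is known to work or not
    sites_with_capacity: List[str]  # data centers that can host a new copy


def handle_failure(
    failure: Failure, escalate: Callable[[Failure], None]
) -> Tuple[Outcome, str]:
    """Resource-level (IaC-style) handler: fix locally if possible, else escalate."""
    if failure.restart_ok:
        return Outcome.RECOVERED_LOCALLY, "restarted in place"
    if failure.home_site in failure.sites_with_capacity:
        return Outcome.RECOVERED_LOCALLY, "new copy in " + failure.home_site
    # A copy in another data center needs new connections, which are outside
    # the resource domain, so the problem is kicked to the next level.
    escalate(failure)
    return Outcome.ESCALATED, "handed to higher-level orchestration"
```

The `escalate` callback is the only link between the two domains, so the resource layer never needs to know how services are connected.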

This shows a couple of critical points.  First, some resource conditions, like the failure of a resource not currently committed to anything, have to be handled at the “IaC level” by NFV, whether we have such a function or not.  You wouldn’t want to deal with that kind of failure only when you tried to deploy.  Second, there are some kinds of failures of resources that could be handled at the resource level alone, and others that would require higher-level coordination because multiple resource types are involved.  There are also things, like a service change initiated by the customer, that could require high-level connection/coordination first, but might then require something at the resource level, such as setting up vCPE devices on prem.

Virtualization introduces not two layers, potentially, but three.  We have services/applications and resources, but also “mapping”.  Resource management is responsible for maintaining the pool, services/applications for maintaining what’s been hosted on the pool, but the “virtualization mapping” or binding process is itself dynamic.  The trend with the cloud and IaC seems to be to presume that resource issues, including mapping issues, are reported as service/application events.  With NFV there is at least an indication of a different approach.

Arguably, NFV presents a three-layer model of “orchestration” (which, by the way, Verizon’s architecture makes explicit).  You have services, then MANO, then the Virtual Infrastructure Manager (VIM).  None of these three layers corresponds to IaC because pure resource management is out of scope.  Service-layer orchestration is recognized but not described either.  Presumably, in NFV, resource conditions/events that impact orchestrated deployments are reflected into the VNF Manager.  The MANO-level orchestration is where mapping/binding management is sustained, meaning that any “resource” problems that aren’t automatically remediated at the resource level are presumed to be handled by MANO.  IaC would then be “in” or “below” the VIM.

Why, you may be wondering, is the integration of IaC with NFV important?  My view is that lifecycle management has to be coordinated wherever there are layers of functionality.  Logically, if we have cloud-like IaC going on with NFV resources, then that IaC should be the source of “events” that signal for the attention of higher-layer lifecycle processes, be they MANO or service/OSS/BSS.  If I have an “issue” with resources, the IaC gets the first shot, then signals upward to (hypothetically) VNFM/MANO, and then upward to OSS/BSS or “super-MANO”.
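That “first shot, then signal upward” flow is essentially a chain of handlers.  Here is a minimal Python sketch of the idea; the `scope` field and the rule for what each layer can resolve are assumptions made purely for illustration, not anything defined by ETSI or by an IaC tool.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, str]
Handler = Callable[[Event], bool]  # True means this layer resolved the event


def iac_layer(event: Event) -> bool:
    # Hypothetical rule: pure resource problems are fixed at the resource level.
    return event["scope"] == "resource"


def mano_layer(event: Event) -> bool:
    # Hypothetical rule: mapping/binding problems are fixed by VNFM/MANO.
    return event["scope"] == "mapping"


def oss_bss_layer(event: Event) -> bool:
    # Last stop in this sketch: service-level processes take whatever is left.
    return True


LAYERS: List[Tuple[str, Handler]] = [
    ("IaC", iac_layer),
    ("VNFM/MANO", mano_layer),
    ("OSS/BSS", oss_bss_layer),
]


def dispatch(event: Event) -> Tuple[Optional[str], List[str]]:
    """Offer the event to each layer, lowest first; return (resolver, path taken)."""
    path: List[str] = []
    for name, handler in LAYERS:
        path.append(name)
        if handler(event):
            return name, path
    return None, path
```

For example, `dispatch({"scope": "mapping"})` passes through the IaC layer unresolved and stops at VNFM/MANO, so the OSS/BSS layer never sees it.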

The ETSI ISG has generally accepted the notion of a higher orchestration layer, which is good because the network operators are writing that into their approaches.  The only thing, as I’ve said before, is that if you have multiple orchestration layers you have to define how they communicate, and that should be somebody’s next step.  Defining APIs implies a non-event interface, and there is no question that all of the layers of orchestration create parallel, asynchronous processes that can’t communicate except through events.
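To show what event-based communication between parallel, asynchronous layers might look like, here is a small sketch using Python’s standard `asyncio` queues.  The event fields (`type`, `host`) and the layer names are hypothetical; the point is only that the lower layer posts an event and carries on, rather than calling the upper layer and waiting for an answer.

```python
import asyncio
from typing import List, Optional


async def resource_layer(upward: asyncio.Queue) -> None:
    """Runs on its own; posts an event only when it cannot cope by itself."""
    await asyncio.sleep(0)  # stand-in for the layer's normal, independent work
    await upward.put({"type": "HOST_FAILED", "host": "host-7"})
    await upward.put(None)  # sentinel: no more events in this demo


async def orchestration_layer(inbox: asyncio.Queue, log: List[str]) -> None:
    """Runs in parallel; reacts to events whenever they arrive."""
    while True:
        event: Optional[dict] = await inbox.get()
        if event is None:
            break
        log.append("redeploy after " + event["type"] + " on " + event["host"])


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    log: List[str] = []
    # Both layers run concurrently; the queue is their only point of contact.
    await asyncio.gather(resource_layer(queue), orchestration_layer(queue, log))
    return log


def run_demo() -> List[str]:
    return asyncio.run(main())
```

Neither coroutine calls the other directly, which is the property that would have to be preserved in whatever inter-layer interface eventually gets defined.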

More broadly, the IaC point is another example of what I opened with, the need to contextualize NFV in light of other industry developments in general, and the cloud in particular.  The cloud is much more advanced than NFV in both technology thinking and market acceptance.  It’s framing tomorrow’s issues for NFV today.  The NFV ISG has, like most standards groups, set narrow borders for itself to ensure it can make progress rather than “boil the ocean”.  That’s fine as long as the rest of the technology landscape can be harmonized at those borders, and I think IaC makes it clear that efforts to do that may be getting outrun by events.