We Don’t Need to Modernize NFV, We Need to Move Beyond It

It seems impossible to shake the debate on “containerized” virtual network functions for NFV, even as we should be debating the generalization of cloud-ready models.  Red Hat and Intel, both with a considerable upside if network devices were suddenly turned into hosted virtual functions, have launched an onboarding service and test bed to facilitate “CNFs”.  I’d be the last person to say that we didn’t need to think more about containerizing virtual functions, versus nailing them to virtual machines, but I think we’re at risk of focusing on a limited symptom here.  Some more important points are becoming obvious, so it’s hard to see how they’re still being ignored.

If you were to look at a very general vision of NFV, you might expect it to look a lot like DevOps.  You start with a declarative model of a service and from that you invoke specific steps to manage the lifecycle, from deployment through tearing it down.  The specific tasks would be carried out by integrated tools that could deploy on containers, virtual machines, or bare metal, and these same tools would be able to parameterize and configure both systems (virtual or otherwise) and networks.
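To make that general vision concrete, here is a minimal sketch in Python.  Everything in it is an assumption for illustration: the model format, the `ServiceLifecycle` class, and the three hosting functions are hypothetical, and none of them comes from the NFV specifications or from any real DevOps tool.  It only shows a declarative model driving lifecycle steps through interchangeable hosting tools.

```python
from dataclasses import dataclass, field

# Hypothetical hosting tools.  In practice each would wrap a real deployment
# tool; here they only return a label so the flow is visible.
def deploy_container(name: str) -> str:
    return f"container:{name}"

def deploy_vm(name: str) -> str:
    return f"vm:{name}"

def deploy_bare_metal(name: str) -> str:
    return f"bare-metal:{name}"

HOSTING_TOOLS = {
    "container": deploy_container,
    "vm": deploy_vm,
    "bare-metal": deploy_bare_metal,
}

@dataclass
class ServiceLifecycle:
    """Walks a declarative model through deploy, configure, and teardown."""
    model: dict
    state: str = "defined"
    instances: list = field(default_factory=list)
    applied_config: dict = field(default_factory=dict)

    def deploy(self) -> None:
        # The model, not the orchestrator, says where each element is hosted.
        for element in self.model["elements"]:
            tool = HOSTING_TOOLS[element["hosting"]]
            self.instances.append(tool(element["name"]))
        self.state = "deployed"

    def configure(self) -> None:
        # Record the parameters for each element, regardless of hosting type.
        for element in self.model["elements"]:
            self.applied_config[element["name"]] = element.get("parameters", {})
        self.state = "active"

    def teardown(self) -> None:
        self.instances.clear()
        self.applied_config.clear()
        self.state = "removed"

# Hypothetical service model: three elements, three hosting choices.
SERVICE_MODEL = {
    "service": "business-vpn",
    "elements": [
        {"name": "firewall", "hosting": "container",
         "parameters": {"policy": "default-deny"}},
        {"name": "router", "hosting": "vm"},
        {"name": "edge-switch", "hosting": "bare-metal"},
    ],
}

lifecycle = ServiceLifecycle(SERVICE_MODEL)
lifecycle.deploy()
lifecycle.configure()
print(lifecycle.state, lifecycle.instances)
```

The point of the sketch is that the hosting choice lives in the model, so adding a new hosting option means adding an entry to the table rather than changing the lifecycle logic.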

I was involved in NFV from the first, and in those early days the vision was in fact fairly generalized.  Even when it was codified into the “End-to-End model” by the NFV ISG, you still had fairly high-level concepts represented by blocks.  We had “management and orchestration” (MANO), VNF Manager (VNFM), Virtual Infrastructure Manager (VIM), and so forth.  It was really the case studies that created the issues, I think, combined with the fact that the NFV ISG was populated mostly by network types doing a software/cloud job.

What issues are we talking about?  I contend that there were three.  First, NFV focused on virtualizing devices that deployed within a single customer’s service, not on building multi-service infrastructure.  Second, NFV presumed that VNFs would deploy within VMs.  Third, NFV permitted or even encouraged a strict translation of the functional E2E model into reference code.  We’ll take each of these, and their impacts on market requirements and NFV suitability, below.

So on to our first issue.  From the very first, a series of interlocking and seemingly harmless decisions relating to what NFV virtualized conspired to put NFV in a subordinate rather than transformative mode.  A “virtual network function” was determined to be the software-hosted equivalent of a “physical network function”, which in turn was a device.  A VNF was to be managed by the same element management and higher-layer tools as those devices were.  The only exception was that features that might typically have been delivered by daisy chaining several CPE devices were to be implemented as “service chains” of VNFs, which eventually turned into co-hosting multiple VNFs in a piece of universal CPE (uCPE).

The problem with this sequence is that virtualizing CPE (vCPE) impacts only the cost of some business services, not the vast majority of infrastructure capex.  SDN had taken the approach that “routing” was possible without “routers”, and sought to define how that might work.  NFV could have enveloped the SDN initiatives, defining IP networks as enormous virtual routers and letting everyone innovate with respect to what was inside the black box, but it didn’t.  As a result, NFV missed the critical notion of abstraction that’s central first to virtualization and then to the cloud.  Without a powerful abstraction concept at its core, NFV is not going to be able to define a cloud-native toolkit.

Even if we were to extend NFV to multi-service infrastructure by deploying, for example, virtual routers, the majority of the NFV work focuses on deployment, which for multi-service virtual routers is largely a one-off issue.  If a virtual router could be an entire network, we might have been able to apply lifecycle management automation within a virtual-router black box, but with only real devices to abstract, we’re left with the same management structure that devices had.  The rest of the service lifecycle isn’t in scope for NFV at all, since device, network, and service management are all ceded to a higher-level process.  Thus, we can’t expect opex to be impacted, and without a significant capex/opex benefit, we’ve accomplished nothing.

The second issue started with a natural assumption.  Hosting virtual functions on a one-per-server basis wasn’t likely to have a favorable impact on capex.  Virtual machines were the rage in the 2013 timeframe, and OpenStack was the open-model approach to deploying things in virtual machines.  It’s not surprising that NFV presumed both VMs and OpenStack, but things went a bit south from there.

The biggest problem was the Virtual Infrastructure Manager.  Over the course of the proofs of concept introduced to the NFV ISG, the VIM got a bit conflated with OpenStack.  The more people thought about OpenStack VIMs, the less that thinking applied to the general question of how you host something.  The singular flavor of OpenStack percolated upward to the VIM, making it specialized not only to VMs but to OpenStack itself.

“Singular” is a nice segue to the final issue.  Functionally, the VIM sits at the bottom of a deployment, translating an abstract view of something into a software-to-VM commitment.  We could have salvaged a lot of NFV had the functional diagram of NFV not been strictly translated into reference implementations.  There is a single block each for MANO, VNFM, and VIM, and to many that meant that these were three monolithic elements.  Functional had become architectural.

For the specific issue of VNF versus CNF, the singularity of the VIM creates a massive and unnecessary problem.  Any service “object” or abstraction needs to be realized, and so a service model would be expected to define the software element responsible for that realization.  If a VIM were a logical structure, one as varied as the options for realizing a functional element, there would be no need to fret about VNFs and CNFs.  If MANO were similarly logical, we could map its functionality to any of the DevOps tools, or to Kubernetes for containers.  The decision to translate functional blocks into explicit implementations robbed NFV of that flexibility.
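A minimal sketch of what a “logical” VIM could look like follows, again in Python and again under loudly stated assumptions: the `VIM` interface, the three stand-in classes, and the `vim` key in the model are all hypothetical.  The stand-ins do not call OpenStack, Kubernetes, or any device management system; they only label the outcome so the dispatch is visible.

```python
from typing import Protocol

class VIM(Protocol):
    """Anything that can turn an abstract element into a hosted commitment."""
    def realize(self, element_name: str) -> str: ...

# Stand-in implementations (hypothetical).  A real one would drive a real
# hosting platform; these only describe what would have happened.
class VmVIM:
    def realize(self, element_name: str) -> str:
        return f"{element_name}: deployed in a virtual machine"

class ContainerVIM:
    def realize(self, element_name: str) -> str:
        return f"{element_name}: deployed in a container"

class DeviceVIM:
    def realize(self, element_name: str) -> str:
        return f"{element_name}: committed to an existing device"

# The set of VIMs is open-ended; the service model picks one per element.
VIM_REGISTRY = {
    "vm": VmVIM(),
    "container": ContainerVIM(),
    "device": DeviceVIM(),
}

def realize_service(elements: list) -> list:
    """Realize each element through the VIM that the model names for it."""
    return [VIM_REGISTRY[e["vim"]].realize(e["name"]) for e in elements]

# Hypothetical model mixing a "VNF", a "CNF", and a physical device.
elements = [
    {"name": "firewall", "vim": "vm"},
    {"name": "nat", "vim": "container"},
    {"name": "core-router", "vim": "device"},
]

for line in realize_service(elements):
    print(line)
```

Under this arrangement the VNF-versus-CNF question reduces to which key a model element carries, which is the flexibility a single monolithic VIM gives up.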

If we go back to that initial, simple, NFV model I opened with, we should expect that an abstract service would decompose into simple service elements, which in turn would be realized either by committing them to a device or system of devices, or by deploying virtual functions.  The three issues I’ve cited here interfered with our capability to realize this simple approach with a conformant NFV implementation.  That is the problem.  The whole CNF/VNF thing is simply a symptom of that problem.
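The decomposition itself can be sketched as a walk over a tree.  This is an illustration under assumptions, not a conformant NFV model: the nested-dictionary format, the `children` and `realize_by` keys, and the element names are all hypothetical.

```python
def decompose(node: dict) -> list:
    """Return (leaf element, realization) pairs for a service model tree."""
    children = node.get("children")
    if not children:
        # A leaf is a simple service element with a stated realization.
        return [(node["name"], node["realize_by"])]
    leaves = []
    for child in children:
        # An interior node is an abstraction that decomposes further.
        leaves.extend(decompose(child))
    return leaves

# Hypothetical abstract service: one branch decomposes further, one does not.
VPN_SERVICE = {
    "name": "vpn-service",
    "children": [
        {
            "name": "access",
            "children": [
                {"name": "site-firewall", "realize_by": "virtual-function"},
                {"name": "site-router", "realize_by": "device"},
            ],
        },
        {"name": "core", "realize_by": "device"},
    ],
}

for name, realization in decompose(VPN_SERVICE):
    print(f"{name} -> {realization}")
```

Each leaf carries its own realization choice, so devices and virtual functions can sit side by side under the same abstract service.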

I’m raising this point because we’re creating a risk with this kind of discussion, not a reduction in risk.  The risk is that we’ll spin our wheels to cobble together a vision for a CNF that can be glued onto the NFV model as the ISG framed it, a model which is simply not conformant to cloud principles.  How do monoliths like MANO and VNFM and VIM create an elastic cloud?  They’re not even elastic themselves.

What these points add up to is simple.  First, hosting virtual instances of the kind of devices normally considered “infrastructure” doesn’t even require NFV because the devices have to be hosted where the trunks terminate.  Second, hosting virtual functions in white boxes doesn’t require NFV, because the white boxes, again, replace appliances that go in static locations.  Third, virtual function missions associated with individual users are beneficial only for high-value business services, and won’t move the capex/opex ball.  Fourth, dynamic cloud functionality is going to be delivered through cloud software, not through something like NFV, defined by the network community.  That’s especially true given that operators seem to be moving away from building their own clouds.

We could easily make NFV container-ready, simply by taking the right position with respect to the VIM and allowing any number of model-specified VIMs to be used in a given service.  The question is “Why bother?”  Why try to create another cloud deployment model when we already have an exploding ecosystem of container software and Kubernetes?  What the operator community should do is let NFV be a limited business-service strategy, and move on.  We don’t need to make NFV “cloud-ready”, we need to skip it and move to the cloud.