What’s Really Needed to “Simplify” NFV

Intel says it will simplify NFV by creating reference NFVIs.  Is there a need for simplification with NFV, and does Intel’s move actually address it in the optimum way?  It depends on what you think NFV is and what NFVI is, and sadly there’s not full accord on that point.  It also depends on where you think NFV is going—toward more service chaining or toward 5G and IoT.  In the ETSI NFV E2E model, “NFVI,” or NFV Infrastructure, is something that hosts virtual functions.  It lives underneath a whole set of components, and it’s really those components, in particular the Virtual Infrastructure Manager (VIM), that frame the relationship between hardware and NFV.  We can’t start with NFVI; we have to look at things from the top.

If there’s NFV confusion, it doesn’t start with NFVI because logically the infrastructure itself should be invisible to NFV.  Why that’s true is based on the total structure.  It’s difficult to pull a top-down vision from the ETSI material, but in my view the approach is fairly straightforward.  Virtual Network Functions (VNFs) are the hosted analog of Physical Network Functions (PNFs), which are the devices we already use.  The goal of NFV is to deploy these VNFs and connect the result to current management and operations systems and practices.  Referencing a prior blog of mine, there is a hosting layer and a network layer, and the goal of NFV is to elevate a function from the hosting layer into the network layer, as a 1:1 equivalent of a real device that might otherwise be there.

If that is the mission of NFV, then the role of Management and Orchestration (MANO) is to take what might be a device made up of multiple functions, defined as a “service chain,” and convert it into a series of successive hosting interactions.  With what?  The VIM.  MANO doesn’t do generalized orchestration.  The VNFM’s role is to harmonize VNF management with the management of PNFs.  Service lifecycles are out of scope, as is the control of the PNFs that remain.  In this model, a VIM is the cloud/virtualization software stack that deploys an application on infrastructure.  Many in the NFV ISG say that the VIM is OpenStack.
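MANO’s narrow job here can be pictured in a few lines.  This is a minimal sketch of my own, not anything from the ETSI documents: the function names and the lambda standing in for the VIM are purely illustrative.

```python
# Illustrative sketch: MANO decomposes a "service chain" (an ordered
# list of functions that might otherwise be one device) into a series
# of successive hosting requests to the VIM.  Names are hypothetical.
def deploy_chain(vim_deploy, chain):
    """chain: ordered VNF names; returns the deployed instance IDs in order."""
    return [vim_deploy(vnf) for vnf in chain]

# A stand-in VIM that just returns an instance ID per request.
ids = deploy_chain(lambda vnf: f"id-{vnf}", ["firewall", "nat", "dpi"])
# ids == ["id-firewall", "id-nat", "id-dpi"]
```

The point of the sketch is what is absent: there is no WAN setup, no PNF control, and no lifecycle logic, which is exactly the scope limitation described above.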

Logically, an operator would want to be able to deploy VNFs on whatever hosting resources it found optimal, including resources from open architectures like the Open Compute Project or Telecom Infrastructure Project.  Logically, they’d want to be able to embrace servers based on different chips (we just saw an announcement of a server partnership between Nvidia and the biggest server players, focusing on AI), and many think operators should be able to use something other than OpenStack as a VIM (see below).  We should be able to have portions of our resource pool focused on accelerated data plane, and portions focused on high compute power.  We should have what’s needed, in short.

Logically, we should be thinking of a VIM as being (in modern terms) an “intent model”, exposing standard SLAs and interfaces to MANO and implementing them via whatever software and hardware resources were desired.  If this is the goal, then it seems like a lot of the NFVI confusion is really a symptom of an inadequate VIM model.  If the goal of a VIM is to totally abstract the implementation of the deployment platform and underlying hardware, then there can’t be “confusion” in the NFVI because it’s invisible.  If there is confusion then it should be resolved by the implementation of the VIM.
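To make the intent-model idea concrete, here is a minimal sketch of what a VIM-as-intent-model contract could look like.  All class, field, and method names are my own invention for illustration; no NFV specification defines these interfaces.

```python
# Hypothetical sketch: a VIM as an "intent model".  MANO sees only the
# abstract interface and the SLA it carries, never the stack or hardware
# behind it.  All names here are illustrative, not from any spec.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class HostingIntent:
    """What MANO asks for: capabilities and an SLA, not a specific host."""
    vcpus: int
    memory_gb: int
    features: frozenset       # e.g. {"dpdk", "gpu"} -- illustrative tags
    max_latency_ms: float     # the SLA the VIM implementation must honor

class VirtualInfrastructureManager(ABC):
    """The intent model: one deploy() contract, any implementation behind it."""
    @abstractmethod
    def deploy(self, vnf_image: str, intent: HostingIntent) -> str:
        """Host the VNF somewhere that satisfies the intent; return an ID."""

class OpenStackVIM(VirtualInfrastructureManager):
    def deploy(self, vnf_image: str, intent: HostingIntent) -> str:
        # A real implementation would translate the intent into
        # OpenStack (Nova/Neutron) calls here.
        return f"openstack:{vnf_image}"

class VMwareVIM(VirtualInfrastructureManager):
    def deploy(self, vnf_image: str, intent: HostingIntent) -> str:
        # The same intent, translated into vSphere calls instead.
        return f"vmware:{vnf_image}"
```

Because MANO codes only against `VirtualInfrastructureManager`, swapping the stack underneath (or the hardware underneath the stack) never shows through, which is the sense in which the NFVI becomes “invisible.”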

This happy situation is compromised if you assume that there has to be one and only one VIM.  If the NFVI isn’t homogeneous, or if the operator elects to use virtualization software other than OpenStack, you end up with the problem of having all the options supported inside one VIM, which means that somehow a VIM would have to be vendor-neutral.  What vendor will provide that?  Is it a mandate for an open-source implementation of NFV?  Not yet.

I think that it’s particularly critical to be able to use VMware’s solutions rather than depend exclusively on OpenStack.  VMware is widely used, favored by many operators, and solid competition in the virtualization/cloud-stack space would be very helpful to network operators.  A VMware-modeled VIM would be helpful to NFV overall.

Whatever the motive behind VIM multiplicity, having more than one VIM means having some means of selecting which VIM you use.  If there is indeed diverse NFVI, then you need a mechanism to decide what part of the diverse pool you plan to use.  This isn’t complicated to do in theory if you have the right implementation, but a requirement to do it would have complicated the work of the NFV ISG, which elected not to address the issue.  I’ve called this a “resource domain” problem: a requirement that an intent model representing hosting/deployment contain a sub-model structure that can then link to the right deployment stack.
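The resource-domain idea can be sketched very simply.  This is an assumption-laden illustration of my own; the domain records, feature tags, and selection rule are hypothetical, and a real implementation would be far richer.

```python
# Hypothetical sketch of the "resource domain" structure: the top-level
# hosting intent model contains sub-models, each wrapping one deployment
# stack, and routes each request to a domain that can satisfy it.
def select_domain(domains, required_features, location):
    """Return the first domain matching the location and feature needs."""
    for d in domains:
        if d["location"] == location and required_features <= d["features"]:
            return d
    return None

domains = [
    {"name": "dc-east-openstack", "location": "east",
     "features": {"dpdk"}, "stack": "openstack"},
    {"name": "dc-west-vmware", "location": "west",
     "features": {"dpdk", "gpu"}, "stack": "vmware"},
]

choice = select_domain(domains, {"gpu"}, "west")
# choice["stack"] is "vmware": the parent model now invokes that
# stack's own VIM, without MANO ever seeing the pool's diversity.
```

The key design point is that the selection happens inside the intent model, below the MANO interface, so adding a new stack means adding a sub-model, not changing the orchestrator.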

The VIM issues are bad enough when you consider the problem of deploying a VNF in the right place and right way, but they’re a lot worse when you consider redeployment.  Suppose I need a server resource with a widget to properly host a given VNF.  It’s certainly a problem if the only way I can make that available is to put it on all servers, because my VIM can’t select a server with it from a diverse pool.  But imagine now that my server breaks and there’s no other widget-equipped server in the same location.  I now have to reconfigure the service to route the connection to a different data center.  This is almost surely going to require configuring not only virtual switch ports for OpenStack, but configuring WAN paths.  Remember, I don’t have that capability because my MANO is focused on deploying the virtual elements and not setting up the WAN.
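The failover scenario above can be sketched to show exactly where the gap is.  This is a toy illustration under my own assumptions (“widget” stands in for any special hardware feature, and the field names are invented); the point is the `wan_reroute` flag, which represents work that falls outside classic MANO scope.

```python
# Hypothetical sketch of the redeployment problem: when no suitably
# equipped healthy server remains in the original site, failover must
# produce both a new placement AND a WAN reconnection step -- the step
# that deployment-focused MANO does not cover.  Names are illustrative.
def redeploy(servers, needed_feature, original_site):
    """Pick a replacement host; report whether WAN rework is required."""
    # Prefer a healthy server in the same site, to avoid touching the WAN.
    local = [s for s in servers
             if s["site"] == original_site and s["healthy"]
             and needed_feature in s["features"]]
    if local:
        return {"host": local[0]["name"], "wan_reroute": False}
    # Otherwise fall back to any site, which forces a WAN path change.
    remote = [s for s in servers
              if s["healthy"] and needed_feature in s["features"]]
    if remote:
        return {"host": remote[0]["name"], "wan_reroute": True}
    return None  # no capable server anywhere
```

When `wan_reroute` comes back true, someone has to reconfigure WAN paths as well as virtual switch ports, and in the ETSI model there is no component tasked with doing it.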

Intel could address this problem, but not with reference NFVIs.  What they needed to do was to hand it over to Wind River (part of Intel) and ask them to frame an open architecture for a VIM that had that internal, nested-models, capability needed to control diverse infrastructure using diverse virtualization software.  That would be a huge step forward, not only for Intel but for NFV overall.  It would also, of course, tend to open up the NFVI market, which may not be in Intel’s interest.

The need to have a very agile approach to managing virtual infrastructure goes beyond just different implementations of cloud hosting or CPUs.  Nokia has recently announced AirGile, what it calls a “cloud-native” model for NFV hosting that incorporates many of the attributes of functional/lambda/microservice programming that I’ve been talking about.  If you want to be truly agile with stateless elements for 5G and IoT (which is what Nokia is aiming at) then you need to be a lot more efficient in deployment, scaling, and redeployment.  Taking advantage of the AirGile model means having statelessly designed VNFs.  If we’re going to do that, we should rethink some of the use cases for NFV as well as the management model.
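What “statelessly designed VNFs” means in practice is that per-flow state lives outside the function instance, so any instance can serve any request and a failed instance can be replaced without losing context.  The sketch below is my own minimal illustration, not anything from Nokia’s AirGile material; the `StateStore` class stands in for whatever external state service (a distributed cache, say) an operator would actually use.

```python
# Illustrative sketch of a stateless VNF worker: all per-flow context is
# fetched from an external store and written back, none is held in the
# instance itself.  All names here are hypothetical.
class StateStore:
    """Stand-in for an external state service shared by all instances."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def put(self, key, value):
        self._data[key] = value

def handle_packet(store, flow_id, packet):
    """A stateless worker: fetch context, act, write context back."""
    count = store.get(flow_id, 0) + 1   # per-flow packet counter as "state"
    store.put(flow_id, count)
    return f"flow {flow_id}: packet {count} ({packet})"
```

Because the worker keeps nothing between calls, scaling out means starting more workers against the same store, and redeployment after a failure means pointing a fresh worker at it: exactly the efficiency in deployment, scaling, and redeployment that 5G and IoT workloads demand.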

Vendors, including Intel and Nokia, clearly have their own visions of how NFV should work.  Add these to the multiplicity of open-source solutions, many with strong operator support, and it’s clear that things aren’t going to converge on a single NFV model any time soon.  That means we have to be able to assess the relative merits of each approach, and the only way to fully understand or assess NFV is to take it from the top, in the most general case, and explore the implications.  I think that the biggest problem NFV had was starting from the bottom.  The second-biggest problem was excessive fixation on a single use case, virtual CPE.  Top-down, all-use-cases, is the way to go.

NFV’s problems can be solved, and in fact there are proposals in various forms and venues to do that.  One candidate is ONAP, and the first of several pieces explaining why can be found HERE.  Certainly ONAP needs to be tested, in particular in terms of how its use of TOSCA can address the modeling needs.  Is it best?  What’s needed is to explore the capabilities of these solutions in that general case I noted, testing them against the variety of service configurations and mixtures of VNF and PNF, and over a range of deployment/redeployment scenarios.  If we do that, we can ensure that all the pieces of NFV fit the mission, and that we simplify the process of onboarding VNFs, infrastructure, and everything else.  The ETSI ISG probably won’t be the forum for that to happen, and the use case focus that has biased the ISG is also biasing other activities.  We may have to wait for broader NFV applications (like 5G, as Nokia suggests, and IoT) to emerge and force a more general approach to the problem.