Making NFV as Good, and Cloud-Centric, as It Should Be

I don’t think NFV will ever be what its proponents hope it will be, but I do think it can be better.  Here’s the big question that the Network Functions Virtualization (NFV) initiative has to answer.  This is a cloud project, so why not simply adopt cloud technology?  That didn’t happen, so we have to ask just what makes NFV different from the cloud.  Is it enough to justify an approach only loosely linked with the cloud and virtualization mainstream?  In one case, I think the specs have undershot the cloud’s potential, and in another I think NFV is cloudier than it should be.

The stated mission of the NFV Industry Specification Group was to identify specifications that could guide the deployment of hosted virtualized network functions (VNFs) that would replace “physical network functions”, meaning purpose-built devices.  This mission is a mixture of traditional cloud stuff and traditional network stuff.  What we need to do is to look at the issues with VNF deployment and whether they fit reasonably within the cloud model or require some significant extension.

To start with, there is no inherent reason why hosting a feature of a service would be any different, in deployment terms, from hosting a component of an application.  The process of deploying and redeploying VNFs should be treated as an application of cloud or container software and the associated orchestration tools, like Kubernetes or Marathon.  Thus, if we look at deploying VNFs, it seems we should simply be handing that process off to cloud tools with as little intervention as possible.
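To make that concrete, here’s a minimal sketch of what “handing off to cloud tools” could look like, assuming a VNF packaged as an ordinary container image and using the standard Kubernetes Python client.  The image name, namespace, and labels are hypothetical placeholders, not anything the ISG has specified.

```python
# Minimal sketch: deploying a containerized VNF exactly the way the cloud
# deploys any application component, via the standard Kubernetes API.
# Image name, namespace, and labels are hypothetical placeholders.
from kubernetes import client, config

def deploy_vnf(name: str, image: str, namespace: str = "vnfs") -> None:
    config.load_kube_config()  # use load_incluster_config() inside a cluster

    container = client.V1Container(name=name, image=image)
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"vnf": name}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"vnf": name}),
        template=template,
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=name), spec=spec
    )
    client.AppsV1Api().create_namespaced_deployment(
        namespace=namespace, body=deployment
    )

deploy_vnf("virtual-firewall", "example.com/vnfs/firewall:1.0")
```

Nothing in this is NFV-specific, and that’s exactly the point.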

On the network-related side of deployment, there are two principal differences.  First, applications are not normally data-plane forwarding elements.  That means there are potential hardware and software optimizations that would be appropriate for VNFs but not as likely to be needed (though not impossible) for the cloud.  Even so, these can be accommodated with cloud tools.  Second, nearly all VNFs require operational parameterization and ongoing management, which are fairly rare with applications.  Standard cloud facilities don’t offer these capabilities in the form that VNFs would expect.
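As a sketch of the kind of information involved, consider a hypothetical VNF descriptor that carries both data-plane placement hints and the operational parameters a management system would need.  The field names here are my own illustration, not an ISG data model.

```python
# Hypothetical sketch of what a VNF brings to deployment that an ordinary
# cloud component usually doesn't: data-plane placement hints plus
# operational parameterization and a management hook. Field names are
# illustrative only, not an ISG-defined data model.
from dataclasses import dataclass, field

@dataclass
class DataPlaneHints:
    sriov_nic: bool = False   # pass-through NIC for line-rate forwarding
    hugepages_gb: int = 0     # e.g., for DPDK-style packet processing
    cpu_pinning: bool = False # isolate cores from the general scheduler

@dataclass
class VnfDescriptor:
    name: str
    image: str
    hints: DataPlaneHints = field(default_factory=DataPlaneHints)
    config_params: dict = field(default_factory=dict)  # operational parameterization
    mgmt_endpoint: str = ""                            # where ongoing management attaches

fw = VnfDescriptor(
    name="virtual-firewall",
    image="example.com/vnfs/firewall:1.0",
    hints=DataPlaneHints(sriov_nic=True, hugepages_gb=2),
    config_params={"wan_port": "eth1", "acl_profile": "default"},
    mgmt_endpoint="https://mgmt.example.com/vnfs/virtual-firewall",
)
```

The first half of this maps onto facilities the cloud already offers in some form (node selection, hugepage and SR-IOV support); it’s the second half that standard cloud tooling doesn’t provide in the form VNFs expect.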

The network part of the VNF picture is critical in the sense that networks are cooperative communities of devices whose individual behavior is only broadly controllable.  The system has to be functional as a system, and that’s complicated by the fact that keeping the system functional is partly a matter of adaptive behavior within the system and partly a matter of remediation from outside it.

Moving up the ladder from basic deployment, the unit of deployment in the cloud is an application, which is a combination of components united by a workflow and serving a community of users.  The presumption in the cloud is that the application is an arbitrary functional collection owned by the user, so there’s no expectation that you’d stamp a bunch of the same application out with a functional cookie cutter.  On the network side, the unit of functionality is the service, and a service is a set of capabilities provided on a per-user basis (leaving the Internet out of the discussion).  It is essential that a given “service” be offered in a specific form and, once ordered, deploy to conform with the specifications, which include a service-level agreement (SLA).

It’s my view that this point is what introduces the real need for “orchestration” in NFV, as something different from the comparable tools and capabilities in the cloud.  The way to approach it is to say that a service is modeled, and that when the service is ordered, the model is used to deploy it.  Once deployed, the same model is used for lifecycle management.  Given this, the first step the NFV ISG should have taken was to define a service model approach.  The TMF has a combination of models of various types, the Shared Information/Data model (“SID”), and while I don’t think it’s structured to do what’s needed in an optimal way, it is at least a model-driven approach.  The NFV ISG hasn’t defined a specific model, and thus has no means of model-driven lifecycle management.  That’s where it falls short of cloud evolution.
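To show what “model-driven” means here, the sketch below illustrates a service as a hierarchy of feature nodes that drives deployment and is retained afterward for lifecycle management.  The structure is my own illustration, loosely in the spirit of TOSCA-style topology templates, not something the ISG or the TMF has defined.

```python
# Illustrative sketch of a modeled service: a hierarchy of feature nodes,
# instantiated per customer order, that drives deployment and is retained
# for lifecycle management. Node names and fields are illustrative only.
from dataclasses import dataclass, field

@dataclass
class ServiceNode:
    name: str
    sla: dict = field(default_factory=dict)       # e.g., {"availability": 0.9995}
    children: list = field(default_factory=list)  # decomposition into features

    def deploy(self) -> None:
        # Depth-first: a node is deployable once its subordinate features are.
        for child in self.children:
            child.deploy()
        print(f"deploying {self.name} against SLA {self.sla}")

# One "order" stamps out one instance of the modeled service.
vpn = ServiceNode("business-vpn", sla={"availability": 0.9995}, children=[
    ServiceNode("access-leg"),
    ServiceNode("core-transport"),
    ServiceNode("vcpe-features", children=[ServiceNode("virtual-firewall")]),
])
vpn.deploy()
```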

One thing that makes commentary on service modeling so important is that operators now recognize that this should have been the foundation of transformation all along.  Another thing is that service modeling is a potential point of convergence for the cloud and network sides of our story.

Properly constructed service models do two important things.  First, they show the dependencies between component features of a service.  That lets faults propagate as they must.  Second, they provide a basis for ongoing lifecycle management by retaining state information per component and acting as a conduit for events into a service process.  These missions were recognized by the TMF’s work, including both SID and NGOSS Contract.  They’re critical for service automation, but if you think a moment you can see that they are likely to become critical for application automation as well.
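A minimal sketch of that second mission, with state kept per component and events steered through the model, might look like the following.  The states, events, and handlers are placeholders in the spirit of NGOSS Contract, not a reproduction of it.

```python
# Sketch of model-driven lifecycle management: each component of the
# service model holds its own state, and each event is steered to a
# handler selected by (state, event). States and events are placeholders.

class Component:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.state = name, parent, "ordered"

    def handle(self, event):
        handler = HANDLERS.get((self.state, event))
        if handler:
            handler(self)

def on_deployed(c):
    c.state = "active"

def on_fault(c):
    c.state = "failed"
    if c.parent:                  # dependency: the fault propagates upward
        c.parent.handle("child-fault")

def on_child_fault(c):
    c.state = "degraded"          # the parent decides on remediation

HANDLERS = {
    ("ordered", "deployed"): on_deployed,
    ("active", "fault"): on_fault,
    ("active", "child-fault"): on_child_fault,
}

svc = Component("business-vpn")
fw = Component("virtual-firewall", parent=svc)
for c in (svc, fw):
    c.handle("deployed")
fw.handle("fault")
print(svc.state, fw.state)        # -> degraded failed
```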

Deployment of complex multi-component applications on virtual infrastructure is a whole new dimension in application deployment management.  The process isn’t exactly like service deployment, in that you don’t have to stamp out thousands of instances of an application and manage them all independently, but it still benefits from automation.  The interest in DevOps and application orchestration proves that.  The cloud is coming to NFV’s rescue, in an orchestration sense.

Where NFV may be too cloudy, so to speak, is in the ever-popular concept of “service chaining”.  This particular notion has been a fixation in the NFV ISG from the first.  The idea is to create a service by chaining linear functions that are hosted independently, and that is a problem for two very specific reasons.

The first reason is that the application of service chaining is almost totally limited to “virtual CPE”, the use of hosted elements to replace customer-edge appliances.  We actually have some interest in vCPE today, but it’s in the form of general-purpose boxes that can edge-host custom-loaded features.  However, this mission is most valuable for business services whose terminating devices are fairly expensive.  That makes it a narrow opportunity.

The second reason is that if you really wanted to chain functions like that in a linear way, you’d almost surely want to construct a single software image with all the functions included.  Passing data between functions in software doesn’t require network connections or generate additional complexity.  A vCPE element with three functions has three hosts and two connecting pipes, five elements in all in addition to the input and output.  A single-image solution has one hosting point and no connecting pipes at all.

For those who say the chain is more reliable, remember that a break in any of the five elements breaks the connection, and that’s more likely to happen than a break in a single-image hosting point, simply because there are more things that have to work.  The three-function chain also poses three times the integration problems, because each of the three functions has to be converted into a VNF, deployed, and managed separately.  There are more resources to manage and more operational complexity.
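A quick back-of-the-envelope calculation shows the size of the effect.  The 99.9 percent per-element availability is just an assumption for illustration.

```python
# Back-of-the-envelope serial availability: a chain is up only if every
# element is up. The 0.999 per-element figure is an assumed illustration.
per_element = 0.999

chain = per_element ** 5         # three hosts plus two connecting pipes
single_image = per_element       # one hosting point

print(f"chain: {chain:.4%}, single image: {single_image:.4%}")
# chain: ~99.50%, single image: 99.90% -- roughly five times the
# probability of an outage for the chained version
```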

In the cloud, the dangers of over-componentizing are widely accepted and written about almost daily.  Service chaining is like the worst of over-componentization; it adds complexity and cost and provides nothing in return.  There was no reason to seize on this early on, and less reason to stay fixated on it today.

The ISG is looking at steps to take after its current mandate expires.  Two good ones come to mind.  The first would be to develop a specific NFV service model, based on a cloud standard like OASIS TOSCA, and to fit NFV orchestration and management to that model.  The second would be to step back from service chaining and toward the use of composed multi-VNF images to replace those chains.  If these two things were done, NFV would be better aligned with cloud thinking, and that should be the goal.