What Do We Need to Make VNFs Open and Interoperable?

In prior blogs, I’ve talked about the vCPE mission for NFV—why it’s gaining in importance and what models of vCPE might be favored.  Some people have asked (reasonably) whether there were any technical issues on NFV implementation raised by the vCPE model.  There are: some are more exposed by vCPE than created by it, and some are created by it.

vCPE is challenging for two primary reasons.  First, the vCPE model almost demands highly portable VNFs.  You have to be able to deploy vCPE on various platforms because there are probably multiple (and fairly different) edge-hosting alternatives developing, and because the cloud has to be one of the hosting options too.  Second, vCPE isn’t an NFV service, but rather it’s an NFV piece of a broader service (like Carrier Ethernet) that already exists and will likely be deployed based on legacy technology for years to come.

Some of the developments I cited in my last blog (the Wind River and Overture announcements) are driven by the first point.  Everyone who’s tried to run a Linux application in the cloud knows that operating system and middleware version control can be critical.  OpenStack has to run on something, after all, and so do the VNFs that become either machine images or container images.  However, while platform uniformity is a necessary condition for VNF portability, it’s not a sufficient condition.

Programs rely on APIs to connect to the outside world.  Chained VNFs in vCPE are no different.  They have to connect to other things in the chain, they have to connect to the real access and network ports of the service they’re integrated with, and they have to connect to the NFV management and operations processes.  If there are “private” VNF Managers (VNFMs) involved, then there may also have to be links to the resource/infrastructure management processes.  In short, there could be a lot of APIs, like a puzzle piece with a lot of bays and peninsulas that have to fit the pieces around it.

All of these are written into the virtual function software, meaning that the program expects a specific type of API that’s connected in a specific way.  vCPE, because of the multiplicity of things it might run on and the multiplicity of ways it might be composed into services, probably generates more variability in terms of these bays and peninsulas than any other VNF mission.  For service chaining and vCPE VNFs to be open and portable, we’d have to standardize all these APIs, or every different framework for vCPE deployment would demand a different version of VNF software.
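
To make the point concrete, here’s a rough sketch (in Python, with every name invented for illustration; nothing below comes from the ETSI NFV specs or any vendor API) of what a standardized set of VNF-facing connection points might look like if we did define it:

```python
# Hypothetical sketch only: a standardized set of connection points a chained
# VNF could rely on.  All class and method names are invented for illustration.
from abc import ABC, abstractmethod


class VnfConnectionPoints(ABC):
    """The 'bays and peninsulas' a chained VNF expects the platform to provide."""

    @abstractmethod
    def bind_port_side(self, access_endpoint: str) -> None:
        """Attach to the user/port side of the service."""

    @abstractmethod
    def bind_trunk_side(self, network_endpoint: str) -> None:
        """Attach to the network/trunk side of the service."""

    @abstractmethod
    def bind_chain_neighbor(self, neighbor_endpoint: str, direction: str) -> None:
        """Attach to the previous or next VNF in the service chain."""

    @abstractmethod
    def bind_management(self, vnfm_endpoint: str) -> None:
        """Attach to the NFV management and operations processes."""
```

If VNFs coded to one interface like this, each new hosting environment would mean implementing the bindings once per platform rather than once per VNF.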

One of these bay/peninsula combinations represents the port-side (user) and trunk-side (network) connections for the service.  Obviously the user has to be a part of a service address space that allows a given site/user to communicate with others that share the service (VPN or VLAN, for example).  These are public addresses, visible to the service users.  But do we want that visibility for the interior chaining connections, for the management pathways, for the resource connections?  Such visibility would pose a serious risk to service stability and security.  So we have to define address spaces, and keep stuff separate, to make vCPE work.
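
As a toy illustration of what “keeping stuff separate” means in practice, here’s a Python fragment (addresses and names are made up) that defines disjoint service, chaining, and management address spaces and refuses to proceed if any two overlap:

```python
# Illustrative only: made-up ranges for the three address spaces a vCPE
# deployment has to keep separate.
from ipaddress import ip_network
from itertools import combinations

ADDRESS_SPACES = {
    "service":    ip_network("10.1.0.0/16"),       # visible to service users (VPN/VLAN)
    "chaining":   ip_network("192.168.100.0/24"),  # interior links between chained VNFs
    "management": ip_network("172.31.0.0/16"),     # VNFM and operations pathways
}

# Leakage between these spaces is exactly the stability and security risk
# described above, so overlap should be treated as a deployment error.
for (name_a, net_a), (name_b, net_b) in combinations(ADDRESS_SPACES.items(), 2):
    if net_a.overlaps(net_b):
        raise ValueError(f"address spaces {name_a} and {name_b} overlap")
```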

None of these issues are unique to vCPE, but vCPE surely has to have them resolved if it’s to succeed on a large scale.  There are other issues that are largely unique to vCPE, at least in one specific form, and some of these are also potentially troubling.

One example is the question of the reliability of a service chain through an insertion or deletion.  We talk about a benefit of vCPE as being the ability to support no-truck-roll in-service additions of features like firewall.  The challenge is that these features would have to be inserted into the data path.  How does one reroute a data path without impacting the data?  And it’s not just a matter of switching a new function into a chain—the things both in front of and behind that point have to be connected to the new thing.  A tunnel that once terminated in a user demarcation might now have to terminate in a firewall function.  And if you take it out, how do you know whether there’s data in-flight along the path you’ve just broken down?
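
One way to reason about insertion is as a make-before-break sequence: build the new segments, drain or quiesce the old path, splice, then tear down.  The sketch below models the chain as a simple list; it’s a thought experiment, not any orchestrator’s real API.

```python
# Hypothetical make-before-break insertion of a feature into a service chain.
# The chain is just an ordered list of hop names; nothing here maps to a real
# orchestrator or data plane.

def insert_feature(chain, feature, position):
    """Insert `feature` between chain[position - 1] and chain[position]."""
    upstream, downstream = chain[position - 1], chain[position]

    # 1. "Make": build the new tunnel segments before touching the live path.
    new_segments = [(upstream, feature), (feature, downstream)]

    # 2. Drain: the old segment must be quiesced so no traffic is in flight at
    #    cutover, or a brief loss window accepted if the service tolerates it.
    old_segment = (upstream, downstream)

    # 3. "Break": switch forwarding to the new segments, then retire the old one.
    chain.insert(position, feature)
    return old_segment, new_segments


chain = ["user-demarc", "trunk"]
retired, added = insert_feature(chain, "vFirewall", 1)
print(chain)    # ['user-demarc', 'vFirewall', 'trunk']
print(retired)  # ('user-demarc', 'trunk') -- the tunnel that must be drained
```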

Another example is on the management side.  We have a dozen or more different implementations of most of the vCPE features.  Does the user of the service have to change their management practices if we connect in a different implementation?  Sometimes even different versions of the same virtual function could present different features, and thus require a different management interface.  If we’re updating functions without truck rolls, does the user even know this happened?  How would they then adapt their management tools and practices?

Staying with management, most CPE is customer-managed.  A virtual function in the middle of a service chain has to be manageable by the user to the extent that the function would normally be so managed.  If I can tune my firewall when it’s a real device, I have to be able to do the same thing, if I want to, when I have a virtual firewall.  But can I offer that without admitting the user to the management space where all manner of other things (mostly damaging) might be done?  Do I have to have both “public management” and “private management” ports to address the dualistic notion of CPE management?
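
A crude sketch of what that dual-port idea could look like, with parameter names invented for the example: the customer-facing port accepts only a whitelisted set of tunables, while everything else stays on the operator’s private side.

```python
# Illustrative "public" versus "private" management scoping for a virtual
# firewall.  Parameter names are invented for the example.

CUSTOMER_TUNABLE = {"block_list", "port_forwarding", "logging_level"}    # public port
OPERATOR_ONLY    = {"chain_position", "host_affinity", "vnfm_endpoint"}  # private port


def customer_set(vnf_config: dict, parameter: str, value) -> None:
    """Apply a change arriving on the public management port."""
    if parameter not in CUSTOMER_TUNABLE:
        raise PermissionError(f"'{parameter}' is not customer-manageable")
    vnf_config[parameter] = value


config = {"block_list": [], "logging_level": "info", "chain_position": 2}
customer_set(config, "logging_level", "debug")   # allowed
# customer_set(config, "chain_position", 3)      # would raise PermissionError
```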

These problems aren’t insurmountable, or even necessarily difficult, but they don’t solve themselves.  The fact that we can tune VNFs to work in a given environment is nice, but if we have to tune every VNF/environment combination it’s hard to see how we’re going to gain much agility or efficiency.

How have we gotten this far without addressing even these basic points?  Most of the work done with vCPE has either been focused on a specific partnership between hosting and VNF vendors (where the problem of multiple implementations is moot) or it’s been focused on simply deploying via OpenStack, which tends to expose management and operations processes at the infrastructure level to VNFs and VNFMs.  That breaks any real hope of interoperability or portability.

You can’t have an open, portable, model of VNFs if every VNF decides how it’s going to be deployed and managed.  At the least, this approach would demand that a service architect or customer would have to understand the specific needs of a VNF and adapt scripts or diddle with management interfaces just to connect things in a service chain.  A service catalog would be either dependent on a single vendor ecosystem where everything was selected because it made a compatible connection choice, or simply a list of things that couldn’t hope to drive automated assembly or management.

I suggested in a prior blog that we establish a fixed set of APIs that VNFs could expect to have available, creating a VNF platform-as-a-service programming environment.  The key requirement for VNFPaaS is abstraction of the features represented by each interface/API.  It’s not enough to say that a VNF or VNF Manager can access a MIB.  If the VNFM has to be parameterized or modified to work with a given MIB, then we’ve created a brittle connection that any significant variation in either element will break, and we’re done with any notion of assembling services or deploying new features on demand.
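
As a thought experiment (every name below is invented), the VNFPaaS abstraction might look like a feature-level interface that each implementation binds to, so the VNFM never touches a vendor-specific MIB directly:

```python
# Hypothetical feature-level abstraction: the VNFM codes against
# FirewallManagement, and each vendor's MIB or API hides behind a binding.
from abc import ABC, abstractmethod


class FirewallManagement(ABC):
    """Abstract 'firewall' feature as a VNFPaaS interface (names invented)."""

    @abstractmethod
    def health(self) -> str: ...

    @abstractmethod
    def set_rule(self, rule: str) -> None: ...


class VendorXFirewall(FirewallManagement):
    """One vendor's binding; internally it would talk that vendor's MIB/API."""

    def health(self) -> str:
        return "up"      # a real binding would poll the vendor-specific MIB here

    def set_rule(self, rule: str) -> None:
        pass             # a real binding would push the rule via the vendor's API


def vnfm_check(firewall: FirewallManagement) -> bool:
    # The VNFM sees only the abstract interface, so swapping VendorXFirewall
    # for another binding changes nothing on this side.
    return firewall.health() == "up"


print(vnfm_check(VendorXFirewall()))  # True
```

The point isn’t the code; it’s that the abstraction, not the MIB, is what gets standardized.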

The current PoCs and trials have identified where these VNFPaaS features are needed, and with a little thinking they’d provide all the insight we need to define them and start them moving toward specification and standardization.  This is an achievable technical goal for the NFV ISG and also for OPNFV, and one both bodies should take up explicitly.  A related goal, which I’ll cover in a blog next week, is to define a similar abstraction for the Virtual Infrastructure Manager’s northbound interface.  If those two things were nailed down we could make major progress in advancing the cause of an open, optimal, model for NFV.