Can We “Open” NFV or Test Its Interoperability? We May Find Out.

I suspect that almost everyone involved in NFV would agree that it’s a work in progress.  Operators I’ve talked with through the entire NFV cycle—from the Call for Action white paper in the fall of 2012 to today—exhibit a mixture of hope and frustration.  The top question these operators ask today is how the NFV business case can be made, but the top technical question they have is about interoperability.

Interoperability has come to the fore again this week because of a Light Reading invitation to vendors to submit their NFV products to the EANTC lab for testing, and because a startup that promises an open-source, interoperable NFV came out of stealth.  I say “come to the fore again” because interoperability has been an issue for operators from the first.

Everyone wants interoperability, in no small part because it is seen as a means of preventing vendor lock-in.  NFV is a combination of three functional elements—virtual network functions, NFV infrastructure, and management and orchestration—and there’s long been a fear among operators that vendors would field their own proprietary trio and create “NFV silos” that would impose different management requirements, demand different infrastructure, and even support only specific VNFs.

That risk can’t be dismissed either.  The ETSI NFV ISG hasn’t defined its interfaces and data models in sufficient detail (in my view, and in the view of many operators) to allow unambiguous harmony in implementation.  We do have trials underway that integrate vendor offerings, but operators tell me that the integration mechanisms aren’t common across the trials, and so won’t assure interoperability in the broad community of NFV players.  What’s needed to create it is best understood by addressing those three NFV functional elements one at a time.

VNFs are the foundation of NFV because without them you have no functions to host and no way to generate benefits.  A VNF is essentially a cloud application that’s written to be deployed and managed in some way and to expose some set of external interfaces.  There are two essential VNF properties to define for interoperability: the external interfaces themselves, and the way the VNF’s lifecycle is managed.

A real device typically has some addresses that represent its data, control, and management interfaces.  These interfaces speak the specific language of the device, and so to make them work we have to “connect” to them with a partner that understands that language.  We have to match protocols in the data path, and we have to support control and management features through those interfaces.  Rather than define a specific standard for the management side, NFV has presumed that a “VNF Manager” would be bound with the VNFs to control their lifecycle.  VNFMs know how to set up, parameterize, and scale VNFs.
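To make that concrete, here’s a minimal Python sketch of a VNF with its three interface types and a VNFM that drives its lifecycle.  The ETSI documents describe these roles but don’t pin down a concrete API at this level, so every class and method name below is my own assumption, for illustration only.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the ETSI documents describe these roles but
# don't define a concrete API, so every name here is an assumption.

@dataclass
class VNFInterfaces:
    """The addresses a VNF exposes, mirroring a real device's ports."""
    data: str        # packet-path endpoint; must match the data-plane protocol
    control: str     # control-plane peer address (routing, signaling)
    management: str  # where lifecycle and configuration commands arrive

@dataclass
class VNF:
    name: str
    interfaces: VNFInterfaces
    parameters: dict = field(default_factory=dict)

class VNFManager:
    """Bound to a VNF to drive its lifecycle, per the ETSI VNFM role."""

    def set_up(self, vnf: VNF) -> None:
        ...  # obtain hosting, attach the three interface types

    def parameterize(self, vnf: VNF, params: dict) -> None:
        ...  # push configuration through the management interface

    def scale(self, vnf: VNF, instances: int) -> None:
        ...  # add or remove component instances per the VNF's rules
```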

One thing this means is that VNFMs are kind of caught between two worlds—they are on one hand a part of a VNF and on the other hand a part of the management process.  If you look at the implementations of NFV today, most VNFMs have rather specific interfaces to the management systems and resource management tools of the vendors who create the NFV platform.  That’s not unexpected, but it means that it’s going to be difficult to make a VNF portable unless the VNFM is portable, and that’s difficult if it’s hooked to specific vendor tools.

The other hidden truth here is that if a VNFM drives lifecycle management, then the VNFM knows the rules for things like scaling, event-handling for faults, and so forth.  It also obviously has to know the service model—the way that all the components in a VNF are connected and what has to be redone if a given component is scaled out and scaled in.  If this knowledge exists “inside” the VNFM then the VNFM is the only thing that knows what the configuration of a service is, which means that if you can’t port the VNFM you can’t port anything.
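A hypothetical sketch makes the trap visible: imagine the service model living as private state inside a vendor’s VNFM, which is roughly what today’s implementations do.  Nothing in the specs requires that model to be externalized, and that’s exactly the problem.

```python
# Hypothetical sketch: the service topology and scaling rules held as
# private state inside a vendor's VNFM. If only the VNFM can read this
# model, porting the VNF without porting the VNFM loses the service's
# configuration knowledge entirely.

class VendorVNFM:
    def __init__(self) -> None:
        self._service_model = {
            # how the VNF's internal components are connected...
            "components": ["load_balancer", "dpi", "firewall"],
            "links": [("load_balancer", "dpi"), ("dpi", "firewall")],
            # ...and the rules for scaling them out and back in
            "scaling": {"dpi": {"min": 1, "max": 8, "scale_out_on": "cpu > 80"}},
        }

    def handle_event(self, component: str, event: str) -> None:
        # Fault and load events are interpreted against the private model;
        # nothing standard publishes the resulting configuration upward.
        rules = self._service_model["scaling"].get(component)
        if event == "overload" and rules:
            ...  # scale out, then rewire per self._service_model["links"]
```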

The second critical interoperability issue is the NFV Infrastructure piece.  You’d want to be able to host VNFs on the best available resource, both in terms of resource capacity planning (cheap commodity servers versus ones with special data-plane features) and in terms of picking a specific hosting point to optimize performance and cost during deployment.  Infrastructure has to be abstracted to make this work, so that you give a command to an abstract hosting point and it places the VNF according to your deployment policies and the current state of resources.
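Here’s a rough sketch of that abstraction, with names I’ve invented for illustration: the caller asks for hosting against a policy, and the abstraction resolves the request against the current resource pool.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: an "abstract hosting point" that resolves a hosting
# request against deployment policies and current resource state. The
# policy fields are assumptions, not anything the specs define.

@dataclass
class Host:
    name: str
    has_data_plane_acceleration: bool
    cost_per_hour: float

@dataclass
class HostingPolicy:
    needs_data_plane_acceleration: bool = False
    max_cost_per_hour: Optional[float] = None

class AbstractHostingPoint:
    def __init__(self, resource_pool: list):
        self.resource_pool = resource_pool  # current state of candidate hosts

    def host(self, vnf_image: str, policy: HostingPolicy) -> Host:
        candidates = [
            h for h in self.resource_pool
            if (h.has_data_plane_acceleration
                or not policy.needs_data_plane_acceleration)
            and (policy.max_cost_per_hour is None
                 or h.cost_per_hour <= policy.max_cost_per_hour)
        ]
        # Cheapest qualifying host wins; a real policy set would weigh
        # performance, proximity, and operations cost as well.
        return min(candidates, key=lambda h: h.cost_per_hour)
```

The point isn’t the selection logic, which is trivial here; it’s that the caller never names a host, so any infrastructure that can satisfy the policy is interchangeable.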

It’s clear who does this—the Virtual Infrastructure Manager.  It’s not really clear how it works.  For example, if there are parameters to guide where a VNF is to be put (and there are), you’d either have to be able to pass these to the lower-level cloud management API (OpenStack Nova for example) to guide its process, or you’d have to apply your decision policies to infrastructure within the VIM (or higher up) and then tell the CMS API specifically where you wanted something put.  The first option is problematic because cloud deployment tools today don’t support the full range of NFV options, and the second is problematic because there’s no indication that resource topology and state information is ever published “upward” from the NFV Infrastructure to or through the VIM.
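The two paths can be sketched like this; every name below is hypothetical, and the comments flag the assumption each option depends on.

```python
# Sketch of the two wiring options; all names here are hypothetical.

# Option A: pass NFV placement parameters down and let the cloud
# management system's scheduler (e.g. OpenStack Nova) decide. This
# assumes the CMS understands NFV's full range of placement options,
# and today it generally doesn't.
def deploy_option_a(cms, vnf_image, nfv_placement_params):
    return cms.create_instance(vnf_image, hints=nfv_placement_params)

# Option B: the VIM (or something above it) applies the placement
# policy itself and tells the CMS exactly where to deploy. This assumes
# resource topology and state are published upward to the VIM, and the
# specs give no indication that they are.
def deploy_option_b(cms, vim, vnf_image, policy):
    target_host = vim.choose_host(policy)  # needs upward-visible state
    return cms.create_instance(vnf_image, on_host=target_host)
```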

If you read through the NFV specifications looking for the detail on these points, through the jaded eyes of a developer, you won’t find everything you need.  Thus, you can’t test for interoperability without making assumptions or leaving issues out.  Light Reading can, and I hope will, identify the specific things that we don’t have and need to have, but it won’t be able to apply a firm standard of interoperability that’s meaningful.

How about the startup?  The company is called “RIFT.io” and its product is “RIFT.ware”.  The details of what RIFT.ware includes and what it does are a bit vague (not surprising since the company just came out of stealth), but the intriguing quote from their website is “VNFs *built with* RIFT.ware feature the economics and scale of hyperscale data centers and the security and availability of Telco-grade network services.”  Note my italics here.  The implication is that RIFT.ware is a kind of VNFPaaS framework, something that offers a developer of VNFs a toolkit that would, when used, exercise all of the necessary deployment and lifecycle management features of NFV in a standard way.
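If that reading is right, a VNFPaaS would look, very roughly, like the sketch below.  To be clear, this is my guess at the shape of such a framework, not RIFT.ware’s actual API; everything here is invented for illustration.

```python
# Purely hypothetical sketch of what a "VNFPaaS" might look like -- this
# is NOT RIFT.ware's actual API, just the shape the quote implies. The
# developer writes service logic; the framework supplies deployment,
# scaling, and fault handling in one standard way.

class VNFPaaS:
    """The toolkit owns every lifecycle touchpoint, not a vendor's VNFM."""

    def register(self, vnf_class):
        # Decorator: from here on, the platform deploys, scales, and
        # heals instances of this class through its own uniform machinery.
        self._managed_class = vnf_class
        return vnf_class

platform = VNFPaaS()

@platform.register
class FirewallVNF:
    def on_packet(self, packet) -> None:
        ...  # only the service logic is the developer's problem

    def on_scale_out(self, instance_count: int) -> None:
        ...  # framework callback; no vendor-specific VNFM glue required
```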

I think that a PaaS for NFV is a great notion, and I’ve said so in blogs before.  However, there are obviously some questions here.  How would RIFT.io induce VNF vendors, particularly the big NFV players who also have infrastructure and MANO, to use its system?  Since there are no definitive ETSI specifications that you could force compliance with, could the big guys simply thumb their noses?  Another question is whether RIFT.ware could answer the “interoperability” questions within its own framework.  And if we don’t get conformance on RIFT.ware across the board, it becomes just another silo to be validated.

The final question here, which applies to both LR and RIFT.io, is that of the business case.  Interoperability doesn’t guarantee utility.  If NFV doesn’t reach far enough into legacy service infrastructure to operationalize things end to end, and if it doesn’t integrate with current OSS/BSS/NMS processes in a highly efficient way, then it doesn’t move the ball in terms of service agility and operations efficiency.  The ETSI spec doesn’t address legacy management and doesn’t address operations integration, at least not yet.

I’m hopeful that both of these activities will produce something useful, but for utility to emerge from either, we’ll have to address the omissions in the NFV concept as it’s currently specified.  My hope is that both will expose those omissions by trying to do something real and running into them, because that sort of demonstration may be the only way we get the gaps out in the open and dealt with.