How NFV Can Save Itself in 2018

Network Functions Virtualization (NFV) has generated a lot of buzz, but it became pretty clear last year that the bloom was off the rose in terms of coverage and operator commitment.  Does this mean that NFV was a bad idea?  Is all the work that was done irrelevant, or about to become so?  Are vendor and operator hopes for NFV about to be dashed for good?

NFV wasn’t a bad idea, and still isn’t, but the fulfillment of its potential is in doubt.  NFV is at a crossroads this year, because the industry is moving in a broader direction while the work of the ETSI NFV ISG is getting more and more detailed and narrow.  That downward direction collides more and more with established cloud elements, which makes the work redundant.  It has also opened a gap between the business case and the top-level NFV definitions, and initiatives like ONAP are now filling that gap and controlling deployment.

I’ve noted in many past blogs that the goal of efficient, agile, service lifecycle management can be achieved without transforming infrastructure at all, whether with SDN or NFV.  If we get far enough in service automation, we’ll achieve infrastructure independence, and that lets us stay the course with switches and routers (yes, probably increasingly white-box but still essentially legacy technology).  To succeed in this kind of world, NFV has to find its place, narrower than it could have been but not as narrow as it will end up being if nothing is done.

The first step for NFV is to hitch its wagon to the ONAP star.  The biggest mistake the ETSI NFV ISG made was limiting its scope to what was little more than how to deploy a cloud component that happened to be a piece of service functionality.  A new technology for network-building can never be justified by making it equivalent to the old ones.  It has to be better, and in fact a lot better.  The fact is that service lifecycle automation should have been the goal all along, but NFV’s scope couldn’t address it.  ONAP has a much broader scope, and while (as its own key technologists say) it’s a platform and not a product, the platform has the potential to cover all the essential pieces of service lifecycle automation.

NFV would fit into ONAP as a “controller” element, which means that NFV’s Management and Orchestration (MANO) and VNF Manager (VNFM) functions would be active on virtual-function hosting environments.  The rest of the service could be expected to be handled by some other controller, such as one handling SDN or even something interfacing with legacy NMS products.  Thus, ONAP picks up a big part of what NFV doesn’t handle with respect to lifecycle automation.  Even though it doesn’t do it all, ONAP at least relieves the NFV ISG of the requirements of working on a broader area.
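One way to picture that arrangement is as a simple dispatch, with a service-level platform handing each piece of a service to whichever controller owns that kind of resource.  The sketch below is purely illustrative: the class names, the `domain` labels, and the `deploy` method are hypothetical, and none of it reflects the actual interfaces of ONAP or the ETSI specifications.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ServiceElement:
    name: str
    domain: str  # hypothetical label: "vnf", "sdn", or "legacy"


class Controller(Protocol):
    def deploy(self, element: ServiceElement) -> str: ...


class NfvController:
    """Stands in for NFV MANO/VNFM acting on a function-hosting environment."""

    def deploy(self, element: ServiceElement) -> str:
        return f"NFV controller hosted {element.name}"


class SdnController:
    """Stands in for a controller handling SDN connectivity."""

    def deploy(self, element: ServiceElement) -> str:
        return f"SDN controller connected {element.name}"


class LegacyNmsController:
    """Stands in for an adapter in front of a legacy NMS product."""

    def deploy(self, element: ServiceElement) -> str:
        return f"Legacy NMS configured {element.name}"


class ServiceLifecyclePlatform:
    """Hypothetical service-level layer that owns the whole service."""

    def __init__(self, controllers: dict[str, Controller]) -> None:
        self._controllers = controllers

    def deploy_service(self, elements: list[ServiceElement]) -> list[str]:
        results = []
        for element in elements:
            # Each element goes to the controller registered for its domain.
            controller = self._controllers.get(element.domain)
            if controller is None:
                raise LookupError(f"no controller for domain {element.domain!r}")
            results.append(controller.deploy(element))
        return results


if __name__ == "__main__":
    platform = ServiceLifecyclePlatform({
        "vnf": NfvController(),
        "sdn": SdnController(),
        "legacy": LegacyNmsController(),
    })
    service = [
        ServiceElement("firewall", "vnf"),
        ServiceElement("vpn-path", "sdn"),
        ServiceElement("edge-router", "legacy"),
    ]
    for line in platform.deploy_service(service):
        print(line)
```

The point of the shape is that the NFV piece only sees the elements that are virtual functions; everything about the service as a whole lives one level up.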

The only objections to this step may come from vendors who want to push their own approaches, or from some operators who have alternative open-platform aspirations.  My advice to both groups is to get over it!  There can be only one big thrust forward at this point, and it’s ONAP or nothing.

The second step for NFV is probably going to get a lot of push-back from the NFV ISG.  That step is to forget a new orchestration and management architecture and focus on adapting cloud technology to the NFV mission.  A “virtual network function” is a cloud component, period.  To the greatest extent possible, deploying and sustaining VNFs should be managed the way it would be for any other cloud component.  To get to that point, we have to divide up the process of “deployment” into two elements, add a third for “sustaining”, and then fit NFV to each.

The first element is the actual hosting piece, which today is dominated by OpenStack for VMs or Docker for containers.  I’ve not seen convincing evidence that the same two elements wouldn’t work for basic NFV deployment.

The second element is orchestration, which in the cloud is typically addressed through DevOps products (Chef, Puppet, Heat, Ansible) and with containers through Kubernetes or Marathon.  Orchestration is about how to deploy systems of components, and so more work may be needed here to accommodate the policy-based automation of deployment of VNFs based on factors (like regulations) that don’t particularly impact the cloud at this point.  These factors should be input into cloud orchestration development, because many of them are likely to eventually matter to applications as much as to services.
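As a rough illustration of what policy-based deployment might add to ordinary cloud orchestration, the sketch below filters candidate hosting sites by a regulatory constraint before picking one.  Everything in it is hypothetical: the `Site` and `PlacementPolicy` types, the jurisdiction codes, the load threshold, and the least-loaded selection rule are assumptions made for the example, not features of Kubernetes, Heat, or any NFV specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    name: str
    jurisdiction: str  # assumed to be a country code
    load: float        # fraction of capacity in use, 0.0 to 1.0


@dataclass(frozen=True)
class PlacementPolicy:
    allowed_jurisdictions: frozenset[str]
    max_load: float = 0.8  # assumed ceiling; not taken from any standard


def choose_site(sites: list[Site], policy: PlacementPolicy) -> Optional[Site]:
    """Return the least-loaded site that satisfies the policy, or None."""
    eligible = [
        site for site in sites
        if site.jurisdiction in policy.allowed_jurisdictions
        and site.load <= policy.max_load
    ]
    return min(eligible, key=lambda site: site.load, default=None)


if __name__ == "__main__":
    sites = [
        Site("fra-1", "DE", 0.35),
        Site("nyc-1", "US", 0.10),  # lightly loaded, but wrong jurisdiction
        Site("ber-1", "DE", 0.90),  # right jurisdiction, but over the ceiling
    ]
    policy = PlacementPolicy(allowed_jurisdictions=frozenset({"DE", "FR"}))
    chosen = choose_site(sites, policy)
    print(chosen.name if chosen else "no eligible site")
```

Nothing about this is specific to network functions, which is the argument for feeding such requirements into cloud orchestration rather than building a separate mechanism.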

The final element is the management (VNFM) piece.  Cloud application management isn’t as organized a space as DevOps or cloud stacks, and while we have this modern notion of intent-modeled services, we don’t really have a specific paradigm for “intent model management”.  The NFV community could make a contribution here, but I think the work is more appropriately part of the scope of ONAP.  Thus, the NFV people should be promoting that vision within ONAP.

The next element on my to-do-for-NFV list is to think outside the virtual CPE.  NFV quickly got obsessed with the vCPE application, service chaining, and other things related to that concept.  This has, in my view, created a huge disconnect between NFV work and the things NFV will, in the long term, have to support to be successful.

The biggest problem with vCPE is that it doesn’t present even a credible benefit beyond business services.  You always need a box at the point of service termination, particularly for consumer broadband where WiFi hubs combine with broadband demarcations in virtually every case.  Thus, it’s difficult to say what you actually save through virtualization.  In most current vCPE business applications, you end up with a premises box that hosts functions, not cloud hosting.  That’s even more specialized as a business case, and it doesn’t drive the carrier cloud deployment that’s critical for the rest of NFV.

Service chaining is another boondoggle.  If you have five functions to run, there is actually little benefit in having the five separately hosted and linked in a chain.  You are now dependent on five different hosting points and all the connections between them, and if any one of those fails you get a service interruption.  Why not create a single image containing all five features?  If any of the five breaks, you lose the connection anyway.  Operations and hosting costs are lower for the five-combined strategy than the service-chain strategy.  Give it up, people!
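The availability side of that argument is easy to put in numbers.  The sketch below uses assumed figures (99.9% for a hosting point, 99.95% for a connection) purely for illustration; they are not measurements.  It also assumes failures are independent and that every host and every connection in the chain must be up for the service to work.  Failures of the function software itself are left out, since they hurt both designs equally.

```python
# Illustrative only: the availability figures are assumptions, not measurements.
HOST_AVAILABILITY = 0.999    # assumed availability of one hosting point
LINK_AVAILABILITY = 0.9995   # assumed availability of one inter-host connection


def chain_availability(functions: int,
                       host: float = HOST_AVAILABILITY,
                       link: float = LINK_AVAILABILITY) -> float:
    """Availability of `functions` separately hosted functions in series.

    Every host and every connection between adjacent hosts must be up,
    and failures are assumed to be independent.
    """
    if functions < 1:
        raise ValueError("need at least one function")
    return host ** functions * link ** (functions - 1)


def combined_availability(host: float = HOST_AVAILABILITY) -> float:
    """Availability of a single image holding all functions on one host."""
    return host


if __name__ == "__main__":
    hours_per_year = 24 * 365
    chained = chain_availability(5)
    combined = combined_availability()
    print(f"five-function chain:   {chained:.5f} "
          f"({(1 - chained) * hours_per_year:.1f} hours down per year)")
    print(f"single combined image: {combined:.5f} "
          f"({(1 - combined) * hours_per_year:.1f} hours down per year)")
```

Whatever figures you plug in, the chain can’t come out ahead under these assumptions, because each extra host and each extra connection multiplies in another factor below one.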

The beyond-vCPE issue is that of persistence and tenancy.  Many, perhaps even most, credible NFV applications are really multi-tenant elements that are installed once and sustained for a macro period.  Even most single-tenant NFV services are static for the life of the contract, and so in all cases they are really more like cloud applications than like dynamic service chains.  We need to have an exploration of how static and multi-tenant services are deployed and managed, because the focus has been elsewhere.

We have actually seen some successful examples of multi-tenant service elements in NFV already; Metaswitch’s implementation of IMS comes to mind.  The thing that sets these apart from “typical” NFV is that you have a piece of service, an interface, that has to be visible in multiple services at the same time.  There has to be some protection against contamination or malware for such services, but there also has to be coordination in managing the shared elements, lest one service end up working against the interests of others.

Nothing on this list would be impossible to do, many items wouldn’t even be difficult, and all are IMHO totally essential.  It’s not that a failure to address these points would cause NFV to fail as a concept, but that it could make NFV specifications irrelevant.  That would be a shame because a lot of good thinking and work has gone into the initiative to date.  The key now is to direct both past work and future efforts toward results that move the ball for the industry as a whole, not for NFV as an atomic activity.  That’s going to be a bitter pill for some, but it’s essential.