Can NFV and SDN Standards Learn from the Market?

I’ve commented in a number of my blogs that the standards processes for both SDN and NFV have failed to address their respective issues to full effect.  The result is that we’re not thinking about either technology in the optimum way, and we’re at risk of under-shooting the opportunities both could generate.  There are some lessons on what the right way might be out there in the market, too.

One of the most interesting aspects of this swing-and-miss is that the problem may well come simply from the fact that we have “standards processes for both SDN and NFV.”  There’s more symbiosis between the two than many think, and it may be true that neither can succeed without the other.  There’s some evidence of this basic truth in, of all things, the way that OTT giants Amazon and Google have approached the problem.

Amazon and Google are both cloud providers, and both have the challenge of building applications for customers in a protected multi-tenant way.  That sounds a lot like the control and feature hosting frameworks of SDN and NFV.  When the two cloud giants conceptualized their model, their first step was to visualize components of applications running in a virtual network, which they presumed would be an IP subnet based on an RFC 1918 address space.

RFC 1918 is a standard that sets aside some IP address blocks for “private” or internal use.  These blocks (one Class A network, 10.0.0.0/8; 16 Class B networks, 172.16.0.0/12; and 256 Class C networks, 192.168.0.0/16) are not routed on public networks, and so can’t be accessed from the outside except through NAT.  The presumption of both Amazon and Google is that you build complexes of components into apps or services within a private address space and expose (via NAT) only the addresses that should be accessible from the outside.
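To make the model concrete, here’s a minimal sketch in Python using the standard ipaddress module.  The component names, addresses, and NAT mapping are invented for illustration; the point is simply that service components live in RFC 1918 space and only what’s deliberately NAT-mapped is reachable from outside.

    import ipaddress

    # The three RFC 1918 blocks referenced above.
    RFC1918_BLOCKS = [
        ipaddress.ip_network("10.0.0.0/8"),      # one Class A network
        ipaddress.ip_network("172.16.0.0/12"),   # 16 Class B networks
        ipaddress.ip_network("192.168.0.0/16"),  # 256 Class C networks
    ]

    def is_private(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in block for block in RFC1918_BLOCKS)

    # Hypothetical service: two components in private space, one NAT-exposed address.
    components = {
        "firewall-vnf": "10.0.1.10",
        "dpi-vnf": "10.0.1.11",
        "service-gateway": "203.0.113.5",   # the only externally routable address
    }
    nat_map = {"203.0.113.5:443": "10.0.1.10:443"}  # expose only the firewall's HTTPS port

    for name, addr in components.items():
        print(name, addr, "private" if is_private(addr) else "publicly routable")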

Logically this should have been done for management/control in both SDN and NFV, and NFV in particular should have established this private virtual network model for the hosting of VNFs.  The notion of “forwarding graphs” that’s crept into NFV is IMHO an unnecessary distraction from a basic truth that major cloud vendors have accepted from the first.

OpenStack, which most NFV implementations use, has also accepted this model in a general sense; cloud applications are normally deployed on a subnet (via Neutron) and exposed through a gateway.  Within such a subnet or private virtual network, application components communicate as they like.  You could still provision tunnels between components where the relationship between elements was determined by provisioning rather than by how the elements functioned, of course, but in most cases complex topologies would be created not by defining them explicitly but by how the components of an application/service naturally interrelated.
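For reference, this is roughly what the pattern looks like through the OpenStack SDK.  Treat it as a sketch rather than a recipe; the cloud name, network names, and CIDR are placeholders I’ve chosen for illustration.

    import openstack

    conn = openstack.connect(cloud="mycloud")            # placeholder cloud name

    # A private virtual network and RFC 1918 subnet for the service's components.
    net = conn.network.create_network(name="vnf-private")
    subnet = conn.network.create_subnet(
        network_id=net.id, name="vnf-subnet", ip_version=4, cidr="10.0.1.0/24"
    )

    # A router acts as the gateway to the provider's external network; only
    # addresses deliberately exposed (e.g. via floating IPs) are reachable.
    ext = conn.network.find_network("public")            # placeholder external network
    router = conn.network.create_router(
        name="vnf-gateway", external_gateway_info={"network_id": ext.id}
    )
    conn.network.add_interface_to_router(router, subnet_id=subnet.id)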

A service/application model like the private virtual network model of Amazon and Google could provide a framework in which security, compliance, and management issues could be considered more effectively.  When you create a “VNF” and “host” it, it has to be addressable, and how that happens will both set your risk profile and expose your connectivity requirements.  For example, if you expect a VNFM component of a VNF to access resource information about its own platforms, you’d have to cross over the NAT boundary with that data—twice perhaps if you assume that your resources were all protected in private address spaces too.  This is exactly the situation Google describes in detail in its Andromeda network virtualization approach.
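A toy trace makes the “twice” concrete.  Everything here (the address spaces, NAT mappings, and ports) is made up; the point is only that telemetry leaving one private space and entering another crosses a NAT boundary on each side.

    # Resource domain NAT: resource-private address -> publicly routable address
    resource_nat = {"10.2.0.5:9100": "198.51.100.7:9100"}
    # Service domain NAT: publicly exposed address -> VNFM's private address
    service_nat = {"203.0.113.5:8080": "10.1.0.9:8080"}

    def telemetry_path():
        src_private = "10.2.0.5:9100"             # telemetry source in the resource space
        src_public = resource_nat[src_private]    # first crossing: out of the resource space
        dst_public = "203.0.113.5:8080"           # the VNFM endpoint as seen from outside
        dst_private = service_nat[dst_public]     # second crossing: into the service space
        return [src_private, src_public, dst_public, dst_private]

    print(" -> ".join(telemetry_path()))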

Another lesson to be learned is the strategy for resource independence.  Amazon and Google abstract infrastructure through a control layer, so that hosting and connection resources appear homogeneous.  New resources get presented to the cloud as a collection of resources fronted by a control agent that manages the abstraction-to-reality connection.  NFV doesn’t quite do that.

In NFV, we have four issues with resource abstraction:

  1. A Virtual Infrastructure Manager (VIM) is only now evolving into a general “Infrastructure Manager” model that admits anything into the orchestration/management mix, not just virtualized stuff. Everyone in the operator space has long realized that you need to mix virtual and real resources in deployments, so that generalization is critical.
  2. In the ETSI model, a VIM/IM is a part of MANO, when logically the VIM/IM and the NFVI components it represents should be a combined plug-and-play unit. Anyone who offers NFVI should be able to pair their offering with an IM that supports some set of abstractions, and the result should be equivalent to anything else that offers those abstractions.
  3. You can’t have resource abstraction without a specific definition of the abstractions you expect to support. If a given “offering” has hosting capability, then it has to define some specific virtual-host abstraction and map that to VMs or containers or bare metal as appropriate.  We should have a set of required abstractions at this point, and we don’t; a sketch of what one such abstraction might look like follows this list.
  4. You can’t manage below an abstraction unless you manage through that abstraction, meaning that management of the abstraction has to decompose into (and be composed from) management of what’s underneath, of what realizes it. Unless you want to assume that actual resources are forever opaque to NFV management and that the resource pool is managed independently, without service correlation, you need to exercise management of realized abstractions through the element that realizes them, the IM.  The current ETSI model doesn’t make that clear.
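Here’s a hypothetical sketch, in Python, of what points 2 through 4 suggest: an Infrastructure Manager as a plug-and-play unit that realizes a small, defined set of abstractions and lets management be exercised through them.  None of the class or method names here come from the ETSI documents; they’re invented for illustration.

    from abc import ABC, abstractmethod

    class VirtualHost:
        """What the service layer sees: a place to run a VNF component."""
        def __init__(self, host_id, address):
            self.host_id = host_id
            self.address = address

    class InfrastructureManager(ABC):
        """Paired with the NFVI it represents; exposes abstractions, hides realization."""
        @abstractmethod
        def create_virtual_host(self, cpus, memory_gb):
            ...
        @abstractmethod
        def get_health(self, host):
            """Management exercised *through* the abstraction (point 4)."""
            ...

    class VmInfrastructureManager(InfrastructureManager):
        def create_virtual_host(self, cpus, memory_gb):
            # In reality this would ask a VIM (e.g., OpenStack) for a VM.
            return VirtualHost("vm-001", "10.0.1.10")
        def get_health(self, host):
            return "ok"   # would correlate VM/hypervisor state back to the service

    class BareMetalInfrastructureManager(InfrastructureManager):
        def create_virtual_host(self, cpus, memory_gb):
            # The same abstraction realized on bare metal or even legacy gear.
            return VirtualHost("metal-007", "10.0.2.20")
        def get_health(self, host):
            return "ok"

    # Orchestration sees only the abstraction, so any NFVI-plus-IM pair that
    # supports it is equivalent and plug-and-play (point 2).
    for im in (VmInfrastructureManager(), BareMetalInfrastructureManager()):
        host = im.create_virtual_host(cpus=4, memory_gb=8)
        print(type(im).__name__, host.host_id, im.get_health(host))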

Google’s Andromeda, in particular, seems to derive NFV and SDN requirements from a common mission.  What would Andromeda say about SDN, for example?  It’s an abstraction for the execution of network-as-a-service (NaaS).  It’s also a mechanism for building those private virtual networks.

There are some things NFV could learn from other sources, including the TMF.  I’ve long been a fan of the “NGOSS Contract” approach to management, where events are directed to management processes through the intermediation of the service contract.  That should have been a fundamental principle for virtualization management from the first.  The TMF also has a solution for defining the myriad of service characteristics without creating an explosion of variables that threatens to bloat all of the parameter files.  IMHO, the ETSI work is at risk of exactly that right now.
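A toy example of the NGOSS-Contract idea: the contract itself carries the state/event-to-process mapping, so operations processes are selected through the data model rather than being hard-wired into code.  The states, events, and process names below are invented.

    # The contract carries the mapping from (state, event) to operations process.
    contract = {
        "service_id": "svc-123",
        "state": "active",
        "event_map": {
            ("active", "sla_violation"): "scale_out_process",
            ("active", "host_failure"): "redeploy_process",
            ("deploying", "host_failure"): "rollback_process",
        },
    }

    def handle_event(contract, event):
        key = (contract["state"], event)
        # A real system would dispatch to the named process; here we just return it.
        return contract["event_map"].get(key, "manual_review_process")

    print(handle_event(contract, "sla_violation"))   # -> scale_out_process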

For quite a while the TMF data model (SID) has supported the use of “Characteristics,” which are dynamic, run-time assignments of variables rather than fixed, named attributes.  It should be possible, using the TMF approach, to define resource- or service-specific variables and pass them along without making specific by-name accommodation in the model.  What’s required is consistency in the production and consumption of the variables, which is needed in any case.
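In code terms, the Characteristics pattern amounts to name/value pairs carried by the entity instead of fixed, by-name fields in the model.  This little sketch (with invented characteristic names) shows the idea; the discipline that matters is that producers and consumers agree on the names.

    class ServiceEntity:
        """An entity that carries dynamic Characteristics rather than fixed fields."""
        def __init__(self, name):
            self.name = name
            self.characteristics = {}   # name/value pairs assigned at run time

        def set_characteristic(self, name, value):
            self.characteristics[name] = value

        def get_characteristic(self, name, default=None):
            return self.characteristics.get(name, default)

    vfw = ServiceEntity("virtual-firewall")
    vfw.set_characteristic("throughputMbps", "500")       # producer writes by agreed name...
    vfw.set_characteristic("haMode", "active-standby")
    print(vfw.get_characteristic("throughputMbps"))       # ...consumer reads by the same name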

I think there are plenty of operators who agree these capabilities are needed, at least in some form.  I don’t think they’re going to get them from the NFV ISG, because standards groups in any form are dominated by vendors: there are more of them, and they have more money to spend.  They’re not going to get them from the OPNFV effort either, because open-source projects are dominated by contributors (who are most likely vendors, for all the reasons just cited).

The New IP Agency might help here by shining a light on the difference between where we are in NFV and SDN and where we need to be.  Most likely, as I’ve said before, vendors will end up driving beneficial changes as well as being barriers.  Some vendors, particularly computer vendors, have nothing to lose in a transition to virtual technologies in networking.  While the incumbent equipment vendors surely do, they can’t sit on their hands if there are other vendors who are happy (for opportunistic reasons) to push things forward.

In some way or another, all these points are going to have to be raised.  They should have been considered earlier, of course, and the price of that omission is that some of the current stuff is sub-optimal to the point where it may have to be modified to be useful.  I think standards groups globally should explore the lessons of SDN and NFV and recognize that we have to write standards in a different way if we’re to gain any benefit in a reasonable time.