Parcel Delivery Teaches NFV a Lesson

Here’s a riddle for you.  What do FedEx and NFV have in common?  Answer:  Maybe nothing, and that’s a problem.  A review of some NFV trials and implementations, and even some work in the NFV ISG, shows that we’re not always getting the “agility” we need, and for a single common reason.

I had to ship something yesterday, so I packed it up in a box I had handy and took it to the shipper.  I didn’t have a specialized box for this item.  When I got there, they took a measurement, weighed it, asked me for my insurance needs, and then labeled it and charged me.  Suppose that instead of this, shippers had a specific process with specific boxing and handling for every single type of item you’d ship.  Nobody would be able to afford shipping anything.

How is this related to NFV?  Well, logically speaking what we’d like to have in service creation for NFV is a simple process of providing a couple of parameters that define the service—like weight and measurements on a package—and from those invoke a standard process set.  If you look at how most NFV trials and implementations are defined, though, you have a highly specialized process to drive deployment and management.

Let me give an example.  Suppose we have a business service that’s based in part on some form of virtual CPE for DNS, DHCP, NAT, firewall, VPN, etc.  In some cases we host all the functions in the cloud and in others on premises.  Obviously part of deployment is to launch the necessary feature software as virtual network functions, parameterize them, and then activate the service.  The parameters needed by a given VNF and what’s needed to deploy it will vary depending on the software.  But this can’t be reflected in how the service is created or we’re shipping red hats in red boxes and white hats in white boxes.  Specialization will kill agility and efficiency.

What NFV needs is data-driven processes but also process-independent data.  The parameter string needed to set up VNF “A” doesn’t have to be defined as a set of fields in a data model.  In fact, it shouldn’t be, because any software guy knows that if you have a specific data structure for a specific function, the function has to be specialized to the structure.  VNF “A” has to understand its parameters, but the only things NFV software has to do are collect the variables the user can set and then send everything to the VNF.
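To make the point concrete, here is a minimal Python sketch of a deployment step that never interprets VNF parameters.  Everything in it is an assumption made for illustration: the `VnfDescriptor` class, the `deploy` function, the `send` callback and the example parameter names come from no NFV specification or product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class VnfDescriptor:
    """Hypothetical descriptor: the orchestrator sees key names, never meanings."""
    name: str
    user_settable: Tuple[str, ...]            # keys the user is allowed to supply
    defaults: Dict[str, Any] = field(default_factory=dict)


def build_parameters(descriptor: VnfDescriptor, user_values: Dict[str, Any]) -> Dict[str, Any]:
    """Merge defaults with user input without interpreting any individual key."""
    unknown = set(user_values) - set(descriptor.user_settable)
    if unknown:
        raise ValueError(f"not user-settable: {sorted(unknown)}")
    return {**descriptor.defaults, **user_values}


def deploy(descriptor: VnfDescriptor, user_values: Dict[str, Any],
           send: Callable[[str, Dict[str, Any]], Any]) -> Any:
    """The same generic process for every VNF: collect variables, pass them on."""
    return send(descriptor.name, build_parameters(descriptor, user_values))


# Usage: the parameter names below are made up; only the VNF would understand them.
firewall = VnfDescriptor("firewall-A", ("allowed_ports",), {"mode": "stateful"})
sent = []
deploy(firewall, {"allowed_ports": [22, 443]}, lambda name, params: sent.append((name, params)))
```

Adding a new VNF here means writing a new descriptor, not changing `deploy`.  That is the box-on-the-scale model: the process knows a handful of generic facts and nothing about the contents.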

The biggest reason why this important point is getting missed is that we are conflating two totally different notions of orchestration into one.  Any respectable process for building services or software works on a functional-black-box level.  If you want “routing” you insert something that provides the properties you’re looking for.  When that insertion has been made at a given point, you then have to instantiate the behavior in some way—by deploying a VNF that does virtual routing or by parameterizing something real that’s already there.  The assembly of functions like routing to make a service is one step, a step that an operator’s service architect would take today but that in the future might be supported even by a customer service portal.  The next step, marshaling the resources to make the function available, is another step and it has to be separated.

In an agile NFV world, we build services like we build web pages.  We have high-level navigation and we have lower-level navigation.  People building sites manipulate generic templates and these templates build pages with specific content as needed.  Just like we don’t have packages and shipping customized for every item, we don’t have a web template for every page, just for every different kind of page.  The templates are functional, in short.  We navigate by shifting among functions, so we are performing what could fairly be called “functional orchestration” of the pages.  When we hit a page we have to display it by decoding its instructions.  That’s “structural” orchestration.  The web browser and even the page-building software don’t have to know the difference between content pieces, only between different kinds of content handling.

I’ve been listening to a lot of discussions on how we’re going to support a given VNF in NFV.  Most often these discussions include a definition of all of the data elements needed.  Do we think we can go through this for every new feature or service and still be agile and efficient?  What would the Internet be like if every time a news article changed, we had to redefine the data model and change all the browsers in the world to handle the new data elements?

NFV has to start with the idea that you model services and you also model service instantiation and management.  You don’t write a program to do a VPN and change it to add a firewall or NAT.  You author a template to define a VPN, and you combine that with a “Firewall” or “NAT” template to add those features.  For each of these “functional templates” you have a series of “structural” ones that tell you, for a particular point in the network, how that function is to be realized.  NFV doesn’t have to know about the parameters or the specific data elements, only how to process the templates, just like a browser would.  Think of the functional templates as the web page and the structural ones as CSS element definitions.  You need both, but you separate them.

I’d love to be able to offer a list here of the NFV implementations that follow this approach, but I couldn’t create a comprehensive list because this level of detail is simply not offered by most vendors.  As far as I can determine from talking with operators, though, most implementations provide only the structural templates.  If we have only structural definitions then we have to push what should be “infrastructure details” back into services, because we have to define a service in part based on where we expect to deploy it.  If we have a legacy CPE element in City A and virtual CPE in City B, we’d have to define two different services and pick one based on the infrastructure of the city we’re deploying in.  Does this sound agile?  Especially when you consider that if we then deploy VCPE in City A, we have to change all the service definitions there.
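For contrast, here is a rough Python sketch of the City A and City B case when functional and structural templates are kept apart.  The template names, the dictionary layout and the `expand` and `realize` helpers are all hypothetical, chosen only to illustrate the separation.  They are not taken from TOSCA or from any vendor’s implementation.

```python
# Functional templates: what the service is, with no infrastructure detail.
# Capitalized names are templates; lowercase names are primitive functions.
FUNCTIONAL = {
    "VPN": ["routing"],
    "Firewall": ["firewall"],
    "NAT": ["nat"],
    "BusinessVPN": ["VPN", "Firewall", "NAT"],   # composed from other templates
}

# Structural templates: how each primitive function is realized at each location.
STRUCTURAL = {
    ("City A", "routing"): "legacy-cpe",
    ("City A", "firewall"): "legacy-cpe",
    ("City A", "nat"): "legacy-cpe",
    ("City B", "routing"): "vcpe-vnf",
    ("City B", "firewall"): "vcpe-vnf",
    ("City B", "nat"): "vcpe-vnf",
}


def expand(template, functional):
    """Flatten a functional template into the primitive functions it needs."""
    primitives = []
    for item in functional[template]:
        if item in functional:
            primitives.extend(expand(item, functional))
        else:
            primitives.append(item)
    return primitives


def realize(service, location, functional, structural):
    """Pick a realization for every primitive function at one location."""
    return {fn: structural[(location, fn)] for fn in expand(service, functional)}


plan_a = realize("BusinessVPN", "City A", FUNCTIONAL, STRUCTURAL)
plan_b = realize("BusinessVPN", "City B", FUNCTIONAL, STRUCTURAL)

# City A later gets virtual CPE: only the structural entries change.
upgraded = {**STRUCTURAL,
            **{("City A", fn): "vcpe-vnf" for fn in ("routing", "firewall", "nat")}}
plan_a_after = realize("BusinessVPN", "City A", FUNCTIONAL, upgraded)
```

There is one “BusinessVPN” definition for both cities, and upgrading City A never touches it.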

Then there’s management.  How do we manage our “CPE” or “VCPE”?  Do we have to define a different data model for every edge router, for every implementation of a virtual router?  If we change a user from one to the other, do all the management practices change, both for the NOC and for the user?
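One way to picture the alternative is to let each realization translate its native state into a small set of common fields.  The NOC and the user then see the same “edge function” whichever device sits behind it.  The Python sketch below assumes made-up adapter names, native field names and a two-field common view.  It illustrates the idea and does not describe any real management interface.

```python
# Each realization supplies an adapter from its native state to common fields.
# The native field names here are illustrative assumptions.

def legacy_cpe_adapter(native):
    """Map a physical edge router's native readings to the common view."""
    return {"up": native["ifOperStatus"] == 1, "load_pct": native["cpu"]}


def vcpe_adapter(native):
    """Map a hosted virtual router's native readings to the same view."""
    return {"up": native["vm_state"] == "ACTIVE",
            "load_pct": round(native["vcpu_util"] * 100)}


ADAPTERS = {"legacy-cpe": legacy_cpe_adapter, "vcpe-vnf": vcpe_adapter}


def edge_status(realization, native):
    """Management practices are written against this view only."""
    return ADAPTERS[realization](native)


before = edge_status("legacy-cpe", {"ifOperStatus": 1, "cpu": 50})
after = edge_status("vcpe-vnf", {"vm_state": "ACTIVE", "vcpu_util": 0.5})
```

Moving a user from one realization to the other changes which adapter runs, not what the NOC or the user looks at.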

This is silly, people.  Not only that, it’s unworkable.  We built the web around a markup language.  We have service markup languages now; the Unified Service Description Language, or USDL, is an example.  We have template-based approaches to modeling structures and functions, in TOSCA and elsewhere.  We need to have these in NFV too, which means that we have to work somewhere (perhaps in OPNFV) on getting that structure in place, and we have to start demanding that vendors explain how their “NFV” actually works.  Otherwise we should assume it doesn’t.