Lean NFV: Under the Covers

Last week at the Open Networking Summit, we got our first look at the concept of “Lean NFV”.  This initiative is the brainchild of two California academics, and they have an organization, a website, and a white paper.  Light Reading did a nice piece on it HERE, and in it they provide a longer list of those involved.  The piece in Fierce Telecom offers some insights from the main founders of the initiative, and I’d advise everyone interested in NFV to read all of these.

Anyone who’s read my blog since 2013 knows that I’ve had issues with the way the NFV ISG approached their task.  Thus, I would really be happy to see a successor effort that addressed the shortcomings of the current NFV model.  There are obviously shortcomings, because nearly everyone (including, obviously, the people involved in Lean NFV) agrees that NFV implementations have fallen far short of the original promise.  I’ve reviewed the white paper, and I have some major concerns.

My readers also know that from the first, I proposed an NFV implementation based on the TMF concept of NGOSS Contract, where a service data model was used to maintain state on each functional element of the service, and from that steer events to the appropriate processes.  This concept is over a decade old and remains the seminal work on service automation.  I applied it to the problem of automating the lifecycle management of services made up of both hosted resources and appliances in my ExperiaSphere work.  I’ve been frank from the first in saying that I don’t believe there’s any other way to do service lifecycle automation, and I’m being frank now in admitting that here I’m going to compare the Lean NFV approach both to the ETSI NFV ISG model and to the ExperiaSphere model.  Others, with other standards, are obviously free to make their own comparisons.
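To make the NGOSS Contract idea concrete, here is a minimal, hypothetical sketch of what state-keeping and event steering from a service data model could look like.  It is not ExperiaSphere or TMF code; the states, events, and handler names are my own inventions for illustration.

```python
# Hypothetical sketch of NGOSS-Contract-style event steering: each functional
# element of a service carries its own state, and a (state, event) table carried
# with the service model decides which lifecycle process handles an event.

def on_deploy(element, event):
    print(f"{element['name']}: deploying in response to {event}")
    element["state"] = "ACTIVE"

def on_fault(element, event):
    print(f"{element['name']}: remediating fault {event}")
    element["state"] = "DEGRADED"

def ignore(element, event):
    print(f"{element['name']}: {event} ignored in state {element['state']}")

# The "contract": state/event combinations mapped to lifecycle processes.
STATE_EVENT_TABLE = {
    ("ORDERED", "ACTIVATE"): on_deploy,
    ("ACTIVE", "FAULT"): on_fault,
    ("ACTIVE", "ACTIVATE"): ignore,
}

def steer(element, event):
    """Look up the handler for this element's current state and the event."""
    handler = STATE_EVENT_TABLE.get((element["state"], event), ignore)
    handler(element, event)

if __name__ == "__main__":
    firewall = {"name": "vFirewall", "state": "ORDERED"}
    steer(firewall, "ACTIVATE")   # ORDERED + ACTIVATE -> deploy process
    steer(firewall, "FAULT")      # ACTIVE + FAULT -> remediation process
```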

Many have said that the NFV ISG did it wrong, me surely among them.  In my own case, I have perhaps historical primacy on the point, since I responded to the 2012 white paper referenced in the quote below, cited the risks, and then told the ISG at the first US meeting in 2013 that they were getting it wrong.  The point is that we don’t need recognition of failure here; we need a path to success.  Has the Lean NFV initiative provided that?  Let’s look at the paper to see whether they correctly identified the faults of the initial effort and whether they’ve presented a timely and effective solution.

The opening paragraph of the white paper is something few would argue with, and surely not me: “All new technologies have growing pains, particularly those that promise revolutionary change, so the community has waited patiently for Network Function Virtualization (NFV) solutions to mature. However, the foundational NFV white paper is now over six years old, yet the promise of NFV remains largely unfulfilled, so it is time for a frank reassessment of our past efforts. To that end, this document describes why our efforts have failed, and then describes a new approach called Lean NFV that gives VNF vendors, orchestration developers, and network operators a clear path forward towards a more interoperable and innovative future for NFV.”

The “Where did we go wrong” section of the paper says that the NFV process is “drowning in complexity” because there are “so many components”.  I don’t think that’s the problem.  The ISG went wrong because it started at the bottom.  They had a goal—hosting functions to reduce capex by reducing the reliance on proprietary appliances.  A software guy (like me and many others) would start by taking that goal and dissecting it into specific presumptions that would have to be realized, then work down to the details of how to realize them.  The ISG instead created an “End-to-End Architecture” first, and then solicited proofs-of-concept based on the architecture.  That got implementations going before there was any concerted effort to design anything or defend the specific points that attaining the goal would have demanded.

The paper goes on to “What Is the Alternative”, which identifies three pieces of NFV, the “NFV Manager”, the “computational infrastructure” and the “VNFs”.  These three align in general with the NFV ISG’s concept of MANO/VNFM, the NFV Infrastructure (NFVi), and the Virtual Network Functions (VNFs).  It then proposes these three as the three places where integration should focus.  But didn’t the paper just do the very thing that I criticized the ISG for doing?  Where is the goal, the dissection of the goal into presumptions to be realized?  We start with what are essentially the same three pieces the ISG started with.

The paper proposes that the infrastructure manager be stripped down to “a core set of capabilities that are supported by all infrastructure managers, namely: provide NFV with computational resources, establish connectivity between them, and deliver packets that require NFV processing to these resources.”  This statement in itself is problematic in my view, because it conflates a deployment mission (providing the resources) with a run-time mission (delivering packets).  That would mean that either the Lean NFV equivalent of the Virtual Infrastructure Manager was instantiated for each service, or that all services passed through a common element.  Neither of these would be efficient.
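For contrast, here is a hypothetical sketch of what keeping the two missions apart might look like.  The class names and methods are illustrative assumptions of mine, not anything defined by the Lean NFV paper or the ETSI specs.

```python
# Hypothetical sketch of separating the deployment mission from the run-time
# mission.  Names are illustrative only, not from Lean NFV or the ETSI specs.

class ResourceProvisioner:
    """Deploy-time concern: allocate hosting and connectivity, then step aside."""
    def allocate(self, vnf_name, cpu, memory_gb):
        print(f"allocating {cpu} vCPU / {memory_gb} GB for {vnf_name}")
        return {"vnf": vnf_name, "host": "host-1", "address": "10.0.0.10"}

class PacketSteerer:
    """Run-time concern: steer traffic to deployed instances in the data path."""
    def __init__(self):
        self.routes = {}
    def attach(self, flow, placement):
        self.routes[flow] = placement["address"]
        print(f"steering flow {flow} to {placement['address']}")

if __name__ == "__main__":
    placement = ResourceProvisioner().allocate("vFirewall", cpu=2, memory_gb=4)
    PacketSteerer().attach("customer-42", placement)
```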

But moving on, the technical explanation for the Lean NFV approach starts with a diagram with three layers—the NFV Manager running a Compute Controller and also an SDN Controller.  Right here I see a reference to another disquieting linkage between Lean NFV and its less-than-illustrious predecessor—“Service Chains”.  In NFV ISG parlance, a service chain is a linear string of VNFs along a data path, used to implement a multifunctional piece of virtual CPE (vCPE).  According to the paper, declarative service chain definitions are passed to the lower layers, and to me that’s problematic.

A service chain is only one of many possible connection relationships among deployed VNFs, and as such should be a byproduct of a generalized connection modeling strategy and not a singular required relationship model.  vCPE isn’t even a compelling application of NFV, and if you look at the broader requirements for carrier cloud, you’d find that most applications would expect to be deployed as cloud application components already are—within an IP subnet.  You don’t need or even want to define “connections” between elements that are deployed in a fully connective framework.

In the cloud, we already do orchestration and connection by deploying components within a subnetwork that supports open connectivity.  That means that members are free to talk with each other in whatever relationship they like.  Service chaining should have been defined that way, because there’s no need to create tunnels between VNFs when all the VNFs can reach each other using IP or Ethernet addresses, and the tunnels would have to use those addresses anyway.
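Here is a minimal sketch of that point, assuming a simple address registry of my own invention: once components share a subnet, a “chain” is just one ordering read out of the model, not a set of tunnels that has to be separately provisioned.

```python
# Minimal, hypothetical sketch: components deployed into one subnet can reach
# each other by address, so a "service chain" is just an ordered view over the
# deployed set, not a set of tunnels to be built.
import ipaddress

SUBNET = ipaddress.ip_network("10.1.0.0/24")

def deploy(components):
    """Assign each component an address in the shared subnet (open connectivity)."""
    hosts = SUBNET.hosts()
    return {name: str(next(hosts)) for name in components}

def chain_view(addresses, ordered_names):
    """Derive a linear 'chain' as just one relationship over the deployed set."""
    return [(name, addresses[name]) for name in ordered_names]

if __name__ == "__main__":
    addrs = deploy(["classifier", "firewall", "nat", "monitor"])
    # Any component can talk to any other; a vCPE-style chain is one derived ordering.
    print(chain_view(addrs, ["classifier", "firewall", "nat"]))
```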

More troubling is the fact that the three-layer structure described here doesn’t indicate how the NFV Manager knows what the architecture of the service is, meaning the way that the elements are deployed and cooperate.  The fact that a service chain is called out suggests that there is no higher-level model structure that represents the deployment and lifecycle management instructions.  The only element in the paper that might represent that, what’s called the “key-value store”, isn’t offered in enough detail to understand what keys and values are being stored in the first place.  The Fierce Telecom story features commentary from one of the founders of Lean NFV on the concept, and it’s clear it’s important to them.

But what the heck is it?  The paper says “Finally, as mentioned before, the NFV manager itself should be architected as a collection of microcontrollers each of which addresses one or more specific aspect of management. These microcontrollers must coordinate only through the KV store, as shown in Figure 3 below (depicting an example set of microcontrollers). This approach improves modularity and simplifies innovation, as individual management features can be easily added or replaced.”
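Taking the quote at face value, here is a toy sketch of how microcontrollers might coordinate only through a KV store.  This is my interpretation of the paper’s description, not Lean NFV code; the key names and controllers are invented.

```python
# Toy sketch of my reading of the paper's KV-store coordination: microcontrollers
# never call each other, they only write keys and react to key changes.

class KVStore:
    def __init__(self):
        self.data = {}
        self.watchers = []
    def put(self, key, value):
        self.data[key] = value
        for callback in self.watchers:
            callback(key, value)
    def watch(self, callback):
        self.watchers.append(callback)

def placement_controller(kv):
    """Reacts to new VNF requests by writing a placement key."""
    def on_change(key, value):
        if key.startswith("vnf/requested/"):
            name = key.split("/")[-1]
            kv.put(f"vnf/placed/{name}", {"host": "host-1"})
    kv.watch(on_change)

def monitoring_controller(kv):
    """Reacts to placements by recording that monitoring is armed."""
    def on_change(key, value):
        if key.startswith("vnf/placed/"):
            name = key.split("/")[-1]
            kv.put(f"vnf/monitored/{name}", True)
    kv.watch(on_change)

if __name__ == "__main__":
    kv = KVStore()
    placement_controller(kv)
    monitoring_controller(kv)
    kv.put("vnf/requested/firewall", {"flavor": "small"})
    print(kv.data)   # requested, placed, and monitored keys, with no direct calls
```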

The TMF has a data model for services, and it’s a hierarchical model.  In my ExperiaSphere stuff, I outline a hierarchical model for service descriptions.  Many standards and initiatives in and out of the cloud have represented services and applications as a hierarchy—“Service” to “Service Component” to “Subcomponent” and so forth.  These models show the decomposition of a service into its pieces, and the relationships between the components of features and the features themselves.  In my view and in the view of the TMF, that is critical.
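Here is a minimal illustration of the kind of hierarchy I mean.  The element names are generic placeholders, not the TMF data model or the ExperiaSphere schema.

```python
# Minimal illustration of a hierarchical service model: a service decomposes
# into components and subcomponents, and relationships are explicit in the
# structure itself.  Generic names only.
from dataclasses import dataclass, field

@dataclass
class ServiceElement:
    name: str
    state: str = "ORDERED"
    children: list = field(default_factory=list)

    def walk(self, depth=0):
        """Show the decomposition: every element knows its place in the service."""
        print("  " * depth + f"{self.name} [{self.state}]")
        for child in self.children:
            child.walk(depth + 1)

if __name__ == "__main__":
    service = ServiceElement("BusinessVPN", children=[
        ServiceElement("AccessComponent", children=[
            ServiceElement("vCPE-Firewall"),
            ServiceElement("vCPE-Router"),
        ]),
        ServiceElement("CoreComponent", children=[ServiceElement("VPN-Core")]),
    ])
    service.walk()
```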

KV stores are typically not used to represent relationships.  Intent modeling and decomposition, in my own experience, are better handled with a relational or hierarchical database model.  It’s my view that if Lean NFV contemplated basing NFV deployment and management on a model-driven approach, which I believe is absolutely the only way to do it with the kind of scalability and agility the paper says they’re looking for, they’d have featured that in the paper and highlighted it in their database vision.  They don’t, so I have to assume that like traditional NFV, they’re not modeling a service but rather deploying VNFs without knowledge of the specific relationship they have to the service overall.

Could KV work for management/configuration data missions?  I’ve advocated a single repository for management data, using “derived operations” based on the IETF proposal for “infrastructure to application exposure” or i2aex.  I suggested this was separate from the service model, and that it was supported by a set of “agents” that both stored management data and delivered it through a query in whatever format was needed.
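Here is a toy sketch of that “derived operations” idea, assuming a simple shared repository.  The agent and query functions are invented for illustration; they are not the i2aex specification.

```python
# Toy sketch of "derived operations": agents push raw management data into a
# common repository, and queries derive whatever view a consumer needs.
import json
import time

REPOSITORY = []  # the shared management-data repository

def agent_report(source, metric, value):
    """An agent stores raw status data, tagged with its source and a timestamp."""
    REPOSITORY.append({"source": source, "metric": metric,
                       "value": value, "ts": time.time()})

def derive_view(metric, fmt="json"):
    """Deliver the stored data in whatever format the consumer asks for."""
    rows = [r for r in REPOSITORY if r["metric"] == metric]
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        return "\n".join(f"{r['source']},{r['value']}" for r in rows)
    raise ValueError(f"unsupported format: {fmt}")

if __name__ == "__main__":
    agent_report("vswitch-3", "packet_loss", 0.02)
    agent_report("vFirewall-7", "packet_loss", 0.00)
    print(derive_view("packet_loss", fmt="csv"))
```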

The Lean NFV paper describes something similar.  Continuing with the quote above, “The KV store also functions as the integration point for VNFs (depicted on the left) and their (optional) EMSs; in such cases, coordination through the KV store can range from very lightweight (e.g., to notify the EMS of new VNF instances or to notify the NFV manager of configuration events) to actually storing configuration data in the KV store. Thus, use of the KV store does not prevent NF-specific configuration, and it accommodates a range of EMS implementations. Note that because the KV store decouples the act of publishing information from the act of consuming it, information can be gathered from a variety of sources (e.g., performance statistics can be provided by the vswitch). This decreases reliance on vendor-specific APIs to expose KPIs, and hence allows a smooth migration from vendor- and NF-specific management towards more general forms of NFV management.”

Certainly a KV store could be used for the same kind of “derived operations” I’ve been blogging about, with the same benefits in adapting a variety of resources to a variety of management interfaces.  However, “derived operations” was based on the notion of an intent model hierarchy that allowed each level of the model to define its management properties based on the properties of the subordinate elements.  If you don’t have a model, then you have only static relationships between deployed virtual functions and (presumably) the EMSs for the “real” devices on which they’re based.  That gets you back to building networks from virtual versions of real devices, which is hardly a cloud-native approach.
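To show what I mean by each level deriving its management properties from its subordinates, here is a small sketch that rolls status up a hierarchy.  The property names and the worst-of-the-children rule are my own simplifications.

```python
# Small sketch of intent-model roll-up: each level derives its own management
# state from the states of its subordinate elements.  Names are invented.

def rollup(element):
    """An element is only as healthy as its worst subordinate."""
    children = element.get("children", [])
    if not children:
        return element["status"]
    child_statuses = [rollup(child) for child in children]
    element["status"] = "degraded" if "degraded" in child_statuses else "ok"
    return element["status"]

if __name__ == "__main__":
    service = {
        "name": "BusinessVPN",
        "children": [
            {"name": "Access", "children": [
                {"name": "vFirewall", "status": "ok"},
                {"name": "vRouter", "status": "degraded"},
            ]},
            {"name": "Core", "children": [{"name": "VPN-Core", "status": "ok"}]},
        ],
    }
    print(rollup(service))  # the service-level view derives from the subordinates
```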

To me, the concept of relationships, expressed through hierarchical modeling, is the key to NFV automation and cloud-native behavior.  You have to do model-driven event-to-process steering to do service automation; how else would changes in condition be reflected in the service?  If you do that steering, you need to understand the relationship between the specific feature element you’re working on and the rest of the service.  The TMF recognized that with its NGOSS Contract model a decade or more ago, and it’s still the basic truth of service automation today.  If you have a model, you can define functional elements of services in any way that’s convenient, based on devices, on features, or on both at the same time.  That’s what we’d expect from a cloud-native approach to services.  That I don’t see the modeling in the Lean NFV material is very troubling.

We have a cloud-native service/application modeling approach, TOSCA, that’s already used in some NFV implementations and by some orchestration/automation vendors.  We have cloud-proven practices, GitOps built on repositories like GitHub, to organize configuration data and unify deployment strategies.  We have a proposal for a repository-based management system, i2aex, that would allow flexible storage and retrieval of status information and support any device and any EMS.  Why do we need something new in these areas, like KV?  Are we not just repeating the NFV ISG’s mistake of failing to track the cloud?  Lean NFV seems to be another path to the same approach, when another approach is what we need.  What they seem to say is that the implementation of the NFV architecture was wrong, but the implementation is wrong because the architecture is wrong.

Could you mold the Lean NFV picture to make it compatible with the model-driven vision I’ve been promoting for years?  Possibly, but you could have done that with the NFV E2E architecture too (I had a slide that did it, in fact).  To me, the problems here are, first, that they’ve accepted too much of the NFV ISG methodology and approach and not enough of what we’ve learned both before the ISG (the TMF NGOSS Contract) and after it (the Kubernetes ecosystem for the cloud).  They’ve inherited the issues that have arisen from ETSI and ignored solutions from the cloud.  Second, they have not described their own approach in enough detail to judge whether it will make the same mistake that the ISG made, which was to interpret literally what their own material described as only a functional model.  Even if this is only a starting point for Lean NFV, it’s the wrong place to start.

I asked on Thursday of last week to join the group and introduce my thinking; no response so far, but I’ll let you know how that works out.