Does “Lean NFV” Move the NFV Ball?

NFV has seen a lot of movement recently, but not all movement is progress.  I noted in earlier blogs that NFV still shows up in a lot of telco diagrams on implementation of 5G, including OpenRAN, and it’s also included in vendor diagrams of their support for telco cloud, even cloud-native.  The problem is that NFV isn’t really suited for that broad an application set, a view I hold firmly and one many operators share.  Thus, it’s not surprising that there’s interest in doing something to redeem NFV.

One initiative getting a lot of attention these days is Lean NFV, whose white paper is now in the 2020 “C” revision.  I told Fierce Telecom about my reservations regarding the concept, and I want to dig into their latest material to see if there’s anything there to either resolve or harden them.  The MEF’s support, after all, could mean something.  At the least, it might move NFV out of a venue that didn’t produce results and into one that still might have a chance.

The first subhead in the paper I referenced above is a good start: “Where did we go wrong?”  The paragraph under that heading is promising, if a bit lacking in specificity.  We could hope to find that specificity in the rest of the document.  The main theme is that NFV is too complex, in no small part because it was never truly architected (my words) to define the pieces and how they all fit.  The functional diagram became an implementation guide, which created something that’s “too closely coupled to how the rest of the infrastructure is managed,” to quote the paper.

This is perhaps the best sign of all, because the biggest single problem with NFV was indeed the relentless effort to advance it by containing its scope.  The rest of the telco network and operations world presented a bunch of potential tie points, and rather than define something that was optimum for the mission of virtualizing functions, the ISG optimized the use of those legacy tie points.  But does Lean NFV do any better?

Lean NFV defines three pieces of functionality that represent the goals it sets out to address, and if there’s to be a benefit to the concept, it has to come because these are different from, and better than, the ISG model.  The first is the NFV manager, which manages not only the VNFs but “the end-to-end NFV service chains”.  The paper takes pains to say that this isn’t necessarily a monolithic software structure; it could be a collection of management functions.  The second is the “computational infrastructure”, which is, I think, an unnecessarily complicated way of saying the “resource pool” or “cloud”.  The third is the VNFs themselves, which the paper says might have their own EMS for “VNF-specific configuration”.

The way that Lean NFV proposes to be different and better is by concentrating on what it describes as the “three points of integration”: “when the NFV manager is integrated with the existing computational infrastructure, when VNFs are integrated with the NFV manager, and when coordination is required between the various components of the NFV manager.”  It proposes to use a key-value store to serve as a kind of “repository” to unify things at these three critical points, and leave the rest of the implementation to float as the competitive market dictates.  The paper goes on to describe how the three critical integration points would be addressed, and simplified, by the key-value approach.

What I propose to do to assess this is to forget, for the moment, the specifics of how the integration is to be done (the key-value store) and look instead at what the solution is supposed to be delivering to each of these three integration areas.  If the deliverables aren’t suitable, it doesn’t matter how they’re achieved.  If they are, we can look at whether key-value stores are the best approach.

The first suggestion, regarding the interface to computational resources, is certainly sensible.  The original NFV was very OpenStack-centric, and what the paper proposes is to, in effect, turn the whole computational-resource thing into a kind of intent model.  You define some APIs that represent the basic features that all forms of infrastructure manager should support, and then you allow the implementation to fill in what’s inside that black box.  All of the goals of the paragraph describing this are sensible and, I think, important to the success of NFV.
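To make the intent-model idea a bit more concrete, here is a rough sketch of my own, not something taken from the Lean NFV paper.  The class and method names (InfrastructureManager, deploy, release, InMemoryPool) are hypothetical; the only point is that the caller sees a small abstract contract and never learns what is inside the black box.

```python
from abc import ABC, abstractmethod


class InfrastructureManager(ABC):
    """Hypothetical black-box contract; method names are illustrative only."""

    @abstractmethod
    def deploy(self, function_name, cpu, memory_gb):
        """Host a function and return an opaque handle."""

    @abstractmethod
    def release(self, handle):
        """Give back whatever resources the handle refers to."""


class InMemoryPool(InfrastructureManager):
    """Stand-in for any real backend (OpenStack, containers, bare metal)."""

    def __init__(self):
        self.active = {}      # handle -> (function_name, cpu, memory_gb)
        self._next_id = 0

    def deploy(self, function_name, cpu, memory_gb):
        self._next_id += 1
        handle = f"inst-{self._next_id}"
        self.active[handle] = (function_name, cpu, memory_gb)
        return handle

    def release(self, handle):
        self.active.pop(handle, None)


def host_firewall(infra):
    # The caller depends only on the abstract contract, never the backend.
    return infra.deploy("firewall", cpu=2, memory_gb=4)
```

Swapping InMemoryPool for another implementation of the same contract would leave host_firewall untouched, which is the property the paper’s first suggestion seems to be after.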

The second suggestion relates to the NFV manager, and I will take the liberty of reading it as an endorsement of a data-model-driven coupling of events to processes.  The data model serves as the interface between processes, implying that it sets a standard for data that all processes adhere to regardless of their internal representations.  This can all, at the high level, be related to the TMF NGOSS Contract work that I think is the seminal material on this topic.
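Here is a minimal sketch of what I mean by data-model-driven coupling of events to processes.  It reflects my reading, not anything the paper specifies, and the states, events, and process names are all invented for illustration.  The data model holds a state/event table, and the table decides which process runs.

```python
# Hypothetical states, events, and process names, for illustration only.
def start_deploy(element):
    element["state"] = "deploying"
    return "deployment requested"


def mark_active(element):
    element["state"] = "active"
    return "service element active"


def start_redeploy(element):
    element["state"] = "deploying"
    return "redeployment requested"


# The data model carries the state/event table, so the model (not the
# code) decides which process runs for a given event in a given state.
STATE_EVENT_TABLE = {
    ("ordered", "activate"): start_deploy,
    ("deploying", "deployed"): mark_active,
    ("active", "fault"): start_redeploy,
}


def handle_event(element, event):
    """Look up the process for (current state, event) and run it."""
    process = STATE_EVENT_TABLE.get((element["state"], event))
    if process is None:
        return f"ignored {event!r} in state {element['state']!r}"
    return process(element)
```

Because each process reads and writes only the element’s entry in the data model, the processes never need to know about each other; the table is the only place where events and processes meet.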

The third suggestion is the one I have the most concern about, and it may well be the most important.  Lean NFV suggests that the issues with VNF onboarding relate to the configuration information and the fact that VNFs use old and often proprietary interfaces.  Lean NFV will provide “a universal and scalable mechanism for bidirectional communication between NFV management systems and VNFs”, which I believe is saying that the data model will set a standard to “rewrite” VNFs.  I don’t think that there’s much interest among vendors in rewriting, so I’m not comfortable with this approach, even at the high level.

OK, where this has taken us is to accept two of the three “goal-level” pieces of Lean NFV, but not the third.  That leads to the question of whether the key-value store approach is the way to approach those goals, and in my view it is not.  I have to say, reluctantly, that I think the Lean NFV process makes the same kind of mistakes as the original NFV did.  They’re wrong, differently.

One problem is that a key-value store doesn’t define the relationship between services, VNFs, and infrastructure.  Yes, it describes how to parameterize stuff, but a service is a graph, not a list.  It shows relationships and not just values.  In order to commit compute resources or connect things, I need a data model that lets me know what needs to be connected and how things have to be deployed.  I told the ONAP people that until they were model-driven, I would decline to take further briefings.  The same goes here.
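A small sketch may help show the difference between a list of values and a graph.  The keys, function names, and structure below are hypothetical and are not drawn from the Lean NFV paper; they only contrast a flat parameter store with a model that records who connects to whom.

```python
# Flat key-value view: parameters only. Keys and values are hypothetical.
flat_store = {
    "vnf/firewall/cpu": "2",
    "vnf/router/cpu": "4",
    "vnf/nat/cpu": "1",
}

# Graph view: the same functions, plus the connections between them.
service_graph = {
    "nodes": {"firewall": {"cpu": 2}, "router": {"cpu": 4}, "nat": {"cpu": 1}},
    "edges": [("router", "firewall"), ("firewall", "nat")],
}


def downstream_of(graph, name):
    """Return the functions that receive traffic directly from `name`."""
    return [dst for src, dst in graph["edges"] if src == name]


def chain_from(graph, start):
    """Walk a simple linear chain from `start`, following the first edge."""
    path = [start]
    seen = {start}
    while True:
        nxt = downstream_of(graph, path[-1])
        if not nxt or nxt[0] in seen:
            return path
        path.append(nxt[0])
        seen.add(nxt[0])
```

Calling chain_from(service_graph, "router") walks router, firewall, nat in order.  The flat store holds the same CPU figures but has nothing to walk; any notion of ordering or connection would have to live in a convention outside it.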

The second issue is the lack of explicit event-driven behavior.  APIs are static and tend to encourage “synchronous” (call-and-wait) behavior, whereas events dictate an asynchronous “call-me-back” approach.  Not only does Lean NFV not mandate events, it provides no specific mechanism to associate events with processes.  It suggests that microcontrollers could “watch” for changes in the key-value store, which makes the implementation more like a poll-for-status approach, something we know doesn’t scale to large networks and services.
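The difference between call-and-wait and call-me-back can be shown with a toy example.  This is my own sketch under stated assumptions: StatusStore is an invented class that counts reads, and it does not represent the API of any real key-value store or of Lean NFV.

```python
class StatusStore:
    """Toy status store that counts reads; not any real product's API."""

    def __init__(self):
        self._data = {}
        self._subscribers = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data.get(key)

    def subscribe(self, key, callback):
        self._subscribers.setdefault(key, []).append(callback)

    def put(self, key, value):
        self._data[key] = value
        # Event-driven path: push the change to whoever asked for it.
        for callback in self._subscribers.get(key, []):
            callback(key, value)


def poll_until_changed(store, key, old_value, max_polls):
    """Call-and-wait style: keep asking until the value differs.

    Returns the number of reads it took, or None if nothing changed.
    """
    for attempt in range(1, max_polls + 1):
        if store.get(key) != old_value:
            return attempt
    return None
```

In the polling path, every check costs a read whether or not anything changed, and that cost multiplies by the number of watchers and the number of things being watched.  In the subscribe path, the store does work only when a value actually changes.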

The biggest problem, though, is that we’re still not addressing the basic question of what a virtual network function is.  Recall, in a prior blog, that I noted that there was early interest in decomposing current “physical network functions” (things like router or firewall code) into logical features, and then permitting recomposition via cloud behavior.  If we decide that a VNF is the entire logic of a device, then making it virtual does nothing but let us run it on a different device.  There may be subtle differences in performance and economics when we look at hosting a VNF in a white box (uCPE), a commercial server, a container, a VM, or even serverless, but will this be enough to “transform”?

There are some good ideas here, starting with a pretty straightforward recognition of what’s wrong with ETSI NFV.  The problem I see is that, like the ISG, the Lean NFV people got fixated on a concept rather than embarking on a quest to match virtual functions to modern abstract resource pools like the cloud.  In the case of Lean NFV, the concept was the key-value store.  To a hammer, everything looks like a nail, and so it appears that the Lean NFV strategy was shaped by the proposed solution rather than the other way around.

There’s still time to fix this.  The paper is very light on implementation details, as I’ve noted before regarding the Lean NFV initiative.  That could mean I’ve missed the mark, but it could also give the group the chance to consider some of these points.  The goals are good; the way to achieve them isn’t so good.  I’m really hopeful that the organization will move to fix things, because there’s a lot of wasted motion in the NFV space, and this at least has some potential.

The problem is that if “Lean NFV” were in fact to adopt my suggestions, it might still be “Lean” but it would have moved itself rather far from ETSI NFV.  There’s never been a standard that a telco couldn’t place too much reliance upon, for too long.  NFV is surely not one to break that rule.