The Traps that a Nephio-Based NFV Solution MUST Avoid – Welcome to CIMI Corporation's Public Blog

In my blog last week on the Linux Foundation’s open-source function virtualization project (Nephio), I noted two things that the project didn’t have. One was service-layer modeling and deployment, and the other was a platform-as-a-service API set to define how network functions would be written. Today, I want to explain why I think the two are important and offer my view of how to get them.

Nephio is a framework for network functions virtualization based on open-source development rather than the development of “specifications”. Obviously, we had the spec-based approach in the NFV Industry Specification Group (ISG) within ETSI, and that activity is still going (it’s working through Release 5, in fact). I’ve made no secret of my view that the NFV ISG got the basic design wrong from the first, and that it’s been unable to work past those initial issues. Therefore, I think it’s critical that Nephio not fall into the same traps that caused the ISG work to go awry.

The first of the traps, the one that Nephio is obviously intending to avoid, is the “trap of the monolith”. Telcos tend to think of software as monolithic applications, singular instances of stuff that pull events from a queue and push control messages out to devices. Nephio is proposing a true cloud-native, meaning distributed, architecture, and that would be a big positive step. Not, as we’ll see, a step sufficient to ensure success.

The second trap is the “trap of the inconclusive”, meaning that the NFV ISG defined a minimalist implementation that relied on leveraging other stuff that was out-of-scope to their work. That reliance tended to pull the specifications toward a simple virtual implementation of existing appliances, a substitution of virtual boxes for real boxes, and that constrains the ability of NFV to support future missions and ties it to current operations practices, making it difficult for NFV to address operations costs.

My point is that the two omissions in the current Nephio charter could result in an implementation that still falls into one or both of these traps. Let’s see why that is.

Nephio is quite explicit in saying that service-layer modeling and orchestration are outside its scope. On the surface, that might be seen as logical, given that we have both a TMF OSS/BSS framework that addresses the topic, and the Linux Foundation’s Open Network Automation Platform (ONAP) that more directly aims at service lifecycle automation. However, these relationships also exist for the ETSI NFV ISG specifications, and they didn’t save those specifications from the traps.

The purpose of NFV is to deploy virtual functions, but a virtual function outside a service context is just sucking up resources to no purpose. The converse is that the optimum implementation of a given virtual function, and the optimum means of collecting virtual functions into feature instances, depend on the way they’d be used in services. If that point can’t be considered because the service processes are out of scope, then there’s a risk that the features required below those service processes won’t address service needs.

Operations issues here could be decisive for NFV and for Nephio, and we learned that in the ISG work. In the ISG, the presumption was that OSS/BSS and NMS relationships were realized by leveraging the current set of management interfaces. That meant that an element management system (EMS) was responsible for managing the virtual network functions created from the physical network functions (devices) the EMS was originally managing. The “management tree” of EMS branches that fed a network management layer (NMS) and finally a service management layer (SMS) was inherited by NFV.

The problem with this is that NFV should present a host of service deployment and remediation options that wouldn’t exist for physical devices. You can’t place, or move, one of those through a console command. You can do both with a virtual element, so how can legacy network and service operations deal effectively with capabilities that don’t exist for legacy devices?

There’s also an issue with the hosting environment itself. A physical network function’s management is a necessary mixture of function and platform management because the two are integrated. With a VNF, the hosting management has to be separate, and if we’re going to push the existing EMS to manage the VNFs, then how do we integrate hosting state and function state into one?

An intent-modeled service would be created by assembling features, which would then be created by integrating some combination of real devices and VNFs. At any level of abstraction, the state of the VNFs, devices, and cloud infrastructure could be integrated. Without service modeling, we’d have to presume that we were going to integrate the cloud and VNF management at the VNF level, and that suggests that a VNF is itself an intent model. If that’s the case, then haven’t we taken a major step toward defining an architecture to model services, without securing service modeling and management benefits?
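To make the "integrate state at any level" point concrete, here is a minimal sketch of a service model as a tree of elements whose state rolls up from hosting, through the VNF, to the feature and the service. The class name, the three states, and the "worst state wins" rule are assumptions chosen for illustration; none of them come from Nephio or the NFV ISG.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class State(IntEnum):
    # Ordered so that max() selects the worst state.
    ACTIVE = 0
    DEGRADED = 1
    FAILED = 2


@dataclass
class IntentElement:
    """Hypothetical model node: a service, feature, VNF, device, or host."""
    name: str
    own_state: State = State.ACTIVE
    children: List["IntentElement"] = field(default_factory=list)

    def state(self) -> State:
        # Assumed rule: an element is only as healthy as its worst subordinate.
        return max([self.own_state] + [c.state() for c in self.children])


# A feature built from one VNF (with its hosting) and one real device.
host = IntentElement("edge-host-1")
firewall_vnf = IntentElement("firewall-vnf", children=[host])
router = IntentElement("access-router")
feature = IntentElement("secure-access", children=[firewall_vnf, router])
service = IntentElement("business-vpn", children=[feature])

# A hosting problem surfaces at every level above it, not just at the VNF.
host.own_state = State.DEGRADED
print(service.state().name)
```

With the model in place, the hosting state and the function state meet at the VNF element, and the same roll-up carries them to the feature and service, which is exactly the integration that a VNF-only model leaves stranded.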

I think that Nephio has already committed, at least philosophically, to the concepts of service management and modeling. I don’t think that it would be a massive undertaking to address the service layer. Yes, I understand that this would introduce a collision with ONAP, but Nephio is a collision with ETSI NFV already. ONAP was done wrong, just as NFV was. Can one be fixed while the other remains broken?

The second of the things not addressed in Nephio is my PaaS layer. Virtual network functions aren’t the same thing as applications; they have a much narrower scope and focus. They do evolve into a form of edge computing at the cloud hosting level, but they have network-specific pieces and features. I think that both should be considered in establishing a cloud-native framework for VNFs.

In the Nephio material, they note that the current model of VNF deployment requires too much customization, too much specialization of integration. True, but the reason that’s true is that the ETSI process had a goal of accepting VNFs from any source, in any original form. Remember that it believed that a VNF was the functional piece of a device, extracted and hosted. Every device has a potentially different hardware/software platform, and thus every VNF has a different potential relationship with its environment. Yes, if we want to accommodate the current VNFs, we have to deal with that. Do we have to perpetuate it?

The software space is replete with plugins. We have them in OpenStack, and we even have them (as “K8S Operators”) in the Nephio diagram of its layers. The goal of the plugin approach is to define an “ideal” interface for something, and then adapt any stuff that can’t use the interface directly. We also have packages that let software for one platform framework (like Windows) run on another (Linux). The ONF P4 approach defines a standard interface between switch software and a switch chip, and a P4 driver adapts different chips to that optimal, common, API set. There’s a software development design pattern called “Adapter” to represent this sort of thing, for a given API. Why not define an API set for a Nephio VNF, which would be a PaaS?
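As a rough illustration of the Adapter idea applied to VNFs, the sketch below defines an "ideal" interface and wraps a non-conforming function so that deployment logic sees only that interface. Every name here (VnfPlatformApi, the vendor_* calls, deploy) is hypothetical; this is not a Nephio, Kubernetes, or ETSI API, only the shape of the pattern.

```python
from abc import ABC, abstractmethod


class VnfPlatformApi(ABC):
    """Hypothetical 'ideal' interface a VNF PaaS would expect from every VNF."""

    @abstractmethod
    def start(self) -> None: ...

    @abstractmethod
    def health(self) -> str: ...


class LegacyFirewallImage:
    """Stand-in for a vendor VNF with its own, non-standard control calls."""

    def __init__(self) -> None:
        self.running = False

    def vendor_boot(self, mode: str) -> None:
        self.running = True

    def vendor_status_code(self) -> int:
        return 0 if self.running else 3


class LegacyFirewallAdapter(VnfPlatformApi):
    """Adapter: translates the ideal interface into the vendor's calls."""

    def __init__(self, legacy: LegacyFirewallImage) -> None:
        self._legacy = legacy

    def start(self) -> None:
        self._legacy.vendor_boot(mode="normal")

    def health(self) -> str:
        return "ok" if self._legacy.vendor_status_code() == 0 else "down"


class NativeVnf(VnfPlatformApi):
    """A VNF written directly to the ideal interface needs no adapter."""

    def __init__(self) -> None:
        self._up = False

    def start(self) -> None:
        self._up = True

    def health(self) -> str:
        return "ok" if self._up else "down"


def deploy(vnf: VnfPlatformApi) -> str:
    # Deployment logic is written once, against the ideal interface only.
    vnf.start()
    return vnf.health()


print(deploy(NativeVnf()))
print(deploy(LegacyFirewallAdapter(LegacyFirewallImage())))
```

The customization burden moves into the adapter, which is written once per legacy VNF, rather than into the deployment and management processes, which would otherwise have to know about every variant.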

It’s hard to believe that the requirements for edge computing wouldn’t have similarities across all edge applications. Edge applications will have many common requirements, notably latency control. Similarly, it’s hard to believe that the way some edge requirements would be met shouldn’t be standardized to avoid differences in implementation that would waste resources and complicate management. If we defined a PaaS for the edge, that would facilitate edge development, and it would also form the basis for a derivative PaaS set that would be more specialized. VNF development would be such a derivative.

An easy example of a place where a PaaS would be helpful is in intent modeling itself. An intent model exposes a standard “interface” and/or data model, and the interior processes are built to support that exposure no matter what the explicit implementation happens to be. A lot of this process is, or could be, standardized, meaning that a PaaS could expose APIs that the interior implementation could then connect with. Doing that would mean that there would be a toolkit for intent-model-building, which seems a critical step in creating an efficient model development process. Remember that Nephio is based on intent models.
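A sketch of what such a toolkit might look like: a base class supplies the standard exterior (activation and an SLA-based status report), and each implementation fills in only the interior hooks. The class names, the hook names, and the treatment of an SLA as a set of upper bounds are all assumptions made for the example, and the measurements are made-up numbers.

```python
from abc import ABC, abstractmethod
from typing import Dict


class IntentModel(ABC):
    """Hypothetical toolkit base: fixed exterior, pluggable interior."""

    def __init__(self, name: str, sla: Dict[str, float]) -> None:
        self.name = name
        self.sla = sla  # declared intent: an upper bound per measured parameter

    # Standard exterior, identical for every implementation.
    def activate(self) -> None:
        self._deploy()

    def status(self) -> Dict[str, object]:
        observed = self._measure()
        # A parameter the interior fails to report counts as a violation.
        violations = sorted(
            key for key, limit in self.sla.items()
            if observed.get(key, float("inf")) > limit
        )
        return {"name": self.name, "in_sla": not violations,
                "violations": violations}

    # Interior hooks, supplied by each implementation.
    @abstractmethod
    def _deploy(self) -> None: ...

    @abstractmethod
    def _measure(self) -> Dict[str, float]: ...


class HostedFirewall(IntentModel):
    """One possible interior; a device-based one would expose the same exterior."""

    def __init__(self, name: str, sla: Dict[str, float]) -> None:
        super().__init__(name, sla)
        self.deployed = False

    def _deploy(self) -> None:
        self.deployed = True

    def _measure(self) -> Dict[str, float]:
        return {"latency_ms": 4.0, "loss_pct": 0.2}  # illustrative values only


fw = HostedFirewall("fw-1", {"latency_ms": 5.0, "loss_pct": 0.1})
fw.activate()
print(fw.status())
```

Whatever sits above this element sees only whether it is meeting its SLA, which is what lets one implementation be swapped for another without changing the model around it.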

Management is another example. Back in 2012, I was working with operators on what I called “derived operations”. The problem arises with any sort of distributed functionality, because a device can present a single interface as its management information base (MIB), but what presents that interface for a distributed process? There’s also a problem with the MIB approach in that it’s based on polling for status. When a function relies on shared resources, its polls combine with those of other dependent functions to create a risk of what’s effectively a DDoS attack on those resources.

My derived operations approach was built on an IETF proposal called “infrastructure to application exposure” or i2aex. A set of daemon processes polled the management information sources and posted results to a database. Anything that needed management state would get it via a query to that database, which would not only gather and format the information, but perform any derivations needed. Derived operations APIs could be part of the VNF PaaS. This, by the way, was a goal in the first NFV ISG PoC that I submitted, and that became the first to be approved. The concept wasn’t adopted.
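The collector-and-repository pattern described above can be sketched in a few lines. This is a simplified illustration, not the i2aex proposal or the PoC implementation: the repository is an in-memory dictionary rather than a real database, and the source names, metrics, and derivation rules are invented for the example.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]


class ManagementRepository:
    """Hypothetical repository fed by a collector and read by many VNFs."""

    def __init__(self, sources: Dict[str, Callable[[], Metrics]]) -> None:
        self._sources = sources
        self._data: Dict[str, Metrics] = {}
        self.polls_issued = 0

    def collect(self) -> None:
        """One collector cycle: poll each source exactly once and store it."""
        for name, poll in self._sources.items():
            self._data[name] = poll()
            self.polls_issued += 1

    def derived_view(self, vnf: str, depends_on: List[str]) -> Dict[str, object]:
        """Answer a management query from stored data, never from the sources."""
        rows = [self._data[source] for source in depends_on]
        return {
            "vnf": vnf,
            "max_cpu_pct": max(row["cpu_pct"] for row in rows),
            "all_up": all(row["up"] == 1.0 for row in rows),
        }


# Two shared resources, each reporting made-up metrics.
repo = ManagementRepository({
    "host-1": lambda: {"cpu_pct": 35.0, "up": 1.0},
    "switch-1": lambda: {"cpu_pct": 80.0, "up": 1.0},
})
repo.collect()

# One hundred dependent functions query the repository, not the resources.
views = [repo.derived_view(f"vnf-{i}", ["host-1", "switch-1"]) for i in range(100)]
print(views[0])
print(repo.polls_issued)
```

The load on each shared resource depends on the collector's cycle rate rather than on the number of functions that depend on it, which removes the polling storm, and the query is the natural place to compute a per-function management view.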

Nephio isn’t my first rodeo; I’ve been personally involved in multiple initiatives for virtualization of network elements, and assessed others as an industry analyst and strategy consultant. I’ve seen a lot of hard and good work go to waste because of seemingly logical and simplifying steps that turned out not to be either one. Nephio has the best foundation of any network-operator-related software project. That doesn’t mean it can’t still be messed up, and everyone involved in or supporting Nephio needs to take that to heart. I’ll talk about how that could be done in my next blog on this topic.