Suppose We Had an NFV Mulligan

The notion of open-model networks often involves feature/function hosting, a concept introduced a decade ago with the “Call for Action” paper that turned into the NFV ISG. I think most network professionals now agree that the ISG didn’t get the details right, but it changed the industry by raising the flag on the concept. Today, the ISG is trying to modernize (Release 5), and other forces are weighing in on the question of function hosting. Suppose we were to get a Mulligan on this, a do-over. What would a top-down approach to the question yield? I propose four principles.

The first principle is that universal openness demands a universal hosting model. An actual network contains three classes of devices: aggregate traffic handlers, connection feature hosts, and on-the-network elements. NFV really focused on the middle group, but we need an architecture that addresses all three and avoids silos. The way to get that is to break the model of an open-model network into three layers: platform, PaaS, and function.

NFV proposed hosting functions on commercial off-the-shelf servers, and that’s too limiting. Functions should be hosted on whatever platform is optimal, which means everything from containers and VMs to bare metal, white boxes, controllers, etc. A network would then be made up of a variety of boxes, a variety of network functions riding on them, and a mechanism to organize and manage the entire collection.
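To make the idea concrete, here’s a minimal sketch in Go of how such a model might represent a mixed inventory of platforms and the functions that ride on them. Every type and name here is hypothetical, invented purely for illustration:

```go
package main

import "fmt"

// PlatformKind enumerates the hosting targets a universal model would
// have to cover; NFV's original scope was essentially just Server.
type PlatformKind int

const (
	Server     PlatformKind = iota // COTS servers: containers, VMs, bare metal
	WhiteBox                       // switch/router white boxes
	Controller                     // SDN or domain controllers
)

// Platform describes one hosting point in the network.
type Platform struct {
	Name string
	Kind PlatformKind
}

// Function is a hostable network function and the platform kinds it can run on.
type Function struct {
	Name   string
	RunsOn []PlatformKind
}

// canHost is the kind of check the organizing/management mechanism would
// make before binding a function to a platform.
func canHost(p Platform, f Function) bool {
	for _, k := range f.RunsOn {
		if k == p.Kind {
			return true
		}
	}
	return false
}

func main() {
	box := Platform{Name: "edge-wb-01", Kind: WhiteBox}
	fw := Function{Name: "virtual-firewall", RunsOn: []PlatformKind{Server, WhiteBox}}
	fmt.Println(canHost(box, fw)) // true: the model spans box types, not just servers
}
```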

Extending the “platform” to white boxes and other embedded-control hardware isn’t easy. In the white box space, for example, you have the potential for a variety of CPU chips, network chips, AI chips, and so forth. In order to implement this principle, you’d need to define a standard API set for each class of chip and a driver model to harmonize each implementation of a chip type with that API set.
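As a rough illustration of what that driver model might look like, here’s a sketch of a single chip-class API with two vendor drivers harmonized behind it. The interface, the driver names, and the registry are assumptions of mine, not anything defined by a standards body:

```go
package main

import "fmt"

// NetworkChip is a hypothetical standard API for the "network chip" class;
// each chip class (CPU, network, AI, and so forth) would get its own
// such interface.
type NetworkChip interface {
	LoadPipeline(program string) error // e.g., a P4 program
	Counters() map[string]uint64
}

// Two hypothetical vendor drivers harmonizing different silicon
// with the same class API.
type vendorADriver struct{}

func (vendorADriver) LoadPipeline(p string) error { fmt.Println("A: loading", p); return nil }
func (vendorADriver) Counters() map[string]uint64 { return map[string]uint64{"rx": 0} }

type vendorBDriver struct{}

func (vendorBDriver) LoadPipeline(p string) error { fmt.Println("B: loading", p); return nil }
func (vendorBDriver) Counters() map[string]uint64 { return map[string]uint64{"rx": 0} }

// registry maps a chip model to the driver that adapts it to the class API.
var registry = map[string]NetworkChip{
	"vendor-a-asic": vendorADriver{},
	"vendor-b-asic": vendorBDriver{},
}

func main() {
	// Higher layers program any chip of the class the same way.
	for model, chip := range registry {
		fmt.Println(model)
		chip.LoadPipeline("basic_router.p4")
	}
}
```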

The second principle builds on the first: the PaaS layer of function hosting creates a unified model of management and orchestration at all levels. To prevent total anarchy in function hosting, the functions have to be built to a common set of APIs. Those APIs are presented by “middleware” in each platform, and they bind the application/function model to the variety of devices that might be useful in hosting functions. Everything involved in function hosting is abstracted by this PaaS layer and presented through these common APIs, so no matter what the function or platform might be, getting the two together and working is the same process. External tools bind to functions through this PaaS layer as well.
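A minimal sketch of that idea, with an invented API set (nothing here reflects an actual ISG or vendor interface), might look like this:

```go
package main

import "fmt"

// FunctionPaaS is a hypothetical sketch of the common API set the PaaS
// layer would present. Functions are written to it, and external tools
// bind through it, so deployment looks the same on any platform.
type FunctionPaaS interface {
	Deploy(function, platform string) error
	Connect(fnA, fnB string) error          // service chaining
	Health(function string) (string, error) // uniform management view
}

// containerPaaS and whiteBoxPaaS would wrap container tooling and
// embedded/white-box tooling respectively; both satisfy the same API.
type containerPaaS struct{}

func (containerPaaS) Deploy(f, p string) error        { fmt.Printf("container: %s -> %s\n", f, p); return nil }
func (containerPaaS) Connect(a, b string) error       { fmt.Printf("container: chain %s -> %s\n", a, b); return nil }
func (containerPaaS) Health(f string) (string, error) { return "ok", nil }

type whiteBoxPaaS struct{}

func (whiteBoxPaaS) Deploy(f, p string) error        { fmt.Printf("whitebox: %s -> %s\n", f, p); return nil }
func (whiteBoxPaaS) Connect(a, b string) error       { fmt.Printf("whitebox: chain %s -> %s\n", a, b); return nil }
func (whiteBoxPaaS) Health(f string) (string, error) { return "ok", nil }

func main() {
	// An orchestrator or external tool sees one API regardless of
	// what's underneath.
	for _, paas := range []FunctionPaaS{containerPaaS{}, whiteBoxPaaS{}} {
		paas.Deploy("vFirewall", "node-1")
	}
}
```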

This principle, and this layer, are IMHO the keys to a modern vision of function hosting. Absent the PaaS layer, we end up with a bunch of virtual devices whose capabilities and value are constrained by the physical devices we already have. Do we really change networking by converting a real router to a virtual one? Maybe for a bit, while we wring out vendor profits, but not fundamentally.

The third principle is that function relationships to platforms and services are defined by a generalized, intent-based service model. You can’t create a market-responsive network infrastructure by programming it; you have to be able to use a model to organize things, and build services by building models. The processes within, and in support of, the PaaS layer would be integrated through the model, which means that lifecycle management and even OSS/BSS activity would be coordinated through it.
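Here’s one way such a model might be represented, as a hedged sketch; the structure, field names, and SLO values are my own assumptions, not a reference to any real modeling language:

```go
package main

import "fmt"

// IntentElement is a hypothetical sketch of one node in an intent-based
// service model: it states what the element must deliver, not how, and
// decomposes into child elements until something deployable is reached.
type IntentElement struct {
	Name     string
	SLO      map[string]string // the intent: e.g., latency, availability targets
	Children []IntentElement   // decomposition; leaves map to hosted functions
}

// walk shows how lifecycle processes (and OSS/BSS hooks) could be
// coordinated by traversing the model rather than coded per platform.
func walk(e IntentElement, depth int) {
	fmt.Printf("%*s%s %v\n", depth*2, "", e.Name, e.SLO)
	for _, c := range e.Children {
		walk(c, depth+1)
	}
}

func main() {
	svc := IntentElement{
		Name: "business-vpn",
		SLO:  map[string]string{"availability": "99.99%"},
		Children: []IntentElement{
			{Name: "access", SLO: map[string]string{"latency": "<10ms"}},
			{Name: "vpn-core", SLO: map[string]string{"latency": "<30ms"}},
		},
	}
	walk(svc, 0)
}
```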

Way back before any real NFV ISG work was done, I proposed the notion of “derived operations”, which meant that operations interfaces would be created through APIs that were proxies for underlying management information. I think that this task should now be part of the model, meaning that the model should be able to define management interfaces to represent functions and function collections.
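A toy illustration of derived operations might look like the following; the element names, metrics, and aggregation rule are purely illustrative of the proxy idea:

```go
package main

import "fmt"

// rawMetrics stands in for the underlying management information of the
// elements that make up a function collection (names hypothetical).
var rawMetrics = map[string]map[string]float64{
	"fw-instance-1": {"cpu": 0.42, "drops": 3},
	"fw-instance-2": {"cpu": 0.61, "drops": 1},
}

// DerivedMIB is a sketch of "derived operations": a management interface
// that acts as a proxy, synthesizing one view of a function collection
// from underlying per-element data rather than exposing it directly.
type DerivedMIB struct {
	Members []string
}

// Get answers a management query by deriving the answer on demand.
func (m DerivedMIB) Get(metric string) float64 {
	var total float64
	for _, member := range m.Members {
		total += rawMetrics[member][metric]
	}
	if metric == "cpu" { // average for gauges, sum for counters
		return total / float64(len(m.Members))
	}
	return total
}

func main() {
	fwCluster := DerivedMIB{Members: []string{"fw-instance-1", "fw-instance-2"}}
	fmt.Println("cpu:", fwCluster.Get("cpu"), "drops:", fwCluster.Get("drops"))
}
```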

Principle number four is that all cloud-related operations/lifecycle tasks are carried out by standard cloud tools. This may be more complicated than it appears on the surface, because while we have “cloud tools” that work with containers, VMs, and even bare metal, we don’t traditionally apply (or attempt to apply) them to white boxes. If we’re to adhere to the first principle above, we have to make these tools work everywhere functions could be hosted. That could happen through the PaaS layer, which might be the simplest way to address the issue. It would mean that the PaaS layer is the functional centerpiece of the whole approach, which I think is a good idea.
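One hedged sketch of how that could work: the PaaS layer presents every device class to standard cloud tooling as a schedulable node. The NodeAgent contract below is invented for illustration; it is not the actual kubelet API or any real orchestrator interface:

```go
package main

import "fmt"

// NodeAgent is a deliberately simplified, hypothetical version of the
// contract a standard cloud orchestrator expects from anything it
// schedules work onto.
type NodeAgent interface {
	Run(workload string) error
	Status() string
}

// serverAgent runs workloads the usual cloud way; whiteBoxAgent adapts
// the same contract to a white box by delegating to the PaaS layer,
// which is how principle four could ride on principle two.
type serverAgent struct{}

func (serverAgent) Run(w string) error { fmt.Println("server: starting container", w); return nil }
func (serverAgent) Status() string     { return "Ready" }

type whiteBoxAgent struct{}

func (whiteBoxAgent) Run(w string) error { fmt.Println("whitebox: deploying via PaaS", w); return nil }
func (whiteBoxAgent) Status() string     { return "Ready" }

func main() {
	// One scheduling loop; every class of device looks like a node.
	nodes := []NodeAgent{serverAgent{}, whiteBoxAgent{}}
	for _, n := range nodes {
		n.Run("vRouter")
		fmt.Println(n.Status())
	}
}
```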

We have, in the ONF Stratum and P4 architectures, what I think is a good model for the white-box function platform, and it seems likely it could also serve for other specialized hardware, like AI or IoT elements. We have, in Kubernetes, an orchestration platform that would serve well, and it’s already been adapted to bare metal. If we were to port the Kubernetes node elements (kubelet, kube-proxy, etc.) to the Stratum model, that would make Kubernetes compatible with white boxes, provided we also added general Kubernetes orchestration for VMs and bare metal.

We don’t have either a model or the PaaS, and those are the two things that something like the NFV ISG should be developing (that was my proposal to the ISG back in 2013). It’s easier to create the two together, I think, because there’s some cooperative design involved. It wouldn’t be rocket science to do this, either. There’s plenty of work out there on intent models, I’ve blogged about the modeling needs many times, and my ExperiaSphere work is available for reference without even a requirement for attribution, as long as the word “ExperiaSphere” isn’t used and no endorsement by me is implied (unless I know what the result is and actually do endorse it).

This is what I believe the NFV ISG should now be doing, but I’m skeptical that they would even consider it. The problem with standards groups is that they tend to be high-inertia. Once they’ve done something, subsequent work tends to validate what was done before. That makes any significant change in direction difficult, often impossible. The same problem, I think, would likely inhibit the TMF from making progress.

Could a vendor do the job? Maybe one of the big open-source foundations I mentioned in my blog yesterday (Apache, Linux, CNCF)? Juniper actually attempted something like this over 15 years ago, and turned their work into an international body that was pretty successful till vendor and standards politics messed it up. An open-source group could make progress, but I think that it would need some contributed initial framework or it would likely get mired in debate right from the first, and fail to move things fast enough.

“Fast enough” is the key here. We are already seeing, in 5G Open RAN and the RIC, that different platforms and management/orchestration frameworks are emerging for different missions, creating silos. I’d guess that if something isn’t started this year, it will be difficult to bring the right stuff to market in time to fend off the inevitable silos, proprietary visions, and disorder.