Microservices, cloud-native practices, and service mesh technologies are all popular in cloud development. Does that mean they’re essential in the development of telco applications? Should VNFs be cloud-native microservices, connected via a service mesh? I pointed out in THIS blog that there are very different missions associated with virtual functions, and that the mission determines the best architecture. How would the three VNF categories I discussed map to cloud-native practices? That’s our topic for today.
Cloud-native design isn’t tightly defined (nor is much else these days), but most would agree that it involves building applications from an agile, highly interconnected set of microservices. A subtext for this is that the application you’re building is logically divided into “pieces of interactivity”, meaning little nubbins of functionality that relate to the steps a user would take in running the application. This logical structure means that the components of an application are typically invoked through a GUI and paced by human delays.
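To make that contrast concrete, here’s a minimal sketch of the kind of component cloud-native design assumes: a small, stateless handler that serves one user step and holds nothing between requests. The names and endpoint here are purely illustrative, not drawn from any real application.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Illustrative only: one "piece of interactivity" -- a stateless handler that
// services a single user step. Any instance can answer any request, which is
// what makes cloud-native scaling and redeployment straightforward.
public class QuoteStep {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/quote", exchange -> {
            // No session state is kept here; anything persistent would live
            // in a backing store, not in the service instance itself.
            byte[] body = "{\"quote\": 42.0}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```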
A virtual function is different from an application, but how different it is depends on its mission, which maps to the three categories of VNF I outlined in the blog referenced above. Let’s look at each of them to see what we can find.
The first category of VNFs is made up of those involved in direct data-plane connectivity, the flow VNFs category. This is the category where the work of the NFV ISG could be most relevant, since the body focused on “universal CPE” missions almost from the first, and CPE by definition is in the data path.
The challenges with cloud-native implementation in this category are, first, that most flow VNFs are likely not stateless, and second, that in many cases flow VNFs can’t really be scaled or redeployed in the traditional cloud-native way. That’s because a flow VNF is connected into the data path at a specific place, and any other instance (scaled out or redeployed) would need routing tables that reflect that place.
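A minimal sketch of why that matters, with hypothetical names of my own: a flow VNF typically carries per-flow state and place-specific forwarding information in memory, so a freshly scaled replica can’t simply pick up traffic the way a stateless microservice copy can.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, illustrative flow VNF. The point is what it carries in memory:
// per-flow state and forwarding entries that reflect where this instance sits
// in the data path. A second instance started elsewhere begins with neither.
public class FlowVnfInstance {
    // Per-flow state (think NAT bindings or firewall session entries); lost if
    // traffic is simply shifted to a new replica.
    private final Map<String, Long> flowTable = new ConcurrentHashMap<>();

    // Forwarding entries that reflect this instance's attachment point; a
    // replica connected at a different place would need different entries.
    private final Map<String, String> forwardingTable = new ConcurrentHashMap<>();

    public FlowVnfInstance(Map<String, String> routesForThisAttachmentPoint) {
        forwardingTable.putAll(routesForThisAttachmentPoint);
    }

    public String forward(String flowId, String destinationPrefix) {
        // Statefulness in action: the first packet of a flow creates state that
        // subsequent packets of the same flow depend on.
        flowTable.merge(flowId, 1L, Long::sum);
        return forwardingTable.getOrDefault(destinationPrefix, "drop");
    }
}
```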
These issues are in addition to the broader challenge, which is that a hosted function running on a server may not have the performance needed to support the mission. I’ve advocated a function-hosting model that includes white-box hosting, which would partially resolve this challenge. The remainder of the resolution would have to come through something like the ONF Stratum/P4 approach, which uses driver technology to generalize access to switching chips likely to be critical for the VNFs in the data plane.
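As a sketch of the kind of generalization I mean, and emphatically not the actual Stratum or P4Runtime API, a chip-abstraction layer might look something like the following, with each switching-chip family supplying its own driver behind a common interface so the data-plane VNF itself stays portable.

```java
// Hypothetical sketch of a chip-abstraction layer in the spirit of the
// ONF Stratum/P4 idea: the VNF programs forwarding behavior against a
// generic interface, and a per-chip driver translates it to the silicon.
// None of these names come from Stratum itself.
public interface SwitchChipDriver {
    // Install a match/action forwarding entry in a named table.
    void insertEntry(String table, String matchKey, String action);

    // Remove a previously installed entry.
    void removeEntry(String table, String matchKey);

    // Read a counter so the data plane can be monitored generically.
    long readCounter(String counterName);
}

// One vendor's driver; another chip family would supply its own class
// implementing the same interface, leaving the data-plane VNF unchanged.
class ExampleAsicDriver implements SwitchChipDriver {
    @Override
    public void insertEntry(String table, String matchKey, String action) {
        // Translate to the chip SDK's own calls here (omitted).
    }

    @Override
    public void removeEntry(String table, String matchKey) {
        // Chip-specific removal (omitted).
    }

    @Override
    public long readCounter(String counterName) {
        return 0L; // Placeholder; a real driver would query the hardware.
    }
}
```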
The second VNF category is the control-plane appliance VNFs, and this is largely the focus of NFV in IMS and 5G applications. Here we have things like mobility management and signaling, missions that don’t involve direct packet transfers but may control how and where the packets go. This category raises a number of interesting points with regard to cloud-native implementation.
One obvious point is whether a control-plane appliance’s mission could be related to the mission of a component in an IoT application. A typical control-plane exchange appears analogous to an IoT control loop, and if that’s the case then this category of VNF would behave like an element in edge computing. That would mean that whatever procedures were used in building this category of VNFs should also permit building IoT/edge applications. That, in turn, means that this category of VNFs should be the most “cloud-like”.
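To illustrate the analogy (with names of my own invention, not drawn from any 5G specification), a control-plane or IoT-style component can often be written as a short control-loop step: receive an event, consult state that lives outside the instance, decide, and emit a control action. That shape is what makes this category the best fit for cloud-native hosting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, illustrative control-plane handler. It doesn't touch packets;
// it reacts to signaling events and decides where packets should go -- the same
// shape as an IoT control loop reacting to sensor events.
public class MobilityEventHandler {
    // Stand-in for an external state store (in practice a shared database or
    // cache, so any replica can handle any event).
    private final Map<String, String> sessionStore = new ConcurrentHashMap<>();

    // One control-loop step: event in, decision out.
    public String handleAttach(String deviceId, String reportedCellId) {
        String previousCell = sessionStore.put(deviceId, reportedCellId);
        if (previousCell == null) {
            return "create-session " + deviceId + " at " + reportedCellId;
        }
        return "update-path " + deviceId + " from " + previousCell + " to " + reportedCellId;
    }

    public static void main(String[] args) {
        MobilityEventHandler handler = new MobilityEventHandler();
        System.out.println(handler.handleAttach("dev-1", "cell-17"));
        System.out.println(handler.handleAttach("dev-1", "cell-22"));
    }
}
```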
I think this is the category of VNF that Nephio intends to focus on. Certainly the majority of control-plane VNFs should be suitable for cloud-native deployment and for container use. Kubernetes, which Nephio centers on, is capable of deploying to bare metal with an additional package, and that might also allow it to deploy to a white-box device.
It’s not clear to me that we’d need to support white-box switches, meaning network devices, for this VNF category, but these VNFs might well benefit from custom silicon for AI, image recognition, and so forth. Given that, it would be wise to think about how such chips could be accommodated here, particularly since edge computing and IoT applications are probably even more likely to benefit from them.
The final VNF category is by far the most complex: decomposed function hosting. In early NFV discussions, this category of VNF would have supported the breaking up of physical network functions (devices) into components, which would then be reassembled to create a complete virtual device. Thus, this class of VNF depends on what PNFs are being decomposed and how the process divides their features. The same category would also cover VNFs designed to be assembled into higher-level “packages” even if there was never a PNF that provided the feature set involved.
What divides this from our second category is the lack of specificity about what the functions are and how they relate to each other and to the control and data planes. The challenge is that if you’re going to decompose a PNF or build up a kind of never-seen-before appliance by assembling features, you can’t accept too many constraints without devaluing the mission overall. It might be tempting to say that all you really need to do is treat this category as a refined implementation of the previous two, but for one factor.
The NFV push for decomposition of PNFs was intended to set up a model of a virtual device where functional components could be mixed and matched across vendor boundaries. That doesn’t mean that the factors related to implementing our first two categories wouldn’t still apply, but it would mean an additional requirement for a standard API set, a “model function set” from which open composition of appliances could be supported. This isn’t really a cloud-native capability, but it draws on a general development technique where an “interface” is “implemented” (Java terms).
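In Java terms, the “model function set” would amount to a standard set of interfaces, with each vendor’s decomposed feature supplied as an implementation, so a virtual device could be assembled across vendor boundaries. Here’s a hedged sketch; the interface and class names are entirely my own, purely to show the shape of the idea.

```java
// Hypothetical "model function set": standard interfaces that any vendor's
// decomposed feature could implement. The names are illustrative, not drawn
// from any NFV ISG specification.
interface PacketFilterFunction {
    boolean permit(String sourceAddress, String destinationAddress, int port);
}

interface RoutingFunction {
    String nextHop(String destinationPrefix);
}

// A composed virtual device: the operator mixes and matches implementations
// from different vendors as long as each honors the standard interface.
class ComposedVirtualRouter {
    private final PacketFilterFunction filter;  // e.g., Vendor A's firewall feature
    private final RoutingFunction routing;      // e.g., Vendor B's routing feature

    ComposedVirtualRouter(PacketFilterFunction filter, RoutingFunction routing) {
        this.filter = filter;
        this.routing = routing;
    }

    String handle(String src, String dst, int port) {
        if (!filter.permit(src, dst, port)) {
            return "drop";
        }
        return routing.nextHop(dst);
    }
}
```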
The challenge here is that in development, one normally establishes the “interface” first, and if the source of decomposed VNF elements is a prior PNF, you not only have to create an interface retrospectively, you also have to figure out how to make current implementations of the PNF decompose in a way that supports it. The idea of decomposition failed in the ISG largely because of these challenges, which suggests that we might either have to abandon the whole notion of decomposing and assembling virtual devices, or apply the idea only to things like open-source implementations, where standardizing the functional breakdown of a virtual device would be possible.
There’s a wide range of platforms on which these three categories of VNF would best be hosted, from bare metal and white boxes to containers and cloud-native functions. The third category fits service mesh technology, at least for some functions; the second might fit in a few cases; and the first is unlikely to be able to use a service mesh at all, given that its message flows and interface relationships would almost surely be static. This illustrates just how variable VNFs are, and how challenging it is to try to shoehorn virtual functions into a single model, which is what the NFV ISG tried to do.
Every one of the three VNF classifications introduces specific development and hosting constraints, but the good news is that these appear to be addressable in an orderly way. You need to be able to map deployment tools to everything from bare metal and white boxes to VMs and containers. You need to introduce some APIs to guide development and standardize operations. None of these needs presents an overwhelming challenge, and some aren’t really even difficult; we’re part of the way there already. The only questions are whether there’s a mechanism and a will, and projects like Nephio may answer both by demonstrating progress and support.