Light Reading did an interesting article asking for “a little less conversation, a little more action” on cloud-native VNFs. I agree with a lot of what the piece says, particularly about the fact that the market and technology aren’t really ready for cloud-native VNFs, and that the Linux Foundation is eager to try to close the gap. I’m less sure the piece gets at why the gap exists, what would actually help close it, or even what the conversation is really about.
There seems to be a tendency in the market to conflate “containerized” and “cloud-native”, and that is no more useful than conflating virtual machines and cloud-native. One of the big problems with our whole discussion of VNF evolution is linked to this conflation, because it leads us to fix the wrong problem.
A containerized application is one that is designed to be deployed as a series of containers by an orchestration system like Kubernetes. You could absolutely, positively, now, and forever make virtually any monolithic application, the stuff we run today on single servers, into a containerized application. You most definitely do not have to be cloud-native to be containerized, and my own input from enterprises is that most container applications today are not cloud-native.
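To make that concrete, here’s a minimal sketch of what “containerized but not cloud-native” looks like in practice: the same monolithic program, wrapped in a container image and handed to Kubernetes as a one-replica Deployment. The image name (“legacy-billing”) is hypothetical, purely for illustration; nothing about this example is cloud-native.

```python
# A containerized (but NOT cloud-native) application: one monolithic
# container, one replica, expressed as the Kubernetes Deployment manifest
# any client or kubectl could submit. "legacy-billing:1.0" is hypothetical.
import json

monolith_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "legacy-billing"},
    "spec": {
        "replicas": 1,  # a monolith can't scale out, so one copy is all you get
        "selector": {"matchLabels": {"app": "legacy-billing"}},
        "template": {
            "metadata": {"labels": {"app": "legacy-billing"}},
            "spec": {
                "containers": [{
                    "name": "legacy-billing",
                    "image": "legacy-billing:1.0",
                    "ports": [{"containerPort": 8080}],
                }]
            },
        },
    },
}

print(json.dumps(monolith_deployment, indent=2))
```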
A cloud-native application is an application divided into scalable, resilient components that are designed to take advantage of the cloud’s inherent agility. They are typically considered to be “microservice-based”, meaning that the components are small and don’t store data within themselves, so they’re at least semi-stateless (purists would argue they should be fully stateless, relying on externally maintained state). It’s likely, given the state of the market, that cloud-native applications would deploy via containers, but it’s not a necessary condition.
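Here’s a minimal sketch of what that “semi-stateless” point means in code. The component below holds no state of its own; every copy is interchangeable because the only state lives in an external store. Flask and Redis are my stand-ins, chosen for brevity, not anything required by the definition.

```python
# A stateless component: the only state (a session counter, in this toy
# example) lives in an external store, so any number of copies can run
# behind a load balancer and be killed or cloned freely.
import os

import redis                      # external state store client (assumed available)
from flask import Flask, jsonify  # any HTTP framework would do

app = Flask(__name__)
store = redis.Redis(host=os.environ.get("STATE_STORE_HOST", "localhost"), port=6379)

@app.route("/session", methods=["POST"])
def new_session():
    # The increment happens in Redis, not in this process, which is what
    # lets an orchestrator scale or replace instances without losing anything.
    session_id = store.incr("session:counter")
    return jsonify({"session_id": session_id})

if __name__ == "__main__":
    app.run(port=8080)
```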
NFV, at the moment, isn’t either one of these things, so perhaps in one sense this whole conflation thing could be considered moot. Why not lump two things you’re going to say are bad into one single bad thing? Answer: Because the fixes for the two things are different, and it’s not even clear how much either fix would matter to NFV anyway.
NFV latched onto virtual machines and OpenStack as the deployment model from the very first. A physical network function, meaning a device, was to be replaced by a virtual network function (VNF), which was software instantiated in a virtual machine via OpenStack. The sad thing is that the model the NFV ISG came up with was actually more general; the deployment process was to reside in a Virtual Infrastructure Manager, or VIM. The VIM could have been seen as the abstraction that bound virtual hosting as described by NFV Management and Orchestration (MANO), presumably via some sort of model, to actual resources.
The slip from the path seems to have come about largely because of two factors. First, a virtual function that’s the analog of a physical device is a unitary function. Second, because it’s part of a network, it demands absolutely the best security and isolation. Hence, VMs. Nothing in the abstract view of a VIM precludes using VMs, or OpenStack, but apparently the thought was that not precluding them wasn’t enough; you had to mandate them.
And, just maybe, they were right. If you build a virtual network by assembling virtual boxes that are 1:1 equivalents, functionally, of devices, then you might need the security and resource isolation that VMs can bring. Containers are not as secure as VMs, though the difference might not be difficult to accommodate. Containers do put you at more risk of crosstalk between co-hosted “pods” because they share the same operating system and some middleware. The biggest advantages of containers that proponents of “containerized network functions” cite are that you can get more of them on a server and that, since there’s only one OS, there’s also less operations effort. Those might not be worth much if you’re managing services, not servers, and if your total container throughput limits how many you could stack on a server anyway.
If containers aren’t necessarily the natural goal of VNF hosting, how about cloud-native? It would indeed be possible to take something like “routing” and turn it into a set of microservices. That would mean creating separate elements of the overall VNF, then linking them in a service mesh. If you did that, many of the control, management, and data processes that would have flowed across a router backplane would now flow through a network connection, with its bandwidth limitations and latency issues.
Here is where we can make a valid connection between the two concepts: if you could somehow make a cloud-native router, you’d almost surely want to use containers in general, and Kubernetes and its growing ecosystem in particular, as its basis. If you don’t do that, and thus unite the two concepts into one effort, you probably don’t get far with either one of them. Which means that the Linux Foundation and similar software-centric efforts may be aiming at the wrong goal. We don’t want cloud-native or containerized VNFs at all; we want no VNFs.
If a VNF is the virtual form of a PNF, we’ll never make VNFs cloud-native, nor will we containerize many of them in any useful way. The only truly “successful” NFV application today is virtual CPE, which involves hosting a VNF inside a white-box device. This mission doesn’t gain anything from containers, but then it doesn’t gain much from NFV overall either. The DANOS and P4 initiatives would be a far better way to address vCPE.
The hope for NFV lies elsewhere, in taking a different perspective from the one taken when the ISG launched. What has to happen is something that actually came up in the very first NFV ISG meeting I attended, the first major and open one held in the Valley in the spring of 2013. One attendee proposed that before we could talk about composing virtual functions into services, we had to talk about decomposing services into virtual functions. Wait, you say! Didn’t we already say that a VNF was a transmorphed PNF? Yes, and that’s the problem.
Anyone who’s ever architected a true cloud application knows that the last thing you want to do is simply map each component onto an element of a current application. You’d inherit the limitations of data center computing, because that’s what you were starting with. It’s possible to build a VNF-based service correctly (Metaswitch had one in 2013, with “Project Clearwater”, their implementation of IMS-as-a-cloud), but you have to divide the service into cloud-native functions, not into devices.
We could jiggle NFV’s specifications to make it possible to deploy VNFs in containers, by doing nothing more than abandoning the presumption that a VIM always invoked an OpenStack/VM deployment. We could create cloud-native VNFs by abandoning the silly notion that a VNF is a software-hosted PNF. But these are first steps, steps that abandon the wrong approaches. The next step, which creates the right approach, would also unite the two threads.
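In software terms, the VIM fix is just an abstraction with more than one binding. Here’s a sketch of what I mean; the class and method names are mine, not anything from the MANO specs. The orchestration layer calls a generic deploy interface, and whether the function lands in an OpenStack VM, a Kubernetes pod, or on bare metal is a detail of the plug-in behind it.

```python
# A VIM as an abstraction rather than a mandate: MANO-level logic calls
# deploy()/destroy() and never knows whether the backend is OpenStack,
# Kubernetes, or bare metal. Names are illustrative, not from the specs.
from abc import ABC, abstractmethod

class VirtualInfrastructureManager(ABC):
    @abstractmethod
    def deploy(self, function_name: str, artifact: str) -> str:
        """Instantiate a virtual function and return a handle to it."""

    @abstractmethod
    def destroy(self, handle: str) -> None:
        """Tear the instance down."""

class OpenStackVim(VirtualInfrastructureManager):
    def deploy(self, function_name: str, artifact: str) -> str:
        # 'artifact' would be a VM image here; a real implementation would
        # call the OpenStack APIs to boot a server.
        return f"vm::{function_name}"

    def destroy(self, handle: str) -> None:
        pass  # delete the VM

class KubernetesVim(VirtualInfrastructureManager):
    def deploy(self, function_name: str, artifact: str) -> str:
        # 'artifact' would be a container image here; a real implementation
        # would create a Deployment through the Kubernetes API.
        return f"pod::{function_name}"

    def destroy(self, handle: str) -> None:
        pass  # delete the Deployment

def instantiate(vim: VirtualInfrastructureManager, function_name: str, artifact: str) -> str:
    # The layer above the VIM is indifferent to the hosting choice.
    return vim.deploy(function_name, artifact)
```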
Almost everything that NFV has done could, at this point, be done better by presuming Kubernetes orchestration of a combination of container hosts, VMs, and bare metal. That’s already possible in some implementations. What NFV needs to do is accept that model as a replacement for everything. Does that mean starting over? No, because we don’t need to start at all. Kubernetes is already there, and so all that’s necessary is to frame the requirements of NFV in the form of a Kubernetes model. Then we need to construct “services” from “network-microservices”, like we’d build any cloud application.
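What framing NFV requirements in a Kubernetes model might look like at the lowest level is sketched below, using the official Kubernetes Python client; the function and image names (“route-controller”) are hypothetical. The point is that a network feature becomes an ordinary scalable Deployment, and placement, scaling, and lifecycle are delegated to the orchestrator rather than to a MANO stack.

```python
# A "network-microservice" expressed as a plain Kubernetes Deployment,
# via the official Python client. "route-controller:1.2" is a hypothetical
# image standing in for a control-plane feature.
from kubernetes import client, config

def deploy_network_microservice(name: str, image: str, replicas: int = 3) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    apps = client.AppsV1Api()

    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name=name, labels={"role": "control-plane"}),
        spec=client.V1DeploymentSpec(
            replicas=replicas,  # scaling and redeployment are Kubernetes' job
            selector=client.V1LabelSelector(match_labels={"app": name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": name}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name=name,
                            image=image,
                            ports=[client.V1ContainerPort(container_port=8080)],
                        )
                    ]
                ),
            ),
        ),
    )
    apps.create_namespaced_deployment(namespace="default", body=deployment)

if __name__ == "__main__":
    deploy_network_microservice("route-controller", "route-controller:1.2")
```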
The article makes a particularly important point in the form of a quote from Heavy Reading analyst Jennifer Clark: “It became clear working with virtual machines in NFV was not really going to work in a 5G environment, and where they really needed to go was toward containers, and cloud-native Kubernetes to manage those containers.” That’s true, but for a different reason than some automatic connection between 5G and containers or cloud-native. Mobile networks tend to have quite distinct control, management, and data planes. That means the parts of networking that look like applications, the control and management planes, are separated from the data plane and could fairly easily be made containerized and cloud-native.
Fixing this within the context of NFV discussions like the Common NFVi Telco Task Force (CNTT) is going to be difficult, because standards seem to have a fatal inertia. Marching to the sea is not a good survival strategy, but lemmings (it is said) have the fatal instinct to do just that. Marching to the legacy of NFV is just as bad an idea. Rather than pushing for a standardization of NFVi, which presumes that the higher layers that control it are doing their job, the body should be asking why you need to standardize a layer that’s supposed to be abstracted fully by the layer above. If the NFV community wants to work on either containerization or cloud-native migration, they should start by fixing the VIM model, because without that, nothing else is going to help.
If the VIM is the problem, then NFV is really about how service models are transformed into deployments. That means it’s about how you model a service, how it’s designed as a series of interactive components, which is what “cloud-native” is about. It’s then about how those components are deployed and redeployed, which is what Kubernetes is about. CNTT or NFV shouldn’t be about making NFV compatible with Kubernetes; it should be about making Kubernetes the model of NFV. Mobile services, 5G, IoT, edge computing, and a lot of other initiatives are going to get there eventually, either by accepting NFV as a Kubernetes application or by letting Kubernetes-based deployments of service features subsume NFV.
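One way to picture “service models transformed into deployments” is a model that is nothing but a list of cloud-native components, each carrying its own scaling and placement intent, which some mapper (Kubernetes, in practice) turns into running instances. The sketch below is purely illustrative; the field names are mine, not drawn from any NFV or CNTT model.

```python
# A service model as a composition of network-microservices, each mapping
# directly onto an orchestrator deployment. Field names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class NetworkMicroservice:
    name: str
    image: str
    replicas: int = 2            # resilience/scaling intent, enforced by the orchestrator
    edge_affinity: bool = False  # placement intent (e.g. keep near the user plane)

@dataclass
class ServiceModel:
    service_name: str
    components: List[NetworkMicroservice] = field(default_factory=list)

    def to_deployment_intents(self) -> List[dict]:
        # The "transformation into deployments": each component becomes a
        # deployment request the orchestration layer (Kubernetes) can act on.
        return [
            {
                "deployment": c.name,
                "image": c.image,
                "replicas": c.replicas,
                "node_selector": {"tier": "edge"} if c.edge_affinity else {},
            }
            for c in self.components
        ]

# Example: a mobile-style service with separated control and user planes.
service = ServiceModel(
    service_name="5g-core-slice",
    components=[
        NetworkMicroservice("session-manager", "session-manager:0.9", replicas=3),
        NetworkMicroservice("user-plane-agent", "user-plane-agent:0.9", edge_affinity=True),
    ],
)
for intent in service.to_deployment_intents():
    print(intent)
```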
Mobile and 5G are where this starts. Is this a surprise, given Metaswitch’s seminal work with IMS, another mobile service application? Is it surprising given that SDN separates the data and control planes? Why is it that so much insight in networking is lost because it happens outside some formalized consideration of issues? We don’t live in a vacuum, after all. We live in the cloud, which is the whole point.