There’s VNFs and then There’s VNFs

With all the talk about virtual network functions (VNFs) in the ETSI Network Functions Virtualization (NFV) group and now in the Nephio project (see my blogs HERE and HERE), it occurs to me that we’ve not really talked much about VNFs themselves. In particular, we’ve not talked about the fact that there are multiple VNF types or models, and that this high-level classification of VNFs has a lot to do with how we might expect to host and operationalize them.

While the original ETSI material doesn’t make this totally clear, the original vision of NFV was to create VNFs that were flow appliances, meaning that they were intended to be in the service data plane and provide some feature set “above” basic terminating of the network connection. Think firewall. One of the early goals of NFV was to create cloud-hosted equivalents of what were normally provided as CPE, which operators hoped would lower capex.

A flow appliance model has challenges. First and foremost, focusing as it does on CPE, it’s almost automatically limited in utility to business services. Residential CPE today is almost exclusively a WiFi hub combined with a simple home router, and the cost is typically low. “Standard” residential terminating devices are running between $35 and $70 US to operators, and most of that cost comes from the essential function of CPE, which is to terminate the service connection and provide local WiFi. Since that termination function has to stay on the premises, most residential flow appliance missions aren’t going to be financially sensible, and so flow appliances of this type are not likely to impact operator costs very much.

The other type of flow appliance would be switches and routers, things that in most applications require high-performance switching of packets that a standard server backplane could not provide. This would require a white-box device with special switching chips, and while these were becoming available in 2013 when the ISG’s work framed out the architecture, they were not a target for VNF hosting. I believe that without the inclusion of white boxes, any VNF initiative is likely to be seriously, perhaps fatally, hampered.

The second class of VNF, also introduced by the ETSI NFV ISG, is the control plane appliance. It surfaced in 2013 in the context of the mobile-network IP Multimedia Subsystem (IMS), which defined separate service control and mobility management features. In today’s world, the focus of control plane appliances has been 5G, of course.

Control-plane functions are interesting because they influence data-plane behavior but don’t actually participate in the flows. That makes them a much better candidate for cloud hosting, and it wouldn’t require specialized hardware either. Another interesting aspect of control-plane hosting is that the “disaggregation” interest in IP networking that separates the control and data plane creates a mission potentially broader than mobile/5G networks.

The primary challenge for control plane appliances is hosting resources. Mobility management is a metro function, one not easily pushed deep into the cloud because of latency issues and the risk that long data paths could reduce reliability/availability. Edge computing is the obvious answer, but given that “edge” here really means “metro” in nearly all cases, the question has always been who would deploy the resource pools. If there isn’t a fairly efficient resource pool, then control plane appliance missions start to look like uCPE or bare metal missions.

5G also appears to have spawned the realization that while the 3GPP mandates “NFV” for control-plane deployments, this probably isn’t a good strategy for the 5G RAN. Open RAN defines a RAN Intelligent Controller, or RIC, that does a lot of what NFV would do, but in a lightweight form. Nephio appears to promise support for the 5G features, which would make the RIC a Kubernetes element. There are other roles the RIC performs, and so it will be interesting to see how the full RIC function can be mapped to a Nephio implementation.

One important point about this category: the “control plane” is a term that means different things depending on who’s talking about it. In IP networks, there’s a control plane where exchanges between nodes are used to mediate data plane routes. In mobile networks, there’s a control plane that defines things like mobility management. The term isn’t precise in one sense, but it always tends to mean a level of interaction that controls the way an “element” of the network relates to the cooperative community that’s the network itself.

Category three for VNFs is decomposed function hosting. At the very first ETSI ISG meeting in 2013, there was some interest in hosting pieces of physical network functions (PNFs) independently rather than having VNFs map 1:1 to PNFs. A formalized model of this can be found in disaggregation and in SDN, but it’s possible to conceptualize almost any “appliance” as a series of decomposed functions. For example, a firewall might be a combination of a flow appliance and a separated element that maintains forwarding rules across an entire enterprise, or even for all consumers.
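To make the firewall example a bit more concrete, here is a minimal sketch of the split, and it is purely my own illustration rather than anything from the ETSI material. The names (`Rule`, `RuleStore`, `FlowElement`) are hypothetical, and the "network" between the pieces is just an in-process object reference. The point is only that the rule-holding element is shared across the enterprise while each flow element stays in the data path and refreshes its local copy when the shared rules change.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One forwarding rule: allow or deny traffic to a destination port."""
    dest_port: int
    allow: bool


@dataclass
class RuleStore:
    """Hypothetical shared element that holds rules for the whole enterprise."""
    rules: dict = field(default_factory=dict)  # dest_port -> Rule
    version: int = 0

    def set_rule(self, rule: Rule) -> None:
        # Every change bumps the version so flow elements know to resync.
        self.rules[rule.dest_port] = rule
        self.version += 1


@dataclass
class FlowElement:
    """Hypothetical data-plane piece that filters using a synced rule copy."""
    store: RuleStore
    _cache: dict = field(default_factory=dict, init=False)
    _seen_version: int = field(default=-1, init=False)

    def _sync(self) -> None:
        # Copy the shared rules only when the store has changed.
        if self._seen_version != self.store.version:
            self._cache = dict(self.store.rules)
            self._seen_version = self.store.version

    def permits(self, dest_port: int) -> bool:
        self._sync()
        rule = self._cache.get(dest_port)
        # Default deny when no rule matches (an assumption of this sketch).
        return rule.allow if rule is not None else False


# Two sites share one rule store; a single update reaches both.
store = RuleStore()
site_a = FlowElement(store)
site_b = FlowElement(store)
store.set_rule(Rule(dest_port=443, allow=True))
```

In a real deployment the two pieces would be separately hosted and connected over the network, which is exactly where the question of a decomposition model comes in.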

The challenge here is a model for the decomposition. That’s what killed the idea early on; vendors who offered PNFs were totally uninterested in decomposing their stuff, and without some architecture to define how the decomposed elements could be defined and reassembled, it would be difficult to create a useful and open ecosystem. I think serving this particular mission would likely require support from my hypothetical PaaS layer.

This could well be the pivotal VNF model, whatever its challenges, because it would permit a break between the VNF and PNFs, which would mean that cloud-native design would be applicable to network functions overall. It also connects nicely to Kubernetes’ mission of deploying all the components of an application as a unit, and maintaining their connection. With this model, we start thinking about assembling services from assembled virtual devices that don’t have a direct physical-appliance counterpart.

Which brings us to the final VNF category, truly virtualized features. Once we have a model to build services from built functions, we can rethink the whole notion of services and features. This, I think, is what would be needed to achieve AT&T’s stated goal of building a new layer of “connectivity” features that would empower the network and at the same time support new partnerships with OTTs.

One aspect of this class of VNFs is that it could represent a network destination, like an OTT element. This doesn’t commit the operators to being OTTs, but rather to being able to provide features that are evolutionary/revolutionary rather than ones they traditionally offer. That’s why this category could represent a path to achieving AT&T’s goals.

A pedestrian example of this class is the logic associated with content and ad delivery. There are routing-and-redirect functions and caching functions associated with these related applications, with the former obviously being related to DNS decoding, a feature that nearly all the major operators offer, bundled with their broadband access. The latter is something traditionally provided by others, including CDN vendors like Akamai, but operators have a vested interest in content caching as it would relate to optimizing their own metro/access infrastructure.

The four classes of VNFs I’ve cited here are important not only for what each represents, but also for the progression they demonstrate. Operators have, for decades, avoided what I think is an irrefutable truth: connection services will always commoditize. That means that if you want to improve your profits as a network operator, you need to rise above them. There may or may not have been good reasons for operators to eschew the OTT opportunities a couple of decades ago when they were still available. Whichever was the case, those opportunities have passed them by, at least in terms of offering potential for direct retail exploitation. AT&T’s theory of fostering a symbiosis with OTTs that allows some OTT service revenues to pass to them is the best strategy that remains, but exploiting it means moving through the progression the VNF classifications represent.

AT&T made an important decision when they laid out their service-symbiosis model, but they need to flesh it out, and Nephio presents an opportunity to define a platform that could support that. It’s not there yet, and operators aren’t yet fully onboard with exploiting it. I’m excited to see how this all develops, but I’m still wary because operators could have fixed their problems in the past, too. They didn’t, and a change in technology won’t overcome a lack of change in willpower.