An Approach to Cloud-Native 5G

Could we frame out a true cloud-native 5G Core?  What would happen if we set aside as many implementation presumptions as we could, and focused on finding the best technology approach?  It’s an interesting question, and it might turn out to be a highly relevant one now that so many cloud providers and software vendors are aiming at carrier 5G Core deployments.  Along the way to answering it, we might also glimpse a solution to the problem of getting standards bodies aligned with the world of the cloud.

5G Core is arguably different in four ways.  First, it proposes to replace network elements (devices) with network functions (software instances).  Second, it proposes to create virtual partitions within the overall 5G network, called “network slicing”, somewhat analogous to virtual network overlays on a real network.  Third, it proposes to abstract the user and control planes and separate them fully.  Finally, it proposes to fully abstract network transport.  All of this has to be done in a way that defines specific interfaces that promote interoperability.  So how might that happen?

The mandate of 5G is for “virtualization”, both of connection services and network functions.  The standards and the industry trend both equate this to the adoption of “SDN” and “NFV”, but if you look at the implementations of 5G, it’s clear that vendors are taking a broader swipe at virtualization than the specifics of those two technologies.  We can therefore assume that a true 5G core would rely on virtualization principles and perhaps blow a kiss or two at SDN and NFV as formal components.

The architecture for 5G (a revised chart from the 5GPP organization is my reference for this first figure) shows a data plane (the “User Plane”) and a Control Plane, hereafter referred to as the UP and CP.  The functional elements of each plane are shown as dark blue boxes in my figure, and the interfaces between them (the “N” interfaces) represent the standardized relationships mandated in 5G specs.

The problem with this approach is that it tends to lead vendors into a “virtual box” model, meaning that what they do is implement each of the boxes in the figure as though they were virtual devices.  That happened with NFV, and was IMHO the biggest single reason why the initiative hasn’t met expectations.  The problem is that once you define interfaces, you’re defining things that connect, which leads to defining the things themselves.  What we have to do is stop thinking of these boxes as virtual devices, and think of them in some more generic way.  Since that defies the specs, to a degree, we have to work our way into this.

The place to start refining this is the separation of control and data planes, our third 5G difference.  In the control plane, 5G functions can be viewed as microservices that hold no internal state and can therefore be scaled freely.  Latency management in the control plane isn’t as critical, so it would be possible to adopt a service-mesh framework or something similar to provide the necessary combination of discovery, load balancing, scaling, and resilience.  Our cloud-native 5G Core should have its control-plane elements based on this framework, since this is where “cloud-native” is clearly going at the application level.
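To make the stateless-microservice idea concrete, here is a minimal sketch in Python.  It is an illustration under stated assumptions, not a 5G implementation: `SessionStore`, `SessionFunction`, and the event names are hypothetical, the store stands in for an external back-end database, and the round-robin iterator stands in for the load balancing a service mesh would provide.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class SessionStore:
    """Hypothetical back-end store; stands in for an external database."""
    sessions: dict = field(default_factory=dict)


class SessionFunction:
    """A stateless replica: every call reads and writes only the shared store."""

    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store  # reference to shared state; nothing is cached locally

    def handle(self, subscriber: str, event: str) -> str:
        history = self.store.sessions.setdefault(subscriber, [])
        history.append(event)
        return f"{self.name} handled {event} for {subscriber} (event #{len(history)})"


store = SessionStore()
replicas = [SessionFunction(f"replica-{i}", store) for i in range(3)]
balancer = cycle(replicas)  # round-robin stand-in for mesh load balancing

# Three events for one subscriber land on three different replicas.
results = [next(balancer).handle("sub-1", ev) for ev in ("attach", "update", "detach")]
```

Because no replica keeps anything between calls, any of them can be added, removed, or replaced without the subscriber’s history being lost.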

Within the CP piece of my figure, we could then presume that each of the blue boxes is a service-meshed microservice with full resilience and load-balancing, or perhaps in some cases a serverless function loaded on demand.  The difference would be the rate of activity for a given function.  Since the current leading serverless tool is Knative, which runs on top of Kubernetes for deployment and can use Istio for service mesh, we have the potential to identify a single model, combining containers, Kubernetes, Istio, and Knative, for our cloud-native control plane.

On the data plane, things are a bit more complex.  A router function is not stateless; routing tables are stored locally and it’s these tables that determine how packets are handled.  Router functions are also more difficult to scale or replace without impact on traffic, because packets in flight and currently being handled could be lost or delivered out of order.  Some protocols carried over the data plane (like TCP) will restore ordering, but doing so always creates a risk of a sharp increase in latency.

We have two options available to us in creating the data plane.  The first is to fall back on our first “difference”, saying that physical devices are replaced by virtual functions.  If we presume that replacement, with respect to packet forwarding at least, is 1:1, then we could say that the data plane will be implemented with stateful virtual devices.  The second is to presume OpenFlow-style SDN control, with central route control, and then assume that each instance of the router function would retrieve its routing table centrally, which would mean that any new instance created for scaling or availability would be equipped to take up its role.

The second model is the one I believe would be most “cloud-native”.  Back-end state control is an acceptable way to introduce stateful information into a microservice, and we would have the option to either deliver updates to the routing tables (from the central control point) or to reload or refresh the table entirely when a change is made.
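The two refresh options can be sketched in a few lines of Python.  This is a sketch under assumptions of my own: `RouteController` and `ForwardingInstance` are hypothetical names, the “central control point” is just an in-memory object, and the next-hop labels are invented.  It shows a new instance doing a full table load and an existing instance applying only the deltas it has missed.

```python
import ipaddress


class RouteController:
    """Hypothetical central control point holding the authoritative table."""

    def __init__(self):
        self.version = 0
        self.routes = {}  # prefix string -> next hop
        self.log = []     # (version, prefix, next_hop); next_hop None = withdrawn

    def set_route(self, prefix, next_hop):
        self.version += 1
        if next_hop is None:
            self.routes.pop(prefix, None)
        else:
            self.routes[prefix] = next_hop
        self.log.append((self.version, prefix, next_hop))

    def snapshot(self):
        return self.version, dict(self.routes)

    def deltas_since(self, version):
        return [entry for entry in self.log if entry[0] > version]


class ForwardingInstance:
    """A router-function instance whose table comes entirely from the controller."""

    def __init__(self, controller):
        self.controller = controller
        self.version, self.table = controller.snapshot()  # full load at start

    def refresh(self):
        # Apply only the changes made since the version this instance holds.
        for version, prefix, next_hop in self.controller.deltas_since(self.version):
            if next_hop is None:
                self.table.pop(prefix, None)
            else:
                self.table[prefix] = next_hop
            self.version = version

    def next_hop(self, address):
        # Longest-prefix match over the locally held copy of the table.
        addr = ipaddress.ip_address(address)
        best = None
        for prefix, hop in self.table.items():
            net = ipaddress.ip_network(prefix)
            if addr in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, hop)
        return best[1] if best else None


controller = RouteController()
controller.set_route("10.0.0.0/8", "hop-a")
first = ForwardingInstance(controller)       # existing instance
controller.set_route("10.1.0.0/16", "hop-b")
second = ForwardingInstance(controller)      # new instance created for scaling
first.refresh()                              # existing instance takes the delta
```

After the refresh, both instances hold the same table, which is the property that lets a replacement or scaled-out instance take up its role.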

This second model would be compatible with formal OpenFlow-style SDN, but it could also be applied to any architecture that can build an overall network topology and derive per-device forwarding from it.  In short, we could separate the router-adaptive model of routing now in use, build the topology map in the control plane, and deliver it to the function instances as needed.
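As a rough illustration of deriving per-device forwarding from a central topology map, the sketch below runs Dijkstra’s shortest-path algorithm once per node.  The topology, node names, and link costs are invented for the example, and `forwarding_table` is a hypothetical helper, not part of any SDN controller’s API.

```python
import heapq


def forwarding_table(topology, source):
    """Return {destination: next hop} for `source` using Dijkstra's algorithm."""
    best = {source: 0}
    next_hop = {}
    queue = [(0, source, source)]  # (cost so far, node, first hop from source)
    while queue:
        cost, node, hop = heapq.heappop(queue)
        if cost > best[node]:
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, weight in topology[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                # Directly attached neighbors are their own first hop.
                next_hop[neighbor] = neighbor if node == source else hop
                heapq.heappush(queue, (new_cost, neighbor, next_hop[neighbor]))
    return next_hop


# Hypothetical four-node topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}

# The control plane computes every device's table from the one shared map.
tables = {node: forwarding_table(topology, node) for node in topology}
```

Each entry in `tables` is what would be delivered to the corresponding function instance; the instances themselves never run a routing protocol.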

I think that this model fits the general thrust of the 5G specs, but it would tend to define what I’ll call a “sub-plane” structure to the control plane.  5G has specific elements it expects to be in the control plane, and what this approach does is generate what’s perhaps a “mapping sub-plane” that maps the 5G control model and data model to the specific virtual network and virtual function requirements of the data plane.  The figure shows the interfaces as vertical lines connecting CP and UP elements.

My presumption is that in a cloud-native model, these N interfaces would terminate not in the UP elements but in intent-model abstractions of each element’s functionality.  These function-based intent models would assert as their external properties the various N interfaces the functions are required to supply by the 5G specs.  Each function’s implementation would involve a set of microservices that represent the features of the mapping sub-plane, which are largely the control and management planes of IP.  These would interface with the real IP data-plane elements, which might be white boxes, hosted instances of forwarding devices, P4-equipped virtual switches, or whatever.
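A minimal Python sketch of the intent-model idea follows.  Everything in it is an assumption made for illustration: the class names, the `n4_session_request` method, and the rule format are hypothetical, and the interface labels are listed only as examples of what a user-plane element might assert.  The point is that the outside world sees the same abstraction regardless of what forwards the packets.

```python
from abc import ABC, abstractmethod


class ForwardingBackend(ABC):
    """Hypothetical driver for whatever actually moves packets."""

    @abstractmethod
    def install_rule(self, match: str, action: str) -> str:
        ...


class WhiteBoxBackend(ForwardingBackend):
    def install_rule(self, match, action):
        return f"whitebox: {match} -> {action}"


class VirtualSwitchBackend(ForwardingBackend):
    def install_rule(self, match, action):
        return f"vswitch: {match} -> {action}"


class UserPlaneIntentModel:
    """Asserts the N interfaces externally; hides how forwarding is realized."""

    interfaces = ("N3", "N4", "N6", "N9")  # illustrative set for a UP element

    def __init__(self, backend: ForwardingBackend):
        self._backend = backend

    def n4_session_request(self, session_id: str, match: str, action: str) -> dict:
        # Mapping sub-plane logic: translate the CP request into a backend rule.
        installed = self._backend.install_rule(match, action)
        return {"session": session_id, "installed": installed}


on_white_box = UserPlaneIntentModel(WhiteBoxBackend())
on_vswitch = UserPlaneIntentModel(VirtualSwitchBackend())

reply_a = on_white_box.n4_session_request("s1", "dst=10.1.0.0/16", "forward:port2")
reply_b = on_vswitch.n4_session_request("s1", "dst=10.1.0.0/16", "forward:port2")
```

The control plane issues the same request to both models and gets the same shape of reply; only the hidden implementation differs.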

In effect, this mapping sub-plane then would look like an overlap between the CP and UP, as my second figure shows.  From a functional perspective, it would be part of the UP because it would terminate 5G CP N interfaces to the UP.  From an implementation perspective, it would be based on the same microservices/serverless framework as the CP.  Within the CP mesh portion, the control functions of the mapping layer would live, both as a global service (the green box) and as a set of elements distributed within the abstract models of each UP element.

The horizontal N interfaces between UP elements would also be interfaces to these intent-model abstractions, which means that a “virtual UP element” would consist of interfaces to the mapping sub-plane from the CP, interfaces between the UP elements within the UP, and logic to supply the mapping-plane implementation of the IP control and management planes.

You may wonder why all this mapping is necessary, and the reason is that 5G, despite its desire to push telcos toward the cloud, still has very much a device-centric bent.  There is more than one way to deploy in the cloud, more than one way to orchestrate.  Implementations of a given CP or UP function could be conformant to 5G goals and not be identical, to the point where the optimum means of interfacing wouldn’t be simply (for example) an “N” interface to an instance of a UP function.  We have to allow some flexibility because our vision of how the cloud works is evolving as our experience with the cloud expands.

The presumption of this model is that all the “functions” within the CP and all the mapping-layer functions are true microservices, deployed within a service mesh and potentially in serverless, event-driven form.  The UP elements that make up the “real” data plane are placed and orchestrated using Kubernetes rather than meshed.

This isn’t the only way to do a cloud-native 5G, but it’s a way that seems to conserve the value of ongoing cloud development, and at the same time harness the benefits of intent modeling.  It’s the combination of these things that creates what I think can justifiably be called a true cloud-native approach to 5G architecture.