The Architecture for a Separate Control Plane in Networks

I blogged last week about the value of, or perhaps the necessity of, separating the control and user planes in 5G. The main point I was addressing was that servers and cloud software aren’t really optimal for pushing packets at high speeds and high volumes. If the control plane were broken out, the user plane could be supported via routers/switches in proprietary or white-box form, and the control plane supported as an application. What would the software architecture to achieve this look like?

The key consideration in this separation is that, because control-plane traffic consists of episodic messages, service logic in general is essentially event-driven and thus something we already have experience handling in the cloud. It should be possible to use a variety of cloud-based services to support service control planes, and that raises the question of what the optimum service would look like.

Finally, we have the question of “five nines”. Everyone says that telcos expect five-nines availability, and that’s almost surely an exaggeration, but it is possible that services demand a higher level of availability than traditional transaction or event-handling applications. Is that true, how much more available are they expected to be, and what might be done to make them more available?

The base architecture for a separated control plane could be visualized easily as a control-plane element set that interfaces with a user-plane element to create what looks like single-box behavior. The 3GPP model is then decomposed into these “pairings”, one for each type of 5G device. Assume we have a number of flow switches, which are the foundation of our user plane, and a control-processor that hosts the CP functions associated with those switches.

One thing that’s immediately clear is that the breakdown of “functional elements” like CUs and DUs in Open RAN is unnecessary and maybe destructive. We’ve seen all too often that functional diagrams created by carrier standards groups get implemented with a device (real or virtual) per functional block, and CU and DU could be represented as a flow switch and a couple of cloud services. Also, since the 5G RAN front-haul and mid-haul portions are supposed to be low latency and involve a very small number of connected elements (a DU/CU would connect only to the elements in the next-lower layer of the tier structure toward the core), we could presume that these were all supported via almost-static routing, meaning very simplified handling would be sufficient at the user-plane (IP network) level.
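To make the “almost-static routing” point concrete, here’s a minimal sketch of what a front-haul or mid-haul flow switch might amount to, assuming the tier structure described above. All class names, port names, and element names are illustrative, not drawn from any spec or product.

```python
# Hypothetical sketch of "almost-static routing" in the RAN front-haul and
# mid-haul: each flow switch connects to so few elements that its forwarding
# table can be installed once and left alone. All names are illustrative.

class StaticFlowSwitch:
    """A user-plane element with a fixed, pre-installed forwarding table."""

    def __init__(self, name, flows):
        self.name = name
        self.flows = dict(flows)  # destination -> output port

    def forward(self, dest):
        # No dynamic route computation; an unknown destination is an error,
        # not a trigger for topology discovery.
        if dest not in self.flows:
            raise LookupError(f"{self.name}: no static flow for {dest}")
        return self.flows[dest]

# A DU-level switch in a simple tier structure: one port up toward the CU,
# one port down toward each attached radio unit.
du_switch = StaticFlowSwitch("du-1", {
    "cu-1": "uplink-0",
    "ru-1": "downlink-1",
    "ru-2": "downlink-2",
})

print(du_switch.forward("cu-1"))  # → uplink-0
```

The point of the sketch is what’s missing: there’s no routing protocol, no topology exchange, no convergence logic, because the connectivity is fixed by the tier structure itself.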

But that’s not all. The 3GPP specifications for 5G backhaul (CU to 5G Core) have even more functional boxes, but still separate the user plane and the control plane. It’s easy to visualize the former as flow switches because even there, the number of possible addressable destinations for any given element is limited.

And yet more. Recall that the “user plane” is IP, and that IP also includes a data plane and control plane. Control packets handle things like topology/route management, and it’s fair to ask whether this lower-level control plane also lends itself to separation. It does, and in fact there’s already been an argument to separate it, with classic OpenFlow SDN.

In an SDN network, flow switches’ routing tables are maintained by a central controller, rather than created on-box through topology exchanges with neighbors. The element offering central control over routing could be combined with the “application” that supported the combination of 5G RAN and Core functional elements to create a single application, likely made up of a number of components. There are a lot of operators and vendors saying that control-plane activity is a for-sure microservice application, and many cloud providers are suggesting that it’s also “serverless” or functional.
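That OpenFlow-style split can be sketched in a few lines, with hypothetical names throughout: the switches hold only flow tables and do no topology exchange, while a central controller keeps the topology view, computes paths, and installs the entries.

```python
# Hypothetical sketch of the classic OpenFlow SDN split: flow switches hold
# forwarding tables but run no routing protocol; a central controller keeps
# the topology view and pushes entries down. All names are illustrative.

class FlowSwitch:
    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination -> output port

    def install_flow(self, dest, out_port):
        # In real OpenFlow this would arrive as a flow-modification message
        # from the controller, not a local method call.
        self.flow_table[dest] = out_port


class CentralController:
    def __init__(self):
        self.switches = {}
        self.links = {}  # (switch_name, neighbor_name) -> output port

    def register(self, switch):
        self.switches[switch.name] = switch

    def add_link(self, a, b, port_on_a):
        self.links[(a, b)] = port_on_a

    def program_path(self, path, dest):
        """Install a flow entry for `dest` at every hop along `path`."""
        for hop, nxt in zip(path, path[1:]):
            self.switches[hop].install_flow(dest, self.links[(hop, nxt)])


ctrl = CentralController()
for name in ("s1", "s2", "s3"):
    ctrl.register(FlowSwitch(name))
ctrl.add_link("s1", "s2", "p1")
ctrl.add_link("s2", "s3", "p2")
ctrl.program_path(["s1", "s2", "s3"], dest="10.0.0.0/24")

print(ctrl.switches["s1"].flow_table)  # → {'10.0.0.0/24': 'p1'}
```

The `CentralController` here is exactly the element that could be folded into the 5G control-plane application: route management becomes one more event-driven function among the RAN and Core functions it hosts.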

What it really looks like to me is a kind of service mesh application, meaning microservices combined with sidecars and a controlling element that ensures messages flow reliably. That architecture was described in THIS ARTICLE about an implementation of a highly reliable event-handling system created by Atlassian for its Tenant Context Service. I think something very similar could be done with Istio or Linkerd, but given the tendency to interpret telco functional diagrams as component models, I doubt much thought has been given to that notion.
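To show what the sidecar contributes in that picture, here’s a hedged sketch: the microservice itself stays simple, and a co-located proxy absorbs transient delivery failures so that messages “flow reliably.” A real mesh (Istio’s Envoy proxy, Linkerd) does this outside the application entirely; the names and retry policy below are illustrative.

```python
# Hypothetical sketch of the sidecar pattern: a co-located proxy retries
# failed deliveries with exponential backoff, so the microservice never
# sees transient transport errors. Names and policy are illustrative.

import time

class Sidecar:
    def __init__(self, send, retries=3, backoff_s=0.01):
        self._send = send          # the unreliable transport call
        self._retries = retries
        self._backoff_s = backoff_s

    def deliver(self, message):
        last_error = None
        for attempt in range(self._retries):
            try:
                return self._send(message)
            except ConnectionError as err:
                last_error = err
                time.sleep(self._backoff_s * (2 ** attempt))  # backoff
        raise last_error

# A transport that fails twice, then succeeds -- the kind of transient
# fault the mesh absorbs without the service ever noticing.
attempts = {"n": 0}
def flaky_send(message):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return f"delivered: {message}"

print(Sidecar(flaky_send).deliver("handover-request"))
# → delivered: handover-request
```

This is also where the five-nines question from earlier gets its practical answer: availability comes from the mesh’s retry, failover, and redundancy machinery, not from hardening any one component.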

This illustrates the challenge we have with a standards community that’s box-centric. In NFV, the “functional block diagram” was interpreted as the literal structure of the software. That’s bad enough when we’re talking about legacy server-hosted software, but it’s awful when the target of deployment is the cloud. Cloud people talk about functional architecture in a totally different way. The cloud, in their terms, is a kind of floating reservoir of functionality that events/messages draw on. Not only does having all these little functional (virtual) boxes not translate into something that describes the cloud implementation, it often constrains the optimum way of approaching things. In my view, both the ETSI NFV ISG and the 3GPP have created a 5G model whose description (as a bunch of boxes) interferes with the stated goal of cloud compatibility.

A good example of this problem is the concept of the interface. What standards diagram these days isn’t replete with interfaces with cryptic labels consisting of a letter (or letter/number combination) and a subscript? But “interface” is really a box concept. A microservice is a function, and a function is a message processor. You send a message to a function, you don’t send it over an interface. The interface is the IP network connection. In a cloud implementation of 5G what you should be describing is message types and processing functions. But you can’t have a functional block diagram without blocks and connecting lines, and thus we fall into interface chaos.
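The difference in description style can be sketched directly: a fragment of control behavior expressed not as boxes joined by labeled interfaces, but as message types bound to processing functions. The message and field names below are illustrative, not taken from the 3GPP specifications.

```python
# Hypothetical sketch of "message types and processing functions" as the
# unit of description, instead of boxes and labeled interfaces. Message
# names are illustrative, not drawn from the 3GPP specs.

def handle_session_setup(msg):
    return {"type": "session-setup-ack", "ue": msg["ue"]}

def handle_handover_required(msg):
    return {"type": "handover-command", "target": msg["target_cell"]}

# The "spec" is this table: which function processes which message type.
# The IP connection that carries the message is an implementation detail,
# not an architectural element deserving its own cryptic label.
PROCESSORS = {
    "session-setup": handle_session_setup,
    "handover-required": handle_handover_required,
}

def process(msg):
    return PROCESSORS[msg["type"]](msg)

print(process({"type": "handover-required", "target_cell": "cell-7"}))
# → {'type': 'handover-command', 'target': 'cell-7'}
```

Nothing in this description says how many functions share a host, a container, or a process; that’s exactly the deployment freedom the box-and-interface diagram takes away.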

What about the software concept of an API? APIs really describe the protocol and format of the event/message exchange. But when we show a box with three or four connecting lines that are described by interfaces, are we saying that this is a software component that exposes those APIs? Hopefully not, because we should be talking about a set of software functions, each having an API that represents one of those lines.

You can dismiss all of this as software-guy ranting against hardware-people-think, but that’s not my point. The cloud has capabilities and benefits inherent in the cloud model. We can realize those if we adhere to that model, but if we constrain our software to act like boxes, if we create software that’s a set of virtual network functions that map 1:1 to boxes (physical network functions), then we are not building the cloud model. In that case, pushing the VNFs into containers or deploying them with Kubernetes doesn’t create a cloud implementation, and we can forget creating the kind of infrastructure everyone says they want and need.