Why “Separate the Control Plane?”

What does separating the IP control plane really do?  It’s been a kind of mantra for a decade that a separation of the control and data planes of an IP network creates a beneficial outcome.  I’ve had a lot of discussions with vendors and network operators on this topic, and there’s a surprising variability of viewpoint on something that seems so widely accepted.  It’s probably time to dig down and explore the issue, which means we have to take a look at what the control plane really does.  Hint: The answer may be “it depends.”

All networks have a fundamental responsibility to turn connections into routes.  A route is a pathway that, if followed, delivers packets to some destination.  This is a bit more complicated than it seems, because router forwarding tables only define the next hop.  In order to build a route, there have to be two things: a knowledge of network topology, including who can be reached where, and a means of organizing the hops into routes.

This can be a complex process, but the CliffsNotes version is simple (please don’t point out the exceptions; it would take a book to describe everything!).  Each router in an IP network advertises what addresses it can “reach” to adjacent routers, which then pass the advertisement along to their own adjacent routers.  Routers, in some way (hop count, link state), decide which advertisements of reachability are “best” for them, and they list the winning advertiser in their routing table, making it the next hop.
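The advertise-and-select process above can be sketched in a few lines of Python, roughly in the spirit of a distance-vector protocol.  The function and router names here are purely illustrative, not any vendor’s API; real protocols add timers, poisoning, and much more:

```python
def best_routes(neighbor_advertisements):
    """Pick the lowest-hop-count advertiser for each destination.

    neighbor_advertisements maps a neighbor's name to the routes it
    advertises: {destination_prefix: hop_count}.  Re-running this after
    a neighbor disappears (i.e., with updated advertisements) is, in
    miniature, what "convergence" means.
    """
    table = {}  # destination -> (next_hop, hop_count)
    for neighbor, routes in neighbor_advertisements.items():
        for destination, hops in routes.items():
            cost = hops + 1  # one extra hop to reach the neighbor itself
            if destination not in table or cost < table[destination][1]:
                table[destination] = (neighbor, cost)
    return table

adverts = {
    "router-a": {"10.0.0.0/8": 2, "192.168.1.0/24": 1},
    "router-b": {"10.0.0.0/8": 1},
}
print(best_routes(adverts))
# router-b wins for 10.0.0.0/8 (2 hops vs. 3); router-a for 192.168.1.0/24
```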

When something breaks (a trunk or a router), the result is that some routes are broken.  The advertising process, which is ongoing, will then define a different hop for routers adjacent to the failure, and a set of new routes will be created.  This process is sometimes called “convergence” because it means that all routers impacted by the fault have to agree on new routes, or there’s a risk of packets falling into space.

The actual process of forwarding packets is simple by comparison.  A packet is received, the destination IP address is looked up in a routing table, and the packet is forwarded to the next hop identified in that table.  It’s so simple that you can reduce the whole thing to silicon, which has been done by many vendors already.
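To make the simplicity concrete, here’s a toy forwarding step: a longest-prefix match against a static table, then the packet goes to the winning next hop.  The table contents and hop names are invented for illustration; hardware does this with TCAMs rather than loops:

```python
import ipaddress

FORWARDING_TABLE = {
    "10.0.0.0/8": "hop-1",
    "10.1.0.0/16": "hop-2",
    "0.0.0.0/0": "default-gw",  # default route matches everything
}

def next_hop(destination_ip):
    """Longest-prefix match: the most specific matching entry wins."""
    dest = ipaddress.ip_address(destination_ip)
    best = None
    for prefix, hop in FORWARDING_TABLE.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1]

print(next_hop("10.1.2.3"))  # hop-2: the /16 beats the /8
print(next_hop("8.8.8.8"))   # default-gw: only the default route matches
```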

The argument for a “separate control plane” starts with this difference in complexity.  Data forwarding is a drudge job, and route determination is an egghead task that seems to get more complicated every day.  Most router vendors have separated the control plane and data plane within their devices for this reason.  Over time, control-plane processes have become more “computer-like” and data-plane processes more “siliconized”.

Suppose now that we were to take the control-plane processes out of the box completely and put them in a separate box with a direct 1:1 connection to the data-plane device?  That was proposed over a decade ago (by Juniper, and perhaps others).  We could then size the control-plane device based on the actual compute load.  We’d still need a high-speed connection to the data-plane device because control packets in an IP network are in-band with the data flow.  Even more separation with this model, right?

OK, now we can look at something other than the 1:1 device relationship.  Might a single control-plane processor manage the control-plane packets for multiple data-plane devices?  Subject to load calculations and connection performance between them, why not?  Similarly, could we visualize the control-plane device as being “virtual”, a collection of resources that cooperate to manage the routing?  We could combine the two as well, creating a “cluster”.  DriveNets, which recently jumped to over a billion-dollar valuation with its latest funding round, uses a cluster model.

The interesting thing here is that there’s a lot of variability in how these many-to-many clusters could be constructed.  A virtual node could be a re-aggregation of a bunch of disaggregated functions.  White boxes don’t come in an unlimited variety of configurations, so today we’re seeing a cluster limited by current hardware.  As we diddle with cluster-creating options, might we find other functions that could be solidified into a white-box model, and thus advance the richness of the configuration overall?  I think we could, and one thing that could drive that hardware advance is an enhanced mission.

We started our discussion with a simple view of a router network.  All of our speculations so far presume that we have the same control-packet flows in our separate-control-plane frameworks as we did with classic routing.  Suppose we now think about tweaking the behavior of the control plane itself?  If a “cluster” is the node, the virtual router, then what are its boundaries, and what might be different on the inside versus at the edge?

SDN took the cluster model of a bunch of data-plane “forwarding devices” and combined it with a centralized control plane implementation.  One controller to rule them all, so to speak.  This creates an interesting situation, because that one controller now has centralized end-to-end, all-device, all-route knowledge of the network.  There’s no need to have a bunch of hop-by-hop adaptive exchanges of topology because that master controller has everything inside it.
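With the whole topology sitting in one controller, route determination stops being a distributed negotiation and becomes a straightforward graph computation.  A minimal sketch of that idea, using Dijkstra’s shortest-path algorithm over an invented topology (the node names and link costs are assumptions for illustration):

```python
import heapq

# The controller's global view: adjacency list with link costs.
TOPOLOGY = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 1, "d": 5},
    "c": {"a": 4, "b": 1, "d": 1},
    "d": {"b": 5, "c": 1},
}

def shortest_path(src, dst):
    """Dijkstra over the central topology; returns the node list for the route."""
    queue = [(0, src, [src])]  # (cost so far, current node, path taken)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, link_cost in TOPOLOGY[node].items():
            if neighbor not in seen:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None  # unreachable

print(shortest_path("a", "d"))  # ['a', 'b', 'c', 'd']: cost 3 beats a-c-d at 5
```

The contrast with the hop-by-hop model is the point: no advertisements, no convergence delay in the protocol sense, just a recomputation whenever the topology database changes.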

SDN is a special case of what a few pundits have proposed, which is that IP could benefit from end-to-end control-plane visibility.  Suppose we have “edge interfaces” where real IP-native behavior has to be presented, with both control and data packets.  Also suppose we have “trunk interfaces” where we only forward the end-to-end stuff.  The edge interfaces feed what’s likely a logically centralized but physically distributed control-plane process.  Now every port is a port on what’s essentially a giant virtual router.  A device is simply a place to collect interfaces, because everything operates with central knowledge.

What, exactly, would a control plane like this look like?  There are probably a lot of options, including the ONF SDN controller option, but let’s look at what we know to be true about any such approach.

First, the data path between the edge interfaces and the control plane has to be able to handle the control-plane traffic exchanged there.  How much traffic would actually have to pass would depend on how powerful the edge interface was with respect to proxying the expected control-plane traffic, and how complex the control plane was at that edge point.

Second, this first point leads to a conclusion that the separated control plane is really a kind of hierarchy.  There’s an edge-interface process.  There’s probably a local, perhaps per-site, process that knows all about conditions at a given facility where multiple trunks terminate.  There’s a central process that knows everything, perhaps rooted in a highly available database.  Likely the things that were response-time-sensitive would be close to the edge, and those not so sensitive would be hosted deeper in.

Third, if forwarding is controlled by simply manipulating forwarding tables, then anything that’s based on forwarding control could be co-equal with the IP control plane.  I would submit that this is why it’s reasonable to expect that centralized control planes would eventually combine, whatever their source, and so the IP and 5G, or CDN, or EPC, control planes would all combine, and all determine routes by simply updating those forwarding tables.
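If the forwarding table really is the common currency, the combination of control planes reduces to everyone writing entries through one interface.  A minimal sketch of that shared abstraction, with entirely hypothetical owner and next-hop names:

```python
class ForwardingTable:
    """One table, many masters: any control plane installs entries the same way."""

    def __init__(self):
        self.entries = {}  # prefix -> (next_hop, owning control plane)

    def install(self, owner, prefix, next_hop):
        self.entries[prefix] = (next_hop, owner)

table = ForwardingTable()
table.install("ip-routing", "10.0.0.0/8", "port-3")      # classic route
table.install("5g-upf", "10.5.0.0/16", "gnb-7")          # 5G user-plane steering
table.install("cdn", "203.0.113.10/32", "cache-2")       # CDN request routing
```

The design point is that the IP, 5G, and CDN “control planes” here differ only in the policy that decides what to install, not in the mechanism, which is the co-equality argument in code form.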

I think that if you look at 5G Open RAN, the ONF SDN approach, DriveNets’ success, and the historical trend in control-plane separation, it’s inevitable that this creates a model much like I’m describing.  It’s less capital intensive, it creates a less complex overall network, it improves operations, it integrates all the stuff that’s now threatening to create silos…the list goes on.  The question is how it will come about.  Will 5G “open” initiatives recognize the benefits, will Open RAN promoters specifically address the idea, could white-box vendors, network operators, standards groups (like the ONF), or “disaggregation” vendors (like DriveNets) do the job?

DriveNets is closer to the brass ring from an implementation perspective, for sure.  The ONF’s programmable network concept is a contender, but at a distance that’s created by the inertia of the standards process and the necessary loss of differentiation that implementing a standard creates.  The Open RAN initiative is the real contender.  The concept of the Open RAN control plane in general, and the RAN Intelligent Controller (RIC) in particular, could be a jumping-off point to incorporating some of the IP control plane features into (in effect) the 5G control plane.  DriveNets eats routing from below, and has taken some real steps in the right direction. Open RAN eats it from above, and 5G has a lot of budget behind it that Open RAN could co-opt.

If consolidation of all control planes is the goal, I think we can rule out standards as a source of progress—it takes too long.  Operators tend to find refuge in standardization when they want something (though AT&T has been pretty bold), so they can probably be ruled out too.  That leaves the “open” and “disaggregation” initiatives to carry the water in the matter.  I suspect that the value of all of this will be clear by the end of the year, so it’s likely we’ll know by then just who will be leading this approach, or if the whole idea is hung up on tactical concerns and unlikely to be implemented despite my prediction.  That would be a shame because I think the benefits to the network community are significant.