Integration Woes and Complexity in 5G and Beyond

If ever there was a clear picture of the open 5G challenge, this Light Reading piece may provide it.  The chart the article offers shows the things that Dish had to integrate in order to build an open-model 5G network.  It doesn’t look like a lot of fun, and there’s been a fair amount of industry skepticism generated.  The old saw, “A camel is a horse designed by a committee” surely comes to mind, and that raises the question of how to avoid camels in an open-model network future.

It’s not just 5G we have to worry about.  The issues that created a table of elements for 5G have done similar things in other areas.  We need to understand why the problem is so pervasive, and what might be done about it.

Networking is perhaps the only technology space that can define ten products based entirely on open standards, and still find that there are almost no interfaces in common.  This sad truth arises from the fact that network transformation projects tend to be very narrow in scope, which means that they have to fit in with a bunch of similar projects to add up to a complete definition of a network.  Why is that, and what (if anything) can be done about it?

I recall a meeting of the NFV ISG in the spring of 2013, where the discussion of the scope of the project included the statement that it was important to limit the scope to the basic question of how you deployed and managed virtual functions as software elements, leaving the question of broader management of the network out of scope.   This was done, it was said, to ensure that the task would be completed within 2 years.  Obviously, it wasn’t completed in that time, and obviously the question of how a virtualized set of functions was managed overall was important.  We didn’t get what we needed, and we didn’t meet the timing goals either.

The decision to make the functional management of a virtual-function network out of scope wasn’t actually a stupid one, if viewed in the proper context.  The difficulty was, and still is, that these kinds of decisions are rarely put in any real context at all.  Without the context, there’s plenty of room for a non-stupid concept to be a bad one nevertheless.

We were, at that moment in 2013, managing real devices, meaning what the NFV ISG called “physical network functions” or PNFs.  The presumption that was inherent in the views of that meeting was that it was critical to preserve, intact, the relationship between the management system and the PNFs when the PNFs were virtualized.  Since the process of deploying and connecting virtual functions would clearly not have been part of the management of PNFs, that was what the NFV ISG ruled as its mission.  However, that decision created a second layer of management, adding to the number of things that had to be present and integrated for NFV to work.

Networks are collections of stuff (the name, in fact, implies that).  Each of the elements of the network stuff is subject to review in terms of whether it provides optimum cost/benefit for the network operator.  This means that there’s a strong tendency to look at stuff-change piecemeal, and to contain the spread of impact to protect the investment in technology areas where there’s no particular need for a change.  That’s the likely foundation of the thinking that led to NFV as it is, and the same pattern of thinking prevails in all areas of network transformation today.  That means that even things like 5G and Open RAN are, to a degree, victims of PNF-think or scope containment.

Does that mean that we’re so locked into the past that we can’t transform at all?  Is transformation possible if we do nothing but reshape the nature of the PNFs, the boxes, without changing anything else?  Those are really subsets of the big question, which is whether we can transform networks by organizing different forms of the same thing, in the same ways as before.  Or, do we have to rethink how networks can work differently within, to open new possibilities and new opportunities?  We need to create an agile model of the new network, somehow, so that scope gaps aren’t inevitable, so that we can fit things that belong together into a common model.

SDN was inherently more transformational than NFV.  Why?  Because SDN proposed that an IP network that looked like the IP networks of old at the point of external interface, but that worked differently and (in some ways) more efficiently within, was the best answer.  The black-box theory, in action.  What’s inside the box doesn’t matter, only its external properties.

Despite SDN’s natural transformation potential, though, it hasn’t transformed the WAN.  The reason, I think, is that SDN illustrates the problem of true transformation in networking, which is that spread of impact.  I build a network that’s not based on native IP within itself, but I have to interface with stuff that thinks it is native IP.  Thus, I have to build native IP behavior at the edge, which is a new task.  I can’t manage the SDN flow switches and controllers using IP device management tools because they aren’t IP devices, so I have to emulate management interfaces somehow.  More scope, more pieces needed to create a complete network, more integration…you get the picture.

Scope issues have left us with an awful choice.  We can either do something scope-contained, and produce something that doesn’t change enough things to be transformational, or we can do something scope-expanding, and never get around to doing all the new things we now need.

There’s been a solution to this scope problem for a long time, illustrated by Google’s Andromeda and B4 backbone.  You wrap SDN in a BGP black box, treat it as perhaps a super-BGP-router with invisible internal SDN management, and you’re now an IP core.  That’s a great strategy for a revolutionary company with ample technical skills and money, but not perhaps as great for network operators with constrained skill sets and enormous sunk costs.

This is where I think the ONF is heading with its own SDN and SDN-related initiatives.  We actually have tools and software that can create a thin veneer of IP over what’s little more than a forwarding network, something like SDN.  We can compose interfaces today, with the right stuff, just as Google did.  The ONF doesn’t include the capability now, but might they?

DriveNets, known mostly for the fact that their software (running on white boxes) is the core of AT&T’s network, is perhaps the dawn of another approach.  Rather than applying SDN principles inside a network-wide black box, why not apply at least some of (and perhaps a growing number of) those principles to building a kind of composite device, a cluster of white boxes that looks and works like a unified device?

SDN separates the control and data planes, and it’s this separation that lets it work with composed interfaces, because it’s control-plane packets that make an IP interface unique, that give it the features not just of a network but of an IP network, a router network.  DriveNets does the same thing, separating the cluster control plane.  It makes a bunch of white boxes in a cluster look like a router, so we could say that the SDN model is going from the outside or top, downward/inward, and the DriveNets model is moving from inside to…where?
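To make the separation concrete, here is a minimal sketch in Python of forwarding devices that hold nothing but a table, with a single control-plane object that knows the topology and writes those tables.  The names (Forwarder, Controller, install_routes, trace) and the four-node topology are hypothetical, invented for illustration; this is not how the ONF’s or DriveNets’ software is actually built, just a toy instance of the idea.

```python
from collections import deque


class Forwarder:
    """A forwarding-only device: it holds a table and has no routing logic."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # destination name -> next-hop device name

    def forward(self, destination):
        # Pure lookup; returns None when the controller installed no entry.
        return self.table.get(destination)


class Controller:
    """Hypothetical central control plane: it alone knows the topology."""

    def __init__(self, links):
        self.neighbors = {}
        for a, b in links:
            self.neighbors.setdefault(a, set()).add(b)
            self.neighbors.setdefault(b, set()).add(a)
        self.devices = {name: Forwarder(name) for name in self.neighbors}

    def install_routes(self, destination):
        """Breadth-first search outward from the destination, then write
        one next-hop entry into every device that can reach it."""
        next_hop = {destination: None}
        queue = deque([destination])
        while queue:
            node = queue.popleft()
            for neighbor in sorted(self.neighbors[node]):
                if neighbor not in next_hop:
                    next_hop[neighbor] = node
                    queue.append(neighbor)
        for name, hop in next_hop.items():
            if hop is not None:
                self.devices[name].table[destination] = hop

    def trace(self, source, destination):
        """Follow the installed tables hop by hop, as a packet would."""
        path = [source]
        while path[-1] != destination:
            hop = self.devices[path[-1]].forward(destination)
            if hop is None or len(path) > len(self.devices):
                return None  # no entry, or a loop in the tables
            path.append(hop)
        return path


if __name__ == "__main__":
    ctl = Controller([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])
    ctl.install_routes("C")
    print(ctl.trace("A", "C"))
```

The point of the sketch is that nothing in Forwarder would have to change if the controller decided to compute paths differently; the behavior that makes this an “IP network” (or something else) lives entirely in the control plane.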

The reason this discussion is important to our opening topic, which is the 5G challenge, is that 5G standards, Open RAN, and other related stuff are making the same sort of mistake SDN and NFV made, which is creating a scope boundary that they don’t resolve.  We have 5G “control plane” and “user plane”, and the 5G user plane is the whole of an IP network.  We have interfaces like N4 that go to a User Plane Function (UPF), which ultimately depends on IP transport, but the UPF isn’t a standard IP network element or feature.  5G’s control plane is an overlay on IP, almost like a new protocol layer, because that’s pretty much what it’s trying to be.  It handles mobility management by creating tunnels that can follow a user from cell to cell, to get around IP’s address-to-location, who-it-is/where-it-is dilemma.

Why not get around that by composing the capability via a separate control plane?  Today it’s hard to introduce things like segment routing into IP because you have to update all the devices.  If there is no knowledge of the control plane within the forwarding devices, if all control-plane functions lived in the cloud and controlled forwarding, couldn’t things like tunnels and segments be composed in instead of being layered on?

Separating the IP control and data planes is fundamental to SDN, and also to DriveNets’ software.  It lets you build up a network from specialized devices for forwarding and other devices for hosting the control plane elements, which means that you can do a “cloud control plane”, something that I think the ONF intends but hasn’t yet done, and something DriveNets has definitely done.  But the separate control plane also lets you compose interface behaviors like Google’s Andromeda and B4, and it would also be a doorway to creating network-as-a-service implementations of stuff like the 5G user plane elements.

If you can create a forwarding path from “A” to “B” at will by controlling the forwarding devices’ tables, do you need to build a tunnel the way both 4G/EPC and 5G UPF do?  If you can compose interfaces, couldn’t you compose the N4 interface (for example) and present it directly from your transformed IP network to the 5G control plane?  Might you not even be able to host the entire 5G control plane and user plane within the same cloud?
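As a rough illustration of the first question, the sketch below (Python, with hypothetical names throughout: program_path, move_user, deliver, and devices called core, agg1, cellA, cellB) treats mobility as nothing more than rewriting a few forwarding entries when a user changes cells.  It assumes a controller that can write device tables directly, and it ignores everything a real mobile network needs beyond that (signaling, QoS, charging, handover timing), so read it as a thought experiment rather than a design.

```python
def program_path(tables, path, user):
    """Install entries for `user` on each device along `path`.
    The last device delivers locally instead of forwarding onward."""
    for device, next_hop in zip(path, path[1:]):
        tables.setdefault(device, {})[user] = next_hop
    tables.setdefault(path[-1], {})[user] = "local"


def move_user(tables, old_path, new_path, user):
    """Re-point `user` from old_path to new_path.
    Returns the set of devices whose tables actually changed."""
    touched = set()
    # Remove stale entries from devices that are no longer on the path.
    for device in old_path:
        if device not in new_path and user in tables.get(device, {}):
            del tables[device][user]
            touched.add(device)
    # Remember what the new-path devices held, then program the new path.
    before = {d: tables.get(d, {}).get(user) for d in new_path}
    program_path(tables, new_path, user)
    for device in new_path:
        if tables[device][user] != before[device]:
            touched.add(device)
    return touched


def deliver(tables, start, user):
    """Follow the tables from `start`; return the hops taken, or None."""
    hops = [start]
    while len(hops) <= len(tables):
        next_hop = tables.get(hops[-1], {}).get(user)
        if next_hop is None:
            return None
        if next_hop == "local":
            return hops
        hops.append(next_hop)
    return None  # guard against a forwarding loop


if __name__ == "__main__":
    tables = {}
    program_path(tables, ["core", "agg1", "cellA"], "ue-1")
    print(deliver(tables, "core", "ue-1"))
    changed = move_user(tables, ["core", "agg1", "cellA"],
                        ["core", "agg1", "cellB"], "ue-1")
    print(sorted(changed), deliver(tables, "core", "ue-1"))
```

In this toy case the move touches only the devices at and below the point where the old and new paths diverge; the core’s entry is untouched, and no tunnel header is ever added to the traffic.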

I think the answer to layer complexity, scope complexity, integration complexity, and management complexity in 5G and all transformed networks is composability of the control plane.  Yes, it raises the question of whether this sort of capability could promote proprietary implementations, but if we want to create open yet composable specifications, we can work on that problem explicitly.  That at least creates a plausible if not certain path to a solution.  I submit that just multiplying elements, layers, and pieces of a solution doesn’t lead us to anywhere we’re going to like.