Big Routers or Big Routing?

Could there be more to cloud-based routing than hosting router instances?  One company, DriveNets, thinks there might be, and Light Reading reports on how “disaggregated” architectures that run software instances on white boxes could create a whole new model of network routing.  But is that really what’s happening here, and is there perhaps a different disaggregated vision of routing that might be the brass ring in this whole cloud-router circus?

If you want to talk about router evolution, you have to start from the protoplasm level, from the hardware routers we have now.  A “router” is a network node that supports multiple ports, and has a forwarding table to guide input packets to output trunks based on a combination of network topology and the points of attachment for real user destinations.  In a traditional router, the capacity of the router is based on what’s often called the “backplane” capacity, which is the speed of the internal data bus that connects all the port interface cards.
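To make the forwarding-table idea concrete, here is a minimal sketch of the lookup a router performs on each packet.  It assumes longest-prefix matching over IP prefixes, which is the usual IP convention but isn't spelled out above, and the `ForwardingTable` class and the trunk names are hypothetical, invented purely for illustration.

```python
import ipaddress


class ForwardingTable:
    """Toy forwarding table: maps IP prefixes to output trunks (hypothetical)."""

    def __init__(self):
        self._routes = []  # list of (network, out_trunk) pairs

    def add_route(self, prefix, out_trunk):
        self._routes.append((ipaddress.ip_network(prefix), out_trunk))

    def lookup(self, dst):
        """Return the trunk for the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(dst)
        best = None
        for net, trunk in self._routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, trunk)
        return best[1] if best else None


table = ForwardingTable()
table.add_route("10.0.0.0/8", "trunk-east")      # illustrative names only
table.add_route("10.1.0.0/16", "trunk-west")
table.add_route("0.0.0.0/0", "trunk-default")
```

A real router does this lookup in specialized hardware rather than a linear scan, but the logic is the same: the destination address selects the output trunk.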

Over time, as pressure to support higher speeds has grown, it’s become increasingly difficult to build backplanes fast enough to carry all the traffic the ports could collect.  That’s resulted in a different model of connection, what might be called a “cluster” router.  Think of a cluster router as a router built around a “fabric” switch, something that can provide any-to-any connectivity.  Good fabrics today are “non-blocking”, meaning that they don’t interfere with traffic movement no matter how many ports are active.

We’ve had this cluster model of routing for a decade, and it’s at least a loose version of “disaggregated” routing, but I think the real question is whether “cloud routing” has to be more like a cloud than like the kind of cluster we had before the cloud even came along.  Could we actually disperse a cluster, meaning extend the fabric, or should the path to cloud routing be more virtualized?

The obvious problem with extending a cluster is that even today we don’t have network links fast enough to serve as a distributed backplane or fabric.  There are standards (InfiniBand is an example) to define very fast connections, but they’re local.  In any event, to create a non-blocking fabric that’s truly distributed you’d need to have a mesh of very high-speed connections, and if that were practical you might as well just mesh-connect the edges of the network.  Actual cluster dispersal, in other words, isn’t likely a practical answer.

Which leaves us with virtualization.  We have virtual routers today, hosted instances of routing (or more general forwarding) technology.  We can build networks from them, but here we need to take care not to fall into what I’ll call the “NFV trap”, which is mapping virtual to physical so tightly that we end up constraining the use of virtual instances in the same way that physical devices were constrained.

We can put routers where we think we need them.  We can even move them around if, over time, we determine that we need a different configuration.  What we can’t do is create a nodal topology based on short-term traffic requirements.  At least, we can’t do that with physical devices.  We could easily do that in our virtual world, and that’s what I think we have to look at when we talk about “cloud networks”.

At any moment in time, in every network, there’s a series of traffic flows happening, exchanges among the users (human and otherwise) of the network.  These flows would have natural points of concentration where those users were concentrated, and if relationships among certain sets of users were more likely than others, those relationships would also create concentrations of traffic.  The most obvious concentrating factors, though, are the transport paths available.  Unless you presume we can beam traffic over the air, we need to have what will in nearly all cases be optical trunks at the bottom of our connectivity.  Those trunks are point-to-point, joined into complex topologies by nodes that might either provide agile optical switching (reconfigurable add-drop multiplexers or ROADMs) or electrical switching of the optical payload of packets.  Traffic has to go where the trunks go, and so it has to get to specific trunk on-ramps and exit at off-ramps.

What “routers” do in this situation is provide that nodal payload switching.  We put a router at a trunk junction and it decides what path an incoming packet should take, collecting and distributing packets.  What that means is that a router network is in itself a “cloud router”.  In an abstraction sense, a network of routers would look, from the edge looking in, like a big single virtual router.

One obvious example of what might be inside our big virtual router is SDN.  Conceptually, SDN was (and sort of still is) supposed to be a centrally controlled forwarding process, which implies that it replaced adaptive behavior with controlled behavior.  However, if an SDN device is truly a general-purpose forwarding engine, it doesn’t have to behave like a router, as long as what’s at the edge of our virtual-router abstraction does present router behavior to the outside world.

Suppose that we presumed that SDN created, inside the virtual router, nothing but forwarding paths, with no control exchanges at all.  The topology of this interior structure would instead be set by the central (or distributed hierarchical) controller, and each of the edge elements would be provided with a forwarding table that linked an interior path to a set of IP subnets reachable through that path.  Packets sent along those paths would then arrive at the far-side edge element, and from there be distributed in more-or-less the usual IP way.

The interior paths would traverse a structure of forwarding nodes, but in the event of a problem or simply in response to traffic changes, those interior paths could be changed to traverse a different node set, be expanded in capacity or shrunk, etc.  The capacity of the network, as measured at the edge by summing the bidirectional in/out traffic, could be enormous, far beyond what a single node could possibly carry.
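Here is a minimal sketch of that split between edge and interior, under some loud assumptions: the `InteriorController` and `EdgeElement` classes, the path IDs, and the node names are all hypothetical, and a real SDN controller would push state to devices rather than share a Python dictionary.  The point it illustrates is that the edge binds subnets to a path ID, while the controller alone decides which nodes that path traverses.

```python
import ipaddress


class InteriorController:
    """Hypothetical central controller: owns the node sequence behind each path ID."""

    def __init__(self):
        self.paths = {}  # path_id -> ordered list of forwarding-node names

    def set_path(self, path_id, nodes):
        self.paths[path_id] = list(nodes)


class EdgeElement:
    """Hypothetical edge element: binds IP subnets to interior path IDs."""

    def __init__(self, controller):
        self.controller = controller
        self.table = []  # list of (network, path_id) pairs

    def bind(self, prefix, path_id):
        self.table.append((ipaddress.ip_network(prefix), path_id))

    def path_for(self, dst):
        """Pick the path bound to the most specific subnet containing dst."""
        addr = ipaddress.ip_address(dst)
        matches = [(net.prefixlen, pid) for net, pid in self.table if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0])[1]

    def nodes_for(self, dst):
        """Resolve dst to the interior nodes its path currently traverses."""
        pid = self.path_for(dst)
        return self.controller.paths.get(pid) if pid is not None else None


controller = InteriorController()
edge = EdgeElement(controller)
controller.set_path("path-A", ["n1", "n2", "n3"])
edge.bind("198.51.100.0/24", "path-A")

before = edge.nodes_for("198.51.100.7")
# Node n2 fails or gets congested: the controller re-pins the path.
controller.set_path("path-A", ["n1", "n4", "n3"])
after = edge.nodes_for("198.51.100.7")
```

Notice that the reroute never touches the edge table.  The subnet-to-path binding is stable, and only the interior realization of the path changes.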

Something similar to this is already done by some SD-WAN implementations.  SD-WANs would often touch the corporate VPN and perhaps even public clouds in multiple places.  SD-WAN traffic would thus have multiple paths to reach a given destination, and if the SD-WAN kept tabs on QoS for each possible connection option, it could adapt to maximize capacity for a given set of conditions.
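As a rough sketch of that kind of QoS-aware choice, assume each candidate connection reports latency, loss, and spare capacity.  The `PathStats` fields, the thresholds, and the "most spare capacity wins" rule are assumptions of mine for illustration, not a description of any particular SD-WAN product.

```python
from dataclasses import dataclass


@dataclass
class PathStats:
    """Hypothetical per-path measurements an SD-WAN edge might keep."""
    name: str
    latency_ms: float
    loss_pct: float
    free_mbps: float


def pick_path(paths, max_latency_ms, max_loss_pct):
    """Among paths meeting the QoS limits, pick the one with the most spare capacity."""
    usable = [
        p for p in paths
        if p.latency_ms <= max_latency_ms and p.loss_pct <= max_loss_pct
    ]
    if not usable:
        return None  # nothing meets the QoS target
    return max(usable, key=lambda p: p.free_mbps)


candidates = [
    PathStats("mpls-vpn", latency_ms=20.0, loss_pct=0.1, free_mbps=50.0),
    PathStats("broadband", latency_ms=35.0, loss_pct=0.5, free_mbps=400.0),
    PathStats("lte-backup", latency_ms=90.0, loss_pct=2.0, free_mbps=30.0),
]
choice = pick_path(candidates, max_latency_ms=50.0, max_loss_pct=1.0)
```

With these made-up numbers, the broadband path wins because it meets both limits and has the most headroom.  Tighten the latency limit to 25 ms and the choice shifts to the VPN path.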

Does this mean that we’ve had our goal in our hands all along?  Obviously not, if we still think we need more “disaggregation”.  What this preliminary look at routing does is show us where we’d have to look if we wanted true disaggregated cloud routing that was better than what we have now.  We’d have to look at what’s inside that big single virtual router, and try to optimize what we found.  The best way to do that is to return to the one thing a hosted instance can be that a physical router can’t: traffic-tactical, meaning placed in response to short-term traffic requirements.

With virtual devices that can be spun up in servers or distributed white-boxes-in-waiting, it could be possible to create new nodes and not just new topologies.  Remember that fiber trunks would likely be only partially committed to any given network service, but they could all be available.  Operators could have wholesale carriage agreements with each other to augment their own capacity.  These fiber trunks would terminate in specific locations, of course, and in those locations the availability of dynamic node-hosting capacity would mean that a node could be spun up here and there to exploit additional fiber.  That would happen where traffic conditions or network faults created an issue that couldn’t be resolved by rejiggering the old capacity with a different topology.
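To show what a node-placement decision like that might look like, here is a deliberately simplified sketch.  Everything in it is an assumption of mine: the `Trunk` record, the 80% utilization threshold, and the rule that a new node is worth spinning up at a hosting site only when idle fiber connects that site to both ends of an overloaded trunk.  It also assumes every trunk has a positive capacity.

```python
from dataclasses import dataclass


@dataclass
class Trunk:
    """Hypothetical fiber trunk between two sites."""
    a: str
    b: str
    capacity_gbps: float
    load_gbps: float = 0.0
    active: bool = True  # False means available fiber not yet in use


def plan_new_nodes(trunks, hosting_sites, threshold=0.8):
    """For each overloaded active trunk, find a hosting site that idle
    trunks connect to both of its ends.  Returns (site, end_a, end_b) tuples."""
    idle = {frozenset((t.a, t.b)) for t in trunks if not t.active}
    plan = []
    for t in trunks:
        if not t.active or t.load_gbps / t.capacity_gbps <= threshold:
            continue  # only overloaded in-service trunks need relief
        for site in sorted(hosting_sites):
            if site in (t.a, t.b):
                continue
            if frozenset((t.a, site)) in idle and frozenset((site, t.b)) in idle:
                plan.append((site, t.a, t.b))
                break  # one relief node per overloaded trunk
    return plan


trunks = [
    Trunk("A", "B", capacity_gbps=100.0, load_gbps=90.0),
    Trunk("A", "C", capacity_gbps=100.0, active=False),
    Trunk("C", "B", capacity_gbps=100.0, active=False),
]
plan = plan_new_nodes(trunks, hosting_sites={"C"})
```

Here the A-to-B trunk is running at 90%, and site C has both hosting capacity and idle fiber to A and B, so the plan is to spin up a node at C.  A real controller would also have to weigh cost, wholesale terms, and how long the condition is likely to last.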

What this describes is what could be called “big routing” versus “big routers”.  A single “instance” or “device” or “cluster” is sized based on its total offered traffic.  A lot of aggregation in networking will generate a lot of traffic and require a higher capacity where the aggregation has focused, which is what got us to where we are in cluster routing.  However, if we assume a more edge-driven future, a lack of single points of massive traffic concentration, we don’t need big routers, we need big collective routing.

Assume a set of sites, linked with fiber and supplied with facilities to host virtual SDN nodes and virtual edge routers.  Assume a central control point with the facilities to collect traffic information and load status for everything.  Assume that this control point can then spin up new nodes, harnessing new trunks, and reshape the interior topology of our virtual router.  Assume, in short, what cloud-distributed and truly disaggregated router technology would look like.  Why stop short of the real goal, the real solution?  It’s right in front of us.