Are Operators Still Looking at Virtual Routers?

What is the value of hosted routing instances?  This is a question that’s been asked for a full decade now, and operators’ views have shifted back and forth.  It’s also true that when operator technology planners are asked, they often give a knee-jerk answer, then walk it back on reflection.  Finally, many are coming to believe that open-model devices, specifically white-box routers running open-source routing software, will obviate most of the need for hosting router instances in software.  For quite some time, operators have wrestled with the idea of building their networks more cheaply, using a variety of approaches.  What are they thinking now, and might something actually get done?

If we go back about a decade, we find that almost half of Tier One operators conducted at least one experiment with router software hosted on servers.  Initially, the goal was to overcome what operators felt was increasingly predatory pricing from the big network vendors.  The business driver was the first of several studies showing that revenue per bit was dropping faster than cost per bit, which meant that future investments might have negative ROI.  Cost-cutting was in order.  At least a couple of Tier Ones went so far as to license a significant number of instances in the early 2010s.  None really carried the trials forward.

Part of the issue here was one of application.  Nobody believed they could replace larger core routers with software and servers; instead, operators were interested in using software routers as edge routers, both for general consumer Internet and for business VPNs.  In the former mission, they quickly realized that specialized devices were more reliable and scalable, so it was the corporate mission that remained interesting.  The problem in that mission was largely one of availability and reliability.  Servers broke more often than proprietary routers did, and the higher failure rate compromised customer experience and account control.

This early experience is likely why the Network Functions Virtualization initiative, launched by a paper in 2012, focused on network appliances other than switches and routers.  The NFV initiative fell short of its goals, in part because those goals evolved over time, shaped in large part by vendor support.  One non-vendor issue, though, was that limiting virtual-feature hosting to what were essentially control or specialized-feature roles didn’t address enough cost.  Over time, NFV tended to create a model for universal CPE rather than for hosted software features at the network level.

SDN, which came along (in its ONF OpenFlow version) at just about the same time as NFV, seemed initially to be more of a competitor for router devices than an application of feature hosting.  However, some operators now wonder whether SDN might actually have a role in another attempt to create router networks without proprietary devices or white boxes.

SDN builds connectivity by deploying “flow switches” whose forwarding tables are controlled from a central software element.  You can visualize an SDN network as a router network whose control plane has been separated and centralized, which is exactly what the originators of the concept meant.  The thing is, that central control has an implication in the router-instance debate.
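To make that concrete, here’s a minimal sketch of the division of labor, in Python with hypothetical names (a real deployment would speak OpenFlow or P4Runtime rather than calling methods directly): the switch holds only a match/action table, and every forwarding decision it will ever make is installed by the controller.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowRule:
    """One match/action entry in a flow switch's forwarding table."""
    match_dst: str   # destination prefix to match, e.g. "10.1.0.0/16"
    out_port: int    # port to forward matching packets on
    priority: int    # higher priority wins when matches overlap

class FlowSwitch:
    """A switch with no routing logic of its own; it only applies rules."""
    def __init__(self, name: str):
        self.name = name
        self.table: list[FlowRule] = []

    def install(self, rule: FlowRule) -> None:
        self.table.append(rule)
        self.table.sort(key=lambda r: r.priority, reverse=True)

class CentralController:
    """The separated, centralized control plane: it decides, switches obey."""
    def __init__(self, switches: dict[str, FlowSwitch]):
        self.switches = switches

    def push(self, switch_name: str, rule: FlowRule) -> None:
        self.switches[switch_name].install(rule)
```

Notice that the switch never exchanges routing-protocol messages with anyone.  That asymmetry is what matters in what follows.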

Today, we don’t think of router instances as being coupled 1:1 to a server, but as an application hosted on a resource pool.  The resource pool concept offers a general benefit of easy replacement and scaling, and a specific benefit of being able to reconfigure a network broadly in response to changing conditions.  The latter comes about because the pool is presumed to be highly distributed, so you can spin up a router anywhere you might need one, as opposed to sending out a tech to install it.

The problem with router instances in this mission is that the new routers would have to learn the routes and connectivity of the network, which is often called “converging” on a topology.  It takes time, and because it’s a process that involves a bunch of control-packet exchanges that percolate through the network, it can create issues network-wide until everything settles down.  This is where SDN might make a difference.
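To see why convergence takes time, consider a toy distance-vector exchange (a sketch of the general mechanism, not any particular routing protocol): each round, every node shares its best-known distances with its neighbors, and information travels only one hop per exchange.

```python
def converge(nodes, links):
    """Toy distance-vector convergence: loop until no estimate changes."""
    # dist[a][b] = best-known cost from a to b; each node starts knowing only itself
    dist = {a: {b: (0 if a == b else float("inf")) for b in nodes} for a in nodes}
    rounds, changed = 0, True
    while changed:
        changed = False
        rounds += 1
        for a, b, cost in links:                  # each link carries updates both ways
            for src, dst in ((a, b), (b, a)):
                for target in nodes:
                    candidate = cost + dist[dst][target]
                    if candidate < dist[src][target]:
                        dist[src][target] = candidate
                        changed = True
    return dist, rounds

# A five-node chain needs several rounds before the end nodes learn about
# each other; a new router joining a large network repeats this kind of
# process before it can forward correctly.
nodes = ["A", "B", "C", "D", "E"]
links = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("D", "E", 1)]
_, rounds = converge(nodes, links)
print(f"settled after {rounds} rounds")
```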

A flow-switch instance, receiving its forwarding tables from the central controller, would immediately fit into the network topology as it was intended to fit.  There’s no convergence to worry about.  Even if the central controller had a more radical change in topology in mind than simply replacing a failed instance or scaling something, the controller could game out the new topology and replace the forwarding tables where they changed, in an orderly way.  There might be local disruptions as a topology was reconfigured, but they could be managed through simulation of the state changes at the controller level.
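A hedged sketch of what that “gaming out” might look like at the controller (the function names here are mine, not any real controller’s API): compute the new forwarding tables centrally, using ordinary shortest-path math, then diff against the current tables so only the entries that actually changed get rewritten.

```python
import heapq

def shortest_path_tables(nodes, links):
    """Dijkstra from every node; returns {switch: {dest: next_hop}}."""
    adj = {n: [] for n in nodes}
    for a, b, cost in links:
        adj[a].append((b, cost))
        adj[b].append((a, cost))
    tables = {}
    for src in nodes:
        dist, hop, heap = {src: 0}, {}, [(0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                                 # stale heap entry
            for v, cost in adj[u]:
                nd = d + cost
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    hop[v] = v if u == src else hop[u]   # remember the first hop
                    heapq.heappush(heap, (nd, v))
        tables[src] = hop
    return tables

def table_diff(old, new):
    """Only the forwarding entries that must be rewritten on each switch."""
    changes = {}
    for switch in new:
        delta = {d: h for d, h in new[switch].items()
                 if old.get(switch, {}).get(d) != h}
        if delta:
            changes[switch] = delta
    return changes

# A link fails; the controller simulates the new topology and pushes only
# the deltas, in whatever order its model says is safe.
nodes = ["A", "B", "C", "D"]
links = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("A", "D", 3)]
before = shortest_path_tables(nodes, links)
after = shortest_path_tables(nodes, [l for l in links if l != ("B", "C", 1)])
print(table_diff(before, after))
```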

This raises the potential for agile topology at the network level.  Instead of thinking of networks as being static topologies, think of them as being a set of devices that kind of migrate into the path of traffic and arrange themselves for optimum performance and economy.  It’s a whole new way of thinking about networks, and one that’s being considered by the literati among tech planners in many operator organizations.

The challenge with this is the same old issue: the performance of hosted router instances and the throughput limitations of server networking.  Fortunately, we may be seeing technology changes that could address this.  One possibility is a “white-box resource pool,” and another is the use of GPUs to provide flow switching.

If white boxes were commoditized enough, it would be feasible to put a pool of them in at every location where network transit trunks terminate.  If we were to use agile-optics grooming, we could push lambdas (DWDM wavelengths) or even electrical-layer virtual trunks to all of the pool, and create service- or even customer-specific networks in parallel, all controlled by that One Central Controller or by a hierarchy of it and lesser gods.  These white boxes could be equipped with flow-switching chips, each with a P4 driver to make the device compatible with open-source software.

It might be possible to stick this kind of chip into a server too, but it might also be possible to use GPUs to do the switching decision-making.  OpenFlow switching doesn’t have to be as complex as routing, and there are many missions for GPUs in resource pools that would justify adding them.  This seems to be the approach currently favored by the operator literati.
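To illustrate why flow switching maps well onto data-parallel hardware, here’s a toy batched exact-match lookup, with numpy standing in for a GPU (a real datapath would be CUDA, and would handle wildcard matches, priorities, and table misses): classification reduces to one big table index with no per-packet control flow.

```python
import numpy as np

TABLE_SIZE = 1 << 16
actions = np.zeros(TABLE_SIZE, dtype=np.int32)   # action 0 = drop by default
actions[0x1A2B % TABLE_SIZE] = 3                 # hypothetical flow -> output port 3

def lookup(flow_hashes: np.ndarray) -> np.ndarray:
    """Classify a whole batch of packets in one vectorized operation."""
    return actions[flow_hashes % TABLE_SIZE]

batch = np.array([0x1A2B, 0xFFFF, 0x1A2B], dtype=np.int64)
print(lookup(batch))                             # [3 0 3]
```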

The One Central Controller (or the hierarchy) obviously needs to be strong and resilient for this to work.  OpenFlow and the ONF have been slow (and many say, ineffective) in addressing the performance of the central controllers and the way a hierarchy of controllers (like that used for DNS) could be created and secured.  Even slower, and with fewer participants, has been the development of an integrated vision of agile optics and flow switching, which is also needed for optimum value to this approach.

But, presuming we were to adopt this approach, where would that leave the issue of hosting overall?  Bet you’re not surprised that it’s complicated.

One clear truth is that adopting a flow-switch-instance approach would accelerate the overall shift to open-model networking.  At the same time, it could promote differentiation in the new model.  Because NFV hasn’t succeeded in impacting the hosted-router issue (for reasons I’ve blogged on many times before), and because it’s focused more on per-customer services like VPNs, we have no well-respected architectural models, much less standards, for this sort of network.  It’s ripe for vendor exploitation, in short.

A counterforce to this, just as clear, is that moving from a router/device network to an agile cloud-based flow-switch-instance network supported by an agile optics underlay is a long slog.  There’s a ton of sunk cost in current networks, and a perception of a ton of risk in the migration.  The latter is especially true when you consider that it’s tough to make the flow-switch-instance approach work on a small scale; the value is at full scale.

It may be, then, that the future of router or flow-switch instances will not pull through broader open-model networking at all, but that it will be pulled through by it.  Operators are already looking at white-box-and-open-source for edge devices.  The boundary between edge and core is soft, and easily crossed by technology changes.  First the access, then the metro, then the core?  Maybe.  It would appear that replacing routers with router instances, or with flow-switch instances, is the final step in network evolution.  That would give router vendors more time to come up with a strategy to address that issue.

It doesn’t stop the outside-in open-model commoditization, of course.  Will the vendors, presented with a band-aid while free-falling, fix the cut they got going out the door of the plane and ignore the coming impact?  We’ll see.