Cloud ROUTING versus Hosted Router Instances

I mentioned data plane feature hosting in my last blog, noting that we needed to spend some time looking at the connection-service elements and how we’d propose to make them candidates for hosted and cloud-native implementation.  I propose to start the ball rolling with this blog, and to do that we have to look at least a bit at the way we do connection services today.

Connection services are currently implemented using devices, and because most connection services are IP-based, those devices are usually routers.  A router network is a community of devices that cooperate to learn the network’s structure and connectivity and to deliver traffic among the endpoints attached to every device in the community.  The traffic is the data plane in our discussion, and the cooperation that takes place within the community is mediated by the control plane.

The union of these planes, the center of routing, is the routing or forwarding table.  This is a table in each router, containing addresses and masks that identify a set of packets and associate that set with a destination port.  A route from source to destination is created by the sum of the forwarding tables of the devices.  Router A sends packets of a given type out on port 3, which gets them to Router D, whose table routes them out on port 11 to Router M, and so forth.  Forwarding tables are created using routing protocols, part of our control plane, which advertise reachability in a sort-of-Name-That-Tune way: “I can reach Charlie in THREE hops!” in the simple case, or based on more complex metrics for modern protocols like OSPF.  Each device keeps its best routes to each destination in its forwarding table.
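
To make the forwarding-table idea concrete, here’s a minimal sketch of the lookup a router performs.  The prefixes and port names are invented for illustration, and real routers use far faster data structures than a list scan, but the longest-prefix-match logic is the same.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, output port).  The prefixes and
# port names here are invented purely for illustration.
FORWARDING_TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), "port 3"),   # toward Router D
    (ipaddress.ip_network("10.1.4.0/24"), "port 7"),   # a more specific route
    (ipaddress.ip_network("0.0.0.0/0"),   "port 1"),   # default route
]

def lookup(destination: str) -> str:
    """Longest-prefix match: the most specific matching entry wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, port) for net, port in FORWARDING_TABLE if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.1.4.20"))   # -> port 7 (most specific match)
print(lookup("192.0.2.9"))   # -> port 1 (default route)
```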

A data plane, then, is the collection of routes created by the sum of the forwarding tables.  In traditional router networks, the forwarding tables are created by the adaptive control-plane routing-protocol processes.  It’s been proposed that in OpenFlow SDN, the tables would be provided or updated by a central control agency, and adaptive routing-protocol exchanges would not happen.

There are dozens of control-plane packet types (the ICMP, or Internet Control Message Protocol, message list is a good place to start, and routing-protocol exchanges add more), and they divide into what could be called “status” packets and “route” packets.  A hosted functional instance of “routing” would have to pass the data plane based on forwarding-table entries and do something with the control-plane packets, ICMP and routing protocols alike.  Let’s go back to OpenFlow for a moment and see how that would work by examining the extreme case.  We’ll focus on the “configured forwarding” model of OpenFlow rather than on the “adaptive” model, but it would work either way.
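
Before we do, here’s a rough sketch of the two-way split such a hosted instance would face.  The packet fields, table entries, and handler below are assumptions for illustration only, not any particular implementation.

```python
ICMP_PROTOCOL = 1  # IP protocol number for ICMP

# Hypothetical forwarding state: destination prefix -> output port.
FORWARDING = {"10.1.0.0/16": "port 3", "0.0.0.0/0": "port 1"}

def handle_control(packet):
    # Stand-in for whatever answers, absorbs, or proxies control-plane messages.
    print(f"control plane: ICMP type {packet.get('icmp_type')}")

def handle_packet(packet):
    """The basic split a hosted routing instance has to make: control-plane
    packets go to a handler, everything else is forwarded per the table."""
    if packet["protocol"] == ICMP_PROTOCOL:
        handle_control(packet)
    else:
        out_port = FORWARDING.get(packet["dest_prefix"], FORWARDING["0.0.0.0/0"])
        print(f"data plane: forwarding out {out_port}")

handle_packet({"protocol": 6, "dest_prefix": "10.1.0.0/16"})       # ordinary traffic
handle_packet({"protocol": 1, "icmp_type": 8, "dest_prefix": ""})  # echo request
```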

In an OpenFlow device, the central control element will have loaded a forwarding table based on network topology and traffic policies, and will then keep it current with conditions.  There would be no need for control-packet exchanges within an OpenFlow network, but you’d probably need them at the edge, where conventional IP connectivity and devices would be or could be connected.  Thus, we could envision a hosted OpenFlow device instance as consisting of the forwarding processes of the data plane, the management exchanges with the central controller, and a set of control-plane proxies that would generate or respond to ICMP/routing protocols as needed, based presumably on central-controller information.  This approach is consistent with how Google, for example, uses SDN within its network core, emulating BGP at the core’s edge.
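
As a rough sketch of that decomposition (the class, methods, and message fields here are hypothetical, not an actual OpenFlow API), the hosted instance would pair a controller-maintained table with an edge proxy for control-plane traffic:

```python
class HostedOpenFlowInstance:
    """Hypothetical decomposition of a hosted OpenFlow-style device instance:
    data-plane forwarding, a management channel to the central controller,
    and edge proxies that speak ICMP/routing protocols outward."""

    def __init__(self, controller, edge_proxy):
        self.controller = controller    # central control element
        self.edge_proxy = edge_proxy    # generates/answers ICMP, BGP, etc. at the edge
        self.table = {}                 # destination prefix -> output port

    def apply_flow_update(self, prefix, out_port):
        # The central controller, not a local routing protocol, keeps the table current.
        self.table[prefix] = out_port

    def receive(self, packet):
        if packet.get("is_control"):
            # Control-plane traffic from outside the OpenFlow domain is answered
            # by the proxy, using information supplied by the central controller.
            self.edge_proxy.respond(packet, self.controller.topology())
        elif packet["dest_prefix"] in self.table:
            self.transmit(packet, self.table[packet["dest_prefix"]])
        else:
            # Unknown flow: punt to the controller for a decision.
            self.controller.packet_in(packet)

    def transmit(self, packet, out_port):
        print(f"forwarding toward {packet['dest_prefix']} via {out_port}")
```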

We can see from all of this that our “software-hosted router network” could be visualized in two ways.  First, it could be visualized as a “network of software-hosted routers”, each of which looked, from the outside in, exactly the way a real router would look.  This would be an abstract router model.  It could also be visualized as a “software-implemented router network” which, from the outside, looked not like a router but like a router network.  This is a distinction of critical importance when you look at how software-hosted, and especially cloud-native, technology would apply to data-plane connection services.

If we go back to the initial NFV concept (the “Call for Action” white paper in the fall of 2012), we see that the goal was to reduce capex by substituting software instances hosted on commercial servers for proprietary devices.  Within a year, the operators who authored that paper had largely agreed that this would not make enough difference in capex to be worth the effort.  Thus, I contend that the abstract router model of data-plane connection service implementation is not going to offer enough benefits to justify the effort.  We have to look at the abstract router network model instead.

But what the heck is that?  As it happens, we have a general model of this approach that goes back decades, in IETF RFC 2332, the Next Hop Resolution Protocol (NHRP).  This RFC posits a “Non-Broadcast Multi-Access” or NBMA network, surrounded by a ring of devices that adapt that network to “look” like a router network.  You could implement that NBMA network using a connection-oriented protocol (frame relay, ATM, or even ISDN data calls), or you could implement it using any arbitrary protocol of the type we’d today call “SDN”, including OpenFlow.

The data-plane procedure here is simple.  A packet arrives at our boundary ring, addressed to a destination served by some other ring device.  The arrival ring device looks up the destination and does whatever is necessary to get the packet to the ring device to which the destination is connected.  That device then forwards the packet onward in the normal IP way.  The ring devices are thus proxies for the control-plane protocols and adapters for the data-plane connectivity.  This, I submit, is the right model for implementing hosted routing rather than hosted routers.
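
A minimal sketch of that ingress procedure, assuming an invented resolution table and NBMA addressing, might look like this:

```python
import ipaddress

# Hypothetical resolution table: destination prefix -> the NBMA "address" of
# the ring device that serves it (NHRP calls this next-hop resolution).
RING_RESOLUTION = {
    "10.2.0.0/16": "nbma://ring-east-1",
    "10.3.0.0/16": "nbma://ring-west-4",
}

def send_across_nbma(packet, nbma_address):
    # Placeholder for whatever the core offers: an SDN path, a connection-
    # oriented call, etc.  The egress ring device then forwards the packet
    # onward in the normal IP way.
    print(f"carrying packet for {packet['dest']} across the core to {nbma_address}")

def ingress_ring_device(packet):
    """What a boundary ring device does on arrival: resolve the destination
    to the egress ring device, then hand the packet to the NBMA core."""
    dest = ipaddress.ip_address(packet["dest"])
    for prefix, nbma_address in RING_RESOLUTION.items():
        if dest in ipaddress.ip_network(prefix):
            send_across_nbma(packet, nbma_address)
            return
    raise ValueError("no ring device serves this destination")

ingress_ring_device({"dest": "10.3.7.42"})   # resolves to ring-west-4
```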

What Google did, and what NHRP proposed, was just that: a model for a software-hosted router network, not for software-hosted routers.  With it, we’ve broken the device-specific model of IP networking into two pieces: the NBMA core, where any good forwarding technology is fine, and an adapting ring of functionality that spoofs an IP network in the outward direction to meet connection requirements while matching the traffic to the NBMA requirements inward.

One obvious feature of this approach is that ring on/off-ramp technology is where we have to look to combine control- and data-plane behavior.  Inside the NBMA, we can probably deal only with data-plane handling.  Another feature is that since the NBMA’s technology is arbitrary, and any place we can put an on/off-ramp is a good place, we could assume that we’d give every user such a ramp point.  That would mean their control-plane relationships could be supported by a dedicated software element.  It could theoretically be scalable too, but it may not be necessary to scale the control-plane element if it’s offered on a per-user basis.

We could imagine our NBMA as being a geographic hierarchy, a network that picks up users near the access edge and delivers traffic to other access edge points.  It might have an edge/metro/core hierarchy, but as the traffic between edge points or metro points increases, we’d reach a point where it made sense to create a direct (in packet-optical terms) path between those higher-traffic points.  I’m envisioning this level of connectivity as being created by agile optics and SDN-like technology.  The hosted elements, in my view, would be between metro and edge, where traffic didn’t justify dedicated facilities and instead required aggregation for efficient use of transport.

This model creates a kind of symbiosis between SDN to define semi-permanent node points and routes, and hosted ring-element functionality to serve as the on/off-ramp technology.  Since the latter would handle either a single user (business) or a small access community (residential), the demands for data-plane bandwidth wouldn’t be excessive, and in any case some of the data-plane work could be done by an SDN element in each edge office (where access terminations are found).

We end up with what’s probably a metro-and-core network made up of fairly fixed optical/SDN paths, an edge network made up of hosted ring-element instances, and an edge-to-metro network that might include a mixture of the two technologies, even a mixture of fixed and instance-based elements.  This is what I think a connection service of the future might look like.  It wouldn’t change optical deployment, except perhaps to encourage an increase in capacity and agility.  It would promote electrical-layer grooming via SDN, likely OpenFlow.  It would permit an overlay network, in Ethernet, IP, or SD-WAN/SDN form.

To me, the decisive point here is that the deeper you go into a network, the more likely it is that the elements of the network don’t require repositioning, only resiliency.  If you want to create a more reliable metro/core, you do that with multiple paths, meaning more nodes, not by creating scalable nodes.  That’s because the optical paths between major locations are not “agile” or subject to software virtualization.  You need facilities where trunks terminate.  Agility, and cloud-native, belong with the control plane and closer to the edge, where variability in activity and traffic might indeed justify virtualization of the node points.

Just because this is what I believe doesn’t mean it’s always true, of course.  Metaswitch, a company whose work in the IMS space (Project Clearwater, in the old days) I know and respect, has a different view of cloud-native VNFs and hosted 5G core.  They published an excellent white paper, available HERE, that talks about cloud-native development in NFV.  I agree with the principles they articulate, and I’ll do a later blog on their paper and approach, but I believe that they’re talking about a software-based router, not a software-abstracted router network, which is where I think we have to be.

The biggest challenge in virtualization lies in what you elect to virtualize.  Virtualizing network elements means building a conventional network of virtual elements.  Virtualizing a network means building a new model of networking.  Only the latter of these approaches qualifies as a revolution, and only a revolution will really optimize what the cloud can bring to networking.