Overlay/Underlay Networking and the Future of Services

Overlay networks have been a topic for this blog fairly often recently, but given that more operators (including, recently, Comcast) have come out in favor of them, I think it’s time to look at how overlay technology might impact network investment overall.  After all, if overlay networking becomes mainstream, a shift of that magnitude would have to impact the networks they get overlaid onto.

Overlay networks are virtual networks built by adding what’s essentially another connection layer on top of prevailing L2/L3 technology.  Unlike traditional “virtual networks,” overlay networks are invisible to the lower layers; the devices down there treat them as ordinary traffic.  That could radically simplify the creation of virtual networks by eliminating the need to manage connectivity in a “real” network device, but there are other impacts that could be even more important.  To understand them, we should start at the top.
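
To make the layering concrete, here’s a minimal sketch (the names are my own, loosely in the spirit of VXLAN-style encapsulation; nothing here is a real protocol) of an overlay frame riding as ordinary payload inside an underlay packet:

```python
from dataclasses import dataclass

@dataclass
class UnderlayPacket:
    """What an L2/L3 device actually forwards on."""
    src: str          # underlay address, e.g. a router/switch endpoint
    dst: str
    payload: bytes    # opaque to the underlay

@dataclass
class OverlayFrame:
    """The virtual-network frame; invisible to the layers below."""
    vnet_id: int      # which virtual network this belongs to
    src: str          # overlay address space, unrelated to the underlay's
    dst: str
    data: bytes

def encapsulate(frame: OverlayFrame, tunnel_src: str, tunnel_dst: str) -> UnderlayPacket:
    # The overlay frame is serialized and becomes ordinary traffic;
    # the underlay forwards on tunnel_src/tunnel_dst only.
    wire = f"{frame.vnet_id}|{frame.src}|{frame.dst}|".encode() + frame.data
    return UnderlayPacket(tunnel_src, tunnel_dst, wire)

pkt = encapsulate(OverlayFrame(7, "vm-a", "vm-b", b"hello"), "10.0.0.1", "10.0.0.2")
print(pkt.dst, pkt.payload)  # the underlay never parses the overlay addresses
```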

There are two basic models of overlay network: the nodal model and the mesh model.  In the nodal model, the overlay includes interior elements that perform the same function that network nodes normally perform in real networks, meaning switching/routing.  In the mesh model, there are no interior nodes to act as concentrators/distributors of traffic.  Instead, each edge element is connected to all the others via some sort of tunnel or lower-level service.

The determinant of the “best” model will in most cases simply be the number of endpoints.  Both endpoints and nodes have “routing tables,” and as with traditional routing, the tables don’t have to include every distinct endpoint address, only the portion of an address needed to make a forwarding decision.  However, if the endpoints are fully meshed, every endpoint has to be able to make a forwarding decision for every other endpoint, which means the endpoint routing tables grow large and expensive to process.
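
A toy calculation shows the scaling difference; the endpoint and area counts below are purely illustrative:

```python
# Why meshing inflates endpoint routing tables while a nodal
# overlay keeps them small (numbers are made up for illustration).

def mesh_table_entries(endpoints: int) -> int:
    # Full mesh: every endpoint must hold a forwarding decision
    # for every other endpoint.
    return endpoints - 1

def nodal_table_entries(areas: int) -> int:
    # Nodal overlay: an endpoint only needs a route per area
    # (or just a default route to its local node).
    return areas

n, areas = 10_000, 50
print(f"mesh:  {mesh_table_entries(n):>6} entries per endpoint")   # 9999
print(f"nodal: {nodal_table_entries(areas):>6} entries per endpoint")  # 50
```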

Interior nodes can simplify the routing tables, particularly since the address space used in an overlay network need not relate in any way to the underlying network address space.  A geographic/hierarchical addressing scheme could be used to divide a network into areas, each of which might have a collecting/distributing node.  Deliberately placed nodes can also force traffic along certain paths, which would be helpful for traffic management.
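
Here’s a rough illustration of that kind of hierarchical forwarding decision; the address format and area names are invented for the example:

```python
# A sketch of geographic/hierarchical overlay addressing (the format
# is illustrative, not from any standard).  Addresses look like
# "area.endpoint"; an endpoint only decides "deliver locally or hand
# off to my area node," so its table stays tiny.

AREA_NODES = {"us-east": "node-ue", "us-west": "node-uw", "eu": "node-eu"}

def next_hop(my_area: str, dst_addr: str) -> str:
    dst_area, _, dst_host = dst_addr.partition(".")
    if dst_area == my_area:
        return dst_host                 # deliver directly within the area
    return AREA_NODES[my_area]          # hand off to the collecting node

print(next_hop("us-east", "us-east.server42"))  # -> server42
print(next_hop("us-east", "eu.server7"))        # -> node-ue
```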

The notion of an overlay-based virtual network service clearly empowers endpoints, and if the optimization of nodal locations is based on sophisticated traffic and geography factors, it would also favor virtual-node deployments in the network interior.  Thus, overlay networks could directly promote (or be promoted by) NFV.  One of the two “revolutionary elements” of future networking is thus a player here.

So is the other.  If tunnels are the goal, then SDN is a logical way to fulfill it.  The advantage SDN offers is that a forwarding chain created through OpenFlow by central command can be routed wherever it’s best placed, and each flow supported by such a chain is truly a ship in the night relative to the others in terms of addressability.  If central management can provide proper traffic planning and thus QoS, then all the SDN flows are pretty darn independent.
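
To show what that flow independence looks like, here’s a minimal match/action table in the OpenFlow spirit (the structure is my own simplification, not the actual OpenFlow protocol):

```python
from dataclasses import dataclass

# Centrally installed match/action rules.  Each flow's rules reference
# only that flow's match fields, so flows share switches without
# sharing any addressability -- ships in the night.

@dataclass(frozen=True)
class FlowRule:
    match_tunnel: int     # tunnel/flow identifier to match on
    in_port: int
    out_port: int

class SwitchTable:
    def __init__(self) -> None:
        self.rules: dict[tuple[int, int], int] = {}

    def install(self, rule: FlowRule) -> None:
        # A central controller pushes rules; the switch just obeys.
        self.rules[(rule.match_tunnel, rule.in_port)] = rule.out_port

    def forward(self, tunnel: int, in_port: int) -> int | None:
        # No rule, no forwarding: unmatched traffic simply isn't visible.
        return self.rules.get((tunnel, in_port))

table = SwitchTable()
table.install(FlowRule(match_tunnel=101, in_port=1, out_port=4))
print(table.forward(101, 1))  # 4
print(table.forward(202, 1))  # None: another flow can't even see this path
```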

The big question for SDN has always been domain federation.  We know that SDN controllers work, but we can be pretty sure that a single enormous controller could never hope to control a global network.  Instead we have to be able to meld SDN domains, to provide a means for those forwarded flows to cross a domain boundary without being elevated and reconstituted.  If that capability existed, it would make SDN a better platform for overlay networks than even Ethernet with all its enhancements.
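
As a sketch of what melding domains could mean, here’s an invented federation layer that asks each domain’s controller for a path segment and stitches the segments at gateway points, so a flow crosses boundaries without being terminated and rebuilt (no real controller API is being modeled here):

```python
# Hypothetical federation sketch: one process plans an end-to-end path
# as per-domain segments joined at gateways.

class DomainController:
    def __init__(self, name: str):
        self.name = name

    def build_segment(self, ingress: str, egress: str) -> list[str]:
        # In a real controller this would install forwarding rules
        # along the chosen path; here we just record the segment.
        return [f"{self.name}:{ingress}", f"{self.name}:{egress}"]

def federated_path(domains: list[DomainController],
                   entry: str, gateways: list[str], exit_: str) -> list[str]:
    points = [entry, *gateways, exit_]
    path: list[str] = []
    # Pair consecutive points with the domain that connects them.
    for ctrl, (a, b) in zip(domains, zip(points, points[1:])):
        path += ctrl.build_segment(a, b)
    return path

print(federated_path([DomainController("east"), DomainController("west")],
                     "cust-1", ["gw-12"], "cust-2"))
# ['east:cust-1', 'east:gw-12', 'west:gw-12', 'west:cust-2']
```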

The nature of the overlay process and the nature of the underlayment combine to create a whole series of potential service models.  SD-WAN, for example, is an edge-steered tunnel process that often provides multiple parallel connection options for some or even all of the service points.  Virtual switching (vSwitch) provides what’s normally an Ethernet-like overlay on top of an Ethernet underlayment, but it still separates the connection plane from the transport process, which is why it’s a good multi-tenant approach for the cloud.  It’s fair to say that there is neither a need to standardize on a single overlay protocol or architecture nor even value in doing so.  If service-specific overlay competition arises and enriches the market, so much the better.
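
As a small example of that edge steering, here’s a toy SD-WAN path selector; the tunnels and their metrics are made up:

```python
# The edge holds several parallel tunnels to a site and picks per
# flow; the overlay defines this service entirely on its own terms.

TUNNELS = {
    "mpls":      {"latency_ms": 20, "cost": 9},
    "broadband": {"latency_ms": 45, "cost": 2},
    "lte":       {"latency_ms": 70, "cost": 5},
}

def pick_tunnel(latency_sensitive: bool) -> str:
    # Latency-sensitive flows get the fastest path; bulk flows the cheapest.
    key = "latency_ms" if latency_sensitive else "cost"
    return min(TUNNELS, key=lambda t: TUNNELS[t][key])

print(pick_tunnel(latency_sensitive=True))   # mpls
print(pick_tunnel(latency_sensitive=False))  # broadband
```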

Where there obviously is a need for some logic and order is in the underlayment.  Here, we can define some basic truths that would have a major impact on the efficiency of traffic management and operations.

The first point is that the more overlays you have, the more important it is to control traffic and availability below the overlay.  You don’t want to recover from a million service faults when one common trunk/tunnel has failed.  This is why the notion of virtual wires is so important, though I want to stress that any of the three major connection models (LINE, LAN, TREE) would be fine as a tunnel model.  The point is that you want all possible management directed here.  This is where agile optics, SDN pipes, and so forth would live, and where augmenting current network infrastructure to be more overlay-efficient could be very helpful.
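
A toy model makes the fault fan-out obvious; the trunk-to-tunnel mapping below is invented:

```python
# One trunk carries many overlay tunnels, so a single underlay repair
# replaces what would otherwise be thousands of per-service recoveries.

TRUNK_TO_TUNNELS = {
    "trunk-A": [f"tunnel-{i}" for i in range(1_000)],
    "trunk-B": [f"tunnel-{i}" for i in range(1_000, 1_250)],
}

def handle_trunk_failure(trunk: str) -> str:
    affected = TRUNK_TO_TUNNELS[trunk]
    # Reroute the trunk once (agile optics, an SDN pipe, etc.) instead
    # of letting every overlay service detect and recover on its own.
    return f"rerouted {trunk}; {len(affected)} overlay tunnels untouched above"

print(handle_trunk_failure("trunk-A"))
```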

The second point, which I hinted at above, is that you need to define domain gateways that can carry the overlays among domains without forcing you to terminate and reestablish the overlays, which would mean hosting a bunch of nodes at the boundary.  Ideally, the same overlay connection models should be valid for all the interconnected domains so a single process could define all the underlayment pathways.  As I noted earlier, this means domain federation has to be provided no matter what technology you use for the underlayment.

The third point is that the underlay network has to expose QoS or class-of-service capabilities as options to the overlay.  You can’t create QoS or manage traffic in an overlay, so you have to be able to communicate between the overlay and underlay with respect to the SLA you need, and then enforce that SLA below.
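
Here’s one possible shape of that overlay/underlay conversation, purely as an illustration of the handoff, not a real interface:

```python
from dataclasses import dataclass

# The overlay states an SLA; the underlay maps it to a class of
# service it actually enforces (classes and numbers are invented).

@dataclass
class SLARequest:
    max_latency_ms: int
    min_bandwidth_mbps: int

# Classes the underlay chooses to expose -- the overlay never sees how
# they're implemented, only whether the SLA is satisfiable.
UNDERLAY_CLASSES = {
    "premium":     SLARequest(max_latency_ms=10,  min_bandwidth_mbps=1000),
    "business":    SLARequest(max_latency_ms=50,  min_bandwidth_mbps=100),
    "best-effort": SLARequest(max_latency_ms=500, min_bandwidth_mbps=1),
}

def select_class(req: SLARequest) -> str | None:
    for name, offer in UNDERLAY_CLASSES.items():
        if (offer.max_latency_ms <= req.max_latency_ms
                and offer.min_bandwidth_mbps >= req.min_bandwidth_mbps):
            return name
    return None  # the underlay must refuse rather than silently degrade

print(select_class(SLARequest(max_latency_ms=40, min_bandwidth_mbps=50)))  # premium
```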

The final point is universality and evolution.  The overlay/underlay relationship should never depend on the technology or implementation of either layer.  The old OSI model was right; the layers have to see each other only as a set of exposed services.  In modern terms, that means each layer is an intent model with regard to the other, and the overlay is an intent model to its user.  The evolution point means it’s important to map the overlay’s network capabilities onto legacy underlayment implementations, because otherwise you probably won’t get the scope of deployment you need.
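
Here’s a sketch of the layers-as-intent-models idea; the interface and its methods are assumptions for illustration, not an existing API:

```python
from typing import Protocol

# The overlay programs against exposed services only, so an SDN,
# MPLS, or plain Ethernet underlayment can sit behind the same
# contract -- which is what makes legacy evolution possible.

class UnderlayIntent(Protocol):
    def request_tunnel(self, a: str, b: str, cos: str) -> str: ...
    def tunnel_status(self, tunnel_id: str) -> str: ...

class LegacyEthernetUnderlay:
    """A legacy implementation honoring the same intent contract."""
    def request_tunnel(self, a: str, b: str, cos: str) -> str:
        return f"evc:{a}-{b}:{cos}"      # e.g., an Ethernet virtual circuit
    def tunnel_status(self, tunnel_id: str) -> str:
        return "up"

def build_overlay_link(underlay: UnderlayIntent) -> str:
    # The overlay neither knows nor cares what's underneath.
    return underlay.request_tunnel("site-1", "site-2", cos="business")

print(build_overlay_link(LegacyEthernetUnderlay()))  # evc:site-1-site-2:business
```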

You might wonder at this point why, if overlay networking is so powerful a concept, operators haven’t fallen over themselves to implement it.  One reason, I think, is that the concept of overlay networks is explicitly an OTT concept.  It establishes the notion of a network service in a different way, a way that could admit new competitors.  If this is the primary reason, though, it may be losing steam, because SD-WAN technology is already creating OTT competition without any formal overlay/underlay structures.  The fact that anyone can do an overlay means nobody can really suppress the concept.  If it’s good and powerful, it will catch on.