Should We Be Thinking of Network Service Evolution in Overlay Terms?

For the decades when IP dominance was a given, we have lived in an age where service features were network features.  When Nicira came along, driven by the need to scale cloud tenancy beyond what physical devices tended to support, we learned about another model.  Overlay networks, networks built from nodes that are connected by tunnels over traditional networks, could frame a very different kind of network future, and it’s worth looking at that future in more detail.

One of the challenges for this space is fuzzy terminology (surprise!).  The term “overlay network” is perhaps the most descriptive and least overloaded, but it’s also the least used.  Another term that’s fairly descriptive is “software-defined WAN” or SD-WAN, but many associate SD-WAN with not only technical overlays but business overlays.  You can build your own SD-WAN on top of a real network using only CPE, but overlay networks can be built either independently of, or in partnership with, internal nodes and the underlying physical network.  SDN is the worst term applied to this space, because practically anything with an API is called “SDN” these days.  I’m going to use the term “overlay networks” for neutrality’s sake.

In an overlay network you have two essential pieces—tunnels and edge elements.  Tunnels are the overlay part—they represent virtual wires that are driven between network service access points on the physical network that underlays the overlay.  Edge elements terminate tunnels and provide a traditional interface to the user, one that standard equipment and software recognizes.  In an IP overlay network, these edge elements would “look” like an IP device—an edge router or gateway router.  Some vendors offer a combination of the two pieces, while others promote a kind of implicit overlay model by offering hosted switch/router instances and supporting standard tunnel technology.
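To make this concrete, here’s a minimal sketch in Python of those two pieces.  The Tunnel and EdgeElement classes, their field names, and the addresses are my own illustration, not any vendor’s actual design; the point is simply that the tunnel is the virtual wire and the edge element is what the user’s equipment sees.

```python
from dataclasses import dataclass, field

@dataclass
class Tunnel:
    """A virtual wire over the underlay, identified by its two endpoints."""
    tunnel_id: str
    local_endpoint: str        # underlay service access point, e.g. "203.0.113.10"
    remote_endpoint: str       # far-end access point that terminates the tunnel
    encapsulation: str = "generic"   # IPsec, GRE, VXLAN, etc.; left abstract here

@dataclass
class EdgeElement:
    """Terminates tunnels and presents a standard IP interface to the user side."""
    name: str
    user_facing_interface: str                    # "looks" like an ordinary edge/gateway router
    tunnels: dict = field(default_factory=dict)   # tunnel_id -> Tunnel

    def attach_tunnel(self, tunnel: Tunnel) -> None:
        self.tunnels[tunnel.tunnel_id] = tunnel

# Two sites joined by a single tunnel: each edge looks like an IP gateway locally.
site_a = EdgeElement("site-a-edge", user_facing_interface="10.1.0.1/24")
site_b = EdgeElement("site-b-edge", user_facing_interface="10.2.0.1/24")
site_a.attach_tunnel(Tunnel("t1", local_endpoint="203.0.113.10", remote_endpoint="198.51.100.20"))
site_b.attach_tunnel(Tunnel("t1", local_endpoint="198.51.100.20", remote_endpoint="203.0.113.10"))
```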

Some overlay networks have a third element (most often, those offered as a tunnel-and-element package), which is a header that’s attached to the data packets to carry private addresses and other information.  Others simply use tunnels to elevate the routing process and isolate it from the network-layer devices, but retain the same IP addresses or use “private” IP addresses.  You can make an argument for either approach, and to me the distinction isn’t critical enough to include in this particular discussion.
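For those who like the header approach spelled out, here’s a hedged sketch of what that third element might carry.  The 12-byte layout (a virtual-network ID plus private source and destination addresses) is invented purely for illustration; real encapsulations like VXLAN or Geneve differ in their details, but the principle of wrapping the user packet in overlay context is the same.

```python
import ipaddress
import struct

def encapsulate(payload: bytes, vnet_id: int, private_src: str, private_dst: str) -> bytes:
    """Prefix a user packet with a hypothetical overlay header.

    Illustrative layout only: 4-byte virtual-network ID, then 4-byte private
    source and destination IPv4 addresses, then the original packet.
    """
    src = int(ipaddress.IPv4Address(private_src))
    dst = int(ipaddress.IPv4Address(private_dst))
    header = struct.pack("!III", vnet_id, src, dst)
    return header + payload

def decapsulate(frame: bytes):
    """Strip the overlay header and recover the private addressing context."""
    vnet_id, src, dst = struct.unpack("!III", frame[:12])
    return vnet_id, str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)), frame[12:]

frame = encapsulate(b"user packet", vnet_id=42, private_src="10.1.0.5", private_dst="10.2.0.9")
print(decapsulate(frame))
```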

Simple overlays are built by meshing the edge elements with tunnels, using any convenient tunneling protocol that suits the underlayment.  In the edge element, a basic forwarding table then associates IP addresses (usually subnets) with a tunnel, and thus gets traffic onto the right tunnel to terminate in the appropriate edge device.  You can apply policy control to the forwarding tables either to limit the access of specific users/subnets to specific destinations, or to steer traffic onto different tunnels that go to the same place, based on things like availability or class of service.
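Here’s a minimal sketch of that edge forwarding logic, using Python’s standard ipaddress module.  The table contents, the access policy, and the longest-prefix-match loop are illustrative assumptions rather than any product’s implementation.

```python
import ipaddress

# Forwarding table: destination subnet -> tunnel that terminates at the right edge.
forwarding_table = {
    ipaddress.ip_network("10.2.0.0/24"): "tunnel-to-site-b",
    ipaddress.ip_network("10.3.0.0/24"): "tunnel-to-site-c",
}

# Policy control: the destinations a given source subnet is allowed to reach at all.
access_policy = {
    ipaddress.ip_network("10.1.0.0/24"): [ipaddress.ip_network("10.2.0.0/24")],
}

def select_tunnel(src_ip: str, dst_ip: str):
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    # Enforce the connection policy first: is this source permitted to reach the destination?
    allowed = [net for subnet, nets in access_policy.items() if src in subnet for net in nets]
    if not any(dst in net for net in allowed):
        return None                      # policy denies the connection outright
    # Longest-prefix match against the overlay forwarding table.
    matches = [(net, t) for net, t in forwarding_table.items() if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(select_tunnel("10.1.0.7", "10.2.0.9"))   # tunnel-to-site-b
print(select_tunnel("10.1.0.7", "10.3.0.9"))   # None: blocked by policy
```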

The tunnel-steering thing is one benefit of the architecture.  If you have a set of sites that have multiple service options available at the underlayment level, you can tunnel over them all and pick the tunnel you want, either for failover reasons or to do application-based QoS management.  This is how many SD-WAN offerings work.  But multi-tunneling can also be used to bridge different networks; an edge element functioning as a gateway might allow tunnel-to-tunnel routing, so it might then bridge users on one network with users on a different one.  This mission is the other common SD-WAN application; you link MPLS VPN sites with Internet VPN sites on a common overlay-based network.
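A sketch of that steering decision might look like the following.  The “premium traffic prefers MPLS, everything else (and any failover case) takes the best surviving path” rule is just an example of the kind of policy an SD-WAN edge could apply, not a claim about any specific offering.

```python
from dataclasses import dataclass

@dataclass
class TunnelState:
    name: str
    underlay: str          # e.g. "mpls-vpn" or "internet"
    up: bool
    latency_ms: float

# Two parallel tunnels to the same destination site, over different underlays.
paths_to_site_b = [
    TunnelState("t-mpls",     underlay="mpls-vpn", up=True, latency_ms=18.0),
    TunnelState("t-internet", underlay="internet", up=True, latency_ms=42.0),
]

def steer(tunnels, traffic_class: str):
    """Pick a tunnel for a flow: premium prefers the MPLS path if it is up,
    everything else (including failover) takes the best surviving path."""
    live = [t for t in tunnels if t.up]
    if not live:
        return None
    if traffic_class == "premium":
        mpls = [t for t in live if t.underlay == "mpls-vpn"]
        if mpls:
            return mpls[0]
    return min(live, key=lambda t: t.latency_ms)

print(steer(paths_to_site_b, "premium").name)     # t-mpls
paths_to_site_b[0].up = False                     # simulate an MPLS outage
print(steer(paths_to_site_b, "premium").name)     # failover: t-internet
```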

In theory, any overlay network-builder could deploy gateway devices even if they didn’t have different underlay networks to harmonize.  An intermediate gateway point could let you create natural concentration points for traffic, creating “nodes” rather than edge points in the overlay.  This could be done to apply connection policies in one place, but it could be combined with multi-underlay features to allow overlay-builders to aggregate traffic on various underlay networks to a place where a different tunnel/underlay technology connected them all.
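In code terms, that intermediate gateway is just tunnel-to-tunnel routing at an interior node.  The GatewayNode class below is a hypothetical illustration of a concentration point that cross-connects tunnels arriving over different underlays; the names and routes are made up.

```python
class GatewayNode:
    """An interior overlay node: terminates tunnels from several underlays and
    re-forwards traffic tunnel-to-tunnel, applying policy in one place."""

    def __init__(self, name: str):
        self.name = name
        self.routes = {}    # destination subnet (string form for brevity) -> outbound tunnel

    def add_route(self, dst_subnet: str, outbound_tunnel: str) -> None:
        self.routes[dst_subnet] = outbound_tunnel

    def forward(self, inbound_tunnel: str, dst_subnet: str):
        """Cross-connect: traffic arriving on one tunnel leaves on another."""
        out = self.routes.get(dst_subnet)
        if out == inbound_tunnel:
            return None     # don't hairpin back onto the arriving tunnel
        return out

hub = GatewayNode("regional-hub")
hub.add_route("10.2.0.0/24", "tunnel-over-mpls-to-b")
hub.add_route("10.9.0.0/24", "tunnel-over-internet-to-dc")
print(hub.forward("tunnel-over-internet-from-a", "10.2.0.0/24"))   # tunnel-over-mpls-to-b
```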

All of these overlay applications work as business-overlay networks; you can set them up even if you’re not the underlay provider.  However, the real benefit of overlay networks may be their ability to totally separate the connectivity part of networking from the transport part, which requires their use by the network operator.

As I noted earlier, it’s perfectly possible to build an overlay (SD-WAN-like) network without technical participation on the part of the underlay.  It’s also possible to have a network operator build an overlay VPN, and if that’s done there could be some interesting impacts, but the difference depends on just how far the operator takes the concept.  An operator offering an overlay VPN based on the same technical model as a third party wouldn’t move the ball.  To do more than that, the operator would have to go wide or go deep.

If an operator built all their IP services on an overlay model, then the services would be true ships in the night, meaning that it would be impossible for users of one to address users of another, or even to attack their underlying public addresses.  Overlay routing policies would control both connectivity and access, and movement in a physical sense (geographic or topological) would not impact the addressing of the endpoints or the efficiency of transport.
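The “ships in the night” point is easy to show: if every service has its own routing context, there is simply no table in which a user of one service can look up a user of another, even when the address ranges overlap.  The per-service tables below are invented for illustration.

```python
import ipaddress

# Each overlay service gets its own routing context; nothing spans services,
# so users of one literally cannot address users of another.
services = {
    "tenant-red":  {"10.0.1.0/24": "red-edge-1", "10.0.2.0/24": "red-edge-2"},
    "tenant-blue": {"10.0.1.0/24": "blue-edge-1"},   # same private range, no conflict
}

def resolve(service: str, dst_ip: str):
    """Resolve a destination only within the caller's own overlay service."""
    table = services.get(service, {})
    dst = ipaddress.ip_address(dst_ip)
    for subnet, edge in table.items():
        if dst in ipaddress.ip_network(subnet):
            return edge
    return None    # addresses in other services simply do not exist from here

print(resolve("tenant-red", "10.0.2.5"))    # red-edge-2
print(resolve("tenant-blue", "10.0.2.5"))   # None: blue has no route to red's users
```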

The most significant impact, though, would be that if all services were overlay, then the task of packet forwarding/routing would almost certainly be within the capabilities of hosted nodes, not reserved for custom routers.  Since you don’t need real L2/L3 when you’re creating connectivity above it in an overlay, you could dumb down current L2/L3 layers to be simple tunnel hosts.  This approach, then, is one of the pathways to the substitution of hosted resources for devices.  This is not necessarily NFV because you could treat the hosted nodes as virtual devices that were deployed almost conventionally rather than dynamically, but NFV could support the model.

A tunnel-focused infrastructure model would also deal with class-of-service differently.  Each tunnel could be traffic-and-route-engineered to provide a specific SLA, and “services” at the overlay level would be assigned a tunnel set to give them the QoS they needed.  You could implement any one of several options to link service changes and traffic to the infrastructure level, which means that you could implement vertically integrated dynamism.  That’s essential if you’re actually going to sell users elasticity in available connection capacity.  Best of all, you could do the matching on a per-route-pair basis if needed, which means you’re not paying for any-to-any capacity you don’t use in a logical hub-and-spoke configuration of applications and users.
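A hedged sketch of that service-to-tunnel binding, assuming a per-route-pair table of SLA-engineered tunnels (the site names, classes, and tunnel IDs are made up):

```python
# Tunnels engineered at the infrastructure level to a specific SLA, keyed per
# route pair so capacity is only provisioned where the service actually flows.
engineered_tunnels = {
    # (src_site, dst_site, qos_class) -> tunnel carrying that SLA
    ("site-a", "site-b", "gold"):   "t-ab-gold",
    ("site-a", "site-b", "bronze"): "t-ab-bronze",
    ("site-a", "site-c", "bronze"): "t-ac-bronze",
}

def tunnel_for_service(src_site: str, dst_site: str, qos_class: str) -> str:
    """Bind a service-layer flow to the tunnel whose SLA matches its class.

    Triggering new tunnel provisioning when nothing matches is where the
    vertically integrated dynamism would live; here we just report the gap.
    """
    tunnel = engineered_tunnels.get((src_site, dst_site, qos_class))
    if tunnel is None:
        raise LookupError(f"no {qos_class} tunnel engineered for {src_site}->{dst_site}")
    return tunnel

print(tunnel_for_service("site-a", "site-b", "gold"))   # t-ab-gold
# There is no gold tunnel from site-a to site-c: no paying for any-to-any capacity you don't use.
```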

All of these positives could be achieved quickly because of the nature of overlay technology—by definition you can do whatever you like at the service layer without impacting the real-network underlayment.  You could thus transition from IP/Ethernet to SDN, or from IP to Ethernet, or to anything you like from anywhere you already are.  The overlay structure creates unified services from discontinuous infrastructure policies (as long as you have some gateways to connect the different underlayments).

To me, an overlay model of services is the only logical way to move forward with network transformation at the infrastructure level, because it lets you stabilize the service model throughout.  We should be paying a lot more attention to this, for SDN, for NFV, and even for the cloud.