Can We Build Agile Infrastructure with the Overlay/Underlay Model?

Let us suppose for a moment that the goal of operators is to reduce equipment and operations cost in concert and at the same time increase their ability to provision current services quickly and flexibly, and develop new services just as quickly.  Let us further suppose that they have addressed the higher-level operations/portal implications of this.  What would the ideal network approach be?

Since it’s clear that operators do want exactly what’s presented in the last paragraph, this is a fair question.  Since the answer to the question will dictate infrastructure spending in the future, it’s an important one.  Interestingly, we have an answer for it, and it’s been around for a fair period of time.

If we go back to a point in my last blog, operators need to be able to make changes to costs and revenues without forcing a large-scale, fork-lift change-out of infrastructure.  There is simply no way to bear the risk of a large transformation, and at this point no time to prove out alternative infrastructure technologies to the degree needed to contain that risk.  We have to evolve with some grace into the future.

My conversation with the MEF’s CTO convinced me that their Third Network model has merit, provided that the model embraces something that is strongly hinted at but not featured: the concept of an overlay technology.  If the lower three layers of the OSI model (what the model says is actually in the network) are Levels 1 through 3, then let’s call this overlay layer Level I, or Li for short.

The basic notion for Li is that services would be defined and delivered at this new layer, which would then consume tunnels (“virtual wires”) created at the layers below.  Since services would now be using existing network technology only as a physical layer, you’d be able to change out any or all of that stuff at whatever pace you find optimal because lower-layer implementations are opaque to the higher layers.

Overlay connections are based on a header that’s prepended to data payloads before they’re encapsulated for handling by the tunnel protocol.  These headers subdivide the traffic at any tunnel-point, and each tunnel-point can then either extract the traffic with a given header and deliver it to a user access point, or “cross-connect” it to another tunnel.  How this is done determines the efficiency and value of the Li model.
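To make the extract-or-cross-connect decision concrete, here is a minimal sketch in Python.  Everything in it is an assumption made for illustration: the 8-byte header layout (a service ID plus a destination access-point ID), the `TunnelPoint` class, and the tunnel names are hypothetical and don’t come from any overlay product or standard.

```python
import struct

# Hypothetical overlay header: 32-bit service ID + 32-bit destination
# access-point ID, in network byte order. Not a real protocol format.
OVERLAY_HEADER = struct.Struct("!II")


def encapsulate(service_id: int, dest_ap: int, payload: bytes) -> bytes:
    """Put the overlay header in front of the payload.

    The result is what would then be handed to the tunnel protocol below.
    """
    return OVERLAY_HEADER.pack(service_id, dest_ap) + payload


class TunnelPoint:
    """A place where tunnels terminate and overlay headers are examined."""

    def __init__(self, name, local_aps, cross_connects):
        self.name = name
        # Access points attached directly to this tunnel-point.
        self.local_aps = set(local_aps)
        # (service_id, dest_ap) -> name of the onward tunnel.
        self.cross_connects = dict(cross_connects)

    def handle(self, frame: bytes):
        service_id, dest_ap = OVERLAY_HEADER.unpack_from(frame)
        if dest_ap in self.local_aps:
            # Extract: strip the overlay header and deliver locally.
            return ("deliver", dest_ap, frame[OVERLAY_HEADER.size:])
        onward = self.cross_connects.get((service_id, dest_ap))
        if onward is None:
            return ("drop", None, b"")
        # Cross-connect: pass the frame, header intact, to another tunnel.
        return ("cross-connect", onward, frame)


frame = encapsulate(service_id=7, dest_ap=42, payload=b"hello")
edge = TunnelPoint("edge-A", local_aps=[42], cross_connects={})
middle = TunnelPoint("agg-B", local_aps=[1],
                     cross_connects={(7, 42): "tunnel-to-edge-A"})
print(middle.handle(frame)[:2])
print(edge.handle(frame))
```

The point of the sketch is that the tunnel-point never looks past the overlay header, so whatever carries the tunnel underneath can change without the service logic noticing.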

In the original Nicira overlay-SDN model, a LAN or VLAN or VPN architecture created the tunnel paths, and these connected physical network/IT elements like servers.  The SDN overlay then subdivided access by tenant.  In theory, each server could either extract header-identified traffic for its local users or cross-connect it onward.  This is not unlike how lower OSI layers relate to higher layers; you can pull traffic from a LAN (Level 2) and connect it to another LAN through a WAN connection, via a router.

The current SD-WAN products have a slightly different approach but use the same overlay concept.  Here, a series of connections made at a lower level to the same access point are effectively united by a higher overlay that can ride on any of the low-level options.  This higher layer then presents the user interface.
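A rough sketch of that “ride on any of the low-level options” idea follows.  The selection policy here (pick the lowest-latency connection that is up) is purely an assumption for illustration, as are the `Underlay` type and the link names; real SD-WAN products apply their own, usually richer, policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Underlay:
    """One lower-level connection to the access point (hypothetical model)."""
    name: str
    up: bool
    latency_ms: float


def pick_underlay(links: List[Underlay]) -> Optional[Underlay]:
    """Return the usable underlay with the lowest latency, or None."""
    usable = [link for link in links if link.up]
    if not usable:
        return None
    return min(usable, key=lambda link: link.latency_ms)


links = [
    Underlay("mpls-vpn", up=True, latency_ms=20.0),
    Underlay("broadband", up=True, latency_ms=35.0),
    Underlay("lte", up=False, latency_ms=10.0),
]
print(pick_underlay(links).name)

# If the preferred connection fails, the overlay simply moves to another
# underlay; the interface presented to the user does not change.
links[0].up = False
print(pick_underlay(links).name)
```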

The general overlay model that might be viewed as the basis for MEF’s Third Network should be able to work with any of the following tunnel-models:

  1. The lower-level tunnels can connect all the way to the access points, creating a virtual mesh. The overlay technology would then provide only service-specific handling and addressing, and each tunnel access point would simply forward a packet on the right tunnel.  This would work for modest-scale virtual networks where a fully scalable forwarding technology (like SDN switches) was used.
  2. The lower-level tunnels connect to some number of aggregation points hosted within the network based on traffic topology. At these points, forwarding rules would cross-connect them.  This is the structural model that would optimize the use of hosted/virtual router instances.
  3. The lower-level tunnels, in addition to one of the above approaches, cross a protocol or administrative boundary where tunnel-to-tunnel connection is not available, and where tunnels from each side must therefore terminate. The Li layer now has to cross-connect the tunnels appropriately just to pass across the boundary.

The issue that can mess up a good overlay strategy could be called “tunnel granularity”.  If you have too little tunnel granularity, then you can’t create tunnels to the access points for an overlay-based service without a lot of tunnel cross-connecting.  Not only does this cross-connecting increase delay and the risk of packet loss, but because it’s concentrated where many users share too few lower-level tunnels, the load might well grow to the point where handling it with a hosted router instance would be difficult.  You’d like to get your lower-level tunnel mesh as close to serving all the access points as possible.  The MEF has been working to improve Ethernet’s ability to support connected-path multiplicity efficiently, and that’s good.
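Some back-of-the-envelope arithmetic shows why granularity is a trade-off and not a free choice.  The sketch below compares the first two tunnel models under simplifying assumptions of my own: tunnels are bidirectional, each access point homes to exactly one aggregation point, and the aggregation points are fully meshed with each other.  The function names and the sample sizes are hypothetical.

```python
def full_mesh_tunnels(n_access: int) -> int:
    """Tunnels needed to connect every pair of access points directly."""
    return n_access * (n_access - 1) // 2


def aggregated_tunnels(n_access: int, n_agg: int) -> int:
    """One tunnel per access point to its aggregation point, plus a full
    mesh among the aggregation points themselves."""
    return n_access + n_agg * (n_agg - 1) // 2


n_access, n_agg = 1000, 10
print(full_mesh_tunnels(n_access))          # 499500
print(aggregated_tunnels(n_access, n_agg))  # 1045
```

Under these assumptions the full mesh needs no cross-connecting at all but an enormous number of tunnels, while the aggregated design needs very few tunnels but forces traffic through one or two cross-connect points, which is exactly where the load concentrates.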

Here is where “universal SDN” might be very helpful.  If you think of an OpenFlow-driven concatenation of forwarding table entries as a kind of “naked tunnel”, you see that SDN could create any arbitrary tunnel configuration end to end if desired.  If you combine this with agile optics (ROADMs) then you’d have a highly functional physical layer over which you could overlay any convenient L2/L3 service protocol while largely ignoring issues like topology and even path failures (because they’d be handled or controlled below).
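The “naked tunnel” idea can be sketched as a plain Python model.  This is not OpenFlow itself and uses no controller API; the switch names, port numbers, and the match-on-ingress-port-only rule are all assumptions chosen to keep the example small.

```python
# Per-switch forwarding tables: switch -> {ingress port: egress port}.
# A real flow entry would match on much more than the ingress port.
flow_tables = {
    "s1": {1: 2},
    "s2": {1: 3},
    "s3": {2: 1},
}

# Physical links: (switch, egress port) -> (next switch, ingress port).
links = {
    ("s1", 2): ("s2", 1),
    ("s2", 3): ("s3", 2),
}


def trace(switch: str, in_port: int, max_hops: int = 16):
    """Follow the forwarding entries hop by hop from an ingress point.

    The list of (switch, in_port, out_port) hops that comes back is the
    "tunnel": nothing but a concatenation of table entries.
    """
    path = []
    for _ in range(max_hops):
        out_port = flow_tables.get(switch, {}).get(in_port)
        if out_port is None:
            raise LookupError(f"no entry on {switch} for port {in_port}")
        path.append((switch, in_port, out_port))
        next_hop = links.get((switch, out_port))
        if next_hop is None:
            return path  # egress port faces an edge device, not a switch
        switch, in_port = next_hop
    raise RuntimeError("hop limit exceeded; forwarding entries may loop")


print(trace("s1", 1))
```

Rerouting the tunnel means rewriting a few table entries, and nothing above this layer has to know it happened.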

The overlay approach would be easy to apply to mobile infrastructure because it’s already heavily based on tunnels (EPC).  It would also be easy to apply to business virtual network services and to cloud application services.  It’s not as clear that you could adopt an overlay model for the Internet.  That suggests you’d want to retain standard Internet routing, either in the core at least (augmented with SDN forwarding) or at least for services other than content delivery, since content delivery is already supported largely from CDNs.

There’s no shortage of potential vendors to support the model, starting with the classic overlay-SDN Nicira/VMware play and extending to SD-WAN vendors like Talari, Citrix/CloudBridge, Silver Peak, and Riverbed/SteelConnect.  In addition, most virtual routers (software router instances) can interconnect tunnels and so could be used to build an overlay-modeled service framework.  However, vendors have been shy so far in committing to the approach, preferring to sell to enterprises in more limited missions rather than to operators.  Even the SD-WAN vendors whose products could easily frame an overlay model (even within the Third Network approach) haven’t played that capability as a differentiator.

The likely reason for this is that selling SD-WAN to enterprises is working, and selling it as a mainstay for next-gen networking is a Great Unknown, particularly for vendors who don’t call on operator CTOs and don’t participate in emerging-network standards.  Despite the resistance, I think it’s clear that overlay networking could play a major role in next-gen infrastructure, perhaps the dominant one.  It may be that the evolution of the MEF’s Third Network will finally legitimize the approach and address the critical question of overlay/underlay relationships.