Tapping the Potential for Agile, Virtual, Network and Cloud Topologies

You always hear about service agility as an NFV goal these days.  Part of the reason is what might cynically be called “a flight from proof”; the other benefits touted for NFV have proven to be difficult to validate or to size.  Cynicism notwithstanding, there are valid reasons to think that agility at the service level could be a positive driver, and there are certainly plenty who claim it.  I wonder, though, if we’re not ignoring a totally different kind of agility in our discussions—topology agility.

For most vendors and operators, service agility means reducing time to revenue.  In the majority of cases the concept has been applied specifically to the provisioning delay normally associated with business services, and in a few cases to the service planning-to-deployment cycle.  The common denominator for these two agility examples is that they don’t necessarily have a lot to do with NFV.  You can achieve them with composable services and agile CPE.

If we step back to a classic vision of NFV, we’d see a cloud of resources deployed (as one operator told me) “everywhere we have real estate.”  This model may be fascinating for NFV advocates, but proving out that large a commitment to NFV is problematic when we don’t really even have many service-specific business cases.  Not to mention proof that one NFV approach could address them all.  But the interesting thing about the classic vision is that it would be clearly validated if we presumed that NFV could generate an agile network topology, a new service model.  Such a model could have significant utility, translating to sales potential, even at the network level.  It could also be a way of uniting NFV and cloud computing goals, perhaps differentiating carrier cloud services from other cloud providers.

Networks connect things, and so you could visualize a network service as being first and foremost a “connection network” that lets any service point on the service exchange information with any other (subject to security or connectivity rules inherent in the service).  The most straightforward way of obtaining full connectivity is to mesh the service points, but this mechanism (which generates n*(n-1)/2 paths) would quickly become impractical if physical trunks were required.  In fact, any “trunk” or mesh technology that charged per path would discourage this approach.  The classic solution has been nodes.
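To put rough numbers on how fast a full mesh grows, here is a small sketch that compares the n*(n-1)/2 path count with the path count when every service point homes to a single shared node. The function names are mine, and the single-node case is a deliberately simplified assumption for comparison, not a design recommendation.

```python
def full_mesh_paths(n: int) -> int:
    """Point-to-point paths needed to fully mesh n service points."""
    return n * (n - 1) // 2


def single_node_paths(n: int) -> int:
    """Access paths needed if every service point homes to one shared node
    (a simplifying assumption used only for comparison)."""
    return n


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(f"{n} service points: mesh={full_mesh_paths(n)}, "
              f"single node={single_node_paths(n)}")
    # 5 service points: mesh=10, single node=5
    # 50 service points: mesh=1225, single node=50
    # 500 service points: mesh=124750, single node=500
```

The mesh count grows with the square of the number of service points, which is why per-path pricing makes meshing unattractive long before the network gets large.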

A network node is an intermediate point of traffic concentration and distribution that accepts traffic from source paths and delivers it to destination paths.  For nodes to work the node has to understand where service points are relative to each other, and to the nodes, which means some topology-aware forwarding process.  In Ethernet it’s a bridging approach, with IP it’s routing, and with SDN it’s a central topology map maintained by an SDN controller.  Nodes let us build a network that’s not topologically a full mesh but still achieves full connectivity.
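As a rough illustration of what “topology-aware forwarding” means, the sketch below takes a topology map (the kind of thing a central controller might hold) and derives, for one node, the first hop toward every other point. The topology, the names, and the use of a simple breadth-first search are all my own assumptions; real bridging and routing protocols are far more involved.

```python
from collections import deque

# Hypothetical topology: four service points (A-D) and two nodes (N1, N2).
# Five links give full connectivity; a full mesh of A-D would need six paths.
TOPOLOGY = {
    "A": ["N1"], "B": ["N1"], "C": ["N2"], "D": ["N2"],
    "N1": ["A", "B", "N2"],
    "N2": ["C", "D", "N1"],
}


def next_hops(topology, source):
    """Breadth-first search from source.

    Returns {destination: first hop to use from source} for every
    reachable destination, i.e. a minimal forwarding table.
    """
    first_hop = {}
    visited = {source}
    queue = deque()
    for neighbor in topology[source]:
        if neighbor not in visited:
            visited.add(neighbor)
            first_hop[neighbor] = neighbor
            queue.append(neighbor)
    while queue:
        current = queue.popleft()
        for neighbor in topology[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                # Inherit the first hop used to reach the current point.
                first_hop[neighbor] = first_hop[current]
                queue.append(neighbor)
    return first_hop


if __name__ == "__main__":
    print(next_hops(TOPOLOGY, "N1"))
```

From N1, points A and B are reached directly while C and D are reached via N2, so every service point can reach every other without a dedicated path between each pair.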

Physical-network nodes are where trunks join, meaning that the node locations are linked to the places where traffic paths concentrate.  Virtual network nodes that are based on traditional L2/L3 protocols are built by real devices and thus live in these same trunk-collection locations.  The use of tunneling protocols, which essentially create an L1/L2 path over an L2/L3 network, can let us separate the logical topology of a network from the physical topology.  We’d now have two levels of “virtualization”.  First, the service looks like a full mesh.  Second, the virtual network that creates the service looks like a set of tunnels and tunnel-nodes.  It’s hard to see why you’d have tunnel nodes where there was no corresponding physical node, but there are plenty of reasons why you could have a second-level virtual network with virtual nodes at only a few select places.  This is what opens the door for topology agility.

Where should virtual nodes be placed?  It depends on a number of factors, including the actual service traffic pattern (who talks to whom and how much?) and the pricing mechanism applied.  Putting a virtual node in a specific place lets you concentrate traffic at that point and distribute from that point.  Users close to a virtual node have a shorter network distance to travel before they can be connected with a partner on that same node.  Virtual nodes can be used to aggregate traffic between regions to take advantage of transport pricing economies of scale.  In short, a well-placed virtual node can cut both path length and transport cost.

They can also be in the wrong place at any given moment.  Traffic patterns change over each day and through a week, month, or quarter.  Some networks might offer different prices for evening versus day use, which means price-optimizing virtual-node topologies might have to change by time of day.  Some traffic might even call for a different structure than other traffic; “TREE” or multicast services typically “prune” themselves to distribute efficiently, minimizing both duplicate packets and delivery to network areas where no users are receiving the multicast.
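A toy calculation shows how the “right” place for a virtual node can move with the traffic pattern. Everything here is hypothetical: the site names, the distance figures, the traffic volumes, and the cost model itself (volume times the distance from each endpoint to the node). A real placement decision would also have to factor in pricing and hosting capacity.

```python
# Hypothetical network distance from each service point to each candidate site.
DISTANCE = {
    "nyc": {"east": 1, "west": 10},
    "bos": {"east": 2, "west": 11},
    "sfo": {"east": 10, "west": 1},
    "lax": {"east": 11, "west": 2},
}
CANDIDATES = ["east", "west"]

# Hypothetical traffic matrices: (source, destination) -> volume.
DAY = {("nyc", "bos"): 100, ("nyc", "sfo"): 10}
EVENING = {("sfo", "lax"): 100, ("nyc", "bos"): 10}


def placement_cost(site, traffic, distance):
    """Traffic-weighted distance if all flows transit a node at `site`."""
    return sum(
        volume * (distance[src][site] + distance[dst][site])
        for (src, dst), volume in traffic.items()
    )


def best_site(candidates, traffic, distance):
    """Candidate site with the lowest placement cost."""
    return min(candidates, key=lambda s: placement_cost(s, traffic, distance))


if __name__ == "__main__":
    print(best_site(CANDIDATES, DAY, DISTANCE))      # east
    print(best_site(CANDIDATES, EVENING, DISTANCE))  # west
```

With these made-up numbers the daytime pattern favors the eastern site and the evening pattern favors the western one, which is exactly the situation where a fixed node location ends up in the wrong place for part of the day.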

NFV would let you combine tunnels and virtual nodes to create any arbitrary topology and to change topology at will.  It would enable companies to reconfigure VPNs to accommodate changes in application topology, like cloudbursting or failover.  It could facilitate the dynamic accommodation of cloud/application VPNs that have to be linked to corporate VPNs, particularly when the nature of the linkage required changes over time to reflect quarterly closings or just shifting time zones for users in their “peak period.”

This has applications for corporate VPNs but also for provider applications like content delivery.  Agile topology is also the best possible argument for virtualizing mobile infrastructure, though many of the current solutions don’t exploit it fully.  If you could place signaling elements and perhaps even gateways (PGW, SGW, and their relationships) where current traffic demanded, you could respond to unusual conditions like sporting or political events and even traffic jams.

These applications would work either with SDN-explicit forwarding tunnels or with overlay tunnels of the kind used in SD-WAN.  Many of the vendors’ SDN architectures that are based on overlay technology could also deliver this sort of capability; what’s needed is either a capability to deliver a tunnel as a virtual wire to a generic virtual switch or router, or a virtual router or switch capability included in the overlay SDN architecture.

Agile topology services do present some additional requirements that standards bodies and vendors would have to consider.  The most significant is the need to decide where you want to exercise your agility, and what should trigger changes.  Networks are designed to adapt to conditions, but roaming nodes and trunks aren’t the sort of thing early designers considered.  To exploit agility, you’d need to harness analytics to decide when something needed to be done, and then to reconfigure things to meet the new conditions.

Another requirement is the ability to control the way that topology changes are reflected in the network dynamically, to avoid losing packets during a change.  Today’s L2/L3 protocols will sometimes lose packets during reconfiguration, and agile topologies should at a minimum do no worse.  Coordinating the establishment of new paths before decommissioning old ones isn’t rocket science, but it is something that’s not typically part of network engineering.
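The ordering involved can be sketched in a few lines. The `TunnelTable` class and `make_before_break` function below are hypothetical stand-ins of my own, not any controller’s API: the new path is installed first, the flow is switched to it, and only then is the old path withdrawn, provided no other flow still uses it. A real implementation would also have to let in-flight packets drain before removing the old path.

```python
class TunnelTable:
    """Hypothetical record of provisioned paths and which flow uses which."""

    def __init__(self):
        self.installed = set()  # paths currently provisioned
        self.active = {}        # flow name -> path carrying that flow

    def install(self, path):
        self.installed.add(path)

    def activate(self, flow, path):
        if path not in self.installed:
            raise ValueError("cannot activate a path that is not installed")
        self.active[flow] = path

    def in_use(self, path):
        return path in self.active.values()

    def remove(self, path):
        if self.in_use(path):
            raise ValueError("path is still carrying traffic")
        self.installed.discard(path)


def make_before_break(table, flow, new_path):
    """Move `flow` to `new_path` without a gap in connectivity."""
    old_path = table.active.get(flow)
    table.install(new_path)          # make: provision the new path first
    table.activate(flow, new_path)   # switch: traffic now uses the new path
    if old_path is not None and old_path != new_path and not table.in_use(old_path):
        table.remove(old_path)       # break: withdraw the old path last


if __name__ == "__main__":
    table = TunnelTable()
    old = ("a", "n1", "b")
    table.install(old)
    table.activate("a->b", old)
    make_before_break(table, "a->b", ("a", "n2", "b"))
    print(table.active, table.installed)
```

At no point in the sequence is the flow mapped to a path that isn’t provisioned, which is the property a naive “tear down, then rebuild” approach lacks.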

Perhaps the biggest question raised by agile-topology services is whether the same thing will be needed in the cloud overall.  If the purpose of agile topology is to adapt configuration to demand changes, it’s certainly something that’s valuable for applications as well as for nodes, perhaps more so.  New applications like a logical model of IoT could drive a convergence of “cloud” with SDN and NFV, but even without that, it’s possible operators would see competitive advantages in adding agile-topology features to cloud services.

The reason agile topology could be a competitive advantage for operators is that public cloud providers are not typically looking at highly distributed resource pools, but at a few regional centers.  The value of agile topology is greatest where you have a lot of places to put stuff, obviously.  If operators were able to truly distribute their hosting resources, whether to support agile-node placement or just NFV, they might be able to offer a level of agility that other cloud providers could never hope to match.

The challenge for topology agility is those “highly distributed resource pools”.  Only mobile infrastructure can currently hope to create enough distributed resources to make agile topology truly useful, and so that’s the place to watch.  As I said, today’s virtual IMS/EPC applications are only touching the edges of the potential for virtualization of mobile topology, and it’s hard to know how long it will take for vendors to do better.