Approaching the SDN/NFV End-Game

OK, I admit to liking old songs and poetry, so you probably won’t be surprised if I quote a song title: “How deep is the ocean, how high is the sky?”  I don’t propose to blog on oceans or skies here, but depth and breadth are interesting questions to pose about SDN and NFV.  We might need to ask ourselves another seemingly trivial question about both technologies: “What does the SDN/NFV end-game look like, and how do we get there?”

We have about a trillion dollars in network assets out there, about a fifth of which is depreciated in a given year.  Operators’ capital budgets are running a bit below that depreciation level, so at the moment we’re gradually drawing down the “installed base” that creates so much inertia.  At the same time, that capital budget is being reinvested in the same technology, so the inertia isn’t decreasing significantly.  SDN and NFV have to overcome it.

If we just use the rough numbers I presented in the last paragraph, you can easily see the issue and perhaps also the path to change.  It’s unlikely that we’d achieve major SDN or NFV success if operators keep buying legacy gear.  That means SDN and NFV have to be presented in an evolutionary posture, as something you can migrate to gracefully.  At the same time, though, operators aren’t really interested in quiet evolution, at least not in the long term.  Technology changes present risks that can be justified only by significant upside.
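The inertia argument is really compounding arithmetic, and a few lines of Python make it concrete.  This is a minimal sketch, not a forecast: the $1 trillion base and the one-fifth annual depreciation come from the text, but treating depreciated gear as retired, the 0.9 capex-to-depreciation ratio (standing in for “running a bit lower”), and the share of capex steered to SDN/NFV are all assumptions, and `project_installed_base` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class InstalledBase:
    legacy: float  # $B of legacy switching/routing gear still in service
    new: float     # $B of SDN/NFV-based infrastructure

    @property
    def new_fraction(self) -> float:
        return self.new / (self.legacy + self.new)


def project_installed_base(years, sdn_nfv_share, base=1000.0,
                           dep_rate=0.20, capex_ratio=0.90):
    """Depreciate-and-reinvest model of the installed base (a sketch).

    base          -- starting base in $B (about $1 trillion, from the text)
    dep_rate      -- share of the base depreciated and retired per year
                     (about 1/5 from the text; retirement is an ASSUMPTION)
    capex_ratio   -- capex as a multiple of that year's depreciation
                     (ASSUMPTION: 0.9 models "running a bit lower")
    sdn_nfv_share -- share of each year's capex spent on SDN/NFV (hypothetical)
    """
    state = InstalledBase(legacy=base, new=0.0)
    history = [state]
    for _ in range(years):
        capex = capex_ratio * dep_rate * (state.legacy + state.new)
        state = InstalledBase(
            legacy=state.legacy * (1 - dep_rate) + capex * (1 - sdn_nfv_share),
            new=state.new * (1 - dep_rate) + capex * sdn_nfv_share,
        )
        history.append(state)
    return history


for share in (0.0, 0.1, 0.5):
    final = project_installed_base(years=5, sdn_nfv_share=share)[-1]
    print(f"share {share:.1f}: SDN/NFV is {final.new_fraction:.1%} of the base after 5 years")
```

Under these assumptions the total base shrinks by about 2% a year, yet the SDN/NFV fraction stays at zero for as long as all capex goes to legacy gear, and for any constant share it climbs toward that share from below without ever passing it.  Buying legacy gear keeps the installed base legacy.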

SDN has, at one level, accepted and addressed the need to balance evolution and revolution.  You can control many legacy switches using OpenFlow.  That lets users invest in SDN at the controller and management level and apply that investment to current network devices.  As those devices age, they can be replaced with white-box gear that uses SDN and only SDN.  At least that’s the theory.  In practice, deployment has so far been largely confined to the data center, where the impact on cost and revenue is limited.

For NFV, it’s been a harder row to hoe.  While you can argue capex savings for NFV on an incremental deployment, the fact is that NFV is more complicated than point-of-service devices would be for the same features.  That means operations efficiency has to improve at least enough to cover the incremental complexity.  And remember that most operators don’t believe in capex as a driving NFV benefit.  Service agility and operations efficiency, the benefits du jour, both appear difficult to attain unless you address a service from end to end and top to bottom.  How do you square that with the need to contain early cost and risk?

All migrations are justified by the destination.  A couple million African wildebeest don’t swing south toward the Mara River and face the crocs to starve in a different place.  We’d probably have an easier time postulating the migration strategy for SDN/NFV if we knew what a full deployment would be like.

What does an “SDN network” look like?  Obviously it can’t be an Ethernet or router network with some OpenFlow control added in; that doesn’t move the cost or benefit ball much.  You could in fact build what looked like IP or Ethernet services using SDN.  You could also build services that looked the same at the service demarcation but were created very differently.  Application- and service-specific subnetworks, combined with an enhanced virtual router at the customer edge, could frame services in a totally different way and revolutionize networking.  The first option presents limited migration risk and limited benefit; the second seems to go the other way.  Which model is best?

NFV poses a similar question.  From the first, the focus of NFV has been on deploying virtual functions that replace physical appliances operating above switching/routing.  So let’s assume we do that.  How much of the network capital budget is associated with that kind of technology?  A bit more than ten percent, according to operators.  Even if we can harness opex and agility benefits, how much of them can we realize if we touch only a tenth of the gear?  You can confine NFV to that simple mission, or you can try to address the opex and agility goals even if it means significantly extending what we mean by “NFV.”  Again, one way offers risk management and the other a much better benefit case.  What’s the best approach?

So what are the answers?  I think we can best start with what can’t be the answers.  For network operators, SDN and NFV cannot be a pure overlay strategy that rides on current switching/routing infrastructure without much change in that base.  How do you add something on top of the original model and, by supplementing it, make it cheaper?  We have to displace switching and routing on a large scale for there to be large-scale SDN and NFV success, and whether we like that or not (and router vendor employees who read this probably won’t), it’s still the truth.

A second truth is that we are not going to replace current network routers and switches with servers and virtual routers.  Many of those current products are simply too big.  Terabit routers have been a reality for a long time, but we don’t have much experience with terabit servers.  Virtual switches and routers clearly have to play a big role in the future, but not as 1:1 replacements.

We need to look from the blue-sky future into the deep here.  The most obvious of the network technology trends has been that of agile optics and the displacement of traditional core-router aggregation with agile optical cores.  We should expect that this trend will continue, and as it does it joins up with SDN and NFV to create that future vision.

Agile physical-layer technology lets you dumb down switching/routing because it takes error-recovery responsibility away from the higher layers.  Furthermore, if you can partition users and services economically at the agile optical layer, you could build business services using pipes and virtual routers/switches.  That opens the SDN opportunity to deploy a simple forwarding tunnel over optics, with no real L2/L3 involved, and to use that tunnel with virtual switching/routing on a per-user, per-application, and per-service basis.

NFV can play a role here by deploying those L2/L3 virtual elements.  Absent a connection mission like this, NFV is stuck in higher-layer functionality where it can’t easily change the cost or benefit structure of basic services.  But if we build service and application networks one-off using partitioned L1 technology, we need something to deploy the higher layers on top of them.  These missions, as I’ve pointed out, are less demanding of the devices hosting the virtual routers because each is limited in scope to a single user or service.  We’ll still need to aggregate traffic, but we won’t need nearly as much device-based switching/routing.

A lot of virtual routing could be needed.  In this model, every service edge for every business and every consumer would have a virtual router that provided the user with the specific tunnel/service-network access they needed.  For VPNs you’d have edge virtual routers plus floating internal ones placed to optimize traffic flows and resource usage.  It’s a different model of optimization.  Forget finding the best path among a nest of routers; you find the best path and nest routers to fit it.
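The “find the best path, then nest routers to fit it” idea can be sketched as a tiny placement routine.  Everything here is a hypothetical illustration under stated assumptions: the optical topology is a plain dict of link costs, path selection is ordinary Dijkstra from a single hub site (a simplification of a VPN’s any-to-any traffic), the rule that a floating virtual router goes wherever the chosen paths diverge is an assumed heuristic rather than anything defined by SDN or NFV specifications, and the function names are made up for the example.

```python
import heapq
from collections import defaultdict


def shortest_path_tree(graph, source):
    """Dijkstra over {node: {neighbor: cost}}; returns each node's predecessor."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already settled at a lower cost
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return prev


def place_virtual_routers(graph, sites, hub):
    """Hypothetical heuristic: choose best paths first, then place routers.

    Returns (edge_routers, floating_routers, tree_links).
    """
    prev = shortest_path_tree(graph, hub)
    tree_links = set()
    for site in sites:
        node = site
        while node != hub:
            parent = prev[node]  # raises KeyError if the site is unreachable
            tree_links.add(frozenset((node, parent)))
            node = parent
    # Count how many chosen links touch each node.
    degree = defaultdict(int)
    for link in tree_links:
        for node in link:
            degree[node] += 1
    edge_routers = set(sites) | {hub}  # one virtual router per service edge
    # A non-edge node with three or more chosen links is where paths diverge.
    floating = {n for n, deg in degree.items() if deg >= 3 and n not in edge_routers}
    return edge_routers, floating, tree_links


# Example optical topology with hypothetical link costs.
topology = {
    "A": {"X": 1},
    "X": {"A": 1, "Y": 1, "B": 5},
    "Y": {"X": 1, "B": 1, "C": 1},
    "B": {"Y": 1, "X": 5},
    "C": {"Y": 1},
}
edges, floating, links = place_virtual_routers(topology, ["B", "C"], hub="A")
print("edge routers:", sorted(edges))
print("floating routers:", sorted(floating))
```

In the example, the direct X-B link is never used because the route through Y is cheaper, so Y, where the paths to B and C split, is the one place a floating router lands.  Putting paths first and routers second is the design choice the text describes; a real planner would also have to weigh capacity and where hosting resources actually exist.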

The biggest problem we have with all of this isn’t carrier culture.  Vendor resistance to this approach could be even more problematic, because vendors have the most to lose from a radical change.  And underneath both the carrier and vendor resistance is human resistance.  We have generations of network mavens who have known nothing but IP or Ethernet.  They simply cannot grasp a different model.

Well, we have to make a choice.  The future of networking will be the same as the present if we insist on building future networks using current principles.  We can’t bring the sky and ocean together without making rain.