Why SDN and NFV Could Still Fail Utterly

There’s a popular view that as we move into the future, operators will build networks from commodity/commercial off-the-shelf (COTS) servers rather than specialized network equipment.  Network Functions Virtualization (NFV) is the poster child for the notion, and there has been a flood of announcements from vendors who have made hosted network functionality available under the banner of “NFV”.  One of my gripes is that most of this stuff isn’t NFV at all, just something that might be deployable under a conformant NFV implementation.  Another is the “COTS” angle.

Take a look at Amazon some time and you’ll see an immediate issue with COTS.  You can go out today and buy a tower server configuration for about eight hundred bucks.  A 48-port off-the-shelf gigabit Ethernet switch costs about half that, and obviously you’d need to add a lot of network cards to a server to make it into a 48-port switch.  This comparison raises two important points.

Point one is that most network devices today are specialized in some way that COTS servers aren’t.  The simple example of the number of Ethernet ports comes to mind, but it’s also important to remember that Intel developed DPDK to provide an accelerated data path to suit applications that were more data-plane-centric than usual.  You could probably improve the performance of servers in routing applications in particular if you added content-addressable memory to them.

The second point is that we already have cheap network devices; they’re just not from name-brand vendors with a bunch of management bells and whistles or support for arcane evolving protocols.  I’ve had some of my Ethernet switches for a decade or more at this point and nothing has broken, so you can’t say they aren’t reliable.  I have a hard time keeping a server for five years.

The lesson we should be learning from our little shopping excursion is that the functions we see most often in the network are probably not going to be replaced by COTS.  If you still have doubts, look at the next point: the “SDN revolution”.

SDN in its “purist” OpenFlow form says that we can take the software logic out of switch/router devices and centralize it, then use that central logic to drive changes in forwarding tables in white-box bit-shufflers.  Nobody is proposing COTS white boxes here, folks.  We admit that handling data flows is probably best left to specialized silicon.
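To make the division of labor concrete, here is a toy sketch (not a real OpenFlow stack; the field names, priorities, and port numbers are invented for illustration) of the split OpenFlow assumes: a central controller pushes match/action rules, and the white-box data plane just does table lookups.

```python
# Toy sketch of the OpenFlow-style split: control logic installs rules,
# the "switch" only matches packets against its flow table.
# All field names and values below are invented for illustration.

def install_rule(table, priority, match, out_port):
    """What a central controller would do: push a forwarding entry."""
    table.append((priority, match, out_port))
    table.sort(key=lambda rule: -rule[0])  # highest priority first

def forward(table, packet):
    """What the white-box data plane would do: first matching rule wins."""
    for priority, match, out_port in table:
        if all(packet.get(k) == v for k, v in match.items()):
            return out_port
    return None  # table miss; a real switch might punt to the controller

table = []
install_rule(table, 10, {"dst_mac": "aa:bb:cc:00:00:01"}, out_port=3)
install_rule(table, 1, {}, out_port=0)  # low-priority catch-all default

print(forward(table, {"dst_mac": "aa:bb:cc:00:00:01"}))  # -> 3
print(forward(table, {"dst_mac": "ff:ff:ff:ff:ff:ff"}))  # -> 0
```

The point of the sketch is that the data-plane half of this (the `forward` loop) is exactly the part everyone agrees belongs in specialized silicon, not on a COTS server.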

SDN also introduces another obvious question in evolution, which is whether total cost is really being addressed here.  Can we make SDN switches cheaper than switches from Cisco or Juniper?  Sure, but we’ve already proved we can make Ethernet switches cheaper than those vendors’ products, without exposing ourselves to this new centralized-network-and-OpenFlow thing.  And how exactly do we control these new SDN networks?  How much will operations cost?  How do they scale and interconnect?  Without firm answers to these questions we can’t say whether an SDN switch can replace an Ethernet switch at acceptable levels of reliability and performance, much less TCO.

Operations costs are the big question for both SDN and NFV because a virtualized solution to networking in either form requires a combination of hosted elements and agile interconnections.  In the good old days, a router was working when it was working.  A virtual router is working when all its components are running on VMs that are assigned to servers performing within spec, interconnected by pipes that are all delivering on the QoS metrics expected (some of which may be passing over “real” routers!).  We might, for a given implementation of a network appliance, need two or three different VMs and four or more pipes.  Generally, operations costs are proportional to complexity, and the MTBF of a chain of five to eight components is lower than the MTBF of one component, not even considering that few COTS servers can match network devices in the MTBF space.
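The MTBF arithmetic is worth spelling out.  Under the usual simplifying assumption of independent, exponentially distributed failures, the failure rates of series components add, so the chain’s MTBF is the reciprocal of the summed reciprocals.  The component figures below are invented purely for illustration:

```python
# Why chained components hurt reliability: assuming independent,
# exponentially distributed failures (a simplification), failure
# rates of series components add, so
#   MTBF_total = 1 / sum(1 / MTBF_i)

def series_mtbf(component_mtbfs):
    """MTBF of a chain that fails when any one component fails."""
    return 1.0 / sum(1.0 / m for m in component_mtbfs)

# One virtual appliance: three VMs plus four interconnecting "pipes".
# These MTBF figures are assumptions for illustration, not measurements.
vms = [50_000] * 3      # hours MTBF per hosted VM
pipes = [200_000] * 4   # hours MTBF per virtual connection

print(round(series_mtbf(vms + pipes)))  # -> 12500
```

Even with individually respectable components, the composite appliance in this sketch ends up with a quarter of the MTBF of its weakest single part, which is the operational cost the paragraph above is pointing at.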

Then there’s Facebook, whose Open Compute Project is dedicated to the notion that a cloud data center needs specialized technology, not COTS and not even traditional networking.  Two interesting points here: first, that “COTS” may not be the right approach even for server hosting of web applications, and second, that you don’t have to match the mass market to get economies of scale.  Facebook says that even their own server use could justify specialized design and manufacturing and still create lower TCO than commercial products do now.  Sure, they’d like an open initiative to make it better, but they’re not waiting for that.

I’m not saying that SDN or NFV aren’t going to work, but I am saying that the notion that networking is going to move entirely to either of these things is nonsense.  In fact, my view is that conventional Ethernet and IP network architectures probably can’t be replaced by hosted applications effectively.  What we need to be doing is looking beyond that goal.  If you want to do networking better or cheaper, you have to network differently, not replace network devices 1:1 with hosted equivalents.  I think that’s ultimately where SDN, NFV, Open Compute and all our other initiatives will end up, or they’ll die on the vine.

I also think that we have to operationalize differently.  Everyone knows that “provisioning” isn’t an orderly linear process anymore.  Yet we still think of OSS/BSS as applications, in an age where componentized software and workflows have made an application an almost extemporaneous combination of functional elements.  Even last week, we heard about operators deploying “applications” in operations.  It’s that kind of thinking that locks operations in the dark ages, and when we do that we lock TCO up along with it.  And we need to ask the question “If we modernize operations fully, how much cost improvement would we see even presuming legacy infrastructure and current levels of equipment competition?”

We have a thinking-big problem here, and we have a lot of thinking-small processes working to solve it.  Everything that works bottom-up is something that’s at least in part driven by the fear of changes on a large scale.  You can’t conceptualize a skyscraper while you’re mucking about in the pebbles of the foundation fill.  We’re accepting the goals of the past, the operations of the past, the architectures of the past, and saying we’ll then revolutionize the future.  I don’t think so.  It’s time to start imagining what could be and not replicating what we have using different tools.