Does the Fact that Operators Know SDN and NFV are at Risk Mean They’ll Fix Them?

Are we finally realizing that NFV is going to fail without change?  I’ve been saying that for years, but an article in Light Reading last week suggested that operators at a conference were increasingly skeptical about both SDN and NFV.  Skepticism about the present is at least a credible driver of change in the future.  Not everyone agrees about operator attitudes, of course; one vendor rep who also attended the session said he’d taken away a completely different picture.  Beauty, SDN, and NFV all appear to be in the eye of the beholder.

Let’s start by admitting that catchy headlines are a fixture in the media, and that vendor denials of problems in a market they’re committed to are equally common.  The truth is that neither SDN nor NFV has met expectations, though both have had some success.  To my mind, that says neither technology can yet be considered a success, and that much of what’s been said about SDN (and NFV) has been hype.  Is NFV a “faux pas” as the article suggests?  The term means a “blundering error” or “gaffe”, and I think the first definition is close to the truth.  We had a chance with NFV to do something revolutionary, and we messed it up unnecessarily; that qualifies as a blundering error.

Would operator skepticism be enough to change things for either or both technologies?  To start with, not all operators are skeptical, because not all of them saw SDN or NFV as the source of revolutionary improvements in their profit per bit.  Most operators have been targeting NFV at virtual CPE (vCPE) for business services, and for anyone except an operator who does little other than carrier Ethernet, that mission probably wouldn’t impact costs all that much.  For SDN, operators have never been convinced it could replace routing, and most would say they’re interested in SDN mainly for data center switching.

It also depends on whether skepticism is channeled along positive lines.  I don’t mean that we should simply blow kisses at SDN and NFV because they have good intentions; people have told me I should be doing that, and I reject the notion that success can come from self-deception.  We should accept the failures, but study their causes only as a pathway to solving problems.  I’ve blogged about some of the basic issues and pathways to solutions for both SDN and NFV, so here I’m going to focus on whether attention to the “problems” of SDN and NFV could be enough to drive a solution.

SDN’s problem is lack of demonstrated scalability.  It works great in the data center.  It works great at the core of a transport network, married with optical technology and surrounded by BGP emulators (this is Google’s configuration).  Whether SDN could work at scale in private networks (VPNs) depends on the flavor of SDN we’re talking about.  Overlay SDN (Nokia/Nuage is the best example) can surely do that.  OpenFlow SDN needs controller federation to scale into the WAN.

NFV’s problem is also scalability, in a sense at least.  We know NFV can work for virtual CPE hosted in an agile premises box.  We don’t really know much about how it would work beyond that limited mission, and the mission of vCPE is way too narrow to justify the attention that’s been paid to NFV.  To get beyond that mission, NFV would have to address total service lifecycle orchestration, which it will never do.  We may address it outside NFV (as the NFV ISG intended from the first), but we’re not there yet.

The technical evolution of both SDN and NFV is blundering forward, but will it get to where it needs to be?  The biggest problem for both, in my view, is that neither really had a specific destination.  They were technologies that asserted properties, not benefits.  A benefit is something that can be quantified, meaning measured against cost to establish a return on investment.  We heard a lot about the properties of both SDN and NFV, but not much about benefits.  Yes, I’ve seen studies sponsored by vendors, but in my view none of them would stand up to close financial scrutiny.  We’d have to do better.

There are only two kinds of savings that a network technology could claim.  One is capex, or “capital expense”, meaning the cost of the equipment and software, the stuff that’s usually depreciated over a fixed cycle.  The other is opex, or “operations expense”, meaning the cost of operations, which is normally expensed in the current period.  Any technology exposes operators to both, so any technology really has to reduce the sum of the two.  Ideally, that would mean reducing both at the same time.
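Here is a minimal Python sketch of that netting.  All figures are hypothetical and expressed in cents per revenue dollar.  The `CostProfile` class and `net_savings` function are illustrative names, not any operator’s actual model.  The sketch shows how a 25% capex cut can still lose money if added operational complexity pushes opex up, while modest cuts to both produce a real net gain.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Annual network costs in cents per revenue dollar (hypothetical figures)."""
    capex: float  # equipment and software, depreciated over a fixed cycle
    opex: float   # operations cost, expensed in the current period

    def total(self) -> float:
        return self.capex + self.opex


def net_savings(before: CostProfile, after: CostProfile) -> float:
    """Positive if the change lowers combined capex + opex; negative if it raises it."""
    return before.total() - after.total()


baseline = CostProfile(capex=20.0, opex=30.0)

# Case 1: 25% cheaper boxes, but 20% more operations cost from added complexity.
capex_only = CostProfile(capex=20.0 * 0.75, opex=30.0 * 1.20)

# Case 2: the same 25% capex cut plus a 10% opex cut.
both_cut = CostProfile(capex=20.0 * 0.75, opex=30.0 * 0.90)

print(round(net_savings(baseline, capex_only), 2))  # -1.0
print(round(net_savings(baseline, both_cut), 2))    # 8.0
```

Only the sign of the net figure matters to the argument; the percentages are placeholders.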

The baseline against which both SDN and NFV have to be measured is the current network, which consists of multiple layers of technology.  SDN and NFV really address Layers 2 and 3 (and, in the case of NFV, some functions like firewalls that live above Layer 3).  We have to look at what the technologies might do to both capex and opex, focusing on L2/L3/L3+, or we can simply look at how SDN and NFV would fare against other developments.

To me, the big problem for both SDN and NFV today is the open P4-modeled devices, the things that AT&T/Linux Foundation DANOS and ONF’s Stratum define.  If you built commodity switch boxes using merchant silicon and combined them with open-source routing/switching software, you’d erase as much capital cost as either SDN or NFV would.  And you’d do it at no incremental technology risk.  SDN and NFV were both estimated to save about 25% on capex.  Way back in 2013, operators told me that a 25% reduction in capex wouldn’t justify the NFV risk; “We can get that beating Huawei up on price” was their quote.  Open boxes will kill proprietary appliances more surely than SDN or NFV could.  That leaves opex.

Neither SDN nor NFV really gets into operations automation, what’s today called “zero-touch automation” (ZTA) of the service lifecycle.  Neither can therefore claim to reduce opex much, and no matter how operators view SDN or NFV, they’re not going to push either technology into ZTA.  That’s the realm of what’s called “orchestration” today, and most operators think that the ONAP project offers the best hope for open-source ZTA.

Perhaps, but how much would that save?  Light Reading ran another article that bears on the solution to SDN’s and NFV’s problems.  Huawei, the 900-pound gorilla of networking, says that automation could eliminate 90% of network operations jobs.  Most of those who’ve read my blog over time know that I’ve been saying that opex reduction is a better target for modernization than capex reduction.  In fact, capex reduction will be swamped by increased complexity-related costs if something isn’t done with zero-touch automation.  Is Huawei promising the solution?

It depends on what a “network operations job” is.  This year, operators will spend about 30 cents of every revenue dollar on “process opex”, meaning the operations costs directly associated with the network.  If Huawei were going to cut those costs by 90%, operators would save 27 cents, which is more than their total capital budgets, and that would be a revolution.  But that isn’t what Huawei is claiming.  They’re talking about “network operations costs”, meaning the resources used to control and maintain the network.  That cost, this year, will run about 4.4 cents of every revenue dollar, so Huawei’s claim would mean a savings of about 4 cents.  Capex savings of 25% would reduce costs by more than that.
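The two readings of Huawei’s claim can be checked in a few lines of Python.  This sketch uses only the figures quoted above: 30 cents of process opex, 4.4 cents of network operations cost, a 90% cut, and the usual 25% capex estimate.  The constant and function names are illustrative.  The paragraph doesn’t state total capex, so the sketch instead computes the capex level above which a 25% capex cut beats the narrow reading.

```python
# Figures quoted in the text, in cents per revenue dollar (this year).
PROCESS_OPEX = 30.0   # all operations costs tied to the network ("process opex")
NETWORK_OPS = 4.4     # the narrower pool: controlling and maintaining the network
CLAIMED_CUT = 0.90    # Huawei's "90% of network operations jobs"
CAPEX_CUT = 0.25      # the usual SDN/NFV capex-savings estimate


def cut(pool_cents: float, fraction: float) -> float:
    """Cents saved per revenue dollar when `fraction` of a cost pool goes away."""
    return pool_cents * fraction


broad = cut(PROCESS_OPEX, CLAIMED_CUT)   # if the 90% applied to all process opex
narrow = cut(NETWORK_OPS, CLAIMED_CUT)   # if it applies only to network operations

# Smallest total capex at which a 25% capex cut saves more than the narrow reading.
breakeven_capex = narrow / CAPEX_CUT

print(f"broad reading:  {broad:.2f} cents")    # 27.00
print(f"narrow reading: {narrow:.2f} cents")   # 3.96
print(f"25% capex cut wins if capex > {breakeven_capex:.2f} cents")  # 15.84
```

Any capex budget above roughly 16 cents per revenue dollar makes a 25% capex cut the bigger saving.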

The point here is that opex is just as much “bulls**t” as capex unless you address the whole of process opex, which goes way beyond network operations costs alone.  OSS/BSS has to be redone for true ZTA.  You have to rethink how you position services competitively, how you attract and retain customers, and how you sell, market, and advertise.  True ZTA, according to my own model, could save about 8 cents from a 2020 total of 31 cents per revenue dollar in process opex.  Guess what that is?  It’s about 25%, but 8 cents is a third of total capex, and if you add a capex reduction of 25% to it, you end up saving about 14 cents of each revenue dollar, which is more than enough to make operators’ profit per bit numbers turn around for a decade or more.
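The arithmetic in that paragraph can be reproduced the same way.  The 31-cent process opex total and the 8-cent ZTA savings are the model figures quoted above.  Total capex isn’t stated directly, so this sketch assumes values of 22 and 24 cents, the range in which 8 cents is roughly a third of capex.  Names are illustrative.

```python
PROCESS_OPEX_2020 = 31.0  # cents per revenue dollar, modeled 2020 process opex
ZTA_SAVINGS = 8.0         # cents per revenue dollar saved by "true ZTA"
CAPEX_CUT = 0.25          # the usual capex-reduction estimate

# Share of process opex that true ZTA removes (the "about 25%").
zta_share = ZTA_SAVINGS / PROCESS_OPEX_2020


def combined_savings(total_capex: float) -> float:
    """ZTA opex savings plus a 25% capex cut, in cents per revenue dollar."""
    return ZTA_SAVINGS + CAPEX_CUT * total_capex


print(f"ZTA share of process opex: {zta_share:.1%}")  # 25.8%
# Assumed capex range: 8 cents is roughly a third of 22 to 24 cents.
for capex in (22.0, 24.0):
    print(f"capex {capex:.0f} cents -> combined savings {combined_savings(capex):.2f} cents")
```

Either way the combined figure lands at roughly 13.5 to 14 cents per revenue dollar.  That is more than twice what a 25% capex cut delivers on its own.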

The real lesson here is that 25%.  It seems like everything we look at can only impact costs, in its area of focus, by about 25%.  That means that to get to a reasonable number overall, we need a broad area of focus that impacts a lot of things.  The narrow focus that everyone thought we needed to move standards and specs forward quickly has produced a scope too narrow for benefits to grow to a meaningful level.  We’ve undershot the benefit scope we needed in the past, and merely recognizing past problems won’t ensure we don’t do it again.  Where the ZTA and open-box stuff differs from SDN and NFV is that it can be applied incrementally and doesn’t require a fork-lift upgrade.  It can survive and grow even when short-term mindsets prevent grabbing hold of the whole problem.
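Scope matters for a simple reason.  If every initiative trims roughly 25% from whatever it touches, its payoff per revenue dollar is just 25% of the cost pool in scope.  The sketch below applies that rate to pools drawn from the figures above (4.4 cents of network operations, 31 cents of process opex) plus an assumed 24 cents of capex.  The dictionary labels and the `scoped_savings` helper are illustrative.

```python
SAVINGS_RATE = 0.25  # "about 25%" within any single area of focus

# Cost pools in cents per revenue dollar. Capex (24.0) is an assumption;
# the other two figures are the ones quoted in the text.
scopes = {
    "network operations only": 4.4,
    "capex only": 24.0,
    "process opex only": 31.0,
    "capex + process opex": 24.0 + 31.0,
}


def scoped_savings(pool_cents: float, rate: float = SAVINGS_RATE) -> float:
    """Cents saved per revenue dollar when `rate` is cut from one cost pool."""
    return pool_cents * rate


for name, pool in scopes.items():
    print(f"{name:<25} {scoped_savings(pool):5.2f} cents")
```

The rate never changes; only the breadth of the pool does.  That is why the broadest scope is the only one that reaches a double-digit saving.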

I think what operators are starting to see is that we don’t need something different in the way of network technology, or a better way of justifying new technologies like SDN and NFV.  We could make an enormous change simply by adopting open-box technology and ZTA, and I think realization of that truth is dawning.  It isn’t dawning fast enough to reap the savings we could have achieved by 2020, though.  The lesson of the last half-decade is that not facing the truth immediately is very costly in the long run.