More on the Savings or Benefits of NFV

My recent blog on NFV performance has generated a long thread of comments (for which I thank everyone who’s participated), and from the thread I see a point emerging that’s really important to NFV.  The point is one I’ll call scope of benefits.

Operators build networks to sell services from.  If you presume that the network of the future is based in part on hosted resources that substitute for network components, then the evolution to that future network will occur by adding in those hosted components, either to fulfill new opportunities or as an alternate way of fulfilling current ones.  If I want to sell a security managed service I need the components thereof, and I could get those by selling a purpose-built box on premises, a generalized premises box/host with associated software, or a hosted software package “in the cloud” or in an NFV resource pool.  NFV, early on, was based on the presumption that hosting higher-level functions like security on a COTS platform versus a custom appliance would lower costs.

I’ve made the point, and I made it in that blog, that operators now tell me that they think that NFV overall could have no more than about a 24% impact on capex, which was in the same range as they expected they could obtain from vendors in the form of discounts (as one operator puts it, by “beating up Huawei on price”).  In the LinkedIn comments for the blog a number of others pointed out that there were examples where capex savings were much higher—two thirds or even more.  The question is whether this means the 24% number is wrong, and if not what it does mean.

Obviously, operators say what they say and it’s not helpful to assume they’re wrong about their own NFV drivers, but I can’t defend their position directly because I don’t know how they’ve arrived at it.  However, I did my own modeling on this and came up with almost exactly the same number (25% with a margin of plus-or-minus two percent for simple substitution, up to 35% with roughly the same range of uncertainty if you incorporated assumptions about multiplication of devices to support horizontal scaling). That number I understand, and so I can relate how those 66% savings examples fit in this picture.  The answer is that scope-of-benefits thing.

Suppose you have a food truck and sell up-scale sandwiches.  Somebody comes along and tells you they have an automatic mayo creator that can make mayo at a third the commercial costs.  Does that mean your costs fall by 66%?  No, only your mayo cost.  The point here is that operators are going to impact capex overall in proportion to how much of total capex a given strategy can impact.  Security appliances represent less than 5% of capex for even the most committed operator in my survey, and across the board their contribution to capex wasn’t even high enough to reach statistical significance.  So if I cut capex for these gadgets to zero, you’d not notice the difference.
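
To make the mayo arithmetic concrete, here’s the back-of-the-envelope version in a few lines of Python.  The under-5% capex share and the two-thirds savings come from the figures above; nothing else is assumed.

```python
# Back-of-the-envelope math for the "scope of benefits" point.  The two inputs
# come from the discussion above; the arithmetic is the whole argument.

security_share_of_capex = 0.05   # security appliances: under 5% of operator capex
savings_on_that_share = 0.66     # the "two-thirds" savings reported for those boxes

overall_capex_impact = security_share_of_capex * savings_on_that_share
print(f"Overall capex impact: {overall_capex_impact:.1%}")   # about 3.3%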

If you want to make a big difference in capex you have to impact big areas of capex, most of which are actually not even NFV targets.  Virtual access lines?  Virtual fiber transport?  I don’t think so, nor is virtual radio for mobile very likely.  Yes, we can virtualize some functions of access or transport or radio, but we need real bits, real RAN.  Where we find opportunities for real capex reduction at a systemic level is in L2/L3 infrastructure.  It’s the highest layer that we see a lot of, and the lowest that we can reasonably expect to virtualize.  Every access device is L2/L3, as well as most aggregation components, points-of-presence, EPC, and so forth.

I’m not advocating that we replace everything at L2 and L3 with virtual devices, though.  The problem with that is that capex by itself isn’t a valid measure of cost reduction anywhere at all.  We can only use total cost of ownership, and as I’ve said TCO is more and more opex.  The question that any strategy for capex substitution would have to address is whether opex could at the minimum be sustained at prior levels through the transition.  If not, some of the capex benefits would be lost to opex increases.  And since we have, at this moment, no hard information on how most NFV vendors propose to operationalize anything, we have great difficulty pinning down opex numbers.  That, my friends, is also something the operators tell me, and I know from talking to vendors that they’re telling most of the vendors that as well.

One of the key points about opex is the silo issue.  We are looking at NFV one VNF at a time, which is fine if we want to prove the technical validity of a VNF-for-PNF substitution.  However, the whole IP convergence thing was driven by the realization that you can’t have service-specific infrastructure.  We can’t have VNF-specific NFV for the same reason.  There has to be a pool of resources, a pool of functional elements, a pool of diverse VNFs from which we draw features.  If there isn’t, then every new service starts from scratch operationally, services share resources and tools and practices inefficiently, and we end up with NFV costing rather than saving money.

Service agility goes out the window with this situation too.  What good is it to say that NFV provides us the ability to launch future services quickly if we have VNFs that all require their own platforms and tools?  We need an architecture here, and if we want operators to spend money on that architecture we need to prove it as an architecture and not for one isolated VNF example.  There is no such thing as operations, or resource pools, in a vacuum.

Where we start is important too, but there is no pat answer.  We could pick a target like security and expect to sell it to Carrier Ethernet customers, for example.  But how many of them have security appliances already, things not written off?  Will they toss them just because security is offered as a service?  We could virtualize CPE like STBs, but at least some box is needed just to terminate the service, and the scale of replacing real CPE with a virtual element even in part would be daunting without convincing proof we could save money overall.  One operator told me their amortized annual capital cost of a home gateway was five bucks.  One service call would eat up twenty years of savings even if virtual CPE cost nothing at all.
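
To put the gateway example in the same back-of-the-envelope terms: the five-dollar figure is the operator’s, and the hundred-dollar service call is my assumption, chosen because it’s what the “twenty years” remark implies.

```python
# The home-gateway example in numbers.  The five-dollar figure is the operator's;
# the service-call cost is an assumption chosen to match the "twenty years" remark.

annual_capex_saved = 5.00     # amortized annual capital cost of a home gateway
service_call_cost = 100.00    # assumed cost of one truck roll (illustrative)

years_wiped_out = service_call_cost / annual_capex_saved
print(f"One service call erases {years_wiped_out:.0f} years of capex savings")  # 20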

I said this before, and I want to repeat it here.  I believe in NFV, and I believe that every operator can make a business case for it.  That’s not the same thing as saying that I believe every business case that’s been presented, and operators are telling me they don’t believe those presented business cases either, at least not enough to bet a big trial on them.  So my point isn’t to forget NFV, it’s to forget the hype and face the real issues—they can all be resolved.

Do Vendors Now Risk Running Out of NFV Opportunities?

It’s about time for somebody to ask the question “Is NFV going mainstream?” so I may as well ask it here, then try to answer it.  To be sure, NFV deployment is microscopic at this stage but NFV interest is high and growing.  It’s obviously possible that this is a classic hype wave (remember the CLECs?) but there are some interesting signs that perhaps it’s more substantive.  The question isn’t whether NFV is here, but whether its arrival is now so widely accepted that vendors are positioning for NFV as a given, for them and for the industry.

Case in point:  Alcatel-Lucent has hired former Oracle software/operations guru Bhaskar Gorti, who takes over leadership of IP Platforms, which is where NFV activity (CloudBand) reports.  This could be big as far as signaling Alcatel-Lucent’s NFV intentions because up to now CloudBand hasn’t really had much executive engagement.  Many in Alcatel-Lucent have considered it little more than a science project, in fact.  Gorti could put some executive muscle behind it in a political sense, and his operations background means he might also do something useful to link CloudBand to the operations efficiency mission of NFV, a mission that’s been a bit of an NFV weak point for Alcatel-Lucent up to now.

Conceptually, CloudBand has always been a contender in the NFV space.  Of all the network equipment vendors, Alcatel-Lucent is the only one that has really gotten decent reviews from operators in my surveys.  Obviously the company is credible as a supplier of NFV technology, but its big problem for many operators is that it’s seen as defending its networking turf more than working to advance NFV.  Which is why Gorti’s access to Alcatel-Lucent CEO Combes could be important.  He’s either there to make something of NFV or to put NFV to rest.

Why now?  I think the first reason is one I’ve blogged on before.  Operators say that 2015 is the decision year, NFV-wise.  Either NFV gets into some really strong field trials that can prove a business case, or it loses credibility as a path to solving operator revenue/cost-per-bit problems.  Alcatel-Lucent needs leadership either way, and arguably they need OSS/BSS strength either way.  Operators are well aware of the fact that network complexity at L2/L3 is boosting opex as a percentage of TCO, to the point where many tell me that by 2020 there would be little chance that any rational projection of potential capex reduction would matter.

A second reason, perhaps, is HP.  Every incumbent in a given market knows that the first response to a revolutionary concept is to take root and become a tree in the hopes that it will blow over.  However, at some point when it doesn’t you have to consider that losing market share to yourself through successor technology is better than losing it to somebody else.  HP has nothing to lose in network terms and it’s becoming increasingly aggressive in the NFV sector.  HP also links NFV to SDN and the cloud, and their story has a pretty strong operations bent to it.  If HP were to get real momentum with NFV they could be incumbent Alcatel-Lucent’s worst nightmare, a revolutionary that’s not investing their 401k in the networking industry.

Another interesting data point is that Light Reading is reporting that Cisco is going to say that software and the cloud are the keystones of its 2015 business strategy, due to be announced to an eager Wall Street and industry audience late this month.  Cisco poses a whole different dimension of threat for Alcatel-Lucent because they are both a server player (they displaced Oracle in the server market lineup recently) and obviously a network equipment giant.  Were Cisco to take a bold step in NFV, SDN, and the cloud they’d immediately raise the credibility of these issues and make waffling in positioning by Alcatel-Lucent (which arguably has been happening) very risky.

Then there’s the increasing sentiment that service provider capex has nowhere to go but lower, at least if current service/infrastructure relationships continue as is.  Every operator has been drawing the Tale of the Two Curves PowerPoint slides for a couple years.  These show trends in revenue per bit and cost per bit, and in all the curves there’s a crossover in 2017 or so.  If operators can’t pull up the revenue line, drive down the cost line, or both, then the only outcome is to reduce infrastructure spending to curtail their losses.  Well, what’s going to accomplish that lofty pair of goals if not NFV?

We have some “negative evidence” too.  SDx Central (even the name change here is a nod to NFV!) did a story on things not to look for in 2015—restoration of carrier capex was one.  Another was some sanity at Juniper.  Juniper is a company Alcatel-Lucent has to be concerned about in a different way, which is as the “there but for the grace” alternative for themselves.  Juniper doesn’t have servers.  Its prior executive team didn’t know the difference between SDN and NFV and some at least say that its CTO and vice-Chairman is conflicted on this whole software thing.  It’s changed CEOs twice in the last year and it’s getting downgraded by Street analysts because of capex trends.  Arguably Juniper had as good a set of assets to bring to the NFV party as anyone, but they stalled and dallied and never got up to dance.  The article, and many on the Street, think it’s too late.  How long might it be before it’s too late for Alcatel-Lucent?

It may already be too late in a traditional NFV sense, in which case the selection of Gorti may have been truly inspired.  We are now, in my view at least, at the point where if you don’t demonstrate a truly awesome grasp of the operationalization of not only NFV but SDN and legacy networking, you don’t get anything new to happen with carrier spending.  There’s not time to save them with new services, and those new services could never be profitable without a better operations platform than we can provide with current OSS/BSS technology.

We seem to be aligning NFV positioning with the only relevant market realities that could drive deployment.  That means that those late to the party may find it impossible to grasp one or more of the NFV benefits and make them their own.  Differentiation could drive later players to avoid opportunity in order to avoid competition.  That would be, as it always is, fatal.  So 2015 is more than the year when field trials have to work, it’s the year when vendor positioning has to work.

Golden Globes, Nielsen, Net Neutrality, OTT Video…and NGN

We’re not there yet, not in the age of online video where TV networks are dinosaurs and Netflix reigns supreme.  Even though (not surprisingly) Netflix’s CEO thinks that TV as we know it will be extinct in five years or so.  But we are clearly in a period of change, driven by a bunch of factors, and what we’re changing to, video-wise, will obviously do something, at least, to shape the future of SDN and NFV.

We can start our journey of assessment of video’s future with the Golden Globe awards, where the impressive showing by Transparent and House of Cards shows that OTT video players (Amazon and Netflix, respectively) are capable of producing content that means something.  So the question is how much it will end up meaning, and what exactly the meaning will be.

TV is ad-sponsored broadcast-linear distribution, and that still makes up the great majority of viewing.  Nielsen’s surveys say that people actually watch more hours of TV now; OTT video is filling in time when they don’t have access to traditional channelized video and giving people something to watch when “nothing’s on”.  However, TV is its own worst enemy for two reasons: the law of averages and the law of diminishing returns.

The law of averages in TV means that you have to appeal to an increasingly “average” audience in order to justify the programming to advertisers.  How many great shows have turned to crap?  It’s not an accident, it’s demographics.  You want a mass market and it has to be made up of the masses.  This means that a good segment of the viewing population is in a stage of successive disenchantment as one show after the other becomes banal pap for the masses.  These people are targets for VoD and OTT.

The law of diminishing returns says that when you add commercial time to shows to increase revenue you drive viewers away, which means you have to add even more time and drive away even more viewers.  ABC’s “Galavant” seemed to me to have about a third of its time dedicated to commercials, making a decent show hard to tolerate.  People disgusted with commercials can turn to VoD and OTT.

Both these points bring out another because both relate to advertising—commercials.  Ad sponsorship’s problem is that there’s only a finite and shrinking pie to slice and an ever-growing number of people slicing it.  For every Facebook gain there’s a Google or TV network loss.  One thing that Netflix and Amazon Prime have going for them is simple—retail.  You sell stuff and people pay you.  How much depends on your perceived value to them, and it is already clear that there’s a lot more money in selling service than in selling ads.

So more OTT video is in the cards, which means more traffic that ISPs don’t make money on.  In fact, many of those ISPs (all the cable companies and many telcos) are going to lose money if broadcast video dies off.  And to make matters worse, political pressure is converting regulators to the view that settlement for carriage of video on the Internet is non-neutral because voters don’t like it.  Never mind whether the ISPs can be profitable enough to build and sustain infrastructure.

The most likely result of this combination is increased investment in off-Internet services.  In the US, we already have an exemption from regulation for CDNs and IPTV.  In Germany, there’s a suggestion that there be two “Internets”, one neutral enough to satisfy advocates and the other where commercial reality guides settlement.  I leave it to you to guess which would survive, and Cisco is already opposing the idea because it violates the Cisco principle that operators have a divine mandate to buy (Cisco) routers to carry all presented traffic, profits be hung.

Profit-driven IP, as opposed to the Internet, would be a totally metro affair, with access connected to cloud-hosted services.  Given the strong cloud affinity, it’s very likely that we’d build this Profit Internet with SDN and NFV (another reason for Cisco not to like the idea!) and virtual technology in general.  We’d use some electrical grooming to separate it from the Internet, and gradually “OTT video” would move to it, complete with settlement for carriage and QoS and all the things regulators say can’t be part of the Internet scene.

Why SDN and NFV?  First, you have to keep the regular Internet and the Profit Internet separate or regulators will balk.  Second, you have to constrain costs because you can’t overcharge content providers or you kill off the content.  Third, full connectivity isn’t even useful in the Profit Internet because everyone is trying to get to a video cache or the cloud and not each other.  You don’t need most of routing here.

Advertising can still have a place here, particularly on the “Internet” where there’s no settlement or QoS.  Best efforts is fine for free stuff.  But if the two principal TV faults are insurmountable in the context of broadcast TV, as Netflix thinks, then broadcast itself is at risk.  You’d have “VoD commercials” like you have in on-demand TV services from cable or telco providers today.

NGN was going to consume SDN and NFV no matter what, and we were going to end up with some virtualized-L2/L3 model of NGN even if we don’t succeed with SDN and NFV.  Metro was always going to be where most money was invested too.  So it’s fair to say that shifts in OTT video won’t drive all these changes, but it could accelerate them.  In particular, a decision by ISPs to partition profitable services outside the Internet (or create two Internets as has been proposed in Germany) would funnel more money to NGN faster, which could not only accelerate SDN and NFV but put the focus more on service agility.

But remember, the cheapest way to get a new show in front of 60 million viewers is still linear RF.  The thing is, we don’t get an audience of 60 million viewers if we’ve trained people to demand just exactly what they want and not what’s the best thing actually on at a given point in time.  VoD and DVR are conditioning audiences to be consumers of what they want when they want it, a mission linear RF broadcasting will never be able to support.

I watched a lot of TV as a kid.  I still watch a lot of TV.  When I was a kid, I had two networks to choose from, and I picked the best of an often-bad lot.  Today I have fifty networks or more to choose from, but the “lot” is still getting bad.  The very success Nielsen points out is working against the TV industry because the more I view, the more shows move into the “I’ve seen that” category.  The ones I wanted to see are being averaged out into drivel.  That drives me to other viewing resources.

It’s not all beer and roses for the OTT video people, though.  At the root of the video food chain is the producer of the video, who has to buy rights, hire cast and crew, and produce the stuff.  Amazon and Netflix proved they can produce content, but not that they can produce enough.  The big question on the video revolution is where we can get non-revolting video, and we don’t have an answer to that yet.  OTT can still fall on its face, and if it does a lot of my childhood shows may become new again.

A New Policy-Managed Model for SDN (and NFV?)

One of the challenges that packet networks faced from the first is the question of “services”.  Unlike TDM which dedicates resources to services, packet networks multiplex traffic and thus rely more on what could be called “statistical guarantees” or “grade of service” than on specific SLAs.  Today, it’s fair to say that there are two different service management strategies in play, one that is based on the older and more explicit SLAs and one based on the packet norm of “statisticalism”.  The latter group has focused on policy management as a specific mechanism.

One reason this management debate has emerged recently is SDN (NFV has some impact on this too, which I’ll get to in a minute).  From the first it’s been clear that SDN promises to displace traditional switching and routing in its “purist” white-box-and-OpenFlow form.  It’s also been clear that software could define networks in a number of lower-touch ways, and that if you considered a network as my classic black box controlled from the outside, you’d be unable to tell how the actual networking was implemented—pure SDN or “adapted Ethernet/IP”.  Cisco’s whole SDN strategy has been based on providing API-level control without transitioning away from the usual switches and routers.

Policy management is a way to achieve that.  We’ve had that for years, in the form of the Policy Control Point and Policy Enforcement Point (PCP/PEP) combination.  The notion is that you establish service policies that the PCP then parses to direct network behavior to conform to service goals.  Cisco, not surprisingly, has jumped on policy management as a kind of intermediary between the best-efforts service native to packet networks and the explicit QoS and traffic management that SDN/OpenFlow promises.  Their OpFlex open policy protocol is a centerpiece in their approach.  It offers what Cisco likes in SDN, which is “declarative control”.  That means that you tell the network what you want to happen and the network takes care of it.

How does this really fit in the modern world?  Well, it depends.

First, policy management isn’t necessarily a long way from purist SDN/OpenFlow.  With OpenFlow you have a controller element that manages forwarding.  While it’s not common to express the service goals the controller recognizes as formal policies, you could certainly see a controller as a PCP or as a PEP depending on your perspective.  It “enforces” policies by translating service goals to forwarding instructions, but it also “controls” policies by pushing changes in forwarding down to the physical devices that shuffle the bits.  If applications talked to services via APIs that set goals, you could map those APIs to either approach.

The obvious difference in the purist model versus the policy model as we usually see it is that the purist model presumes we have explicit control of devices, whereas the policy model says that we have coercive control.  We can make the network bend to our will in any number of ways, including just exercising admission control to keep utilization at levels needed to sustain SLAs.  That’s our second point of difference, and it leads to something that could be significant.

With explicit control, we have a direct link between resources and services.  Even though the control process may not be aware of individual services, it is aware of individual resources because it has to direct forwarding by nodes over trunks.  With coercive control, we know we’ve asked for some behavior or another, but how our desired behavior was obtained is normally opaque.  That’s a virtue in that it creates a nice black-box abstraction that can simplify service fulfillment, but it’s a vice in a management sense because it isolates management from services.

In an ordinary policy-managed process you have network services and you have offered services, with a policy controller making a translation between the two.  Your actual “network management” manages network services so your management tendrils extend “horizontally” out of the network to operations processes.  Your offered services are consumers of network services, but it’s often not possible to know whether an offered service is working or not, or if a network service breaks whether that’s broken some or all of the offered services.

What’s missing between network and offered services is a mapping to specific topology that can relate one to the other.  One possible solution to the problem is to provide topology maps and use them not only to make decisions on management and create management visibility but also to facilitate control.  A recent initiative by Huawei (primary) and Juniper called SUPA (Shared Unified Policy Automation) is an interesting way of providing this topology coordination.

SUPA works by having three graphs (YANG models).  The lowest-level one models the actual network at the protocol level.  The highest one graphs the service abstractly as a connectivity relationship, and the middle one is a VPN/VLAN graph that relates network services to the physical topology.  The beauty of this is that you could envision something like SUPA mapping to legacy elements like VPN and VLAN but also to purist OpenFlow/SDN elements as well.  You could also, in theory, extend SUPA by creating a new middle-level model to augment the current ones, supporting new services that have forwarding and other behaviors very different from those we have in Ethernet and IP networks.

Obviously, management coordination between services and networks demands that somebody understand the association.  In SUPA, the high-level controller binds an offered service to a topology, and that binding exposes the management-level detail in that it exposes the graph of the elements.  If the underlying structure changes because something had to be rerouted, the change in the graph is propagated upward.  The graph then provides what’s needed to associate management state on the elements of the service with the service itself.
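
For readers who like to see structure rather than read about it, here’s a minimal sketch of the three-graph idea.  This is not SUPA’s actual YANG; the class and element names are mine, and the point is only to show how a binding from service down to topology lets a low-level change surface at the service level.

```python
# Illustrative three-level graph: service -> VPN/VLAN -> physical network.
# Not SUPA's actual YANG; names and structure are mine, just to show how a
# change at the bottom becomes visible at the service level.

class ModelNode:
    def __init__(self, name, below=None):
        self.name = name
        self.below = below            # binding to the next graph down
        self.above = None
        if below is not None:
            below.above = self

    def report_change(self, event):
        print(f"{self.name}: {event}")
        if self.above is not None:
            self.above.report_change(f"underlying element {self.name} changed")

# Lowest graph: the actual network at the protocol level
optical_path = ModelNode("optical-path-7")

# Middle graph: a VPN element bound to that physical topology
vpn_segment = ModelNode("vpn-segment-east", below=optical_path)

# Highest graph: the service as an abstract connectivity relationship
service = ModelNode("siteA<->siteB connectivity", below=vpn_segment)

# A reroute at the bottom propagates up the bindings to the service view
optical_path.report_change("rerouted after fiber cut")
```

The binding is what matters here: when the optical path is rerouted, the graph tells you which VPN segment and which service are affected, without exposing how the reroute was done.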

This is an interesting approach, and it’s somewhat related to my own proposed structure for an “SDN”.  Recall that I had a high-level service model, a low level topology model, and a place where the two combined.  It can also be related to an NFV management option because you could say that the middle-level graph was “provisioned” and formed the binding between service and resources that you need to have in order to relate network conditions to service conditions.

I’m not saying that this is the final answer; SUPA is still in its early stages.  It is a hopeful sign that we’re looking for a more abstract and thus more generalizable solution to the management problem of SDN.  I’d like to see less specificity on the middle-layer graphs—a network service should be any realizable relationship between endpoints and not just IP VPNs, VLANs, or tunnels.  I’d like to see the notion of hierarchy be explicit—a low-level element (an Application Based Policy Decision or ABPD) should be decomposable into its own “tree” of Network Service Agent and ABPD and each NSA should offer one or more abstract services.  I’d also like to see an explicit way of defining management of a service through a hierarchy of related graphs.

We’re not there yet, but I think all this could be done, and so I think SUPA is worth watching.

What Could the Net Neutrality Proposals do to SDN and NFV?

With all the ink net neutrality is getting, I feel like I just have to say something about it.  Regulatory policy, after all, is perhaps the largest single factor in setting business policies for network operators and so one of the largest factors in setting capex policies too.  Since I’ve blogged before on the broad question of what might and should happen, there’s no point revisiting those points.  Instead, let’s take this first Friday of 2015 to ask what the impact of the “current proposal” on neutrality (by FCC chairman Wheeler) might be on SDN and NFV.

The first impact area is the broadest.  The operators are of the view that classifying ISPs as common carriers, even if the FCC uses Section 706 “forbearance” to shield infrastructure from wholesale requirements and other provisions of the Telecom Act, is eventually going to end up with just that—sharing infrastructure.  There is little doubt that the immediate impact would be to focus operator planners on how to invest without exposing their assets to the sharing risk.  It’s probable that this will have a negative impact on capex in 2015 but little impact on SDN/NFV because the final state of public policy won’t be known until at the earliest the end of the year.

Title II constraints will apply only to Internet services, not to IP overall.  The second impact area is that operators will be looking at what might fall into the second category rather than the first, and at moving stuff between the two to shield investments.  Here we find a possible impact on SDN and NFV.

It’s difficult to say whether regulators would accept an argument that Service A and Service B were both delivered on IP packets across the same access infrastructure, mingled together and indistinguishable in a technical sense, but were virtually/logically separated.  That would suggest that IP-level separation of services could be a risk to operators, which might then encourage them to adopt some form of “deep grooming” of infrastructure.  We know that you can separate Internet flows from video or voice flows at a level below that of IP and claim service separation with some confidence.  SDN might be a mechanism for doing that.

Under the FCC’s prior rules, Internet access was subject to neutrality but content delivery, CDNs, and (implicitly) cloud computing were exempt.  I think it likely that Wheeler’s intentions would be to sustain this immunity because this stuff has no place being considered a common carrier service.  That would mean that operators would be induced to consider NFV enhancements to services part of a cloud offering rather than part of a carrier service offering.  Service chaining and Carrier Ethernet are a good example.  Title II fears would suggest that operators not augment Carrier Ethernet with hosted elements, but rather offer cloud-hosting of some high-level features and connect the Carrier Ethernet service through them.

This promotes what could be called a “common-carrier-service-is-access-only” view.  Your goal as an operator is to terminate the access line in a cloud element and offer managed services through a Title II access connection.  The Internet, Carrier Ethernet, and all business services could be subsumed into a cloud-service and hidden from Title II.  The shift in policy could present challenges for NFV for two reasons.  First, it could undermine the current activity by forcing it toward a separate subsidiary (required to avoid Title II).  That would be a delay, perhaps into 2016.  Second, it could complicate the way that NFV MANO works by requiring that all services other than best-efforts Internet (which isn’t an NFV target) have integrated regulated access and unregulated hosted elements, offered by different business units.  Every deployment might then be a “federation” issue and all management might have to cross business boundaries.  Neither of these is supported in current standards or work.

The third point is that once you decide on a Title II avoidance strategy, you end up with a strong reason to virtualize everything at Level 2 and 3.  Mixing regulated and unregulated services on a common router begs for an appeal to a US District Court by a hopeful CLEC-type reseller of services.  Why not have virtual switches and routers deployed separately by both regulated and unregulated entities?  Or why not have the unregulated entity deploy all the stuff besides the access?  This shift would make current operations groups into access players and shift more and more capex to an unregulated entity that might be a set of new decision-makers adhering to new infrastructure policies.

The regulated entity (or the about-to-be-regulated one) has also paid for all the legacy stuff.  How would operators reinvest in current infrastructure without sinking more money into the Title II pit?  The question of how current infrastructure is handled if the FCC (or Congress) applies Title II has to be a factor.  Logically the assets should be redistributed, but if everything is kept in the now-Title II entity then nobody is going to add to those assets.  Uncertainty about this is obviously going to curtail spending plans while things are resolved.

If all this comes about, then the majority of incremental capex would go inside the cloud, for data centers to host cloud computing and NFV, and to fiber trunks and agile optics to create interior lines of communication.  We’d see mobile services built increasingly through the use of personal agents that disintermediated access and devices from application functionality.  We’d see IoT applications evolving to look more like big data.

Interestingly, none of this would necessarily exclude the OTTs and others from the new cloud-driven market.  Their trap is more subtle.  For the decade of the ‘90s and even into the middle of the next decade (when the FCC finally wrote a CLEC order that made sense and set ISPs on the road to non-regulated operation) there were no barriers to actually investing in the next generation except the desire for a free lunch.  OTTs will face that risk again, because there will be an enormous temptation to wait to see if they can ride a Title II-created gravy train.  All that will do is to force investment even more inward, inside the cloud where regulators won’t go.

For SDN and NFV the net of all of this is short-term delay and longer-term success.  For the vendors in both spaces the decision to move to Title II and in particular to bar settlement and paid prioritization will hurt the network players and help the IT people.  It’s not that there will be a windfall of SDN/NFV spending (I’ve already noted the likelihood of a short-term deferral of major projects) but that IT players have little exposure to the market now and won’t be hurt by the hiccup that will certainly hurt the network giants.  Ironically, paid prioritization and settlement—which Internet supporters generally dislike—might be the thing that would save the current business model.  Clearly that isn’t going to happen now, so hold onto your hats and watch while the country gets another lesson in why regulations matter.

NFV’s “Performance Problem” Isn’t NFV’s Problem

PR often follows a very predictable cycle.  When you launch a new technology, the novelty (which is what “news” means) drives a wave of interest, hype, and exaggeration.  Whatever this new thing is, it becomes the singlehanded savior of western culture, perhaps life as we know it.  Eventually all the positive story lines run out, and you start to get the opposite.  No, it’s not going to save western culture, it’s actually a bastion of international communism or something.  You’re hot, or you’re not, and you can see a bit of that with NFV and performance concerns.  These concerns are valid, but not necessarily in the way we’re saying they are.

We could take a nice multi-core server with a multi-tasking OS and load it with all of the applications that a Fortune 500 company runs.  They’d run very badly.  We could take the same server, convert it into virtual machines, and then run the same mix.  It would run worse.  We could then turn it into a cloud server and get even worse performance.  The point here is that all forms of virtualization are means of dealing with under-utilization.  They don’t create CPU resources or I/O bandwidth, and in fact the process of subdividing resources takes resources, so adding layers of software to do that will reduce what’s available for the applications themselves.

The point here is that NFV can exploit virtualization to the extent that the virtual functions we’re assigning require single-tenant software components that don’t fully utilize a bare-metal server.  Where either of those two conditions (single tenancy or underutilization) isn’t true, NFV’s core concept of hosting on VMs (or in containers, or whatever) isn’t going to hold water.

An application component that serves a single user and consumes a whole server has to recover that server’s cost (capex and opex) in pricing the service it supports.  A multi-tenant application spreads its cost across all the tenant users/services, and so has less to be concerned about, efficiency-wise.  Thus, something like IMS which is inherently multi-tenant can’t be expected to gain a lot by sticking it onto a VM.  We’re not going to give every cellular customer their own IMS VM, after all, and it’s hard to see how an IMS application couldn’t consume a single server easily.

No matter how you overload a server, you’ll degrade its performance.  In many cases, the stuff we’re talking about as NFV applications won’t wash if we see transparent virtualization-based multi-tenancy as the justification.  They’re already multi-tenant, and we would expect to size their servers according to the traffic load when they run on conventional platforms.  The same is true with NFV; we can’t create a set of VMs whose applications collectively consume more resources than the host offers.

What we do have to be concerned about are cases where virtualization efficiency is inhibited not by the actual application resource requirements but by resources lost to the virtualization process itself.  Early on in my CloudNFV activity, I recruited 6WIND to deal with data-plane performance on virtualized applications, which their software handled very effectively.  But even data plane acceleration isn’t going to make every application suitable for virtual-machine hosting on NFV.  We are going to need some bare metal servers for applications that demand a lot of resources.

Our real problem here is that we’re not thinking.  Virtualization, cloud computing, even multi-tasking, are all ways of dealing with inefficient use of hardware.  We seem to believe that moving everything to the cloud would be justified by hardware efficiencies, and yet the majority of mission-critical applications run today are not inefficient in resource usage.  That’s true with the cloud and it will be true with NFV.  Virtualization is the cure for low utilization.

So what does this mean?  That NFV is nonsense?  No, what it means is that (as usual) we’re trapped in oversimplification of a value proposition.  We are moving to services that are made up as much or more (value-speaking) of hosted components as of transport/connection components.  You need to host “hosted components” on something and so you need to manage efficiency of resource usage.  Where we’re missing a point is that managing efficiency means dealing with all the levels of inefficiency from “none” to “a lot”.  In totally inefficient situations we’re going to want lightweight options like Docker that impose less overhead to manage large numbers of components per server.  In totally efficient application scenarios, we want bare metal.  And in between, we want all possible flexibility.
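
To make that concrete, here’s a toy placement rule in the spirit of the argument.  The thresholds are invented; the principle, that virtualization is the cure for low utilization and nothing else, is the one I’m arguing above.

```python
# A toy placement rule: match the hosting model to how much of a server a
# component actually needs.  Thresholds are made up; the principle
# (virtualization is the cure for low utilization) is the one argued here.

def choose_hosting(expected_utilization):
    """expected_utilization: fraction of a bare-metal server the component consumes."""
    if expected_utilization >= 0.8:
        return "bare metal"        # the overhead of virtualization buys you nothing
    if expected_utilization >= 0.2:
        return "virtual machine"   # classic consolidation territory
    return "container"             # pack lots of small components per server

for component, load in [("virtual router, heavy traffic", 0.9),
                        ("branch firewall VNF", 0.3),
                        ("per-customer NAT instance", 0.05)]:
    print(f"{component}: {choose_hosting(load)}")
```

The other constraint from earlier still applies: whatever mix you pick, the components on a host can’t collectively need more than the host offers.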

NFV’s value has to come not from simply shoehorning more apps onto a server (which it can’t do, it can only support what the underlying virtualization technology can support).  It has to come from managing the deployment of service components, including both connection and hosted applications or content, that make up the experiences people will pay for.  MANO should have been seen not as a mechanism to achieve hosting, but as the goal of the whole process.  We could justify MANO, done right, even if we didn’t gain anything from virtualization at all.

IMS and EPC are applications that, as I’ve said, gain little or nothing from the efficiency-management mechanisms of virtualization.  They could gain a lot from the elasticity benefits of componentization and horizontal scaling.  Virtual routing is easiest to apply on a dedicated server; it’s hard to see why we’d want to virtualize a router to share a server with another virtualized router unless the router was handling a relatively low traffic level.  But again, elastic router positioning and capacity could be valuable in the network of the future.

It’s unfair to suggest that NFV has resource issues; whatever issues it has were inherited from virtualization and the cloud, whose resource issues we’re not talking about.  Even data plane processing is not uniquely an NFV issue.  Any transactional application, any content server, has to worry about the data plane.  Even web servers do, which is why at some point you stop sharing the hosting of websites on a single server and go to a dedicated server for a site, or even several servers per site.  But it is fair to say that the “resource problem” discussion that’s arising is a demonstration of the fact that the simplistic view of how NFV will work and save money has feet of clay.  We can justify NFV easily if we focus on how it actually can benefit us.  If we invent easy-to-explain benefits, we invent things that won’t stand up to close examination.

There are two important things here.  First, componentized applications and features are more agile in addressing both new market opportunities and problems with infrastructure.  Second, componentization creates an exponential increase in complexity that if left untreated will destroy any possible benefit case.  NFV is the operationalization of componentized service features.  Ultimately it will morph into the operationalization of the cloud, joining up with things like DevOps and application lifecycle management (ALM) to form a new application architecture.  And that architecture, like every one before it, will put software to hardware mindful of the fact that capacity is what it is, and you have to manage it thoughtfully whether you virtualize it or not.

Ten Truths for the Future of SDN/NFV

One of the ongoing themes in both SDN and NFV is that operators need these technologies to compete with the OTT players.  We also hear that operators need a cultural transformation to do that, or that they need to form subsidiaries or buy somebody.  We could almost claim to have a cottage industry here, people sitting around explaining what operators have to do to “compete with the OTT players”.  It’s not that simple.  Let’s lay out some basic truths.

Truth one is that most of the OTTs are in a less-than-zero-sum industry.  Ad spending globally is declining, and at some point what can be made available to earn OTT profits would have to come out of what’s available to produce content or do something else essential.  Operators are at least in a market where there’s a willingness to pay for services.  A good Tier One still earns almost as much revenue as the OTT industry does.

Truth two is that operators don’t need new technology to compete with OTTs in a direct sense.  Google and Netflix didn’t get to where they are through SDN or NFV.  “Over the top” is what it says it is—on the network and not in it.  Hosting content or providing ad-sponsored services doesn’t mean inventing new standards, particularly at the network level.

So what we have here, in the net, is that most of the presumptions of what operators should do are based on a couple false premises.  Is staying the course then the right approach?  No, and here are some further truths to prove it.

Truth three is that there is absolutely no way to sustain a business in transport/connection over the long term without public utility protection.  Bits have been getting cheaper and cheaper in a marginal cost sense, and while that trend is certain to slow, it’s not going to slow enough to avoid constricting capital budgets.  The network pricing model of today is eating its own future, and something has to turn that around.  Revenues have to go up, costs have to go down, or both.

Which brings us to truth four: the network must always be profitable if operators are going to continue to invest in it.  That means that even if services “over the top” of the network come along to provide relief, they can’t just subsidize an unprofitable network foundation.  The operators, in that situation, would be competing against players who didn’t have to do that subsidizing and they’d never be able to match pricing.

Add these points up, and we can at least see what SDN and NFV have to do for us.  It is critical that the cost of transport/connection networking be managed better than we’re managing it now.  A couple decades ago, network TCO was a quarter opex and three-quarters capex; now that’s heading toward being reversed.  So more truths.

Truth five is that neither SDN nor NFV is explicitly aimed at solving the problem of operations cost bloat.  We have not proved at this point that SDN has a significant impact on opex because we’ve not proved we know how to apply it across the full scope of current services.  The NFV people have effectively declared opex out of scope by limiting their focus to the hosting of service elements that are above transport/connection.

As to where, then, opex could be managed, we come to truth number six: we can’t make revolutionary changes to opex without revolutionizing something at the technology level.  Our whole notion of network operations, from EMSs up to OSS/BSS, has to be framed in a modern context because we’ve seen opex bloat as a result of a growing disconnect between technology and business realities within the network and the support systems we’ve used.

Truth seven is that no transformation of OSS/BSS is going to work here, nor is any transformation in NMS or EMS or SMS or any other xMS.  What we need to do is to define a management model that’s attuned to the evolution of both services and infrastructure.  That model has to arise almost completely unfettered by concerns about how we’re going to get there.  We have to embrace utopia, and then worry about achieving it as efficiently as possible.  Otherwise we’ll take a series of expensive side-trips and end up in a couple years having spent more to achieve next to nothing.

How does this help the operators, though?  More truths.

Truth eight is that there is a natural base of service opportunity awaiting exploitation, and it’s almost totally owned by the operators.  They have the cell sites that support both mobile users and any realistic IoT models.  They have knowledge of the movement of users, individually and en masse.  They have, if they design their utopian operations mechanisms correctly, the best possible cost base and could produce the best cost/performance/availability combination.  I did a presentation on this issue years ago, and the theme was simple.  The network knows.  Operators could leverage what the network knows, through operations practices honed to do the near-impossible task of keeping transport/connection services profitable.  Do that and they cut their costs low enough that VCs will move on to something else and stop funding OTTs.

And truth nine is that all these future services will have to move toward fee-for-service and away from ad sponsorship to dodge that less-than-zero-sum-game problem.  Operators are already there, in a position where their customers expect to pay for something.  Would it be easier for operators to charge for services, or Google?  If operators can take the simple step of finding stuff people will pay for, they can win.

Which brings us to the final truth.  Services are an app.  Experiences are an app.  The cloud is an app.  Everything that we expect users to consume, however it’s paid for, is an app.  The notion of an app is the notion of a simple storefront hiding a massive factory and warehouse process behind.  It’s critical that we frame our future in app terms because people consume them.  It’s critical that we make the app substantive by exploiting what the network knows, that we link it to point-of-activity services like social and location-aware contextual responses to questions, because those are things that operators can do but that OTTs can do too.  Operators have to gain some leadership there.  But all this has to be done while making the network profitable.

SDN and NFV won’t change the world unless we change what we expect them to support.  It’s not what they do, but what they enable that matters, and we have to get more direct links between SDN and NFV promises and the whole set of truths I’ve identified here if we want to move either or both technologies forward.

And in doing that, it’s not technology that’s the key, it’s operations.  The future will be more complicated than the present.  That’s always meant more expensive, and that cannot happen or we’ll never get to that future at all.

How to Solve Two Problems–With One Model

There are, as you’re all aware at this point, a lot of open questions in the SDN and NFV world.  Recently I covered one of them, the operations impact of both technologies.  While that’s undoubtedly the most significant open issue, it’s not the only one.  Another is the broad question of how networks and network operators are bound together.  Protocols have been layered for longer than most network engineers have been alive, and operators have interconnected their networks for even longer.  We have to be able to understand how protocol layers and internetworking work in any successor network architecture because we’re not going to eliminate either one.

Layered protocols presume that a given layer (Level 2 or 3 in a practical sense) consumes the services of the layer below, and those services abstract the protocol stack at all lower layers.  If we look at how this process works today, we find an important point.  A protocol layer either does what its service says it will do, or it reports a failure.  Implicitly that means that a protocol layer will attempt to recover from a problem in its own domain and will report a failure when it cannot.

In the good old days of data link protocols like HDLC, this meant that if a data link problem occurred the link protocol would ask for a retransmission and keep that up until it either got a successful packet transfer or hit a limit on the number of attempts.  For SDN at any given layer we’d have to assume that this practice was followed, meaning that it’s the responsibility of the lower layers to do their own thing and report a failure upward only when it can’t be fixed.  That’s logical because we typically aggregate traffic as we go downward, and we’d not want to have a bunch of high-level recovery processes trying to fix a problem that really exists below.

This could conflict with some highly integrated models of SDN control, though.  If an SDN controller is managing a whole stack of layers, then the question is whether it recognizes the natural layer relationships that we’ve always built protocols on.  Let’s look at an example.  Suppose we have a three-layer SDN stack, optical, L2, and L3.  We have an optical failure, which obviously faults all of the L2 paths over it, and all the L3 paths over those L2s.  What should happen is that the optical layer recovers if possible, so if there’s a spare lambda or whatever that can be pressed into service, we’d like to see the traffic routed over it at the optical level.  Since the low-level path that the higher layers expect has been fixed there are no faults above (assuming reasonable responsiveness).

But will the controller, which has built all the forwarding rules at all the levels, sustain all the presumptions of the old OSI stack?  If not, then it’s possible that the optical fault would be actioned by other layers, even multiple layers.  That’s not necessarily a crisis, but it’s harder to come up with failure modes for networks if you presume there’s no formal layered structure.  Where SDN controllers reach across layers, we should require that the layer-to-layer relationships be modeled as before.  Otherwise we have to rethink a lot of stuff in fault handling that we’ve taken for granted since the mid-70s.
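
Here’s a small sketch of that recovery discipline, purely illustrative and not tied to any controller API: each layer tries to fix faults in its own domain and reports upward only when it can’t.

```python
# Illustrative fault handling across a three-layer stack (optical, L2, L3).
# Each layer attempts local recovery and escalates only on failure.

class Layer:
    def __init__(self, name, spare_capacity, upper=None):
        self.name = name
        self.spare_capacity = spare_capacity   # e.g., a spare lambda at the optical layer
        self.upper = upper                     # the layer that consumes this one's service

    def handle_fault(self, fault):
        if self.spare_capacity > 0:
            self.spare_capacity -= 1
            print(f"{self.name}: recovered from '{fault}' locally; upper layers see nothing")
        elif self.upper:
            print(f"{self.name}: cannot recover '{fault}', reporting failure upward")
            self.upper.handle_fault(f"loss of {self.name} service")
        else:
            print(f"{self.name}: unrecoverable fault '{fault}', service is down")

l3 = Layer("L3", spare_capacity=1)
l2 = Layer("L2", spare_capacity=0, upper=l3)
optical = Layer("optical", spare_capacity=1, upper=l2)

optical.handle_fault("fiber cut")         # fixed at the optical layer, nothing above notices
optical.handle_fault("second fiber cut")  # optical spare exhausted, L2 has none, L3 reroutes
```

The second fault escalates twice before L3 finally reroutes, which is exactly the layered behavior a cross-layer controller would have to preserve.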

If layers are the big issue for SDN, then the big issue for NFV is those relationships between providers.  Telecom evolved within national boundaries, and even today there are no operators who could say that they could deliver their own connections over their own infrastructure anywhere in the world.  We routinely interconnect among operators to deliver networking on a large geographic scale, and when we add in cloud computing and NFV feature hosting, we add the complication of perhaps sharing resources beyond transport/connection.

So suppose we have a simple service, a VPN that includes five sites in the US and five in Europe.  Even presuming that every site on each continent can be connected by a single operator, we have a minimum of two operators that have to be interconnected.  We also have to ask the question whether the “VPN” part of the service is provided by one operator with sites from the other backhauled to the VPN, or whether we have two interconnected VPNs.  All of this would have to be accommodated in any automated service provisioning.

Now we add in a firewall and NAT element at all 10 sites.  Do we host these proximate to the access points, in which case we have two different NFV hosting frameworks?  Do we host them inside the “VPN operator” if we’ve decided there’s only one VPN and two access/backhaul providers?  Does a provider who offers the hosting also offer the VNFs for the two NFV-supported elements, or does one provider “sell” the VNFs into hosting resources provided by the other?  All of this complicates the question of deployment, and if this sort of cross-provider relationship is routine, then it’s hard to claim we understand NFV requirements if we don’t address it (which, at present, we do not).

But this isn’t the only issue.  What happens now if there’s a problem with the “VPN service?”  The operator who sold the service doesn’t own all of the assets used to fulfill it.  How does that operator respond to a problem when they can’t see those assets?  And would the other operators provide visibility into their networks or clouds so they could?

There is one common element here, which is that the service of a network has to be viewed as a black-box product exported to its user, and that product must include an SLA and a means of reporting against it.  The first point, in my view, argues against a lot of vertical integration of an SDN protocol stack, and the latter says that the user of a service manages the SLA.  The owner of the resources manages the resources, what’s inside the black box.

Making this work gets back to the notion of formal abstractions.  A “service” has to be composed by manipulating models that represent service/feature abstractions.  Each model has to define how it’s created (deployment) and also how it’s managed.  This approach is explicit in TOSCA, for example, which is why I like it as a starting point for management/orchestration, but you can do the same thing in virtually any descriptive model, including ordinary XML.  If we take this approach, then layers of protocol can be organized as successive (vertically aligned) black boxes and inter-provider connections represented simply as horizontal structures.  The “function” of access or the “function” of transport is independent of its realization at the consumer level, so we don’t have to care about who produces it.
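
Here’s a minimal sketch of what such a model could look like for the ten-site VPN example, expressed as a plain data structure rather than TOSCA or XML.  The field names are mine; the point is that every element is a black box with its own deployment and management description, composed vertically into layers and horizontally across providers.

```python
# A minimal sketch of the modeling idea: every element is a black box with a
# deployment description and a management (SLA-reporting) description.  Layers
# nest vertically; providers sit side by side horizontally.  Not TOSCA syntax;
# the field names and the ten-site example are illustrative.

service = {
    "name": "global-vpn",
    "manage": {"exposes": "retail SLA reporting to the customer"},
    "deploy": {"type": "composite"},
    "elements": [                       # horizontal: two interconnected providers
        {
            "name": "us-vpn",
            "provider": "operator-US",
            "deploy": {"type": "vpn", "sites": 5},
            "manage": {"exposes": "SLA state only, not internal resources"},
            "elements": [               # vertical: features layered on the VPN
                {"name": "us-access", "deploy": {"type": "access", "sites": 5},
                 "manage": {"exposes": "per-site availability"}, "elements": []},
                {"name": "us-firewall-nat",
                 "deploy": {"type": "vnf-chain", "vnfs": ["firewall", "nat"]},
                 "manage": {"exposes": "per-VNF health"}, "elements": []},
            ],
        },
        {
            "name": "eu-vpn",
            "provider": "operator-EU",
            "deploy": {"type": "vpn", "sites": 5},
            "manage": {"exposes": "SLA state only, not internal resources"},
            "elements": [],
        },
    ],
}

def sla_view(element, depth=0):
    """What the service buyer sees: each box's exported status, never its insides."""
    print("  " * depth + f"{element['name']}: {element['manage']['exposes']}")
    for child in element["elements"]:
        sla_view(child, depth + 1)

sla_view(service)
```

The sla_view walk is the point: the buyer of “global-vpn” sees SLA state for every element without ever seeing the partner operator’s resources.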

I think we’ve missed this notion in both SDN and NFV because we’ve started at the bottom, and that sort of thing encourages too much vertical integration because we’re trying to build upward from details to principles.  While the specific problems SDN and NFV face because of this issue differ, the solution would be the same—it’s all in the modeling.

Is There Substance in the “Fog?”

Cloud computing is probably the most important concept of our time, but also likely the most misunderstood.  It will never do what proponents say it will—displace private IT.  In fact, it’s not likely it will displace more than about a quarter of today’s IT spending.  However, it will generate new spending and in the end will end up defining how all of IT is used, and done.

The biggest problem we have with the cloud is that we think it’s an IT strategy and it’s in fact an application architecture.  Cloud computing is based on the fact that as you lower the price of network connectivity you eliminate the barriers to distribution of intelligence.  Componentization of software, already a reality for reasons of development efficiency, marries to this trend of distributability to create a highly elastic model of computing where information processing and information migrate around inside a fabric of connected resources.

What we’ve seen of cloud computing up to now has been primarily not cloud computing at all but the application of hosted IT to the problems of server consolidation.  Application development lagged the revolution at the resource end of the equation, and it still does.  That gap is disappearing quickly, though, as companies and vendors alike come to realize what can be done.  However, we’re stuck with the notion that the cloud is a replacement for the data center and that notion will take time to overcome.

Cisco might be seeing something here, though.  Most of Cisco’s marketing hype innovations, like the “Internet of Everything”, don’t do anything but generate media coverage—their intended purpose both on Cisco’s part and on the part of the media.  Cisco’s attempt to jump one step beyond the cloud (as the Internet of Everything was an attempt to jump beyond the Internet of Things), “fog computing”, may actually have a justification, a value.  If we assigned the notion of “the cloud” to the original hosted-consolidation mission, perhaps “the fog” as the name for where we’re headed might be helpful in making people realize that we’re not going to change the world by hosting stuff, but by re-architecting IT around a new model of the network.  But “fog” isn’t the most helpful descriptor we could have, obviously.  We’re light on the details that might tell us whether Cisco’s “fog” was describing or obscuring something, so we’ll have to look deeper.

The drive to the future of the cloud is really linked to two trends, the first being the continued (and inevitable) reduction in cost per bit and the second being mobility.  As I noted earlier in this piece, lower transport costs reduce the penalty for distribution of processes.  In a net-neutral world, it’s possible to protect low-cost transport resources inside a ring of data centers because these interior elements aren’t part of the network.  Thus, we have a situation that encourages operators to think of “interior lines of communication” because they don’t always cannibalize their current service revenues.  And mobile users?  They are changing the industry’s goal from “knowledge networking” to “answer networking”.

I've noted in prior blogs that it's easiest to visualize the mobile future as being made up of users who are, in a virtual sense, moving through a series of context fabrics based on factors like their location, social framework, mission, and so forth.  Each fabric represents the information available about the specific thing it covers, but that information isn't exposed the way the classic notion of IoT would expose sensors to be read directly; the fabrics are analytic frameworks instead.  You could visualize a person walking down the street and, as they move, transiting an LBS (location-based services) fabric, their own social fabric, a shopping fabric, a dining fabric, even a work fabric.

The information from these fabrics is assimilated not by the user’s smartphone but by an agent process that represents the user, hosted in the cloud.  This process will dip into the available fabric processes for information as needed, and these processes will be a part of the “mobile service” the user sees.  Some of them will be provided as features by the user’s mobile carrier and others by third parties.
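
Here's a minimal Python sketch of that agent-and-fabric relationship; all of the names and interfaces are hypothetical, just to show the shape of the idea:

```python
class Fabric:
    """A context fabric: an analytic front-end, not a pile of raw sensors."""
    def __init__(self, name, query_fn):
        self.name = name
        self._query = query_fn
    def ask(self, question: dict):
        return self._query(question)

class UserAgent:
    """Cloud-hosted process that represents one user and assembles answers
    from whatever fabrics are relevant at the moment."""
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.fabrics = {}       # name -> Fabric, attached as the user "moves"
    def attach(self, fabric: Fabric):
        self.fabrics[fabric.name] = fabric
    def detach(self, name: str):
        self.fabrics.pop(name, None)
    def answer(self, question: dict) -> dict:
        # Dip into every currently attached fabric and fuse the results.
        return {name: fabric.ask(question) for name, fabric in self.fabrics.items()}

# Hypothetical usage: a user walking down the street.
# agent = UserAgent("user-123")
# agent.attach(Fabric("location", lambda q: {"nearby": ["cafe", "bookstore"]}))
# agent.attach(Fabric("social",   lambda q: {"friends_nearby": 2}))
# print(agent.answer({"intent": "lunch"}))
```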

The cellular network and the mobile service network are now separated into two layers.  One is the user-to-agent connection, which would look pretty much like mobile services look today, except that the primary traffic anchor point is not the public network gateway but the on-ramp to the cloud where agent processes are hosted.  The second layer is the set of inter-process links that allow agent processes to contact fabric processes for services.

Many of these fabric processes will be multi-tenant server-based applications that look a lot like the Internet elements of today, and many will be dedicated cloud components that are customized to the user.  Some of these fabric processes, like video delivery, will be able to utilize the connection to the user directly—they are brokered by the user’s agent but the agent doesn’t stand in the data path—while others will deliver information to the agent for analysis, digestion, and delivery to the user.  We could call the two classes of fabric processes Internet processes much like those of today, and Undernet processes that are structured more like back-end applications.
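
A hedged sketch of that distinction, again in Python with invented names: the Internet-style process hands the user a direct delivery handle once the agent has brokered it, while the Undernet-style process returns results to the agent for digestion first.

```python
class InternetProcess:
    """Multi-tenant, much like today's web services: the agent brokers the
    session, but the data path then runs directly to the user (video
    delivery, for example)."""
    def broker_session(self, user_endpoint: str) -> str:
        # Hypothetical handle the user's device consumes directly; the agent
        # steps out of the data path once this is returned.
        return f"stream://cdn.example/{user_endpoint}"

class UndernetProcess:
    """Back-end style: results flow to the agent for analysis and digestion
    before anything reaches the user."""
    def deliver_to_agent(self, request: dict) -> dict:
        # Raw material for the agent, not a user-facing answer.
        return {"records": [], "note": "to be digested by the agent"}
```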

Things like the IoT are important in networking not because they'll somehow multiply the number of devices on the Internet.  We all know, if we think about it, that IoT devices won't be directly on the Internet; the process would be insecure, and the notion of a bunch of anonymous sensors being queried for information is a processing nightmare.  We'll have an IoT fabric process, probably several from different sources.  What's important is that the IoT could be a driver of the Undernet model, which would create a form of intra-cloud connection whose character is different from that of traditional networking.  We don't have users in the traditional sense; we have members that are application pools running in data centers.  It's likely that these data centers will be fiber-linked, and that we'll then have a service layer created by virtualized/SDN technology on top.
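
To show what I mean by an IoT fabric process, here's a small Python sketch (purely illustrative; the interfaces are my own): sensors feed the fabric over an interior path, and consumers get analytic answers, never direct access to the devices.

```python
class IoTFabric:
    """An IoT fabric process: an analytic front-end, not a pile of sensors
    sitting openly on the Internet."""
    def __init__(self):
        self._readings = {}   # sensor_id -> latest reading; interior state only

    def ingest(self, sensor_id: str, value: float):
        # Sensors report inward over a closed, non-Internet path.
        self._readings[sensor_id] = value

    def query(self, question: str) -> dict:
        # Consumers ask analytic questions; they never enumerate or poll
        # individual sensors.  (Filtering by the question is omitted here.)
        values = list(self._readings.values())
        return {
            "question": question,
            "sensor_count": len(values),
            "average": sum(values) / len(values) if values else None,
        }
```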

Business productivity support, in the form of corporate cloud applications hosted in the public cloud or the data center, creates fabric processes in the Undernet.  Things like the SOA-versus-REST debate, even things like NFV and orchestration, become general questions about binding elements of intelligence to give a mobile user or worker what they need when they need it.  We lose the question of the at-home worker to the notion of the worker who's equally at work wherever they are.  Collaboration becomes a special case of social-fabric behavior, and marketing becomes a location-fabric and mission-fabric intersection.  Service features and application components become simply software atoms that can be mixed and matched as needed.

Security will be different here too.  The key to a secure framework is for the Undernet to be accessible only to agent processes that are validated, and it's feasible to think about how to do that validating because we're not inheriting the Internet model of open connectivity.  You have to be authenticated, which means you have to have a proven identity and a proven business connection to the framework so your transactions can be settled.
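
As a toy illustration of that kind of admission control (my own sketch in Python; a real framework would presumably rely on certificates and federation), an Undernet gateway might check both a proven identity and a registered settlement relationship before admitting an agent process:

```python
import hmac
import hashlib

class UndernetGateway:
    """Admits only validated agent processes: proven identity plus a
    registered business relationship, so transactions can be settled.
    (A simplified sketch, not a production design.)"""
    def __init__(self):
        self._keys = {}        # agent_id -> per-agent secret key
        self._accounts = {}    # agent_id -> settlement account

    def register(self, agent_id: str, key: bytes, settlement_account: str):
        self._keys[agent_id] = key
        self._accounts[agent_id] = settlement_account

    def admit(self, agent_id: str, challenge: bytes, response: bytes) -> bool:
        key = self._keys.get(agent_id)
        if key is None:
            return False
        # Identity check: HMAC over a gateway-issued challenge.
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        identity_ok = hmac.compare_digest(expected, response)
        # Business check: no settlement account, no admission.
        return identity_ok and agent_id in self._accounts
```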

All of this is very different from what we have, so in one sense you can say that Cisco is right to give it a different name.  On the other hand, it's a major shift, major to the point where it is very possible that incumbency in the current network/Internet model won't be much help in the Undernet model of the future.  We'll still have access networks like we do now, and still have server farms, but we'll have a network of agents and not a network of devices.  So the question for Cisco, and for Cisco's rivals, is whether the giant understands where we're headed and is prepared to move decisively to get there first.  Are the PR events the first signs of Cisco staking out a position, or a smoke screen to hide the fact that they aren't prepared to admit to a change that might cost them their incumbency?  We'll probably get the answer to that late in 2015.