Policy Management for SDN and NFV

One of the things about hot trends that my surveys tell me frustrates users is the tendency to talk about something but never really define it properly or explain how it would work.  We’ve all seen that with things like SDN and NFV.  It doesn’t have to be one of the super-revolution trends either; I’ve noticed that we’re hearing more about “policies” in networking, and yet it’s hard to nail down what people are really talking about.  That’s too bad because policies could be critical in both SDN and NFV.

To paraphrase a famous saying, “e pluribus, chaos” (“from many, chaos”, for those who don’t understand Latin and don’t want to bother running a translation program).  Many of the things we do in network deployment and management work fine when you’re not doing too many of them.  Many of the strategies for monitoring things like sensors are the same.  But if you’re confronted with millions and millions of things to deal with, they don’t work so well.

Perhaps the decisive problem with technologies like frame relay and ATM was the fact that “connections” that require network devices to know something about specific user sessions are simply not scalable to the level of the Internet.  With connectionless networks you have gadgets shuffling packets one at a time with no knowledge of the overall state of the session.  If you want to add some particular QoS to the mix, you define a class or grade of service and you tag traffic appropriately.  This creates a small number of “sub-networks” in a functional sense, and you can manage them.

The notion of “policy management” emerges from this approach because instead of saying that “connection x gets this QoS” you say that “packets meeting these criteria are handled thusly”, which means that when conditions impact the delivery of a specific grade of service you have a set of policies to define what you do.  In effect, policy management disconnects QoS from individual services and moves it toward infrastructure.  You don’t manage individual sessions or relationships, you manage the collective conditions that set QoS for the appropriately tagged items.

Policy management could obviously reduce the complexity of SDN management because it would focus management processes on sustaining grade-of-service behavior, presuming that if a packet is admitted to a grade of service (we could in theory block packets to enforce design constraints like total load) it will be handled correctly as long as all the gadgets associated with that grade of service are performing their tasks.  You could think of it as presumptive QoS.  Unless the network reports that something is wrong, I can presume it’s all going right.  This lets SDN design focus on establishing the traffic management rules for the various grades of service, and SDN management focus on making sure that everything that enforces those rules is working.  If it’s not, there are policies to say what to do, and the limited number of grades of service makes this scalable.
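To make the idea concrete, here’s a minimal sketch in Python of what a grade-of-service policy table might look like.  The grade names, thresholds, and actions are invented for illustration and aren’t tied to any real SDN controller; the point is only that admission and remediation attach to a handful of grades rather than to individual sessions.

```python
from dataclasses import dataclass

# Hypothetical grade-of-service policy table: a handful of grades, each with
# an admission threshold and a remediation action.  Purely illustrative.

@dataclass
class GradePolicy:
    name: str
    max_load_pct: float      # admission control: design load for the grade
    loss_target_pct: float   # what "performing correctly" means for the grade
    on_degraded: str         # the policy action when the grade degrades

POLICIES = {
    "gold":   GradePolicy("gold",   max_load_pct=60, loss_target_pct=0.1, on_degraded="reroute"),
    "silver": GradePolicy("silver", max_load_pct=75, loss_target_pct=0.5, on_degraded="reroute"),
    "bronze": GradePolicy("bronze", max_load_pct=90, loss_target_pct=2.0, on_degraded="shed"),
}

def admit(grade: str, current_load_pct: float) -> bool:
    """Admission is per grade, not per session: block new traffic if the
    grade is already at its design load (the 'design constraint' case)."""
    return current_load_pct < POLICIES[grade].max_load_pct

def handle_condition(grade: str, measured_loss_pct: float) -> str:
    """Presumptive QoS: unless the grade reports a violation, assume all is well."""
    p = POLICIES[grade]
    if measured_loss_pct <= p.loss_target_pct:
        return "ok"
    return p.on_degraded   # the policy, not the session, decides what happens

print(admit("gold", 55))              # True: under the design load
print(handle_condition("gold", 0.3))  # "reroute": the gold grade is degraded
```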

The nice thing about policy-managed SDN is that you can gather information from anyplace where traffic can be monitored, feed that back to some correlating logic, and pick a policy to do what’s needed.  The devices themselves need not be “managed” at all; you just infer device condition from traffic.

For NFV things are a little more complicated, and the reason is simple.  NFV in its ETSI ISG form presumes that virtual functions can be managed equivalently to real devices, which means that the virtual-element behaviors have to map somehow to a MIB that then interprets and presents them as appropriate status variables.  The question is whether it’s possible, or effective, to do this if the paradigm of network management in place doesn’t allow you to look at the specific state of a specific session.  If two virtual network functions are linked in a service chain, do I need to know what is wrong with the linking path if it’s degraded?  Can I infer, based on the fact that I’ve assigned that path to grade-of-service “C” and that “C” is degraded in a general sense, that this path is degraded?

The challenge in NFV is that while it’s not explicitly necessary to know the state of connecting resources, it’s explicit that NFV management knows the state of hosting resources.  If I construct a management view of a virtual function by combining the state of its hosting resources in detail, then add in a kind of fudge number representing what I think the state of connection resources is based on aggregate grade-of-service state, what kind of result do I end up with?  If I undertake remediation for a problem and pick hosting locations carefully to maximize QoS, can grade-of-service state alone ensure I’ve not made a bad choice because of connecting resources?

One thing that policy management is really good for is sustaining services and QoS across administrative or technical boundaries.  If a network is made up of three “zones” each of which is managed autonomously, it’s very helpful to be able to communicate service behaviors and handling policies among the zone-keepers so that a consistent experience can be provided.  Each zone remains a “black box” but the properties of the boxes can be shared, and reports on properties and deviations can be made among the zones.

I think it’s very likely that policy management could sustain SDN service management, and because of the “zone” benefit it would be a good way to organize manageable SDN domains into an overall network.  Given that SDN should be able to abstract any service if it’s exploited correctly, that means it should be possible to manage NFV connection services, interior to the VNFs and outside, using policy management too.  The challenge is knowing how to do it.

The relationship between SDN and NFV is important for a lot of reasons, but if that relationship is important in determining how grade-of-service handling and policy management would be exploited by NFV, it might be critical.  Policy management wouldn’t necessarily change the way that NFV recognizes service events for each of the virtual components because remediation of server failures or other hosting issues would still require handling under at least some conditions, but if we were to visualize NFV as consuming “network-as-a-service” and “hosting-as-a-service” it’s possible we could then create a simple management framework for NFV.  That could answer a lot of the questions that stand in the way of making a strong NFV business case, so exploring the question might be a good topic for PoCs.

Three Steps to Prove NFV is Justified

NFV is all about hosting virtual functions, and justifying it means that the process of hosting functions is somehow better than creating network services by interconnecting fixed devices and custom appliances.  The question is whether that’s true, and it’s a question that’s becoming increasingly important to service provider executives who have to decide whether to take NFV beyond technical trials into some real fieldwork.

Three benefits have been touted for NFV.  First, that it reduces capex by substituting inexpensive software running on commodity hardware for more expensive custom network devices.  Second, that it improves service agility, meaning the ability of operators to quickly offer new services.  Third, that it improves overall operations efficiencies, lowering opex.  To prove NFV at the business level, you have to get enough of these benefits to drive the NFV ROI up above the targets set by carrier CFOs.  To see where we are, I’ll look at each of the benefits more closely, applying carrier data and my financial modeling.

The notion that NFV will reduce capital costs is generally valid where the legacy solution isn’t itself based on commodity hardware.  For example, consumer broadband gateway devices are very cheap, so it’s hard for a hosted alternative to improve much on that cost baseline.  The challenge is that if you go to applications with more expensive gadgets involved, you find fewer of the gadgets.

Capital cost savings also depend on achieving reasonable economy of scale, which means a data center large enough to host functionality past the tapering knee of the Erlang curve.  Operators know that this likely involves pre-positioning cloud assets when NFV deploys and hoping they can capture enough function hosting to justify the cost.  Thus, NFV generally creates a first-cost risk for operators.
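The economy-of-scale point can be illustrated with the classic Erlang B recursion.  The sketch below uses made-up load figures, not operator data; it just shows how blocking falls steeply and then flattens as the pool grows, which is the “knee” a pre-positioned NFV data center has to get past.

```python
def erlang_b(offered_load: float, servers: int) -> float:
    """Erlang B blocking probability via the standard recursion.
    offered_load is in Erlangs; servers is the size of the pool."""
    b = 1.0
    for m in range(1, servers + 1):
        b = (offered_load * b) / (m + offered_load * b)
    return b

# Hold per-server utilization constant (0.8 Erlangs per server) and grow the
# pool: blocking drops steeply at first, then the curve flattens, which is
# where the scale economies taper off.
for n in (5, 10, 20, 50, 100):
    print(n, round(erlang_b(0.8 * n, n), 4))
```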

The net here is that operator executives I’ve surveyed are not of the view that capex savings will be enough to drive NFV, and they point out that NFV alternatives to existing boxes are often more complex than single-box solutions, which means that it would be easy to eradicate even modest capex benefits with increased complexity and operations cost increases.

The service agility angle for NFV is similarly complicated.  There are really two “agility dimensions” here with very different needs and impacts.  One is agility that improves time to revenue per customer and the other is agility that improves time to market per opportunity.

When a customer orders something, the presumption is that they’d like it right now.  If there’s a delay in fulfilling the order then the carrier loses the money that could have been billed during that delay.  Operators report that on average the time to revenue is a bit less than two weeks, so that’s about half a month’s billing, or a gain of roughly 4% of a year’s revenue.  Nothing to sneeze at, right?

The problem is that this applies only to new orders, and it presumes instant provisioning via NFV.  Most operators report that less than 10% of their orders would qualify as “new”.  They also report that about half of those involve some physical provisioning step.  Overall they think that time to revenue would likely result in less than a half-percent gain.  Nothing to sneeze at, but not a big windfall.
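A quick back-of-the-envelope reproduction of that arithmetic, using only the rough figures quoted above, looks like this:

```python
# Time-to-revenue arithmetic using the approximate survey figures quoted
# above.  These are illustrative numbers, not a forecast.

half_month_gain = 0.5 / 12        # ~2 weeks of extra billing per order, about 4%
share_new_orders = 0.10           # under 10% of orders are genuinely "new"
share_fully_automatable = 0.5     # about half of those still need physical provisioning

overall_gain = half_month_gain * share_new_orders * share_fully_automatable
print(f"time-to-revenue gain: {overall_gain:.2%}")  # roughly 0.2%: "less than a half-percent"
```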

The time-to-market agility is a bit more interesting.  Right now, operators’ own estimates are that an OTT can introduce a new “overlay” service in about 60 days, and operators take about ten months for the same process, five times as long.  That might effectively take an operator completely out of the market for a given service; they’d be so far behind competition it would be difficult to recover.

NFV is credible as a time-to-market accelerator provided that NFV’s service creation and service deployment processes are very easy to drive and very responsive.  There have been no proof-of-concept trials that have demonstrated a full NFV lifecycle in what I believe is a credible way.  I believe that NFV can in fact generate benefits here, likely benefits significant enough to drive as much as a 5% revenue gain for operators, but I can’t guarantee that the tools to address this opportunity are out there.

That leaves the most complicated of the issues, operations efficiency.  If you read through the material produced by the ETSI NFV ISG, it’s hard to see an NFV service as anything other than highly complex.  Even in simple topology terms, a given function now performed by a single device might have to be decomposed into a chain of functions described in a forwarding graph, deployed on separate VMs and linked by network paths created at the time of deployment.

How complex the picture might be depends on the nature of the function and the placement of the components relative to the placement of the original device.  My example of the consumer broadband gateway is a good one.  I can stick a box in the customer prem and (according to operators) it’s likely to stay there for five to seven years.  It’s more likely to be returned because the customer moved than because it broke.  Operationally it presents no cost once it’s installed.  If I virtualize most of the functions, I can make the box a little cheaper, but I always need a box to terminate the service, and my virtual version of the features will have to be maintained operationally.  If I use COTS servers to replace any box, I have to look at the MTBF and MTTR of those servers and the connecting network elements, compared to the same data on the real box.  Sure, I can use redundancy to make an NFV version of something more available, but I can’t do that without creating complexity.  Two components mean a load-balancer.  What if that breaks?  You get the picture.
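The availability trade described above can be sketched numerically.  All the MTBF and MTTR figures below are invented for illustration; the shape of the result is what matters: redundancy buys availability back, but the load-balancer you add to get it sits in series and takes some of it away again.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one element."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series(*avails: float) -> float:
    """Everything in the chain must be up."""
    result = 1.0
    for a in avails:
        result *= a
    return result

def parallel(a: float, n: int = 2) -> float:
    """n redundant instances; only one needs to be up."""
    return 1.0 - (1.0 - a) ** n

# Invented figures, chosen only to show the comparison.
appliance = availability(mtbf_hours=200_000, mttr_hours=4)   # purpose-built box
server    = availability(mtbf_hours=50_000,  mttr_hours=4)   # COTS server
path      = availability(mtbf_hours=100_000, mttr_hours=1)   # each virtual connection

single_vnf = series(server, path, path)                 # one hosted instance plus its pipes
redundant  = series(parallel(single_vnf, 2), server)    # two instances behind a load-balancer,
                                                        # itself hosted on yet another server
print(f"appliance:        {appliance:.6f}")
print(f"single VNF chain: {single_vnf:.6f}")
print(f"redundant + LB:   {redundant:.6f}")
```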

This is, in my view, the practical where-we-are with respect to NFV.  We are proving that most of the technical principles needed to deploy NFV are valid.  We are proving that our current approach is valid for the test cases we’ve selected.  We have yet to prove the business case, because we have yet to prove that all the benefits we can secure, net of incremental costs of any sort, can rise above our ROI targets.

I believe that we will prove the case for NFV, but I think getting to the proof points is going to come only when we accept the need to test not the technology itself but the application of that technology at the scale of real services.  We’re probably not going to even start doing that until well into 2015.

I also believe that we have to explore things like edge-hosting of functions on custom devices.  If service agility is the goal, and if business customers normally keep their service makeup constant for a long period of time, there’s nothing wrong with deploying firewall or other features in the edge.  Such a decision scales cost with revenue (you deploy an agile box only when you sell service), it’s operationally simpler (no cloud data center, no virtual connections), and it doesn’t require all the MANO technology that’s needed for full virtualization.  Maybe there’s an on-ramp to full NFV through this sort of thing.

NFV is a good idea.  SDN is a good idea.  Realizing either means more than proving it can work, it means proving that it’s justified.  Validating the three net benefit opportunities is the only way to get there.

Could the Next SDN Battleground be the Branch?

It’s hard to see the VMworld show as anything other than a VMware-versus-Cisco extravaganza, partly of course because that’s how it tends to be portrayed in coverage.  Underneath, there is surely good reason to see the networking developments in particular as being contra-Cisco, but I wonder whether there’s more to it.  Cisco might be collateral damage here, not the target of VMware’s positioning.  In fact, some of VMware’s allies may be as much at risk as its enemies.

Years ago, when talking about the Cisco-Juniper rivalry (the vogue at the time, now old news given Juniper’s decline in influence) I noted that all of the tea leaves were aligning to predict a data-center-driven future.  That meant that whoever could take a decisive position there would have unusually strong influence over buyers.  At the time, servers were the markers for data center market participation and Cisco was getting into them with UCS.  Juniper, I said, had to get going to make the network the driver.

Today it’s a bit different.  There’s also been a decades-long rivalry between networking and IT to define the critical boundary between the two as we move into a future where everything seems to be cloud-based.  VMware is IT, and Cisco is networking.  But VMware doesn’t have servers, so they face a challenge similar to that of Juniper in the past—how do you become a driver of the data center without servers?  The answer, obviously, is software.  Data center software, built around virtualization, has given VMware a seat at the strategic table.

Whose seat did they take?  Or seats.  The thing about VMware’s data center software positioning is that it’s necessarily hardware-commoditizing.  If you don’t make servers you hardly want to picture them as the center of the IT universe.  What you try to do is to make the features of the data center of the future hardware-agnostic so you can sell onto any convenient platform.  Virtualization happens to be the perfect path toward that goal since it creates virtual servers.  The natural strategy of VMware is the right one, which is a very lucky situation to be in these days.

One of the vulnerabilities of VMware is still that IT/network boundary.  Juniper didn’t seize the initiative years ago, but neither did Cisco.  For an IT player like VMware, the obvious strategy in networking is to follow the same virtualization-driven-and-hardware-anonymizing approach that works for servers, which is exactly what NSX and EVO: RAIL are doing for them.  Virtualize the network; that’s the solution that software/IT providers have to see as the right one.

Considered in this light, Cisco has the right approach with ACI.  If the border is the war zone, you fight in the border area, not down in the interior.  Cisco’s challenge with ACI isn’t that they don’t know the right answer or the right place to apply it, but that they don’t want to see the underlying network anonymized.  Their problem, and the failing of ACI in my view, is that Cisco needs to protect network infrastructure from anonymization by providing a better way to implement virtualization at the network level.  I think they intend their policy-by-zone approach to be that, but they’re dragging their feet in getting it out.

VMware’s approach is simple (which is why it’s dangerous).  They’ve started with the simple Nicira overlay SDN model and added a lot of meat to it, with what I think is the ultimate goal of making NSX into the service and application network layer of the future.  They don’t want to build Level 4, they want to build “Level 3a”, to add a layer or sublayer to OSI that replaces current hardware-coupled Level 3 as the basic network service.  This then elevates the application features of networking to the new layer, and that disintermediates the hardware below.  It’s the SDN model I advocated in a previous blog, though it’s not yet complete in that NSX is still locked in the data center.
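Conceptually, the overlay model is just a software mapping from logical segments to encapsulated host-to-host paths over whatever IP underlay happens to exist.  The sketch below is a deliberately toy illustration of that mapping; the segment names, VNIs, and addresses are invented and it makes no claim to resemble NSX’s actual data model.

```python
# A toy view of overlay ("Level 3a") segmentation: logical segments are
# identified by a virtual network ID, and membership plus reachability are
# decided entirely in software, independent of the underlay routers.

overlay_segments = {
    # segment name -> (virtual network id, member endpoints as (host, underlay IP))
    "app-tier": (5001, [("web-vm-1", "10.0.1.11"), ("web-vm-2", "10.0.2.12")]),
    "db-tier":  (5002, [("db-vm-1",  "10.0.3.21")]),
}

def encapsulate(segment: str, src_host: str, dst_host: str) -> dict:
    """Return the outer header a virtual switch would impose: traffic is
    tunneled host-to-host over plain IP, so the underlay never sees the segment."""
    vni, members = overlay_segments[segment]
    addr = dict(members)
    if src_host not in addr or dst_host not in addr:
        raise ValueError("both endpoints must belong to the segment")
    return {"vni": vni, "outer_src": addr[src_host], "outer_dst": addr[dst_host]}

print(encapsulate("app-tier", "web-vm-1", "web-vm-2"))
```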

Network segmentation a la Nicira is valuable for the public cloud because of multi-tenancy.  It’s less valuable for the enterprise because they are only one tenant.  The new NSX enhancements and the EVO: RAIL stuff are obviously aimed at making VMware and NSX more useful to enterprises, but there’s still a missing ingredient.

Enterprise segmentation has to be based on some logical division of network resources, and in the enterprise that division would be by application.  Application-specific networks are great in one sense; they allow you to apply different QoS rules and impose different access rules.  But you have to provide access, meaning that you have to somehow get the user into the process.  This is where things are harder for VMware and potentially easier for Cisco.  If you can extend your SDN model to the user, you can provide the true overlay network.  If you don’t, then you are still covering just a piece of the proverbial elephant with a skimpy (for the enterprise) blanket.

The problem is that a true end-to-end overlay architecture for SDN could be enormously disruptive to the networking market.  Alcatel-Lucent has a model that’s close to the right thing in Nuage, but their positioning of it suggests either a lack of verve and flair or a subliminal desire to avoid rocking the switch/router boat too much.  IP is a strong spot in their portfolio after all.  So it would still be possible for Cisco to go out and beat the drum, at least in the sense that nobody else is leading the parade.  Does Cisco want to lead here, though?  Probably it’s at least as ambivalent as Alcatel-Lucent may be.

Which takes us back to VMware.  To make their vision the market leader, VMware has to get NSX out of the data center and into the branch.  That means that VMware has to virtualize the branch just as they virtualized the data center.  That means articulating a compelling vision of just what’s out there to be virtualized and how it could be done.  Technically, it means somehow getting a software foot in the door of the branch because you can’t create an overlay network if you can’t place an element of that network where you need to be.  VMware is in the data center; they need to be in the branch too.

And Cisco and other network vendors have to keep them out, the only strategy for which is to get there first with overwhelming force.  So while the data center is the focus of the network in most ways, the branch may be its strategic focus.

Amazon’s Influence Might Change OTT Opportunities, and Create Carrier Opportunities

Amazon isn’t just your mother’s online retailer anymore, obviously.  The company has evolved through its position as an ebook provider and public cloud provider, into a video and online music streamer, and now it’s looking at gaming and advertising.  Could all of this revolutionize the OTT market?  Certainly it could revolutionize Google, but the OTT space is already under secular pressure so it may be more a matter of direction than purely of impact.

The problem with OTT is that everything can’t be free; in fact practically nothing can be.  What we’re really talking about isn’t free-ness, it’s simply making the payment for something indirect.  Advertising sponsors many of our online activities, but the global adspend has actually trended downward recently, partly for economic reasons but partly because better targeting means advertisers can reach who they want at a lower total cost.  Companies like Google are looking at a pie that’s smaller, or at least not growing much, and more competition for what there is.

There’s surely been more advertising competition; Facebook comes to mind.  Why would Amazon be more of a threat than Facebook, though?  After all, Amazon’s initial target is ads placed on its own pages by Google.  Well, we can’t say for sure how all this would work at this point, but there are a bunch of interesting possibilities.

For one thing, Amazon’s core business is retail.  They can already “suggest” products to us and they know more about what we buy and who and when we buy from than a search provider would know.  Not only that, more and more people are doing “product searches” by going to Amazon to get information, pricing, and reviews because of the practices of SEO-driven parasitic websites that fill pages of your search results and provide you nothing.  I did a search this morning and one of the results was purportedly a list of the top responses to that search!  Didn’t Google give me ranked results?  Google wants to sell search ads, but every ad or every SEO-based placement hurts the credibility of search and drives people to sites like Amazon.  Now Amazon wants to be able to leverage what it knows about its customers and come after other ad opportunities.  That could hit Google’s display ads, and of course other companies as well.

The second point is Amazon’s Fire Phone.  Google already knows it’s vulnerable in the ad-word business as people shift to mobile devices; you search differently in mobile and it’s more likely to be related to retail or social drivers than to general research.  Google has not really been able to leverage Android as much in mobile advertising as it might like.  Amazon’s prospects for its phone are far from clear, but if you were to start with the phone and its Amazon Prime medium, you’ve got a regular revenue stream from Prime, knowledge of buyer purchase patterns from Amazon’s retail site, backup for location- and context-based services from AWS/EC2…you’ve got a lot.  Then there’s the chance that Amazon might decide to become an MVNO and subsidize part of its cost from ad revenues, giving it more penetration and making an Android device perhaps the Trojan Horse for a threat to Android’s owner.

Then there’s gaming.  Amazon is now getting into online gaming, something that can again leverage its cloud, tablet, and phone positions.  Online games are also a nifty way to introduce ads; “product placements” in TV shows are obviously a small step away from a product placement in a game.  Amazon’s retail strength also means it could give users a reference to an online product when they searched for something in “traditional” product form.  Want a video game?  Try this online game.

All this comes at a kind of bad time for Google.  The company has launched a bunch of things, including Google+ and Gmail and Google Voice, that incur costs and don’t directly add to revenue.  It’s also jumped into high-speed Internet via fiber in a number of locations.  Fiber access from Google may be fascinating to our media friends, but the margins on wireline broadband Internet truly suck in comparison with Google’s current return on capital.  So now do they have to spend more, or spend faster, to counter Amazon?

Here’s the reality, IMHO.  Facebook is not going to challenge Google; they can tap off some mobile ad revenue but they’re not a game-changer for Google and nobody knows how long Facebook can even sustain social-network interest.  “Social” networking is really about fads, and another could come along.  Amazon is no fad and Google knows it.  They’re now faced with the ugly question of whether they should attack Amazon by pushing back harder where Amazon is trying to get into Google’s pockets, or whether they should try harder to develop competition for Amazon’s retail and cloud capabilities—things that are the basis for Amazon’s core business.

All of this could be good news for network operators.  There’s a whole flood of contextual services opportunity coming down the pike, and what operators don’t want is for the conventional OTT model to develop for these opportunities.  Unfortunately the operators are so far rather clueless about how to proceed and they’re likely to take some time developing their own positions, by which time an OTT or two may be solidly entrenched and hard to displace.

This is an example of what I think could be an actual driver for SDN and NFV.  If we assume that the future OTT market will be a lot different from the present one (for whatever reason or combination of reasons) then there’s a chance for even slowpoke innovators to do something useful.  Amazon might create a revolution that will leave some niches uncovered, niches that could be addressed in part by a feature architecture that relied on new connectivity models (via SDN) and the deployment of extemporaneous features (NFV).

An opportunity, but not a lock.  If the operators get too focused on saving money on old services (whose profitability is doomed anyway) and forget what’s needed to deploy new ones, they’ll be old-fashioned competitors in a market even newer than the one we have today.

Why SDN and NFV Could Still Fail Utterly

There’s a popular view that as we move into the future, operators will build networks from commodity/commercial off-the-shelf servers (COTS) rather than specialized network equipment.  Network Functions Virtualization (NFV) is the poster child for the notion, and there have been a flood of announcements from vendors who have made hosted network functionality available under the banner of “NFV”.  That most of the stuff isn’t NFV at all but just something that might be deployable under a conformant NFV implementation is one of my gripes.  Another is the “COTS” angle.

Take a look at Amazon some time and you’ll see an immediate issue with COTS.  You can go out today and buy a tower server configuration for about eight hundred bucks.  A 48-port off-the-shelf gigabit Ethernet switch costs about half that, and obviously you’d need to add a lot of network cards to a server to make it into a 48-port switch.  This raises two important points in itself.

Point one is that most network devices today are specialized in some way that COTS servers aren’t.  The simple example of the number of Ethernet ports comes to mind, but it’s also important to remember that Intel developed DPDK to provide an accelerated data path to suit applications that were more data-plane-centric than usual.  You could probably improve the performance of servers in routing applications in particular if you added content-addressable memory to them.

The second point is that we already have cheap network devices; they’re just not from name-brand vendors with a bunch of management bells and whistles or support for arcane evolving protocols.  I’ve had some of my Ethernet switches for a decade or more at this point and nothing has broken, so you can’t say they aren’t reliable.  I have a hard time keeping a server for five years.

The lesson we should be learning from our little shopping excursion is that the functions we see most often in the network are probably not going to be replaced by COTS, but if you still have doubts, look at the next point, the “SDN revolution”.

SDN in its “purist” OpenFlow form says that we can take the software logic out of switch/router devices and centralize it, then use that central logic to drive changes in forwarding tables in white-box bit-shufflers.  Nobody is proposing white-box COTS here, folks.  We admit that handling data flows is probably best left to specialized silicon.
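The purist split is easy to caricature in a few lines: the controller owns the logic and pushes match/action entries down, and the switch owns nothing but a flow table.  The Python sketch below is a generic illustration of that division of labor, not the OpenFlow protocol or any real controller’s API.

```python
class WhiteBoxSwitch:
    """A caricature of a white-box forwarder: no routing logic of its own,
    just a table of match/action entries installed from outside."""
    def __init__(self, name):
        self.name = name
        self.flow_table = []                      # list of (match, action)

    def install(self, match, action):
        self.flow_table.append((match, action))

    def forward(self, packet):
        # Exact-field matching only, for brevity; real flow entries also
        # use masks and prefixes.
        for match, action in self.flow_table:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "punt-to-controller"               # table miss

class Controller:
    """The centralized logic: it decides the path and programs each hop."""
    def program_path(self, hops, dst, out_ports):
        for switch, port in zip(hops, out_ports):
            switch.install({"dst": dst}, f"output:{port}")

s1, s2 = WhiteBoxSwitch("s1"), WhiteBoxSwitch("s2")
Controller().program_path([s1, s2], dst="10.1.2.3", out_ports=[3, 7])
print(s1.forward({"dst": "10.1.2.3"}))    # output:3
print(s1.forward({"dst": "192.0.2.9"}))   # punt-to-controller (no rule)
```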

SDN also raises another obvious question in this evolution, which is whether total cost is really being addressed here.  Can we make SDN switches cheaper than switches from Cisco or Juniper?  Sure, but we’ve already proved we can make Ethernet switches cheaper than those vendors’ products, without exposing ourselves to this new centralized-network-and-OpenFlow thing.  And how exactly do we control these new SDN networks?  How much will operations cost?  How do they scale and interconnect?  Without firm answers to these questions we can’t say whether an SDN switch can replace an Ethernet switch at acceptable levels of reliability and performance, much less TCO.

Operations costs are the big question for both SDN and NFV because a virtualized solution to networking in either form requires a combination of hosted elements and agile interconnections.  In the good old days, a router was working when it was working.  A virtual router is working when all its components are running on VMs that are assigned to servers performing within spec, interconnected by pipes that are all delivering on the QoS metrics expected (some of which may be passing over “real” routers!).  We might, for a given implementation of a network appliance, need two or three different VMs and four or more pipes.  Generally, operations costs are proportional to complexity.  Generally, the MTBF of five to eight components is lower than the MTBF of one component, not even considering that few COTS servers can match network devices in the MTBF space.
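The MTBF point is simple arithmetic if you assume independent elements with constant failure rates: the rates add, so the composite MTBF of a multi-part virtual device is dragged down by every piece in the chain.  The figures below are invented purely to show the shape of the calculation.

```python
# Composite MTBF of a chain of independent elements with constant failure
# rates: rates add, so MTBF_total = 1 / sum(1/MTBF_i).  Figures are invented.

def composite_mtbf(mtbfs_hours):
    return 1.0 / sum(1.0 / m for m in mtbfs_hours)

# A "virtual router": three VMs/servers plus four connecting pipes,
# versus one purpose-built device.
virtual_router = [50_000] * 3 + [200_000] * 4
print(f"virtual router MTBF: {composite_mtbf(virtual_router):,.0f} hours")
print(f"single device MTBF:  {200_000:,} hours")
```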

Then there’s Facebook, whose Open Compute Project is dedicated to the notion that a cloud data center needs specialized technology, not COTS and not even traditional networking.  Two interesting points here; first, that “COTS” may not be the right approach even for server hosting of web applications, and second, that you don’t have to match the mass market to get economies of scale.  Facebook says that even its own server use could justify specialized design and manufacturing and still create lower TCO than commercial products do now.  Sure, they’d like an open initiative to make it better, but they’re not waiting for that.

I’m not saying that SDN or NFV aren’t going to work, but I am saying that the notion that networking is going to move entirely to either of these things is nonsense.  In fact, my view is that conventional Ethernet and IP network architectures probably can’t be replaced by hosted applications effectively.  What we need to be doing is looking beyond that goal.  If you want to do networking better or cheaper, you have to network differently not replace network devices 1:1 by hosted equivalents.  I think that’s ultimately where SDN, NFV, Open Compute and all our other initiatives will end up, or they’ll die on the vine.

I also think that we have to operationalize differently.  Everyone knows that “provisioning” isn’t an orderly linear process anymore.  Yet we still think of OSS/BSS as applications, in an age where componentized software and workflows have made an application an almost extemporaneous combination of functional elements.  Even last week, we heard about operators deploying “applications” in operations.  It’s that kind of thinking that locks operations in the dark ages, and when we do that we lock TCO up along with it.  And we need to ask the question “If we modernize operations fully, how much cost improvement would we see even presuming legacy infrastructure and current levels of equipment competition?”

We have a thinking-big problem here, and we have a lot of thinking-small processes working to solve it.  Everything that works bottom-up is something that’s at least in part driven by the fear of changes on a large scale.  You can’t conceptualize a skyscraper while you’re mucking about in the pebbles of the foundation fill.  We’re accepting the goals of the past, the operations of the past, the architectures of the past, and saying we’ll then revolutionize the future.  I don’t think so.  It’s time to start imagining what could be and not replicating what we have using different tools.

Brocade Says a Lot About NFV: Is it the Right Stuff?

Most of our discussions of the competitive landscape in networking involve the larger firms; Alcatel-Lucent, Cisco, Ericsson, Huawei, Juniper, and NSN.  While it’s true these firms have the most influence and the greatest resources, they also have a powerful incentive to take root in the current market and become trees.  Mammals may be smaller, but mobility makes them more interesting, and in market terms the wolves of the SDN/NFV revolution might well be the smaller more agile firms.

One such company is Brocade, who transformed itself from a storage network vendor to a broader-based network vendor with the Foundry acquisition.  More recently Brocade acquired the Vyatta soft switch and router, and it’s here that they became interesting from the perspective of SDN and NFV.  They’re a big player in OpenDaylight in the SDN space, and last year they took a data-center-centric vision of NFV on the road and got surprising carrier traction.

What Brocade said was essentially what enterprises already know; the network in an IT-dominated world starts with and orbits around the data center.  You have to architect the data center to build the network of the future.  What you host service features in for NFV, and make money with in cloud computing, is data center equipment married effectively to networking.  It’s all true of course, and it was a very different story than operators were hearing from the Big Six.  But Brocade wasn’t able to turn the story into actionable insights and they lost the strategic influence they’d gained with their story.  Now they want it back, and their earnings call yesterday gives us a chance to see what they think might work, and gives me a chance to express what I think about their approach.

Brocade saw an uptick in both SAN and WAN—five and nine percent sequentially, respectively.  On the call, they reiterated their key point—it’s about the data center—and augmented it by saying that the industry transformation now underway favored them.  That’s certainly true, and validated by the strategic influence uptick they had last year just by taking their story out to carrier executives.  They can get in to see the big guys, and tell a story there, which is important.

They also made a point about software networking, which is absolutely key.  The SDN and NFV revolutions are dilutive to the network equipment providers, and so there’s little incentive for them to drive things in a software direction.  Yet software is where we need to go.  The best strategy for a little guy against a giant is to lay a trap, which in this case is to accentuate the aspect of the revolutionary future that large players will be most uncomfortable with.  Your hope is to get them to pooh-pooh the thing that’s really important, because for them it’s really inconvenient.

Brocade thinks this formula is going to work, and they’re forecasting IP growth of 6-15% q/q.  That’s growth any of the Big Six would kill for even at the low end.  A big part of their gains is going to come from their marriage of Ethernet fabric and software—from SDN—and that’s an area where Brocade management specifically feels that they’ve got an agility edge over the incumbents.  They mentioned SDN and, most significantly, NFV on their call—they talked more about it than any of their larger competitors, in fact.  There is absolutely no question that Brocade thinks the network revolutions of our time are their big opportunity.

So where are the issues?  First, I’d have to point out that Brocade got a big pop in credibility last spring when their data-center-centric NFV story hit.  The problem was that they’d lost virtually all of it by the fall, and the reason was that they couldn’t carry the story forward in enough detail.  In the fall, operators said that they thought Brocade had the right idea but didn’t have a complete NFV solution baked yet.  They still felt that way this spring, despite the fact that Brocade plays in some of the NFV proof-of-concepts.  Thus, operators are on the fence with respect to whether Brocade is a wolf in my forest of competitor trees, or just crying wolf.

There’s some justification for that.  Brocade tends to position their Vyatta stuff as “NFV” when in fact it’s a prospective virtual function in the big NFV picture.  Where does that picture come from, if Brocade isn’t yet articulating it?  If the answer is “one of the Big Six” then Brocade is setting itself up to be a small follower to a giant leader.  If one of the major firms can drive NFV, why would they leave any spot for Brocade?  Not to mention the fact that the whole premise here is that those firms won’t drive it because it undermines their revenue base.  But if neither Brocade nor the Big Six offer the NFV solution architecture, Brocade is depending on an outside vendor like an IT giant to set the stage for it.

Might HP, or Oracle, or IBM, or Intel field the total NFV architecture of the future?  Sure, but in their own sweet time.  So here’s my view; Brocade has de-positioned its competitors but now it has to position itself and its allies.  A smart move at this point would be to lay out the right NFV architecture, address the questions, and assign the major roles to friendly players.  If Brocade could do that it could accomplish two critical things.

Thing One would be that it could help drive a business case for NFV.  Operators today, at the executive level, tell me that they are working to prove NFV technology but their trials are not yet comprehensive enough to prove a business case.  That’s because their scope is too limited—service agility and operations efficiency are secured meaningfully only if you can span the network, not just a tiny hosted piece of it.  If Brocade can advance NFV, their victory in positioning there could be meaningful.  Otherwise they’re playing checkers with the Big Six.

Thing Two is that it could make them a kind of “fair broker of NFV”, a player who is seen as understanding the process well enough to fit the pieces together in the right way.  That they have a nice big piece themselves only proves they have some skin in the game.

So that’s where things stand with Brocade, as we head into a critical period.  Operators tend to do strategic technology planning for the coming year in the September-November period.  This would be a good time for Brocade not to just stand, but to stand tall.

HP’s and IBM’s Numbers Show a Faceoff–On NFV?

HP reported its results and the numbers were favorable overall, with the company delivering year-over-year revenue growth for the first time in three years.  The only fly in this sweet ointment was that the great majority of the gains came in the PC division, which saw a 12% increase in revenue that management attributes to a gain in market share.  Even HP doesn’t think that sort of gain can be counted on for the future, and so it’s really in the rest of the stuff that HP’s numbers have to be explored.  To do that it’s helpful to contrast them with IBM’s.

HP revenues were up a percent overall, where IBM’s fell by two percent.  At HP, hardware revenues were up overall and software was off, where IBM’s were the opposite.  Both companies saw revenues in services slip.  For HP, the PC gains were a big boost for hardware (and IBM doesn’t have them any more), but the industry-standard server business also improved for HP and IBM is selling that business off to Lenovo.

It’s hard not to see this picture as an indication that IBM is betting the farm on software, and I think that the Lighthouse acquisition and the deal with Apple reinforce that view.  IBM, I think, is banking on mobility generating a big change in the enterprise and they want to lead the charge there.  HP, on the other hand, seems to be staying with a fairly “mechanical” formula for getting itself together, having announced no real strategic moves to counter IBM’s clear bet on mobility.  The major reference to IBM on HP’s call was the comment that HP’s ISA business benefitted from IBM’s sale of that line to Lenovo.

One bets on hardware, one on software.  One focuses on transformation in a tactical sense, and one on the strategic shifts that might transform the market.  Both companies took a dip in share price in the after-hours market, so clearly the Street wasn’t completely happy with either result (no surprise).  Which bet is the better one?

I think that hardware is a tough business these days, no matter where you are.  The margins are thin and they’re going to get thinner for sure.  You can divide systems into two categories—ISA and everything else—and the former is going to get less profitable while the latter dies off completely.  Given this, you can’t call hardware a strategic bet.  But, every cloud sale, every NFV deployment, every application deployment, will need hardware to run on.  If HP can sustain a credible position in the ISA business as IBM exits, it becomes the only compute giant that can offer hardware underneath the software tools that will create the cloud, NFV, maybe even SDN.  HP, who has not only servers but some network gear as well, is in a position to perhaps be a complete solution for any of these new applications.  IBM will have to get its revenue from software and services alone.

But it’s software and services that are likely to drive the revolution.  If IBM is guessing right about the future of mobile devices in empowering workers, they could tap into a significant benefit case that would drive integration and consulting revenues as well as software.  They could shape the next generation IT paradigm.  The question is whether they will do all of that, given that in the near term they’re not likely to gain much from the effort.  It’s always been a challenge to get salespeople to work on building a business instead of making quota in the current quarter.  Can IBM do that?

Interestingly, the 2015 success of both companies might depend on the same thing, something neither of them is truly prepared for—NFV.  Neither IBM nor HP mentioned NFV in their calls, but NFV may be the narrow application of a broad need that both IBM and HP could exploit to improve their positions.

NFV, in its ETSI form, is about deploying cooperative software components to replace a purpose-built appliance.  This has some utility in the carrier space, but most operators say that capex savings from this sort of transformation wouldn’t provably offset the additional complexity associated with orchestrating all these virtual functions.  What’s needed is a broader application of management and orchestration (MANO) that can optimize provisioning and management of anything that mixes IT and network technology—including cloud computing.

IBM and HP are both betting big on the cloud.  If the cloud has to be agile and efficient, it’s hard to see how you could avoid inventing most of the MANO tools that the ETSI ISG is implying for NFV.  Thus, it’s easy to see that a vendor with a really smart NFV strategy might end up being able to improve operational agility and efficiency for all IT/network mixtures, and boost the business case.  NFV might actually be critical for cloud services to support mobile productivity enhancement, both in making the applications themselves agile and efficient and in managing mobility at the device level.  MANO is the central value of NFV; make it generalized and you have a big win.  But it’s probably easier to generalize something you have than to start from bare metal, and it’s hard to say what HP or IBM has already.

HP had a strong NFV story in its OpenNFV positioning, but operators still tell me that there’s not much meat to this beyond OpenStack.  IBM has a much better implementation—SmartCloud Orchestrator is a TOSCA-based model for cloud MANO that could be easily converted to a complete MANO story—but they have been so silent in the space that most operators say they’ve not had an NFV presentation from IBM at all.

It’s my view that MANO is the point of greatest risk for HP, and not just in NFV.  If IBM were to come out swinging with SmartCloud Orchestrator even as it’s currently structured, they could claim better operationalization of all virtual-resource computing.  That gets them a seat at a lot of tables.  Furthermore, it would make it harder for HP to link its own hardware exclusively to network and cloud opportunities.  If you can drive the business case for the cloud, you can probably assume you can sell all the pieces to the buyer.  If somebody else (like IBM) drives the software side, it’s in their interest to commoditize the hardware part—throw it open for all comers to reduce the price, raise the ROI, and marginalize others who might want to control the deal.

I’m not trying to say that NFV is the answer to everyone’s fondest wishes.  I’m saying that wish realization will involve a lot of the NFV pieces, so it would be easier for someone who has NFV to stick quarters under a lot of pillows than it would be for somebody who has nothing in place at all.  Watch these two companies in their positioning of MANO-like tools; it may be the signal of which will emerge as the winner in 2015 and beyond.

A Carrier’s Practical View of SDN

Yesterday I talked about the views of a particular operator on NFV trials and evolution, based on a conversation with a very knowledgeable tech guru there.  That same guru is heavily involved in SDN evolution and it’s worthwhile to explore the operator’s SDN progress and directions.

A good place to start is with the focus of SDN interest, and where the operator thinks SDN trials and testing have to be concentrated.  According to this operator, metro, mobile, and content delivery are the sweet spots in the near term.  It’s not that they don’t believe in SDN in the data center or SDN in the cloud or in NFV, but that these applications are less immediately critical and offer less potential benefit.  In the case of data center SDN, obviously, the drive would depend on a large enough data center build-out to justify it, so it’s contingent on cloud and NFV deployment.

The issue the operator wants to address in the metro is that metro networks are in general aggregation networks and not connection networks, but we build them with connection network architectures.  Metro users are connected not to each other, directly, but to points of service where user experiences (including messaging or calling) are provided.  One logical question, asked by my contact here, is “What is the optimum connection architecture for aggregation in the metro?”  Obviously that will be different for residential wireline, wireless backhaul, and CDNs.  With SDN they should be able to create it.

For residential wireline networks, for example, the operator is very interested in using SDN as a means of managing low-layer virtual pipes that groom agile optics bandwidth.  One obvious question is whether emerging SDN-optical standards have any utility, and the operator thinks that will depend on the nature of top-layer management.  “Logically we’d probably control each layer separately, with the needs of the higher layer driving the commitments of the layer below.  But what if there is no top-layer management?”  The operator sees having an SDN controller do everything as a fall-back position should there be no manager-of-managers or policy feedback to link optical and electrical provisioning.

Even here, the operator is changing their view.  At one time they believed that it was essential for optical equipment to understand the ONF OpenFlow-for-optics spec, but now they’re increasingly of the view that having OpenDaylight speak a more convenient optical-control language out of one plugin and OpenFlow out of another would be a more logical approach.

Mobile SDN, as I’ve said in other blogs, seems to cry out for the notion of a new SDN-based service model that would, through forwarding control, create the agile path from the PGW to the cell where the user is currently located.  But the operator would also like to see some thinking around whether mobile Internet and content in particular don’t suggest a completely different model for forwarding everything to mobile users.  “Why couldn’t I make every mobile user a kind of personal-area network and direct traffic into that network from cache points, gateways, whatever?  We need some outside-the-box thinking here.”

This particular point raises a question for SDN management, the one that’s the most interesting to this particular operator.  If a collection of devices is designed to provide a well-known service like Ethernet, we have established FCAPS practices that we can draw on, based on the well-understood presumptions of correct behavior and established standards.  How do you represent something that isn’t a well-known service?  What would the “management formula” for it be?  According to my contact here, the utility of SDN may depend on the question of how management interplays with controller behavior when you create something new and different.

Management in SDN is an issue in any event, and at many levels.  First, while it is true that central control of forwarding can create a “service”, can that central point provide a management view?  Obviously what the controller thinks the state of the nodes and paths in an OpenFlow network should be doesn’t mean that the real world conforms to that view.  In fact, if we could assume that sort of thing we’d have declared “management by endorsement” the right answer to all our problems ages ago.  But what is the state of a node?  Absent adaptive behavior on a nodal level, what happens when a node fails?  If the adjacent nodes “see” their trunk to the failed node having dropped, they could poison all the forwarding entries to that trunk, in which case the controller would presumably get route requests for the packets the impacted rules had forwarded.  But will it?  Is there still a path to the controller?  And how about the state of the hardware itself?  Don’t we need to read device MIBs?  If we do, how is the state of a node correlated with the state of a service?

The second level is representing service-independent devices in a service-driven management model where we expect Ethernet services to be built using gadgets that have Ethernet MIBs.  Here’s a specific question from the operator:  Assume that you have a set of white-boxes providing Ethernet and IP forwarding at the same time, for a number of VPN and VLAN services.  These boxes have to look like something to a service management system, so what do they look like?  Is every box both a router and a switch depending on who’s looking?  Is there a big virtual router and switch instance created to manage?  If so, who creates it and parses out the commands that manage it?
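One way to picture the question is as a synthesis problem: a per-service “virtual device” view has to be built out of whatever the white-boxes actually report, and something has to own that synthesis.  The sketch below is hypothetical from end to end (the fields and services are invented); it just illustrates the kind of mapping a management system would have to create and maintain.

```python
# Hypothetical sketch of the "what do the boxes look like?" question: the
# same pool of white-boxes carries both VPN (IP) and VLAN (Ethernet)
# services, and a per-service virtual device view is synthesized on top.

white_boxes = {
    "wb-1": {"status": "up",   "flows": {"vpn-A": 1200, "vlan-7": 300}},
    "wb-2": {"status": "up",   "flows": {"vpn-A": 800}},
    "wb-3": {"status": "down", "flows": {"vlan-7": 450}},
}

def virtual_device_view(service: str) -> dict:
    """Collapse the boxes participating in a service into one 'device' a
    service management system could poll; the open question is who creates
    this view and who parses the commands that manage it."""
    members = {b: d for b, d in white_boxes.items() if service in d["flows"]}
    return {
        "service": service,
        "members": sorted(members),
        "status": "degraded" if any(d["status"] != "up" for d in members.values()) else "up",
        "total_flows": sum(d["flows"][service] for d in members.values()),
    }

print(virtual_device_view("vpn-A"))    # looks like one router
print(virtual_device_view("vlan-7"))   # looks like one (degraded) switch
```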

This particular operator ran into these questions when considering how NFV would see, use, or create SDN services.  Look at a service chain as an example.  In “virtual terms” it’s a string of pearls, a linear threading of processes by connections.  But what connections in particular?  How does NFV “know” what the process elements in the service chain expect to see in the way of connectivity?  The software has to be written to some communications API, which presumes some communication service below.  What is it?  A “logical string of pearls” might be three processes in an IP subnet or linked with GRE tunnels, or whatever.  How do we describe to NFV what the processes need so we can set them up, and how do we combine the needs of the processes with the actual infrastructure available for connecting them to come up with specific provisioning commands?  And remember, if we say that a given MANO “script” has all the necessary details in it, then how do we make that script portable across different parts of the network, or across different vendors?
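The “string of pearls” problem can be stated as data: a portable descriptor says what the chained processes need, and a separate per-domain binding step maps those needs onto whatever the local infrastructure offers.  The sketch below is an assumption-laden illustration of that split, with invented fields throughout; it is not the ETSI MANO data model.

```python
# Separating "what the chained processes need" from "how this particular
# network provides it".  Everything named here is hypothetical; the point is
# only that a portable description has to stop short of provisioning detail,
# and a per-domain binding has to finish the job.

service_chain = {
    "name": "business-vcpe",
    "functions": ["firewall", "nat", "dpi"],          # the string of pearls
    # Abstract connectivity requirements, not provisioning commands:
    "links": [
        {"between": ("firewall", "nat"), "needs": {"model": "ip-subnet", "bandwidth_mbps": 100}},
        {"between": ("nat", "dpi"),      "needs": {"model": "ip-subnet", "bandwidth_mbps": 100}},
    ],
}

# Per-domain binding: how this operator's infrastructure realizes each model.
domain_bindings = {
    "ip-subnet": "provision-vxlan-segment",   # one domain might use VXLAN;
    # another could bind the same model to GRE tunnels or a VLAN.
}

def compile_chain(chain: dict, bindings: dict) -> list:
    """Turn abstract link needs into domain-specific provisioning steps."""
    steps = []
    for link in chain["links"]:
        action = bindings[link["needs"]["model"]]
        steps.append({"action": action, "endpoints": link["between"],
                      "bandwidth_mbps": link["needs"]["bandwidth_mbps"]})
    return steps

for step in compile_chain(service_chain, domain_bindings):
    print(step)
```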

Metro missions seem to dodge many of these issues because the metro network is already kind of invisible in residential broadband, mobile, and CDN applications.  Progress there, this operator hopes, might answer some of the questions that could delay other SDN missions, and management hopes that progress will come—not only from their efforts but from trials and deployments of other operators.  I hope so too.

A Look at an Operator’s NFV Position

I had an interesting discussion late last week with a true thought leader in the service provider networking space.  Not a VP but a senior technical person, this individual is involved in a wide range of both SDN and NFV activities for his company, and also involved with other operators in their own efforts.  It was interesting to hear the latest in our “revolutionary” network technologies from someone inside, and I’ll spend a couple blogs recounting the most important points, first for NFV and then for SDN.  I’ve changed a few unimportant points here for confidentiality reasons.

According to my contact, his company is certain to start deploying some “real NFV field trials” and also early customer offerings in 2015 and very likely to be doing something at the field trial level in late 2014.  However, it’s interesting to note that the provider he represents is taking a kind of “NFV by the layers” approach, and perhaps even more interesting to know why.

Early NFV initiatives for the operator are focused on service chaining applications aimed at virtual CPE, with the next priority being metro/mobile and content.  Service chaining is considered a “low apple” NFV opportunity not only because it involves fairly simple technologies, but also because the customer value proposition is simple and the provider’s costs can be made to scale reasonably well.  It can also prove out NFV orchestration.

The service chaining application for the business side is looking at two questions: whether you can really build a strong user value proposition for self-provisioned access-edge service features, and whether the best model for the application is one where a custom box hosts the features on the customer premises or one where a cloud data center hosts them.  The reason for this particular focus is that the provider does not believe that NFV management is anywhere near mature enough to secure a significant improvement in operations efficiency, so service agility would have to be the primary driver.

The challenge on the demand side is a debate over whether business users would buy edge services beyond the obvious firewall and other security services if they were offered.  An example of such a service is DHCP per branch, which could at least let branch offices run local network applications if they lose access to a corporate VPN.  Similarly, having some form of local DNS could be helpful where there are branch servers.  Other services might include facilities monitoring, virus scanning, and local network and systems management.

There’s an internal debate on the credibility of on-demand portals.  Some provider sales personnel point out that buyers have not been beating the doors down for these services, but research seems to suggest that may be because they’re not inclined to be thinking about what they might buy were it offered; it’s not offered today and so they don’t have any reason to evaluate the benefit.  There’s also a question of how much these services would have to be integrated with centralized IT support to sell them to larger enterprises, who are the easiest sales targets because of the revenue potential.

On the residential side, the provider is really interested in how “Internet of Things” awareness driven by home control initiatives from players like Apple might open the chances for a home monitoring application.  The reason is that this operator has concluded that residential gateway applications are not a good service chaining opportunity; the devices now used are inexpensive and typically installed for a long period and central hosting would be mandatory if the goal was to replace customer-prem equipment.  If home control could be sold and made credible on a large enough scale and with a high enough level of feature sophistication, could it justify the application?

The next layer of interest for this operator is the management piece.  As I’ve noted, the operator doesn’t think the NFV management story is baked at this point, and they’re not sure how much could be gained in efficiency under a full implementation.  If NFV practices could improve overall management efficiency by 15% or more, it would be fairly easy to justify using NFV MANO even to operationalize legacy components.  But nobody is offering much of a story in that area yet, and this operator won’t have an NFV deployment of enough scale to test management practices unless and until service chaining is deployed in both residential and business trials.  My contact would like to see NFV management advances that would let them test MANO more broadly than for pure VNFs, but isn’t hopeful.  That means the second layer of NFV wouldn’t get wrung out until 2015.
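To illustrate why a figure like 15% would be decisive, here’s a back-of-envelope sketch of the kind of math involved.  All of the dollar amounts are hypothetical placeholders of my own; the operator shared no cost figures, so this shows only the shape of the argument, not its actual inputs.

```python
# A back-of-envelope sketch of the 15% threshold argument.  Every number here
# is hypothetical and purely illustrative.

annual_ops_cost = 200_000_000       # assumed yearly operations spend, in dollars
efficiency_gain = 0.15              # the roughly-15% improvement the operator cites
mano_integration_cost = 60_000_000  # assumed one-time cost to bring legacy
                                    # components under MANO-style orchestration

annual_savings = annual_ops_cost * efficiency_gain       # 30,000,000
payback_years = mano_integration_cost / annual_savings   # 2.0

print(f"Annual savings: ${annual_savings:,.0f}")
print(f"Payback: {payback_years:.1f} years")
```

At anything like these (invented) proportions the savings would pay back the integration effort quickly, which is why the operator says justification would be fairly easy if the efficiency story were credible.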

The issue of breadth of MANO also applies in the third layer of NFV testing, which is the way in which NFV might interwork with SDN.  Here the primary area of interest is the metro/mobile network where EPC and CDN at the “logical network” level combine with agile optics and aggregation networks at the physical level.  The issue for the operator in this case has been a lack of clarity on how SDN and NFV interwork, something that they’ve pressed with both the ONF and the NFV ISG.

The particular area of concern is the management intersection.  NFV management, you’ll recall, is something this operator thinks is fairly immature, and they have a similar view of SDN management.  How the two could be combined is, as they say, a function of two unbounded variables, and yet somehow there has to be a solution, because the most obvious application of both SDN and NFV is the metro intersection of mobile EPC and CDN.  The operator would like to run a trial in this space in 2015 but so far is having trouble deciding whom to work with.

This operator’s view of NFV justification is simple.  The “capex reduction” model offers them limited benefits, to the point that they wonder whether feature-agile CPE and portal-based service chaining would be a better deal.  They are interested in the service agility justification for NFV but they’re not sure whether the buyers really have enough agile demand to justify agile supply.  They are very interested in management/operations efficiency but they don’t think anyone is telling that story thoroughly.

This detailed look at NFV progress seems to show the same problem my survey earlier this year showed.  Operators are still grappling with sizing the business benefits of NFV, and part of that grappling is simply figuring out what benefits are actually out there.  We are definitely solving NFV problems, and answering NFV questions, in early trials.  We’re just not attacking the big ones yet, and until we do we can’t judge how far NFV can go and what it can do.

Finding the One Driver for the Future

Networking has, for decades, seemed to advance based on changes in how we do stuff.  We progressed from TDM to packet, from SNA to IP in business networks, and now we’re moving (so they say) from legacy IP and Ethernet to SDN and NFV, and from electrical to optical.  Underneath this seeming consistency is an important point: we haven’t had a whole bunch of shifts in networking but just two, and they came not on the “supply side” but on the demand side.

Starting back in the ‘50s when we began to apply computing to business, we realized that information had to be collected to be optimally useful.  Yes, you can distribute computing power and information to workers, but you have to collect stuff for analysis and distribution from a central point.  If you don’t believe that, consider how well your bank would work if every teller had to keep an independent record of the account of every customer who walked into a branch to make a deposit or withdrawal.

When computing made what was arguably the first of its major advances, with the mid-60s arrival of the IBM System/360 mainframe, we were still pushing bits at about 1,200 per second.  Even 20 years later we were still dealing with WAN data rates measured in kilobits, at a time when we’d already advanced to minicomputers and PCs.  The point is that a public network based on relatively low-speed analog and TDM created a kind of network shortfall, and we had a lot of investment to make simply exploiting the information centralization that had occurred while we were poking around with Bell 103 and 212 modems.

The challenge we have now is that we’ve caught up.  We’ve had startling advances in network technology, and so we can now connect and deliver the stuff we’ve centralized.

The second shift came about with the Internet and its intersection with our first trend.  The Internet gave us the notion of “hosting”, or “experience networking”, where we used communications not to talk with each other but to reach some centralized resource.  Broadband made that access efficient enough to be valuable for education, shopping, and entertainment.  We’re now pushing broadband to the consumer to the point where bandwidth that would have cost a company ten grand a month (T3 access) twenty years ago is less than a hundred a month today.

Some people, Cisco most notably, postulate in effect that what should happen now is a kind of reversal of the past.  Centralized information and content burst out of their cage because network costs were driven downward; the network was the limiting factor.  Now the idea is that the network’s greater capacity will justify a bunch of new content, new applications, new stuff that will drive up usage and empower greater network investment.

I’m not a fan of this view.  Lower cost of distribution can reduce the barriers to accepting new applications or experiences, but it can’t create the experiences or information.  Videoconferencing is a good example; a decade of promoting videoconferencing has proven that if we give it away people will take it, but they’ll avoid paying for it in the majority of cases.  Networking can’t move forward by doing stuff for free; you can’t earn an ROI on a free service.

What limits the scope, the value, of networking today?  You could argue that it’s not anything in networking at all but something further back: the information or experience source.  Back in the mid-60s I heard a pioneering IT type in a major corporation tell executives that the computer could double the productivity of their workers.  Twenty years later, my surveys showed that almost 90% of executives still believed that was possible, and only a slightly smaller percentage believe it today.  But they believe it’s information that will do the job, not connection.  The networking revolution of the future depends on IT, on backfilling the information/experience reservoir with more stuff to deliver.  The cloud, or how the cloud evolves, is more important to networking than SDN or NFV because it could answer the questions “Why do we want to do this?” and “How will we make money on it?”

That doesn’t mean that we have to sit on our hands.  SDN and NFV represent mechanisms for adapting what the network can do and how cheaply it can do it.  They can change the basic economics of networking so that things that were impossible a decade ago become practical or even easy now.  Mobile networking is that kind of new force, and so what we should be looking to now to transform both networking and IT is how SDN and NFV and the cloud would intersect with the mobile trend.

Back in the mid-60s we were collecting transaction information by keypunching it from retail records.  How much broadband do you think businesses would be consuming now if that application were still the driver of data movement?  At some point in the future, when every worker has a kind of super-personal-assistant in the form of a mobile device and uses that gadget in every aspect of their job, we’ll look back on today’s models of business productivity and laugh.  Same with entertainment.  But it’s just as laughable to assume we’d advance networking without mobility as to assume that punched cards could drive broadband deployment.

The battle for network supremacy and the battle for IT supremacy have always been symbiotic.  Cisco’s success was due as much to the impact of the PC on business networking, and the shift away from SNA that it created, as it was to the Internet, and maybe even more.  The question is whether the next big thing will be, as past ones have been, a step by a new player into a niche created by another, or a leap by a player who has both network and IT credentials.  Cisco and IBM, arguably the giants in their respective fields, hope it’s the latter and that they’ll do the leaping.  The standards processes, the VCs, and those who want to continue both network and IT populism hope that we can somehow do the former and advance as an industry.

Can we?  None of our past successes in networking or IT were fostered by standards and collective action.  I’d hope, as most of you likely do, that it can be different this time, but great advances in an information age are likely to demand great changes with massive scopes of impact, and it’s not going to be easy to let go of all our little projects and envision a great one.  But only a great change can bring great results.  Somehow we have to fuse IT and networking together, and into mobility.  Otherwise we’re going to cost-manage until we’re promoting accounting degrees instead of computer science degrees.