SDN Management: As Forgotten as NFV Management

I’ve talked quite a bit in 2014 about the operations issues of NFV, but much less about the issues associated with SDN.  We’re at the end of the year now, and so I’d like to rectify that, at least in part, by addressing SDN operationalization.  There are links to NFV, of course, but also SDN-specific issues.

To open this up, we have to acknowledge the three primary “models” of SDN.  We have the traditionalist OpenFlow ONF model, the overlay “Nicira” model, and the “API-driven” Cisco model.  Each of these has its own issues, and we’ll address them separately, but some common management points will emerge.

OpenFlow SDN presumes a series of virtual or white-box elements whose forwarding is controlled by an SDN Controller component.  Network paths have to be set up by that controller, which means that there’s a little bit of a chicken-and-egg issue with respect to a “boot from bare metal” startup.  You have to establish the paths either adaptively (by having a switch forward an unknown header to the controller for instructions) or explicitly based on a controller-preferred routing plan.  In either case, you have to deal with the fact that in a “cold start”, there is no controller path except where a controller happens to be directly connected to a switch.  So you have to build the paths to control the path-building, starting where you have access and moving outward.
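To make that bootstrap sequence concrete, here’s a minimal sketch in Python of the “build outward from where you have access” idea.  The switch names and link map are hypothetical, and a real OpenFlow controller would be programming flow tables rather than manipulating dictionaries; this just shows the ordering constraint.

```python
from collections import deque

def build_control_paths(links, root):
    """Cold-start bootstrap: only the switch directly attached to the
    controller (the root) is reachable at first.  Each newly reachable
    switch is programmed to relay control traffic for its neighbors,
    extending controller reach outward one hop at a time."""
    reachable = {root: [root]}
    queue = deque([root])
    while queue:
        sw = queue.popleft()
        for neighbor in links.get(sw, []):
            if neighbor not in reachable:
                # 'sw' already has a control path, so it can relay for
                # 'neighbor'; record the new path and keep expanding
                reachable[neighbor] = reachable[sw] + [neighbor]
                queue.append(neighbor)
    return reachable   # switch -> control path back to the controller

# Hypothetical topology, controller attached to s1
links = {"s1": ["s2", "s3"], "s2": ["s1", "s4"], "s3": ["s1"], "s4": ["s2"]}
paths = build_control_paths(links, "s1")
```

The breadth-first ordering is the point: a switch is never programmed until a relay path from it back to the controller already exists.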

In an operations sense, this means that telemetry from the management information bases of the real or virtual devices involved has to work its way along like anything else.  There will be a settling period after startup, but presumably that will end when all paths are established, and that should include management paths.  However, when there’s a problem it will be necessary for the controller to prioritize getting the paths from “devices” to controller established, followed by paths to and from the management ports.  How different the latter would be from the establishing of “service paths” depends on whether we’re seeing SDN being used simply to replicate IP network connectivity or being used to partition things in some way.

However this is done, there are issues related to the controller-to-device-to-management-model coordination.  Remember that the controller is charged with the responsibility for restoration of service, which means the controller should be a consumer of management data.  If a node fails or a trunk fails, it would be best if the controller knew and could respond by initiating a failure-mode forwarding topology.  You don’t want the management system and the controller stepping on each other, so I think it’s logical to assume that the management systems would view SDN networks through the controller.  In my own SDN model, the bottom layer or “Topologizer” is responsible for sustaining an operational/management view of the SDN infrastructure.  This is consumed by “SDN Central” to create services but could also be consumed on the side by an OSS/BSS/NMS interface.
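A toy illustration of that relationship, with the Topologizer as the single source of status that both the controller and any OSS/BSS/NMS consumers subscribe to.  The class and method names here are my own invention, not any real controller’s API:

```python
class Topologizer:
    """Sustains an operational/management view of SDN infrastructure;
    the controller and management-side consumers both subscribe to it,
    so they never step on each other by polling devices directly."""
    def __init__(self):
        self.status = {}      # element -> "up" or "down"
        self.listeners = []   # callbacks registered by consumers

    def report(self, element, state):
        changed = self.status.get(element) != state
        self.status[element] = state
        if changed:
            for listener in self.listeners:
                listener(element, state)

class Controller:
    """The controller is a consumer of management data: on a node or
    trunk failure it initiates a failure-mode forwarding topology."""
    def __init__(self, topologizer):
        self.failure_mode = False
        topologizer.listeners.append(self.on_status)

    def on_status(self, element, state):
        if state == "down":
            self.failure_mode = True   # recompute forwarding here
```

An NMS view would register a second listener on the same Topologizer, which is the sense in which management systems “view SDN networks through the controller” layer rather than around it.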

The boundary between the Topologizer and SDN Central in my model, the alignment of services with resources, is also a useful place for SDN management connections to be supported.  Service management requires that a customer-facing vision align with a resource-facing vision (to use old TMF terms) to get meaningful service status.  So if we took the model I reference in the YouTube video link already provided, you could fulfill operations needs by taking management links from the bottom two layers and pulling them into a contextual processing element that would look a lot like how I’ve portrayed NFV management—“derived operations”.

If we look at overlay SDN we see that the challenge here is the one I just mentioned—aligning the customer- and resource-facing visions.  Overlay SDN simply rides on underlying switching/routing as traffic.  There is a logical overlay-network topology that can be aligned with the physical network by knowing where the logical elements of the overlay are hosted.  However, that doesn’t tell us anything about the network paths.

Logically, overlay SDN should be managed differently (because it probably has to be).  It’s easier if you presume the “real” network is a black box that asserts service connections with associated SLAs.  You manage that black box to meet the SLAs but you don’t try to associate a specific service failure to a specific network failure; you assume your capacity management or SLA management processes will address everything that can be fixed.
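In code terms, black-box management reduces to comparing service-level measurements against the SLA and handing any violations to capacity or SLA management, with no attempt at fault correlation.  A hedged sketch, where the SLA fields and thresholds are purely illustrative:

```python
SLA = {"latency_ms": 50, "availability": 0.999}   # illustrative targets

def check_sla(measurements):
    """Compare black-box service measurements against the SLA and flag
    violations for capacity/SLA management, rather than trying to tie
    a service failure to a specific element inside the network."""
    violations = []
    if measurements["latency_ms"] > SLA["latency_ms"]:
        violations.append("latency")
    if measurements["availability"] < SLA["availability"]:
        violations.append("availability")
    return violations
```

Note what’s absent: there’s no topology input at all, which is exactly the simplification the black-box view buys you.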

Suppose we had an SDN connection overlay on top of an OpenFlow, central-control SDN transport network.  If a service model created the overlay network and my SDN “Cloudifier” layer associated it with a “physical SDN” service, we could presume we had the tools needed to do service-to-network coordination.  This suggests that even though an overlay SDN network is semi-independent of the underlying network in a topology sense, you may have to make it more dependent by associating the overlay connection topology with the underlying routes, or you’ll lose your ability to do management fault correlation or even effective traffic engineering by moving overlay routes.  Keep this point in mind when we move to our last model of SDN.
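The binding I’m describing could be as simple as recording, for each overlay connection, the underlay route it currently rides on; fault correlation then becomes a lookup.  A sketch, with hypothetical connection and link names:

```python
# Recorded bindings: each overlay connection -> the underlay route
# (a list of links) it currently rides on.  All names are made up.
overlay_to_underlay = {
    "vpn-a": ["s1-s2", "s2-s4"],
    "vpn-b": ["s1-s3"],
}

def affected_overlays(bindings, failed_link):
    """Fault correlation as a lookup: which overlay connections
    traverse the failed underlay link?"""
    return [conn for conn, route in bindings.items()
            if failed_link in route]
```

Without the recorded routes, the same question requires guessing from hosting locations alone, which is the gap the paragraph above describes.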

The “API model” SDN picture is easier in some sense, and harder in others.  Here the presumption is that “services” are policy-induced behavior sets applied through a chain of controllers/enforcers down to the device level.  This is in effect a black-box model because the service is essentially a set of policy invocations that are used to then drive lower-and-lower elements as appropriate.  It’s like “distributed central control” in that the policy control is central but dissection and enforcement are distributed.  When you want to know the state of something you’d have to plumb the depth of policy, so to speak.

Presumably, management variables would be introduced into this distributed-policy system at an appropriate, meaning local, level.  Presumably failures at a given level would create something that rose up to the higher level so alternatives could be picked, since the “failure” should have been handled using alternative resources at the original level had that been possible.  The problem, obviously, is all this presumption.  We really need to have a better understanding of how policy distribution, status distribution, and device and service state are related.  Until we do, we can’t say much about management here, but we can assume it would follow the general model of a “topology map,” a “service map,” and an intersection of the two that defines management targets from which we have to obtain status.
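If we take the escalation presumption at face value, it might look something like this sketch, where a failure is handled at its own level if spare alternatives exist there and otherwise rises up the policy hierarchy.  All the level and resource names are hypothetical:

```python
def handle_failure(level, hierarchy, alternatives):
    """Handle a failure at the level where it occurred if a spare
    alternative exists there; otherwise escalate to the parent level."""
    while level is not None:
        spares = alternatives.get(level, [])
        if spares:
            return level, spares[0]    # handled at this level
        level = hierarchy.get(level)   # rise to the higher level
    return None, None                  # nothing available anywhere

# Hypothetical three-tier policy hierarchy: device -> domain -> central
hierarchy = {"device": "domain", "domain": "central", "central": None}
alternatives = {"device": [], "domain": ["backup-path-7"], "central": []}
```

The unanswered question from the paragraph above is what status actually flows upward alongside the escalation, and that’s exactly what the API model hasn’t specified.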

The common thread here is that all the “SDN” mechanisms (not surprisingly) abstract services away from resources.  So, of course, does traditional switching/routing.  But remember that one of the goals of SDN was to create greater determinism, and that goal could end up being met in name only if we lose the management connections between the services and the resources that “determine” service behavior.  We’ve underplayed SDN management, perhaps even more than we’ve done for NFV management.

NFV management principles could save things, though.  I believe that the principles of “derived operations” that synthesize a service-to-resource management connection by recording the binding made when the abstraction/virtualization of a service is realized in NFV could be applied just as easily to SDN.  The problem, for now at least, is that nobody on either the SDN or NFV side is doing this, and I think that getting this bridge built will be the most important single issue for SDN to address in 2015.
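The core of “derived operations” is just this: record the service-to-resource binding when the abstraction is realized, then synthesize service state from resource state through that recorded binding.  A minimal sketch of the idea; the naming is my own, not anything from the SDN or NFV specifications:

```python
bindings = {}   # service -> resources recorded at realization time

def realize(service, resources):
    """Record the binding made when the abstraction/virtualization of
    a service is realized; derived operations consults it later."""
    bindings[service] = list(resources)

def derived_status(service, resource_status):
    """Synthesize service state from the state of its bound resources,
    giving the service-to-resource management connection the specs lack."""
    states = [resource_status.get(r, "unknown") for r in bindings[service]]
    return "operational" if all(s == "up" for s in states) else "degraded"
```

Nothing in this is NFV-specific, which is the point: the same recorded binding works whether the resources are VNF hosts or SDN forwarding paths.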

I wish you all a very happy and prosperous New Year!

Illusion is the Enemy of Progress

There are a lot of illusions in life, and thus not surprisingly in networking.  One of the illusions is that there is great opportunity to be had by generating on-demand high-capacity services.  Another, possibly related to the first, is that there’s a high value to vertically integrating the operation of networks from the service layer all the way down to optics.  Offer fiber bits for a day and it will make you rich.

Frankly, I like illusions.  I wish most of the fables I’ve heard were true.  That’s not going to make it happen, though, and these two example illusions are no exception.  We have to look at why that is, and what might happen instead.  In the process, we’ll expose the biggest illusion of all.

High-capacity connections exist because there are concentrations of users, resources, or both and they need to be connected.  Most such concentrations occur in buildings because neither people nor servers do well out in the open.  Buildings don’t come and go extemporaneously, and so the need to connect them doesn’t either.  Businesses network facilities because that’s where their assets are, and those facilities are persistent because they’re expensive, big, and largely nailed down.

I’ve surveyed users for decades on their communications needs.  One thing that’s been clear all along is that site network capacity is a game of averages.  You get enough to support needs and even bursts on the average, and you make do.  If you were offered extemporaneous bandwidth like many propose, what you’d do is run the math on what it would cost to continue to play the averages game, versus buying only what you need when you need it.  If you found that averages were cheaper, you’d stay the course.  Which means that on-demand capacity is not a revenue opportunity, but a revenue loss opportunity.  If you get lucky, they don’t buy it.  Does that sound like a good business plan to you?

There are situations where burst capacity is useful, but for the network operator these are found inside existing or emerging services and not at the retail level.  CDNs can always use a burst here and there, and the same will be increasingly true in evolving cloud-based services (what I’ve called point-of-activity empowerment).  In these examples, though, we’re seeing operators deciding to augment capacity and not selling on-demand services, so there’s no need to couple service-layer facilities to transport.

The service-to-transport coupling thing seems to be an invention of SDN proponents.  The sad truth is that our industry is almost entirely PR-driven.  People say what they need to say to get “good ink”, and so when something gets publicity there’s a bandwagon effect.  SDN and OpenFlow were really aimed at L2/L3 handling, but we spent a lot of time shoehorning OpenFlow principles into optics.  Think about it; here’s an architecture based on packet-level forwarding being applied to an opaque trunk or wavelength.

The worst thing is that it’s not needed, and in fact counterproductive.  As you go lower in the protocol stack you move away from “connecting” into “aggregating”, which means that your facilities are shared.  So here we are, proposing that service buyers could spin up some optics to give themselves a burst of speed?  How does that impact the capacity plans on which optical networks are based?

Every layer of a network needs to obtain service and an SLA from the layer below.  The responsibility for reconfiguration of connectivity or realignment of resources at a layer has to be within the layer itself, where the capacity plan under which the SLA is valid is known.  You don’t vertically integrate control of network resources.  You integrate a stack of services that create a retail SLA by enforcing capacity plans where aggregation takes place.  Services controlling multi-tenant resources is anarchy; everyone will want to go to the head of the line.
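The layered-SLA point can be expressed as a sketch: each layer consumes the SLA asserted by the layer below, adds its own budget, and never reaches into the resources underneath it.  The latency numbers here are purely illustrative:

```python
class Layer:
    """A layer obtains a service and SLA from the layer below and
    asserts its own SLA upward; reconfiguration stays inside the
    layer, so nothing above controls lower-layer resources."""
    def __init__(self, name, below=None, margin_ms=10):
        self.name = name
        self.below = below          # the layer this one consumes
        self.margin_ms = margin_ms  # this layer's own latency budget

    def sla_latency(self):
        # Base case: the bottom layer asserts a fixed transit latency.
        base = self.below.sla_latency() if self.below else 5
        return base + self.margin_ms

optical = Layer("optical")                   # asserts 5 + 10 = 15 ms
grooming = Layer("grooming", below=optical)  # 15 + 10 = 25 ms
service = Layer("service", below=grooming)   # 25 + 10 = 35 ms
```

The retail SLA at the top is just the composition of the layer SLAs, which is exactly what vertical integration of control would destroy: a service spinning up its own optics invalidates the capacity plan that makes the lower SLA valid.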

Why then, if so much of the view of future services is nonsense, do we hold to it?  The answer lies in that great central illusion I mentioned.  The problem is that we are conditioned to think of the future of services and networks in terms of the present.  Our vision of NGN is that there isn’t going to be any—that the future network will simply be a bigger version of the current one.  We’ll use capacity exactly as we do now, but more of it.  Crap.

As bandwidth cheapens, it becomes useless except as a carrier of services.  Higher capacity introduces the potential for agility, to be sure, but we don’t have agile locations.  What changes things is mobility.  If we forget networking sites and start networking people, we get a completely different demand profile.  A single person moving about in a mobile world and asking their “personal digital assistant” for help with this or that is a very bursty user of information technology, unlike a group of fixed people in an office.  Their service could look different.

Not different in terms of “communications” though.  Sitting or standing or walking, they have the same voice and the same texting fingers.  What’s different is their information needs and the optimum way to satisfy them.  What shapes the future into something new and different and interesting is that workers and consumers will be empowered individually at the points of their activity.  That new model encourages us to view not networking differently, but IT differently.

In the future, computing and information are part of a fabric that supports movement of processing and capacity like the tides.  When people want to know something, we assemble what’s needed to answer their question and then send them the answer.  All the work is done in the IT fabric, and all the connection dynamism that exists is within that fabric.  Old-fashioned connection services don’t change much, but that’s a good thing, because enormous downward price per bit can’t be survived if you sell bits, but is essential if you’re selling a fabric of knowledge and decision support.

The cloud, as we conceive it today, is another stupid illusion, not because cloud computing is stupid but because we’re thinking stupidly about it.  This is not about utility computing; there’s no profit in it in the long run.  IaaS is hosted server consolidation, and the only reason anyone would “move an application to the cloud” is to get a lower price.  You make markets vanish to a point with that kind of thinking.  You certainly don’t build the massive investments that change the nature of computing or networking.  The cloud will be transformational because we’ll build the cloud with a transformational mission, and it’s that mission that will transform networking.  Not by transforming the services we now buy, but by giving us new service models at the information and knowledge level that we can only glimpse today.

Whenever somebody talks about new service opportunities, ask them how the same bit-pushing with a different pricing model is “new”.  Tell them you know what “new” really means.  Break some illusions, break some vendor hearts, and break out into the future.

Summing Up the Vendorscape For NGN

We’ve now looked at all of the classes of players who might be transformative influences on the road to NGN.  I hope that the exploration brought out some critical points, but in case they didn’t this is a good time to sum them up.  We’re heading into 2015, the year that just might be the most critical in network evolution since the ‘90s.  We won’t get to our NGN destination in 2015, but we’ll almost certainly set the path we’ll take.

Operators have told me many times that while they’re interested in evolutionary visions from people like me, they’re dependent on evolutionary visions from people with real products to change the shape of their networks.  This creates an almost-love-hate relationship between vendors and operators, a relationship you see at work in standards groups.  Operators think vendors are trying to drive them to a future model of dependency, and vendors think operators are mired in useless intellectual debates when there’s a need to do something—either advance or stop trying and make do.

All of this is taking place in the climate of steady erosion in revenue per bit.  The Internet and its all-you-can-eat pricing model has created a rich growth medium for businesses that rely on bits for delivery.  There is no question that consumers have benefitted from the result, and no question that startups and VCs have benefitted even more.  We’ve paid a price in loss of control of personal information, security problems, and so forth.  Who lost, though?  Mostly the operators, who have historically sold bits and watched that process become less profitable over time.

As I said in introducing this NGN series, things can’t go on this way and they won’t.  We have two possible ways to mitigate falling returns on infrastructure investment besides the obvious one of curtailing that investment.  One is to find a better way to reduce costs, so the revenue/cost curves don’t converge as fast, or at all.  The other is to find new services whose revenues aren’t explicitly generated by bit-pushing, services that offer a better ROI.

I think both of these remedies are going to be explored.  The question is the extent to which we’d rely on each, and how the differences between the approaches would impact the network, now and in the future.

If we were to find a magic service touchstone that could add billions and billions to the carrier pie, we could delay or even eliminate the crossover between revenue and cost.  Had this been considered and supported properly ten years ago, it’s very possible that there would be little pressure on capex even with the current network equipment model.  Operators had “transformation” projects back then, but they were 100% dissatisfied with vendor support of them.  The “Why” is important because it’s still influencing things today.

The fact is that network equipment vendors don’t want transformation from a bit-driven revenue model to something else.  They believe the “something else”, which even then was known to be about software and servers, would simply dilute their influence.  It’s my view that we’re past the point where an “integrated” strategy would work.  We are now committed to having layered infrastructure, with a cloud layer, a grooming layer, and an optical layer.

This structure is important from a cost management perspective.  Cost management changes, of course, are easiest to make at the point where you’re making incremental investment or where you’re able to produce savings to cover the cost of the changes.  Were we to try to control network costs entirely with bits, we’d have to build upward from the optical to create a very efficient grooming layer that could then marry with the service layer.  Optical players would have to push SDN strategies far above optics to create that marriage, and vendors like Alcatel-Lucent who had both optical products and strong SDN positions would be favored.

It’s easier to see the transformation happening at the top, and moving downward.  Comparatively, we’re under-invested in the cloud layer today.  Most of the services we have today can be visualized as application- or service-specific subnetworks that unite cloud-hosted features with cloud-hosted application and service segments.  This model could create not only a more efficient service network, it could also be more secure and more agile in producing new revenues.  And all the agility everyone wants generates more variability in the mapping of features to resources.  We’ve proven with both NFV and SDN that we don’t even have the current level of resource-mapping agility properly operationalized.

The challenge that has to be faced here, no matter where you start, is that of operations efficiency.  Hosted assembled stuff is more complex than boxes, and that means more costly to run.  The difficulty you run into starting at the bottom is that you’re exposed to operational burdens of complexity slowly (which is good) but you’re also exposed to the benefits slowly (which is bad).  Evolving upward from optics is hard for operators.  They need to see benefits to offset any major changes in infrastructure.  It’s difficult for optical vendors, who have to promote a skyscraper-sized change from the basement of the building.  A requirement to support too much buyer education to make a sale makes salespeople unproductive.

So what this says, IMHO, is that operations changes really have to start with vendors in the cloud or service layer, which in the language of modern technology trends means “SDN” or “NFV.”  There’s been little progress in the details of SDN operations, and no indication that SDN operations practices could grow to envelop cloud elements.  This is why I put so much emphasis on an NFV strategy for vendors who aspire to NGN success.  However, NFV has focused so far only on the deployment differences between virtual and real infrastructure, when the real problem is that we’re not managing even real infrastructure optimally, and when ongoing management is more important than deployment.

Operations changes that start at the top can develop not only improved costs for current services and infrastructure, they can improve agility and thus theoretically improve revenues over time.  The ideal situation is that you come up with an operations approach that spreads over everything and immediately improves operations efficiency.  That would generate quick relief from the revenue/cost convergence, so much so it could fund some of your evolutionary infrastructure and service steps.

For 2015, we need to look at the higher layers of the network—the cloud and the virtual binding between the cloud and “the network” in its legacy sense.  All of the vendors I’ve mentioned could do something here, but obviously the ones to watch the most are those who either have a natural position in that critical juncture (IT giants) or who have some elements of the solution at hand and are simply searching for a business model to promote them.  I think we’ll see progress, perhaps as early as the end of Q1.

Can Second-Tier Network Vendors Win in NGN?

You generally find revolutionaries in coffee shops, not gourmet dining rooms or private clubs.  In the race for the right to shape the network of the future, the equivalent of a coffee shop is “second-tier” status.  You can see the candy through the window (to mix a metaphor) but can’t quite get at it—unless you break the window.  Brocade, Extreme, Juniper, and Overture are examples of this group, with Brocade and Juniper as “logical contenders” and Extreme and Overture as examples of possible upsets.

All of these players are squarely in that L2/L3 network space that, under my NGN model, is a prime target to get virtualized out of existence.  On the one hand, being virtualized is a bit like being emulsified—you probably would like the result more than the process.  On the other hand, the market leaders would like the process a lot less, which means these second-tier players might have a chance to take a lead in the transformation and gain market share while their bigger rivals are defending the past.

This proposition frames the optimum strategy for the second tier pretty well.  You have to be two things to be an NGN giant starting from the second string.  One is a purveyor of a clear evolutionary strategy, one that gets people committed to you without requiring them to fork-lift all their current gear.  The other is a visionary.  Nobody is going to hop from rock to rock to cross a flood if they don’t have a compelling reason to get to the other side.

If you lined up a table of second-tier players with ticks for points of strength and stake in the outcome, it would be hard to find somebody with more promise than Brocade.  The company is a combination of a data center switching vendor and a virtual router vendor, at a time when data centers are the future of the cloud layer and virtual routing is the future of L2/L3.  They have no major assets they have to protect, so they could mark a lot of others’ territory without stepping in anything unpleasant.  They have good operator engagement in both SDN and NFV, good representation in the standards groups that count, and a new CMO who has a reputation for aggression.  For now, though, the aggression isn’t manifesting as I’d hoped.  Brocade can support the NGN evolution but they don’t have the positioning to drive the engagement.  In a market where operators have to take a significant risk to evolve, you can’t win if you don’t inspire.

Juniper, sadly, looks a lot like the opposite.  They have a fixation on boxes and chips, to the point where their CTO wants to talk them up to the Street no matter how convincing the trend toward a software-hosted future might be.  They have an activist investor who wants near-term share appreciation when the first steps of a revolution could put revenues at risk.  They haven’t been a strong marketing company in a decade, they’ve lost a lot of their key people in recent CEO shuffling, and they have never done a good software project or made a stellar acquisition.  But…they have some incredible technical assets, including base architectures that are cloud-, SDN-, and NFV-friendly that emerged before anyone had heard of the concepts.  Strong leadership, strong marketing, and a firm hand with the Street could even now propel these guys into being what they could have been years ago.  But getting those three things may be an insurmountable problem at this point.

So if the kings of the second string aren’t leaping at the throats of the establishment, are there any a bit further back in the pack who might?  Well, imagine Brocade without the virtual router.  You’d have Extreme (sort of, at least).  Extreme was one of the major switching players of the past, left behind as the market giants jumped in with broader portfolios and better account control.  The company’s SDN position is not only data-center-centric, it’s cloud-centric for the operators, and they’re not a server/cloud vendor.  It’s also primarily an OpenDaylight positioning.  They have no real NFV position.  All this is bad unless you look at things as a blank slate.  There are plenty of assets out there that could be combined to create something very good, and Extreme has no barriers to picking one up.  And unlike rival-in-the-data-center Arista, Cisco isn’t suing them.

Then we have Overture Networks.  Overture is a very narrow player in networking, a Carrier Ethernet second-tier player in fact.  While they have good carrier engagement and are not strictly a startup, they are a private company, not a public one.  Given these points, you could be justified in thinking that Overture has no business in this piece, but you’d be wrong.  Their NFV orchestration approach has always been one of the very strongest in the market, and their understanding of the realities of NFV and SDN in the business service space is similarly strong.  What’s held Overture back, and may still be doing that, is the fact that they’ve been unwilling to leap wholeheartedly into the NGN deep end.  If you sell Carrier Ethernet gear for a living it’s easy to understand why you’d be reluctant to have your sales people out there shilling for total network revolution.  Somebody else would get all the money.  But Overture is very close to being able to make a complete NFV business case, good enough to take most PoCs into a field trial.  If they go the rest of the way…well, you can guess.

You probably see the basic truth of this group at this point—sometimes those with fewer assets fight harder to protect them.  The challenge is that there is zero chance that a second-tier player could follow a market to success; they’d be left in the dust of their bigger rivals.  SDN and NFV leadership aren’t going to be attained by yelling the acronyms out while facing the nearest reporter, not anymore.  Trials are critical now, and you have to be able to present not only real assets but an actual business case to win in 2015.  If you don’t have something fairly progressive already in place at this stage, there is little chance to do that.

To me, Overture is the player to watch in this group.  While I still believe they underplay their own assets, they have assets to play.  There are perhaps two companies that could actually make a business case for NFV at this point, and they’re one of them.  However…they just don’t have a lot of upside unless it’s to get bought.  Carrier Ethernet gear isn’t the big win area for NFV, it’s servers and perhaps data center switching.  A marriage or merger of Brocade and Overture might be compelling; in fact, it may be the only way that an NGN winner could emerge from this particular segment of the market.

Can the Optical Guys Get Out of the NGN Basement?

“The times they are a’changing”, as the song goes.  The pace and direction of the changes could be influenced by vendors agile and determined enough to get out there and take some bold steps.  We’ve looked at the IT giants who have the most to gain from a transition to a software-server vision of networking.  For all their assets (and likely ambitions) most of them are somewhat networking outsiders.  Cisco, the one who most clearly isn’t that, is far from being determined to be a proponent of software-server-centric shifts.  If we continue exploring the NGN transformation vendor landscape in order of decreasing potential gains, our next group is the optical vendors.

All of the optical vendors out there would benefit from a network transformation that skimped on spending on the L2/L3 part of the network, even if some of the savings went into buying software and servers.  Nobody believes that capacity needs won’t rise in the future, even if cost pressures increase.  Bits start in the optical layer.  The challenge for the optical vendors is creating a strategy that brings about a concentration of network-device spending on their own layer, almost certainly meaning extending their reach upward into the grooming part of the network.  This has to happen at a time when service strategy moves by operators seem to favor software/server vendors, far from the optical plane.

If you look at any realistic NGN vision, it includes some kind of “shim” between the optical part of the network and the service networks that could be created using overlay, hosted SDN, and NFV technology.  Today that shim is created by the L2/L3 infrastructure, meaning switches and routers.  Some of the features of the current L2/L3 can migrate upward to be hosted, but others will have to migrate downward toward the optical boundary.  I’ve suggested that an electrical grooming layer would emerge as the NGN shim, based on SDN technology and manipulating packet flows where optical granularity wasn’t efficient.  The optical players, in order to maximize their own role in the future, need to be thinking about doing or defining this new shim.

Adva, Alcatel-Lucent, Ciena, Fujitsu Network Communications, and Infinera all have optical network gear, and all have packet-optical strategies.  There are some differences in their approaches, the most significant perhaps being the extent to which vendor packet-optical strategies are influenced by current L2/L3 devices and positioning.  If we expect the NGN of the future to look different than networks of the present, it’s inevitable that the differences be primarily in how service-layer elements bind to optics.  A focus on current L2/L3 would tend to optimize evolution by reducing the differences between the then and the now.  We’ll look at how that, and other issues, impact vendors in alphabetical order as before.

Adva Optical is one of the smallest of our vendors, certainly in terms of number of web hits on “packet optical” plus their name.  Adva has partnered with Juniper to create a unified solution for packet optical, something that has plusses and minuses.  On the plus side, they aren’t committed to their own ossified L2/L3 approach, but on the minus side that’s likely what they’ll get from Juniper.  Partnerships are a very tough way to support a revolutionary change because they multiply the politics and positioning challenges.  Who ever heard of an aggressive partnership?  It’s going to be hard for Adva to be a conspicuous driver of progress toward a new optical/electrical harmony for NGN.

Alcatel-Lucent is an opposite, size-wise, but it may have some of the same issues as Adva.  Unlike Adva's, though, its issues are created by collision with Alcatel-Lucent's own products at L2/L3.  You don't have to be a market genius to understand how important switching/routing is to the company (Basil Alwan is the star of most of the company's events).  On the other hand, Alcatel-Lucent did announce a virtual router, and their Nuage SDN stuff is among the best in the industry.  So while Alcatel-Lucent is certainly unlikely to rush out and obsolete its star product family, it is in a position to support a graceful transformation if the industry really starts to move toward constraining equipment spending at L2/L3.  Combine that with Alcatel-Lucent's strong position in NFV (CloudBand), and you have a vendor who clearly has the pieces of a next-gen infrastructure.  They also have "virtual mobile infrastructure" elements like IMS and EPC, a strong position in content delivery, and good carrier engagement.  Their optical stuff integrates well with all of this and even has generally harmonious management.  This isn't an unbeatable combination, but it's a strong one.  However, they're likely to take the "fast follower" approach Cisco made famous, so we have to look for leadership from another player.

Ciena is an interesting player, with more PR traction for its approaches than the smaller players and also some very specific directions in NFV, something that only Alcatel-Lucent among the others can claim.  They have a packet-optical strategy that's a good fit to evolve to electrical grooming and no strong L2/L3 incumbency to protect.  While their new NFV approach is more a service ("pay-as-you-earn") than a product, there is a product element to it, and Ciena says they have a strategy to let users wean away from the service-based model as they develop their markets.  Their Agility SDN controller (developed with Ericsson) is OpenDaylight-based and a good foundation for an evolving agile grooming layer.  On the negative side, they still haven't created a cohesive picture of NGN and aligned their pieces to support it.

Fujitsu Network Communications (FNC) is a bit of an optical giant.  They have a wide variety of optical network devices and an Ethernet-based electrical aggregation and grooming approach that gets good scores from the operators.  While their electrical-layer positioning is very traditional, it certainly has the potential to evolve to the kind of model that I think will prevail by 2020.  What FNC lacks is positioning, perhaps more so than any vendor in the space.  Their SDN material, for example, is a useful tutorial but doesn’t provide enough to drive a project or even generate specific sales opportunities.  They have little presence in NFV and virtually nothing in the way of positioning or collateral.  I agree that NFV is above the optical space, but without talking about NFV and the cloud you can’t talk about NGN evolution.  These guys could be as big in the NGN space as anyone, or bigger, if they could sing and dance.

Infinera is our final optical player, a small and specialized one but big enough to have potential.  They recently hired Stu Elby from Verizon, one of the leading thinkers on the evolution of carrier networks under price/cost pressure.  Infinera also has the DTN-X, arguably the most agile optical/packet combination available, and a very practical transport-SDN example (based on a Telefonica case) on their site.  Their DTN-X brochure has a diagram that is pretty congruent with the model of NGN I think will prevail, and this in a product brochure!  It's impressive, for an optical player.  Their NFV positioning is very minimal, but more interesting and engaging than FNC's, for example.  If they made it better, if they created an explicit bridge to the service layer, they could be powerful.

That "for an optical player" qualifier is the significant one here.  None of these players is setting the world on fire, positioning-wise.  Ciena and Infinera seem to be the leaders from the perspective of a strong and evolvable packet-optical-and-SDN story, but Alcatel-Lucent and FNC have the market position, should they elect to make a bold move.  For all these players, the future may hinge on how well and how fast the IT giants move.  A strong story in the service or "cloud layer" of the network could effectively set standards for the grooming process, which would tend to reduce opportunities for optical players to differentiate themselves.

That’s the critical point.  Bits are ones and zeros; not much else you can say.  If management and grooming are dictated from above then optical commoditization is the result of NGN evolution, not optical supremacy.  It’s going to come down to whether the service side or the transport side moves most effectively.

The Server Giants and NGN

Next year is going to be pivotal in telecom, because it’s almost surely going to set the stage for the first real evolution of network infrastructure we’ve had since IP convergence twenty years ago.  We’re moving to the “next-generation network” everyone has talked about, but with two differences.  First, this is for real, not just media hype.  Second, this “network” will be notable not for new network technology but for the introduction of non-network technology—software and servers—into networking missions.

Today we begin our review of the network players of 2015 and beyond, focusing on how these companies are likely to fare in the transition to what’s popularly called “NGN”.  As I said in my blog of December 19th, I’m going to begin with the players with the most to gain, the group from which the new powerhouse will emerge if NGN evolution does happen.  That group is the server vendors, and it includes (in alphabetical order) Cisco, Dell, HP, IBM, and Oracle.

The big advantage this group has is that they can expect to make money from any network architecture that relies on hosted functionality.  While it’s often overlooked as a factor in determining market leadership during periods of change, one of the greatest assets a vendor can have is a justification to endure through a long sales cycle.  Salespeople don’t work for free, and companies can’t focus their sales effort on things that aren’t going to add to their profits.  When you have a complicated transformation to drive, you have to be able to make a buck out of the effort.

The challenge that SDN and NFV have posed for the server giants is that the servers that are the financial heart of the SDN/NFV future are part of the plumbing.  “NFVI” or NFV Infrastructure is just what you run management, orchestration, and virtual network functions on.  It’s MANO and VNFs that build the operators’ business case.  So do these server players try to push their own MANO/VNF solutions and risk limiting their participation in the server-hosted future to those wins they get?  Do they sit back to try to maximize their NFVI opportunity and risk not being a part of any of the early deals because they can’t drive the business case?

The vendor who's taken the "push-for-the-business-case" route most emphatically is HP, whose OpenNFV architecture is the most functionally complete and is now largely delivered as promised.  A good part of HP's aggression here may be due to the fact that they're the only player whose NFV efforts are led by a cloud executive, effectively bringing the two initiatives together.  HP also has a partner ecosystem that's actually enthusiastic and dedicated, not just hanging around to get some good ink.  HP is absolutely a player who could build a business case for NFV, and their OpenDaylight and OpenStack support means they could extend the NGN umbrella over all three of our revolutions: the cloud, SDN, and NFV.  They are also virtually unique in the industry in offering support for legacy infrastructure in their MANO (Director) product.

Their biggest risk is their biggest strength—the scope of what they can do.  You need to have impeccable positioning and extraordinary collateral to make something like NFV, SDN, or cloud infrastructure a practical sell.  Otherwise you ask your sales force to drag people from disinterested spectators to committed customers on their own, which doesn’t happen.  NGN is the classic elephant behind a screen, but it’s a really big elephant with an unprecedentedly complicated anatomy to grope.  Given that they have all the cards to drive the market right now, their biggest risk is delay that gives others a chance to catch up.  Confusion in the buyer space could do that, so HP is committed (whether they know it or not) to being the go-to people on NGN, in order to win it.

The vendors who seem to represent the “sit-back” view?  Everyone else, at this point, but for different reasons.

Cisco’s challenge is that all of the new network technologies are looking like less-than-zero-sum games in a capital spending sense.  As the market leader in IP and Ethernet technologies, Cisco is likely to lose at least as much in network equipment as it could hope to gain in servers.  Certainly they’d need a superb strategy to realize opex efficiency and service agility to moderate their risks, and Cisco has never been a strategic leader—they like “fast-followership” as an approach.

Dell seems to have made an affirmative choice to be an NFVI leader, hoping to be the fair arms merchant and not a combatant in making the business case for NGN.  This may sound like a wimpy choice, but as I've noted many times NGN transformation is very complicated.  Dell may reason that a non-network vendor has little chance of driving this evolution, and that if they fielded their own solutions they'd be on the outs with all the network vendors who push evolution along.  Their risks are obvious: they miss the early market and miss chances to differentiate themselves on features.

IBM’s position in NFV is the most ambiguous of any of the giants.  They are clearly expanding their cloud focus, but they sold off their x86 server business to Lenovo and now have far less to gain from the NGN transformation than any of the others in this category.  Their cloud orchestration tools are a strong starting point for a good NFV MANO solution, but they don’t seem interested in promoting the connection.  It’s hard to see why they’d hang back this long and suddenly get religion, and so their position may well stay ambiguous in 2015.

Oracle has, like HP, announced a full-suite NFV strategy, but they've yet to deliver on the major MANO element and their commitment doesn't seem as fervent to me.  Recall that Oracle was criticized for pooh-poohing the cloud, then jumping in when it was clear that there was opportunity to be had.  I think they're likely doing that with SDN, NFV, and NGN.  What makes this strategy a bit less sensible is that Oracle's server business could benefit hugely from dominance in NFV.  In fact, carrier cloud and NFV could single-handedly propel Oracle into second place in the market (they recently slipped behind Cisco).  It's not clear whether Oracle is still waiting for a sign of NFV success, or will jump off from their new positioning to make a go at market leadership.

I'm not a fan of the "wait-and-hope" school of marketing, I confess.  That makes me perhaps a secret supporter of action, which in turn makes me more sympathetic to HP's approach than to those of the others in this group.  Objectively, I can't see how anyone can hope to succeed in an equipment market whose explicit goal is to support commodity devices, except on price and with much pain.  If you don't want stinking margins you want feature differentiation, and features are attributes of the higher layers of SDN, NFV, and the cloud.  If those differentiating features are out there, only aggression is going to get to them.  If they're available in 2015 then only naked aggression will do.  So while I think HP is the lead player now, even they'll have to be on top of their game to get the most from NGN evolution.

Segmenting the Vendors for the Network of the Future

Over the past several months, I’ve talked about the evolution in networking (some say “revolution” but industries with 10-year capital cycles don’t have those).  Along the way I’ve mentioned vendors who are favored or disadvantaged for various reasons, and opened issues that could help or hurt various players.  Now, I propose to use the last days of 2014 to do a more organized review.  Rather than take a whole blog to talk about a vendor, I’m going to divide the vendors into groups and do a blog on each group.  This blog will introduce who’s in a group and what the future looks like for the group as a whole.  I’ll start at the top, the favored players with a big upside.

I want to open with a seldom-articulated truth.  SDN and NFV will evolve under the pressure of initiatives from vendors, who will profit in proportion to the effort they have to expend.  A player with a giant upside is going to be able to justify a lot of marketing and sales activity, where one that's actually just defending against a slow decline may find it hard to do that.  We might like to think that this market could be driven by up-and-comers, but unless they get bought they can't expect to win or even survive.  You don't trust itty-bitty companies to drive massive changes.

And what's the biggest change?  Everything happening to networking is shifting the focus of investment from dedicated devices toward servers and software.  It follows that the group with the biggest upside is the server vendors themselves, particularly those with strong positions in the cloud.  This group includes Dell and HP, with a nod to Cisco, IBM, and Oracle.

The strength of this group is obviously that they are in the path of the transition; more of what they sell will be consumed in the future if current SDN and NFV trends mature.  The reason we're only nodding to Cisco, IBM, and Oracle is that none of them is primarily a server player.  Cisco's network device incumbency means it's at risk of losing more than it gains.  IBM and Oracle are primarily software players, and thus would have to establish an unusually strong software position.

The second group on my list is the optical incumbents.  In this group, we’ll count Ciena, Infinera, and Adva Optical, with a nod to Alcatel-Lucent.  The advantage this group holds is that you can’t push bits you don’t have, and optical transport is at the heart of bit creation.  If we could learn to use capacity better, we could trade the cost of capacity against the cost of grooming bandwidth and operating the network.

Optical people can’t stand pat because there’s not much differentiation in something that’s either a “1” or a “0”, but they can build gradually upward from their secure base.  The pure-play optical people have a real shot at doing something transformational if they work at it.  Alcatel-Lucent gets a “nod” because while they have the Nuage SDN framework, one of the very best, they still see themselves as box players and they will likely stay focused on being what they see in their mirror.

The third group on the list is the network equipment second-tier players.  Here we'll find Brocade, Juniper, and Extreme, with a nod to the other smaller networking players.  Like the bigger network players, this group is at risk if money shifts out of network-specific iron to servers, but they don't have the option of standing pat.  All these companies would die in a pure commodity market for network gear, and that's where we're heading.  Brocade realizes this; the rest seem not to.

What makes this group potentially interesting is that they have a constituency among the network buyers that most of the more favored groups really don’t have.  They could, were they to be very aggressive and smart with SDN and NFV, create some momentum in 2015 that could be strong enough to take them into the first tier of vendors or at least get them bought.  They could also fall flat on their face, which is what most seem to be doing.

The fourth group is the network incumbents, which means Alcatel-Lucent, Cisco, Huawei, and NSN, with a nod to Ericsson and the OSS/BSS guys.  The problem for this group is already obvious; at any hint that the future network won't look like the current one, everyone will tighten their purse strings.  Thus, even if these guys could expect to take a winning position in the long run, they'd suffer for quite a few quarters.  Wall Street doesn't like that.

Ericsson and the OSS/BSS players here are somewhat wild cards.  Operations isn’t going to drive network change given the current political realities among the telcos and the size of the OSS/BSS upside.  Ericsson has augmented its operations position with a strong professional services initiative, and this gives them influence beyond their operations products.  However, integration of operations and networking is a big issue only if somebody doesn’t productize it effectively.

Virtually all of the players (and certainly all of the groups) are represented in early SDN and NFV activity, but what I see so far in the real world is that only three vendors are really staking out a position.  HP, as I’ve said before, has the strongest NFV and SDN inventory of any of the larger players and it’s in the group that has the greatest incentive for early success.  Alcatel-Lucent’s broad product line and early CloudBand positioning helped it to secure some attention, and Ericsson is taking advantage of the fact that other players aren’t stretching their story far enough to cover the full business case.  In theory, any of these guys could win.

That "business case" comment is the key, IMHO.  SDN and NFV could bring massive benefits, but not if they're focused on per-box capex savings.  If all we're trying to do is build the same networks we have today, but with cheaper gear, then Huawei wins and everyone else might as well start making toys or launching social-network businesses.  Operators now say that operations efficiency and service agility are the real drivers needed.  The players who can extend their solutions far enough to achieve both of these benefits even usefully, much less optimally, will drive the market in 2015 and 2016.  If one or two manage that while others languish, nobody else will have a shot and the industry will remake itself around the new giants.  That could be transformational.

Starting next week, I’ll look at these groups in detail and talk about who’s showing strength and who seems to be on their last legs.  Check back and see where your own company, your competitors, or your suppliers fall!

Raising the Bar on SDN and Virtual Routing

One of the questions being asked by both network operators and larger enterprises is how SDN can play a role in their future WAN.  In some sense, though it’s an obvious question, it’s the wrong one.  The broad issue is how virtualization can play; SDN is one option within that larger question.  If you look at the issues systematically, it’s possible to uncover some paths forward, and even to decide which is likely to bear the most fruit.  But most important, it’s likely you’ll realize that the best path to the future lies in the symbiosis of all the virtualization directions.  And realize why we may not be taking it already.

Virtualization lets us create real behaviors by committing resources to abstract functionality.  If we apply it to connection/transport (not to service features above the network) there are two ways that it can change how we build networks.  The first is to virtualize network devices (routers and switches) and then commit them in place of the real thing.  The second is to virtualize network paths, meaning tunnels.  I would assert that in the WAN, the first of these two things is a cloud/NFV application and the second is an SDN application.

When a user connects to a service, they get two tangible things.  One is a conduit for data-plane traffic to deliver stuff according to the forwarding rules of the service.  The other is “control plane” traffic that isn’t addressed to the other user(s) but to the network/service itself.  If you connected two users with a pipe that carried IP or Ethernet, chances are they wouldn’t be able to communicate because there would be control exchanges expected that couldn’t take place because the network elements designed to support them didn’t exist.

SDN in OpenFlow form doesn’t do control packets.  If we want an SDN network to look like an Ethernet or router network, we have to think in terms of satisfying all of the control- and data-plane relationships.  For IP in particular, that likely means providing a specific edge function to emulate the real devices.  The question becomes “why bother?” when you have the option of just deploying virtual routers or switches.

We couldn’t build the Internet on virtual routing alone; some paths have too much traffic in aggregate.  What we could do is to build any large IP network for an individual user, or even individual service, by segregating its traffic below the IP layer and doing its routing on a per-user, per-service basis.  That’s the biggest value of virtual routing; you can build your own “VPN” with virtual devices instead of with a segment of a real device.  Now your VPN is a lot more private.

The challenge with this is the below-IP segregation itself, which is where SDN comes in.  A virtual router looks like a router.  SDN creates what looks like a tunnel, a pipe.  That's a Level 1 artifact, something that looks like a TDM pipe or an optical trunk or lambda.  The strength of SDN in the WAN, IMHO, lies in its ability to support virtual routing.

To make virtual routing truly useful we have to be able to build a virtual underlayment to our “IP network” that segregates traffic by user/service and does the basic aggregation needed to maintain reasonable transport efficiency.  The virtual subnets that virtual routing creates when used this way are typically going to be contained enough that servers could host the virtual routers we need.  The structure can be agile enough to support reconfiguration in case of failures or even load and traffic patterns because the path the virtual pipes create and the locations of the virtual routers can be determined dynamically.
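The segregation step above can be sketched very simply: the underlay keeps a table mapping each user/service pair to its own tunnel, so every "VPN" built from virtual routers rides on its own below-IP paths.  This is a minimal illustrative sketch, not any real controller's API; the class name, method, and identifiers like "acme" are all hypothetical.

```python
# Sketch of a below-IP "underlayment" that gives each (user, service)
# pair its own tunnel, so per-user virtual-router networks are kept
# segregated from one another.  All names here are illustrative.

class Underlay:
    def __init__(self):
        self._tunnels = {}   # (user, service) -> tunnel id
        self._next_id = 1

    def tunnel_for(self, user, service):
        """Return the tunnel for this user/service, allocating on first use."""
        key = (user, service)
        if key not in self._tunnels:
            self._tunnels[key] = self._next_id
            self._next_id += 1
        return self._tunnels[key]

u = Underlay()
vpn = u.tunnel_for("acme", "vpn")
voice = u.tunnel_for("acme", "voice")
print(vpn != voice)                       # distinct services get distinct tunnels
print(u.tunnel_for("acme", "vpn") == vpn) # the same pair reuses its tunnel
```

The point of the sketch is only that the mapping, not the traffic, is the unit of agility: change the table and the virtual routers can be re-homed onto new paths dynamically.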

This model could also help SDN along.  It’s difficult to make SDN emulate a complete end-to-end service, both because of the scaling issues of the central controller and because of the control-plane exchanges.  It’s easy to create an SDN tunnel; a stitched sequence of forwarding paths does that without further need for processing.  Transport tunnel routing isn’t as dynamic as per-user flow routing, so the controller has less to do and the scope of the network could be larger without creating controller demands that tax the performance and availability constraints of real servers.
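That "stitched sequence of forwarding paths" amounts to one forwarding entry per switch along the route, each matching a tunnel tag and pushing the packet to the next hop; once installed, the controller has nothing further to do.  Here is a hedged sketch in plain Python; the switch names, port numbers, and tag field are hypothetical, not OpenFlow wire format.

```python
# Sketch: stitch an SDN "tunnel" by emitting one match/forward entry
# per switch on the path.  The dict layout is illustrative only; a real
# controller would translate this into OpenFlow flow-mod messages.

def stitch_tunnel(path, tunnel_id):
    """path is a list of (switch, in_port, out_port) hops."""
    entries = []
    for switch, in_port, out_port in path:
        entries.append({
            "switch": switch,
            "match": {"in_port": in_port, "tunnel_id": tunnel_id},
            "action": {"output": out_port},
        })
    return entries

# A three-switch path carrying tunnel 42:
path = [("s1", 1, 2), ("s2", 3, 4), ("s3", 1, 2)]
rules = stitch_tunnel(path, tunnel_id=42)
print(len(rules))           # one entry per switch
print(rules[0]["switch"])   # first hop
```

Because the entries are static until the path itself changes, this is exactly the low-dynamism workload the paragraph above argues the central controller can handle at scale.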

If we suggest this is the appropriate model for a service network, then we can immediately point to something that virtual router vendors need to be able to handle better—the “adjacency problem”.  The trouble with multiplying the tunnels below Level 3 to do traffic segmentation and manage trunk loading is that we may create too many such paths, making it difficult to control failovers.  It’s possible to settle this issue in two basic ways—use static routing or create a virtual BGP core.  Static routing doesn’t work well in public IP networks but there’s no reason it couldn’t be applied in a VPN.  Virtual BGP cores could abstract all of the path choices by generating what looks like a giant virtual BGP router.  You could use virtual routers for this BGP core, or do what Google did and create what’s essentially a BGP edge shell around SDN.
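The scale of the adjacency problem is easy to see with back-of-the-envelope arithmetic: a full mesh of tunnels among n virtual routers creates n*(n-1)/2 paths, each an adjacency to manage on failover, while hiding the paths behind a single virtual BGP core leaves each edge router with one abstracted attachment.  A small sketch (the function names are mine, not any standard's):

```python
# Sketch: tunnel counts under full-mesh vs. virtual-BGP-core designs.
# With n virtual routers, a full mesh needs n*(n-1)/2 tunnels; a
# virtual BGP core abstracts the paths behind n edge attachments.

def full_mesh_tunnels(n):
    return n * (n - 1) // 2

def bgp_core_attachments(n):
    return n

for n in (10, 50):
    print(n, full_mesh_tunnels(n), bgp_core_attachments(n))
```

At 50 virtual routers the mesh is already 1,225 paths against 50 attachments, which is why abstracting the core, whether with virtual routers or a BGP shell around SDN as in the Google example, matters.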

This approach of marrying virtual routing with OpenFlow-style SDN could also be adapted to use for the overlay-SDN model popularized by Nicira/VMware.  Overlay SDN doesn’t present its user interface out of Level 2/3 devices, but rather from endpoint processes hosted locally to the user.  It could work, in theory, over any set of tunnels that provide physical connectivity among the endpoint hosting locations, which means we could run it over Layer 1 facilities or over tunnels at L2 or L3.

I mentioned NFV earlier, and I think you can see that virtual routing/switching could be a cloud application or an NFV application.  Both allow for hosting the functionality, but NFV offers more dynamism in deployment/redeployment and more explicit management integration (at least potentially).  If you envisioned a fairly static positioning of your network assets, cloud-based virtual routers/switches would serve.  If you were looking at something more dynamic (likely because it was bigger and more exposed to changes in the conditions of the hosting points and physical connections) you could introduce NFV to optimize placement and replacement.

I think the SDN community is trying to solve too many problems.  I think that virtual router supporters aren’t solving enough.  If we step up to the question of virtual networks for a moment, we can see a new model that can make optimal use of both technologies and at the same time build a better and more agile structure, something that could change security and reliability practices forever and also alter the balance of power in networking.

That's why we can't expect this idea to get universal support.  There are players in the network equipment space (like Brocade) who aren't exposed enough to the legacy switch/router market for a shift in approach to hurt them as much as, or more than, it would help.  Certainly server players (HP comes to mind, supported by Intel/Wind River) with aggressive SDN/NFV programs could field something like this.  The mainstream network people, even those with virtual router plans, are likely to be concerned about the loss of revenue from physical switch/router sales.  The question is whether a player with little to lose will create market momentum sufficient to drag everyone along.  We may find that out in 2015.

How Operators Do Trials, and How We Can Expect SDN/NFV to Progress

Since I’ve blogged recently about the progress (or lack of it!) from proof-of-concept to field trials for SDN and NFV, I’ve gotten some emails from you on just what a “field trial” is about.  I took a look at operator project practices in 2013 as a part of my survey, and there was some interesting input on how operators took a new technology from consideration to deployment.  Given that’s what’s likely to start for SDN and NFV in 2015, this may be a good time to look at that flow.

The first thing I found interesting in my survey was that operators didn’t have a consistent approach to transitioning to deployment for new technologies.  While almost three-quarters of them said that they followed specific procedures in all their test-and-trial phases, a more detailed look at their recent or ongoing projects seemed to show otherwise.

Whatever you call the various steps in test-and-trial, there are really three phases that operators will generally recognize.  The first is the lab trial, the second the field trial, and the final one the pilot deployment/test.  What is in each of these phases, or supposed to be in them, sets the framework for proving out new approaches to services, operations, and infrastructure.

Operators were fairly consistent in describing the first of their goals for a lab trial.  A new technology has to work, meaning that it has to perform as expected when deployed as recommended.  Most operators said that their lab trials weren’t necessarily done in a lab; the first step was typically to do a limited installation of new technology and the second to set up what could be called a “minimalist network” in which the new stuff should operate, and then validate the technology itself.

If we cast this process in SDN and NFV terms, what we’d be saying is that the first goal in a lab trial is to see if you can actually build a network of the technical elements and have it pass traffic in a stable way.  The framework in which this validation is run is typically selected from a set of possible applications of that technology.  Operators say that they don’t necessarily pick the application that makes the most sense in the long term, but rather try to balance the difficulties in doing the test against the useful information that can be gained.

One operator made a telling comment about the outcome of a lab trial; “A properly conducted lab trial is always successful.”  That meant that the goal of such a trial is to find the truth about the basic technology, not to prove the technology is worthy of deployment.  In other words, it’s perfectly fine for a “proof of concept” to fail to prove the concept.  Operators say that somewhere between one in eight and one in six actually do prove the concept; the rest of the trials don’t result in deployment.

The next phase of the technology evolution validation process is the field trial, which two operators out of three say has to prove the business case.  The biggest inconsistencies in practices come to light in the transition between lab and field trials, and the specific differences come from how much the first is expected to prepare for the second.

Operators who have good track records with technology evaluation almost uniformly make preparation for a field trial the second goal of the lab trial (after basic technology validation).  That preparation is where the operators’ business case for the technology enters into the process.  A lab trial, says this group, has to establish just what steps have to be proved in order to make a business case.  You advance from lab trial to field trial because you can establish that there are steps that can be taken, that there is at least one business case.  Your primary goal for the field trial is then to validate that business case.

More than half the operators in my survey didn’t formally work this way, though nearly all said that was the right approach.  The majority said that in most cases, their lab trials ended with a “technology case”, and that some formal sponsorship of the next step was necessary to establish a field trial.  Operators who worked this way sometimes stranded 90% of their lab trials in the lab because they didn’t get that next-step sponsorship, and they also had a field trial success rate significantly lower than operators who made field-trial goal and design management a final step in their lab trials.

Most of the "enlightened" operators also said that a field trial should inherit technical issues from the lab trial, if there were issues that couldn't be proved out in the lab.  When I asked for examples of the sort of issue a lab trial couldn't prove, operations integration was the number one point.  The operators agreed that you had to introduce operations integration in the lab trial phase, but also that lab trials were almost never large enough to expose you to a reasonable set of the issues.  One operator called the issue-determination goal of a lab trial the "sensitivity analysis": this works, but under what conditions?  Can we sustain those conditions in a live service?

One of the reasons for all the drama in the lab-to-field transition is that most operators say this is a political shift as well as a shift in procedures and goals.  A good lab trial is likely run by the office of the CTO, where field trials are best run by operations, with liaison with the CTO lead on the lab trial portion.  The most successful operators have established cross-organizational teams, reporting directly to the CEO or executive committee, to control new technology assessments from day one to deployment.  That avoids the political transition.

A specific issue operators report in the lab-to-field transition is the framework of the test.  Remember that operators said you’d pick a lab trial with the goal of balancing the expense and difficulty of the trial with the insights you could expect to gain.  Most operators said that their lab-trial framework wasn’t supposed to be the ideal framework in which to make a business case, and yet most operators said they tended to take their lab-trial framework into a field trial without considering whether they actually had a business case to make.

The transition from field trial to pilot deployment illustrates why blundering forward with a technical proof of concept isn’t the right answer.  Nearly every operator said that their pilot deployment would be based on their field-trial framework.  If that, in turn, was inherited from a lab trial or PoC that wasn’t designed to prove a business case, then there’s a good chance no business case has been, or could be, proven.

This all explains the view expressed by operators a year later, in my survey in the spring of 2014.  Remember that they said that they could not, at that point, make a business case for NFV and had no trials or PoCs in process that could do that.  With respect to NFV, the operators also indicated they had less business-case injection into their lab trial or PoC processes than usual, and less involvement or liaison with operations.  The reason was that NFV had an unusually strong tie to the CTO organization, which they said was because NFV was an evolving standard and standards were traditionally handled out of the CTO’s organization.

For NFV, and for SDN, this is all very important for operators and vendors alike.  Right now, history suggests that there will be a delay in field trials where a proper foundation has not been laid in the lab, and I think it’s clear that’s been happening.  History also suggests that the same conditions will generate an unusually high rate of project failure when field trials are launched, and a longer trial period than usual.

This is why I’m actually kind of glad that the TMF and the NFV ISG haven’t addressed the operations side of NFV properly, and that SDN operations is similarly under-thought.  What we probably need most now is a group of ambitious vendors who are prepared to take some bold steps to test their own notions of the right answer.  One successful trial will generate enormous momentum for the concept that succeeds, and quickly realign the efforts of other operators and vendors.  That’s what I think we can expect to see in 2015.

There’s Hope for NFV Progress in 2015

Since I blogged recently on the challenges operators faced in making a business case for NFV, I’ve gotten quite a few emails from operators themselves.  None disagreed with my major point, which is that the current trial and PoC activity isn’t building a business case for deployment, but they did offer some additional color, some positive and some negative, on NFV plans.

In my fall survey last year, operators’ biggest concern about NFV was that it wouldn’t work, and their second-greatest concern was that it would become a mess of proprietary elements, something like the early days of routing when vendors had their own protocols to discourage open competition.  The good news is that the majority of operators say these concerns have been reduced.  They think that NFV will “work” at the technical level, and they think that there will be enough openness in NFV to keep the market from disintegrating into proprietary silos.

The bad news is that the number of operators who feel that progress has been made has actually declined since the spring, and in some cases operators who told me in April that they were pleased with the progress of their NFV adventures now have some concerns.  A couple had some very specific and similar views that are worth reviewing.

According to the most articulate pair of operators, we have proved the “basic concept of NFV”, meaning that we have proved that we can take cloud-hosted network features and substitute them for the features of appliances.  Their concerns lie in NFV beyond the basics.

First and foremost, these operators say that they cannot reliably estimate the management burden of an NFV deployment.  There is no doubt in their minds that NFV could push down capex, but also no doubt that it would create a risk of increased opex at the same time.  They don’t know how much of an opex increase they’d face, so they can’t validate net savings.  Part of the reason is that they don’t have a reliable and extensible management model for NFV, but part is more basic: operators say they don’t know how well NFV will perform at scale.  You need efficient resource pools to achieve optimal capex savings, which means you need a large deployment.  So far they don’t have enough data on “large” NFV to say whether opex costs rise linearly.  In fact, they say they can’t even be sure that all of the tweaks to deployment policy (everything from picking the best host to horizontal scaling and reconfiguration of services under load or failure) will be practical, given the potential impact they’d have on opex.  One, reading all the things in a VNF Descriptor, said “This is looking awfully complicated.”
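To see why that operator’s reaction is understandable, consider how many policy decisions even a stripped-down descriptor carries.  The sketch below is a deliberately simplified, hypothetical structure; the field names are illustrative and are not the ETSI VNFD schema.

```python
# A hypothetical, heavily simplified sketch of the kinds of knobs a
# VNF Descriptor carries; field names are illustrative, NOT the ETSI schema.
vnfd = {
    "vnf_id": "vFirewall-01",
    "vdus": [  # virtual deployment units: the hostable pieces of the VNF
        {
            "image": "vfw-image-1.2",
            "cpu": 2, "memory_gb": 4,
            "placement": {"affinity": "none", "preferred_zone": "edge"},
            "scaling": {"min_instances": 1, "max_instances": 4,
                        "scale_out_cpu_pct": 80},
        }
    ],
    "connection_points": ["mgmt", "wan", "lan"],
    "lifecycle_hooks": {"on_failure": "redeploy", "on_overload": "scale_out"},
}

def operational_knobs(descriptor):
    """Count the policy decisions operations staff must get right per VNF."""
    knobs = len(descriptor["lifecycle_hooks"])
    for vdu in descriptor["vdus"]:
        knobs += len(vdu["placement"]) + len(vdu["scaling"])
    return knobs

print(operational_knobs(vnfd))  # even this toy descriptor has 7 policy knobs
```

Every one of those knobs is something an operations team has to set, monitor, and pay for when it’s set wrong, which is exactly the opex uncertainty the operators describe.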

The second concern these operators expressed was the way NFV integrates with NFVI (NFV Infrastructure).  They are concerned that we haven’t tested the MANO-to-VIM (Virtual Infrastructure Manager) relationship adequately, and haven’t even addressed the VIM-to-NFVI relationship fully.  Most of the trials have used OpenStack, and it’s not clear from the trials just how effective it will be in handling network configuration changes.  Yes, we can deploy, but OpenStack is essentially a single-threaded process.  Could a major problem create enough service disruption that the VIM simply could not keep up?
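The operators’ worry is a simple queueing problem: if the VIM works through deployment requests one at a time, a failure surge can outrun it.  The toy model below makes the point with illustrative numbers of my own choosing, not OpenStack measurements.

```python
# A toy queueing sketch of the single-threaded VIM concern: requests
# arrive per minute and are served one at a time. The arrival rates and
# service time are illustrative assumptions, not OpenStack measurements.
def vim_backlog(arrivals_per_min, service_time_sec, minutes):
    """Return the queue depth after `minutes` of sustained load."""
    served_per_min = 60 / service_time_sec
    backlog = 0.0
    for _ in range(minutes):
        backlog = max(0.0, backlog + arrivals_per_min - served_per_min)
    return backlog

# Normal churn: 2 redeploys/min at 20s each -- the VIM keeps up.
print(vim_backlog(2, 20, 10))   # 0.0
# Major failure: 10 redeploys/min for 10 minutes -- the backlog grows.
print(vim_backlog(10, 20, 10))  # 70.0
```

The second case is the one field trials rarely create on purpose: seventy services waiting on a serialized control point is a visible outage, and it only shows up at scale.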

There are also concerns about the range of things a VIM might support.  If you have two or three clouds, or cloud data centers, do you have multiple VIMs?  Most operators think you do, but these two operators say they aren’t sure how MANO would be able to divide work among multiple VIMs.  How do you represent a service that has pools of resources with different control needs?  This includes the “how do I control legacy elements” question.  All of the operators said they had current cloud infrastructure they would utilize in their next-phase NFV trial.  All had data center switches and network gateways that would have to be configured for at least some situations.  How would that work?  Is there another Infrastructure Manager?  If so, again, how do you represent that in a service model at the MANO level?
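One way to picture the multi-VIM question the operators raise is a MANO-side registry that routes each piece of a service to the infrastructure manager owning its resource pool, with legacy gear behind its own manager.  The interfaces below are my assumptions for illustration, not anything the ETSI ISG has specified.

```python
# A hypothetical sketch of MANO dividing work among multiple infrastructure
# managers. Class and method names are assumptions for illustration only.
class InfrastructureManager:
    def deploy(self, component):
        raise NotImplementedError

class CloudVIM(InfrastructureManager):
    """Stands in for a cloud VIM such as an OpenStack instance."""
    def deploy(self, component):
        return f"nova-boot:{component}"

class LegacyNetworkIM(InfrastructureManager):
    """Stands in for configuring existing switches/gateways (CLI, SNMP...)."""
    def deploy(self, component):
        return f"cli-config:{component}"

class ServiceOrchestrator:
    def __init__(self):
        self.managers = {}  # resource pool name -> infrastructure manager

    def register(self, pool, manager):
        self.managers[pool] = manager

    def deploy_service(self, components):
        # components: list of (resource_pool, component_name) pairs
        return [self.managers[pool].deploy(name) for pool, name in components]

mano = ServiceOrchestrator()
mano.register("dc-east", CloudVIM())
mano.register("dc-west", CloudVIM())
mano.register("wan-edge", LegacyNetworkIM())
print(mano.deploy_service([("dc-east", "vFW"), ("wan-edge", "gw-vlan")]))
# ['nova-boot:vFW', 'cli-config:gw-vlan']
```

The hard part the operators flag isn’t the dispatch itself but the service model above it: something at the MANO level has to know that “dc-east” and “wan-edge” are different pools with different control needs, and no trial has really exercised that.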

Then there’s SDN.  One operator in the spring said that the NFV-to-SDN link was a “fable connected to a myth”.  The point was that they were not confident of exactly what SDN would mean were it to be substituted for traditional networking in part of NFVI.  They weren’t sure how NFV would “talk to” SDN and how management information in particular would flow.  About two-thirds of operators said that they could have difficulties taking NFV into full field trials without confidence on the SDN integration issue.  They weren’t confident in the spring, but there is at least some increase in confidence today (driven by what they see as a convergence on OpenDaylight).

You can make an argument that these issues are exactly what a field trial would be expected to address, and in fact operators sort of agree with that.  Their problem is that they would expect their lab trials to establish a specific set of field-trial issues and a specific configuration in which those issues could be addressed.  The two key operators say that they can’t yet do that, but they aren’t convinced that spending more time in the lab will give them a better answer.  That means they may have to move into a larger-scale trial without the usual groundwork having been laid, or devise a different lab trial to help prepare for wider deployment.

That would be a problem because nearly all the operators say that they are being charged by senior management to run field trials for NFV in 2015.  Right now, most say that they’re focusing on the second half of the year—likely because if you’re told you need to do something you’re not sure you are ready for, you delay as long as you can.

What would operators like to see from NFV vendors?  Interestingly, I got an answer to that over a year ago at a meeting in Europe.  One of the kingpins of NFV, and a leader in the ISG, told me that the way operators needed to have NFV explained was in the context of the service lifecycle.  Take a service from marketing conception to actual customer deployment, he said, and show me how it progresses through all the phases.  This advice is why I’ve taken a lifecycle-driven approach in explaining my ExperiaSphere project.  But where do we see service lifecycles in vendor NFV documentation?

I asked the operators who got back to me after my blog, and the two “thought leaders” in particular, what they thought of the “lifecycle-driven” approach.  The general view was that it would be a heck of a lot better way to define how a given NFV product worked than the current approach, which focuses on proving you can deploy.  The two thought leaders said flatly that they didn’t believe any vendor could offer such a presentation of functionality.

I’m not sure I agree with that, though I do think that nobody has made such a service-workflow model available in public as yet.  There are at least a couple of players who could tell the right story the right way, perhaps not covering all the bases but at least covering enough.  I wish I could say that I’d heard vendors say they’d be developing a lifecycle-centric presentation on NFV, or that my operator friends had heard it.  Neither, for now, is true, but I do have to say I’m hopeful.

We are going to see large-scale NFV trials in 2015, period.  Maybe only one, but at least that.  Once any vendor manages to get a really credible field trial underway, it’s going to be impossible for others to avoid the pressure to do the same.  So for all those frustrated by the pace of NFV adoption, be patient because change is coming.