Why (and How) Intent Models are Critical in NFV Management

In a number of recent blogs I’ve talked about the critical value of intent modeling to NFV.  I’d like to extend that notion to the management plane, and show how intent modeling could bridge NFV, network management, and service operations automation into a single (hopefully glorious) whole.

In the network age, management has always had a curious dualism.  Network management has always been drawn as a bottom-up evolution from “element” management of single devices, through “network” management of collective device communities, to “service” management of cooperative-device-based experiences.  But at the same time, “service” management, in the sense of service provider operations, has always started with the service and then dissected it into “customer-” and “resource-facing” components.

The unifying piece of the management puzzle is the device, which has always been the root of whatever you were looking at in terms of a management structure.  Wherever you start, management comes down to controlling the functional elements of the network.  Except that virtualization removes “devices” and replaces them with distributed collections of functions.

The thing is, a virtual device (no matter how complicated it is functionally) is a black box that replicates the real device it was modeled on.  If you look at a network of virtual devices “from the top” of the management stack (including from the service operations side), you’d really want to see the management properties of the functionality, not the implementation.  From the bottom, where your responsibility is to dispatch a tech to fix something, you’d need to see the physical stuff.

This is the dichotomy of virtualization management.  You still have the top-and-bottom orientation of the management stack, but the two views don’t meet cleanly, because the resource view and the service view converge not on something real but on something abstract.

If we visualize our virtual device as an intent model, it has all the familiar properties.  It has functionality, it has ports, and it has an SLA manifested in a MIB-like collection of variables.  You could assemble a set of intent-modeled virtual devices into a network, and you could then expect to manage it from the top as you’d managed before.  From the bottom, you’d have the problem of real resources that used to be invisible pieces of a device now being open, connected service elements.  The virtual device is then really a black box with a service inside.
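
To make that concrete, here’s a minimal Python sketch of the idea (my own illustration, not anyone’s specification), with invented names like IntentModel and SLA figures chosen purely for show.  A management system sees only the name, function, ports, and SLA variables; whatever realizes the box stays hidden.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class IntentModel:
    """A hypothetical 'virtual device' abstraction: functionality, ports,
    and an SLA expressed as a MIB-like set of variables.  Everything inside
    the model (real resources or lower-level models) is deliberately hidden."""
    name: str                                             # e.g. "vFirewall"
    function: str                                         # what the black box does
    ports: List[str] = field(default_factory=list)        # externally visible connection points
    sla: Dict[str, float] = field(default_factory=dict)   # MIB-like variables

# A management system sees only the outside of the box:
vfw = IntentModel(
    name="vFirewall",
    function="firewall",
    ports=["lan0", "wan0"],
    sla={"availability": 0.9999, "latency_ms": 2.0},
)
print(vfw.sla["availability"])   # manage the function, not the implementation
```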

Might there be a better way of visualizing networks made up of virtual devices?  Well, one property of a black box is that you can’t see inside it.  Why then couldn’t you define “black boxes” representing arbitrary useful functional pieces that had nothing to do with real network devices at all?  After all, if you’re virtualizing everything inside a black box, why not virtualize a community of functions rather than the functions individually?

This gives rise to a new structure for networks.  We have services, which divide into “features”.  These features divide into “behaviors” that represent cooperative activities of real resources, and the behaviors divide into the real resources themselves.  In effect, this creates a very service-centric view of “services”, meaning a functional view rather than one based on how resources are assembled.  The task of assembling resources goes to the bottom of the stack.  All composition is functional; you enter the structural domain only at the last minute, when you decompose the composition to deploy it.

This approach leads to a different view of management, because if you assemble intent models to do something you have to have intent-model SLAs to manage against, but they have to be related somehow to those black-box elements inside them.  To see how this could work, let’s start by drawing a block that’s an intent model for Service X.  Alongside it, picture a listing of the MIB variables that define the intent model’s behavior—what it presents to an outside management interface.

But where do those variables come from?  From subordinate intent models.  We can visualize Services A and B “below” Service X, meaning that they are decomposed from it.  The variables for Services A and B must then be available to Service X for use in deriving its own variables.  We might have a notion that Service X Availability equals Service A Availability plus Service B Availability (a deliberately simplified example, I realize!).  That means that if Services A and B are also black boxes that contain either lower-level intent models or real resources, the SLA variables for those services are in turn derived from their subordinate elements.
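
Here’s a hedged sketch of how that derivation could work in code, again my own construction with invented names.  Each model carries an “active expression” that computes its variables from those of its direct subordinates.  Since the availability-equals-A-plus-B expression above was deliberately simplified, the sketch rolls up availability by multiplying, as you would for elements in series.

```python
from typing import Callable, Dict, List

Variables = Dict[str, float]

class ServiceModel:
    """Hypothetical intent model whose own variables are derived from those of
    its direct subordinates via an 'active expression'.  It never reaches
    deeper than one level down, which keeps the structure from becoming brittle."""
    def __init__(self, name: str,
                 children: List["ServiceModel"] = None,
                 derive: Callable[[List[Variables]], Variables] = None,
                 leaf_vars: Variables = None):
        self.name = name
        self.children = children or []
        self.derive = derive              # active expression for this model
        self.leaf_vars = leaf_vars or {}  # only used when the model wraps real resources

    def variables(self) -> Variables:
        if not self.children:
            return self.leaf_vars
        return self.derive([c.variables() for c in self.children])

# One plausible active expression: elements in series, so availability multiplies.
def series_rollup(child_vars: List[Variables]) -> Variables:
    avail = 1.0
    for v in child_vars:
        avail *= v.get("availability", 1.0)
    return {"availability": avail}

service_a = ServiceModel("Service A", leaf_vars={"availability": 0.999})
service_b = ServiceModel("Service B", leaf_vars={"availability": 0.998})
service_x = ServiceModel("Service X", children=[service_a, service_b], derive=series_rollup)

print(service_x.variables())   # roughly {'availability': 0.997}
```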

This notion of service composition would be something like virtual-device composition, except that you don’t try to map to devices but to aspects of services.  It’s more operations-friendly because aspects of services are what you sell to customers.  I would argue that in the world of virtualization it’s also more resource-management-friendly, because the relationship between resource state (as reflected in resource variables) and service state (where that state is ultimately reflected) is explicit.

How would you compose a service?  From intent models, meaning from objects representing functions.  These objects would have “ports”, variables/SLAs, and functionality.  The variables could include parameter values and policies that could be set for the model, and those would then propagate downward to eventually reach the realizing resources.  Any set of intent models that had the same outside properties would be equivalent, so operators could define “VPN” in terms of multiple implementation approaches and substitute any such model wherever VPN is used.  You could also decompose variably based on policies passed down, so a VPN in a city without virtualization could be realized on legacy infrastructure.
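
A small illustrative sketch of that substitution, with invented city names and a made-up policy table: the abstract “VPN” intent stays the same, and the policy passed down at decomposition time picks which realization gets used.

```python
# Hypothetical sketch: one abstract "VPN" intent, several interchangeable
# realizations, selected by a policy that propagates downward at decomposition time.
from typing import Dict

VPN_IMPLEMENTATIONS: Dict[str, dict] = {
    # same outside properties (ports, SLA variables), different insides
    "nfv":    {"realize_with": "hosted vRouter chain"},
    "legacy": {"realize_with": "MPLS PE router configuration"},
}

def decompose_vpn(city: str, policy: Dict[str, bool]) -> str:
    """Pick a realization for the 'VPN' intent based on policy passed down."""
    impl = "nfv" if policy.get(city, False) else "legacy"
    return f"VPN in {city} -> {VPN_IMPLEMENTATIONS[impl]['realize_with']}"

# A city without virtualization falls back to legacy infrastructure.
nfv_ready = {"Metro-A": True, "Metro-B": False}   # illustrative policy data
print(decompose_vpn("Metro-A", nfv_ready))
print(decompose_vpn("Metro-B", nfv_ready))
```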

In this approach, interoperability and interworking happen at the intent-model level.  Any vendor who could provide the implementation of an intent model could provide the resources to realize it, so the intent-model market could be highly competitive.  Management of any intent model is always the same because the variables of that model are the same no matter how it’s realized.

The key to making this work is “specificational” in nature.  First, you have to define a set of intent models that represent functionally useful service components.  We have many such components today, but operators could define more on their own or through standards bodies.  Second, you have to enforce a variable-name convention for each model, and create an active expression that relates the variables of a model to the variables generated by its subordinates (or internal structure).  This cannot be allowed to go further than an adjacent model, because it’s too difficult to prevent brittle structures or consistency problems when you dive many layers down to grab a variable.  Each black box sees only the black boxes directly inside it; the deeper ones are opaque, as they should be.

Now you can see how management works.  Any object/intent model can be examined like a virtual device.  The active expressions linking superior and subordinate models can be traversed upward to find impact or downward to find faults.  If it’s considered useful, it would be possible to standardize the SLAs/MIBs of certain intent models and even standardize active flows that represent management relationships.  All of that could facilitate plug-and-play distribution of capabilities, and even federation among operators.

We may actually be heading in this direction.  Both the SDN and NFV communities are increasingly accepting of intent models, and an organized description of such a model would IMHO have to include both the notion of an SLA/MIB structure and the active data flows I’ve described.  It’s a question of how long it might take.  If we could get to this point quickly we could solve both service and network management issues with SDN and NFV and secure the service agility and operations efficiency benefits that operators want.  Some vendors are close to implementing something like this, too.  It will be interesting to see if they jump out to claim a leading position even before standards groups get around to formalizing things.  There’s a lot at stake for everyone.

Why NFV’s VIMs May Matter More than Infrastructure Alone

Everyone knows what MANO means to NFV and many know what NFVI is, but even those who know what “VIM” stands for (Virtual Infrastructure Manager) may not have thought through the role that component plays and how variations on implementation could impact NFV deployment.  There are a lot of dimensions to the notion, and all of them are important.  Perhaps the most important point about “VIMs” is that how they end up being defined will likely set the dimensions of orchestration.

In the ISG documents, a VIM is responsible for the link between orchestration and management (NFVO and VNFM, respectively) and the infrastructure (NFVI).  One of the points I’ve often made is that a VIM should be a special class of Infrastructure Manager, in effect a vIM.  Other classes of IM would represent non-virtualized assets, including legacy technology.

The biggest open question about an IM is its scope, meaning how granular NFVI appears.  You could envision a single giant VIM representing everything (which is kind of what the ETSI material suggests), or you could envision IMs that represented classes of gear, different data centers, or even just different groups of servers.  IM scope is important for two reasons: competition and orchestration.

Competitively, the “ideal” picture for IMs would be that there could be any number of them, each representing an arbitrary collection of resources.  This would allow an operator to use any kind of gear for NFV as long as the vendor provided a suitable VIM.  If we envisioned this giant singular IM, then any vendor who could dominate either infrastructure or the VIM-to-orchestration-and-management relationship would be able to dictate the terms through which equipment could be introduced.

The flip-side issue is that if you divide up the IM role, then the higher-layer functions have to be able to model service relationships well enough to apportion specific infrastructure tasks to the correct IM.  Having only one IM (or vIM) means that you can declare yourself as having management and orchestration without actually having much ability to model or orchestrate at all.  You fob off the tasks to the Great VIM in the Sky and the rest of MANO is simply a conduit to pass requests downward to the superIM.

I think this point is one of the reasons why we have different “classes” of NFV vendor.  The majority do little to model and orchestrate, and thus presume a single IM or a very small number of them.  Most of the “orchestration” functionality ends up in the IM by default, where it’s handled by something like OpenStack.  OpenStack is the right answer for implementing vIMs, but it’s not necessarily helpful for legacy infrastructure management and it’s certainly not sufficient to manage a community of IMs and vIMs.  The few who do NFV “right” IMHO are the ones who can orchestrate above multiple VIMs.

You can probably see that the ability to support, meaning orchestrate among, multiple IMs and vIMs would be critical to achieving full service operations automation.  Absent the ability to use multiple IMs you can’t accommodate the mix of vendors and devices found in networks today, which means you can’t apply service operations automation except in a green field.  That flies in the face of the notion that service operations automation should lead us to a cohesive NFV future.

Modeling is the key to making multiple IMs work, but not just modeling at the service level above the IM/vIM.  The only logical way to connect IMs to management and orchestration is to use intent models to describe the service goal being set for the IM.  You give an IM an intent model and it translates the model based on the infrastructure it supports.  Since I believe that service operations automation itself demands intent modeling above the IM, it’s fair to wonder what exactly the relationship between IMs/vIMs and management and orchestration models would be.
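
As a rough illustration (my own sketch, not an ETSI interface), an IM could expose a single “realize this intent” operation, with each IM class translating the intent for the infrastructure it fronts and orchestration above only needing to model well enough to pick the right domain:

```python
# Hypothetical sketch of the IM/vIM idea: orchestration hands an intent model
# to an Infrastructure Manager, and each IM translates it for the gear it fronts.
from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def realize(self, intent: dict) -> str:
        """Translate an intent model into actions on this IM's own infrastructure."""

class VirtualIM(InfrastructureManager):          # a "vIM", e.g. fronting OpenStack
    def realize(self, intent: dict) -> str:
        return f"deploy VNFs for '{intent['service']}' via cloud APIs"

class LegacyIM(InfrastructureManager):           # fronting an EMS/NMS for installed gear
    def realize(self, intent: dict) -> str:
        return f"provision '{intent['service']}' on legacy devices via the EMS"

# Orchestration above the IMs only needs to model well enough to pick the right one.
IM_REGISTRY = {"dc-east": VirtualIM(), "metro-legacy": LegacyIM()}

def orchestrate(intent: dict) -> str:
    return IM_REGISTRY[intent["domain"]].realize(intent)

print(orchestrate({"service": "vCPE for Customer 42", "domain": "dc-east"}))
print(orchestrate({"service": "Ethernet access", "domain": "metro-legacy"}))
```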

My own work on this issue, going back to about 2008, has long suggested that there are two explicit “domains”, service and resource.  This is also reflected in the TMF SID, with customer-facing and resource-facing service components.  The boundary between the two isn’t strictly “resources”, though—at least not as I’d see it.  Any composition of service elements into a service would likely, at the boundaries, create a need to actually set up an interface or something.  To me, the resource/service boundary is administrative—it’s functional versus structural within an operator.  Customer processes, being service-related, live on the functional/service side, and operator equipment processes live on the resource side.

Resource-side modeling is a great place to reflect many of the constraints (and anti-constraints) that the ISG has been working on.  Most network cost and efficiency modeling would logically be at the site level, not the server level, so you might gain a lot of efficiency by first deciding which data centers to site VNFs in, then dispatching orders to the optimum ones.  This would also let you deploy multiple instances of things like OpenStack or OpenDaylight, which could improve performance.
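
Here’s a minimal sketch of that two-stage placement, with invented site names and cost figures: the first stage picks a data center on site-level economics, and the second simply dispatches the order to that site’s own controller for server-level decisions.

```python
# A minimal sketch of site-first placement, using invented cost figures:
# pick the data center on site-level economics, then hand the order to that
# site's own OpenStack/controller instance for server-level decisions.
SITES = {
    "dc-east":  {"cost_index": 0.8, "free_capacity": 120, "controller": "openstack-east"},
    "dc-west":  {"cost_index": 1.1, "free_capacity": 300, "controller": "openstack-west"},
    "dc-metro": {"cost_index": 0.9, "free_capacity": 10,  "controller": "openstack-metro"},
}

def place_vnf(required_capacity: int) -> str:
    # Stage 1: site-level decision (cost/efficiency), ignoring individual servers.
    candidates = {k: v for k, v in SITES.items() if v["free_capacity"] >= required_capacity}
    best = min(candidates, key=lambda k: candidates[k]["cost_index"])
    # Stage 2: dispatch the order; the site's controller does server-level placement.
    return f"send deployment order to {SITES[best]['controller']} ({best})"

print(place_vnf(required_capacity=50))   # -> openstack-east, the cheapest site that fits
```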

Customer-based or customer-facing services are easy to visualize; they would be components that are priced.  Resource-facing services would likely be based on exposed management interfaces and administrative/management boundaries.  The boundary point between the two, clear in this sense, might be fuzzy from a modeling perspective.  For example, you might separate VLAN access services by city as part of the customer-facing model, or do so in the resource-facing model.  You could even envision decomposing a customer-facing VLAN access service into multiple resource-facing ones (one for each city involved, for example), based on what infrastructure happened to be deployed there.

From this point, it seems clear that object-based composition/decomposition could take place on both sides of the service/resource boundary, just for different reasons.  As noted earlier, most operators would probably build up resource-facing models from management APIs—if you have a management system domain then that’s probably a logical IM domain too.  But decomposing a service to resources could involve infrastructure decisions different from decomposing a service to lower-level service structures.  Both could be seen as policy-driven but different policies and policy goals would likely apply.

I think that if you start with the presumption that there have to be many Infrastructure Managers, you end up creating a case for intent modeling and the extension of these models broadly in both the service and resource domain.  At the very bottom you have things like EMSs or OpenDaylight or OpenStack, but I think that policy decisions to enforce NFV principles should be exercised above the IM level, and IMs should be focused on commissioning their own specific resources.  That creates the mix of service/resource models that some savvy operators have already been asking for.

A final point to consider in IM/vIM design is serialization of deployment processes.  You can’t have a bunch of independent orchestration tasks assigning the same pool of resources in parallel.  Somewhere you have to create a single queue in which all the resource requests for a domain have to stand till it’s their turn.  That avoids conflicting assignments.  It’s easy to do this if you have IM/vIM separation by domain, but if you have a giant IM/vIM, somewhere inside it will have to serialize every request made to it, which makes it a potential single point of processing (and failure).
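
A small sketch of that serialization point, assuming nothing more than one lock (in effect, one queue) per IM domain: requests within a domain take their turn, requests to different domains don’t block each other, and there’s no single global choke point.

```python
# Hedged sketch: serialize resource assignments per IM domain so parallel
# orchestration tasks can't grab the same pool at the same time, while
# different domains still proceed independently.
import threading
from collections import defaultdict

_domain_locks = defaultdict(threading.Lock)   # one lock/queue per IM domain

def assign_resources(domain: str, request: str) -> str:
    with _domain_locks[domain]:               # requests for one domain wait their turn
        # ... perform the actual capacity check and assignment here ...
        return f"{request} assigned in {domain}"

# Requests to different domains run concurrently; requests to the same domain queue up.
threads = [
    threading.Thread(target=lambda: print(assign_resources("dc-east", "vCPE-1"))),
    threading.Thread(target=lambda: print(assign_resources("dc-east", "vCPE-2"))),
    threading.Thread(target=lambda: print(assign_resources("dc-west", "EPC-slice"))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```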

Many of you are probably considering the fact that the structure I’m describing might contain half-a-dozen to several dozen models, and will wonder about the complexity.  Well, yes, it is a complex model, but the complexity arises from the multiplicity of management systems, technologies, vendors, rules and policies, and so forth.  And of course you can do this without my proposed “complex” model, using software that, I can promise you as a former software architect, would be much more complex.  You can write good general code to decompose models.  To work from parameters and data files and to try to anticipate all the new issues and directions without models?  Good luck.

To me, it’s clear that diverse infrastructure—servers of different kinds, different cloud software, different network equipment making connections among VNFs and to users—would demand multiple VIMs even under the limited ETSI vision of supporting legacy elements.  That vision is evolving and expanding, and with it the need to have many IMs and vIMs.  Once you get to that conclusion, then orchestration at the higher layer is more complicated and more essential, and models are the only path that would work.

Why is Network-Building Still “Business as Usual?”

If we tried to come up with a phrase that captured carrier directions as expressed so far in their financials and those of the prime network vendors, a good suggestion would be “business as usual.”  There’s been no suggestion of major suppression of current capital plans, no indication of technology changes that might signal a provocative shift in infrastructure planning.  We are entering the technology planning period for the budget cycle of 2016, a critical one in validating steps to reverse the revenue/cost-per-bit crunch operators predict.  Why isn’t there something more going on?

It’s perfectly possible that one reason is that operators were being alarmists with their 2017 crossover prediction.  Financial analysts and hedge funds live quarter to quarter, but it seems pretty likely to me that they’d be worried at least a little if there were a well-known impending crisis in the offing.  Somebody might take flight and cause a big dip in stock prices.  But I think it’s a bit more likely than not that the 2017 consensus date for a revenue/cost crunch is as good an estimate as the operators could offer.

Something that ranges over into the certainty area is that operators are responding by putting vendors under price pressure and buying more from Huawei, the price leader.  Except in deals involving US operators, where Huawei isn’t a player, we’ve seen most vendors complain of pricing pressures and at least a modest slowing of deals.  Ciena said that yesterday on their call, though they say it’s not a systemic trend but rather a timing issue for a couple of players.

Another almost-sure-thing reason is that the operations groups that do the current network procurements haven’t been told to do much different.  VPs of ops told me, when I contacted them through the summer, that they were not much engaged in new SDN or NFV stuff at this point.  As they see it, new technology options are still proving out in the lab (hopefully).  Their focus is more on actual funded changes like enhancements to mobile infrastructure.

The question, the big one, is whether one reason operators are staying the course is that it’s the only course they have.  We’ve talked, as an industry, about massive changes in network infrastructure but for all the talk it’s hard to define just what a next-gen infrastructure would look like.  Harder, perhaps, to explain how we’d fund the change-over.

That’s the real point, I believe, because in our rush to endorse new network technologies we’ve forgotten a message from the past.  The notion of transformation of telecom infrastructure isn’t new.  We had analog telephony, then digital and TDM, and then the “IP convergence”.  What would we see if we looked back to the past and asked how the changes came about, particularly that last one to IP?

Packet networking was proposed in a Rand Corporation study in the early 1960s, and international standard packet protocols arrived a decade or so later, with the OSI model following.  We also had the foundations of the Internet.  None of the stuff that evolved in that period was intended as a replacement for TDM.  That role was envisioned for Asynchronous Transfer Mode, or ATM.

The theory behind ATM at the technical level isn’t relevant here, so I’ll just summarize it.  You break down information into “cells” that are small enough so that the delay you’d experience waiting for a cell to be sent or received is small.  That lets you jump priority stuff in front of that which isn’t a priority, which lets you mingle time-sensitive stuff like voice (or video) with data.  This, in turn, lets you build a common network for all traffic types.  ATM was actually intended to replace the public network, designed for it in fact, and there was an enormous wave of interest in ATM.  I know because I was part of it.

I learned something from ATM, not from its success but from its failure.  There was nothing technically wrong with ATM.  There was nothing wrong with the notion that a single converged network would be more economical as the foundation of a shift of consumer interest from voice to data. The problem was that the transition to ATM was impractical.  Wherever you start with ATM, you deliver technology without community.  You can’t talk ATM with somebody because early deployment would be unlikely to involve both you and your desired partners in communication.  You needed to toss out the old and put in the new, and that’s a very hard pill to swallow for operators.

Why did IP then win?  It wasn’t technical superiority.  It won because it was pulled through by a service—the Internet.  Operators wanted consumer data, and the Internet gave it to them.  The revenue potential of the Internet could fund the deployment of what was then an overlay network based on IP.  More Internet, more IP, until we reached the point where we had so much IP that it became a viable service framework, a competitor to what had previously been its carrier technology—TDM.  We got to IP through the revenue subsidies of the Internet.

What revenue funds the currently anticipated infrastructure transformation?  We don’t have a candidate with that same potential.  The benefits of SDN or NFV are subtle, and we have no history as an industry of exploiting subtle benefits, or even harnessing them.  That means, in my view, that we either have to find some camel’s-nose service to pull through the change as the Internet did for IP, or we have to learn to systematize the change.  I’ve offered an example of both in recent blogs.

IoT and agile cloud computing could both be candidates for the camel role.  We could gain almost a trillion dollars in revenues worldwide from these services.  We’re slowly exploiting the cloud already, and while it would help if we had a realistic understanding of where we’re going with it, we’ll eventually muddle into a good place.  IoT is more complicated because we have absolutely no backing for a truly practical model, but I think eventually it will happen too.

That “eventually” qualifier is the critical one here.  We probably can’t expect any new service to take off faster than the Internet did, and even the Internet took a decade or more to socialize IP to infrastructure-level deployment.  My point with the notion of service operations automation is that we could do better.  If we build, through a combination of cloud features for infrastructure and enlightened software data modeling, a petri dish with an ideal growth medium in it, we could build many new services and attract many new revenue sources.  This could then drive the evolution of infrastructure forward as surely as one giant camel could have, and a lot faster.

Consumerism has blinded us to a reality, which is the reality of justification.  I buy a new camera not because I need one but because I want it.  That works for discretionary personal expenses up to a point, but it’s not likely the financial industry would reward, or even tolerate, a decision by operators to deploy SDN or NFV for no reason other than it was nice and shiny and new.  It’s my belief that we can accelerate change by understanding how it has to be paid for.  A useless technology will meet no financial test, and both SDN and NFV can be justified.  If we want earnings calls that cite explosive deployment growth in these new things, we’ll have to accept the need for that justification and get working on making it happen.

The Technical Steps to Achieve Service Operations Automation

If the concept of service operations automation is critical to NFV success and the NFV ISG doesn’t describe how to do it, how do you get it done?  I commented in an earlier blog that service operations could be orchestrated either within the OSS/BSS or within MANO.  The “best” place might establish where to look for NFV winners, so let’s explore the question further.

Service operations gets mired in legacy issues in the media.  We hear about OSS/BSS and we glaze over because we know the next comments are going to be about billing systems and order portals and so forth.  In truth, the billing and front-ending of OSS/BSS is fairly pedestrian, and there are plenty of implementations (proprietary and open-source).  These can be refined and tuned, but they can’t automate the service process by themselves.

Operations automation of any sort boils down to automating the response of operations systems to events.  These events can be generated within the service framework itself (in the form of a move, add, or change, for example) or within the resource framework, meaning they pop up as a result of conditions that emerge during operation.

Automating event-handling has been considered for some time.  As far as I can determine, the seminal work was done around 2008 by the TMF in their NGOSS Contract discussions, which grew into the GB942 specification.  The picture painted by the TMF was simple but revolutionary: events are steered to operations processes through the mediation of a contract data model.  In my view, this is the baseline requirement for service automation.  Things like billing systems respond to events, so the model passes events to them, and things like order entry systems generate events to be processed.

The reason the TMF established the notion of a contract data model (the SID, in TMF terms) as the mediating point is that the automation of service events is impossible without event context.  Suppose somebody runs into a network operations center that’s supporting a million customers using a thousand routers and ten thousand ports and circuits, and says “a line is down!”  That by itself isn’t very helpful in establishing the proper response, even in a human-driven process.  The fundamental notion of the TMF’s GB942 was that the contract data model would provide the context.
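
Here’s a deliberately simplified sketch of the steering idea (my own illustration, not the GB942 mechanism itself): the contract data model maps event types to operations processes, so a resource event arrives carrying the contract context it needs.

```python
# Simplified, hypothetical contract-mediated event steering.
class ContractModel:
    def __init__(self, contract_id: str):
        self.contract_id = contract_id
        self.handlers = {}                      # event type -> operations process

    def on(self, event_type: str, process):
        self.handlers[event_type] = process

    def steer(self, event_type: str, detail: dict):
        process = self.handlers.get(event_type)
        if process:
            process(self.contract_id, detail)   # the contract supplies the context

def billing_adjustment(contract_id, detail):
    print(f"[{contract_id}] issue SLA credit: {detail}")

def fault_dispatch(contract_id, detail):
    print(f"[{contract_id}] open trouble ticket for {detail['element']}")

vpn_contract = ContractModel("VPN-00123")
vpn_contract.on("sla_violation", billing_adjustment)
vpn_contract.on("line_down", fault_dispatch)

# A resource event, once mapped to the contract, is no longer context-free.
vpn_contract.steer("line_down", {"element": "access-port-7"})
```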

For that to work, we have to be able to steer events in both the service-framework and resource-framework sense.  It’s pretty easy to see how service events are properly steered, because these events originate in software that has contract awareness.  You don’t randomly add sites to VPNs; somebody orders them, and that associates the event with a service data model.  The problem lies in the resource-framework events.

In the old days, there was a 1:1 association between resources and services.  You ordered a data service and got TDM connections, which were yours alone.  The emergence of packet technology introduced the problem of shared resources.  Because a packet network isn’t supporting any given service or user with any given resource (it adapts to its own conditions with routing, for example), it’s difficult to say when something breaks that the break is causing this or that service fault.  This was one of the primary drivers for a separation of management functions in networking—we had “operations” meaning service operations and OSS/BSS, and we had “network management” meaning the NOC (network operations center) sustaining the pool of shared resources as a pool.

“Event-driven” OSS/BSS is a concept that’s emerged in part from GB942 and in part because of the issues of resource-framework events.  The goal is to tie resource events somehow into the event-steering capabilities of the service data model.  It’s not a particularly easy task, and most operators didn’t follow it through, which is a reason why GB942 isn’t often implemented (only one of the almost 50 operators I talked with said they had done anything with it).

This is where things stood when NFV came along, and NFV’s virtualization made things worse.  The challenge here is that the resources that are generating events don’t even map to logical components of the services.  A “firewall” isn’t a device but a set of software-hosted functions on VMs running on servers, connected into a chain with SDN tunnels.  Or maybe it’s a real device.  You see the problem.

Virtualization created the explicit need for two things that were probably useful all along, but not critical.  One was a hierarchically structured service data model made up of cataloged standard components.  You needed to be able to define how events were handled according to how pieces of the service were implemented, and this is easy to do if you can catalog the pieces along with their handling rules, then assemble them into services.  The other was explicit binding of service components to the resources that fulfill them, at the management level.

NFV’s data model wasn’t defined in the ISG’s Phase One work, but the body seems to be leaning toward the modern concept of an “intent model”, with abstract features, connection points, and an SLA.  This structure can be created with the TMF SID.  NFV also doesn’t define the binding process, and while the TMF SID could almost surely record bindings, the process for doing that wasn’t described.  Binding, then, is the missing link.

There are two requirements for binding resources to services.  First, you must bind through the chain of service component structures you’ve created to define the service in the first place.  A service should “bind” to resources not directly but through its highest-level components, and so forth down to the real resources.  Second, you must bind indirectly to the resources themselves to preserve security and stability in multi-tenant operations.  A service as a representative of a specific user cannot “see” or “control” aspects of resource behavior when the resource is shared, unless the actions are mediated by policy.
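
To illustrate both rules, here’s a hedged sketch with invented names: bindings are recorded only at the level of the components that own them, requests flow down through the component hierarchy, and a broker mediates every action against shared resources according to policy.

```python
# Hypothetical sketch of the two binding rules: bind through the component
# chain, and bind indirectly via a policy-enforcing mediation point.
class ResourceBroker:
    """Indirection point: services never touch shared resources directly."""
    def __init__(self, policy):
        self.policy = policy                      # e.g. {"restart": False, "read_status": True}

    def act(self, resource_id: str, action: str):
        if not self.policy.get(action, False):
            return f"DENIED: {action} on {resource_id} (shared resource, policy-mediated)"
        return f"{action} on {resource_id} executed"

class Component:
    def __init__(self, name, children=None, bindings=None, broker=None):
        self.name = name
        self.children = children or []            # bind downward through the hierarchy...
        self.bindings = bindings or []            # ...until real resource bindings appear
        self.broker = broker

    def request(self, action: str):
        results = [self.broker.act(r, action) for r in self.bindings] if self.broker else []
        for child in self.children:
            results += child.request(action)
        return results

broker = ResourceBroker(policy={"read_status": True, "restart": False})
access = Component("AccessLeg", bindings=["port-7", "vfw-instance-3"], broker=broker)
service = Component("BusinessVPN", children=[access])

print(service.request("read_status"))             # allowed, flows down the chain
print(service.request("restart"))                 # blocked for a shared resource
```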

So this is what you need to be looking for in an implementation of NFV that can address service operations automation—effective modeling but most of all effective binding.  Who has it?

Right now, I know enough about the implementations of NFV presented by Alcatel-Lucent, HP, Oracle, and Overture Networks to say that these companies could do service operations automation with at most minimal effort.  Of the group, I have the most detail on the modeling and binding for HP and Overture, and therefore the most confidence in my views with regard to those two—they can do the job.  I have no information to suggest that the OSS/BSS players out there have achieved the same capabilities, so right now I’d say that NFV is in the lead.

What challenges their lead is that all the good stuff is out of scope to the standards work, while it’s definitely in scope for the TMF.  I’m not impressed by the pace of the TMF’s ZOOM project, which has spent over a year doing what should have been largely done when it started.  But I think a couple of good months of work by a couple of people could fully define the TMF approach, and the TMF could make that happen.  I don’t think the ISG can move that fast, which means that vendors like those I’ve named are operating in a never-never land between standards, so to speak.  They might end up defining a mechanism de facto, or demonstrating that no specific standard is even needed in the binding-and-model area.

The deciding factor in whether service operations automation is slaved to MANO or to the OSS/BSS may be history.  MANO is a new concept with a lot of industry momentum.  OSS/BSS is legacy, and while it doesn’t have far to go, it has had the potential to do all of this all along.  The same effort by the same couple of people could have generated all the goodies back in 2008, and it still hasn’t happened.  We have four plausible implementations on the NFV side now.  If those four vendors can hook into service operations automation they could make the business case for NFV, and perhaps change OSS/BSS forever in the process.

Five NFV Missions that Build From Service Operations Success

In my last blog I outlined an approach to making an NFV business case that was based on prioritizing service operations with legacy infrastructure.  This, I noted, would provide a unifying umbrella of lifecycle services that NFV-based applications could then draw on.  Since the operations modernization would have been paid for by service operations cost reductions, NFV would not have to bear the burden.  That would facilitate making the business case for individual applications of NFV, and at the same time unify these disconnected NFV pieces.

The obvious question stemming from this vision is just what NFV applications might then be validated.  According to operators, there are five that seem broadly important, so let’s look at how each would benefit from the service operations umbrella.

The most obvious NFV opportunity that might be stimulated by service ops is virtual CPE, or vCPE.  This app is sometimes presented as a set of functions hosted on a generalized premises device, sometimes as a cloud-hosted set, and sometimes as an evolution from the first to the second.  vCPE is fairly easy for a managed service provider (one who uses third-party transport and links it to managed services and extra features) to justify, because agile service feature selection is a clear value proposition.  Broader-based telcos might find the service interesting but lacking in the scope of benefits that would be needed to impact EBITDA.

A service operations umbrella would necessarily include a service portal, automatic provisioning and modifications of the service features, and self-care features.  These features, coming to vCPE at no incremental cost, would make the application valuable to any operator for business services.  It’s likely that efficient service operations could even make it useful to SMBs using consumer-like broadband connections.  For some applications, including content (more on this later), home control and medical monitoring, vCPE could be justified all the way to the consumer.

A second NFV opportunity that’s already generating interest and even sales is virtualization of mobile infrastructure.  Even for 4G/LTE services, operators are visualizing software-defined radio, self-optimizing elements, flexible topologies in the evolved packet core (EPC), and add-on services tightly coupled to IMS.  5G could increase the credibility of all of these applications, but particularly the last.

Operations automation is critical for the success of any mobile infrastructure business case for NFV because mobile infrastructure is multi-tenant and so the dynamism of individual functions is limited.  Most function deployment would come in response to changes in traffic load or as a response to an outage.  Mobile infrastructure is going to be highly integrated with legacy equipment, but also with other services like content delivery and IoT.

What makes add-on services interesting is that IMS-mediated applications would likely be much more dynamic than core IMS or EPC components.  You can imagine mobile collaborative applications involving content sharing or IoT integration with mobility generating a lot of per-user or per-group activity.  Content-viewer collaboration, sometimes called “social content”, is also a good application because it involves both the indexing and sharing of content clips and the socialization of the finds among a group.

The third opportunity area that the service operations umbrella would enable is elastic bandwidth and connectivity.  The business key to services that tactically build on baseline VPNs or VLANs is that they don’t undermine the revenue stream of the base service.  My research suggests this can be done if such services are always tied to an over-speed access connection (which you need for delivery anyway), if they have a pricing strategy that encourages longer-term commitments rather than everyday up-and-down movement, and if they’re invoked through a self-service portal that can offer capacity or new site connectivity not only on demand but on a schedule, or even when a management system detects certain conditions.

The differentiating point for this kind of service is its ability to augment private-network services and to do so at the same high level of availability and QoS.  While those factors can’t really be guaranteed using NFV, the NFV-to-SDN tie could help assure them.

IoT is a fourth opportunity, though it’s more complicated to realize.  As I’ve noted in other blogs, IoT is an application that demands harnessing of a large variety of already-deployed sensor/controller elements and their conversion into a data repository that can then be queried.  It’s true that there may be new sensor/controller elements put online via 4/5G, but there are literally billions of devices already out there and a failure to enlist them in early applications would generate a major question of how new deployments would be capitalized and by whom.

The most significant value of IoT lies in its capability to provide context to mobile users, which is the basis for many valuable applications.  You can visualize the world as an interlocking and overlaid series of information fields that users move through or to (or away from) and of which they are selectively aware.  The information available can be about the physical area, traffic conditions, nearby events that might be of interest or create risk, commercial opportunities, and so forth.  The information from a given user’s mobile device could guide the user’s path, and if a community of users is interacting, their collective information could help arrange meetings.  Emergency responders and any worker-dispatch function could draw on the data too.

The final opportunity is tactical cloud computing, another activity that emerges from either physical or social mobility.  One of the profound truths of IT spending by enterprises is that it has always peaked in growth when new productivity paradigms become visible.  Mobility is likely to be the next opportunity for such a paradigm, because engaging the worker with information at the point of worker activity is the ultimate marriage of information processing and information delivery.  But mobile workers are mobile and their information needs are tactical and transient, which isn’t ideal for traditional application computing models.  It’s better for the cloud, particularly if elements of the application and data can be staged to follow the worker and anticipate the next mission.

It’s clear that this concept would grow out of business use of IoT, and it might also grow out of maturing collaboration models focused on mobile workers as well.  It would probably be difficult to socialize it by itself, which is why I place it last on the list.

The big long-term opportunities for NFV are in the last two areas, and that’s an important point to make because it’s a validation of the premise of this whole exercise.  To make shareholders happy is the primary mission of public-company business policy, and that for network operators means raising EBITDA, which means lowering opex significantly.  You can’t do that except through broad changes in service operations, and you can’t justify broad changes with narrow service targets.  Since broad service targets are too radical to start an NFV story with, you are stuck unless you decouple service operations from the rest of NFV and give it priority.  We don’t have to forget NFV or pass on its real benefits, but we do have to apply the lesson it’s taught about orchestration to service operations first.  Once that happens, then all the NFV priorities of various types of operators, all the customer directions they might follow, unite in a single technical framework that can propel NFV and operator business success.  That’s the only way we’re going to get this done.

How an NFV Sales Story Can Get Wall Street “Tingly Inside”

Remember from yesterday’s blog that the goal of an NFV business case should be to “make the Street all tingly inside.”  That means that NFV’s business case has to be made in two interdependent but still separate tracks—one to justify new capex with high ROI, and the other to create an opex-reducing umbrella that improves EBITDA for the operators.

Classic NFV evolution proposes to combine the two, creating a significant EBITDA gain and a high ROI on any incremental capex.  That’s a problem at two levels.  First, we have no broad model for NFV deployment—we have a series of specific options like vCPE or mobile infrastructure, but no unifying model.  Thus, it’s hard to obtain enough mass to get any significant savings to improve EBITDA.  Second, if we achieve gains through infrastructure modernization to an NFV form, we necessarily incur a large capex if we have a large network scope to apply savings to.  That’s hard to reconcile with high overall ROI and harder to manage in risk terms.

My proposal is simple.  We first attack EBITDA by attacking what financial analysts mean when they say “opex”, and we do that by focusing NFV principles on legacy infrastructure.  We then virtualize functions where we can secure a higher ROI, either because of extra cost efficiencies or because of incremental revenue gains.

We have to start with that opex point.  Any discussion about improving “opex” should start with the fact that the word really has two meanings.  We in the networking industry think of opex as “operations expense” but the operators’ CFOs and the financial markets think of it as “operating expense”.  Network operations is only one component of this broader opex definition.  The other two are service payments (roaming, etc.) and service operations costs.  It’s this last group that we need to consider.

Service operations or “service management” is the set of tasks/systems that support the retail or wholesale customer relationships of a network operator.  It starts with an order and ends when the service contract expires and the associated resources are released.  Service operations links with network operations for the realization of the service relationship being purchased.  Through the service lifecycle there are events from both the service and network side that are triggers for action on the other side, so the two operations areas are interdependent.

It is highly unlikely that you could do anything in the way of a new service, or change the operating expenses of an operator, without impacting service operations.  In fact, in terms of total costs, service operations makes up about two-thirds of total management costs (service and network), which in turn make up about half of opex.  Service operations is implemented through OSS/BSS systems and is under the operators’ CIO organizations.

Given that a third of all opex is service management, you can conclude that if we were to modernize OSS/BSS systems to provide full service automation capabilities and simply cut the cost of service operations in half, we’d make an impact equal to eliminating network operations costs completely.
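
The arithmetic is worth making explicit.  Using the shares above (and purely illustrative absolute numbers), a quick back-of-the-envelope calculation shows why halving service operations costs matches wiping out network operations entirely, and why it cuts total opex by roughly the 17% figure I’ll cite shortly.

```python
# Back-of-the-envelope version of the paragraph's arithmetic (shares as stated
# in the text; the absolute numbers are illustrative only).
total_opex          = 100.0                      # arbitrary units
management_costs    = 0.5 * total_opex           # service + network management ~ half of opex
service_operations  = (2/3) * management_costs   # ~ two-thirds of management costs
network_operations  = management_costs - service_operations

savings_from_halving_service_ops = 0.5 * service_operations
print(service_operations / total_opex)                 # ~0.33: a third of opex is service management
print(savings_from_halving_service_ops)                # ~16.7 units saved ...
print(network_operations)                              # ... the same as eliminating network ops
print(savings_from_halving_service_ops / total_opex)   # ~17% cut in total opex
```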

How does this relate to NFV?  Well, as I’ve noted in earlier blogs, NFV declared management/operations to be out of scope and it hasn’t generated recommendations either on how to broadly improve service operations or how to integrate NFV with service operations.  As it happens, service-to-network operations integration has been a growing problem.  NFV would make it worse.

The problem with NFV, or even with packet networks, is that the connection between “service” and “network” isn’t fixed, because resources are multi-tenant.  If you can’t connect resource and service management, you can’t automate customer care—and that’s the largest single component of operations expenses, bigger than network operations.  The more multi-tenancy we have, the looser we couple services to resources through virtualization, the more we risk operations disconnect that fouls our service automation nest.  We can’t let network and service management diverge or we lose automation, so we have to figure out what can converge them.

This makes a critical point in the evolution to NFV.  Any NFV business case will have to address service operations efficiency, and thus should apply NFV orchestration principles to network infrastructure (devices or servers) end-to-end to organize and activate service and network operations tasks.  If you can’t drive service operations modernization, you can’t make any NFV business case at all because you can’t change operating expenses and EBITDA.  At best, you can ride on somebody else’s business case—if they’ll let you.

If you want a tingle from the Street, start by telling them you can cut service operations costs by half by applying NFV orchestration principles to legacy infrastructure and service management.  That would cut total operations costs by 17%, raising EBITDA accordingly.  And you’ve not deployed any NFV, any servers, at all.

That’s right!  Orchestration of service operations, if you can represent legacy devices in the infrastructure model, can operate without any virtual functions.  That’s why you can credibly claim a 50% reduction in service operations costs.  You might think you’ve just killed NFV, but what you’ve really done is get the second half of that tingle.

A service operations gain like this does two things.  First, it offers a powerful collection of risk-reducing measures.  How many NFV salespeople spend time telling a prospect that the hosting of functions won’t impact SLAs or increase the cost of customer care?  If we have a modern, automated system for service management we have a solution to these objections.  Second, all new services will create profit to the extent that they overcome cost issues.  As we’ve seen, most costs lie on the service operations side of the organization.  Since new services are covered under our new service operations umbrella, they inherit its efficiencies.

What new services?  It doesn’t matter nearly as much.  In NFV today we have a bunch of NFV islands trying to coalesce into a benefit continent.  Build the continent with service operations tools and you have the natural collection already in place.  You can now fit individual NFV applications in underneath without having to invent a whole service lifecycle process for them.  If you like vCPE or virtual mobile infrastructure, and if you can make a business case for that application alone, you can deploy it.  A service operations umbrella could save the whole lab trial and PoC process and transition it to a consistent, operationalized field trial.

Who wins with this sort of thing?  You can come at service operations in two ways.  First, you could expand the current OSS/BSS systems to address the necessary service automation processes.  Second, you could expand the current implementations of NFV to include process integration.

Most OSS/BSS systems are at least loosely compatible with the TMF’s model—the extended Telecom Operations Map or eTOM.  Most also support the TMF SID data model.  These are a necessary step for both approaches—you need to be able to map operations processes, which means defining them, and you need centralized service MIBs.  For the OSS/BSS systems to go the rest of the way, they need two things.  One is to be event-driven, which the TMF has defined at a high level in its NGOSS Contract/GB942 specification.  This is rarely implemented, unfortunately.  The other thing OSS/BSS needs is dynamic binding between resource and service domains.  I proposed to the TMF that they adopt the notion of an explicit binding domain to do this, but they didn’t like the idea.

For NFV, the key requirement is to have some sort of service model, something that, like the TMF SID, could define a service as an association of functional components with synchronized operation.  This model could then record resource bindings and steer events, which combine to make service operations automation possible.

The key for both these approaches is that they have to apply to legacy infrastructure or we miss most of the service operations processes and savings.  That means that the best NFV trials to conduct are those with no NFV at all, trials of service orchestration with legacy components that can then be expanded to demonstrate they work with NFV components too.

As far as NFV vendors go, I believe that Alcatel-Lucent, HP, Huawei, Oracle, and Overture Networks could model and orchestrate the required elements, with perhaps some tuning of the interfaces with legacy elements.  Of the group, I’m confident that HP has the right modeling strategy and that both Alcatel-Lucent and Oracle have the required operations bent.  Oracle is involved in some high-level transformation projects that could actually lead it to the service operations conclusion the industry needs.

Why haven’t we done this already if it’s such a good idea?  A big part of the reason is the NFV ISG’s scope issues; the trials and PoCs have tended to match the scope set by the early work.  However, that was never a requirement; you were always able to propose a broader scope.  I did with the original CloudNFV project, though the documents were revised to limit scope once I left my role as chief architect, and that was the first PoC approved.

Why not now, then?  Because vendors want simple sales value propositions with big payoffs, and revamping service operations is not only a complex process, it’s a process that doesn’t sell a lot of hosting.  Everyone wants a big, easy win.  There can be a very big win here with NFV, but it’s not going to be an easy one.  All that means is that it’s time to face reality and get started.

Could a “Wall Street” View of NFV Lead to a Business Case?

We all believe that carrier networks are changing, and anyone who ever took a business course knows that industries are changed by changes in their profit picture or regulations.  The business trends of the network operators are one of many issues that get pushed under the rug when we talk about new technologies like SDN or NFV.  A business case for either has to fit within the operators’ perception of their business.  Or, maybe, within the perception of Wall Street.  Most operators are public companies responsible to their shareholders.  Most shareholders get their data from financial analyst firms and track the movement of big investors like hedge funds.  What factors drive the Street’s view of networking?

First, and foremost, nearly all network operators are public corporations and their decisions are made so as to maximize their share price.  That’s what companies are supposed to do; if they don’t do that they can be the subject of a shareholder lawsuit.  That means that they will pay a lot of attention to “the Street” meaning Wall Street and the financial industry, and in particular the hedge funds.

The Street looks at company financials, and at things that could impact service base—both customers and ARPU (average revenue per user).  They don’t look at technology trends, and they don’t grill CEOs on how much SDN or NFV they’ve deployed.  In fact these acronyms rarely even show up in either operators’ earnings calls or financial analysis of the operator market.

One largely ignored point here, which I’ll address in detail in the next blog in this series, is that the Street is focused on EBITDA.  If you look at the meaning of the term you see that depreciation of capital assets is not included.  In fact, the Street is much less concerned about operator capex trends than about opex trends and revenues.  On this basis, capital savings accrued through technology shifts to SDN and NFV would be uninteresting to the Street unless they were combined with improvements in opex.

Second, network operator ARPU and margins are decreasing, even in mobile services, and growth is happening at the expense of EBITDA (earnings before interest, taxes, depreciation, and amortization) margins because costs aren’t falling as fast as revenue per user.  While mobile broadband has grown the subscriber base, that growth has come primarily through lower service costs.  As a result, ARPU is falling for mobile and wireline services (a sampling of 8 global operators shows all have seen ARPU decline in the last three years) and total revenues are at risk.  Operators are projecting the revenue/cost cross-over in 2017, and the business pressure behind all changes to infrastructure, services, or business practices is aimed at turning this margin compression around.

Keep in mind the definition of “EBITDA”.  When you see financial analyst charts of operator profit trends, they always show the EBITDA trend, which you’ll recall excludes capex both in terms of depreciation and new spending.  To turn EBITDA around you either have to boost revenue or reduce opex.  Nothing else will impact the number.

Next, network operators have an “invisible product”.  Users think of their network services in terms of Internet sites or mobile devices.  The operator provides something whose visibility to the user comes only if it’s not working properly.  It is very difficult to differentiate network services on anything but pricing plans and handset availability for wireless, and wireline services are differentiated more on their TV offerings than on broadband features.

This is a fundamental point for operators’ plans.  Since five out of six operator CEOs think they’ve done about as much as they can to extend their customer base, the lack of feature differentiation means they have to look to growing by taking market share through pricing or incentives.  Since new services are like “features”, that makes them look at profit management primarily through the lens of cost management.

But the top dozen OTT players have a market capitalization (total value of stock) greater than that of the top hundred network operators.  The Street would love to see operators somehow get onto a rising-revenue model, and they’d reward any player who managed the transition.

Mobile broadband is exacerbating, not fixing, all of the operators’ OTT issues.  What every financial analyst says about networking today is that the real competition isn’t other providers of network services, but the OTTs.  That implies that the prize is not those featureless connection services but the experiences that users obtain with them.  Mobility has made things worse by decoupling users from the baseline network services operators have historically relied on, like voice.  It’s now, through mobile video, threatening the live TV viewing franchises that are the best profit sources for wireline services.

Mobile infrastructure is expensive, too.  Operators’ wireline franchises were developed in home regions, but credible mobile services have to be national or even (in Europe) continental in scope.  You need roaming agreements, out-of-region infrastructure, and all this combines to create a lot of expensive competitive overbuild.  Operators are looking for infrastructure-sharing or third-party solutions, and tower-sharing has already developed.

CxOs have more non-technology initiatives to address their issues than ones based on new technologies like SDN or NFV.  Could a major oil company make money in some other market area, or a soft-drink company become a giant in making screwdrivers?  Probably, but it would stretch the brand and the skill set of everyone involved.  The focus of operators for the last decade has been to improve their current operations, not revolutionize them.  That might frustrate those who would benefit from radical change, but it’s a logical approach.

The important thing is that it hasn’t worked.  If you look at the IBM studies on the industry over that period, you see recurring comments in each one about making the industry more efficient, and yet no significant gains have been achieved.  It’s becoming clear to operators and Wall Street that something more radical is needed, and so technology change is at least viable.

But major IT vendors tend to push “knowledge” or big-data initiatives as having more impact than infrastructure change.  In IBM’s most recent analysis, they rate the potential impact of knowledge-driven operational change at a thousand times that of changing infrastructure.  On one hand, you could argue that “big data” has simply relabeled approaches proposed for a decade without success.  On the other, since CxOs are reluctant to leap into the great unknown, even that sophistry would be welcomed.

As long as there are even semi-credible options more practical than re-engineering a trillion dollars’ worth of network installed base, operators are likely to consider them seriously.  That puts the onus on vendors who want a network-centric solution; they have to make their approach look both “safe” and “effective” because it’s competing with something that’s a slight re-make of what’s been done all along and is therefore at least well-understood.

Operators themselves are unable to drive a massive technology change.  In many geographies they could not collaborate among themselves to set specifications or requirements without being accused of anti-trust collusion.  In some areas they are still limited in their ability to build their own products, and in any case it would hardly be efficient if every operator invented a one-off strategy for the next-generation network.  We need some harmonious model, and we need vendors to deliver it.  But existing network vendors aren’t eager to change the game (any more than the operators are), and the financial markets would rather fund social-network startups than infrastructure startups.

There are major tech vendors who are not incumbent network players, and thus have nothing much to lose in the shift toward a software-driven future.  There are a few who are pure software, some who are hybrid IT, and even some small players.  While perhaps a half-dozen could field an effective NFV solution, none are currently able to overcome the enormous educational barrier.

Financial analysts don’t believe in a network transformation.  None of the analyst reports suggest a major change in network technology is the solution to operators’ EBITDA slide.  Other surveys of operator CEOs reveal that they believe they have made good progress in economizing at the infrastructure level, and one financial report says that capex has declined by about 11% over the last five years.

McKinsey did a report four years ago listing over a dozen recommendations on steps operators should take.  None involved infrastructure modernization, and while the study predates SDN and NFV it shows that technology shifts were not perceived as the path to profit salvation.  EY’s study a year later listed a half-dozen changes operators should make, and none involved transformation of infrastructure.  EY’s 2014 study presumes that any value from technology change would come from improvements in network failure rates and other operations-based costs, not from lowering the network cost base.

If operators don’t feel direct pressure from the financial industry to transform infrastructure, the corollary is that they’ll have to convince the financial industry of the benefits of transformation.  They can’t do that if they don’t have a clear picture themselves, and that’s the situation today.

Conclusion:  Operators need to address EBITDA contraction to convince Wall Street that their business is trending in the right direction.  On the cost side, that means addressing not capex but opex, which is the cost component that actually flows through to EBITDA.  On the revenue side, it means defining some credible service structure that’s not dependent on selling connectivity.

I have a very good friend in the financial industry, a money manager who understands the Street and who provided me with the Street models and reports I’ve used to prepare this.  It’s worth recounting an email exchange I had with him:

Tom:  “It looks like the Street would reward a reduction of x dollars in opex more than they’d reward the same in capex because the opex savings would go right to EBITDA and the capex one wouldn’t show up there at all.”

Nick:  “Right and if they made a case for a high ROI for capex and could reduce OPEX at the same time it would make the Street all tingly inside.”

What this says is that NFV (and SDN) need to be validated in two steps:  First, you have to improve operations efficiency on a large scale—large enough to impact EBITDA.  Then you have to focus your validation of the capital equipment or infrastructure changes on a high-ROI service.  Would you like to be able to make a business case for NFV that would “make the Street all tingly inside?”  In my next blog I’ll discuss how this might be done.

What Do Salespeople Think of NFV?

If there’s a front line for NFV, that front line is the sales effort.  Since I’ve started to blog about the difficulties associated with making the NFV business case, I’ve gotten a lot of comments from salespeople who are charged with the responsibility of doing that.  I’ve tabulated just shy of 30 of these, and I think it’s interesting to see what they suggest.  Obviously, I’m suppressing anything that would identify salespeople, their companies, or their customers.

Let’s start with the state of the NFV market.  100% of the sales people thought that the NFV market was “developing more slowly than they had expected”, and this same percentage said that they believed their own companies were “unsatisfied” with the level of NFV sales.  93% said that the pace of development was “unsatisfactory for them personally.”  While this sounds dire, it’s wise to remember that salespeople are often unhappy with the pace of market development.  For SDN, for example, 76% say that market development is too slow, 81% that their company is dissatisfied, and 79% that they’re personally unhappy with the pace.  New technology gets a slow start.

Why is NFV developing too slowly?  Remember that buyers are almost unanimous in saying that the problem is that the business case hasn’t been made.  The sellers see it differently.  The reason given by 79% of the sales people is that “buyers are reluctant to take a risk with new technology”.  With multiple answers accepted, 43% say that “too many people have to sign off on NFV” and 29% say that they “have difficulties engaging the decision-makers” on the technology.

Who’s the engagement with?  In 89% of cases, sales engagement is reportedly with the office of the CTO and is linked in some way to lab trials.  In the majority of the rest, engagement is with a team reporting to the CEO.  Only about 4% say they’re engaging a specific CxO outside the CTO group.  This meshes with the question of who the culprit is in slowing NFV sales.  In 89% of the cases, salespeople name the CFO, in 7% the CIO, and in 4% the CTO’s organization.

The number of NFV successes is too small to draw much from in a survey sense, but what seems interesting is that for nearly all the successes so far, there’s been no formal PoC or lab trial proving NFV out.  Instead, the sales organization has driven a “virtualization” story or an “agility” story and then linked that story to NFV once the basic benefit thesis is accepted.  In these cases, engagement was not with the CTO or standards people, but with operations or the CIO.

How do salespeople feel about the industry view of NFV?  Here we see a very distinct split.  Exactly half think that the industry view is “pessimistic” and that it creates a “headwind that impacts their sales success.”  The other half believe the view is “optimistic”, generating unrealistic expectations and failing to cover the issues properly, which forces the salespeople to spend too much time on “education”.  Media coverage is considered “inadequate” by two-thirds of salespeople, but the company’s own website is considered inadequate by 71% and the marketing collateral available to support sales is rated “below par” by 79%.

Will their company stay the course and make NFV a success?  There are a lot of views here.  In all, 75% think their company will stay with NFV and eventually make it a success for them.  But only 29% think that NFV will be a “major revolution in networking” and only 21% think it will be the technology that will dominate their own career.  Those who remember ATM and other network technology waves come down slightly on the side of NFV being another technology that will fall short of its goals (54%).

It sure looks like salespeople are having issues with NFV, and the biggest problem seems to be that selling cycles are incredibly long and there seems to be no end to the issues that come up.  It’s common for a salesperson to say that every time they prove a point, another question arises.  It’s sales whack-a-mole, in other words.  There’s no consensus on why that is, though you could attribute the problem to a number of the points the salespeople themselves make, including the constituency difficulties and the business case challenge.

It would be lovely if you could simply ask salespeople what needs to be done, but it wouldn’t be helpful.  A large majority take the position that the buyers just need to suck it up and get to work deploying NFV.  Forget buy-in or a business case.  Obviously that’s not going to happen, and those who have worked with sales organizations for a long time know this is a common reaction.

What do I think this shows?  To me, the sales reaction is a clear symptom of a technology in search of a justification.  When the network operators launched NFV (with the “Call for Action” paper in October 2012) they set a goal, but a goal is only an intention.  That goal, like all goals, has to be met through a clear identification of the benefits to be reaped and the pathways to doing the reaping.  We’ve not done that with NFV, nor did we really do it with SDN or ATM, for that matter.  If a salesperson knows that the buyer wants three things, they know how to approach and control the sale.  If they think the buyer is simply being driven toward a technology change by the relentless progress of technology itself, they get frustrated with those who don’t want to get moving.

The major disconnect I see between salespeople and buyers is in the area of operations.  Salespeople didn’t say much about operations issues.  If you ask them what the benefit of NFV would be to buyers, three-quarters say “capex reduction” even though operators have largely determined that benefit won’t be enough to drive massive NFV acceptance.  Only 11% mentioned opex at all, and none of them said that there had to be a better opex strategy.  Operators think that opex is critical in almost 100% of cases, and more than three-quarters recognize that “service agility” is linked to the service lifecycle and service operations.  The disconnect is likely due to the low rate of engagement with the CIOs.

I think this situation is serious for NFV, but I also think it will change in some decisive way by the end of the year.  Buyers going into their fall planning cycle are already starting to make their points more explicitly to sellers, and that’s percolating back to sales management and into the executive suites of vendors.  I also think everyone is realizing that if big bucks are going to be spent on NFV in 2016 we’ll need to get it in the budgets, and that has to happen late this year.  All that focus will either hone an NFV strategy (likely several, in fact) or it will make it clear that NFV will be a feature of the future but not a driver.

What Would an NFV Future Look Like (and Who Would Win It)?

I’ve noted in the past that it’s proven difficult to make a business case for NFV.  Rather than address that point now, I propose to ask “But what if it can be made?”  Remember that while I’m unimpressed (to say the least) with efforts to paint a plausible picture to justify NFV deployment, I believe firmly that one could be drawn.  In fact at least three vendors and possibly as many as five could do that.  In a competitive market, if one succeeds others will jump in.  Who gets the big bucks?  Where does the money come from, in terms of class of equipment?  That last question is the easiest to answer, and from that answer the rest will likely flow.

No matter what proponents of CPE-hosting of VNFs say, NFV can’t succeed on that basis and in fact can’t generate a boatload of revenue in that space.  Yes, we can expect to see customization of CPE to allow for hosting of features, remote management and updating, and even in some cases “offline” operation.  That’s not where the majority of the money will be, though.  It will help NFV bootstrap itself into the future, but the future of NFV is the cloud.

NFV would be, if fully successful, a source of a bit over 100 thousand data centers, supporting well over a million new servers.  These will not be the traditional hyperscale cloud centers we hear about, though cloud data centers will surely be involved in NFV hosting and NFV principles will extend to influence operations practices in virtually all of them.  What will characterize the NFV-specific data centers is distribution.  Every metro area will have at least one, and probably an average of a dozen.  This is the first distinguishing factor about NFV servers, the key to succeeding in the NFV server space.

The first and most numerous tier of NFV servers has to be placed proximate to the point of user attachment.  That’s a point that operators agree on already.  If you try to haul traffic too far to connect virtual functions you run the risk of creating reliability problems in the connections alone, and you create a need for an expensive web of connectivity.  Many operators expect to see a server for every central office and every cluster of wireless cells (generally, where SGWs might be located), and expect those servers to be connected by very fast fiber trunks so that intra-function communication is easy.  These trunks will, for services with NFV in the data path, become the traffic distribution elements of the future, so they’ll have to be both fast and reliable.  So will the interfaces, and servers will have to be optimized to support a small number of very fast connections.

The NFV servers will be big, meaning that they’ll have a lot of CPUs/cores and a lot of memory.  They’ll be designed for very high availability, and they’ll use operating system software that’s also designed for “carrier grade” operations.  Yes, in theory, you can substitute the ability to spin up alternative instances for built-in high availability, but operators seem skeptical that this could replace high-availability servers; they see it as a way to supplement that feature.

Although there’s been a broad assumption that the servers would run VMs, the trend recently has been toward containers, for several reasons.  First, many of the VNFs are per-user deployments and thus would probably not require an enormous amount of resources.  Second, VNFs are deployed under tight control (or they should be) and so tenant isolation isn’t as critical as it might be in a public cloud.  Finally, emerging NFV opportunities in areas like content and IoT are probably going to be based on “transient” applications loaded as needed and where needed.  This dynamism is easier to support with containers.

So who wins?  The player everyone believes is most likely to benefit from NFV is Intel.  Their chips are the foundation for nearly all the deployments, and the model of NFV I’m suggesting here would favor the larger chips over microserver technologies where Intel is less dominant.  Intel’s Wind River Titanium Server is the most credible software framework for NFV.  Intel is a sponsor of IO Visor, which I think will be a big factor in assuring foundation services for NFV.  While I think Intel could still do more to promote the NFV business case, their support of NFV so far is obviously justified.

A tier up from Intel are the server vendors, and these divide into two groups—those who have foundation technology to build an NFV business case and those who have only infrastructure to exploit opportunities that develop elsewhere.  Servers will be, by a long shot, the most-invested-in technology if NFV succeeds, which gives server vendors a seat at the head of the table in controlling deals.  If there are deals to control, that is.  HP is the only server vendor in that first group, and in fact the NFV vendor who is most likely to be capable of making a broad case for NFV with their current product line.

The fact that a server vendor could make the business case means to me that other server vendors’ positions with NFV are more problematic.  If HP were to burst out with an astonishingly good positioning that included a solid operations story, they could win enough deals to look like a sure path to success for operators, in which case everyone else would have to catch up.  Today in NFV it would be difficult to put together a great competing story quickly, so a lot of momentum would be lost.

That defines the next level of winner, the “MANO player”.  If you have an NFV solution that could form a key piece of an operations/legacy element orchestration story, a supplement to plain old OpenStack in other words, then you might get snapped up in an M&A wave by a server vendor who doesn’t have something of their own.  However, the window on this is short.  I think most NFV-driven M&A will be over by the end of 1H16.

VNF players are often seen as major winners, but I don’t think “major” will be the right word.  It is very clear that operators prefer open strategies, which few VNFs support.  I believe that operators also want either pure licensing or a “pay-as” arrangement that evolves into a fixed licensing deal.  The VNF guys seem to think they can build a revenue stream with per-user, per-deployment fees.  Given this, I think that there will be a few big VNF winners, firms who figure out how to make “VNF as a service” work to everyone’s advantage and who have magnet capabilities (mobile, content, collaboration) for which there are fewer credible open alternatives.

To me, this situation makes it clear that the most likely “winners” in NFV will be IT giants who have minimal exposure to traditional telco network equipment.  They have so much to gain and so little to lose that their incentive for being a powerful mover will be hard to overcome.  That said, though, every NFV player so far has managed to overcome a lot of incentive and even managed to evade reality.  That means that a player with a powerful magnet concept like Alcatel-Lucent’s vIMS/Rapport or Oracle’s operations-driven NFV could still take the lead.  We’ll have to see how things evolve.

What Does a “Business Case” Involve for SDN or NFV?

One of the recurring points I make in my blogs about SDN and NFV is the need for a business case.  Most are aware that “business case” means a financial justification of an investment or project, but I’ve gotten questions from readers who’d like to understand the process of “making” one in a bit more detail.

At the high level, a business case starts with the return on investment for the project, which is the net benefit divided by the investment (roughly; you have to include other factors like cost of money).  This ROI is compared with a corporate target set by the CFO, and if it beats the target the project is financially justified.  This whole process is fairly well understood, and it’s been applied by enterprises and service providers for decades.
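As a rough sketch of that arithmetic, here’s a minimal example.  The figures, the 10% cost of money, and the 15% hurdle rate are all hypothetical, and real evaluations fold in the cost of money in more sophisticated ways; the point is only the shape of the test.

```python
# Minimal sketch of the ROI test a CFO applies; all figures are hypothetical.

def simple_roi(net_benefit, investment):
    """Return on investment as net benefit divided by investment."""
    return net_benefit / investment

# Hypothetical project: $10M invested, $3M/year of benefit for 5 years,
# discounted at a 10% cost of money to reflect the time value of the benefit.
investment = 10_000_000
annual_benefit = 3_000_000
discount_rate = 0.10
years = 5

npv_of_benefits = sum(annual_benefit / (1 + discount_rate) ** t
                      for t in range(1, years + 1))
roi = simple_roi(npv_of_benefits - investment, investment)

cfo_target = 0.15  # hypothetical corporate hurdle rate
print(f"ROI: {roi:.1%} -> {'justified' if roi > cfo_target else 'rejected'}")
```

In this made-up case the discounted benefits don’t clear the hurdle rate, so the project fails even though it “saves money” on paper, which is exactly the kind of outcome the rest of this discussion is about.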

What makes business cases challenging for technologies like SDN and NFV is the notion of “net benefit” when it’s applied to cost savings, revenues, or a combination of the two.  Revenue-based benefits are always challenging because you have to quantify how much revenue you would gain, and also explain why you hadn’t already gained it.  Cost-based benefits are challenging because you have to consider the total impact on costs, which is not only broader than “capex” alone, it’s actually broader than total cost of ownership.

On the revenue side, let me offer an example.  Suppose you have a thousand business customers for a carrier Ethernet service that includes a firewall and other CPE-based features.  You determine that you can accelerate the provisioning of these services by a month.  How much revenue have you earned?  A month’s worth?  Perhaps, but probably much less than that.  The reason is that you gain time-to-revenue only on new deployments, and we haven’t identified how many of those there are.  We also don’t know whether a given customer would actually take the service a month early or would simply delay ordering.  If a new branch office opened on September first, would your enterprise say “Heck, if you can light up my Ethernet a month before anyone even moves in, that’s fine?”

On the cost side, suppose we could replace that Ethernet CPE with a cloud-hosted equivalent.  The box costs two grand today, so do we save that?  Not likely.  First, you need something to terminate the service on premises, so we would simplify the box but not eliminate it.  Second, we are making a potentially fatal assumption in treating capital cost as the only cost.  The operations cost associated with cloud-hosted functional elements is very likely higher than for a single box, and even if we don’t know that for sure, we’d have to validate the assumption that it isn’t.  Then we’d have to look at the question of whether we would impact other costs, like customer support calls to inquire about the status of a service.  You need to understand the impact on all costs to determine the benefit, or lack thereof.
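To make that concrete, here’s a small back-of-the-envelope sketch.  Every number in it is hypothetical rather than drawn from any operator; it’s only meant to show how residual on-premises hardware and higher operations cost can erase an apparent capex saving.

```python
# Hypothetical back-of-the-envelope for the CPE example above; none of these
# figures come from an actual operator, they just show how the pieces interact.

customers = 1000

# Today: a $2,000 multi-feature box per customer, plus assumed annual support cost.
legacy_capex_per_customer = 2000
legacy_opex_per_customer = 300

# Hosted alternative: a simpler termination device stays on premises, plus a
# share of hosting cost and (critically) the operations cost of the hosted functions.
termination_capex_per_customer = 800
hosting_cost_per_customer = 250   # assumed annual hosting share
hosted_opex_per_customer = 400    # assumed annual ops cost of the hosted chain

years = 5
legacy_total = customers * (legacy_capex_per_customer
                            + years * legacy_opex_per_customer)
hosted_total = customers * (termination_capex_per_customer
                            + years * (hosting_cost_per_customer
                                       + hosted_opex_per_customer))

print(f"Legacy 5-year cost: ${legacy_total:,}")
print(f"Hosted 5-year cost: ${hosted_total:,}")
print(f"Net 5-year saving:  ${legacy_total - hosted_total:,}")
```

With these particular (made-up) assumptions the “saving” is actually negative, which is the trap:  the $2,000 box never disappears entirely, and the opex line does the real damage.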

When CFOs look at SDN or NFV projects, or at other network projects, these are the things they look for.  What has made things difficult for SDN and NFV sales is that the trials and tests that have been run on the technologies have not addressed the range of cost and benefit qualifiers I’ve noted here (which are only the most obvious ones).  A CFO is presented with a comment that they could save 25% in capital cost by replacing multi-feature physical devices on premises with a combination of basic service termination and hosted functionality.  OK, says the CFO, what will it cost to operate this?  In nearly all cases, the CFO won’t get a complete response.

Then the CFO says “What will the impact be on service availability and customer support?  Will I have to credit back for outages because I missed my SLA?  Oh, you can’t tell me what SLA I can write?  And if the customer calls and says the service is out, you can’t really be sure whether it is or isn’t?”  You can imagine how well this sits in a financial review.

The cost of a router or a switch can be higher than the cost of a virtual one in a capital outlay sense, but you can see that even traditional total-cost-of-ownership metrics won’t fully address the “financial cost” challenge, and that’s what you need to determine a business case.  Operators know what it costs to run networks based on legacy technology.  Yes, it’s too much.  But they can’t accept a statement that SDN or NFV will make it better as proof that will happen, and that’s what they’re asked to do if a given SDN or NFV trial doesn’t validate operations impact to the point where opex cost comparisons with the legacy network can be made.

There’s also the question of “risk premium”.  If you give a buyer two choices—one known and the other unknown—to solve a problem, they will pick the known approach even if the unknown one has a slight advantage.  They don’t want to face a risk unless they have an ample justification.  With SDN and NFV, operators are not confident that the business case is easily met, so they can’t presume enormous financial benefits.  Thus, sellers have to reduce that risk premium, which means answering some of the knotty questions like “How many servers do I need to achieve reasonable economy of scale?” or “What is the MTBF and MTTR of a virtual-function-based service chain?”  Operators may even have a basic question like “What happens if I start deploying this and it doesn’t work at scale?”
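That MTBF/MTTR question isn’t a nit-pick.  Here’s a simple series-availability sketch, again with hypothetical figures of my own rather than measured values, showing why chaining hosted functions compounds the failure exposure:

```python
# Rough series-availability model for a virtual-function service chain.
# All MTBF/MTTR values are hypothetical; the point is the compounding, not the numbers.

def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one element."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A hypothetical chain: two hosted VNFs, the server under each, and the
# data-center switching that stitches the chain together.
elements = {
    "vnf_a":        (8760.0, 0.5),    # (MTBF hours, MTTR hours)
    "vnf_b":        (8760.0, 0.5),
    "server_a":     (50000.0, 4.0),
    "server_b":     (50000.0, 4.0),
    "dc_switching": (100000.0, 2.0),
}

# In a series chain, the service is up only if every element is up.
chain_availability = 1.0
for mtbf, mttr in elements.values():
    chain_availability *= availability(mtbf, mttr)

downtime_minutes_per_year = (1 - chain_availability) * 365 * 24 * 60
print(f"Chain availability: {chain_availability:.5f}")
print(f"Expected downtime:  {downtime_minutes_per_year:.0f} minutes/year")
```

Each element on its own looks respectable, but the chain’s availability is the product of all of them, which is what a CFO hears when you can’t state the SLA you can write.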

Sellers can address the risk premium problem by demonstration, meaning that they can propose and run tests that show what can be expected of SDN or NFV operations at various levels of deployment.  They could also discount or make other commercial concessions (loan equipment, etc.) to address this premium, but they can never simply propose that it not be imposed.  They can never expect operators to buy without an adequate ROI, either.

In rough terms, operators are saying that SDN or NFV deployments that save less than about 20% in costs are not going to fly, because they can achieve those levels of savings by pushing vendors for product discounts on legacy equipment.  The need for better benefits is one reason why operators (and therefore vendors) have moved from pure capex reduction to capex/opex/revenue benefits for SDN and NFV.  But capex is nice because you can apply it to a box.  When you start to talk opex and revenues, you’re talking about service infrastructure as a whole, and that’s what is making things complicated.

So you’re an SDN or NFV vendor and you want to “make a business case”.  What do you do?  First, you have to identify the specific impact of your approach on revenues and costs, overall.  That means running at least some credible tests to establish either that operations of the new infrastructure are completely equivalent to the old, or what the differences would mean in cost terms.  Second, you have to point to specific proof points that would validate your tests, and finally you have to work with the operator to devise a trial that would demonstrate behavior at those points.

The big hole in all of this is operations, which is what it was from the first.  Because we don’t have a mature management model for either SDN or NFV, we can’t easily validate even basic infrastructure TCO, much less total cost impact.  Most of the purported SDN or NFV vendors don’t have a management solution at all, and those who do have been constrained so far by the limited scope of trials.  It might be a slog to make a limited trial big enough, and inclusive enough, to support a business case, but it’s nothing more than applying a process that’s been in place for decades, and we don’t have a choice.