Could a “Wall Street” View of NFV Lead to a Business Case?

We all believe that carrier networks are changing, and anyone who ever took a business course knows that industries are changed by shifts in their profit picture or in regulation.  The business trends of the network operators are one of the many issues that get pushed under the rug when we talk about new technologies like SDN or NFV.  A business case for either has to fit within the operators’ perception of their business.  Or, maybe, within the perception of Wall Street.  Most operators are public companies responsible to their shareholders.  Most shareholders get their data from financial analyst firms and track the movements of big investors like hedge funds.  What factors drive the Street’s view of networking?

First and foremost, nearly all network operators are public corporations, and their decisions are made so as to maximize their share price.  That’s what companies are supposed to do; if they don’t, they can be the subject of a shareholder lawsuit.  That means they pay a lot of attention to “the Street,” meaning Wall Street and the financial industry, and in particular the hedge funds.

The Street looks at company financials, and at things that could impact the service base—both customers and ARPU (average revenue per user).  They don’t look at technology trends, and they don’t grill CEOs on how much SDN or NFV they’ve deployed.  In fact, these acronyms rarely even show up in operators’ earnings calls or in financial analyses of the operator market.

One largely ignored point here, which I’ll address in detail in the next blog in this series, is that the Street is focused on EBITDA.  If you look at the meaning of the term you see that depreciation of capital assets is not included.  In fact, the Street is much less concerned about operator capex trends than about opex trends and revenues.  On this basis, capital savings accrued through technology shifts to SDN and NFV would be uninteresting to the Street unless they were combined with improvements in opex.

Second, network operator ARPU and margins are decreasing, even in mobile services, and growth is coming at the expense of EBITDA (earnings before interest, taxes, depreciation, and amortization) margins because costs aren’t falling as fast as revenue per user.  While mobile broadband has grown the subscriber base, that growth has come primarily through lower service costs.  As a result, ARPU is falling for mobile and wireline services alike (a sampling of 8 global operators shows all have seen ARPU decline in the last three years) and total revenues are at risk.  Operators are projecting the cost/revenue crossover in 2017, and the business pressure behind all changes to infrastructure, services, or business practices is aimed at turning this margin compression around.

Keep in mind the definition of “EBITDA”.  When you see financial analyst charts of operator profit trends, they always show the EBITDA trend, which you’ll recall excludes capex both in terms of depreciation and new spending.  To turn EBITDA around you either have to boost revenue or reduce opex.  Nothing else will impact the number.
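To make the arithmetic concrete, here’s a minimal sketch in Python, with purely illustrative numbers (no operator’s actual financials), of why an opex saving moves EBITDA while an equal capex saving never shows up in it:

```python
# Illustrative numbers only; not any operator's actual financials.
revenue = 100.0   # total service revenue
opex = 60.0       # operations expense, which is inside EBITDA
capex = 20.0      # capital spending, depreciated below the EBITDA line

ebitda = revenue - opex       # capex/depreciation excluded by definition
print(ebitda)                 # 40.0

# A 10% opex reduction flows straight into EBITDA...
print(revenue - opex * 0.9)   # 46.0

# ...while a 10% capex reduction leaves EBITDA untouched.
capex *= 0.9
print(revenue - opex)         # still 40.0
```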

Next, network operators have an “invisible product”.  Users think of their network services in terms of Internet sites or mobile devices.  The operator provides something whose visibility to the user comes only if it’s not working properly.  It is very difficult to differentiate network services on anything but pricing plans and handset availability for wireless, and wireline services are differentiated more on their TV offerings than on broadband features.

This is a fundamental point for operators’ plans.  Since five out of six operator CEOs think they’ve done about as much as they can to extend their customer base, the lack of feature differentiation means growth has to come from taking market share through pricing or incentives.  And since new services are just “features” in this picture, operators end up looking at profit management primarily through the lens of cost management.

But the top dozen OTT players have a combined market capitalization (total value of stock) greater than that of the top hundred network operators.  The Street would love to see operators somehow get onto a rising-revenue model, and they’d reward any player who managed the transition.

Mobile broadband is exacerbating, not fixing, all of the operators’ OTT issues.  What every financial analyst says about networking today is that the real competition isn’t other providers of network services, but the OTTs.  That implies that the prize is not those featureless connection services but the experiences that users obtain through them.  Mobility has made things worse by decoupling users from the baseline network services operators have relied on historically, like voice.  It’s now, through mobile video, threatening the live TV viewing franchises that are the best profit sources for wireline services.

Mobile infrastructure is expensive, too.  Operators’ wireline franchises were developed in home regions, but credible mobile services have to be national or even (in Europe) continental in scope.  You need roaming agreements and out-of-region infrastructure, and all of this combines to create a lot of expensive competitive overbuild.  Operators are looking for infrastructure-sharing or third-party solutions, and tower-sharing has already developed.

CxOs have more non-technology initiatives to address their issues than ones based on new technologies like SDN or NFV.  Could a major oil company make money in some other market area, or a soft-drink company become a giant in making screwdrivers?  Probably, but it would stretch the brand and the skill set of everyone involved.  The focus of operators for the last decade has been to improve their current operations, not to revolutionize them.  That might frustrate those who would benefit from radical change, but it’s a logical approach.

The important thing is that it hasn’t worked.  If you look at the IBM studies on the industry over that period, you see recurring comments in each one about making the industry more efficient, and yet no significant gains have been achieved.  It’s becoming clear to operators and Wall Street alike that something more radical is needed, and so technology change is at least viable.

But major IT vendors tend to push “knowledge” or big-data initiatives as having more impact than infrastructure change.  In IBM’s most recent analysis, they rate the potential impact of knowledge-driven operational change at a thousand times that of changing infrastructure.  On one hand, you could argue that “big data” has simply relabeled approaches proposed for a decade without success.  On the other, since CxOs are reluctant to leap into the great unknown, even that sophistry would be welcomed.

As long as there are even semi-credible options more practical than re-engineering a trillion dollars’ worth of network installed base, operators are likely to consider them seriously.  That puts the onus on vendors who want a network-centric solution; they have to make their approach look both “safe” and “effective” because it’s competing with something that’s a slight re-make of what’s been done all along and is therefore at least well-understood.

Operators themselves are unable to drive a massive technology change.  In many geographies they could not collaborate among themselves to set specifications or requirements without being accused of anti-trust collusion.  In some areas they are still limited in their ability to build their own products, and in any case it would hardly be efficient if every operator invented a one-off strategy for the next-generation network.  We need some harmonious model, and we need vendors to deliver it.  But the current network vendors aren’t eager to change the game (any more than the operators are), and the financial markets would rather fund social-network startups than infrastructure startups.

There are major tech vendors who are not incumbent network players, and who thus have little to lose in the shift toward a software-driven future.  A few are pure software, some are hybrid IT, and there are even some small players.  While perhaps a half-dozen could field an effective NFV solution, none have so far been able to overcome the enormous educational barrier.

Financial analysts don’t believe in a network transformation.  None of the analyst reports suggest that a major change in network technology is the solution to operators’ EBITDA slide.  Surveys of operator CEOs reveal that they believe they have made good progress in economizing at the infrastructure level, and one financial report says that capex has declined by about 11% over the last five years.

McKinsey did a report four years ago listing over a dozen recommendations on steps operators should take.  None involved infrastructure modernization, and while the study predates SDN and NFV, it shows that technology shifts were not perceived as the path to profit salvation.  EY’s study a year later listed a half-dozen changes operators should make, and none involved transformation of infrastructure.  EY’s 2014 study presumes that the value of technology change would arise from improvements in network failure rates and other operations-based costs, not from lowering the network cost base.

If operators don’t feel direct pressure from the financial industry to transform infrastructure, the corollary is that they’ll have to convince the financial industry of the benefits of transformation.  They can’t do that if they don’t have a clear picture themselves, and that’s the situation today.

Conclusion:  Operators need to address EBITDA contraction to convince Wall Street that their business is trending in the right direction.  On the cost side, that means addressing not capex but opex, which is the cost component that actually shows up in EBITDA.  On the revenue side, it means defining some credible service structure that’s not dependent on selling connectivity.

I have a very good friend in the financial industry, a money manager who understands the Street and who provided me with the Street models and reports I’ve used to prepare this.  It’s worth recounting an email exchange I had with him:

Tom:  “It looks like the Street would reward a reduction of x dollars in opex more than they’d reward the same in capex because the opex savings would go right to EBITDA and the capex one wouldn’t show up there at all.”

Nick:  “Right and if they made a case for a high ROI for capex and could reduce OPEX at the same time it would make the Street all tingly inside.”

What this says is that NFV (and SDN) need to be validated in two steps:  First, you have to improve operations efficiency on a large scale—large enough to impact EBITDA.  Then you have to focus your validation of the capital equipment or infrastructure changes on a high-ROI service.  Would you like to be able to make a business case for NFV that would “make the Street all tingly inside?”  In my next blog I’ll discuss how this might be done.

What Do Salespeople Think of NFV?

If there’s a front line for NFV, that front line is the sales effort.  Since I’ve started to blog about the difficulties associated with making the NFV business case, I’ve gotten a lot of comments from salespeople who are charged with the responsibility of doing that.  I’ve tabulated just shy of 30 of these, and I think it’s interesting to see what they suggest.  Obviously, I’m suppressing anything that would identify salespeople, their companies, or their customers.

Let’s start with the state of the NFV market.  100% of the salespeople thought that the NFV market was “developing more slowly than they had expected”, and the same percentage said that they believed their own companies were “unsatisfied” with the level of NFV sales.  93% said that the pace of development was “unsatisfactory for them personally.”  While this sounds dire, it’s wise to remember that salespeople are often unhappy with the pace of market development.  For SDN, for example, 76% say that market development is too slow, 81% that their company is dissatisfied, and 79% that they’re personally unhappy with the pace.  New technology gets a slow start.

Why is NFV developing too slowly?  Remember that buyers are almost unanimous in saying that the problem is that the business case hasn’t been made.  The sellers see it differently.  The reason given by 79% of the salespeople is that “buyers are reluctant to take a risk with new technology”.  With multiple answers accepted, 43% say that “too many people have to sign off on NFV” and 29% say that they “have difficulties engaging the decision-makers” on the technology.

Who’s the engagement with?  In 89% of cases, sales engagement is reportedly with the office of the CTO and is linked in some way to lab trials.  In the majority of the rest, engagement is with a team reporting to the CEO.  Only about 4% say they’re engaging a specific CxO outside the CTO group.  This meshes with the question of who the culprit is in slowing NFV sales.  In 89% of the cases, salespeople name the CFO, in 7% the CIO, and in 4% the CTO’s organization.

The number of NFV successes is too small to get much from in a survey sense, but what seems interesting is that for nearly all the successes so far, there’s been no formal PoC or lab trial proving NFV out.  Instead, the sales organization has driven a “virtualization” story or an “agility” story and then linked that story to NFV when the basic benefit thesis is accepted.  In these cases, engagement was not with the CTO or standards people, but with operations or the CIO.

How do salespeople feel about the industry view of NFV?  Here we see a very distinct split.  Exactly half think that the industry view is “pessimistic” and that it creates a “headwind that impacts their sales success.”  The other half say that they believe the view is “optimistic”, that it’s generating unrealistic expectations and failing to cover the issues properly, and that this forces the salespeople to engage too much in “education”.  Media coverage is rated “inadequate” by two-thirds, but the company’s own website is considered inadequate by 71% of salespeople and the marketing collateral available to support sales is considered “below par” by 79%.

Will their company stay the course and make NFV a success?  There are a lot of views here.  In all, 75% think their company will stay with NFV and eventually make it a success for them.  But only 29% think that NFV will be a “major revolution in networking” and only 21% think it will be the technology that will dominate their own career.  Those who remember ATM and other network technology waves come down slightly on the side of NFV being another technology that will fall short of its goals (54%).

It sure looks like salespeople are having issues with NFV, and the biggest problem seems to be that selling cycles are incredibly long and there seems to be no end to the issues that come up.  It’s common for a salesperson to say that every time they prove a point, another question arises.  It’s sales whack-a-mole, in other words.  There’s no consensus on why that is, though you could attribute the problem to a number of points the salespeople themselves make, including the constituency difficulties and the business case challenge.

It would be lovely if you could simply ask salespeople what needs to be done, but it wouldn’t be helpful.  A large majority take the position that the buyers just need to suck it up and get to work deploying NFV.  Forget buy-in or a business case.  Obviously that’s not going to happen, and those who have worked with sales organizations for a long time know this is a common reaction.

What do I think this shows?  To me, the sales reaction is a clear symptom of a technology in search of a justification.  When the network operators launched NFV (with the “Call for Action” paper in October 2012) they set a goal, which is to say they stated an intention.  That goal, like all goals, has to be met through a clear identification of the benefits to be reaped and the pathways to doing the reaping.  We’ve not done that with NFV, nor did we really do it with SDN or ATM, for that matter.  If a salesperson knows that the buyer wants three things, they know how to approach and control the sale.  If they think the buyer is simply being driven toward a technology change by the relentless progress of technology itself, they get frustrated with those who don’t want to get moving.

What I think is the major disconnect between salespeople and the buyers is the area of operations.  Salespeople didn’t say much about operations issues.  If you ask them what the benefit of NFV would be to buyers, three-quarters say “capex reduction” even though operators have largely determined that benefit won’t be enough to drive massive NFV acceptance.  Only 11% mentioned opex at all, and none of them said that there had to be a better opex strategy.  Operators think that opex is critical in almost 100% of cases, and more than three-quarters recognize that “service agility” is linked to the service lifecycle and service operations.  The disconnect is likely due to the low rate of engagement with the CIOs.

I think this situation is serious for NFV, but I also think it will change in some decisive way by the end of the year.  Buyers going into their fall planning cycle are already starting to make their points more explicitly to sellers, and that’s percolating back to sales management and into the executive suites of vendors.  I also think everyone is realizing that if big bucks are going to be spent on NFV in 2016 we’ll need to get it in the budgets, and that has to happen late this year.  All that focus will either hone an NFV strategy (likely several, in fact) or it will make it clear that NFV will be a feature of the future but not a driver.

What Would an NFV Future Look Like (and Who Would Win It)?

I’ve noted in the past that it’s proven difficult to make a business case for NFV.  Rather than address that point now, I propose to ask “But what if it can be made?”  Remember that while I’m unimpressed (to say the least) with efforts to paint a plausible picture to justify NFV deployment, I believe firmly that one could be drawn.  In fact, at least three vendors and possibly as many as five could do that, and in a competitive market, if one succeeds others will jump in.  So who gets the big bucks?  Where do those bucks come from, in terms of class of equipment?  The last question is the easiest to answer, and from that answer the rest will likely flow.

No matter what proponents of CPE-hosting of VNFs say, NFV can’t succeed on that basis and in fact can’t generate a boatload of revenue in that space.  Yes, we can expect to see customization of CPE to allow for hosting of features, remote management and updating, and even in some cases “offline” operation.  That’s not where the majority of the money will be, though.  It will help NFV bootstrap itself into the future, but the future of NFV is the cloud.

NFV would be, if fully successful, a source of a bit over 100 thousand data centers, supporting well over a million new servers.  These will not be the traditional hyperscale cloud centers we hear about, though cloud data centers will surely be involved in NFV hosting and NFV principles will extend to influence operations practices in virtually all of them.  What will characterize the NFV-specific data centers is distribution.  Every metro area will have at least one, and probably an average of a dozen.  This is the first distinguishing factor about NFV servers, the key to succeeding in the NFV server space.

The first and most numerous tier of NFV servers has to be placed proximate to the point of user attachment.  That’s a point operators already agree on.  If you try to haul traffic too far to connect virtual functions, you run the risk of creating reliability problems in the connections alone, and you create a need for an expensive web of connectivity.  Many operators expect to see a server for every central office and every cluster of wireless cells (generally, where SGWs might be located), and expect those servers to be connected by very fast fiber trunks so that intra-function communication is easy.  These trunks will, for services with NFV in the data path, become the traffic distribution elements of the future, so they’ll have to be both fast and reliable.  So will the interfaces, and servers will have to be optimized to support a small number of very fast connections.

The NFV servers will be big, meaning that they’ll have a lot of CPUs/cores and a lot of memory.  They’ll be designed for very high availability, and they’ll use operating system software that’s also designed for “carrier grade” operations.  Yes, in theory, you can substitute alternative instances for higher availability, but operators seem skeptical that this could substitute for high-availability servers; they see it as a way to supplement that feature.

Although there’s been a broad assumption that the servers would run VMs, the trend recently has been toward containers, for several reasons.  First, many of the VNFs are per-user deployments and thus would probably not require an enormous amount of resources.  Second, VNFs are deployed under tight control (or they should be) and so tenant isolation isn’t as critical as it might be in a public cloud.  Finally, emerging NFV opportunities in areas like content and IoT are probably going to be based on “transient” applications loaded as needed and where needed.  This dynamism is easier to support with containers.

So who wins?  The player everyone believes is most likely to benefit from NFV is Intel.  Their chips are the foundation for nearly all the deployments, and the model of NFV I’m suggesting here would favor the larger chips over microserver technologies where Intel is less dominant.  Intel’s Wind River Titanium Server is the most credible software framework for NFV.  Intel is a sponsor of IO Visor, which I think will be a big factor in assuring foundation services for NFV.  While I think Intel could still do more to promote the NFV business case, their support of NFV so far is obviously justified.

A tier up from Intel are the server vendors, and these divide into two groups—those who have foundation technology to build an NFV business case and those who have only infrastructure to exploit opportunities that develop elsewhere.  Servers will be, by a long shot, the most-invested-in technology if NFV succeeds, which gives server vendors a seat at the head of the table in controlling deals.  If there are deals to control, that is.  HP is the only server vendor in that first group, and in fact the NFV vendor who is most likely to be capable of making a broad case for NFV with their current product line.

The fact that a server vendor could make the business case means to me that other server vendors’ positions with NFV are more problematic.  If HP were to burst out with an astonishingly good positioning that included a solid operations story, they could win enough deals to look like a sure path to success for operators, in which case everyone else would have to catch up.  Today in NFV it would be difficult to put together a great competing story quickly, so a lot of momentum would be lost.

That defines the next level of winner, the “MANO player”.  If you have an NFV solution that could form a key piece of an operations/legacy element orchestration story, a supplement to plain old OpenStack in other words, then you might get snapped up in an M&A wave by a server vendor who doesn’t have something of their own.  However, the window on this is short.  I think most NFV-driven M&A will be over by the end of 1H16.

VNF players are often seen as major winners, but I don’t think “major” will be the right word.  It is very clear that operators prefer open strategies, which few VNFs support.  I believe that operators also want either pure licensing or a “pay-as-you-grow” arrangement that evolves into a fixed licensing deal.  The VNF guys seem to think they can build a revenue stream with per-user, per-deployment fees.  Given this, I think that there will be a few big VNF winners: firms who figure out how to make “VNF as a service” work to everyone’s advantage and who have magnet capabilities (mobile, content, collaboration) for which there are fewer credible open alternatives.

To me, this situation makes it clear that the most likely “winners” in NFV will be IT giants who have minimal exposure to traditional telco network equipment.  They have so much to gain and so little to lose that their incentive for being a powerful mover will be hard to overcome.  That said, though, every NFV player so far has managed to overcome a lot of incentive and even managed to evade reality.  That means that a player with a powerful magnet concept like Alcatel-Lucent’s vIMS/Rapport or Oracle’s operations-driven NFV could still take the lead.  We’ll have to see how things evolve.

What Does a “Business Case” Involve for SDN or NFV?

One of the recurring points I make in my blogs about SDN and NFV is the need for a business case.  Most are aware that “business case” means a financial justification of an investment or project, but I’ve gotten some questions from some who’d like to understand a bit more about the process of “making” one in detail.

At the high level, a business case starts with the return on investment for the project, which is the net benefit divided by the investment (roughly; you have to include other factors like cost of money).  This ROI is compared with a corporate target set by the CFO, and if it beats the target the project is financially justified.  This whole process is fairly well understood, and it’s been applied by enterprises and service providers for decades.
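As a minimal sketch of that process (the figures and the CFO hurdle rate below are invented for illustration, and I’ve left out discounting for cost of money):

```python
# Hypothetical project figures; a real case would discount for cost of money.
investment = 1_000_000.0        # what the project costs
annual_benefit = 300_000.0      # net benefit: new revenue plus cost savings
years = 5                       # benefit horizon

net_benefit = annual_benefit * years - investment
roi = net_benefit / investment  # rough ROI, undiscounted
print(f"ROI: {roi:.0%}")        # ROI: 50%

cfo_target = 0.40               # assumed corporate hurdle rate set by the CFO
print("financially justified" if roi > cfo_target else "rejected")
```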

What makes business cases challenging for technologies like SDN and NFV is the notion of “net benefit” when it’s applied to either cost savings, revenues, or a combination thereof.  Revenue-based benefits are always challenging because you have to quantify how much revenue you would gain, and also explain why you hadn’t already gained it.  Cost-based benefits are challenging because you have to consider the total impact on costs, which is not only larger than “capex” alone, it’s actually larger than total cost of ownership.

On the revenue side, let me offer an example.  Suppose you have a thousand business customers for a carrier Ethernet service that includes some firewall and other CPE-based features.  You determine that you can accelerate the provisioning of these services by a month.  How much revenue have you earned?  A month’s worth?  Perhaps, but probably much less than that.  The reason is that you gain time-to-revenue only for new deployments, and we didn’t identify how many of those there were.  We also don’t know whether a given customer would actually take the service a month early or would simply delay ordering.  If a new branch office opened on September first, would your enterprise say “Heck, if you can light up my Ethernet a month before anyone even moves in, that’s fine”?

On the cost side, suppose we could replace that Ethernet CPE with a cloud-hosted equivalent.  The box costs two grand today, so do we save that?  Not likely.  First, you need something to terminate the service on premises, so we would simplify the box there but not eliminate it.  Second, we are making a potentially fatal assumption by assuming the only cost is capital cost.  The operations cost associated with cloud-hosted functional elements is surely higher than that of a single box, and even if we don’t know that for sure, we’d have to validate the assumption that it isn’t.  Then we’d have to look at the question of whether we would impact other costs, like customer support calls to inquire about the status of a service.  You need to understand the impact on all costs to determine the benefit, or lack thereof.
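Here’s that trap in sketch form, with all figures invented: a box that looks like a capex win can still lose on total cost once the hosting and operations sides are counted.

```python
# All figures invented for illustration, per customer per year.
legacy_capex = 2000.0 / 5       # the $2,000 box, amortized over five years
legacy_opex = 300.0             # annual operations cost of the box

virtual_capex = 800.0 / 5       # simplified termination device, same amortization
virtual_hosting = 150.0         # annual share of cloud/server cost
virtual_opex = 450.0            # annual opex; the number the CFO will probe

legacy_tco = legacy_capex + legacy_opex
virtual_tco = virtual_capex + virtual_hosting + virtual_opex
print(legacy_tco, virtual_tco)  # 700.0 vs 760.0: a capex "win" that loses on TCO
```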

When CFOs look at SDN or NFV projects, or at other network projects, these are the things they look for.  What has made things difficult for SDN and NFV sales is that the trials and tests that have been run on the technologies have not addressed the range of cost and benefit qualifiers I’ve noted here (which are only the most obvious ones).  A CFO is presented with a comment that they could save 25% in capital cost by replacing multi-feature physical devices on premises with a combination of basic service termination and hosted functionality.  OK, says the CFO, what will it cost to operate this?  In nearly all cases, the CFO won’t get a complete response.

Then the CFO says “What will the impact be on service availability and customer support?  Will I have to credit back for outages because I missed my SLA?  Oh, you can’t tell me what SLA I can write?  And if the customer calls and says the service is out, you can’t really be sure whether it is or isn’t?”  You can imagine how well this sits in a financial review.

The cost of a router or a switch can be higher than the cost of a virtual one in a capital outlay sense, but you can see that even traditional total-cost-of-ownership metrics won’t fully address the “financial cost” challenge, and that’s what you need to determine a business case.  Operators know what it costs to run networks based on legacy technology.  Yes, it’s too much.  But they can’t accept a statement that SDN or NFV will make it better as proof that will happen, and that’s what they’re asked to do if a given SDN or NFV trial doesn’t validate operations impact to the point where opex cost comparisons with the legacy network can be made.

There’s also the question of “risk premium”.  If you give a buyer two choices—one known and the other unknown—to solve a problem, they will pick the known approach even if the unknown one has a slight advantage.  They don’t want to face a risk unless they have an ample justification.  With SDN and NFV, operators are not confident that the business case is easily met, so they can’t presume enormous financial benefits.  Thus, they have to reduce that risk premium, which means they have to answer some of the knotty questions like “How many servers do I need to achieve reasonable economy of scale?” or “What is the MTBF and MTTR of a virtual-function-based service chain?”  They may even have a basic question like “What happens if I start deploying this and it doesn’t work at scale?”

Sellers can address the risk premium problem by demonstration, meaning that they can propose and run tests that show what can be expected of SDN or NFV operations at various levels of deployment.  They could also discount or make other commercial concessions (loan equipment, etc.) to address this premium, but they can never simply propose that it not be imposed.  They can never expect operators to buy without an adequate ROI, either.

In rough terms, operators are saying that SDN or NFV deployments that save less than about 20% in costs are not going to fly, because they can achieve those levels of savings by pushing vendors for product discounts on legacy equipment.  The need for better benefits is one reason why operators (and therefore vendors) have moved from pure capex reductions to capex/opex/revenue benefits for SDN and NFV.  But capex is nice because you can apply it to a box.  When you start to talk opex and revenues, you’re talking about service infrastructure as a whole, and that’s what’s making things complicated.

So you’re an SDN or NFV vendor and you want to “make a business case”.  What do you do?  First, you have to identify the specific impact of your approach on revenues and costs, overall.  That means running at least some credible tests to establish either that operations of the new infrastructure is completely equivalent to the old, or what the differences would mean in cost terms.  Second, you have to point to specific proof points that would validate your tests, and finally you have to work with the operator to devise a trial that would demonstrate behavior at these points.

The big hole in all of this is operations, which is what it was from the first.  Because we don’t have a mature management model for either SDN or NFV, we can’t easily validate even basic infrastructure TCO, much less total cost impact.  Most of the purported SDN or NFV vendors don’t have a management solution at all, and those who do have been constrained so far by the limited scope of trials.  It might be a slog to make a limited trial big enough, and inclusive enough, to make a business case, but it’s nothing more than applying a process that’s been in place for decades, and we don’t have a choice.

Can “Federation” Accelerate NFV Toward a Business Case?

Vice presidents of operations at the network operators, especially the telcos, think that the solution to NFV silo risk is to do a better job of “federation”.  That term is often used to describe the interconnection of elements across administrative or technical boundaries.  If a network isn’t built in a truly homogeneous way, then you connect the domains using some kind of federation.  That’s worked in the past, and the VPs of operations (VPOs) think it could work here, so let’s take a deeper look.

The big question with federation is what you federate.  Federation is a relationship between things, so you have to be able to identify specific things to work with.  In classic networking we’ve tended to link domains, divided by technology or whatever, based on one of three things.  First, OSI layers.  You can link “routing” and “transport” by laying IP on top of optics.  Second, network-network interfaces.  You can use BGP boundaries or even IS-IS boundaries to create federation points.  Finally, APIs.  In the cloud (in OpenStack, for example) you can federate at a point where a “model” is passed to an API for realization.

In most cases, federation points are naturally suggested by technology practices, which may be the defining challenge for NFV.  Like operations, federation was declared out of scope by the ISG early on (in the same meeting, in fact).  There has been some work done by the ISG on federation, but the ISG work tends to focus on “interfaces” and functional blocks and it doesn’t define NNI points explicitly.  That doesn’t mean federation wouldn’t work with NFV, only that we’d have to define where we wanted it and then work out how it could be done.

The beauty of the model approach is that a model is a kind of agile “thing”, meaning that the model approach lets you define a wide range of items that can then be federated.  It’s the only approach to federation that lets you preserve the agility of software-driven implementations of functionality, and so it should be a requirement for any SDN or NFV implementation.

You all probably know my view of how a “service” should be defined—as a structured model that starts with a retail entity and works downward, like an upside-down tree, to the resource tendrils.  This is what I laid out for both CloudNFV and ExperiaSphere.  If you presumed that we had such a structured model for NFV overall, you could declare a useful and fundamental rule of NFV federation: federation should allow any piece of a structured service model to be sourced from within the owning provider or from a partner.  Further, either of these two options should look the same to the higher-level elements that referenced them.  Federation should be an outcome of a resource-abstraction process.

If we start with this approach, it’s easy to take the next step and say that federation is based on “outsourcing” the resolution of an intent model.  An intent model is a black box that’s defined by its interfaces (ports/trunks), its functionality, and its SLA.  A “LINE” intent model has two interfaces, the functionality of bidirectional open connectivity, and an SLA of whatever can be delivered or is needed.  If the LINE connects two points inside the service owner’s network, then it’s resolved on the owner’s infrastructure.  If not, then it’s resolved at least in part by a third party.

The “how” is interesting here.  If we had a partner providing us with a connection to a single customer site that happens to be on their network, we could visualize the LINE as decomposing into two LINEs, one of which is “partner-to-NNI” and the other “NNI-to-my-own”.  One thing this demonstrates is that if you use intent modeling of service elements, you can make the NNI points part of the resolution of an intent model, meaning that you don’t have to define explicit NNIs in the specifications.  That means that as long as we could use intent-modeling principles with ETSI NFV we could make it work without NFV having specific federation specifications and interfaces.
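Here’s a minimal sketch of that decomposition in Python.  Everything in it is hypothetical (the class shape, the NNI name, the assumption that the first endpoint is the possibly off-net one); the point is only that the NNI emerges from resolving the model rather than from a standardized interface:

```python
from dataclasses import dataclass

@dataclass
class IntentModel:
    functionality: str   # e.g., "LINE": bidirectional open connectivity
    interfaces: list     # the ports/trunks the black box exposes
    sla: dict            # whatever can be delivered or is needed

def decompose_line(line, owner_sites):
    """Resolve a LINE in-house if we can; otherwise split it at an NNI."""
    a, b = line.interfaces           # assume a is the possibly off-net endpoint
    if a in owner_sites and b in owner_sites:
        return [line]                # resolved entirely on our own infrastructure
    nni = "NNI-1"                    # federation point chosen during decomposition,
                                     # not fixed anywhere in the specifications
    return [
        IntentModel("LINE", [a, nni], line.sla),   # "partner-to-NNI": outsourced
        IntentModel("LINE", [nni, b], line.sla),   # "NNI-to-my-own": kept in-house
    ]
```

Feed this a LINE whose far end is on a partner’s network and you get the two segments described above; feed it an on-net LINE and it comes back unchanged.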

In order to make something like this work, we’d need to build a bridge between the two implementations, so that when the owning operator decomposed the LINE, it would result in the dispatch of a kind of virtual order into the partner.  We’d also need to be sure that lifecycle management and event handling passed across this bridge.

You can’t have bridges if everyone wades across on the periphery.  The bridge has to isolate the two environments—one is a customer to the other—without disconnecting lifecycle management.  The key requirement to make this work is that the intent model has to send parameters to the realization partner, and receive management state from that partner, within the model.  The model is the in-two-worlds binding that couples lifecycle processes.
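A sketch of that binding, again in hypothetical Python (the partner_api here is an assumed order/management portal, not anything the specifications define):

```python
class FederationBridge:
    """The intent model's 'in-two-worlds' coupling between owner and partner."""
    def __init__(self, partner_api):
        self.partner_api = partner_api   # assumed partner portal; a black box to us

    def realize(self, model, parameters):
        # Parameters flow to the partner through the model, as a virtual order...
        return self.partner_api.place_order(model.functionality,
                                            model.interfaces,
                                            model.sla,
                                            parameters)

    def management_state(self, order_id):
        # ...and management state flows back through the same binding, so
        # lifecycle events cross the bridge without exposing either side's internals.
        return self.partner_api.get_state(order_id)
```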

The interesting thing about this approach is that it would work between operators or within one.  This means that federation processes would support partnership, but also (as VPOs suggest) would be a path to eliminating silos.  To make it work, though, you would need to have an intent-model structure for services in all of the implementations.  Since the ETSI ISG is looking hard at intent models, that means that it might be possible to address the silo and federation issues at the same time and with something already in the works.

Intent modeling of this sort could make it unnecessary to extend federation beyond service modeling, because resources are typically going to be federated as collections of behaviors within an intent model.  However, the NFV specifications call for another level of abstraction: the VIM abstraction of infrastructure (NFVI).  You can model infrastructure as an intent model too, and you can use the same bridging concepts noted above to allow any implementation of a resource to assert the same functional/intent behaviors.  The same goes for management variables.  This would mean that an operator could publish a portal into their NFVI to another operator, as long as there was a VIM to represent the NFVI in both worlds and a model that provided the bridge.

How many business-case issues could federation fix?  That’s hard to say, but I think that if we had a structured service and a VIM/NFVI implementation based on intent modeling we could harmonize differences in NFV implementations to prevent service silos.  That would allow operators to develop business cases that were more service-specific and still be assured we could harmonize them down the line.

The weak spot, as always, lies in the operations model.  In theory you could manage a federated service element in its own way, but unlike NFVI whose economies of scale can be preserved by federation, there’s no assurance that separated operations processes would be compatible, much less efficient.  What this means is that federation is a step toward an NFV business case, but by itself it doesn’t guarantee you’ll arrive.

What Carrier VPs of Operations and CMOs Think About NFV

Many of you have noticed that my blog on the “missing NFV constituencies” of CIO and CFO was still missing a constituency or two.  The ones I didn’t cover are the head of operations and the CMO, and the reason is that I was still amassing the comments I’d gotten from these people.  As you’ll see, both of these new constituencies have a reason to stay on the fence longer.  You’ll also see that both have some questions of their own, some issues held in common with other groups, and some unique ideas.

Operations people are responsible for the management of the infrastructure itself.  Some have expressed surprise that this activity is separate from “operations” in the OSS/BSS sense, and in truth it isn’t fully separate.  From the days of TDM, operations personnel have used some operations systems.  However, packet networks have created a world where “resource management” and “service management” can be pulled apart, and many operators have done that.  The resource or infrastructure side, the domain of the VP of Operations (VPO, hereafter), is often treated as a “resource” just like “craft” personnel.

The VPO has been largely disconnected from the early NFV work in most operators.  Only about 15% of lab trials ever result in anything actually getting deployed, so there’s no reason for the VPO and staff to be too excited about concepts until they’re released for field trials.  At the beginning of this year, when it was obvious that the next step for NFV would have to be a field trial, VPO staff started getting involved in NFV plans.  In some operators who had started top-down programs of transformation rather than technology-oriented bottom-up trials, VPOs were involved early on.

The big issue VPOs have expressed is a new one, but still one that by implication cuts across some of the other issues of the CIO and CFO: whether NFV can deploy at scale.  We know how to build large networks, global networks.  We know how the operations processes and costs scale with network scope.  We know how to divide networks into functional or administrative zones (metro, for example) and we know how to connect services across zonal boundaries.

The VPOs, reviewing the results of PoCs, see in most cases a collection of service-specific trials that cut across a range of infrastructure—both in technology/vendor terms and in administrative terms.  The problem this poses for them is that it doesn’t give them a handle on how generalized infrastructure would look.  OK, we know vCPE or vMI (“mobile infrastructure” in case you forgot that acronym from my earlier blog).  What does vEverything look like?  VPOs say that if we can’t answer what an at-scale NFV deployment would look like and how it would be deployed and how we’d connect services across it, they can’t certify that it would work at all much less deliver any benefits.

What has the VPO particularly concerned is that the “hottest” NFV application seems to them not to be an NFV application at all.  Virtual CPE created by hosting functional atoms in an agile edge device is not the model of NFV as it was first considered, nor is it the model that the ETSI ISG has worked on.  Many of the issues of generalized infrastructure are clearly moot if you forego generalizing by hosting customer-specific functions in customer-specific devices.  Even management is simpler because there are no multi-tenant NFVI resource pools, only slots in CPE into which you fit something.

The VPOs are among the group of operator executives who fear “the death of a thousand successes”, meaning a bunch of service-specific evolutions that don’t add up to anything systematic, to infrastructure with any economy of scale, or to any common and efficient operations practices and tools.  They love the notion of low-apple starts (they’d be scared to death of anything else) but they don’t see the tree yet, and so they distrust the notion of “low-ness”.

CMOs have also gotten more engaged as NFV has evolved toward greater dependence on service agility.  Their biggest fear is that NFV is defining a “new service” as a “new way of doing an old service”.  Most of the CMOs believe that the network operators’ current problem is the OTTs, who are successfully reaping new service revenues while generating unprofitable traffic for the operators.  They believe that new service revenues will have to come from services that are really new.

There are challenges to making this particular notion of newness workable, though.  Operators are not used to selling in the traditional sense; everyone needs basic communications services and so it’s rarely necessary to make a case for Internet access or WAN services to an enterprise.  You may have to compete for the win but you’ll not have to establish the need.  For the truly new services, CMOs acknowledge that the operators’ big problem isn’t creating the services but creating the idea of the services.  They don’t visualize unrealized demand easily, so they don’t know how to generate it.

It’s interesting to note that while CFOs and CIOs didn’t make spontaneous suggestions on how their own issues could be resolved, both the VPOs and CMOs did.  These not only help refine just what the challenges of NFV are, they may actually point toward elements of a resolution.

VPOs say that they build networks from diverse vendors, technologies, and so forth all the time.  They have service-specific elements too.  They think that trying to build a unified infrastructure where everything is based on a single common standard is unrealistic, because it flies in the face of experiences they’ve had at the device level.  Instead they suggest that the key is to recognize that there will be “domains” and to focus on making sure that 1) the domains interconnect at the service level, and 2) they consume the expensive infrastructure (servers, in this case) efficiently.  To the VPOs, the biggest void in NFV technology is the lack of formal federation.

The CMOs say that the solution to truly new services is to consider NFV to be seamlessly linked with the cloud.  Applications, hosting, content, and everything else that we say is an OTT service is on its way to being a cloud service, if it’s not already there.  The CMO says that a “new” service combines several cloud components with traditional communications.  It’s less important whether communications features are migrated to virtual functions (to the cloud, in other words) than that new service features from the cloud are migrated into communications services.

I agree with the views of both these groups.  I also understand that while VPOs and CMOs might be providing real insight into how we could fix the NFV of today to fully realize its benefit potential, they’re also asking the industry to reverse itself.  My view is that the VPO concept of federation and the CMO concept of cloud integration might combine to create a lifeline for NFV.  A good federation approach could help unify PoC silos.  A cloud integration approach could frame new services based on IoT or simply allow operators to participate more effectively in current OTT services.  Together these could address, perhaps, enough issues to let operators invest in actual field trials, and give vendors time to address some of the technical issues.

This seems to me to argue that the twin pillars of NFV change I presented in an earlier blog—Intent Modeling and IoT execution—could be augmented by federation and cloud integration, and the result would be a path to NFV success.

Can the Union of SDN and NFV Make HP Into a Strategic Player?

Open architectures for IT have a profound impact on the sales process.  In the old days, when a vendor sold a complete proprietary IT ecosystem, you pitched your benefits holistically and when you won you won it all.  When things shifted to an open framework, with COTS servers and Linux operating systems, the “sale” was for a component not an ecosystem, and when you won you had to get back out there the next day and win again elsewhere.

There have been a lot of changes driven by this evolution to open IT, but one stands out even today.  No sales organization can both drive “component-of-infrastructure” sales on one hand and validate infrastructure-wide differentiation on the other.  I’ve surveyed enterprises on how they buy things since 1982, and the surveys show this dilemma clearly.  In the early surveys, IBM had a strategic influence on buyers so great that they could almost totally control the direction of IT evolution.  Today, no vendor has even half the influence IBM had.

IBM has fallen far, fallen because they’ve focused on sales and neglected marketing.  Even Cisco, who now leads the pack in terms of strategic influence on enterprise buyers, has been unable to make an ecosystemic case for its wares, and so has been forced to fight for every deal.  The whole industry has gone tactical because nobody has learned strategy.

HP just reported its quarterly results, and they’re an example of the pressure of openness.  At one time HP was number two in influence, and it has now fallen to a near-tie with Oracle for slots four/five.  The company is on the verge of a historic separation of its personal and enterprise businesses, a move that could help it address the issues it’s been facing—some because of that historic openness point, and some of its own making.  HP’s portfolio in NFV is part of that shift, and the success of its path forward with NFV may depend on how well HP addresses that whole openness thing there.

Meg Whitman’s quote du jour is “…we’re seeing the benefits of the work we’ve done over the past several years to strengthen the product strategy and go-to-market execution for the Enterprise Group” but that’s a bit of an overstatement.  HP’s enterprise revenues were narrowly higher, which is good, and its servers and network products sold well.  This says that HP has core strength that can help it exert influence, because generally buyers listen more to the vendors who make up the largest piece of their budget.

Whitman touted NFV in the earnings call, something that’s rare in today’s market climate, but I think it reflects an important truth.  NFV is two things for HP.  First, it’s the on-ramp to an enormous increase in total addressable market (TAM) because it could be the largest single driver for data center and server deployment growth over the next five years.  Second, it’s a perfect place for HP to try out an ecosystemic marketing/positioning initiative.  On the call, Whitman talked about the NEC partnership in NFV and the ConteXtream acquisition, and I think that’s fitting because these two things are critical in both these NFV mileposts.

The NEC deal’s touted benefit is “to create more open environments for customers”.  Openness isn’t exactly a benefit—it’s an objection management point for the sales process.  The thing is, we’re really not selling NFV right now, we’re testing it.  There is a place where openness has great influence, though, and that is as a companion to a compelling business case that you’re pretty much the only vendor who can make.  HP is one of only three (at the most) vendors who can actually make a broad NFV business case, and in my view they’re on top of that list.

So this raises that issue of marketing versus sales in an open world.  To win in such a world, to exploit technology leadership, you have to accelerate the decision.  Every day that passes with NFV in the hands of lab dabblers is a day when HP isn’t earning revenue from it and a day when other vendors can whittle away at their lead.  Strategic account control, the thing IBM used to be able to exercise, gave a vendor a way of driving a decision first, and driving it in a favorable direction second.  You can’t win at a game that doesn’t get started no matter how good your stats are.

And there’s more.  There is no question that NFV will, over time, become a commodity space.  That’s been the goal of the operators driving NFV from the very first.  The question is what will happen leading up to that point, and I submit that it is the same question that’s facing IT in general.  When you move to an open world, you move to a world where summing the parts into the glorious whole is somewhat in the hands of buyers.  The contradiction is that there is no commodity technology on earth that’s not dependent on underqualified buyers—you can’t get enough of anything else to commoditize something.  This kind of market cannot be sales-driven because salespeople can’t be there to influence every step.  We need to socialize a vision, even for NFV, that will first carry it to its benefits and second set our own role in the process.

What separates Cisco, who has gained influence over the last five years, and IBM and HP who have lost it, is a natural mission focus.  Everyone identifies Cisco with the union of networking and IT, and Cisco’s products and concepts are fairly well known.  IBM and HP have more generalized missions so nobody can say exactly what it is that would put their label on a project.  IBM and HP also lack concept brands and issue ownership; they aren’t linked in a strategic sense with the key concepts that drive their markets.

ConteXtream may be the litmus test here, not because it’s SDN and a foot in another revolutionary door, but because SDN and NFV are pieces of the same puzzle.  HP is a leader in preparing the NFV specifications for SDN integration, but while their technology leadership and strong product capabilities could make this combination a killer one, it’s not really given much emphasis in their positioning.  The critical website graphic on NFV has SDN represented as a label on a signpost, and the five specific areas that HP says NFV has to address don’t include SDN.  They also don’t include operations integration or even a mention of specific NFV benefits.

What you’re left with here is a sense that HP lags Cisco in strategic leadership because Cisco has a natural area of focus and HP has to create one.  In the cloud, SDN, and NFV areas, HP has objectively stronger assets than Cisco does.  HP has less aging turf to defend and more experience at competing effectively in a commoditizing market.  Where Cisco has the advantage is that the network is a natural ecosystem and Cisco knows how to compete within one.  SDN and NFV could create natural ecosystems for HP, and their union could be the thing that turns HP’s earnings calls from tactical and pedestrian to strategic and compelling.

What Do the CIOs and CFOs Think about NFV?

NFV has a lot of constituencies to appease within each operator to get to deployment, and so far engagement has largely been with the CTO organizations.  I’ve noted in past blogs that the operators’ CFOs are concerned about the NFV business case and CIOs are concerned about operations.  I thought it might be interesting to review the aspects of NFV technology that are of most concern to the CFOs and CIOs.  These might be the directing factors in moving from lab to field trials and deployment because they might be issues that will have to be addressed to get broader buy-in for NFV.

The number one CFO concern is NFVI to VNF/MANO compatibility.  The largest investment an operator will make in NFV is the NFVI, and CFOs are concerned that the “best” or “most efficient” NFVI might not be workable with all the NFV implementations.  Most say they are not clear on the relationship between the NFVI and the Virtual Infrastructure Manager that’s supposed to link it to the rest of the NFV software.  Is there a “standard” VIM, or is there a VIM for each NFVI, depending on the vendor of the hardware and the software?  Can you have multiple VIMs, and if not how would you integrate servers from—for example—Cisco and HP?
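The specifications don’t settle this, but it’s possible to sketch what a multi-VIM answer might look like.  In the hypothetical Python below, MANO holds a registry of VIMs keyed by NFVI domain, so Cisco and HP resource pools could coexist behind one deployment interface; none of these names come from the ETSI documents:

```python
class VimRegistry:
    """One possible multi-VIM shape: a VIM driver per NFVI domain."""
    def __init__(self):
        self.vims = {}                  # NFVI domain name -> VIM driver

    def register(self, domain, vim):
        self.vims[domain] = vim         # e.g., "cisco-pod", "hp-pod"

    def deploy(self, vnf, domain):
        vim = self.vims.get(domain)
        if vim is None:
            raise KeyError(f"no VIM registered for NFVI domain '{domain}'")
        return vim.instantiate(vnf)     # each VIM hides its vendor's specifics
```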

The number one CIO issue is the viability/scalability of the NFV management and operations model.  A service made up of virtual functions could have a dozen VMs, as many or more internal tunnels between them, and linkages to existing infrastructure for customer delivery and transport connectivity between data centers.  How this will be managed is completely unclear to the CIOs.  The specifications suggest that VNFs present themselves to management systems as “virtual devices” managed as the real devices would be, but that doesn’t address all the management of resources that realize those virtual devices in functional terms.  With no clear picture of what could be called “NFVI operations” they can’t validate the opex, and thus can’t sign off on a TCO for NFV.

CFOs’ second-most-significant concern is the VNF cost model itself.  The presumption in NFV from the first was that inexpensive software implementations of functionality hosted on commodity servers would be cheaper than actual appliances.  What CFOs say is that the providers of today’s devices want to convert them to VNFs and price the licensing such that they have the same profit as before.  CFOs are particularly concerned about the “pay as you grow” licensing model, which would increase their fees to VNF providers as customer volume grows, rather than setting a fixed license charge.  The as-a-service model seems to CFOs to penalize them for success.

The number two CIO concern is the integration of operations/management processes for legacy infrastructure with VNF lifecycle management.  Nobody in the CFO or CIO organizations thinks that future services will be completely VNF-based, and in the early stages it’s likely that most services will have significant legacy device participation.  Can you improve service agility or operations efficiency when you’re not able to manage the service from end to end?  They don’t think so, and having a different management model for legacy than for NFV makes it hard to even know what management costs would be for a given service—you couldn’t tell what would end up legacy versus VNF.

CFO issue number three is actually also CIO issue number three, but obviously the reasoning is a bit different.  The issue is portability of VNFs.  CFOs believe that many of the major vendors will develop VNFs that have explicit ties to their own implementation of MANO, VNFM, and NFVI.  This makes sense from the vendor perspective; they can use their key VNFs to pull through their implementation of NFV.  The problem for the CFO is that they lose pricing power and they risk replication of assets—silos—because they need specific VNFs from different vendors and end up with separate NFV ecosystems because of that.

CIOs’ concern here is in management.  They point out that there’s no specific mechanism for federation of NFV assets, nor really any satisfactory model of how multiple NFV implementations could even connect within a given operator.  That could silo management visibility, creating a potential disaster for service management efficiency.

Both CFOs and CIOs point out that non-portable VNFs would mean that if a given NFV provider went out of business, failed to keep up with NFV evolution, or simply dropped their NFV product, the operator might have to put together a whole new ecosystem just to continue to sell their current services to their current customers.

The final problem for CFOs is the lack of a convincing point of purchase.  What every buyer wants is a collection of credible sellers.  Although there are credible sellers for NFV, it’s not clear whether any one of them is sufficient, and it’s pretty clear that there’s little basis for combining them to form a multi-vendor ecosystem.  Nobody wants a flavor-of-the-month NFV solution, and that seems a real risk now because even the media—ever hungry to name winners in any contest—seems unable to name one for NFV.

For CIOs the final issue is that it’s not clear whether we need an enhancement to current OSS/BSS, a next-gen operations model, or maybe no model at all.  Service automation implies lifecycle automation, which could represent a major shift in the way operations software works.  The TMF reflected such a shift in their GB942 and NGOSS Contract stuff, but this hasn’t been widely implemented.  None of the CIOs I talked with had done so, which is too bad because it might resolve some of the debate on where operations software should be taken.  I was at a Tier One operator meeting where two people sitting next to each other had totally incompatible views of what was needed in operations—retaining the old or tossing it in favor of the Great Unknown.  That’s reflective of the confusion overall, and that’s a problem because of the obvious key role that OSS/BSS plays in service agility and operations efficiency.

So there we have it.  You can see that there are two issues here.  First, the “new” players within the operator organizations have yet to be fully convinced that NFV is the right answer (though they really want to believe it is, because they’re not sure what else is on the horizon).  Second, those new players don’t have the same issues on their bucket lists.

As I’ve said before, there’s no reason why we can’t address these points; even today I think we could meet enough requirements with some of the existing NFV implementations to build the necessary momentum.  We do need to meet them, though, and we need to raise all the issues and address them if we want NFV to develop competitively and to its full potential.

How Will the Major Vendors Fare in This Fall’s Operator Planning?

I blogged earlier this week about the “fall planning cycle” for network operators, and the issues and forces associated with that cycle this year.  An obvious follow-on question is how vendors will be impacted by the cycle.  Will some be hurt by events, others helped, and is there still time to move yourself from the “hurt” to “help” group?  Time is short here, so whatever happens will have to be focused as much on positioning as on product.  I can’t review every vendor in a blog like this, but let’s look at some major “revolutionary” vendors and see where they are and what they might do.

Alcatel-Lucent is one of the functional market leaders in NFV and is the runaway winner in SDN.  As they have been since the merger that created them, though, Alcatel-Lucent is near dead last in terms of positioning effectiveness.  Always a geeky player, they’ve relied on technical engagement to advance their goals, but the problem with the current SDN/NFV revolutionary period is that there are a lot of new players to sell to.  Even where Alcatel-Lucent has the strength to promote a holistic strategy, they inexplicably separate their wonderfully unified stuff into functional silos.  OSS/BSS, NFV, IMS, Rapport…all of these should be bricks on the pathway to the new networking age.  They’re not.

The biggest challenge for Alcatel-Lucent is positioning Nuage, their SDN strategy, given the dominance of traditional routing within the company.  You have to protect your sales, of course, but you can’t protect them for long if you ignore evolution and hunker down on the present.  Earth to Alcatel-Lucent: you can’t virtualize most of your infrastructure by 2020 (as AT&T says it will) by staying with Big Iron.  Alcatel-Lucent is at risk of losing their SDN lead while they dally on whether SDN matters enough to promote it.

Cisco is the poster child for dallying in the eyes of most.  They always seem to be trying to cap any new development, largely because they are.  Why foster change when you’re winning the current game?  In the case of SDN, Cisco definitely plays a “cap” game; they’ve built a software veneer on top of the usual infrastructure to tap off a lot of the early motivation to change to “real” SDN.  The problem for them is that they’re defending against their own success.  Cisco’s best chance to be the next IBM is to ride the wave of the “network-facilitated cloud”, which uses SDN for tenant networking and NFV for deployment and operation of features.  If Alcatel-Lucent were stronger in Nuage positioning they’d have put Cisco’s SDN strategy to bed already.  There’s still time for them to do that.

While Alcatel-Lucent could clean Cisco’s SDN clock, HP is the biggest potential disruptor of the networking industry.  HP has, via M&A, a decent SDN position, a superb NFV story, and what might be the best IoT strategy of anyone.  They have all the products needed to build the virtual world of the future, and most importantly they have the hardware framework that will earn the most revenue, so they have the best financial incentive to stay engaged.  Their problem is the ISG’s Proof-of-Concept activities.  HP got seduced into believing that if you won at PoCs you won in deployment.  That would be true if the PoCs were aligned with convincing business cases, but they aren’t.

The future of NFV and SDN is the future of networking, either proactively or reactively.  HP needs to build its own ecosystemic story, crossing over the boundaries between its product areas and its technologies, and most importantly crossing over all those PoCs.  We are building one network here, gang, not a bunch of PoC silos.  Whatever your vision is for that one network, it must be communicated clearly and (most important) quickly.

Huawei is way beyond the 900-pound gorilla phase of evolution in networking.  They are the price leader, and likely will be forever.  That gives them both assured success even if no real network evolution happens, and a solid shot at framing the future if they want to.  That’s because low prices can ease the risk burden that buyers of revolutionary stuff always have to face.  Huawei knows all of this, I think.  They have quietly managed to put together a lot of strong elements in NFV and SDN, not only the glamorous high-level stuff but also some of the base technology stuff.

Huawei has two problems, and they’re related.  First is their lack of marketing/positioning skill.  While they’ve been getting better, Huawei isn’t a marketing-driven player, and you have to be that to foster a revolution or to take your place in one.  The second problem is their political impasse with US carrier sales.  Not only are the US operators giant spenders, they’re also often on the leading edge of technology changes.  Further, they are close to the tech media both geographically and culturally.  If you’re not winning hearts and minds in the US, the US media doesn’t take you as seriously.  Huawei can never fix their political problems, but they could fix their positioning.

Oracle doesn’t need to learn much about positioning, in my view.  Their technology credentials in NFV are limited and their credentials in SDN even more so, but they were smart enough to see something that all the SDN and NFV leaders failed to see—and still largely fail to see.  You cannot win at either SDN or NFV without an operations story so complete and compelling that it shines like sunrise in the darkness.  They’ve been making (can you believe it!) OSS/BSS announcements and relating them to NFV and SDN!  From the PR of most of the NFV players, you’d think there was no such thing as an OSS/BSS.

Service agility and operations efficiency depend on operations systems.  Oracle has grasped that, but they are still weak in terms of how their operations vision actually combines with either SDN or NFV.  You can’t sell SDN or NFV without operations, but you’re not going to upset the network applecart by starting to revamp operations and hoping it will trickle down.  That’s why this whole SDN/NFV thing is complicated; it’s inherently multifaceted, both in technology and constituency.

Oracle is the only player in the revolutionary networking space that actually needs new product functionality.  They should be looking out there for somebody with strong SDN and NFV credentials to buy—somebody with good technology but not too much market cap.  Cisco, I suspect, has the technology it needs but is still focused on retention of the old model—“fast following”.  Alcatel-Lucent is torn between a revolutionary cadre and a bunch of stick-in-the-muds, and HP is chasing too many different rabbits with too many different hounds.  Huawei may be the player doing the most right, but they win primarily if everyone else messes up.

Which, so far, they are.  Every one of these vendors needs to make a major SDN/NFV/operations policy announcement by early October at the latest.  If anyone does that well, they gain an upper hand in budget planning for 2016.  If only one does it well, they may have won the SDN/NFV future.

What We May Have Here is a Quiet Revolution

If you look at the combined state of networking and IT, the most interesting thing is how hard it’s getting to find the boundary point between them.  We’ve been linking the two since the online applications of the ‘60s.  Now componentization of software, virtualization of resources, and mobility have combined to build agile applications that move in time and space and rely on the network to be almost an API between processes that spring up like flowers.

While software and this network/IT boundary are symbiotic and so co-evolving, you could argue that our notions of platforms have been less responsive to the changes.  In most operating systems, including Linux, we have a secure and efficient “kernel” and a kind of add-on application environment.  Since the OS is responsible for network and I/O connections, we’ve limited the scope and agility of virtualization by treating I/O as either “fixed” in the kernel or agile only as an extension of the applications—middleware.  Now all of that may be changing, and it could create a revolution within some of our other revolutions—especially SDN and NFV.

Some time ago, PLUMgrid developed what was essentially a vision for an I/O and network hypervisor, an “IO Visor” as they called it.  This product was designed to create a virtual I/O layer that higher-level software and middleware could then exploit to facilitate efficient use of virtualization and to simplify development in accommodating virtual resources.  What they’ve now done, working with the Linux Foundation, is to make IO Visor into an architecture for Linux kernel extension.  There’s an IO Visor Project and the Platinum members are (besides PLUMgrid) Cisco, Huawei, and Intel.

The IO Visor project is built on what’s called the “Berkeley Packet Filter”, a Linux facility designed to do packet classification for monitoring.  BPF fits between the traditional network socket and the network connection, and it was extended in 2013 to allow an in-kernel module to handle any sort of I/O.  You can link the extended BPF (eBPF) at multiple layers in the I/O stack, making it a very effective tool in creating or virtualizing services.  It works for vanilla Linux, but probably most people will value it for its ability to enhance virtualization, where it applies to both hypervisor (VM) and container environments.
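To make the “fits between the socket and the network connection” point concrete, here’s a minimal sketch of the classic BPF mechanism the project builds on: a hand-written filter, attached to a raw socket, that classifies packets inside the kernel and passes only IPv4 UDP frames up to user space.  This is the long-standing Linux socket-filter API, not the IO Visor plugin interface itself.

```c
/* Classic BPF: an in-kernel packet classifier on a raw socket.
 * Requires CAP_NET_RAW (run as root).  This illustrates the base
 * mechanism IO Visor builds on, not IO Visor's own API. */
#include <stdio.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/if_ether.h>
#include <linux/filter.h>

int main(void) {
    struct sock_filter code[] = {
        BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, 12),            /* A = ethertype   */
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETH_P_IP, 0, 3),  /* IPv4? else drop */
        BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, 23),            /* A = IP protocol */
        BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 1),
        BPF_STMT(BPF_RET + BPF_K, 0xFFFF),                    /* accept packet   */
        BPF_STMT(BPF_RET + BPF_K, 0),                         /* drop packet     */
    };
    struct sock_fprog prog = { .len = 6, .filter = code };

    int s = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (s < 0) { perror("socket"); return 1; }

    /* The filter runs in the kernel; rejected traffic never crosses
     * into user space at all. */
    if (setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
        perror("setsockopt");
        return 1;
    }
    /* recvfrom(s, ...) would now see only IPv4 UDP frames. */
    return 0;
}
```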

The technical foundation for IO Visor is an “engine” that provides generalized services to a set of plugins.  The engine and plugins fit into the Kernel in one sense, and “below” it, just above the hardware, in another.  Unlike normal Kernel functions that require rebuilding the OS and reloading everything to change a function, these IO Visor plugins can be loaded and unloaded dynamically.  Applications written for IO Visor have to obey special rules (as all plugins do) but it’s not rocket science to build there.
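As an illustration of that dynamism (using raw eBPF directly, rather than IO Visor’s own plugin packaging, which I won’t presume to reproduce here), the sketch below pushes a trivial two-instruction program into a running kernel through the bpf() syscall.  It assumes a Linux 3.18-or-later kernel and does nothing useful, but it takes the same verify-and-load path a real plugin would, with no rebuild and no reboot.

```c
/* Minimal eBPF load: install a two-instruction program into a running
 * kernel via the bpf(2) syscall.  Illustrative sketch; real plugins
 * would do useful work and attach to a hook. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void) {
    struct bpf_insn insns[] = {
        /* r0 = 0xFFFF (a socket filter's return is "bytes to accept") */
        { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0xFFFF },
        /* return r0 */
        { .code = BPF_JMP | BPF_EXIT },
    };

    union bpf_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insn_cnt  = 2;
    attr.insns     = (uint64_t)(unsigned long)insns;
    attr.license   = (uint64_t)(unsigned long)"GPL";

    /* The kernel verifies the program for safety before accepting it. */
    int fd = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    if (fd < 0) { perror("bpf(BPF_PROG_LOAD)"); return 1; }

    printf("eBPF program loaded into the kernel, fd=%d\n", fd);
    /* It could now be attached to a socket with SO_ATTACH_BPF; dropping
     * the last reference unloads it--plugins come and go at runtime. */
    close(fd);
    return 0;
}
```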

What IO Visor creates is a kind of “underware” model, something that has some of the properties of middleware, some of user applications, and some of the OS (Kernel) itself.  You can put things into “underware” and create or modify services at the higher layer.  The monitoring example that was the basis for BPF in the first place has been implemented as an IO Visor case study, for example.

What’s profound about IO Visor is the fact that it can be used to create an “underservice” that’s got components distributed through the whole range of Linux OS deployments for something like SDN or NFV.  An obvious example is a virtual switch or router “service” distributed among all of the various hosts and a functional part of the Kernel.  You could create a security service as well, in various ways, and there’s an example of that on the IO Visor link I referenced above.

Some of the advantages of this approach in a general sense—performance, security, and agility—are easily seen from the basic architecture.  If you dig a bit you can find other benefits, and it’s in these that the impact on SDN and NFV is most likely to emerge.

Signaling and management in both SDN and NFV are absolutely critical, and by applying IO Visor and plugins to a signaling/management service you could create a virtual out-of-band connection service, accessible under very specific (secure, auditable, governable) terms by higher-layer functions.  This could go a long way toward securing the critical internal exchanges of both technologies, the compromise of which could create a complete security/governance disaster.

Another feature is the distribution of basic functions like DNS, DHCP, and load balancing.  You could make these services part of the secure kernel and give applications a basic port through which they could be accessed; a port like that of my hypothetical signaling/management network above would be limited in functionality and thus virtually impossible to hack.

If you’re going to do packet monitoring in a virtual world, you need virtual probes, and there’s already an example of how to enlist IO Visor to create this sort of thing as a per-OS service, distributed to all nodes where you deploy virtual functions or virtual switch/routers.  Management/monitoring as a service can be a reality with this model.

NFV in particular could benefit from this approach, but here’s where “impact” could mean more than just reaping more benefits.  You can load IO Visor plugins dynamically, which means that you could load them into a kernel as needed.  That could mean that NFV deployment orchestration and management would need to contend with “underware” conditioning as well as simply loading VNFs, and it would certainly mean that you’d want to consider the inventory of IO Visor features that a given set of VNFs might want, and decide which you’d elect to bind persistently into the kernel and which you’d make dynamic.
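A toy sketch of that decision might look like the following.  The plugin names and the policy threshold are invented for illustration; the point is simply that the orchestrator inventories which IO Visor features its VNF catalog demands, binds the popular ones persistently, and leaves the rest to dynamic loading.

```c
/* Hypothetical orchestration sketch: decide which IO Visor plugins to
 * bind persistently into the kernel versus load on demand.  Feature
 * names and the threshold are invented, not from any real product. */
#include <stdio.h>
#include <stddef.h>

typedef struct {
    const char *feature;   /* an IO Visor plugin a VNF might require */
    int demand_count;      /* how many VNFs in the catalog want it   */
} plugin_need;

int main(void) {
    plugin_need needs[] = {
        { "virtual-probe", 9 },
        { "dns-service",   7 },
        { "tenant-crypto", 1 },
    };
    const int persist_threshold = 5;  /* invented policy knob */

    for (size_t i = 0; i < sizeof(needs) / sizeof(needs[0]); i++) {
        if (needs[i].demand_count >= persist_threshold)
            printf("%s: bind persistently into the kernel\n", needs[i].feature);
        else
            printf("%s: load dynamically at VNF deployment\n", needs[i].feature);
    }
    return 0;
}
```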

This raises another obvious point.  One of the big benefits of the IO Visor approach is to support the creation of distributable kernel-based services.  If that’s what you’re aiming for, you can’t just have people doing random IO Visor plugins and hoping they come together.  You need to frame the service first, then implement it via plugins.  I’ve blogged about the notion in the past, and it’s part of my ExperiaSphere model—I call it “infrastructure services”.  Since you don’t need to deploy something that’s part of the kernel (once you’ve put it there), you need to conceptualize how you use a “resident” element like that as part of a virtual function implementation.

This probably sounds pretty geeky, and it is.  The membership in the project is much more limited than that of the ONF or the ETSI NFV ISG.  There are three members who should make everyone sit up, though.  Intel obviously has a lot of interest in making servers into universal fountains of functionality, and they’re in this.  Cisco, ever the hedger of bets in the network/IT evolution, is also a member of the IO Visor Project.  But the name that should have SDN and NFV vendors quaking is Huawei.  While they’re not a big SDN/NFV name in a PR sense, they’ve been working hard to make themselves into a functional leader, not just a price leader.

And IO Visor might just be the way to do that.  I think IO Visor is disruptive, revolutionary.  I think it brings literally unparalleled agility to the Linux kernel, taking classic OSs forward into a dynamic age.  It opens entirely new models for distributed network services, for NFV, for SDN, for control and management plane interactions.  It could even become a framework for making Linux into the first OS that’s truly virtualized, the optimum platform for cloud computing and NFV.  You probably won’t see much about this in the media, and what you see probably won’t do it justice.  Do yourself a favor, especially if you’re on the leading edge of SDN, NFV, or the cloud.  Look into this.