The Credibility of “New Revenue” to Drive SDN and NFV

If you’ve tracked both SDN and NFV carefully, as I have, you’ve probably noticed that the value propositions for both have shifted or evolved over time.  Service revenue increases are great, but you have to be able to justify them with some hard opportunity numbers.  Where are the brass rings with new SDN and NFV services?

One important point to make is that “new services” have to be broken down into the connection services and cloud services that I’ve talked about in prior blogs.  The reason this is important is that network operators have a natural place in the connection services market, with infrastructure, skills, brand, and so forth.  They’re trying to wrestle a place in the cloud or hosted feature space, but they are not there yet.  That’s why operators tend to think of “new” services in terms of legacy stuff that’s tweaked somehow to be presented differently.

We’ve heard about turbo buttons and bandwidth on demand for decades now, and it’s totally true that you could do them with SDN and NFV.  You could also have done them without either technology, and still could.  The concept of elastic bandwidth has been difficult to promote in the real world, both because buyers don’t see a big value and sellers see a big risk.

Most companies size their VPNs and VLANs and physical-layer trunks based on their typical traffic needs and over-engineer a bit to accommodate peaks.  I surveyed users about this practice for years and they were comfortable with the approach.  Yes, about two-thirds of buyers said they thought that having some elasticity to better accommodate bursty traffic would be nice, but what was interesting is that they had essentially zero willingness to pay for it.  In fact, nearly all the users who wanted elastic bandwidth wanted to reduce current spending by downsizing capacity on the average and boosting it only during peaks.  That’s why operators finally realized this was a less-than-zero-sum game for them.
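
Just to make that arithmetic concrete, here’s a purely hypothetical sketch (the prices, capacities, and burst figures are invented for illustration, not taken from my surveys):

```python
# Hypothetical illustration of why elastic bandwidth is a less-than-zero-sum game
# for the seller.  All prices and utilization figures are invented for the example.

FIXED_MBPS = 100        # capacity the buyer provisions today (over-engineered for peaks)
PRICE_PER_MBPS = 10.0   # assumed monthly price per Mbps

# Today: flat, over-provisioned capacity.
fixed_revenue = FIXED_MBPS * PRICE_PER_MBPS

# Elastic offer: the buyer right-sizes the baseline and bursts only at peaks.
BASELINE_MBPS = 60      # sized to average load instead of peak
BURST_MBPS = 100        # peak capacity bought on demand
BURST_FRACTION = 0.05   # fraction of the month actually spent at peak
BURST_PREMIUM = 1.5     # premium multiplier charged for on-demand capacity

elastic_revenue = (BASELINE_MBPS * PRICE_PER_MBPS
                   + (BURST_MBPS - BASELINE_MBPS) * PRICE_PER_MBPS
                   * BURST_PREMIUM * BURST_FRACTION)

print(f"Fixed-capacity revenue:   ${fixed_revenue:,.2f}/month")
print(f"Elastic-capacity revenue: ${elastic_revenue:,.2f}/month")
print(f"Change for the operator:  {100 * (elastic_revenue / fixed_revenue - 1):+.1f}%")
```

Even with a premium on the burst capacity, the downsized baseline dominates, and the operator ends up with less revenue than before.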

One solution to this problem being proposed today is what might be called “extranets”, meaning network relationships among companies instead of within them.  Traffic levels here are more variable, and few companies support extranet applications with fixed network services.  The catch is that few companies report significant extranet traffic at all, and most of those who do “extranetting” today say that secure Internet VPNs are the best solution.

OK, what this tells me is that there is little credibility that extensions to connection services based on SDN or NFV could really add much in the way of revenue.  You might be able to frame offerings differently if you had already completed an SDN/NFV infrastructure transition, but the benefits would fall short of justifying the change.

Revenue from connection-related features (the vCPE model) is also difficult to justify on a large scale (though I think a business case for vCPE can be made in other ways).  The problem is that credible revenue opportunities from vCPE are really limited to current or prospective Carrier-Ethernet sites, meaning satellite sites of multi-site businesses.  These sites can’t be sold one-off; you have to sell the HQ locations.  There, the most credible connection-related VNF-hostable features—security—have long been considered and addressed via CPE.  Yes, SMBs don’t fit this model and may be extremely interested in managed services, but with some notable vertical-market exceptions it has proved difficult to sell to these communities because the average revenue per user is small compared to the cost of sales and marketing.

Before we throw in the towel on new service revenues as an NFV justification, though, we need to examine what could change all this.  It won’t change in a heartbeat, to be sure, and not without some extensive marketing by vendors, but change it could.  The best general name we could give the path to new-service success is the as-a-service-extension model.

We have static networking today because we visualize the network as being separate from the application, which means it serves the aggregate of applications.  This tends to level capacity and connectivity needs and also limits the extent to which a given application can have network services tuned to its needs.  We build application-independent, permissive-connectivity networks, then pay incrementally to add application awareness and access control.

The cloud teaches us (or should be teaching us) a lesson here, which is that network services can be specific to an application.  If you look at the virtual networking model of giants like Amazon and Google, it’s based on virtual-networking techniques that could easily create a whole vertical stack of virtual networks for a company, linked to workers and partners at any point where you find a suitable human (or machine).

The easiest way to promote this model is with as-a-service in-cloud applications, and so SaaS trends would be a way to make this all happen.  In fact, every major network vendor who has an SDN strategy—Alcatel-Lucent, Cisco, Juniper—has the ability to do this now.  Since selling traditional boxes and security/application-awareness add-ons is a great business, though, we don’t hear much about this approach.

You could also support this from the inside, from the application side.  A virtual router product that can be hosted on a generic server or edge device could build this kind of model without the support of the big vendors.  Brocade could do this, for example, and you could create an open-source project to enhance any open switch/router to support virtual networking in this form.  Once you have it, you could start dialing down private virtual networks in today’s site-networking form.

Well, maybe.  One big barrier to this is regulatory uncertainty with respect to net neutrality interpretations.  If we tried to build something like this today, from the application out to the user, we’d almost surely have to adopt an Internet overlay connection model, and that would be easier if we could have SLAs on services.  SLAs are at least a close neighbor to fast lanes, and most operators are reluctant to jump into this space lest the services they create end up violating regulatory and public policy goals.

IoT might be another answer to the problem.  There is no rational model of IoT other than a big-data-and-analytics model that lives in the cloud, sucks in sensor data from any convenient source over any worthwhile connection technology, and then makes everything available under a policy-metered query umbrella.  An operator or even a big vendor could establish a model like this, and since the networking in the model is largely inside the IoT cloud or represents a simple access extension, you could do whatever you wanted with SLAs and network architecture without impacting the current services or running afoul of regulators.

Cloud services are another dodge.  Inside a SaaS or perhaps even PaaS envelope, you could create pathways that were similar to those within a mobile network or CDN, both exempt from neutrality regulations.  Now all sorts of intra-cloud high-value services could be presented and procured.

The point here is that it’s probable that new service revenues won’t present a widespread benefit for either SDN or NFV justification unless you can do something to increase their scope of impact, which my two examples would do.  The question would then be whether that broader scope was seen by both operators and vendors as too much change, too much risk.

We can justify SDN and NFV for some operators with new service revenues today, but only on a limited scale.  For the big justification, for the benefits that can build out enough infrastructure to make it easy to add on services and features as the market demands, we’ll need something else.  Opex is all that’s left, so we’re back to the same point—you’ll need exceptional opex automation to make either SDN or NFV work.

Virtual CPE for NFV: Models, Missions, and Potential

There is no doubt that virtual CPE is the most populist of the NFV applications.  There are more vendors who support it than any other NFV application, and more users who could potentially be touched by it.  vCPE is also arguably the most successful NFV application, measured by the number of operators who have actually adopted the model and offered services.  So how far can it go, and how would it get there?

All the popular vCPE applications are based on connection-point services, meaning services offered to users ancillary to their network connections.  Things like firewalls, NAT, IPsec, and similar higher-layer services are the common examples.  These services have the advantage of being almost universally in demand, meaning that in theory you could sell a vCPE service to anyone who used networking.  Today, they’re provided either through an access device like an Internet router, or in the form of discrete devices hooked to the access device.

While all vCPE applications involve hosting these connection-point functions rather than ossifying them in an appliance, they don’t all propose the same approach.  Two specific service models have emerged for vCPE.  One, which I’ll call the edge-hosted model, proposes to host the vCPE-enabling virtual network functions on devices close to or at the customer edge.  This group includes both vCPE vendors (Overture, RAD, etc.) and router vendors who offer boards for VNF hosting in their edge routers.  The other, the cloud-hosted model, would host VNFs in a shared and perhaps distributed resource pool some distance from the user connections.  That model is supported by server or server platform vendors and by NFV vendors with full solution stacks that include operations support and legacy device support.
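
One way to visualize the split is as a single field in an otherwise identical service description; the sketch below is my own illustration, with invented class and field names rather than anything from the ETSI descriptors:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative vCPE service description: the same chain of connection-point
# functions, with only the hosting model/target changing between the two
# approaches.  Class and field names are invented for this sketch and do not
# correspond to any ETSI NFV descriptor format.

@dataclass
class VnfInstance:
    function: str        # e.g. "firewall", "NAT", "IPsec"
    image: str           # software package realizing the function

@dataclass
class VcpeService:
    customer_site: str
    hosting_model: str   # "edge-hosted" or "cloud-hosted"
    hosting_target: str  # a CPE device ID, or a shared resource-pool zone
    service_chain: List[VnfInstance] = field(default_factory=list)

chain = [VnfInstance("firewall", "fw-vnf:1.2"), VnfInstance("NAT", "nat-vnf:2.0")]

edge_service = VcpeService("branch-042", "edge-hosted", "cpe-branch-042", chain)
cloud_service = VcpeService("branch-042", "cloud-hosted", "metro-pool-east-1", chain)
```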

The edge-hosted model of vCPE can generate some compelling arguments in its favor.  Most notably, something almost always has to be on the customer premises to terminate a service.  For consumer networking, for example, the user is nearly certain to need a WiFi hub, and SMBs often need the same.  Even larger user sites will normally have to terminate the carrier service on something to provide a clean point of management hand-off.  Given that some box has to be there, why not make the box agile enough to support a variety of connection-point services by hosting them locally?  This approach seeks to magnify service agility and eliminate truck rolls to change or add premises appliances when user feature needs change.

For many CFOs, the next-most-compelling benefit of edge-hosted vCPE is the synchronized scaling of revenue and cost.  If you sell a customer an edge-hosted strategy, you send the customer an edge box.  You incur cost, but you have immediate revenue to offset it.  Cloud-hosting the same vCPE would mandate building a resource pool, which means that you’re fronting considerable capex before you earn a single dollar (operators call this first cost).  The more broadly the service is marketed to prospects, the more useful this first-cost control becomes, because the size of the initial resource pool is determined by the geographic breadth of the prospect base; you have to spread hosting points to be at least somewhat close to the users, or network cost and complexity will drive your business case into the dust.

The next plus for edge-hosting is that management is simplified considerably.  Here you have a customer, and customer-service people, who are used to managing real devices that perform the connection-point functions.  If an edge-hosted vCPE strategy is used, then the platform software in the edge host can make the collection of functions look like a real device, and that’s simple to do.  There are no distributed, invisible, complicated network/hosting structures whose state must somehow be related to the functional state of a user’s connection-point service.  There’s no shared hosting to consider in SLAs.  All the stuff needed is in the same box, dedicated to the customer.

The final point in favor of edge-hosted vCPE, just now being raised, is that it considerably simplifies the process of deploying virtual functions.  Where does the customer’s VNF go?  On their edge device.  No complex policies, no weighing of hosting cost versus network connection costs.  There are twenty-page scholarly papers on how to decide where to put a VNF in a distributed resource pool.  What would implementing such a decision cost, and how would it impact the economies of shared resources?  Why not punt the question and put everything on the customer edge?
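
For contrast, here’s what even a deliberately trivialized version of the cloud-side placement decision looks like; the cost figures and site names are hypothetical stand-ins for the factors those scholarly papers actually weigh:

```python
# A deliberately trivialized VNF placement decision: pick the hosting point that
# minimizes hosting cost plus the network cost of hauling traffic to it.
# Site names and monthly cost figures are hypothetical.

candidate_sites = [
    # (site, hosting cost per month, traffic-haul cost per month)
    ("customer-edge-box",     40.0,  0.0),   # dedicated box, no haul
    ("metro-data-center",     12.0, 18.0),   # shared pool nearby
    ("regional-data-center",   8.0, 35.0),   # cheapest hosting, longest haul
]

def place_vnf(sites):
    """Return the candidate with the lowest total (hosting + haul) cost."""
    return min(sites, key=lambda s: s[1] + s[2])

site, hosting, haul = place_vnf(candidate_sites)
print(f"Chosen hosting point: {site} at ${hosting + haul:.2f}/month")
```

Even this toy version needs cost data nobody has collected yet, which is exactly why punting to the customer edge is so tempting.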

The obvious argument against edge-hosted vCPE is the loss of shared-resource economies.  If we presume that network operators follow through on plans to create a large number of NFV data centers to host functions, the cost of hosting in these data centers would be lower than hosting on the customer premises.  In addition, the service could be made more resilient through resource substitution, which is difficult if your resource is hanging on the end of an access line.

According to operators, though, the big problem with edge hosting isn’t that it’s more expensive because you don’t share resources among users.  The big problem is that it’s totally non-differentiated.  You don’t even need NFV to do edge-hosted vCPE, because you do little or nothing of the orchestration and optimization that the ETSI ISG is focused on.  Any credible vendor would be able to offer edge-hosted vCPE if they could partner with the VNF players, who, as we know, will partner with nearly anyone who’s vertical and not on life support.  Instant, infinite competition?  Who wants that?

This point leads to a second problem, almost as profound.  It’s hard to see how basic edge-hosted vCPE leads anywhere.  If network functions virtualization has utility in a general sense, then it would make sense to pull through the critical elements of NFV early on, with your first successful applications.  How do you do that when your application doesn’t even need NFV?  And given that lack of pull-through, how do you ever get real NFV going?

Some of the smarter edge-hosted vCPE vendors recognize these issues and have addressed them, which can be done in two ways.  First, you could build real NFV “behind” your vCPE approach so that you could interchangeably host in the cloud.  That requires actually doing a complete NFV implementation; it’s what Overture does, and RAD announced an expansion of its own deployment and management elements just today.  Second, you could partner with somebody who offers complete NFV, which many of the edge-hosted vCPE players do.  Anything other than these approaches will leave at least some of the edge-hosting problems on the table.

A hybrid approach of edge- and cloud-hosted vCPE is by far the best strategy, but that model hasn’t gotten the play I’d have expected.  The reason is sales traction.  Overture has a very credible NFV solution but they’ve been a bit shy in promoting themselves as a complete NFV source, and even though they have Tier One traction they’re not seen as being on the same level as an equipment giant.  The partnership players seem to be stalled in the question of how the sale is driven and in what direction.  Many of the larger players who can make the overall NFV business case see edge-hosted vCPE as a useless or even dangerous excursion because they don’t make the edge gear themselves.

Overall, vCPE might be the only way that competing vendors can counter players like Alcatel-Lucent who have an enormous VNF opportunity with mobile/content that they can ride into a large deployment.  Edge-hosting vCPE would let operators get started without a massive resource pool, and with proper orchestration and management elements it could at least be partially backed up or augmented with cloud hosting, even replaced as resource density in the cloud rises.  But it still depends on having some credible NFV end-game, and it’s still hard to deduce what even the best vendors think that is.

The Paths to an SDN/NFV Business Case

The issues I’ve raised on the difficulties operators experience making a business case for SDN and NFV are increasingly echoed by other sources, so I hope that at this point most objective people believe there is a problem here.  One question, then, is how business-case difficulties might impact SDN and NFV deployment and the future of both technologies.  And, of course, how it might impact vendors.  Another is what vendors could or are doing about the business case.

The best place to start this off might be with what could be expected to play out if we had a number of convincing players promoting a viable business case for SDN and NFV.  In such a situation, the immediate impact would be to create a validated long-term vision of what NGN would look like, both in terms of infrastructure and operations.  This vision would serve as a template to validate the discrete steps taken in the direction of SDN and NFV, which would make incremental (per-project, per-service) evolution practical.  If we knew what the end state looked like and what benefits it would generate, we could then take the steps that moved us furthest at the lowest risk, and both technologies would begin to fly.

What would happen if that general business case is not made?  Without it, individual projects would have to prove themselves out in a vacuum, without any broad benefits from infrastructure change as a whole.  On one hand that might seem to be a good thing; many vendors would like to promote their own narrow vision.  The problem is that both my modeling work and input I’ve received from operators shows that most of these narrow projects would not be able to deliver the benefits needed.  Worse, there’s no guarantee that these individual projects would even add up to a harmonious vision of infrastructure or operations, and we could end up with a bunch of silos—the thing we’ve been trying to get rid of for decades.

The big problems are operations and economy of scale.  NFV is a cloud technology that demands enough deployment to achieve hosting economies.  SDN is a transformation of cost that demands enough scope of deployment to make a consequential contribution to reduction in capex.  Both technologies have a very strong and not very often acknowledged dependence on operations automation to prevent the additional complexity from creating an opex explosion that would swamp all other benefits.  How do you accommodate operations of hybrid SDN/NFV/legacy infrastructure, a problem that’s ecosystemic by definition, when you can only drive benefits on a narrow service front?

The most popular NFV application, based on operator activity, is virtual CPE.  The most popular SDN application is data center networking.  vCPE in the form where a general device is placed on-prem and used to host VNFs is actually fairly easy to justify for some operators like MSPs, but it’s not easy to build the business case on a broad scale.  Host vCPE with service chaining in the cloud and it actually gets harder to justify.  And data-center SDN isn’t transformative to carriers.  It’s not even clear if it is to vendors.  Getting SDN out where it can build a complete service architecture would be transformative, but operators admit that nobody is really pushing that vision.

So what do you do?  You’ve got to broaden the benefit front, and the most obvious of our broadening options is the classic “killer app” approach.  You win in SDN or NFV if you can find a single application of the technologies that delivers enough scale and enough benefit to justify a pretty significant transformation, one that builds critical mass in both areas.  This then creates the “gravitational attraction” to pull in other services/projects to leverage what’s in place, and from that we end up with systemic SDN or NFV.

This killer-app approach is obviously most credible for SDN/NFV providers who have a credible candidate of this sort, and the market position to drive it.  The most obvious example is Alcatel-Lucent, who has one of the few credible holistic SDN/NFV positions and also has a commanding position in mobile infrastructure (IMS/EPC) and content delivery (CDN, IPTV).  No matter how many other VNFs and opportunities may be out there, few if any can hope to pull through enough SDN/NFV to establish that critical mass.  One mobile or content infrastructure win could establish enough scale in both infrastructure and operations to build a viable SDN/NFV platform on which other stuff can build.

What do you do if you don’t have pride of place in some critical service area?  You could (to quote the poet Milton) “stand and wait.”  You could forget specific services and think horizontally, or you could look for a new/emerging critical area to be champion of.

If some vendor eventually gets a critical mass in SDN/NFV, it’s unlikely they’ll be able to command the whole market (particularly in NFV).  Thus, you join everyone’s partner program to hedge your bets and hunker down until somebody makes a success of SDN/NFV large enough to establish convincing momentum, then you sell your heart out to that vendor’s customers.  For most NFV and SDN hopefuls, this is the only realistic choice because they lack the product feature scope to deliver a business case on any scale.  It’s hard to see how this approach would create a big winner—too many hungry mouths are waiting to be fed at the table of SDN/NFV success.

Obviously a subservient role like that isn’t appealing to a lot of vendors, and for those with strong product breadth (those who can make a business case with their own stuff), another option is to jump over the killer app and go for the critical horizontal element that will underpin all business cases.  That’s operations efficiency and service automation.  If you can show a decisive impact on opex, you can generate so big a benefit case that you can justify virtually any SDN/NFV service or project.  In fact, you could justify your service automation even without SDN and NFV.  Oracle seems committed to this path, and it would likely be the route of choice for OSS/BSS vendors like Amdocs.

This is a path any major vendor with a complete NFV strategy could take, but it would be harder for a smaller player.  Operators themselves are split on whether modernization of management and operations should preserve current tools and practices or have the specific goal of eliminating them in favor of a different model.  They’d probably trust a big player with a compelling story to do either of these things (literally either; they’d like to see a kind of elastic approach that can either reorganize things or replace them), but only a big one.

The final option is perhaps the most interesting.  Suppose the “killer app” is an app that’s got no real incumbent?  Suppose there’s something that could have such profound impact that, if it were implemented using SDN/NFV principles, it would create that critical mass of support for whoever does the implementing?  There is only one thing that fits this, and it’s IoT.

IoT, as the mass-media hype says, could change everything, but that’s about as much as the mass media has right about it.  It won’t develop as many believe, meaning it won’t grow out of a bunch of new, directly-on-the-Internet, mobile/cellular-connected devices.  IoT is an abstraction—three in fact, as I suggested in a blog last week.  The center of it is a big-data and analytics amalgamation that will be cloud hosted as a vast array of cooperative processes.  From it there will be a series of cloud-, NFV-, and SDN-enabling changes that will transform mobility, content delivery, and networking overall.  It’s not that “things” will be the center of the universe but that thing-driven changes will envelop the rest.

The good news for SDN/NFV vendors is that IoT in itself could justify both SDN and NFV and drive both into rampant adoption.  The bad news is that if SDN and NFV have a problem today it’s that they’re too big to confront effectively and IoT dwarfs them both in that sense.  Vendors who are presented with giant opportunities tend to see giant costs and interminable sales cycles, and somebody will inevitably blink and think small.

It’s hard to see how IoT could become a credible driver for SDN/NFV as things stand, because the link to either could be created only when a credible IoT architecture could be drawn and the role of the two technologies identified.  If we diddle for years on the former, SDN/NFV’s fate and leadership will be settled before IoT even arrives.  Thus, it’s hardly surprising that no vendor has stepped forward with a compelling story here.

Here’s where we are, then.  We have limited candidates to pin SDN/NFV business hopes on because there is only a limited number of full-spectrum NFV solutions out there.  Only a vendor who can implement everything needed for SDN/NFV can make the business case.  Long-time functional leaders Alcatel-Lucent, HP, Huawei (a maybe because I don’t have full detail on their approach), Oracle, and Overture Networks have now in my view been joined by Ciena after its Cyan acquisition—if they follow through on their commitments.  So optimistically we have six total-solution players.

We have two (Alcatel-Lucent and Oracle) committed to something that could lead to a business case—the killer app of mobile/content in the former case and operations revolution in the latter.  Neither, though, has delivered an organized proof of their ability.  That may be because they’re holding their cards close for competitive reasons, because the business case is too operator-specific in its details to make a general document useful, or because they can’t do it yet.  The same issues could be holding back those players who have no visible commitment to an approach.  Maybe they’re holding back, or maybe they’re just hoping no such proof will be required.

For all the NFV hopefuls, time is short.  Operators need to respond to what they say is a 2017 crossover between revenue- and cost-per-bit.  To do that they’ll need to get something into trial by mid-2016, and the trial will have to prove a business case with an impact large enough to influence their bottom line.  I think SDN and NFV, properly presented, could do just that, but as operators have told me many times “We have to hear that from vendors who can prove the business case!”  They’re as anxious to know who that could be as I am.

Is Verizon’s ThingSpace Really IoT or Is It “IONT?”

Verizon’s entry into the IoT space, via its “ThingSpace” API offering, has been covered by Light Reading and also released directly by Verizon.  The company is also opening its analytics engine for IoT use and making some network and pricing changes to accommodate IoT.  Verizon’s launch presentation on IoT properly characterizes IoT as a “Tower of Babel” and they promise to take the babel and complexity out of IoT.  It’s a bit early to say if this will happen, but let’s look at the approach and see what we can learn.

First and foremost, Verizon is like other operators (including rival AT&T) in seeing IoT through LTE-colored glasses.  On one hand that makes sense because Verizon is a network operator who sells services to connect things.  IoT as a connected set of devices is a market.  On the other hand, as I’ve pointed out, this directly-connected-devices vision cuts IoT adrift from the massive number of sensors and controllers already out there, connected with local and low-cost technology.

There’s no question that the IoT has enormous potential, but there is a lot of question about what specifically is feasible out of that vast pool of “potential”.  One thing I think hurts Verizon and most others in IoT is the notion Verizon calls a “customer journey roadmap”, intended to guide people on the journey to IoT.

Why?  Because that shouldn’t be the journey at all.  In their presentation, Verizon is falling into the same trap we typically fall into with new technology development.  They make the means into the goal.  IoT is a mechanism that solves some business problems.  We need to focus first on the business problems, or opportunities, that we might apply IoT to, and from that focus derive their needs and cost tolerance.

ThingSpace, Verizon’s stated on-ramp into IoT, is built around the notion of new devices, directly connected.  The APIs published at this point relate to connection management for these directly connected devices, but Verizon is promising to expand these APIs to the thousands, and we’d hope that some of these new APIs step beyond this specific LTE-and-connected-devices mission.  Maybe they will, but Verizon’s presentation spends a lot of time talking about the IoT being built around the WAN, and they explicitly deprecate those local device attachment strategies that have already deployed literally billions of sensors and controllers.

There are, in my view, three elements to a complete IoT solution.  One is the device-connectivity portion that gets sensors and controllers into accessible form, connected to “IoT” as an abstraction.  Another is the set of applications that will draw on data and exercise control, and the final piece is that core abstraction that represents IoT to the applications.

A software developer, looking at IoT, would say that you should never build based on the connectivity abstraction, for the simple reason that connectivity isn’t what you’re trying to find out about.  The fact is that an IoT application should never care about how a sensor or controller is connected.  It has to care about whether it can access the device, how that access is done, and how it can exploit it.

You can draw this.  Take a blank sheet of paper and draw a big oval in the middle, labeling it “IoT Core Abstraction”.  Now add a smaller oval overlapping at the bottom and call it “Sensor/Controller Management”.  Complete the picture by drawing an oval overlapping at the top and labeling it “Applications”.

My view is that if we have industrial sensors and controllers in the hundreds of millions, it would make sense to prioritize drawing these into the IoT Core Abstraction.  Most residential control and all industrial control networks have gateways to make them accessible externally, including via the Internet.  If you presume that all sensors populate a database with information, via their Sensor/Controller Management elements, then you can add directly connected devices through their own (new) Sensor/Controller Management links.  The result is a kind of populated abstraction, which is how IoT should really be visualized.
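
Here’s a minimal sketch of that populated abstraction, assuming (as I’ve argued) that gateways and directly connected devices both publish into a common repository and that applications see only the repository and its query policies; the class and method names are illustrative, not from any product:

```python
from collections import defaultdict

# Minimal sketch of an "IoT Core Abstraction": sensor/controller management
# elements publish readings into a shared repository, and applications query it
# under policy control.  Class and method names are illustrative, not from any
# product or standard.

class IoTCore:
    def __init__(self):
        self._readings = defaultdict(list)   # sensor_id -> readings
        self._policies = {}                  # app_id -> set of sensor_ids it may query

    # Called by Sensor/Controller Management elements: home-control gateways,
    # industrial gateways, or directly connected (e.g., LTE) devices alike.
    def publish(self, sensor_id, value):
        self._readings[sensor_id].append(value)

    def grant(self, app_id, sensor_ids):
        self._policies[app_id] = set(sensor_ids)

    # Called by Applications: access is metered by policy, never by knowing how
    # the sensor happens to be connected.
    def query(self, app_id, sensor_id):
        if sensor_id not in self._policies.get(app_id, set()):
            raise PermissionError(f"{app_id} is not entitled to {sensor_id}")
        return self._readings[sensor_id]

core = IoTCore()
core.publish("thermostat-17", 21.5)    # arrives via a home-control gateway
core.publish("lte-meter-9", 430.0)     # arrives from a directly connected device
core.grant("energy-analytics", ["thermostat-17", "lte-meter-9"])
print(core.query("energy-analytics", "thermostat-17"))
```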

Verizon might be seeing this in the long run, because they’re also talking about their analytics engine, which in their material they relate to “big data”.  That in my view relates it to a repository, and that then gives a face to the abstraction that forms the core of IoT.  It’s a big-data repository.  It’s populated with data from external elements, which include that directly connected stuff Verizon’s focusing on.  In short, this analytics engine might be the thing Verizon needs, and if Verizon positions it correctly then third-party developers could use APIs to introduce the information from those deprecated sensor architectures Verizon mentions, and jumpstart the utility of the whole concept.

So Verizon might be doing the right thing?  Yes, but I’m concerned by the fact that they’ve made such an explicit commitment to the new-device, LTE-connection model.  If you look at technologies like SDN and NFV, you’ll see that we’ve lost a lot of time and utility by having vendors pick approaches that were good for them but essentially shortsighted or even stupid by the standards of the opportunity overall.

Why would Verizon, who like most operators is very interested in home security and control, not want to link their ThingSpace to current in-home security/control architectures?  Most homes that want such capability already have stuff in place, and they’re unlikely to trash their investment (which in many cases is in the thousands of dollars) to jump into an “IoT” model of the same thing.

Verizon’s launch presentation included a comment that they had to find the “IoT Easy Button”.  Well, expecting every sensor and controller to have an independent WAN-based IoT Internet connection isn’t easy.  What this approach would do is place Verizon at a cost disadvantage versus every current technology sensors use.  These are in-building and incrementally free, and in many cases use low-power technologies that will live a year on a nine-volt battery.  The average upscale suburban connected home has twenty sensors and controllers.  Does Verizon propose to sell a hundred-dollar device for each of these, and then collect even reduced wireless fees per month?  I don’t think the average suburban user will accept that.

As it was presented, ThingSpace isn’t IoT, it’s the “Internet of New Things.”  Yes, Verizon might be thinking of shifting their focus more broadly, but if that’s the case why trash the technology options that are now in place and would eventually have to be accommodated to create acceptable early costs for buyers?  If they come around after a year or two, having exposed both their strategy and the fatal cost limitation to its adoption, will all the companies who now sell home and industrial control have sat on their hands and done nothing?  Not likely, nor is it likely that OTT companies won’t have seen the value of being a truly universal IoT Core Abstraction.

Maybe this is an important point even beyond IoT.  Here we are with revolutionary SDN and NFV, and operators are fixated on using them to do the same tired crap we can already do with legacy elements.  How many of our technology opportunities are fouled by a greedy fixation on the current business model?  Operators, with IoT, have an opportunity to look at selling something above their usual connectivity bit-pushing.  This same kind of revolutionary opportunity was presented with the Internet, and operators booted it to the OTTs.  Will they do that with IoT now?  I think Verizon is perilously close to committing to that very thing.

Are Federation Requirements Converging SDN and NFV?

Federation is an important topic in both SDN and NFV for a number of reasons, but it’s not clear that the standards processes for either SDN or NFV are really moving to define a reasonable strategy.  The issues of federation will eventually have to be faced—there are some indications that vendors are jumping out to address them even pre-standard—so it’s a good idea to look at the issues and also to try to outline an effective solution.  Particularly since the best approach unites SDN and NFV in a way that’s not been discussed much.

“Federation” is a term that’s been applied loosely to service relationships among operators that allow for services to be created and deployed across administrative and technical boundaries.  Where there’s a standard network-to-network interface that serves to connect services seamlessly, it’s not much of an issue (the Internet, based on IP, has BGP for this).  Where there isn’t, you need some structure to determine how to create the link.

All federations expose operators who sell services and enterprises/consumers who buy them to a set of common issues.  First, you have to be able to connect the service across the boundaries of the federation.  That includes not only passing packets but also harmonizing any SLAs offered and dealing with any required financial settlement.  Second, you have to be able to manage the service end-to-end and address problems/remediation wherever they occur, without compromising the independence of the federated partner domains.

Most federation today is based on the presumption of a series of interconnected service networks that offer comparable capabilities based on comparable technologies, and whose technical standards include one for network-to-network interconnect.  The question that both SDN and NFV raise is how the concept would work when networks were virtual, meaning that they didn’t have a fixed capability set or service-layer standards.

The easiest thing to do is assume that you’re going to use virtual technology to duplicate current service-layer protocols like Ethernet or IP.  If this is the case, then you can federate your offerings at the first, or connection, level.  To make this approach work at all, you need to build virtual networks that exactly mimic L2/L3 service networks, including the NNIs.  If you presume that SLAs and financial issues are related to the “service layer” then you can assume that duplicating service-level capabilities with virtual infrastructure would address all of the connection issues.

Not so the management, and this is the fact that opens the rest of the discussion on federation.

If you presume service-layer federation, then you are effectively drawing a service as a set of connected domain-specific sub-services of the same type.  The constraints on protocols supported and features offered can be used to justify a position that these sub-services are intent models and so is the retail offering.  In past blogs I’ve asserted that a network service abstraction expressed as an intent model has a functionality, endpoints, and an SLA.  That would be true of our sub-services, and if we say that a sub-service implemented using SDN or NFV meets the functional requirements and SLA and serves the endpoints, then it realizes the intent model as well as legacy devices could.
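
To make that concrete, here’s a rough sketch of a sub-service expressed as an intent model: functionality, endpoints, and an SLA, with the realization hidden inside.  The field names are mine, not from any standard, and the SLA composition shown is deliberately simplistic:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Sketch of a sub-service as an intent model: what it does, where it attaches,
# and what it promises, with the realization opaque to the buyer.  Field names
# are illustrative, not drawn from any standard.

@dataclass
class SLA:
    availability: float
    latency_ms: float
    throughput_mbps: float

@dataclass
class IntentSubService:
    functionality: str                  # e.g. "IP Domain"
    endpoints: List[str]                # UNI/NNI identifiers
    sla: SLA
    realization: Dict[str, str] = field(default_factory=dict)   # hidden detail

# A retail service federated from two domains of the same functional type; one
# happens to be legacy, the other SDN/NFV, and the intent view doesn't care.
domain_a = IntentSubService("IP Domain", ["UNI-1", "NNI-AB"], SLA(0.9999, 20, 100),
                            {"technology": "legacy-routers"})
domain_b = IntentSubService("IP Domain", ["NNI-AB", "UNI-2"], SLA(0.9995, 25, 100),
                            {"technology": "SDN/NFV"})

# A naive end-to-end SLA composition across the federation boundary.
retail_sla = SLA(
    availability=domain_a.sla.availability * domain_b.sla.availability,
    latency_ms=domain_a.sla.latency_ms + domain_b.sla.latency_ms,
    throughput_mbps=min(domain_a.sla.throughput_mbps, domain_b.sla.throughput_mbps),
)
```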

The first issue this raises is one of out-of-the-legacy-box service functionality.  Where we have accepted boundary interfaces and features and SLAs, we can map SDN/NFV to service models inherited from the legacy world.  Where we don’t, meaning where we elect to extend today’s L2/L3 connection services, the question that arises is whether constraining SDN/NFV to legacy service models is a suitable strategy given the goal of revenue augmentation.  If such constraint isn’t optimal, then what has to happen for federation to work?

To start with, it should be clear that the SLA properties of intent models might be enforced by something like policy control inside the model, but structuring and federating a management view based on them is probably forever outside the scope of SDN.  NFV, on the other hand, pretty much has to create derived management views just to present a virtual-function implementation of something in a way equivalent to a native-device implementation.  My conclusion here is that NFV is a natural partner to SDN in management and federation, and that we shouldn’t bother to try to augment SDN to address management federation.

What about connection federation?  The issue here, if we look at our sub-services as intent models, relates to the question of what the “functionality” of an intent model is, and how it’s expressed.  If we were to see an IP sub-service, for example, we could define the function as “IP Domain” and offer two broad classes of interfaces—UNI and NNI.  If we presumed that there was a general model for OpenFlow forwarding, we could define a function as “OpenFlow Domain”, but our “Ports” would now be of one class and we’d have to address the question of how forwarding decisions could be made across multiple OpenFlow Domains.

The ONF, in OpenFlow, has taken a protocol-and-interface approach to things, which is inherently more limited than what I’ll call a “software-process” approach—the approach NFV explicitly supports.  It’s hard to see how you could define a simple protocol/interface to accommodate the sharing of forwarding-decision-support data for a generalized service, but easy to see how a software process could offer that.  Thus, it would seem that connection federation would be more easily supported by NFV, which could then drive SDN.
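
Here’s a sketch of what I mean by a software-process approach to connection federation, with each domain exposing a process its federated partners can query for reachability; everything here (the class names, the reachability exchange) is illustrative, not a defined API:

```python
# Sketch of a "software-process" approach to connection federation: instead of a
# fixed NNI protocol, each domain exposes a process its partners can call to
# learn how to reach a destination.  Class names and the reachability exchange
# are illustrative, not a defined API or standard.

class DomainController:
    def __init__(self, name, reachable, nni_ports):
        self.name = name
        self.reachable = set(reachable)   # endpoints this domain can deliver to
        self.nni_ports = nni_ports        # peer domain name -> local NNI port
        self.peers = {}                   # peer domain name -> DomainController

    def federate_with(self, other):
        self.peers[other.name] = other
        other.peers[self.name] = self

    def route(self, destination):
        """Return (egress port, owning domain) for a destination, local or federated."""
        if destination in self.reachable:
            return ("local-delivery", self.name)
        for peer_name, peer in self.peers.items():
            # A richer exchange could carry SLA, cost, or load data here, which is
            # exactly what a simple protocol/interface would struggle to express.
            if destination in peer.reachable:
                return (self.nni_ports[peer_name], peer_name)
        raise LookupError(f"{destination} is not reachable from domain {self.name}")

domain_a = DomainController("A", {"UNI-1"}, {"B": "NNI-AB"})
domain_b = DomainController("B", {"UNI-2"}, {"A": "NNI-AB"})
domain_a.federate_with(domain_b)
print(domain_a.route("UNI-2"))   # ('NNI-AB', 'B')
```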

The result is a model of NGN where NFV sits on top to provide agile operations and management, and then drives a series of lower-layer intent-modeled NaaS abstractions realized using whatever technology is optimal.  The forwarding decisions in this lower layer could be made through the SDN Controller, and higher-layer multi-domain federation would then be managed by NFV.  This model seems appropriate if we assume that selection of NaaS domains would take place based on something other than traditional topology and traffic metrics, because NFV has to be able to make that kind of selection already.

The model wouldn’t foreclose using inter-SDN-controller links to exchange information among control domains.  If this were supported, then a lower-layer NaaS abstraction could actually be made up of multiple SDN-implemented domains.  The problem with this is that if metrics other than forwarding efficiency were needed to select implementations for a given NaaS, SDN isn’t equipped for it and it makes no sense in my view to augment SDN to creep into space NFV already occupies.

All of this suggests to me that carrier SDN might well depend on NFV for deployment, and that the relationship between SDN and NFV might be more complicated than either the SDN or NFV communities are currently envisioning it to be.  I’ve always believed that many, perhaps most, of the NFV benefits were tied to a more organized NaaS vision than either SDN or NFV currently presents, which would mean that evolving both SDN and NFV toward such a vision could be a prerequisite for the success of both.

Aligning NFV Business Cases with Reality

Before I took off on my vacation (just completed), I asked a bunch of my CFO contacts to review a document I prepared that outlined potential sources of NFV (or SDN) benefits.  They came back with some suggested changes and additions, and the result was a document with about 20 categories.  I’d also outlined some comments on the methodology for developing a business case using these benefits, and they had some views on that as well.

The first point the operators made was that of my 20 categories of savings, none could be considered valid except within a specific context of operations/management that could be costed out versus current practices.  Even capex reduction and all classes of revenue augmentation demand operations cost validation, because operators are interested in infrastructure TCO and not just capital cost and because revenue augmentation only happens if you generate net revenue after costs.

The reason my contacts thought this was the most critical point is that they tell me there are well over 80 NFV-related trials underway (that they are involved in or aware of) and that slightly less than ten percent of those have adequate engagement of operations/management and CIO-level people.  Of those, the CFOs think that perhaps half actually explore the operations practices sufficiently to create a cost model.

The second point I found interesting was that operators said operations cost reduction was the most credible, essential, benefit.  Based on my first comments here, you can see that the CFO organizations don’t think there’s much credible work on opex being done at present, but they had an additional point I found quite interesting.  They said that they had yet to have a vendor present them with a document that outlined their current cost sources and quantified each, even for a category of operator.

At the high level, the financial industry tells us that out of each revenue dollar, operators spend about 20 cents on capex, return about 20 cents as gross profit, and spend about 60 cents on operations and administration—expenses.  Some vendors take this whole pie and say they can reduce it, or so the CFOs tell me.  Well, one example the CFOs give is roaming charges for mobile operators, which are their largest single cost category.  How does NFV reduce that?

The CFOs say that there’s a specific subset of “opex” that I’ll call “process opex” which relates to the operations processes that could be influenced directly and indirectly by NFV.  They put six cost categories in this group.  How many NFV business cases had they been presented that outlined these six credible areas?  Zero.
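
A rough, entirely hypothetical breakdown shows why the distinction matters; the category names and percentages below are placeholders of my own, not the CFOs’ actual six categories:

```python
# Hypothetical split of a revenue dollar, following the rough financial-industry
# cut quoted above (20% capex, 20% gross profit, 60% opex).  The opex category
# names and numbers are invented placeholders, not the CFOs' actual categories.

capex, gross_profit, total_opex = 0.20, 0.20, 0.60
assert abs(capex + gross_profit + total_opex - 1.0) < 1e-9

opex_categories = {
    "roaming and interconnect settlements": 0.17,   # untouched by NFV
    "customer acquisition and retention":   0.12,   # partially addressable at best
    "network operations":                   0.10,   # "process opex"
    "service operations":                   0.08,   # "process opex"
    "IT operations":                        0.05,   # "process opex" (may rise early on)
    "general and administrative":           0.08,   # untouched by NFV
}
assert abs(sum(opex_categories.values()) - total_opex) < 1e-9

process_opex = sum(v for k, v in opex_categories.items()
                   if k in ("network operations", "service operations", "IT operations"))

print(f"Total opex per revenue dollar:      {total_opex:.2f}")
print(f"Process opex NFV could influence:   {process_opex:.2f}")
print(f"Share of opex actually addressable: {process_opex / total_opex:.0%}")
```

The point of the exercise isn’t the numbers, it’s that nobody is presenting numbers like these at all.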

One reason for the shortfall in useful opex data is that when you break down opex you’re forced to ask how your strategy would actually change it.  Here’s an example.  What’s the largest component of process opex?  Customer acquisition and retention.  Imagine yourself as the product manager of some NFV strategy, asked to tell the world how much your product will reduce marketing costs and churn, or help operators eliminate incentive programs.

Well, OK, you can see the issue there and perhaps you think the answer is to drop that category, which is OK as long as you want to kiss off a major benefit source.  What about the rest of the process?  CFOs point out that at least initial NFV deployment would increase costs in IT operations (because it requires deployment of data centers, which have to be run).  It would also likely increase the cost of service operations where VNF-based services had more distributed components to manage than discrete box strategies.  Offsetting this is the improvement that might be made in service operations through automation.

How much is that?  Most vendors who tout NFV don’t have a management/operations strategy (five, perhaps, do).  Even for those who do have an approach, would the conversion of an NFV lab trial to a field trial realize the savings, or prove them?  In order for operations to get more efficient, you have to automate all of it, not just the NFV pieces of it.  Otherwise your best outcome is to present the same costs as you had before, meaning no opex benefit.

On the revenue side, things aren’t much better according to my CFO sources.  Service revenue gains, as I said, have to be net of cost to be meaningful.  We can’t easily determine the operations costs of NFV for hypothetical new services because the focus has been on individual trials and PoCs and not on a broad new operations model.  Every new service might thus have new impact on operations, demand new features.  How do you get them?

Then there’s the issue of what exactly the service is.  Vendors, say the operators, are presenting them two “new services” in almost every presentation.  One is improved time to revenue achieved through improved deployment times.  The other is “on-demand” services—firewall-for-a-day, dial up your bit rate, etc.  Are these justified?

Time-to-revenue improvements are certainly achievable if you are talking about a new service or a new site within it.  Otherwise your customer is already provisioned, and what you’re really saying is firewall-as-a-service.  Is that credible?  Sure, but most operators say their users will buy as-a-service features when they connect a site and then hang in with those features.  How much revenue can really be created with this depends on how many suitable feature-enabling changes are made, and how many new prospects can be sold.  Those qualifications don’t seem to be working their way into business cases.

Elastic bandwidth is nothing new; we’ve talked about it for ages in fact.  Operators have long believed that if customers were offered a mixture of static long-term services and the ability to dial up capacity at time of need, there would indeed be a revenue gain from exercising the latter.  There’d also be a revenue loss for traditional leased services because all customers would game the pricing to get the lowest total cost.  Thus, operators say, they’re likely to lose money in the net.

At this point you probably think that the CFOs believe NFV is never going to prove out at a significant level, but that’s not the case.  Nearly every CFO thinks NFV will succeed eventually.  On the average, CFOs think that by 2018, SDN and NFV will have impacted about 20% of all network infrastructure investment.  That number is quite consistent with my own modeling of SDN/NFV opportunity.

We can do better than this.  Light Reading has published interviews with operators who said quite openly that the industry’s hype was hurting the business case, and they’re right.  That business case can be made, but it’s not easy to do and it requires broadening the presumed scope of NFV and SDN deployment from diddling at individual projects or services to building toward a systemic shift in infrastructure spending and management.  Hundreds of billions of dollars are at stake.  We could have proved out a strategy by now, and all we’ve proved is that there’s no easy way to get to one.

Well, maybe it’s time to try the hard, right, way.

Let’s Stop Thinking Small About Network Virtualization

Somebody told me last week that network virtualization was well underway, which surprised me because I don’t think it’s really even begun.  The difference in view lies in two factors: is “acceptance” equivalent to deployment, and is a different way of doing the same thing the same as doing something different?

The issue we have with network virtualization, in which category I’ll put both SDN and NFV, is much the same as we have with the cloud.  We presume that a totally different way of doing things can be harnessed only to do what we’ve been able to do all along—presumably cheaper.  If all the cloud can offer is a lower cost because of economies of scale, then most enterprises will get enough scale through simple virtualization not to need public or private cloud services at all.  The cloud will succeed because we’ll build new application architectures to exploit its properties.  Network virtualization will be the same.

Traditional network services created through cooperative service-specific infrastructures impose a single set of connection rules as the price of sharing the cost among users.  Virtualization, at the network level, should allow us to define service connection rules on a per-user and per-service basis without interfering with the cost sharing.  There are two elements to service connection rules—the address space or membership element and the forwarding rules.  With today’s networks we have default connectivity, and we add on connection controls and policies.  We have a very one-dimensional vision of forwarding packets—everything keys off a single, arbitrary address.

Virtual networking should relax these constraints because it should allow us to impose any convenient addressing or connection model on a common infrastructure framework.  That means that networks would work according to the utility drivers associated with each user and application; no compromises to be part of the team, to secure economy of scale.

One of the most important prerequisites for this is breaking down the one-user-one-network rule.  We tend to think of networks today in a static and exclusive membership sense.  I have an address on a network, associated with a network service access point.  Send something to it and I get that something.  We already know from Amazon and Google’s experience in the cloud that you need to change that simple approach.  In virtual networking, a user is a vertical stack of NSAP/addresses, one for each connection network they’re a member of.  Google represents this well in their Andromeda documents, but Andromeda is still all about IP and there’s no reason to presume that NSAPs all have the same protocol, or any of the protocols that are in use today.
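
A minimal sketch of that vertical stack, assuming nothing more than that each membership carries its own address space and a possibly non-IP protocol; the names and fields are illustrative:

```python
from dataclasses import dataclass
from typing import List

# Sketch of "a user is a vertical stack of NSAP/addresses": one membership per
# connection network, each with its own address space and (possibly non-IP)
# protocol.  Names and fields are illustrative.

@dataclass
class Nsap:
    network_id: str     # which virtual network this membership belongs to
    protocol: str       # not necessarily IP
    address: str        # meaningful only inside that network's address space

@dataclass
class UserStack:
    user: str
    memberships: List[Nsap]

    def address_on(self, network_id: str) -> Nsap:
        return next(n for n in self.memberships if n.network_id == network_id)

alice = UserStack("alice", [
    Nsap("corporate-vpn",   "IPv4",      "10.1.4.22"),
    Nsap("payroll-app-net", "IPv4",      "172.16.9.5"),
    Nsap("sensor-fabric",   "custom-L2", "node:0x3f2a"),
])

print(alice.address_on("payroll-app-net").address)   # 172.16.9.5
```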

Multi-networkism (to coin a phrase) like this is critical if elastic networking is to be useful because we have to presume that the intersection of user and application/need will be multifaceted.  You need to be able to be a member of different networks if you want networking to be different.

The next step is getting traffic to users.  Forwarding rules define how a packet is examined by nodes to determine how to route it onward.  They associate an address and handling instructions, so they are linked to the address/membership side of the picture by the address concept.  The address is your “name” in a connection network.  The forwarding rules define how the address is interpreted to guide handling and delivery.

OpenFlow’s real advance (which sadly isn’t completely realized, for reasons we’ll get to) is that it defines a more elastic model of describing packet handling by nodes.  Ideally what you’d like to have is a kind of mask-and-match or template structure that lets you pick what an “address” is from the range of stuff you’re presented with in the packet header.  Ideally, you’d also like to be able to transform the stuff you find, even to the extent of doing some high-speed local lookup and using the result.  The architecture might not work for all applications, but we should not constrain virtualization at the network level by the limits of current technology.  We have to accommodate those limits, but not perpetually.

An example of transformation-driven handling is the optical routing issue.  Optical is really a special case of non-packet traffic segmentation; TDM is another.  The point is that if there is any characteristic that separates traffic flows (and there’d better be or routing is kind of moot), we should be able to abstract that and then to convert that abstraction back to the form needed for the next hop.  A flow that’s incoming on Lambda A might be outgoing as a TDM slot; as long as we know the association we should be able to describe the handling.
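
Here’s one way to picture that kind of generalized, template-style forwarding rule, including the lambda-to-TDM case; the rule structure is my own sketch, not OpenFlow’s or any standard’s:

```python
# Sketch of a generalized mask-and-match forwarding rule: pick the "address" out
# of whatever attributes identify the flow, then transform it for the next hop.
# The rule structure is my own illustration, not OpenFlow's or any standard's.

def make_rule(match, actions):
    """match: flow attributes to test; actions: ordered list of (op, arg) pairs."""
    def apply(flow):
        if not all(flow.get(k) == v for k, v in match.items()):
            return None                      # rule doesn't apply to this flow
        out = dict(flow)
        for op, arg in actions:
            if op == "set":                  # rewrite or add handling attributes
                out.update(arg)
            elif op == "forward":            # hand off on a named port or trunk
                out["egress"] = arg
        return out
    return apply

# A flow arriving on Lambda A leaves as TDM slot 12 on trunk T3: as long as the
# rule knows the association, the handling can be described even though neither
# side is a packet header in the conventional sense.
optical_to_tdm = make_rule(
    match={"ingress": "lambda-A"},
    actions=[("set", {"tdm_slot": 12}), ("forward", "trunk-T3")],
)

print(optical_to_tdm({"ingress": "lambda-A", "flow_id": 7}))
# {'ingress': 'lambda-A', 'flow_id': 7, 'tdm_slot': 12, 'egress': 'trunk-T3'}
```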

Forwarding rules also unite the vertical stack of NSAP/addresses and the user who represents the logical top of that stack.  Every virtual network in the stack associates that user with an NSAP and the rules needed to get packets to and from the user.  How exactly that would work, and how complicated it would be, depends on how homogeneous you think the networks are.

If we presume (as is the case in cloud computing today) that the virtual networks are all IP networks, then what we have is multi-addressed users.  The presumption is that every virtual network has its own address space and that a packet sent by the user is linked to a virtual network by the address the user presents or by the destination address.  When a packet is received, it’s sent to the user and it can be presumed that the virtual-network affiliation of the origin doesn’t really matter.  This is consistent with the one-address-space Internet IP model.

This is the cloud-inspired virtual network model today, the one Amazon and Google have deployed.  It offers considerable advantages for the application-specific VPNs of the future.  Imagine as-a-service apps presented with their own address space, connected outward via VPN into the virtual-network stacks of users.  Access to an application now depends on having a virtual-network forwarding connection from that app’s NSAP to your vertical “stack”.

If we have different network memberships with different protocols in each, then network software in the user’s space would have to provide a means of separating the traffic.  You could assign multiple logical software ports, put a network ID in the packet, or use any other mechanism handy.  This shows that for virtual networking to reach its full potential we’ll need to examine how software accesses network connections.  Otherwise usage practices of the present will tie down our options for the future.

I’m not saying that this is the only approach to virtual networking that realizes its potential; obviously the benefit of virtual networking lies in large part in its agility to present many connection and forwarding models.  I do think this approach represents an advance from today, one that’s already being used by cloud giants, and so it’s the kind of thing that could start discussions that might break many out of excessive “IP-and-Ethernet-think”.  We need that to advance networking as far as it can be taken.

IBM, Juniper, and Jumping at the Right Time

You all probably know how I love blog topics that present a contrast, and today we have a nice opportunity for one with the quarterly results from IBM and Juniper.  The former disappointed the Street and the latter made them happy.  The former has touted transition and the latter seems to be staying the course.  What might be unifying them is partly a timing issue; Juniper might be staying too long and IBM may have jumped too soon.

I’ve admired IBM longer than any other tech company, in part of course because they’ve been around longer.  IBM has in the past weathered some of the most dramatic transitions in IT.  They’ve retained a high level of account control and strategic influence, but they’ve lost a bit of both in the last decade.  Their challenge has been a lack of marketing.  The IBM brand was iconic all the way through the 90s, but in the current century they’ve lost touch with the broad market.

Nobody thinks PCs are the profit engines of IT, but getting out of the PC business had the effect of taking IBM’s label off the piece of gear that most professionals carried with them every day.  It also disconnected IBM, in a product sense, from the wave of technology populism that swept us into the era of the cloud.

The cloud, or hosted IT in any form, is an automatic issue for a company like IBM, which relied on controlling IT planning through sales presence.  Line organizations could either bypass the IT organization or beat it up with cloud threats, and in either case cloud marketing was reaching buyers IBM’s salespeople could never hope to call on.

IBM’s cloud strategy seems driven by the notion that the cloud is an alternative infrastructure for IT professionals.  They’ve discounted the populism of as-a-service delivery, and they’ve expected IT organizations to embrace the cloud and drive it through their line departments even though the cloud was unfamiliar to those IT professionals.  They saw hardware commoditizing, with demand shifting to cloud platforms and development tools.

Eventually they may well be proven right, but they acted on the presumption that the shift to the cloud would occur on their terms, and fast enough to offset the loss of hardware revenue.  Instead, the cloud overhung IT budgets and projects, and the new stone IBM hoped to jump to in its river crossing proved less stable than the one it left.

If IBM is an IT incumbent who got out of IT too soon, Juniper is a networking incumbent who may have stayed too long.  It started as a router vendor, got into switching way later than it should have (and switching is now its strongest area), and under the leadership of an ex-Microsoft executive tried to make a transition into software that’s never really gelled.  They made acquisitions in the mobile, voice, and content delivery spaces and none of that paid off.  It’s hard to say whether Juniper was wedded to the past big-box glory or just couldn’t figure out how to get out of it.

Juniper has marketing issues too, but their marketing challenge is posed by their reluctance to embrace any form of networking other than big-box networking.  Under the previous CEO they couldn’t get SDN and NFV straight, and even though Juniper talked about cloud computing before any other network vendor, they never got a tight positioning on it even when they had product announcements that clearly favored cloud trends.

SDN is now perhaps Juniper’s big success; their Contrail strategy is a good way to build VPNs on top of legacy infrastructure.  I like Contrail and the approach, and yet I think it’s still a conservative view of an aggressive market opportunity.  Juniper can succeed with conservative SDN positioning, as it can succeed with big-box strategies, as long as someone else with a more aggressive take on the evolution doesn’t step up and say convincing things.  Alcatel-Lucent, with Nuage, could in my view present a much more futures-driven picture of SDN and its evolution from legacy networking, but they’re also mired a bit in the past.

So Juniper took a safe position with their big-box story, a position that could win only if everyone in the SDN and NFV space booted their opportunity.  Well, that’s what happened.  Telcos want to invest in a new infrastructure model, but there’s no such model being presented as yet.  NFV and SDN are just a bunch of disconnected projects.  That favors evolutionary approaches that focus more on the starting point than on the ultimate destination.

There’s the critical common element between our two vendors in a nutshell.  Both IBM and Juniper were impacted by the inertia of past practices.  IBM hoped for change and didn’t get it, and lost ground because they moved too quickly.  Juniper feared change, and buyers apparently feared it too, or at least feared jumping into an unknown.  Juniper gained ground by being behind.

One obvious conclusion here is that we’re stuck in a legacy IT and networking model because we can’t demonstrate benefits significant enough to justify transitioning to a new one.  In the past, productivity improvements have fueled major tech transitions, but we don’t have that today.  Focusing on cost reduction tends to limit buyers’ tolerance for wholesale change.

IBM needs to get its IT professional base, the people it has influence on and regular contact with, embarked on an effective campaign for internally driven application agility and worker productivity that’s centered on the cloud.  It also needs to get its marketing act in order, and provide its sales force cover for the effort needed.

It should also consider whether telco cloud and NFV could be a good greenfield opportunity to drive changes in a space where the upside is very large.  The telco universe, with its low internal rate of return, should be a natural place for cloud services to emerge, and that hasn’t happened.  Part of the reason is that operators aren’t particularly strong marketers, but it’s also true that they’re not particularly excited about getting into another low-margin business.  NFV principles applied to cloud services could reduce operations costs and improve margins.

It’s harder to say what Juniper’s response should be, and perhaps it’s also a waste of time given that Juniper is doing OK while doing what it wants to do—for now at least.  Maybe they’ll be right and both the SDN and NFV revolutions will fizzle, creating nothing more than eddy opportunities that won’t threaten the big-box story.  But even if that’s true, Juniper will lose margins and market share as operators respond to the situation by pushing on product pricing.  With, of course, Huawei eager to respond.

Is the future unrecognized and upon us, or is it perhaps just that it’s no different from the present?  IBM bets on the future, Juniper on the past, but any bet is a risk and it may come down to execution.  I didn’t care for the Microsoft crowd at Juniper, but the new CEO (Rahim) seems to have a much better handle on things.  He may be able to jar Juniper out of past mistakes.  IBM on the other hand has past successes it needs to recapture, not by turning back but by harnessing the agility they exploited through all the previous technology earthquakes they’ve endured.  Do they have the leadership for that?  I’m not so sure.

The Dell/EMC Deal: Can It Work for NFV or Even for Dell?

Dell’s decision to acquire EMC has raised a lot of questions among fans of both companies, and there’s certainly a new competitive dynamic in play with the move.  The most dramatic aspect of the deal might turn out to be the impact it has on the cloud, SDN, and NFV positioning of the combined company.  Dell, like most industry players, has been a cautious advocate of open tools.  VMware virtually demands that Dell rethink that open posture, and in particular how it might define “success” with NFV.

NFV, if it were to be optimally deployed, has the potential to generate over 100,000 new data center installations worldwide, and to consume well over ten million new servers.  That would make NFV the largest single new application of data center technology, and make any data center equipment or software vendor who wasn’t in on the revolution into a second-rate player.

The challenge is that NFV’s business case has been difficult to make, not because there are real questions about how it could be done but because doing all that’s required is complex and expensive at both the product-development and sales levels.  Since most of the largess spilled into the market by NFV success would fall into the arms of server vendors no matter how that success is generated, server vendors have to decide whether to push NFV as an ecosystem and develop the business case themselves, or simply presume someone else will and sit back to rake in the dough.

I was involved with Dell in the CloudNFV project, where Dell provided the hosting facilities and integration for CloudNFV.  At one point, in the fall of 2013, it looked to many network operators in the NFV ISG like Dell was going to field an NFV solution based on CloudNFV.  Dell did take over the leadership of the project when I bowed out as Chief Architect in January 2014, but nothing came of whatever hopes operators might have had for Dell’s entry as a full-scope NFV player.  Dell seems to have decided that it would sell NFV Infrastructure to support any full solution that did emerge.

That approach might be difficult to sustain now.  With VMware in house, Dell needs to find a role for VMware in NFV, which VMware itself has been working to do, in addition to making sure it gets server deals.  At the simple level, that could mean nothing more than supporting a vision of the Virtual Infrastructure Manager (VIM) element of NFV that maintains independence from implementation specifics that currently tend to tie VIMs to OpenStack.  Such a move would make VMware part of NFV Infrastructure, aligning the deal with Dell’s current position with servers.  But that might not be enough.
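
A minimal sketch of what that VIM-neutral posture might look like in code, with class and method names invented for illustration (this is not the ETSI-defined interface): orchestration asks for deployment through an abstract contract, and the OpenStack or VMware specifics live behind it.

```python
# Hypothetical VIM abstraction: the orchestrator never sees which
# infrastructure manager is underneath.
from abc import ABC, abstractmethod

class VirtualInfrastructureManager(ABC):
    @abstractmethod
    def deploy(self, image, flavor):
        """Instantiate a VNF component and return an opaque handle."""

class OpenStackVIM(VirtualInfrastructureManager):
    def deploy(self, image, flavor):
        # in a real system this would call Nova/Neutron
        return f"openstack::{image}::{flavor}"

class VMwareVIM(VirtualInfrastructureManager):
    def deploy(self, image, flavor):
        # in a real system this would call vSphere/vCloud APIs
        return f"vmware::{image}::{flavor}"

def orchestrate(vim):
    return vim.deploy(image="vFirewall", flavor="small")

print(orchestrate(OpenStackVIM()))
print(orchestrate(VMwareVIM()))
```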

You cannot make a business case for NFV through NFVI and VIMs alone.  You need orchestration and management, and you need support for legacy infrastructure and for operations/management process orchestration as well as deployment orchestration.  When Dell was supporting the OpenStack party line, they could presume that anyone who could do all the orchestration and management would pull through a general OpenStack solution, a solution Dell could sell to.  Many of those orchestration players have specific OpenStack commitments of their own, and Dell now has to be seen as representing another camp.  Might they then have to build or acquire a complete NFV orchestration solution?

Up until fairly recently, that probably didn’t matter much.  NFV has been more a media event than a real technology revolution.  CFOs in the operator space have been griping about the lack of a business case for a year now, but if nobody had one then everyone was happy to ply the press with sound bites.  Now, though, it’s clear that operators will start spending on NFV in 2016 and that will create some real winners, winners whose bottom line will be augmented by NFV’s transformation.  Those winners will become NFV incumbents to beat.  Dell, if they want to be among those being augmented, will have to make that business case too.  And so far they can’t.

What could change that?  The easiest approach for Dell would be M&A, given that Ciena has already embarked on an M&A-driven quest for an NFV business case.  With their acquisition of Cyan, they updated their website to push all the right buttons for a complete NFV story.  They say they’ve put an order of magnitude more engineering talent on the problem, too.  So with Cyan off the table, what’s left for Dell?

The most obvious answer would be “Overture Networks”, who was one of the other players in CloudNFV and so is known to Dell.  Overture has a complete NFV solution too; no need for a big engineering push.  But while that would be a smart buy for Dell, I think evidence says they won’t make it.  Why?  Because if they wanted Overture the smart thing would have been to grab it before they did the EMC deal.  Now there might be other contenders.

The less obvious answer is that Dell has no intention of buying anyone because they have no intention of being an NFV business case leader.  Remember, Dell had that position in its grasp, as the most credible player in what was the first and leading NFV PoC.  They could have taken CloudNFV on to commercialization and they didn’t.  So why not presume that they wanted none of that management and orchestration business case stuff?

SDN, maybe?  Remember that EMC/VMware got Nicira, the first credible SDN player.  Now, of course, SDN seems locked in an open-source duel with Open Daylight on one side and ONOS on the other.  How many articles have you seen on whether Nicira technology might supplant either?  So SDN’s out too.

That leaves only two possibilities: Dell is doubling down on its NFVI-centric vision, or it’s not even thinking about the service providers in the EMC deal and the move is really about the enterprise.  Both of these possible drivers have arguments in their favor.

Dell could be looking at the open-source movement among operators, embodied in the OPNFV project, and thinking that the solution to the business-case problem will be created in open-source software and thus could be applied to any vendor.  There are two problems with this.  First, OPNFV is a long way from delivering anything comprehensive enough to make the business case, and frankly I’m not sure it’s ever going to get there.  Second, Dell would need to ensure that all the decisions made in architecting the software were at least compatible with an implementation using VMware.

It’s hard to tell whether Dell or VMware know what steps they’d need to take to accomplish that.  There is a movement within NFV to move to intent modeling at critical interfaces, but Dell has not led that movement or even been a particularly conspicuous supporter of it.  Neither has VMware.  Given that a lot of the structure of OPNFV is getting set in stone, it might be too late to do the critical compatibility stuff, and certainly there’s going to be plenty of time for competitors to drive their own initiatives with their own full NFV solutions.  Remember, we have at least four vendors who have enough in place.

On the other hand, VMware virtualization is well established in the data center.  The logical pathway for a VMware shop to the cloud is through VMware, whether that shop is an enterprise or an operator.  VMware has its own vCloud approach, and an NFV activity that seems primarily directed at supporting NFV applications of vCloud in the carrier space.  So Dell could have cloud evolution in mind, period, and might plan to exploit it more than drive it in both cases.

Which might not be as dumb as it sounds.  The big problem both NFV and the cloud have is their reliance on open source, which offers vendors a specialized revenue model, to say the least.  Who pays for buyer education when the result is open to all?  Dell might realize that in the end both NFV and the cloud have to succeed by somebody selling servers to host stuff on.  If Dell can bundle servers with VMware and vCloud and actually deliver what buyers want, will buyers care about open source or even standards?  Yes, if there’s an open/standard option on the table, but will there be?

In the end, though, Dell can probably win only if some key competitors dally at the starting gate.  HP has everything needed for the cloud, SDN, and NFV.  Oracle has a lot of the pieces.  IBM has some, as does Alcatel-Lucent.  Red Hat and Intel/Wind River have powerful platform tools that could do what VMware does, and with good PR and optimal development they would pose a hard choice for Dell: embrace competitive software platforms to sell servers and undermine its VMware assets, or toss those opportunities aside to protect its latest acquisition?

This is going to be a challenging time for Dell, for sure.

Is Carrier-Grade NFV Really Important?

OpenStack has been seen by most as an essential element in any NFV solution, but lately questions have been raised about whether OpenStack can make the grade, meaning “carrier-grade” or “five-nines”.  Light Reading did an article on this, and Stratus recently published an NFV PoC that they say proves OpenStack VIM mechanisms are insufficient to assure carrier-grade operation.  They go on to say that their PoC proves it’s possible to add resiliency and availability management as a platform service, and that doing so would reduce the cost and difficulty of meeting high-availability requirements.  The PoC is interesting on a number of fronts, some of which are troubling to classical NFV wisdom.

Let’s start with a bit of background.  People have generally recognized that specialized appliances used in networking could be made much more reliable/available than general-purpose servers.  That means that NFV implementations of features could be less reliable, and that could hurt NFV deployment.  Proponents of NFV have suggested this problem could be countered by engineering resiliency into the NFV configuration—parallel elements are more reliable than single elements.
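
The arithmetic behind that parallel-redundancy argument is simple; here is a quick sketch with assumed availability figures.  Note that it ignores exactly the operational and state issues discussed next.

```python
# Back-of-envelope only: the availability of n independent parallel instances,
# using assumed (not measured) figures for a single server.
def parallel_availability(single, instances):
    return 1 - (1 - single) ** instances

server = 0.999   # an assumed three-nines commodity server
print(parallel_availability(server, 1))   # 0.999
print(parallel_availability(server, 2))   # 0.999999 -- six nines, on paper
```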

The problem with the approach is twofold.  First, a network of redundant components deployed to avoid single points of failure is harder to build and to operate, which can raise operations costs enough to threaten the business case if you’re not careful.  Second, if you define a fault as a condition visible in the service data plane, most faults can’t be masked by parallel component deployment because some in-flight packets will be lost.  That’s a problem often described as “state management,” because new instances of a process don’t always know the state of the process instance they’re replacing.

I blogged early in the NFV cycle that you could not engineer availability through redundant deployment of VNFs alone, so I can hardly disagree with the primary point.  What Stratus is saying is that if you enhance the platform that hosts VNFs you can do things like maintain state for stateful switchovers, essential to keeping operation going in the face of a fault.  I agree with that too.  Stratus’ message is that you can address both issues better than OpenStack does, by making the platform that hosts VNFs aware of configuration-based availability.

Well, I’m not sure I can buy that point, not least because OpenStack is about deployment of VNFs, and most availability issues arise in the steady state, after OpenStack has done its work.  Yes, you can invoke it again to redeploy VNFs, but it seems to me that the questions of NFV reliability have to be solved at a deeper level than OpenStack alone, and that OpenStack may be getting the rap for a broader set of problems.

State maintenance isn’t a slam-dunk issue either.  Much of today’s stateful software uses “back-end” state control (Metaswitch uses it in Project Clearwater, its insightful implementation of IMS), and you can apply back-end state control without OpenStack being aware or involved, and without any other special platform tools.
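
To show what I mean by back-end state control, here is a hedged sketch of the general pattern (not Metaswitch’s actual implementation): instances keep no local session state, so a replacement instance can pick a session up from a shared store after a failure.

```python
# Hypothetical pattern: a dict stands in for a replicated back-end store.
SESSION_STORE = {}

class StatelessInstance:
    def __init__(self, name):
        self.name = name

    def handle(self, session_id, event):
        state = SESSION_STORE.get(session_id, {"events": 0})
        state["events"] += 1
        SESSION_STORE[session_id] = state    # write-through to the back end
        return f"{self.name} handled {event}; session at event {state['events']}"

primary = StatelessInstance("instance-A")
print(primary.handle("sess-42", "register"))
# instance-A fails; a fresh instance resumes the same session from the store
replacement = StatelessInstance("instance-B")
print(replacement.handle("sess-42", "invite"))   # continues at event 2
```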

Worse, I don’t think that even state-aware platforms are going to be a suitable replacement for high-availability gear in all cases.  You can’t make router state universal across instances without duplicating the data stream, which is hardly a strategy to build an NFV business case with.  But of course routers recover from the failure of devices or trunks, so we may not need routers to be fully paralleled in configuration-based availability management.  Which raises the question of whether “failures” that are routine in IP or Ethernet networks have to be afforded special handling just because the functions involved are now hosted as VNFs.

The final point is that we still have to consider whether five-nines is actually a necessary feature.  Availability is a feature, and like any other feature you have to trade it against cost to decide whether it has any buyer utility.  The old days of the PSTN versus the new world of mobile services offer a good example; people are happy to pay less for cellular service even though it’s nowhere near the quality wireline voice used to be.
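
The downtime numbers make the tradeoff concrete; this is simple arithmetic, not data on any particular service.

```python
# Allowed downtime per year at different availability levels.
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label}: about {downtime:.0f} minutes of downtime per year")
# five nines allows roughly 5 minutes a year; three nines, nearly 9 hours.
```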

Two situations argue for high availability for VNFs.  One is multi-tenancy, meaning VNFs that deploy not just for a single customer but for a large number.  The other is interior network features like “virtual core routing” that might be associated with a large-scale network virtualization application.  The mainstream VNF stuff, which all falls into the category of “service chaining”, is much more problematic as a high-availability app.  Since Stratus is citing the benefits of their availability-platform approach to VNF providers, the credibility of that space is important, so we’ll deal with the classic VNF applications of service chaining first.

Yes, it is true that if you could make state control a feature of the platform rather than something VNFs have to handle on their own, VNF vendors would have an easier time.  As a software architect (fallen, to be sure, to the dark side of consulting!) I have a problem believing that you can manage the distributed state of multiple availability-managed components without knowing just what each component considers its own state to be.  There are plenty of variables in a program; which are state-critical?
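
One way to resolve that would be for the VNF itself to declare what is state-critical; here is a hedged sketch of the idea, with a hypothetical registration API that no real platform necessarily offers.

```python
# Hypothetical checkpoint registry: the VNF, not the platform, decides which
# variables are worth preserving across a switchover.
class CheckpointRegistry:
    def __init__(self):
        self.tracked = {}

    def declare(self, name, getter):
        self.tracked[name] = getter

    def snapshot(self):
        return {name: getter() for name, getter in self.tracked.items()}

class FirewallVNF:
    def __init__(self, registry):
        self.session_table = {}   # state-critical
        self.stats_cache = {}     # rebuildable, deliberately not declared
        registry.declare("session_table", lambda: dict(self.session_table))

registry = CheckpointRegistry()
vnf = FirewallVNF(registry)
vnf.session_table["10.0.0.1:443"] = "ESTABLISHED"
print(registry.snapshot())   # only the declared state appears
```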

Even more fundamentally speaking, I doubt that service-chained VNFs, the focus of most VNF providers, really need carrier-grade availability.  These features have historically been provided by CPE on the customer premises, after all.  It’s also true that most of the new services that operators feel they are missing out on, services that OTTs are winning, have far less than five-nines reliability.  Why should the operators have to meet a different standard, likely at a higher cost?

Multi-tenant features like IMS or core routing would make sense as high-availability services, but again I wonder whether we should be listening to the voice of perhaps the most experienced VNF provider of all, Metaswitch.  They built resiliency in at the VNF level, and that means others could do the same.  Given the limitations of having a platform anticipate the state-management and other needs of a high-availability application, letting VNFs do their own thing makes the most sense.

I think platformized NFV is not only a good idea, it’s inevitable.  There will surely be a set of services made available to VNFs and VNF developers, and while it would be nice if we had a standard to describe this sort of thing, there is zero chance of getting anything useful in place in time to influence market development.  I’d like to see people argue for platform features for VNFs and not against OpenStack or any other platform element.  That means describing what they propose to offer under realistic conditions, and I don’t think Stratus has yet done that.

I also think that we’re ignoring the big question here, which is the complexity/cost question.  We’re acting like NFV deployment is some sort of divine mandate rather than something that has to be justified.  We propose features and capabilities that add both direct cost and complexity-induced costs to an NFV deployment equation that we know isn’t all that favorably balanced at best.  We can make VNFs do anything, including a lot of stuff they should not be doing.