Is Verizon’s ThingSpace Really IoT or Is It “IONT”?

Verizon’s entry into the IoT space, via its “ThingSpace” API offering, has been covered by Light Reading and also announced directly by Verizon.  The company is also opening its analytics engine for IoT use and making some network and pricing changes to accommodate IoT.  Verizon’s launch presentation on IoT properly characterizes IoT as a “Tower of Babel,” and the company promises to take the babel and complexity out of IoT.  It’s a bit early to say if this will happen, but let’s look at the approach and see what we can learn.

First and foremost, Verizon is like other operators (including rival AT&T) in seeing IoT through LTE-colored glasses.  On one hand that makes sense because Verizon is a network operator who sells services to connect things.  IoT as a connected set of devices is a market.  On the other hand, as I’ve pointed out, this directly-connected-devices vision cuts IoT adrift from the massive number of sensors and controllers already out there, connected with local and low-cost technology.

There’s no question that the IoT has enormous potential, but there is a lot of question about what specifically is feasible out of that vast pool of “potential”.  One thing I think hurts Verizon and most others in IoT is the notion Verizon called a “customer journey roadmap”, to guide people on the journey to IoT.

Why?  Because that shouldn’t be the journey at all.  In their presentation, Verizon is falling into the same trap we typically fall into with new technology development.  They make the means into the goal.  IoT is a mechanism that solves some business problems.  We need to focus first on the business problems, or opportunities, that we might apply IoT to, and from that focus derive the requirements and the cost tolerance of each.

ThingSpace, Verizon’s stated on-ramp into IoT, is built around the notion of new devices, directly connected.  The APIs published at this point relate to connection management for these directly connected devices, but Verizon is promising to expand these APIs into the thousands, and we’d hope that some of these new APIs step beyond this specific LTE-and-connected-devices mission.  Maybe they will, but Verizon’s presentation spends a lot of time talking about the IoT being built around the WAN, and they explicitly deprecate those local device attachment strategies that have already deployed literally billions of sensors and controllers.

There are, in my view, three elements to a complete IoT solution.  One is the device-connectivity portion that gets sensors and controllers into accessible form, connected to “IoT” as an abstraction.  Another is the set of applications that will draw on data and exercise control, and the final piece is that core abstraction that represents IoT to the applications.

A software developer, looking at IoT, would say that you should never build based on the connectivity abstraction for the simple reason that connectivity isn’t what you’re trying to find out about.  The fact is that an IoT application should never care about how the sensor or controller is connected.  It has to care about whether it can access the device, how that is done, and how it can exploit it.

You can draw this.  Take a blank sheet of paper and do a big oval in the middle, labeling it “IoT Core Abstraction”.  Now put another small oval overlapping at the bottom and call it “Sensor/Controller Management”.  You complete the picture by drawing an oval overlapping at the top and labeling it “Applications”.

My view is that if we have industrial sensors and controllers in the hundreds of millions, it would make sense to prioritize drawing these into the IoT Core Abstraction.  Most residential control and all industrial control networks have gateways to make them accessible externally, including via the Internet.  If you presume that all sensors populate a database with information, via their Sensor/Controller Management elements, then you can add directly connected devices through their own (new) Sensor/Controller Management links.  The result is a kind of populated abstraction, which is how IoT should really be visualized.
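
To put that picture in concrete terms, here’s a minimal Python sketch of what I mean by a “populated abstraction”—a core repository fed by Sensor/Controller Management adapters, whether the sensor sits behind a legacy gateway or is directly connected over LTE.  Every class and field name here is mine, purely illustrative, not anything Verizon has published.

```python
# Minimal sketch of the "populated abstraction" idea: an IoT core repository
# fed by Sensor/Controller Management adapters.  All names are illustrative.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reading:
    sensor_id: str
    kind: str          # e.g. "temperature", "door-contact"
    value: float
    source: str        # which management adapter reported it


@dataclass
class IoTCoreAbstraction:
    """The repository applications query; they never see how sensors attach."""
    readings: Dict[str, List[Reading]] = field(default_factory=dict)

    def publish(self, reading: Reading) -> None:
        self.readings.setdefault(reading.sensor_id, []).append(reading)

    def latest(self, sensor_id: str) -> Reading:
        return self.readings[sensor_id][-1]


class GatewayAdapter:
    """Sensor/Controller Management for legacy sensors behind a local gateway."""
    def __init__(self, core: IoTCoreAbstraction, gateway_name: str):
        self.core, self.gateway_name = core, gateway_name

    def report(self, sensor_id: str, kind: str, value: float) -> None:
        self.core.publish(Reading(sensor_id, kind, value, f"gateway:{self.gateway_name}"))


class DirectLTEAdapter:
    """Sensor/Controller Management for new, directly connected (LTE) devices."""
    def __init__(self, core: IoTCoreAbstraction):
        self.core = core

    def report(self, sensor_id: str, kind: str, value: float) -> None:
        self.core.publish(Reading(sensor_id, kind, value, "lte:direct"))


if __name__ == "__main__":
    core = IoTCoreAbstraction()
    GatewayAdapter(core, "home-zwave").report("thermostat-1", "temperature", 71.5)
    DirectLTEAdapter(core).report("fleet-truck-9", "location", 40.7)
    # An application only asks the abstraction; it neither knows nor cares
    # whether the answer came through a gateway or a direct LTE link.
    print(core.latest("thermostat-1"))
```

The point of the sketch is that applications talk to the abstraction, never to a connection model, which is exactly why deprecating the installed base of gateway-attached sensors makes no sense.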

Verizon might be seeing this in the long run, because they’re also talking about their analytics engine, which in their material they relate to “big data”.  That in my view relates it to a repository, and that then gives a face to the abstraction that forms the core of IoT.  It’s a big-data repository.  It’s populated with data from external elements, which include that directly connected stuff Verizon’s focusing on.  In short, this analytics engine might be the thing Verizon needs, and if Verizon positions it correctly then third-party developers could use APIs to introduce the information from those deprecated sensor architectures Verizon mentions, and jumpstart the utility of the whole concept.

So Verizon might be doing the right thing?  Yes, but I’m concerned by the fact that they’ve made such an explicit commitment to the new-device, LTE-connection model.  If you look at technologies like SDN and NFV, you’ll see that we’ve lost a lot of time and utility by having vendors pick approaches that were good for them but essentially shortsighted or even stupid by the standards of the opportunity overall.

Why would Verizon, who like most operators is very interested in home security and control, not want to link their ThingSpace to current in-home security/control architectures?  Most homes that want such capability already have stuff in place, and they’re unlikely to trash their investment (which in many cases is in the thousands of dollars) to jump into an “IoT” model of the same thing.

Verizon’s launch presentation included a comment that they had to find the “IoT Easy Button”.  Well, expecting every sensor and controller to have an independent WAN-based IoT Internet connection isn’t easy.  What this approach would do is place Verizon at a cost disadvantage versus every current technology sensors use.  Those technologies are in-building and incrementally free, and in many cases use low-power designs that will live a year on a nine-volt battery.  The average upscale suburban connected home has twenty sensors and controllers.  Does Verizon propose to sell a hundred-dollar device for each of these, and then collect even reduced wireless fees per month?  I don’t think the average suburban user will accept that.
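
The arithmetic here is simple enough to do on the back of an envelope.  The twenty sensors and the hundred-dollar device come from the paragraph above; the two-dollar “reduced” monthly fee is my assumption, inserted only to make the comparison concrete.

```python
# Back-of-the-envelope cost comparison for the scenario above.  The $2/month
# per-device wireless fee is an assumption for illustration; the 20 sensors
# and $100-per-device figures come from the text.

SENSORS = 20
DEVICE_COST = 100          # dollars per directly connected device
MONTHLY_FEE = 2            # assumed "reduced" wireless fee per device per month
YEARS = 5

direct_lte_cost = SENSORS * (DEVICE_COST + MONTHLY_FEE * 12 * YEARS)
local_tech_cost = 0        # in-building sensors are "incrementally free" to connect

print(f"Direct-LTE model over {YEARS} years: ${direct_lte_cost:,}")   # $4,400
print(f"Local-technology model:              ${local_tech_cost:,}")
```

Even with a generously low fee assumption, the directly connected model is thousands of dollars more expensive per home than what buyers already have.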

As it was presented, ThingSpace isn’t IoT, it’s the “Internet of New Things.”  Yes, Verizon might be thinking of shifting their focus more broadly, but if that’s the case why trash the technology options that are now in place and would eventually have to be accommodated to create acceptable early costs for buyers?  If they come around after a year or two, having exposed both their strategy and the fatal cost limitation to its adoption, will all the companies who now sell home and industrial control have sat on their hands and done nothing?  Not likely, nor is it likely that OTT companies won’t have seen the value of being a truly universal IoT Core Abstraction.

Maybe this is an important point even beyond IoT.  Here we are with revolutionary SDN and NFV, and operators are fixated on using them to do the same tired crap we can already do with legacy elements.  How many of our technology opportunities are fouled by a greedy fixation on the current business model?  Operators, with IoT, have an opportunity to look at selling something above their usual connectivity bit-pushing.  This same kind of revolutionary opportunity was presented with the Internet and operators booted it to the OTTs.  Will they do that with IoT now?  I think Verizon is perilously close to committing to that very thing.

Are Federation Requirements Converging SDN and NFV?

Federation is an important topic in both SDN and NFV for a number of reasons, but it’s not clear that the standards processes for either SDN or NFV are really moving to define a reasonable strategy.  The issues of federation will eventually have to be faced—there are some indications that vendors are jumping out to address them even pre-standard—so it’s a good idea to look at the issues and also to try to outline an effective solution.  Particularly since the best approach unites SDN and NFV in a way that’s not been discussed much.

“Federation” is a term that’s been applied loosely to service relationships among operators that allow for services to be created and deployed across administrative and technical boundaries.  Where there’s a standard network-to-network interface that serves to connect services seamlessly, it’s not much of an issue (the Internet, based on IP, has BGP for this).  Where there isn’t, you need some structure to determine how to create the link.

All federations expose operators who sell services and enterprises/consumers who buy them to a set of common issues.  First, you have to be able to connect the service across the boundaries of the federation.  That includes not only passing packets but also harmonizing any SLAs offered and dealing with any required financial settlement.  Second, you have to be able to manage the service end-to-end and address problems/remediation wherever they occur, without compromising the independence of the federated partner domains.

Most federation today is based on the presumption of a series of interconnected service networks that offer comparable capabilities based on comparable technologies, and whose technical standards include one for network-to-network interconnect.  The question that both SDN and NFV raise is how the concept would work when networks were virtual, meaning that they didn’t have a fixed capability set or service-layer standards.

The easiest thing to do is assume that you’re going to use virtual technology to duplicate current service-layer protocols like Ethernet or IP.  If this is the case, then you can federate your offerings at the first, or connection, level.  To make this approach work at all, you need to build virtual networks that exactly mimic L2/L3 service networks, including the NNIs.  If you presume that SLAs and financial issues are related to the “service layer” then you can assume that duplicating service-level capabilities with virtual infrastructure would address all of the connection issues.

Not so the management, and this is the fact that opens the rest of the discussion on federation.

If you presume service-layer federation, then you are effectively drawing a service as a set of connected domain-specific sub-services of the same type.  The constraints on protocols supported and features offered can be used to justify a position that these sub-services are intent models and so is the retail offering.  In past blogs I’ve asserted that a network service abstraction expressed as an intent model has a functionality, endpoints, and an SLA.  That would be true of our sub-services, and if we say that a sub-service implemented using SDN or NFV meets the functional requirements and SLA and serves the endpoints, then it realizes the intent model as well as legacy devices could.
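
To make the intent-model notion concrete, here’s a small Python sketch of a sub-service described only by its functionality, its endpoints, and its SLA.  The field names are mine and purely illustrative; the point is what’s deliberately missing—any reference to how the service is implemented.

```python
# A sketch of the intent-model idea from the paragraph above: a sub-service is
# described by what it does, where it attaches, and the SLA it commits to,
# not by how it is implemented.  Field names are illustrative.

from dataclasses import dataclass
from typing import List


@dataclass
class SLA:
    availability: float        # e.g. 0.9999
    latency_ms: float
    bandwidth_mbps: float


@dataclass
class IntentModel:
    functionality: str         # e.g. "IP Domain"
    endpoints: List[str]       # UNIs/NNIs the model exposes
    sla: SLA
    # Implementation (legacy devices, SDN, or NFV) is deliberately absent:
    # anything that meets the functionality, endpoints, and SLA realizes it.


retail_vpn = IntentModel(
    functionality="L3 VPN",
    endpoints=["site-A-UNI", "site-B-UNI", "partner-NNI"],
    sla=SLA(availability=0.999, latency_ms=40, bandwidth_mbps=100),
)
```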

The first issue this raises is one of out-of-the-legacy-box service functionality.  Where we have accepted boundary interfaces and features and SLAs, we can map SDN/NFV to service models inherited from the legacy world.  Where we don’t, meaning where we elect to extend today’s L2/L3 connection services, the question that arises is whether constraining SDN/NFV to legacy service models is a suitable strategy given the goal of revenue augmentation.  If such constraint isn’t optimal, then what has to happen for federation to work?

To start with, it should be clear that the SLA properties of intent models might be enforced by something like policy control inside the model, but structuring and federating a management view based on them is probably forever outside the scope of SDN.  NFV, on the other hand, pretty much has to create derived management views just to present a virtual-function implementation of something in a way equivalent to a native-device implementation.  My conclusion here is that NFV is a natural partner to SDN in management and federation, and that we shouldn’t bother to try to augment SDN to address management federation.

What about connection federation?  The issue here, if we look at our sub-services as intent models, relates to the question of what the “functionality” of an intent model is, and how it’s expressed.  If we were to see an IP sub-service, for example, we could define the function as “IP Domain” and offer two broad classes of interfaces—UNI and NNI.  If we presumed that there was a general model for OpenFlow forwarding, we could define a function as “OpenFlow Domain”, but our “Ports” would now be of one class and we’d have to address the question of how forwarding decisions could be made across multiple OpenFlow Domains.
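
The contrast between the two “functionality” classes is easier to see side by side.  This is a hypothetical sketch of my own, not something drawn from the ONF or any standard; the interface names are invented for illustration.

```python
# A hypothetical contrast of the two "functionality" classes discussed above.
# The interface names are illustrative, not drawn from any standard.

ip_domain = {
    "functionality": "IP Domain",
    "ports": {
        "UNI": ["cust-1", "cust-2"],          # user-network interfaces
        "NNI": ["peer-operator-east"],        # network-to-network interfaces
    },
    # Forwarding across domains is settled by an accepted protocol (e.g. BGP).
    "inter_domain_forwarding": "BGP",
}

openflow_domain = {
    "functionality": "OpenFlow Domain",
    "ports": {
        "Port": ["p1", "p2", "p3"],           # one undifferentiated port class
    },
    # There is no accepted mechanism here: how forwarding decisions are shared
    # across multiple OpenFlow Domains is exactly the open question.
    "inter_domain_forwarding": None,
}
```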

The ONF, in OpenFlow, has taken a protocol-and-interface approach to things, which is inherently more limited than what I’ll call a “software-process” approach—the approach NFV explicitly supports.  It’s hard to see how you could define a simple protocol/interface to accommodate the sharing of forwarding-decision-support data for a generalized service, but easy to see how a software process could offer that.  Thus, it would seem that connection federation would be more easily supported by NFV, which could then drive SDN.

This creates a model of NGN where NFV sits on top to provide agile operations and management, and then drives a series of lower-layer intent-modeled NaaS abstractions realized using whatever technology is optimal.  The decisions on forwarding in this lower layer could be made through the SDN Controller, and higher-layer multi-domain federation would then be managed by NFV.  This model seems appropriate if we assume that selection of NaaS domains would take place based on something other than traditional topology and traffic metrics, because NFV has to be able to make that kind of selection already.

The model wouldn’t foreclose using inter-SDN-controller links to exchange information among control domains.  If this were supported, then a lower-layer NaaS abstraction could actually be made up of multiple SDN-implemented domains.  The problem with this is that if metrics other than forwarding efficiency were needed to select implementations for a given NaaS, SDN isn’t equipped for it and it makes no sense in my view to augment SDN to creep into space NFV already occupies.

All of this suggests to me that carrier SDN might well depend on NFV for deployment, and that the relationship between SDN and NFV might be more complicated than either the SDN or NFV communities are currently envisioning it to be.  I’ve always believed that many, perhaps most, of the NFV benefits were tied to a more organized NaaS vision than either SDN or NFV currently presents, which would mean that evolving both SDN and NFV toward such a vision could be a prerequisite for the success of both.

Aligning NFV Business Cases with Reality

Before I took off on my vacation (just completed), I asked a bunch of my CFO contacts to review a document I prepared that outlined potential sources of NFV (or SDN) benefits.  They came back with some suggested changes and additions, and the result was a document with about 20 categories.  I’d also outlined some comments on the methodology for developing a business case using these benefits, and they had some views on that as well.

The first point the operators made was that of my 20 categories of savings, none could be considered valid except within a specific context of operations/management that could be costed out versus current practices.  Even capex reduction and all classes of revenue augmentation demand operations cost validation, because operators are interested in infrastructure TCO and not just capital cost and because revenue augmentation only happens if you generate net revenue after costs.

The reason my contacts thought this was the most critical point is that they tell me there are well over 80 NFV-related trials underway (that they are involved in or aware of) and that slightly less than ten percent of those have adequate engagement of operations/management and CIO-level people.  Of those, the CFOs think that perhaps half actually explore the operations practices sufficiently to create a cost model.

The second point I found interesting was that operators said operations cost reduction was the most credible and essential benefit.  Based on my first comments here, you can see that the CFO organizations don’t think there’s much credible work on opex being done at present, but they had an additional point I found quite interesting.  They said that they had yet to have a vendor present them with a document that outlined their current cost sources and quantified each, even for a category of operator.

At the high level, the financial industry tells us that operators spend about 20 cents of each revenue dollar on capex, return about 20 cents as gross profit, and spend about 60 cents on operations and administration—expenses.  Some vendors take this whole pie and say they can reduce it, or so the CFOs report.  Well, one example the CFOs give is roaming charges for mobile operators, which is their largest single cost category.  How does NFV reduce that?
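
Here’s why taking the whole pie is misleading, in simple arithmetic.  The 20/20/60 revenue-dollar split comes from the paragraph above; the share of opex that NFV could actually influence and the size of the reduction are placeholders I’ve assumed purely for illustration.

```python
# Illustrative arithmetic only.  The 20/20/60 revenue-dollar split comes from
# the text above; the share of opex that is NFV-addressable "process opex"
# and the assumed reduction are hypothetical placeholders.

CAPEX = 0.20
GROSS_PROFIT = 0.20
OPEX = 0.60                       # total operations and administration per revenue dollar

PROCESS_OPEX_SHARE = 0.30         # assumption: 30% of opex is NFV-influenced
NFV_REDUCTION = 0.20              # assumption: NFV trims 20% of that subset

savings = OPEX * PROCESS_OPEX_SHARE * NFV_REDUCTION
print(f"Savings per revenue dollar: {savings:.3f}")          # about 3.6 cents
print(f"As a share of total opex:   {savings / OPEX:.1%}")   # 6% of opex, not 60%
```

Even with fairly generous assumptions, the addressable savings are a few cents per revenue dollar, not the whole 60-cent opex pie—which is exactly why the CFOs want the cost categories broken out.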

The CFOs say that there’s a specific subset of “opex” that I’ll call “process opex” which relates to the operations processes that could be influenced directly and indirectly by NFV.  They put six cost categories in this group.  How many NFV business cases had they been presented that outlined these six credible areas?  Zero.

One reason for the shortfall in useful opex data is that when you break down opex you’re forced to ask how your strategy would actually change it.  Here’s an example.  What’s the largest component of process opex?  Customer acquisition and retention.  Imagine yourself as the product manager of some NFV strategy, asked to tell the world how much your product will reduce marketing costs and churn, or how it will help operators eliminate incentive programs.

Well, OK, you can see the issue there and perhaps you think the answer is to drop that category, which is OK as long as you want to kiss off a major benefit source.  What about the rest of the process?  CFOs point out that at least initial NFV deployment would increase costs in IT operations (because it requires deployment of data centers, which have to be run).  It would also likely increase the cost of service operations where VNF-based services had more distributed components to manage than discrete box strategies.  Offsetting this is the improvement that might be made in service operations through automation.

How much is that?  Most vendors who tout NFV don’t have a management/operations strategy (five, perhaps, do).  Even for those who do have an approach, would the conversion of an NFV lab trial to a field trial realize the savings, or prove them?  In order for operations to get more efficient, you have to automate all of it, not just the NFV pieces of it.  Otherwise your best outcome is to present the same costs as you had before, meaning no opex benefit.

On the revenue side, things aren’t much better according to my CFO sources.  Service revenue gains, as I said, have to be net of cost to be meaningful.  We can’t easily determine the operations costs of NFV for hypothetical new services because the focus has been on individual trials and PoCs and not on a broad new operations model.  Every new service might thus have a new impact on operations and demand new features.  How do you get them?

Then there’s the issue of what exactly the service is.  Vendors, say the operators, are presenting them two “new services” in almost every presentation.  One is improved time to revenue achieved through improved deployment times.  The other is “on-demand” services—firewall-for-a-day, dial up your bit rate, etc.  Are these justified?

Time-to-revenue improvements are certainly achievable if you are talking about a new service or a new site within it.  Otherwise your customer is already provisioned, and what you’re really selling is firewall-as-a-service.  Is that credible?  Sure, but most operators say their users will buy as-a-service features when they connect a site and then stick with those features.  How much revenue can really be created this way depends on how many suitable feature-enabling changes are made, and how many new prospects can be sold.  Those qualifications don’t seem to be working their way into business cases.

Elastic bandwidth is nothing new; we’ve talked about it for ages in fact.  Operators have long believed that if customers were offered a mixture of static long-term services and the ability to dial up capacity at time of need, there would indeed be a revenue gain from exercising the latter.  There’d also be a revenue loss for traditional leased services because all customers would game the pricing to get the lowest total cost.  Thus, operators say, they’re likely to lose money in the net.
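
A toy model shows how the gaming works.  Every price and usage figure here is made up for illustration; the only point is the shape of the outcome the operators describe.

```python
# A toy model of the "gaming" concern above, with made-up prices: customers
# drop from a static high-capacity service to a cheaper base tier and dial up
# capacity only on the days they actually need it.

STATIC_MONTHLY = 1000          # assumed price of today's fixed 100 Mbps service
BASE_MONTHLY = 600             # assumed price of a 50 Mbps base tier
BURST_DAY_RATE = 30            # assumed price per day of dialing up to 100 Mbps
BURST_DAYS_PER_MONTH = 6       # assumed actual days of peak need

old_revenue = STATIC_MONTHLY
new_revenue = BASE_MONTHLY + BURST_DAY_RATE * BURST_DAYS_PER_MONTH

print(f"Before elastic pricing:   ${old_revenue}/month")
print(f"After customers optimize: ${new_revenue}/month")   # $780: a net loss
```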

At this point you probably think that the CFOs believe NFV is never going to prove out at a significant level, but that’s not the case.  Nearly every CFO thinks NFV will succeed eventually.  On the average, CFOs think that by 2018, SDN and NFV will have impacted about 20% of all network infrastructure investment.  That number is quite consistent with my own modeling of SDN/NFV opportunity.

We can do better than this.  Light Reading has published interviews with operators who said quite openly that the industry’s hype was hurting the business case, and they’re right.  That business case can be made, but it’s not easy to do and it requires broadening the presumed scope of NFV and SDN deployment from diddling at individual projects or services to building toward a systemic shift in infrastructure spending and management.  Hundreds of billions of dollars are at stake.  We could have proved out a strategy by now, and all we’ve proved is that there’s no easy way to get to one.

Well, maybe it’s time to try the hard, right, way.

Let’s Stop Thinking Small About Network Virtualization

Somebody told me last week that network virtualization was well underway, which surprised me because I don’t think it’s really even begun.  The difference in view lies in two factors—is “acceptance” equivalent to deployment, and is a different way of doing the same thing the same as doing something different?

The issue we have with network virtualization, in which category I’ll put both SDN and NFV, is much the same as we have with the cloud.  We presume that a totally different way of doing things can be harnessed only to do what we’ve been able to do all along—presumably cheaper.  If all the cloud can offer is a lower cost because of economies of scale, then most enterprises will get enough scale through simple virtualization not to need public or private cloud services at all.  The cloud will succeed because we’ll build new application architectures to exploit its properties.  Network virtualization will be the same.

Traditional network services created through cooperative service-specific infrastructures impose a single set of connection rules as the price of sharing the cost among users.  Virtualization, at the network level, should allow us to define service connection rules on a per-user and per-service basis without interfering with the cost sharing.  There are two elements to service connection rules—the address space or membership element and the forwarding rules.  With today’s networks we have default connectivity and we add on connection controls and policies.  We have a very one-dimensional vision of forwarding packets—everything keys on an arbitrary address.

Virtual networking should relax these constraints because it should allow us to impose any convenient addressing or connection model on a common infrastructure framework.  That means that networks would work according to the utility drivers associated with each user and application; no compromises just to be part of the team or to secure economies of scale.

One of the most important prerequisites for this is breaking down the one-user-one-network rule.  We tend to think of networks today in a static and exclusive membership sense.  I have an address on a network, associated with a network service access point.  Send something to it and I get that something.  We already know from Amazon and Google’s experience in the cloud that you need to change that simple approach.  In virtual networking, a user is a vertical stack of NSAP/addresses, one for each connection network they’re a member of.  Google represents this well in their Andromeda documents, but Andromeda is still all about IP and there’s no reason to presume that NSAPs all have the same protocol, or any of the protocols that are in use today.
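
Here’s a minimal sketch of that vertical stack—one user, one NSAP per connection network, with no assumption that every network speaks the same protocol.  The names and protocols are invented for illustration; this isn’t Andromeda or any vendor’s design.

```python
# A sketch of the "vertical stack of NSAP/addresses" idea above: one user, one
# NSAP per connection network, and no assumption that every network speaks the
# same protocol.  Names and protocols are illustrative.

from dataclasses import dataclass
from typing import List


@dataclass
class NSAP:
    network: str       # which connection network this membership belongs to
    protocol: str      # "IPv4", "IPv6", or something else entirely
    address: str


@dataclass
class UserStack:
    user: str
    memberships: List[NSAP]

    def address_on(self, network: str) -> str:
        return next(n.address for n in self.memberships if n.network == network)


alice = UserStack("alice", [
    NSAP("corporate-vpn", "IPv4", "10.1.4.22"),
    NSAP("payroll-app-net", "IPv6", "fd00::22"),
    NSAP("building-sensors", "custom-tag", "zone7/dev41"),
])

print(alice.address_on("payroll-app-net"))
```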

Multi-networkism (to coin a phrase) like this is critical if elastic networking is to be useful because we have to presume that the intersection of user and application/need will be multifaceted.  You need to be able to be a member of different networks if you want networking to be different.

The next step is getting traffic to users.  Forwarding rules define how a packet is examined by nodes to determine how to route it onward.  They associate an address and handling instructions, so they are linked to the address/membership side of the picture by the address concept.  The address is your “name” in a connection network.  The forwarding rules define how the address is interpreted to guide handling and delivery.

OpenFlow’s real advance (which sadly isn’t completely realized for reasons we’ll get to) is that it defines a more elastic model of describing packet-handling by nodes.  Ideally what you’d like to have is a kind of mask-and-match or template structure that lets you pick what an “address” is from the range of stuff you’re presented with in the packet header.  Ideally, you’d also like to be able to transform the stuff you find, even to the extent of doing some high-speed local lookup and using the result.  The architecture might not work for all applications, but we should not constrain virtualization at the network level by the limits of current technology.  We have to accommodate those limits, but not perpetually.
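
Here’s the mask-and-match idea in miniature.  The rule format is invented—it’s inspired by the general match-action style of flow forwarding, not the actual OpenFlow specification—and the fields and ports are placeholders.

```python
# A sketch of the mask-and-match idea above: a forwarding rule picks which
# header bits constitute the "address" and maps a match to a handling action.
# The rule format and field names are invented for illustration.

from typing import Dict, List, Optional

# A rule is (field, mask, value, action): match if (header[field] & mask) == value.
Rule = tuple

rules: List[Rule] = [
    ("vlan", 0x0FFF, 0x0064, "forward:port3"),            # VLAN 100 -> port 3
    ("dst_ip", 0xFFFFFF00, 0xC0A80100, "forward:port7"),  # 192.168.1.0/24 -> port 7
]


def handle(packet: Dict[str, int]) -> Optional[str]:
    for field, mask, value, action in rules:
        if field in packet and (packet[field] & mask) == value:
            return action
    return None    # no match: drop, or punt to a controller


print(handle({"vlan": 100, "dst_ip": 0xC0A80A05}))   # forward:port3
print(handle({"dst_ip": 0xC0A80105}))                # forward:port7
```

The “address” is simply whatever the template says it is, which is the elasticity the paragraph above is asking for.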

An example of transformation-driven handling is the optical routing issue.  Optical is really a special case of non-packet traffic segmentation; TDM is another.  The point is that if there is any characteristic that separates traffic flows (and there’d better be or routing is kind of moot), we should be able to abstract that and then to convert that abstraction back to the form needed for the next hop.  A flow that’s incoming on Lambda A might be outgoing as a TDM slot; as long as we know the association we should be able to describe the handling.

Forwarding rules also unite the vertical stack of NSAP/addresses and the user who represents the logical top of that stack.  Every virtual network in the stack associates that user with an NSAP and the rules needed to get packets to and from the user.  How exactly that would work, and how complicated it would be, depends on how homogeneous you think the networks are.

If we presume (as is the case in cloud computing today) that the virtual networks are all IP networks, then what we have is multi-addressed users.  The presumption is that every virtual network has its own address space and that a packet sent by the user is linked to a virtual network by the address the user presents or by the destination address.  When a packet is received, it’s sent to the user and it can be presumed that the virtual-network affiliation of the origin doesn’t really matter.  This is consistent with the one-address-space Internet IP model.

This is the cloud-inspired virtual network model today, the one Amazon and Google have deployed.  This model offers considerable advantages for application-specific VPNs in the future.  Imagine as-a-service apps presented with their own address space, connected outward via VPN into the virtual-network stacks of users.  Access to an application now depends on having a virtual-network-forwarding connection from that app’s NSAP to your vertical “stack”.

If we have different network memberships with different protocols in each, then network software in the user’s space would have to provide a means of separating the traffic.  You could assign multiple logical software ports, put a network ID in the packet, or use any other mechanism handy.  This shows that for virtual networking to reach its full potential we’ll need to examine how software accesses network connections.  Otherwise usage practices of the present will tie down our options for the future.
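
One of those separation mechanisms—a network ID carried with each packet—might look something like this.  The handler names and the packet format are invented; the sketch only shows the demultiplexing idea.

```python
# A sketch of one separation mechanism mentioned above: a network ID carried
# with each packet lets user-space software demultiplex traffic from several
# virtual networks onto per-network handlers.  All names are invented.

from typing import Callable, Dict

handlers: Dict[str, Callable[[bytes], None]] = {
    "corporate-vpn": lambda payload: print("VPN app got", payload),
    "payroll-app-net": lambda payload: print("payroll client got", payload),
}


def deliver(network_id: str, payload: bytes) -> None:
    handler = handlers.get(network_id)
    if handler is None:
        print("no membership in", network_id, "- discarding")
        return
    handler(payload)


deliver("payroll-app-net", b"\x01\x02")
deliver("guest-wifi", b"\x03")
```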

I’m not saying that this is the only approach to virtual networking that realizes its potential; obviously the benefit of virtual networking lies in large part in its agility to present many connection and forwarding models.  I do think this approach represents an advance from today, one that’s already being used by cloud giants, and so it’s the kind of thing that could start discussions that might break many out of excessive “IP-and-Ethernet-think”.  We need that to advance networking as far as it can be taken.

IBM, Juniper, and Jumping at the Right Time

You all probably know how I love blog topics that present a contrast, and today we have a nice opportunity for one with the quarterly results from IBM and Juniper.  The former disappointed the Street and the latter made them happy.  The former has touted transition and the latter seems to be staying the course.  What might be unifying them is partly a timing issue; Juniper might be staying too long and IBM may have jumped too soon.

I’ve admired IBM longer than any other tech company, in part of course because they’ve been around longer.  IBM has in the past weathered some of the most dramatic transitions in IT.  They’ve retained a high level of account control and strategic influence, but they’ve lost a bit of both in the last decade.  Their challenge has been a lack of marketing.  The IBM brand was iconic all the way through the 90s, but in the current century they’ve lost touch with the broad market.

Nobody thinks PCs are the profit engines of IT, but getting out of the PC business had the effect of taking IBM’s label off the piece of gear that most professionals carried with them every day.  It also disconnected IBM, in a product sense, from the wave of technology populism that swept us into the era of the cloud.

The cloud, or hosted IT in any form, is an automatic issue for a company like IBM who relied on controlling IT planning with sales presence.  The line organizations could either bypass the IT organization or beat them up with cloud threats, and in either case cloud marketing was having an influence on buyers that IBM’s sales people could never hope to call on.

IBM’s cloud strategy seems driven by the notion that the cloud is an alternative infrastructure for IT professionals.  They’ve discounted the populism of as-a-service delivery, and they’ve expected the IT organizations to embrace the cloud and drive it through their line departments in spite of the fact that the cloud was unfamiliar to these IT professionals.  They saw the commoditization of hardware, replaced by the demand for cloud platform and development tools.

Eventually they may well be right in their vision, but the problem is that they acted on the presumption that the shift to the cloud would occur in their way, and fast enough to offset the loss of hardware revenue.  Instead, the cloud overhung IT budgets and projects and the new stone IBM hoped to jump on as it tried to navigate the river crossing was less stable than the one it left.

If IBM is an IT incumbent who got out of IT too soon, Juniper is a networking incumbent who may have stayed too long.  It started as a router vendor, got into switching way later than it should have (and switching is now its strongest area), and under the leadership of an ex-Microsoft executive tried to make a transition into software that’s never really gelled.  They made acquisitions in the mobile, voice, and content delivery spaces and none of that paid off.  It’s hard to say whether Juniper was wedded to the past big-box glory or just couldn’t figure out how to get out of it.

Juniper has marketing issues too, but their marketing challenge is posed by their reluctance to embrace any form of networking other than big-box networking.  Under the previous CEO they couldn’t get SDN and NFV straight, and even though Juniper talked about cloud computing before any other network vendor, they never got a tight positioning on it even when they had product announcements that clearly favored cloud trends.

SDN is now perhaps Juniper’s big success; their Contrail strategy is a good way to build VPNs on top of legacy infrastructure.  I like Contrail and the approach, and yet I think it’s still a conservative view of an aggressive market opportunity.  Juniper can succeed with conservative SDN positioning, as it can succeed with big-box strategies, as long as someone else with a more aggressive take on the evolution doesn’t step up and say convincing things.  Alcatel-Lucent, with Nuage, could in my view present a much more futures-driven picture of SDN and its evolution from legacy networking, but they’re also mired a bit in the past.

So Juniper took a safe position with their big-box story, a position that could win only if everyone in the SDN and NFV space booted their opportunity.  Well, that’s what happened.  Telcos want to invest in a new infrastructure model, but there’s no such model being presented as yet.  NFV and SDN are just a bunch of disconnected projects.  That favors evolutionary approaches that focus more on the starting point than on the ultimate destination.

There’s the critical common element between our two vendors in a nutshell.  Both IBM and Juniper were impacted by the inertia of past practices.  IBM hoped for change and didn’t get it, and lost ground because they moved too quickly.  Juniper feared change, and buyers apparently feared it too, or at least feared jumping into an unknown.  Juniper gained ground by being behind.

One obvious conclusion here is that we’re stuck in a legacy IT and networking model because we can’t demonstrate significant benefits to justify transitioning to a new one.  In the past, productivity improvements have fueled major tech transitions but we don’t have that today.  Focusing on cost reduction tends to limit the ability of buyers to tolerate mass changes.

IBM needs to get its IT professional base, the people it has influence on and regular contact with, embarked on an effective campaign for internally driven application agility and worker productivity that’s centered on the cloud.  It also needs to get its marketing act in order, and provide its sales force cover for the effort needed.

It should also consider the question of whether telco cloud and NFV could be a good Greenfield opportunity to drive changes in a space where the upside is very large.  The telco universe, with its low internal rate of return, should be a natural place for cloud services to emerge and that’s not happened.  Part of the reason is that operators aren’t particularly strong marketers, but it’s also true that they’re not particularly excited about getting into another low-margin business.  NFV principles applied to cloud services could reduce operations costs and improve margins.

It’s harder to say what Juniper’s response should be, and perhaps it’s also a waste of time given that Juniper is doing OK while doing what it wants to do—for now at least.  Maybe they’ll be right and both the SDN and NFV revolutions will fizzle, creating nothing more than eddy opportunities that won’t threaten the big-box story.  But even if that’s true, Juniper will lose margins and market share as operators respond to the situation by pushing on product pricing.  With, of course, Huawei eager to respond.

Is the future unrecognized and upon us, or is it perhaps just that it’s no different from the present?  IBM bets on the future, Juniper on the past, but any bet is a risk and it may come down to execution.  I didn’t care for the Microsoft crowd at Juniper, but the new CEO (Rahim) seems to have a much better handle on things.  He may be able to jar Juniper out of past mistakes.  IBM on the other hand has past successes it needs to recapture, not by turning back but by harnessing the agility they exploited through all the previous technology earthquakes they’ve endured.  Do they have the leadership for that?  I’m not so sure.

The Dell/EMC Deal: Can It Work for NFV or Even for Dell?

Dell’s decision to acquire EMC has raised a lot of questions among fans of both companies, and there’s certainly a new competitive dynamic in play with the move.  The most dramatic aspect of the deal might turn out to be the impact it has on the cloud, SDN, and NFV positioning of the combined company.  Dell, like most industry players, has been a cautious advocate of open tools.  VMware virtually demands that Dell rethink that open posture, and in particular how it might define “success” with NFV.

NFV, if it were to be optimally deployed, has the potential to generate over 100,000 new data center installations worldwide, and to consume well over ten million new servers.  That would make NFV the largest single new application of data center technology, and make any data center equipment or software vendor who wasn’t in on the revolution into a second-rate player.

The challenge is that NFV’s business case has been difficult to make, not because there are real questions about how it could be done but because doing all that’s required is complex and expensive both at the product development and the sales level.  Since most of the largess spilled into the market by NFV success would fall into the arms of server vendors no matter how that success is generated, server vendors have to decide whether to try to push NFV as an ecosystem and develop the business case, or simply presume someone will and sit back to rake in the dough.

I was involved with Dell in the CloudNFV project, where Dell provided the hosting facilities and integration for CloudNFV.  At one point, in the fall of 2013, it looked to many network operators in the NFV ISG like Dell was going to field an NFV solution based on CloudNFV.  Dell did take over the leadership of the project when I bowed out as Chief Architect in January 2014, but nothing came of whatever hopes operators might have had for Dell’s entry as a full-scope NFV player.  Dell seems to have decided that it would sell NFV Infrastructure to support any full solution that did emerge.

That approach might be difficult to sustain now.  With VMware in house, Dell needs to find a role for VMware in NFV, which VMware itself has been working to do, in addition to making sure it gets server deals.  At the simple level, that could mean nothing more than supporting a vision of the Virtual Infrastructure Manager (VIM) element of NFV that maintains independence from implementation specifics that currently tend to tie VIMs to OpenStack.  Such a move would make VMware part of NFV Infrastructure, aligning the deal with Dell’s current position with servers.  But that might not be enough.
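
What an implementation-independent VIM might look like, in rough terms, is an abstract interface that orchestration talks to, with OpenStack or VMware hidden behind it.  This is my own hypothetical sketch, not the ETSI-defined reference point or any vendor’s API.

```python
# A sketch of what "a VIM that maintains independence from implementation
# specifics" might look like: orchestration talks to an abstract interface,
# and OpenStack or VMware sits behind it.  The interface is hypothetical.

from abc import ABC, abstractmethod


class VirtualInfrastructureManager(ABC):
    @abstractmethod
    def deploy(self, vnf_image: str, cpu: int, memory_gb: int) -> str:
        """Deploy a VNF instance and return its identifier."""

    @abstractmethod
    def destroy(self, instance_id: str) -> None:
        """Tear the instance down."""


class OpenStackVIM(VirtualInfrastructureManager):
    def deploy(self, vnf_image, cpu, memory_gb):
        # would call OpenStack compute/network services here
        return "openstack-instance-001"

    def destroy(self, instance_id):
        pass


class VMwareVIM(VirtualInfrastructureManager):
    def deploy(self, vnf_image, cpu, memory_gb):
        # would call VMware's virtualization APIs here
        return "vmware-instance-001"

    def destroy(self, instance_id):
        pass


def orchestrate(vim: VirtualInfrastructureManager) -> str:
    # The orchestrator neither knows nor cares which platform is underneath.
    return vim.deploy("vFirewall-image", cpu=2, memory_gb=4)
```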

You cannot make a business case for NFV through NFVI and VIMs alone.  You need orchestration and management, and you need support for legacy infrastructure and for operations/management process orchestration as well as deployment orchestration.  When Dell was supporting the OpenStack party line, they could presume that anyone who could do all the orchestration and management would pull through a general OpenStack solution, a solution Dell could sell to.  Many of those orchestration players have specific OpenStack commitments of their own, and Dell now has to be seen as representing another camp.  Could they then have to build or acquire a complete NFV orchestration solution?

Up until fairly recently, that probably didn’t matter much.  NFV has been more a media event than a real technology revolution.  CFOs in the operator space have been griping about the lack of a business case for a year now, but if nobody had one then everyone was happy to ply the press with sound bites.  Now, though, it’s clear that operators will start spending on NFV in 2016 and that will create some real winners, winners whose bottom line will be augmented by NFV’s transformation.  Those winners will become NFV incumbents to beat.  Dell, if they want to be among those being augmented, will have to make that business case too.  And so far they can’t.

What could change that?  The easiest approach for Dell would be M&A, given that Ciena has already embarked on an M&A-driven quest for an NFV business case.  With their acquisition of Cyan, they updated their website to push all the right buttons for a complete NFV story.  They say they’ve put an order of magnitude more engineering talent on the problem, too.  So with Cyan off the table, what’s left for Dell?

The most obvious answer would be “Overture Networks”, who was one of the other players in CloudNFV and so is known to Dell.  Overture has a complete NFV solution too; no need for a big engineering push.  But while that would be a smart buy for Dell, I think evidence says they won’t make it.  Why?  Because if they wanted Overture the smart thing would have been to grab it before they did the EMC deal.  Now there might be other contenders.

The less obvious answer is that Dell has no intention of buying anyone because they have no intention of being an NFV business case leader.  Remember, Dell had that position in its grasp, as the most credible player in what was the first and leading NFV PoC.  They could have taken CloudNFV on to commercialization and they didn’t.  So why not presume that they wanted none of that management and orchestration business case stuff?

SDN, maybe?  Remember that EMC/VMware got Nicira, the first credible SDN player.  Now, of course, SDN seems locked in an open-source duel with OpenDaylight on one side and ONOS on the other.  How many articles have you seen on whether Nicira technology might supplant either?  So SDN’s out too.

That leaves only two possibilities—Dell is doubling down on its NFVI-centric vision or it’s not even thinking about the service providers in the EMC deal—it’s about the enterprise.  Both these possible drivers have arguments in their favor.

Dell could be looking at the open-source movement among operators, embodied in the OPNFV project, and thinking that the solution to the business case problem will be created in open-source software and thus could be applied to any vendor.  There are two problems with this.  First, OPNFV is a long way from delivering anything comprehensive enough to make the business case, and frankly I’m not sure it’s ever going to get there.  Second, Dell would need to ensure that all the decisions made in architecting the software were at least compatible with an implementation using VMware.

It’s hard to tell whether Dell or VMware know what steps they’d need to take to accomplish that.  There is a movement within NFV to move to intent modeling at critical interfaces, but Dell has not led that movement or even been a particularly conspicuous supporter of it.  Neither has VMware.  Given that a lot of the structure of OPNFV is getting set in stone, it might be too late to do the critical compatibility stuff, and certainly there’s going to be plenty of time for competitors to drive their own initiatives with their own full NFV solutions.  Remember, we have at least four vendors who have enough in place.

On the other hand, VMware virtualization is well established in the data center.  The logical pathway for a VMware shop to the cloud is through VMware, whether that shop is an enterprise or an operator.  VMware has its own vCloud approach, and an NFV activity that seems primarily directed at supporting NFV applications of vCloud in the carrier space.  So Dell could have cloud evolution in mind, period, and might plan to exploit it more than drive it in both cases.

Which might not be as dumb as it sounds.  The big problem both NFV and the cloud have is their reliance on open-source, which has a specialized revenue model for vendors, to say the least.  Who pays for buyer education when the result is open to all?  Dell might realize that in the end both NFV and the cloud have to succeed by somebody selling servers to host stuff on.  If Dell can bundle servers with VMware and vCloud and actually deliver what buyers want, will they care about open source or even standards?  Yes if there’s an open/standard option on the table, but will there be?

In the end, though, Dell can probably win only if some key competitors dally at the starting gate.  HP has everything needed for the cloud, SDN and NFV.  Oracle has a lot of the stuff.  IBM has some, as does Alcatel-Lucent.  Red Hat and Intel/Wind River have powerful platform tools that could do what VMware does, and if they get a lot of good PR and are developed optimally, they could pose a challenge for Dell—do they embrace competitive software platforms to sell servers and undermine their VMware assets, or toss the opportunities these software platforms represent aside to protect their latest acquisition?

This is going to be a challenging time for Dell, for sure.

 

Is Carrier-Grade NFV Really Important?

OpenStack has been seen by most as an essential element in any NFV solution, but lately there have been questions raised on whether OpenStack can meet the grade, meaning “carrier-grade” or “five-nines”.  Light Reading did an article on this, and Stratus recently published an NFV PoC that they say proves that OpenStack VIM mechanisms are insufficient to assure carrier grade operation.  They go on to say that their PoC proves that it’s possible to add resiliency and availability management as a platform service, and that doing so would reduce the cost and difficulty associated with meeting high availability requirements.  The PoC is interesting on a number of fronts, some of which are troubling to classical NFV wisdom.

Let’s start with a bit of background.  People have generally recognized that specialized appliances used in networking could be made much more reliable/available than general-purpose servers.  That means that NFV implementations of features could be less reliable, and that could hurt NFV deployment.  Proponents of NFV have suggested this problem could be countered by engineering resiliency into the NFV configuration—parallel elements are more reliable than single elements.

The problem with the approach is twofold.  First, a network of redundant components deployed to avoid single points of failure is harder to build and more complicated, which can raise the operations costs enough to threaten the business case if you’re not careful.  Second, if you define a fault as a condition visible in the service data plane, most faults can’t be prevented with parallel component deployment because some in-flight packets will be lost.  That’s a problem often described as “state management” because new instances of a process don’t always know what the state was for the process instance they’re replacing.

I blogged early on in the NFV cycle that you could not engineer availability through redundant deployment of VNFs alone, so I can hardly disagree with the primary point.  What Stratus is saying is that if you enhance the platform that hosts VNFs you can do things like maintain state for stateful switchovers, essential in maintaining operation in the face of a fault.  I agree with that too.  Stratus’ message is that you can address the two issues better than OpenStack can by making the platform that hosts VNFs aware of configuration-based availability.

Well, I’m not sure I can buy that point, not the least because OpenStack is about deployment of VNFs, and most availability issues arise in the steady state, when OpenStack has done its work.  Yes, you can invoke it again for redeployment of VNFs, but it seems to me that the questions of NFV reliability have to be solved at a deeper level than just OpenStack, and that OpenStack may be getting the rap for a broader set of problems.

State maintenance isn’t a slam dunk issue either.  Most stateful software these days likely uses “back end” state control (Metaswitch uses this in its insightful implementation of IMS, called Project Clearwater) and you can use back-end state control without OpenStack being aware or involved, and without any other special platform tools.
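
For readers who haven’t seen the pattern, here’s a bare-bones sketch of back-end state control: the VNF instance keeps per-session state in an external store, so a replacement instance can pick up where a failed one left off.  The store and session format are invented for illustration—this is emphatically not Project Clearwater’s actual design.

```python
# A sketch of "back-end" state control: the VNF keeps its per-session state in
# an external store, so a replacement instance can resume where a failed one
# left off.  The store and session format are invented for illustration.

from typing import Dict

# Stands in for a replicated back-end store (a database, a key-value cluster, etc.).
backend_store: Dict[str, dict] = {}


class StatelessVNFInstance:
    """The instance itself holds no session state worth losing."""

    def __init__(self, name: str):
        self.name = name

    def handle(self, session_id: str, event: str) -> None:
        state = backend_store.get(session_id, {"events": 0})   # fetch
        state["events"] += 1                                    # update
        state["last_event"] = event
        backend_store[session_id] = state                       # write back
        print(f"{self.name} handled {event}; session has {state['events']} events")


primary = StatelessVNFInstance("instance-A")
primary.handle("session-42", "REGISTER")

# instance-A fails; a freshly deployed replacement resumes the same session
# because the state lives in the back end, not in the failed process.
replacement = StatelessVNFInstance("instance-B")
replacement.handle("session-42", "INVITE")     # session count continues at 2
```

Note that nothing in this pattern requires OpenStack to be aware of it, which is exactly the point.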

Worse, I don’t think that even state-aware platforms are going to be a suitable replacement for high-availability gear in all cases.  You can’t make router state universal across instances without duplicating the data stream, which is hardly a strategy to build an NFV business case with.  But of course routers recover from the failure of devices or trunks, and so we may not need routers to be fully paralleled in configuration-based availability management.  Which raises the question of whether “failures” that are routine in IP or Ethernet networks have to be afforded special handling just because they’re implemented with VNFs.

The final point is that we still have to consider whether five-nines is actually a necessary feature.  Availability is a feature, and like any other feature you have to trade it against cost to decide if it’s got any buyer utility.  The old days of the PSTN versus the new world of mobile services is a good example; people are happy to pay less for cellular services even if they’re not nearly as high-quality as wireline voice used to be.

Two situations argue for high availability for VNFs.  One is multi-tenancy, meaning VNFs that deploy not just for a single customer but for a large number.  The other is interior network features like “virtual core routing” that might be associated with a large-scale network virtualization application.  The mainstream VNF stuff, which all falls into the category of “service chaining”, is much more problematic as a high-availability app.  Since Stratus is citing the benefits of their availability-platform approach to VNF providers, the credibility of that space is important, so we’ll deal with the classic VNF applications of service chaining first.

Yes, it is true that if you could make state control a feature of a platform rather than something that VNFs have to control on their own, VNF vendors would have an easier time.  As a software architect (fallen, to be sure, to the dark side of consulting!) I have a problem believing that you can control distributed state of multiple availability-managed components without knowing just what each component thinks is its own state.  There are plenty of variables in a program; which are state-critical?

Even more fundamentally speaking, I doubt that service-chained VNFs, the focus of most VNF providers, really need carrier-grade availability.  These features have historically been provided by CPE on the customer premises, after all.  It’s also true that most of the new services that operators feel they are missing out on, services that OTTs are winning, have far less than five-nines reliability.  Why should the operators have to meet a different standard, likely at a higher cost?

Multi-tenant features like IMS or core routing would make sense as high-availability services, but again I wonder whether we should be listening to the voice of perhaps the most experienced VNF provider of all—Metaswitch.  They built in resiliency at the VNF level, and that means others could do the same.  Given the limitations of having a platform anticipate the state-management and other needs of a high-availability application, letting VNFs do their own thing makes the most sense.

I think platformized NFV is not only a good idea, it’s inevitable.  There will surely be a set of services made available to VNFs and VNF developers, and while it would be nice if we had a standard to describe this sort of thing, there is zero chance of getting anything useful in place in time to influence market development.  I’d like to see people argue for platform features for VNFs and not against OpenStack or any other platform element.  That means describing what they propose to offer under realistic conditions, and I don’t think Stratus has yet done that.

I also think that we’re ignoring the big question here, which is the complexity/cost question.  We’re acting like NFV deployment is some sort of divine mandate rather than something that has to be justified.  We propose features and capabilities that add both direct cost and complexity-induced costs to an NFV deployment equation that we know isn’t all that favorably balanced at best.  We can make VNFs do anything, including a lot of stuff they should not be doing.