Is Virtualization Reality Even More Elusive than Virtual Reality?

Software, in defining networks, shouldn’t be expected to cull through a lot of details on network operation.  But yes it should.  SDN will be the agent of making applications a part of the network itself.  No, that’s NFV’s role, or maybe it’s nobody’s role because it’s not even a goal.  If you listen to the views of the future of networking, you get…views in a plural sense.  We have no consensus on a topic that’s critical to the way we build applications, networks, data centers, maybe even client devices.  We have no real idea what the demands of the network of the future might be, or what applications will demand of it.

I’ve come to believe that the issues we face in networking are really emerging out of the collision of two notions: elastic relationships between application components and their resources, and the notion that functionality once resident in special-purpose devices could be exported to run on servers somewhere.  Either of these things changes the rules; both change everything.

When we build networks from discrete devices, we connect them through well-known interfaces, much the way a kid builds castles with interlocking blocks.  Each box represents a unit of functionality and cooperation is induced across those standard connections.  Give the same kid a big heap of cement and you’re not likely to get as structured a castle.  The degrees of freedom created by the new tools overwhelm the builder’s natural abilities to organize them in a useful way.

When we pull functionality out of devices, we don’t eliminate the need to organize that functionality into cooperating systems we’d call a “service”.  In fact, we make that job harder.  First, there was a lot of intra-device flow that was invisible inside the package and now is not only visible but has to be connected somehow.  Worse, these connections are different from those well-known interfaces because they represent functional exchanges that have no accepted standards to support them.  And they should never be exposed to the user at all, lest the user build a bad image of a cow instead of a good castle by sticking the stuff together in the wrong way.  Virtualizing network functions requires that we organize and orchestrate functionality at a deeper level than ever, and still operationalize the result at least as efficiently as we’ve always done with devices.  Any operator or enterprise network planner knows that operationalization complexity and cost tend to rise with the square of the number of components, and with virtualized functions we explode the number of components; split one device into three hosted functions and the operational interactions you have to manage go up roughly ninefold.

And just when you’ve reconciled yourself to the fact that this may well suck, you face the second issue, which is that the cloud notion says that these components are not only divorced from those nice self-organizing devices, they’re scattered about in the cloud in a totally unpredictable way.  I might have a gadget with three functional components, and that creates three network applications to run.  Where?  Anywhere.  So now these things have to find one another at run time, and they have to be sited with some notion of optimizing traffic flow and availability when we deploy them.  We have to recognize a problem with one of these things as being a problem with the collective functionality we unloaded from the device, even though the three components are totally separated and may not even know about one another in any true sense.  Just because I pump out data downstream doesn’t mean I know if anyone is home, functionally, out there.

Over the years, as network equipment evolved from being service-specific to being “converged”, we developed a set of practices, protocols, and tools to deal with the new multiplicity of mission.  We began to gradually view service management as something more complicated than aggregated network or device management.  We began to recognize that providing somebody a management view into a virtual network was different than such a view into a real one.  We’re now confronting a similar problem but at a much larger scale, and with a timeline to address it that’s compressed by market interest, market hype, and vendor competition.

That’s the bad news.  The good news is that I believe that the pieces of the holistic vision of cloud, SDN, and NFV that we need to have today are all available out there.  We don’t have a pile of disorderly network cement, but rather a pile of Legos mixed with Lincoln Logs, Tinkertoy parts, and more.  We can build what we need if we organize what we have, and that seems to be the problem of the day.  The first step to solving it is to start at the top to define what applications need from networks, to frame the overall goals of the services themselves.  We’ve kind of done that with the cloud, which may be responsible for why it’s advanced fairly fast.  We’ve not done it with SDN (OpenFlow is hardly the top of the food chain) and we’re not doing it with NFV, which focuses on decomposing devices and not composing services.  We’re groping the classical elephant here, and while we may find trees and snakes and boulders in our exploration we’d better not try to pick fruits or milk something for venom or quarry some stones for our garden.  Holistic visions matter, sometimes.  This is one of them.

Taking the On-Ramp to the Virtual Age of Networking

Light Reading made an interesting point yesterday in commenting on this year’s Interop show.  I’ve been to Interop in the past, and it’s always been the bastion of Big Network Iron.  Now it may be about to show its softer side, if you believe the advance comments on keynotes and vendor announcements.  Trade shows drive hype more than they drive the industry, but a virtual change in Interop could presage something big if it really develops.  But could it be that virtual players will threaten real ones?  That’s not so certain.

As I noted in a blog I did on Metaswitch’s virtual IMS (an interview with LR on that development was also on their site), virtualization as a mechanism for radically reducing the unit cost of functionality is still a hope more than a predictable outcome.  There are many examples of purpose-built gear for residential and light business use where hardware prices of fifty bucks can be sustained.  It’s hard to imagine hosted functionality of the same level being priced much less, and it’s completely unknown whether such a framework can be operationalized.  But while we can’t expect that just transplanting network functions into the IT world will save a zillion bucks, we can presume it would do some interesting things, and these might have the effect of empowering some players and disemboweling others.

The functions that would be easiest to switch to hosted form would be the higher-layer functions involving things like security, VPNs, or application performance management.  Many vendors, including Cisco and Juniper, hoped to pull these functions into smarter edge devices to increase their value and differentiability.  If they’re hosted on servers instead, the result isn’t a reduction in the number of edge routers sold, but a reduction in the price per router.  We might see a “unit of edge functionality” that would have cost ten grand in the vendors’ smart-edge model costing nine in the new model, but two of that nine might then go to hosting and software, which would reduce vendor sales by 30% unless they got the hosting/software deal.

More complex higher-layer functionality like content delivery (CDN) and the components of mobile infrastructure (IMS, EPC) could create even more impact.  Both CDN and IMS/EPC are sold in package form today, and there are open-source and cloud-based commercial versions of all of the stuff already.  A major thrust toward open-source IMS, driven in my view more by the potential efficiencies of a properly designed/componentized software framework than by the fact that the software is “free”, would undermine Alcatel-Lucent, Ericsson, and NSN who rely a lot on IMS/EPC supremacy to pull through their overall mobile/metro solutions.

In the data center, I don’t think that SDN or virtualization is as much a threat as an opportunity.  No matter how many vSwitches you deploy, you don’t switch squat unless you have something real underneath that can carry traffic.  From the first, Nicira’s papers on the topic always made it clear that you might actually need to oversupply your data center network with bandwidth since virtual overlay traffic can’t be managed on a per-switch, per-trunk basis for optimized flows.  But what you can do in the data center is create a whole new model of how applications are deployed, a model that would transform our notion of security, application performance management, load balancing, and more stuff.  Each of these transformations is an opportunity for a far-seeing vendor, legacy or emerging.  If these transformations could be pulled out of the data center and extended to the branch, they would transform both carrier Ethernet networks and enterprise networks.  HP and Dell would be potentially big winners, or losers, in this changing of the guard.

Then there’s metro.  Metro will be the beneficiary of more incremental dollars than anywhere else, and more dollars means more changes could be made to create a transformation or revolution.  I pointed out before that we could add tens of thousands of new data centers to host network features.  These are green fields, open to any new architecture, and that’s more opportunity than all the enterprises on the planet will generate.  Metro is where all this virtualization will come home, literally, because whatever we know about services, one thing that’s clear is that you have to couple features to them at the customer edge where it’s easy—not in the core where it’s hard.

The key point in all of this, though, is the “R” in the “ROI”.  As I said, simple cost management, even improvements in operations costs, isn’t going to revolutionize the network because the uptake in a modernization cycle is too slow.  You need some prodigious benefits, which means significant revenue, to create enough return to justify a major transformation.  IaaS isn’t going to cut it.  SDN isn’t going to cut it.  This is an applications-and-services revolution, and it gets funded only if we figure out how to transform the services of the service providers and the productivity of the enterprises.  So it’s not really Interop we need to be looking at to read networking’s tea leaves, it’s some yet-to-be-launched show that covers what Apple and Google might talk about, but in carrier and enterprise terms.  Services, in the end, are what we perceive to be in our hand, and what we can do with those things.

A Tale of APIs, Executives, Teeth, and Bridges

Today we have the network sequitur and the network non sequitur.  In the former category I place the Alcatel-Lucent comments on the demise (or surrender) of their API efforts, and in the latter I place the Cisco guy leaving for Big Switch.  And amid all of this, I see tooth fairies and trolls.

Some people believe in SDN.  Some believe in NFV.  Some are holding out for the tooth fairy or trolls or ghosts.  For those with one of the first two belief sets, or others relating to improved network functionality, it’s hard to escape the fundamental truth that if software is going to control something, it’s going to do the controlling through APIs.  Further, it’s inescapable that if you start with those high-level APIs that would allow software to influence the network’s behavior, and you build downward from them, you’ll eventually create a cohesive set of control elements that translate what software wants to what the network does.  Top-down design, in short.

Look at Alcatel-Lucent’s notion of APIs and you run into the concept of “exposure”, and IMHO where Alcatel-Lucent went wrong is at that very early and critical point.  Exposure is a bottom-up concept, something that makes the current mechanisms for control accessible rather than defining the goals.  If you progress from what you have toward something you don’t quite grasp, you likely go wrong along the way.  You have to get from NYC to LA by making a series of turns, but if you don’t have the context of the destination in mind all along, you’ll make the wrong ones and end up in the wrong place, as Alcatel-Lucent did.

Perhaps, as Alcatel-Lucent believes, APIs aren’t needed where the company focused its initiatives, but that means the focus was wrong and not the need.  What Alcatel-Lucent needs to do right now is to take up the NFV process as their baseline for progress.  NFV is thinking a lot about how to deploy and manage virtual functions but so far relatively little about where these things come from.  Hint:  The tooth fairy and trolls will not figure prominently in providing them!  Building services from virtual components starts with building services, and building them in a way that allows software processes to drive everything along the path to delivery.  Otherwise the operations costs for virtual functions will explode as the number of choices and elements explodes.  Any good software guy would look at the process and whip out a blank sheet on which to start drawing an object model.  Where’s that happening?  Nowhere that I can see, so Alcatel-Lucent can still get in on the ground floor.
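
For the sake of illustration only, here’s a minimal sketch of what the top of that blank sheet might look like: a service is a graph of virtual functions and the connections among them, and deployment is just a walk of that model by an orchestrator.  Every class, field, and method name here is invented for the example; nothing below comes from Alcatel-Lucent, the NFV ISG, or any product.

```python
# Illustrative only: a first pass at a service object model for virtual functions.
# None of these names come from any standard or vendor product.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualFunction:
    """A single hosted function (e.g. firewall, CDN node, IMS CSCF)."""
    name: str
    image: str                                              # software package to deploy
    constraints: Dict[str, str] = field(default_factory=dict)  # e.g. latency, jurisdiction


@dataclass
class Connection:
    """A logical link between two functions in the service graph."""
    from_fn: str
    to_fn: str
    topology: str = "LINE"                                   # LINE, LAN, TREE


@dataclass
class Service:
    """A deployable service: functions plus the relationships among them."""
    name: str
    functions: List[VirtualFunction] = field(default_factory=list)
    connections: List[Connection] = field(default_factory=list)

    def deploy(self, orchestrator):
        # The (hypothetical) orchestrator decides where each function is hosted
        # and builds the connections; the model only says what is needed.
        for fn in self.functions:
            orchestrator.place(fn)
        for conn in self.connections:
            orchestrator.connect(conn)
```

The point of starting with a model like this is that every later step, from placement through management, operates against the same record of what the service is supposed to be.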

On the Cisco-Big-Switch switcheroo on an executive, we’re trying to make news out of what is far more likely to be a simple career decision.  If you’re an SDN guy, why not join an SDN startup and collect a zillion dollars on the flip, or at least try to?  You can always go back if it doesn’t work out, or go to a competitor.  An old friend told me that in their late 40s or 50s, networking executives and managers get to the point where they have to roll the startup dice or forget that game forever.  But since having somebody leave establishment Cisco for revolutionary Big Switch looks like the revolution is working, it’s a great tale.  So were the tooth fairy and the trolls, but that doesn’t make them real.  Stick a tooth under your pillow or look under the next bridge you see, and you’re unlikely to become a believer in fairy tales.

There’s a connection here besides the negative one.  The cloud is creating a dynamic application and resource model that demands a dynamic connection model, a different model from the static site networking or experience networking that businesses or consumers (respectively) now expect.  SDN can’t make a go of itself by providing a mechanism to do something that software can’t get its head around.  We have to start by looking at how that dynamic connection model is instantiated on dynamic infrastructure to support something stable and manageable and cost-effective enough to be commercially viable.  Big Switch isn’t doing that, not for the cloud as a whole.  If they don’t, or if somebody doesn’t gratuitously do it for them, then Cisco is going to win and Big Switch is going to lose.

Same with NFV.  We can have a great architecture to deploy virtual functions but we have to connect them into cooperating systems and manage their behavior, and most of all we have to do this while addressing the agile applications that absolutely have to come along and become the revenue kicker for telcos that makes all this work worthwhile.  And doing that means getting out that sheet of paper and drawing that object model, not drawing trolls and tooth fairies.

Juniper’s Contrail Story: Left on Base?

Juniper has released an SDN Controller based on its Contrail acquisition, and the early state of the material makes it difficult to judge just how much of an advance the JunosV Contrail product is, for the industry or for Juniper.  I want to stress that I was not briefed on the announcement and so have had no opportunity to get any more detail than was presented in the press release, blog, and a white paper.  The latter was a re-release of an earlier one, so it didn’t contribute much to my understanding.  If we did baseball here, Juniper left a guy on base at the end of the first inning.

Those who read my blogs know that my biggest gripe with SDN is a lack of an architected higher layer to receive network telemetry and service requests and synthesize routes based on a combination of service needs and topology.  My second-biggest gripe is a data-center-only focus, something that doesn’t extend SDN to the branch.  Behind the first gripe is my conviction that you need to have a tie between SDN and the IP world that goes beyond providing a default gateway, and behind the second is my conviction that the IP world has to include all of the enterprise network or the strategy loses enterprise relevance.

I can’t say that Juniper hasn’t addressed these points, but I can’t say that they have.  There is nothing in the material that’s explicit in either area, and a search of their website didn’t provide anything more than the press release.  Juniper does include something I think is important for carriers and the cloud, and even for NFV—federation across provider or public/private boundaries for NaaS.

The best way to approach enterprise clouds is to consider them a federation, because nearly all enterprises will adopt a public cloud service and also retain data center IT in some form.  If we presumed that an enterprise was a private cloud user, the hybridization of the data center with public cloud providers would be almost a given in enterprise cloud adoption.  For cloud providers, the need to support enterprises with global geographies and different software and platform needs would seem to dictate a federation strategy—either among providers or across them based on an enterprise federation vision.  Juniper promises a standards-based approach to federation.

Cloud federation at the enterprise level, meaning adopted by the enterprise and extended to public providers without specific cooperation on their part, would be a matter of providing something like OpenStack APIs (Quantum, Nova, etc.) across multiple management interfaces, and the ability to recognize jurisdictional boundaries in Quantum to know which interface to grab for a given resource.  Juniper does mention OpenStack in their material, so it’s entirely possible that this is what they have in mind.
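
Just to make the idea concrete, here’s a minimal sketch of that enterprise-side approach under some obvious assumptions: each federated provider exposes a Quantum-style and a Nova-style client, and a jurisdiction tag on the request decides which one gets the call.  The class name, the routing rule, and the provider structure are all invented for the example; only the create_network and servers.create calls mirror the OpenStack client conventions of the day.

```python
# Illustrative only: an enterprise-side federation shim that picks which
# provider's Quantum/Nova endpoint to use for a given resource, based on a
# jurisdiction tag.  Provider names and the simple routing rule are invented;
# the client objects stand in for whatever OpenStack-compatible clients each
# provider exposes.

class FederatedCloud:
    def __init__(self, providers):
        # providers: dict of jurisdiction -> {'quantum': client, 'nova': client}
        self.providers = providers

    def _pick(self, jurisdiction):
        if jurisdiction not in self.providers:
            raise ValueError("no provider federated for %s" % jurisdiction)
        return self.providers[jurisdiction]

    def create_network(self, name, jurisdiction):
        quantum = self._pick(jurisdiction)['quantum']
        # Same call an unfederated deployment would make; only the target differs.
        return quantum.create_network({'network': {'name': name}})

    def boot_vm(self, name, image, flavor, net_id, jurisdiction):
        nova = self._pick(jurisdiction)['nova']
        return nova.servers.create(name, image, flavor,
                                   nics=[{'net-id': net_id}])
```

The enterprise then sees one logical API; the jurisdiction decides which provider actually hosts the resource.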

At the provider level, it’s hard to say exactly what federation would entail because it would depend on the nature of the cloud and network service being offered by the various providers.  There are three general cloud service models for IT (IaaS, PaaS, and SaaS) and a Quantum-based evolution of NaaS models as well.  In theory, you could federate all of these, and I think that would be a good idea and a strong industry position for Juniper to take.

Facilitating network federation is probably not much of an issue; physical interconnect would be sufficient.  The question is what virtual network structures are used to underpin application services.  Most of the prevailing cloud architectures use a virtual network overlay product set (OVS and related tunneling strategies) to create flexible and segmented application-specific VPNs.  Extending these across a provider boundary could be done in a variety of ways, including creating a boundary device that could link the VPNs or providing something to harmonize Quantum administration of virtual networks across providers (as I noted above).  Other formal approaches to exchanging route information would also be possible if we went beyond the virtual level to actual OpenFlow SDNs.  I think that some mechanism for SDN-to-SDN route exchange would be smart, and again something Juniper might well do—and do well.

I just don’t know if they did it.  There was nothing on how federation was done or the boundaries of the capabilities.  Juniper isn’t alone in saying little in their SDN announcements, of course.  Beyond avowing support for SDN, we don’t really know what Juniper’s competitors have done.  The whole topic is so under-articulated that I expect our spring survey will show that buyers can’t even align their goals with current products.  We have a fairly good idea of how SDN and OpenFlow can support data center segmentation and multi-tenancy for cloud providers, but we know little beyond that.  We have less on NFV, but here it’s because the work of the body hasn’t identified a full set of relevant standards.  Juniper has only one mention of NFV on their website according to our search, and it’s not related to their current Contrail announcement, but they have made NFV presentations in the past.

I think federation could be a good hook for Juniper and SDN, but to make it one they have to embrace an NFV story to cover the buyer-side issues and they have to outline just what federation approach they’re taking in order to validate the utility of their federation claim.  It may be these things will come along in a later announcement; they’re not there now.

An API-Side View of Networking Revolutions

If you look at the developments in the SDN, NFV, and cloud areas, you find that there’s a lot of discussion about “services” and “APIs”.  One of the things I realized when I was reviewing some material from Intel/Aepona and Cisco was that there’s a bit of a disconnect in terms of how “services” and “APIs” are positioned, both with respect to each other and with respect to the trio of drivers we read about—SDN, NFV, and the cloud.

The term “service” is fairly well understood at the commercial level; a network service is something we buy from a service provider and it connects us to stuff.  At the technical level, a service is a cooperative relationship among devices linked by transmission facilities, for the purpose of delivering traffic.  One thing you can see straightaway (as my British friends would say) is that services can be hierarchical, meaning that they can be made up of component elements.  These components are commercially “wholesale” elements, and technically there’s no clearly accepted name for them.  We’ll call them “components”.  Things like IMS or EPC or a CDN are components.

An API is an interface through which a software application gains access to a “service”, which in most cases is really one of our “components”.  Like services, APIs are a bit hierarchical; there are “high-level” APIs that essentially mimic what we can do with appliances (phones, for example) to request service, and lower-level ones that move toward the mission of organizing network elements in that cooperative behavior I mentioned.  High-level APIs are simple to do and don’t pose any more risk than the use of phones or other user devices would pose.  Low-level APIs could in theory put the network into a bad state, steal resources from others, and create security issues.

Cloud networking means the creation of a resource network to hold application resources and the connection of that network to the user community.  In OpenStack, that’s the Quantum interface.  Using Quantum I can spin up a virtual network and then (using Nova) add VMs to it.  Quantum lets me define a means for connecting this structure to the user—a default gateway for example.  So you could assume that everyone who’s talking about virtual networks and virtual network services would be talking Quantum, right?  If only that were true!
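
Here’s a minimal sketch of that flow, assuming the Python client libraries of the Quantum era (quantumclient, later renamed neutronclient, plus novaclient) with their conventional signatures; the credentials, image, flavor, and UUIDs are placeholders, and a real deployment would differ in detail.

```python
# Sketch of the flow described above, assuming the OpenStack Python clients
# of the Quantum era.  Credentials, UUIDs, and names are placeholders.
from quantumclient.v2_0 import client as quantum_client
from novaclient import client as nova_client

quantum = quantum_client.Client(username='demo', password='secret',
                                tenant_name='demo',
                                auth_url='http://controller:5000/v2.0')
nova = nova_client.Client('2', 'demo', 'secret', 'demo',
                          'http://controller:5000/v2.0')

# 1. Spin up a virtual network and a subnet on it.
net = quantum.create_network({'network': {'name': 'app-net'}})['network']
subnet = quantum.create_subnet({'subnet': {'network_id': net['id'],
                                           'ip_version': 4,
                                           'cidr': '10.10.0.0/24'}})['subnet']

# 2. Add VMs to that network (Nova handles the compute side).
vm = nova.servers.create('app-vm-1', image='IMAGE_UUID', flavor='FLAVOR_ID',
                         nics=[{'net-id': net['id']}])

# 3. Connect the structure to the user: a router with an external gateway
#    plays the "default gateway" role mentioned above.
router = quantum.create_router({'router': {'name': 'app-gw'}})['router']
quantum.add_gateway_router(router['id'], {'network_id': 'EXTERNAL_NET_UUID'})
quantum.add_interface_router(router['id'], {'subnet_id': subnet['id']})
```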

Let’s look at SDN from Quantum’s perspective.  If I want to build a virtual network, I need to specify the connection points and service levels.  I can generate a BuildSDN(endPointList, QoS) command and have my Quantum interface build the result, right?  Well, maybe.  The first problem is that there are a number of connection topologies—LINEs between endpoints, a LAN on which all the endpoints are connected, or a TREE for multicast distribution.  Quantum most often assumes a virtual LAN subnet, but there are use cases in both SDN and NFV (“service chaining”) that imply the network connection is a set of paths or lines.  I can fix this by adding a parameter to my BuildSDN, the connectionTopology.  The second problem is that OpenFlow doesn’t know anything about connection topologies even if I specify one; it only knows individual forwarding table entries.  Something has to organize the SDN request into a set of forwarding table changes so the Controller can send them to the devices.  If you look at the SDN stories of most vendors, you find they’re silent about how that happens.  So we have a wonderful BuildSDN API and nothing to connect it to.
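
To see where the gap sits, here’s a toy sketch of what a BuildSDN implementation would have to do, using the parameter names from the paragraph above.  The expansion of a topology into per-switch forwarding entries and the compute_path stub are purely illustrative; no controller exposes exactly this.

```python
# Toy illustration of the missing middle layer: BuildSDN takes the high-level
# request and must expand it into per-switch forwarding entries before an
# OpenFlow controller can do anything with it.  Topology lookup and path
# computation are stubbed out; the point is that somebody has to own them.

def build_sdn(end_point_list, qos, connection_topology="LAN"):
    flow_entries = []
    if connection_topology == "LINE":
        pairs = list(zip(end_point_list, end_point_list[1:]))   # chain of paths
    elif connection_topology == "LAN":
        pairs = [(a, b) for a in end_point_list for b in end_point_list if a != b]
    elif connection_topology == "TREE":
        root = end_point_list[0]
        pairs = [(root, leaf) for leaf in end_point_list[1:]]    # one-to-many
    else:
        raise ValueError("unknown topology: %s" % connection_topology)

    for src, dst in pairs:
        for switch, in_port, out_port in compute_path(src, dst, qos):
            flow_entries.append({'switch': switch,
                                 'match': {'in_port': in_port, 'dst': dst},
                                 'actions': [{'output': out_port}],
                                 'priority': qos.get('priority', 100)})
    return flow_entries   # handed to a controller to push via OpenFlow


def compute_path(src, dst, qos):
    """Placeholder: a real implementation needs topology and QoS-aware routing."""
    raise NotImplementedError
```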

NFV faces similar challenges.  The body wants to identify current standards for various roles, but API standards are rare.  Most of the network standards we have are really interfaces between devices and not APIs that link software components.  How does an interface translate to an API?  Do we have to connect two virtual functions with a communications path just because the devices they came out of were connected that way?  Suppose they were running on the same platform?  And what kind of “API” would we like to see something like IMS or EPC expose?  How would those APIs relate to SDN APIs if the activity we were supporting could have elements of both?

The logical way to plan for the future of network services is to consider what’s driving the changes.  That’s the cloud.  Quantum is the interface that defines “network-as-a-service” to the cloud, and so it’s the gold standard of high-level deployment APIs.  We need to be looking at SDN and NFV through Quantum-colored glasses to establish how their own APIs will be derived from the highest-level Quantum models.

But that’s only part of the story.  What does an SDN service look like?  Is it just an Ethernet or IP service pushed through SDN devices, or is it a service that has properties that can be generated in virtualized, centralized, infrastructure and couldn’t have been offered before?  If we take the former view, we cripple the SDN benefit case.  Are all NFV’s virtual functions simply models of existing devices, connected in the same way and offering the same features?  If so, we’re crippling the benefits of NFV.  Deploying stuff is essential for consuming it, but deployment of a flexible framework lets us compose cooperative relationships among functional elements in many different ways, some of which might even result in a whole new way of presenting services to end users.  Can we say that any of our current activities are thinking about that?

Cisco is publishing a rich set of intermediate APIs under its ONE banner.  Intel, with the acquisition of Aepona, is entering the world of intermediate-level APIs as a means of exposing provider assets.  We need a roadmap now, from both vendors.  We need to understand how these new and wonderful APIs fit in the context of Quantum, and through that context into both SDN and NFV.  Absent that map, we have no way to navigate the value proposition of APIs…or of SDN, NFV, or cloud networking.

Revolution or Pillow Fight?

This has been a pretty active week in terms of happenings of real relevance to the future of networking.  We’ve also had some background stuff going on, things that don’t rise to the level of being part of the revolution for various reasons.  Taken as a whole, they may be a signpost into how the revolution is proceeding at the tactical level, though.

HP, Brocade, and Arista all announced high-capacity data-center switches that supported OpenFlow and all of them made considerable hay on their SDN credentials with the products.  It’s clear that SDN compatibility has street creds with the media, but less clear just how much buyers pay attention to it.  The actual number of data centers that need this sort of support is limited but the players hope “the cloud” will fix that.

It’s not that I’m against an evolution to data center SDN, but I’m in favor of our actually having a value proposition to drive it.  New technology walks a fine line between creating a new paradigm and offering bathroom reading, and what puts it decisively in one camp or the other is the benefit case that can be harnessed to justify deployment.  Yes, cloud data centers will need massive scale.  But as of last fall, the overwhelming majority of users didn’t believe they were building one (we’ll be looking at their spring views in July).  So the moral is that all this good SDN switching stuff has to drive not only SDN in the data center, but also cloud data center deployment, to get onto the value map.  Which means SDN players should be a lot more cloud-literate than they are.

Alcatel-Lucent posted its quarterly results, which were certainly disappointing to them but not completely surprising to many Street analysts.  Like Juniper’s results, Alcatel-Lucent’s had elements of good and bad, but the problem was that they didn’t prove a lot of progress toward a turnaround in sales (off about 22% sequentially).  Investors are getting leery of companies that sustain profits by cutting costs; all you need is a ruler to extend the lines and you cross the zero axis in a couple of years.  Negative costs might be a concept of much greater value to the industry than SDN.  Sharpen your pencils, MBAs!  You have a future in technology after all.

The thing is, Alcatel-Lucent has what I believe to be the best SDN story of the lot.  It’s not as much a matter of technology as a matter of scope; they have a vision of SDN that truly goes end-to-end in the network.  If you focus on the cloud data center, you miss the cloud itself, the users who have to be the benefit drivers.  Alcatel-Lucent has captured the broadest SDN footprint, and their only problem is (as I’ve said) a substandard job of positioning what they’ve done.

I just had a conversation with a network operator on the Alcatel-Lucent SDN story, and they didn’t know it was end to end.  That, my friends, is a serious problem, and if Alcatel-Lucent wants to turn itself around it has to learn to be an effective seller and singer of technology anthems, and not just somebody who pushes geeks into a room until something new emerges, like fusion inside a star.  Some of the financial pubs are calling Alcatel-Lucent dead, and while I don’t think it’s true now or even necessarily so in the long pull, I do think that poor positioning will be fatal if it’s not corrected.  The one thing you cannot do in this industry and survive is fail to exploit your own value propositions.

HP, according to the media coverage of its switch launch, did the “fabric of the cloud”.  Wrong.  They did the fabric of the data center, and they hope that somehow the cloud is going to drive more data centers into a total capacity where fabric is needed.  This, from a company that ought to be the poster child of cloud value propositions.  Look at all of the SDN and NFV stories that we’re hearing out there and you find the same sad kind of reductio ad absurdum: “I want to focus on my own contribution and needs, not those of the buyers.  Thus, I will hope that they figure out their value proposition on their own, ’cause I darn sure can’t figure it out!”

In SDN, in NFV, and even in the cloud, we have a very clear set of value propositions.  The problem is that nobody wants to take the time and trouble to tell buyers about them.  Are we, as an industry, so specialized in our work that nobody sees the big picture anymore?  If that’s true, then there’s a problem that’s bigger than Alcatel-Lucent or HP, because we need the broadest possible benefit case to justify a revolutionary change, which is what SDN and NFV are supposed to bring us.  When the benefits shrink, the revolution becomes two kids in a pillow fight.  It’s time to step back from the minutiae and start from the top, where we should have started all along.

A Cloud IMS Solution

Virtualizing network functionality is the aim of a surprisingly large number of initiatives these days, for a surprisingly large number of reasons.  In NFV, operators focused on cost savings versus custom devices in their seminal paper on the initiative, but in SDN the focus is on improving network operations and stability.  Other operators have been looking to define a framework for hosting service features that would be both agile and operationalizable.

One of the most interesting topics in the virtualize-my-functions game is IMS.  Mobile services monetization, in the advanced form I’ve called “mobile/behavioral symbiosis”, has been the second-most-important goal on the operators’ list (after content) but it’s also the area where they’ve made the least progress.  Some, at least, say the reason is IMS, which remains a key element in mobile services.

In theory, IMS is mostly about voice services, which many would say are strategically unimportant.  The problem is that it’s hard to offer wireless services that don’t include voice, and intercalling with the public voice network.  It’s also true that as we evolve to 4G, things like mobility management (part of the Evolved Packet Core or EPC) are more often linked to the deployment of IMS than run on their own.  I don’t even know of any plans to run EPC without the rest of IMS.  Finally, if we presume that we might eventually get wireless services from WiFi roaming and have some service continuity among WiFi hotspots or roam-over to cellular services when you’re not in one, we’ll likely need something like IMS.

The problem is that IMS isn’t known for its agility.  Words like “ossified” or “glacial” are more likely to be used to describe IMS, in fact.  People have been talking about changing that, and today I think there are a half-dozen initiatives to make IMS itself cloud-compatible.  Most are players who had IMS elements supported on fixed platforms or appliances and are now moving them to the cloud.  One vendor who’s taken a different approach is Metaswitch, who’s promoting a cloud IMS that was “cloud” from the first, and it’s an interesting study in some of the issues in cloud-based virtualization of network features.  Their cloud IMS is called “Clearwater”.

Metaswitch starts where most people will have to start, which is the structural framework of IMS defined by 3GPP.  You need to support the standard interfaces at the usual spots or you can’t work with the rest of your own infrastructure, much less with partners.  You also need to rethink the structure of the software itself, because cloud-based virtual components (as I noted in a previous blog) can’t be made reliable by conventional means.  So Metaswitch talks (if you let them) about N + M redundancy and scalable distributable state and automatic scale-out and RESTful interfaces.  Underneath all this terminology is that point about the fundamental difference of virtual, cloud-hosted, functionality.  You can’t take a non-cloud system and cloud host it and get the same thing you started with in reliability, availability, and manageability.  Skype and Google Voice didn’t implement a virtual bunch of Class 4s and Class 5s.  You have to transport functionality and not functions.

While Clearwater supports the externalized IMS interfaces, its intercomponent communication is web-like, and it manages state in a web-friendly way.  That means that if something breaks or is replicated, the system can manage a load-balancing process or fail-over without losing calls or data.  The web manages this through “client-maintains-state”, which is what RESTful interfaces expect.  You can make that stable by providing state storage and recovery via a database, something I did on an open-source project out of the TMF years ago.  This is why they can claim a very high call rate and a very low cost-per-user-per-year; you can use generic resources and you can grow capacity through multiplicity using web-proven techniques.  It’s telco through web-colored glasses.
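
Here’s a minimal sketch of that design principle, not Clearwater’s actual code: call state is read from and written back to a shared store on every request, so any replica behind the load balancer can pick up where another left off.  The store below is a plain dict standing in for a replicated database, and the REGISTER-style handler is invented for illustration.

```python
# Not Clearwater's code: just a sketch of the stateless-component principle.
# Any replica can serve any request because call state lives in a shared
# store rather than in the process that handled the previous message.

import json

class StateStore:
    """Stand-in for a replicated database (Cassandra, memcached, etc.)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = json.dumps(value)
    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw else None

STORE = StateStore()

def handle_register(request):
    # Pull whatever state exists, update it, and write it back.  If this
    # replica dies, the next request can land on any other instance behind
    # the load balancer and find the same state.
    binding = STORE.get(request['user']) or {'contacts': []}
    binding['contacts'].append(request['contact'])
    STORE.put(request['user'], binding)
    return {'status': 200, 'bindings': binding['contacts']}
```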

Does Metaswitch answer all the questions about cloud IMS?  I don’t think even they would say that.  The fact is that we have, as an industry, absolutely no experience in the effective deployment and operationalization of cloud-hosted network services.  Metaswitch tells me that they’re involved in a number of tests that will help determine just what needs to be done and what the best way to do it might be.  Some of this activity may contribute further understanding to initiatives like SDN and NFV, because the central-control notion of IMS makes it easier to adapt IMS to an SDN framework and cloud-anything is an NFV target.

Because Clearwater is planned for release early this month under the GPL, there are some delicate questions regarding how you could deploy it in situations where commercial interfaces or components might also be involved.  My personal concern here is that the value of cloud IMS might be diminished if it can’t be tightly coupled with elements that are already deployed outside GPL, or that are developed with more permissive licenses like Apache.  I’d recommend that the Metaswitch people look into this and make some decisions; perhaps GPL isn’t the best way to do this.

The most important aspect of IMS evolution is an area where this licensing may hit home.  While Clearwater includes native WebRTC support, integrating IMS into a web world is likely to mean writing services that are web-coupled and also IMS-coupled.  Even implementations of SDN and NFV might run afoul of licensing issues in creating composite functionality, and if the goal of bringing IMS into the cloud is in part driven by a goal of bringing it into the web era, the licensing could be a big issue.

This is a good thing, though, not only because we need network functions virtualized so they can be deployed in the cloud in the network of the future, but because we need a model for how to do it.  IMS is one of the most complex functional assemblies in networking that we’re likely to encounter in the near term, so if we can figure out how to deploy IMS in the cloud, we can deploy pretty much anything there.  Which is good, because many operators plan to deploy everything there.

Virtual Networking’s Dirty Operations Secret

Huawei seems to be projecting a future where network equipment takes a smaller piece of the infrastructure budget—IT and software getting a growing chunk.  Genband seems to be envisioning a UC/UCC space that’s also mostly in as-a-service software form, and they’re also touting NFV principles.  It would seem that the industry is increasingly accepting the transition to a “soft” network.

The challenge for “the industry” is that it’s probably not true that a simple substitution of hosted functionality for dedicated devices would cut operator costs enough to alter the long-term industry profit dynamic.  In fact, my model says that simple pipelined services to SMB or consumer customers would cost, in the net, significantly more to operate in hosted form.  Even for business users, the TCO for a virtual-hosted branch service could almost be a wash versus dedicated devices; certainly not enough of an improvement to boost operators’ bottom lines.

I’ve already noted that I believe the value of a more software-centric vision of network services will come not from how it can make old services cheaper but how it can create new services that aren’t even sold today and that will thus fatten the top line.  But there’s a challenge with this new world, whatever the target of the software revolution might be, and it’s related to the operationalization of a software-based network.

Networks of old were built by gluing customer-and-service-specific devices together with leased-line or virtual-circuit services.  We managed networks by managing devices, and the old OSI management model of element-network-service management layers was tried and true.  When we started to transition to VPN services, we realized that when you added the word “virtual” to a “private network” you broke the old mold.  VPNs and other virtual services are managed via a management portal into a carrier management system and not by letting all the users of shared VPN infrastructure hack away at the MIBs of their devices.  Obviously we’re going to have to do the same thing when we expand virtualization through SDN or NFV, or even through just the normal “climbing the stack” processes of adding new service value.  In fact, we’re going to have to do more.

There’s something wonderfully concrete about a nice box, something you can stick somewhere to be a service element, test, fix, replace, upgrade, etc.  Make that box virtual and it’s a significant operational challenge to answer the question “Where the heck is Function X?”  In fact, it’s an operational challenge to put the function somewhere, meaning to address all the cost, performance, administrative, regulatory, and other constraints that collectively define the “optimum” place to host something.  And even once the placement decision is made, we can’t “manage” our virtual function by simply tearing it down and putting it back again, which means we have to find all the pieces and redefine their cooperative relationship.  This is something that we have little experience with.
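
As a toy illustration of the “where do I put Function X” problem, a placement pass might filter candidate hosting points on hard constraints and score the survivors on soft ones.  Every constraint name and weight below is invented; a real orchestrator would carry far more of them.

```python
# Toy sketch of the placement question: filter candidate hosting points by
# hard constraints (capacity, jurisdiction), then score the survivors on
# soft ones (cost, latency).  All field names and weights are invented.

def place_function(function, candidates):
    feasible = [c for c in candidates
                if c['free_capacity'] >= function['capacity_needed']
                and c['region'] in function['allowed_regions']]
    if not feasible:
        raise RuntimeError("no feasible hosting point for %s" % function['name'])
    # Lower is better: a weighted blend of cost and latency to the service edge.
    return min(feasible,
               key=lambda c: 0.7 * c['cost_per_hour'] + 0.3 * c['latency_ms'])
```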

The TMF, a couple of years ago, was working on this problem, and while I’m not particularly a fan of the body (as many of you know), they actually did good, seminal work in the space.  Their solution was something called the “NGOSS Contract”, and it was in effect a smart data model that described not only the service constraints—the things that would define where stuff got hosted and how it was connected—but also what resources were committed to the service and how those resources could be addressed in the service lifecycle process.

A service has a lifecycle—provision, in-service parameter change, remove-add-replace element, redeploy, and tear down come to mind—and every step of this lifecycle has to be automated or we’ve reverted to manually managing service processes.  In any virtual world, that would be a fatal shift from an operations cost perspective.  But with SDN, for example, who will know what the state of a route is?  Do we look at every forwarding table entry from point A to B hoping to scope it out, or do we go back to the process that commanded the switches?  But even the SDN controller knows only routes; it doesn’t know services (which can be many routes).  You get the picture.  The service process has to start at the top; it has to be organized to automate deployment, to be sure, but it also has to automate all the other lifecycle steps.  And if you don’t start it off right with those initial resources, you may as well seal your network moves, adds, and changes into a bottle and toss them into the surf.
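
Here’s a minimal sketch of that idea in the spirit of the NGOSS Contract as described above, though not the TMF’s actual model: the contract record holds the constraints and the committed resources, and every lifecycle event is dispatched to an automated handler against that record rather than to a human process.  All event and field names are invented for the example.

```python
# Illustrative sketch only: a contract keeps the service's constraints and the
# resources actually committed to it, and every lifecycle event is handled
# automatically against that record.  Not the TMF's actual data model.

class ServiceContract:
    def __init__(self, service_id, constraints):
        self.service_id = service_id
        self.constraints = constraints        # where/how things may be hosted
        self.committed_resources = []         # filled in at provision time
        self.state = "ORDERED"

    def handle(self, event, detail=None):
        handlers = {
            "provision": self._provision,
            "parameter_change": self._change,
            "replace_element": self._replace,
            "redeploy": self._redeploy,
            "teardown": self._teardown,
        }
        handlers[event](detail)               # no manual step in the loop

    def _provision(self, detail):
        # Placement and connection decisions are recorded, not just executed,
        # so later lifecycle steps know what to operate on.
        self.committed_resources = detail["resources"]
        self.state = "ACTIVE"

    def _change(self, detail):
        self.constraints.update(detail)

    def _replace(self, detail):
        self.committed_resources = [r for r in self.committed_resources
                                    if r != detail["old"]] + [detail["new"]]

    def _redeploy(self, detail):
        self._teardown(None)
        self._provision(detail)

    def _teardown(self, _):
        self.committed_resources = []
        self.state = "RETIRED"
```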

One of our challenges in this positioning-and-exaggeration-and-media-frenzy world is that we’re really good at pointing out spectacular things on the benefit or problem side—even if they’re not true.  We’re a lot less good at addressing the real issues that will drive the real value propositions.  Till that’s fixed, all this new stuff is at risk of becoming a science project, a plot for media/analyst fiction, or both.

More on the SDN, NFV, and Cloud Opportunities

One of my previous blogs has generated a lot of discussion on the question of what the SDN market might be, and whether any market-sizing of the SDN space is simply an exercise in hype generation.  This comes at the same time as a series of articles, the latest in Network World, that cast doubt on some cloud numbers.  According to NWW, some analyses of the cloud market show a very high rate of adoption (70% or more) and even a high rate of multiple-cloud hybridization, while others suggest that the majority of businesses haven’t done anything at all with the cloud.

A big part of the problem here is in definition, starting with what “the cloud” or “SDN” mean.  Many will define the cloud as any service relationship involving hosted resources, in which case every human who uses any remote storage service like Google Drive or Microsoft SkyDrive is a cloud user.  It would also cover users of shared hosting for their websites.  But if you look at services limited to IaaS, PaaS, SaaS offerings based on shared infrastructure, the population of users would fall significantly.  In the SDN space, virtually every vendor wants to call what they do “SDN” so it follows that anyone doing networking at all is doing SDN by that definition.  If you limit the population to those doing formalized centralized control of network behavior using either OpenFlow or another protocol set (MPLS, BGP, etc.) then we’re way down in the single digits in terms of adoption.  And in both cloud and SDN, we’ve found that the population of “users” is made up primarily of “dabblers”, companies who have significantly less than 1% of their IT spending committed to either cloud or SDN.

Another problem we have is in defining a “business”.  If we look at the US, we find that there are about 7.5 million purported business sites, and that well over half of them represent businesses with fewer than 4 employees.  So if we mean “business” when we say it, any assertion that the cloud has even 50% penetration is downright silly.  In point of fact, if you look at the business population of the US, the most cloud-intensive of all the global markets, you find that only about 10% of businesses currently use cloud computing other than basic web hosting or free online storage.  In the enterprises, by contrast, you’d find that nearly 100% had adopted cloud technology, though on a very limited scale.  The “limited” qualifier today means less than 1% of IT spending.  The number who spend even 25% of their IT on the cloud is statistically insignificant.

The reason behind all this hype is the media, who demand that everything be either the single-handed savior of all global technology or the last bastion of International Communism.  Why?  Because those two polar stories are easy to write and get a lot of clicks.  So if you’re a vendor, you’re forced to toe the marketing line or languish in editorial Siberia forever.  Still, this might be a good time to be thinking about more than gloss and glitter.  Both service providers and enterprises are under their own unique pressures, and they look to initiatives like the cloud, SDN, and NFV for salvation.

Amazon, so they say, is going to launch its own TV service, presumably one that competes with other players like Roku for providing access to Amazon Prime.  All of these services rely on the “free aether” of the Internet, which of course isn’t free except in an incremental-price sense.  Operators have to pay to produce the capacity that all these guys are consuming to support their own business models, and that’s the proximate cause of initiatives like NFV, aimed both at reducing the hemorrhage of revenue per bit by cutting the cost per bit, and at increasing revenue through higher-level service participation.  Vendor support is lumpy; some are stepping up and others are hanging back.  Same with SDN, of course.

Enterprises too have the benefit issue; over 70% of CIOs told me in the surveys that they would prefer to promote technologies that improve benefits rather than cut costs, but of that group two-thirds said they have no specific directions to follow toward benefit nirvana.  Most enterprises think that SDN isn’t relevant to them at all, or that, if it is, the technology is really about somehow preparing for the private cloud.  Why they need to do that, given that they don’t know what the benefits of the private cloud are, remains a mystery.

So what’s needed?  I think there are three elements to the future of “the cloud” and the same three for the future of SDN and NFV.  Yes, one of them is resource pools and the cost-efficient production of services, applications, and features.  Connecting these flexible resource pools is another—obviously the SDN dimension.  The third, and the most important, is a new application architecture that exploits the combination of hosting flexibility and connectivity flexibility, and does so in the context of an increasingly mobile-broadband-linked population and workforce.  I think we have a handle on resource efficiency, and on network connectivity, but we’re never going to drive benefits or even finalize our resources or our networks without that application model.

The cloud (forgive me, Amazon!) is not IaaS; that’s only an on-ramp for getting legacy stuff onto the cloud.  The cloud is a future world of highly composable and distributable software components that migrate here and there over very flexible networks to live in a transient sense on flexible resource pools.  Applications, resource pools, and flexible connectivity are the three legs of the stool holding up the cloud, like Atlas purportedly held up the earth.  We can only advance so far in any area without advancing in all, or we tip the whole thing over.  We’re all in the cloud together, even though a lot of vendors don’t seem to be connecting SDN or NFV with the cloud at all, and it’s the cloud that’s the path to those benefits.  So when somebody talks to you about SDN or NFV and leaves out the cloud, run screaming.  They’re inflicting the death of a thousand hypes on you.

Remember, though, that the cloud is about elastic, composable services.  So if somebody says “cloud” to you and doesn’t talk about componentized software and web services, REST, or SOA, run screaming too.  And maybe all these technical dimensions are why vendors and media alike are degenerating into sloganeering.  The truth of our future is hard…complicated.  A nice fuzzy lie would be so much easier, and it would build enough clicks on stories or product data sheets in the short term.  But it won’t build a market, and eventually we’re going to have to do that or watch networking become a junkyard of old ideas.

Playing the SDN/NFV Opportunity Curve

If you take stock of the network equipment earnings thus far, you definitely get a picture of an industry under pressure.  I’ve commented on the main players as they’ve come out, and one of the common themes has been that you can’t expect network operators or enterprises to spend more when you’re presenting them with a static or declining ROI.  More service revenues for operators, or more productivity gains for enterprises, are essential in preserving the health of the industry.

In this context, things like SDN and NFV are both risks and opportunities.  For at least some of the supporters of each of these new initiatives, the goal is to lower the cost of network equipment, offload some high-value features to servers, and generally commoditize the market.  For some, the goal is to improve network agility, support cloud computing better both as an end-user service and as a framework for building other services, and enhance operations.  Clearly vendors with market share risks should be addressing only the latter goal and those who hope to gain market share might think about supporting both goals.

In truth, it’s hard to see what vendors are trying to do in either space.  In the world of SDN, all of the focus of the centralized/OpenFlow revolution has been on pushing what’s arguably control without context.  An application (from what source?  Not our problem) can drive a controller to make changes.  Great, but it cedes the utility, the business case, to that undefined application.  In the NFV world, the challenge is devising an architecture for hosting functions that can present significantly lower cost than current discrete devices would present, and at the same time add agility and flexibility.  It’s too early to be sure whether that challenge can be met, much less whether it will be.

Our only certainty here is the notion of cloud networking.  If we view applications as a set of components distributed over a pool of resources in such a way as to optimize both cost and QoE, we have a pretty good mission statement for both SDN and NFV.  That kind of application vision is the inevitable result of the cloud, and that might explain our problem with creating benefit cases.  If we separate the conceptual driver of change from the details of how and what gets changed, we’re likely going to lose any realistic view of ROI.  What good is agile networking absent anything that needs agility?  How valuable is centralized control if there’s nothing for centralized control to act on?

We even need to think about the question of a “network”.  In a traditional sense, a network is a collection of devices that are induced to cooperate in the creation of a delivered experience to a community of users.  We induce that cooperation today through a series of control-plane, data-plane, and management standards.  But if we were to view the network as a collection of components, of software elements, then we have no rigid distribution of functionality, no fixed set of things to connect, and no need for rigid standards to connect them.  A service like content delivery can be abstracted as a single functional block distributed in any useful way through the cloud.  It can be abstracted as a set of virtual components that make up that single block, and those components’ nature and relationship can be changed freely as long as the external feature requirements are met.  In this situation, just what is an “interface”?  Is a virtual device an analog of a real one, or just some arbitrary collection of functionality that makes service composition easy?  Or is there never any such thing at all?  The fact is that we don’t know.

There’s been a lot of talk about the impact of SDN and NFV on the network equipment space, and subscribers to our Netwatcher publication found out in the March issue that across all segments of the network infrastructure market, SDN and NFV would actually be accretive to network opportunity through 2015, and would diminish it thereafter.  They also learned that SDN and NFV would be impacting nearly all network spending within five years, but that this would not likely result in major changes in market share.  They learned, no surprise, that the fastest-growing segment of network spending was spending on IT elements to host features.

Amazon’s quarter should be teaching everyone something.  The company’s profit line dipped because Amazon is making some massive investments in the future.  Company shares traded just a bit lower pre-market, but they certainly didn’t take a major hit, and this pattern has played out many times before with Amazon.  Network vendors should be thinking about this; if a company is investing in a credible future opportunity, the market rewards that investment in the long term, even in the face of some short-term pain.  The years prior to 2016, the years when SDN and NFV will drive spending up, are the years when companies need to grab hold of the benefit side of the technologies, to take control of the “R” in “ROI” while it can be done without reducing net revenues.  How many will be smart enough to do this?  Based on what I’ve seen in both SDN and NFV positioning, not many.  It will only take one, though, to change the market share numbers radically and put true fear in the hearts of the others.