NFV: What We Virtualize Matters

Transformation in telecom means investment in something not currently being invested in, but it likely doesn’t mean “new” investment.  Most operators have capital budgets that are based on return on infrastructure, which means that the sum of services supported by a network determines how much you spend to sustain or change it.  One of the reasons mobile networking is so important to vendors is that mobile revenues and profits are far better than wireline, so there’s more money to spend in support of infrastructure involved in mobile service delivery.

Despite mobile services’ exalted position as revenue king, none of the major operators believe they can sustain current profit levels given the competitive pressure.  As a result, mobile services have been a target for “modernization”, meaning the identification of new technologies or equipment that can deliver more for less.  We’ve had a number of announcements of NFV proof-of-concepts built around mobility—IMS and most recently (from NSN) EPC.  NFV is a kind of poster child for modernization of network architectures, so it’s productive to look at these to see what we might learn about NFV and modernization in general.

One thing that jumps out of a mobile/NFV assessment is that it demonstrates there’s no single model of an NFV service.  When you create a service using NFV you deploy assets on servers that will provide some or all of the service’s functionality.  It’s common to think of these services, and their assets, on a per-customer basis, but clearly nobody is going to deploy a private IMS/EPC for everyone who makes a mobile call or sends an SMS.  We have some services that are per-customer and some that are shared (in CloudNFV we call the latter “infrastructure services”).
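
To make that distinction concrete, here’s a minimal sketch (in Python, with purely hypothetical names and fields; nothing here is drawn from an actual NFV specification) of how a deployment descriptor might flag a per-customer service versus a shared infrastructure service:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ServiceDescriptor:
    """Hypothetical descriptor distinguishing per-customer from shared services."""
    name: str
    vnf_components: List[str]
    tenancy: str = "per-customer"   # or "infrastructure" for shared deployments

def needs_new_deployment(service: ServiceDescriptor, already_running: bool) -> bool:
    # A per-customer service is deployed for every order; an infrastructure
    # service (IMS, EPC) is deployed once and shared by every subscriber.
    if service.tenancy == "infrastructure":
        return not already_running
    return True

firewall = ServiceDescriptor("managed-firewall", ["vFW"])                        # per-customer
virtual_ims = ServiceDescriptor("vIMS", ["P-CSCF", "S-CSCF", "HSS"], "infrastructure")

print(needs_new_deployment(firewall, already_running=True))      # True: one per customer
print(needs_new_deployment(virtual_ims, already_running=True))   # False: shared instance
```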

This range of services illustrates another interesting point, which is that there are services that have a relatively long deployment life and others that are transient.  An infrastructure service is an example of the former and a complex telepresence conference an example of the latter.  Obviously, something that has to live for years is something whose “deployment” is less an issue than its “maintenance”, and something that’s as transient as a phone call may be something that has to be deployed efficiently but can almost be ignored once active—if it fails the user redials.

If we look into things that are logically infrastructure services—IMS and EPC—we see another interesting point.  Most infrastructure services are a cooperative community of devices/elements, bound through standardized interfaces to allow operators to avoid vendor lock-in.  When we think about virtualizing these services, we have to ask a critical question: do we virtualize each of the current service elements, or do we virtualize some set of cooperative subsystems?  Look at a diagram of IMS and you see what looks all too much like some complex organizational chart.  However, most of the relationships the diagram shows are internal—somebody using IMS or EPC from the outside would never see them.  They’re artifacts not of the service but of the service architecture.  So do we perpetuate the “old” architecture, which may have evolved to reflect strengths and limitations of appliances, by virtualizing it?  Or do we start from scratch and build a “black box” of virtual elements?

In the IMS world, we can see an example of this in Metaswitch’s Project Clearwater IMS implementation.  Project Clearwater builds IMS by replicating its features, not its elements, which means that the optimum use of the technology isn’t constrained by limitations of physical devices.  I think something like that is even more important when you look at EPC.  EPC is made up of elements (MME, SGW, PGW…you get the picture) that represent real devices.  If we virtualize them that way, we’re creating an architecture that might fit the general notion of NFV (we’re hosting each element) but flies in the face of the SDN notion of centralization.  Why have central control of IP/Ethernet routing and do distributed, adaptive mobility management?

So at this point you might wonder whether all the PoC activity around mobility is reflecting these points, and the problem is that we don’t really know for sure.  My interpretation of the press releases from the vendors involved (most recently NSN) is that the virtualization is taking place at the device level.  Every IMS and EPC element in the 3GPP diagram is reproduced in the virtual world, so all the limitations of the architecture used to provide IMS or EPC are sustained in the “new” world.  You can argue that this is a value in transitioning from hard devices to virtual EPC or IMS, but I think you must then ask whether you’ve facilitated the transition to an end-game you’ve devalued.  Can we really modernize networks by creating virtual versions of all the current stuff, versions that continue to demand the same interfaces and protocol exchanges as before?  Frankly, I don’t think there’s a chance in the world that’s workable.

So here’s my challenge or offer to vendors.  If you are doing a virtual implementation of some telecom network function that takes a black-box approach rather than rehashing real boxes, I want to hear from you.  I’ll review it, write about it, and hopefully get you some publicity about your differentiation.  I think this is a critically important point to get covered.

Conversely, if you’re virtualizing boxes I want you to tell me how much of the value proposition for the network of the future you’re really going to capture through that approach, if you want me to say nice things.  I also want to know how your box approach manages to dodge the contradiction with SDN goals.  We have one network and SDN and NFV will have to live in that network harmoniously.

Signposts: A Missed Quarter from One, a New CEO from Another

Cisco turned in a disappointing quarter, one that screams “Networking isn’t what it used to be!” for all with even mediocre ears.  Juniper named a new CEO, one that the Street speculates may have been picked to squeeze more value for shareholders by cutting Juniper’s operating expenses.  Such a move would seem to echo my suggested Cisco-news shout, so we have to ask some questions (and hopefully answer them).  One is whether the networking industry really isn’t what it used to be, another is whether it could still be more than a commodity game, and the third is whether Juniper’s new CEO will really do what the Street suggests.

Most of my readers know it’s been my view for several years now that networking as bit-creation and bit-pushing is in fact never going to be what it was.  When you sell cars only to millionaires you can charge a lot per car, but when you have to sell to the masses the prices come down.  Consumerism has killed any notion of profit on bits because consumers simply won’t pay enough for them.  In fact, they really don’t want to pay for bits at all, as their habits in picking Internet service clearly show.  Everyone clusters at the bottom of the range of offerings, where prices are lowest.  And if bits aren’t valuable as a retail element, you can’t spend a ton of money to create them and move them around.  Equipment commoditizes as the services the equipment supports commoditize.  In bit terms, network commoditization is inescapable.  It’s not a macro trend to be waited out—it’s the reality of the future.

The thing is, we’re clearly generating money from networking, it’s just that we’re not generating it from pushing bits.  Operators are largely looking into the candy store through nose-print-colored glass, and in no small part because their vendors have steadfastly refused to admit that bits aren’t ever going to be profitable enough.  However, refusal to accept reality doesn’t recast it—as Cisco is showing both in its quarterly results and in its new Insieme-based “hardware SDN”.  What operators need is a mechanism for building higher-level services that not only exploits network transport/connectivity as a service (as the OTTs do) but also empowers capabilities at the transport/connection level that can increase profits there.  That builds two layers of value—revenue from “new services” and revenue from enhanced models of current services, models that may not be directly sellable.

Application-Centric Infrastructure (ACI) is arguably aimed at that.  It’s essentially an implementation of SDN that’s based not on x86 processors but on networking hardware.  The argument is that the “new role” for the network is something that demands new hardware, but hardware that’s incorporating the ability to couple to central policy control to traditional switches/routers.  It says that custom appliances are still better.

There’s nothing wrong with this as an approach; SDN at first presumed commodity switches and has translated to a software-overlay vision in no small part through the Nicira product efforts of Cisco rival VMware.  There are specific benefits to the Cisco approach too—custom appliances like ACI switches are almost certainly more reliable than x86 servers and they also offer potentially higher performance.  Finally, they’re immune from hypervisor and OS license charges, which Cisco points out in its marketing material.

Cisco’s notion of ACI is a watershed for the San Jose giant.  To reverse the bit-pushing commoditization slide you have to create value above bits, not just push bits better.  Otherwise you’re in a commodity market yourself.  For Cisco, therefore, the big barrier is likely less the proving of the specialty-appliance benefit than it is the definition of the services and applications.  We can make network transport/connection the underlayment to services if we can create services, meaning services above connection/transport.  Cisco has great credentials in that space with their UCS stuff, but they’ve been reluctant to step out and define the service layer.  Is it the cloud, or SDN control, or NFV, or something else?  If Cisco wants application-centricity it needs some application center to glom onto, and so far they don’t have it.

I think Cisco’s quarterly weakness is due to their not having that higher-layer strategy.  I don’t think operators would mind an evolutionary vision of networking—they’re deeply invested in current paradigms too.  Not demanding revolution isn’t the same as not requiring any movement at all, though.  To evolve implies some change, and Cisco’s not been eager to demonstrate what the profit future of networking is.  They’re happy to say they have new equipment to support it, though, which leaves an obvious hole.

Which, one might expect, a player like Juniper might have leaped to plug.  To be fair, Juniper has said nothing to suggest that the Street view of the new CEO, Shaygan Kheradpir (formerly from Barclays and Verizon, where he held primarily CTO/CIO-type positions) is correct.  However, it’s clear that from the investor side the goal for Juniper shouldn’t be chasing opportunities Cisco might have fumbled, but cutting costs.  If, as one analyst suggests, Juniper has the highest opex of any company they cover, then cutting opex is the priority.  You can’t cut sales/marketing, so what do you cut if not the stuff usually lumped into “R&D?”  How might Juniper seize on the Cisco gaffe without investing in something?  You see the problem.

What I’m waiting to find out is whether Juniper sees it, and sees the opportunity to continue to be a network innovator rather than a network commoditizer.  Juniper is a boutique router company, with strong assets in the places where services and applications meet the network.  The problem they’ve had historically is that they focus their research and product development almost exclusively on bit-pushing.  The former CEO, Kevin Johnson from Microsoft, was arguably hired to fix the software problem at Juniper, but Microsoft doesn’t make the kind of software Juniper needs and Johnson didn’t solve the problem.  The question now is whether Kheradpir’s CTO background and carrier credentials include the necessary vision to promote a Juniper service layer.  If not, then cost-scavenging may be about as good as it gets.

Right now, the Street seems to be betting against its own analyst vision of Juniper the scavenger hunter.  Logically a cost-managed vision wouldn’t generate the kind of P/E multiple Juniper has, and that should mean its stock price would tumble.  Either traders think Juniper will quickly increase profits through cost reductions without being hurt much by the Cisco-indicated macro trend in network equipment, or they think Juniper might still have some innovation potential.  Which, to be real, would have to be at the service layer.

Both Cisco and Juniper also have to decide where their positioning leaves them with respect to NFV.  For Cisco, espousing hardware SDN almost demands a similar support for appliance-based middle-box features, the opposite of where NFV would take things.  For Juniper, NFV could be either a way of exploiting Cisco’s hardware-centricity or a way of tapping third-party application innovation as a substitute for internal R&D.  Construct an NFV platform and let others write virtual functions.  In fact, we could say that NFV direction could be Juniper’s “tell” on its overall direction; get aggressive with NFV and Juniper is telegraphing aggression overall…and the converse of course would also be true.

Did Juniper Sing MetaFabric Pretty Enough?

Juniper may not have wowed the Street with its last earnings call, but with some of its rivals definitely closer to the drain, earnings-wise, it’s clear Juniper still has assets in play.  The latest Juniper move to leverage those assets is a new slant on the data center of the future, called “MetaFabric”.  Unlike Juniper’s (dare I say, “ill-fated”) QFabric, MetaFabric is an open architecture supported by Juniper products.  I like it, and given that the data center isn’t exactly a Juniper lake, it’s a smart move, but not one without risks.

MetaFabric is a good hardware strategy but not an exceptional one.  What probably differentiates it is its goal of higher-level openness, the support for a variety of cloud and IT environments and the specific embracing of VMware, a player who’s been on the outs with Cisco.  I think it’s clear that MetaFabric is aimed at Cisco, and at invigorating the old Juniper strategy of hitting Cisco on lack of openness.  However, the openness angle makes it hard to differentiate and puts more pressure on Juniper to establish some high-level value proposition.

The stakes here are high.  Networking is becoming the poor stepchild to IT in both the enterprise and service provider spaces, and that shift is largely because you can’t create profitable services based only on pushing bits around.  For network equipment vendors like Juniper, the problem is that pushing bits is what you do, what your incumbent positions are based on.  For a bit-pusher to enter an era where services are built higher up, it makes sense to focus on the place where those services couple to the network—the data center.  Not to mention the fact that for six years now my surveys have shown that data center network trends drive network spending in the enterprise.  Juniper needs help there too, based on its last quarterly call.

Juniper’s first challenge is overcoming a truly alarming loss of strategic influence.  Among service providers, Juniper has lost more influence since 2010 than any other network vendor.  In the enterprise space their losses lead nearly all other network vendors too; only HP has fallen further, and HP’s problems can be attributed to its management shifts.  In the enterprise, Juniper’s loss of strategic influence in data center evolution is the largest of any vendor’s, primarily because of their muddy QFabric launch.  Historically, what it takes to make something like this right is some impeccable positioning, something Juniper hasn’t been able to muster.  Can they do it now?  The potential is there, I think, because there are market trends that could be leveraged.

In cloud computing of any sort, the biggest difference in data center networking arises from the growth of horizontal traffic.  Any time you compose applications from components you generate horizontal flows and that stresses normal data center architectures that are designed to deliver application silos vertically to users.  Even cloud multi-tenancy doesn’t change this, but componentization does and componentization is the critical element in supporting point-of-activity empowerment, which is likely the largest driver of enterprise change in the rest of this decade.  More productivity, more benefits.  More benefits, more spending to secure them.

On the operator side, it’s NFV that’s driving the bus.  A fully optimized NFV would create more new operator data centers than new data centers from all other sources combined.  In fact, it would make network operators the largest single vertical in data center deployment terms.  NFV more than even the cloud demands very high performance within its data center, and nearly all NFV traffic is horizontal.  Juniper, who is hosting the NFV event that starts literally today, has a chance to leverage NFV into a place in the largest set of data center orders the world has ever known.

There’s a lot of ground for Juniper to plow, but they’re having problems getting hitched up.  Juniper has up to now resisted even the notion of naming something “NFV”.  I had a useless conversation with a top Juniper exec who took exception to my characterizing their service chaining as an “NFV” application when they wanted to say it was SDN.  They need to start thinking of it as NFV now, if they’ve not learned that already.  Even if they have, Juniper’s white paper on MetaFabric misses the mark.  They gloss over the key issues that are driving horizontal traffic in order to focus on automation and operationalization, which would be great topics if you targeted them at NFV, where such things are literally the keys to the kingdom.  Juniper’s paper never mentions NFV at all.  It never mentions horizontal traffic, in fact.

The most telling point Juniper makes about MetaFabric is buried in a VMware-specific paper.  There, they talk about the need to connect physical and virtual networks, and while they open the topic in a pedestrian way there’s a lot of substance in what Juniper can offer here.  I’ve become convinced that the SDN world of the future is bicameral, with a higher virtual layer that’s very agile and application-specific linked to a physical network that’s policy-driven.  The Juniper Contrail approach is well suited to that story (so is Alcatel-Lucent’s Nuage) and you can make a strong argument for the story that the evolution of the data center will drive this new SDN vision all the way to the edge of the network.

There’s a good metro story here too, a story that could address the hole Juniper’s left in its strategy after killing its mobile architecture recently.  Operators want virtual EPC.  Data centers host virtual EPC.  Juniper is doing MetaFabric for data centers.  See a connection here?  I’d sure like that connection to be made strongly, but it isn’t.

So here’s the net.  I think NFV is the movement that’s most directly driving data center evolution, even for enterprises, because it’s pulling the cloud, SDN, and NFV goals into a single package and wrapping it in a strong operational story (one Juniper wants to tell).  Juniper needs to board the NFV bus if they want MetaFabric to succeed.

NFV PoCs: What Concepts Get Proved?

The NFV ISG released its first public documents in early October, but the most significant release was the details of its process for proof-of-concept demonstrations.  PoCs are one of the things that I’ve believed from the first would be essential in making NFV a reality.  Specifications don’t write software, and I think the IETF has actually proved that sometimes software should guide specifications and not the other way around.  Now, with a process in place, the ISG can start to look at elements of its approach in running form, not as an abstraction.  It could be a tremendous step forward for NFV, and also for SDN and the cloud.  The qualifier, as always, reflects the often-found gulf between possibilities and realization.

The most important concept to prove is the business case.  If you asked the service providers what they wanted from NFV a year ago, they’d have said “capex reduction”.  In fact they did say just that in their first Call for Action white paper.  The newest paper, just released, broadens that to more of a TCO quest by adding in a healthy desire to see operations efficiencies created.  There are also suggestions that improvements to the revenue side would be welcome, through the more agile response to service opportunities.

Where this intersects with the PoC process is that if the goals of the ISG are in fact moving toward TCO and service agility, then there is a specific need to address both these things in proofs of concept.  So I contend that ISG PoCs should address operational efficiencies and service agility.  Will they?  There are some barriers.

Barrier number one is that this sort of broad, comprehensive stuff flies in the face of vendor opportunism.  First of all, vendors don’t want there to be any radical changes created by NFV or SDN or much of anything else if the word “saving” appears anywhere in the justification.  They reason that savings means spending less on their gear, which is likely true.  The second barrier is scope.  All of the benefits we’re talking about mandate comprehensive and systemic solutions, and the scope of the ISG has been drawn to ensure that it doesn’t become the god-box of standards activities, responsible for everything in the world.  But if the problem is everything in the world (or a lot of it), how does NFV solve it?  And if they can’t, did they disconnect from their own benefit case?

So far, what we’ve been seeing in NFV demos and announcements has been focusing on a very small piece of this very large puzzle.  We can deploy VNFs—from a command line.  We can service chain, with static commands to connect things.  We can scale in and out, but not based on central operational/orchestration-based policies.  We can control networks, but not based on models of service that unify a deployment process and connect it with management.  Proving that 1) I have a ship, 2) I can sail it, and 3) I know what direction “west” is, doesn’t discover the New World.  We need something that links all the pieces into a voyage of discovery.

That’s what I think the PoCs have to do.  Yes we need to validate some of the specific points of the specifications, but more than that we need to understand how all of the variables that are being discussed will impact the business case for NFV.  We can’t do that if we don’t know how that business case is developed.  If the full service, from conception by an architect to management by a NOC expert or customer service rep, isn’t in the ISG scope then we still need to be sure not only that the way NFV works adds value to this overall process, but also just when and how that value is added.  Otherwise we’re building something that may not get us to where we need to be.  The PoCs, with no explicit limit in the range of things they can show, could provide the ISG with that critical business context.  They could prove the case for NFV no matter whether it’s capex, opex, revenue from more agile services, or a combination thereof.

Something constructive in the PoC sense might come out of a joint TMF/ISG meeting on management/operations that’s being held this week as the two bodies have meetings in the Bay Area.  The TMF also has PoC capability through its well-known and often-used Catalyst projects, and perhaps something that should happen in the joint discussions is the creation of policies that would advance systemic PoCs in one or both groups, and/or a dividing line to show where they belong.  It’s pretty hard to see how you can get reduced opex without some incremental “op”, but it’s also worthwhile to note that cloud-based, virtual services don’t work like the old services do.  Should we look at what the optimum operational framework for NFV and the cloud might be, then fit it to the TMF model, adapting the latter as needed?  That’s a viable approach, I think.

The TMF connection raises an interesting point, which is how the ISG will communicate its PoC findings to bodies like the TMF.  While a PoC could well move outside the ISG’s scope for what it does and shows, there appears to be some limitation on how results are fed back.  If you report against specific ISG specifications then you can’t report extra-ISG activities back, so they can’t pass through the liaison linkage to other bodies where those activities could be meaningful.  So do the ISG people then have to be members of all the other bodies?  I’m a TMF member but none of the rest of the CloudNFV team is, for example, and I suspect a lot of the smaller companies in the ISG aren’t either.  I’d like to see the ISG encourage people to run PoCs that are outside the scope of the ISG where the extension works to cement the benefit case, and also then agree to provide liaison for these points.

The PoC process is critical to NFV, I think, perhaps even more critical than the specifications themselves.  Operators who need some of the NFV benefits now can build on PoC results for early trial activity because you can test an implementation in the lab or even a pilot service.  That means that those critical business cases and early tire-kicking can happen earlier, and that vendors can be forced by competitive pressure into doing more and doing it faster.  The PoCs could be the big win for the ISG, if they can get them going in the right direction and keep up the momentum.

VNFs equal What Plus What?

I’ve been blogging over the last week about things related to the SDN World Congress event and what the activity there shows about the emerging SDN and NFV space.  I’ve covered issues of deployment and management of complex virtualized systems, high-performance network data paths, and DPI as a service.  For this, my last blog on the Congress topics, I want to look at the virtual network functions themselves.

Every substantial piece of network software today likely runs on some sort of platform, meaning there’s a combination of hardware for which the software is compiled into machine code, and software that provides service APIs to the network software with operating system and middleware services.  If you have source code for something, you can hope to recompile it for a new environment, but even then there will be a need to resolve the APIs that are used.  For example, if a firewall or an EPC SGW expects to use a specific chip for the network interface, the platform software is written for that chip and the firewall or EPC is written for the APIs that expose chip access.  You can’t just plunk the software on a commercial x86 server under Linux and expect it to run, and if you don’t own the rights to the software or have a very permissive license, you can’t do anything at all.

Given that the ETSI NFV process expects to have virtual functions to run, it’s clear that they’ll have to come from some source.  The most logical place for a quick fix of software would be the vast open-source pool out there.  We have literally millions of software packages, and while not all of them are suitable for network-function translation, there’s no question that there are hundreds of thousands that could be composed into services if we can somehow make them into network functions.  But…there’s a lot more to it.  The ISG wants virtual network functions (VNFs) to be scalable, have fail-over capability, and be efficient enough to create capex savings.  How does that happen?

When I was looking at candidates for a CloudNFV demo, I wanted something that addressed one of the specific ETSI use cases, that could provide the performance and availability enhancements that the ISG was mandating, and that could be deployed without making changes to the software itself just to make it NFV-compliant.  The only thing I could find was Metaswitch’s Project Clearwater open-source IMS.

Metaswitch launched Project Clearwater to demonstrate that you could do a modern cloud-based version of an old 3GPP concept and get considerable value out of the process.  The idea was to create not just an IMS that could run in the cloud but rather one that was optimized for the cloud.  There are a number of key accommodations needed for that, and fortunately we get a look at them all in Project Clearwater.

First, the application had to self-manage its cloud behavior.  Metaswitch’s Project Clearwater contains a provisioning server that can manage the spin-up of new instances and handle the way that the DNS is used to load balance among multiple instances of a given component.  Because of this, the management system doesn’t have to get involved in these functions, which means that there’s no need to expose special APIs to make that all happen.
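
As a rough illustration of that pattern (a toy sketch of DNS-style load balancing, not Clearwater’s actual code, and the component name is just for flavor), the provisioning logic only has to add a new instance to the record set, and clients rotate through whatever addresses resolution returns:

```python
import itertools

# Stand-in for DNS A/SRV records: component name -> list of instance addresses.
# In a real deployment these would live in DNS; a dict keeps the sketch self-contained.
dns_records = {
    "sprout.example.net": ["10.0.0.11", "10.0.0.12"],
}
_round_robin = {}

def register_instance(component: str, address: str) -> None:
    """What a provisioning server does when it spins up a new instance:
    add the instance to the record set so traffic starts reaching it."""
    dns_records.setdefault(component, []).append(address)
    _round_robin.pop(component, None)   # reset the rotation to include the newcomer

def resolve(component: str) -> str:
    """Client-side selection: rotate through whatever instances DNS returns."""
    if component not in _round_robin:
        _round_robin[component] = itertools.cycle(dns_records[component])
    return next(_round_robin[component])

register_instance("sprout.example.net", "10.0.0.13")   # a scale-out event
print([resolve("sprout.example.net") for _ in range(6)])
```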

Second, the application has to have stateless processes to replace the normally stateful behavior of IMS elements.  Call processing is contextual, so if you’re going to share it across multiple modules you have to be sure that the modules don’t store call context internally, because switching modules would then lose the context and the call.  Metaswitch’s Project Clearwater uses sophisticated back-end distributed state management that allows modules to be switched without loss of call context.  That works for both horizontal scaling and for fail-over.
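
A minimal sketch of the stateless pattern might look like the following; the dict standing in for a distributed back-end store is obviously a simplification and none of this is Clearwater code, but it shows why any instance can pick up any message for any call:

```python
# Call context lives in a shared back-end store (here a dict standing in for a
# distributed store), keyed by call ID, so the worker itself holds no state.
call_store = {}

def handle_sip_message(call_id: str, message: str) -> str:
    # Fetch context from the shared store instead of keeping it in the worker.
    context = call_store.get(call_id, {"state": "idle"})
    if message == "INVITE" and context["state"] == "idle":
        context["state"] = "ringing"
    elif message == "200 OK" and context["state"] == "ringing":
        context["state"] = "connected"
    elif message == "BYE":
        context["state"] = "idle"
    call_store[call_id] = context        # write back so another instance can continue
    return context["state"]

# Successive messages for the same call can be handled by different instances
# (here just successive calls to the same function) without losing context.
print(handle_sip_message("call-42", "INVITE"))   # ringing
print(handle_sip_message("call-42", "200 OK"))   # connected
```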

Third, the application has to have the least possible customization needed to conform to VNF requirements.  Even open-source software has licenses, and if we were to grab a network component and specialize it for use in NFV we’d have to fork the project to accommodate any special APIs needed.  If those APIs were to involve connecting with proprietary interfaces inside an NFV implementation, we’d have a problem with many open-source licenses.  Metaswitch’s Project Clearwater can be managed through simple generic interfaces and there’s no need to extend them to provide NFV feature support.

It was a primary goal for CloudNFV to be able to run anything that can be loaded onto a server, virtual machine, container, or even a board or chip.  Meeting that goal is largely a matter of wrapping software in a kind of VNF shell that translates between the software’s management requirements and broader NFV and TMF requirements.  I think we designed just that capability, but we also learned very quickly that software that was never intended to fail over or horizontally scale wasn’t going to do that even with our shell surrounding it.  That’s why we started with Metaswitch Project Clearwater—it’s the only thing we could find that was almost purpose-built to run in an NFV world even though NFV wasn’t there at the time the project launched.
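
Here’s the general shape of that “shell” idea, sketched in Python with hypothetical interface names (this is not our actual CloudNFV code): the orchestrator sees a uniform set of lifecycle calls, and each adapter translates them into whatever the wrapped software can actually do:

```python
from abc import ABC, abstractmethod

class VnfShell(ABC):
    """Hypothetical shell interface: the orchestrator talks to these lifecycle
    calls, and each adapter maps them to the wrapped software's own controls."""
    @abstractmethod
    def deploy(self, parameters: dict) -> None: ...
    @abstractmethod
    def scale_out(self) -> None: ...
    @abstractmethod
    def heal(self) -> None: ...
    @abstractmethod
    def health(self) -> str: ...

class GenericProcessShell(VnfShell):
    """Adapter for software that only knows how to be started and stopped."""
    def __init__(self, start_cmd: str):
        self.start_cmd = start_cmd
        self.instances = 0
    def deploy(self, parameters: dict) -> None:
        # A real shell would launch the process/VM with 'parameters';
        # here we just record the fact.
        self.instances = 1
    def scale_out(self) -> None:
        # Software that can't scale horizontally would raise an error here instead.
        self.instances += 1
    def heal(self) -> None:
        self.deploy({})      # crude heal: redeploy the instance
    def health(self) -> str:
        return "up" if self.instances > 0 else "down"

shell = GenericProcessShell("/opt/vnf/firewall --config /etc/fw.conf")
shell.deploy({"vcpus": 2})
shell.scale_out()
print(shell.health(), shell.instances)
```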

We also learned a couple of things about VNF portability along the way.  There is always a question of how stuff gets managed as a virtual function, how it asks for a copy to be spun up, or how it is deployed with the right setup and parameterization.  It’s tempting to define management interfaces to address these things, but right now at least the NFV ISG has not standardized all the APIs required.  Since the scope of the ISG is limited to the differences in deployment and management created by virtualization of functions, it may never define them all.  Any specialized API creates a barrier to porting current software to NFV use, and any non-standard APIs mean VNFs themselves are not portable at all.  That could be a big problem; creating software silos instead of appliance silos isn’t what most ISG founders had in mind, I’m sure.

The point here is that there are a lot of things we have to learn about VNFs and how they’re deployed.  I learned a bunch of them in working with Metaswitch on our Project Clearwater demo, and I’m fairly confident that Metaswitch learned even more in its Project Clearwater trials and tests.  I’d bet they’re looking to apply this to other virtual function software, and if they do then how they build this stuff will be a valuable reference for the industry, a critical element in meeting the goals of NFV.

Is DPI the Next “as-a-Service?”

One of the interesting things that emerged from the SDN World Congress was the multiplicity of roles associated with the concept we call “deep packet inspection” or DPI.  Like almost everything in tech, DPI is being positioned or proposed in more missions than we have methodologies to support.  I had an opportunity to spend quite a bit of time at the show with Qosmos, a company that is a leader in DPI not only in a functional sense but also in terms of how many places it’s already installed.  What emerged was a useful view of how we might present DPI in a virtual age.

Qosmos DPI is available in what’s effectively three flavors: as a plugin to Open vSwitch, as a standalone software element that could be a virtual network function (Qosmos is presenting that to the NFV ISG), and as a set of libraries that can be used to construct stuff, including DPI in embedded applications/appliances.  Like DPI overall, the Qosmos stuff is directed at sniffing into packets deeper than the normal IP headers to determine the higher-layer context of a given packet.  This can be used to add “Level 4-7” processing to switching applications, to identify contextual flows, to screen packets for further processing or monitoring, to manage QoS-based routing or handling, etc.

Traditionally DPI would be used like routing or switching or firewalls are used, which means embedding it in an appliance and inserting it into the data path somewhere.  This sort of thing is still possible in a virtual world, but it kind of violates the abstraction-driven spirit of virtualization, not to mention creating some very practical issues around how you’d stick a real box into a virtual process.  If we have compute as a service and networking as a service, you should logically be able to get DPI-as-a-Service.  The question is how, and what the outcome might do for us, and the “distributability” of Qosmos offers some insights here.

Qosmos’ material shows a diagram of an NFV-ish framework with OpenStack on top, operating through an SDN controller, to nodes underneath that might be OVS or something else that the controller can manage.  In this structure, you could apply DPI at any level from network applications (VNFs for example) deployed by OpenStack, to applications inside the controller, down to stuff inside the nodes that are being controlled.  So the high-level framework presented here is really similar to a server resource pool, and that’s the important point to consider.

DPI as Qosmos presents it is a bit like infrastructure, something you can harness virtually.  There are places where you can utilize DPI and cast that utilization into a specific mission, but also places where DPI is potentially as flexible an abstraction as “routing” or “switching”.  That’s why I think you have to consider it as a virtual element and the basis for an as-a-service modeling to get the most bang for your buck.  You can still use DPI components in more static applications, of course, to apply policies from above or gather information from below, but the static model is a subset of the DPI-as-a-service picture, and the most general is the most valuable in a virtual world.

In a DPIaaS model, you would presume that a network/cloud was populated with a bunch of places where DPI functionality was available, much as it’s populated with OVS functionality.  In fact, a lot of the DPI fabric might well be created because we’ve installed plugins in OVS or used the DPI libraries to add features to NICs or cloud host OS and middleware stacks.  In any given service created by virtual means (orchestration) we presumably have a model that describes each element in the service and how it’s deployed.  As we deploy stuff, we first look at policy and attributes of available resources to decide where to place/route something, then record where we put it.
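
A toy placement routine shows how DPI availability can be just one more attribute weighed in the placement decision and then recorded for later use (hypothetical names throughout, not any particular orchestrator’s API):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Host:
    name: str
    free_cpu: int
    dpi_capable: bool      # e.g. an OVS with a DPI plugin is present

def place(hosts: List[Host], cpu_needed: int, prefer_dpi: bool) -> Optional[Host]:
    """Pick a host for a service component; treat DPI availability as one of
    the placement attributes, so 'DPI in waiting' is there if we need it."""
    candidates = [h for h in hosts if h.free_cpu >= cpu_needed]
    if prefer_dpi:
        with_dpi = [h for h in candidates if h.dpi_capable]
        candidates = with_dpi or candidates       # fall back if none qualify
    return max(candidates, key=lambda h: h.free_cpu, default=None)

pool = [Host("h1", 8, False), Host("h2", 4, True), Host("h3", 16, True)]
chosen = place(pool, cpu_needed=4, prefer_dpi=True)
print(chosen.name)        # h3: DPI-capable and the most headroom
# Record the placement so later monitoring requests know where taps exist.
placements = {"video-optimizer": chosen.name}
```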

The availability of DPI can be considered in our selection of resource placements, and we could also compose an explicit DPI virtual function into one of the components of a service.  In the latter case, the service role of the DPI element is explicit.  In the former case, we have a kind of “DPI in waiting”, there if we want to use it because we considered the chance of needing it in our decision of where to deploy something.

So let’s say that we have our vast web of Qosmos DPI stuff out there embedded in OVSs and other stuff, and that we have also composed DPI in some way (we’ll get to that) into services.  How do we use it?

One easy application is intrusion detection and prevention (IDP).  We have a place where we think something bad is getting into the network.  We could do a tap on that to look for packets addressed to our attack target to see if they’re emitted from a given place.  That means letting OVS replicate a flow and direct it to Qosmos DPI, where we look for the attack signature.  If we find it we move to Stage Two.  That means either shutting down the port, or if we can’t because it’s shared, perhaps injecting DPI into the main flow to block the attack.
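
Sketched in the simplest possible terms (a regex standing in for a real DPI classifier, plus made-up addresses), Stage One looks something like this:

```python
import re

# A couple of illustrative attack signatures (payload patterns); real DPI
# engines classify far more richly than a regex match.
SIGNATURES = {"sql-injection": re.compile(rb"union\s+select", re.IGNORECASE)}

def inspect_mirrored_flow(packets, target_ip: str):
    """Stage One: look only at the mirrored copy of traffic toward the
    suspected target and report which signatures fire, and from where."""
    hits = []
    for pkt in packets:
        if pkt["dst"] != target_ip:
            continue
        for name, pattern in SIGNATURES.items():
            if pattern.search(pkt["payload"]):
                hits.append((name, pkt["src"]))
    return hits

mirrored = [
    {"src": "203.0.113.9", "dst": "10.1.1.5", "payload": b"GET /?q=1 UNION SELECT *"},
    {"src": "198.51.100.2", "dst": "10.1.1.5", "payload": b"GET /index.html"},
]
findings = inspect_mirrored_flow(mirrored, "10.1.1.5")
if findings:
    # Stage Two: either shut the offending port or insert DPI inline to block.
    print("block or divert:", findings)
```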

Monitoring as a service can work in a similar way, but let’s make it complicated to improve the example.  We have a service that has a problem and the NOC wants to look at a particular packet stream.  We know from the resource allocation where this particular stream goes (perhaps we get that from the OpenFlow controller), and we also have a map of where along that path we might have DPI points or taps where something could be diverted to DPI.  So our management system shows our NOC expert the map and says “Where do you want to look?”  When the operations person responds, the software sends a “Monitor-This” request to our MaaS service, and it responds by showing the NOC what they want to see at the point where they want to see it.
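
Again in toy form (hypothetical node names, and a print standing in for the actual MaaS API call), the logic is just an intersection of the flow’s path with the DPI-capable points, followed by a “Monitor-This” request at whichever tap the NOC picks:

```python
def tap_points_for(path, dpi_nodes):
    """Given the path a service flow takes (from the controller) and the set
    of nodes where DPI is available, list the places the NOC could look."""
    return [hop for hop in path if hop in dpi_nodes]

def monitor_this(flow_id: str, tap: str):
    # Stand-in for the request a management system would send to the MaaS
    # service; in practice this would be an API call, not a print.
    print(f"MaaS: mirror flow {flow_id} at {tap} to the NOC console")

service_path = ["edge-7", "agg-3", "core-1", "dc-gw-2"]
dpi_capable_nodes = {"agg-3", "dc-gw-2"}

choices = tap_points_for(service_path, dpi_capable_nodes)
print("Where do you want to look?", choices)
monitor_this("svc-1234/video", choices[0])     # the NOC picks the first tap point
```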

What I think this shows is that when you start virtualizing stuff, it’s hard to stop.  You lose the benefit of a virtual environment any time you constrain it, and management, monitoring, policy control and other “non-service” elements are things that constrain as much as fixed boxes do, but because they’re behind the scenes in a functional sense we miss them.  That’s bad, because we may miss opportunities that way too.

Half Magic?

Progress is always a combination of evolution and revolution, adaptation of the current situation for the next progressive step versus leaping boldly out of line and sprinting to the finish.  Revolution provokes change and fright.  Evolution provokes comfort and stagnation.  Thus, as a practical matter, anything that gets announced could be viewed as either “revolutionary evolution” or “evolutionary revolution”.  We either accept a radical goal and try to get to it in a non-threatening way, or we accept a radical step toward a pedestrian goal.  So which is Ericsson’s notion of “network slicing” unveiled at BBWF?  Let’s see.

What Ericsson talks about is a future where services are sliced out of infrastructure using SDN principles, linked with dynamic virtual functions and cloud components, and then projected to users through highly automated lifecycle processes.  If you put the goal that way, it’s not only pretty revolutionary, it’s very congruent with my own view of the network’s future.

I think that all the SDN hype and the current SDN activity demonstrate that SDN has a value that’s greatest where the software rubber meets the infrastructure road.  Applications, users, and services are highly agile in their expectations because all of them are framed by momentary need.  We have, from the dawn of data networking, framed services based on collectivizing users.  You don’t network people, you network sites, and while Ginormus, Inc. may have a highly variable number of workers with variable needs, they probably don’t sprout sites like a blossoming peach and then drop them in a few days.  As we’ve moved more toward explicit worker empowerment, we’ve started to collectivize more on applications than on sites, and so we’ve taken some steps on the path to agility.  SDN can help.

Software-connection overlays are great ways to partition connectivity.  They have little or no capital cost and they’re really focused either at the user edge or at the application edge, where we can presume software is available to direct connectivity because there are computers there to run it.  I characterize a vendor’s SDN strategy primarily based on whether it supports both ends, only because you can’t (as a friend of mine noted recently) be “half-agile”.

The other interesting point is that once you take the connectivity mission and assign it to software, you’ve created some natural SDN domains.  Applications are like tenants, and so are groups of cooperating workers in branch locations—even customers and suppliers.  We can’t scale SDN today to the level of the Internet (and we may never do that) but we can definitely scale it to the scope of an application or a cooperative set of workers.

So what about all that lower-level stuff?  My contention is that you don’t want agile SDN there.  Imagine your neighbor deciding to make their video better by reallocating the access capacity that you happen to share with them.  Hackers could forget petty disruptions and bring everything crashing down in minutes.  There has to be a strict policy boundary around shared resources.  You can divvy up your own stuff any way you like, but you don’t mess outside your own litter box.

So what does this mean with respect to the Ericsson vision?  I think they captured the goal, but I’m not sure that they captured the road to it, or even demonstrated they know where that road is, or that they can get us onto it.  Ericsson is hardly a household name in data centers.  Their practice is more focused on mobile than anything else, and it’s not clear how this vision could be deployed in mobile infrastructure without any of the data center support that any agile IT-based service or software or feature demands.

If you look at Ericsson’s own progress with things like SDN, they’ve focused more at those lower layers, in no small part because they really don’t have a horse in the race for software-overlay SDN.  But if you don’t have that key element, the cornerstone of agility, isn’t it going to be hard to make this vision a reality?  Particularly if you also don’t have the cloud-data-center collateral.  Ericsson may have stated the dream, but they also stated the obvious in a slightly different and certainly more clever way.  Their credibility in their vision depends on having some clear path to execution, and that isn’t a multi-application card in an edge router—the primary thing Ericsson seemed to be linking to their execution.

This isn’t a good time for Ericsson to be singing without a backup band.  The company’s earnings just came out and they were disappointing in both the profit and revenue areas.  Like other “bit-belt” companies, Ericsson has been moving more to professional services and perhaps that’s what they believe will be their mechanism to achieve their vision.  Nobody thinks operators will make a transition to SDN, NFV, and the cloud alone.  But will they look to vendors who have most of the pieces of equipment and software, or those who have to get them from someone else?  Someone who, not surprisingly, has professional services aspirations of their own.

Remember my quote that you can’t be “half-agile?”  When I was a kid, I read a book called “Half-Magic” that was about a group of kids who found a magic coin that would give you some arbitrary half of whatever you wished for.  If the kids wanted to be safe and warm in LA, they’d end up half-way there.  I have to wonder if somebody at Ericsson may have read that same book.

Facing the Future…Sort Of

The future is one of those things that always gets ahead of us.  Companies today are particularly challenged with futures because financial practices focus them on quarterly results.  Still, while the future may be redefined daily and may be hard to come to terms with, it’s either there for us all or we don’t have much of a future to worry about.  Yesterday we had two indications of that impending future in two different areas, and it’s worthwhile to look at them both.

Juniper reported its quarterly numbers, and followed what’s been a bit of a pattern for the company in reporting a slight beat on the current number and somewhat downbeat guidance.  Analysts liked the fact that Juniper was somewhat bullish about carrier capex, but they were concerned that Juniper was clearly still struggling in the enterprise and thus at risk to becoming a pure-play carrier networking company.  None of those companies deliver anything like the P/E multiples that Juniper does, so if Juniper can’t somehow fix something then the stock price would be in for an agonizing reappraisal.

I want to frame the Juniper situation in the carrier space with a quote from the call, from Kevin Johnson the CEO:  “The response we’re getting from customers indicates our SDN and network function virtualization strategy continues to resonate with them. SDN is a growth opportunity for us in both networking systems and software.”  This is interesting first because so far the only vendor with a resonant NFV strategy that I’ve heard about is Alcatel-Lucent.  Second, carriers see SDN and NFV (the latter in particular) as a means of reducing capex, and of course Juniper is the kind of company on whom the operators are spending.

It is true that many operators don’t really see capex reduction as the driver for SDN and NFV these days.  One told me that if they wanted 20% savings they’d simply beat up Huawei.  But if you’re going to offer service agility and operational savings as the alternative benefits of SDN and NFV, you have to climb above the current level of articulation—whether you’re Juniper or somebody else.  Either of these benefits is mostly a matter of operationalizing, not building, and none of the network vendors—even Alcatel-Lucent—have a really strong story on the operational side of the problem.

Kevin Johnson talked more about SDN and Contrail than anything else, and yet SDN today tends to be stuck in the data center, lacking any strong end-to-end message.  Contrail, in my view, could offer a lot as an end-to-end SDN architecture but it’s not being positioned that way, likely because end-to-end SDN has the potential to vastly undermine the spending on traditional routers and switches.  You can build an agile “connection SDN” layer on top of very low-level infrastructure based on “transport SDN” and almost forget the middle layers.  That’s particularly true if you adopt NFV on a large scale because you’ll have a lot of functionality running as software in servers.  So, arguably, Juniper is working to protect its current earnings (as in one sense it must) while trying to address where future earnings will come from.

The biggest thing for me in the call wasn’t what was said, though.  It was who didn’t say it.  For several quarters now, Johnson has trotted out his Microsoft compatriot Bob Muglia to talk about the progress of Juniper, even in areas where Muglia has neither experience nor direct responsibility (he’s head of software).  Yesterday, a platform guy was on the call instead.  I don’t want to read too much into this, but it may mean that Juniper has decided that people who understand application software and PC operating systems may not be the “software guys” that Juniper needs.  It may also mean that Muglia will not succeed Johnson (that likely would have happened by now if it were the plan, in any case) and calls into question whether he’ll stay around.  I think Juniper needs to take a fresh look at things.  They have all the assets Johnson talks about, they only lack the most important asset of all—the ability to innovate and inspire at the same time.

Apple has always had that asset, and it announced what most agree is another evolutionary shot at the tablet space with its new stuff.  Price points don’t change radically, but they change enough to worry some analysts who fear that Apple will be drawn into a price war.  But you can’t invent a new wearable tech item every year; buyers would be festooned with glowing gadgets to the point where they’d irradiate themselves and endanger air travel with all the lights.  They have to do some consolidating, but of course they have to innovate too.

Where that may finally be happening is in the Mac side.  Analysts dismissed the notion of offering free software updates to harmonize the OS across all the models and get everyone on the same page, but that would be a critical first step in creating a true Apple ecosystem.  Microsoft knew when they entered the tablet space that they somehow had to pull the Windows franchise through into the mobile future.  Apple knows that too, and they think (likely correctly) that they’re in a better position to do that.  If everyone runs the same basic software, it helps users of multiple devices a bit, but the real value is that it enables a common framework of tools to support everything.

Where is that framework?  In the cloud, even though Apple isn’t saying that.  Apple is in the position of being able to create a kind of virtual user agent that lives in every Apple device and in the Apple Cloud, and that migrates functionality seamlessly through all of this.  They can create the point-of-activity empowerment I’ve been harping on, because they really have more of the pieces than anyone, including Microsoft.  I can’t say what the Apple people really believe and what they’ll really do, but they could do something truly revolutionary here.

So could Juniper.  So could all of the competitors of both companies.  We in the tech space are on a train that’s broken, that’s slowing down on a siding as a fresh new one overtakes us to pass.  There’s a time in such a situation where the new is going just slow enough and the old still moving just fast enough that you can jump safely between them.  Jump, Apple and Juniper.  It’s not going to get easier later on.

The Fast Track and the Waterhole

We’re getting some news that suggests that maybe we need to think not more about mobility, but less—in at least one sense.  It’s not the mobile user that’s changing things, it’s the migratory user.

Mobility means moving, and in terms of devices and services it means stuff directed at someone who is consuming on the go.  A mobile user fits their services into the context of their life, and that context is as variable as where they are or what they’re doing.  This is a sharp contrast to the user who goes to a laptop or desktop to do something; the device is essentially setting context and not framing it.  That’s why we have such a problem conceptualizing mobile services—they’re very different from what we’ve had.

What the news today may be showing is that the in-betweens are the battleground.  Imagine for the moment a tablet user, one of the new iPads perhaps.  Our user wants to watch something on their high-res glitzy screen and have others faint in admiration.  Do they sit in their car and try to grab a glance between steering and braking?  Do they walk down the street, bumping into those they hope to impress as they struggle to keep their eyes on the screen?  Or do they sit down in Starbucks where the admiring hordes are waiting for them?  Our user is “mobile” only between resting points—they’re migrating.

Netflix saw profits triple, and you have to believe that a big part of their growth comes from serving content to migratory viewers.  There’s a lot wrong with broadcast TV these days (a zillion commercials being one, stale content and short production seasons another), but despite that Verizon showed tremendous growth in FiOS.  Yes, they could be taking market share, but if there were millions out there eager to view only streaming video, it would be hard to create market-share gains large enough to offset the masses abandoning channelized video in favor of streaming.

It’s not happening, but what is happening is that people are viewing where they can’t drag their TVs.  That entertainment void has to be filled by somebody, and so Netflix is picking up slack that’s been created by a late and fairly insipid program of TV Everywhere for the channelized providers.  Sit in Starbucks, grab a cup of Joe or Johan or whatever, and amaze the masses with video on your new tablet.  Video from Netflix.

Ah, but now we come to the question that’s facing networking in general and mobile networking in particular.  Our tablet viewer in Starbucks doesn’t need LTE, they need WiFi.  They don’t need roaming, they don’t need voice calling, they don’t need a ton of stuff that we’ve focused on building since the late 1990s.  They need video and IM, which they can get easily from Netflix and Google and Amazon and Skype…over WiFi.

Today we hear that Comcast is going to be expanding their WiFi cell testing and checking out the notion of covering the waterfront, service-wise, with small WiFi cells.  They don’t have to cover the vast landscape, only the waterholes.  I calculate that only about 7% of the geography in a city and a statistically insignificant portion of the suburban/rural landscape would have to be covered in WiFi cells to capture every hangout location, every site where an eager viewer could sit down and view instead of driving or walking, options that would likely result in exiting the gene pool eventually.  Comcast could win in mobile by winning the migration.

This could have truly profound implications for the services of the future.  If content is a migratory application with WiFi delivery then the LTE market is much harder to make profitable, because incremental content services can’t easily be sold.  If OTTs can bypass mobile networks to reach the customer, they don’t create traffic that forces operators into usage pricing or declining mobile revenue per bit.  We have a bunch of new and competing forces that would influence the design of networks in that all-critical metro space.

There are vendor impacts too.  We are now pursuing a mobile vision in an age of smartphones that was first articulated before the bubble, back in the ‘90s.  If you confronted a smart service architect with the problem of mobile services today, what they would create would bear no resemblance whatsoever to what we have in IMS.  We’d likely create something that looks a lot like Metaswitch’s Clearwater, in fact, a software-filled black box that has a few simple interfaces and scales over a vast range of performance and geographic extents.

I’m not dissing mobile and mobility.  What I’m saying is that every critter who moves around the ecosystem isn’t in a frantic and continuous race, creating nothing but a blur of motion.  Even the Road Runner must sit down for a coffee at some point.  We can couple technology to our every move, and also couple it to us when we’re not moving but not home either.  There’s a full spectrum of stuff that we might want to do, entertainment we might want to enjoy, and we will be able to get it everywhere.  Well, “get it” in the sense that delivery will be possible.  Whether consumption will be effective is another matter, and that means that we have to look at the “mobile user” not as a new homogeneous alternative to the home user, but rather as a coalition of interests.  Some of those interests will be more profitable to support and have lower deployment costs, and it’s those that will likely drive the changes in the near term.

We plan networks for four years and justify them financially every three months.  That means that niches like the waterhole might be the real future.

Guilt by Association and Lockstepism

One of the things I found interesting about the SDN World Congress last week was that it asserted, in effect, that the whole wasn’t the sum of the parts, but rather that one part was good enough to make it.  Anyone who had “network”, “function” or “virtualization” in any form erected the great banner of NFV for people to flock to.

An operator told the story of going to various booths to see their NFV strategy, and asking at some point “Where does this happen?” in reference to some specific NFV requirement.  They were told “Oh, that’s in the higher layer”.  SDN déjà vu, right?  All the world’s computing power, knowledge, and software skill couldn’t possibly cover all the stuff we’ve already pushed north of those northbound APIs.  Now NFV is doing the same thing.

The “why” of this is pretty simple.  Besides the universal desire to get maximum PR benefit with minimal actual product collateralization, this is a hard problem to solve.  I’ve talked about it from the network side in the past, and most of you who read this know that the CloudNFV project that I’m Chief Architect for says it solves it.  Perhaps knowing how we do that will help understand why there are so many people raising their eyes up toward that higher layer.

Virtualization demands mapping between abstraction and reality, and the more complicated and unbounded the need for abstract things is, the more difficult it is to realize them on resources.  It’s easy to make a virtual disk, harder to make a virtual computer, and darn hard to make a virtual network—because there are a lot of pieces in the latter and because there are a lot of things you’d have to ask your network abstraction to support.

When I looked at this problem first about a year ago, when the NFV concept was launched, I knew from running my own open-source project in the service layer that the architecture to fulfill NFV could be defined, but that implementation of that architecture from “bare code” would be very difficult.  There are too many pieces.  What made CloudNFV from a concept into a project was the fact that I found an implementation framework that resolved the difficulties.

In software terms, the challenge of both SDN and NFV is that both demand a highly flexible way of linking data and process elements around service missions.  We always say we’re moving toward that sort of agility with things like service-oriented architecture (SOA) and cloud databases like Hadoop, but in fact what we’re doing isn’t making our architecture agile, it’s making a new architecture for every new mission.  We’re stuck in two things: guilt by association in data management and lockstepism in the processes.

We wouldn’t read as much about unstructured data as we do if we didn’t need structure to conceptualize data relationships.  Look at a table, any table, for proof.  The problem is that once you structure data, you associate it with a specific use.  You can’t look something up by a field that’s not a key.  When we have multiple visualization needs, we may be able to create multiple views of the data, but that process is really one of breaking down our “defaults” and creating something else, and it doesn’t work well for large data masses and for real-time missions.

On the process side, the whole goal of componentization and SOA has been to create what software geeks call “loose coupling” or “loose binding”.  That means that pieces of a process can be dynamically linked into it and that as long as the new pieces have the required functionality, they fit, because we’ve described a bunch of stuff (through WSDL for example) to make sure of that.  But look inside this and what you find is that we’ve taken business processes and linked them to IT processes by flowing work along functional lines.  Our “agile processes” are driven like cattle in a pen or people marching in formation.  We have in-boxes and out-boxes for workers, so we give that same model to our software.  Yet, as anyone who works knows, real work is event-driven.  There is no flow, except in a conceptual sense.  We have the in- and out-boxes to reflect not the needs of the business but the nature of our work distribution.  We did the same with SOA.

When I found EnterpriseWeb’s stuff, I saw a way out of this double-mess.  They have a flat and structureless data model and a process model that’s effectively a bunch of functionality floating out there in space with no particular connection set or mission.  To this seeming disorder, we bring a semantic layer that allows us to define bindings by defining how we want something to be done.  Data and processes are linked with each other as needed.  The context of doing things—the in- and out-boxes—is defined by the semantics and maintained in that layer, so that when we have to do something we marshal whatever we need and give it a unit of work to do.  You can scale this dynamically to any level, fail over from one thing to another, load-balance and scale in/out.  All of that is automatic.  And since the contextual links represent the way work was actually done, we can journal the links and capture the state of the system not only now but at any time in the past too.
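
To show what that pattern looks like in the abstract (a toy illustration of loose, semantic binding, emphatically not EnterpriseWeb’s API), think of processes as free-floating functions tagged with what they can do, data as a flat set of tagged facts, and a dispatcher that binds the two per event rather than wiring a fixed workflow:

```python
processes = []   # (set-of-capability-tags, callable)

def offers(*tags):
    """Register a free-floating process by the capabilities it offers."""
    def register(fn):
        processes.append((set(tags), fn))
        return fn
    return register

facts = [  # flat, structureless store: every fact is just tags plus a value
    ({"service", "vFirewall", "capacity"}, 80),
    ({"service", "vFirewall", "threshold"}, 75),
]

def lookup(*tags):
    want = set(tags)
    return [value for fact_tags, value in facts if want <= fact_tags]

@offers("scale", "vFirewall")
def scale_out(event):
    load, limit = lookup("vFirewall", "capacity")[0], lookup("vFirewall", "threshold")[0]
    if load > limit:
        print("binding chose scale_out: spinning up another vFirewall instance")

def dispatch(event, *tags):
    """Bind whatever processes match the event's semantic tags, when needed."""
    want = set(tags)
    for capability, fn in processes:
        if want <= capability:
            fn(event)

dispatch({"type": "load-report"}, "scale", "vFirewall")
```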

This is how we made CloudNFV (which runs, and was demoed to operators in Frankfurt last week) work.  We have abstracted all the data and semantics of the NFV application into “Active Virtualization”, whose virtualizations are those semantic-layer overlays that define what’s supposed to happen under the conditions NFV poses.  We can make this structure look like whatever anyone wants to see—management can be expressed through any API because any data model is as good as any other.  We can make it totally agile because the way that something is handled is built into a semantic model, not fossilized into fixed data and software elements.  It’s taking an old saw like “You are what you eat”, which implies your composition is set by a long and complicated digestion-based sequence, and changing it to “You are what you want”, and wants don’t have much inertia.

Orchestration is more than OpenStack.  Management is more than SNMP.  NFV is more than declaring something to be hostable.  SDN includes the stuff north of the APIs.  We do the industry a disservice when we shy away from requirements for any reason.  It’s not necessary that we do that.  Six companies, none of them network giants, have built a real NFV and there’s no excuse for the market to settle for less from any purported supporter of the technology.