Sub-Service Management as a Long-Term SDN/NFV Strategy

For my last topic in the exploration of operator lessons from early SDN/NFV activity, I want to pursue one of vendors' favorite trite topics: "customer experience".  I watched a Cisco video on the topic from the New IP conference, and while it didn't IMHO demonstrate much insight, it does illustrate the truth that customer experience matters.  I just wish people did more than pay lip service to it.

Customer experience management in a service sense is a superset of what used to be called SLA management, and it reflects the fact that most information delivered these days isn’t subject to a formal SLA at all.  What we have instead is this fuzzy and elastic conception of quality of experience, which is the classic “I-know-it-when-I-see-it” concept.  Obviously you can’t manage for subjectivism, so we need to put some boundaries on the notion and also frame concepts to manage what we find.

QoE is different from SLAs not only in that it’s usually not based on an enforceable contract (which, if it were, would transition us to SLA management) but in that it’s more statistical.  People typically manage for SLA and engineer for QoE.  Most practical customer experience management approaches are based on analytics, and the goal is to sustain operation in a statistical zone where customers are unlikely to abandon their operator because they’re unhappy.  That’s a very soft concept, depending on a bunch of factors that include whether the customer was upset before the latest issue and whether the customer sees a practical alternative that can be easily realized.

Sprint and T-Mobile have launched campaigns that illustrate the QoE challenge.  If I believe that some significant percentage of my competitors' customers (and likely my own as well) are dissatisfied with service but unwilling to go through the financial and procedural hassle of changing, then I'll make it easy for competitors' customers to change—even give them an incentive.  Competition is the goad behind customer experience management programs; if your competitor can induce churn then you have a problem despite absolute measurements.

Operators recognize that services like Carrier Ethernet are usually based on recognizable resource commitments, which means that you can monitor the resources associated with the service and not just guess in a probabilistic sense what experience a user has based on gross resource behavior.  In consumer services there are no fixed commitments, and so you have to do things differently and manage the pool.

NFV, according to operators, has collided with both practice sets.  For business services, dynamic resource assignment and automated operations are great, but they introduce new variables into the picture.  With business services, NFV is mostly about deriving service state from virtual resource state.  That’s a problem that can be solved fairly easily if you look at it correctly.  The consumer problem is different because we have no specific virtual resource state to derive from.

What operators would like to avoid is “whack-a-mole” management where they diddle with resource pool behavior to achieve the smallest number of complaints.  That sort of thing might work if you could converge on your optimum answer quickly, and if resource state was then stable enough that you didn’t have to keep revisiting your numbers.  Neither is likely true.

One possible answer that operators are looking at, but have not yet been able to validate in a full trial, is correlating service and resource analytics.  If you have a quirky blip on your resource analytics dashboard, you could presume with fairly low risk of error that service issues at that time were correlated with the blip.  Thus, you could work to remedy the service problems by remediating the resource blip, even if you didn't understand the full causal relationships.  The barrier to this mechanism is not only that it's hard to test the correlations today; it's that it's not even easy to gather the service-side analytics.  Measuring QoE, you'll recall from earlier comments, is like measuring "windy": it's in the eye of the beholder.

Most of the operators I’ve talked with are now of the view that NFV management, SDN management, and probably management overall, is going to be driven by the same notions (QoE substitutes for SLA, multi-tenancy substitutes for dedicated, virtualized substitutes for real) into the same path and that they need a new approach.  A few of the “literati” are now looking at what I’ll call “sub-service management”.

Sub-service management says that a “service” is a collection of logical functions/behaviors that are individually set to at least a loose performance standard.  The responsibility of service automation is to get each functional element to conform to its expectations.  Each element is also responsible for contributing a “management view” in the direction of the user, perhaps in the simple form of a gauge that shows red-to-green transitions reflecting non-conforming to beating the specifications.

If something goes wrong with a sub-service function we launch automated processes to remediate, and at the same time we look at the service through the user-side management viewer to see if something visible has gone bad.  If so, we treat this as a QoE issue.  We don’t try to associate user service processes with resource remediation processes.
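To make the idea concrete, here's a minimal sketch (my illustration, not any operator's implementation) of a sub-service element that holds a loose performance expectation, derives a red/amber/green "gauge" for the user-side view, and invokes an automated remediation hook on the resource side.  The metric, thresholds, and remediate() hook are all hypothetical:

    # Hypothetical sketch of a sub-service functional element with a
    # user-facing "gauge" and an automated remediation hook.
    class SubService:
        def __init__(self, name, target_latency_ms, remediate):
            self.name = name
            self.target = target_latency_ms   # loose performance expectation
            self.remediate = remediate        # automated process to invoke on "red"

        def gauge(self, observed_latency_ms):
            """Map observed behavior to a red/amber/green management view."""
            if observed_latency_ms <= self.target:
                return "green"                # meeting or beating expectations
            if observed_latency_ms <= self.target * 1.5:
                return "amber"                # degraded but arguably tolerable
            return "red"                      # non-conforming: a QoE issue

        def on_sample(self, observed_latency_ms):
            color = self.gauge(observed_latency_ms)
            if color == "red":
                self.remediate(self.name)     # fix the resource side...
            return color                      # ...and report the user-side view

    # The user-side viewer and the resource-side remediation are deliberately
    # decoupled, as described above.
    cache = SubService("content-cache", target_latency_ms=40,
                       remediate=lambda name: print(f"remediating {name}"))
    print(cache.on_sample(95))   # prints "remediating content-cache" then "red"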

The insight of sub-service management is that if you aren’t going to have fixed, dedicated, resource-to-service connections with clear fault transmission from resource to service, then you can’t work backwards from service faults to find resource problems.  The correlation may be barely possible for business services but it’s not possible for consumer services because the costs won’t scale.

There are barriers to sub-service management, though.  One is that we don’t have a clear notion of a service as a combination of functional atoms.  ETSI conflates low- and high-level structuring of resources and so makes it difficult to take a service like “content delivery” and pick out functional pieces that are then composed to create services.  And because only functionality can ever be meaningful to a service user, that means it’s hard to present a user management view.  Another is that there is no real notion of “derived operations” or the generation of high-level management state through an expression-based set of lower-level resource states.

I don’t think that it will be difficult to address any of these points, and I think the only reason why we’ve not done that so far is that we’ve focused on testing the mechanisms of NFV rather than testing the benefit realization.  As I’ve said in earlier blogs, the focus of PoCs and trials is now shifting and we’re looking at the right areas.  It’s just a matter of who will come up with an elegant solution first.

What Operators Think about Service-Event versus Infrastructure-Event Automation

I’m continuing to work through information I’ve been getting from operators worldwide on the lessons they’re learning from SDN and NFV trials and PoCs.  The focus of today is the relationship between OSS/BSS and these new technologies.  Despite the fact that operators say they are still not satisfied with the level of operations integration into early trials, they are getting some useful information.

One interesting point that was clear from the first is that operators see two different broad OSS/BSS-to-NFV (and SDN) relationships emerging.  In the first, the operations systems primarily handle what we would call service-level activities.  The OSS/BSS has to accept orders, initiate deployment, and field changes in network state that would have an impact on service state.  In the second, we see the OSS/BSS actually getting involved in lower-level provisioning and fault management.

There doesn’t seem to be a strong correlation between which model an operator thinks will win out and the size or location of the operator.  There’s even considerable debate in larger operators as to which is best, though everyone said they had currently adopted one approach and nearly everyone thought they’d stay with it for the next three years.  All this suggests to me that the current operations model evolved into existence based on tactical moves, rather than having been planned and deployed.

There is a loose correlation between which model an operator selects and the extent to which that operator sees seismic changes in operations as being good and necessary.  In particular, I find that operators who have pure service-level OSS/BSS models today are most likely to be concerned about making their systems more event-driven.  Three-quarters of all the operators in the service-based-operations area think that's necessary.  Interestingly, those who don't think so seem to be following a "Cisco model" of SDN and NFV, where functional APIs and policy management regulate infrastructure.  That suggests that Cisco's approach is working, both in terms of setting market expectations and in fulfilling early needs.

The issue of making operations event-driven seems to be the technical step that epitomizes the whole “virtual-infrastructure transition”.  Everyone accepts that future services will be supported with more automated tools.  The question seems to be how these tools relate to operations, which means how much orchestration is pulled into OSS/BSS versus put somewhere else (below the operations systems).  It also depends on what you think an “event” is.

Most operations systems today are workflow-based systems, meaning that they structure a linear process flow that roughly maps to the way "provisioning" of a service is done.  While nobody depends on manual processes any longer, they do still tend to see the process of creating and deploying a service as a series of interrupted steps, with each interruption representing some activity that has to signal its completion.  What you might call a "service-level event" represents a service-significant change of status, and since these happen rarely it hasn't proved difficult to take care of them within the current OSS/BSS model.

The challenge, at least as far as the "event-driven" school of operations people is concerned, lies in the extension of software tools to automatic remediation of issues.  One operator was clear:  "I can demonstrate OSS/BSS integration at the high level of the service lifecycle, but I'm not sure how fault management is handled.  Maybe it isn't."  That reflects the core question: do you make operations event-driven and dynamic enough to envelop the new service automation tasks associated with things like NFV and SDN, or do you perform those tasks outside the OSS/BSS?

This is where I think the operators’ view of Cisco’s approach is interesting.  In Cisco’s ACI model, you set policies to represent what you want.  Those policies then guide how infrastructure is managed and traffic or problems are accommodated.  Analytics reports an objective policy failure, and that triggers an operations response more likely to look like trouble-ticket management or billing credits than like automatic remediation.  It’s not, the operators say, that Cisco doesn’t or can’t remediate, but that resource management is orthogonal to service management, and the “new” NFV or SDN events that have to be software-handled are all fielded in the resource domain.

Most operators think that this approach is contrary to the vision that NFV at least articulates, and in fact it's NFV that poses the largest risk of change.  It's clear that NFV envisions a future where software processes not only control connectivity and transport parameters to change routes or service behavior, but also install, move, and scale service functionality that's hosted rather than embedded.  This means that to these operators, NFV doesn't fit in either a "service-event" model or a "resource-based-event-handling" model.  You really do need something new in play, which raises the question of where to put it.

The service-event-driven OSS/BSS planners think the answer to that is easy; you build NFV MANO below the OSS/BSS and you field and dispatch service-layer events to coordinate operations processes and infrastructure events.  This does not demand a major change in operations.  The remainder of the planners think that somehow either operations has to field infrastructure events and host MANO functions, or that MANO has to orchestrate both operations and infrastructure-management tasks together, creating a single service model top to bottom.

I’ve always advocated that view and so I’d love to tell you that there’s a groundswell of support arising for it.  That’s not the case.  In all the operators I’ve talked with, only five seem to have any recognition of the value of this coordinated operations/infrastructure event orchestration and only one seems to have grasped its benefits and how to achieve it.

What this means is that the PoCs and tests and trials underway now are just starting to dip a toe in the main issue pool, which is not how you make OSS/BSS launch NFV deployment or command NFV to tear down a service, but how you integrate all the other infrastructure-level automated management tasks with operations/service management.  This is what I think should be the focus of trials and tests for the second half of 2015.  We know that “NFV works” in that we know that you can deploy virtual functions and connect them to create services.  What we have to find out is whether we can fit those capabilities into the rest of the service lifecycle, which is partly supported by non-NFV elements and overlaid entirely by OSS/BSS processes that are not directly linked with MANO’s notion of a service lifecycle.

I think we may be close to this, and though “close” doesn’t mean “real close”, I think that the inertia of OSS/BSS is working in favor of keeping service events and infrastructure events separated and handling the latter outside OSS/BSS.  Since that’s what most are doing now, this might be a case where the status quo isn’t too bad a thing.  The only issue will be codifying how below-the-OSS orchestration and the OSS/BSS processes link with each other in a way broad and flexible enough to address all the service options we’re hoping to target with NFV.

Feature Balance, Power Balance, and Revolutionary Technologies

Networking and IT have always been "frenemies".  They often compete for budgets in the enterprise, and they have certainly competed for power in the CIO organization.  One of the interesting charts I used to draw in trade show presentations tracked how the two areas were competing for "feature opportunity".  By the year 2010, my model showed, IT would have convincingly owned about 17% of the total feature opportunity, networking 28%, and 55% would still be up for grabs.  Since no market wants to differentiate on price alone, having a feature-opportunity win would be a big boost for technologies, vendors, and the associated political constituencies in the enterprise.

That forecast largely came true in 2010, and networking did gain strength and relevance.  Since then things have been changing.  In 2014, the model said that if you looked at the totality of feature opportunity, networking and IT had cemented about 19% each, and everything else was yet to be committed.  What changed things certainly included the combination of the Internet and the cloud, but these two forces don’t tell the whole story.

The Internet demonstrates that resources can be turned into network abstractions.  All forms of cloud computing tend to make things more network-like for the simple reason that they promote network access to abstract IT features.  That much of the cloud trend could promote networking over IT flies in the face of the shift actually seen.  What made the difference goes back to abstraction, and the details might explain why John Chambers seems to be saying "white boxes win", why IBM might (as reported on SDxCentral) be investing more in SDN, and why EMC might want to buy a network company.

Even before SDN came along, we were seeing a trend toward the abstraction of network behavior in the form of "virtual" networks like VPNs and VLANs.  This trend has tended to reduce differentiation among network vendors by creating a user-level, functional definition for services at L2 and L3.  Sure, users building their own networks could appreciate the nuances of implementation, but functionality drives the benefit case and thus enables consumption.

SDN takes virtualization of networks in a new direction.  By providing abstractions of devices and not just services, SDN makes it more difficult to differentiate even at the level of building networks.  If we assumed that SDN in its pure form went forward and dominated, then "white box" is inevitable at least in a functional sense.  Only what could be specified by OpenFlow could be used to build services.  That's the ultimate in abstraction.

NFV takes another, perhaps more significant, step, along with cloud-management APIs like OpenStack's Neutron.  If you have a means of creating applications and services that consume network abstractions, then anything that realizes those abstractions is as good as anything else.  That's the explicit goal of NFV, after all.  Properly applied, NFV says "You can resolve our abstractions of network services using SDN, but also using anything else that's handy".  It embraces the legacy elements, which limits how much network incumbents can do to stave off commoditization by bucking evolution to new models like SDN.

The interesting thing here is that networking, despite having a lead before, lost ground between 2010 and 2015.  Not lost in terms of investment but lost in terms of feature-value leadership.  Perhaps even more interesting is that IT didn’t gain, it also lost.  The gainer was the “in-between”, and I think that’s the most important lesson to learn here.

Virtualization is the general trend at work here.  It’s a combination of abstraction and instantiation, intended in large part to promote resource independence.  Abstraction reduces everything to functionality.  Functionality is a slave to demand, not to supply, and abstraction’s very goal of resource independence shouts “Hardware doesn’t matter!”  The important thing my modeling shows is that both IT and networking are losing, and nobody grabs for a lifeboat like a drowning man.  Thus, it’s abstraction that I think is behind the news items I cited.

EMC, whose VMware unit acquired Nicira, is in a position to abstract everything in physical networking.  A virtual overlay doesn’t care what the underlayment is.  The problem that they have is that even if the “undernet” is anonymous, it still has to be something.  So it makes sense for EMC to think about buying a network company to get some real gear.  If they don’t then a vendor who offers real equipment might well offer virtual-overlay software too.  A vendor like Cisco.  Chambers knows that overlay wins, but he dares not to say “overlay” because everyone will then think VMware/EMC.  So he says “white box”, and commits his own version of abstraction.

For a vendor on the IT side like IBM, the smart play is to abstract the network stuff as completely as possible.  So IBM is an OpenDaylight champ, and it continues to develop OpenDaylight even though it seems to have no clear SDN story or mission of its own.  It doesn’t need one to win; it only has to make sure that a network abstraction wins.

Making network abstraction win means making sure software wins, because ultimately IBM is now a software company.  Hardware, network or IT, is more of a risk than anything else.  A giant hardware player can still hold its own against a giant software player because you can't run software in thin air, or overlay nonexistent infrastructure.  So IBM has to fight not only Cisco but also EMC and HP, perhaps even more than it has to fight Oracle and Microsoft.  Why?  Because software plus hardware will beat software alone, mostly because the majority of spending will still be on the platform and not on the software.  The company that has both can sustain sales presence and control, at least in the near term.

Even in the long term, how commoditized can hardware be?  We’ve had standard-platform x86 “COTS” for decades and while all the vendors would love to see better margins and less competition, there are still competitors and we’re seeing commoditization and consolidation rather than the collapse of the hardware space.  Chambers’ view of a white-box future may be likened to a story about trolls to keep kids out of the woods.  He may be afraid that he can’t dynamite Cisco into a more software-centric stance without making it clear that they can’t just hunker down on the hardware forever.  Whatever he says, though, you still need vendors to make white boxes and to integrate and support the function/host combination.

The interesting thing is that while Cisco and EMC and IBM have been in the stories, they're not the players I think will decide the issues.  Those are HP and Oracle.  HP is perhaps the last real "server" vendor left and Oracle is the real "software" player, if one focuses both categories on the abstraction, SDN, and NFV battleground.  Both HP and Oracle are looking for a strong NFV story.  Both have good middleware credentials, but Oracle has the advantage in middleware.  HP has the advantage with servers, networking, and SDN.  Neither of the two has fully leveraged its assets.  If and when they do they may finally decide who gets to shape the future.

In Search of a Paradigm for Virtual Testing and Monitoring

Virtualization changes a lot of stuff in IT and in networking, for two principal reasons.  One is that it breaks the traditional ties between functionality (in the form of software) and resources (both servers and associated connection-network elements).  The other is that it creates resource relationships that don't map to physical links or paths.  The end result of virtualization is something highly flexible and agile, but also significantly more complicated.

When SDN and NFV came along, one of the things I marveled at was the way that test and monitoring players approached them.  The big question they asked me was "What new protocols are going to be used?", as if you could understand NFV by intercepting the MANO-to-VIM interface.  The real question was how you could gain some understanding of network behavior when all the network elements and pathways were agile and virtual.

Back in the summer of 2013 when I was Chief Architect for the CloudNFV initiative, I prepared a document on a model for testing/monitoring as a service.  The approach was aimed at leveraging the concept of “derived operations” that was the primary outgrowth of the original ExperiaSphere project and the associated TMF presentations, to provide the answer to the real question, which was “How do you test/monitor a virtual network”.  There was never a partner for that phase and so the document was never released, but I think the basic principles are valid and they serve as a primer in at least one way of approaching the problem.

Like ExperiaSphere, CloudNFV was based on "repository-based management", where all management data was collected in a repository and delivered through management proxies and queries against that database, in whatever form was helpful.  A server or switch, for example, would have its MIB polled by an agent that would then store the data (including a time-stamp) in the repository.  When somebody wanted to look at switch state, they'd query the repository and get the relevant information.
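As a rough illustration of the mechanics (mine, not the CloudNFV document's, and assuming a simple in-memory store rather than a real time-series database):

    import time

    # Hypothetical in-memory repository of time-stamped management records.
    repository = []

    def poll_and_store(element, poll_fn):
        """Agent side: poll an element's status and record it with a timestamp."""
        record = {"element": element, "ts": time.time(), "data": poll_fn()}
        repository.append(record)

    def query(element, since=0.0):
        """Proxy side: answer a management query from the repository, not the device."""
        return [r for r in repository if r["element"] == element and r["ts"] >= since]

    # Simulated switch "MIB" poll; in practice this would be SNMP or similar.
    poll_and_store("switch-7", lambda: {"ifInOctets": 123456, "ifOperStatus": "up"})
    print(query("switch-7"))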

What made this operations "derived" was the idea that a service model described a set of objects that represented functionality—atomic like a device or VNFC, or collective like a subnetwork.  Each object in the model could describe a set of management variables whose values derived from subordinate object variables using any expression that was useful.  In this way, the critical pieces of a service model—the "nodes"—could be managed as though they were real, which is good because in a virtual world, the abstraction (the service model) is the "realest" thing there is.
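Here's a minimal sketch of the derivation idea, with invented object and variable names; the only point it illustrates is that a "virtual" object's management variables are expressions over its subordinates' variables:

    # Hypothetical derived-operations rollup: a service object's variables are
    # expressions evaluated over its subordinate objects' variables.
    class ServiceObject:
        def __init__(self, name, children=None, derivations=None):
            self.name = name
            self.children = children or []          # subordinate objects
            self.variables = {}                     # own (atomic) variables
            self.derivations = derivations or {}    # variable name -> expression

        def get(self, var):
            if var in self.derivations:
                # Evaluate the expression against the children's values.
                return self.derivations[var]([c.get(var) for c in self.children])
            return self.variables.get(var)

    # Two "real" VNFC-level objects...
    vnfc_a = ServiceObject("vnfc-a"); vnfc_a.variables["availability"] = 1.0
    vnfc_b = ServiceObject("vnfc-b"); vnfc_b.variables["availability"] = 0.0

    # ...and a "virtual" subnetwork object whose availability is derived.
    subnet = ServiceObject("subnet", children=[vnfc_a, vnfc_b],
                           derivations={"availability": min})
    print(subnet.get("availability"))   # 0.0: only as good as its worst part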

The real solution to monitoring virtual networks is to take advantage of this concept.  With derived operations, a “probe” that can report on traffic conditions or other state information is simply a contributor to the repository like anything else that has real status.  You “read” a probe by doing a query.  The trick lies in knowing what probe to read, and I think the solution to that problem exposes some interesting points about NFV management in general.

When an abstract “function” is assigned to a real resource, we call that “deployment” and we call the collective decision set that deploys stuff in NFV “orchestration”.  It follows that orchestration builds resource bindings, and that at the time of deployment we “know” where the abstraction’s resources are—because we just put them there.  The core concept of derived operations is to record the bindings when you create them.  We know, then, that a given object has “real” management relationships with certain resources.
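A small sketch of that point, with hypothetical names; the only thing it shows is that orchestration writes the function-to-resource binding into the service record as a side effect of the deployment decision:

    # Hypothetical deployment step that records resource bindings as it goes.
    def deploy(service_record, function_name, choose_resource):
        resource = choose_resource(function_name)        # orchestration decision
        service_record.setdefault("bindings", {})[function_name] = resource
        return resource

    service = {"id": "svc-001"}
    deploy(service, "firewall", lambda f: {"host": "server-12", "vm": "vm-3"})
    # Later, management doesn't have to discover where "firewall" lives; the
    # binding was recorded at the moment the decision was made.
    print(service["bindings"]["firewall"])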

Monitoring is a little different, or it could be.  One approach to monitoring would be to build probes into service descriptions.  If we have places where we can read traffic using RMON or DPI or something, we can exercise those capabilities like they were any other “function”.  A probe can be what (or one of the things that) a service object actually deploys.  A subnet can include a probe, or a tunnel, or a machine image.  Modeled with the service, the probe contributes management data like anything else.  What we’d be doing if we used this model is similar to routing traffic through a conventional probe point.

The thing is, you could do even more.  In a virtual world, why not virtual probes?  We could scatter probes through real infrastructure or designate points where a probe could be loaded.  When somebody wanted to look at traffic, they’d do the virtual equivalent of attaching a data line monitor to a real connection.

To make virtual probes work, we need to understand probe-to-service relationships, because in a virtual world we can’t allow service users to see foundation resources or they see others’ traffic.  So what we’d have to do is to follow the resource bindings to find real probe points we could see, and then use a “probe viewer” that was limited to querying the repository for traffic data associated with the service involved.
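Continuing the same illustrative model, a "probe viewer" would follow the service's recorded bindings to the probe points it's entitled to see, and then query the repository only for records tagged with that service.  A hypothetical sketch:

    # Hypothetical scoped "probe viewer": the user sees only traffic records
    # associated with their own service, found via the recorded bindings.
    def probe_view(service_record, repository):
        probe_points = [b for b in service_record.get("bindings", {}).values()
                        if b.get("type") == "probe"]
        allowed = {p["probe_id"] for p in probe_points}
        return [r for r in repository
                if r.get("probe_id") in allowed
                and r.get("service") == service_record["id"]]

    repo = [
        {"probe_id": "p1", "service": "svc-001", "kbps": 880},
        {"probe_id": "p1", "service": "svc-999", "kbps": 120},   # someone else's traffic
    ]
    svc = {"id": "svc-001", "bindings": {"probe": {"type": "probe", "probe_id": "p1"}}}
    print(probe_view(svc, repo))    # only the svc-001 record is visible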

One of the things that’s helpful in making this work is the notion of modeling resources in a way similar to that used for modeling services.  An operator’s resource pool is an object that “advertises” bindings to the service objects, each representing some functional element of a service for which it has a recipe for deployment and management.  When a service is created, the service object “asks” for a binding from the resource model, and gets the binding that matches functionality and other policy constraints, like location.  That’s how, in the best of all possible worlds, we can deploy a 20-site VPN with firewall and DHCP support when some sites can use hosted VNF service chains and others have or need real CPE.  The service architect can’t be asked to know that stuff, but the deployment process has to reflect it.  The service/resource model binding is where the physical constraints of infrastructure match the functional constraints of services.

And so it is with monitoring.  Infrastructure can "advertise" monitoring and even test-data injection points, and a service object or monitoring-and-testing-as-a-service could then bind to the correct probe point.  IMHO, this is how you have to make testing and monitoring work in a virtual world.  I think the fact that the vendors aren't supporting this kind of model is in no small part due to the fact that we've not codified "derived operations" and repository-based management data delivery, so the mechanisms (those resource bindings and the management derivation expressions) aren't available to exploit.

I think that this whole virtual-monitoring and monitoring-as-a-service thing proves an important point, which is that if you start something off with a high-level vision and work down to implementation in a logical way, then everything that has to be done can be done logically.  That’s going to be important to NFV and SDN networks in the future, because network operators and users are not going to forego the tools they depend on today just because they’ve moved to a virtual world.

Fixing the Conflated-and-Find-Out Interpretation of MANO/VIM

I blogged recently about the importance of creating NFV services based on an agile markup-like model rather than on static, explicit data models.  My digging through NFV PoCs and implementations has opened up other issues that can also impact the success of an NFV deployment, and I want to address two of them today.  I'm pairing them up because they both relate to the critical Management/Orchestration or MANO element.

The essential concept of NFV is that a “service” somehow described in a data model is converted into a set of cooperating committed resources through the MANO element.  One point I noted in the earlier blog is that if this data model is highly service-specific, then the logic of MANO necessarily has to accommodate all the possible services or those services are ruled out.  That, in turn, would mean that MANO could become enormously complicated and unwieldy.  This is a serious issue but it’s not the only one.

MANO acts through an Infrastructure Manager, which in ETSI is limited to managing Virtual Infrastructure and so is called a VIM.  The VIM represents “resources” and MANO the service models to be created.  If you look at the typical implementations of NFV you find that MANO is expected to drive specific aspects of VNF deployment and parameterization, meaning that MANO uses the VIM almost like OpenStack would use Neutron or Nova.  In fact, I think that this model was explicitly or unconsciously adopted for the relationship, which I think is problematic.

The first problem that’s created by this approach is what I’ll call the conflation problem.  A software architect approaching a service deployment problem would almost certainly divide the problem into two groupings—definition of the “functions” part of virtual functions and descriptions/recipes on how to virtualize them.  The former would view a VNF implementation of “firewall” and a legacy implementation of the same thing as equivalent, not to mention two VNF implementations based on different software.  The latter would realize the function on the available resources.

If you take this approach, then VIMs essentially advertise recipes and constraints on when (and where) they can be used.  MANO has to “bind” a recipe to a function, but once a recipe is identified it’s up to the VIM/chef to cook the dish.
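As a sketch of the division of labor being argued for (my reading, with invented interfaces), MANO's job stops at picking an advertised recipe; the VIM keeps all the infrastructure detail to itself:

    # Hypothetical "recipe" split: MANO binds a function to an advertised recipe,
    # and the VIM alone knows how to turn that recipe into VMs and parameters.
    class Vim:
        def advertise(self):
            return {"vFirewall": {"constraint": "needs-dpdk-host"}}

        def cook(self, recipe_name, service_params):
            # All infrastructure detail (images, VMs, networks) stays inside here.
            return f"deployed {recipe_name} with {service_params}"

    def mano_deploy(function, service_params, vim):
        recipes = vim.advertise()
        if function not in recipes:
            raise LookupError(f"no recipe advertised for {function}")
        # MANO never sees tenant VMs or host parameters; it hands off.
        return vim.cook(function, service_params)

    print(mano_deploy("vFirewall", {"rules": "default-deny"}, Vim()))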

In a conflated model, MANO has to deploy something directly through the VIM, understanding tenant VMs, servers, and parameters.  The obvious effect of this is to make MANO a lot more complicated because it now has to know about the details of infrastructure.  That also means that the service model has to have that level of detail, which as I’ve pointed out in the past means that services could easily become brittle if infrastructure changes underneath.

The second issue that the current MANO/VIM approach creates is the remember-versus-find-out dichotomy.  If MANO has to know about tenant VMs and move a VIM through a deployment process, then (as somebody pointed out in response to my earlier blog on this) MANO has to be stateful.  A service that deploys half a dozen virtual machines and VNFCs has a half-dozen “threads” of activity going at any point in time.  For a VNF that is a combination of VNFCs to be “ready”, each VNFC has to be assigned a VM, loaded, parameterized, and connected.  MANO then becomes a huge state/event application that has to know all about the state progression of everything down below, and has to guide that progression.  And not only that, it has to do that for every service—perhaps many at one time.

Somebody has to know something.  You either have to remember where you are in a complex deployment or constantly ask what state things are in.  Even if you accept that as an option, you’d not know what state you should be in unless you remembered stuff.  Who then does the remembering?  In the original ExperiaSphere project, I demonstrated (to the TMF among others) that you could build a software “factory” for a given service by assembling Java Objects.  Each service built with the factory could be described with a data model based on the service object structure, and any suitable factory could be given a data model for a compatible service at any state of lifecycle progression and it could process events for it.  In other words, a data model could remember everything about a service so that an event or condition in the lifecycle could be handled by any copy of a process.  In this situation, the orchestration isn’t complicated or stateful, the service model that describes it remembers everything needed because it’s all recorded.
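A minimal sketch of the "model remembers, process doesn't" idea, loosely patterned on the ExperiaSphere description above but with invented field names; because the lifecycle state lives in the service model, any copy of the handler can process the next event:

    # Hypothetical stateless event handler: everything it needs to know is in
    # the service model it is handed, so any copy of the process can run it.
    NEXT_STATE = {
        ("deploying", "vm_ready"): "configuring",
        ("configuring", "params_applied"): "active",
        ("active", "fault"): "failed",
        ("failed", "repaired"): "active",
    }

    def handle_event(service_model, event):
        key = (service_model["state"], event)
        service_model["state"] = NEXT_STATE.get(key, service_model["state"])
        return service_model

    svc = {"id": "svc-001", "state": "deploying"}      # the model carries the memory
    for ev in ["vm_ready", "params_applied", "fault", "repaired"]:
        handle_event(svc, ev)                          # could be a different process each time
    print(svc["state"])                                # active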

There are other issues with the “finding-out” process.  Worldwide, few operators build services without some partner contributions somewhere in the process.  Most services for enterprises span multiple operators, and so one operates as a prime contractor.  With today’s conflated-and-find-out model of MANO/VIM, a considerable amount of information has to be sent from a partner back to the prime contractor, and the prime contractor is actually committing resources (via a VIM) from the partner.  Most operators won’t provide that kind of direct visibility and control even to partners.  If we look at a US service model where a service might include access (now Title II or common-carrier regulated) and information (unregulated), separate subsidiaries at arm’s length have to provide the pieces.  Is a highly centralized and integrated MANO/VIM suitable for that?

I’m also of the view that the conflated-find-out approach to MANO contributes to the management uncertainty.  Any rational service management system has to be based on a state/event process.  If I am in the operating state and I get a report of a failure, I do something to initiate recovery and I enter the “failed” state until I get a report that the failure has been corrected.  In a service with a half-dozen or more interdependent elements, that can best be handled through finite-state machine (state/event) processing.  But however you think you handle it, it should be clear that the process of fixing something and of deploying something are integral, and that MANO and VNFM should not be separated at all.  Both, in fact, should exist as processes that are invoked by a service model as its objects interdependently progress through their lifecycle state/event transitions.

If you’re going to run MANO and VNFM processes based on state/event transitions, then why not integrate external NMS and OSS/BSS processes that way?  We’re wasting enormous numbers of cycles trying to figure out how to integrate operations tasks when if we do MANO/VNFM right the answer falls right out of the basic approach with no additional work or complexity.

The same goes for horizontal integration across legacy elements.  If a "function" is virtualized to a real device instead of to a VNF, and if we incorporate management processes that map VNF-and-host state on the one hand, and legacy device state on the other, to a common set of conditions, then we can integrate management status across any mix of technology, which is pretty important in the evolution of NFV.

If we accept the notion that the ETSI ISG's output is a functional specification, then these issues can be addressed readily by simply adopting a model-based description of management and orchestration.  That's another mission for OPNFV, or for vendors who are willing to look beyond the limited scope of PoCs and examine the question of how their model could serve a future with millions of customers and services.

Parcel Delivery Teaches NFV a Lesson

Here’s a riddle for you.  What do Fedex and NFV have in common?  Answer:  Maybe nothing, and that’s a problem.  A review of some NFV trials and implementations, and even some work in the NFV ISG, is demonstrating that we’re not always getting the “agility” we need, and for a single common reason.

I had to ship something yesterday, so I packed it up in a box I had handy and took it to the shipper.  I didn’t have a specialized box for this item.  When I got there, they took a measurement, weighed it, asked me for my insurance needs, and then labeled it and charged me.  Suppose that instead of this, shippers had a specific process with specific boxing and handling for every single type of item you’d ship.  Nobody would be able to afford shipping anything.

How is this related to NFV?  Well, logically speaking what we’d like to have in service creation for NFV is a simple process of providing a couple of parameters that define the service—like weight and measurements on a package—and from those invoke a standard process set.  If you look at how most NFV trials and implementations are defined, though, you have a highly specialized process to drive deployment and management.

Let me give an example.  Suppose we have a business service that’s based in part on some form of virtual CPE for DNS, DHCP, NAT, firewall, VPN, etc.  In some cases we host all the functions in the cloud and in others on premises.  Obviously part of deployment is to launch the necessary feature software as virtual network functions, parameterize them, and then activate the service.  The parameters needed by a given VNF and what’s needed to deploy it will vary depending on the software.  But this can’t be reflected in how the service is created or we’re shipping red hats in red boxes and white hats in white boxes.  Specialization will kill agility and efficiency.

What NFV needs is data-driven processes but also process-independent data.  The parameter string needed to set up VNF "A" doesn't have to be defined as a set of fields in a data model.  In fact, it shouldn't be, because any software guy knows that if you have a specific data structure for a specific function, the function has to be specialized to the structure.  VNF "A" has to understand its parameters, but the only thing the NFV software has to know how to do is collect the variables the user can set and then send everything along to the VNF.
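A sketch of what "process-independent data" means in practice, with illustrative names: the NFV software only collects whatever the template says is user-settable and forwards the bundle opaquely; only the VNF interprets it.

    # Hypothetical pass-through of VNF parameters: the orchestration code never
    # enumerates the fields, it just collects what the template asks for and
    # hands the whole bundle to the VNF.
    def collect_user_settable(template, user_input):
        return {name: user_input[name] for name in template["user_settable"]}

    def send_to_vnf(vnf_endpoint, params):
        # Stand-in for whatever configuration channel the VNF actually uses.
        print(f"push to {vnf_endpoint}: {params}")

    template_a = {"vnf": "firewall-A", "user_settable": ["block_list", "log_level"]}
    user_input = {"block_list": ["203.0.113.0/24"], "log_level": "warn", "ignored": 1}

    send_to_vnf("vnf-a.local", collect_user_settable(template_a, user_input))
    # Adding a new VNF with different parameters changes templates, not this code.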

The biggest reason why this important point is getting missed is that we are conflating two totally different notions of orchestration into one.  Any respectable process for building services or software works on a functional-black-box level.  If you want “routing” you insert something that provides the properties you’re looking for.  When that insertion has been made at a given point, you then have to instantiate the behavior in some way—by deploying a VNF that does virtual routing or by parameterizing something real that’s already there.  The assembly of functions like routing to make a service is one step, a step that an operator’s service architect would take today but that in the future might be supported even by a customer service portal.  The next step, marshaling the resources to make the function available, is another step and it has to be separated.

In an agile NFV world, we build services like we build web pages.  We have high-level navigation and we have lower-level navigation.  People building sites manipulate generic templates, and these templates build pages with specific content as needed.  Just as we don't have packaging and shipping customized for every item, we don't have a web template for every page, just one for each different kind of page.  Functional, in short.  We navigate by shifting among functions, so we are performing what's fairly called "functional orchestration" of the pages.  When we hit a page we have to display it by decoding its instructions.  That's "structural" orchestration.  The web browser, and even the page-building software, doesn't have to know the difference between content pieces, only between different content handling.

I’ve been listening to a lot of discussions on how we’re going to support a given VNF in NFV.  Most often these discussions are including a definition of all of the data elements needed.  Do we think we can go through this for every new feature or service and still be agile and efficient?  What would the Internet be like if every time a news article changed, we had to redefine the data model and change all the browsers in the world to handle the new data elements?

NFV has to start with the idea that you model services and you also model service instantiation and management.  You don’t write a program to do a VPN and change it to add a firewall or NAT.  You author a template to define a VPN, and you combine that with a “Firewall” or “NAT” template to add those features.  For each of these “functional templates” you have a series of “structural” ones that tell you, for a particular point in the network, how that function is to be realized.  NFV doesn’t have to know about the parameters or the specific data elements, only how to process the templates, just like a browser would.  Think of the functional templates as the web page and the structural ones as CSS element definitions.  You need both, but you separate them.
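Here's a sketch of the two-level template idea using plain Python dictionaries (TOSCA or a markup language would be the real vehicle); the "functional" template composes features, and each feature points at per-location "structural" recipes:

    # Hypothetical functional template: what the service is, in feature terms.
    FUNCTIONAL = {"service": "business-vpn", "features": ["VPN", "Firewall", "NAT"]}

    # Hypothetical structural templates: how each feature is realized per site.
    STRUCTURAL = {
        ("Firewall", "city-a"): {"realize": "legacy-cpe", "device": "edge-router"},
        ("Firewall", "city-b"): {"realize": "vnf", "image": "vfw-2.1"},
        ("VPN", "any"):         {"realize": "mpls-vpn"},
        ("NAT", "any"):         {"realize": "vnf", "image": "vnat-1.0"},
    }

    def realize(feature, site):
        return STRUCTURAL.get((feature, site)) or STRUCTURAL[(feature, "any")]

    # Adding a firewall to the service touches the functional template only;
    # deploying vCPE in city-a later touches one structural entry, not the services.
    for feature in FUNCTIONAL["features"]:
        print(feature, realize(feature, "city-a"))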

I’d love to be able to offer a list here of the NFV implementations that follow this approach, but I couldn’t create a comprehensive list because this level of detail is simply not offered by most vendors.  But as far as I can determine from talking with operators, most of them are providing only those structural templates.  If we have only structural definitions then we have to retroject what should be “infrastructure details” into services because we have to define a service in part based on where we expect to deploy it.  If we have a legacy CPE element in City A and virtual CPE in City B, we’d have to define two different services and pick one based on the infrastructure of the city we’re deploying in.  Does this sound agile?  Especially considering the fact that if we then deploy VCPE in City A, we now have to change all the service definitions there.

Then there’s management.  How do you manage our “CPE” or “VCPE?”  Do we have to define a different data model for every edge router, for every implementation of a virtual router?  If we change a user from one to the other, do all the management practices change, both for the NOC and for the user?

This is silly, people.  Not only that, it's unworkable.  We built the web around a markup language.  We have service markup languages now; the Unified Service Description Language (USDL) is an example.  We have template-based approaches to modeling structures and functions, in TOSCA and elsewhere.  We need to have these in NFV too, which means that we have to work somewhere (perhaps in OPNFV) on getting that structure in place, and we have to start demanding that vendors explain how their "NFV" actually works.  Otherwise we should assume it doesn't.

How Buyers See New Network Choices

Networking is changing, in part because of demand-side forces and in part because of technologies.  The question is whether technology changes alone can have an impact, and for that one I went to some buyers to get answers on how they viewed some of the most popular new technology options of our time.  The results are interesting.

One of the most interesting and exciting (at least to some) of the SDN stories is the “white box” concept.  Give an enterprise or a service provider an open-source SDN controller, OpenFlow, and a bunch of “white box” generic/commodity switches and you have the network of the future.  Since “news” means “novelty” rather than “truth” it’s easy to see why this angle would generate a lot of editorial comment.  The questions are first, “Is it true?” and second, “What would it actually mean?”

The white-box craze is underpinned by two precepts.  First, that an open-source controller could create services using white-box switches that would replicate IP or Ethernet services of today.  Second, that those white-box switches would offer sufficiently lower total cost of ownership versus traditional solutions to induce buyers to make the switch.  The concept could deliver, but it’s not a sure thing.

Buyers tell me that in the data center the white-box concept isn’t hard to prove out.  Any of the open-source controllers tried by enterprises and operators were able to deliver Ethernet switching services using white-box foundation switches.  This was true for data centers ranging from several dozen to as many as several thousand servers.

However, buyers were mixed on whether the savings were sufficient.  Operators said that their TCO advantage averaged 18%, which they said was less than needed to make a compelling white-box business case where there was already an installed base of legacy devices.  Most said it was sufficient to justify white-box SDN in new builds.  Enterprises reported TCO benefits that varied widely, from as little as 9% to as much as 31%.  The problem for enterprises was that they had little expectation of new builds, and most set a "risk premium" of about 25% on major technology changes.  Thus most enterprises indicated that they couldn't make the business case in the data center.

Outside the data center the picture was even more negative.  Only 8% of operators' projects outside the data center were able to even match the data center's 18% TCO benefit, and operators expressed concerns that white-box technology was "unproven" (by a 2:1 margin) or offered too low a level of performance (by 60:40) to be useful at all, savings notwithstanding.

Interestingly, virtual switching/routing fares a lot better outside the data center.  Almost 70% of operators thought that virtual switching/routing could, if hosted on optimal servers, deliver at least a 20% TCO benefit relative to legacy devices.  For enterprises the number was just over 75%.  Inside the data center, both operators and enterprises believed vSwitch technology could substantially reduce their need to augment data center switching (savings of nearly 40% in TCO), but they didn't see it displacing current switches or eliminating the need for new switches if new servers were added.  The consensus was that vSwitches were good for VMs, not for servers.

Operators believe that agile optics can supplement vSwitch technology and selected white-box deployments to displace as much as 70% of L2/L3 spending by 2025.  This suggests that white-box SDN and virtual switching/routing is best employed to supplement optical advances.  They see white-box data centers emerging more from NFV deployments, interestingly, than they do from directly driven SDN opportunities.  The reason seems to be that they believe NFV will generate a lot of new but smaller data centers where white-box and virtual technology is seen as suitable in performance.

Buyers are in general not particularly enthusiastic about white-box support or vendor credibility.  Three out of four enterprises and almost 90% of operators think their legacy vendors are more trustworthy and offer more credible support.  Virtually 100% of both groups think that they would want "more contractual assurances" from white-box vendors to counter their concerns about reputation and track record.

What about white-box devices from legacy vendors?  Almost half of both buyer groups think that will "never happen", meaning no chance for at least five years.  Everyone saw legacy vendors entering the white-box space in earnest only when there was no option other than to lose business to others.  Nobody saw them as being leaders, though almost all buyers say that they can get SDN control for legacy devices from their current vendors.

Another option that generates mixed reviews is the overlay SDN model popularized by Nicira (now part of VMware).  While nearly all network operators and two-thirds of enterprises see benefits in overlay-based SDN, they're struggling to assign an economic value to their sentiment.  The most experienced/sophisticated buyers in both groups (what I call the "literati") believe the winning model combines virtual-overlay technology with white-box basic physical switching in the LAN and agile optics in the WAN.  They say that the potential benefits are not promoted by vendors, however.

Interestingly, both network operators and enterprises are more hopeful about the Open Compute switching model than about white-box products based on SDN.  Almost 80% of enterprises say they would purchase OCP switches from “any reputable vendor” and almost 70% say they would buy commodity versions of these products.  Operators run slightly lower in both categories.  The difference, say buyers, is that OCP is a “legacy switch in a commodity form” where white-box SDN switches are based on a “new and less proven” technology combination.

What I get from all of this is that buyers need a more holistic statement of a new switch/routing paradigm than they’re getting.  It would seem that a combination of white-box physical switching and overlay SDN might be very attractive, but in the main buyers don’t see that being offered as a combination and they see do-it-yourself integration of two less-than-proven technologies as unattractive.  They’d love to see a major computer vendor (HP or IBM) field that combination; they’re not convinced that network giants will do that, and they’re still a little leery of startups, though less so than they’d been in the past.

The lesson is that there’s no such thing as a “point revolution”.  We have to expect rather significant and widespread change if we’re going to see much change at all, and users need a lot of reassurance about new technologies…and new partners.

A Deep Look at a Disappointing Neutrality Order

The FCC finally released its neutrality order, causing such a run on the website that it crashed the document delivery portion.  Generally, the order is consistent with the preliminary statement on its contents that was released earlier, but now that the full text is available it’s possible to pin down some of the issues I had to hedge on before.

First, the reference.  The official document is FCC 15-24, "REPORT AND ORDER ON REMAND, DECLARATORY RULING, AND ORDER", issued March 12, 2015.  Not surprisingly in our current politicized age, it was passed on a 3:2 partisan vote.  It's 400 pages long in PDF form, so be prepared for a lot of reading if you intend to browse it fully.

This order was necessitated by the fact that the previous 2010 order was largely set aside by the DC Court of Appeals.  The problem the FCC had stemmed from the Telecom Act of 1996, which never mentioned the Internet at all and was woefully inadequate to serve as guidance at what was the dawn of the broadband era.  I won't rehash all the past points, but in summary we spent about seven years trying to come up with a compromise reading of the Act that would let broadband investment continue but at the same time provide some regulatory clarity on the Internet itself.  The formula the FCC arrived at was that the Internet was "an information service with a telecommunications component."  That exempted it from common-carrier regulation, which is defined by Title II of the Communications Act.

When in 2010 the FCC tried to address some of the emerging neutrality issues, they were trapped by their own pronouncement.  If the ISPs were common carriers there was no question the FCC could do what it wanted, but the FCC had said they were not.  The order of 2010 was largely an attempt to salvage jurisdiction from that mess, and it failed—that’s what the Court of Appeals said.  So the fact is that unless you wanted no neutrality order at all, the FCC had no option but Title II regulation.  Fortunately for the FCC, it is not bound legally by its own precedent, which means it can simply change its mind.  It did.

The essence of the 2015 order is simple.  The FCC declares the ISPs to be common carriers with respect to broadband Internet service, making them subject to Title II.  It then exercises its forbearance authority, invoked alongside the once-famous-now-forgotten Section 706 of the Telecom Act, to "forbear" from applying provisions of the act while still assuring the availability of Internet services to all.  In this basic sense, the order is following the recipe that the DC Court of Appeals offered in its opinion on the 2010 order, and so this part of the order is fairly bulletproof.

What the FCC proposes to do with the authority it has under Title II is a bit more complicated.  At a high level, the goal of the order is to draw what the FCC calls a “bright line”, a kind of classic line-in-the-sand that would tell everyone where they can’t go.  The basic principles of that bright line are:

  • No blocking of lawful traffic, services, devices, or applications.
  • No throttling of said traffic, except for clear network management purposes.
  • No paid prioritization.

Unlike the order of 2010, the FCC applies these rules to both wireless and wireline.  They exempt services based on IP or otherwise that are separate from the Internet, including VoIP, IPTV, and hosting and business data services.  I interpret the exemptions as including cloud computing services as well.  The key point is that an exempt service is one that does not provide access to the Internet overall, and uses facilities that are separate from those of broadband Internet access.

The last point is important to note.  Broadband Internet access is a Title II service.  The Internet itself is not.  However, the FCC does reserve for itself with this order the right to intervene on interconnect issues, though it declines to do that at present.  The order says that the Commission lacks a history of dealing with Internet interconnect issues and is not comfortable with prescriptions without further data and experience.  Thus, the order neither affirms nor rules out paid settlement among ISPs of the Netflix-Comcast type.

A point that cuts across all of these other issues is that of transparency.  The FCC wants broadband Internet providers to say what they mean and then do what they say.  My interpretation is that this means, for example, that a mobile provider can't offer "unlimited" data and then limit it by blocking or throttling or by adding hidden charges based on incremental usage.

To me, the order has one critical impact, perhaps not what the FCC intended.  Operators want to make a favorable return on investment.  If they don’t have a pathway to that through paid prioritization, then it is unlikely that Internet as a service will ever be truly profitable to them.  The best they could hope for would be to earn enough to cover the losses by selling other over-the-top services.  That’s a problem because the OTTs themselves wouldn’t have those losses to cover, and so could likely undercut operators on price.  Thus, the operators may look to “special services” instead, and I think that works against everything the FCC says it wants.

The order gives the distinct impression that the FCC believes the distinguishing point about the Internet is its ubiquity.  A "special service" has the defining criterion of not giving access to all of the Internet.  You can use IP and deliver some specific thing, not Internet access, and call it a special service, immune from regulations.  Certainly the universality of Internet access is a valid criterion, but in an investment sense the fact is that most paying services travel very short distances—less than 40 miles—and involve largely content delivery or (in a growing amount) cloud computing.  Does the order allow operators to separate out the profitable stuff—even encourage them to?  Already it's clear that profitable services are largely special services, and the prohibition on paid prioritization guarantees that will be truer in the future.

Video is delivered both on- and off-Internet today, but channelized viewing is a special service.  Most for-fee operator VoIP is also a special service.  Business data services are special services.  Were there paid QoS on the Internet, might there be pressure to move these special services back onto the Internet?  Might the FCC even be able to take the position that they should be combined?  As it is, I see no chance of that happening, and in fact every chance that operators will look to special services, off the Internet, to insure they get reasonable returns.  Home monitoring, cloud computing, IoT, everything we talk about as being a future Internet application could, without paid prioritization, end up off the Internet not on it.

We might, with paid prioritization and a chance for Internet profit, see VC investment in the Internet as a network instead of in other things that increase traffic and complicate the ISP business model.  Certainly we’d give traditional L2/L3 devices a new lease on life.  The order, if it stands, is likely to put an end to those chances and accelerate the evolution toward virtual L2/L3 and minimization of “access” investment.

Will it stand?  The FCC has broad powers over Title II services, but does it have the power to say that some commercially viable options cannot be offered, or that operators have to provide services with limits on their features?  I don’t know the answer to that one, but I suspect there will now be pressure for Congress to step in.  In this day and age that’s a doubtful benefit, but there’s plenty of doubt in what we have now.

The problem here is that we don’t have a real middle ground in play.  Compromise, even when it’s politically possible, is easier to achieve when there is a position between the extremes.  With neutrality we’ve largely killed off moderation, leaving the best position one that none of the partisan advocacy groups occupy.  There is then no constituency on which to build a compromise, because a middle-ground view simply offends all the players.

“Internet” is an information network on top of a telecommunications service.  We have to treat the latter like all such services, meaning we have to regulate it and apply rules to settlement and interconnect.  We also have to include QoS (where have we ever had a commercial service without SLAs?).  I think Chairman Wheeler was on the right track with neutrality before the Administration intervened.  Sadly, we can’t take that back, and Congressional intervention will only create the opposite extreme.  Now, I guess, we’ll have to wait until some symptoms develop and rational views can prevail, or so we can hope.

Alcatel-Lucent Offers a Bottom-Up Metro Vision

While vendors are typically pretty coy about public pronouncements on the direction that networking will take, they often telegraph their visions through their product positioning.  I think Alcatel-Lucent did just that with its announcement of its metro-optical extensions to its Photonic Service Switch family.  Touting the new offerings as being aimed at aggregation, cloud, and content missions, Alcatel-Lucent is taking aim at the market area that for many reasons is likely to enjoy the most growth and provide the best opportunities for vendors.  It’s just shooting from cover.

Networking isn’t a homogeneous market.  Infrastructure return obviously varies by service type: wireless is generally more profitable than wireline, business more profitable than residential, and higher-level services more profitable than lower-level ones.  Operators will spend more where profits are greater, so there’s an emphasis on finding ways to exploit higher return potential.  Unfortunately, the universality of IP, and the fact that broadband Internet is merging wireline and wireless to a degree, work against service-based targeting.  Another dimension of difference would be helpful, and we have it with metro.

I’ve noted in past blogs that even today about 80% of all profitable traffic for operators travels less than 40 miles, meaning it stays in the metro area where it originates.  Cloud computing, NFV-based services, and content services will combine to raise that percentage over the next five years.  If NFV achieves optimum deployment, NFV data center interconnect alone would be the largest source of cloud-connect services.  Mobile EPC transformation to an SDN/optical model, plus SDN-based facilitation of WiFi offload and integration, is another enormous opportunity.

Aside from profit-and-service-driven changes, it’s obvious that networking is gradually shifting focus from L2/L3 down to the optical layer as virtualization changes how we build higher-level connectivity and virtual switches and routers displace traditional hardware.  It’s also obvious that the primary driver of these changes is the need to deliver lower-cost bit-pushing services in the face of steadily declining revenue per bit.

Given that one of Alcatel-Lucent’s “Shift” focus points was routing, the company isn’t likely to stand up and tout all of this directly.  Instead of preaching L2/L3 revolution from above, they’re quietly developing more capable optical-layer stuff and applying it where it makes the most sense, which is in the metro area.  The strategy aims to allow operators to invest in the future without linking that investment to displacement of (or reduction in purchasing of) legacy switches and routers.  Unlike Juniper, who tied its own PTX announcement to IP/MPLS, Alcatel-Lucent stays carefully neutral with its approach, which doesn’t commit operators to metro IP devices.

One omission in Alcatel-Lucent’s positioning was, I think, a negative for the company overall.  They did not offer a specific linkage between the PSS metro family and SDN/NFV, even though Alcatel-Lucent has highly credible solutions in both areas.  Operators don’t want “service layer” activities or applications directly provisioning optical transport, even in the metro, but they do want service and application changes to influence transport configuration.  There is a significant but largely ignored question of how this comes about.  The core of it is the extent to which optical provisioning and management (even if SDN-based) are linked to service events (even if SDN controls them).  Do you change transport configuration in response to service orders, in response to traffic as it’s observed, or maybe neither, or both?  Juniper, who has less strategic SDN positioning and no NFV to speak of, goes further in asserting integration.

I’m inclined to attribute the contrast here to my point on IP specificity.  Juniper’s approach is an IP “supercore”; Alcatel-Lucent’s is agile optical metro.  Because of its product portfolio and roots, Juniper seems determined to solve future metro problems in IP device terms, where Alcatel-Lucent, I think, is trying to prepare for a future in which spending on both switches and routers will inevitably decline (without predicting that and scaring all their switch and router customers!).  Alcatel-Lucent can presume “continuity” of policy, because transport networks today are traffic-engineered largely independently of service networks.  Juniper, by touting service protocols extending down into transport, has to take a different tack.

I’d hope that Alcatel-Lucent takes a position on vertical management integration in metro networks, even if they don’t have to do so right away.  First, I think it would be to their competitive advantage overall.  Every operator understands where networking is heading; vendors can’t hide the truth by not speaking it, while vendors who do speak it gain credibility.  Alcatel-Lucent’s Nuage is unique in its ability to support what you could call “virtual parallel IP” configurations, where application-specific or service-specific cloud networks link with users over the WAN.  They also have a solid NFV approach and decent OSS/BSS integration.  All of this would let them present an elastic approach to vertical integration of networks, one that lets either management conditions (traffic congestion, failures) or service changes (a pending order that would demand an adjustment at the optical layer) drive the bus, as sketched below.
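
To make that elastic vertical integration concrete, here is a minimal sketch in Python of how a metro controller might funnel both resource-side events (congestion, failures) and service-side events (pending orders) through a single reconfiguration policy.  This is my own illustration under stated assumptions; every name in it is hypothetical, and nothing here comes from Nuage, the PSS family, or any Alcatel-Lucent API.

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class TriggerType(Enum):
    RESOURCE_CONDITION = auto()   # traffic congestion, failures, degradation
    SERVICE_CHANGE = auto()       # a new or changed service order is pending

@dataclass
class MetroEvent:
    trigger: TriggerType
    detail: str      # e.g. "link-7 utilization above 85%" or "order-1234: add 10G EPL"
    severity: int    # 1 (informational) through 5 (urgent)

class TransportPolicy:
    """Hypothetical 'elastic' policy: either trigger type can drive an
    optical-layer change, but both pass through one decision point, so the
    service layer never provisions transport directly."""

    def __init__(self, congestion_threshold: int = 3):
        self.congestion_threshold = congestion_threshold

    def evaluate(self, event: MetroEvent) -> Optional[str]:
        if event.trigger is TriggerType.RESOURCE_CONDITION:
            # Management-driven path: react only to significant conditions,
            # absorbing minor blips rather than constantly re-tuning.
            if event.severity >= self.congestion_threshold:
                return "re-groom metro wavelengths: " + event.detail
            return None
        # Service-driven path: a pending order pre-provisions transport capacity.
        return "pre-provision metro capacity: " + event.detail

# Usage sketch
policy = TransportPolicy()
events = [
    MetroEvent(TriggerType.RESOURCE_CONDITION, "link-7 utilization above 85%", 4),
    MetroEvent(TriggerType.SERVICE_CHANGE, "order-1234: add 10G EPL", 2),
]
for ev in events:
    action = policy.evaluate(ev)
    if action:
        print(action)

The only point of the sketch is that management conditions and service changes converge on one policy object, which is the property that would let service or application changes influence transport configuration without giving them direct control of the optical layer.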

With a story like this, Alcatel-Lucent could address a real problem: their lack of a significant server or data center switch position.  It’s hard to be convincing as a cloud player if you aren’t a server giant, and the same is true with NFV.  You also face the risk of getting involved in a very expensive and protracted selling cycle only to see most of the spending go to somebody else.  A cloud is a server set, and so is NFV.  Data center switching is helpful, and I like Alcatel-Lucent’s “Pod” switch approach, but it would be far stronger were it bound into an interconnect strategy and Nuage SDN positioning, not to mention operations and management.  That would build Alcatel-Lucent’s mass in a deal and increase both their return on sales effort and their credibility with the buyer.

Most helpful, perhaps, is that strong vertical integration in a metro solution would let Alcatel-Lucent mark some territory at Cisco’s expense.  Cisco isn’t an optical giant, doesn’t like optics taking over any part of IP’s mission, doesn’t like purist OpenFlow SDN, NFV…you get the picture.  By focusing benefits on strategies Cisco is inclined to avoid supporting, Alcatel-Lucent makes it harder for Cisco to engage.  De-positioning the market leader is always a good strategy, and it won’t hurt Alcatel-Lucent against rival Juniper either.

I wonder whether one reason Alcatel-Lucent didn’t take a strong vertical-integration slant in its story is the company’s well-known insularity across product groups.  My recommended approach would cut across four different units, which may well push cooperation toward the vanishing point even today.  But with a new head of its cloud, SDN, and NFV effort (Bhaskar Gorti), Alcatel-Lucent may be able to bind together what has traditionally been separate.  This might be a good time to try.

More Signposts Along the Path to an IT-Centric Network Future

I always think it’s interesting when multiple news items combine (or conflict) in a way that exposes issues and market conditions.  We have that this week with the Cisco/Microsoft cloud partnership, new-model servers from HP, a management change at Verizon, and Juniper’s router announcements.  All of these create a picture of a seismic shift in networking.

The Cisco/Microsoft partnership pairs a Nexus 9000/ACI switching system with the Windows Azure Pack (a Microsoft product) to provide hybrid cloud integration of Microsoft Azure with Windows Server technology in the data center.  The Microsoft software has been around for a while, and frankly I don’t think there’s any specific need for a Nexus or ACI to create a hybrid cloud, since that was the mission of the software from the first.  However, Microsoft has unusual traction in the hybrid space because Azure is a PaaS cloud that offers easy integration with premises Windows Server and middleware tools.  Cisco, I think, wants to take advantage of Microsoft’s hybrid traction and position its UCS servers as a preferred platform for hosting the premises part of the hybrid cloud.

This is interesting because it may be the first network-vendor partnership driven by hybrid cloud opportunity.  Cisco is banking on Microsoft to leverage the fact that Azure and Windows Server combine to create a kind of natural hybrid, and that this will in turn drive earlier deployment of Azure hybrids than might be the case with other hybrid cloud models.  That would give Cisco street cred in the hybrid cloud space.  The IT strategy drives the network.
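
As a purely illustrative aside (my own sketch, not anything from the Cisco/Microsoft announcement or from the Azure Pack itself), the “natural hybrid” notion boils down to a placement decision: keep a workload on the premises Windows Server side when it must stay there or when it fits, and push it to the public Azure side otherwise.  Every name below is hypothetical.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Workload:
    name: str
    vcpus: int
    keep_on_premises: bool   # e.g. data-sensitivity or latency constraints

@dataclass
class PremisesPool:
    total_vcpus: int
    used_vcpus: int = 0

    def can_host(self, w: Workload) -> bool:
        return self.used_vcpus + w.vcpus <= self.total_vcpus

def place(workloads: List[Workload], pool: PremisesPool) -> Dict[str, str]:
    """Toy hybrid-placement rule: constrained workloads are pinned to the
    premises side regardless of headroom; everything else stays on premises
    when capacity allows and bursts to the public cloud side when it doesn't."""
    placement: Dict[str, str] = {}
    for w in workloads:
        if w.keep_on_premises or pool.can_host(w):
            pool.used_vcpus += w.vcpus
            placement[w.name] = "premises (Windows Server side)"
        else:
            placement[w.name] = "public cloud (Azure side)"
    return placement

# Usage sketch
print(place(
    [Workload("crm", 8, True), Workload("batch-analytics", 32, False)],
    PremisesPool(total_vcpus=24),
))

The business point is the same one the partnership makes: the interesting decisions in a hybrid happen on the premises side, which is exactly where Cisco wants UCS to sit.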

One reason for Cisco’s interest is the HP announcement.  HP has a number of server lines, but Cloudline is an Open Compute-compatible architecture designed for high-density cloud deployments, and it would also be a darn effective platform for NFV.  HP has a cloud, cloud software for private clouds, and a strong server position (number one).  If HP were to leverage all of these assets, and if it could pull hybrid cloud opportunity through from both the public cloud provider side (through a hybrid-targeted Cloudline positioning) and the IT side (through its traditional channels), then Cisco might see its growth in UCS sales nipped in the bud.

A Microsoft cloud alliance won’t help Cisco with NFV, though, and that might be its greatest vulnerability to HP competition in particular.  Even before Cloudline, HP had what I think is the best of the major-vendor NFV approaches.  Add in hyperscale data centers and you could get even more; my model still says that NFV will generate more data centers in the next decade than any other application, and perhaps sell more servers as well.  I’d be watching to see whether Cisco does something on the NFV side now to cover that major hole.

NFV’s importance is, I think, illustrated by the Verizon management change.  CTO Melone is retiring, and the office of the CTO will then fall under Verizon’s CIO.  Think about that!  It used to be that the CTO, Operations, and CMO were the big names.  The only people who called on the CIO were OSS/BSS vendors.  Now, I think, Verizon is signaling a power shift.  CIOs are the only telco players who know software and servers, and software and servers are the only things that matter for the future.

Globally, CIOs have been getting more involved with NFV, but now I think it’s fair to say they may be moving into the driver’s seat.  That’s a dynamic that will require some thinking, given the point I just made about what CIOs have historically been involved with.  OSS/BSS vendors have the most engagement with CIOs, yet OSS/BSS issues have taken a back seat from the very first meetings of the ETSI ISG.  Might this shift change vendor engagement?  It won’t hurt HP, because they have a strong operations story, and obviously Ericsson and Alcatel-Lucent do as well, but Cisco will have to do a lot more if operations is given a major role.  Of course, everyone will have to address OSS/BSS integration more effectively than they have if the people who buy the OSS/BSS systems are leading the NFV charge.

Speaking of network vendors, we have Juniper.  Juniper has no servers, and they don’t have a strong software or operations position either.  They can’t be leaders in NFV because they don’t have the central MANO/VNFM components.  I think they represent what might be the last bastion of pure networking.  Cisco, Alcatel-Lucent, Ericsson, Huawei all need more bucks and more opportunity growth than switching and routing can hope to provide.  All of them, as contenders for leader status in network equipment, will have to expand their TAM.  Juniper is likely hoping that with the rush to servers and software, there will be opportunity remaining in the network layers.

Will there be?  Truth be told, it won’t matter for Juniper, because there are no options left; they can’t be broader players now, and time has run out.  The union of IP and optics, at least part of the focus of their announcements, is inevitable, and it will cap the growth of IP and Ethernet alike, working alongside virtual routing and switching driven technically by SDN and NFV and commercially by operators’ relentless pressure to reduce capex and opex.  It’s hard to see how a switch/router company only recently converted to the value of agile optics can win against players like Alcatel-Lucent, Ciena, Infinera, or Adva, all of whom have arguably better SDN and NFV stories.

There are other data points to support my thesis that we’re moving toward the “server/software” age of networking.  Ciena has already announced an NFV strategy, and now so has Adva.  Alcatel-Lucent’s CEO said that once they’re done “shifting” they will likely focus more on services.  That’s logical, given that professional services almost inevitably become more important as the rather vast issues of the cloud, SDN, and NFV start driving the bus.  Few vendors will field comprehensive solutions, and operators want those.  They’ll accept consortia as insurance where single-vendor solutions just aren’t available from enough players to give operators a comfortable competitive choice.

All of these points demonstrate the angst facing network vendors, but adding to it is the fact that Huawei is running away with the market, racking up 20% growth while almost all the competition is losing year-over-year.  It’s Huawei that, in my view, renders the pure networking position untenable for competitors; everyone else will lose on price, and network equipment differentiation is now almost impossible.  For five years now, vendors have played Huawei’s game, focusing their attention on reducing costs while the price leader in the market sharpens its blade.  It may be too late to change that attitude, though Cisco at least is certainly trying.

We have a true revolution here.  It’s not the platitudes we read about; it’s the relentless march of commoditization driven by the compression of the revenue/cost curves, and the shift from monolithic, specialized network hardware to hosted software with greater agility.  We are moving to an IT-driven future for networking, and there is no going back now.