Facing the Future: Inside Vendor Failures to Sell NFV Transformation

I blogged recently on the views of the network operator “literati” on planning for NFV, and one blog dealt with what they thought vendors were doing wrong.  Yesterday, Ray Le Maistre from Light Reading did a video piece on the OPNFV event and made one of the same points.  The literati said that vendors don’t present their complete solution, and in the LR piece operators said that OSS/BSS integration was lacking as a part of NFV solutions.  The truth is that at least a dozen vendors can provide just what operators want and what Light Reading said was missing.  What’s up?

I’ve noticed that when the same vendors who I know have a complete solution either send me stuff or talk with me, they seem to omit the very thing operators are saying they need NFV to provide.  I’ve also noticed that the vendors’ salespeople tell me they aren’t getting deals as fast as they’d like, which suggests that the omission of capabilities is showing up at the sales level too.  So the problem isn’t in the operators’ minds, and Ray at Light Reading isn’t misreading the market.  We’re not singing the right song.

I’d like to be able to call on an objective outside resource like my literati to explain this, but of course you can’t expect somebody to tell you why someone isn’t telling them something. I have to present my own view of the vendor motives, and that’s what I’ll do here.

IMHO, the biggest problem is that vendors have been chasing media-generated NFV rainbows.  The NFV market is at this point so totally speculative that no responsible forecast for it could be done, but not only do we have forecasts, we have escalating ones.  The reason is that a story that says that NFV will sell five or ten or twenty or a hundred billion dollars in five or three or even one year is exciting.  People will click on URLs to read it.  For the forecasters, who buys a report that says there’s no market forecast for something?  Who buys one that builds a vision of an extravagant market?  In the first case, nobody.  In the second, every product manager responsible for the products for that extravagant market.

The natural consequence of heady forecasts is unrealistic expectations.  One large NFV hopeful confided that their quota for 2016 was probably going to end up larger than the total of NFV spending because that had been the case in 2015.  As pressure to perform against the forecasts mounts, companies look for quick deals and shorten selling cycles.  Nobody wants two-year projects to prove a business case here.  They don’t need it; after all, NFV is the way of the future and XYZ Analytics says so.

The second problem is that there is no PR benefit to having a full solution, so there’s no incentive to talk about one.  Everybody is an NFV player these days.  If you sell servers you’re an NFV infrastructure player, and those who offer Linux and middleware are virtualization and platform players.  Everyone who has anything that could be considered a network function can virtualize it (meaning make it cloud-hostable) and be an NFV virtual-function provider.  If you have OpenStack support or OpenDaylight you have “orchestration” so you have MANO, and of course if you provide any form of management you’re a shoo-in.  These are all conflated in a story or a press release, and we lose the distinction between somebody who can build a case for NFV deployment and those who expect to ride on the coattails of such a deployment.

NFV is really complicated.  Even the ETSI ISG, which declared a lot of the issues out of scope originally, is now admitting that we need things like service-level orchestration and federation across operators and implementations. Add in these new requirements and you rule out almost everyone as an NFV leader, someone who can make the business case, and relegate them to NFV exploiters who hope that first somebody else makes that business case and, second, that they can horn in on it.

The next problem is related: the operator drive for an open solution means nobody gains much by developing the market on their own.  A vendor who comes up with the right approach is essentially mandated to share it.  Given the enormous cost of building a complete solution and productizing it, it’s not a surprise that vendors don’t want to bear that cost without having a leg up on reaping the benefits.  Open approaches mean that they probably can’t do that.

The fact is that open strategies are going to have to be funded by either the buyer or by those vendors who win almost no matter what.  An example of the former is the initial focus on standards and open-source, and now the operator-created architectures.  Intel is an example of a vendor who wins no matter what; whose chips are in all those boxes?

Problem number four is that sales organizations have no collateral to support the kind of operations-integrated sale that Ray from Light Reading heard operators want to see.  Some of the salespeople for those vendors who could do what operators want literally don’t know that they can.  More don’t know how to promote it, or who within the operator organizations to promote it to.

Some of this problem is surely related to the issue of sales quotas and instant gratification, but that’s not the whole story.  This, as I said, is complicated, and sales organizations that are used to selling interchangeable boxes into an established demand aren’t prepared to build a business case and sell a complex solution to a complex problem.

And there is a final problem: organizational culture is holding back success.  Ray made that point in his video, and we’ve all heard that operators have to change their culture to make NFV succeed.  Well, perhaps there are some issues with the culture of operators, but the truth is that it’s the vendor culture, not the operator culture, that is the problem today.

Transformation to a new model of networking means, for vendors, that the easy happy days of having a buyer come to you and say “give me a dozen routers” are gone.  Instead, buyers are coming with a request for you to make their services more agile and their operations more efficient, and by the way to do that at a lower overall cost than before.  You can’t really even make money selling professional services to accomplish these goals, because without an understanding of how to do what the buyer wants you can’t make a business case for doing anything at all.

All of this explains why NFV has failed to live up to heady expectations, but it doesn’t mean it never will.  There are benefits to NFV, and in many areas they’re compelling.  We tried to advance to NFV on a broad front, and we simplified ourselves into failure doing that.  Breadth is breadth in complexity as well as in opportunity.  Now we’re moving on multiple narrow fronts, constrained more by the fact that we’re not sure all our steps will add up to progress and not silos.  Operator architectures are guiding the framework for unification and federation, a framework we should have had from the first.

But we could still mess this up.  Will every operator do an ECOMP like AT&T, or even be able to adopt AT&T’s own model if they open-source it?  Benefits justify an architecture, architectures frame product elements, and product elements are what buyers purchase.  Making the connection at the top is what makes the world go ‘round, and more attention to that top-level process is essential, even and perhaps especially for the media.

How We Can Create a Framework for Open VNFs and Easy Onboarding

Earlier this week in my blog I talked about a model-driven open approach to NFV that focused on deployment elements, meaning infrastructure.  I introduced the notion of virtual function classes as the framework for VNF openness too, and I want to follow up on that today.

The general point of my earlier blog was that for openness to happen, you had to define a series of abstractions or functional models that represented the base class of capabilities you wanted, then define the various ways that those capabilities could actually be decomposed onto infrastructure.  You assemble functional models into services in a generic way, then commit functional models to infrastructure in whatever way the infrastructure demands.

This same approach can be applied to Virtual Network Functions.  The high-level functional model (“Firewall”) might be decomposed into one of two technology-based models—“RealDevice” or “FirewallVNF”.  The latter is the combination of the VNF software and the hosting of that software on infrastructure.  We talked about the hosting part yesterday, so it’s the VNF software we’re going to address today.

All firewall VNFs should be equivalent, meaning that all the ways of instantiating a virtual network function of a given functional class should be interchangeable and identical when looked at from the outside.  Clearly, not all the software that might be used for a firewall VNF takes the same parameters, exposes the same interfaces, or has the same management capabilities, so the harmonization of those differences has to be framed in some way.

That harmonization has to start, as the whole modeling approach had to start, with an abstraction that represents the function class—“FirewallVNF” in this case.  It’s my view that this function class would be an extension (for software types, think of a Java class that “extends” a base class) of the base class of “AnyVNF”.  If we adapt this to the ETSI ISG’s framework, the base class would define the scope of the necessary APIs exposed by the centralized part of the VNF Manager (VNFM).  This base class might expose a parameter port, a management port, a map of “internal ports” to interconnect elements, and a set of “service ports” that will connect outside.

The goal of onboarding a given VNF is now to map to that FirewallVNF template (and by inference to the AnyVNF set).  The responsibility for that, again using ETSI’s concepts loosely, devolves on the “distributed” part of the VNF manager.  You have a VNF that’s a firewall implementation.  You add to that VNF the necessary custom logic to make it appear in all respects as an instance of FirewallVNF, which means that all of the management, deployment, and functional characteristics of that implementation are mapped to the common model.  Once that’s done, you can deploy any of those implementations.

There are five beauties to this approach, IMHO.  The first is that it solidifies what I think has been an incredibly fuzzy concept—the VNFM.  Second, it defines a very specific goal for openness—which is to harmonize something to a function class.  Third, it makes the validation of a given function’s adherence to the open model a simple matter to test.  You could build a test jig into which instances of the function class (implementations of FirewallVNF in our example) could be inserted and validated.  Fourth, all of the integration happens in matching the implementation to the function class, so it’s the responsibility of the VNF vendor or an agent thereof.  What’s handed off is a complete, open, useful package.  Finally, anyone can run these tests in the same framework and get the same result.  Vendors and operators can do testing for themselves, and any “lab” can recreate the “experiment” just as science generally requires.  We don’t have to establish or accept some central body as responsible for testing.
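
To make this a little more concrete, here’s a minimal sketch of the pattern in Python—treat it as pseudocode for whatever modeling and implementation technology would really be used.  AnyVNF and FirewallVNF come from the discussion above; the specific port fields, the VendorX classes, and the apply_rules/push_acl calls are invented purely for illustration, not drawn from the ETSI specs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnyVNF:
    """Hypothetical base class: the surface every VNF must expose to the
    centralized part of the VNF Manager (port names are illustrative)."""
    parameter_port: str = "params:8080"                            # configuration push point
    management_port: str = "mgmt:161"                              # state/health read point
    internal_ports: Dict[str, str] = field(default_factory=dict)   # element interconnect map
    service_ports: List[str] = field(default_factory=list)         # outside-facing connections

    def get_state(self) -> str:
        raise NotImplementedError       # every implementation must report a device-like state


@dataclass
class FirewallVNF(AnyVNF):
    """The function class: what any firewall implementation must look like."""
    def apply_rules(self, rules: List[str]) -> None:
        raise NotImplementedError


class VendorXFirewall:
    """Stand-in for one vendor's actual firewall software and its native API."""
    def __init__(self) -> None:
        self.acls: List[str] = []
        self.running = True

    def push_acl(self, line: str) -> None:
        self.acls.append(line)


class VendorXFirewallVNF(FirewallVNF):
    """The 'distributed VNFM' piece: custom logic that makes VendorX's product
    look, in all respects, like an instance of FirewallVNF."""
    def __init__(self, device: VendorXFirewall) -> None:
        super().__init__()
        self.device = device

    def apply_rules(self, rules: List[str]) -> None:
        for rule in rules:
            self.device.push_acl(rule)          # translate common rules to native ACLs

    def get_state(self) -> str:
        return "up" if self.device.running else "down"


def conformance_check(vnf: FirewallVNF) -> bool:
    """A trivial 'test jig': the same checks, run against any implementation
    by any lab, should give the same answer."""
    vnf.apply_rules(["deny tcp any any eq 23"])
    return vnf.get_state() in ("up", "down")


print(conformance_check(VendorXFirewallVNF(VendorXFirewall())))    # True
```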

Vendors should like this model because it makes their own lab and integration testing easier by giving them an explicit target to hit—matching the functional class.  Operators should like it because most of their own onboarding efforts would be expended on new functional classes; basic testing or certification of vendor labs would cover the specific integration issues.  VNF vendors should like it because it could simplify their own integration.

I emphasize the “could” here because the question of who defines the functional classes remains.  This would be a worthy activity for some group like the OMG or even the ETSI NFV ISG, with the cooperation of operators and vendors.  I suspect that at least the “AnyVNF” base class would have to be defined by the NFV ISG because its requirements relate to the interface between VNFs and the rest of the NFV world.  The management stuff would also have to be integrated here.

This might prove to be the big hole in the functional class approach for VNFs.  There are, IMHO, a lot of open questions on management for NFV.  I suggested in an earlier blog that the ISG needed to incorporate the notion of Infrastructure-as-Code from the cloud, and that notion has at least the potential for defining the use of events to link services and resources when linkage is required for service lifecycle management.  That poses two challenges: what events get generated under what conditions, and how events are reflected in the ETSI model in the first place.

This issue can be finessed (and probably should be) in part by saying that resource events are directed to the functional models for the service and not to the VNFs.  If that’s done, then all we need to worry about is how to harmonize the failure of a real device versus its equivalent virtual function.  If a firewall device fails, we need to communicate that to the service level.  If a FirewallVNF fails, the same has to be true if we follow the precepts of our open model.  That means that part of the base class AnyVNF stuff has to be able to recognize the state of the VNF just as a management system would recognize the state of firewall hardware.  One path to that is to use the SNMP MIB sets for the basic “real” devices as the foundation data.  That would work even if we didn’t assume “event generation” to drive service changes and instead relied on testing the state of resources and VNFs.
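
Here’s a small sketch of that harmonization idea, assuming a simple status poll rather than real SNMP; the class names and the MIB-like status field are illustrative only.  The point is that a hardware firewall and a FirewallVNF end up reporting the same thing, in the same form, to the functional model at the service level.

```python
from typing import Protocol


class FirewallStatusSource(Protocol):
    """Anything that can report firewall state in a common, device-like form."""
    def operational_status(self) -> str: ...        # "up" or "down"


class RealFirewallDevice:
    """Stand-in for polling a hardware firewall's MIB (ifOperStatus-style value)."""
    def __init__(self, snmp_oper_status: int) -> None:
        self.snmp_oper_status = snmp_oper_status    # 1 = up, 2 = down (illustrative)

    def operational_status(self) -> str:
        return "up" if self.snmp_oper_status == 1 else "down"


class FirewallVNFStatus:
    """Stand-in for the AnyVNF-level logic that knows whether the hosted VNF is healthy."""
    def __init__(self, process_alive: bool, host_reachable: bool) -> None:
        self.process_alive = process_alive
        self.host_reachable = host_reachable

    def operational_status(self) -> str:
        return "up" if (self.process_alive and self.host_reachable) else "down"


def service_level_event(source: FirewallStatusSource, function_name: str) -> dict:
    """Events are directed at the functional model ('Firewall'), not at the VNF
    or the device, so the service layer sees one event format either way."""
    return {"function": function_name, "status": source.operational_status()}


print(service_level_event(RealFirewallDevice(2), "Firewall"))          # status: down
print(service_level_event(FirewallVNFStatus(True, True), "Firewall"))  # status: up
```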

I think all of this demonstrates that the techniques for creating open VNFs or facilitating the integration and onboarding of VNFs have to be based on the same notion of a functional-class model set that can be composed from above and instantiated below.  That way, it integrates with resource allocation during deployment, redeployment, and other lifecycle stages.  In VNF onboarding and integration, as in network infrastructure accommodation, we’re trying to resolve detailed issues without a suitable architectural framework to guide us.

This is where I think vendors and operators need to guide things.  The architecture models for next-gen networks promulgated by operators in various parts of the world demonstrate (and in some cases explicitly say) that the ISG framework for NFV and the ONF framework for SDN aren’t sufficient.  You can’t just cobble together standards at a low level and expect them to add up to something optimal, or even useful, at a high level.  Harmonizing the architecture of VNFs is critical and I’d love to see vendors present their approaches, and base them more on the cloud than on abstract interfaces.  A good one could form the basis for broad deployment.

Achieving Openness in NFV

What operators want most from their next-gen infrastructure (whether SDN or NFV or the cloud) is openness.  They feel, with some justification, that equipment vendors want to lock them in and force them down migration paths that help the vendor and compromise operator goals.  Networks in the past, built on long-established standards that defined the role of devices, were considered fairly open.  It’s far from clear that goal can be achieved with the next generation.

You can make anything “open” if you’re prepared to spend a boatload of cash on professional services and suffer long delays when you commission a piece of hardware or software.  That all flies in the face of a transformation that is supposed to be driven by efficiency and agility.  What you need is for pieces to fit because they were designed to fit, and specifications that ensure you realize the goal of “fitting” that you’ve designed.

Everyone I’ve talked to in the operator community (CxOs and “literati” alike) believes that an open environment has to be based on three things.  First, an architecture model that makes the business case for the migration.  Second, a series of functional models that define open elements that can then be made interchangeable by vendors/operators through “onboarding”.  Finally, validation through testing, plug fests, etc.

The problem we have today in realizing openness is that we have neither of the first two of these, and without them there’s little value in validating an approach because there’s no useful standard to validate against.  There doesn’t seem to be much of a chance that a standards group or even an open-source activity is going to develop the missing pieces either.  Vendors, even the half-dozen who actually have a complete model, don’t seem to be promoting their architectures effectively, so what we’re now seeing is a set of operator-driven architecture initiatives that might result in a converging set of models, or might not.  Fortunately, we can learn something from them, and in particular learn why that second requirement for openness is so critical.

“Open,” in IT and networking, means “admitting to the substitution of components without impacting the functionality of the whole.”  That almost demands a series of abstractions that represent classes of components, and a requirement that any component or set of components representing such a class be interchangeable with any other within that class.  I think this divides what’s come to be called “orchestration” and “modeling” into two distinct areas.  One area builds from these functional models or component classes, and the other implements the classes based on any useful collection of technology.

Let’s return now to the bidirectional view of these functional models.  Above, you recall, they’re assembled to create services that meet an operator’s business needs.  Below, they’re decomposed into infrastructure-specific implementations.  With this approach, a service that’s defined as a set of functions (“network functions” perhaps in NFV terms) could be deployed on anything that could properly decompose those functions.  If infrastructure changes, a change to the lower-layer decomposition would update the service—no changes would be needed at the service level.
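
A compact sketch of that bidirectional idea, with all the function names and the two “decomposers” invented for illustration: the service is just a list of functional models, and swapping the lower-layer decomposition changes how a function is realized without touching the service definition.

```python
from typing import Callable, Dict, List

# Lower layer: infrastructure-specific decompositions of each functional model.
# Swapping an entry here changes how a function is realized; the service
# definitions above never change.  All names and actions are illustrative.
DECOMPOSERS: Dict[str, Callable[[str], str]] = {
    "VPN":      lambda site: f"provision an MPLS VPN endpoint at {site}",
    "Firewall": lambda site: f"deploy a FirewallVNF on the host pool serving {site}",
}


def deploy_service(functions: List[str], site: str) -> List[str]:
    """Upper layer: a service is just an assembly of functional models."""
    return [DECOMPOSERS[f](site) for f in functions]


# A 'secure branch' service composed from two functions, deployed to one site.
print(deploy_service(["VPN", "Firewall"], "Branch-12"))

# Infrastructure evolves: VPN is now realized with SDN forwarding paths instead.
DECOMPOSERS["VPN"] = lambda site: f"program SDN forwarding paths for {site}"
print(deploy_service(["VPN", "Firewall"], "Branch-12"))    # same service model, new realization
```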

The service structure could be defined using TOSCA, where my functions are analogous to high-level application descriptions.  It could also be defined using the TMF’s SID, where my network functions would be analogous to either customer-facing or resource-facing services.  That means it should be largely accommodating to OSS/BSS as long as we frame the role of OSS/BSS to be the management of the CFS and RFS and not of “virtual devices” or real ones.

Decomposing a function requires a bit more attention.  Networks and services are often multi-domain or multi-jurisdictional.  That means that the first step in decomposing a function is to make a jurisdictional separation, and that’s complicated so let’s use a VPN as an example.

Let’s say I have a North American VPN that’s supported by AT&T in the US, Bell Canada in Canada, and TelMex in Mexico.  My first-level decomposition would be to define three administrative VPNs, one for each area, and assign sites to each based on geography.  I’d then define the interconnection among providers, either as a gateway point they had in common or a series thereof.  In the complex case I’d have six definitions (three area VPNs and three gateways), and these are then network functions too.

For each of these network functions, I’d then decompose further.  If a given operator had a single management API from which all the endpoints in their geography could be provisioned, I’d simply exercise that API.  If there were multiple domains, technology or otherwise, inside one of these second-level functions, I’d then have to decompose first to identify the proper domain(s) and then decompose within each to deployment instructions.
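
Here’s how that layered decomposition might look in a sketch, using the operator names from the example; the site-to-area mapping, the gateway names, and the single “operator API” step are all invented for illustration.

```python
from typing import Dict, List

# First-level decomposition: split the North American VPN by jurisdiction.
SITE_TO_AREA: Dict[str, str] = {
    "NYC": "AT&T-VPN", "Chicago": "AT&T-VPN",
    "Toronto": "BellCanada-VPN",
    "Monterrey": "TelMex-VPN",
}
GATEWAYS: List[str] = [              # interconnects among the three area VPNs
    "AT&T<->BellCanada-GW", "AT&T<->TelMex-GW", "BellCanada<->TelMex-GW",
]


def decompose_area_vpn(area_vpn: str, sites: List[str]) -> List[str]:
    """Second-level decomposition: here, one management API per operator;
    a real operator might decompose further into technology domains first."""
    return [f"{area_vpn}: provision endpoint {site} via the operator's API" for site in sites]


def decompose_na_vpn() -> List[str]:
    steps: List[str] = []
    for area_vpn in sorted(set(SITE_TO_AREA.values())):
        sites = [s for s, area in SITE_TO_AREA.items() if area == area_vpn]
        steps += decompose_area_vpn(area_vpn, sites)
    steps += [f"establish {gw}" for gw in GATEWAYS]       # the interconnect functions
    return steps


for step in decompose_na_vpn():
    print(step)
```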

This description exposes three points.  First, there’s a fuzzy zone of network function decomposition between the top “function” level and the decomposition into resource-specific deployment instructions.  Is my administrative separation, for example, a “service” function or a “resource” function?  It could be either or both.  Second, it’s not particularly easy to map this kind of layered decomposition to the ETSI processes or even to traditional SDN.  Third, operator architectures like AT&T’s and in particular Verizon’s call out this middle layer of decomposition but treat it as a “model” and not specifically as a potentially n-layer model structure.

All of which says that we’re not there yet, but it gets a bit easier if we look at this now from the bottom or resource side.

A network’s goal is to provide a set of services.  In complex, virtual infrastructure these resource-side services are not the same as the retail services—think of the TMF Resource-Facing Services as an example.  I’ve called these intrinsic cooperative network-layer structures behaviors because they’re how the infrastructure behaves, intrinsically or as you’ve set it up.  SDN, NFV, and legacy management APIs all create behaviors, and behaviors are then composed upward into network functions (and of course the reverse).

Put this way, you can see that, for example, I could get a “VPN” behavior in one of three ways—as a management-driven cooperative behavior of a system of routers, as an explicit deployment of forwarding paths via an SDN controller, or by deploying the associated virtual functions with NFV.  In fact, my middle option could subdivide—OpenDaylight could control a white-box OpenFlow switch or a traditional router via the proper “southbound API”.

The point here is that open implementations of network functions depend on connecting the functions to a set of behaviors that are exposed from the infrastructure below.  To the extent that functions can be standardized by some body (like the OMG) using intent-model principles, you could then assemble and disassemble them as described here.  If we could also define “behaviors” as standard classes, we could carry that assembly/decomposition down a layer.

For example, a behavior called “HostVNF” might represent the ability to deploy a VNF in a virtual machine or container and provide the necessary local connections.  That behavior could be a part of any higher-layer behavior that’s composed into a service—“Firewall” or even “VPN”.  Anything that can provide HostVNF can host any VNF in the catalog, let’s say.
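
A quick sketch of behaviors as named, interchangeable building blocks; the behavior names and their one-line “realizations” are invented, and a real system would obviously return deployment actions rather than strings.

```python
from typing import Callable, Dict

# Behaviors: things the infrastructure can do, exposed upward as named classes.
# Three interchangeable ways of getting a "VPN" behavior, plus a "HostVNF"
# behavior any VNF-based composition can reuse.  Everything here is illustrative.
BEHAVIORS: Dict[str, Callable[[], str]] = {
    "VPN.routers": lambda: "configure cooperative routing via legacy management APIs",
    "VPN.sdn":     lambda: "push explicit forwarding paths through an SDN controller",
    "VPN.nfv":     lambda: "deploy virtual router functions with NFV",
    "HostVNF":     lambda: "allocate a VM or container plus local connections for a VNF",
}


def realize(function: str, bindings: Dict[str, str]) -> str:
    """Bind a functional model to whichever behavior this infrastructure exposes."""
    return BEHAVIORS[bindings[function]]()


# One operator realizes VPN with legacy routers, another with SDN, under the
# same service model; "Firewall" composes the shared HostVNF behavior.
print(realize("VPN", {"VPN": "VPN.routers"}))
print(realize("VPN", {"VPN": "VPN.sdn"}))
print(realize("Firewall", {"Firewall": "HostVNF"}))
```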

The notion of functional behaviors is the foundation for the notion of an open VNF framework too.  All virtual network functions, grouped into “network function types”, would be interchangeable if all of them were to be required to implement the common model of the function type they represented.  It would be the responsibility of the VNF provider or the NFV software framework provider to offer the tools that would support this, which is a topic I’ll address more in a later blog.

Openness at the infrastructure level, the equipment level, is the most critical openness requirement for NFV for the simple reason that this is where most of the money will get spent as well as where most of the undepreciated assets are found today.  We can secure that level of openness without sacrificing either efficiency or agility, simply by extending what we already know from networking, IT, and the cloud.

SD-WAN’s Potential as a Game-Changer is Growing

Software-defined WAN is one of those terms that’s vague enough to be applicable to a lot of things, but the core reality is a fairly classic “overlay model” of networking.  An overlay network is a layer on top of the “network protocol” of a real network, an overlay that provides the connectivity service to the user and uses the real network protocol (or protocols) as a kind of physical layer or virtual wire.

We’ve had SD-WAN overlay concepts around for many years.  In a very limited sense, many IP networks are overlays that add an IP layer to Ethernet or another Level 2 technology.  The whole OSI model in fact is a series of overlays.  But the SD-WAN concept’s real birth was probably the work done by Nicira, the cloud-network player who was bought by VMware.  Nicira recognized that you could use overlay networks to build connectivity for cloud data centers without the constraints that exist in “real” network protocols and without the need to involve physical network devices in what was “virtual connectivity”.  SD-WAN technology extends this model, one of the forms of software-defined networking or SDN, to the WAN.

The early SD-WAN products aim at one of several major goals.  First, they create a very agile connection-layer service that can easily build a virtual private network without the headaches and costs of something like MPLS.  Second, they can build a unified virtual network across locations that don’t have a common real-network connection.  Users like this because they can use traditional MPLS VPNs in major sites and add in minor sites or even transient locations through an SD-WAN that supports both underlayments.  Finally, they can use the Internet as an adjunct to private VPNs or to create a wider pipe for a period of time.

SD-WAN has carved out a nice but not enormous market in these areas, and while all of them are valuable it’s not likely that these three drivers would result in explosive growth.  That doesn’t mean that SD-WAN doesn’t have potential—it may end up being as important as, or more important than, SDN or NFV, and in fact be a critical enabler of both.

One obvious mission for SD-WAN is the creation of agile virtual networks over a combination of traditional L2/L3 technology and SDN.  Using SD-WAN I can build a VPN by creating SD-WAN tunnel meshes of all my endpoints over a common, evolving, or disparate underlayment.  If I use NFV to deploy interior router instances, I can create a virtual topology of nodes and trunks (both virtual, of course) that aggregates traffic and eliminates the potential inefficiency or lack of route control that meshing could create.
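
As a rough illustration of the mesh point, here’s a sketch that builds a full SD-WAN tunnel mesh over whatever underlay each endpoint happens to sit on; the endpoint and underlay names are invented, and the closing comment hints at where a hosted router instance would change the topology.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Endpoints and the underlay each one happens to sit on (names are invented).
ENDPOINTS: Dict[str, str] = {
    "HQ": "MPLS-VPN",
    "Plant": "MPLS-VPN",
    "Branch-7": "Internet",
    "PopUpStore": "LTE-Internet",
}


def tunnel_mesh(endpoints: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Full mesh of SD-WAN tunnels; each tunnel only needs IP reachability
    between the two underlays, whatever they are."""
    mesh = []
    for a, b in combinations(sorted(endpoints), 2):
        mesh.append((a, b, f"overlay tunnel across {endpoints[a]}/{endpoints[b]}"))
    return mesh


for edge in tunnel_mesh(ENDPOINTS):
    print(edge)

# An NFV-hosted interior router instance could replace some of these
# point-to-point tunnels with a hub, trimming the n*(n-1)/2 mesh count.
```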

The use of SD-WAN could mean that connectivity, meaning the service connection layer, resides only in endpoints or in hosted instances that are dedicated to a service or user.  This would make provisioning new endpoints or changing services trivial; no real hardware would even have to know.  If my underlayment offered grades of service representing QoS and availability options, I could offer various SLAs from this, and change QoS and SLAs easily without interrupting anything.

If I used SDN to build only virtual wire tunnels, I could build services entirely from software elements.  This would clearly scale to support VPN/VLAN services and probably also to support content delivery networks, mobile Evolved Packet Core, and even IoT.  With some reasonable way of managing Internet services as predictable sessions, I could probably support most of the demanding Internet applications for content delivery and ecommerce.

In both SDN and NFV and in the cloud, you have a presumptive “virtual network” built in data centers and across data center interconnect (DCI) trunks.  In the cloud, these virtual networks are most often subnetworks linked to the Internet or a VPN, meaning that they’re a homogeneous part of a larger network.  In NFV, the presumption is that these networks are truly private, with only selected ports adapted into a broader address space.  SDN and SD-WAN could create a hybrid of these models.

Suppose I build an SDN network to link application components in a data center complex.  Yes, I can gate these to a larger address space like always, but suppose that instead I consider these to be a series of separate networks, independent.  Now suppose that I have a bunch of users and organizations out there.  With the proper software (SD-WAN, in some form) I link users or groups of users (themselves formed into a separate network) with applications selectively.  I don’t have a uniform address space for every user, I have a composed address space.

Composed address spaces may seem hokey, but they go back a long way conceptually.  When switched-virtual-circuit services like frame relay and ATM came along, there were a number of initiatives launched to exploit them (if they became widespread, which they did not) for IP.  One of the protocols invented for this was called the “next-hop resolution protocol” or NHRP.  With NHRP you surrounded an SVC network of any sort (the standard calls these networks “NBMAs”, meaning non-broadcast multi-access, to show that traditional IP subnet processes won’t work) with a kind of IP ring, where each ring element maintained a table that showed the NBMA address for each remote subnet.  When traffic arrived there, the ring station simply looked to see if the associated ring station for the subnet was already connected, and, if not, connected it.  SD-WAN composed address spaces could be similar, except that they translate a remote application address in its own space to the address space being composed.
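
Here’s a sketch of that NHRP-like behavior applied to a composed address space; the subnet-to-node table and the connect-on-demand step are invented for illustration, standing in for whatever the SD-WAN software would actually do.

```python
from typing import Dict, Optional

# NHRP-style table kept by each 'ring station': which SD-WAN node (the NBMA
# address, in NHRP terms) reaches each remote subnet.  Entries are invented.
NEXT_HOP_TABLE: Dict[str, str] = {
    "10.1.0.0/16":   "sdwan-node-appsA",    # application network A
    "10.2.0.0/16":   "sdwan-node-appsB",    # application network B
    "172.16.0.0/12": "sdwan-node-users",    # a user group's own network
}

ACTIVE_TUNNELS: set = set()


def resolve_and_connect(dest_subnet: str) -> Optional[str]:
    """Look up the far-end node for a destination subnet and connect on demand,
    the way an NHRP ring station set up an SVC only when traffic arrived."""
    node = NEXT_HOP_TABLE.get(dest_subnet)
    if node is None:
        return None                      # not part of this composed address space
    if node not in ACTIVE_TUNNELS:
        ACTIVE_TUNNELS.add(node)         # stand-in for actually building the tunnel
    return node


print(resolve_and_connect("10.2.0.0/16"))   # builds the tunnel on first use
print(resolve_and_connect("10.2.0.0/16"))   # already connected, just forwards
```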

These features all make SD-WAN the perfect companion to SDN and NFV because they illustrate that SD-WAN can virtualize the service layer independent of what’s happening to layers below.  That’s the value of overlay technology.  The agility benefits, and the ability to decouple service-layer connectivity from transport infrastructure, are so profound that I think it’s fair to say that SD-WAN technology is the best option for VPN/VLAN services.  In addition, its linkage to cloud data center connectivity means that it could revolutionize NFV managed services, cloud computing service delivery, and even (by creating explicit connectivity versus permissive networking) security.

Obviously SD-WAN is a threat to established service-layer paradigms, particularly box-based IP, and for that reason it’s going to be heavily opposed by the big-name players like Cisco.  The question is whether those who could benefit from it—including obviously the SD-WAN vendors, but less obviously companies like Brocade, who have world-class software routers, and Nokia, whose Nuage SDN architecture has all the right interior components—will position their assets better.  Complementing SD-WAN (or, in the case of Nuage, offering an alternative implementation model) could be a big win for them, and ultimately for the market.

The big question will be whether the network operators or managed service providers jump on SD-WAN technology in a big way.  Verizon has launched an SD-WAN service and MetTel has a competitive-carrier SD-WAN service aimed at the MPLS VPN space.  Neither of these has so far created a serious threat to incumbent VPN technology, but the potential is there, and a big shift could come literally at any moment.  SD-WAN could even be a first-deployed technology, paving the way for adoption of SDN and NFV by disconnecting the service layer from infrastructure.  We’ll have to watch for developments, and I’ll blog on the important ones as they occur.

What Operator Experts Think is Wrong with Vendor NFV Strategy

If you are a company with aspirations in the SDN or NFV markets, then operators themselves say you have a problem.  In fact, you probably have more than one problem, and those problems are hurting your ability to engage customers and build revenue.  This is a message from those same literati I talked “tech-turkey” with last week, and again it’s interesting that their views, my views, and vendor views of the issues are both congruent and different.

I’ve noted in past blogs that SDN and NFV salespeople have complained to me that their markets are not moving, they’re not making their sales goals as a company, and they’re frustrated by what they see as the intransigence of buyers.  The key phrase from their emails is “The buyer won’t…” as though salespeople had either the right or the ability to simply expect that buyers would adopt a frame of reference that’s convenient to the sales process.  So what do operators, through the literati, see?

The number one problem with vendors, according to operator literati, is “they act like SDN and NFV are decisions already made and all they have to do is differentiate themselves from the competition.”  This, when 100% of the CFOs I’ve talked with or surveyed say that they still can’t make a broad-based SDN or NFV business case and so there is no commitment as yet to either technology.

I’ve done sales/marketing consulting for a lot of years, and one point I’ve always made is that there are three message elements in positioning your offering.  The first and most important are the enablers, meaning the features and value propositions that can make the business case for a deal.  Second are the differentiators that make you stand out from others who can “enable” too, and last are the objection management statements that can put to rest mild issues of resistance or credibility.  What the literati are saying is that vendors aren’t enabling SDN or NFV, and so there’s not really much of a market to compete for.

There’s unanimity among the literati on this first problem, but not on what the next one is.  About half the literati say that problem two is that vendors don’t address the complexity of getting support for their projects from all the relevant buyer constituencies.  “The CTO doesn’t have a deployment budget and can’t make a deployment decision,” says one literati who happens to work in the CTO organization.  The other half say that vendor views of the market are set by the media, who in turn are setting their views from vendors.  “Most of what these salespeople tell us is what they read somewhere, and at the same time they know that their own company is promoting analysts and writers to say that very stuff.”

It’s not going to surprise you to hear that I believe the media processes in the tech industry took a turn over two decades ago when subscription publications were replaced by ad-sponsored controlled-circulation pubs.  At one point, I had an opportunity to review the reader service cards for a mainstream network rag, and the total value of purchases the respondents said they made decisions for was at least triple the total market.  Processes took another turn with the online shift, because an online ad is served pretty much the moment you click a URL, whereas with print your chance of seeing an ad grows the longer you’re on the page.  At any rate, what we see and hear and read is increasingly set by vendors.  Even if you assumed it was all true (which obviously it is not), it makes no sense for salespeople to simply mouth the same story.  Why should a buyer even bother to take a sales call if that’s what happens?

The engagement issue is probably the longest-standing problem, and it’s related to another issue the literati brought up, which is that vendors “don’t understand anything about my business.”  I remember consulting with a switch/router vendor two decades ago, and pointing out to them that the diagram they were showing for network evolution by US operators was in fact a violation of the regulatory framework that governs the industry.  Operators used to send me moaning emails making that same point, and they saw it as an indication that their vendors didn’t take the trouble to understand the customer.

The thing is, there’s more than one customer.  A transformation like the one SDN or NFV would bring has to be lab-tested, network operations-tested, CIO/OSS/BSS-tested, pass CFO muster, and get CEO and executive committee approval.  All these constituencies have to buy in, all will do so if their own issues are addressed, and yet vendors tend to expect their own sales contacts to run the ball internally—which in most cases can’t be done, because the internal operator groups probably know less about each other than the vendors do.

And the literati say vendors don’t present their total solution either.  SDN and NFV are not monolithic.  Generally speaking, you have a combination of IT infrastructure on which stuff will be deployed, facilitating software that handles the virtualization and deployment processes, and operations and management tools and processes that manage the commercial offerings and the sales/support processes.  For all of the SDN or NFV vendors I’ve talked with, these three pieces of tech transformation are different profit centers, or they’re not even present (the buyer, seller, or integrator would have to add in stuff from outside).

How many cars would an auto giant sell if you had to talk to a showroom salesperson about the car, another about the engine, yet another about tires and the seats?  “Infrastructure” isn’t seamless but it has to be cohesive.  Yet I’ve listened to vendors who won’t talk about NFV orchestration because they want to sell servers and platforms, and others who won’t talk OSS/BSS because that’s either another business unit or it’s a partner company.

There seems to be a “vendors are from Mars and operators from Venus” thing going on here.  I think part of the reason is that vendors are looking for profits in the next couple of quarters, while transformation is seen by telcos as a three-to-five-year process.  Another part is that vendors are used to selling equipment into what could be called an established paradigm, not to working to invent one and then sell into it.  Finally, they are used to the telco’s own internal processes taking “successful” trials into production, whereas today the trials don’t have a broad enough scope to make the business case.

One operator literati made what might be the definitive comment on all of this, relating to the tendency for vendors to go after tactical service-specific NFV and SDN projects.  “These services that they’re talking about, if you presumed they were 100% converted to NFV hosting, and assuming they delivered on the benefit case promised, would make a difference for us that’s a rounding error on our bottom line.”

You can’t easily creep into NFV by tiny steps because none of the steps make a visible difference in profits.  Somehow, vendors have to convince operators that they can do more than creep.  The disconnect vendors face now makes that hard, but far from impossible, because the literati say the operators want vendors, and NFV, to succeed.

Should NFV Adopt “Infrastructure as Code” from the Cloud?

From the first, it was (or should have been) clear that NFV was a cloud application.  Despite this, we aren’t seeing what should then have been clear signs of technologies and concepts migrating from the cloud space into NFV.  One obvious example is TOSCA, the OASIS Topology and Orchestration Specification for Cloud Applications, which has been quietly increasing its profile in the NFV space despite a lack (until recently) of even recognition in the ETSI NFV activity.  But I’ve talked about TOSCA before; today I want to look at “Infrastructure as Code” or IaC.

IaC is a development in the DevOps space that, at first glance, is actually kind of hard to distinguish from DevOps.  Puppet and Chef both talk about it, and Amazon has picked the notion up (along with Chef) in its OpsWorks stuff.  The explanation of just why we have IaC and DevOps independently is not only useful for IaC, it’s also instructive in how NFV’s own management and orchestration should be expected to work.

Any virtualized environment is a combination of abstraction and instantiation.  You have an abstract something, like an application or virtual function, and you instantiate it, meaning that you commit it to resources.  In software, “DevOps” or “Development/Operations” described an initiative to transfer deployment and later lifecycle management information from the application development process forward into data center operations.  Because both virtualization and DevOps end up deploying or committing resources, the similarity at that level overwhelms an underlying difference—one is really a layer on the other.

DevOps is about the logical organization of complex (multi-element) applications.  But if you’re doing virtualized resources, the resources have a life separate from that of applications.  A server pool is a server pool, and some VMs within it are the targets of traditional DevOps deployment.  But not only does the existence of a resource pool versus a specific resource complicate the deployment, it also separates the management of resources as a collection from the management of applications.

The DevOps people, especially market leaders Chef and Puppet, were among the first to see this and to reflect it in their products through the addition of resource descriptions.  You could describe what was needed to commission a resource in a pool or independently, just as you could describe what was needed to commission an application.  Rather than trying to tie the two tightly, these evolving changes reflected an interdependence.  They created another side to the DevOps coin, and that other side became known as IaC, to reflect the fact that DevOps-like tools were to be used to commission resources and to handle their lifecycle management.

It’s my view that what makes IaC a critical concept in DevOps and the cloud should also make it critical in NFV, and probably in SDN too.  Resources are always separate from what they’re resources for, meaning separate from the deployment of applications/components or the threading of connections.  The mission that commits them—which we could call a “service” or an “application”—is one deployment and management domain, and the resources themselves are another.  Logical, and perhaps even compelling, but not something we hear about in NFV.

It’s also interesting to note that what the DevOps community seems to be doing (or moving to do) is supporting the “interdependence” I talked about earlier by providing an event-based link between the DevOps and IaC processes.  The two are separate worlds when everything is going normally, but if the IaC operations activities are unable to sustain the resource lifecycle properly, then they have to trigger a DevOps-level activity.

An example here is that of a failed instance of a component or virtual function.  You might have a resource-level process that attempts to recover the lost component by simply reloading or restarting it, or by instantiating a new copy local to the original.  But if you need to spin that copy up in another data center, you need to make connections that are outside the domain of the resource control or IaC processes and you have to kick the problem to another level.

This shows a few critical points.  First, some resource conditions, like the failure of a resource not currently committed to anything, obviously have to be handled at the “IaC level” by NFV, whether we have such a function or not.  You wouldn’t want to deal with that kind of failure only when you tried to deploy.  Second, there are some kinds of resource failures that could be handled at the resource level alone, and others that would require higher-level coordination because multiple resource types are involved.  Third, there are things, like a service change initiated by the customer, that could require high-level connection/coordination first, but might then require something at the resource level—setting up vCPE devices on the premises, for example.
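
A simple sketch of that two-level event linkage, with the event fields and the recovery rules invented for illustration: the resource (IaC-like) level tries the cheap local fixes, and only escalates to the service (MANO/OSS-like) level when the remedy falls outside its domain.

```python
from typing import Dict


def resource_level_recover(event: Dict[str, str]) -> bool:
    """IaC-like layer: try the cheap local remediation first.  Return False when
    the fix needs connections or decisions outside this resource domain."""
    if event["failure"] == "process-crash":
        print("restarting the instance in place")
        return True
    if event["failure"] == "host-down" and event["local_capacity"] == "yes":
        print("re-instantiating on another local host")
        return True
    return False                                   # can't fix it down here


def service_level_recover(event: Dict[str, str]) -> None:
    """MANO/OSS-like layer: re-map the function and rebuild its connections."""
    print(f"redeploying {event['function']} in another data center and reconnecting it")


def handle_resource_event(event: Dict[str, str]) -> None:
    """The event link between the layers: escalate only when necessary."""
    if not resource_level_recover(event):
        service_level_recover(event)


handle_resource_event({"function": "FirewallVNF", "failure": "process-crash", "local_capacity": "yes"})
handle_resource_event({"function": "FirewallVNF", "failure": "host-down", "local_capacity": "no"})
```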

Virtualization introduces not two layers, potentially, but three.  We have services/applications and resources, but also “mapping”.  Resource management is responsible for maintaining the pool, services/applications for maintaining what’s been hosted on the pool, but the “virtualization mapping” or binding process is itself dynamic.  The trend with the cloud and IaC seems to be to presume that resource issues, including mapping issues, are reported as service/application events.  With NFV there is at least an indication of a different approach.

Arguably, NFV presents a three-layer model of “orchestration” (which, by the way, Verizon’s architecture makes explicit).  You have services, then MANO, then the Virtual Infrastructure Manager (VIM).  None of these three layers correspond to IaC because pure resource management is out of scope.  Service-layer orchestration is recognized but not described either.  Presumably, in NFV, resource conditions/events that impact orchestrated deployments are reflected into the VNF Manager.  The MANO-level orchestration is where mapping/binding management is sustained, meaning that any “resource” problems that aren’t automatically remediated at the resource level are presumed to be handled by MANO.  IaC would then be “in” or “below” the VIM.

Why, you may be thinking, is the integration of IaC with NFV important?  My view is that lifecycle management has to be coordinated wherever there are layers of functionality.  Logically, if we have cloud-like IaC going on with NFV resources, then that IaC should be the source of “events” that signal for the attention of higher-layer lifecycle processes, be they MANO or service/OSS/BSS.  If I have an “issue” with resources, the IaC gets the first shot, then signals upward to (hypothetically) VNFM/MANO, and then upward to OSS/BSS or “super-MANO”.

The ETSI ISG has generally accepted the notion of a higher orchestration layer, which is good because the network operators are writing that into their approaches.  The only thing, as I’ve said before, is that if you have multiple orchestration layers you have to define how they communicate, and that should be somebody’s next step.  Defining APIs implies a non-event interface, and there is no question that all of the layers of orchestration create parallel, asynchronous processes that can’t communicate except through events.

More broadly, the IaC point is another example of what I opened with, the need to contextualize NFV in light of other industry developments in general, and the cloud in particular.  The cloud is much more advanced than NFV in both technology thinking and market acceptance.  It’s framing tomorrow’s issues for NFV today.  The NFV ISG has, like most standards groups, set narrow borders for itself to ensure it can make progress rather than “boil the ocean”.  That’s fine as long as the rest of the technology landscape can be harmonized at those borders, and I think IaC makes it clear that efforts to do that may be getting outrun by events.

What the Court Ruling on the Net Neutrality Order Really Means

The DC Court of Appeals has upheld the FCC’s latest neutrality order, and as is nearly always the case with these regulatory things, the move has created a combination of misinformation and apocalyptic warnings.  However, it’s always dangerous to think that these regulatory moves are just the usual political jostling.  The telecom industry has been driven more by regulation than by opportunity, and that may still be the case.

In the Court of Appeals decision, the court cited the troubled and tangled history of regulating consumer broadband and the Internet.  The FCC’s position evolved as the mechanism for connection evolved.  In the beginning, we accessed the Internet over telephone connections.  Today we access telephone services over the Internet, and this is the shift that more than anything justifies the FCC’s position that regulations should be changed to accommodate market reality.  The order in question here was the third attempt to do that—both previous attempts failed because the courts said the FCC didn’t have the statutory authority to regulate as it wanted since it had previously classified the Internet and broadband as information services.  That led to the latest order from the FCC, which classified them as common carrier utility services, and the court has now upheld that decision.  There are some important points behind this, though.

First, in telecom regulation the FCC is not a legislative body but what’s called a quasi-judicial agency, meaning that in these matters the FCC is equivalent to the court of fact.  An appeal from a court of fact cannot be made on the facts, but must be made because there’s a possibility that the law was incorrectly applied.  Thus, the court did not endorse the positions the FCC took in the order, only the FCC’s right to take them.

Second, the fact that the FCC has declared broadband Internet to be a common carrier service doesn’t mean that all the burdens of common carrier regulation will be applied to it.  The Telecommunications Act of 1996, which forms the legal foundation for the FCC’s regulations, provides (in the famous Section 706) that the FCC can “forbear” from applying such regulations as needed to promote Internet availability.  The FCC has already said it’s doing that in the order in question.

Third, the issue could in theory be appealed to the Supreme Court and that could change the result.  There have been cases where the Supreme Court disagreed with Court of Appeals findings on telecom, and I’m not qualified to judge the fine details of these issues so I won’t comment on the chances of appeal or of success.  We’ll have to wait.  I do want to say that I’ve read every FCC order on the Internet and the text of every court ruling, and I think I have some understanding of the tone.  In this order, the DC Court of Appeals was pretty firm.

The end result is that an order that disappoints pretty much every group in some way has been upheld, and the ruling of the court is likely to remain law for years, whether there’s an appeal or not.  They say that the sign of a good ruling is that everyone dislikes something about it, in which case this is a good one.  The question is where it leaves us.

First, the fact that there is equal treatment for mobile and fixed broadband in virtually all neutrality matters means that there’s no value to operators in shifting users to a mobile access technology just to get more favorable regulatory treatment.  I think there will continue to be great operator interest in moving those wireline customers who can’t justify FTTH to fixed wireless simply for infrastructure economy, though.  The deciding issue is content delivery, because fixed wireless isn’t likely a good way to deliver streaming multi-channel video to an ever-more-demanding HDTV market, particularly when competing with cable—and that isn’t likely to change.

Second, the FCC is not going to force wireline or wireless ISPs to unbundle their assets so despite dire comments to the contrary, the fear of that isn’t likely to deter broadband investment.  It’s also not going to spawn another stupid CLEC-type notion that reselling somebody else’s infrastructure is “competition”.  The big problem with broadband investment is ROI, and that’s the next point.

Third, the big problem with the order was, and still is, the whole notion of paid prioritization and interconnect policy.  I understand as much as any Internet user the personal value of having streaming video available to me at the lowest possible cost—free if possible.  I also understand, as a long-standing industry/regulatory analyst, that something that drives up traffic without driving up revenue is going to put pressure on return on infrastructure investment.  Having settlement for content peering and having paid prioritization could make broadband Internet access more responsive to traffic cost trends.  The order removes that possibility.

So what does this all add up to?  There were really two parts to the Neutrality Order and we have to look at them independently.

The first part is largely what the Court ruled on—the authority part.  The FCC has finally done what it had to do (and should have done from the first) by asserting that broadband Internet is a telecommunications service, which gives it the authority to regulate it fully.  The FCC seems to have navigated the “forbearance” point well, so the risk it has created for the market by the way it established its authority is minimal.

The second part is the policy part, which is what the FCC intends to do with its authority.  Here’s where I personally parted with the Commission on the Order and where I remain at odds.  The Neutrality Order that was promulgated by the Genachowski FCC (the current one is the Wheeler FCC) was in my view more sensible in that it seemed to explicitly allow the FCC to address abuses of either settlement-based interconnection or paid prioritization, but didn’t foreclose either option.

The important point here is that there was a policy difference between the orders, and that nobody really said that the FCC couldn’t change its approach even though the reason for reversing prior orders was in the “authority” area.  The FCC is not completely bound by its own precedent; it can adapt the rulings to the state of the market and it’s done that many times in the past.  Thus, the current Neutrality Order could still be altered down the line.  A new administration, no matter which party comes to power, usually ends up with a new FCC Chairman, and that new Chairman might well decide to return to Genachowski’s views, or formulate something totally different from the views of either of the two prior Chairmen.

And, of course, there’s Congress.  The parties, not surprisingly, are split on the best policies, with Democrats favoring a consumeristic-and-OTT-driven vision and Republicans favoring network operators.  However, since 1996 there have been many Congressional hearings on Internet policy and precious little in the way of legislation (most would say “none”).  We may well see the usual political posturing on the fact that the Court didn’t reverse the FCC, but we’ll probably not see Congress really act.

There are new revenue opportunities besides things like settlement or paid prioritization, but they are either targeted at only business services (which isn’t where profit-per-bit is plummeting) or they’re above the connection layer of the network, where OTT players compete.  The FCC order largely forecloses a revenue-based path out of the profit dilemma, which leaves us only with cost management.  Going that way is going to continue pressure on network vendors and ultimately risks curtailing network expansion.

What the Network Operator Literati Think Should Be Done to Accelerate NFV

I am always trying to explore issues that could impact network transformation, especially relating to adopting NFV.  NFV offers a potentially radical shift in capex and architecture, after all.  I had a couple emails in response to some of my prior blogs that have stimulated me to think of the problem from a different angle.  What’s the biggest issue for operators?  According to them, it’s “openness”.  What are the barriers to achieving that?  That’s a hard topic to survey because not everyone has a useful response, so I’ve gathered some insight from what I call the “literati”, people who are unusually insightful about the technical issues of transformation.

The literati aren’t C-level executives or even VPs and so they don’t directly set policy, but they are a group who have convinced me that they’ve looked long and hard at the technical issues and business challenges of NFV.  Their view may not be decisive, but it’s certainly informed.

According to the literati, the issues of orchestration and management are important but also have demonstrated solutions.  The number of fully operational, productized solutions ranges from five to eight depending on whom you talk with, but the point is that these people believe we have already solved the problems there; we just need to apply what we have effectively.  That’s not true in other areas, though.  The literati think we’ve focused so much on “orchestration” that we’ve forgotten to make things orchestrable.

NFV is an interplay of three as-a-service capabilities, according to the literati.  One is hosting-as-a-service, to deploy virtual functions; one is connection-as-a-service, to build the inter-function connectivity and then tie everything to network endpoints for delivery; and one is function-as-a-service, which relates to implementing network functions as virtual network functions (VNFs).  The common problem with these things is that we don’t base them on a master functional model for each service/function, so let’s take the three elements in the order I introduced them to see how that could be done.

All hosting solutions, no matter what the hardware platform or hypervisor is, or whether we’re using VMs or containers, should be represented as a single abstract HaaS model.  The goal of this model is to provide a point of convergence between diverse implementations of hosting from below and the composition of hosting into orchestrable service models above.  That creates an open point where different technologies and implementations can be combined—a kind of buffer zone.  According to the literati, we should be able to define a service in terms of virtual functions and then, in essence, say “DEPLOY!” and have the orchestration of the deployment and lifecycle management then harmonize to a single model no matter what actual infrastructure gets selected.

Connection-as-a-service, or NaaS if you prefer, is similar.  The goal has to be to present a single NaaS abstraction that gets instantiated on whatever happens to be there.  This is particularly important for network connectivity services because we’re going to be dealing with infrastructure that evolves at some uncontrollable and probably highly variable pace, and we don’t want service definitions to have to reflect the continuous state of transition.  One abstraction fits all.
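The same pattern applies on the connection side.  Here's a rough sketch, again with invented names: the service model references only the abstract request, and the choice of which driver realizes it is made underneath.

```python
# One NaaS abstraction, instantiated on whatever happens to be there.
# The driver names and functions are hypothetical.
class NaasRequest:
    def __init__(self, endpoints, bandwidth_mbps):
        self.endpoints = endpoints
        self.bandwidth_mbps = bandwidth_mbps

def provision_legacy(req):
    # e.g. push VLAN configuration to existing switches/routers
    return "vlan path: " + " <-> ".join(req.endpoints)

def provision_sdn(req):
    # e.g. install forwarding rules through an SDN controller
    return "flow path: " + " <-> ".join(req.endpoints)

DRIVERS = {"legacy": provision_legacy, "sdn": provision_sdn}

def connect(req, infrastructure):
    # Service definitions reference only NaasRequest; the infrastructure
    # choice is made down here, not in the service model.
    return DRIVERS[infrastructure](req)

print(connect(NaasRequest(["siteA", "siteB"], 100), "sdn"))
# flow path: siteA <-> siteB
```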

The common issue that these two requirements address is “brittleness”.  Service definitions, however you actually model them in structure/language terms, have to describe the transition from an order to a deployment to the operational state, and the other lifecycle phases involved in maintaining that state.  If the service-level stuff has to reference specific deployment and connection technology, it would have to be changed whenever that technology changed.  And if new technologies like SDN and NFV were deployed randomly across infrastructure as they matured, it's possible that every service definition would have to be multifaceted to reflect how the deployment/management rules would change depending on where the service was offered.

The as-a-service goal says that if you have an abstraction to represent hosting or connection, you can require that vendors who supply equipment supply the necessary software (a “Virtual Infrastructure Manager” for example, in ETSI ISG terms) to rationalize their products to the abstractions of the as-a-service elements their stuff is intended to support.  Now services are insulated from changes in resources.

The literati say that this approach could be inferred from the ETSI material but it’s not clearly mandated, nor are the necessary abstractions defined.  That means that any higher-level orchestration process and model would have to be customized to resources, which is not a very “open” situation.

On the VNF side we have a similar problem with a different manifestation.  Everyone hears, or reads, constantly about the problem of VNF onboarding, meaning the process of taking software and making it into a virtual function that NFV orchestration and management can deploy and sustain.  The difficulty, say the literati, is that the goal is undefined in a technical sense.  If we have two implementations of a “firewall” function, we can almost be sure that each will have a different onboarding requirement.  Thus, even if we have multiple products, we don't have an open market opportunity to use them.

What my contacts say should have been done, and still could be done, is to divide virtual functions into function classes, like “Firewall”, and give each class an associated abstraction—a model.  The onboarding process would then begin by having the associated software vendor (or somebody) harmonize the software with the relevant function class model.  Once that is done, any service model that references the function class would decompose to the same set of deployment instructions/steps no matter what software was actually used.
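A crude sketch of what a function class and its onboarding mappings might look like; every field name here is invented for illustration and doesn't come from any published model.

```python
# A hypothetical "Firewall" function class and two vendor VNFs onboarded
# against it.  All field names are invented for illustration.
FIREWALL_CLASS = {
    "parameters": ["wan_port", "lan_port", "rule_set"],
    "lifecycle": ["deploy", "configure", "heal", "scale"],
}

# Onboarding, in this sketch, is the act of supplying a mapping from the
# class's abstract parameters to the vendor's own configuration knobs.
VENDOR_A_FIREWALL = {"wan_port": "eth0", "lan_port": "eth1", "rule_set": "acl.cfg"}
VENDOR_B_FIREWALL = {"wan_port": "ge-0/0/0", "lan_port": "ge-0/0/1", "rule_set": "policy.xml"}

def configure(vnf_mapping, abstract_params):
    # A service model that references the "Firewall" class decomposes to
    # the same call no matter whose software sits underneath.
    return {vnf_mapping[key]: value for key, value in abstract_params.items()}

order = {"wan_port": "customer-uplink", "lan_port": "customer-lan", "rule_set": "default-deny"}
print(configure(VENDOR_A_FIREWALL, order))
print(configure(VENDOR_B_FIREWALL, order))
```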

The problem here is that while we have a software element in NFV that is at least loosely associated with this abstract-to-real translation on the resource side (though it lacks the rigorous model definitions needed and a full lifecycle management feature set built into the abstraction), we have nothing like that on the VNF side.  The closest thing we have is the notion of the specialized VNF Manager (VNFM), but while this component could in theory be tasked with making all VNFs of a function class look the same to the rest of the NFV software, it isn't tasked that way now.

There are similarities between the view of the literati and my own, but they're not completely congruent.  I favor a single modeling language and orchestration approach from top to bottom, while the literati see nothing wrong with having one model and orchestration approach at the service layer and another to decompose the abstractions I've been talking about.  They also tend to put service modeling up in the OSS/BSS and the model decomposition outside/below it, where I tend to integrate both OSS/BSS and network operations into a common model set.  But even in these areas, I think I've indicated that I can see the case for the other approach.

One point the literati and I agree on is that orchestration and software automation of service processes is the fundamental goal here, not infrastructure change.  Most of them don't believe that servers and hosting will account for more than about 25% of infrastructure spending even in the long run.  They believe that if opex could be improved through automation, some of the spending pressure (the kind that, for example, resulted in a Cisco downgrade on Wall Street) would be relieved.  SDN and NFV, they say, aren't responsible for less spending—the profit compression of converging cost-per-bit and price-per-bit curves is doing that.  The literati think that the as-a-service abstraction of connection resources would let operators gain cost management benefits without massive changes in infrastructure, and that it would then lead them into those changes where they make sense.

It seems to me that no matter who I talk with in network operator organizations, they end up at the same place but by taking different routes.  I guess I think that I’ve done that too.  The point, though, is that there is a profit issue to be addressed that is suppressing network spending and shifting power to price leaders like Huawei.  Everyone seems to think that service automation is the solution, but they’re all seeing the specific path to achieving it in a different light.  Perhaps it’s the responsibility of vendors here to create some momentum.

What Microsoft’s LinkedIn Deal Could Mean

Microsoft announced it was acquiring business social network giant LinkedIn, and the Street took that as positive for LinkedIn and negative for Microsoft.  There are a lot of ways of looking at the deal, including that Microsoft, like Verizon, wants a piece of the OTT and advertising-sponsored service market.  It seems more likely that there's a more direct symbiosis, particularly if you read Microsoft's own release on the deal.

LinkedIn, which I post on pretty much every day, is a good site for business prospecting, collaboration, and communication.  It's not perfect, as many who, like me, have tried to run groups on it can attest, but it's certainly the winner in terms of engagement opportunity.  There are a lot of useful exchanges on LinkedIn, and it borders on being a B2B collaboration site without too much of the personal-social dimension that makes sites like Facebook frustrating to many who have purely business interests.

Microsoft has been trying to get into social networking for a long time, and rival Google has as well, with the latter launching its own Google+ platform to compete with Facebook.  There have been recent rumors that Google will scrap the whole thing as a disappointment, or perhaps reframe the service more along LinkedIn lines, and that might be a starting point in understanding Microsoft’s motives.

Google’s Docs offerings have created cloud-hosted competition for Microsoft, competition that could intensify if Google were to build in strong collaborative tools.  Google also has cloud computing in something more like PaaS than IaaS form, and that competes with Microsoft’s Azure cloud.  It’s illuminating, then, that Microsoft’s release on the deal says “Together we can accelerate the growth of LinkedIn, as well as Microsoft Office 365 and Dynamics as we seek to empower every person and organization on the planet.”

Microsoft’s Office franchise is critical to the company, perhaps as much as Windows is.  Over time, like other software companies, Microsoft has been working to evolve Office to a subscription model and to integrate it more with cloud computing.  The business version of Office 365 can be used with hosted exchange and SharePoint services.  Many people, me included, believe that Microsoft would like to tie Office not only to its cloud storage service (OneDrive) but also to its Azure cloud computing platform.

Microsoft Dynamics is a somewhat venerable CRM/ERP business suite that's been sold through resellers.  Over the years Microsoft has been slow to upgrade the software and quick to let its resellers and developers customize and expand it, to the point where direct customers for Dynamics are fairly rare.  There have also been rumors that Microsoft would like to marry Dynamics to Azure and create a SaaS version of the applications.  These would still be sold through and enhanced by resellers and developers, targeting primarily the SMB space but also, in some cases, competing with Salesforce.

Seen in this light, a LinkedIn deal could be two things at once.  One is a way of making sure Google doesn’t buy the property, creating a major headache for Microsoft’s cloud-and-collaboration plans, and the other is a way to cement all these somewhat diverse trends into a nice attractive unified package.  LinkedIn could be driven “in-company” as a tool for business collaboration, and Microsoft’s products could then tie to it.  It could also be expanded with Microsoft products to be a B2B platform, rivaling Salesforce in scope and integrating and enhancing Microsoft’s Azure.

Achieving all this wondrous stuff would have been easier a couple of years ago, frankly.  The LinkedIn community is going to be very sensitive to crude attempts to shill Microsoft products by linking them with LinkedIn features.  Such a move could reinvigorate Google+ and give it a specific mission in the business space, or justify Google's simply branding a similar platform for business.  However, there is no question that there is value in adding real-time collaboration, Skype calling, and other Microsoft capabilities.

The thing that I think will be the most interesting and perhaps decisive element of the deal is how Microsoft plays Dynamics.  We have never had a software application set that was designed for developers and resellers to enhance and was then migrated to be essentially hybrid-cloud hosted.  Remember that Azure mirrors Microsoft's Windows Server platform tools, so what integrates with it could easily integrate with both sides of a hybrid cloud and migrate seamlessly between the two.  Microsoft could make Dynamics into a poster child for why Azure is a good cloud platform, in fact.

Office in general, and Office 365 in particular, also offer some interesting opportunities.  Obviously Outlook and Skype have been increasingly cloud-integrated, and you can see how those capabilities could be exploited in LinkedIn to enhance group postings and extend groups to represent private collaborative enclaves.  Already, new versions of Office let you send a link to a OneDrive file instead of actually attaching it, and convey edit rights as needed to the recipient.

So why doesn't the Street like this for Microsoft, to the point where the company's bond rating is now subject to review?  It's a heck of a lot of cash to put out, but more than that is the fact that Microsoft doesn't exactly have an impressive record with acquisitions.  This kind of deal is delicate not only for what it could do to hurt LinkedIn, but for what it could do to hurt Microsoft.  Do this wrong and you tarnish Office, Azure, and Dynamics, and that would be a total disaster.

The smart move for Microsoft would be to add in-company extensions to LinkedIn and then extend the extensions to B2B carefully.   That way, the details of the integration would be worked out before any visible changes to LinkedIn, and it’s reasonable to assume that B2B collaboration is going to evolve from in-company collaboration because it could first extend to close business partners and move on to clients, etc.

From a technology perspective this could be interesting too.  Integrating a bunch of tools into a collaborative platform seems almost tailor-made for microservices.  Microsoft has been a supporter of that approach for some time, and its documentation on microservices in both Azure and its developer program is very strong.  However, collaboration is an example of a place where just saying “microservices” isn't enough.  Some microservices are going to be highly integrated with a given task, and thus things you'd probably want to run close to the user, while others are more rarely accessed and could be centralized.  The distribution could change from user to user, which seems to demand an architecture that can instantiate a service depending on usage without requiring that the developer worry about that as an issue.  That could favor a PaaS-hybrid cloud like Microsoft's.
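A toy illustration of the kind of placement decision I mean; the services, call rates, and threshold here are all made-up numbers, not anything from Microsoft's architecture.

```python
# Chattier microservices run close to the user, rarely used ones are
# centralized.  Call rates and threshold are invented for illustration.
CALL_RATES = {           # approximate calls per user session
    "presence": 120,
    "document-edit": 300,
    "profile-lookup": 2,
    "billing": 1,
}

def placement(service, threshold=50):
    return "edge (near the user)" if CALL_RATES[service] >= threshold else "central"

for service in CALL_RATES:
    print(service, "->", placement(service))
```

The interesting architectural problem is making that decision automatically, per user and per usage pattern, without the developer having to code for it.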

This is also a pretty darn good model of what a “new service” might look like, what NFV and SDN should be aiming to support.  Network operators who are looking at platforms for new revenue have to frame their search around some feasible services that can drive user spending before they worry too much about platforms.  This deal might help do that.

Perhaps the most significant theme here is productivity enhancement, though.  We have always depended as an industry on developments that allow tech to drive a leap forward in productivity.  That’s what has created the IT spending waves of the past, and what has been lacking in the market since 2001.  Could this be a way of getting it all back?  Darn straight, if it works, and we’ll just have to wait to see what Microsoft does next.

Server Architectures for the Cloud and NFV Aren’t as “Commercial” as We Think

Complexity is often the enemy of revolution because things that are simple enough to grasp quickly get better coverage and wider appreciation.  A good example is the way we talk about hosting virtual service elements on “COTS”, meaning “commercial off-the-shelf servers”.  From the term and its usage, you'd think there was a single model of server, a single set of capabilities.  That's not likely to be true at all, and the truth could have some interesting consequences.

To understand hosting requirements for virtualized features or network elements, you have to start by separating them into data-plane services or signaling-plane services.  Data-plane services are directly in the data path, and they include not only switches/routers but also things like firewalls or encryption services that have to operate on every packet.  Signaling-plane services operate on control packets or higher-layer packets that represent exchanges of network information.  There are obviously a lot fewer of these than the data-plane packets that carry information.

In the data plane, the paramount hosting requirements include high enough throughput to ensure that you can handle the load of all the connections at once, low processing latency to ensure you don't introduce a lot of network delay, and high intrinsic reliability because you can't fail over without creating a protracted service impact.

If you looked at a box ideal for the data plane mission, you’d see a high-throughput backplane to transfer packets between network adapters, high memory bandwidth, CPU requirements set entirely by the load that the switching of network packets would impose, and relatively modest disk I/O requirements.  Given that “COTS” is typically optimized for disk I/O and heavy computational load, this is actually quite a different box.  You’d want all of the data-plane acceleration capabilities out there, in both hardware and software.

Network adapter and data-plane throughput efficiency might not be enough.  Most network appliances (switches and routers) will use special hardware features like content-addressable memory to quickly process packet headers and determine the next hop to take (meaning which trunk to exit on).  Conventional CPU and memory technology could take a lot longer, and if the size of the forwarding table is large enough then you might want to have a CAM board or some special processor to assist in the lookup.  Otherwise network latency could be increased enough to impact some applications.
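To see why the lookup matters, here's a toy longest-prefix match done in plain software.  The route table and trunk names are invented; a real router holds hundreds of thousands of routes and uses TCAM or a compressed trie rather than the linear scan shown here, which is exactly the point about per-packet cost.

```python
import ipaddress

# Toy forwarding table: prefix -> next hop.  The linear scan below is only
# meant to show the per-packet work a purely software data plane absorbs.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "trunk-1",
    ipaddress.ip_network("10.1.0.0/16"): "trunk-2",
    ipaddress.ip_network("0.0.0.0/0"): "trunk-default",
}

def next_hop(dst):
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in ROUTES.items():
        if addr in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, hop)
    return best[1]

print(next_hop("10.1.2.3"))   # trunk-2 (longest match wins)
print(next_hop("192.0.2.1"))  # trunk-default
```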

The reliability issue is probably the one that gets misunderstood most.  We think in terms of having failover as the alternative to reliable hardware in the cloud, and that might be true for conventional transactional applications.  For data switching, the obvious problem is that the time required to spin up an alternative image and make the necessary network connections to put it into the data path to replace a failure would certainly be noticed.  Because the fault would probably be detected by a higher level, it’s possible that adaptive recovery at that level might be initiated, which could then collide with efforts to replace the failed image.  The longer the failure the bigger the risk of cross-purpose recovery.  Thus, these boxes probably do have to be five-nines, and you could argue for even higher availability too.
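For a sense of what “five-nines” actually buys, here's the quick arithmetic behind the point:

```python
# Downtime per year permitted by each availability level.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for label, availability in [("three nines", 0.999), ("five nines", 0.99999)]:
    downtime = (1 - availability) * MINUTES_PER_YEAR
    print(label + ": about %.1f minutes of downtime per year" % downtime)
# three nines: about 526.0 minutes (almost nine hours)
# five nines: about 5.3 minutes
```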

Horizontal scaling is less likely to be useful for data-plane applications, for three reasons.  First, it's difficult to introduce a parallel path in the data plane because you have to introduce path separation and combination features that could cause disruption simply by temporarily breaking the connection.  Second, you'll end up with out-of-order delivery in almost every case, and not all packet processing will reorder packets.  Third, your performance limitations are more likely to be on the access or connection side, and unless you parallelized the whole data path you've not accomplished much.

The final point in server design for data-plane service applications is the need to deliver uniform performance under load.  I've seen demos of some COTS servers in multi-trunk data-plane applications, and the problem you run into is that performance differs sharply between low and high load levels.  That means a server assigned to run more VMs is going to degrade everything on it, so you can't run multiple VMs and still adhere to stringent SLAs.

The signaling-plane stuff is very different.  Control packets and management packets are relatively rare in a flow, and unlike data packets that essentially demand a uniform process—“Forward me!”—the signaling packets may spawn a fairly intensive process.  In many cases there will even be a requirement to access a database, as you’d see in mobile/IMS and EPC control-plane processing.  These processes are much more like classic COTS applications.

You don't need hardware reliability as high in the signaling-plane services because you can spawn a new copy more easily, and you can also load-balance these services without interruption.  You don't need as much data-plane acceleration either, because the signaling packet load is smaller, unless you plan on running a lot of different signaling applications on a single server.

Signaling-plane services are also good candidates for containers rather than virtual machines.  It's easier to see data-plane services being VM-hosted because of their greater performance needs and their relatively static resource commitments.  Signaling-plane stuff needs less and runs less often, and in some cases the requirements of the signaling plane are even web-like or transactional.

This combination of data- and signaling-plane requirements makes resource deployment more complicated.  A single resource pool designed for data-plane services would impose higher costs on signaling-plane applications, which need fewer resources.  Obviously a signaling-plane resource is sub-optimal in the data plane.  And if the resource pool is divided up by service type, then it's not uniform and thus not as efficient as it could be.

You also create more complexity in deployment because every application or virtual function has to be aligned with the right hosting paradigm, and the latency and cost of connection has to be managed in parallel with the hosting needs.  This doesn’t mean that the task is impossible; the truth is that the ETSI ISG is already considering more factors in hosting VNFs than would likely pay back in performance or reliability.

It seems to me that the most likely impact of these data-plane versus signaling-plane issues would be the creation of two distinct resource pools and deployment environments, one designed to be high-performance and support static commitments, and one to be highly dynamic and scalable—more like what we tend to think of when we think of cloud or NFV.
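A crude sketch of what that two-pool placement might look like in practice; the pool names, attributes, and VNF examples are placeholders of my own, not anyone's actual design.

```python
# Two resource pools: a performance-tuned one for data-plane functions and
# a dense, dynamic one for signaling-plane functions.  All names invented.
POOLS = {
    "data-plane": {"features": ["dpdk", "sr-iov"], "packing": "one VNF per host"},
    "signaling":  {"features": ["containers", "autoscale"], "packing": "dense"},
}

def choose_pool(vnf):
    # Placement keys off the plane the function serves.
    return "data-plane" if vnf.get("plane") == "data" else "signaling"

for vnf in [{"name": "vRouter", "plane": "data"}, {"name": "IMS-CSCF", "plane": "signaling"}]:
    pool = choose_pool(vnf)
    print(vnf["name"], "->", pool, "(" + POOLS[pool]["packing"] + ")")
```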

The notion of COTS hosting everything isn’t reasonable unless we define “COTS” very loosely.  The mission for servers in both cloud computing and NFV varies widely, and optimizing both opex and capex demands we don’t try to make one size fit all.  Thus, simple web-server technology, even the stuff that’s considered in the Open Compute Project, isn’t going to be the right answer for all applications, and we need to accept that up front and plan accordingly.