Finding the Right Path to Virtual Devices

One of the early points of interest for NFV was “virtual CPE”, meaning the use of cloud hosting of features that would normally be included in a device at the customer edge of services.  I’ve blogged a number of times on the question of whether this was a sensible approach, concluding that it isn’t.  The real world may agree, because most “vCPE” as it’s known is really not hosted in the cloud at all.  Instead it involves the placement of features in an agile edge device.  Is this a viable approach, and if so, how important might it be?

Agile or “universal” CPE (uCPE) is really a white-box appliance that’s designed (at least in theory) to be deployed and managed using NFV features.  Virtual network functions (VNFs) are loaded into the uCPE as needed, and in theory (again) you could supplement uCPE features with cloud-hosted features.  One benefit of the uCPE concept is that features could be moved between the uCPE and the cloud, in fact.

What we have here is two possible justifications for the uCPE/vCPE concept.  One is that we should consider this a white-box approach to service edge devices, and the other that we’d consider it an adjunct to carrier-cloud-hosted NFV.  If either of these approaches presents enough value, we could expect the uCPE/vCPE concept to fly, and if neither does, we’ll need to fix some problems to get the whole notion off the ground.

White-box appliances are obviously a concept with merit as far as lowering costs is concerned.  However, they depend on someone creating features to stuff in them, and on the pricing of the features and the uCPE being no greater than, and hopefully at least 20% less than, the price of traditional fixed appliances.  According to operators I’ve talked with, that goal hasn’t been easy to achieve.

The biggest problems operators cite for the white-box model are 1) the high cost that feature vendors want for licensing the features to be loaded into the uCPE, and 2) the difficulties in onboarding the features.  It’s likely the two issues are at least somewhat related; if feature vendors have to customize features for a uCPE model, they want a return.  If in fact there are no “uCPE models” per se, meaning that there are no architecture or embedded operating system standards for uCPE, then the problem is magnified significantly.

You could argue that the NFV approach is a way out of at least the second of these two problems, and thus might impact the first as well.  Logical, but it doesn’t seem to be true, because both licensing costs and onboarding difficulties are cited for VNFs deployed in the cloud as well.  Thus, I think we have to look in a different direction.  In fact, two directions.

First, I think we need a reference architecture for uCPE, a set of platform APIs that would be available to any piece of software on any uCPE device regardless of implementation or vendor.  Something like this has been done with Linux and with the Java Virtual Machine.  Suppose we said that all uCPE had to offer an embedded Linux or JVM implementation?  Better yet, suppose we adopted the Linux Foundation’s DANOS?  Then a single set of APIs would make any feature compatible with any piece of uCPE, and we’d have at least that problem solved.  There are also other open-device operating systems emerging, and in theory one of them would serve, as long as it was open-source.  Big Switch announced an open-source network operating system recently, and that might be an alternative to DANOS.
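
To make the idea concrete, here’s a rough sketch of what a common uCPE platform contract might look like if it were expressed in software.  Everything here is a hypothetical illustration; the class and method names aren’t part of DANOS, the JVM, or any existing standard, they simply show how a single abstract interface would let one onboarding routine work on any conforming box.

```python
from abc import ABC, abstractmethod


class UCPEPlatform(ABC):
    """Hypothetical platform contract a white-box uCPE vendor would implement."""

    @abstractmethod
    def load_feature(self, image_uri: str, config: dict) -> str:
        """Install a packaged feature and return a feature handle."""

    @abstractmethod
    def attach_interface(self, feature_handle: str, logical_port: str) -> None:
        """Bind a loaded feature to one of the box's logical ports."""

    @abstractmethod
    def get_status(self, feature_handle: str) -> dict:
        """Return a uniform status record for management systems."""


def onboard(platform: UCPEPlatform, image_uri: str, ports: list) -> str:
    # Because this code sees only UCPEPlatform, the same onboarding routine
    # works on any conforming device, which is the point of a reference API.
    handle = platform.load_feature(image_uri, {"ports": ports})
    for port in ports:
        platform.attach_interface(handle, port)
    return handle
```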

The second thing we need is an early focus on open-source features to be added to the uCPE.  I’ve always believed that NFV’s success depended on getting open-source VNFs to force commercial VNF providers to set rational prices based on benefits they can offer.  No real effort to do that has been made, to the detriment of the marketplace.

These steps are necessary conditions, IMHO, but not sufficient conditions.  The big problem with uCPE is the relatively narrow range of customers where the concept is really viable.  Home devices are simply too cheap to target, which means only business sites would be likely candidates for adopting the technology.  Then you have the question of whether agile features are valuable in the first place.  Most enterprise customers tell me that they believe their sites would require a single static feature set, and a straw poll I took in 2018 said that the same feature set (firewall, SD-WAN, encryption) was suitable for almost 90% of sites.  We’ll have to see if a value proposition emerges here.

Let’s move on, then, to our second uCPE possibility.  The notion of uCPE as being a kind of outpost to the NFV carrier cloud has also presented issues.  Obviously, it’s more complicated to populate a uCPE device with features if you have to follow the ETSI NFV model of orchestration and management, and so having uCPE be considered a part of NFV is logical only if you actually gain something from that approach.  What, other than harmonious management where features might move between uCPE and cloud, could we present as a benefit?  Not much.

Operators tell me that they have concerns over the VNF licensing fees, just as they have for the white-box model.  Some are also telling me that the notion of chaining VNFs together in the cloud to create a virtual device is too expensive in its use of hosting resources and too complex operationally to be economical.  Onboarding VNFs is too complex, again as it is for white-box solutions.  They also say their experience is that enterprises don’t change the VNF mixture that often, which means it would be more cost-effective to simply combine the most common VNF configurations into a single machine image.

The solution to these problems seems straightforward.  First, you need that common framework for hosting, and you need to encourage open-source VNFs, the same steps as with white-box uCPE.  Second, you need to abandon the notion of service chains of VNFs in favor of packaging the most common combinations as machine images.  One operator told me that just doing the latter improves resource efficiency and opex efficiency by 50% or more.
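
As a thought experiment, the “one-machine-image” idea could be reduced to something like the snippet below.  The image names and feature labels are made up for illustration; the point is simply that the common bundles get a prebuilt image, and only the rare combinations fall back to service chaining.

```python
# Hypothetical catalog mapping common feature bundles to prebuilt images.
PREBUILT_IMAGES = {
    frozenset({"firewall", "sd-wan"}): "vcpe-fw-sdwan:1.0",
    frozenset({"firewall", "sd-wan", "encryption"}): "vcpe-fw-sdwan-ipsec:1.0",
}


def select_deployment(requested_features):
    image = PREBUILT_IMAGES.get(frozenset(requested_features))
    if image:
        # One image, one host, one management target: cheaper to run and operate.
        return {"mode": "single-image", "image": image}
    # Rare combinations still work, at the cost of service-chain complexity.
    return {"mode": "service-chain", "vnfs": sorted(requested_features)}


print(select_deployment({"firewall", "sd-wan", "encryption"}))
print(select_deployment({"firewall", "wan-optimizer"}))
```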

The common thread here is pretty clear.  More work needs to be done to standardize the platform for hosting vCPE, both on-prem (in uCPE) and in the cloud.  If that isn’t done, then it’s likely that neither framework for vCPE will be economically viable on a broad enough scale to justify all the work being put into it.  Second, the best source for VNFs is open-source, where there are ample support-based business models for operators to mimic.  In addition, commercial software providers would be more likely to be aggressive in VNF pricing if they knew they had a free competitor.

It would have been easy to adopt both these recommendations, and the “one-machine-image” one as well, right at the start, and I know the points were raised because I was one who raised them.  Now, the problem is that a lot of the VNF partnerships created don’t fit these points, and operators would have to frame their offerings differently in order to adopt them today.  The biggest problem, I think, would be for the NFV community to accept the changes in strategy given the time spent on other approaches.

It would be smart to do that, though, because the DANOS efforts alone would seem to be directing the market toward a white-box approach.  If that’s the case, then the APIs available in DANOS should be accepted as the standard to be used even for cloud-hosted VNFs, which would make VNFs portable between white boxes and the cloud.  It would also standardize them to the point where onboarding would be much easier.

To make this all work, we’d need to augment DANOS APIs to include the linkages needed to deploy and manage the elements, and we’d also need to consider how to get DANOS APIs to work in VMs and containers.  A middleware tool would do the job, and so the Linux Foundation should look at that as part of their project.  With the combination of DANOS available for devices, VMs, and containers, operators would have the basis for portable data-plane functions hosted in the cloud or uCPE.

The icing on the cake would be to provide P4 flow-language support on DANOS in all these configurations.  P4 could be used to create specialized switch/router features anywhere, and could also (perhaps with some enhancements) be used to build things like firewalls and VPN on-ramps.  Given that the ONF at least is promoting P4 on DANOS and that AT&T originated DANOS (as dNOS), getting broad operator support for this approach should be possible.

Vendors with competitive network operating systems (like Big Switch) would need something like the points I’ve cited here just to bootstrap themselves into credibility.  There are already enough options in the space to confuse prospective users, and none of them so far have really taken aim at the major value propositions that would justify them.  If we had a bit of a feature-value war among these vendors, it would help elevate the discussion overall, even if it did magnify near-term confusion over selection.

If all this is technically possible and if we could get the framework into place, who would buy into it?  Some operators like AT&T likely would, but another strong possibility is the US Government.  The feds are always looking for ways to get more for less, and they’re supporters of open-source and standard frameworks with interchangeable parts.  Even if operators might drag their feet, government support could create direct deployments and at the same time push operators to support the same architecture.

I firmly believe that this is the right way to do virtual devices, including vCPE.  Despite the fact that things didn’t get off to a smooth start, I think the approach could still be promoted and help drive vCPE, uCPE, and even NFV forward.

Turning “Hype Cities” into “Smart Cities”

Smart cities are an idea that generates lots of excitement, perhaps in part because everyone has their own view of what the term means.  I’ve been surprised to find that many of the technologists I’ve talked with see a smart city as one where practically everything is measured or viewed by a sensor open on the Internet.  This contrasts with the view of most of the experts I know, who think that vision has zero chance of success.  Everyone does agree that we have to step beyond the usual overpromotion; this is about “smart” cities not “hype” cities, after all.

I don’t want to beat a dead horse on the open-Internet approach, but it is important to understand what’s behind it.  This camp sees IoT as a kind of successor to the Internet.  We had open connectivity and hosting with the Internet.  It spawned a whole industry of what we’d call “over-the-top” applications and players, in no small part because the open connectivity eliminated what would otherwise have been a big barrier to market entry.  The same, say the open-Interneters, could happen with IoT.  Simple.

The contrary view is pretty simple too.  The investment needed to create the initial “open Internet” had already been made by telcos and cable companies.  With IoT, the sensors are not out there to exploit, they’d have to be deployed and connected at considerable cost.  Who would invest to create an open sensor community, the contrarians ask?  Then there’s the issue of security and privacy.  How would you ensure that sensors weren’t hacked, and that people didn’t abuse them by (for example) tracking/stalking others?

You can see arguments for both sides here, but suppose we could figure out a way of uniting the positions.  Could a smart-city architecture provide financial sensibility, security, and privacy and at the same time create an open community whose attempts to build their own value would end up building value for all?  It might be possible.

The first thing we’d need is a solution to the “first telephone” problem, the old saw that says that nobody will buy the first telephone because they’d have nobody to call.  A smart city needs smarts, and needs them fast, or there’s no credibility to participation.  That’s one reason why the problem of getting an ROI on “open sensors” is so critical.

There’s a possible solution.  We have literally billions of sensors out there today.  Most proposed smart cities are already made up of buildings and homes that have some smarts.  Most of those have private sensor and controller devices installed, accessible remotely via the Internet.  However, most of the information those sensors collect is truly private to the facility owner, tenants, or both.  What’s needed, first and foremost, in any smart city strategy is cooperation from current sensor owners.  That means identifying information that could be shared without risking privacy, and identifying an architecture for collecting and sharing it.

Suppose you have one of those fancy Internet home thermostats.  You set a temperature using your phone and the heating or air conditioning works to match what you want.  Obviously, you don’t want anonymous third parties setting your heat and air, but there are some things you might be willing to accept.

Example: Suppose your power/gas company would give you a rate adjustment if they had the right to change your thermostat setting during an emergency.  Many users would accept the idea, as long as they could agree on just how much they’d get and how much of a change the utility was allowed to make.  Example: Suppose your heating is set at 72 degrees, but your thermostat reading says the temperature in the home has increased from 72 to 85 in just ten minutes.  Could there be a fire or a major malfunction?  Would you be willing to allow that condition to be reported to you, or to a security company?  Example: Would you allow the video feed from your video doorbell or security cameras to be made available to public safety personnel or security contractors under controlled conditions?  Example: Suppose that every video doorbell and Internet thermostat in a given area suddenly dropped contact.  Could that be an indication of a power problem?

The point here is that the right way to start a smart city initiative is to identify things that IoT would be able to do based on what’s already deployed in facilities.  While an individual city could define a specification for how a home/building security or facility control system would expose this information, a better approach would be for a standards body to define both a collection mechanism that home systems or their vendors could elect to install, and a distribution system to control how that information can be shared.  Cities could then adopt the standards, provide incentives to expose the information, even require some classes of facilities to share information.

The obvious next step in this process would be to create a “trusted agent”.  Smart thermostats and security systems often store information in the cloud and work in a kind of three-way relationship between the device, the device vendor, and the device owner.  We can envision smart-city IoT starting with a series of services, represented by APIs and hosted by a trusted entity, that would “publish” sensor information in a variety of forms.
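
A toy sketch of what those trusted-agent services might look like is shown below.  The device IDs, policy fields, and event types are all invented for illustration; the point is that a policy filter sits between the private sensor data and any subscriber, so only consented information is ever published.

```python
# Consent policy per device; field names are invented for illustration.
CONSENT = {
    "thermostat-123": {"share_anomalies": True, "share_setpoint": False},
    "doorbell-456": {"share_video": "public-safety-only"},
}


def publish(device_id, event, subscriber_role):
    """Return the event only if the owner's policy allows this subscriber to see it."""
    policy = CONSENT.get(device_id, {})
    if event["type"] == "temperature_anomaly" and policy.get("share_anomalies"):
        return event                      # e.g., 72F to 85F in ten minutes
    if event["type"] == "video" and policy.get("share_video") == "public-safety-only":
        return event if subscriber_role == "public_safety" else None
    return None                           # everything else stays private


print(publish("thermostat-123",
              {"type": "temperature_anomaly", "from_f": 72, "to_f": 85}, "utility"))
print(publish("doorbell-456", {"type": "video", "ref": "clip-001"}, "advertiser"))
```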

The obvious question is who will provide these services.  This is what network operators should be looking at in the IoT space, not promoting a host of 5G-attached sensors.  The latter space would almost surely require the operators to invest in the sensors themselves, which of course kills the revenue associated with the deployment.  It would also surely result in legal/regulatory action to open the sensors to all, which would make the whole deployment a big money pit.  The former would be an opportunity for operators to get into a high-level service space before the OTTs did.

All of the major public cloud providers (Amazon, Google, IBM, Microsoft, and Oracle) could reasonably decide to get into this business too.  IoT-related services would not only provide an early lead in the realistic IoT deployment model, they would also burnish the providers’ credentials in the enterprise hybrid cloud space.  Apple, which is clearly trying to find a revenue secret that doesn’t boil down to “Yuppies buy more iPhones”, could get smart and make a bet in the space.  IBM might be the most obvious player to move here, given that it’s been a player (or at least prospective player) in the smart cities space for some time.

The big lessons to be learned here are first that we’re not going to get to smart cities because cities or others bite the bullet and pay big bucks to deploy a bunch of open sensors, and second that the path to a smart city is by expanding from a smart-building base.  Once those points are accepted, I think it’s fairly easy to plot a rational path forward, which I’ve tried to do here.  If they’re not accepted, then we have IoT problems to face down the road.

Making the Case for Hyperconverged and Composed Infrastructure

Most buzzwords are probably hype these days, and all buzzwords probably contain a measure of hype.  The saddest part of this is that often the hype covers legitimate value points, and that’s what may be happening in the carrier cloud and edge computing world.  We hear a lot about “hyperconvergence” and “composable infrastructure”, but we’ve not really heard much about what makes either concept truly useful.  And yes, there are such things.

It might seem that hyperconvergence, which is all about packing a lot of stuff into a single data center, and composable infrastructure, which is all about building virtual servers from agile hosted components, are totally different.  The fact is that the mission of both technologies is converging, particularly for carrier cloud.  When you host stuff on a pool of resources, you want to have good performance for all the things you put there.  “Good” means not developing a lot of unnecessary latency in connecting things, it means not creating small specialized and inefficient resource pools to handle unusual hosting needs, and it means not creating management bottlenecks.  Both our target technology trends can help with all these things.

There have always been benefits to big data centers, relating to efficient use of real estate, heating and cooling, power, and even operations staff.  A benefit that’s less recognized is that if you have a lot of servers in one place, you probably can connect components within an application or service using fast local switching, reducing latency and improving performance.  Operators I’ve talked with have been (like everyone else) weighing the benefits of concentrated resource pools against distributed ones, and they tell me that if you neglect for the moment edge-compute considerations, a large data center with highly concentrated hosting resources will offer better financial and user-subjective performance.

Hyperconvergence aims at getting a lot of servers in a rack, which means a lot of servers fit into a given data center.  It also tries to improve switching performance with efficient multi-tier switches and fabrics, which widens the performance gap between a combined data center and one that’s broken up into physically separate data centers connected via WAN links.  The more we focus on the cloud, in any form, the more hyperconvergence matters.  Even big OTT players like Facebook, who deploy their own data centers, like hyperconvergence.

In effect, the limiting factor in hyperconvergence is the latency within the data center.  The more efficient connectivity is within the data center in reducing connection latency, the better the case for hyperconverged, giant, data centers.  If you reach a point of diminishing returns because of network connectivity issues, then you top out in data center concentration benefits.

Latency isn’t just about server-to-server, though.  Database resources, the storage systems that hold information for those components you deploy, are also a factor.  Where services or applications rely on storage arrays, it’s very possible that the biggest argument in favor of resource concentration is that it normalizes the access time needed to interact with those arrays.  If you distribute your data center, you either put your data store in one place or in several, and no matter which way you go, you’ve impacted resource pool efficiency and performance.  A single array has to be somewhere, and any components that access that array from a different data center will pay a penalty.  Multiple arrays, if the entire data set is accessed similarly by all components, will just change how the problem of latency is distributed.  If you have specialized component placements to accommodate the fact that some components access different parts of your overall database, you limit the size of your resource pool because components have to be deployed with their data.

You do have to be wary in the database-as-a-connected-resource space.  If you read and write at the detail-record level, latency is a killer, of course.  Many database applications don’t do raw direct access to data at all, but rather do queries that could be sent to a remote query engine, which would then return only results.  Good application design can address database efficiency, but it’s still important to consider database latency in transactional applications where a transaction equals an access, and there are a lot of transactions being done.
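
A quick back-of-the-envelope model shows why the access pattern matters so much.  The latency figure and record count below are assumptions chosen only for illustration, not measurements.

```python
# Assumed inter-data-center round trip and record count; both are illustrative.
ROUND_TRIP_MS = 2.0
records_scanned = 10_000

# Record-at-a-time access: every read pays the round trip.
record_level_ms = records_scanned * ROUND_TRIP_MS

# Query pushdown: one round trip carries the query out and the result back;
# the scan itself runs next to the data.
pushdown_ms = 1 * ROUND_TRIP_MS

print(f"record-at-a-time: {record_level_ms:,.0f} ms")
print(f"query pushdown:   {pushdown_ms:,.0f} ms")
```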

Other forms of resource specialization have their own potential impacts on resource pools and application performance.  If a given component needs a different kind of resource than the typically deployed components, you are presented with a choice of either establishing a specialized resource pool for each specialized component-to-resource relationship, or making all the specialized resources generally available.  The former means less efficiency and the latter raises the average cost of hosting.

That’s where composable infrastructure comes in.  Suppose that storage, memory, even custom chips, were designed to be swapped in and out as needed, from a pool of those specialized resources.  Now you’d be able to compose servers to match requirements, which means you could build specialized hosting into any resource within the range of your composability.  That could radically improve hosting efficiency and performance within the domain where composition works, if it’s done right.

The challenge with composable infrastructure lies in the efficiency of the composition.  What you’re doing is building a server from piece parts, and the question comes when you look at the connection between all those parts.  If we assume that the pieces of your composed virtual server were connected using traditional networking, you would be introducing network latency into resource accesses that, in a traditionally integrated server, would be handled on an internal bus at a very high speed.  Thus, composable infrastructure solutions should be divided into the “networked class” that depend on local data center networking, and the “bus” class that provide some sort of high-speed component interface.
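
A small illustrative model of that “networked class” versus “bus class” distinction follows.  The latency numbers are assumptions picked only to show the shape of the problem, not benchmarks of any real fabric or product.

```python
# Illustrative access latencies (microseconds); not benchmarks of any product.
ACCESS_LATENCY_US = {
    "internal-bus": 0.1,          # resource inside a conventional server
    "bus-class-fabric": 1.0,      # composable "bus" class interconnect
    "data-center-network": 50.0,  # composable "networked" class
}


def added_delay_ms(connection, accesses_per_transaction):
    """Extra delay a composed resource adds to one transaction."""
    return ACCESS_LATENCY_US[connection] * accesses_per_transaction / 1000.0


# A resource touched once per transaction barely notices the network; one
# touched a thousand times per transaction cannot tolerate it.
for conn in ACCESS_LATENCY_US:
    print(conn, added_delay_ms(conn, 1), added_delay_ms(conn, 1000))
```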

Some resources, like memory, clearly cannot be connected using traditional networking; the impact on performance would be truly awful.  With other resources, the question would be the way in which the resources were used.  For example, if a GPU were used in some specialized calculation that had to be done a thousand times for a given event or transaction, traditional connectivity is probably going to introduce too much delay.  If the calculation is done once per event/transaction, it could be fine.

There is value to composability as long as the connection efficiency of the solution matches the requirements of the applications/components involved.  That’s a requirement that only the data center owner can enforce, or even assess.  It’s far from clear at this point just what range of connection efficiencies might be presented in the market, and at what cost.  Similarly, we don’t know whether application/service design could separate components that needed specialized resources from those that didn’t, to permit better management of specialized resources.  And finally, we don’t know how the edge might fit into this.

Edge computing isn’t a universal benefit.  An edge data center is almost certainly less efficient in terms of resource utilization than one deeper in the metro area, or even region.  The latency associated with a connection to a data center depends first and foremost on the number of network devices you transit; geography is a secondary issue given that the propagation of data in fiber is about a hundred thousand miles per second.  Applications vary in their sensitivity to latency, too.  While autonomous vehicles seem a perfect example of why latency could be important, even a fast-moving vehicle wouldn’t go far during the time it took an event to be transported over a hundred miles of fiber.  One millisecond would be required, during which a vehicle moving at 60 mph would travel only about 0.088 feet.  In any event, avoidance of nearby objects isn’t likely to be ceded to a network-connected entity, but rather built into car control locally.
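
The arithmetic behind that claim is simple enough to show directly, using the same round figures as above.

```python
FIBER_MILES_PER_SEC = 100_000           # rough propagation speed in fiber
distance_miles = 100

latency_sec = distance_miles / FIBER_MILES_PER_SEC    # 0.001 s, or 1 ms

vehicle_ft_per_sec = 88                 # 60 mph expressed in feet per second
travel_ft = vehicle_ft_per_sec * latency_sec          # 0.088 feet

print(f"one-way latency: {latency_sec * 1000:.1f} ms, "
      f"vehicle travel in that time: {travel_ft:.3f} ft")
```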

If there are few applications/components that have to be hosted at the edge, then hyperconvergence and composability might be unimportant to edge computing.  Certainly at the edge, hyperconvergence is a benefit less for achieving efficiency than for reducing real estate requirements, which would be a factor only if you needed a lot of servers in your edge data centers.  Composability might be a factor in edge computing even with lower edge hosting requirements, though, because you wouldn’t be able to fully utilize specialized server configurations in a small data center; too few applications would likely be running.

Not everything needs to be hyperconverged or composed, and it’s likely that before the concepts become broadly useful, we’ll need to see more cloud commitment and greater use of specialized resources.  It would also help to have an awareness of how component architectures and resource specialization intersect; applications should divide into components based in part on how specialized resources are used within them.  Still, while the timing of the market hype here may be optimistic, the eventual reality of the benefits is surely there.

How Much “Carrier” versus “Cloud” Should we Have in “Carrier Cloud?”

Should carrier cloud look like a cloud or like a carrier?  In past blogs I’ve pointed out many places where the two aren’t converging and probably should be.  Another such area is virtual networking.  In cloud computing, including public cloud services, hybrid cloud, multi-cloud, and even in data center computing, there’s increased attention being paid to virtual networks.  Arguably, the whole SD-WAN craze is about virtual networking too, and yet we’ve not been hearing much about how virtual networks would figure in carrier cloud.

It’s not that operators (carriers) don’t understand virtual networks, at least at one level.  VPN services are a fixture for most big operators and many smaller ones specialize in business services of some sort.  The increased interest in SD-WAN shows that operators are receptive to different virtual network paradigms, at least for the creation of “VPNs” in a different way.  Operators have also generally adopted virtual network technology in data centers, mostly in support of the use of OpenStack.  With all of that, though, they’re still not comfortable about virtual networking at the level it’s discussed in cloud circles.

Broadly speaking, virtual networking in the cloud world is an extension of “tenant networking”, introduced back in the early SDN days by Nicira (it’s now VMware’s NSX).  Tenant networking was designed to create an overlay network to segment data center networks used in the cloud, without resorting to the more limited formal Ethernet-related protocols like VXLAN.

In these early applications, the goal was to create application/tenant subnetworks that would be isolated from each other, and exposed onto tenant VPNs through a gateway or NAT process.  Thus, the virtual network was really a virtual LAN, something that built subnets in an IP world but really itself lived at Level 2.

Three factors have moved things out of that limited conception.  First, vendors like Nokia/Nuage came along and offered a virtual network that could extend beyond the data center.  This actually happened before most buyers really understood why they’d even want to do that, and Nokia/Nuage (and other vendors who followed, like Juniper) weren’t marketing giants, so this first factor waited in the wings for a time.  Second, startups realized that virtual-network concepts could be used to build or extend VPNs over the Internet or other non-MPLS network resources.  That launched SD-WAN, and created a market awareness of the value of virtual networking in the wide area.

It was really the third factor that’s setting the pace today.  That last factor is the inherent elasticity of the cloud.  Physical networks have no business connecting virtual resources, for the obvious reason that virtual resources like the cloud require connection of logical entities that can be put anywhere and can move often.  If you look at how virtual networking is evolving in the cloud, you see that it’s becoming increasingly an integrated piece of “virtual resources”, a recognized co-equal to hosting.

That’s what carrier cloud is still missing, in part because operators still tend to think in terms of NFV, and NFV has never had a broad conception of virtual networking.  Today, we build real networks with real devices, “physical network functions” or PNFs.  The conception NFV brought along was that you built the same networks using virtual network functions, VNFs.  Thus, whatever you used to connect your PNFs is what you used to connect your VNFs.  A few realized that some VNFs might need to deploy within a private network space, and so you might need a subnet something like the ones that spawned the whole cloud virtual network initiative (but got left behind in the cloud).  Very few have gotten beyond that.

In the cloud, you can see the progress in one simple evolution—the virtual network’s evolution to the service mesh.  A virtual network, even one with a network-wide vision, still provides connectivity.  You connect virtual stuff, of course, but you connect.  If your virtual stuff is moved (by orchestration, for example) or redeployed, you have to reconnect, but it’s probably going to be a task outside virtual networking to accommodate things like load balancing for scaling.  Service mesh is different; instead of presuming your goal is agile connectivity, the assumption is that your goal is agile service delivery, with “service” here meaning software services/microservices.  Spawn a component in response to a need to scale capacity, and service mesh is responsible for getting it connected in correctly and load-balancing work to it and its partner component instances.
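
A toy sketch of the distinction might help.  The class below isn’t any real mesh product (Istio, Linkerd, and the like do far more); it simply shows the extra responsibility a mesh takes on: when orchestration spawns or moves an instance, the mesh registers it and spreads work across whatever instances currently exist.

```python
import itertools


class ToyServiceMesh:
    """Registry plus load balancing; real meshes add discovery, telemetry, and more."""

    def __init__(self):
        self.instances = {}     # service name -> list of instance addresses
        self.cursors = {}

    def register(self, service, address):
        # Called whenever orchestration spawns or moves an instance.
        self.instances.setdefault(service, []).append(address)
        self.cursors[service] = itertools.cycle(self.instances[service])

    def route(self, service):
        # Work is spread across whatever instances currently exist.
        return next(self.cursors[service])


mesh = ToyServiceMesh()
mesh.register("feature-x", "10.0.1.5:8080")
mesh.register("feature-x", "10.0.2.7:8080")   # scaled-out second instance
print([mesh.route("feature-x") for _ in range(4)])
```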

If you think about it, even NFV has “service mesh” applications, things where the goal is not just to connect something but rather to fit that something into an application/service-feature framework.  It is possible to do that without a service mesh, using multiple steps and multiple products, but it’s additional complexity and integration.  It would make more sense to use a mesh, and that’s even more likely true if you consider that most of what carrier cloud does resembles cloud applications more than it does a network of physical devices.  Even 5G is likely to deploy more “control plane” VNFs than data-plane VNFs, and things like content delivery or IoT will be even more cloud-centric.

I think we got into the carrier-cloud-versus-cloud disconnect on virtual networking because the operator community has tended to look to bottom-up standards efforts to advance their state of technology.  This tends to harden them on the details before they’ve really established the requirements, and the labyrinthine standards processes take years to advance, whereas cloud computing is based on open-source initiatives and advances at ten times that pace.  What the cloud is with respect to network and hosting technology today, for example, is so far beyond where it was in 2013 it would be hard for someone of that day to even visualize what’s happened.  NFV, launched at the same time, is still churning along with the same vision as before.

Another contributing problem is the excessive focus on virtual CPE.  vCPE is the simplest and probably overall the least-valuable mission you could project for carrier cloud.  Service chaining, the linking of separately hosted components to create a virtual form of a multi-feature appliance, is the worst possible way of implementing a virtual device.  It’s too expensive in its use of hosting resources, less reliable, and much more operationally complex.  Sticking all the VNFs inside a universal CPE device or a single virtual machine (better yet, a container) would be smarter.  But focusing on vCPE has hidden the connectivity and hosting needs of broader applications of NFV, and so they’re not yet being understood, much less addressed.

What did we miss?  Well, if you look at a “real” NFV opportunity, it’s hard to find one better than IMS, which in fact was my personal pick for a proof-of-concept for NFV back in 2013.  Everyone has seen a diagram of the components of IMS, and if you look carefully at one, you’ll see there are two places (the cell site interface to devices and the network gateway to the Internet) where there’s a clear and exposed public interface.  All of the other pieces of IMS talk only to each other.  Logically, then, what you’d like to have is a private subnetwork hosting “IMS” and exposing the two interfaces that are actually public.  That’s exactly how container hosting works in the cloud (Docker and Kubernetes each have different conceptions of “private subnetworks” but both require explicit exposure of the public addresses).  Where are the discussions of private subnetworks in carrier cloud?
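
The shape of that deployment can be sketched as a simple data structure, shown below.  The component list follows the familiar IMS diagram loosely, and the subnet and interface labels are invented; the point is that only two interfaces are ever exposed, exactly the pattern container platforms already support.

```python
# Illustrative deployment description; component names follow the IMS diagram loosely.
IMS_DEPLOYMENT = {
    "components": ["P-CSCF", "I-CSCF", "S-CSCF", "HSS", "network-gateway"],
    # Everything talks over a private subnet invisible from outside...
    "private_subnet": "10.42.0.0/24",
    # ...and only the two genuinely public interfaces are exposed.
    "exposed": {
        "P-CSCF": "cell-site device interface",
        "network-gateway": "Internet-facing interface",
    },
}


def public_endpoints(deployment):
    # A container platform would publish only these, just as Docker and
    # Kubernetes require explicit exposure of public ports or services.
    return deployment["exposed"]


print(public_endpoints(IMS_DEPLOYMENT))
```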

The money, the ROI, is the real answer to my question about the bias of carrier cloud.  Anyone who believes that there is any significant incremental revenue to be had from things like virtual CPE is simply wrong.  The cloud industry is proving there is revenue in hosting parts of business applications and entirely new application models.  There’s also money to be earned from things like streaming video advertising and personalization, IoT, and so forth.  These are not connection applications, they’re experience applications, much like the applications we’re building in public cloud services today.  If operators want their share of the future, they need to do what the cloud is already doing.

Derived Operations for NFV and the Cloud

One of the fundamental principles of NFV, deeply engrained but little recognized, is that NFV is about replacing devices with virtual devices.  This was done largely to fit NFV into the scope of prevailing network and service management practices, which of course limits the impact that NFV would have on broader operations and management systems.  That’s arguably a good thing, but the same decision has negative impacts too.  Does the good justify the bad, and is there a way of magnifying the former and reducing the latter to help NFV cope with modern reality?  That’s an important question to address if we’re to hope for any significant positive impact from all the work that’s gone into NFV.

The easiest way to start this discussion is with the concept of the “model”, or more explicitly the notion of abstractions and intent models.  A device is a physical box that contains logical features and functions, a box that’s deployed as a unit and managed as a unit.  In modern thinking, we could visualize this device as an intent model whose external interfaces and visible features were the representations of the functions and features inside.  The device is then an implementation of an abstraction, and a software instance (a Virtual Network Function or VNF in NFV terms) is simply another implementation of the same abstraction.

The nice thing about this approach is that it shows why it’s useful to model VNFs after devices; you could switch one for the other interchangeably and the functionality and management wouldn’t be impacted.  However, this benefit requires that the device and the VNF implement the same intent-modeled abstraction.  To make this happen, you’d have to define your abstraction explicitly and then map both all devices and all VNFs purporting to be implementations to that same abstraction.  If you don’t define an abstraction, or if you define one for each vendor/product combination, you have VNFs that don’t have a common model at all, which means you have to integrate VNFs into a network on an almost-case-by-case basis.  Which we’re doing, badly, with VNF on-boarding.
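
Here’s a minimal sketch of what “one explicit abstraction, many implementations” means in software terms.  The class and method names are hypothetical, not drawn from any NFV specification; they simply show why management code that sees only the abstraction can swap a physical device for a VNF without noticing.

```python
from abc import ABC, abstractmethod


class FirewallAbstraction(ABC):
    """The explicit intent model: what any implementation must expose."""

    @abstractmethod
    def apply_policy(self, rules): ...

    @abstractmethod
    def status(self): ...


class ApplianceFirewall(FirewallAbstraction):
    def apply_policy(self, rules): pass             # pushed via the device API
    def status(self): return {"state": "up", "impl": "physical"}


class VNFFirewall(FirewallAbstraction):
    def apply_policy(self, rules): pass             # pushed to the hosted instance
    def status(self): return {"state": "up", "impl": "virtual"}


def manage(fw: FirewallAbstraction):
    # Management code sees only the abstraction, so PNF and VNF are
    # interchangeable, which is the point of defining the abstraction explicitly.
    fw.apply_policy(["allow tcp/443"])
    return fw.status()


print(manage(ApplianceFirewall()), manage(VNFFirewall()))
```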

This is only one of the problems with device/virtual-device framing of NFV.  Another is that there are likely many network functions and features that are not represented by devices at all, or cases where a feature/function that’s already available as a device-hosted element needs to be deployed in a different context, either alone or as part of a composed virtual device.  This would involve shifting the management view of a function, either before deploying it or potentially even afterward, as part of evolution.

An example of this is vCPE.  What is “inside” vCPE could vary over time depending on what a customer ordered.  Does the customer want to manage what they see as a uCPE-hosted set of VNFs as though they were separate devices?  Probably not; they’d want to “see” the vCPE instance and see the features thereof as sub-elements in the overall management view.

A third problem is that there are management tasks associated with at least some of the implementations of a given device abstraction.  If you manage a box, there are only limited things you can see about what’s inside it, and probably fewer things you can do about it.  It’s hardware.  If you manage a VNF, you have to manage not only the software instance but the collection of hosting and connection resources that are associated with it.  How do you manage that when the management abstraction you’ve assigned to the VNF doesn’t offer implementation visibility?  There are no hosts inside an appliance, no virtual connections.

What this sets up is a two-level management process, where one level manages the equivalent of appliances (which can be instantiated in either real devices or in virtual form) and the other manages the implementation of the hosting and connectivity inside the virtual instances of these appliances.  This second management task has to be related to the first, but today’s EMS/NMS/SMS processes don’t know anything about what’s actually being managed (because hiding that was your goal in the first place), and so they don’t offer any way to manage the second part or relate to it.  You end up having to create a separate management process for the virtualized elements, and that tends to reduce the advantages gained by leveraging EMS tools to manage the VNFs themselves.  The more complex the virtual configuration is, the less overall benefit you achieve from leveraging EMS tools.

The final problem is perhaps the worst, and one that was developing even before NFV came along.  Any time services are created from a pool of shared resources, there is a risk that the management of those resources will introduce instability in itself.  You can’t have shared resources managed collectively by the users sharing them.  Imagine a hundred users, each allocating the same resources or manipulating the parameters that controlled resource operations, to their own ends.  Too many chefs, in this case, might end up not only spoiling the broth, but creating something quite un-broth-like.

In fact, even the attempt by multiple users to access a common resource management interface could bombard the resources with management commands, creating in effect a denial of service attack on the management APIs.  That’s particularly true when it’s possible that some users would attempt to obtain resource state frequently, a situation that prompted a proposal to the IETF to create an intermediation layer between management systems and the things they managed.

Called “i2aex” for “infrastructure to application exposure”, the proposal established a database that would collect and store, via a series of agent processes, the data from resource/device MIBs.  Management applications or other elements that needed resource status would then obtain it with a database query.  Updates would be filtered through a process that regulated what could be changed and coordinated the changes to prevent collisions.

I incorporated the i2aex concept into my thinking early in 2013, calling the result “derived operations”.  I presented it to the NFV ISG and also included it in my ExperiaSphere project.  While the IETF never developed i2aex, I think something like it is essential in resolving the inherent management conflicts in NFV.

With derived operations, the MIBs accessed by management systems and applications are constructed by “query agents” that include both the query needed to gather the data for the MIB from the repository and filter logic to control what can be seen or changed.  These query agents can be spun up as needed, they don’t have to store information, only access the repository.  Because a query agent can combine data from the VNF and data from what the VNF deploys on/with, it can create a meaningful status to project at the traditional EMS level.  At the same time, it can (with proper privileges) dig into the lower-level details of hosting.  A new MIB can be constructed by building the associated query agent, and this MIB can represent either a totally non-device-associated capability, or a composite capability presented by a uCPE host.
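
A highly simplified sketch of the repository-plus-query-agent pattern follows.  The repository keys, field names, and privilege flag are invented for illustration; a real implementation would use a proper database fed by agent processes, as i2aex proposed.

```python
# Collected status records; keys and fields are invented for illustration.
REPOSITORY = {
    "vnf:firewall-7": {"state": "up", "sessions": 1200},
    "host:server-42": {"state": "up", "cpu_pct": 85},
    "vlink:chain-7a": {"state": "up", "latency_ms": 3},
}


def query_agent(view, privileged=False):
    """Build a synthetic MIB-like view for one virtual device on demand."""
    if view != "vcpe-7":
        raise KeyError(view)
    parts = ["vnf:firewall-7", "host:server-42", "vlink:chain-7a"]
    mib = {
        # The status an EMS sees is derived from the VNF *and* what it runs on.
        "operational": all(REPOSITORY[p]["state"] == "up" for p in parts),
        "sessions": REPOSITORY["vnf:firewall-7"]["sessions"],
    }
    if privileged:
        # Filter logic: hosting detail is visible only with the right privileges.
        mib["host_cpu_pct"] = REPOSITORY["host:server-42"]["cpu_pct"]
    return mib


print(query_agent("vcpe-7"))
print(query_agent("vcpe-7", privileged=True))
```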

Query agents, as the intermediary in all management actions, are also a good place to insert journaling to record management activity and mediate changes that might collide when multiple users share resources.  This contributes to stability and governance, and also helps resolve finger-pointing problems.  Because a query agent can reference another query agent, a service feature hosted in another administration can “export” a MIB view that reflects the relationship between the retail and wholesale contributors to the service, keeping the wholesale partners’ infrastructure secrets except where they’re revealed to support an SLA.

The principle of derived operations would allow for the composition of management views, both for virtual functions/devices and for services.  It would let operators use existing EMS systems where they’re valuable, and construct new models where the old ones won’t work.  This wouldn’t solve all the problems of NFV, but it would solve one of the major ones, and so it’s something that the NFV ISG should develop.  The IETF, where the original i2aex proposal was introduced, should also take action to resurrect the approach and offer it broadly in networking.  Finally, cloud management practices should consider the approach as a solution to the problem of application views of shared resources.  This was a good idea that somehow got sidetracked, and getting it back on track would benefit the industry.

How the Cloud is Solving the Federation Problem

The old Ben Franklin quote about the difficulties in getting 13 clocks to chime at the same time wasn’t just about colonial politics.  Synchronizing autonomous systems to behave cooperatively has been a networking challenge for decades.  For a variety of reasons, including business practices, technology, and regulations, networks have been divided into separate “administrations”, and this division creates challenges when you want to create cohesive services or deploy applications for consistent behavior.  We may finally be seeing activity that will help resolve this, and perhaps it’s no surprise it’s coming from the cloud.

If we were to step back a decade, we’d see a number of international initiatives aimed at the creation of services that involved multiple administrative entities.  Three that come to mind are the work of the IPsphere Forum, the TMF, and the ITU.  The first of these created an explicit model of multiple administrations in network services, the second an approach that implicitly supported administrative separation, and the last a way of defining management harmony across a network, regardless of its ownership.  None of these has fully resolved the problem.

That’s likely because it’s not an easy problem to resolve, but in the cloud the vibrant and innovative open-source community has made some real progress.  There, we have a number of competing approaches too.  One is orchestration and management through a single tool with specialized plugins for each hosting environment.  This is what OpenStack does.  A second is infrastructure abstraction, where an abstract “host” is defined and exposed for use, and a lower layer then maps that abstraction to a variety of “real” hosting options, both in the data center and the cloud.  The final one is federation, which recognizes that different administrative/management entities exist, and tries to work with that reality.

Federation as a concept really goes back to the network operators.  Operators have long been reluctant to give others any visibility into or control of their infrastructure, for the simple reasons that it would pose a competitive risk and create a risk of instability if the partner misused the capability.  The IPsphere Forum dealt with this by presuming that services were defined by the selling operator, and assembled from “elements” that were contributed by other operators.  An element had the properties of today’s intent models, meaning that what was inside was invisible and interchangeable.

The problem that the IPsphere Forum worked to address is similar to the one we find increasingly today because of highly architected or “managed” cloud services.  When public clouds were simply infrastructure-as-a-service (IaaS) virtual hosts, it was fairly straightforward to make deployment of something in the cloud look quite similar to data center deployment on virtual servers.  As we evolved toward things like functional/serverless computing, managed container services, and the use of web-service features in applications, we ended up with a situation not unlike the one found in multi-administration carrier services.

The relationship between administrative or architected entities and services overall is often called “federation” by the operator community.  The federation model recognizes that the contributing pieces of a service or application are autonomous elements with their own internal management and practices, and simply harnesses them based on externally exposed interfaces (hence my comment that it’s essentially an intent-modeled system).  Federation-modeled orchestration creates what’s effectively a multi-level process where we ask modeled administrative elements to do something that’s part of a higher-level vision of the service or application.
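
A rough sketch of that two-level pattern is below.  The domain class, method names, and SLA fields are hypothetical; what matters is that the higher-level orchestrator only ever sees the exposed request interface of each administrative element, never what’s inside it.

```python
class AdministrativeDomain:
    """One federated contributor; its internals stay hidden behind this interface."""

    def __init__(self, name):
        self.name = name

    def request_element(self, element_type, sla):
        # How the domain deploys the element is its own business; only an
        # externally visible handle and an SLA commitment come back.
        return {"domain": self.name, "element": element_type, "sla": sla}


def compose_service(blueprint):
    # The higher-level orchestrator just walks the blueprint, asking each
    # domain for the element it contributes.
    return [domain.request_element(element_type, sla)
            for domain, element_type, sla in blueprint]


service = compose_service([
    (AdministrativeDomain("operator-A"), "access-vpn", {"latency_ms": 30}),
    (AdministrativeDomain("cloud-B"), "hosted-feature", {"availability": 0.999}),
])
print(service)
```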

There are challenges associated with the federation model, the most significant of which emerge when the federation process hides properties of the infrastructure.  If we decide to scale or redeploy a component of an application or service from one administrative domain to another, the impact of the decision will depend on the quality of service in the new location.  That in turn will depend on the advertised internal behavior of the new location (which can be known) and on the way that the new location will impact overall workflow—which is harder to know.

The key term here is “workflow”, meaning the path of information movement among application components.  Suppose we deploy an instance of a component in Cloud A, and it happens that the hosting point within that cloud is across a continent, through a fairly large number of network hops, from the components we need to connect with.  The impact on overall performance will be different than it would be if the hosting point was in the same city as the partner components, but we don’t know where Cloud A will put the component, and Cloud A doesn’t know where the partner components are located.

Connectivity is another issue in federation, because it’s almost certain that address assignment within a federated element is controlled by the owner.  Lack of address control limits the options available for maintaining connectivity during redeployment of failed elements, and it can also impact security by making it difficult to predict the addresses of components and offer them protection through firewall-like mechanisms.

There’s a general increase in interest in virtual networking, particularly in SD-WAN form, as a means of dealing with connectivity.  Virtual networking creates, through some mechanism or another, an effective overlay network that disconnects the “logical” address of a user or resource from the physical network address and structure.  It’s the responsibility of the virtual network layer to maintain an association between the logical address and the physical address of the “node” that offers connectivity to the current location of the logical destination.
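
Reduced to its essentials, the overlay idea is just a mapping that the virtual-network layer owns and updates, as in the toy sketch below.  The names and addresses are invented for illustration.

```python
# The virtual-network layer owns this mapping; peers only ever use logical names.
LOGICAL_TO_PHYSICAL = {
    "orders.frontend": ("node-east-3", "10.8.4.20"),
}


def resolve(logical):
    """What the overlay does on every delivery: logical name to current location."""
    return LOGICAL_TO_PHYSICAL[logical]


def redeploy(logical, new_node, new_physical):
    # A failed element comes back somewhere else; only the mapping changes,
    # so every peer still reaches it by the same logical address.
    LOGICAL_TO_PHYSICAL[logical] = (new_node, new_physical)


print(resolve("orders.frontend"))
redeploy("orders.frontend", "node-west-1", "10.31.9.7")
print(resolve("orders.frontend"))
```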

The cloud is promoting another network-related notion, which is the concept of the “service mesh”.  Remember that the cloud is a resilient and elastic hosting resource that necessarily has to be supplemented/supported by networking.  It does no good to replace or scale a component if you lose touch with what you’ve done.  A service mesh is designed to augment networking by including features to identify components and to provide load balancing if you scale them.  Thus, it tends to be more of a service abstraction layer than a connectivity layer alone.

Is there a unifying theme emerging here?  I think there is, though it’s still a bit murky.  We know that federation as a concept is essential because we know that individual cloud providers, network operators, and even data centers are going to be autonomous to a significant degree.  However, we also know that granularity to the level of administration is probably going to be a problem.  If everything that impacts reliability and performance is hidden inside an administrative element, then either specific measures to enhance these qualities aren’t available, or all the factors that you want considered have to be communicated to get a favorable outcome.  The latter isn’t practical.

What I think is the likely answer is a mixture of the federation model of old, with elements framed by administrations, and a more cloud-like model where elements are functionally defined.  To me, something based on a model hierarchy using TOSCA would be ideal, but even though TOSCA is about the cloud, it doesn’t seem as though this approach is gaining traction overall.  That means that while open-source components for cloud orchestration and federation are surely the best approach, getting them piecemeal and doing your own integration isn’t what enterprises tell me they want.

Today, I’d argue that VMware and Red Hat/IBM (in that order) are the most likely sources of a viable strategy.  VMware’s relationship with Amazon (primarily) and other cloud providers makes it a contender in federation by definition, and Red Hat’s OpenShift is also a great approach to federated orchestration.  The company also seems determined to improve cloud integration and monitoring, and hybrid cloud is the main focus of its recent initiatives.  Best of all, nearly all the developments on federation in open source and the cloud center on the same technology, Kubernetes.

In early efforts like IPsphere, we never got our clocks chiming at the same time.  Vendors all tried to push things to favor their own product plans, and operators were inclined to blow in the wind.  It’s ironic to me that open-source software, which has never had a central guiding body (or even a common goal), has somehow managed to collect itself around a unifying approach.  Maybe we need to rethink how market-driven solutions work!

Could Linear TV Really be Going Away?

Live TV was for years the Golden Goose of wireline services, but we all know what happened to the Golden Goose of fable.  The same thing may be happening to live TV according to a Light Reading piece.  I don’t agree with everything in the article, but I do agree that the forces of the market are aligning convincingly against TV as we know it.  The question is whether there’s any basic shift in technology or practices that will save anything at all from destruction.

Television has been, and still is, the dominant form of family entertainment.  Despite an onrush of smart devices, despite interest in gaming, and despite the increased availability of streaming video that’s not part of live TV, TV viewing has declined very little in the last decade by the most reliable measures.  However, even in that statistic we can find the first source of trouble.

What’s become much more common (three to five times as common by most surveys) is time-shifted viewing.  People today tend not to watch shows during their scheduled slot, partly because they’re doing something else at the time and partly (but increasingly) so they can skip commercials.  Time-shifting means that people don’t build their evenings around TV shows as they did in the past.  I don’t need to watch “my show” at 9 PM every Tuesday, or whatever.  Except for news and sports, most people today don’t insist that they watch a show live.

The commercials are a problem in themselves.  It’s not uncommon these days to find a station that’s playing three minutes of commercials for every ten minutes of the actual program.  The interruptions are constant, the commercials are often repetitive, and there are a lot of them in a row—five to seven in many cases.  Long commercial delays are one thing parents say contribute to kids either playing games (and thus not really watching) during a show or watching something else on a personal device.

Personal devices enter in more as a means of taking entertainment along than of watching something different.  Young people would generally prefer not to be constantly under parental eyes, and so might go to their room or to someone else’s home, where they tend to use personal devices to entertain themselves.  This habitual avoidance of supervision becomes habitual avoidance of live TV.

Show quality is also cited by many of the people I talk with as a reason to rely on TV less.  If you ask someone whether TV is “worse” than it was five years ago, my experience says that two out of three will say it is.  If you ask whether their favorite show that’s run for two or three seasons has gotten better or worse, an even larger percentage says “worse”.  A friend complained to me recently that “there’s nothing good on anymore”.

Streaming has impacted this primarily by offering an alternative.  You don’t watch the best of a bad lot anymore, you turn to Netflix or Hulu or Amazon and grab something from the past (when, most say, stuff was better anyway).  As people become more accustomed to what you can get in online video libraries, they’re less tolerant of network TV.  The less they watch, the more commercials the networks need to insert to make the same (or more) money.  That drives more viewers away, and the feedback loop begins.

The premise of the Light Reading piece is that operators will be driven to provide connectivity, meaning high-speed Internet, rather than focus on TV, and that’s where I part ways with the thinking.  I do believe that having linear TV delivered to the home through a facility that either can’t provide or steals capacity from the delivery of broadband Internet is on its way out.  What I’m not sure of is whether broadband Internet alone is a viable business model for many.

But, Internet-only proponents will argue, isn’t Verizon heading that way already?  Don’t we have countries where broadband Internet is highly profitable?  Yes to both, but in both the case of Verizon and many other countries, the difference is the high demand density of the operators’ market.  If your infrastructure passes a lot of dollars for each mile of deployment, you can be profitable selling broadband Internet service.  Verizon has seven times the demand density of the US average, and there are countries (Japan, for example) that have twelve times the US average demand density.

This is where the 5G/FTTN millimeter-wave hybrid technology comes in.  If it’s possible to deliver broadband to urban and suburban locations using this combination, it could reduce the operators’ cost of access by as much as 70% versus FTTH, making it slightly cheaper than CATV cable.  However, the combination can only support streaming video, not linear TV.  Most wireline providers who rely on TV rely on linear delivery, and since the new hybrid would support much faster broadband, its advent will pose a competitive risk to wireline providers who have to carve out capacity and cost for linear delivery.

Streaming live TV is possible, of course, but it’s not necessarily easy.  DirecTV Now, a streaming service that took off and for a time boosted AT&T’s TV subscriptions, quickly fell from grace, and its users now face a price increase.  Verizon seems to have dropped its own streaming strategy.  But would these issues matter if everyone were forced to either adopt a streaming model or get out of the TV business?

Maybe.  If you can get the cost of 5G/FTTN down enough, broadband Internet could be profitable for at least 90% of homes.  What then happens to “TV”?

Amazon might offer an answer, which is that custom programming could shift to a library model rather than scheduled, live delivery.  Amazon’s original series are released as a season and streamed on demand, not metered out at a fixed time on a given day of the week.  Could other networks move to the same model?  Interestingly, while it’s the future technology of 5G/FTTN that may create the real risk to linear TV, it’s the past that might save it.

“The past”, meaning over-the-air TV.  A growing number of households (at least a quarter, though the number is hard to pin down) get some or all of their TV over the air.  A decision to abandon linear TV and fixed timeslots for viewing new programming would largely disenfranchise these viewers, who would surely complain to lawmakers and regulators.  Whether this would save the concept of live TV will depend on how the other factors cited here play out.

The biggest issue is user satisfaction with the combination of show quality and commercial interruptions.  Deterioration here risks a negative feedback loop and an accelerating decline, resulting eventually in shows and commercials few will be willing to watch.  The decisive issue will likely be whether children shift their entertainment from TV to games played on computers, phones, and tablets, or watch library content on one of those devices.  If that happens, then “family viewing” declines, scheduled programming can no longer accommodate viewing behavior, and live TV is at serious risk.

That doesn’t mean that operators, both telco and cable companies, would abandon selling TV services.  Some might work to develop their own streaming and library combination and others could partner with OTT streaming video players.  There will always be an opportunity for both mobile and fixed broadband providers to resell content services because of their established, and even preferential, position with customers.  But if we continue to see negative reinforcements in the form of angst over show quality and commercial content, then linear live TV may be in trouble.

All of this is good for networking, at least IP networking.  Even without the reinforcing concept of 5G/FTTN, a shift away from linear TV is happening for reasons of viewer behavior, and this will induce wireline access providers to repurpose their connections to emphasize Internet over a combination of Internet and linear, creating significantly more traffic.  It also changes the caching equation because there’s less synchronicity in the material people are viewing.  Most viewers watch a scheduled show when it’s scheduled, even with time-shifting.  There is no clear pattern to when they watch an Amazon series that’s never had a specific timeslot.

At one level, there’s nothing really surprising here, because people have predicted the death of linear for a decade.  At another level, we’ve never before seen such clear signals that, from a viewer-behavior and technology perspective, it’s actually possible.  We’re seeing those signals now.

Oracle, SaaS, and “Cloud Fragmentation”

Oracle didn’t offer great guidance in its earnings report, but it didn’t have a bad quarter.  The best thing about it was its as-a-service business.  That raises the question of whether SaaS is a new channel for cloud service success.  It also raises the question of whether, as an AT&T executive suggested, there’s fragmentation in the cloud.  Might we end up with more than three major cloud providers, and might we see new differentiators and issues emerge in the market?

Broadly speaking, cloud providers used to be divided into three classes depending on the services they offered—infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS) and software-as-a-service (SaaS).  Today, the big three providers are all rightfully PaaS providers because their offerings include “foundation” or “platform” services that developers exploit to build cloud-specialized applications.  We also have some SaaS providers, including Salesforce and our friend Oracle, but SaaS is generally seen as a specialized version of the cloud, not one that defines market trends overall.

That might be the first mistake that current market trends expose.  Application software providers, meaning companies who author popular application packages that have fairly broad (horizontal) industry appeal, have a strong incentive to offer their applications as a service, meaning they deploy them in SaaS form.  Vertical market software, particularly in the healthcare space and for SMB users in many more industries, has already seen some haphazard movement toward a cloud-hosted model.  Oracle is showing that revenue growth for application software companies may depend on this shift.

This trend is interesting in combination with the trend that’s made the Big Three cloud providers into PaaS players.  The cloud, meaning the general-purpose host-what-you-want cloud, has been evolving into a vast distributed platform with its own middleware, the web services the providers offer as platform services.  New applications developed for the cloud are built on cloud-specific features now, which means they aren’t “migrating” to the cloud at all.  Most enterprise applications being built today are designed to run part in and part out of the cloud, with the cloud providing the user front-end processing and the mission-critical back end still running in the data center.
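
As a minimal sketch of that part-in, part-out pattern (every endpoint and field name here is hypothetical, not anything a particular vendor publishes): a stateless front end suited to cloud hosting validates and shapes the request, then hands the mission-critical transaction to a back end that stays in the data center.

```python
# Minimal sketch of a "part in, part out" hybrid application.
# The cloud piece is a stateless front end; the transactional back end stays
# in the data center.  All endpoints and field names are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json
import urllib.request

DATACENTER_OLTP_URL = "https://oltp.example.internal/orders"  # hypothetical on-prem endpoint

class FrontEnd(BaseHTTPRequestHandler):
    def do_POST(self):
        # Front-end work that suits the cloud: parse, validate, enrich.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if "customer_id" not in body or "sku" not in body:
            self.send_response(400)
            self.end_headers()
            return

        # Mission-critical transaction processing stays in the data center.
        req = urllib.request.Request(
            DATACENTER_OLTP_URL,
            data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            result = resp.read()

        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), FrontEnd).serve_forever()
```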

The combination?  Well, if application software vendors are driving toward SaaS and everything else is driving toward custom, cloud-specific PaaS form, then whatever migration is happening is happening because the applications, individually, are being offered in SaaS form.  Traditional computing is transformed by the cloud not by moving the servers to IaaS but by moving the applications to SaaS.  This supplements the new application builds, the stuff adding cloud front-ends to traditional applications.

The competitive implications of this shift would be significant.  Vanilla IaaS service would be a losing proposition, for starters.  That means that “cloud providers” in the traditional sense would have to offer a full suite of web services to compete with the Big Three, which means there’s little or no chance of anyone getting traction that way.  IBM may be sitting right at the threshold of survivable cloud aspirants, which is why they’ll have to exploit Red Hat, carrier cloud, or some other rational strategy rather quickly, or they’ll lose any chance of playing in the cloud space in the future.

For software vendors in the segments that the cloud impacts, it means that either vendors will have to be big enough to offer their software in SaaS form, or they’ll have to partner with one or more of the Big Three to get a cloud presence.  My modeling on this isn’t precise, but it appears that only about a dozen of the biggest software names would be able to field their own cloud and perhaps three or four dozen could partner with a cloud provider.  The rest will have to continue on-premises only and hope their market doesn’t demand something cloudier.

The focus of front-end and new applications on a PaaS model has near-term implications for cloud users.  Anyone who’s tried to deploy to multiple cloud providers while using web services in their apps knows that this isn’t an easy proposition.  Most web services are at least subtly different across clouds, and some are so different they may as well not be considered the same service at all.  This is one of the forces acting against cloud fragmentation; nobody is going to have fun moving from one provider to another.  It’s also a force that tends to make the Big Three bigger, perhaps at the expense of true “multi-cloud”.
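
Here’s a small illustration of why those differences bite, using the public boto3 and azure-storage-blob client libraries to perform the same logical operation, “store an object”.  The bucket, container, and credential values are hypothetical placeholders; the point is only that even a trivial operation is expressed differently on each cloud.

```python
# The same logical operation, "store an object", expressed against two clouds.
# Bucket/container names and credentials are hypothetical placeholders.
import boto3                                       # pip install boto3
from azure.storage.blob import BlobServiceClient   # pip install azure-storage-blob

payload = b"placeholder object contents"

# AWS: S3 via boto3
s3 = boto3.client("s3")
s3.put_object(Bucket="acme-exports", Key="orders/export.csv", Body=payload)

# Azure: Blob Storage via its own SDK, with a different object model, auth, and call shape
blob_service = BlobServiceClient.from_connection_string("<hypothetical-connection-string>")
blob_client = blob_service.get_blob_client(container="acme-exports", blob="orders/export.csv")
blob_client.upload_blob(payload, overwrite=True)

# Even this trivial case differs in naming, credentials, retry semantics, and
# consistency behavior; richer services (queues, functions, ML APIs) diverge far more.
```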

There are big differences among sources in just how much multi-cloud is done.  My surveys suggest that it’s not commonplace to use multiple cloud providers except among large multinationals, and even there the usage tends to be limited (one of my survey targets spent 87% of their cloud dollars on one provider, with the other 13% spread among an additional four).  The data so far suggests that where multiple clouds are used, they’re confined to basic front-end missions using relatively few additional services.  Those missions will decline as the major cloud providers expand their service geographies.

This may sound bad for multi-cloud, but only in one sense.  Remember that most of the big application vendors will be doing SaaS clouds, and even more will be partnering to create them.  This means that almost every enterprise will be multi-cloud in the sense that it obtains many of its third-party applications from the cloud in SaaS form.  The difference with SaaS-multi-cloud is that users are not deploying the application elements, and so have less control over the middleware used to support hybrid connections with the data center and with other applications hosted elsewhere.

What a proliferation of SaaS applications could create is something we could call an “application cloud” with “compartments” into which any cloud-hosted application is slotted.  The details of the hosting would necessarily vary as the cloud model itself varies, but the fundamental issue of the way that a “compartment” appears in the company VPN and connects with other enterprise elements in a workflow would have to be defined.

Ironically, this could contribute to a greater order in cloud connectivity, at least eventually.  If we define how a cloud “compartment” is accessed, we’re essentially creating an abstraction that could then be used to contain the applications, even those created by front-end web-service customization rather than SaaS.  That would create a kind of intent model that could then harmonize the ways that things like service meshes and orchestrators have to support the integration of workflows in enterprise networks.
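
One way to picture that “compartment” abstraction, purely as a sketch of the idea rather than any existing product or standard, is an intent-style interface that declares what a hosted application exposes to the company VPN and to other workflow elements, while leaving the “how” to whichever cloud hosts it.

```python
# Sketch of a "compartment" intent model for a cloud-hosted application.
# Nothing here corresponds to an existing product or standard; names are illustrative.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str          # logical name used by workflows, e.g. "order-entry"
    protocol: str      # "https", "amqp", ...
    vpn_visible: bool  # is this reachable from the company VPN?

class Compartment(ABC):
    """Intent-level description of a hosted application 'slot' in the application cloud."""

    @abstractmethod
    def endpoints(self) -> list[Endpoint]:
        """What the application exposes, independent of which cloud hosts it."""

    @abstractmethod
    def bind_workflow(self, from_endpoint: str, to_endpoint: str) -> None:
        """Declare a workflow connection to another compartment or to the data center."""

class SaaSCompartment(Compartment):
    """A SaaS application: the provider controls the hosting; we see only the intent surface."""
    def __init__(self, app_name: str):
        self.app_name = app_name
        self._bindings: list[tuple[str, str]] = []

    def endpoints(self) -> list[Endpoint]:
        return [Endpoint(f"{self.app_name}-api", "https", vpn_visible=True)]

    def bind_workflow(self, from_endpoint: str, to_endpoint: str) -> None:
        self._bindings.append((from_endpoint, to_endpoint))

# Usage: a service mesh or orchestrator could consume these declarations to wire
# SaaS, PaaS-built, and data-center elements into one workflow.
crm = SaaSCompartment("crm")
crm.bind_workflow("crm-api", "datacenter-billing")
```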

I’m not suggesting Oracle sees this, nor that anyone in particular does.  We’ve been unable so far to construct a top-down architecture of the hybrid or multi-cloud model, though we’ve done a pretty good job creating the pieces needed to build one.  I do think that SaaS “compartments” could be a step toward that model, and so we should keep an eye out for insightful players who figure out what it means and how to do it.

The IBM/Vodafone Deal and the Future of Carrier Cloud

IBM and Vodafone want to partner in the future of enterprise cloud, as the deal announced at MWC shows.  Given how much enterprise hybrid cloud is a focus for cloud providers, and given IBM’s need to catch up, it’s not a surprise.  Vodafone would also gain if a new enterprise vision for cloud applications promoted 5G.  There are even some early indicators of success.  The question isn’t whether the goals are worthy, then, but (as usual) whether there’s any meat on the bones of the deal that would provide a good chance of success.

The deal itself is an “eight-year strategic commercial partnership” that links IBM’s cloud and AI services with Vodafone Business services and customers.  The deal becomes operational in the coming quarter, and because of that early launch relative to full 5G availability, it’s clear it’s not dependent on 5G applications like IoT that require low latency.  In fact, since Vodafone and pretty much everyone else is looking at the 5G Non-Standalone (NSA), new-radio-only model, it’s not clear that there’s a real dependence on 5G even for the future.  So what gives?

Let’s get one thing straight from the start.  The direct focus of the deal is really “carrier cloud” rather than “enterprise cloud”, in the sense that the deal promotes carrier sales to enterprises, not direct enterprise sales.  However, anything about the cloud is ultimately about the architecture of cloud applications, and that means even “carrier cloud” strategies could impact the cloud at large if they change or accelerate cloud architecture development.  Because of the sales focus of the deal, though, we’ll have to first ask ourselves about the motivation of the players in the near term.

One obvious possibility is that IBM is looking toward a future where carrier cloud is going to be the largest incremental driver of cloud deployment.  If the operators build out their own clouds, they’ll be establishing a scale of operations and a set of capabilities that will be potentially highly competitive with the current cloud providers.  On the other hand, suppose IBM hosted Vodafone’s carrier cloud?  Suppose that deal single-handedly made IBM one of the giants in public cloud?  On Vodafone’s side, if operators are inevitably going to get into business services based on the new, microservice-based model of cloud services, why not start selling that stuff now instead of waiting to deploy your own cloud, and possibly getting it wrong?

I said five years ago that the largest market for public cloud computing services was applications that were not “migrated” to the cloud, but rather had never been implemented because they didn’t fit the current data-center-centric compute model at all.  A hybrid cloud that mixes the benefits of public cloud computing and secure data center transaction processing is critical to reaping a trillion dollars’ worth of incremental spending.  With the acquisition of Red Hat (which, somewhat to my surprise, IBM didn’t play up in the announcement), IBM could have all of the critical pieces needed to make this new cloud vision work.

As always, there are qualifications, it seems.  The obvious one is the lack of specific linkage to the Red Hat OpenShift assets that are actually the technical foundation for any new hybrid-cloud vision IBM could offer.  A less obvious but still important one is the linkage to “multi-cloud”, in the sense of lots of public clouds, and to IoT.  The least obvious of all is the question of how a specific Vodafone Business relationship will impact any broader carrier-cloud plans that IBM might (or should) have.

The biggest problem with getting our hands on that trillion dollars in new cloud revenue is defining a solid architectural model for the application platform that will span multiple clouds and data centers.  The industry has been spinning around on this topic for at least four years, and we only now seem to be coming to a consensus on even the basic points.  It’s going to be container-based.  It’s going to integrate virtual networking, and it’s going to link these two through Kubernetes orchestration.  There are even a decent number of architects who could draw you a picture of this glorious hybrid future, but the challenge is first that most buyers probably don’t know any of them, and second that the details of the elements in the architecture would be different across the architects.  Lack of a uniform vision always makes buyers antsy.
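
For a sense of what that consensus looks like in practice, here’s a minimal sketch using the official Kubernetes Python client: a containerized component is declared as a Deployment, and the orchestrator places it and keeps it running across whatever hosts the cluster spans.  The image name, labels, and namespace are hypothetical.

```python
# Minimal sketch: declare a containerized component to Kubernetes and let the
# orchestrator place it.  Image name, labels, and namespace are hypothetical.
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

container = client.V1Container(
    name="order-frontend",
    image="registry.example.com/order-frontend:1.4",  # hypothetical image
    ports=[client.V1ContainerPort(container_port=8080)],
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="order-frontend"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "order-frontend"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "order-frontend"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```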

Red Hat has a decent number of those literate architects, and OpenShift is at least the beginning of a solid commercially credible framework for hybrid cloud.  If I were IBM, I’d be singing that point loud and clear at any point in my positioning where the words “cloud” or “hybrid” were uttered.  I’m not seeing that clear association here, and that means that the most compelling part of the IBM story isn’t being told, at least to the media.  That’s bad because IBM’s biggest current weakness is too much reliance on sales influence and not enough on broad marketing.  Remember my Robert Frost poem?  “So all who hide too well away must speak, and tell us where they are.”

The second issue is that multi-cloud-and-IoT flavor.  On one hand, it’s obvious that if 5G promotion is your goal and you’re a network operator, you want to believe in IoT as much as a rabid conspiracy theorist wants to believe in Roswell UFOs.  On the other hand, UFOs are still “unidentified”, which proves that fervent belief doesn’t make a market.  The multi-cloud angle could actually be more insidious; one comment in the stories on the deal says that companies have over a dozen clouds.  Not the companies I’ve talked with, and surely not most companies.  The more “multi” we go, the fewer true prospects there are.

The biggest problem operators have with transformation is their utter lack of realism.  What they want is a future that’s profitable for the same reason the past was, which is never going to happen.  There will be enterprises who do use many different clouds, multi-nationals who have no real choice if they want to cover their market areas.  Most won’t, and over time the major cloud providers will expand their geographic scope to improve their own market share.  Most IoT is within a facility, and will therefore never require more than WiFi or one of the sensor/controller protocols already in use.  Yes, there is a need for cellular IoT, in the transportation industry in particular.  No, it’s not a widespread need or one likely to develop early on.

This brings us to our final question, which comes down to the classic issue of “channel conflict”.  Being a seller to the carrier cloud market is a good thing.  Being a seller to an initial big player there is a good thing.  Selling only to that player is a bad thing, and if it sounds to other operators like IBM and Vodafone are joined at the hip, the deal may be seen as a threat by other carrier cloud infrastructure buyers.  Certainly, it’s hard for an operator to differentiate their business services if they’ve got the same service supplier as all their competitors do.

This could be a case of sacrificing the long-term opportunity for short-term gain.  IBM clearly needs a cloud win, and in particular one that’s truly a hybrid cloud win.  They can benefit from a 5G-linked deal, though that’s not as clear a win as the carrier cloud piece.  If they can ink some contracts in the first half of this year, that boosts their stock.  Everyone else in the tech space thinks this way—make your numbers for the next quarter first, then worry about the ones after that with a worry-level that decreases as the quarters advance.  Why not IBM?

Well, for starters, IBM’s success in the past has been clearly associated with one single thing, which is strategic account control.  “Strategic” definitely doesn’t mean “only the current quarter”.  I can tell you from personal contact with operators that they’ve wanted IBM to be a player in carrier cloud for at least six years, so there is an opportunity for IBM to exert some strategic influence.  The operators also have what’s likely the longest capital cycle of any vertical, meaning it takes them longer to age out older equipment.  That certainly puts strategic thinking at a premium.  Anything IBM does to limit its strategic credibility will hurt them, likely disproportionately, in the carrier cloud market.  Losing strategic credibility is, according to my own surveys, IBM’s biggest current problem overall, so this would be a double whammy.

Is there a positive slant that could be taken?  Yes, there is.  If IBM sees the carrier cloud space as both an opportunity that needs some leadership by example, and an opportunity to define the new application model for the cloud in a space that has significant revenue potential, then a deal with Vodafone would be well-justified.  The big “if” is whether the deal can reap a quick result in both those areas.  Get out in front of the carrier cloud and hybrid cloud markets and you have a lot to gain, but you gain it only if you get out in front.

Nvidia and F5 Do Potentially Great Acquisitions

One of the most potentially dramatic developments in networking is an acquisition.  We’ve just had two of them: the acquisition of NGINX by F5 and the acquisition of Mellanox by Nvidia.  Will either or both of these deals have an impact on the industry, do they signal an impact already being felt, or are they a not-uncommon example of a vendor simply doing a silly and sometimes desperate deal?  One deal is about software, the other about hardware, so what’s in this for the industry?

Both F5 and Nvidia are companies transitioning their focus in an age of both growing and shrinking opportunity.  F5 has been about server load balancing and security, and Nvidia has been about graphics display systems for computers.  Neither of these starting points suggests a desire to get involved in network transformation, which is why we need to look deeper into the deals and what probably lies behind them.  When we do that, we’ll look at the future the deals seem to be aiming toward.

NGINX is a premier web and application server provider, competing with popular servers like Apache.  If you look at their own take on the deal, you see what I believe are two clear signals of direction.  The first is that the cloud is transforming applications into a highly agile front-end piece that’s increasingly based on microservices and “events”, and a back-end piece that’s traditional online transaction processing (OLTP).  NGINX does the first part and F5 the second.  The second direction is the orchestration and containerization of application deployment, and the need to frame application front-end services in those terms.

The componentization of applications has been going on for thirty years or more, but until the last decade or so it’s been componentization in building applications rather than in deploying them.  That means the application components that used to be bound into a common load image are now network-distributed.  That process is the real meat of virtualization.  It’s not so much about making hosting virtual, but rather about making a whole pool of hosts look like a single structure.  Things like microservices, orchestration (and Kubernetes), service meshes, and the like are the heart of this higher-level virtualization process.  Because technical companies (including both NGINX and F5) are historically bad at articulating their story at any level other than geek-speak or press platitudes, this whole process is badly understood.
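
To make that shift concrete, here’s a tiny sketch (with hypothetical component names and addresses): the same step that used to be a local function bound into one load image becomes a network call to a separately deployed component, which is exactly why discovery, orchestration, and service meshes suddenly matter.

```python
# The same "price a quote" step, first as a component bound into one load image,
# then as a network-distributed microservice.  Names and URLs are hypothetical.
import json
import urllib.request

# Old style: the pricing component is compiled/linked into the same application.
def price_quote_local(items):
    return sum(item["qty"] * item["unit_price"] for item in items)

# Cloud style: the pricing component is deployed on its own, found by name, and
# called over the network, which is why discovery, orchestration, and service
# meshes (retries, timeouts, routing) suddenly matter.
PRICING_SERVICE = "http://pricing.svc.cluster.local/price"  # hypothetical service address

def price_quote_remote(items):
    req = urllib.request.Request(
        PRICING_SERVICE,
        data=json.dumps({"items": items}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=2.0) as resp:
        return json.load(resp)["total"]
```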

That’s why F5 may have made an incredibly smart move here.  There are a zillion pieces of the new application ecosystem wrought by this massive virtualization shift, and nearly all of them are in open-source form, promoted by a host of disconnected and small players.  NGINX actually has a lot of the most critical pieces baked into its own offerings, and it also has a massive installed base.  You could have justified the purchase price ($670 million) for the base alone.  Add in the critical application virtualization ecosystem and you have a very sweet deal indeed.

The big question here is whether F5 really knows what it’s doing.  It’s easy to recount network technology acquisition failures, and actually rather difficult to identify great successes.  F5, like most network companies, has had somewhat insipid marketing and positioning, and as a hardware player it’s far from certain that they really have a handle on what’s going on in the software space, particularly the container-and-Kubernetes space that’s actually central to the value proposition for NGINX going forward.  They plan to continue the NGINX brand, but will that then limit the symbiosis, and will they end up messing with NGINX strategy anyway?  We’ll have to see.

Let’s look now at the Nvidia deal.  The company’s own news release talks about how the deal will “unite two of the world’s leading companies in high performance computing (HPC)”, which noticeably plays on Nvidia’s own shift to GPUs as computing tools instead of purely graphics processors.  I know Mellanox from work I did in an ad hoc group working on an NFV implementation back in 2014.  Their specialty was (and is) high-speed interconnection and networking.  They have InfiniBand and Ethernet products and also custom packet system-on-chips, and these are a great fit for a hyperconverged data center or a cloud, including a carrier cloud.

Carrier cloud, I think, is the reason this deal is a smart play for Nvidia.  Remember that for five years now, my model has said that an optimal carrier cloud deployment would add over one hundred thousand data centers globally by 2030.  Those data centers, focused as they would be on hosting service components, would be unusually dependent on high-performance networking, which is what Mellanox can provide.  Mellanox, having worked in the NFV space, knows the system interconnect requirements for feature hosting.

GPU chips or other Nvidia products don’t come rolling off your lips when you think “carrier cloud”, but they might well do just that.  The reason is that feature hosting, unlike cloud applications for enterprises, isn’t particularly dependent on an Intel chip architecture.  Hosting microservices, which is where feature hosting in general and event hosting in particular should be going, is a great GPU application.  Combine that with a great interconnect strategy and you have something that could be very big indeed.

Which it needs to be, because Nvidia is spending almost exactly ten times as much for Mellanox as F5 is spending for NGINX.  Yes, Mellanox has a pretty solid business that Nvidia is buying, but the Street worries that the price is too high unless there’s significant additional leverage to be gained, and that’s where Nvidia has to step up its game.

The challenge in doing that is formidable, because cloud computing in general, and carrier cloud in particular, is really about software, meaning that hardware is just what you run software on.  That mindset comes hard to a hardware vendor of any sort, but particularly hard to chip vendors.  Not only are cloud and carrier cloud software-centric, any attempt to gain access to or to accelerate the space is going to require world-class positioning, which hardware vendors are always very bad at.  Very bad.

Both our acquisitions, then, have a lot of elements in common.  First, the Street thinks that basic business fundamentals will have a challenging time justifying the M&A.  If this is about buying revenue, then both these deals are almost surely mistakes.  Second, to make the moves more than the simple business play the Street isn’t accepting, both acquiring vendors are going to have to sing and dance like they never have before, and back that up with some significant architectural advances.

Behind everything here, and much of everything else happening in networking, is a combination of a technology shift from hardware-centricity to software-centricity and the new model of agile, multi-component (microservice) applications.  This combination unlocks the real value of the cloud, it underpins the changes in networking needed to support an elastic community of services, and it challenges traditional vendor (and even user) planning.  It is very difficult for even platform software vendors (Wind River comes to mind) to promote their role in a transformation that higher-level software has to drive.  For Nvidia/Mellanox, it will surely be even more challenging.  For F5 and NGINX, there’s a big advantage in that NGINX actually has a lot of the key technology that will need to be promoted.

The important thing for both companies is the positioning.  Right now, carrier cloud and even the future of traditional public cloud services are tied to broadly useful but largely disconnected developments in the open-source space.  Operators don’t want to be integrators (many don’t even have the skills to be software integrators at all) and even planning out a cloud transformation is difficult without specific technical objectives and an architecture that can bind them into a deployment.  These are both great acquisitions, but to sustain their greatness there’s a lot of work still to be done.