Do We Need a New Vision of NFV Infrastructure, or of NFV?

Light Reading did a thoughtful piece on NFVi yesterday, and it highlights a number of points we should be thinking about when we talk about “NFV”, “cloud-native”, and “transformation”.  Some of the points show we’re at least thinking in the right direction, but others show we still have a long way to go.

NFVi stands for “NFV Infrastructure”, the term the NFV ISG gave to the collection of hosting and connection resources that are drawn upon to host virtual network functions (VNFs).  The presumption in the ISG material is that the link point between management and orchestration (MANO) and NFVi is the Virtual Infrastructure Manager (VIM).  From the first, it was assumed that the VIM would be based on OpenStack.

Also from the first, it was clear to at least some people (including me) that the concept of the VIM as the virtualization mid-point to NFVi had issues.  If we assume that we have a single VIM representing the totality of NFVi, then we have to make it support whatever we want to deploy, and also need it to make the choices of how to apply resources to services, since the resources are abstracted below the VIM.  The problems with this take a moment to explain.

The biggest problem is VIM device support complexity.  Suppose you want to host on containers, or bare metal, or in a public cloud service.  All of these have different management interfaces, so you’d need to have a plug-in for each.  Imagine a typical operator network that might have two or three network vendors, perhaps a dozen different devices, and then add in the hosting.  How many hypervisors, container systems, and so forth?  The number of permutations of all of this is astounding, and no vendor would supply a VIM that supported competitive equipment, so it’s an integration problem.

The next problem is that if there is one VIM that represents all NFVi, then any decisions you make on where to host, what technology to use, and so forth, have to be pushed down into the VIM itself.  MANO service modeling (if indeed there is any such thing) is reduced to simply saying “host” and letting something else figure out everything else.  The VIM then has to become a second-level orchestrator, and its details in that role are not even discussed in the ISG work.

There are two possible ways of addressing this.  One is to have multiple VIMs, one for each resource domain.  The higher-level model then picks a VIM when it makes a hosting decision, implicitly, because it picked a place to host.  There is still a need for that invisible sub-orchestration within the resource domain.  The second possibility is to define a set of “service-layer” models and a set of “resource-layer” models that define the respective orchestration tasks, and view the VIM as representing the boundary where service-layer and resource-layer mapping happens.  I tried out both approaches and like the second one best; see my ExperiaSphere tutorials.
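
To make the second approach concrete, here’s a minimal sketch (in Python) of the general idea: service-layer models pick a resource domain, and the “VIM” is just the boundary where the service-layer request is handed to a resource-layer model for sub-orchestration.  The class and domain names are illustrative assumptions, not ExperiaSphere code.

```python
# Hypothetical sketch: service-layer models decompose to resource-layer models,
# and the "VIM" is simply the boundary where one maps to the other.

class ResourceDomain:
    """A resource-layer model: knows how to realize a hosting request in its own domain."""
    def __init__(self, name, technology):
        self.name = name
        self.technology = technology  # e.g. "openstack-vm", "kubernetes", "public-cloud"

    def realize(self, component, params):
        # Sub-orchestration lives here, hidden from the service layer.
        print(f"[{self.name}] deploying {component} via {self.technology} with {params}")
        return {"component": component, "domain": self.name, "state": "active"}


class ServiceModel:
    """A service-layer model: picking a place to host implicitly picks the resource domain."""
    def __init__(self, domains):
        self.domains = domains

    def host(self, component, location, params):
        domain = self.domains[location]           # the hosting decision selects the domain
        return domain.realize(component, params)  # the boundary = service-to-resource mapping


domains = {
    "edge-east": ResourceDomain("edge-east", "kubernetes"),
    "core-dc":   ResourceDomain("core-dc", "openstack-vm"),
}
service = ServiceModel(domains)
service.host("firewall-vnf", "edge-east", {"cpu": 2, "mem_gb": 4})
```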

The problems cited in the article are due in part to the fact that the NFV ISG didn’t really pick either of the choices, which tends to create a very brittle coupling between the VNFs that you want to deploy and the resources you want to deploy them on.  The VIM of today doesn’t really abstract the NFVi, and that’s why VNF-to-NFVi integration ends up being almost bespoke.

The other part, an issue we’re really not fully confronting even now, is the problem of automating the service lifecycle.  In order to do that, you have to define a goal-state “intent model” for the pieces of a service, and map the result to abstract resources.  Without this, you have no easy way to figure out what you’re supposed to do with a service event.  I know I sound like a broken record on this, but this problem was solved using data-model-coupled event-to-process mapping when the TMF published its original NGOSS Contract stuff a decade ago.  Why we (and even the TMF) forgot that, I can’t speculate.
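
For readers who haven’t seen the NGOSS Contract idea in practice, here’s a minimal sketch of data-model-coupled event-to-process mapping: each service element carries a state/event table that steers events to the right process.  The states, events, and handler names here are hypothetical, chosen only to illustrate the pattern.

```python
# A sketch in the NGOSS Contract spirit: the state/event table is part of the
# element's data model, so any process instance can pick up any event correctly.

def start_deploy(elem, event):
    elem["state"] = "deploying"
    print("ordering resources for", elem["name"])

def confirm_active(elem, event):
    elem["state"] = "active"
    print(elem["name"], "is active")

def redeploy(elem, event):
    elem["state"] = "deploying"
    print("fault on", elem["name"], "- redeploying")

STATE_EVENT_TABLE = {
    ("ordered",   "activate"): start_deploy,
    ("deploying", "ready"):    confirm_active,
    ("active",    "fault"):    redeploy,
}

def handle_event(elem, event):
    process = STATE_EVENT_TABLE.get((elem["state"], event))
    if process:
        process(elem, event)
    else:
        print("event", event, "ignored in state", elem["state"])

vpn_element = {"name": "vpn-core", "state": "ordered"}
handle_event(vpn_element, "activate")
handle_event(vpn_element, "ready")
handle_event(vpn_element, "fault")
```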

My point here is that in a theoretical sense, there is no such thing as an NFVi that’s “too fragmented”.  Infrastructure reality is what it is, and the whole purpose of virtualization or abstraction is to accommodate the diversity of possibilities.  If you don’t have that, the fault lies in what’s supposed to be doing the abstracting, which is the VIM.  The VIM concept, candidly, frustrates me because I’ve spoken against the monolithic VIM vision from the first and even demonstrated to operators that it wouldn’t work.  That the point wasn’t understood back almost six years ago speaks to the problem of a lack of software architecture skill in the standards process.

How about a practical sense?  The article notes that one problem is that VNFs themselves demand different NFVi configurations, because of specialization in the hosting requirements.  The recommendation is to define only three classes of hosting for VNFs so that you limit the fragmentation of the resource pool created by the different VNF configuration requirements.  Is this a real issue?

Only maybe.  First, we’re at risk of diving into the hypothetical when we talk about this topic.  Do the differences in configuration really create significant benefits to VNF performance?  How many VNFs of each of these three types would be hosted?  How different are the costs of the configuration for each, and would those cost differences be partially or fully offset by improved resource efficiency?  Do we reach a level of resources for each of the three configurations at which we’ve achieved optimal efficiency anyway?  All this depends on what we end up deploying as VNFs.

Which we don’t know, and in one sense we shouldn’t be writing the concerns about this into specifications on the architecture.  The architecture should never constrain how many different resource classes we might want, or which parameters might be selected to decide where to host something.  I think that the simple truth is that operators should have a MANO/VIM/NFVi model that accommodates whatever they want to run, and whatever they want to run it on.  Let the decision on what specific resources to deploy be made by how resource consumption develops.

What really frightens me, though, is that the GSMA is looking into framing out some “NFVi categories”.  It frightens me because it implies we’re going to take an architectural vision that had major issues from the start, and that has since been diverging from the cloud’s vision, and use it as a basis for framing categories of resources to be used in the carrier cloud.  You don’t do the right thing by building on a vision that’s wrong.

What all virtualization needs, even depends on, is a true and dualistic vision.  In the “upward” direction, it has to present a generalized and comprehensive view of the virtual element or resource, one that envelops all the possible realizations of that element, whether they’re a singular resource or a collection acting cooperatively.  In the “downward” direction, it has to realize that upward vision on any suitable platform or pool.  That means that “NFVi” shouldn’t be the focus of virtualization for NFV, but rather that the VIM has been the critical piece all along.  The sooner we deal with that, the better.
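
A sketch of that dualistic vision, under my own assumptions about naming: one upward-facing abstraction that the service layer sees, and several interchangeable downward realizations.  Nothing above the abstraction needs to change when the pool below it does.

```python
# Illustrative only: the names, parameters, and SLA fields are assumptions.

from abc import ABC, abstractmethod

class HostingAbstraction(ABC):
    """Upward: a single, generalized 'host this' contract with an SLA."""
    @abstractmethod
    def deploy(self, image, sla):
        ...

class VMRealization(HostingAbstraction):
    def deploy(self, image, sla):
        return f"VM instance of {image}, sized for {sla}"

class ContainerRealization(HostingAbstraction):
    def deploy(self, image, sla):
        return f"container pod of {image}, autoscaled to {sla}"

class PublicCloudRealization(HostingAbstraction):
    def deploy(self, image, sla):
        return f"cloud-provider instance of {image}, meeting {sla}"

# The service layer sees only the abstraction; the pool behind it can be anything.
for realization in (VMRealization(), ContainerRealization(), PublicCloudRealization()):
    print(realization.deploy("vFirewall-1.2", {"latency_ms": 20, "availability": "4-nines"}))
```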

Have We Passed from “Orchestration” to “Micro-Orchestration”?

Where do you go when you’re tired of orchestration?  Cloudify says it’s “micro-orchestration”, according to their latest release of the Spire edge-orchestration platform.  Given that Spire is targeted at edge computing, and given that operators are obviously looking at the edge and considering how they’d monetize it, I think it’s clear that edge computing is in fact driving orchestration needs.  I also think it’s reigniting the question of how NFV orchestration, cloud orchestration, and edge orchestration fit together.

NFV, as an ETSI Industry Specification Group (ISG) project, was launched to define how appliances used in network services could be replaced by hosted functions.  While the broad concept of NFV doesn’t specify whether the functions are single- or multi-tenant, the focus the ISG had on virtual CPE and service chaining has bent the activity toward single-service, single-tenant missions.  These missions were presumed to involve services under fairly long-term contract, and since physical appliances tend to be deployed for a long time, their virtual equivalents were presumed to be semi-permanent.

Cloud computing is about replacing dedicated on-premises servers with virtual servers (in some form) from a resource pool.  The goal is to run applications, nearly all of which are inherently multi-user, and the great majority of which are persistent, meaning they’re loaded and run for an indefinite but fairly long time.  Where dynamism comes into the cloud is largely through the use of multi-component applications, some of whose pieces are expected to scale under load, and all of which are expected to be replaced if they break.

Edge computing is about placing hosting resources close to the points of user activity.  The linkage of the hosting to what the user is doing implies that edge computing is tactical, extemporaneous, and temporary in nature.  In most cases, it’s likely that edge computing would be used as an adjunct to cloud computing or premises “distributed” or “private cloud” deployments that use a resource pool (containers come to mind).  An edge component might serve a single user or multiple users, depending on just what was happening in the area the component was serving at a given time.  It might have to scale, and it might also be unloaded to save resources if there was no need for it at a given moment.

What we call “orchestration” is the process of placing a software component in a hosting resource and setting up the necessary parameters and connections for that component to work.  NFV is an example of “simple orchestration”, largely because the goal of NFV orchestration is primarily to deploy (or redeploy in case of failure) and because the connection of VNFs is presumed to be a matter of linking through trunks, just like physical devices would be connected.  The cloud started with similarly simple deploy/redeploy thinking (NFV is OpenStack-centric, and OpenStack is also the basic model for virtual machine deployments in the cloud).

So, you may be thinking, it’s edge computing that’s changing orchestration.  Nope, it’s microservices.  A cloud is based on a featureless resource pool, so it shouldn’t matter whether you smear the pool out toward the edge.  What’s changing isn’t “edge computing”, it’s the way we componentize applications and services that are based on hosted features.  More dynamism demands more dynamic orchestration, and microservices are by nature more dynamic.

A microservice is a lightweight software component that performs a simple task on a simple request.  Unlike things like transactions, which often involve complex processing and many steps, microservices are presumed to have an input (the request, or event) and then to return a result that’s based on that input alone.  That means that a single microservice can support a whole community of users whose requests are intermingled, because the order in which they’re handled doesn’t matter to a microservice.  The context or state of the user’s job has to be maintained elsewhere, if it’s needed.
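
Here’s a toy illustration of that statelessness point: the microservice computes its answer from the request alone, and any per-user context lives in an external store.  The function and store are hypothetical, not any particular product’s API.

```python
# Requests from different users can be intermingled freely because the handler
# depends only on its input; context, if needed, is kept outside the microservice.

SESSION_STORE = {}  # stands in for an external state service (a database or cache)

def convert_currency(request):
    """Pure function of its input: same request, same answer, any order, any user."""
    rate = {"EUR": 1.08, "GBP": 1.27}[request["currency"]]
    return {"user": request["user"], "usd": round(request["amount"] * rate, 2)}

def record_context(user, result):
    """If the user's history matters to the service, it's maintained elsewhere."""
    SESSION_STORE.setdefault(user, []).append(result)

for req in [{"user": "a", "amount": 10, "currency": "EUR"},
            {"user": "b", "amount": 5, "currency": "GBP"},
            {"user": "a", "amount": 3, "currency": "GBP"}]:
    record_context(req["user"], convert_currency(req))

print(SESSION_STORE)
```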

So why are we hung up on edges here?  Three reasons.  First, almost nobody but software people understands microservices, so stories about them have limited marketing value and don’t sell ads easily.  Second, it is true that the easiest way to explain an application of a microservice is to say “events” or “IoT” because simple signals are indeed a perfect thing for microservices to process, and most events originate at the edge.  Third, if you are doing event processing, you may want to limit the latency associated with processing the event to keep from messing up the real-world activity your application is supposed to be supporting.  Push handling to the edge and you can do that.

The orchestration challenge of microservices, and the “container” architecture that’s most often used to host them, has gradually grown as the dynamism of our target microservice architectures has increased.  The original container systems, like Docker and the initial Kubernetes stuff, were very similar to OpenStack in terms of what they were expected to do for deployment.  Containers were a simpler environment, and that simplicity facilitated easier setup for deployments, but the steps were much the same.  Over time, containers have evolved, generally by adding elements to Kubernetes to create and expand what I’ve been calling the Kubernetes Ecosystem.

The problem with Kubernetes as an ecosystem is that it doesn’t orchestrate everything, even with the add-on tools that make it an ecosystem in the first place.  There are ways of accommodating things like bare metal or VMs, but one of the specific shortcomings of the framework is that it’s not a universal model-based approach.  What everyone (me included, or perhaps in particular) would like is a model-driven framework that includes intent-based features to model complex applications and services, and that can be used to represent hosting and connectivity in a fairly arbitrary way.  Such a framework, as the TMF’s work with NGOSS Contract proves, could also be used to provide state/event activation of processes for lifecycle automation.
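
To show what I mean by a model-driven, intent-based description, here’s a hypothetical sketch: a service is a tree of elements, each either decomposed further or carrying a recipe that some orchestrator can realize.  The structure and field names are my own assumptions, not Kubernetes or TOSCA syntax.

```python
# A service as a tree of intent elements; leaves carry hosting/connection recipes.

service_model = {
    "name": "business-vpn",
    "children": [
        {"name": "access-edge", "recipe": {"host": "edge-pool", "image": "vCPE"}},
        {"name": "core-transport", "recipe": {"connect": "mpls-core", "bandwidth_mbps": 100}},
        {"name": "security", "children": [
            {"name": "firewall", "recipe": {"host": "regional-cloud", "image": "vFW"}},
        ]},
    ],
}

def realize(element, depth=0):
    """Walk the intent tree; leaves hand their recipes to the appropriate orchestrator."""
    indent = "  " * depth
    if "recipe" in element:
        print(f"{indent}{element['name']}: dispatch {element['recipe']}")
    for child in element.get("children", []):
        realize(child, depth + 1)

realize(service_model)
```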

Modeling is pretty much what Cloudify has focused on.  Its architecture is based on TOSCA, which is what I think should be the modeling language for any cloud-hosted, component-based, service or application.  It appears to have some event-handling capability, though it’s not clear to me whether there’s a facile way of defining state/event tables within TOSCA models and using them to steer events to processes (that capability exists in TOSCA).  If that’s not present in Cloudify’s approach, then it’s not micro-orchestrating enough.

I can’t tell for sure whether it is or not.  When I blog, I rely on a company’s documentation and never on their representations, because there’s no proof in a statement, only a document.  Cloudify’s website is much like the websites of DriveNets or SnapRoute, two companies I blogged about last week.  The material is aimed almost totally at a developer rather than at a decision-maker.  If micro-orchestration is totally transformational, it’s going to take more than a developer to buy it, and someone will have to sell senior executives on the total range of technology impact that transformation implies.

Our biggest challenge in orchestration today is getting past the term itself into the details.  You can have little orchestras and big ones, general guidance or total lifecycle management, and yet we use the same word.  Everyone who talks orchestration, including Cloudify, should provide enough material to confidently convey the technology’s overall implications and the business impact buyers could expect.

I like Cloudify, but they need to look over their shoulder.  The Kubernetes ecosystem I talked about is advancing very rapidly, filling in the gaps we still have in total application and service lifecycle automation.  There’s a huge community behind it, and if Cloudify or any other player with a specific solution wants to keep up with the market, they’ll need to get ahead, because the ecosystem is way too large a mass to pass once you’ve gotten behind.

A Reality Check During MWC

What should network operators be focusing on today?  I know you think I’m going to say “5G” but I’m not, at least not in any emphatic or traditional sense.  The question isn’t what the network evolution is going to be, after all.  If you’re going to do wireless in five years, you’re going to do 5G in some form.  The real question is what would be transformative in terms of profits, and that’s what I’m going to try to answer.  To do that, I’ll focus on the 2020-2023 period.

The high-level question for operators is whether to focus on cost management or revenue growth in order to improve profits.  My modeling says that in our 2020-2023 period, the maximum impact we could expect from improvements in operations efficiency would be about 8 cents per current revenue dollar.  The maximum that could be expected from improvements in capex associated with the transformation from devices/appliances to hosted features is about 7 cents per current revenue dollar.  There are some cross-impacts between these, making the maximum combined savings about 12 cents per current revenue dollar.  This isn’t anything to sneeze at overall, but it’s not really transformational in the minds of many operators.

Let’s now look at the revenue side, but in order to talk about revenue potential in a practical way, we have to consider it in light of the industry’s competitive dynamic, and the realities of the current marketplace.  We’ll start with the latter, and express those realities in the form of a couple of axioms.

Axiom number one is that for any given service, the average revenue per user will fall over time.  Mobile users aren’t going to pay more for a service next year than they did this year, nor will users of other services.  Competition and the natural tendency of users to constrain usage to control spending see to that.

Axiom number two is that network operators will win new service competition with over-the-top players only where they have an advantage to leverage.  Where have operators been able to compete effectively with OTTs?  Nowhere other than in access services, where they have an advantage of having a lower ROI expectation than the OTTs.

What the first axiom means for network operators is that only new customers create new revenue for the current services.  Since we’re already almost fully penetrated in terms of human customers, the only hope of customer growth is to sell to non-humans, which of course is why we have such a craze for cellular-connected IoT.  The problem is that today’s applications for IoT are almost totally confined to “facilities” where local sensor attachment to a portal is the most cost-effective approach.

Cellular IoT advocates will point to things like autonomous vehicles as the solution, which presumably means giving a vehicle a separate mobile connection.  That’s possible, but don’t we already have the ability to link phones to vehicles?  We even have auto GPSs that will link to phones to pass traffic and other data.  I would argue that the average household, which today looks for family plans to cover kids’ phone services, is unlikely to embrace having to buy services for its cars unless there’s no alternative, which there is.

In any event, it’s not clear exactly what an autonomous vehicle needs a connection for.  We navigate using GPSs.  We can get traffic conditions via current GPSs, or supplement them with a GPS connection to our phone.  That handles the “strategic dimension” of self-driving.  The “tactical dimension” of stopping at lights or avoiding collisions is clearly something that needs on-vehicle sensors.

The net of this is that operators can’t look for “non-human” customers for revenue growth in traditional services, wireless or wireline.  That means they have to go for non-traditional services, and the problem there is that they lose their pride-of-place advantage over the OTTs because their access assets aren’t a value…or are they?

The asset the operators do have to leverage is the central offices and mobile sites, which combine to create the service edge.  “Edge computing” means something if you have an edge somewhere to compute in.  There are perhaps 14,000 sites in the US where operators could install equipment easily, and almost 50,000 where they could install some equipment.  The figures for the rest of the world are similar when you factor in population and GDP, but my numbers say that we have a total of 206,000 sites where network operators could install edge computing.  Nobody else has that kind of real estate to leverage.

So what services do operators leverage?  In the 2020-2023 period, the number one service opportunity is related to streaming video.  Video caching, ad caching, and ad insertion are critical services even today, and you can see that personalization of the material demands a form of what I’ve been calling “contextualization”.  If a video viewer has been surfing around for cars, you don’t want to show the viewer boat ads.  If the viewer is in a boat showroom, however….  Anyway, my model says that video and ad caching and personalization represent a potential for about 35 cents of revenue gain during our four-year period.

There is no way that IoT would match that, nor any way that normal growth in mobile services would match it.  In short, the best-by-far chance for network operators to improve profits is to address the video space.  Yes, there will be other opportunities that develop after 2023, even for IoT which could generate as much as 45 cents more of revenue from 2024 through 2030.  The problem is that they bloom rather late, and operators need profit gains in the near term.

Contextualization is also the on-ramp to augmented reality (AR), which is what I said in a previous blog.  That could be a direct driver of 5G, but it depends less on the network than it does on the intelligence that relates the user to the user’s environment, which is outside the network and more directly related to personalization.  IoT, at least in the form of location-related services, is something that AR could exploit, but you’d need a rich AR opportunity to drive wholesale deployment of cellular-connected sensors.

Remember all of this as you’re navigating the MWC hype on 5G.  We’re looking at the future as a technology race, and it’s really a profit race.  Nobody is going to spend much on 5G or anything else if it doesn’t generate a respectable ROI.  The future is what we’re able to pay for.

How Vulnerable are Networks to Induced Security Holes in Devices?

What are the risks of a network-device exploit?  The flap about Huawei has raised the question of a deliberate, even state-sponsored, built-in vulnerability, but of course network devices have the potential for a hack/exploit created by a failure in testing and quality control.  Cisco recently had to patch one in an SD-WAN product.  We hear about exploits regularly in the world of personal computers, particularly Windows systems.  Are networks at risk?  It’s hard to gather buyer data on this sort of thing, but here’s my summary of our current reality, starting with general comments on exploits and attacks and moving on to network devices.

To start with, a network device is like a computer system in that its functionality is made up of a combination of hardware and software.  Everyone I’ve talked with says that with potentially harmful exploits, meaning ones that can impact functionality or create a security breach, it’s ultimately software that does the exploiting, but hardware can open the door.  That means that the “risky” attacks would likely come by exploiting a software fault (as is the case with most Windows exploits) or by exploiting a hardware problem with a software-based attack (the Intel data protection exploit is an example).

The next question is how an attack could be launched.  With computer systems, applications are supposed to run on the platform, which means that applications themselves are usually part of the attack vector.  Sometimes the application is itself malware, and sometimes the application contains some vulnerability that allows malware to be planted; browsers or media players are examples.  In a smaller number of cases, there’s a vulnerability in the “platform software”, the operating system and middleware, that can be directly exploited.

All these mechanisms require one thing—access.  The Internet has been the greatest hole in security of our age, because it provides a means of access to launch an attack.  A system that’s not connected to anything is difficult to attack, and further, the benefits of attacking it are limited since it has no observable external behavior that an attacker could benefit from.

You can now summarize the options for a computer attack.  You get malware on the system, either by loading it on by subterfuge or by creating a “hole” that could be exploited to load it.  One popular example is the “buffer overflow” hole, where you send a long packet that overflows the data buffer and overwrites code.  In your packet, you put some code that then bootstraps the malware in, and you then cause that code to be run by causing the condition that executes the overwritten code.  You now have malware loaded and run, and that malware can do whatever internal platform controls (memory and storage protection tools) don’t prevent.

Another similar option, one that’s less flexible in terms of what the attacker gains, is to create a fault instead of creating a hole.  If you overwrite code with our buffer overflow and simply crash the software that’s been overwritten, you probably break the computer.  This would likely be done only if there were protections in the computer platform that essentially prevented “exploiting” a hole to steal data or do more subtle manipulations.

Let’s now contrast the computer situation with that of network devices.  A network device has the same hardware/software platform combination, but in most cases it’s not used to run external third-party applications.  That means that introducing malware onto the device is more difficult.  To do so, you’d have to somehow exploit a platform vulnerability, and that would mean either sending a particular combination or type of packet that created a fault that could be exploited, or using the device management system.

It’s much harder to find an exploitable problem with a network device because the device doesn’t run third-party software and isn’t generally available to hackers in a way that would let them play with possible exploit problems.  Buying a router is harder than buying a PC, and you’d actually probably have to buy a number of them and set up a live network to create an intrusion.  Most operators think that network device hacks are probably going to be undertaken not by lone black-hats, but rather by criminal organizations or state-sponsored entities.

If you had the resources, could you hack a router as easily as a PC?  Probably not, because the router is probably not designed with as many “hackable features” and probably doesn’t present as many places to “dig a hole” in.  However, it could be done if there were any errors in coding that could be exploited.  Or if there were hardware issues, including chip issues.

A chip or hardware defect that could be used to open a hole could be exploited in a router in much the same way it could be exploited in a computer, but with the greater difficulty in finding it and then gaining access to it that I already noted.  If we assume the “criminal enterprise or state-sponsored” source of the hacking, we could assume that the goal was either to disable the device on demand and so break the network it was in, or gain access to the management channel to perform more subtle manipulations that could include security violations like stealing packets.  Either would be possible, but difficult.

Unless you could introduce the hole, which is why there’s a concern in some quarters about Huawei.  NPR did a program (the transcript is available HERE) outlining the possible links between the company and the Chinese government and military.  Other links have been cited by other sources in the past.  The Chinese military have long been suspected of hacking US sites, including government sites.  If these connections are real, and if there is a desire in the Chinese government and/or military to exploit networks, could Huawei be building hardware that included holes waiting to be exploited?

The short answer is, in theory, “Yes”.  A proprietary network device, meaning one that’s supplied with a network operating system from the same vendor, could easily be designed to incorporate an exploitable hole.  To incorporate one in the software would be almost childishly simple; think of how so many Linux systems were exploited because the default userid (“admin”) and password (none) were allowed to remain valid.  To incorporate one in hardware would also be fairly easy, particularly if the device used custom semiconductors (including FPGAs) that were programmed by the device company itself.
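
To illustrate just how “childishly simple” a software hole can be, here’s a generic toy example of a login check with a leftover default credential.  It’s not code from any vendor’s product; it just shows how little it takes.

```python
# A deliberately simplistic login check with a hole: a default account that
# was never removed (or that was left in on purpose).

VALID_USERS = {"operator": "s3cure-pass"}

def login(userid, password):
    # The intended check...
    if VALID_USERS.get(userid) == password:
        return True
    # ...and the hole: the default userid with no password still works.
    if userid == "admin" and password == "":
        return True
    return False

print(login("operator", "s3cure-pass"))  # legitimate access
print(login("admin", ""))                # the hole
```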

A hole introduced in hardware, waiting to be activated with a seemingly innocent packet, could let a hacker gain control of a device.  It would then be easy to disable the device, and possible for the device to be used to gather information about the network overall.  It would be possible, but more difficult, to use the hole to inspect device traffic and “spy” on packet flows.

Could the hole be located with careful analysis?  Probably, but perhaps not easily.  The smarter a chip, the more difficult it would be to identify all its behaviors by examining it, particularly if “examining” meant exploring its behavior by running it.  If you asked for the program or logic description of a chip rather than trying to explore its behavior in operation, could you be sure you were getting the correct information?  Not if you presumed the equipment vendor was deliberately creating an exploitable hole.

The reason why 5G is such a hot button on the issue of the security of network devices is that when you introduce a new technology you introduce a lot of new devices.  Not only does that multiply the possibility of hacking one or more of them, it means that the devices could “cover for each other”, meaning that systemic goals like spying would be harder to detect because the extra traffic created would pass through other compromised devices.  It’s not that faster wireless offers more hacking opportunities, but that more new devices offer more opportunities to introduce and disguise a hole.

The truth is that any network vendor could build any or all of their devices with exploitable holes.  So could any chip vendor, as Intel proved with its accidental data protection error.  Every box we have out there, in every network, could be waiting for the command to turn itself into a zombie.  No traffic, no application, no secret, is completely safe.  We know that because we’ve seen the “accidental” holes already.  What can be done by accident could be done on purpose.

Nothing is ever certain or safe, of course.  I remember, from a high-school advanced lecture course in quantum theory, the thing they called the “tunnel effect”.  Imagine a marble on a track that had a hump in the middle.  You could calculate, based on the shape of the track and the material used for both track and marble, the force needed to push the marble fast enough to get over the hump.  Less force means (in Newtonian physics) the marble would never make it.  Quantum theory says that it has to make it with a very low probability, so the marble must be able to “quantum tunnel” through the hump.  This is sound theory, but very few quantum physicists would spend much time flicking marbles to try to prove it out—the probability is so low it would take longer than the universe is likely to last.  My point is that we can’t make networks safe, only make them safe enough for all practical purposes.

How safe would that be, though?  I do not believe that any testing or examination of any vendor’s hardware would be sufficient to absolutely assure there’s not an exploitable hole built in, deliberately introduced or otherwise.  With respect to a willful creation of an exploitable hole, I don’t think any vendor assurances that there were no such holes could be trusted unless the vendor was trusted.  If an exploit occurred, no monitoring or management process could be expected to detect it.  It comes down to trust.

I can’t assess the validity of the concerns expressed about the risk that the Huawei relationship with the Chinese government and military might pose, or whether that relationship even exists.  I have no comment on whether Huawei’s devices pose any threat to networks, and in particular on whether such a threat would be intentional.  I’m sure there are people here in the US and in other countries who believe that the US’s National Security Agency is planting exploits and spying on everyone, and I can’t assess whether that’s true either.  All I can say is that if you believe that you’re at risk of willfully introduced exploits with a given vendor or piece of equipment, I don’t believe you can test your way out of that belief.  That’s the view I express to anyone who asks me, client or otherwise.

Does the AT&T Deal with Mirantis Transform Enough?

AT&T has been a leader in what we could call transformation, and we recently had a number of stories about its adoption of Airship, via a deal with Mirantis.  This follows on an AT&T blog on the topic last May.  There are a lot of things about this deal to like, but there’s not been a whole lot of ink spent on describing what Airship does, or could do.  I want to try to address that here, because Airship is related to the broader issues of transformation and carrier cloud.

In a blog last week and one earlier this week, I talked about some technologies to virtualize networks by converting fixed devices into software instances, or by framing networking within a Kubernetes model (SnapRoute).  We can see virtual networking, in a sense, as a sliding scale from virtualizing network elements (DANOS, 6WIND virtual router) through virtualizing network elements and topology (DriveNets) to virtualizing network services (SnapRoute).  We can visualize Airship as most similar to the last of these.

Airship is a project of the OpenStack Foundation, aimed at providing a strong framework for lifecycle automation in container environments.  The container focus may seem ironic, given that one of the goals AT&T has for Airship is a replacement of the classic virtual-machine (VM) framework that OpenStack has been promoting from the first.  The project includes Kubernetes and Helm, and its goal is to provide what AT&T called an “under-cloud platform” (UCP) using Kubernetes on top of bare metal, and OpenStack Helm (OSH) on top of Kubernetes.  You can also make this configuration work on top of VMs instead of bare metal, but AT&T’s goal seems to be to go right to Kubernetes.

AT&T’s blog touts this as “delivering a unified, declarative and cloud-native way for operators to manage containerized software delivery of cloud infrastructure services”.  The “unified” part seems to reference the fact that Airship creates a single operations environment.  “Declarative” means that like Kubernetes overall, Airship defines a goal-state or end-state using parameters (YAML) and the software then works to achieve and maintain it.  “Cloud-native” is a bit harder to interpret, but I take it to refer to the fact that containers are the logical way to deploy microservices or other small logic pieces that cloud computing in native form would favor.
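
A toy illustration of what “declarative” means in practice: you state the goal, and a reconciliation loop works to achieve and maintain it.  This mimics the Kubernetes pattern in spirit only; the spec fields and functions are assumptions, not Airship’s actual document schema.

```python
# Declarative goal-state plus a simple reconciliation loop.

desired = {"service": "video-cache", "replicas": 3}
observed = {"service": "video-cache", "replicas": 1}

def reconcile(desired_state, observed_state):
    """Compare the declared goal to the actual state and issue corrective actions."""
    gap = desired_state["replicas"] - observed_state["replicas"]
    if gap > 0:
        print(f"scaling up {desired_state['service']} by {gap} instance(s)")
        observed_state["replicas"] += gap
    elif gap < 0:
        print(f"scaling down {desired_state['service']} by {-gap} instance(s)")
        observed_state["replicas"] += gap
    else:
        print("goal state achieved; nothing to do")

reconcile(desired, observed)   # converges toward the declared goal
reconcile(desired, observed)   # steady state
```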

In my view, though it’s not stated explicitly by AT&T, the goal here is to create a carrier cloud framework.  If all AT&T wanted to do was deploy traditional VNFs in service chaining applications, there’s no reason why VMs wouldn’t have worked.  As I pointed out last week and in an earlier blog this week, if you virtualize devices (physical network functions or PNFs in the NFV lexicon) you tend to create traditional networks by non-traditional means, and the application of carrier cloud to things beyond NFV isn’t about creating networks at all, but rather about creating experiences.  In that light, carrier cloud would treat traditional network services as a class of experience to be created on an experience-centric platform, not a translation of appliances (PNFs) into software (VNFs).

What AT&T is doing seems to center on getting Mirantis to join the ONF and Airship project, and then doing some code exchanges to flesh out Airship to be more in line with Mirantis’ Cloud Platform (MCP).  I’m showing the architecture in the figure below, from AT&T’s white paper on the topic.

The challenge for me here is interpreting exactly what mission Airship is supposed to be supporting.  You’ll note that ONAP (derived from another AT&T project, ECOMP) is noted as a “consumer” of the Airship platform, meaning it sits on top like applications.  That suggests to me that Airship is an abstraction of the cloud (OpenStack calls it part of the “Network Cloud” initiative).  Besides OSH (Helm) and Kubernetes, Airship includes nine other open-source components (some of which relate to document management), and so you could rightfully call it a form of “Kubernetes ecosystem” for the carrier cloud.

The figure above also shows “VNF” as a consumer of Airship, which to me suggests that AT&T is looking at containers as the mechanism for hosting VNFs, further leading to a focus on “carrier cloud” versus “NFV” in a specific sense.  As I noted in the two earlier blogs on white-box and virtual routing, there are some data-plane activities that probably require a customized platform rather than a general resource pool.  VNFs consume the same Nova/Neutron framework as before, meaning that they still use OpenStack, but the hosting maps to containers and Kubernetes.

Anything that moves operators toward carrier cloud instead of NFV Infrastructure is a good thing, but Airship has its issues.  My own concerns about Airship arise from its very conception as a way to map OpenStack to containers.  In the figure above, Kubernetes is a kind of parallel/orthogonal element to OpenStack’s Nova/Neutron.  This does in theory allow you to continue to use OpenStack to deploy VMs, but that’s not the stated goal AT&T has for Airship.  Why, then, use a VM-centric strategy to deploy containers when you’re using Kubernetes too?  Why even have ONAP in there for lifecycle management?

One clear problem here is the figure itself.  We saw with the NFV ISG’s E2E architecture block diagram, which they asserted was just to illustrate a functional relationship, how easy it is to let explanatory drawings become architectural models.  You don’t build cloud applications, services, or infrastructure from monolithic blocks, but it’s hard to depict functional relationships the way the cloud would implement them.  We thus fall into an illustrative diagram becoming a guideline for implementation, and implementation of that diagram breaks our original goal of cloud-native.

The easiest interpretation of the figure, in NFV terms, would be that Airship’s effect is to build a kind of NFV Virtual Infrastructure Manager (VIM).  This would abstract infrastructure for NFV using containers, but would also present an abstract hosting option for the broader mission of carrier cloud.  The problem with that is that Kubernetes is shown under ONAP, which means that ONAP is presumed to be “orchestrating” and providing “lifecycle automation”.  That would seem to diminish the role of Kubernetes in doing that, and that would be bad because the cloud community is making Kubernetes (more properly, the Kubernetes ecosystem) into the best overall framework for lifecycle automation.

I think the Airship concept, the “Network Cloud”, is a really good one, and it might well even be essential (in thought if not necessarily as a unique implementation) for carrier cloud to succeed.  However, I’m not sure that AT&T is realizing its potential fully.  What the AT&T vision surely does is to create a fairly concrete barrier between what could be called “hosting” and what could be called “service”.  In the sense that an intent model naturally cedes the SLA management of a given model-element to processes within that element, AT&T has made a logical choice.  In the sense that NFV from the first tried to sustain current device-centric management practices, AT&T seems to have made hosting into a virtual device.  But more could be done.

If AT&T or any operator really believes in transformation, they have to believe in selling experiences and not just connectivity.  There is still too much connectivity in their thinking, and that’s true even with Airship.  Where is the link between the Airship framework and true cloud experiences?  It should be easy to find, if Airship is really a complete solution.  So yes, AT&T has literally led the world in technologies to support operator transformation, but that doesn’t mean that they’ve provided the complete solution.  I’d like to see them go further.

Beyond Routers: Part Two of the Options Available

“White boxes in the network” sounds a bit like “Red Sails in the Sunset” (at least to my generation), but it’s not just about a nice appearance.  White-box switching and routing are critical for network operators, and increasingly so for enterprises.  I blogged about some initiatives in the space last week, and today I want to look at two more ambitious and cloud-centric developments, and see how they might play in the space.

In my last blog, I talked about two initiatives that could be considered variants on the software router or white-box router theme.  DANOS with P4 is an AT&T-sponsored approach to building a routing appliance.  6WIND’s virtual router is a hosted-router-software commercial offering that is often server-hosted but could be appliance-based.  Both these approaches fit a software router model more than what I’d call a virtual router model, in that both would likely be deployed in place of a physical device and would, like a physical router, stay where they were put unless they broke.

There are other approaches, of course.  One is, like DANOS, a white-box appliance approach, but unlike DANOS it aims at providing a kind of “white-box resource pool” on which virtual routers are deployed according to the service topology requirements.  The other focuses not just on virtual routers but on virtual cloud networks.

Because it’s a nice transition from the first part of this series, let’s do our white-box solution first.  DriveNets, a just-out-of-stealth startup, also has a cloud network solution, designed this time for white-box deployment and so less focused on the “virtual” network than the real one.  As I noted above, DriveNets is virtual in the sense that like the cloud it presumes a resource pool, this time of white boxes.  Instances of routers are then deployed on that pool.

The foundation of this approach is the DriveNets Network Operating System (DNOS), which is the software platform that combines with white boxes to create the network cloud resource pools.  DriveNets presumes these would be distributed in a hierarchy from edge to core, with deeper pools being larger than those at the edge.  Thus, toward the edge, the router instances would be more fixed in assignment, and would become more cloud-like as things go deeper.  I interpret their material to say that they are able to assign a hot-standby instance of routing processes for resiliency and quick failover.
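
As a sketch of the general hot-standby idea (my reading of the concept, not DriveNets’ documented mechanism), a standby routing instance shadows the active one and is promoted the moment the active instance fails a health check.

```python
# Generic active/standby failover, illustrative only.

import random

class RouterInstance:
    def __init__(self, name):
        self.name = name

    def health_check(self):
        # Stand-in for a real liveness probe; fails randomly to simulate a fault.
        return random.random() > 0.3

def run_with_standby(active, standby, cycles=5):
    for _ in range(cycles):
        if active.health_check():
            print(f"{active.name} forwarding traffic")
        else:
            print(f"{active.name} failed; promoting {standby.name}")
            active, standby = standby, active  # quick failover: roles swap

run_with_standby(RouterInstance("edge-router-A"), RouterInstance("edge-router-B"))
```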

DNOS can run either on bare metal or in a VM, so you could integrate it with a broader carrier cloud concept if VMs were compatible with your plans.  DriveNets makes a lot of container-friendly comments but there’s no detail on whether the implementation is actually based on containers, or whether container orchestration is used.

All this requires an element of coordination, which is provided by the DriveNets Network Orchestrator (DNOR).  This does the provisioning and lifecycle management for DNOS-equipped nodes, and plays the same overall role that a cloud orchestrator would play.  There’s little technical detail provided on DNOR, other than that DriveNets says it’s microservice-based.

The cloud-centric initiative is from SnapRoute, who has what they call the Cloud-Native Network Operating System, or CN-NOS.  CN-NOS is a “disaggregated white-box switch” tool, one that’s integrated into Kubernetes and microservices.  Thus, it’s really a kind of mixture of virtual networking, orchestration, containers, microservices, and the cloud.  CN-NOS is, most of all, about cloud virtual networking, a kind of networking where network connectivity and application deployment and lifecycle management are tightly coupled.

You read a lot about containers and continuous integration and delivery (CI/CD) in the SnapRoute material, which I think shows that the product is focused on making networking a part of the cloud rather than something glued on from the outside.  You still need some transport networking, of course, but you build dynamic virtual networks as part of building applications, and then follow that tie-in through the rest of traditional Application Lifecycle Management (ALM), CI/CD, and operational lifecycle management.

From an implementation perspective, SnapRoute is a load image for “disaggregated white-box switches” (to quote their term), one that includes a Linux distro, the specialized CN-NOS Kubernetes integration, and other tools for lifecycle monitoring and management.  Switches so equipped become container hosts for the virtual-switch functionality that is then deployed, using the switches as a resource pool.  Since network services are containerized, they can be updated/redeployed like any other container application, and the automation/orchestration for lifecycle management is handled by Kubernetes itself.

Which brings us to another term you see a lot in the SnapRoute material: “NetOps”, a term becoming popular as a description of the need to deploy virtual networks in an automated way following the same basic “declarative” principles that Kubernetes follows for container deployment.  Services are made very much equivalent to applications, in that you can deploy them and remove them explicitly, and you can link network services to applications to tie the connectivity to the governance/security requirements.
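
A hedged sketch of that NetOps linkage as I read it: the network service is declared the same way an application is, and the two are tied together so connectivity follows the application’s governance and security requirements.  All field names are illustrative only.

```python
# Linking a declared network service to a declared application.

app_spec = {
    "name": "billing-frontend",
    "containers": ["web", "api"],
    "governance": {"zone": "pci", "encrypt": True},
}

network_service_spec = {
    "name": "billing-frontend-net",
    "attach_to": "billing-frontend",   # explicit link to the application
    "segments": ["web-tier", "api-tier"],
}

def deploy(app, net):
    """Deploy the application and its network service together, honoring governance."""
    assert net["attach_to"] == app["name"], "network service must reference its application"
    policy = "encrypted, isolated" if app["governance"]["encrypt"] else "default"
    print(f"deploying {app['name']} with network {net['name']} ({policy})")
    for seg in net["segments"]:
        print(f"  creating segment {seg} in zone {app['governance']['zone']}")

deploy(app_spec, network_service_spec)
```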

You can see similarities in these last two examples of white-box/virtual networking, as well as similarities between these two models and the two I covered in my prior blog.  I think the best way to look at them is to say that DANOS and 6WIND are more “transport” or “network-independent” in their approach, while DriveNets and SnapRoute start to shift more to the virtual, and to a very cloud/container-centric model in the case of SnapRoute.

The challenge for prospective buyers here is that while everyone understands the basic principles of “router networking” and these apply as well to white-box and virtual router approaches, nobody really understands the virtual models nearly as well, and probably not even well enough to be confident.  The documentation for both DriveNets and SnapRoute is difficult to navigate; the former doesn’t provide enough information and the latter provides perhaps too container-Kubernetes-centric a picture, lacking a good overview of the architecture and how NetOps and applications would relate to each other.

The question that I think these two offerings pose is that of the relative importance of virtual networks versus “transport networks”.  In a cloud data center, in an enterprise using Kubernetes and containers, and for network operators and carrier cloud, I would submit that virtual networking is more important.  We can in theory overlay virtual connectivity on any set of transport infrastructure.  That was the MEF’s message (not particularly strongly stated) in its “Third Network” story, and it was true then and remains true now.  The problem of course is that a virtual network is whatever you make of it, and it’s thus hard to pin down.

One approach to cutting the cost of routers is white-box, of course.  Another is the subduction theory of dumbing down Level 3 by absorbing some of its features into an SDN-over-optical approach, something Ciena sometimes seems to be advocating.  Another is the “Third Network” overlay of SD-WAN and virtual networking on top of arbitrary transport.  The four products I’ve highlighted in the two blogs on the topic show a lot of this range of options, and more are likely to come along as the industry looks longer at the problem and starts to show a preference.

One thing I think is clear is that simple box-for-box replacement of routers is only a transitional approach.  The long-term model has to be more virtual in nature, which favors the two approaches I’ve talked about here.  But DANOS and P4 could in theory be used to build a white-box that’s not strictly a router at all, and of course 6WIND could add virtual-network-specific features to its own product.  There are any number of possible futures, and only time will tell which unfolds.  Remember, Cisco had a great quarter, so routers aren’t dead yet.

In Search of Router Alternatives, Part 1

Everyone accepts that network operators would love to eliminate proprietary network devices.  NFV was aimed at eliminating at least some of them, but using a cloud to host a data-plane function has its issues.  AT&T recognized this and decided to frame an open (later released as open-source) operating system to run in an appliance and perform Level 2 and 3 processing, with forwarding defined by an existing chip-control language called “P4”.  We also have commercial initiatives aimed at the “white box” virtual router opportunity, including one that just got a win in Europe.  What does this portend?  Let’s see.

I blogged about the dNOS and P4 approach last year.  In brief, what AT&T has done is create an embedded operating system (dNOS, now Linux Foundation DANOS), and a general architecture that includes the use of programmable chips, which is where P4 comes in.  The combination is aimed at creating open switches and routers that perform almost as well as their proprietary, custom-chipped, equivalents.

dNOS is an open framework for building disaggregated routers and switches.  As I noted, it’s been released open-source via the Linux Foundation (DANOS), but there’s no material available from that source at this point.  AT&T based dNOS largely on the Vyatta software router from Brocade, and dNOS/DANOS has native ability to handle the traditional IP protocol suite.  It also has facilities to be extended to incorporate semiconductor packet processors through P4.  P4 came out of an ACM article, but it’s also been adopted by the ONF.  Complicated, huh?

DANOS is a part of AT&T’s 5G cell-site white-box router strategy, large-scale testing for which was announced in December 2018.  This implementation doesn’t include P4, and I’m seeing a lot of interest in the DANOS concept from other operators (especially in the EU) but not a wave of commitment.  Part of that may be the slow start in the Linux Foundation project, which as I’ve noted doesn’t have anything on its website yet.  Part may also be that some operators may be reluctant to jump in on what was initially a competitor’s approach, and part that introducing something like DANOS is difficult unless you have a greenfield buildout like 5G.

On the commercial side, deep packet inspection vendor 6WIND has a virtual router offering that, perhaps ironically, is positioned as an alternative to the Vyatta stuff from now-broken-up Brocade.  It recently got a win in Europe where it runs on the same boxes that Vyatta used.

Operators had a lot of interest in Vyatta when it came out.  One Tier One talked extensively to me about it, and ran a fairly large-scale trial.  Brocade, riding a combination of Vyatta and a positioning that transformation was mostly a data center technology task, managed to get the largest gain in strategic influence within a 6-month period that my surveys had ever recorded (this was in the spring of 2013).  Brocade, of course, didn’t manage to get its act together in strategy, and subsequently got sold in pieces.

6WIND started with a deep packet inspection story, and in that same period was pushing that more than virtual routing.  Virtual routing turned out to be a better approach, and so 6WIND applied its packet inspection thinking to the forwarding mission, doing a rather nice job.  Their virtual router is designed to run on a commercial off-the-shelf server, and to take the place of a physical device.  That means that it’s generally expected that, absent a failure, the virtual routers would stay where you put them.  You can get source code on this, and roll your own implementation.

You can save a lot of money with a 6WIND virtual router, versus proprietary router hardware.  Some operators have told me the savings could easily be 70%, in fact.  The virtual routers will work in nearly any mission a real router could be used in, except the very high-traffic core devices.  Operators have already deployed the 6WIND solution; the company lists half-a-dozen successes, but obviously we’re not seeing a tectonic shift with 6WIND either, and since they’re not a network operator (AT&T obviously is) they don’t have a captive deployment to leverage.

These two examples illustrate two slightly different approaches to the “get-rid-of-routers” job.  DANOS and P4 offer an open-source model, which has the advantage of cheapness and the risk of having little or no real support.  6WIND has a commercial virtual router of the kind that operators have been pretty interested in for about 6 years.  DANOS is an optimized appliance operating system, while 6WIND’s approach is designed to work on open servers.  Is one strategy a clear winner?

To me, the issue with either approach is finding a realistic deployment scenario.  Nobody wants to stick a virtual router in the middle of a vendor-router network; you can imagine how much finger-pointing that would generate!  Every network failure that happened from the deployment to the end of the world would be blamed on that hypothetical single virtual device.  A more likely model would be a greenfield deployment.  The most likely greenfield place, of course, would be 5G, which of course is what AT&T had in mind for dNOS in the first place.

Greenfield deployments pose their own risks.  AT&T might be willing to bet their 5G edge on the DANOS successor to dNOS, but would other operators?  Would they be more likely to bet on something like 6WIND, which after all isn’t exactly a household name in networking?  I know the 6WIND people and have worked with them, but that’s not true of everybody.

To compare the approaches in more detail, I think you have to look at the concept of P4.  The DANOS model was designed to support packet-processing chips using a kind of stub interface and the P4 programming language to create the forwarding rules.  This approach, which is still based on open concepts, could result in some very useful capabilities.  Vendors like 6WIND, of course, can customize their own stuff to accommodate specialized semiconductor support too, so the question may come down to how fast it happens.  Part of that depends on how fast P4 happens.

Right now, DANOS and P4 are embodied almost totally in open consortium activities, each of which tends to move at the pace of a turtle.  Since we have at least two such activities involved in DANOS/P4, we have what’s analogous to a turtle race where your entrant is two turtles tied together.  That image doesn’t need to be expanded on very much to make the point that the process is far from lightning fast.  6WIND, on the other hand, has shown the ability to transform its mission and be quickly credible, but if they’re the rabbit in our race, they may be one that hasn’t yet accepted where the finish line is.  Do we need an architecture like DANOS/P4?  If so, then 6WIND either has to adopt the model or offer an alternative.

I like the virtual-router 6WIND approach for its simplicity of message—replace routers with virtual routers.  It’s easy to understand and to sell.  I like the open model of DANOS and P4: the reliance on open-source software, the vision of an open mechanism for incorporating packet processing chips, and the agility of a forwarding language to help navigate the evolution of network services.  I think both approaches have a challenge in meeting the varied expectations of today’s market.

These aren’t the only initiatives in transforming networking for the age of the cloud.  Next week I’ll talk about two others, and how they combine with these two to frame a lot of questions about how networks will look in ten, or even in five, years.

Discovering the Secret that Could Launch Real Mobile 5G

One of the things I learned in the past was that “necessary” and “sufficient” conditions were worlds apart.  Another thing I learned was that “pent-up demand” reflects the way realization of opportunity bunches up until it’s released.  It’s pretty clear that 5G supporters haven’t learned those lessons, as an article in Light Reading reflects.

A “necessary” condition for something to occur is a condition that must be satisfied, but which by itself does not guarantee the thing will happen.  A “sufficient” condition can be a sole driver.  One of the big problems that 5G has faced (and that frankly in this age of over-hype, practically all technologies face) is that 5G is not a sufficient condition for what we’d like it to do.

Do you need 5G for mobile broadband?  Clearly not; we have mobile broadband today without it.  But there’s another side to this particular truth, which is that even if you couldn’t do something without 5G, having 5G doesn’t guarantee it would get done.  The best example of that is virtual reality and augmented reality (VR/AR).

You can make a decent case that the bandwidth requirements for many VR/AR applications could not be satisfied for any reasonable number of users without 5G.  It’s not just a matter of per-user bandwidth; you would need fairly high bandwidth density per cell, which is really where 5G could shine.  But suppose we got 5G.  Does that automatically mean that we’d have augmented reality?  Clearly not.  It would be a “necessary” condition, but not “sufficient”.
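To make the “bandwidth density” point concrete, here’s a purely illustrative back-of-the-envelope calculation in Python.  The 25 Mbps per VR/AR stream and the sector capacities are assumptions chosen only to show orders of magnitude, not measured or promised figures.

# Assumed numbers only: roughly how many concurrent VR/AR streams could a
# single cell sector carry? The per-cell aggregate, not per-user speed, is
# what matters here.
def concurrent_streams(sector_capacity_mbps, per_stream_mbps=25):
    return sector_capacity_mbps // per_stream_mbps

print(concurrent_streams(300))    # assumed 4G-class sector: about 12 users
print(concurrent_streams(3000))   # assumed 5G-class sector: about 120 users

Whatever the exact numbers, the ratio is the point: the case for 5G in VR/AR rests on serving many simultaneous users per cell, and that alone still doesn’t create the applications.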

There are two barriers to achieving the things that 5G might be “necessary” for.  One is the availability of a marketable application that provides the experience.  The other is the cost of the capacity needed to fulfill it.  The latter is a primary impediment for many things that 5G “could do”, so let’s deal with it first.

How much would operators charge for 5G service?  The answer, at least in an enduring-market sense, is “nothing more, unless there’s significant willingness to pay.”  If you can play your video now, and can still play it with 5G service, you’re fine as you are, and anyone who says otherwise is dreaming.  That means it’s going to be difficult to get operators to offer a lot more per-user bandwidth, because they’d have to charge for it.  What does the user get for the money?  All the wonderful 5G applications (like VR/AR) that everyone built and had ready to go, just hoping that 5G would come along and users would buy it.  In other words, probably nothing.  That brings us to the other of our two barriers.

Suppose we were to postulate what kinds of mobile broadband applications would use a lot of bandwidth, the one thing that we could say 5G could theoretically provide that 4G doesn’t.  VR/AR is about the only thing that comes to mind, since video is already available and there are few other high-bandwidth applications that anyone really seems to be thinking about.  Let’s look at VR/AR.  There are surely VR/AR applications in gaming and entertainment, and we do have some examples of these, but these applications are far more likely to be run in the home or an area where WiFi is available.  Gaming does appear to be a convincing opportunity, but how much of it would be WiFi?  Most, I suspect.

It’s hard to see how a lot of really hot, uniquely 5G apps could come along when there’s no way to run them until 5G service is widely available.  It’s the classic “first-phone” problem; nobody buys the first phone because there’s nobody to call.  That means the applications and the service would have to stumble forward in fits and starts, hoping something created a justification.  That’s unrealistic, so what we can really expect is that 5G will roll out largely as an evolution to 4G aimed at improving overall cell capacity, not individual user speed.  When that happens, and when user speed could be upped if there were a demonstrable market, we could see different services and not just different cells.

Time now to talk about my second life-lesson, the fact that a pent-up market creates clumping where the constraint is found.  Let’s suppose that there are super applications of VR/AR out there that could easily consume 5G-scale bandwidth.  Doesn’t that make it likely that there are super applications of VR/AR out there that would still work, at least in a limited sense, on 4G?  We should have many such apps already, if they were convincingly valuable, and they should be clustered at the high end of current 4G network performance.  If that were true, nobody would doubt the need for and value of 5G, and we’d be in a real race instead of a hype contest.

The most credible and potentially ubiquitous VR/AR application that would require cellular broadband is the familiar AR one, where a user with special glasses sees the names of places and potentially even people, messages from them, and so forth overlaid on the real visual field.  This application has to keep the augmentation in phase with the real world while the user is likely moving, so it’s performance-critical.  The thing is, that’s also possible today with 4G.  Perhaps the application wouldn’t perform as well as it might with more bandwidth, but we could surely offer it.

This kind of AR application, as opposed to VR gaming, is critical for mobile 5G because it’s credibly a mobile application.  Gaming is something likely to be done in a fixed location, and while it’s possible that the location wouldn’t have WiFi, it’s not probable given the drive to deploy WiFi in almost all public facilities.  Further, if operators wanted to charge for 5G bandwidth based on gaming use, they would first miss a lot of the population, and second would, through the added cost, push public locations where gamers congregate to offer WiFi instead.  If there’s a mobile application to drive 5G, then, it’s augmented reality.

That means that what we’re seeing now with 5G mobile, which is the classic “Field of Dreams” model of a new service where operators “build it” and hope “they will come”, is the wrong approach.  What should be happening is that the stakeholders who want mobile 5G, like operators, handset vendors, and 5G network equipment vendors, should band together and foster applications for AR.  A solid AR app, in itself, could go a long way toward justifying 5G.

An AR application architecture could be the home run.  AR is an obvious example of what I’ve called “contextual” services, meaning that the service is more valuable the more it knows about what a user is trying to do.  In our example of the labels on the user’s visual field, you can see it would be easy to get the visual field so cluttered with labels that nobody could see where they were going.  On the other hand, if the application knew you were trying to find a specific product, a particular kind of restaurant, or a friend in a crowd, the labels could be kept manageable.
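Here’s a small, hypothetical sketch in Python of that contextual-filtering idea: the same pool of AR labels, pruned by what the user is currently trying to do.  All the names, categories, and numbers are invented for illustration; no real AR framework is implied.

# Contextual filtering: keep only labels relevant to the user's current goal,
# nearest first, and cap the count so the visual field stays readable.
from dataclasses import dataclass

@dataclass
class Label:
    name: str
    category: str   # e.g. "restaurant", "store", "person"
    distance_m: float

def visible_labels(labels, user_intent, max_labels=5):
    relevant = [l for l in labels if l.category == user_intent]
    return sorted(relevant, key=lambda l: l.distance_m)[:max_labels]

pool = [
    Label("Thai Palace", "restaurant", 120.0),
    Label("Corner Hardware", "store", 40.0),
    Label("Pizza Stop", "restaurant", 300.0),
    Label("Alex (friend)", "person", 75.0),
]
print(visible_labels(pool, "restaurant"))  # only the two restaurants, nearest first

The hard part of a real contextual architecture isn’t this filtering step, of course; it’s knowing the user’s intent in the first place, which is exactly why an agreed application framework matters.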

Even cellular IoT, another application that’s getting all the hype and none of the attention to reality, could be promoted by contextual services.  Where you are in relationship to other stuff is the essence of context.  Even Cisco said that in one of their recent blogs.  However, IoT-related location services are not a compelling application for mobile services, because we don’t have many of them now.  Yes, 5G is needed to fulfill the needs of a vast cellular-based IoT community, but we could get that started with 4G and we’re really not seeing that.

My suggestion is that everybody stop hyping 5G and start supporting it.  The way to do that is to frame an AR architecture, a contextual architecture.  Let’s set up a body, maybe an open-source project or maybe an industry group, and attack the real problem.  Like most technologies, 5G doesn’t need promotion, it needs justification.

Decoding Google’s Cloud Executive Announcement

Light Reading did a nice piece on Google and its new hire to head the telecom, media, and entertainment piece of Google Cloud.  John Honeycutt came from the TV side, the Discovery Channel specifically, so he’s much more a content guy than a cloud-service guy.  There, he was in charge of IT, media technology, production, and operations.  There are a lot of ways you could spin his background with regard to Google Cloud, so let’s spin a bit and see what makes sense.

Variety offers a bit of detail into the story, so let’s start there.  “Honeycutt will lead development and implementation of Google Cloud’s worldwide strategy, using his pay-TV background to focus on helping migrate traditional video distribution workflows from film studios, content creators and distributors to the cloud. In addition, Honeycutt will work closely with telcos, cable and satellite TV providers and will focus on driving cloud adoption by gaming and e-sports companies.”  This statement, I think, offers us a good jumping-off point for discussion, since Google doesn’t provide its own statement on his role.

Migrating video distribution workflows covers a lot of ground, but the “focus on driving cloud adoption by gaming and e-sports companies” seems a bit easier to parse.  Put another way, they believe that the Internet is going to be an expanding delivery vehicle for entertainment, and they want Google Cloud to be there to host the content distribution side.  In my view, that also means that they want Google Cloud to play in some way in streaming TV and content delivery, since that’s the current market opportunity and it would make little sense to ignore it.

Google, of course, is already a provider of streaming video services (YouTube) including live TV (YouTube TV).  They could easily be looking at themselves as the vehicle for delivering e-sports and gaming, meaning looking at enhancing their own offerings in that area and attracting content provider business.  That would make sense of Honeycutt’s background; who better to pitch Google delivery to channels like Discovery?

They could also be looking at becoming a provider of the tools to allow content companies to self-distribute.  CBS (“All Access”) already streams its own content for pay.  Amazon has some Channels deals with content owners.  Other content providers are surely looking at the squeeze in profits in the entertainment food chain and wondering if they could make more by cutting out the middleman and going their own distribution route.  Obviously few would want to build out the infrastructure, so hosting it would be the logical choice.

Finally, Google might see all the cable companies and telcos getting into streaming, and thus needing a mechanism to deliver content that way.  AT&T has already committed to DirecTV Now, but it’s lost significant market share recently on what most say are service quality problems, likely arising from infrastructure insufficiency.  Verizon has already been offering YouTube TV along with some of its 5G home broadband.  Could Google become a provider of content distribution on a resale basis, or the host of a kind of MVNO-like content relationship with the telcos and cablecos?

At one level the good news for Google is that a lot of the technology needed to fulfill any or all of these three missions would be very similar.  Google has been fairly open about the software it’s invented for the cloud (Kubernetes came from Google, for example), so if they follow the same pattern they wouldn’t risk much by offering a toolkit from which anyone, including themselves, could assemble and deliver entertainment.

At another level, Google may realize that if any of these three apples fall, they all will.  The forces acting on the online entertainment market jostle everyone in it.  For example, suppose that network operators and even some cable companies start to deploy 5G/FTTN millimeter-wave solutions for home broadband.  Those companies will immediately need a streaming TV strategy, which might induce them to either resell Google or assemble their own.  In either case, the new competition would compress margins and make content owners think about building their own distribution model.

Finally, Google may be making the savviest (and perhaps most risky) bet of all, which is that carrier cloud is likely to follow the implementation path of carrier video and advertising.  Video-related stuff is the overwhelming majority of early driver stimulus for carrier cloud data center growth.  If Google could snare telcos’ video and advertising delivery in some way, they would own the largest driver of new data center deployments at least to 2024 when my model says that other forces catch up.

Think about that one.  My model has said, since 2013, that carrier cloud would be the largest incremental source of new data centers, and the largest single source of data centers overall.  Imagine if you’re Google and you can get that business.  Forget Amazon and Microsoft; their clouds would be pygmies in comparison.  The model says that by 2030, carrier cloud could drive one hundred thousand incremental data center deployments, the great majority at the edge.

It’s not conclusive, but the fact that Honeycutt is in the Google Cloud area suggests that something other than a retail or even pure resale model is on Google’s mind.  Were Google to offer its own services, it would be less attractive as a partner to others because partners would be directly competing with Google.  I’m therefore tempted to bet on the notion that Google is looking to host third-party entertainment distribution in the Google cloud, in which case they may not be looking to expand their current YouTube stuff as much.

Google has most of the technology it needs to deploy streaming entertainment for partners; it just has to organize it into part of Google Cloud.  While they’re at it, they could consider the fact that the personalization/contextual-services driver and the IoT driver, which will catch up with video/advertising in 2024, would benefit from many of the same features.  Contextualization is a superset of personalization, clearly important for video/advertising, and contextualization is also a big part of IoT.

5G is a key point here too, not only because the telco/cableco space is pretty interested in it, but because a 5G/FTTN boom would accelerate the need for new content/experience delivery relationships.  For carrier cloud, 5G and mobile-related technology is a secondary driver, never more than in third place prior to 2024 and slipping slowly after that.  But it’s associated with a new-service build, and if operators get really serious about any flavor of 5G, the steady 15-18% contribution it could make to carrier cloud deployment could add up to something important.

Would a telco or cable company outsource “carrier cloud”?  I think most would say they would not, but when I chat with high-level internal planners, they’re not so sure.  The problem they cite with rolling their own carrier cloud is the incredible inertia associated with getting into something they have no experience with and no skill set for.  Could they hire people?  I harken back to my early career, when I worked with an eager systems manager who had good management skills but zero knowledge of technology.  He ran up to me excited one day and said “Tom!  I just hired a great software engineer!”  My question to him was “How did you recognize him?”

Then there’s NFV and the operator standards stuff.  The chance that Google would build carrier cloud, or even build the services that the six credible carrier cloud drivers require, based on ETSI NFV or ZTA is in my view zero.  If operators don’t build carrier cloud, but instead outsource the service features those drivers mandate, you can bet that the whole of the ETSI initiatives relating to transformation are out the window.

A final key point here is that Honeycutt is taking over an activity that could well decide the future of Google Cloud, and he’s neither a cloud guy nor a Google guy.  That’s not a bad thing, given that Google lags its competitors in the cloud and has little realistic chance of catching up by going head-to-head with them in the market segments they’ve already staked out.  However, getting the carrier cloud positioning just right is absolutely critical, and getting the technology right is at least as critical.  That’s going to take more than Honeycutt, I suspect.  Kurian, who heads Google Cloud, came from Oracle’s cloud business and probably has a good technical grasp of the cloud, but not necessarily of the carrier cloud (even carriers may not have that!).  Maybe Kurian and Honeycutt in tandem can work some magic.

Will Ciena’s New Blue Planet Division Really Help?

Ciena is now spinning out its Blue Planet orchestration business, a move first referenced in its earnings call and now reported by SDxCentral.  Ciena offered a blog of its own on the topic recently.  Both these sources are long on the “new division” piece, but short on details about the mission, the architectural model, and the sales strategy for Blue Planet.

When Ciena acquired Cyan and its Blue Planet platform, it wasn’t exactly clear how the company expected to leverage its new asset.  It hasn’t become any clearer since, and Ciena now thinks that having Blue Planet tied too closely to its core optical products is making sales more difficult.  As I pointed out early on, different people buy service automation tools than buy optical products.  However, it’s not certain that just breaking things out will make a real difference.

I had a lot of hope for the combination of Ciena and Cyan.  Optical players are probably the only ones with a sure place in future networks; everything else could in theory be “virtualized” in some way.  A new set of SDN layers above the optical layer, combined with strong orchestration capability, could absorb a lot of the features of Levels 2 and 3 in services.  They could be much more capital- and operations-efficient too.  Sneak up from the bottom and rule the world, in other words.

Ciena’s overall strategy of packet optics, blending electrical-layer behavior into the optical layer, seems to fit this model, and that may be the core of the problem.  Companies are not surprisingly entrenched in what they’ve done all along, so if you give them a story that includes that (remember, optical networks are the core of everything) and a bunch of new stuff that needs more sales effort, they hunker down on legacy.  Blue Planet turned into more of a captive than a bridgehead to a new age.

Having Blue Planet joined at the hip with optical stuff surely didn’t make sense.  Software products are totally different from network hardware, even if the products are “network software”.  The optical sales types have no understanding of service orchestration, since optical products are transport-layer entities.  As I’ve noted above, you don’t even sell them to the same people.  However, orchestration, or “service lifecycle automation” as I’ve termed it, could still integrate optical network behavior into service behavior more efficiently and thus benefit Ciena.  A separate Blue Planet division is potentially a solution to the original problem, but it brings, or at least exposes, four new risks.

First, you don’t create insight by dividing ignorance.  Ciena didn’t have people who could knowledgeably plan for and sell Blue Planet.  It wasn’t a matter of their being buried in the optical masses; they weren’t there.  They still aren’t there, new division or not.  Anyone at Cyan who really knew anything about lifecycle automation (and Cyan wasn’t, in my view, an enormously insightful service lifecycle automation outfit even as an independent company) never made their way to the surface in positioning terms.

This challenge was exacerbated by the addition of Packet Design and DonRiver to the Blue Planet ecosystem.  We start with an orchestration tool that Ciena didn’t really get.  We add in specialized Level 3 features that Ciena, as a transport vendor, arguably gets even less.  Combining these three product sets into a new division may integrate them organizationally, but not functionally.

Second, making Blue Planet a separate division doesn’t necessarily deal with the sales issues.  Who will sell the stuff, a new sales force?  I doubt it.  Will the new division, still joined by invisible silken ties of sales to the optical line, end up doing nothing more than trying to lever Blue Planet into those same optical deals?  I think that’s a real possibility.

The sales challenge really starts with a positioning challenge, which Ciena hasn’t addressed, period, not at the time of the initial Cyan deal nor with either of the two related acquisitions.  What, exactly, is this combined collage of tools supposed to add up to?  Is it full-bore service lifecycle automation?  Is it “resource-layer automation” presuming a higher-level service automation tool?  The role dictates the sales target, which in turn dictates the sales force’s knowledge, experience, and support collateral.

Challenge number three is a cohesive technical architecture.  Three legs are a necessary condition for a stool, but not a sufficient condition.  You need a seat, and you need them assembled.  Cyan’s Blue Planet mission was NFV, but Ciena has gradually de-emphasized that, given that NFV hasn’t set the world on fire.  Perhaps Packet Design and DonRiver (the latter in particular) are supporting elements in a new grand strategy, but what is that strategy?  It’s not NFV, but what?

I think Ciena intends, in a loose ad hoc sense, to offer a set of agile resource-layer automation facilities.  If that’s what they want, they need to define just what such a thing is, how it relates to broader service lifecycle automation, and how it relates to existing network equipment and perhaps to future software-hosted features.  NFV provided a framework for such a thing—imperfect in my view—with the Virtual Infrastructure Manager (VIM) and the VNF Manager (VNFM).  Does Ciena want to presume this structure, or does it propose to define a new one that can be fit to the NFV model, or even abandon that model totally?  Whatever they propose, they have to define it and how it links with the tools and missions that service management demands and that Ciena is not electing to support itself.
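To illustrate what a resource-layer abstraction point could look like, here is a minimal sketch in Python, assuming a simple intent-style interface where a service layer asks for an outcome and a per-domain driver decides how to realize it.  None of these classes reflect Blue Planet, ONAP, or the actual ETSI interfaces; every name here is hypothetical.

# Hypothetical resource-layer boundary: the service layer expresses intent,
# and each resource domain hides its own second-level orchestration.
from abc import ABC, abstractmethod

class ResourceDomain(ABC):
    @abstractmethod
    def deploy(self, function_name: str, constraints: dict) -> str:
        """Realize the requested function and return a handle to it."""

class ContainerDomain(ResourceDomain):
    def deploy(self, function_name, constraints):
        # A real driver would call a container orchestrator's API here.
        return f"container://{function_name}"

class BareMetalDomain(ResourceDomain):
    def deploy(self, function_name, constraints):
        return f"baremetal://{function_name}"

def service_layer_deploy(function_name, constraints, domains):
    # The service layer picks a domain; how the domain fulfills the request
    # is invisible above this boundary.
    domain = domains[constraints.get("domain", "containers")]
    return domain.deploy(function_name, constraints)

handle = service_layer_deploy(
    "virtual-firewall", {"domain": "containers"},
    {"containers": ContainerDomain(), "baremetal": BareMetalDomain()},
)
print(handle)  # container://virtual-firewall

Whether Ciena adopts something like this, the NFV VIM/VNFM split, or an entirely different structure, the point is that the boundary has to be defined explicitly before anyone can sell against it.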

The final problem is the cloud.  Whatever the details of the implementation of the concept might be, it seems pretty clear that virtual networking is the foundation of the cloud.  Not surprisingly, the cloud community is fleshing out what virtual networking means and how it’s implemented.  That same community is defining how distributed components operate in concert to create an experience, which is exactly what functional components of services have to do.  The work there is broad, compellingly good, and totally applicable to virtual service frameworks and components.  Ciena has done nothing to relate to it.

Cloud-native is everyone’s goal, and in theory Ciena could fit Blue Planet into a cloud-native ecosystem.  In practice, they’d have to address this point and the three previous ones at the same time to gain anything from that effort.  They need a mission and an architecture to fulfill it.  The mission has to make sense in the broad framework of transformation and operations efficiency.  The architecture has to blend cloud-native with service lifecycle automation, NFV, and the optical-layer stuff.  All of that has to be wrapped in a sellable story, marketed, and pushed in tandem through the sales force.

Can they do it?  Perhaps the best answer we can offer is that we have an industry full of vendors with product lines much more aligned with the service-layer story than Ciena’s, and none of them have been able to pull themselves out of the network-level mud.  Ciena is at the bottom of that mud, technologically speaking, and so has many more layers of it to pull itself through.

This is far from an impossible task, but you have to wonder why Ciena, if it recognized the need and the pathway to positioning success, didn’t take that path back when it was doing the Packet Design and DonRiver deals, or at least before or along with the new-division announcement.  I think it’s clear that Ciena would have to both realize the need for some critical positioning work for its new division and have some sense of how to carry that out.  They might have that now, of course, and the reorganization might be the start of a shift.

Or not.  I still think there’s at least an even chance that Ciena is positioning the whole division to be sold off.  With Blue Planet integrated into its other operations as a related product, a sale would be difficult because breaking out the assets would be complicated.  Make a separate division out of Blue Planet, and you can sell it off in a wink.  I think Wall Street would be unlikely to reward that move, given that Ciena has been acquiring companies to flesh out its Blue Planet offering and would be unlikely to reap full value were it to sell the division off.  Still, stranger things have happened, and as I’ve said, no network equipment vendor has made a success of software yet.  Ciena might be the first, or it might have already decided to throw in the towel.

There’s a third possibility, of course, which is that Ciena is trying to fix a non-organizational problem with an organizational change, and has no real intention of either selling off Blue Planet or making it into something useful.  That, to me, seems not only the most muddled scenario but also the least likely to succeed.  Does drawing a new organizational chart change things like product architecture or sales success?  Not that I can recall, and we can see from what’s happened with Blue Planet to date what would likely happen if Ciena stays the course.  Listening, Ciena?