Can Juniper Get Its Mojo Back?

Juniper is an interesting network equipment vendor, perhaps the only major pure-play IP and Ethernet vendor out there.  They’re also in the midst of a major revenue issue, as this Light Reading story shows.  There seem to be paths Juniper could take to get out of their current dilemma, but those paths have been open and unused for several years.  Will the company turn around, and if so, what might it have to do for that to happen?

I remember well the foundation of Juniper.  Their original story was that they offered a true standards-based router product to counter Cisco’s push toward proprietary technology that locked users in.  That was a good story, one that grew Juniper a lot and made them a worthy rival to Cisco.  I remember when Juniper was resisting entry into the data center switch space, despite warnings (including mine) that the space would eventually be critical for them.  I remember when Juniper had the best marketing campaign in the industry with cartoons contrasting them with Cisco.  In short, I remember a lot of good stuff, most of which has aged into irrelevance or has been lost.

It’s not that Juniper has lost key products, or failed to evolve them.  Their routing/switching technology is as good as any in the industry, and better in fact than most.  The problem is that they have always seen the future of networking as open-ended growth in the size and number of switches and routers.  One ex-Juniper friend described this as the “more and bigger” theme.  Last year, in fact (as Light Reading reminds us in the article), they were saying that, sure, operators were spending less, but 5G was going to ride in from a golden sunrise to save everyone.  They’re still listening for the hoofbeats, I guess.

For the last six years, operators have been honest with vendors.  I know because I’ve been there for many presentations, and they all showed revenue per bit falling, cost per bit falling more slowly, and an inevitable crossover if something wasn’t done.  Cost per bit means both capex and opex, and that set of charts (some of which were made public) should have told network equipment vendors that something dire was going to happen if they didn’t step up somehow.  They didn’t, it did, and Juniper is feeling the bite.

Juniper has never been a price leader (Huawei is the price leader).  Juniper has never been perceived as a feature leader either, because in my surveys of strategic influence exercised by network equipment vendors they’ve consistently lagged the field.  There are three primary reasons for this, in my view.  First, Juniper’s marketing has been insipid to say the least.  There are no funny anti-Cisco cartoons any longer, and in fact that was the heyday of Juniper marketing.  It’s been downhill since.  Second, every single acquisition Juniper has made has failed to deliver the kind of innovation boost Juniper needs.  Third, there is no company so dedicated to drinking its own bathwater as Juniper.

Juniper is bad at marketing in part because they developed a sales-and-engineering-centric culture a decade ago, led by executives in both areas who were relentless in promoting their visions.  Their marketing genius left (perhaps as a result of the uphill battle) and the company tilted decisively toward engineering and sales.  They did some great things, but they could never explain them to people who lived in the real world, which is what marketing does.  They still have that culture today.

When Rami Rahim, the CEO, talked on their earnings call, he talked about “go-to-market” improvements that were specifically in the sales organization, enterprise sales that were up only versus the same quarter last year but down sequentially, and an acquisition (Mist Systems, the WiFi management company) without an explanation of how it will somehow relaunch the company’s fortunes overall.  Not a word about marketing, and without a strong marketing program any sales organization is hamstrung.

That raises the point about acquisition failures.  They’ve made 22 acquisitions, and I’d venture to guess that few people in the industry could name more than one or two of them.  Of that list, there is only one that I think was a potential strategic game-changer, and that’s HTBASE.  Juniper, with that acquisition, became the only network vendor with a strong multi-and-hybrid-cloud fabric strategy.  Do they even know that?  I’ve not seen a single piece of strong collateral on the acquisition, and I see no mention of HTBASE or what it could bring to Juniper in their earnings call.

At one time, after one particular acquisition that I’m sure nobody remembers, the CEO of the acquired company was for a time the largest shareholder in Juniper.  That was obviously a big commitment, and yet if there was a result, I sure never saw it.  Like most of the other acquisitions, Juniper seemed to have done nothing more than throw a dart at a board with target company names on it.  There has to be a strategy, a world-view of the market, that drives not only M&A but also engineering and marketing.  I don’t see it with Juniper.

And that brings me to the last point.  All companies are inherently sub-cultures.  You go to work every day with the same people, you often socialize with them too.  Over time, you develop your own view of the world shaped (as it always is) by those around you—who all happen to have the Juniper company view.  More and bigger.

So what does Juniper do to get out of this?  The long-term answer is to get that long-term strategic vision and use it to address the three specific issues I’ve named here.  The problem with that is that long-term doesn’t fit in today’s quarter-at-a-time Wall Street world, so we need a more practical solution.

Which, in my view, has to revolve around HTBASE.  Nokia has Nuage, a virtual-network approach that’s extended to SD-WAN, but they’ve never marketed it well and probably won’t do any better in the near term.  They also just turned in a bad quarter.  Coincidence?  Anyway, HTBASE is strategically more tied to the cloud and thus more useful than Nuage, and Juniper still has a chance to do something smart with it before it dies the sad death of the 20-odd other acquisitions.

The difference between HTBASE and a “virtual network” is that HTBASE is really about component/service portability in a cloud that can extend over data centers and public clouds alike.  It could offer true resource independence, resiliency, and scalability.  It really could turn a global network or global cloud into one enormous virtual host.  Juniper, by creating a preferred-but-not-exclusive link between its data center and WAN technology and HTBASE, could take an enormous step toward building what a cloud network would have to be.

This potential, or at least the potential to define a cloud network, isn’t lost on other players, though.  Google’s Istio is also a contending technology, and because the CNCF is a better marketing machine than Juniper is, the fact that HTBASE currently leads on features and capabilities could come to mean less fairly quickly, and nothing at all eventually.  Not to mention that there are a lot of people ardently supporting enhancements to the Kubernetes ecosystem that Istio is a part of.  Not to mention that Juniper will have to support Istio too.

Network companies have a really hard time with virtualization in any form.  Nokia proved that with Nuage, and Juniper isn’t showing us a different level of vision so far with HTBASE.  That’s too bad because it’s already clear that in enterprise networking and cloud computing, virtual networks are essential because you need “logical connectivity” that’s independent of the current physical linkage between an IP address and a location in the network.  You need to float components around as needed, keeping their connectivity intact.  You need to load-balance and optimize location based on the location of older instances and linked components.  You need a cohesive cloud-centric vision of virtual networking, and HTBASE could offer that, while Juniper could link it all to traditional IP and Ethernet.  “Could”, of course, is not the same as “will.”

I have a lot of friends at Juniper, and I’d like to see them turn themselves around, but the key to that is in the phrase I just used.  They have to turn themselves around, not hope the market and 5G will turn them, or that service providers will forego profits to buy more routers, or that buying companies is a strategy in itself.  They have to shed the Juniper past and discover a new Juniper future, and I don’t know if Rahim or Juniper overall are going to be able to do that.

Truth or Myth: A Game We Can Play in Transformation

Ever hear of the game “Truth or Lie?”  We’ll soften the concept a bit by calling this “Truth or Myth” with a focus on transformation for network operators, but the essential idea is the same.  There are stories about what’s needed for transformation that are myth, and other stories not being told despite the fact that they’re true.  So let’s get started.

Truth or myth: standard APIs are critical to transformation.  APIs are link points between software elements, and organizations ranging from ETSI to the Linux Foundation and ONAP have focused on defining APIs as a pathway to ensuring interoperability among components and the creation of an open framework for the software.  Is this as essential as we seem to think?

The problem with standard APIs in lifecycle management software is that lifecycle management software shouldn’t be built as a series of connected components in the first place.  Management is inherently an event-driven process.  When we talk about connected components, we’re talking about a software application that has a linear workflow, where information moves in an orderly way from component to component.  This is true of transaction processing systems, but it can’t be true of operations management systems.

Imagine that we get a command from a service order system to spin up a service under a given order number.  That might sound like a transaction, but consider this:  suppose the service is already in the process of being spun up.  That’s an error, so you need to check to see if the state of the service matches the command, right?  OK, but can’t we still do this as a linked workflow?  No, because as we start to spin up the service, we’re going to start deploying and connecting pieces, and each of these tasks could complete properly or end in an error.

How does our linear system handle the fact that while one command (our spin-up order command) is moving through the system, other signals reporting the completion of earlier steps are still arriving?  How do we even know when all the pieces are fully operational and the service can go live?  In linear systems, we’d always be testing the status of other pieces of the order, and that means the software would have to know what and where those pieces were.  That makes the software specific to the service structure, and we can’t have that.
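To make the contrast concrete, here’s a minimal sketch of the event-driven alternative: a state/event table drives generic handlers, so the software never has to know the shape of a particular service.  The states, events, and handlers are invented for illustration, not drawn from any specification.

```python
# Minimal sketch of event-driven lifecycle handling: a state/event table
# drives generic handlers rather than a linear workflow.  States and events
# here are hypothetical.

from enum import Enum, auto

class State(Enum):
    ORDERED = auto()
    DEPLOYING = auto()
    ACTIVE = auto()
    FAULT = auto()

class Event(Enum):
    ACTIVATE = auto()        # command from the service order system
    PIECE_READY = auto()     # a deployed piece reports it is operational
    PIECE_FAILED = auto()    # a deployed piece reports an error

def on_activate(svc, piece=None):
    svc["state"] = State.DEPLOYING   # start deploying; completions arrive later as events

def on_piece_ready(svc, piece=None):
    piece["ready"] = True
    if all(p["ready"] for p in svc["pieces"]):
        svc["state"] = State.ACTIVE  # go live only when every piece has reported in

def on_piece_failed(svc, piece=None):
    svc["state"] = State.FAULT

def reject(svc, piece=None):
    raise RuntimeError(f"event not valid in state {svc['state']}")

# The table, not the handler code, knows what is legal in each state.
TABLE = {
    (State.ORDERED,   Event.ACTIVATE):     on_activate,
    (State.DEPLOYING, Event.ACTIVATE):     reject,          # the spin-up-twice error
    (State.DEPLOYING, Event.PIECE_READY):  on_piece_ready,
    (State.DEPLOYING, Event.PIECE_FAILED): on_piece_failed,
}

def handle(svc, event, piece=None):
    TABLE.get((svc["state"], event), reject)(svc, piece)

svc = {"state": State.ORDERED, "pieces": [{"ready": False}, {"ready": False}]}
handle(svc, Event.ACTIVATE)
handle(svc, Event.PIECE_READY, svc["pieces"][0])
handle(svc, Event.PIECE_READY, svc["pieces"][1])
print(svc["state"])   # State.ACTIVE
```

Nothing in that sketch cares where a piece lives or how many pieces there are, which is exactly what a linear workflow can’t avoid caring about.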

Verdict: Myth.

OK, how about this one.  Truth or Myth: service components like VNFs need a standard descriptor to hold their parameters because otherwise they aren’t interchangeable.  In NFV, we have a VNF Descriptor that describes the deployment of a VNF and defines the structure of the information needed to parameterize it.  We’re hearing all the time that one big problem with VNFs today is that they don’t have the same parameter needs, and so every onboarding of a VNF is an exercise in transforming its parameters to the VNFD.  Is that really a problem?

Well, look at the cloud and containers and the notion of “GitOps” for the answer.  In container deployments in the cloud, a software component image is packaged with the associated parameters, which are stored in a repository.  A “unit of deployment” is the component plus its parameters, and that’s deployed as a unit by the container system (Kubernetes, usually).  The needed information is simply pulled from the repository, which is now the “single source of truth” on the component in detail and on the service overall.

The only thing we have to deal with in supporting multiple implementations of a given service function like a VNF is the notion of an intent model that harmonizes its external parameters and features.  In other words, we have to make all the things look the same from the outside in, not make them the same in all their parameterizations.  To do this, you don’t need (and in fact can’t use) a descriptor, you need a class-and-inheritance model.  All firewalls are members of the “Firewall” class, and all must harmonize to the same visible parameter set.  But is that a burden?  Nope; we transform one data model to another in programming all the time.  Descriptors are artifacts of “old-think”.
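As a sketch of that class-and-inheritance idea (the vendor names and parameters are made up, and this isn’t any real VNF interface), every firewall implementation presents the same harmonized external parameters and privately transforms them into its own native form:

```python
# Sketch of the class-and-inheritance idea: all members of the "Firewall"
# class harmonize to one externally visible parameter set, however their
# native parameters are named.  Names here are hypothetical.

from abc import ABC, abstractmethod

class Firewall(ABC):
    """The intent class: one visible parameter set for every member."""
    @abstractmethod
    def apply(self, allow_rules, deny_rules, log_level): ...

class VendorAFirewall(Firewall):
    def apply(self, allow_rules, deny_rules, log_level):
        # transform the harmonized parameters into Vendor A's native model
        native = {"permit": allow_rules, "drop": deny_rules, "verbosity": log_level}
        self._push(native)

    def _push(self, native):
        print("configuring Vendor A with", native)

class VendorBFirewall(Firewall):
    def apply(self, allow_rules, deny_rules, log_level):
        # Vendor B wants one combined rule list; transform, don't standardize
        native = {"rules": [("allow", r) for r in allow_rules] +
                           [("deny", r) for r in deny_rules],
                  "logging": log_level}
        self._push(native)

    def _push(self, native):
        print("configuring Vendor B with", native)

# The service model only ever sees the Firewall class.
for fw in (VendorAFirewall(), VendorBFirewall()):
    fw.apply(allow_rules=["10.0.0.0/8"], deny_rules=["0.0.0.0/0"], log_level="warn")
```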

Verdict: Myth.

Next, Truth or Myth: Network services based in software can’t be based on cloud principles alone, so we need to define new standards for them.  The original mandate of the NFV ISG was to identify specifications that would serve to deploy and manage service components, not write new ones.  Yet we’ve spent five years or more writing specifications.  Obviously that meant that there were no standards that could serve to deploy and manage service components, right?

A deployed component, in the sense of a VNF, is simply a piece of software.  The cloud standards for deploying it are the same as for any other piece of software.  From the first, back in 2013 when the work of the NFV ISG started, there were multiple standards that could have defined the deployment process, and in fact the ISG’s work has always been based on OpenStack at the “bottom” of the mapping of resources to VNFs.  Since then, advances in the cloud have continued and now offer us an even better approach to deployment.  But doesn’t that mean that whatever we did back in 2013 would have been obsolete today had we gravitated totally to a cloud model?

No, it doesn’t.  Generally, cloud evolution is optional and not mandatory, and people pick a new concept because it offers overall benefits to offset any migration efforts needed.  If you don’t like the trade, you don’t change your approach.  The fact that the cloud is evolving means that people do like what’s happening, and operators would have liked it too, had they done the right thing up front.

Verdict: Myth.

OK, no truths here?  Try this: Truth or Myth:  NFV is not living up to its promise.  NFV will totally transform networks, except that it isn’t happening.  Is the problem of NFV simply a startup problem, a case of working through other issues to refine the approach before the inevitable explosion of adoption, or is there a fundamental problem?

This one is Truth, and a big part of the reason is the fact that the operators, or at least the CTO organizations where standards work is done, accepted the other myths as truth.  Another big part is that from the very first, NFV was about reducing capex by substituting commercial server hosting of VNFs for custom network appliances.  That simply didn’t save enough money, and furthermore the cost (the opex) associated with deployment and lifecycle management of virtual functions more than offset the savings on equipment.

NFV is never going to be what was planned, period.  It targets stuff like virtual CPE where the business case is minimal for the services that are appropriate, and non-existent for most services.  It excludes the operations issues that alone could have justified NFV.  Even doing NFV right wouldn’t have produced the sweeping benefits many had hoped for, but it would have prepared operators for the right story, which was (and is) carrier cloud.

The sad thing here is that the zero-touch automation stuff, including ETSI’s zero-touch work and ONAP, has gone exactly the same place as NFV.  We’re uniting things like ONAP and OSM in an effort to coordinate standards and open-source, but neither ONAP nor OSM is the right approach to start with.  Two wrongs make a greater wrong, not a right, and that’s truth, the ultimate one.

Taking Another Look at the US Carriers and Broadband Services

Verizon and AT&T have long marched to different strategic drummers, and nowhere is that as obvious as in their streaming video strategy.  AT&T launched its DirecTV Now as a mobile and wireline live TV service, supplementing its DirecTV satellite acquisition, and Verizon stayed with linear FiOS TV.  Now, Verizon has done a deal with (gasp!) Google’s YouTube TV for streaming services, not only to mobile customers but also to FiOS customers.

Both companies have also reported earnings this week, and while Verizon delivered a modest beat, AT&T missed slightly on revenue, which the Street attributes to their need to absorb and strategize around their Time Warner deal.  It’s interesting to note that the Street values both companies primarily on wireless business, interesting because Verizon and AT&T seem to have different strategies to address the challenge of sustaining wireless growth.

I think some of the most relevant numbers hidden in the quarterly reports are the subscriber losses in TV.  AT&T and Verizon both lost wireline TV viewers in the quarter, and it seems virtually certain that this loss is due to a gradual shift away from linear TV consumption toward streaming.  That shift applies not only to an increase in mobile viewing, but also to a “cord-cutting” trend seen across the whole cable TV industry.  The message there is that live TV, once seen as the enduring profit source for wireline services, isn’t going to be able to deliver.

In past blogs, I’ve noted that one big difference between AT&T and Verizon is the economic value of their base regions.  I’ve used the metric “demand density”, which is an adjusted measure of revenue opportunity for telecom services per square mile, as an indicator of whether ordinary communications services can pay off.  Verizon’s demand density is about seven times that of AT&T, which means that Verizon’s strategic planners can look forward to at least the possibility of profits in baseline mobile and wireline broadband.  AT&T has a much harder row to hoe, as they said in my youth, because they have to deploy more infrastructure (at higher capex and opex) to reap the same revenue opportunity.

For AT&T, this means two things.  First, you have to look somewhere other than return on infrastructure for profits.  That’s why the TW deal was so important to them.  In a sense, they’re mimicking rival Comcast, who acquired NBCU to get content revenue to supplement their own limited return on infrastructure.  Second, you have to pull down your cost per bit sharply, because you can’t compete in the content production space if your content has to subsidize a loss in broadband.

Verizon has demonstrated with the YouTube TV deal that it’s not going to try (at least for now, and probably never) to get into the streaming business itself.  The big question is whether that decision is a reckless bet on the enduring value of their demand density advantage.  Can Verizon, without draconian steps to cut capex and opex, stabilize their cost per bit at a level that at least guarantees a break-even with revenue per bit?  If they cannot, then they may regret not being more aggressive in their own capex/opex measures, and they may find content company opportunities either gone or priced too high by the time they decide they need one.

That’s not their biggest risk, though.  A smart strategy for streaming live TV is more than a new revenue opportunity, it’s the strongest early driver for carrier cloud.  Verizon, by adopting YouTube TV as its streaming strategy, is essentially walking away from the opportunity to build an early carrier cloud platform that could then be expanded to serve the broader drivers coming down the line.

They may be heartened by AT&T’s missteps with DirecTV Now.  This service, which was at launch one of the better ones, had many reported problems with service quality and seemed unable to introduce basic new features (like cloud DVR) in a smart and timely way.  As a result, a service that for at least a quarter or two led the streaming market in growth has now fallen behind.  Verizon’s YouTube TV choice is far better than DirecTV Now, particularly outside of AT&T’s home region.

That reopens the question of whether telcos can even build their own cloud infrastructure.  Google’s YouTube TV is good because of the software and the consumer-friendly design.  It’s built on a solid cloud-native platform, right at the leading edge of the space.  I don’t think that AT&T or any other telco could draw the talent needed to duplicate it, which raises the question of whether AT&T, in doing something that probably won’t work before rival Verizon tries it, is in the lead or behind.

Verizon may in the end have to follow AT&T into a strong push for carrier cloud, but might their hedge against this future risk be simply doing a deeper cloud deal with Google?  Could Google host an entire network operator’s carrier cloud, or even multiple operators?  Maybe.

If we look at the six drivers of carrier cloud (NFV, video/advertising streaming, mobile infrastructure and 5G, carrier-provided third-party cloud computing, contextual/personalization services, and IoT), two of the six are difficult to outsource, but all the rest could be.  NFV today is all about vCPE, which means that it’s probably mostly an opportunity for hosting agile features on uCPE.  Other NFV stuff is largely business-targeted and thus could be handled with modest edge computing.  5G and mobile infrastructure is, at least in my view, similarly divisible into a white-box and edge computing deployment.  Thus, neither of these is necessarily a barrier to third-party hosting of carrier cloud.

Google has its eye on the telco space, as a number of recent announcements have shown.  So do Amazon, IBM, and even Microsoft.  What may be interesting here is that YouTube TV might give Google an inside track.  If a streaming service is essential for operators, and if deploying their own service on their own cloud isn’t viable, then Google has an alternative for them.

Into this brew we must also stir the issue of the 5G/FTTN hybrid.  If millimeter wave proves out as an alternative to fiber to the home, it introduces a wireline future where linear TV delivery isn’t possible.  Every operator then has to ask whether they’ll bundle a third-party service like YouTube TV, roll their own streaming service, or simply be a fat pipe provider.  The millimeter-wave model for home broadband isn’t universal, though.  In fact, it’s most useful where demand density is high, and that means both that Verizon is likely to deploy it and that competitors like T-Mobile are likely to concentrate more on Verizon’s territory.

This means that demand density both enables Verizon and threatens it, and that streaming for Verizon is likely a given.  AT&T, with lower demand density, will find that many of its TV customers have to stay with DirecTV satellite because rural and thin suburban locations won’t benefit as much from the hybrid.  But AT&T can credibly look anywhere in the US for customers for TV (and does) while Verizon is almost surely going to offer YouTube TV integration only to its own customers, leaving Google to market to the rest.  This puts more pressure on AT&T to get DirecTV Now to work for it.

A final question all this raises is whether even streaming services can be viable in the long run.  Apple and Amazon think that being effectively a TV aggregator and delivering stuff directly from the networks or content owners is the best approach.  Some networks think they can go it alone, and it’s probable over time that content owners’ desire for profit growth will put the squeeze on all forms of content delivery services.  If CBS All Access can work for CBS, will more providers start to offer their streaming directly, and will this a la carte offering set finally break the mold of channelized, what’s-on-now TV forever?  It might happen.

Public Cloud Feature Evolution and the Carrier Cloud

In past blogs I’ve noted that Amazon’s biggest cloud customers weren’t enterprises at all.  We now know, thanks to a story on CNBC, that Apple spends over $30 million per month on AWS, and they might not even be the biggest of the non-enterprise customers.  This reflects the underlying opportunity and technology turmoil facing public cloud computing, and that same turmoil may impact enterprise cloud computing and even application development.  Most of all, it will influence carrier cloud planners as they figure out their market and feature targets.

I’ve noted before that social media and other tech users are the major customers for Amazon, and they’re nearly as large a percentage of Google’s cloud customers.  Microsoft Azure, IBM, and Oracle are more enterprise-centric in their cloud base, and as a result these three have an advantage if the cloud market shifts more toward enterprise hybrid cloud applications.  However, there is some technical overlap between the two market segments, and Apple actually fits almost exactly in that overlap.

Most hybrid cloud applications deployed by enterprises consist of three pieces—a web and mobile application GUI, a cloud front-end, and the data center transactional back-end processes.  Because enterprises are almost totally transactional in their application use, this trifecta makes the cloud front-end piece an intermediary between the user (represented by browser/app technology) and the transaction processing.
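A minimal sketch of that three-piece pattern, with an invented catalog lookup and a stub standing in for the data center system of record, might look like this:

```python
# Toy rendering of the hybrid pattern: GUI -> cloud front-end -> data center
# back-end.  The endpoints, cache, and catalog entries are illustrative.

def back_end_transaction(txn: dict) -> dict:
    """Stand-in for the data center's transactional system of record."""
    return {"status": "committed", "txn": txn}

_cache = {}   # the front-end can absorb read traffic the back-end never sees

def front_end_handler(user_request: dict) -> dict:
    """Cloud front-end: absorb reads, pass state changes through as transactions."""
    if user_request["type"] == "lookup":
        key = user_request["item"]
        if key not in _cache:
            _cache[key] = {"item": key, "price": 9.99}   # stand-in catalog read
        return _cache[key]
    return back_end_transaction(user_request)

# The web/mobile GUI piece would call something like:
print(front_end_handler({"type": "lookup", "item": "widget"}))
print(front_end_handler({"type": "order", "item": "widget", "qty": 2}))
```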

Social media applications, as I’ve noted and has been widely discussed elsewhere, are really event processors.  This means that, compared with enterprise apps, they have little or no back-end element and thus tend to look like a highly developed front-end-and-GUI.  IoT, widely viewed as the event-driven king of the hill, is far less important as an event source than social media, and I believe that a lot of the enterprise IoT offerings we see from cloud providers are really a repositioning of the assets they developed to support the social media space.

Where Apple fits into this is important.  Apple’s cloud use is a bit of social media and a bit of enterprise front-end.  Their online store stuff is very similar to an enterprise retail (or wholesale or customer care) front-end.  Their media delivery is more like social media.  In short, Apple seems to match a binding point between the social media and enterprise models of cloud usage.

Another important point about Apple is that they’re reliant on other companies’ cloud services rather than rolling their own.  The story says that over five years, Apple is contracted to use about a billion and a half dollars of Amazon’s cloud services (and so my sources say, a bit less than a third of that from Google in addition).  Three hundred sixty million per year would be a nice piece of change for any cloud provider, including carriers.  Could carriers then hope to reap some Apple, or Apple-like, business?

It’s hard to get detail on what Apple or any other cloud customer consumes in the way of services.  For one thing, the nature of cloud use by practically everyone is shifting as the underlying application model shifts.  We started off with virtual-machine hosting (infrastructure as a service or IaaS) and we’re evolving more to some form of container hosting.  Bigger customers are said to be hosting containers on cloud IaaS directly, while smaller ones are interested in managed container services.  There are two reasons for that.  One is that big customers probably have the technical skill to do all the container orchestration and lifecycle management themselves, and the other is that big customers want hosting economy above all else, which means they tend to use “raw resource” services like virtual machines.

The issue with application-model evolution is significant for a lot of reasons.  When the cloud was new, applications for the cloud were written for the data center, and so cloud-unique features couldn’t have been exploited even had they been offered.  Now the cloud is out there, and so are cloud features that were designed to exploit the cloud’s unique properties.  Many, as I’ve already noted, were recast from social-media features (serverless was “invented” by Twitter).  New applications could therefore be written to depend on these features, so the more new stuff is written the more benefits the cloud can bring.

This is both a blessing and a curse for network operators looking at cloud services as a justification (even in part) for their carrier cloud deployments.  The good news is that we have a trillion-dollar-per-year cloud spending upside globally.  The bad news is that what’s needed to reap the golden harvest is all the cloud features that the current public cloud leaders have been developing and marketing for years.  The pie was smaller five years ago, but the apples for it were lower.

The biggest problem operators face is the time it will take for them to replicate even roughly the current features of the major cloud providers.  During that time, those providers will be developing new features, creating new software models, and so it will be a continuous game of catch-up.  Operators are way down the list of employers that eager young software geniuses think they’d want to work at, so how operators will gain the skills needed to get ahead of the game is a big question with only one obvious answer—they won’t.

Even the leading-edge open-source players find it a challenge to visualize, much less address, the future of cloud-native application architectures.  We’re just starting to glimpse the edges of some of the critical developments.  For example, the combination of functional computing and a contextualizing layer (like Amazon’s step functions) is an entrée into the imponderables of distributed-state systems.

Event processing is probably the unification of the cloud opportunity areas, in a technical sense.  You can obviously link events to IoT, but you can also dissect transactions into a series of events, then choreograph them into a proper sequence and context using our previously described contextualization layer.  In other words, events plus context equals business activity.  This is where operators who want to play in the cloud-service space need to be looking.  Event contextualization and orchestration is the leading edge of today’s cloud-native world.  It’s beyond what Apple’s use of the cloud represents, which is a way station on the path to most of what carrier cloud will have to serve if it’s to be useful.
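Here’s a toy rendering of the “events plus context” point, in the spirit (though not the API) of a step-function-style contextualizer; the order flow and event names are hypothetical:

```python
# A tiny contextualizing layer: it holds per-order state so that stateless
# event handlers don't have to, turning loose events into a business activity.
# The expected sequence and events are invented for illustration.

EXPECTED_SEQUENCE = ["order_placed", "payment_cleared", "item_shipped"]

contexts = {}   # order_id -> position in the expected sequence

def contextualize(event: dict) -> str:
    order_id, kind = event["order_id"], event["kind"]
    step = contexts.get(order_id, 0)
    if kind != EXPECTED_SEQUENCE[step]:
        return f"{order_id}: out-of-context event {kind} (a real system would queue it)"
    contexts[order_id] = step + 1
    if contexts[order_id] == len(EXPECTED_SEQUENCE):
        del contexts[order_id]
        return f"{order_id}: business activity complete"
    return f"{order_id}: advanced to step {contexts[order_id]}"

# Events can arrive from any source in any order; context makes them coherent.
for ev in [{"order_id": "A1", "kind": "order_placed"},
           {"order_id": "A1", "kind": "item_shipped"},      # arrives too early
           {"order_id": "A1", "kind": "payment_cleared"},
           {"order_id": "A1", "kind": "item_shipped"}]:
    print(contextualize(ev))
```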

Operators should have been on the leading edge of event-focused cloud applications, because most of networking and all of service lifecycle automation is about event processing.  Sadly, this truth wasn’t recognized when it could have been, and in fact isn’t recognized by many operators even today.  The big question for carrier cloud is, was, and likely always will be how to handle events effectively, because nearly every application driving carrier cloud is event-centric.  Certainly all the major drivers are.

That could be very good news for operators even now, because it would be a lot easier for them to build up an event-centric platform for their cloud than to try to catch up with all the features that the major cloud providers have already deployed.  But, as I’ve noted, the big guys are starting to get their arms around event-handling, and if they gain a decisive lead on operators, the operators may find themselves without any practical way of deploying their own cloud infrastructure.  That means all the revenue of the cloud of the future will go to the current cloud giants, and carriers will end up outsourcing much of their own futures to their arch-rivals.

Big Routers or Big Routing?

Could there be more to cloud-based routing than hosting router instances?  One company, DriveNets, thinks there might be, and Light Reading reports on how “disaggregated” architectures that run software instances on white boxes could create a whole new model of network routing.  But is that really what’s happening here, and is there perhaps a different disaggregated vision of routing that might be the brass ring in this whole cloud-router circus?

If you want to talk about router evolution, you have to start from the protoplasm level, from the hardware routers we have now.  A “router” is a network node that supports multiple ports, and has a forwarding table to guide input packets to output trunks based on a combination of network topology and the points of attachment for real user destinations.  In a traditional router, the capacity of the router is based on what’s often called the “backplane” capacity, which is the speed of the internal data bus that connects all the port interface cards.
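For illustration only, a forwarding table is just a prefix-to-port mapping resolved by longest-prefix match; the prefixes and port names below are made up:

```python
# Sketch of the forwarding-table idea: destination prefixes map to output
# ports, and the longest (most specific) matching prefix wins.

import ipaddress

FORWARDING_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"):  "port-1",   # broad aggregate
    ipaddress.ip_network("10.1.0.0/16"): "port-2",   # more specific route
    ipaddress.ip_network("0.0.0.0/0"):   "port-0",   # default route
}

def forward(dest: str) -> str:
    addr = ipaddress.ip_address(dest)
    matches = [(net, port) for net, port in FORWARDING_TABLE.items() if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]   # longest prefix wins

print(forward("10.1.2.3"))    # port-2
print(forward("10.9.9.9"))    # port-1
print(forward("192.0.2.1"))   # port-0
```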

Over time, as pressure to support higher speeds has grown, it’s become increasingly difficult to build backplanes fast enough to carry all the traffic the ports could collect.  That’s resulted in a different model of connection, what might be called a “cluster” router.  Think of a cluster router as a router built around a “fabric” switch, something that can provide any-to-any connectivity.  Good fabrics today are “non-blocking,” meaning that they don’t interfere with traffic movement no matter how many ports are active.

We’ve had this cluster model of routing for a decade, and it’s at least a loose version of “disaggregated” routing, but I think the real question is whether “cloud routing” has to be more like a cloud than like the kind of cluster we had before the cloud even came along.  Could we actually disperse a cluster, meaning extend the fabric, or should the path to cloud routing be more virtualized?

The obvious problem with extending a cluster is that even today we don’t have network links fast enough to serve as a distributed backplane or fabric.  There are standards (InfiniBand is an example) to define very fast connections, but they’re local.  In any event, to create a non-blocking fabric that’s truly distributed you’d need to have a mesh of very high-speed connections, and if that were practical you might as well just mesh-connect the edges of the network.  Actual cluster dispersal, in other words, isn’t likely a practical answer.

Which leaves us with virtualization.  We have virtual routers today, hosted instances of routing (or more general forwarding) technology.  We can build networks from them, but here we need to take care not to fall into what I’ll call the “NFV trap”, which is mapping virtual to physical so tightly that we end up constraining the use of virtual instances in the same way that physical devices were constrained.

We can put routers where we think we need them.  We can even move them around if, over time, we determine that we need a different configuration.  What we can’t do is create a nodal topology based on short-term traffic requirements.  At least, we can’t do that with physical devices.  We could easily do that in our virtual world, and that’s what I think we have to look at when we talk about “cloud networks”.

At any moment in time, in every network, there’s a series of traffic flows happening, exchanges among the users (human and otherwise) of the network.  These flows would have natural points of concentration where those users were concentrated, and if relationships among certain sets of users were more likely than others, those relationships would also create concentrations of traffic.  The most obvious concentrating factors, though, are the transport paths available.  Unless you presume we can beam traffic over the air, we need to have what will in nearly all cases be optical trunks at the bottom of our connectivity.  Those trunks are point-to-point, joined into complex topologies by nodes that might either provide agile optical switching (reconfigurable add-drop multiplexers or ROADMs) or electrical switching of the optical payload of packets.  Traffic has to go where the trunks go, and so it has to get to specific trunk on-ramps and exit at off-ramps.

What “routers” do in this situation is provide that nodal payload switching.  We put a router at a trunk junction and it decides what path an incoming packet should take, collecting and distributing packets.  What that means is that a router network is in itself a “cloud router”.  In an abstraction sense, a network of routers would look, from the edge looking in, like a big single virtual router.

One obvious example of what might be inside our big virtual router is SDN.  Conceptually, SDN was (and sort of still is) supposed to be a centrally controlled forwarding process, which implies that it replaced adaptive behavior by controlled behavior.  However, if an SDN device is truly a general-purpose forwarding engine, it doesn’t have to behave like a router if what’s at the edge of our virtual-router abstraction does present router behavior to the outside world.

Suppose that we presumed that SDN created, inside the virtual router, nothing but forwarding paths, with no control exchanges at all.  The topology of this interior structure would instead be set by the central (or distributed hierarchical) controller, and each of the edge elements would be provided with a forwarding table that linked an interior path to a set of IP subnets reachable through that path.  Packets would then arrive at an edge element, and from there be distributed in more-or-less the usual IP way.

The interior paths would traverse a structure of forwarding nodes, but in the event of a problem or simply in response to traffic changes, those interior paths could be changed to traverse a different node set, be expanded in capacity or shrunk, etc.  The capacity of the network, as measured at the edge by summing the bidirectional in/out traffic, could be enormous, far beyond what a single node could possibly carry.
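Here’s a sketch of the division of labor described in the last two paragraphs, with hypothetical path labels and subnets: the central controller owns the interior topology, pushes each edge a subnet-to-path table, and simply re-points subnets when a node or trunk gets into trouble, with no adaptive routing exchange at the edge:

```python
# Sketch of the controller/edge split: the controller maps IP subnets to
# interior paths for each edge element and re-points them on faults or
# traffic shifts.  Path and subnet names are invented.

class Controller:
    def __init__(self):
        self.edges = {}                      # edge name -> {subnet: interior path}

    def set_path(self, edge, subnet, path):
        self.edges.setdefault(edge, {})[subnet] = path
        print(f"{edge}: {subnet} now reached via {path}")

    def reroute(self, failed_path, fallback_path):
        # interior problem: move every subnet off the failed path; the edge
        # software never changes and no adaptive routing exchange is needed
        for edge, table in self.edges.items():
            for subnet, path in list(table.items()):
                if path == failed_path:
                    self.set_path(edge, subnet, fallback_path)

ctl = Controller()
ctl.set_path("edge-east", "172.16.0.0/12", "path-A")
ctl.set_path("edge-east", "192.168.0.0/16", "path-B")
ctl.reroute("path-A", "path-C")   # node or trunk trouble inside the virtual router
```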

Something similar to this is already done by some SD-WAN implementations.  SD-WANs would often touch the corporate VPN and perhaps even public clouds in multiple places.  SD-WAN traffic would thus have multiple paths to reach a given destination, and if the SD-WAN kept tabs on QoS for each possible connection option, it could adapt to maximize capacity for a given set of conditions.

Does this mean that we’ve had our goal in our hands all along?  Obviously not, if we still think we need more “disaggregation”.  What this preliminary look at routing does is show us where we’d have to look if we wanted true disaggregated cloud routing that was better than we have now.  We’d have to look at what’s inside that big single virtual router, and try to optimize what we found.  The best way to do that is to return to the point about what a hosted instance can be that a physical router can’t: traffic-tactical.

With virtual devices that can be spun up in servers or distributed white-boxes-in-waiting, it could be possible to create new nodes and not just new topologies.  Remember that fiber trunks would likely be only partially committed to any given network service, but they could all be available.  Operators could have wholesale carriage agreements with each other to augment their own capacity.  These fiber trunks would terminate in specific locations, of course, and in those locations the availability of dynamic node-hosting capacity would mean that a node could be spun up here and there to exploit additional fiber where traffic conditions or network faults created an issue that couldn’t be resolved by rejiggling the old capacity with a different topology.

What this describes is what could be called “big routing” versus “big routers”.  A single “instance” or “device” or “cluster” is sized based on its total offered traffic.  A lot of aggregation in networking will generate a lot of traffic and require a higher capacity where the aggregation has focused, which is what got us to where we are in cluster routing.  However, if we assume a more edge-driven future, a lack of single points of massive traffic concentration, we don’t need big routers, we need big collective routing.

Assume a set of sites, linked with fiber and supplied with facilities to host virtual SDN nodes and virtual edge routers.  Assume a central control point with the facilities to collect traffic information and load status for everything.  Assume that this control point can then spin up new nodes, harnessing new trunks, and reshape the interior topology of our virtual router.  Assume, in short, what cloud-distributed and truly disaggregated router technology would look like.  Why stop short of the real goal, the real solution?  It’s right in front of us.

What Does the Recent News Mean for 5G?

There are more signs that 5G hype has outrun reality, perhaps by a long shot.  Light Reading summarizes their view HERE, and another alarming piece of news is that Intel won’t be making 5G chips.  Light Reading ends its piece on an upbeat note, though, and many believe Intel’s decision came because of the Apple/Qualcomm settlement.  Are there more signs that 5G may be a bit too much sauce and not enough beef, or are we just seeing the inevitable media shift on a story that’s run a long time?

I’ve always said that to the press, any given technology had to be either the singlehanded savior of western culture, or the last bastion of International Communism, because nothing in between would generate enough clicks.  In the early days of any technology, we see a “bulls**t bidding war” as articles strive to get attention, and of course vendors and advertisers like any story that gives them a reason to call on prospects.  Later on, most technology developments take more time and both readers and sponsors see the downside, so the focus tends to shift to the negative.

The truth is that 5G was never going to meet all the extravagant claims made for it, and in the near term probably won’t meet many of them at all.  We are looking at 5G all wrong.  It’s not going to drive augmented reality or connected cars or the Internet of Things and Smart Cities.  It’s not a driver, but rather an enabler.  Some of the things 5G enables were held back primarily by network-related limitations, and these will advance in an orderly way once those limitations are relaxed.  Other things were speculative applications that need to prove out their business case, network issues aside.  These are really no closer to realization today than they were three or four years ago.  We just need to know which applications fit in what categories.

As an enabler, 5G requires two things—widely available 5G service and a strong population of 5G-capable devices.  Only the millimeter-wave 5G/FTTN hybrid has the luxury of deploying at a fairly controllable pace; operators can decide when to roll it out because they control both of the two requirements.  For the mobile 5G services, you need to deploy enough 5G service to make users comfortable with it, and at the same time get a population of handsets out into the market.  Since 5G handsets will also surely work with 4G, the latter can be just a matter of time.  The credibility of the services themselves will depend on both service cost and handset costs; users won’t pay a lot for a 5G phone if they don’t believe there will be 5G services available, and affordable.

The balance of these enabling factors is creating slow but steady 5G transformation, at roughly the pace of mobile modernization.  My model shows that 5G doesn’t seem to have the enthusiasm that 4G had, perhaps because you can only cry wolf so many times and perhaps because the hype wave leading up to 5G (which hasn’t even yet arrived) has been so long.  In any event, there are sure signs of 5G success, and I think Ericsson is probably the poster child.  Ericsson has a 5G radio strategy that’s evolutionary rather than revolutionary, and it seems to be profiting the most from 5G today.  As I noted in a past blog, operators like the fact that they can modernize 4G RAN and prep for 5G in one deal.

Apple’s deal with Qualcomm may be another indicator of slow-but-steady 5G progress.  Intel, say my sources, informed Apple they would be exiting the 5G modem space, believing they had little chance of being a market leader there.  Apple, again according to what I heard, didn’t want to deal with Huawei given its uncertain position with the US Government.  What’s left?  The guy you were embroiled in a lawsuit with, which is obviously not an ideal choice for Apple but surely the best currently available.  And the fact that Apple felt it had to make the choice indicates it believes it needs 5G in iPhones in the near future.

Apple may, in fact, be the key to the slow-and-steady model of 5G adoption.  They’re hardly the price leader in the smartphone space, but they do have a reputation for being the phone of choice for trendsetters.  If the cool people like Apple and Apple likes 5G, the commutative property seems to apply and cool people will come to like 5G too.  That will gradually influence others, spawn stories of how great an iPhone with 5G is, and force competing smartphone players to accept 5G too.  In two or three years, there’ll be a decent 5G phone population….

…. if we have service coverage in key markets, that is.  Huawei is the 5G price leader, and its current troubles could mean that operators would have to pay more for 5G infrastructure, raising the risk by raising the “first cost”, the initial investment needed just to create credible coverage and thus promote customer adoption.

Intel’s departure from the 5G modem chip space isn’t a signal that 5G won’t happen, but rather that it won’t happen quickly.  In order for 5G chips to take off, you’d need to see either a massive phone refresh, or widespread adoption of 5G-based IoT.  We know that consumers aren’t likely to toss their phones just to get 5G, so that leaves IoT.  Isn’t that IoT stuff on the verge of happening?

Probably, but not in the form of 5G devices.  The great majority of IoT is in-building, where WiFi or other specialized control protocols work fine and offer connectivity at zero monthly cost.  Connected cars and autonomous vehicles don’t need 5G because despite the hype, nobody would seriously consider running vehicle control out-of-vehicle; there are too many things that could make control impossible and it’s too easy to use local intelligence to sense obstacles and take action.  We have that already, right?  Hey, gang, why would we be looking at distributing intelligence to the edge and at the same time removing it from cars?

How about augmented reality or virtual reality?  We have that now as well.  Many gamers use it, and it’s also based on WiFi today.  Could you do better VR/AR with 5G?  Certainly you could do it in more places, but while AR could theoretically be employed while driving or walking, the issues in delivering that are profound, and those in regulating it even more so.  People do badly enough with texting while driving; imagine them with AR glasses on as they navigate through a busy intersection or pass a school that’s letting out.

Within a year or so, we’ll probably be hearing about how 5G was an epic failure, how it didn’t happen as it should have, didn’t deliver what it could have.  That’s not true.  5G is going to end up doing exactly what it should have been expected to do all along, which is to improve cellular bandwidth and cell capacity.  It’s going to enable, facilitate, new applications, but each of those applications will have to prove its own business model and meet public policy and safety goals before it happens.  They won’t spring up like weeds when they’re watered with the magic elixir of 5G.

Hype has no inertia, no limitations.  Business and life do, and if we let ourselves build our expectations based on something with no practical boundary, we’re always going to be disappointed.

Do We Need More Coordination in Standards, or Less?

The future of networking probably depends on defining a futuristic architecture for networking.  Traditionally, standards bodies have driven progress in network technology and services, as the example of the 3GPP and 5G shows.  When we talk about software-defined networks, software-driven services, and automated (even AI) operations, we’re in a different world, a world of software architectures and open-source.  A recent Fierce Telecom article says the industry needs more collaboration among and within both standards bodies and open-source communities.  Do we, and can we get it even if we need it?

The article is an interview with T-Systems executive Axel Clauberg, and the key quote is “Because we as operators don’t have enough resources, don’t have enough skilled resources to actually reinvent the wheel in different organizations. So for me, driving collaboration between the organizations and doing this in an agile and fast way is very important.”  As someone who has been involved both in international communications and network standards, and open-source software, I can sympathize with Clauberg’s view.  Operators have historically had difficulties in acquiring and retaining people with strong software architecture skills, and it’s worse today with all the startup and cloud competition for the right skills.  But collaboration isn’t easy; there are several factors that can create chaos even individually, and sadly they tend to unify in the real world.

First is the classic problem of “e pluribus unum”, which is who gets to define the “unum” piece, the overall ecosystemic vision that will align the “cooperating” parties into a single useful initiative.  What has tended to happen in the standards area is that a body will take up what it sees as a “contained” issue, and then exchange liaison agreements with other groups in related areas.  The idea is that these agreements will guarantee that everyone knows what everyone else is up to, and that where bodies are adjacent in terms of mission, they’ll have a means of coordinating.

In practice, this approach tends to secure adjacencies but not ecosystems.  There is still no clear vision of “the goal” in the broadest sense, and the problem with networking is that it is an ecosystem.  You can’t have the right network without all the right pieces, and the definition of rightness piece-wise has to be based on the definition of rightness network-wise.  But who is defining that?  Years ago, the 3GPP started thinking about 5G, and they came up with what they believed were logical technical evolutions to address what they thought were meaningful market trends.  Were they right?  A lot of what’s been happening to pull 5G work apart and advance it selectively seems to show that our current vision of what 5G should be (and when) isn’t what the 3GPP came up with.  Given that, how useful would liaisons be in creating a framework for cooperation between 3GPP and other standards groups?

Even in open-source, we have differences in perspective on what the glorious whole should look like.  Containers should replace VMs, or maybe run inside them, or maybe run alongside them.  They should be orchestrated and lifecycle-managed optimally for the cloud, or for a hybrid of cloud and data center, or for both separately.  Differences in hardware and hosting should be accommodated through infrastructure abstraction, or through the federation of different infrastructure-specific configurations, or perhaps by picking only one approach to hosting and lifecycle management and making everything conform.

Another interesting quote from Clauberg is “For me, the biggest nightmare would be if we would have competition between the organizations, competition and overlap.”  That’s not the biggest nightmare I see.  For me, the worst case is where we have a bunch of organizations that studiously avoid competition and overlap, and by doing so operate in a majestic isolation that can produce the outcome we want only through serendipity.  Liaison doesn’t mean cooperation.  Furthermore, I submit that the way the market has worked successfully in computing and the cloud is through the competition and overlap process.  We know the optimum solution to a problem because it wins, and for it to win there has to be a race.

Why are telcos like T-Systems seeing the competition and survival-of-the-fittest notions of computing and software’s past as a bad thing in their own future?  Answer: they don’t have time for it.  Telcos started their transformation discussions over a decade ago, and in an architecture sense they’ve not really moved the ball much.  Ten years ago, for example, nobody was looking at software-defined anything, or lifecycle automation.  Now it’s clear that software is where things are heading, and so operators are trying to adapt quickly, having started out late, and at the same time they’re trying to avoid missteps and sunk costs.

The interesting thing about that is the fact that open-source software strategies don’t really sink much cost.  If you assume that you’re going to host on either COTS or white-box devices, then the equipment side of your strategy isn’t much in doubt.  If you acquire open-source software, then software costs are minimal to zero.  Thus, you really don’t have to worry about sunk costs unless you think you’re not going to host things or use white-box technology.  Which, in short, means you don’t have to worry.

Carrier cloud should be a given in the sense of cloud infrastructure.  There are also few questions regarding the OS—it’s Linux.  Yes, middleware tools are still up in the air, but that’s true for the cloud overall, and it’s not stopping the cloud.  Are operators simply being nervous Nellies, or are there deeper issues?

One candidate for the deeper issue topic is the bottom-up implementation model that standards groups and operator-driven activities in general have been taking.  If you start at the bottom, you are presuming an architecture and hoping implementations add up to something useful.  I’ve beaten this drum before, of course, and so I don’t propose to continue to do that.

The next candidate is actually the one I cited in the first quote from Clauberg.  Operators lack the qualified resources, which means that they tend to be at the mercy of the rest of the industry, which means the vendors that they think are trying to lock them in and ignore their priorities and issues.  Gosh, gang, if you don’t trust these people why do you continue to under-staff in the places that could create a vendor-independent position?  Back in 2013 I argued that without open-source software emphasis in NFV, there would be little chance the initiative would meet its goals.  Yet operators didn’t do anything to pursue open-source, and NFV goals have not been met.  No surprise here, or at least there shouldn’t be.

What do operators want from standards, or from open-source?  “Success” or “transformation” aren’t enough of an answer, and nobody is going to goal-set for you in the real world.  Operators need to take “transformation” and decompose it into paths toward achieving it, both in the areas of service innovation and operations efficiency.  They’ve yet to do that, and until they do, cooperation among the groups trying to help with transformation is as likely to focus on the wrong thing as on the right.

Google’s Anthos and the Cloud’s Application and Business Model

I mentioned yesterday that Google had a new slant on being a competitive cloud provider, one that focused on combining open-source tools to create a true hybrid cloud platform.  The move is important, maybe even critical, to both public cloud competition and cloud vendors, but it’s still a bit of a work in progress.  I want to take a look at the Google initiative, Anthos, based on the best available information, and see how and when it might impact the concept of enterprise cloud computing overall.

Anthos is a rebranding of Google Cloud Services, but also a reframing of the mission itself.  Anthos takes Kubernetes as a starting point, adds in some other important “Kubernetes-ecosystem” pieces, and creates what is effectively an abstraction of container hosting that can be applied to both data centers and public cloud services to create a true and fully integrated hybrid cloud.  What’s interesting about the latter option is that Anthos would also work with competing cloud providers’ offerings.

It’s not hard to see why Google would take this tack.  Hybrid cloud, meaning the combination of data center hosting and one or more public cloud offerings, is the preferred strategy for enterprises in deploying their current and future applications.  While the cloud computing market overall has gone convincingly to Amazon, hybrid cloud is the fastest growing segment of that market and it’s far from owned by any single cloud player at this point.  In fact, hybrid cloud may be the most significant driver of premises-centric Kubernetes container ecosystems like Red Hat/IBM’s OpenShift, VMware’s PKS Enterprise, or Rancher.

At a deeper, perhaps more cynical, level, Google has to face the fact that it’s just short of being an also-ran in the public cloud space.  If IBM manages to leverage its Red Hat acquisition optimally, it could slip decisively into third place in public cloud computing, and that could doom Google’s own cloud offerings to permanent obscurity.  What Anthos does is extend the “hybrid cloud” notion to mean a hybrid of data center hosting with multiple public clouds.   That’s likely to be a future requirement, but even in the present it offers a nice and easily explained differentiator.  Google’s container cloud is inherently hybrid cloud, and hybrid with anyone and everyone.

Virtually everything related to hybrid cloud these days is linked to Kubernetes, but Anthos adds two other fundamental elements—Istio for service mesh and Knative for serverless hosting.  Neither of these is a household word, but Knative’s role in hybrid cloud is probably the best-kept secret in the industry right now, so we’ll start with a look at what these two tools offer.

A service mesh is a means of connecting independent microservices so as to facilitate things like component reuse, load balancing, resiliency, and security.  Any container-based application set that intends to broadly reuse components really needs a service mesh as much as it needs Kubernetes.  It’s interesting to note that both Red Hat and VMware have also just integrated Istio, Google’s service mesh technology, in their own Kubernetes-based container management suites.

Without a service mesh, using shared, scalable, resilient microservices means dealing with a combination of complex issues.  You need virtual networking to connect the workflows, and you need load balancing, service discovery, DDoS protection, and of course deployment and redeployment.  Service meshes integrate all of this with Kubernetes, which makes the use of true microservices much easier.
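To make that concrete, here’s a minimal Python sketch, under my own assumptions, of the kind of work a mesh does on every inter-service call: consult a service registry, load-balance across instances, and retry elsewhere on failure.  The service name, addresses, and registry are hypothetical; a real mesh like Istio does this transparently in sidecar proxies rather than in application code.

# Minimal sketch of what a service mesh does on every inter-service call:
# discovery, load balancing, retries.  All names and addresses are hypothetical.
import random
import urllib.request
import urllib.error

# A toy "service registry"; a real mesh learns and updates this dynamically.
REGISTRY = {
    "inventory": ["http://10.0.0.11:8080", "http://10.0.0.12:8080"],
}

def call_service(name, path, retries=3):
    """Pick an instance, call it, and retry another instance on failure."""
    instances = list(REGISTRY[name])
    last_error = None
    for _ in range(retries):
        target = random.choice(instances)          # load balancing
        try:
            with urllib.request.urlopen(target + path, timeout=2) as resp:
                return resp.read()                 # success
        except (urllib.error.URLError, OSError) as err:
            last_error = err                       # resiliency: try elsewhere
            if len(instances) > 1:
                instances.remove(target)
    raise RuntimeError(f"all retries to {name} failed: {last_error}")

The point of the mesh is that none of this logic lives in the application; it’s supplied uniformly, alongside security and telemetry, for every microservice.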

Knative is a bit harder to explain, partly because the concept of “serverless” computing has gotten conflated with public cloud services of the same name.  It’s probably better to think of Knative as being a kind of logical evolution of containers.  Rather than just having containers represent a sort of “portable deployment”, Knative aims for “resourceless” deployment.  Containers are still representatives of persistent hosting, and Knative adds a tactical on-demand hosting dimension to the picture.  It provides for what’s called a “scale-to-zero” capability, meaning that under very light loads a microservice might have no committed resources at all, and then scale up as load increases.
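A minimal sketch of the scale-to-zero idea, in Python and under my own assumptions (the class name and thresholds are illustrative, not Knative’s actual autoscaler), would look something like this:

# Sketch of scale-to-zero: no replicas when idle, spin up on the first request,
# scale with concurrency, and drop back to zero after a quiet period.
import time

class ScaleToZeroAutoscaler:
    def __init__(self, target_concurrency=10, idle_timeout=60.0):
        self.target = target_concurrency   # requests each replica should carry
        self.idle_timeout = idle_timeout   # seconds of quiet before zeroing out
        self.replicas = 0
        self.last_request = 0.0

    def on_request(self, in_flight):
        """Called per request with the current number of in-flight requests."""
        self.last_request = time.time()
        desired = max(1, -(-in_flight // self.target))   # ceiling division
        self.replicas = desired
        return self.replicas

    def on_tick(self):
        """Called periodically; releases all resources once traffic stops."""
        if self.replicas and time.time() - self.last_request > self.idle_timeout:
            self.replicas = 0              # scale to zero: nothing committed
        return self.replicas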

Kubernetes, Istio, and Knative combine to build a realistic framework for what’s called a “cloud-native” application deployment model.  Hosting is totally abstract, and workloads are balanced and resources scaled through what is effectively a logically unified but physically distributed middleware layer.  Underneath this abstraction is whatever you want: data center resources, public cloud, whatever.
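To illustrate the abstraction-layer idea, here’s a hypothetical Python sketch: applications ask for hosting in the abstract, and a placement layer maps the request onto whatever clusters have been registered, in the data center or in any public cloud.  The cluster names, capacities, and selection rule are assumptions for illustration only.

# Hypothetical placement layer: callers never name a cloud, only a workload.
CLUSTERS = {
    "dc-east":     {"kind": "on_prem",      "free_capacity": 40},
    "gke-west":    {"kind": "public_cloud", "free_capacity": 500},
    "other-cloud": {"kind": "public_cloud", "free_capacity": 300},
}

def place_workload(name, replicas, prefer="on_prem"):
    """Pick a cluster with room, preferring the requested kind of hosting."""
    candidates = sorted(
        CLUSTERS.items(),
        key=lambda kv: (kv[1]["kind"] != prefer, -kv[1]["free_capacity"]),
    )
    for cluster, info in candidates:
        if info["free_capacity"] >= replicas:
            info["free_capacity"] -= replicas
            return {"workload": name, "cluster": cluster, "replicas": replicas}
    raise RuntimeError("no registered cluster has capacity for " + name)

print(place_workload("web-front-end", replicas=20))   # lands on-prem if it fits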

One of the nice competitive impacts of an abstraction layer like this is that it renders a lot of stuff invisible, which is what you’d like your competitors to be.  If applications are built on the Anthos model, they are fully portable across clouds and data centers, which limits how much others can differentiate themselves.  Full portability of applications surely helps an aspiring cloud provider more than it helps a dominant one.

Speaking of competitive impacts, Anthos may be a step toward defeating what could be a blossoming relationship between VMware and Amazon, or Red Hat and IBM’s cloud.  By fully abstracting and generalizing hosting resources, including public cloud, Anthos could make tighter relationships between cloud providers and premises vendors look limiting and even sinister.  It could elevate the hybrid battle from where it is today (cloud providers having a premises solution they link with) to where users would really like it to be (wherever you host, we have you covered).

Anthos could also impact the managed container service offerings of the public cloud providers.  If users can deploy their containers easily across multiple clouds and their data centers, that weakens the value proposition for managed container services within specific clouds.  Cross-cloud elasticity of resources is a user goal, and even mechanisms for cluster federation across clouds won’t fully realize it with managed container services.  Anthos could do that.

All of these changes, of course, depend on Anthos succeeding.  Does that mean they all depend on Google gaining the number three (or even number one or two) spot in public cloud?  No, only that the abstraction-layer model of hosting that Anthos promotes be adopted widely.  Given my earlier comment that both Red Hat and VMware have now introduced Istio, it’s not farfetched to believe that these two companies, and others in the space as well, might frame their own universal-abstraction vision, and even incorporate the same tools.  Competition would then drive everyone to build bigger and better ecosystems.

There are implications for this in carrier cloud and NFV as well.  The cloud is obviously moving toward a broader container-and-Kubernetes ecosystem that serves both to abstract hosting and to connect elements in a uniform way.  These properties would be enormously valuable in carrier cloud and even specifically in NFV.  NFV has suffered from the first because it ignored the cloud initiatives that were maturing as it was first introduced, choosing instead to back a continuation of device-centricity in thinking and planning services, even though NFV was supposed to virtualize network functions.

Most of my readers know that I’ve long said that the cloud had to be viewed as a vast distributed platform with its own unique middleware and APIs.  I have to wonder whether Google is intending Anthos to be that platform, for both traditional cloud computing and the carrier cloud.  Google, recall, has not yet taken any technical steps to grab at the market for carrier cloud, either as a provider of cloud services to operators who don’t want (yet, or ever) to host their own clouds, or as the provider of an architecture on which carrier cloud could be based.  Amazon seems to have aspirations in the space.  Anthos could be Google’s secret weapon, in which case it could be a very important development indeed.

The Street View of Cloud Providers and Vendors

Recently Wall Street has been on a kind of cloud blitz, looking at both providers of cloud services and vendors whose fortunes will likely be impacted or even determined by the cloud.  It’s always interesting to look at these stories, because it’s important to remember that companies are accountable to their shareholders, and investors are the focus of Street research.  The Street isn’t always right (as we’ll see) but it’s always relevant.

Let’s get the “not always right” piece out of the way first, with both a general comment and a specific example.  Investing is all about having your stock picks go up, and a company’s success isn’t the only factor in making that happen.  Let’s assume you’re looking at a big player like Amazon or Microsoft, and a smaller up-and-comer.  The Street would love to see the latter win, because the stock would have more potential for appreciation.  Thus, they’d tout a modest win by the new player over a big win by an established one, particularly if the established player wasn’t a pure play in the cloud alone.

The specific example is the almost-universal Street view that the cloud will eat all of information technology, eventually displacing every data center.  Some of this view derives from my general comment about looking for appreciation in stocks; the cloud enables new technology and thus new players with a lot of upside.  A lot is also due to the fact that it’s hard to write Street research when you always have to say the same thing, so a revolution is a welcome notion.  Finally, the Street doesn’t understand technology, particularly cloud technology, so you can’t expect much technical insight.

I’m raising these points because it’s important to see Street views and issues through the lens of Street biases.  Still, it’s interesting to note that the Street is making some cloud-related calls that make sense from the perspective of technical fundamentals.  What I’ll do is take a “liked” company and one or two counterpoint like-less companies to lead into a discussion of the specific space the players fit in.

Let’s start with one pick that the Street particularly likes for the cloud, IBM.  The fundamentals-oriented Street research seemed to like the Red Hat acquisition, seeing it as being IBM’s opportunity to grab control of the burgeoning “hybrid cloud” opportunity that will dominate both the overall IT space and the cloud space in the future.  IBM had some solid assets all along, but Red Hat opens that hybrid cloud door.  In addition, Red Hat opens a door for IBM that the Street doesn’t seem to value—an opportunity to broaden its customer base.  IBM’s strength tends to lie in old-line IBM accounts, who are getting…well…older.  Red Hat is the opposite, the darling of open-source success in the New IT World.

For every winner, there’s at least one loser, and the Street has a “relative loser” and “big loser” candidate in Dell and HPE, respectively.  In both cases, the company’s cloud positioning and opportunity is a factor in the Street analysis.  The Street sees Dell with assets it’s not maximizing, and HPE with either no assets or no strategy for leveraging them.

Dell’s big strength in the cloud is VMware, one of its subsidiaries.  One analyst makes the point that if you were to add up the market values of the Dell subsidiaries (VMware, Pivotal, Secureworks) you’d end up with a figure greater than Dell’s market capitalization, which would mean that Dell’s own business has negative value.  Two of the Dell subsidiaries, VMware and Pivotal, could be the basis for a strong cloud story, and obviously these could be married to a server story to create a nice ecosystem.  Dell doesn’t do that.

HPE’s issue is simpler; think of HPE as Dell without subsidiaries.  If the server core of Dell has negative value, where does that leave a company that’s all server core?  The problem is that in today’s world, even without the cloud dimension, a server is something you run your software on.  Add in the presumption of hybrid cloud dominance in IT strategic planning, and a pure-play server strategy sure doesn’t look very survivable.  Add to that the fact that HPE is probably as bad at marketing as IBM is, and you sure don’t look like you’re in a good place.

The key point here is that it’s not about IBM, HPE, or Dell as much as it is about Red Hat and VMware.  What is true is that hybrid cloud has always been the key to the cloud market, not because it’s a path through which every data center app will pass on its way to an inevitable cloud destination, but because most data center apps will never move entirely to the cloud.  Instead, they’ll break into a cloud front end and a data-center back end, and stay that way.  Componentization, plus the deployment and management of those componentized, cloud-divided pieces, creates a new software architecture, an architecture that Red Hat and VMware (and Pivotal, to a lesser degree) are well-suited to provide.

Who wins in the Red Hat versus VMware war, then?  That’s tougher to say, because it may be that the strategies of their parents will decide the question.  I think Red Hat has a broader set of software assets, a larger number of pathways to succeed, but VMware has made a lot of very smart moves lately, and Dell has largely kept hands off.  IBM really can’t afford to treat Red Hat as totally separate or it probably won’t gain the symbiosis the Street expects.

The hybrid cloud story is also a battleground for the public cloud providers, notably Amazon, Google, and Microsoft.  These players live in a market undergoing a largely misunderstood transformation.  Early cloud computing, in IaaS form, got a boost from the “server consolidation” movement, a desire to eliminate under-utilized servers often located outside the data center.  As time passed, OTT startups recognized the value of the cloud as the hosting point for web-related activity.  The server consolidation mission never had a chance to displace more than about 24% of current servers, and so startups became the major piece of the cloud market.  Now, the value of that two-piece front/back-end application model has come home, and enterprise cloud spending is on the rise.  This is what “hybrid cloud” means to the cloud providers themselves.

Microsoft has long led the “hybrid cloud” space, to the extent that the space existed in the past, because it wanted from the first to tie in its data center Windows Server stuff with its cloud positioning.  Amazon has been struggling to create a similar symbiosis, perhaps with VMware, and Google is now promoting a kind of exportable public cloud vision that even ties in competitors.

The Street seems to want Microsoft to get the win here.  They don’t think Google’s cloud business wags enough of the ad-revenue dog to drive much share appreciation for Google even if it won.  They think Amazon’s share price already has the cloud baked in.  One Street analyst said “Easy.  Go long on Microsoft cloud, and sell Amazon short” (that is, bet the shares will go down).  I don’t agree.

There doesn’t need to be any specific symbiosis between data center and cloud products to win a hybrid cloud opportunity.  The front and back ends of these applications are loosely coupled, so what really matters is having the right pieces inside the cloud to support front-end development, and a good hand-off strategy.  Amazon and Microsoft are more evenly matched in that regard, and so the big question may well be the path to market for the solutions of each of the vendors.
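As a hedged illustration of that loose coupling, here’s a hypothetical Python sketch of a cloud-hosted front-end function that touches the data center only through one narrow hand-off API.  The URL and endpoint names are invented for the example.

# Front-end logic runs in whichever cloud is convenient; only the hand-off
# call below needs to reach the data-center back end.  URL is hypothetical.
import json
import urllib.request

BACKEND_HANDOFF = "https://backend.example.internal/api/v1"

def get_order_status(order_id):
    req = urllib.request.Request(
        f"{BACKEND_HANDOFF}/orders/{order_id}",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        record = json.load(resp)
    # Presentation, caching, and personalization stay in the cloud front end.
    return {"order": order_id, "status": record.get("status", "unknown")}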

Well-designed cloud applications require a new model of application composition and deployment, a new lifecycle management process, and even a new networking model.  These are technical, even highly technical, pieces that senior management is unlikely to understand.  Without senior management buy-in, though, no hybrid transformation wave is going to crest very far in from the beach.  Does each company have to rely on its technical geeks learning business-talk, or its executives learning tech-talk?  That hasn’t worked through the whole history of IT.  Somebody, some vendor or cloud provider, is going to have to provide a Rosetta Stone.

Google may have this in mind with its new strategy for the cloud, relying primarily on open-source elements rather than proprietary in-cloud-only tools.  One hidden advantage of this approach is that it’s easy to hybridize because users can deploy the same stuff in their data center as Google is using to build its cloud services.  Google could be intending to create a true hybrid cloud ecosystem that’s able to seamlessly cross the front/back-end boundary, an ecosystem created by assembling the right middleware tools.

This point is why Red Hat and VMware are important, and why IBM is Street-positive and HPE Street-negative.  You can’t promote a hybrid cloud, or indeed any form of modern application design, without the middleware toolkit.  Both Red Hat and VMware provide that kit.  Amazon and Microsoft are both reaping the benefits of the shift, but neither is really driving it.  For drivers, we have to look back to Red Hat and VMware, which might mean that Amazon’s VMware alliance and IBM’s cloud linkage to Red Hat could be important for them.  HPE needs to assemble its own arsenal of hybrid middleware and productize it, and Google needs to find a hybrid architectural model and promote it.  Otherwise, the Street will be proven right.

The good news for both Google and HPE is that the role of “assembler-of-the-ecosystem” is one that’s at least de facto accepted in open source.  The best of open source has come to us through players who have pulled together suites of symbiotic elements.  HPE and Google could join that elite group, if they go about it right.  There are signs that Google’s current open-source cloud initiative is aimed at that assembler goal, which leaves HPE as the outlier.

HPE has some hybrid cloud initiatives, the make-up with Nutanix being a recent example, but they seem to be aimed at boosting private cloud directly and hybrid cloud by proxy.  Unlike IBM and Dell, they don’t seem to have a strong middleware-driven hybrid cloud vision, which they need.

Is the White-Box Space Facing the “Death of Too Many Choices?”

We may be heading for a solid white-box architecture, which is good.  We may have two distinctly different paths that could get us there, which is both good and bad.  It does seem clear that we’re setting up for a bit of competition among open-source giants in networking overall, and for the white-box stuff we have two open-source groups (the Linux Foundation and the ONF) promoting approaches, along with a half-dozen other initiatives.  We can afford more than one solution in the market, but too much fragmentation could hurt everyone, so let’s look at issues and options to see what might be needed.

A “white box” switch (or router) is a network device that’s designed to work with third-party software to provide its functionality.  In theory, you could build your own open white-box design, as Facebook has done with its combination of a white-box device model and its FBOSS software platform, but few have both the resources to do that and the bully pulpit of their own data centers and networks to deploy it in.  Most white-box strategies are thus going to come from an industry body, which is why the Linux Foundation and the ONF are important.

The basis of any white box is, of course, the box.  A hardware platform has specific properties that have to align with the operating system software.  The most popular and universally understood platform is the model that’s popularly used in computers, particularly Linux servers.  A white-box device based on server-like hardware can run a “light” version of Linux easily, and since the platform interfaces are well-known it’s also fairly easy to build a custom switch OS for the same hardware.

The server-like white-box model gives rise to what we could call “bicameral” white-box software, meaning “two brains”.  There’s an operating system and there’s switching software that runs on it, much as a switch/router instance runs as a software component on a standard server.  This approach has the advantage of providing a “switching brain” that can also serve as a hosted function in the cloud.  uCPE that mimics a server and is fully compatible with cloud server tools would be an example of the approach, and it’s clearly going to be a future option for the market.

The problem with the server-like white-box model is that servers aren’t switches.  Custom chips can make a hardware platform a much better switching platform, and not every use of a white box demands cloud compatibility.  Where price/performance is critical, Facebook’s decision proves that a custom white box can pay off, and AT&T proves the same point with its dNOS white-box switch OS, turned over to the Linux Foundation as the DANOS project.

Facebook’s FBOSS design is an indicator of what’s needed here.  The key is an “abstraction layer” that’s a bit like a thin operating system.  This layer creates a virtual hardware platform that can then be exploited by a “real” operating system above.  If you need your white-box software to run on different hardware, this model is a smart way to achieve it.
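Here’s a rough Python sketch of that layering, with class and method names that are my own illustrations rather than FBOSS’s actual interfaces: the network OS programs one common forwarding interface, and per-platform implementations (a software fallback here, a chip driver on real hardware) sit behind it.

# The switch OS above programs against this abstract interface; per-platform
# implementations hide the hardware differences.  Names are illustrative.
from abc import ABC, abstractmethod

class ForwardingHardware(ABC):
    """The 'virtual hardware platform' the switch OS sees."""
    @abstractmethod
    def add_route(self, prefix: str, next_hop: str) -> None: ...
    @abstractmethod
    def forward(self, dest_ip: str) -> str: ...

class SoftwareFallback(ForwardingHardware):
    """Pure-software table for boxes with no specialized silicon."""
    def __init__(self):
        self.table = {}                    # prefix string -> next hop
    def add_route(self, prefix, next_hop):
        self.table[prefix] = next_hop
    def forward(self, dest_ip):
        matches = [p for p in self.table if dest_ip.startswith(p)]
        best = max(matches, key=len, default=None)   # toy longest-prefix match
        return self.table[best] if best else "drop"

class AsicDriver(ForwardingHardware):
    """Chip-backed implementation; a real driver would program ASIC tables."""
    def __init__(self):
        self.programmed = []
    def add_route(self, prefix, next_hop):
        self.programmed.append((prefix, next_hop))   # stand-in for a TCAM write
    def forward(self, dest_ip):
        for prefix, next_hop in sorted(self.programmed, key=lambda e: -len(e[0])):
            if dest_ip.startswith(prefix):
                return next_hop
        return "drop"

hw = SoftwareFallback()            # or AsicDriver() on a chip-equipped box
hw.add_route("10.1.", "to-core")
print(hw.forward("10.1.2.3"))      # -> to-core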

If forwarding efficiency is going to be the key for white-box switches, then it makes sense to have an abstraction layer that can handle specialized forwarding chips.  I’ve mentioned the P4 project in earlier blogs, but in summary it’s a project to develop a forwarding language and an abstraction-layer model that allows that language to be used on various hardware with various chips (or even with no special chips at all).  P4 was a separate project (p4.org) but is now hosted by the ONF.  The idea is that a chip vendor or box vendor would provide the P4 abstraction layer for its devices, and this would enable the “second-brain” switching software to work on their stuff.  You can read a good summary of the approach HERE.

P4 as a project has been working with both the Linux Foundation and the ONF, but it’s now hosted by the ONF and tightly integrated into the ONF Stratum project for white-box SDN as well as its Converged Multi-Access and Core (COMAC) reference design, applicable to 5G.  DANOS, the Linux Foundation distributed network operating system, also references P4.

The benefit of P4 in a white-box design is that it encourages what might be called a “tri-cameral” model.  The second-brain switching software is divided into two pieces, the P4 forwarding language part that describes data-plane behavior, and a control-plane part that manages the framework that turns device forwarding into path routing.  For example, you could write a P4 program to do IP forwarding, and it could be the same whether you added traditional IP discovery and routing table control, or SDN central control, on top.
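Since P4 is its own language, here’s a rough Python analogy of that split, with all names invented for illustration: the data plane is a dumb match-action table (the piece a real P4 program would describe), and the control plane is whatever populates it, swappable between local routing logic and an SDN controller.

# Data plane: a match-action table.  Control plane: whatever fills it in.
class MatchActionTable:
    def __init__(self):
        self.entries = []                             # (prefix, out_port)

    def install(self, prefix, out_port):
        self.entries.append((prefix, out_port))
        self.entries.sort(key=lambda e: -len(e[0]))   # longest prefix first

    def process(self, dest_ip):
        for prefix, out_port in self.entries:
            if dest_ip.startswith(prefix):
                return out_port
        return None                                   # no match: drop

# Control-plane option 1: traditional, locally computed routes.
def local_routing_control(table):
    table.install("10.1.", "port1")
    table.install("10.", "port2")

# Control-plane option 2: an SDN controller pushing entries from outside.
def sdn_control(table, controller_rules):
    for prefix, out_port in controller_rules:
        table.install(prefix, out_port)

table = MatchActionTable()
local_routing_control(table)          # or: sdn_control(table, rules_from_controller)
print(table.process("10.1.2.3"))      # -> port1

The forwarding piece stays the same no matter which control plane you bolt on, which is exactly the portability argument for P4.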

This to me is the optimum model for a white-box switch, because it makes it possible to write “open forwarding” that can be adapted to various chips using the common “plugin” practice we see in networking.  If you were to add P4 to something like DANOS or Stratum, you’d end up with a highly portable and open model for white-box network devices.  The question is whether either will actually work hard to do the integrating, and whether the result will be widely used.  Right now, those two factors are working in opposite directions with respect to our two open-source white-box models.

DANOS is AT&T’s strategy, and AT&T is already deploying it in cell sites and starting to deploy it in other white-box missions for business services and 5G.  That alone gives DANOS an installed-base advantage, and street cred with other network operators.  However, the DANOS white paper on the Linux Foundation website is still the original AT&T dNOS paper, and the Foundation site says that things are still “coalescing” around DANOS.  P4 is mentioned in the AT&T paper, but it’s hard to say how committed the Linux Foundation is to it.

Stratum is the brainchild of the ONF, which is an “operator-led consortium” and the father of commercial OpenFlow-based SDN.  It’s a mature, active organization with plenty of PR (the recent show illustrates that), and the ONF is tightly linked with 5G initiatives.  Still, it’s clear that Stratum won’t have the advantage of AT&T’s major commitment to DANOS in building its own installed base.  On the flip side, Stratum is clearly committed to P4, and the ONF now hosts the P4 project.

One interesting thing about the white-box evolution is that while arguably OTTs like Google and Facebook started things off, the network operators may now be driving it.  That’s because the sheer volume of white-box devices an operator would deploy for 5G or carrier cloud makes the operators’ decisions on strategies and solutions automatically critical.  One Tier One could create a credible installed base for anything just with its own usage.  That’s why it may be that DANOS has the best shot at white-box supremacy in the near term.

What’s important, I think, is that something wins the white-box crown.  An open market with a dozen incompatible approaches isn’t any better than a proprietary, competitive market at coalescing support and defining a universal model.  It might be worse, because absent competition, promotion of any approach is problematic.  If our two open-source players are starting to behave like competitors, that might be a good thing.