Why is Google Credible as a Telco Cloud Partner?

Why does Google seem to have momentum in carrier edge computing relationships?  This piece in Light Reading highlights a general if low-key set of successes Google has enjoyed in the network operator space.  If anything, their momentum in the space seems to be building.  Why are they doing so well, so suddenly?

First off, there’s nothing sudden about it.  A series of operators asked me to feel Google out for partnership well over a decade ago, but at the time Google’s management wasn’t interested.  The reason they had provisionally picked Google was interesting, though.

First, Google wasn’t Amazon, who operators almost universally feared.  This was the primary reason why Google was preferred a decade ago.  Because Amazon was (and is) the leading provider of public cloud services, operators saw them as being the most likely long-term competitor with regard to any cloud plans operators might develop.  Microsoft has been seen mostly as a provider of enterprise cloud services, though today it’s clear that Microsoft has broader aspirations.  Google was the clear choice, if somewhat by default.

The second point (more applicable today than in years past) is that Google is seen as having the most experience in areas where operator interest is high.  Google’s cloud network is the world’s largest SDN application.  Google developed Kubernetes and Istio, both of which are now seen as critical pieces of the cloud-native world.  Google’s content delivery processes, linked to its YouTube services family, are also among the best examples of edge computing currently deployed.

These two early plusses for Google have only gained credibility over time, and they have now converged with some new factors, some on the network operators’ side and some on Google’s.

The first new-ish development is Google’s willingness to consider a partnership with operators.  My sources say that arose in earnest about five years ago, and it was due in part to a recognition that operators had a natural advantage in edge computing—they had real estate at the edge.  More recently (within the last year), Google became aware that other cloud providers were showing interest in hosting carrier cloud applications, which could have given a competitor an unassailable lead in the space.

From the operators’ side, the big change is the realization that they don’t have a clue as to how to proceed with carrier cloud deployment.  The vendors they trust, the network equipment vendors they know, don’t have a clue either.  The vendors who do understand the mission (Red Hat, VMware, and so forth) don’t understand the carriers, and the carriers have little or no experience.  In addition, since the operators have no internal cloud software expertise, they’re in no position to assess any solutions.  Outsourcing of some sort makes sense.

An additional operator concern is 5G.  There was a hope, for a time at least, that 5G initiatives could be confined to the NSA (non-standalone) RAN-only upgrades, but operators now believe that for competitive reasons, and to harness hoped-for 5G-specific benefits, they’ll need to implement 5G Core.  That means having somewhere to host it, and to preserve latency goals for 5G applications, that somewhere has to be the edge.  Imagine fearful operator planners confronting the need to deploy tens of thousands of edge data centers.  Not a happy picture.

The union of these interests lies in the final benefit of outsourcing: incremental commitment.  Operators don’t know how much 5G Core deployment they might push because they don’t know the rate of 5G adoption or the pace at which new 5G-specific (or facilitating) applications might evolve.  If things are slow, they could end up with their own edge data centers sitting idle till they’re obsolete.  If the opportunities develop quickly, they could end up being delayed in addressing them by a lack of edge computing capacity.

Google could really be on to something here, IMHO.  There’s no question that if you had to pick a single player who has a thorough understanding of the cloud and cloud-native development, who understands how to apply hosted function technology to IP networks, who understands next-gen services, Google would be at or near the top of the list.  That doesn’t mean they have an automatic win, though.  What does Google need to buff up their chances of total victory?

Thing one is to recognize that network operators are the world’s most experienced tire-kickers.  Over 80% of all operator RFPs don’t end up delivering any significant production deployments.  The process of assessing technology and the process of deploying it are so separated in most operators that the two organizations may not even like each other.  The “assessors” tend to be in the driver’s seat in early proof-of-concept deals.  Google needs to make sure they get broadly engaged, early on.

The second thing is that the operators’ goals for 5G are actually beyond the scope of their influence on the market.  Operators cannot make IoT or augmented reality or contextualized services happen.  There has to be a broad-market commitment to the concept.  Operator notions of how to build these communities center on announcing developer programs that really don’t offer much benefit to the developers at all.  Google needs to be able to frame things like IoT in terms of new carrier cloud services, since operators can’t.

The third thing is that operators don’t really know what they want from carrier cloud, and don’t know how to find out.  Current interest focuses on 5G mobile-edge because 5G deployment is a given for operators in this market environment.  But what justifies further carrier cloud deployment once you’ve hosted 5G Core features?  When would operators decide to pull the hosting in-house?  What you hear from operators is platitudes like “transformation”, which is clearly too vague to serve as the basis for a plan.  Google needs to understand where operators could and should take carrier cloud, and ensure that there’s always a new application on the horizon to renew interest in outsourcing.

The final thing is that operators don’t really want to sell enterprises cloud computing, they just want to let them buy.  Many of these cloud deals involve the cloud provider becoming a partner in selling cloud computing to enterprises.  The cloud providers usually see this as a way to create a channel partner, but that would only be true if the operators were really trying to sell.  They’re happy to take orders, but where would operator sales people get the training and contacts to actually sell cloud computing services?  If Google expects something to happen here, they have to be prepared to prime the pump.

Then, of course, there’s competition.  Google is far from the only game in town, not as a cloud provider and not as a carrier cloud architecture contender.  There’s clearly an opportunity out there, and the same issues that Google would have to face could be faced by a competitor, perhaps more quickly and more effectively.

Amazon, never the favored partner, has been working hard to establish itself in the space, and Amazon has a lot of edge experience and content delivery capability of its own.  They held a Telecom Symposium that got not only network operator participation but also the participation of vendors, including integrators, OSS/BSS players, and some network vendors.  While this doesn’t guarantee Amazon can bring an ecosystem of its own to play, it at least indicates it might have the credentials to attract one.

Microsoft wants to be the carrier cloud outsource player of choice, and wants it badly.  They have their own program aimed at victory, and they made an incredibly smart play acquiring Metaswitch, a software company with specific expertise in mobile infrastructure virtualization, including 5G.  IBM also wants the prize, and they’re working to frame their own cloud plus Red Hat tools into a carrier cloud framework that could not only be run on IBM’s cloud but also hosted on operator data centers.

VMware has designs on carrier cloud too.  Dell, the senior partner in VMware, had at one time committed to a group of operators that they would take a big position in function virtualization—including having Michael Dell make the announcement.  It didn’t happen, but VMware seems now to be carrying the torch.  VMware has a good relationship with all the public cloud providers, including Google, and they could convert Tanzu into a cloud-portable strategy that would also allow operators to pull some hosting back into their own data centers.

Service Automation: OSS/BSS or ZSM?

Are we seeing a hidden battle between operations automation alternatives?  On the one hand, there are clearly many developments in the OSS/BSS space, driven by vendors like Amdocs who want to reduce operations costs and improve operations practices by enhancing traditional operations applications.  On the other hand, some operators are still looking for near-revolutionary changes in lifecycle automation, through things like ETSI ZSM or ONAP.  The balance of these two approaches could be very important.  In fact, it has already changed the nature of lifecycle automation.

One fundamental truth in network cost of ownership is that capex, for most operators, is lower than opex.  In fact, operators spend only about 20 cents per revenue dollar on capex, and they spend over 40% more than that on opex.  In 2016, when I started analyzing opex cost trends, service lifecycle automation could have saved operators an average of 7 cents on each revenue dollar, equivalent to cutting capex by a third.
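
To put rough numbers on that (and these are just the rounded averages cited here, not operator-specific data), the arithmetic works out as in the sketch below.

```python
# Back-of-the-envelope arithmetic using the rounded figures cited above
# (assumed industry averages, not operator-specific data).
capex = 0.20                  # ~20 cents of each revenue dollar
opex = capex * 1.40           # "over 40% more" -> roughly 28 cents
automation_savings = 0.07     # ~7 cents per revenue dollar (2016 estimate)

print(f"opex per revenue dollar: ~{opex:.2f}")                            # ~0.28
print(f"savings as a share of capex: {automation_savings / capex:.0%}")   # 35%, about a third
```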

Things like SDN and NFV were aimed primarily at capex reduction, and that in fact has been one of the issues.  The actual benefit of hosting functions on a cloud versus discrete devices has proven to be far less than 20%, and a lot of operators report that benefit is erased by the greater operational complexity of hosted-function networking.  It’s therefore not surprising that as the hopes for a capex revolution driven by virtualization waned, operators became sensitive to opex reduction opportunities.

Service lifecycle automation, meaning the handling of service events through software processes rather than manual intervention, has the advantage of a broad scope of impact, and that same breadth is also its disadvantage.  Retooling operations systems to be driven by centralized automation platforms of any sort is the kind of change that makes operators very antsy.

That’s particularly true when there’s really no service lifecycle automation model to touch and feel.  In 2016, when the opex challenge really emerged in earnest, we had no progress in standards and little progress with the proto-ONAP framework.  Sadly, it’s my personal view that we’re in much the same situation today.  I do not believe that ETSI is on the right track, or even a survivable track, with ZSM, and I don’t think ONAP would scale to perform the lifecycle automation tasks we’re going to confront.

That’s where the OSS/BSS alternative comes in.  Vendors and operators both realized that “opex reductions” or “lifecycle automation” was really about being able to cut headcount.  Yes, you could frame a true service lifecycle automation to do that optimally, if you knew what you were doing.  Apparently, neither the telcos nor their equipment vendors had the confidence that they did.  You could also tweak the current operations systems to handle the current network-to-ops relationships better, and leave the network and network-related event-handling alone.

This isn’t, in the short term at least, a dumb notion.  Of that seven cents per revenue dollar that’s on the table for full-scale lifecycle automation, about three cents could be achieved by tweaking the OSS/BSS.  Some additional savings can be had by framing services to require less lifecycle automation: less dependence on SLAs, customer portals to reduce operator personnel needs, and so forth.  Overall, operators have generally been able to hold their ground on opex, and in many cases have even been able to reduce it over time.  At least four of those seven cents are now largely off the table.

That doesn’t mean that ZSM (or what I’d consider a better model of lifecycle automation) is dead.  What it does likely do is tie lifecycle automation success to the widespread use of carrier cloud technology.  The substitution of functions, rather than devices, as the building blocks for services demonstrably creates more service complexity.  However, even carrier cloud success might not create ZSM or ONAP success.

The cloud community, including Google, Microsoft, Amazon, Red Hat, and VMware, are all working feverishly to enhance the basic Kubernetes ecosystem.  That process will shortly create a framework for lifecycle automation for nearly all componentized applications, the only possible exception being the components associated with data-plane handling.  The exact nature of data-plane functions is still up in the air; most operators favor the notion of a white box rather than a commercial server.  Given that, only widespread NFV adoption using cloud hosting would be likely to accelerate the need for the ZSM or ONAP model of service lifecycle automation.  Otherwise the cloud-centric approach would serve better.

White box data-plane functions would really look like devices with somewhat elastic software loads, similar to the NFV uCPE model.  These applications don’t really impose a different management model for function-based services; the services are just based on open devices rather than vendor platforms.  I doubt whether the differences are sufficient to justify any new management model; we already manage devices in networks.

This seems to have been one of the original goals of NFV; if you focus on virtual devices you can employ device management for at least the higher-level management functions.  The only remaining task is the management of how the virtual elements are hosted and combined, which is a more limited mission.  It could have been a reasonable approach had it been more explicitly articulated and if the consequences of the approach (divided management) had been dealt with, by (for example) embedding the collection of functions within a virtual device in an intent-modeled element.

The OSS/BSS players seem to be sticking with the device-management approach, and that may well be because they don’t see a widespread operator push for deploying their own carrier cloud resources.  Until you commit to carrier cloud on a broad scale, meaning beyond NFV and 5G Core, you have no real need to consider how you manage naked functions.  That’s because follow-on drivers for carrier cloud, like IoT, don’t have real-device network models in place, so function-based services are likely to develop.  Where we have devices, the OSS/BSS-centric solution is workable, or can be made so.

As is often the case, where we end up with regard to operations automation will likely depend on just how far operators take “carrier cloud” and function-based services.  That seems likely to depend on whether operators stay within their narrow connection-services comfort zone, or step out into a broader vision of the services they could provide.

Why Function Integration Needs to Pick an Approach

“What are we supposed to integrate?”  That’s a question a senior planner at a big telco posed to me, in response to a blog where I’d commented that virtualization increased the need for integration services.  The point caught me by surprise, because I realized she was right.  Integration is daunting at best, but one of the challenges of virtualization is that it’s not always clear what there is to integrate.  Without some specifics in that regard, the problem is so open-ended as to be unsolvable.

In a device network, it’s pretty obvious that we integrate devices.  Devices have physical interfaces, and so you make connections from one to the other via those interfaces.  You parameterize the devices so the setup of the interfaces is compatible, and you select control-plane protocols (like BGP) so everyone is talking the same language.  We all know that this isn’t always easy (how many frustrations, collectively across all its users, has BGP alone generated?) but it’s at least possible.

When we move from device networks to networks that consist of a series of hosted virtual functions, things get a lot more complicated.  Routers build router networks, so functions build function networks—it seems logical and simple.  The problem is that “router” is a specific device and “function” is a generic concept.  What “functions” are even supposed to be connected?  How do they get connected?

Standards and specifications, in a general sense, aren’t a useful answer.  First, you really can’t standardize across the entire possible scope of “functions”.  The interface needed for data-plane functions, for example, might still have to look like traditional network physical interfaces, such as Ethernet.  The interface needed for authentication functions might have to be an API.  Second, there are so many possible functions that it’s hard to see how any given body could be accepted to standardize them all.  Finally, there’s way too much time needed, time we don’t have unless we want virtualization to be an artifact of the next decade.

A final issue here is one of visualization.  It’s easy to visualize a device, or even a virtual device.  It’s a lot harder to visualize a function or feature.  I’ve talked to many network professionals who simply cannot grasp the notion of what might be called “naked functions”, of building up a network by collecting and integrating individual features.  If that’s hard, how hard would it be to organize all the pieces so we could at least talk integration confidently?

I’ve been thinking about this issue a lot, and it appears to me that there are two basic possibilities in defining a framework for virtual-function integration, including the ever-popular topic of “onboarding”.  One is to define a “model hierarchy” approach that divides functions into relevant groups, or classes, and provides a model system based on that approach.  The other is to forget trying to classify anything at all, and instead devise an “orchestration model” that defines how stuff is supposed to be put together.

We see examples of the first approach where we have a generalized software module that includes “plugins” that specialize it to something specific.  OpenStack uses this approach.  The challenge with it is to somehow avoid having to define thousands of plugins because you’ve been completely disorderly in how you defined the input to that plugin process.  That’s where the idea of a hierarchy of classes comes in.

All network functions, in this approach, descend from a superclass we could call “network-function”.  This class would be assigned a number of specific properties; it might, for example, have a function you could call on which would have it return its specific identity and properties.  In the original ExperiaSphere project, I included this as the function “Speak”.  Most properties and features would come from subordinate classes that extended that superclass, though.  We could, for example, say that there were four subclasses.  The first is “flow-thru-function”, indicating that the function is presumed to be part of a data plane that flows traffic in and out.  The second is “control-plane-function”, which handles peer exchanges to mediate behavior (BGP would fall into this), and the third is “management-plane-function”, where management control is applied.  The final subclass is “organizational-function”, covering functions that are intended to be part of the glue that binds a network of functions together.
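
To make the idea concrete, here’s a minimal sketch of what such a hierarchy might look like in code.  The class and method names (including a speak method echoing ExperiaSphere’s “Speak” function) are purely illustrative; this isn’t a published specification.

```python
# Illustrative class hierarchy for network functions (hypothetical names/APIs).
class NetworkFunction:
    """Superclass for all network functions."""
    def speak(self) -> dict:
        # Return the function's identity and properties, akin to the
        # "Speak" function from the original ExperiaSphere project.
        return {"class": type(self).__name__, "properties": {}}

class FlowThruFunction(NetworkFunction):
    """Data-plane function: traffic flows in one side and out the other."""
    def handle_packet(self, packet: bytes) -> bytes: ...

class ControlPlaneFunction(NetworkFunction):
    """Handles peer protocol exchanges that mediate behavior (BGP fits here)."""
    def handle_control_message(self, message: bytes) -> None: ...

class ManagementPlaneFunction(NetworkFunction):
    """Point where management control is applied."""
    def apply_policy(self, policy: dict) -> None: ...

class OrganizationalFunction(NetworkFunction):
    """Glue that binds a network of functions together."""
    def bind(self, functions: list) -> None: ...
```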

If we look a bit deeper here, we can find some interesting points.  First, there are going to be many cases where network functions depend on others.  Here, “flow-thru-function” is almost certain to include a control-packet shunt facility, a kind of “T” connection.  This would feed the “control-plane-function” in our example, providing for handling of control packets.  Since control and data planes have to be separated for handling in our example, and requiring a separate function to do the separation would increase cost and latency, we could make the shunt a required capability of at least some flow-thru-functions.

The second point is that we would need to define essential interfaces and APIs for each of our classes.  The goal of doing this based on a class hierarchy is to simplify the process of adapting specific implementations of a class, or implementations of a subordinate class, to software designed to lifecycle-manage the class overall.  If we know what interfaces/APIs a “firewall-function” has, and we write software that assumes those APIs, then all we have to do is adapt any implementations to those same interfaces/APIs.

That last point raises another useful one.  We still need to define, in order to build our hierarchy of classes, some basic assumptions about what network functions do and how they relate.  We also need to have vendors/products aligned with the classes.  If both of these are done, then the integration of a function requires the creation of whatever “plugin” code is needed to make the function’s interfaces conform to the class standard.  Vendors provide the mapping or adapting plugins as a condition of bidding for the business.
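
Continuing that illustrative sketch, a vendor “plugin” might be nothing more than an adapter that maps a proprietary API onto the class-standard interface.  All the names here (the Acme vendor and its API calls) are hypothetical.

```python
# Hypothetical vendor plugin: adapt a proprietary firewall implementation to
# the class-standard interface (builds on the FlowThruFunction sketch above).
class FirewallFunction(FlowThruFunction):
    """The interface that lifecycle-management software is written against."""
    def add_rule(self, rule: dict) -> None:
        raise NotImplementedError

class AcmeFirewallPlugin(FirewallFunction):
    """Vendor-supplied adapter wrapping Acme's (fictional) native API."""
    def __init__(self, acme_client):
        self.acme_client = acme_client   # the vendor's own management client

    def add_rule(self, rule: dict) -> None:
        # Translate the class-standard rule format into the vendor's call.
        self.acme_client.create_filter(src=rule["source"],
                                       dst=rule["destination"],
                                       action=rule["action"])
```

Management software written against FirewallFunction never needs to know Acme exists; that’s the whole point of the class reference.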

The other approach is simpler on one hand and more complicated on the other.  It’s simpler because you don’t bother defining hierarchies or classes.  It’s more complicated…well…because you didn’t.  In fact, it’s complicated to explain it without referencing something.

If you harken back to my state/event-based concept of service management, you recall that my presumption was that a service, made up of a collection of lower-level functions/elements, would be represented by a model.  Each model element, which in our example here would correspond to a function, has an associated state/event table that relates its operating states and events to the processes that are supposed to handle them.  Remember?
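
As a refresher, here’s a minimal sketch of that state/event idea; the states, events, and handler processes are hypothetical placeholders, not any standard’s data model.

```python
# A model element carries a table mapping (state, event) -> handler process.
class ModelElement:
    def __init__(self, name, table):
        self.name = name
        self.state = "ordered"
        self.table = table

    def handle(self, event):
        handler = self.table.get((self.state, event))
        if handler:
            handler(self, event)      # the process set does the integration

def deploy(element, event):
    print(f"deploying {element.name}")
    element.state = "active"

def remediate(element, event):
    print(f"remediating {element.name}")

table = {
    ("ordered", "activate"): deploy,
    ("active", "fault"): remediate,
}
firewall = ModelElement("firewall-function", table)
firewall.handle("activate")           # -> deploying firewall-function
firewall.handle("fault")              # -> remediating firewall-function
```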

OK, then, what the “orchestration model” says is that if a vendor provides the set of processes that are activated by all the state/event combinations, then these processes can take into account any special data models or APIs or whatever.  The process set does the integration.

Well, it sort-of does.  You still have to define your states and events, and you still have to agree on how events flow between adjacent elements.  But this seems a lot less work than building a class hierarchy.  Even here, though, we have to be wary of appearances.  If there are a lot of vendors and a lot of functions, there will be a lot of work done, whereas had we taken the time to put together our class hierarchy, we might have been able to do some simple adapting and largely reuse all those processes.

A class-hierarchy approach organizes functions, following the kind of practices that have been used for decades in software development to create reusable components.  By structuring functional interfaces against a “class reference”, it reduces the variability in interfaces associated with lifecycle management.  That limits how much integration work would be needed for actual management processes.  The orchestration model risks creating practices so specialized that you almost have to redo the management software itself to accommodate the variations in how you deploy and manage functions.  Class hierarchies seem likely to be the best approach, but the approach flies in the face of telco thinking and, while it’s been known and available from the first days of “transformation”, it never got much traction with the telcos.  The orchestration model almost admits to a loss of control over functions and deals with it as well as possible.

Our choice, then, seems a bit stark.  We can either pick a class-hierarchy approach, which demands a lot of up-front work that, given the pace of telecom activity to date, could well take half a decade, or we can launch a simpler initiative that could end up demanding much more work if the notion of function hosting actually catches on.  If we could find a forum in which to develop a class hierarchy, I’d bet on that approach.  Where that forum might be, and how we might exploit it, are as much a mystery to me as ever.

I think I know what to do here, and I think many others with a software background know as well.  We’ve known all along, and the telco processes have managed to tamp down the progress of that knowledge.  Unless we admit to that truth, pick a strategy that actually shows long-term potential and isn’t just an easy first step, and then support that approach with real dollars and enforced compliance with the rules the strategy requires, we’ll be talking about this when the new computer science graduates are ending their careers.

An Assessment of Four Key ETSI Standards Initiatives in Transformation

The topic of telco transformation is important, perhaps even critical, so it’s good it’s getting more attention.  The obvious question is whether “attention” is the same as “activity”, and whether “movement” is equivalent to “progress”.  One recent piece, posted on LinkedIn, is a standards update from ETSI created by the chairs of particular groups involved in telco transformation, and so frames a good way of assessing just what “attention” means in the telco world.  From there, who knows?  We might even be able to comment on progress.

The paper I’m referencing is an ETSI document, and I want to start by saying that there are a lot of hard-working and earnest people involved in these ETSI standards.  My problem isn’t in their commitment, their goals, or their efforts, it’s in the lack of useful results.  I participated in ETSI NFV for years, creating the group that launched the first approved proof-of-concept.  As I said in the past, I believe firmly that the group got off on the wrong track, and that’s why I’m interested in the update the paper presents.  Has anything changed?

The document describes four specific standards initiatives: NFV, Multi-access Edge Computing (MEC), Experiential Networked Intelligence (ENI), and Zero-touch network and Service Management (ZSM).  I’ll look at each of them below, but limit my NFV comments to any new points raised by the current state of the specifications.  I do have to start with a little goal-setting.

Transformation, to me, is about moving away from building networks and services by connecting devices together.  That’s my overall premise here, and the premise that forms my basis for assessing these four initiatives.  To get beyond devices, we have to create “naked functions”, meaning disaggregated, hostable, features that we can instantiate and interconnect as needed.  There should be no constraints on where that instantiation happens—data centers, public clouds, etc.

This last point is critical, because it’s the goal of software architecture overall.  The term most-often used to describe it is “cloud-native” not because the stuff has to be instantiated in the cloud, but because the software is designed to fully exploit the virtual, elastic, nature of the cloud.  You can give up cloud hosting with cloud-native software, if you want to pay the price.  You can’t gain the full benefit of the cloud without having cloud-native software, though.

Moving to our four specific areas, we’ll start with the developments in NFV.  The key point the document makes with regard to Release 4 developments is “Consolidation of the infrastructural aspects, by a redefinition of the NFV infrastructure (NFVI) abstraction….”  My problem with this is that in virtualization, you don’t advance by narrowing or subdividing your hosting target, but rather by improving how hosting abstractions are represented and realized.  This, in NFV, is handled by the Virtual Infrastructure Manager (VIM).

Originally, the VIM was seen as a single component, but the illustration in the paper says “VIM(s)”, which admits to the likelihood that there would be multiple VIMs depending on the specific infrastructure.  That’s progress, but it still leaves the question of how you associate a VIM with NFVI and the specific functions you’re deploying.  In my own ExperiaSphere model, the association was made by the model, but it’s not clear to me how this would work today with NFV.

The paper makes it clear that regardless of the changes made in NFV, it’s still intended to manage the virtual functions that replace “physical network functions” (PNFs), meaning devices.  Its lifecycle processes and management divide the realization of a function (hosting and lifecycle management of hosting-related elements) from the management of the things that are virtualized—the PNFs.  That facilitates the introduction of virtual functions into real-world networks, but it also bifurcates lifecycle management, which I think limits automation potential.

The next of our four standards areas is “Multi-access Edge Computing” or MEC.  The ETSI approach to this is curious, to say the least.  The goal is “to enable a self-contained MEC cloud which can exist in different cloud environments….”  To make this applicable to NFV, the specification proposes to create a class of VNF (the “MEC Platform”) which deploys, and which then contains the NFV VNFs.  This establishes the notion that VNFs can be elements of infrastructure (NFVI, specifically), and it creates a whole new issue set in defining, creating, and managing the relationships between the “platform” class of VNFs and the “functional” class we already have.

This is so far removed from the trends in cloud computing that I suspect cloud architects would be aghast at the notion.  The MEC platform should be a class of pooled resources, perhaps supported by a different VIM, but likely nothing more than a special type of host that would (in Kubernetes, for example) be selected or avoided (by taints, tolerations, affinities, etc.) via parameters.
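
To illustrate, here’s what that could look like if “edge” were treated as nothing more than a scheduling parameter.  The label and taint names are hypothetical, and the pod spec fragment is shown as a Python dict rather than YAML.

```python
# Hypothetical Kubernetes pod-spec fragment: the "MEC platform" becomes a pool
# of labeled/tainted edge nodes, selected by ordinary scheduling parameters.
edge_pod_spec = {
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [{
                        "key": "example.com/site-class",   # hypothetical node label
                        "operator": "In",
                        "values": ["edge"],
                    }]
                }]
            }
        }
    },
    "tolerations": [{
        "key": "example.com/edge-only",                    # hypothetical taint on edge nodes
        "operator": "Exists",
        "effect": "NoSchedule",
    }],
}
```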

The MEC concept seems to me to be moving carriers further from the principles of cloud computing, which are evolving quickly and effectively in support of both public and hybrid cloud applications.  If operators believe that they can host things like 5G Core features in the public cloud, why would they not flock to cloud principles?  NFV started things off wrong here, and MEC seems to be perpetuating that wrong direction.

Our next concept is Experiential Networked Intelligence (ENI), which the ETSI paper describes as “an architecture to enable closed-loop network operations and management-leveraging AI.”  The goal appears to be to define a mechanism where an AI/ML intermediary would respond to conditions in the network by generating recommendations or commands to pass along to current or evolving management systems and processes.

Like NFV’s management bifurcation, this seems aimed at adapting AI/ML to current systems, but it raises a lot of questions (too many to detail here).  One question is how you’d coordinate the response to an issue that spans multiple elements or requires changes to multiple elements in order to remediate.  Another is how you “suggest” something to an API linked to an automated process.

To me, the logical way to look at AI/ML in service management is to presume the service is made up of “intent models” which enforce an SLA internally.  The enforcement of that SLA, being inside the black box, can take any form that works, including AI/ML.  In other words, we really need to redefine how we think of service lifecycle management in order to apply AI to it.  That doesn’t mean we have to scrap OSS/BSS or NMS systems, but obviously we have to change these systems somewhat if there are automated processes running between them and services/infrastructure.
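
To illustrate the black-box point, here’s a minimal sketch of an intent-modeled element: the SLA is the public contract, and whatever enforces it internally (rules, heuristics, or AI/ML) is invisible from outside.  The names are mine, not anything drawn from the ENI specification.

```python
# Hypothetical intent-modeled element: the SLA is public, enforcement is opaque.
class IntentModeledElement:
    def __init__(self, name, sla):
        self.name = name
        self.sla = sla                      # public contract, e.g. {"latency_ms": 20}

    def report(self):
        # Externally visible status: only whether the SLA is being met.
        return {"element": self.name, "sla_met": self._evaluate()}

    # --- everything below is inside the black box ---
    def _evaluate(self):
        metrics = self._collect_metrics()
        if metrics["latency_ms"] > self.sla["latency_ms"]:
            self._remediate(metrics)        # could be rules, heuristics, or an ML model
            return False
        return True

    def _collect_metrics(self):
        return {"latency_ms": 15}           # placeholder telemetry

    def _remediate(self, metrics):
        pass                                # AI/ML-driven remediation would plug in here

print(IntentModeledElement("packet-core-element", {"latency_ms": 20}).report())
```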

That brings us to our final concept area, Zero-touch network and Service Management (ZSM).  You can say that I’m seeing an NFV monster in every closet here, but I believe that ZSM suffers from the initial issue that sent NFV off-track, which is the attempt to depict functionality that then ends up being turned into an implementation description.

I absolutely reject any notion that a monolithic management process, or set of processes built into a monolithic management application, could properly automate a complex multi-service network that includes both network devices and hosted features/functions.  I’ve described the issue in past blogs so I won’t go over it again here, but it’s easily resolved by applying the principles of TMF NGOSS Contract, a concept over a decade old.  However, an NGOSS Contract statement of ZSM implementation would say that the contract data model is the “integration fabric” depicted in the paper.  Absent that insight, I don’t think a useful implementation can be derived from the approach, and certainly an optimum one cannot be derived.

What, then, is the basic problem, the thing that unites the issues I’ve cited here?  I think it’s simple.  If you are defining a future architecture, you define it for the future and adapt it to the present.  Transition is justified by transformation, not the other way around.  What the telcos, and ETSI, should be doing is defining a cloud-native model for future networks and services, and then adapting that model to serve during the period when we’re evolving from devices to functions.

Intent modeling and NGOSS Contract would make that not only possible, but easy.  Intent modeling says that elements of a service, whether based on PNFs or VNFs, can be structured as black boxes whose external properties are public and whose internal behaviors are opaque as long as the model element’s SLA is maintained.  NGOSS Contract says that the service data model, which describes the service as a collection of “sub-services” or elements, steers service events to service processes.  That means that any number of processes can be run, anywhere that’s convenient, and driven from and synchronized by that data model.

The TMF hasn’t done a great job in promoting NGOSS Contract, which perhaps is why ETSI and operators have failed to recognize its potential.  Perhaps the best way to leverage the new initiative launched by Telecom TV and other sponsors would be to frame a discussion around how to adapt to the cloud-native, intent-modeled, NGOSS-contract-mediated, model of service lifecycle automation.  While the original TMF architect of NGOSS Contract (John Reilly) has sadly passed, I’m sure there are others in the body who could represent the concept at such a discussion.

This paper was posted on LinkedIn by one of the authors of the paper on accelerating innovation in telecom, a paper I blogged about on Monday of last week.  It may be that the two combined events demonstrate a real desire by the telco standards community to get things on track.  I applaud their determination if that’s the case, but I’m sorry, people.  This isn’t the way.

A Possible Way to Avoid Direct Subsidies for Rural Broadband

Is it possible to estimate broadband coverage potential for new technologies?  I’ve blogged many times about the effect of “demand density” (roughly, a measure of how many opportunity dollars a mile of infrastructure would pass) on the economics of broadband.  Where demand density is high, it’s possible to deliver broadband using things like FTTH because cost/opportunity ratios are favorable.  Where it’s low, cost has to be ruthlessly constrained to get coverage, or subsidies are needed.

We know from experience that, using my metrics for demand density, an average density of about 4.0 will permit quality broadband under “natural market” conditions for at least 90% of households.  Where demand density falls to about 2.0, “thin” areas, meaning low populations and economic power, will be difficult to support profitably, so penetration of broadband is likely to fall below 80%, and at densities approaching 1.0, penetration will fall to 70% or less without special measures.

The characteristics of wireline infrastructure are usually the limiting factor here.  If broadband deployment costs were very low, then a low economic value passed per mile of infrastructure would still create a reasonable ROI.  Obviously, running any form of physical media to homes and businesses, even with a hierarchy of aggregation points, is going to be more costly where prospective customers are widely distributed.  Almost all urban areas could be served with wireline broadband, whereas most deep-rural areas (household densities of less than 5 households per square mile) would be difficult to serve unless the service value per household was quite high.

Public policy is almost certainly not going to permit operators to cherry-pick these low-density areas based on potential revenue, but that would be difficult in any case because the revenue that could be earned per household depends on the services the household would likely consume.

What is the service value of a household?  Here we have to be careful, because an increasing percentage of the total online service dollars spent per household don’t go to the provider of broadband access.  An example is that many households who used to spend around $150 per month on TV, phone, and Internet, have dropped everything but Internet and now spend less than $70 per month.  Sure, they may get Hulu and even a live TV streaming service, and spend another $70 or even more, but the broadband operator doesn’t get that.

Generally, the preferred relationship for broadband in US markets seems to be a household revenue stream (all services monthly bill) that’s roughly equal to one third of the combination of pass cost (per-household neighborhood wiring) plus connect cost.  Today, average pass costs run roughly $250 and average connect costs roughly $200, for a total of $450.  That would mean a household revenue stream of $150 is needed, on the average.
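
The arithmetic behind that figure, using those rounded averages (assumptions, not survey data), is simple:

```python
pass_cost = 250          # average per-household pass cost, USD
connect_cost = 200       # average per-household connect cost, USD
target_monthly_revenue = (pass_cost + connect_cost) / 3
print(target_monthly_revenue)    # 150.0 -> the ~$150 monthly household revenue target
```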

In US urban and suburban areas, it’s getting more difficult to hit that monthly revenue target, but it’s still largely possible.  Household densities even in the suburbs tend to run between 300 and 600 households per square mile, which is usually ample to support profitable broadband.  As you move into rural areas, though, household densities fall to an average of less than 100 per square mile, down to (as previously noted) as little as 5 or less.

Wireline infrastructure is rarely able to deliver suitable ROI below densities of 150 households per square mile.  Even at higher household densities, of 500 per square mile or more, it’s often necessary today for developers to either share costs or promise exclusivity to induce broadband providers to offer quality infrastructure for new subdivisions.

5G millimeter wave, just beginning to deploy, is typically based on a combination of short-haul 5G and fiber-to-the-node (FTTN).  The overall cost will depend in large part on whether there are suitable node points where there’s either already fiber available or where fiber can be introduced at reasonable costs.  Operators tell me that they believe that, on the average, it should be possible to serve household densities of between 100 and 200 per square mile with monthly revenues of $120 or more per household, since self-installation is a practical option here.  This would cover a slightly broader swath of low-density suburbs to high-density rural.

The problem here is that 5G/FTTN tends to support demand densities of somewhere in the 2.5-3.5 range, which is better than the 4.0 lower limit for traditional technologies but still far too high to address many countries and most rural areas.  For that, the only solution is to rely on cellular technologies with greater range.

Studies worldwide suggest that 5G in traditional cellular form (macrocells in low-density areas, moving to smaller cells in suburbs and cities) could deliver 25 Mbps to 35 Mbps per household at acceptable ROIs, and many operators and vendors say that these numbers could probably be doubled through careful site placement and RF engineering.  My models suggest that using traditional 5G, it would be possible to support demand densities down to as low as 0.8, without any special government support.

The “would be possible” qualifier is important here, and so is the 0.8 demand density floor.  The “possible” issue relates to the fact that while it’s possible to hit minimal ROI targets on demand densities below 1.0, it’s not clear whether minimal ROI could actually get anyone interested in deployment.  With every operator chasing revenue, many leaving their traditional territories to seek opportunities half a world away, would they flock to rural areas?  Maybe not.

With respect to the 0.8 limit, the problem is that there are a lot of areas that fall well below that.  In the US, there are 18 states with demand densities below that limit, and that’s entire states.  Within well over 80% of states there are areas with demand densities below 0.8.  Does this mean that even in the US, widespread issues with broadband quality are inevitable without government support?  Yes.  Does it mean the support has to be direct subsidization?  Perhaps not.

You can swing the ROI upward by lowering the cost of infrastructure.  The biggest cost factors in the use of 5G (in either form) as a means of improving broadband service to low-demand-density areas are the spectrum costs and cost of providing fiber connections to cell towers and nodes.  Both these costs could be reduced by government programs.  For example, governments could provide 5G spectrum at low/no cost to those who would offer wireline-substitute broadband at 40 Mbps or more, and they could trench fiber along all public routes, when any construction is underway, then offer fiber capacity under the same terms.

This could be an alternative to direct subsidies.  I’ve not been able to model the impact of the approach, because there are so many country-specific variables and low-level data on population and economic density isn’t always available, but it would appear from my efforts that it could pull over 90% of the US into a zone where ROIs on even rural broadband could be reasonable, enough to make it possible for existing wireless operators at least to serve rural areas profitably.

Reading Cloud Patterns from IBM’s Quarter

One of the expected impacts of COVID is pressure on long-term capital projects.  These pressures would tend to favor a shift toward various expense-based strategies to achieve the same overall goals, and in application hosting that would mean a shift from data center commitments to public cloud.

As it happens, 2020 was already destined to see growth in public cloud services to enterprises, because the model of using the cloud as a front-end technology to adapt legacy applications to mobile/browser access was maturing.  This “hybrid cloud” approach is why Microsoft was gaining traction over Amazon as the cloud provider of choice for enterprises.

As I noted briefly in my blog yesterday, IBM surprised the Street and many IT experts by turning in a sterling quarter, fed in no small part by its IBM Cloud service.  I want to look at their quarter and what it might teach us about the way that cloud services, cloud technology, and data center technology are all evolving.  In particular, I want to look more deeply at the premise of yesterday’s blog—that Google had a cloud-native strategy that it planned to ride to success in tomorrow’s world of IT.

Let me start with a high-level comment.  Yes, IBM’s successful quarter was largely due to its Red Hat acquisition.  Even IBM’s cloud success can be linked back to that, but why do we think they bought Red Hat in the first place?  IBM had a good base relationship with huge companies, a good service organization, and a good brand.  They needed more populism in their product set, and they got it.  We need to understand how they’re exploiting the new situation.

One of IBM CEO Arvind Krishna’s early comments is a good way to start.  He indicated that “we are seeing an increased opportunity for large transformational projects.  These are projects where IBM has a unique value proposition….”  There is no question that COVID and the lockdown have opened the door for a different, far less centralized, model of how workers interact and cooperate.  As I said in an earlier blog, this new approach will survive the pandemic, changing our practices forever.  I think Krishna’s transformational opportunities focus on adapting to the new model of work.

The related point, IBM’s unique value proposition, is also predictable.  If you’re going to do something transformational, you don’t cobble together a bunch of loosely related technologies, or trust your future to some player who might fold in a minute under the very pressures of pandemic you’re responding to yourself.  You pick a trusted giant, and IBM not only fits that bill, they’ve been the most consistently trusted IT player for half a century.

Now let’s look at the next fascinating Krishna comment: “Only 20% of the workloads have moved to the cloud. The other 80% are mission critical workloads that are far more difficult to move, as a massive opportunity in front of us to capture these workloads. When I say hybrid cloud, I’m talking about an inter-operable IT environment across on-premise, private and publicly operated cloud environments, in most cases from multiple vendors.”  This really needs some deep examination!

I’m gratified to see the comment on workloads already migrated to the cloud, admittedly in part because his numbers almost mimic my own data and even earlier modeling.  The most important reason why public cloud for enterprises isn’t an Amazon lake is that 80%.  It’s not moving soon, and so hybrid cloud services have to augment the existing mission-critical stuff rather than replace it.  But, that “inter-operable IT environment” Krishna is talking about is the cloud-native framework that my blog yesterday suggested was Google’s goal.  So, it appears IBM is saying that the future of that 80% mission-critical application set depends on a new environment for applications that sheds technology and location specificity.  Build once to run anywhere.

What’s in that framework?  Containers and Kubernetes, obviously (and Krishna in fact mentions both).  Linux, open-source software, OpenShift, Red Hat stuff, not surprisingly.  What IBM seems to be doing is framing that inter-operable IT environment in terms of software components it already has and which are considered open industry assets.  IBM could reasonably believe it could lift the Red Hat portfolio to that new IT environment level, making all of it an element in the future of IT.

What isn’t in the framework may be just as important.  Nowhere on the call does Krishna suggest that the new framework is “cloud-native” (he never mentions the term), nor does it include a service mesh (never mentioned) or an application platform that’s intrinsically portable, like Angular.  In other words, none of the stuff that Google may be relying on is a part of the IBM story.  That doesn’t mean that Google is on the wrong track; it might mean IBM doesn’t want to make it appear that Google is on the right track.

The risk this poses for IBM is pretty simple.  If there are in fact technology pillars that have to hold up the new application framework, then IBM has to be an early leader in those areas or they risk losing control of what they admit to be the future of IT.  It seems, at one level at least, foolish to take a risk like that, so why might IBM be willing to do so?

The first reason is their nice quarter, and they’re citing their unique value proposition for those current transformational projects.  It’s the wild west out there in the hybrid cloud; let IBM be your sheriff.  IBM is clearly reaping benefits in the here and now, and so the last thing they’d want to do is push the fight off for a year or more, losing revenue and momentum along the way.

The second reason is that Red Hat is unique in having complete platform and application solutions.  If future transformational applications have to be built on a new framework, IBM’s Red Hat assets might require a lot of rebuilding.  Google has no comparable inventory at risk, so they can not only afford to risk a transformation of the architecture of future applications, they’d benefit from it.

The third reason is that, absent a transformational architecture for transformational applications, it’s not unlikely that building those applications would involve more integration work.  Guess who has a booming enterprise services business?  IBM!  Quoting Krishna again, “you’ll recall that nearly half the hybrid cloud opportunity lies in services.”  Nothing kills a professional services opportunity like a nice, fully integrated, architecture.

I think that IBM’s success this quarter, and its views on why it succeeded, demonstrate that we’re likely heading into a polarization period in hybrid cloud.  One camp, the camp IBM is in, sees the hybrid future as an adaptation of existing open applications to a new architecture, via professional services and container suites (OpenShift).  The other camp, which I believe Google is in, sees the future as the definition of a true, universal, cloud-native application framework that has to be supported from development to deployment.

An interesting kind-of-parallel dynamic is the swirling (and confusing) telco cloud space.  It is very possible, even likely, that the first and biggest opportunity to introduce a sweeping new application architecture into the cloud world would be the telco or carrier cloud.  The current market conditions and trends suggest that carrier cloud is both an opportunity for outsourcing to the public cloud and a new hosting mission to justify a new architecture.  It certainly represents a complex hybrid-cloud opportunity, a fusion of the two hosting options.

IBM sees all of this; Krishna said “we have continued to deliver a series of new innovations in the last quarter. We launched our new Edge and Telco network cloud solutions, built on Red Hat OpenStack and Red Hat OpenShift, that enable clients to run workloads anywhere from a data center to multiple clouds to the edge.”  So, of course, do all the other public cloud vendors, and so does HPE and VMware, both of whom could be credible sources of new-architecture elements.  And, of course, with every possible advance of cloud technology into the world of telecom, we have pushback.

A recent story suggests that container centerpiece Kubernetes may not be the right orchestrator for NFV, citing a Cisco expert to buttress the point.  The issue, to me, seems linked to the fact that containers aren’t an ideal data-plane element and don’t fit the NFV model.  OK, but software instances of data-plane functionality hosted on commercial servers aren’t ideal either; white boxes designed for the mission would surely be better.  And the NFV model doesn’t seem to fit well with its own mission; most VNFs get hosted outside the cloud, not in it.  Containerized Network Functions (CNFs) are different from containers, if they really are, only because the NFV community chose to make them so.  Nevertheless, the result of this could be a slowing of cloud-native adoption by operators, which would limit their ability to realize carrier cloud opportunities beyond 5G and NFV.

From the perspective of telco cloud services, IBM, then, may be taking a risk, but so are Google and those relying on some sensible carrier cloud thinking.  By taking their winnings when they can, IBM may emerge as the smart player at the table, particularly if the carrier cloud space descends into the disorder we’re becoming accustomed to seeing in the telco world.

I think that, in the net, the cloud opportunity generated in our post-COVID world will overcome the carrier cloud uncertainties.  Carrier cloud is less likely to be a decisive driver, not only because the carriers continue to fumble on the issue, but because COVID-related changes are clearly on the rise in the cloud space.  In that world, forces seem evenly balanced between IBM’s integration-transformational approach and Google’s (by my hypothesis, anyway) architectural approach.  I think the latter offers greater cloud penetration and more overall tech opportunity in the long term, but if I’m right about Google’s intentions, they need to start making announcements of their new direction and win some planner hearts and minds.  Planning requires an understanding of an approach, where IBM’s approach requires only sales account control.

Why Does Google Want to Retain Development Control of those Three Projects?

A piece in Protocol on Google’s desire to control some of its open-source projects’ directions made me wonder why Google was willing to release Kubernetes to the CNCF and wants to hold back control of the direction of Istio, Angular, and Gerrit.  What do a service mesh, an application platform and a code review tool have in common, if anything?  There might not be a common thread here, of course.  But Google is a strategic player, so is there something strategic about those particular projects, something Google needs to keep on track for its own future?  Could there even be one single, common, thing?  It’s worth taking a look.

To make things clear, Google isn’t making these projects proprietary, they’re just retaining the governance of the development.  To some, that’s a small distinction, and there were fears raised that popular projects might end up under the iron control of Google.  Why these three specific projects, though?

Istio is a fairly easy one to understand.  Istio is a service mesh, a technology that does just what the name suggests, which is to provide a means of accessing a community of service instances that expand and contract with load, balancing the work and managing instances as needed.

What makes service meshes critical to Google is cloud-native development.  Stuff that’s designed for the cloud has to have a means of linking the pieces of an application to create a workflow, even though the pieces will vary and even the instance of a given component will vary.

Service mesh technology also forms the upper layer of serverless computing implementations that are integrated with a cloud software stack rather than delivered as cloud provider web services.  Google would certainly be concerned that open community interest could drive Istio in a direction that doesn’t fit the long-term Google cloud-native vision.

What’s the right direction?  One obvious candidate is “Istio federation”.  Recall that the current rage in Kubernetes revolves around means of “federating” or combining autonomous Kubernetes domains.  Service mesh technology, as an overlay to Kubernetes, might also benefit from the same kind of federating framework.  It would also create a bridge between, say, a Google Cloud with an Istio in-cloud feature set, and Istio software in a data center.

Another thing Google might be especially interested in is reducing Istio latency.  Complex service-mesh relationships could introduce a significant delay, and that would limit the value of service mesh in many business applications.  Improving service mesh latency could also improve the serverless computing applications, notably Knative.  Serverless doesn’t quite mean the same thing in a service-mesh framework, because you’d still perhaps end up managing container hosts, but it does increase the number of services that a given configuration (cluster) can support.

We might get a glimpse of Google’s applications for Istio by looking at the next software package, Angular.  The concept of Angular evolved from an original JavaScript-centric tool (now popularly called “AngularJS”) to the current model of a web-based application platform built on TypeScript, an enhancement to JavaScript to allow for explicit typing.  Because Angular is portable to most mobile, desktop, and browser environments, it can be used to build what are essentially universal applications.

There are two interesting things about Angular, one organizational and one technical.  The organizational thing is that it’s a complete rewrite of the original AngularJS stuff, by the same development team.  That shows that the team, having seen the potential of their approach, decided to start over and build a better model to widen its capabilities.  The technical thing is that Angular’s approach is very web-service-ish, which means that it might be a very nice way to build applications that would end up running as a service mesh.

Angular was a part of a reference microservice platform that included Istio and built an application from a distributed set of microservices in a mesh.  This would create a cloud-native model for a web-based application, but using a language that could take pieces of (or all of) the Angular code and host it on a device or a PC.

I have to wonder if Google is seeing a future for Angular as the way of creating applications that are distributable or fixed-hosted almost at will, allowing an application to become independent of where it’s supposed to be run.  If you could describe application functionality that way, you’d have a “PaaS” (actually a language and middleware) that could be applied to all the current models of application hosting, but also to the new cloud-native microservice model.  That would sure fit well with Istio plans, and explain why Google needs to stay at the helm of Angular development.
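
If that guess is right, the programming pattern might look something like the sketch below.  It’s purely my own illustration with made-up names, but it shows how coding against an interface lets the same application logic be satisfied either by a local, in-process implementation or by a remote, meshed microservice.

```typescript
// A hypothetical sketch of "host it anywhere": the application codes against
// an interface, and the same logic can be satisfied locally (on a device or
// PC) or by a remote, meshed microservice.  All names here are invented.

interface CatalogService {
  search(term: string): Promise<string[]>;
}

// Option 1: run the logic in-process, e.g. packaged into a desktop build.
class LocalCatalog implements CatalogService {
  private items = ["router", "switch", "optical mux"];
  async search(term: string): Promise<string[]> {
    return this.items.filter((item) => item.includes(term));
  }
}

// Option 2: delegate to a meshed microservice behind a logical name.
class RemoteCatalog implements CatalogService {
  constructor(private baseUrl: string) {}
  async search(term: string): Promise<string[]> {
    const res = await fetch(`${this.baseUrl}/search?q=${encodeURIComponent(term)}`);
    return (await res.json()) as string[];
  }
}

// The rest of the application neither knows nor cares which one it was given.
async function demo(catalog: CatalogService): Promise<void> {
  console.log(await catalog.search("switch"));
}

demo(new LocalCatalog());
// demo(new RemoteCatalog("http://catalog.default.svc.cluster.local"));
```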

The connection between Istio and Angular seems plausible, but Gerrit is a bit harder to fit in.  Gerrit is an alternative to the familiar GitHub model, a Git-based repository framework designed specifically to facilitate code review.  Organizations used to GitHub often find Gerrit jarring at first, and some will abandon it when the initial difficulties overwhelm them.  It’s best to start with just a few (even one) main repositories and get used to the review process before you try to roll Gerrit out across a whole development team.

Without getting into the details of either Gerrit or code review, can we say anything about why Google might see Gerrit as very strategic?  Well, think for a moment about the process of development, particularly rapid development, in a world of meshed microservices.  You are very likely to have multiple change tracks impacting some of the same components, and you surely need to make sure that all the code is reviewed carefully for compatibility with current and future (re-use) missions.

As I said up front, Google might have three reasons for three independent decisions on open-source direction in play.  The reasons might be totally trivial, too.  I think Google might also be looking at the future of software, to a concept fundamental to the cloud and maybe to all future development—the notion of virtualization.  If software is a collection of cooperative components that can be packaged to run in one place, or packaged to be distributed over a vast and agile resource pool, then it’s likely that developing software is going to have to change profoundly.

Would Google care about that, though?  It might, if the mapping of that virtual-application model to cloud computing is going to create the next major wave of cloud adoption.  Google is third in the public cloud game, and some even have IBM contending with Google for that position.  If Google wants to gain ground instead of losing it, would it benefit Google’s own cloud service evolution to know where application development overall is heading?

That’s what I think may be the key to Google’s desire to retain control over the direction of these three projects.  Google isn’t trying to stifle these technologies, it’s trying to promote them, but collaterally trying to align the direction of the projects (to which Google is by far the major contributor) with Google’s own plans for its public cloud service.

The early public cloud applications were little more than server consolidation onto hosted resources.  The current phase is about building cloud front-ends to existing applications.  Amazon has lost ground in the current phase because hybrid cloud isn’t its strength, and Microsoft and IBM are now more direct Google rivals.  IBM just reported strong quarterly earnings, and IBM Cloud made a notable contribution.  That has to be telling Google that if the hybrid-cloud game stays dominant, IBM and Microsoft will push Amazon and Google down.  They need a new game, one in which everything is up for grabs, and Google could come out a winner.  Real cloud-native could be that game, and these three projects could be the deciding factor for Google.

Can “Government Broadband” be Made to Work?

Are all government broadband initiatives doomed?  After I commented a bit on Australia’s NBN as an example of why you can’t count on a form of nationalization to save broadband, I got additional material on Australia, as well as commentary from people involved in various government-linked broadband initiatives here in the US.  I think the sum of the material sheds light on why so many (most, or even all?) such plans end up in failure.  That could serve to deter some, and perhaps guide the initiatives of governments determined to give it a try.

The single point I got from all my sources is that all studies commissioned to evaluate government broadband programs will be flawed.  In every case where such a study was done, the study forecast a shorter period to full deployment, a better outcome, and lower costs than were actually experienced.  According to my sources, the average study missed the mark by almost 100% in each of these areas.  That is too large an error to be a matter of simple estimating difficulty; there seems to be a systemic issue.  In fact, there are several.

The issue most sources cite for study errors is that there is a desired outcome for the study, and it’s telegraphed to the bidders.  I’ve seen this in study RFPs I’ve received in the past, when my company routinely bid on these sorts of things.  “Develop a report demonstrating the billion-dollar-per-year market in xyz” is an extreme example, but it’s a near-quote of one such RFP’s opening.

The second main source of study errors is that the organization requesting the study has no ability to judge the methodology proposed or the quality of the resources to be committed.  It does little good for an organization or government entity to request a study when it couldn’t even recognize a plausible path to a good outcome, and yet the majority of studies are commissioned by organizations with no internal qualifications to assess the results.  In some cases that’s not a barrier, because (as my first point illustrates) the desired result is already known and the study is just going through the motions.  In other cases, the organization requesting the study is simply duped.

The second-most-cited reason for the failure of government broadband projects is that a vendor or integrator misleads the government body about the capabilities of the technology.  Everyone who’s ever done any kind of RFP knows that vendors will push their capabilities to (and often past) their limits.  “To a hammer, everything looks like a nail” is an old saw that illustrates the problem.  Go to a WiFi specialist and you get a WiFi-centric solution, whether it’s best or not.

This is the biggest technical problem with government broadband.  Sometimes it’s the result of underestimating the pace of technology progress relative to the timeline of the project.  If you embark on a five-year effort to do something, the fast-moving world of network technology is likely to render your early product choices obsolete before the project is complete.  Sometimes there are fundamental architectural issues that should have been recognized and were simply missed, or swept under the rug.

The third-most-cited source of problems with government broadband is a lack of flexibility in dealing with unexpected issues.  This covers a number of more specific points.  First, government projects tended to push issues under the rug when they arose, to avoid compromising the plan, which in fact made the issues nearly impossible to address when they finally blew up.  Second, government projects were slow to adapt the plan to changes in conditions that clearly indicated adaptation was necessary.  Third, government broadband didn’t properly consider new technical options when they arose.

Then, of course, there’s the general complaint that all government broadband is too political.  This issue came out very clearly in Australia’s NBN, where the whole topic was a political pawn.  Politics tends to push decision-makers to extreme opposite sides of any issue, and with broadband that tends to promote a kind of all-or-nothing mindset at every step of the project.

The input I got suggests that most of those involved in government broadband projects agreed with my point: the best strategy is likely incentive payments to competing operators to induce the behavior the government wants, rather than shouldering the free market aside and taking over.  A number of Australia’s operators tell me they believe the broadband situation would be better had the government done nothing at all, and that a positive approach to the specific issues of a low-demand-density market would have served far better.

What, then, could a government do to optimize their chances of succeeding?  There are some specific points that seem to be consistent with the experiences my contacts related.

The step that’s suggested most often is perhaps the simplest:  Governments need to contract for a service level.  The most-cited success story in government/network partnerships is the one involving Google Fiber.  People will argue that Google cherry-picks its sites, but that’s not a reason to say that Google Fiber isn’t a good approach, only a reason to say it can’t be the only one.

Google Fiber tends to go after areas that have reasonable demand density but are under-served by traditional telco and cableco providers.  That there are such areas is proof that the competitive market doesn’t always create optimum strategies.  Some telco/cableco planners have confided that many, even most, Google Fiber targets were considered for new high-speed broadband, but that the market areas were too small to create sufficient profit, and there was a fear that other nearby areas would complain.

New technology of some sort, however, is almost surely required for improving broadband service quality in low-demand-density areas.  There’s too often a focus on reusing the copper-loop technology left behind by the old voice telephone services.  Rarely can this plant sustain commercially useful broadband quality, so a bid for a given service level has to be assessed considering the real capabilities of the technology to be used.

Perhaps the most important lesson of Google Fiber is that if a network company can exploit new technology to serve an area, they should be encouraged to do that, even to the point where the encouragement is a partnership with government.  I think that millimeter-wave 5G in conjunction with FTTN could well open up many new areas to high-speed broadband.  Since technology companies are more likely to understand this than governments, a corollary to this point is that governments should encourage approaches by network companies rather than pushing for something specific.

The second step is only a little behind the first:  Think small, meaning try to get local government initiatives before you look for something broader.  A specific city or county is more likely to be suitable for a given broadband strategy than an entire country.  A solution that’s applied nationally tends to encourage the spread of the approach to areas that really weren’t disadvantaged in the first place.  Did Australia have to create a national NBN, or should they have instead focused on regional solutions where Telstra and others weren’t able to create commercial services profitably?

It may be that “little government” is always going to do better with broadband programs, not because its people know more, but because they recognize their own limitations more readily.  It may also be true that the best thing a national government can do for broadband is to step aside and let the little guys drive the bus.  That, in my own view, is the lesson Australia should teach us all.

A Response to the “Accelerating Innovation in the Telecom Arena”

Nobody doubts that the telecom ecosystem is in need of more innovation.  That’s why it’s a good sign that there’s a group of people working to promote just that.  They’ve published a paper (available HERE) and they’re inviting comments, so I’m offering mine in this blog.  Obviously, I invite the authors and anyone else to comment on my views on LinkedIn, where a link to this will be posted.

The paper is eerily similar to the “Call for Action” paper that, in 2012, kicked off NFV.  That the initiative seems to be promoted by Telecom TV also echoes the fact that Light Reading was a big promoter of NFV, including events and even a foundation that planned to do integration/onboarding.  These points don’t invalidate what the paper contains, but they do justify a close look.

Let’s take one, then.  A key question at the end of the paper is “What did we get wrong?”, so I’ll offer my view of that point first.  The biggest problem with the paper is that it’s a suggestion of process reform.  It doesn’t propose a specific technology or a specific approach, but instead talks about why innovation is important and how it’s been stifled.  With respect, many know the answers to that already, and yet we are where we are.

The paper makes two specific process suggestions: R&D partnerships with vendors and improvements to the standards process.  I heartily agree with the latter point; how many times have I said in my blog that telco standards were fatally flawed, mired in formalism, prone to excessive delays, and lacking in participants with the specific skills needed to “softwarize” the telco space?

NFV was softwarization, but it never really developed a carrier-cloud-centric vision.  Instead, it focused on universal CPE, and as 5G virtual features became credible, operators started looking at outsourcing that mission to public cloud providers.  No smaller, innovative players there, and in fact really small and innovative players would have a major problem even participating in carrier standards initiatives.  I’ve been involved in several, but only when I’d sold some intellectual property to fund my own activity.  None of them made me any money, and I suspect that most small and innovative companies would be unable to fund their participation.

I could support the notion of improved R&D partnerships, if we knew what they looked like and what they sought to create.  What would the framework for those partnerships be, though?  Standards clearly don’t work, so open-source?  ONAP was an operator open-source project, and it went wrong for nearly the same technical reasons NFV did.  There’s not enough detail to assess whether there’s even a specific goal in mind for these partnerships.

There is one exception: a single statement that I have to add to the “What did we get wrong?” point, with a big exclamation mark.  “Softwarization of the infrastructure (i.e. NFV, SDN, cloud) has, in theory, created opportunities for smaller, more innovative players, to participate in the telecommunications supplier ecosystem, but there remain significant barriers to their participation, including the systemic ones identified above.”  It’s clear from this statement, and other statements in the paper, that the goal here is to improve the operators’ ability to profit from their current network role.  Innovation in connection services, in short, is the objective.

I’ve said this before, but let me say it here with emphasis.  There is no way that any enhancements to connectivity services, created through optimization of data network infrastructure, can help operators enhance their profitability in the long term.  Bits are the ultimate commodity, and as long as that’s what you sell, you’re fighting a losing battle against becoming dirt.  Value is created by what the network supports in the way of experiences, not by how that stuff is pushed around.  Yes, it’s necessary that there be bit-pushers, but the role will never be any more valuable than plumbing.  The best you can hope for is to remain invisible.

As long as telcos continue to hunker down on the old familiar connection services and try to somehow make them highly differentiable, highly profitable, there’s nothing innovation can do except reduce cost.  How much cost reduction can be wrung out?  Cost management, folks, vanishes to a point.  There is no near-term limit to the potential revenue growth associated with new and valuable experiences.  The moral to that is clear, at least to me.  Get into the experience business.

Smaller, more innovative players have tried to break into telecom for at least two decades, with data-network equipment that would threaten the incumbents.  It didn’t work; the giants of 20 years ago are the giants of today, with allowances made for M&A.  The places where smaller players have actually butted heads with incumbents and been successful have been above basic bit-pushing.  Trying to reignite the network equipment innovation war that small companies already lost, by moving the battlefield to software, just invites another generation of startup failures.  Anyway, what VC would fund it when they can fund a social network or e-commerce company?

Innovation, meaning innovators, goes where the rewards are highest.  Telco-centric initiatives are not that space, as VC interest demonstrates.  Will the operators fund initiatives to improve their infrastructure?  I don’t think so; certainly none have indicated to me that they’re interested, and without some financial incentive for those innovators, innovation will continue to be in short supply.

If operators are serious about bringing more innovation into their operations, they need to start by setting innovative service goals.  Different ways to push bits are really not all that different from each other.  Different ways of making money off bits are a totally different thing.  The whole of the Internet and the OTT industries were founded on that, so it’s perhaps the most proven approach to enhancing revenue we could find out there.

There is value to things like NFV hosting and 5G Core virtualization, as drivers of an expanded carrier cloud infrastructure that then becomes a resource for the other future higher-layer services an operator might want to provide.  However, we don’t need innovation initiatives to do this sort of thing.  We have cloud technology that’s evolving at a dazzling pace and addressing all the real issues.  I’ve blogged in the past about how cloud-native control-plane behavior, combined with white-box switching, could address connection-network requirements fully.  We don’t need to be able to spin up routers anywhere; we just need to spin them up at the termination of trunks.  But even white boxes can be part of a resource pool, supported by current cloud technologies like Kubernetes.
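
To show the shape of that separation (and only the shape; the names and the device API below are mine, not any product’s), here’s a minimal sketch of control-plane software pushing forwarding state to white-box devices.

```typescript
// A purely illustrative sketch of the separated control-plane idea: route
// logic runs as cloud-hosted software and pushes forwarding entries down to
// white-box devices.  Every name and API here is hypothetical; a real white
// box would expose something like P4Runtime, gNMI, or a vendor SDK instead.

interface ForwardingEntry {
  prefix: string;   // e.g. "10.1.0.0/16"
  nextHop: string;  // e.g. "10.0.0.2"
}

interface WhiteBoxSwitch {
  id: string;
  installRoutes(entries: ForwardingEntry[]): Promise<void>;
}

class ControlPlane {
  constructor(private devices: WhiteBoxSwitch[]) {}

  // Compute the forwarding state centrally (trivially here) and push it to
  // every device; the boxes themselves only forward packets.
  async reconcile(routes: ForwardingEntry[]): Promise<void> {
    await Promise.all(this.devices.map((d) => d.installRoutes(routes)));
  }
}

// A stub device standing in for a real programmable switch.
const leaf1: WhiteBoxSwitch = {
  id: "leaf-1",
  async installRoutes(entries) {
    console.log(`leaf-1 installing ${entries.length} routes`);
  },
};

new ControlPlane([leaf1]).reconcile([
  { prefix: "10.1.0.0/16", nextHop: "10.0.0.2" },
]);
```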

We don’t need NFV, nor do we need an NFV successor.  We can now turn to the cloud community for the solution to all our problems, even problems that frankly the operators don’t even know they have yet.  Where innovation is needed is in the area of service planning.  Bits are ones and zeros, so differentiating yourself with them is a wasted exercise.  It’s what they can carry that matters, and so innovative operators need to look there, and leave cloud infrastructure to people who are already experts.

Does Nokia Really Have a New Switching Strategy for the Cloud?

The heart of a cloud is the data center, obviously.  The heart of a cloud network is therefore a data center network, and as network equipment vendors struggle for profit growth, data center networking looks very promising.  Given that, it’s not very surprising that Nokia is taking a position in the space.  The question is whether it can turn a position into a win, and it won’t be easy.

Network vendors have been under profit pressure, in no small part because network operators and enterprises have been under pressure themselves.  The one bright area of growth has been the cloud space, and as I noted above, the data center network is the centerpiece of any public cloud network-building plan.  The same, by the way, is true for enterprises; my own research for well over two decades has shown that enterprises tend to build their networks out from the data center.  That gives those vendors with a data center position a big advantage.

The “well over two decades” point is critical here, at least for Nokia.  The position of the data center network has been known to the smarter vendors for at least that long, and players like Cisco and Juniper have been focusing increasingly on switching revenue for a very long time.  Nokia is surely coming from behind, and that’s the biggest reason why their move into the space is a risk.

Every market tends to commoditize over time, with the pace of commoditization depending on the extent to which new features and capabilities can be introduced to justify additional spending.  Data center switching is surely less feature-rich than routing, and routing is already commoditizing, so it follows that the switching space has been under price pressure.  In fact, Facebook has been pushing its FBOSS open switching for five years now, and “white-box” switching was a big feature in the ONF’s OpenFlow SDN stuff.

There’s also the problem of sales/marketing.  First and foremost, there has never been a carrier network equipment vendor who was good at marketing, given that their buyers were awful at it.  Nokia is no exception.  Then there’s the fact that to market effectively, even if you know the general principles of marketing, you have to know a lot about your buyer.  That intimate knowledge is going to come only from direct relationships, meaning sales contacts.  If you don’t call on a given prospect, you’re unlikely to know much about them, and if you’ve had no products to sell them, you’re unlikely to have called on them.  Is Nokia a household word among cloud providers?  Hardly.

This, for Nokia, sure looks like the classic long pass on fourth down in the last minute of a football game you’re losing.  What in the world would make them go for it, other than that very football-game-like desperation?  There are two possibilities, and making either of them work will demand Nokia make some big changes.

The obvious possibility is that Nokia is indeed in that last-pass situation.  They’re behind both Ericsson and Huawei in the mobile space, and 5G is the only hope for carrier network equipment.  The cloud providers are an opportunity, but so are large enterprises with their own data centers.  Rivals Cisco and Juniper have enterprise sales of data center switches, and that gives them an advantage over a rival who might focus only on the cloud providers and operators.  Could Nokia be looking to get into the switching space more broadly?  Maybe.

The other possibility is that Nokia is reacting to the significant developments in the carrier cloud space.  Network operators are committed to virtualization, and in today’s world the commitment is visible both in NFV in general, and in virtualization in 5G in particular.  Future opportunities like IoT seem almost certain to demand hosting, and so it’s long been said that carrier cloud could be the largest single source of new data center deployments—including by me.  The problem is that the carriers themselves have been extraordinarily (even for them) slow in developing any real carrier cloud plan, much less a commitment.  They’re now increasingly favoring outsourcing of at least the early carrier cloud applications to the public cloud providers.  Could Nokia see that outsourcing as a foot in the public cloud door today, and also see a future push by operators to return to hosting their own carrier cloud apps?  Maybe.

If Nokia wants to be a broad-market data center switching player, their biggest challenge is that they don’t call on enterprises today and have little or no name recognition.  To succeed, they’d need an incredibly well-done marketing program, with a great (possibly dedicated) website presence and great collateral.  Without this, the burden placed on their sales personnel would be impossibly large, and it would be difficult to compensate them for the time needed to develop accounts.

Targeting only the public cloud providers might make things just a bit easier from a sales-marketing perspective, but this group has been looking more and more toward white-box switching and price pressure on the deals would be intense.  Because the cloud provider buyers are quite platform-savvy, Nokia would need a highly qualified sales support staff to handle questions and step in when needed.

The last of the options seems best, at least in terms of opportunity versus effort.  Nokia is knowledgeable about the carrier cloud opportunity, more so than nearly all the public cloud providers, which means they actually have an asset they could push through sales/marketing channels.  Nokia, as a credible 5G player, has the same sort of positioning advantage in direct sales of carrier cloud infrastructure, so they could credibly tell a story of transition—start small with a public cloud using Nokia-approved technology, then migrate to your own clouds—to operators.

If all Nokia had was switches, this would be a lost battle for them.  They do have more, however, including what they call “a new and modern Network Operating System”, a version of Linux designed to take advantage of microservices and cloud-native operation to unite a system of data center switches.  This, obviously, could be extended to other devices at some point, though Nokia isn’t saying it will be.  This makes Nokia’s story a story of cloud switching, which could be compelling.

“Could be”, but in order for the story to deliver, it has to overcome that sales/marketing challenge.  Nokia, like Alcatel and Lucent before it, has always had a very geeky approach to the world, one that hasn’t so far been able to overcome the basic truth that no matter how smart you are, you can’t win with a strategy that assumes your buyers are smart too.

Smart switches may be a necessary condition for a smart cloud, a cloud that’s capable of being elastic and efficient and operationalizable, but they’re not a sufficient condition.  The cloud is an ecosystem, and you can’t optimize it by tweaking just the connective tissue.

The good news?  Sales/marketing problems are relatively easy to fix.  All it takes is determination and a good, systematic, approach.  The bad news?  Through the evolution of three companies, Nokia hasn’t been able to fix them.  We’ll see how they do this time.