Telco Fall Tech Planning Cycle: Results

Every year, network operators do a technology planning review in the fall, usually between mid-September and mid-November. The purpose of the review is to identify the technologies that will compete for funds in the yearly budget cycle that usually starts after the first of the year. I’ve tracked these review cycles for decades, and I got a fair number of responses from operators regarding the topics of their planning this year. Now’s the time to take a look at them, and to do that I need to organize the responses a bit to accommodate the differences in how operators approach the process.

My normal operator survey base includes 77 operators, but they don’t all have a formal fall planning process. I asked the 61 operators who do, from all over the world, for their input. Of those, 56 responded with information, and for 45 of these I got data from multiple organizations and levels within the operator. CTO and CFO organizations responded most often, with the CTO organization represented in every response. Where both the CTO and CFO responded, I got responses from at least one other organization, and CEO data from about a third of the 45. We get our data under strict NDA and data use restrictions, so don’t ask for the details, please!

The one universal technology priority, one every major network operator cites, is the evolution to 5G technology and related issues. 5G is the only new technology that actually has universal budget support. Consumer wireline broadband is second, with support from almost three-quarters of operators (excluding only the mobile-only players). Other than these two technology areas, nothing else hits the 50% support level, and that’s a point deserving of comment in itself.

CEOs and CFOs who commented believed that what I’ve called the “profit-per-bit” squeeze was their company’s biggest challenge. They see ARPU growth in mobile and wireline, business and consumer, as very limited, and yet they see their costs ramping up. I’ve tended to use the revenue/cost-per-bit curves as my way of representing the challenge (because the first operator who told me about this over 20 years ago used them), but today the CFOs draw the charts as a slowly growing ARPU contrasted with a faster-growing cost curve. Most don’t forecast a dire crossover, but about half suggest that the gap between the two, which is a measure of profit, is at risk of shrinking to the point where they need something to happen, and it’s the “something” that sets the planning priorities.
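To make the shape of the squeeze concrete, here’s a minimal sketch in Python, using purely illustrative numbers of my own (not survey data): a slowly growing ARPU set against a faster-growing cost curve shrinks the profit gap without ever producing the dire crossover.

```python
# Illustrative model of the "profit-per-bit" squeeze: ARPU grows slowly,
# per-user cost grows faster, and the gap between them (profit) shrinks.
# All figures are hypothetical, chosen only to show the shape of the curves.

arpu = 50.0           # monthly revenue per user
cost = 35.0           # monthly cost per user
ARPU_GROWTH = 0.01    # ~1% yearly ARPU growth
COST_GROWTH = 0.04    # ~4% yearly cost growth

for year in range(2022, 2030):
    print(f"{year}: ARPU ${arpu:5.2f}, cost ${cost:5.2f}, "
          f"profit ${arpu - cost:5.2f}")
    arpu *= 1 + ARPU_GROWTH
    cost *= 1 + COST_GROWTH
```

Run it and the profit column falls steadily; the curves never cross inside the window, but the shrinking gap is exactly the “something has to happen” pressure the CFOs describe.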

Ten years ago, two-thirds of operators believed that reducing capex was the answer, which is what drove the NFV movement that started about then. Five years ago, just short of two-thirds believed that reducing opex was the answer, and in fact the majority of the actual changes operators made in the last decade to fend off the convergence of the curves were attacks on opex, usually head count.

This can still be seen in the responses of operators at the VP of Operations level. They want to sustain their current service base by controlling equipment costs through a combination of pushing out purchases and hammering for discounts. They want to reduce operations cost by improving the tools, including wider use of artificial intelligence. There is fairly strong interest (62%, driven mostly by an 85% interest in open-model 5G) in alternative architectures for network equipment, but that interest is focused on open-model networking, which they see as much as a way of extracting “discounts” from incumbent vendors as a way of getting lower-priced gear through direct competition and commercially available hardware.

The CIO organization, responsible for OSS/BSS, is necessarily focused on opex benefits. While almost half of CEOs and CFOs are asking whether OSS/BSS should be scrapped in favor of something (as yet unspecified) new, the CIOs are reaping the benefit of the shift in focus to opex reduction. Among this group, the top priorities are customer care and online ordering, with the latter considered “significantly” complete and the former “at less than half its full potential”.

Customer care views are sharply divided between wireline and wireless business units, of course, and it’s really the wireline side that received the planning focus this fall. Over three-quarters of operators said they believed that customer care was the largest reason for churn, and also the second-largest reason why wireline users selected a competitive service (price was still on top). Since customer acquisition and retention is the largest component of “network process opex”, it’s clear that addressing the issue is critical. CIO organizations all believe that improvements could be made to customer care portals to reduce user frustration. Progress at the business service level has been rated as “good”, but not so at the consumer level.

The problem with consumer customer care is a point of connection between the operations people and the CIO people, because wireline customer care is most likely to involve field service people. Consumers aren’t likely to be able to do much on their own to help fix a network problem, and most probably can’t even play much of a role in diagnosing one without an online wizard (presumably accessed via a smartphone, not the failed broadband connection!) to guide them. That’s the area where the CIOs are focused.

Field service, the “craft” people usually considered part of operations, likes the idea of a smartphone wizard, and about half of operators plan to work to improve this capability in 2022. Over a third say they already have a good strategy here, and the remainder think they’ll need more than just a year to get one ready.

This isn’t the only place where multiple operator organizations have symbiotic interests. Open-model networking is where operations and the “science and technology” or CTO organizations converge; it represents the number-one CTO-level priority to come out of the fall session this year. As it happens, 5G is also the focus of interest for the product/service management organization, where the interest in increasing ARPU is the highest. That, in my view, makes open-model networking the most important technology point in the current cycle.

The operations organization, as I’ve suggested, is really behind the open-model networking idea as a path to getting better discounts from the current vendors, not actually shifting to open-model infrastructure. It’s like getting a collective low bidder, from a network model that’s inherently based on commodity technology with no incumbents and no lock-ins. While operations people typically don’t say “all I want is to squeeze a bigger discount”, their points of interest seem to reflect a careful balance between the competitive benefits of open-model networking and the risk of a nobody-responsible wild west.

The CTO organizations are primarily concerned about how an open model comes about. Since all the standards activities aimed at open-model networks (including NFV) came out of the CTO groups, these people are motivated to defend these early initiatives, but at the same time they’re painfully aware that those initiatives have done very little to advance the open-model goal, despite the fact that open-model networking gets a planning priority nod from over 90% of operators’ CTO organizations. It’s one thing to define a model, but an open-model network requires that the model actually work well enough to deploy widely. If NFV isn’t it, then what is, and how do we get there?

Interestingly, less than a third of CTO organizations said they believed that another standards effort, either within the NFV ISG context or outside it, was the answer. Nearly all said a new organization would take too long, and over half thought that it would also take too long to get NFV cleaned up. In fact, no “positive” approach got even fifty-percent support. The most support (47%) came for the idea that 5G O-RAN work would evolve to create an open model, but nobody expressed any specific notion of how that would happen. O-RAN, after all, is about RAN, not 5G overall, and 5G isn’t all of open-model networking.

If you pull out the viewpoints of technical planning staff people across all the operators’ organizations, the sense I get is that the experts think that “the industry” or “the market” is evolving to an open-model strategy, and that the evolution is being driven by much the same forces that have combined to create things like Linux, Kubernetes, Istio, and LibreOffice. In other words, open-source equals open-model. CTOs assume that open-source software models will develop, as O-RAN did, and that a dominant model will appear. That model will allow for variations in implementation but will set the higher-level features and likely the APIs.

The product management groups are divided between those who believe that enhancements to connection services can increase ARPU (41%) and those who think that only higher-layer (above the connection) services can create any significant revenue upside (57%). A small number believed in both approaches, and an even smaller number didn’t think either was worth considering.

I think that Ericsson’s recent M&A is aimed at the product management interest in new service revenues as a means of driving ARPU up. There is broad support for a new revenue strategy (77% of the planners involved in the fall planning said they thought new revenue was “essential”, and almost 100% thought it was “valuable” in improving profits), and it’s interesting that Ericsson linked the Vonage acquisition to 5G, which is a budgeted technology. They likely see that operators would jump on something that could help them in 2022, and packaging the stuff needed for enterprise collaborative services could be at least credible.

This year, operators also had a specific “imponderables” focus in their planning. The obvious top of that list is the impact of COVID, and the planning cycle and surveys were complete before the announcement of flight cancellations and travel restrictions associated with the new Omicron variant. If we get a combination of a big winter surge in COVID worldwide (there already is one in some areas) and an Omicron variant that proves higher-risk than Delta (particularly if it’s vaccine-resistant), then we can assume we’ll see WFH boom again. If not, then we can assume a continued return to normalcy in communications needs. Operators are watching all this, but obviously they can’t yet make decisions.

Much of this year’s planning cycle focused on issues that were also discussed last year, meaning that operators either hadn’t reached a decision or weren’t able to implement one. This year, almost 60% of operators thought that they probably wouldn’t be able to address their planning issues satisfactorily in 2022 either, and that they’d still be trying to address most of these points in next year’s planning cycle. I think the laissez-faire approach to open-model networking that I recounted is, like this broad pessimism, a result of operators recognizing that they aren’t building demand by building supply, and that someone has to learn how to do both, properly. That’s progress, I guess, but those operators are still looking for someone else to do the learning, and the job.

There were no spontaneous statements to suggest operators were really seeing the responsibility for network and technology change any differently. They still see themselves as consumers rather than developers of technology, and their role in network evolution as being standards-setting, largely to prevent vendor lock-in. Even operators who have actually done some software development, and who plan to do more, are still reluctant to admit that they’re doing “product” work, and that explains why they tend to cede their software to an open-source group as quickly as they can. They admit that once this is done, their own participation is more likely to ease than to increase.

The network is changing, and the role of everyone who’s a stakeholder is doing the same. Some admit it, even embrace it, but not operators. That’s the biggest weakness in their planning process; they’re not planning for the real future at all.

How Smart Chips are Transforming Both Computing and Networks

I don’t think there’s any disagreement that network devices need to be “smart”, meaning that their functionality is created through the use of easily modified software rather than rigid, hard-wired logic. There is a growing question as to just how “smartness” is best achieved. Some of the debate has been created by new technology options, and the rest by a growing recognition of the way new missions impact the distribution of functionality. The combination is impacting the evolution of both network equipment and computing, including cloud services.

To most of us, a computer is a device that performs a generalized personal or business mission, but more properly the term for this would be a “general-purpose computer”. We’ve actually used computing devices for very specific and singular missions for many decades; they sent people into space and to the moon, for example, and they run most new vehicles and all large aircraft and ships. Computers are (usually) made up of three elements—a central processing unit (CPU), memory, and persistent storage for data.

Network devices like routers were first created as software applications running on a general-purpose computer (a “minicomputer” in the terms of the ’60s and ’70s). Higher performance requirements led to the use of specialized hardware technology to manage the data-handling tasks, but network devices have from the first retained a software-centric view of how features were added and changed. All the big router vendors have their own router operating system and software.

When you attach computers to networks, you need to talk network to the thing you’re attaching to, which means that network functionality has to be incorporated into the computer. This is usually done by adding a driver software element that talks to the network interface and presents a network API to the operating system and middleware. Even early on, there were examples of network interface cards for computers having onboard intelligence, meaning that some of the driver was hosted on the adapter instead of on the computer.
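As a rough sketch of that layering (Python stands in for pseudocode here, and the class and method names are my own inventions, not any real driver API), the point is that the layers above see one network API whether the work runs on the host or on the adapter:

```python
# Hypothetical sketch: the OS talks to one network API; the driver decides
# how much of the work runs on the host versus on the adapter itself.

from abc import ABC, abstractmethod

class NetworkDriver(ABC):
    """The uniform API the driver presents to the OS and middleware."""
    @abstractmethod
    def send(self, payload: bytes) -> None: ...

class HostDriver(NetworkDriver):
    """A 'dumb' NIC: the host CPU does the framing itself."""
    def send(self, payload: bytes) -> None:
        frame = b"\x00" * 14 + payload  # host assembles a (fake) header
        print(f"host built a {len(frame)}-byte frame for the NIC")

class SmartNICDriver(NetworkDriver):
    """An intelligent NIC: framing is offloaded to the adapter."""
    def send(self, payload: bytes) -> None:
        print(f"handed {len(payload)}-byte payload to adapter; it frames")

# The operating system and middleware see the same API either way.
for driver in (HostDriver(), SmartNICDriver()):
    driver.send(b"hello, network")
```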

The spread of software hasn’t halted the spread of specialized hardware. Specialized chips for networking have existed for decades, and today switching and interface chips are the hardware soul of white-box devices. In the computer space, specialized chips for disk I/O and for graphics processing are almost universal. It’s not that you can’t do network connections, I/O, or graphics without special chips, but that you can do them a lot better with those chips.

So why are we now hearing so much about things like smart NICs, GPUs, IPUs, DPUs, and so forth? Isn’t what we’re seeing now just a continuation of a revolution that happened while many of today’s systems designers were infants, or before? In part it is, but there are some new forces at work today.

One obvious force is miniaturization. A smartphone of today has way more computing power, memory, and even storage than a whole data center had in the 1960s. While the phone does computing, graphics, and network interfacing, and while each of these functions could be chipified individually, there’s significant pressure to reduce space and power requirements by combining things. Google’s new Pixel 6 has a custom Google Tensor chip that replaces a collection of traditional chips, incorporating CPU, GPU, security, AI-processor, and image-signal-processor functions. IoT devices require the same level of miniaturization, to conserve space and of course minimize power usage.

Another force is a radical revision in what “user interface” means. By the mid-1980s, Intel and Microsoft both told me, over two-thirds of all incremental microprocessor power used in personal computers was going to the graphical user interface. That’s still true, but what’s changed is that we’re now requiring voice recognition, image recognition, natural language processing, inference processing, AI and ML and all those other things. We expect computing systems to do more, to be almost human in the way they interact with us. All that has to be accomplished fast enough to be useful conversationally and in the real world, and cheap in terms of cost, power, and space.

Our next new force is mission dissection, and this force is embodied by what’s going on in cloud computing. The majority of enterprise cloud development is focused on building a new presentation framework for corporate applications that are still running in the usual way, usually in the data center, and sometimes on software/hardware platforms older than the people who run them. The old notion of an “application” has split into front-end and back-end portions, and the front-end piece is a long way from general-purpose computing, while the back-end piece is a long way from a GUI. In IoT, we’re seeing applications broken down by the latency sensitivity of their functions, in much the same way as O-RAN breaks out “non- and near-real-time” functions.

The final new force is platform specialization encouraging application generalization. We separated out graphics, for example, from the mission of a general-purpose CPU chip because graphics involved specialized processing. What we often overlook is that it’s still processing that’s done by software. GPUs are programmable, so they have threads and instructions, and all the other stuff that CPUs have. They just have stuff designed for a specific mission, right? Yes, but an instruction set designed for a specific mission could end up being good for other missions, not just the one that drove its adoption. We’re seeing new missions emerging that take advantage of new platforms, missions that weren’t considered when those platforms were first designed.

What do these forces mean in terms of the future of networking and computing? Here are my views.

In the cloud, I think that what we’re seeing first and foremost is a mission shift driving a technology response. The cloud’s function in business computing (and even in social media) is much more user-interface-centric than general-purpose computing. GPUs do a great job for many such applications, as do the RISC/ARM chips. As we get more into the broader definition of “user interface” to include almost-human conversational interaction, we should expect that to drive a further evolution toward GPU/RISC and even to custom AI chips.

The edge is probably where this will be most obvious. Edge computing is all about real-time event handling, which again is not a general-purpose computing application. Many of the industrial IoT controllers are already based on a non-CPU architecture, and I think the edge is likely to go that way. It may also, since it’s a greenfield deployment, shift quickly to the GPU/RISC model. As it does, I think it will drive local IoT toward more of a system-on-chip (SoC) model by unloading some general functionality. That will make IoT elements cheaper and easier to deploy.

At the network level, I think we’re going to see things like 5G and higher-layer services create a sharp division of functionality into layers. We’ll have, to make up some names, “data plane”, “control plane”, “service plane” (the 5G control plane would fall into this, as would things like CDN cache redirection and IP address assignment and decoding), and “application plane”. This will encourage a network hardware model that’s very switch-chip-centric at the bottom and very cloud-centric (particularly edge-cloud) at the top. I think it will also encourage the expansion of the disaggregated cluster-of-white-boxes (like DriveNets) model of network devices, and that even edge/cloud infrastructure will be increasingly made up of a hierarchy of devices, clusters, and hosting points that are all a form of resource pool.
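To make that layering concrete, here’s a sketch of the kind of function-to-plane mapping I mean; the plane names are the made-up ones above, and the hosting assignments are my illustrative guesses, not any standard:

```python
# Hypothetical function-to-plane mapping, with the kind of hardware most
# likely to host each plane. Assignments are illustrative, not standardized.

PLANES = {
    "data plane":        ("white-box switch chips",
                          ["packet forwarding", "traffic management"]),
    "control plane":     ("white-box CPUs / local clusters",
                          ["routing protocols", "topology exchange"]),
    "service plane":     ("edge cloud",
                          ["5G control plane", "CDN cache redirection",
                           "IP address assignment and decoding"]),
    "application plane": ("edge/metro cloud",
                          ["higher-layer services", "hosted experiences"]),
}

for plane, (host, functions) in PLANES.items():
    print(f"{plane:18s} on {host}: {', '.join(functions)}")
```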

What’s needed to make all this work is a new, broader, notion of what virtualization and abstraction really mean, and how we need to deal with them. We know, for example, that a pool of resources has to present specific properties in order to be suitable for a given mission. If we have a hundred servers in New York and another hundred in Tokyo, can we say they’re a single pool? It depends on whether the selection of a server without regard for location alters the properties of the resource we map to our mission to the point where the mission fails. If we have a specific latency requirement, for example, the two sites likely couldn’t be treated as one pool. We also know that edge computing will have to host things (like the now-popular metaverse) that will require low-latency coordination across great distances. We know that IoT is “real-time” but also that the latency length of a control loop can vary depending on what we’re controlling. We know 5G has “non-real-time” and “near-real-time”, but just how “non” and how “near” those are isn’t explicit. All of this has to be dealt with if we’re to make the future we achieve resemble the future we’re reading about today.
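Here’s a minimal sketch of that New York/Tokyo question (the latencies and function names are hypothetical): the two sites form a single pool for a given mission only if a location-blind selection still satisfies the mission’s latency property.

```python
# Hypothetical sketch: two server groups are a "single pool" for a mission
# only if any member, picked without regard for location, still meets the
# mission's latency requirement. Latency figures are illustrative.

SERVERS = (
    [{"site": "New York", "rtt_ms": 8}] * 100 +    # near our user
    [{"site": "Tokyo",    "rtt_ms": 160}] * 100    # far from our user
)

def usable_pool(servers, latency_budget_ms):
    """Return the servers whose latency fits the mission's budget."""
    return [s for s in servers if s["rtt_ms"] <= latency_budget_ms]

for mission, budget_ms in [("page assembly", 200), ("real-time control", 20)]:
    pool = usable_pool(SERVERS, budget_ms)
    sites = sorted({s["site"] for s in pool})
    print(f"{mission}: {len(pool)} usable servers in {sites}")
```

For the tolerant mission, all 200 servers are one pool; for the real-time one, only New York qualifies, so the “single pool” abstraction fails.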

We’re remaking our notion of “networking” and “computing” by remaking the nature of the hardware/software symbiosis that’s creating both. This trend may interact with attempts to truly implement a “metaverse” to create a completely new model, one that distributes intelligence differently and more broadly, and one that magnifies the role of the network in connecting what’s been distributed. Remember Sun Microsystems’ old saw, that “The network is the computer?” They may have been way ahead of their time.

Net Neutrality…Again

The Net Neutrality issue has raised its head again, as reported in this Light Reading piece. This time, EU telecoms are pushing back against the big US-based web companies (not named, but the identities are obvious), who they say are exploiting telco investments in access infrastructure to reach their users and earn significant revenues and profits. Yes, they may be correct, but I think that expecting a neutrality-linked resolution isn’t the right, or even a possible, solution.

Up to the 1980s, telecommunications services were universally based on a model that shared the revenue for a service across operators who participated. Even voice calls involved a “termination charge” assessed by the operators who received a call, since the charge for the call was paid to the originating operator. But from the first, the Internet adopted a “bill-and-keep” model, meaning that there was no settlement among providers for their role in a given service.

The bill-and-keep model favors having Internet-based services connected to the Internet through specialized operators rather than retail ISPs, so a social-media site wouldn’t pay AT&T or Verizon or BT or Orange for even their own connections. Retail broadband Internet has the lowest revenue per bit of any service, which means that the retail ISPs are stuck with large investments to cover residential users—investments that have a very low ROI. Consumers need increased Internet bandwidth to consume the services they’re being offered, but they’re not willing to pay proportionally for the increased capacity. That, telcos have long said, is a disincentive for infrastructure investment.
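Some rough arithmetic shows how stark the revenue-per-bit gap is; the prices and usage figures below are my own illustrative round numbers, not operator data.

```python
# Illustrative revenue-per-bit arithmetic: why retail broadband sits at the
# bottom. All prices and usage figures are hypothetical round numbers.

# Legacy voice: ~$0.05 per minute over a 64 kbps channel.
gb_per_voice_minute = 64_000 * 60 / 8 / 1e9   # bits/s * s -> bytes -> GB
voice_rev_per_gb = 0.05 / gb_per_voice_minute

# Retail broadband: ~$50/month for a household using ~500 GB/month.
broadband_rev_per_gb = 50 / 500

print(f"voice:     ${voice_rev_per_gb:7.2f} per GB delivered")
print(f"broadband: ${broadband_rev_per_gb:7.2f} per GB delivered")
print(f"ratio: roughly {voice_rev_per_gb / broadband_rev_per_gb:,.0f}-to-1")
```

On these assumptions, voice earned on the order of a thousand times more revenue per delivered gigabyte than retail broadband does.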

On the other hand, the bill-and-keep model favors Internet startups, and the VCs that fund them, because it limits the cost of accessing prospective users and delivering new services. That, it can be argued, is largely responsible for the wave of innovation we’ve seen in the Internet. The telcos, the retail ISPs, had the opportunity to play a role in these higher-layer services, but not only were they unwilling to jump into a new and uncomfortable (for them) business, they were often inhibited by regulatory policies that were aimed at protecting Internet competition from players who built their business as regulated monopolies.

The specific focus of the operators’ complaints is the cost of mobile infrastructure, meaning 5G, and this whole complaint is IMHO linked to operators’ concerns that 5G will not generate any new revenue for them, or at least not nearly enough to create a suitable return on the investment it requires. They also mention that spectrum license policies, which often admit bidders who have little credibility and may be acquiring licenses only to sell them later at a profit, are driving up the cost of mobile deployment.

The problem with this is that being “made to bear some of the costs of network development” is a goal, not a mechanism to achieve it. Do they expect the tech companies to make some sort of direct payment? How would they be compelled to do that? Do they want a return to settlement practices? In many countries, regulations prohibit that. Finally, it sure seems likely that whatever measures were suggested to make OTTs share in network costs would need legislation/regulation to establish and enforce. That raises the big problem: ordinary people.

Ordinary people see “the Internet” as what they get over it. They see “broadband” or their “ISP” as a kind of water and sewage company, a provider of a pipe that matters only if it’s not working. Tell the OTTs that they have to contribute to telco infrastructure and they’ll run a million ads to convince their users that this is going to hurt their social-media experience or raise the cost of their streaming service. Let’s see, a billion consumers go to their leaders and threaten them with political expulsion, with a dozen telcos on the other side. We know how that will turn out, because it’s turned out that way many times in the past when neutrality policy came up.

Then there’s the stock market. All those retirement plans and ETFs that are invested in OTT stocks. Hit the OTTs with a big new cost and what happens to their share price? D..U..M..P. More people are now rushing to their political leadership demanding a reversal, and so are all the big financial companies and pension plans.

There was a time when this discussion was useful, and I was involved in it at that time. When there was no real OTT industry, no streaming video, and little more than dial-up Internet, it might have been possible for enlightened regulation to come up with a solution. Even the nascent Internet industry, in the 1980s, feared the impact of bill-and-keep on infrastructure investment. But nothing was done then, and it’s time to face a hard truth, which is that nothing is going to be done now either.

For over 30 years, this problem has been visible, and it’s not been fixed. ISPs forecast doom if something didn’t reverse the trends, and yet they’re still operating and we don’t see any major stress cracks—at least no more than we’ve seen all along. Operators have cut costs in various ways to address the low ROI on broadband infrastructure, and we chug along. Maybe at some point, there will actually be an ROI crisis for ISPs, but not today, nor tomorrow, nor likely (in my view) in the next three years or so. In politics, anything you can put off for three years, you put off without another thought.

If we’re going to see a remedy for this, it’s not going to come from forced settlement between OTTs and operators. It’s going to have to come from enlightened thinking on the part of the operators themselves. If infrastructure is too costly, we need to make it cheaper. Most of the cost improvements we’ve made in networking have been tweaks to the legacy network models that operators have clung to for decades. A new, better, network model is needed.

That’s why I’m so focused these days on metro, because it’s metro that matters, for three reasons.

Reason One: Metro is where operator 5G investment is focused. If operators are looking for mobile infrastructure cost relief, this is where the majority of the cost will be incurred, so this is where the relief needs to be focused.

Reason Two: Metro is where new services will reside. Edge computing in particular is dependent on achieving a balance between close-to-the-user and deep-enough-for-economy-of-scale. Metro is where that balance can be expected. Telco partnerships with cloud providers, which could be valuable in reducing infrastructure investment, are already focused in the metro.

Reason Three: Metro is where OTTs will need resources for their own future services. Meta’s (Facebook’s) interest in the metaverse makes it all but certain that a metaverse will be a part of social-media evolution. You need to host a major chunk of the metaverse at the edge, in the metro. CDNs, vital for streaming video, are already there.

If operators want OTTs to help with investment, they need to work out a symbiotic model for metro investment, a model that lets the players contribute their needs and resources to build what’s essentially a multi-tenant metro framework. That doesn’t mean the operators need to define the model, and in fact it’s probably more likely that vendors on the software and server sides, and the public cloud providers, will cooperate to do the heavy lifting. What operators need to do is make that mission into a mandate, rather than hoping for changes that we’ve failed to accept for three decades already.