The Shape of 2022 to Come: Software and the Cloud

Wall Street, like most everyone, tends to put out prediction pieces at the end of the year. While the Street’s predictions aren’t always right, they do represent the view of a group of people whose livelihood depends on predicting at least major revenue trends. My own research isn’t always right either, so perhaps by looking at both Street and CIMI views in parallel, we can get at least a glimpse of 2022.

Let’s start with software. Most Street analysis says that software is looking promising for 2022, but what’s particularly interesting is the specific types of software they’re most positive about. “Digital transformation” leads the list, a term that’s admittedly vague but that in practice is shorthand for a shift to a more direct sales and support model, which (as we’ll see below) also benefits the cloud. Other software categories on the “good” list include work-from-anywhere, data/analytics, DevOps, and security.

All of the secondary software categories are a bit “niche” in nature. That’s borne out by the fact that the specific software companies the Street likes tend to be very specialized, smaller firms. If software were the Great Tide Lifting Boats, you’d expect to see software giants (with, to follow my analogy, bigger and more numerous boats) getting a lot of lift. Not so.

One common theme across software in general is a Street appreciation for “subscription”, meaning paying a fee (usually annual but sometimes monthly) for software, turning software into a recurring revenue stream. Obviously sellers like that sort of thing, and the concept has spread significantly over the last decade. However, subscription software is the space where what the Street likes and what enterprises and other buyers like diverge the most. Subscription software is nice for the vendor because of that recurring revenue, but buyers dislike it for exactly the same reason, and you can see the pushback in action in recent news.

Downloads of open-source LibreOffice hit record levels recently. As perhaps the leading open-source competitor to Microsoft’s Office suite, LibreOffice adoption is a measure of software-subscription angst. We also heard that at least one EU government is working to shift to LibreOffice, and I know a number of enterprises that have already done the same, citing the growing cost of subscription services from Microsoft.

Adobe, of course, is largely dependent on subscription software with its Creative Suite, and the back pressure that model generates among users is why some analysts don’t like it. However, Adobe serves a community of professionals who’ve spent a lot of time learning the tricks and techniques of the software, so it’s not that easy to discard. Still, the Street increasingly seems to see the subscription model as a mixed blessing.

My view, based on my own experiences and those of enterprises I’ve chatted with, is that subscription pricing is eventually going to face enough back pressure to make it unreliable as a new revenue source. Users want more incremental value year over year, and if that’s not delivered they reason that they’re paying more for less. However, this is most likely to impact software where ongoing integration and support aren’t a big part of the subscription value proposition. Software creates the functional value of IT, so I don’t think the space is in great jeopardy, but I do think that subscription-based software is at risk of benefit stagnation.

The next area is the cloud, and here I think the Street is suffering from a lack of understanding of the whole of enterprise IT evolution. What buyers tell me is pretty simple; they spend where they have a justification. If you look at IT infrastructure broadly, you see that for decades we’ve spent more on IT because we got direct productivity and efficiency benefits from it. IT budgets were at one time biased at about 62% for new projects to improve productivity and 38% for “modernization”. In 2022, I’m told that we’ll see new productivity projects drop below 50% of spending for the third year in a row. All the big IT successes of the past were created by major paradigm shifts that created cyclical upswings in the relationship between IT spending growth and GDP growth overall. We had three such waves in the first 40 years of IT, and we’ve had none since then. It’s not that we’ve not seen change, but not seismic change of the sort that creates big spending jumps. Thus, we’re tweaking the edge of things we’ve done for decades.

What the Street is missing is that this doesn’t mean that things are “shifting” to the cloud in the sense that they’re leaving the data center. What’s happening is that core business applications are seeing modest growth because they tend to grow at the pace of business activity (GDP), while the front-end elements of applications that better support direct online sales and support are gaining, because they improve organizational efficiency. We are doing in the data center what we always did, but we’re doing new stuff more in the cloud. The ramp in IT spending that the Street sees in 2022, and saw in 2021, relates to the use of the cloud to better address sales and support. Obviously, this benefits the major cloud providers, Amazon, Google, Microsoft, and Salesforce, and it’s recently helped Oracle.

The problem in this space is in a sense the same as the problem with subscription software. The cloud is benefitting not from displacing the data center but from its ability to address the more variable workloads associated with online sales/support. In other words, we’re shifting away from human-mediated activity to automated activity in those areas. The cloud is not the driver, the automation goal is. At some point, the shift will be complete, at which time growth will be slower. The cloud will then need specific new drivers, which may favor things like SaaS because it can target a productivity/efficiency benefit more directly. However, if you have to target things specifically to find benefits, you’re still tweaking, not revolutionizing.

The biggest advantage of software as a spending target for enterprises or service providers lies in the fact that features are created through software, and features are what can tie spending to benefits. A shift to a software emphasis makes sense if that’s what’s going on, but the problem is that this isn’t the case for many vendors. Instead, they’re using subscription to play on the capital/expense issue, and software is the only easy place to apply subscription pricing. Software will indeed do better, and public cloud will do better, when compared to the hardware and data center spending still largely tied to GDP growth. But neither software nor the cloud will provide revolutionary returns, not until they address revolutionary benefits.

We can see a trend, and an issue, in Street research. They deal with the market in a helpful way at one level, by favoring companies that represent “innovation” and thus stand to gain the most in market share, revenue, and share price. Right now, they’re saying that innovation is focusing on cost and mistake management—operations, analytics, and so forth. The issue is that they see the market as a series of loosely coupled product segments, when buyers of software and cloud services are really buying support for processes, not products. Keep this in mind as we move to the next topic, networking.

Are New Players Making the Cloud More Competitive?

What statement about the cloud draws the broadest agreement among enterprises? “My cloud services are too expensive!” is the answer. Not only that, cloud costs are the number one reason enterprises give for not doing more in the cloud, and the number one reason they give for looking at another cloud provider. On the surface, this is a pretty profound insight into user attitudes about cloud computing, but it’s even more profound if you dig deeper.

The Economist ran a piece on “The Battle of Competing Clouds” and its growing intensity. It points out that the rumored gross margin for Amazon’s public cloud services is 60%, a level that’s attracting competition from smaller players, including Oracle, which recently reported much better cloud revenues than expected. There are some good points in the story, but I also think it misses some key ones.

One true point the story makes early on is that users bought into the cloud for its elasticity and agility rather than because of its lower cost. Those of you who have followed my writing over the years know that I’ve always said that the economy of scale of a public cloud provider isn’t enough greater than what an enterprise could achieve on its own for “moving to the cloud” to result in lower costs, once provider profit margins are factored in. The problem is that moving to the cloud was never the real goal for enterprises. Instead, what they were looking for was the ability to support highly variable workloads with broad geographic distribution. It wasn’t the stuff in the data center that they were trying to move, it was stuff never in the data center that they were trying to develop, and develop right.

A cloud giant like any of the Big Three (Amazon, Google, Microsoft) has the ability to deploy resource pools that can provide agile capacity for a large base of customers, when private facilities for any such customer would require building out to the peak capacity needed, at a far higher cost. It wasn’t economy of scale, but elasticity of scale that was valuable. However, elasticity of scale is still ultimately about costs; in this case, the cost of a capacity-agile hosting option.
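A back-of-the-envelope sketch, with invented numbers, shows why elasticity of scale is still a cost story: private capacity has to be built to the peak and paid for continuously, while cloud capacity is priced higher per unit but follows the load.

```python
# Illustrative only: hypothetical unit costs and a spiky load profile.
hourly_load = [20, 20, 25, 30, 200, 180, 40, 25]   # capacity units needed per hour
peak = max(hourly_load)

private_unit_cost_per_hour = 0.10   # assumed cost of owned capacity
cloud_unit_price_per_hour = 0.25    # assumed (higher) on-demand price

# Private infrastructure must be built to the peak and paid for around the clock.
private_cost = peak * private_unit_cost_per_hour * len(hourly_load)

# Cloud capacity scales with the load, so you pay only for what each hour uses.
cloud_cost = sum(load * cloud_unit_price_per_hour for load in hourly_load)

print(f"build-to-peak private: {private_cost:.2f}")  # 160.00
print(f"pay-per-use cloud:     {cloud_cost:.2f}")    # 135.00
```

With a steady load the arithmetic flips, which is exactly why the value shows up for variable front-end workloads rather than for the stuff already in the data center.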

A fledgling cloud provider faces a problem much like that of an enterprise looking for “private cloud”. How do you build out to match the elasticity of scale of a giant? You either have to risk a boatload of money or you have to try to find buyers whose situation fits a more practical infrastructure model. The former option is practical for some players like Oracle or IBM, who have other products/services to generate the capital needed. The latter requires a shift in buyer thinking, and that shift is the big news in cloud competition that The Economist missed. It’s “competitive multi-cloud”.

Amazon had a major outage recently, and of course it’s not the first for them, nor are they the only cloud provider who’s had one. When something like that happens, the companies who have depended on their cloud provider for the critical front-end relationship with their customers and suppliers take a major hit. They want to mitigate their risk, and so they look at spreading their commitment to the cloud among multiple providers, which is what multi-cloud means. By carefully targeting their services and carefully placing their resources, smaller up-and-coming providers could make a go of it in this competitive multi-cloud market.

Despite the hype we’ve had around multi-cloud, enterprises don’t use it nearly as much as you’d think from the stories, but that’s changing…slowly. The reason for the slow change is that the major cloud players understand the risk of cherry-picking competition, the “death of a thousand cuts” that erodes their profit as small players grab specialized niches. They have multiple weapons to address that risk and they’ve used them regularly.

High among their weapons list is value-added platform services, or “web services”. These are software features hosted in the public cloud and available (for a fee) to developers. Look at the websites of the public cloud providers and you’ll see dozens of such features listed. Each of them makes development of a cloud application easier, and each of them is implemented in a slightly different way across the cloud providers. Applications that are written for Cloud A will almost always have to be changed to run on Cloud B. Operations practices are also likely different for each cloud, and multi-cloud operations and management is harder than that of a single cloud. All these things discourage enterprises from trying to use multiple providers.
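A small sketch of why that matters for portability, assuming only the standard AWS (boto3) and Google Cloud Pub/Sub client libraries, with a hypothetical queue URL and project/topic names: the same logical task, queuing a message for asynchronous processing, is written against different services, clients, and call signatures, so moving from Cloud A to Cloud B means rework.

```python
# A sketch, not production code: the same "queue a message" task written twice.
import boto3
from google.cloud import pubsub_v1

def queue_order_aws(order_json: str):
    """Cloud A: Amazon SQS via boto3 (the queue URL is hypothetical)."""
    sqs = boto3.client("sqs")
    sqs.send_message(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        MessageBody=order_json,
    )

def queue_order_gcp(order_json: str):
    """Cloud B: Google Pub/Sub; a different service model, client, and signature."""
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "orders")  # hypothetical project/topic
    publisher.publish(topic_path, data=order_json.encode("utf-8"))
```

Multiply that by dozens of platform services (identity, serverless, AI, messaging) and the switching cost becomes the lock-in.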

Volume discounts are another factor. If cloud provider pricing has usage tiers, which most do, then users will pay more if they divide their cloud applications across providers. That’s true even when enterprises lack the skill or resources needed to truly audit their cloud bills and determine just how much they’re actually paying, or saving.
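A minimal illustration of the tiering effect, using invented price breaks: the same total usage costs more when it’s split across two providers, because neither bill reaches the discounted tier.

```python
# Hypothetical volume tiers: first 100,000 units at $0.09 each, the rest at $0.05.
def tiered_bill(units, tier_break=100_000, high=0.09, low=0.05):
    """Price 'units' of usage against a simple two-tier volume discount."""
    if units <= tier_break:
        return units * high
    return tier_break * high + (units - tier_break) * low

total_usage = 160_000
single_provider = tiered_bill(total_usage)
split_two_ways = tiered_bill(total_usage // 2) * 2

print(single_provider)   # 12000.0 -> 100k at the high rate, 60k at the low rate
print(split_two_ways)    # 14400.0 -> each 80k bill stays entirely in the expensive tier
```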

Will the evolving cloud landscape make cloud services more competitive? It’s true, as the article suggests, that customers are putting more pressure on the Big Three to control the cost of cloud services, and it’s likely that there will be an erosion in cloud service prices and margins from the major players. But does that really generate any competitive pressure? Do other cloud providers, like Oracle or IBM or Cloudflare, find the market attractive when they have to know that the giants will cut their own profit margins to keep competition out? Wouldn’t the giants also advance the platform-service features they already offer?

There’s already pressure on the cloud giants, particularly Amazon, to change their data transfer charging policies, which currently penalize movement of data across cloud boundaries, and some changes have come about. A broad trend to eliminate these transfer costs could lower multi-cloud costs, but my discussions with enterprises suggest that most multi-cloud deployments are based on a “parallel hybrid” model, where multiple cloud providers home back to a common set of data center applications and traffic crosses between clouds only in exceptional situations like a cloud provider failure.

Transfer cost reductions could help cloud providers themselves, though. While it’s convenient to talk about “front-ending” legacy data center applications with agile cloud components, the boundary isn’t a sharp one. As you move toward the user from the transactional applications, you see a gradual increase in the value of scalability and agility, and where that translates to a cloud versus data-center implementation depends on costs. If moving data across the cloud boundary were less expensive, then more components could cross the cloud border. But that commits more to the current providers rather than empowering new competition.

The real driver of increased cloud competition has to come from cloud-independent platform features. As long as every major provider has their own platform tools, it makes it harder for users to adopt multiple clouds because their software isn’t portable. We have cloud software and features from players like IBM/Red Hat and VMware, and many of the enterprises who are really trying to leverage hybrid cloud use this software instead of cloud-provider platform tools to avoid lock-in. Oracle is able to leverage its software success to create cloud success, showing that software-centric niches are possible. Since the cloud market is growing, niches are also growing, and this may give the impression of “competition” when it’s really a rising tide doing its usual boat-lifting.

Big cloud providers have the advantage in terms of capacity elasticity at a reasonable price, making it hard for the next tier to push out of being niche players. More competition in the cloud space can only come from that sort of push, because that’s the only demand shift that could create it. I don’t think that the forces needed to transform cloud competition are in play right now, and I think it will likely be several years until they can even start to transform the competitive landscape. I do think that second-tier players will work to address larger niches, and that the big players will respond with changes to pricing. Competition, then, may help cloud buyers by its threat more than its reality.

Reading Through Ciena’s Strong Quarter: Is Fiber Winning?

Ciena just posted the first billion-dollar quarter in its history, and that’s surely big (and good) news for the company and investors. What sort of news is it for the networking market overall? I think it’s a sign of some major shifts in thinking and planning, things subtle enough to be missed even by some network operator planners, but important enough to impact how we build networks overall. And, of course, the vendors who supply the gear.

If we look at packet networking’s history, we can trace things back to a RAND study in the mid-1960s, which noted that overall utilization was improved by breaking data into “packets” that could be intermingled to fill up a link with traffic from multiple sources. For most of the history of data networking, we were focused on traffic aggregation to achieve transport economy of scale and to provide uniform connectivity without having to mesh every user with every source. That’s not going away…at least, not completely.

Fiber transport has gotten cheaper every year, and as that happens, the value of concentrating traffic to achieve economies of scale has lessened. Operators started telling me years ago that it was getting cheaper to oversupply bandwidth than to try to optimize how it was managed. Since then, electrical handling of packets has been less about transport economy and more about supporting full connectivity without meshing every endpoint.

The problem with packet networking is that it introduces complexity, and the complexity means that operations costs tend to rise and operations errors rise too. Network configuration problems, misbehavior of adaptive protocols designed to manage traffic and routing, and simple scale have combined to make opex a big problem. Historically, network-related opex has accounted for more cents of every revenue dollar than capex, in fact, and operators have worked hard to drive opex out of their costs, to sustain reasonable return on infrastructure.

Fiber transport is perhaps the single most essential element of the network. There are a lot of ways we can push packets, but they all rely on fiber to transport data in bulk, and in many cases (like FTTH) even to individual users. Ciena’s quarter may be telling us that network trends are combining to make fiber more important, even relative to the packet infrastructure that overlays it. The best way to understand what that would mean for networking overall is to look at what’s driving a change in how we use fiber itself.

One obvious change in fiber is in the access network. Whether we’re talking about fiber-to-the-home (FTTH) or to the node (FTTN) or tower (backhaul), we’re still talking about fiber. As per-user capacity demands increase, it’s more and more difficult to fulfill them without taking fiber closer, if not right up to, each user. Copper loop, the legacy of the public switched telephone network (PSTN), has proved unable to reliably deliver commercially credible broadband, and fiber is exploding in access as a result.

Another change that’s less obvious is the dominance of experience delivery as the primary source of network traffic. Video is a bandwidth hog, and for years regular reports from companies like Cisco have demonstrated that video traffic growth is by far the dominant factor in Internet traffic growth. With lockdowns and WFH, even enterprises have been seeing video traffic (in the form of Teams and Zoom calls, for example) expand.

Cloud computing is the other factor in enterprise bandwidth growth. To improve sales and customer support, businesses worldwide have been expanding their use of the Internet as a customer portal, and pushing the customized experience creation process into the cloud. The Internet then gathers traffic and delivers it to the data center for traditional transaction processing. Cloud data centers, once almost singular per provider, are now regional, and we’re seeing a movement toward metro-level hosting.

Edge computing is part of that movement. There are multiple classes of applications that could justify edge computing, all of which relate to real-time real-world-to-virtual experiences. While we use the term “edge” to describe this, the practical reality is that edge computing is really going to be metro computing, because metro-area hosting is about as far outward toward the user as we can expect to deploy resource pools with any hope of their being profitable for the supplier and affordable for the service user. That truth aligns with our final fiber driver, too.

That driver is metro concentration of service-specific features. We already cache video, and other web content, in metro areas to reduce the cost and latency associated with moving it across the Internet backbone. Access networks increasingly terminate at the metro level, whether we’re talking wireless or wireline. In the metro area, we can isolate user traffic for personalization while still concentrating resources efficiently. It’s the sweet spot of networking, the place where “hosting” and “aggregation” combine, meaning the place where network and data center meet.

All of this so far represents first-generation, evolutionary, changes to the role of fiber in the network, but once we concentrate traffic in the metro and reduce latency from user to metro, we’ve also changed the fundamental meshing-versus-packet-aggregation picture. If service traffic concentrates in metro areas, then global connectivity could be achieved by meshing the metros. It’s that second-generation change that could be profound.

There are somewhere between three hundred and fifteen hundred metro areas in the US, depending on where you set the population threshold and how far into the suburbs you allow the “metro” definition to extend. My modeling says that there are perhaps 800 “real” metro areas, areas large enough for the metro and fiber trends I’ve talked about here to be operative. This matters for fiber because, while we couldn’t possibly afford to fiber-mesh every user, and fully meshing all the metro areas would take hundreds of thousands of fiber runs, we could apply optical switching that adds a single regional hop and get that number down. Define 8 regions, and you have 3,200 simplex fiber spans to dual-home your 800 metro areas, and fewer than 60 to fully mesh the regions themselves.
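The arithmetic is easy to check; under the assumptions in the text (800 metros, 8 regions, simplex spans counted as two per link for the dual-homing figures), a quick sketch gives the numbers.

```python
def full_mesh_links(n: int) -> int:
    """Number of point-to-point links needed to fully mesh n sites."""
    return n * (n - 1) // 2

metros, regions = 800, 8

# Fully meshing every metro: hundreds of thousands of fiber runs.
metro_mesh_links = full_mesh_links(metros)            # 319,600 links

# Instead, dual-home each metro to two regional optical switches,
# counting simplex spans (one per direction) per link.
dual_home_simplex = metros * 2 * 2                    # 3,200 simplex spans

# Then fully mesh only the regions themselves.
region_mesh_simplex = full_mesh_links(regions) * 2    # 56 simplex spans

print(metro_mesh_links, dual_home_simplex, region_mesh_simplex)
```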

What I’m describing here would create a network that has no real “backbone” at the packet level, but rather focuses all traditional packet processing in the metro centers (regional networking would be via optical switching only). We’d see higher capacity requirements in the metro area, and likely there we’d see a combination of optical and electrical aggregation applied to create capacity without too much introduced handling delay.

This structure is what would transform networking, because it could make applications like IoT and metaverse hosting practical on at least a national scale, and likely on a global scale. It would demand a lot more optical switching, a focus on avoiding congestion and other latency sources, and likely a new relationship between hosting and networking to avoid a lot of handling latency. But it would make low-latency applications freely distributable, and that could usher in not only a new age of networking, but a new age of computing and the cloud as well.

We’re not there yet, obviously, but it does appear that a metro focus, combined with regional optical concentration, could transform global networks into something with much lower latency and much greater cost efficiency. I don’t think that hurts current packet-network vendors, because whatever they might lose in packet-core opportunity, they’d gain in metro opportunity. But it does say that a metro focus is essential in preparing for the shift I’ve outlined, because otherwise good quarters for fiber players like Ciena could come at the expense of the packet-equipment players.

A Placement Optimization Strategy for the Cloud and Metaverse

One of the biggest differences between today’s component- or microservice-based applications, and applications likely to drive the “metaverse of things” or MoT that I’ve been blogging about, is the level of deployment dynamism. Edge applications aren’t likely to be confined to a single metro area; a “metaverse” is borderless in real-world terms because its virtual world is expected to twin a distributed system. Many IoT applications will have to deliver consistently low latency even when the distribution of sensor and control elements means that “the edge” is scattered over perhaps thousands of miles.

As things move around in the real world, or in the virtual world, the configuration of a given experience or service will change, and that’s likely to change where its elements are hosted and how they’re connected. While we’ve faced agility issues in applications since the cloud era began, the agility has been confined to responding to unusual conditions. Real-world systems like a metaverse face real-world movement and focus changes, and that drives a whole new level of demands for agile placement of elements.

Today, in IoT applications, many hybrid cloud applications, and probably all multi-cloud applications, we’re already running into challenges in how to optimize hosting. This, despite the fact that only scaling and failover introduce dynamism to the picture. It’s possible to configure orchestration tools like Kubernetes (for a cluster) or Anthos (for a kind of cluster-of-clusters) to define affinities, taints, and tolerations to guide how we map software components to hosting points. Even for currently less-dynamic applications, with fewer and less complex hosting decisions, this process is complicated. Add edge hosting and metaverse dynamism and I think we’re beyond where traditional operations practices can be expected to work.
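For reference, here’s what that kind of placement guidance looks like in today’s tooling: a hedged sketch of a Kubernetes pod spec expressed as a Python dictionary. The zone value, custom label, and taint are invented for illustration, but nodeAffinity and tolerations are the actual mechanisms the text refers to.

```python
# Illustrative pod spec (as a Python dict) showing how placement hints are
# declared today: an affinity that requires a node in a given zone with an
# "edge" role label, plus a toleration for nodes tainted for edge workloads.
pod_spec = {
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{
                    "matchExpressions": [
                        {"key": "topology.kubernetes.io/zone",  # real well-known label
                         "operator": "In",
                         "values": ["metro-east-1"]},           # hypothetical zone name
                        {"key": "node-role/edge",               # hypothetical custom label
                         "operator": "Exists"},
                    ]
                }]
            }
        }
    },
    "tolerations": [
        {"key": "edge-only", "operator": "Equal",
         "value": "true", "effect": "NoSchedule"}               # hypothetical taint
    ],
}
```

Every one of those knobs is static configuration; nothing re-evaluates the placement when the real world moves.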

We can see the emergence of these issues today, and as I noted in my blog on December 13th, we also have at least one potential answer. I became aware through LinkedIn of a company, Arctos Labs, that focuses on “edge cloud placement optimization”. They sent me a deck on their approach, and I set up a demo (online) of their software. I propose to use it as an example of the critical middle-layer functionality of edge software that I noted in that prior blog. I also want to see if the capabilities of the software could be applied to more traditional application challenges, to assess whether there’s a way of evolving into it without depending on widespread edge deployment.

Arctos says that edge computing is driving three concurrent shifts (which I’ll paraphrase). The first is a shift from monolithic models to microservices, the second a shift from transactional toward events, streaming, and real-time AI, and the third from centralized to heterogeneous and distributed. These shifts, which are creating requirements for edge-computing models, drive latency sensitivity, workload changes that are difficult to predict, multifaceted relationships among potential hosting points for components, sensitivity to the geographic distribution of work and process points, a need to optimize limited edge economies of scale, and a need to be mindful of energy efficiency.

According to the company (and I agree), “edge applications” will be applications that likely have some edge-service hosting, but are also likely to have public cloud and data center hosting as well. As I’ve noted in prior blogs, an IoT application has multiple “control loops”, meaning pathways that link events and responses. Each successive loop both triggers a response component and, in at least some cases, activates the next loop in the sequence. Devices connect to gateways, to local processing, to edge services, to cloud and data center, in some sequence, so we have “vertical” distribution to accompany any “horizontal” movement of edge elements. It’s trying to distribute components across this hosting continuum that creates the complexity.

Arctos’ goal is to provide “topology-aware placement optimization based on declarative intents”. In practice, what this means is that they convert behavioral constraints like tolerable latency, utilization, and cost into a decision on where to put something in this (as they phrase it) “fog-to-cloud continuum”. In their demo, they show three layers of hosting and provide a manual input of things like hosting-point capacity utilization, latency tolerance of the control loops, and cost of resources. When you change something, whether it’s latency tolerance or hosting-point status, the software computes the optimum new configuration and asks whether it should be activated. The activation would be driven through lower-level platform/orchestration tools (in the demo, it was done through VM redeployment).
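To make the idea concrete, here is a deliberately simplified sketch of intent-driven placement. It is not Arctos’ algorithm, just an illustration with invented tiers and numbers: each component declares a latency tolerance and a size, and the optimizer picks the cheapest hosting tier that satisfies the intent and still has capacity.

```python
# Hypothetical fog-to-cloud continuum: (tier name, round-trip latency ms, cost/unit, free capacity)
TIERS = [
    ("device-gateway", 2,  9.0, 4),
    ("metro-edge",     10, 3.0, 20),
    ("regional-cloud", 45, 1.0, 1000),
]

def place(components):
    """Greedy intent-based placement: cheapest tier meeting each latency intent."""
    placement = {}
    capacity = {name: cap for name, _, _, cap in TIERS}
    # Handle the tightest latency intents first so they aren't crowded out.
    for name, max_latency_ms, size in sorted(components, key=lambda c: c[1]):
        for tier, latency, cost, _ in sorted(TIERS, key=lambda t: t[2]):
            if latency <= max_latency_ms and capacity[tier] >= size:
                placement[name] = tier
                capacity[tier] -= size
                break
        else:
            placement[name] = "UNPLACEABLE"   # intent can't be met; flag for operators
    return placement

# Control-loop components with declared latency intents (all numbers invented).
print(place([("actuator-loop", 5, 1), ("analytics", 200, 50), ("video-infer", 20, 4)]))
# {'actuator-loop': 'device-gateway', 'video-infer': 'metro-edge', 'analytics': 'regional-cloud'}
```

A real optimizer would weigh many more constraints (bandwidth, energy, affinities among components) and would re-run whenever conditions change, which is exactly the recomputation step the Arctos demo shows.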

I think the easiest way of looking at the software is as a “virtual placement assistant”. You set it up with the declarative intents that describe your application and your resources. You connect it to the management APIs that give it visibility into the infrastructure and applications, and to the control APIs to manage deployment and redeployment, and it then does its magic. In the demo, the setup was designed to assist an operations team, but I don’t see any reason why you couldn’t write the code needed to provide direct resource telemetry and control linkages, making the process either manual-approve as it is in the demo, or automatic as it would likely have to be in many MoT missions.

Let’s go back, though, to that vertical distributability. In the current demo-form software, Arctos is offering what could be a valuable tool for hybrid cloud, multi-cloud, and cloud-associated edge computing. By the latter, I mean the use of cloud-provider tools to extend the provider’s cloud to the premises, something like Amazon’s Greengrass. The software could allow an operator to deploy or redeploy something without having to figure out how to match what they want to all the various “attractors” and “repellers” parameters of something like Kubernetes.

This isn’t a standalone strategy. The software is designed as a component of an OEM setup, so some other provider would have to incorporate it into their own larger-scale package. That would involve creating a mechanism to define declarative intents and relate them to the characteristics of the various hosting options, and then mapping the Arctos piece to the management telemetry and control APIs needed.

This suggests to me that, since my MoT “middle layer” has to include both the optimization piece and the ability to define abstract services, a player who decided to build that abstract-services sub-layer might be the logical partner for Arctos. There doesn’t seem to be anyone stepping up in that space, and MoT and edge services may be slow to develop, so Arctos’ near-term opportunity may be in the hybrid-, multi-, and cloud-extended premises area, which has immediate application.

Arctos is a model of the placement piece of my middle MoT layer, and of course there could be other implementations that do the same thing, or something similar. AI and simulations could both drive placement decisions, for example. Another point that comes out of my review of Arctos is that it might be helpful if there were a real “lower layer” in place, meaning that the hosting and network infrastructure exposed standard APIs for telemetry and management of network and hosting resources. That would allow multiple middle-layer elements to build to that API set and match all the infrastructure that exposed those APIs. However, I don’t think we’re likely to see this develop quickly, which is likely why Arctos is expecting to work with other companies who’d then build their placement logic into a broader software set.

I’m glad I explored the Arctos implementation, not only because it provides a good example of a middle-layer-of-MoT implementation, but because it addresses a challenge of component placement that’s already being faced in more complex cloud deployments. If the metaverse encourages “metro meshing” to broaden the scope of shared experiences, component mobility and meshing could literally spread across the world. Arctos isn’t a complete solution to MoT or vertically linked component placements, but it’s a valuable step.

There may be full or partial solutions to other MoT and edge computing software challenges out there, but not recognized or even presented because the edge space is only starting to evolve. The final thing Arctos proves to me is that we may need to look for these other solutions rather than expect them to be presented to us in a wave of publicity. I hope to uncover some examples of other essential MoT software elements down the line, and I’ll report on them if/when I do. I’ll leave the topic of MoT software until something develops.

Software Layers and Edge Computing Platforms

Last week, I blogged a series on the evolution of network and edge-hosted services, SD-WAN, SASE, and NaaS. I proposed that the edge platform of the future be based on a “metaverse of things”, and that the way MoT would likely develop would be as a set of layers. Today, I want to look at how those layers might look and how the evolution to MoT might occur.

The presumption behind this is that a metaverse is by its very nature distributed. Software would have to be hosted in multiple physical resource pools (the edge computing resources deployed in metro areas, likely) and that the pools would have to be connected in both a physical/transport sense and a virtual sense. Since the common requirement for edge applications is latency sensitivity, all of this would have to be accomplished without creating unnecessary transit delays for events/messages.

The software model for MoT would have to support this distribution of components at the low level and, at the higher MoT-logic level, the connectivity among them and the management of the exchange between the avatars that represent people or objects and their real-world counterparts. This would have to be done via abstractions because different functions at different levels need to be implemented independently, something that was a fixture in the OSI model of layered protocols in the 1970s, and is the basis of “intent modeling” today. The logic would also have to fit the needs of the three major classifications of MoT applications.

The first class of applications would be the “Meta-metaverse” applications, the applications that Meta/Facebook envisioned, and that are represented today by massively multi-player gaming. I believe this model would define a metaverse as a series of locales, places where avatars and the associated real-world elements congregate and interact. A given avatar/human would “see” the locale they were in, and so each locale would likely have a virtual agent process somewhere, and that process would collect the avatar-synchronizing data for all the stuff in it. This model is avatar-focused in that the goal is to create a virtual framework for each inhabitant/avatar and represent that framework faithfully to everyone in the locale of the focused inhabitant.

The second class of applications would be the IoT applications. These applications could have a variable way of creating a focus. Industrial IoT, for example, would be process-focused, with locales and most of what makes them up being static, though some “goods avatars” might migrate in and out. Transport IoT might focus on a particular shipment, a particular conveyance, or even a particular item. The latency sensitivity of IoT applications would also vary, depending on control-loop requirements. Process-focused industrial IoT would likely need lower latencies than transport IoT, for example.

The third class of applications would be network-function oriented, meaning things like 5G and O-RAN hosting. These would be topology-focused, which I think is closely related to the process focus of some IoT. A network is typically made up of domains that are to an extent black boxes; they assert properties to other domains but not configuration details. Inside a domain you’d need to represent the physical reality of the network, which in most cases is determined by the real trunk configuration and the location of any static elements, such as the stuff the trunks connect to and any “real” (as opposed to hosted) devices/appliances like switches and routers.
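One way to picture the “locale” idea from the first class is as a per-locale agent process that collects synchronizing data for everything present and serves each inhabitant a view of everyone else. A minimal sketch, with class and field names that are mine rather than anyone’s product:

```python
from dataclasses import dataclass, field

@dataclass
class Avatar:
    avatar_id: str
    position: tuple                                # position within the locale's virtual space
    state: dict = field(default_factory=dict)      # appearance, gesture, sensor values, etc.

@dataclass
class LocaleAgent:
    """Hypothetical per-locale process that collects avatar-synchronizing data."""
    locale_id: str
    avatars: dict = field(default_factory=dict)

    def sync(self, avatar: Avatar):
        # Called whenever an avatar's real-world twin reports a change.
        self.avatars[avatar.avatar_id] = avatar

    def view_for(self, viewer_id: str):
        # Each inhabitant "sees" every other avatar currently in the locale.
        return [a for a in self.avatars.values() if a.avatar_id != viewer_id]
```

The same structure would serve a process-focused industrial locale or a topology-focused network domain; only what populates the avatars and how often they sync would differ.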

MoT “middleware” would have to support all these application classes, but to try to do that by directly invoking low-level cloud or edge resources would tend to make the implementations brittle; any change to those low-level resources would have to be accommodated at the application-middleware level. This argues for a three-layer model, and it’s likely that some of these major layers would themselves be subdivided to optimize the agility of the implementations.

The bottom level of this model would be the “platform” level, which would be the resources, the resource virtualization (VM, container) layer, and orchestration. Wherever possible, this layer should be supported using standard cloud and virtualization tools and software, because we’d want to exploit the technology already in use rather than have to create a new framework. However, it might be useful to assume that there was a kind of “shim” introduced to frame the resources of the platform layer options in a consistent way, which is part of the Apache Mesos cloud concept. Call it an intent-model overlay on the legacy platform options.

The top level of the model would be the MoT middleware, exposing APIs that supported building the three application models described, and perhaps others as well. It might be possible to build a generalized top-level MoT model-based implementation, something that could turn a hierarchical object model of an application into something that could be executed.

The middle level? This is something that’s a fairly new and not-at-all-obvious piece, IMHO. Applications today may be dynamic in the sense that they’re scalable and can redeploy in response to failure, and they may have requirements and constraints that would limit how they’re mapped to resources. Things like Kubernetes or a DevOps tool would deploy them based on these parameters. In an MoT application, the relationship between software and the real world, and thus the relationship between software components, is much more dynamic. There are more constraints to deployment and redeployment and to scaling, and since these processes could occur much more often, there’s a greater need to be able to tune things precisely. The middle layer of this structure is what has to do that.

The middle layer also has to perform a critical role in that abstraction process. We can’t have resource-level abstractions directly consumed at the application (top) level, or we would be inviting different implementations for each of our application models. The middle layer has to present the applications with a true MoT PaaS API set, so that applications can be written without any knowledge of the means whereby they’re distributed or how the elements within the applications communicate.
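What that PaaS boundary might look like, purely as an illustration (the interface and method names are my own invention, not any standard), is an abstract API through which applications express intents and consume “metaverse services” while the middle layer owns placement, connectivity, and resource mapping.

```python
from abc import ABC, abstractmethod

class MoTPlatform(ABC):
    """Hypothetical middle-layer PaaS facade: applications never see hosting details."""

    @abstractmethod
    def create_avatar(self, twin_id: str, constraints: dict) -> str:
        """Instantiate an avatar bound to a real-world twin; constraints carry intents
        (e.g. maximum control-loop latency), not hosting locations."""

    @abstractmethod
    def bind(self, avatar_id: str, other_id: str) -> None:
        """Declare an interaction between avatars; the platform decides where and how
        the two are connected."""

    @abstractmethod
    def publish_event(self, avatar_id: str, event: dict) -> None:
        """Deliver a synchronizing or behavioral event; routing, replication, and
        redeployment all happen below this API."""
```

The point of the abstraction is that nothing above this line knows whether an avatar is hosted in a metro edge pool, a public cloud region, or a data center.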

It’s my view that edge computing’s success depends on defining these three layers in as generalized a way as possible, so that all possible MoT models can be implemented without creating the kind of linkage between top-level application behavior and low-level resources that makes for brittle applications and silos. Either the development of edge applications is very different from today’s development because of their special needs, or there’s no real need for the edge in the first place.

The challenge in doing that is the multiplicity of application models that make it important in the first place. Innovation tends to happen because of new players, especially startups. For decades, VCs who find these companies have been adamant that the recipients of their funding be “laser focused”, and avoid “boiling the ocean”. This is VC-talk to mean that they want their companies to find a very specific niche and play there. MoT and many other modern technology advances depend on developing a framework, an overall architectural model, to define all the essential pieces. Who does that in a laser-focused community?

What’s likely to happen instead is that we’ll build upward from my bottom layer, as MoT-like issues are raised by traditional compute and network evolution. The middle layer itself may end up being implemented as two “sub-layers”. The first would deal with the way that abstracted infrastructure can be harnessed for dynamic component deployment based on application logic, a significant shift from today’s component orchestration, which would introduce dynamism only in response to scaling or failover. This is essential if MoT logic reconfigures itself in response to things like avatar movement. The second layer would then create an abstract set of “metaverse services” that would host and connect things in a totally implementation-insensitive (but constraint-aware) way.

We have at least one example of the first of these sub-layers already, and that’s what I’ll talk about in my next blog.

Fusing the Metaverse, IoT, and Contextual Services

If edge computing is going to happen, something has to initiate a model and drive/fund its deployment. There have always been three applications associated with the edge: 5G hosting, IoT, and (recently) metaverse hosting. My view from the first was that 5G hosting could provide an initiation kicker, but NFV is barely aligned with current cloud-think, and it doesn’t take any useful steps to create a unified edge architecture. One or both of the other two would have to take our vision of the edge from something NFV-ish to something that could be exploited by other applications in the future.

The challenge with the edge lies in that transition. We need a model of edge computing that could serve to support application development. In effect, what we need is a platform-as-a-service (PaaS) specification that defines the service set, the web services, that edge applications could call on. If this service set supported 5G hosting fully, and could also support IoT or metaverse hosting, then 5G deployment could pull through edge infrastructure that could serve other compute missions effectively.

The challenge with that is that we don’t really have a PaaS definition for either IoT or metaversing, and if we assume that these applications evolve independently, it’s almost certain that what we’ll end up with is two definitions. Would it make sense to try to formulate a single PaaS? If so, then it would make sense to try to converge the drivers in some way. I propose that we do that by considering the metaverse-of-things. MoT might be the future of the edge, and some may already realize that.

A metaverse is an alternate reality in which people and other objects/elements are represented by avatars. In some cases, the other elements may be totally generated by the metaverse software, what the old world of Dungeons and Dragons called “non-player characters”, like some of the characters in computer gaming. In some cases, as with the avatars representing real people, the metaverse elements are synchronized with the real world. The metaverse software, hosting, and networking framework has to accommodate this high-level mission.

How different is this from IoT? That depends on how you look at an IoT application. One way, perhaps the dominant way in play today, is to view it as an event processing or control loop application, but there’s another way, and signals that this other IoT model is gaining traction can be seen in recent market events.

Look up “digital twin” on a search engine and you get a lot of references. In simple terms, a digital twin is a computer representation of a real-world system, a graph of relationships and interactions that model an actual process or ecosystem and that, by modeling it, facilitates the understanding of the real world. The elements of a digital twin have to be synchronized in some way with the real-world things it represents, something that can be done for statistical-type twinning by drawing from databases, or for real-time twinning through IoT.

It’s hard not to see the similarity between the concept of a metaverse and the concept of a digital twin. In the end, the difference lies in where the real-to-virtual link occurs. In the metaverse, it’s the people who use it who are the focus, and it’s their behavior that’s mirrored from the real to the virtual world. In IoT, the goal is to create a virtual model of a real-world system, like an assembly line, and sensors link the relevant conditions of the elements between the real and the virtual. If we generalize “metaverse” to include any virtual world with a link to the real world, that single model fits both missions.
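A minimal sketch of that generalization, using illustrative classes only (no real IoT or metaverse API is implied): the only difference between an avatar and an industrial digital twin is what feeds the synchronization.

```python
class VirtualElement:
    """A node in the virtual world: an avatar, a machine twin, a building, etc."""
    def __init__(self, element_id):
        self.element_id = element_id
        self.state = {}

    def sync_from_real(self, observation: dict):
        # For a metaverse avatar the observation is human behavior (movement, speech);
        # for an IoT digital twin it's sensor readings. The mechanism is the same.
        self.state.update(observation)

class VirtualWorld:
    """A 'metaverse-of-things': a graph of virtual elements and their relationships."""
    def __init__(self):
        self.elements, self.links = {}, []

    def add(self, element: VirtualElement):
        self.elements[element.element_id] = element

    def relate(self, a_id, b_id, kind):
        # e.g. ("press-3", "conveyor-1", "feeds") or ("alice", "locale-9", "inhabits")
        self.links.append((a_id, b_id, kind))
```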

Since the value of edge computing lies in its ability to reduce process latency, which is something necessary in at least the Meta version of a metaverse and in IoT as well, it seems to me that we could indeed call an IoT digital twin a metaverse-of-things, or MoT. That then could allow us to frame a common PaaS to support both the primary drivers of edge computing at the same time, increasing the chances that we would be able to realize new opportunities from the investment in metro hosting that 5G could deliver.

As I see it, MoT’s PaaS would have to be about avatar manipulation, but the avatar could represent not only a live person/inhabitant, but also a physical object. An avatar would then be a software object that presented a series of APIs that would allow it to be visualized (including changing its representation at the sensory level), let it synchronize with real-world elements, let it bind to other objects, and so forth. The MoT PaaS would provide tools to facilitate the avatar’s use, and developers of a metaverse would use those tools to create it. Similarly, IoT developers would build digital twins using the same tools.

I envision the APIs involved in a PaaS and also a MoT application as supporting events, meaning messages or simple signals. Sending a “show” event to the PaaS to render an avatar would cause the avatar to be represented in the way it was designed to be visualized, in the frame of reference in which it currently appeared. That frame of reference would depend on where the avatar’s real-world twin was, where the viewer was, and so forth.

Issues of connectivity would be resolved inside the PaaS, not by the applications using it. All avatars would “inhabit” a virtual space, and coordinating their interactions both in that space and across the real-world facilities where the avatars’ twins really lived, would be done within the platform. Network and hosting details would be invisible to the application, and to the real-world users, provided that hosting and connectivity were designed to meet the latency requirements of the mission, meaning the service the MoT is representing. Just as the cloud abstracts what we could call “point hosting” of an instance of software, MoT PaaS would abstract the way a distributed twin of the real world maps in one direction to what it represents, and in another direction how it’s hosted and connected.

MoT would also support what I’ve called “contextual services” in past blogs, services that represent not so much “virtual” or “artificial” reality but augmented reality. The metaverse of 5th Avenue in New York, for example, might contain objects representing each building, shop, restaurant. Each would have a property set that described what was available there, and a real person walking the street (with AR glasses, of course) might see indicators showing where things they were interested in or looking for might be found. The same mixture of real and virtual might be used in a new model of collaboration or hybrid work.
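As a toy illustration of that kind of contextual service (all names and property sets are invented), the augmented-reality overlay is essentially a filter of locale objects’ properties against the walker’s declared interests:

```python
# Hypothetical "contextual service": overlay indicators for a real street by
# matching locale objects' property sets against a walker's interests.
STREET_OBJECTS = [
    {"name": "cafe",       "offers": {"espresso", "wifi"}},
    {"name": "bookstore",  "offers": {"books", "readings"}},
    {"name": "shoe store", "offers": {"shoes"}},
]

def ar_overlay(interests: set, objects=STREET_OBJECTS):
    """Return the indicators an AR wearer should see as they walk the street."""
    return [o["name"] for o in objects if o["offers"] & interests]

print(ar_overlay({"espresso", "books"}))   # ['cafe', 'bookstore']
```

The hard part, of course, isn’t the filter; it’s keeping the object properties and the walker’s position synchronized with the real world at low latency, which is the MoT platform’s job.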

The question that remains is how MoT could fit with 5G. The easy way out, which may or may not be the best way, would be to say that MoT is “edge middleware” and that 5G depends on “NFV middleware”. That would mean edge computing is standardized only at the resource level, and that early 5G work would do nothing to harmonize the implementation of MoT, or of the general metaverse or IoT missions. That gives us less technology to leverage as more applications develop at the edge, and more risks of creating silos.

The better way would be to think of 5G elements as a metaverse, with the elements each contributing their digital state to their virtual equivalent. Then the deployment and management of 5G could be handled the same way as it would be in the other MoT applications. That demonstrates, I think, that there’s a considerable value in thinking the MoT concept through, in generalizing the metaverse model to apply to all edge missions if possible. If we do that, we’re advancing the cloud and the network together.

This poses a threat to operators, largely because they’re unlikely to play any role in developing the MoT PaaS, and will instead rely on vendors. Not only does that cede a big piece of their future success to others, there are precious few “others” in any position to do anything for them. If cloud providers like Amazon define NaaS, it’s not only going to hurt operators, it will hurt those who hope to sell to them. If Amazon’s moves become clear, then vendors are likely to be hopping to counter public-cloud dominance. Obviously, that’s a tidal wave you’d like to stay out in front of.

Flavor Wars: SASE, SD-WAN, and NaaS

Here’s a simple (seemingly) question. If SASE is all about secure access to services, what services are we securely accessing? Is SASE, or NaaS, about a transformation of services, or a transformation of the on-ramp to existing services? Enterprises and network operators are both struggling with that question, according to my contacts with both groups, and vendors are struggling to position their stuff optimally, so apparently the market is struggling too.

Is this a question we can even answer, though? If I dig through enterprise comments on SASE, I find some interesting points that just might lead us to some useful conclusions; certainly some interesting ones.

Enterprises are roughly split between the view that SASE is really a security appliance and the view that it’s essentially built on SD-WAN. I think this split is behind the question I posed, or at least behind the fact that something that basic is still a question. Enterprises who see SASE as a security strategy see the “services” it provides access to as the services they’re already consuming. Enterprises who see SASE as an extension of SD-WAN are themselves split; about half believe that the services are SD-WAN services and the other half that there’s something beyond simple SD-WAN, like network-as-a-service (NaaS).

If you dig into how these belief sets developed, you find (not surprisingly) that they’re determined by the pattern of strategic influence of network equipment vendors. Enterprises who have a single dominant network equipment vendor tend to fall into the SASE-is-just-security camp, and I think this is because network vendors have a robust security business that they’re interested in protecting, even at the expense of kneecapping a new technology by limiting its scope. The enterprises who have set a multi-vendor policy fall more often into the SASE-is-SD-WAN camp, and this seems to be because they’ve adopted vendor and product selection strategies that assign more technology planning to the enterprise than to vendors.

Right now, I’d have to say that the SASE-is-security camp is winning, mostly because even vendors who might benefit in the long term from a more aggressive SASE positioning are either reluctant to push a strategy that might undermine security sales, or just aren’t pushing their own SD-WAN capabilities to the fullest extent possible.

Network equipment vendors have always been a bit skittish about any overlay-network technology. Most jumped into the SDN game when that was a hot area, and when changes in data center networking requirements (multi-tenancy for cloud providers and virtual-network requirements introduced by things like Kubernetes) demanded a more agile network model. Still, overlay networks threaten to tap new requirements, and new benefits, away from the underlying IP infrastructure that’s the money machine for vendors. This second factor, additive to the desire to protect their security business, seems to be making all network vendors, even up-and-comers, less aggressive with the notion of SD-WAN and NaaS.

Operators have their own concerns about SASE, SD-WAN, and NaaS (beyond the risk of cannibalizing their VPN business), and all of them can be summarized in one term—opex. Well over two-thirds of operators who offer or plan to offer SD-WAN or NaaS services are more interested in doing it via a managed service provider (MSP) partnership than fielding their own offering. These kinds of services, you’ll note from my blog on December 6th, didn’t figure in the fall technology plans of any significant number of network operators.

If you add up all the views from the various stakeholders in the game, I think there’s a common thread here, which is SD-WAN versus NaaS. SD-WAN is seen as a glue-on extension to VPNs by pretty much everyone, even though a few enterprises are looking seriously at shifting away from VPNs to SD-WAN. NaaS is viewed by enterprises as a new kind of service, but by both vendors and operators as a tweaking of existing services, usually via a SASE edge. None of these parties really has a firm view of just what kind of service NaaS really is, meaning what features it has and how it’s differentiated from SD-WAN and other existing services. That question, the question of NaaS, is really at the heart of not only SASE formulation, but perhaps of the future of networks.

As I blogged yesterday, Amazon’s AWS Cloud WAN may be the first salvo in the NaaS wars. In effect, what it does is to virtualize some or all of an enterprise network, and create what might be a unification of SD-WAN, VPN, and virtual-network NaaS, which not only would necessarily define NaaS, but also its on-ramps and points of integration with existing services, meaning how you evolve to it. It would also likely define SASE, but the big news would be that it could define how higher-level services like Amazon’s own cloud and (possibly forthcoming) edge services and AWS web service features become part of SASE integration.

I don’t think there’s much question that cloud providers like Amazon would like to generalize the concept of SASE to make them a gateway to their cloud services. Expanding on AWS Cloud WAN would surely do that, but it also stakes out a possible claim to VPN services offerings by cloud providers, something that could be a direct attack on the network operators. Operators are motivated less by opportunity than by competition, which is one reason why their technology plans seem pedestrian. Would cloud provider threats induce operators to be more proactive with new services? Maybe.

The reason for the qualification is that, as I noted on December 6th, operators are still more dependent than they like to admit on vendors taking the lead in technology shifts. Vendors, as I’ve noted, have been quick to claim SASE support but unconvincing with respect to their support of anything really different in a service sense. Further, network vendors don’t have a role in higher-layer services like cloud web services, which means that they might well see a service-centric expansion of SASE as something that could admit cloud software players like HPE, IBM/Red Hat, or VMware onto the SASE leader board.

Is NaaS just network services, or is it also higher-level and even cloud services? The final determinant in this question may be the ever-present-in-ink but still-unrealized edge model. One of the most important changes that a cloud-provider vision of SASE might bring is a specific link to edge services. If Amazon could assert web service APIs from its cloud across SASE to an enterprise, might they also assert edge-service APIs, which would require they formalize the PaaS edge model? SASE could truly unify “hosting” and “services”, and also define how they’re related.

The obvious question is what NaaS and SASE would define as “services” and what those exposed APIs would look like, particularly for edge computing. Tomorrow I’ll discuss how IoT and metaverse might unite to create something new and powerful, and even a bit of what that might look like.

Amazon’s re:Invent Might Actually be Reinventing

Amazon’s AWS re:Invent extravaganza is worth exploring for two reasons. First, it offers a look at where the market leader in cloud computing is going to try to take the market overall. Second, it offers an opportunity to see where cloud software features and tools will be heading, and that could have a major impact on both edge computing and carrier cloud plans. It’s not that Amazon laid all this out, but that it’s possible to deduce a lot by summing what they did say in the various announcements at the event.

Perhaps the clearest signal the event sent was that Amazon wants more cloud-selling feet on the street. Despite the fact that Amazon’s reseller partners have been criticized for high-pressure sales tactics, the company is expanding its commitment to channel sales. That demonstrates a basic truth, which is that Amazon is looking for a bigger total addressable market (TAM) by going down-market. Some large channel players with vertical specializations can credibly sell to enterprises, but most enterprises expect direct sales. Not so the smaller buyers, who are looking for reseller/integrator types with expertise in their specific business.

That doesn't mean Amazon isn't interested in the enterprise; they also announced their Mainframe Modernization service, aimed at getting enterprises to actually move stuff to the cloud rather than use the cloud as a front-end to static data center applications. The service is aimed at applications that enterprises developed themselves, or at least have source code for, and it allows those applications either to be refactored into Java applications or to be run in the cloud largely as-is.

I can understand these two moves by Amazon, because rival Microsoft has a much stronger position with businesses, as opposed to the web-scale startup types that are a big piece of Amazon's business. Amazon can at least work to mitigate the bad press on channel sales tactics, and since a partner-based cloud strategy offers the biggest benefit to smaller firms with limited access to skilled in-house expertise, it's likely to succeed. On the Mainframe Modernization front, though, I'm skeptical.

There are some enterprises who’ve told me that they would like to be able to walk away from mainframe data centers, but the view is held primarily by line organizations and some tech mavens, not by CxO-level people. Most of the latter are simply not prepared to cede business-critical processing and the associated databases to the cloud. I think Amazon may have seen a few eager inquiries from people who really can’t make the decision, and read this as the leading edge of an “everything-to-the-cloud” wave.

A related enterprise-targeted offering is the Enterprise Private 5G service. This has been kicked around at Amazon for some time, and essentially it's a CBRS-shared-spectrum 5G plug-and-play deal, where Amazon provides the hardware needed and hosts the 5G software on AWS, giving the enterprise private 5G without the hassle of spectrum auctions, installation, and maintenance. But while I've seen a fair number of enterprises kick 5G tires, I'm not seeing (or hearing) a big commitment. Not only that, it's far from clear what you really accomplish with enterprise 5G based on CBRS. Why is it a better strategy than WiFi, particularly WiFi 6? You'd have to have a fairly large facility or campus, a need for a private mobile network on it, and some reason to believe your spectrum wouldn't be subject to interference from other unlicensed users.

For the line organizations, Amazon announced a couple of things that play to the citizen developer space. The most obvious is Amplify Studio, a new way of doing low-code development on a cloud platform, to then be run there as well. The other is expanded AI/ML tools, including a free ML on-ramp service whose UI isn't too much of a stretch for citizen developers. Reasoning that line organizations are more likely than CIO teams to be interested in getting rid of the data center, Amazon is making sure that the line departments don't get befuddled by early complexity.

This smacks to me of the same throw-everything-at-the-wall-and-see-what-sticks thinking as the Mainframe Modernization service. There were some potentially useful announcements relating to moving work to the cloud, in terms of controlling the border-crossing access charges and supporting high-performance file server migration to the cloud, and I think the combination indicates that Amazon is trying to broaden its engagement with enterprises. That they’re likely not getting it right at this point means that they’re risking having Microsoft and Google take their shots, with smarter targeting.

From all of this, it's tempting to say that re:Invent didn't invent much, which is in fact how some characterized it: as "more incremental". But there were two things that I think are signs of significant thinking and planning on Amazon's part, and they're where market-changing trends might be signaled, or supported.

The first is the AWS IoT TwinMaker, which fills an Amazon void in the creation of digital-twin representations of real-world processes, twins that are synchronized through IoT and used to control material handling and production, for example. The service creates a twin graph, and users can provide their own data connectors, which means the service shouldn't require users to host everything in the cloud; instead, they can use Amazon's premises-extension tools for IoT. That means things that require a short control loop can still be supported without cloud-level latency. It also means that if edge computing from Amazon were to come along, more could be done in the cloud without disrupting the digital-twin visualization that anchors the whole system.
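
To make the twin-graph-plus-connectors idea concrete, here's a minimal sketch of what such a structure could look like. This is not the TwinMaker API; the entity names, the connector functions, and the property values are all invented for illustration.

```python
# Hypothetical sketch of a digital-twin graph fed by pluggable data connectors.
# This is NOT the AWS IoT TwinMaker API; names, values, and structure are
# invented to illustrate the concept.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "data connector" is just a callable returning the latest property values
# for one real-world asset; it could read a local PLC on premises or a cloud
# time-series store.
DataConnector = Callable[[], Dict[str, float]]

@dataclass
class TwinEntity:
    name: str
    connector: DataConnector                  # where this twin gets its state
    properties: Dict[str, float] = field(default_factory=dict)
    children: List["TwinEntity"] = field(default_factory=list)

    def refresh(self) -> None:
        """Pull fresh state for this entity and everything below it."""
        self.properties = self.connector()
        for child in self.children:
            child.refresh()

# Example connectors: a "premises" source with a short control loop, and a
# "cloud" source for slower analytics data (values are stand-ins).
def conveyor_plc_connector() -> Dict[str, float]:
    return {"belt_speed_mps": 1.8, "motor_temp_c": 61.0}

def warehouse_analytics_connector() -> Dict[str, float]:
    return {"throughput_per_hour": 412.0}

# Build a tiny twin graph: a warehouse twin containing a conveyor twin.
conveyor = TwinEntity("conveyor-7", conveyor_plc_connector)
warehouse = TwinEntity("warehouse-east", warehouse_analytics_connector,
                       children=[conveyor])

warehouse.refresh()
print(conveyor.properties)    # {'belt_speed_mps': 1.8, 'motor_temp_c': 61.0}
```

The point of the connector abstraction is that the twin graph doesn't care whether a given connector reads a premises controller or a cloud data store, which is exactly what keeps short control loops out of the cloud round trip.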

That plays with the other thing, which is AWS Cloud WAN. This service takes aim at the needs of enterprises whose major facilities are widely dispersed geographically; what it does is create Amazon-carried WAN links so those sites don't have to be connected with separately built wide-area network connections. It doesn't seem to be aimed at replacing VPNs (so far, at least) but rather at ensuring that cloud adoption isn't hampered by the need to link cloud-distributed applications across private network resources. This sort of thing would be most useful if enterprises were to adopt (yet-to-be-offered) AWS Edge Computing services.
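
For a sense of the declarative flavor a service like this implies, here's an illustrative sketch. It is not the actual AWS Cloud WAN policy schema; the segment names, fields, and sites are invented, and the point is simply that you declare segments and site attachments and let a cloud-carried core network, rather than private WAN circuits, interconnect them.

```python
# Illustrative only: this is not the actual AWS Cloud WAN policy schema.
# It sketches the idea of declaring segments and site attachments and letting
# a cloud-carried core, rather than private WAN circuits, connect the sites.

from itertools import combinations

core_network = {
    "segments": {
        "production":  {"isolated": False},
        "development": {"isolated": True},   # attachments here get no site links
    },
    "attachments": [
        {"site": "chicago-dc",   "segment": "production",  "region": "us-east-2"},
        {"site": "frankfurt-dc", "segment": "production",  "region": "eu-central-1"},
        {"site": "tokyo-lab",    "segment": "development", "region": "ap-northeast-1"},
    ],
}

def cloud_carried_links(policy):
    """Return the site pairs the cloud core would interconnect, segment by segment."""
    links = []
    for seg_name, seg in policy["segments"].items():
        if seg["isolated"]:
            continue
        sites = [a["site"] for a in policy["attachments"] if a["segment"] == seg_name]
        links.extend(combinations(sites, 2))
    return links

print(cloud_carried_links(core_network))
# [('chicago-dc', 'frankfurt-dc')]
```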

The net here is that the big news from re:Invent, IMHO, is that it signals Amazon's intention to get full-on into edge computing, that Amazon presumes the edge opportunity will be driven by IoT, and that the edge will be connected by what's essentially "Amazon NaaS", a point I'll address in another blog this week. Amazon's decision to get into the edge business also suggests it will be getting more into the network business, and looking to make "NaaS" a true cloud service, even including transport. That surely puts the network operators on notice: start thinking seriously about NaaS or watch it siphon off your revenue rather than augment it.

Telco Fall Tech Planning Cycle: Results

Every year, network operators do a technology planning review in the fall, usually between mid-September and mid-November. The purpose of the review is to identify the technologies that will compete for funds in the yearly budget cycle that usually starts after the first of the year. I’ve tracked these review cycles for decades, and I got a fair number of responses from operators regarding the topics of their planning this year. Now’s the time to take a look at them, and to do that I need to do a bit of organizing of the responses to accommodate the differences in how operators approach the process.

My normal operator survey base includes 77 operators, but they don't all have a formal fall planning process. I asked the 61 operators who do, from all over the world, for their input. Of those, 56 responded with information, and for 45 of them I got data from multiple organizations and levels within the operator. CTO and CFO organizations responded most often, and the CTO organization was represented in every response. Where both the CTO and CFO responded, I got responses from at least one other organization, and CEO data from about a third of the 45. We get our data under strict NDA and data use restrictions, so don't ask for the details, please!

The one universal technology priority, one every major network operator cites, is the evolution to 5G technology and related issues. 5G is the only new technology that actually has universal budget support. Consumer broadband (wireline) is the second (with support among almost three-quarters of operators, excluding only the mobile-only players). Other than these two technology areas, nothing else hits the 50% support level, and that’s a point deserving of comment in itself.

CEOs and CFOs who commented believed that what I've called the "profit-per-bit" squeeze was their company's biggest challenge. They see ARPU growth in mobile and wireline, business and consumer, as very limited, and yet they see their costs ramping up. I've tended to use the revenue/cost-per-bit curves as my way of representing the challenge (because the first operator who told me about this over 20 years ago used them), but today the CFOs draw the charts as a slowly growing ARPU contrasted with a faster-growing cost curve. Most don't forecast a dire crossover, but about half suggest that the gap between the two, which is a measure of profit, is at risk of shrinking to the point where they need something to happen, and it's the "something" that sets the planning priorities.
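
To illustrate the arithmetic behind that worry, here's a minimal sketch with made-up numbers (not survey data): if ARPU grows a couple of percent a year while cost per user grows faster, the gap that represents profit shrinks steadily even though the curves never cross.

```python
# Made-up numbers purely to illustrate the "shrinking gap" the CFOs describe;
# these are not survey figures or operator data.

arpu = 50.00          # monthly revenue per user
cost = 38.00          # monthly cost per user
arpu_growth = 0.02    # 2% per year
cost_growth = 0.05    # 5% per year

for year in range(2022, 2027):
    print(f"{year}: ARPU ${arpu:6.2f}  cost ${cost:6.2f}  margin ${arpu - cost:5.2f}")
    arpu *= 1 + arpu_growth
    cost *= 1 + cost_growth

# The margin column shrinks every year; that narrowing gap, not an outright
# crossover, is what's driving the planning priorities.
```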

Ten years ago, two-thirds of operators believed that reducing capex was the answer, which is what drove the NFV movement that started about then. Five years ago, just short of two-thirds believed that reducing opex was the answer, and in fact the majority of the actual changes operators made in the last decade to fend off the convergence of the curves attacked opex, usually head count.

This can still be seen in the responses of operators at the VP of Operations level. They want to sustain their current service base by controlling equipment costs through a combination of pushing out purchases and hammering for discounts. They want to reduce operations cost by improving the tools, including wider use of artificial intelligence. There is fairly strong interest (62%, driven mostly by an 85% interest in open-model 5G) in alternative architectures for network equipment, but that interest is focused on open-model networking, which they see as much as a way of extracting "discounts" from incumbent vendors as a way of actually getting gear at a lower price through direct competition and commercially available hardware.

The CIO organization, responsible for OSS/BSS, is necessarily focused on opex benefits. While almost half of CEOs and CFOs are asking whether OSS/BSS should be scrapped in favor of something (as yet unspecified) new, the CIOs are reaping the benefit of the shift in focus to opex reduction. Among this group, the top priorities are customer care and online ordering, with the latter considered "significantly" complete and the former "at less than half its full potential".

Customer care views are sharply divided between wireline and wireless business units, of course, and it's really the latter that received the planning focus this fall. Over three-quarters of operators said they believed that customer care was the largest reason for churn, and also the second-largest reason why wireline users selected a competitive service (price was still on top). Since customer acquisition and retention is the largest component of "network process opex", it's clear that addressing the issue is critical. CIO organizations all believe that improvements could be made to customer care portals to reduce user frustration. Progress at the business service level has been rated as "good", but not so at the consumer level.

The problem with consumer customer care is a point of connection between the operations people and the CIO people, because wireline customer care is most likely to involve field service people. Consumers aren't likely to be able to do much on their own to help fix a network problem, and most probably can't even play much of a role in diagnosing one without an online wizard (presumably accessed via a smartphone, not the failed broadband connection!) to guide them. That's the area where the CIOs are focused.

Field service, the “craft” people usually considered part of operations, likes the idea of a smartphone wizard, and about half of operators plan to work to improve this capability in 2022. Over a third say they already have a good strategy here, and the remainder think they’ll need more than just a year to get one ready.

This isn't the only place where multiple operator organizations have symbiotic interests. Open-model networking is where operations and the "science and technology" or CTO organizations converge, and it represents the number-one CTO-level priority outcome of the fall session this year. As it happens, 5G is also the focus of interest for the product/service management organization, where interest in increasing ARPU runs highest. That, in my view, makes open-model networking the most important technology point in the current cycle.

The operations organization, as I've suggested, is really behind the open-model networking idea as a path to getting better discounts from the current vendors, not actually shifting to open-model infrastructure. It's like getting a collective low bid from a network model that's inherently based on commodity technology, with no incumbents and no lock-in. While operations people typically don't say "all I want is to squeeze a bigger discount", their points of interest seem a careful balance between the competitive benefits of open-model networking and the risk of a nobody-responsible wild west.

The CTO organizations are primarily concerned about how an open model comes about. Since all the standards activities aimed at open-model networks (including NFV) came out of the CTO groups, these people are motivated to defend these early initiatives, but at the same time painfully aware that they’ve done very little to advance the open-model goal, despite the fact that open-model networking gets a planning priority nod from over 90% of operators’ CTO organizations. It’s one thing to define a model, but an open-model network requires that it actually work well enough to deploy widely. If NFV isn’t it, then what is, and what gets to it?

Interestingly, less than a third of CTO organizations said they believed that another standards effort, either within the NFV ISG context or outside it, was the answer. Nearly all said a new organization would take too long, and over half thought that it would also take too long to get NFV cleaned up. In fact, no “positive” approach got even fifty-percent support. The most support (47%) came for the idea that 5G O-RAN work would evolve to create an open model, but nobody expressed any specific notion of how that would happen. O-RAN, after all, is about RAN, not 5G overall, and 5G isn’t all of open-model networking.

If you pull out the viewpoints of technical planning staff people across all the operators’ organizations, the sense I get is that the experts think that “the industry” or “the market” is evolving to an open-model strategy, and that the evolution is being driven by much the same forces that have combined to create things like Linux, Kubernetes, Istio, and LibreOffice. In other words, open-source equals open-model. CTOs assume that open-source software models will develop, as O-RAN did, and that a dominant model will appear. That model will allow for variations in implementation but will set the higher-level features and likely the APIs.

The product management groups are divided between those who believe that enhancements to connection services can increase ARPU (41%) and those who think that only higher-layer (above the connection) services can create any significant revenue upside (57%). A small number believed in both approaches, and an even smaller number didn’t think either was worth considering.

I think that Ericsson's recent M&A is aimed at the product management interest in new service revenues as a means of driving ARPU up. There is broad support for a new revenue strategy (77% of the planners involved in the fall cycle said they thought new revenue was "essential" and almost 100% thought it was "valuable" in improving profits), and it's interesting that Ericsson linked the Vonage acquisition to 5G, which is a budgeted technology. They likely see that operators would jump on something that could help them in 2022, and packaging the stuff needed for enterprise collaborative services could be at least credible.

This year, operators also had a specific "imponderables" focus in their planning. The obvious top of that list is the impact of COVID, and the planning cycle and surveys were complete before the announcement of the flight cancellations and travel restrictions associated with the new Omicron variant. If there turns out to be a big winter surge in COVID worldwide (there already is in some areas) and Omicron turns out to be higher-risk than Delta (particularly if it's vaccine-resistant), then we can assume we'll see WFH boom again. If not, then we can assume a continued return to normalcy in communications needs. Operators are watching all this, but obviously they can't yet make decisions.

Much of this year's planning cycle focused on issues that were also discussed last year, meaning that operators either didn't reach a decision or weren't able to implement one. This year, almost 60% of operators thought that they probably wouldn't be able to address their planning issues satisfactorily in 2022 either, and that they'd still be trying to address most of these points in next year's planning cycle. I think the laissez-faire approach to open-model networking that I recounted is, like this broad pessimism, a result of operators recognizing that they aren't building demand by building supply, and that someone has to learn how to do both, properly. That's progress, I guess, but those operators are still looking for someone else to do the learning, and the job.

There were no spontaneous statements to suggest operators were really seeing the responsibility for network and technology change any differently. They still see themselves as consumers rather than developers of technology, and their role in network evolution as being standards-setting, largely to prevent vendor lock-in. Even operators who have actually done some software development, and who plan to do more, are still reluctant to admit that they’re doing “product” work, and that explains why they tend to cede their software to an open-source group as quickly as they can. They admit that once this is done, their own participation is more likely to ease than to increase.

The network is changing, and the role of everyone who’s a stakeholder is doing the same. Some admit it, even embrace it, but not operators. That’s the biggest weakness in their planning process; they’re not planning for the real future at all.

How Smart Chips are Transforming Both Computing and Networks

I don't think there's any disagreement that network devices need to be "smart", meaning that their functionality is created through the use of easily modified software rather than rigid, hard-wired logic. There is a growing question, though, as to just how "smartness" is best achieved. Some of the debate has been created by new technology options, and the rest by a growing recognition of the way new missions impact the distribution of functionality. The combination is impacting the evolution of both network equipment and computing, including cloud services.

To most of us, a computer is a device that performs a generalized personal or business mission, but more properly the term for this would be a “general-purpose computer”. We’ve actually used computing devices for very specific and singular missions for many decades; they sent people into space and to the moon, for example, and they run most new vehicles and all large aircraft and ships. Computers are (usually) made up of three elements—a central processing unit (CPU), memory, and persistent storage for data.

Network devices like routers were first created as software applications running on a general-purpose computer (a “minicomputer” in the terms of the ‘60s and ‘70s). Higher performance requirements led to the use of specialized hardware technology to manage the data-handling tasks, but network devices have from the first retained a software-centric view of how features were added and changed. All the big router vendors have their own router operating system and software.

When you attach a computer to a network, you need to talk network to the thing you're attaching to, which means that network functionality has to be incorporated into the computer. This is usually done by adding a driver software element that talks to the network interface and presents a network API to the operating system and middleware. Even early on, there were examples of network interface cards with onboard intelligence, meaning that some of the driver logic was hosted on the adapter instead of on the computer.
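
A minimal sketch of that layering might look like the following. It's conceptual Python, not real driver code, and the checksum-offload example is an assumed illustration of work migrating from host to adapter rather than a description of any particular NIC.

```python
# Conceptual sketch, not real driver code: a driver layer presents a uniform
# network API upward while hiding where the work is actually done. A "basic"
# NIC leaves checksumming to host software; a "smart" NIC does it onboard,
# but the API the OS and middleware see is unchanged.

from abc import ABC, abstractmethod

def checksum(payload: bytes) -> int:
    return sum(payload) & 0xFFFF      # toy checksum, for illustration only

class NetworkDriver(ABC):
    """The API the operating system and middleware program against."""
    @abstractmethod
    def send(self, payload: bytes) -> None: ...

class BasicNicDriver(NetworkDriver):
    def send(self, payload: bytes) -> None:
        csum = checksum(payload)      # computed on the host CPU
        print(f"basic NIC: host computed checksum {csum:#06x}, sending frame")

class SmartNicDriver(NetworkDriver):
    def send(self, payload: bytes) -> None:
        # The adapter's onboard processor would do the checksum; the host just
        # hands over the payload. Same API, different division of labor.
        print("smart NIC: checksum offloaded to adapter, sending frame")

for driver in (BasicNicDriver(), SmartNicDriver()):
    driver.send(b"hello, network")
```

The design point is that the API presented upward doesn't change when work moves onto the adapter, which is why smarter NICs can be introduced without rewriting the software above them.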

The spread of software hasn’t meant that hardware spread was halted. Specialized chips for networking have existed for decades, and today switching and interface chips are the hardware soul of white-box devices. In the computer space, specialized chips for disk I/O and for graphics processing are almost universal. It’s not that you can’t do network connections, I/O, or graphics without special chips, but that you can do them a lot better with those chips.

So why are we now hearing so much about things like smart NICs, GPUs, IPUs, DPUs, and so forth? Isn’t what we’re seeing now just a continuation of a revolution that happened while many of today’s systems designers were infants, or before? In part it is, but there are some new forces at work today.

One obvious force is miniaturization. A smartphone of today has way more computing power, memory, and even storage than a whole data center had in the 1960s. While the phone does computing, graphics, and network interfacing, and while each of those functions could be handled by its own chip, there's significant pressure to reduce space and power requirements by combining things. Google's new Pixel 6 has a custom Google Tensor chip that replaces a traditional CPU and incorporates CPU, GPU, security, AI processor, and image signal processor functions. IoT devices require the same level of miniaturization, to conserve space and of course minimize power usage.

Another force is a radical revision in what "user interface" means. By the mid-1980s, Intel and Microsoft both told me, over two-thirds of all incremental microprocessor power used in personal computers was going to the graphical user interface. That's still true, but what's changed is that we're now requiring voice recognition, image recognition, natural language processing, inference processing, AI and ML, and all those other things. We expect computing systems to do more, to be almost human in the way they interact with us. All that has to be accomplished fast enough to be useful conversationally and in the real world, and cheaply in terms of dollars, power, and space.

Our next new force is mission dissection, and this force is embodied by what's going on in cloud computing. The majority of enterprise cloud development is focused on building a new presentation framework for corporate applications that are still running in the usual way, usually in the data center, and sometimes on software/hardware platforms older than the operators that run them. The old notion of an "application" has split into front-end and back-end portions; the front-end piece is a long way from general-purpose computing, and the back-end piece is a long way from a GUI. In IoT, we're seeing applications broken down by the latency sensitivity of their functions, in much the same way as O-RAN breaks out "non- and near-real-time".
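
Here's a minimal sketch of that kind of dissection, with invented functions and latency thresholds; the numbers are assumptions, not standards, but they show how latency budgets rather than application boundaries end up deciding where code runs.

```python
# Minimal sketch of "mission dissection": decompose an application's functions
# by latency budget and let that decide where each is hosted. The functions
# and thresholds are invented assumptions, not standards.

functions = {
    "emergency_stop":      0.005,   # seconds of tolerable control-loop latency
    "conveyor_speed_ctrl": 0.030,
    "defect_detection":    0.200,
    "shift_reporting":     60.0,
}

def place(latency_budget_s: float) -> str:
    if latency_budget_s < 0.010:
        return "on-device / local controller"
    if latency_budget_s < 0.100:
        return "edge hosting"
    return "cloud (or data center back end)"

for name, budget in functions.items():
    print(f"{name:22s} -> {place(budget)}")
```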

The final new force is platform specialization encouraging application generalization. We separated out graphics, for example, from the mission of a general-purpose CPU chip because graphics involved specialized processing. What we often overlook is that it’s still processing that’s done by software. GPUs are programmable, so they have threads and instructions, and all the other stuff that CPUs have. They just have stuff designed for a specific mission, right? Yes, but an instruction set designed for a specific mission could end up being good for other missions, not just the one that drove its adoption. We’re seeing new missions emerging that take advantage of new platforms, missions that weren’t considered when those platforms were first designed.

What do these forces mean in terms of the future of networking and computing? Here are my views.

In the cloud, I think that what we’re seeing first and foremost is a mission shift driving a technology response. The cloud’s function in business computing (and even in social media) is much more user-interface-centric than general-purpose computing. GPUs do a great job for many such applications, as do the RISC/ARM chips. As we get more into the broader definition of “user interface” to include almost-human conversational interaction, we should expect that to drive a further evolution toward GPU/RISC and even to custom AI chips.

The edge is probably where this will be most obvious. Edge computing is all about real-time event handling, which again is not a general-purpose computing application. Many of the industrial IoT controllers are already based on a non-CPU architecture, and I think the edge is likely to go that way. It may also, since it's a greenfield deployment, shift quickly to the GPU/RISC model. As it does, I think it will drive local IoT more toward a system-on-chip (SoC) model by offloading some general functionality. That will make IoT elements cheaper and easier to deploy.

At the network level, I think we’re going to see things like 5G and higher-layer services create a sharp division of functionality into layers. We’ll have, to make up some names, “data plane”, “control plane”, “service plane” (the 5G control plane would fall into this, as would things like CDN cache redirection and IP address assignment and decoding), and “application plane”. This will encourage a network hardware model that’s very switch-chip-centric at the bottom and very cloud-centric (particularly edge-cloud) at the top. I think it will also encourage the expansion of the disaggregated cluster-of-white-boxes (like DriveNets) model of network devices, and that even edge/cloud infrastructure will be increasingly made up of a hierarchy of devices, clusters, and hosting points that are all a form of resource pool.
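
A sketch of what that layering might mean in practice is below; the plane names follow the made-up ones above, and the platform assignments and example functions are my own illustrative assumptions, not a standard or any vendor's architecture.

```python
# Illustrative mapping of the made-up plane names to the platforms they would
# most naturally run on; the assignments and example functions are assumptions
# for illustration, not a standard or a vendor architecture.

PLANES = {
    "data plane":        "white-box switch chips / merchant silicon",
    "control plane":     "routing and controller software on or near the box",
    "service plane":     "edge-cloud hosting (5G control plane, CDN redirect, addressing)",
    "application plane": "cloud or edge cloud, alongside business logic",
}

FUNCTION_TO_PLANE = {
    "packet forwarding":     "data plane",
    "BGP route computation": "control plane",
    "5G session management": "service plane",
    "CDN cache redirection": "service plane",
    "inventory application": "application plane",
}

for func, plane in FUNCTION_TO_PLANE.items():
    print(f"{func:24s} -> {plane:18s} ({PLANES[plane]})")
```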

What's needed to make all this work is a new, broader notion of what virtualization and abstraction really mean, and how we need to deal with them. We know, for example, that a pool of resources has to present specific properties in order to be suitable for a given mission. If we have a hundred servers in New York and another hundred in Tokyo, can we say they're a single pool? It depends on whether selecting a server without regard for location alters the properties of the resource we map to our mission to the point where the mission fails; if the mission has a tight latency requirement, for example, a Tokyo server probably can't stand in for a New York one. We also know that edge computing will have to host things (like the now-popular metaverse) that will require low-latency coordination across great distances. We know that IoT is "real-time", but also that the latency length of a control loop can vary depending on what we're controlling. We know 5G has "non-real-time" and "near-real-time", but just how "non" and "near" they are isn't explicit. All of this has to be dealt with if we're to make the future we achieve resemble the future we're reading about today.
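
Here's a minimal sketch of that pool-property point, with made-up latencies: a set of hosting points only behaves as a single pool for a given mission if every member still satisfies the mission's properties, latency being the obvious one here.

```python
# Minimal sketch, with made-up latencies, of the pool-property point: a set of
# hosting points is only "one pool" for a mission if every member still meets
# the mission's requirements, latency being the obvious one here.

HOSTING_POINTS = {
    "nyc-edge-1": 0.004,   # seconds of round-trip latency to the user or site
    "nyc-dc-3":   0.012,
    "tokyo-dc-1": 0.095,
}

def usable_pool(latency_budget_s: float) -> list:
    """Members of the global pool that still meet this mission's latency need."""
    return [name for name, rtt in HOSTING_POINTS.items() if rtt <= latency_budget_s]

print(usable_pool(0.020))   # ['nyc-edge-1', 'nyc-dc-3']: Tokyo drops out
print(usable_pool(0.150))   # all three; for this mission it really is one pool
```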

We’re remaking our notion of “networking” and “computing” by remaking the nature of the hardware/software symbiosis that’s creating both. This trend may interact with attempts to truly implement a “metaverse” to create a completely new model, one that distributes intelligence differently and more broadly, and one that magnifies the role of the network in connecting what’s been distributed. Remember Sun Microsystems’ old saw, that “The network is the computer?” They may have been way ahead of their time.