To Plot the Course of Network Change, Follow the Money

Last week, I mentioned that my input from enterprises suggested that 26% of enterprises had shifted their view of how to optimize network purchases from “new technology” to “hammer for discounts”. Monday, Nigel, Maven, and I did a podcast about Juniper versus Cisco and about Arista and Palo Alto, and I commented that you can’t go head to head against Cisco, you have to fake them out. What I want to blog about today is how those points all intersect.

Let’s start by following the money. The shift in what network users think will help them in cost management is more significant than it might seem, because it’s tied to a broader change in how network spending is managed, and that larger change may indicate that new network technology will be harder to bootstrap in the future. That’s good news for incumbents, but potentially a risk for startups.

I’ve tracked enterprise network spending for decades, and through the entire period I’ve noted that networks are paid for through a combination of budget spending and project spending. Budget spending is the contribution that pays for sustaining the network-that-is, covering orderly upgrades and modernization. Project spending is the contribution that pays for everything else, things that require a separate business justification. In nearly all cases, it’s project spending that has to fund changes in technology.

Budget spending is usually driven in large enterprises by an RFP process with varying degrees of formality. The point is that the enterprise knows what they need to buy, what specifications are required, and they’re asking for prices. Think “Give me a bid” and you’ve got it nailed. Most successful bids, over the last 30 years or so, have conformed explicitly to the framework of the RFP. Over that period, less than 4% of the successful bids made any significant modification to the requested equipment framework, and over the last 15 years, the number is less than 3%.

Project spending is usually driven by an RFI, which presents vendors with a requirement set and sometimes some general constraints, and then calls for a solution and bid/pricing. Behind the project spending is of course the project, which is a set of business benefits that are assigned a value, a set of costs, and a calculated return on investment that has to meet corporate targets. Over the same 30-year period, almost half of RFI awards went to vendors who actually tweaked requirements and constraints somewhat, though that number has also dropped (to 39%) in the last 15 years.

If we forget technology for the moment and look at the process of converting a prospective buyer into a committed buyer, we see a pretty significant difference between budget and project spending. Budget spending is strongly influenced by sales teams. Not only are they usually the players who will receive the RFPs, they’re also often in a position to influence how they’re developed, obviously in their favor. The term “wired” is used to indicate an RFP that a vendor has influenced strongly in their own favor. Project RFIs, in contrast, tend to be driven by vendor marketing. In 62% of all RFIs I’ve assessed in the last 30 years, the project request came about because a line organization’s need intersected with a technology capability known to senior management through media coverage or other non-sales channels. That percentage has actually been growing steadily; in the last 15 years, almost 70% of projects have been launched that way. By the way, in the remaining cases the driver was the sales team of the vendor that had account control.

Over the last 30 years, there’s been a major shift in the balance between budget money and project money in networking. For the first ten years, the two were roughly balanced, and that was the period of greatest change in network technology and in vendor market share. Over the next 20 years, the balance slowly (and a bit erratically) shifted, to the point where in 2021 only 13% of network money came from projects. My data for 2023 suggests that number will drop further, perhaps even to single digits, and that is likely to generate some significant forces on the network market overall.

The first of these forces is the force of inertia. Absent project money, history says that buyers are less likely to adopt new technology strategies or even to change vendors. Market share shifts less when project funding is suppressed, and purchases are more likely to be pushed back during times of uncertainty because the option to “not refresh” infrastructure is always on the table, whereas network projects are often linked to things outside networking or even outside IT, which makes them difficult to derail.

The second force is the force of cost management. When budget dollars are so dominant, there is almost no interest in feature differentiation, only price. The presumption is that the technical status quo is being refreshed, and nobody even wants to consider rocking the boat by trying to introduce changes that would elevate consideration and approvals.

The third force is the force of negative reinforcement. Vendors tend to respond to budget-focused spending with product cost-cutting, which limits innovation and reduces the chance new features or capabilities will be introduced. As a result, there’s less chance that project spending will be stimulated, which continues to focus vendors on cost-cutting and so forth. Only a technology revolution brings the market out of this loop.

Our final force is the force of quota. Salespeople have sales quotas to meet, and they have to manage their time so that they can make their numbers for the quarter, and support their companies’ attempt to do the same. For decades, salespeople have told me that they dread the “educational sell”, meaning that they dread having to spend time explaining why a particular technology should be purchased, starting with how it works, how you transition to it, and so forth. They’re not paid to be educators, they’re paid to get orders.

What this brings us to is simple, if a bit disheartening. Networking has fallen into a trap of its own budgets’ making. It’s not so much the quantity of budgeted funds as the fact that budgets are by nature a commitment to preservation rather than advancement. If the combined effect of our four forces isn’t broken, networking will end up a total commodity, both at the infrastructure level and at the service level, and that would have profound impacts on every player in the space.

The root of the issues networking faces is the same as for computing: complexity. The missions we set for both networking and computing have expanded radically in the decades since the two started their dance of symbiotic association. We used to have technical solutions to challenges; we now have components of technical solutions, because it’s increasingly hard to grasp the scope of the missions and what we would hope to use to address them. What I’d hope to see is a re-framing of both the compute and network models to package the pieces we now believe are essential, and if we can do that we’ll promote a new age of growth in both spaces.

The Real Reasons the Cloud Doesn’t Always Save Money

Why aren’t we saving money with cloud computing? If you chart the trajectory of IT spending, it’s not trending lower but higher. Given that “you save money adopting cloud computing” is the mantra of the age, why aren’t we seeing the savings? InfoWorld did an article on the topic, and what I found interesting was that the article didn’t address the biggest question of all, which is whether we’re trying to save money in the first place.

The InfoWorld piece offers three reasons why we don’t save money on the cloud, and they all relate to practices that can contribute to higher-than-necessary spending on our cloud applications. OK, yes, those reasons are important in optimizing cloud use, but are they the thing that could turn that upward IT spending trajectory downward? I don’t think so, not by a long shot. And even in those three reasons we overspend on the cloud, there are hidden factors that need to be addressed if even those limited remedies are to work.

I’ve blogged regularly on cloud computing, and I won’t bore you all with details on my view. To summarize, I believe that enterprise cloud adoption has been driven by the need to optimize businesses’ relationship with their prospects and customers by creating web-and-app-friendly portals into current applications. We’re not moving things to the cloud, we’re using the cloud for doing things we’ve never done before. To expect that we could cut IT spending with the cloud under these circumstances is not and never was realistic.

The cloud is a cheaper way to do the user-centric application front-ending that businesses know is required to improve stakeholder engagement with information. Give users a convenient portal into the world of stuff to buy and they’ll use it. Same with partners and employees. The Internet and smartphones created that portal, so the shift to agile Internet-centric presentations of products and services was inevitable. The cloud wasn’t supposed to cut costs, but to cut cost increases that could have crippled the customer-facing initiatives of companies had they been implemented in the data center. The “hybrid cloud” that you’ve been hearing a lot about recently is actually the only kind of cloud the overwhelming majority of enterprises need or want.

That cloud costs should be compared to equivalent data center costs for the same application evolution is the first critical truth in managing cloud costs. It’s the first axiom, in fact, of cloud economics. The second axiom (which, as those who remember high-school math will recall, is a “self-evident truth”) is that we have ignored the first axiom in our assessment of cloud cost-effectiveness. The InfoWorld article proves that point.

In geometry you can build theorems from axioms, and we can do it here too. My first proposed theorem is that because we’ve not actually understood the cloud we’ve been planning for and adopting, we’ve not taken all the steps possible to optimize it. How could we, without knowing where we were really heading?

If the primary mission of the cloud is to host application components that enrich prospect/customer interaction (we’ll get to other missions later), then the first thing that needs to be done is to decide what information and processing is needed in the cloud to do that. Remember that web sites and content delivery networks can deliver product information. It’s when you start moving toward the “sale” end of the prospect-to-customer trajectory that you need to have information about detailed pricing, stock and availability, and eventually to convert interaction into transaction. Richer interactions then push some data outward from the data center.

A good example is editing and validity checking. Transaction processing often includes editing prior to applying the transaction, and if you’re going to push the transaction steps outward, that editing is likely done in the cloud. If you want to track the interest of prospects and customers in your products, you likely want a sign-on step, and that creates a link to the account and account history. But even during pre-sale interaction, you may want to push stocking information, shipping dates, and so forth outward to avoid having to hammer the data center systems while people browse around.
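
To make that division of labor concrete, here’s a minimal sketch in Python. Everything in it is hypothetical (the SKUs, the cached stock snapshot, the post_to_data_center placeholder); the point is only that the cloud front end does the editing and answers browsing questions from data pushed outward, and only a clean order ever becomes a transaction against the data center.

```python
# Minimal sketch of the cloud/data-center split described above (all names
# are hypothetical). The cloud front end edits and validates an order and
# answers browsing queries from a locally cached stock snapshot; only a
# clean order is forwarded to the data center transaction system.

from dataclasses import dataclass

@dataclass
class Order:
    sku: str
    quantity: int
    account_id: str

# Stock snapshot pushed outward from the data center so browsing doesn't
# hammer the core systems.
CACHED_STOCK = {"SKU-100": 42, "SKU-200": 0}

def edit_order(order: Order) -> list[str]:
    """Front-end editing: catch obvious errors before any data center call."""
    errors = []
    if order.quantity <= 0:
        errors.append("quantity must be positive")
    if order.sku not in CACHED_STOCK:
        errors.append("unknown SKU")
    elif CACHED_STOCK[order.sku] < order.quantity:
        errors.append("insufficient stock (per cached snapshot)")
    if not order.account_id:
        errors.append("sign-on required before ordering")
    return errors

def post_to_data_center(order: Order) -> str:
    # Placeholder for the real transactional call behind the boundary.
    return f"accepted order for {order.quantity} x {order.sku}"

def submit(order: Order) -> str:
    errors = edit_order(order)
    if errors:
        # Rejected entirely in the cloud; the data center never sees it.
        return "rejected: " + "; ".join(errors)
    # Only now does the interaction become a transaction against core systems.
    return post_to_data_center(order)

if __name__ == "__main__":
    print(submit(Order("SKU-100", 2, "acct-7")))
    print(submit(Order("SKU-200", 1, "acct-7")))
```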

The point here is that whenever you have a boundary between technology options, a boundary that includes different economic tradeoffs in particular, you need to consider things on both sides of the boundary in order to optimize the system overall. We can’t change the cloud without considering the data center model in play, and the opposite is likely even more true. Enterprises often fail to do that, and in that failure they sow the seeds of inefficient cooperation between two essential pieces of the same application.

My second theorem is that cooperation between hybrid cloud elements can generate significant costs in itself. Traffic in and out of the cloud is usually associated with charges, and storing information in the cloud can do the same thing. These charges come on top of the costs associated with the application model in play in both places, the cost to make changes, and so forth. It’s very easy to forget these costs, and even easier to focus on one without even considering the other. One enterprise told me they moved a database to the cloud to reduce cross-border traffic charges, only to find that the cost of keeping the database updated and the cost of storage exceeded the crossing charges.

Enterprises tend to get tunnel vision on cloud costs, meaning that they often focus on a cost they’re aware of and ignore the cost of alternatives, even to the point of forgetting to determine if there are any alternatives. This is particularly true of the traffic charges associated with the cloud. Those charges are based on the number and size of message exchanges and data movements, and often the cost of refreshing a database is higher than the cost of keeping the data in the data center and paying for traffic.
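
To illustrate the tunnel-vision point, here’s a back-of-envelope comparison in Python. Every rate and volume is invented for illustration, not anyone’s actual pricing; the point is that both sides of the boundary carry costs that have to be compared before anything is moved.

```python
# Back-of-envelope comparison of the two alternatives discussed above.
# Every rate and volume here is a made-up illustration, not a quote from
# any provider; the point is only that both options carry costs.

def keep_in_data_center(monthly_queries, bytes_per_query, egress_per_gb):
    """Data stays in the data center; you pay traffic charges to reach it from the cloud."""
    gb_moved = monthly_queries * bytes_per_query / 1e9
    return gb_moved * egress_per_gb

def move_to_cloud(storage_gb, storage_per_gb, refresh_gb, transfer_per_gb):
    """Database moves to the cloud; now you pay storage plus refresh traffic."""
    return storage_gb * storage_per_gb + refresh_gb * transfer_per_gb

if __name__ == "__main__":
    stay = keep_in_data_center(monthly_queries=5_000_000,
                               bytes_per_query=20_000,
                               egress_per_gb=0.09)
    move = move_to_cloud(storage_gb=2_000, storage_per_gb=0.10,
                         refresh_gb=1_500, transfer_per_gb=0.09)
    print(f"keep data in the data center: ${stay:,.0f}/month")
    print(f"move database to the cloud:   ${move:,.0f}/month")
```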

My final theorem is that sometimes you just have to do the application over. If a hybrid application is the rule, and if the normal model of development is to use the cloud to absorb the new requirements, is there not a point where limitations in the quality of experience and sub-optimal costs justify rebuilding the application completely? The fact is that the relatively small number of applications that have “moved to the cloud” and gained the business value that was hoped for moved because of this theorem.

One good example is the whole SaaS/Salesforce story. Most companies already have things like CRM and ERP applications running in the data center, and it’s easy for them to front-end these with cloud components to enhance the user experience. However, because the data involved is often limited in volume and scope of use, it can make more sense to simply use a cloud-resident application and forget the hybridization.

There are measures that cloud users can take to better monitor costs and tweak their cloud options. These are good ideas, but they can’t address problems with the application model that’s being deployed. It’s critical, to optimize cloud costs, to optimize the way an application is structured and how processing and traffic are balanced between cloud and data center. If the starting strategy for your cloud application is wrong, all the tuning and tweaking of options in the world won’t fix it.

What Kind of Place will the Metaverse Be (and When?)

I think it’s pretty clear that Meta knows its “social metaverse” concept has a few too many moving parts to be available quickly enough to help its bottom line. I think that competitors like Microsoft, Apple, and Amazon know that too, which is why gaming companies are getting M&A attention. I think a different set of metaverse applications is going to lead metaverse adoption, and the question is whether those applications will tap off so much value that the ultimate social-metaverse vision won’t arrive at all. The technical question is what differentiates a social metaverse from multi-player games.

The short answer, as I pointed out in THIS blog, is the concept of “locale”, the term I used to describe a virtual “place” where avatars representing people can assemble to interact. The essential concept of locale is what the name suggests, which is localization. We see a slice of the world around us, limited by the features that bound our visual field and our ability to recognize and communicate. We might see people a half-mile away, but we can’t recognize them or talk with them. My locale is a zone of interaction, and those whose avatars are within it can interact with my avatar.
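
To make the locale idea concrete, here’s a toy sketch (hypothetical structures, not anyone’s implementation) of a locale as a zone of interaction: my avatar can interact only with the avatars that fall inside its interaction radius.

```python
# A minimal sketch of the "locale" idea: an avatar can interact only with
# avatars inside its zone of interaction. All structures are illustrative.

import math
from dataclasses import dataclass

@dataclass
class Avatar:
    user: str
    x: float
    y: float

class Locale:
    def __init__(self, interaction_radius: float):
        self.interaction_radius = interaction_radius
        self.avatars: dict[str, Avatar] = {}

    def enter(self, avatar: Avatar) -> None:
        self.avatars[avatar.user] = avatar

    def in_range(self, me: str) -> list[str]:
        """Avatars close enough to mine that we can interact."""
        a = self.avatars[me]
        return [other.user for other in self.avatars.values()
                if other.user != me
                and math.dist((a.x, a.y), (other.x, other.y)) <= self.interaction_radius]

if __name__ == "__main__":
    locale = Locale(interaction_radius=10.0)
    locale.enter(Avatar("tom", 0, 0))
    locale.enter(Avatar("nigel", 3, 4))       # 5 units away: can interact
    locale.enter(Avatar("stranger", 400, 0))  # visible in principle, but out of range
    print(locale.in_range("tom"))             # ['nigel']
```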

What I’m calling a “social metaverse” is a translation of social-media “friends” into a form of locale. The problem that can create is that people have a lot of social-media friends and if a mass of them assemble in some virtual place, the behaviors of each of their avatars have to be maintained in synchrony or the result won’t be realistic for some, most, or perhaps all.

What differentiates multiplayer games from social metaverses is that the game framework defines the locales, and can (and usually does) limit the number of players who can inhabit one. In addition, games typically impose behavioral constraints on the avatars, and the players are made to work within them. The game sets the framework of interaction, and players have to follow it, limiting their behavior to it. In social metaverses, the presumption is that the players have fairly unlimited interaction potential. You can see that in the fact that Meta released guidelines on how closely avatars can approach each other, presumably to prevent inappropriate interactions in the metaverse. All this means that games are easier to synchronize than social metaverses.

Meta’s recent focus on educational aspects of the metaverse has the same limiting effect on interactions that multiplayer games have. A simple educational metaverse, like the example of learning about dinosaurs, is really like a one-player game, something I described as a “metaverse of one”. Here there is no issue with interaction because only one player has a representative in a locale; everything else is generated.

But not all multiplayer game models dodge the potential issues of realism and synchronization. If we imagine a player in New York and one in Japan, it’s easy to see that the latency accumulated by the combination of the connections and the processing of the shared locale appearance would be significant enough to mean that even simple interactions (shaking hands, helping to move something) could be rendered awkward and unrealistic by latency.

All this seems to be leading to a shifting of priorities with regard to the metaverse. Rather than creating a virtual world (which some, particularly startups, have indeed done), the bigger players are working on making virtual reality more realistic. Think for a moment about mirrorless cameras, which are sweeping the market even at the prosumer and professional levels. If the viewfinder doesn’t render a fairly realistic image, with no significant lag when the camera or the subject moves, the photographer will value the camera less, likely enough less to consider staying with the direct-optics system of DSLRs. The same thing is true of virtual reality glasses; if you don’t get a realistic sense of presence with them, the whole experience is compromised. VR glasses are thus essential for even metaverse-of-one experiences, and surely for multiplayer games. Everyone is now working on a VR strategy.

And no one is working, really working, on the issue of synchronization realism within a locale. They’re defining applications whose locales are limited naturally so they don’t have to. The question is whether that’s a limitation that will eventually evolve away over time, or whether the real value of the metaverse doesn’t and won’t lie in the social-metaverse model at all.

The problems with creating a realistic social metaverse with minimal limitations on the number of users who might populate a locale relate both to the assembly of a “reference model” for the locale and to the synchronization of the views and behaviors of each user with that model. Each user has control over their respective avatar, which can move about subject to any behavior constraints the metaverse might impose. If there is no reference model shared by all users, then there is no way of obtaining per-user views that are consistent, which means users couldn’t interact because they wouldn’t be seeing the same things.

It is possible that where users were distributed regionally, “sub-models” for each region might be used to aggregate the synchronization of all users within the region with a reference model. However, this could increase the latency associated with updates to the reference model, and also potentially with the distribution of user views. I think it’s clear that in the optimum approach, each user would interact with the reference model to update the user’s avatar representation (location, motion, aspect, gestures, etc.) and to receive current views of the locale based on the avatar’s representation within it (if the avatar is facing north, it can only see what’s north, for example, and it could not see someone or something behind an opaque object).
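
Here’s a small sketch of that optimum approach, purely illustrative and with invented structures: each user posts avatar updates to a single reference model and receives back a view derived from the avatar’s position and facing. It deliberately ignores latency, occlusion by opaque objects, and regional sub-models, which is exactly where the hard problems live.

```python
# A sketch of the reference-model interaction described above: every user
# posts avatar updates to one shared model and receives a view derived from
# their avatar's position and facing. Purely illustrative; latency, occlusion
# by opaque objects, and regional sub-models are all ignored here.

import math
from dataclasses import dataclass

@dataclass
class AvatarState:
    user: str
    x: float
    y: float
    facing_deg: float  # 0 = north, 90 = east, and so on

class ReferenceModel:
    FIELD_OF_VIEW = 120.0  # degrees; an arbitrary choice for the sketch

    def __init__(self):
        self.states: dict[str, AvatarState] = {}

    def update(self, state: AvatarState) -> None:
        """A user's client reports its avatar's latest position and facing."""
        self.states[state.user] = state

    def view_for(self, user: str) -> list[str]:
        """Return the avatars this user's avatar can currently see."""
        me = self.states[user]
        visible = []
        for other in self.states.values():
            if other.user == user:
                continue
            bearing = math.degrees(math.atan2(other.x - me.x, other.y - me.y)) % 360
            offset = min(abs(bearing - me.facing_deg), 360 - abs(bearing - me.facing_deg))
            if offset <= self.FIELD_OF_VIEW / 2:
                visible.append(other.user)
        return visible

if __name__ == "__main__":
    model = ReferenceModel()
    model.update(AvatarState("a", 0, 0, facing_deg=0))     # facing north
    model.update(AvatarState("b", 0, 10, facing_deg=180))  # north of "a"
    model.update(AvatarState("c", 0, -10, facing_deg=0))   # behind "a"
    print(model.view_for("a"))  # ['b'] -- "a" faces north and can't see "c"
```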

The problem of the reference model and interaction with it relates to a combination of processing resources and message latency. The more populous the locale, the more complex the reference model and the derivation of per-user views would be. This alone would impact the cost of hosting the reference model. The more geographically diverse the user base is, the more latency and latency variation would be experienced, making synchronizing behavior within the locale more problematic.

My conclusion based on these points is that a social-metaverse system that does not limit the population of locales, the range of behavior within each locale, and the geographic distribution of the users whose avatars make up the population would require highly meshed sites with a low latency between any given point in the service area and the hosting point for the reference model. Providing this mesh networking is obviously beyond the power of a metaverse provider, or even a single ISP, so it would come about only through a general modernization of global network connectivity. That could likely arise only because each operator saw an opportunity associated with the modernization, either from the evolution of “local-to-them” metaverse activity or through broadening acceptance of a social metaverse.

That raises the biggest question for the social metaverse, which is whether the concept has any real value. Meta’s Facebook application has suffered from competition generated by platforms like TikTok, which are considerably less immersive. Facebook, as some of my young friends tell me, is almost a commitment unto itself. Twitter and TikTok are more episodic, meaning they’re well-suited for casual and occasional interactions. It’s hard to imagine a metaverse that’s not immersive, so can a social metaverse compete? Is a social metaverse, even if technically possible, likely to create any advertising payback? Finally, is any form of social networking that can generate ad payback going to end up having to be immersive, which will then make it eventually fall prey to less immersive alternatives?

Shades of biological systems, but it sure seems like the only path forward for the social metaverse is evolution. Metaverses with limited locales could emerge and gradually expand within the footprint of an operator, and in that limited application they could perhaps encourage improvements in latency and the deployment of edge resources. That could lead to federation among operators, much as happened with the Internet, and eventually to a network/compute framework suitable for even the most extravagant social metaverse.

If anyone wants to live there.

Casa, Google, Carrier Cloud, and Edge Computing

Google and Casa Systems are partnering to create a cloud-native 5G Core and also a strategy for multi-access edge computing (MEC). The service/offering will be available to network operators (telcos, cable companies) and also to enterprises who want private 5G. To say that this raises some interesting questions is an understatement, so let’s dig in to address those questions and hopefully offer some answers.

The “Why now?” question surely tops the list. After all, it’s not like 5G Core has been on the lips of users of 5G services. In fact, most users can’t even name a feature of 5G that depends on 5G Core. Network operators have generally supported the implementation of 5G Core, but not necessarily with a high priority. About half of operators tell me their top reason for planning a 5G Core deployment is competitive; they’re afraid that even network users who don’t know what 5G Core is will still be sensitive to a marketing tale from a competing operator who offers it.

The answer to the question is also more “competitive-reasons” than anything else. Public cloud providers know that larger network operators are unlikely to be prepared to host 5G functions on their own edge facilities across the full geography where they provide service. Some don’t want to host them at all, and so any cloud provider needs to be able to respond or they risk losing what could be a significant amount of business.

OK, that wasn’t too difficult. Next question: Why offer 5G Core to enterprises? Enterprise applications of 5G are rare, and the great majority of them would focus on 5G NR, the new-radio component that offers support for 5G devices and higher per-cell and per-device capacity. 5G Core implies a distributed 5G footprint, which most enterprises don’t have and don’t need.

Two answers here: Competition and government. What public cloud provider with a 5G strategy isn’t including enterprise or “private” 5G in their story? ‘Nuff said. As for government, that’s the community of 5G adopters who might actually be 1) real and 2) consumers of 5G Core services. Perhaps large city governments, but more likely state/province or national governments, especially the military.

Government deals might be the early 5G Core adopters, in fact. Network slicing end to end is surely useful in government applications, not only military and security but also public works. First responder network services could be offered by 5G operators using slicing, but also by the governments themselves.

Next question: Why a partner deal rather than a player going it alone? Casa and Google Cloud had previously partnered for 5G Standalone (SA), which includes 5G Core, with Casa providing the software and Google (obviously) the cloud and cloud integration expertise. The two companies needed each other for this, and in fact had a history of cooperation with what was essentially the same technology set.

Casa also did a deal with Verizon to provide 5G Core technology to support Verizon’s MEC deployment, and that is the bridge between those questions dealing with the announcement itself and the questions that relate to the implications of the deal overall. Recall that the Casa announcement of its deal with Google Cloud includes MEC.

MEC is one of those evolving concepts, which all too often means that the market has an elastic definition for what it is. Originally, MEC was the name given to the local-to-the-base-station hosting resources called for in the 3GPP 5G spec. More recently, the term has been generalized to mean what the Casa release on its Verizon deal says: “Mobile Edge Compute (MEC) technology moves computing resources onto cloud servers at the network edge – as close as possible to places where data is generated.” Not necessarily all the way to the base station, where the resources would offer limited economies of scale for other applications. That raises the question that matters in this announcement: are Google Cloud and Casa proposing a model for more generalized edge computing under the cover of MEC?

Edge computing is the superset of MEC, and the definition in Casa’s announcement that I quoted above is really IMHO an edge computing definition. Edge computing is a kind of hybrid of the cloud and on-premises distributed computing, because “the edge” and “as close as possible” include situations where proximity can be optimized only by putting the computing resources in the same facilities as the devices that use them. This is the almost-classic IoT model, where a production line or warehouse has its own edge server(s). However, all edge resources wouldn’t necessarily be on premises. The sweet spot for shared (public) edge services, and for network operator edge services related to adding value to basic connectivity, would be the metro center. Public cloud facilities in specific geographic areas would also be able to provide edge hosting, and the optimum location would depend on latency and cost requirements.
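
To illustrate that last point, here’s a toy placement calculation with invented latency and cost figures: pick the cheapest candidate hosting location that still meets the application’s latency budget. The “right” edge is simply whichever candidate wins that test.

```python
# A toy illustration of the placement tradeoff above: given candidate hosting
# locations (on-premises, metro edge, regional cloud) with assumed latency and
# cost figures, pick the cheapest one that still meets the latency requirement.
# All numbers are invented for illustration.

CANDIDATES = [
    {"site": "on-premises server", "latency_ms": 1,  "monthly_cost": 900},
    {"site": "metro edge zone",    "latency_ms": 8,  "monthly_cost": 400},
    {"site": "regional cloud",     "latency_ms": 35, "monthly_cost": 150},
]

def place(latency_budget_ms: float) -> str:
    eligible = [c for c in CANDIDATES if c["latency_ms"] <= latency_budget_ms]
    if not eligible:
        return "no candidate meets the latency budget"
    best = min(eligible, key=lambda c: c["monthly_cost"])
    return f"{best['site']} (${best['monthly_cost']}/month)"

if __name__ == "__main__":
    print(place(latency_budget_ms=5))    # only on-premises qualifies
    print(place(latency_budget_ms=20))   # metro edge wins on cost
    print(place(latency_budget_ms=100))  # regional cloud is cheapest
```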

Creating logic that could be moved easily across a range of hosting locations likely starts with containers and container orchestration, meaning Kubernetes. It may be coincidence, but Google sponsored an open-source initiative, Nephio, whose mission is “to deliver carrier-grade, simple, open, Kubernetes-based cloud native intent automation and common automation templates that materially simplify the deployment and management of multi-vendor cloud infrastructure and network functions across large scale edge deployments.” This statement seems to hit the same buttons as the Casa/Google Cloud announcement and Casa’s work with Verizon.

If Google Cloud and Casa are really framing out a strategy for edge computing that would also serve as the framework for carrier function hosting, which is what Nephio could be, then it could be that Google sees edge computing as its opportunity to stand out from other public cloud providers. Given that Amazon seems to be focused on the OTTs and Microsoft on the enterprise cloud, Google could focus on “carrier cloud” and perhaps gain some market share. Carrier cloud is a geekier market, too, which plays to Google’s strength.

The big questions with this are 1) whether Google/Casa can gain traction at this point, given that there is still no clear business case for 5G Core and that a slow-roll deployment would afford others plenty of time to stake out competing positions, and 2) whether traction in carrier cloud can translate to edge computing, given that carriers don’t seem to have any real edge ambitions beyond 5G.

I think Casa has enough credibility with the Verizon deal to hold a position in the market, which would answer the first question. But Casa is operator-focused and they can’t make the operator-horse drink at the edge-waterhole. Google has the smarts to come up with a great edge computing model, but they’ve been consistently weak in marketing/positioning for their cloud offering. Thus, we can’t answer the second question with any decisive positive. That means that the venture, with all its promise at the operator level, is still at risk.

The Cloud is NOT Going Away

You’ll be happy to know that a veteran VC believes that cloud computing is here to stay. Given that I’ve blogged about issues emerging in cloud computing, you might think I’d be rebutting this view. Not so; I’ve never suggested that cloud computing would go away, but I have said that “everything will move to the cloud” is also nonsense. I’ve been trying to get enterprise insights on why they use the cloud, and the results may not be as definitive as I’d like, but they’re interesting.

Let’s start with some basic comments on cloud economics. Enterprises tell me that if they compare the cost of a single VM in their own data center to a VM instance in the cloud, each in the context of actual usage, the latter will cost between 7% and 13% more. Where enterprises have virtualization in their data centers, they can achieve nearly the same economies of scale as a cloud provider, and of course the cloud providers have to earn a profit, so they mark up their services. A total transformation to the cloud would increase costs, even not considering the cost of making legacy applications run there.

OK, then, what we’re seeing is a “partial” transformation. What were, and are, the specific shifts in computing that have led to cloud adoption? Where are those trends taking enterprises?

One early trend driving cloud adoption was server consolidation. Back in the year 2000, enterprises said that about 19% of their servers weren’t in the data center, they were scattered around in departments, branch offices, and so forth. I couldn’t get universal data on utilization, but those companies who offered insight into just what usage levels these departmental resources experienced indicated that the total utilization was less than 30%. In addition, the cost of maintaining these decentralized servers was roughly double what was spent supporting servers in the data center. No wonder enterprises saw the cloud as an opportunity to save some money. However, by 2012, enterprises indicated that server consolidation was no longer a meaningful driver of their cloud growth.

The next driver has proved to be more durable. Workload variability has been contending for the number one driver of cloud computing growth since 2012 (we’ll get to the other contenders later). The problem here is familiar to CIOs; when there’s a radical difference in workload over time, peaks and valleys in application usage, it’s essential to size resources for the peaks to avoid congestion and QoE issues at the very times you don’t want anything to go wrong. How do you do that with capital equipment in your data center? In the cloud, capacity elasticity is a given.

A closely related contender for the cloud-driver crown is high availability, so close in fact that I group them into one category. Both workload variability and availability management demand resource spares, which means that a data center would have to be overprovisioned. Failover to the cloud and cloudbursting are faces of the same situation, which is that static server populations have to be overprovisioned in the real world, unless you can draw on the cloud to backstop their operation.
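
A rough arithmetic sketch, with made-up numbers, shows why. Compare owning enough servers for the peak against owning the baseline and renting cloud capacity only for the peak hours; the static configuration pays for its spare capacity all month long.

```python
# A rough arithmetic sketch of the overprovisioning point above: compare owning
# enough servers for the peak against owning the baseline and renting cloud
# capacity only during peak hours. Every figure is hypothetical.

def own_for_peak(peak_servers, monthly_cost_per_server):
    return peak_servers * monthly_cost_per_server

def own_baseline_and_burst(baseline_servers, monthly_cost_per_server,
                           burst_servers, burst_hours, cloud_hourly_rate):
    owned = baseline_servers * monthly_cost_per_server
    burst = burst_servers * burst_hours * cloud_hourly_rate
    return owned + burst

if __name__ == "__main__":
    static = own_for_peak(peak_servers=100, monthly_cost_per_server=300)
    burst = own_baseline_and_burst(baseline_servers=60,
                                   monthly_cost_per_server=300,
                                   burst_servers=40, burst_hours=120,
                                   cloud_hourly_rate=0.50)
    print(f"sized for peak:        ${static:,.0f}/month")
    print(f"baseline + cloudburst: ${burst:,.0f}/month")
```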

The real alternative contender is technology inertia, and this is the driver that seems to be emerging as the most significant. To understand why, we need to understand what’s happening that demands agile technology responses, which obviously makes a lack of them a problem. The new development is a shift in the retail sales model.

The Internet, or more specifically the worldwide web, offered companies a conduit directly to their customers. Unlike advertising, which might stir interest in something but couldn’t interact with a prospect to solidify that interest, the web could be used to present material in the order and with the detail needed. Starting in the ‘90s, web marketing became a major force in retail, something increasingly essential to sellers. Because the interconnected-federation structure of the Internet could create highly variable QoE, content delivery networks emerged and “wholesale” ISPs who served only content providers grew up.

The success of the web created its own problems, though. People wanted better web experiences, more interactivity, great graphics, and inevitably a tighter coupling between the “active marketing” mission and an emerging pre-sale and direct retail mission. They wanted apps on their phones that could be retail portals, offer them customer support and answer questions on their accounts. What was happening was that customers and prospects wanted their phones, and their browsers, to be effectively virtual representatives of the companies they dealt with. That demanded much tighter integration between the experience and the core business applications.

These tighter interactions were also highly variable in duration and trajectory, and so the “front end” of an application became separated both in terms of the technology associated with development and the resources needed for execution. At the technology level, highly interactive front-end elements are more like OTT services than traditional websites, and so development shifted to the microservice/function model that is not particularly useful in transactional applications of the kind found in data centers. At the resource level, not only resource elasticity but geographic elasticity became important.

The need to support hosting across a wide geography is the final contender for the primary driver of the cloud. Enterprises regularly support sales across a much wider area than they cover with their own physical facilities, which means the cloud provides them with a (virtual) local presence where no real option for a data center exists. While companies may have distributed shipping and even sales or support facilities in far-flung locations, these rarely have technical support staff on site, which means central administration of cloud resources can be far cheaper and more effective.

All of this means two things. First, the cloud is not going away. While early drivers have declined, some like server consolidation to the point of being irrelevant, the current drivers are rooted in global economics and geography and they’re not going anywhere. In fact, we can expect to see online shopping, both in researching purchases before going to a store and in actually making purchases, increase. But second, all our current drivers of cloud adoption focus on the user experience and not business records. The split between a cloud front-end and a transaction processing framework in a controlled data center facility is continuing. The data center isn’t going away either.

What’s the Relationship Between Orchestration and Lifecycle Automation?

What’s the difference between “orchestration” and “lifecycle automation” in applications and services? I guess it’s not surprising that there should be confusion here, given that we tend to conflate terms in tech all the time as part of a desire to reduce technical details and the size of stories in tech media. Then there’s the desire of vendors to jump on a popular concept, even if it means stretching what they are doing. At any rate, I’ve been getting a number of comments and questions on these terms, and the distinction could be very important.

The terms “orchestration” and “orchestrator” were originally used to describe things associated with writing and performing music. An “orchestra” plays music, an “orchestrator” prepares music to be played, and “orchestration” is what the orchestrator does. In software and services, the terms relate to the central coordination of a series of related tasks or steps, in order to allow them to be performed automatically rather than through explicit human control.

In computing, we’ve had this concept for longer than we’ve had popular use of the term. The old “Job Control Language” or JCL of IBM mainframes is a form of orchestration, allowing a sequence of programs to be run as a “batch”. However, the notion of orchestration came into its own with multi-component software and “DevOps” or developer/operations tools. The goal was to automate deployment and redeployment of applications, both to make the processes easier and to reduce errors.

While things like DevOps were emerging on the compute side, network operators and enterprises were also looking at tools to automate routine network tasks. The TMF has a long history of framing standards and practices designed to generalize the complex task of securing cooperative behavior among network devices.

Containerization, which creates a “container” that includes application components and the configuration information needed to deploy/redeploy them on generalized resources, was an important step in sort-of-linking these initiatives, because it combines deployment tools (Kubernetes) with configuration and parameterization that in many ways mirror what happens with network devices. Google’s Nephio may be a true bridge between Kubernetes and the network.

But what about “lifecycle automation?” The critical concept in lifecycle automation is that there is a “lifecycle”, meaning that services and applications progress through a series of operating states, some that are goals in normal operation and some that represent fault modes. This progress is signaled by events that represent indicators of behavior change. Events drive state changes, and in a manual network operations center (NOC) an operations team would receive notification of these events and take actions on them to either continue the progressive changes they represent (if they’re “normal” events) or to restore normal operation if the events signal a fault. The goal of lifecycle automation is to create software that can handle these events and take the appropriate actions.
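
A minimal state/event sketch shows the shape of what that software has to do. The states, events, and actions here are hypothetical; a real system would be driven by a service model rather than a hard-coded table.

```python
# A minimal state/event sketch of lifecycle automation as described above.
# States, events, and actions are hypothetical; a real system would be driven
# by a service model rather than a hard-coded table.

TRANSITIONS = {
    ("ordered",   "deploy_requested"):   ("deploying", "allocate resources"),
    ("deploying", "deploy_complete"):    ("active",    "start monitoring"),
    ("active",    "fault_reported"):     ("degraded",  "attempt automatic repair"),
    ("degraded",  "repair_complete"):    ("active",    "resume normal operation"),
    ("degraded",  "repair_failed"):      ("failed",    "escalate to operations"),
    ("active",    "teardown_requested"): ("retired",   "release resources"),
}

class ServiceLifecycle:
    def __init__(self, name: str):
        self.name = name
        self.state = "ordered"

    def handle(self, event: str) -> None:
        key = (self.state, event)
        if key not in TRANSITIONS:
            print(f"{self.name}: event '{event}' ignored in state '{self.state}'")
            return
        new_state, action = TRANSITIONS[key]
        print(f"{self.name}: {self.state} --{event}--> {new_state} ({action})")
        self.state = new_state

if __name__ == "__main__":
    svc = ServiceLifecycle("vpn-site-12")
    for evt in ["deploy_requested", "deploy_complete",
                "fault_reported", "repair_complete"]:
        svc.handle(evt)
```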

You can see from this set of definitions that there would appear to be an overlap between the two categories. Orchestration in the world of Kubernetes does allow for redeployment in case of failure, load balancing, and the creation of new instances or the withdrawal of old ones. Some DevOps tools like Puppet and Ansible are “goal-state” technologies that have even more of a state/event mindset. Should we be saying that the concepts of orchestration and lifecycle automation are congruent, but that implementations vary in the scope of their event-handling? There are a few other qualifications we should consider.

One is “vertical scope”. Orchestration in the software world is fairly one-dimensional, meaning that you are deploying applications. In lifecycle automation, we often look at deploying infrastructure, configuring devices that have a fixed function, and so forth. Lifecycle automation aims at the entire lifecycle, which includes everything that’s needed to support that lifecycle. Orchestration is generally used in a more limited single-layer scope.

Another possible difference is autonomy, and this is actually a point our vertical-scope discussion raises. There really are layers of technology needed to make an application or service work. Do some or all of these layers manage themselves independent of the application or service, meaning that they are autonomous? Two examples are server resource pool management and network management. Do we “orchestrate” software deployment by presuming that if something in either of these platform technologies fails, we’ll simply deploy again and presumably get a different resource allocated? In that case, we never have to break down other elements or put them in different places. With lifecycle automation, we may have to consider that a failure means not only replacing the things impacted, but even reconfiguring the software structure to optimize things overall.

Which raises yet another possible difference, which is interdependence. Lifecycle automation should take into account the fact that changes to an application or service configuration could require re-optimization. Take out one element and you can’t necessarily replace it 1:1, you have to reconsider the overall configuration in light of the service-level agreements explicit or implicit in the application or service.

Orchestration, IMHO, truly is a subset of lifecycle automation, one designed for a subset of the conditions that application and/or service deployment and operation could present. It’s designed for a simplified set of conditions, in fact. The broader problem, the real problem, needs a broader solution.

That’s what I think hierarchical intent modeling can provide. In this approach, each model element represents a functional component that may draw on subordinate components in turn. If one of them breaks or changes, it’s incumbent on the higher layer to determine if the SLA is still met, and if it is not, to report to its superior element to allow that element to signal a reconfiguration that may or may not require tearing down what was there before. If the SLA can still be met, then our object can determine whether it needs to re-optimize its own subordinate structures or if it can simply replace the thing that’s changing/broken.
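
Here’s a small sketch of that behavior. The SLA check (a simple latency sum against a budget) is invented purely to make the structure concrete; the point is the escalation pattern, where a subordinate change is absorbed locally if the element’s SLA still holds and reported upward if it doesn’t.

```python
# A sketch of the hierarchical intent-model behavior described above. Each
# element owns an SLA and some subordinates; when a subordinate changes, the
# element first tries to handle things locally and escalates only if its own
# SLA can no longer be met. The SLA check here (a simple latency sum) is an
# invented stand-in for whatever a real model element would evaluate.

class IntentElement:
    def __init__(self, name, latency_budget_ms, parent=None):
        self.name = name
        self.latency_budget_ms = latency_budget_ms
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)
        self.latency_ms = 0.0  # contribution of this element itself

    def total_latency(self):
        return self.latency_ms + sum(c.total_latency() for c in self.children)

    def sla_met(self):
        return self.total_latency() <= self.latency_budget_ms

    def child_changed(self, child):
        """A subordinate reports a change (failure, re-route, replacement)."""
        if self.sla_met():
            print(f"{self.name}: SLA still met after change in {child.name}; "
                  f"re-optimize locally if worthwhile")
        elif self.parent:
            print(f"{self.name}: SLA violated; reporting upward to {self.parent.name}")
            self.parent.child_changed(self)
        else:
            print(f"{self.name}: SLA violated at top level; full reconfiguration needed")

if __name__ == "__main__":
    service = IntentElement("vpn-service", latency_budget_ms=50)
    core = IntentElement("core-transport", latency_budget_ms=30, parent=service)
    access = IntentElement("access-leg", latency_budget_ms=20, parent=service)
    last_mile = IntentElement("last-mile", latency_budget_ms=15, parent=access)
    core.latency_ms, access.latency_ms, last_mile.latency_ms = 10, 5, 10
    # A failure forces the last mile onto a slower backup path:
    last_mile.latency_ms = 30
    access.child_changed(last_mile)
```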

The question, then, is whether simple orchestration is ever workable in a modern world. The answer is “it depends”, which means that if we could assume that an application/service could call on a pool of equivalent resources and that those resources were not interdependent in terms of assignment, then we could say “Yes!”

I’m not sure how often we could presume those conditions would exist, and I think the chances will shrink over time as we create more complex applications based on more complex commercial relationships with resource providers. However, I also believe that neither the networking community nor the cloud community has addressed these points explicitly in its designs. That suggests that we might build network services and cloud architectures that would simplify lifecycle automation to the point where basic orchestration would suffice. It’s a race, but perhaps between contestants who don’t know they’re competing, and whoever figures things out first may gain a durable advantage in the market.

Enterprise Network Spending: Hopeful but Fearful, Too

What impact does overall economic uncertainty have on network buyers? That’s quickly becoming a burning question as we navigate through almost-unprecedented inflation, volatile stocks, and a lot of unfavorable stories in the media. The answer, as you might expect, is pretty complicated.

My own work with enterprises suggests that very few are cutting spending for 2022, though some are slow-rolling some projects, particularly where a lot of the spending will slop over into 2023. Network budgets were up this year over last by roughly 1%, IT overall was up roughly 4% (primarily software followed by the cloud), and as many enterprises say they’re expecting to spend slightly more on IT overall as are suggesting they’re cutting back. Networking is looking a bit more problematic; even though few say they’re cutting network budgets for this year, there is a distinct air of conservatism emerging.

One solid example of this is that enterprises are telling me that their spending on networking in 2023 is likely to be slightly down versus this year. Slightly, according to enterprises, means less than 2% and likely more in the area of 1%. More significantly, the majority of enterprises say they expect to see their network spending back-loaded next year, with as much as 60% spent in Q3/Q4.

The reason for concern isn’t a return of COVID or a new and major economic or social problem, but rather a continuation of what they see as “economic stagnation”. Nothing is really bad, but nothing seems to be getting any better. This is uncharted waters for enterprises as much as it is for markets overall, including Wall Street, and that sort of thing makes companies reluctant to commit to anything really expensive or disruptive.

One example of this is that in August, about 80% of enterprises said their primary strategy for capital cost management on network equipment in 2023 would be “hammering vendors for discounts.” In April, that number was only 54%, and the 26% shift was largely due to backing off strategies to use white box switches, virtual routers, and other new technologies. Again, it’s not that enterprises are throwing in the towel on these things, but rather that they’re pushing the projects off until the second half of 2023. Why? The short answer is that in uncertain times, the devil you know doesn’t look quite as evil.

This particular shift manifests itself in the “Private 5G” story. Frankly, I never saw the rampant interest in private 5G that the media seemed to be uncovering. The majority of the interest came from companies who had been considering or were already using private LTE. What’s happening now is that the few projects I’ve heard about seem to be getting pushed back, again to that magical 2H23 period. Might that be why Amazon seems to have rushed out a private 5G offering that many say isn’t really ready? A little delay here could mean a long delay.

Even security seems to be under pressure. At the end of 2021, enterprises believed, by a margin of almost 3:1, that they would likely elect to spend more on a new security technology that “advanced” their confidence in security. Now, I’m finding that enterprises want to see “significant improvements” in security before increasing their spending, and one in five enterprises say that even then they’d have to do an artful justification to get management approval. You’ve probably read pieces lamenting the slow adoption of things like DevSecOps; this is likely why. People aren’t pulling out their checkbooks just because a salesperson says “security”.

The network salespeople I chat with are, like all salespeople, natural optimists, but they admit to seeing some issues. Every one thinks they’ll “certainly” or “likely” make their numbers in Q3 and Q4 of this year, but that’s been true in years when almost nobody ended up doing that, so it doesn’t prove much. They’re even more optimistic about 2023, and that again is hardly unusual, even before a bad year. What’s a bit unusual is that in 2021, almost two-thirds of salespeople said that “the great majority” of their peers would be making their numbers in the year ahead, and as of now, only a third are confident that will be the case for 2023. It shows that salespeople, when the question of future sales is depersonalized, are a bit more concerned.

Another related metric is that at the end of last year, almost three-quarters of salespeople agreed that “a new technology would justify my calling on prospects to explain it”, and now that number is well below 50%. “I can show my prospect how to save money” is the top justification for a call, but salespeople agree that they rely less now on being able to promote a call and more on being called in or offered an RFI/RFP. Push-selling isn’t as likely to be successful as it was in the recent past.

One question this all raises is whether this is a temporary issue, or whether current uncertainty has just raised the profile of an earlier and more systemic problem. Networking has been finding it more difficult to make a business case for additional spending for almost a decade. There was a time when IT created a lot of insight that, if delivered to users, would raise productivity. The network was the bottleneck, and that resulted in a surge in investment there. Then security issues came along, and there was renewed focus on network spending to relieve them. Now, enterprises are wondering if they’ve spent enough in both connection empowerment and security, and thus it’s time to apply traditional ROI constraints.

Enterprises are still on the fence on this issue. Even at the CxO level, the majority of enterprises believe that there is still additional value to be obtained by improving networking and enhancing security. Not as much as before, perhaps, but value nevertheless. They are also eager to believe that additional business justifications for network investment can and will be found. They’re largely staying the course on 2022 budgets, and most even say that they’ll likely complete most budgeted projects for the year. Those who do any sort of technology planning even seem to be holding to their 2023 preliminary plans.

The uncertainty remains, though. Fear of a recession, fear of the flood of unusual happenings in their markets, and in general fear of the unknown has shaken enterprises perhaps more than COVID did. Yes, they felt impacted by the virus, but they believed that the problems would pass, and they did. They’re less sure about current challenges, and so they’re just starting to question a “business as usual” strategy. A lot will depend on just how much they question that going forward.

What to Expect From the Operators’ Fall Planning Cycle

In a month, we’ll hit the start of what’s traditionally been a fall technology planning period that ends mid-November. Enterprises are a bit loose with regard to formal planning (I’ll provide my impression of their views next week), but the network operators have generally been fairly consistent in using this cycle to lay out the tech basis for future budgets. I’ve tracked the cycles for thirty or more years now, and I’ve always found them interesting. I usually start my coverage of the planning with a review of what operators say they’re looking at, then recap the results in late November.

It’s probably no surprise that the number one priority in this year’s cycle is 5G deployment and monetization. Operators have never doubted that they’ll deploy 5G, but most have been concerned about whether the deployment could actually generate any incremental revenue. There have been some slow but important shifts in how that’s working out.

Over the last year, the number of network operator planners who believed that 5G would generate “significant” incremental revenue fell from 43% to 33%. The number who believed it would generate “some” incremental revenue stayed at 35%, and the number who believed it would generate “little or no” incremental revenue rose from 22% to 32%. You can see from this that a significant shift in attitude on 5G revenue has occurred, a general disillusionment on revenue prospects.

Why operators are more pessimistic is also interesting. The largest reason, cited by almost two-thirds of operators, is the failure of standards to develop quickly enough. Nearly as many (over half) said that 5G Core products were later than they had expected. Interestingly, less than a third said the reason was that 5G-specific applications were slow to develop. This set of attitudes is downright poisonous, because it demonstrates that a stubborn “Field of Dreams” supply-side mindset persists: it’s up to the market to figure out what to do with what we deploy. Ha!

How do operators believe they’ll get out of their current mess? The rough third that says there will be little or no incremental revenue, not surprisingly, doesn’t think there’s any getting out of it. They believe the hopes for 5G revenue were misplaced, the opportunity was over-hyped, or something along those lines. The rest of the operators are almost hoping that their vendors will have an answer.

As to where that answer might lie, there’s been another shift in thinking. Last year, well over two-thirds of operators thought that IoT would be the source of incremental 5G revenue, and that optimistic view is now held by only a third (mostly the group who still saw the potential for “significant” revenue). A bit over half now say that it will be network slicing that generates the revenue, and the remainder had no specific ideas.

I have to repeat my concern about what I see as a poisonous attitude here. There is no significant thought being given to what a service buyer would be doing with 5G, only to what 5G feature they would (hopefully) be doing it with. IoT is not an application; it’s a class of applications, and operators have always associated it with the use of cellular service for some sort of telemetry.

OK, operators have no idea what would generate new revenue, really. Are they clueless or disinterested in the continued decline in profit per bit? No; that’s still a top concern overall, and by far the biggest concern at the CxO level. They’re looking for cost reductions, but here again they’re surprisingly non-specific.

Almost exactly 40% of operators say that both capex and opex have to be better managed. The remainder are almost exactly balanced between saying they have to reduce capex (28%) or opex (32%). None of the groups had much in the way of specific plans to achieve their goal, and in many cases what they did have to say in a specific planning sense contradicted their overall views.

The most-cited strategy to reduce capex, by far, was open networking. At the same time, though, operators indicated less support for and interest in things like O-RAN. Last year, this was a top priority; this year it’s fallen to being interesting to a bit over half of operators. White boxes went from being interesting to 22% of operators to 49%, making them the big winner in terms of approach. Still, of that group of interested operators, only a third had a specific vendor partner they even remembered hearing about. That vendor, by the way, was DriveNets.

Lifecycle automation was the top focus for opex reduction, with almost 70% citing it as a goal. However, the interest in lifecycle automation seemed concentrated in the CIO organization, which owns the OSS/BSS space, with less than a quarter of network operations people saying it was a priority for their company. Only 11% of operators saw the need to integrate automation tools across the OSS/BSS/NMS boundary. OSS/BSS vendors were also the ones most often mentioned in this area, and the TMF was the body where most opex-focused operators saw critical work being done. All this points to a continued tendency for operators to fail to cross their own organizational boundaries in planning.

Most operators say that cross-organizational issues belong to the CTO office, but another interesting point from my data is that the CTO is losing influence. Five years ago, there was almost-universal support for the CTO organization in driving technology evolution within operators. This year, support has fallen by (and to) half, and the reason seems to go back to that disillusionment with standards that I mentioned earlier. Among CxOs in general, the view that “standards take too long and aren’t as useful as they should be” gets three times the positive response as the one that “we depend on standards for technology advance”. Today, operators say they depend on vendors instead.

How’s that working out? More contradictions, in short. Operators, you’ll recall, said they thought open-model networks would be their key to capex reduction, yet they had less commitment to open 5G. They say they’re depending on their vendors, but almost three-quarters of operators say their vendors aren’t supporting their goals, are trying to lock them in, are boosting their own bottom lines at the operators’ expense…you get the picture.

One result of all of this is a bit of flailing about with regard to new revenue opportunities. For example, 58% of operators say they think they have an edge computing opportunity, but only 30% plan to invest in carrier cloud. The rest hope to resell cloud provider edge services or partner with cloud providers for a share of the revenue. And, to make things even more complicated, the number of operators who believe that OTT players should either pay settlement for their use of the network or help pay for network infrastructure rose from 19% in 2020 to 55% this year.

In all the years I’ve surveyed operators on their planning focus for the fall cycle, I’ve never seen a time when things were as disorderly as they are this year. Operators are truly clueless, IMHO. They think their vendors are similarly clueless, manipulating them, or both. Standards are essential but they don’t work fast enough and are often not relevant. Profit per bit needs to increase, but somehow without either strategies to reduce cost or strategies to raise revenues.

There’s a golden opportunity here for vendors. The general view at this point is that budgets for 2023 will provide for only very minimal spending growth, but a solid business case could double that level of expansion. Operators aren’t providing that business case themselves. What’s lacking is leadership, the vision to help frame a path to a better future. Will somebody provide it? Operators want someone to do that, and even believe someone will, but they can’t name anyone who’s on the right track. Absent some specific motivation, I think budgets and plans will simply stay the course, but with proper motivation significant progress could be made, and money spent. All this adds up to either a very interesting or a very dull 2023 in the world of network operators and their infrastructure.

Comments Invited on the Podcast Approach

I’ve posted some podcasts on my TMT Advisor site, and I’m inviting followers and other interested parties to check them out for content, format, length, and so forth. You can comment on them through LinkedIn, message or email me, or whatever format you’re comfortable with. Remember that I’m trying to decide if the podcast approach can be helpful to you, and to me, in the longer term!

E Pluribus: What Arista Could Do with its New Technology

I’ve been an advocate of virtual networking for both enterprises and network operators, with a special focus on SD-WAN. What’s really important, though, is the “virtual networking” antecedent, and I think that a reunion of virtual network technology and SD-WAN is coming. The driver is likely the cloud, and the instrument of change may be Arista’s purchase of Pluribus Networks.

Pluribus, which is now being combined into Arista’s Converged Cloud Fabric, targets the unification of two important trends. First, cloud computing is really hybrid cloud (as IBM has been saying), and there will be (according to Arista) twice as much application deployment remaining in the data center as moving to (or, more properly, being created for) the cloud. Second, the data center needs to optimize virtualization, which means abstracting the hardware from the software, in order to create efficiency and agility. Networking is the glue that has to unite these trends because it extends both applications and resources across a wider zone, one that needs unified connectivity despite being administratively and technologically divided on a near-continuous basis.

Technically, this approach is based on SDN, but SDN using a fully distributed control plane rather than central (and, in larger installations, hierarchical) controllers. Each Pluribus device runs an instance of their OS, so the devices are presumed to be white boxes or endpoints that can run an instance as a process. The data center piece is based on the presumption that current top-of-rack switches will be subducted under the new Pluribus-equipped devices, which will then gate them into the new framework even though they’re not really SDN elements themselves. You can link these Pluribus switches across any underlay DCI technology to create a distributed virtual data center.
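To make the distributed-control-plane point concrete, here’s a minimal Python sketch of the general idea: every fabric node holds its own copy of the endpoint table and floods updates to its peers, so no central controller has to be asked “where is this workload?”. The class and method names here are purely illustrative assumptions of mine, not anything from Pluribus or Arista.

```python
# Hypothetical sketch of a distributed control plane: each fabric node
# keeps the full endpoint table locally and floods updates to peers,
# so there is no central SDN controller. Names are illustrative only.

class FabricNode:
    def __init__(self, name):
        self.name = name
        self.peers = []        # directly connected fabric nodes
        self.endpoints = {}    # endpoint -> (owning node, version)

    def connect(self, other):
        # model an underlay/DCI link between two fabric nodes
        self.peers.append(other)
        other.peers.append(self)

    def learn_endpoint(self, endpoint, version=1):
        # a locally attached workload appears; record it, then flood the update
        self.endpoints[endpoint] = (self.name, version)
        self._flood(endpoint, self.name, version)

    def _flood(self, endpoint, owner, version):
        for peer in self.peers:
            peer._receive(endpoint, owner, version)

    def _receive(self, endpoint, owner, version):
        current = self.endpoints.get(endpoint)
        if current is None or version > current[1]:
            self.endpoints[endpoint] = (owner, version)
            self._flood(endpoint, owner, version)  # re-flood only on new information

    def locate(self, endpoint):
        # any node can answer the location question from its own state
        return self.endpoints.get(endpoint, (None, None))[0]


if __name__ == "__main__":
    dc1, dc2, cloud = FabricNode("dc1-leaf"), FabricNode("dc2-leaf"), FabricNode("cloud-gw")
    dc1.connect(dc2)
    dc2.connect(cloud)                  # any underlay DCI link works in this model
    dc1.learn_endpoint("vm-10.0.0.5")
    print(cloud.locate("vm-10.0.0.5"))  # -> "dc1-leaf", with no controller consulted
```

The point of the sketch is only to show why such a fabric can span data centers and clouds without a controller hierarchy; the real product’s state distribution is obviously far more sophisticated.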

Public clouds can also be connected to this “cloud fabric”, and not just one but (in theory) all of them. This has the obvious benefit of making all hosting resources part of a unified network, which facilitates unified deployment, management, and monitoring. It would appear that the same capabilities could be used to extend the Converged Cloud Fabric to branch locations. You can see how this augments Arista’s own position in the market.

Further hints about an enhanced market position can be found in one of the solution categories, “Unified Cloud Fabric for Metro”. Aimed at network operators who want to shift away from the MPLS framework that dominates corporate VPNs, this solution is an indication that Arista is going to take a more serious run at the service provider space, and that it has its eyes on metro in particular. The question is whether they see the full potential of metro.

Converged Cloud Fabric at the data center level is a pretty decent foundation for operator-provided edge computing, which would likely be hosted in the metro. If you let CCF link in public cloud connections, it would also be a framework where an operator could supplement its in-area hosting with public cloud services offered out of area. A single foundation created by integrating all the hosting resources would simplify virtual-function deployment. You could even, in theory, link in the customer-owned edge facilities that cloud providers are working to claim as a part of their own cloud services.

Pluribus also takes a stab at what could be called “vertical integration”. Network vendors don’t typically talk about things like containers and Kubernetes, but Pluribus has made a number of announcements on the topic. They don’t supply container tools, but they do work to integrate and support them explicitly. That adds some credibility to their claims, and also helps engage the IT and CIO teams more broadly.
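For a sense of what that kind of integration can look like in practice, here’s a hedged Python sketch: a small script that watches Kubernetes pod events and hands them to a fabric-policy stub. The Kubernetes calls use the standard Python client; the fabric side (apply_fabric_policy) is an invented placeholder of mine, not a Pluribus or Arista interface.

```python
# Sketch of "explicit Kubernetes integration" for a fabric vendor:
# watch pod events in the cluster and map them to fabric connectivity
# policy. The kubernetes client calls are real; apply_fabric_policy is
# a hypothetical stub standing in for whatever the fabric exposes.

from kubernetes import client, config, watch

def apply_fabric_policy(pod_name, namespace, node, pod_ip):
    # Placeholder: a real integration would program the fabric here,
    # e.g. attach the pod's address to the right virtual network segment.
    print(f"fabric: map {namespace}/{pod_name} ({pod_ip}) on node {node}")

def main():
    config.load_kube_config()  # or load_incluster_config() when running in a pod
    v1 = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(v1.list_pod_for_all_namespaces, timeout_seconds=60):
        pod = event["object"]
        if event["type"] in ("ADDED", "MODIFIED") and pod.status.pod_ip:
            apply_fabric_policy(pod.metadata.name,
                                pod.metadata.namespace,
                                pod.spec.node_name,
                                pod.status.pod_ip)

if __name__ == "__main__":
    main()
```

Whether a vendor does this through a watcher, a CNI plugin, or an operator is a design choice; the credibility comes from doing it explicitly rather than leaving the mapping to the customer.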

If we broaden our thinking on vertical integration to areas where Pluribus/Arista partnerships might be a play, we can’t help but notice two conspicuous ones, Ericsson and Broadcom. Ericsson is obviously a 5G and network operator vendor, and Broadcom, of course, just acquired VMware. Arista is already a Broadcom (and VMware) partner, as well as a partner to a broader range of software companies, but it is not an Ericsson partner, so how that particular relationship will play out is still a question mark.

That could be good or bad with regard to what I think is the Big Question here: whether Arista/Pluribus intends to really make a run at the metro space. There is little opportunity for anyone other than a mobile equipment giant or a big router/optical vendor to do much for network operators beyond metro, but metro fuses hosting and networking, hardware and software. It’s also where whatever value operators hope to add to services in the future will have to be connected. There’s no hotter spot in operator infrastructure than metro, so a metro thrust by Arista could be big news.

It would put them in direct collision with Juniper, whose Cloud Metro story is the only other explicit positioning in the metro infrastructure space. Without Pluribus, Arista would have no real claim to the metro space at all, and even with it, Juniper probably has an overall technical edge. Neither firm has really solidified its metro positioning to match market requirements at this point, and of course Arista/Pluribus positioning is a work in progress, so it’s very difficult to say whether one or the other has a natural advantage. Juniper has broader tech reach and a really good AI story, plus carrier incumbency. Arista has more software-centricity and a bit more of a cloud integration story. Throw the dice here, for now.

The biggest question of all, though, is whether virtual networking will get an overall boost here. The current Pluribus metro story is really about an MPLS alternative, which means an SD-WAN alternative too. However, their material isn’t really strong in positioning themselves within a branch or inside a cloud, places where Juniper (via 128 Technology) has very strong technology. It appears that the Pluribus virtual-SDN approach could add connection-level controls, which Juniper also gets with 128T. Juniper has Contrail, but they’ve not pushed it at the same targets Pluribus has worked to cover.

It’s really difficult to say whether Arista could mount a serious run at metro infrastructure. Juniper’s AI explicitly targets operations efficiency and improved cost per bit, and their greatest vulnerability is a lack of linkage between metro networks and carrier cloud. However, the operators themselves aren’t convinced they’d deploy carrier cloud, and Arista’s story on that linkage needs a lot of refinement too. What’s easy to say is that Pluribus gives Arista a shot they didn’t have before, and the company is no shrinking violet when it comes to taking advantage of new situations.