What a “Service” Means is Changing

Whether we accept the concept of the semantic web, or Web3, or the metaverse, or even fall back on lower-level things like cable’s Distributed Access Architecture (DAA), the signs are pointing toward an expansion of what we believe makes up networks in general, and the Internet in particular. That’s already been going on behind the scenes, driven largely by the dramatic shift in traffic toward video, but it may now explode. If that happens, it may also stress a lot of the technology architectures and initiatives we’ve been dabbling with.

“Networks”, strictly speaking, are about connecting things, but the network that dominates the world today, the Internet, has been more than that from the start. Yes, the very early Internet included connectivity features, in no small part because we didn’t have a general model of connectivity, but it also had what we’d call “higher-layer services”. Things like terminal access (Telnet), email, and file exchange came first, then eventually the World Wide Web, then calling, messaging, and a host of other stuff. Today we know the Internet more as a collection of services than as a network.

What’s particularly interesting about this point is that the “services” of the Internet are, first, on the Internet rather than in it, and second, hosted rather than embedded in devices like routers. We could argue that we don’t need the goal of creating virtual network functions, hosted software components that replace network devices, to validate the hosting of features at all; we already host features, and most users see those features as “the Internet”.

One of the questions this point raises is the longer-term relationship between “networks” and “network services”, both in a business sense and technically. That’s a reprise of a decades-old “smart versus dumb networks” debate, but it’s really not about smart networks as much as service-integrated networks. The network itself is still connectivity-driven, and the services are even more likely to be on the network, consuming it just as end users do. What’s not clear is whether some of the services are provided with, and integrated with, the network.

CDN services, meaning content delivery networks, are “technically integrated” with the Internet. Content delivery is the primary mission of the Internet, and without CDNs it’s very possible that it couldn’t be done economically while still maintaining QoE. However, the great majority of CDN services are offered by third parties, so there is no business-level connection to the ISPs who combine to provide connectivity. Other “services” are really just destinations, like Facebook or TikTok, and this is what ISPs like the telcos mean when they say that they’re “disintermediated”. Services are what people consume, and others have inserted themselves between users and the telcos/ISPs as the providers of those visible services.

This situation is at the bottom of the debate on what telcos should do to gain revenue. Do they add “services” that relate to their connection facilitation incumbency, or do they add “services” of the kind that end users are actually, directly, consuming? There are people on both sides of that question, and some in the middle.

What would that middle ground look like? Look for a moment at VoIP. If we expect VoIP to offer the service of universal voice connectivity, which of course is what plain old telephone service (POTS) offers, then we have to be able to interwork with POTS and among VoIP implementations. If a future service needs “universality” then it’s very possible that it would also need some foundation elements that were both standardized and made a part of “the network”. This is what AT&T has proposed in its discussions about creating facilitating elements that would help advance higher-level services. The notion of creating facilitating services that are both technically and business-integrated with the network raises a number of questions.

One that crosses between the technical and business domains is how these services are defined. If an operator like AT&T creates a set of these facilitating services, and competitors like Verizon follow suit, would the two be compatible? If they are not, then those who create higher-level services built on the facilitation features would have to adapt to different features and interfaces across facilitation providers. Since it’s hard to see what higher-level services wouldn’t be offered across a wide geography, that could pose a major threat to the utility and growth of the higher-level services.

But who then standardizes? Telco standards bodies have a long history of glacial progress and ignorance of modern software principles; look at the 3GPP and 5G. Industry forums like O-RAN, often launched to address shortcomings in formal standards, may move more quickly, but there’s always the risk that several of them will emerge, and also the risk that, since they tend to involve the same players as the formal standards bodies, they’ll fall prey to the same issues.

The IETF may be a useful compromise here. It’s not an open-source group, but it has a long tradition of expecting running code behind proposed changes and additions, and the fact is that IP is the transport framework for pretty much everything these days. The potential breadth of IETF influence, though, combines with attempts by other bodies like the 3GPP to create collisions. For example, we all know that the 3GPP, in 5G Core, standardizes network slicing and thus implicitly defines core network traffic segregation by service type. The IETF has an initiative with the same goal, and IMHO the IETF is a better place to deal with the issue, since facilities like this should be generalized across all services, including the Internet. We will likely see more of these collisions develop, since “standards” and “forums” increasingly behave like competing vendors.

Regardless of how the issue of standardization for facilitating services plays out, there’s another technical point to consider, one that came up almost two decades ago in another forum. When you build a service based on a component from some other player, how do you use it without gaining control over it, without getting visibility into someone else’s infrastructure?

In theory, this problem could arise even with connection services like VPNs, if the service were created from pieces of service supplied by multiple providers, through something providers have sometimes called “federation”. My work with ExperiaSphere addressed this with a proxy, a model element that represented a service built on a non-owned resource. The proxy was used by the “service operator” to build the overall service, but it presented only the data and control features that the “owning operator” was willing to expose. In effect, it was in my model the “top” of a resource domain model, though it could resolve within the owning operator to any combination of service and resource elements.
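
To make the proxy idea concrete, here’s a minimal sketch in Python of how such an element might look. The class and field names are hypothetical and not drawn from the actual ExperiaSphere implementation; the point is simply that the owning operator decides what crosses the boundary, and everything else is invisible to the service operator.

```python
# Hypothetical sketch of a federation "proxy" model element. The owning
# operator exposes only the data and control features it is willing to share;
# the service operator composes services against that restricted view.

class OwnedService:
    """The owning operator's real service element (full internal visibility)."""
    def __init__(self):
        self._state = {"status": "active", "latency_ms": 11.2,
                       "packet_loss_pct": 0.02,
                       "internal_route": "agg3->core7"}  # never exposed

    def read(self, key):
        return self._state[key]

    def control(self, action):
        # Real control logic would live here.
        return f"executed {action}"


class FederationProxy:
    """Represents the non-owned service in the service operator's model.
    Only whitelisted parameters and actions are visible or usable."""
    EXPOSED_PARAMS = {"status", "latency_ms", "packet_loss_pct"}
    EXPOSED_ACTIONS = {"redeploy", "report_sla"}

    def __init__(self, owned: OwnedService):
        self._owned = owned

    def read(self, key):
        if key not in self.EXPOSED_PARAMS:
            raise PermissionError(f"'{key}' is not exposed by the owner")
        return self._owned.read(key)

    def control(self, action):
        if action not in self.EXPOSED_ACTIONS:
            raise PermissionError(f"'{action}' is not an exposed control")
        return self._owned.control(action)


if __name__ == "__main__":
    proxy = FederationProxy(OwnedService())
    print(proxy.read("latency_ms"))    # allowed: 11.2
    print(proxy.control("redeploy"))   # allowed
    try:
        proxy.read("internal_route")   # blocked: the owner's infrastructure
    except PermissionError as e:
        print(e)
```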

I think that proxy elements are essential in supporting facilitating services, but I’m not seeing much recognition of that among operators, nor much interest in figuring out how one would be built and used. One reason might be that operators like AT&T are seeing facilitating services as specialized for third-party use, and presume that interfaces and APIs would be designed to be shared with others. At this point, it’s hard to say whether that would stand up to actual broad-scale implementation, particularly if the role the facilitating service played had to be “composed” into multiple services, from multiple sources.

IETF network slices may be the best example of what could be considered a “facilitating service” defined by a standards body and thus, presumably, available across multiple providers. There is no question that IETF network slices are facilitating, explicitly and implicitly, and also no question that they’re credible because they fall within the scope of an accepted standards body that currently defines cross-provider technology standards. I believe that IETF network slices also fall into the category of “lower-level” facilitating services.

My personal frustration with this point is that we’re now talking about initiatives that magnify the need for standardized federation of service elements, when the topic was raised almost two decades ago and has been largely ignored in the interim. It would be easier for us to make progress on network services if we accepted or at least explored past issues to be sure that they’re either resolved or have been deemed irrelevant. Otherwise we may waste a lot of effort redoing what’s been largely done before.

Taking the Measure of the Cloud

Just where is the cloud, as an element of IT? We hear a lot of stuff about it, but it’s hard these days to rely much on what we hear. Is the cloud really dominating IT, is it really going to eat the data center? What can we expect from cloud providers? All this stuff is intermeshed, so let’s try to un-mesh and understand it.

First, Wall Street data shows pretty clearly that public cloud spending growth peaked in 2020 and has declined ever since, despite the expectations/predictions that COVID and WFH would drive it up. Yes, cloud spending is still up, but its growth rate is falling, and it isn’t gaining as a share of total IT spending.

Second, enterprises have told me from the first that the statements that they’re spending more and more on the cloud are often misinterpreted. Yes, cloud spending has been up, but not because the cloud is replacing data center IT. What’s happening is that the focus of IT investment shifted in 2018, toward enhancing the user experience associated with data center applications and information. This shift was supported by the creation of cloud front-ends for legacy applications. The number of enterprises who tell me that they have reduced spending on those legacy applications and their data center hosting of them is squeaking along at the edge of statistical significance.

Third, a decent portion of cloud spending growth can be attributed to startups and social-media or content players rather than enterprises. This is particularly true for Amazon, who tends to get the majority of that business. This means that the fact that Microsoft and Google are not closing the gap with Amazon is less due to Amazon’s broad success with its cloud, and more to the fact that Amazon’s gains in the startup/social/content space have tended to balance Microsoft’s growth in the enterprise and the gradual improvement in Google’s overall cloud position.

What we can take away from all of this is interesting, in part because it seems to fly in the face of the broad vision of the cloud’s direction.

First, the cloud is almost mature as a target of enterprise investment. It offered a very different model of application hosting, one well-suited to the front-end mission I’ve described. That prompted enterprise investment growth, but the need for front-ends is finite, and it’s being exhausted. Any new capability tends to open new opportunities, ones held back by the facilities and tools that existed before that new thing came along. But all opportunities tend to get addressed, and when that happens, the business case for spending more is hard to make.

Second, cloud providers know what’s happening. Why would we be reading about how Amazon is working to help users migrate mainframe workloads to the cloud? Does anyone really believe that 1) there’s a lot of money to be made there given the limited population of mainframes, 2) that even those with mainframes are eager to get rid of them given that they’ve kept them through the whole of the server-pool revolution, and 3) that those who are eager would go to somebody other than IBM? Cloud providers have been telling us, the media, and Wall Street that “everything is moving to the cloud.” Facing some quantitative indicators of slowing growth, they need to make some noise about new targets to be moved.

Third, our perception of the cloud is shaped by what’s being done even if we don’t really understand what that is. Application development trends are shaped by the applications being developed, which as I’ve noted are front-ends to legacy data center apps. The requirements for a user front-end are very different from those of a mission-critical app, and we’re hearing things like “serverless”, “microservice”, and “cloud-native” from the perspective of what’s being developed. The stuff that frames the core business applications of today, in almost any vertical, is underrepresented in the dialog because it isn’t being changed much in comparison. Thus, we’re left with the view that all of software design and development is changing to match the cloud. The cloud is influencing development overall, but not transforming the way all software is done.

Finally, we are misunderstanding and mishandling the application of cloud technology to things like networking, because we believe in that universal, systemic shift in application technology. Containers, or cloud-native, or virtual machines, or even server-hosted virtual functions, are not the universal future of networking. The majority of network traffic will be pushed by specialized devices in the future, just as it has been in the past and is today. Because all of this isn’t widely understood or accepted, efforts to create a harmony between hosted and appliance-based elements of a network have fallen short. We are underthinking the real problems because we’ve overestimated the extent to which “everything is moving to the cloud”.

The core value of almost every vertical market is tied up in traditional transaction processing and database analysis that is largely the same as it’s been for a very long time. We are tweaking how workers, customers, suppliers, and partners interact with that traditional core. The core value of networking is still the packet-pushing at the data plane. We are now able to tweak the way that data plane relates to the community of network users and the information they’re seeking. That mission is trivial in one sense; it doesn’t alter the basic core value proposition. In another sense it’s profound, because how applications and services relate to their users is not only valuable, it’s been one of the fundamental drivers of computing and networking change.

We’ve missed something important here, which is that the cloud is a symptom of the fundamental change to information and network technology, not the change itself. The change is virtualization. Virtualization is all about disconnecting software from hardware, through the use of a representational abstraction. This disconnect lets us pool resources, move functionality around without rewiring things, scale up and down with load/demand, and so forth. The fact that it’s a high-value concept is demonstrated in both the scope and speed of its acceptance.

Abstractions are hard to deal with, and we live in an age of instant gratification. Forget understanding the problem, just tell me the right answer. That’s understandable in one sense, but in another sense it’s the problem itself. We can only optimize the cloud, the edge, the network, if we address what’s transforming it and stop mouthing platitudes. I think that starts by considering the notion of a “service”, the thing that users, people, are trying to interact with. I’ll start a short series on that topic with the next blog.

Metaverse Standards Group?

The metaverse took a step (in some direction or another) with the formation of a standards body called “The Metaverse Standards Forum” (a more technical view can be found HERE). Meta is unsurprisingly initiating the move, with Microsoft support, and other tech giants like Adobe and Nvidia have joined. The body offers the potential benefit of broad engagement, essential if the pieces of the metaverse ecosystem are to be made available. It also creates risks, some immediately obvious and some more subtle. Let’s look at both.

According to the prevalent definition, the metaverse concept is a virtual-reality framework within which people interact through avatars. I’ve proposed that the term be extended to cover “metaverse-of-things” applications which use virtual reality concepts to model real-world processes and systems. Whatever definition you like, it’s clear that metaverse success depends on creating an immersive VR experience that’s tightly and accurately coupled to the people, processes, and things it represents.

Meta, as I’ve noted in the past, likely has the resources to do, or induce, the necessary things, but that would surely exacerbate political and antitrust pressure. The notion of a standards group is smart, because it defuses any claims that Meta is pushing proprietary technology to close off the metaverse market. Even if many companies decline to join (Apple and Google have done that already), the offer is still a powerful tool against complaints or legal action.

Meta, of course, is still primus inter pares, as they apparently said once in Rome (first among equals), because they represent the largest, most credible early metaverse deployment and the one most likely to become the centerpiece of the whole movement. That allows them to frame a model that at least meets their social-metaverse requirements. Many other players with goals that are limited to pieces of metaverse technology or to peripheral metaverse applications will be happy to fall in line. Most who aren’t will likely follow Apple and Google and stay out.

Truth be told, the arguments Wall Street favors for treating Apple’s failure to join as a problem seem specious to me. The fact that an Apple VR headset would compete with one from Meta is an issue only if we assume that broad VR success wouldn’t build a market large enough for both players to benefit. However, it does raise the question of the value of openness, a question Apple and Meta both have every reason to think about.

When personal computers first emerged, there was an explosion of different proprietary models, including Apple’s. IBM introduced its PC in the early ‘80s, taking the unusual step of not patenting the architecture. The IBM PC exploded in corporate personal computing, and other companies created compatible systems to take advantage of the growing software base for IBM’s system. The “PC” won out in the market.

Apple created a closed system, with a proprietary architecture they defended against copycats like Franklin. They didn’t win the architecture battle, but there are still Apple PCs sold today and IBM doesn’t sell one any longer. Apple won the Street-value war.

The first question that the metaverse standards concept creates is whether Meta is trying to build a PC or a Mac, whether they’re actually trying to create standards and an ecosystem, or trying to feather their own nest more effectively. We’re not going to know that for a while, since the answer will likely come from the specific direction the body takes.

Which generates the second question, which is “how long is a while?” Standards processes have historically been boat anchors on progress. A camel, so it’s said, is a horse designed by committee, which reflects the challenge of consensus, but there’s an equally important question of how long it takes to create the camel, and what’s been going on in the transport-animal space while the process was advancing. If metaverse standards take a while, then they either stall market progress or they’re bypassed by reality.

A better question for the Street to ask regarding the metaverse standard is whether Meta intends it as a means of advancing an open metaverse architecture, or freezing many players in place while they develop their own approach to maturity, in private. It seems to me that’s one of only two possible Meta-motives, the other being that they really want to accelerate things and will pay nearly any price to advance the open metaverse model quickly. Which is it?

The latter, I think. Meta needs the metaverse because the social-media space is fickle. How many alternatives have evolved since Facebook? The value of social media is social, and social value is transient. Meta needs a next big thing that it has anticipated, and whose architecture it will play a major role in defining, because the alternative is another startup that stumbles on the Next Big Social Thing.

The risk, of course, is that Meta manages to define open metaverse and then manages not to profit from it in the longer term. IBM PC, remember? The problem is that not defining an open metaverse may not be a good choice either, because Meta had to raise the concept to keep Wall Street happy, and having done that they’ve created the risk of a competitor. The metaverse standards process at least gives Meta an opportunity to shape the metaverse architecture, and that may be key.

Because? Because, in part, of my “metaverse-of-things” notion. As I said in the two-faced definition at the start of this blog, we could view the social metaverse as a subset of the broader metaverse, one based on digital-twinning real-world elements and then using the digital twin to do something useful. MoT could revolutionize IoT, for example. The same is true for collaboration and worker productivity enhancement. I’ve argued that it would be relatively easy for a big player to define an MoT architecture, and such an architecture would have to be broad and open to be useful because of the scope of things to digital-twin. Digitally twinning humans for social interaction would surely fall within MoT scope. If social metaverses are just an MoT application, then Meta has far less chance of controlling how they’re built and how they evolve.

One reason for that is that MoT is likely to focus on “local” metaverses, representing processes, industrial facilities, warehouses, and similar things. Many of the challenges of Meta’s social-metaverse vision, like highly distributed users expecting realistic relationships with their avatars, aren’t significant in these limited-scope missions. Since they’re difficult to address, there’s an incentive for MoT work to conveniently set them aside to make progress. That might then impact Meta’s social-metaverse ambitions.

The real question, then, may be whether a metaverse standard spawned by a social-media giant will support the general MoT model. If it does, then Meta may play an enormous role in defining the whole spectrum of MoT. If it does not, then might Meta’s initiative pave the way for that hypothetical big player to launch an MoT standards group? Might that make it a lot easier for competing social-media metaverses to develop? Wouldn’t Meta support for MoT within its metaverse standard end up doing the same thing?

Meta has set a difficult course for itself here, I think. They almost have to take the largest possible slice out of the metaverse opportunity to ensure that they don’t lose control, but doing that may facilitate market entry by competitors. That means Meta has to drive its own metaverse investment and progress at breathtaking speed, or they’ve blown the whole opportunity. Are they smart enough to see that? We may find out as early as this year.

Some Top-Down Service Lifecycle Modeling and Orchestration Commentary

Most of you know that I’ve been an advocate of intent-modeled, model-driven networking for almost two decades. This approach would divide a service into functional/deployable elements, each represented by a “black box” intent model responsible for meeting an SLA. This approach has some major advantages, in my view, and also a few complications. Operators generally agree with this view.

I believe that a model-based approach, one aligned with seminal work done by the TMF on what was called “NGOSS Contract”, is the only way to make service lifecycle automation, service creation, and service-to-resource virtualization mapping, work. I’ve laid out some of my thoughts on this before, but some operators have told me they’d like a somewhat more top-down view, but one that’s aligned with current infrastructure and service realities. This is my attempt to respond.

The first and perhaps paramount factor is the relationship between functional division and “administrative control”. For the sake of discussion, let’s say that an API through which it’s possible to exercise operational control over a set of service elements is an administrative control point. Let’s say that such a control point would let an operations process create multiple “functions”, which are visible features that can be coerced through the control point. Obviously, this control point could represent a current management system, or a new management tool.
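
Here’s a rough sketch, with made-up names, of what an administrative control point might look like as an API. It isn’t any particular vendor’s management interface; it just shows the idea of a single point through which visible functions can be coerced, whether the thing behind it is a legacy EMS or a new tool.

```python
# Illustrative sketch (hypothetical names) of an "administrative control
# point": one API through which an operations process can coerce a set of
# visible features ("functions") without knowing how they're implemented.

from abc import ABC, abstractmethod

class AdminControlPoint(ABC):
    @abstractmethod
    def list_functions(self) -> list[str]:
        """The visible features this control point can coerce."""

    @abstractmethod
    def coerce(self, function: str, **params) -> dict:
        """Ask the control point to drive a function toward a target state."""

class LegacyEmsControlPoint(AdminControlPoint):
    """Wraps a current management system; a new tool would look the same."""
    def list_functions(self):
        return ["vpn-connectivity", "qos-policy", "port-activation"]

    def coerce(self, function, **params):
        # In reality this would call the EMS/NMS API; here we just echo.
        return {"function": function, "requested": params, "status": "accepted"}

if __name__ == "__main__":
    cp = LegacyEmsControlPoint()
    print(cp.list_functions())
    print(cp.coerce("vpn-connectivity", sites=["1", "2", "3"], mbps=20))
```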

My approach to modeling is also based on a division between the logical/functional and the actual/behavioral, meaning that it has a “service domain” and a “resource domain”. The former expresses the relationship between features and services, and the latter expresses how features map to the behavior of actual resources, including both software and hardware. The “bottom” of the service model has to “bind” to the “top” of the resource model in some way, and that binding is the basis for “deployment”. Once deployment has been completed, the fulfillment of the service-level agreement is based on the enforcement of the SLAs passed down the models (service and resource).

The service-level top, which creates the actual overall SLA, has to “decompose” that SLA into subordinate SLAs for its child model elements, and each of them in turn must do the same. Each model element represents a kind of contract to meet its derived SLA. The model element relies on one of two things to enforce the contract—a notice from a child element that it has failed its own SLA (and by implication, no notice means it hasn’t) or, for a resource model, an event from a resource within that shows a fault that must either be remedied or reported.
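
A simple illustration of that decomposition and fault-reporting chain, with hypothetical names and a deliberately naive way of splitting the SLA, might look like this:

```python
# Hedged sketch of SLA decomposition: a parent model element splits its SLA
# into child SLAs and relies on child failure notices (no notice means no
# failure) to know when its own "contract" is at risk. Names are illustrative.

class ModelElement:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent:
            parent.children.append(self)
        self.sla = None

    def set_sla(self, sla: dict):
        self.sla = sla
        for child, share in self.decompose(sla):
            child.set_sla(share)

    def decompose(self, sla):
        """Split the SLA among children; here, a naive equal latency split."""
        n = len(self.children) or 1
        return [(c, {**sla, "latency_ms": sla["latency_ms"] / n})
                for c in self.children]

    def report_sla_failure(self, child, detail):
        # A child has failed its derived SLA; remedy locally or escalate.
        print(f"{self.name}: child {child.name} failed SLA ({detail})")
        if not self.try_remedy(child) and self.parent:
            self.parent.report_sla_failure(self, detail)

    def try_remedy(self, child):
        # Placeholder: e.g. re-select another implementation for the child.
        return False


if __name__ == "__main__":
    service = ModelElement("vpn-service")
    east = ModelElement("east-region", parent=service)
    west = ModelElement("west-region", parent=service)
    service.set_sla({"bandwidth_mbps": 20, "latency_ms": 40})
    service.report_sla_failure(west, "latency 55ms > 20ms budget")
```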

The binding between service model and resource model is based on supported resource “behaviors”, which are functionalities or capabilities that the resource administration has committed to support. The resource model’s behaviors would then be divided based on the administrative control points through which each behavior was controlled. There might be one such point, or there might be many, depending on just how the collective set of resources was managed.

The reason for the domain division is to loosely couple services to resources. With this approach, it would be possible to create a service model that consumed a behavior set that could then be realized by any combination of resource providers, without modification. Anyone could author “service models” to fit a set of behaviors and anyone would be able to advertise those behaviors and thus support the services. This could be used to focus standards activities on behavior definition.

Another feature of this approach is “functional equivalence”. Any implementation of a behavior could be mapped to a service element that consumed it, which means that you could deploy both features based on network behaviors—router networks—and features created by hosting software instances. In fact, you could even have something based on a totally manual process, so that if actual field provisioning of something was needed, you could reflect that in the way a behavior was implemented.
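
A quick sketch of functional equivalence, using invented class names: three very different realizations of the same “VPN” behavior, all interchangeable from the consuming service element’s point of view.

```python
# Sketch of "functional equivalence" (illustrative names): the same behavior
# can be realized by coercing a router network, by deploying hosted software
# instances, or even by dispatching a manual field task.

from abc import ABC, abstractmethod

class VpnBehavior(ABC):
    @abstractmethod
    def realize(self, sites, sla): ...

class RouterNetworkVpn(VpnBehavior):
    def realize(self, sites, sla):
        return f"provisioned VPN across {sites} via the existing router network"

class HostedInstanceVpn(VpnBehavior):
    def realize(self, sites, sla):
        return f"deployed virtual router instances for {sites} on hosted resources"

class ManualProvisionVpn(VpnBehavior):
    def realize(self, sites, sla):
        return f"opened a field work order to cable and provision {sites}"

def deploy(behavior: VpnBehavior, sites, sla):
    # The service element only knows it asked for the behavior and an SLA.
    print(behavior.realize(sites, sla))

if __name__ == "__main__":
    for impl in (RouterNetworkVpn(), HostedInstanceVpn(), ManualProvisionVpn()):
        deploy(impl, ["site-1", "site-2"], {"mbps": 20})
```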

To return to SLAs, each model element in both domains has the common responsibility to either meet the SLA it commits to, to remedy its performance within the SLA, or report an SLA failure. During service creation, or “deployment/redeployment”, each model element has a responsibility to select a child element that can meet its SLA requirements on deployment, and to re-select if the previously selected element reports a failure. The SLA would necessarily include three things—the service parameters expected, the SLA terms, and the location(s) where the connections to the element were made and where the SLA would be expected to be enforced. “I need a VPN with 20 Mbps capacity, latency x, packet loss y, at locations 1, 2, and 3”. That “contract” would then be offered to child elements or translated into resource control parameters and actions.
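
That three-part contract could be represented as a simple data structure. The field names below are purely illustrative, not a proposed standard, and the values come from the VPN example just given.

```python
# A sketch of the three-part "contract": service parameters, SLA terms, and
# the locations where connections are made and the SLA is to be enforced.

from dataclasses import dataclass, field

@dataclass
class SlaTerms:
    bandwidth_mbps: float
    latency_ms: float
    packet_loss_pct: float

@dataclass
class ElementContract:
    service_params: dict          # what the element is expected to provide
    sla: SlaTerms                 # the terms it must meet or report against
    locations: list = field(default_factory=list)  # where the SLA applies

vpn_contract = ElementContract(
    service_params={"service": "vpn", "topology": "any-to-any"},
    sla=SlaTerms(bandwidth_mbps=20, latency_ms=40, packet_loss_pct=0.1),
    locations=["location-1", "location-2", "location-3"],
)

# This contract would be offered to child model elements, or translated into
# resource control parameters and actions at the binding to the resource domain.
print(vpn_contract)
```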

At one level, this sounds complicated. If, for example, we had a service model that contained a dozen elements, we would have a dozen management processes “running” during the lifecycle of the service. If the resource model contained a half-dozen elements, there would then be 18 such processes. Some could argue that this is a lot of management activity, and it is.

But what is the actual process? It’s a state/event table or graph that references what are almost surely microservices or functions that run only when an event is recognized. A service or resource “architect” who builds the model would either build or identify each process referenced. Many of the processes would be common across all models, particularly in the service domain where deployment/redeployment is based on common factors. I’ve actually built “services” based on this approach, and process creation wasn’t a big deal, but some might think it’s an issue.
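
Here’s a toy version of such a state/event table, with plain functions standing in for the microservices the cells would reference. The states and events are invented for illustration.

```python
# A hedged sketch of a state/event table: each cell names a small process that
# runs only when the matching event arrives in the matching state.

def start_deployment(elem):
    print(f"{elem}: deploying")
    return "deploying"

def confirm_active(elem):
    print(f"{elem}: active")
    return "active"

def redeploy(elem):
    print(f"{elem}: redeploying after child failure")
    return "deploying"

def report_fault(elem):
    print(f"{elem}: reporting fault upward")
    return "failed"

STATE_EVENT_TABLE = {
    ("ordered",   "activate"):     start_deployment,
    ("deploying", "deploy-ok"):    confirm_active,
    ("active",    "child-failed"): redeploy,
    ("deploying", "deploy-fail"):  report_fault,
}

def handle_event(element, state, event):
    process = STATE_EVENT_TABLE.get((state, event))
    if process is None:
        print(f"{element}: event '{event}' ignored in state '{state}'")
        return state
    return process(element)

if __name__ == "__main__":
    s = handle_event("vpn-core", "ordered", "activate")   # -> deploying
    s = handle_event("vpn-core", s, "deploy-ok")          # -> active
    s = handle_event("vpn-core", s, "child-failed")       # -> deploying
```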

The upside of this approach is that each model element is essentially an autonomous application that’s event-driven and that can be run anywhere. An event-handling process is needed to receive events, consult the state/event reference, and activate the designated process, but even that “process” could be a process set with an instance in multiple places. In my own test implementation, I had a single service-domain process “owned” by the seller of the service, and resource-domain processes “owned” by each administrative domain who offered behaviors. This is possible because my presumption was (and is) that model elements could generate events only to their own parent or child elements.

The place where special considerations are needed is the binding point between the domains. A bottom-level service model element has to exchange events with the top-level behavior element in the resource domain. In my implementation, the binding process was separate and provided for the event exchange between what were two different event queues.
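
A minimal sketch of that binding, assuming two simple in-memory queues (a real implementation would obviously use whatever event transport the two domains already have):

```python
# Illustrative binding process: a small relay that moves events between the
# service-domain queue (owned by the service seller) and the resource-domain
# queue (owned by the administrative domain), so a bottom-level service element
# and a top-level behavior can exchange events without sharing a handler.

import queue

service_q = queue.Queue()    # seller's event queue
resource_q = queue.Queue()   # administrative domain's event queue

def relay(src, dst, tag):
    """Forward everything currently pending on src to dst, tagging the origin."""
    for _ in range(src.qsize()):
        evt = src.get()
        dst.put({"from": tag, **evt})

if __name__ == "__main__":
    service_q.put({"event": "deploy", "target": "vpn-behavior"})
    relay(service_q, resource_q, "service-domain")   # service -> resource
    print(resource_q.get())                          # resource domain sees it

    resource_q.put({"event": "sla-violation", "detail": "latency 55ms"})
    relay(resource_q, service_q, "resource-domain")  # resource -> service
    print(service_q.get())                           # service domain sees it
```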

This approach also preserves the lower-level management processes in the resource domain, processes that are likely already in place. All that’s needed is to wrap the API of an administrative control point in an intent model (a resource model element) that can coerce behaviors, and you can then “advertise” those behaviors to bind with services. This is possible at any level, and at multiple levels, meaning that if there is some over-arching service management system in place, that system could advertise behaviors on behalf of what it controls, and so could any lower-level control APIs.
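
To illustrate, here’s a hedged sketch of wrapping an existing management system in an intent-model element that advertises behaviors. The EMS and its API are stand-ins I invented; nothing here is a real product interface.

```python
# Sketch of wrapping an existing management API in an intent-model resource
# element that "advertises" behaviors. Only the behavior and its SLA-relevant
# capabilities cross the wrapper's boundary; the EMS details stay hidden.

class ExistingEms:
    """Stand-in for a management system already in place."""
    def create_tunnel(self, a, b, mbps):
        return f"tunnel {a}<->{b} at {mbps}Mbps created"

class BehaviorWrapper:
    """Intent-model element around an administrative control point."""
    def __init__(self, ems):
        self._ems = ems

    def advertise(self):
        # What the outside world is allowed to know: behaviors and capability ranges.
        return [{"behavior": "point-to-point-connectivity",
                 "max_mbps": 100, "regions": ["metro-east", "metro-west"]}]

    def coerce(self, behavior, **params):
        if behavior != "point-to-point-connectivity":
            raise ValueError("behavior not advertised by this wrapper")
        # Translate the intent into the EMS's own API.
        return self._ems.create_tunnel(params["a"], params["b"], params["mbps"])

if __name__ == "__main__":
    wrapper = BehaviorWrapper(ExistingEms())
    print(wrapper.advertise())
    print(wrapper.coerce("point-to-point-connectivity",
                         a="site-1", b="site-2", mbps=20))
```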

For those who wonder about ONAP and my negative views on it, I’m hoping this blog helps explain my objections. ONAP is, IMHO, a telco view of how cloud-centric service lifecycle management would work. It’s monolithic and it doesn’t address many of the issues I’ve noted because it doesn’t rely on intent modeling or service/resource models based on intent models. I don’t think that ONAP was architected correctly at the start, I don’t believe they want to fix it (any more than the NFV ISG really wants to fix NFV), and I don’t believe it could be fixed even if they wanted to, without starting over.

I’m not saying that my approach is the only one that would work, just that I believe it would work and that I’ve done some proof-of-concept development to prove out the major points. I’d love to see some vendor or body take up the issue from the top down, as I have, and I’d be happy to chat with a group that takes on that responsibility.

Will Crypto-Crashes also Crash Web3 and the Metaverse?

Crypto has surely had its problems recently, even major problems. Meta has been under pressure too, and there are signs that the Web3 hype wave has already crested. Is there a connection between these events, either in the sense that there’s a common underlying issue, or in the sense that one problem area might feed concerns in other areas? Maybe, but there are both technical factors and other factors linking these market pieces.

Let’s start with crypto, which IMHO is a kind of do-it-yourself bubble. In the world of investments, nearly everything that’s commonly traded is both regulated and backed by some “intrinsic” value. If you buy a share of stock, you are buying a piece of the ownership of the company the stock represents. If you buy a bond, you’re buying a promise to pay back, plus interest, from a specific player. Currency is backed by the country that issues it. The fact that there is real underlying value to an investment means that there’s a brake on just how much it can be hyped; the relationship between the underlying value and the price can limit enthusiasm. Not so with crypto.

A crypto-coin is intrinsically worth nothing; even pegging crypto to a real asset doesn’t guarantee convertibility, and nobody stands behind an underlying asset for crypto. It’s worth whatever the market is willing to pay, and that means that enthusiasm can bid the price of crypto up without setting off the same kind of alarm bells that would be set off if a stock were similarly bid up. Yes, you can have stock bubbles (the so-called meme stocks were a recent example), but they don’t last long, and there are objective metrics you can use (price/earnings ratio, for example) that let you spot trouble…spot a bubble.

I say “do-it-yourself bubble” because with crypto, not only do you have an asset with no intrinsic value, you have an uncontrollable quantity of it. Crypto mining is a bit like any other kind of mining in that it generates more of something. In the past, both silver and gold were somewhat scarce, but over time silver was mined more successfully and the quantity of silver increased. As that happened, the ratio of gold-to-silver pricing altered sharply, which demonstrates that the value of something is related to its scarcity. More crypto-coins being mined has that same effect. The higher crypto goes, the smarter it is to mine some, which means that rising demand for crypto induces more supply, and that supply growth works against the scarcity the price depends on.

There are also persistent questions about blockchain as a means of ensuring authenticity. A recent DARPA study says that blockchain is subject to tamper risks, which of course has always been known. The essential truth of blockchain authentication is that it relies on more “honest” nodes processing chains than “dishonest” nodes trying to create a fraud. It’s hard to know when that’s true, especially if nation-states are among the bad actors. But while this risk exists, I’ve not seen credible indications that it’s impacted crypto or created any issues with authentication.

My point here is that the problem with cryptocurrency hasn’t been the technology per se, it’s been the fact that there’s nothing behind it. Yes, blockchain underpins crypto, Web3 and most metaverse strategies, but the current problems with crypto aren’t directly linked to issues with blockchain, so they wouldn’t directly impact other blockchain-based concepts.

Not directly linked, but does everyone who’s looking at these three technical developments see it that way? Crypto is worth what the collective market believes it is, which means that it’s almost a social phenomenon. What happens when it’s negatively socialized?

Venture capitalists are bubble-makers too. They pick a concept, capitalize a bunch of companies to support it, use that stable of companies to generate a lot of media hype, and ride that hype to an exit for a few of their companies at a very high multiple. Web3 is an example of that, and so is the metaverse. Every investor wants in on the Next Big Thing, every Valley technical worker wants to work for the Next Big Startup, and every publication wants to capitalize on the Next Big Story. You can see how this turns into “The Emperor’s New Clothes” very quickly.

There are basic problems with crypto, as I’ve already noted above. There are basic issues with Web3 too, relating to whether any form of validation-by-consensus offers any meaningful validation, because you have no recourse against the masses if there’s something wrong. The metaverse has issues of both scale and value, and also some compelling business challenges that arise out of trying to solve the value problem. My view is that these issues are more than enough to call these technologies into question, to give us not only a right but an obligation to demand some proof of value. The suitability of the underlying technology is a consideration only if what that technology underlies is meaningful. If it’s not, no amount of technology is going to change things.

There is a kernel of real value in cryptocurrency, as a concept. There’s a kernel of real value in distributed validation and authentication as Web3 proposes, and there’s a kernel of real value in the metaverse concept. The problem is that realizing that real value is going to be expensive and time-consuming, just as building a real business would be. It’s a lot easier to figure out a way to create a hype wave, a bubble, and make money off that. Greed and lack of negative social or regulatory pressure almost guarantee that bubble-making will win over value realization.

The question is whether, given the bubble-state of our three technologies, there’s any hope that efforts to realize the real value will continue once the bubble bursts.

We’ve had real challenges in computing and networking over the last couple of decades. Some have been addressed, others never really got much attention, and some could be addressed by technology developments related to crypto, Web3, and the metaverse. I would contend that part of the problem is that when Bubble A bursts, those who made money promoting it are more likely to move on to Bubble B than to go back and invest in realizing the real benefits of that first bubble-creating technology.

The metaverse is a poster child for this issue. It’s pretty obvious from stories like THIS that “Meta says its ultimate goal with its VR hardware is to make a comfortable, compact headset with visual fidelity that’s ‘indistinguishable from reality’.” Why? Because that’s what the metaverse needs, and that’s a daunting challenge in itself, but it’s not the end.

The metaverse needs edge computing. It needs very low-latency connections between edge points, creating low-latency global reach. Anything less means that users will run up against barriers to immersion more often, and more often easily becomes too often. That’s bad news, but there’s also a touch of good news. With the metaverse, there is a player (Meta, obviously) who could actually drive the changes needed. It would be a bold move on their part, but if it paid off it would pay off big.

Web3 also has issues, but while its issues may be simpler, there is no big player dedicated to resolving them. In fact, the whole technology is anti-big-player. There’s an old line from the founding era about the challenge of getting thirteen clocks to strike at the same time. Try getting thirteen hundred startups to collectively support a complex vision. One problem with decentralization is that you decentralize benefits.

Our problem here isn’t blockchain technology or even the business/technical challenges of creating the kind of ecosystem that Web3 or the metaverse need to truly succeed. It’s bubbles. What separates us from the age of tech innovation isn’t that the founders of the Internet or the personal computer or the smartphone were smarter, but that in that time, success required actually doing something, not just claiming you would and then taking the first exit ramp. There are plenty of opportunities to recreate that old model of tech success, and maybe some smart VC will follow one.

Google’s Private 5GaaS: A Good Move but Needs Better Singing

Google is entering the private 5G space, so the story goes, but what it’s likely doing is a bit more profound than that. Does it mean that “private 5G” is going to take off? Does that mean that public 5G is in even more trouble than some have said? Is something even more potentially disruptive going on? Let’s see.

We all know what public 5G is: the next generation of mobile broadband, voice, and messaging. Defining private 5G isn’t as easy these days. Some vendors believe that it’s enterprise deployment of 5G technology to create their own network. Some believe that it’s really “private-5G-as-a-service”, and that’s pretty clearly what Google is promoting. How does p5GaaS (to coin an acronym of convenience) work, and is it really a game-changer? That’s the deeper view of my opening questions.

Let’s start with what I think is a fundamental truth, which is that the average enterprise has no solid reason to consider private 5G. We know that because private wireless versions of 4G have been available for years, and they haven’t exactly swept the market. The truth is that the average enterprise is perfectly fine with WiFi and public carrier wireless services, because that’s what they now use.

I’ve chatted with enterprises who do have a justification for private 5G, and they tend to be companies with spread-out campus operations. I’ve seen it in factory settings, in transportation hubs, warehousing, and so forth. In nearly all the cases where a “justification” has turned into an actual deployment, there’s been a need to support both reliable telephony and data, and it’s also likely there’s an element of mobility in their need for data connectivity. Those who argue that IoT will be the driver of private 5G are right in that it is a driver, but wrong in supposing that it will impact a substantial market segment.

Given this, what the heck are vendors thinking with the private 5G push? Answer: Wall Street creds. Many vendors and startups benefited from the 5G hype wave, but 5G has turned out to be what really should have been expected all along, a simple evolution of cellular communications. The big revenue explosion that many had been talking up was clearly not materializing. Solution? Find another reason to say that big bucks were coming. Thus, private 5G explosion.

Of course, as I’ve noted, private 5G wasn’t destined to explode. The cost and complexity of creating your own cellular network would be justified for only that small segment of enterprises with specialized communications needs. But given that, suppose that you could offer 5G in as-a-service form? Lower the bar on the cost and complexity challenge and maybe you’d get some more buyers. It was (and is) a new slant, mixing cloud and private 5G, so the media was in. The Street might be OK with it too.

Cloud providers like Amazon, Google, and Microsoft don’t really need the hypothetical p5GaaS revenue, though, so why would they push the concept? Answer: They want to host public 5G elements, and do so as an on-ramp to becoming the player who fulfills the requirements of the “telco cloud”, which is really more about edge computing than about 5G.

There are credible, if yet unproven, applications that demand processing hosted closer to the point of activity, meaning “the edge”. The common challenge for all of them is the “first telephone” problem. Who buys the first telephone, given that there’s nobody for them to call? There are plenty of things that edge computing might do, but who will plan to do them, absent any available edge computing services? Who will offer those services, absent anyone really planning to use them? Bootstrapping the edge is complicated, but 5G promised at least a way of getting things started.

5G requires hosted virtual functions, which obviously requires something to host them on. Most of the requirements would focus on the metro area, where user concentrations are high enough to justify resource pools and where user-to-hosting latency is low. Those same factors are the underpinning of the generalized applications for edge computing, so 5G-justified resource pools could also serve as edge computing resources, making edge services available and encouraging the planning of applications that require them.

Google, despite very strong cloud technology, hasn’t been able to gain much, if any, market share in cloud computing services. The last thing they need is an emerging edge computing opportunity that they can’t address. There is clearly interest among the network operators for 5G as a service, but much of the reason for that is that the operators themselves don’t want to capitalize telco cloud, given that so far 5G is mostly a radio access network (RAN) play. That suggests that the current telco opportunity for 5GaaS could be very small.

The best solution to that, from the cloud provider perspective, is p5GaaS, but in a particular form. Google’s approach is to offer p5GaaS as an application of Google Distributed Cloud Edge. This is Google’s “cloud-on-the-prem” model, announced last year, and it extends the Google cloud right onto customers’ premises, using customer-provided hardware. Google joins its competitors in deploying a model of edge computing that’s slaved to their cloud, yet capitalized by the buyer. If GDCE proves popular enough, even if that’s just in select locations, Google can easily push its own hosting toward the edge to tap the additional potential revenue.

Seen in this light, Google’s push for p5GaaS makes a lot of sense. They cannot afford to let other cloud providers ride private 5G to extend their cloud onto customers’ premises. It is possible that there will be no third-party edge computing deployed at all; that all that will happen is that the cloud-as-a-platform will end up extending outward to customer equipment. If that happens, could the cloud become effectively the data center platform? Companies like Red Hat and VMware, after all, want the data center platform to become the cloud platform.

The other possible strategy Google may be pursuing is simply dominating the telco opportunity. As I’ve noted in other blogs, telcos in my experience have seen Google as less a threat to them than the other cloud providers. Google is the only company that Tier Ones have asked me to contact on their behalf to explore a relationship (Google at the time rebuffed the initiative, but this was before the cloud wave). Among the cloud providers, AWS has the startup businesses, Microsoft generally has enterprises, but Google lacks a big core constituency. Telcos might make a very nice one.

Of course, all of this demonstrates Google’s technology more than it validates their marketing/positioning, and that seems to be their problem with the cloud. I would argue that Google has already done more for cloud technology than any other company, and it’s gotten less for its efforts. I’m not sure what’s going on with Google in the engagement process, but they need to fix that just as much as they need to advance their cloud capabilities.

How Did IBM Buck the Tech Downturn?

Let’s face it, this hasn’t been a good quarter, even a good year, for tech. Given that, how is it that an IT company that’s been around longer than most of today’s technology professionals have been alive seems to be doing more than OK? IBM seems to be bucking the downturn. What can we learn from that?

One question that comes to mind is how IBM could buck the supply-chain problem that others are complaining about. The obvious answer is that the key product ingredient in IBM’s success is software, and in particular Red Hat software. There’s no supply chain issue with software, and Red Hat has unquestionably reshaped IBM, but IBM has assets of its own in play too, which leads us to the first lesson.

Since the 1990s, IBM has consistently been the vendor with the largest “strategic influence” in our surveys. We define strategic influence as the ability to shape customer technology plans in a way that conforms to the vendor’s own view of the market, and to its product offerings. At the same time, strong strategic influence seems to ensure relevant and accurate feedback from account teams to senior vendor management, which helps with positioning and messaging.

You can see that with IBM’s “hybrid cloud” slant. The fact is that the number of enterprises who aren’t committed long-term to a hybrid cloud is below the level of statistical significance for my surveys. Despite this, the slant on cloud computing that most vendors take is based on the media, and so they talk about “multi-cloud”, which is something enterprises see as a defensive tactic rather than a strategic direction. From IBM’s quarterly earnings call: “Hybrid cloud is all about providing a platform that can straddle multiple public clouds, private cloud and on-premise properties that our clients typically have.” That simple statement shows how IBM has made their cloud story resonate.

Another benefit of strategic influence is the opportunity to sell consulting and other professional services. Many enterprises have commented that having an IBM team on site has made IBM a kind of de facto IT partner. Who better to turn to when you need outside resources to augment your own team, or for a task you have no internal skills to address? I’ve seen this in accounts where my own consulting brought me into contact with IBM people.

The effects of all of this are clear when you look at the numbers. IBM’s overall revenues were up 11%, but software revenue was up 15%, consulting revenue up 17%, Red Hat revenue up 21%, and hybrid-cloud-associated revenue up 25%. Another interesting revenue number that’s my transition into the next IBM point is that transaction processing revenue was up 31%.

IBM has strategic influence dominance in no small part because IBM is the vendor most involved in the core business applications of major enterprises in major verticals. The old adage, so old I remember it from my early days in programming, was “Nobody ever got fired for buying IBM.” With vertical-market expertise that’s literally unparalleled in the industry, IBM can count on C-level engagement, and that protects those who make a pro-IBM decision. Add to this the fact that one class of application that’s immune from outside pressure is the core business stuff. You don’t have a business at all if you don’t invest there.

One interesting point raised by IBM’s strategic-influence success is whether IBM’s challenge with marketing over the last two decades is linked to its strategic influence. Strategic influence for vendors tends to be linked to sales-centricity; Cisco among network vendors has the highest strategic influence. That raises the question of whether IBM’s current success is due to a market-behavior shift that suddenly values sales over marketing. It may be, but there’s a bit more to it.

I think it is true that IBM’s resilience in 2022 is attributable to the strategic-influence factors I’ve just noted, but I also think that they’ve been perhaps a bit smarter than I had expected in the way they’ve integrated Red Hat. IBM has made no bones about how critical Red Hat is to their future, both on earnings calls and in other conferences and media comments.

What the current quarter shows relative to IBM and Red Hat is that IBM appears to have added value to Red Hat, created more value for its customers based on Red Hat features, and at the same time not interfered in any noticeable way with Red Hat’s own trajectory. They also appear to have been successful in leveraging Red Hat to gain additional sales traction and strategic influence beyond their original (pre-acquisition) stable of accounts. All that is good news for IBM…and for Red Hat.

The quarter may demonstrate that IBM is at least capable of, if not intent on, balancing the leverage of its strategic influence against the leverage of Red Hat’s marketing potential. In good times, when tech is stronger overall, they could expect to add to their bottom line through the evangelism of Red Hat’s stuff, and when times are tough (like now) they can still rely on their strategic base.

Looking at lessons for the broader market, the one I think is most important is that hybrid cloud is the cloud. Attempting to gain strong account traction with another message, including “multi-cloud”, is almost surely a bad idea, and it could be really bad if competitors manage to figure out how to do hybrid cloud while you wallow in the multi-cloud media and PR machine. Red Hat’s own website features hybrid cloud prominently, and while it would be easy to say that’s IBM’s influence in play, remember that IBM’s hybrid play is largely directed at its own strategic accounts. Red Hat doesn’t have to sing that song, yet they do.

That sets up the question of what the tail and the dog might be in a hybrid relationship. IBM’s success suggests that in a major-market enterprise and in core business applications, the data center is very much the tail. Otherwise why pick your key strategic partner from the data center side of the hybrid? Could this mean that, for enterprise cloud use at least, a data center player with dominant strategic influence could drive the cloud more than cloud providers? If so, it would be huge for IBM.

Not so huge for others, of course. Ironically, the vendor who has the most to fear from IBM’s success might well be the vendor who has the most strategic influence in the network space—Cisco. There are two reasons. First, network strategic influence doesn’t translate to CxO IT strategic influence. If it did, Cisco would have turned in a better quarter, and would have been able to get its software initiatives going better. Second, Red Hat might end up poisoning Cisco’s well, directly or indirectly.

Data center networking is in some ways the only dominating piece of “capital equipment networking” that remains. People don’t buy routers to build a WAN. However, there is already pressure on the switching market from white-box solutions. I think that the Broadcom deal for VMware could boost VMware’s strategic influence in the data center, and they’re already second to IBM. Might Broadcom leverage that to push white-box switches (with their chips, or perhaps even their own switches) in competition with network vendors like Cisco? Could Red Hat be a competing source of open-source switching software? Could a lot of credible backers of white boxes shift data center switching decisively?

This might be the time we find that out. Enterprises are obviously having trouble getting incremental dollars for networking, and almost all of those I’ve chatted with on the topic say that they would be more likely to embrace white-box data centers if they were backed by a big player. They don’t necessarily need that player to be a big network vendor. This seems to present another argument that network equipment vendors can’t simply assume that the future will be a linear descendant of the past. Times change.

Optimization, Virtualization, and Orchestration

What makes virtualization, whether it be IT or network, work? The best definition for virtualization, IMHO, is that it’s a technology set that creates a behavioral abstraction of infrastructure that behaves like the real infrastructure would. To make that true, you need an abstraction and a realization, the latter being a mapping of a virtual representation of a service (hosting, connectivity) to real infrastructure.
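
Here’s that abstraction/realization split reduced to a toy Python sketch, with invented names; the consumer deals only with the abstraction, and the realization step decides what real infrastructure stands behind it.

```python
# A minimal sketch of abstraction plus realization. The consumer sees one
# "virtual host" abstraction that behaves like real hosting would; the
# realization maps it to whatever infrastructure is actually available.

from abc import ABC, abstractmethod

class VirtualHost(ABC):
    """The behavioral abstraction."""
    @abstractmethod
    def run(self, workload: str) -> str: ...

class BareMetalRealization(VirtualHost):
    def run(self, workload):
        return f"{workload} running on server rack-12/slot-3"

class CloudRealization(VirtualHost):
    def run(self, workload):
        return f"{workload} running in cloud region metro-east"

def realize(abstraction_request: dict) -> VirtualHost:
    """The mapping step: pick a realization for the requested abstraction."""
    if abstraction_request.get("elastic"):
        return CloudRealization()
    return BareMetalRealization()

if __name__ == "__main__":
    host = realize({"elastic": True})
    print(host.run("front-end-app"))  # the consumer never sees which was chosen
```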

If we accept this, then we could accept that a pure virtual world, where everything consumed a virtual service or feature, would require some intense realization. Further, if we assume that our virtual service is a high-level service that (like nearly every service these days) involves layers of resources that have to be virtualized, from servers to networking, we could assume that the process of optimizing our realization would require us to consider multiple resource types at once. The best cloud configuration is the one that creates the best price/performance when all the costs and capabilities, including hosting hardware, software, and network, are considered in realizing our abstraction.

Consider this issue. We have a company with a hundred branch offices. The company runs applications that each office must access, and they expect to run those applications at least in part (the front-end piece for sure) in the cloud/edge. When it’s time to deploy an instance of an application, where’s the best place to realize the abstraction that the cloud represents? It depends on the location of the users, the location of the hosting options, the network connectivity available, the cost of all the resources…you get the picture. Is it possible to pick optimum hosting first, then optimum networking to serve it? Some of the time, even most of the time, perhaps. Not always. In some cases, the best place to run the application will depend on the cost of getting to that location as well as the cost of running on it. In some cases, the overall QoE will vary depending on the network capabilities of locations whose costs may be similar.
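
A toy version of that decision, with made-up costs and latencies for a handful of branches, shows why hosting and networking can’t always be optimized separately:

```python
# Toy sketch of the joint decision: the "best" hosting site minimizes hosting
# plus network cost while keeping every branch within the latency (QoE) bound.
# All numbers are invented for illustration.

HOSTING_SITES = {
    "metro-east": {"hosting_cost": 100},
    "metro-west": {"hosting_cost": 80},
}
# Per-branch (network cost, latency in ms) to each candidate hosting site.
BRANCH_PATHS = {
    "branch-a": {"metro-east": (10, 12), "metro-west": (25, 48)},
    "branch-b": {"metro-east": (12, 15), "metro-west": (30, 52)},
    "branch-c": {"metro-east": (20, 30), "metro-west": (5, 10)},
}

def best_site(max_latency_ms=50):
    candidates = []
    for site, info in HOSTING_SITES.items():
        costs, latencies = zip(*(BRANCH_PATHS[b][site] for b in BRANCH_PATHS))
        if max(latencies) > max_latency_ms:
            continue  # QoE bound violated for at least one branch
        candidates.append((info["hosting_cost"] + sum(costs), site))
    return min(candidates) if candidates else None

print(best_site())
# metro-west is cheaper to host on, but one branch's latency to it breaks the
# QoE bound, so metro-east is the better overall realization here.
```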

We have “orchestration” today, the task of deploying components on resource pools. One of the implicit assumptions of most orchestration is the concept of “resource equivalence” within the pools, meaning that you can pick a resource from the pool without much regard for its location or specific technical details. But even today, that concept is under pressure because resource pools may be distributed geographically and be served by different levels of connectivity. There’s every reason to believe that things like edge computing will put that principle of equivalence under fatal pressure.

The “ideal” model for orchestration would be one where a deployment or redeployment was requested by providing a set of parameters that locate the users, establish cost goals, and define the QoE requirements. From that, the software would find the best place, taking into account all the factors that the parameters described. Further, the software would then create the component/connection/user relationships needed to make the application accessible. It’s possible, sort of, to do much of this today, but only by taking some points for granted. Some of those points are likely to require more attention down the line.

I think that for this kind of orchestration to work, we need to presume that there’s a model, and software that decomposes it. This is essential because a service is made up of multiple interdependent things and somehow both the things and the dependencies have to be expressed. I did some extensive tutorial presentations on a model-based approach in ExperiaSphere, and I’ve also blogged about it many times, so I won’t repeat all that here.

One key element in the approach I described is that there’s a separate “service domain” and “resource domain”, meaning that there is a set of models that describe a service as a set of functional elements and another set that describe how resource “behaviors” are bound to the bottom layer of those service elements. The goal was to make service definitions independent of implementation details, and to permit late binding of suitable resource behaviors to services. If a service model element’s bound resource behavior (as advertised by the resource owner/manager) broke, another compatible resource behavior could be bound to replace it.
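
A rough sketch of that split and of late binding, with illustrative class and behavior names of my own invention (not drawn from ExperiaSphere’s actual specifications), might look like this:

```python
# A service-domain element binds, late, to any advertised resource "behavior"
# that offers the capability it needs; if the bound behavior breaks, a
# compatible one replaces it. All names are illustrative only.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Behavior:              # advertised by a resource owner/manager
    name: str
    capability: str          # e.g. "vpn-connectivity", "container-hosting"
    healthy: bool = True


@dataclass
class ServiceElement:        # a bottom-layer element of the service model
    name: str
    needs: str               # a required capability, not an implementation
    bound: Optional[Behavior] = None

    def bind(self, catalog):
        for behavior in catalog:
            if behavior.capability == self.needs and behavior.healthy:
                self.bound = behavior
                return
        raise RuntimeError(f"no healthy behavior offers {self.needs}")


catalog = [
    Behavior("mpls-vpn-east", "vpn-connectivity"),
    Behavior("sdwan-overlay", "vpn-connectivity"),
]

element = ServiceElement("branch-access", needs="vpn-connectivity")
element.bind(catalog)
print("bound to", element.bound.name)       # mpls-vpn-east

catalog[0].healthy = False                  # the bound behavior breaks...
element.bind(catalog)                       # ...and a compatible one is rebound
print("rebound to", element.bound.name)     # sdwan-overlay
```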

This could offer a way for “complex” orchestration to work above cloud- and network-specific orchestration. The service models could, based on their parameters, select the optimum placement and connection model, and then pass the appropriate parameters to the cloud/network orchestration tool to actually do the necessary deploying and connecting. It would be unnecessary then for the existing cloud/network orchestration tools to become aware of the service-to-resource constraints and optimizations.

The potential problem with this approach is that the higher orchestration layer would have to be able to relate its specific requirements to a specific resource request. For example, if a server in a given city was best, the higher-level orchestrator would have to “know” that first, and second be able to tell the cloud orchestrator to deploy to that city. In order to pick that city, it would have to know both the hosting capabilities and the network capabilities there. This means that what I’ve called “behaviors”, advertised resource capabilities, would have to be published so that the higher-layer orchestrator could use them. These behaviors, being themselves products of lower-level orchestration, would then have to drive that lower-level orchestration to fulfill their promise.
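
As a sketch of that flow, again with invented names and simple stand-ins for the real cloud/network orchestrator APIs, publishing behaviors and then driving the layers that advertised them might look like this:

```python
# Lower-level domains publish "behaviors"; the higher layer picks among them
# and then drives the advertising orchestrators. publish/deploy are stand-ins,
# not a real orchestrator API; all names and figures are hypothetical.

ADVERTISED = []   # behaviors published by the hosting and network domains

def publish(domain, city, capability, detail):
    ADVERTISED.append({"domain": domain, "city": city,
                       "capability": capability, "detail": detail})

def deploy(domain, city, spec):
    # stand-in for passing parameters down to the cloud or network orchestrator
    print(f"[{domain} orchestrator] deploying {spec} in {city}")

# Lower-level domains advertise what they can realize, and where.
publish("cloud",   "Denver",  "gpu-hosting", {"free_nodes": 12})
publish("cloud",   "Chicago", "gpu-hosting", {"free_nodes": 2})
publish("network", "Denver",  "low-latency", {"ms_to_core": 8})
publish("network", "Chicago", "low-latency", {"ms_to_core": 22})

# The higher layer picks a city where every needed behavior is on offer,
# then drives the lower-level orchestrators that advertised them.
def place(needs):
    offers = [set(b["city"] for b in ADVERTISED if b["capability"] == n) for n in needs]
    usable = set.intersection(*offers)
    if not usable:
        raise RuntimeError("no city satisfies all of the needed behaviors")
    city = sorted(usable)[0]   # a real selector would weigh cost and QoE here
    deploy("cloud", city, {"app": "front-end"})
    deploy("network", city, {"connect": "branches-to-front-end"})

place(["gpu-hosting", "low-latency"])
```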

That exposes what I think is the biggest issue for the complex orchestration of fully virtualized services—advertising capabilities. If that isn’t done, then there’s no way for the higher layer to make decisions or the lower-layer processes to carry them out faithfully. The challenge is to frame “connectivity” in a way that allows it to be related to the necessary endpoints, costs included.

Deploying connectivity is also an issue. It’s the “behaviors” that bind things together, but if network and hosting are interdependent, how do we express the interdependence? If the higher-layer orchestration selects an optimal hosting point based on advertised behaviors, how does the decision also create connectivity? If the network can be presumed to be fully connective by default, then no connection provisioning is required, but if there are any requirements for explicit connection, or placement or removal of barriers imposed for security reasons, then it’s necessary to know where things have been put in order to facilitate these steps.
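
A trivial sketch of that dependency, with hypothetical names: the explicit connection steps can’t even be generated until the placement is known.

```python
# If the network is not assumed fully connective, the placement decision has
# to drive explicit connection (or barrier-removal) steps. Names are made up.

def connection_steps(placement, users, default_open=False):
    """Given where a component landed and who must reach it, emit the
    explicit provisioning actions needed (none if connectivity is default)."""
    if default_open:
        return []
    return [f"open path {u} -> {placement['site']}:{placement['port']}" for u in users]

placement = {"component": "order-front-end", "site": "edge-denver", "port": 443}
for step in connection_steps(placement, ["branch-12", "branch-47"]):
    print(step)
```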

It’s possible that these issues will favor the gradual incorporation of network services into the cloud, simply because a single provider of hosting and connectivity can resolve the issues without the need to develop international standards and practices, things that take all too long to develop at best. It’s also possible that a vendor or, better yet, an open-source body might consider all these factors and advance a solution. If I had to bet on how this might develop, I’d put my money on Google in the cloud, or on Red Hat/IBM or VMware among vendors.

Something is needed, because the value of virtualization can’t be achieved without the ability to orchestrate everything. A virtual resource in one space, like hosting, can’t be nailed to the ground by a fixed resource relationship in another, like networking, and optimization demands that we consider all costs, not just a few.

There’s a New Flow Optimization Algorithm; What Might Need It?

One problem that networks have posed from the first is how to optimize them. An optimum network, of course, is in the eye of the beholder; you have to have a standard you’re trying to meet before you can talk about optimization. Networks can be optimized for flow and for cost, most experts have always believed that the same process could do both, and algorithms to provide network optimization have been evolving since the dawn of the Internet.

One challenge with optimization is the time it takes to do it, particularly given that the state of a network isn’t static. Traffic uses resources, things fail, and errors get made. IP networks have generally been designed to “adapt” to conditions, something that involves “convergence” on a new topology or optimality goal. That takes time, during which networks might not only be sub-optimal, they might even fail to deliver some packets.

A new development (scientific paper here) seems to show promise in this area. Even the first of my references is hardly easy to understand, and the research paper itself is beyond almost everyone but a mathematician, so I won’t dwell on the details, but rather on the potential impacts.

Convergence time and flow/cost optimization accuracy are critical for networks. The bigger the network, and the more often condition changes impact cost/performance, the harder it is to come up with the best answer in time to respond to changes. This problem was the genesis for “software-defined networks” or SDN. SDN in its pure form advocates replacing the protocol exchanges routers use to find optimum routes (“adaptive routing”) with a centralized route management process (the SDN controller). Google’s core network is probably the largest deployment of SDN today.

It’s centralized route management that enables algorithmic responses to network conditions. Centralized management requires that you have a network map that shows nodes and trunks, and that you can determine the state of each of the elements in the map. If you can do that, then you can determine the optimum route map and distribute it to the nodes.
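
For illustration, here’s a minimal Python sketch of that centralized pattern, using ordinary Dijkstra shortest-path (not the new algorithm, whose details are well beyond a sketch) over a made-up node/trunk map, producing the per-node forwarding entries a controller could distribute:

```python
# Centralized route management in miniature: keep a map of nodes and trunks
# with their current costs, compute least-cost routes centrally, and derive
# the forwarding entries to push to each node. Topology and costs are made up.

import heapq

# trunk list: (node_a, node_b, cost); a trunk that fails is simply omitted
TRUNKS = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5), ("C", "D", 1), ("B", "D", 4)]

def build_graph(trunks):
    graph = {}
    for a, b, cost in trunks:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph

def shortest_paths(graph, source):
    """Dijkstra: returns {node: (distance, previous_hop)} from source."""
    dist = {source: (0, None)}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node][0]:
            continue
        for nbr, cost in graph[node]:
            nd = d + cost
            if nbr not in dist or nd < dist[nbr][0]:
                dist[nbr] = (nd, node)
                heapq.heappush(heap, (nd, node))
    return dist

def forwarding_entries(graph, node):
    """Next hop from `node` toward every destination, derived centrally."""
    paths = shortest_paths(graph, node)
    table = {}
    for dest in paths:
        if dest == node:
            continue
        hop = dest
        while paths[hop][1] != node:   # walk back to the first hop
            hop = paths[hop][1]
        table[dest] = hop
    return table

graph = build_graph(TRUNKS)
print(forwarding_entries(graph, "A"))   # {'B': 'B', 'C': 'B', 'D': 'B'}
```

The interesting part isn’t the algorithm; it’s that the whole computation happens in one place, against one map, which is exactly what makes a better algorithm a drop-in improvement for an SDN controller.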

Obviously, we already have multiple strategies for defining optimum routes, and my first reference says that the new approach is really needed only for very large networks, implying that it’s not needed for current networks. There are reasons to agree with that, IMHO, but also reasons to question it.

The largest network we have today is the Internet, but the Internet is a network of networks and not a network in itself. Each Autonomous System (AS) has its own network, and each “peers” with others to exchange traffic. There are a limited number of peering points, and the optimization processes for the Internet work at multiple levels: (in simple terms) within an AS and between ASs. If we look at the way that public networks are built and regulated, it’s hard to see how “more” Internet usage would increase the complexity of optimization all that much, and it’s hard to see how anyone would build a single network large enough to need the new algorithm.

But…and this is surely a speculative “but”…recall that the need for optimization efficiency depends on both the size of the network and the pace of things that drive a need to re-optimize. The need will also depend on the extent to which network performance, meaning QoS, needs to be controlled. If you can accept a wide range of QoS parameter values, you can afford to wait a bit for an optimum route map. If you have very rigorous service SLAs, then you may need a result faster.
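
To make that tradeoff concrete, here’s a toy calculation (the numbers are made up) of when a tightening SLA turns “wait for the normal convergence cycle” into “recompute now”:

```python
# Toy illustration: with a loose SLA you can tolerate running sub-optimally
# while a new route map converges; with a tight SLA the recompute has to
# finish before the QoS margin is exhausted. Figures are illustrative only.

def must_reoptimize_now(measured_ms, sla_ms, drift_ms_per_min, convergence_min):
    """True if the QoS margin will be gone before a normal recompute finishes."""
    margin = sla_ms - measured_ms
    burn = drift_ms_per_min * convergence_min
    return burn >= margin

# Loose SLA: plenty of margin, an ordinary convergence cycle is fine.
print(must_reoptimize_now(measured_ms=40, sla_ms=120, drift_ms_per_min=2, convergence_min=5))  # False

# Tight SLA: the same drift eats the margin before convergence completes.
print(must_reoptimize_now(measured_ms=40, sla_ms=50, drift_ms_per_min=2, convergence_min=5))   # True
```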

We know of things that likely need more tightly constrained QoS. IoT is one example, and the metaverse another. What we don’t know is whether any of these things actually represent any major network service opportunity, or whether the other things needed to realize these applications can be delivered. An application of networking is almost certainly a swirling mix of technologies, some of which are on the network and not in it. The failure to realize a connectivity/QoS goal could certainly kill an application, but just having the connectivity and QoS needed for a given application doesn’t mean that application will automatically meet its business case overall. We need more information before we could declare that QoS demands could justify a new way of optimizing network routes, but there are other potential drivers of a new optimization model.

Network faults, meaning the combination of node, trunk, and human-error problems, can drive a need to redefine a route map. If you had a very faulty network, it might make sense to worry more about how fast you could redraw your map, providing that there wasn’t such a high level of problems that no alternative routes were available. My intuition tells me that before you’d reach the point where existing route optimization algorithms didn’t work, you’d have no customers. I think we could scratch this potential driver.

There’s a related driver that may be of more value. The reason why faults force re-optimizing is that they change topology. Suppose dynamic topology changes were created some other way? Satellite services based on low-orbit satellites, unlike services based on geostationary satellites, are likely to exhibit variable availability depending on the positions of all the satellites and the locations of the sources and destinations of traffic. These new satellite options also often have variable QoS (latency in particular), depending on how traffic hops between satellites given their positions relative to each other at any moment. Increased reliance on low-earth-orbit satellites could mean that better route optimization performance would be a benefit, particularly where specific QoS needs have to be met.

Then there’s security, and denial-of-service attacks in particular. Could such an attack be thwarted by changing the route map to isolate the sources? There’s no reliable data on just how many DoS attacks are active at a given time, but surely it’s a large number. However, the fact that there is no reliable data illustrates that we’d need to capture more information about DoS and security in order to make them a factor in justifying enhanced route optimization and control. Jury’s out here.

Where does this lead us? I think that anything that enhances our ability to create an optimum route map for a network is a good thing. I think that the new approach is likely to be adopted by companies who, like Google, rely already on SDN and central route/topology management. I don’t think that, as of this moment, it would be of enough benefit to be a wholesale driver of centralized SDN, though it would likely increase the number of use cases for it.

The biggest potential driver of a new route optimization model, though, is cloud/edge computing, and the reason goes back to changes in topology. Network faults or changes can alter traffic patterns, but a bigger source of change is the unification of network and hosting optimization. The biggest network, the Internet, is increasingly integrating with the biggest data center, the cloud. Edge computing (if it develops) will increase the number of hosting points radically; even without competitive overbuild my model says we could deploy over 40,000 new hosting points, and considering everything, as many as 100,000. Future applications will not only need to pick a path between the ever-referenced Point A and Point B, they’ll also have to pick where one or both points are located. Could every transitory edge relationship end up generating a need for optimal routes? It’s possible. I’m going to blog more about this issue later this week.

The strategy this new optimization algorithm defines is surely complex, even abstruse, and it’s not going to have any impact until it’s productized. The need for it is still hazy and evolving, but I think it will develop, and it’s going to be interesting to see just which vendors and network operators move to take advantage of it. I’ll keep you posted.

Cloud-Native, Composable Services, and Network Functions

I get a lot of comments and feedback from vendors, enterprises, and network operators. Recently, there’s been an uptick on topics related to Network Functions Virtualization, NFV. Some of this has apparently come out of the increased activity around Google’s Nephio initiative, which aims at creating a Kubernetes-based management and orchestration framework. Some has come out of the realization that 5G deployments will increasingly commit operators to an approach to function hosting, and many realize that the work of the NFV ISG, while seminal in conceptualizing the value proposition and approach, isn’t the technical answer.

It’s important to get a “technical answer” to network function hosting because, whatever the mechanism, it’s clear that the industry is moving toward being able to compose services from functional elements. That means that it will be necessary to define service models and then orchestrate their deployment. If we’re going to go through that admittedly complex process, we don’t want to do it more than once. We need a model of service orchestration and management that fits all.

What, then, is “all?” One interesting thread that runs through the comments is that we aren’t entering the debates on function hosting with a consistent set of definitions and goals. It’s difficult to get to the right answers without asking the right questions, and more difficult if you can’t agree on basic terminology, so let’s try to unravel some of the knots that these comments have uncovered.

The first point is that while the great majority of operators seem to accept the need for “cloud-native” function hosting, they don’t have a solid definition for the term. That’s not surprising given that the same could be said for the industry at large. In fact, the “purist” definition and the popular meaning of the term seem to be diverging.

When I queried the operators who used the term, most said that “cloud-native” meant “based on native cloud computing technology”. Containers, Kubernetes, and the like were their examples of what went into a cloud-native function hosting framework. To cloud people, the term usually means “stateless and microservice-based”. In an earlier blog, I noted that many of the “network functions” that needed to be hosted didn’t fit with that purist definition. They shouldn’t be required to, because to do so would likely result in service failures.

Stateless microservices are a great strategy to support a human-interactive application and some other event-driven applications, including the control-plane part of networking. They’re not suitable, IMHO, for data plane applications because splitting functionality and introducing network latency between functional elements will impact performance and actually reduce reliability.

Containers are probably a good, general way of hosting network functions. The big advantage of containers is that they’re a kind of self-describing load unit, which means that it’s possible to configure the deployed elements using generalized tools rather than requiring tweaking by human operations management. However, containers aren’t as valuable where something is going to be deployed in a static way on specific devices rather than on a pool of resources. Containers also carry higher overhead, meaning that their handling of flow-through traffic may not be ideal.

What we really need for function hosting is a universal deployment and management framework that works for containers, virtual machines, bare metal servers, and white boxes. That framework has to be able to deploy network functions and also on-network functions, because even the Internet, when considered as a service, has both. I’d argue that there are already initiatives that satisfy that for Kubernetes, and that’s one reason I am so interested in the Nephio project and its Kubernetes-centricity. What Nephio proposes to do is to create hosting generalization without sacrificing common orchestration/management, and that’s critical.
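
A sketch of that kind of generalization (my own illustration, not modeled on Nephio’s actual interfaces) is a single orchestration loop with pluggable hosting back-ends:

```python
# "Hosting generalization without sacrificing common orchestration": one
# orchestration call, pluggable back-ends per hosting type. The drivers here
# just print; they stand in for container, VM, and white-box tooling and are
# hypothetical, not drawn from any real project.

from abc import ABC, abstractmethod

class HostingDriver(ABC):
    @abstractmethod
    def deploy(self, function_name: str, target: str) -> None: ...

class ContainerDriver(HostingDriver):
    def deploy(self, function_name, target):
        print(f"scheduling container for {function_name} on cluster {target}")

class VMDriver(HostingDriver):
    def deploy(self, function_name, target):
        print(f"booting VM image for {function_name} on host {target}")

class WhiteBoxDriver(HostingDriver):
    def deploy(self, function_name, target):
        print(f"loading {function_name} package onto white-box device {target}")

DRIVERS = {"container": ContainerDriver(), "vm": VMDriver(), "whitebox": WhiteBoxDriver()}

def orchestrate(service):
    """One common orchestration loop, regardless of how each function is hosted."""
    for fn in service:
        DRIVERS[fn["hosting"]].deploy(fn["name"], fn["target"])

orchestrate([
    {"name": "5g-upf",         "hosting": "whitebox",  "target": "cell-site-17"},
    {"name": "session-mgr",    "hosting": "container", "target": "edge-cluster-3"},
    {"name": "legacy-billing", "hosting": "vm",        "target": "dc-east-01"},
])
```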

This point is important when addressing another theme of the comments, which is that achieving “cloud-native” (or whatever you’d like to call it) for function hosting is something that has to come out of the work of the NFV ISG. I disagree with that, strongly, and the reason is that I believe that the early work of the ISG has made it difficult (if not impossible) for it to embrace the Kubernetes-centricity that function hosting needs.

If future network services are ever going to be differentiable, different, then that’s going to have to be achieved through a unification of network functions and the functions that create and sustain experiences. Networking today, from the perspective of the user, is all about experience delivery. The front-end of those experiences is already largely hosted in the cloud, where cloud-centric practices (including/especially Kubernetes) prevail.

Despite the fact that operators believe that function hosting will enrich their services and make them more differentiable (raising revenues), they have spent little time trying to identify what specific functions would accomplish those goals. As I pointed out in my blog yesterday, they seem to believe that abstract function hosting will do the job, which is essentially a claim that providing a mechanism to do something is equivalent to actually doing it. They haven’t confronted specific service examples that mingle network functions and other hosting. In fact, they’ve really not considered network functions themselves on any broad scale, other than that outlined in places like the 5G standards. That’s likely why they aren’t seeing Kubernetes-centricity as the pivotal factor it is.

Operator standards make glaciers seem to move at breathtaking speeds by comparison. The cloud has done the opposite, compressing application development schedules and creating an explosion of tools and techniques to support the ever-changing needs of businesses. If Internet-centric experience delivery is what’s really driving the cloud and networking, then can telcos afford to have their own function hosting framework lagging so far behind what’s driving services? I don’t think so.

The pace of cloud progress can be traced to two things. First, there is a compelling mission for the cloud, even though it’s not the mission most think of when they hear “cloud computing”. It’s not about replacing the data center, but about extending applications’ user relationships out of the data center and closer to “the Internet”. Second, the cloud has produced application development, hosting, deployment, and management tools and techniques that favor rapid progress. They’ve done that by replacing “standards” with open-source projects.

A few operators, and more vendors, tell me that the “standards” activity itself is creating a problem. There’s an established operator reliance on standards and on their work in defining them. For mobile networking, the 3GPP has almost literally ruled in defining how equipment works and interoperates. This flies in the face of the fact that IP networking is utterly dominant today, and that the IETF, and not any carrier-centric standards body, drives IP specifications. It’s not unreasonable to see the rise of the Open RAN (O-RAN) initiatives as an indicator that even the 3GPP may be losing influence. However, “losing” doesn’t mean “lost”, and so we can’t hope for quick progress in turning operators away from their own standards initiatives. Especially when there are a lot of people in operator organizations whose careers are built on those initiatives.

Vendors, I think, can and must be the solution here. I’ve advocated an approach of blowing kisses at ETSI NFV while working hard on developing a true cloud-centric function hosting framework. I think (I hope) that Nephio does just that, but what will decide the fate of Nephio, and perhaps that of function hosting and even telecom, is how vendors support the initiative, and that’s not something that’s easy to assess.

Nephio is a hybrid of cloud and telecom in more than just a technical sense. Most of the truly seminal elements of the cloud were created because a primary vendor did something and made it open source. Rarely has a collection of vendors somehow come together successfully on a concept. The old “horse-designed-by-committee” analogy sure seems to apply. Vendors not only make up the majority of participants in open network-industry bodies, they tend to supply the majority of the resources. What happens next for “cloud-native” in telecom depends on how those vendor participants work to advance something useful, fast enough to matter to a rapidly changing market.