Looking for the “Why” in Service Lifecycle Automation

What is the goal, the real and ultimate goal, of service lifecycle automation? That may seem like a stupid question, and in more than one dimension. First off, everyone knows that service lifecycle automation is supposed to cut opex. Many know that opex is actually a bigger portion of each revenue dollar operators earn than capex is. Furthermore, many also know that the traditional measure of operator financial health is EBITDA, which stands for “earnings before interest, taxes, depreciation, and amortization”, and that doesn’t include capital purchases in any event but does factor in opex. But how much of the addressable opex have we already addressed? That’s the big question.

Across all the operators, opex represents about 30 cents of each revenue dollar, where capex represents around 21 cents. Since I started keeping tabs on, and modeling for, this sort of thing back in 2016, operators have inaugurated measures to reduce opex, and some operations costs have in fact declined a bit. Customer support costs are down by over 25% since then, due to increased use of customer support portals. Network operations costs have fallen by about 6% since 2020, but the other components of opex have continued to creep upward. The one that’s gone up the most, and is already the largest, is customer acquisition and retention.

Nearly every mobile operator contact I have will admit that efforts to differentiate mobile services based on any aspect of service quality or reliability have proven ineffective. Instead, operators use handsets to influence buyers, or rely on inertia. The question is whether that’s really the best approach, or even whether it has any staying power in the market. In off-the-record comments, operators will admit that they believe smartphone giveaways or discounts can work to fend off prepay price leaders, but it’s becoming increasingly clear that they’re not broadly effective.

For wireline services, things are similar. Bundling of broadband and live TV is still a big draw, but even in the wireline world there’s still a benefit to differentiating based on local devices (in this case, the wireless/WiFi router). Users should know that having faster WiFi won’t normally have any effect on quality of experience, but they apparently don’t, and some ISPs run ads that strongly imply that their WiFi will indeed give them better Internet.

If operators want to reduce their acquisition and retention costs, they’ll need to get away from a focus on devices and move to a focus on services in general, and service quality in particular. Operators have picked pretty much the whole crop of low apples in customer care already, and while it’s paid off well enough to fend off a profit-per-bit crisis, it’s also covered up some basic problems. Fixing those might fix opex overall.

One interesting point to consider here is the relationship between capex and opex in both network and data center. Right now, operations support for both networks and data centers runs roughly four-and-a-half cents of each revenue dollar, but the total technology investment in the network is at least twenty times that of the data center. This means that IT operations is a lot more expensive per unit of technology, and that’s a problem that played a role in rendering NFV ineffective. Because NFV didn’t address lifecycle management any differently, a shift to virtual functions would surely have cost more in opex than it saved in capex.
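To see why that per-unit gap matters, here’s a back-of-the-envelope sketch in Python. It reads the two operations-cost figures as roughly comparable in cents-per-revenue-dollar terms, which is an assumption on my part; the numbers are the approximations cited above, not survey data.

```python
# Illustrative arithmetic only; figures are the rough cents-per-revenue-dollar
# numbers quoted above, and the "comparable ops costs" reading is an assumption.
network_ops_cost = 0.045       # network operations support per revenue dollar
it_ops_cost = 0.045            # data center (IT) operations support, assumed similar
network_investment = 20.0      # relative technology investment: network ~20x data center
datacenter_investment = 1.0

network_opex_per_unit = network_ops_cost / network_investment
it_opex_per_unit = it_ops_cost / datacenter_investment

print(f"IT opex per unit of technology ~= "
      f"{it_opex_per_unit / network_opex_per_unit:.0f}x the network's")
# -> roughly 20x, which is why swapping devices for hosted functions without
#    changing lifecycle management would likely raise opex faster than it cut capex
```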

This obviously raises questions regarding things like 5G function hosting and “carrier cloud” or edge computing. Operators have between eight and thirteen times the operations cost per unit of technology as public cloud providers. With that opex-per-unit disadvantage, they could never hope to be competitive as a hosting source, and in fact their 5G technology investment could be at risk. Some operators who read my comments on why they were focused on public cloud partnerships for 5G hosting told me that opex cost was their big problem; they couldn’t efficiently manage their own infrastructure.

Right now, of course, the operations costs associated with IT equipment fall largely within the CIO organization, which is responsible for OSS/BSS. For decades, operators have been divided on whether OSS/BSS needed modernization or should simply be tossed out, and that’s the question I think is at the heart of the whole service lifecycle automation debate.

For efficient service lifecycle automation, you need two very specific things. First, you need a unified strategy for operations for both network equipment and hosted-function infrastructure. 5G should have made that point clear. Second, “service lifecycle” means both operations/business support and network/hosted-function support.

I think that the NFV ISG recognized the first point, the need for unified operations, and that recognition was behind the decision to link management of a hosted function with the management tools used for the physical device the function was derived from. However, they left out the issue of managing the hosting itself. A virtual device fails if its virtual function fails, which is in a sense equivalent to a real device failure. It also fails if what it’s hosted on fails, and the management of the resource pool is therefore just as critical.

My own ExperiaSphere work relates to the second point above. Lifecycle automation is going to require some form of state/event processing. It’s hard to see how service lifecycle automation could avoid requiring both OSS/BSS and NMS integration, and if it requires both, we’d be making OSS/BSS functions into processes run on state/event relationships just like NMS functions. I think this would effectively remake the OSS/BSS.
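To make the state/event idea concrete, here’s a minimal sketch in Python. It is not ExperiaSphere code; the object, state, and event names are illustrative assumptions, but it shows how an OSS/BSS function (billing) and an NMS function (activation) can both become processes selected from a state/event table.

```python
# A minimal sketch of state/event-driven lifecycle handling; names are illustrative.
from typing import Callable, Dict, Tuple

State = str
Event = str
Process = Callable[["ServiceObject", Event], State]

def activate(obj: "ServiceObject", event: Event) -> State:
    print(f"{obj.name}: activating resources (an NMS-side process)")
    return "ACTIVATING"

def start_billing(obj: "ServiceObject", event: Event) -> State:
    print(f"{obj.name}: starting billing (an OSS/BSS-side process)")
    return "ACTIVE"

def remediate(obj: "ServiceObject", event: Event) -> State:
    print(f"{obj.name}: running automated remediation")
    return "DEGRADED"

class ServiceObject:
    """One modeled service element carrying its own state/event table."""
    def __init__(self, name: str, table: Dict[Tuple[State, Event], Process]):
        self.name, self.state, self.table = name, "ORDERED", table

    def handle(self, event: Event) -> None:
        process = self.table.get((self.state, event))
        if process:
            self.state = process(self, event)

table: Dict[Tuple[State, Event], Process] = {
    ("ORDERED", "ServiceOrder"): activate,
    ("ACTIVATING", "Activated"): start_billing,
    ("ACTIVE", "Fault"): remediate,
}
svc = ServiceObject("VPN-Access-1", table)
svc.handle("ServiceOrder")
svc.handle("Activated")
svc.handle("Fault")
```

The point is that once every function is just a process invoked from a state/event table, OSS/BSS and NMS logic stop being architecturally different.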

A unified view of a service lifecycle, one that integrates service and network operations, hosted functions and network devices, would be the logical way to address lifecycle automation. But would it have any impact on opex? I’d love to say that my modeling said that it would, that it would be the path to opex nirvana. I can’t say that, at least not for the present.

The problem is that consumer broadband is really plumbing, and plumbing is something that’s important only if it breaks; otherwise it’s invisible. You can’t differentiate invisibility, and if acquisition/retention costs are the major component of opex (which they are, by far) then how can you show customers that they should pick, or stay with, you because of that invisible thing?

But, and it’s a big “but”, there are reasons why operators shouldn’t be willing to consign themselves to being the plastic pipe that connects your leopard-skin toilet seat to the public sewer. There are reasons why, if they do accept that, they’d still need to cost-optimize the whole picture. In either event, there is at least some function hosting in their future, and they need to address that so as to reduce their unit opex for server pools.

If we need to do something for lifecycle automation, reflecting those two points, what might that look like and what specific impact on opex might be possible? We’ll look at that in a follow-up blog.

Where Are We Really Heading with Broadband?

It’s pretty clear that fixed broadband access is changing. Both CATV and fiber-based operators are raising their speeds, to 2 or even 4Gbps. AT&T, who a year ago wasn’t all that positive on fixed wireless access (FWA) as an option, now says that they’re going to retire more and more copper, replacing it with fiber where feasible and with FWA/5G where it’s not. A bunch of smaller access players are getting into the market, usually with aggressive fiber deployments. You’ve got to wonder what’s going on, so let’s find out.

Always start with money. On the revenue side, things have shifted significantly over the last decade. At the start of it, most wireline broadband relied on linear live TV for its consumer draw. Cable companies, of course, were doing that before they were doing Internet at all. Today, streaming TV services are expanding, networks are starting to stream their own content directly, and competition has made TV less profitable. On the other hand, broadband Internet has become a necessity for most consumers, and so broadband Internet is pushing integrated live TV off the podium.

Staying with money but moving to the cost side, fixed access providers recognize that to be in the game at all, they need to deliver tens of megabits at least, which means that standard copper loop is just not going to cut it. They also recognize that once you decide you can’t use the old loop plant, alternative technologies (fiber, CATV, FWA) are all capable of delivering hundreds of megabits to multiple gigabits. “Delivering” here means clocking a consumer interface at the specified speed, not creating an end-to-end path of that speed. The access providers can’t control actual delivery speed since they’re not providing the entire path, but they can advertise interface speeds and compete with others in that numbers game.

The hot question for today’s market is what alternative technology to copper loop is best, and that’s proved to be a complicated question. Almost two decades ago, I ran through a lot of modeling and determined that there were publicly available data points that could be combined to create a measure of “demand density”, which was a combination of the revenue potential per unit of service area and the cost likely incurred to serve that area. High demand density favors fiber to the home (FTTH), and in fact the availability and quality of fiber broadband correlates very strongly with demand density. Verizon, whose demand density is at least seven times that of rival AT&T, was quick to embrace fiber broadband…because they could make it pay off.
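For readers who like to see the idea in code, here’s a hypothetical sketch of the demand-density calculation. My actual model uses its own data sources and weightings, which aren’t reproduced here; the function simply captures the ratio described above, normalized so the US average works out to 1.0.

```python
# A hypothetical illustration only; the real model's inputs and weights differ.
def demand_density(revenue_potential_per_sq_mile: float,
                   cost_to_serve_per_sq_mile: float,
                   us_average_ratio: float = 1.0) -> float:
    """Revenue potential per unit area divided by the likely cost of serving
    that area, normalized so the US average comes out at 1.0."""
    return (revenue_potential_per_sq_mile / cost_to_serve_per_sq_mile) / us_average_ratio

# An urban enclave might score five to fifteen times the national average,
# which (as discussed below) is the sort of area where FTTH pays off.
```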

Demand density is complicated, though. Saying that Verizon has seven times the demand density of AT&T is a bit of an oversimplification. What’s measured, in a sense, by these broad-area numbers is the ratio of territory that can be considered urban, suburban, and rural. Places with high demand density have more urban/suburban areas, and those with lower values have more rural areas. As time passed and we left “regional” deployment of fiber behind, we entered a period where operators (including AT&T) were thinking of their own infrastructure in terms of those three socio-demographic divisions.

While the territory of a “wireline” operator might have a demand density of 1.0 (the value I normalize to as the average across the US), there will be enclaves within the territory where demand densities are five, ten, or even fifteen times that. Those territories could be served with FTTH quite nicely, and so we’re seeing broad-scale operators like AT&T doing more fiber, not as a path toward making fiber universal but for those areas where local demand density will justify it.

We’re also seeing smaller “rural” telcos and players like Google, and even city, county, or state/province governments, getting into the fiber-based broadband game where no major player is prepared to take the necessary steps and risks. Again, this is often played out as an indication that fiber will become universal, but that is simply not the case. Absent subsidies, at least half the locations within low-demand-density territories are forever beyond fiber broadband. AT&T has proved that out by saying that as it retires copper, it will be relying to a degree on fiber (a limited degree, because most of where they can profitably deploy it is already on the plan) and the rest on FWA, which is likely to be 5G in either its cellular form or as millimeter-wave fiber-to-the-node (FTTN).

Generalizations can be dangerous, but my model and data suggest that cities and towns with populations as low as 20,000 can be served by fiber. 5G/FTTN seems to work for all towns and for the average suburban area, where developments have lot sizes smaller than about an acre, and for the majority even where household density is as low as one per three acres. For more rural areas, the most effective strategy is to use mobile-network-centric 5G, which will serve areas with household densities as low as one per thirty acres or more, depending on the prevalence of highways where cellular services tend to be justified by transit traffic.

Well over half the territory of some states/provinces and even some countries will not support commercially profitable wireline broadband at rates of 20 Mbps or higher. For these areas, satellite broadband is the only commercial option likely to be viable. Subsidies to provide better terrestrial service to everyone are unlikely to pass public-policy and political muster.
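Taken together, the rules of thumb in the last two paragraphs could be reduced to a simple decision sketch. The thresholds are the approximate figures quoted above and are purely illustrative, not a substitute for real demand-density modeling.

```python
# A hedged codification of the rules of thumb above; thresholds are approximate.
def access_technology(town_population: int, households_per_acre: float) -> str:
    if town_population >= 20_000:
        return "FTTH"                       # towns this size can justify fiber
    if households_per_acre >= 1 / 3:
        return "5G/FTTN (millimeter wave)"  # towns and most suburban densities
    if households_per_acre >= 1 / 30:
        return "cellular 5G FWA"            # rural, helped by highway cell traffic
    return "satellite broadband"            # beyond profitable terrestrial service
```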

The net of all of this is that we are not going to see universal fiber, or universal anything, in terms of broadband access. There’s too much local variation in demand density to allow for a single strategy. There will always be a digital divide, just as there will always be a difference between the distance to the nearest store or restaurant depending on whether you’re living in an urban, suburban, or rural area. What we can hope for, and should strive for, is to accept the need to tune broadband technology to the geography and demography of the service areas, to achieve the optimum broadband the situation allows. We’re not going to get to that by touting universal fiber or anything else, only by embracing different strokes for different folks.

Are Cloud Articles and Surveys Just Adding to Cloud Confusion?

Inevitably, the technology news cycle follows a pattern of hype-the-pluses to exaggerate-the-minuses. We have two recent stories that seem to pit one extreme against the other, one that says that the cloud in general and multi-cloud in particular has “hit a wall”, and another that says that “there’s no going back from multicloud”. Can we learn anything by looking at the two in combination? Let’s give it a shot.

I’ve noted in past blogs that the general perception is that everything is moving to the cloud. Another article, this one in ZDnet, says “a recent survey of 300 IT executives by Harvard Business Review Analytic Services, underwritten by Splunk, which finds while at this moment, most organizations still have most of their technology systems in house. But get ready to start bidding farewell to on-premises IT.”

This, according to enterprises I’ve talked with, my modeling, and my own views, is inaccurate. Few data center applications are actually moving to the cloud. Instead, what’s happening is that application modernization is creating front-ends for applications, hosted in the cloud. One of the goals of appmod is the creation of web-based portals through which existing applications can be accessed optimally by a range of people, including customers, partners, and employees. This myth-versus-reality contrast might be a reason why some say that the cloud has hit a wall.

If you look at the data centers today, you don’t find empty halls with ghostly echoes and maybe a little mist (air conditioning?) to add a sense of mystery. It looks pretty much like business as usual, and if you thought that cloud success was about eliminating data centers, you could be forgiven for thinking that the cloud was falling short. The story, in VentureBeat, quotes a research report saying “Seventy-six percent of respondents agreed with that their company is hitting a wall while using their existing programs and tools to achieve cloud objectives” and that only 8% say that they’re certain they’re getting the value they need from the cloud.

This is obviously a different wall to hit than the one the headline suggests. The survey point is that companies don’t believe their current tools, programs, and initiatives are securing everything they want from the cloud. That could well be true, but if so it could be a fault of the tools, etc., or the expectations…or both. Without knowing the details on just who said what (the study included “over 500 senior IT, devops, secops and finops leaders from large enterprises worldwide”) it’s hard to say whether we’re shy of tools or have an excess of expectations.

The specific problem area the article cites is multi-cloud in general, and “visibility” in particular. The ZDnet piece cited a survey that was sponsored by an observability company, too. The theme of this is that we’re held back from totally abandoning data centers by our inability to grasp the state of the multi-cloud.

But how do we reconcile this skepticism with the second article, this one from Protocol, that says that the cloud and multi-cloud are not hitting the wall but are in fact inevitable? It quotes not a survey but “Priyanka Sharma, the executive director of the Cloud Native Computing Foundation; Paul Cormier, Red Hat’s president and CEO; and David Linthicum, chief cloud strategy officer at Deloitte”. That piece is filled with comments like “multi-cloud chooses you” and “the move to multi-cloud is a natural progression”. Maybe we’re destined for the cloud, but doomed to do it badly?

Taken as a group, the articles seem to be really about “multi-cloud” more than the cloud in general. That raises the question of just why enterprises adopt multi-cloud in the first place. I think that the VentureBeat piece, and the survey it cites, view multi-cloud as something that creates a unified “front-end” piece of a hybrid cloud by combining a bunch of public clouds into a single resource pool. Knowing what’s going on in that unified cloud, given the separation of administration, is something that the responding enterprises worried about. I’m not totally comfortable with this, nor with the presumption that the visibility issues are hamstringing enterprise cloud adoption. First, I doubt that there’s any significant number of enterprises who think they have all the tools they need for any technical mission. Second, I don’t think that the phrasing of the questions in the survey is such that the responses can be associated conclusively with multi-cloud at all.

The Protocol piece suggests that the most significant driver of multi-cloud isn’t the desire to avoid cloud lock-in, but the fact that some applications are better supported on one particular cloud than another. If you have a range of applications, you’d naturally end up with a range of clouds. Hence, the “multi-cloud chooses you” tag line. That some cloud users pick multiple providers because of differences in how applications are supported on each is true, but my enterprise contacts still suggest that backup is an important consideration.

Here’s the key point. If multi-cloud is driven mostly by specialized selection of cloud providers based on application support, there would be little chance that the clouds would operate as a unified pool of resources, and less chance that cohesive monitoring (“visibility”) would be a requirement. Where cohesion of visibility is important is where resources across the clouds are treated as cohesive, meaning belonging to a common pool.

Even if we assumed that an enterprise wanted all their public clouds to create a single unified resource pool, though, I contend that this doesn’t present a unique visibility challenge. You can get software from HPE, Red Hat, VMware, and others that can be run in the cloud or in the data center, software that’s already running production applications in both, in fact.

There is a risk to the cloud reflected in these articles, but it’s not the risk the articles are promoting. The risk is the articles themselves, the presumption they make about the cloud and its growth. The actual mission of the cloud, whether “cloud”, “hybrid cloud”, “multi-cloud”, or some strange hybrid, is obvious based on current activity. We use the cloud to build a user experience, not to process transactions, run business reports, and so forth. Cloud providers and cloud software vendors may well believe that broader cloud use would benefit them, and they’d be right. That’s not the question; the question is whether it would benefit the buyers.

Admitting to the real mission, focusing development and conferences and research on the actual mission, could not only improve the cloud’s response to that mission, but also extend cloud usage to other areas with similar technical needs. Metaverse, IoT, edge computing—all these things are real-time activities, and so is the user experience. Why not try to push for cloud progress where progress could obviously be made?

The cloud has enjoyed almost boundless support in the tech media for decades, and while simple PR support can’t substitute for a business case, it’s certainly encouraged tech planners to look for one. If we set expectations for the cloud that are totally unrealistic, and if we then start planners on a quest to somehow meet those expectations by taking steps that will end up failing to do the impossible, will we not hurt the cloud of the future?

Could a “Digital Twin” Model of a Network Help with NMS/NOC?

The hierarchy/intent modeling approach I’ve blogged about, similar to the TMF SID, seems to serve the mission of service management automation well. It also seems possible to use a similar modeling technique to represent real-world “digital twin” relationships, and (finally) it seems possible to use a digital-twinning approach that represents features rather than real-world elements to implement service management. That sums up where we’ve gone with modeling so far. Now, I want to conclude the series by looking at the application of digital twin modeling to network management.

Classical network management, meaning the OSI model of network management, presumes we have three layers. At the top is “service management”, which we’ve aligned with the hierarchy/intent modeling approach. Next we have “network management”, which is aimed at managing systems of devices. I’d contend that this is something that could be served by the hierarchy/intent approach or by digital twin modeling of the administrative interfaces that represent those systems of devices. It’s the bottom layer, “element management” that we have to look more closely at.

A network, or a data center, is really two interdependent things at once, as the lower two layers of the OSI management model suggest. One thing is a system of real devices (routers in the case of networks, servers and switches in the case of data centers, for example) and the other is a cooperative collection of resources. If we’re representing the real world, which is the case for the lower element management layer, then we’re implying a digital twin model. If we’re representing the network management layer, we have a choice of using a digital twin or a hierarchy/intent approach. Given that, I’m going to break from my tradition of top-down and start with the bottom or element management layer.

If you can model a metaverse or IoT with a digital-twin model, you could surely model a network. Each device would be represented by an object that had interfaces, properties, parameters, and so forth, and the relationship between objects would be represented by interface bindings that would map to the trunk connections. There’s no question that you could use the model to query device status, change parameters, and so forth. The question is whether that would be valuable, neutral, or risky, and the answer to that depends on just what the “devices” are and whether they are members of a higher-level cooperative group like “a network”.
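A minimal sketch of what such a model might look like follows. The class and attribute names are my own illustrative assumptions, not a reference to any real digital-twin product; devices become objects with interfaces and parameters, and interface-to-interface bindings stand in for the physical trunks.

```python
# A hedged sketch of a network digital twin; names and structure are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Interface:
    name: str
    status: str = "up"          # mirrored from the real device's management API

@dataclass
class DeviceTwin:
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)
    interfaces: Dict[str, Interface] = field(default_factory=dict)

    def query_status(self) -> Dict[str, str]:
        return {ifc.name: ifc.status for ifc in self.interfaces.values()}

@dataclass
class NetworkTwin:
    devices: Dict[str, DeviceTwin] = field(default_factory=dict)
    # each binding is ((device, interface), (device, interface)) == one trunk
    bindings: List[Tuple[Tuple[str, str], Tuple[str, str]]] = field(default_factory=list)

    def bind(self, a: Tuple[str, str], b: Tuple[str, str]) -> None:
        self.bindings.append((a, b))

net = NetworkTwin()
net.devices["core-1"] = DeviceTwin("core-1", {"os-version": "x.y"}, {"ge0": Interface("ge0")})
net.devices["edge-7"] = DeviceTwin("edge-7", interfaces={"ge3": Interface("ge3")})
net.bind(("core-1", "ge0"), ("edge-7", "ge3"))   # the twin of a real trunk
print(net.devices["core-1"].query_status())
```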

Routers exhibit “adaptive” behavior, as do most switches. They interact with each other to define the way that “the network” behaves, and while they assert management APIs and report status, their normal operation is largely autonomous. In a sense, “element management” for the devices is a management process that parallels or repairs normal operations. However, MPLS supports explicit routing, and a digital-twin model of the router network could be helpful there, and nearly all network operators use MPLS.

If we were to shift focus from “router” to “SDN switch”, it’s a completely different story. An SDN switch does not have adaptive behavior; it depends on a central management process to provide it with operating data, including the routing tables used in “normal” operation. There’s a presumption of a central controller, and since that controller manages the routes, it would be logical to assume that knowing the state of the devices and trunks would be helpful.

So far, IMHO, we can then say that for SDN devices and router networks that support explicit routing, digital-twin modeling could be valuable. For (largely enterprise) router networks without MPLS explicit routing, then, the question would be whether it would be useful or risky.

The risk in a digital-twin model to control a router network is, ironically, that it would encourage something to exercise control over a network behavior that’s supposed to be adaptive. There is an easy solution to this, though; don’t provide a mechanism to alter things that are supposed to be adaptive. Yes, it would be possible to indirectly alter adaptive behavior by, for example, disabling an interface or the device, but that risk exists with any management interface that exposes those capabilities, directly or indirectly. We can discount any incremental risk, then.

But lack of risk doesn’t constitute a benefit. We can assume that the network, at least, would have a management system that supported device alarms, so having the ability to generate those alarms off a digital-twin model doesn’t add much, if anything. Can we identify anything interesting we could do with the digital-twin model? Yes, but not much.

The most obvious possible benefit of a digital-twin model is to provide an abstraction layer that could support a mixture of SDN and adaptive routing within a single network administration. Digital twinning, as I’ve said, would be a natural partner for an SDN controller, since that controller is responsible for route management in the network. SDN is likely to have to phase in or be a part of a network, rather than to be everything, as Google’s Andromeda illustrates. We could see “virtual” elements of the digital-twin model representing router enclaves, and “real” elements representing the SDN switches.
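One way to picture that mixture is sketched below. The base class and method names are illustrative assumptions only: adaptive router enclaves appear as “virtual” elements that report state but never accept route changes, while SDN switches appear as “real” twins whose forwarding entries the controller owns.

```python
# A hedged sketch of the mixed SDN/adaptive-routing model described above.
from abc import ABC, abstractmethod
from typing import Dict

class AbstractElement(ABC):
    """Common abstraction a controller or management process could work through."""
    @abstractmethod
    def status(self) -> str: ...

class RouterEnclaveElement(AbstractElement):
    """Virtual element: the enclave routes adaptively, so the twin reports
    state but never tries to set routes."""
    def __init__(self, reachable_prefixes: Dict[str, str]):
        self.reachable_prefixes = reachable_prefixes   # learned, not imposed
    def status(self) -> str:
        return "operational" if self.reachable_prefixes else "isolated"

class SdnSwitchElement(AbstractElement):
    """Real element: forwarding entries are pushed into the twin (and the
    device) by the controller, because the switch has no adaptive behavior."""
    def __init__(self):
        self.flow_table: Dict[str, str] = {}
    def install_route(self, prefix: str, port: str) -> None:
        self.flow_table[prefix] = port
    def status(self) -> str:
        return "operational" if self.flow_table else "unprogrammed"
```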

A related benefit would be the creation of an abstraction layer for EMS/NMS processes to work through. If we think of a router network as being a resource enclave, similar to a cloud resource pool, we know from experience with the latter that “virtualization” would normally involve creating an abstraction (a “virtual machine” is one in the cloud/server space) that would then be mapped to the underlying resources. Could we view this as the bottom layer of service lifecycle management? In any event, a standard abstraction layer could allow a single NMS toolkit to work with every vendor, every device.

The final benefit, I think, could depend on the extent to which the first two benefits are considered significant. If we did deploy a digital-twin model as an abstraction layer, might we then unify the “network” management of mixed router/virtual-function networks? I think that would be an almost-inevitable outcome if we actually thought about network-resource abstraction through digital twinning, but I’m not sure whether the notion would arise based on this point alone. NFV has failed to gain broad traction inside a network, and only limited traction (via uCPE) even at the edge. 5G function hosting may well be too localized to present much of a challenge in mixed-network operations.

I think that the role of a digital-twin model in NMS could be justified in theory, but may be difficult to develop in practice. Multi-vendor abstraction missions are never popular with vendors who want to be the only player in a given network. SDN, despite its proven applications in the core network (by Google) hasn’t advanced to broadly support a core mission yet, and it may do so only if we end up with metro-mesh metaverse networking down the line. Networks today are massive sunk costs, and buyers are as reluctant to threaten them with new ideas as the vendors are to open them up. We’ll have to wait to see whether other forces move the needle on this topic!

Should Operators Fear the Metaverse, Embrace It, or Both?

Would the metaverse be good for operators? That question was touched on in a Light Reading piece on MWC, which pointed out (correctly) that the show was surprisingly short on “mobile” news. It was bigger, at least, on metaverse, which prompted the author to comment “Nobody could explain why the metaverse is a good thing for operators, though. If it happens (debatable), it could throw up a tidal wave of data that soon overwhelms today’s networks.”

Let’s forget the assessment that the metaverse is debatable and focus on the two other points. First, that nobody could explain why the metaverse would be good for operators, and second that it might create a tidal wave of data that would overwhelm networks. I think that the two points are related, meaning that there’s an underlying presumption on which both are based. That presumption is that the only network impact of the metaverse would be to increase traffic, and that’s one of those “both-true-and-false” points.

The “true” side is that it would indeed be likely that the metaverse would increase user appetite for bandwidth. It would surely take less network bandwidth to make a social media comment than to have two avatars successfully shake hands. But this is a modest truth.

One reason is that operators are trying to convince users that they need higher broadband capacity. A hundred meg? Peanuts. A gig? Table stakes; if you want street cred you need at least two. Why? 8K video wouldn’t consume anything like that, and there are precious few 8K sources out there. Online gaming is probably very similar to the metaverse in terms of bandwidth appetite, and operator networks aren’t collapsing under it, nor are operators trying to convince users to hold back on network upgrades.

Another reason is that until we know how the tasks associated with creating a metaverse and projecting it to users are broken down and distributed, we can’t hope to know what traffic would be associated with the process. Even the nature of the metaverse is still open. Is it going to be a fully realistic artificial reality, like a good online game, or is it going to be something more like human-synchronized Pac-Man that mimics our movements as we “run” through a maze? Where will the structure of what I called in previous blogs a “locale” be hosted, and where will visualizations based on that structure be derived? You get the picture.

The final reason is that even with our limited knowledge of the architecture of a metaverse application, it’s pretty clear that access bandwidth is the least of the technical issues to consider. It may even be the least of the network issues, which means that the impact on capacity as we know it (viewed from the user side of the access connection) is actually a minimal concern.

What, then, is the issue, positive or negative or even null, for operators? The answer is “super-disintermediation”.

I do not believe that the metaverse will drive new investment or competitors in broadband access. That’s an area of the network where nobody other than a regulated monopoly or utility is going to venture. What it would almost certainly do is promote the dispersal of cloud hosting closer to the edge, primarily in major metro areas (there are between 50 and 100 such areas in the US and about five times that number worldwide). This investment is developing “edge computing” resources within which elements of a metaverse could be hosted.

Right now, network operators have a major financial advantage in this early deployment of the edge. First, they have the 5G hosting application that could justify the early spending, when metaversing hasn’t developed and can’t fund things. Second, they have real estate in the right places. Third, they have a low internal rate of return, which means that they can invest in projects with low ROIs and still sustain their overall financial health. These are major benefits, but while they would facilitate operator entry, they don’t guarantee it, and even players without these benefits could be induced to enter the space if the operators don’t make a move.

The metaverse, in the perception of the public, is a social-media concept. Facebook, facing market saturation and revenue stagnation in traditional social media, rebranded itself to “Meta”, proof that they believe strongly that the metaverse is the next level of social media. If Meta makes a strong bet on the metaverse (how could they not?) then we should expect to see supporting technologies boosted quickly. One such technology is edge hosting. Could Meta establish data centers in 50 or 100 cities in the US? Could their business induce public cloud providers to do that (they’re at least half-way there already)?

Network operators have known about the “carrier cloud” opportunity for at least a decade, maybe two. They’ve not been willing to capitalize it because of the combination of a high “first cost” and the fact that carrier cloud takes them into a new service area, one which they’ve been reluctant to target. The fact that many operators are looking to the public cloud providers for 5G hosting strongly suggests that they’re not going to move on the edge on their own, and that could disqualify them from the edge computing opportunity, which is bad.

What’s worse is the collateral risk of losing the “metro mesh” opportunity. If the metaverse expands/evolves into a community that’s very geographically diverse, many of the virtual places where people interact will draw from areas outside a single metro. To preserve a realistic experience, those distant people will have to be connected with fairly low latency, and that means that there’s an opportunity to create an optical mesh that would connect metro edge sites. Once deployed, that could also carry other traffic, and that’s the big risk.

Would large-footprint, low-latency, IP connectivity be a revenue generator? It would surely tap into the revenue of metaverse providers like Meta itself. It would also likely support IoT applications. But the big benefit to operators is that if they developed this themselves, others would be unlikely to do that because of the cost and ROI. That would keep public cloud providers, Meta, and others from getting into the WAN business, and that’s surely something operators should fear greatly.

There’s no risk of a tidal wave of metaverse data overwhelming networks. There’s a big risk of a metaverse network (built by a cloud provider or even Meta) overwhelming traditional IP networks and services. That’s what operators need to be thinking about right now.

Is a Unified Model for Lifecycles and Real-Time Processes Possible?

In the last two blogs on modeling, I reviewed “service modeling” and “virtual-application modeling” and determined that the digital-twin approach wasn’t optimum. I then reviewed the metaverse and IoT applications’ use of modeling, and determined that the hierarchy/intent approach wasn’t optimal. This would seem to argue that there are two missions and two models, and never the twain shall meet. But is that true? That’s what we’ll address in this blog, and while I said this would be a three-blog series, it’s going to need a fourth.

In the first blog of this series, I developed the view that the hierarchy/intent model, the one similar to the TMF’s SID, was well-suited for lifecycle management of service or application elements. It did not address the functional flow of the service or application, but the management of the application’s components.

In the second blog, which focused on “digital twin” modeling, the view I expressed was that this model did essentially the opposite; it focused on the functional flow rather than on component management. That makes it well-suited for both social metaverse and IoT missions.

If we start with these assumptions, then the first question we’d have to ask is whether the “natural fit” model for one of the missions could be used effectively for the other. The second would be whether there was a mingling of missions that might justify a model with features of both digital-twin and hierarchy/intent. So, let’s ask.

Let’s start with the first question. I do not think that a hierarchy/intent model, which is focused on lifecycle management, could make a useful contribution to modeling the functional flow of social metaverse or IoT. Yes, it could manage the deployment of the elements, but since the lifecycle management of something that’s a functional flow has to look at the functions, I do not believe that the hierarchy/intent model is adequate. I think it’s clear that those applications mandate functional modeling, and that functional modeling mandates a digital-twin approach because the model has to drive things that link to the real world.

Could the digital-twin model be effective in lifecycle management? That turns out to be complicated. If we presumed that what we wanted was the kind of successive decomposition that the hierarchy/intent approach offers, what would be required would be an initial model of “features”, which we could visualize (using the example from the first blog) as a circle representing “VPN Service” and a series of connected circles representing “Access”. What would then be required would be the successive “decomposition” of these circles, as we did with the other model.

Where the “complicated” arises is if we assume that what we have at any point represents something explicit in the real world. If we let this decomposition run rampant down to the device level, we’ve created a model that is not only totally brittle (any change in network routing would require it be redone), it would also be non-representative, since the structure of a network should never be visible below the level where you’re applying control. If you don’t provision individual routers to set up a service (that would be done, to the extent it’s done at all, by the management system), you shouldn’t try to portray individual routers in the model. That means that logically, the “digital twin” you’d be creating would be a twin of the relationship between the administrative elements and not of the devices.

It would also be true that there would be elements in the model, like the “VPN Service” element, that wouldn’t be a digital twin of anything. In a sense, these would be the equivalent of “locales” in my social-metaverse-and-IoT model. You’d decompose a “locale” as a container of lower-level things, and through successive decompositions along that route, you’d end up with an administrative element that would actually be a digital twin.

We’re not done with the complication, though. The digital twin model, recall, models functional flows in its primary mission. These digital twins don’t do that at all, they model only administrative elements, meaning “control flows”. This is a fundamental difference on the surface, but fundamentals are in the eye of the beholder here; we’re talking about what happens in the metaverse/IoT mission, and the question isn’t whether the missions are the same but whether the modeling strategy would work.

I think it would, but let’s see if we can prove it. The “VPN Service” object in the hierarchy/intent model, when it’s triggered by an event (“Service Order” for example), would take the order parameters and select a decomposition that matched the requirements, based on a policy. That would cause the model for that decomposition option to be instantiated, and the objects in it would receive “Service Order” events. Thus, we can visualize this as a state/event-driven process, which I’ve always suggested we do. There is no reason why a VPN Service object in a digital-twin model couldn’t do the same thing, providing that it had state/event capability.
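Here’s a small sketch of that decomposition behavior, continuing the state/event illustration from earlier in this series. The policies, object names, and event format are illustrative assumptions, not a real implementation.

```python
# A minimal sketch of policy-driven decomposition on a "ServiceOrder" event.
class ModelObject:
    def __init__(self, name, decompositions=None):
        self.name = name
        self.state = "ORDERED"
        # each decomposition option: (policy predicate, list of child factories)
        self.decompositions = decompositions or []
        self.children = []

    def handle(self, event, params):
        if self.state == "ORDERED" and event == "ServiceOrder":
            for policy, child_factories in self.decompositions:
                if policy(params):                    # pick the matching option
                    self.children = [make() for make in child_factories]
                    break
            for child in self.children:               # propagate the same event
                child.handle(event, params)
            self.state = "ACTIVATING"

vpn = ModelObject("VPN Service", decompositions=[
    (lambda p: p.get("sla") == "premium",
     [lambda: ModelObject("MPLS-Core"), lambda: ModelObject("Access")]),
    (lambda p: True,                                  # default option
     [lambda: ModelObject("SD-WAN-Core"), lambda: ModelObject("Access")]),
])
vpn.handle("ServiceOrder", {"sla": "premium"})
print([c.name for c in vpn.children])                 # ['MPLS-Core', 'Access']
```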

But can we visualize metaverse or IoT models in state/event terms? If we can, then a lot of the logic associated with the two missions would be represented in a common model, using different state/event tables and activated process sets. If not, then the baseline software for a metaverse or IoT model would have to be different.

If we’re modeling the functionality of a virtual world, the key “event” is the passage of time. We can imagine that there exists a “metaverse clock” that’s ticking off intervals. When an avatar receives a “tick event”, it would apply whatever behavior policies were associated with that event. If a locale received a tick event, it would update the three-dimensional model of the space, the model from which every viewpoint would be generated. We could draw some inferences on how the avatar would manage behavior policies (active behaviors would each be “sub-objects” just like a decomposition subordinate in the hierarchy/intent model) and they would receive tick events from the parent, which would allow them to change the avatar’s three-dimensional representation appropriately.

We could also optimize this approach a bit. Let’s assume that there’s nothing going on in a room. It doesn’t hurt to “tick” it, but if you assume the room contains a bunch of avatars, the room doesn’t “know” whether those avatars have something pending or not. But suppose that the avatars in a locale “registered” with the room that they had an action pending. We could “tick” the room and have the room pass along the “tick” to registered avatars, which would then allow them to trigger their behaviors, and then the room could remake its model of the space for visualization. The avatars would signal the room through events, in order to register for a tick. I’m not trying to write an entire application here, just illustrate that we can make both processes work as state/event-driven as long as we define states, events, and intersecting processes effectively.
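In the same spirit, a sketch of that registration optimization might look like the following; the class and method names are illustrative, not taken from any real metaverse framework.

```python
# A sketch of "tick with registration": only avatars with pending actions get ticked.
class Avatar:
    def __init__(self, name, locale):
        self.name = name
        self.locale = locale
        self.pending = []                 # behaviors waiting for the next tick

    def queue_behavior(self, behavior):
        self.pending.append(behavior)
        self.locale.register(self)        # tell the room we need the next tick

    def on_tick(self):
        for behavior in self.pending:     # update this avatar's representation
            behavior(self)
        self.pending.clear()

class Locale:
    def __init__(self):
        self.registered = set()

    def register(self, avatar):
        self.registered.add(avatar)

    def on_tick(self):
        for avatar in list(self.registered):   # only avatars with pending actions
            avatar.on_tick()
        self.registered.clear()
        self.rebuild_space_model()             # remake the model every viewpoint uses

    def rebuild_space_model(self):
        pass                                   # placeholder for the 3D rebuild

room = Locale()
alice = Avatar("alice", room)
alice.queue_behavior(lambda a: print(f"{a.name} waves"))
room.on_tick()                                 # the "metaverse clock" drives everything
```

The point is simply that both the lifecycle and the metaverse missions reduce to objects reacting to events, which is what makes a common model framework plausible.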

OK, we’ve reached a conclusion, which is that a common software and model framework could be used to describe both a service management application and a metaverse/IoT application. You may, in the course of digging through all of this in a three-blog series, have lost track of why you care, so let’s end by making that point.

Service management, application management, social metaverse, and IoT are the primary credible drivers of edge computing. Right now, we tend to think of edge computing as differing from cloud computing only by where the hosting happens to be done. I submit that the majority of cloud applications today are really built on a “platform-as-a-service” framework of web services, designed to simplify development, manage resources effectively, and optimize the experience overall. If all credible edge applications could be built on a common model and baseline software architecture, that combination could become the PaaS of the edge. If we somehow came up with a standard for it, and required that for all edge hosting, we could create an edge computing framework that allowed free migration of application components. That’s a pretty worthy goal, and I think it could be achieved if we work at it.

That seems like a conclusion right there, so why blog number four? The answer is that the digital twin model might have a more specific role, with more specific benefits, in network management applications. The difference between service and network management is that the latter is the management of real devices. An operator would normally have to at least consider having both. So for my last blog in this series, I’ll explore whether a digital twin model would be a benefit for network management missions.

Why We’re Entering the Age of Managed Services

Many of you know that I use a survey-driven demand model to forecast stuff, and recently I decided to run the model on one of my favorite topics, managed services. The model forecast significant growth over the next five years, peaking in 2025 at an annual growth rate of over 50%. That’s sure interesting, so I wanted to talk about what’s going on now, and why my model thinks the future for managed services is so bright.

From the first days of corporate networks, it was an axiom that the total cost of network ownership (TCO) was half capex and half opex, which is another way of saying that running your network costs as much as buying it. You might expect, given this, that improvements in network management tools would have impacted that ratio of TCO components, but that’s not been the case according to enterprises. What has helped over the last two decades has been the shift to VPNs versus router-and-trunk network-building.

What has hurt netops cost control, interestingly, is security. Two-thirds of enterprises I talk with tell me that they spend more on security management than on the rest of network management. A large part of this is due to the fact that security involves so many different functions and layers, and it’s generally true that the cost of managing a complex system tracks its complexity, meaning the number of elements you have to manage.

Another large part of security cost is the support of remote sites, including WFH. Over the last decade in particular, companies have worked hard to expand their footprint, and doing that has meant adding sites and network connections to places where there is no support staff and in most cases, no qualified technical resources at all. Internationally, on the average, over two-thirds of sites have nobody present who has what the company considers “adequate technical literacy”. Those locations not only can’t participate in security management, they can’t participate in network operations management, and in many cases can’t participate in remediation even if they’re talked through it—which can be an issue in itself for far-flung locations.

One big multinational company gave me some great input on this. They have over two-thirds of their sites outside the US, and a third are in small countries with limited technologists in the labor pool. Their first initiative to address this was to improve their operations tools, but they found that well over 80% of their problems created a loss of connectivity and so the tools couldn’t see what was wrong because something was wrong. They next worked to pick someone suitable, usually the manager of a site, and train them as technologists to provide local network/security support. In almost every case, the person they trained was gone within a year, hired away by someone else. They finally solved the problem with managed services.

This doesn’t mean that managed services are without their issues. The biggest problem for the concept has always been cost. In order to offer managed services, the service TCO has to be little more (at worst) than the perceived TCO of the self-managed option. Even where there are issues that make self-managed network operations difficult, enterprises still balk as the TCO rises toward that magic two-times-network-cost number. Managed service providers need management economy of scale, meaning that they need the cost of managing their customer base, calculated on a per-customer basis, to be better than the customer could achieve individually.

A surprisingly large number of network operations and security operations tools don’t consider management economy of scale at all. Generally speaking, management efforts are focused on responding to events. Once the number of events reaches the level where you can justify a 24×7 workforce commitment, more events will just mean more work, and more workers (you can review queuing theory and the classic bank-teller examples for details).

The key to effective bank teller operations is managing service time, and the same thing is true with managed services and economies of scale. If the human time needed to handle an event is reduced, then the number of events that a given number of people can handle per unit time is increased. For a managed service provider (MSP), this is critical in containing the price of the service, and thus supporting the rate of adoption, while sustaining reasonable profit margins. The goal of MSP tools should be to reduce human effort as much as possible.
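A back-of-the-envelope illustration (simple offered-load arithmetic, not a full queuing model, and the numbers are made up) shows why service time is the economy-of-scale lever:

```python
# Illustrative only: staffing needed scales directly with human time per event.
def staff_needed(events_per_hour: float, minutes_per_event: float,
                 target_utilization: float = 0.8) -> float:
    """Offered load in 'agent-hours per hour', padded so agents aren't run at
    100% (which queuing theory says would explode wait times)."""
    offered_load = events_per_hour * (minutes_per_event / 60.0)
    return offered_load / target_utilization

print(staff_needed(120, 15))   # ~37.5 agents at 15 minutes per event
print(staff_needed(120, 5))    # ~12.5 agents if tooling cuts that to 5 minutes
```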

According to MSPs themselves, there are two specific things they’d want their tools to provide. First, they want to be able to deal with an event with the smallest possible human intervention, without sacrificing accurate handling and low rate of error. Second, they want to be able to spot an impending problem and either deal with it before it becomes a failure, or at least get as much information on it as possible before they lose contact with their remote site.

Artificial intelligence and machine learning (AI/ML) are obvious technology strategies to provide what MSPs want. It’s hard for me to get specific model data on the application of AI/ML to MSPs because my universe of MSP contacts is small and the percentage that use AI/ML is also small. However, I do have some data on the attitudes of MSPs, and enough to be able to support at least minimal modeling and forecasting.

MSPs almost universally agree that AI/ML, if properly implemented, would be of “significant” or even “critical” value to their business case. That’s the best response of MSPs to any proposed technology improvement. The average expected improvement in MSP pricing/profit and sales is roughly 33 percent.

Only about a third of MSPs say that they have had a viable AI/ML strategy presented to them. The number one reason is that MSP services are almost always linked to specific edge devices, and most of those devices come from vendors with no effective AI/ML strategy. Even where the vendor involved has a good AI/ML strategy, there’s only a bit less than a two-thirds chance that it will be presented effectively. The MSPs say that the vendor’s salespeople are likely to push hardware features and benefits explicitly, and present AI/ML benefits superficially. Where the benefits are provided, they often fail to consider the MSP’s specific role; the presentation is almost entirely enterprise-centric.

Another issue MSPs cite with how vendors view service/security management tools in general, and AI/ML in particular, is scope. The reason why MSPs are valuable in remote sites is that those sites lack technically qualified people. A corollary to that point is that the people in these sites are unlikely to recognize the source of a network failure and identify who should be called. Every single MSP I’ve chatted with says that they get a “significant” number of calls relating to problems that have nothing to do with the service they provide. In fact, connectivity issues on premises, which include things like unplugged cables, devices that are turned off, or bad configurations on local devices or systems, are often the number one source of calls. This is a problem for MSPs because they’re expending management resources on something that they’re neither responsible for nor getting paid for.

Most MSPs would like to offer local-network management services. They say that not only would it be a profit center, it would also give them a come-back to relentless reports of problems that are outside the scope of their SLA with the customer. “We can’t see the conditions here, but with our whiz-bang Local Network Guru service, we can diagnose issues right down to the computer.” Make a profit from selling what you’re now giving away at a loss.

Let’s now circle back to the model data. The model is saying that support issues in remote sites already encourage managed services where the sites are truly remote, meaning that local technical skill is limited. Over time, the model predicts that network and security complexity will grow, and that the growth will gradually increase support problems even in less-remote locations. By 2025, over two-thirds of all remote sites will be considered “difficult” to support, and it’s this factor that multiplies the interest in managed services.

The model data suggests that while there are two obvious sources of managed services (MSPs and the network operators themselves) the fact that managed service interest tends to start in “thin” locations means that MSPs have an opportunity to jump in and gain credibility when the network operators who are local to the enterprises are unable to offer service in those thin locations. As interest in managed services shifts to more and more remote sites, the MSP offers the path of least disruption in adding the sites, which means they could have an enduring role even in major-market areas.

The model also says that local-network management add-ons to the MSP relationship are likely to follow the same thin-to-thicker path of evolution. Companies who experience the benefits of local network management in areas where staff technical skills are minimal are likely to extend those services to “thicker” sites as they’re added.

Another important consideration in managed services is that the number one driver of MSP interest is SD-WAN, for the obvious reason that SD-WAN is becoming the VPN strategy du jour for remote sites, even when those sites are in major market areas. The model says that an effective managed service story, combined with a good SD-WAN offering, could create a compelling offering in almost any market area. For the network operators, reluctance to get into the SD-WAN and managed services game risks being disintermediated by early buyer interest that they can’t or won’t address. Getting those early MSP commitments reversed will then be difficult.

The final point in our view of the managed service future comes from worker demographics. We are facing an explosion in the demand for skilled network operations people worldwide, and there is little chance that the workforce can meet that demand. “Management economy of scale” is important, as I’ve noted, but not just to lower overall cost, but also to conserve and optimize a resource that will be more important every year, and proportionally less available. Most people today can’t fix their own cars, and in many areas we’re already at the point where most enterprises can’t fix their own networks. More and more will face that problem, and as they do we can expect managed services to prosper.

Lessons from Dell in 5G and MWC

Server vendors have both special benefits and special challenges in the battle for telco infrastructure, the “telco cloud” or “carrier cloud”. On the plus side, any cloud requires servers, and having a seat at that table gives a vendor more leverage in deals, and may also help keep competitors out. On the minus side, the solution can’t be hardware alone, so server vendors will have to partner, and partnering means integration and the risk of introducing new competitors. Dell is confronting both risks with its previously announced “Telecom Systems Business”, and related and expanded offerings announced at MWC.

Like most telco players these days, Dell needs to pay homage at the 5G shrine. Telcos are committed to 5G, and at least some feature hosting is written into the 5G specifications. 5G is not only budgeted, it’s deploying, and that means it’s an active opportunity. Best of all, the open 5G initiatives are attracting a lot of interest from the mobile operators, and those initiatives give vendors outside the usual mobile suppliers a shot at future deals.

All open-model 5G demands integration by its nature. While telcos want open approaches in general, they seem to want somebody to do the heavy lifting in initial deployment integration, and to stand behind the approach to end finger-pointing and assure continued openness. Dell reflects this with a very explicit commitment to being the integrator in its offering, the giant behind the concepts overall. That leverages its server position because hardware is surely the biggest cost in the project, and the biggest seller is usually the most credible integrator.

Integration isn’t Dell’s only challenge here. Most of the mobile operators are on the fence with respect to 5G hosting, and in several dimensions. First, they’re uncertain whether they want to capitalize telco cloud hardware or simply ride on one or more public clouds. Second, they’re uncertain whether they want to rely on one of their familiar mobile partners (Ericsson and Nokia, notably, and Huawei where there’s no public policy barrier) or look for new players, and finally they’re uncertain whether they should favor a network vendor or a server/platform vendor. Dell is trying to thread all these needles at once.

Dell has four elements to its telco strategy, according to their website. They are Telecom Multi-Cloud Foundation, Open RAN, 5G Converged Core, and Services Edge. The first references Red Hat (OpenShift and OpenStack), VMware (Telecom Cloud Platform), and Wind River (Wind River Studio). The second adds VMware and Mavenir Open RAN and Altiostar, NEC, Netcracker and Red Hat as partners. 5G Converged Core doesn’t reference a specific configuration partner set, and Services Edge currently lists Red Hat OpenShift and Intel SmartEdge as elements.

It would be fair, I think, to characterize Dell’s telco stuff as being responsive/reactive rather than evangelistic. Their website doesn’t try to sell 5G hosting or edge computing as much as present strategies to prospects who already have a specific interest. That, I think, is a strategy consistent with marketing to the telcos, and one likely to be effective if the telco prospect is looking for 5G carrier cloud or perhaps even public cloud, but I wonder whether it’s the optimum strategy for the market as it’s now evolving. I think that current evolution has to address three specific strategic options—the carrier cloud option, the public cloud option, and the metro option.

Dell, as a server vendor, would naturally fit well into the carrier cloud. They’re smart to mention edge computing opportunity in their material, but they are at the same time careful not to sound like they’re dissing the public cloud choice that many operators are making. That makes their positioning of what should be their favored model a little tentative. They could make up for that at the sales level, in particular calling out the risk of lock-in that public-cloud telecom dependence creates, the risk of outages that are beyond their control, and the likely higher price. They don’t do it in their website positioning.

The public cloud option is perhaps even a little more carefully danced around. It’s a bit inherent in the “multi-cloud” platform thinking, and of course the partners Dell identifies in that space are all happy to promote their software either on a server or on a VM or container in a public cloud. I think Dell knows that they can’t really either push or question the public cloud choice given their software partners, and they likely think they need to at least have a fallback position to the hosted carrier cloud option, if the telco prospect doesn’t want to incur the first costs.

That leaves the metro option, and this one has both interesting potential and real risk. The interesting characteristic of metro positioning is that it focuses on the place where the most profound changes are going to be made to telco networks. There are a variety of ways that “metro” could come together, meaning that Dell would have a wider range of options to push at the sales level if they started with a metro strategy.

The risk comes from the fact that metro is also where almost every vendor interested in the network of the future will have to jockey for position. Network vendors, for example, are likely to make a strong play there, as would platform vendors, software vendors, server vendors…you get the picture. To draw an analogy from the wild, metro is a big carcass but it attracts a lot of scavengers.

Dell’s strategy for the telco space is already heavily dependent on partners. Metro, as an option, would demand some partners in the network equipment space. Dell has switches, including its PowerSwitch data center line, but it doesn’t have routers. Most router vendors, and all the major ones, are also switch vendors, so Dell’s portfolio would collide with theirs. Not only that, the major router vendors all have telco strategies of their own, which puts Dell into potential collision with them (and vice versa).

The only answer to this, if Dell wants a metro option, would be to adopt a vendor-neutral router strategy for their metro story. If Dell created a metro model based on abstract router capabilities, they would dodge the need for a specific partnership and also address the fact that most telcos would already have some metro routers in place.

In the end, Dell needs to find a way to sell servers to telcos, of course. That almost surely means they need to make carrier cloud the preferred telco strategy for edge computing, and that will be a tall order. Operators are concerned about making a large carrier cloud investment to support 5G when in truth they don’t have a clue what they could do with the technology in the broader edge mission. Early MWC activity isn’t giving them much good news, either.

Microsoft, at MWC, is promoting its own 5G and MEC offerings for Azure, and many mobile operators see public cloud hosting as an alternative to a lot of 5G edge capex. And while Orange is committed to Oracle’s 5G Core signaling/routing, it’s committing to 5G features from both Ericsson and Nokia. Dell really needs Open RAN validation to be able to ride 5G to carrier cloud, and while there are some open-model 5G successes, it may well be that the lack of a widely accepted edge-computing kicker to 5G carrier cloud will stall momentum.

There’s another problem for Dell behind the technology scenes, which is financial. Their latest quarterly earnings weren’t a big hit; rival HPE did better. 5G, while important to the telcos, isn’t likely to pay off for server vendors like Dell unless it expands radically beyond simple function hosting. The only certain 5G element is the radio technology, and Dell doesn’t make that. So despite the fact that MWC seems determined to push the esoteric 5G benefits (“slicing”, “private 5G”), the reality is that only edge computing on a large scale can make even open-model 5G pay off significantly, and the telcos need leadership, insight, and inspiration if anything beyond function hosting is going to happen.

This is where Dell joins former-subsidiary-now-telco-partner VMware in needing to exercise some marketing/positioning aggression. Any vendor who hopes that the telecom industry is going to catch strategic edge computing fire and warm all vendors is surely in for a sharp disappointment. There’s a lot to win, or lose, in telco cloud, and those who aim to be in the former group need to get with it.

Modeling Digital Twin Systems

If you look at a service, or an application, you see something that’s deployed and managed to support a mission that’s really defined by the elements you’re working with. We can deploy the components of a service by connecting the feature elements, but how they interact with each other to produce the service is really outside the scope of the model. Our hierarchical feature/intent approach is great for that, and we can prove it out by citing the fact that the TMF’s SID, likely the longest-standing of all service models, works that way.
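
To make the hierarchical feature/intent idea concrete, here is a minimal sketch in Python (my own invented names, not the TMF SID itself): each element advertises an intent and delegates the "how" to the features beneath it, so lifecycle actions can be dispatched layer by layer without the model knowing how the features interact to produce the service.

class ServiceFeature:
    def __init__(self, intent, children=None):
        self.intent = intent              # what this element promises to deliver
        self.children = children or []    # lower-level features that realize it

    def decompose(self, depth=0):
        # Walk the hierarchy; lifecycle actions would be dispatched the same way.
        print("  " * depth + self.intent)
        for child in self.children:
            child.decompose(depth + 1)

# A hypothetical service assembled from feature intents.
vpn = ServiceFeature("Business VPN service", [
    ServiceFeature("Access feature", [ServiceFeature("Edge device configuration")]),
    ServiceFeature("Core connectivity feature"),
])
vpn.decompose()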

If you look at a metaverse, a social-media-style application, what you see is an application that has to create a virtual reality, parts of which are digitally twinned with elements in the real world. The same is arguably true for many IoT applications, because what these models are describing is the functionality, not how the functionality is assembled and managed. The difference may be subtle, so an example is in order.

Let’s take the simplest of all metaverse frameworks, a single virtual room in which avatars interact. The room represents a virtual landscape that can be visualized to the users whose avatars are in it, and it also represents a set of constraints on how the avatars can move and behave. If the room has walls (which, after all, is what makes it a room in the first place) the avatars can’t move beyond them, so we need to understand where the avatars are with respect to those walls. There are constraints to movement.

If we start with the visualization dimension, we can say that an avatar “sees” three-dimensional space from a specific point within it. We have all manner of software examples that produce this kind of visualization today; what you need is a structure and a point of view, and the software can show what would be “seen”. Assuming our own avatar was alone, the task of visualizing the room is pretty simple.

The avatars, if this is a realistic/useful virtual world, also have to be able to see each other, and what they see must be appropriate to the direction they’re facing, the structure of the room, and the other avatars that are in the field of vision. The challenge in visualizing the avatars is that they are likely moving and “behaving”, meaning that they may be moving arms, legs, and head, could be carrying something, and so forth.

To meet this challenge, we would have to say that our virtual room, with static elements, includes dynamic elements whose three-dimensional shapes could vary under their own behavioral control. They might be jumping up and down, waving their arms, etc. Those variations would mean that how they looked to us would depend not only on our relative positions, but also on their three-dimensional shape and its own orientation relative to us.
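
As a toy illustration of the point-of-view problem, here is a flat, two-dimensional sketch (names and numbers are mine) of deciding whether one avatar falls within another's field of view, given position and facing direction. A real metaverse would do this in three dimensions with occlusion and rendering, but the dependency on relative position and orientation is the same.

import math

def in_field_of_view(viewer_pos, viewer_heading_deg, target_pos, fov_deg=120):
    # Angle from the viewer to the target, relative to the viewer's heading.
    dx = target_pos[0] - viewer_pos[0]
    dy = target_pos[1] - viewer_pos[1]
    bearing = math.degrees(math.atan2(dy, dx))
    offset = (bearing - viewer_heading_deg + 180) % 360 - 180  # signed difference
    return abs(offset) <= fov_deg / 2

print(in_field_of_view((0, 0), 90, (0, 5)))   # target straight ahead -> True
print(in_field_of_view((0, 0), 90, (0, -5)))  # target directly behind -> False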

Now let’s look at the constraint side. As I noted above, you can’t have avatars walking through walls (and since we’re postulating a one-room metaverse, there’d be nowhere else to go), but you also can’t have them walking through each other, or at least if they try you have to impose some sort of policy that would govern what happened. The key point is that our room sets a backdrop, a static framework. Within it, there are a bunch of avatars whose movement represents what the associated person (or animal, or non-player character) wants to do. When that movement is obstructed (by actually hitting something, or by approaching it close enough that there’s a policy to handle that approach) we have to run a policy. Throughout all of this, we have to represent what’s happening as seen through the eyes of all the avatars.
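
A rough sketch of that constraint side might look like the following; the room boundaries, separation threshold, and policy are all invented, but they show the pattern of checking a proposed move against the walls and the other avatars, and running a policy when the move is blocked.

ROOM = {"min": (0.0, 0.0), "max": (10.0, 10.0)}   # one rectangular room
MIN_SEPARATION = 0.5                               # "too close" threshold

def resolve_move(avatar_id, proposed, positions, on_blocked):
    # positions maps avatar id -> current (x, y) location in the room.
    x, y = proposed
    if not (ROOM["min"][0] <= x <= ROOM["max"][0] and
            ROOM["min"][1] <= y <= ROOM["max"][1]):
        return on_blocked(avatar_id, "wall")       # can't walk through walls
    for other_id, (ox, oy) in positions.items():
        if other_id != avatar_id and ((x - ox) ** 2 + (y - oy) ** 2) ** 0.5 < MIN_SEPARATION:
            return on_blocked(avatar_id, "avatar:" + other_id)
    return proposed                                 # move is allowed

def stop_policy(avatar_id, obstacle):
    # One possible encounter policy: just stop and report.
    print(avatar_id + " blocked by " + obstacle + "; holding position")
    return None

positions = {"alice": (2.0, 2.0), "bob": (2.3, 2.0)}
print(resolve_move("alice", (5.0, 5.0), positions, stop_policy))   # allowed
print(resolve_move("alice", (2.25, 2.0), positions, stop_policy))  # too close to bob
print(resolve_move("alice", (11.0, 2.0), positions, stop_policy))  # hits a wall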

With only us (our own avatar) in our room, we can still exercise all the motions and movements and behaviors, constrained only by the static elements of the room. However, some of those “static” elements might not be truly static; we might have a vase on a table, a mirror on the wall. Interacting with either would have to result in something, which means those objects would effectively be avatars with their own rules, synchronized not with other humans but with software policies.

Put a second avatar in the room, and what changes is that the behavior of one of our “objects” is now controlled outside the software, by the human it represents. That means that we have to be able to apply the control of that human to the behavior of the avatar, in as close to real time as possible. If there is a significant lag in the control loop, the interactions and even views of the now-two-people-involved would be different, and that would make their interactions very unrealistic.

I think that what we have here, then, is the combination of two fairly familiar problems. Problem One is “what would I see from a particular point of view?” and Problem Two is “how would a series of bodies interact if each were moving under their own rules and had their own rules for what happened in an encounter?” For both these things to work as a software application, we need what I think are two model concepts.

The first thing is the notion of our room, which I’ve generalized to call a “locale”. A locale is a virtual place where avatars are found. As is the case with our simple room, it has static elements and avatar elements, the latter representing either human-controlled avatars or software-generated avatars. Thus, a locale is a container of sorts, a virtual place with a set of properties and policies. One of the big benefits of the locale is that it creates a virtual community (of the stuff that’s in it), and so it limits the extent of the virtual world of the metaverse that actually has to be related to a given human member.

The second thing we need is the avatars. They need to be controlled, and in some cases will have to support movement, so they’ll need policies to govern interactions with other things. The issue of latency in the control loop that I noted above will apply to the interface between the avatar and whatever controls it. An avatar would have to be given a defined behavior set, activated by the human it represents or by software. That behavior set would have to include both interaction policies and how the behavior would impact the way the avatar looks (in three dimensions). You’d need the avatar’s position in the locale and its orientation, the latter both to determine its point of view and to determine what aspect it was presenting to others.
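
Pulling those two model concepts together, a minimal and purely illustrative set of data shapes might look like this; the class and field names are mine, not taken from any specification, but they capture the attributes described above: a locale as a container with static elements and policies, and an avatar with position, orientation, a behavior set, and a controller.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Avatar:
    avatar_id: str
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    orientation_deg: float = 0.0                         # which way it faces
    behaviors: List[str] = field(default_factory=list)   # e.g. "walk", "wave"
    controller: str = "software"                         # "human:<session>" or "software"
    interaction_policies: Dict[str, Callable] = field(default_factory=dict)

@dataclass
class Locale:
    locale_id: str
    static_elements: List[str] = field(default_factory=list)  # walls, tables, vases
    avatars: Dict[str, Avatar] = field(default_factory=dict)
    policies: Dict[str, Callable] = field(default_factory=dict)

    def admit(self, avatar: Avatar):
        # An avatar moving into this locale joins its virtual community.
        self.avatars[avatar.avatar_id] = avatar

room = Locale("one-room-metaverse", static_elements=["walls", "table", "vase"])
room.admit(Avatar("alice", controller="human:session-1", behaviors=["walk", "wave"]))
print(room.avatars["alice"].controller)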

It’s pretty easy to see that the whole process begs for a model. Avatars are objects with parameters and policies and behaviors associated with them, and presumably these would be recorded in the model. The locales would have the same set of attributes. A locale would be essentially a virtual host to a set of avatars, and avatars in a real metaverse might move into and out of a given locale, presumably from/to another one.

The models representing both locales and avatars would be “deployed” meaning that they’d be committed to a hosting point. The selection of this hosting point would depend on the length of the control loops required to support the avatar-to-human connections. I would expect that some locales would be persistently associated with users in a given area, and so would be persistently hosted there. Some locales might be regularly “empty” and others might have a relatively high percentage of users/avatars moving in and out (“churn”). In the latter situations, it might be necessary to move the locale hosting point, move some avatar hosting points, or both.
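
A toy version of that placement decision, with made-up latency figures, could be as simple as choosing the candidate hosting point with the best worst-case control loop to the users currently associated with the locale.

def pick_hosting_point(candidate_latency_ms):
    # candidate_latency_ms maps hosting point -> {user: round-trip ms}.
    # Pick the candidate whose worst control loop is shortest.
    return min(candidate_latency_ms,
               key=lambda hp: max(candidate_latency_ms[hp].values()))

candidates = {
    "edge-metro-east": {"alice": 8, "bob": 35},
    "edge-metro-west": {"alice": 30, "bob": 9},
    "regional-dc":     {"alice": 18, "bob": 20},
}
print(pick_hosting_point(candidates))   # regional-dc: best worst-case loop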

I would assume that an avatar might well be a kind of distributed element, with one piece located in proximity to the user the avatar represents, and the other placed at a “relay point” where it hosted the model (synchronized with the local representation) and fed that to the locale. That way the details of the current state of an avatar could be easily obtained by the locale. In most cases, the represented user wouldn’t be changing things so rapidly that the synchronicity between the two distributed pieces would be a factor.

This would be particularly true when a controlling human had an avatar do something that continued for a period of time, like “walk” or “jump up and down”. As long as the avatar doesn’t “encounter” something and have to run a policy, it could be managed by the relay, with the synchronizing information sent both to the locale and to the element close to the controlling human. That would also facilitate the creation of complex behaviors that might take some time to set up; the human would set up the local element, which would then feed the completed change to the locale.
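
Here is a small sketch of that split-avatar idea, with invented names: the local element near the user accepts control input and syncs it to a relay element, and the relay keeps a continuing behavior like "walk" running on behalf of the locale without needing a control message on every tick.

class RelayElement:
    def __init__(self):
        self.state = {"behavior": "idle", "position": (0.0, 0.0)}

    def sync(self, update):
        self.state.update(update)            # state pushed from the local element

    def tick(self, dt):
        # A continuing behavior advances at the relay between syncs.
        if self.state["behavior"] == "walk":
            x, y = self.state["position"]
            self.state["position"] = (x + 1.0 * dt, y)
        return dict(self.state)              # what the locale would read

class LocalElement:
    def __init__(self, relay):
        self.relay = relay

    def control(self, command):
        # The human's input is applied locally, then fed to the relay.
        self.relay.sync({"behavior": command})

relay = RelayElement()
local = LocalElement(relay)
local.control("walk")                        # human starts the avatar walking
for _ in range(3):
    print(relay.tick(dt=1.0))                # relay keeps it moving without new input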

If there were a lot of changes in the avatar content of a locale, it might indicate that a better hosting point could be found. The trick to making that work would be to calculate optimal position from the control loop lengths and perhaps a history of past activity; you wouldn’t want to move locale hosting for a single new avatar that never or rarely showed up. When a hosting move is indicated, you’d have to come up with an acceptable way of telling all the users in the locale that things were suspended for a period. Maybe the lights go out? Obviously, you’d not want to move too often, nor take too long to make a move, or the result would be disruptive of the experience.
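
A toy version of that re-hosting check (thresholds invented, not a real algorithm) would only move the locale when the best alternative improves the worst-case control loop by a meaningful margin and enough time has passed since the last move, so a single transient avatar doesn't trigger a disruptive relocation.

import time

IMPROVEMENT_MS = 10      # minimum worst-case latency gain to justify a move
MIN_INTERVAL_S = 600     # don't move more than once per ten minutes

def should_move(current_worst_ms, best_alternative_worst_ms, last_move_ts):
    improved = current_worst_ms - best_alternative_worst_ms >= IMPROVEMENT_MS
    settled = (time.time() - last_move_ts) >= MIN_INTERVAL_S
    return improved and settled

print(should_move(45, 20, last_move_ts=0))            # big gain, long ago -> True
print(should_move(45, 40, last_move_ts=0))            # gain too small -> False
print(should_move(45, 20, last_move_ts=time.time()))  # just moved -> False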

It’s not hard to see how this would apply to IoT, either. The “locale” would be a set of linked processes that could be manufacturing, transportation, or both. The machinery and goods would be avatars, and the difference between this and a social metaverse would lie in how the policies worked to define behaviors and movement. This is why I think you could derive a single model/software architecture for both.

That architecture would differ from the service/application architecture and model, as I’ve already suggested, in that the service/application approach is really about managing the lifecycle of things while the relationship between those things is what creates the experience. The digital twin approach is really about defining the creation of the experiences, and lifecycle management is just a minor adjunct.

If we could harmonize these two models in any way, it might help define a single edge computing architecture, and that’s what we’ll address in the third and final blog of this series.