Just How Real Could Our Virtual Metaverse Be?

Facebook is said to be considering renaming itself to claim ownership of the “metaverse”, which has led many (especially those who, like me, are hardly part of the youth culture) to wonder just what that means. The fact is that the metaverse is important, perhaps even pivotal, in our online evolution. It may also be pivotal in realizing things like the “contextual” applications I’ve blogged about.

At the high level, the term “metaverse” defines one or many sets of virtual/artificial/augmented (VR/AR) realities. Games where the player is represented by an avatar are an example, and so are social-network concepts like the venerable Second Life. Since we’ve had these things for decades (Dungeons and Dragons, or D&D, was a role-play metaverse and it’s almost 50 years old), you might suspect that new developments have changed the way we think about this high-level view, and you’d be right.

Facebook’s fascination with the metaverse seems strongly linked with social media, despite the company’s comments that it views the metaverse as a shift. Social media is an anemic version of a virtual reality, one that, like the D&D model, relies on imagination to frame the virtual world. The metaverse presumes that the attraction of social media could be magnified by making that virtual world more realistic.

Many people today post profile pictures that don’t reflect their real/current appearance. In a metaverse, of course, you could be represented by an avatar that looked any way you like. Somebody would be selling these, of course, including one-off NFT avatars. There would also be a potential for creating (and selling) “worlds” that could be the environment in which users/members/players interacted. You can see why Facebook might be very interested in this sort of thing, but that doesn’t mean it would be an easy transformation.

One issue to be faced is simple: value. We’ve probably all seen avatars collaborating as proxies for real workers, and if we presumed a metaverse could be implemented properly, that could likely be done. The question is whether businesses would value the result. Sure, I could assume that a virtual-me wrote on a virtual-whiteboard and other virtual-you types read the result through artificial reality goggles, but would that actually increase our productivity? Right now, we’re all talking as though the metaverse were an established technology, and positing benefits based on the most extensive implementation. Is that even possible?

Metaverse today demands a high degree of immersion in a virtual reality (like a game) and a high-level integration of the real world with augmentation elements in augmented reality scenarios. Most aficionados believe that metaverses require AR/VR goggles, a game controller or even body sensors to mimic movements, and a highly realistic and customized avatar representing each person. As such, a metaverse demands a whole new approach to distributed and edge computing. In fact, you could argue that a specific set of principles would have to govern the whole process.

The first principle is that a metaverse has to conform to its own natural rules. The rules don’t have to match the real world (gravity might work differently or not at all, and people might be able to change shapes and properties, for example) but the rules have to be there, even a rule that says that there are no natural rules in this particular metaverse. The key thing is that conformance to the rules has to be built into the architecture that creates the metaverse, and no implementation issues can impact the way that the metaverse and its rules are navigated.

The second principle is that a metaverse must be a convincing experience. Those who accept the natural rules of the metaverse must see those rules in action when they’re in the metaverse. If you’re represented by an avatar, the avatar must represent you without visual/audible contradictions that would make the whole metaverse hard to believe.

Rule three is that the implementation of a metaverse must convey the relationships of its members and its environments equally well to all. This is the most difficult of the principles, the one that makes the implementation particularly challenging. We might expect, in the real world, to greet someone with a hug or a handshake, and we’d have to be able to do that in the metaverse even though the “someones” might be a variable and considerable geographic distance from each other.

Rule one would be fairly easy to follow; the only issues would emerge if the implementation of a metaverse interfered with consistent “natural-for-this-metaverse” behavior. It’s rules two and three, and in particular how they interact in an implementation, that create the issue.

If you’ve ever been involved in an online meeting with a significant audio/video sync issue, or just watched a TV show that was out of sync, you know how annoying that sort of thing is, and in those cases it’s really a fairly minor dialog sync problem. Imagine trying to “live” in a metaverse with others, where their behavior wasn’t faithfully synchronized with each other, and with you. Issues in synchronization across avatars and the background would surely compromise realism (rule two) and if they resulted in a different view of the metaverse for its inhabitants, it would violate rule three.

Latency is obviously an issue with the metaverse concept, which is why metaverse evolution is increasingly seen as an edge computing application. It’s not that simple, of course. Social media contacts are spread out globally, which means that there isn’t any single point that would be “a close edge” to any given community. You could host an individual’s view of the metaverse locally, but that would work only as long as every other inhabitant was local to the same edge hosting point. If you tried to introduce a “standard delay” to synchronize the world view of the metaverse for all, you’d introduce a delay for all that would surely violate rule two.

An easy on-ramp to a metaverse, one that sidesteps the latency problem, would be to limit the kinds of interactions supported. Gaming where a player acts against generated characters is an example of this. To avoid latency problems when players/inhabitants interact with each other would require limiting interactions to the kind that latency wouldn’t impact severely. We may see this approach taken by Facebook and others initially, because social-media users wouldn’t initially expect to perform real-world physical interactions like shaking hands. However, I think this eventually becomes a rule two problem. That would mean that controlling latency could end up as a metaverse implementation challenge.

One possible answer to this would be to create “local metaverses” that would represent real localities. People within those metaverses could interact naturally via common-edge technology. If someone wanted to interact from another locality, they might be constrained to use a “virtual communicator”, a metaverse facility to communicate with someone not local, just as they’d have to in the real world.

Another solution that might be more appealing to Facebook would be to provide rich metaverse connectivity by providing rich edge connectivity. If we supposed that we could create direct edge-to-edge links globally, each of which could constrain latency, then we could synchronize metaverse actions reasonably well, even if the inhabitants were distributed globally. How constrained latency would have to be is subjective; gaming pros tell me that 50 ms would be ideal, 100 ms would be acceptable, and 200 ms might be tolerable. The speed of light in fiber is roughly 128,000 miles per second, so a hypothetical fiber mesh of edge facilities could deliver something anywhere on the globe in just under 100 ms, if there were no processing delays to consider.
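The propagation arithmetic here is easy to sketch. The figures below are round-number assumptions (light in fiber travels at roughly two-thirds of c, and the worst-case path is about half the Earth’s circumference), not measured values:

```python
# Back-of-the-envelope check on fiber propagation delay.
# Assumptions: ~128,000 miles/s for light in fiber, and an
# antipodal (half-circumference) worst case of ~12,450 miles.

FIBER_SPEED_MILES_PER_S = 128_000   # roughly 2/3 of c in glass
ANTIPODAL_MILES = 12_450            # about half of Earth's circumference

def one_way_delay_ms(distance_miles: float) -> float:
    """One-way propagation delay in milliseconds, ignoring all processing."""
    return distance_miles / FIBER_SPEED_MILES_PER_S * 1000.0

print(f"worst case: {one_way_delay_ms(ANTIPODAL_MILES):.0f} ms")  # ~97 ms
```

That worst-case figure sits between the “acceptable” 100 ms and “ideal” 50 ms budgets quoted above, before any switching or hosting delay is added, which is why minimizing intermediate handling matters so much.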

The obvious problem with this is that a full mesh of edge sites would require an enormous investment. There are roughly 5,000 “metro areas” globally, so fully meshing them would require about 12.5 million full-duplex connections, or roughly 25 million simplex fiber paths. If we were to create star topologies of smaller-metro-to-larger-metro areas, we could cut the number of meshed major metro areas down to about 1,000, but that still leaves roughly half a million full-duplex connections, about a million simplex paths. The more we work to reduce the direct paths, the more handling we introduce and the more handling latency is created.
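The counts above fall out of the standard full-mesh formula: n sites need n(n−1)/2 bidirectional links, and twice that many one-way (simplex) paths. A quick sketch:

```python
# Full-mesh link counts for the metro-area figures discussed above.

def full_mesh_links(n_sites: int) -> int:
    """Full-duplex links needed to directly connect every pair of sites."""
    return n_sites * (n_sites - 1) // 2

for n in (5_000, 1_000):
    duplex = full_mesh_links(n)
    print(f"{n:>5} metros: {duplex:>10,} duplex links, "
          f"{2 * duplex:>10,} simplex paths")
```

Note that the link count grows roughly with the square of the site count, which is why trimming the meshed core from 5,000 to 1,000 metros cuts the fiber requirement by a factor of about 25, at the cost of extra hops (and hop latency) for everyone outside the core.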

Obviously some mixture of the two approaches is likely the only practical solution, and I think this is what Facebook has in mind in the longer term. They may start with local communities where latency can be controlled and allow rich interaction, then see where they could create enough edge connectivity to expand community size without compromising revenue.

Telcos and cloud providers, of course, could go further. Google and Amazon both have considerable video caching (CDN) technology in place, and they could expand the scope of that to include edge hosting. The same is true of CDN providers like Akamai. Social media providers like Facebook might hope that one of these outside players invests in heavily connected edge hosting, so they can take advantage of it.

Technology isn’t the problem here, it’s technology cost. We know how metaverse hosting would have to work to meet our three rules, but we don’t know whether it can earn enough to justify the cost. That means that the kind of rich metaverse everyone wants to talk and write about isn’t a sure thing yet, and it may even take years for it to come to pass. Meanwhile, we’ll have to make do with physical reality, I guess.