My Experiences Modeling the Metaverse

My notion of a metaverse of things (MoT) is grounded in the assumption that all applications of this type are a form of digital twinning. There’s a real-world framework that is, at least in part, reflected into the virtual world. For applications like industrial IoT, there’s a structure that’s being mirrored, and that structure has specific components that represent pieces of the industrial process. For MoT, or any digital twinning approach, to work, we need to be able to represent the structure and its rules in the virtual world.

For applications like social media, the things that are reflected into the virtual world are freely moving elements: people. There may still be a structure, representing the static environment of the metaverse’s virtual reality (like a game), but the structure is really, literally, in the eye of the beholder, because how it looks and behaves depends on the behavior of the free-will people who are roving about in it.

If MoT is to be implemented, the implementation would have to include the modeling of the virtual world and the linkage between it and the real world. It would be helpful if a single approach could model all kinds of MoT applications, because then the same software could decode any model of a digital twin and visualize it as needed. The question of how to model this sort of thing isn’t a new one, though. Some of my LinkedIn friends have encouraged me to talk specifically about the process of modeling a metaverse, and as it happens I looked at the problem at the same time as I started my work on modeling network services.

When I started to look at network services as composable elements fifteen years ago or so, I had the same challenge of how to model the composition, and ExperiaSphere was the result. I started with a Java-defined model that essentially programmed a service from objects (Java classes), but in the second phase I transitioned to the notion of a model rather than code. However, ExperiaSphere never intended to model the network or resources, just the functions included in a service. The presumption was that lower-level tools would deploy functional elements as virtual pieces committed to the cloud.

The ExperiaSphere project had a spin-off into social media, though. There was a set of elements that represented a social framework and the interactions within it, and that opened the question of how you’d model social systems (for those interested, this was called “SocioPATH”). The result of thinking on this was another activity I called “Dopple”, the name drawn from the German word “Doppelgänger”, which can mean a kind of virtual double. That’s a reasonable name for a digital twin, I think, and in fact a Dopple object was designed to be a representation of something like a person. Broadly speaking, a Dopple was something that represented either a real-world thing or a thing that was intended to act, in the virtual world, as though it were real.

A person’s Dopple would be, in modern terms, an avatar. So would the Dopple of a “non-player character” in the terminology of Dungeons and Dragons. Dopples would have APIs that linked them to the visualization framework, to software elements that controlled how they looked and moved, and so forth. You could also represent fully static things like rooms and buildings and mountains as Dopples, but of course as something in the virtual world became more static than dynamic, there’d likely be value in simply representing it as a CAD-like model.

In the real world, everyone has a personal view, so the same has to be true in a metaverse. Just as there are billions of people and trillions of discrete objects in the real world, the same scale might apply to a metaverse. However, in both cases the personal view of an individual acts as a filter on that enormous universe of stuff, and that means the Dopple concept has to be centered on the individual and thus able to understand what’s inside each “personal view”.

My assumption was that, like ExperiaSphere’s “Experiams”, Dopple objects would form a hierarchy. The top level of ExperiaSphere’s model is a “Service Experiam” representing the entire service. The top level of Dopple, in my conception, was a locale. A locale is (as the name suggests) a place that contains things. The scope of a locale is determined by focus or the “visibility” of the individual whose personal view is the center of the metaverse, so to speak. If the metaverse isn’t modeling a social system but an industrial process, the locale would represent a place where the process elements were concentrated in the real world. In IoT, a locale could represent an assembly line, and in a social metaverse it could represent the surroundings of a digitally twinned human, an avatar.
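To make the locale-as-container idea concrete, here’s a minimal Java sketch under my own assumptions; the names (Dopple, LocaleDopple, personalView) are illustrative placeholders, not the actual Dopple or ExperiaSphere definitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// A generic Dopple: a named virtual representation of something real
// (or of something meant to act as though it were real).
class Dopple {
    final String name;
    Dopple(String name) { this.name = name; }
}

// The top-level element of the hierarchy: a locale, a place that contains things.
class LocaleDopple extends Dopple {
    private final List<Dopple> members = new ArrayList<>();
    LocaleDopple(String name) { super(name); }

    void add(Dopple member) { members.add(member); }

    // The locale scopes an individual's personal view: only members that fall
    // inside that view's focus/visibility are returned, so the full universe
    // of Dopples never has to be presented at once.
    List<Dopple> personalView(Predicate<Dopple> inFocus) {
        return members.stream().filter(inFocus).collect(Collectors.toList());
    }
}
```

An assembly line in an industrial metaverse and the surroundings of an avatar in a social one would both be instances of the same container; only the membership and the focus test differ.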

As a software object, a Dopple is a collection of properties and APIs. I assumed that there would be four “dimensions” of APIs associated with each Dopple, and that the metaverse would be a collection of Dopples.

The first dimension is the “Behavior” dimension, and this is the set of APIs that represent the linkage of this particular Dopple object to the real world. Generally, it would represent the pathway through which the Dopple-as-an-avatar would be synchronized with the real-world element it represents.

The second dimension is the “GUI” dimension, and here we’d find the APIs that project the representation of the Dopple to the sum of senses that the metaverse supports. Note that this representation is limited to the Dopple itself, not its surroundings. However, the same dimension of APIs would govern what aspects of the metaverse are “visible” to the Dopple.

Dimension number three is the “Binding” dimension, which represents how the Dopple is linked in the hierarchy of Dopples that make up the metaverse. In a social metaverse, the binding links the Dopple to superior elements, like a “locale Dopple”, and to subordinate elements, such as the representation of what an avatar is “carrying” or “wearing”.

The final dimension is the “Process” dimension, and this is a link to the processes that make up the Dopple’s behaviors. My thought was that, like ExperiaSphere’s Experiams, each Dopple had a state/event table that defined what “events” it recognized, what “state” it was in, and what process handled a given event in a given state.
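A rough Java sketch of how the four dimensions might surface as an interface, along with one plausible shape for the state/event table, follows; all of the names here (DoppleObject, StateEventTable, and their methods) are hypothetical and not taken from the original design.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface DoppleObject {
    // Behavior dimension: the pathway that keeps the virtual twin
    // synchronized with the real-world element it represents.
    void syncFromRealWorld(Map<String, Object> realWorldState);

    // GUI dimension: project this Dopple to the senses the metaverse
    // supports, and govern what the Dopple itself can perceive.
    Object render();
    boolean canPerceive(DoppleObject other);

    // Binding dimension: where this Dopple sits in the hierarchy.
    DoppleObject superiorLocale();
    List<DoppleObject> subordinates();   // things "carried" or "worn", etc.

    // Process dimension: events are dispatched through a state/event table.
    void onEvent(String event);
}

// One plausible shape for the state/event table: the current state plus the
// event name selects the process that handles that event in that state.
class StateEventTable {
    private final Map<String, Runnable> table = new HashMap<>();

    void define(String state, String event, Runnable process) {
        table.put(state + "/" + event, process);
    }

    void dispatch(String currentState, String event) {
        Runnable process = table.get(currentState + "/" + event);
        if (process != null) process.run();   // unrecognized events are ignored
    }
}
```

The point of keeping the dimensions separate is that each can change independently: how a Dopple is rendered can evolve without touching how it synchronizes with the real world or how its events are processed.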

In my approach, a “Behavior Dopple”, meaning one directly coupled to the real world, had a hosting point, a place where the Dopple object was stored and where its associated processes were run. “Behavior Dopples” could represent people, industrial elements, NPCs (in gaming/D&D terms), or real places.

Every Behavior Dopple has an associated locale (in theory, one or more), meaning there is a superior Dopple bound to it that represents the viewpoint of whatever the Dopple represents. If multiple Behavior Dopples are in a common locale, they share a common superior Dopple, and their points of view are a composite of that superior Dopple’s bound subordinates. If you wave in a metaverse, your wave is visible within any locale Dopple your Behavior Dopple is bound to.

To illustrate this (complex but important) point, suppose your avatar is in a virtual room, attending a virtual conference. Your Behavior Dopple waves, and the wave is visible to those in the same virtual room and also to those attending the virtual conference. A virtual conference, in my original Dopple model, was a “Window Dopple” that was bound to the locales of each of the attendees. These Dopples “filtered” the view according to the nature of the conference, so that if your camera was off then your personal/Behavior Dopple wouldn’t be “seen” but would be heard. I assumed that Window Dopples would present a “field of view” that represented a camera, and things outside that field of view would not be seen by the conference. A null field of view is equivalent to the “camera-off” state.
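Here’s a minimal sketch of that propagation and filtering, again with hypothetical class names: an event emitted in a locale is delivered locally and then forwarded through whatever is bound to it, and a Window Dopple in the chain drops the visual part when the “camera” is off.

```java
import java.util.ArrayList;
import java.util.List;

class MetaverseEvent {
    final String kind;     // e.g. "wave" (visual) or "speech" (audio)
    final String source;   // which avatar produced it
    MetaverseEvent(String kind, String source) { this.kind = kind; this.source = source; }
}

// A locale delivers an event to its own members and forwards it through
// everything bound to it (another locale, or a Window Dopple).
class Locale {
    final String name;
    final List<Locale> bindings = new ArrayList<>();
    Locale(String name) { this.name = name; }

    void emit(MetaverseEvent e) {
        deliverLocally(e);
        for (Locale bound : bindings) bound.emit(e);
    }

    void deliverLocally(MetaverseEvent e) {
        System.out.println(name + " perceives " + e.kind + " from " + e.source);
    }
}

// A Window Dopple filters what crosses between locales: with the camera off
// (a null field of view), visual events are dropped but audio still passes.
class WindowDopple extends Locale {
    boolean cameraOn;
    WindowDopple(String name, boolean cameraOn) { super(name); this.cameraOn = cameraOn; }

    @Override
    void emit(MetaverseEvent e) {
        if (!cameraOn && e.kind.equals("wave")) return;
        super.emit(e);
    }
}
```

Binding a room’s Locale to a WindowDopple, and that window to each attendee’s locale, reproduces the conference example; the factory-to-warehouse case described below is the same structure with different locales and a different event.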

The “Window Dopple” illustrates that a link between two locales (meaning their Dopples) can itself be another Dopple, which further shows that a metaverse is a structure built on Dopples. The same concept can be applied to IoT. We might have a Factory Locale and a Warehouse Locale, each represented by a Dopple and each containing stuff. The product of the Factory Locale (created by the manufacturing process) is a Dopple, which is then transported to the warehouse (as represented by a Window Dopple linking factory and warehouse).

The reason for all these dimensions of APIs and Dopple objects was to create a framework that could be adapted to any relationship between the real and virtual worlds, and to any means we might find convenient for representing the virtual world to its inhabitants. Like most software architects, I always like the idea of a generalized approach rather than a one-off, and which of the two we end up with is probably the biggest question in the world of metaverse and MoT. If we create “silo metaverses”, we multiply the difficulties in creating a generalized hosting framework and the cost of metaverse software overall. At some point, cost and difficulties could tip the balance of viability for some potential applications.

We probably won’t establish a single model for the metaverse in 2022, and we may never do so. What we can expect for this year is a sign of progress toward that single, general approach. If we don’t see it, we can expect that metaverse and MoT won’t fully realize their near-term potential, and perhaps will never realize their full potential at all.