Modeling Digital Twin Systems

If you look at a service or an application, you see something that’s deployed and managed to support a mission defined by the elements you’re working with. We can deploy the components of a service by connecting the feature elements, but how those elements interact with each other to produce the service is really outside the scope of the model. Our hierarchical feature/intent approach is great for that kind of modeling, and we can prove it out by citing the fact that the TMF’s SID, likely the longest-standing of all service models, works that way.

If you look at a metaverse, which is a kind of social-media application, what you see is an application that has to create a virtual reality, parts of which are digitally twinned with elements in the real world. The same is arguably true for many IoT applications, because what these applications describe is the functionality itself, not how that functionality is assembled and managed. The difference may be subtle, so an example is in order.

Let’s take the simplest of all metaverse frameworks, a single virtual room in which avatars interact. The room represents a virtual landscape that can be visualized to the users whose avatars are in it, and it also represents a set of constraints on how the avatars can move and behave. If the room has walls (which, after all, is what makes it a room in the first place) the avatars can’t move beyond them, so we need to understand where the avatars are with respect to those walls. There are constraints to movement.

If we start with the visualization dimension, we can say that an avatar “sees” three-dimensional space from a specific point within it. We have all manner of software examples that produce this kind of visualization today; what you need is a structure and a point of view, and the software can show what would be “seen”. Assuming our own avatar is alone, the task of visualizing the room is pretty simple.
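
To make that concrete, here’s a minimal Python sketch of the point-of-view question, reduced to two dimensions. The function name and the field-of-view angle are my own assumptions for illustration; a real visualization engine would also have to account for occlusion by walls and other avatars.

```python
# A minimal sketch of "what can this avatar see?", reduced to 2D: a point
# is visible if it falls within the viewer's field of view, measured
# against the direction the viewer is facing. Names are hypothetical.
import math

def in_field_of_view(viewer_pos, facing_deg, target_pos, fov_deg=120.0):
    """Return True if target_pos lies within the viewer's field of view."""
    dx = target_pos[0] - viewer_pos[0]
    dy = target_pos[1] - viewer_pos[1]
    if dx == 0 and dy == 0:
        return True  # the viewer trivially "sees" its own position
    bearing = math.degrees(math.atan2(dy, dx))                # angle to the target
    offset = (bearing - facing_deg + 180.0) % 360.0 - 180.0   # signed angular difference
    return abs(offset) <= fov_deg / 2.0

# An avatar at the origin facing "east" sees a point ahead but not one behind it.
print(in_field_of_view((0, 0), 0.0, (5, 1)))   # True
print(in_field_of_view((0, 0), 0.0, (-5, 0)))  # False
```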

The avatars, if this is a realistic/useful virtual world, also have to be able to see each other, and what they see must be appropriate to the direction they’re facing, the structure of the room, and the other avatars that are in the field of vision. The challenge in visualizing the avatars is that they are likely moving and “behaving”, meaning that they may be moving arms, legs, and head, could be carrying something, and so on.

To meet this challenge, we would have to say that our virtual room, with static elements, includes dynamic elements whose three-dimensional shapes could vary under their own behavioral control. They might be jumping up and down, waving their arms, etc. Those variations would mean that how they looked to us would depend not only on our relative positions, but also on their three-dimensional shape and its own orientation relative to us.
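
As a rough illustration of what “behavioral control” of shape could mean, here’s a small Python sketch in which a hypothetical “wave arms” behavior returns a pose as a function of time. The joint names and the math are purely illustrative assumptions, not a claim about how any particular engine represents pose.

```python
# A sketch of a behavior driving an avatar's presented shape over time: a
# "wave arms" behavior returns joint angles as a function of elapsed time.
# What others "see" would be this pose combined with the avatar's position
# and orientation. All names and values are illustrative.
import math

def wave_arms_pose(t_seconds, period=2.0):
    """Return a pose dict (joint name -> angle in degrees) at time t."""
    swing = 45.0 * math.sin(2.0 * math.pi * t_seconds / period)
    return {
        "left_shoulder": 90.0 + swing,   # arms raised, oscillating
        "right_shoulder": 90.0 - swing,
        "head_yaw": 0.0,
    }

print(wave_arms_pose(0.0))
print(wave_arms_pose(0.5))  # half a second later, the arms have swung
```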

Now let’s look at the constraint side. As I noted above, you can’t have avatars walking through walls (and since we’re postulating a one-room metaverse, there’d be nowhere else to go), but you also can’t have them walking through each other, or at least, if they try, you have to impose some sort of policy that governs what happens. The key point is that our room sets a backdrop, a static framework. Within it, there are a bunch of avatars whose movement represents what the associated person (or animal, or non-player character) wants to do. When that movement is obstructed, either by actually hitting something or by approaching it closely enough that there’s a rule to handle that approach, we have to run a policy. Throughout all of this, we have to represent what’s happening as seen through the eyes of all the avatars.
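
Here’s a small Python sketch of that constraint logic, with the room dimensions, the encounter radius, and the policy outcomes all invented for illustration: a proposed move either succeeds, stops at a wall, or triggers a policy when it comes too close to another avatar.

```python
# A sketch of the constraint side: a proposed move is checked against the
# room's walls and against proximity to other avatars; an encounter runs a
# policy rather than silently succeeding. Names and numbers are assumptions.
ROOM = {"min_x": 0.0, "max_x": 10.0, "min_y": 0.0, "max_y": 10.0}
ENCOUNTER_RADIUS = 0.5  # how close two avatars can get before a policy fires

def blocked_by_walls(pos):
    x, y = pos
    return not (ROOM["min_x"] <= x <= ROOM["max_x"]
                and ROOM["min_y"] <= y <= ROOM["max_y"])

def encounter(pos, others):
    """Return the first other-avatar position within the encounter radius."""
    for other in others:
        if (pos[0] - other[0]) ** 2 + (pos[1] - other[1]) ** 2 <= ENCOUNTER_RADIUS ** 2:
            return other
    return None

def try_move(current, proposed, others):
    if blocked_by_walls(proposed):
        return current, "blocked: wall"                  # policy: you simply stop
    hit = encounter(proposed, others)
    if hit is not None:
        return current, f"policy: encounter at {hit}"    # a policy decides what happens
    return proposed, "moved"

print(try_move((1.0, 1.0), (1.0, -1.0), []))           # walks into a wall
print(try_move((1.0, 1.0), (2.0, 2.0), [(2.2, 2.1)]))  # approaches another avatar
```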

With only us (our own avatar) in our room, we can still exercise all the movements and behaviors, constrained only by the static elements of the room. However, some of those “static” elements might not be truly static; we might have a vase on a table or a mirror on the wall. Interacting with either would have to result in something, which means those objects would effectively be avatars with their own rules, synchronized not with other humans but with software policies.
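
A quick sketch of that idea, with hypothetical names: an object like the vase is just an avatar whose “controller” is a software policy rather than a person.

```python
# A sketch of "static" objects as software-controlled avatars: each object
# carries its own encounter policy. Class and policy names are invented.
class SoftwareAvatar:
    def __init__(self, name, on_encounter):
        self.name = name
        self.on_encounter = on_encounter  # the policy: a callable

    def encounter(self, other_name):
        return self.on_encounter(self.name, other_name)

vase = SoftwareAvatar("vase", lambda me, other: f"{me} wobbles when {other} bumps it")
mirror = SoftwareAvatar("mirror", lambda me, other: f"{me} reflects {other}")

print(vase.encounter("avatar-1"))
print(mirror.encounter("avatar-2"))
```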

Put a second avatar in the room, and what changes is that the behavior of one of our “objects” is now controlled outside the software, by the human it represents. That means that we have to be able to apply the control of that human to the behavior of the avatar, in as close to real time as possible. If there were a significant lag in the control loop, the interactions, and even the views, of the two people involved would diverge, and that would make their interactions very unrealistic.

I think that what we have here, then, is the combination of two fairly familiar problems. Problem One is “what would I see from a particular point of view?” and Problem Two is “how would a series of bodies interact if each were moving under their own rules and had their own rules for what happened in an encounter?” For both these things to work as a software application, we need what I think are two model concepts.

The first thing is the notion of our room, which I’ve generalized to call a “locale”. A locale is a virtual place where avatars are found. As is the case with our simple room, it has static elements and avatar elements, the latter representing either human-controlled avatars or software-generated avatars. Thus, a locale is a container of sorts, a virtual place with a set of properties and policies. One of the big benefits of the locale is that it creates a virtual community (of the stuff that’s in it), and so it limits the extent of the virtual world of the metaverse that actually has to be related to a given human member.
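
Here’s a minimal Python sketch of what a locale record might contain, with all names assumed for illustration: static elements, the avatars currently inside it, and locale-level policies, plus the admit/evict operations that make it a container.

```python
# A sketch of the locale as a container: static elements, the avatars
# currently in it, and locale-wide policies. Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class Locale:
    name: str
    static_elements: list = field(default_factory=list)   # walls, tables, mirrors, etc.
    avatars: dict = field(default_factory=dict)            # avatar id -> current state
    policies: dict = field(default_factory=dict)           # locale-wide rules

    def admit(self, avatar_id, state):
        self.avatars[avatar_id] = state

    def evict(self, avatar_id):
        return self.avatars.pop(avatar_id, None)

room = Locale("simple-room", static_elements=["north-wall", "table", "mirror"])
room.admit("avatar-1", {"pos": (1.0, 1.0), "facing": 90.0})
print(room.avatars)
```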

The second thing we need is the avatars. They need to be controlled, and in some cases will have to support movement, so they’ll need policies to govern interactions with other things. The issue of latency in the control loop that I noted above will apply to the interface between the avatar and whatever controls it. An avatar would have to be given a defined behavior set, activated by the human it represents or by software. That behavior set would have to include both interaction policies and how the behavior would impact the way the avatar looks (in three dimensions). You’d need the avatar’s position in the locale and its orientation, the latter both to determine its point of view and to determine what aspect it was presenting to others.
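
A corresponding sketch of the avatar model, again with hypothetical names: position and orientation in a locale, a behavior set that can be activated by a human or by software, and interaction policies.

```python
# A sketch of an avatar record as described in the text. All names are
# assumptions; behaviors are callables that turn elapsed time into a pose.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Avatar:
    avatar_id: str
    locale: str
    position: tuple = (0.0, 0.0)
    orientation_deg: float = 0.0                   # sets point of view and the aspect shown to others
    behaviors: dict = field(default_factory=dict)  # behavior name -> callable(t) -> pose
    interaction_policies: dict = field(default_factory=dict)
    active_behavior: Optional[str] = None

    def activate(self, behavior_name):
        if behavior_name not in self.behaviors:
            raise ValueError(f"unknown behavior: {behavior_name}")
        self.active_behavior = behavior_name

    def pose_at(self, t_seconds):
        if self.active_behavior is None:
            return {}                              # no behavior running: default pose
        return self.behaviors[self.active_behavior](t_seconds)

a = Avatar("avatar-1", "simple-room",
           behaviors={"wave": lambda t: {"arms": "raised" if int(t) % 2 == 0 else "lowered"}})
a.activate("wave")
print(a.pose_at(0.0), a.pose_at(1.0))
```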

It’s pretty easy to see that the whole process begs for a model. Avatars are objects with parameters and policies and behaviors associated with them, and presumably these would be recorded in the model. The locales would have the same set of attributes. A locale would be essentially a virtual host to a set of avatars, and avatars in a real metaverse might move into and out of a given locale, presumably from/to another one.
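
Here’s a minimal sketch of that movement between locales, using plain dictionaries and invented names: the source locale gives up its record of the avatar and the destination admits it, so each locale only tracks the community actually inside it.

```python
# A sketch of an avatar moving from one locale to another. Structures and
# names are illustrative, not a defined model format.
lobby = {"name": "lobby", "avatars": {"avatar-1": {"pos": (1.0, 1.0)}}}
meeting_room = {"name": "meeting-room", "avatars": {}}

def transfer(avatar_id, source, destination):
    state = source["avatars"].pop(avatar_id, None)   # remove from the source locale
    if state is None:
        raise KeyError(f"{avatar_id} is not in {source['name']}")
    destination["avatars"][avatar_id] = state        # admit to the destination locale
    return state

transfer("avatar-1", lobby, meeting_room)
print(lobby["avatars"], meeting_room["avatars"])
```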

The models representing both locales and avatars would be “deployed”, meaning that they’d be committed to a hosting point. The selection of this hosting point would depend on the length of the control loops required to support the avatar-to-human connections. I would expect that some locales would be persistently associated with users in a given area, and so would be persistently hosted there. Some locales might be regularly “empty” and others might have a relatively high percentage of users/avatars moving in and out (“churn”). In the latter situations, it might be necessary to move the locale hosting point, move some avatar hosting points, or both.
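
As a simple illustration of the hosting-point choice, here’s a sketch that picks the candidate edge site with the best worst-case control-loop latency to the users whose avatars are in the locale. The sites and latency numbers are made up for the example.

```python
# A sketch of hosting-point selection: choose the site that minimizes the
# worst-case control-loop latency across the locale's users. All data here
# is illustrative.
def choose_hosting_point(candidate_sites, user_latency_ms):
    """user_latency_ms: site -> {user -> round-trip latency in ms}."""
    def worst_case(site):
        return max(user_latency_ms[site].values())
    return min(candidate_sites, key=worst_case)

latency = {
    "edge-east":  {"alice": 12, "bob": 40},
    "edge-west":  {"alice": 35, "bob": 15},
    "edge-metro": {"alice": 22, "bob": 24},
}
print(choose_hosting_point(list(latency), latency))   # "edge-metro" wins here
```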

I would assume that an avatar might well be a kind of distributed element, with one piece located in proximity to the user the avatar represents, and the other placed at a “relay point” where it hosted the model (synchronized with the local representation) and fed that to the locale. That way the details of the current state of an avatar could be easily obtained by the locale. In most cases, the represented user wouldn’t be changing things so rapidly that the synchronicity between the two distributed pieces would be a factor.
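
Here’s a rough sketch of that two-piece avatar, with all class and method names assumed: the local element near the user accepts control changes and pushes the updated model to the relay, which holds the synchronized copy the locale reads.

```python
# A sketch of the distributed avatar: a local element near the user and a
# relay element that keeps the synchronized copy fed to the locale.
class RelayElement:
    def __init__(self):
        self.state = {}

    def sync(self, new_state):
        self.state = new_state           # the copy the locale will read

    def snapshot_for_locale(self):
        return dict(self.state)

class LocalElement:
    def __init__(self, relay):
        self.state = {"behavior": "idle", "facing": 0.0}
        self.relay = relay

    def apply_control(self, change):
        self.state.update(change)          # the human's control input
        self.relay.sync(dict(self.state))  # push the updated model to the relay

relay = RelayElement()
local = LocalElement(relay)
local.apply_control({"behavior": "walk", "facing": 90.0})
print(relay.snapshot_for_locale())
```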

This would be particularly true when a controlling human had an avatar do something that continued for a period of time, like “walk” or “jump up and down”. As long as the avatar doesn’t “encounter” something and have to run a policy, it could be managed by the relay, with the synchronizing information sent both to the locale and to the element close to the controlling human. That would also facilitate the creation of complex behaviors that might take some time to set up; the human would set up the local element, which would then feed the completed change to the locale.

If there were a lot of changes in the avatar content of a locale, it might indicate that a better hosting point could be found. The trick to making that work would be to calculate the optimal position from the control loop lengths and perhaps a history of past activity; you wouldn’t want to move locale hosting for a single new avatar that never or rarely showed up. When a hosting move is indicated, you’d have to come up with an acceptable way of telling all the users in the locale that things were suspended for a period. Maybe the lights go out? Obviously, you wouldn’t want to move too often, or take too long to make a move, or the result would be disruptive to the experience.
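
A sketch of that decision, with thresholds that are purely illustrative: move only when the latency improvement at the best alternative site is worth the disruption, and when the locale’s recent churn suggests the change isn’t a one-off.

```python
# A sketch of the re-hosting decision: require both a meaningful latency
# improvement and enough recent churn to justify the disruption. The
# thresholds are assumptions for illustration.
def should_move(current_worst_ms, best_alternative_ms, recent_churn_events,
                min_improvement_ms=10, min_churn=5):
    improvement = current_worst_ms - best_alternative_ms
    return improvement >= min_improvement_ms and recent_churn_events >= min_churn

print(should_move(40, 24, recent_churn_events=8))   # True: worth a brief "lights out"
print(should_move(40, 36, recent_churn_events=8))   # False: not enough improvement
print(should_move(40, 24, recent_churn_events=1))   # False: a single new avatar
```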

It’s not hard to see how this would apply to IoT, either. The “locale” would be a set of linked processes that could be manufacturing, transportation, or both. The machinery and goods would be avatars, and the difference between this and a social metaverse would lie in how the policies worked to define behaviors and movement. This is why I think you could derive a single model/software architecture for both.
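
To show how little the model shape changes, here’s a sketch of an IoT “locale” built the same way, with machines and goods as the avatars and process rules as the policies; everything here is an invented example.

```python
# A sketch of the IoT mapping: the same locale/avatar structure, with
# machines and goods as avatars and process rules as policies. All names,
# policies, and values are illustrative.
production_line = {
    "name": "packaging-line-3",          # the "locale": a linked set of processes
    "avatars": {
        "robot-arm-1": {"type": "machine", "policy": "stop-on-proximity"},
        "pallet-778":  {"type": "goods",   "policy": "track-position"},
    },
    "policies": {"proximity_m": 0.5, "max_line_speed_mps": 1.2},
}
print(production_line["avatars"]["robot-arm-1"]["policy"])
```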

That architecture would differ from the service/application architecture and model, as I’ve already suggested, in that the service/application approach is really about managing the lifecycle of things while the relationship between those things is what creates the experience. The digital twin approach is really about defining the creation of the experiences, and lifecycle management is just a minor adjunct.

If we could harmonize these two models in any way, it might help define a single edge computing architecture, and that’s what we’ll address in the third and final blog of this series.