Making Virtual and Augmented Reality Real

How could we use augmented reality, and what would that use require from the technology itself? AR may be the key to unlocking a whole new way of empowering workers and engaging consumers, but as with other new technologies, it seems to need a complex supporting ecosystem, and it’s not entirely clear how we’ll get there.

Since the dawn of commercial computing in the 1950s, every major advance in the pace of IT spending has been linked to a paradigm shift that resulted in bringing information technology closer to workers. We started off with batch processing and punched cards, and we’ve moved through to online systems, portals, and online team collaboration. It seems pretty clear that the next step is to bring IT right into our lives, through the best possible gateway—our eyes.

The really advanced, mover-and-shaker enterprise technology planners I talk with have been telling me that they’re excited about the potential of AR, but that they don’t see much hope of developing a framework to use it in the next two or three years. In fact, they couldn’t give me any real idea of when such a framework could be expected, in part because it isn’t clear what the pieces of that framework would be. Since I hate to leave this issue hanging, I’m going to look at it here, starting at the top, or rather the front.

It’s all about eyes. Our visual sense is the one best able to take in large amounts of complex information quickly. AR is useful because it promises to mix the real world with information that relates to it. Think of it as a productivity overlay on reality, a means of integrating visual data and linking it to the objects in our visual field that the data references. You look out over a crowd and AR shows you the names of people you should (but probably don’t) recognize, superimposed on them.

The challenge here is to get the artificial-reality world to line up with our view of the real one. There are two broad approaches. The first is to synthesize a real-world view from a camera, perhaps on the AR headset, and mix the virtual data with that. Since the camera is already presenting the real world in digestible form, it’s literally a mixing function. The second is to use the camera as before, but keep its view of the world behind the scenes, for reference only, to “know” where the things you’re analyzing are positioned. The virtual data is then injected into the visual field, presumably through a translucent overlay on what the “AR glasses” see.
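
Here’s a minimal, purely illustrative sketch of the difference; the frames, overlays, and function names are stand-ins I’ve invented for the contrast, not any real AR or camera API.

    # Purely illustrative: frames and overlays are just dictionaries, and the
    # overlay renderer is a stub, not a real AR or camera API.

    def render_overlay(pose):
        # In a real system this would place labels based on the pose.
        return {"labels": ["Store A", "Store B"], "pose": pose}

    def video_passthrough(camera_frame):
        # Approach 1: the user watches the camera's view with data mixed in.
        overlay = render_overlay(camera_frame["pose"])
        return {"shown_to_user": [camera_frame, overlay]}

    def optical_see_through(camera_frame):
        # Approach 2: the camera frame is analyzed only to position the data;
        # the real world reaches the eye through the glass, so only the
        # overlay is actually displayed.
        overlay = render_overlay(camera_frame["pose"])
        return {"shown_to_user": [overlay]}

    frame = {"pixels": "...", "pose": (0.0, 0.0, 0.0)}
    print(video_passthrough(frame)["shown_to_user"])
    print(optical_see_through(frame)["shown_to_user"])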

We have the technology to do the former perhaps a bit better than the latter, but the two approaches raise different sets of issues when we dig into how they would actually have to work.

The first requirement of any visual system is that it doesn’t kill the wearer or make them ill. Anyone who has used immersive virtual reality knows that it can be literally dizzying, and the primary reason is that the visual experience tends to lag movement of the head and eyes. That same latency would impact the ability of the VR system to show us something moving quickly into our field of view, like a car on its way to a collision with us. An AR system that’s based on augmenting the real-world view rather than creating it may still have latency issues, but all that’s impacted is the digital overlay, not the real-world view.

The complicating issue in this visual-lag problem is the challenge of processing the image in real time, synthesizing the overlay, and then sending the result to the glasses. It’s unrealistic to think that the current state of technology would allow us to create a headset that could do this sort of thing in real time and still not cost as much as a compact car. Even modern smartphones would find it difficult. If we offload the function to an edge computer, we need really low latency in the network connection. Our hypothetical attacking car might be moving at 88 feet per second, which is 60 miles per hour. If our image took 5ms to get to the edge, 100ms to be processed, 5ms to get back, and 1ms to be displayed, we’ve accumulated 111ms, and our attacking car has moved just short of ten feet in that time.
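
To make the arithmetic concrete, here’s the back-of-the-envelope version in Python, using the same illustrative figures (they’re examples, not measurements of any particular network or processor):

    # Latency budget for the edge-offload case, using the figures above.
    uplink_ms, processing_ms, downlink_ms, display_ms = 5, 100, 5, 1
    total_ms = uplink_ms + processing_ms + downlink_ms + display_ms

    car_speed_fps = 88.0                     # feet per second (60 mph)
    distance_ft = car_speed_fps * (total_ms / 1000.0)

    print(f"Total delay: {total_ms}ms")                         # 111ms
    print(f"Car travels {distance_ft:.1f} feet in that time")   # 9.8 feet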

Turning your head could create the dizziness problem even when you manage to avoid the attacking car. People turn their heads at various rates, but a fairly quick movement would be on the order of 300 degrees per second (not to suggest you could actually turn your head 300 degrees unless you’re an owl; this is just a rate of movement). A rotation of even 30 degrees is enough to create a disturbing visual experience if the system doesn’t keep up, and at that rate a 30-degree turn takes about 100ms, roughly our round-trip delay, meaning that your visual field would seem to lag your movement on even a minimal shift. It doesn’t work.
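
The same kind of quick check, assuming the 300-degrees-per-second rate and the 111ms delay from the car example, shows why:

    # Head rotation during the same 111ms round-trip delay.
    rotation_rate_dps = 300.0                # degrees per second, a quick turn
    delay_ms = 111
    degrees_turned = rotation_rate_dps * (delay_ms / 1000.0)

    print(f"Head turns {degrees_turned:.0f} degrees before the overlay catches up")
    # ~33 degrees, past the roughly 30 degrees that's already disturbing.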

OK, are there any alternatives? Sure; we could carry a special device somewhere on our person that would do the processing locally. Is there anything we could do to improve response times other than that?

Maybe. Suppose we have a 3D computer model of the street our hypothetical user is walking along. If we know the person’s head position and orientation accurately, we can tell from the model what the person would see and where it would fall in the field of view. That means we could lay a data element on our virtual glass panel while letting the real-world view pass through. We could pass the model to the user when they turn onto the street, and it would be good for at least a reasonable amount of time.
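
Here’s one way the geometry could work, as a minimal Python sketch (using numpy, with a made-up head pose, focal length, and display size) of projecting a labeled point in a prebuilt street model onto the translucent display; a real system would get the pose from the headset’s tracking hardware:

    # Sketch only: project a labeled point from a prebuilt 3D street model
    # onto the display, given the wearer's head pose. Pose, focal length,
    # and screen size are made-up illustrative values.
    import numpy as np

    def project_label(point_world, head_position, head_rotation, focal_px, screen_wh):
        """Return pixel coordinates for a world-space point, or None if it's behind the wearer."""
        # Transform the point from world coordinates into head (camera) coordinates.
        point_head = head_rotation.T @ (point_world - head_position)
        if point_head[2] <= 0:               # behind the wearer; nothing to draw
            return None
        # Simple pinhole projection onto the display plane.
        u = focal_px * point_head[0] / point_head[2] + screen_wh[0] / 2
        v = focal_px * point_head[1] / point_head[2] + screen_wh[1] / 2
        return (float(u), float(v))

    # Example: a store entrance 20 feet ahead and 3 feet to the right.
    entrance = np.array([3.0, 0.0, 20.0])
    head_pos = np.array([0.0, 0.0, 0.0])
    head_rot = np.eye(3)                     # looking straight down the street
    print(project_label(entrance, head_pos, head_rot, focal_px=800, screen_wh=(1280, 720)))
    # -> (760.0, 360.0): draw the label there on the translucent panel

The point of the sketch is that once the model is on the device and the pose is known, placing the overlay is cheap local geometry; the heavy lifting of building and distributing the model can happen ahead of time.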

There are many who would argue that you can’t build 3D models of the entire world, and that’s probably true, but it’s also true that you wouldn’t have to. First, for worker empowerment, you’d likely need a combination of city models for sales-call support and facility models for worker activity support, and companies could build these themselves with available laser-scanning and mapping technology. Second, the majority of the opportunity to serve consumers with data overlays on reality would come from retail areas, because retailers’ interest in using the concept to market more effectively would be the biggest source of revenue for providers.

There could be a lot of money in this. My modeling says that a combination of AR technology and the contextual IoT information used to support it could improve productivity enough to increase business IT spending by an average of about 12% per year over a ten-year investment cycle. It’s proving difficult to model the consumer opportunity, but I’d estimate it at several hundred billion dollars. The problem is that a lot of moving parts would have to be addressed to get the job done, and there’s no concerted effort to do that. We’ll need one eventually, if we want to see AR become as pervasive and valuable as it can be.