Why We Should Be Augmenting Augmented Reality

Everyone has heard of augmented reality, and it’s often mentioned as a driver for everything from 5G to new phone technology.  There’s no question that it could create a whole new relationship between us (as workers and consumers) and our technology, but the space is a lot more complicated than just putting on some kind of special goggles.  In fact, augmented reality ends up linking almost every hot technology topic (including AI), and to see why we need to examine the three requirements for useful augmented reality.

To start off, we should reflect on the Wikipedia definition for augmented reality: “Augmented reality (AR) is an interactive experience of a real-world environment whose elements are “augmented” by computer-generated perceptual information, sometimes across multiple sensory modalities, including visual, auditory, haptic, somatosensory, and olfactory. The overlaid sensory information can be constructive (i.e. additive to the natural environment) or destructive (i.e. masking of the natural environment) and is seamlessly interwoven with the physical world such that it is perceived as an immersive aspect of the real environment.  In this way, augmented reality alters one’s ongoing perception of a real-world environment, whereas virtual reality completely replaces the user’s real-world environment with a simulated one.”

The missions for augmented reality are broad.  We already see some use in improving visual perception for those with limitations in that area.  There is broad interest in having augmented reality play a role in self-driving vehicles, and even in everyday shopping, sightseeing, and most of all, working.  A number of companies are experimenting with the use of augmented reality as a means of offering worker or even customer support for specialized tasks.

We see a lot of virtual reality today in gaming, and there are some examples of augmented reality too, but they’re more limited in number, user base, and scope.  The reason is that augmented reality is a lot harder to do, and the three “R’s” that set the requirements framework explain why.

The first requirement is that it has to be responsive.  Reality is real-time, and so any augmentation of reality has to be as well.  That means that the artificial part of the visual field has to track the real part.  If the system employs a display for both real-world surroundings and augmentation, then that system has to track the real world without perceptible delays.

Responsiveness starts with the ability to present the “real-world” visual field in real time.  That could be automatic if the augmented reality device lets a user “see through” the display, but most of today’s products create the entire augmented reality view from a camera and image insertion.  There is a potential for delay in both models, but more in the latter because of the challenge of redisplaying the real-world image to follow a user as they move their head.  Remember that the system would have to offer a very wide field of view, perhaps not as wide as the human eye but surely well over 120 degrees, or there’s a risk of literal “tunnel vision” that could be dangerous for the user (and those around the user).
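
To put rough numbers on the problem, here’s a back-of-the-envelope sketch in Python; every figure in it (head speed, latency, field of view, resolution) is an illustrative assumption, not a measured spec.

```python
# Back-of-the-envelope: how far does an overlay drift off its
# real-world anchor during one interval of pipeline delay?
# Every number below is an illustrative assumption, not a product spec.

head_speed_deg_per_s = 100.0  # a brisk but ordinary head turn
latency_ms = 20.0             # camera-to-display ("motion-to-photon") delay
fov_deg = 120.0               # assumed horizontal field of view
display_width_px = 2000      # assumed horizontal resolution

drift_deg = head_speed_deg_per_s * (latency_ms / 1000.0)
drift_px = drift_deg * (display_width_px / fov_deg)

print(f"Angular drift: {drift_deg:.1f} degrees")
print(f"Roughly {drift_px:.0f} pixels of mis-registration")
# About 2 degrees, or 33 pixels: easily perceptible, which is why
# the camera-and-redisplay model is the harder one to get right.
```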

The responsiveness problem is exacerbated by the need to visually link augmentation with the reality piece.  A label on a building can’t float in space as the user’s head turns, then race to catch up with the object it’s associated with.  However, figuring out what is in the field of view and where exactly it is located within it is far from trivial.  In pure virtual reality the entire view is constructed so there’s no need to synchronize with the real world.
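
To see what that synchronization involves even in the simplest case, here’s a yaw-only toy sketch; real systems track full six-degree-of-freedom head pose and depth, and every name and parameter below is my own invention.

```python
import math

def label_screen_x(object_bearing_deg, head_yaw_deg,
                   fov_deg=120.0, display_width_px=2000):
    """Horizontal screen position for a world-anchored label.

    A yaw-only toy: the label's position is recomputed each frame
    from the object's fixed bearing, so it stays glued to the
    building rather than to the display. Returns None when the
    object is outside the field of view.
    """
    # Signed angle from the center of gaze to the object, in (-180, 180]
    offset = (object_bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2:
        return None  # not visible; draw nothing rather than let it float
    # Linear angle-to-pixel mapping (ignores lens distortion)
    return (offset / fov_deg + 0.5) * display_width_px

# The restaurant sits at a fixed bearing of 75 degrees; as the head
# turns, its label tracks the restaurant, not the screen.
for yaw in (30.0, 60.0, 75.0, 150.0):
    print(yaw, label_screen_x(75.0, yaw))
```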

The second requirement is that it has to be relevant.  This is nearly as important a criterion as responsiveness, because augmented reality seems to imply positive or valuable additions.  The visual sense is our most powerful, the one that (for those without sight impairments) establishes our place in the world.  Imagine the effect of a bunch of image clutter that has little or nothing to do with what we see and what we’re trying to do.

What makes this the most difficult of the criteria to meet is its total subjectivity.  Relevance is (no pun intended) in the eye of the beholder.  The key to achieving it is placing the augmented reality into context, meaning an understanding not only of the current real-world visual field, but also of the mission of the user/wearer.  At a minimum, a good system would have to be able to accept explicit missions or mindsets as a guide to what to display.

If you’re shopping, you look for stores.  If you’re sightseeing, you look for landmarks, and if you’re hungry you look for restaurants.  To clutter a view with all of this at one time would be to either render the real world invisible behind the mask of created imagery, or to render the imagery virtually useless because of the size limits the sheer mass of information would impose.  This kind of mission-awareness is the first and most critical step to contextualizing augmented reality for relevance, but it’s not the end of the story.
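
A minimal sketch of that mission-awareness might look like this; the mission names and categories are hypothetical stand-ins for what a real system would have to learn or be told.

```python
# Minimal mission-aware filtering. Mission names and POI categories
# are hypothetical stand-ins for what a real system would learn.

POINTS_OF_INTEREST = [
    {"name": "Joe's Diner",     "category": "restaurant"},
    {"name": "City Museum",     "category": "landmark"},
    {"name": "Corner Hardware", "category": "store"},
    {"name": "Bella Trattoria", "category": "restaurant"},
]

MISSION_CATEGORIES = {
    "shopping":    {"store"},
    "sightseeing": {"landmark"},
    "hungry":      {"restaurant"},
}

def augmentations_for(mission, pois):
    """Show only what the declared mission makes relevant."""
    wanted = MISSION_CATEGORIES.get(mission, set())
    return [p["name"] for p in pois if p["category"] in wanted]

print(augmentations_for("hungry", POINTS_OF_INTEREST))
# ["Joe's Diner", "Bella Trattoria"]: two labels, not a wall of clutter
```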

Obviously, looking for restaurants isn’t helpful if I can’t find any, which means that I’d also need to understand the nature of the businesses that sit within my visual field.  This understanding could come about only through accurate geolocation of my position and a knowledge of the locations of other points of relevance, like restaurants, stores, and landmarks.  You could get that from a geo-database, something like what Google provides, or you could get it from an “information field”.
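
Whichever source supplies the locations, the geometry is the same: given my position and heading, decide which points of interest actually fall within my visual field.  A rough sketch, with hypothetical coordinates:

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def in_visual_field(user_lat, user_lon, heading_deg,
                    poi_lat, poi_lon, fov_deg=120.0):
    """True if the POI's bearing falls inside the user's field of view."""
    offset = (bearing_deg(user_lat, user_lon, poi_lat, poi_lon)
              - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_deg / 2

# Hypothetical coordinates: user near Times Square, facing due north.
print(in_visual_field(40.7580, -73.9855, 0.0, 40.7640, -73.9820))  # True
print(in_visual_field(40.7580, -73.9855, 0.0, 40.7520, -73.9855))  # False (behind)
```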

Those of you who have read my own IoT views know that “information fields” are contextualized repositories of information that intersect with a user’s own “information field” as the user moves around or changes goals.  Unlike geo-databases, which would require someone to host them (and would likely be available only as a service), information fields would be asserted by companies or individuals, and each would likely represent a hosted event-driven process.  An augmented reality user would assert a field, and so would anything that wanted explicit representation in users’ augmented reality displays.
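
Since information fields are my own construct, any code for them is speculative, but here’s one way to picture the mechanics: an asserted field carries a position, a reach, and some topics, and the user’s field is matched against it as an event.

```python
from dataclasses import dataclass

# Everything below is invented for illustration; information fields
# have no standard implementation, which is exactly the problem the
# next paragraph raises.

@dataclass
class InformationField:
    owner: str
    lat: float
    lon: float
    radius_m: float
    topics: set     # e.g. {"restaurant", "menu", "wait-time"}
    payload: str    # what gets rendered if the fields intersect

@dataclass
class UserField:
    lat: float
    lon: float
    interests: set  # derived from the current mission

def intersect(user, field, approx_m_per_deg=111_000):
    """Event-style match: does the asserted field reach the user, and
    does it carry anything the user's own field currently cares about?"""
    # Crude flat-earth distance; adequate at city scale only.
    d_m = (((field.lat - user.lat) ** 2
            + (field.lon - user.lon) ** 2) ** 0.5) * approx_m_per_deg
    return d_m <= field.radius_m and bool(field.topics & user.interests)

diner = InformationField("Joe's Diner", 40.7583, -73.9851, 150,
                         {"restaurant"}, "Joe's Diner: 10 min wait")
me = UserField(40.7580, -73.9855, {"restaurant"})
if intersect(me, diner):
    print(diner.payload)
```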

The problem with my model is that you’d need a standard framework for the interaction of information fields with events and with each other.  Frankly, I think that neither IoT nor augmented reality can be fully successful without one, but it’s hard for me to see how a standards body could take up something like that and get the job done in time to influence the market.  Might a vendor, perhaps a cloud provider, offer an open strategy?  They could, and both Amazon and Google would have a lot to gain in the process.  Of the two, I think Amazon is the more likely mover, since Google already has (to support its Maps service) the geo-database that would likely be the foundation of the alternative to information fields.  But more on this later.

Requirement three may seem almost contradictory: an augmented reality system has to be restrictive.  It has to get out of the way under special circumstances, where the augmentation might put the user at risk in the real world.  It’s obviously critical that a user walking down a New York sidewalk not end up falling into an open subway grate or walking into traffic.  It’s equally obvious that a commercially exploitative model of augmented reality could realistically be expected to pepper the visual field with ads.
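
In implementation terms, that suggests a hard safety gate sitting between every augmentation source and the display, something like the sketch below; the hazard detector is a stub, since real detection of traffic or open grates is a serious computer-vision problem in its own right.

```python
# A hard safety gate between augmentation sources and the display.
# The hazard detector is a stub, not a real vision pipeline.

def hazard_detected(scene):
    """Stub: in a real system, a vision model scanning the scene."""
    return scene.get("hazard", False)

def compose_frame(scene, overlays, max_overlays=5):
    if hazard_detected(scene):
        return []  # clear the view entirely: reality wins
    # Even when safe, cap density so ads can't wallpaper the world.
    return sorted(overlays, key=lambda o: o["priority"])[:max_overlays]

overlays = [{"label": "Joe's Diner", "priority": 1},
            {"label": "50% OFF SHOES", "priority": 9}]
print(compose_frame({"hazard": True}, overlays))   # []: get out of the way
print(compose_frame({"hazard": False}, overlays))  # both fit under the cap
```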

I’ve talked with some augmented reality researchers who say that experience shows that the difference between “clutter” and “augmentation” is subtle.  This is one place where AI could come into play, because it would be very valuable for an augmented reality system to learn from user management of the density of the augmented part of the display, and enforce similar densities under similar conditions.
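
One deliberately simple version of that learning loop might look like this:

```python
from collections import defaultdict

# Record the overlay density the user dials in for each context, and
# reuse it when the context recurs. "Context" here is just a string
# key; a real system would need a far richer representation.

class DensityModel:
    def __init__(self, default=5):
        self.default = default
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def record(self, context, chosen_density):
        """The user just adjusted density in this context; learn from it."""
        self.sums[context] += chosen_density
        self.counts[context] += 1

    def limit(self, context):
        """Enforce a similar density under similar conditions."""
        if self.counts[context] == 0:
            return self.default
        return round(self.sums[context] / self.counts[context])

model = DensityModel()
model.record("walking/downtown", 3)
model.record("walking/downtown", 5)
print(model.limit("walking/downtown"))  # 4: the learned preference
print(model.limit("driving/highway"))   # 5: no data yet, use the default
```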

In this area, and in others as well, a big part of the benefit of augmented reality depends on AI.  The problem is that too many things are going on in the real world for the user to frame missions that filter and label them all.  You need machine learning at the least, and better yet a mechanism to predict what will be valuable.  Cull through recent web searches, texts, emails, and so forth, and you could (as an AI agent process) take a good guess at what a user might be doing at any moment, and by knowing that, provide better contextual support and augmented reality relevance.
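
A deliberately naive sketch of that prediction step, just to make the data flow concrete; keyword scoring stands in for the trained model a real agent would use.

```python
# Keyword scoring stands in for a trained model; the keyword lists
# are illustrative only.

MISSION_KEYWORDS = {
    "hungry":      {"restaurant", "lunch", "menu", "reservation"},
    "shopping":    {"sale", "store", "buy", "price"},
    "sightseeing": {"museum", "tour", "landmark", "tickets"},
}

def guess_mission(recent_activity):
    """recent_activity: strings culled from searches, texts, emails."""
    words = {w.lower().strip(".,!?") for text in recent_activity
             for w in text.split()}
    scores = {m: len(words & kws) for m, kws in MISSION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

activity = ["best lunch near Bryant Park",
            "meet you at the restaurant at 1?"]
print(guess_mission(activity))  # "hungry": preload restaurant overlays
```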

Let me go back now to information fields.  This sounds like a totally off-the-wall concept, but it’s really the model of IoT that fits the Internet the best.  An information field can be analogous to a website, a place where information is stored.  The user, via another information field, can be visualized as a browser doing web-surfing.  Not by explicit clicking, but by moving, looking, living.  Every movement and action, glance and request, surfs the “information field web”.  Much of this approach could be implemented via a combination of event processing and the current Internet web-server model.
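
Pushing the analogy one step further, the dispatch loop would resemble a browser’s request cycle; again, every name below is illustrative, since nothing standard exists yet.

```python
# Every movement or glance becomes an event issued against the
# asserted fields, much as a browser issues requests against web
# servers.

class Field:
    def __init__(self, topic, payload):
        self.topic, self.payload = topic, payload

    def matches(self, event):
        # Each field decides for itself whether the event is relevant.
        return self.topic in event.get("interests", set())

    def render(self, event):
        return self.payload

def on_user_event(event, fields):
    """Dispatch one movement/glance event; collect whatever responds."""
    return [f.render(event) for f in fields if f.matches(event)]

fields = [Field("restaurant", "Joe's Diner: 10 min wait"),
          Field("store", "Corner Hardware: open until 9")]
print(on_user_event({"type": "moved", "interests": {"restaurant"}}, fields))
```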

When the World Wide Web came along, it was a technology-driven revolution in online thinking.  That was possible because a self-contained development opened a lot of doors.  Our challenge today is that most of those easy wins have already been won.  Today’s new technology, including augmented reality, is part of a complex implicit ecosystem whose individual parts have to be pulled together somehow to create a glorious whole.  Should we be thinking in those terms?  Surely.