What’s Required for True Autonomy in Vehicles or Robots?

I’ve done a couple of recent blogs on the elements of “contextual” services, and drawn on driving and walking as examples of places where they could be applied. This can, I learned, lead to intermingling the notion of “contextual” with another term, “autonomous”, and I want to clarify the separation.

“Autonomous” is a popular term these days, usually related to what might also be called “artificial intelligence” but with a more specific focus. “Autonomous” seems to mean “the ability to do things in the real world without direct human control, but presuming some supervision and intervention.” We have relatively little high-level material available on the topic, even though one form of it (robotics) has been around for decades. I actually diddled around with robotics back then, and since, and I want to offer my own views on “autonomy” and how it probably needs to be implemented. I also want to tie it to artificial intelligence and even metaverse concepts.

To me, an autonomous-something starts with the presumption that there exists a “something” that’s controlled directly by humans. That’s obviously true of autonomous vehicles, for example, and from the first I’ve believed that the best way to address the design of something autonomous is to start with the human-controlled version, and with how human control works at a fundamental level. This latter thing can be understood best by looking at the ultimate human-controlled thing, the human.

When I started looking at this problem in the 1990s, it was my view that we could divide our “control system” into three layers. At the top was the mission layer, which represented the overall goal. “I’m driving to Cleveland” is an example of a mission. Next was the action layer, which determined what to do as the mission interacted with the perceptive process layer, a layer that paralleled and fed all three. “I’m turning onto the Pennsylvania Turnpike” is an example of an action. The final layer, also linked to the perceptive process layer, was the reaction layer, which represented imperative overrides of mission and action based on perceptions. “The car in front of me has stopped suddenly” is a reaction-layer issue.

My assumption was that the layers interacted among themselves, but I later decided that this interaction could only take place between adjacent layers (for those who followed my ExperiaSphere project, that’s why I specified that objects could only interact with subordinate/superior objects in the hierarchy). Interactions between adjacent layers, and with the perceptive process layer (which is adjacent to at least the action and reaction layers), could trigger the need for a control response.
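
To make that concrete, here’s a minimal sketch of the layer stack in Python. Everything in it (the class names, the message wording, the print-based “perception”) is my own illustration, not a real control framework; the point is just that each layer talks only to its immediate neighbors, while the perceptive process layer feeds the layers directly.

```python
# A minimal sketch of the layered control model described above. All names
# here are my own illustrations, not a real framework.

class Layer:
    """A control layer that may only talk to its adjacent layers."""
    def __init__(self, name):
        self.name = name
        self.superior = None     # layer above (closer to the mission)
        self.subordinate = None  # layer below (closer to the hardware)

    def link_below(self, lower):
        """Wire two adjacent layers; non-adjacent layers never see each other."""
        self.subordinate = lower
        lower.superior = self

    def send_down(self, message):
        if self.subordinate:
            self.subordinate.receive_from_above(self.name, message)

    def send_up(self, message):
        if self.superior:
            self.superior.receive_from_below(self.name, message)

    def receive_from_above(self, sender, message):
        print(f"{self.name}: got '{message}' from {sender}")

    def receive_from_below(self, sender, message):
        print(f"{self.name}: got '{message}' from {sender}")

    def receive_perception(self, event):
        print(f"{self.name}: perception reports '{event}'")


# The three control layers, stacked so only adjacent pairs interact.
mission, action, reaction = Layer("mission"), Layer("action"), Layer("reaction")
mission.link_below(action)
action.link_below(reaction)

# The perceptive process layer runs alongside and feeds the layers directly.
def perceive(event, layers):
    for layer in layers:
        layer.receive_perception(event)

mission.send_down("leg: take I-80 east to the Pennsylvania Turnpike entrance")
perceive("traffic slowing ahead", [action, reaction])
```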

The easiest way to understand this is to take something like our drive to Cleveland as an example. The mission layer is where trip and route planning take place, meaning this is the layer that knows, at a high level, where we’re going. At the start of the trip, the mission layer would signal the action layer to undertake the first portion of travel, up to the next point where an action (a turn, for example) would be needed. At that point, the action layer would be signaled to make the turn (with adequate warning, of course). The trip, signaled as a series of specific “legs”, would complete using the same sequence.

It’s possible that traffic or road construction, signaled to the mission layer from the perceptive process layer, would require restructuring the route. In that case, the mission layer would send a new leg to the action layer, and the route change would then control movement.
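
A sketch of that mission-to-action flow, with made-up route data and an invented re-planning rule, might look like this:

```python
# Sketch of the mission layer dispatching a trip as a sequence of legs, and
# re-planning when the perceptive process layer reports a problem. The route
# data and the re-planning rule are invented for illustration.

route_to_cleveland = [
    "leg 1: local roads to the PA Turnpike entrance",
    "leg 2: PA Turnpike west to the I-76/I-80 interchange",
    "leg 3: I-80 west toward Cleveland",
]

def drive(route, perception_events):
    legs = list(route)
    while legs:
        current = legs.pop(0)
        print(f"mission -> action: {current}")
        # Perception may signal a condition on this leg that forces a re-plan.
        event = perception_events.get(current)
        if event:
            print(f"perception -> mission: {event}; restructuring route")
            legs = ["detour leg: alternate route around the problem"] + legs

drive(route_to_cleveland,
      {"leg 2: PA Turnpike west to the I-76/I-80 interchange":
       "construction closure reported"})
```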

At this point, our autonomous vehicle would look exactly like a human-driven car with a GPS. Where things get complicated is where our perceptive process layer signals the reaction layer, and there’s more to that than not hitting the car in front of you.

The highest priority behavior in an autonomous system is damage prevention and control, and avoiding something obviously falls into that category. Hitting the brakes is an obvious response, and in many cases would be an automatic assumption. However, what if conditions are icy? Suppose there isn’t time to stop at the current speed? We could assume that road conditions, stopping distance, and so forth could be used as inputs to a decision process. Even maneuvering might be automatic, provided that the vehicle had “situational awareness”, which is clearly a product of a sophisticated perceptive process layer. I envision the reaction layer maintaining a list of possible reactions based on that situational awareness, including what to do for a flat, how to steer safely off the road, and more.
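
As a sketch of the kind of decision I mean, here’s a toy reaction-layer choice. The friction values, the stopping-distance approximation (the standard v²/2μg formula), and the maneuver options are all illustrative assumptions, not calibrated engineering.

```python
# Toy reaction-layer decision: brake if we can stop in time, otherwise look
# for a maneuver. Friction values and maneuvers are illustrative only.

def stopping_distance_m(speed_mps, friction):
    # Standard kinematic approximation: v^2 / (2 * mu * g)
    return speed_mps ** 2 / (2.0 * friction * 9.81)

def choose_reaction(speed_mps, gap_m, road="dry", clear_shoulder=False):
    friction = {"dry": 0.7, "wet": 0.4, "icy": 0.15}[road]
    if stopping_distance_m(speed_mps, friction) < gap_m:
        return "brake to a stop"
    if clear_shoulder:
        return "steer onto the shoulder while braking"
    return "maximum braking and alert the occupant"

# At 25 m/s (about 56 mph) with 60 m of space:
print(choose_reaction(25, 60, road="dry", clear_shoulder=True))  # brake to a stop
print(choose_reaction(25, 60, road="icy", clear_shoulder=True))  # steer onto the shoulder while braking
```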

Obstacles aren’t the only complication, either. Suppose you can’t make the next turn because the road is closed, but there’s no real-time notification of the sort GPS systems get (or Waze delivers through crowdsourcing)? Could we have a computer vision system sophisticated enough to interpret a road-closed condition? Eventually, probably, but in the meantime what’s needed is human input: either a human signal that the road is closed (given verbally, perhaps) or a situation where human control overrides the next action.
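
Something like the sketch below captures the override idea; the signal source (voice, a button, a remote operator) is deliberately left abstract, and the phrasing of the signals is invented.

```python
# Sketch of a human override path: if the vehicle can't confirm a road-closed
# condition itself, a human signal replaces the planned action.

def next_action(planned_action, human_signal=None):
    if human_signal == "road closed ahead":
        return "hold position and request a re-plan from the mission layer"
    if human_signal == "take manual control":
        return "release control to the human driver"
    return planned_action

print(next_action("turn right onto Route 22"))
print(next_action("turn right onto Route 22", human_signal="road closed ahead"))
```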

Even terrain has an impact, particularly if your autonomous-something is an anthropomorphic robot or an off-road vehicle. You have to regulate speed in general, but your perceptive process layer may also have to measure incline, bumps, and the like. The more input you can get on conditions from an external source like GPS and a map, the better.
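
A toy version of that terrain regulation might look like the following; the incline and roughness thresholds are invented for illustration.

```python
# Sketch of terrain-aware speed regulation: the perceptive process layer (or a
# map/GPS source) supplies incline and roughness, and the action layer caps
# speed accordingly. Thresholds and scaling factors are invented.

def terrain_speed_cap(base_speed_mps, incline_deg, roughness):
    cap = base_speed_mps
    if abs(incline_deg) > 10:
        cap *= 0.6           # steep grade: slow down
    if roughness > 0.5:      # 0 = smooth pavement, 1 = very rough
        cap *= 0.5
    return cap

print(terrain_speed_cap(20.0, incline_deg=12, roughness=0.7))  # 6.0 m/s
```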

A final perceptive process layer requirement is addressing a malfunction of some sort. Loss of engine power, a flat tire, or another vehicular issue is a fairly routine happening, and the response to one would be similar to the response to an obstacle, qualified by taking into account the possible impact of the malfunction on the maneuvering required. It’s also possible that the autonomous-operations system itself could fail, and a complete failure would require a failover to manual operation. To avoid that, I presumed that each layer in the process described would be implemented by an independent device connected via a local interface, and that every layer would have to be able to take action based on cached information should one of its adjacent layers fail.
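
Here’s a rough sketch of that fail-safe behavior, with an invented heartbeat timeout standing in for whatever real failure detection would be used.

```python
# Sketch of the fail-safe idea: each layer caches the last instruction it got
# from its superior and falls back to it (or to a safe default) if that layer
# stops responding. The heartbeat timeout is an invented detail.

import time

class ResilientLayer:
    def __init__(self, name, safe_default):
        self.name = name
        self.safe_default = safe_default
        self.cached_instruction = None
        self.last_heard_from_superior = time.monotonic()

    def receive(self, instruction):
        self.cached_instruction = instruction
        self.last_heard_from_superior = time.monotonic()

    def current_instruction(self, timeout_s=2.0):
        if time.monotonic() - self.last_heard_from_superior > timeout_s:
            # Superior layer appears to have failed: act on cached data,
            # or fall back to the safe default if nothing is cached.
            return self.cached_instruction or self.safe_default
        return self.cached_instruction

action = ResilientLayer("action", safe_default="slow and pull over safely")
action.receive("continue on current leg at posted speed")
print(action.current_instruction())
```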

As I dug through the details of how this might be implemented, I realized I needed another layer, which I’ll call the “autonomic” layer, to abstract the actual motion-creating hardware systems from the higher-level structure. Without this, things become too brittle if you have to make changes in the hardware systems. You also need a “perception abstraction” layer to do the same thing for the perceptive process layer, to account for changes in sensor technology and the way you’d receive information from the outside.
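
In code terms, I think of those abstraction layers as stable interfaces that the control layers program against, so a hardware or sensor change means writing a new adapter rather than new control logic. The interface and class names below are purely illustrative.

```python
# Sketch of the "autonomic" and "perception abstraction" idea: the upper
# layers program against stable interfaces, and swapping hardware or sensors
# means a new adapter, not changed control logic.

from abc import ABC, abstractmethod

class MotionHardware(ABC):
    """Autonomic-layer interface to whatever actually moves the vehicle."""
    @abstractmethod
    def set_speed(self, mps: float): ...
    @abstractmethod
    def set_steering(self, degrees: float): ...

class PerceptionSource(ABC):
    """Perception-abstraction interface over whatever sensors are fitted."""
    @abstractmethod
    def obstacles(self) -> list: ...

class DriveByWire(MotionHardware):
    def set_speed(self, mps): print(f"drive-by-wire: speed {mps} m/s")
    def set_steering(self, degrees): print(f"drive-by-wire: steer {degrees} deg")

class LidarCameraFusion(PerceptionSource):
    def obstacles(self): return ["stopped car, 40 m ahead"]

# The action/reaction layers only ever see the abstract interfaces.
def react(perception: PerceptionSource, hardware: MotionHardware):
    if perception.obstacles():
        hardware.set_speed(0.0)

react(LidarCameraFusion(), DriveByWire())
```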

What about “the outside”? Could we assume that autonomous operation was a cloud or edge computing function? I never believed that; you really need most of the intelligence resident locally to ensure that nothing awful happens if you lose connection with the outside world. I think you’d have to assume a local map and GPS were available, and that current conditions (from the perceptive process layer) were cached for use as a last resort.
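
A sketch of that “local first” posture: the vehicle always operates from its local cache, and a cloud or edge connection does nothing more than refresh it. The cache structure here is invented.

```python
# Sketch of "local first": the vehicle works from its local map and cached
# conditions, and the cloud/edge connection only refreshes that cache.

local_cache = {
    "map_version": "2024-06",
    "last_known_conditions": "clear, dry",
    "current_leg": "I-80 west toward Cleveland",
}

def refresh_from_cloud(cache, cloud_update=None):
    if cloud_update is None:
        # Connection lost: keep operating on what we already have.
        return cache
    cache.update(cloud_update)
    return cache

refresh_from_cloud(local_cache, None)  # lost connectivity: no change, no stall
refresh_from_cloud(local_cache, {"last_known_conditions": "rain ahead"})
print(local_cache["last_known_conditions"])
```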

Does this all sound complicated? It sure does, because it is. It illustrates that true, full autonomy in vehicles is likely to lead to safety issues, because it’s not possible to make the systems truly autonomous within current technology limits, particularly given the need to be fully redundant at all levels and to ensure that a fallback to human control is always possible. The value of an autonomous vehicle that needs a driver is limited, and if the vehicle is supposed to be autonomous, what are the chances that any driver involved is alert enough to take control in an emergency?

What was missing in my robotic experiments was a way of dealing with complexity, and that’s where new concepts like AI and the metaverse could play a big role. AI can support two important things: optimization and anticipation. When we move about in the world we live in, we instinctively optimize our decisions on movement and even body position to the situation. That optimization has to take into account our own capabilities (and limitations), the situation of the moment, and the way we see that situation evolving in the near term. That’s where anticipation comes in; we look ahead. You can’t catch a ball or dodge an object without being able to project where it’s going.
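
At its simplest, anticipation is projection. The sketch below does nothing more than constant-velocity extrapolation, which is far cruder than what a real system would need, but it shows the “look ahead” idea.

```python
# Sketch of "anticipation" as simple projection: given recent positions of an
# object, estimate its velocity and project where it will be shortly.

def project(positions, dt, horizon_s):
    """positions: the last two (x, y) observations, taken dt seconds apart."""
    (x0, y0), (x1, y1) = positions
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return (x1 + vx * horizon_s, y1 + vy * horizon_s)

# A ball seen at (0, 0) and then (1, 2) half a second later, projected 1 s ahead:
print(project([(0.0, 0.0), (1.0, 2.0)], dt=0.5, horizon_s=1.0))  # (3.0, 6.0)
```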

The metaverse concept is just as important, because effective AI means that we have a model of the aspects of the current real-world situation and can use that model to forecast what’s going to happen, and even to manage our optimized response. We could model the path of a ball, for example, to show whether it would intercept an autonomous element or not. Whether our goal was to catch or to dodge, the knowledge of the path could be used to activate movement controls to accomplish our goal. But the more complex the real-world situation, the more challenging it is to model it and to apply AI to the model to support realistic, useful, autonomous operation.
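
As a toy example of that kind of model, the sketch below steps a projected path forward, checks whether it enters our own footprint, and picks a response based on whether the goal is to catch or to dodge. The geometry and the decision rule are mine, purely for illustration.

```python
# Sketch of using a simple world model: step the projected path of an object
# forward, check whether it enters our footprint, and choose a response based
# on the goal (catch or dodge). Toy-scale geometry, invented decision rule.

def path_intersects(start, velocity, footprint, steps=50, dt=0.1):
    (x, y), (vx, vy) = start, velocity
    xmin, xmax, ymin, ymax = footprint
    for _ in range(steps):
        x, y = x + vx * dt, y + vy * dt
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return True
    return False

def respond(start, velocity, footprint, goal="dodge"):
    if not path_intersects(start, velocity, footprint):
        return "no action needed" if goal == "dodge" else "move to intercept"
    return "hold position and catch" if goal == "catch" else "move out of the path"

our_footprint = (-1.0, 1.0, -1.0, 1.0)  # xmin, xmax, ymin, ymax
print(respond((5.0, 5.0), (-1.0, -1.0), our_footprint, goal="dodge"))
```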

You might wonder how my contextual information fields could play into this. My view is that they would impact the mission and action layers and become an input to the perceptive process layer, but not the lower layers. I think it would be a mistake to have issues like obstacle avoidance handled through a lot of back-and-forth cooperative exchanges. However, I can see that in collision avoidance, for example, it might be helpful to coordinate which direction two vehicles elect to steer, so that they don’t end up setting up another collision.
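
One way to get that coordination without a lot of negotiation is a shared convention that both vehicles can compute independently. The tie-break rule in the sketch below is invented, but it shows the idea.

```python
# Sketch of coordinated avoidance: two vehicles sharing a contextual channel
# apply a simple shared convention so they never steer into each other.
# The tie-break rule (lower ID steers right) is invented for illustration.

def avoidance_directions(vehicle_a_id, vehicle_b_id):
    # Deterministic rule both sides can compute without negotiation.
    if vehicle_a_id < vehicle_b_id:
        return {vehicle_a_id: "steer right", vehicle_b_id: "steer left"}
    return {vehicle_a_id: "steer left", vehicle_b_id: "steer right"}

print(avoidance_directions("CAR-017", "CAR-342"))
# {'CAR-017': 'steer right', 'CAR-342': 'steer left'}
```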

All this is complicated, but it doesn’t mean that autonomous operation and robots can’t work. The key is simplification of the environment in which the autonomous-something is operating. A robotic vacuum doesn’t require all this sophistication because even if it fails or runs amok, it can’t really do serious damage to somebody, and fail-safe operation consists of stopping all movement immediately if something goes wrong. Vehicles operating in warehouses or closed environments can also be safe without all the issues I’ve cited here. We should expect to walk before we run, in autonomy, and that suggests that self-driving vehicles may be more of a risk to the concept than an indication it’s advancing. The real world is a complicated place, and in autonomous operation, complicated may often mean too complicated.