We Can’t Put Off Thinking About Latency!

If latency is important, just what constitutes “good” and “bad” latency levels?  How does latency figure into network and application design, and what are the sources of latency that we can best control?  I’ve talked about latency before, but over the last couple of months I’ve been collecting information that should let me be a bit more precise, more quantitative.

Latency is a factor in two rather different ways.  In “control” applications, latency determines the length of the “control loop”, which is the time it takes for an event to be recognized and acted on.  In transactional and information-transfer applications, latency determines the time it takes to acknowledge the transfer of something.  The difference is important because the impact of latency in these two areas is very different.

Control-loop latency is best understood by relating it to human reaction time.  People react to different sensory stimuli differently, but for a visual stimulus the average reaction time is about 250 milliseconds.  Auditory reaction time is shorter, at about 170ms, and touch is the shortest of all at about 150ms.  In control-loop processes whose behavior can be related to human behavior (automation, for example), these figures represent the maximum latency that could be introduced without creating a perception of delay.

Transactional or information-transfer latency is much more variable, because the former can be related to human reaction time while the latter is purely a system reaction.  Online transaction processing and data entry can be shown to be just as latency-sensitive as a control loop.  Years ago, I developed a data entry application that required workers to achieve a high entry speed.  We found that they were actually able to enter data faster if they were not shown the prompts on the screen, because they tended to read and absorb the prompts even when experience had already taught them the order of field entry.  But information-transfer latency can be worse; if messages are sent at the pace that acknowledgments can be received, even latencies of less than 50ms can impact application performance.

The sources of latency in an actual networked application are just as complex, maybe more so.  There is what can be called “initiation latency”, which represents the time it takes to convert a real-world condition into an event.  Then we have “transmission latency”, which is the time it takes to get an event to or from the processing point, and then the “process latency”, which is the cumulative delay in actually processing an event through whatever number of stages are defined.  Finally, we have “termination latency”, which is the delay in activating the control system that creates the real-world reaction.

The problem we tend to have in dealing with latency is rooted in the tendency to simplify things by omitting one, or even most, of the sources of latency in a discussion.  For example, if you send an event from an IoT device on 4G to a serverless element in the public cloud, you might experience a total delay of 300ms (the average reported to me by a dozen enterprises who have tested serverless).  If 5G can reduce latency by 75%, as some have proposed, does that mean I could see my latency drop to 75ms?  No, because 200 of the 300ms of latency is associated with the serverless load-and-process delay.  Only 100ms is due to the network connection, so the most I could hope for is a drop to 225ms, meaning 200ms of process delay plus a quarter of the 100ms of network delay.
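To make the point concrete, here’s a back-of-the-envelope sketch in Python of the latency sources described above, using the 300ms/200ms/100ms figures from the serverless example.  The assumption that initiation and termination are negligible here is mine, purely to keep the arithmetic simple.

```python
# Total latency is the sum of its sources; the figures below come from the
# serverless example, with initiation and termination assumed negligible
# purely for illustration.

def total_latency(initiation, transmission, process, termination):
    return initiation + transmission + process + termination

baseline = dict(initiation=0, transmission=100, process=200, termination=0)
print(total_latency(**baseline))        # 300 ms

# A 75% cut in *network* latency only touches the transmission component,
# so the best case is 200 + 0.25 * 100 = 225 ms, not 0.25 * 300 = 75 ms.
faster_network = dict(baseline, transmission=baseline["transmission"] * 0.25)
print(total_latency(**faster_network))  # 225 ms
```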

The key point here is that you always have to separate “network” and “process” latencies, and expect new technology to impact only the area that the technology is changing.  IP networks with high-speed paths tend to have a low latency, so it’s very possible that the majority of network latency lies in the edge connection.  But mobile edge latency even without 5G averages only about 70ms (global average), compared to just under half that for wireline, and under 20ms for FTTH.  Processing latency varies according to a bunch of factors, and for application design it’s those factors that will likely dominate.

There are four factors associated with process latency, and they bear an interesting resemblance to the factors involved in latency overall.  First there’s “scheduling latency”, which is the delay in getting the event/message to the process point.  Second, there’s “deployment latency”, which is the time needed to put the target process in a runnable state.  Third is the actual process latency, and fourth the “return latency”, associated with getting the response back onto the network and onward to the target.  All of these can be influenced by application design and where and how things are deployed.
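Here’s a rough sketch of how those four factors might be measured around a single event, so you can see which one is actually eating the budget.  The handler, the cold-start stub, and the timings are hypothetical; the decomposition itself is the point.

```python
import time
from dataclasses import dataclass

@dataclass
class ProcessLatency:
    scheduling_ms: float    # time the event waited before a worker picked it up
    deployment_ms: float    # time to get the target process into a runnable state
    processing_ms: float    # the actual work
    return_ms: float        # getting the response back onto the network

    @property
    def total_ms(self) -> float:
        return self.scheduling_ms + self.deployment_ms + self.processing_ms + self.return_ms

def load_handler_state():
    time.sleep(0.05)        # stand-in for cold-start / image-load work

def send_response(result):
    pass                    # stand-in for putting the response on the wire

def handle_event(event, enqueued_at, handler, already_warm):
    picked_up = time.monotonic()
    scheduling = (picked_up - enqueued_at) * 1000

    start = time.monotonic()
    if not already_warm:
        load_handler_state()
    deployment = (time.monotonic() - start) * 1000

    start = time.monotonic()
    result = handler(event)
    processing = (time.monotonic() - start) * 1000

    start = time.monotonic()
    send_response(result)
    return_ms = (time.monotonic() - start) * 1000

    return ProcessLatency(scheduling, deployment, processing, return_ms)

# Example: a cold invocation shows up as deployment latency, not process latency.
report = handle_event("gate-event", time.monotonic(), lambda e: e.upper(), already_warm=False)
print(report, round(report.total_ms, 1))
```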

The practical reality in latency management is that it starts with a process hierarchy.  This reality has led to all sorts of hype around the concept of edge computing, and while there is an edge computing element involved in latency management, it’s in most cases not the kind of “edge-of-the-cloud” or “edge computing service” that we hear about.

The first step in latency management is to create a local event handler for the quick responses that make up most “real-time” demands.  Opening a gate based on the arrival of a truck, or the reading of an enabling RFID on a bumper, is a local matter.  Everything, in fact, is a “local matter” unless it either draws on a data source that can’t be locally maintained, or requires more processing than a local device can provide.  In IoT, this local event handler would likely be a small industrial computer, even a single-board computer (SBC).
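As a concrete (and entirely hypothetical) illustration, the local event handler running on that SBC at the gate might look something like this; the tag list, the gate function, and the hand-off call are all stand-ins.

```python
# A minimal local event handler of the kind an SBC at the gate might run.
# Everything that can be decided from locally held data stays local; only
# what needs deeper data or processing is handed off.

ALLOWED_TAGS = {"TRUCK-0042", "TRUCK-0187"}      # data pre-positioned locally

def on_rfid_read(tag_id: str) -> None:
    if tag_id in ALLOWED_TAGS:
        open_gate()                              # purely local, no network hop
    else:
        request_authorization(tag_id)            # defer to a deeper tier

def open_gate() -> None:
    print("gate: open")                          # stand-in for driving a relay or GPIO pin

def request_authorization(tag_id: str) -> None:
    print(f"handing {tag_id} off to the next processing tier")

on_rfid_read("TRUCK-0042")
```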

The goal is to place the local event handler just where the name suggests: local to the event source.  You don’t want it in the cloud or in a special edge data center, but right there, onboard a vehicle, in a factory, and so on.  The closer it is, the less latency is added to the most latency-critical tasks, because those are the tasks you’ll want to move to it.

In network terms, meaning virtual or cloud-network terms, you want to ensure that your local event handler is co-located with the network event source.  It can be literally in the same box, or in the same rack, cluster, or even data center.  What you’re looking for is to shorten the communications path, so you don’t eat up your delay budget moving stuff around.

The second step is to measure the delay budget of what cannot be handled locally.  Once you’ve put what you can inside a local event handler, nothing further can be done to reduce latency for the tasks assigned to it, so there’s no sense worrying about that stuff.  It’s what can’t be done locally that you have to consider.  For each “deeper” event interaction, there will be a latency budget associated with its processing.  What you’ll likely find is that event-handling tasks will fall into categories according to their delay budgets.

The local-control stuff should be seen as “real-time”, with latency budgets between 10 and 40ms, which is faster than even the quickest human reaction.  At the next level, the data I get from enterprises says the budget range is between 40 and 150ms, and most enterprises recognize a third level with a budget of 150 to 500ms.

In terms of architecture for latency-sensitive applications, this division suggests that you’d want a local controller (as local as you can make it) that hands off to another process that is resident and waiting.  The next level of the process could in theory be serverless, or consist of distributed microservices, or whatever, but it’s almost certain that this kind of structure, using today’s tools for orchestration and connectivity, couldn’t meet the budget requirements.  The data I have on cloud access suggests that it isn’t necessary for even the intermediary-stage (40-150ms) processing to sit in a special edge data center, only that it not be processed so distant from the local event handler that the hop latency gets excessive.
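A sketch of that kind of hierarchy, using the budget tiers quoted above as the routing rule, might look like this; the handler functions and event names are hypothetical.

```python
# Route each event class to the shallowest tier that can meet its latency
# budget: local controller up to 40ms, resident second-tier process up to
# 150ms, deeper cloud processing beyond that.

def route_event(event_type: str, budget_ms: float) -> str:
    if budget_ms <= 40:
        return handle_locally(event_type)        # local event handler, "real time"
    elif budget_ms <= 150:
        return handle_resident(event_type)       # resident, pre-warmed process nearby
    else:
        return handle_deep(event_type)           # cloud/serverless is tolerable here

def handle_locally(event_type):  return f"{event_type}: local event handler"
def handle_resident(event_type): return f"{event_type}: resident second-tier process"
def handle_deep(event_type):     return f"{event_type}: deeper cloud process"

for evt, budget in [("gate-open", 25), ("inventory-update", 120), ("billing-record", 400)]:
    print(route_event(evt, budget))
```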

The latency issue, then, is a lot more complicated than it seems.  5G isn’t going to solve it, nor will any other single development, because of the spread of sources.  However, there are some lessons that I think should be learned from all of this.

The first one is that we’re being too cavalier with modern orchestration, serverless, and service mesh technology as applied to things like IoT, or even protocol control planes.  Often these technologies will generate latencies far greater than even the third-level maximum of 500ms, and right now I’m doubtful that a true cloud implementation of microservice-based event handling using a service mesh could meet the second-level standard even under good conditions.  It would never meet the first-level standard, and serverless could be even worse.  We need to be thinking seriously about the fundamental latency of our cloud technologies, especially when we’re componentizing the event path.

The second lesson is that application design that creates a series of hierarchical control-loop paths is critical if there’s to be any hope of a responsive event-driven application.  You need “cutoff points” where you stage processing so events can be answered at that point rather than passed deeper.  That may involve prepositioning data in digested form, but it might also mean “anticipatory triggers” in applications like IoT.  If you have to look up a truck’s bill of lading, you don’t wait until it presents itself at the gate to the loading dock.  Read the RFID on the access road on the way in, so you can just open the gate and direct the vehicle as needed.
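Here’s a small sketch of that anticipatory-trigger pattern; the cache, the lookup stub, and the tag values are hypothetical, but the shape is the point: the slow lookup happens on the access road, and the gate event is answered from local data.

```python
# Pre-position the bill of lading when the truck is read on the access road,
# so the latency-critical dock-gate event can be answered entirely locally.

bill_of_lading_cache: dict[str, dict] = {}

def fetch_bill_of_lading(tag_id: str) -> dict:
    return {"dock": 7, "orders": ["PO-1234"]}    # stand-in for a slow, deep lookup

def on_access_road_rfid(tag_id: str) -> None:
    # Fired well before the truck reaches the dock, so a deep lookup here
    # costs nothing against the control-loop budget at the gate.
    bill_of_lading_cache[tag_id] = fetch_bill_of_lading(tag_id)

def on_dock_gate_rfid(tag_id: str) -> None:
    bol = bill_of_lading_cache.get(tag_id)
    if bol is not None:
        print(f"open gate, direct truck to dock {bol['dock']}")
    else:
        print("hold truck and fetch the bill of lading on demand")

on_access_road_rfid("TRUCK-0042")
on_dock_gate_rfid("TRUCK-0042")
```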

The third lesson is that, as always, we’re oversimplifying.  We are not going to build any new technology for any new application or mission using 500-word clickbait as our guiding light.  Buyers need to understand a technology innovation well enough to define their business case and assess the implementation process to manage risks.  It’s getting hard to do that, both because technology issues are getting more complex, and because our resources are becoming more superficial by the day.

I’ve done a lot of recent work in assessing the architectures of distributed applications, especially cloud and cloud-native ones.  What I’ve found is that there isn’t nearly enough attention being paid to the length of control loops, the QoE of users, or the latency impact of componentization and event workflows.  I think we’re still, in an architectural sense, transitioning between the monolithic age and the distributed age, and cloud-native is creating a push for change that may be in danger of outrunning our experience.  I’m not saying we need to slow down, but I am saying we need to take software architecture and the deployment environment for cloud-native very seriously.