Are We Focused on the “Wrong” Latency Sources?

Does lower latency automatically improve transaction processing?  That may sound like an esoteric question, but the answer may determine just how far edge computing can go.  It could also help us understand what network-infrastructure advances like 5G would mean for mobile edge computing (MEC), and even what kind of edge-computing stimulus we might expect to see from microservices and the cloud.

“Latency” is the term used to describe the time it takes to move something from its point of origin to its point of action.  It’s a modern replacement for (and effectively a subset of) the older popular term, “round-trip delay”.  We also sometimes see the term “response time” used as an indicator of the time between when a user does something at a user-interface device and when they receive a response.  The basic premise of networking and IT alike is that reducing latency will enhance the user’s quality of experience (QoE).  Like all generalizations (yes, perhaps including this one!), it’s got its exceptions.

When we talk about latency in 5G, and combine it with MEC, what we’re doing is suggesting that by lowering the transit delay between message source and processing point, either by improving network handling or by moving the processing point closer to the user, we can improve QoE and often productivity.  The reason this discussion has started to get attention is that it’s becoming clear that things like self-driving cars (which don’t really have much to do with latency in any case) are not jumping out of showrooms and onto the roads.  If 5G latency and MEC are to gain credibility from latency reduction, we need an application that benefits from it, and can be expected to deploy on a very large scale.

Everything we need to know about 5G and MEC latency benefits can be summed up in one word: workflows.  The user’s perception of a response comes from a number of factors, but those factors ultimately come down to the flow of information between handling or processing points, from the user to the workflow’s logical end, and then back.  We forget this all the time when talking about latency, and we should never, ever, forget it.

Let’s take a silly example.  Suppose we have a 5G on-ramp to a satellite phone connection.  We have 5G latency to the uplink, which might be just a few milliseconds, and then we have a path via a geostationary satellite that’s a minimum of about 44 thousand miles long (up to the satellite and back down), then a return trip of the same length, and then the 5G leg again.  The round trip via the satellite is 88 thousand miles, which at the speed of light (over the air only, with no allowance for relay points) would take about 470 milliseconds.  The point is that, in comparison with that satellite hop, nothing likely to happen terrestrially is going to make a meaningful difference.
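
Here’s a back-of-the-envelope version of that arithmetic, a minimal Python sketch assuming a geostationary satellite, speed-of-light propagation, and a 4ms 5G leg purely for comparison:

    # Speed-of-light delay for the satellite round trip versus a terrestrial 5G leg.
    SPEED_OF_LIGHT_MI_PER_S = 186_282          # miles per second, in a vacuum

    one_way_via_satellite_mi = 44_000          # up to geostationary orbit and back down, one direction
    round_trip_mi = 2 * one_way_via_satellite_mi

    satellite_delay_ms = round_trip_mi / SPEED_OF_LIGHT_MI_PER_S * 1000
    five_g_round_trip_ms = 2 * 4               # assumed 4ms each way on the 5G on-ramp

    print(f"Satellite round trip: {satellite_delay_ms:.0f} ms")   # roughly 470 ms
    print(f"5G on-ramp round trip: {five_g_round_trip_ms} ms")    # a rounding error by comparison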

We can apply this to transaction processing too.  Suppose we have a transaction created on a mobile phone, one that hops to a cloud location for front-end processing and then on to a data center for the final transaction processing.  The hop from the phone to the Internet might take 10 milliseconds, and then we might have an average of about 60ms to get to the cloud.  The hop to the data center might consume another 60ms, and then we have processing delay (disk, etc.) of about 100ms.  At this point, we go back to the phone via the same route.  Total “round-trip” delay is 360ms (2 x 130ms of transit, plus 100ms of processing in the data center).  This is our baseline latency.

Suppose now that we adopt 5G, which drops our phone-to-Internet latency from 10ms down to perhaps 4ms.  Saving 6ms in each direction knocks 12ms off our 360ms round trip, which I submit would be invisible to any user.  What this says is that 5G latency improvement is significant only in applications where other sources of delay are minimal.  In most cases, just getting to the processing point and back is going to obliterate any 5G differences.
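
To make that budget explicit, here’s a minimal Python sketch using the illustrative figures above (10ms access, 60ms to the cloud, 60ms to the data center, 100ms of processing) rather than any measured values:

    def round_trip_ms(access_ms, cloud_ms, dc_ms, processing_ms):
        # Each transit leg is traversed twice (out and back); processing happens once.
        one_way_transit = access_ms + cloud_ms + dc_ms
        return 2 * one_way_transit + processing_ms

    baseline = round_trip_ms(access_ms=10, cloud_ms=60, dc_ms=60, processing_ms=100)
    with_5g = round_trip_ms(access_ms=4, cloud_ms=60, dc_ms=60, processing_ms=100)

    print(f"Baseline round trip: {baseline} ms")             # 360 ms
    print(f"With 5G access:      {with_5g} ms")              # 348 ms
    print(f"Improvement:         {baseline - with_5g} ms")   # 12 ms, about 3 percent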

This, of course, is where edge computing is supposed to come in.  If we move processing closer to the point of transaction generation, we eliminate a lot of handling.  However, if we go back to our example, the total “transit latency” in our picture (everything except the 100ms of processing) is only 260ms.  Edge computing couldn’t eliminate all of that, but it could likely cut it to less than 50ms.  Whether that’s significant depends on the application.  For transaction processing, the roughly 210ms eliminated is at least marginally perceptible.  For closed-loop control applications, it could be significant.
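
Extending the same sketch, assume edge hosting collapses everything but the processing delay into a short local hop; the sub-50ms transit figure is the assumption from the paragraph above, not a measurement:

    processing_ms = 100

    baseline_transit_ms = 2 * (10 + 60 + 60)   # 260 ms of pure transit in the original picture
    edge_transit_ms = 50                       # assumed: edge hosting cuts transit to under 50 ms

    print(f"Baseline total: {baseline_transit_ms + processing_ms} ms")    # 360 ms
    print(f"Edge total:     {edge_transit_ms + processing_ms} ms")        # about 150 ms
    print(f"Transit saved:  {baseline_transit_ms - edge_transit_ms} ms")  # roughly 210 ms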

But there’s another point to consider.  If we look at edge computing as it is today, we find that it’s implemented most often via one of two architectural models.  The first model is the local controller model, where an event is cycled through a local edge element as part of a closed-loop system.  That’s really more “local” computing than “edge” computing.  The second model is the cloud edge, and that’s the one we need to look at.

The cloud-edge model says that in a transaction dialog, there are some interactions that don’t need to connect all the way to the database and transaction-processing element.  Complex GUIs could create some, and so could the parts of the transaction that edit the data being entered, perhaps even based on accessing a fairly static database.  If we push these to the edge, we reduce latency for the “simple” things, which are exactly the things whose delays a user is most likely to be annoyed by.  After all, all I did was fill in a form, not update something!
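
As a sketch of that split (the names and request shapes here are hypothetical, not any particular framework): an edge front-end that handles field validation against a locally cached, fairly static dataset, and forwards only the actual update to the distant transaction system:

    # Hypothetical edge front-end: answer "cheap" interactions locally, forward the rest.
    CACHED_POSTAL_CODES = {"10001", "60601", "94105"}   # stand-in for a fairly static database replicated to the edge

    def forward_to_data_center(request):
        # Placeholder for the long-haul call that pays the full transit latency described above.
        return {"status": "committed"}

    def handle_request(request):
        if request["type"] == "validate_field":
            # Editing and validation never leave the edge, so only local latency applies.
            ok = request["value"] in CACHED_POSTAL_CODES
            return {"status": "ok" if ok else "invalid"}
        if request["type"] == "commit_transaction":
            # Only the final update travels to the data center.
            return forward_to_data_center(request)
        return {"status": "unknown_request_type"}

    print(handle_request({"type": "validate_field", "value": "94105"}))   # handled entirely at the edge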

But this raises what’s perhaps the biggest issue of latency in transaction processing in our cloud and cloud-native era: the interior workflow.  Any multi-component application has to pass work among its components, and that passage of work creates workflows that don’t directly involve the user.  Their nature and number depend on how the application is architected; a user wouldn’t even be aware they existed, but they could introduce a lot of latency.

This may be the most compelling problem of cloud-native applications.  If we presume that we adopt a fully microservice-ized application model with a nice service mesh to link it, we’ve taken a whole bunch of interior workflows and put them into our overall transaction workflow.  In addition, we’ve added in the logic needed to locate a given component, perhaps scale it or instantiate it…you get the picture.  It’s not uncommon for interior flows in service meshes to generate a hundred milliseconds of latency per hop.  A dozen hops and you have a very noticeable overall delay.
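
A toy calculation shows how quickly that compounds; the per-hop cost is the hundred milliseconds cited above and the exterior round trip is the earlier 360ms baseline, both illustrative rather than measured:

    per_hop_ms = 100          # assumed interior-workflow cost per service-mesh hop
    interior_hops = 12        # "a dozen hops"

    interior_latency_ms = interior_hops * per_hop_ms
    exterior_round_trip_ms = 360   # the baseline transaction budget from earlier

    print(f"Interior workflows add: {interior_latency_ms} ms")                           # 1200 ms
    print(f"User-visible total:     {exterior_round_trip_ms + interior_latency_ms} ms")  # over 1.5 seconds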

What this means is that dealing with “latency” in the radio network (5G) or in the first logic step (edge or local computing) doesn’t mean you’ve dealt with latency overall.  You have to follow the workflows, and when you follow them into a modern service mesh and cloud-native deployment, you may find that other latency sources swamp what you’ve been worried about.

It also means that we need to think a lot more about the latency of the network(s) that will underlie a service mesh.  There’s a lot of queuing and handling in an IP network, and we should ask ourselves whether there’s a way of reducing it so that we can hop around a mesh efficiently.  We also need to think about making service meshes themselves highly efficient in terms of latency (the same is true of serverless computing, by the way).  Otherwise we could see early cloud-native attempts underperform.