5G, Edge Computing, and the Transactional-versus-Event Debate

I mentioned in yesterday’s blog that there was a profound difference between open-model network software based on the transactional or RESTful model, and software designed to be event-driven. I also said that the difference was critical in selecting the platform software tools to be used to support applications, particularly at the edge. What is the difference, and why does it matter so much? We’ll look at that today.

Let’s start by relating APIs to software models, which is essential because APIs facilitate inter-component communications in software; they don’t define how the software works. My concern is that the publicity that’s been given to “cloud-native” or “event-driven” APIs could foster a belief that all that’s necessary to make software cloud- or event-based is to use those APIs. The opposite is true; the software architecture has to be where we start. The value of a discussion on APIs is that once the architecture is framed out, it’s the APIs that connect the work, and so we can visualize applications through information movement, and thus through APIs.

Today’s common software model is transactional, meaning the software is designed to support request/response interactions. You have a “user” and a “resource”, and you send a request from user to resource, invoking a response. Absent a request, nothing is sent. The user doesn’t have to “wait” for a response, meaning do nothing until one arrives, but that’s a common approach, and it’s known as “synchronous” use. There is an option with RESTful APIs to use an asynchronous call, by defining a separate “status resource” that can be queried to get the status of a request, rather than waiting for a response.

Synchronous software design is a bit of a risk because the requester is stopped until a response is received. Asynchronous design prevents that, but the requester still has to keep checking on the status of their request, which is sometimes called “polling”. That’s wasteful, and it can create convoluted software logic if there are multiple things that might be pending.
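
To make the cost of polling concrete, here is a minimal sketch in Python. It assumes nothing about any real REST library: StatusResource is a hypothetical in-memory stand-in for a “status resource”, and the names get_status and poll_until_done are mine, chosen only for illustration.

```python
class StatusResource:
    """Hypothetical in-memory stand-in for a REST "status resource"."""

    def __init__(self, polls_until_done):
        # Number of queries that will still report "pending".
        self._remaining = polls_until_done

    def get_status(self):
        # In a real system each call would be a full request/response round trip.
        if self._remaining > 0:
            self._remaining -= 1
            return "pending"
        return "done"


def poll_until_done(resource, max_polls=10):
    """Query the status resource repeatedly; return how many queries it took."""
    for attempt in range(1, max_polls + 1):
        if resource.get_status() == "done":
            return attempt
    raise TimeoutError("request still pending after max_polls queries")


# The request is reported as pending three times before it completes,
# so three of the four queries carried no useful information.
polls_used = poll_until_done(StatusResource(polls_until_done=3))
```

Every “pending” answer is a wasted exchange, and a requester tracking several outstanding requests would need a loop like this (or one combined loop with bookkeeping) for each of them.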

Event-based systems presume that stuff is happening that needs attention, and that a “happening” can generate an event, which is simply a notification. A software component would normally register interest in a particular kind of event (or events), in which case the event system would post the event to them when it happened. There is no response expected in an event-based system, so the notion of synchronous/asynchronous doesn’t apply. In a simple event-based system, you process the events as they come, but most event-based systems are really event-driven, and they’re not simple.
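
The registration idea can be sketched in a few lines of Python. This is an illustration under my own assumptions, not any particular product’s API: the EventBus class, its register and post methods, and the “link_down” and “link_up” event names are all hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical minimal event system: components register, events are posted."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event_type, handler):
        # A component declares interest in one kind of event.
        self._handlers[event_type].append(handler)

    def post(self, event_type, payload):
        # Deliver the notification to every registered component.
        # Nothing is returned: the poster expects no response.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


received = []
bus = EventBus()
bus.register("link_down", lambda payload: received.append(("monitor", payload)))
bus.register("link_down", lambda payload: received.append(("logger", payload)))

bus.post("link_down", "port-7")  # delivered to both registered components
bus.post("link_up", "port-7")    # nobody registered interest, so nothing happens
```

Note that the poster never learns who, if anyone, handled the event. That is the key contrast with the request/response model.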

Protocols, management systems, and other systems that process events usually have to recognize more than one event type. For example, an event might report something was up, or down. Or, it might report a command was processed, or failed. Because event-based systems aren’t request/response oriented, each event has to be related to conditions overall in some way. The same is true, for example, in human conversation. We process speech in the context of the conversation and the conditions in which it occurs, and events have to be processed the same way.

Context, in event-based systems, is known as state. Most systems that relate to the real world are stateful, and stateful systems have to interpret events in context, meaning according to the current state. To return to our example, a command to activate a connection, issued when the connection is already in the Active state or in Recovery (for example) is an error. In the Inactive state, it’s valid, and should be interpreted as instructions to make the connection active.

The problem with transactional frameworks in real-time systems lies in part with the pace of change. Transactional systems, faced with a lot of change, will queue up things until they can catch up. Because transactional software is usually designed with limited ability to scale capacity, the queuing delay could be considerable, which means that there may be multiple things in the queue relating to the same real-world process, and the current transaction may have sat a while too. The state of the system may have changed during all this sitting around.

This is a particular problem in service management systems and even provisioning systems, where software has to accept external commands (traditionally recognized as transactions) and at the same time reflect the state of the stuff that’s being managed, which is traditionally an event-driven process. Often the software will divide itself into two pieces, which increases cost and complexity. Why not simply use event-based systems for everything?

With event-based systems, each event triggers a state/event table lookup and dispatches a process. There can be only one state/event table for any given real-time system that’s being managed by the software, but any number of processes could use the table (with proper locking) and the state/event intersection’s process could be any instance of that process, if the process is designed to get all its data from the table and the data model that contains it. Further, only a little process needs to be run, something that would likely take little time.
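
Here is a rough Python sketch of that state/event dispatch, using the connection example from earlier. The Inactive, Active, and Recovery states and the activate command come from that example; the “fault” and “restored” events, the process names, and the dictionary layout of the data model are my own assumptions for illustration.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "Inactive"
    ACTIVE = "Active"
    RECOVERY = "Recovery"


class Event(Enum):
    ACTIVATE = "activate"
    FAULT = "fault"        # assumed event, not from the text
    RESTORED = "restored"  # assumed event, not from the text


# Each process is small and keeps no data of its own. Everything it needs
# is in the record it is handed, so any instance could run it.
def start_connection(record):
    record["log"].append("activating")
    return State.ACTIVE


def begin_recovery(record):
    record["log"].append("recovering")
    return State.RECOVERY


def finish_recovery(record):
    record["log"].append("restored")
    return State.ACTIVE


def reject(record):
    record["log"].append("error: event not valid in state " + record["state"].value)
    return record["state"]  # state is unchanged


# The state/event table: each valid intersection names a process.
TABLE = {
    (State.INACTIVE, Event.ACTIVATE): start_connection,
    (State.ACTIVE, Event.FAULT): begin_recovery,
    (State.RECOVERY, Event.RESTORED): finish_recovery,
}


def dispatch(record, event):
    """Look up the state/event intersection, run its process, store the new state."""
    process = TABLE.get((record["state"], event), reject)
    record["state"] = process(record)
    return record["state"]


connection = {"state": State.INACTIVE, "log": []}
dispatch(connection, Event.ACTIVATE)  # valid: Inactive becomes Active
dispatch(connection, Event.ACTIVATE)  # error: already Active
dispatch(connection, Event.FAULT)     # Active becomes Recovery
dispatch(connection, Event.RESTORED)  # Recovery becomes Active
```

The sketch leaves out the locking a shared table would need, but it shows why scaling is easy: the processes are short, and because all context lives in the record, it doesn’t matter which instance handles a given event.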

Event-based systems can process transactions as events, with the state/event tables providing the context. In fact, cloud computing in serverless form does this, and many of the social-media applications we run regularly use event-driven software to support what looks to us like transactional behavior. So event-driven software can behave transactionally, but it’s extremely difficult to get transactional software to behave in event-based applications.
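
As a simplified Python sketch of how a transaction can ride on events, a request can be posted as one event and its response as another, matched by an identifier. The event layout, the handler names, and the use of an “id” field for correlation are all assumptions of mine, not a description of how any serverless platform actually works.

```python
from collections import deque

queue = deque()  # events waiting to be processed
replies = {}     # request id mapped to the response body


def handle_request(event):
    # The "transaction" arrives as an ordinary event. Its answer is not
    # returned to a waiting caller; it is posted as another event.
    queue.append({"type": "response", "id": event["id"], "body": event["body"].upper()})


def handle_response(event):
    # The id ties the response back to the request that caused it.
    replies[event["id"]] = event["body"]


HANDLERS = {"request": handle_request, "response": handle_response}


def run():
    """Process events until the queue is empty."""
    while queue:
        event = queue.popleft()
        HANDLERS[event["type"]](event)


queue.append({"type": "request", "id": 1, "body": "status?"})
run()
```

To the user this looks like a request followed by a response, but nothing in the software ever stopped to wait.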

The notion that 5G and O-RAN hosting is an on-ramp for edge computing deployment makes this transaction-versus-event issue critical. Whatever edge applications may be, the reason they’re typically classified as “edge” versus “cloud” applications is latency, which at least strongly implies that the applications have a time-critical link to the real world. The real world is event-driven, and that’s a simple truth. Edge computing has to be, too.

To circle all this back to 5G and O-RAN and network lifecycle operations in general, almost everything that needs to be done in these areas is naturally a state/event activity. Every protocol is that, and the great majority of protocol handlers are implemented that way. Every interface point defined in 5G or O-RAN should be viewed as a protocol handler, and thus as an event-driven application. That’s apparently a problem for operators, because all their initiatives on virtualization and management have gone the other way.

NFV and ONAP are both designed as transactional systems and not as event-driven systems. That means that they’re not readily adapted to the kind of elastic adaptation to workloads or faults that characterizes cloud-native design. The question is whether vendors, who aren’t shy about making their own cloud-native claims, are going to slog down that same path. That’s a particular risk for 5G because the 5G specs tend to call out NFV implementation, which literally interpreted would take the software in a sub-optimal direction.

Both Red Hat and VMware have made recent O-RAN announcements, and both retain an NFV flavor in their story. That may be inevitable given the direction the 5G/O-RAN specs have taken, but you can support NFV while allowing alternative models. That may be critical, because while we can base 5G (imperfectly) on NFV, we can’t easily evolve NFV-and-ONAP-flavored practices to generalized edge computing applications. If one of 5G’s critical missions is to facilitate a transformation to edge computing, then we might expect one of these cloud-software giants to figure that out, and frame a realistic software model that better exploits the cloud. Whatever the edge is in technical terms, it’s functionally an extension of the cloud and not the network, and it’s got to work that way.

A 5G O-RAN model has to be, first and foremost, a cloud computing model, or we risk developing “edge resources” that aren’t applicable to future edge applications. It’s ironic that operators and network vendors, who should have more experience with event-driven applications, seem to be having the most problem with edge-think these days. They’re going to have to get over it if they want to continue to fend off the cloud-software competition.