What the Heck is an Event-Driven App?

What does an event-driven app look like?  That might seem to be a silly question, but as we move toward at least some level of realization of the Internet of Things (IoT), and as serverless cloud computing services aimed at event processing proliferate, it’s a question we need to answer.  Most developers, consultants, analysts, and media types know “applications” primarily from either the business-transaction side or the web side of the Internet.  Events are similar to both in some ways, and very different in others.

Transaction processing and web access are similar in the way they use resources.  Both are usually supported by a semi-fixed set of resources that host application/component instances.  I’m calling these “semi-fixed” because there is a specific amount of pre-positioned capacity, which might or might not be scalable with changes in workload.  The decision to scale is explicit, in that applications/components are designed to scale and scaling is normally invoked either by the applications themselves through work scheduling, or through a separate manager that recognizes load changes.

Event-driven systems are forking in this particular attribute.  On the one hand, many are written to utilize the same kind of pre-positioned assets as web/transactional apps.  Containers are a particularly strong way to host event processing because they add relatively little overhead to the applications themselves, compared with VMs that require their own copy of the operating system even for event processes that might be a couple dozen lines of code.  On the other hand, events in the cloud have been associated with serverless “functional”, “lambda” or “microservice” programming where a copy of an event process is only loaded when the event it’s designed to process enters the system.

With cloud-hosted event processes, the decision to scale is implicit because it’s presumed that a new event will spawn a new event-processor element.  The cloud approach to event processing is much more dynamic, scalable, and resilient than a pre-positioned component model of event processing.  Under current pricing, it can also be more expensive if the volume of events is high, because a high volume would justify dedicated, persistent resources instead.  The cloud model also brings to the fore one of the most critical issues with event-handling, which is context.

Web interactions are stateless; every HTTP event is processed for what it is and where it’s directed.  Transactional applications that involve multiple messages (query-update is an example) are typically handled by stateful processing that can keep track of where you are in the dialog.  Event-processing context is harder because it can be about the timing of a given type of event, the relationships between different types of events over time, and where you are in the overall flow.  Most of today’s event-processing systems are based either on state/event structures (“If I get an ‘Activate’ event in the ‘Ready’ state, then assign resources”) or on complex event processing (CEP) software.

Amazon’s Step Functions service implements a state machine, which means that developers define the specific “states” a process can be in.  When an event occurs, it is processed according to the logic associated with the current state, as defined in the state machine.  This lets developers build in context by defining how conditions progress a system through specific phases (states).  Processing an event in one state can set the “current state” so that the next event will be processed according to a different set of rules.
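To make that concrete, here’s a rough sketch of the idea, written in Python rather than Step Functions’ own JSON-based state language, and with hypothetical state, event, and handler names: each event is handled under the rules of the current state, and the handler sets the state under which the next event will be processed.

```python
# Minimal state/event sketch: each event is processed under the current
# state's rules, and the handler returns the next state.  The states,
# events, and handlers here are hypothetical illustrations.

def on_activate(event):
    print("assigning resources for", event["id"])
    return "Active"

def on_deactivate(event):
    print("releasing resources for", event["id"])
    return "Ready"

# {current state: {event type: handler}}
STATE_MACHINE = {
    "Ready":  {"Activate": on_activate},
    "Active": {"Deactivate": on_deactivate},
}

def process(state, event):
    return STATE_MACHINE[state][event["type"]](event)

state = "Ready"
state = process(state, {"type": "Activate", "id": "unit-42"})    # -> Active
state = process(state, {"type": "Deactivate", "id": "unit-42"})  # -> Ready
```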

State/event programming is familiar to people who build protocol handlers, since virtually all of them work on state/event principles.  In the original ExperiaSphere project, which built state machines (called “Experiams”), the implementation defined specific states and events, and the presumption was that every event had to be handled (even if the handling was to set an error termination) in every state.  This illustrates the challenges of developing state/event logic in any form; you have to understand the relationship between states and events and define the progressions that each event in each state would trigger.  Most people use a diagram consisting of ovals (representing states) and arrows (representing events that generate a transition to another state) to keep track of the progressions.
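That “handle every event in every state” discipline translates into a dispatcher that routes any state/event combination the designer hasn’t explicitly defined to an error termination rather than silently ignoring it.  This is a generic sketch of the principle, not the Experiam implementation, and the state and event names are again hypothetical.

```python
# A state/event table where undefined combinations aren't dropped but are
# routed to an explicit error termination, so every event is handled in
# every state.  States and events are hypothetical, not Experiam definitions.

def error_termination(state, event):
    print(f"unexpected {event} in state {state}")
    return "Error"

TRANSITIONS = {
    ("Ready",      "Activate"):   lambda s, e: "Activating",
    ("Activating", "ActiveAck"):  lambda s, e: "Active",
    ("Active",     "Deactivate"): lambda s, e: "Ready",
}

def dispatch(state, event):
    handler = TRANSITIONS.get((state, event), error_termination)
    return handler(state, event)

print(dispatch("Ready", "Activate"))   # -> Activating
print(dispatch("Ready", "ActiveAck"))  # -> Error (unhandled pair)
```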

One of the most challenging elements of any event-driven application is the notion of time.  There are two separate time-related aspects to event handling.  The first is chronology; events are significant in time, meaning that their precise sequence and interval are normally important.  That means most systems have time-stamp facilities to carry event timing information to processes.  Transactional systems may also provide timing information, but in most cases it doesn’t need to be precise or synchronized.  Event-driven systems may need to synchronize all event sources to a common clock.  The second is duration.  States aren’t black holes; a good state/event progression will ensure that the system can’t simply stall, waiting forever for something to happen.  This means that a timeout event is common: an event that signals that something expected has not arrived and that no more waiting will be allowed.
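Here’s a hedged sketch of both aspects: events carry timestamps so their sequence and spacing can be reconstructed, and a waiting state arms a timer that injects a synthetic “Timeout” event if the expected event never arrives.  The two-second window and the names are purely illustrative.

```python
import threading, time

# Hypothetical illustration: a waiting state arms a timer; if the expected
# event doesn't arrive within the window, a synthetic Timeout event is
# injected so the state machine can't stall forever.

def make_event(event_type, source):
    # Events carry a timestamp so sequence and interval can be reconstructed.
    return {"type": event_type, "source": source, "ts": time.time()}

class TimeoutGuard:
    def __init__(self, seconds, deliver):
        self._timer = threading.Timer(
            seconds, lambda: deliver(make_event("Timeout", "guard")))
        self._timer.start()

    def cancel(self):
        # Call this when the expected event arrives in time.
        self._timer.cancel()

def deliver(event):
    print("event:", event["type"], "at", event["ts"])

guard = TimeoutGuard(2.0, deliver)   # expect a reply within 2 seconds
# ... if the expected event shows up first, call guard.cancel()
time.sleep(2.5)                      # nothing arrives here, so Timeout fires
```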

No matter where you decide to run an event-driven app, this issue of states, events, and timing will be right there with you.  It’s the fundamental difference between event-driven programs and other kinds of applications, which makes it the hardest thing for new event-programming teams to handle.  The way that state/event progression is addressed may vary between hosted container or component implementations and “lambda” or functional-cloud implementations, but the principles are exactly the same.

CEP is a model that can complement or replace state/event implementations of event-driven apps.  With CEP, there’s a kind of correlation front-end that accepts policies or definitions of the relevant “raw” event sequences and generates process triggers from them.  In theory, a CEP front-end could eliminate the need for explicit state/event programming, but users of CEP say that most of their implementations of event-driven apps use CEP only to summarize raw events, to cut down on the complexity of later state/event processes.
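As an illustration of that “summarize raw events” role, here’s a rough sketch of a correlation front-end: a burst of over-threshold readings within a sliding window collapses into a single higher-level event for the downstream state/event logic.  The threshold, window, and event names are assumptions made for the example.

```python
from collections import deque
import time

# Hypothetical correlation front-end: several over-threshold readings within
# a sliding window are summarized into one "Overheat" event, so downstream
# state/event logic sees far fewer, more meaningful events.

WINDOW_SECONDS = 10.0
THRESHOLD = 80.0        # e.g. degrees C
REQUIRED_COUNT = 3

recent = deque()        # timestamps of recent over-threshold readings

def on_raw_reading(value, ts, emit):
    if value < THRESHOLD:
        return
    recent.append(ts)
    # Drop readings that have aged out of the window.
    while recent and ts - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= REQUIRED_COUNT:
        recent.clear()
        emit({"type": "Overheat", "ts": ts})   # one summary event

def emit(event):
    print("summary event:", event)

now = time.time()
for offset, value in [(0, 82.1), (2, 85.0), (4, 83.4)]:
    on_raw_reading(value, now + offset, emit)
```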

This opens the last of the issues of event-driven apps, which is event distribution.  Realistic event-driven systems tend to consist of four classes of components: event generators, which actually source the primary events in a system (sensors, for example); event distributors, which take a single primary event and distribute it to multiple parties based on some publish/subscribe or bus process; event processors, which actually receive and “handle” events; and, in some cases, process controllers, which event processors invoke to take real-world action.  All of this presumes some mechanism for event distribution, for which the event distributor processes are key.

Distributors would normally have a fixed association with an event generator or a series of related generators, which means that it’s likely they have a dedicated connection of some sort.  You don’t want to have event generators, which are the devices likely to be most numerous in an event-driven system, handling a lot of processing or supporting a lot of direct connections—it raises costs and security concerns.  The distributors would define both a mechanism for knowing about the event processors that wanted a given event, and the connectivity framework that supported the distribution itself.
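Here’s a minimal sketch of that distributor role, assuming a simple publish/subscribe scheme: processors register interest by event type, and the distributor fans each primary event out to every subscriber.  A real system would sit on a message bus or broker; the class and event names are illustrative.

```python
from collections import defaultdict

# Hypothetical publish/subscribe distributor: one primary event in, a copy
# fanned out to every processor that subscribed to that event type.

class EventDistributor:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, processor):
        # A processor declares interest in a given event type.
        self._subscribers[event_type].append(processor)

    def publish(self, event):
        # Fan a single primary event out to all interested processors.
        for processor in self._subscribers[event["type"]]:
            processor(event)

distributor = EventDistributor()
distributor.subscribe("DoorOpen", lambda e: print("alarm process got", e))
distributor.subscribe("DoorOpen", lambda e: print("audit process got", e))
distributor.publish({"type": "DoorOpen", "sensor": "door-7"})
```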

The overhead associated with an event distributor is important for two reasons.  First, many instances of the distributor are likely to be deployed, so you don’t want the process to be expensive to run.  Second, the distributor is in the primary path of events, and any latency it introduces adds to the “length of the control loop”, which is a way of describing the time interval between the generation of an event and the completion of the event processing.  It’s fair to say that the success of event-driven apps is linked explicitly to the effectiveness of event distributors.

This is why I believe that the focus of IoT on connecting event generators via cellular wireless is so illogical.  Not only should we be looking primarily at local low-cost connectivity for sensors (which is already the practice in the real world), we should be looking not at the service of connecting sensors but at the service of distributing events.  A network operator or cloud provider who had the right combination of edge computing and efficient event distribution could be the true, big, winner in the IoT race.

So why aren’t we seeing examples of this?  A big part, I think, is that in the financial markets of today, it’s only the current quarter that counts.  Things like 5G sensor connection are easy for the Street to understand, and easy to see as near-term revenue-generators.  Something like an edge-computing-and-event-distribution deployment looks like an enormous source of “first cost”, the dip in cash flow that accompanies a service that needs major infrastructure deployment before it generates any compensating revenue.  Another part is that the network operators, the most likely long-term players in an event-distribution market, aren’t software types and don’t see either the need or the architectural steps required to fulfill it.

I’m of the view that without some explicit exploitation of the event distribution opportunity, there will be no meaningful IoT beyond the simple orderly growth of the same sort of private applications of event processing and process control that we’ve seen for decades.  With a good strategy in place, IoT will meet many (perhaps, optimistically, even most) of the high expectations the market has set for it.  And somebody, or a small group of companies, will make some big bucks.