I’m always eager to look at anything on event-driven architectures, so a piece in The New Stack caught my attention. It focuses on “three myths” about EDA and debunking them, but there are points along the way that relate to the value of event-driven models for telecom software. That should be a critical topic right now in 5G and even edge computing, and it’s not getting the attention it deserves. I do have to point out that I think the article promotes its own mythology, so we’ll have to navigate through that.
My biggest problems with the article are that 1) it doesn’t really explain the rationale behind event-driven applications, and 2) it focuses on a specific subset of such applications, a subset where the biggest question is how to associate a process with an event in a serverless cloud model. That combination makes it less insightful than it could be, and it omits many of the things that make event-driven models ideal for network/service lifecycle management.
Event-driven systems recognize that real-world conditions can be described by relating new things that happen to the state of a system, as set by what happened before. We all use event-driven thinking to manage our lives; in fact, most of what we do, even conversation, is event-driven. That’s because life is contextual, meaning that things that happen are interpreted in context, based on what’s happened before.
An explicit requirement of event-driven systems is the concept of state, which is the condition of the system as determined by those past happenings. My first exposure to event-driven systems was writing protocol handlers, and for these, the “events” were messages from the other side of the connection, and the “state” reflected the progress of getting the connection working, exchanging information, or recovering from a problem. The way I did this, and the way almost all protocol handlers are written, is to establish a state/event table that defines, for each state, the processing to be done for a given event (in this case, a message).
Suppose for a moment that we didn’t want to do this, that we wanted something other than state/event processing. The fact that we’re not organizing processing around a state/event table doesn’t mean that the thing we’re doing isn’t state/event-oriented. In our protocol handler example, we might receive a “LINK-INIT” message to set up a connection. If we get such a message, can’t we just set the link up? No, because the link might already be set up and in the data-transfer state, in which case the message indicates our partner in communication has lost sync with us. So we’d have to check the link state by testing the variable that determines it.
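To make that concrete, here’s a minimal sketch of a state/event table for a toy protocol. The states, events, and handlers are hypothetical, invented for illustration rather than taken from the article or any real protocol, but notice that the same LINK-INIT event maps to different processes depending on the state it arrives in:

```python
# A minimal, hypothetical state/event table for a toy protocol handler.
# Each (state, event) pair names the process to run for that combination.

def start_link(link_vars):
    """LINK-INIT arriving on a down link: a genuine setup request."""
    link_vars["synced"] = True
    return "LINK_UP"                   # next state

def resync_link(link_vars):
    """LINK-INIT arriving mid-transfer: our partner has lost sync."""
    link_vars["synced"] = False
    return "RECOVERING"                # next state

def pass_data(link_vars):
    """Normal data transfer; stay where we are."""
    return "LINK_UP"

STATE_EVENT_TABLE = {
    ("LINK_DOWN", "LINK_INIT"): start_link,
    ("LINK_UP",   "LINK_INIT"): resync_link,  # same event, different state
    ("LINK_UP",   "DATA"):      pass_data,
}
```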
What’s wrong with that, you might ask? Well, that link state variable that we’re testing means that our processing of the LINK-INIT event is now stateful. If we invoke a process defined in a state/event table, and we pass it our link variables with the event, the process is stateless, which means we can spin it up when we need it, spin up many copies, and so forth. This is how cloud-native software is supposed to work.
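Continuing that sketch, the stateless version looks roughly like this: the current state and link variables arrive with the event, the table selects the process, and the updated state goes back to whatever store holds it. Because the process keeps nothing between calls, any copy can serve any event. All names, again, are hypothetical:

```python
def handle_event(event, current_state, link_vars):
    """A pure function of (state, event): it holds nothing between calls."""
    process = STATE_EVENT_TABLE[(current_state, event)]
    next_state = process(link_vars)
    return next_state, link_vars       # the caller persists both

# The state travels with the event instead of living inside the process,
# so we can spin up as many copies as the event load demands:
state, link_vars = handle_event("LINK_INIT", "LINK_DOWN", {})
print(state)                           # LINK_UP
```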
State/event programming isn’t something everyone does, and it’s also not something everyone understands. That leads to resistance, which leads to objections. Myth One in the article is that event-driven systems are difficult to manage. It’s true that this view is widely held, but the article doesn’t address the issue in general; it again focuses on specific cloud/serverless applications. It says that event brokers, which link events to the associated process, aren’t necessarily complicated. True, but we need to generalize the story for network and service lifecycle enthusiasts to get anything from it.
In a protocol handler, the event-broker equivalent is the processing of the event against the state/event table entries. People think this is difficult to manage because they think that it’s hard to address all the possible state/event combinations and define processes for them. Well, tough it out, people. If a system is naturally contextual, there is no alternative to looking at the possible contexts and how events relate to them. State/event tables make it pretty easy to see what might happen and how to handle it, because every possible state/event combination has to be assigned to a process. In a system that simply checks variables to determine state, there’s a risk that the software doesn’t anticipate all the possible variable states, and thus will fail in operation.
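As a hedged illustration of that last point, the table form lets you audit coverage mechanically, before the system ever runs. Using the hypothetical states and events from the earlier sketch:

```python
from itertools import product

STATES = ("LINK_DOWN", "LINK_UP", "RECOVERING")
EVENTS = ("LINK_INIT", "DATA", "DISCONNECT")

# Every state/event combination the table fails to assign is visible at
# build time -- unlike a missed variable test, which surfaces only when
# the system fails in operation.
missing = [pair for pair in product(STATES, EVENTS)
           if pair not in STATE_EVENT_TABLE]
print("Unassigned state/event combinations:", missing)
```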
Myth Two in the piece should probably have been Myth One, because everything comes down to this particular point: that event-driven systems are difficult to understand because they’re asynchronous and loosely coupled. Nice, neat, linear, transaction-like processing is so much easier. Yes, it is, for people who never did event-driven development, but as I’ve already pointed out, if the system is generating events that are interpreted based on state, making it transactional/synchronous and tightly coupled will make it stateful and unable to fully exploit the cloud. Do you want cloud benefits or not? If you do, then you’ve got to suck it up here too.
The thing the piece doesn’t note, but which is critical in networking, is that the event-driven, state/event-based approach may be the only practical way to handle complex services. Suppose a service has five functional sub-services, which is completely realistic given geographic and technology spread. These five elements have to cooperate to make the service work, and each of them is an independent subsystem under the service, so the service state is determined by the states of the five. In a state/event model, the service is a hierarchy, where a superior element is responsible for selecting and commissioning subordinates, and also for remediation should a subordinate fail and be unable to self-correct. It’s easy to represent this kind of structure with state/event tables in a service data model (the TMF did this elegantly with NGOSS Contract well over a decade ago). Think of how complicated it would be to test variables in processing an event when you had to consider not only your own state, but the states of superior and subordinate elements!
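Here’s a rough sketch of that hierarchy, loosely in the spirit of the NGOSS Contract idea rather than a reproduction of it; the states, events, and process names are all invented for illustration. The superior’s state is derived from its subordinates’ states, and each subordinate’s report is just another event against the superior’s own table:

```python
# A hypothetical superior element coordinating five subordinate
# sub-services, each an independent state/event machine of its own.

def service_state(sub_states):
    """The superior's state is a function of its subordinates' states."""
    if any(s == "FAILED" for s in sub_states):
        return "REMEDIATING"
    if all(s == "ACTIVE" for s in sub_states):
        return "OPERATING"
    return "ACTIVATING"

# Subordinate reports are events against the superior's table; the
# process names are placeholders for real lifecycle logic.
SERVICE_TABLE = {
    ("ACTIVATING",  "SUB_ACTIVE"): "check_if_all_active",
    ("OPERATING",   "SUB_FAILED"): "select_and_commission_replacement",
    ("REMEDIATING", "SUB_ACTIVE"): "resume_normal_operation",
    ("REMEDIATING", "SUB_FAILED"): "escalate_to_operator",
}

print(service_state(["ACTIVE", "ACTIVE", "FAILED", "ACTIVE", "ACTIVE"]))
```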
That leads to the last of our myths, which is that event-driven software is difficult to test and debug. As someone who’s written quite a bit of state/event-table software, I dispute that totally. It’s far easier to debug a hierarchy of state/event systems defined in a data model than to debug “spaghetti code” transactional logic that accomplishes the same thing. With a table and data model, everything you need is in one place. In transactional code, it’s spread all over the place.
The worst enterprise network failure I ever saw as a consultant, a major disaster in healthcare, puzzled vendor and user experts alike. I resolved it because I understood state/event processing and that the system that had failed was a state/event system. Others who didn’t happen to have experience in that area couldn’t find the problem, and there’s no way of knowing how bad things might have gotten had nobody who was called in happened to have written protocol handlers.
What is true is that for state/event handling to work, it has to be done right, and this is where the lack of experience in the approach can hurt. Let’s go back to human conversation. Suppose you’re told to do something. The first question is whether you’re already doing something else, or are unable to do it because of a lack of skill or resources. The second question is the steps involved in doing it; if the “something” involves linking multiple separate tasks, then each of them has to be organized and strung together in order. We do this all the time, and so all we’re doing in state/event systems is coding what we’ve lived. Yes, it’s a shift in thinking, but it’s not transformational. I’ve run development teams and taught state/event development to the members, and I never had a programmer who couldn’t understand it.
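Sketched as code, that conversational pattern might look like the fragment below; everything in it is invented purely to show the shape of the thinking:

```python
# "Being told to do something" as a state/event fragment (hypothetical).

def on_request(task_steps, my_state, have_resources):
    """First: am I free and able? Then: string the steps in order."""
    if my_state != "IDLE":
        return "DECLINED", []          # already doing something else
    if not have_resources:
        return "DECLINED", []          # lacking skill or resources
    # Each step becomes a state; completing one is the event that
    # advances us to the next.
    return "WORKING", list(task_steps)

print(on_request(["gather", "assemble", "verify"], "IDLE", True))
```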
I did have a lot who didn’t understand it at first, though, and I had a few who, when not supervised in the state/event approach, went back to old-style transactional code. Like most everything in software, event-driven systems have to be built around an architecture, one supported with the right tools and implemented to take full advantage of the benefits of the selected hosting environment, like the cloud.
For those who want to understand how a data-model-centric state/event system would work through the entire process, from service definition through use, I refer you to the tutorials on my ExperiaSphere work. Please note that all the ideas contained there are open, released by me for use without restriction, but the material itself and some of the terminology (like “ExperiaSphere”) are trademarks of CIMI Corporation, and you can use them only with permission. We won’t charge for that, but we will review your use to ensure that you are using the terms/materials in conformance with our rules, which are laid out at the start of each presentation.
One thing the article demonstrates is that the software industry may have too narrow a perspective on event-driven systems. It’s a bit of the classic “groping the elephant behind the screen” story; everyone sees the concept in relation to their own specific (and narrow) application. Event-driven systems are a fundamental change in software architecture, not because they’re new (they’ve been around for at least 40 years) but because the model has been confined to specialized applications. The cloud is now popularizing the approach, and a holistic appreciation of it, its benefits, and its risks is essential for proper use of the cloud.