Most of you will recall that there has been a persistent goal to make OSS/BSS “event-driven”. Suppose we were to accept that was the right approach. Could we then apply some of the edge-computing and IoT principles of software structure and organization of work to the OSS/BSS? Let’s take a look at what would happen if we did that.
The theoretical baseline for OSS/BSS event-driven modernization is the venerable “NGOSS Contract” notion, which describes how the service contract (modeled based on the TMF SID model) can act as a kind of steering mechanism to link service events to operations/management processes (using, by the way, Service Oriented Architecture or SOA principles). This concept is a major step forward in thinking about operations evolution, but it’s not been widely adopted, and in many ways it’s incomplete and behind the times.
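To make the NGOSS Contract idea concrete, here is a minimal sketch of contract-driven event steering: a service contract holds a table mapping service events to operations processes, and a steering function uses it to dispatch. All names, event types, and process names here are hypothetical illustrations, not anything drawn from the TMF SID itself.

```python
# Hypothetical contract data: maps (service, event type) to a process name.
CONTRACT_EVENT_MAP = {
    ("vpn-service", "link-down"): "restore_connectivity",
    ("vpn-service", "sla-violation"): "escalate_to_noc",
}

def restore_connectivity(event):
    # Stand-in for an operations process that reroutes traffic.
    return f"rerouting around {event['source']}"

def escalate_to_noc(event):
    # Stand-in for an operations process that opens a trouble ticket.
    return f"ticket opened for {event['source']}"

PROCESSES = {
    "restore_connectivity": restore_connectivity,
    "escalate_to_noc": escalate_to_noc,
}

def steer(service, event):
    """Use the contract's event map to select and invoke the handling process."""
    process_name = CONTRACT_EVENT_MAP.get((service, event["type"]))
    if process_name is None:
        return "unhandled"  # the contract has no entry for this event
    return PROCESSES[process_name](event)

print(steer("vpn-service", {"type": "link-down", "source": "paris-edge-3"}))
```

The point of the sketch is that the contract, not the code, decides which process sees which event; changing the service's behavior means changing data, not redeploying logic.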
The most obvious issue with the NGOSS Contract approach is that it doesn't address where the events come from. Today, most services are inherently multi-tenant with respect to infrastructure use, which means that a given infrastructure event might involve multiple services, or in some cases none at all. To make matters worse, most modern networks and all modern data centers have resource-level management and remediation processes that at the least supplement and at most replace service- or application-specific fault and performance management. The flow of events differs in each of the scenarios these event-handling approaches create.
The second problem is SOA. SOA principles don't dictate that a given "service" (which is an operations process in our discussion) be stateless, meaning that it doesn't store information between executions. It's the stateless property that lets you horizontally scale components under load, or replace them when they break, without interfering with operations. We now have software concepts that many believe have superseded (or will supersede) SOA: microservices and functional (Lambda) programming. Why would we "modernize" OSS/BSS using software concepts that are already being deprecated?
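The statelessness point can be shown in a few lines. In this sketch (all names are illustrative), the handler keeps no instance-local memory; all per-service state lives in an external store, so any replica can process any event and a failed instance can simply be discarded. A plain dict stands in for what would really be a database or state service.

```python
# Stand-in for an external state store (in practice: a database or
# state-management service, never memory inside the process itself).
STATE_STORE = {}

def handle_event(service_id, event):
    """Stateless handler: read prior state, compute, write new state back.

    Because nothing is retained between calls, this function can run on
    any replica, and replicas can be added or replaced freely.
    """
    state = STATE_STORE.get(service_id, {"faults": 0})
    if event == "fault":
        state = {"faults": state["faults"] + 1}
    STATE_STORE[service_id] = state
    return state

handle_event("svc-1", "fault")
handle_event("svc-1", "fault")  # could just as well run on a different replica
print(STATE_STORE["svc-1"]["faults"])
```

The design choice being illustrated: the process is pure "logic", the store is pure "state", and scaling or replacing the logic never touches the state.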
The third problem with the approach is harder to visualize—it’s distributability. I don’t mean that the software processes could be hosted anywhere, but that there is a specific architecture that lets operators strike a balance between keeping control loops short for some events, and retaining service-contract-focused control over event steering. If I have an event in Outer Oshkosh that I want to handle quickly, I can put a process there, but will that distribution of the process then defeat my notion of model-driven event steering? If I put the contract there, how will I support events in Paris efficiently? If I put the contract in multiple places, have I lost true central control because service state is now multiply represented?
Reconciling all of this isn't something that software principles like Lambda programming can fix by themselves. You have to go to the top of the application ladder, to the overall software architecture and the way that work flows and is organized. That really starts with the model that describes a network service or a cloud application as a distributed system of semi-autonomous components.
Outer Oshkosh and Paris, in my example, are administrative domains where we have a combination of raw event sources and processing resources. Each of these places makes a functional contribution to my service, and thus the first step in creating a unified, modern, event-driven OSS/BSS process is to model services based on functional contributions. There are natural points of function concentration in any service, created by user endpoints, workflow/traffic, or simply by the fact that there's a physical facility there to hold things. These should be recognized in the service model.
The follow-on point is that the function concentration points we model are also intent-based systems that have states and that both process and generate events. If something is happening in Paris or Outer Oshkosh that demands local event handling, then rather than forcing a central model to record the specifics of that handling, have a "local" model of the function behavior do it. A service, then, would have a model element representing each of these functions, and would define the event-to-process mappings not inside each of the functions (they're black boxes) but rather for the events those functions generate at the service level.
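Here's a small sketch of the black-box idea, with hypothetical names throughout: a local function model absorbs events it can remediate itself and escalates only a summary event upward, and the service-level model maps only those escalated events.

```python
class FunctionModel:
    """Local intent model for one function concentration point (e.g. 'paris').

    The service never sees the local events; it sees only the summary
    events this model chooses to emit.
    """
    def __init__(self, name):
        self.name = name
        self.state = "good"

    def local_event(self, event):
        if event == "port-flap":
            return None  # remediated locally; nothing escalates
        if event == "node-down":
            self.state = "degraded"
            # Escalate a summary event describing function behavior,
            # not the raw fault.
            return {"source": self.name, "type": "function-degraded"}
        return None

# Service-level mapping: only the events functions *generate* are mapped.
SERVICE_EVENT_MAP = {"function-degraded": "replan_service"}

paris = FunctionModel("paris")
assert paris.local_event("port-flap") is None  # absorbed inside the black box
escalated = paris.local_event("node-down")
print(SERVICE_EVENT_MAP[escalated["type"]])
```

The asymmetry is the point: raw "port-flap" noise never reaches the central model, while genuine state changes arrive as a single well-defined event type.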
This kind of structure is a bit like the notion of hierarchical management. You don't try to run a vast organization from a single central point; you build sub-structures that have their own missions and methods, let each of them fill their roles their own way, and coordinate the results. This notion illustrates another important point about my example; it's likely you would have a "US" and an "EU" structure coordinating the smaller function concentrations in those geographies. In short, you have a hierarchy that sits between the raw event sources and the central model, and each level of that hierarchy absorbs the events from below and generates new events that represent collective, unhandled issues for the level above.
Edge processes in this model are essentially event-translators. They absorb local events to accommodate the need for immediate, short-loop reaction, and they maintain functional state as a means of generating appropriate events to higher-level elements. Thus, HighLevelThing is good if all its IntermediateLevelThings are good, and each of those in turn depends on its LowLevelThings.
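The "good if everything below is good" rollup is naturally recursive, and can be sketched in a few lines. The tree shape and state names here are illustrative, following the HighLevelThing/IntermediateLevelThing wording above.

```python
def status(node):
    """A node is a (state, children) pair; it is good only if its own
    state is good AND every child rolls up as good."""
    state, children = node
    return state == "good" and all(status(c) for c in children)

# HighLevelThing -> IntermediateLevelThing -> LowLevelThings
low_a = ("good", [])
low_b = ("good", [])
intermediate = ("good", [low_a, low_b])
high = ("good", [intermediate])
print(status(high))  # everything below is good

# One LowLevelThing failure propagates all the way up.
broken = ("good", [("good", [low_a, ("failed", [])])])
print(status(broken))
```

Each level can evaluate its own subtree independently, which is exactly what lets intermediate levels absorb events without consulting the top.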
This approach has the interesting property of letting you deploy elements of service lifecycle management to the specific places where events are being generated. In theory, you could even marshal extra processing resources to accommodate a failure, or to help you expedite the change from one service configuration to another.
The interesting thing about this sort of modeling and event-handling is that it also works with IoT. Precious little actual thought has gone into IoT; it's all been hype and self-serving statements from vendors. The reality of IoT is that there will be little direct application-to-sensor interaction. Something like that neither scales, nor provides security and privacy assurance, nor is cost-effective.
The real-world IoT will be a series of function communities linked to sensors and using common basic event processing and event generation strategies. There might be a "Route 95 Near the NJ Bridge" community, for example, which would subscribe to events that are processed somewhere local to that point and refined into new events that relate specifically to traffic conditions at the specified intersection. This community might in turn be part of the larger "US 95" community, as well as the "NJ Turnpike" and "PA Turnpike" communities.
Function communities in IoT are hierarchical just like they are in network services, and for the same reason. If you’re planning a trip along the East Coast, you might need to know the overall conditions on Route 95, but you surely don’t need to know them further ahead than your travel timeline dictates. Such a trip, in IoT terms, is a path through function communities, and as you enter one you become interested in the details of what’s happening (traffic-wise) there, and more interested than before in conditions ahead. An “event” from the nearby community might relate to what’s happening now, but events from the next community in your path are interesting only if they’re likely to persist for the time you’ll need to get there.
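The "interesting only if it persists" rule from the trip example can be sketched as a simple filter: an event from a community ahead is kept only if its condition is expected to last until you arrive there. The event fields, durations, and community names below are all illustrative.

```python
def relevant(event, minutes_until_arrival):
    """Keep an event from a community ahead only if its condition will
    still be in effect when we reach that community."""
    return event["expected_duration_min"] >= minutes_until_arrival

events_ahead = [
    {"community": "NJ Turnpike", "type": "accident",
     "expected_duration_min": 90},
    {"community": "NJ Turnpike", "type": "brief-slowdown",
     "expected_duration_min": 10},
]

# Suppose we are 45 minutes away from the NJ Turnpike community:
kept = [e for e in events_ahead if relevant(e, 45)]
print([e["type"] for e in kept])  # ['accident']
```

The same filter generalizes along the whole path: the farther a community is in time, the longer-lived a condition must be to matter, which is why nearby communities dominate the event stream.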
Contrast the I95 trip approach I’ve described with what would be needed if every driver needed to query sensors along the route. Just figuring out which ones they needed, and which were real, would be daunting. The same is true for OSS/BSS or cloud computing or service orchestration. You need to divide complex systems into subsystems, so that each level in the hierarchy poses reasonable challenges in terms of modeling and execution.
The combination of a hierarchical modeling approach, functional/Lambda programming to create easily migrated functions, and event-driven processes synchronized by the former and implemented through the latter, gives you an OSS/BSS and IoT approach that could work, and work far better than what we've been spinning up to now. If this approach could deliver better operational efficiency, then it's what we need to be talking about.