Event-Driven OSS/BSS: Useful, Possible…?

Why are operators interested in event-driven OSS/BSS?  Maybe I should qualify that as “some operators” or even “some people in some operators”.  In the last ten days, I’ve heard (for the second time) a sharp dispute within an operator regarding the future of OSS/BSS.  I’ve also heard vendors and analysts wonder what’s really going on.  Here’s my response.

First, it’s important to remember that OSS/BSS systems were originally “batch” operations, meaning that their inputs weren’t even created in real time.  People processed orders and entered them, or field personnel went out and installed or fixed things, and then changed the inventories.  In short, OSS/BSS started off like all IT started, with something like keypunching and card reading.

Over time, what we call “OSS/BSS” today evolved as a transaction processing application set built for the network operator or communications service provider.  Online transaction processing (OLTP) replaced batch processing for the human-to-systems interaction model.  Eventually, as it did in other verticals, this OLTP was front-ended by web portals for customer order and self-service, and even for use by both the operators’ office staff and field personnel.

The recent trends (recent in telco-speak, meaning in the last decade) that have been pushing some people/operators toward change have emerged largely from the network side rather than from the service side.  Physical provisioning used to be the rule.  If you wanted a phone line, somebody came out with a toolbelt and diddled in some strange-shaped can sitting (usually at a crooked angle) somewhere on the lawn.  You still have to get some physical provisioning, of course, but with IP networks the “service” is delivered over access facilities often set in place earlier.  Most of the stuff that has to be handled relates to the network side, now automated as opposed to manually configured.

So now, we have this network management system that actually tweaks all the hidden boxes in various places and creates our service.  You make service changes as box-tweaks, and if there’s a problem a box manager tells the service manager about it, so there can be an update to the customer care portal and so there can be (sometimes) consideration for an SLA violation and escalation or charge-back.

You can glue this new stuff onto the existing transaction-and-front-end-portal stuff, of course, and that’s what most operators have done.  Changing out an OSS/BSS system would be, for most operators, about as stressful as changing out a demand-deposit accounting system for retail banks.  The people who don’t want to see OSS/BSS revolution represent the group of operators where this stress dominates.

There are two issues that are driving some operators and some operator planners to doubt this approach.  First, there’s the long-standing operator fear of lock-in.  Remember how hard changing OSS/BSS systems would be?  For a vendor it’s like buying a bond—regular payments and little risk.  Second, as services created via the NMS get more complex and change more often, the relationship between OSS/BSS and NMS gets tighter, and the OSS/BSS can constrain what could be offered to customers.

When you have users at portals, customer service reps at transaction screens or portals of their own, and network condition changes all pouring into the same essentially monolithic applications, no good can come from it.  Collisions in commands and conditions can bring about truly bad results, and this is why service complexity tends to favor a modernized approach to OSS/BSS.  It’s also why you hear about making things “event-driven”.

But being event-driven opens other doors.  If we go back (as I know I often do!) to the TMF NGOSS Contract model, we find an approach that links network and operations events to processes via a contract data model.  Event “X” in Service State “3” means “Run Service Bravo”.  This has major implications on lock-in, and even on whether there needs to be anything we’d call an OSS/BSS at all.
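The “Event ‘X’ in Service State ‘3’ means ‘Run Service Bravo’” idea can be sketched in a few lines.  This is a minimal illustration of NGOSS-Contract-style event steering, not a TMF-defined interface; the state names, event names, and process names here are all hypothetical.

```python
# Sketch of NGOSS-Contract-style event steering: the contract carries a
# state/event table that maps (current state, event) pairs to processes.
# All identifiers below are hypothetical illustrations.

def run_service_bravo(contract, event):
    """Hypothetical operations process; it reads and writes only contract data."""
    contract["last_event"] = event
    return "bravo-ran"

# The steering table conceptually lives in the contract data model itself.
STATE_EVENT_TABLE = {
    ("state-3", "event-X"): run_service_bravo,
}

def dispatch(contract, event):
    """Steer an event to a process based on the contract's current state."""
    handler = STATE_EVENT_TABLE.get((contract["state"], event))
    if handler is None:
        return "ignored"  # no mapping: this event isn't meaningful in this state
    return handler(contract, event)

contract = {"state": "state-3"}
print(dispatch(contract, "event-X"))  # bravo-ran
```

The point of the table is that no process “owns” the event; the contract’s state decides what runs, which is what decouples the software pieces from each other.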

What the NGOSS Contract does is to “service-ize” operations software.  Instead of a big monolithic chunk of code, you have a bunch of services/microservices that do very specific things.  A software vendor could offer “event services” instead of monolithic OSS/BSS systems.  Some of the event services might actually look a lot like things like a retail portal or an analytics tool, so some general-purpose “horizontal” software could be included where appropriate.  Operators could mix and match, which may be why vendors really hate this approach.

It may also explain why the whole TMF NGOSS Contract thing didn’t take off when it came along well over a decade ago.  The TMF has recently shown some signs it would like to resurrect the concept and make something of it, but in order for that to happen, the network operator members of the body will have to out-muscle the OSS/BSS vendors.  In most international groups I’ve been involved with, the operators are novices about manipulating group processes, so this is going to be a difficult challenge both for the operators and for the TMF.

What happens here could be important for the OSS/BSS space, providing that the TMF does advance the NGOSS Contract notion and that it’s actually implemented.  Remember that this was advanced once and wasn’t implemented, so we can’t take TMF acceptance as an indication of an actual product change.  If we do get this change, it could be the beginning of a period of rationalization of operations and business support software, which most would probably agree is long overdue.  It might also have other impacts.

The first is that this change might percolate into service provider operations overall.  Network management is even more event-centric than OSS/BSS, and yet the NMS and even zero-touch automation models currently evolving are really as monolithic as OSS/BSS systems.  If operators see the advantages of event-driven OSS/BSS, can they fail to see that extending the principle into network management, and thus to operations overall, would benefit them significantly?

The next question is whether a “contract-data-model” approach, combined with event-to-process steering by the model, could be used in other applications.  Given that operations processes have generally been converging on the same transactional-and-portal model that applications overall have followed, could operations software lead the rest of the space into a contract-driven approach?  If so, it would be a total software revolution.

The use of the contract-data-model approach would guide the path toward the use of microservices in application software.  Almost anything that deals with the real world, including traditional transaction processing, can be re-visualized as event-driven.  The limitations that the approach puts on the processes (which is that they’re stateless insofar as they operate only on the contract data model data itself) could encourage functional decomposition to the appropriate level.  The result would be resilient and scalable because any instance of any given process could handle the events that the state/event relationship targeted to it.
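The statelessness point is worth making concrete.  In this sketch (illustrative names only, not a real OSS/BSS interface), a process keeps no state of its own; every bit of context comes in via the contract data model, so any replica of the process is interchangeable with any other.

```python
# Why statelessness enables scaling: the process holds no local state, so
# any replica can handle any event for any contract instance.  The names
# here are hypothetical illustrations.

def activate_access(contract, event):
    """Stateless step: all context comes from the contract data model."""
    # Return an updated copy of the contract rather than mutating shared state.
    return dict(contract, state="active", last_event=event)

# Two "replicas" of the same process are fully interchangeable:
replica_a = activate_access
replica_b = activate_access

c0 = {"service": "svc-001", "state": "ordered"}
print(replica_a(c0, "provision-done") == replica_b(c0, "provision-done"))  # True
```

Because the only durable state lives in the contract, a scheduler can hand the next event for a given contract to whichever process instance is free, which is where the resiliency and scalability come from.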

Might this revolutionize the cloud, even create a kind of SaaS-as-microservices future?  Not so fast.  All event-driven systems have an inherent sensitivity to latency, because the in-flight time of data creates a window in which simultaneous events can’t be contextualized.  The same problem occurs (more often) in monolithic software implementations of event systems, where events have to be queued for processing when resources are available, and this loss of context is one reason why that monolithic approach isn’t suitable for event-driven systems, including those of network and service management (which is why I don’t like ONAP).

Apart from the contextual problems, event-driven systems have to manage latency to prevent workflows from accumulating too much of it as they pass around through a sea of microservices.  One of the reasons why it’s important to view a network as a series of hierarchical black-box intent models is to control the scope of event flows, so that you don’t end up with excessive response times.  If you want to believe in SaaS-for-everything and event-driven at the same time, we’d have to think carefully about how the contract data models and state/event tables were structured, and plot the event flows and workflows carefully.  Of course, you don’t have to go event-driven to make OSS/BSS a SaaS project; monolithic software can be hosted in the cloud and offered as a service, but that’s another topic.
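The scope-control role of hierarchical intent models can be sketched as well.  In this hypothetical illustration, each black-box model handles its own internal events locally and escalates to its parent only when the event threatens the intent (say, an SLA) it has committed to; events never fan out across the whole system.

```python
# Sketch of hierarchical black-box intent models bounding event scope:
# each layer remediates what it can and escalates only intent-threatening
# events one level up.  Class and layer names are hypothetical.

class IntentModel:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def handle(self, event, severity):
        if severity <= 1:
            # Event is within this model's scope: fix it locally,
            # invisible to the layers above.
            return f"{self.name}: remediated locally"
        if self.parent is not None:
            # Beyond local scope: escalate one level, reduced in scope.
            return self.parent.handle(event, severity - 1)
        return f"{self.name}: SLA violation reported"

service = IntentModel("service-layer")
access = IntentModel("access-domain", parent=service)

print(access.handle("link-down", 1))   # access-domain: remediated locally
print(access.handle("node-fail", 2))   # service-layer: remediated locally
```

Keeping events inside the smallest model that can act on them is what bounds the event flows, and with them the accumulated latency, of the overall system.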

I believe that OSS/BSS systems are inevitably moving from specialized purpose-built software to collections of horizontal tools.  The key, for operators, is to recognize that at the same time there’s an event-driven dimension moving the needle, and if they ignore the latter trend, they may end up with a bunch of connected monoliths instead of a collection of services and microservices, and in the cloud age that would be a very bad outcome.