Digging Deeper into Data-Driven Event-to-Process Coupling

In yesterday’s blog, I raised two points that I think are particularly critical for lifecycle automation.  The first is the notion of event coupling to processes via the service data model, something that came not from me but from the TMF’s NGOSS Contract work.  The second is the notion of service/resource domain separation, which you could infer from the TMF’s SID and its customer-facing versus resource-facing services (CFS and RFS, respectively).  Today I’d like to build a bit on these.

If we model a “service” (and, in theory, an application in the cloud), we could say that it consists of two sets of things.  The first is the functional elements that make up the service’s behavior, and which in many cases are orderable or at least composable elements.  The second is the resource bindings that link functional elements to resources.  I’ve called these the “service domain” and “resource domain” respectively.

Each domain consists of a hierarchical representation of elements.  At the top of the service domain is an element representing the service itself.  This would decompose into a series of parallel functional elements that directly made up the service.  An example might be “virtual-CPE” and “VPN”.  These highest-level elements could be decomposed into different service types—“vCPE-as-uCPE” for agile premises white-box devices and “vCPE-as-Service-Chain”, or “VPN-via-MPLS” versus “VPN-via-SDWAN”.
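
To make this concrete, here’s a rough sketch in Python of what such a service-domain hierarchy might look like.  The element and option names are illustrative only, not drawn from any standard model:

```python
# A minimal sketch of a service-domain hierarchy. Element and option names
# are illustrative; a real model would follow an agreed schema.
service_domain = {
    "name": "Enterprise-Site-Service",           # top element: the service itself
    "elements": [                                 # parallel functional elements
        {
            "name": "virtual-CPE",
            "decompositions": ["vCPE-as-uCPE", "vCPE-as-Service-Chain"],
        },
        {
            "name": "VPN",
            "decompositions": ["VPN-via-MPLS", "VPN-via-SDWAN"],
        },
    ],
}

# Each decomposition option would itself be a modeled element that can
# decompose further, down to the point where resource binding occurs.
```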

The division here would belong in the service domain if it represented something that a customer or customer service rep (CSR) could select, something that would be priced differently or differ in some other meaningful contractual sense.  In some cases, the customer or CSR would select the specific low-level elements, if those were what the operator sold.  In others, the customer/CSR might select only the top-level functional elements, and the decomposition of those elements would then change depending on what the parameters associated with each functional element happened to be.

Within the resource domain, there would be a high-level object that represented a behavior of a resource or collection of resources.  An example of a behavior could be “IP-any-to-any”, which could then decompose into things like “MPLS-VPN”, “VLAN”, “RFC2547” or whatever, or these behaviors might be exposed directly.  Either way, the decomposition of resource domain elements would eventually lead to parameterization of a network management system (to induce a VPN) or deployment of hosted elements that provided the specified feature.
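
A resource-domain fragment could be sketched the same way, with the leaf entries indicating what realizing each behavior would actually involve.  Every name here is hypothetical:

```python
# Hypothetical resource-domain behavior and its decompositions.
# Leaf entries indicate how realizing the behavior would be carried out:
# either parameterizing a network management system or deploying hosted elements.
resource_domain = {
    "behavior": "IP-any-to-any",
    "decompositions": {
        "MPLS-VPN":       {"realize": "parameterize-NMS",      "target": "provider-MPLS-core"},
        "VLAN":           {"realize": "parameterize-NMS",      "target": "metro-Ethernet"},
        "RFC2547":        {"realize": "parameterize-NMS",      "target": "BGP-MPLS-VPN"},
        "SD-WAN-overlay": {"realize": "deploy-hosted-elements", "target": "cloud-pool"},
    },
}
```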

The service data model for a service is the collection of data associated with the hierarchy of elements in the service.  In my own work, the presumption was that the service data model represented every element in the service, even if that element was a secondary decomposition option not taken for this particular service instance.  Thus, an enterprise VPN service would have the MPLS and SD-WAN options present in the service data model even though only one was selected.
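
Here’s a hypothetical fragment of a populated service data model that shows the point: both decomposition options are retained, and only one is flagged as selected:

```python
# Fragment of a service data model instance for an enterprise VPN.
# Both decomposition options remain in the model; only one is selected.
vpn_element = {
    "name": "VPN",
    "state": "Ordered",
    "decompositions": [
        {"name": "VPN-via-MPLS",  "selected": True,
         "parameters": {"sites": 12, "cos": "gold"}},
        {"name": "VPN-via-SDWAN", "selected": False,
         "parameters": {}},
    ],
}

selected = next(d for d in vpn_element["decompositions"] if d["selected"])
print("Active decomposition:", selected["name"])
```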

The resource data model is likewise a collection of the data associated with the model hierarchy representing the behavior to which the service (at its “bottom” elements) was bound.  This model would have represented all the decomposition options at each level too, but not all the possible options for all the possible bindings between service and resource.

Binding is something I believe to be critical in service modeling.  It allows the service domain and resource domain to develop autonomously, with the former focusing on functionality and the latter on resource commitments to securing that functionality.  As long as the resource layer can generate behaviors that can be bound to the necessary functionality, it can support the service goals.  Correspondingly, a service function can be bound to any resource behavior that matches its functional needs, regardless of what technology is used or what administration offers it.  Hosted or appliance, software or hardware, wholesale or retail, my infrastructure or a partner—all are supported through behaviors and binding.
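
In code terms, binding can be reduced to matching a service element’s required behavior against whatever the resource layer advertises, without caring how the behavior is implemented.  Here’s a simplistic sketch; the catalog structure and the function name are my own invention:

```python
# A behavior "catalog" advertised by the resource domain; each entry could be
# hosted, appliance-based, owned, or wholesaled -- the service side doesn't care.
behavior_catalog = [
    {"behavior": "IP-any-to-any", "provider": "own-MPLS-core"},
    {"behavior": "IP-any-to-any", "provider": "partner-SD-WAN"},
    {"behavior": "firewall",      "provider": "cloud-hosted-vFW"},
]

def bind(service_leaf, catalog):
    """Bind a bottom-level service element to any behavior that matches
    its functional need. Returns the binding record, or None if no match."""
    for offer in catalog:
        if offer["behavior"] == service_leaf["needs"]:
            return {"service-element": service_leaf["name"], "bound-to": offer}
    return None

print(bind({"name": "VPN-via-MPLS", "needs": "IP-any-to-any"}, behavior_catalog))
```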

When a service is ordered, a service data model is created and populated, decomposing the elements as they’re encountered.  At the point where a service element has to be bound to resources, the process would stop to await an “activation”.  If, later, the service terminated, the “terminate” event would tear down the resource domain model but leave the service data model intact and in a “terminated” state.

This gets us to states and events.  Each model element has a state/event table representing the way it responds to outside conditions.  I found that it was possible to define a generic set of states and events that would serve all of the services and behaviors I was asked to consider, and so the state/event table in my work was made up of the same states and events for every service element, and again for every resource element.
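
A generic state/event table of this kind could be as simple as a mapping from state/event pairs to the process to run and the state to enter next.  The entries below are illustrative, not a definitive set:

```python
# A generic state/event table shared by all model elements.
# Keys are (current_state, event); values name the process to run and the
# state to enter afterward. The entries shown are illustrative only.
STATE_EVENT_TABLE = {
    ("Orderable",  "Order"):     ("decompose_and_order_children", "Ordered"),
    ("Ordered",    "Activate"):  ("bind_and_activate",            "Active"),
    ("Active",     "Fault"):     ("remediate_or_escalate",        "Active"),
    ("Active",     "Terminate"): ("tear_down_resources",          "Terminated"),
}

def lookup(state, event):
    """Return (process_name, next_state) for an element, or raise KeyError
    if the combination isn't defined for this element class."""
    return STATE_EVENT_TABLE[(state, event)]
```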

The first event in my sequence was always the “Order” event and the first state the “Orderable” state.  A new service starts in the Orderable state, and when the customer order is complete an Order event is sent to the top object.  That object would consult its state/event table (for the Order event in the Orderable state) and run the associated process, which would always decompose the element into its subordinate elements.  Each of those, as instantiated, would be sent its own Order event by the superior process, and this would eventually result in the service entering the Ordered state, where an Activate event would start the next progression.
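
Here’s a toy illustration of that downward Order flow, showing an element decomposing itself and passing the Order event to the children it instantiates.  It’s a sketch of the flow, not an implementation:

```python
# Toy illustration of the downward Order flow: an element receiving Order in
# the Orderable state runs its decomposition process and sends Order to each
# child, eventually landing the whole tree in the Ordered state.
def send_event(element, event):
    if element["state"] == "Orderable" and event == "Order":
        for child in element.get("children", []):   # the "decompose" process
            send_event(child, "Order")              # event flows downward
        element["state"] = "Ordered"

service = {
    "name": "Enterprise-Site-Service", "state": "Orderable",
    "children": [
        {"name": "virtual-CPE", "state": "Orderable", "children": []},
        {"name": "VPN",         "state": "Orderable", "children": []},
    ],
}
send_event(service, "Order")
print(service["state"], [c["state"] for c in service["children"]])
```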

Events of this sort flow “downward” from service toward resources, but events could also flow the other direction.  Suppose that there’s a problem with the uCPE hosting of a specific function.  The error would be reported by the uCPE in the resource domain, and the first responsibility for any element is to remediate itself.  If the problem couldn’t be fixed (by, let’s say, rebooting the device) then the next step would be to report the error up the chain to the superior object.  That report would signal that object to remediate, which would likely mean seeing if there was another possible decomposition that would still work.  Perhaps we’d substitute a cloud-hosted instance of the element for the uCPE instance.
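
That upward flow could be sketched the same way: try local remediation first, and escalate to the superior element, which looks for an alternate decomposition, only if that fails.  The names and the remediation logic below are purely illustrative:

```python
# Illustrative upward fault flow: attempt local remediation first, then escalate.
def local_remediation(element):
    # Placeholder: in practice this would run the element's own repair process,
    # such as rebooting the uCPE device.
    return element.get("reboot_fixes_it", False)

def handle_fault_at_parent(parent, failed_child):
    # The superior looks for another decomposition that still satisfies the
    # function, e.g. substituting a cloud-hosted instance for the uCPE one.
    alternates = [d for d in parent["decompositions"] if d != failed_child["name"]]
    return f"redeploying as {alternates[0]}" if alternates else "unresolved"

def handle_fault(element, parent=None):
    if local_remediation(element):
        return "remediated-locally"
    if parent is not None:
        return handle_fault_at_parent(parent, element)   # report up the chain
    return "unresolved"

ucpe = {"name": "vCPE-as-uCPE", "reboot_fixes_it": False}
vcpe = {"name": "virtual-CPE",
        "decompositions": ["vCPE-as-uCPE", "vCPE-as-Service-Chain"]}
print(handle_fault(ucpe, parent=vcpe))
```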

All this decomposition and remediation is controlled by the processes that the events trigger.  Those processes would operate only on the parameters associated with the objects that are steering the events, and since those parameters are stored in the service order instance in a repository, any instance of the process would serve.  That means that the processes are effectively microservices and are fully scalable and resilient.  Further, since everything is operating off a service order instance, that instance is a complete record of the state of the service and everything connected with it.  Functions like billing and reporting could operate from the service order instance and need not be state/event-driven.
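
Here’s a minimal sketch of one such stateless process: everything it needs comes from the stored service-order instance and everything it changes goes back there.  The in-memory “repository” and the parameter names are stand-ins of my own, not a real interface:

```python
# Sketch of a stateless lifecycle process: it reads the service-order instance
# from the repository, acts only on the parameters of the element steering the
# event, and writes the result back, so any instance of the process can serve.
repository = {}   # order_id -> service-order instance (stand-in for a real store)

def activate_vpn(order_id):
    order = repository[order_id]                      # load the service-order instance
    vpn = order["elements"]["VPN"]                    # the element steering this event
    vpn["binding"] = {"behavior": "IP-any-to-any",
                      "sites": vpn["parameters"]["sites"]}
    vpn["state"] = "Active"
    repository[order_id] = order                      # nothing is kept in the process itself

repository["order-001"] = {
    "elements": {"VPN": {"state": "Ordered", "parameters": {"sites": 12}}}
}
activate_vpn("order-001")
print(repository["order-001"]["elements"]["VPN"]["state"])
```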

The integration potential of all this falls out of the process-to-data-model correlation.  A resource element for “uCPE”, for example, would define the external properties of uCPE.  A uCPE vendor would be responsible for providing an implementation of the uCPE resource object that represented their specific product.  That implementation would take the form of a set of parameters (which would fill into the service data model instance) and a set of processes, either “stock” processes written against the generalized uCPE model or vendor-specific processes supporting their own implementation.  With both of those in place, all uCPE implementations would be interchangeable.
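
One way to picture this is as a reference class: the external parameters every uCPE implementation has to expose, plus stock processes a vendor can either inherit or override with their own.  The sketch below is hypothetical, not a real reference definition:

```python
# Hypothetical "uCPE" class: the external properties every vendor implementation
# must expose, plus stock processes a vendor can override.
class UCPE:
    required_parameters = ("management_ip", "image", "cpu_cores", "memory_gb")

    def __init__(self, parameters):
        missing = [p for p in self.required_parameters if p not in parameters]
        if missing:
            raise ValueError(f"uCPE implementation missing parameters: {missing}")
        self.parameters = parameters

    # Stock (non-specific) processes; a vendor implementation may override these.
    def deploy(self):
        return f"deploying image {self.parameters['image']}"

    def remediate(self):
        return "rebooting device"

class VendorXUCPE(UCPE):
    # Vendor-specific process that still honors the class's external behavior.
    def remediate(self):
        return "running vendor-X self-heal, then rebooting if needed"

box = VendorXUCPE({"management_ip": "10.0.0.1", "image": "vendor-x-1.2",
                   "cpu_cores": 4, "memory_gb": 8})
print(box.deploy(), "|", box.remediate())
```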

Non-specific processes, meaning processes that would apply to generalized objects and not specific ones, could be authored by the provider of the framework, by the operator, or by third parties.  Each process, remember, works only on the object parameters associated with it and so the process is really specific to the parameters (meaning the implementation).  Many of these non-specific processes would do things like the basic state/event management, and others might be associated with “class” definitions like “uCPE” that represent a reference that vendors would be expected to conform to in their implementations of the class.

A monolithic implementation of zero-touch lifecycle automation could be converted to an event-and-model-driven implementation by extracting the processes and converting them to microservices that operated from the data model.  This would be complicated where the implementation had progressed a long time in the monolithic direction, as would be the case with ONAP, for example, but it would be possible.

There are a lot of ways to make event-driven processes work, but all of the effective ways have common properties.  First, they are asynchronous in that you dispatch an event to a process and the process takes over, perhaps generating an event on completion.  You don’t wait for a response, and anything that allows waiting for a response is the wrong approach.  Second, they are microservices that don’t store anything internally, so that any instance of a given process can handle its work.  I’ve offered examples here of how events can be used, but in theory you could make every request into an event.  For example, generating a bill could be an event, or generating a trouble ticket.  The entire OSS/BSS system could be a series of event-linked processes, which is what operators who want an “event-driven” operations system are looking for.
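
In practice, that asynchronous, stateless pattern usually means putting events on a queue and letting whichever process instance picks one up do the work, with the sender never waiting for a reply.  Here’s a minimal sketch using Python’s standard library:

```python
# Minimal sketch of asynchronous, fire-and-forget event dispatch: senders put
# events on a queue and never wait for a reply; any worker instance can run
# the associated process because nothing is held in the worker itself.
import queue
import threading

events = queue.Queue()

def worker():
    while True:
        event = events.get()
        if event is None:                  # shutdown signal
            break
        # Here the worker would look up the (state, event) entry and run the
        # named process; for the sketch we just log the dispatch.
        print(f"worker handling {event['type']} for {event['element']}")
        events.task_done()

threading.Thread(target=worker, daemon=True).start()

# The sender dispatches and moves on; there is no waiting for a response.
events.put({"type": "Order",    "element": "Enterprise-Site-Service"})
events.put({"type": "Activate", "element": "VPN"})
events.join()
```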

The biggest truth here is that you can’t do event-to-process coupling without a service/resource model that includes state/event tables.  That’s been my quarrel with ECOMP from the first; you can’t retrofit what’s supposed to be the fundamental step into something that’s already been built.  You start it right or you pay a big price in delay and effort to convert it, and that’s what we’re facing with both ECOMP and with OSS/BSS.

That’s where separating services and resources comes in.  If you use service/resource modeling to create an abstraction layer that presents simple virtual functions upward to the OSS/BSS and that manages all the good stuff I’ve described within itself, you insulate the operations systems from the complexity of zero-touch lifecycle management.  That won’t help ECOMP, which is supposed to be doing lifecycle automation, but it would provide a rational path to OSS/BSS evolution.

I’d like to see the TMF pick up its own lead in this space.  As I said in my opening, the TMF has contributed what I think are the two critical pieces needed for effective service lifecycle automation, but hasn’t developed or modernized either one of them.  My own initiatives in the area have focused on “cloudifying” the TMF notion, which of course the TMF could do better on its own.  So how about it, TMF?  Can you show the same truly dazzling insights you showed in the past?  We surely need them.