There have been a number of articles recently about the evolution of OSS/BSS, and certainly there’s pressure to drive that evolution from a range of outside forces. Caroline Chappell, Heavy Reading analyst, has been a particularly good source of insightful questions and comments on the topic, and I think it’s time to go a bit into the transformation of OSS/BSS. Which, obviously, has to start with where it is now.
It’s convenient to look at operations systems through the lens of the TeleManagement Forum standards. The TMF material can be divided into data models and processes, and most people would recognize the first as being “the SID” and the second as being “eTOM”. While these two things are defined independently, there’s a presumption in how they’ve been described and implemented that is (I think) critical to the issue of OSS/BSS evolution. eTOM typically describes a linear workflow, implemented in most cases through SOA and workflow/integration tools like the enterprise service bus (ESB). There is a presumptive process flow, in short, and the flow draws the data it needs from the SID. This follows the structure of most software today: a process model described one way, linked in a general way to a supporting data model that’s independently (but hopefully symbiotically) described.
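To make that pattern concrete, here’s a minimal sketch, in Python, of a workflow-style fulfillment flow: a fixed sequence of processes, each reading and updating a shared service record. The process names and record fields are purely hypothetical illustrations, not actual eTOM processes or SID structures.

```python
# Illustrative only: a workflow-style pipeline in which each process runs in
# a fixed order and reads/writes a shared service record. Names and fields
# are hypothetical, not eTOM/SID elements.

service_record = {
    "customer": "ACME Co",
    "product": "Ethernet-10M",
    "status": "new",
}

def validate_order(record):
    record["status"] = "validated"
    return record

def allocate_resources(record):
    record["circuit_id"] = "CKT-0001"   # placeholder resource binding
    record["status"] = "allocated"
    return record

def activate_service(record):
    record["status"] = "active"
    return record

def start_billing(record):
    record["status"] = "billing"
    return record

# The "presumptive process flow": work moves through these steps in sequence,
# regardless of what actually happens to the service along the way.
for step in (validate_order, allocate_resources, activate_service, start_billing):
    service_record = step(service_record)

print(service_record)
```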
There are several reasons for this “workflow” model of operations. One is that OSS/BSS evolved as a means of coordinating manual and automated processes in synchrony, so the systems had to reflect a specific sequence of tasks. Another is that as packet networks evolved, operations systems tended to cede what I’ll call “functional management” to lower-layer EMS/NMS activities. That meant that the higher-level operations processes were more about ordering and billing and less about fulfillment.
Now, there’s pressure for that to change from a number of sources. Start with the transition from “provisioned” services to self-service, which presumes that the user can almost dial up a service feature as needed. Another is the desire for a higher level of service automation to reduce opex, and a third is the need to improve the composability and agility of services through the use of software features, which changes the nature of what we mean by “creating” a service into something more like the cloud DevOps process. SDN and NFV are both drivers of change because they change the nature of how services are built. More generally, the dynamism that “virtualization” introduces into both resources and services puts a lot of stress on a system designed to support humans connecting boxes, which was of course the root of OSS/BSS. These forces are combining to push OSS/BSS toward what is often called (and was called, in a Light Reading piece) a more “event-driven” model.
OK, how then do we get there? We have to start somewhere, and a good place is whether there’s been any useful standards work. A point I’ve made here before, and that has generated some questions from my readers, is that TMF GB942 offers a model for next-gen service orchestration. Some have been skeptical that the TMF could have done anything like this—it’s not a body widely known for its forward-looking insights. Some wonder how GB942 could have anticipated current market needs when it’s probably five years old. Most just wonder how it could work, so let me take a stab at applying GB942 principles to OSS/BSS modernization as it’s being driven today.
When GB942 came along, the authors had something radically different in mind. They proposed to integrate data, process, and service events in a single structure. Events would be handed off to processes through the intermediation of the service contract, which is a data model. To put this into modern terms, GB942 proposed to augment basic contract data, in a parametric sense, with service metadata that described the policies for event handling, the links between services and resources, and so forth. While I certainly take pride of authorship in the framework of CloudNFV, I’ve always said that it was inspired by GB942 (which it is).
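As a rough illustration of what “contract data plus metadata” might look like, here’s a sketch. The field names are my own invention, not GB942 or SID structures, and a real contract would obviously carry much more.

```python
# A sketch of the GB942-style idea: the service contract carries not just
# order parameters but metadata that tells the system how to handle events.
# All field names here are illustrative assumptions, not defined structures.

service_contract = {
    "parameters": {                      # conventional contract data
        "customer": "ACME Co",
        "bandwidth_mbps": 10,
        "sla_class": "gold",
    },
    "metadata": {
        "state": "Orderable",            # current lifecycle state
        "resource_bindings": [],         # links from the service to committed resources
        "event_policies": {              # which process handles which event, per state
            ("Orderable", "ServiceOrder"): "OrderTheService",
            ("Active", "TerminateService"): "TearDownService",
        },
    },
}
```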
Down under the covers, the problem that GB942 is trying to solve is the problem of context, or state. A flow of processes means work is moved sequentially from one to another, like a chain of in- and out-boxes. When you shift to event processing you have to be able to establish context before you can process an event. That’s what GB942 is about: you insert a data/metadata model to describe process relationships, and the data part of the model carries your parameters and state/context information.
This isn’t all that hard, because protocol handlers do it every day. All good protocol handlers are built on what’s called “state/event” logic. Visualizing this in service terms, a service starts off in, let’s say, the “Orderable” state. When a “ServiceOrder” event arrives in this state, you want to run a management process we could call “OrderTheService”. When that has been done, the service might enter the “ActivationPending” state if it needed to become live at a future time, or the “Active” state if it’s immediately available. If we got a “TerminateService” event in the “Active” or “ActivationPending” state we’d kill the order and dismantle any service-to-resource commitments, but if we got it in the “Orderable” state it would be an error. You get the picture, I’m sure. The set of actions that should be taken (including “PostError”) for any combination of state and event can be expressed in a table, and that table can be included in the service metadata.
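Here’s a minimal sketch of how such a table could drive processing. The states, events, and process names echo the examples above; the dispatcher itself is just my illustration of table-driven state/event handling, not anything prescribed by GB942.

```python
# Illustrative state/event dispatch; not a GB942-defined mechanism.

def order_the_service(svc):
    # Activate now, or hold for a future activation date.
    svc["state"] = "ActivationPending" if svc.get("future_start") else "Active"

def terminate_service(svc):
    # Dismantle service-to-resource commitments (omitted) and retire the order.
    svc["state"] = "Terminated"

def post_error(svc):
    print(f"PostError: event not valid in state {svc['state']}")

STATE_EVENT_TABLE = {
    ("Orderable",         "ServiceOrder"):     order_the_service,
    ("ActivationPending", "TerminateService"): terminate_service,
    ("Active",            "TerminateService"): terminate_service,
}

def handle_event(svc, event):
    # Context comes from the service model (its state), not from a position
    # in a presumptive workflow.
    STATE_EVENT_TABLE.get((svc["state"], event), post_error)(svc)

svc = {"state": "Orderable", "future_start": False}
handle_event(svc, "ServiceOrder")       # Orderable -> Active
handle_event(svc, "ServiceOrder")       # error: already Active
handle_event(svc, "TerminateService")   # Active -> Terminated
```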
You can hardly have customer service reps (or worse yet, customers) writing state/event tables, so the metadata-model approach demands that someone knowledgeable put one together. Most services today are built up from component elements, and each such element would have its own little micro-definition. An architect would create these definitions, and would also assemble them into full retail offerings. At every phase of this process, that architect could define a state/event table that relates how a service evolves from that ready-to-order state to having been sold and perhaps eventually cancelled. That definition, created for each service template, would carry over into the service contract and drive the lifecycle processes for as long as the service lived.
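A sketch of what such an architect-built template might look like follows. The component names and process names are hypothetical, chosen only to show that each element carries its own little table and that the retail offering composes them.

```python
# Sketch of an architect-built template: each component element carries its
# own state/event table, and the retail offering composes the components.
# All names are hypothetical, for illustration only.

access_component = {
    "name": "AccessLink",
    "state_event_table": {
        ("Orderable", "ServiceOrder"):    "ProvisionAccess",
        ("Active",    "TerminateService"): "ReleaseAccess",
    },
}

vpn_component = {
    "name": "VPNCore",
    "state_event_table": {
        ("Orderable", "ServiceOrder"):    "ProvisionVPN",
        ("Active",    "TerminateService"): "ReleaseVPN",
    },
}

retail_offering = {
    "name": "BusinessVPN",
    "components": [access_component, vpn_component],
    "state_event_table": {
        ("Orderable", "ServiceOrder"):    "OrderTheService",
        ("Active",    "TerminateService"): "TearDownService",
    },
}
# When the offering is sold, this template would be copied into the service
# contract, so the tables travel with the contract for the service's lifetime.
```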
Connecting this all to the real world is the last complication. Obviously there has to be an event interface (in and out), and obviously there has to be a way of recording resource commitments to the service so that events can be linked through service descriptions to the right resources. (In the CloudNFV activity these linkages are created by the Management Visualizer and the Service Model Handler, respectively; whatever name you give them the functionality is needed.) You also need to have a pretty flexible and agile data/process coupling system (in CloudNFV this is provided by EnterpriseWeb) or there’s a lot of custom development to worry about, not to mention performance and scalability/availability.
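To show the shape of those two linkages, here’s a small sketch: a record of resource commitments plus an event entry point that traces a resource event back to the services bound to it. The names below are stand-ins of my own, not CloudNFV or EnterpriseWeb interfaces.

```python
# Illustrative only: resource commitments recorded per service, and an event
# entry point that routes a resource event to every service bound to it.

resource_bindings = {
    # resource id -> service contract ids that committed it
    "vm-0042":   ["svc-1001"],
    "vswitch-7": ["svc-1001", "svc-1002"],
}

service_contracts = {
    "svc-1001": {"state": "Active"},
    "svc-1002": {"state": "Active"},
}

def on_resource_event(resource_id, event):
    """Route an incoming resource event to every service bound to the resource."""
    for svc_id in resource_bindings.get(resource_id, []):
        svc = service_contracts[svc_id]
        print(f"{event} on {resource_id} -> {svc_id} (state {svc['state']})")
        # In a real system this would invoke the state/event dispatch shown
        # earlier rather than just printing.

on_resource_event("vswitch-7", "ResourceFault")
```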
In theory, you can make any componentized operations software system into something that’s event-driven by proceeding along these lines. All you have to do is to break the workflow presumptions and insert the metadata model of the state/event handling and related components. I’m not saying this is a walk in the park, but it takes relatively little time if you have the right tools and apply some resources. So the good news is that if you follow GB942 principles with the enhancements I’ve described here, you can modernize operations processes without tossing out everything you’ve done and starting over; you’re just orchestrating the processes based on events and not flowing work sequentially along presumptive (provisioning) lines. We’re not hearing much about this right now, but I think that will change in 2014. The calls for operations change are becoming shouts, and eventually somebody will wake up to the opportunities, which are immense.