What the Heck is NGOSS Contract and Why do I Care?

I've mentioned NGOSS Contract many times in prior blogs, and I was somewhat surprised when my latest blog (yesterday) raised questions from readers who were actually TMF members.  One was particularly interesting: "I've never heard of it.  What is it?"  The actual question was a bit more complicated and interesting: how does NGOSS Contract relate to service lifecycle automation and cloud-native deployments?  That's what I'm hoping to answer here.

Let’s start with the definition, though.  Back in around 2004, the TMF started to rethink its OSS/BSS approach, leading to the concept of a Next-Generation OSS or NGOSS.  A part of this work was a framing of the service contract (the “NGOSS Contract”) as a means of representing process relationships to events.  This eventually became GB942.

An early presentation on NGOSS Contract, from Jessie Jewitt of Ciena (who headed up the TMF's Carrier Ethernet activity), laid out some issues on coupling high-level services with OSS processes in a resource-independent way.  She asked me if I'd do something on NGOSS Contract, and I did a derivative paper in 2008.  The key diagram from the presentation showed the Contract as the conduit for requests made to what were at the time Service-Oriented Architecture (SOA) services.  All this sort of got incorporated into the TMF053 NGOSS Technology-Neutral Architecture (TNA).

You have to wonder what was behind all of this, what benefit the TMF advocates thought NGOSS Contract would bring to an industry that seemed to somehow chug along.  At least part of the answer is concurrency.  A service is the output of a real-time system, a system that has a lot of moving parts that are loosely federated into what we'd call "services" or "applications".  Each of those parts has its own logic, its own hosts, its own role to play.  Each has to be put into service, fixed, and removed from service as the progress of the service/application lifecycle dictates.  That means that at any given time, you might have a dozen things needing attention, a dozen events to process.  Why not have all those separate events drive their own instances of their own processes, as the Contract/SOA relationship of service-driven logic permits?

We now live in a world of cloud-native and microservices, not SOA, but that world actually needs the concept more than the world of 2008 did.  The problem with a true cloud-native implementation is that since everything in it has to be fully scalable and microservice-ized, the process elements have to be “stateless”, meaning that they process requests based on data in the requests and not according to data or context stored by the processes.  That’s what makes something cloud-native.  However, statelessness makes it hard to interpret things.
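To make the stateless/stateful distinction concrete, here's a minimal sketch in Python.  The class and function names are my own illustrations, not anything from a TMF specification; the point is only the difference in where context lives.

```python
class StatefulHandler:
    """Keeps context inside the process. Two replicas of this handler can
    diverge, so it can't be freely scaled or replaced mid-stream."""
    def __init__(self):
        self.last_state = "INITIAL"   # context lives inside the process

    def handle(self, event):
        result = f"{self.last_state}->{event}"
        self.last_state = event       # mutating internal context changes future answers
        return result


def stateless_handle(context, event):
    """All context arrives with the request, so any replica gives the
    same answer for the same inputs; that's what permits cloud-native
    scaling and substitution."""
    return f"{context['state']}->{event}"
```

The stateful handler's answer depends on what it saw before; the stateless function's answer depends only on what it's handed, which is exactly why something else has to hold the context.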

Suppose I get a monitoring system report of a server failure.  It’s an event that says “Server X Failed”.  The problem we face is that we don’t know what that failure means.  We have no context to interpret the event, and we thus don’t have any notion of what to do about it.  Obviously, whatever ran on Server X has also failed, but what was that?  Classic monolithic design says we’d spin through all our records of deployment to see what was on Server X, and from that we could see what processes had failed.  However, even if we knew what those processes were, what services do they form a part of?  How are those services expected to recover from the fault?  All that is context.

The presumption of NGOSS Contract is that when the service order was processed, the order instance created by the deployment was a record of everything relating to the service.  At the bottom of the hierarchy of model elements within the contract, representing features and access points and whatever, there would be a set of agents that received events from the resources those model elements referenced.  Those elements could then take an event, reference the process list in the service model to see what that particular event meant in the current service process state, and invoke the appropriate process to handle it.  Data model mediation is the way that the context that's inherent in any cooperative behavior set gets imposed on a set of processes that themselves don't have any contextual awareness.  The model is the context.
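The mechanism above can be sketched very simply.  This is my own reading of the NGOSS Contract idea, with hypothetical names: each model element carries its current state and a state/event table, and an incoming event is interpreted by looking up (state, event) to find the process to invoke.

```python
class ModelElement:
    """A contract model element: holds state and a state/event table so
    the model, not the process, supplies the context."""
    def __init__(self, name, state, state_event_table):
        self.name = name
        self.state = state
        # maps (current state, event) -> process name
        self.table = state_event_table

    def dispatch(self, event):
        """Interpret an event through the model's context."""
        process = self.table.get((self.state, event))
        if process is None:
            return f"{self.name}: no process for {event} in {self.state}"
        return f"{self.name}: invoke {process}"


# Hypothetical element: the same raw event means different things
# depending on the element's lifecycle state.
vpn_core = ModelElement(
    "vpn-core", "ACTIVE",
    {("ACTIVE", "ServerFail"): "RedeployFeature",
     ("DEPLOYING", "ServerFail"): "AbortAndRetry"})
```

Note that "ServerFail" maps to a different process when the element is in its DEPLOYING state; that's the contextual interpretation the contract provides.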

In this approach, the service model element's information and the event information are all that's needed to process the event, so any instance of the process can receive that information and generate the same result.  That's what makes cloud-native implementations scalable and resilient.  But the process depends on having something—something like the NGOSS Contract—maintain state for all the feature implementations, hold the contract and deployment data, and hold state/event tables to link events to processes in the appropriate way.  The microservices don't have context because the contract does, and that frees the microservices to be their stateless, cloud-native selves.

This is what, in yesterday's blog, I called "functional orchestration", which means that the contract defines not just deployment but the coordination of the actual work.  It provides the steering mechanism to harmonize asynchronous service events and pass them properly to cloud-native elements for processing.  There is no "OSS" or "NMS" in this structure, in the traditional sense.  Functionally, the OSS or NMS is the collection of operations/management processes that are identified in the state/event tables.  Functionally, but not actually, because nothing except the contract assembles these things, and we could define any collection of processes that did something symbiotic as a "system".
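A rough sketch of that steering role, under my own assumptions: the contract instance is the only stateful thing, and it routes each asynchronous event to a stateless worker chosen from its state/event table.  All names here are illustrative.

```python
import queue

def redeploy(element, event):
    # a stateless worker: everything it needs arrives in the call,
    # so any replica of it would produce the same result
    return f"redeploy {element} after {event}"

def run_orchestrator(contract, events):
    """contract maps (element, state, event) -> worker; events are the
    asynchronous conditions the contract must steer to processes."""
    inbox = queue.Queue()          # events arrive asynchronously
    for e in events:
        inbox.put(e)
    # the contract, not the workers, tracks each element's state
    state = {element: "ACTIVE" for (element, _, _) in contract}
    log = []
    while not inbox.empty():
        element, event = inbox.get()
        worker = contract.get((element, state[element], event))
        if worker is not None:
            log.append(worker(element, event))
    return log

contract = {("vpn-core", "ACTIVE", "ServerFail"): redeploy}
```

There's no monolithic "OSS" component here; the "system" is just whatever set of workers the contract's tables happen to name.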

This point is why you have to be careful about “functional block diagrams”.  In NFV, as I’ve noted, and in most OSS/BSS modernization processes, what we do is define the functional elements of an OSS or NMS or an NFV MANO framework.  Those functional elements should be collections of cloud-native microservices linked via state/event tables to service conditions.  They should not be translated literally into monolithic software components, which sadly they have been.

This (I hope) illustrates the "Why?" of NGOSS Contract, but it doesn't explain how we managed to miss the significance of these points for so long.  Some TMF friends told me five years ago that NGOSS Contract was rarely implemented, and I can tell you that all the operators I've dealt with have told me they don't implement it.  I stumbled on it because the TMF's "Service Delivery Framework" (SDF) project, which I'd joined, was dealing with how to represent a service as a series of interdependent but autonomous functional elements.  Some operators asked me to do a proof-of-concept implementation (in Java, which became the first ExperiaSphere project), and that demonstrated to me that you couldn't have scalability, autonomy of functions, and cooperation in mission without something minding the collective store.

I presented this to the TMF SDF team, but they didn't pick it up.  I presented it to the NFV ISG, and they didn't pick it up either, even though all the operators who were the inspiration of the ISG were long-standing TMF members.  So why we missed this, why the TMF and the NFV ISG missed it, is a question I can't answer.  I suspect we may never get a good answer to it, but we do still have a chance of correcting our failures.  I believe that as long as we don't address the points NGOSS Contract raised, we have no chance of operations automation or cloud-native infrastructure.