Operators and the Single Source of Truth

One of the many concepts the cloud has brought us is the “single source of truth”, the idea behind the repository-centric operations vision called “GitOps”.  Some vendors, like Ciena, have recently been promoting a transformation model based on collecting data from operations systems and management systems into a repository.  This raises the question of whether GitOps-like modeling would truly be transformational to operators, or even be a step in the right direction.  To answer that, we have to lay out the challenge of transformation in operator-data terms.

In the beginning, there were manual operations.  The old-line telco model of the ’70s and ’80s was based on craft personnel (technical workers) performing the tasks that created services.  The operations systems of the time, “operations support systems” (OSS) and “business support systems” (BSS), were primarily recording manual steps after the fact, or generating work orders.  In a very real sense, they reflected the old-time batch processing mindset of IT, rooted in punched cards to capture transactions or commercial paper that was processed by humans, not systems.

As network devices became intelligent, which happened largely through architected voice and data service features that were finally rooted in routing and IP, the management of these network devices started to take humans out of many of the service creation tasks.  This created the “network management system” (NMS), and it raised what turned out to be a critical question: the relationship between NMSs and OSS/BSS.  The critical nature of that question was not recognized, and we ended up creating a link between OSS/BSS and NMS that mirrored the old craft-management model of fulfillment.  Instead of journaling human activity, we had OSS/BSS journaling NMS activity.  We still do today, as is clear from all the diagrams that show OSS/BSS as a box that’s “upstream” from the NMS, and separate.

Recognition that this was the wrong approach came along with the new millennium, when the TeleManagement Forum (TMF) released its Shared Information/Data model, or SID.  SID represented a truly revolutionary step in telco GitOps, before Git came along.  The essence of SID is that a single model defines the data necessary to describe a “service”.  I put the term in quotes because from the very first, the TMF adopted nomenclature inconsistent with the rest of the industry, so they call the orderable entity a “product”.  Whatever the name, the SID defined a repository, and as it evolved it came to include the notion of a service as a hierarchical structure of elements, each of which had an appearance that faced the customer (Customer-Facing Service or CFS) and another that faced resources (Resource-Facing Service or RFS).

Over the next decade, the TMF enhanced the SID-centric view with various features, most notably the “NGOSS Contract” which said that the service data model (the SID) provided the binding between service events and operations processes.  That created a model of an event-driven OSS/BSS that would be naturally integrated (by the fact that network and service events were both events) with NMS.  However, there were few complete implementations of the SID, and virtually none of NGOSS Contract.
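To make the idea of the data model “binding” events to processes a bit more concrete, here is a minimal sketch in Python.  It is my own illustration, not anything taken from the SID or NGOSS Contract specifications: the class name ServiceElement, the state names, the event names, and the handler functions are all hypothetical.  The only point it tries to show is that the model element itself carries a table mapping (current state, event) to an operations process, so a service event and a network event are steered the same way.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical handler signature: receives the element and the event name,
# returns the element's next state. Nothing here comes from a TMF spec.
Process = Callable[["ServiceElement", str], str]


@dataclass
class ServiceElement:
    """One node in a hierarchical service model (illustrative only)."""
    name: str
    state: str = "ordered"
    # The "contract": (current state, event) -> operations process to run.
    bindings: Dict[Tuple[str, str], Process] = field(default_factory=dict)
    children: List["ServiceElement"] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def handle(self, event: str) -> bool:
        """Look up the process bound to this event in the current state."""
        process = self.bindings.get((self.state, event))
        if process is None:
            self.log.append(f"ignored {event} in state {self.state}")
            return False
        self.state = process(self, event)
        self.log.append(f"{event} -> {self.state}")
        return True


# Hypothetical operations processes; real ones would call OSS/BSS or NMS logic.
def activate(element: ServiceElement, event: str) -> str:
    return "active"


def start_repair(element: ServiceElement, event: str) -> str:
    return "degraded"


def finish_repair(element: ServiceElement, event: str) -> str:
    return "active"


def build_vpn_service() -> ServiceElement:
    """Build a two-level model: a customer-facing parent, a resource-facing child."""
    access = ServiceElement(
        "access-rfs",
        bindings={
            ("ordered", "deployed"): activate,       # network event
            ("active", "link_down"): start_repair,   # network event
            ("degraded", "link_up"): finish_repair,  # network event
        },
    )
    return ServiceElement(
        "vpn-cfs",
        bindings={("ordered", "activate_order"): activate},  # service event
        children=[access],
    )


if __name__ == "__main__":
    vpn = build_vpn_service()
    access = vpn.children[0]
    vpn.handle("activate_order")
    for event in ("deployed", "link_down", "link_up"):
        access.handle(event)
    print(vpn.name, vpn.state, vpn.log)
    print(access.name, access.state, access.log)
```

The design choice worth noticing is that neither the order system nor the management system “owns” the process logic in this sketch; both simply post events to the model, which is the sense in which a common model could integrate OSS/BSS and NMS.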

This is important for two reasons.  First, we have had, courtesy of the TMF, the right repository concept for OSS/BSS/NMS for almost two decades.  There’s nothing new about gathering information in one place.  Second, given that we’ve had the SID for the career lifetimes of many in the industry and have not effectively exploited it, there are obviously barriers that need to be crossed.  That would be just as true for a new approach as the TMF approach.

Let me make it clear that I don’t think that the SID is an ideal model.  We have modern modeling approaches (like the OASIS TOSCA specification) that I believe are far better at addressing how we now view complex services, especially ones involving hosted elements.  The thing is, TOSCA has also been around for a while (having been approved in 2014) and it’s not set the world on fire, adoption-wise, either.  I think the barriers to a GitOps-like approach for operators lie beyond pure model technology.

Problem number one is the TMF’s approach to things.  They’re an organization with dues, officers, and biases toward sustaining their own business model.  They developed terminology that’s unique to them, and they provide most of their information only to dues-paying members.  You need to join and absorb their whole complex picture to understand their approach, and that tends to keep the TMF a kind of micro-culture, a secret society.  They’re not alone in this; I think you could make the same statement about most of the industry groups we have.  The problem is that it makes it hard to socialize the TMF vision to the outside-the-TMF world.

The second problem is that operators have long divided their organizations in a way that separates OSS/BSS (under the CIO in most cases) and network operations (under the chief operations officer).  The two groups have had a hands-off relationship for all the old-craft-support-model reasons I’ve already noted.  To integrate both network and service operations around a common model would imply integrating operations overall, meaning merging/eliminating some groups and perhaps jobs.  Finding political support for that is obviously not a trivial task.

The third problem is that management in the telco world knows “software” mostly in the telco context.  They’re not cloud people, not microservice people, at the senior level.  Even if they hire totally state-of-the-art people at the lower layers, those people are buffered from decisions by layers of people who literally don’t know what their juniors are talking about.

The final problem is that the impetus for the harmonization of everything around a model, the telco GitOps, is service lifecycle automation.  This is a massive initiative, one that telcos are ill-equipped to handle on their own, and that telco-centric vendors are afraid to push because it lengthens sales cycles and perhaps threatens the jobs of their prospective buyers to boot.

If transformation means transforming all operations practices around the “telco GitOps”, whatever the specific model technology used, then these four problems will have to be addressed.  The good news, I think, is that there is a convergence of modeling and tools between what the cloud is doing and what the telcos need done.  It would be better if the telcos would acknowledge that their needs are just applications of generalized tools designed for the world of virtual computing and networking, but eventually the cloud people will end up with something close to a right answer.  “Close” here, meaning close enough that visionary telco types can grasp the connection.

ETSI could have fixed this, both in the NFV group and the zero-touch automation initiative.  The TMF, obviously, could fix it too.  Neither has fixed it so far, and that’s why I think we have to forget the bland discussions about transformation and recognize that until we advance the cloud tools for virtual-lifecycle-management overall, to the point where telcos can recognize what’s there, we won’t move an inch.  A body or even a company with the right mission and approach could hasten that happy convergence of insight, but it’s realization that’s needed here, not just technology.

As I said yesterday in my blog about the application of cloud federation to operators, I think the cloud is going to provide the answers here.  The problem for operators is that it may not provide them in time.  We had, at one point, an addressable benefit case for service lifecycle automation equal to almost two-thirds of the total capex budget of operators.  That’s an enormous asset in terms of improving profit per bit.  The problem today is that tactical one-off improvements to operations in some areas have reduced the addressable benefits by about half, which means that it will now be much harder to justify lifecycle automation based on profit per bit.  Today, the justification is probably tied to either significant substitution of hosted features for physical devices, or to climbing the value chain to more OTT-like services.  Business cases for both can be made, but with greater difficulty.  It would be smart for operators to start working with cloud players on how to advance “GitOps” and lifecycle automation overall.