An Operations Modernization Perspective on Software Automation and Orchestration

Across the various places I write on networking, the topic that consistently outperforms the rest is OSS/BSS.  I also get more comments and questions on that topic, and operators tell me it’s become the most important issue.  All of that means this is a good time to look at just where we are on OSS/BSS modernization.  That the TM Forum, the standards leader for OSS/BSS, has just dumped its CEO makes the topic of modernization positively poignant.

I have to start the topic with an important truth, which is that the CIO who drives OSS/BSS transformation is still largely outside the network planning process.  Across the spectrum from Tier One through Tier Three, just short of nine out of ten operators say that they separate operations and network planning.  About the same number say that integrating new services and technologies with OSS/BSS is a matter of creating the right record formats, or perhaps the right APIs.  Even though the Tier One operators almost universally created “transformation teams” to cross organizational boundaries in evolving their services and infrastructure, the great majority of them have quietly dropped or de-emphasized these teams.  We’re back to business as usual.

The isolation of CIO/OSS/BSS has been a major contributor to the limited target adopted by SDN and NFV standards.  It’s also why there’s been a sharp division within operators on what should be done on the operations side.  In fact, just as you could divide CTO and COO people based on whether they saw next-gen networking as a virtual revolution or a minimal transformation, you can divide OSS/BSS people on both the goals and the path to operations renewal.  Some simply wanted software modernization: a more open way of selecting operations tools that would break up the traditional monolithic stacks that locked an operator into a single vendor.  Others wanted something “event-driven”, where service lifecycle changes drove specific OSS/BSS activity in a more automated way.

What SDN and NFV have done is empower the latter view, in no small part because creating service lifecycle management in a separate MANO-like process would threaten to turn all of operations software into a kind of passive record-keeper.  Some OSS/BSS vendors (Netcracker, Amdocs) have framed architectures for OSS/BSS orchestration, but operators think they’re as vague as NFV strategies.  In fact, there are a lot of common issues between event-driven OSS/BSS and NFV.

Most operators would say that NFV has two problems at the technical architecture level: VNFs don’t have a solid architectural framework in which to run, and service lifecycle management is simply the wild west.  In event-driven OSS/BSS, the problems are the same: OSS/BSS components have no solid architectural framework in which to run, and service lifecycle management is also the wild west.

The issue of the architectural framework for either VNFs or OSS/BSS components is significant because without some specific interface through which these can connect with the rest of the management/orchestration and operations software, every service would require customization.  That’s already being experienced in the NFV world, giving rise to the “snowflakes versus Legos” analogy that reflects the uniqueness of each VNF.  This issue, then, is both a threat to easy on-boarding of elements and to the openness of the overall deployment.
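To make the “Legos” side of that analogy concrete, here’s a minimal sketch of what a common architectural contract might look like.  The interface name, methods, and the firewall example are all my own illustration, not drawn from any NFV or TMF specification; the point is only that an orchestrator can drive any conforming element identically, without per-VNF customization.

```python
from abc import ABC, abstractmethod

class ManagedElement(ABC):
    """Hypothetical uniform lifecycle contract (the "Lego" interface)."""

    @abstractmethod
    def deploy(self, params: dict) -> None: ...

    @abstractmethod
    def heal(self, event: str) -> None: ...

    @abstractmethod
    def status(self) -> str: ...

class FirewallVNF(ManagedElement):
    """One element among many; the orchestrator never sees its internals."""
    def __init__(self):
        self._state = "created"

    def deploy(self, params: dict) -> None:
        self._state = "active"      # real code would spin up the VNF here

    def heal(self, event: str) -> None:
        self._state = "recovering"

    def status(self) -> str:
        return self._state

# The orchestrator drives ANY conforming element the same way:
vnf = FirewallVNF()
vnf.deploy({"cpu": 2})
print(vnf.status())  # active
```

Without some agreed contract like this, every VNF is a “snowflake” and every on-boarding is an integration project.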

The service lifecycle management problem is critical for everything in next-gen networking, and it’s actually the thing that provides both a technical and a benefits linkage between SDN, NFV, and OSS/BSS modernization.  The goal is simple: activate software processes in response to service and infrastructure events, significantly reducing or eliminating the need for manual activity.  Since manual activity and related costs currently run more than 50% higher than operators’ total capital budgets, that’s a pretty significant goal.  By 2020, my model says that capex will decline under spending constraints and opex (absent software automation) would continue to expand.  The result is that opex could exceed capex by 90%.
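The arithmetic behind that projection is easy to sketch.  The baseline ratio (opex 50% above capex) comes from the text; the annual growth and decline rates below are my own illustrative assumptions, chosen only to show how a modest divergence compounds to roughly the 90% gap the model projects.

```python
# Illustrative projection only: the 1.5x starting ratio is from the text;
# the yearly rates are assumed for illustration, not from the author's model.
capex = 100.0          # index value, base year
opex = capex * 1.5     # opex currently runs ~50% higher than capex

capex_growth = -0.03   # capex declines under spending constraints
opex_growth = 0.03     # opex (absent automation) keeps expanding

for year in range(4):  # project a few years out, to roughly 2020
    capex *= 1 + capex_growth
    opex *= 1 + opex_growth

print(f"opex exceeds capex by {100 * (opex / capex - 1):.0f}%")
```

Even at a few percent per year in each direction, the gap roughly doubles in four years, which is why software automation is the only opex lever big enough to matter.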

Another critical (and often-ignored) point is that both SDN and NFV increase complexity, and that unless the cost impact of that complexity increase is mitigated by software automation, you’ll not see any net benefit to either technology even where it can be applied.  Service modeling and automation is thus critical in even defending capex reductions.

I’ve argued for literally years that the only path to savings in process opex is model-driven service automation that defines, for each model element, the service lifecycle in state/event form.  I produced an open model I called “ExperiaSphere” (the name was taken from a previous service-automation project I’d done for the TMF SDF activity), represented by five PowerPoint/PDF tutorials averaging thirty-odd slides each.  That took about six months of part-time, one-person effort.  Yet we’ve had an NFV management and orchestration project and a TMF project, each staffed by dozens, and neither has generated even a high-level picture of such a model.  Since I know many of the people involved and they’re far from dumb, you can attribute this failure only to a lack of motivation.
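For readers who haven’t seen a state/event lifecycle definition, here’s a minimal sketch of the idea.  The states, events, and process names are my own illustrations, not ExperiaSphere’s actual schema; what matters is that each model element carries a table mapping its current state and an incoming event to a next state and a process to run, so no human has to decide what happens next.

```python
# Minimal state/event lifecycle table for one model element.
# (state, event) -> (next_state, process_to_run); all names illustrative.
LIFECYCLE = {
    ("ordered",   "activate"): ("deploying",  "start_deployment"),
    ("deploying", "deployed"): ("active",     "bill_service"),
    ("active",    "fault"):    ("healing",    "run_recovery"),
    ("healing",   "deployed"): ("active",     "clear_alarm"),
    ("active",    "cancel"):   ("terminated", "stop_billing"),
}

def handle(state: str, event: str) -> tuple:
    """Return (next_state, process_to_run) for a state/event pair."""
    try:
        return LIFECYCLE[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {event!r} in state {state!r}")

state, proc = handle("active", "fault")
print(state, proc)  # healing run_recovery
```

Compose these tables hierarchically, one per model element, and service automation falls out of the model rather than being coded per service.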

Modeling isn’t a new concept for software deployment and lifecycle management.  Complex software deployments are typically handled by a set of tools called “DevOps”, and there are two models of DevOps in use.  One lets you define modular scripts for deploying things; it’s called the “imperative” model because it tells you what to do.  The other defines end-state goals, so it’s called the “declarative” approach.  Both support modular nesting of compositions, and both have a provision for event-handling.  The OASIS TOSCA cloud model has the same capabilities.  With all of this, why have we not accepted this most basic rule of software automation for network service deployment: you have to model it in a structured way?
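The imperative/declarative distinction is easy to show in a few lines.  This is a toy sketch of my own, not any particular DevOps tool: the imperative function executes an ordered script of steps, while the declarative one is handed a desired end state and converges whatever exists toward it.

```python
# Imperative: an ordered script of steps ("what to do").
def imperative_deploy(host: dict) -> None:
    steps = ["install_package", "write_config", "start_service"]
    for step in steps:
        host.setdefault("done", []).append(step)  # run each step in order

# Declarative: a desired end state, reconciled against what exists.
def declarative_deploy(host: dict, desired: dict) -> None:
    for key, value in desired.items():
        if host.get(key) != value:
            host[key] = value  # converge toward the declared state

host_a, host_b = {}, {"service": "stopped"}
imperative_deploy(host_a)
declarative_deploy(host_b, {"service": "running", "version": "1.2"})
print(host_a["done"])
print(host_b)
```

The declarative form is what makes automation resilient: rerunning it after a fault simply re-converges the element, with no operator deciding which steps to repeat.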

Model-driven automation of the service lifecycle is a given.  It’s a simple extension of software and cloud practices.  It’s going to happen either within the OSS/BSS or outside it.  We can already see that operators like AT&T and Verizon have acknowledged that limited-scope NFV orchestration wasn’t enough; their models are based on multi-layered orchestration.  Interestingly, their models seem to live below the OSS/BSS.

And that’s where the solution may end up.  While it would be possible, even logical, to build orchestration and service modeling into an OSS/BSS, the OSS/BSS (and CIO) disconnect from initiatives like SDN and NFV make it harder to integrate high-level orchestration with emerging technical developments.

Fortunately, it’s also very possible to build modeling and orchestration outside the OSS/BSS, and to represent services to operations systems as virtual devices.  If that happens, then the future belongs to whatever activity ends up owning the service automation piece, and it wouldn’t be the current CIO group.  For that opportunity, though, the outside-the-OSS/BSS vendor would have to accept a challenge.

The difference in where the model lives could make a difference in how well the model can work.  Models that live inside the OSS/BSS can easily integrate operations processes, but to work they’d force the OSS/BSS to be aware of service lifecycle and resource conditions that are now typically handled by management software outside the OSS/BSS.  In contrast, service modeling below the OSS/BSS would preserve the current “virtual-device” mechanism for making OSS/BSS systems aware of the network, and potentially make it easy to integrate management processes, but harder to automate the OSS/BSS.

Logically, the only good solution to service modeling is to require that both management and operations processes be presented via an API, and then be orchestrated into state/event handling defined in the service model, wherever it lives.  If you do that, then you really support a third option for service modeling: an option where it forms a boundary layer between “management” and “operations” and composes both.
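Here’s a minimal sketch of that boundary-layer idea, with every name my own invention.  The key property is that a single state/event entry in the service model can invoke processes from both the management side and the operations side, through the same uniform API, so the model itself is what composes the two domains.

```python
# Processes from BOTH sides of the boundary, behind one uniform API.
# All process and service names are illustrative only.
MANAGEMENT = {"redeploy": lambda svc: f"redeployed {svc}"}
OPERATIONS = {"suspend_billing": lambda svc: f"billing paused for {svc}"}
PROCESSES = {**MANAGEMENT, **OPERATIONS}

# Service model fragment: one transition composes management and
# operations processes without caring which domain owns them.
MODEL = {
    ("active", "node_failure"): {
        "next": "healing",
        "run": ["suspend_billing", "redeploy"],
    },
}

def on_event(service: str, state: str, event: str):
    entry = MODEL[(state, event)]
    results = [PROCESSES[name](service) for name in entry["run"]]
    return entry["next"], results

state, log = on_event("vpn-17", "active", "node_failure")
print(state, log)
```

Because the model addresses processes only through the API dictionary, neither the OSS/BSS nor the management stack has to know the other exists; the boundary layer does the composing.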

To make this sort of thing work, you first need a broad and agile service modeling and orchestration strategy.  I believe that for the boundary-layer approach to be optimal, it would have to foreclose layers of orchestration based on dissimilar modeling.  Such firm technology borders would invite breaking up the boundary function into pieces that had to sit firmly on one side or the other of the operations/management DMZ.

The second thing you’d need is a standard API set to describe the model-to-process linkages.  We don’t have to demand that current interfaces be rewritten, only that they be adapted via a software design pattern to a standard API.  Eventually, I think that API would be accepted by vendors because operators would prefer it.
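The software design pattern in question is the classic adapter.  This sketch wraps a hypothetical legacy activation interface behind an equally hypothetical standard API; none of these class names or signatures come from any real system, but the shape of the fix is exactly this: translate, don’t rewrite.

```python
# Adapter pattern: present a legacy interface through a standard API
# without rewriting the legacy system.  All names are illustrative.
class LegacyActivationSystem:
    """Existing vendor interface, left untouched."""
    def provision_circuit(self, circuit_id: str, bandwidth_mbps: int) -> str:
        return f"circuit {circuit_id} at {bandwidth_mbps} Mbps"

class StandardServiceAPI:
    """The standard API the service model expects (assumed signature)."""
    def activate(self, service_spec: dict) -> str:
        raise NotImplementedError

class LegacyAdapter(StandardServiceAPI):
    """Translates standard calls into the legacy system's vocabulary."""
    def __init__(self, legacy: LegacyActivationSystem):
        self._legacy = legacy

    def activate(self, service_spec: dict) -> str:
        return self._legacy.provision_circuit(
            service_spec["id"], service_spec["bandwidth"])

api = LegacyAdapter(LegacyActivationSystem())
print(api.activate({"id": "C-42", "bandwidth": 100}))
```

Each vendor writes one adapter once, and every service model built against the standard API works with that vendor’s gear from then on.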

We could do all of this very quickly if a group of powerful players (vendors, operators, or both) got behind it.  Absent that kind of support, I think we’ll end up in this place eventually, but only after wasting half a decade in diddling, during which investment in infrastructure and vendor revenues are likely to suffer.

If I were a network equipment vendor I’d be jumping on this approach with every resource I could bring to bear.  It would take the pressure off my sales and reduce the threat that SDN or NFV (or both) would permanently reduce my total addressable market.  If I were a server vendor I’d be jumping on it too, because to control the hosting opportunity I have to control the overall business case, and that business case is established by the boundary-layer approach.

The key point to remember here is that the business case isn’t going to be established any other way.  We know now, if we look at the work of the last four years objectively, that there is no way that SDN or NFV are going to make a business case without the largest contribution coming from service lifecycle automation.  Without some unification of management and operations processes for service lifecycle automation, that maximum contribution isn’t possible.