Service Automation: OSS/BSS or ZSM?

Are we seeing a hidden battle between operations automation alternatives?  On the one hand, there are clearly many developments in the OSS/BSS space, driven by vendors like Amdocs who want to reduce operations costs and improve operations practices by enhancing traditional operations applications.  On the other hand, some operators are still looking for near-revolutionary changes in lifecycle automation, through things like ETSI ZSM or ONAP.  The balance of these two approaches could be very important.  In fact, it has already changed the nature of lifecycle automation.

One fundamental truth in network cost of ownership is that capex, for most operators, is lower than opex.  In fact, operators spend only about 20 cents per revenue dollar on capex, and they spend over 40% more than that on opex.  In 2016, when I started analyzing opex cost trends, service lifecycle automation could have saved operators an average of 7 cents on each revenue dollar, equivalent to cutting capex by a third.

Things like SDN and NFV were aimed primarily at capex reduction, and that in fact has been one of the issues.  The actual benefit of hosting functions on a cloud versus discrete devices has proven to be far less than 20%, and a lot of operators report that benefit is erased by the greater operational complexity of hosted-function networking.  It’s therefore not surprising that as the hopes for a capex revolution driven by virtualization waned, operators became sensitive to opex reduction opportunities.

Service lifecycle automation, meaning the handling of service events through software processes rather than manual intervention, has the advantage of a broad scope of impact, and that same breadth is also its disadvantage.  Retooling operations systems to be driven by centralized automation platforms of any sort is the kind of change that makes operators very antsy.

That’s particularly true when there’s really no service lifecycle automation model to touch and feel.  In 2016, when the opex challenge really emerged in earnest, we had no progress in standards and little progress with the proto-ONAP framework.  Sadly, it’s my personal view that we’re in much the same situation today.  I do not believe that ETSI is on the right track, or even a survivable track, with ZSM, and I don’t think ONAP would scale to perform the lifecycle automation tasks we’re going to confront.

That’s where the OSS/BSS alternative comes in.  Vendors and operators both realized that “opex reductions” or “lifecycle automation” was really about being able to cut headcount.  Yes, you could frame a true service lifecycle automation to do that optimally, if you knew what you were doing.  Apparently, neither telcos nor telco equipment vendors were confident that they did.  You could also tweak the current operations systems to handle the current network-to-ops relationships better, and leave the network and network-related event-handling alone.

This isn’t, in the short term at least, a dumb notion.  Of that seven cents per revenue dollar that’s on the table for full-scale lifecycle automation, about three cents could be achieved by tweaking the OSS/BSS.  Some additional savings can be had by framing services to require less lifecycle automation: less dependence on SLAs, customer portals to reduce operator personnel needs, and so forth.  Overall, operators have generally been able to hold their ground on opex, and in many cases have even been able to reduce it over time.  At least four of those seven cents are now largely off the table.
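The per-revenue-dollar figures quoted so far can be checked with a few lines of arithmetic.  The sketch below is only an illustration: the variable names are mine, the "over 40%" opex premium is treated as exactly 40% (so the opex number is a floor, not an estimate), and "at least four cents off the table" is read as leaving at most three cents still available.

```python
from fractions import Fraction

# All amounts are cents per revenue dollar, taken from the figures in the text.
CAPEX_CENTS = Fraction(20)             # "about 20 cents per revenue dollar on capex"
OPEX_PREMIUM = Fraction(40, 100)       # "over 40% more"; assumed to be exactly 40% here
FULL_AUTOMATION_SAVINGS = Fraction(7)  # potential saving from full lifecycle automation
OSS_BSS_SAVINGS = Fraction(3)          # portion reachable by tweaking the OSS/BSS
OFF_THE_TABLE = Fraction(4)            # "at least four" of the seven cents already addressed

# Lower bound on opex implied by the 40% premium over capex.
opex_floor_cents = CAPEX_CENTS * (1 + OPEX_PREMIUM)

# Full automation savings expressed as a share of capex ("cutting capex by a third").
savings_share_of_capex = FULL_AUTOMATION_SAVINGS / CAPEX_CENTS

# Upper bound on what is still available to ZSM/ONAP-style automation.
remaining_cents = FULL_AUTOMATION_SAVINGS - OFF_THE_TABLE

print(f"opex floor: {float(opex_floor_cents):.0f} cents per revenue dollar")
print(f"full automation savings as share of capex: {float(savings_share_of_capex):.0%}")
print(f"OSS/BSS share of the full savings: {float(OSS_BSS_SAVINGS / FULL_AUTOMATION_SAVINGS):.0%}")
print(f"savings still on the table: at most {float(remaining_cents):.0f} cents")
```

On these assumptions, the seven cents works out to 35% of capex, which is where "a third" comes from, and the part still open to a full lifecycle automation model is at most three cents, or 15% of capex.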

That doesn’t mean that ZSM (or what I’d consider a better model of lifecycle automation) is dead.  What it does likely do is tie lifecycle automation success to the widespread use of carrier cloud technology.  The substitution of functions as the building blocks for services, versus devices, demonstrably creates more service complexity.  However, even carrier cloud success might not create ZSM or ONAP success.

The cloud community, including Google, Microsoft, Amazon, Red Hat, and VMware, is working feverishly to enhance the basic Kubernetes ecosystem.  That process will shortly create a framework for lifecycle automation for nearly all componentized applications, the only possible exception being the components associated with data-plane handling.  The exact nature of data-plane functions is still up in the air; most operators favor the notion of a white box rather than a commercial server.  Given that, only widespread NFV adoption using cloud hosting would be likely to accelerate the need for the ZSM or ONAP model of service lifecycle automation.  Otherwise the cloud-centric approach would serve better.

White box data-plane functions would really look like devices with somewhat elastic software loads, similar to the NFV uCPE model.  These applications don’t really impose a different management model for function-based services; the services are just based on open devices rather than vendor platforms.  I doubt whether the differences are sufficient to justify any new management model; we already manage devices in networks.

This seems to have been one of the original goals of NFV; if you focus on virtual devices you can employ device management for at least the higher-level management functions.  The only remaining task is the management of how the virtual elements are hosted and combined, which is a more limited mission.  It could have been a reasonable approach had it been more explicitly articulated, and had the consequences of the approach (divided management) been dealt with by, for example, embedding the collection of functions within a virtual device in an intent-modeled element.

The OSS/BSS players seem to be sticking with the device-management approach, and that may well be because they don’t see a widespread operator push for deploying their own carrier cloud resources.  Until you commit to carrier cloud on a broad scale, meaning beyond NFV and 5G Core, you have no real need to consider how you manage naked functions.  That’s because follow-on drivers for carrier cloud, like IoT, don’t have real-device network models in place, so function-based services are likely to develop.  Where we have devices, the OSS/BSS-centric solution is workable, or can be made so.

As is often the case, where we end up with regard to operations automation will likely depend on just how far operators take “carrier cloud” and function-based services.  That seems likely to depend on whether operators stay within their narrow connection-services comfort zone, or step out into a broader vision of the services they could provide.