What Operators Think about Service-Event versus Infrastructure-Event Automation

I’m continuing to work through information I’ve been getting from operators worldwide on the lessons they’re learning from SDN and NFV trials and PoCs.  The focus of today is the relationship between OSS/BSS and these new technologies.  Despite the fact that operators say they are still not satisfied with the level of operations integration into early trials, they are getting some useful information.

One interesting point, clear from the start, is that operators see two different broad OSS/BSS-to-NFV (and SDN) relationships emerging.  In the first, the operations systems are primarily handling what we would call service-level activities.  The OSS/BSS has to accept orders, initiate deployment, and field changes in network state that would have an impact on service state.  In the second, we see OSS/BSS actually getting involved in lower-level provisioning and fault management.

There doesn’t seem to be a strong correlation between which model an operator thinks will win out and the size or location of the operator.  There’s even considerable debate in larger operators as to which is best, though everyone said they had currently adopted one approach and nearly everyone thought they’d stay with it for the next three years.  All this suggests to me that the current operations model evolved into existence based on tactical moves, rather than having been planned and deployed.

There is a loose correlation between which model an operator selects and the extent to which that operator sees seismic changes in operations as being good and necessary.  In particular, I find that operators who have pure service-level OSS/BSS models today are most likely to be concerned about making their system more event-driven.  Three-quarters of all the operators in the service-based-operations area think that’s necessary.  Interestingly, those who do not think so seem to be following a “Cisco model” of SDN and NFV, where functional APIs and policy management regulate infrastructure.  That suggests that Cisco’s approach is working, both in terms of setting market expectations and in fulfilling early needs.

The issue of making operations event-driven seems to be the technical step that epitomizes the whole “virtual-infrastructure transition”.  Everyone accepts that future services will be supported with more automated tools.  The question seems to be how these tools relate to operations, which means how much orchestration is pulled into OSS/BSS versus put somewhere else (below the operations systems).  It also depends on what you think an “event” is.

Most operations systems today are workflow-based systems, meaning that they structure a linear process flow that roughly maps to the way “provisioning” of a service is done.  While nobody depends on manual processes any longer, they do still tend to see the process of creating and deploying a service to be a series of interrupted steps, with the interruption representing some activity that has to signal its completion.  What you might call a “service-level event” represents a service-significant change of status, and since these happen rarely it’s not proved difficult to take care of them within the current OSS/BSS model.
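To make the “interrupted steps” idea concrete, here’s a minimal Python sketch of a linear provisioning workflow that stops after each step and waits for a completion signal.  The step names and the ProvisioningWorkflow class are my own illustrative assumptions, not taken from any real OSS/BSS product.

```python
from dataclasses import dataclass, field

# Hypothetical step names; a real OSS/BSS workflow would define its own.
STEPS = ["accept_order", "reserve_resources", "activate_service", "start_billing"]


@dataclass
class ProvisioningWorkflow:
    """Linear workflow that pauses after each step until completion is signaled."""
    order_id: str
    position: int = 0
    log: list = field(default_factory=list)

    @property
    def waiting_on(self):
        # The step whose completion signal we are waiting for, or None when done.
        return STEPS[self.position] if self.position < len(STEPS) else None

    def signal_complete(self, step):
        # Only the step the workflow is waiting on can advance the flow;
        # any other signal is recorded and ignored.
        if step != self.waiting_on:
            self.log.append(f"ignored:{step}")
            return False
        self.log.append(f"done:{step}")
        self.position += 1
        return True
```

The point of the sketch is what it can’t do: a signal that arrives out of sequence has nowhere to go, which is fine when service-level events are rare and orderly but awkward when infrastructure can raise events at any time.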

The challenge, at least as far as the “event-driven” school of operations people is concerned, lies in the extension of software tools to automatic remediation of issues.  One operator was clear:  “I can demonstrate OSS/BSS integration at the high level of the service lifecycle, but I’m not sure how fault management is handled.  Maybe it isn’t.”  That reflects the core question: do you make operations event-driven and dynamic enough to envelop the new service automation tasks associated with things like NFV and SDN, or do you perform those tasks outside the OSS/BSS?

This is where I think the operators’ view of Cisco’s approach is interesting.  In Cisco’s ACI model, you set policies to represent what you want.  Those policies then guide how infrastructure is managed and traffic or problems are accommodated.  Analytics reports an objective policy failure, and that triggers an operations response more likely to look like trouble-ticket management or billing credits than like automatic remediation.  It’s not, the operators say, that Cisco doesn’t or can’t remediate, but that resource management is orthogonal to service management, and the “new” NFV or SDN events that have to be software-handled are all fielded in the resource domain.
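A rough Python sketch of that policy-driven pattern, as the operators describe it, might look like the following.  Everything here is an assumption made for illustration (the Policy fields, the metric names, the action labels); it is not Cisco’s ACI API, and it treats a missing measurement as “no violation” purely to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    metric: str        # hypothetical metric name reported by analytics
    max_value: float   # objective: the measured value must not exceed this


def failed_policies(policies, measurements):
    """Return the policies whose objective is violated by an analytics report."""
    # Assumption: a metric absent from the report counts as 0.0 (no violation).
    return [p for p in policies if measurements.get(p.metric, 0.0) > p.max_value]


def operations_response(failures):
    """Map policy failures to service-level actions; no remediation happens here."""
    actions = []
    for p in failures:
        actions.append(("open_trouble_ticket", p.name))
        actions.append(("apply_billing_credit", p.name))
    return actions
```

Notice that nothing in operations_response touches infrastructure.  Whatever remediation occurs stays in the resource domain, and operations only hears about a policy that objectively failed.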

Most operators think that this approach is contrary to the vision that NFV at least articulates, and in fact it’s NFV that poses the largest risk of change.  It’s clear that NFV envisions a future where software processes not only control connectivity and transport parameters to change routes or service behavior, but also install, move, and scale service functionality that’s hosted, not embedded.  This means that to these operators, NFV doesn’t fit in either a “service-event” model or a “resource-based-event-handling” model.  You really do need something new in play, which raises the question of where to put it.

The service-event-driven OSS/BSS planners think the answer to that is easy; you build NFV MANO below the OSS/BSS and you field and dispatch service-layer events to coordinate operations processes and infrastructure events.  This does not demand a major change in operations.  The remainder of the planners think that somehow either operations has to field infrastructure events and host MANO functions, or that MANO has to orchestrate both operations and infrastructure-management tasks together, creating a single service model top to bottom.
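The first of those options, orchestration sitting below the OSS/BSS, can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the event kinds, the REMEDIATIONS table, and the BelowOssOrchestrator class are hypothetical, and none of it reflects the ETSI MANO interfaces.

```python
# Hypothetical mapping of infrastructure event kinds to automated remediations.
REMEDIATIONS = {"vnf_overload": "scale_out", "vnf_failure": "redeploy"}


class BelowOssOrchestrator:
    """Fields infrastructure events; passes up only service-significant ones."""

    def __init__(self, notify_oss):
        self.notify_oss = notify_oss  # callback that delivers service-level events
        self.actions = []             # remediations performed below the OSS/BSS

    def on_infrastructure_event(self, kind, service_id):
        action = REMEDIATIONS.get(kind)
        if action is not None:
            # Remediated in the infrastructure domain; OSS/BSS never sees it.
            self.actions.append((action, service_id))
            return "handled_below_oss"
        # No automated fix is known, so service state changes and operations is told.
        self.notify_oss({"type": "service_state_change",
                         "service": service_id,
                         "cause": kind})
        return "escalated_to_oss"
```

Used with a simple list as the stand-in for OSS/BSS, an overload event results in a scale-out and nothing reaches operations, while an event with no known remediation shows up as a single service-level event.  The single-model alternative would instead have one orchestrator own both the remediation table and the operations processes.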

I’ve always advocated that last view, a single service model top to bottom, and so I’d love to tell you that there’s a groundswell of support arising for it.  That’s not the case.  Of all the operators I’ve talked with, only five seem to have any recognition of the value of this coordinated operations/infrastructure event orchestration and only one seems to have grasped its benefits and how to achieve it.

What this means is that the PoCs and tests and trials underway now are just starting to dip a toe in the main issue pool, which is not how you make OSS/BSS launch NFV deployment or command NFV to tear down a service, but how you integrate all the other infrastructure-level automated management tasks with operations/service management.  This is what I think should be the focus of trials and tests for the second half of 2015.  We know that “NFV works” in that we know that you can deploy virtual functions and connect them to create services.  What we have to find out is whether we can fit those capabilities into the rest of the service lifecycle, which is partly supported by non-NFV elements and overlaid entirely by OSS/BSS processes that are not directly linked with MANO’s notion of a service lifecycle.

I think we may be close to this, and though “close” doesn’t mean “real close”, I think that the inertia of OSS/BSS is working in favor of keeping service events and infrastructure events separated and handling the latter outside OSS/BSS.  Since that’s what most are doing now, this might be a case where the status quo isn’t too bad a thing.  The only issue will be codifying how below-the-OSS orchestration and the OSS/BSS processes link with each other in a way broad and flexible enough to address all the service options we’re hoping to target with NFV.