The Technical Steps to Achieve Service Operations Automation

If the concept of service operations automation is critical to NFV success and the NFV ISG doesn’t describe how to do it, how do you get it done?  I commented in an earlier blog that service operations could be orchestrated either within the OSS/BSS or within MANO.  The “best” place might establish where to look for NFV winners, so let’s explore the question further.

Service operations gets mired in legacy issues in the media.  We hear about OSS/BSS and we glaze over because we know the next comments are going to be about billing systems and order portals and so forth.  In truth, the billing and front-end functions of OSS/BSS are fairly pedestrian, and there are plenty of implementations (proprietary and open-source).  These can be refined and tuned, but they can't automate the service process by themselves.

Operations automation of any sort boils down to automating the response of operations systems to events.  These events can be generated either within the service framework itself, in the form of a move, add, or change, for example, or within the resource framework, meaning they pop up as a result of conditions that emerge during operation.

Automating event-handling has been considered for some time.  As far as I can determine, the seminal work was done around 2008 by the TMF in its NGOSS Contract discussions, which grew into the GB942 specification.  The picture painted by the TMF was simple but revolutionary: events are steered to operations processes through the mediation of a contract data model.  In my view, this is the baseline requirement for service automation.  Things like billing systems respond to events, so the model passes events to them; things like order entry systems generate events to be processed.

The reason why the TMF established the notion of a contract data model (the SID, in TMF terms) as the mediating point is that the automation of service events is impossible without event context.  Suppose somebody runs into a network operations center that's supporting a million customers using a thousand routers and ten thousand ports and circuits, and says "a line is down!"  That announcement isn't much help in establishing the proper response, even in a human-driven process.  The fundamental notion of the TMF's GB942 was that the contract data model would provide the context.
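To make that idea concrete, here is a minimal Python sketch of contract-mediated event steering in the GB942 spirit.  The class names, handler registrations, and event types are my own illustrations, not anything drawn from a TMF specification; the point is simply that the contract object, not the event source, decides which operations processes see an event.

```python
# Minimal sketch of contract-mediated event steering (GB942-style idea).
# All names are illustrative, not drawn from any TMF specification.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ServiceContract:
    """A per-service data model that supplies the context for events."""
    service_id: str
    customer: str
    state: str = "active"
    # Each event type maps to the operations processes that should run,
    # in order, when that event arrives against this contract.
    handlers: Dict[str, List[Callable[["ServiceContract", dict], None]]] = field(default_factory=dict)

    def on(self, event_type: str, process: Callable[["ServiceContract", dict], None]) -> None:
        self.handlers.setdefault(event_type, []).append(process)

    def dispatch(self, event_type: str, event: dict) -> None:
        # The contract, not the event source, decides which processes see the event.
        for process in self.handlers.get(event_type, []):
            process(self, event)

# Hypothetical operations processes
def bill_for_change(contract: ServiceContract, event: dict) -> None:
    print(f"billing {contract.customer} for change {event['order_id']}")

def update_inventory(contract: ServiceContract, event: dict) -> None:
    print(f"updating inventory for {contract.service_id}")

vpn = ServiceContract(service_id="vpn-001", customer="acme")
vpn.on("add-site", update_inventory)
vpn.on("add-site", bill_for_change)
vpn.dispatch("add-site", {"order_id": "ord-42", "site": "chicago"})
```

The "line is down!" problem above is exactly what the dispatch step solves: the event only means something once it lands on the contract that knows whose service it is and what should happen next.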

For that to work, we have to be able to steer events in both the service-framework and resource-framework sense.  It's pretty easy to see how service events are properly steered, because these events originate in software that has contract awareness.  You don't randomly add sites to VPNs; somebody orders them, and that order associates the event with a service data model.  The problem lies in the resource framework events.

In the old days, there was a 1:1 association between resources and services.  You ordered a data service and got TDM connections, which were yours alone.  The emergence of packet technology introduced the problem of shared resources.  Because a packet network isn't supporting any given service or user with any given resource (it adapts to its own conditions with routing, for example), it's difficult to say, when something breaks, which service faults the break is causing.  This was one of the primary drivers for the separation of management functions in networking: we had "operations," meaning service operations and OSS/BSS, and we had "network management," meaning the NOC (network operations center) sustaining the shared resources as a pool.

"Event-driven" OSS/BSS is a concept that's emerged in part from GB942 and in part because of the issues of resource-framework events.  The goal is to tie resource events somehow into the event-steering capabilities of the service data model.  It's not a particularly easy task, and most operators didn't follow it through, which is a big part of why GB942 isn't often implemented (only one of the almost 50 operators I talked with said they had done anything with it).

This is where things stood when NFV came along, and NFV's virtualization made things worse.  The challenge here is that the resources that are generating events don't even map to logical components of the services.  A "firewall" isn't a device but a set of software-hosted functions in VMs, running on servers and connected into a chain with SDN tunnels.  Or maybe it is a real device.  You see the problem.

Virtualization created the explicit need for two things that were probably useful all along, but not critical.  One was a hierarchically structured service data model made up of cataloged standard components.  You need to be able to define how events are handled according to how the pieces of the service are implemented, and that's easy to do if you can catalog the pieces along with their handling rules and then assemble them into services.  The other was explicit binding of service components to the resources that fulfill them, at the management level.
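Here's a rough sketch of the first of those two things: a hierarchical service model assembled from cataloged components, each carrying its own event-handling rules.  The structure and the rule strings are assumptions for illustration, not an NFV or TMF schema.

```python
# Illustrative sketch of a hierarchical service model assembled from
# cataloged components, each carrying its own event-handling rules.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CatalogComponent:
    name: str
    # How this piece reacts to events, keyed by event type; the values are
    # just descriptive actions for the sake of the sketch.
    handling_rules: Dict[str, str] = field(default_factory=dict)
    children: List["CatalogComponent"] = field(default_factory=list)

    def handle(self, event_type: str) -> Optional[str]:
        # A component handles the event itself if it has a rule for it,
        # otherwise it delegates downward to its children.
        if event_type in self.handling_rules:
            return f"{self.name}: {self.handling_rules[event_type]}"
        for child in self.children:
            result = child.handle(event_type)
            if result:
                return result
        return None

# Assemble a service from cataloged pieces
firewall = CatalogComponent("vFirewall", {"vm-failed": "redeploy VNF and rebind"})
access = CatalogComponent("AccessConnection", {"port-down": "reroute access path"})
vpn_service = CatalogComponent("BusinessVPN", children=[access, firewall])

print(vpn_service.handle("vm-failed"))  # vFirewall: redeploy VNF and rebind
```

The handling logic lives with the cataloged piece, so however a given service is assembled, the right response travels with the component that implements it.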

NFV's data model wasn't defined in the ISG's Phase One work, but the body seems to be leaning toward the modern concept of an "intent model," with abstract features, connection points, and an SLA.  This structure can be created with the TMF SID.  NFV also doesn't define the binding process, and while the TMF SID could almost surely record bindings, the process for doing that wasn't described.  Binding, then, is the missing link.
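Before getting to binding, it's worth showing roughly what an intent-model element looks like in data-structure terms.  The field names below are my own reading of the concept, not ETSI or TMF terminology; the essential property is that the implementation of the feature is deliberately absent from the model.

```python
# A rough sketch of the "intent model" shape: abstract feature, connection
# points, and an SLA, with the implementation hidden behind the abstraction.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class IntentModel:
    feature: str                  # what the element does, e.g. "firewall"
    connection_points: List[str]  # where it attaches to the rest of the service
    sla: Dict[str, float]         # promised behavior, e.g. availability, latency
    # Deliberately absent: how the feature is realized (device, VNF chain, etc.)

fw = IntentModel(
    feature="firewall",
    connection_points=["wan-side", "lan-side"],
    sla={"availability": 0.9999, "latency_ms": 5.0},
)
```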

There are two requirements for binding resources to services.  First, you must bind through the chain of service component structures you’ve created to define the service in the first place.  A service should “bind” to resources not directly but through its highest-level components, and so forth down to the real resources.  Second, you must bind indirectly to the resources themselves to preserve security and stability in multi-tenant operations.  A service as a representative of a specific user cannot “see” or “control” aspects of resource behavior when the resource is shared, unless the actions are mediated by policy.
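A hedged sketch of those two requirements, under my own assumed names: bindings are recorded level by level down the service hierarchy rather than jumping straight to resources, and any action a service wants to take against a shared resource passes through a policy check instead of being exercised directly.

```python
# Illustrative sketch of hierarchical, policy-mediated binding.
# Names, actions, and the policy table are assumptions for the example.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Binding:
    component: str          # service-model element, e.g. "vFirewall"
    resource: str           # what fulfills it, e.g. "vm-cluster-7"
    children: List["Binding"] = field(default_factory=list)

ALLOWED_ACTIONS: Dict[str, List[str]] = {
    # Policy: which actions a service instance may request against a shared resource.
    "vm-cluster-7": ["query-status", "request-redeploy"],
}

def request_action(binding: Binding, action: str) -> str:
    # The service never touches the shared resource directly; the request
    # is checked against policy and either forwarded or refused.
    if action in ALLOWED_ACTIONS.get(binding.resource, []):
        return f"forwarded '{action}' to {binding.resource}"
    return f"refused '{action}' on shared resource {binding.resource}"

vpn_binding = Binding("BusinessVPN", "service-record-001", children=[
    Binding("vFirewall", "vm-cluster-7"),
])

print(request_action(vpn_binding.children[0], "request-redeploy"))  # forwarded
print(request_action(vpn_binding.children[0], "reboot-host"))       # refused
```

The mediation step is what keeps a single tenant's service from seeing or disturbing a resource that a thousand other services are also depending on.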

So this is what you need to be looking for in an implementation of NFV that can address service operations automation—effective modeling but most of all effective binding.  Who has it?

Right now, I know enough about the NFV implementations presented by Alcatel-Lucent, HP, Oracle, and Overture Networks to say that these companies could do service operations automation with minimal effort at most.  Of the four, I have the most detail on the modeling and binding for HP and Overture, and therefore the most confidence in my views with regard to those two: they can do the job.  I have no information to suggest that the OSS/BSS players out there have achieved the same capabilities, so right now I'd say that NFV is in the lead.

What challenges their lead is that all the good stuff is out of scope for the ISG's standards work, while it's definitely in scope for the TMF.  I'm not impressed by the pace of the TMF's ZOOM project, which has spent over a year doing what should have been largely done when it started.  But I think a couple of good months of work by a couple of people could fully define the TMF approach, and that would be possible for the TMF.  I don't think the ISG can move that fast, which means that vendors like those I've named are operating in a never-never land between standards, so to speak.  They might end up defining a mechanism de facto, or demonstrating that no specific standard is even needed in the binding-and-model area.

The deciding factor in whether service operations automation is slaved to MANO or to the OSS/BSS may be history.  MANO is a new concept with a lot of industry momentum.  OSS/BSS is legacy, and while it wouldn't have far to go, it has had the potential to do all of this all along.  The same effort by the same couple of people could have produced all of this back in 2008, and it still hasn't happened.  We have four plausible implementations on the NFV side now.  If those four vendors can hook into service operations automation, they could make the business case for NFV, and perhaps change OSS/BSS forever in the process.