Have We Forgotten a Key Piece of Service Lifecycle Automation?

We’ve all heard the talk about automation and opex reduction as means of improving both service and revenue per bit.  Part of the implicit goal in increasing operational efficiency is shifting some tasks to an automated form, but a bigger part has to come from shifting customer care responsibility more directly and efficiently to the customer.  That means that a portal is a critical piece of the story.

Customer care, meaning customer technical support and technical sales support, has exceeded network operations costs for the last five years, and its two pieces are growing about 40% faster than network operations costs.  Not only that, customer care overall has a significant impact on customer acquisition and retention costs, which are higher still and growing even faster.  If we could project pre-sale and technical support through a portal, we could not only reduce the staffing requirements (which many operators already expense through a third party) but also improve customer retention.

A couple of years ago, I did a survey of consumer and business attitudes toward the handling of technical problems and questions.  I found, to nobody’s surprise, that about two-thirds of users reported being “less than satisfied” with their experiences, and about a quarter reported themselves to be “very unsatisfied”.  Business and residential experience was similar, with consumers tilting somewhat more toward the unsatisfactory end.  Current support isn’t particularly popular, and offshoring trends are the factor cited most often as the source of the problem.

It’s not just disgruntled customers, either.  It’s also fleeing customers.  A single negative support experience, even one that is considered very unsatisfactory, generates only about a year of angst, which will subside if nothing else happens.  Several experiences, particularly if they’re spaced about six months apart, generate a continuous negative attitude.  If this persists to the point where the service contract is up for renewal, customers of all types tend to look at competitive options, and if there is another choice at a comparable price, those with the “very unsatisfactory” view of support are three times as likely to jump ship as average.  All of this is why operators put customer care high on their list of priorities.

The goal of operators, which I’ve seen both in surveys and contacts and in real consulting, is establishing a portal that represents the totality of their relationship with customers.  The portal offers service order support, including marketing, technical support, and problem resolution.  Most operators also want to have a kind of status indicator, the classic green/yellow/red service state for each service, summarized upward by service type and eventually reaching a customer-level status indicator.  Some want information on periods of maintenance, planned upgrades, etc.
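The status indicator described above can be sketched in a few lines.  This is only an illustration, and it assumes a “worst-of” rule, where a group is as bad as its worst member; the service names and the rule itself are my assumptions, not something operators specified.

```python
from enum import IntEnum


class Status(IntEnum):
    GREEN = 0
    YELLOW = 1
    RED = 2


def roll_up(statuses):
    """Assumed worst-of rule: a group is as bad as its worst member.

    An empty group is reported as GREEN.
    """
    return max(statuses, default=Status.GREEN)


def customer_summary(services):
    """services: iterable of (service_type, service_name, Status) tuples.

    Returns (status per service type, customer-level status).
    """
    by_type = {}
    for service_type, _name, status in services:
        by_type.setdefault(service_type, []).append(status)
    type_status = {t: roll_up(s) for t, s in by_type.items()}
    return type_status, roll_up(type_status.values())


# Hypothetical services for one customer.
services = [
    ("vpn", "hq-to-branch-1", Status.GREEN),
    ("vpn", "hq-to-branch-2", Status.YELLOW),
    ("internet", "hq-access", Status.GREEN),
]
type_status, overall = customer_summary(services)
print({t: s.name for t, s in type_status.items()}, overall.name)
```

Here one degraded VPN turns both the VPN service type and the customer-level indicator yellow.  A real portal might weight services or suppress planned maintenance, which this sketch doesn’t attempt.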

One of the specific challenges that service lifecycle automation runs into is that customer care is typically seen as an operations-level task, meaning that it’s related to the OSS/BSS systems.  Over the last five years, network operations and service operations have actually separated somewhat, with improvements in the latter creating a situation where the former area doesn’t necessarily have deep visibility into service resources and service state.  SDN and NFV, which today are largely being automated using a “virtual device” model that presents the status of logical features rather than physical elements, seem to be widening the gap.

There are two tasks here to address, then.  One is to create a series of service management APIs that allow non-technical inspection of and intervention into service behavior.  These have to be different from the capabilities offered to customer service and network operations personnel; they’re usually “higher-level”, meaning more fully translated into common language and more tightly filtered against accidental errors.  The other task is to construct views of the underlying service/network data according to the needs of specific users inside and outside the operator organization, and the policies and regulations that govern the space.

My view has always been that the easiest way to get the duality of requirements noted above would be to apply the principles of an old (and now-obsolete) Internet draft called “i2aex”, which stands for “Infrastructure to Application Exposure”.  The notion of i2aex was to use proxy functions to suck data out of MIBs for everything and record it in a database.  Queries would then be run on this database to produce management views.  Updates would be pushed through the reverse process, and policy filters would limit what different roles could see and do.  I called this “derived operations” in my 2012-early-2013 presentations, and I incorporated it in the original CloudNFV architecture.
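To make the collect-then-query idea concrete, here is a minimal Python sketch using an in-memory SQLite database.  It is not the i2aex design or the CloudNFV implementation; the table layout, the function names, and the FAKE_MIBS dictionary standing in for real devices are all assumptions for illustration.  The only real-world detail is that ifOperStatus uses 1 for “up” and 2 for “down”.

```python
import sqlite3
import time

# Hypothetical stand-in for device MIBs; a real proxy would read these
# through SNMP, NETCONF, or streaming telemetry.
FAKE_MIBS = {
    "router-1": {"ifOperStatus": 1, "ifInErrors": 0},
    "router-2": {"ifOperStatus": 2, "ifInErrors": 17},
}


def open_repository():
    """Create the time-stamped repository (schema is an assumption)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE samples (ts REAL, element TEXT, variable TEXT, value INTEGER)"
    )
    return db


def poll_once(db, mibs, now=None):
    """Proxy step: read every variable from every element and time-stamp it."""
    ts = time.time() if now is None else now
    rows = [
        (ts, element, variable, value)
        for element, variables in mibs.items()
        for variable, value in variables.items()
    ]
    db.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


def elements_down(db):
    """Management view: elements whose latest ifOperStatus sample is not 1 (up)."""
    query = """
        SELECT s.element FROM samples AS s
        WHERE s.variable = 'ifOperStatus'
          AND s.ts = (SELECT MAX(ts) FROM samples
                      WHERE element = s.element AND variable = s.variable)
          AND s.value <> 1
        ORDER BY s.element
    """
    return [row[0] for row in db.execute(query)]


db = open_repository()
poll_once(db, FAKE_MIBS, now=100.0)
print(elements_down(db))
```

The view is just a query, so adding another one doesn’t add any load on the devices.  The reverse path for updates and the policy filters are left out of this sketch.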

One of the reasons for this kind of indirection is that customer access to management data in any form poses a risk of overloading the associated APIs.  The classic example is an outage: every customer impacted by the problem immediately looks for status, which swamps the APIs that provide for monitoring and control.  That prevents the NOC from taking action, creating what’s almost like a denial-of-service attack.
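The effect of that indirection is easy to show with a toy model.  In the Python sketch below, the CountingDevice and StatusRepository classes are hypothetical; the point is only that portal reads are served from the last polled snapshot, so the number of device reads depends on the polling schedule and not on how many customers are looking.

```python
class CountingDevice:
    """Hypothetical management API that counts how often it is read."""

    def __init__(self):
        self.reads = 0
        self.state = "up"

    def read_status(self):
        self.reads += 1
        return self.state


class StatusRepository:
    """Holds the last polled status; portal reads never touch the device."""

    def __init__(self, device):
        self._device = device
        self._snapshot = None

    def poll(self):
        # Only the poller calls this, on a schedule the operator controls.
        self._snapshot = self._device.read_status()

    def customer_status(self):
        # Served entirely from the stored snapshot.
        return self._snapshot


device = CountingDevice()
repo = StatusRepository(device)
repo.poll()
device.state = "down"  # outage begins
repo.poll()            # the next scheduled poll picks it up

# 10,000 impacted customers check the portal at once.
answers = [repo.customer_status() for _ in range(10_000)]
print(device.reads, answers[0])
```

The device is read twice, once per poll, no matter how many customers ask.  The tradeoff is that customers see status that is as stale as the polling interval.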

There are some open-source management tools that work this way, even if i2aex never really gained traction and acceptance.  Given the proposed role of analytics in network and service management, capacity planning, and other network operations roles, it seems to me that having all the data available in a nice time-stamped repository would be the logical solution to everyone’s problems.  Certainly it would make the portal process easier.

Management repositories like the one i2aex envisioned look like ordinary databases, which means that normal analytic tools and web front-end tools for data digestion, presentation, and even updating would work on them.  Every worker role, every customer role, and even every third-party role could be given a customized view of everything they’re entitled to see; “derived operations” in action.  It could make the presentation of customer care interfaces an easy web development task, fostering greater customization of the GUI depending on factors like the nature of the customer’s support contract, the skill of personnel, the use of third-party integrators or MSPs, and so forth.
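A role-specific view of that kind could be as simple as a policy table applied to repository rows.  The Python sketch below is one way to picture it; the records, the role names, and the shape of the POLICIES table are invented for illustration and don’t come from any operator’s system.

```python
# Hypothetical repository rows and role policies, for illustration only.
RECORDS = [
    {"customer": "acme", "service": "vpn-1", "state": "yellow",
     "element": "router-2", "detail": "ifInErrors rising"},
    {"customer": "acme", "service": "internet-1", "state": "green",
     "element": "router-1", "detail": ""},
    {"customer": "globex", "service": "vpn-7", "state": "red",
     "element": "router-9", "detail": "ifOperStatus down"},
]

POLICIES = {
    # role: (fields the role may see, restricted to one customer's rows?)
    "customer": (("service", "state"), True),
    "msp": (("customer", "service", "state", "detail"), True),
    "noc": (("customer", "service", "state", "element", "detail"), False),
}


def derived_view(role, records, customer=None):
    """Return only the rows and fields that the role's policy allows."""
    fields, own_only = POLICIES[role]
    if own_only and customer is None:
        raise ValueError(f"role {role!r} requires a customer scope")
    return [
        {f: r[f] for f in fields}
        for r in records
        if not own_only or r["customer"] == customer
    ]


print(derived_view("customer", RECORDS, customer="acme"))
print(len(derived_view("noc", RECORDS)))
```

The customer sees only service names and states for its own services, while the NOC role sees every row, including the underlying element.  Changing what a role can see means editing the policy table, not the portal code.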

Vendors, likely seeing a loss of management differentiation, haven’t been wild about the approach (though it was a Cisco employee who led the i2aex draft).  Even operators have been cool, with some saying they feared the impact of gathering all the telemetry from all the resources and functional elements.  They tend to change their view when you explain that 1) the new approach would eliminate the risk of management denial-of-service issues, 2) the use of analytical tools implies the same amount of access to the same information in order to have current network state information and trends available, and 3) the rate of access to management portals would be controllable if a single element polled them, where it isn’t if everyone interested just takes their shot when it’s convenient.

Portals are useless if they’re as monolithic and rigid in features and functions as the underlying operations systems have been.  You need to have full visibility, but you also need to decide how you’re going to exploit it.  I’m not saying the notion of “derived operations” is the only solution, but it’s a solution that would obviously work.  If vendors have a different approach, they need to describe it.