The Two Pieces to a Transformed Operator

I offered operators’ criticisms of transformation views, and a few of you wanted to know what my own views of transformation were.  Do I have something that operators wouldn’t criticize?  Probably not, but I do have a view based on absorbing their views and on running my market model.  It’s actually not all that complicated either.

At the high level, transformation to me has two pieces.  The first is to create and sustain a minimally profitable business model for connection services.  Operators need to provide these services, and they can’t lose money on them.  The second piece is carrier cloud.  Whatever forms the growth engine for operators in the future, it won’t be connection services but hosted experiences.  Carrier cloud is what hosts them.  We’ll take these in order.

A minimally profitable connection service business model depends on two things.  First, you have to streamline operations through the introduction of automation to the service lifecycle.  This is hardly a revolutionary idea; CIMI published its first service management automation paper in 2002, and most of what was in it is still recognizable today.  Plenty of others did the same, but we’ve only now started to come to terms with reality.  Second, you have to rely much more on open hardware and software technology for the capital infrastructure you deploy.  With profit per bit in the toilet, you can’t pay 40% gross margins to a vendor.

Service lifecycle automation has to start with the customer experience, for the simple reason that customer-related displaceable opex represents 57% of opex overall.  Most operators realize now that service lifecycle automation, like web-enabled applications, has a front- and back-end process model.  The back-end stuff is where most of the current attention is focused because the perception is that by controlling what operations people do, you control cost.  The reality is that in order for service lifecycle automation to work, you have to start with a customer self-service and support portal.

The problem with zero-touch automation (ZTA) as an isolated back-end activity is that it’s difficult to align it with the notion of services, which are, after all, what’s being sold and consumed here.  A service isn’t a collection of technology; it’s a collection of experiences.  The customer does the experiencing, of course, and so the most critical thing we need is a vision of how a customer handles a service, from ordering to problem management.  Only a front-end, portal-driven vision can offer that.

The scope of ZTA, and the means by which it’s applied, relate directly to the services from the customer side.  In particular, they relate to the service-level agreement, or SLA.  To give a simple example, look at two extremes: one where there is no real SLA at all (best-efforts services) and another where the SLA is very stringent in terms of the number and duration of failures.

In the best-efforts case, what the customer needs is confidence that the operator knows there’s a problem, has identified it at a high level, and has offered an estimated time to repair.  The customer doesn’t need to know that this or that trunk failed; what they really need to know is that the failure is already being worked on (because someone else reported it or because it was identified through management tools) and that it should be fixed by a given time.

Last winter, when my area was plagued by heavy, wet snow, we had a number of protracted power outages.  My utility company offered a map (which of course I had to access via a cell site) showing where the reported outages were, and offering information on how many customers were impacted and what the current state of crew allocation was.  If I reported an outage, I could be told either that I was part of another specific outage already reported, or that my report would open a new outage record.  Yes, like anyone without power, visualizing all the food in my freezer thawing while all the humans in my unheated home froze, I was frustrated.  Not as much as I’d have been had I been unable to determine whether anyone knew I had a problem, though.

The status-and-resolution-centric view of a service is appropriate where users don’t have specific guarantees.  It enables operators to manage the infrastructure as a shared resource, which is how utility companies work.  The limited SLA means that service lifecycle automation really isn’t much of a requirement, and that providing it wouldn’t necessarily do much for opex, as long as resource-and-capacity management tools are employed from planning through network operation.

With a stringent SLA, there are two differences that have to be addressed.  First, the service user has very specific contractual guarantees, which means that the data needed to support assertions that there is or isn’t a problem has to be provided to the customer, at least on request to resolve disputes.  Second, the fact that multiple services that share resources do so with different guarantees means that it’s important to manage SLAs at the service level and to remediate and escalate issues at that same level.  You can’t rely on resource management as much, or perhaps at all.  Thus, you need low-to-zero-touch service lifecycle management.
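
To make that concrete, here’s a minimal sketch (hypothetical names, Python purely for illustration) of what per-service SLA tracking might look like: each service carries its own guarantee, violations are judged against that guarantee rather than against the shared resources, and the record doubles as the evidence a customer could request in a dispute.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SLA:
    """Contractual guarantee for one service (fields are illustrative)."""
    max_outages_per_month: int
    max_outage_minutes: float

@dataclass
class ServiceRecord:
    service_id: str
    sla: SLA
    outages: List[float] = field(default_factory=list)  # durations in minutes

    def record_outage(self, minutes: float) -> None:
        self.outages.append(minutes)

    def in_violation(self) -> bool:
        # Evaluate the guarantee at the service level, not the resource level:
        # the same shared-resource fault may violate one service's SLA and not another's.
        return (len(self.outages) > self.sla.max_outages_per_month or
                any(m > self.sla.max_outage_minutes for m in self.outages))

    def evidence(self) -> dict:
        # Data a customer could request to resolve a dispute.
        return {"service": self.service_id,
                "outage_count": len(self.outages),
                "outage_minutes": self.outages}
```

Two services sharing the same trunk could hold very different SLA objects here, which is the point: remediation and escalation decisions hang off the service record, not the resource.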

Even in this second case, the specifics of the SLA will have a lot to do with the level of service management automation required to address those customer-centric operations costs.  If you look back to the days of time-division multiplexing (TDM) services based on T1/E1, T3/E3, and SONET/SDH trunks, customers expected to get not only service-level data but facility-level data.  Trunks had things like “severely errored seconds” and “error-free seconds”, and remediation was expected at the trunk level.  Whenever we provide a “service” whose connectivity components are visible and individually guaranteed, we need to provide management visibility at least to the level of identifying facility-specific faults.  If the customer’s service includes some grooming/routing of trunks, we’d need to provide access to the tools that could do that.

Since we seem to be moving away from services made up of discrete, visible elements in favor of virtual connectivity services, might we dispense with all of this?  No, because of virtualization.  A corollary to our principle of matching service automation to service visibility is that what sits underneath the virtualization abstraction can never be made visible.  A customer doesn’t know that a virtual device, hosted in a VM or container and run on a cluster of servers, is what’s providing what they see as an edge firewall.  They know they have a firewall, and they expect to be able to see the status of that virtual device as though it were real.
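
As a sketch of that principle (the states and names are invented for illustration), the abstraction exports a single device status derived from resources the customer never sees:

```python
from enum import Enum

class Status(Enum):
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"

def virtual_device_status(hidden_resources: dict) -> Status:
    """Derive the status of the abstract device (say, an edge firewall)
    from the VMs, containers, and paths that actually implement it.
    The customer sees only the returned status, never the inputs."""
    states = list(hidden_resources.values())
    if all(s == "ok" for s in states):
        return Status.UP
    if any(s == "failed" for s in states):
        # A redundant implementation might still report DEGRADED here;
        # that policy lives inside the abstraction, not with the customer.
        return Status.DOWN
    return Status.DEGRADED

# The customer asks "how is my firewall?"; the answer hides the implementation.
print(virtual_device_status({"vm-1": "ok", "vm-2": "warning", "path-a": "ok"}))
```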

The strongest and most enduring argument for service lifecycle automation, including the elusive zero-touch automation, is virtualization.  Users cannot manage the structurally diverse resource pools associated with virtualization; it’s not possible for them to even know what is being done down there.  Even customer service people manage functions in the abstract, because functions build services.  The translation of function to infrastructure, both at the individual-function level and at the service-systemic level, has to be handled invisibly, and if that handling isn’t done efficiently, the inevitable complexity introduced by virtualization (a function on a VM on a server, connected by pathways to other functions, is more complex than a couple of functions in an appliance) will kill the business case for virtualization.

This point is both the conclusion to the “make connection services profitable” track and the launching point for the “carrier cloud” discussion.  Everything in carrier cloud, all of what it is made up of and what it’s expected to host, is a form of virtualization.  If a user is getting a TV show via wireline FTTN, 4G LTE wireless, 5G/FTTN, or 5G mobile, they are buying the experience, and they’ll need to know when, if it’s not working, it will be fixed.  If anything, carrier cloud is going to carry virtualization to many deeper levels, virtualizing not devices but devices within deployment models within service models, and so forth.  That risks further decoupling the experience being sold from the details of management.

“Carrier cloud” is a broad term that’s usually taken to include two areas: service features or functions that are hosted rather than embedded in appliances, and the resource pool needed for that hosting.  Like network infrastructure, carrier cloud is a kind of capability-in-waiting, assigned not to a specific task or service but available as an ad hoc resource to everything.  Like all such capabilities, the trick isn’t making those assignments from an established pool, but in getting the pool established in the first place.  The resources of carrier cloud are an enormous “first cost” that has to be managed and minimized.

We have a naïve assumption in the market that operators will simply capitalize carrier cloud and wait for opportunities to pour in: the “Field of Dreams” approach, named after a popular movie.  “Build it, and they will come,” the line popularly associated with the film, might work for regulated monopolies, but not for public companies that have to mind their profit-and-loss statements.  Getting carrier cloud going requires an assessment of both potential revenue and potential cost.

I’ve blogged before on the carrier cloud demand drivers: NFV, virtualization of video and advertising features, IMS/EPC/5G, network operator cloud services, contextual services, and IoT.  All of these have timelines and extents of credible influence, and the summary impact of the group would create the largest potential opportunity driver between 2019 and 2023.  However, opportunity benefits have to be offset by opportunity costs to derive an ROI estimate, and that’s what’s needed to drive carrier cloud.

Hosting economies of scale are predictable using long-established mathematical formulas (Erlang’s).  James Martin wrote a series on this decades ago, so any operator could determine the efficiency of a given resource pool at the hardware/capital level.  What they can’t determine is the profitability of carrier cloud services, because they can’t establish the operations costs; from those and their target rate of return they’d derive the selling price, and from that the market penetration and total addressable market (TAM).  If all the complexity of multi-level virtualization is exposed to “normal” operations practices, none of our carrier cloud services are likely to happen.
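
For readers who want to see the math, here’s a small sketch of the Erlang B recurrence (the textbook formula, not my market model): it shows why pool size matters, because blocking at the same utilization falls sharply as the pool grows.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability for a pool of `servers` offered `offered_load`
    Erlangs, using the standard iterative recurrence (numerically stable)."""
    b = 1.0
    for n in range(1, servers + 1):
        b = (offered_load * b) / (n + offered_load * b)
    return b

# At the same 70% utilization, larger pools block far less often;
# that's the economy of scale that makes "first cost" so critical.
for servers, load in [(10, 7.0), (100, 70.0), (1000, 700.0)]:
    print(servers, load / servers, round(erlang_b(servers, load), 4))
```

The capital side of the pool is thus easy to reason about; it’s the operations side that the formula says nothing about, which is the point of the paragraph above.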

Whether we’re talking about automation to support efficient virtualization or to support efficient customer support, it’s likely that automation and intent modeling are linked.  At the lowest level of modeling, when an abstract element contains real resources, the management goal would be to satisfy the SLA internally through automated responses to conditions, then report a fault to the higher layer if resolution wasn’t possible.  The same is true if a model contains a hierarchy of subordinate models; there may be broader-scope resolutions possible across resources—replacing something that’s failing with something in a different area or even using a different technology.  That’s a cross-model problem at one level, but a unifying higher-level model would have to recognize both the fault and the possible pathway to remediation.
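
A hypothetical sketch of that lowest level might look like the following (the structure is my own illustration, not any particular standard’s model): the element tries to satisfy its own SLA through local remediation, and only when that fails does it report a fault to its parent, which may have broader-scope options.

```python
class IntentElement:
    """One node in an intent hierarchy. It owns real resources (or child
    elements), tries to meet its SLA internally, and escalates otherwise."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.state = "operational"

    def handle_event(self, event: str) -> None:
        if self.state == "operational" and event == "resource-fault":
            self.state = "remediating"
            if self.remediate_locally():
                self.state = "operational"
            else:
                self.state = "faulted"
                self.escalate(f"{self.name}: SLA cannot be met locally")

    def remediate_locally(self) -> bool:
        # Placeholder: re-deploy, re-route, or substitute within this scope.
        return False

    def escalate(self, fault: str) -> None:
        # Report upward; a higher layer may have broader-scope options,
        # such as a different area or even a different technology.
        if self.parent:
            self.parent.handle_child_fault(self, fault)

    def handle_child_fault(self, child, fault: str) -> None:
        print(f"{self.name} sees fault from {child.name}: {fault}")

service = IntentElement("vpn-service")
access = IntentElement("access-leg", parent=service)
access.handle_event("resource-fault")   # local remediation fails, so it escalates
```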

My view is that intent-modeling hierarchies, and intent-based processing of lifecycle management steps through state/event analysis, are central to service lifecycle automation.  An intent hierarchy is critical in supporting an end-customer view (one based on the status variables exported by the highest levels of the model) while at the same time providing operations professionals with deeper information (by parsing downward through the structure).  It’s the technical piece that links the two transformation pathways.  If you model services based on intent modeling, you can improve operations efficiency by automating customer support, and you can also ensure that carrier-cloud-hosted services or service elements aren’t marginalized by exploding operations costs.
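
To illustrate that dual view (again, a hypothetical structure, not any particular standard’s model), the same hierarchy can answer the customer’s question from the exported status variables at the top while letting operations staff parse downward for detail:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelNode:
    name: str
    status: str = "ok"                       # exported status variable
    children: List["ModelNode"] = field(default_factory=list)

    def rolled_up_status(self) -> str:
        # Customer-facing answer, derived only from exported variables.
        if self.status != "ok" or any(c.rolled_up_status() != "ok" for c in self.children):
            return "degraded"
        return "ok"

    def drill_down(self, depth: int = 0) -> None:
        # Operations-facing view: parse downward through the structure.
        print("  " * depth + f"{self.name}: {self.status}")
        for c in self.children:
            c.drill_down(depth + 1)

service = ModelNode("vpn-service", children=[
    ModelNode("access-leg"),
    ModelNode("core-leg", children=[ModelNode("vnf-firewall", status="fault")]),
])
print(service.rolled_up_status())  # what the customer portal shows
service.drill_down()               # what operations staff can see
```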

We are still, as an industry, curiously quiet about model-driven service lifecycle automation, despite the fact that seminal work on the topic (largely by the TMF) is more than a decade old.  If we spent some time, even limited time, framing a good hierarchical intent-driven service model structure, we could leverage it to cover all the bases of transformation.  It might not be a sufficient condition for transformation (other things are also needed), but I think it’s clearly a necessary condition.