Is Service Lifecycle Management Too Big a Problem for Orchestration to Solve?

Everyone has probably heard the old joke about a person reaching behind a curtain and trying to identify an elephant by touching various parts.  The moral is that sometimes part-identification gets in the way of recognizing the whole.  That raises what I think is an interesting question for our industry in achieving the transformation goals everyone has articulated.  Has our elephant gotten too big to grope, at least in any traditional way?  Is the minimum solution operators need beyond the maximum potential of the tools we’re committed to?

The steady decline in revenue per bit, first reported by operators more than five years ago, has reached the critical point for many.  Light Reading did a nice piece on DT’s cost issues, and it makes two important points.  First, operators need to address the convergence of cost and price per bit quickly, more quickly than credible new service revenue plans could be realized.  That leaves operators with only the option of targeting costs, near-term.  Second, operator initiatives to address costs have proven very complex because many of their costs aren’t service- or even technology-specific.  They can push their arms behind the curtain and grab something, but it’s too small a piece to deal with the glorious whole.

This is an interesting insight because it may explain why so many of our current technologies are under-realizing their expected impacts.  What operators have been seeking goes back a long way, about ten years, and the term we use for it today is “zero-touch automation”, which I’ve been calling “service lifecycle management automation” to reflect a bit more directly what people are trying to automate.  Here, “zero touch” means what it says: the elimination of costly, error-prone human processes and their replacement with automated tools.

Like SDN and NFV?  Probably not.  Neither SDN nor NFV by itself addresses service lifecycle automation fully; each addresses only the substitution of one technical element for another.  Putting that in elephant terms, what we’ve been trying to do is apply what we learned from a single touch of some elephant part to the broad problem of dealing with the beast as a whole.  SDN and NFV are just too narrow as technologies to do that.

The next thing we tried was to apply some of the technology-specific automation strategies that emerged from SDN and NFV to that broader problem.  Orchestration in the NFV form of “MANO” (Management and Orchestration) was a critical insight of the NFV ISG, but the big question is whether the approach to automation that MANO takes can be broadened to address the whole of operators’ cost targets, which is what “zero touch” really implies.  If you touch an elephant’s toe, you can manicure it, but can you learn enough from that to care for the whole beast?

Simple scripting, meaning the recording of the steps needed to do something so they can be repeated consistently, isn’t enough here; there are too many steps and combinations.  That is what has already led cloud DevOps automation toward an intent-modeled, event-driven approach.  But now we have to ask whether even that is enough.  The problem is interdependence.  With intent-and-event systems, the individual processes are modeled and their lifecycle progression is synchronized by events.  The broader the set of processes you target, the more interdependent cycles you create, and the more combinations of conditions you are forced to address.  At some point, it becomes very difficult to visualize all the possible scenarios.
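To make that combinational point concrete, here is a minimal sketch of an intent-modeled, event-driven element in Python.  The states, events, and class name are my own illustrative inventions, not drawn from any standard or product; the point is simply that the lifecycle logic lives in a state/event table, and every element or cross-element event you add multiplies the entries somebody has to get right.

```python
from enum import Enum, auto

class State(Enum):
    ORDERED = auto()
    DEPLOYING = auto()
    ACTIVE = auto()
    FAULT = auto()

class AccessElement:
    """One intent-modeled element: (state, event) pairs map to handlers."""
    def __init__(self):
        self.state = State.ORDERED
        self.table = {
            (State.ORDERED, "activate"): self.deploy,
            (State.DEPLOYING, "deploy_ok"): self.go_active,
            (State.DEPLOYING, "deploy_fail"): self.fault,
            (State.ACTIVE, "link_down"): self.fault,
            (State.FAULT, "repaired"): self.deploy,
        }

    def handle(self, event: str):
        handler = self.table.get((self.state, event))
        if handler:
            handler()
        # Every (state, event) pair not listed here is still a scenario someone
        # has to think about, and that set grows fast as elements are added.

    def deploy(self):
        self.state = State.DEPLOYING

    def go_active(self):
        self.state = State.ACTIVE

    def fault(self):
        self.state = State.FAULT

element = AccessElement()
element.handle("activate")    # ORDERED -> DEPLOYING
element.handle("deploy_ok")   # DEPLOYING -> ACTIVE
```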

MANO orchestration has a simple, highly compartmentalized goal of deploying virtual functions.  Once deployed, it leaves the management of those functions to traditional processes.  It doesn’t try to orchestrate OSS/BSS elements or human tasks, and if you add in those things you create the interdependence problem.  You can visualize a service deployment as being access deployment plus service core deployment, which is a hierarchical relationship that’s fairly easy to model and orchestrate.  When you add in fault reports, journaling for billing, human tasks to modernize wiring, and all manner of other things, you not only add elements, you add relationships.  At some point you have more of a mesh than a hierarchy, and that level of interdependence is very difficult to model using any of the current tools.  Many can’t even model manual processes, and we’re going to have those in service lifecycle management until wires can crawl through conduits on their own.
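Here is a hedged sketch of that hierarchical case, again with invented names: the service’s state is simply rolled up from its access and core children.  The trouble described above starts when you add billing journals, fault reports, and truck rolls, whose events cross branches rather than rolling neatly up one tree.

```python
class IntentModel:
    """Parent node: forwards events down, derives its state from its children."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.state = "ordered"

    def handle(self, event):
        for child in self.children:
            child.handle(event)
        self.state = self.derive_state()

    def derive_state(self):
        if any(c.state == "fault" for c in self.children):
            return "degraded"
        if all(c.state == "active" for c in self.children):
            return "active"
        return "deploying"

class LeafElement(IntentModel):
    """Leaf node: actually reacts to deployment and fault events."""
    def handle(self, event):
        if event == "deploy":
            self.state = "active"
        elif event == "fault":
            self.state = "fault"

service = IntentModel("business-vpn",
                      children=[LeafElement("access-deployment"),
                                LeafElement("core-deployment")])
service.handle("deploy")
print(service.state)   # "active": a clean hierarchy rolls up easily
# Billing journaling, fault tickets, and manual work orders would need events
# that flow *between* branches, and this simple roll-up no longer describes them.
```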

What I am seeing is a growing realization that the problem of zero-touch is really, at the technical level, more like business process management (BPM) than it is about “orchestration” per se.  No matter how you manage the service lifecycle, sticking with the technical processes of deployment, redeployment, and changes will limit your ability to address the full range of operations costs.  BPM attempts to first model business processes and then automate them, which means it focuses directly on processes, and therefore directly on costs, since processes are what incur them.

What we can’t do is adopt the more-or-less traditional BPM approaches, based on things like service buses or SOA (service-oriented architecture) interfaces that have a high overhead.  These are way too inefficient to permit fast passing of large numbers of events, and complex systems generate events in volume.  Buses and SOA are better for linear workflows, and while the initial deployment of services could look like that, ongoing failure responses are surely not going to even remotely resemble old-fashioned transactions.

How about intent modeling?  In theory, an intent model could envelop anything.  We already know you can wrap software components like virtual network functions (VNFs) and SDN in intent models, and you can also wrap the management APIs of network and IT management systems.  There is no theoretical reason you can’t wrap a manual process in an intent model too.  Visualize an intent model for “Deploy CPE” which generates a shipping order to send something to the user, or a work order to dispatch a tech, or both.  The model could enter the “completed” state when a network signal/event is received to show the thing you sent has been connected properly.  If each model element is implemented as a microservice, the whole structure can be made more efficient.
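Purely as an illustration of that “Deploy CPE” idea (the class, the events, and the print statements standing in for order and workforce systems are all hypothetical), the manual work lives inside the model, and only the network’s own connection event closes it out:

```python
class DeployCPE:
    """Intent model wrapping a partly manual process."""
    def __init__(self, ship_to_user=True, dispatch_tech=False):
        self.state = "ordered"
        self.ship_to_user = ship_to_user
        self.dispatch_tech = dispatch_tech

    def activate(self):
        # "Executing" this intent means issuing orders to people, not to software.
        if self.ship_to_user:
            print("shipping order issued for CPE")        # stand-in for an order system
        if self.dispatch_tech:
            print("work order issued for a field tech")   # stand-in for a workforce system
        self.state = "in-progress"

    def handle_event(self, event):
        # Only the network's signal that the CPE is connected completes the
        # model; the human steps stay hidden inside it.
        if event == "cpe-connected" and self.state == "in-progress":
            self.state = "completed"
            print("CPE deployment complete")

cpe = DeployCPE(ship_to_user=True, dispatch_tech=True)
cpe.activate()
cpe.handle_event("cpe-connected")
```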

This seems to be a necessary condition for true zero-touch automation, particularly given that even if you eventually intend to automate a lot of stuff, it won’t be done all at once.  Even non-service-specific processes may still have to be automated on a per-service basis to avoid creating transformational chaos.  Some tasks may never be automated; humans still have to do many things in response to problems because boards don’t pull themselves.

It’s probably not a sufficient condition, though.  As I noted above, the more interdependent things you have in a given process model, the harder it is to synchronize the behavior of the system using traditional state/event mechanisms.  Even making it more efficient in execution won’t make it scalable.  I’m comfortable that the great majority of service deployments, at the technical level, could be automated using state/event logic, but I’m a lot less comfortable, frankly uncomfortable, saying that all the related manual processes could be synchronized as well.  Without synchronizing those broader processes, you miss too much cost-reduction opportunity and you risk having human processes get out of step with your automation.

This is a bigger problem than it’s appeared to be to most, including me.  We’re going to need bigger solutions, and if there’s anything the last five years have taught me, it’s that we’re not going to get them from inside the telecom industry.  We have to go outside, to the broader world, because once you get past the purchasing and maintenance of equipment and some specific service-related stuff, business is business.  Most of the costs telcos need to wring out are business costs, not network costs.  To mix metaphors here, we’re not only shooting behind the duck with SDN and NFV, we’re shooting at the wrong duck.

I’ve said for some time that we need to think of NFV, orchestration, and lifecycle automation more in terms of cloud processes than specialized network processes, and I think the evolving cost-reduction goals of operators reinforce this point.  If zero-touch automation is really an application of BPM to networking businesses, then we need to start treating it that way, and working to utilize BPM and cloud-BPM tools to achieve our goals.