Another Slant on the Service Lifecycle Automation Elephant

I asked in an earlier blog whether the elephant of service automation was too big to grope.  The Light Reading 2020 conference this week raised a different slant on that old parable, which is whether you can build an elephant from the parts you’ve perceived by touching them.

Wikipedia cites the original story of the elephant as:

A group of blind men heard that a strange animal, called an elephant, had been brought to the town, but none of them were aware of its shape and form. Out of curiosity, they said: “We must inspect and know it by touch, of which we are capable”. So, they sought it out, and when they found it they groped about it. In the case of the first person, whose hand landed on the trunk, said “This being is like a thick snake”. For another one whose hand reached its ear, it seemed like a kind of fan. As for another person, whose hand was upon its leg, said, the elephant is a pillar like a tree-trunk. The blind man who placed his hand upon its side said, “elephant is a wall”. Another who felt its tail, described it as a rope. The last felt its tusk, stating the elephant is that which is hard, smooth and like a spear.

My original point was that it’s possible to envision an elephant/problem that’s just too big to be grasped (or groped!) in pieces.  The question raised by the Light Reading piece is whether you can assemble an elephant from what you thought the pieces were.  Snakes and fans and walls and spears don’t seem likely to provide the right tools for elephant-assembly, right?  Might it be that the tools of our current transformation process don’t add up to the whole at all?

Most of the key initiatives to support transformation of the network operator business model have been deliberately localized in terms of scope.  SDN and NFV, for example, didn’t cover the operations side at all; it was out of scope from the first.  Operations is also out of scope for 5G and IoT, and both of these initiatives talk about SDN and NFV but acknowledge that the standards for them are created elsewhere.

Like the blind men in the Wikipedia version of the elephant story, the standards bodies are dealing with what’s in scope for them, and that has totally fragmented the process of transforming things.  This fragmentation isn’t critical when the standards cover an area fully enough to guide and justify deployment, but remember that “transformation” is a very broad goal.  Addressing it will require very broad initiatives, and we have none of them today.

If the whole of transformation can’t be created by summing the parts of initiatives we’ve undertaken, can we really expect to get there?  I’ve been involved in three international standards activities aimed at some part of transformation, and indirectly associated with a fourth.  I’ve seen three very specific problems that fragmentation of the standards initiatives has created, and any one of them could compromise our goals.

The first problem is the classical “dropped ball” problem.  For example, if the NFV ISG decided that operations is out of scope, how does the body ensure that any specific operations impacts of its activity are met by somebody?  The classic standards-group answer is “liaisons” between the groups, but we’re still seeing liaison requests submitted and approved by the ISG four years after the process started.

What we’re lacking to address this problem effectively is a single vision of the parts that have to add up to our glorious transformation whole.  Not the details, just the identification of the total requirements set and how that set is divided up among the bodies doing the work.  This could, of course, guide liaison by identifying which relationships across the various groups are essential.  It could also bring to the fore the areas where Group A’s work can be expected to heavily impact Group B, showing where harmonization needs special attention.

There’s nowhere this is more obvious than in the relationship between NFV and the cloud.  What is a VNF, if not a network application of cloud computing principles?  We were doing cloud computing before NFV ever started, and logically should have used cloud computing standards as the basis for NFV.  I firmly believe (and have believed from the first) that the logical way to do NFV was to presume that it was a lightweight organizing layer on top of cloud standards.  That’s not how it’s developed.
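
To make that concrete, here’s a minimal sketch, in Python, of what a “lightweight organizing layer” might look like.  Everything in it (the CloudOrchestrator stand-in, the NfvLayer wrapper, the VNF record) is a hypothetical illustration of the architectural point, not any real NFV or cloud API: the NFV-specific logic is confined to a thin layer, and the actual deployment work is delegated to what the cloud already provides.

```python
# Hypothetical sketch: NFV as a thin organizing layer over generic cloud
# deployment.  All names here are illustrative assumptions, not drawn
# from the ETSI NFV specifications or any real cloud platform API.

from dataclasses import dataclass

@dataclass
class VNF:
    name: str          # e.g., "firewall"
    image: str         # the cloud artifact that implements the function
    connections: list  # network attachments the function needs

class CloudOrchestrator:
    """Stand-in for what the cloud already provides (a container or VM
    scheduler, say).  The NFV layer would consume this, not rebuild it."""
    def deploy(self, image: str, networks: list) -> str:
        print(f"cloud: deploying {image} attached to {networks}")
        return f"instance-of-{image}"

class NfvLayer:
    """The NFV-specific part: map a network function onto a generic
    cloud deployment, keeping only the network-service bookkeeping."""
    def __init__(self, cloud: CloudOrchestrator):
        self.cloud = cloud
        self.service_map = {}  # VNF name -> cloud instance id

    def instantiate(self, vnf: VNF) -> str:
        instance = self.cloud.deploy(vnf.image, vnf.connections)
        self.service_map[vnf.name] = instance
        return instance

nfv = NfvLayer(CloudOrchestrator())
nfv.instantiate(VNF("firewall", "fw-image-v1", ["wan", "lan"]))
```

The design point is that the cloud side of the interface is a consumer relationship; NFV would inherit cloud standards rather than re-specify deployment.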

The second problem is the “Columbus problem”.  If you start off looking for a route to the East and run into an entirely new continent instead, how long does it take for you to realize that your original mission has resulted in myopia, and that your basic premise was totally wrong?

We have that problem today in the way we’re looking at network transformation.  Anyone who looks at the way operator costs are divided, or who talks with operators about where benefits of new technologies would have to come from, knows that simple substitution of a virtual instance of a function (a VNF) for a device (a “physical network function” or PNF) isn’t going to save enough.  In 2013, most of the operators who signed off on the original Call for Action admitted they could get at least that level of savings by “beating Huawei up on price”.  They needed opex reduction and new service agility to do the job, and yet this realization didn’t impact the scope of the work.

The final problem, perhaps the most insidious, is the “toolbox problem.”  You start a project with a specific notion of what you’re going to do.  You have the tools to do it in your toolbox.  You find some unexpected things, and at first you can make do with your toolbox contents, but the unexpected keeps happening, and eventually you realize that you don’t have what you need at all.  I’ve done a lot of runs to Lowe’s during do-it-yourself projects, so I know this problem well.

The current example of this problem is the notion of orchestration and automation.  You can perform a specific task, like deploying a VNF, with a script that lets you specify some variable parameters.  But then you have to be able to respond to changes and problems, so you need the notion of “events”, which means event-handling.  Then you increase the number of different things that make up a given service or application, so the complexity of the web of elements increases, and so does the number of events.  If you started off thinking that you had a simple model-interpreter as the basis for your orchestration, you now find that it can’t scale to large, event-dense situations.  If you’d expected them from the start, you’d have designed your toolbox differently.
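
A sketch may help show the difference between the script view and the event view.  In the table-driven approach below, each modeled element carries a state/event table, so “what happens next” is data to be interpreted rather than logic frozen into a script.  The states, events, and process names are illustrative assumptions, not drawn from any specific standard.

```python
# Hypothetical sketch of table-driven event handling for a service element.
# (current state, event) maps to (next state, process to run); the table
# entries are illustrative, not from any real orchestration specification.

STATE_EVENT_TABLE = {
    ("ordered",   "deploy"): ("deploying", "start_deployment"),
    ("deploying", "ready"):  ("active",    "report_active"),
    ("active",    "fault"):  ("repairing", "start_repair"),
    ("repairing", "ready"):  ("active",    "report_active"),
}

class ServiceElement:
    def __init__(self, name):
        self.name = name
        self.state = "ordered"

    def handle(self, event):
        """Interpret the table: look up the transition, run it, move on."""
        key = (self.state, event)
        if key not in STATE_EVENT_TABLE:
            print(f"{self.name}: event '{event}' ignored in state '{self.state}'")
            return
        next_state, process = STATE_EVENT_TABLE[key]
        print(f"{self.name}: {self.state} --{event}--> {next_state} (run {process})")
        self.state = next_state

fw = ServiceElement("firewall")
fw.handle("deploy")
fw.handle("ready")
fw.handle("fault")
fw.handle("ready")
```

Because the table is data, adding a new state or event means extending the table, not rewriting the interpreter, and that’s the kind of property you’d design in from the start if you expected event density to grow.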

Architecturally speaking, everything we do in service lifecycle processing should be a fully scalable microservice.  Every process should scale with the complexity of the service we’re creating and selling, and the process of coordinating all the pieces through exchange of events should be structured so that you can still fit in a new/replacement piece without having to manually synchronize the behavior of the new element or the system as a whole.  That’s what the goals for service lifecycle automation, zero-touch automation, closed-loop automation, or whatever you want to call it, demand.  We didn’t demand it, and in many cases still aren’t demanding it.
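
Here’s a hedged sketch of what that property means in practice.  If all service state lives in the model (a shared dict stands in for a durable model repository below; the repository, event format, and state names are all assumptions for illustration), the lifecycle processes themselves are stateless, so any instance can handle any event, and instances can be scaled or replaced without manual synchronization.

```python
# Hypothetical sketch: stateless lifecycle workers with all state
# externalized into a shared service model.  The repository, event
# format, and state names are illustrative assumptions only.

MODEL_REPOSITORY = {  # stands in for a shared, durable model store
    "service-42": {"state": "active"},
}

def lifecycle_worker(worker_id, event):
    """Any worker instance can process any event: it reads the model,
    acts, and writes the model back.  No worker-local state survives."""
    service = MODEL_REPOSITORY[event["service"]]
    print(f"worker-{worker_id}: {event['service']} in state "
          f"{service['state']}, handling {event['type']}")
    if event["type"] == "fault" and service["state"] == "active":
        service["state"] = "repairing"
    elif event["type"] == "ready" and service["state"] == "repairing":
        service["state"] = "active"

# Two different "instances" handle successive events for the same service;
# because state is externalized, the outcome doesn't depend on which ran.
lifecycle_worker(1, {"service": "service-42", "type": "fault"})
lifecycle_worker(2, {"service": "service-42", "type": "ready"})
```

Since the outcome doesn’t depend on which instance ran, a new or replacement process can be dropped in without synchronizing it against the rest of the system, which is exactly what the paragraph above demands.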

None of these problems are impossible to solve; some are already being solved by some implementations.  But because we’ve not placed much value on these issues, vendors have paid little attention to explaining how they address them.  Buyers don’t know who does and who doesn’t, which reduces the benefit of doing the right thing.

We also need to take steps in both the standards area and the transformation-related open-source projects to stop these kinds of issues from developing or worsening.  Some sort of top-down, benefit-to-project mapping would be a good start: an initiative that begins with what operators expect from transformation and aligns those expectations with specific steps and architectural principles.  This wouldn’t be difficult, but it might then be hard to get the various standards bodies to accept the results.  We could try, though, and should.  Couldn’t some major standards group or open-source activity, or even a credible vendor or publication, step up and announce something along these lines?

Nobody wants to commit to things that make their work more complicated.  Nobody likes complexity, but if you set about a complex process with complex goals, then complexity you will have, eventually.  If you face that from the first, you can manage things at least as well as the goals you’ve set permit.  If you don’t, expect to make a lot of trips to Lowe’s as you try to assemble your elephant, and they probably won’t have the right parts.