Why do automation projects, meaning lifecycle automation projects, fail? A long-time LinkedIn contact of mine, Stefan Walraven, did a nice video on the topic. It’s worth discussing not only in the context of lifecycle automation, a big interest of mine, but also in the context of any “transformational” project, including virtualization, disaggregation, white boxes, and so forth. I want to amplify some of Stefan’s points and gently supplement a few, but I endorse this video for anyone in the business of changing networks.
Automation is a must, but also a disappointment. That’s probably the key message, and the key truth, in the automation story. Why is this? That’s the question Stefan wants to address. He starts with a cute and effective analogy on risk posture, the “Sumo” versus “Kung Fu”.
The Sumo warrior takes a posture to control and resist change, to stay balanced, to move deliberately but with power. This is clearly the conservative/defensive mindset that’s often attributed to the network operators. In contrast, the Kung Fu warrior uses change to advantage, responding to pressure by moving to take a more powerful position. Speed, accuracy, and agility are the play.
Stefan says that you have to handle change as a Kung Fu warrior, because automation is really about change (since we’re not automated now). You must take one position or the other, so you have to take the change-centric one. But in the exercise of your agility, you have to confront the yin and yang of automation: on one hand you want a reasonable time to market (TTM) for your solution, and on the other you want total stability in the result. Here, I’d offer another point I think is related. Time to market can induce you to pick an approach that doesn’t extend far enough to deliver in the long run. Stefan says too much emphasis is often placed on TTM, but not for this reason.
His next point is the bottom-up versus top-down mindset, and here Stefan introduces the risk that a bottom-up approach, which is usually the result of a TTM focus, would create a jumble of things that don’t go together, a “tangle of local automation” to paraphrase the slide. The problem with the top-down approach is that you can get so carried away with specification and design that you never get started at all.
Many, including me, would say that somehow the network operators have managed to get the worst of both worlds. They almost always take a very bottom-up approach, but they get so carried away with specification—a risk of the top-down approach—that they reach a conclusion only after nobody cares anymore. Harmonizing that seeming contradiction is the next point, the “ecosystem” in which the operators do their projects.
The recommendation that Stefan makes regarding the ecosystem or the external relationships that influence operator automation projects is to avoid technology choices driven by external opinions. That’s probably good advice if you can take it, but that may not be possible because of the reality of operator culture and technology experience. Not to mention that operators themselves are often guilty of spawning the most influential of those external relationships, the standards activities.
Standards have been a big part of operator behavior for literally decades. The International Telegraph and Telephone Consultative Committee (CCITT) isn’t heard of much these days, but this group of standards-writers, drawn from the operators themselves, was the final word up to about 25 years ago. In the days before deregulation of telecom, in fact, the CCITT rules had force of law in countries where the operators were part of government postal, telephone, and telegraph (PTT) bodies. The CCITT gave us standards like international packet switching (X.25), so its work has in fact extended to data protocol standards.
Operators still see standards as being a kind of touchstone to protect them through change. They have to slip out of Sumo into Kung Fu, and they feel threatened, so they launch a standards initiative to prepare a nice set of Sumo positions that the Kung Fu change process can then hunker down into when the change has been accomplished.
It’s here that Stefan makes his strongest point. This is your business, operators, your ecosystem. If you don’t know automation, you need to learn it, and learn it fast. But while this is the strongest point, it’s also a vulnerable point, because the future of networking is so complex, and so different from the past, that learning it from scratch without reliance on some of those external ecosystem elements Stefan cites is almost surely impossible.
Now, Stefan starts with some points or rules, each of which is highly interesting and worthy of discussion. The first is that “We’re in this together”. There is no middle ground. All of the stakeholders, the members of the ecosystem, have to be integrated into the vision. Thus, you need to build cross-functional, cross-organizational teams. The members of this team, and any partners selected, have to be Kung Fu masters as well as people who know their areas and can get things done.
The second point is “Fierce automation per service”, which means one service (business case) at a time, and it’s here that I must gently suggest that there’s a missing step. Let me address that, then pick up with Stefan at this same point.
My “missing step” here is that there is no chance of harmonizing per-service automation, or even completing it effectively, if you don’t establish some architecture first. That architecture is framed by a set of technology elements that are currently available, widely used, and have direct capabilities in the specific areas where the automation project must focus. I believe that the public cloud technology base, the concepts of intent modeling, and the principles of TMF NGOSS Contract, are the minimum inventory of these tools. I’ve blogged a lot about each of these, so I’m not going to go into detail about them again here. Instead, I’ll rejoin Stefan.
Fierce automation per service, if done within an architectural framework that can address automation in general and leverage available technology, is indeed a very smart concept. Pick something that completely exercises your proposed solution, and preferably something that’s fairly contained and that can be done quickly without collateral impacts on other areas. Learn from that process, refine your architecture, and then proceed. Stefan rightfully suggests top-down design for this first service (and others, of course).
Using what’s learned from this first automation exercise and using the software architecture that’s been created and refined, you now enter Stefan’s “Brutally automate everything” phase. His point here is also a key one: most projects will fail because the scope of automation leaves too many loose ends, and manually addressing all these disconnected little leftovers will surely require an enormous human effort, enough to create a project failure.
In Stefan’s third point, he defines the “Kung Fu automation toolbox”, which is a good name for what I’ve just introduced here as a step to be taken before his second point. I think this reflects the difference between my own view of this and Stefan’s view. He’s framing a software architect’s vision of automation because he’s a software architect from a software organization, and so he intuitively uses this framework because it’s proved to be the right approach in cloud projects. The problem is that the operators don’t have that background, and so they’re going to have to establish this toolbox, pick the right elements, and understand both the choices and the way they’re applied to the problems of automation.
This isn’t just a semantic debate or a quibble over the order of steps. The reason why not only automation projects but virtualization projects and other “transformation” projects have been failing for operators is that they don’t know what the toolkit is. Build structures with mud bricks for generations, and when you’re confronted with the need to build a bridge, there’s a lot of inertia driving a mud-brick selection. Operators are box people, so if you don’t give them an explicit not-box toolkit, they’ll build whatever they’re building with box concepts.
I think the problem that operators have with automation, as well as with function virtualization and other transformation pathways, is that they don’t know anything about where they’re going in terms of basic tools. Not-box isn’t an affirmative choice, or even a specific one. What is the new model of networks that replaces the box model? I tried to describe this in my ExperiaSphere work, and there are other initiatives using intent models and state/event processing, but whatever approach you like has to be an explicit one. We need an architecture for the way networks are built in the age of the cloud and lifecycle automation.
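To make “explicit” a little more concrete, here is a minimal Python sketch of what an intent model combined with state/event processing might look like. It’s an illustration under assumptions, not the ExperiaSphere design or any TMF specification: the names IntentElement, STATE_EVENT_TABLE, and handle, along with the states and events, are all hypothetical and chosen only to show the shape of the idea. The element says what it promises rather than how it’s built, and every lifecycle event is steered to a process by the element’s current state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IntentElement:
    """A hypothetical intent-modeled element: it exposes what it promises
    (the intent), and hides how that promise is implemented."""
    name: str
    intent: Dict[str, object]
    state: str = "ordered"
    log: List[str] = field(default_factory=list)


# Each handler does some work and returns the element's next state.
Handler = Callable[[IntentElement], str]


def start_deploy(element: IntentElement) -> str:
    element.log.append("deploying")
    return "activating"


def confirm(element: IntentElement) -> str:
    return "active"


def repair(element: IntentElement) -> str:
    element.log.append("redeploying")
    return "activating"


# The state/event table: (current state, event) -> process to run.
# These states and events are illustrative assumptions only.
STATE_EVENT_TABLE: Dict[Tuple[str, str], Handler] = {
    ("ordered", "activate"): start_deploy,
    ("activating", "deployed"): confirm,
    ("active", "fault"): repair,
}


def handle(element: IntentElement, event: str) -> str:
    """Steer an event to the process the table names for the current state."""
    handler = STATE_EVENT_TABLE.get((element.state, event))
    if handler is None:
        # Events with no table entry are recorded, never handled ad hoc.
        element.log.append(f"ignored {event} in {element.state}")
        return element.state
    element.state = handler(element)
    return element.state


if __name__ == "__main__":
    vpn = IntentElement("vpn-access", {"bandwidth_mbps": 100})
    for event in ["activate", "deployed", "fault", "deployed"]:
        print(event, "->", handle(vpn, event))
```

The point of the table is that nothing about lifecycle handling lives in a box or in somebody’s head. Any loose end shows up as a missing (state, event) entry, which is something you can see and fix.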
Stefan is describing a basic approach, and one that could surely be refined and adopted. If any such approach, any such toolkit, is in fact specified and used, then all his steps make perfect sense and would result in a successful project. The key is to get that toolkit, and the biggest mistake we’ve made in transformation and automation is trying to move forward without it.