How Do We Ensure that Zero-Touch Automation Actually Touches Enough?

The more you do, the more it helps…and costs.  That’s the challenge of scope for zero-touch automation in a nutshell.  The challenge is especially acute when you consider that the term itself could be applied in many ways.  What exactly are we zero-touch automating?  It’s that question, the question of scope, that determines the impact that a solution can have, the disruption it will cause, and the cost it might generate.

The scope issue has to start somewhere, no matter how vague its boundaries might become, and for applications and services scope varies along two dimensions.  The first dimension is what could be called the business versus technology dimension, and the second the resource relationship dimension.

Both services and applications do something in a functional sense, and run on something in a technical sense.  The value, both intrinsic and commercial, depends on the functional side, and the operational requirements for the resources fall on the technical side.  With network services (the main focus of my discussions here), the business side for operators is represented by the operations support and business support systems (OSS/BSS), and the technical side by network management systems (NMS) and network operations personnel and processes (the “network operations center” or NOC).

The business/functional versus operational/technology dimension also exists in applications, where developers or software selection professionals working with line departments decide what to use, and operations center personnel decide how to run it.  The origin of the concept of “DevOps” or Developer/Operations tools lies in this relationship.

The resource relationship dimension focuses on how stuff gets managed.  Services and applications have historically been managed, in lifecycle terms, in one of two specific ways.  The first is the explicit-linked resource model, where resources are specifically linked to services or applications, and the state of those services or applications is thus dependent on the state of the specific resources assigned.  Old TDM services worked this way.  The second is the shared resource model, which says that resources are assigned from a pool, shared among services or applications.  Packet networks share capacity and resources, so that’s the traditional network model.

A third model has recently emerged with virtualization and the cloud.  The pooled resource model differs from the shared resource model in that resources needed for services or applications are allocated from a pool, one that includes virtualized resources hosted on real servers.  In effect, it creates a hybrid between the two earlier approaches.  There’s real stuff underneath, and virtual stuff above it.
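To make the distinction concrete, here’s a minimal Python sketch of the three resource relationship models.  The class and field names are my own hypothetical illustrations, not drawn from any standard or product, and the placement logic is deliberately naive.

```python
# Illustrative only: a minimal sketch of the three resource relationship
# models described above; class and field names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class RealResource:
    name: str
    healthy: bool = True


# 1. Explicit-linked: the service owns specific resources, so its state
#    is simply the state of what was assigned (old TDM services).
@dataclass
class ExplicitLinkedService:
    assigned: List[RealResource]

    @property
    def up(self) -> bool:
        return all(r.healthy for r in self.assigned)


# 2. Shared: services draw on capacity shared with other services
#    (the traditional packet-network model).
@dataclass
class SharedPool:
    members: List[RealResource]

    def usable_capacity(self) -> int:
        return sum(1 for r in self.members if r.healthy)


# 3. Pooled/virtual: a virtual resource is allocated from the pool and
#    hosted on a real server, a hybrid of the two earlier approaches:
#    real stuff underneath, virtual stuff on top.
@dataclass
class VirtualResource:
    service_facing_id: str     # what the service layer sees
    host: RealResource         # the real server underneath


def allocate_virtual(pool: SharedPool, vid: str) -> VirtualResource:
    host = next(r for r in pool.members if r.healthy)  # naive placement
    return VirtualResource(service_facing_id=vid, host=host)
```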

This real/virtual boundary is what’s actually at the heart of the zero-touch scope debate.  A “virtual” resource looks real from above and looks like an application from below.  When SDN and NFV came along, they created explicit virtualization but not explicit virtualization management practices.  The biggest problem that the notion of zero-touch automation faces is the fact that we have this new virtual layer that divides the first of our dimensions as well as the second.  How do you manage virtual resources, and how do they relate to the services/applications above and the resources below that they separate?
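A rough sketch of that dual-facing character, using hypothetical interfaces of my own (none of this is an SDN, NFV, or ONAP API): the same virtual element presents a resource-like management face upward while being deployed like an application below.

```python
# Sketch of a virtual resource's two faces, with hypothetical interfaces.
from abc import ABC, abstractmethod
from typing import Optional


class ManagedResource(ABC):
    """What the service layer sees: looks 'real' from above."""
    @abstractmethod
    def status(self) -> str: ...


class HostedApplication(ABC):
    """What the infrastructure sees: looks like an application from below."""
    @abstractmethod
    def deploy(self, host: str) -> None: ...


class VirtualRouter(ManagedResource, HostedApplication):
    def __init__(self) -> None:
        self.host: Optional[str] = None

    # Northbound face: answers management queries like a device would.
    def status(self) -> str:
        return "up" if self.host else "down"

    # Southbound face: lifecycle operation against real hosting.
    def deploy(self, host: str) -> None:
        self.host = host


vr = VirtualRouter()
vr.deploy("server-7")      # managed like an application below...
print(vr.status())         # ...queried like a resource above
```

The open question the sketch leaves hanging is exactly the one in the paragraph above: which management domain owns that object, the service side or the resource side?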

The reason this is important to the scope issue is that since virtualization has to be somehow accommodated, there’s been a tendency to focus on managing it rather than managing everything.  Some zero-touch proponents are really interested exclusively in managing virtualization, some in managing virtualization and services, and some in virtualization and resources.  The problem is that if you want to manage, in zero-touch terms, the glorious whole, you end up having to address a lot of stuff that’s already been addressed, but not in the virtual-world environment in which we currently live.  This is where the ONAP/ECOMP initiative comes in.

It’s always been my view that AT&T got to its ECOMP platform because its Domain 2.0 architecture needed a software operationalization framework to live in, given that different vendors made up different pieces/zones of the same infrastructure.  That mission makes ECOMP a fairly complete management tool, and if ECOMP can deliver on zero-touch automation, it can fully address the dimensions of business, function, services, applications, virtualization, and resources.  In my view, zero-touch automation is meaningless if it only limits what you have to touch a bit.  It has to limit it a lot.

There are two corollaries to this point.  First, ECOMP needs to explicitly address both zero-touch automation and all the pieces and dimensions I’ve listed.  If it doesn’t, then it’s just another partial solution that may not get complete backing from operators.  Second, if you think you have a better strategy than ECOMP, you’d better have full coverage of all those areas or you are offering a partial solution.  That means that anyone who wants to sell into network operator infrastructure or management has to make it their mission to fit into ECOMP, which means ONAP today.

What about the ETSI zero-touch activity?  Well, the only thing that separates them from becoming another science project in the standards space is a decision to specifically target their strategies to the enhancement of ONAP/ECOMP.  Might the reverse work?  In theory, but the problem is that it makes no sense to link a currently deployable solution framework to a long-term standards process and wait for happy convergence.  Make what can be done now the reference, and let the long-term process define an optimal evolution—in conjunction with the ONAP people.

ONAP/ECOMP offers a good blueprint for full-scope management, but it still needs work in explicitly defining management practices related to virtualization.  ONAP/ECOMP effectively manages virtual resources and admits real resources into the same management process set, making all resources virtual.  Since OSS/BSS systems have typically treated services as the product of “virtual god boxes” that provided all the necessary functionality, this marries neatly with OSS/BSS processes.  However, ONAP/ECOMP doesn’t orchestrate those OSS/BSS processes.  That’s not necessarily a horrible problem given that OSS/BSS intervention into operations events can be limited to situations where billing and customer care are impacted, but it could bite the process later on.
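As a hedged illustration of that division of labor, here’s a sketch of an event-routing rule in which lifecycle automation always remediates and OSS/BSS is notified only when billing or customer care is actually affected.  The event fields and handler are assumptions for illustration, not ONAP/ECOMP or OSS/BSS interfaces.

```python
# Illustrative event routing: automation always acts; OSS/BSS is only
# involved when billing or customer care is impacted.
from dataclasses import dataclass


@dataclass
class OpsEvent:
    service_id: str
    kind: str                  # e.g. "resource-fault", "sla-violation"
    customer_visible: bool     # did the customer-facing service degrade?
    billing_impact: bool       # e.g. an outage long enough to owe credits


def handle_event(event: OpsEvent) -> str:
    # The orchestration layer always reacts (redeploy, reroute, scale...).
    action = f"orchestrator: remediate {event.kind} on {event.service_id}"
    # OSS/BSS is pulled in only when billing or customer care is touched.
    if event.billing_impact or event.customer_visible:
        action += "; notify OSS/BSS"
    return action


print(handle_event(OpsEvent("svc-42", "resource-fault", False, False)))
print(handle_event(OpsEvent("svc-42", "sla-violation", True, True)))
```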

The ONAP marriage is new, and still evolving, of course.  ONAP is hosted by the Linux Foundation, which has just established a combined administrative structure (see HERE) to manage their collective networking projects.  I’m hoping that this reflects a longer, broader vision of networking; it certainly encompasses a lot more of the important stuff, including OpenDaylight, that bears on the issue of that critical virtual layer.  ODL codifies the notion that everything can be made to look virtual, and that formalization in the ONAP structure might induce the group to be more explicit about the strategy I’ve outlined, which at the moment can only be inferred and not proved.

One thing that addressing the virtualization model would accomplish is closing the loop on the issue of the explicit-linked versus shared resource models, in management terms.  Shared resources are generally managed independently of services, and that is also true with resource pools.  The presumption is that higher-layer elements that are built on real resources will respond to a failure of those resources by redeploying.
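A minimal sketch of that presumption, with hypothetical names: the pool is managed on its own, and a higher-layer element reacts to the failure of its current host by redeploying onto another healthy member rather than waiting for the failed resource to be repaired.

```python
# Illustrative redeploy-on-failure behavior for pooled resources.
from typing import Dict, Optional


def pick_healthy_host(pool: Dict[str, bool], exclude: str) -> Optional[str]:
    """Return any healthy pool member other than the failed one."""
    for host, healthy in pool.items():
        if healthy and host != exclude:
            return host
    return None


def on_resource_failure(component: str, failed_host: str,
                        pool: Dict[str, bool]) -> str:
    pool[failed_host] = False                 # resource layer notes the fault
    target = pick_healthy_host(pool, failed_host)
    if target is None:
        return f"{component}: no healthy capacity left, escalate"
    return f"{component}: redeployed from {failed_host} to {target}"


pool = {"server-a": True, "server-b": True}
print(on_resource_failure("vFirewall-1", "server-a", pool))
```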

Where the challenge in this model is most obvious is in the way that we commission new resources, or coerce resource-centric behaviors into a form where they can be consumed by services.  We’re used to thinking about modularity and intent modeling in services, but the same concepts have a lot of value on the resource side.  A concept like “component-host”, for example, is a property of something that can (obviously) host components.  That might be a VM, a container, a real server, an edge blade in a router, or a piece of agile vCPE.  However, it might be packaged as a piece of equipment, a piece of software, a rack of equipment, an entire data center, or (in the case of M&A) a community of data centers.  The boundary layer concept is important because it not only gives us an infrastructure-neutral way of mapping service components to resources, but also offers a way of making new resources available based on predefined capabilities.  One vendor I know to be working in this area is Apstra, and there may be others.
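Here’s an illustrative intent-model sketch of the “component-host” idea: a service component declares the capabilities it needs, and anything that advertises those capabilities can host it, no matter how it is packaged.  The capability names and the matching rule are my own assumptions, not Apstra’s, ONAP’s, or any standard’s model.

```python
# Capability-based binding of service components to resource offers.
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ResourceOffer:
    name: str
    packaging: str                        # "container", "server", "data-center", ...
    capabilities: Set[str] = field(default_factory=set)


@dataclass
class ComponentIntent:
    name: str
    needs: Set[str] = field(default_factory=set)


def bind(component: ComponentIntent,
         offers: List[ResourceOffer]) -> Optional[ResourceOffer]:
    """Infrastructure-neutral mapping: match on capabilities, not packaging."""
    for offer in offers:
        if component.needs <= offer.capabilities:
            return offer
    return None


offers = [
    ResourceOffer("edge-blade-3", "router-blade", {"component-host"}),
    ResourceOffer("k8s-east", "container-cluster", {"component-host", "scalable"}),
]
fw = ComponentIntent("vFirewall", needs={"component-host", "scalable"})
print(bind(fw, offers))   # picks k8s-east, without caring how it's packaged
```

The point of the sketch is the boundary layer itself: onboarding a new data center or a new vCPE option is just publishing another offer with the capabilities it supports.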

The breadth of impact that’s possible with ONAP/ECOMP is perhaps the most critical thing about it, because of the potential opex savings and agility benefits that full zero-touch service lifecycle automation would offer.  A complete solution would offer potential savings of about 7 cents per revenue dollar even in 2018, across all categories of network operators.  In agility terms, it could lower the time required to set up a new service, excluding new wiring, from about two weeks with largely manual processes, three days with today’s state-of-the-art automation, or four hours with what’s considered full automation today, down to about ten minutes.  These benefits are critical for operator transformation, so much so that without them there’s doubt that much transformation can be achieved.  But, as we’ll see, there are issues beyond scope of impact that we have to address, and this blog series will cover them as we go along.