ETSI ZTA Architecture Shows Some Real Risks

In past blogs I’ve talked about abstract threats to the ETSI zero-touch automation (ZTA) project.  Here, referencing one of the open documents, I want to talk about the real threats that are now visible in the early documentation.  ETSI’s reference architecture for ZTA balances the new and the old of standards, but I think one particular part of it biases the ZTA process toward the “old”, and in a way we already know can be fatal.

I want you to think for a moment about a network service in the hypothetical SDN/NFV future.  We would likely see a collection of cooperating features and devices, some appliances and some hosted software functions.  Service lifecycle management in this situation is a combination of order-driven activity from above and event responses from below.  Every customer, every service, has an implicit or explicit resource commitment associated with it, reaching across not only the primary network provider’s infrastructure but probably the infrastructure of “federated” providers as well.

The reason this is important is that it almost guarantees that the only software architecture that’s going to work is one designed for event processing.  I also believe that when you tie in event processing, the modern notion of functional/microservice components that are inherently stateless for scalability, and the need to handle a very large number of concurrent services, you end up with a prescription for a model-driven architecture.  An event-driven, model-driven system is a collection of functions whose contextual handling of events is determined by state/event relationships hosted in the model.
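
To make that last point concrete, here’s a minimal sketch in Python of what I mean by model-hosted state/event relationships.  Everything in it, the element name, states, events, and handler processes, is my own invention for illustration, not anything drawn from the ETSI document: the service model carries a table mapping (state, event) pairs to stateless handler processes, and a tiny dispatcher does nothing but look them up and run them.

```python
# Minimal sketch of a model-driven, event-driven dispatcher.  All names here
# (states, events, handler processes) are hypothetical illustrations.

# Stateless handler processes: each takes the model element and the event,
# does its work, and returns the element's next state.
def activate(element, event):
    print(f"deploying {element['name']}")
    return "ACTIVATING"

def confirm_active(element, event):
    print(f"{element['name']} is now active")
    return "ACTIVE"

def remediate(element, event):
    print(f"re-homing {element['name']} after {event['type']}")
    return "ACTIVATING"

# The "model-hosted" part: the state/event relationships live in the service
# data model itself, not in the code.
service_model = {
    "name": "vpn-site-a",
    "state": "ORDERED",
    "state_events": {
        ("ORDERED",    "ORDER"):     activate,
        ("ACTIVATING", "DEPLOYED"):  confirm_active,
        ("ACTIVE",     "SLA_FAULT"): remediate,
    },
}

def dispatch(element, event):
    """Steer an event to a process via the element's state/event table."""
    handler = element["state_events"].get((element["state"], event["type"]))
    if handler is None:
        return  # the event isn't meaningful in this state; ignore or log it
    element["state"] = handler(element, event)

# A lifecycle is just a stream of events: orders from above, conditions from below.
for ev in ({"type": "ORDER"}, {"type": "DEPLOYED"}, {"type": "SLA_FAULT"}):
    dispatch(service_model, ev)
```

The handlers hold no state of their own; everything contextual lives in the model, which is what lets them be spun up, replicated, or replaced freely.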

If you look at the referenced ETSI document, you can actually see a lot of this spelled out (in a very limited way) in Section 4, which is “Architecture Principles”.  The section calls for model-driven, scalable, intent-based implementations.  It even calls for stateless components.  The first problem is that even this section has contradictions.

Look at Principle 1: “A modular architecture defines logical functional components that interoperate with each other via interfaces. A modular architecture avoids monoliths and tight coupling [italics mine; I’ll come back to this], and consists of self-contained, loosely-coupled services, each with a confined scope.”  This seems fine on the surface but it’s not.

There is a logical contradiction between the first principle and some of the others, at least potentially, though it may not seem obvious.  The problem is that event-driven systems really don’t have interfaces between components; they have components activated by state/event tables in the data model.  Yes, there might be cases where a state/event-defined process has a series of components linked through conventional interfaces, but the important stuff is state/event-driven.  Coupling components through interfaces, “tightly” or otherwise, can (and often does) create a “monolith”.
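
To illustrate the distinction (again with purely hypothetical names and logic of my own), compare an interface-coupled chain, where each component calls the next and the sequence is frozen into the code, with state/event activation, where the same work is sequenced entirely by the table:

```python
# Hypothetical contrast: interface-coupled components versus components
# activated by a state/event table.

# --- Interface-coupled: each step knows and calls the next one. ---
def allocate_coupled(svc):
    svc["resources"] = "allocated"
    configure_coupled(svc)          # hard-wired call -- this is the coupling

def configure_coupled(svc):
    svc["configured"] = True
    test_coupled(svc)               # the sequence is frozen into the code

def test_coupled(svc):
    svc["state"] = "ACTIVE"

# --- State/event-activated: no component references any other. ---
def allocate(svc, event):
    svc["resources"] = "allocated"
    return "ALLOCATED", {"type": "ALLOCATED"}    # next state plus a new event

def configure(svc, event):
    svc["configured"] = True
    return "CONFIGURED", {"type": "CONFIGURED"}

def test(svc, event):
    return "ACTIVE", None

state_events = {
    ("ORDERED",    "ORDER"):      allocate,
    ("ALLOCATED",  "ALLOCATED"):  configure,
    ("CONFIGURED", "CONFIGURED"): test,
}

def run(svc, event):
    while event is not None:
        handler = state_events.get((svc["state"], event["type"]))
        if handler is None:
            break
        svc["state"], event = handler(svc, event)

svc = {"state": "ORDERED"}
run(svc, {"type": "ORDER"})
print(svc)   # {'state': 'ACTIVE', 'resources': 'allocated', 'configured': True}
```

In the first version the sequence is a property of the software; in the second it’s a property of the model.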

Where that really hits home is in the “reference architecture” shown as Figure 6.2-1.  This model is just the kind of thing that the ETSI NFV group created and called the “end-to-end” or functional model of NFV.  While “functional models” don’t purport to describe an implementation, what happens in many cases is that people take the model as a literal guide to the software structure.  That creates the very “monoliths” and “tight coupling” that Principle 1 says must be avoided.  All the blocks in the diagram are monoliths.

Another figure, 6.5.4.2.1-1, poses in my view an even greater risk of misuse.  The figure describes the relationship between services and the management domain, and it again shows a task-oriented view rather than a state/event view.  In a real event-driven system, events drive processes directly, so you don’t have the compartmentalized functional divisions the figure shows; functionality is the result of the process executions that events (via the state/event mappings in the service data model) trigger.  The notion that you get an event and then analyze it in some way, perhaps as an AI process would, is a batch/transactional vision, not an event vision.  Since this figure is supposed to represent real implementation relationships, it can’t even hide behind the notion that it’s just explaining a functional vision.
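
Here’s a small, purely illustrative contrast between the two visions (the event names and elements are invented): in the batch/transactional shape, events pile up until an analysis pass decides what to do; in the event shape, each event is steered straight to a process, in context, the moment it arrives.

```python
import queue

# --- Batch/transactional vision: accumulate events, analyze, then act. ---
inbox = queue.Queue()

def analyze_and_act():
    # Nothing happens until this pass runs; context has to be reconstructed later.
    while not inbox.empty():
        print("acting (late) on", inbox.get())

inbox.put({"type": "SLA_FAULT", "element": "fw-1"})
inbox.put({"type": "SLA_FAULT", "element": "lb-2"})
analyze_and_act()

# --- Event vision: each event is dispatched on arrival, in model context. ---
model = {"fw-1": {"state": "ACTIVE"}, "lb-2": {"state": "DEGRADED"}}

def remediate(elem, ev): print("remediating", ev["element"])
def escalate(elem, ev):  print("escalating", ev["element"])

state_events = {
    ("ACTIVE",   "SLA_FAULT"): remediate,   # same event, different context...
    ("DEGRADED", "SLA_FAULT"): escalate,    # ...so a different process runs
}

def on_event(ev):
    elem = model[ev["element"]]
    handler = state_events.get((elem["state"], ev["type"]))
    if handler:
        handler(elem, ev)

on_event({"type": "SLA_FAULT", "element": "fw-1"})
on_event({"type": "SLA_FAULT", "element": "lb-2"})
```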

I’m also concerned about the notion of inter- and intra-domain fabrics.  An event-driven system built on intent modeling would always let each component of the service (modeled as an intent model) manage itself to its SLA and generate an event if it couldn’t.  That would be true within an administrative domain or between them, and the only things that move around in an event-driven system are the events themselves.  What’s a fabric supposed to be doing?  If the service model is already defined by a data structure, a hierarchy of intent models, there’s nothing for a fabric to integrate.
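
As a sketch of what I mean (the element names and SLA numbers are hypothetical), here’s a hierarchy of two intent-modeled elements in which each manages itself to its own SLA, and the only thing that ever crosses an element (or domain) boundary is an event raised when an element can’t meet it:

```python
# Hypothetical intent-model hierarchy: each element self-manages to its SLA
# and raises an event to its parent only when it cannot meet it.

class IntentElement:
    def __init__(self, name, sla_ms, parent=None):
        self.name = name
        self.sla_ms = sla_ms          # the SLA this element commits to
        self.parent = parent          # next level up in the model hierarchy

    def report(self, measured_ms):
        """Called with a local measurement; remediate locally if possible."""
        if measured_ms <= self.sla_ms:
            return                    # inside SLA: nobody else ever hears about it
        if self.remediate():
            return                    # fixed locally: still no event escapes
        # Only an SLA-violation event crosses the boundary -- no fabric needed.
        if self.parent:
            self.parent.on_event({"type": "SLA_VIOLATION", "from": self.name})

    def remediate(self):
        # Placeholder for local action (rescale, re-route, re-host).
        return False

    def on_event(self, event):
        # The parent treats this like any other event against its own state.
        print(f"{self.name} received {event['type']} from {event['from']}")

service = IntentElement("enterprise-vpn", sla_ms=50)
access_leg = IntentElement("access-domain-a", sla_ms=20, parent=service)

access_leg.report(measured_ms=35)   # a breach it can't fix: one event moves upward
```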

I don’t want to understate the challenge that a standards group faces with this sort of thing.  Very few members are software architects, and in any event one of the greatest strengths of a data-model-coupled state/event-driven system is that its functionality is almost totally determined by the state/event relationships in the model and how they steer events to processes.  You can do a lot to define service lifecycle automation by just changing the relationships, but it’s very difficult to describe what exactly is being done, because it’s the state/event/process relationships and not the software that determine that.  How do you then describe the “functionality”?
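
A quick, hypothetical illustration of why that’s so hard to write down: the two models below run against identical handler code, and only the state/event relationships differ, yet one service retries a failed deployment while the other abandons it.

```python
# Identical handler code; only the model's state/event relationships differ.

def deploy(elem, ev):    return "DEPLOYING"
def retry(elem, ev):     return "DEPLOYING"
def tear_down(elem, ev): return "FAILED"

retrying_model = {
    ("ORDERED",   "ORDER"):         deploy,
    ("DEPLOYING", "DEPLOY_FAILED"): retry,       # this service tries again
}

fail_fast_model = {
    ("ORDERED",   "ORDER"):         deploy,
    ("DEPLOYING", "DEPLOY_FAILED"): tear_down,   # this one gives up immediately
}

def lifecycle(table, events):
    state = "ORDERED"
    for ev in events:
        handler = table.get((state, ev))
        if handler:
            state = handler(None, ev)
    return state

events = ["ORDER", "DEPLOY_FAILED"]
print(lifecycle(retrying_model, events))    # DEPLOYING -- still trying
print(lifecycle(fail_fast_model, events))   # FAILED -- same code, different functionality
```

The “functionality” of each service lives in the table, which is exactly what a functional block diagram can’t show.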

The only sure solution I can see to this impasse is for standards bodies to describe both the functional model (making it clear that it’s not an implementation model) and the application architecture, which would be a description of the data model and of how it steers events to processes based on per-element state/event relationships.  If both were presented, you could visualize the “what” and the “how” from independent models framed from the same requirements, and you’d have a pathway both to explain things and to implement them.
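
To sketch what that second description might look like (every name here is invented), the application architecture would essentially be the service data model itself, with each element carrying its own state/event-to-process relationships as data, plus the rule that an implementation binds those named processes to stateless code:

```python
# A sketch of the "application architecture" artifact: the service data model
# itself, with per-element state/event-to-process relationships held as data.

service_data_model = {
    "service": "enterprise-vpn",
    "elements": {
        "core-vpn": {
            "state": "ORDERED",
            "state_events": {
                ("ORDERED", "ORDER"):     "DeployCore",
                ("ACTIVE",  "SLA_FAULT"): "RehomeCore",
            },
        },
        "access-a": {
            "state": "ORDERED",
            "state_events": {
                ("ORDERED", "ORDER"):     "DeployAccess",
                ("ACTIVE",  "SLA_FAULT"): "EscalateToParent",
            },
        },
    },
}

# An implementation binds the named processes to real (stateless) code...
process_catalog = {
    "DeployCore":       lambda elem, ev: "ACTIVE",
    "DeployAccess":     lambda elem, ev: "ACTIVE",
    "RehomeCore":       lambda elem, ev: "ACTIVE",
    "EscalateToParent": lambda elem, ev: "FAULT",
}

def dispatch(model, element_name, event):
    elem = model["elements"][element_name]
    process = elem["state_events"].get((elem["state"], event))
    if process:
        elem["state"] = process_catalog[process](elem, event)

# ...and the "how" is then fully described by the model, not the code.
dispatch(service_data_model, "core-vpn", "ORDER")
print(service_data_model["elements"]["core-vpn"]["state"])   # ACTIVE
```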

State/event systems are quite old.  I wrote a computer-to-computer protocol system for IBM 360 computers way back in the 1960s, and it used state/event logic.  Every protocol handler I ever wrote, or saw, used that principle.  The TMF embodied the notion of service data models coupling events to processes in its work a full decade ago.  It’s hard to believe that some vendor, some player with hopes of being a kingpin in the carrier cloud, isn’t going to catch on to the right principle here and do something simple that has a profound impact on the whole ZTA space.  If that happens, somebody is going to catapult into a lead role in very little time, and the whole market dynamic of carrier cloud could change.