Defining (and Understanding) Service Models

What is a service model?  That’s a question I’ve been getting in comments on my LinkedIn blog posts and directly from clients and contacts via email.  I’ve dealt with modeled services for almost 15 years now, and perhaps that familiarity has demystified them for me while they’re still mysterious to others.  I want to try to correct that here and explain why the concept is critical, so critical that a lot of what we expect from transformation can’t happen without it.

A network service or cloud application could well have a dozen or more explicit components, and each of these might have a dozen “intrinsic” components.  If you host a function somewhere, it’s a part of a higher-level service relationship, and it’s also dependent on a bunch of hosting and connection resources, each of which may involve several components or steps.  You could prove this out for yourself by drawing a simple application or service and taking care to detail every dependent piece.
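
To make the nesting concrete, here’s a minimal Python sketch (the component names are invented for illustration, not drawn from any particular catalog) of how quickly the dependent pieces pile up:

```python
# A minimal, hypothetical sketch of how dependencies pile up: one "explicit"
# component (a hosted firewall function) drags in a set of "intrinsic"
# hosting and connection resources.

service = {
    "vpn-service": {                       # the retail service the customer buys
        "firewall-function": {             # an explicit, visible component
            "vm-instance": {},             # ...and its intrinsic dependencies
            "image-repository": {},
            "vnic-attachment": {},
            "subnet": {},
            "monitoring-agent": {},
        },
        "access-connection": {
            "port": {},
            "tunnel": {},
        },
    }
}

def count_pieces(node: dict) -> int:
    """Count every dependent piece in the tree, at any depth."""
    return sum(1 + count_pieces(child) for child in node.values())

print(count_pieces(service))   # even this toy service has 10 pieces to manage
```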

The difficulty that multi-component services pose is sheer complexity.  If you look at the NFV concept of service chaining, for example, you’re proposing to replace a simple appliance with a chain of virtual machines, each linked by a network connection and each requiring at least the same level of management as any cloud component.  If you have a goal of agility and composability of services from logical elements, the possible variations only add to the complexity.  Get enough of this stuff wrapped up in a retail offering and it becomes almost impossible to keep it running without software assistance.

Service lifecycle automation, or “zero-touch automation” (ZTA), is about being able to set up services/applications, sustain them when they’re in operation, and tear them down when needed.  Think about that in the context of your own service/application diagram.  There are a lot of steps to take, a lot of things that could break, and a lot of responses to problems that could be deemed proper.  How do these things get done?

The best way to start is by looking at the high-touch, manual way.  An operator in a network operations center (NOC) would take a service order/description and provision everything according to the order.  Everyone knows that this approach is expensive and error-prone, and more so as the number of elements and steps increases.  In the IT world, human steps were replaced by scripts a long time ago.  A script is essentially a record of manual steps, something like a batch file.  From that, scripting evolved into the imperative model of DevOps tools: “Do this!”  Scripting handles things like deployment fairly easily.
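
As a toy illustration of that imperative, “Do this!” style (the step names here are invented), a script is just an ordered list of recorded steps:

```python
# A toy, purely imperative deployment "script": a recorded list of steps,
# executed in order, exactly as an operator would have performed them.

def allocate_vm():        print("allocate VM")
def load_image():         print("load firewall image")
def attach_network():     print("attach network connection")
def start_monitoring():   print("start monitoring agent")

deployment_steps = [allocate_vm, load_image, attach_network, start_monitoring]

for step in deployment_steps:
    step()    # if any step fails, the "what now?" logic has to be written in by hand
```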

Where scripting falls down is in handling abnormal conditions.  What if something breaks or is unavailable when it’s called for in the script?  You’d have to write in exception handling, which is a pain even if you assume that there’s a parallel resource you could commit.  And if a whole approach to, say, deploying a software element is invalidated because some specific step can’t be done, you can’t just replace that step, you have to replace the approach.  That means going backward, in effect.

It’s even worse if something breaks during operation.  Now you have a broken piece, somewhere, that was supposed to do something.  The context of its use has to determine the nature of the remedy, and it could be as simple as using a parallel available element or as complicated as starting over again.  That’s where service modeling comes in.  Service models are declarative, meaning that they don’t describe steps, they describe states.
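
As a contrasting sketch (again with invented field names), a declarative description states the condition each piece should be in, and leaves it to the automation to reconcile reality with that description:

```python
# A minimal declarative sketch: the model describes the state each element
# should be in, not the steps that get it there.

desired = {
    "firewall-function": {"state": "active", "instances": 1},
    "access-connection": {"state": "active", "bandwidth-mbps": 100},
}

observed = {
    "firewall-function": {"state": "fault", "instances": 1},
    "access-connection": {"state": "active", "bandwidth-mbps": 100},
}

# The automation's job is simply to reconcile observed state with desired state.
for name, want in desired.items():
    have = observed.get(name, {})
    if have.get("state") != want["state"]:
        print(f"{name}: in state {have.get('state')!r}, needs remediation to reach {want['state']!r}")
```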

A service model is a representation of the relationships of elements in a service.  Each element in the model represents a corresponding functional element of the service.  The collection of elements creates a functional hierarchy, a representation of how the overall service breaks down into pieces, eventually pieces that can be deployed and managed.

With a functional hierarchy, a service or application is defined at the top as an object with a specific set of properties.  That object is then related to its highest-level functional pieces, and each of them is decomposed in turn into its own highest-level functional pieces.  At some point, this decomposition reaches the level where the pieces are no longer decomposable objects but specific resource commitments.  A simple example of the top layer of a functional hierarchy is that “Service” consists of “Access” and “Core”.
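
Here’s a minimal sketch of that kind of hierarchy in code, with invented element names; the leaves are resource commitments rather than further decomposable objects:

```python
# A toy functional hierarchy: "Service" decomposes into "Access" and "Core",
# and each of those decomposes further until the leaves are actual resource
# commitments rather than objects.

hierarchy = {
    "Service": {
        "Access": {
            "AccessLine": "commit: provision physical/virtual access port",
            "AccessCPE":  "commit: deploy CPE or hosted vCPE function",
        },
        "Core": {
            "CoreVPN":     "commit: build VPN connectivity across the core",
            "CoreGateway": "commit: allocate gateway capacity",
        },
    }
}

def show(node, depth=0):
    """Print the functional hierarchy, marking leaves as resource commitments."""
    for name, child in node.items():
        if isinstance(child, dict):
            print("  " * depth + name)
            show(child, depth + 1)
        else:
            print("  " * depth + f"{name}  -> {child}")

show(hierarchy)
```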

This model is interesting, but it’s not in itself an answer to ZTA.  When we were talking about service deployments and problems, we were talking about events.  The notion of events is critical to automation because it’s events that automation is supposed to be handling.  A long time ago, the TMF came up with a then-revolutionary (and still insightful) vision, which was that a service contract (which is a model of a service) is the conduit that steers events to processes.

Events and event-handling are absolutely critical to the success of ZTA.  What happens in the real world is asynchronous, meaning that everything runs in parallel and things happen in parallel.  It’s possible to queue events up for a serialized monolithic process, but if you do, there’s a good chance that by the time you process Event A, a related event or two has already occurred behind it without your knowing, and you’re now working out of sync with reality.  It’s not enough to be able to understand what an event means in context if your context is wrong.

OK, so let’s suppose that something in our “Access” component of our “Service” breaks.  The fault event is directed to the “Access” data model.  That data model is built around a state/event engine or table, which says that every event in every possible functional state (orderable, ordering, active, fault, etc.) has a target process.  When the event is received, that process is run.
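
A minimal sketch of such a state/event table might look like this (the states, events, and process names are illustrative, not from any specific implementation):

```python
# A minimal state/event table sketch for the "Access" object.  Every
# (state, event) pair maps to a target process that is run on receipt.

def activate_process(obj): print("ActivateProcess: committing access resources")
def fault_process(obj):    print("FaultProcess: attempting internal repair")
def ignore(obj):           print("event ignored in this state")

STATE_EVENT_TABLE = {
    ("orderable", "order"):  activate_process,
    ("ordering",  "ready"):  activate_process,
    ("active",    "fault"):  fault_process,
    ("fault",     "repair"): activate_process,
}

def handle_event(obj, state, event):
    """Look up the target process for this event in this state and run it."""
    process = STATE_EVENT_TABLE.get((state, event), ignore)
    process(obj)

handle_event({"name": "Access"}, "active", "fault")   # runs FaultProcess
```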

There are a lot of things an activated process might do, but they break down into three main categories.  It can do something “functional” (like changing a parameter or even initiating a tear-down of a resource commitment), it can do something “signaling” (generating an event), and it can do something “stateful” (changing its own state).  Usually it will do several of these things, sometimes all three.
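
A short sketch of one activated process doing all three kinds of work, with invented names:

```python
# A sketch of a single activated process doing "functional", "signaling",
# and "stateful" work.  The names are invented for illustration.

def fault_process(model_object, send_event):
    # "functional": act on the resources the object represents
    print(f"reconfiguring resources behind {model_object['name']}")

    # "signaling": generate an event for another object (here, a subordinate)
    send_event(target="AccessLine", event="rebuild")

    # "stateful": change this object's own state
    model_object["state"] = "repairing"

access = {"name": "Access", "state": "active"}
fault_process(access, send_event=lambda target, event: print(f"event {event!r} -> {target}"))
print(access["state"])   # now "repairing"
```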

If a model object like “Access” gets an event it can handle within its own scope, it would run a process, and perhaps set a state indicating it was waiting for that process to do the handling.  When completion was signaled (by another event), it would restore its state to “active”.  If a model object cannot handle what an event is signaling on its own, it might signal down the chain to its subordinates; that’s what would happen when a service change was processed.  Finally, if a model object can neither handle the event nor pass it down, it has to report a fault up the chain of objects to its superior, again via an event.
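
Here’s a compact sketch of that handle-locally / pass-down / report-upward routing, with hypothetical objects and events:

```python
# A sketch of the handle-locally / pass-down / report-upward routing.
# The object names, event names, and parent/child links are all invented.

class ModelObject:
    def __init__(self, name, can_handle=(), pass_down=(), parent=None):
        self.name = name
        self.can_handle = set(can_handle)   # events this object fixes itself
        self.pass_down = set(pass_down)     # events it decomposes to subordinates
        self.parent, self.children = parent, []
        if parent:
            parent.children.append(self)

    def receive(self, event):
        if event in self.can_handle:
            print(f"{self.name}: handling {event!r} within its own scope")
        elif event in self.pass_down and self.children:
            print(f"{self.name}: passing {event!r} down to its subordinates")
            for child in self.children:
                child.receive(event)
        elif self.parent:
            print(f"{self.name}: can't handle {event!r}, reporting a fault upward")
            self.parent.receive("fault")
        else:
            print(f"{self.name}: at the top of the model, notifying the service user")

service = ModelObject("Service", can_handle={"fault"})
access = ModelObject("Access", pass_down={"change"}, parent=service)
line = ModelObject("AccessLine", can_handle={"change"}, parent=access)

access.receive("change")       # a service change is decomposed downward
line.receive("power-loss")     # an unhandleable fault escalates up to "Service"
```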

Example: “Access” faults.  It sets the “repairing” state and runs FaultProcess to attempt an internal repair.  Suppose the repair isn’t successful.  The object then sets the “fault” state and reports a “fault” event to “Service”.  “Service” might then try to replace the whole access element, essentially treating the fault as a signal to recommit resources.  If that can’t work, then “Service” enters a “fault” state and signals a “fault” event to the user.

The “models” here are dualistic.  In one sense they’re an abstraction of the process set they represent: the sum of what can be done to resources to fulfill the functional mission.  That makes them an intent model.  In another sense, they are a blueprint in data form.  If I have the model of “Service”, I have everything needed to do anything that can be done to or for it.  That means that any process set can handle any event for the service.  I could spin up a new process each time I had an event, or I could send the event to a process already set up.  The ability to have parallel processes handling parallel events is critical to scaling, and also to keeping the context of your service elements up to date.

This was my notion of the “Service Factory” in the first ExperiaSphere project I did in association with the TMF’s Service Delivery Framework work.  You have a blueprint, you send it to a Service Factory, and that factory can do whatever is needed because it has the blueprint.  Models give you a pathway to ZTA, but not just that.  They give you a way to exercise full scalability of process resources, because any instance of the “FaultProcess” in our example could handle the fault.  There is no single monolithic application to queue events to, or even a fixed set of them.  The model mediates its own process scheduling for any number of processes and instances.
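
A rough sketch of the Service Factory idea, under the assumption that the blueprint carries all the service state: any stateless worker instance can then process any event against it, so scaling is just adding instances:

```python
# A sketch of the "Service Factory" idea (names invented): the blueprint
# carries everything about the service, so any stateless factory instance
# can pick up any event against it.

from concurrent.futures import ThreadPoolExecutor

def fault_process(blueprint, event):
    """Any factory worker can run this, because the blueprint is self-contained."""
    return f"handled {event!r} for {blueprint['service-id']} ({blueprint['state']})"

blueprint = {"service-id": "svc-001", "state": "active", "elements": ["Access", "Core"]}
events = ["fault", "fault", "change"]

# Three parallel "factory" workers, none of which holds any private state.
with ThreadPoolExecutor(max_workers=3) as factory_pool:
    for result in factory_pool.map(lambda ev: fault_process(blueprint, ev), events):
        print(result)
```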

There are two things that I think emerge from understanding this model stuff.  The first is that it’s very difficult to see how you could respond to events in a service lifecycle any other way.  State/event processing is a fundamental element in real-time design, proven for literally decades.  The second is that without the intrinsic process scalability and “Service Factory” approach, you end up with a lifecycle manager that has fixed capacity, meaning that some combination of events will likely overrun it.  It doesn’t scale.

A deeper insight from the model approach is that the functions of “orchestration” and “management” are intrinsic to the state/event sequence and not explicit in a process sense.  What you have in a model-driven system is a series of interconnected state/event systems, and their collective behavior creates all the lifecycle responses, from deployment to tear-down.  There is orchestration, but it’s a set of model-driven state/event processes.  Same with management.

This is why I object to things like the NFV ISG approach.  You cannot have a real interface between abstractions.  If you define an interface to the concept of “orchestration”, you are nailing the abstraction of orchestration to an actual component for the interface to connect to.  There is, or should be, no such thing.  Real-time event-driven systems have only one interface, an event interface.  They don’t have explicit, individual elements; they have collective, model-driven behaviors.  You can do anything with them that a state/event process-model system can describe, which is pretty much anything.

All this depends on building the models right, which is what SDN, NFV, ZTA, IoT, and everything else we’re talking about in real-world services and applications should have done from the start.  None of them have, and so we’re at the point where we either have to accept that we’ve done a lot of stuff totally wrong and start over, or go forward on a path that’s not going to lead to an optimum solution, and perhaps not even to a workable one.  I’m at the point where I won’t talk to anyone about a service lifecycle automation/ZTA approach unless we can talk modeling in detail and up front.  Anything else is a waste of my time.

The great irony is that there’s nothing in service modeling that’s rocket science.  There have been service automation projects based on modeling for five years or more.  I’ve detailed the specific application of hierarchical modeling to network services in six annotated slide presentations in my latest ExperiaSphere project, and everything there is freely contributed to the public domain except the use of the trademark “ExperiaSphere”.  You don’t even have to acknowledge the contribution if you use part or all of the ideas.  Anyone with some real-time programming education or experience could do this sort of thing.

Whether the importance of the approach I’ve outlined will be recognized in time to save the concept is another matter.  I’ve been able to make every software type I’ve talked with understand the structure and the issues.  I’ve also run through the modeling with operator standards types, and while they seemed to like the approach, they didn’t insist on its adoption for SDN, NFV, or ZTA.  Even within the OSS/BSS community, where TMF types may recognize the genesis of modeling in the NGOSS Contract event steering of a decade ago, it’s hard to get an endorsement.  The reason may be that real-time event programming isn’t something non-programmers are tuned into.  Many can understand that it works, but most apparently don’t see it as the optimum approach.  Because it’s an event-driven approach, and because lifecycle management is an event-driven process, I think it is.