A Deep Dive into Service Modeling

The question of how services are modeled is fundamental to how services can be orchestrated and how service-lifecycle processes can be automated.  Most people would probably agree with that, but not everyone has thought through the implication: if modeling sits logically at the top of the heap, then getting it right is critical.  A while ago I did a blog on Ciena’s DevOps toolkit and made some comments on their modeling, and that provoked an interesting discussion on LinkedIn.  I want to follow up with some of the general modeling points that came out of that discussion.

Services are first and foremost collections of features.  The best example, and the one I’ll use throughout this blog, is a VPN.  You have a “VPN” feature that forms the interior of the service, ringed by a series of “Access” features that get users connected.  The Access elements might be simple on-ramps, or they might include “vCPE” elements.  When a customer buys a VPN, they get what looks almost like a simple molecule: the central VPN ringed with Access elements.
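To make the molecule concrete, here’s a minimal Python sketch.  Everything in it is hypothetical, just a way of visualizing the composition, not any standard’s data model.

```python
# A purely hypothetical sketch of a service as a composition of features.
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str                              # e.g. "VPN" or "Access"
    children: list = field(default_factory=list)

# The "molecule": a central VPN feature ringed by Access features.
vpn_service = Feature("VPNService", children=[
    Feature("VPN"),
    Feature("Access"),                     # one Access element per site
    Feature("Access"),
    Feature("Access"),
])
```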

Customers, customer service reps, and service architects responsible for building services would want to see a service model based on this structure.  They’d want Access and VPN features available for composition into VPN services, but they would also want to define a “Cloud” service as being a VPN to which a Cloud hosting element or two is added.  The point is that the same set of functional elements could be connected in different ways to create different services.
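In the same hypothetical sketch, that reuse is just a different composition of the same parts:

```python
# Same feature elements, composed differently: a "Cloud" service is a VPN
# to which a Cloud hosting element or two is added.
cloud_service = Feature("CloudService", children=[
    Feature("VPN"),
    Feature("Access"),
    Feature("Access"),
    Feature("CloudHosting"),               # hypothetical hosting feature
])
```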

Logically, for this to work, we’d want all these feature elements to be self-contained, meaning that when created they could be composed into any logical, credible service, and when ordered they could be instantiated on whatever real infrastructure happened to be there.  If a given customer order involved five access domains, and each was based on a different technology or vendor, you’d not want the service architect to have to worry about that.  If the VPN service is orderable in all these access domains, then the model should decompose properly for the domains involved, right?

This to me is a profound, basic, and often-overlooked truth.  Orchestration, to be optimally useful, has to be considered at both the service level and the resource level.  Service orchestration combines features, and resource orchestration deploys those features on infrastructure.  Just as we have a “Service Architect” who does the feature-jigsaws that add up to a service, we have “Resource Architects” who build the deployment rules.  I’d argue further that Service Architects are always top-down, because they define the retail service framework that is the logical top.  Resource Architects could be considered “bottom-up” in a sense, because their role is to expose the capabilities of infrastructure in such a way that those capabilities can couple to features and be composed into services.

To understand the resource side, let’s go back to the Access feature, and the specific notion of vCPE.  An Access feature might consist of simple Access or include Access plus Firewall.  Firewall might invoke cloud hosting of a firewall virtual network function (VNF), deployment of a firewall VNF in a piece of CPE, or even deployment of an ordinary firewall appliance.  We have three possible deployment models, then, in addition to the simple Access pipeline.  You could see a Resource Architect building up deployment scripts or Resource Orchestrations to handle all the choices.
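A Resource Architect’s catalog of those choices might look something like this; the names and descriptions are mine, purely for illustration.

```python
# Hypothetical catalog of deployment alternatives for the Firewall part
# of AccessWithFirewall, as a Resource Architect might enumerate them.
FIREWALL_DEPLOYMENTS = {
    "FirewallInCloud":   "host the firewall VNF on a pooled cloud resource",
    "FirewallvCPE":      "load the firewall VNF into an agile CPE box",
    "FirewallAppliance": "ship and install a physical firewall appliance",
}
```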

Again applying logic, AccessWithFirewall as a feature should then link up with AccessWithFirewall as what I’ll call a behavior, meaning a collection of resource cooperations that add up to the feature goal.  I used the same name for both, but it’s not critical that this be done.  As long as the Service Architect knew that the AccessWithFirewall feature decomposed into a given Resource Behavior, we’d be fine.
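The binding itself could be as simple as a lookup; again, this is purely a hypothetical illustration.

```python
# Hypothetical binding between service-side features and the resource-side
# behaviors that fulfill them.  The names happen to match for the first
# entry, but only the binding itself matters.
FEATURE_TO_BEHAVIOR = {
    "AccessWithFirewall": "AccessWithFirewall",
    "VPN":                "VPNBehavior",
}
```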

So, what this relationship would lead us to is that a Resource Architect would define a single Behavior representing AccessWithFirewall and enveloping every form of deployment needed to fulfill it.  When the service was ordered, the Service Architect’s request for the feature would activate the Resource model, and that model would then be selectively decomposed into the form of deployment needed at each of the customer access points.

If you think about this approach, you see that it defines a lot of important things about modeling and orchestration.  First, you have to assume that the goal of orchestration in general is to decompose the model into something else, which in many cases will be another model structure.  Second, you have to assume that the decomposition is selective, meaning that a given model element could decompose into several alternative structures based on a set of conditions.
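Here’s a rough sketch of what selective decomposition might look like; the condition names are invented for the example.

```python
# Hypothetical selective decomposition: a model element carries several
# alternative substructures, and order-time conditions pick one of them.
def decompose(element, context):
    """Return the first alternative whose condition matches the context."""
    for condition, substructure in element["alternatives"]:
        if condition(context):
            return substructure
    raise ValueError("no alternative matches " + element["name"])

firewall = {
    "name": "Firewall",
    "alternatives": [
        (lambda ctx: ctx["cloud_pool_nearby"], "FirewallInCloud"),
        (lambda ctx: ctx["agile_cpe_on_site"], "FirewallvCPE"),
        (lambda ctx: True,                     "FirewallAppliance"),  # fallback
    ],
}

print(decompose(firewall, {"cloud_pool_nearby": False,
                           "agile_cpe_on_site": True}))   # -> FirewallvCPE
```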

So a higher-level model element can “contain” alternatives below, and can represent either decomposition into lower-level elements or deployment of resources.  Are there other general properties?  Yes, and they fit into the aggregation category.

If two lower-level things (Access and VPN, for example) make up a service, then the status of the service depends on the status of these things, and the deployment of the service is complete only when the deployment of each of the subordinate elements is complete.  Similarly, a fault below implies a fault above.  To me, that means that every object has an independent lifecycle process set, and within it responds to events depending on its own self-defined state.

Events are things that happen, obviously, and in the best of all possible worlds they’d be generated by resource management systems, customer order systems, other model elements, etc.  When you get an event directed at a model element, that element would use the event and its own internal state to reference a set of handler processes, invoking the designated process for the combination.  These processes could be “internal” to the NFV implementation (part of the NFV software) or they could be operations or management processes.
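In rough code terms, the dispatch might look like this; the states, events, and handler names are illustrative, not drawn from any specification.

```python
# Hypothetical state/event table: (state, event) -> handler process.
# A handler could be internal NFV software or an external OSS/NMS process.
def start_activation(element): ...
def record_operating(element): ...
def escalate_fault(element): ...

HANDLERS = {
    ("ORDERED",    "Activate"):  start_activation,
    ("ACTIVATING", "Operating"): record_operating,
    ("OPERATING",  "Fault"):     escalate_fault,
}

def on_event(element, event):
    # The element's own state plus the incoming event select the process.
    handler = HANDLERS.get((element.state, event))
    if handler:
        handler(element)
```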

If we go back to our example of a VPN and two AccessWithFirewall elements, you see that the service itself is a model element, and it might have four states: ORDERED, ACTIVATING, OPERATING, and FAULT.  The AccessWithFirewall elements include two sub-elements, the Firewall and the AccessPipe.  The Firewall element could have three alternative decompositions: FirewallInCloud, FirewallvCPE, and FirewallAppliance.  The first two of these would decompose to the deployment of the Firewall VNF, either in a pooled resource or in an agile premises box, and the third would decompose to an external-order object that waited for an event saying the box had been received and installed.

If we assume all these guys have the same state/events, then we could presume that the entire structure is instantiated in the ORDERED state, and that at some point the Service model element at the top receives an Activate event.  It sets its state to ACTIVATING and then decomposes its substructure, sending the AccessWithFirewall and VPN model elements an Activate.  Each of these then decomposes in turn and also enters the ACTIVATING state, waiting for the lowest-level deployment to report an Operating event.  When a given model element has received that event from all its subordinates, it enters the OPERATING state and reports the event to its own superior object.  Eventually all these roll up to make the service OPERATING.
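Here’s a hypothetical sketch of that cascade.  To keep it short, actual deployment is collapsed into the leaf elements, which report Operating immediately.

```python
# Hypothetical sketch of the Activate cascade and the Operating roll-up.
class ModelElement:
    def __init__(self, name, children=()):
        self.name = name
        self.state = "ORDERED"
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def activate(self):
        self.state = "ACTIVATING"
        if not self.children:              # lowest level: deploy, then report
            self.report_operating()
        for child in self.children:        # decompose, passing Activate down
            child.activate()

    def report_operating(self):
        # Enter OPERATING only when every subordinate is OPERATING.
        if all(c.state == "OPERATING" for c in self.children):
            self.state = "OPERATING"
            if self.parent:
                self.parent.report_operating()

service = ModelElement("VPNService", [
    ModelElement("VPN"),
    ModelElement("AccessWithFirewall"),
    ModelElement("AccessWithFirewall"),
])
service.activate()
print(service.state)   # OPERATING once everything below has rolled up
```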

If a lower-level element faults and can’t remediate according to its own SLA, then it reports a FAULT event to its superior.  The superior element could then either try remediation or simply continue to report FAULT up the line to eventually reach the service level.  When a fault is cleared, the model element that had failed now reports Operating and enters that state, and the clearing of the fault then moves upward.  At any point, a model element can define remedies, invoke OSS processes like charge-backs, invoke escalation notifications to an operations center, etc.
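Extending the same hypothetical sketch (assume the whole tree is built from this subclass), fault handling could be layered on like this:

```python
# Hypothetical extension of the ModelElement sketch above: a fault below
# propagates a FAULT state upward, and a clear rolls back up the same way.
class FaultAwareElement(ModelElement):
    def report_fault(self):
        self.state = "FAULT"
        # A superior could attempt remediation here, or invoke OSS
        # processes (charge-backs, escalation); this sketch just escalates.
        if self.parent:
            self.parent.report_fault()

    def clear_fault(self):
        self.state = "OPERATING"            # the failed element recovers...
        if self.parent:
            self.parent.report_operating()  # ...and the clear moves upward
```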

Another aggregation point is that a given model element might define the creation of something that lower-level elements then depend on.  The best example here is the creation of an IP subnetwork that will then be used to host service features or cloud application components.  A higher-level model defines the subnet, and it also decomposes into lower-level deployment elements.
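A sketch of that dependency might look like the following; the two callables are stand-ins for whatever resource-level behaviors actually do the work, since those would be infrastructure-specific.

```python
# Hypothetical: the higher-level element creates the IP subnet first, and
# each lower-level element then deploys into it as shared context.
def deploy_subnet_service(order, allocate_subnet, deploy_component):
    subnet = allocate_subnet(order["cidr"])           # parent-created resource
    for component in order["components"]:
        deploy_component(component, subnet=subnet)    # children depend on it
    return subnet
```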

I would presume that both operators and vendors could define model hierarchies that represent services or functional components of services, and that also represent resource collections and their associated “behaviors”.  The behaviors form the bottom-level deployment process sets, so if two different vendors had slightly different deployment requirements, their gear could still be used interchangeably provided both rationalized to the same behavior model, which could then be referenced in a service.

This is a lightweight description of how a service model could work, and how it could then define the entire process of service creation and lifecycle management.  All the software features of SDN, NFV, and OSS/BSS/NMS are simply referenced as arguments in a state/event table for various model elements.  The model totally composes the service process.

The final logical step would be to develop a tool that let operator architects drag and drop elements to create higher-level model structures all the way up to the service level, and to define resources and deployment rules below.  These tools could work from the service model catalog, a storage point for all the model elements, and they could be based on something open-source, like the popular Eclipse interactive development environment used for software-building.

You might wonder where the NFV components like MANO or VNFM or even VNFs are in this, and the answer is that they are either referenced as something to deploy in a behavior, or they’re referenced as a state/event-driven process.  You could build very generic elements, ones that could be referenced in many model element types, or you could if needed supply something that’s specialized.  But there is no single software component called “MANO” here; it’s a function and not a software element, and that’s how a software architect would have seen this from the first.

A data-model-driven approach to NFV is self-integrating, easily augmented to add new functions and new resources, and always updated through model-and-data-driven activities rather than software development.  A service model could be taken anywhere and decomposed there by the same kind of model software, and any part of a model hierarchy could be assigned to a partner provider, a different administrative zone, or a different class of resources.

This is how NFV should be done, in my view, and my view on this has never changed from my first interactions with the ETSI process and the original CloudNFV project.  It’s what my ExperiaSphere tutorials define, for those interested in looking up other references.  This is the functional model I compare vendor implementations against.  They don’t have to do everything the way I’d have done it, but they do have to provide an approach that’s at least as workable and covers the same bases.  If you’re looking for an NFV implementation, this is the reference you should apply to anything out there, open source or otherwise.

There’s obviously a connection between this approach and the management of VNFs themselves.  Since this blog is already long, I’ll leave that issue for the next one later this week.