Getting NFV Orchestration Up to Speed with the Cloud

Whatever else you have, or think you have, the NFV business case will depend on software automation of the service lifecycle. VNFs matter functionally, but only if you can operationalize them efficiently and only if they combine to drive carrier-cloud adoption. The near-term core of any justification for NFV deployment is operational efficiency, and software automation is what delivers it, if anything does. The core for the future is service agility and lower cost, and software automation delivers that too. Get it right, and you’re everywhere. Get it wrong and you’re nowhere.

Getting software automation right is largely a matter of modeling, but not “modeling” in the sense of somehow picking an optimum modeling language or tool.  You can actually do a good job of NFV service modeling with nothing more than XML.  What you can’t do without is the right approach, meaning you can’t do without knowing what you’re modeling in the first place.  You are not modeling network topology, or even service topology.  You’re modeling multi-domain deployment.

A service, from the top, is a retail contract with an SLA. You can express this as an intent model that defines first the service points (ports) where users connect, and then the characteristics the service presents at those points, which is the SLA. In the most general case, therefore, the high-level service looks like a list of port-classes representing one or more types of user connection, and a list of ports within each class. It also includes a list of service-classes available at those ports, each expressed as an SLA.
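To make that concrete, here’s a minimal sketch in Python of what that top-level intent model might hold; the class and field names (Service, PortClass, SLA, and so on) are my own invention for illustration, not drawn from any standard:

```python
# A minimal sketch of the top-level service intent model described above.
# All names (Service, PortClass, SLA) are invented for illustration.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SLA:
    """The characteristics the service presents at its ports."""
    availability: float        # e.g. 0.9999
    max_latency_ms: float
    committed_mbps: float

@dataclass
class PortClass:
    """One type of user connection, plus the ports that belong to it."""
    name: str                  # e.g. "ethernet-access"
    ports: List[str] = field(default_factory=list)

@dataclass
class Service:
    """The retail contract: port-classes and the service-classes (SLAs) offered at them."""
    name: str
    port_classes: List[PortClass] = field(default_factory=list)
    service_classes: Dict[str, SLA] = field(default_factory=dict)

vpn = Service(
    name="ip-vpn",
    port_classes=[PortClass("ethernet-access", ["NYC-01", "LON-02"])],
    service_classes={"gold": SLA(availability=0.9999, max_latency_ms=40.0, committed_mbps=100.0)},
)
```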

Most people agree at this high level. The problem comes when you try to go downward, to deploy the service on resources. There we have a whole series of approaches that essentially come down to two: the model-decomposition approach and the model-deployment approach.

Model deployment is what a lot of vendors have thought of.  In model deployment, a service model is linked to a series of scripts that provision resources, manipulate management interfaces, invoke controllers, and so forth.  Even if model-deployment implementations of NFV allow for the “nesting” of models to, for example, split an IP VPN service into the VPN Core and Access components, the bulk of the provisioning decisions are made in one giant step.
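To show why that matters, here’s a deliberately simplified caricature of the model-deployment style (the function and field names are invented); every technology and vendor variation becomes another branch in one provisioning script:

```python
# A deliberately simplified caricature of the model-deployment style:
# one script carries every technology/vendor branch, which is the brittle part.
def deploy_ip_vpn(order: dict) -> None:
    for site in order["sites"]:
        tech, vendor = site["access_technology"], site["vendor"]
        if tech == "mpls" and vendor == "vendor-a":
            print(f"provisioning MPLS via vendor A CLI at {site['id']}")
        elif tech == "mpls" and vendor == "vendor-b":
            print(f"provisioning MPLS via vendor B API at {site['id']}")
        elif tech == "vnf-vrouter":
            print(f"deploying a virtual-router VNF chain at {site['id']}")
        else:
            # Any network change this script doesn't anticipate breaks the model.
            raise RuntimeError(f"no deployment logic for {tech}/{vendor}")

deploy_ip_vpn({"sites": [
    {"id": "NYC-01", "access_technology": "mpls", "vendor": "vendor-a"},
    {"id": "LON-02", "access_technology": "vnf-vrouter", "vendor": "any"},
]})
```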

Model decomposition is different. Here the goal is to define a service from top to bottom in the abstract. A Service consists of a Core element and an Access element. Each of these can then be divided: Access, for example, into point-of-attachment features (firewall, etc.) and access connectivity. Those can then be divided into specifics: the particular edge features, the particular connectivity type. You go as far as you can in decomposing a service before you start thinking about implementation.
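One way to picture this, again just a sketch with invented names, is as a decomposition tree that stays purely functional until you reach the leaves, where implementation choices finally appear:

```python
# A purely functional decomposition tree: implementation choices appear only
# at the leaves.  Every name here is invented for illustration.
decomposition = {
    "IP-VPN-Service": {
        "VPN-Core": {
            "core-connectivity": ["mpls-core", "sdn-overlay-core"],
        },
        "Access": {
            "attachment-features": ["firewall", "nat"],
            "access-connectivity": ["ethernet-access", "broadband-access"],
        },
    }
}

def leaves(tree):
    """Walk the tree down to the points where implementation finally starts."""
    if isinstance(tree, list):
        yield from tree
    else:
        for child in tree.values():
            yield from leaves(child)

print(list(leaves(decomposition)))
```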

The easiest way to assess these differences is to look at the two logical ends of a service deployment, the service level at the top and resource management at the bottom. What the two approaches do at these extremes will likely determine how broadly useful, or how risky, they could be. That’s particularly true when you assume that networks are already populated with different technologies, and with multiple vendors within each. We’re now introducing SDN and NFV, and they won’t deploy uniformly or instantly. The network we’re trying to software-control is in a state of technical confusion and flux.

In the model-deployment approach, the process of building an access connection across multiple technology and vendor choices has to be built into the logic of the model that represents the VPN. If we imagine this as a script, the script has to account for every variation in the way something gets deployed: the technology available at a given location, the vendor and API standards, and so on. As a result, there is a risk that you end up with many, many versions of “IP VPN”, each differing in the technology mix it’s designed to deploy on. Changes in the network made through orderly evolution would break the scripts that depended on the old configuration, and if you wanted to change a service based on a customer order, the change might be radical enough to force you to tear down one deployment model and initiate another. Even a failure, one that for example shifts a piece of a service from a VNF to a legacy device, might not be handled except by kill-and-start-over.

At the other end of the service, the top end, having a lot of implementation detail puts pressure on the service order and management portals, because if you select the wrong model element to represent an order, the order might not even be compatible with the element you picked. You also have to resolve the management portal’s relationship with all the resource details, and if every model deploys differently on different equipment, harmonizing the management view would have to be an explicit part of the deployment process, and it would be as brittle as the models were. You could change how a service was deployed, and that would then change what the user saw on a management console, or what they could do in response to what they saw.

You probably think that the model-decomposition approach solves all of this, but that’s not totally true. Model decomposition would typically be based on the notion of “successive abstractions”. A “service” would decompose into “sub-services” like Access and VPN Core. Each of the sub-services would decompose into “feature elements”, and each feature element into a “virtual device” set. Virtual devices could even decompose into “device-class variants” before you actually started putting service pieces onto network or hosting resources. This structure, in and of itself, can contain the impact of changes in infrastructure or service, and it can also make it easier to substitute one “tree” of decomposition (targeting, for example, real Cisco routers) with another (targeting virtual router instances hosted as VNFs). It doesn’t necessarily make things dynamic, and it doesn’t solve the management problem.

What you really need in order to do both these things is an extension of the basic notion of model decomposition, which is this: since every model element is an intent model, known to the outside world only by its properties, which include its SLA, you can manage every model element, and you should. If you have an element called “Access”, it has an SLA. Its sub-elements, whether they are divided by administrative domain, geography, or technology, also have SLAs. The SLA of any “superior” object drives the SLAs of the subordinate objects, which at the bottom then drive the network behavior you’re parameterizing, configuring, and deploying. You can provide a user or customer service rep with SLA status for any intent model, and they see the SLA there, which is what they’d expect. Only at the bottom do you have the complicated task of building an SLA from deployed behaviors.
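Here’s one hypothetical way the “superior SLA drives subordinate SLAs” idea could look in code; the even split of a latency budget is purely illustrative, since a real implementation would apportion the SLA however the architect decided:

```python
# Every model element is an intent model with its own SLA; the superior
# element's SLA drives its subordinates'.  The even split of a latency
# budget below is purely illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class IntentElement:
    name: str
    latency_budget_ms: float
    children: List["IntentElement"]

def apportion(element: IntentElement) -> None:
    """Push the superior SLA down to subordinate elements."""
    if not element.children:
        return                      # bottom: budget now drives real network behavior
    share = element.latency_budget_ms / len(element.children)
    for child in element.children:
        child.latency_budget_ms = share
        apportion(child)

access = IntentElement("Access", 0.0, [])
core = IntentElement("VPN-Core", 0.0, [])
service = IntentElement("IP-VPN", 60.0, [access, core])
apportion(service)
print(access.latency_budget_ms, core.latency_budget_ms)   # 30.0 30.0
```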

Speaking of the bottom, it’s helpful here to think about a companion activity to the service modeling we’ve been talking about. Suppose that every administrative domain, meaning an area where a common management span of control exists, has its own “services”, presented at the bottom and utilized in combination to create the lowest-level elements inside the service models. In my ExperiaSphere project I called these resource-side offerings Behaviors to distinguish them from retail elements. They’d roughly correspond to the resource-facing services (RFS) of the TMF, and so my “service models” would roughly map to the TMF customer-facing services (CFS).

Now we have an interesting notion to consider. Suppose that every metro area (as an administrative domain) advertises the same set of behaviors, regardless of its technology mix and the state of its SDN/NFV evolution. You could then build service models referencing these behaviors that, when decomposed by geography to serve the specified endpoints, would bind to the proper behaviors in each of the metro areas. Service composition now looks the same no matter where the customers are.
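A sketch of that binding step might look like the following; “Behavior” is the ExperiaSphere term, but the catalog structure and the metro names are mine, invented for illustration:

```python
# Each metro (administrative domain) advertises the same behavior names,
# whatever its technology mix.  The catalog layout and metro names are
# invented; only the term "Behavior" comes from ExperiaSphere.
BEHAVIOR_CATALOG = {
    "metro-nyc": {"access-connectivity": "legacy-ethernet-driver",
                  "vpn-core": "mpls-core-driver"},
    "metro-lon": {"access-connectivity": "sdn-access-driver",
                  "vpn-core": "vnf-vrouter-driver"},
}

def bind(behavior: str, endpoint_metro: str) -> str:
    """Decompose by geography: same behavior name, per-metro implementation."""
    return BEHAVIOR_CATALOG[endpoint_metro][behavior]

# Service composition references "access-connectivity" identically everywhere...
for metro in ("metro-nyc", "metro-lon"):
    print(metro, "->", bind("access-connectivity", metro))
# ...even though each metro realizes that behavior with different technology.
```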

I’ve described this as a service-domain/resource-domain split, and in my own model the split occurs where technical capabilities are linked to functional requirements, where “Behaviors” are linked to “service elements”. Above the split, the model is primarily functional and logical, though in all cases a model element that has, for example, five subordinate elements to integrate would have to control the gateway processes that connect them. Below the split, the goal is primarily technical. You could justify a difference in modeling approach above and below, and even multiple modeling strategies in the resource domain, as long as the same behavior set was presented in the same way to the service domain.

This approach, which combines service modeling built from intent-based abstractions with resource modeling that sits beneath a single set of “behavior” abstractions presented upward, seems to me to offer the right framework for software automation of the service lifecycle processes. Each of the model abstractions is a state/event machine whose states are defined by the architect who builds it and whose events are generated by the objects immediately above or within it. For each model element, the states and events define a table of processes to be invoked to handle each event in each possible operating state. That’s how service lifecycle management has to work.
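Here’s a minimal state/event table in that sense, with the states, events, and process names all hypothetical:

```python
# A minimal state/event table for one model element.  The states, events,
# and the processes they dispatch to are all hypothetical.
def start_deployment(ctx): print("deploying", ctx["element"])
def record_active(ctx):    print(ctx["element"], "is active")
def start_repair(ctx):     print("repairing", ctx["element"])

# (state, event) -> (process to run, next state); this table is the lifecycle logic.
LIFECYCLE_TABLE = {
    ("ordered",   "activate"): (start_deployment, "deploying"),
    ("deploying", "deployed"): (record_active,    "active"),
    ("active",    "fault"):    (start_repair,     "repairing"),
    ("repairing", "repaired"): (record_active,    "active"),
}

def handle(state: str, event: str, ctx: dict) -> str:
    """Dispatch an event against the element's current state; return the new state."""
    process, next_state = LIFECYCLE_TABLE[(state, event)]
    process(ctx)
    return next_state

state = "ordered"
for event in ("activate", "deployed", "fault", "repaired"):
    state = handle(state, event, {"element": "Access"})
```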

I like TOSCA as a modeling approach for this, but as I said, what’s critically important is that the modeling technology support the structure I’m describing. It’s a major plus if there are open-source tools available to manage the models. TOSCA is already gaining acceptance as part of vendor NFV strategies. It’s supported by Cloudify as a specific NFV solution (through Aria, a general modeling tool), by Alien4Cloud, and by Ubicity. OpenStack’s HEAT can consume TOSCA templates through the heat-translator project, and HEAT follows the same template-driven orchestration pattern as Amazon’s CloudFormation. OpenTOSCA, an open-source implementation of the model, was used by Jorge Cardoso in a proof-of-concept university project I’ve often cited. There are also NFV ISG documents on TOSCA available. I think TOSCA is the smart choice in the service domain, and also at the top of the resource domain.

What happens in that resource area, below the point where TOSCA is a good choice? That’s by definition inside an opaque intent model, so in truth it doesn’t matter. Any administrative domain could use whatever modeling and deployment tools it liked, as long as what it offers is published as something like my Behaviors for use in services. TOSCA would work there overall, and would work best IMHO where cloud elements (like VNFs) deploy, but the key to NFV success is to embrace whatever equipment is out there, and demanding a reformulation of the management of that equipment to support a new model wouldn’t make sense.

I think that this software automation approach handles orchestration and full lifecycle management; it can easily support wholesale/retail partnerships, 5G slicing and operating within those slices as a VNO, and any combination of legacy vendors and technologies, as well as SDN and NFV. There may be other ways too, and I’m not trying to say this is the only solution. It is, however, a good solution, and one I’m absolutely confident could work as well as any in making SDN and NFV transformation successful. It’s fully described in my ExperiaSphere material (http://www.experiasphere.com/page16.html), and please note that I’ve made all of this available and open without restrictions, not even an attribution requirement, as long as you don’t use the trademarked name. Take a look and see if it works for you, and let me know if you find something better.