Fixing the Conflated-and-Find-Out Interpretation of MANO/VIM

I blogged recently about the importance of creating NFV services based on an agile, markup-like model rather than on static, explicit data models.  My digging through NFV PoCs and implementations has opened up other issues that can also impact the success of an NFV deployment, and I want to address two of them today.  I’m pairing them up because they both relate to the critical Management/Orchestration or MANO element.

The essential concept of NFV is that a “service”, somehow described in a data model, is converted into a set of cooperating, committed resources through the MANO element.  One point I noted in the earlier blog is that if this data model is highly service-specific, the logic of MANO has to accommodate every possible service explicitly, or those services are ruled out.  That, in turn, means MANO could become enormously complicated and unwieldy.  This is a serious issue, but it’s not the only one.
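
To make that risk concrete, here’s a minimal Python sketch of a service-specific model and the per-service orchestration logic it forces.  Every field and function name here is hypothetical, invented purely for illustration; nothing in it is ETSI syntax:

```python
# Hypothetical service-specific data model; field names are invented
# for illustration and are not taken from any ETSI specification.
vpn_service = {
    "service_type": "managed-vpn",
    "vnfs": [
        {"name": "firewall", "image": "fw-2.1", "vcpus": 2},
        {"name": "router",   "image": "vr-1.3", "vcpus": 4},
    ],
}

def deploy_managed_vpn(service):
    print("deploying", [v["name"] for v in service["vnfs"]])

def deploy(service):
    # With a service-specific model, MANO needs explicit logic for every
    # service type; a type it doesn't recognize simply can't be deployed.
    handlers = {"managed-vpn": deploy_managed_vpn}
    handler = handlers.get(service["service_type"])
    if handler is None:
        raise ValueError(f"MANO has no logic for {service['service_type']!r}")
    handler(service)

deploy(vpn_service)
```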

MANO acts through an Infrastructure Manager, which in ETSI is limited to managing virtual infrastructure and so is called a VIM.  The VIM represents the “resources”, and MANO the service models to be created.  If you look at the typical implementations of NFV, you find that MANO is expected to drive specific aspects of VNF deployment and parameterization, meaning that MANO uses the VIM almost the way OpenStack would use Neutron or Nova.  In fact, I believe this model was adopted for the relationship, deliberately or unconsciously, and that adoption is problematic.
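
Here’s a rough sketch of what that OpenStack-like usage pattern looks like.  The VIM class and its methods are hypothetical stand-ins I’ve invented, loosely echoing Nova/Neutron-style calls; they are not real APIs:

```python
# Sketch of the conflated pattern: MANO drives the VIM step by step,
# much as an OpenStack client would drive Nova and Neutron.
# The VIM class and its methods are hypothetical stand-ins.
class VIM:
    def boot_vm(self, image, flavor):
        print(f"booting {image} ({flavor})")
        return "vm-001"

    def create_port(self, vm_id, network):
        print(f"attaching {vm_id} to {network}")

def mano_deploy_vnf(vim):
    # MANO itself must know images, flavors, and tenant networks:
    # infrastructure detail that then has to appear in the service
    # model as well.
    vm = vim.boot_vm(image="firewall-2.1", flavor="m1.medium")
    vim.create_port(vm, network="tenant-net-42")

mano_deploy_vnf(VIM())
```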

The first problem created by this approach is what I’ll call the conflation problem.  A software architect approaching a service deployment problem would almost certainly divide it into two groupings: definitions of the “function” part of virtual functions, and descriptions/recipes for how to virtualize those functions.  The former would view a VNF implementation of “firewall” and a legacy implementation of the same thing as equivalent, not to mention two VNF implementations based on different software.  The latter would realize the function on the available resources.

If you take this approach, then VIMs essentially advertise recipes and constraints on when (and where) they can be used.  MANO has to “bind” a recipe to a function, but once a recipe is identified it’s up to the VIM/chef to cook the dish.
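
A minimal sketch of that binding model, with all names invented for illustration, might look like this:

```python
# Sketch of the alternative: VIMs advertise recipes plus constraints,
# and MANO only binds an abstract function to a recipe.
# All names are hypothetical, for illustration only.
class VIM:
    def __init__(self, recipes):
        self.recipes = recipes  # function name -> opaque recipe id

    def advertises(self, function):
        return function in self.recipes

    def cook(self, function, params):
        # The VIM, not MANO, knows how to realize the function on its
        # own resources (a VM, a container, or even a legacy box).
        print(f"VIM realizing {function} via {self.recipes[function]}")

def bind_and_deploy(function, params, vims):
    # MANO's whole job here: find a VIM whose recipe matches the function.
    for vim in vims:
        if vim.advertises(function):
            vim.cook(function, params)
            return vim
    raise LookupError(f"no VIM offers a recipe for {function}")

vims = [VIM({"firewall": "fw-recipe-a"}), VIM({"router": "vr-recipe-b"})]
bind_and_deploy("firewall", {"rules": "default-deny"}, vims)
```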

In a conflated model, MANO has to deploy something directly through the VIM, understanding tenant VMs, servers, and parameters.  The obvious effect of this is to make MANO a lot more complicated, because it now has to know about the details of infrastructure.  That also means the service model has to carry that level of detail, which, as I’ve pointed out in the past, means that services could easily become brittle if infrastructure changes underneath them.

The second issue that the current MANO/VIM approach creates is the remember-versus-find-out dichotomy.  If MANO has to know about tenant VMs and walk a VIM through a deployment process, then (as somebody pointed out in response to my earlier blog on this) MANO has to be stateful.  A service that deploys half a dozen virtual machines and VNFCs has a half-dozen “threads” of activity going at any point in time.  For a VNF that is a combination of VNFCs to be “ready”, each VNFC has to be assigned a VM, loaded, parameterized, and connected.  MANO then becomes a huge state/event application that has to know all about the state progression of everything down below, and has to guide that progression.  And not only that, it has to do that for every service, perhaps many at one time.
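
A skeletal illustration of the problem, with hypothetical states and events:

```python
# Sketch of why "remembering" inside MANO explodes: one in-memory FSM
# per VNFC, per service, all held by the orchestrator itself.
# States and events here are hypothetical.
VNFC_STATES = ["ordered", "vm_assigned", "loaded",
               "parameterized", "connected", "ready"]

class StatefulMANO:
    def __init__(self):
        self.threads = {}  # (service_id, vnfc_id) -> current lifecycle state

    def start(self, service_id, vnfc_ids):
        for v in vnfc_ids:
            self.threads[(service_id, v)] = "ordered"

    def on_event(self, service_id, vnfc_id, event):
        # MANO itself must remember where every VNFC of every service is;
        # lose this process (or its memory) and every in-flight service
        # loses its place.
        state = self.threads[(service_id, vnfc_id)]
        i = VNFC_STATES.index(state)
        if i + 1 < len(VNFC_STATES):
            self.threads[(service_id, vnfc_id)] = VNFC_STATES[i + 1]
            print(f"{service_id}/{vnfc_id}: {state} -> "
                  f"{VNFC_STATES[i + 1]} on {event}")

mano = StatefulMANO()
mano.start("svc-1", ["fw", "router", "dpi"])  # three concurrent "threads"
mano.on_event("svc-1", "fw", "vm_granted")
```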

Somebody has to know something.  You either have to remember where you are in a complex deployment or constantly ask what state things are in.  Even if you accept the latter as an option, you’d not know what state you should be in unless you remembered stuff.  Who, then, does the remembering?  In the original ExperiaSphere project, I demonstrated (to the TMF among others) that you could build a software “factory” for a given service by assembling Java objects.  Each service built with the factory could be described by a data model based on the service object structure, and any suitable factory could be given the data model for a compatible service at any stage of lifecycle progression and could process events for it.  In other words, a data model could remember everything about a service, so that an event or condition in the lifecycle could be handled by any copy of a process.  In this situation the orchestration isn’t complicated or stateful; the service model remembers everything needed, because it’s all recorded.
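
Here’s a small sketch of that pattern, assuming a JSON-encoded service model whose structure I’ve invented for the example.  The point is that the process itself holds no state between events:

```python
# Sketch of the ExperiaSphere-style alternative: the service model
# itself records each object's state, so ANY copy of a stateless
# process can pick up an event. Model structure and names are
# hypothetical, invented for illustration.
import json

def handle_event(model_doc, obj_name, event):
    # The process remembers nothing: it reads state from the model,
    # acts, writes the new state back, and exits.
    model = json.loads(model_doc)
    obj = model["objects"][obj_name]
    if obj["state"] == "deploying" and event == "vm_ready":
        obj["state"] = "parameterizing"
    elif obj["state"] == "parameterizing" and event == "params_ack":
        obj["state"] = "active"
    return json.dumps(model)

doc = json.dumps({"objects": {"firewall": {"state": "deploying"}}})
doc = handle_event(doc, "firewall", "vm_ready")    # any process copy
doc = handle_event(doc, "firewall", "params_ack")  # even a different one
print(json.loads(doc)["objects"]["firewall"]["state"])  # -> active
```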

There are other issues with the “finding-out” process.  Worldwide, few operators build services without some partner contribution somewhere in the process.  Most services for enterprises span multiple operators, so one operator acts as a prime contractor.  With today’s conflated-and-find-out model of MANO/VIM, a considerable amount of information has to flow from a partner back to the prime contractor, and the prime contractor ends up committing the partner’s resources directly (via the partner’s VIM).  Most operators won’t provide that kind of direct visibility and control even to partners.  And if we look at a US service model, where a service might include access (now Title II, common-carrier regulated) and information (unregulated), separate subsidiaries at arm’s length have to provide the pieces.  Is a highly centralized and integrated MANO/VIM suitable for that?

I’m also of the view that the conflated-and-find-out approach to MANO contributes to management uncertainty.  Any rational service management system has to be based on a state/event process.  If I am in the operating state and I get a report of a failure, I do something to initiate recovery and enter the “failed” state until I get a report that the failure has been corrected.  In a service with a half-dozen or more interdependent elements, that is best handled through finite-state machine (state/event) processing.  But however you handle it, it should be clear that the processes of fixing something and of deploying something are integral, and that MANO and VNFM should not be separated at all.  Both, in fact, should exist as processes that are invoked by a service model as its objects interdependently progress through their lifecycle state/event transitions.
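
A toy version of that integrated lifecycle FSM might look like this; states, events, and process names are all hypothetical:

```python
# Sketch of an integrated lifecycle FSM: deployment (MANO-ish) and
# repair (VNFM-ish) are just processes named in one state/event table.
# States, events, and process names are invented for illustration.
def deploy_process(obj):
    print(f"deploying {obj}")

def repair_process(obj):
    print(f"recovering {obj}")

def resume_process(obj):
    print(f"{obj} back in service")

TRANSITIONS = {
    # (current state, event)       : (process to run, next state)
    ("ordered",   "activate")      : (deploy_process, "operating"),
    ("operating", "fault_report")  : (repair_process, "failed"),
    ("failed",    "fault_cleared") : (resume_process, "operating"),
}

def dispatch(state, event, obj):
    process, next_state = TRANSITIONS[(state, event)]
    process(obj)
    return next_state

s = dispatch("ordered", "activate", "firewall")  # deploy
s = dispatch(s, "fault_report", "firewall")      # now "failed"
s = dispatch(s, "fault_cleared", "firewall")     # back to "operating"
```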

If you’re going to run MANO and VNFM processes based on state/event transitions, then why not integrate external NMS and OSS/BSS processes the same way?  We’re wasting enormous numbers of cycles trying to figure out how to integrate operations tasks when, if we do MANO/VNFM right, the answer falls right out of the basic approach with no additional work or complexity.
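
Continuing the same (hypothetical) table idea, operations hooks are just more entries keyed by the same transitions:

```python
# Sketch: OSS/BSS and NMS tasks ride the same state/event machinery.
# All names are hypothetical, for illustration only.
def start_billing(obj):
    print(f"OSS/BSS: billing activated for {obj}")

def open_trouble_ticket(obj):
    print(f"NMS: trouble ticket opened for {obj}")

ON_ENTER = {
    "operating": [start_billing],        # service went live: bill it
    "failed":    [open_trouble_ticket],  # service broke: tell operations
}

def entered(state, obj):
    for task in ON_ENTER.get(state, []):
        task(obj)

entered("operating", "firewall")  # operations integration falls out free
```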

The same goes for horizontal integration across legacy elements.  If a “function” is realized on a real device instead of as a VNF, and if we incorporate management processes that map VNF-and-host state on the one hand, and legacy device state on the other, to a common set of conditions, then we can integrate management status across any mix of technology, which is pretty important in the evolution of NFV.
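
A minimal sketch of that mapping, with invented state vocabularies on both sides:

```python
# Sketch of horizontal integration: map VNF/host status and legacy
# device status into one common condition set, so service logic never
# cares which implementation it got. Both vocabularies are hypothetical.
COMMON = ("up", "degraded", "down")

def from_vnf(vnf_state, host_state):
    # VNF view: both the function and its host must be healthy.
    if vnf_state == "active" and host_state == "healthy":
        return "up"
    return "down" if vnf_state == "failed" else "degraded"

def from_legacy(oper_status):
    # Legacy view: a MIB-style operational status code.
    return {"1": "up", "2": "down"}.get(oper_status, "degraded")

# Either implementation of "firewall" yields the same condition:
print(from_vnf("active", "healthy"), from_legacy("1"))  # up up
```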

If we accept the notion that the ETSI ISG’s work is a functional specification, then these issues can be addressed readily by simply adopting a model-based description of management and orchestration.  That’s another mission for OPNFV, or for vendors willing to look beyond the limited scope of the PoCs and examine how their model could serve a future with millions of customers and services.