Does the Industry Need More Order in NFV Infrastructure?

There is no question that disorder is hard to manage.  Thus, it might seem reasonable to applaud the recent story that the GSMA is working to bring some order to the “Wild West” of NFVi, meaning NFV infrastructure.  Look deeper, though, and you’ll find that disorder should have been the goal of NFV all along, and that managing it rather than eliminating it is essential if NFV is ever to redeem itself.

The stripped-to-the-bare-bones model of NFV is a management and orchestration layer (MANO), a virtual infrastructure manager (VIM), and an infrastructure layer (NFVi).  This model is very similar to the model for cloud deployment of application components; a component selected by the orchestrator is mapped to infrastructure via an abstraction layer, which means via virtualization of resources.  Virtual network functions (VNFs) are orchestrated, through the VIM, onto NFVi.

If we look at the early models of this, specifically at OpenStack, we see that a “virtual host” is mapped to a “real host” by a plugin that harmonizes differences in the interfaces involved.  That’s true for actual hosting (OpenStack’s Nova) and also networking (Neutron).  If we assume that all virtual hosts have the same properties, meaning that any real host connected via a plugin can perform the hosting role, then this whole process works fine.
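
The plugin pattern described here can be sketched in a few lines.  This is a hypothetical illustration of the idea, not OpenStack’s actual driver API; all class and method names are my own.

```python
# Hypothetical sketch of the plugin pattern: a virtual-host request is
# mapped onto a real host by a driver that hides back-end differences.
from abc import ABC, abstractmethod

class HostDriver(ABC):
    """Harmonizes one back-end's interface to a common 'host' abstraction."""
    @abstractmethod
    def boot(self, image: str, flavor: str) -> str: ...

class KvmDriver(HostDriver):
    def boot(self, image, flavor):
        return f"kvm-instance({image},{flavor})"

class BareMetalDriver(HostDriver):
    def boot(self, image, flavor):
        return f"baremetal-node({image},{flavor})"

def deploy(request, driver: HostDriver):
    # If every real host behind a driver can play the hosting role, the
    # orchestrator never needs to know which back-end it was given.
    return driver.boot(request["image"], request["flavor"])

print(deploy({"image": "vnf-fw", "flavor": "m1.small"}, KvmDriver()))
```

The orchestrator’s side of the call is identical whichever driver is plugged in, which is exactly the assumption that breaks down when virtual hosts stop being interchangeable.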

The issue arises when that’s not true.  I pointed out in yesterday’s blog that there were really two missions for “carrier cloud”, one being retail cloud hosting and the other being “internal” service function hosting.  I also noted that the latter mission might involve connection-like data plane requirements, more performance-stringent than typical event-or-transaction interfaces to application components.

The problem that the NFV community has been grappling with is that different applications with different missions could arguably demand different features from the infrastructure layer.  The current initiative proposes three different NFVi classifications: network-intensive (data plane), compute-intensive (including GPU), and a “nominal” model that matches traditional application hosting done by cloud providers today.  This, they propose, would reduce the variations in NFVi.

But why do that?  Does it really hurt if there’s some specialization in the NFVi side?  Remember that there are a bunch of parameters such as regulatory jurisdiction and power stability or isolation that could impact where a specific virtual function gets placed.  Why wouldn’t you address technology features in the same way?  There are two possible issues here.  One is that VNFs are being written to different hosting environments, and the other is that there are just too many different hosting environments to allow for efficient resource pooling.

The first issue, that VNFs might be written to a specific hosting platform, is in fact a real issue, but one that the NFV ISG should have addressed long ago at the specification level.  It is possible to create a VNF that needs a specific version of operating system and middleware, of course, but if VNFs are hosted in virtual machines then that should allow any essential set of middleware and OS to be bundled into the machine image.  If there are deeper dependencies between VNFs and the hosting platform hardware or software, then what’s needed is a single middleware framework for all VNFs to match to, and that should have been part of the spec all along.  Harmonizing NFVi isn’t the answer to that problem at all.

The second issue is a bit more complicated.  A resource pool in a cloud is efficient because of the principle of resource equivalence.  If a bunch of hosts can satisfy a given hosting request, it’s easier to load all the hosts fully and raise infrastructure utilization, which lowers capex and opex.  If you need a bunch of different hosts because you have a bunch of different VNF configurations at the hardware level, then your resource pool is divided and less efficient.  However….
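
A back-of-envelope simulation makes the resource-equivalence point concrete.  The numbers here are mine, purely illustrative: demand that a single ten-host pool absorbs easily can overflow two five-host sub-pools whenever it arrives unevenly.

```python
# Illustrative only: compare rejections in one unified pool vs. the
# same capacity split into sub-pools, under randomly targeted demand.
import random
random.seed(1)

def rejections(pool_sizes, trials=10000):
    rejected = 0
    for _ in range(trials):
        # 8 requests total, each randomly landing in one sub-pool
        load = [0] * len(pool_sizes)
        for _ in range(8):
            load[random.randrange(len(pool_sizes))] += 1
        rejected += sum(max(0, l - c) for l, c in zip(load, pool_sizes))
    return rejected

print(rejections([10]))        # unified pool of 10: never overflows
print(rejections([5, 5]) > 0)  # same 10 hosts split in two: overflows
```

The total capacity is identical in both cases; only the partitioning differs, and partitioning alone is enough to create rejections.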

…however, if there really are specialized hosting, networking, or other requirements (including power, regulatory, etc.), then those other requirements will either subdivide the three sub-pools, or we’d have to shoehorn everything into those three pools no matter what.  Both these approaches have issues.

How many blue left-handed widgets are there?  How many do you need there to be?  Any form of subdivision of hosting options, for any reason, creates the possibility that a given virtual function with a given requirement will find zero candidates to host in.  The more requirements there are, the harder it becomes to keep an adequate set of candidate hosts available for every possible requirement combination.  If the need to control where something goes is real, then the risk of not being able to put it there is real, too.
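
This zero-candidates risk is easy to see in a toy filter.  The hosts and attributes below are invented for illustration; each added requirement intersects the candidate set, and enough requirements can empty it entirely.

```python
# Toy illustration: each placement requirement shrinks the candidate
# set; stack enough of them and a VNF may have nowhere to go.
hosts = [
    {"id": "h1", "class": "network", "jurisdiction": "EU", "isolated_power": False},
    {"id": "h2", "class": "network", "jurisdiction": "US", "isolated_power": True},
    {"id": "h3", "class": "compute", "jurisdiction": "EU", "isolated_power": True},
]

def candidates(requirements):
    return [h for h in hosts
            if all(h.get(k) == v for k, v in requirements.items())]

print(len(candidates({"class": "network"})))                        # 2
print(len(candidates({"class": "network", "jurisdiction": "EU"})))  # 1
print(len(candidates({"class": "network", "jurisdiction": "EU",
                      "isolated_power": True})))                    # 0
```

Three hosts, three requirements, and the last request already has no home; a real operator inventory has far more of both.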

The risk is greater if there are competing virtual functions that need only approximately the same NFVi.  If multiple characteristics are aggregated into a common pool, it’s difficult to prevent assignment from that pool from accidentally depleting all of a given sub-category.  Five blue-widget VNFs could consume all my blue-left-handed slots when in fact there were right-handed options available, but we didn’t know we needed to distinguish.
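
The blue-widget problem can be sketched directly (hypothetical slot counts, my own): when the pool tracks only color, “any blue” requests can drain the left-handed slots even though right-handed ones would have served them just as well.

```python
# Sketch of the blue-widget depletion problem: allocation distinguishes
# only color, so it may consume left-handed slots first even when the
# requester doesn't care about handedness.
pool = ([{"color": "blue", "hand": "left"}] * 2 +
        [{"color": "blue", "hand": "right"}] * 3)

def allocate(pool, color):
    for slot in pool:
        if slot["color"] == color:
            pool.remove(slot)
            return slot
    return None

for _ in range(2):
    allocate(pool, "blue")   # two "any blue" requests, handedness ignored

# A VNF that genuinely needs blue + left-handed now finds nothing,
# even though three right-handed slots are still free:
left = [s for s in pool if s["hand"] == "left"]
print(len(left), len(pool))  # 0 3
```

The failure isn’t capacity; it’s that the pool’s bookkeeping was coarser than the requirements actually placed on it.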

Another problem with the aggregate-into-three-categories approach is that, in order to achieve a larger resource pool for efficiency (which doesn’t even work if other criteria impact hosting, as I’ve just noted), you end up creating categories that are adequate on average but will oversupply some VNFs with resources.  The efficiency lost within the categories could well eradicate any benefit gained by limiting the number of NFVi classifications in the first place.

To me, the real issue here is not NFVi, but VIM.  The NFV approach has been to have a single VIM, which means that VIM has to accommodate all possible hosting scenarios.  That doesn’t make a lot of sense if you have a distributed cloud infrastructure for your hosting, and it doesn’t make sense if you’re going to subdivide your NFVi or apply secondary selection criteria (power, location, whatever) to your hosting decisions.  From the first, I argued that any given service model should, for each model element, be able to identify a VIM that was responsible for deploying that element.

If the NFV ISG wants resource pool efficiency, I’d suggest they have to look at it more broadly.  Everything but the network-performance-critical categories of NFVi should probably be considered for container applications rather than for virtual machines.  VMs are great wasters of resources; you can run many more containers on a given host than VMs.  For specialized applications, rather than considering a “pool” of resources, the VIM for that class of VNF should map to a dedicated host or hosting complex.  That’s another advantage of having multiple VIMs.

My view continues to be that NFV needs to presume multiple infrastructure domains, each with its own VIM, and that VIM selection should be explicit within the service data model.  That lets a service architect decide how to apply resource virtualization to each new service, and how multiple hosting sites and even hosting providers are integrated.  However, at present, NFV doesn’t mandate a service model at all, which is the big problem, not NFVi multiplicity.

If a service model decomposes eventually to a set of resource commitments, the specific mechanism to do the committing isn’t a big deal.  You can have as many options as you like because the model’s bottom layer defines how actual resource commitment works.  Containers, VMs, bare metal, whatever you like.  Further, the process of model decomposition can make all the necessary selections on where and how to host based on service requirements and geography, which is also critical.  This is what we should be looking at first.
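
A minimal sketch of that decomposition might look like the following.  The structure, element names, and VIM names are all my own invention, not drawn from any standard: the point is only that each leaf of the service model names its own VIM, and decomposition hands each leaf’s resource commitment to the VIM that owns that domain.

```python
# Hypothetical service model whose leaf elements each name their own
# VIM; decomposition walks the tree and routes each commitment to the
# VIM responsible for that infrastructure domain.
def vm_vim(element):        return f"VM commit: {element['name']}"
def container_vim(element): return f"container commit: {element['name']}"

VIMS = {"vm-domain": vm_vim, "container-domain": container_vim}

service = {
    "name": "vpn-service",
    "children": [
        {"name": "edge-firewall", "vim": "vm-domain"},
        {"name": "dns-relay", "vim": "container-domain"},
    ],
}

def decompose(element):
    if "children" in element:
        return [c for child in element["children"] for c in decompose(child)]
    # Leaf: the model itself selects which VIM commits the resources.
    return [VIMS[element["vim"]](element)]

print(decompose(service))
```

Adding a new commitment mechanism (bare metal, a second cloud provider, whatever) is then just another entry in the VIM table plus model elements that reference it; nothing above the bottom layer changes.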

Two factors would then drive diversification of NFVi.  The first is legitimate requirements for different hosting/networking features for virtual functions.  The second is attempts by infrastructure vendors to differentiate themselves by promoting different options.  If you presume model-based decomposition as I’ve described, you can accommodate both.  Whatever selectivity a service needs in deployment, a model can define.  Whenever a vendor presents a different hosting option, they can be required to provide a decomposition model that handles it, which then makes it compatible with the service deployment and automation process overall.