Modeling Pools of Resources for Carrier and Other Clouds

Virtualization is all about abstraction, and in most cases that means abstracting resources and building resource pools.  The ideal vision of “the cloud”, whether it’s a private cloud, a public cloud, or a carrier cloud, is one of a vast pool of resources that can be tapped as needed to provide the optimum in economy and performance.  That pool is a hard thing to create in practice, and we’re just starting to see how many dimensions there are to the process.

A resource pool, in this case a pool of servers, has to decompose into servers, so we can infer some of the requirements for resource-pool abstraction from the server side.  An abstraction of a server has to look like a real server, at least to the point that it has to run applications as a real server would.  Making that happen involves three interrelated things.  One is having a model of the abstract element itself, which we could call a “virtual machine” or a “container”.  The second is having a way of getting the application deployed into the abstraction, which we’d call “orchestration”, and the final one is getting the abstraction mapped to the real resource pool.  This last can be a part of orchestration, or something outside it, including so-called “infrastructure as code” (IaC).
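Those three pieces can be sketched in code.  This is a minimal, hypothetical model (all class and function names are my own, not any product’s API): a `VirtualMachine` is the abstract element, `map_to_resource` is the mapping to the real pool, and `orchestrate` is the deployment step that ties the two together.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachine:
    """Piece 1: the model of the abstract element itself."""
    name: str
    vcpus: int
    memory_gb: int

@dataclass
class Server:
    """A real server in the resource pool."""
    hostname: str
    cpus: int
    memory_gb: int
    allocated: list = field(default_factory=list)

def map_to_resource(vm: VirtualMachine, pool: list) -> Server:
    """Piece 3: map the abstraction onto a real server with spare capacity."""
    for server in pool:
        used_cpus = sum(v.vcpus for v in server.allocated)
        used_mem = sum(v.memory_gb for v in server.allocated)
        if server.cpus - used_cpus >= vm.vcpus and \
           server.memory_gb - used_mem >= vm.memory_gb:
            return server
    raise RuntimeError("no capacity left in the pool")

def orchestrate(vm: VirtualMachine, app: str, pool: list) -> str:
    """Piece 2: deploy the application into the abstraction, then place it."""
    server = map_to_resource(vm, pool)
    server.allocated.append(vm)
    return f"{app} running in {vm.name} on {server.hostname}"
```

The point of the sketch is the separation: orchestration talks to the abstraction, and only the mapping step knows what a real server looks like.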

Public and private cloud services all accept this approach and provide tools to handle the three interrelated things needed to make virtualized resource pools work.  However, the dimensions I opened with can make implementing the three necessary abstraction facets much more difficult.  The complications arise from a fundamental requirement for resource pools: resource equivalence.

A virtual machine that has to be handled differently depending on what “real” machine it’s mapped to is a heck of a lot less useful.  Differences in the quality of resources, meaning how they’ll perform for the task of hosting the abstraction, can mean that some applications may have to deploy not in the general resource pool, but in a kind of sub-pool.  As you start subdividing resource pools to accommodate specialized application needs, you start to lose the efficiency of resource utilization that pools bring.  If one server in the cloud is all you can deploy your app on, you don’t have a cloud any longer, just a server.
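The efficiency loss from subdividing a pool is easy to demonstrate with a toy placement model.  This is an illustrative sketch, not a real scheduler: capacities and workload sizes are abstract units, and `can_place` is a simple greedy first-fit of my own devising.

```python
def can_place(workloads: list, pools: list) -> int:
    """Greedy first-fit: each pool is a capacity in abstract units.
    Returns how many workloads could be placed."""
    remaining = list(pools)
    placed = 0
    for w in sorted(workloads, reverse=True):   # largest first
        for i, cap in enumerate(remaining):
            if cap >= w:
                remaining[i] -= w
                placed += 1
                break
    return placed

# One shared 12-unit pool places all three workloads...
shared = can_place([5, 4, 3], [12])     # -> 3
# ...but the same total capacity, split into specialized 4-unit
# sub-pools, strands capacity and fails the largest workload.
split = can_place([5, 4, 3], [4, 4, 4]) # -> 2
```

Same aggregate capacity, fewer workloads placed: that stranded capacity is exactly the pool efficiency you give up when specialization forces sub-pools.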

Resource equivalence spills into operations too.  If orchestration has to “know” about differences in deployment requirements, it makes orchestration more complicated.  It might be necessary to describe not one orchestration sequence, but one for each of the possible resource types.  Many of the older DevOps and scripting tools had this problem; change just a little in how something was hosted and the deployment broke.

A number of approaches have been taken to hide a lack of complete resource equivalence from orchestration, so as to make practices more portable, but they fall into two main classes.  One is multi-level orchestration and the other is intermediary abstraction.

The NFV ISG’s Virtual Infrastructure Manager is an example of a kind of multi-level orchestration.  NFV’s primary Management and Orchestration (MANO) deals with a VIM, and the VIM then “orchestrates below” that level to accommodate differences in the resource pool.  The implementation of the VIM concept is a bit hazy now, and not all the options are workable.

A “single-VIM” model says that there is only one VIM, and it has to then handle all the differences in the resource pool.  That not only means dealing with differences in the cloud/container stack or hardware configuration of each real server, it also means dealing with the difference between the physical-network-function (PNF) version of a device like a firewall, and its virtual-network-function (VNF) version.  This packs a lot of diverse logic into a single software element, and it means that if there are licensing issues associated with the APIs needed to control a particular resource, there may be difficulties getting a tool that can actually manage everything.

The “multi-VIM” model resolves that problem, but by doing so creates another.  In this model, a given kind of infrastructure has its own VIM, perhaps one for each vendor or each class of device.  That means anyone who wants to sell infrastructure to an NFV-equipped operator would have to provide the VIM needed to manage it.  However, it also means that something “above” the VIM has to decode the service model to the point where the correct, specific, VIM is invoked when appropriate.  That might mean “nested VIMs” where one super-VIM did the initial screening and then ran the right “sub-VIM”.
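The nested-VIM idea can be sketched as a simple dispatch layer.  This is purely a hypothetical illustration of the structure (the class names and the `kind` field are mine, not anything from the NFV ISG specs): the super-VIM screens the descriptor and hands it to the right sub-VIM.

```python
class Vim:
    """Hypothetical per-infrastructure VIM interface."""
    def deploy(self, descriptor: dict) -> str:
        raise NotImplementedError

class VnfVim(Vim):
    """Sub-VIM for virtual network functions on the cloud pool."""
    def deploy(self, descriptor: dict) -> str:
        return f"spun up VNF {descriptor['name']} on the cloud pool"

class PnfVim(Vim):
    """Sub-VIM for physical devices, e.g. a real firewall appliance."""
    def deploy(self, descriptor: dict) -> str:
        return f"configured physical device for {descriptor['name']}"

class SuperVim(Vim):
    """The 'super-VIM': screens the service model, runs the right sub-VIM."""
    def __init__(self, sub_vims: dict):
        self.sub_vims = sub_vims
    def deploy(self, descriptor: dict) -> str:
        kind = descriptor["kind"]   # e.g. "vnf" or "pnf"
        if kind not in self.sub_vims:
            raise KeyError(f"no VIM registered for {kind!r}")
        return self.sub_vims[kind].deploy(descriptor)
```

A vendor selling infrastructure would register its own sub-VIM; the layer above never changes, which is the whole point of the model.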

Another mechanism for dealing with the problem is to employ an intermediary abstraction.  OpenStack does this by defining, for Neutron’s networking support, a “plugin” interface that vendors can write to in order to bind their gear to the rest of Neutron.  In Kubernetes, the networking feature itself is a kind of plugin; as long as the network implementation conforms to the Kubernetes integration interface, Kubernetes remains network-independent.
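The plugin pattern itself is simple to show.  This sketch is in the spirit of a Neutron plugin but is not Neutron’s actual API; the interface and vendor classes are hypothetical.

```python
class NetworkPlugin:
    """Hypothetical vendor-neutral plugin interface."""
    def create_network(self, name: str) -> str:
        raise NotImplementedError

class VendorAPlugin(NetworkPlugin):
    def create_network(self, name: str) -> str:
        return f"vendor-A overlay network '{name}'"

class VendorBPlugin(NetworkPlugin):
    def create_network(self, name: str) -> str:
        return f"vendor-B VLAN '{name}'"

def provision_service(plugin: NetworkPlugin, name: str) -> str:
    # Orchestration sees only the abstract interface, so swapping
    # vendors never changes this code path.
    return plugin.create_network(name)
```

The intermediary abstraction is the `NetworkPlugin` interface: everything above it is portable, and everything vendor-specific lives below it.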

A more sophisticated and complete solution to the problem is to formalize this intermediary abstraction into that third, map-to-infrastructure, piece of the puzzle.  Apache Mesos does that by creating a functional layer running on each machine and exposing APIs that let any kind of deployment orchestration map to the common resource abstraction Mesos creates.

You might wonder how this maps to the notion of “intent-based” data center tools, and the answer is that it’s complicated.  I blogged recently about the overall problem of using the term “intent”, and one area where the problem is especially acute is in data center resource abstraction.  An article quoting the new head of marketing for Apstra, a company that has championed “intent-based” networking, says it “allows a network operator to state a business intent for the network and then use software to automatically implement this intent and take corrective actions if needed.”  That’s really more a definition of policy-based networking, which Cisco has always championed.

Real intent modeling could fit into the virtualization-and-resource-pool problem, at several levels in fact.  As I noted above, all of virtualization involves abstraction, and it is perfectly possible to expand the notion of any abstraction to include a definition of all its interfaces and lifecycle stages.  Any interior implementation of the abstraction would then be responsible for meeting those external properties, just like any intent-modeled implementation.
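A minimal sketch of what that would look like, with names and lifecycle stages entirely of my own invention: the model exposes only external properties (an SLA and a lifecycle state), and any interior implementation is judged solely on whether it meets them.

```python
class IntentModel:
    """Hypothetical intent-modeled element: only external
    properties are visible; the interior is opaque."""
    LIFECYCLE = ("ordered", "deploying", "active", "failed", "retired")

    def __init__(self, sla_latency_ms: float):
        self.sla_latency_ms = sla_latency_ms   # the stated "intent"
        self.state = "ordered"

    def deploy(self):
        # The interior could be a VM, a container, or even a physical
        # box; whatever sits inside must satisfy the external SLA.
        self.state = "active"

    def report(self, measured_latency_ms: float) -> str:
        """Lifecycle transitions are driven by SLA conformance alone."""
        self.state = ("active" if measured_latency_ms <= self.sla_latency_ms
                      else "failed")
        return self.state
```

Two different implementations behind the same `IntentModel` would be interchangeable to everything above, which is exactly the resource-equivalence property the pool needs.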

The missing piece in this, the piece that could connect “intent-based” stuff with virtualization and orchestration, is some solid reference abstractions.  We know what the abstraction for a server is—a VM or container.  What’s the abstraction that represents a useful element of a resource pool?  In most orchestration tools, we’d call a cohesive set of resources that operates in harmony a “cluster”.  Thus, it’s possible that intent-based data centers could define clusters as abstractions, and provide a way of mapping them to cluster-based orchestration (Kubernetes) or cluster-infrastructure mappings like Mesos.

This, I think, would be a very useful exercise, something that bodies like ETSI should have taken a look at.  A uniform intent-modeled cluster management abstraction would be a huge benefit in any sort of cloud, and it could be critically important in carrier cloud.  It would open the very important conversation about how to measure “resource equivalence” when we have differences in geography, connectivity, and regulations to deal with.  As I’ve pointed out in my ExperiaSphere stuff, it would also be the optimum way of connecting services that depend on specific resource behaviors with the places where that behavior can be provided.

The modeling of resources is just another of those areas where we’re not thinking deeply enough about the technology requirements of the concepts we’re proposing.  Without more foresight, it’s difficult to ensure that we take the optimum path toward implementing some of our most critical technology shifts, and without optimality the long-term future of those technologies is threatened.