What’s the Best Way to Abstract Infrastructure?

In a blog last week, I criticized the NFV concept of the VIM.  A “virtual infrastructure manager”, I noted, should be an abstraction of the infrastructure, not a gateway to a presumed implementation.  The fact that some operators are attempting to create a set of standard “NFVi” or NFV Infrastructure models is proof of the problem; a good abstraction would render that unnecessary.  But what is a good abstraction?  That’s actually a complicated question.

If we look at the cloud market, where obviously the most work and the most mature work is being done, we could argue that a good infrastructure abstraction would be, in effect, a pair of synchronized intent models that represent the same underlying resource options.  One of the models would represent the deployment and lifecycle management of the applications or services running on the infrastructure, and this would be something like a DevOps or container orchestration tool, augmented to understand and handle lifecycle events.  The other would represent the “infrastructure-as-code” stuff, the facilities needed to commission resources into the pool that the first intent model then exploits.

We’ll look at the service side first.  All intent models, of course, should first and foremost represent intentions reasonably, which means that ideally the model should expose APIs that do the things a user wants done, at a high level and without any specificity with respect to the implementation options below.  For example, you don’t want to have to know the mechanics of deploying an application component or virtual network function; you want to say (in effect) “Deploy THIS with THESE constraints”, and the model magic does the rest.
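A minimal sketch of what such an API surface might look like follows; every name here (DeploymentIntent, InfrastructureIntentModel, the constraint keys) is hypothetical and not drawn from any standard.  The point is only that the caller states what it wants and under what constraints, never how the deployment is carried out.

```python
# Hypothetical "deploy THIS with THESE constraints" API sketch.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DeploymentIntent:
    artifact: str                                  # what to deploy (image, VNF package, etc.)
    constraints: Dict[str, str] = field(default_factory=dict)


class InfrastructureIntentModel:
    """The abstraction the service layer talks to; the implementation stays hidden."""

    def deploy(self, intent: DeploymentIntent) -> str:
        """Deploy the artifact under the stated constraints; return an opaque handle."""
        # A real model would decompose the intent onto whatever is below
        # (containers, VMs, uCPE...); this stub just acknowledges it.
        print(f"deploying {intent.artifact} under {intent.constraints}")
        return "deployment-0001"


vim = InfrastructureIntentModel()
vim.deploy(DeploymentIntent(
    artifact="vnf/firewall:2.3",
    constraints={"latency-class": "edge", "redundancy": "active-standby"}))
```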

The intent models would also have to expose a mechanism for dealing with lifecycle events.  In my view, the actual state/event processing of a service would be done at the service level, “above” the infrastructure abstractions like the NFV VIM.  The infrastructure intent model needs to be able to post events to the higher service-layer stuff, so driving the service lifecycle process.  Those events might then be actioned (above) by invoking other API commands on the infrastructure intent model.
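To make that division of labor concrete, here’s a small sketch (all names hypothetical) of the event path: the infrastructure model only posts events, tagged with the service they belong to, and the service layer decides what to do, possibly calling back down through the intent-model API.

```python
# Sketch of the upward event path from infrastructure to service layer.
from dataclasses import dataclass
from typing import Callable


@dataclass
class InfraEvent:
    service_id: str        # the service the event is associated with
    event_type: str        # e.g. "Error"
    detail: str = ""


class InfrastructureModel:
    """Posts events upward; never runs the service lifecycle itself."""

    def __init__(self, post_event: Callable[[InfraEvent], None]):
        self._post_event = post_event          # service-layer event sink

    def report_host_failure(self, service_id: str, host: str) -> None:
        self._post_event(InfraEvent(service_id, "Error", f"host {host} failed"))


def service_layer_sink(event: InfraEvent) -> None:
    # The service layer's state/event logic runs here; it might respond by
    # invoking another command on the infrastructure intent model.
    print(f"service {event.service_id}: {event.event_type} ({event.detail})")


infra = InfrastructureModel(post_event=service_layer_sink)
infra.report_host_failure("vpn-042", "node-7")
```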

An event would have to be associated with a service if it’s destined for the service layer, and there would have to be a clear set of event semantics established.  In my first ExperiaSphere project, the Java code included a definition of event types and the associated information models, and I’m sure there are plenty of references.  States need to be coded similarly, though they’re used within the service layer and not in the infrastructure abstraction model.  Obviously, the intentions also have to be coded, and the constraints.  That might sound daunting, but remember that the goal is to keep the relationships generalized.

Let me offer an example.  A simple service might have the following states: “Orderable”, “Setting Up”, “Active”, “Fault”, “Restoring”, and “Decommissioning”.  We might then have events called “Activate”, “Error”, and “Remove”.  If a service model defines a service element like “VPN”, then an “Activate” in the “Orderable” state could signal the higher service layer to invoke the infrastructure intent model on the specific components of “VPN”.  An “Error” event reported by one of those components would then result in a transition to the “Fault” state, in which the failed component might be removed, transitioning to the “Restoring” state when the new component was “Activated”, and then to the “Active” state again.
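Encoded as a simple transition table, the example might look like the sketch below.  The state and event names are the ones from the paragraph above; the rows I’ve filled in around “Setting Up” are my reading of the example rather than anything definitive.

```python
# State/event transition table for the example service; the "Setting Up"
# rows are assumptions made to complete the picture.
TRANSITIONS = {
    # (current state, event)      -> next state
    ("Orderable", "Activate"):       "Setting Up",
    ("Setting Up", "Activate"):      "Active",          # components report in
    ("Active", "Error"):             "Fault",
    ("Fault", "Activate"):           "Restoring",       # replacement component activated
    ("Restoring", "Activate"):       "Active",
    ("Active", "Remove"):            "Decommissioning",
}


def next_state(state: str, event: str) -> str:
    # Unrecognized combinations leave the state unchanged in this sketch.
    return TRANSITIONS.get((state, event), state)


assert next_state("Orderable", "Activate") == "Setting Up"
assert next_state("Active", "Error") == "Fault"
```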

One thing this example shows is that each service element (“VPN” in my example) would represent a “service intent”.  The service model for it would decompose it into specific infrastructure “intents”, which would then be passed to the infrastructure intent model.  There is no need to reflect the “how” in the service model; that’s a function of lower-layer decomposition.  Similarly, the infrastructure intent relates not to the service overall, but to what’s expected from the infrastructure itself.  Commit an MPLS VPN via the management API, spin up a bunch of virtual routers and tunnels, or whatever.
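A sketch of that decomposition, with purely illustrative names: the service model maps the “VPN” element to a list of infrastructure intents, and only the layer below decides whether those become an MPLS commit or a set of virtual routers.

```python
# Hypothetical decomposition of a service intent into infrastructure intents.
SERVICE_MODELS = {
    "VPN": [
        {"intent": "connect-sites", "constraints": {"sla": "gold"}},
        {"intent": "host-function", "artifact": "virtual-router", "count": 2},
    ],
}


def decompose(service_element: str):
    """Return the infrastructure intents a service element maps to."""
    return SERVICE_MODELS[service_element]


for infra_intent in decompose("VPN"):
    # Each intent is handed to the infrastructure intent model; whether it is
    # realized by committing an MPLS VPN via a management API or by spinning
    # up virtual routers and tunnels is invisible at this level.
    print(infra_intent)
```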

The cloud is evolving to a kind of unified model of virtualized infrastructure from the top, creating a set of tools that treat both applications and resources in the abstract (“pods” and “nodes” in Kubernetes terms).  This abstraction, because it’s linked to the application and through to the user, has the power of the business case behind it.  It is already defining the characteristics/behaviors of infrastructure that applications will “bind” to in order to run.  It then falls to what’s below to fit into those characteristics and behaviors, in order to host that which is paying the bills.

Infrastructure as code and similar resource-side initiatives are, in a sense, dinosaurs trying to fit into a mammalian ecosystem.  You had a dominant role when the world was young, but in this new age, you’re going to have to find some empty space to fill.  Put more technically than in terms of evolutionary biology, that means that as the abstraction layer between applications/services and infrastructure is assumed from above, it’s defined there.  What happens below is first constrained and then commoditized.

What has been happening below is the use of hardware-side abstraction to “mechanically adapt” infrastructure to suit the requirements of applications/services.  Configure infrastructure as needed, in short.  That is consistent with what I’ll call the “DevOps” view of orchestration/deployment, which is that there is no standardization of either that which is above or that which is below, and thus a need to meet them in the middle.  Containers, Kubernetes, and the related ecosystem have eroded that assumption, creating instead a de facto abstract model of infrastructure.

What that does to the hardware-side process is change the emphasis.  There is no vast set of possible application/service needs up there, waiting to land.  There is a standard model instead, which means that instead of a dynamic and continuous manipulation of hardware to fit application needs, we have more of a commissioning or registration of hardware in accordance with application needs.  The model is set; deal with it.
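In Kubernetes terms, that commissioning step might be nothing more than registering a host’s capabilities as node labels when it joins the pool.  The sketch below assumes the official kubernetes Python client and an accessible kubeconfig; the label keys and node name are illustrative only.

```python
# Sketch: "commission" a host by registering its capabilities as node labels,
# so the standard scheduling model above can use them.
from kubernetes import client, config

config.load_kube_config()                 # or load_incluster_config() inside a pod
core = client.CoreV1Api()

capability_labels = {
    "metadata": {
        "labels": {
            "example.com/nic": "sriov",
            "example.com/storage": "nvme",
            "example.com/zone": "edge-1",
        }
    }
}

# patch_node merges these labels into the existing node object.
core.patch_node("worker-node-01", capability_labels)
```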

Of course, that’s not totally true yet.  We have features in the container toolkit that can be used to steer pods to or away from nodes, but we don’t have a “language” that describes either the needs of an application or the capabilities of the infrastructure.  There’s still a lot of cobbling needed, which makes the process of mapping services or applications to infrastructure more complicated.  That’s one reason why the NFV ISG went a bit off on its own with respect to deployments; they wanted a model that reflected deployment constraints beyond minimal technical ones.  However, a general semantic for this task is what’s needed, not something mission-specific; that leaves too much room for competing definitions that hamper interoperability.
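The “steering” features I mean here are things like node affinity (and taints/tolerations for steering away).  A pod-manifest sketch, expressed as a plain Python dict mirroring the YAML and using the illustrative label key from the commissioning example above, shows how an application asks for a capability without naming a host.

```python
# Pod spec sketch using nodeAffinity to steer the pod onto nodes that
# advertised a capability; label key and image name are illustrative.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "vnf-forwarder"},
    "spec": {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "example.com/nic",
                            "operator": "In",
                            "values": ["sriov"],
                        }],
                    }],
                },
            },
        },
        "containers": [{
            "name": "forwarder",
            "image": "registry.example.com/vnf/forwarder:1.0",
        }],
    },
}
```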

What all this means is that it would be possible for the hardware-side players to come up with something useful, something that could secure them a place at the table on this future-of-hardware-abstraction topic.  If there is such a thing as an abstract, virtual host, then it can be defined on the hosting side as easily as from the application direction.  I think that may be what the Common NFVi Telco Task Force (CNTT) has been trying to do, but instead of generalizing an abstraction for multiple classes of infrastructure, they’re proposing to define those classes explicitly, then force applications to fit into them.  That might work with NFV (though, frankly, I doubt it) but it’s not going to work with cloud applications in general.

One reason for that is that, in its own way, the basic truth of containerism (so to speak) is working to solve the problem, or at least minimize it.  Containers have a subtle but powerful homogenizing effect on applications.  They conform to a specific model for networking and they imply a strategy for componentization.  Kubernetes, the de facto champion of container orchestration, has specific mechanisms to match pods to nodes, meaning containers to hosts.  It’s a casual back-ending into establishing an abstract hosting model.

I think the growth of the Kubernetes ecosystem, particularly the addition of things like service mesh, continues this subtle homogenizing effect.  The more tools are assumed to be in place in an ecosystem, the more applications can (and will) rely on them.  That shifts the emphasis of development from a fairly casual and disorderly set of options to a shrinking, even static, number.  Software knows what it runs on, or with, by the tools or middleware it has available.  That’s starting to coalesce around that Kubernetes ecosystem I’ve talked about.

It may be that the biggest challenge for the various cloud paths toward an abstract application/infrastructure binding approach is the lack of specific support for monolithic applications.  As I’ve noted many times, cloud-native is critical as a means of designing applications that exploit the cloud’s unique features.  Since we’ve not had such a model up to now, we’ve not done much exploiting.  Instead, the applications we have, that we depend on, come from the evolving transactional model, which is highly monolithic because it’s highly database-centric.  Interestingly, the NFV ISG’s virtual functions are monolithic.

The biggest barrier to bottom-up abstraction definition success is getting stuck in a device model.  The bottom, after all, is made up of devices, and so it’s difficult not to end up defining nothing more than a collective view of what’s really there.  The problem with that is that the abstraction loses two things: unity of interfacing (you still have separation by devices or classes), and completeness (you assume device properties rather than breaking the features out explicitly).  Since applications today already consume abstractions (via device drivers), this can be a fatal flaw, one the infrastructure guys will have to work hard to overcome.

A cloud hosting model has to host everything, not only what’s in the cloud, because some of what isn’t in the cloud today will be there eventually, and all of it will have to link optimally with a cloud piece.  The resource-side players in the infrastructure-as-code and related games should think now about how they’ll present a true hybrid model, because I suspect their cloud-centric opponents are doing exactly that.  I’ve been reading more material on how cloud-native isn’t everything, and it’s coming from the cloud community.  If the cloud guys embrace the inevitability of monolithic applications in some way, the infrastructure side can only hope to play a game whose rules have been set elsewhere.