Hardware Abstraction, Software Portability, and Cloud Efficiency

One of the factors that limits software portability is custom hardware.  While most servers are based on standard CPU chips, the move toward GPU and FPGA acceleration in servers, and toward custom silicon in various forms for white-box switching, means that custom chip diversity is already limiting software portability.  The solution, at least in some eyes, is an intermediary API that standardizes the interface between software and specialized silicon.  There are two popular examples today: the P4 flow programming standard and Intel’s oneAPI.

The problem of hardware specialization has been around a long time, and in fact if you’re a PC user, you can bet that you are already benefitting from the notion of a standard, intermediary API.  Graphics chips used in display adapters have very different technologies, and if these differences percolated up to the level of gaming and video playback, you could almost bet that there’d be far less variety in any application space that involved video.

In the PC world, we name this intermediation process after the piece of technology that creates it: “drivers”.  There are PC drivers for just about every possible kind of device, from disk storage to multimedia and audio.  They share a common general approach to the problem of “intermediarization”, which is to adapt a hardware interface to a “standard” API that software can then reference.  That’s the same approach that both P4 and oneAPI take.
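The driver pattern is easy to see in code.  Here’s a minimal Python sketch of the idea; the class names, the opcode, and the “vendor command” are hypothetical stand-ins, not any real driver API.

```python
from abc import ABC, abstractmethod

class BlockDevice(ABC):
    """The 'standard' API that portable software codes against."""
    @abstractmethod
    def read_block(self, lba: int) -> bytes: ...

class VendorADriver(BlockDevice):
    """Adapter: translates the standard call into vendor A's command set."""
    def read_block(self, lba: int) -> bytes:
        # 0x28 is a hypothetical vendor opcode, not a real one
        return self._send_vendor_command(0x28, lba)

    def _send_vendor_command(self, opcode: int, lba: int) -> bytes:
        # Stand-in for the real hardware I/O the driver would perform
        return bytes([opcode, lba])

def application(dev: BlockDevice) -> bytes:
    # Portable: works unchanged with any driver implementing BlockDevice
    return dev.read_block(7)
```

The application never sees vendor A’s command set; swap in a `VendorBDriver` and nothing above the API changes.  That is the whole intermediation bargain in miniature.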

The upside of this is obvious; without intermediary adaptation, software would have to be released in a bewildering number of versions to accommodate differences in configuration, which would likely end software portability as we know it.  Intermediary adaptation also encourages open-model networking by making lock-in more difficult and making an open-source version of something as readily usable as a proprietary product with a zillion marketing dollars behind it.

There’s a downside too; several, in fact.  One is efficiency.  Trying to stuff many different approaches into a single API is a bit like trying to define a single edge device that supports everything, the often-derided “god-box” of the past.  Jack of all trades, master of none, says the old saw, and it’s true often enough to be an issue.  Another is innovation; it’s easy for an intermediary API framework to define a limited vision of functionality that can’t then be expanded without losing the compatibility that the API was intended to create.  A third is competing standards, where multiple vendors with different views of how the intermediation should evolve will present different “standards”, diluting early efforts to promote portability.  We still have multiple graphics API standards, like OpenGL and DirectX.

P4, the first of the two intermediation specifications I’ll look at here, is a poster child for a lot of the positives and negatives.  P4 is a flow-programming language, meaning that it defines not only an intermediating layer between chips and software, but also a language in which to express chip-level commands.  Since both “routing” (Layer 3) and “switching” (Layer 2) packet handling are protocol-specific forwarding techniques, P4 can make it possible to define forwarding rules in a way that can be adapted to all forwarding, and (potentially) to all chips, or even no chips at all.

The name comes from the four P’s in “Programming Protocol-Independent Packet Processors”, the title of the paper that first described the concept in 2014.  The first commercial P4 was arguably from Barefoot Networks, since acquired by Intel, and Intel is arguably the major commercial force behind P4 today.  However, it’s an open specification that any chip vendor could adopt and any developer could work with.

A P4 driver converts the P4 language to chip-specific commands, in much the same way that an OpenGL driver converts graphics commands to GPU-specific ones.  Those in the software space will recognize the similarity between P4 and something like Java or (way back) Pascal.  In effect, P4 creates a “flow virtual machine” and the language to program it.
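The heart of that “flow virtual machine” is the match-action table: look a packet header up in a table, apply the action the table entry names.  Here’s a toy Python model of the behavior; real P4 is its own language and compiles such tables to chip-specific programs, so treat this purely as an illustration.

```python
# Toy model of P4's core abstraction: the match-action table.
class MatchActionTable:
    def __init__(self, default_action):
        self.entries = {}             # match key -> action
        self.default = default_action # applied on a table miss

    def add_entry(self, key, action):
        self.entries[key] = action

    def apply(self, packet):
        action = self.entries.get(packet["dst"], self.default)
        return action(packet)

def forward(port):
    # Action: send the packet out a given egress port
    return lambda pkt: {**pkt, "egress_port": port}

def drop(pkt):
    # Action: discard the packet
    return {**pkt, "egress_port": None}

# Protocol independence: the same table mechanism works whether "dst"
# holds a MAC address (Layer 2 switching) or an IP key (Layer 3 routing).
table = MatchActionTable(default_action=drop)
table.add_entry("10.0.0.1", forward(3))
```

The table mechanism is the same no matter what protocol fills the match key, which is exactly the sense in which P4 forwarding is “protocol-independent”.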

The ONF has embraced P4 as part of its Stratum model of open and SDN networking.  Many vendors have also embraced P4, but the arguable market leader in the flow-chip space, Broadcom, started its own P4-like concept with its Network Programming Language, or NPL.  There’s an open consortium behind NPL just as there is with P4.

P4 and NPL aren’t compatible, but that may not be the big question in the flow-switch space.  A flow switch is part of a network, meaning that there’s a network architecture that aligns flow-switch behavior collectively to suit a service.  A good example is SDN versus adaptive IP routing.  You could build a P4 or NPL application for either of these, and the result would be portable across switches with the appropriate language support.  However, an SDN flow switch isn’t workable in an adaptive IP network; the behaviors at the network level don’t align.  It’s like metric versus English wrenches and nuts.

Intel’s oneAPI, like P4, emerges as a response to the need to support innovation in optimum hardware design while preserving software compatibility.  The specific problem here is specialized processors like GPUs, which have growing application in areas like AI, computation-intensive image processing, and other missions.  As already noted, different graphics chips have different interfaces, which means that software designed for one won’t work on another.

This problem is particularly acute in cloud computing, because a resource pool that consists in part of specialized processors is likely to evolve rather than being fork-lifted in.  There may be multiple processor types involved, some of them more powerful successors to earlier chips, others empowered by some new specialized mission.  The result is a mixture of processors, which means that the resource pool is fragmented, and getting applications to the hosts that have the right chip combination is more difficult.
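The fragmentation problem is essentially a scheduling constraint.  A hedged sketch, with hypothetical host and device names, shows how accelerator requirements shrink the usable slice of the pool:

```python
# Hypothetical pool: each host advertises the processor types it offers.
hosts = {
    "host-a": {"cpu"},
    "host-b": {"cpu", "gpu"},
    "host-c": {"cpu", "fpga"},
}

def eligible_hosts(required: set, pool: dict) -> list:
    # A workload can only land on hosts whose devices cover its needs;
    # the more specialized the requirement, the smaller the eligible set.
    return sorted(h for h, devices in pool.items() if required <= devices)
```

A CPU-only workload can go anywhere in this pool, but one that needs a GPU is confined to a single host; every new chip type carves the pool into thinner slices.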

The oneAPI framework supports GPUs, CPUs, FPGAs, and in theory, any accelerator/processor technology.  Intel calls this their XPU vision, and it includes both a library/API set designed to allow XPU programming in any language, and a new language for parallel processing, Data Parallel C++ or DPC++.  Like P4, oneAPI is expected to gain support from a variety of XPU vendors, but just as Broadcom decided to ride its own horse in the P4 space, AMD, NVIDIA, and others may do the same with oneAPI.  Intel has some university support for creating the abstraction layer needed for other XPUs, though, and it seems likely that there will be ports of oneAPI for the other vendors’ chips.  It’s not yet clear whether any of these other vendors will try to promote their own approach, though.

The presumption of the oneAPI model is that there is a “host” general-purpose computer chip and a series of parallel XPUs that work to augment the host’s capabilities.  The term used for these related chips in oneAPI documentation is “devices”.  An XPU server thus has both traditional server capability and XPU/oneAPI capability.  The host chip is expected to act as a kind of window on the wider world, organizing the “devices” in its support and blending them into the global concept of a cloud application.
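The host/device relationship can be sketched in a few lines.  Real oneAPI code would use DPC++ (SYCL) queues in C++; this Python model only mirrors the shape of the idea, with hypothetical class names throughout.

```python
# Toy model of the oneAPI presumption: one general-purpose host
# orchestrating work across heterogeneous "devices".
class Device:
    def __init__(self, kind):
        self.kind = kind  # e.g. "gpu", "fpga"

    def run(self, kernel, data):
        # Stand-in for offloaded parallel execution on the device
        return [kernel(x) for x in data]

class Host:
    def __init__(self, devices):
        self.devices = devices

    def offload(self, kernel, data, prefer="gpu"):
        # Pick a preferred device if one is present in the server...
        dev = next((d for d in self.devices if d.kind == prefer), None)
        if dev is None:
            # ...else the host CPU executes the kernel itself
            return [kernel(x) for x in data]
        return dev.run(kernel, data)

host = Host([Device("gpu"), Device("fpga")])
```

The point of the abstraction is in that fallback path: the same kernel runs whether the server happens to hold a GPU, an FPGA, or nothing but the host chip, which is what lets software survive in a mixed pool.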

I’m a supporter of both these concepts, not only for what they are but for what they might lead to.  For example, could we see Intel integrate P4 into oneAPI, creating a model for a white-box switch that includes both its “main processor” and its switching fabric?  Or, best of all, could we be heading for a vision of a “cloud hosting unit” or CHU, the abstraction of a true cloud platform-as-a-service, offering perhaps its own “Cloud Parallel C++” language and its own APIs?

Cloud-native development depends on a lot of tools, most of which have multiple open-source implementations.  Public cloud providers have their own slant on almost everything cloud-native too, and all this variability creates a challenge for development, forcing almost all cloud-native work to fit a specific hardware/software model that locks the user in.  Since lock-in is the number one fear of cloud users, that’s a major problem.

I’ve said for years that we should be thinking of the cloud as a unified virtual computer, but the challenge with that step is that there’s no universal language or library to program that unified virtual computer with.  Could someone develop that, and so set the cloud on its own pedestal at the top of all these parallel/virtualized-hardware abstractions?  Could that be the dawn of the real age of cloud computing?  I think it could be, and I think that we’re seeing the early steps toward that future in things like P4 and oneAPI.