Can the White-Box Server Players Find the Right Platform Software?

I blogged last week about the white-box revolution and “platform compatibility” for white-box servers.  Since that blog drew a number of interesting emails from operators (and vendors), I want to use this one to expand on the whole issue of platforms for white boxes, and on what “compatibility”, openness, and other platform attributes really mean to buyers and to the market.

Hosted features and functions, including everything that would come out of NFV, OTT-like services, and other carrier cloud missions, are all “applications” or “software components”, which I’ll call “components” in this blog.  They run on something, and the “something” they run on is a combination of hardware and platform software, meaning an operating system and middleware tools.

Components match platform software first and foremost, because they access the APIs that software exposes.  Components can also be hardware-specific if details of the CPU instruction set are built into them when they’re compiled.  However, hardware-specificity can usually be resolved by recompiling to avoid those details.  Platform software dependencies are much harder to resolve, so you have to be pretty careful in framing your platform software choices.
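To make the hardware side of that concrete, here’s a minimal sketch (in Python, with a purely illustrative set of required features) of the kind of check you’d want before assuming a component is portable: does the host CPU actually expose the instruction-set extensions the component was compiled to use?  If it doesn’t, recompilation for a more generic target is the fix.

```python
# Minimal sketch: verify that a host CPU exposes the instruction-set
# extensions a component was compiled to use.  The "required" sets below
# are purely illustrative; real components would carry this metadata in
# their packaging.

def host_cpu_flags(path="/proc/cpuinfo"):
    """Collect the CPU feature flags Linux reports for this host."""
    flags = set()
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    return flags

def component_is_portable(required_isa_features, flags=None):
    """True if every ISA feature the component was built for is present."""
    flags = host_cpu_flags() if flags is None else flags
    return set(required_isa_features) <= flags

# Example: a component compiled for AVX-512 won't run on an older node
# and would need recompilation for a generic target instead.
print(component_is_portable({"sse4_2", "avx2"}))
print(component_is_portable({"avx512f"}))
```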

On the operating system side, Linux is the de facto standard.  There are specialized operating systems (usually called “embedded operating systems”) designed for devices with a limited software mission, but even these often use Linux (actually, POSIX-standard) APIs.  The big difference across possible platforms, the difference most likely to impact components, is the middleware.

Middleware tends to divide into three broad categories.  The first is “utility middleware”, representing services that are likely to be used by any component; this would include some network features and advanced database features.  Category two is “special-function middleware”, designed to provide a standard implementation for things like event-handling, GUI support, message queuing, and so forth.  This stuff isn’t always used, but there are benefits (as we’ll see) to having a single approach to these functions when they are needed.  The final category is “management middleware”, which supports the deployment and lifecycle management of the components and the service/application ecosystem to which they belong.
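As a rough illustration of what a platform definition might capture, here’s a sketch of a profile organized around those three categories.  The specific packages named are examples only, not recommendations; the point is that every slot has several plausible occupants.

```python
# Minimal sketch of a platform "profile": the three middleware categories
# described above, with illustrative (not prescriptive) package choices.
platform_profile = {
    "os": {"distro": "linux", "posix_level": "POSIX.1-2017"},
    "utility_middleware": {
        "networking": "dpdk",        # example choices only; any of a
        "database": "postgresql",    # half-dozen alternatives would fit
    },
    "special_function_middleware": {
        "event_handling": "kafka",
        "message_queuing": "rabbitmq",
        "gui": "none",
    },
    "management_middleware": {
        "deployment": "openstack",
        "lifecycle": "kubernetes",
    },
}
```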

Today, almost every possible piece of middleware has at least a half-dozen competing implementations, both open-source and proprietary.  Thus, there are literally thousands of combinations of middleware that could be selected to be a part of a given platform for cloud or carrier cloud hosting.  If one development team picks Combination A and another picks Combination E, the differences will likely mean that some components won’t be portable between the two environments, and that programming and operations practices won’t be compatible.

The purpose of “platform compatibility” as I’ve used the term is to define a platform that works across the full range of services and applications to be deployed.  That means anything runs everywhere and is managed in the same way.  As long as the underlying hardware runs the platform, and as long as components don’t directly access hardware/CPU features not universally available in the server farm, you have a universal framework.

This isn’t as easy to achieve as it sounds, as I can say from personal experience.  There are at least a dozen major Linux distributions (“distros”), and packages like OpenStack have dependencies on operating system versions and features.  Things that use these packages have dependencies on the packages themselves, and it can take weeks to resolve all these dependencies and get something that works together.
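A toy version of that problem looks like this: every package constrains the versions of the things beneath it, and a workable platform is one assignment of versions that satisfies every constraint.  The package names and version numbers below are invented, but the shape of the exercise is real.

```python
# Minimal sketch of the dependency-resolution problem.  Each constraint
# says "this package needs that dependency within this version range";
# a workable stack satisfies all of them.  Names/versions are hypothetical.

candidate_stack = {
    "linux-distro": (8, 1),
    "openstack": (20, 0),
    "vnf-manager": (3, 2),
}

# (package, depends_on, minimum_version, maximum_version)
constraints = [
    ("openstack",   "linux-distro", (8, 0),  (9, 0)),
    ("vnf-manager", "openstack",    (19, 0), (21, 0)),
]

def violations(stack, constraints):
    """Return the constraints the candidate stack fails to satisfy."""
    failed = []
    for pkg, dep, lo, hi in constraints:
        version = stack.get(dep)
        if version is None or not (lo <= version < hi):
            failed.append((pkg, dep, lo, hi, version))
    return failed

print(violations(candidate_stack, constraints) or "stack is consistent")
```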

One of the advantages of broad-based open-source providers like Red Hat is that their stuff is all designed to work together, giving users a break from the often-daunting responsibilities of being their own software integrator.  However, it’s also possible for others, acting more as traditional integrators, to play that role.  The plus there is that many network operators and even cloud providers have already made a decision favoring a particular version of Linux, and an integrator can build on that choice rather than force a change.

This naturally raises the question of how you’d achieve an open framework when even synchronizing open-source elements in a white-box hosting world is hard.  One possibility is first to frame out a set of basic function-hosting features, and then to map those to all of the popular Linux platforms, perhaps with the notion of a release level.  We might have Function 1.1, for example, which would consist of a specific set of tools and versions, and Function 1.2, which would have a more advanced set.  Platform vendors and integrators could then advertise themselves as compliant with a given version level when they could deliver that set of tools/versions as a package.
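Here’s a sketch of how that release-level idea might look in practice, with the profile contents invented for illustration: each level names a minimum tool/version set, and a vendor package is compliant with a level if it delivers at least that set.

```python
# Minimal sketch of the "release level" idea.  Function 1.1 and 1.2 name
# specific tool/version sets; a vendor package is compliant with a level
# if it delivers at least those tools at those versions.  All contents
# here are invented for illustration.

FUNCTION_PROFILES = {
    "Function 1.1": {"linux-kernel": (4, 14), "openstack": (18, 0),
                     "dpdk": (18, 11)},
    "Function 1.2": {"linux-kernel": (5, 4), "openstack": (20, 0),
                     "dpdk": (19, 11), "kubernetes": (1, 17)},
}

def compliant(delivered, level):
    """True if the delivered package meets every tool/version in the level."""
    profile = FUNCTION_PROFILES[level]
    return all(delivered.get(tool, (0, 0)) >= minimum
               for tool, minimum in profile.items())

vendor_package = {"linux-kernel": (5, 4), "openstack": (20, 0),
                  "dpdk": (19, 11), "kubernetes": (1, 17)}
print([lvl for lvl in FUNCTION_PROFILES if compliant(vendor_package, lvl)])
```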

All of this would probably be workable for carrier cloud hosting from the central office inward, but when you address the actual edge, the so-called “universal CPE” or uCPE, it gets more complicated.  One reason is that the features and functions you think you need for uCPE have to be balanced against the cost of hosting them.  You could surely give someone a complete cloud-like server framework as uCPE, but to do so might well make uCPE more expensive than proprietary solutions targeted at specific service-edge missions like firewall.  Yet if you specialize to support those missions, you probably leave behind the tool/version combinations that work in a server farm, thus making feature migration difficult.

Another issue is the entirety of the service lifecycle management framework.  The ETSI Network Functions Virtualization (NFV) framework is way too complicated to justify if what you’re really doing is just shoveling feature elements into a premises box.  Service chaining inside a box?  The simplest, cheapest mechanism with the lowest operations cost would probably look nothing like NFV, and not much like cloud computing either.  That doesn’t mean, though, that some of the lifecycle management elements that would be workable for NFV in its broad mission couldn’t be applied to the uCPE framework.  The ETSI ISG isn’t doing much to create lifecycle management, nor in my view is the ETSI zero-touch initiative, but rational management could be defined to cover all the options.
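To illustrate just how simple in-box chaining could be, here’s a toy sketch where the “chain” is nothing more than a local pipeline of functions.  The functions themselves are stand-ins, but the point is that no forwarding graphs, virtual links, or orchestration descriptors are required.

```python
# Toy sketch of "service chaining inside a box": when every function runs
# locally, the chain can be an ordinary in-process pipeline rather than
# NFV-style forwarding graphs.  The functions are stand-ins for real
# packet-processing features.

def firewall(packet):
    return None if packet.get("port") == 23 else packet   # drop telnet

def nat(packet):
    packet["src"] = "203.0.113.1"                          # rewrite source
    return packet

def monitor(packet):
    print("forwarding", packet)
    return packet

SERVICE_CHAIN = [firewall, nat, monitor]

def process(packet, chain=SERVICE_CHAIN):
    """Pass a packet through each function until one drops it."""
    for function in chain:
        packet = function(packet)
        if packet is None:
            return None
    return packet

process({"src": "10.0.0.5", "port": 80})
process({"src": "10.0.0.5", "port": 23})
```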

The most critical challenge white-box platform technology faces is lifecycle management.  Vendors are already framing packages of platform tools that work together and combining management tools and features into the package.  These initiatives haven’t gone far enough to make deployment, redeployment, and scaling of components efficient and error-free, but there’s no question that vendors have an advantage in making those things happen.  After all, they have a direct financial incentive, a lot of resources, and a lot of the components are under their control.  Open platforms of any sort lack this kind of sponsorship and ownership.
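For a sense of what a minimal lifecycle-management core might look like, here’s a sketch that compares the desired state of each component with what’s actually running and emits the deploy/redeploy/scale actions needed to converge.  The component names and states are illustrative only.

```python
# Minimal sketch of the lifecycle-management gap: compare the desired
# state of each component with what is actually running, and emit the
# deploy / redeploy / scale actions needed to converge.  Component names
# and states are illustrative.

desired = {"vFirewall": {"replicas": 2}, "vRouter": {"replicas": 1}}
actual  = {"vFirewall": {"replicas": 1, "healthy": True},
           "vRouter":   {"replicas": 1, "healthy": False}}

def reconcile(desired, actual):
    """Yield the lifecycle actions needed to bring actual in line with desired."""
    for name, want in desired.items():
        have = actual.get(name)
        if have is None:
            yield ("deploy", name, want["replicas"])
        elif not have.get("healthy", True):
            yield ("redeploy", name, want["replicas"])
        elif have["replicas"] != want["replicas"]:
            yield ("scale", name, want["replicas"])

for action in reconcile(desired, actual):
    print(action)
```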

NFV has failed to meet its objectives because it didn’t consider operations efficiencies to be in-scope, and the decision to rely on them being developed elsewhere proved an error; five years after NFV launched we still don’t have zero-touch automation (ZTA).  Can white-box platforms somehow assemble the right approach?  In my view, it would be possible, but I’m not seeing much progress in that area so far.  History may repeat itself.