Is the White-Box Space Facing the “Death of Too Many Choices”?

We may be heading for a solid white-box architecture, which is good.  We may have two distinctly different paths that could get us there, which is both good and bad.  It does seem clear that we’re setting up for a bit of competition among open-source giants in networking overall, and for the white-box stuff we have two open-source groups (the Linux Foundation and the ONF) promoting approaches, along with a half-dozen other initiatives.  We can afford more than one solution in the market, but too much fragmentation could hurt everyone, so let’s look at issues and options to see what might be needed.

A “white box” switch (or router) is a network device that’s designed to get its functionality from third-party software.  In theory, you could build your own open white-box design, as Facebook has done with its combination of a white-box device model and its FBOSS software platform, but few have both the resources to do that and the bully pulpit of their own data centers and network in which to deploy the result.  Most white-box strategies are thus going to come from an organization, which is why the Linux Foundation and the ONF are important.

The basis of any white box is, of course, the box.  A hardware platform has specific properties that have to align with the operating system software.  The most popular and universally understood platform is the one used in computers, particularly Linux servers.  A white-box device based on server-like hardware can run a “light” version of Linux easily, and since the platform interfaces are well-known it’s also fairly easy to build a custom switch OS for the same hardware.

The server-like white-box model gives rise to what we could call “bicameral white-box software”, meaning “two brains”.  There’s an operating system and there’s switching software that runs on it, similar to how a switch/router instance runs as a software component on a standard server.  Using this approach has the advantage of providing a “switching brain” that can serve as a hosted function in the cloud.  uCPE that mimics a server and is fully compatible with cloud server tools would be an example of this approach, and it’s clearly going to be a future option for the market.

The problem with the server-like white-box model is that servers aren’t switches.  Custom chips could make a hardware platform a much better switching platform, and not every use of a white box demands that it be cloud-compatible.  Where price/performance is critical, Facebook’s decision shows that it can pay to do a custom white box, and AT&T makes the same point with its dNOS white-box switch OS, turned over to the Linux Foundation as the DANOS project.

Facebook’s FBOSS design is an indicator of what’s needed here.  The key is an “abstraction layer” that’s a bit like a thin operating system.  This layer creates a virtual hardware platform that can then be exploited by a “real” operating system above.  If you need your white-box software to run on different hardware, this model is a smart way to achieve it.
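To make the abstraction-layer idea concrete, here is a minimal Python sketch.  It is not FBOSS code and doesn’t reflect any real device API; every name in it (SwitchHardware, ServerLikeBox, ChipBox, learn_hosts) is a hypothetical stand-in.  The point is only that the “real” operating system logic is written once against a virtual hardware platform, and each box supplies its own implementation underneath.

```python
from abc import ABC, abstractmethod


class SwitchHardware(ABC):
    """Hypothetical abstraction layer: the 'virtual hardware platform'."""

    @abstractmethod
    def add_l2_entry(self, mac: str, port: int) -> bool:
        """Install a MAC-to-port entry; return False if the table is full."""

    @abstractmethod
    def lookup_port(self, mac: str):
        """Return the egress port for a MAC, or None if unknown."""


class ServerLikeBox(SwitchHardware):
    """Server-style device: the table is just memory, no special chip."""

    def __init__(self):
        self._table = {}

    def add_l2_entry(self, mac, port):
        self._table[mac.lower()] = port
        return True

    def lookup_port(self, mac):
        return self._table.get(mac.lower())


class ChipBox(SwitchHardware):
    """Stand-in for a forwarding chip with a fixed-size hardware table."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._slots = {}

    def add_l2_entry(self, mac, port):
        key = mac.lower()
        if key not in self._slots and len(self._slots) >= self._capacity:
            return False  # assumed behavior: refuse new entries when full
        self._slots[key] = port
        return True

    def lookup_port(self, mac):
        return self._slots.get(mac.lower())


def learn_hosts(hw: SwitchHardware, hosts):
    """'Real OS' code: written once against the abstraction layer."""
    return [mac for mac, port in hosts if hw.add_l2_entry(mac, port)]


hosts = [
    ("AA:BB:CC:00:00:01", 1),
    ("AA:BB:CC:00:00:02", 2),
    ("AA:BB:CC:00:00:03", 3),
]
for box in (ServerLikeBox(), ChipBox(capacity=2)):
    installed = learn_hosts(box, hosts)
    print(type(box).__name__, len(installed), box.lookup_port("aa:bb:cc:00:00:02"))
```

The same learn_hosts function runs unchanged on both boxes; only the layer beneath it knows whether there’s a chip with a limited table or just server memory.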

If forwarding efficiency is going to be the key for white-box switches, then it makes sense to have an abstraction layer that can handle specialized forwarding chips.  I’ve mentioned the P4 project in earlier blogs, but in summary it’s a project to develop a forwarding language and an abstraction-layer model that allows that language to be used on various hardware with various chips (or even with no special chips at all).  P4 was a separate project (p4.org) but is now hosted by the ONF.  The idea is that a chip vendor or box vendor would provide the P4 abstraction layer for its devices, and this would enable the “second-brain” switching software to work on their stuff.  You can read a good summary of the approach HERE.

P4 as a project has worked with both the Linux Foundation and the ONF, but as an ONF-hosted project it’s now tightly integrated into the ONF Stratum project for white-box SDN as well as its Converged Multi-Access and Core (COMAC) reference design, applicable to 5G.  DANOS, the Linux Foundation distributed network operating system, also references P4.

The benefit of P4 in a white-box design is that it encourages what might be called a “tri-cameral” model.  The second-brain switching software is divided into two pieces, the P4 forwarding language part that describes data-plane behavior, and a control-plane part that manages the framework that turns device forwarding into path routing.  For example, you could write a P4 program to do IP forwarding, and it could be the same whether you added traditional IP discovery and routing table control, or SDN central control, on top.
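The split can be illustrated with a small Python analogy.  This is not P4 and doesn’t use any P4 tooling; the class names (ForwardingPlane, DistributedRouting, CentralController) are invented for the illustration.  The forwarding piece does longest-prefix-match IP forwarding and nothing else, and two different control planes populate it, one mimicking routes learned from neighbors and one mimicking an SDN controller pushing a computed table.

```python
import ipaddress


class ForwardingPlane:
    """Data plane: longest-prefix-match IP forwarding, nothing else."""

    def __init__(self):
        self._routes = {}  # ip_network -> egress port

    def set_route(self, prefix, port):
        self._routes[ipaddress.ip_network(prefix)] = port

    def forward(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None  # no matching route: drop
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]


class DistributedRouting:
    """Control plane A: routes learned from neighbor advertisements."""

    def __init__(self, plane):
        self.plane = plane

    def on_advertisement(self, neighbor_port, prefixes):
        for prefix in prefixes:
            self.plane.set_route(prefix, neighbor_port)


class CentralController:
    """Control plane B: an SDN-style controller pushes a computed table."""

    def __init__(self, plane):
        self.plane = plane

    def push_table(self, table):
        for prefix, port in table.items():
            self.plane.set_route(prefix, port)


# Same forwarding code, driven by traditional-style route learning...
legacy = ForwardingPlane()
routing = DistributedRouting(legacy)
routing.on_advertisement(1, ["10.0.0.0/8"])
routing.on_advertisement(2, ["10.1.0.0/16"])

# ...and by central control.
sdn = ForwardingPlane()
CentralController(sdn).push_table({"10.0.0.0/8": 1, "10.1.0.0/16": 2})

for dst in ("10.1.2.3", "10.9.9.9", "192.0.2.1"):
    print(dst, legacy.forward(dst), sdn.forward(dst))
```

Both instances forward identically once their tables agree, which is the property that lets the data-plane description stay fixed while the control model on top changes.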

This to me is the optimum model for a white-box switch, because it makes it possible to write “open forwarding” that can be adapted to various chips using the common “plugin” practice we see in networking.  If you were to add P4 to something like DANOS or Stratum, you’d end up with a highly portable and open model for white-box network devices.  The question is whether either project will actually work hard to do the integrating, and whether the result will be widely used.  Right now, those two factors are working in opposite directions with respect to our two open-source white-box models.

DANOS is AT&T’s strategy, and AT&T is already deploying it in cell sites and starting to deploy it in other white-box missions for business services and 5G.  That alone gives DANOS an installed-base advantage, and street cred with other network operators.  However, the DANOS white paper on the Linux Foundation website is still the original AT&T dNOS paper, and the Foundation website says that stuff is still “coalescing” around DANOS.  P4 is mentioned in the AT&T paper, but it’s hard to say how committed the Linux Foundation is to it.

Stratum is the brainchild of the ONF, which is an “operator-led consortium” and the father of commercial OpenFlow-based SDN.  It’s a mature, active organization with plenty of PR (the recent show illustrates that), and the ONF is tightly linked with 5G initiatives.  Still, it’s clear that Stratum won’t have the advantage of AT&T’s major commitment to DANOS in building its own installed base.  On the flip side, the ONF is clearly committed to P4 and now hosts the P4 project.

One interesting thing about the white-box evolution is that while arguably OTTs like Google and Facebook started things off, the network operators may now be driving it.  That’s because the sheer volume of white-box devices an operator would deploy for 5G or carrier cloud makes the operators’ decisions on strategies and solutions automatically critical.  One Tier One could create a credible installed base for anything just with its own usage.  That’s why it may be that DANOS has the best shot at white-box supremacy in the near term.

What’s important, I think, is that something wins the white-box crown.  An open market with a dozen incompatible alternative approaches isn’t any better than a proprietary, competitive market at coalescing support and defining a universal model.  It might be worse because, absent competition, promotion of any approach is problematic.  If our two open-source players are starting to behave like competitors, it might be a good thing.