Will TIP’s Requirements for Disaggregated Routing Help?

The Telecom Infrastructure Project (TIP) has done some interesting things, and one of the most recent and most interesting of the bunch is their release of the Distributed Disaggregated Backbone Router (DDBR) requirements document. TIP has been a significant force in open-model network hardware, so anything they do is important, and it’s worth taking a look inside the document to see what insights we can find.

The “why” of the project covers what we’ve come to know as the traditional justifications for open-model network elements, much the same as those cited by other projects like O-RAN. There is an almost-universal view among network equipment buyers that vendors are locking them in, stalling innovation, keeping prices high, and so forth. You could fairly argue that DDBR is aimed at avoiding exactly what operators in particular think their vendors have been doing for years.

That doesn’t mean that the justifications aren’t valid. Every business tries to maximize its own revenues, and so there’s a lot of truth behind the assertion that vendors are doing that at the expense of customers. The best way to stop that is to open the market up, eliminating both proprietary hardware and software in favor of open implementations. Competition among the players who produce open solutions, or parts of those solutions, will then drive innovation and control prices.

One interesting justification stands out from the rest: “Taking advantage of Software Defining Network to make the network operation simpler, give tools for automation, enhance the capabilities of our network, and introduce a set of capabilities that today are not present.” This makes SDN at least an implicit requirement for the DDBR, and we’ll get more into that below.

The paper moves on to what’s essentially an indictment of chassis routers, and in particular the monolithic model of routing that dominates the market. While major router vendors like Cisco and Juniper have “disaggregated” in the sense that they separate hardware and software and permit either of those elements to be used with an open-model version of the other, the point of the paper is that monolithic is monolithic no matter how you wordsmith it. For backbone routing in particular, the scalability and resilience issues work against the monolithic model, and to reap the benefits of DDBR that the paper opened with, you need a different approach.

That approach is based on the spine-and-leaf or hierarchical routing/switching model that’s already common in the data center. You have some number of “spine” or higher-level boxes that connect to “leaf” edge boxes by multi-homing, and the combination supports any-to-any connectivity. This works well in the data center, but there are potential issues when you apply it to a backbone router.

When traffic enters a spine-and-leaf hierarchy at a leaf, it has to be distributed to “exit leaves”. If the exit happens to be on the same leaf device where the traffic entered, the traffic makes the jump via the device’s internal switching. If not, the traffic has to jump to a spine device and then back to the exit leaf device. This creates two potential issues: capacity constraints and latency.
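
As a rough sketch of that forwarding decision (the device names and the Python itself are purely illustrative; the TIP paper doesn’t specify anything like this), the number of devices a packet crosses inside the cluster depends only on whether the entry and exit ports share a leaf:

```python
# Illustrative sketch of the leaf/spine forwarding decision described above.
# Device names are hypothetical; nothing here comes from the TIP document.

def devices_traversed(ingress_leaf: str, egress_leaf: str) -> int:
    """Count the devices a packet crosses inside the DDBR cluster."""
    if ingress_leaf == egress_leaf:
        return 1      # handled entirely by the one leaf's internal switching
    return 3          # ingress leaf -> spine -> egress leaf

print(devices_traversed("leaf-1", "leaf-1"))  # 1: same-leaf traffic stays local
print(devices_traversed("leaf-1", "leaf-4"))  # 3: cross-leaf traffic transits a spine
```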

If we assumed that every inbound flow had to exit on a different leaf device, then all traffic would be going through the spines. You’d need more spine devices and more leaf-to-spine multi-homed connections to build the network, and the total capacity of the complex would be determined by the aggregate spine capacity. You also have to worry about whether the connections between leaves and spines are overloaded, and whether spine-to-spine links might be required.
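
A back-of-envelope check shows how the spine layer becomes the bottleneck in that worst case. All the figures here are my own assumptions, not numbers from the paper:

```python
# Worst case: every flow exits on a different leaf than it entered, so all
# traffic transits a spine device.  Figures are illustrative assumptions,
# not TIP requirements.

leaf_count = 8
leaf_edge_capacity_tbps = 4.0     # user-facing capacity per leaf
spine_count = 4
spine_capacity_tbps = 6.4         # switching capacity per spine device

cross_leaf_load = leaf_count * leaf_edge_capacity_tbps     # 32.0 Tbps offered
spine_aggregate = spine_count * spine_capacity_tbps        # 25.6 Tbps available

print(f"Worst-case cross-leaf load: {cross_leaf_load:.1f} Tbps")
print(f"Aggregate spine capacity:   {spine_aggregate:.1f} Tbps")
print("Spine-limited" if cross_leaf_load > spine_aggregate else "Spines keep up")
```

In this example you’d either add spine devices or accept that the cluster’s total capacity is set by the spine layer, which is exactly the trade-off the paper is pointing at.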

The latency issue is created by the fact that the device connections are network trunks and not backplanes or fabric, and as you move through the hierarchy of devices, each hop introduces some latency. How much accumulates depends on the number of hops (and thus on the depth of the hierarchy) and the length of the connections, meaning how “distributed” the disaggregated complex is.
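
A rough way to see how the latency accumulates (the per-hop forwarding delay and fiber propagation figures below are illustrative assumptions, not TIP numbers):

```python
# Rough per-path latency estimate: each device hop adds forwarding delay, and
# each inter-device link adds propagation delay (roughly 5 microseconds per km
# in fiber).  All figures are illustrative assumptions, not TIP numbers.

def path_latency_us(device_hops: int, link_lengths_km: list,
                    per_hop_forwarding_us: float = 1.0) -> float:
    propagation = sum(length * 5.0 for length in link_lengths_km)
    return device_hops * per_hop_forwarding_us + propagation

# Co-located cluster: leaf -> spine -> leaf over 50 m links.
print(path_latency_us(3, [0.05, 0.05]))   # ~3.5 microseconds
# Distributed cluster: the same path over 20 km links.
print(path_latency_us(3, [20.0, 20.0]))   # ~203 microseconds
```

Keeping the cluster co-located keeps the propagation term negligible; spreading the leaves and spines across sites is what makes the latency budget interesting.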

With the right set of white boxes and some care in assigning ports to leaf devices, you can control a lot of these issues, and I don’t think that either creates a true barrier to the concept. As I’ve pointed out on other blogs, we have white-box switches with very high-performance switching chips even today, and things are likely to get even better as competition drives innovation at the chip level.

Software is loaded onto the devices using the Open Network Install Environment (ONIE), which defines how a network OS gets installed on a white box. The rest of the software framework isn’t yet specified in detail, the presumption being that the race for excellence (and market share) will produce multiple solutions and drive competition and innovation. There are, however, some baseline presumptions in the document about the software structure.

The software framework for the new DDBR is the most interesting piece of the story. It starts with the separation of the control and data planes. The spine-and-leaf hierarchy is purely data-plane, and the control plane is hosted either on a standard server or in the cloud. The paper presumes a container implementation and cloud-native software for the cloud side, but it seems likely that a more general VM/IaaS approach would also be supported, and that bare-metal execution of a cloud-compatible software model is what would provide the dedicated-server hosting option.

The paper isn’t explicit about which “control-plane” functions are handled in the separate server/cloud element, though the heavy lifting of topology and route management clearly is. The paper is also not explicit about just what’s meant by “SDN” in terms of the relationship between the control-plane and data-plane elements of DDBR.

Most of us think of SDN in terms of the ONF project and the OpenFlow protocol for switch control. There is no specific reference to either in the paper, and I can’t find an OpenFlow reference in any of TIP’s documents, including the blogs. I can’t say whether TIP’s DDBR treats OpenFlow as an option, but since it isn’t shown in the figures in the paper, even where connections between an “SDN controller” and the data-plane devices are represented, and it isn’t called out in the protocol requirements either, it’s clearly not a requirement. My guess is that they’d accept any standard mechanism for relaying forwarding tables to the devices.
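
Whatever protocol an implementation settles on, the functional requirement is straightforward: the control plane computes routes centrally and relays per-device forwarding tables to the leaves and spines. Here’s a minimal sketch of that exchange, with a hypothetical push_to_device call standing in for whatever standard southbound mechanism (OpenFlow or otherwise) gets used; none of the names are from the TIP document:

```python
# Minimal sketch of the control-to-data-plane exchange: a central control
# plane computes routes, then relays per-device forwarding tables.  The
# push_to_device call is a hypothetical stand-in for whatever standard
# southbound protocol an implementation chooses; nothing here is from the
# TIP document.

from dataclasses import dataclass

@dataclass
class FibEntry:
    prefix: str      # e.g. "203.0.113.0/24"
    next_hop: str    # next-hop device or exit port, as seen by that device

def push_to_device(device: str, entries: list) -> None:
    # Placeholder for the real protocol exchange (OpenFlow or otherwise).
    print(f"{device}: installing {len(entries)} forwarding entries")

def distribute_fibs(fibs: dict) -> None:
    for device, entries in fibs.items():
        push_to_device(device, entries)

distribute_fibs({
    "leaf-1":  [FibEntry("203.0.113.0/24", "port-7")],
    "spine-1": [FibEntry("203.0.113.0/24", "leaf-1")],
})
```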

The document isn’t specific about management features, but it does relate SDN’s ability to support network abstraction to the ease of interfacing with OSS/BSS. I think the goal is to require that the DDBR cluster look like a single device and present the same sort of management features a core chassis router would.
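
One plausible way to read that requirement (my sketch, not anything in the document) is that the management layer rolls up per-device state from every leaf and spine and exposes it as if it came from one chassis:

```python
# Sketch of the "looks like one router" abstraction: per-device state from
# every leaf and spine is rolled up and presented as a single element to the
# OSS/BSS.  Device names and counters are illustrative assumptions only.

cluster = {
    "leaf-1":  {"ports_up": 30, "ports_total": 32},
    "leaf-2":  {"ports_up": 32, "ports_total": 32},
    "spine-1": {"ports_up": 16, "ports_total": 16},
}

def single_device_view(devices: dict) -> dict:
    return {
        "member_devices": len(devices),
        "ports_up": sum(d["ports_up"] for d in devices.values()),
        "ports_total": sum(d["ports_total"] for d in devices.values()),
    }

print(single_device_view(cluster))
# {'member_devices': 3, 'ports_up': 78, 'ports_total': 80}
```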

So what do I think about this, overall? Let me offer some points.

First, I think that this further cements the open networking trends we’ve seen over the last five years or so, and that I’ve been blogging about. While there are still steps needed to transform a requirements document into a technical specification, we’re on the path toward an extension of open-model networking that takes us beyond the largely 5G-specific focus it’s had up to now. That’s good for users, and not so good for vendors.

Second, I think that the document still has a bit too much telco-think woven into it. The paper has this quote: “running on an on-prem x86 server or as a container in a cloud-native fashion”, which suggests that cloud-native requires container hosting or that all you need for cloud-native is a container, neither of which is true. There’s also all manner of network-protocol-specific stuff in the paper but not much about the cloud. I hope the next spec will correct this.

Third, this should be a clear signal to network equipment vendors that the world is changing, and that they have to change with it. The Internet is data service populism, it’s the largest source of IP traffic, and it’s a market where feature differentiation for vendors is virtually impossible. Connectivity is a commodity, which means that vendor margins are dependent on stepping beyond and above it, to something that’s meaningful to the user.

Finally, this is a signal to NFV-and-hosting players, to white-box network players, and to network software players that this market is going to open up and get competitive. Nobody can rest on their laurels at this point. Remember that DDBR is a unified hardware/software cluster, not just the hardware, so it may well make it harder for someone who wants to be just a white-box player, or just a software player, to engage. That could have special implications in the 5G space, where many software elements are being presented without any specifics about what they run on.

It’s positive to have a set of requirements for something as critical as distributed routing, but I’d have liked to see the requirements aligned more explicitly with modern cloud practices. The step of turning the requirements into specifications, which TIP says comes next, will be absolutely critical.