A First Step to an Open-Model Network Future

Do you think that core routing is just for big routers?  Think again.  DriveNets, a startup that developed the “Network Cloud” cloud-routing solution, and AT&T co-announced (HERE and HERE) that DriveNets “is providing the software-based core routing solution for AT&T, the largest backbone in the US.”  I think that could fairly be called a blockbuster announcement (covered HERE and HERE and HERE).  It’s also likely the first step in actually realizing “transformation” of network infrastructure.

I’ve mentioned DriveNets in a couple of past blogs, relating to the fact that they separated the control and data planes, and I think that particular attribute is the core of their success.  What the AT&T deal does is validate both the company and the general notion of a network beyond proprietary routers.  That’s obviously going to create some competitive angst among vendors, and likely renew hope among operators.  The basics of their story have already been captured in my references, so I want to dig deeper, based on material from the company, to see just how revolutionary this might be.

Looking at the big picture first, the AT&T decision puts real buyer dollars behind a software-centric vision of networking, one that’s had its bumps in the road as standards efforts to create the architecture for the new system have failed to catch on.  Some operators I’ve talked with were enthusiastic about the shift in technology from routers to software, but concerned that they wouldn’t see a viable product in the near term.  DriveNets may now have relieved that concern, because they have a very big reference account, an operator using DriveNets in the most critical of all missions.  That’s a pretty big revolution right there.

The next revolutionary truth is the fact that it is software that’s creating the DriveNets technology.  While DriveNets runs on white boxes, what makes it different is control/data separation, and how a kind of local cloud hosts the control plane.  The data plane is hosted on white-box switches based on Broadcom’s Jericho chip, and the control plane on a more generic-looking series of certified white-box devices.  White-box devices also provide the external interfaces and connect with the data-plane fabric.  You can add white boxes as needed to augment resources for any of these missions.

Different-sized routers are created by combining the elements as needed into a “cluster”, and capacities up to 768TB can be supported with the newest generation of the chip (the AT&T deal goes up to 128TB).  AT&T says that they’ll have future announcements for other applications running on the same white-box devices, and that offers strong support for the notion that this is an open approach.  This is an important point I’ll return to later in this blog.

One obvious benefit of this new model of networking is that the same white boxes (except for extreme-edge applications that would use a single device) can be used to build up what’s effectively an infinite series of router models, so the same boxes are spares for everything.  Another benefit is that no matter how complex a cluster is, it looks like a single device both at the interface-and-topology level and at the management level.  But the less-obvious benefits may be the best of all.

Here’s a good one.  The control-plane software is a series of cloud-native microservices hosted in containers and connected with a secret-sauce, optimized service mesh for message control.  This gives DriveNets (and, in this case, AT&T) the ability to benefit from the CI/CD (continuous integration and continuous delivery) experience of cloud applications.  In fact, the control-plane software, and DriveNets software overall, is based on cloud principles, which is where operators have been saying they want to go for ages.  There are none of the complicated, multi-forked code trains that have long haunted traditional routers.  All the software microservices are available, loaded when needed, and changed as needed.

And another.  When you get a new Network Cloud cluster, or just a new box, it’s plug and play.  When it’s first turned up, it will go to a DriveNets cloud service to get basic software, and it then contacts a local management server for its configuration, setup, and the details.  Sounds a lot like GitOps in the cloud, right?
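To make that two-stage turn-up concrete, here is a minimal sketch of the flow as described: base software first, local configuration second.  Every name in it (WhiteBox, BASE_IMAGES, LOCAL_CONFIGS, the serial number, the image name) is a hypothetical stand-in I made up for illustration; none of it is DriveNets' actual interface.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for the vendor cloud service and the local
# management server; a real deployment would reach these over the network.
BASE_IMAGES = {"default": "base-os-1.0"}
LOCAL_CONFIGS = {"WB-001": {"role": "data-plane", "cluster": "core-1"}}


@dataclass
class WhiteBox:
    serial: str
    software: Optional[str] = None
    config: dict = field(default_factory=dict)


def fetch_base_software(box: WhiteBox) -> str:
    # Stage 1: the cloud service hands every new box the same base image.
    return BASE_IMAGES["default"]


def fetch_local_config(box: WhiteBox) -> dict:
    # Stage 2: the local management server decides what this box should be.
    return LOCAL_CONFIGS.get(box.serial, {"role": "unassigned"})


def bootstrap(box: WhiteBox) -> WhiteBox:
    box.software = fetch_base_software(box)
    box.config = fetch_local_config(box)
    return box


box = bootstrap(WhiteBox("WB-001"))
print(box.software, box.config["role"])
```

The point of the split is that the box itself carries no identity beyond its serial number; everything that makes it a particular piece of a particular cluster lives in the declared configuration on the management side.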

And still more.  The data-plane boxes can be configured with fast-failover paths that allow for incredibly quick reaction time to faults.  The control plane, in its cloud-cluster of white boxes, will reconverge on the new configuration.  More complex network issues, the ones that require the control plane, benefit from the fact that the cluster’s internal configuration can be tuned to support the overall connection topology that’s required to return to normal operation.  From the outside, it’s a single device.
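The division of labor here can be sketched in a few lines.  This is only an illustration of the general fast-failover idea (a pre-installed backup next hop that the data plane switches to on its own, with the control plane catching up afterward); the class names and port labels are my own assumptions, not anything taken from DriveNets material.

```python
class ForwardingEntry:
    """A route with a primary next hop and a pre-installed backup."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup


class DataPlane:
    def __init__(self):
        self.fib = {}        # prefix -> ForwardingEntry
        self.down = set()    # ports currently known to be down

    def install(self, prefix, primary, backup):
        # Called by the control plane, both initially and after reconvergence.
        self.fib[prefix] = ForwardingEntry(primary, backup)

    def link_down(self, port):
        # Local fault detection: no control-plane round trip is needed.
        self.down.add(port)

    def next_hop(self, prefix):
        entry = self.fib[prefix]
        if entry.primary not in self.down:
            return entry.primary
        if entry.backup not in self.down:
            return entry.backup
        return None  # both paths failed; wait for the control plane


dp = DataPlane()
dp.install("10.0.0.0/8", primary="port1", backup="port2")
dp.link_down("port1")
print(dp.next_hop("10.0.0.0/8"))  # traffic shifts to the backup path

# Later, the control plane reconverges and installs a fresh pair of paths.
dp.install("10.0.0.0/8", primary="port2", backup="port3")
```

The fast reaction comes from the fact that the failover decision is a local lookup; the slower reconvergence step only has to restore redundancy, not restore service.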

And another…the operating system (DNOS) and orchestrator (DNOR) elements combine with the Network Cloud Controller and Network Cloud Management elements to provide for internal lifecycle management.  Everything that’s inside the cluster, which is thus a classic “black box”, is managed by the cluster software to meet the external interface and (implicit) SLA.  The fact that the cluster is a cloud is invisible to the outside management framework, so the architecture doesn’t add to complexity or increase opex.

To recap the market impact, what we have is a validation that a major Tier One (AT&T) that has been committed to open-model networking is satisfied that there is an implementation of that concept that’s credible enough to bet their core network on.  We also have a pretty strong statement that an “open-model” network is a network composed of white-box devices that can, at least in theory, host anything.

Remember that AT&T says they may host other software on the same white boxes.  I think that means that to AT&T, openness means protection against stranded investment, not necessarily that every component of the solution, the non-capital components in particular, is open.  You don’t need to have open-source software on white boxes to be “open”, but you do have to be able to run multiple classes of software on whatever white boxes you select.

I’m a strategist, a futurist if you like, so for me it’s always the major strategic implications that matter.  Where is the future open-model network heading?  What’s the next level up, strategy-wise, that we could derive from the announcement?  I think it may arise from another announcement I blogged on just last week.  If we go back to that blog on the ONF conference, there was an ONF presentation on “The Network as a Programmable Platform”, which proposed an SDN-centric vision of the network of the future.  As the title suggests, it was a “programmable” network.

One figure in that presentation shows the SDN Controller with a series of “northbound” APIs linked to “applications”, one of which is BGP4.  What the figure is proposing is that BGP4 implementations, running as a separate application, could control the forwarding of SDN OpenFlow white-box switches, and create what looks like a BGP core.  You could do the same thing, according to the ONF vision, to implement the 5G interfaces between their “control plane” (which I remind you is not the same as the IP control plane) and an IP network.  This is almost exactly what Google’s Andromeda project did for Google.

Why would AT&T not have selected an ONF implementation, then?  They’ve supported, and contributed elements to, the ONF solution.  The answer, I think, could be simple:  there is no validated implementation of the ONF solution available commercially.  It may be that DriveNets is seen by AT&T as the closest thing to that utopian model that’s available today, and of course they may also believe that DriveNets is close enough that they could evolve to the model faster than someone (starting with just the ONF diagrams to work with) could implement it.

Why could they think that?  If you dig into the DriveNets material (particularly their Tech Field Day stuff), the architecture of their separated control plane is characterized as a web of cloud-native microservices.  These work together to do a lot of things, one of which is creating what looks like a single router from the combined behavior of a lot of separate devices.  And DriveNets, in the video, says “Twice the number of network operators say they expect radical change in network architecture within three years, versus those who say they do not.”  They also say that their approach “builds networks like hyperscaler clouds.”

Let me see…radical changes are needed, build networks like hyperscaler clouds, consolidate multiple devices into a single virtual view?  This sounds like a network-programmability model that doesn’t rely on SDN controllers, whether a single one or a federation.  Do all that at the network level and you’ve implemented the bottom half of the ONF vision.

How about the top half, those northbound APIs?  There’s no detail on exactly what northbound APIs are currently exposed by DriveNets, but since their control plane is made up of microservices, there’s no reason why any API couldn’t be added fairly easily.  The current DriveNets cluster has to have a single forwarding table from which it would derive the forwarding tables for each of their data plane fabric devices, so could that table be used to create a network-as-a-service offering, including BGP4?  It seems possible.
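Here is a rough sketch of what deriving per-device tables from a single cluster-wide table could look like.  It rests on my own simplifying assumptions: the cluster table maps each prefix to an egress device and port, and any other device simply sends matching traffic across the fabric toward that egress device.  The function name, device names, and table layout are hypothetical, not DriveNets' actual data structures.

```python
def derive_device_tables(cluster_fib, devices):
    """Split one cluster-wide forwarding table into per-device tables.

    cluster_fib: prefix -> (egress_device, egress_port)
    devices:     names of the data-plane boxes in the cluster
    """
    tables = {dev: {} for dev in devices}
    for prefix, (egress_dev, egress_port) in cluster_fib.items():
        for dev in devices:
            if dev == egress_dev:
                # This box owns the external interface for the prefix.
                tables[dev][prefix] = ("local", egress_port)
            else:
                # Every other box hands the traffic to the fabric.
                tables[dev][prefix] = ("fabric", egress_dev)
    return tables


cluster_fib = {
    "10.1.0.0/16": ("box-a", "eth0"),
    "10.2.0.0/16": ("box-b", "eth3"),
}
for dev, table in derive_device_tables(cluster_fib, ["box-a", "box-b"]).items():
    print(dev, table)
```

If the single table is the source of truth, then anything that can write to it (a BGP4 application, a network-as-a-service API) would in effect be programming the whole cluster through one interface, which is why that table matters to the northbound question.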

New microservices could be developed by DriveNets, by partners, and even by customers, and these microservices could extend the whole DriveNets model.  ONF OpenFlow control?  It could be done.  Network-as-a-service APIs to support the mobility management and media access elements of 5G, via the N2 and N4 interfaces?  It could be done.  I’m not saying it will be, only that it seems possible, just as it’s possible that the ONF model could be implemented.

“Could” being the operative word.  The problem with the ONF model, as I said in that blog on their conference referenced above, is that central SDN controller.  That’s a massive scalability and single-point-of-failure problem.  Federating SDN controllers is a logical step I called out years ago when this issue was raised, but it’s not been developed (you can see that by the fact that the ONF’s pitch doesn’t reference it).  There is no industry standards initiative in the history of telecom that developed something in under two years, so the ONF solution can only be realized if somebody simply extends it on their own.  DriveNets’ extension to programmability and ONF implementation are both “coulds”.

Even without being able to create a programmable network like the ONF diagram shows, DriveNets has made a tremendous advance here.  It might also demonstrate that there are at least two ways to create that programmable network, the yet-to-be-defined SDN controller federation model and the “network-wide control plane” model.  Two options to achieve our goals are surely better than one.

Customers or opportunities are another place where more is better, and to achieve that, any competitor in the new-model network space is going to have to confront the business case question.  AT&T has an unusually high sensitivity to infrastructure costs, owing to its overall low demand density.  It would be logical for them to be on the leading edge of an open-model network revolution.  How far behind might the other operators be?  That’s likely to depend on what value proposition, beyond the simple “the new network costs less than routers”, is presented to prospects.  Most operators with high demand densities won’t face the issues AT&T is facing for another couple of years, but there are other factors that could drive them to a transformation decision before that.  It’s a question of who, if anyone, presents those other factors.

We are going to have a transformation in networking eventually, for every operator, and I think the AT&T/DriveNets deal makes that clear.  New models work and they’re cheaper, and every Wall Street research firm I know of (and I know of a lot of them) expects telco spending to be at least slightly off for the balance of this year, and of 2021 as well.  In fact, they don’t really see anything to turn that trend around.  Even operators with high demand densities and correspondingly lower pressure on capex savings will still not throw money away when a cheaper and better option is available.  Opportunities for a new strategy are growing.

So are alternatives to what that strategy might look like.  We’re going to have a bit of a race in fielding a solution, between cheaper routers, white boxes with simple router instances aboard, clusters with separated control and data planes like DriveNets, and SDN/ONF-based solutions.  The combination of opportunity and competition means there’s a race to pick the right prospects and tell the right story.  It’s going to be an interesting 2021 for sure.