Google’s Aquila Marries the Past and Future of Networks

Google has been one of the great innovators, even though a lot of what they do isn’t appreciated or even known. They launched Kubernetes and Anthos for cloud hosting, their Andromeda SDN network is probably the largest SDN deployment on the planet, and they also launched the Istio service mesh. Now they want to transform data center switching, and maybe other aspects of networking too, with a new project called Aquila.

Data center switching is predominantly based on Ethernet, but for decades there’s been a movement to provide a faster and lower-latency model than a stack of Ethernet switches. One initiative was InfiniBand, another more recent one from Intel was Omni-Path. Vendors, of course, have also launched fabric switches with the same goals; I remember a number of Juniper launches on the topic, stretching back almost two decades.

Nearly two decades of effort surely demonstrates that data center switching alternatives have been a hot topic for some time, but the topic has gotten way hotter recently with the advent of cloud computing and the potential deployment of edge computing. Edge applications are latency-sensitive, and so latency in the data center switch (either “pure” latency resulting from hops or latency arising from a lack of deterministic performance) is a potential stumbling block.

Fabric switches are often non-blocking, meaning that they can connect any-to-any without the risk of capacity or interconnect limitations interfering. That helps a lot with both the pure and the non-deterministic sources of latency, but Google wanted to take things a bit further, and so in a sense they’ve married SDN technology with some principles from the old-and-abandoned Asynchronous Transfer Mode (ATM) and cell switching to create Aquila.
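For a sense of what “non-blocking” means formally, the classic Clos result (which long predates Aquila) says a three-stage fabric is strictly non-blocking when it has at least 2n-1 middle-stage switches for n inputs per ingress switch. The check below is just that textbook condition with made-up numbers; it isn’t anything Google has disclosed about Aquila’s actual dimensions.

```python
def clos_strictly_nonblocking(inputs_per_ingress: int, middle_switches: int) -> bool:
    """Clos (1953): a three-stage fabric is strictly non-blocking when
    m >= 2n - 1, so any idle input can reach any idle output without
    rearranging connections already in place."""
    return middle_switches >= 2 * inputs_per_ingress - 1

# Illustrative numbers only -- not Aquila's real dimensions.
print(clos_strictly_nonblocking(inputs_per_ingress=8, middle_switches=15))  # True
print(clos_strictly_nonblocking(inputs_per_ingress=8, middle_switches=12))  # False
```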

Cell switching, for those who don’t remember ATM, was based on the notion that if you divided variable-size packets into fixed-size cells, you could switch them more efficiently. You could also control how you handle congestion, because priority traffic could pass non-priority stuff at any cell boundary rather than waiting for a packet boundary. The whole story of ATM was determinism, which is what Ethernet switching tends to lack and what fabric switches don’t fully restore.
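To make the cell idea concrete, here’s a minimal sketch (Python, purely illustrative; none of the names or sizes come from Google’s Aquila work) that segments variable-size packets into fixed-size cells and lets a strict-priority scheduler make a decision at every cell boundary instead of waiting for a whole packet to finish.

```python
from collections import deque

CELL_SIZE = 48  # ATM used a 48-byte payload; the value here is just illustrative

def segment(packet: bytes, cell_size: int = CELL_SIZE):
    """Split a variable-size packet into fixed-size cells (last cell padded)."""
    return [packet[i:i + cell_size].ljust(cell_size, b"\x00")
            for i in range(0, len(packet), cell_size)]

def transmit(high_priority: deque, low_priority: deque):
    """Strict-priority scheduler: at every cell boundary, a waiting
    high-priority cell goes ahead of the rest of a low-priority packet."""
    while high_priority or low_priority:
        queue = high_priority if high_priority else low_priority
        yield queue.popleft()

low = deque(segment(b"L" * 200))   # a large low-priority packet (5 cells)
high = deque()

sent = [low.popleft()]             # one low-priority cell is already on the wire
high.extend(segment(b"H" * 60))    # a high-priority packet arrives mid-transfer
sent.extend(transmit(high, low))

print(["H" if cell[:1] == b"H" else "L" for cell in sent])
# ['L', 'H', 'H', 'L', 'L', 'L', 'L'] -- priority cells pass at the next cell boundary
```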

The SDN stuff comes in as a replacement for the Ethernet control plane. SDN, you may recall, is also the foundation of Google’s Andromeda core network, and it provides explicit central route control for optimization of traffic engineering. In Aquila, SDN is distributed, with some of the control in a central controller and some in the custom chips Google has created for Aquila. The combination of technologies means that Google can support distributed Aquila elements over 300 feet apart (via fiber optics), with per-hop latencies of no more than tens of nanoseconds, including cell processing and forward error correction to further improve determinism.
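As a rough, hypothetical sketch of what “explicit central route control” means, the snippet below computes per-element forwarding tables centrally over a toy topology and would push them down to the elements; the element names and the structure are invented for illustration and are not Google’s.

```python
from collections import defaultdict, deque

# Toy fabric topology; the names are illustrative, not Aquila's real structure.
LINKS = [
    ("tor1", "pod1"), ("tor2", "pod1"),
    ("pod1", "spine"), ("pod2", "spine"),
    ("tor3", "pod2"), ("tor4", "pod2"),
]

graph = defaultdict(set)
for a, b in LINKS:
    graph[a].add(b)
    graph[b].add(a)

def forwarding_table(source):
    """BFS from one element; returns {destination: next_hop_from_source}.
    A central controller would compute this for every element and push the
    entries down, instead of each switch running its own routing protocol."""
    table = {}
    queue = deque((neigh, neigh) for neigh in graph[source])
    seen = {source}
    while queue:
        node, first_hop = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        table[node] = first_hop
        for neigh in graph[node]:
            queue.append((neigh, first_hop))
    return table

# The "controller" holds all forwarding state; the elements just forward.
tables = {node: forwarding_table(node) for node in graph}
print(tables["tor1"]["tor4"])  # 'pod1' -- tor1 reaches tor4 via its pod switch
```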

Aquila is still a kind of super-proof-of-concept at this point, but it’s not difficult to see that Google could create a unified networking model for both data center (Aquila) and core (Andromeda) that would build very efficient coupling between hosting and network. That would be key for Google, obviously, in cloud computing, but it would be even more important for the edge, and it might well be decisive in a metro-centric vision of the future.

Where Google intends to take the pieces of Aquila is an open question. Do they sell the chips? Do they license the concepts/patents? Whatever Google decides to do, it seems likely that others will look at the same overall approach to see if they can do something competitive and profitable. Chip players like Broadcom, for example, might create their own version of Aquila chips and even the Aquila model, given that a lot of the technology is likely inherited from open specifications like ATM and SDN.

Another open question for Aquila is whether the same technology could be used to build a router. “Cluster routers” are available today; AT&T picked DriveNets’ cluster-router technology for its core, for example. You can build one up from white-box switches, which is what DriveNets did, and if there were commercial Aquila-like chips available, somebody would likely stuff them into white boxes to create the same sort of model for an SDN-ATM hybrid fabric.

This could be important beyond the cloud, and we can look to AT&T again to see why. AT&T told the financial industry that it intended to create a feature layer above pure connectivity, adding capabilities that OTT partners could exploit to create new retail services. It’s hard to see how such a layer could be created and sustained except as a hosted application set partnered with connectivity, and it’s equally hard to see how that could work without some mechanism to optimize the coupling between hosted features and network connectivity.

The relationship between Aquila and Andromeda is also interesting here. Is Google working its way up to a multi-level SDN control plane for centralized-and-yet-distributed operation? Andromeda proves that you can build a very efficient IP core network using SDN and make it look like an IP autonomous system by implementing a BGP emulator at its edge. The same thing could be done with any piece of an IP network: SDN would let you traffic-engineer things inside, but to the outside you’d look like pure legacy IP.
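Here is a conceptual sketch of that edge trick, with invented data structures (Google hasn’t published this layer of Andromeda or Aquila): the controller keeps its explicit, traffic-engineered internal paths, while the border advertises plain prefix reachability, so external peers see what looks like an ordinary autonomous system.

```python
# Conceptual sketch only: a real deployment would use a proper BGP speaker.
LOCAL_AS = 64512  # private ASN, illustrative

# Internal state the SDN controller actually works with: explicit,
# traffic-engineered paths through the fabric and core.
internal_paths = {
    "10.1.0.0/16": ["edge1", "spineA", "podC", "host-cluster-7"],
    "10.2.0.0/16": ["edge1", "spineB", "podD", "host-cluster-9"],
}

def bgp_announcements(paths, next_hop_ip):
    """Collapse internal path detail into BGP-style announcements.
    External peers see prefix, next hop, and AS path -- nothing about
    how traffic is engineered inside the SDN domain."""
    return [
        {"prefix": prefix, "next_hop": next_hop_ip, "as_path": [LOCAL_AS]}
        for prefix in paths
    ]

for announcement in bgp_announcements(internal_paths, "192.0.2.1"):
    print(announcement)
```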

Returning to metro, this sort of thing could also play a role in metaverse hosting, my “metaverse of things”, and even things like Web3 or other blockchain-centric concepts. SDN routes could grow from the data center, through the core, to metro, and even outward toward the edge. SDN could be a key ingredient in supporting network slicing for 5G. In short, Aquila might be leading us to finally realizing the full potential of SDN, and giving us a tool for rebuilding our model of IP networks into something more like a low-latency metro mesh. It might even be a way of preserving electrical-layer handling in the future core, rather than ceding it to an optical mesh of metro networks.

Probably few even inside Google know where it might take Aquila, and certainly even fewer (if any) outside Google do. Google hasn’t historically announced technologies like Aquila before they’ve exploited them inside their own networks. Are they doing that here because they see the role of Aquila developing very quickly, so quickly that they can’t wait for their own production deployment? It’s an interesting thought, and it might indicate some interesting twists and turns in networking’s future.