How Opportunity Will Change the Data Center, and the Global Network

What does the data center of the future look like?  Or the cloud of the future?  Are they related, and if so, how?  Google has built data centers and global networks in concert, to the point where arguably they’re two sides of the same coin.  Amazon and Microsoft have built data centers to be the heart of the cloud.  You could say that Google had an outside-in or top-down vision, and that Amazon and Microsoft have a bottom-up view.  What’s best?

We started with computing focused on a few singular points, the habitat of the venerable mainframe computers.  This was the only way to make computing work at a time when a computer with 16KB of memory cost a hundred grand, without any I/O.  As computing got cheaper, we entered the distributed age.  Over time, the cost of maintaining all the distributed stuff created an economic breaking point, and we had server consolidation that re-established big data centers.  The cloud today is built more from big data centers than from distributed computing.

That is going to change, though.  Anything that requires very fast response times, including some transactional applications and probably most of IoT, requires “short control loops”, meaning low propagation delay between the point of collection, the point of processing, and the point of action.  NFV, which distributes pieces of functionality in resource pools, could be hampered or even halted if traffic had to go far afield between hosting points and user connections—what’s called “hairpinning” in networking of old.  Cisco’s notion of “fog computing”, meaning distribution of computing to the very edge of a network, matches operators’ views that they’d end up with a mini-data-center “everywhere we have real estate.”
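To put rough numbers on those control loops, here’s a minimal sketch (in Python) of round-trip propagation delay at different hosting distances.  It assumes light in fiber covers roughly 200 km per millisecond one way; the distances themselves are illustrative assumptions, not measurements.

```python
# Rough round-trip propagation delay for different hosting points.
# Assumes light in fiber covers roughly 200 km per millisecond (one way);
# the distances below are illustrative assumptions, not measurements.

FIBER_KM_PER_MS = 200.0

hosting_points_km = {
    "fog/edge (local real estate)": 10,
    "metro data center": 80,
    "regional/core data center": 1500,
}

for name, km in hosting_points_km.items():
    round_trip_ms = 2 * km / FIBER_KM_PER_MS
    print(f"{name:30s} ~{round_trip_ms:6.2f} ms round trip (propagation only)")
```

At edge distances the loop is effectively free; at core-network distances, propagation alone can eat a real-time latency budget.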

The more you distribute resources, the smaller each data center in the pool becomes.  Push computing to the carrier-cloud limit of about 100 thousand new data centers hosting perhaps ten to forty million servers, and you have between one hundred and four hundred servers per data center, which isn’t an enormous number.  Pull that same number of servers back to metro data centers and you multiply server count per data center by a hundred, which is big by anyone’s measure.  So does hyperconvergence become hype?
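For the record, here’s that back-of-the-envelope arithmetic as a sketch; the site and server counts are the assumptions quoted above, and the thousand-metro figure is my own illustrative assumption.

```python
# Back-of-the-envelope servers-per-data-center math for the carrier-cloud case.
# Inputs are the assumptions from the text: ~100,000 edge ("fog") data centers
# hosting somewhere between 10 and 40 million servers in total.

edge_data_centers = 100_000
total_servers_low, total_servers_high = 10_000_000, 40_000_000

low = total_servers_low // edge_data_centers    # 100 servers per site
high = total_servers_high // edge_data_centers  # 400 servers per site
print(f"Fog model: {low}-{high} servers per data center")

# Pull the same servers back to (an assumed) ~1,000 metro data centers instead,
# and the per-site count multiplies by roughly a hundred.
metro_data_centers = 1_000  # illustrative assumption
print(f"Metro model: {total_servers_low // metro_data_centers:,}-"
      f"{total_servers_high // metro_data_centers:,} servers per data center")
```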

No, we just have to understand it better.  At the core of the data center of the future, no matter how big it is, there are common technical requirements.  You have racks of gear for space efficiency, and in fog computing you probably don’t have a lot of edge space to host in.  My model says that hyperconvergence is more important in the fog than in the classic cloud or data center.

You have very fast I/O or storage buses to connect to big data repositories, whether they’re centralized or cluster-distributed a la Hadoop.  You also, in today’s world, have exceptionally fast network adapters and connections.  You don’t want response time to be limited by propagation delay, and you don’t want work queuing because you can’t get it into or out of a server.  This is why I think we can assume that we’ll have silicon photonics used increasingly in network adapters, and why I think we’ll also see the Google approach of hosted packet processors that handle basic data analysis in-flight.
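As a rough illustration of why adapter speed matters as much as distance, here’s a quick sketch of how long it takes just to move a unit of work on or off a server at various link speeds; the 1 MB work-unit size is an arbitrary assumption.

```python
# Serialization time for a unit of work at different network adapter speeds.
# The 1 MB work-unit size is an arbitrary illustrative assumption.

WORK_UNIT_BYTES = 1_000_000  # 1 MB of data per work unit, assumed
WORK_UNIT_BITS = WORK_UNIT_BYTES * 8

for gbps in (10, 25, 100, 400):
    micros = WORK_UNIT_BITS / (gbps * 1e9) * 1e6
    print(f"{gbps:4d} Gbps adapter: ~{micros:7.1f} microseconds to move 1 MB")
```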

You can’t increase network adapter speeds only to lose performance in aggregation, and that’s the first place where we see a difference in the architecture of a few big data centers versus many small ones.  You can handle your four hundred servers in fog data centers with traditional two-tier top-of-rack-and-master-switch models of connectivity.  Even the old rule that a trunk has to be ten times the capacity of a port will still work, but as your need grows to mesh the data center so application components or virtual functions can exchange traffic (“horizontal traffic”), you find that larger data centers need either much faster trunks than we have today or a different model, like a fabric that provides any-to-any non-blocking connectivity.  I think that even in mini (or “fog”) data centers, we’ll see fabric technology ruling by 2020.
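Here’s a rough sketch of how the ten-to-one rule scales in the two cases; the rack size and port speed are illustrative assumptions, not a reference design.

```python
# Uplink math for a classic two-tier ToR design under a 10:1 port-to-trunk rule.
# Rack size, port speed, and server counts are illustrative assumptions.

def tor_uplink_gbps(servers_per_rack: int, port_gbps: int, oversub: int = 10) -> float:
    """Trunk capacity a ToR switch needs under an N:1 oversubscription rule."""
    return servers_per_rack * port_gbps / oversub

PORT_GBPS = 25         # assumed server port speed
SERVERS_PER_RACK = 40  # assumed

for total_servers, label in [(400, "fog data center"), (40_000, "metro data center")]:
    racks = total_servers // SERVERS_PER_RACK
    per_tor = tor_uplink_gbps(SERVERS_PER_RACK, PORT_GBPS)
    aggregate = racks * per_tor
    print(f"{label:18s}: {racks:5d} racks, {per_tor:.0f} Gbps uplink per ToR, "
          f"~{aggregate:,.0f} Gbps of aggregate trunk capacity")
```

A terabit or so of trunking in a fog site is manageable with today’s gear; a hundred terabits in a metro site is where the fabric argument comes in.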

The whole purpose of the ten-times-port-equals-trunk rule was to design so that you didn’t have to do a lot of complicated capacity management to ensure low latencies.  For both mini/fog and larger data centers, extending that rule to data center interconnect (DCI) means generating a lot of bandwidth.  Again by 2020, the greatest capacity between any two points in the network won’t be in the core, but in metro DCI.  In effect, DCI becomes the higher-tier switching in a fog computing deployment because your racks are now distributed.  But the mission of the switching remains the same—you have to support any-to-any, anywhere, and do so with minimal delay jitter.
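If you extend the same rule one tier up, treating each fog data center as a “rack” on the metro DCI, the capacity implications look something like this sketch (the site count and per-site trunk speed are assumptions):

```python
# Treat each fog data center as a "rack" and metro DCI as the higher-tier switch.
# Site count and per-site trunk speed are illustrative assumptions.

FOG_SITES_PER_METRO = 100     # assumed number of fog data centers in a metro
TRUNK_PER_SITE_GBPS = 1_000   # assumed external trunk per fog site
OVERSUB = 10                  # the same ten-to-one rule, applied one tier up

dci_capacity_gbps = FOG_SITES_PER_METRO * TRUNK_PER_SITE_GBPS / OVERSUB
print(f"Metro DCI capacity under a 10:1 rule: ~{dci_capacity_gbps:,.0f} Gbps "
      f"({dci_capacity_gbps / 1000:.0f} Tbps) of any-to-any metro bandwidth")
```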

Future applications will clearly be highly distributed, whether the resource pools are or not.  That distribution demands that inter-component latency be minimal lest QoE suffer, and again you don’t want complicated management processes deciding where to put stuff to avoid performance jitter.  You know that in the next hour something will come up (a sporting event, a star suddenly appearing in a restaurant, a traffic jam, a natural disaster) that will toss your plans overboard.  Overbuild is the rule.

Beyond fast-fiber paths and fabric switching, this quickly becomes the classic SDN mission.  You can stripe traffic between data centers (share multiple wavelengths by distributing packets across them, either with sequence indicators or by allowing them to pass each other because higher layers will reorder them), and eventually we may see wavelengths terminating directly on fabrics, using silicon photonics again.  Probably, though, we’ll have to control logical connectivity explicitly, and white-box forwarding is likely to come along in earnest in the 2020 period to accommodate the explosion in the number of distributed data centers and servers.
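The striping idea itself is simple enough to sketch.  Here’s a minimal Python illustration of spreading packets round-robin across several wavelengths with sequence indicators and letting a higher layer restore order; the lambda count and payloads are, of course, made up.

```python
from itertools import cycle

# Minimal sketch of striping packets across multiple wavelengths ("lambdas")
# with sequence indicators, then restoring order at the far end.  The lambda
# count and packet payloads are illustrative assumptions.

def stripe(packets, n_lambdas=4):
    """Tag each packet with a sequence number and spread them round-robin."""
    lanes = [[] for _ in range(n_lambdas)]
    lane = cycle(range(n_lambdas))
    for seq, payload in enumerate(packets):
        lanes[next(lane)].append((seq, payload))
    return lanes

def reassemble(lanes):
    """Merge the lanes and reorder by sequence number (the 'higher layer' job)."""
    merged = [pkt for lane in lanes for pkt in lane]
    return [payload for _, payload in sorted(merged)]

packets = [f"pkt-{i}" for i in range(10)]
lanes = stripe(packets)
assert reassemble(lanes) == packets  # order is restored despite striping
print(lanes)
```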

You can see that what’s happening here is more like the creation of enormous virtual data centers that map to the fog, and the connectivity that creates all of that is “inside” the fog.  It’s a different model of network-building, different from the old notion that computers sit on networks.  Now networks are inside the virtual computer, and I don’t think vendors have fully realized just what that is going to mean.

Do you want content?  Connect to a cache.  Do you want processing?  Connect to the fog.  Whatever you want is going to be somewhere close, in fact often at the inside edge of your access connection.  The whole model of networking as a vast concentrating web of devices goes away because everything is so close.  Gradually, points of service (caching and cloud) are going to concentrate in each metro where there’s an addressable market.  The metros will be connected, to be sure, and there will still be open connectivity, but the real bucks will be made where there’s significant economic opportunity, which means good metros.  Short-haul optics, agile optics, and white-box-supplemented optics are the way of the future.

This adds to the issues facing the legacy switch/router vendors, because in this model you aren’t going to build classic networks at all.  The metro model begs for an overlay approach, because you have an access network (which is essentially a tunnel) connected through a fog edge to a DCI network.  Where’s the IP or Ethernet?  Both are basically interfaces, which you can build using software instances and SD-WAN.
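To make the overlay point concrete, here’s a toy sketch of that metro model: an access tunnel lands on a fog-edge software instance that terminates IP/Ethernet, and everything beyond that rides the metro DCI.  All the names and structures are illustrative assumptions, not any vendor’s architecture.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Toy model of the metro overlay: access tunnels land on fog-edge software
# instances that terminate IP/Ethernet (the "interfaces"), and fog sites are
# meshed over metro DCI.  All names here are illustrative assumptions.

@dataclass
class AccessTunnel:
    subscriber: str
    edge_site: str                    # fog data center the tunnel lands on

@dataclass
class EdgeInstance:
    site: str
    services: List[str] = field(default_factory=lambda: ["ip-termination", "sd-wan"])

@dataclass
class MetroOverlay:
    tunnels: Dict[str, AccessTunnel]  # subscriber -> access tunnel
    edges: Dict[str, EdgeInstance]    # site -> hosted edge instance
    dci: List[Tuple[str, str]]        # metro DCI links between fog sites

    def path(self, subscriber: str, dest_site: str) -> List[str]:
        """Subscriber -> access tunnel -> fog edge (IP lives here) -> DCI -> destination."""
        edge_site = self.tunnels[subscriber].edge_site
        return [f"access-tunnel:{subscriber}",
                f"fog-edge:{edge_site}",
                f"dci:{edge_site}->{dest_site}"]

metro = MetroOverlay(
    tunnels={"user-1": AccessTunnel("user-1", "fog-A")},
    edges={s: EdgeInstance(s) for s in ("fog-A", "fog-B")},
    dci=[("fog-A", "fog-B")],
)
print(metro.path("user-1", "fog-B"))
```

Notice there’s no classic router hierarchy anywhere in the picture; IP and Ethernet show up only as services hosted at the edge instances.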

The key point here is that economic trends seem to be taking us along the same path as technology optimization.  Even if you don’t believe that tech will somehow eradicate networks as we knew them, economics sure as heck can do that.  And it will, starting with the data center.