Google’s Cloud Network as a Model of Federation

I have already blogged on aspects of Google Andromeda and a network-centric vision of the cloud.  SDxCentral did an article on Google’s overall vision, based on a presentation Google gave at SDN World Congress, and I think the vision merits some additional review.

One of the key points in the Google approach is that it is really five SDNs in one application, or perhaps a better way to put it is that Google applies five different SDN frameworks in a cooperative way.  At the SDN level, at least, this is an endorsement of the notion of infrastructure as a loosely coupled hierarchy of structures, each independently controlled internally.  That’s a form of federation, though the fact that Google is a single company means that it doesn’t have to worry about the “horizontal” form, which cuts across multiple administrative domains.

There has never been any realistic chance that SDN would deploy in a large-scale application using a monolithic controller.  However, Google seems to be illustrating a slightly different vision of vertical federation, and that might be helpful in NFV.  First, though, we should look at federation overall.

“Federation” is a widely used but not universal term for a cooperative relationship between independent service management domains, aimed at presenting a single set of “services” to a higher-level user.  That user might be a retail or wholesale customer, so federation in this sense is a kind of fractal process, meaning that a given service might be a federation of other lower services.

Taken horizontally, federation has proved a requirement in even conventional services because many buyers have operating scopes broader than a single operator can support.  In the old post-divestiture days, a simple phone call could involve an originating local exchange carrier (LEC), an interexchange carrier (IXC) and a terminating LEC.  In this model of horizontal federation, there’s a common service conception within all the players, and an agreed gateway process through which they are linked (and settled).

Vertical federation isn’t as common, but it’s still used by operators who acquire transport (lower-layer) services from a partner to use in building a higher-level service infrastructure.  Mobile services often involve a form of horizontal federation (roaming) and a form of vertical federation (acquisition of remote metro trunks to support cell sites, or tower-sharing federations).

Even in modern networks we could see these two models migrating over, largely unchanged.  In fact, since current services depend on these early federation models, they’ll likely remain available for some time.  The question is what other models might arise, and this is a question Google may be helping to answer, but I want to talk about emerging horizontal federation first.

When you create a service across multiple infrastructure domains, you have three basic options.  First, you can follow the legacy model of a service set in each domain, linked through a service gateway.  Second, you can cede service creation to one domain and have it compose a unified service by combining lower-level services published by other domains.  Finally, you can let subordinate domains (for this particular service) cede resource control to the owning domain and let that domain deal with what it now sees as a unified set of resources.  All these options have merit, and risk.

The gateway approach is essential if you have legacy services built to use it, and legacy infrastructure that offers no other option.  The problem is that you’re either building something service-dependent (PSTN intercalling) or you’re concatenating lower-level services (notably IP) and then adding a super-layer to create the service you’re selling.  The former lacks agility and the latter poses questions on exploitation of the model by passive (non-paying) or OTT players.

The resource-ceding approach is an evolution of the current vertically integrated substructure-leasing model, like fiber trunks or backhaul paths.  It would give one operator control over the resources of another, and that’s something operators don’t like unless the resources involved are totally static.  However, the cloud’s model of multi-tenancy offers an opportunity to cede resources that include dynamic hosting and connectivity.  Groups like the ETSI NFV ISG have looked at this kind of cloud-like federation but it’s not really matured at this point.

The remaining model, the second of the three options listed above, is the “component service” model.  The subordinate operators publish a service set from which the owning operator composes retail (or, in theory, other wholesale) services.  These subordinate services are inherently resource-independent and in modern terms are increasingly “virtual” in form, like a VPN or VLAN, and thus known by their SLAs and properties and not by their implementations.

Even a cursory review of these models demonstrates that the component service model is really capable of becoming the only model.  If operators or administrative/technical domains publish a set of low-level services in virtual form, then those services could be incorporated in a vertical or horizontal way, or in combination, to create a higher-level service.
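To make the component service idea concrete, here is a minimal Python sketch.  Everything in it is an assumption for illustration: the `PublishedService` type, the `compose` function, the operator and service names, and the simplification that an SLA is just latency plus availability.  The point it tries to show is that a composed service has the same form as its components, so the same step works for horizontal and vertical federation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PublishedService:
    """A virtual service known by its SLA, not its implementation (hypothetical type)."""
    domain: str          # the operator or domain that publishes the service
    name: str
    latency_ms: float    # promised worst-case latency
    availability: float  # promised fraction of time the service is up, 0..1


def compose(owner: str, name: str, parts: Sequence[PublishedService]) -> PublishedService:
    """Build a higher-level service from published components.

    Simplifying assumption: every component must work for the whole to work,
    so latencies add and availabilities multiply.
    """
    if not parts:
        raise ValueError("a composed service needs at least one component")
    latency = sum(p.latency_ms for p in parts)
    availability = 1.0
    for p in parts:
        availability *= p.availability
    return PublishedService(owner, name, latency, availability)


# Horizontal: VPN segments from two peer operators joined end to end.
east = PublishedService("operator-a", "vpn-east", 10.0, 0.999)
west = PublishedService("operator-b", "vpn-west", 20.0, 0.99)
national = compose("operator-a", "vpn-national", [east, west])

# Vertical: a hosted function from a third domain layered on the composed VPN.
firewall = PublishedService("operator-c", "hosted-firewall", 2.0, 0.9995)
retail = compose("operator-a", "managed-vpn", [national, firewall])
```

Because `compose` returns another `PublishedService`, the retail service could itself be published and used as a component by someone else, which is the fractal property described earlier.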

It’s into this mix that we toss the Google five-controller approach.  At the highest level, Google is building “Cloud 3.0”, which is a virtual computer created by connecting a bunch of discrete systems using a multi-layer network structure.  Google has two controllers that manage data center networks, the first to connect things at the server level and the second to abstract this into something more service-like.  Andromeda is this second controller set.

We then move to the WAN and higher.  B4, which is Google’s base-level WAN controller, provides for link optimization, and their TE controller manages overall DCI connectivity.  Above all of that is the BwE controller that does service routing and enforcement at the total-complex level, and that’s responsible for ensuring that you can meet application/service SLAs without creating lower-level issues (like the trade-off between latency and packet loss).
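The layering just described can be pictured as a small tree of controllers, each handing an intent down to the layer beneath it.  The Python sketch below is only a picture of that idea, not Google’s implementation.  The `Controller` class, the `realize` method, the `dc-fabric` name for the server-level controller, and the exact parent/child wiring (in particular, placing TE above B4 and BwE above both branches) are my assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Controller:
    """One independently controlled layer in a vertical federation (hypothetical)."""
    name: str
    role: str
    subordinates: List["Controller"] = field(default_factory=list)

    def realize(self, intent: str, depth: int = 0) -> List[str]:
        """Return an indented trace of how an intent is handed down the layers."""
        trace = [f"{'  ' * depth}{self.name}: {self.role} for '{intent}'"]
        for sub in self.subordinates:
            trace.extend(sub.realize(intent, depth + 1))
        return trace


# Assumed wiring of the five controllers; "dc-fabric" is a placeholder name.
dc_fabric = Controller("dc-fabric", "connects servers")
andromeda = Controller("Andromeda", "abstracts the data center network", [dc_fabric])
b4 = Controller("B4", "optimizes WAN links")
te = Controller("TE", "manages DCI connectivity", [b4])
bwe = Controller("BwE", "routes services and enforces SLAs", [andromeda, te])

trace = bwe.realize("app-sla")
print("\n".join(trace))
```

Each controller sees only the abstraction its subordinates expose, which is why swapping a subordinate for a partner’s controller (horizontal federation) would not change the structure.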

Google is doing vertical federation, but they don’t need to horizontally federate in their own network.  Their model, though, would be capable of federating horizontally because it’s based on a modeled service abstraction at all levels.  I think that Google is illustrating that the notion of an abstraction-based federation model is applicable to federation in any form, and that it would be the best way to approach the problem.

The abstraction approach would also map as a superset of policy-based systems.  A policy-managed federation presumes a common service framework (IP or Ethernet) and communicates the SLA requirements as policy parameters that are enforced by the receiving federation partner.  You can see the limitation of this easily; a common service framework doesn’t cover all possible services unless you have multiple levels of policies, and it also doesn’t open all the federation opportunities you might want to use because the partners never really exchange specific handling parameters, only service goals.
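One way to see the superset relationship is to treat a policy request as a restricted case of a service abstraction.  The Python sketch below assumes a hypothetical `ServiceAbstraction` type with a service type, a set of goals, and optional handling parameters; `from_policy` and the `COMMON_FRAMEWORKS` set are likewise invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed set of common service frameworks a policy-managed federation shares.
COMMON_FRAMEWORKS = {"ip", "ethernet"}


@dataclass
class ServiceAbstraction:
    """A modeled service: what it is, what it must achieve, how to handle it."""
    service_type: str
    goals: Dict[str, float]                                   # SLA targets
    handling: Dict[str, str] = field(default_factory=dict)    # specific handling parameters


def from_policy(framework: str, goals: Dict[str, float]) -> ServiceAbstraction:
    """Express a policy request as an abstraction.

    A policy carries only goals within a common framework, so the result
    never has handling parameters and rejects any other service type.
    """
    if framework not in COMMON_FRAMEWORKS:
        raise ValueError(f"no common policy framework for '{framework}'")
    return ServiceAbstraction(framework, dict(goals))


# Anything a policy can say fits in the abstraction...
policy_request = from_policy("ip", {"latency_ms": 20.0})

# ...but the abstraction can also say things a policy cannot.
richer_request = ServiceAbstraction(
    "hosted-function",
    {"latency_ms": 5.0},
    {"placement": "metro-edge"},
)
```

Every policy request converts to an abstraction, but `richer_request` has no policy equivalent, which is the limitation noted above.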

I also want to remind everyone that I’ve long advocated that NFV define a specific network and federation model, and I think the fact that Google defines its cloud in network terms shows how important the network is.  In enterprise cloud computing, too, Google is demonstrating that a private cloud has to be built around a specific network model, and that IMHO means adopting an SDN architecture for private clouds.