How the Cloud is Solving the Federation Problem

The old Ben Franklin quote about the difficulties in getting 13 clocks to chime at the same time wasn’t just about colonial politics.  Synchronizing autonomous systems to behave cooperatively has been a networking challenge for decades.  For a variety of reasons, including business practices, technology, and regulations, networks have been divided into separate “administrations”, and this division creates challenges when you want to create cohesive services or deploy applications for consistent behavior.  We may finally be seeing activity that will help resolve this, and perhaps it’s no surprise it’s coming from the cloud.

If we were to step back a decade, we’d see a number of international initiatives aimed at the creation of services that involved multiple administrative entities.  Three that come to mind are the work of the IPsphere Forum, the TMF, and the ITU.  The first of these created an explicit model of multiple administrations in network services, the second an approach that implicitly supported administrative separation, and the last a way of defining management harmony across a network, regardless of its ownership.  None of these has fully resolved the problem.

That’s likely because it’s not an easy problem to resolve, but in the cloud, the vibrant and innovative open-source community has made some real progress.  There, we have a number of competing approaches too.  One is orchestration and management through a single tool with specialized plugins for each hosting environment.  This is what OpenStack does.  A second is infrastructure abstraction, where an abstract “host” is defined and exposed for use, and a lower layer then maps that abstraction to a variety of “real” hosting options, both in the data center and the cloud.  The final one is federation, which recognizes that different administrative/management entities exist, and tries to work with that reality.

Federation as a concept really goes back to the network operators.  Operators have long been reluctant to give others any visibility into or control of their infrastructure, for the simple reasons that it would pose a competitive risk and create a risk of instability if the partner misused the capability.  The IPsphere Forum dealt with this by presuming that services were defined by the selling operator, and assembled from “elements” that were contributed by other operators.  An element had the properties of today’s intent models, meaning that what was inside was invisible and interchangeable.

The problem that the IPsphere Forum worked to address is similar to the one we find increasingly today because of highly architected or “managed” cloud services.  When public clouds were simply infrastructure-as-a-service (IaaS) virtual hosts, it was fairly straightforward to make deployment of something in the cloud look quite similar to data center deployment on virtual servers.  As we evolved toward things like functional/serverless computing, managed container services, and the use of web-service features in applications, we ended up with a situation not unlike the one found in multi-administration carrier services.

The relationship between administrative or architected entities and services overall is often called “federation” by the operator community.  The federation model recognizes that the contributing pieces of a service or application are autonomous elements with their own internal management and practices, and simply harnesses them based on externally exposed interfaces (hence my comment that it’s essentially an intent-modeled system).  Federation-modeled orchestration creates what’s effectively a multi-level process where we ask modeled administrative elements to do something that’s part of a higher-level vision of the service or application.
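To make the multi-level idea concrete, here is a minimal Python sketch of federation-modeled orchestration.  Every name in it (FederatedElement, CloudDomain, OperatorDomain, ServiceOrchestrator, the domain names, and the status strings) is hypothetical and invented for illustration; it is not the IPsphere model or any product’s API, just one way the pattern could look.

```python
from abc import ABC, abstractmethod


class FederatedElement(ABC):
    """An intent-modeled element: callers see only the exposed interface."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def fulfill(self, intent):
        """Realize the intent using internal practices; return a status string."""


class CloudDomain(FederatedElement):
    def fulfill(self, intent):
        # Hypothetical internal practice: this domain hosts things in containers.
        return f"{self.name}: deployed {intent['component']} as a container"


class OperatorDomain(FederatedElement):
    def fulfill(self, intent):
        # Hypothetical internal practice: this domain provisions VPN segments.
        return f"{self.name}: provisioned {intent['component']} as a VPN segment"


class ServiceOrchestrator:
    """Higher level: hands each piece of the service to an autonomous element."""

    def __init__(self, elements):
        self.elements = {element.name: element for element in elements}

    def deploy(self, service_plan):
        # service_plan is a list of (element name, intent) pairs.
        return [self.elements[name].fulfill(intent) for name, intent in service_plan]


orchestrator = ServiceOrchestrator([CloudDomain("cloud-a"), OperatorDomain("operator-b")])
plan = [
    ("cloud-a", {"component": "front-end"}),
    ("operator-b", {"component": "access-link"}),
]
results = orchestrator.deploy(plan)
for line in results:
    print(line)
```

The orchestrator never looks inside an element.  It only knows which element to ask and what to ask for, which is the intent-model property described above.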

There are challenges associated with the federation model, the most significant of which emerge when the federation process hides properties of the infrastructure.  If we decide to scale or redeploy a component of an application or service from one administrative domain to another, the impact of the decision will depend on the quality of service in the new location.  That in turn will depend on the advertised internal behavior of the new location (which can be known) and on the way that the new location will impact overall workflow, which is harder to know.

The key term here is “workflow”, meaning the path of information movement among application components.  Suppose we deploy an instance of a component in Cloud A, and it happens that the hosting point within that cloud is across a continent, through a fairly large number of network hops, from the components we need to connect with.  The impact on overall performance will be different than it would be if the hosting point were in the same city as the partner components, but we don’t know where Cloud A will put the component, and Cloud A doesn’t know where the partner components are located.
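A rough way to see the effect is to add up the latency along a workflow for two different placements.  The Python sketch below does that; the component names, city codes, and millisecond figures are all made-up assumptions chosen only to show the arithmetic, not measurements of any real cloud.

```python
# Hypothetical one-way latencies in milliseconds between hosting locations.
LATENCY_MS = {
    ("nyc", "nyc"): 1,
    ("nyc", "sfo"): 35,
    ("sfo", "sfo"): 1,
}


def latency(a, b):
    """Look up latency between two locations, in either direction."""
    return LATENCY_MS.get((a, b), LATENCY_MS.get((b, a)))


def workflow_latency(workflow, placement):
    """Sum the hop latencies along the workflow for a given placement."""
    return sum(latency(placement[src], placement[dst]) for src, dst in workflow)


# One request passes web -> logic -> db and back again.
workflow = [("web", "logic"), ("logic", "db"), ("db", "logic"), ("logic", "web")]

same_city = {"web": "nyc", "logic": "nyc", "db": "nyc"}
cross_country = {"web": "nyc", "logic": "sfo", "db": "nyc"}  # logic redeployed far away

near = workflow_latency(workflow, same_city)
far = workflow_latency(workflow, cross_country)
print(near, far)
```

With these assumed numbers, moving one component across the country takes the workflow from 4 ms to 140 ms, because every hop that touches the moved component now crosses the continent.  Neither the orchestrator nor Cloud A can compute this unless someone exposes both the placement and the workflow.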

Connectivity is another issue in federation, because it’s almost certain that address assignment within a federated element is controlled by the owner.  Lack of address control limits the options available for maintaining connectivity during redeployment of failed elements, and it can also impact security by making it difficult to predict the addresses of components and offer them protection through firewall-like mechanisms.

There’s a general increase in interest in virtual networking, particularly in SD-WAN form, as a means of dealing with connectivity.  Virtual networking creates, through some mechanism or another, an effective overlay network that disconnects the “logical” address of a user or resource from the physical network address and structure.  It’s the responsibility of the virtual network layer to maintain an association between the logical address and the physical address of the “node” that offers connectivity to the current location of the logical destination.
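The logical-to-physical association can be pictured as a simple lookup table that the virtual network layer keeps current.  This Python sketch assumes a hypothetical OverlayNetwork class and made-up addresses; real SD-WAN and overlay products do far more (tunneling, policy, security), so treat it only as an illustration of the mapping itself.

```python
class OverlayNetwork:
    """Keeps the logical-to-physical address association for an overlay."""

    def __init__(self):
        # logical address -> physical address of the node currently serving it
        self._bindings = {}

    def bind(self, logical, physical):
        """Record (or update) where a logical destination currently lives."""
        self._bindings[logical] = physical

    def resolve(self, logical):
        """Return the physical address currently serving a logical address."""
        return self._bindings[logical]


overlay = OverlayNetwork()
overlay.bind("orders.app", "10.1.4.20")      # initial deployment
before = overlay.resolve("orders.app")

overlay.bind("orders.app", "172.16.9.7")     # redeployed into another domain
after = overlay.resolve("orders.app")

print(before, after)
```

Users keep addressing "orders.app" throughout.  Only the binding changes when the component moves, which is why the overlay sidesteps the address-control problem described earlier.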

The cloud is promoting another network-related notion, which is the concept of the “service mesh”.  Remember that the cloud is a resilient and elastic hosting resource that necessarily has to be supplemented/supported by networking.  It does no good to replace or scale a component if you lose touch with what you’ve done.  A service mesh is designed to augment networking by including features to identify components and to provide load balancing if you scale them.  Thus, it tends to be more of a service abstraction layer than a connectivity layer alone.
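The two features named above, identifying components and balancing load across scaled instances, can be sketched in a few lines of Python.  The MeshRegistry class, the "checkout" service, and the pod names are hypothetical, and round-robin is simply the easiest balancing policy to show; this is not the API of any actual service mesh.

```python
class MeshRegistry:
    """Tracks instances of named services and spreads requests across them."""

    def __init__(self):
        self._instances = {}  # service name -> list of instance endpoints
        self._next = {}       # service name -> index of the next instance to use

    def register(self, service, endpoint):
        """Add an instance, for example after a scale-out or a redeployment."""
        self._instances.setdefault(service, []).append(endpoint)
        self._next.setdefault(service, 0)

    def route(self, service):
        """Pick the next instance of a service in round-robin order."""
        instances = self._instances[service]
        index = self._next[service] % len(instances)
        self._next[service] = index + 1
        return instances[index]


mesh = MeshRegistry()
mesh.register("checkout", "pod-1")
mesh.register("checkout", "pod-2")  # scaled to a second instance

picks = [mesh.route("checkout") for _ in range(4)]
print(picks)
```

Callers ask for "checkout" by name and never track individual instances, so scaling or replacing a component doesn’t cause anyone to lose touch with it.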

Is there a unifying theme emerging here?  I think there is, though it’s still a bit murky.  We know that federation as a concept is essential because we know that individual cloud providers, network operators, and even data centers are going to be autonomous to a significant degree.  However, we also know that granularity to the level of administration is probably going to be a problem.  If everything that impacts reliability and performance is hidden inside an administrative element, then either specific measures to enhance these qualities aren’t available, or all the factors that you want considered have to be communicated to get a favorable outcome.  The latter isn’t practical.

What I think is the likely answer is a mixture of the federation model of old, with elements framed by administrations, and a more cloud-like model where elements are functionally defined.  To me, something based on a model hierarchy using TOSCA would be ideal, but even though TOSCA is about the cloud, it doesn’t seem as though this approach is gaining traction overall.  That means that while open-source components for cloud orchestration and federation are surely the best approach, getting them piecemeal and doing your own integration isn’t what enterprises tell me they want.

Today, I’d argue that VMware and Red Hat/IBM (in that order) are the most likely sources of a viable strategy.  VMware’s relationship with Amazon (primarily) and other cloud providers makes it a contender in federation by definition, and Red Hat’s OpenShift is also a great approach to federated orchestration.  The company also seems determined to improve cloud integration and monitoring, and hybrid cloud is the main focus of its recent initiatives.  Best of all, nearly all the developments on federation in open source and the cloud center on the same technology, Kubernetes.

In early efforts like IPsphere, we never got our clocks chiming at the same time.  Vendors all tried to push things to favor their own product plans, and operators were inclined to blow in the wind.  It’s ironic to me that open-source software, which has never had a central guiding body (or even goal), has somehow managed to collect itself on a unifying approach.  Maybe we need to rethink how market-driven solutions work!