What’s the Right Network for Cloud and Function Hosting?

If containers and Kubernetes are the way of the future (which I think they are), then future network models will have to support them.  The combination obviously expects IP networking to be available, but there are multiple options for providing it.  One is to deploy “real” IP in the form of an IP VPN.  A second is to use a virtual network, and a third is to use SD-WAN.  Which of these choices is best, and how are they evolving?  That’s what we’ll look at here today.

Most container networking is done via the company VPN with no real additional tools.  Classic containers presume that applications are deployed within a private IP subnet, with only the specific addresses that are meant to be shared exposed explicitly.  Kubernetes presumes that all containers can address one another directly, but they’re not necessarily exposed on the company VPN.  This setup doesn’t pose any major challenges for data center applications.
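
As a concrete illustration of that split between cluster-wide addressability and selective exposure, here’s a minimal, read-only sketch using the official Kubernetes Python client (it assumes a working kubeconfig; nothing here is specific to any vendor’s setup):

```python
# Read-only sketch: contrast cluster-internal pod addresses with the Services
# that are deliberately exposed beyond the cluster. Assumes the "kubernetes"
# Python client and an existing kubeconfig context.
from kubernetes import client, config

config.load_kube_config()   # use the current kubeconfig context
v1 = client.CoreV1Api()

# Every pod gets a cluster-routable address, whether or not anything outside
# the cluster can reach it.
for pod in v1.list_pod_for_all_namespaces().items:
    print(f"pod {pod.metadata.namespace}/{pod.metadata.name} -> {pod.status.pod_ip}")

# Only Services of type NodePort or LoadBalancer are candidates for exposure
# beyond the cluster; everything else stays on the private subnet.
for svc in v1.list_service_for_all_namespaces().items:
    exposed = svc.spec.type in ("NodePort", "LoadBalancer")
    print(f"service {svc.metadata.name}: type={svc.spec.type}, exposed={exposed}")
```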

When you cross this over with the cloud, the problem is that cloud providers give users an address space, which again is typically private IP address space.  Users can then map this (Amazon uses what it calls “elastic IP addresses”) to a standard address space on a company VPN.  The resulting setup isn’t too different from the way that data center container hosting would be handled.
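
For those who want the mechanics, here’s a hedged sketch of that mapping step using AWS’s boto3 SDK; the region and instance ID are placeholders, and a real deployment would also involve VPC routing and security-group work:

```python
# Hedged sketch of the address-mapping step using AWS's boto3 SDK. The region
# and instance ID are placeholders; a real deployment also involves VPC routing
# and security groups.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # region is an assumption

# Allocate a VPC-scoped Elastic IP address.
allocation = ec2.allocate_address(Domain="vpc")

# Bind it to a (hypothetical) instance so the company-VPN side has a stable
# target, independent of the private address behind it.
ec2.associate_address(
    AllocationId=allocation["AllocationId"],
    InstanceId="i-0123456789abcdef0",                 # placeholder instance ID
)
print(f"mapped {allocation['PublicIp']} onto the instance's private address")
```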

Hybrid clouds often won’t stress this approach either.  If the applications divide into permanent data-center and cloud pieces, then the combination works fine.  The only requirement is that the cloud and data center pieces should be administered as separate Kubernetes domains and then “federated” through a separate tool (like Google’s Anthos).
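
To make “separate Kubernetes domains” concrete, here’s a small sketch (the context names are my own assumptions) of what independent administration looks like before any federation tooling is layered on top:

```python
# Two independent API clients, one per cluster, each administered on its own;
# a federation layer (Anthos or similar) would sit above both. The kubeconfig
# context names below are assumptions.
from kubernetes import client, config

dc_api = client.CoreV1Api(
    api_client=config.new_client_from_config(context="datacenter-cluster"))
cloud_api = client.CoreV1Api(
    api_client=config.new_client_from_config(context="cloud-cluster"))

for label, api in (("data center", dc_api), ("cloud", cloud_api)):
    nodes = api.list_node().items
    print(f"{label}: {len(nodes)} nodes, managed as its own Kubernetes domain")
```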

The rub comes when components deploy across cloud and data center boundaries, so as to treat the sum of both cloud(s) and data centers as a single resource pool.  This can create issues because a new component instance might now be somewhere the rest of its subnetwork isn’t.  It can still be made to work, but it’s not ideal.

The final issue with the standard IP model is that user access to cloud applications typically has to go through the VPN, and in multi-cloud it’s difficult to mediate access to an application that might have instances in several clouds.

The second network option for containers and the cloud is to use a virtual network, which Kubernetes supports.  The positive of the virtual-network approach is that the virtual network can contain all the Kubernetes containers, retaining the property of universal addressability.  If the virtual network is supported across all the clouds and data centers, then everything is addressable to everything else (unless you inhibit that), and if something is moved it can still be reached.  With some technologies, it may even be possible for redeployed elements to retain their original addresses.

The obvious problem with the virtual-network model is that it creates an additional layer of functionality, and in most cases that means virtual nodes to handle the virtual network, which is effectively an overlay above Level 3.  There are additional headers that have to be stripped on exit and added on entry, and this is also a function of those virtual nodes.  While the packet overhead of this may not matter to all users in all applications, hosting the virtual nodes creates a processing burden, and limits on the number of virtual-network connections per node can also impact scalability.
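
The entry/exit header handling is easy to picture with a toy example.  This is not a real VXLAN implementation, just a sketch of the wrap-and-strip work (and the per-packet bytes) those virtual nodes take on:

```python
# Toy illustration (not a real VXLAN implementation) of the overlay's entry and
# exit work: wrap the original packet in an extra header, then strip it again.
import struct

VNI = 4096  # hypothetical virtual-network identifier

def encapsulate(inner_packet: bytes, vni: int = VNI) -> bytes:
    # VXLAN-style 8-byte header: flags byte, 3 reserved bytes, 24-bit VNI + 8 reserved bits.
    header = struct.pack("!B3xI", 0x08, vni << 8)
    return header + inner_packet          # entry: header added

def decapsulate(outer_packet: bytes) -> bytes:
    return outer_packet[8:]               # exit: header stripped

packet = b"original L3 payload"
wrapped = encapsulate(packet)
assert decapsulate(wrapped) == packet
print(f"overhead per packet: {len(wrapped) - len(packet)} bytes (plus the outer IP/UDP headers)")
```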

A less-obvious problem is that virtual networks are another layer to be operationalized.  In effect, a virtual-network overlay is a whole new network.  You need to be able to correlate the addresses of things in the virtual world with the things in the real world.  Real worlds, in fact, because the correlations will be different for the data center(s) and cloud(s).  Some users like the virtual network approach a lot for its ability to independently maintain the container address space, but others dislike the complexity of maintaining the independent network.
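
A conceptual sketch of that correlation burden might look like the following; every name and address in it is invented for illustration:

```python
# Conceptual sketch of the virtual-to-real address correlation the operator has
# to maintain. Every name and address here is invented for illustration.
virtual_to_real = {
    "10.244.1.17": {                      # overlay address of one workload
        "datacenter": "192.168.20.5",     # underlay address in the data center
        "aws":        "172.31.8.40",      # VPC private address in the cloud
    },
    "10.244.2.9": {
        "datacenter": "192.168.21.11",
        "aws":        "172.31.9.3",
    },
}

def locate(overlay_ip: str, domain: str) -> str:
    """Resolve an overlay address to its real address in a given domain."""
    return virtual_to_real[overlay_ip][domain]

print(locate("10.244.1.17", "aws"))   # where that workload really lives in AWS
```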

The third option is the SD-WAN, which is a complicated story in itself, not to mention the way it might impact containers and the cloud.  SD-WAN implementations are overwhelmingly based on the virtual-network model, but unlike the generalized virtual network, an SD-WAN really extends a company VPN based on MPLS technology to sites where MPLS VPNs are either not available or not cost-effective.

Recently, SD-WAN vendors have started adding cloud networking to their offerings, which means that cloud components can be placed on the company VPN just like a branch office.  The SD-WAN “virtual network” is really an extension of the current company VPN rather than a separate address space.  Because SD-WANs extend the company VPN, they use company VPN addresses, and most users find them less operationally complex than the generalized virtual networks.

Because most SD-WANs are overlay networks just like most virtual networks are (128 Technology is an exception; they use session-smart tagging to avoid the overlay), they still create a second level of networking, a need to terminate the tunnels, and potential scalability problems because a number of specific overlay on/off-ramps are needed.  Because most SD-WAN sessions terminate in a hosting point (the data center or cloud), those incremental resources concentrate at a single point, and careful capacity planning is needed there.
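
To show why that concentration matters, here’s a back-of-the-envelope sketch; every figure in it is an illustrative assumption, not a vendor benchmark:

```python
# Back-of-the-envelope capacity sketch for a tunnel-termination hub.
# All figures are illustrative assumptions, not measurements.
sites = 400                # branch/cloud endpoints terminating at the hub
tunnels_per_site = 2       # e.g., an active and a backup overlay tunnel each
cpu_per_tunnel_pct = 0.15  # assumed CPU cost per termination, as % of one core
cores_available = 16

terminations = sites * tunnels_per_site
cores_needed = terminations * cpu_per_tunnel_pct / 100
print(f"tunnel terminations at the hub: {terminations}")
print(f"estimated cores for termination alone: {cores_needed:.1f} of {cores_available}")
```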

SD-WAN handling of the cloud elements varies across implementations.  In some, cloud traffic is always routed via the company VPN; in others, it can go directly to the cloud.  Multi-cloud is similarly varied; some implementations add a multi-cloud routing hub element, some permit direct multi-cloud routing, and some route through the company VPN.

Many virtual-network implementations have provided the plugins needed to be Kubernetes-compatible.  At least a few SD-WAN vendors (Cisco, most recently) have announced that they’ll offer Kubernetes support for their SD-WAN, but it’s not clear just what benefit this brings in the SD-WAN space; perhaps more exemplar implementations will help.

You can see that, at this point, you can’t declare a generally “best” approach to container/Kubernetes networking.  The deciding factor, in my view, is the way the various options end up handling the service mesh technology associated with cloud-native deployments.

Cloud-native implementations will usually involve a service mesh like Istio or Microsoft’s new Open Service Mesh as a means of linking all the microservices into the necessary workflows while retaining the natural dynamism of microservices and cloud-native design.  The performance of the service mesh, meaning in particular its latency, will be a critical factor in the end-to-end workflow delay and the quality of experience.  As I noted above, a few vendors (Cisco, recently) have announced Kubernetes compatibility with their SD-WAN, but nobody is really addressing the networking implications of service mesh, and that’s likely the future battleground.
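
A simple latency budget shows why mesh delay matters so much; the per-hop figures below are assumptions chosen for illustration, not Istio benchmarks:

```python
# Latency-budget sketch: per-hop sidecar delay compounds across a workflow.
# The figures are assumptions for illustration, not measured Istio numbers.
hops_in_workflow = 12       # microservice-to-microservice calls per request
sidecar_latency_ms = 2.5    # assumed proxy-added latency per hop (both sidecars)
network_latency_ms = 1.0    # assumed underlying network latency per hop

mesh_overhead_ms = hops_in_workflow * sidecar_latency_ms
total_delay_ms = hops_in_workflow * (sidecar_latency_ms + network_latency_ms)
print(f"mesh-added delay: {mesh_overhead_ms:.0f} ms of {total_delay_ms:.0f} ms end to end")
```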

One issue/opportunity that service-mesh-centric virtual networking raises is the question of addressing elements.  It’s highly desirable that components have something that approaches logical addressing; you aim a packet at a component, and the mesh or network then directs the packet either to a real component (if there’s only one instance) or to a load-balancer that selects one.  If there’s no component available, then one gets instantiated and the packet goes to that instance.  Does the network play a role in this, or is it all handled by the service mesh?  If a component has to be instantiated because nothing is available, can a newly freed-up component jump in if the instantiation doesn’t complete before something becomes available?  You can see that this isn’t a simple topic.
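
To make that decision flow concrete, here’s a deliberately simplified sketch of the resolution logic; every class and name in it is hypothetical, and a real mesh or network would add health checks, timeouts, and the race-condition handling the questions above imply:

```python
# Deliberately simplified sketch of logical addressing: resolve a component
# name to a live instance, a load-balanced choice, or a new instantiation.
# All class, name, and endpoint strings are hypothetical.
import random

class LogicalResolver:
    def __init__(self):
        self.instances: dict[str, list[str]] = {}   # component name -> live endpoints

    def resolve(self, component: str) -> str:
        live = self.instances.get(component, [])
        if len(live) == 1:
            return live[0]                   # exactly one instance: send straight to it
        if live:
            return random.choice(live)       # several instances: load-balance
        return self._instantiate(component)  # none available: spin one up first

    def _instantiate(self, component: str) -> str:
        endpoint = f"{component}-new.internal:8080"  # stand-in for a real scheduler
        self.instances.setdefault(component, []).append(endpoint)
        return endpoint

resolver = LogicalResolver()
resolver.instances["order-service"] = ["10.244.1.17:8080", "10.244.2.9:8080"]
print(resolver.resolve("order-service"))    # load-balanced pick
print(resolver.resolve("billing-service"))  # nothing live, so one is instantiated
```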

All of this also relates to the way that network functions would likely be implemented.  The control and management planes of a network are event-handlers, and lend themselves to cloud-native-and-service-mesh deployment.  The data plane likely works best if the nodes are fixed, with new instances coming only if something breaks or under strict scalability rules.  The two planes have to be functionally integrated, but their requirements are so different that they’re really parallel networks (DriveNets, which recently won one of Light Reading’s Leading Lights awards, takes this approach).

Control-/data-plane separation, IMHO, is a mandate for any cloud implementation of network functions, and both planes then have to be related to the network technology that will bind the elements together within a plane and carry coordinating messages between planes.  Since traditional protocols like IP carry control/management messages in the data plane, the network of the future may well separate them at the ingress point and rebuild the traditional mixed-model data path on egress.  That admits the possibility of an “internal” network structure that’s different from the traditional way that routers relate to each other in current IP networks.  SDN is an example of this, but only one possible example.  There’s plenty of room for invention here.
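
Here’s one hedged sketch of what that ingress-side separation could look like; the protocol and port choices are examples rather than a complete taxonomy, and the whole thing illustrates the idea, not anyone’s product:

```python
# Sketch of splitting control/management traffic from data traffic at the
# ingress point. Protocol and port choices (ICMP, BGP) are examples only.
from typing import Optional

CONTROL_IP_PROTOCOLS = {1}    # ICMP (IP protocol number 1)
CONTROL_TCP_PORTS = {179}     # BGP

def classify(ip_protocol: int, tcp_dst_port: Optional[int]) -> str:
    """Decide which of the parallel networks a packet belongs to."""
    if ip_protocol in CONTROL_IP_PROTOCOLS:
        return "control-plane"
    if tcp_dst_port is not None and tcp_dst_port in CONTROL_TCP_PORTS:
        return "control-plane"
    return "data-plane"

print(classify(1, None))    # ICMP -> control-plane
print(classify(6, 179))     # BGP session (TCP/179) -> control-plane
print(classify(6, 443))     # ordinary application traffic -> data-plane
```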

There’s a big opportunity here to go along with that invention, IMHO.  Cloud-native networking has to be efficient, agile, and exceptionally secure, and likely a virtual-network technology of some sort is the right answer.  I think vendors are starting to see the value of container- and Kubernetes-specific virtual network support, but it’s far from universal, and the great service-mesh-and-cloud-native opportunity is, for now, untapped.  It probably won’t stay that way for long.