Cloud and NFV Revolution: Arista’s CloudVision and Beyond

One of the persistent challenges SDN and NFV have faced is the conflict between their “revolutionary” label and the pedestrian applications that tend to come to light.  While there’s much new and different that could be done with either technology, most of what is done looks an awful lot like what you could do with standard devices.

One reflection of this conflict came out at a recent network conference, where a question on the loss or contamination of an SDN controller was answered by one provider with the comment that they’d fall back to legacy mode on all the devices.  That means the SDN features were being used only to augment normal adaptive device behavior.

Arista Networks is an SDN vendor that has at least taken things further than the traditional.  Their high-level strategy has been based on a single distributed network operating system, EOS, that provides uniform functionality across devices.  Just this week they announced “CloudVision”, an application of EOS to enterprise cloud computing.  There’s a lot of interesting stuff in CloudVision, not the least of which are the features it exposes that could be valuable, even critical, to NFV.  That’s true even though Arista doesn’t seem to be targeting the NFV space at all.

At a high level, CloudVision is a network automation or orchestration tool.  Arista characterizes it as a kind of third-generation approach to automation, with generation one being a do-it-yourself customized approach and generation two being something based more on DevOps tools that already exist.  Generation three, says Arista, is

If you see the cloud as a kind of static server-consolidation host, there’s not much that has to be done to networking to make it work.  The challenges arise when you make cloud processes dynamic, meaning that you can move them, replace them, or most of all scale them.  It gets worse when you design applications that create elastic relationships between work and processes, something like the grid computing of old.  If the cloud creates a kind of enormous and flexible virtual computer in the Great Somewhere, the distribution of the components of that virtual computer creates a problem with what we call state.

“State” is another way of saying “process context”.  We have discussions about state every day, when we ask somebody to do something and they say “I’m busy”.  It’s easy to see how you could represent the state of a processing system that has one element—your requested resource.  What if there are two, or a dozen?

The state problem hits network management right between the eyes because a workflow or path or whatever you’d like to call it is the result of a series of stateful behaviors.  Each switch has to participate in a flow by means of a forwarding table entry.  The state of a flow is the sum of the states of all the switches, and of course all the trunks that connect them.  The fact that there’s nowhere to go to get the real state (or perhaps there are too many places to go) means that you can’t really know the status of the workflow you’re interested in.
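To make the “sum of the states” idea concrete, here is a minimal sketch in Python.  It is purely illustrative and not anything Arista publishes: the `Health` levels, the `Element` record, and the `flow_state` function are all names I’ve made up, and the rule that a flow is only as healthy as its worst element is my assumption about how the states would combine.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    """Hypothetical health levels for a switch or trunk."""
    UP = "up"
    DEGRADED = "degraded"
    DOWN = "down"


# Ordering used to pick the worst state along a path (assumed, not vendor-defined).
_SEVERITY = {Health.UP: 0, Health.DEGRADED: 1, Health.DOWN: 2}


@dataclass(frozen=True)
class Element:
    """One participant in a flow: a switch or a trunk between switches."""
    name: str
    kind: str  # "switch" or "trunk"
    health: Health


def flow_state(path):
    """Return the worst health found among the elements on the path."""
    if not path:
        raise ValueError("a flow needs at least one element")
    return max((e.health for e in path), key=_SEVERITY.__getitem__)


path = [
    Element("leaf1", "switch", Health.UP),
    Element("leaf1-spine1", "trunk", Health.DEGRADED),
    Element("spine1", "switch", Health.UP),
]
print(flow_state(path).value)
```

The catch, of course, is that this function needs every element’s state in one place before it can answer, and that is exactly what a distributed network doesn’t give you for free.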

Arista builds an abstraction, in a sense, that represents the “virtual” or workflow network.  This abstraction can be pushed down onto devices, and as that happens CloudVision keeps the state of the devices associated with the workflows they support.  It’s now possible to recover the state of the workflow itself, which means it can be managed.  This is why the basic feature of CloudVision, the first one that Arista lists in their documentation, is centralized control of network state.
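A rough sketch of what that centralization implies, again in Python and again hypothetical: `CentralStateStore` and its methods are invented for illustration and are not the CloudVision API.  The assumption is simply that a workflow is bound to a set of devices, devices report their state to one place, and the workflow’s state can then be read back from that place.

```python
from collections import defaultdict


class CentralStateStore:
    """Hypothetical central record of device state, indexed by workflow."""

    def __init__(self):
        self._device_state = {}                    # device name -> last reported state
        self._workflow_devices = defaultdict(set)  # workflow name -> device names

    def bind(self, workflow, devices):
        """Record that a workflow abstraction was pushed down onto these devices."""
        self._workflow_devices[workflow].update(devices)
        for device in devices:
            # Until a device reports in, its state is unknown.
            self._device_state.setdefault(device, "unknown")

    def report(self, device, state):
        """Store the latest state reported by a device."""
        self._device_state[device] = state

    def workflow_state(self, workflow):
        """Recover the workflow's state from the devices that support it."""
        devices = self._workflow_devices.get(workflow)
        if not devices:
            raise KeyError(workflow)
        states = {d: self._device_state[d] for d in sorted(devices)}
        healthy = all(s == "up" for s in states.values())
        return {"workflow": workflow, "healthy": healthy, "devices": states}


store = CentralStateStore()
store.bind("web-tier", ["leaf1", "spine1"])
store.report("leaf1", "up")
print(store.workflow_state("web-tier"))
```

With only one of the two devices reporting, the workflow shows as not healthy; once `spine1` reports “up” as well, the same query returns a healthy workflow.  The point is that there is now one place to ask.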

If you have centralized state for your workflow abstractions you can manage SDN services in a way that’s very similar to how networks are typically managed today.  That’s a powerful capability for SDN networks, particularly for enterprises that are used to a given management approach and don’t want to rock the boat.  But as powerful as CloudVision is with this centralized control of network state, I think there’s greater power to be had, and it’s there that we find some capabilities that NFV and even broader cloud computing might need.

There’s more to a cloud application than the network.  Distributed components, linked with CloudVision, form a higher-level abstraction than the workflow; in effect, they form the workprocess.  If Arista could extend CloudVision from what it is today, central control of network state, to central control of process state, they could “know” all about an application and its relationship to work.

Imagine a cloud that could know when to autoscale, to replace a component, to reconfigure itself at the process hosting level to accommodate changes in traffic.  Something like that could be done by extending CloudVision to the workprocess level.  Some cloud users are already in a position where something like this would be helpful for work management reasons.

Also imagine a sure, effective, distributed way of handling load balancing, something that could reflect the needs for stateful processes and not just stateless web-like exchanges.  NFV introduces this requirement in every single application where horizontal scaling is proposed.  Without distributed load balancing that can reflect process state, you can’t really scale anything.
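Here is a minimal sketch of what “load balancing that reflects process state” could mean, under my own assumptions rather than any NFV specification or Metaswitch code.  The hypothetical `StatefulBalancer` keeps a session-to-instance map so that work belonging to an existing session always returns to the instance holding its state, while new sessions go to the least-loaded instance, including instances added by scale-out.

```python
class StatefulBalancer:
    """Hypothetical balancer that pins each session to one instance."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._load = {name: 0 for name in instances}  # instance -> active sessions
        self._sessions = {}                           # session id -> instance

    def route(self, session_id):
        """Return the instance for this session, assigning one if it is new."""
        instance = self._sessions.get(session_id)
        if instance is None:
            # New session: pick the least-loaded instance, name as tie-breaker.
            instance = min(self._load, key=lambda n: (self._load[n], n))
            self._sessions[session_id] = instance
            self._load[instance] += 1
        return instance

    def end_session(self, session_id):
        """Release the session so its instance's load count drops."""
        instance = self._sessions.pop(session_id, None)
        if instance is not None:
            self._load[instance] -= 1

    def add_instance(self, name):
        """Scale out: make a new instance available for new sessions."""
        self._load.setdefault(name, 0)


lb = StatefulBalancer(["vnf-a", "vnf-b"])
first = lb.route("s1")
second = lb.route("s2")
lb.add_instance("vnf-c")
third = lb.route("s3")
print(first, second, third, lb.route("s1"))
```

Note that the session map here lives in a single object.  In a real horizontally scaled deployment that map is itself distributed state, which is the very thing that would have to be represented centrally.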

The point here is that the distributed-state-centrally-represented concept (which is part of Arista’s DNA) is pretty significant for the future of the cloud and for NFV, but you have to extend it from where Arista focuses (the network) upward to that workprocess level.

SDN creates what are effectively virtual devices, distributed over a wide area and made up of many elements.  The state of a service in SDN is dependent on the state of the cooperative device elements that participate.  NFV and the advanced cloud applications create what are effectively virtual devices, too.  They look like giant virtual computers, and their work capabilities have to be represented centrally just as the networks’ capabilities have to be so represented.

I think we’ve missed a lot of this discussion in NFV.  Some NFV supporters, like Metaswitch, have always recognized the need for stateful load balancing in their Project Clearwater IMS implementation.  That’s good, but it would be better if the industry at large understood what Arista does at the network level, and did something about it.

As far as I can see, there’s no commitment on Arista’s part to extend CloudVision up to workprocess state management, but I think it’s something the company should look long and hard at doing.  If they can pull that off, they could offer not only the “best” approach to cloud workflow orchestration but the only way to build a truly agile cloud, and to support NFV along the way.