UBS just published a brief on their SDN conference, where a number of vendors made presentations on the state of SDN, its issues and benefits, their own strategies, etc. If you read through the material you get an enormous range of visions (not surprisingly) of what SDN actually is, a consistent view that somehow SDN is right around the corner (not surprising either), and one very clear and virtually universal point: SDN and operations are highly interdependent.
The truth is that operationalization (the term I use to describe the engineering of a technology to allow it to be deployed and managed in the most cost-effective way) is the baseline problem for pretty much everything that’s important today: cloud, SDN, and NFV. The operational challenge of the future is created by the fact that the complexity of a cooperative system generally rises with the number of relationships among its elements, which often approaches the square of the number of elements. If you take a device and break it into a couple of virtual functions and a small dumbed-down data-plane element, you end up with something that could be cheaper in capital costs but will be considerably more expensive to operate, unless you do something proactive to make things easier.
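To put rough numbers on that, here is a minimal sketch that counts the worst-case pairwise relationships among n elements, n(n-1)/2. The ten-device network and the three-parts-per-device decomposition are illustrative assumptions, not figures from the conference, and a real network would sit somewhere below this full-mesh bound.

```python
def pairwise_relationships(elements: int) -> int:
    """Worst-case count of pairwise relationships among n elements: n(n-1)/2."""
    return elements * (elements - 1) // 2


# Assumed example: 10 devices, each broken into 2 virtual functions
# plus 1 dumbed-down data-plane element (3 parts per device).
devices = 10
parts_per_device = 3

before = pairwise_relationships(devices)                     # 10 * 9 / 2
after = pairwise_relationships(devices * parts_per_device)   # 30 * 29 / 2

print(f"relationships before decomposition: {before}")
print(f"relationships after decomposition:  {after}")
print(f"growth factor: {after / before:.1f}x for {parts_per_device}x the elements")
```

Tripling the element count multiplies the potential relationships by nearly ten in this worst case, which is the gap operations has to close.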
Enterprise and network operator management processes have evolved over time as networks have moved from being run through largely static provisioning to being run by managing (or at least bounding) adaptive behavior. You could visualize this as a shift from managing services to managing class of service. Arguably the state-of-the-management-art is to have the network generate a reasonable but not infinite service set, assign application/user relationships to one of these services when they’re created, and manage the infrastructure to meet the grade-of-service stipulations for each of the services in your set. Get the boat to the other side of the river and you get the passengers there too; you don’t have to route them independently.
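A minimal sketch of that assignment step might look like the following. The four classes, their thresholds, and the `assign_class` helper are all hypothetical, chosen only to illustrate the idea that each new relationship is mapped once to a member of a small service set, and that management then works on the set rather than on the relationships.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceClass:
    name: str
    max_latency_ms: float      # latency the class promises not to exceed
    min_bandwidth_mbps: float  # bandwidth the class promises to deliver


# Hypothetical service set, ordered from least to most demanding.
SERVICE_SET = [
    ServiceClass("best-effort", 500.0, 1.0),
    ServiceClass("standard", 150.0, 10.0),
    ServiceClass("premium", 50.0, 50.0),
    ServiceClass("real-time", 20.0, 100.0),
]


def assign_class(latency_ms: float, bandwidth_mbps: float) -> ServiceClass:
    """Return the least demanding class whose guarantees cover the request."""
    for cls in SERVICE_SET:
        if cls.max_latency_ms <= latency_ms and cls.min_bandwidth_mbps >= bandwidth_mbps:
            return cls
    raise LookupError("no class in the service set meets this request")


# Assumed application/user requests: (tolerable latency in ms, needed Mbps).
requests = [(200, 5), (30, 20), (600, 1), (100, 40), (200, 8)]
managed = Counter(assign_class(lat, bw).name for lat, bw in requests)
print(managed)  # operations manages these few classes, not every relationship
```

However many relationships arrive, the number of things to manage never exceeds the size of the service set. That is the boat, not the passengers.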
While class-of-service management makes a lot of sense, it’s still had its challenges and it’s facing new ones. First, there is a sense that new technologies like SDN will achieve their benefits more through personalization, meaning that the ability of the network to adapt to user-level service visions is a fundamental value proposition. Second, there is the question of how services work when they’re made up not of physical boxes in static locations, but of virtual elements that are in theory put anywhere that’s convenient, and even broken up and replicated for performance or availability reasons.
I’ve gone through these issues with both operators and enterprises in surveys, and with the former in face-to-face discussions. It’s my view that neither operators nor the enterprises who really understand the challenges (what I call the “literati”) buy into the notion of personalizing network services. The SDN or NFV of the future isn’t going to turn perhaps four classes of service into perhaps four thousand, or even four hundred. Not only do buyers think this won’t scale, they think that the difference among the grades becomes inconsequential very quickly as you multiply the options. So that leaves the issue of dealing with virtual elements.
What seems inevitable in virtual-element networks, including elastic applications of cloud computing and NFV as well as SDN, is the notion of derived operations. You start derived operations with two principles and you cement them into reality with a third.
The first of the two starting principles is that infrastructure will always have to be policy-managed and isolated from connection dynamism. There is a very good reason to use OpenFlow to control overlay virtual SDN connectivity (virtual routers and tunnels): it lets you shape, easily and flexibly, the connection network that builds the customer-facing notion of application network services. There’s no value in trying to extend application control directly to lower layers. You may use OpenFlow down there, but you use it in support of traffic policies and not to couple applications to lower OSI layers directly. Too many disasters happen, operationally speaking, if you do that. We need class-of-service management even when it’s class-of-virtual-service.
The second of the two starting principles is that a completely elastic and virtual-element-compatible management model is too complicated for operations personnel to absorb. We could give a NOC a screen that showed the topology of a virtual network like one built with NFV, or a view of their virtual hosts for multi-component applications. What would they do with it? In any large-scale operation, the result would be like watching ants crawl around the mouth of an anthill. It would be disorder to the point where absorbing the current state, much less distinguishing correct from incorrect operation, would be impossible. Add in the notion that class-of-service management and automatic fault remediation would be fixing or changing things as problems arise, and you have visual chaos.
The unifying principle? It’s the notion of derived operations. We have to view complex virtual-element systems as things that we build from convenient subsystems. Infrastructure might be a single subsystem or a collection of them, divided by resource type, geography, etc. Within each of the subsystems we have a set of autonomous processes that cooperate to make sure the subsystem fits its role by supporting its mission. People can absorb the subsystem detail because it’s more contained in both scope and technology.
There’s a second dimension to this, and that’s the idea that these subsystems can be created not only by subsetting real technology based on real factors like geography, but also by grouping the stuff in any meaningful way. The best example of this is service management. A service is simply a cooperative behavior induced by a management process. Whether services represent real or virtual resources, the truth is that managing them by managing the resources that are cooperating has never worked well. Box X might or might not be involved in a given service, so whether it’s working or not may not matter to that service. Boxes Y and Z may be involved, along with the path that connects them, but looking at the state of any one of the boxes or paths won’t by itself communicate whether the service is working properly. You need to synthesize management views, to create at the simplest level the notion of a service black box that presents a management interface as though it were a “god box” that did everything the service included. The state of that god box is the derived state of what’s inside it. You orchestrate management state just the way you provision services, and that’s true both for services using real devices and services made up of virtual elements. The latter is just a bit more complicated, more dynamic, but it still depends on orchestrated management.
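As a minimal sketch of what derived state could mean in practice, the code below models a service black box whose state is computed from its members. The `State` values, the class names, and the worst-member-wins rule are all assumptions made for illustration; a real derivation rule would depend on the service, for example on whether members are redundant.

```python
from enum import IntEnum


class State(IntEnum):
    UP = 0
    DEGRADED = 1
    DOWN = 2


class Resource:
    """A real or virtual element (box, path, virtual function) with its own state."""

    def __init__(self, name: str, state: State = State.UP):
        self.name = name
        self.state = state

    def derived_state(self) -> State:
        return self.state


class ServiceBox:
    """A 'god box' whose state is derived from whatever is inside it."""

    def __init__(self, name: str, members):
        self.name = name
        self.members = list(members)  # resources or nested ServiceBox objects

    def derived_state(self) -> State:
        # Assumed rule: the box is only as healthy as its worst member.
        return max((m.derived_state() for m in self.members), default=State.UP)


box_x = Resource("X", State.DOWN)       # down, but not part of the service
box_y = Resource("Y")
box_z = Resource("Z", State.DEGRADED)
path_yz = Resource("path Y-Z")

access = ServiceBox("access subsystem", [box_y, box_z])
service = ServiceBox("customer service", [access, path_yz])

print(service.name, "is", service.derived_state().name)
```

Box X is down, yet the service reports only the degradation that comes from Z, because X is not inside the box. Since a `ServiceBox` can hold other boxes, the same derivation works for subsystems defined by geography or resource type as well as for services.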
So the lesson of the SDN conference? We still don’t understand the nature of software-defined or virtual networks at the most important level of all: the management level. Until we do, we’re going to be groping for business cases and service profits with less success than we should be, and often with no success at all.