We’re all familiar with the notion of “layered networks,” but one of the things that’s sometimes overlooked when considering these age-old concepts is that multiple layers often beget multiple connection topologies, service policies, and so forth. In today’s world we’re not thinking of layers as much in terms of protocol layers as in terms of service-model layers, but the issue of layer-specific topologies and policies is still as much a bone of contention as ever. And it may be complicated by the fact that we’re likely optimizing the wrong thing these days.
Virtualization follows the concept of abstraction and instantiation. When you apply it to networks, you start by visualizing a given service as being fulfilled by a black box that connects inputs and outputs. That box is given a mission by the layer above it, and it fulfills that mission in many cases with its own virtualization process. Inside a black box is another black box; it’s like the bear going over the mountain.
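To make that box-in-a-box idea concrete, here’s a minimal Python sketch (every name in it is hypothetical) of a service abstraction that fulfills its mission by handing work to its own inner black boxes, which may do the same thing in turn.

```python
from abc import ABC, abstractmethod

class BlackBox(ABC):
    """A layer's view of the layer below: a mission in, a realization out."""

    @abstractmethod
    def instantiate(self, mission: dict) -> dict:
        """Fulfill the mission and return an opaque description of the result."""

class LeafBox(BlackBox):
    """Bottom of the nesting: something that would actually commit real resources."""

    def __init__(self, name: str):
        self.name = name

    def instantiate(self, mission: dict) -> dict:
        # In a real system this would program devices; here we just record the request.
        return {"realized_by": self.name, "mission": mission}

class CompositeBox(BlackBox):
    """A black box that fulfills its mission via its own inner black boxes."""

    def __init__(self, name: str, inner: list[BlackBox]):
        self.name = name
        self.inner = inner

    def instantiate(self, mission: dict) -> dict:
        # The layer above sees one abstraction; in this toy, each inner box
        # simply receives the whole mission, and the nesting is invisible above.
        return {
            "realized_by": self.name,
            "parts": [box.instantiate(mission) for box in self.inner],
        }

# A "service" layer whose black box is itself made of black boxes.
service = CompositeBox("ip-subnet", [
    CompositeBox("enclave-A", [LeafBox("controller-A")]),
    LeafBox("enclave-B"),
])
print(service.instantiate({"service": "subnet", "cidr": "10.0.0.0/24"}))
```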
The issue of nesting black boxes or supporting hierarchies of service models comes up today mostly in two places. One is at the operations level, where the “services” of the network are often abstracted to simplify how much work has to be done integrating OSS/BSS processes with individual devices. The other is in SDN services, where intermediary layers insulate generalized control logic from specific topologies or even implementations.
Multi-layer abstraction is a useful tool. Look at the operations example: if an OSS/BSS has to understand how to route packets via OpenFlow, you explode the complexity of the high-level operations software to the point where it may be difficult to make it work at all, and you risk having what might be called “service-level” decisions create structures at the connection level that don’t even make sense. You also create a situation where a change in service topology that happens to involve a different part of the network could end up changing a bunch of high-level operations tasks, simply because that part of the network used a different implementation of SDN or a controller with different northbound APIs.
OpenStack’s network API, Neutron, is a good example of multi-layer abstraction in action. You supply Neutron with connection models—you say, for example, “create a subnet”—and Neutron does that without bothering you with the details. Underneath Neutron is often an OpenFlow controller that may have its own abstractions to help it translate a request into specific paths. And above Neutron is the whole cloud thing, with its top-level picture of a cloud service as being a set of hosts that become visible on an IP network like the Internet.
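As a concrete (if simplified) illustration, here’s a short sketch using the openstacksdk Python library; the cloud name, network name, and addressing are placeholders. The point is that the caller expresses a connection model and never sees how the plugin underneath, which might well be an OpenFlow controller, realizes it.

```python
# Minimal sketch using openstacksdk; "mycloud" is a hypothetical clouds.yaml
# entry and the CIDR is a placeholder. Error handling is omitted for brevity.
import openstack

conn = openstack.connect(cloud="mycloud")

# Ask Neutron for a connection model: a network, and a subnet on it.
network = conn.network.create_network(name="app-net")
subnet = conn.network.create_subnet(
    network_id=network.id,
    ip_version=4,
    cidr="10.0.0.0/24",
    name="app-subnet",
)

print(subnet.id)
# How this becomes flow entries, VLANs, or overlay tunnels is hidden below Neutron.
```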
Of course, not everything that’s logical and easy is good, and that’s the case with this nested-abstraction, box-in-a-box approach. The problem with nesting abstractions is that it becomes harder to apply policies that guide the total instantiation when details are hidden from one layer to the next. Let’s look at an example. You have a cloud with three data centers and three independent network vendors represented, so you have three network enclaves, each with a data center in it. You want to host a multi-component application in the cloud, and a single high-level model abstraction might be our familiar IP subnet. However, the “best” data center and server might depend on characteristics of the individual network enclaves and on how they happen to be interconnected, loaded, and so forth. If your high-level process says “subnet” to the next layer, which then divides the request among the resource pools it sees, how do you know you picked the best option? The problem is that each network knows its own internal optimality but not that of the hosting resources it connects, and not that of the other networks.
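Here’s a toy sketch of why that blind delegation can miss the mark; the enclaves, costs, and numbers are invented purely for illustration. Each enclave ranks only its own hosting cost, while the transit cost the service will actually incur is visible to neither layer in full.

```python
# Hypothetical illustration: three enclaves, each knowing only its own
# internal hosting cost, while the transit cost from the user's point of
# connection is not aggregated anywhere.
enclave_hosting_cost = {"A": 5, "B": 7, "C": 6}    # what each enclave can see
transit_cost_from_user = {"A": 9, "B": 1, "C": 4}  # what nobody rolls up

# Layered decision: the high level says "subnet," and the enclave layer picks
# the cheapest enclave by the only metric it has.
layered_choice = min(enclave_hosting_cost, key=enclave_hosting_cost.get)

# Global decision: the true cost is hosting plus transit.
total_cost = {e: enclave_hosting_cost[e] + transit_cost_from_user[e]
              for e in enclave_hosting_cost}
global_choice = min(total_cost, key=total_cost.get)

print(layered_choice, total_cost[layered_choice])  # A 14
print(global_choice, total_cost[global_choice])    # B 8
```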
It would be possible for a higher layer to obtain a complete map of the potential resource commitments of the layer below it, and to aggregate that into a single picture of the resource pool to allow centralized, optimized decisions. If you really had centralized SDN control over an entire network domain end to end, that comes automatically, but realistically SDN is going to be a bunch of interconnected domains just like IP is, and in the real world we’re going to have server resources and network resources that have to be considered and optimized in parallel. Is that even possible?
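The “export a map upward” alternative looks, in toy form, like the sketch below (again, names and numbers are invented). Each domain publishes a summary of what reaching its hosting points costs, and one central function merges those summaries with hosting costs and decides once. What the sketch waves away is the hard part: keeping those exported maps current and folding network and server optimality together at scale.

```python
# Toy sketch of aggregating lower-layer summaries into one resource picture.
# Domain names, data center names, and costs are all invented.
network_maps = {
    "domain-A": {"dc-A": 9},
    "domain-B": {"dc-B": 1},
    "domain-C": {"dc-C": 4},
}
hosting_costs = {"dc-A": 5, "dc-B": 7, "dc-C": 6}

def centralized_choice(network_maps, hosting_costs):
    # Merge every exported map with hosting cost into one view, then pick once.
    merged = {}
    for summary in network_maps.values():
        for dc, path_cost in summary.items():
            merged[dc] = path_cost + hosting_costs[dc]
    return min(merged, key=merged.get)

print(centralized_choice(network_maps, hosting_costs))  # dc-B
```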
Not completely, and so what we’re really trying to figure out in this onrushing cloud, SDN, and NFV era is how much it matters. The benefit of optimization depends on the cost of inefficiency, and that depends on how different the resource cost of various network paths or hosting points might be. If there are five data centers with good capacity in a metro area, for example, and if we assume that network bandwidth within that metro area is fairly evenly distributed, you could probably presume that the best place to host something is the center closest to the point of connection; it diverts traffic less and produces lower latency and risk of disruption.
But how much work is being done trying to get the “best” answer when the difference between it and every other answer is well below statistical significance? How much complexity might we generate in a network, in a cloud, by trying to gild the lily in terms of optimization? How much opex could we build up, only to overwhelm our marginal resource cost savings? Operators and enterprises alike are increasingly oversupplying capacity where unit cost is low (inside a data center, where fiber trunks are available, and so on) to reduce operations costs. Given that, is it sensible to try to get the “best” resource assignment? I asked, rhetorically, several years ago what the lowest-cost route would be in a network with zero unit bandwidth cost. The answer is “There isn’t any,” because all routes would cost the same: nothing. We’re not there yet, but we have to start thinking in terms of how we deal with opex-optimized networks rather than resource-optimized networks.