We are obviously a long way from exploiting the full potential of our revolutionary network technologies. Years into SDN evolution we still can’t build a global network with it, for example. Part of the reason is that we’ve not attacked the notion holistically. I want to continue my exploration of “lessons learned” with respect to developing a model for universal management and orchestration for the cloud, SDN, and NFV. Again, this isn’t about my ExperiaSphere activity in a direct sense, only about what’s come out of it, and what it means in that critical holistic sense.
At the high level, this is about what we could call connection networks. A connection network supports the delivery of information between member endpoints, based on the cooperative functional behavior of the network components and in conformance with handling rules that the user and the provider agree upon. A connection network is essentially an abstraction, a black box that defines its properties by the relationship between its inputs and outputs.
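To make that black-box notion concrete, here’s a minimal Python sketch of the abstraction; the class and rule names are purely illustrative, not drawn from any standard or product, and real handling rules would obviously be richer.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class HandlingRules:
    """Handling rules the user and provider agree upon (illustrative fields only)."""
    max_latency_ms: float = 100.0
    min_bandwidth_mbps: float = 10.0
    isolated: bool = True

class ConnectionNetwork(ABC):
    """Black box: defined only by the relationship between its inputs and outputs."""

    def __init__(self, endpoints: set[str], rules: HandlingRules):
        self.endpoints = endpoints  # member endpoints the network connects
        self.rules = rules          # agreed handling behavior

    @abstractmethod
    def deliver(self, source: str, destination: str, payload: bytes) -> None:
        """Deliver information between member endpoints; how is hidden inside the box."""
```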
IP and Ethernet create connection networks, based on adaptive cooperative behavior of the devices. OpenFlow and SDN can obviously replicate the behavior of IP and Ethernet, and some believe they must do that to be useful. My view is that while supporting current connection-network models is helpful in evolution from today to the future, it’s not essential even for that. All you have to do is to interoperate with legacy networks in some way. The really essential thing is that any model of SDN has to do something different, something better, or it will offer little incentive to remake infrastructure.
If you read about SDN, you’d conclude it’s a lot easier to point to things that are SDN than to things that aren’t, and not because the properties are complicated. Everything claims to be SDN these days. In part, that’s a fair claim because of my connection-network-black-box analogy. If we use that as a jumping-off point, what we’re saying when we say “SDN” is that software has the ability to generate a variety of connection models from network resources. It’s the connection models that software defines; the mechanism for creating the models inside the box is the function of management and orchestration, not in the limited NFV sense but in the broadest sense. Functionally, this opens three broad ranges of choices for implementing inside the black box.
Option one is to provide hooks and tweaks to existing network protocols that can refine or change forwarding behavior, creating software control over connection network services. This is the Cisco approach, roughly, and it has the advantage of building the future based on a fairly straightforward evolution of present devices. The challenge this approach faces is that the native behaviors of the underlying network (addressing and so forth) are still exposed.
Option two is to build an overlay structure on top of the classic three layers of networking (the other four OSI layers are end-to-end). This is what Nicira and many of the SDN players do, and the advantage it has is that it uses current, evolving, or future network devices and paths as transport resources and builds connection networks above them. That not only opens new paths of device evolution, it preserves current devices. The disadvantage is that the overlay network can only segment connectivity and manipulate forwarding policies within the range of what the transport resources are providing. A best-effort IP or Ethernet path isn’t made better by just adding a layer to it.
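As a rough illustration of that limitation (the names here, like vni, are hypothetical rather than any particular encapsulation standard), an overlay can tag and segment traffic, but whatever the transport path delivers shows through unchanged:

```python
from dataclasses import dataclass

@dataclass
class TransportPath:
    """Whatever the underlying IP/Ethernet/optical transport actually provides."""
    latency_ms: float
    loss_rate: float

@dataclass
class OverlayTunnel:
    """An overlay segment riding on a transport path."""
    vni: int                  # virtual network identifier that separates tenants
    transport: TransportPath

    def encapsulate(self, frame: bytes) -> bytes:
        # Tag the frame with the segment ID; the transport carries it as-is.
        return self.vni.to_bytes(4, "big") + frame

    def expected_latency_ms(self) -> float:
        # The overlay can segment and steer, but it inherits the underlay's
        # behavior: a best-effort path isn't improved by adding a header.
        return self.transport.latency_ms
```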
Option three is to build a new forwarding model for the network devices themselves. You would then control (in some way) the per-device forwarding process to secure the connection network behavior you wanted. The classic approach to this option is OpenFlow, which uses a central control process to build forwarding rules that add up to that cooperative behavior inside the black box. The advantage of this is that you couple connection network behavior down to where traffic is handled, which means you can control connection behavior as much as would be possible anywhere. The disadvantage is that centralized control of network behavior can radically increase the control traffic needed to manage the forwarding tables in devices. And while central control elements can in theory respond to failures by creating new routes, there’s always a risk that a problem would cut a device off from the control interface, which could then mean a long and complicated process of finding a way for our device ET to phone home.
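Here’s a toy sketch of that central-control pattern, with invented names and no real controller API, just to show the two concerns: every path change is a round of control messages, and a device cut off from the controller is stuck with stale forwarding state.

```python
from dataclasses import dataclass, field

@dataclass
class FlowRule:
    match_dst: str   # e.g. a destination prefix to match
    out_port: int    # forwarding action: send matching traffic out this port

@dataclass
class Switch:
    name: str
    table: list[FlowRule] = field(default_factory=list)
    reachable: bool = True   # can the controller still talk to this device?

class CentralController:
    """Computes routes centrally and pushes per-device forwarding rules."""

    def __init__(self, switches: list[Switch]):
        self.switches = switches

    def install_path(self, hops: list[tuple[Switch, FlowRule]]) -> None:
        # Every path change means one control message per device on the path,
        # which is where the control-traffic concern comes from.
        for switch, rule in hops:
            if not switch.reachable:
                # A failure that isolates the device from the control interface
                # leaves its forwarding state frozen until it can phone home.
                raise ConnectionError(f"{switch.name} cannot reach the controller")
            switch.table.append(rule)
```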
My experiences with SDN and NFV suggest that the best strategy overall would be a combination of all three. I think that overlay networks should be considered the “Level 3a” of the future, and that rules for connectivity should be created at this level. Application and service awareness should be created and enforced in the overlay. I also think that the overlay network should, through its own logical/virtual devices, be responsible for converting connection network requirements to transport-layer changes. These could be applied for each domain of transport network (SDN/OpenFlow in a data center, legacy devices in a branch, optical tunnels between them) in a way appropriate to the technology available.
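A per-domain translation along those lines might look like the following sketch; the domain names and actions are illustrative placeholders, not a real interface.

```python
from dataclasses import dataclass

@dataclass
class ConnectionRequirement:
    endpoints: tuple[str, str]
    bandwidth_mbps: float

def apply_to_domain(domain_type: str, req: ConnectionRequirement) -> str:
    """Translate one overlay-level requirement into a domain-appropriate action."""
    if domain_type == "openflow_dc":
        return f"install flow rules for {req.endpoints} in the data-center fabric"
    if domain_type == "legacy_branch":
        return f"provision a VPN/VLAN segment for {req.endpoints} on existing gear"
    if domain_type == "optical_core":
        return f"groom {req.bandwidth_mbps} Mbps onto an optical tunnel between sites"
    raise ValueError(f"unknown transport domain: {domain_type}")

# The overlay's logical device walks the domains a connection crosses and applies
# whichever mechanism each domain actually supports.
req = ConnectionRequirement(("hq", "branch-12"), 50.0)
for domain in ("openflow_dc", "optical_core", "legacy_branch"):
    print(apply_to_domain(domain, req))
```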
The connection network notion also offers some benefits in defining a specific relationship between SDN and the cloud or NFV. A connection network is a black box, remember. In ExperiaSphere I called that abstraction a “service model”. Any service model is a set of connection properties that can be created by anything that’s suitable, which means that the abstraction can be implemented using legacy technology, new OpenFlow devices, overlay networks, or anything else. The consumer application, which might be orchestrating a service or application experience, relies on the properties, which are then assured by the management/orchestration practices of the implementation.
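As a sketch of what that separation implies (the implementation names are invented), the consumer binds to the model’s properties while whatever sits behind the abstraction can be swapped:

```python
class ServiceModel:
    """A named set of connection properties, independent of how they're realized."""
    def __init__(self, name: str, properties: dict):
        self.name = name
        self.properties = properties

# Anything that can assure the properties is an acceptable implementation.
IMPLEMENTATIONS = {
    "legacy_vpn": lambda m: f"provision a legacy VPN for {m.name}",
    "openflow_fabric": lambda m: f"compute and install flow rules for {m.name}",
    "overlay_tunnels": lambda m: f"build overlay tunnels for {m.name}",
}

def deploy(model: ServiceModel, implementation: str) -> str:
    # The consumer sees only the model's properties; management and orchestration
    # picks and assures whatever realizes them.
    return IMPLEMENTATIONS[implementation](model)

print(deploy(ServiceModel("branch-interconnect", {"max_latency_ms": 40}), "overlay_tunnels"))
```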
Connection networks also let us think about “network services” unfettered by assumptions that simply project today’s service behaviors into future offerings. You can forward packets based on the phases of the moon or the time of day or the location of a mobile user in a cellular network or the location of that user relative to the best content cache. You can use grade of service, traffic type, application, or anything else as a metric. Describe the properties of a useful service, consider it to be a service model representing a connection network, then use MANO principles to deploy what you want.
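For instance, a forwarding-policy hook could be as simple as this sketch (the path and cache names are made up):

```python
from datetime import datetime, timezone

def choose_next_hop(user_location: str, hour_utc: int | None = None) -> str:
    """Forwarding decision driven by service policy rather than topology alone."""
    hour = datetime.now(timezone.utc).hour if hour_utc is None else hour_utc
    # Steer mobile users toward the content cache nearest their current cell...
    if user_location.startswith("cell-"):
        return f"cache-near-{user_location}"
    # ...and shift everything else to an off-peak path overnight.
    return "offpeak-path" if hour < 6 else "default-path"

print(choose_next_hop("cell-417"))                 # -> cache-near-cell-417
print(choose_next_hop("fixed-site-9", hour_utc=3)) # -> offpeak-path
```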
This approach makes migrating to new hardware a choice. If highly useful services are highly inefficient when built as overlays onto legacy infrastructure in one or more places, you replace the infrastructure there to improve efficiency. But the service keeps working all through that process. So when we think about how SDN should work, we should be thinking first of how we want connection network services to work.