One of the important issues in multi-layer networking, and in fact multi-layer infrastructure, is how things at the top percolate down to the bottom. Any higher-layer service depends on lower-layer resources, and how those resources are committed or released always matters. If the agility of lower layers increases, then so does the importance of coordination. Just saying that you can do pseudowires or agile optics, or control optical paths with SDN, doesn't address the whole problem, which is why I propose to talk about some of the key issues.
One interesting point operators have raised with me is that the popular notion of having service activities directly drive transport changes, so-called "multi-layer provisioning," simply isn't useful. The risks are excessive because it introduces the chance that a provisioning error at the service level would propagate into the transport network far enough to affect other services and customers. What operators want is not integrated multi-layer provisioning, but rather a way to coordinate transport configuration.
Following this theme, there are two basic models of retail service, coercive/explicit and permissive/implicit. Coercive services commit resources on request; you set them up. Permissive services preposition resources and you simply connect to them. VPNs and VLANs are coercive and the Internet is permissive. There are also two ways that lower-layer services can be committed. One is to link the commitment below to a commitment above, which might be called a stimulus model, and the other is to commit based on aggregate conditions, which we might call the analytic model. This has all been true for a long time, but virtualization and software-defined networking are changing the game, at least potentially.
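To make the two axes concrete, here's a minimal Python sketch of the distinctions; the class and field names are my own illustrative choices, not anything from a real operator system:

```python
from dataclasses import dataclass
from enum import Enum

class ServiceModel(Enum):
    COERCIVE = "explicit"      # resources committed on request (VPNs, VLANs)
    PERMISSIVE = "implicit"    # resources prepositioned; users just connect (the Internet)

class CommitmentModel(Enum):
    STIMULUS = "per-service"   # lower-layer commitment linked to a specific service commitment
    ANALYTIC = "aggregate"     # lower-layer commitment driven by aggregate traffic/capacity analysis

@dataclass
class ServiceOffer:
    name: str
    model: ServiceModel
    transport_commitment: CommitmentModel

# How today's familiar services map onto the two axes
enterprise_vpn = ServiceOffer("enterprise-vpn", ServiceModel.COERCIVE, CommitmentModel.ANALYTIC)
consumer_internet = ServiceOffer("consumer-internet", ServiceModel.PERMISSIVE, CommitmentModel.ANALYTIC)
```

Note that in both examples the transport commitment is analytic; the point of the discussion below is whether virtualization makes a stimulus commitment practical.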
Today, it's rare for lower network layers, often called "transport," to respond to service-level changes directly. What happens instead is the analytic model, where capacity planning and traffic analysis combine to drive changes in transport configuration. Those changes are often long-cycle because they typically depend on making physical changes to trunks and nodes. Even when there's some transport-level agility, it's still the rule to reconfigure transport from an operations center rather than with automatic tools.
There are top-down and bottom-up factors that offer incentive or opportunity to change this practice, provided it stays aligned with operator stability and security goals. At the bottom, increased optical agility and the use of protocol tunnels based on anything from MPLS to SDN allow for much more dynamic reconfiguration, to the point where network problems that would ordinarily have resulted in a higher-layer protocol reaction (like the loss of a trunk in an IP network) can instead be remediated at the trunk level. The value of lower-layer agility is clearly limited if you try to drive the changes manually.
From the top, the big change is the emergence of highly agile virtual-network technologies. Virtual networks, including those created with SDN or SD-WAN, are coercive in their service model, because they are set up explicitly. When you set up a network service you have the opportunity to "stimulate" the entire stack of service layers, not to do coupled or integrated multi-layer provisioning but to reconsider resource commitments. This is what I mean by a stimulus model, of course. It's therefore fair to say that virtual networking in any form has the potential to change the paradigm below.
There are two possible responses, then, to the way lower-layer paths and capacity are managed. One is to adopt a model where service stimulus from above drives an analytic process that rethinks the configuration of what's essentially virtual transport. An order with an SLA would then launch an analytics process that would review transport behavior based on the introduction of the new service and, if necessary, re-frame transport based on how meeting that SLA would alter capacity plans and potentially impact target resource utilization and the SLAs of other services/customers. The other is to shorten the cycle of the analytic model, depending on a combination of your ability to quickly recognize changes in traffic created by new services and your ability to harness service automation to quickly alter transport characteristics to address the changes. Which way is best? It depends on a number of factors.
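As a rough sketch of the two options, assuming a hypothetical transport_planner interface (the function and method names here are assumptions for illustration, not a real API), the difference is essentially event-driven review versus a short periodic loop:

```python
# Option 1: stimulus model -- a service order triggers an analytic review of transport.
def on_service_order(order, transport_planner):
    """Called when a new service (with an SLA) is provisioned at the service layer."""
    impact = transport_planner.estimate_impact(order.sla, order.endpoints)
    if impact.requires_reconfiguration:
        # Re-plan transport before the new traffic actually arrives.
        transport_planner.apply_capacity_plan(impact.proposed_plan)

# Option 2: short-cycle analytic model -- a periodic loop reacts to observed traffic.
def analytic_cycle(transport_planner, telemetry, utilization_target=0.7):
    """Run frequently (minutes, not months) so remediation can keep up with demand."""
    for trunk in telemetry.trunks():
        if trunk.utilization > utilization_target:
            transport_planner.add_capacity(trunk)
```

The first option acts on a commitment before traffic appears; the second only ever reacts to traffic it can already see, which is why the factors below matter.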
One factor is the scale of service traffic relative to the total traffic of the transport network. If a "service" is an individual's or SMB's personal connectivity commitment, then it's very likely that the SLA for the service would have no significant impact on network traffic overall, and it would not only be useless to stimulate transport changes based on it, it would be dangerous because of the risk of overloading control resources with a task that had no likely helpful outcome. On the other hand, a new global enterprise VPN might have a very significant impact on transport traffic, and you might indeed want to reflect the commitment such an SLA represents even before the traffic shows up. That could prevent congestion and problems, not only for the new service but for others already in place.
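One way to picture that scale test: stimulate a transport review only when a new service's committed traffic is a meaningful fraction of the capacity it will ride on. The 5% threshold below is an arbitrary placeholder for illustration, not an operator-validated figure:

```python
def should_stimulate_transport(order_bandwidth_gbps, transport_capacity_gbps,
                               significance_threshold=0.05):
    """Skip transport analytics for orders too small to matter (a single SMB link),
    but trigger them for something like a new global enterprise VPN."""
    return (order_bandwidth_gbps / transport_capacity_gbps) >= significance_threshold
```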
Another factor is the total volatility at the service layer. A lot of new services and service changes in a short period of time, reflecting a variety of demand sources that might or might not be stimulated by common drivers, could generate a collision of requests that has the same effect as a single large service user. For example, an online concert might have a significant impact on transport traffic because a lot of users would view it in a lot of places. It's also true that if services are ordered directly through an online portal rather than through a human intermediary, there are likely to be more, and faster, changes. The classic example (net neutrality aside for the moment) is the "turbo button" for enhanced Internet speed.
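Volatility can be handled with the same kind of test, applied in aggregate: even if no single order passes the scale check, a burst of small orders inside a short window might. A hypothetical sliding-window accumulator (window size and threshold are illustrative assumptions) shows the idea:

```python
from collections import deque
import time

class VolatilityWindow:
    """Accumulate recent service orders so many small changes can be treated
    as one large stimulus (e.g. thousands of viewers joining an online concert)."""
    def __init__(self, window_seconds=300, aggregate_threshold_gbps=50.0):
        self.window_seconds = window_seconds
        self.aggregate_threshold_gbps = aggregate_threshold_gbps
        self.orders = deque()  # (timestamp, bandwidth_gbps)

    def record(self, bandwidth_gbps, now=None):
        now = now or time.time()
        self.orders.append((now, bandwidth_gbps))
        # Drop orders that have aged out of the window.
        while self.orders and now - self.orders[0][0] > self.window_seconds:
            self.orders.popleft()
        # True if the window, taken together, looks like one big service change.
        return sum(bw for _, bw in self.orders) >= self.aggregate_threshold_gbps
```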
The final factor is SLA risk. Even a fast-cycle, automated, analytic model of transport capacity and configuration management relies on detecting traffic changes after they occur. If those changes ramp rapidly, then it's likely that remediation will lag congestion, which means you're going to start violating SLAs. There's also a risk that your remedy will create changes that will then require remediation, creating the classic fault avalanche that's the bane of operations.
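The avalanche point argues for damping: if automated reconfiguration reacts instantly to every swing, the remedies themselves become a source of churn. One simple way to express that, purely as a sketch with an assumed planner interface and an arbitrary settling interval, is a hold-down timer around transport changes:

```python
import time

class DampedReconfigurer:
    """Wrap transport reconfiguration with a hold-down interval so a remedy
    doesn't immediately trigger another remedy (the classic fault avalanche)."""
    def __init__(self, transport_planner, hold_down_seconds=600):
        self.transport_planner = transport_planner
        self.hold_down_seconds = hold_down_seconds
        self.last_change = 0.0

    def maybe_reconfigure(self, proposed_plan, now=None):
        now = now or time.time()
        if now - self.last_change < self.hold_down_seconds:
            return False  # still settling from the previous change
        self.transport_planner.apply_capacity_plan(proposed_plan)
        self.last_change = now
        return True
```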
I think where this ends up is that virtual networking at multiple layers will need to have layer or layer-group control, with behavior at the higher layer coupled by analytics and events to behavior at the lower layer. You don’t provision transport with services, but you do stimulate the analysis or capacity planning of lower layers when a service-layer change is announced. That lets you get out in front of traffic changes and prevent new services from impacting existing ones. Since virtual networks are explicit rather than permissive, they present a unique opportunity to do this, and it might be that the ability to stimulate transport-layer analytic processes will be a critical byproduct of virtual network services.