Why Burying Costs in Bandwidth Might Be Smart

We could paraphrase an old song to promote a new network strategy: “Just wrap your opex in bandwidth, and photon your opex away.”  From the first, a lot of network design has focused on aggregating traffic to promote economies of scale in transport.  That focus has translated into both equipment designs and protocol features.  Many (including me) believe that well over three-quarters of router code is associated with things like capacity management and path selection.

How much opex could all of that represent, and how much could we save if we just buried our problems in bits?  That would depend on the path we took toward the goal, and there are several that show promise.

What’s the “shortest” or “best” alternative path in a network whose capacity cannot be exhausted?  There is none; all paths are the same.  Thus, it doesn’t make any sense to worry about finding an optimum alternative.  If we could achieve a high level of optical meshing in a network, we could simplify the task of route selection considerably.  That could reduce the adaptive routing burden, and the impact of route adaptation on operations processes overall.  An option, for sure.
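
To make that concrete, here’s a toy sketch in Python (site names and topology are made up, and this stands in for a real shortest-path implementation, not any actual router code) of how path selection collapses once every site pair has a direct, effectively unlimited path.  In a sparse topology the “best path” is something you have to compute and recompute; in a full mesh it’s always the direct hop.

```python
# Toy illustration: when every site pair has a direct, capacity-unconstrained
# path, "best path" selection collapses to the direct hop, so the adaptive
# routing machinery has almost nothing left to adapt.
from itertools import combinations

SITES = ["A", "B", "C", "D"]          # hypothetical sites

# Sparse network: path selection needs a graph search, re-run whenever links change.
sparse_links = {("A", "B"), ("B", "C"), ("C", "D")}

# Fully meshed optical network: every pair of sites is directly connected.
full_mesh = set(combinations(SITES, 2))

def neighbors(node, links):
    return [b if a == node else a for a, b in links if node in (a, b)]

def best_path(src, dst, links):
    """Breadth-first search standing in for a real shortest-path algorithm."""
    frontier = [[src]]
    while frontier:
        path = frontier.pop(0)
        if path[-1] == dst:
            return path
        for nxt in neighbors(path[-1], links):
            if nxt not in path:
                frontier.append(path + [nxt])
    return None

print(best_path("A", "D", sparse_links))  # ['A', 'B', 'C', 'D'] -- topology matters
print(best_path("A", "D", full_mesh))     # ['A', 'D'] -- the direct hop always wins
```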

Suppose we added an electrical grooming layer to transport optics, so that we had a full electrical mesh of all the sites.  We’d then have a vastly simplified routing situation; every site is a direct-connect neighbor, and so we only need to know what each neighbor can connect with at the user level.  That again slashes the routing burden, so it’s another option.
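
Here’s an equally rough sketch of what that looks like; the neighbor names and prefixes are hypothetical.  With every edge device one groomed hop away, “routing” reduces to a single lookup of which neighbor owns the destination at the user level.

```python
# Sketch of routing under a full electrical mesh (all names hypothetical):
# every edge device is a direct neighbor, so forwarding is one table lookup.
from ipaddress import ip_address, ip_network

# What each directly attached neighbor advertises at the user level.
neighbor_prefixes = {
    "edge-east":  [ip_network("10.1.0.0/16")],
    "edge-west":  [ip_network("10.2.0.0/16")],
    "edge-south": [ip_network("10.3.0.0/16")],
}

def next_hop(dest):
    """Single-lookup 'routing': every neighbor is one groomed hop away."""
    addr = ip_address(dest)
    for neighbor, prefixes in neighbor_prefixes.items():
        if any(addr in p for p in prefixes):
            return neighbor
    return None  # no neighbor claims the destination

print(next_hop("10.2.7.33"))   # 'edge-west', with no path computation at all
```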

One problem with these options is that both rely on the assumption that routing-layer functionality would be adapted to the new situation, creating a cheaper and easier-to-operate device.  That assumption flies in the face of router vendors’ interest in sustaining their own revenue streams.

A second problem is that any change in router functionality would almost surely have to be based on a standard.  Operators hate lock-in, and they’d likely see any vendor implementation of a new router layer as proprietary unless there was a standard behind it.  Since the interfaces to the devices would be impacted, even an open-source solution wouldn’t alleviate the operator concerns on this point.

Since we’ve had these two options available for at least a decade (they’re more practical now because of the declining cost of transport bandwidth that’s come with optical improvements), we have to assume that there’s no obvious, simple solution to the problems.  Let’s then look for a solution that’s less than obvious but still simple.

Technically, the challenge here is to define a standard mechanism for operating the simplified router-layer elements, and to provide a means of integrating them into existing networks so that forklift upgrades aren’t necessary.  To meet this challenge, I propose a combination of the OpenFlow SDN model, intent-model principles, and Google’s Andromeda SDN core concept.

Google built an SDN core and surrounded it with an open-router layer based on the Quagga BGP implementation.  This layer made simple SDN paths look like an opaque BGP network, an “intent model” or black-box implementation.  It has the classic inside/outside isolation, and for our purposes we could implement any open-model IP software or white-box framework to serve as the boundary layer.  The protocol requirements would be set by what the black box represented—a router in a router network, a subarea in an OSPF/IS-IS network, or a BGP network.
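
Here’s a minimal sketch of that boundary idea, with class and method names I’ve invented rather than anything taken from Google or Quagga.  The outside world sees ordinary BGP reachability; inside, each advertisement simply becomes a path programmed across the SDN core.

```python
# Minimal sketch of the black-box boundary (hypothetical names, not Google's
# or Quagga's APIs): BGP on the outside, simple flow paths on the inside.

class SdnCoreBoundary:
    """Intent-model wrapper: external peers see a BGP network, never the core."""

    def __init__(self, asn, controller):
        self.asn = asn
        self.controller = controller      # stand-in for an OpenFlow-style controller
        self.rib = {}                     # prefix -> external next hop

    def on_bgp_update(self, prefix, next_hop):
        # Externally this is a normal BGP UPDATE; internally we never run BGP,
        # we just program a direct core path toward the edge that owns the prefix.
        self.rib[prefix] = next_hop
        self.controller.install_path(dst_prefix=prefix, egress=next_hop)

    def advertised_routes(self):
        # What the black box shows the rest of the network: opaque reachability,
        # nothing about the optical/SDN structure behind it.
        return {prefix: self.asn for prefix in self.rib}


class FakeController:
    def install_path(self, dst_prefix, egress):
        print(f"core path: * -> {egress} for {dst_prefix}")

boundary = SdnCoreBoundary(asn=65001, controller=FakeController())
boundary.on_bgp_update("192.0.2.0/24", "edge-router-7")
print(boundary.advertised_routes())   # {'192.0.2.0/24': 65001}
```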

This, then, is the general model for our new router layer in a drive to exploit capacity to reduce opex: an SDN network inside a black-box representation of a router or router network.  In practice, you’d define a geography where you could create the high-capacity optical transport network, and possibly the electrical-layer grooming that would make every network edge point adjacent to every other within that transport domain.

You’d then define the abstraction that would let that transport domain fit within the rest of the IP network.  If you could do an entire BGP AS, for example (as Google did), you’d wrap things up in a Quagga-like implementation of BGP.  If you had a group of routers that were convenient to high-capacity transport optical paths, and they didn’t represent any discrete IP domain or subarea, you’d simply make them look like a giant virtual router.
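
For that second case, here’s a rough sketch of the “giant virtual router” abstraction, again with invented names.  The members’ external prefixes are pooled into one external identity, and the transport mesh inside the black box delivers traffic straight to whichever member owns the destination.

```python
# Hedged sketch of the "giant virtual router" abstraction (names invented):
# several edge routers on a meshed transport domain appear to the surrounding
# IP network as one device with the union of their external interfaces.

class VirtualRouter:
    def __init__(self, name):
        self.name = name
        self.members = {}                 # member router -> list of external prefixes

    def absorb(self, router_name, external_prefixes):
        self.members[router_name] = external_prefixes

    def external_view(self):
        # The rest of the network sees one router owning all of these prefixes;
        # which physical member actually terminates them is hidden inside.
        return {p: self.name for prefixes in self.members.values() for p in prefixes}

    def internal_egress(self, prefix):
        # Inside the black box, the transport mesh delivers the packet directly
        # to the member that owns the prefix -- no multi-hop IP routing needed.
        for member, prefixes in self.members.items():
            if prefix in prefixes:
                return member
        return None

vr = VirtualRouter("metro-east-vr1")
vr.absorb("r1", ["10.10.0.0/16"])
vr.absorb("r2", ["10.20.0.0/16"])
print(vr.external_view())              # both prefixes attributed to metro-east-vr1
print(vr.internal_egress("10.20.0.0/16"))   # 'r2'
```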

One purpose of this boundary layer is to eliminate the need for the central SDN control logic to implement all the control-plane protocols.  Centralizing them would add latency and create a risk that loss of the path to the controller could create a control-plane problem with the rest of the network.  Better to handle them with local implementations at the boundary.
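
A quick sketch of why that matters, with hypothetical names and numbers: a keepalive answered locally at the boundary works regardless of conditions upstream, while one relayed through a central controller inherits the controller’s latency and its availability.

```python
# Rough illustration of local versus centralized control-plane handling
# (all names and timings hypothetical).
import time

class BoundaryElement:
    def __init__(self, controller_reachable=True, controller_rtt_ms=40):
        self.controller_reachable = controller_reachable
        self.controller_rtt_ms = controller_rtt_ms

    def handle_hello_locally(self, peer):
        # Local implementation: answer immediately, independent of the controller.
        return f"HELLO-ACK to {peer} in <1 ms"

    def handle_hello_via_controller(self, peer):
        # Centralized alternative: every adjacency now depends on controller
        # reachability and adds a round trip of latency.
        if not self.controller_reachable:
            raise RuntimeError(f"adjacency with {peer} at risk: controller unreachable")
        time.sleep(self.controller_rtt_ms / 1000)
        return f"HELLO-ACK to {peer} after ~{self.controller_rtt_ms} ms detour"

b = BoundaryElement(controller_reachable=False)
print(b.handle_hello_locally("neighbor-AS64500"))
try:
    b.handle_hello_via_controller("neighbor-AS64500")
except RuntimeError as err:
    print(err)
```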

The benefit of this approach is that it addresses the need for a phase-in of the optical-capacity-driven approach to networking.  The risk is that it partitions any potential opex improvements, and of course having efficient operations in one little place doesn’t move the needle of network- and service-wide opex by very much.  This benefit/risk tradeoff could easily tip toward an unfavorable balance unless something is done.

A small-scope optically optimized deployment would generate a minimal benefit simply because it was small.  It could still generate a significant risk, though Google’s Andromeda demonstrates that big companies have taken that risk and profited from the result.  The point is that generally you have to have a large deployment scope to do any good, but that large scope tends to make risk intolerable.  Is there a solution?

It seems to me that you need to consider an optically dominated transformation of IP networks only after you’ve framed out an intent-model-based management framework.  That framework would, of course, have to focus on electrical-layer (meaning IP) devices, and so it’s outside the wheelhouse of not only the optical vendors but also the network operations people those vendors engage with.

When I ask operators why they don’t plan for this sort of transformation, what they say boils down to “We don’t do systemic network planning”.  They plan by layers, with different teams responsible for different layers of technology.  Things that require a harmonization of strategy across those layers are difficult to contend with, and I’m not sure how this problem can be resolved.

One possible solution, the one I’ve always believed had the most potential, was for a vendor to frame an intent-based management model and use it to do “layer harmonization”.  That hasn’t made much progress in the real world, in no small part because management itself forms a layer of network technology, and operators’ management processes are further divided between network management and operations/business management.
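
To show what I mean by layer harmonization, here’s an illustrative sketch, not any vendor’s product: a single service-level intent decomposed into per-layer intents, so the optical, IP, and operations teams each manage their own black box against a shared goal.

```python
# Illustrative intent decomposition for "layer harmonization" (the layer names,
# fields, and 100G-per-wavelength assumption are mine, purely for illustration).

SERVICE_INTENT = {"service": "metro-vpn", "capacity_gbps": 400, "availability": 0.9999}

def decompose(intent):
    # One service intent becomes three layer intents, each managed independently
    # but traceable back to the same service-level goal.
    return {
        "optical":    {"wavelengths": intent["capacity_gbps"] // 100,
                       "protection": "mesh-restoration"},
        "ip":         {"virtual_router": intent["service"] + "-vr",
                       "sla_availability": intent["availability"]},
        "operations": {"kpi": ["path-utilization", "restoration-time"],
                       "owner": intent["service"]},
    }

for layer, sub_intent in decompose(SERVICE_INTENT).items():
    print(layer, "->", sub_intent)
```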

I’d hoped that Ciena, when they acquired Blue Planet, would develop this layer-harmonization approach, and I still think they had a great opportunity, but the operators blame themselves more than Ciena or other vendors.  They think that their rigidly separated technology enclaves make it very difficult for any vendor to introduce a broad technology shift, or an idea that depends on such a shift for an optimal result.  They may be right, but if that’s the case, who is likely to drive any real transport-centric revolution?

Maybe the open-model network movement?  Open-model networking assembles pieces that are united in that they’re based on open hardware/software technology, but divided in just about every other way.  Somebody has to organize the new system of components into a network, and it might be the operators themselves or an integrator.  Whoever it is may have to deal with the whole layer integration problem, and that may lead them to finally take full advantage of the transformation that simple capacity could bring to networking.