Can We Simplify Networks to Improve Economics?

We have network infrastructure, and we’ve had several models describing how it might be built better and made more efficient.  So far, none has really transformed networks.  We knew, through decades of FCC data on telecommunications spending, what percentage of after-tax income people were prepared to devote to network services.  We now know that the percentage hasn’t changed much.  Thus, we have a picture of network services where revenues are static and where cost controls have proven difficult.  What happens now?

I’d followed the FCC’s “Statistics of Communications Common Carriers” for all the years the report was published.  In it, the FCC offered the insight that telecom services accounted for 2.2% of disposable (after-tax) income for consumers.  The most recent information suggests the figure is now between 2.5% and 2.7%, and IMHO the difference from the number decades ago is within the margin of error for the calculations.  From this, I think we can say that there’s no credible indication that consumers will pay more over time for communications.  Business spending on networks, as all operators know, is under considerable cost pressure, and things like SD-WAN promise to increase that pressure.  Revenue gains over time, therefore, are difficult to validate.  Losses might be more credible.

Operators, again IMHO, have known this for a long time, and their goal has been to reduce costs—both capex and opex.  Capex currently accounts for about 20 cents of each revenue dollar.  Another 18 cents are returned to shareholders.  “Process opex”, meaning the operations costs associated with infrastructure and services, accounted for about 29 cents per revenue dollar in 2018, the year it peaked.  Under pressure from tactical cost-savings measures, it had fallen to about 27 cents by 2020.

Capex reductions of about 20% have been the goal of operators, meaning a reduction of 4 cents per revenue dollar.  SDN and NFV (the “First Model” of change) were both aimed at framing a different model of networking based on more open, commodity technology, but neither has produced any significant reduction in capex, and neither has been widely adopted.  Most recently, operators have looked at a combination of open-source network software and open-model “white-box” hardware (the “Second Model”).  Operators indicate that this combination appears to have the potential to constrain capex growth, and perhaps to reduce capex by 10% if all the stars aligned.  The smart planners tell me that’s not enough either.

Some operators have been exploring what I’ll call the Third Model, and I’ve mentioned it in some earlier blogs.  This model says that the network of the future should be built primarily as an enormous optical capacity reservoir, capped with a minimalist service overlay.  You reduce both capex and complexity by burying network problems in oversupply, which is becoming cheaper than trying to manage a lower level of capacity.

How much of what goes on inside an IP network relates to actually forwarding packets?  My estimate is 15%.  The rest is associated with adaptive traffic management and overall network management.  If we stripped out all of the latter stuff, could we not propose that a “router” was very much like an SDN forwarding device?  If the forwarding tables in this stripped-down white box were maintained from a central control point (which even Segment Routing proposes as an approach), the entire service layer of the network could almost be a chip, which would surely cut the capex of operators.
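
To make that concrete, here’s a minimal Python sketch of what such a stripped-down device might look like.  Everything in it is hypothetical (the ForwardingDevice and ForwardingEntry names, the install_table call); the point is only that once routing and traffic management are stripped out, a “router” is little more than a table pushed from the control point plus a lookup.

```python
# Purely illustrative sketch of a stripped-down forwarding element.  The device
# computes nothing locally; its entire table is installed by the central control
# point, in the spirit of the SDN forwarding model described above.
import ipaddress
from dataclasses import dataclass

@dataclass
class ForwardingEntry:
    prefix: str      # destination prefix, e.g. "10.1.0.0/16"
    out_port: int    # egress port on the white-box device

class ForwardingDevice:
    """Matches destinations against a centrally installed table; nothing more."""
    def __init__(self):
        self.table = []

    def install_table(self, entries):
        # The whole table arrives from the control point, kept longest-prefix-first.
        self.table = sorted(entries, key=lambda e: -int(e.prefix.split("/")[1]))

    def forward(self, dest_ip):
        for entry in self.table:
            if ipaddress.ip_address(dest_ip) in ipaddress.ip_network(entry.prefix):
                return entry.out_port
        return None  # no matching entry: drop

# Example: the control point installs two entries; the device just looks them up.
device = ForwardingDevice()
device.install_table([ForwardingEntry("10.1.0.0/16", 3), ForwardingEntry("0.0.0.0/0", 1)])
assert device.forward("10.1.2.3") == 3
```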

The capacity-reservoir approach presumes that if there’s ample capacity in the transport network, and transport-layer facilities to reroute traffic if a trunk fails, the service layer doesn’t see many (if any) changes in status, and the central controller has nothing to respond to.  That eliminates scalability issues; there can be no floods of alarms to handle, no need for mass topology changes that require resetting a bunch of forwarding tables.

The central control point would then look like what?  Answer: a simulator.  When something happens, the control point models the new conditions (which, given the enormous residual capacity, means there are only a few conditions to model) and defines the new goal state.  It then gracefully adapts the service layer.  No hurry; the transport layer has already taken care of the major issues, so we’re just tweaking things.
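
Here’s a rough Python sketch of that simulator-style loop, just to show its shape.  I’m assuming the networkx library for the “modeling” step (shortest paths over whatever topology survives), and the rest of the names are mine; a real control point would obviously be far richer.

```python
# Illustrative only: on a topology event, re-model the surviving network offline,
# derive a goal forwarding state, and roll out only the differences.  There is no
# real-time panic; the transport layer has already absorbed the failure.
import networkx as nx

def compute_goal_state(topology: nx.Graph) -> dict:
    """'Simulate' the surviving topology and derive per-node forwarding tables."""
    goal = {}
    for node in topology.nodes:
        paths = nx.single_source_shortest_path(topology, node)
        # The second hop on each shortest path becomes this node's table entry.
        goal[node] = {dest: path[1] for dest, path in paths.items() if len(path) > 1}
    return goal

def graceful_update(devices: dict, goal: dict) -> None:
    """Push only the entries that changed: no alarm floods, no mass table resets."""
    for node, table in goal.items():
        current = devices.setdefault(node, {})
        delta = {dest: nh for dest, nh in table.items() if current.get(dest) != nh}
        current.update(delta)

# Example: a three-node ring loses a link; the goal state is recomputed calmly.
ring = nx.Graph([("A", "B"), ("B", "C"), ("C", "A")])
devices = {}
graceful_update(devices, compute_goal_state(ring))
ring.remove_edge("A", "B")
graceful_update(devices, compute_goal_state(ring))
```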

From the opex perspective, this has the advantage of creating a simpler network to operationalize.  Our Second Model of white-box open networking still has the same devices that traditional IP networks do, so there’s no reason to think it has any inherent opex advantage.  The Third Model would have very little complexity to manage, and all the real logic would be in the central control point.  In most respects it would be an evolution of the original SDN/OpenFlow model, with much less pressure on the control point and much less concern about how it would scale.

We could combine this with the notion of network virtualization and intent modeling to make it better.  We could say that IP networks are made up of domains of various types and sizes.  Some domains are “domains of domains”.  If each domain had its own controller, and each controller had the objective of making its associated domain look like a single giant router, then the scalability issues would go away completely.  You would also be able to take any number of domains and translate them to the new model while retaining compatibility with existing infrastructure.
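
A tiny, purely illustrative sketch of that abstraction follows (the class and method names are mine, not any standard API): each domain controller presents its domain as one abstract node, and a “domain of domains” is just another domain built from those abstractions.

```python
# Hypothetical sketch of the domain-of-domains idea.  Each controller hides its
# domain's internals and exposes it as a single abstract "giant router"; a parent
# controller composes child domains in exactly the same way.

class DomainController:
    def __init__(self, name, edge_ports):
        self.name = name
        self.edge_ports = edge_ports    # only the domain's edge ports are visible outside

    def as_abstract_node(self):
        # The intent-style view: one node, its external ports, and a service promise,
        # with everything inside the domain hidden from the layer above.
        return {"node": self.name, "ports": self.edge_ports, "sla": "best-effort"}

class ParentController(DomainController):
    """A 'domain of domains' is itself a domain built from abstract child nodes."""
    def __init__(self, name, children):
        super().__init__(name, [p for c in children for p in c.edge_ports])
        self.children = children

# Example: two metro domains rolled up into one regional abstraction.
region = ParentController("region-east", [DomainController("metro-1", ["a1", "a2"]),
                                          DomainController("metro-2", ["b1"])])
assert region.as_abstract_node()["ports"] == ["a1", "a2", "b1"]
```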

Google did something much like this with its Andromeda SDN model, which surrounded the SDN core with a ring of “BGP emulators” that proxied the IP control-plane protocols into and out of the Andromeda core.  In effect, a control-plane packet like a BGP update becomes a request for, or an input to, control-point information.  The domain is a black box, one that looks like an AS (in this case) but is really SDN.
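
A toy sketch of the proxy idea follows; this is not Google’s code, and the names are invented.  The point is simply that the emulator turns inbound BGP announcements into control-point state and answers peers from that state, so the SDN interior looks like an ordinary AS from the outside.

```python
# Purely illustrative: a BGP-facing emulator at the domain edge.  Inbound
# announcements become input to the central control point; outbound advertisements
# are generated from controller state rather than from per-router RIBs.

class ControlPoint:
    def __init__(self):
        self.external_routes = {}       # prefix -> next hop learned from outside peers

    def learn(self, prefix, next_hop):
        self.external_routes[prefix] = next_hop

class BgpEmulator:
    def __init__(self, control_point, local_prefixes):
        self.cp = control_point
        self.local_prefixes = local_prefixes

    def on_announcement(self, prefix, next_hop):
        # A routing update from a peer is really a request to update controller state.
        self.cp.learn(prefix, next_hop)

    def advertise(self):
        # The black-box domain advertises its own prefixes, like any other AS would.
        return list(self.local_prefixes)

# Example: a peer announces a route; the controller, not a router, records it.
cp = ControlPoint()
edge = BgpEmulator(cp, ["203.0.113.0/24"])
edge.on_announcement("198.51.100.0/24", "192.0.2.1")
assert cp.external_routes["198.51.100.0/24"] == "192.0.2.1"
```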

It’s likely we could tune up SDN and OpenFlow to optimize for this model, and it’s likely we could work out a way to migrate MPLS and segment routing to this approach too, which would further define an evolution from the current router infrastructure to the Third Model.

Making the Third Model work doesn’t require much invention, then.  We have large-scale implementations of all the pieces of the puzzle.  We have transport networking that includes agile optics, using, for example, the MEF 3.0 model.  This lets the transport layer present what’s essentially a constant QoS to the service layer.  We have SDN/OpenFlow implementations.  We have the P4 programming language that lets us build forwarding devices from suitable chips.  We have central control point software, and we have suitable IP control-plane emulation tools.
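
To show how those pieces would hang together, here’s one last toy example; every name in it is hypothetical.  The transport layer absorbs the great majority of failures on its own, and only the rare residue ever reaches the control point for a leisurely service-layer adjustment.

```python
# Illustrative composition of the Third Model's pieces: an oversupplied transport
# layer that reroutes below the service layer, and a control point that only gets
# involved when the transport layer cannot absorb an event by itself.

class TransportLayer:
    """Stands in for the agile-optics capacity reservoir."""
    def __init__(self, headroom):
        self.headroom = headroom        # fraction of spare capacity

    def absorb(self, failed_trunk):
        # With heavy oversupply, rerouting below the service layer usually succeeds.
        return self.headroom > 0.2

class ControlPoint:
    """Stands in for the simulator-style central controller."""
    def replan(self, event):
        print(f"re-simulating service layer after: {event}")

def on_trunk_failure(trunk, transport, control_point):
    if transport.absorb(trunk):
        return                          # the service layer never sees the event
    control_point.replan(f"trunk {trunk} down")   # rare escalation, no alarm flood

# Example: with 50% headroom, the failure is handled entirely in the transport layer.
on_trunk_failure("metro-7", TransportLayer(headroom=0.5), ControlPoint())
```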

I think the picture would be better with some intent-based model-and-management overlay, because it would make the transformation more efficient.  The problem is that such an approach has been possible for a decade and has never been taken seriously.  It’s hard for me to promise it would be taken seriously now.  The Third Model I’ve cited has the advantage of seeking simplicity.  Maybe that’s what everyone has craved all along.