What’s Needed to Make AT&T’s 5G Infrastructure Initiatives a Success

AT&T shared their vision for an open-source, white-box, and 5G future in a blog post, and any time a major Tier One operator does that, it’s worth a look.  It’s fair to say that AT&T has been the most proactive of all the Tier One operators in using open technology to reduce network costs and improve service automation and agility.  I haven’t always agreed with their approach (ONAP is one example where I don’t), but I truly admire their determination.  I also admire the fact that their efforts alone, even the unsuccessful ones, are changing the dialog on the future of network infrastructure.

The AT&T blog opens with the comment that 5G demands an entirely new approach to networks, and I agree with that statement.  I think AT&T does hype the impact of 5G on IoT and other applications, but under the hype there’s a fundamental reality.  Network usage, and our dependence on network-related applications, has been increasing steadily and is certain to continue to do so.  What we want is always the same—more for less.  That’s a common buyer desire, but one that flies in the face of the reality of being a seller.  There, you need profit.

What AT&T’s statement comes down to is that we need to reduce the cost per bit of network services in order to sustain and improve current service levels (capacity and QoS) while maintaining a reasonable profit per bit for the operator.  For years now I’ve been blogging about the decline in profit per bit, a decline driven by the fact that price per bit has fallen sharply as broadband speed expectations have risen, while cost per bit has declined much more slowly.  Had the curves stayed on their 2013 track, we’d have crossed into a zone where ROI on infrastructure was too low to sustain investment.  All operators have taken measures to reduce costs faster, and AT&T is one of the leaders in that effort.
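To make the squeeze concrete, here’s a minimal sketch of the two curves.  Every number in it (the starting values and the annual decline rates) is an illustrative assumption of mine, not an actual operator figure; the point is only the shape: when price per bit falls faster than cost per bit, profit per bit eventually goes negative.

```java
// ProfitPerBitSqueeze.java
// Illustrative projection of the profit-per-bit squeeze described above.
// All numbers are assumed for illustration; only the shape matters:
// price per bit falls faster than cost per bit, so profit per bit erodes.
public class ProfitPerBitSqueeze {
    public static void main(String[] args) {
        double pricePerBit = 1.00;          // indexed to 1.00 in the base year
        double costPerBit = 0.60;           // assumed starting cost per bit
        final double PRICE_DECLINE = 0.20;  // assumed 20% annual price decline
        final double COST_DECLINE = 0.10;   // assumed 10% annual cost decline

        for (int year = 2013; year <= 2020; year++) {
            double profitPerBit = pricePerBit - costPerBit;
            System.out.printf("%d  price=%.3f  cost=%.3f  profit=%.3f%n",
                    year, pricePerBit, costPerBit, profitPerBit);
            pricePerBit *= (1.0 - PRICE_DECLINE);
            costPerBit *= (1.0 - COST_DECLINE);
        }
    }
}
```

Run with these assumed rates, the profit line crosses zero before the end of the decade, which is the “zone where ROI on infrastructure was too low to sustain investment” in miniature.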

Cost per bit has both a capital expense (capex) component and an operations expense (opex) component.  In networks today, capex accounts for about 20 cents of each revenue dollar and opex for about 31 cents.  However, that opex isn’t all associated with what most would think of as “network operations”.  The largest component is customer acquisition and retention, and those costs are driven both by the value and agility of the services and by the effectiveness of customer support.  Most operators have recognized that the easiest way to improve cost per bit is to radically reduce the cost of network infrastructure.
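To see why infrastructure is the natural target, here’s a back-of-the-envelope sketch.  The capex and opex figures are the ones cited above; the breakdown of opex into customer-facing, network-operations, and other components is an assumption I’ve added purely for illustration.

```java
// RevenueDollarSketch.java
// A back-of-the-envelope view of the revenue dollar, using the figures cited
// above: roughly 20 cents capex and 31 cents opex.  The opex breakdown is an
// illustrative assumption; the post says only that customer acquisition and
// retention is the largest single component.
public class RevenueDollarSketch {
    public static void main(String[] args) {
        double capex = 0.20;              // cited above
        double opex = 0.31;               // cited above
        double custOpex = opex * 0.45;    // assumed: acquisition/retention
        double netOpsOpex = opex * 0.35;  // assumed: network operations
        double otherOpex = opex * 0.20;   // assumed: billing, G&A, etc.

        System.out.printf("capex=%.3f  customer opex=%.3f  net-ops opex=%.3f  other=%.3f%n",
                capex, custOpex, netOpsOpex, otherOpex);

        // A 30% white-box saving on capex versus a 30% saving on network
        // operations: infrastructure is the bigger single lever.
        System.out.printf("30%% capex cut saves   %.3f per revenue dollar%n", 0.30 * capex);
        System.out.printf("30%% net-ops cut saves %.3f per revenue dollar%n", 0.30 * netOpsOpex);
    }
}
```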

NFV tried to attack infrastructure costs by hosting functions, but the problem it poses is that complexity rises quickly with NFV, increasing opex and limiting the net benefit.  A more promising approach is to control device costs directly using white-box technology.  A white box is no more complex than a traditional vendor device, so the opex impacts are minimal.  Further, if you could build white-box deployment into a good service lifecycle automation approach, you could hit both cost components at the same time.

That’s what I think AT&T is working to do.  At the Open Networking Summit, as their blog says, they announced four specific initiatives, and they’ve already been very active in service lifecycle automation, so it’s fair to say they have five pieces to their strategy that we need to consider.  We’ll look at them starting with the most 5G-specific and moving to the most general.

The first AT&T open initiative is the RAN Intelligent Controller (RIC) for 5G, based on the work of the O-RAN group.  Mobile services have long been dominated by the vendors who could provide the RAN technology, so an open initiative could potentially break the control those vendors have on 5G.  This is perhaps the most significant initiative of the group for 5G, because it could significantly reduce 5G deployment costs.  It’s also perhaps the hardest to realize, since an open RAN (New Radio, or NR, in 5G) is a combination of hardware and software, and it’s not progressing particularly fast.  Many operators tell me they don’t believe a practical deployment will be possible before 2020, and competitive pressure to deploy 5G quickly could push operators toward proprietary gear, delaying the open-RAN impact further.  The software side of the RIC is being turned over to the Linux Foundation.

The second initiative is the white-box router technology AT&T has been talking about for the last year.  These devices are intended to be service gateways for business customers, and AT&T has been deploying them in two international locations, hoping to expand quickly to over 70 new locations by the end of the year.  These routers are far cheaper than traditional proprietary routers, so AT&T can deliver more capacity for less capex.  Operationally they’re roughly equivalent to the devices they replace, so there’s no opex penalty, as there likely would have been had AT&T deployed the same logic as a VNF.

Initiative number three is the “Network Cloud” white-box switch, designed for edge missions as a data center switch.  This device is software-controlled through AT&T’s ONAP, which illustrates the extent to which AT&T is relying on ONAP for its 5G mission and for its operations automation overall.  Having a standard framework for data center switches, with switch software that’s consistent and (because it’s open-source) more controllable, is an important piece of AT&T’s data center evolution.

The final initiative is the heavy use of fiber connectivity in metro infrastructure.  If 5G will in fact require more capacity per user and per cell, and obviously more cells, then making those connections both available and fast enough to virtually eliminate the risk of congestion simplifies traffic management and operations automation significantly.

I think all of these moves, and other moves to use white-box cell routers, are both smart and likely to be effective.  The only question I have about the AT&T strategy, in fact, is whether ONAP’s architecture is up to the task.

Lifecycle automation is, at its foundation, an event-handling process.  An event signals a condition that changes the status of a service or service element and therefore requires handling.  I’ve worked on various projects for the handling of events in telecom services for about 15 years, and I developed a Java-based exemplar implementation of data-model-driven state/event coupling of events to processes.  That early work was based on the TMF’s NGOSS Contract and Service Delivery Framework (SDF) activity, and it proved the value of event-to-process mapping in creating a distributable, scalable, resilient model for service lifecycle automation.
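To show what I mean by state/event coupling, here’s a minimal sketch of the mechanism.  The states, events, and class names are hypothetical simplifications for illustration, not the TMF’s actual specifications or the structure of my original exemplar.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Minimal sketch of data-model-driven state/event coupling: each element in
// the service data model carries a state, and a table maps (state, event)
// pairs to the process that handles them.  All names here are hypothetical.
public class StateEventSketch {
    enum State { ORDERED, ACTIVATING, ACTIVE, FAULT }
    enum Event { ACTIVATE, ACTIVATED, FAILURE, REPAIRED }

    // One modeled service element; in a real system this would be a node in
    // a hierarchical service data model rather than a bare object.
    static class ServiceElement {
        String name;
        State state = State.ORDERED;
        ServiceElement(String name) { this.name = name; }
    }

    // The state/event table: (state, event) -> handling process.
    static final Map<State, Map<Event, Consumer<ServiceElement>>> TABLE = new HashMap<>();
    static {
        TABLE.put(State.ORDERED, Map.of(
            Event.ACTIVATE, e -> { System.out.println(e.name + ": deploying"); e.state = State.ACTIVATING; }));
        TABLE.put(State.ACTIVATING, Map.of(
            Event.ACTIVATED, e -> { System.out.println(e.name + ": in service"); e.state = State.ACTIVE; },
            Event.FAILURE,   e -> { System.out.println(e.name + ": deploy failed"); e.state = State.FAULT; }));
        TABLE.put(State.ACTIVE, Map.of(
            Event.FAILURE,   e -> { System.out.println(e.name + ": fault handling"); e.state = State.FAULT; }));
        TABLE.put(State.FAULT, Map.of(
            Event.REPAIRED,  e -> { System.out.println(e.name + ": restored"); e.state = State.ACTIVE; }));
    }

    // Dispatch looks up the process for the element's current state; events
    // with no entry are simply ignored in that state.
    static void dispatch(ServiceElement e, Event ev) {
        Consumer<ServiceElement> process = TABLE.getOrDefault(e.state, Map.of()).get(ev);
        if (process != null) process.accept(e);
        else System.out.println(e.name + ": event " + ev + " ignored in state " + e.state);
    }

    public static void main(String[] args) {
        ServiceElement vpn = new ServiceElement("vpn-edge-1");
        dispatch(vpn, Event.ACTIVATE);
        dispatch(vpn, Event.ACTIVATED);
        dispatch(vpn, Event.FAILURE);
        dispatch(vpn, Event.REPAIRED);
    }
}
```

Because the state and the table live in the service data model rather than in monolithic code, any process instance anywhere can pick up an event and handle it, which is what makes the approach distributable, scalable, and resilient.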

Model-driven state/event handling requires a model to do the driving, and ONAP was not designed around that principle, nor has the project so far added a model-based approach in its upgrades (I’ve asked to be briefed when they do, have asked at each release whether the feature was included, and have yet to be briefed).  It’s my view that without the model-driven approach, ONAP is just a monolithic management system.  Such a system poses a variety of risks, from integration challenges when new gear or software is introduced to scalability problems that could limit its ability to handle the flood of events that might arise from the failure of some fundamental network component.

I don’t know whether ONAP will ever become truly event-driven, and obviously I’m unlikely to be able to influence that decision.  AT&T could.  What I’d like to see now from AT&T is a push to modernize ONAP, to absorb the emerging cloud-native principles and the model-driven state/event coupling of a decade or more ago.  If AT&T can manage to do that, or make it happen (since ONAP is open-source), I think their 5G strategy will be ready for whatever happens in the 5G market.