A Tech-Conference Vision of the Future of Networking

It’s always smart to get a broad view of the future, and conferences can provide that. Network World published a good article, “Open source, programmability, and as-a-service play a big role in future networks,” covering what happened at the Future:Net 2021 symposium, so we’ll start a two-part view of the future by analyzing what the article reported. As always, I’ll offer my comments on the points, and we’ll follow this blog with the results of my own attempt to get a vision of the future from the stakeholders, including users and suppliers.

One early point the article makes is that networks and networking are going to be progressively more software-centric over time, and that this trend is at least strongly influenced by (if not largely created by) open-source software. That raises an interesting chicken-and-egg question: is software driving this change, or is a desire for more flexibility driving software? That’s our first point of discussion.

The case for a software-centric shift is largely based on the fact that networks have been under cost pressure for at least a decade. Users have traditionally spent a combination of budget dollars and project dollars on networks, with the former sustaining what’s there and the latter bringing new business cases, new technology, and new spending. Over the last decade, the share of project dollars has declined sharply, meaning that networks today are largely in sustaining mode, and that’s all about cost conservation. Open network technology is a smart way to conserve costs, and whether it’s hosted on white boxes or on servers, that approach demands software. Open source is a popular option because it’s free.

The case for a flexibility-based shift is based on the fact that network technology is changing at the functional level, and that these sorts of changes are more easily accommodated with software, even when the software and hardware are bundled and proprietary. Recently, though, there’s been a push for “disaggregated” network devices, with software and hardware unbundled. In theory, at least, that opens the possibility that a user might switch network software to gain flexibility for existing devices. Once that realization hits, it’s a smaller step to depend on open-model networking with white boxes and open-source software.

I think this one can be called a kind of co-evolution relationship. Both the software-centric and flexibility-centric drivers have been around long enough that users probably can’t tell which of the two came first for them, or separate their impacts on current planning. The net is that the statement is true; we’re increasingly thinking network software, and open-source at that, for network needs.

Almost everything that runs software needs an operating system. One role that the OS plays is abstracting hardware so that different implementations of the same thing (like different disk drives or graphic chips) are harmonized. One interesting thing about switches is that the OSs tend to be network operating systems or NOSs, meaning the network functionality is built in.
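To make the abstraction point concrete, here’s a purely illustrative Python sketch of what a NOS-style hardware abstraction layer buys you; the class and method names are hypothetical and don’t correspond to any real NOS or chip SDK.

```python
from abc import ABC, abstractmethod

class SwitchASIC(ABC):
    """Hypothetical hardware-abstraction interface a NOS might define."""

    @abstractmethod
    def program_route(self, prefix: str, next_hop: str) -> None:
        ...

class VendorAChip(SwitchASIC):
    def program_route(self, prefix: str, next_hop: str) -> None:
        # Vendor-specific SDK calls would go here.
        print(f"VendorA chip: {prefix} -> {next_hop}")

class VendorBChip(SwitchASIC):
    def program_route(self, prefix: str, next_hop: str) -> None:
        print(f"VendorB chip: {prefix} -> {next_hop}")

def install_default_route(asic: SwitchASIC) -> None:
    # The routing stack codes to the abstraction, not to the chip.
    asic.program_route("0.0.0.0/0", "10.0.0.1")

install_default_route(VendorAChip())
install_default_route(VendorBChip())
```

The point is simply that the switching/routing logic above the abstraction doesn’t change when the hardware underneath it does.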

There are probably two dozen white-box NOSs available, and SONiC is surely the most visible of them. It has two big advantages, in fact. The first is that Microsoft uses SONiC in Azure infrastructure, so the NOS is tested under real-world, high-scale, high-traffic conditions. The second is that Microsoft (who developed it) donated it to the Open Compute Project, and so it’s been jumped on by a large number of vendors, including Broadcom, who makes the switching chips used in the great majority of high-performance white-box designs.

SONiC is designed with the standard NOS goal of abstracting the hardware, and that includes the switching chips used in high-performance white boxes. That makes SONiC switching/routing software portable to almost any popular white-box configuration. However, some users are concerned about support, since SONiC is an OCP project and doesn’t have a specific corporate source to fall back on. There are other NOSs used by enterprises, and right now SONiC doesn’t have even half the market. According to enterprises I talk with, the main alternatives include Arista (the EOS family), Arrcus (ArcOS), and Pluribus (Netvisor ONE).

The key point for enterprises is integration and support. Most enterprises are best served by finding a white-box supplier who has the products they need, and letting them provide a device and bundled network OS. For operational reasons it’s best to get the same NOS for everything, and that consideration means it would be smart to look for a product source that matched both current and future demand.

The network operator community might prefer another option, like the ONF Stratum NOS, because Stratum includes P4 flow-programming language support that aligns it with more new switch platforms. Broadcom, though, doesn’t support P4 (that might change with their recent anti-trust settlement but there’s no reference to it that I’ve seen), so you can see that there’s already a bit of parochialism in the open-model, white-box world.

The article moves then to what I think is the most interesting topic, which is the relationship between networks and network applications. Today, we operate networks to be largely application-independent, and network requirements are loosely set by aggregating the requirements of the applications. The future, according to the article and the conference, is one where we program application-specific behaviors rather than the network, and the summation of behaviors happens inside the cloud. Networking is a service, a NaaS, in short.

This is, IMHO, another way of talking about the whole SASE concept. A SASE is a gateway to a collection of services, a place where application needs are explicitly recognized and commissioned via the SASE/NaaS gateway. Thus, each application tells the box what it needs, and the box brokers it. This, obviously, means that the box likely doesn’t have much of an idea of what the network collective is, and that means that for this to scale the operations process has to be automated.
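As a minimal sketch of that brokering idea, consider the Python below. Everything in it is hypothetical, the class names, the service list, and the matching logic; no actual SASE product exposes this API. It just shows an application declaring what it needs and a gateway picking a service that satisfies it.

```python
from dataclasses import dataclass

@dataclass
class AppRequirement:
    app: str
    max_latency_ms: float
    min_bandwidth_mbps: float

class NaaSGateway:
    """Hypothetical SASE/NaaS broker: apps declare needs, the gateway picks a service."""

    def __init__(self, services):
        # services: list of (name, latency_ms, bandwidth_mbps) tuples it can commission
        self.services = services

    def broker(self, req: AppRequirement) -> str:
        for name, latency, bandwidth in self.services:
            if latency <= req.max_latency_ms and bandwidth >= req.min_bandwidth_mbps:
                return name
        return "no-service-meets-sla"

gw = NaaSGateway([("mpls-vpn", 30, 100), ("sd-wan-internet", 60, 500)])
print(gw.broker(AppRequirement("erp", max_latency_ms=50, min_bandwidth_mbps=50)))
```

Note that the gateway never needs a picture of the whole network to do this, which is exactly why the operations behind it have to be automated.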

The way that happens, said the conference, is that a goal state is recognized and AI/ML learns to achieve it from whatever state an issue might put the network in. Since the network really doesn’t have a collective SLA (that’s an application property), this has to be able to address both the application-specific SLA and the collective network behaviors. All the more reason to look for automation!
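Here’s a toy reconciliation loop, purely to illustrate the goal-state concept; a real AI/ML system would learn the remediation policy rather than hard-code it in a table, and none of these names correspond to an actual product.

```python
def reconcile(observed: dict, goal: dict, remediations: dict) -> list:
    """Compare observed state to goal state and return the actions to take."""
    actions = []
    for metric, target in goal.items():
        if observed.get(metric) != target:
            # Look up the action that's expected to move this metric toward goal.
            actions.append(remediations.get(metric, f"escalate:{metric}"))
    return actions

goal = {"path_up": True, "latency_class": "low"}
observed = {"path_up": True, "latency_class": "high"}
remediations = {"latency_class": "reroute-via-alternate-metro-path"}
print(reconcile(observed, goal, remediations))
```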

This could lead to hardware-as-a-service for things like data center and LAN switches that cannot be virtualized and delivered through SASE/NaaS in a cloud-like way. Hardware as a service would mean that users would pay on subscription for the hardware ports they need, and the HaaS provider would provision things and sustain operations. Obviously, this sounds a lot like Cisco’s concept, and I didn’t agree with it when Cisco touted it, and I still don’t. I think that as SASE/NaaS evolves, there are likely to be management services offered from the cloud to control and sustain local network and data center switches. I don’t think most enterprises will accept the obvious security concerns this would create, though, and I don’t think ports-as-a-subscription-service is likely either.

Conferences are always a balancing act. On the one hand, they have to address things that users actually care about, but on the other they have to 1) be exciting and 2) favor the positioning and interests of the vendors who drive the process and provide speakers. It’s been my experience that the further the talk advances into the future, the more the excitement-and-vendor-interest drivers influence what’s said. That seems to be the case here.

I didn’t attend this conference (those who know me realize I almost never attend any conferences), so I’m happy to get a good summary of the key positions. I think there’s a lot of truth in them, but some of the views don’t match market views I’ve heard. Tomorrow I’m going to talk about what enterprises, vendors, and providers themselves say about key technologies.

The Telco “Crossroads” is an Illusion

Are operators really at a crossroads? They’ve suffered through continuous profit-per-bit pressure on traditional network services for over a decade. Improving that means enhancing revenue per bit, reducing cost per bit, or maybe both, and the choice operators make will have a major impact on the revenues vendors can expect, and which vendors can expect them. It will also impact just what technology innovations we can expect, and in what areas they’re likely to appear. In short, it could mean a lot, and while we’ve been stumbling around on this crossroads issue seemingly forever, I think there are some clear signals emerging.

Light Reading offered us a good starting point with an analysis of the recent AT&T deals with Microsoft and Google. Those deals represent telco/cloud relationships, and obviously some new thinking on the part of AT&T. So it’s fair to ask whether these deals are cost-driven, revenue-driven, or both. I agree with some of the points, but not all.

The article says that “many feel as though the telecom industry has outsourced innovation to others – most notably the West Coast technology giants, including the hyperscalers. These companies tend to adopt new technology more quickly than the telcos…but I don’t know that we should expect them to innovate on the services front any more efficiently than the telcos, who have the knowledge the others don’t.” The problem with this statement is that it presumes that “the services front” lies in areas where telcos have knowledge, and I don’t think that’s true.

Pushing bits is pushing bits at the technical level. If they go from “A” to “B”, the method of transport matters only insofar as it relates to cost. Yes, we’ve talked about elastic bandwidth and stuff like that, but while those discussions are decades old, they’ve never discovered any really new and useful strategy. What’s valuable to the user in this space is what’s cheaper, and what’s cheaper to the user isn’t going to improve profit per bit for the supplier. The fact is that the service front, or perhaps “frontier,” is really at a higher layer than today’s services, and it’s value added there that could generate new revenue. For that, the cloud providers surely have more knowledge than the telcos do.

The two AT&T relationships with public cloud providers seem aimed at both sides of the profit-per-bit equation, cost and revenue. Microsoft’s deal seems aimed at creating hosted 5G service elements on the public cloud, in combination with an unspecified amount of telco-provided edge hosting. Google’s seems to focus on applications that would build on those 5G connection services, which clearly live at that higher layer we always hear about. The question with regard to Microsoft is whether costs are controlled in the long term, and with regard to Google whether the revenues are, first, credible and, second, destined for the telcos or for the cloud providers.

Cost management through outsourcing 5G hosting to the public cloud trades avoiding the “first cost” of deploying resources against the longer-term cost created by the difference between what operators would pay a public cloud provider and what the same capacity would cost on their own infrastructure. We could expect that operators would pay about 25% more for cloud provider services versus their own cost. In the short term, when operators like AT&T would have to deploy a couple hundred metro data centers, cash flow would likely be better using the public cloud.
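To show the shape of that trade-off, here’s a deliberately simplified Python model. Every figure in it is a placeholder I picked to illustrate the crossover behavior, not AT&T’s (or anyone’s) actual costs, and the amortization treatment is crude by design.

```python
def cumulative_cost(years, diy_build_capex, diy_annual_opex, cloud_annual_fee):
    """Compare cumulative cash outlay for DIY hosting vs. renting public cloud capacity."""
    diy, cloud, rows = 0.0, 0.0, []
    for year in range(1, years + 1):
        if year == 1:
            diy += diy_build_capex       # the "first-cost" hit lands up front
        diy += diy_annual_opex
        cloud += cloud_annual_fee        # roughly DIY running cost plus a provider margin
        rows.append((year, round(diy, 1), round(cloud, 1)))
    return rows

# Placeholder figures in arbitrary $M units; cloud looks better early, DIY later.
for year, diy, cloud in cumulative_cost(8, diy_build_capex=200,
                                        diy_annual_opex=100, cloud_annual_fee=150):
    print(f"Year {year}: DIY={diy}  Cloud={cloud}")
```

Run that and the cloud path is cheaper on a cumulative basis for the first few years, then falls behind, which is the general pattern my modeling suggests.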

Most of the operators deploying 5G are doing so within their home regions, meaning that they have a market geography that corresponds to the physical distribution of facilities. Larger operators in competitive areas like the US, EU, and parts of Asia have market geographies that extend beyond the limits of their normal facilities. AT&T is obviously one; they have a home region in the western/southern US but a wireless presence throughout the country. It would be expensive to deploy data center hosting resources for edge computing and 5G over even an operator’s own region, much less their likely larger market geography. Getting started with the public cloud makes sense.

Over time, the balance shifts; my model says that by 2026 or so, relative cash flow for the public cloud position would slip behind the do-it-yourself choice, and the gap would widen continuously. Two factors could counterbalance this seeming long-term loss. First, nearly everyone (including most of the people I talk with in the telco world) thinks the telcos don’t know how to do their own telco-cloud hosting of 5G. Recall that AT&T has probably made as much or more progress in that space as any operator, and it’s their cloud deals we’re talking about. Second, there’s no reason why telcos couldn’t claw back their 5G hosting when they had sufficient volume of deployment and had acquired enough skill to justify it. AT&T’s deal with Microsoft plants telco-oriented skills in Microsoft’s cloud organization, but software progress in edge hosting of 5G elements would surely develop broadly over time.

That telcos don’t understand the cloud is important here, because most of the arguments against the partnerships between telcos and cloud providers are based on the assumption that the telcos are surrendering their future. That argument is weak if the telcos can’t develop what’s needed to assure their future in the first place, and as I’ve noted, that’s a view widely held even within the telcos themselves. The biggest problem the telcos have had is the notion that they’re special, that generalized cloud features can’t be harnessed to support their needs. Baloney.

Telcos (and their vendors) have resisted using general, open, technology for ages, and it’s pretty clear now that there was no technical justification for their intransigence. There’s no reason why open cloud technology can’t be pressed into service, but I do agree that there are issues related to telco hosting of services that aren’t optimally addressed in mainstream cloud-think. That’s where having a transfer of technology from AT&T to Microsoft could help. It’s like AT&T tossing the ball to Microsoft and saying “Score with this!” Since Microsoft, of all cloud providers, has already acquired a dose of telco-think via its 5G acquisitions (like my favorite, Metaswitch), they have a quicker path to enlightenment.

But Microsoft doesn’t have the same skill in evolving an MEC position, which is why I think the Google deal is smart. Google is the force behind Kubernetes, Istio, the largest core network use of SDN, and a bunch of other application-level innovations. If there is a reality to exploiting edge computing, I’d expect Google’s cloud-native supremacy would let them discover it first. Since AT&T needs speed here to increase the revenue/cost spread again, Google is the smart choice.

There is still one cost-side question, though, and that’s back to capex and opex. All network operators, but especially the telcos, will need to link their network operations tools with their service/business operations framework, the OSS/BSS in the telco world. That means that both 5G hosting (the Microsoft side) and MEC exploitation (the Google side) will require that. Will the two cloud providers cooperate to create a common framework? Hardly. Will one accept the framework of the other? Only if AT&T and other operators demand it. Thus, we still need a management framework for 5G/MEC (which current standards don’t really define, only connect with). AT&T’s vision of management, ONAP, is flawed, so getting some realistic input from some source is essential, or the short-term cost-management goals critical to this whole evolution to the cloud won’t be met.

All of this is important because telcos are not at a crossroads at all, they’re really at the junction of two lanes that go the same place in the end. There is no long-term profit-per-bit strategy that doesn’t include both a transformation in costs and a transformation in revenues. Future revenue-generating services will have to be very cost-effective to actually contribute to profits, and since these services will be more OTT-like, telcos can’t allow their core business to become a loss leader while competing with players who don’t have those losses to cover. The telcos can leverage and exploit cloud provider expertise in the near term, but they need to wean themselves off it in the long term.

Assessing the ONF’s Broader View of 5G and MEC

One thing that I think the last couple of weeks have made clear is that, while there’s a clear business relationship between 5G hosting and multi-access edge computing (MEC), the technical relationship has been difficult to pin down. One reason is that most 5G hosting discussions have centered on the 5G RAN and the open O-RAN model, while the MEC edge has to be more comprehensive. Can we find a broader perspective?

Fortunately, there are often multiple sources of insight for new technology implementations these days. For 5G, the core specifications are from the 3GPP, but as THIS blog from Red Hat shows, there have been a number of initiatives to open up the 3GPP model. The best-known of these is from the O-RAN Alliance, which has done key work on creating an open model for the RAN. A less-known source, the Open Networking Foundation (ONF), has also supplied a reference architecture relating to 5G and multi-access edge computing (MEC). Unlike other specifications, the ONF ones go all the way from high-level functions to the elements of open-model hardware, so they offer an interesting view of what a totally open 5G/MEC strategy might look like. To do that, we have to take a brief look at 5G itself.

Mobile networks have historically been made up of three high-level elements, the radio access network (RAN), the mobility management piece (Evolved Packet Core in 4G, or 5G Core in 5G), and the subscriber management framework that was supported by the IP Multimedia Subsystem or IMS in 4G. For 4G networks, these were all normally built on proprietary elements that conformed to open (3GPP) interfaces, but more often than not came from a single vendor.

5G was designed to make use of open, hosted, components, though the support for that approach was weak in the 5G New Radio (NR, which is 5G RAN). The O-RAN Alliance strengthened the open-model RAN piece considerably, and this is important because early 5G deployments are what’s called “non-standalone” or NSA, meaning that 5G sits on the IMS/EPC mobility management piece. Full 5G implementation would require 5G Core, which is just getting to the point where vendors and operators are willing to bet on implementation.

The reason 5G Core is important is that most 5G functionality, and in particular most of the elements likely to be hosted in a server pool, are part of 5G Core. As I’ve said in prior blogs, 5G feature hosting is logically concentrated in metro areas, because metro areas are the roaming range of most users, and in particular the area where sessions would have to be maintained during roaming. Because 5G Core is metro-centric, it covers the same area that edge computing would likely focus, which is what makes 5G a potential driver for edge infrastructure.

The “however” here is that when 5G RAN and O-RAN are included, 5G reaches all the way to the towers and involves places where user density is too low to justify a server resource pool. Thus, open 5G appliances are important to fully realizing an open 5G deployment, and that means that not only do we need open interfaces, but also common management practices that can span both traditional servers and specialized devices. This is one reason looking at the ONF approach is helpful; they offer a full solution architecture for open infrastructure. They also include 5G Core, which means that their work would definitely either lay the framework for MEC, or create a silo that would make MEC evolution more complicated.

The ONF approach to 5G/MEC is layered. At the top is Aether, the high-level 5G and mobile model, which sits on top of the SD-RAN and SD-Core elements that map 5G/mobile to SDN. Those in turn sit on top of SD-Fabric, which is the network-layer model. SD-Fabric is itself made up of four pieces: the ONOS SDN controller with broad capabilities, the P4 forwarding language, the PINS “adapter” element that creates a hybrid IP/SDN control plane with P4 compatibility, and the Stratum white-box operating system.

To put this into a more familiar perspective, the O-RAN specifications of the O-RAN Alliance map to the SD-RAN piece of the ONF model. The greater scope of the ONF story is one differentiator, and the fact that ONF (not surprisingly) focuses on SDN is another. The O-RAN model can be mapped to an SDN implementation of the user/data plane, but that’s not mandatory. With the ONF model, you can support a hybrid SDN/adaptive-routing model (via PINS) but SDN is the presumptive architecture.

White-box elements would normally run the Stratum operating system and the P4 language driver to provide flow-switching support for custom chips. On top of that is ONOS, which is both a distributed SDN control model and an application platform. While ONOS is modular and distributed, it’s “logically centralized”, and can be distributed to both servers and white boxes. This stack makes white boxes a logical part of the “edge cloud”, and unifies operations.

Functionally, as I’ve noted, the ONF 5G/MEC approach is a superset of the O-RAN model. At the RAN level, they are functionally compatible in that they both support a more elegant and open disaggregated RAN model than the 3GPP 5G NR RAN specifications would, they both include both near- and non-real-time RAN Intelligent Controllers (RICs) and they both include an explicit cloud-hosting element. The ONF model is what the ONF calls an “exemplar” of the O-RAN approach, so the correspondence isn’t a surprise.

The biggest difference between the two is that O-RAN allows an open-model implementation where the ONF defines one. That accounts for all the extra elements in the ONF model, and for the fact that the ONF model describes more at the implementation level, all the way down to devices. That, in turn, makes it extremely useful when we want to consider exactly how an open-model 5G deployment would look, and how it might relate to edge computing in general.

The ONF model defines the “edge cloud” in an implementation sense, and that lays out the way that edge-cloud applications are connected and sustained. It does not mandate a specific application architecture for edge-cloud applications, though it encourages modular, even cloud-native, approaches as well as supporting NFV at the connection level. The way this is done is best related by looking at the ONOS RIC.

The near-real-time RAN Intelligent Controller is the heart of the open-model O-RAN architecture. The ONF approach creates a multi-layer near-real-time RIC whose top is the A1 interface defined by O-RAN and whose bottom is the southbound E2 interface. The first layer above the bottom is the distributed ONOS core function of the RIC, which presents its own API upward to RAN applications, and those in turn feed the northbound A1 API. You can imagine the adaptation of this structure to generalized MEC requirements, but there’s no diagram of that in the ONF material (at least none I could find).

5G RAN and O-RAN specialization is built around the ONOS layer, so we could imagine a lower layer to ONOS that would present MEC connectivity APIs to underlying infrastructure, and APIs to edge applications running above and in parallel with the RAN components. The edge applications would not be expected to work like 5G RAN would work, but they’d have the same configuration, management, scaling and resilience, and connectivity options available.

This is, I think, a step forward in defining 5G as an edge application, and defining how other edge applications could be created. It’s not a complete picture because it doesn’t define the structure of such an application. 5G, arguably, isn’t a general edge application, but an application of a generalized edge, so perhaps that’s not something we should expect. What we do have is a picture of how edge applications could be structured/orchestrated, and how a 5G RIC model could be generalized to provide that.

A step, but not an arrival in force. What still remains is a framework for edge applications that addresses all the issues those applications might create. For example, can components shift from one edge to another if the user moves or user distributions in multi-user applications change? How does the optimum edge position get calculated? It seems likely that there’s another layer of tools needed to avoid application-specific silos, and we still don’t have a full understanding of what that new layer might look like. I’d sure like to see someone take a stab at it.

A Deep Dive into Edge Opportunities, Drivers, and Models

Everyone loves metro and the edge these days, it seems, and the last couple of weeks have proved that. Juniper did a big thing on their Cloud Metro strategy (see my blog) and Ciena just launched a whole campaign based on “Next-Gen Metro & Edge”, and IBM did a keynote at MWC on its telco-centric edge strategy. Every cloud provider now has an edge strategy too, so it’s obviously worth taking some time to consider why metro and the edge mean so much to so many, and whether we’re on course to realize the benefits everyone hopes for. Warning: This is a long blog!

It seems generally accepted that not only is 5G an edge application, it’s very likely the application that would justify the initial edge deployment, the step that other applications would then exploit and expand. This could be true only if 5G deployment were sufficient to create proto-edge facilities, and if those facilities were suitable for the follow-on edge missions. To see if both those conditions are met, we have to look deeper at both edge requirements and 5G features.

Most Internet users think of the Internet as a big global network touching everyone and everything, which is only sort-of-true. Yes, there is global connectivity, but in a traffic and topology sense, the Internet is really concentrated between you, the user, and a local source of content. Content delivery networks (CDNs) really started that revolution by facilitating content delivery, especially video content, through local caching. Over time, there have been initiatives to cache processes in CDNs, and this could fairly be said to be the on-ramp to what we now call “edge computing”.

Edge computing is exciting and troubling at the same time. It’s exciting because it promises to provide hosting for applications that are sensitive to latency, potentially opening a whole new set of applications not only to “the cloud” but perhaps facilitating their very development. It’s troubling because we really don’t know exactly what those new applications are, how they’d be developed, or whether they could make a business case for initial deployment and long-term use.

Practically speaking, the “edge” is more likely to be the “metro”, as it usually is with CDNs. While it’s possible to push hosting or caching outward toward the user, the reality is that forward placement quickly loses its appeal. If the edge is “close to users” then it follows that each edge point supports a smaller number of users, making its per-user cost higher. You can’t give every user a resource pool of their own, so economies of scale dictate you move that pool inward until the benefits of scale and the increases in latency balance each other.
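A back-of-the-envelope way to see that balance is sketched below. The tiers, user counts, costs, and latencies are entirely hypothetical, and the weighting is arbitrary; the only point is that per-user cost falls and latency rises as the pool moves inward, and you pick the depth where the combined penalty is lowest.

```python
def placement_score(users_served, fixed_pool_cost, latency_ms,
                    cost_weight=1.0, latency_weight=2.0):
    """Lower is better: weigh per-user pool cost against added round-trip latency."""
    per_user_cost = fixed_pool_cost / users_served
    return cost_weight * per_user_cost + latency_weight * latency_ms

# Hypothetical tiers: (label, users served by one pool, pool cost $/month, round-trip ms)
tiers = [("cell-site",   2_000,      40_000,  2),
         ("aggregation", 50_000,    120_000,  5),
         ("metro",       1_000_000, 600_000, 10)]

for label, users, cost, latency in tiers:
    print(label, round(placement_score(users, cost, latency), 2))
```

With these made-up numbers the aggregation tier wins, which is the general intuition: neither the outermost edge nor the deepest pool is optimal.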

That balancing act is important, because it dictates that we rethink how we structure the access/metro relationship, and the metro network itself. From an efficiency and trade-off point of view, we can create more effective edge computing if we can improve the connectivity from user to metro, and within the metro complex, so that we can distribute resources for efficiency without increasing latency to the point where we lose some of the benefits of the edge. However, just like there’s such a thing as a “cloud-native” application, an application designed to take full advantage of the cloud, there is likely to be an “edge-native” model that can take advantage of this new access/metro relationship. We may not be on exactly the right path to find it.

Going back to my earlier point, the biggest asset in getting edge/metro hosting into the real world is 5G, but it’s also the biggest potential poison pill. Because 5G standards almost dictate edge hosting, particularly for the RAN, it represents a potential deployment driver that could get the critical hosting assets into metro locations, where applications above or beyond 5G could exploit them. However, 5G and in particular O-RAN mandate the notion of the RAN Intelligent Controller or RIC, and RICs are a bit like orchestrators and a bit like service meshes. The telco community has been conspicuously ineffective at defining its own management and orchestration model in a generally useful way, or even in a way that really supports its own goals. RICs, done wrong, could poison the edge by encouraging a set of components that can’t really support generalized edge missions.

We know two things about metro/edge. First, it works best if there is exceptionally good, low-latency, high-capacity connectivity within it. Not everything at the access level has to be meshed, but high-capacity access should feed a thoroughly interconnected metro-area pool. Second, it will require a software architecture and toolkit that defines, even implicitly, what “edge-native” would look like, so that everything that’s deployed there (even 5G and O-RAN) can use a single set of features. Both Juniper and Ciena are contributing to the first of these, but we’re still struggling to see how the second requirement can be met.

One thing that seems clear is that “edge-native” is a variant on “cloud-native”, meaning that the tools for the former will likely be related to (or expansions of) the tools for the latter. In fact, it’s my view that edge computing simply cannot work except with cloud-native elements. The reason is that if latency control is a prime requirement, and if users and user relationships create dynamic component relationships within the metro, then we can expect to need to redeploy things often to accommodate the changes. A good example of this is gaming.

Gaming is a popular proposed edge mission because latency in a game, particularly in a multi-player game, distorts the sense of reality and creates situations that could advantage or disadvantage some players by changing their timing of action relative to the timing of others. The problem is that multi-player games often involve players that are highly distributed. If a group of players within a metro area could be supported with consistent, low, latency, then their experience would be better, but making that happen means being able to shift both connectivity and hosting to optimize the flows for the current community of players/avatars.

The common way of speed-matching a group like this would be to equalize delay, but if that means delaying everyone to the worst level of latency, the result could be actions that lag significantly between being ordered by the player and becoming effective in the game. We can’t equalize on worst case, at least unless we can always ensure that “worst” is never really bad. A single town probably won’t host a lot of players, but a big metro area could well host tournaments.
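A tiny illustration of why worst-case equalization hurts (the player names and latencies are made up):

```python
def equalization_delay(latencies_ms):
    """Delay added per player so all actions land together (worst-case equalization)."""
    worst = max(latencies_ms.values())
    return {player: worst - latency for player, latency in latencies_ms.items()}

# Players all within one metro vs. the same group plus one distant straggler.
print(equalization_delay({"ann": 12, "bob": 15, "cam": 18}))   # penalties of a few ms
print(equalization_delay({"ann": 12, "bob": 15, "dee": 90}))   # everyone waits 75+ ms
```

One distant player drags the whole group down to 90 ms of effective lag, which is exactly the case a metro-hosted game server is trying to avoid.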

Gaming isn’t the only application of edge computing that poses questions. Exactly what role the edge would play in autonomous vehicles is, in my view, rarely addressed rationally. To start with, early autonomous cars would share the road with a host of non-automated vehicles, and then there are those unruly cyclists and pedestrians. Central control of automobiles and highway vehicles in general makes no sense at all; on-vehicle technology is essential for accident avoidance and if that’s removed from the picture, the role of the network is simply to control the overall route, hardly a novel application. Where edge might fit would be in warehouse applications, rail transport, and other controlled situations where per-vehicle coordination was more a matter of planning than of reaction. These applications, though, could often involve dynamic movement within a metro, and that could in turn require redeployment of control elements to minimize latency.

For IoT, it’s almost certain that metro/edge services will involve coordinating public hosting resources with on-premises private technology. In 5G, this kind of coordination would be helpful in addressing the access/backhaul portion of the network, where resources are too isolated and specialized in location to justify a pool, but where unified orchestration of features would still be important. There’s also the matter of state control to be considered if we assume stateless microservices as the vehicle for building functionality. The more an application’s components are distributed, the harder it is to coordinate their operating state.

The orchestration piece, provided by the RICs in O-RAN as previously noted, needs refinement for the metro/edge. The RIC performs a mixture of tasks, including some that would align with cloud orchestration, some that would align with cloud service mesh, and some that align more with the lifecycle management tasks that the TMF supported via NGOSS Contract. As I noted earlier this week, the use of a data model to contain state/event tables and service information was the center of what’s still the best articulation of lifecycle management for services. There’s no reason I can see why the same thing couldn’t be applied to edge applications in general, which would address the issue of state control. For orchestration, we might be looking at some blending of serverless and Kubernetes principles, perhaps with some “service” functionality being handled by the former because the microservices would be lightweight and require only short execution times, and some handled as more persistent containers. This could correspond in concept to the near-real-time and non-real-time RICs of O-RAN.
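As a hedged sketch of the state/event idea that NGOSS Contract embodied, consider the Python below. The states, events, and handler names are invented for illustration; the point is only that a data model holds the (state, event) table and a lifecycle manager just looks up and dispatches.

```python
# Each (state, event) pair maps to a handler and a next state, as the service's
# data model would record it; the lifecycle manager only looks up and dispatches.
STATE_EVENT_TABLE = {
    ("ordered",    "deploy_ok"):   ("activate_service", "active"),
    ("ordered",    "deploy_fail"): ("rollback",         "failed"),
    ("active",     "fault"):       ("redeploy_element", "recovering"),
    ("recovering", "deploy_ok"):   ("activate_service", "active"),
}

def handle(state: str, event: str) -> str:
    handler, next_state = STATE_EVENT_TABLE.get((state, event),
                                                ("log_and_alert", state))
    print(f"{state} + {event} -> run {handler}, move to {next_state}")
    return next_state

state = "ordered"
for event in ["deploy_ok", "fault", "deploy_ok"]:
    state = handle(state, event)
```

Nothing about that is 5G-specific, which is why the same pattern could address state control for edge applications generally.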

There are half-a-dozen serverless adaptations for Kubernetes (Kubeless may be the best-known) and Microsoft highlights the union of the two for Azure. Recall that Microsoft has also launched a private MEC capability, which might facilitate the blending of customer-hosted edge resources with the metro edge services, even for transitory functions. The best approach, of course, would be to explore the right platform starting with requirements, and I offer THIS story as both an example of a requirements-based approach to cloud platforms, and as a candidate for the right answer.

The edge is going to be diverse if it’s going to be important, but it can’t become a bunch of silos separated by distinct and incompatible middleware tools. Some consistent tool setup seems critical, and where that setup could come from might be the competitive question of our age. Do cloud providers dominate, or do software providers?

In my modeling of IoT opportunity, I noted that the early phases of the visionary form of IoT were likely to involve more premises hosting than cloud resources. I think that’s true for edge applications in general; early latency-critical stuff is likely to be supported with local on-prem resources, with things moving to the edge/cloud only when the business case is solidified. Software vendors might then have a natural advantage. Thus, it’s smart for cloud providers to push an elastic vision of edge computing, with the goal of seizing and exploiting the premises-centric period and gaining early traction when the scope of the edge expands into metro and cloud.

If the cloud providers do push their elastic-edge vision, they could create a significant problem for software vendors who want to create a “transportable edge”, a software package that could run on premises and in the cloud, but in the latter case without drawing on specific cloud-provider services other than simple hosting. Cloud providers who want differentiation and long-term edge revenue seem to have recognized the opportunity, but software vendors may be slower to do that, in which case edge software could be much more cloud-specific down the line. Then the race to edge-native would both define and expand the role of the public cloud in the future.

MWC didn’t answer the question of how 5G hosting morphs into MEC, though the term was surely batted around a lot. Because that’s the case, we can’t yet say how the edge will unfold, and we can’t say whether 5G will push the right vision of the edge. Not only is the edge up for grabs, competitively, it’s up in the air in terms of its own evolution.

Are We Bridging the Digital Divide or Deepening It?

How will shifts in technology, and new government programs, impact consumer broadband? We’re getting some hints on what the US infrastructure bill could do, and we’re also seeing competitive and technology shifts in commercial consumer broadband. There could be some major changes in the works, but through it all is the thread of a core issue that still doesn’t get enough attention.

Consumer broadband, on the demand side, is being shaped by the increased interest in (well, maybe “demand for” is more accurate) streaming video. A family could well be streaming three or four video sessions at a time, each in 4K, and people could also be playing games, and all the while someone might be having a Zoom/Teams call. Certainly from an entertainment perspective, and with COVID often for work and education, broadband is a critical resource. That’s why many believe that if broadband availability is limited in some areas, those impacted are at a serious disadvantage.

Not all broadband is up to the task at hand. From the first, the majority of broadband connections exploited existing facilities, which almost always meant either twisted-pair loop plant or CATV cable. Successive improvements to CATV via DOCSIS versions have enhanced the ability of cable to deliver broadband services, but the length and quality of the copper loop plant varies considerably, and DSL technology can’t keep up with modern competitive technologies.

Fiber is foremost among those, of course, but any new in-ground broadband delivery mechanism faces the problem of customer density, what I’ve called “demand density”. This is the problem that’s threaded its way through broadband from the first. Where demand density is high, a given deployment of media will serve a lot of users, and so offer a reasonable per-user cost. Where it’s lower, the cost of the media will either constrain operator profit and incentives to deploy, or price the service out of the market. The more rural an area, the lower the demand density, and the greater the chance that nobody will offer fiber broadband.
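A simple, hypothetical illustration of why density dominates the economics; the cost-per-mile, household-density, take-rate, and amortization figures below are placeholders, not real deployment data.

```python
def monthly_cost_per_subscriber(cost_per_route_mile, homes_per_mile,
                                take_rate, amortization_months=120):
    """Spread the in-ground media cost over the subscribers it actually serves."""
    subscribers_per_mile = homes_per_mile * take_rate
    return cost_per_route_mile / subscribers_per_mile / amortization_months

# Same fiber cost per route mile, very different per-subscriber outcomes:
print(round(monthly_cost_per_subscriber(60_000, homes_per_mile=200, take_rate=0.4), 2))  # suburban
print(round(monthly_cost_per_subscriber(60_000, homes_per_mile=10,  take_rate=0.4), 2))  # rural
```

With these made-up numbers, the same mile of fiber costs a few dollars per subscriber per month in a suburb and over a hundred in a thinly populated area, which is the digital divide in arithmetic form.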

The “digital divide” is a name given to the phantom line that separates areas with lower and higher demand density. Rural users fall on the wrong side of it, and so there’s been continued public policy debate over what to do about that. Australia launched its NBN corporation to try to subsidize broadband and equalize services, and the US is now considering an infrastructure bill that has money in it to cover rural broadband costs, as well as policies to decide what “broadband” means. At the same time, operators (cable and telco alike, in the US) have been struggling with how to address the challenges, not only of the digital divide but for users who could be served, but perhaps with limited profit.

AT&T is a poster child for the latter. As I’ve pointed out in past blogs, AT&T has a demand density that’s much lower than rival Verizon, and so when Verizon launched its FiOS initiative, AT&T didn’t counter with its own push on fiber to the home. In fact, it’s been a bit negative on FTTH, and also negative on the use of 5G for residential/home broadband. Now, that seems to be changing.

Not surprisingly, AT&T is finding that it wins deals where it offers FTTH and loses them where it offers DSL. It makes sense for AT&T to tout a new emphasis on fiber, but it’s important to understand that slinging PR and trenching fiber are worlds apart. AT&T’s demand-density challenges mean that while it can deploy fiber in areas where it does have dense opportunity, doing so will exacerbate the digital divide among its own customers.

One way those divided customers could be lost is for a competitor like Verizon or T-Mobile to rush in with a 5G millimeter-wave or 5G cellular-to-the-home solution, which you’ll recall AT&T has also dissed in the past. Thus, I don’t think there is much chance that AT&T will hold only to an FTTH response to broadband competition. They’re going to have to support 5G home broadband in one or both forms, period.

The most obvious reason is the current competitive trend. If AT&T loses enough broadband customers to cable, they’ll have to fight to get them back later on. That will almost certainly mean discounts, and that will further lower profits. AT&T is under a lot of investor scrutiny right now, and any announcements that seem to threaten future financial pressure on the company, particularly that might signal a cut in dividends, will put executives under significant pressure from the Street.

The other reason is the public policy push the infrastructure bill represents. If billions are allocated for rural broadband, it stands to reason that there will be service providers who swoop in to get some of the money. These providers will have even greater demand-density challenges than AT&T does as the incumbent, so they’ll need a technology that doesn’t have as high a cost. Like 5G home broadband. More competitors, cheaper technology option? What can AT&T do but follow the same path?

Here is where the goals of public policy can founder on the rocks of bad legislative drafting and lobbying, though. The broadband part of the infrastructure bill isn’t necessarily going to promote something as truly useful as 5G. The current rumor is that the bill will mandate 100 Mbps Internet access, and that is an invitation to strategies that throw public money away.

How do you measure broadband performance? Traditionally we use the “bandwidth” of the interface, meaning the clock speed of the digital channel that delivers or accepts data. That’s different from the effective speed of the service, which is the data rate at which packets can actually be exchanged. A simple example will show why.

Suppose that you have one of those old-line AT&T DSL connections running at 8 Mbps. You integrate that with a WiFi router that’s capable of perhaps a gig. Does the fact that the interface is clocked at a gig mean you have gigabit Internet? Or suppose you have a mesh WiFi service, one with thousands of WiFi routers that connect through each other in a mesh back to a single feed with a hundred meg of capacity. Whatever the speed of the WiFi, your users are all sharing that 100 Mbps, and WiFi routers close to the feed point have to carry traffic for all the routers farther out toward the edge. Even cable broadband, which shares bandwidth among the 100-500 customers typically on a span connected to a fiber head end, has a difference between the connection bandwidth/clock speed and the data rate. Suppose you had thousands of users on a cable span. Would they have gigabit service, really?
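The arithmetic here is trivial but worth spelling out; the user counts below are simplified assumptions, and real networks rely on the fact that not everyone is busy at once.

```python
def worst_case_per_user_mbps(shared_feed_mbps, active_users):
    """If everyone on a shared span is busy at once, this is what each user actually gets."""
    return shared_feed_mbps / active_users

print(worst_case_per_user_mbps(100, 1))      # one user: the full 100 Mbps "speed"
print(worst_case_per_user_mbps(100, 50))     # 50 busy users on the same feed: 2 Mbps each
print(worst_case_per_user_mbps(1000, 400))   # "gigabit" span with 400 busy homes: 2.5 Mbps
```

The interface clock never changed in any of those cases; only the delivered experience did, which is why a mandated “speed” alone is such a weak policy lever.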

The fact is that the source of the digital divide is what divides things in the first place, which is demand density and its impact on infrastructure cost, service cost, and operator profits. The only thing that’s going to make broadband services work for both buyer and seller, both urban and rural, is a technology set that can deliver quality service at reasonable cost. AT&T needs to face that fact, and the big news in the Light Reading piece is that they’re starting to do that. The potentially bad news is that there’s no way AT&T can or will offer FTTH to every customer, and so we need to know what happens to the rest.

The governments of the world need to face that fact too, and there’s not much indication that’s happening. Just mandating a broadband “speed” isn’t going to close the digital divide, it’s going to encourage gaming the system with broadband technologies that are cheap and so reap profit and offer a promise of coverage, but leave the underlying issue of quality of experience untouched. The broadband part of the infrastructure bill isn’t written (or at least released for review) yet, so I hope lawmakers in the US will take heed of this point. I also hope lawmakers in other countries will consider it as they face their own digital divide issues.

MWC May Have Set Up the Battle for Carrier Cloud

Everyone has an opinion on what telcos need to be doing (including me). At an event like MWC 2021, it’s not surprising that that question has been raised, and it’s similarly unsurprising that IBM’s CEO has used the bully pulpit of a keynote session to offer IBM’s answer. IBM is one of the vendors telcos have always favored as a technology partner, and IBM is now trying hard to gain traction in the telecom vertical, which means catching up with rivals in the cloud and in the software space. Those cloud rivals came to the fore with the announcement by AT&T that they would be expanding their cloud deal with Microsoft to include hosting parts of their 5G Core, and the two announcements combine to show there’s going to be a battle over the very nature of “carrier cloud”.

In the session, written up by SdxCentral, IBM CEO Arvind Krishna cited four tectonic shifts that are shaping the future of telco infrastructure; 5G (of course), edge (of course), hybrid cloud, and artificial intelligence (AI). The keynote made it clear, by Krishna’s comments on open technology, that Red Hat is leading the IBM charge for the telco vertical, and it also made an interesting point about the edge that’s a good starting point.

IBM says that it believes the edge to be an extension of the data center, but it’s not entirely clear whose data center is being extended. On the one hand, the comment seemed to be aimed at operators; “It’s critical to understand that telecom operators and carriers have enormous potential to harness the power of 5G and the edge, not just as a connectivity solution but as a business services platform.” On the other hand, it’s very possible that IBM’s reference to hybrid cloud as a tectonic force is an indication that it sees edge computing critically linked to premises hosting of real-time features that would then link with public cloud or public edge services.

It may be that IBM is balancing forces (and opportunities) here. On the one hand, obviously the operators are looking primarily at 5G hosting in the near term, and 5G Core in particular is a metro-hosting or cloud-hosting possibility. For 5G RAN, the operators may have significant equipment deployed in the form of discrete devices, so whatever they do there will have to influence the way the cloud/edge piece is managed. On the other hand, applications beyond 5G could well start specifically with the “edge” being on premises. In my blog on IoT opportunities, I noted that the early phase of “visionary” IoT was likely to be hosted by enterprises within their own facilities, evolving toward broader edge services at the metro layer as time passed.

Another significant point IBM made was that, after initially focusing on virtualizing key network functions, operators were shifting toward Linux, containers, Kubernetes, and so forth, all cloud-native elements. This seems to endorse the notion that NFV isn’t going to be the centerpiece of edge computing, or even of telecom feature/function virtualization overall, in the longer term. Obviously that’s my view, but I’m a bit surprised that IBM would take a strong position in that area. It suggests that they’re seeing the handwriting on the wall.

They also realize that cutting both capex and opex is critical for operators who want to improve profit per bit, so they’ve announced the IBM Cloud Pak for Network Automation, which is a toolkit to improve lifecycle management and achieve (or more likely approach) zero-touch automation. This is an enhancement to the previously announced IBM Cloud for Telecommunications, IBM’s root offering for the industry.

Not surprisingly, a theme that crosses back and forth through the whole keynote is the value of an open set of solutions. IBM recognizes that their real competitors in the telco space are the network equipment vendors, particularly those that offer specific 5G capabilities. 5G is the first network technology that was designed to be hosted, at least in part, and IBM knows that if a computer/software vendor is going to get a piece of the pie, they’re going to have to beat back those network vendors who traditionally supply infrastructure. Stressing the value of openness is a popular public position, but it also stakes out an IBM sales strategy—proprietary stuff will lead you down the garden path (again).

IBM is much more accommodating to the network vendors (like Juniper) who aren’t perceived as specific 5G threats, or threats to IBM’s vision of 5G at least. Juniper is a partner listed on IBM’s website, and Juniper’s Cloud Metro story is actually fairly symbiotic with IBM’s vision of both 5G hosting and edge computing in general. It’s unlikely, after all, that data-plane services for 5G will be hosted; white boxes or a separate IP network are the preferred approach, according to providers.

For all the good stuff included in or referenced in the keynote, there are still some gaps that could upset IBM’s plans. The biggest is a lack of a clear picture of what the architecture of edge applications will be, something needed to frame the toolkit and help development teams build edge-native applications. There are some clear architectural issues at this point, and several are crucial enough to warrant discussion.

The first issue is the functional/serverless versus container issue. Functional computing is available from all the cloud providers, often called “serverless” because there is no specific hosting commitment to a given function. Everything is run-on-demand. That model makes sense where edge applications are made up of small units of logic that are run relatively rarely, but it would make no sense where there’s either larger functional units or more frequent executions. There’s also a question of whether function loading could create a crippling level of latency in the very applications that are running in the edge to avoid latency.

Containers are a kind of middle ground between functions and virtual machines. Good container design emphasizes more stateless, or at least no-internal-state, behavior because that improves scalability and resilience, but containers are designed to be persistent; they stay loaded when the application is running. That means that if the things the application handles are rare, the cost of container-based applications is unreasonably high.
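A toy break-even comparison makes the point; the pricing figures are invented, not any cloud provider’s actual rates, and real bills would add memory, duration, and egress factors.

```python
def monthly_cost_serverless(invocations, price_per_million=2.00):
    """Pay only per execution: cheap when calls are rare."""
    return invocations / 1_000_000 * price_per_million

def monthly_cost_container(instances=1, price_per_instance_month=30.00):
    """Pay for persistence: cost is flat regardless of how often work arrives."""
    return instances * price_per_instance_month

for invocations in (100_000, 10_000_000, 1_000_000_000):
    s = monthly_cost_serverless(invocations)
    c = monthly_cost_container()
    better = "serverless" if s < c else "container"
    print(f"{invocations:>13,} calls/month: serverless=${s:,.2f} container=${c:.2f} -> {better}")
```

Rarely invoked logic favors the serverless model; anything run constantly favors a persistent container, and a realistic edge platform will probably need both, as the paragraph above suggests.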

The second issue is “latency scheduling”. If edge applications are latency-sensitive, then there’s likely a budget for latency that has to be met, and to meet that budget the hosting of application components has to consider the latency implications of the choices. Even Kubernetes experts are reluctant to say that Kubernetes can provide latency-based scheduling, in no small part because part of the latency is network-related, and Kubernetes doesn’t see network topology or delay. Since latency depends on the workflow and network relationships between components, just having the characteristics of a set of Kubernetes nodes doesn’t let you reliably predict the latency a given combination of hosting points might generate.

One way to address this is to assume “flat-fabric” connectivity, meaning that every hosting point in the edge is connected with a mesh of high-speed paths, and since it’s likely that “the edge” for a given application is entirely within a metro area, it’s possible to build a metro network so that there’s minimal difference in network latency among hosting choices. Possible, but whether it’s financially practical could be another matter.
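If a scheduler did see network latency, a placement filter might look something like the sketch below. To be clear, this is not how the Kubernetes scheduler works today; the node records, the measured-latency map, and the budget parameter are all assumptions used only to show the shape of latency-aware placement.

```python
def pick_node(nodes, latency_budget_ms, upstream):
    """Choose the cheapest node whose measured path latency to the upstream
    component stays within the component's latency budget."""
    feasible = [n for n in nodes if n["latency_ms"][upstream] <= latency_budget_ms]
    if not feasible:
        return None  # no placement meets the budget; caller must relax it or fail
    return min(feasible, key=lambda n: n["cost"])

# Hypothetical metro hosting points with measured latency to a "ric" component.
nodes = [
    {"name": "edge-a",     "cost": 5, "latency_ms": {"ric": 1.5}},
    {"name": "edge-b",     "cost": 3, "latency_ms": {"ric": 4.0}},
    {"name": "metro-core", "cost": 1, "latency_ms": {"ric": 9.0}},
]
print(pick_node(nodes, latency_budget_ms=5.0, upstream="ric"))
```

A flat metro fabric is essentially a way of making the latency column in that table nearly uniform, so the scheduler can ignore it and pick on cost alone.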

If we could address these two points, we could probably frame a set of edge-native tools that would support development. Cloud providers can address the first of these points, but they don’t control metro networking or latency differences. Network operators could, in theory, address both points, and of course software providers could anticipate the question, but would have to rely on the organizations who deploy their software to actually control whether these points are fully addressed.

Let’s close by linking this to the AT&T deal with Microsoft. Many 5G operators are looking at public cloud hosting outside the areas where they have real estate, meaning out of their home region. As I hinted in the opening, this deal establishes that public cloud providers are going to take a serious run at owning 5G hosting, perhaps completely. IBM certainly sees that, and that’s likely why they make a point of saying that their own 5G and telco solution can be run on any cloud. What IBM is doing is laying out its own key differentiating point, which is that if operators were to sign on to the IBM strategy, they wouldn’t be locked into a specific cloud provider, or indeed any provider at all. The “carrier cloud” self-hosting option for operators would still be on the table.

Microsoft is also acquiring AT&T technology and intellectual property relating to 5G, which gives Microsoft more ammunition in going after the telco space. In fact, this piece of the deal may be the strongest indicator that Microsoft is going to offer 5G-as-a-service and focus on that complete solution. That puts it in conflict with IBM’s “any cloud” theme, and with software providers who want to offer 5G without specific cloud ties.

All of this represents a victory for open-model 5G, however it falls out, and it’s also in a way a victory for the legacy IP vendors, notably Cisco and Juniper. If the 5G control-plane features are going to be cloud-hosted, then it is possible that the competitive landscape in the IP network won’t be changed much by 5G introduction. The metro area is both the focus of 5G functionality and the key opportunity in the router space. In theory, disaggregated router vendors might have posed a significant threat by incorporating 5G features into their separated control planes and then going after metro. It may now be too late for that to have much impact on the market.

The “however” to this part is that AT&T has already committed to open technology in the 5G access space and to disaggregated white-box routing in its core. If the Microsoft deal takes some immediate pressure off legacy router vendors in the metro space, it doesn’t mean that AT&T won’t eventually insist on white-box metro. Cisco and Juniper will have to use the breathing spell they get here to do something not just useful but compelling.