A Deep Dive into Edge Opportunities, Drivers, and Models

Everyone loves metro and the edge these days, it seems, and the last couple of weeks have proved that. Juniper made a big push on its Cloud Metro strategy (see my blog), Ciena just launched a whole campaign based on “Next-Gen Metro & Edge”, and IBM gave a keynote at MWC on its telco-centric edge strategy. Every cloud provider now has an edge strategy too, so it’s obviously worth taking some time to consider why metro and the edge mean so much to so many, and whether we’re on course to realize the benefits everyone hopes for. Warning: this is a long blog!

It seems generally accepted that not only is 5G an edge application, it’s very likely the application that will justify the initial edge deployment, the step that other applications would then exploit and expand. This can be true only if 5G deployment is sufficient to create proto-edge facilities, and if those facilities are suitable for the follow-on edge missions. To see whether both conditions are met, we have to look deeper at both edge requirements and 5G features.

Most Internet users think of the Internet as a big global network touching everyone and everything, which is only sort-of-true. Yes, there is global connectivity, but in a traffic and topology sense, the Internet is really concentrated between you, the user, and a local source of content. Content delivery networks (CDNs) really started that revolution by facilitating content delivery, especially video content, through local caching. Over time, there have been initiatives to cache processes in CDNs, and this could fairly be said to be the on-ramp to what we now call “edge computing”.

Edge computing is exciting and troubling at the same time. It’s exciting because it promises to provide hosting for applications that are sensitive to latency, potentially opening a whole new set of applications not only to “the cloud” but perhaps facilitating their very development. It’s troubling because we really don’t know exactly what those new applications are, how they’d be developed, or whether they could make a business case for initial deployment and long-term use.

Practically speaking, the “edge” is more likely to be the “metro”, as it usually is with CDNs. While it’s possible to push hosting or caching outward toward the user, the reality is that forward placement quickly loses its appeal. If the edge is “close to users” then it follows that each edge point supports a smaller number of users, making its per-user cost higher. You can’t give every user a resource pool of their own, so economies of scale dictate you move that pool inward until the benefits of scale and the increases in latency balance each other.
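The balance described above can be made concrete with a toy model: as the pool moves inward from the cell site toward the metro core, each pool serves more users (so per-user cost falls) while latency rises. All the tier names, user counts, latencies, and costs below are illustrative assumptions, not measured data; the sketch just shows why the optimum tends to land at the metro.

```python
# Toy model of the edge-placement trade-off: deeper pools serve more
# users (lower per-user cost) at the price of higher latency.
# Every number here is an illustrative assumption.

# (tier name, users served per pool, one-way latency to user in ms)
TIERS = [
    ("cell site",     2_000,  1.0),
    ("neighborhood", 20_000,  2.5),
    ("metro core",  500_000,  6.0),
    ("regional",  5_000_000, 15.0),
]

POOL_FIXED_COST = 250_000.0   # assumed annual cost of one pool, in dollars
LATENCY_BUDGET_MS = 10.0      # assumed latency ceiling for edge applications

def best_tier(tiers, budget_ms):
    """Pick the cheapest-per-user tier that still meets the latency budget."""
    feasible = [t for t in tiers if t[2] <= budget_ms]
    return min(feasible, key=lambda t: POOL_FIXED_COST / t[1])

name, users, latency = best_tier(TIERS, LATENCY_BUDGET_MS)
print(f"{name}: ${POOL_FIXED_COST / users:.2f}/user/year at {latency} ms")
# -> metro core: $0.50/user/year at 6.0 ms
```

With these assumed numbers, the regional tier is cheapest per user but blows the latency budget, so the pool settles at the metro core, which is the point: “edge” in practice means “metro”.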

That balancing act is important, because it dictates that we rethink how we structure the access/metro relationship, and the metro network itself. From an efficiency and trade-off point of view, we can create more effective edge computing if we can improve the connectivity from user to metro, and within the metro complex, so that we can distribute resources for efficiency without increasing latency to the point where we lose some of the benefits of the edge. However, just like there’s such a thing as a “cloud-native” application, an application designed to take full advantage of the cloud, there is likely to be an “edge-native” model that can take advantage of this new access/metro relationship. We may not be on exactly the right path to find it.

Going back to my earlier point, the biggest asset in getting edge/metro hosting into the real world is 5G, but it’s also the biggest potential poison pill. Because 5G standards almost dictate edge hosting, particularly for the RAN, it represents a potential deployment driver that could get the critical hosting assets into metro locations, where applications above or beyond 5G could exploit them. However, O-RAN in particular mandates the RAN Intelligent Controller (RIC), and RICs are a bit like orchestrators and a bit like service meshes. The telco community has been conspicuously ineffective at defining its own management and orchestration model in a generally useful way, or even in a way that really supports its own goals. RICs, done wrong, could poison the edge by encouraging a set of components that can’t really support generalized edge missions.

We know two things about metro/edge. First, it works best if there is exceptionally good, low-latency, high-capacity connectivity within it. Not everything at the access level has to be meshed, but high-capacity access should feed a thoroughly interconnected metro-area pool. Second, it will require a software architecture and toolkit that defines, even implicitly, what “edge-native” would look like, so that everything that’s deployed there (even 5G and O-RAN) can use a single set of features. Both Juniper and Ciena are contributing to the first of these, but we’re still struggling to see how the second requirement can be met.

One thing that seems clear is that “edge-native” is a variant on “cloud-native”, meaning that the tools for the former will likely be related to (or expansions of) the tools for the latter. In fact, it’s my view that edge computing simply cannot work except with cloud-native elements. The reason is that if latency control is a prime requirement, and if users and user relationships create dynamic component relationships within the metro, then we can expect to need to redeploy things often to accommodate the changes. A good example of this is gaming.

Gaming is a popular proposed edge mission because latency in a game, particularly in a multi-player game, distorts the sense of reality and creates situations that could advantage or disadvantage some players by changing their timing of action relative to the timing of others. The problem is that multi-player games often involve players who are highly distributed. If a group of players within a metro area could be supported with consistent, low latency, then their experience would be better, but making that happen means being able to shift both connectivity and hosting to optimize the flows for the current community of players/avatars.

The common way of speed-matching a group like this would be to equalize delay, but if that means delaying everyone to the worst level of latency, the result could be actions that lag significantly between being ordered by the player and becoming effective in the game. We can’t equalize on the worst case, at least not unless we can ensure that “worst” is never really bad. A single town probably won’t host a lot of players, but a big metro area could well host tournaments.
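The delay-equalization idea above can be sketched in a few lines: each player’s input is buffered so every input takes effect after the same total delay, set by the highest-latency player. The player names and latencies are assumed illustrative values, not real measurements.

```python
# Sketch of delay equalization: pad each player's delay up to the
# worst case so all actions take effect simultaneously.
# Player names and latencies (in ms) are illustrative assumptions.

def equalization_buffers(latencies_ms):
    """Return the extra delay to add per player to match the worst case."""
    worst = max(latencies_ms.values())
    return {player: worst - lat for player, lat in latencies_ms.items()}

players = {"alice": 8, "bob": 22, "carol": 65}   # assumed latencies
buffers = equalization_buffers(players)
# carol's 65 ms drags everyone to a 65 ms effective delay:
# {'alice': 57, 'bob': 43, 'carol': 0}
```

One distant player sets the effective delay for the whole group, which is exactly why hosting the session at a metro edge shared by the players, where the worst case is bounded, improves the experience.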

Gaming isn’t the only application of edge computing that poses questions. Exactly what role the edge would play in autonomous vehicles is, in my view, rarely addressed rationally. To start with, early autonomous cars would share the road with a host of non-automated vehicles, and then there are those unruly cyclists and pedestrians. Central control of automobiles and highway vehicles in general makes no sense at all; on-vehicle technology is essential for accident avoidance and if that’s removed from the picture, the role of the network is simply to control the overall route, hardly a novel application. Where edge might fit would be in warehouse applications, rail transport, and other controlled situations where per-vehicle coordination was more a matter of planning than of reaction. These applications, though, could often involve dynamic movement within a metro, and that could in turn require redeployment of control elements to minimize latency.

For IoT, it’s almost certain that metro/edge services will involve coordinating public hosting resources with on-premises private technology. In 5G, this would help address the access/backhaul portion of the network, where resources are too isolated or specialized in location to justify a pool, but where unified orchestration of features would still be important. There’s also the matter of state control to consider if we assume stateless microservices as the vehicle for building functionality: the more an application’s components are distributed, the harder it is to coordinate their operating state.
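The stateless-microservice point can be illustrated with a minimal sketch: the handler holds no state between invocations, so it can be redeployed or scaled anywhere in the metro; state lives in an external store keyed by session. The store here is a plain dict standing in for a real distributed back end, and the handler and session names are hypothetical.

```python
# Sketch of a stateless microservice: no state survives in the component
# between calls, so instances are freely replaceable; state is held in
# an external store. A dict stands in for a distributed key/value service.

STATE_STORE = {}   # stand-in for a replicated external state service

def handle_reading(session_id, value):
    """Stateless handler: load state, update it, write it back."""
    state = STATE_STORE.get(session_id, {"count": 0, "total": 0.0})
    state["count"] += 1
    state["total"] += value
    STATE_STORE[session_id] = state
    return state["total"] / state["count"]   # running average

handle_reading("sensor-17", 10.0)
avg = handle_reading("sensor-17", 20.0)      # -> 15.0
```

Because any instance can pick up any session, redeployment to chase latency is cheap; the hard part, as the paragraph above notes, is keeping the external state consistent as components spread out.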

The orchestration piece, provided by the RICs in O-RAN as previously noted, needs refinement for the metro/edge. The RIC performs a mixture of tasks, including some that would align with cloud orchestration, some that would align with cloud service mesh, and some that align more with the lifecycle management tasks that the TMF supported via NGOSS Contract. As I noted earlier this week, the use of a data model to contain state/event tables and service information was the center of what’s still the best articulation of lifecycle management for services. There’s no reason I can see why the same thing couldn’t be applied to edge applications in general, which would address the issue of state control. For orchestration, we might be looking at some blending of serverless and Kubernetes principles, perhaps with some “service” functionality being handled by the former because the microservices would be lightweight and require only short execution times, and some handled as more persistent containers. This could correspond in concept to the near-real-time and non-real-time RICs of O-RAN.

There are half-a-dozen serverless adaptations for Kubernetes (Kubeless may be the best-known) and Microsoft highlights the union of the two for Azure. Recall that Microsoft has also launched a private MEC capability, which might facilitate the blending of customer-hosted edge resources with the metro edge services, even for transitory functions. The best approach, of course, would be to explore the right platform starting with requirements, and I offer THIS story as both an example of a requirements-based approach to cloud platforms, and as a candidate for the right answer.

The edge is going to be diverse if it’s going to be important, but it can’t become a bunch of silos separated by distinct and incompatible middleware tools. Some consistent tool setup seems critical, and where that setup could come from might be the competitive question of our age. Do cloud providers dominate, or do software providers?

In my modeling of IoT opportunity, I noted that the early phases of the visionary form of IoT were likely to involve more premises hosting than cloud resources. I think that’s true for edge applications in general; early latency-critical stuff is likely to be supported with local on-prem resources, with things moving to the edge/cloud only when the business case is solidified. Software vendors might then have a natural advantage. Thus, it’s smart for cloud providers to push an elastic vision of edge computing, with the goal of seizing and exploiting the premises-centric period and gaining early traction when the scope of the edge expands into metro and cloud.

If the cloud providers do push their elastic-edge vision, they could create a significant problem for software vendors who want to create a “transportable edge”, a software package that could run on premises and in the cloud, but in the latter case without drawing on specific cloud-provider services other than simple hosting. Cloud providers who want differentiation and long-term edge revenue seem to have recognized the opportunity, but software vendors may be slower to do that, in which case edge software could be much more cloud-specific down the line. Then the race to edge-native would both define and expand the role of the public cloud in the future.

MWC didn’t answer the question of how 5G hosting morphs into MEC, though the term was surely batted around a lot. Because that’s the case, we can’t yet say how the edge will unfold, and we can’t say whether 5G will push the right vision of the edge. Not only is the edge up for grabs, competitively, it’s up in the air in terms of its own evolution.