Taking Mobility to the Next Level

On March 16th, I blogged about the question of what a cloud-native telco might look like, and obviously this had to address the evolution of things like NFV and 5G. The question I raised at the end of that blog was whether we had perhaps carried evolution too far in mobile networking and NFV, whether starting again from the top might lead us to a different place. Let’s look a bit harder at that one today.

A big piece of how mobile networks manage their critical property of mobility traces back quite a way. What we have today, in some form in both 4G and 5G, is a strategy that runs per-device “tunnels” from the gateway to the packet network (the Internet in most cases) out to the cell site where the device is found. Packets are processed (in 5G, by the User Plane Function or UPF) through a set of classification rules (Packet Detection Rules or PDRs in 5G). The rules match on the packet header (the IP address, tunnel headers, etc.) and apply handling policies. Every device has to have at least two rule sets, one for each direction of traffic. The implementation of these rules, the handling policies, and the routing based on the combination is normally associated with elements of the “user plane” of a mobile network.
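To make the rule-matching idea concrete, here is a minimal sketch of per-device classification with one rule per direction. It is an illustration under stated assumptions, not the 3GPP data model: the `Packet` and `Rule` classes, the field names, and the action labels are all hypothetical, and real PDRs carry many more match fields than the two used here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    direction: str               # "uplink" or "downlink"
    ue_ip: str                   # IP address of the mobile device
    teid: Optional[int] = None   # tunnel ID, present only on tunneled packets


@dataclass(frozen=True)
class Rule:
    """Hypothetical, heavily simplified stand-in for a PDR."""
    precedence: int              # lower value is checked first
    direction: str
    ue_ip: str
    teid: Optional[int]          # None means "do not match on tunnel ID"
    action: str                  # illustrative handling-policy label

    def matches(self, pkt: Packet) -> bool:
        if pkt.direction != self.direction or pkt.ue_ip != self.ue_ip:
            return False
        return self.teid is None or self.teid == pkt.teid


def classify(rules, pkt):
    """Return the action of the first matching rule, or "drop" if none match."""
    for rule in sorted(rules, key=lambda r: r.precedence):
        if rule.matches(pkt):
            return rule.action
    return "drop"


# One device, two rule sets: one for each direction of traffic.
rules = [
    Rule(10, "uplink", "10.0.0.7", 0x1001, "decapsulate-and-forward-to-internet"),
    Rule(10, "downlink", "10.0.0.7", None, "encapsulate-toward-cell"),
]

uplink_action = classify(rules, Packet("uplink", "10.0.0.7", teid=0x1001))
downlink_action = classify(rules, Packet("downlink", "10.0.0.7"))
unknown_action = classify(rules, Packet("downlink", "10.0.0.99"))
```

The point of the sketch is only that the rule count scales with the number of devices, which is the property the rest of this discussion questions for tunnels.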

You could justify having per-device PDRs, but how about per-device tunnels? A packet for any user, emerging from a gateway UPF representing the Internet, could “appear” in the right cell even if it shared the tunnel to that cell with hundreds of other packets from other users. Thus, we could cut down on the number of tunnels by having one tunnel per destination cell. Not only that, if we had detailed control over forwarding rules, we could simply forward the packet to its proper cell based on its own IP address. We could get more forwarding control, too, by thinking a bit about how the user plane of a mobile network would work.
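A small sketch may help show why sharing tunnels per cell reduces state. It assumes a hypothetical attachment table that maps each device IP to the cell currently serving it; the addresses, cell names, and function names are invented for illustration.

```python
# Hypothetical attachment table: device IP -> cell currently serving it.
attachments = {
    "10.0.0.7": "cell-A",
    "10.0.0.8": "cell-A",
    "10.0.0.9": "cell-B",
}


def per_device_tunnels(attachments):
    """One tunnel per attached device, as in the conventional model."""
    return len(attachments)


def per_cell_tunnels(attachments):
    """One shared tunnel per cell that has at least one attached device."""
    return len(set(attachments.values()))


def forward(attachments, dst_ip):
    """Choose the shared tunnel from the packet's destination IP alone."""
    cell = attachments.get(dst_ip)
    return None if cell is None else f"tunnel-to-{cell}"


def handover(attachments, ue_ip, new_cell):
    """Moving a device changes one table entry; no tunnel is created."""
    attachments[ue_ip] = new_cell
```

With the table above, `per_device_tunnels(attachments)` is 3 while `per_cell_tunnels(attachments)` is 2, and the gap widens as more devices share a cell. A handover becomes a single forwarding-table update rather than a tunnel teardown and setup.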

In the 3GPP specifications, user-plane and control-plane elements are represented as boxes with interfaces, and in the past they’ve been implemented as real appliances. In an effort to open things up, recent standards evolution has described these elements as “virtual network functions” (VNFs) and presumed a hosted software instance implemented them. The suggestion about “microservices” or “cloud-native” network traffic handling likely comes out of this point.

Following the standards leads to the creation of mobile infrastructure that’s almost an adjunct to the IP network rather than a part of it. This mobile network touches the IP network in multiple places, and its goal is to allow a user roaming around through cell sites to be connected to the Internet through all this motion, retaining any sessions that were active. We have a completely separate set of “routers” with different routing policies to make this work.

Eliminating mobile-specific appliances in favor of hosted VNFs introduces cloud network overhead to the handling of mobile traffic, which seems at odds with the 5G goal of lowering latency and supporting edge computing for real-time applications. My suggestion, as described in earlier blogs (the latest being March 16th), was to consider the control-plane implementation to be cloud-native and the user plane to be based on a “router with a sidecar” that adds the capability of implementing the PDR-based handling 5G requires. I noted that something SDN-ish could work as the means of connecting the sidecar element to the router.

There has been work done on supporting mobility management via “standard” devices, meaning either routers with some forwarding agility or white-box switches. SDN, as just noted, would offer a means of customizing forwarding behavior to include tunnel routing, and the SDN controller could act as an intermediary PDR controller. There has also been work done on using P4, the language for programming chip-level packet forwarding, to define mobile PDR handling. That would enable any device that supported P4 to be a PDR handler, though there would still be a need for “sidecar” control to implement the interface between the device and the 5G control plane.
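The sidecar role can be sketched as a small translator that turns control-plane session events into forwarding entries on a standard device. Everything here is hypothetical: `ForwardingDevice` is a stand-in for a router or white-box switch and does not represent any real SDN controller or P4 runtime API, and the event names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowEntry:
    match_dst_ip: str   # destination IP the device matches on
    out_tunnel: str     # tunnel the matched packet is sent into


@dataclass
class ForwardingDevice:
    """Stand-in for a router or white-box switch; not a real SDN or P4 API."""
    table: dict = field(default_factory=dict)

    def install(self, entry: FlowEntry) -> None:
        self.table[entry.match_dst_ip] = entry   # replaces any older entry

    def remove(self, dst_ip: str) -> None:
        self.table.pop(dst_ip, None)

    def lookup(self, dst_ip: str):
        entry = self.table.get(dst_ip)
        return entry.out_tunnel if entry else None


class Sidecar:
    """Translates control-plane session events into forwarding entries."""

    def __init__(self, device: ForwardingDevice):
        self.device = device

    def session_established(self, ue_ip: str, cell: str) -> None:
        self.device.install(FlowEntry(ue_ip, f"tunnel-to-{cell}"))

    def session_moved(self, ue_ip: str, new_cell: str) -> None:
        self.device.install(FlowEntry(ue_ip, f"tunnel-to-{new_cell}"))

    def session_released(self, ue_ip: str) -> None:
        self.device.remove(ue_ip)


device = ForwardingDevice()
sidecar = Sidecar(device)
sidecar.session_established("10.0.0.7", "cell-A")
before_move = device.lookup("10.0.0.7")
sidecar.session_moved("10.0.0.7", "cell-B")
after_move = device.lookup("10.0.0.7")
sidecar.session_released("10.0.0.7")
after_release = device.lookup("10.0.0.7")
```

The design choice worth noting is the separation: the device only matches and forwards, while the sidecar holds all knowledge of the mobile control plane. That is what would let the forwarding element be an ordinary router or switch.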

If VNF-based handling of mobile traffic isn’t the optimum approach, it’s not the only sub-optimal thing that mobile standards lead to. The problem with block diagrams in standards is that they are often difficult to connect with two real-world elements: the location where they might be deployed, and the patterns of traffic within the network overall. In my view, both these real-world elements converge into what I’ve been calling metro.

Metro to me is both a place and a traffic pattern. Network service features are most easily introduced at a point close enough to the edge to permit real-time handling and personalization, but deep enough to serve the number of users needed to create reasonable economy of scale. I submit that this place is the metro center or metro. As it happens, the notion of a metro center is already established because of the importance of content caching. The major CDN providers connect to broadband access networks at specific places, and these are probably the best early candidates for a metro site. That’s because video traffic is the great majority of Internet traffic, even mobile broadband traffic, and so there’s already great connectivity to these points, and a natural concentration of traffic. The majority of content cache sites are also places where real estate to house edge computing is available.

Mobile networks often implement a kind of implicit metro-centricity because it’s at least possible to differentiate between mobile users that remain within a metro area versus those that are roaming more broadly, with the former having direct IP addressing and the latter tunneling. Since most mobile users do tend to stay in a metro area, and since most video consumption on mobile devices involves those generally-in-metro users, traffic efficiency is often high because video traffic doesn’t need to be tunneled.
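The in-metro versus roaming distinction can be expressed as a simple decision. This is only a sketch under an assumed model in which each user has a home metro and a current metro; the metro labels and the sample population are invented and carry no real-world data.

```python
def handling(home_metro: str, current_metro: str) -> str:
    """Direct IP addressing inside the home metro, tunneling outside it."""
    return "direct" if home_metro == current_metro else "tunnel"


def direct_share(users) -> float:
    """Fraction of (home, current) pairs whose traffic avoids tunneling."""
    if not users:
        return 0.0
    direct = sum(1 for home, current in users if handling(home, current) == "direct")
    return direct / len(users)


# Invented sample: three users at home, one roaming into another metro.
sample = [
    ("metro-1", "metro-1"),
    ("metro-1", "metro-1"),
    ("metro-1", "metro-2"),
    ("metro-2", "metro-2"),
]
```

The larger the share of users who stay within their home metro, the more traffic takes the direct path, which is the efficiency the paragraph above describes.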

It might be possible, and valuable, to make metro-centricity a design mandate. If we were to define a kind of “movement zone” around each metro center, with the goal of containing a large proportion of the users there within that zone as they travel locally, we would improve handling and latency. We’d probably cover about three-quarters of the population that way, and over half the geographic area of the US, for example. For the rest of the population and geography we could expect users to be moving in and out of metros, and more tunneling would likely be needed.

If population mobility is a challenge that has to be addressed while sustaining session relationships, we’d end up needing more tunneling, because it would be very difficult to reconnect everything when a user moved out of a metro area and required a different mobile gateway to the Internet or to cached content.

Edge computing and real-time digital twinning via 5G would exacerbate some of these issues, because interruption of message flows in real-time systems can have a catastrophic impact, and attempts to remedy that at the message-protocol level (requiring acknowledgments, buffering and reordering, etc.) would increase latency overall. If we were to assume that there were “social-metaverse” and other “cooperative” applications of edge computing that required broad edge-to-edge coordination, the need to optimize which edge facility a given mobile user is connected with would increase. It’s also true that real-time edge applications would likely be based on microservices, and that could give impetus to attempts to create microservice-based mobile service elements overall.

I think that mobile standards, like NFV ISG work, have suffered from box fixation, making it difficult to assess how to implement them in a “cloud-native” way. I think that network standards in general have to balance between the optimized use of new technologies and the ability to evolve from existing infrastructure without fork-lifting everything at great cost. The problem is that this combination creates a barrier to motion and a limit to innovation at the same time. At some point, we need to think harder about going forward with network standards, and I think that starts by looking at features and not boxes.