Could Edge Computing Drive 5G?

Should we look at 5G in terms of edge applications?  If the big benefit of 5G is lower latency, then doesn’t it follow that new 5G revenues would likely be associated with those applications, which would have to run at/near the edge?  What are those applications, how much opportunity could be associated with them, and what do they need beyond just 5G, especially in terms of edge computing?  I’ve been working to model this out, and here’s what I’m finding regarding the popular, and the credible, options.

If you do a bit of media research, the number one proposed low-latency application for 5G is autonomous vehicles, so that surely qualifies it as “popular”.  The presumption is that low latency is essential because any delay in having the automated driving system respond to pedestrians and other vehicles could literally be fatal.  One of my LinkedIn contacts offered this: “In 2018, the Finnish Ministry of Transport and Communications justified the 5G construction strategy, e.g., assume that ‘every car in 2025 will generate thousands of gigabytes a day on the network’ and ‘one car will be equivalent to a thousand smartphone users’”.  A Light Reading article I quoted earlier this week had another view, saying that latency could “turn streets of autonomous vehicles into a demolition derby.”

The connected/autonomous vehicle presumption ranks near the top of the nonsense predictions that plague popular discussions of 5G or edge computing.  First, there is absolutely no credibility to the assumption that remote control of autonomous vehicles would extend to driving and avoiding obstacles.  Every credible approach puts collision avoidance and situational awareness in the in-vehicle functions category.  Second, even if we somehow made the terrible mistake of offloading it onto a network process, it wouldn’t require “thousands of gigabytes a day” of traffic.  Even if it somehow did, there’s no way on earth that users would pay for that kind of connection, much less the processing resources needed to interpret the data.

But the final nail in this application’s coffin is that smart vehicles can only be introduced as fast as new vehicles are introduced, even assuming that every new vehicle is made connected/autonomous.  In the US, for example, we have about 282 million vehicles on the road, and we add about a million a year.  That means the first year out, we have a max of a million vehicles involved, for which we must deploy infrastructure to connect them and guide them everywhere.  Not to mention that they’d be a drop in an uncontrolled vehicular bucket, surrounded by a bunch of stuff that the guidance system doesn’t control, or even see.
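To put that turnover math in perspective, here’s a tiny back-of-the-envelope sketch using the rough figures cited above (they’re illustrative, not precise statistics):

```python
# Rough scale check using the figures cited above (illustrative only).
fleet = 282_000_000        # vehicles on US roads
new_per_year = 1_000_000   # assumed annual additions, all connected/autonomous

for year in range(1, 6):
    share = (new_per_year * year) / fleet
    print(f"Year {year}: connected share of the fleet ~ {share:.2%}")

# On these figures, even after five years the connected vehicles are still
# under 2% of the total fleet, which is the "drop in the bucket" problem.
```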

Another popular justification is IoT, by which most people seem to mean smart homes and buildings.  We need 5G and edge computing to control them, right?  Well, a quarter of all homes, over half of all small business sites, and almost all multi-tenant sites are already “smart” in that they have central control over lighting and environment.  What horrible deficiencies have we uncovered in serving this group that require 5G and edge computing to fix?  Even mobile elements can be supported in-building using WiFi with some “roaming”, and 4G also works fine for these applications today.

This example of wrong-think stems from failing to recognize the difference between an application that could use a technology and one that would justify it.  If we had ubiquitous 5G, could it be exploited for mobile IoT sensors, in public or private form?  Sure, but would an operator or enterprise find that exploitation added so much value that it let the investment hit the ROI targets?  Doubtful, and even if it did, there’s the question of how you synchronize the availability of 5G low-latency service and edge computing with the things that need it.  Who builds applications for non-existent resources?  Who deploys resources for non-existent applications?  Who’s the chicken and who’s the egg?

OK, enough of the hype stuff.  Let’s move on to things that could actually work.  My goal is to explain some of that, and also explain why each avoids the pitfalls that the popular applications fall into.

There are a host of uninteresting-sounding event-processing applications that relate to IT and networking themselves.  Companies that have large data centers or a lot of distributed servers and network sites often look at automating operations.  These applications are driven by event logs and management events, and some of them depend on having a real-time response to a change.  This class of application is probably the largest real-time event processing load on public cloud event tools.

Amazon’s Wavelength, Microsoft’s Azure Stack, and Google’s Anthos at the Edge are all public cloud extensions aimed at placing compute resources closer to events.  In all these cases, the hosting is supplied by the user, and the cloud provider offers a way of extending cloud application and operations support (in some mix) to those user-provided hosts.  As I noted in an earlier blog, the cloud providers are promoting this model to network operators, but the significant thing is that it makes “edge computing” something a user can adopt with their own resources.  This could provide some ready applications that might become the basis for an edge computing service.

5G could enter into this because user-deployed edge computing might well be used where the reliability of wireline connections is limited, and so a good-quality 5G backup would be useful.  Useful, in fact, both for the applications that are using the edge computing, and for ensuring that management systems and operations automation remain connected to the site, even if there’s a network failure.

A more interesting application is remote medicine, not to be confused with telemedicine.  Telemedicine is typically a simple video link, perhaps augmented with some ability to do common tests like heart rate, blood pressure, and temperature.  Remote medicine would add in more complex testing (EKG/ECG, blood tests) and also at least some form of robotic treatment or surgery.

What characterizes this application is the dependence on end-to-end latency, not just “local” latency.  The presumption of most edge computing is that the control loop stays local to the event source.  With remote medicine, you’re explicitly remote, meaning likely out of area, and so not only do you have to deal with traditional local latency, but with latency over the whole event flow and response.

This application can also be launched via an edge that’s really a locally extended cloud, like the three I’ve already named.  That gives it the same easier on-ramp, since many communities (including my own) have what’s in essence a city-wide medical campus, with a high concentration of facilities and services.  Individual “edge-clouds” could be combined into a cloud- or operator-provided edge computing facility, and 5G might serve as a backup for connecting life-critical applications in the event of an issue with normal connectivity.

Gaming is in many ways related to this application, but I list it below remote medicine because it doesn’t have as easy an on-ramp.  Massive multiplayer gaming (MMPG) is surely a global phenomenon, literally, and that means that there’s an enormous explicit latency associated with simply linking all the players, a latency that’s not addressable with 5G radio alone.  What would be required would be a form of that end-to-end latency control needed for remote medicine, but it’s not clear that there’s a local enclave from which you could jump off.

Many gamers play with people they know, though, and there are gaming events that could reasonably spawn related local-community involvement.  I think that it might be possible to promote an edge-centric coordination of MMPG, but it may require some technical changes.  In any event, while there’s a natural centralized starting point for remote medicine, that’s less true for MMPG.  On the flip side, there’s every indication that players would pay more for the service.

MMPGs essentially create a broad virtual geography in which players/characters move about.  If communities of players congregate in a specific area, it’s possible that area could be controlled by an edge resource close to the players.  The more concentrated the players were in real-world geography, the more likely this could be done, with the effect of enhancing performance of the game by reducing latency.  This creates an interesting segue into the final topic, AR/VR.
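Before moving to AR/VR, here’s a hypothetical sketch of what that geography-based placement decision could look like.  The edge-site names, coordinates, and distance metric are all stand-ins; a real placement engine would use measured round-trip times rather than geometry:

```python
import math

# Hypothetical sketch: pick the edge site that minimizes the worst-case
# distance (a stand-in for latency) to a community of players who share a
# region of the game's virtual geography. Site names and coordinates are
# invented for illustration.
EDGE_SITES = {
    "metro-east": (40.71, -74.00),
    "metro-west": (34.05, -118.24),
    "metro-central": (41.88, -87.63),
}

def distance(a, b):
    # Crude Euclidean distance on lat/long; good enough to illustrate the idea.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def place_region_host(player_locations):
    """Return the edge site with the smallest worst-case player distance."""
    return min(
        EDGE_SITES,
        key=lambda site: max(distance(EDGE_SITES[site], p) for p in player_locations),
    )

players = [(40.73, -73.99), (40.65, -74.10), (41.00, -73.80)]
print(place_region_host(players))   # -> "metro-east" for this cluster
```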

By far the largest application of VR today is gaming, and gaming illustrates a real issue with both augmented and virtual reality, which is the synchronization of the movement of the player’s head with the virtual world seen by the character.  A lag creates a very unpleasant sensation that a few find downright nauseating.  While some of the lag is created by the process of creating the virtual vision itself (for the complexity involved, imagine a CAD system rotating a part, but in real time, deciding just what the character would “see” looking in a given direction), some is also attributable to latency.

Because this problem is real in gaming even today, and because at least some gamers would pay for improved game performance (and the advantage it might offer them), this could be an on-ramp for the creation of a virtual reality view at the edge.  My model says that gaming applications, virtual-reality real estate and holiday applications, and similar applications could generate just short of $75 billion in service revenues.

The question is whether it’s possible, and better, to push the creation of the view to the gamer’s own system.  Obviously, a dedicated edge process to support a single gamer would be expensive, so again the utility and revenue potential for this may depend on whether communities of players could, by virtue of their own positions in the real world and their characters’ positions in the game, share a hosting process.

But now let’s look at the broadest, most extravagant, and most complicated application of edge computing, which is contextual services.  This is a term I’ve used to describe applications and services that are designed to present information and visualization to a user based on specific real-world context, meaning where the user is, what they’re trying to do, where others are relative to them, and so forth.

Contextual services and applications have three components.  The first is context management, which is responsible for knowing what a user is sensitive to or interested in, and detecting when anything that fits is within a defined “range”.  The second is real-world analysis, which is the process of identifying the real-world things the user might see.  This would likely require a combination of a highly accurate map, accurate user location data, how the user was facing, and what mobile elements were also in the area.  The final component is contextual presentation, which is the way that contextual information is integrated with the user’s real world: how they are told about a contextual hit.  There are essentially three possible high-level models of contextual services developing from these components, and we’ll have to look at each.
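Before getting to those models, here’s a minimal sketch of just the context-management component described above.  The class names, fields, and thresholds are hypothetical illustrations, not any real framework:

```python
from dataclasses import dataclass
import math

# Minimal sketch of "context management": know what a user cares about and
# flag anything matching that interest within a defined range.

@dataclass
class PointOfInterest:
    name: str
    category: str
    lat: float
    lon: float

@dataclass
class UserContext:
    interests: set       # categories the user cares about
    range_m: float       # alerting radius in meters
    lat: float
    lon: float

def approx_meters(lat1, lon1, lat2, lon2):
    # Flat-earth approximation, good enough at neighborhood scale.
    dy = (lat2 - lat1) * 111_000
    dx = (lon2 - lon1) * 111_000 * math.cos(math.radians(lat1))
    return math.hypot(dx, dy)

def contextual_hits(user, pois):
    """Return points of interest the user cares about and is close to."""
    return [
        p for p in pois
        if p.category in user.interests
        and approx_meters(user.lat, user.lon, p.lat, p.lon) <= user.range_m
    ]

user = UserContext(interests={"coffee", "books"}, range_m=150, lat=40.7128, lon=-74.0060)
pois = [PointOfInterest("Corner Books", "books", 40.7130, -74.0055)]
print([p.name for p in contextual_hits(user, pois)])   # -> ['Corner Books']
```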

The first model delivers contextual information without integrating it with the visual field of the user, meaning that no VR/AR is involved.  If we were to apply this to a stroll through a shopping district, a user might get an audible alert and a message if they approached a store that had a deal on something they’d been shopping for.  This still requires contextual integration, but the presentation harnesses well-known channels available on every smartphone.

The second model delivers contextual information in the context of a VR display.  How much of the real-world view is used as the basis for the display depends on the application.  For example, a remote medicine system might show contextual information that has nothing to do with what a doctor’s eyes might actually see, but it’s still visual.  Think “computer display”.  The problem with this is that if there’s no boundary on just what the “real world” is, meaning that it represents what the user’s own eyes would see without VR, just presenting that real-world base is complex, and overlaying contextual information on it means integrating the information with what’s at that place in the real world.  A price over a shop door, perhaps?

The final model, based on augmented reality, is really designed to deal with that problem.  AR presumes that the display is “see-through” so that the user will see the real-world baseline without any system support.  The contextual augmentation is displayed as an overlay on that real-world view.  There’s little difference in the challenge of creating the view, because it’s still essential to be able to identify real-world objects, but the demands on display quality are reduced.

I’ve worked for several years to lay out what the potential for this final application could be.  It appears, based on my current modeling, that it could generate about $250 billion in service revenues when applied to worker productivity enhancement, and about $175 billion in service revenues when applied to consumer contextual applications.  A sizeable component of the latter would be generated by sales/advertising revenues.

The point is that there are edge opportunities, and that if they were to fully develop, they could promote 5G.  5G and the edge, however, are not sufficient to create them.  There’s a lot of software infrastructure needed, infrastructure for which we don’t even have an architecture to work from today.  Don’t get too excited about edge and 5G, then.  There’s a lot of work to do, and all the hype may be hurting efforts to get it done.

A Deeper Dive into Disaggregation in Networking

It’s nice to find a thoughtful piece on technology, and a potential new source for such pieces.  The Next Platform, published in the UK by Stackhouse, is offering such a piece HERE and it’s worth digging into a bit.  While the titular focus is “disaggregated routing” and perhaps DriveNets’ recent funding bonanza, there are plenty of broad messages to consider.

The opening of the story is certainly strong.  It frames the current network technology developments as the outcome of a long-term struggle between the benefits of specialized chips in pushing packets, and the benefits of software in molding packet-pushers, first into networks and then into differentiable services.  While that’s certainly not the sort of top-down approach to things that I naturally look for, it does lead to some provocative points.

In computing, we’ve seen a hardware evolution and a software revolution taking place at the same time.  Early microprocessor chips were, by current standards, so anemic as to be useless in most commercial applications.  As they got more powerful and more could be done with them, it was hardly likely that PC buyers would turn into a vast army of programmers building their own applications to take advantage of the new power available.  What was needed was packaged software, something that was sold by somebody to many users.

The concept of separate, packaged software releases the bond that ties an entire IT investment to its most capital-intensive, depreciation-sensitive asset: the hardware.  In IT, you buy a server, and while you depreciate it over perhaps five years, you can run anything you like on it.

Let’s now look at this same power dynamic in networking.  For decades, there’s been routing software available in open-source form.  The UNIX Berkeley Software Distribution (BSD) stuff included it, and that helped to pull TCP/IP into network supremacy when UNIX began to displace proprietary operating systems.  As recently as 2013, Tier One operators were trialing hosted-router software (from Vyatta) as a replacement for proprietary routers.  The software ran on “COTS” or commercial-off-the-shelf servers, and performance was…well…OK.  Not surprisingly, silicon innovation evolved to improve it.

We have a lot of switch/router silicon available today, from companies like Broadcom, Nvidia, Marvell, Intel, and even Cisco (Silicon One).  This is the same kind of innovation we’ve seen in computer graphics chips, network adapters, and other interfaces used in computer systems.  It creates the same challenge, which is to create packaged software that’s portable across some reasonable base of technology.  The solution is what’s generically called a “driver”, which is a software component that takes an abstraction of a broad set of interface requirements and maps them to one or more specific implementations.  Software can then be written to the abstraction, and the proper driver will let it run in multiple places.
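As a minimal illustration of the driver idea (the class and method names are invented, not any vendor’s SDK), the packaged software is written against an abstract forwarding interface, and each driver maps that abstraction onto specific silicon or a software data path:

```python
from abc import ABC, abstractmethod

# Sketch of the "driver" concept: packaged routing software targets an
# abstraction; drivers map it onto one or more specific implementations.

class ForwardingDriver(ABC):
    @abstractmethod
    def program_route(self, prefix: str, next_hop: str) -> None: ...

    @abstractmethod
    def remove_route(self, prefix: str) -> None: ...

class VendorAChipDriver(ForwardingDriver):
    # Stand-in for a driver that programs merchant silicon.
    def program_route(self, prefix, next_hop):
        print(f"[vendor-A ASIC] install {prefix} -> {next_hop}")
    def remove_route(self, prefix):
        print(f"[vendor-A ASIC] remove {prefix}")

class SoftwareDataPathDriver(ForwardingDriver):
    # Stand-in for a pure-software data path.
    def __init__(self):
        self.table = {}
    def program_route(self, prefix, next_hop):
        self.table[prefix] = next_hop
    def remove_route(self, prefix):
        self.table.pop(prefix, None)

def routing_software(driver: ForwardingDriver):
    # The packaged software only knows the abstraction; the driver decides
    # whether routes land in silicon or a software table.
    driver.program_route("10.0.0.0/8", "192.0.2.1")

routing_software(VendorAChipDriver())   # -> "[vendor-A ASIC] install 10.0.0.0/8 -> 192.0.2.1"
```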

Run on what?  The industry has evolved its own answer to that too.  Rather than demand a COTS server architecture to glue chips onto, a “white box” model has emerged that optimizes the overall hardware platform for the specific network mission the chips are supporting, which is switching and routing.  White boxes will generally have a specific mission, and a specific set of optimizing chips to support the mission.  You put packaged software on white boxes and you have a network device.

Here’s where I think the article misses an important truth.  Networks are built from network devices, and how the devices combine to build a network is intrinsic in the architecture of the devices.  If we took packaged routing software and combined it with a white box, we’d have a white-box-generated example of what could be called a “black box abstraction”.  Black boxes are functional elements whose internal structures are opaque.  They’re known by the relationship among their interfaces, the external properties.  Thus, a white-box router and a proprietary router are both “routers” and they build “router networks”.

This is the most critical point in the evolution of networks today, because if we nail a new technology into a legacy mission, we may constrain it so much that it loses a lot of its potential.  What’s the difference between any two implementations of a black-box router?  From the outside (by definition) they look the same, so the only meaningful difference can be cost.  Cost, friends, isn’t enough to drive transformation.

On a box-per-box basis, meaning one white-box router versus one proprietary router with the same specs, you’re looking at between 25% and 40% cost savings of the first versus the second.  Operators tell me that a major replacement of network technology would have to save around 35% in capex to be meaningful and justify a change, so there’s an issue with the justification of the boxes from the first.  Then there’s the problem that there really aren’t white box models made for every possible router configuration, and since network devices are less common than computers, there’s less economy of scale.

Where DriveNets comes into this picture is that they’ve abstracted the “white box” interior of the black-box router.  They assemble arbitrary configurations by linking multiple white boxes into one virtual box, which behaves exactly like a black-box router from the outside.  This is one of several things that’s described as “disaggregation”, which loosely means “taking things apart”, and it fits.  What DriveNets disaggregation does, out of the box (no pun intended), is to reap the maximum savings from the white-box-and-packaged-software approach, and extend the option to all possible classes of network devices.  That’s enough to make a business case for a network upgrade, and they’re unique in that regard.

Now (finally, you may think) we come to the point about boxes building networks, and the functionally interdependent relationship between box and network.  Black boxes are linked into a network via those external interfaces, which means that the initial definition of the abstraction then defines how the devices are used, the kind of networks they build, and the services those networks directly create.  Suppose you don’t want that?

The SDN movement and the NFV movement both, in theory, offered a way of changing that.  SDN separated the control and data planes, and implemented the control plane differently (via a central controller).  NFV offered the possibility of decomposing devices into virtual functions that would then be recomposed into more flexible virtual devices, and the opportunity to create virtual functions that current routers didn’t even support.  Neither has been a dazzling success at replacing routers, but they both demonstrate that there is life beyond router networks, no matter how each black-box router is realized.

What kind of life?  We know what an SDN network model looks like, and we can assign black-box element properties to all the components and then hope for an open implementation of each.  Anyone’s flow switch should work with anyone’s SDN controller.  What this means is that SDN creates a new network model that represents not individual devices/nodes, but the network as a whole.  A community of SDN elements is one big black box, made up of interior black boxes.  Abstraction within abstraction, wheels within wheels.

That, I think, is the general model of networking we’re heading toward.  On the outside, at the network level, what we build has to look like IP.  On the inside there’s another collection of abstractions that represent the open assembly models that add up to that exterior black-box abstraction of an IP network.

Things like separating the control plane and the data plane, however you do it and however far apart they are, are the responsibility of the inner-model relationships.  If you elect to extend the “services” of the IP network to include the specific user-plane interfaces of 5G or the implementation of CDN, they add to the features of the network abstraction and they’re implemented within.  You could also say that those future “services” are another higher level in the abstraction hierarchy, consuming the “IP Network Abstraction” and its contained abstractions, and other abstractions representing non-IP-network features, likewise then decomposed.
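Here’s a minimal sketch of that abstraction-within-abstraction idea.  The names and the decomposition are hypothetical; a real model would carry parameters, SLAs, and lifecycle state rather than just labels:

```python
from dataclasses import dataclass, field
from typing import List

# Sketch: a service abstraction decomposes into functional abstractions
# (like "IP Network"), which decompose into concrete realizations.

@dataclass
class Abstraction:
    name: str
    children: List["Abstraction"] = field(default_factory=list)
    realization: str = ""   # non-empty only at the bottom of the hierarchy

    def decompose(self, indent=0):
        label = self.realization or "(abstract)"
        print("  " * indent + f"{self.name}: {label}")
        for child in self.children:
            child.decompose(indent + 1)

service = Abstraction("5G Broadband Service", [
    Abstraction("IP Network", [
        Abstraction("Core Cluster", realization="white-box cluster"),
        Abstraction("Edge Routing", realization="disaggregated edge boxes"),
    ]),
    Abstraction("5G User Plane", realization="hosted UPF"),
    Abstraction("CDN", realization="edge cache instances"),
])
service.decompose()
```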

This gets me to where I like to be, at the top of the model.  It frames the future of networks by creating service abstractions, functional (like “IP Network”) abstractions, and low-level realizations within them all.  It defines openness in terms of open implementations of defined abstractions.  It envelops what we can do today with or on IP networks, and what we could evolve toward wanting, including edge computing.

If this is a good statement of the future of networking, then where does it leave current vendors, including DriveNets?  The model makes everything a decomposition (a “disaggregation” if you like) of a functionally defined glorious whole.  Anything we have or do now is a special case of an expanding generalization.  That’s what vendors have to be thinking about.

For traditional network vendors, both switch/router and 5G, the dilemma is whether to embrace a lower role in the abstraction hierarchy, to adopt the full model and position your offerings within it, or to ignore the evolving reality and hope for the best.  Router vendors who really offer “disaggregated software and hardware” will have to support a hardware abstraction that embraces white boxes.  Do they then also have to embrace a hierarchy of abstractions like DriveNets?  A major question for those vendors, because they have broad sales targets, a broad product line, and a lot to lose if they fail to support what seems to be evolving.  But they may lose differentiation, at least to a degree, if they do.

DriveNets, despite its enviable market position, has its own challenges.  The article cites Hillel Kobrinsky, DriveNets co-founder and chief strategy officer: “DriveNets is going to focus completely and exclusively on the service provider space – forget large enterprises, hyperscalers, and cloud builders. DriveNets started at the core for routing, Kobrinsky adds, and has moved into the peering and aggregation layers of the network and has even moved out to the edge and is sometimes used in datacenter interconnects. But DriveNets has no desire to move into other routing use cases and has no interest in doing switching at all. At least for now.” 

Self-imposed limitations are good because you can “un-impose” them easily, but bad as long as you let them nail you to a limited role in an expanding market.  With 5G interest and budgets exploding, and with edge computing differentiating from cloud computing, “now” in the service provider space may be measured in weeks.  Enterprise and cloud provider opportunity is already significant, and any significant and unaddressed opportunity is an invitation for competitive entry.  The competitor may then take a broader view of their target market, one that includes what you thought was your turf.  And, a true and full adoption of the model hierarchy I’m talking about would be a great way to enter the market.

Great, because there are already initiatives that could easily be evolved into it.  The biggest driver for adopting this abstraction-hierarchy model of networking may be projects like Free Range Routing and the ONF’s Stratum.  Open RAN developed because operators wanted it and vendors were ready to oblige them, particularly vendors who didn’t have incumbent products in the 5G RAN space.  Could Stratum or FRR create an appetite for an agile high-level model for services and networks?  If so, could that then drive everyone to either adopt their own broad model, or be supersetted by the rest of the market?

If there’s any issue you want to watch in networking, it’s my recommendation that you watch this one.  If this industry is going to be moved and shaken, this is where it will be done.

What Can We Say About Operator Edge Partnerships with Cloud Providers?

Are cloud providers’ edge deals with operators symbiotic or parasitic?  Both Amazon (Wavelength) and Microsoft (Azure Stack) have tailored edge offerings that essentially extend a tendril of their cloud into a telco data center.  The cloud providers view this as a symbiotic relationship, meaning it’s an ecosystemic partnership that benefits both parties.  Some telcos think it’s parasitic, which I’m sure you don’t need me to define.  Which it is may be hard to determine, now or even in the future, for a number of reasons.

One good reason for uncertainty is that edge computing is often linked to 5G hosting, and in particular to Open RAN and a broader-and-still-undefined “open 5G”.  Operators have accepted that they might well have no choice but to cloud-host edge 5G components, because they might not have real estate in all areas; opponents, though, have pointed out that Wavelength/Stack requires telco real estate in any event.  That raises the first question: Would telcos perhaps use cloud-provider edge services hosted on another telco’s real estate?

The operators I’ve talked with say “No!” decisively.  They insist their deals with public cloud providers would foreclose that, which means either that the “out-of-region” 5G hosting driver is a non-starter, or that cloud providers themselves might deploy hosting compatible with Wavelength/Stack (as appropriate) to be shared.

The second reason for edge symbiosis uncertainty is that edge applications are a big question mark.  The big 5G network equipment vendors are promoting using their own tools to host 5G edge, and when you get beyond 5G hosting as an application, you enter the realm of speculation.  Light Reading did an article on the topic, saying that without edge computing, network “latency is anything between 50 and 200 milliseconds, levels that would trigger seasickness in users of virtual reality headsets and turn streets of autonomous vehicles into a demolition derby.”  If we take these two example applications, the first raises the question of what virtual reality headset applications are, and the second ignores the fact that collision avoidance in self-driving vehicles is almost certainly not going to be hosted off the vehicles themselves.

This question of applications for the edge is one I’ve noted before, and it’s an example of a broader issue I’ve also noted, which is to presume that because you can think of something to do with a technology, that “something” will justify it.  I’ve been working on the modeling of the opportunity for augmented reality or virtual reality in relation to IoT, and it’s been exceptionally difficult to get the factors right.

A sci-fi writer, Robert Heinlein, did a 1950s book called “The Door into Summer”.  It postulated a 1970 world where a robotic vacuum cleaner inventor cold-sleeps himself into the year 2000, where he finds all manner of robotic stuff, invented by him.  That happens because he’s able to then time-travel back to 1970 to take what he found in 2000 and patent it.  OK, the theory has a lot of outlandish elements, but my point in citing it is that the advanced robotics that were postulated for 2000 (or even the basic stuff for 1970) were not anything close to being realized at those dates.  Imagination (and PR) has no inertia or boundaries.  So it may be with edge applications; they make a nice story today, but there are a lot of moving parts in them that are not yet moving.

This generates our second question:  Is there a mission for edge computing that operators could cite to justify deployment?  5G hosting alone isn’t it; edge computing is a general-purpose commitment.  The answer to this, IMHO, is that there is no clear near-term mission.  Thus, operators who made a major financial commitment to “the edge” would be reckless in the eyes of their CFOs and Wall Street.  You don’t commit depreciable assets to a future mission; you wait till the mission is credible in the near term.

Put in this context, the decision by operators to cloudsource their 5G hosting might seem to make more sense.  My reason for the “might” is that point about real estate I made earlier.  Operators probably don’t have edge hosting resources on which to run cloud-provider symbiotes/parasites.  If they have to acquire them, then they’ve taken the big depreciable-asset-risk step.  How do we reconcile this with the no-current-justification point?  There are several possibilities.  Obviously one possibility is that both operators and cloud providers have drunk too much bath water, but let’s set that aside and assume rational business planning on both sides.  What then?

One possibility is that operators aren’t worried about stranding computing assets because they believe 5G will justify them.  Whether 5G is open or proprietary, it’s based on the presumption of hosted functions of some sort.  They have to run somewhere, and most operators agree that that “somewhere” is likely at least within a metro and perhaps (particularly for large metro areas) even in each central office.  So, you stick some edge data centers where you want to host 5G.  Your problem then is that you don’t know anything about clouds and hosting, so the cloud provider deal is a way to get those edge data centers functioning correctly.  As time passes and your own skills grow, you can kick out the symbiotes if they turn out to be parasites.

Another possibility is that cloud providers know that operators, presented with a “portable cloud” strategy that requires them to invest in hosting, will elect to commit to a cloud provider’s own edge.  Get some Wavelength/Stack contracts in place, maybe let the operators fund a couple data centers, and they’ll come to their senses and ask you to deploy your data centers in their areas.  Maybe they even rent/sell you real estate.

A third possibility is that both cloud providers and operators recognize that low-latency applications of any sort will have to avoid ever getting onto the Internet.  The edge, then, must be a piece of the operator’s infrastructure, and 5G transport has to ride on internal operator capacity.  In short, 5G is a total end-to-end ecosystem, and operators are simply looking to gain the skills needed to host the functional and application pieces.  They’re building a “limited public cloud.”

Which of these three might be true is beyond my skill to predict, and in any event, I suspect that every operator would have its own probability set to define the most likely scenario they’d adopt.  Instead, I want to look at the implications of the most interesting of the choices, the last.  It’s the one I’d like to think will win, but I can’t present a modeling case to support it yet.

The essence of the third possibility is that carrier cloud and public cloud would be interdependent more than competitive, a true longer-term symbiosis.  The operators wouldn’t be expecting to compete with public cloud providers for today’s applications, and the public cloud providers would be accepting that a carrier cloud deployment would happen.  Linking the two, perhaps in some cases in a single relationship and others in a multi-cloud relationship, would be the Wavelength/Stack model.  It would jump-start carrier cloud, and provide an extended cloud that likely both cloud providers and operators could “sell” depending on the balance of edge/deep-hosting in the applications.

This opens some interesting possibilities, I think.  It would likely promote faster and broader use of edge computing, particularly if I’m correct and the public cloud providers and operators could cooperate in developing and selling edge services.  It would raise the pace of carrier cloud data center adoption, it would promote enhanced applications…a lot of good stuff.  The question is whether the stars within both the public cloud providers and network operators will align.  We have tools so far, but they’re not mature enough to answer that opening question: symbiote or parasite?

That question will probably have to be answered by vendors and not operators.  Operator initiatives in open networking have been mired in the usual standards-group quicksand and they simply cannot advance the technology fast enough to influence the market.  Remember that both cloud providers and operators looking to deploy their own 5G hosting will need something to host.  We have solutions today, but are they optimized for performance in the cloud, optimized for their use of white-box switching (including specialty boxes for data-plane handling)?  I don’t think so, and so the direction that the operator/hyperscaler partnership will take is likely to be determined by the players who offer the right technology, and who they offer it to.

How Will Network Operators Fare in 2021?

Network operators have faced challenges in profit per bit for over a decade.  The major concern has been with the “wireline” segment of their business, the segment that provides business access and VPNs, and consumer home broadband.  Some of them have fared better than others, and since those challenges are surely expanding with the work-from-home, entertain-in-home shift that COVID has created, we’ll see differences in how operators face the future, even in the current quarter.

Wireline broadband, or wireline services in general, pose the greatest risk for change, both in terms of technology choices and in terms of basic demand.  Business broadband and VPN services are under considerable price pressure as enterprises cope with their own profit challenges.  WFH has increased those problems by dispersing workers and expanding the use of the Internet to reach customers, prospects, and partners.  Consumer broadband is obviously under pressure from the same WFH source, but remote classrooms and stuck-at-home entertainment have added to the stress.

Let’s start with businesses and WFH.  Enterprises have two essential models for WFH: one where users “home-run” into the data center via an Internet VPN or SD-WAN, and the other where they attach to their normal local office via one of these mechanisms, and are then linked to the data center through the branch VPN.  Two different traffic types are involved, either way.  One is access to applications, the same general stuff that the worker would use if they were in the office, and the other is the “collaborative” applications, usually centering on video (Teams, Zoom, etc.) but also involving screen sharing and so forth.

Businesses are also seeing a transformation in the way that information users and information resources interact.  Almost every enterprise has increased their use of “portals” that provide information directly to their customers, prospects, and partners.  Many have been moving toward a portal-based approach to supporting their own workers, whether at home, traveling, or in their usual seat in the office.  Portals are often cloud front-ends to legacy applications and analytics, and they’re increasingly seen as providing a “role-based” means of controlling access of people based on what they’re allowed to do.

So far, enterprises aren’t reporting major changes in their access needs, though some are expanding their links to the Internet to support information extension outside their own facilities, including collaboration via video.  However, they do say that as they transform the way they work, they’re likely to do more over the Internet, and likely reconsider traditional VPNs in favor of SD-WAN VPNs.  This is particularly true where it’s important to link sites in areas where VPNs aren’t economical, which is of course the original SD-WAN value proposition.

Forecasts on network spending, both equipment and services, are also showing an expected (if small) reduction for this year.  I hear that enterprises believe they had to do something quickly for WFH, but that they don’t believe they did the optimum thing.  There may be more focus this year on cost management and reduction.  The net, then, is that operators can expect that they’re not going to see a lot of incremental business revenue, if any.

On the consumer side, things are really complicated.  Consumer networking means Internet access, which has always been under price pressure.  In fact, consumer broadband in the past was often carried on infrastructure whose costs were largely subsidized by linear TV to the home.  That’s been under pressure from streaming services, and COVID has added a pair of challenges relating to at-home behavior.

The first challenge is a greater use of streaming, created because many content sources have been unable to produce new episodes of popular shows.  Streaming not only tends to reduce consumer appetite for channel bundles, it also consumes a lot of bandwidth.  Streaming video is the largest consumer of bandwidth for nearly all operators, mobile and wireline.

The second challenge is video chatting and meetings, the latter including WFH.  People who can’t see friends and family in the real world want to “see” them in a virtual sense, which means video upstream and not just flowing downstream.  That upstream load creates an issue, and also a sharp divide among operators.

Whatever the medium used for “wireline” broadband, there are two different models of service, symmetrical and asymmetrical.  The former offers the same speed in both directions, and the latter will offer higher downstream speed than upstream speed.  Even operators who offer symmetrical access will often not traffic engineer for a higher upstream load, and so video chatting creates a major risk for them.  Where access is asymmetrical, the upstream speed is often insufficient to support, for example, one or two parents doing WFH and several children doing remote learning, at one time.
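A rough upstream calculation shows why.  The per-session bitrate below is an assumed ballpark for video conferencing, not a measurement:

```python
# Illustrative upstream arithmetic only; the bitrate is an assumed ballpark
# figure for an HD conferencing upload, not a measured value.
UPSTREAM_PER_VIDEO_CALL_MBPS = 2.5

household_sessions = {
    "parent WFH calls": 2,
    "remote-learning sessions": 3,
}

needed = sum(household_sessions.values()) * UPSTREAM_PER_VIDEO_CALL_MBPS
print(f"Concurrent upstream needed: ~{needed:.1f} Mbps")

# Many asymmetrical plans (and most DSL) offer well under 10 Mbps upstream,
# so a household like this one can hit the upstream wall even when the
# downstream tier looks generous.
```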

According to operators, consumers have been fairly open to paying more for better broadband services (though not necessarily for the top tier; consumers seem to think that 50 or 75 Mbps is sufficient for most of their needs), but they’re not always available.  Globally, DSL is still a widely used broadband technology, and it’s not only asymmetrical, it’s often unable to deliver the downstream bandwidth a large family would need.  Cable companies seem to have benefitted from COVID in that they saw a slight boost in TV revenues and a number of new customers, drawn often from DSL users who hit the bandwidth wall during the pandemic.

While there’s been a lot of talk about fiber replacing copper, that kind of shift in access technology is difficult to undertake with limited profit-per-bit potential.  In the US, for example, a bit less than a quarter of states have average demand densities sufficient to make FTTH profitable.  Another 22 could justify some FTTN (especially 5G/FTTN millimeter-wave), and the remaining 18 would find it difficult to make even FTTN profitable.  In the US, most of the losses DSL has experienced have been gains by FTTH, but cable companies offer more broadband than FTTH, telco FTTN, and DSL combined.  Most of them are earning some revenue per user from TV, which has helped them deploy more broadly, and CATV cable has a lower pass cost than fiber.

The question for the consumer space is what happens when COVID subsides.  There are some signs that live TV in any form is hurt by the loss of programming, and that if the programming drought continues, customers may try to drop to cheaper (fewer-channel) bundles to focus on local and sports programming.  Others may move to pure streaming sources like Netflix and Amazon Prime Video.

Cable companies, in the markets where they operate, may be under the most pressure, because they rely more on live TV and because their delivery system is inherently asymmetrical.  The DOCSIS 3.1 and 4.0 specifications define something more competitive, but since users on a single span share capacity, there’s a limit to what upgrading the DOCSIS level can do for them, unless they re-segment to limit customer counts.  That would require additional head-end technology and raise costs.  The cable model of TV, being based on linear RF, isn’t suitable for streaming, and while some cable operators have developed streaming platforms, this might be a barrier to adopting 5G/FTTN technology.

In the US, cable companies are also stressing over their mobile service capability.  Many cablecos have entered into MVNO deals with telcos, to supply mobile services or to supplement their recurrent attempts to build virtual mobile networks from WiFi hotspots.  These have been successful in the main in attracting customers, but they’ve been on (or over) the “unprofitable” edge for many.  This doesn’t relate directly to the technology issues that this blog targets, but it does create another headache for management to contend with, and it might spawn a capital initiative (like the use of CBRS spectrum) to replace MVNO relationships.  That would compete with wireline for budget.

Staying with US cable, Comcast apparently didn’t bid significantly in the recent spectrum auction, and it’s reportedly resuming share buybacks.  That means, as the article just referenced says, that Verizon’s deal with them for MVNO services is likely secure, but it also means that replacing MVNO relationships with their own mobile services isn’t something many cablecos are excited about.

For the telcos, a comparison between US operators AT&T and Verizon may be useful.  These two operators compete nationally for mobile services, but have their own wireline territories, inherited from the original Regional Bell Operating Company makeup.  Verizon has seven times the demand density of AT&T, and has the majority of the states whose demand density is in the FTTH range.  All of Verizon’s states are candidates for FTTH or 5G/FTTN.  AT&T, besides having a low demand density, has all the states where demand density is critically low (9 in total) and none of the FTTH-suitable states.

Despite this, AT&T seems sour on the 5G/FTTN hybrid and Verizon is embracing it.  Verizon’s success with FTTH and its strong interest in mm-wave 5G reflect the demand density differences.  For AT&T, while 5G/FTTN hybrids would reach more customers with high-speed broadband, that technology wouldn’t be viable for over three-quarters of their states, and might not be able to generate a viable ROI in those states.  For Verizon, that’s not an issue.

This reinforces the importance of demand density in projecting telco behavior, not only in the US but elsewhere.  High demand density means many different broadband delivery systems are likely to be profitable, and that incumbents would have a risk of competitive overbuild to drive their own deployments.  Low demand density means competitive overbuild is likely impossible and that even the wireline incumbent may struggle to commission any effective consumer broadband delivery model at all.

It’s probably premature to say that all this, in the long term, adds up to a business model transformation for operators.  It does seem likely it will create a driver for one, though.  That means that network vendors should be exploring how they could benefit from a particular transformation path, and then working that strategy to improve their revenue from the operators. Otherwise, they may find that what was a bright spot in 2020 will dim in 2021, and operator profit per bit will be even more problematic, constraining their spending further.  Not a happy combination.

Why “Separate the Control Plane?”

What does separating the IP control plane really do?  It’s been a kind of mantra for a decade that a separation of the control and data planes of an IP network creates a beneficial outcome.  I’ve had a lot of discussions with vendors and network operators on this topic, and there’s a surprising variability of viewpoint on something that seems so widely accepted.  It’s probably time to dig down and explore the issue, which means we have to take a look at what the control plane really does.  Hint: The answer may be “it depends.”

All networks have a fundamental responsibility to turn connections into routes.  A route is a pathway that, if followed, delivers packets to some destination.  This is a bit more complicated than it seems, because router forwarding tables only define the next hop.  In order to build a route, there have to be two things: a knowledge of topology and a means of organizing the hops into routes.  That means having an understanding of network topology, including who can be reached where.

This can be a complex process, but the Cliff Notes are simple (please don’t point out the exceptions; it would take a book to describe everything!).  Each router in an IP network advertises what addresses it can “reach” to adjacent routers, who then pass it along to their adjacent routers.  Routers, in some way (hop count, link state), decide which advertisements of reachability are “best” for them, and so they list the advertiser in their routing table, making it the next hop.

When something breaks (a trunk or a router), the result is that some routes are broken.  The advertising process, which is ongoing, will then define a different hop for routers adjacent to the failure, and a set of new routes will be created.  This process is sometimes called “convergence” because it means that all routers impacted by the fault have to agree on new routes, or there’s a risk of packets falling into space.
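A toy distance-vector exchange illustrates that advertise-and-converge process.  Real protocols (OSPF, BGP) are far richer, but the shape of the logic is the same:

```python
# Toy distance-vector exchange: each router keeps the lowest-cost advertiser
# of a destination as its next hop, and iteration stops when nothing changes
# (i.e., the network has "converged"). Topology is invented for illustration.

neighbors = {            # router -> {neighbor: link cost}
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1},
    "C": {"A": 4, "B": 1},
}

# tables[router][destination] = (cost, next hop)
tables = {r: {r: (0, r)} for r in neighbors}

changed = True
while changed:
    changed = False
    for router, links in neighbors.items():
        for neighbor, link_cost in links.items():
            for dest, (cost, _) in tables[neighbor].items():
                candidate = link_cost + cost
                if dest not in tables[router] or candidate < tables[router][dest][0]:
                    tables[router][dest] = (candidate, neighbor)
                    changed = True

print(tables["A"])   # A reaches C through B at cost 2, not directly at cost 4
```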

The actual process of forwarding packets is simple by comparison.  A packet is received, the IP address of the destination looked up in a routing table, and the packet forwarded to the next hop identified in that table.  It’s so simple that you can reduce the whole thing to silicon, which has been done by many vendors already.
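The forwarding step can be sketched in a few lines.  Real devices do this in silicon with TCAMs or tries, but the logic is just a longest-prefix lookup:

```python
import ipaddress

# Minimal sketch of the forwarding step: look up the destination, pick the
# most specific matching route, send to its next hop. Entries are invented.
forwarding_table = {
    ipaddress.ip_network("10.0.0.0/8"): "port-1",
    ipaddress.ip_network("10.1.0.0/16"): "port-2",     # more specific route
    ipaddress.ip_network("0.0.0.0/0"): "port-uplink",  # default route
}

def next_hop(dest: str) -> str:
    addr = ipaddress.ip_address(dest)
    matches = [net for net in forwarding_table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)   # longest prefix wins
    return forwarding_table[best]

print(next_hop("10.1.2.3"))    # -> "port-2"
print(next_hop("192.0.2.9"))   # -> "port-uplink"
```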

The argument for a “separate control plane” starts with this difference in complexity.  Data forwarding is a drudge job, and route determination is an egghead task that seems to get more complicated every day.  Most router vendors have separated the control plane and data plane within their devices for this reason.  Over time, control-plane processes have been more “computer-like” and data-plane more “siliconized”.

Suppose now that we were to take the control-plane processes out of the box completely and put them in a separate box with a direct 1:1 connection to the data-plane devices?  That was proposed over a decade ago (by Juniper, and perhaps others).  We could now size the control-plane device based on the actual compute load.  We’d still need a high-speed connection to the data-plane device because control packets in an IP network are in-band to the data flow.  Even more separation with this model, right?

OK, now we can look at something other than the 1:1 device relationship.  Might a single control-plane processor manage the control-plane packets for multiple data-plane devices?  Subject to load calculations and connection performance between them, why not?  Similarly, could we visualize the control-plane device as being “virtual”, a collection of resources that cooperate to manage the routing?  We could combine the two as well, creating a “cluster”.  DriveNets, who recently jumped to over a billion-dollar valuation with their latest funding round, is a cluster model.
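A minimal sketch of that many-to-one relationship (not any vendor’s actual API) would have one control-plane process computing routes and programming several data-plane boxes:

```python
# Sketch of a single control plane serving multiple data-plane elements.
# Class names and the route format are hypothetical.

class DataPlaneBox:
    def __init__(self, name):
        self.name = name
        self.fib = {}           # prefix -> next hop, as seen by this box

    def install(self, prefix, next_hop):
        self.fib[prefix] = next_hop

class ControlPlane:
    def __init__(self, boxes):
        self.boxes = boxes

    def push_routes(self, routes_per_box):
        # routes_per_box: {box name: {prefix: next hop}}
        for box in self.boxes:
            for prefix, nh in routes_per_box.get(box.name, {}).items():
                box.install(prefix, nh)

boxes = [DataPlaneBox("leaf-1"), DataPlaneBox("leaf-2")]
cp = ControlPlane(boxes)
cp.push_routes({
    "leaf-1": {"10.2.0.0/16": "spine-1"},
    "leaf-2": {"10.2.0.0/16": "spine-2"},
})
print({b.name: b.fib for b in boxes})
```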

The interesting thing here is that there’s a lot of variability in how these many-to-many clusters could be constructed.  A virtual node could be a re-aggregation of a bunch of disaggregated functions.  White boxes don’t come in an unlimited variety of configurations, so today we’re seeing a cluster limited by current hardware.  As we diddle with cluster-creating options, might we find other functions that could be solidified into a white-box model, and thus advance the richness of the configuration overall?  I think we could, and one thing that could drive that hardware advance is an enhanced mission.

We started our discussion with a simple view of a router network.  All of our speculations so far presume that we have the same control-packet flows in our separate-control-plane frameworks as we did with classic routing.  Suppose we now think about tweaking the behavior of the control plane itself?  If a “cluster” is the node, the virtual router, then what are its boundaries, and what might be different on the inside versus at the edge?

SDN took the cluster model of a bunch of data-plane “forwarding devices” and combined it with a centralized control plane implementation.  One controller to rule them all, so to speak.  This creates an interesting situation, because that one controller now has centralized end-to-end, all-device, all-route knowledge of the network.  There’s no need to have a bunch of hop-by-hop adaptive exchanges of topology because that master controller has everything inside it.

SDN is a special case of what a few pundits have proposed, which is that IP could benefit from end-to-end control-plane visibility.  Suppose we have “edge interfaces” where real IP-native behavior has to be presented, with both control and data packets.  Also suppose we have “trunk interfaces” where we only forward the end-to-end stuff.  The edge interfaces feed what’s likely a logically centralized but physically distributed control-plane process.  Now every port is a port on what’s essentially a giant virtual router.  A device is simply a place to collect interfaces, because everything operates with central knowledge.
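Here’s a sketch of what that central knowledge buys: with the whole topology in one place, every next hop can be computed with a single shortest-path run and pushed out, with no hop-by-hop adaptive exchange.  The topology and names are invented for illustration:

```python
import heapq

# Centralized route computation: Dijkstra over the full topology known to
# the controller, producing cost and previous hop for every node.
topology = {        # node -> {neighbor: link cost}
    "edge-1": {"site-a": 1},
    "edge-2": {"site-b": 1},
    "site-a": {"edge-1": 1, "site-b": 2},
    "site-b": {"edge-2": 1, "site-a": 2},
}

def shortest_path_tree(source):
    """Return {node: (cost, previous hop)} computed from central topology."""
    best = {source: (0, None)}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue
        for neighbor, link in topology[node].items():
            new_cost = cost + link
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, node)
                heapq.heappush(queue, (new_cost, neighbor))
    return best

print(shortest_path_tree("edge-1"))
```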

What, exactly, would a control plane like this look like?  There are probably a lot of options, including the ONF SDN controller option, but let’s look at what we know to be true about any such approach.

First, the data path between the edge interfaces and the control plane has to be able to handle the control-plane traffic exchanges there.  How much actual traffic would have to pass would depend on how powerful the edge interface was with respect to proxying the control plane traffic expected, and how complex the control plane was at that edge point.

Second, this first point leads to a conclusion that the separated control plane is really a kind of hierarchy.  There’s an edge-interface process.  There’s probably a local, perhaps per-site, process that knows all about conditions at a given facility where multiple trunks terminate.  There’s a central process that knows everything, perhaps rooted in a highly available database.  Likely the things that were response-time-sensitive would be close to the edge, and those not so sensitive would be hosted deeper in.

Third, if forwarding is controlled by simply manipulating forwarding tables, then anything that’s based on forwarding control could be co-equal with the IP control plane.  I would submit that this is why it’s reasonable to expect that centralized control planes would eventually combine, whatever their source, and so the IP and 5G, or CDN, or EPC, control planes would all combine, and all determine routes by simply updating those forwarding tables.
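A sketch of that point: if every control plane ultimately acts by writing forwarding entries, then IP routing, a 5G user plane, and a CDN redirector can all be clients of the same table-update API.  The API shown is hypothetical, not a standard interface:

```python
# Hypothetical common forwarding-table API shared by several control planes.

class ForwardingTable:
    def __init__(self):
        self.entries = {}       # match key -> (action, owning control plane)

    def update(self, match, action, owner):
        self.entries[match] = (action, owner)

fib = ForwardingTable()

# Classic IP control plane installs a route.
fib.update(("dst", "10.0.0.0/8"), "forward:port-3", owner="ip-routing")

# 5G control plane steers a session's user-plane traffic to a UPF instance.
fib.update(("teid", 0x1A2B), "forward:upf-2", owner="5g-cp")

# CDN control plane redirects a content prefix to the nearest cache.
fib.update(("dst", "203.0.113.0/24"), "forward:edge-cache-1", owner="cdn")

for match, (action, owner) in fib.entries.items():
    print(owner, match, "->", action)
```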

I think that if you look at 5G Open RAN, the ONF SDN approach, DriveNets’ success, and the historical trend in control-plane separation, it’s inevitable that they converge on a model much like the one I’m describing.  It’s less capital intensive, it creates a less complex overall network, it improves operations, it integrates all the stuff that’s now threatening to create silos…the list goes on.  The question is how it will come about.  Will 5G “open” initiatives recognize the benefits, will Open RAN promoters specifically address the idea, could white-box vendors, network operators, standards groups (like the ONF), or “disaggregation” vendors (like DriveNets) do the job?

DriveNets is closer to the brass ring from an implementation perspective, for sure.  The ONF’s programmable network concept is a contender, but at a distance that’s created by the inertia of the standards process and the necessary loss of differentiation that implementing a standard creates.  The Open RAN initiative is the real contender.  The concept of the Open RAN control plane in general, and the RAN Intelligent Controller (RIC) in particular, could be a jumping-off point to incorporating some of the IP control plane features into (in effect) the 5G control plane.  DriveNets eats routing from below, and has taken some real steps in the right direction. Open RAN eats it from above, and 5G has a lot of budget behind it that Open RAN could co-opt.

If consolidation of all control planes is the goal, I think we can rule out standards as a source of progress—it takes too long.  Operators tend to find refuge in standardization when they want something (though AT&T has been pretty bold) so they can probably be ruled out too.  That leaves the “open” and “disaggregation” initiatives to carry the water, or hopeful water, in the matter.  I suspect that the value of all of this will be clear by the end of the year, so it’s likely we’ll know by then just who will be leading this approach, or if the whole idea is hung up on tactical concerns and unlikely to be implemented despite my prediction.  That would be a shame because I think the benefits to the network community are significant.