The Future of the Service Mesh in Telecom Applications

Are service meshes perfect for telecom? Some articles have suggested they are, or at least are essential. I’ve also heard from telecom users who stop just short of saying that service meshes are the death of telecom applications. It’s useful to see why there’s such a divergence in viewpoint, and also to see whether there’s a final decision to be made on the topic.

Let’s start by looking at what we mean by a service mesh. Cloud-native applications are composed, semi-dynamically, from a collection of agile microservices. The “agile” qualifier means that the microservices are designed to be freely distributable across a resource pool, with the ability to scale with load and to replicate to repair a failure. Picture, then, this dynamic system of microservices stitched together by workflows. Nice picture, but to get to the gritty details, how does the work find the microservices? We have to address things in a network, and in order to pass work along, we have to know where we’re passing it, in addressing terms.

The problem with this in a microservice, cloud-native world is that it may be a challenge to know where something is, how many of them there are and which we should be using, and so forth. Reliable and efficient communication among microservices is essential to a high-quality experience, and you can’t let every development team come up with their own approach, particularly since cloud-native design favors a lot of reuse of components.

Think of a service mesh as a kind of control-plane-and-data-plane structure, but rather than having each microservice then have to implement whatever technologies the service mesh might include, each is instead equipped with a sidecar or proxy element that represents the service in the mesh. The service stays in a “functional plane” while all the communication is handled through the sidecar. If you change service mesh technologies, the worst you’d face is changing sidecars. Usually even that’s not required, because most service meshes use a common sidecar technology such as Envoy, which originated at Lyft.
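
To make that division of labor concrete, here’s a toy Python sketch of the pattern. It isn’t any real mesh’s API; the registry, the service names, and the classes are all my own invention, but it shows the service staying in its functional plane while the sidecar handles discovery, balancing, and retries.

```python
# A toy illustration (not any real mesh's API; registry, names, and classes
# are hypothetical) of the sidecar division of labor: the service calls its
# local proxy, and the proxy owns discovery, load balancing, and retries.
import random

REGISTRY = {"billing": ["10.0.0.5:8080", "10.0.1.7:8080", "10.0.2.3:8080"]}

class Sidecar:
    """Stands in for the Envoy-style proxy that handles mesh communication."""
    def __init__(self, registry):
        self.registry = registry

    def call(self, service_name, request, retries=2):
        # Discovery, balancing, and retry live here, not in the service code.
        for _ in range(retries + 1):
            instance = random.choice(self.registry[service_name])
            try:
                return self._send(instance, request)
            except ConnectionError:
                continue  # try another instance
        raise RuntimeError(f"all instances of {service_name} failed")

    def _send(self, instance, request):
        # A real sidecar would speak HTTP or gRPC; we just simulate the call.
        return f"response from {instance} for {request}"

class BillingClient:
    """The 'functional plane': it knows only the logical service name."""
    def __init__(self, sidecar):
        self.sidecar = sidecar

    def get_invoice(self, account):
        return self.sidecar.call("billing", {"op": "invoice", "account": account})

print(BillingClient(Sidecar(REGISTRY)).get_invoice("A-1001"))
```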

To understand what a service mesh was designed for, consider the concept of the API proxy as an alternative. Widely used in applications, an API proxy sits between users and services and presents a constant API to the user side, while mapping the service side to whatever instance should be run. This adds load balancing and failure recovery. A service mesh is a distributed form of the same concept, and we’ll see why that similarity is important.

There are way more API proxies in the world than service meshes. The reason is that for “basic” service-to-user mapping, an API proxy is fine. In fact, it may well be a lot better, because a service mesh adds latency to something that may already have more than an ample supply—a microservice deployment. Every service is a hop no matter how you discover the route of the workflow, and so transit delay accumulates. Add in the process of sidecar proxy handling and you add in more latency. If we could assume that a given application was more about “services” than “microservices”, meaning that there were fewer components to lace into a workflow, then API proxies would be fine and meshes would be overkill.
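
To see how quickly the hops pile up, here’s a trivial bit of Python arithmetic. The per-hop numbers are assumptions I picked for illustration, not measurements; the shape of the result is what matters.

```python
# Back-of-the-envelope arithmetic on how per-hop delay accumulates in a
# workflow; the per-hop numbers are assumptions chosen for illustration.
TRANSIT_MS_PER_HOP = 2.0   # assumed network transit per component hop
PROXY_MS_PER_HOP = 1.0     # assumed added sidecar/proxy handling per hop

def workflow_latency_ms(hops, with_mesh):
    per_hop = TRANSIT_MS_PER_HOP + (PROXY_MS_PER_HOP if with_mesh else 0.0)
    return hops * per_hop

for hops in (3, 10, 30):
    print(f"{hops:2d} hops: {workflow_latency_ms(hops, False):4.0f} ms direct, "
          f"{workflow_latency_ms(hops, True):4.0f} ms with sidecars")
```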

Let’s look at 5G, and in particular Open RAN. We have, in the specification, five or six functional components, depending on whether you count the Non-Real-Time RIC as part of Open RAN or part of the orchestration function that’s separate from it. What that means is that compliant Open RAN implementations have five or six services/microservices (whatever you choose to call them). If those components were themselves decomposed into smaller microservices, you couldn’t expose the resulting pieces without becoming non-compliant. So, it would be my view that Open RAN, at least, doesn’t require that microservices be used.

Now look at the APIs. I contend that the message flows defined (shown in diagrams with nice descriptive names starting with “Q” and “E” and “F”) are explicitly steered by the specification. You could easily implement this using an API proxy, which could handle auto-scaling and resilience.

My point is that if all you’re doing in 5G is implementing the Open RAN or 3GPP specifications, there simply isn’t enough component complexity to require either microservices or service meshes. Does this mean that meshes and telecom aren’t perfect together? Not yet.

There’s 5G, and then there’s 5G as an ecosystem. By the former, I mean the communications and connectivity features mandated by the standards. By the latter, I mean the system of applications, features, and tools that create application value, manage the infrastructure, and manage the services. If you look back at an Open RAN reference with fresh eyes, you’ll notice that the RAN Intelligent Controller (RIC) is divided into six or so elements of an “application layer”. Those might be microservices, right? Then there are the applications that have been used to justify 5G, like (hypothetically) IoT. They may also benefit from cloud-native, microservice-based, design.

Or not. We still haven’t accepted a basic truth, which is that it’s not only possible but easy to decompose applications too far, to create so many coupled components that the network delay created is insurmountable, even without any additional processing or handling at the mesh or proxy level. It’s been a practice in software for decades to write modular code. A dozen modules combined into a single load loses the granular scalability of the same dozen distributed in the cloud, but gains efficiency. Before we decide that service meshes are essential in telecom, and in particular in 5G, we have to decide whether we’re “meshing” to accommodate a decision to decompose more than we should have.

We may be victims of our organizations here. Development and operations have been traditionally separate, to the point where practices to unify them to deploy applications efficiently turned into an industry with its own name—DevOps. I’ve talked with a lot of enterprise operations people who are moaning about the lack of operations-efficiency awareness among developers. They see over-componentization as a contributor to degradation of QoE, and in many cases an increase in operations cost and complexity too. We could see the same thing in the telco world if we rush toward componentization with minimal benefit.

Not every new technology is universally valuable, and some aren’t even particularly useful. When the value of service meshes is linked to the value of cloud-native, and when we’re trying to apply cloud-native principles to an infrastructure model like 5G that was designed around virtual boxes rather than real virtual functions, we’re constraining the benefits by limiting the implementation.

However, and it’s a big “however”, we have to accept that there are indeed higher-layer services that depend on microservices, cloud-native, and service mesh. We may not want to make them universal in telecom, but we probably have to prepare to adopt them where they make sense. That means that, like cloud computing, cloud-native and service mesh may coexist with other more traditional development models for years, and maybe decades.

The Hypothetical Edge is the Big Driver of the Real Cloud

If it happens at all, edge computing just might become more important than cloud. Not so much because there would be a ton of stuff hosted there, because the majority of application code won’t be. Not because of explosive growth in things like self-driving cars, because they won’t contribute much to the edge for years. Rather, because there could be just enough stuff to change the cost dynamic in favor of broader adoption of cloud computing. A little edge could pull through a lot of cloud.

There’s a common view that cloud computing has an inherent cost advantage over data center computing because of “economy of scale”. That’s not true in most cases. The inherent economy of a resource pool generally tracks an Erlang curve, meaning that as the size of the pool grows you reach a point where additional scale no longer improves efficiency. In other words, there’s a point where data center size is such that when you factor in cloud provider profit margins, the cloud isn’t cheaper.
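
For anyone who’d rather see the curve than take my word for it, here’s a small Python sketch based on the classic Erlang-B formula. The 1% blocking target is an arbitrary assumption, but the way per-server efficiency flattens as the pool grows is exactly the point.

```python
# A quick Erlang-B sketch of pool efficiency. The 1% blocking target is an
# arbitrary assumption; the point is how the per-server gain flattens out.
def erlang_b(offered_load, servers):
    """Blocking probability for an M/M/N/N system (classic Erlang B recursion)."""
    b = 1.0
    for m in range(1, servers + 1):
        b = (offered_load * b) / (m + offered_load * b)
    return b

def load_per_server(servers, target_blocking=0.01):
    """Largest offered load per server that keeps blocking under the target."""
    lo, hi = 0.0, 2.0 * servers
    for _ in range(60):  # bisection; erlang_b is monotonic in offered load
        mid = (lo + hi) / 2
        if erlang_b(mid, servers) <= target_blocking:
            lo = mid
        else:
            hi = mid
    return lo / servers

for n in (5, 20, 100, 500, 2000):
    print(f"{n:5d} servers: ~{load_per_server(n):.2f} offered erlangs per server")
```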

My own modeling has been pretty consistent in saying that about 23% of applications would be cheaper to run in the cloud at prevailing cloud pricing. About 70% of those are already there, and today most public cloud growth comes not from “moving” to the cloud, but from transforming application front-ends, meaning GUI and related elements, to take advantage of the cloud’s elasticity under load and its resilience.

Suppose we wanted to increase the use of cloud computing, which certainly the cloud providers do, and likely other symbiotic groups as well. One way to do that would be to promote those front-end benefits by creating an application framework that could deliver more value for the same dollars. Another would be to somehow change the Erlang outcome. Edge computing could do both.

Let’s do the Erlang stuff first, because it’s the easiest to talk about. Here’s a simple principle to consider: There’s no such thing as a centralized edge. Resource pools are “pools” only if they’re built in such a way as to essentially eliminate differences in cost and performance among the members of the pool. When you spread out a set of resources, you lose the ability to leverage facility and operations efficiencies, you can no longer grab any resource and get the same result because of latency, and you can’t match application needs to remaining server capacity as easily.

Know what kind of applications got moved to the cloud the fastest? Applications that ran on dedicated, distributed, servers rather than in the data center. Server consolidation, in other words. Edge computing is a model for server “un-consolidation”, for explicit distribution of compute resources necessitated by the value of having them proximate to the data they’re processing.

There are a lot of reasons to move processing to the edge, including a desire to reduce latency. IoT and virtual/augmented reality applications need edge computing of some sort, and while it would be possible (and in some cases even advisable) to self-host these edge applications, there could be a considerable economy of scale gained by pushing the processing back a bit so that resources for a whole geography of users could be combined into a pool.

The reason for that is that the Erlang curve is steepest at the start, meaning that a small increase in the size of a resource pool yields a big benefit in cost efficiency. An important edge application might need server redundancy if you self-host, and some form of local technical support. Both those can be provided automatically out of a shared-edge-hosting service. However, there are some critical considerations in just where that shared hosting is located.

The limiting factor, even financially, in sharing edge computing is latency, which translates to the size of the geography from which a given shared-edge complex draws its users. What you’d like is to draw users from an area as large as latency permits, so that your edge complex is high on the Erlang curve and can replace user-hosted edge resources by offering a good price, but still earn a good profit. As the prospective service area gets bigger, not only does the latency increase, but the incremental benefit in economy of scale shrinks because of the Erlang relationship. In short, beyond a certain area you don’t get a meaningfully more efficient edge resource, and you offer the user less.

This doesn’t address the question of whether there are credible edge applications in the first place, of course. Most of the discussion about edge computing has centered on IoT, and in fact on a combination of 5G and IoT. Every technology element you add to a mission adds a cost that demands a compensatory benefit. Not every company has an IoT application. Not every IoT application demands general-purpose edge computing. 5G for IoT needs its own justification. We need to expand our thinking here.

One possible expansion is to think of the “edge” as less a place that data center stuff moves forward into than a place that user stuff moves backward into. We can see an example today, in the expanding interest in Chromebooks. A Chromebook is a lightweight edge element that relies on cloud/web processes for the heavy lifting. It’s almost a programmable dumb terminal, so it cedes what’s usually done on a PC to a cloud application. Could some of the things that a Chromebook does work in partnership with the edge rather than the cloud?

Another question is whether some of the “cloud” stuff needs to be distributed further out. I’m sure you’ve noticed that cloud providers have been working on their own edge strategies, some of which involve partnerships with network operators to get access to suitable edge hosting sites, and some of which involve architectural extension of cloud middleware (web services) to the premises, so that users who adopt edge hosting aren’t abandoning the cloud to do it. Surely this means that cloud providers see a risk that some cloud applications might migrate to the user edge, which means they think there are already candidates.

Probably the biggest potential application for the edge, the thing that could be most decisive in shifting more emphasis to the cloud overall, is virtual/augmented reality applications aimed at worker productivity. These offer the potential to launch a new wave of IT spending, the fourth such cycle since the dawn of business computing.

I’ve attempted to model this, so far exclusively for the US market where I have a lot of previous modeling work I can draw on. If we assume that a full IT spending cycle takes 10 years, which is roughly consistent with past cycles, then the total potential business spending impact of IoT and VR/AR on productivity over the full cycle would be over $900 billion. The portion of that which could be assigned to cloud/edge services would be about $250 billion, and the annual spending for the peak period of 2025-2028 would be over $34 billion. I estimate that almost 80% of this new spending would be edge spending.

This would be a pretty decent pot for cloud providers to chase on its own, but that’s not the end of it. The new productivity cycle would be drawing on current applications, some parts of which (the front-ends) are already in or moving to the cloud, and the remainder still in the data center. If we were to add in this new-cycle productivity application set, it would have the effect of shifting some current applications “edge-ward”, moving more of the current cloud components to the edge and more of the data center components to the cloud. It’s really difficult to model this process, but based on what I have so far, I would expect that the net effect would be to increase cloud/edge spending by another $79 billion in each of those peak years. That would mean a new cloud revenue opportunity of $113 billion per year, which would be enough to transform the public cloud leader board if it were distributed differently than today’s cloud spending, which it likely would be.

All of this depends on realizing what I’ve called “point-of-activity empowerment” in prior blogs. In short (you can do a search on the blog page to find other references to the term), this combines VR/AR technology with IoT and “contextual analysis” of user mission and location to provide workers with assistance wherever they happen to be, always in the context of what they’re trying to do.

That’s the reason why edge impact on cloud spending has to be qualified by the “if it happens” I opened with. There’s an enormous build-it-and-they-will-come bias in tech, a notion that resource availability fills pools with applications like petri dishes grow bacteria. It takes more than that; applications that either improve productivity for businesses or quality of life for consumers are the basic engine of technology change, and so it will be with the edge. Who drives all of this? My bet is on the cloud providers because they simply have the most to gain.

Do Spectrum Prices Threaten Telcos?

Is the high price of 5G spectrum hurting telcos? AT&T has been under particular pressure from the financial industry on the point, but claims to be confident in their finances. There are a lot of moving parts involved in deciding whether spectrum costs are a risk, and the answer may be critical to the long-term financial health of the telcos involved. Thus, we need to look at the picture closely.

One good place to start is a Fortune article that offers a look at the record-breaking US spectrum auction and does a decent job of ranking winners and losers. It rates T-Mobile and Verizon as winners because the former didn’t have to buy as much (it had a lot of Sprint spectrum available) and the latter really needed to get a lot of spectrum, and did. The big loser, according to the article, is AT&T.

The financial markets weren’t thrilled with the auction because it seemed to them that all (or all but T-Mobile) spent so much on spectrum it hurt their debt load and perhaps limited investment in future infrastructure. They particularly didn’t like AT&T’s situation post-auction because AT&T already had a high debt level, has just agreed to sell off part of its DirecTV business, and has also seen some executive changes. But can a simple financial-market view really tell the story?

It seems to me that the Fortune piece, by making a casual, often-cited, and erroneous point, actually starts us in the right direction. How many times do they talk about “speedy” or “fast” or even “superfast”? What they’re implying is that the 5G market will be competitive based on speed, and that all the telcos needed to get themselves up toward the front of the pack in that area. The question is whether that’s true, and if so, for whom in particular.

I’ve noted many times that the typical mobile device user would be unlikely to notice 5G speed differences (I see none in my services). The great majority of the mobile experience is streaming not downloading, and streaming speed is set by the content rather than by the connection. That point leaves us with just a few possibilities to explain the land-rush spectrum auction. First, operators went nuts. Second, operators think speed will be a competitive driver in mobile service even if nobody notices it. Third, there’s something more complicated going on. I’m doubtful on the first two, so we need to explore what complicated things might be at the root of this.

One possibility is wireline replacement. C-band spectrum, the stuff we’re talking about here, isn’t going to deliver the capacity of millimeter-wave 5G, but a single frequency has been shown (by Ericsson) to be able to deliver over 1.5Gbps to a user, and actual tests of services at this frequency range show it could deliver 100Mbps, the broadband sweet spot du jour. Because C-band has a nice combination of capacity and range, it could serve to replace wireline in rural areas where demand density is too low for fiber to the home to be profitable.

A second possibility, the one that gets cited the most, is IoT. Some love factory IoT. Some love connected or autonomous cars. Others love smart buildings, and some just love a good story. The problem with the IoT story is that it’s appealing on the surface, but not very deep. We have factory IoT today, as well as smart buildings, and we don’t use 5G for them. There’s no credible mission for 5G in an autonomous vehicle; things that need low latency and high bandwidth like collision avoidance are necessarily on-vehicle functions. Could an IoT explosion be a driver, though?

Possibility number three is applications of AR/VR, a topic I blogged on recently. This would include both gaming and future AR/VR “contextual services” and “point-of-activity empowerment” tools. Here we have some potential, but my “complicated” qualifier surely applies. There are a lot of elements needed to make any of these applications a big driver of 5G, and even then it’s not clear whether users would pay more for 5G benefits, or just take what they could get at the usual mobile-service pricing.

Some have also suggested that we might see 5G-ready laptops and tablets, whose larger screens and potentially greater computing utility might justify faster connections. I think there are in fact some applications for this sort of stuff, but I think they’re more likely to evolve to exploit 5G than to drive it, or its revenues, forward.

The revenue point is critical here, because operators are not only committing to a higher spectrum spend than ever before (Verizon spent more than most analysts thought the entire auction would bring), but to 5G infrastructure too. It’s not enough that 5G works, it has to generate ROI to cover a pretty substantial new investment.

Where does this leave the spectrum bidders and winners? I think T-Mobile has the most “classical” of the telco situations. They’re an up-and-comer and they think 5G will improve their mobile market share radically. They didn’t have to break the bank to get enough C-band spectrum to fill the gaps in what their Sprint spectrum covered, so we can say they’re fine.

Verizon is, in my view, the operator who’s a bit like a billionaire shopper. They spend a lot more than most, but they can afford it. Verizon has the most profitable territory of all the US telcos, so they can stick with the old principle that when you’re a telco you fear competition more than seek opportunity. They can’t leave a big hole in their 5G service set, and without C-band, that’s what they had. They’re OK too.

AT&T is a harder play. If they don’t have a very specific plan to draw new and credible revenue out of their overall 5G investment, then I think they took a big gamble with the bids. Maybe, like Verizon, they were working to shut out competition. Maybe they believe that they can be profligate with spending, again like Verizon. Maybe they intend to push forward on one or more of my “complication” themes above. The thing is, this isn’t a good time for them to be taking risks, particularly risking their dividend, which is holding their stock price out of the basement.

AT&T isn’t the big loser, though. For that, we may have to consider Dish or Comcast. Dish won only one license and Comcast (who jointly bid with Charter) didn’t win any. These bidders will either have to give up mid-band 5G aspirations, which may mean giving up any hope of significant 5G market share, or hope to make a big play in the later 3.5GHz auction. What happens if that auction also blows previous bidding records out of the water?

None of these are what I’m worried about, though, and I don’t think they’re the stuff telcos need to be worried so much about either. I think that there are pathways to redeeming the 5G spectrum investment, and I think that MVNO deals will serve for Comcast. Dish won’t get everything it wants spectrum-wise, so they’ll need to combine MVNO service and their own 5G. What all these people are facing is a potential opex crisis, and I’m not sure they see it coming.

Telcos have always been naive about operations. Their organizational makeups tend to split service (OSS/BSS) and network (NMS/NOC) operations, and virtualization shapes services and networks from a pure technology side. Virtualization, which is mandated by 5G, also creates more things to manage, more layers of complexity to dig through. AT&T took a stab at service lifecycle automation with ONAP, but they got started wrong and never fixed it. Nobody else has really got a clue either, and as most future 5G revenues depend on raising the feature level of services by adding additional elements above traditional networks, 5G is going to push these guys into the cloud whether they like it (or even know it) or not.

For AT&T, efficient operations is critical because of their low overall demand density. They can’t afford inefficiency, and they can’t afford service issues in 5G that could tarnish their brand. For the other telcos, the need might not be as immediately urgent, but they need to realize now (before it’s too late) that 5G operations is inherently more complex because of the hosting and software dimensions, and that additional services will take them only deeper. How much of the essential ROI that the high spectrum prices will demand might be eaten up by operations problems is hard to say, and nobody should be eager to find out.

My view is that spectrum pricing is hurting the telcos, whatever they say. There’s only so much you can spend on infrastructure, an amount that’s set by your internal rate of return, before you start eroding your own financial position. The worst damage may not be suffered by the telcos though, or even the bidders in general, but by network vendors. What you spend in one area, given a presumptive constant pool of available capital, you have to save somewhere else, and equipment is where the savings are most likely to come from. Pressure for lower technology cost will mount, favoring open-model networking.

The only ways out are first, that new service set, the value-add above the connection, and second, significant operations automation. If I were a network vendor, this is what I’d be looking at right now.

Why Networking Needs to Catch Software Operations Fever

Remember when we used to talk things over? I’m not talking about relationships, but about faults and problems. Have we, as an industry, gotten too focused on real-time analysis and forgotten “retrospection?” There’s an interesting article on this, as it relates to software operations, that we could think about applying to networking. Note that the article distinguishes between “retrospectives” and what it calls “debriefings” because software engineers use the former term for a type of formal design-team review.

I remember working on a big real-time project with a lead architect from a computer vendor. We had a failure and he immediately ran around trying to set the conditions up so that the problem could be observed. First, he couldn’t get the problem to happen because the conditions were wrong. Then the attempt to observe changed the timing. I got disgusted pretty quickly, so I started rooting around in the aftermath, and in about ten minutes of discussions and exploration, we found and fixed the problem.

Networking is really a perfect place to apply this sort of retrospection for fault and root-cause determination. In network problems, setting things up is often impossible; you rarely know what was happening, so setting it up again is a dream. Network problems are often timing problems, making them very hard to diagnose. I wondered whether enterprises were thinking this way, so I went back through my recent discussions and found some interesting truths.

I’d chatted with 44 enterprises on matters relating to their procedures for fault management and root cause analysis. Of that group, only 4 had volunteered that their procedures included an interview with the user. Only 9 had interviewed the network operations people who worked on the problem after the fact, to establish whether they believed that the root cause had been identified. Two enterprises, by the way, did both—five percent.

Any real-time system, networks included, needs to be assessed in context, which means finding out what the context was. When we have a PC application problem to assess, it’s common to ask the user “What were you trying to do?” Wouldn’t it make sense to capture that information on the networking side?

The argument could be made (and in fact has been, many times) that interviews don’t do any good because there’s too much going on in the network that causes problems elsewhere. The old “butterfly’s wings” argument. It sounds good, but of those 44 enterprises, 30 said that in most cases, a network problem was caused by a “predictable source”, which meant an action taken by someone or a condition someone knew about. Given that a total of 11 of the group had interviewed anyone, it’s clear that these predictable-source outages were determined to be predictable after considerable analysis.

I went back to a half-dozen enterprises after I’d read the article I referenced and asked them a few simple questions: what usually gets reported to you when there’s a network problem, what gets carried from that contact into a trouble ticket, and how carefully do you establish the exact time of the problem? I guess you know how that went.

Everyone agreed that trouble tickets were digests of the report itself. They also agreed that what was retained was a description of the problem itself, not what was going on before it. Finally, they agreed that their help desks recorded the time of the report, not the time of the incident, which meant that it would be very difficult to correlate the problem with activities taking place outside the awareness of the user who reported the problem.

Another interesting point was that only 5 of 44 enterprises said that they routinely analyzed problem reports in the context of general application and user activity. These five said they would review application logs to determine what was being run and by how many users, for example. Three said they would contact other users of an application if the person who reported the problem indicated they were running, or trying to run, it. Almost all of the 44 admitted they could probably do better in communicating with the IT operations team when analyzing a network problem; most said that happened only if the IT team referred the problem to them.

A failure to solicit user feedback, human impression, is particularly scary when you consider that the percentage of problems this group of 44 users said were directly caused by human error was an astounding 78%. It’s also scary when you consider that all of the 44 users said that retrospective interviewing “might help” accelerate lasting corrections.

Some organizations (about a third of my group of 44) treat problem analysis as a formal collaborative task, but among network professionals. The remainder may make contact ad hoc with network operations team personnel during fault analysis, but they don’t have any specific tools or procedures to support the conversations.

Formalism in the sense of specific tools to encourage collaborative problem analysis and resolution seems critical in another way. As the article notes, having a group hug isn’t going to make long-standing changes in your network. Discussions have to end in recommendations, and everything needs to be documented. One network operations lead I was chatting with spontaneously admitted that “I think sometimes we have to solve the same problem three or four times before it gets recognized and documented.”

That comment suggests that it’s not simply a matter of communications and collaboration to facilitate network problem resolution. You also need to record the result, and in fact you should view all the steps involved, from gathering information about the problem through taking steps to isolate it, and onward to solutions, as important enough to record and index. This might be a place where artificial intelligence and machine learning could come in, as long as we had proper records for them to operate on.

Almost every network user has a trouble-ticket system that’s designed to track problem resolution, but users report these systems are rarely useful as a reference in analyzing new problems. Do you index tickets by symptoms? A lot of unrelated stuff then gets connected. How about by the area of the network involved? At the start of a fault analysis, you don’t know what area is involved.

There are enterprises who keep detailed network fault analysis records, but none of the enterprises I talked with said they were in that group, though as I noted above, many realized that they needed a system to record what they saw, tried, and what worked. One company said they had tried using a simple text document to track a problem, for the logical reason that the tool was readily available, but said that it quickly generated either long, meandering, texts that spanned multiple problems, or tiny isolated ones that couldn’t be linked easily to a new instance of a problem, and thus were never referenced.

A couple of my contacts said they wondered whether software development tools or project management tools could serve here, but they hadn’t tried to use them. They really seem at a bit of a loss with regard to how to move forward, how to modernize fault management to make networks more available and responsive to business needs. It makes you wonder why things seem to have changed so much.

Two reasons, I think. First, companies are a lot more reliant on networking these days. Even before COVID there was a gradual increase in network-dependency, created by the increased use of online customer sales and support. COVID accelerated that, and introduced work-from-home (WFH), essentially remaking project team dynamics. Second, networks have changed. In the early days (which were really only 20 or 30 years ago), companies built networks from nodes and trunks. Today, they rely more and more on virtual networks, and these are becoming more powerful and more abstract with the advent of SD-WAN and the increased use of the cloud. Technology may have a five-year lifespan, but human practices tend to live a lot longer.

Fault management is a piece of network operations, which brings us back to the opening here, and to the article on retrospection in software operations. The software world, perhaps because they’re the force driving their own bus, has done a better job managing the growing complexity of virtualized resources than the network people have. It’s obviously time to catch up.

Making Virtual and Augmented Reality Real

How could we use augmented reality and what would that use require from the technology itself? AR is perhaps a key in unlocking a whole new way of empowering workers and engaging consumers, but like other new technology arrivals, there seems to be a complex ecosystem needed, and it’s not entirely clear how we’ll get to it.

Since the dawn of commercial computing in the 1950s, every major advance in the pace of IT spending has been linked to a paradigm shift that resulted in bringing information technology closer to workers. We started off with batch processing and punched cards, and we’ve moved through to online systems, portals, and online team collaboration. It seems pretty clear that the next step is to bring IT right into our lives, through the best possible gateway—our eyes.

The really advanced enterprise mover-and-shaker technology planners I talk with have been telling me that they’re excited about the potential of AR, but that they don’t really see a lot of hope in developing a framework to use it in the next two or three years. In fact, they couldn’t give me any real idea of when such a framework could be expected, in part because it isn’t clear what the parts of that framework would be. Since I hate to leave this issue hanging, I’m going to look at it here, starting at the top, or rather the front.

It’s all about eyes. Our visual sense is the sense most able to input a large amount of complex information quickly. AR is useful because it promises to be able to mix the real world with information that relates to the real world. Think of it as a productivity overlay on reality, a means of integrating visual data and linking it with the object in our visual field that’s referenced by the data. You look out over a crowd and AR shows you the names of people you should (but probably don’t) recognize, superimposed on them.

The challenge here is to get the augmented-reality world to line up with our view of the real one. There are two broad approaches. First, you could synthesize a real-world view from a camera, perhaps on the AR headset, and mix the virtual data with that. Since the camera is showing the real world in digestible form, it’s literally a mixing function. The second is to use the camera as before, but keep the camera view of the world behind the scenes. You use it for reference only, to “know” where the stuff you’re analyzing is positioned. The virtual data is then injected into the visual field, presumably through a translucent overlay on what the “AR glasses” see.

We have the technology to do the former perhaps a bit better than the latter, but the two approaches have a different set of issues when we dig into how the stuff has to work.

The first requirement of any visual system is that it doesn’t kill the wearer or make them ill. Anyone who has used immersive virtual reality knows that it can be literally dizzying, and the primary reason for that is that the visual experience tends to lag movement of the head and eyes. That same latency would impact the ability of the VR system to show us something moving quickly into our field, like a car on its way to a collision with us. An AR system that’s based on augmenting the real-world view rather than creating it still may have latency issues, but all that’s impacted is the digital overlay, not the real-world view.

The complicating issue in this visual-lag problem is the challenge of processing the image in real time, providing the synthesis, and then sending things to the glasses. It’s unrealistic to think that the current state of technology would allow us to create a headset that could do this sort of thing in real time, and still not cost as much as a compact car. Even modern smartphones would find it difficult. If we offload the function to an edge computer, we need really low latency in the network connection. Our hypothetical attacking car might be moving at 88 feet per second. If our image took 5ms to get to the edge, 100ms to be processed, 5ms to get back, and 1 ms to be displayed, we’ve accumulated 111ms, and our attacking car has moved just short of ten feet in that amount of time.

Turning your head could create the dizziness issue when you manage to avoid the attacking car. People turn their heads at various rates, but a fairly quick movement would turn at a rate of about 300 degrees per second (not to suggest you could actually turn your head 300 degrees unless you’re an owl; this is just a rate of movement). A rotation of even 30 degrees is enough to create a disturbing visual experience if the system doesn’t keep up, and that would take about the same time as our round-trip delay, meaning that your visual field would seem to lag your movement in even a minimal shift. It doesn’t work.
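
If you want to check my arithmetic, here’s the whole latency budget, the car’s travel, and the head-turn timing in a few lines of Python; every figure is the assumption I quoted above, not a measurement.

```python
# Checking the numbers from the last two paragraphs; every figure is the
# assumption quoted above, not a measurement.
uplink_ms, processing_ms, downlink_ms, display_ms = 5, 100, 5, 1
budget_ms = uplink_ms + processing_ms + downlink_ms + display_ms
print("Round-trip budget:", budget_ms, "ms")                     # 111 ms

car_speed_fps = 88   # feet per second, i.e. about 60 mph
print("Car travel in that time:",
      round(car_speed_fps * budget_ms / 1000, 1), "feet")        # ~9.8 feet

head_rate_dps = 300  # assumed quick head turn, degrees per second
turn_deg = 30
print("30-degree head turn takes:",
      round(turn_deg / head_rate_dps * 1000), "ms")              # 100 ms
```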

OK, are there any alternatives? Sure; we could carry a special device somewhere on our person that would do the processing locally. Is there anything we could do to improve response times other than that?

Maybe. Suppose that we have a 3D computer model of the street our hypothetical user is walking along. If we can know the person’s head position accurately, we can tell from our model what the person would see, and where it would be in the field of view. That means we could lay a data element on our virtual glass panel, while letting the real world view pass through. We could pass the model to the user when they turn onto the street, and it would be good for at least a reasonable amount of time.
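
To show what “using the model instead of the camera” might look like in practice, here’s a minimal Python sketch that projects a point of interest from a hypothetical 3D street model onto the overlay, given a head position and yaw. The pinhole projection, the coordinate convention, and the numbers are all my own simplifications.

```python
# A minimal pinhole-projection sketch: given a head position and yaw, figure
# out where a labeled point from a (hypothetical) 3D street model should land
# on the glasses' overlay. Coordinate convention and all numbers are invented.
import numpy as np

def label_position(point_world, head_pos, head_yaw_deg,
                   focal_px=800, screen_w=1280, screen_h=720):
    """Return overlay pixel coordinates for a world-space point, or None."""
    yaw = np.radians(head_yaw_deg)
    c, s = np.cos(yaw), np.sin(yaw)
    # World-to-head rotation: the inverse of the head's yaw about the z (up) axis.
    rot = np.array([[ c,   s,   0.0],
                    [-s,   c,   0.0],
                    [0.0, 0.0,  1.0]])
    x_fwd, y_left, z_up = rot @ (np.asarray(point_world, float) -
                                 np.asarray(head_pos, float))
    if x_fwd <= 0:
        return None  # behind the wearer; nothing to draw
    u = screen_w / 2 - focal_px * (y_left / x_fwd)  # horizontal pixel
    v = screen_h / 2 - focal_px * (z_up / x_fwd)    # vertical pixel
    return round(u), round(v)

# A store sign 20m ahead, 3m to the left, 4m up; wearer's eyes at 1.7m height.
print(label_position([20.0, 3.0, 4.0], [0.0, 0.0, 1.7], head_yaw_deg=0))
```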

There are many who would argue that you can’t set up 3D models of the world, and that’s probably true, but it’s also true that you wouldn’t have to. First, for worker empowerment, you’d likely need a combination of city models for sales call support and facility models for worker activity support. Companies could do these themselves with available laser-scanning and mapping technology. Second, the majority of the opportunity to serve consumers with data overlays on reality would come from retail areas, because retailers’ interest in using the concept to market more effectively would be the biggest source of revenue for providers.

There could be a lot of money in this. My modeling says that a combination of AR technology and the contextual IoT information used to support it could improve productivity enough to increase business IT spending over a ten-year cycle of investment by an average of about 12% per year. It’s proving difficult to model the consumer opportunity, but I’d estimate it to be several hundred billion dollars. The problem is that to get the job done we’d need a lot of moving parts addressed, and there’s no concerted effort to get the job done. We’ll need to do that eventually, if we want to see AR become as pervasive and valuable as it can be.

Would an Architecture to Promote Vertical Services Help 5G?

Just what is needed to broaden the potential revenue base for 5G? One idea that’s been gaining in credibility is a vertical-market focus. Both Google and IBM have taken vertical steps recently, and now we have an interesting article on the architecture of a vertical exploitation of 5G. You all know I’m a big, unashamed, fan of architectures and of vertical-oriented service strategies, so we’ll take a look at this one in light of both industry trends and industry needs.

The basic premise of the article is one I can surely agree with. Progress in network, cloud, and general IT these days involves adding new features that often end up adding new components from new vendors. There’s an expanding, even exploding, problem of integration that the buyer is stuck with. Not only that, if the application area being considered is new, as 5G surely is, there’s a major risk that all the pieces of the ecosystem needed to make progress at the application level won’t be in place at all.

The solution the piece proposes is an “Industry Digital Services Platform” (which I’ll abbreviate hereafter as IDSP to save me typing and you reading). The IDSP consists of a generalized gateway layer that feeds a three-layer stack made up of vertical solution players, cloud providers, and network operators. All the layers are linked by APIs, and if the conventions of the architecture and the APIs are followed, specific vertical solutions would link into the architecture and work with little or no integration on the part of the user/buyer.
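
To make that layering a little more tangible, here’s a hypothetical Python sketch of how the stack might compose. The class names, methods, and slice parameters are my own invention, not anything the article specifies.

```python
# One way to picture the API layering the article describes. The class names,
# methods, and slice parameters are my own invention, not the article's spec.
from dataclasses import dataclass

@dataclass
class SliceRequest:
    latency_ms: int
    bandwidth_mbps: int

class NetworkOperatorLayer:
    def provision_slice(self, req: SliceRequest) -> str:
        # The operator exposes connectivity with stated properties.
        return f"slice(latency<={req.latency_ms}ms, bw={req.bandwidth_mbps}Mbps)"

class CloudProviderLayer:
    def __init__(self, network: NetworkOperatorLayer):
        self.network = network
    def deploy(self, workload: str, req: SliceRequest) -> str:
        # The cloud layer hosts the workload and binds it to the network layer.
        return f"{workload} on edge hosting, attached to {self.network.provision_slice(req)}"

class VerticalSolutionLayer:
    def __init__(self, cloud: CloudProviderLayer):
        self.cloud = cloud
    def launch(self, use_case: str) -> str:
        # A vertical application states its needs; lower layers realize them.
        return self.cloud.deploy(use_case, SliceRequest(latency_ms=20, bandwidth_mbps=100))

# The gateway layer would be the single point a buyer integrates with.
gateway = VerticalSolutionLayer(CloudProviderLayer(NetworkOperatorLayer()))
print(gateway.launch("factory-quality-inspection"))
```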

There’s a lot to be said for this approach. If applications that (in this case) were prospective consumers/drivers of 5G services could be fit into a composable framework that could then be used to support third-party providers and end users alike, the sum of the application software would surely be greater than the parts. Thus, the question is whether such a framework could be developed.

One obvious barrier to the development is the time required to do the job. You could argue that public cloud providers would have seen an opportunity for this sort of thing almost from the first. While there have been some vertical-market announcements by public cloud providers, none of them have really proposed to create any such framework. Even if they did, you’d surely end up with one framework for each provider, and that would likely mean that actual providers of vertical-specific components would have to integrate separately with each.

A formal standard might be the solution to this problem, but that creates problems of its own. There are standards groups that are associated with the structure the article describes, but each layer has at least one such group, and for vertical applications there’s arguably one for each vertical. It’s possible that a single group could take up the problem, but just what group that might be isn’t clear. Then there’s the question of time; standards processes are something I’ve been involved with for decades, and they tend to work at the pace of glaciation or the formation of stalactites. I doubt that the market would wait for the process, and certainly there will be a lot of embarrassed 5G proponents long before any hope of progress could arise.

The next question is how the “hybrid cloud” fits into the picture. Yes, it’s true that we’re talking about a way of creating driver vertical applications for 5G, but can we expect enterprise customers for these services to move totally to the cloud? Despite market hype to the contrary, I see no indication that this kind of migration is pending, or even contemplated. My own financial modeling suggests that it couldn’t be cost-justified. So, if there are data center pieces, how are they integrated? Are we then involving yet another set of players who have to cooperate in our architecture? And let’s face it, we don’t even have a standard way of doing hybrid cloud in general.

A final question is whether “vertical” in the sense of the article and the specific chart of the architecture is the best approach. The fact is that while industries all have their own core software set, built to support their specific mission(s), there are a large number of “horizontal” functions that could be used to compose at least a big chunk of the verticals. Someone, in a LinkedIn comment on the piece, made that point. Should we focus on creating components, horizontal components, from which vertical support could be composed, supplemented by specialized components for the vertical?

From these points, you might think I don’t like the idea, but that’s not the case. I totally agree that vertical market applications are the key to driving 5G-specific revenue opportunities. What I’m less certain of is whether we have a path to achieve them, or whether the model the article presents provides one. It lays out a good strategy, but there’s a lot of heavy lifting needed to realize it. I guess the big question is “What does that lifting?” I believe there are two options that could work: a cloud provider, or open-source.

The key layer in the architecture the article depicts is the public cloud layer. The cloud provider has engagement with developers already, and it also has horizontal tools that could be put into play. Hybrid clouds are the key to their success, so they can be relied upon to come up with a workable hybrid strategy. These cloud providers are also already engaging (or trying to) with operators on 5G hosting (edge hosting).

If a cloud provider were to jump on this notion, they could address all the issues I’ve raised, save one. Their solution would be proprietary, and while other providers would surely make similar vertical moves, the software providers and users would have to either pick a single partner or customize for each public cloud platform.

The open-source approach has engagement at the vertical layer. We already have some vertical-market software available in open-source form, and we have a significant amount of horizontal software, including all the platform tools needed. A foundation like the Linux Foundation could certainly host something like this, and with proper design and development care, an open-source package could be portable to any cloud.

The downside of this approach is sponsorship of the open-source project, and the time involved in getting it to progress. Development times are notoriously difficult to estimate correctly; everyone tends to be too optimistic.

In my early career, I had a manager who became known for his ability to make uncannily accurate estimates of project times. I was working at a computer vendor, and attending a meeting with my manager and the systems programming team, who offered their estimate of the project. My manager shook his head and said they were about a third of the right number, which turned out to be correct. When the VP asked my boss how he estimated so well, he said “I use the Harvey Principle.” The VP was quick to ask what that was, and my boss said “I used to have an employee named Harvey, who was the dumbest SOB I ever worked with. When I estimate a project, I ask myself ‘How long would it take for Harvey to do it?’” Enough said; complications always arise and there are plenty of Harveys in the world.

So what’s the answer? Do we have to give up on the architecture? No, but we have to accept that there’s no perfect way to achieve it at this point; certainly no way that would advance 5G on the schedule that operators and network vendors hope. That doesn’t mean we should throw in the towel, only that we’re not going to get instant gratification. I’d say minimum 18 months, and that’s a lesson in itself. If we need new services and applications to drive a network technology, we need to work on them when we’re starting work on the technology they’re expected to support. Otherwise we’ll leave that new technology waiting at the altar.

Should Managed Network Services Take a Cloud Lesson?

Kubernetes has been almost as revolutionary as containers, and almost synonymous with that concept too. As great as Kubernetes is, though, it’s often seen as a complexity black hole, and its complexity is surely both a barrier to its adoption and a barrier to cloud-native development. Google doesn’t like that, largely because cloud-native support is perhaps the biggest differentiator for Google’s cloud services. Well, Google is now doing something about Kubernetes complexity with GKE Autopilot. You can read about it HERE and HERE. I don’t want to cover Autopilot as a specific container technology; others will surely do that. I want to look at the market implications instead, and in particular what it means for networking, but that still requires a bit of Kubernetes and Autopilot introduction.

The most important thing to know about GKE Autopilot is that it is GKE Autopilot and not “Kubernetes Autopilot”. It’s not intended to be a generalized operationalization layer to simplify Kubernetes, but rather an added feature to the Google Kubernetes Engine feature of Google Cloud. GKE is a managed Kubernetes service, and what Google is dealing with in Autopilot is the fact that even managed Kubernetes may be too much Kubernetes for some users.

Any tool that’s designed to automate deployment and redeployment is replacing a manual process that’s complicated enough to be problematic. Otherwise, it wouldn’t be needed. The problem is that automated process setup can itself be complicated, particularly when the thing being automated is shooting at a moving, evolving, target. That’s what GKE, and the new Autopilot, is all about.

In a very real sense, both GKE and Autopilot are part of a broad trend I’ve talked about before. Both networking and IT are muddled by a constant wave of innovation. Yes, it’s added many valuable features, and tools have been improved based on the experience of real users. In many cases, startups have been responsible for creating the newest and best stuff, and this has created a challenge of complexity in simply integrating it all.

Functionally, Autopilot extends the GKE concept of a managed Kubernetes control plane to nodes and pods. With vanilla GKE, a user gets the managed control plane but still defines their own cluster configuration based on their needs. The GKE SLA doesn’t extend to nodes and pods. Autopilot lets a user define their cluster in “Autopilot mode”, where all the best practices for security, reliability, and availability are baked in by Google’s Site Reliability Engineering (SRE) process. Autopilot makes GKE more of a true managed service.

Which, of course, is what users want, apparently, and that’s my segue into the market implications. I know I’ve noted this user quote in a past blog, but it’s one of my favorites because it’s evocative. “I don’t want decision support,” a bank CIO told me, “I want to be told what to do!” Tech is a strange land to be a stranger in, according to almost every decision-maker I’ve talked with. Another quote from the same industry shows one good reason to feel that way. “You’ve gotta understand that if I screw this up, there’s fifty banks I could never work for!” A little exposure goes a long way, and I think Google recognizes that.

The broader market is likely doing the same thing, and that has major implications for the networking space. I think that the same forces driving Google to launch Autopilot will drive operators to launch a better set of managed services. We know that Autopilot is likely a strong step toward the right answer for IT services, but does it go all the way? How about network services?

Managed services have been around for a very long time, but most managed services are created by adding a management professional service to a traditional network service. The managed service provider (MSP) may integrate or develop special software to reduce their own human cost and improve service features, but the nature of the service is still pretty much what it was.

When SD-WAN came along, managed services got an additional kicker, because most SD-WAN deployments involved sites where there were no skilled technical people at all, much less network professionals. In many cases the sites were rural, or scattered across multiple countries, some of which were third-world locations. SD-WAN alone roughly doubled the number of MSPs.

It also introduced a new approach to the whole managed service story, one that added additional features to the basic connectivity offered by SD-WAN. This feature expansion came about because buyers had other virtual-network missions they wanted to address, like connecting cloud computing applications; in the cloud, extending the company VPN isn’t possible. Then, of course, there was the usual competitive-dynamic issue; SD-WAN vendors wanted to add features to attract customers, shorten the sales cycle, and potentially build margins.

One of the key elements in a managed service is the empowerment of the SLA process, which depends on gathering operating statistics. MSPs have to guarantee something, and both users and MSPs need some way of monitoring what’s being guaranteed. Statistics on operation are also critical in any automation of SD-WAN management.
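
As a simple illustration of that telemetry-to-SLA link, here’s a toy Python sketch; the thresholds and the shape of the data are assumptions for illustration only.

```python
# A toy sketch of the telemetry-to-SLA link: an edge agent gathers per-sample
# latency and loss, and an MSP-side check compares a window to the contracted
# targets. Thresholds and data shapes are illustrative assumptions only.
from statistics import mean

SLA = {"max_avg_latency_ms": 50, "max_loss_pct": 0.5}

def evaluate_window(samples):
    """samples: list of (latency_ms, lost) tuples gathered at the user edge."""
    latencies = [lat for lat, lost in samples if not lost]
    loss_pct = 100.0 * sum(1 for _, lost in samples if lost) / len(samples)
    avg_latency = mean(latencies) if latencies else float("inf")
    breaches = []
    if avg_latency > SLA["max_avg_latency_ms"]:
        breaches.append(f"average latency {avg_latency:.1f}ms over target")
    if loss_pct > SLA["max_loss_pct"]:
        breaches.append(f"loss {loss_pct:.2f}% over target")
    return breaches or ["within SLA"]

window = [(42, False), (48, False), (55, False), (61, True), (47, False)]
print(evaluate_window(window))
```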

The reason why SD-WAN has become a kind of focus point in the whole MSP telemetry game is that, in order for SD-WAN to work, it has to have an agent element that’s sitting essentially on the point of user connection. This is usually a critical boundary in the SLA game, since MSPs will either set it as the limit of their SLA, or charge for taking responsibility deeper into the user network.

This might be a part of the reason why Juniper, who bought SD-WAN vendor 128 Technology earlier, has now announced it’s integrating 128 Technology’s data (which includes session-specific detail) with its Mist AI and Marvis virtual network assistant technologies. Enterprises could use the combination to enhance their own network management, but of course MSPs could also use the technology to buttress their own SLAs.

The big question is whether this sort of thing could move networks to parity with IT infrastructure in terms of managed services. If you look back at Google’s GKE and Autopilot combination, you see not only a strong SLA framework, but a divorcing of the user from a lot of explicit infrastructure responsibility or knowledge. An Autopilot-mode cluster is almost an intent model; it has external properties and an internal realization of those properties, but users don’t worry about the latter once they’ve defined the former. Could something like this be brought to networks?
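
Here’s a toy sketch, in the same spirit, of what an intent model for a network service might look like; the names and parameters are illustrative, not any vendor’s actual model.

```python
# A sketch of the intent-model idea applied to a network service: the buyer
# declares external properties; the realization stays hidden inside. Names
# and parameters are illustrative, not any vendor's actual model.
class NaaSIntent:
    def __init__(self, sites, max_latency_ms, availability_pct):
        # External properties: what the service commits to deliver.
        self.sites = sites
        self.max_latency_ms = max_latency_ms
        self.availability_pct = availability_pct
        self._realization = None  # internal detail, never exposed to the buyer

    def realize(self):
        # The provider decides how to meet the intent (SD-WAN overlay, MPLS,
        # cloud transit); the buyer only sees that the properties are met.
        self._realization = "sd-wan-overlay+cloud-backbone"
        return "connected: " + ", ".join(self.sites)

intent = NaaSIntent(["HQ", "plant-7", "aws-vpc-3"],
                    max_latency_ms=40, availability_pct=99.9)
print(intent.realize())
```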

My favorite concept, network-as-a-service or NaaS, would be one way. In a way, a session-aware edge element could be viewed either as a way to generate a connection “overlay” that could integrate with an MPLS VPN, or a way to request a connection, which would surely look a lot like NaaS. The difference in the two viewpoints would be that the second assumes that the session-aware edge was universally deployed, not just deployed where SD-WAN’s traditional small-site connectivity enhancement was appropriate.

That same question would apply to telemetry used in an SLA. MSPs selling “managed SD-WAN” and adding feature enhancements for differentiation would surely, at some point, be interested in selling “universal VPN” services, which would envelop both those traditional SD-WAN sites and MPLS VPN sites. The latter might then be targets for SD-WAN introduction to replace MPLS, or they might simply be targets for augmented features independent of the VPN connectivity.

128 Technology was taking a step in this direction, I think, with its L3 NID concept. Late last year, around the time when the deal with Juniper was announced, 128 Technology created a kind of split model of its SD-WAN edge, one that started with what’s essentially a telemetry generator, the L3 NID. You’d then add other features to it, including traditional SD-WAN VPN features. It’s not clear what Juniper intends to do with this approach, but it would appear to be a step toward providing universal telemetry for MSPs, and perhaps add-on session awareness that could be used to create NaaS-like connection awareness without changing the interior network behavior at all.

If users want hands-free containerized applications, they’d seem likely to want something similar on the network side. In fact, you could argue that one of the reasons why things like managed container/Kubernetes services work is that they virtualize the network piece. Is this a signal that we really do need a similar approach to managed services on the network side? Might cloud providers like Google start to think about providing it? Remember, anything that’s at the point of user connection to the network can add features to facilitate managed-service capability. Cloud edge included? Could be.

The Relationship Between SD-WAN and SASE: Complicated

Network technologies often seem to overlap, particularly where several somewhat-related ones end up going in the same physical place in the network. This may be a factor in something talked about in a recent SDxCentral piece, which speculates that SD-WAN is just a “Trojan Horse” for SASE, meaning Secure Access Service Edge. I think there is a relationship between the two, but it’s a lot more complicated. Maybe it’s even a case of parallel evolution.

Most everyone knows that SD-WAN is a technology that got its start because small sites, particularly in remote areas, were often unable to get on the corporate VPN because MPLS VPN connectivity wasn’t available to them, or was too expensive. SD-WAN creates a real or virtual overlay on the Internet that is then linked to the real corporate VPN, putting all the SD-WAN sites neatly on the VPN without MPLS.

SASE’s story is more complicated. There’s a good argument to be made (and the SDxCentral piece suggests it) that SASE was a product category seeking a problem to solve, until COVID came along. A good argument but not a great one, because there’s a kind of implicit First Law of Branch Networking, otherwise known as the Conservation of Boxes principle. It says that branch connectivity solutions will ultimately converge on the smallest number of devices needed to do the job, which is usually one device. Since branch offices have service termination, local network connections, and security, Conservation of Boxes says these have to be combined, hence the SASE. COVID was just a proof point.

That’s not where the complexity ends, either. SD-WAN is a demonstration of what might be called the First Corollary to that First Law, which says that a vendor will try to stuff all the useful functions possible into their box to maximize the business case. That is actually the driver behind Conservation of Boxes, because it creates the competitive drive toward simplification. So what we have here is a set of loose security and connectivity features in a branch, a new kid on the block in the form of SD-WAN, and the inevitable Conservation-of-Boxes drive toward optimality. SD-WAN can be expected to grow in features, and some of those features will be features that were already supported by other devices.

This has put larger network vendors in a tough position. On the one hand, they’ve made money by selling incremental box-based improvements to networking at the branch level. On the other, here’s this SD-WAN thing that threatens to puff out with features to become the god-box of branch networking, the SASE.

An integrated approach to connectivity is a general threat, except perhaps to firmly established incumbents. New issues offer vendors an opportunity to introduce themselves to buyers and inject their solutions into deals previously lost. Integrated strategies also favor incumbents because, as one enterprise CIO once told me, “If you’re putting all your eggs in one basket, make sure it’s a strong one!”

All this suggests that there is a relationship, one with both positive and negative elements, between SD-WAN and SASE. Does that make SD-WAN a Trojan Horse? If it does, how does that impact the risk/reward balance for vendors?

The Trojan Horse argument is based on two points. First, any new box or function will tend, as I’ve noted above, to grab onto associated features and functions to improve its own business case. That would mean that SD-WAN would naturally expand into other connection-point areas, and at some point would surely end up looking like SASE. Second, if SD-WAN actually brings any feature value to a related connection area, it naturally changes the market dynamic for that area. SD-WAN seems to have both these effects.

A good example is that SD-WAN implementations, because they separate traffic between the virtual VPN and the Internet, often provide some form of prioritization control. That feature may be an element in an edge router’s value proposition. The same could be said for encryption, and of course if an SD-WAN implementation has zero-trust security features (which a very few do), it could be perceived to compete with traditional barrier-based mechanisms like firewalls.
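To illustrate the prioritization point, here’s a hypothetical classification rule an SD-WAN edge might apply. The address prefixes and priority classes are assumptions for the example, not any vendor’s actual policy engine.

# Illustrative sketch of SD-WAN prioritization: separate virtual-VPN
# traffic from ordinary Internet traffic and assign a priority class.
# The prefixes and classes are hypothetical.

VPN_PREFIXES = ("10.", "172.16.", "192.168.")   # assumed corporate ranges

def classify(dest_ip: str, app: str) -> dict:
    on_vpn = dest_ip.startswith(VPN_PREFIXES)
    if on_vpn and app in ("voice", "video"):
        priority = "high"         # latency-sensitive corporate traffic
    elif on_vpn:
        priority = "medium"       # other corporate traffic
    else:
        priority = "best-effort"  # general Internet traffic
    return {"vpn": on_vpn, "priority": priority}

print(classify("10.4.7.20", "voice"))     # high priority, on the VPN
print(classify("142.250.64.78", "web"))   # best-effort Internet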

From this, it might seem that the argument that SD-WAN is a Trojan Horse (or perhaps a stalking horse) for SASE is valid. I don’t think it is, because I think the two have really been one thing all along. “Connection-point services”, the sum of everything a user would expect to see as a service feature provided at the network’s edge or demarcation point, are a single group of features. You shouldn’t really be taking some combination of features from that group and calling it a “product category”.

We all know that analyst firms can make more money by creating a new product category than by assigning new products to an existing category. Thus, it’s hardly surprising that we have categories springing up that are really simply rearrangements of others, or evolutions. I think that’s the case here, and we’re ignoring the big factor behind it all, which is virtual networking.

Virtual network technologies, including SD-WAN, are almost always technologies that overlay (in a real or logical way) the real network, and since they’re user services, the process starts with the user’s point of connection. Any virtual network element will therefore be, first, an on-ramp for the user and, second, a device that sits at the “real” network edge. If you define something that’s designed to be a “service edge”, you’re defining something that’s going to sit there too. Edge devices tend to be driven by a single mission set. Thus, parallel evolution.

There’s also perhaps an element of coincidence in all of this. SASE, after all, names a place as much as it implies a security function. If you, like me, believe we’re evolving toward that ultimate level of service personalization called “network-as-a-service” or NaaS, then what’s really happening is that SD-WAN is becoming NaaS, and SASE is just another name for the place the device that’s at the edge of this new virtual network service happens to sit.

So will vendors and analyst firms eventually get this? Not likely; reality rarely overcomes opportunism these days. Vendors are very concerned about protecting their security revenue stream, a revenue stream that would (if we had full NaaS in place) suddenly disappear. If every user/application is a NaaS consumer, there’s effectively zero-trust security in place. Analysts, instead of acknowledging what has been happening all along, will simply say that a “NaaS edge” is another product category. Business for all, then, continues as usual.

HPE Enters the Open RAN Battle

The battle for Open RAN may be taking its final and most useful form. HPE is entering the space, and with a suite of elements that promises to reduce the integration burden associated with Open RAN and other open-model networking technologies. Since Mavenir announced a partnership with Red Hat in the same general area, it’s pretty clear that we’re moving from point-element competition in the space to systemic solution competition. That’s critical for the success of Open RAN overall, but it also comes with some risks.

My talks with operators this year have illustrated three basic truths about their infrastructure budgets. First, with the exception of 5G in general and RAN in particular, no significant infrastructure spending is budgeted. Second, operators prefer an open solution by a margin approaching 3:1, and that’s particularly true of their views on Open RAN. Finally, operators are increasingly uninterested in (or even disenchanted with) integrating a couple dozen vendors into an open-model 5G network.

These three truths play out in a kind of “mindset sandbox” that’s evolved over a couple of years. Operators have generally accepted that hosted functionality is going to be critical going forward, not least in 5G, where it’s actually mandated. The question is how the hosting is done. One approach, the traditional one for 5G, is function hosting in a server-based resource pool. The other, which has been gaining significant traction, is the white-box hosting model.

What operators have always liked about white-box networks is that they’re a special case of “box networks”, which is what they’ve built all along. You can put white boxes everywhere from (in theory) a cell site to the core. They can be managed almost exactly like traditional boxes in each of these locations. You can probably evolve to white boxes more easily because you can do point substitutions for obsolete gear in your current network. Finally, white boxes separate the capital hardware and the functional software. Operators rightly believe that it’s software that will evolve, so they believe that having an open software model means they can repurpose hardware, making hardware (and capital) lock-in less risky.

What they don’t like about white boxes is that they see less economy of scale. Hosting functions in edge-and-deeper server farms is almost surely more capital-efficient. Not only that, those servers can also host things other than traditional network functions, which means they might help operators climb the service value chain (when and if they shake off their fear of “higher-layer” services).

The mixing of the three operator issues and our two sandbox frameworks appears to have shaken out three vendor approaches to 5G. The first is the “software-on-whatever” approach, taken by VMware and most recently the Mavenir/Red Hat alliance. This approach says that 5G is all about the software; anything consenting adults want to run it on is fine as long as it’s basically compatible hardware. The second is the “mission” approach, which says that what you want is a package that includes both hardware and software and focuses on a specific network mission. The final one is the “pre-integrated” approach, and that seems to be what HPE is after.

All of these, of course, are being touted by someone other than the mainstream network equipment vendors, whose embrace of open-model networking could fairly be described as halfhearted and a bit cynical. Unfortunately, all the vendors who do support an open-model approach are novices in selling network infrastructure to network operators. In particular, these vendors have all fallen into the IBM trap of focusing on sales engagement to drive their entire campaign, doing little or nothing in the way of marketing and positioning to grease the skids. That’s bad because it puts the sales organization in the position of doing all the legwork, spending time that isn’t generating sales and commissions. It’s no wonder that these vendors have all had issues with their sales so far. Even now, it seems pretty clear that the current vendor initiatives in the open-model network space are being driven by the hope that buyers are getting more determined, not that their own efforts are getting smarter or better.

Well, maybe HPE will be different. They do have some assets (or possible assets) in the space. In fact, they have four, and we’ll look briefly at each.

Asset number one is that they actually have a very complete telco solution. HPE has always supported NFV, always supported OSS/BSS integration, and they supported an open 5G core model even before their just-announced Open RAN strategy. That means they could provide all the relevant pieces of a 5G infrastructure, both hardware and software, integrate them, and take support responsibility for the result. That is a major asset according to operators.

The second asset is that HPE uses edge computing to address the small-footprint, far-edge deployments that white boxes natively target. To quote the HPE material I link to above, this is a “Carrier-grade ruggedized NEBS Level 3 and GR 3108-compliant platform — Designed for the far edge.” The same technology HPE promotes for edge computing and IoT is available for carrier edge applications in 5G. There is significant benefit in generalizing the 5G edge into an “IoT edge”, given that there’s a lot of scale to be had in the latter mission.

The third of the HPE assets is partner programs supporting development of higher-level applications. HPE has long had an NFV program to solicit and certify VNFs. They have similar capabilities in the IoT space, and so they know how to do partner programs and leverage them without losing sales or support control.

Finally, HPE has the mass to promise full support for 5G advanced features, including network slicing, both within the RAN and through their open-model 5G Core. Operators have cited a fear of being left behind as a major barrier to Open RAN, and they accept HPE’s long-standing telecom commitment (via NFV) and its sheer size as an assurance that HPE will keep up with the network equipment vendors.

All this is encouraging for HPE, but not decisive. Every public cloud provider is working to get in on the 5G opportunity, even before it’s really clear just what that opportunity is. Software-side players like Red Hat and VMware have jumped in too, and of course there are white-box-based initiatives for 5G being produced by individual vendors and integrated by others. HPE, in the face of all of this, has fairly dismal marketing/positioning. The only thing that’s saving them from another NFV-like debacle (NFV never delivered much for HPE) is that competitors really aren’t doing much/any better.

I asked about three dozen operator infrastructure planners to draw me a “complete” 5G open-model network. None of them drew exactly what AT&T proposes to sell, or reproduced a diagram that matches the HPE website material. Nearly all of them did one of two things—ceded 5G user plane entirely to legacy technology, excepting the UP elements that terminated 5G interfaces, or advocated some role for white-box elements. Nobody hosted everything, and if you want to sell an architecture your buyers don’t have any inherent appreciation of, you’d better be singing and dancing like a son-of-a-gun to get mindshare, and quickly.

The big opportunity for players like HPE is the operators’ desire to get a complete 5G solution from a single, responsible player. That “complete” qualifier ties to those diagrams I asked for, and if the buyers themselves can’t draw a picture of their goal, it follows that a vendor/integrator has to either draw a credible one for them or wait a long time for the opportunity to develop. I wonder whether there’s that long to wait. The 5G network giants are all moving to embrace something more open, and they’re surely able to provide something “complete” already. All the open-model integrator wannabes are going to be watching developments and trying their own drawing skills. Somebody might just get lucky.

HPE and other major players in the Open RAN space who believe they can capitalize on the integration interest of buyers should keep this point in mind. Open RAN is a piece of a network, not the whole network, and operators need to build the network and not pieces. Even when you add a new element, an almost-autonomous piece, to a network, you have to consider the whole and not just the part. That’s part of the marketing reality of network transformation…if you remember that you have to do marketing.