The Relationship Between 5G Success and OTT Exploitation

5G is surely a popular topic for media and bloggers, in no small part because it’s seen as the salvation of both service providers and the vendors who sell to them.  A recent article in Fierce Telecom raises some good points on just who might benefit from 5G, points that deserve a close examination based on my own input from users and enterprises.  The teaser for the article says that users will benefit from 5G, but the benefit for operators is less clear.  That’s true, but with a caveat.

Users expect 5G to make wireless connections faster, and that’s probably the only certain benefit of 5G to users.  Even here, though, you have to qualify the “certain” part.  5G certainly could deliver faster wireless connections than 4G, but one interesting truth about broadband is that the speed of the connection is rarely the limiting factor in the quality of user experience.  That quality is what users actually perceive, so the speed benefits of 5G may depend on just what users expect to do with it.

Users who talk to me about 5G are almost universally expecting it to improve the quality of video streaming, and in some areas it’s very likely that will happen.  The reason isn’t as much “speed” of connection as it is consistency of connection QoS, which 5G improves by increasing the number of users that can be supported on a cell.  In areas where local cell sites are likely to be overloaded at various times, 5G could relieve the congestion and improve the QoS for video delivery, which would be a powerful benefit to those users who want to view streaming video where no WiFi is available or where WiFi is itself overloaded by other video-hungry users.

Broadband speed itself is probably not as much an issue.  If a user can get video at 500 Mbps versus 50 Mbps, or even 5 Mbps, most probably couldn’t see any difference in the quality on a phone or tablet.  That raises what I think is the most important truth of the Fierce Telecom article: it’s almost surely OTT applications that will drive 5G, and deliver the win for 5G service providers.

There is a tendency, as the article says, for users to expand their usage to fill available bandwidth, but of course it’s not really the users doing the expansion but the OTT experience-delivery outfits.  There will be a certain amount of new streaming usage with mobile 5G because of improved QoS, but overall any big gains are likely to come from more complicated and newer delivery models and even experiences.

“Connected car” is an example; if you presumed that you could download videos at over 100 Mbps with 5G, then you could use 5G to a vehicle to drive an in-vehicle WiFi connection and let everyone in it stream to their heart’s content.  Similarly, 5G mobile to a dongle could support a group of laptops, tablets, or phones in a home or office, or in a hospitality location.  All of these could exploit 5G connectivity.

But probably not drive it, at least not drive it to quick ubiquitous adoption.  For that, we probably need things that 5G could do that standard 4G can’t, applications like virtual and augmented reality.  Gaming seems a likely driver, for example, because you need enough bandwidth to ensure the game experience is realistic and, for massive multiplayer games, to keep the users properly synchronized in both experience and collective behavior.  Virtual reality added to gaming only increases the potential need for higher bandwidth because it’s likely players would expect to be able to see a broader swath of their environment by turning their heads.  Games can easily become more immersive, more demanding.

AR is a topic I’ve blogged on before, and it represents the broadest set of potential 5G drivers out there.  I’ve suggested in past blogs that IoT and other contextual/location services should be visualized as a set of “information fields” that users intersect with as they move around, change goals and missions, etc.  These fields could then contribute to the visual field of users, creating a highly flexible and customizable experience.  Since AR of this sort would demand some close synchronization between “reality” in the sense of what the user actually sees, and the “augmentation” that’s being presented, this could well be a two-way exchange of information, more demanding and especially more latency sensitive.

IoT could in theory spawn a whole set of applications, but I’m still skeptical about the way the necessary sensor deployments could be made profitable.  There’s no question that if we had 5G-equipped sensors available for OTT access at little or no cost, we’d generate a lot of applications quickly.  There’s a big question of whether getting those sensors under those terms has any chance of happening.  OTTs, you recall, fight bitterly against having to pay for access to broadband users to deliver video content and other for-pay experiences.  Why would they accept paying for sensor access, and how would sensor deployment (and 5G service) costs be paid with no compensatory revenue?

What everyone would like to believe is that simply having 5G in place will spawn the same kind of competitive frenzy to exploit it that we had with broadband home Internet services a couple decades ago.  That would be true if all that was needed to make the 5G ecosystem work is the bandwidth, but that’s not the case.  5G needs applications, missions, that probably demand things like IoT sensor deployments that are themselves massive capital investments.  Even if we presumed that somehow we established a utility-like framework for deploying those sensors and then sustaining their operation, we’d still have to earn enough to generate a profit on them.

The question the article raises most clearly is related to that “utility-like framework”.  Who is more utility-like than a telco, a successor to a regulated monopoly or even an arm of a government?  If the telcos were to have a clear path to successful OTT service deployment relating to 5G, they could surely justify investing in it.  However, if the telcos had a clear path to that goal, they’d be IoT and OTT kingpins already.  Part of the problem is cultural, part is regulatory, but all of the problem is clearly hard to solve because we haven’t solved it yet.

5G will happen, for sure.  Whether what happens with it is little more than a radio-network upgrade to what we have today, or a massive shift in how we conceptualize and deploy mobile and even wireline infrastructure, depends on whether we can make 5G-specific services happen.  The jury is still out on that one.

The “Concurrency” and “Persistence” Dimensions to Cloud-Native

There are a lot of moving parts to cloud-native, some of which (like state control) I’ve already blogged on, and some of which have received considerable attention in the media overall.  If you look at the big picture, you find the strategies for achieving cloud-native behavior fall into two groups.  One is the software design side, with the stateless-microservice discussions, and the other is on the operations side with things like orchestration (Kubernetes).

An operator and an enterprise contact both asked me to talk a little about two other issues, ones that I’ve neglected but are still important.  It occurred to me that these two other issues really give us an opportunity to consider the totality of cloud-native.  They’re concurrency and persistence, and not only are they fundamental to cloud-native concepts, they might even unite operations and development.

“Concurrency” is the concept of running elements of an application in parallel instead of in sequence.  This has been an issue for decades because it’s often necessary to make an application “wait” while something external, like an I/O operation, is handled.  More recently, network-centric applications have used concurrency to make better use of processor/memory resources by handling things that aren’t really tightly connected in parallel.  The Java programming language illustrates a common approach to concurrency—threads.  A thread is a parallel process set that runs alongside something else, but threads are programmatic concurrency accommodations within a logic set.
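
To make the thread idea concrete, here’s a minimal Java sketch (the class and task names are mine, purely for illustration) showing two tasks running in parallel rather than in sequence:

```java
// Minimal illustration of thread-based concurrency in Java: two tasks run in
// parallel instead of in sequence, and the main flow waits for both to finish.
public class ThreadConcurrencyDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable fetchFromNetwork = () -> {
            // Stand-in for an I/O-bound task that would otherwise block the caller.
            System.out.println("fetching on " + Thread.currentThread().getName());
        };
        Runnable computeLocally = () -> {
            // Stand-in for CPU-bound work that can proceed while the I/O is pending.
            System.out.println("computing on " + Thread.currentThread().getName());
        };

        Thread ioThread = new Thread(fetchFromNetwork, "io-thread");
        Thread cpuThread = new Thread(computeLocally, "cpu-thread");

        ioThread.start();   // both threads now run concurrently
        cpuThread.start();

        ioThread.join();    // wait for both before continuing
        cpuThread.join();
        System.out.println("both tasks complete");
    }
}
```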

In the cloud, concurrency is a natural property of multi-component operation.  If an application consists of multiple, independently hosted, components, it’s almost certain that the components run asynchronously and concurrently.  In fact, if you have “components” that never run at the same time, I’d argue they shouldn’t be separate components at all.

If a separately hosted, cloud-native component truly runs concurrently, then besides the usual cloud-native state management and related issues, you have to consider what concurrently operating components mean for workflows and design.  Grid/parallel computing and some big data tools offer an example of this challenge.

Grid/parallel computing presumes that you can take massive computational tasks and divide them up among multiple systems for execution.  The key to making it work is to find computational tasks that are independent until their results are finally correlated for delivery.  In short, you have to be able to separate the overall task into a set of autonomous subtasks.  You then farm the latter out, and make something responsible for a final summarization of results.

Big data, particularly in the map-reduce-Hadoop model, uses a combination of clusters of data, distribution of queries, and “reduction” or combination of results.  This means that the queries are divided and executed in parallel, another example of concurrency.  In this case, the original data-hosting, the query-division, and the data-combinatory processes ensure that you actually get concurrent execution and assimilation of results.
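
As a rough illustration of that divide-process-combine pattern, here’s a toy Java example using a parallel stream.  The data and field names are invented, and a real Hadoop/map-reduce job would distribute the work across clusters of machines rather than across the cores of one, but the shape of the logic is the same:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy map/reduce-style query: the data is partitioned by the parallel stream,
// each partition is processed independently (the "map"), and the partial
// results are combined into one answer (the "reduce").
public class ParallelQueryDemo {
    record Sale(String region, double amount) {}

    public static void main(String[] args) {
        List<Sale> sales = List.of(
            new Sale("east", 120.0), new Sale("west", 75.5),
            new Sale("east", 30.0),  new Sale("north", 210.0));

        // Concurrent execution of the per-record work, then combination of results.
        Map<String, Double> totalByRegion = sales.parallelStream()
            .collect(Collectors.groupingBy(Sale::region,
                     Collectors.summingDouble(Sale::amount)));

        System.out.println(totalByRegion);
    }
}
```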

Concurrency of this type, meaning what’s essentially “concurrency within an application” is complicated.  In the cloud, we might have these issues, but we often have a different kind of concurrency, one where the parallelism of component execution is created by the need to service multiple events or transactions at the same time.  Event-based systems that process many unrelated events, or front-end cloud systems that process parallel transactions from many users, are examples of this kind of component concurrency, and it’s the easiest kind of concurrency to manage because the parallel tracks are independent.

The hardest kind of concurrency is related to it, though.  If we have an event-driven system whose events are created by elements that are cooperating to create a single behavior set, then those events occur asynchronously and can be processed to a degree in parallel, but they still have to be processed with the recognition that the system as a whole is trying to operate and all the parts and processes have to be coordinated.  This is the special problem created by lifecycle management of multi-element applications and services.

Suppose you spawn an event to signal that you’ve started to deploy an access connection in a given area as a part of spinning up a VPN.  That specific piece of the VPN is now transitioning toward operational, but not all of the VPN may be spun up.  If there are a dozen different geographies being served, a dozen access connections are needed.  We can’t release the VPN for service until they’re all operational.
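
A minimal sketch of that “all the pieces must be operational” rule might look like the following; the class, states, and method names are hypothetical, and they exist only to show the aggregation logic:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "don't release the VPN until every access connection is up" rule.
// A real implementation would live inside whatever lifecycle manager owns the
// service model; these names are illustrative only.
public class VpnReadinessTracker {
    enum PieceState { DEPLOYING, OPERATIONAL, FAILED }

    private final Map<String, PieceState> accessConnections = new ConcurrentHashMap<>();

    public VpnReadinessTracker(Iterable<String> geographies) {
        for (String geo : geographies) {
            accessConnections.put(geo, PieceState.DEPLOYING);
        }
    }

    // Called when an "access connection operational" event arrives for one geography.
    public void markOperational(String geography) {
        accessConnections.put(geography, PieceState.OPERATIONAL);
    }

    // The VPN as a whole can be released only when every piece reports operational.
    public boolean vpnReadyForService() {
        return accessConnections.values().stream()
            .allMatch(s -> s == PieceState.OPERATIONAL);
    }
}
```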

If that’s not complex enough, suppose, as we’re spinning up our access connection, we find that a key resource has failed.  A VNF hosted in a given data center can’t be hosted there because of a fault or overload.  This is an error condition that has to be processed, and we’d all agree.  But we might have provisioned the deeper part of the VPN to create a connection to the data center we expected to use, and that’s not available.  We have to “back out” that prior step and connect to another data center.  While we’re trying to do that, there might be a fault in the original data center connection part of the service.  We have two different things now trying to impact the same area of service at the same time.  Why remedy a problem in something we’re decommissioning?  And suppose we got those two events in the opposite order?

This is why you need to have state/event mediation of the event-to-process correlations for each piece of a service.  We could structure our state/event tables to say that 1) if we were in a “decommissioning” state, ignore fault events, and 2) if we were in a fault state and received a decommissioning event, let that take precedence and suspend fault processing.
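
Here’s a small Java sketch of such a state/event table, expressing those two rules.  The states, events, and transitions are illustrative placeholders, not anything drawn from a real product or standard:

```java
import java.util.EnumMap;
import java.util.Map;

// Minimal state/event table for one piece of a service: faults are ignored
// while decommissioning, and a decommission event received during fault
// handling takes precedence over the fault work.
public class ServicePieceStateMachine {
    enum State { ACTIVE, FAULT_HANDLING, DECOMMISSIONING }
    enum Event { FAULT, FAULT_CLEARED, DECOMMISSION }

    private State state = State.ACTIVE;
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);

    public ServicePieceStateMachine() {
        table.put(State.ACTIVE, Map.of(
            Event.FAULT, State.FAULT_HANDLING,
            Event.DECOMMISSION, State.DECOMMISSIONING));
        table.put(State.FAULT_HANDLING, Map.of(
            Event.FAULT_CLEARED, State.ACTIVE,
            Event.DECOMMISSION, State.DECOMMISSIONING)); // decommission preempts fault work
        table.put(State.DECOMMISSIONING, Map.of());      // fault events ignored here
    }

    public void onEvent(Event event) {
        State next = table.get(state).get(event);
        if (next == null) {
            System.out.println("Ignoring " + event + " in state " + state);
            return;
        }
        System.out.println(state + " + " + event + " -> " + next);
        state = next;  // the process bound to this transition would be invoked here
    }
}
```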

Note that in this most-complex kind of concurrency, we have a benefit to explore too.  Any event can be processed by a new instance of the component, spun up in a convenient location, as long as that instance gets the event passed to it and has access to the service data model that contains all the state and stored data required.  Even a fault avalanche can be handled, both in terms of mediating how events are handled and in terms of creating enough processes to handle everything that arrives.

OK, that’s concurrency in cloud-native terms.  Now on to “persistence”.  I mentioned in a blog last week that you could differentiate between a classic “microservice” and a “function” by saying that microservices were persistent, meaning they stayed where you put them till they broke or you removed them, and functions were transitory, in that they were loaded only when actually needed.

To me the best argument for saying that microservices should always be stateless is that if they are, then the decision on persistence can be made operationally rather than at development time.  If a component is expected to be used a lot (or if regular use is detected through monitoring) you could elect to keep it resident.  If not, you could save resources (and, in a public cloud, costs) by unloading it.

You could also decide to keep components resident when idle unless you needed to use the capacity for something else.  That would make the process persistence decision very dynamic and very much aligned with the actual needs of the applications/services.  More efficient resource use equals lower costs.

In our lifecycle management examples, you can see that the ability to have a process selectively persistent would be a big benefit.  If things are running normally, then error processes would be invoked rarely and so unloading them while idle could be a big benefit.  If things start to go south, operations systems could decide to make more error processes (the ones either in use or likely to be used, an AI determination) persistent to handle things faster.
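
As a sketch of how that operational persistence decision might be expressed, assuming a simple invocation-rate threshold (my invention, not any specific product’s policy):

```java
// Sketch of making persistence an operational decision: keep a component resident
// when it's being used often, unload it when it sits idle. The threshold and the
// notion of a monitoring interval are assumptions for illustration.
public class PersistenceController {
    private static final int KEEP_RESIDENT_THRESHOLD = 5;   // invocations per interval

    private int invocationsThisInterval = 0;
    private boolean resident = false;

    public void recordInvocation() {
        invocationsThisInterval++;
    }

    // Called at the end of each monitoring interval by the operations layer.
    public void evaluate() {
        boolean shouldBeResident = invocationsThisInterval >= KEEP_RESIDENT_THRESHOLD;
        if (shouldBeResident && !resident) {
            resident = true;          // a real system would pre-load the component here
            System.out.println("Keeping component resident");
        } else if (!shouldBeResident && resident) {
            resident = false;         // ...and here it would unload to reclaim resources
            System.out.println("Unloading idle component");
        }
        invocationsThisInterval = 0;
    }
}
```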

Concurrency plus persistence control equals optimum efficiency, scalability, agility, and performance.  The keys to achieving this are first, to use stateless processes fed with all the data they need as their inputs, and second to have a data-model-driven state/event steering of events to processes.  If you have this, then you can always spin up a process to respond to need, and manage the inherent tension between disconnected event-driven processes and the services or applications that necessarily unite them and demand coordinated handling.

An important point here is that “cloud-native” isn’t a buzzword, but neither is it a simple one-dimensional concept.  To take advantage of the cloud, you have to support concurrency, scalability, and resiliency in a way that maximizes cloud benefits.  You also have to coordinate diverse autonomous but interdependent processes and foster agility in operations in a way that doesn’t mean rewriting everything to support new run-time conditions and issues.  State is a big part of it, as is the mediation of the event-to-process relationships via an organized model of the application or service.  It’s complicated, which is very likely why we’re having so much trouble coming to terms with it.

Another reason is that (as is often the case) we’ve tended to come at cloud-native from the bottom, starting with the software implementation of processes rather than with the architecture of the application or service within a hybrid/multi-cloud.  If we started with issues like concurrency and persistence, we could derive things like stateless components, state control and orchestration, and event/process steering, and end up in the right place.

The “top” of “top-down” thinking is thus a bit fuzzy.  Yes, you have to start with requirements, as Oracle noted in a recent blog that also links to a useful e-book.  Yes, you have to end up with specific software design patterns and associated tools, as we’re seeing in the Kubernetes ecosystem.  What’s in between, though, is the “software architecture”, the application framework into which everything that’s developed fits and to which all operations practices and tools must apply.  DevOps, for example, is just a concept if you don’t have some specific framework for applications and services that builds the bridge between the two pieces.  We’re still not quite there on that framework, and the biggest challenge we may be facing in getting there is the “atomization” of the problem.

“The bigger it is, the harder it is to sell.”  Salespeople know that it’s a lot easier to sell a can of car wax than to sell a car, and easier to sell a car than to sell a fleet.  The problem with this tendency to think small and near-term thoughts in marketing and selling technology is that it doesn’t introduce the full scope of potential problems and benefits.  That can limit the solution scope these early and fragmented initiatives expose us to, and that can create inefficiencies as time passes and broader needs and opportunities are exposed.  I think the right model for cloud-native is already out there, but it’s not yet recognized because of think-of-the-next-quarter’s-earnings-myopia among vendors and buyers alike.  It’s time to raise our eyes above our feet on this, folks.  There’s a lot to gain…and lose.

Have We Missed a Fundamental Point on State?

One of the big and hidden problems with cloud-native and microservices is state.  True cloud-native elements should be fully scalable and resilient, and that combination mandates that these elements save nothing within themselves, even the context of a dialog with a user.  “State” or “context” is intrinsic to virtually all business applications, including applications designed to support online retail operations or customer support.  Thus, getting state/context into cloud-native is critical, and we just might have missed something fundamental.

We’ve known for a long time that “stateful” behavior was important in applications.  Layer 3 switches, application gateway controllers, and load balancers often mention “stateful” behavior as part of their feature set.  That’s because a transaction usually involves a number of back-and-forth exchanges, and if the same instance of a process doesn’t handle them all, it’s possible that transaction processing will get tied up in knots and even create security issues.

The cloud today is most often used by businesses not to host everything, but to host the front-end piece of transaction processing.  Some of that is complicated by the need to support inherently stateful behavior within the cloud.  There are two pieces to that; state control of stateless microservices, and cloud storage of data needed by all the possible instances of these front-end application components.

There are three broad ways to provide state to stateless components of an application.  The first and easiest to understand is to send dialog state information inward from the GUI or application, as part of each “event” that’s being generated.  Think of that as meaning that you tell a stateless process “this data is the response to a presentation of account information for update”, where you send not only the data the user updated but also the information needed by the process to interpret that data.  Any instance of the process can receive that message and handle it.
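
A minimal Java sketch of this front-end approach, with invented field names and an invented “account update” dialog step, might look like this:

```java
// Front-end state control: the caller includes everything the stateless process
// needs to interpret the data, so any instance can handle the message.
public class FrontEndStateExample {
    // The event carries the user's data plus the dialog context describing
    // what that data is a response to.
    record DialogEvent(String sessionId, String dialogStep, String accountId, String updatedValue) {}

    // A stateless handler: no fields, no stored context; output depends only on input.
    static String handle(DialogEvent event) {
        if ("ACCOUNT_UPDATE_RESPONSE".equals(event.dialogStep())) {
            return "Applying update to account " + event.accountId() + ": " + event.updatedValue();
        }
        return "Unrecognized dialog step: " + event.dialogStep();
    }

    public static void main(String[] args) {
        DialogEvent event = new DialogEvent("sess-42", "ACCOUNT_UPDATE_RESPONSE", "acct-7", "new-address");
        System.out.println(handle(event));  // any instance would produce the same result
    }
}
```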

The second mechanism is to have the stages of contextual transaction processing recorded in a database and accessed by the cloud-native process when it receives some work.  This is “back-end” state control because the state database is “behind” the process.  You still need some sort of transaction ID to link each new message with the back-end data store.
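
Here’s a comparable back-end sketch, where the message carries only a transaction ID and the context lives in a shared store.  The in-memory map simply stands in for whatever shared database the cloud would actually provide, and the step names are illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Back-end state control: the stateless process looks the dialog state up in a
// shared store (keyed by transaction ID) before acting, then records the next step.
public class BackEndStateExample {
    record DialogState(String accountId, String dialogStep) {}

    // Stand-in for the external state database shared by all process instances.
    static final Map<String, DialogState> stateStore = new ConcurrentHashMap<>();

    static String handle(String transactionId, String updatedValue) {
        DialogState state = stateStore.get(transactionId);     // fetch context from the back end
        if (state == null) {
            return "Unknown transaction: " + transactionId;
        }
        if ("AWAITING_ACCOUNT_UPDATE".equals(state.dialogStep())) {
            stateStore.put(transactionId,                        // record the next dialog step
                new DialogState(state.accountId(), "UPDATE_APPLIED"));
            return "Applied " + updatedValue + " to account " + state.accountId();
        }
        return "Unexpected step " + state.dialogStep();
    }

    public static void main(String[] args) {
        stateStore.put("txn-1001", new DialogState("acct-7", "AWAITING_ACCOUNT_UPDATE"));
        System.out.println(handle("txn-1001", "new-address"));
    }
}
```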

The final mechanism is “orchestration”, a kind of distributed state control.  Amazon provides this with Step Functions, which plots out a sequence of steps a given transaction goes through and runs stateless processes within a model framework of those steps.  While orchestration doesn’t have an explicit link to a database, the model and sequencing have to be stored somewhere.
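
As a very loose analogy (not the actual Step Functions API), the idea is that the orchestrator holds the position in the sequence while the steps themselves stay stateless:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy "orchestration" of stateless steps: the orchestrator, not the steps,
// remembers where the transaction is in its sequence.
public class OrchestrationStateExample {
    public static void main(String[] args) {
        // Each step is stateless: it maps an input payload to an output payload.
        List<UnaryOperator<String>> steps = List.of(
            payload -> payload + " -> validated",
            payload -> payload + " -> priced",
            payload -> payload + " -> confirmed");

        String payload = "order-123";
        for (int i = 0; i < steps.size(); i++) {
            // The loop index is the "state": the model knows the position, the steps don't.
            payload = steps.get(i).apply(payload);
            System.out.println("after step " + (i + 1) + ": " + payload);
        }
    }
}
```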

Orchestration is interesting because it’s a link between two related but still separate concepts—microservices and functions (lambdas).  The two get conflated all the time, but generally microservices are persistent except under unusual conditions.  They can scale and heal by redeployment, but they are available while the application is running.  Functions are kind of hyper, meaning that they are transient in-and-out things.  A function loads when you need it and disappears when it’s done.  That makes it “serverless” and makes the issues of state management more acute, because functions can pop up anywhere.  We’ll get to that below.

The precise mechanisms available for these options vary among cloud providers and also among “tool providers”, which presents some challenges for developers.  Front-end state control is the most portable approach in terms of working on any cloud and with any toolkit used to build and deploy microservices, but it introduces more work on the user interface to maintain state.  Back-end state control poses a risk if the state database can be separated (in distance/latency terms) from the process instances accessing it.  Orchestration state control is quite implementation-specific, and so all three mechanisms have challenges.

As I noted above, persistent data storage in the cloud is another state/context issue.  If multiple instances are to access a state database, the access probably has to be mediated to prevent having collisions arise in updating it.  This is a fairly classic problem that some database technologies address within themselves, so there are solutions available.  However, there’s been a lot of interest in cloud storage and data management recently, both among public cloud providers and among third-party vendors.

We have some recent news in the space.  One startup, Reduxio, is about to launch a cloud-native storage and data management tool that can provide persistence of state/context information without sacrificing cloud-native benefits.  The KubeCon event last week focused in part on stateful support in Kubernetes and state-centric advances in the Kubernetes ecosystem, and Robin.io announced a collaboration with Red Hat and OpenShift to enhance stateful application capability.

All this is good news for cloud-native in general, and for broader use of public cloud services by enterprises as well.  The challenge is that we’re still not really homed in on a total solution.  The tools I just cited are great strategies for cloud databases to hold state and other data that can be expected to be updated with user activity, but they don’t totally address the question of state.  Since we’re still seeing multiple implementation options among public cloud providers, that means that state management and persistent data strategies aren’t portable without some tuning.

There’s also the broader framework of deployment of this stuff to consider.  Kubernetes is the go-to orchestration approach for containerized applications, and you can get a number of Kubernetes-based solutions that work across multiple clouds as well as the data center.  For connecting services, we also have a number of approaches, including Istio, which I’ve noted in past blogs.  Logically we’d like to see stateful behavior integrated into/with both Kubernetes and Istio.

The question is what kind of stateful behavior we’re talking about.  Remember our earlier comment about microservices versus functions?  While functional computing is usually associated with event processing, an event is really just something external that demands software attention.  We have events in protocols all the time; every message is one.  We interpret events by associating them with processes through a state/event table.  You can surely frame such a table as an orchestration model for IoT events, but you can also frame a transactional dialog as a sequence of events, as noted above, and that’s what I think raises the big question.

Orchestration of functions combines the process of function hosting and state control.  You put a function somewhere, run it, and contextualize it through a distributed state control feature.  Could you, in theory, if a particular kind of event was regularly occurring, elect to keep the function in place?  Surely.  Could you, if you had a microservice that was rarely used, elect to unload it and reload it when needed?  Surely.

The point is that we’ve slipped into state control using a model that’s different for transient functions versus persistent microservices, and that’s a difference of application and conditions, not one that is as naturally polarized as we’ve made it.  That was a mistake, and we’re not moving as quickly to correct it as we should be.

Learning the Deep Lesson from Cisco’s Quarter

Cisco gave us classic good/bad news in its earnings call.  The good news was that their guidance for the rest of the year was strong.  The bad news was that service provider revenues were down by 13%, and these points were summed up by Light Reading and CNBC.  There are a lot of implications to explore, some shallow and obvious and others deep and important.  We’ll explore them now.

The most obvious point of the Cisco numbers is that enterprises are still seeing a business case for network investment and service providers are having a problem with that business case.  This is consistent with the long-standing view of providers that declining profit per bit was inevitably going to impact ROI on infrastructure, and that in turn would impact capex.  OK, that’s happening, which makes this particular insight obvious and shallow.

But it’s a jumping-off point to a deeper question.  Operators have two basic sets of cost associated with infrastructure, capex and opex.  Opex is, on the average, more than half-again as much as capex.  Opex is part of the financial measure EBITDA (Earnings Before Interest, Taxes, Depreciation, and Amortization) typically used to measure operator financial performance.  Capex isn’t.  There was a major shift in NFV emphasis to focus more on opex reduction, and there was a new ETSI activity (Zero-Touch Automation) to address it.  Given all of this, why is capex under pressure?

One possible answer is that the opex reduction initiatives have failed, and that’s true in the sense that neither NFV nor ZTA is contributing anything to reduced cost per bit.  However, operators have streamlined operations significantly through less efficient means, to the point where they’ve reaped pretty much all the low apples in savings.  Since 2015, in fact, operators have saved more in opex than the 13% that Cisco says capex is down.

Another, probably better, answer is the “new-model-problem”.  Nobody wants to buy a “new” car at the end of a model year when a better/newer/cooler model is about to come out.  At least they don’t want to buy it without a steep discount.  Could we be seeing operators’ belief in a future infrastructure model that they can’t quite grasp and implement, now hanging over sales of the old model they don’t believe has legs?  Almost surely, yes, and Cisco is actually benefitting from that very issue, both in the service provider space and also with enterprises.

In my surveys, both groups have shown a shift in how they balance “proprietary risk” versus “integration risk”.  Six years ago, operators were at their peak of “lock-in” sensitivity, trying to keep major vendors from creating pull-through across their product lines.  They viewed this as a way of keeping prices high, and we all know we went through a media phase on the lock-in topic at that time.  Today, the same players report that they believe having a single vendor with a vision of infrastructure evolution cuts their integration risk, their integration cost, and their risk of stranding capital on old technology.

Cisco has always been the master of the paper-it-over school of technology revolution.  Someone comes up with software-defined networks.  Cisco responds with application-centric infrastructure, which is designed to accomplish at least some of the goals of SDN but without the fork-lift replacement of legacy gear or the leap of faith into a new network paradigm.  That’s a pretty strong response to a scary new trend, particularly when it’s the perception of buyers that the new trend is losing steam in the real world.

Operators tell me that they have “lost some faith” in every new network paradigm.  Part of the reason is that all “new” things these days are over-hyped to the point where there’s no chance they’d live up to expectations.  This problem has been around for decades, but it’s worse in the age of online journalism and click-bait topic management.  It’s particularly difficult to get senior management buy-in for something that’s being trashed in the media; nobody wants to take the risk.  Remember the old saying, “Nobody gets fired for buying IBM”?  It’s now “Nobody gets fired for buying Cisco.”

The natural response to fears of integration and technology risk is “communitization”, meaning cooperative work by operators and vendors.  Standards groups have been the traditional way of handling that, but standards have proven too slow and too narrow to do much good in the current world of network tech turmoil.  Open-source software is a better model, but operators don’t know how to do that and network vendors, including Cisco, have obviously resisted initiatives that would level the vendor playing field.

Another insidious problem operators have faced in trying to manage the risks of multi-vendor is the fragmentation of initiatives.  We have a new technology, so we launch something to develop it.  The management of that new technology?  Out of scope.  The use of that new technology in emerging services (like the OTT stuff)?  Out of scope.  Integration and evolution of the technology into a running network?  Out of scope.  In short, the nature of the initiatives intended to give us something new has created an even worse integration problem than we had before.

We know how to build basic IP networks.  We know their strengths, their weaknesses, their costs and risks, and we accept at one level that something better is going to come along.  It just doesn’t seem to be quite here yet, and that has resulted in a hanging back of those buyers with modest business cases.  Enterprises, who are generally booming relative to operators, and who see network-linked information empowerment as both a revenue booster and a productivity improver, are more eager to step forward and buy something, but even those enterprises are trying to stay in their comfort zone, a zone Cisco has long defined.

Cisco expects the operators to spend more later this year, and that’s consistent with what operators are telling me.  There is competitive pressure on operators to deliver 5G.  There is also a recognition that in many ways bandwidth is cheaper than bandwidth management, and building networks with more capacity can be the fastest path to reducing opex.  Eliminate the complexities of congestion management by oversupply, not by regulating how capacity is used.

Cisco is the go-to vendor for capacity.  They don’t supply 5G-specific technology, but they do supply technology to push bits.  You can add Cisco devices into a Cisco-centric network easily, more easily than integrating another vendor in, and far more easily than moving to a new technology.  Cisco, financially speaking, is doing the right thing, and that’s what earnings calls are about.

Cisco competitors are doing the wrong thing.  Some, like Juniper, try to fit into capacity plans with a per-box replacement approach, but that strategy has failed, and it failed long ago.  Some, like Nokia or Ericsson, rely on providing the single key technology (the radio network) that’s associated with an accepted modernization thrust, such as 5G.  That ignores the truth that if the 5G future does all that it’s supposed to, what it consumes most is capacity, and that’s not just in the RAN.  We may never see 5G Core, but we’ll surely see 5G backhaul.

The reality is that every Cisco competitor should be working out a single-thrust, single-body, strategy for modernization.  The future is scary, the risks you know and take every day seem less so.  Cisco is the default path to that risk-you-know tomorrow, and so Cisco competitors need to unite and make the risk-you-don’t-know path seem less…risky.

What the Heck is NGOSS Contract and Why do I Care?

I’ve mentioned NGOSS Contract many times in prior blogs, and I was somewhat surprised when my latest blog (yesterday) raised questions from readers who were actually TMF members.  One was particularly interesting: “I’ve never heard of it.  What is it?”  The actual question was a bit more complicated and interesting, and that was how NGOSS Contract related to service lifecycle automation and cloud-native deployments.  That’s what I’m hoping to answer here.

Let’s start with the definition, though.  Back in around 2004, the TMF started to rethink its OSS/BSS approach, leading to the concept of a Next-Generation OSS or NGOSS.  A part of this work was a framing of the service contract (the “NGOSS Contract”) as a means of representing process relationships to events.  This eventually became GB942.

In an early presentation on NGOSS Contract, Jessie Jewitt of Ciena (who headed up the TMF’s Carrier Ethernet activity) laid out some issues on high-level service coupling with OSS processes in a resource-independent way.  She asked me if I’d do something on NGOSS Contract, and I did a derivative paper in 2008.  The key diagram from the presentation was one that showed the Contract as the conduit for requests made to what were at the time Service-Oriented Architecture (SOA) services.  All this stuff sort-of-got incorporated in the TMF053 NGOSS Technology-Neutral Architecture (TNA).

You have to wonder what was behind all of this, what benefit the TMF advocates thought NGOSS Contract would bring to an industry that seemed to somehow chug along.  At least part of the answer is concurrency.  A service is the output of a real-time system, a system that has a lot of moving parts that are loosely federated into what we’d call “services” or “applications”.  Each of those parts has its own logic, its own hosts, its own role to play.  Each has to be put into service, fixed, and removed from service as the progress of the service/application lifecycle dictates.  That means that at any given time, you might have a dozen things that needed attention, a dozen events to process.  Why not have all those separate events drive their own instances of their own processes, as the Contract/SOA relationship of service-driven logic permits?

We now live in a world of cloud-native and microservices, not SOA, but that world actually needs the concept more than the world of 2008 did.  The problem with a true cloud-native implementation is that since everything in it has to be fully scalable and microservice-ized, the process elements have to be “stateless”, meaning that they process requests based on data in the requests and not according to data or context stored by the processes.  That’s what makes something cloud-native.  However, statelessness makes it hard to interpret things.

Suppose I get a monitoring system report of a server failure.  It’s an event that says “Server X Failed”.  The problem we face is that we don’t know what that failure means.  We have no context to interpret the event, and we thus don’t have any notion of what to do about it.  Obviously, whatever ran on Server X has also failed, but what was that?  Classic monolithic design says we’d spin through all our records of deployment to see what was on Server X, and from that we could see what processes had failed.  However, even if we knew what those processes were, what services do they form a part of?  How are those services expected to recover from the fault?  All that is context.

The presumption of NGOSS Contract is that when the service order was processed, the order instance created by the deployment was a record of everything relating to the service.  At the bottom of the hierarchy of model elements within the contract, representing features and access points and whatever, there would be a set of agents that received events from the resources those model elements referenced.  Those elements could then take an event, reference the process list in the service model to see what that particular event in the current service process state meant, and invoke the appropriate process to handle it.  Data model mediation is the way that the context that’s inherent in any cooperative behavior set gets imposed on a set of processes that themselves don’t have any contextual awareness.  The model is the context.
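
To make that concrete, here’s a compact Java sketch of the idea.  None of these names are TMF-defined structures; they’re just my shorthand for a model element that owns the state and the state/event table and steers events to stateless processes:

```java
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of data-model-mediated event steering in the spirit of NGOSS Contract:
// the model element (not the processes) holds the state and the state/event
// table, and stateless processes are invoked with the model element plus the event.
public class ContractMediationExample {
    enum ElementState { DEPLOYING, ACTIVE, FAULT }

    static class ModelElement {
        final String name;
        ElementState state = ElementState.DEPLOYING;
        // state/event table: (state, event) -> process to invoke
        final Map<ElementState, Map<String, BiConsumer<ModelElement, String>>> table;

        ModelElement(String name,
                     Map<ElementState, Map<String, BiConsumer<ModelElement, String>>> table) {
            this.name = name;
            this.table = table;
        }

        void onEvent(String event) {
            BiConsumer<ModelElement, String> process =
                table.getOrDefault(state, Map.of()).get(event);
            if (process == null) {
                System.out.println(name + ": no process for " + event + " in state " + state);
                return;
            }
            process.accept(this, event);  // any instance of the process could do this work
        }
    }

    public static void main(String[] args) {
        // Stateless processes: everything they need arrives as parameters.
        BiConsumer<ModelElement, String> activate = (el, ev) -> {
            el.state = ElementState.ACTIVE;
            System.out.println(el.name + " activated");
        };
        BiConsumer<ModelElement, String> faultHandler = (el, ev) -> {
            el.state = ElementState.FAULT;
            System.out.println(el.name + " entering fault handling");
        };

        ModelElement access = new ModelElement("access-east", Map.of(
            ElementState.DEPLOYING, Map.of("DEPLOY_COMPLETE", activate),
            ElementState.ACTIVE,    Map.of("SERVER_FAILED", faultHandler)));

        access.onEvent("DEPLOY_COMPLETE");
        access.onEvent("SERVER_FAILED");
    }
}
```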

In this approach, the service model element’s information and the event information are everything that’s needed to process the event, so any instance of the process can receive that information and generate the same result.  That’s what makes cloud-native implementations scalable and resilient.  But the process depends on having something—something like the NGOSS Contract—maintain state for all the feature implementations, hold the contract and deployment data, and hold state/event tables to link events to processes in the appropriate way.  The microservices don’t have context because the contract does, and that frees the microservices to be their stateless, cloud-native selves.

In yesterday’s blog, this represents what I called “functional orchestration”, which means that the contract defines not just deployment but the coordination of the actual work.  It provides the steering mechanism to harmonize asynchronous service events and pass them properly to cloud-native elements for processing.  There is no “OSS” or “NMS” in this structure, in the traditional sense.  Functionally, the OSS or NMS is the collection of operations/management processes that are identified in the state/event tables.  Functionally, but not actually, because nothing except the contract assembles these things, and we could define any collection of processes that did something symbiotic as a “system”.

This point is why you have to be careful about “functional block diagrams”.  In NFV, as I’ve noted, and in most OSS/BSS modernization processes, what we do is define the functional elements of an OSS or NMS or an NFV MANO framework.  Those functional elements should be collections of cloud-native microservices linked via state/event tables to service conditions.  They should not be translated literally into monolithic software components, which sadly they have been.

This (I hope) illustrates the “Why?” of NGOSS Contract, but it doesn’t explain how we managed to miss the significance of these points for so long.  Some TMF friends told me five years ago that NGOSS Contract was rarely implemented, and I can tell you that all the operators I’ve dealt with have told me they don’t implement it.  I stumbled on it because the TMF project I’d joined, called “Service Delivery Framework” or SDF, was dealing with how to represent a service as a series of interdependent but autonomous functional elements.  Some operators asked me to do a proof-of-concept implementation (in Java, which became the first ExperiaSphere project), and that demonstrated to me that you couldn’t have scalability, autonomy of functions, and cooperation in mission without something minding the collective store.

I presented this to the TMF SDF team, but they didn’t pick it up.  I presented it to the NFV ISG and they didn’t pick it up either, and all the operators who were the inspiration of the ISG were long-standing TMF members.  So why we missed this, why the TMF and the NFV ISG missed it, isn’t a question I can answer.  I suspect we may never be able to get a good answer to that, but we do still have a chance of correcting our failures.  I believe that as long as we don’t address the points NGOSS Contract raised, we have no chance of operations automation or cloud-native infrastructure.

How Bad is the “Cloud-Native” Problem for Operators?

Light Reading sets a lot of the dialog in the industry, and so when they raise a topic it’s important to me, to my clients, and to those who read my blog.  An example is this article, citing a discussion at the TMF’s Digital Transformation World event in Nice.  The piece recounts operator frustration with the “cloud native” claim and concepts.  This mirrors the complaints I’ve heard myself, and I want to cover those complaints here.

The article’s main thrust is introduced with comments from Telecom Italia, including “We firmly need vendors to step up in order for us to be able to give to our friends what they need”, referring to the needs of business customers.  According to the operator’s spokesperson, the specific issue is in cloud-native technology and its implications.  “We are talking about adding software and core at the edge and we need to have orchestration. We have plenty of orchestration but not yet the right one and that is something vendors need to work on. We are talking about cloud-native and guess what? Vendors are not yet delivering cloud-native software for us. Time is running out.”

I agree with the sentiment expressed here; as I said I’ve heard the complaints myself.  I also agree that vendors haven’t exactly been forthcoming with their cloud-native products, not exactly truthful in their claims, and not at all responsive in meeting operator needs.  So, is this time for another vendor indictment?  Only in part, because part of the problem is in the notion of “cloud-native” itself, and part is also with the operators.

Cloud-native means “designed explicitly to take advantage of the features of cloud computing”.  If you think about that for a minute, you realize that the whole concept flies in the face of much of what we used to think was essential in cloud services.  Current software is not cloud-native.  If you port it to the cloud, the result will be not-cloud-native running in the cloud.  If you had cloud-native software now, you’d already be running it on the cloud.  That means cloud-native is about developing software explicitly for the cloud.

Software runs on a platform that consists of hardware (the server), an operating system, and “middleware” that provides access to incremental specialized features.  If you ask a software developer to write software, the logical first question is “what does it run on?” because you’d need to know the interfaces, the APIs, that were exposed by the platform and thus available to the software.  Let’s ask the obvious question: what is the API set that defines “cloud-native”?  Answer: We don’t know.  It’s not that we don’t have any; we have too many.  The public cloud providers all offer the cloud equivalent of “middleware” in their web services.  Think of writing software that had to run on any operating system, any set of middleware tools, and what would you end up with?  Probably the classic one-line “Hello World” program, because doing anything else would make you platform-specific.

Carrier cloud is a kind of nascent public cloud, because we don’t have it yet but most say they want it and most also have specific things they think it will do.  Making it do those things in a “cloud-native” way has to start with a definition of what the carrier cloud platform would look like to a programmer.  Once that’s been covered, it would be possible to define how applications for the carrier cloud would look, and from that we could launch development projects.

Who defines the platform?  If the answer is “vendors”, then we’ll probably have at least a couple dozen platforms, not one.  That means that there will be no “carrier cloud” but rather a bunch of incompatible versions of one, which reduces the chances of anyone developing much for any of them.  If the answer is “operators”, then we also have a couple dozen platform choices, but this time they’ll all be suppositional and described in terms of what they should do, not what they consist of.  You can’t write any code for that mess.  If you say “standards groups” you’ll wait five years to see if any result emerges, and if it’s useful.  If “open source” is the answer, you’d need to identify the open-source elements of the platform, which then gets you back to the same list of options to do the defining.

There probably is no right answer to how to get to a carrier cloud platform, which is the biggest reason why we don’t have cloud-native technology of the kind Telecom Italia wants.  There is, however, a best answer.

The cloud computing community has been feverishly expanding the scope of tools that contribute to cloud-native functionality.  There are often multiple solutions to the same problem, but because the tools build on each other to create ecosystems, we already have a fairly limited number of ecosystems with fairly standard elements.  We know cloud-native is about containers.  We know Kubernetes is the way orchestration will be done.  We know that there will be a service mesh employed to connect and load-balance elements of our applications.  We know how an application looks, how it connects.  We know, in short, quite a bit.

What don’t we know?  First, we don’t know all the choices for the tools we’d use—service mesh, monitoring and management.  We don’t know how orchestration extends to control state for stateless microservices or serverless functions, but we have some implementations.  What we know the least about is how some of the specific issues network operators face, like data-plane traffic flow among high-speed software instances of device functionality, would be handled in an application mesh.

Most of all, we don’t know “functional orchestration”.  Yes, orchestration has been exploding as a topic in the cloud space, for deployments in containers, and even in a limited sense in NFV, but this is what we could call “structural orchestration”, the creating of the application or service structure.  In a true cloud-native system, with elements that are stateless, you need to orchestrate the event flows through the application or service in order to get things that don’t know anything about context (microservices) to behave systematically.

Ironically, the very TMF that hosted the meeting Light Reading was covering had the answer to this ages ago, with their NGOSS Contract work that proposed having a data model mediate the event-to-process relationships and set context for stateless processes.  Without this approach, it’s very difficult to build a cloud-native version of any application or a cloud-native implementation of any network service.

The point here is that we don’t have a huge problem with our carrier cloud architecture, except perhaps in the political sense.  That political problem was generated because we started out by trying to define how to build carrier cloud applications with no framework in mind, and so we cobbled together stuff.  That stuff isn’t cloud-native because the framework we seized on isn’t cloud-native and never will be.  The cobbling created fragmentation, which is one of the points the article makes about the current cloud-native situation.  Nothing fits with everything, because there was no everything-framework to start with.

We have to accept that all that early stuff was done wrong, and we have to start working on doing it right.

 

Smart vs Dumb Pipes: Are We Still Fighting That?

I mentioned in my Monday blog my mock political debate on “smart versus dumb pipes.”  Now it seems that the industry is heading for a reprise of that same issue, which is a good thing because it was never really resolved.  The obvious question is whether there are new ideas behind the old question, or whether we’re just doing another rehash of the past.

The SDxCentral article I cited above opens with an interesting comment that should be the launch point for the whole issue.  “Operators won’t be collecting big checks from YouTube or Facebook anytime soon,” says the piece, “but new services and capabilities delivered by 5G could lead to new revenue.”   Let’s parse that statement and see where it goes.

The first point is that we’re still circling around on the question of whether there will be a “net neutrality” rule that bars settlement among providers and/or paid prioritization of traffic.  That question is outside my topic boundary here, but suffice it to say that the “dumb pipe” option would surely be more viable were operators able to get traffic-based revenue from the OTTs.  If they can’t, then the problems with declining revenue per bit are likely to go unchecked.

That leads us to the second point, which is that 5G could “lead to new revenue”.  Does that mean connection revenue, meaning dumb-pipe revenue?  It could if we presumed that 5G would either cost more than current 4G services or that it would somehow have more customers than current 4G service.  We already have announcements from operators like Verizon that 5G won’t cost more.  We surely have plenty of operators and pundits saying that the Internet of Things could add a bunch of paying-non-human customers for 5G service, but they don’t explain why these same new users wouldn’t already be testing the cellular waters on 4G.

A few largely ignored voices have been saying for years that 5G is really just a somewhat-faster-but-nobody-knows-how-fast version of cellular services we already have.  What is surely true is that in the end, it provides another dumb-pipe option.  That raises the last point in our parsing of the statement, which is whether 5G could create new smart-pipe revenue opportunity for operators.  That depends on two factors: do 5G opportunities require something beyond connectivity, and are the operators in any position to provide it.

The foundation of a smart-pipe revenue stream from 5G is associated with what we could call “socialized IoT”.  Instead of communicating in a traditional sense, socialized IoT contributes information.  The notion of sensors open on the Internet fails the sniff test of business case; who deploys them when it costs to buy and maintain them, costs to connect them, and the result is contributed to everyone for free?  What’s actually needed for socialized IoT is a public-utility model, and perhaps that’s the best argument for network operators to reap some smart-pipe revenue from it.  The big and obvious question is “How?”

The obvious part of the answer is “not from connecting devices”, which would take us back to the failed sniff test.  If IoT is contributing information, then it’s information that the operators need to be selling, at the minimum.  They could go beyond that, to offering services based on information, but the information piece is the irreducible minimum of a smart-pipe offering.

IoT information is a potentially valuable commodity for a number of reasons.  First, you need sensors to get it, and operators have long experience in deploying things that earn revenue only over time.  They can make the big “first-cost” investments, and we know that because they do it all the time.  Second, operators can shield the sensors themselves from hacking, and validate the information that they generate.  They’re already regulated, so regulations to protect device and data security and safeguard public privacy aren’t difficult to add on.  Third, they have offices, real estate, located in proximity to where most sensors would end up being deployed and most information would end up being consumed.  If things like low latency mean anything in IoT, then operators have hosting points that would sustain it.

The reason for my qualified “potentially valuable” is that we’ve all read a lot of things that you can do with IoT, but most of them are home-control/facility-control and security applications that we really don’t need new technology (or new players) to support.  Other applications like augmented reality and autonomous vehicles are problematic in terms of willingness to pay, difficult to frame a service for without forklifting a lot of consumer technology, or simply silly.

I’m a fan of augmented reality in the form of unobtrusive glasses that let information be displayed over top of the real-world view of the wearer.  Think of it as a personal heads-up display, linked to information fields that describe what things are, what might be offered there commercially, and even whether you really know that person who’s coming the other way on the sidewalk.  The problem is that the technology to support this in a convenient way, a safe way, is still embryonic.  We probably will have this in five years, but we almost surely won’t have anything more than a glimmer (augmented glimmer, of course) in the next couple years.

Autonomous vehicles are both sensible and silly depending on just what you think is controlling them.  If you visualize a future where a central edge-computing-and-IoT pool runs all the vehicles on the road, I hope you’re young enough to have a shot at seeing it.  I doubt it, though, because the smart model of autonomous vehicles is like augmented reality—it’s a set of information fields that a vehicle and its onboard controller are moving through and drawing information from.  The decisions made by the controller are a combination of information-field-driven and driven by onboard sensors that are responsible for what could be called “reaction”.  There’s a tactical and strategic dimension to driving, and the tactical part is surely going to stay within the vehicle.

The good news here is that we’ve (perhaps accidentally) stumbled on the smart-pipe model.  It’s those information fields.  Any realistic new-revenue scenario will end up here, in my view, and it’s completely feasible to promote network operators as the creators of those fields.  It would then make them an intermediary between raw IoT elements and applications that consume the information, which is the role they play now.  If operators want to go the distance to the user themselves, they’re free to do that, and if not, they can fall back on “disintermediation light”, a position where they do more than connection but less than OTT competition.

This doesn’t seem to be what Dell, the company named in the Light Reading piece I’ve cited, is thinking.  They seem to be looking at transformation purely in terms of cost management and 5G optimization, and that’s not enough.  It’s clear that 5G Core might require more virtualization of functions than we see today, but it’s only a “might” and it’s not going to happen soon because of delays in justifying the move from 5G RAN-only (non-standalone or NSA) to 5G Core.  Not only that, operational automation has little value if you can’t automate all of operations, not just one-technology pieces.  And even if Dell succeeds in smartening operations, it doesn’t smarten the pipes.

I think Dell needs to take another look at this question, and maybe we need to take up that old debate in the industry at large.

Is the MEF Effort to Standardize SD-WAN Helpful?

Standards are not getting great press these days, nor are standards groups.  Part of the problem is that open-source seems to be stealing the traditional thunder of standards, and part the fact that standards seem to take forever.  That combination can be deadly, and when you think of standards in connection with a hot and expanding concept like SD-WAN, it can be doubly so.

According to SDxCentral, the MEF has set some new goals for its SD-WAN standards, which have been underway since 2017.  There are ambitious goals for the new initiatives and for the MEF’s SD-WAN work overall, and given that there are at least two dozen SD-WAN implementations out there (including a new open-source one), you can reasonably question whether the market has already outrun the MEF’s work.  In the near term, surely.  In the longer term, we need to see what the MEF is trying to do to understand whether there’s a chance it can (retrospectively) impact the SD-WAN market.

You can read the public information on the MEF SD-WAN 3.0 work here, but if you want to read the draft standard you need to request it.  The white paper on the page, which is freely available, does lay out the basic architecture and framework for the MEF’s work, but it dates to the launch in 2017.  The figure on the referenced webpage provides a glimpse of the major goals of the initiative, which starts with defining standard elements of an SD-WAN deployment and giving them standardized names.  To quote the site:

  • SD-WAN Edge: Physical or virtual
  • SD-WAN Gateway: Between SD-WAN and external connectivity services
  • SD-WAN Controller: Centralized management of SD-WAN edges & gateways
  • Service Orchestrator: Lifecycle Service Orchestration of SD-WAN and other services
  • Subscriber Web Portal: Subscriber service ordering and modification

This is helpful in itself because (of course) SD-WAN vendors don’t adhere to any common terminology, and it’s hard to offer a general explanation of how SD-WAN works without running into differences in how vendors describe the pieces.  The MEF does capture the real elements of SD-WAN, and it provides at least logical names for them.
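
To show how those standardized names might be used, here’s a minimal data-model sketch built around the five MEF element names; the attributes and relationships are my own illustrative assumptions, not anything drawn from the MEF draft itself.

```python
# Illustrative only: the MEF element names used as labels on a simple data model.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SDWANEdge:
    site: str
    form_factor: str                # "physical" or "virtual"

@dataclass
class SDWANGateway:
    name: str
    external_service: str           # e.g., an MPLS VPN or another provider's network

@dataclass
class SDWANController:              # centralized management of edges and gateways
    edges: List[SDWANEdge] = field(default_factory=list)
    gateways: List[SDWANGateway] = field(default_factory=list)

@dataclass
class ServiceOrchestrator:          # lifecycle orchestration of SD-WAN and other services
    controllers: List[SDWANController] = field(default_factory=list)

@dataclass
class SubscriberWebPortal:          # subscriber ordering and modification front end
    orchestrator: ServiceOrchestrator

# Example: one controller managing two edges and a gateway to an MPLS underlay.
ctrl = SDWANController(
    edges=[SDWANEdge("HQ", "physical"), SDWANEdge("branch-12", "virtual")],
    gateways=[SDWANGateway("gw-east", "MPLS VPN")],
)
portal = SubscriberWebPortal(ServiceOrchestrator([ctrl]))
```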

Overall, the MEF is positioning SD-WAN services within their LSO (Lifecycle Service Orchestration) framework, which means (of course) that the MEF work is really more about SD-WAN service than about SD-WAN technology.  That does help to dodge the problem with the timing of the work relative to the launching of SD-WAN products, but today the fastest-growing piece of SD-WAN service is the operator-provided SD-WAN, so the onramp for the MEF’s work at the service level is under some time pressure already.

LSO diagrams also lay out the notion of pan-provider service, so of course pan-provider SD-WAN would be a target.  The question is whether, given that SD-WAN was designed to deploy over a network not within it, there’s a meaningful pan-provider dimension to SD-WAN unless there’s an attempt made to harmonize different implementations of SD-WAN for different providers.  That doesn’t seem likely to work, since virtually every SD-WAN implementation has proprietary elements in tunneling and connection management.  I note, though, that it would be possible in theory to create a kind of “SD-WAN Demilitarized Zone” between operators, where each operator contributed an SD-WAN node of some sort and the two nodes were interconnected on a small stub network.  The white paper doesn’t describe this sort of pan-provider connectivity in any of its use cases, so we have no basis to assume there’s a specific strategy being promoted by the MEF.

One thing that is being promoted is some harmony in how SD-WAN elements would access the underlay transport network.  That’s not particularly critical for SD-WAN vendors or users who acquire SD-WAN technology and build their own service, but it’s important for MSPs and network operators because without some standard interface between the SD-WAN “layer” and the transport “layer”, operators would have to provision transport connections differently depending on the SD-WAN implementation used.  The obvious question is whether the MEF could effectively promote such a standard, given how many different SD-WAN vendors there are, and that most vendors would see any “open interface” as a doorway to admit competitors into their deals.
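
To make the point about a common underlay interface more tangible, here’s a hypothetical sketch of what such an interface could look like; the class and method names are my assumptions, not anything the MEF or any operator has actually defined.

```python
# Hypothetical sketch: a transport-provisioning interface an operator might expose so that
# any SD-WAN implementation could order underlay connections the same way.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class UnderlayRequest:
    site_id: str
    bandwidth_mbps: int
    cos: str                        # class of service requested on the transport connection

class TransportProvisioner(ABC):
    @abstractmethod
    def provision(self, req: UnderlayRequest) -> str: ...
    @abstractmethod
    def release(self, circuit_id: str) -> None: ...

class MplsProvisioner(TransportProvisioner):
    def provision(self, req: UnderlayRequest) -> str:
        # A real implementation would call the operator's OSS/BSS; this just fabricates an ID.
        return f"mpls-{req.site_id}-{req.bandwidth_mbps}"
    def release(self, circuit_id: str) -> None:
        print(f"released {circuit_id}")

# Without something like this, each SD-WAN implementation needs its own provisioning integration.
print(MplsProvisioner().provision(UnderlayRequest("branch-12", 100, "business")))
```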

Some of the articles on the MEF’s work have suggested that some feature harmony would be part of the MEF’s work.  One vendor was quoted as saying “that prior to MEF’s efforts in SD-WAN, vendors defined their services in a lot of different ways.  Service providers need to have a service definition.  With service definitions we are defining the vernacular. We are saying what the service components are and how they connect together.”  That also seems problematic in a competitive market, particularly since the common vernacular would tend to impose a minimal feature set on buyers when the market trend is to advance new features for differentiation.

I understand the MEF’s interest in basic connectivity features.  It will be incredibly difficult to harmonize SD-WAN if you step beyond VPN services, because what lies beyond basic VPN services is anyone’s guess, anyone’s product decision.  Furthermore, if you aren’t doing just VPNs you are now stepping into areas where IP standards don’t offer any guidance on interworking, or any base set of protocols or procedures that might offer interworking solutions.  It’s the wild west of connectivity.

But is that bad?  SD-WAN, even as a service, is really about the “edge”, as the MEF’s own material admits with its definitions of the elements of an SD-WAN service.  In the terminology of old, we have a user/network interface or UNI in the SD-WAN Edge, and a network-to-network interface (NNI) in the SD-WAN Gateway.  An NNI is still an edge, not a deep element in operator infrastructure, and given that, it’s not clear whether you really need a lot of interworking at the SD-WAN layer rather than at the transport layer.  Does the MEF see operators selling SD-WAN interconnect among different SD-WAN implementations?  The operators I’ve talked with aren’t seeing that approach at all.

Most of the leading SD-WAN vendors have either said nothing about the MEF work in public, or been cautiously but noncommittally supportive.  In private, some have told me that they think the MEF is making a fundamental mistake with SD-WAN, treating it as a featureless IP VPN implementation option when in fact there are already SD-WAN implementations that offer features IP VPNs lack.  Some of these features, like the ability to host a software node in the cloud to extend SD-WAN connectivity there, challenge a carrier deployment model because they involve a non-carrier component.  Other features, like explicit connectivity control rather than the promiscuous connectivity that IP provides, are increasingly found in the newer SD-WAN offerings, and most operators believe that they need many more features, even services above simple connection, to be successful.
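
To illustrate the explicit-versus-promiscuous connectivity point, here’s a toy sketch of an allow-list forwarding policy; the site names and policy structure are invented for the example and don’t reflect any particular vendor’s implementation.

```python
# Toy illustration: plain IP forwards between any pair of reachable addresses by default,
# while explicit connectivity control forwards only flows the policy names.
ALLOWED_FLOWS = {
    ("branch-12", "datacenter"),
    ("branch-12", "cloud-node"),    # a cloud-hosted SD-WAN node extends the VPN into the cloud
}

def permit(src_site: str, dst_site: str) -> bool:
    """Explicit control: a flow is forwarded only if the policy names it."""
    return (src_site, dst_site) in ALLOWED_FLOWS

print(permit("branch-12", "datacenter"))    # True: explicitly allowed
print(permit("branch-12", "branch-7"))      # False: plain IP would have allowed this by default
```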

It seems to me that harmonizing SD-WAN features through standardization is going to end up emphasizing the lowest common denominator, and that’s already shooting well behind the duck in terms of state of the art.  It does nothing but encourage operators, who are already too fixated on simple connection services, to hunker down in their comfort zone and await inevitable doom.

Years ago, on election day in the US, I participated in a mock political debate on smart versus dumb pipes.  I took the smart pipe side and won, and my argument was that without something beyond basic connectivity features, operators were doomed to be low-value players that could easily end up needing government protection again.  SD-WAN could be important not because it’s an extension to the dumb-pipes model of VPNs, but because it’s an on-ramp to the smart pipes that operators need.  It’s fair to say that if operators are satisfied with basic SD-WAN connectivity, they’ll never get smart at all.

Is 5G MEC Really Hype or Just Misunderstood?

Light Reading raised an interesting point on MEC (Mobile Edge Computing) in a story today; several, in fact.  Along the way, they’re raising some interesting points about 5G, virtualization, and even NFV.  It should be clear to many that 5G has become a kind of perceptual on-ramp for a bunch of new technologies, and we need to understand why that is and whether it’s justified.

The big problem with transformation for network operators is the scope of the activity.  If you dabble with replacing a box here and there, or do things a bit differently in one emerging new service area, you’re not transforming as much as evolving.  It might take years for new technologies introduced that way to build enough mass to change the service or cost picture noticeably.  But if you really transform, you face an enormous initial cost (“first cost” in carrier terminology) and a comparably enormous risk.  What everyone realized years ago (and what I blogged on then) was the fact that 5G had the advantage of being budgeted.  What could be made to ride along for 5G would be budgeted too, and that might make the difference between it happening and it dying the death of ATM and frame relay.

The focus of the article is the hype around MEC and its association with 5G, and to start with we need to at least partially separate these two issues.  All modern technology is hyped, overhyped, and overhyped beyond possible realization.  That’s the nature of click-bait journalism these days.  “Man Bites Dog!” has always been a more exciting, novel, and interest-generating story than the opposite.  Thus, MEC would have been hyped, and in fact was, in the form of “edge computing”, 5G or not.  Where there is a symbiosis between the concepts is that 5G budget.  If you’re going to hitch your wagon to a star, pick one that’s not falling at a couple thousand miles per second.

One of the first points the article makes is that MEC would require 5G deployments to rearchitect services away from voice.  That’s true, but it’s been happening all along, and it’s not really the problem in any event.  The issue with MEC and operator 5G deployment is the same as the issue between any new technology and any deployment thereof: you have to justify it.  The article quotes a speaker at the LR 5G event as saying that a transformation of architecture is like turning an aircraft carrier in the mud.  What gives it that inertia is the fact that we design networks to be built on device principles, not on virtual principles.

Most of the elements of mobile infrastructure are the way they are because of mobility, not voice.  Arguably the most complex aspect of mobile infrastructure today is the Evolved Packet Core (EPC), which is responsible for keeping mobile users connected as they move through various cells.  The design for EPC was done when there was no virtualization, no SDN.  If we were to address the problem of mobility today, I’m pretty confident we wouldn’t address it with anything that resembled EPC, but we did EPC and it’s now a massive sunk cost.
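
As a rough illustration of the anchoring job EPC performs, here’s a toy model of a session anchor being re-pointed on handover; it’s a deliberate oversimplification of the real bearer and tunnel machinery, with all names invented for the example.

```python
# Toy model: the user's session stays pinned to an anchor, and only the anchor-to-cell
# tunnel is re-pointed as the device moves, so the IP session survives the handover.
serving_cell = {}               # which cell currently serves each device
anchor_tunnel = {}              # anchor-to-cell tunnel per device

def attach(device: str, cell: str) -> None:
    serving_cell[device] = cell
    anchor_tunnel[device] = f"anchor->{cell}"

def handover(device: str, new_cell: str) -> None:
    serving_cell[device] = new_cell
    anchor_tunnel[device] = f"anchor->{new_cell}"

attach("phone-1", "cell-A")
handover("phone-1", "cell-B")
print(serving_cell["phone-1"], anchor_tunnel["phone-1"])    # cell-B anchor->cell-B
```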

The article also notes that Open RAN (ORAN) for 5G has been a slog, and obviously MEC would be even more difficult to push standards on.  True again, but ORAN was the obvious place for 5G focus for the simple reason that 5G benefits to customers are almost totally derived from the 5G New Radio (NR).  That’s why we had this cryptic concept of “Non-Stand-Alone” or NSA as a pre-specification for 5G.  5G NSA is 5G NR without the rest of 5G, which admits the basic truth that what 5G really provides is higher customer bandwidth, higher cell bandwidth, and lower latency.  Anything beyond that is speculative, a benefit not of 5G but of what might be enabled by 5G.

That speculation is how we got to MEC in the first place.  Remember that one of our 5G benefits is latency.  Well, it makes little sense to minimize transit delay in the radio network if you’re going to haul the traffic a thousand miles through a half-dozen hops to get to where it’s going.  MEC is thus a way of making 5G low-latency connectivity meaningful, but not necessarily justified.  We still need applications that can monetize low-latency processing to justify hosting near the edge.  We know that gaming or augmented reality might do that, and so what we should be looking at for MEC’s business justification is the revenue potential for those two technologies and the totality of the infrastructure and features needed to make them real.
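
A bit of back-of-the-envelope arithmetic shows why hauling traffic a thousand miles undoes the radio-side latency gain; the figures (light in fiber at roughly 200,000 km/s, about a millisecond per routed hop) are rough assumptions rather than measurements.

```python
# Rough one-way delay estimate: fiber propagation plus per-hop processing/queuing.
KM_PER_MILE = 1.609
FIBER_KM_PER_MS = 200.0         # roughly 200 km of fiber per millisecond, one way

def one_way_ms(miles: float, hops: int, per_hop_ms: float = 1.0) -> float:
    return (miles * KM_PER_MILE) / FIBER_KM_PER_MS + hops * per_hop_ms

print(one_way_ms(1000, 6))      # ~14 ms to a distant data center: the 5G radio gain is lost
print(one_way_ms(10, 1))        # ~1 ms to an edge site near the cell
```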

But there’s another path we could take here.  Even virtual networks have to get real eventually.  Virtualization of networking substitutes agile, generalized hosting and software resources for purpose-built and fixed-in-place service devices.  You still need hosts, you still need generalized transport (optical, Level 2/Ethernet in data centers), and so you’re not virtualizing everything, just (really) the services.  The physical reality of what’s under virtualization has its own set of requirements.

One example is access networks.  If you have fiber to the home (FTTH), to the node (FTTN), or even copper loop, those media have to go somewhere.  If you backhaul wireless cells, you’re backhauling them to somewhere.  In most cases, the “somewhere” is a telco edge office (called a “central office” or CO), of which there are about 12 thousand in the US.  The physical media goes there, which means that if you’re going to provide something to the connected customers, that’s the first place where significant service intelligence could be hosted.

We’re not going to have hundreds of thousands of micro-edge data centers.  The limitations are based on a combination of media topology and real estate cost and ownership.  My model, for example, has never suggested that global carrier cloud would generate more than about 100,000 incremental data centers worldwide by 2030.  Most of these (88%) would qualify as “edge” in that they’d be located at the terminus of edge aggregation media connections.  That’s short of the drama of MEC we often hear, but remember that click-bait technology has no cost, no inertia, to contend with.  Bulls**t, I tell my clients, has no inertia.

The real question here, in my view, is whether 5G has any justification beyond the NR/NSA form.  Will 5G core, network slicing, NFV VNFs for all the pieces, really go anywhere at all?  That’s a question that the article asks implicitly, but there’s no answer to that question.  5G itself, 5G in the standards sense, is the classic field of dreams.  Telco has been a supply-side world for ages, and the real inertia in 5G isn’t voice but the notion that if you build a new service it will succeed automatically.  The OTTs have been thriving because the telcos have built connectivity without purpose, leaving others to provide the purpose and gain the revenue.

We shouldn’t forget a final point about the click-bait hype age we’re in.  Once you’ve hyped, overhyped, and hyped-beyond-all-recognition, you can’t get any further on the positive side of a technology, so you have to turn on it.  What was once salvation is now a contemptible fraud.  The people quoted in the article were primarily representing companies that would be hurt by MEC and carrier cloud, so it’s not surprising they’d be totally negative, just as it wasn’t surprising that early 5G and MEC coverage was totally positive.

How things get covered isn’t the issue here.  How they get built is, and we still have some work to do putting the pieces of 5G’s puzzle into a business case picture.  Till we’ve done that, we won’t know anything about what the future really holds.

Capturing Innovation for Transformation

Divide and conquer is a popular theory, but it’s not always smart.  Networks and clouds are both complex ecosystems, not just disconnected sets of products, and so thinking about them product by product threatens users’ ability to combine and optimize them.  Sadly, we live in a world where powerful forces are creating the very division that works against making the business case for transformation, and we really don’t have any strong forces working to combine things as they should be.

The first of our strong, divisive forces is history.  While data networks were, at first, described as systems of devices (IBM’s Systems Network Architecture or SNA is the best example), the advent of open network technology in the form of IP and Ethernet made it convenient to consider networks as device communities loosely linked by an accepted overall architecture.  That model let us expand networks by buying devices, and you could change vendors gradually by introducing new ones as you expanded your network or aged out old devices.  The “architecture” became implicit, rather than explicit as it was with SNA.

Things got even more divisive with the exploitation of the OSI model and the visualization of networks as a series of interdependent layers, each of which had its own “architecture” and device complement.  It’s common to find three layers of network technology in legacy IP WANs today, from optical through IP.  Further, networks were also dividing into “zones”, with the “access network”, “metro network” and “core”.  All these divisions generated more fragmentation of architectures into products, and more integration and confusion for users.

Vendors saw this user perception of risk as an opportunity, and so they responded by trying to create proprietary symbiosis among their own layers, zones, and devices so a success anywhere would pull through their entire product line.  Network operators got so concerned about this that they started dividing their products into “domains” and setting limits on the number of vendors per domain and the number of domains a vendor could bid in.  AT&T’s Domain 2.0 approach is an example.

Then there’s coverage.  While all this was going on, we saw a major change in how technology was covered.  Early tech publications were subscription-based, and so they played to the interest of the buyers.  As time went on, we shifted to a controlled-circulation ad-sponsored model for tech media.  On the industry analyst side, we saw a similar shift—from a model where technology buyers paid for reports that served their interest to one where sellers paid for reports to be developed to serve their interests.

The “magic quadrant” concept is another factor that came along here.  Everyone wanted to be in the top-right “magic” quadrant of a product chart, but of course you couldn’t have a chart that everyone won with.  So how about many charts?  The mantra of vendors and positioning was “create a new product category” so you’d have an uncluttered win in that magic space you just defined.

Which brings us to the next issue, differential competition.  My work on the way that buyers make decisions shows that there are two early critical things that sellers have to consider.  One is the enablers, the characteristics of their product or strategy that actually make the business case for the buyer.  The other is the differentiators that separate different solutions that can make the business case.  If you look at how technologies are covered these days (take 5G or IoT as examples), what you find is that the publicity tends to accept the enablers as a given, not requiring proof.  That focuses the vendors on differentiation, which of course tends to be product-versus-product.  Differential competition creates two problems for the buyer, both serious.

First, the process tends to obscure the enablers that make the business case itself.  My project data from the first years of my user surveys (1982) versus the last full year of project data (2018) show the project failure rate (where “failure” means failure to make the business case that justified the project) is nearly triple what it was early on.  If you graph the failure rate, you see that the greatest rate of increase came after 2002, corresponding to the impact of all the issues I’ve cited here.

Second, differential competition almost always loses the ecosystemic vision of infrastructure along the way, by looking instead at product-for-product comparisons.  That means that integration issues are far more likely to bite during project implementation.  Going again to the data, the number of projects in my surveys reporting overruns due to integration problems before the year 2000 was a third of the number reported from 2001 onward.

An interesting side-effect here is the overall decline in strategic influence of vendors.  When I first surveyed this subject in 1989, five vendors fit the top level of strategic influence, the level where users said vendors had “significant impact on technology project planning, objectives, and implementation”.  In that period, we had two distinct vendor groups: those who had strategic influence and those who did not.  By 2013, we saw every single vendor’s strategic influence declining, and by 2018 no vendor fit that most-influential category.  Is this because vendors don’t know how to influence strategy, because users don’t trust vendors, or because there’s really no strategy to influence?

Slightly over half the users I contacted in late 2018 and nearly all the network operators told me that they did not believe there was any generally accepted architecture for cloud networking, virtual networks, SDN, etc.  Interestingly, about the same number said they were “committed” to these new technologies.

The obvious question here is the impact this might be creating on transformation, sales, etc.  That’s harder to measure in surveys and modeling, but as I’ve noted in earlier blogs, there is strong evidence that IT spending and network spending are both cyclical, with spending rising fastest when there’s a new paradigm that creates a business case for change, and dropping when that business case has been fully exploited and everyone is in maintenance mode.  Interestingly, the cyclical technology spending model I developed shows that we went through a series of cycles up to 2001 or so, at which time we finished a positive cycle, went negative, and stayed in maintenance mode ever since.

The tricky question is what generates a positive cycle, other than the simple value proposition that you have a business case you can make.  It seems to come down to three things—an opportunity (to create revenue, lower cost, improve productivity, etc.), an architecture that credibly addresses that opportunity and can be communicated to buyers confidently, and a product set that fits the architecture and is provided by a credible source or sources.  That we don’t have any of these things is the conclusion of my points above.

For enterprise buyers, the open-source ecosystem surrounding the cloud, cloud-native, and microservices seems to be providing the architecture and product set.  We’re still lacking an articulation of the opportunity and the architecture, though.  That doesn’t stop enterprises from migrating in the right direction because technologists can promote the new approaches being defined, given that their scope of adoption and impact is limited.  Operators, who face a longer depreciation cycle and a greater chance of stranded assets and massive turnovers in the case of transformations, are more reluctant to move without a full understanding of the future model.

I think the surveys and modeling I’ve done over the last 30 years show that we do better when we have a solid set of “enablers” to drive a technology shift.  I think it’s also clear that a solid set of enablers means not only having business case elements that are true and that work, but also ones that are promoted, understood, and accepted enough to be validated by peer consensus.  The most important truth about open-source software isn’t that it’s free, but that it’s been successful in building a model for the future of applications and computing because it establishes a community consensus that eventually gets to the right place.  We can only hope that somehow open-source in network transformation can do the same.