Operator Services and Profits in the Cloud Era

Networks are obviously anchored in connectivity services offered by network operators, carriers, service providers, ISPs, or whatever you’d like to call them. But all of those terms are going to fall away except one: ISP. The only “network” anyone will subscribe to in the future will be the Internet, and the Internet plus the networks that exist within public cloud infrastructure will be the only wide-area networks, period. How I justify that, how we get there, and what it all means are the subjects of my final blog in this series.

There was a time when telco revenues were fairly evenly balanced between businesses and residential users. That hasn’t been true for years. There was a time when just getting access to 45 Mbps of bandwidth would have cost you ten thousand dollars a month. That’s not true today either; with that budget you could now buy service that fast for a hundred households. The times are changing, and we need to explore why.

The first reason is that all the growth opportunity in number of customers comes from individuals, not businesses. Government data shows that the number of business sites in major markets grows by only a percent or two per year. The great majority of those sites are single-site businesses that clearly don’t have any need for VPNs; they get their business services in the form of repackaged residential broadband. And if we look at the “enterprises” who represent real multi-site networking opportunity, 95% of their sites are little different from a single-site business in terms of network needs. It’s taking time, but SD-WAN is gradually eating away at VPN services.

The second reason is that the Internet has created a conduit directly to the market, reducing or eliminating the need to have “branch” locations for human-level engagement. My data shows that over two-thirds of all sales (in dollar terms) are either completed online or researched online prior to a visit to a retailer to simply pick up the product. Customer support is similarly shifting to the Internet. The effort to wring out operations costs to improve profits has induced companies to actively plan to reduce human intervention in sales/support. Every enterprise, and nearly every SMB, needs a web presence as much or more than they need a human presence, and that has already impacted the pace of growth in remote offices.

The third reason is that competition for Internet customers is improving the service quality of the Internet. Just a decade or so ago, it was common for me to be asked to “use a landline”, meaning TDM-based telephony, if I was going to participate in a conference call. Today, VoIP is the rule and in most cases it’s Internet-based. Internet failures still occur, but if you catalog them by person-hours impacted you’d find that the problems are usually related to BGP routes or DNS services, not the physical access infrastructure.

Reason number four is that smartphones have become the preferred information conduit simply because they’re always available. Even people who aren’t obsessed with being in touch (among whom I’d count myself) end up using a smartphone for more communication than any other option; except for my work, I use a smartphone far more than I use a computer. This means that mobile services offer a better option for new customers than wireline, but it also means that services can now go with us, becoming a more direct part of our lives. And smartphones are important, even critical, because they’re our window onto the Internet.

All this adds up to a single truth, which is that “the Internet” is going to offer the lowest unit cost of bandwidth of any network option available. That means, inevitably, that it will displace other network technologies almost completely. Yes, there will still be companies that can justify direct fiber connections between major sites within a metro area, but beyond that we can expect to see business networks become business Internet.

How about applications like IoT or service features like network slicing? Aside from the fact that IoT means “Internet of Things”, does anyone really believe that we could create a vast IoT network based on anything but the most pervasive and least expensive network model, which is the Internet? And does anyone believe that we can create slices in a mobile network for any new premium service when the baseline Internet is always available as an alternative? Network slicing is probably a feature useful in supporting MVNOs, but the MVNO opportunity is itself based on a desire by mobile operators to address the price-sensitive part of the market without discounting their mainstay mobile services. We have MVNOs today, so it’s hard to say whether slicing’s contribution there will even be meaningful.

The net of this is simple. There is no future connection service that can survive competition with the Internet, and so there is no hope that new connection services can improve operator revenues and stem the tide of declining profit per bit. Every operator in the future is an ISP, or they’re done.

It’s also true that the Internet supports a different model of “service”, one focused on the delivery of an experience rather than the support of multi-party communication. What we call “content” today is an example; we listen to online music, watch streaming video, download software, and so forth. The capacity we use to do things like VoIP or even video calling is minimal by comparison. Given that investment will tend to gravitate to the options that support the greatest return, that means that experience creation and facilitation are the future. In fact, they’re really the present, too.

The best example of this we have in play today is the Content Delivery Network (CDN). We call this a network, but it’s actually a set of cache points where content is stored, located at the inside edge of the access network so the content can be delivered as an experience with a minimum of handling. Many, including me, believe that the CDN’s role was obvious almost from the inception of the World Wide Web, and yet operators never took advantage of the opportunity, even though the fact that they sat at the access edge made them a logical player. It’s fair to say, though, that this failure wasn’t entirely their fault; regulations hampered operator entry into markets beyond simple connectivity, and that’s still the case.

Even without CDNs, though, the Internet is more than connection. You need, for example, a DHCP server to assign IP addresses and the DNS hierarchy to resolve domain names into IP addresses. A CDN has to have URL mapping so that requests for content are directed to the closest cache point rather than to a single fixed location that might work for some users and be hopelessly impacted by latency and congestion for others. Facilitation is as important as connection, but to facilitate something you have to offer something specialized to the mission that’s being supported, which means we have to expect that future experiences will require facilitating features that aren’t needed today.
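
To make the URL-mapping point concrete, here’s a minimal sketch of the request-routing facilitation a CDN depends on: steering a content request to the nearest cache point. The cache locations and the crude distance heuristic are illustrative assumptions; real CDNs typically do this inside authoritative DNS or with anycast routing.

```python
# Hypothetical cache points (name -> lat/lon); purely illustrative.
CACHE_POINTS = {
    "nyc-edge": (40.71, -74.00),
    "chi-edge": (41.88, -87.63),
    "lax-edge": (34.05, -118.24),
}

def resolve_to_cache(user_location: tuple[float, float]) -> str:
    """Return the cache point a user's content request should be steered to,
    using squared coordinate distance as a rough proximity measure."""
    lat, lon = user_location
    return min(CACHE_POINTS,
               key=lambda name: (CACHE_POINTS[name][0] - lat) ** 2 +
                                (CACHE_POINTS[name][1] - lon) ** 2)

# A request from Newark resolves to the New York edge cache.
print(resolve_to_cache((40.73, -74.17)))   # -> "nyc-edge"
```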

And that’s the opportunity for the network operators who want to boost their revenues and profits, but realizing the opportunity means understanding what facilitation those future services will need. AT&T, for one, has committed to working upward from connectivity to attempt to define them, but I think the right answer is to look at contextual services, the services needed to exploit the closer ties between people and smartphones. Will an operator do that successfully? That’s the big question for the industry now, because if no set of facilitating-service opportunities is realized, operators may stall investment in infrastructure and increase their demands for subsidization.

I’m hopeful that the AT&T initiative on facilitation spurs other operators to review the concept, and I’m also hopeful that AT&T and those other operators will eventually realize that they have to have a vision of the future retail services to offer any meaningful and financially successful facilitation. If I’m right, then this is the pathway operators will take to sustain themselves into the future.

How the Real Cloud-First Changes Everything

I’m a programmer and software architect by background, and a student of computing history. I’ve seen the transformation of computing from the retrospective analysis of “keypunched” paper records of retail transactions to highly interactive assistance with online, real-time, activity. Through that, computers have migrated from data center to desktop to smartphones in our hands, and to the cloud.

All of this has changed our relationship with computers totally. Despite this, our thinking about computing still focuses on the data center, the same data center that would have held those old-line “mainframes” with punched-card readers, sixty years ago. Can we exploit the current state of computing and networking, the state that has given us the Internet and the cloud, when we’re still focused on data centers? That’s the real question that “cloud first” planning should be addressing. It’s what we’ll address here.

The cloud is not, on a system-by-system basis, cheaper than a data center. Run the queuing theory math and you’ll see that most enterprises would achieve economies of scale comparable to cloud providers. What makes the cloud different is that it’s elastic in capacity and distributed in space. Achieving those properties in the data center would require sizing resources well beyond “average usage” levels, and that would add to data center costs, whereas elasticity is already an inherent property of the cloud. That means that proper cloud applications are those that benefit from those cloud properties, and that’s where changes in behavior come in.
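
For anyone who wants to actually run that queuing math, here’s a minimal sketch based on the Erlang C model; the 5 percent wait-probability target and the offered-load figures are my own illustrative assumptions. The point it demonstrates is that utilization gains flatten as the pool grows, which is why a large enterprise can land within a few points of a cloud provider’s efficiency.

```python
import math

def erlang_b(servers: int, load: float) -> float:
    """Blocking probability for an M/M/c/c system, computed iteratively."""
    b = 1.0
    for k in range(1, servers + 1):
        b = (load * b) / (k + load * b)
    return b

def erlang_c(servers: int, load: float) -> float:
    """Probability a request must wait in an M/M/c queue (Erlang C)."""
    if load >= servers:
        return 1.0
    b = erlang_b(servers, load)
    rho = load / servers
    return b / (1.0 - rho * (1.0 - b))

def servers_needed(load: float, max_wait_prob: float = 0.05) -> int:
    """Smallest pool that keeps the probability of queuing below the target."""
    c = max(1, math.ceil(load))
    while erlang_c(c, load) > max_wait_prob:
        c += 1
    return c

# Compare utilization at enterprise scale versus hyperscale pools.
for load in (50, 500, 5000):
    c = servers_needed(load)
    print(f"offered load {load:>5} Erlangs -> {c:>5} servers, "
          f"utilization {load / c:.1%}")
```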

The changes in mission for computing that we’ve seen boil down to two things. First, we have replaced “shopping” with the Internet equivalent. We now browse products, make decisions on what to get, and ever-more-often even complete the order entirely online. Second, we have exploited portable and mobile devices to move information delivery to workers outward to their points of activity. We don’t ask them to go to the computer, it comes with them. These mission shifts are the thing that’s been driving the cloud.

In the old days, a shopper might go to a counter and ask a salesperson to see (for example) a watch. They’d look it over, ask questions, and if they were satisfied, make the purchase. The chances of a successful purchase thus depend on that at-the-counter experience, and when you eliminate that in favor of an online experience, you need to be able to provide the prospect the same level of confidence that they’re making the right decision. You need to move the online process close to the prospect so it can hand-hold them through product evaluation and sale, and that’s why you need “the cloud”.

In the old days, a worker would have a desk in an office, and when they needed company information to do their job, they’d go to that desk in that office and use a computer there. If they needed some information to be taken away from the office, to a prospect or even to a warehouse, they’d print something out. Today, the worker would simply use a phone, tablet, or laptop, and carry it with them everywhere. If they needed information, they’d use a “portal” to the company databases. That portal would have to be customized to their use of the information in order to optimize their productivity. They’d need the cloud too.

Both our missions need the cloud, but neither mission needs the cloud to absorb all of our IT processes. If we consider the evolution of computing I cited above, we see that the basic information that’s associated with a business transaction (like the purchase of a watch) hasn’t really changed. It’s how IT relates to the purchase process that has changed, which means it’s the part of IT that does the relating that’s a cloud candidate. The central database, the processing of transactions against it, and the regulatory and management reporting needed to run the business are all largely the same. In fact, in most cloud applications today, the cloud creates a front end to legacy applications. So what we should have expected from the first is that enterprise cloud applications are hybrid cloud applications. We should thus visualize the application model of the future, the one that’s already evolving, as a model where the Internet and the cloud form a distributed virtual GUI that is then linked back to the data center. Given that, we can see that “the network” is no longer a VPN that links all the branch locations; it’s now part of the Internet and the cloud.
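
As a concrete illustration of that front-end/back-end split, here’s a minimal sketch; the function names, order fields, and the stubbed legacy call are hypothetical, and in practice the call to the data center would travel over a private or SD-WAN/SASE link rather than a local function call.

```python
def legacy_order_entry(order: dict) -> dict:
    """Stands in for the unchanged data-center transaction processor."""
    return {"order_id": "A-1001", "status": "booked", "sku": order["sku"]}

def cloud_front_end(request: dict) -> dict:
    """Cloud-resident front end: validation, personalization, presentation."""
    # Experience logic belongs in the cloud because it benefits from
    # elasticity and proximity; it scales with shopping traffic, not bookings.
    if not request.get("sku") or request.get("quantity", 0) < 1:
        return {"ok": False, "message": "Please pick a product and quantity."}

    # The actual transaction is still processed, audited, and reported on
    # exactly where it always was: against the central database.
    confirmation = legacy_order_entry(
        {"sku": request["sku"], "quantity": request["quantity"]})

    # Shape the legacy response into the experience the shopper actually sees.
    return {"ok": True, "message": f"Order {confirmation['order_id']} confirmed."}

print(cloud_front_end({"sku": "WATCH-42", "quantity": 1}))
```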

From a compute perspective, then, the goal should be to gradually, through regular application modernization, slim down the core data center applications to be nothing but transaction processors and the analytics run against the corporate databases that transaction processing populates. Beyond this, the logic should be pushed out into the cloud, where it can better support the real-time relationships with customers/prospects, partners, and employees. This is already being done, but I don’t see much awareness of why, and so the process of modernizing data center apps isn’t necessarily being directed optimally.

From a network perspective, this means that corporate VPNs are going to diminish because office worker traffic will come in through the Internet/cloud front-ends. Businesses like banks might, for a time, still justify VPNs for specialized devices, but even that is likely to vanish as we start building true IoT and online appliances designed to be used on the Internet and with cloud applications. What will replace them is a virtual network model, much like SD-WAN, but likely with a cloud-hosted security component that makes the edge what’s popularly known as a Secure Access Service Edge or SASE. Thus, it would be smart for enterprises to be planning for this evolution.

If we consider the computing and networking pieces together, the best approach for planners to take, and for vendors to support, would be to start moving workers to Internet/cloud portals and SASE rather than through office-based VPNs, and to adopt SD-WAN to replace MPLS VPNs where they’re still needed because applications haven’t fully migrated or specialized devices still need VPN support.

Of course, service providers (particularly the ISPs and other network operators) will play in all of this. Some managed service providers (MSPs) are already offering SD-WAN services that include multiple Internet connections, wireless backup, etc. As the quality of Internet access improves, it paves the way for a transition to truly making the Internet into the universal dial tone of the future. That’s a good thing, because it’s inevitably going to become just that, and operators and enterprises alike should start planning for it. The fusion of the cloud and the Internet will create, finally, one network.

This fusion will facilitate a more complete mission transformation, at the user level as well as at the technology level. The cloud, edge computing, enhanced devices, new software models…all of these things combine to get information power close to us so it can be a part of our lives. In a sense, it’s a reversal of the metaverse model of moving us into virtual reality. Instead, we move virtual reality out to touch us, so the power of AI and digital twinning can make computing more effective in supporting what we do. It’s a future that we can approach slowly or quickly, depending on how effectively we can direct the sum of cloud and network toward achieving it. I’d love to see that be a target of venture capital, because that’s where startup innovation would serve us all best.

Can We Shake Our Addiction to Ad-Sponsored Online Services?

While an old-time science fiction writer once said “There ain’t no such thing as a free lunch”, consumers of online services have been binging on free stuff from the first. Of course, “free” really means “ad sponsored”, and since everyone is used to ad sponsorship through TV commercials, applying the concept to online experiences doesn’t seem bad. The challenge is that, not surprisingly, ad spending tends to track total retail sales and grows slightly more slowly than GDP. Not only that, the growth rate for online ad spending has declined every year for the last five years.

If online services are to be ad-sponsored exclusively, it doesn’t take a statistical genius to realize that the number of new services, the total time spent online, and the profits of over-the-top providers are all at risk. We’re already in a situation in the social media space where the success of a new platform like TikTok comes at the expense of an established one, such as Facebook. What can be done, if anything, to release the constraints that online experiences now seem to be under?

One piece of good news is the increased reliance on online shopping. A retail product sells through a chain of players and processes, and traditionally the goal of advertising has been marketing, in this case generating a trip to a storefront retailer to buy something. With increased online fulfillment, the focus of advertising has been shifting to include direct execution of the purchase. You regularly see links to a given product on Amazon, for example, rather than just an ad targeted at making the prospect want to seek out a place of purchase. By cutting down on the number of players involved in a retail sale, it’s possible to spend more on the players that remain, which in this case includes advertising. Growth in ad sales, which had generally tracked GDP growth, is now outpacing GDP, in some markets by nearly a factor of two.

Additional adspend is good news for the OTTs, but of course it is a temporary process, and in fact it may be self-limiting. Direct ad-to-online-purchase links make buyers comfortable with online purchasing, and less dependent on advertising. If advertising leads someone to Amazon (for example) to buy something, it makes sense to simply go to Amazon for purchases and ignore the ads, which most of us do for most ads anyway. In any event, the positive impact of direct ad-to-purchase links on adspend is already dipping, as the slower growth in adspend overall proves.

A better option for OTTs is to follow the pattern of television and offer paid-by-consumer services. We have cable channels that don’t take advertising but charge for their programming, and people do subscribe to them. However, it’s difficult to get an audience to start paying for something they’ve traditionally gotten for nothing (ad sponsored). Even for material never aired in ad-sponsored form, it’s a constant struggle to get people who have ad-sponsored options to pay for content. Netflix is now going to offer a service tier that includes ads to broaden its audience.

We need to look deeper at the issues here. Not all online experiences have the same food chain from production to consumption, and in particular the production mechanisms for the material vary significantly. In content, we have material that’s no longer subject to copyright, material that’s syndicated for consumption after its primary release, and original material. Obviously the first of these types of content is less expensive to deliver and requires less ad sponsorship than the last type. Within that last type, we also have different types of content—music or video—and different models of creation. A reality show is less expensive than a Hollywood movie, and a multi-player game less expensive than either of those.

Social media is a form of content that’s self-authoring. The users provide their own reality show cast and script, and that means that production costs are lower and that ad revenues build profits faster. However, social media is really a form of communications, and people tend to focus on a platform because there’s only so much communicating they are looking to do. It’s one-dimensional, which is why companies like TikTok can create angst for older platforms like Facebook. Nobody wants to watch TV when only one series is playing, and eventually they change channels.

Taken in this light, we can perhaps understand why Meta rebranded itself after the metaverse concept. Properly done, a metaverse is almost self-authoring content. The cost is in the development of the software and the deployment of any necessary compute/network infrastructure. A metaverse is an alternate reality, not just a way of communicating in the real world. You can create different metaverses (or spaces within one) to reflect different issues. You could have a virtual world that mimicked some imaginary scenario, and let the users/members interact within it. Instead of watching a show, you’re a character in it. We pay for watching shows, so why wouldn’t we pay for being a character in one?

You could also work in a metaverse, be educated in one, even be examined and diagnosed in one, and none of these things relies on ad sponsorship to work. The point is that by creating a virtual world, you create a value framework that can be worth paying for explicitly, because it replaces something that’s already being paid for directly.

Another strategy is to leverage assets used to create ad-sponsored services to build other for-fee services. Obviously, Meta could create a social-media-like framework built on the metaverse but with expanded immersive interaction (which is what I think they intend to do), and charge for use, leveraging its customer base. Companies like Google, whose ad-sponsored services required the building out of massive databases, data centers, and networks, have already used that infrastructure to offer paid services like cloud computing.

The challenge with this approach is that you need some target service in mind, and the most obvious one is cloud computing. That service has way too many incumbent players already, and price commoditization is already a factor there. Service targeting would have to include framing some specific cloud mission, and even the current cloud giants have had challenges coming up with credible new missions that could be used to refine their feature set and improve margins.

There’s another basic truth at the end of this trail, which is that in the long run, revenues and profits for the sum of products and services in a market define the components of GDP, which means that growth is going to be constrained by the size of the economy. You can see the truth of this in Wall Street’s recent behavior, which I’ve been talking about in the TMT Advisor “Things Past and Things to Come” podcasts. The financial industry is distorting and exploiting the markets to build gains, instead of backing things that create real gains, because it’s become easier. That means there is a real danger in not facing the facts about ad sponsorship and not working hard to find credible alternatives.

Facing the New Model of Technology Adoption

One of the hardest lessons we all have to learn in life is that you can’t go back to the past. I suspect everyone has had a personal lesson on that topic, but the same principle applies on a broader scale, to populations and economies. We need to learn it now.

One of the things I get a lot of is comments on the role of technology in moving markets. Not surprisingly, most people on the vendor/operator side are “supply-side” thinkers, meaning they believe that technology advances are adopted because they’re better, and that market movement is therefore created by technology shifts. End users are different; they see themselves as “shoppers” in a grocery-store sense. They go to the store with a shopping list, and while they are sensitive to other items that present themselves, that sensitivity is often created by recognition that the “other item” would serve a specific purpose. They recognize the value connection when they see the item. The tension between these perspectives is what makes traditional sales a complex process.

The tech market is also influenced by tech media. In the last 40 years or so, we’ve shifted from a publication model of paid subscription to an online ad-sponsored model. As that’s happened, it’s changed the nature of what’s published from one focused on the buyer to one focused on the product. The difference between an online article and a vendor press release is getting ever smaller, and the difference between either of these and a TV commercial is shrinking too.

I’ve spent a fair amount of time this year trying to see what’s going to happen in 2023, not a surprising fact given that I always try to look ahead at market conditions. This year has proved more complicated because it’s brought the seller/buyer personas into even more tension than usual.

Over the last 40 years, enterprises have largely abandoned the notion of “network planning”. Back in the ‘80s, they saw both network and compute technologies as projects, an infrastructure that was gradually evolving as it took on new business missions. That’s because we were in the early phases of empowering workers, customers, and partners with information. Now we’ve matured in both computing and networking, and enterprises treat both technologies more like office space; the underpinnings are just there, and you clean your facilities periodically with a fixed budget. Even the network operators who used to have a formal planning cycle as recently as a decade ago have been moving away from that model in favor of simply budgeting maintenance and addressing pain points.

Another force that’s entered the picture is the dot-com crash and the regulation that followed, specifically Sarbanes-Oxley (SOX). The crash at the end of the ‘90s was arguably brought on by Wall Street analysts who hyped stock valuations to the point where responsible financial metrics like earnings per share were meaningless. SOX, however, had the effect of making companies focus on the next quarter, because instead of valuing a company on its long-term potential, you hopped from quarterly report to quarterly report. The future now depends on the casual presumption that current-quarter forces will continue to drive progress. But will they?

All this has created a major challenge for tech, and the financial issues this year have exacerbated that challenge by introducing the unwanted notion of risk. The pandemic created a market problem, but it was one that had a specific cause and admitted of a set of familiar government remedies. As 2022 rolled around, everyone was expecting a return to normalcy, which meant a return to what had been true two years earlier, before COVID. That meant returning to the notion that the path to the future was defined by those successive quarterly reports.

People lived differently, and companies operated differently, during the pandemic. What was particularly important was the shift from “going to the store” to buying online. In fact, even “going out” was replaced by online relationships and entertainment. It was a profound shift, something that relied more on tech than ever before, and it kept tech strong through the period. But the pandemic is now over; people are going to the store and going out. Many saved up a lot of money, because they had little to spend it on, were afraid to spend, or both. People who had savings were now able to take some time to find the right new job, and we saw a major shift in the labor force. These forces hit the production and distribution of goods, reducing supply just as demand was increasing with the feeling that “the pandemic was over.” The fact is that we have undergone a major transformation here, perhaps more significant than the onset of the pandemic, and that transformation is now facing tech.

The transformation of consumer behavior has exposed a weakness in consumer tech, which is the reliance on ad sponsorship. As I’ve pointed out, ad spending has tended to grow a bit slower than global GDP, and so far the online market, particularly social media, has succeeded by taking market share from traditional advertising. That shift has made television more challenging; many shows today have 25% ad content, and we’re seeing content production getting bought out as the market tries to accommodate shrinking revenue growth potential.

The concept of the metaverse is arguably one of the major advances in technology, but its realization is tied to social media, which is in turn tied to ad sponsorship. Meta itself, the obvious primary player in the metaverse, may have too many quarterly earnings challenges to jump wholeheartedly into metaverse fulfillment. The economic challenges this year, combined with Meta’s issues, are likely the major reason why venture capital is holding back on funding startups.

In business technology, the problem is the “quarterization” impact I mentioned earlier. It’s difficult to get quarterly growth with product strategies that demand considerable long-term thinking and planning, and so we are actively avoiding even considering the major, systemic, impacts created by both the economic shifts I’ve noted above, and the technology shift that’s been generated by the evolution of the Internet and the cloud into what’s essentially a single marketing/sales ecosystem.

“Online” is redefining how we live and work, and that means it’s redefining how virtually every element of every country’s GDP is being created. This is a truly seismic shift, and while it’s been underway for thirty years or more, it’s reaching its peak now because the drive to enhance delivery of information to customers, partners, and workers has finally integrated computing and the web with the cloud. Online experiences have been driven by ad sponsorship, but part of the cloud revolution is that we’re bringing them to the core of how we do business. That’s leaving the OTTs wondering how to continue to drive revenue growth, and it’s changing what we mean by both computing and network infrastructure. It’s uncomfortable, but there’s no question that we can’t go back to the comfortable time. We need to face the future and deal with it, and so through this week I’m going to blog about some of the changes we need to make.

What the H*** does “Cloud Networking” mean?

Is there any such thing as “cloud networking”? On one level that seems like a dumb question, but if you look deeper into it, you see that we have a whole range of vague and often contradictory definitions. That’s too bad, because it’s clear that the cloud is transforming computing and applications, and because the cloud is inherently distributed, it’s transforming networks too. We just don’t quite grasp the range of that transformation yet.

It’s not that cloud architectures are, in diagrammatic terms at least, major changes over past models. In batch applications, we had the classic input-process-output flow, and as parallel computing models emerged, what tended to happen was that the front two elements were made into separate, parallel/scalable, components. The “output” piece, representing the actual database activity, was largely serial. In cloud computing, what’s happened is that those two front elements were pushed out into public cloud hosting and the back-end piece stayed largely in the data center.

Because cloud computing utilizes fully distributable components running on a global resource pool, it presents a bunch of different potential on-ramps for work, the “input” piece of the picture. As applications have transformed to optimally use cloud resources, some of the “process” pieces have also migrated to the cloud, and often these pieces run in a smaller set of cloud locations, proximate to the point where the data center is connected. This means that application workflows mimic a kind of “tree-in-the-wind” model, with the edge branches shaking around over a wide area, feeding bigger and less-mobile branches, and finally a largely stable trunk. That model is why the real “cloud networking” exists.

Networks, meaning specifically wide-area networks, have historically connected facilities. In the cloud-tree workflow model, there is no real facility to connect. That means that a corporate WAN can’t be built to connect these now-virtual places. The edge pieces are moving around in response to changes in workloads, failures, and so forth. If we assume (as I believe we should) that not only customer and partner access but also employee access to applications shifts to the cloud, you can see that the original notion of a VPN goes out the window for the simple reason that the traffic is moving increasingly within the cloud.

One striking impact of this is the growth of interest in SASE. The value of SASE is that it’s a virtual, secure, service termination that presumes it will be deployed in the cloud, which is where the access edge components of applications are now hosted. Just as other real application components can be shifted to match conditions, so SASE can be shifted, scaled, redeployed, and so forth. If it includes encryption capability, it can secure traffic within the cloud and onward to the data center. If it can support multiple cloud applications in and a single one out, creating a “node”, it can build an almost-traditional network model in the virtual world.

Another impact of the tree-cloud-network model is that all those tiny branches waving around translate to a need for connectivity not only in very small offices where traditional business network technology is too expensive or even unavailable, but also in the homes of workers, and maybe even casual locations like hotel rooms or airport lounges. The closest thing to universal broadband access that we have on the planet is broadband Internet, and so we can assume that more and more applications will be accessed via broadband Internet connections to cloud front-end elements. Where MPLS VPNs already exist, they might feed the cloud front-ends too, but the transition to Internet on-ramps for applications seems both clearly underway and inevitably complete. Thus, SD-WAN concepts become the model for creating a separate and secure address space for both application components and workers.

If we assumed that all corporate computers were concentrated in a single data center, what we’d end up with here is an Internet/cloud federation of services that then connected cloud-to-data-center in a massive (likely redundant) trunk. In the real world where there are often multiple data centers, either distributed for availability management or regionally to reflect local needs, what happens instead is the creation of a data-center-interconnect (DCI) grid that then connects to the cloud. This structure then supports the SD-WAN-modeled VPN.

I think that this picture describes what’s actually happening in the “cloud-network” space, but as I’m sure you realize, this picture isn’t really being presented by vendors. Each element of the picture is presented as an almost-independent product, in the name of avoiding the classic sales blunder of “boiling the ocean” and expanding the scope of the project to the point where near-term revenues can’t be realized. The challenge for enterprise planners is that the big picture is actually what’s driving the bus here, and not recognizing what’s happening makes it hard to judge technologies and products correctly.

One example of this is Graphiant’s launch. According to an article in Fierce Telecom, Graphiant is a NaaS product (whatever that is in today’s world of fuzzy definitions) based on a “stateless core”. If you read the article, it’s difficult to understand just what Graphiant is doing or even why it’s interesting. In fact, some of it is a bit frightening: “At the heart of Graphiant’s pitch is its so-called “stateless core,” which acts as the hub to which each endpoint is connected and through which all traffic is routed.” Does this mean that each endpoint is homed to a single hub? That would be a disaster for a global cloud network.

Graphiant is really providing a kind of super-SD-WAN-as-a-Service model, meaning that it’s a network provider rather than a product provider. The model is most similar to SD-WAN provider Cato Networks in that it isn’t a set of SD-WAN products acquired by a managed service provider (MSP) and then used to create a service for sale. However, the capabilities Graphiant offers are very similar to those that could be created by an MSP with the proper SD-WAN technology on which to base their services, and that’s the key point.

If a cloud network is inevitable, there are two ways to get there. One is to acquire the technical elements and build the thing, but that requires that broad understanding of where things are heading and what’s needed to get to the right destination. The other is to acquire a cloud network service that gets you to where you want/need to be without having to understand the details and build them into your own network.

The downside of this approach, beyond the higher cost that’s almost surely associated with it, is this: without understanding the way cloud networking is evolving and the way your MSP is addressing that evolution, how would you know that the network you’re getting is really what you want in the long term?

You need to have SASE/SD-WAN edge features in a cloud network. Your SD-WAN implementation needs to be as resilient as any cloud application would be, which means that workflows originating at any user point have to be secured on entry and put onto a VPN, which must then be able to route to the string of cloud components needed to process the work, and onward to the data center, without loss of connectivity if intermediate nodes scale or redeploy (this is what I think a “stateless core” means). Some SD-WAN implementations don’t allow for cloud hosting of nodes, or require stateful behavior of nodes to complete routes, so it’s important to know whether that’s the case in any SD-WAN plans you undertake. When you build a cloud network, or buy a cloud network service, you’re still building/buying a network.
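
To illustrate what I take “stateless core” to mean, here’s a minimal sketch in which each cloud-hosted virtual node computes the next hop from a shared view of the overlay topology rather than from per-flow session state, so a node can be scaled or redeployed without stranding in-flight traffic. The node names, link costs, and topology are hypothetical.

```python
import heapq

# Overlay links and their current costs, shared by every virtual node.
TOPOLOGY = {
    "edge-nyc": {"core-a": 5, "core-b": 7},
    "core-a":   {"core-b": 2, "dc-gw": 4},
    "core-b":   {"dc-gw": 3},
    "dc-gw":    {},
}

def next_hop(here: str, dest: str) -> str | None:
    """Dijkstra over the shared topology; returns the first hop toward dest.
    No per-flow state is consulted, so any node can answer for any flow."""
    dist = {here: 0}
    first = {here: None}
    heap = [(0, here)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dest:
            return first[node]
        for nbr, cost in TOPOLOGY.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                first[nbr] = nbr if node == here else first[node]
                heapq.heappush(heap, (nd, nbr))
    return None

# If "core-a" is redeployed, only TOPOLOGY changes; no session state moves.
print(next_hop("edge-nyc", "dc-gw"))   # -> "core-a" via the cheapest path
```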

The reality here is that a cloud network is a virtual network that is likely created using a combination of SD-WAN technology and the broader virtual-network technologies used for things like data center segmentation. The future of application networking, as I’ve said, is virtual, and every enterprise should be asking vendors how to build and maintain a virtual network that has the attributes needed to network that tree-cloud I’ve described here. Every vendor should expect that they’ll need to answer that question, too.

Contextual Service Software: Part 3 of the Series

For this third and final blog in my “contextual services” series, I’ll look at the actual higher-level software requirements for the concept. Remember that my goal in the series is to frame a set of facilitating services or features that would be exposed via APIs and provide operators with a means of wholesaling capabilities to OTTs in order to support higher-level “retail” services. Of course, operators could also exploit these APIs themselves, subject to regulatory policies.

Let’s start by saying that contextualization, in software/processing terms, is the intersection of events and policies. The events are signals of conditions of interest or value to a user, and the policies define how the user would like their contextual applications to respond to those signals. My assumption (from my last blog) is that the policy storage would be on a user device, within a blockchain like Ethereum’s, but I also assume that contextualization could involve enough signals and reactions that the overhead and cost of blockchain use might be prohibitive. I also think we should assume that a user would have to enter a state of “contextual sensitivity” that would render the user’s technology receptive to contextual aid. With these assumptions, I think we can lay out a sequence of processing in two pieces: the first is the policy side and the second the signal side.

The policy side is initiated when a user sets themselves in a contextual sensitivity state, meaning they open or activate an app on their device that indicates they want contextual help. I assume that any given user would have a number of what I’ll call contextual states, representing activities where contextual help is valuable. For work-oriented use these would be aligned with work activities, and for personal/recreational use there would be other states, such as “shopping”, “looking for a friend”, “sightseeing”, etc. A state would be linked to a service and parameter set. When a user activates one, the policies for that state would be transferred from the user’s blockchain to a context agent that could be resident in the device or in the cloud, or even split between the two. There, these policies would be used to respond to signals.

Which we’ll now introduce. A “signal” is a process event triggered by the analysis of local conditions against those stored policies, conducted by a context agent. The assumption I make is that “local conditions” are represented by what I’ve characterized in past blogs as “information fields”, where the term “field” means “area of impact”. For example, if it’s raining in New York City north of 41st Street, there would be a “raining” information field associated with that geography. If there is a sale on at Macy’s, there would be a field centered on the store location that would represent the things on sale. My conception is that information fields would be stored in a database, indexed by the locations within each field.
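
Here’s a minimal sketch of that information-field store, under my own illustrative assumptions about the record format: topic names, a lat/lon bounding box as the “area of impact”, and a linear scan standing in for a real geospatial index.

```python
from dataclasses import dataclass

@dataclass
class InformationField:
    topic: str                                  # e.g. "weather.rain", "retail.sale"
    area: tuple[float, float, float, float]     # lat/lon bounding box (S, W, N, E)
    parameters: dict                            # topic-specific data (what's on sale, etc.)

FIELD_STORE: list[InformationField] = [
    InformationField("weather.rain", (40.75, -74.02, 40.88, -73.93),
                     {"intensity": "moderate"}),
    InformationField("retail.sale", (40.7505, -73.9900, 40.7520, -73.9880),
                     {"store": "Macy's", "categories": ["menswear"]}),
]

def fields_at(lat: float, lon: float, topics: set[str]) -> list[InformationField]:
    """Return the fields covering this location that match the user's interests."""
    return [f for f in FIELD_STORE
            if f.topic in topics
            and f.area[0] <= lat <= f.area[2]
            and f.area[1] <= lon <= f.area[3]]

# A user near Herald Square who is "shopping" and cares about rain:
print(fields_at(40.7508, -73.9890, {"weather.rain", "retail.sale"}))
```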

When the user’s location is identified, or when it changes by what’s deemed a significant amount (as defined in the context state; a city block, for example), the user’s context agent would query the information field database and obtain those fields to which the user’s policies indicated sensitivity or interest. Each information field would then be processed according to the context state data provided by the user, and when there was a match between information field and context state, the identified process would be activated.

If we generalize this, we could say that the user’s context state would identify the conditions that signaled a query of the information field database. For example, a work task like changing a fuse might trigger a query explicitly, meaning that the user might be told “open the fuse box and click NEXT when done.” Any condition that could be sensed by the user’s device, by a device like an IoT sensor, or signaled explicitly could be used to initiate an information field query.

In summary, when a “context signal event” occurs, the context agent would query the information fields database to determine which of the conditions the user’s policies flagged as interesting were actually present. For each such information field, the context agent would match its parameters against the user policies, and when a match was found it would activate the process indicated. That process could be a device notification or some on-device or externally run software element. To make this work, then, what do we need?
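
Here’s a minimal sketch of that agent loop; the policy and process formats are ad-hoc stand-ins for the standards discussed next, and it assumes the fields covering the user’s location have already been fetched by a query like the one sketched earlier.

```python
from typing import Callable

# Process registry: names a policy can refer to -> the code that runs.
PROCESSES: dict[str, Callable[[dict], None]] = {
    "notify": lambda params: print(f"NOTIFY: {params}"),
}

# User policies, as the context agent would receive them from the user's
# blockchain-held context state.
USER_POLICIES = [
    {"topic": "retail.sale",
     "when": lambda p: "menswear" in p.get("categories", []),
     "process": "notify"},
]

def on_signal_event(fields_here: list[dict]) -> None:
    """Given the information fields covering the user's location, match them
    against the user's policies and activate the indicated processes."""
    for field in fields_here:
        for policy in USER_POLICIES:
            if policy["topic"] == field["topic"] and policy["when"](field["parameters"]):
                PROCESSES[policy["process"]](field["parameters"])

# A signal event fires with one matching field present:
on_signal_event([{"topic": "retail.sale",
                  "parameters": {"store": "Macy's", "categories": ["menswear"]}}])
```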

A lot of what’s needed is data models. First, we need a standardized structure and coding for our information fields. We have to be able to create that database I talked about, and allow any context agent to access it to process events against it and against user policies. Second, we need a standard for coding policies in blockchain identity records, and in the form that would be passed to context agents. This might take the form of a policy language specification. Third, we need a standard signal event coding so we can activate the context agent properly.

Then there are the APIs. We need to be able to submit information fields to the database and to access the database, and both of those mechanisms should be in the form of an API. We also need to be able to activate processes from the intersection of policies, information fields, and signal events, which is also an API.
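
Expressed as plain function signatures, that API surface might look something like the sketch below; the names and argument shapes are illustrative assumptions, not a proposed standard.

```python
# Illustrative API surface only; every name and argument shape is hypothetical.

def submit_information_field(field: dict, credentials: str) -> str:
    """Contribute a field to the database; returns the field's identifier."""
    ...

def query_information_fields(lat: float, lon: float, topics: list[str]) -> list[dict]:
    """Return the fields covering a location that match the listed topics."""
    ...

def activate_process(process_name: str, parameters: dict) -> None:
    """Run the process a matched policy names: a notification, an on-device
    action, or an externally hosted software element."""
    ...
```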

Operators, in my view, would host this, and could contribute (directly or through their users) information fields and signals or triggers. Where regulations allowed, they could offer the services retail (directly or through a subsidiary, again depending on regulations). The bulk of the data and the applications (context agents, processes associated with the handling of signal events, etc.) would likely be created by third parties as part of a retail service.

Regulations are the last point here. There are multiple levels of regulation that could apply. One is the question of whether a regulated entity can offer an “information service”; this is the dominant issue in the US. The general answer is either “yes” or that the entity has to do so through a fully separate subsidiary. Another level is the privacy protection associated with customer data. In most cases, this applies to data acquired through the operation of the regulated service. Finally, there’s the question of “custodial” behavior: if the operator stores third-party data, would that data be subject to privacy regulations even though the operator didn’t own it? All this is murky to say the least, so I asked operators for their views.

I’m told by operators globally that there would be no regulatory barrier to their providing standards for data structures or mechanisms for creating contextual services. In some geographies, there may be barriers to operators supplying personal data, but my proposal is that the users’ personal data is stored in a blockchain, and is supplied parametrically to the context agent on user command. Most operators think that wouldn’t violate regulations in their market because the user is sharing the information and there is no persistent storage of the policies and parameters extracted by the user from the blockchains.

About a third of operators believe they could actually provide the complete service as long as they used a separate subsidiary, but there are some questions regarding how that subsidiary would be able to access the network or real estate owned by the regulated operator entity. Most operators, both in this group and overall, were less interested in being the contextual service provider and more interested in contributing wholesale data or being a repository, with both options falling into my “facilitating services” category.

That raises the final point. I do not believe there are any significant new-service revenue opportunities for operators that don’t either fall into contextual services or rest on a similar model of facilitation. I also think that regulators are beginning to realize that they either have to give operators more latitude in creating a return on infrastructure, or we end up with nationalization or subsidization. You can’t wish away financial realities, and even governments are (reluctantly) accepting that.

Technology Support for Contextual Services

If, as I speculated yesterday, the optimum new-service strategy for network operators would be a set of facilitating services that exploited contextualization and personalization of mobile behavior, what would the technology requirements look like? Operators need to balance feature specificity, which raises the value of their services to the OTTs who would frame them in retail form, against ensuring that whatever they do is useful across as wide a range of retail services as possible. Specific features, broadly useful…sounds like a balancing act. It is, so let’s develop a plan starting at the top.

Which is that what we’re doing is serving mobile users with contextual information. Mobile users move around, but at any point in time, they’re in one place. Most of the time, they’ll be in that one place for a while, and most of the time that one place will be within a single metro area, the place a user lives or works. That single metro area has two desirable characteristics. First, it contains a lot of users, so things done there can benefit from economies of scale. Second, current access network technology for both wireless and wireline terminates there, so there’s efficient connectivity to users whether they’re sitting in their living room or office, or wandering about on the streets or even in parks.

The reason why metro user density is critical is that it’s hard to imagine how contextualized, personalized services could be hosted without a significant economy of scale. A metro resource pool is feasible because there are a lot of applications that could be hosted there. If you go further out, to what many see as “the edge” at the cell site, you have little chance of getting economies of scale and thus little chance of getting reasonable cost levels.

The close-to-access point is equally significant because it’s possible to capture user identity and to personalize services at the off-ramp to the access network. For mobile services, you have access to the mobility management elements of LTE or 5G, so you can know who everyone is and get a rough location even without accessing GPS data from a phone or capturing identity by having someone pass a Bluetooth sensor.

There’s another point about metro that cuts both ways, though. If a cellular mobile provider is also a wireline incumbent in a given geography, they surely would have metro facilities in which to plant resource pools for hosting. If they are not, then they have a problem because they likely do not have the real estate, and would have to consider the cost of acquiring space, creating proper environmental conditions, installing a data center, and staffing it with qualified resources. All is not lost, though. There are three ways operators could deal with the in-my-area-versus-not problem.

Way number one, for operators with a territory and resources, would be to offer contextual services only within their resource footprint. This may sound like a losing proposition, but the fact is that contextual services are most valuable in the area where people regularly move, meaning where they live and work. If an operator focused contextual services on their customers within their resource footprint, chances are those customers would stay in their zone most of the time. This strategy wouldn’t be suitable for operators who didn’t have wireline service offerings and so didn’t have a resource footprint to exploit.

The second possible strategy would be federation. If we assume that contextual/personalized services are the right approach, then it would be likely that competition would tend to force most operators to offer them. If the APIs were standardized, then operators could federate their contextualized and personalized facilitating features, creating a uniform platform for OTTs. Alternatively, the OTTs could create their services across operator boundaries using whatever APIs a given operator supported. However, this would require that most operators make these facilitating APIs available in some form.

The third strategy would be for operators to acquire cloud hosting in the areas where they don’t have a resource footprint. The challenge here is that the cloud provider service would likely be more costly in the long run than an operator-owned metro resource pool. However, “the long run” might be well down the road, and operators would be able to size their resource capacity to the pace of activity in the lower-density areas. The key to making this effective would be the creation of a hosting-platform software set, plus the contextualization/personalization applications, able to run in a VM, on bare metal, or in containers.

Both the first and second strategies involve the decision to create “carrier cloud” data centers in at least the major areas of the operators’ resource footprint. The third does not, or rather the third would support the migration to those carrier cloud data centers when enough demand for the facilitating services justified the move. That means that the two pieces of the software platform are the critical ingredients; if operators have those they can ease into the service and enhance their own resources as revenues permit.

My baseline presumption would be that the right platform is containers and Kubernetes, which fits the cloud model well and aligns with the Google Nephio project’s initiative to make virtual functions work with Kubernetes. I’m also inclined to think that adopting service-mesh market leader Istio would be smart, since it is well-suited to message or short-transaction interactions and (since Google did it too) works well with Kubernetes and probably with Nephio as well.

As a layer above, I think we need some anonymizing elements, and this might be a reasonable place to think about blockchain. A blockchain is “authentic”, meaning it can be tied explicitly to something, and we could assign a user a blockchain not only to represent their identity but also (if we assumed an Ethereum chain) to hold the policies, interests, goals, and other data that would be required. The proposal for a “Level-3” element of Ethereum (which has nothing to do with Level 3 of the OSI model) that would handle process control and optimization could be a help here.
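
As a purely illustrative sketch of what such a chain-held record might carry, here’s a simple hash-linked block structure; it deliberately stands in for, rather than uses, any actual Ethereum machinery, and every field name is invented.

```python
import hashlib
import json

def make_block(prev_hash: str, payload: dict) -> dict:
    """Create a block whose hash covers both the payload and its predecessor,
    so tampering with stored identity or policies is detectable."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return {"prev": prev_hash, "payload": payload,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

# An anonymized identity record, followed by a linked policy record.
identity = make_block("0" * 64, {"user": "anon-7f3a", "pubkey": "<public-key>"})
policies = make_block(identity["hash"], {
    "contexts": {"shopping": {"topics": ["retail.sale"], "share_location": True}},
})
print(policies["hash"][:16])
```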

The higher-level stuff, the more contextualization-specific elements, is obviously harder to address. However, I think we have to assume that these software elements can’t themselves be made pieces of the blockchain, with the coding and authentication each exchange would require; the cost of those steps would likely be prohibitive given the number of exchanges that contextualization would involve. I’ll talk about some mechanisms for this final step in the next blog in this series.

What Would Credible New Operator Services Look Like?

What are the network operators, the service providers or ISPs, going to do? As the Things Past and Things to Come podcast said today, there are a lot of bad signs coming in a space that’s had little else for over a decade. Operators have faced a consistent drop in revenue per bit, and it’s proved increasingly difficult for them to offset this with reductions on the cost side. The financial press has been increasingly negative on the space, and EU operators have been pushing the Union to force OTT players to contribute to improvements in infrastructure. Is there a seismic shift about to happen, or is this the same sort of noise we’ve heard for the last ten years?

Let me start by saying that that last question is the hardest to answer. Yes, profit per bit has been falling, and yes, we’ve been hearing that the net profit per bit was going to hit the point where further investment in infrastructure would mean operators would reduce their financial stability and cut their stock prices. But we’ve heard that all along. Is anything different really on deck here?

Since 2012, operators have been saying that they’d reach a point where they couldn’t profitably build out better networks within two to four years. Obviously that’s not been true. What’s happened instead is that operators have been relentlessly working to reduce opex, and dabbling on the capex side. The result has been a stretching out of that inevitable point where cost and revenue cross the ROI demarcation, where investment in infrastructure won’t meet company requirements for return on investment.

It’s very difficult to estimate when the real crossover could come. My model currently says that operators have between two and seven more years in which cost management can stem the tide of profit-per-bit decline. Where in that range a given operator lies depends on its opportunity density (“demand density” in my terms), which determines the rate of return a mile of infrastructure can earn on the users it passes. However, the same demand density measures the potential revenue an operator could obtain from new services, and it should be clear that while cost management eventually runs out of room, revenue upside is limited only by the perceived value of new services.

So far, operators have tended to bog down on the idea that a “new service” is a new service customer for an existing service, or some tweaking of the service pricing model. That idea, IMHO, is clearly not working. A new service has to either be a new experience for the end customer, or a new set of facilitating features designed to promote OTT use. There are two paths operators might take toward this kind of new service, targeting a specific opportunity or investing in infrastructure from which multiple service paths could then evolve.

One of the deficiencies operators themselves agree they have is an inability to conceptualize new retail services beyond connectivity. Another is a lack of understanding on how to market something that’s not a simple evolution of what they’ve sold all along. Both these deficiencies make it difficult for operators to take the first of my two paths. It shouldn’t be like that, but it is, and there’s little point at this time in claiming we could solve that problem. Let’s assume it’s a given, so that leaves our second path.

AT&T, whose financial position is shakier than that of many others because of its low demand density, has already said that it believes a path to future revenues would be to build features that would be wholesale components of services/experiences offered by others, meaning of course the OTTs and perhaps the cloud providers. The question is how those facilitating features/components could be made available.

It seems fairly clear that the foundation of meaningful new services depends on two elements, contextualizing and personalizing, and that the two are linked. The theory behind these services is that operators today are focused on mobile services delivered to smartphones for the majority of their profit. However, mobile services are really little more today than mobile versions of wireline services. We have calling and broadband Internet as the primary features operators provide, and those features are already commodities. I believe that to create our meaningful new services, we have to exploit some new aspect of mobility, and that aspect is it’s-with-us-ness. Unlike wireline services that are delivered to a facility, mobile services are delivered to a device that’s in our hands as we live our lives. That means that if our services “understand” what we’re doing (contextualization) and “understand” our specific desires for facilitation (personalization) they’d be valuable to us.

We’re already seeing some OTT services that nibble on these concepts. An example is a security system that uses the phone to manage a “geofence” that sets a boundary for our being home or away. This is personalized to our notion of what “home” means, and it’s contextual because the behavior of the security system will depend on whether we are “at home”.
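To make the mechanics concrete, here’s a minimal sketch of the kind of geofence decision such a system might make. The coordinates, radius, and function names are my own illustration, not anyone’s actual product code; a real system would also need consent handling, multiple residents, and some hysteresis so a phone at the edge of the fence doesn’t flap between states.

```python
# Minimal geofence sketch (illustrative only): decide "home" vs. "away"
# from a phone's reported position and a per-user home location and radius.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean earth radius in meters

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two lat/lon points, in meters."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def geofence_state(position, home, radius_m=150):
    """Return 'home' if the device is inside the user's personal geofence."""
    lat, lon = position
    hlat, hlon = home
    return "home" if distance_m(lat, lon, hlat, hlon) <= radius_m else "away"

# Example: the security system could arm itself when everyone is 'away'.
print(geofence_state((40.7130, -74.0062), home=(40.7128, -74.0060)))  # -> 'home'
```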

Our geofence also opens an important element of the concept of contextualization, which is that one of the pillars that support it is our location. If we could establish the location of a service user, and could exploit knowledge of the location without creating a privacy/security risk, we could offer contextual support for other activities.

Which raises the second pillar of support—goal behavior. We are in Location X trying to do Thing Y. Having information on what both X and Y are would allow a service to offer us support in reaching our goal. Are we driving or walking, looking for a place to find something specific, looking for a specific place, or maybe a specific person? The answer to this goal-related question, combined with our location, gives a service provider an opportunity to do helpful things, ranging from direct goal support to perhaps warning us about potential risks we might encounter.

This, I submit, is the next level of service. However, getting to it is going to require a significant investment in technology to locate us precisely, keep a record of goals and personal policies, and match these factors up in a way that can’t be exploited by a malefactor to track our movements or put us at risk otherwise. Operators are accustomed to making investments with a limited ROI, and they’re already regulated and could be made to protect our privacy and security better than players who are not. If operators could create a set of APIs based on personalization and contextualization, they could offer these to OTTs who would then frame retail offerings, and they’d get the wholesale revenues from the success of those offerings.
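To give a sense of what a set of wholesale personalization/contextualization APIs might look like, here’s a hedged sketch; every class, field, and scope name below is hypothetical, invented for illustration rather than drawn from any operator’s actual offering. The design point I want to highlight is that the OTT partner receives a consent-scoped, short-lived inference, never the raw identity or location data, which is what would let operators trade on their regulated, trusted position.

```python
# Hypothetical sketch of wholesale "facilitation" APIs an operator might expose
# to OTT partners; all names and fields here are invented for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContextToken:
    """Opaque, consent-scoped handle the OTT uses instead of raw identity/location."""
    subscriber_ref: str      # pseudonymous reference, never the subscriber's number
    scope: tuple             # e.g. ("coarse-location", "mobility-state")
    expires_at: float        # short-lived, to limit tracking risk

@dataclass
class ContextSnapshot:
    mobility_state: str              # "stationary" | "walking" | "driving"
    coarse_location: Optional[str]   # e.g. a geohash prefix, not exact coordinates
    declared_goal: Optional[str]     # personalization: what the user said they want

class ContextualizationAPI:
    """Wholesale-facing facade: the operator keeps the raw data, sells the inference."""
    def get_context(self, token: ContextToken) -> ContextSnapshot:
        # A real implementation would check consent, scope, and token expiry
        # before returning anything; here we just return a canned answer.
        return ContextSnapshot("walking", "dr5ru", "find a pharmacy")
```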

This is what I think operators need to be doing, but having a service goal like this isn’t going to answer the technology and business questions that still need to be addressed. I’ll do that in follow-on blogs.

Google’s High-Flying Aalyria Might be a Game-Changer

Google has always been a technology byword, the source of things like Kubernetes. One of its ideas, a project called “Loon”, was supposed to provide Internet services through a network of balloons, a notion that sure sounds outlandish and didn’t get far. Now, some of the Loon concepts are reappearing in a Google spinout, Aalyria. The details on the new company, and on the technology, are a bit sparse at this point, but there’s enough to ask some intriguing questions, and maybe even provide some speculative answers.

First, let’s say that Loon was more sensible than it might sound. Google proposed to establish a mesh network of balloon-based cells that would be steered into position by changing elevation to capture winds that would take each balloon where it needed to go. Once there, each balloon would have a ground link and a link to other balloons, creating that mesh network I mentioned. The mesh would be the cell network, requiring no towers and having better coverage than a terrestrial tower would have. Google did a lot of work on all of the pieces of this system, and filed a number of patents.

Aalyria dispenses with balloons, but builds on two specific pieces of the stuff developed for the original project—the laser free-space optical links and the software that organizes the mesh. To quote the website, “Aalyria brings together two technologies originally developed at Alphabet as part of its wireless connectivity efforts: atmospheric laser communications technology and a software platform for orchestrating networks across land, sea, air, space and beyond.”

The software piece of this is called “Spacetime”, and it’s (apparently) based on the mesh software that was done for Loon. It provides for antenna link scheduling, traffic routing, and spectrum management for the system, including (potentially) “ground stations, aircraft, satellites, ships, and urban meshes.” The goal of this orchestration is (apparently) the creation of a connectivity mesh that (potentially) envelops all current network technology and creates multiple pathways to a given user. All significant network transport options would be covered and incorporated into the Aalyria mesh, which is “designed for interoperability with legacy, hybrid space, 5G NTN and FutureG network architectures.”
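Spacetime’s internals aren’t public, so what follows is speculation on my part, but any orchestrator of a multi-transport mesh has to solve something like the problem sketched below: pick the best currently usable chain of links between two endpoints from whatever ground, air, and space segments are available at the moment. The topology and latency numbers are invented for illustration.

```python
# Illustration only: choose the lowest-latency chain of usable links across a
# heterogeneous mesh using a plain Dijkstra search. Not Spacetime's algorithm.
import heapq

def best_path(links, src, dst):
    """Dijkstra over a dict {node: [(neighbor, latency_ms), ...]} of usable links."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue
        for nbr, latency in links.get(node, []):
            nd = d + latency
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    path, node = [], dst
    while node in prev:
        path.append(node)
        node = prev[node]
    return [src] + list(reversed(path)), dist.get(dst)

# Hypothetical mesh: a ground station, an aircraft relay, and a satellite.
mesh = {
    "ground-A": [("aircraft-1", 0.7), ("sat-1", 12.0)],
    "aircraft-1": [("ground-B", 0.7)],
    "sat-1": [("ground-B", 12.0)],
}
print(best_path(mesh, "ground-A", "ground-B"))  # aircraft path wins on latency
```

The interesting part in practice isn’t the shortest-path step itself but keeping the link inventory current as satellites, aircraft, and weather move, which is presumably where most of Spacetime’s value would lie.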

The laser piece is called “Tightbeam”, and it supplements the current transmission options to create connectivity where there’s no existing infrastructure. “Tightbeam radically improves satellite communications, Wi-Fi on planes and ships, and cellular connectivity everywhere.” To me, this means that Tightbeam would be available for use on/with planes and ships as well as in ground stations designed to connect with those endpoints, and also available as a backhaul technology for traditional cell sites and as a trunk connection in existing networks.

Tightbeam is highly secure, and that’s likely a part of the reason why Aalyria announced it had a contract with the Defense Innovation Unit to support its Hybrid Space Architecture (HSA) initiative. The military value of this program is obvious: low-latency battlefield networks for warfighter support. Obviously, the same sort of technology could be used to create low-latency commercial network services. While free-space laser technology was ruled out in Loon because of cloud interference, Tightbeam is said to be usable in most or nearly all terrestrial weather conditions.

Tightbeam can track a moving aircraft and uplink to a satellite at rates of up to 1.6 Tbps, far higher capacity than current microwave channels and more difficult to interfere with. The technology has been tested out to over a hundred miles in the atmosphere, and in space it would obviously have far greater range. It’s also fairly portable; you can move a Tightbeam station around, even put it on a truck, I’m told. A combination of a ground-level Tightbeam station, a 5G mobile cell, and a satellite or aircraft that could offer a Tightbeam relay could deliver super-broadband ad hoc pretty much anywhere.

This admittedly meager description is about all we can get officially from Aalyria at this point, but there are some things we can, I think, safely infer from the stuff they provide.

First and foremost, I think we can assume that Aalyria is creating a multi-transport virtual network. I think that it builds a secure IP network over all the broadband options (including Tightbeam) using a virtual-network approach that isolates its traffic from baseline broadband services of each transport option. This new Aalyria network is responsible for picking the connectivity options that, when strung out into a connection, deliver the best performance and latency to the communicating parties. I think that HSA is an example of a virtual network overlay, in fact.

Second, Aalyria’s technology is the most powerful when it includes a Tightbeam-equipped satellite network, because this configuration could be made to deliver broadband to anything on or around the earth. Absent the satellite link, fully realizing the technology would depend on a terrestrial Tightbeam mesh or aircraft, which quickly gets you into the Loon-balloon model that apparently didn’t work. But it could, perhaps, with Tightbeam. You could also potentially see commercial aircraft with multiple Tightbeam stations, linking to the ground and to other aircraft in a floating/flying/swirling mesh.

It’s likely that the early applications of Aalyria’s technology in the commercial space would involve fixed terrestrial locations, such as mountaintops, as repeater points. That would mean that the beam path would be close to the ground for at least part of its travel. Beam power isn’t given, so we can’t know whether the beam could present a risk to a person. I would assume that any Tightbeam unit would have to be high enough to eliminate the risk of having someone or something wander into the beam, though it’s also possible the unit would recognize a loss of path and shut down before damage could occur. I also assume that there would be an aiming/targeting procedure that would get things closely aligned before any higher power was used.

One terrestrial/commercial application for Tightbeam would be the creation of low-latency paths to tie metaverse elements together. If we assume that the range in the atmosphere is a maximum of 100 miles, we could hop across the entire US in roughly 25 links if we could get enough altitude in flatter places like the Midwest. We could mesh population centers along the east and west coasts with a dozen or so links on each coast, too. This level of commitment wouldn’t be beyond Google itself, nor beyond other cloud giants.
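As a back-of-envelope check on that hop count, assuming 100-mile links, a roughly 2,500-mile coast-to-coast span, and speed-of-light propagation (all my assumptions, not Aalyria figures):

```python
# Back-of-envelope check of the cross-US hop count and added propagation delay.
HOP_MILES = 100
US_SPAN_MILES = 2_500
C_MILES_PER_MS = 186.282        # light covers ~186 miles per millisecond in air/vacuum

hops = -(-US_SPAN_MILES // HOP_MILES)            # ceiling division -> 25 links
propagation_ms = US_SPAN_MILES / C_MILES_PER_MS  # ~13 ms one way, before switching delay
print(hops, round(propagation_ms, 1))
```

Twenty-five links and something under 15 ms of one-way propagation delay, before switching and queuing at each relay, would be a very attractive latency budget for the kind of metaverse meshing I’m describing.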

I’ve talked a lot about Tightbeam and not much about Spacetime, but if we assume I’m correct that the Aalyria system creates a virtual overlay across all the transport/transmission options and optimizes it dynamically, then the same model could be applied in theory to manage service overlays on current network infrastructure. Would it be possible to actually do routing of traffic this way, too? I think that it’s unlikely that the underlying IP network community we call “the Internet” would change over, but new services and the cloud? It could happen.

In short, this new spinout could prove to be a very interesting development for network technology. I’m going to watch it carefully, and I’m sure others will too.

https://www.cnbc.com/2022/09/12/google-spins-out-secret-hi-speed-telecom-project-called-aalyria.html
https://x.company/projects/loon/

Should We Plan to Bypass Cloud Management for Virtual Functions?

The idea of stepping around or beyond the cloud seems almost heretical these days, but the fact is that if we consider “the cloud” to mean cloud platforms and infrastructure, we already step around/beyond it every day. The question is the best way to do that for network services.

The technology that deploys, redeploys, scales, and otherwise operationalizes applications is part of the cloud, but the way that work moves within an application isn’t. Applications may have run-time parameters set by Kubernetes, but the telemetry associated with internal application behavior isn’t part of cloud monitoring. This suggests that our middle virtual-network layer has things going on that are specialized to the application, or the service. That’s even more likely given that virtual network functions (VNFs) are hosted abstractions for physical devices, and those devices have their own management.

We have networks made up of appliances or devices today, and we manage those elements with specialized management systems. There are international standards, IETF standards, and so forth associated with this process, and both network operators and enterprises have network operations staff that depend on their training in these standards to do their jobs and keep networks running. If we imagine a network of a hundred “real” routers and one router virtual function, would we expect that singular VNF to be managed differently in its routing behavior than the other real devices? We might imagine it, but the operations people would surely push back strongly.

In the TMF and in the ETSI NFV ISG, standards gurus elected to retain the traditional management model associated with networks, the latter doing so even though it meant that for VNFs, there would be two management dimensions, one dealing with the VNFs as software committed to a resource pool, and the other dealing with VNF participation in a “network”. In some sense, this mimics the cloud’s operational model, because as I’ve already noted, cloud tools focus on deployment and scaling but not on workflow among application components. The question is whether there’s another “sense” here, one where the separation of management creates a potential risk to service stability.

Let’s assume a network of a hundred routers. Each of these routers has a management API that configures it and manages its software, interfaces, and so forth. Each also participates in a peer control-plane dialog that defines best paths between endpoints—routes. If a router fails, or a trunk connecting two routers fails, the combination of these two operational models responds, and the service is restored as long as an alternative route can be found.

Now let’s replace some of those routers with VNFs. We still have, with the VNF software, the same management APIs and we also have the same peer control-plane interactions. The conditions of trunk or VNF failure could therefore be handled in the same way as we’d have handled them with a hundred real routers. However, our VNFs are hosted. They can be scaled and redeployed in ways that real routers cannot be. If a VNF fails, the best option might not be to reroute traffic at all, but to simply redeploy the VNF. That’s probably a broadly accepted benefit of VNFs, in fact.

The question is how the traditional router management processes know about that option, and how VNF deployment knows about whatever the router management processes are doing. Could we have a situation where two management systems work to respond to the same condition? Since the two have different inventories of possible remedies, what’s the chance that their responses would cooperate rather than end up truly dueling?
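Here’s a deliberately simplified sketch of that risk, with invented class names; the point is only that nothing in the current picture arbitrates between the two reactions to the same event.

```python
# Conceptual sketch (not any real product) of the "dueling responses" risk:
# the same VNF failure event reaches two managers that each know a different
# remedy, and neither knows what the other is doing.
class RouterOpsManager:
    """Device-centric view: restore connectivity by rerouting around the failure."""
    def on_failure(self, vnf_id):
        print(f"[router-ops] {vnf_id} down -> withdrawing routes, recomputing paths")

class CloudOpsManager:
    """Resource-pool view: restore the function by redeploying the instance."""
    def on_failure(self, vnf_id):
        print(f"[cloud-ops] {vnf_id} down -> redeploying instance on a new host")

def raise_failure(vnf_id, managers):
    # Today there is no arbiter here: both act on the same event independently,
    # so traffic may be rerouted away from a VNF that is about to come back.
    for m in managers:
        m.on_failure(vnf_id)

raise_failure("vnf-router-7", [RouterOpsManager(), CloudOpsManager()])
```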

We can also look at a more mundane situation: a VNF fails and has to be redeployed and “rejoin” the network, likely after the router-network operations model has already applied whatever remedies it has to sustain connectivity. We can assume that the VNF software and hardware configurations would be properly set by the cloud tools, but what about those parameters? The peer-to-peer piece of the router operations model would result in adapting routing tables, but how about configuring the device? The router operations tools routinely use the VNF’s management API, but a cloud tool wouldn’t intervene at that level.
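A sketch of that missing re-parameterization step might look like the following. The RESTCONF-style endpoint and payload are placeholders for whatever management interface a given VNF actually exposes; the open question is which management model would own making this call once the cloud tools report the new instance is running.

```python
# Sketch of the missing re-parameterization step after a cloud redeploy.
# The URL and payload are hypothetical stand-ins for the VNF's real
# management interface; a production version would verify certificates,
# authenticate, and handle errors.
import json
import urllib.request

def reconfigure_vnf(mgmt_ip: str, device_config: dict) -> None:
    """Push the device-level configuration that the cloud tools know nothing about."""
    req = urllib.request.Request(
        url=f"https://{mgmt_ip}/restconf/data/device-config",   # hypothetical endpoint
        data=json.dumps(device_config).encode(),
        headers={"Content-Type": "application/yang-data+json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()

# After the orchestrator reports the new instance is running, something must
# still call reconfigure_vnf() with the saved router parameters; today,
# neither management model clearly owns that call.
```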

The questions that all this raises are 1) whether a service and an application are truly different and need a different management model, 2) whether the true benefits of the cloud can be applied to services without unifying the management model, and 3) just how far “unification” of the model would have to go to be effective.

I think the first question can be answered “Yes!” for no reason beyond the fact that current networks and network operations practices are based on the router-operations model, which the cloud obviously does not embrace. It would be a major task to substitute a different management model for current router operations.

The second question is a bit more problematic because there are two issues embedded in it. The first is the broad question of whether we can harness true cloud benefits for VNFs in a two-model world. Since replacing a VNF instance is surely easier than replacing a physical router, I think we can assume that that first piece can be answered with at least a qualified “Yes.” However, the re-parameterization issue raises a question of coordination between the two management models. Can the router management model know when to set up a new instance?

Question three is obviously the key question, and the most difficult, because answering it effectively would require that we assess how we’d achieve a unification to determine how far we could go. I pointed out in the past that there are two basic ways we could unify our models.

The first way would be to establish a higher-layer service management framework to which both cloud management and router management would be subordinate. In this approach, any management system would report a fault to something above, which would then have control of both management systems to remedy the fault, or report it up the line for higher-level action. This is my favorite approach, one I think we’ll eventually have to take, but also one that nobody seems inclined to support at this point in time.
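As a toy illustration of that higher layer, assume the cloud and router management systems each expose a simple remediation operation (both assumptions on my part); the service manager’s only job is to decide which one acts, or to escalate.

```python
# Toy sketch of the "higher-layer" approach: both management systems report
# faults upward and a service-level arbiter decides which remedy to invoke.
# Class and method names are illustrative, not any standard.
class CloudMgmt:
    def redeploy(self, element_id) -> bool:
        print(f"[cloud] redeploying {element_id}")
        return True            # assume the redeploy succeeded, for the sketch

class RouterMgmt:
    def reroute_around(self, element_id) -> None:
        print(f"[router] rerouting around {element_id}")

class ServiceManager:
    def __init__(self, cloud_mgmt, router_mgmt):
        self.cloud_mgmt, self.router_mgmt = cloud_mgmt, router_mgmt

    def report_fault(self, element_id, kind):
        if kind == "vnf-instance-failure":
            # Prefer redeployment; fall back to rerouting only if it fails.
            if not self.cloud_mgmt.redeploy(element_id):
                self.router_mgmt.reroute_around(element_id)
        elif kind == "trunk-failure":
            self.router_mgmt.reroute_around(element_id)
        else:
            print(f"escalating {kind} on {element_id} to higher-level operations")

ServiceManager(CloudMgmt(), RouterMgmt()).report_fault("vnf-router-7", "vnf-instance-failure")
```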

The second way is to subordinate one management model to the other. We could either provide an API whereby router management could communicate with cloud management to scale or redeploy, or provide an API where cloud management could communicate with router management to configure a VNF instance. It seems likely that the first option would be difficult to implement because so many different management tools would have to be changed to include reference to that new cloud API, so that leaves the second.

Obviously it would be possible for cloud orchestration to include some sort of “stub” function that could parameterize a router VNF. In a sense, it would be little more than an extension to the concept of containers that Kubernetes (for example) already recognizes. A container holds what’s needed to host an application component. Adding to what’s needed in some generalized way isn’t inappropriate, and in fact might even be helpful down the line if we identify other applications (other than VNFs) that need special setup. The Nephio project may be heading in this direction, though it’s too early to say for sure, or explore how it might work.
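As an illustration of how such a “stub” might behave, and not a description of Nephio or of any standard, imagine an init- or sidecar-style step that waits for the freshly deployed VNF’s management interface to come up and then applies its stored device parameters; the config file path and the apply helper below are assumptions of mine.

```python
# Illustration of a "stub" that could run as an init/sidecar step alongside a
# router VNF container: wait until the new instance answers on its management
# port, then apply the stored device parameters. The config source (a mounted
# file) and the apply call are assumptions for the sketch.
import json
import socket
import time

def wait_for_mgmt(host: str, port: int = 830, timeout_s: int = 120) -> bool:
    """Poll the VNF's management port until it accepts connections or we time out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)
    return False

def apply_device_config(device_config: dict) -> None:
    # Placeholder for a NETCONF/RESTCONF push through the VNF's management API.
    print("applying", len(device_config), "parameters to", device_config["mgmt_address"])

def run_stub(config_path: str = "/etc/vnf/device-config.json") -> None:
    with open(config_path) as f:
        device_config = json.load(f)
    if wait_for_mgmt(device_config["mgmt_address"]):
        apply_device_config(device_config)
```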

This isn’t the end of our issues, though. We could, for example, ask what happens in the world of SDN. If our hundred-router network was replaced by a hundred-node SDN network with a central route controller, we have a single point where routes and SDN node configuration would likely be stored. This would mean that the SDN controller could be an easy place to harmonize management models, which would mean that SDN-router management could easily take the superior role in the management hierarchy for VNFs. That would make the once-difficult option for harmonization easy.

We may also be missing issues from another evolution, which is the evolution mobile networks have already created in defining a multi-layer structure. We have a user plane and a control plane in 5G, the former of which is the IP network, which is itself a data plane and a control plane. Generally, as we climb away from the data plane we see our “VNFs” looking more and more like traditional cloud applications, and we see the relationship between planes being formalized. That seems to create an SDN-like situation that could again favor letting traditional device-centric management take the lead role. Are multi-planar structures inevitable? If so, then should we be accepting the challenges of letting device management take the lead now, since it could be inevitable down the line?

We’re not hearing much about this issue, or similar issues, because our initiatives to advance virtual elements in a network have been perhaps a bit too contained. Management is usually declared out of scope, and that disguises management inconsistencies that can arise. It would be nice to think about our network evolution holistically, because it’s evolving that way whether we like it or not.