Acquisition and Retention: The Real Opex Challenge

Network operators typically spend almost four times as much on customer acquisition and retention as they do on traditional network operations. That’s up almost a third over a five-year period, and it has major implications for the way we think about managing operations costs, since customer A&R is usually lumped into operations expenses. Why is this happening, and can anything be done about it?

A&R costs have grown much faster than traditional opex in part because traditional opex is now declining. Operators have been working for two decades to drive human costs out of network operations. They’ve reduced craft provisioning and installation costs, they’ve reduced truck rolls on problems, and they’ve charged more costs back to customers. All of this has worked because the Internet is a kind of support wild west: broadband users are accustomed to dealing with chatbots and other self-service tools in online services, and they’ve learned to tolerate them in their network services too.

I can personally remember a time when you couldn’t make a call between exchanges without operator assistance. I sat with a woman on a flight once, and she told me her grandmother (who was still living at the time) had placed the first transcontinental call ever made in the US. I once ordered a phone installation in a travel trailer slip, and they ran a line for a mile and a half to do the job, which earned them about ten bucks a month in revenue. All of these show how much human cost has been wrung out.

On the acquisition and retention side, there’s no question that there’s a lot more wireless competition. Not only are there multiple “real” operators who field their own infrastructure for services, there are MVNOs who target specific market niches, and of course there’s the continued prepay/postpay debate, and an increased number of mobile services that don’t fit neatly in either of those categories. Competition means differentiation, and differentiation means feature differentiation or price commoditization, and this is the core problem with acquisition and retention charges.

A network service is an almost-worst-case situation from a feature differentiation perspective. The best you can hope for as a provider is that the customer will never really think about the service at all. If they notice anything, it’s almost certainly a bad thing, and so you’re in trouble just by being visible. Operators have had at least a hazy understanding of this for a decade, recognizing that “communications” wasn’t the goal any longer. It was experience delivery that mattered, and they were only the plastic pipe doing the delivering. They called it “disintermediation”, but that implies they were ever “intermediated”, and they were not. There was a mission transformation that came along with the Internet, and operators just didn’t see it. Many still don’t.

What do mobile or wireline broadband users see? The experience, as I’ve said, but also the local device through which they access those experiences. In both mobile and wireline networking, that local device is often a smartphone, though wireline broadband’s biggest bandwidth consumer is a flatscreen TV. Among singles, smartphones are close to dominating even wireline broadband usage. The point is that if the service doesn’t have features, then the feature gateway is the device, and that’s had a major impact on acquisition and retention costs.

My first mobile phone was as large as a thick hardcover book and cost me about a thousand dollars. All I could do with it was (at least occasionally) make a phone call. Today’s smartphone is way smaller than a paperback and you can get them for half that price. Not only that, the smartphone has actually replaced the paperback for many, which illustrates how important device features are in realizing experience delivery. This feature dependence on devices has led operators to try to differentiate themselves as much or more with smartphone giveaways and annual refresh plans as they do with service quality or even price. There are often free phones, payment plans, and plans that guarantee a new phone annually. Needless to say, these plans drive up customer acquisition and retention costs, but they do have the advantage of creating loyalty, and they can also be useful when something like 5G comes along, because they get suitable handsets in the field faster. The big point here is that operators have almost universally decided that they need to promote handsets and not their services. You can see that reflected in their TV advertising.

Given this shift to handset-differentiation, it’s unlikely that operators will gain as much from further optimization of network operations costs as they’ll lose (concurrently, but not as an effect of network operations optimization) in customer acquisition and retention overruns. I’m getting the sense that what operators are now trying to do is to hold the line in operations costs, reduce errors and problems that generate outages that complicate acquisition and retention, and efficiently manage any new infrastructure they may deploy.

Service quality or QoE is seen by some operators as a tool in customer retention, but there isn’t much direct evidence to support this. The problem is that while users will flee a provider who has regular issues with availability and quality, they don’t seem to be able to recognize differences between services that aren’t failing outright. Verizon, who used to run “can you hear me now” commercials when mobile services were problematic in many areas, stopped the practice when most operators had service in most places. There are many factors related to QoE for online experiences and the ISP is only one of them. Thus, few operators have justified increased spending on operations to improve acquisition and retention.

Put into this framework, the “cooperative features” model that AT&T has said it’s committed to makes sense. If operators could create service features that promoted OTT partnerships, then those partnerships could generate retail services that would differentiate the partnering operators, to the extent that competing operators didn’t do the same thing. Of course, the value of any cooperative features would depend in part on whether they could be created, shared, and exploited efficiently. Any issues with service quality at the lower level would be inherited by the OTT partners, who would then push back or perhaps seek options elsewhere. Operations cost issues could push up wholesale prices to OTTs, retail prices to users, and tamp down demand for the services throughout the stack.

OTT cooperation is easy to say, but in order to do it the operators would have to spend some effort thinking about the direction service evolution might take. OTTs are unlikely to come to operators to talk about this, for the logical reason that such discussions would then likely get to competitors. Operators who had an idea about future service directions, particularly ideas they could help realize, would have everything to gain by touting these ideas to the skies, because a highly distributed and competitive community of providers would encourage them all to take advantage of any operator-provided features that might improve service quality and reduce time to market. It’s going to be interesting to see how the operator community in general, and AT&T in particular, move forward on this.

Implementation Considerations for the Three VNF Categories

Microservices, cloud-native practices, and service mesh technologies are all popular in cloud development. Does that mean they’re essential in the development of telco applications? Should VNFs be cloud-native microservices, connected via a service mesh? I pointed out in THIS blog that there are very different missions associated with virtual functions, and that the mission determines the best architecture. How would the three VNF categories I discussed map to cloud-native practices? That’s our topic for today.

Cloud-native design isn’t tightly defined (nor is much else these days), but most would agree that it involves building applications from an agile, highly interconnected set of microservices. A subtext for this is that the application you’re building is logically divided into “pieces of interactivity”, meaning little nubbins of functionality that relate to the steps a user would take in running the application. This logical structure means that the components of an application are typically invoked through a GUI, with human think time separating the steps.

A virtual function is different from an application, but how different it is depends on its mission, which maps to the three categories of VNF I outlined in the blog referenced above. Let’s look at each of them to see what we can find.

The first category of VNFs is made up of those involved in direct data-plane connectivity, the flow VNFs category. This is the category where the work of the NFV ISG could be most relevant, since the body focused on “universal CPE” missions almost from the first, and CPE by definition is in the data path.

The challenges with cloud-native implementations in this category are first, that most flow VNFs are likely not stateless, and second, that in many cases flow VNFs can’t really be scaled or redeployed in the traditional cloud-native way. That’s because a flow VNF is connected into the network at a specific place, and another instance (scaled or replaced) would have to inherit routing/forwarding state that reflects that place.

These issues are in addition to the broader challenge, which is that a hosted function running on a server may not have the performance needed to support the mission. I’ve advocated a function-hosting model that includes white-box hosting, which would partially resolve this challenge. The remainder of the resolution would have to come through something like the ONF Stratum/P4 approach, which uses driver technology to generalize access to switching chips likely to be critical for the VNFs in the data plane.
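
To make that concrete, here’s a minimal sketch (in Java, with hypothetical names rather than the actual Stratum/P4 APIs) of the kind of driver abstraction I mean: the flow VNF codes to one generic interface, and a per-chip driver maps it onto whatever silicon, or software data plane, happens to be underneath.

```java
// Hypothetical sketch of a driver abstraction for data-plane offload.
// Names (SwitchingDriver, FlowRule, etc.) are illustrative, not the actual
// ONF Stratum/P4 APIs; the point is that a flow VNF codes to one interface
// and a per-chip driver maps it to whatever hardware is present.
import java.util.List;

interface SwitchingDriver {
    void installRule(FlowRule rule);          // push a forwarding/filtering rule downward
    void removeRule(String ruleId);           // withdraw a rule
    List<PortStats> readCounters();           // pull per-port statistics for management
}

record FlowRule(String ruleId, String match, String action, int priority) {}
record PortStats(int port, long rxPackets, long txPackets, long drops) {}

// A flow VNF (say, a firewall) depends only on SwitchingDriver; the same VNF
// can then run against a white-box switching chip, a smartNIC, or a pure
// software data plane, depending on which driver is loaded.
class SoftwareSwitchDriver implements SwitchingDriver {
    private final java.util.Map<String, FlowRule> rules = new java.util.concurrent.ConcurrentHashMap<>();
    public void installRule(FlowRule rule) { rules.put(rule.ruleId(), rule); }
    public void removeRule(String ruleId)  { rules.remove(ruleId); }
    public List<PortStats> readCounters()  { return List.of(new PortStats(0, 0, 0, 0)); }
}
```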

The second VNF category is the control plane appliance VNFs, and this is largely the focus of NFV in IMS and 5G applications. Here we have things like mobility management and signaling, missions that don’t involve direct packet transfers but may control how and where the packets go. This category raises a number of interesting points with regard to cloud-native implementation.

One obvious point is whether a control-plane appliance’s mission could be related to the mission of a component in an IoT application. A typical control-plane exchange appears analogous to an IoT control loop, and if that’s the case then this category of VNF would behave like an element in edge computing. That would mean that whatever procedures were used in building this category of VNFs should also permit building IoT/edge applications. That, in turn, means that this category of VNFs should be the most “cloud-like”.

I think this is the category of VNF that Nephio is intending to focus on. Certainly the majority of control-plane VNFs should be suitable for cloud-native deployment and for container use. Kubernetes, which is what Nephio centers around, is capable of deploying to bare metal with an additional package, and that might also allow it to deploy to a white-box device.

It’s not clear to me that we’d need to support white-box switches, meaning network devices, for this VNF category, but this category of VNF might well benefit from custom chips for AI, image recognition, and the like. Given that, it would be wise to think about how that sort of hardware could be accommodated for this category of VNFs, particularly since edge computing and IoT applications are probably even more likely to benefit from those custom chip types.

The final VNF category is by far the most complex: decomposed function hosting. In early NFV discussions, this category of VNF would have supported the breaking up of physical network functions (devices) into components, which would then be reassembled to create a complete virtual device. Thus, this class of VNF depends on what PNFs are being decomposed and how the process divides features. The same category would represent VNFs designed to be reassembled into higher-level “packages” even if there was never a PNF available to provide the feature set involved.

What divides this from our second category is the lack of specificity with regard to what the functions are and how they relate to each other and to the control/data planes. The challenge is that if you’re going to decompose a PNF or build up a kind of “never-seen-before” appliance by assembling features, you can’t accept too many constraints without devaluing the mission overall. It might be tempting to say that this category is just a refined implementation of the two previous ones, but for one factor.

The NFV push for decomposition of PNFs was intended to set up a model of a virtual device where functional components could be mixed and matched across vendor boundaries. That doesn’t mean that the factors related to implementing our first two categories wouldn’t still apply, but it would mean an additional requirement for a standard API set, a “model function set” from which open composition of appliances could be supported. This isn’t really a cloud-native capability, but it draws on a general development technique where an “interface” is “implemented” (Java terms).
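
To illustrate that interface/implementation point, here’s a tiny Java sketch of one entry in such a “model function set”. The names are mine and purely hypothetical, but they show how two vendors’ decomposed elements could be mixed and matched once they implement the same standard interface.

```java
// A minimal sketch of the "model function set" idea: a standard interface that
// decomposed elements implement so they can be composed across vendor
// boundaries. The interface and method names are hypothetical illustrations,
// not a proposed standard.
import java.util.Map;

interface PolicyFunction {                       // one feature "slot" in a virtual device
    void applyPolicy(Map<String, String> rules); // accept a vendor-neutral rule set
    String status();                             // expose a uniform health/status view
}

// Two independent implementations can now be assembled into the same virtual
// appliance, because the composition logic only sees PolicyFunction.
class VendorAPolicy implements PolicyFunction {
    private Map<String, String> active = Map.of();
    public void applyPolicy(Map<String, String> rules) { active = rules; }
    public String status() { return "vendor-a: " + active.size() + " rules active"; }
}

class VendorBPolicy implements PolicyFunction {
    public void applyPolicy(Map<String, String> rules) { /* translate to vendor-B internals */ }
    public String status() { return "vendor-b: ok"; }
}
```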

The challenge here is that in development, one normally establishes the “interface” first, and if the source of decomposed VNF elements is a prior PNF, you have to not only create an interface retrospectively, you have to figure out how to make current implementations of the PNF decompose in a way that supports it. The idea of decomposition failed in the ISG largely because of these challenges, and that suggests that we might either have to abandon the whole notion of decomposition and assembly of virtual devices, or apply the idea only to things like open-source implementations where standardizing the functional breakdown of a virtual device would be possible.

There’s a wide range of things that these three categories of VNF would be best hosted on, from bare metal and white boxes to containers and cloud-native functions. The third category fits service mesh technology, at least for some functions, the second might fit it in a few cases, and the first is unlikely to be able to use service mesh at all, given that its message flows and interface relationships would almost surely be static. This illustrates just how variable VNFs are, and how challenging it is to try to shoehorn virtual functions into a single model, which is what the NFV ISG tried to do.

Each of the three VNF classifications introduces specific development and hosting constraints, but the good news is that these appear to be addressable in an orderly way. You need to be able to map deployment tools to everything from bare metal and white boxes to VMs and containers. You need to introduce some APIs to guide development and standardize operations. None of these needs presents overwhelming challenges, and some aren’t really even difficult; we’re part of the way there already. The only question is whether there’s a mechanism and a will. Projects like Nephio may answer both questions by demonstrating progress and support.

Business Spending on Tech in 2022: Hopeful

There’s some good news for the IT and networking sectors from Wall Street, but it’s not unqualified good news. Generally, 2022 numbers are not only looking better than 2021’s (no great feat) but more significantly, they’re looking better than they were earlier this year. However, there were still a significant number of enterprises who expected a decline in spending versus last year, and that was true in every category. Storage was the only product category that managed to get an expectation of growth versus 2021 from a majority of CIOs.

IBM’s surprise beat of expectations is likely related to the same shift in attitude. As I’ve noted in past blogs, IBM has strong strategic influence in its major accounts, but the biggest factor in its growth is clearly Red Hat’s contribution. IBM is indeed becoming Red Hat in a strategic sense; it’s Red Hat products that represent the biggest upside for IBM. In a tactical sense, it’s also IBM’s account control in major enterprises and verticals.

Software is doing way better in terms of expected spending growth than hardware. The only category where hardware spending upside expectations beat downside expectations is storage. In software, every category of software shows a stronger upside potential than downside potential in terms of spending.

If you look at buyer expectations from 2020 onward, you can see a steady uptick in expected spending growth in all spaces except communications, and so it seems that networking isn’t necessarily going to enjoy the same upside this year as the rest of the markets. The view on the Street, consistent with what I hear myself from enterprises, is that there’s still a COVID influence on network spending. The Street surveys don’t show the details, but what enterprises tell me is that they’re still building web/cloud portals for their applications, and still supporting more work-from-home than before the pandemic. However, this trend is slacking off.

There’s a general view that public cloud spending will grow more slowly this year. The number of enterprises who expect an increase in spending is roughly half what it was last year. However, the number who expect to spend more on the data center components related to cloud usage is up roughly 50%. This fits with what I hear from enterprises, which is that it’s time now for enterprises to do more in the data center, partly because they held off during the pandemic and partly because there are some things they know they need to do to optimize the relationship between data center and cloud. The Street tends to characterize this as “private cloud”, but that term has no solid meaning to enterprises, who think more in terms of virtualization and “back-end” transformations.

There’s a common technology piece to this, which is BaaS, meaning “back-end as a service”. Here, of course, the back-end reference relates to the fact that the public cloud has been growing largely because enterprises have built cloud front-ends to legacy applications, making them the back-end piece of the new software model. However, there’s growing interest in beefing up the front-end element by pushing some back-end features forward. So far, that’s focused on validation and optimization components that require some database access, and this seems to be a big driver behind continued cloud spending, and also a driver of a redesign of some data center elements.

Database software and storage hardware are linked to this, I think, and so is the apparent increased focus on analytics tools. One of the undercurrents of the pandemic has been a shift of focus from sales to marketing overall, driven by the need to support customers who didn’t want to go to stores. Yes, this means more money for online retail, but it also means more online focus for all of retail, and that means understanding more about things like ad targeting versus older considerations like shelf placement. Browsing online sites offers consumers a way to research remotely, and statistics on what’s happening online thus could offer retailers a window into customer thinking.

The notion of shifting work makes it hard to decode Street data on the details of software and hardware budgets, because the Street doesn’t generally understand technology planning details or reflect them in surveys of buyers. For example, Street research talks about “private cloud”, which enterprises tend not to talk about in my own interactions. They also talk about “repatriation” of cloud apps back to the private cloud, meaning the data center, when in fact there isn’t really much net movement out of the cloud at all. A lot of the stuff the Street sees as moving back isn’t what most would consider “applications” at all; it’s things like virtual desktop, databases and storage, and security tools.

Repatriation is a marketing myth, IMHO. It’s not that users are spending less, or even trying to spend less, on the cloud, but that they’re largely done with the front-end-building process, and BaaS has only started to impact cloud spending. I think BaaS will pick up in the second half of 2022 and create another uptick in cloud spending growth in 2023.

The Street sees Microsoft as being most likely to see more cloud spending than the other major providers, which is consistent with what I hear and also with the analysis of the prior paragraph. Microsoft has always been the top-rated cloud player in understanding hybrid cloud applications, and that not only continues but seems to be accelerating slightly in 2022. However, Amazon began 2022 as a slight leader over Microsoft in terms of influence on buyer cloud policy, according to my own interactions.

Security spending is generally up, which of course benefits security products in general, and those offered by network equipment vendors in particular. The primary network vendors like Cisco and Juniper seem to be gaining traction in security spending relative to IT software players, and even relative to more specialized security-network vendors. Enterprises tell me that network vendors are doing a better job of proactive security marketing, which is facilitating their sales initiatives. Security as a driver for spending is gaining traction in the IT integrator space, rivaling cost reduction.

Spending on other network equipment is more complicated; generally it appears that Street research is forecasting roughly flat to slightly down trending in routing and flat to slightly up in data center switching. That’s fairly consistent with what I hear from enterprises. There is a general view that last year saw a relief rally in networking and also in computer systems, as companies caught up with projects held back by virus uncertainty, and that this is largely over in 2022. The Street isn’t seeing much in the way of specific trends in network device spending other than in the data center, but I’m hearing more about SD-WAN and virtual networking.

Overall, it appears that enterprise tech spending will be healthy in 2022, so broad-based tech companies will probably do a bit better than last year. It also appears as though there are forces creating market shifts within the sector overall, and since the Street isn’t saying much about that, I have to assume that vendors aren’t either. Does that mean they aren’t planning for those shifts? If so, there is both a risk and an opportunity in the offing for this year, and next year as well.

Who Has the Best 5G Strategy, Verizon or AT&T?

Verizon and AT&T have been locked in some form of competitive embrace (with ownership of Bell companies playing a role) since the Bell System broke up. Verizon has a major advantage in demand density, the ability of a market to pay back on network infrastructure, and AT&T has been perhaps the most radical Tier One in trying to make up for its demand density deficit. Now there are some signs that Verizon may have taken its advantage too much to heart. The problem may be that so much revenue focus has shifted to mobile networks, where demand density is a lot more complicated.

Verizon’s done well in wireline broadband, including FWA, which is the area where demand density is the most important. For mobile networks, the per-tower range is larger and there’s a fair amount of tower-sharing that goes on, which makes it harder to relate demand density to return on infrastructure. Then you’ve got MVNO relationships…you get the picture. It’s not a surprise that some articles on the two Tier Ones have focused on the wireless space, nor is it surprising that some of the stories seem to be missing some key points.

It’s the wireless space, relative to AT&T and Verizon, that some analysts and tech authors are questioning. In particular, some question the C-Band investment by Verizon, but others are questioning AT&T’s delay in making C-Band 5G available. So is C-Band good or bad?

The biggest question that the Street has about Verizon’s strategy is its C-Band 5G deployment. Nobody disputes that Verizon’s move boosted 5G speeds significantly, but there are a lot of questions circulating on whether that translates into any incremental revenue. My readers know that I’ve been a 5G revenue-gain skeptic from the first, simply because mobile phones can’t usually exploit faster connections than video streaming requires. Verizon’s loss of mobile customers (36,000 net, and almost 300,000 consumer mobile customers, saved by business adds of over a quarter-million) when AT&T gained over half a million sure seems like a bad sign. However, it’s better than last year’s number.

Verizon has always commanded a higher price for mobile services, and it’s pretty doubtful that customers would leave Verizon just because they couldn’t see any difference with C-Band 5G performance. It’s more likely that wireless users are steadily more price-conscious and were leaving based on price. C-Band isn’t available over anything like Verizon’s total footprint and it wasn’t available at all until partway through the quarter, so it’s probably not driving decisions much at this point.

The bigger question is how much upside there is in mobile services, even considering 5G and C-Band. Verizon touted the high rate of 5G adoption at this point in the cycle, versus that of LTE, but when LTE was introduced it was rare for operators to have free phones and regular replacements, so handset inertia likely held LTE back. If 5G isn’t going to boost subscribers (which so far Verizon admits it isn’t; they say market shares are stable and so does rival AT&T) and if users won’t pay more for it, then how does incremental revenue get generated?

Verizon, like most operators contemplating 5G, hoped at first that IoT would be the killer revenue booster. Imagine every family’s “things” adding to the number of wireless subscriptions; the CFO’s heart likely melted at the mere thought. Thought was about all that came along, though, because of course nobody really wanted to pay to get their things networked. Use WiFi for free, right?

Verizon mentioned multiaccess edge computing (MEC) too, but we really don’t have a clear picture of how that would generate incremental revenue. Like IoT, MEC is one of those things that you can sort-of-make-a-logical-case-for, at least at the PR level, but like IoT there are a lot of moving parts in an MEC application and you only get revenue if you can get them all to march in harmony.

The most obvious point to be made here is that 5G isn’t transforming the revenue picture for the telcos. That doesn’t mean it won’t continue to deploy—it’s the successor wireless technology after all. Verizon’s CEO suggested that the payoff for 5G might be delayed for five or even ten years. Even that seemingly pessimistic statement is a bit of gloss; what exactly is a payoff? The challenge here is that infrastructure investment by operators often doesn’t meet even their very forgiving internal rate of return constraints. Simple growth in usage and customers, competitive pressure, and the need to refresh technology at least every five years, combine to force investment as a cost of doing business. That story isn’t very appealing to the Street, though, and so it’s no surprise it’s not told very often.

Verizon made a mistake here, I think. Yes, mobile services have been the darling of the CFO for decades, but that doesn’t mean that you can plan their evolution like you’d plan the evolution of other services. Faster demonstrably means something in wireline, but it means way less when everything the user does is compressed onto a smartphone screen and constrained by mobile behavior. They’ve also made a mistake, one that almost every operator has shared, in simply touting IoT or edge computing as opportunities, with no regard for the fact that they’re not in a position to assure they’ll be realized.

It would be possible for Verizon, or any Tier One, to create the ecosystem needed to realize IoT or edge computing. It would have been possible all along, and in fact all the signs that such ecosystems were needed were there well over a decade ago. That’s the biggest reason to be cautious about what’s possible now; what makes the present so different from the past, other than that the passage of time likely makes such ecosystem creation more difficult? Verizon wasted a lot of time if they were expecting to realize actual revenue gains out of either IoT or the edge.

AT&T has issues, too. Most on the Street rate them lower than Verizon, and in terms of fundamental wireline services, they should be. But AT&T’s weakness is proving a strength because it’s forced them to confront truths about service evolution that Verizon has yet to confront. Don’t look at things like net mobile adds to declare AT&T a winner, look at their commitment to truth in future services. Don’t look at their failure to push super C-Band speeds in 5G as a weakness, either. Faster is better only when users value it enough to pay for it.

Thoughts on a PaaS API Set for NFV and Edge Computing

How would you build an optimum service layer and PaaS for Nephio? If I think the project needs to face those tasks, then I should be prepared to comment on how it could be done. The starting point for that, as always, is at the top of the process, which is the notion of a model from which a service could be assembled.

Back in about 2007, when I was asked by some operators to do a quick PoC to demonstrate that the model of creating services from objects (the TMF Service Delivery Framework or SDF activity) would work, I put together a Java application set to do the job. The overall model I used said that there were a variety of “Service Factories”, each of which would publish the “Service Order Templates” it could fill. These templates were elements of Java code (because building a decomposition software set to process a data model would have taken too long). Any SOT could be sent to a Service Factory for processing. Once the order was processed, there was an instance created as a Java application, and that application could be bound to any compatible Service Factory when an event in the service lifecycle needed to be handled.
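
In compressed form, and with reconstructed rather than original names, the shape of that PoC looked something like the sketch below; treat it as an illustration of the relationships, not the actual code.

```java
// A compressed sketch of the Service Factory / Service Order Template model
// described above. The class and method names are reconstructions for
// illustration, not the original PoC code.
import java.util.List;

interface ServiceFactory {
    List<ServiceOrderTemplate> publishedTemplates();          // templates this factory can fill
    ServiceInstance process(ServiceOrderTemplate template);   // turn an order into a live instance
    void handleEvent(ServiceInstance instance, String event); // any compatible factory can be bound
}

record ServiceOrderTemplate(String name, List<String> parameters) {}
record ServiceInstance(String orderId, String templateName, String state) {}
```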

This is what I still think service order (and intent-modeled) processing should look like. Every modeled element is represented by a data-model element that steers events to distributed processes which, because all their inputs and outputs flow into and out of the data model, can be stateless microservices. The task of steering events through a data-model-resident state/event table of some sort could also be instantiated on demand in whatever location was best, so the entire lifecycle management process would be model-driven, stateless, and microservice-compatible.
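
A minimal sketch of that dispatch idea, with illustrative names, might look like this: the state/event table is just data in the model, and the handlers it points to are stateless, reading and writing only the model element they’re handed.

```java
// A minimal sketch of model-driven, stateless lifecycle handling: the
// state/event table lives in the data model, and the dispatcher simply looks
// up which process to run. All names here are illustrative.
import java.util.Map;
import java.util.function.Function;

class ModelElement {
    String state = "ordered";
    // state -> (event -> handler); handlers are stateless: they read/write only the model element
    Map<String, Map<String, Function<ModelElement, String>>> stateEventTable = Map.of();

    void dispatch(String event) {
        Function<ModelElement, String> handler =
            stateEventTable.getOrDefault(state, Map.of()).get(event);
        if (handler != null) {
            state = handler.apply(this);   // handler returns the next state
        }
    }
}
```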

It’s important to note that this data model is, in effect, a digital twin of the actual VNFs and cloud resources, and not the actual VNFs and resources. Thus, there is no constraint that the VNFs be stateless or that the cloud provide containers or VMs or bare metal or even white-box switches. The service modeling process is thus a kind of metaverse-of-things.

The approach I’m describing has the advantage of being fully distributed and scalable, which separates it from the monolithic approach taken by the two ETSI projects. Even if you componentize something that’s designed to be a serial A-then-B-then-C process you still have a serial process. It would be possible to componentize ONAP to make it scalable, but I think the interface specifications would tend to limit how much you could actually do. The same is true with NFV; when you have to harmonize down to a single interface, you’ve created a scalability and resilience issue.

The PaaS layer I’ve talked about would define both the way that the state/event processes were built, and the way the VNFs themselves were built. Thus, it would have a foot in both of the “twins”, the real thing and the virtual thing. Each instance of a state/event process referenced in the service data model would use one piece of the PaaS, and each instance of a VNF would use the other. There may also be linkages between the two, since it’s very possible that processes referenced in service data models might also be referenced by the implementation of VNFs. That’s why I don’t define two separate PaaS layers; they’re likely overlapping.

The requirements for the PaaS elements related to service modeling arise from the description of the way that works, and I think we could define candidates for discussion and refinement without too much time and effort being expended. For the VNF PaaS piece, it’s going to be a bit more complicated.

There are things that we could identify in a VNF PaaS with fair confidence. We need to have “interface plugs” that will connect with physical interfaces on servers or white boxes, where there has to be an actual connection to an external element. We need “API plugs” that would define connectivity between VNFs that didn’t require an actual discrete physical interface. We need the interface to my “derived operations” databases to either present data via a daemon process or to request it via query to create a MIB. We need the plugins and APIs to connect to “infrastructure drivers” that provide access to the hardware in a generic way, so the same stuff can run on bare metal, in VMs or containers, and in white boxes, and use whatever AI or switching chips might be appropriate to the VNF’s functionality. Since Nephio defines Custom Resource Definitions (CRDs), we need stuff to interpret and handle CRD processing, and we need the APIs that link to the K8S Operator function plugins.
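
Pulling several of those elements together, a VNF-side PaaS surface might look something like the sketch below. Every interface name here is a hypothetical placeholder for discussion, not a proposal for the actual APIs.

```java
// A hedged sketch of a VNF-side PaaS surface, folding in the elements listed
// above: interface/API plugs, derived-operations access, and an infrastructure
// driver. Every name here is hypothetical.
import java.util.Map;

interface InterfacePlug { void bind(String physicalPort); }          // attach to a real server/white-box interface
interface ApiPlug { void connect(String peerVnf); }                  // logical VNF-to-VNF linkage
interface DerivedOperations {
    void post(String key, String value);                             // daemon-side status posting
    Map<String, String> query(String selector);                      // MIB-style status retrieval via query
}
interface InfrastructureDriver { void offload(String capability); }  // generic access to chips/hardware

interface VnfPaas {
    InterfacePlug interfacePlug(String name);
    ApiPlug apiPlug(String name);
    DerivedOperations operations();
    InfrastructureDriver infrastructure();
}
```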

Defining other VNF PaaS functions could be facilitated by addressing the way that Nephio links with existing VNFs. For example, the Nephio diagram shows Netconf used to control CNFs, which suggests that we might want to have our K8S Operator function for VNF control provide the features of Netconf. Should we simply assume Netconf, though, or should we look at those features and develop a full capability set that we can map to it? I think the latter approach would be best, since Netconf is a device control protocol.

The same is true for the APIs used to bind one VNF to another. We could presume nothing more than a socket relationship, but is there a need to have coordination between adjacent VNFs, creating what the NFV ISG called a “service chain”? VNF security suggests there might be, and the ISG didn’t really take up the question of what I’ve called “Packages”, which are sets of VNFs that are analogous to the way Kubernetes deploys applications. Is there a “Package coordination” task? I think there could well be, since cooperative elements within a virtual device need some mechanism to cooperate, and run-time coordination shouldn’t be passed through an external service model for efficiency/latency reasons.

Considering the PaaS layer is important not only to simplify development and operations, but also to ensure that we actually address everything that’s needed. From the top, with an architecture like the one I’ve described, we can pick out the implementation elements using fairly standard software design processes. From the bottom, starting with the details, we are unlikely to create a truly optimum architecture. We can see the evidence of that in the NFV ISG and ONAP work, IMHO.

That raises the biggest risk Nephio faces, which is that in an effort to utilize work that’s been done wrong, it makes those same mistakes. The initial NFV white paper was valuable in laying out the objectives, and the work done already on VNF hosting can teach us lessons, both positive and negative, without dictating that we follow the same path, the path I believe was wrong. We need to learn the lessons and avoid the traps, and with some care I think that’s possible. That’s why I hold so much hope for Nephio.

There’s VNFs and then There’s VNFs

With all the talk about virtual network functions (VNFs) in the ETSI Network Functions Virtualization (NFV) group and now in the Nephio project (see my blogs HERE and HERE), it occurs to me that we’ve not really talked much about VNFs themselves. In particular, we’ve not talked about the fact that there are multiple VNF types or models, and that this high-level classification of VNFs has a lot to do with how we might expect to host and operationalize them.

While the original ETSI material doesn’t make this totally clear, the original vision of NFV was to create VNFs that were flow appliances, meaning that they were intended to sit in the service data plane and provide some feature set “above” basic termination of the network connection. Think firewall. One of the early goals of NFV was to create cloud-hosted equivalents of what were normally provided as CPE, which operators hoped would lower capex.

A flow appliance model has challenges. First and foremost, focusing as it does on CPE, it’s almost automatically limited in utility to business services. Residential CPE today is almost exclusively a WiFi hub combined with a simple home router, and the cost is typically low. “Standard” residential terminating devices are running between $35 and $70 US to operators, and most of that cost comes from the essential function of CPE, which is to terminate the service connection and provide local WiFi. The problem this creates is that most flow appliance missions aren’t going to be financially sensible, and so flow appliances of this type are not likely to impact operator costs very much.

The other type of flow appliances would be switches and routers, things that in most applications require high-performance switching of packets that a standard server backplane could not provide. This would require a white-box device with special switching chips, and while these were becoming available in 2013 when the ISG’s work framed out the architecture, they were not a target for VNF hosting. I believe that without the inclusion of white boxes, any VNF initiative is likely to be seriously, perhaps fatally, hampered.

The second class of VNF, also defined by the ETSI NFV ISG, was the control plane appliance. This was introduced by the ISG in 2013 in the context of the mobile-network IP Multimedia Subsystem (IMS), which defined separate service control and mobility management features. In today’s world, the focus of control plane appliances has been 5G, of course.

Control-plane functions are interesting because they influence data-plane behavior but don’t actually participate in the flows. That makes them a much better candidate for cloud hosting, and it wouldn’t require specialized hardware either. Another interesting aspect of control-plane hosting is that the “disaggregation” interest in IP networking that separates the control and data plane creates a mission potentially broader than mobile/5G networks.

The primary challenge for control plane appliances is hosting resources. Mobility management is a metro function, one not easily pushed deep into the cloud because of latency issues and the risk that long data paths could reduce reliability/availability. Edge computing is the obvious answer, but given that “edge” here really means “metro” in nearly all cases, the question has always been who would deploy the resource pools. If there isn’t a fairly efficient resource pool, then control plane appliance missions start to look like uCPE or bare metal missions.

5G also appears to have spawned the realization that while the 3GPP mandates “NFV” for control-plane deployments, this probably isn’t a good strategy for the 5G RAN. Open RAN defines a RAN Intelligent Controller (RIC) that does a lot of what NFV would do, but in a lightweight form. Nephio appears to promise support for the 5G features, which would make the RIC a Kubernetes element. There are other roles the RIC performs, and so it will be interesting to see how the full RIC function can be mapped to a Nephio implementation.

One important point about this category: “control plane” is a term that means different things depending on who’s talking about it. In IP networks, there’s a control plane where exchanges between nodes are used to mediate data plane routes. In mobile networks, there’s a control plane that defines things like mobility management. The term isn’t precise in one sense, but it always tends to mean a level of interaction that controls the way an “element” of the network relates to the cooperative community that’s the network itself.

Category three for VNFs is decomposed function hosting. At the very first ETSI ISG meeting in 2013, there was some interest in hosting pieces of physical network functions (PNFs) independently rather than having VNFs map 1:1 to PNFs. A formalized model of this can be found in disaggregation and in SDN, but it’s possible to conceptualize almost any “appliance” as a series of decomposed functions. For example, a firewall might be a combination of a flow appliance and a separated element that maintains forwarding rules across an entire enterprise, or even for all consumers.
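
A sketch of that firewall example, again with purely illustrative names, shows how the in-path piece and the shared rule piece would divide the work once the function is decomposed.

```java
// Sketch of the decomposed-firewall example: a per-site flow element plus a
// separate, shared rule element that maintains forwarding/filtering policy for
// the whole enterprise. Names are illustrative only.
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

interface RuleElement {                      // the separated, enterprise-wide piece
    boolean permitted(String source, String destination, int port);
}

class CentralRuleElement implements RuleElement {
    private final Set<String> blockedSources = new CopyOnWriteArraySet<>();
    public void block(String source) { blockedSources.add(source); }
    public boolean permitted(String src, String dst, int port) { return !blockedSources.contains(src); }
}

class FlowElement {                          // the in-path piece, one per service edge
    private final RuleElement rules;
    FlowElement(RuleElement rules) { this.rules = rules; }
    boolean forward(String src, String dst, int port) {
        return rules.permitted(src, dst, port);   // policy decisions live in the shared element
    }
}
```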

The challenge here is a model for the decomposition. That’s what killed the idea early on; vendors who offered PNFs were totally uninterested in decomposing their stuff, and without some architecture to define how the decomposed elements could be defined and reassembled, it would be difficult to create a useful and open ecosystem. I think serving this particular mission would likely require support from my hypothetical PaaS layer.

This could well be the pivotal VNF model, whatever its challenges, because it would permit a break between the VNF and PNFs, which would mean that cloud-native design would be applicable to network functions overall. It also connects nicely to Kubernetes’ mission of deploying all the components of an application as a unit, and maintaining their connection. With this model, we start thinking about assembling services from assembled virtual devices that don’t have a direct physical-appliance counterpart.

Which brings us to the final VNF category, truly virtualized features. Once we have a model to build services from built functions, we can rethink the whole notion of services and features. This, I think, is what would be needed to achieve AT&T’s stated goal of building a new layer of “connectivity” features that would empower the network and at the same time support new partnerships with OTTs.

One aspect of this class of VNFs is that it could represent a network destination, like an OTT element. This doesn’t commit the operators to being OTTs, but rather to being able to provide features that are evolutionary/revolutionary rather than ones they traditionally offer. That’s why this category could represent a path to achieving AT&T’s goals.

A pedestrian example of this class is the logic associated with content and ad delivery. There are routing-and-redirect functions and caching functions associated with these related applications, with the former obviously being related to DNS decoding, a feature that nearly all the major operators offer, bundled with their broadband access. The latter is something traditionally provided by others, including CDN vendors like Akamai, but operators have a vested interest in content caching as it would relate to optimizing their own metro/access infrastructure.

The three classes of VNFs I’ve cited here are important not only for what each represents, but also for the progression they demonstrate. Operators have, for decades, avoided what I think is an irrefutable truth; connection services will always commoditize. That means that if you want to improve your profits as a network operator, you need to rise above them. There may or may not have been good reasons for operators to eschew the OTT opportunities a couple decades ago when they were still available. Whichever was the case, those opportunities have passed them by, at least in terms of offering potential for direct retail exploitation. AT&T’s theory of fostering a symbiosis with OTTs that allows some OTT service revenues to pass to them is the best strategy that remains, but exploiting it means moving through the progression the VNF classifications represent.

AT&T made an important decision when they laid out their service-symbiosis model, but they need to flesh it out, and Nephio presents an opportunity to define a platform that could support that. It’s not there yet, and operators aren’t yet fully onboard with exploiting it. I’m excited to see how this all develops, but I’m still wary because operators could have fixed their problems in the past, too. They didn’t, and a change in technology won’t overcome a lack of change in willpower.

The Traps that a Nephio-Based NFV Solution MUST Avoid

In my blog last week on the Linux Foundation’s open-source function virtualization project (Nephio), I noted two things that the project didn’t have. One was service-layer modeling and deployment, and the other was a platform-as-a-service API set to define how network functions would be written. Today, I want to explain why I think the two are important and offer my view of how to get them.

Nephio is a framework for network functions virtualization based on open-source development rather than the development of “specifications”. Obviously, we had the spec-based approach in the NFV Industry Specification Group (ISG) within ETSI, and that activity is still going (it’s working through Release 5 in fact). I’ve made no secret of my view that the NFV ISG got the basic design wrong from the first, and that it’s been unable to work past those initial issues. Therefore, I think, it’s critical that Nephio not fall into the same traps that caused the ISG work to go awry.

The first of the traps, the one that Nephio is obviously intending to avoid, is the “trap of the monolith”. Telcos tend to think of software as monolithic applications, singular instances of stuff that pull events from a queue and push control messages out to devices. Nephio is proposing a true cloud-native, meaning distributed, architecture, and that would be a big positive step. Not, as we’ll see, a step sufficient to ensure success.

The second trap is the “trap of the inconclusive”, meaning that the NFV ISG defined a minimalist implementation that relied on leveraging other stuff that was out-of-scope to their work. That reliance tended to pull the specifications toward a simple virtual implementation of existing appliances, a substitution of virtual boxes for real boxes, and that constrains the ability of NFV to support future missions and ties it to current operations practices, making it difficult for NFV to address operations costs.

My point is that the two omissions in the current Nephio charter could result in an implementation that still falls into one or both of these traps. Let’s see why that is.

Nephio is quite explicit in saying that service-layer modeling and orchestration is outside its scope. On the surface, that might be seen as logical, given that we have both a TMF OSS/BSS framework that addresses the topic, and the ETSI Open Network Automation Platform (ONAP) that more directly aims at service lifecycle automation. However, these relationships also exist for ETSI NFV ISG specifications, and they didn’t save them from the traps.

The purpose of NFV is to deploy virtual functions, but a virtual function outside a service context is just sucking up resources to no purpose. The converse is that the optimum implementation of a given virtual function, and the optimum means of collecting them into feature instances, depends on the way they’d be used in services. If that point can’t be considered because the service processes are out of scope, then there’s a risk that the features required below those service processes won’t address service needs.

Operations issues here could be decisive for NFV and for Nephio, and we learned that in the ISG work. In the ISG, the presumption was that OSS/BSS and NMS relationships were realized by leveraging the current set of management interfaces. That meant that an element management system (EMS) was responsible for managing the virtual network functions created from the physical network functions (devices) the EMS was originally managing. The “management tree” of EMS branches that fed a network management layer (NMS) and finally a service management layer (SMS) was inherited by NFV.

The problem with this is that NFV should present a host of service deployment and remediation options that wouldn’t exist for physical devices. You can’t place, or move, one of those through a console command. You can do both with a virtual element, so how can legacy network and service operations deal effectively with capabilities that don’t exist for legacy devices?

There’s also an issue with the hosting environment itself. A physical network function’s management is a necessary mixture of function and platform management because the two are integrated. With a VNF, the hosting management has to be separate, and if we’re going to push the existing EMS to manage the VNFs, then how do we integrate hosting state and function state into one?

An intent-modeled service would be created by assembling features, which would then be created by integrating some combination of real devices and VNFs. At any level of abstraction, the state of the VNFs, devices, and cloud infrastructure could be integrated. Without service modeling, we’d have to presume that we were going to integrate the cloud and VNF management at the VNF level, and that suggests that a VNF is itself an intent model. If that’s the case, then haven’t we taken a major step toward defining an architecture to model services, without securing service modeling and management benefits?
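
Here’s a small sketch of what treating a VNF as an intent model implies for state; the enum values and the “report the worse of the two” rule are my own illustrative assumptions, not anything defined by Nephio or the ISG.

```java
// A small sketch of combining hosting state and function state inside a
// VNF-level intent model, so higher layers see a single reported state.
// The enum values and combination rule are illustrative assumptions.
enum ElementState { WORKING, DEGRADED, FAILED }

class VnfIntentModel {
    ElementState hostingState  = ElementState.WORKING;  // from the cloud/Kubernetes side
    ElementState functionState = ElementState.WORKING;  // from the VNF's own logic

    // The model reports the worse of the two, so higher layers see one state.
    ElementState reportedState() {
        return hostingState.ordinal() > functionState.ordinal() ? hostingState : functionState;
    }
}
```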

I think that Nephio has already committed, at least philosophically, to the concepts of service management and modeling. I don’t think that it would be a massive undertaking to address the service layer. Yes, I understand that this would introduce a collision with the ETSI ONAP, but Nephio is a collision with ETSI NFV already. ONAP was done wrong, just as NFV was. Can one be fixed while the other remains broken?

The second of the things not addressed in Nephio is my PaaS layer. Virtual network functions aren’t the same thing as applications; they have a much narrower scope and focus. They do evolve into a form of edge computing at the cloud hosting level, but they have network-specific pieces and features. I think that both should be considered in establishing a cloud-native framework for VNFs.

In the Nephio material, they note that the current model of VNF deployment requires too much customization, too much specialization of integration. True, but the reason that’s true is that the ETSI process had a goal of accepting VNFs from any source, in any original form. Remember that it believed that a VNF was the functional piece of a device, extracted and hosted. Every device has a potentially different hardware/software platform, and thus every VNF has a different potential relationship with its environment. Yes, if we want to accommodate the current VNFs, we have to deal with that. Do we have to perpetuate it?

The software space is replete with plugins. We have them in OpenStack, and we even have them (as “K8S Operators”) in the Nephio diagram of its layers. The goal of the plugin approach is to define an “ideal” interface for something, and then adapt any stuff that can’t use the interface directly. We also have packages that let software for one platform framework (like Windows) run on another (Linux). The ONF P4 approach defines a standard interface between switch software and a switch chip, and a P4 driver adapts different chips to that optimal, common, API set. There’s a software development design pattern called “Adapter” to represent this sort of thing, for a given API. Why not define an API set for a Nephio VNF, which would be a PaaS?
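
Here’s a minimal Adapter sketch applied to VNF onboarding. The class names are hypothetical, but the pattern is the textbook one: the PaaS defines one “ideal” control API, and an adapter wraps a legacy VNF so it can be driven through it.

```java
// A textbook Adapter sketch applied to onboarding: the PaaS defines one VNF
// control interface, and an adapter wraps a legacy, vendor-specific VNF so it
// can be driven through that interface. All class names are hypothetical.
interface VnfControl {                      // the "ideal" PaaS-defined API
    void configure(String declarativeSpec);
    String health();
}

// A legacy VNF with its own proprietary calls (stands in for "any original form").
class LegacyVendorVnf {
    void loadConfigFile(String path) { /* vendor-specific configuration */ }
    int statusCode() { return 0; }
}

// The adapter maps the common API onto the legacy calls, so the platform never
// needs per-VNF integration logic.
class LegacyVnfAdapter implements VnfControl {
    private final LegacyVendorVnf vnf = new LegacyVendorVnf();
    public void configure(String declarativeSpec) { vnf.loadConfigFile(declarativeSpec); }
    public String health() { return vnf.statusCode() == 0 ? "up" : "down"; }
}
```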

It’s hard to believe that the requirements for edge computing wouldn’t have similarities across all edge applications. Edge applications will have many common requirements, notably latency control. Similarly, it’s hard to believe that the way some edge requirements would be met shouldn’t be standardized to avoid differences in implementation that would waste resources and complicate management. If we defined a PaaS for the edge, that would facilitate edge development, and it would also form the basis for a derivative PaaS set that would be more specialized. VNF development would be such a derivative.

An easy example of a place where a PaaS would be helpful is in intent modeling itself. An intent model exposes a standard “interface” and/or data model, and the interior processes are built to support that exposure no matter what the explicit implementation happens to be. A lot of this process is, or could be, standardized, meaning that a PaaS could expose APIs that the interior implementation could then connect with. Doing that would mean that there would be a toolkit for intent-model-building, which seems a critical step in creating an efficient model development process. Remember that Nephio is based on intent models.

Management is another example. Back in 2012, I was working with operators on what I called “derived operations”. The problem arises with any sort of distributed functionality, because a device can present a single interface as its management information base (MIB) but what does that in a distributed process? There’s also a problem with the MIB approach in that it’s based on polling for status. When a function relies on shared resources, its polls combine with those of other dependent functions to create a risk of what’s effectively a DDoS attack.

My derived operations approach was built on an IETF proposal called “infrastructure to application exposure” or i2aex. A set of daemon processes polled the management information sources and posted results to a database. Anything that needed management state would get it via a query to that database, which would not only gather and format the information, but perform any derivations needed. Derived operations APIs could be part of the VNF PaaS. This, by the way, was a goal in the first NFV ISG PoC that I submitted, and that became the first to be approved. The concept wasn’t adopted.
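
A minimal sketch of that derived-operations flow, with illustrative names, would be a poller that posts readings into a shared store and a query interface that everything else uses instead of polling devices directly, which avoids the DDoS-like polling storm noted above.

```java
// A minimal sketch of the "derived operations" flow: a daemon polls management
// sources and posts results into a shared store; anything needing management
// state queries the store rather than polling devices directly.
// Names and structure are illustrative.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class DerivedOperationsStore {
    private final Map<String, String> statusDb = new ConcurrentHashMap<>();

    // Daemon side: poll one management source on a fixed schedule and post results.
    void startPoller(String sourceId, Supplier<String> poll, long periodSeconds) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(
            () -> statusDb.put(sourceId, poll.get()), 0, periodSeconds, TimeUnit.SECONDS);
    }

    // Consumer side: derive a MIB-like view via query instead of device polling.
    String query(String sourceId) {
        return statusDb.getOrDefault(sourceId, "unknown");
    }
}
```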

Nephio isn’t my first rodeo; I’ve been personally involved in multiple initiatives for virtualization of network elements, and assessed others as an industry analyst and strategy consultant. I’ve seen a lot of hard and good work go to waste because of seemingly logical and simplifying steps that turned out not to be either one. Nephio has the best foundation of any network-operator-related software project. That doesn’t mean it can’t still be messed up, and everyone involved in or supporting Nephio needs to take that to heart. I’ll talk about how that could be done in my next blog on this topic.

Nephio, the Open Source Savior of Function Hosting

Finally, a decade after the concept of NFV was first introduced, we may be on track to creating a useful implementation of function hosting. The Linux Foundation and Google Cloud announced Nephio, a project that would create an open-source framework for cloud-compatible function hosting and operations automation. It could realize the goals of the “Call for Action” paper operators put out in 2012, and create a model for at least the lower level of edge computing software deployment. It might also help Google along the way.

Everyone who works in or with the cloud at least knows about Kubernetes; it’s the container orchestration and DevOps tool that has almost swept everything else out of the market. Those who have followed my blog know that I’ve been a proponent of using cloud technology as the foundation of function hosting from the first. I’ve lost faith in the NFV ISG as the driver behind a useful implementation of function hosting, and even suggested that an open-source body like the Linux Foundation would be better. Well, we may find out now.

The Nephio framework presumes that function hosting is best viewed as three layers. At the bottom is cloud infrastructure, the real stuff that hosts functions. Then come the virtual function workload resources, and then the virtual function workload configuration. Nephio proposes that Kubernetes orchestrate all three of these layers. All of this is managed through a series of Custom Resource Definitions (CRDs).

The bottom layer of Nephio is fairly classic in terms of Kubernetes use; CRDs are the ultimate direction here because they’re Kubernetes-centric, but there are accommodations for other models and APIs planned to make Nephio compatible with current function-hosting directions, like NFV, and also to link back to things like ONAP, via a “K8S Operator” function set.

The same approach is supported for the other layers. The middle layer is designed to replace the customized onboarding and infrastructure-as-code model used for network functions in NFV with a declarative configuration model that’s again Kubernetes-centric, done through the same sort of K8S Operator technique. The top layer configures the functions themselves, so it will support things like YANG models and the 3GPP 5G RAN functions (via a K8S Operator), and will talk to network elements via NETCONF.
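To illustrate what that top-layer flow might look like inside a K8S Operator, here’s a rough sketch, again mine rather than anything from the Nephio project: declared configuration is rendered from a YANG-style model and pushed to the element over NETCONF. The NETCONF client interface shown is a hypothetical stand-in, not a real library API.

```go
// A rough sketch of how the top-layer flow might look inside a K8S Operator;
// this is an illustration, not Nephio code. Declared configuration is
// rendered from a YANG-style model and pushed to the network element over
// NETCONF. NetconfClient is a hypothetical stand-in for whatever NETCONF
// library a real implementation would use.
package configop

import (
	"context"
	"fmt"
)

// FunctionConfig is the declared (intent) configuration for one function.
type FunctionConfig struct {
	Name       string
	YangModule string            // e.g. a 3GPP or vendor YANG model name
	Params     map[string]string // leaf values to render into that model
}

// NetconfClient is a hypothetical stand-in for a real NETCONF session.
type NetconfClient interface {
	EditConfig(ctx context.Context, target string, xml string) error
}

// renderYang is a placeholder for real YANG-to-XML rendering.
func renderYang(cfg FunctionConfig) string {
	body := ""
	for k, v := range cfg.Params {
		body += fmt.Sprintf("<%s>%s</%s>", k, v, k)
	}
	return fmt.Sprintf("<config><%s>%s</%s></config>", cfg.YangModule, body, cfg.YangModule)
}

// ReconcileConfig is the essence of the operator pattern: compare declared
// intent with observed state and push only what's needed. The observed-state
// check is elided here; the point is the declarative flow. A real controller
// would retry on failure and reflect the outcome in the resource's Status.
func ReconcileConfig(ctx context.Context, nc NetconfClient, cfg FunctionConfig) error {
	return nc.EditConfig(ctx, "running", renderYang(cfg))
}
```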

Overall, everything is built as an intent model, and Google contributed both source code and a reference architecture that should virtually assure that implementations will be open. It’s a major advance in network function virtualization. It doesn’t do everything, though, so we have to look at both the “do” and “don’t” sides to assess the impact and when that impact could be expected. Keep in mind as you read on that the material from here on reflects my analysis more, and objective data from Nephio’s website is only an input to my views.

The first thing that comes to mind is that Nephio creates an abstraction of a function host that’s broadly applicable to any application of NFV, and also to any application of edge computing. It provides for the creation of an abstract “host” that can include almost anything on which software can run, no matter what it is (as long as it can support Kubernetes node control) or where it is.

One thing that means, I think, is that the Open RAN RIC function could be implemented through Nephio, and that could be of enormous value to 5G deployments, in operations and efficiency terms. If everything is a Kubernetes resource, be it hosting or deployable asset, it makes orchestration of service features much easier.

Similarly, IoT could be facilitated with Nephio. Edge assets for IoT could sit on-prem, near the processes under control, or be distributed in various ways depending on hosting economics and latency budget. It appears that Nephio could handle this variable mix of stuff and stuff-residence, which would not only support current IoT models but also facilitate the use of edge computing for IoT as those resources become available. And, of course, it would support edge computing applications generally.

It also appears that Nephio could serve as a bridge that existing NFV could cross to reach a rational cloud-centric architecture. Certainly the virtual network functions (VNFs) of NFV could be deployed via Nephio, and they could also be configured there, which means onboarding would be facilitated. Whether there would be any value in preserving the specific NFV interface points for MANO, VNFM, VIM, and so forth remains to be seen, given the minimal use of NFV today and the level of detail in the current Nephio documents. I suspect that a lot could be done.

I think Nephio could also be used for white-box switch function deployment and orchestration. If that’s the case, then for sure you could use it for deployment and orchestration of proprietary network elements like switches and routers, which would essentially join these devices to the generalized resource pool created by Nephio. That would go a long way toward creating universal service operations.

But that gets us to the “don’ts.” In its current form, Nephio does not do end-to-end service orchestration. If we try to map Nephio to NFV, what we get is something that creates a kind of universal, intent-based NFVI (NFV infrastructure) with a kind-of VIM (virtual infrastructure manager) and MANO (management and orchestration). The VNFM (VNF manager) function is addressed peripherally, but there is no model or process designed to define a service or to manage its lifecycle. However, that model doesn’t exist in the NFV specs either, nor in fact in ONAP.

Could there be a service layer to Nephio down the line? It seems to me that if we have intent-modeled elements, multi-layer orchestration through Kubernetes, and CRDs to describe things, we could add a kind of CRD to describe a service and functional logic to treat that as a decomposable model. If Google and the Linux Foundation are serious about transforming function hosting, I think they’ll need to look at doing this.
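To show what I mean, here’s a hypothetical sketch of such a service-layer CRD. It’s my own illustration of the idea, not a Nephio proposal: a decomposable model whose elements reference either nested services or the workload and configuration resources Nephio already handles.

```go
// A hypothetical sketch of a service-layer CRD; this is an idea for a
// "fourth layer", not a Nephio proposal. A Service is a decomposable intent
// model whose elements reference either nested services or the lower-layer
// workload and configuration resources Nephio already orchestrates. Field
// names are illustrative only.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ServiceElement is one node in the decomposition; in practice exactly one
// of the reference fields would be set.
type ServiceElement struct {
	Name        string `json:"name"`
	ServiceRef  string `json:"serviceRef,omitempty"`  // a nested intent model
	WorkloadRef string `json:"workloadRef,omitempty"` // a workload-layer resource
	ConfigRef   string `json:"configRef,omitempty"`   // a configuration-layer resource
}

// ServiceSpec declares the service as a set of elements plus the SLA-style
// parameters an intent model exposes outward.
type ServiceSpec struct {
	Elements []ServiceElement  `json:"elements"`
	SLA      map[string]string `json:"sla,omitempty"`
}

// Service is the top-level object a service-layer orchestrator would
// decompose and reconcile, layer by layer, down to Nephio's existing CRDs.
type Service struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ServiceSpec `json:"spec,omitempty"`
}
```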

Another “don’t” is the virtual functions themselves. As I’ve pointed out in THIS blog, you really need a PaaS model for functions to be written to, and the same is true for edge computing overall. Otherwise, we risk ending up with implementation silos that require a lot of different executions of the same things, raising overall resource usage and making operations more complex.
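Here’s a minimal sketch of the kind of PaaS contract I mean, with a method set that’s entirely my assumption rather than any published API: a uniform platform interface every function can rely on, and a uniform lifecycle every function exposes.

```go
// A minimal sketch of the kind of PaaS contract virtual functions could be
// written to. The method set is an assumption, not a published API: the
// platform gives every function a uniform way to read derived management
// state and publish its own status, and every function exposes the same
// lifecycle hooks so onboarding doesn't have to be reinvented per vendor.
package vnfpaas

import "context"

// Platform is what the hosting environment guarantees to every function,
// whether it lands on containers, VMs, bare metal, or a white box.
type Platform interface {
	// ManagementState queries the shared (derived-operations style)
	// repository instead of letting each function poll resources directly.
	ManagementState(ctx context.Context, query string) (map[string]string, error)
	// PublishStatus reports the function's own state into that repository.
	PublishStatus(ctx context.Context, status map[string]string) error
}

// Function is the uniform lifecycle every VNF implements.
type Function interface {
	Start(ctx context.Context, p Platform) error
	Stop(ctx context.Context) error
}
```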

All these contribute to a big question, which is just where Google will take Nephio. Did Google, flushed with love for their fellow networkers (and maybe filled with pity for those in telecom), just decide to do this project and contribute it? Doubtful. Google has always been a cloud partner of interest to the operators, and in fact is the only cloud provider operators have actually asked me about engaging with. I think the fact that Google Cloud, rather than just “Google,” is the originator and sponsor of Nephio is telling. They’re going to do this, which means there will be a functional implementation of at least the “do’s” part of NFV available. Everyone else in the market will either hunker down on “raw” NFV and likely fade away, or get on the Nephio bandwagon.

There are a lot of players on that bandwagon already, including major vendors like Intel and VMware, some network operators, and even some of the NFV VNF vendors. The community interest in Nephio is strong, which is a good sign, since vendor support is a prerequisite for rapid advance in the world of open source. One bad signal, though, is that the majority of the operators who launched NFV in 2012 are not, so far, part of the project.

That’s bad because the NFV ISG and ETSI really need to get behind Nephio to redeem a lot of the effort that’s been put into function hosting and service automation. The NFV ISG should restructure itself as a requirements contributor to the Nephio project and a contributor on creating initial compatibility with ETSI NFV elements, and forget any technical activity outside that project. The ONAP project should consider how to build something designed as a higher layer, a new fourth layer, on top of the Nephio structure, and focus on models for service orchestration (which it has so far stubbornly refused to do).

But, of course, neither of these happy outcomes is likely, and that could be good for Google. If Google takes Nephio where it needs to go while the organized standards groups stand aside, there’s a chance other public cloud providers will hang back rather than seem to validate Google’s thinking. That could leave Google as the only one thinking at all.

Google’s Aquila Marries the Past and Future of Networks

Google has been one of the great innovators, even though a lot of what they do isn’t appreciated or even known. They launched Kubernetes and Anthos for cloud hosting, their Andromeda SDN network is probably the largest SDN deployment on the planet, and they also launched the Istio service mesh. Now they want to transform data center switching, and maybe other aspects of networking too, with a new project called Aquila.

Data center switching is predominantly based on Ethernet, but for decades there’s been a movement to provide a faster and lower-latency model than a stack of Ethernet switches. One initiative was InfiniBand, another more recent one from Intel was Omni-Path. Vendors, of course, have also launched fabric switches with the same goals; I remember a number of Juniper launches on the topic, stretching back almost two decades.

“Almost two decades” surely demonstrates that interest in data center switching alternatives has been around for some time, but it’s gotten way hotter recently with the advent of cloud computing and the potential deployment of edge computing. Edge applications are latency-sensitive, so latency in the data center switch (either “pure” latency resulting from hops or latency arising from a lack of deterministic performance) is a potential stumbling block.

Fabric switches are often non-blocking, meaning that they can connect any-to-any without the risk of capacity or interconnect limitations interfering. That helps a lot with both pure and non-deterministic latency sources, but Google wanted to take things a bit further, and so in a sense they’ve married SDN technology with some principles from the old-and-abandoned Asynchronous Transfer Mode (ATM) and cell switching to create Aquila.

Cell switching, for those who don’t remember ATM, was based on the notion that if you divided packets of variable size into fixed-size cells, you could switch them more efficiently and also control how you handle congestion by allowing priority traffic to pass non-priority stuff at any cell boundary rather than waiting for a packet boundary. The whole story of ATM was determinism, which is what Ethernet switching tends to lack and what fabric switches don’t fully restore.
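As a toy illustration of the principle (mine, not Google’s Aquila implementation), here’s how cell segmentation changes the scheduling decision: once packets are cut into fixed-size cells, the scheduler can choose what to send at every cell boundary, so priority traffic waits at most one cell time instead of one maximum-size packet time.

```go
// A toy illustration of the cell-switching principle, not Google's Aquila
// design. Once variable-size packets are cut into fixed-size cells, the
// scheduler decides what to send at every cell boundary, so a priority
// packet is delayed by at most one cell time rather than by a maximum-size
// packet that has already begun transmission.
package cells

const cellSize = 48 // ATM-like payload size, purely illustrative

// Cell carries a fixed-size slice of one packet plus enough identity to
// reassemble the packet at the far end.
type Cell struct {
	PacketID int
	Seq      int
	Last     bool
	Payload  []byte
}

// Segment cuts one packet into fixed-size cells.
func Segment(packetID int, data []byte) []Cell {
	var out []Cell
	for i, seq := 0, 0; i < len(data); i, seq = i+cellSize, seq+1 {
		end := i + cellSize
		if end > len(data) {
			end = len(data)
		}
		out = append(out, Cell{
			PacketID: packetID,
			Seq:      seq,
			Last:     end == len(data),
			Payload:  data[i:end],
		})
	}
	return out
}

// Schedule emits cells one at a time, checking the priority queue before
// each cell boundary; because the scheduling unit is a cell rather than a
// whole packet, priority traffic never waits for a large best-effort frame
// to finish serializing.
func Schedule(priority, bestEffort []Cell) []Cell {
	out := make([]Cell, 0, len(priority)+len(bestEffort))
	for len(priority) > 0 || len(bestEffort) > 0 {
		if len(priority) > 0 {
			out = append(out, priority[0])
			priority = priority[1:]
			continue
		}
		out = append(out, bestEffort[0])
		bestEffort = bestEffort[1:]
	}
	return out
}
```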

The SDN piece comes in as a replacement for the Ethernet control plane. SDN, you may recall, is also the foundation of Google’s Andromeda core network, and it provides explicit central route control for traffic-engineering optimization. In Aquila, SDN is distributed, with some of it in a central controller and some in the custom chips Google has created for Aquila. The combination of technologies means that Google can support distributed Aquila elements at distances of over 300 feet (via fiber optics), with per-hop latencies in the tens of nanoseconds, including cell processing and forward error correction to further improve determinism.

Aquila is still a kind of super-proof-of-concept at this point, but it’s not difficult to see that Google could create a unified networking model for both data center (Aquila) and core (Andromeda) that would build very efficient coupling between hosting and network. That would be key for Google, obviously, in cloud computing, but it would be even more important for the edge, and it might well be decisive in a metro-centric vision of the future.

Where Google intends to take the pieces of Aquila is an open question. Do they sell the chips? Do they license the concepts/patents? Whatever Google decides to do, it seems likely that others will look at the same overall approach to see if they can do something competitive and profitable. Chip players like Broadcom, for example, might create their own version of Aquila chips and even the Aquila model, given that a lot of the technology is likely inherited from open specifications like ATM and SDN.

Another open question for Aquila is whether the same technology could be used to build a router. “Cluster routers” are available today; AT&T picked DriveNets’ cluster-router software for its core, for example. You can build one up from white-box switches, which is what DriveNets did, and if commercial Aquila-like chips were available, somebody would likely stuff them into white boxes to create the same sort of model for an SDN/ATM hybrid fabric.

This could be important beyond the cloud, and we can look to AT&T again to see why. AT&T told the financial industry that it intended to create a feature layer above pure connectivity that would add things that OTT partners could exploit to create new retail services. It’s hard to see how such a layer could be created and sustained except as a hosted application set partnered with connectivity. It’s hard to see how that could be made to work without some mechanism to optimize the hosted-feature-to-network-connectivity coupling.

The relationship between Aquila and Andromeda is also interesting here. Is Google working its way up to a multi-level SDN control plane for centralized-and-yet-distributed operation? Andromeda proves that you can build a very efficient IP core network using SDN and make it look like an IP autonomous system by implementing a BGP emulator at its edge. The same thing could be done with any piece of an IP network. SDN would let you traffic-engineer things inside, but to the outside you’d look like pure legacy IP.

Returning to metro, this sort of thing could also play a role in metaverse hosting, my “metaverse of things”, and even things like Web3 or other blockchain-centric concepts. SDN routes could grow from the data center, through the core, to the metro, and even outward toward the edge. SDN could be a key ingredient in supporting network slicing for 5G. In short, Aquila might be leading us toward finally realizing the full potential of SDN, and giving us a tool for rebuilding our model of IP networks into something more like a low-latency metro mesh. It might even be a way of preserving electrical-layer handling in the future core, rather than ceding that role to an optical mesh of metro networks.

Probably few even inside Google know where it might take Aquila, and certainly even fewer (if any) outside Google do. Google hasn’t historically announced technologies like Aquila before they’ve exploited them inside their own networks. Are they doing that here because they see that Aquila’s role may develop very quickly, so quickly that they can’t wait for their own production commissioning? It’s an interesting thought, and it might indicate some interesting twists and turns in networking’s future.

What Do Operators and Vendors Think of my New-Feature Groupings?

In a blog I did last month, I talked about three feature categories that could represent reasonable targets for operators interested in promoting higher-level service partnerships. Recall that AT&T indicated this was one of their strategies for improving their bottom line. Since then, I’ve gotten a lot of comment from operators, vendors, and enterprises, and I wanted to share what I’ve learned (except the comments I got from people at AT&T). Hint: They don’t always agree with my view!

My feature categories were connectivity extensions designed to frame connection services in a slightly different light, connection-integrated features, and features that exploit service opportunities not yet realized. Views on all three were interesting, to say the least.

Connectivity extension features include things like elastic bandwidth and on-demand networking. On this topic, enterprises were highly interested, but their interest focused totally on cost management. What their view boiled down to was that they’d buy such features if they lowered their overall cost. Obviously, that goal is totally incompatible with the notion of improving revenues.

Apparently neither vendors nor operators really get that distinction, despite the fact that by now it should have become clear. Within the network operators there was a minority of strategists who did realize that there were no non-cost-management benefits for this class of feature, but even this group indicated that their companies (specifically “the executives”) tended to believe in them. A small majority of operators said that establishing something like a “bandwidth exchange”, where “spot pricing” could help them sell unused capacity, could work. The strategists disagreed.

Among vendors, this class of feature gets even more support. Again, a small group of strategists in nearly every vendor organization saw this feature group as useless in the long run, but again “the executives” were said to hold out hope. Unlike operators, who actually had some specific view of what the features in this group might be, vendors were pretty vague.

My view here is that we’re seeing inertia in action. Networking has been about pushing bits for connectivity from the outset, and everyone has gotten comfortable with the technology associated with that. Yes, the CFOs in operator organizations are complaining about revenue and profits, but hey, CFOs always complain about something. Yes, vendors are being pushed by operators to suggest paths for operators to improve return on infrastructure, but hey, buyers always complain. What choice do they have?

The second feature group gets way more complicated in terms of the attitudes of the players. First, my distinction of the group as “connection-integrated” wasn’t well understood without the examples I offered in the blog. Include those examples and operators immediately said that there was no opportunity for them; the content piece was already addressed by established players and operators couldn’t hope to compete there. As far as security is concerned, I got a lot of pushback.

The majority of my operator comments put security in the same class as content services—too much established competition. A decent minority of operators thought that offering security in any form was a legal and PR risk they didn’t want to take. “Imagine a big TV story on how [my secure service] had a major breach,” one operator contact suggested. “It could hurt my business overall, and I’m not sure it can really be prevented.”

The point they’re making is that a security service might pose risks to the seller that security products do not. To sell a security service, you have to talk about what it protects the customer from. To sell a product, you can talk about the specific features. This same group of operators also rejected my characterization of their managed-services opportunity, for much the same reason.

Network vendors took a very similar view. The majority really saw no clear opportunity for them in the content space. They’d love to sell data center switches to operators for “carrier cloud” but promoting the concept to prospective buyers was something they believed was simply too difficult. “You can’t educate a telco into an opportunity,” one said. “You’re wrong, there’s nothing here,” was another view.

On the security side, vendors were even more negative. Every major network vendor also has a security portfolio, and it’s usually more profitable than their network business. Why risk messing it up, either by introducing new security features or by encouraging operators to offer security services that would undermine enterprise sales of security technology?

Nobody liked my second feature category, in other words. You can take these comments as defensive, but I think what the negativity proves is that there would be a considerable educational effort associated with promoting this second feature category. That point is important because I got some vendor comments from non-network players, and they were significant.

Server and software vendors, who didn’t comment on my first feature category, weighed in on the second and third. Not surprisingly, they see both these feature sets in a much more positive light. Roughly two of every three people in that space think that the third category of features is the most critical and that the second feature class is more of a stepping stone. A slight majority will admit that operators they talk to need “some education” to accept and realize either opportunity.

That, I think, encapsulates what I’ve learned from all these comments. There is an almost-universal view that operators are just not ready to move beyond basic connectivity. They’re hopeful that cost management can handle their profit-per-bit and return on infrastructure problems while somebody thinks up a new connection service. Some will add managed services and SD-WAN or virtual networking to that list. That leaves the question of how those operators would get that education.

The facile answer is that software/server vendors would provide it, but that answer is just what I called it: facile. Vendors have enormous influence on operators, but it’s the network vendors, not the software/server vendors, who have it. Those vendors have no interest in promoting a new host-and-software model for infrastructure, and in fact have every reason to try to tamp that sort of thing down. Despite their efforts, the software/server vendors haven’t been able to step up and promote their strategies for new features, for two reasons.

Reason One is that those software/server vendors haven’t paid their dues. They don’t understand the network operators’ issues, constraints, and processes. This has led them to blindly chase NFV, for example, on the theory that simply adhering to it would fulfill operator requirements and make sales. That’s obviously not true.

Reason Two is a point I’ve made before; new open-model hosting of features has to be platform-independent, meaning it has to work on containers, VMs, bare metal servers, and white-box switches. Absent that scope, there are too many missions it can’t fulfill, which means that operators can’t really fully depend on it.

I’m surprised and a little disappointed by the lack of interest in the feature groups, frankly. I think that it means that only a very few operators (like AT&T) are really facing the issues that all operators will face eventually. That suggests that the industry may transform way more slowly than I thought, and way more slowly than it must.