Has DriveNets Defended its Third-Party Hosting?

DriveNets has long supported a containerized-software framework that’s hosted in a disaggregated cluster of white-box elements. In effect, it’s a little cloud of its own, and the company even calls it the “Network Cloud”. They also support control-plane separation, and I’ve noted several times that their architecture could be used to bind the IP control plane directly to things like CDN, 5G, and even IoT. In short, I believe the Network Cloud concept has a lot of potential, if the company moves to harness it. Now, they may be just starting to do that.

It’s not that DriveNets hasn’t said their architecture would support third-party elements; they have. They’ve just been a bit coy about it, talking about the capability in a general sense. They’ve gone further than that in a white paper they released recently, titled “NCNF: From ‘Network on a Cloud’ to ‘Network Cloud’: The Evolution of Network Virtualization”. You can download the paper HERE (registration required). What they’re getting at is that the notion of function hosting has, for operators, evolved from being VM-specific (the original NFV model of Virtual Network Functions) to being more cloud-efficient through the use of containers. The end-game? Host the containers in DriveNets’ Network Cloud.

There certainly are a number of problems with the notion that NFV and its VNFs can be made cloud-native simply by running them in containers, and I’ve said all along that the notion of “cloud-native network functions” as the successor to VNFs should be named “containerized network functions” instead, because there is no way that the initiative is going to create truly cloud-native elements, whatever you call them. There’s too much baggage associated with NFV’s management and orchestration approach, and the NFV ISG’s CNF efforts don’t address three big questions about the use of containers or VMs in hosting service functions/features.

Big Question One is whether containers alone really provide the benefits of cloud-native behavior. An application isn’t stateless because you run it in a container; it’s stateless because it was written to be, and it will have that property however it’s hosted. As far as state goes, containers fall into a kind of gray zone between stateless and stateful. On the one hand, a container is supposed to be redeployable, which is a property sought in stateless design. On the other hand, containers don’t enforce that; they only pretend the attribute is there. In most business applications, that’s good enough because the containerized applications can recover from a failure and redeployment. In event-based applications like almost everything in networking, it’s not nearly enough.
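To make that distinction concrete, here’s a minimal sketch in plain Python (nothing DriveNets- or NFV-specific; the class and variable names are invented for illustration) of two versions of the same hypothetical session-counting function. One keeps state in process memory, so a “redeployment” wipes it; the other was written stateless, pushing state to an external store, so it survives being killed and respun anywhere. Putting either one in a container changes neither behavior.

```python
# Illustrative only: containerizing the stateful version would not
# make it stateless; a redeploy still wipes its in-memory table.

class StatefulCounter:
    def __init__(self):
        self.sessions = {}          # state lives in process memory

    def handle(self, session_id):
        self.sessions[session_id] = self.sessions.get(session_id, 0) + 1
        return self.sessions[session_id]

class StatelessCounter:
    def __init__(self, store):
        self.store = store          # state lives outside the instance

    def handle(self, session_id):
        count = self.store.get(session_id, 0) + 1
        self.store[session_id] = count
        return count

store = {}                          # stands in for an external store
a = StatefulCounter()
a.handle("s1"); a.handle("s1")
a = StatefulCounter()               # "redeployment": state is gone
assert a.handle("s1") == 1          # session restarted from scratch

b = StatelessCounter(store)
b.handle("s1"); b.handle("s1")
b = StatelessCounter(store)         # redeploy: state survives outside
assert b.handle("s1") == 3
```

The point of the sketch is that redeployability is a property of how the code was written, not of the hosting wrapper around it.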

Big Question Two is whether containers’ model fits virtual functions at all. A lot of the container-versus-VM discussion has centered on the number of each that a server could support (containers win because the OS isn’t duplicated for each, where each VM looks like bare metal). More attention should be paid to the fact that a container system is all about resolving configuration variability and dependencies in the software organization stage, rather than manually (or with DevOps tools) at deployment time. Containers work by setting limitations on how wild and free software can behave. Can NFV’s software be tamed like that? Remember that VNFs require extensive onboarding specialization just to run, much less progress cleanly through a lifecycle.

Our last Big Question is whether containers, or any cloud hosting strategy, can meet the latency requirements of many of the applications that are based on hosted functions. The O-RAN, for example, has both a near-real-time and non-real-time RAN Intelligent Controller, whose descriptions make them sound like they include some form of orchestration, something container systems use for deployment and redeployment. Spread latency-sensitive stuff around the cloud in an arbitrary way, and delay accumulation is inevitable.
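The delay-accumulation point is simple arithmetic, and a toy sketch makes it visible. The numbers below are assumptions for illustration only (sub-0.1 ms within a tightly coupled cluster versus several milliseconds per arbitrary cross-cloud hop), not measurements of any real deployment.

```python
# Toy latency arithmetic: a chain of five hosted functions traversed
# per event, with assumed (not measured) per-hop transit delays.

def chain_latency(hops, per_hop_ms):
    """Total transit delay for a linear chain of hosted functions."""
    return hops * per_hop_ms

clustered = chain_latency(5, 0.05)   # functions co-located in a cluster
dispersed = chain_latency(5, 5.0)    # functions scattered across a cloud

# The near-real-time RIC control loop is commonly described as
# operating in the tens-of-milliseconds range; 25 ms of pure transit
# consumes much of that budget before any processing happens, while
# the clustered case barely registers.
assert dispersed > clustered
```

The model is crude (it ignores queuing and processing time), but it shows why arbitrary placement of latency-sensitive functions is self-defeating.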

So how does DriveNets address these Big Questions? We’ll have to make a few presumptions to answer that, given the limited level of detail in the paper, but here goes.

First of all, DriveNets’ model is a cluster-cloud, designed to deploy service features within itself, where latency is low and where there’s a mixture of general compute and network processor assets available. While this doesn’t offer the broad resource pool that cloud computing is known for, as Question Three noted, the big pool could be wasted on applications where latency issues dictate that things be deployed close to each other. DriveNets clearly addresses Question Three.

The next point is that DriveNets uses a container model, but not standard container orchestration, and it appears that the constraints that DriveNets’ orchestrator (DNOR) imposes on applications keep them behaving in a more orderly, cloud-native-compatible way. However, we have to recognize that standard, badly behaved VNF/CNFs would either behave equally badly with DriveNets or wouldn’t run. This isn’t a criticism of their approach; you can’t really convert physical network functions, the code that’s inside a network appliance, into an optimal cloud-native implementation without recoding it. It’s just that we can’t say that DriveNets fixes the problem of CNFs; it offers another model that would prevent those problems if the model were followed, but we don’t have the details of the model, so we can’t judge whether developing to it would be practical. We can’t give DriveNets a pass on either Question One or Two.

To me, though, those questions aren’t as important to DriveNets’ aspirations as another question, one not related to the whole PNF/VNF/CNF evolutionary dustup. That question is the benefit of having a third-party “NCNF” resident inside the Network Cloud to start with. The right answer to that question could totally validate the NCNF approach, but again, we can’t firmly answer it from the material.

There are two things that you could do with an NCNF application that would be less efficient or even impossible without it. First, you could build a network feature that drew on control-plane knowledge and behavior. You’ve got topology information, state information, a lot of stuff that might be very valuable, and if you could exploit that, you could almost certainly implement higher-layer connection-service-related features better. 5G is a good example. The 5G User Plane is actually the IP control/data plane, so the 5G Control Plane is a higher-layer service feature set. In theory, DriveNets could support this model and provide better services in that layer I’ve called the “service plane” above the IP control plane.

The second thing you could do is also related, or could be related, to 5G. You could build a forwarding-plane behavior based on something other than adaptive IP. In short, you could create a parallel control plane to the IP control plane, and since you can partition a DriveNets Network Cloud into multiple “forwarding domains” with their own unique rules, the Network Cloud could play a completely different forwarding tune. Inside 5G Core (5G’s equivalent of Evolved Packet Core or EPC), you have to use tunnels to direct user data to the right cells. You could do that directly, without tunnels, using a different forwarding domain within a Network Cloud.

This validate-the-Network-Cloud exercise isn’t just academic, either. The stronger a point you make about the value of cluster routing as a hosting point for network functions, the more you raise the profile of a competitive concept, edge computing. If clusters are better primarily because the resources are close and the latency low, then server resources similarly clustered together would be good too. If OpenFlow forwarding switches were combined with edge-hosted control-plane elements, you’d end up with something that would answer Question Three the same way DriveNets did.

The critical question for DriveNets is how they would enable the two possible benefits of close control-plane coupling, and the paper doesn’t address that. What I’d like to see from DriveNets is that next level of detail; how NCNF would work, how it would be exploited, and how they’ll control the security and behavioral compliance of any of these parallel domains. The current document adds a bit of diagrammatic meat to the bones of prior statements on third-party feature hosting, perhaps making some of the potential a bit more obvious. It still doesn’t do enough to convey what I think would be the real value of DriveNets’ model, and we need some more detail:

  • A description of how DNOR orchestration would control the deployment of third-party elements, perhaps relating it to Docker or Kubernetes.
  • A description, even at a high level, of the APIs available to link third-party elements to DriveNets’ operating system and middleware elements (DNOS/DNOR, presumably) and what they’d be used for.
  • A description of how higher-level elements could interact with the separate IP control-plane elements of the Network Cloud.
  • A description of how third-party elements could create another forwarding instance that would share the Network Cloud infrastructure, and what specific things that software would have to do to interact with the rest of DriveNets’ software.

Maybe the next paper will cover that, DriveNets?

More 5G Data Points, One Big Risk

Everything seems to be a trade-off, particularly with technology. Planners and CIOs are always balancing risk and reward, upside and downside. One place where that balancing act is especially important is in 5G, and especially 5G’s RAN, New Radio or NR. I’ve noticed that telco planners and some of the vendors engaged with them are starting to realize just how critical striking the right 5G RAN balance is, but most still don’t see their risks and rewards fully.

We’re hearing about two aspects of the 5G RAN trade-off in the tech media. One aspect is the O-RAN versus proprietary 5G RAN, though right now most vendors seem to be coming down on some open-model approach (except Huawei). The other aspect is the public-versus-carrier-cloud deployment, and that one is just getting started.

Mixing with these two factors is the release of a position paper outlining the “requirements” of a dozen big telcos. The paper seems on the surface to be a good thing; why not get buyer requirements for a change instead of pushing the sale of whatever was convenient to create? The problem is that the telco buyers have historically been inept at stating their own requirements, and this document is no exception.

I’ve worked with the telcos for decades, and for the last two of those decades they’ve been guilty of multiple-personality-disorder thinking with respect to open technologies. On the one hand, they hate vendor lock-in and thus tend to support things like white boxes and open source. On the other hand, they hate integration, so as soon as they have an open model to play with, they demand somebody single-source the integration of the components. Some of my telco friends tell me that this is different; they can always demand the integrator change out a piece they don’t like, and they can change integrators. I’m not sure about that line, but let’s move on.

The logical response to a buyer community that wants openness in order to somewhat-close it is to make a commitment to open-model yourself and provide your own “open elements” in critical places. Other than Huawei, who seems bent on keeping their original business model of proprietary gear, this is the approach that seems popular in the 5G space. Both Ericsson and Nokia have taken this tack, though it appears that Nokia has been more aggressive in their moves, and taken things further beyond the pretty-billboard PR starting point.

Ericsson and Nokia have two advantages; they are incumbents who know the telco space well and who are well-known in it, and they understand the challenges 5G evolution poses for operators because they’ve provided much of what operators are evolving from. No little player could hope to address either of these points, important as they are to operators. Only a giant non-traditional network vendor could have a shot.

There’s no shortage of players who’d like to be that giant non-traditional supplier. On the cloud software side you have Red Hat and VMware; in hardware you have Dell and HPE, both of which have software bundles they can link with their hardware. In this space (so far), I think VMware has the best initial position, but they’ve been preoccupied with executive changes and they’ve not moved as fast or positioned as aggressively as they should have. Thus, there’s a little wiggle room for others to creep in, provided they are truly aggressive in contrast, and that they do their thing before VMware gets its management act coordinated.

There’s another option, of course, and it opens the second of my two media-visible 5G RAN aspects, the role of the public cloud versus carrier cloud. If you want open technology in a pre-integrated form, and if you want to avoid vendor lock-in above all, what better option could you get than 5G-as-a-service? In fact, this approach is getting so much traction (and publicity) that it’s spawned a secondary trend, which is a kind of “5G commune” where operators share a single 5G deployment, presumably sliced in conformance with 5G specs, and whoever builds that collective network bears the risks.

Some players, like the antenna providers, are logical entrants in this second space. Others, like IBM and Cisco, are rumored to have looked at the approach, seeing it as a kind of managed-service network that (in Cisco’s case) would be congruent with their expense-everything-as-a-subscription-service approach. But let’s return to the real giants here, the public cloud providers.

Public cloud providers could theoretically mix all the model options for 5G. They could host arbitrary functionality the operators provided, they could provide their own software functionality and then either let the operators host it, or host it in their own (or someone else’s) cloud. They could also offer hosted 5G-as-a-service with or without shared network resources and slice-based tenant separation.

Microsoft may be the one who’s thought about this all the most, and taken the most steps to realize all the options. They’ve done some serious M&A (my favorite 5G software player, Metaswitch, was one) and they’ve talked up hosting operator 5G, but they still sell the software they’ve acquired. I’m also getting serious indications that they’d like to offer a shared-by-slices 5G model, and that they’re more serious than rivals Amazon and Google.

In a way, all this happy explosion of choices of supplier and form of service could be compromised by the buyers themselves, specifically that requirements paper. Apart from the fact that the paper isn’t the best piece of technical composition the world has known, it’s also suggesting things that would tend to pull O-RAN away from the cloud and more into the world of NFV and ONAP. This might play well in Europe, the source of the paper, but I’m not hearing hymns of praise for that in other market geographies. One Asian operator told me that “This looks like an attempt to guide and promote O-RAN by burying it in bureaucracy”.

That’s the biggest issue facing open-model networking in a nutshell. The future of networking is the cloud, period. Operators, at the planning levels associated with their standards initiatives, don’t get the cloud, period. To the extent that they attempt to assure their real issues are covered in a cloud migration, they introduce box-network concepts to explain those requirements and move away from the function-network concepts that they should be embracing. For the vendors who want to see O-RAN and open-model networking succeed, countering this unfortunate bias is the most important thing they’ll need to accomplish, both in product design and in positioning.

Where is Enterprise Transformation, Really?

You gotta love “transformation”; the term is hopeful, positive, even revolutionary, and at the same time it’s almost totally vague, which means you can say pretty much anything you want about it. One reason, as I suggested in a blog last week, is that IT planners and network planners are actually vague themselves on the topic. A technology shift without buyer specificity is bound to end up as a hype magnet.

A recent article on CIO attitudes on the transformation topic caught my eye. It promised a boom year for tech in 2021, buoyed by “transformation”. I tend to agree that this year will be generally good for tech, but I’m not sure about the role transformation will play in that. In fact, I think that transformation’s impact is likely a couple years further in the future.

All surveys are fraught with peril, as they say. Most of them don’t target people who are actually tuned into the topics and trends being addressed. Even those that do are prone to the survey equivalent of the Stockholm Syndrome: say what your agent wants to hear so they’ll love you and respect you. I’ve worked hard since 1989 to get actual, good user data, and I’ve found that on many topics it’s utterly impossible to do that unless you sneak up on the topic. Don’t ask “Are you using 5G or IoT?”; ask about current plans without suggesting specific targets. That’s only a starting point, of course.

Another thing that has to be guarded against is “connective bias”. You ask about Thing A, you ask about Thing B, and you connect the responses to the two even though the subject may not have intended that. In the article I’ve cited, CIOs listed strategic priorities as AI, quantum computing, and 5G, but were these the source of a spending boom? I just finished a round of Q&A with enterprise planners in every major market geography, and none of these were considered to be spending priorities for 2021. Further out, sure. Now? No. How, then, would these strategic priorities be influencing the boom expected this year? There is no credible connection.

The boom we’re almost surely having this year is due primarily to two factors. First, companies were cautious in spending in 2020 because of COVID and the lockdown, and they focused their budgets on dealing with those unexpected conditions at the expense of other priorities. Now they have to catch up. Second, the recovery we’re starting to see is boosting current economic growth, which almost always boosts corporate revenues, and budgets tend to be looser in periods of strong revenue growth.

What about the strategic priorities, if we look beyond 2021? I don’t doubt that CIOs would respond to a strategic-priority question that way, because what CIOs are always looking for is an opportunity to use IT transformationally. Since the dawn of commercial computing in the 1950s, we’ve seen three cycles of roughly seven to ten years each, where a new technology paradigm created a surge in IT spending growth relative to GDP growth. If you graph the two together, in fact, the result looks like a sine wave with three peaks. The last one died out in around 1999, and we’ve not had a renewal since. That’s the longest we’ve ever gone without an IT spending wave, and CIOs are hoping for another one for obvious reasons.

The missing link here, both between hopes and actual spending waves and between 2021 IT spending and strategic technologies, is productivity. What transforms a business is what transforms how the business is done, which is how workers do their jobs. All our prior waves of IT success were based on a paradigm that offered productivity benefits. It was the benefits that drove the spending, because they created projects with ROIs that far surpassed the corporate targets. There were dozens of technologies that failed to achieve that, so we can’t say that “technology” will drive digital transformation, no matter what technology it is. It’s the productivity benefits that are in the driver’s seat, and that means we have to at least recognize how to secure the benefits to validate the transformation.

Why have we not been able to do that since 1999? Because any process of successive transformations will tend to pick low apples first. The first wave of IT empowerment in the 1950s and ‘60s was driven by batch computing, which was a revolution because we didn’t have any form of computing before. The second wave was driven by online transaction processing, which was clearly way more efficient than punched cards, and the third wave by personal computer empowerment of workers. All of these were, in a sense, easy wins. All brought IT and its information resources closer to the workers. What’s next?

It wasn’t just “the Internet” or “remote work”, because while both those things had a major impact on some workers, and also on how companies sold things, they didn’t alter the nature of most jobs, where prior waves did. We’ve kind of run out of low apples in improving the traditional way we do business, and we now have to start looking at the non-traditional.

CIOs are telling us that with their strategic priorities, which reflect things they see as profoundly different. According to the survey, CIOs said AI was the “most industry-impactful technology” in the longer term, and yet we have AI today and it’s not been transformational in the sense that other wave-generating technologies were. The reason is that we think AI should be great, even if we don’t exactly know how to make it so. That kind of project plan isn’t a project plan at all, it’s a novel.

I’ve blogged many times on what I personally believe would have to be our next-wave driver. It’s what could be called the “digital twin” approach. You use IoT and analytics to create a model of a real-world business, populated with real-time information on every aspect of its operation. You can then use this model to plan, direct workers, move goods, and so forth. One problem with the approach is what operators call “first cost”, the amount of money you have to spend to get that digital twin in place, but it’s not the biggest problem.

The biggest problem is delayed gratification for vendors. Buyers will never sit around planning technology advances that nobody is selling. That’s a waste of effort. Vendors will never plan transformational technologies that overhang current solutions and the sale of current products. Thus, we have nobody stepping up to do the necessary lifting.

There could be a major transformational role for AI, IoT, 5G, quantum computing, and other glamour tech concepts, but a role is not the whole play. We need a transformational model into which we fit transformational technologies, and the possible sources of this are all sitting on their hands. I hope a startup with good backing and good positioning will break us out, because I don’t see any other good options.

Can Amazon Sidewalk Teach us Federation Lessons?

IoT has been a cherished potential revenue opportunity for 5G for years, but there have been skeptics of the link for just as long—including me. The challenge for 5G IoT is that unless the service is free, people would have to pay to connect IoT devices. Service providers are OK with that (obviously) but consumers and enterprises are more than skeptical. We have IoT in many homes and offices without paying for connection. Why not continue with that model?

One reason is the limited range of traditional IoT, which is based on WiFi or Bluetooth in most cases. As COVID lockdowns ease and people start moving around more freely, they’re more likely to encounter situations where their home/office IoT strategy doesn’t work well for them. Amazon’s Sidewalk may be an answer, and may also be a path to a general solution. That doesn’t mean it would present an easy path to tread.

Sidewalk is an Amazon strategy for providing specialized device access across multiple WiFi/Bluetooth domains, without creating a link between the networks that would present obvious security/compliance risks. Sidewalk lets Amazon pass some IoT data from one home/office location to another in a secure way, extending access to IoT elements. Users have to opt in to use Sidewalk, can opt out as well, and thus the coverage Sidewalk provides depends on just who’s willing to share their resources. I’ve not heard of any performance or security issues associated with Sidewalk so far, but some who have attempted to use it have found that few neighbors have signed up, so it offered them no benefit.

I’m not promoting Sidewalk here, in large part because of the limitations that users’ willingness to share imposes. I do wonder if some aspects of the Sidewalk model might be a better solution for many IoT applications than a pay-for 5G service.

Sidewalk is a form of “federation”, where “services” of a network are shared with other networks. We’ve been kicking around federation in telecom for decades, in the form of interexchange or international calling. Network-to-network interfaces (NNIs) provide for the extension of service connectivity in almost every network service that’s widely used, from phone/SMS to the Internet. The problem with NNIs is that they offer network-level connectivity, which is too broad to support residential or business IoT access without major security/privacy risks.

Over the last twenty years or so, operators have explored making federation more flexible than simple NNIs. As we’ve come to think of a “service” as something that’s composed of a series of components or elements, sharing those service pieces has looked attractive to some operators. The Open Grid Alliance (OGA) seems to be targeting a form of this sort of sharing or federation, and in the past there have been initiatives from the TMF and the IPsphere Forum with the same target.

For IoT, federation seems likely to be offered at the device level, meaning that a network user would share access to their network, not in general, but to support accessing a device or devices that met specific requirements. For example, Sidewalk supports federation of devices that are on-network to a given user, but owned by another. There’s no network-sharing, only sharing of specific access to specific things that are on a foreign network.

A “tag” applied to an object to allow it to be found is an example of something that could be federated. The rule might be that 1) the tag is owned by the user requesting access, 2) the tag is in communication with an on-network gateway element that can detect it and its properties, 3) the tag, the tag owner, and the network owner are fully authenticated by the federating authority, and 4) the use of the tag federation link matches the limits set by the network owner.
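Those four rules amount to a gate function a federation agent would evaluate before granting access. Here’s a hypothetical sketch of that check; all the names (`TagRequest`, `grant_access`, and so on) are invented for illustration and aren’t taken from Sidewalk or any standard.

```python
# Hypothetical federation gate for the four tag-access rules.
from dataclasses import dataclass

@dataclass
class TagRequest:
    tag_owner: str        # who owns the tag
    requester: str        # who is asking to reach it
    gateway_online: bool  # an on-network gateway can see the tag
    authenticated: set    # parties vetted by the federating authority
    usage_allowed: bool   # within limits set by the network owner

def grant_access(req, network_owner):
    """All four rules must hold; any failure denies access."""
    return (
        req.requester == req.tag_owner                           # rule 1
        and req.gateway_online                                   # rule 2
        and {req.tag_owner, network_owner} <= req.authenticated  # rule 3
        and req.usage_allowed                                    # rule 4
    )

req = TagRequest("alice", "alice", True, {"alice", "bob"}, True)
assert grant_access(req, "bob")          # all rules hold: access granted
assert not grant_access(req, "carol")    # network owner not vetted: denied
```

Note the fail-closed design: the gate grants nothing unless every condition is affirmatively satisfied, which is the posture a cross-network access mechanism would need.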

Having the ability to find a lost object that was dropped in another yard is an obvious example of an application that could have appeal, particularly if we presumed that access to a federated network to search for something would be granted only to users who elected to share their own networks for the same purpose. That might be especially valuable in neighborhoods or office parks where a formal association or a social connection among residents could promote collective decisions on federating network access.

The same kind of socialization might promote another level of useful federation, which we could call “app-specific connectivity”. Just as a tag could be certified as a federated access device, we could certify an app that’s looking for the tag. That app could then pass requests, via the federation agent (Amazon, for Sidewalk), to a device on another network, subject to the same rules I noted above. That way, if you’re trying to control your sprinkler system from the back yard, where you have a zone installed, but don’t have WiFi there, you could bridge through a neighbor network where access was provided.

There are a lot of ways of looking at this sort of relationship, but I think the only one that makes sense overall is that a service would be defined to represent access to devices or features on one network by users, devices, or software on another. In other words, we should be viewing this the way a service provider would view it, as a formal federation. There are a number of good reasons for that.

The first reason is that defining this as a service would define everyone’s role in it. The problem with having some sort of network-pass-through is that there are a lot of pieces that would have to cooperate and no way to ensure they all did. If my phone is going to control my sprinkler through my neighbor’s WiFi, without making me a full member of the neighbor’s network, then it’s almost certain that there would have to be a gateway element that would act as a proxy on both networks and pass only the proper messages through it. You can’t do gateways without knowing, and having the support of, both sides of the picture.

The second reason is that having support for this cross-network access is not likely to be free in all cases. The utility of having the ability to create a kind of mesh network of individual WiFi and Bluetooth networks is dependent on the coverage you can achieve. Might we see companies, service providers, and other entities deciding to pay to have their stuff cross-linked as a way of making their product/service more valuable? Could a cable company like Comcast, who aspires to create a kind of WiFi-cellular equivalent network, pay users to participate, perhaps in the form of a rebate? In any event, you have to be able to journal what you’re charging for, and prevent misuse or overuse.

The third reason is that we can’t have every possible device and application from every possible supplier come up with their own proprietary approach. The implementation burden, the integration efforts to get everything connected, and the security challenges, would quickly become insurmountable.

If we could get a formal standard for this sort of cross-network gateway access, one that addressed all the requirements I noted above, we could create a platform for mobile IoT without deploying any specialized technology. This wouldn’t address all the mobile IoT applications that might be targeted by 5G, but it would almost certainly address all the requirements associated with home/office IoT. Some 5G proponents might not like that, but there was never any realistic chance residential users would pay for 5G connectivity when they could use the WiFi they already have.

Federation formality (to be euphonic) would be an asset to the network operators too. I’ve already noted the OGA initiative here, though I’ve not seen anything to suggest that the body has the right approach to the problem. I’d also cite THIS TelecomTV piece, which postulates a form of NaaS-think as a pathway to operators sharing infrastructure. That was the goal of most of the past federation initiatives, and those efforts made a lot of progress even though no agreement was reached as a result.

Federation, in a nutshell, is a step toward NaaS, toward the idea of abstracting a “network” as a set of services and then assembling services to create what appears to the user as a traditional network. If Sidewalk forces us to think in those terms, if it stimulates the Open Grid Alliance to think about federation the right way, it’s served a useful industry purpose whatever happens to it as an Amazon feature.

Getting a Handle on Security

Why do we seem to have so many problems with IT and network security? We hear about a new attack almost every day, a new risk, a new set of cautions, and (of course) new products. You’d think that given the long history of bad actors in the space, something effective would have been done by now. It hasn’t, clearly, and we can’t hope to do better if we don’t understand why we’ve done badly so far.

It’s fair to say that almost all the security problems we have stem from or are facilitated by the Internet. The Internet has become a major fixture in our lives, even the center of many lives, and providing Internet access both at home and at work is broadly seen as mandatory. Unfortunately, there are major problems with Internet security, and that opens us up to a lot of risks.

The first, and biggest, problem is that the implementation of the Internet is inherently insecure. We presume open connectivity unless somebody closes it, and of course it’s hard to close off a connection to someone you didn’t even know about. We exchange links that can represent anything, and with one click we can be compromised. We don’t authenticate sender addresses, or much of anything else, and there’s broad resistance to making things on the Internet more traceable. I understand peoples’ reluctance to be tracked, but ironically many who feel that way nevertheless accept all cookies and end up being tracked in detail.

The second problem, nearly as big, is that we’ve tried to improve Internet security by band-aiding symptoms of problems rather than addressing fundamental issues. We add things like firewalls to protect us from unauthorized access, companies can examine emails and attachments for malware before they’re delivered, and we scan our computers with anti-virus software regularly to catch whatever might have been missed at the delivery level. A total security model would have been helpful, but most people and companies tend to think that extensive security improvements are simply too inconvenient or too much work.

The third problem is that we’ve turned the protocols of the Internet into the universal network protocol for businesses, without addressing the limitations I’ve already noted. In fact, many businesses apply no real control over information exchanges within their companies, making VPNs less secure than the Internet if we’re considering an enemy within.

The fourth problem is that personal interest in content and information has encouraged us to be relatively indiscriminate consumers of online information, often on personal devices that do double duty as business devices. Something can be planted on our devices while we’re using them for our own purposes, even at home, and that something then becomes the enemy within I just noted. They don’t call this sort of thing a “Trojan Horse” for nothing.

The final problem complicates everything else. Vendors make money selling security, in many cases more than they make selling network equipment. Users have gotten project approvals for early solutions, and it’s difficult for them to tell the CxOs that those early purchases are now obsolete. I see a lot of this sort of inertia limiting the willingness of sellers to promote what we know is a better approach, and a lot of buyer reluctance to admit earlier strategies weren’t optimum.

We could now, if we were optimists, talk about the steps that could be taken to address each of these issues, and claim with considerable justification that those steps would fix the security problems. The problem is that there is virtually no chance that the “right” solution to each of the problems I’ve noted here would be adopted. The impact on users and network operators would be enormous, and there’s no authority on the planet that could compel everyone to play nice with all the solution strategies. We need a more contained approach, and I think we can define five things that could be done to improve security significantly.

The first thing is to implement zero-trust connection security within a business. What that means is that all connections between network-addressed elements would be considered barred unless explicitly authorized. It means that there’d have to be a fair amount of up-front work done to get the trusted relationships defined, but role-oriented hierarchies could simplify this considerably. Once we have a trust list, any attempt by any network element to connect off-list would be journaled, and repeated attempts could then take the element out of the network until its behavior was explained and remediated as needed.

This process should be extended with respect to “critical” resources. Any attempt by another non-trusted element to access a critical resource should result in the element being taken off-network for examination. This could mean that the offending element was infected, or was actively attacking.
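To make the trust-list idea concrete, here’s a minimal Python sketch of how the allow-list, journaling, and off-network quarantine could interact. The element names, the critical-resource rule, and the three-strikes threshold are my own inventions for illustration, not any product’s behavior:

```python
from collections import defaultdict

class ZeroTrustGate:
    """Toy zero-trust gate: connections are barred unless explicitly trusted.

    Elements, critical resources, and the quarantine threshold are
    illustrative assumptions, not a real product's API.
    """
    def __init__(self, trust_list, critical, max_violations=3):
        self.trust_list = set(trust_list)   # allowed (source, destination) pairs
        self.critical = set(critical)       # resources that trigger immediate quarantine
        self.journal = []                   # record of every off-list attempt
        self.violations = defaultdict(int)
        self.quarantined = set()
        self.max_violations = max_violations

    def attempt(self, src, dst):
        if src in self.quarantined:
            return "quarantined"
        if (src, dst) in self.trust_list:
            return "allowed"
        # Off-list attempt: journal it, then escalate on critical targets or repeats
        self.journal.append((src, dst))
        self.violations[src] += 1
        if dst in self.critical or self.violations[src] >= self.max_violations:
            self.quarantined.add(src)
            return "quarantined"
        return "denied"
```

The point of the sketch is that the heavy lifting is in building the trust list up front; the runtime check itself is trivial, which is why role-oriented hierarchies (which shrink the list) matter so much.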

Thing two is to apply central virus scanning to all internal emails. Many companies don’t scan internal emails for malware, which means that an infection in one system, even one with no rights to access a critical system, could be spread to another who has full access rights. Secondary vectors of attack like this must be eliminated if zero-trust access security is to be meaningful.
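Conceptually, the central hook is simple: every attachment gets checked against a signature set before delivery, whether the message is inbound or purely internal. This toy Python sketch shows only the shape of that check; the signature set is invented, and a real scanner uses vendor feeds and far more sophisticated detection than content hashes:

```python
import hashlib

# Hypothetical known-bad signature set; a real scanner would use a vendor feed
# and behavioral analysis, not a single hash list.
KNOWN_BAD = {hashlib.sha256(b"EICAR-TEST").hexdigest()}

def scan_message(attachments):
    """Return the names of attachments whose content matches a known-bad hash.

    `attachments` maps filename -> bytes; this stands in for the central
    scanning hook a mail server would apply to internal as well as
    inbound mail, before anything is delivered.
    """
    flagged = []
    for name, data in attachments.items():
        if hashlib.sha256(data).hexdigest() in KNOWN_BAD:
            flagged.append(name)
    return flagged
```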

The third thing to do is to create private subnets for critical applications. All too many enterprises have a single address range for all their users and applications. That makes it hard to control access and hacking risk. If critical applications are in their own private subnets, the components of these applications can access each other, but the private address space means these components can’t be addressed from the outside. Those APIs that represent actual points of access can then be exposed explicitly (via NAT), with stringent access controls based on zero-trust principles.
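The reachability logic this implies can be sketched in a few lines of Python using the standard `ipaddress` module. The subnet, the exposed API endpoint, and the port are made-up examples of the kind of layout I’m describing:

```python
import ipaddress

# Hypothetical layout: critical app components live in their own private subnet,
# and only explicitly exposed (NAT'd) APIs are reachable from outside it.
CRITICAL_SUBNET = ipaddress.ip_network("10.200.0.0/24")
EXPOSED_APIS = {("10.200.0.10", 8443)}   # the one API deliberately published

def reachable(src_ip, dst_ip, dst_port):
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src in CRITICAL_SUBNET and dst in CRITICAL_SUBNET:
        return True                       # components talk freely among themselves
    if dst in CRITICAL_SUBNET:
        # Outside callers reach only the explicitly exposed API endpoints,
        # and even those should be subject to zero-trust access control
        return (dst_ip, dst_port) in EXPOSED_APIS
    return True                           # traffic not aimed at the critical subnet
```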

Thing four is to create virtual air gaps. Most people are familiar with the concept of an “air gap”, meaning a break in network connectivity that isolates a system or systems from the outside world. In the real world, it’s often difficult or impossible to completely isolate critical systems, without making it impossible to use them. However, what could be done is to eliminate network connectivity except through what I’ll call a “service bridge”. This is a gateway between a critical subnetwork and the main company network that’s created by a pair of proxies, one in each network, that pass only service messages and not general network traffic.

Linking a critical system’s subnetwork to the company VPN, even based on the prior private-address technique, is a risk if a system that’s permitted access can be infected. If all interactions across the boundary to the subnetwork are “transactional” then nothing can be done that’s not been predefined as a transaction.
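Here’s a minimal Python sketch of what the service-bridge proxy check could look like: only predefined transactions with exactly their expected fields pass the gap; everything else is dropped. The transaction names and schemas are invented for the example:

```python
# The service bridge passes only predefined "transactions", never raw traffic.
# Transaction names and field schemas here are invented for illustration.
TRANSACTIONS = {
    "read_sensor":  {"sensor_id"},
    "set_schedule": {"job_id", "start_time"},
}

def bridge(message):
    """Forward a message across the virtual air gap only if it is a known
    transaction carrying exactly the expected fields; drop everything else."""
    kind = message.get("transaction")
    schema = TRANSACTIONS.get(kind)
    if schema is None:
        return None                       # unknown request: dropped
    if set(message) - {"transaction"} != schema:
        return None                       # extra or missing fields: dropped
    return message                        # well-formed: pass to the inner proxy
```

The strictness is the point: because nothing that isn’t a predefined transaction crosses the bridge, an infected system on the company VPN can’t use general network connectivity to reach the critical subnetwork.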

The fifth thing is to use separate management, monitoring, and platform software tools inside the critical-system subnets, not a single strategy across all network elements and IT components. There should be no sharing of tools, resources, etc. between critical systems and the broader company network and IT environments. Systems tools represent an insidious back-channel path to the heart of almost every application in a company, as the SolarWinds hack proved. Not only that, operator errors and configuration problems often open security holes, and if critical systems are managed by the same tools that manage general connectivity and IT deployments, the number of moves, adds, and changes made overall creates a greater risk that one of them will accidentally create an issue with a critical system.

Container use would simplify all of this. Think of critical systems as being in one or more separate Kubernetes clusters, with all associated tools confined within the cluster boundaries. Containers would also facilitate the standardizing of application packaging, which would reduce operations tasks and thus reduce errors. However, as already noted above, critical systems should be isolated at the platform/tool level, so container systems should partition them in different clusters, each hosting an independent Kubernetes instance, and all federated via a tool like Anthos. For the federation process, it’s important to protect the network pathway from Anthos to the clusters so it doesn’t become a risk source.

Beyond these five things, it’s important to address any issues of physical security that are a concern. If active physical security threats are possible, rather than remote hacking, then the network equipment needs to be protected in a secure room and cage, with access control. In addition, all systems inside the security perimeter need to have USB ports and network ports disabled to prevent someone from introducing a new device inside the subnet. Additional protection can be added by eliminating DHCP address assignment in favor of assigning IP addresses to devices explicitly, to prevent someone from adding a computer to the secure subnetwork and having it obtain a trusted address.

We have to close with tools and practices. While there are publicized vulnerabilities that are exploited by malware regularly, many of these rely on improper security practices to spread and do significant harm. The great majority of problems with networks and IT infrastructure are self-inflicted wounds. Poor password practices, improper network setup, and failure to isolate critical systems and protect “internal” application APIs all create an environment that can generate its own risks and increase your vulnerability to outside forces. Moral: there’s no substitute for planning your network and IT infrastructure and systematizing your operations.

A Cloud-Native 5G/O-RAN Model

Why do I make so big a point about “box-centric” specifications for network virtualization? If somebody virtualizes a box, isn’t it the same thing as virtualizing everything that’s in it? In a blog last week, I looked at the 5G O-RAN specification and talked about some of my issues in abstraction of the functionality. I want to dig deeper today, looking at how you’d do O-RAN in an optimum cloud-native way. As always, I want to start from the top.

We get online from our desktop and laptop computers all the time. We sign on to WiFi, we’re assigned an IP address, and we’re there. Let’s call the “service” that this represents “Service”. Why doesn’t this simple approach work for 5G? The answer is in the word “mobile”.

In a mobile network, we have a device that we move around with, and the connection is made from a series of cell sites. We move through these sites as we walk or drive. If each of these sites were like WiFi, we’d get an IP address in each of them as we entered, which would be fine if we never expected to have a voice or data session active while we were moving. Since we do, we could expect to get an IP address from the cell site when we turned on our phone, and as we moved we’d need to keep that address; otherwise any ongoing web activity would be broken, because the server would reply to an address we no longer had, one that would no longer reach us.

What makes mobile connectivity work? We need to define another service, which I’ll call “Mobility Management”. This service somehow finds out where we are and ensures that our traffic is directed to the right cell. The Mobility Management service needs to have a “sub-service” we’ll call “Registration”, that’s invoked when a user appears in a cell, to ensure they have a right to service there. The figure below shows this simple functional structure.
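To show how these functions divide, here’s a toy Python sketch of a Mobility Management service with its Registration sub-service: registration admits an authorized device to a cell and assigns it an address once, and handovers redirect traffic to the new cell while the address stays fixed. The cell names, the authorization list, and the address scheme are all invented for the example:

```python
class MobilityManager:
    """Toy Mobility Management service: Registration admits a device to a cell,
    and handovers move it between cells while its IP address stays fixed.
    The authorization list and address scheme are made up for illustration."""
    def __init__(self, authorized):
        self.authorized = set(authorized)
        self.location = {}                 # device -> current cell
        self.address = {}                  # device -> stable IP, assigned once

    def register(self, device, cell):
        if device not in self.authorized:
            return None                    # Registration sub-service rejects
        self.location[device] = cell
        # Assign an address only on first registration; keep it across moves
        self.address.setdefault(device, f"10.0.0.{len(self.address) + 1}")
        return self.address[device]

    def handover(self, device, new_cell):
        if device not in self.location:
            return None
        self.location[device] = new_cell   # traffic now directed to the new cell
        return self.address[device]        # same IP: sessions survive the move
```

The WiFi model described earlier is what you’d get if `register` ran fresh (with a new address) in every cell; mobility is precisely the separation of “where the device is” from “what address it has”.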

A mobile device needs to support this set of services, then, meaning that a matching set of interfaces exists between the device and the mobile (5G) network. We could draw 5G as a black box and show those service interfaces to the device, period. If the mobile device and the network conformed to the same spec (including the radio spec, of course) for these interfaces, any implementation would suit the device and the services. This sure makes 5G look simpler, right?

A High-Level Functional View of 5G

My diagram is a functional view, whereas the specs focus on the implementation of the services, and that’s my first quarrel with most telecom-related specs. They either don’t produce a functional diagram at all, or they produce one by defining how the function could be performed. We draw these diagrams with boxes representing functions, and we show workflows between interface points that we label. The diagram below represents O-RAN as the Alliance shows it, and you can see the difference.

O-RAN Architecture (from O-RAN Alliance)

What you see here is focused so much on implementation it doesn’t even show those high-level interfaces at all. That’s because O-RAN is really focused on supporting the Mobility and Registration services, and those services are driven from the network side. What you do see are the internal interfaces, things like A1, E1, E2, and F1. You also see functional blocks like CU-CP and CU-UP or O-DU and O-RU. You might wonder where the user is in all of this (nowhere) and where and how the whole process represented in the diagram kicks off.

The temptation is to declare this to be an “application”, something that’s running to supply those Mobility and Registration services, which are triggered by things that are found by the “Application Layer” of the near-RT RIC. The Mobility and Registration services created in that layer interact with features in the device, visible to the user only in the way mobility works. Actually, mobility interacts with deeper 5G standards, 5G Core, for the whole of the service, and if you were to look at a picture of 5G NR (the RAN piece) and 5G Core combined, you could identify users, cells, and the Internet.

The problem here is that declaring something to be an application doesn’t provide a hint about the software architecture it uses, and thus doesn’t provide any insight into how it could be made “cloud-native”. If we could travel back in time, I’d suggest that the O-RAN people start with a functional model similar to the one I showed earlier, but of course that’s not going to happen. We’re also not likely to see any changes in O-RAN that would violate the Alliance’s architecture diagram above. What we’ll have to do is try to interpret the O-RAN diagram in light of the functional model diagram.

If we take the functional diagram as a starting point, and assume the “service” structure I described above, we can say that a “real-world cloud-native” O-RAN implementation starts with two event sources, the User Devices and the Access Network. The user device initiates or participates in registration, and it surely initiates requests for Services like voice/SMS or Internet access. The Access Network initiates/participates in roaming between cells and registration.

Events from these two sources would pass to the near-real-time RIC for handling, and we could visualize this function as being a variety of things, in cloud terms. At the superficial level, it might look like an API broker that pops an event queue and initiates a request for an appropriate microservice to handle the event. It could also be a service mesh with the same goal. Finally, we could presume that since something, somewhere, has to know about a User Device relationship to both 5G infrastructure and Services, it could be a state/event-based microservice set linked to a “relationship record” associated with the device. I’d favor this approach.
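The state/event approach I’d favor can be sketched in a few lines of Python: each device has a “relationship record” holding its current state, and the (state, event) pair selects the handler, which stands in for a microservice. The states, event names, and handler actions here are invented to show the shape of the design, not anything from the O-RAN specs:

```python
# Sketch of the state/event approach: each device has a "relationship record"
# holding its current state, and (state, event) pairs select the handler
# (standing in for a microservice). States, events, and handlers are invented.
def on_attach(rec):
    rec["state"] = "registered"
    return "run-registration"

def on_roam(rec):
    return "update-cell"

def on_detach(rec):
    rec["state"] = "idle"
    return "release-resources"

HANDLERS = {
    ("idle", "attach"):       on_attach,
    ("registered", "roam"):   on_roam,
    ("registered", "detach"): on_detach,
}

def dispatch(records, device, event):
    rec = records.setdefault(device, {"state": "idle"})
    handler = HANDLERS.get((rec["state"], event))
    if handler is None:
        return "ignored"                  # event not valid in this state
    return handler(rec)
```

Because all state lives in the relationship record rather than in the handlers, the handlers themselves can be stateless, scalable, and redeployable, which is exactly the cloud-native property the container-wrapping approach fails to deliver.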

The “orchestration” function of the near-real-time RIC is a deeper question. Near-real-time suggests that there’s a need to minimize latency, which to me suggests that we don’t want to use serverless hosting of functions. In any event, I think that the 5G O-RAN instance is likely going to see enough events that there’s little value in loading a microservice on demand and then aging it out; there’s little chance it wouldn’t be needed almost immediately.

Containers seem to be a logical approach for hosting, which might make RIC orchestration (for both the near- and non-real-time RICs) a Kubernetes application, perhaps with the addition of a federation strategy like Anthos. I’m more convinced of the need for federation here than of the need for Kubernetes in the near-real-time RIC. It may be that resources within an O-RAN “enclave” would be dedicated to O-RAN, in which case generalized container orchestration could be overkill. However, if you believe in edge computing, then we need to generalize as much of the RIC processing as possible, even including event-handling, to make it a platform software component for the edge. In that case, a RIC would be truly an edge application.

If we were to accept this model, then the boxes in the O-RAN diagram become microservices or flows of microservices driven by the relationship record. That would tend to converge boxes that the diagram shows separately (the CU and DU, for example), since these do much the same thing but at different points in an event flow.

The “user plane” of the 5G flow, my “Services” block, is something that I think should be more abstracted. In 5G, the user plane starts with the RU-to-DU “front-haul”, then DU-CU for the “mid-haul”, and finally CU-5GC (5G Core) for “backhaul”. Most of the heavy lifting is done in the 5G Core Packet Core, which itself has a pair of Gateways, but all of the elements perform only basic functions that could easily be viewed as hosted either on servers or on disaggregated white-box components of a router (like the DriveNets model). Rather than specify a bunch of discrete functions, I think we should view the entire user plane as a “service” as my diagram suggests, with an interface facing the cell access and user device portions on one end, and the service networks on the other.

There are two reasons for this view. First, there are multiple ways in which a fixed IP address associated with a mobile device could be “found” in the right cell, and we should be open to all of them. Second, I think that the abstraction of the user plane as a “Service” promotes loose coupling between a 5G O-RAN or Core implementation and the data plane, which reduces the risk of having an open model become “closed” in some implementations. In 4G Evolved Packet Core, we have “admission control” elements that regulate call and data traffic to reduce congestion risk; the equivalent 5G Core functions would be represented by APIs rather than boxes in my model.

This is how I think a true cloud-native 5G model could be derived. I’m sure there are other approaches, but I think this does a fair job of balancing the goals of cloud-native software, the future edge computing evolution, and the structure and interfaces specified in O-RAN and 5G Core. Let’s hope it at least generates some discussion on these points, before we end up turning O-RAN into one more virtual-boxes structure.

Unraveling the Mysteries of Enterprise IoT

Why do enterprises think IoT is a key to transformation, but at the same time seem to be vague on the “Why?” Enterprises have been singing IoT transformation praises for over a year, and the same point has emerged in other analyst reports and surveys, but there’s rarely been a lot of detail offered on just what IoT was going to do. I’ve been trying to model IoT’s future, but I’m having problems getting useful data on future plans. So, keeping up my recent effort to track the viewpoints of key technology planners, I dug deeper into enterprise IoT, and found some interesting stuff.

One thing that struck me pretty quickly was a point of commonality between enterprise planner-think and the way that network operator planners conveyed their own views. Neither of them took a top-down, architectural, approach. Neither expected to drive transformation by adopting some key technology. Instead, they both thought of hosting, of mechanization, and assumed that the mission for IoT would come along on its own. They saw their own role less as an evangelist for IoT missions than as a mechanizer of whatever missions emerged.

Almost 90% of enterprises say they use IoT technology, and 100% say they plan to use it or expand their current use. These numbers are a bit less impressive if you consider that enterprise planners view anything that uses remote sensor/controller technology as IoT (not surprising given that media coverage does the same). Still, 100% is impressive. Does this mean that a huge enterprise base thinks IoT is indeed critical to transformation, and has specific plans to adopt or expand that application? No to both.

One interesting data point in the enterprise transformation space is that enterprise planners are divided with respect to what “transformation” is. About a quarter of them say it’s a “revolutionary shift in how their company does business”, which doesn’t necessarily imply that it’s rooted even in IT, much less IoT. Another quarter says it’s “better use of information to guide decisions”, which at least implicates IT. A third quarter says that it’s “changes in business practices/processes to optimize productivity”, which is again fairly non-specific, and the final quarter says that transformation is “a shift in IT to significantly reduce costs and improve benefits”. The point here is that none of them specifically link transformation to IoT.

OK, obviously these planners are going to have to be asked more pointed questions to get useful responses. The obvious one is “What role do you see IoT playing in transformation?” This brought at least some more specific answers, but not necessarily satisfying ones. Forty percent of planners said that IoT would “improve manufacturing and the movement of goods”. Twenty-five percent thought IoT would “enhance the use of AI”, twelve percent said that it would “facilitate robotics”, another twelve thought it would “cut human effort expended in routine tasks”, and seven percent said it would “reduce carbon footprint and energy consumption”. The remaining three percent had no insight into the role.

We’re still not getting much to validate the notion that enterprises think IoT is transformational, or at least are convinced enough to have looked at specific options. Time to try “What if?” games. What if workers could be directed to a specific place, product, or part based on knowing what they were supposed to be doing? Every planner thought that would be helpful. What if people walking by a product could be alerted to its price, based on the same principles as are used to target advertising? They all liked that too. Improve manufacturing by coordinating delivery of all the needed parts to the right place in the factory, at the right time, and at the same time ensuring that the deliveries didn’t create congestion? Loved it. In fact, every specific application I cited got almost universal backing from planners, the same people who didn’t mention any of them spontaneously.

The likely reason for planners’ inability to come up with specific transformative IoT examples is that they’ve not thought about it much. How do you square that with almost-universal IoT adoption? The answer is that enterprise planners think of technology products or elements in a specific-mission context, but think of technology concepts in very general terms. Digging into current enterprise IoT, what I found was that enterprises tended to get IoT as part of a broader technology suite. Almost 80% of enterprise planners said that their IoT deployments were project-based rather than technology-driven. Even among the 20% who said they’d acquired IoT elements directly, three-quarters were augmenting prior project-based deployments. They changed manufacturing, or warehousing, or patient care, and the changes included some IoT.

Enterprises, then, are indirect consumers of IoT. They combined mission and overall facilitating technology into a single deal. What this would mean is that to promote transformation via IoT to enterprises, you’d have to create a packaged solution to a recognized problem or opportunity. You’d not just present sensors, controllers, and other IoT elements and expect the enterprise to deploy them in combination with other elements they somehow knew about, to do some unspecified but important task.

There are specific facts to back this assessment. In three specific verticals (manufacturing, healthcare, and warehousing), planners told me that their current IoT commitments were created through packaged solutions. They didn’t install IoT to track drugs or move goods, they installed tracking and movement technology and IoT rode along. Many of the planners didn’t even know the details of their IoT systems because they were part of something bigger.

The one area where planners saw what might be a generic IoT requirement they could address explicitly was networking. Obviously, IoT devices need to be connected, and whether it’s wireless or wired, the facilities are likely to be provided independent of the applications or “packages” that provide IoT devices and other technology elements. The use of WiFi 6 for IoT was the network interest of three-quarters of planners, where 5G was in view for only a bit over 20%. Most planners expected to use technology already in place, however.

Clearly, all this has significant implications for IoT adoption. First, as usual, a lot of what we hear about IoT is really vendor-driven, and even what users say they’re doing is really very high-level consideration or simple curiosity rather than actual project planning. Vendors tell me that hype waves like the one we’ve seen in IoT and the one in 5G are ways for them to drive customer engagement. They can get an appointment talking about something that’s hot, and they are unlikely to get it trying to push the same story they tried the last time.

The second meaning compounds the first, too. Project-driven IoT is less likely to be seen as “IoT” explicitly, and as I’ve already noted, IoT elements may be secondary to the mission to the point where they’re not particularly conspicuous. That’s made it difficult to even survey IoT reliably or discuss real-world IoT projects, which of course means that there are fewer thoughtful and legitimate stories. It’s why getting information for my modeling has been difficult.

For vendors who recognize that they actually need an effective IoT sales/marketing and product strategy, project-centric IoT means that sales through vertical-market integrators may be the most important channel between them and their prospects. Thus, a strong channel program is probably the biggest asset in all of IoT. Not just any channel program, though. What’s needed is a channel program that encourages serious integration skills, partnerships outside the IoT equipment space, software skills—in short, more than a typical partner would supply. The term “VAR”, for value-added reseller, comes to mind, though the concept has been weakened over the years in terms of what the value-add might be.

There’s a lot of commonality between operator planners and enterprise planners, or so I think. Both of them are focused on implementing something rather than deciding what to implement. They’re lower on the food chain than the critical top level where the business case is decided, which means that these planners aren’t really going to drive transformation, only implement a momentum that develops from somewhere and someone else. If we want transformation to happen, we need to figure out where and who that might be.

Six Data Points on O-RAN

In a single day last week, I saw six news items that demonstrate the market, business, and technology challenge that’s posed by 5G. None of this tumult means that 5G isn’t going to happen, but it seems to me to demonstrate that we’re not yet really sure how it’s going to happen. The who-wins and what’s-offered pieces are still very much up in the air, which of course means that vendors face a period of unparalleled 5G opportunity and risk.

On the service provider side, we have an interesting mixture. India is likely postponing their 5G auction but approved 5G trials. This was done to allow bidders to arrange the financing of spectrum deals that were expected to be costly, as they’ve been in other markets (recently, the US). Then Malaysia invited bidding for their planned “wholesale 5G” network, one of the first examples of a government attempt to create a 5G service model similar to the wireline-focused NBN in Australia. Obviously, there’s a fear that reliance on traditional carrier-funded deployments of 5G might disadvantage the country.

The point here is that 5G isn’t free for providers, so if consumers are unwilling to pay more for it, deployment is a financial balancing act. In countries where cellular services are highly competitive and where the total addressable market is reasonable, operators are able to stomach the cost because network expansion via 5G technology is essential in preserving their market credibility. Elsewhere, things aren’t as rosy, but even where 5G is budgeted, there’s an awareness that the cost might put operators under financial stress for years.

This is the principal driver behind the interest in open-model 5G in general, and O-RAN in particular. Sure, O-RAN supporters may have the same hopes for incremental 5G revenue from 5G business services, network slicing, and the ever-popular sell-5G-to-things IoT space, but they recognize that they can’t count on any of that to build profit margins on the service, so cost reduction has to be the rule.

There are a lot of possible ways that open-model 5G could reduce cost and risk for operators, and Dish Network has focused on one in particular, the hosting of 5G components in the public cloud, specifically Amazon’s AWS. This reveals a risk and cost source that’s especially problematic for 5G operators who don’t have extensive wireline facilities to tap for locating servers. 5G needs hosting, and in particular what would be considered “edge hosting” for real-time support of some service functions. Dish doesn’t have the real estate and doesn’t want to make the investment in data centers, so grabbing cloud provider support is a good choice.

An open-model counterpoint is T-Mobile, a very aggressive 5G provider with plans to expand into other spectrum bands. Their CTO has been a bit skeptical about O-RAN, and because T-Mobile/Sprint has traditional 4G technology in place, they’re more inclined to expand on current technology and vendor commitments. Verizon and AT&T have taken the other tack, with at least significant O-RAN interest, but for established wireless network operators, it’s obvious that O-RAN isn’t an automatic winner.

T-Mobile’s fear is based largely on the “not-ready-for-prime-time” risk. Telcos have been traditionally slow to accept open-source technology (I was part of an initiative in the TMF to promote open-source to the telco community, showing that it needed promotion to succeed there), and new open-source technology is even harder to stomach. However, T-Mobile’s decision was made last year, when there were no large and highly credible O-RAN providers. We have O-RAN commitments now from VMware (who I think has the best story) to IBM/Red Hat, Microsoft/Metaswitch, Mavenir, and more.

Best-of-breed thinking seems to be fading in the O-RAN space. A big solution source is important because telcos hate to have to do integration, and hate to have to deal with finger-pointing when a problem occurs. These two issues are in fact the largest barriers to a near-term O-RAN commitment, but they’re not the only barriers. We have an unsettling amount of telco-think baked into 5G in general, and into O-RAN.

O-RAN, like 5G, is explicitly linked with ETSI NFV, and that could be highly problematic for a number of reasons. Top of the list is that O-RAN is really (as I’ve suggested above) an edge computing application, which is a cloud application. NFV isn’t a cloud-centric specification, it’s a network-centric specification, and it was designed from the first to replace physical network functions (boxes) with virtual network functions. O-RAN never presumed boxes, nor does 5G, so IMHO the right move would be to link to cloud specifications and practices, not NFV.

NFV was also dominated, early on, by per-customer services. If you’re a box replacement strategy, it makes sense to focus where there are a lot of boxes, and so many of the early proof-of-concept initiatives in NFV were aimed at business services and the replacement of CPE by virtual CPE. 5G is a multi-tenant infrastructure, and the majority of users aren’t business users. Cloud computing’s technology has advanced in no small part because of social-media contributions to software design, which are targeted at the consumer market.

Right now, O-RAN and 5G are hampered by the fact that operators and some vendors are trying to justify the funding and staffing of a decade-long slog through the NFV specifications even when the goal isn’t worth the effort. Interestingly, there have been some examples of a technical end-run around the logjam of NFV. The general edge computing opportunity is at the heart of this.

Edge computing is a form of the cloud in the sense that it’s a resource pool. However, it’s a local resource pool by definition, since “edge” means “close-to-the-user”. It’s likely that for a given geography, “edge” resources wouldn’t be concentrated into a giant cloud data center, but distributed across a lot of locations. Early on, many believed that virtual-machine technology wasn’t the right answer for this, and even NFV diddled with “containerized network functions” or CNFs. That would imply Kubernetes for lifecycle automation rather than NFV’s MANO and NFVI, and Kubernetes may need tweaking for that to be practical.

I think we’re still behind the duck with this question, though. The more important consideration in edge computing is how to frame the platform software so that it supports not only O-RAN and 5G Core, but also other future edge-justifying applications, like the mysterious and venerable IoT. The edge, for all these applications, has to be very cloud-like, and NFV is not that, nor is it ever likely to be that. It’s critical, if O-RAN is going to be the on-ramp to edge computing, that O-RAN promotes the right platform software to support edge opportunities in general. That means it should be not only cloud-centric, but cloud-native. More on that in a blog later this week.

The hope that we could somehow synthesize a workable 5G model for the edge out of all of this seems a bit tenuous at this point, but it gets worse. The media has discovered 6G (with an assist from some vendors), and while it’s easy to dismiss this as another press excursion into neverland, the truth is that there is preliminary work being done on 6G, and that could be a major problem. Standards in 2026 and deployment in 2028? How long has 5G taken, gang? The reason this is a problem is that we are surely at least two or three years from truly effective full-scale 5G implementations even today, and if we were to believe that by 2026 the next generation would be here (on paper), why bother?

At some point, realism has to win over publicity, or wireless, 5G, 6G, and edge computing could all be threatened. Or maybe everyone will just turn their back on telecom standards and assume they’ll cobble together networks from cloud technology. Well, they could surely do worse.

Openness and Interchangeability in Hosted Service Elements

5G specifications have addressed the growing interest in “alternative technologies” in networking, embracing things like NFV and (some say) SDN. What is less clear is how effective the initiatives will be in the real world. A lot will depend on how vendors interpret the specifications, and in particular how far we take the concept of “abstraction” or “intent modeling” in defining implementations.

If you look at 5G or O-RAN diagrams, one thing that’s clear is that they usually represent boxes within boxes, meaning that there may be a big outer box, some middle-level boxes, and finally some detailed boxes. The traditional implementation approach would be to treat the outer boxes as simple functional groupings and implement the innermost boxes. In open-model 5G, the advantage of this is that it assures that in any implementation, the smallest functional elements are interchangeable.

Yes, that’s good, but we should have learned from past work in the telecom space that specifications can lead you astray. In the case of 5G, there seem to be three risks that must be considered.

The first (and easiest-to-understand) is that the specifications are calling out a sub-optimal technology. NFV is my favorite example here; the fact is that 5G is largely a cloud-native application, and what isn’t cloud-native might well be white boxes. NFV is not cloud-native, whatever its supporters say.

The second risk is that a given mid- or high-level functional box could be “wired” to traditional technology concepts when it’s dissected into low-level boxes. 5G specifically calls out IP transport, but does that mean that the transport interface has to look like IP, or that the interior elements have to be routers? The latter would risk ruling out optimal use of SDN, for example.

The final risk is that the specifications, which view 5G as being created by virtual devices, impose a software implementation model that may not be the right one. If we go back to NFV, we could argue that most of its sub-optimalism goes back to a literal interpretation of what was intended to be (and was stated as) a functional specification.

Let’s look at the O-RAN diagram (from the O-RAN Alliance) as a reference here.

The box inside the gray dashed boundary is O-RAN, and there’s a defined interface between O-RAN and the management and orchestration stuff above. Inside the O-RAN box we have five other boxes, many of which are subdivided into other boxes. One particular box, labeled “NFVI Platform: Virtualization layer and COTS platform”, doesn’t really fit into a functional diagram at all, nor does it present interfaces as the other boxes do. This one diagram demonstrates all three of my risks.

In my view, you have to embrace the notion that 5G hosting is an application of “edge computing”, and in fact is the application that’s likely to drive the baseline deployments there. Few would say that edge computing is anything other than a special case of cloud computing, and yet we’re calling out a network specification for 5G function hosting. NFVI, meaning NFV Infrastructure, is obviously NFV, which isn’t a generalized model for hosting functions. If it were, the ISG wouldn’t be discussing the difference between “containerized” and “virtual” network functions and trying to shoehorn cloud-native into the NFV model.

The next point that jumps out to me is that there should be no reason why someone couldn’t implement an entire higher-level box, filling in the interior any way they like, as long as the external interfaces to the box were satisfied. It might not be equivalently “open” in comparison with an implementation that separated every box, but it might offer benefits relative to those completely open approaches. Buyers would have to be aware that they couldn’t substitute things within the higher-level box (so implementations at that higher level shouldn’t show the elements within), but if the benefits outweighed the risks, buyers should have the option.

Another obvious point is that we have a RAN Intelligent Controller (RIC) in two places, one inside the O-RAN dashed-box and the other in the higher layer. If the only difference between these two RICs (the real-time and non-real-time RICs) is the latency of the stuff each supports, should they be separated at all? If they are, for performance reasons, separated, might there not be some mechanism required to coordinate their behavior?

A less-obvious, but perhaps more significant, point is whether some or all of the interior boxes are functions rather than explicitly separate virtual elements. If we were to review 4G Evolved Packet Core material, for example, we’d find that the Packet Network Gateway (PGW) and the Serving Gateway (SGW) are described as separate elements, but some material notes the two might be combined in a single device. Do specifications, including 5G and O-RAN, encourage separation when there’d be a significant benefit to considering consolidation?

The least-obvious and most significant question is whether any of the boxes should be considered separate devices/components at all, or whether everything should be a set of orchestrated microservices, which is what “cloud-native” would suggest. Remember my comment about how a “functional specification” for NFV ended up driving a stake a decade deep into the ground and tying the market to it?

The reason this is most significant is that “openness” implies the existence of fully interchangeable parts. If a true cloud-native implementation of O-RAN doesn’t divide itself into virtual boxes, then we’d have to know a lot about how the APIs and platform tools that hosted everything were defined, in order to know whether two separate parts would interwork. Do all microservices in cloud-native O-RAN have to be interchangeable?

We’ve still got way too many network thinkers involved in this and not enough cloud-thinkers. In the cloud, it’s critical to get missions aligned with technology early on. If we understand the basic notion of a “service” as a set of coordinated behaviors presented to a user as an experience rather than as pieces of discrete technology, we understand that coordinating the behaviors is central. We can dissect the behaviors, and the coordination, from that starting point. What the past proves is that when we try to describe something functionally, with no specific notion of that coordination to guide us, we tend to create an implicit implementation that we might fight for a decade or more to overcome.

Just as service lifecycle automation is inherently event-driven, so service lifecycle coordination is inherently hierarchical. We therefore have to assess just how far down the hierarchy we’ll push the concept of interchangeability versus simply openness. Is a black box representing all of 5G Core, presenting all the external interfaces, “open” if it can be substituted for another black box similarly presenting interfaces? Even if neither, or only one, can be fully decomposed into deeper interchangeable elements? That may prove to be a big question as we move forward with hosted service elements.

5G, Edge Computing, and the Transactional-versus-Event Debate

I mentioned in yesterday’s blog that there was a profound difference between open-model network software based on the transactional or RESTful model, and software designed to be event-driven. I also said that the difference was critical in selecting the platform software tools to be used to support applications, particularly at the edge. What is the difference, and why does it matter so much? We’ll look at that today.

Let’s start by relating APIs to software models, which is essential because APIs facilitate inter-component communications in software, they don’t define how the software works. My concern is that the publicity that’s been given to “cloud-native” or “event-driven” APIs could foster a belief that all that’s necessary to make software cloud- or event-based is to use those APIs. The opposite is true; the software architecture has to be where we start. The value of a discussion on APIs is that once the architecture is framed out, it’s the APIs that connect the work, and so we can visualize applications through information movement, and thus through APIs.

Today’s common software model is transactional, meaning the software is designed to support request/response interactions. You have a “user” and a “resource”, and you send a request from user to resource, invoking a response. Absent a request, nothing is sent, and while the user may not “wait” for a response, meaning do nothing until one arrives, that’s a common approach. This is “synchronous” use. There is an option with RESTful APIs to use an asynchronous call, by defining a separate “status resource” that can be queried to get the status of a request, rather than waiting for a response.

Synchronous software design is a bit of a risk because the requester is stopped until a response is received. Asynchronous design prevents that, but the requester still has to keep checking on the status of their request, which is sometimes called “polling”. That’s wasteful, and it can create convoluted software logic if there are multiple things that might be pending.
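The polling pattern can be shown in a minimal sketch. This uses a hypothetical in-memory stand-in for a REST “status resource” rather than a real HTTP endpoint; the class and method names are illustrative, not from any real framework.

```python
# Toy stand-in for a RESTful "status resource"; in a real API this
# would be an HTTP endpoint the client queries repeatedly.
class StatusResource:
    def __init__(self, checks_until_done=3):
        self.remaining = checks_until_done

    def poll(self):
        self.remaining -= 1
        return "done" if self.remaining <= 0 else "pending"

def asynchronous_request(resource):
    """Submit work, then poll the status resource until it completes."""
    wasted_polls = 0
    while resource.poll() != "done":
        wasted_polls += 1  # each check that finds "pending" is wasted work
    return wasted_polls

print(asynchronous_request(StatusResource()))  # 2
```

The requester isn’t blocked, but it pays for that freedom by repeatedly checking on a request that hasn’t changed, which is the waste the paragraph above describes.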

Event-based systems presume that stuff is happening that needs attention, and that a “happening” can generate an event, which is simply a notification. A software component would normally register interest in a particular kind of event (or events), in which case the event system would post the event to them when it happened. There is no response expected in an event-based system, so the notion of synchronous/asynchronous doesn’t apply. In a simple event-based system, you process the events as they come, but most event-based systems are really event-driven, and they’re not simple.

Protocols, management systems, and other systems that process events usually have to recognize more than one event type. For example, an event might report something was up, or down. Or, it might report a command was processed, or failed. Because event-based systems aren’t request/response oriented, each event has to be related to conditions overall in some way. The same is true, for example, in human conversation. We process speech in context of the conversation and conditions in which it occurs, and events have to be processed the same way.

Context, in event-based systems, is known as state. Most systems that relate to the real world are stateful, and stateful systems have to interpret events in context, meaning according to the current state. To return to our example, a command to activate a connection, issued when the connection is already in the Active state or in Recovery (for example) is an error. In the Inactive state, it’s valid, and should be interpreted as instructions to make the connection active.

The problem with transactional frameworks in real-time systems lies in part with the pace of change. Transactional systems, faced with a lot of change, will queue things up until they can catch up. Because transactional software is usually designed with limited ability to scale capacity, the queuing delay could be considerable, which means that there may be multiple things in the queue relating to the same real-world process, and the current transaction may have sat a while too. The state of the system may have changed during all this sitting around.
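A toy illustration of that backlog effect, assuming simple textual status reports: by the time a backlogged transactional system works through its queue, only the most recent report for a resource still reflects reality.

```python
# Queued status reports for one resource; everything but the most
# recent report is stale by the time a backlogged system processes it.
queue = ["port-3 down", "port-3 up", "port-3 down", "port-3 up"]

latest = {}  # resource -> index of its newest queued report
for i, report in enumerate(queue):
    latest[report.split()[0]] = i

stale = [r for i, r in enumerate(queue) if latest[r.split()[0]] != i]
print(len(stale))  # 3 of the 4 queued reports no longer reflect reality
```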

This is a particular problem in service management systems and even provisioning systems, where software has to accept external commands (traditionally recognized as transactions) and at the same time reflect the state of the stuff that’s being managed, which is traditionally an event-driven process. Often the software will divide itself into two pieces, which increases cost and complexity. Why not simply use event-based systems for everything?

With event-based systems, each event triggers a state/event table lookup and dispatches a process. There can be only one state/event table for any given real-time system that’s being managed by the software, but any number of processes could use the table (with proper locking) and the state/event intersection’s process could be any instance of that process, if the process is designed to get all its data from the table and the data model that contains it. Further, only a little process needs to be run, something that would likely take little time.
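The lookup-and-dispatch cycle described above can be sketched as follows. The point is that each process is small and gets all its context from the data model, so (with proper locking) any instance of the process could handle the event; the states, events, and process names are illustrative assumptions.

```python
# Each process reads and writes only the shared data model, so it is
# effectively stateless and any instance of it could be dispatched.
def on_fault(model):
    model["state"] = "Recovery"

def on_repaired(model):
    model["state"] = "Active"

# State/event table mapping each intersection to its process.
DISPATCH = {
    ("Active",   "fault"):    on_fault,
    ("Recovery", "repaired"): on_repaired,
}

def dispatch(model, event):
    process = DISPATCH[(model["state"], event)]
    process(model)  # all context comes from the model, not the process

model = {"state": "Active"}
dispatch(model, "fault")
print(model["state"])  # Recovery
dispatch(model, "repaired")
print(model["state"])  # Active
```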

Event-based systems can process transactions as events, with the state/event tables providing the context. In fact, cloud computing in serverless form does this, and many of the social-media applications we run regularly use event-driven software to support what looks to us like transactional behavior. So event-driven software can behave transactionally, but it’s extremely difficult to get transactional software to behave in event-based applications.

The notion that 5G and O-RAN hosting is an on-ramp for edge computing deployment makes this transaction-versus-event issue critical. Whatever edge applications may be, the reason they’re typically classified as “edge” versus “cloud” applications is latency, which at least strongly implies that the applications have a time-critical link to the real world. The real world is event-driven, and that’s a simple truth. Edge computing has to be, too.

To circle all this back to 5G and O-RAN and network lifecycle operations in general, almost everything that needs to be done in these areas is naturally a state/event activity. Every protocol is that, and the great majority of protocol handlers are implemented that way. Every interface point defined in 5G or O-RAN should be viewed as protocol handlers, and thus as event-driven applications. That’s apparently a problem for operators, because all their initiatives on virtualization and management have gone the other way.

NFV and ONAP are both designed as transactional systems and not as event-driven systems. That means that they’re not readily adapted to the kind of elastic adaptation to workloads or faults that characterizes cloud-native design. The question is whether vendors, who aren’t shy about making their own cloud-native claims, are going to slog down that same path. That’s a particular risk for 5G because the 5G specs tend to call out NFV implementation, which literally interpreted would take the software in a sub-optimal direction.

Both Red Hat and VMware have made recent O-RAN announcements, and both retain an NFV flavor in their story. That may be inevitable given the direction the 5G/O-RAN specs have taken, but you can support NFV while allowing alternative models. That may be critical, because while we can base 5G (imperfectly) on NFV, we can’t easily evolve NFV-and-ONAP-flavored practices to generalized edge computing applications. If one of 5G’s critical missions is to facilitate a transformation to edge computing, then we might expect one of these cloud-software giants to figure that out, and frame a realistic software model that better exploits the cloud. Whatever the edge is in technical terms, it’s functionally an extension of the cloud and not the network, and it’s got to work that way.

A 5G O-RAN model has to be, first and foremost, a cloud computing model, or we risk developing “edge resources” that aren’t applicable to future edge applications. It’s ironic that operators and network vendors, who should have more experience with event-driven applications, seem to be having the most problem with edge-think these days. They’re going to have to get over it if they want to continue to fend off the cloud-software competition.