Is the “Age of Integration” Upon Us?

We’d love to think that advances in the tech sector come about because of successes, but what might be the most consequential advance these days is a result of failures. Integration services are hot already and getting hotter, and according to enterprises the reasons lie mostly in faults in the way we create and explain technology, faults we can’t fix quickly. What do enterprises do when they need technologies they don’t understand? They buy integration services.

Enterprise data over the last three decades tracks some very interesting shifts, but none more than the level of technology dependence of enterprises versus their technology literacy. Thirty years ago, the number of avowed enterprise network decision-makers and the number of specialists fully conversant in the technologies they were considering were roughly the same. Today, the number of decision-makers is four times the number of tech-savvy people.

An equally interesting statistic comes from the literacy of technical staff, decision-makers and support people alike. Twenty years ago (as far back as I have the data), three-quarters of enterprise technical personnel said they were “fully qualified” in the technologies they currently supported and were evaluating. Today, only 18% say that. Twenty years ago, just under two-thirds of enterprise technical personnel said they “fully understood” the current state of network technology. Today, only 11% say that. The explanation for both shifts, according to the workers themselves, is the sheer number of advances and changes in our industry.

That may not be fully accurate, though. My data strongly suggests that the primary reason for a slip in tech-savvy is the shift to ad-sponsored tech media. This shift has given vendors enormous influence, because of course only sellers buy ads. The increase in vendor influence, coupled with the Sarbanes-Oxley legislation that sought to eliminate hype-driven tech stock prices, shifted the focus of media to one of short-term sales of point products rather than to broad technology evolution. But I’ve blogged on this in the past and I don’t propose to reprise the topic here. Instead, I want to look at the result.

An enterprise that buys a network architecture and the associated products buys a total solution. An enterprise that buys a network product is depending on integrating that product into a solution, into a network architecture. The same, of course, is true of computing. When buyer tech literacy was high, we could expect that buyers understood the architecture, the products, and the integration. When it dropped, integration became an independent requirement, the only alternative to which was staying the course with the technologies and vendors already deployed.

Even vendors sometimes need to promote technology shifts, and so big vendors initially filled the role of integrator, and they benefited because their integrator role let them pull through their own product suites, the classic “Camel’s nose” situation. Eventually, though, vendors realized that they could actually sell integration services and still pull through their products, and virtually all major network vendors now do that, but even that level of integration isn’t keeping up with buyer needs.

Networks are a lot more complex these days, despite the fact that we’ve shifted away from private routers and trunks to VPNs. Not only that, the justification for network budgets is increasingly intertwined with compute (cloud, servers, and software) elements in new projects. Finally, the growth of interest in open-source software and white-box devices has introduced product elements that are to a degree “vendor-less”, and so there is no vendor integration support available, for free or for a fee. There have been specialized integrators in tech for decades, but there is strong indication that dependence on these companies is growing, and may be about to explode.

Twenty years ago, only 6% of enterprises in my survey indicated that they had considered using a network or systems integrator. Early this year, that number had grown to 41%, and when asked whether they believed they would seriously consider using an integrator within the next five years, 88% said “Yes”. The number one reason was cloud computing, and of course widespread cloud adoption introduces significant potential for changes to networks. For networks specifically, the top reasons for expecting to need integration were open-model elements, multi-vendor networks, new technologies, and expanded business scope, in that order.

Vendor-based integration is challenged to support these issues, particularly if the enterprise is trying to move in a direction (like open-model networks) that vendors don’t support, or if the driver of change is something a network vendor probably doesn’t have skills in, like the cloud. The value proposition for third-party integration services for both networking and computing is clear; you can obtain skills you need and don’t have, from an objective source. The risks are that the integrator doesn’t have the skills, isn’t objective, and is going to add significantly to your cost.

Data on all this is a bit sketchy, but what I have says that the rate of project failures for projects depending on third-party integrators is almost 20% higher than for projects advanced using staff skills. Cost overruns on integration projects occurred at 1.8 times the rate of projects based on staff skills. This is in contrast to vendor integration, which resulted in no statistical difference in either project failures or cost overruns versus staff-skill-based projects. Some of this may be due to the increased complexity of the projects most likely to involve third-party integration, but I’m skeptical that’s the big reason.

I’ve audited integrator projects for enterprises and a few network operators, and what I found was that at least half of all deliverables in the planning phase were boilerplate. The insights that were delivered were little different from what could have been generated by the company’s own staff, which indicated that there was less of a skill difference between integrator and enterprise than the former group presented and the latter one believed.

All this seems to add up to a conclusion that integrators (system or network) aren’t generally delivering what’s expected of them. That has certainly been true in the past, but there is at least some indication that it’s less true today. The rates of project failures and cost overruns from 2000 to 2018 actually went up slightly year over year, but from 2018 to 2022 they stabilized and then (in mid-2021) began to drop just a little. My data suggests that the reason is that enterprises have gotten better at managing projects involving integrators, and better at writing contracts for integration services. That may now be inducing integrators to raise their own skill levels.

The big question is whether that positive news can be sustained as enterprise integration needs grow. Salaries for technical staff among system and network integrators are typically almost 20% higher than among enterprises, and so enterprises may lose skilled people over time. However, vendors have been the largest source of competition with enterprises for skilled people, and many vendors have been slowing their acquisition of integrator talent. There hasn’t been much migration of skills from integrators to somewhere else, but in 2022 I have some limited data that suggests integrators are now starting to lose skilled staff, mostly to vendors but sometimes even to enterprises.

The best news, though, may be the fact that the adoption of open-model technology (as opposed to vendor-proprietary) is the largest current driver of integration services. We already have open-source software distribution through major firms (Red Hat, VMware, HPE, and more), and these firms are increasingly creating mission-specific ecosystems from combinations of open-source tools. They are also increasing their own efforts in the integration space, and if open-model technology remains a top driver of integration needs, they may ultimately be the most significant players in the space.

Less so in networks, of course, and I think that the big challenge in networking is that there is no real “open-source software” giant providing white-box software. However, I think it’s inevitable that VMware, because of the Broadcom acquisition, will end up doing that, and that would mean that Red Hat, HPE, and other vendors who offer open-source would likely be pressured to follow suit.

Managed services are the final element in network integration, but could become one of the most important. Physical networks tend to be partitioned by provider, since most wireline providers serve a specific (often national) geography. Since enterprises are increasingly global in their business scope, they need to create their WAN by combining multiple provider networks. In many cases, this means crossing over to use broadband Internet access rather than MPLS VPNs, and that means SD-WAN. Since supporting the smaller remote SD-WAN sites is difficult, enterprises turn to managed SD-WAN services.

Managed service providers (MSPs) who offer SD-WAN VPNs would be happy to displace some MPLS sites, too, and also to expand their support into office LANs in the locations they offer SD-WAN service. If this trend continues, and enough sites are converted to SD-WAN, it could mean that most company sites are connected by MSP services, and no additional integration is required.

No single approach to integration is going to relieve enterprise issues relating to acquiring and retaining skilled people. White-box adoption, including SDN adoption, is likely to depend on how much open-source software vendors are prepared to support white boxes. Multi-vendor network integration depends on either the primary vendor or a third-party integrator, and Internet and SD-WAN integration are probably best obtained in the form of managed services. It’s tempting to wonder whether we’ll need an integrator to integrate the integrators, but enterprises’ options for building networks despite staff skill limitations will only expand, so I guess we’re destined to find out.

The Architecture for a Separate Control Plane in Networks

I blogged last week about the value of, or perhaps the necessity of, separating the control and user planes in 5G. The main point I was addressing was that servers and cloud software aren’t really optimal for pushing packets at high speeds and high volumes. If the control plane were broken out, the user plane could be supported via routers/switches in proprietary or white-box form, and the control plane supported as an application. What would the software architecture to achieve this look like?

The key consideration in this separation is that, because control-plane traffic consists of episodic messages, service logic in general is essentially event-driven and thus something we already have experience handling in the cloud. It should be possible to use a variety of cloud-based services to support service control planes, and that raises the question of what the optimum service would look like.

Finally, we have the question of “five nines”. Everyone says that telcos expect five-nines availability, and that’s almost surely an exaggeration, but it is possible that services demand a higher level of availability than traditional transaction or event-handling applications. Is that true, how much more available are they expected to be, and what might be done to make them more available?

The base architecture for a separated control plane could be visualized easily as a control-plane element set that interfaces with a user-plane element to create what looks like single-box behavior. The 3GPP model is then decomposed into these “pairings”, one for each type of 5G device. Assume we have a number of flow switches, which are the foundation of our user plane, and a control-processor that hosts the CP functions associated with those switches.
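To make the pairing idea concrete, here’s a minimal sketch in Python (illustrative only; the class names, message types, and fields are my own assumptions, not anything defined by the 3GPP): a user-plane flow switch that forwards strictly by its table, and a control processor that hosts CP functions and updates that table, with the pair presenting single-box behavior.

```python
# Minimal sketch of a control-plane/user-plane "pairing" (illustrative only;
# names and message types are hypothetical, not 3GPP-defined).

class FlowSwitch:
    """User-plane element: forwards packets strictly by its flow table."""
    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.flow_table = {}          # destination prefix -> output port

    def install_flow(self, prefix, port):
        self.flow_table[prefix] = port

    def forward(self, packet):
        # Longest-prefix matching is omitted for brevity; exact match only.
        port = self.flow_table.get(packet["dest"])
        return ("forwarded", port) if port is not None else ("dropped", None)


class ControlProcessor:
    """Control-plane element: hosts CP functions for one or more switches."""
    def __init__(self):
        self.switches = {}

    def attach(self, switch):
        self.switches[switch.switch_id] = switch

    def handle_event(self, event):
        # Episodic control-plane messages (session setup, mobility, etc.)
        # are handled here and turned into flow-table updates.
        if event["type"] == "session-setup":
            sw = self.switches[event["switch"]]
            sw.install_flow(event["dest"], event["port"])


# The pair behaves like a single box: control events go to the processor,
# data packets go to the switch.
cp = ControlProcessor()
up = FlowSwitch("du-1")
cp.attach(up)
cp.handle_event({"type": "session-setup", "switch": "du-1",
                 "dest": "10.0.0.0/24", "port": 3})
print(up.forward({"dest": "10.0.0.0/24"}))   # ('forwarded', 3)
```

The point is simply that data packets never touch the control processor, and control events never touch the forwarding path.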

One thing that’s immediately clear is that the breakdown of “functional elements” like CUs and DUs in Open RAN is unnecessary and maybe destructive. We’ve seen all too often that functional diagrams created by carrier standards groups get implemented with a device (real or virtual) per functional block, and CU and DU could be represented as a flow switch and a couple of cloud services. Also, since the 5G RAN front-haul and mid-haul portions are supposed to be low latency and involve a very small number of connected elements (a DU/CU would connect only to the stuff in the next-lower layer of the tier structure toward the core), we could presume that these were all supported via almost-static routing, meaning very simplified handling would be sufficient at the user-plane (IP network) level.

But that’s not all. The 3GPP specifications for 5G backhaul (CU to 5G Core) have even more functional boxes, but still separate the user plane and the control plane. It’s easy to visualize the former as flow switches because even there, the number of possible addressable destinations for any given element is limited.

And yet more. Recall that the “user plane” is IP, and that IP also includes a data plane and control plane. Control packets handle things like topology/route management, and it’s fair to ask whether this lower-level control plane also lends itself to separation. It does, and in fact there’s already been an argument to separate it, with classic OpenFlow SDN.

In an SDN network, flow switches’ routing tables are maintained by a central controller, rather than created on-box through topology exchanges with neighbors. The element offering central control over routing could be combined with the “application” that supports the 5G RAN and Core functional elements, creating a single application likely made up of a number of components. There are a lot of operators and vendors saying that control-plane activity is a for-sure microservice application, and many cloud providers are suggesting that it’s also “serverless” or functional.
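As a rough sketch of that central-control idea, assuming a toy topology and exact-match forwarding rather than real OpenFlow messages (the node names and link costs are invented for illustration), a controller might compute next hops centrally and push them to each switch like this:

```python
# Toy central route controller (illustrative; not an OpenFlow implementation).
# The controller computes paths over a known topology and pushes next-hop
# entries to each switch, so no on-box topology exchange is needed.
import heapq

def shortest_paths(topology, source):
    """Dijkstra over {node: {neighbor: cost}}; returns next-hop map for source."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in topology[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    # Walk predecessors back to find the first hop from the source.
    next_hop = {}
    for dest in dist:
        if dest == source:
            continue
        hop = dest
        while prev[hop] != source:
            hop = prev[hop]
        next_hop[dest] = hop
    return next_hop

topology = {
    "du-1": {"cu-1": 1},
    "cu-1": {"du-1": 1, "core-1": 2},
    "core-1": {"cu-1": 2},
}

# The "controller" would push the computed next hops into each switch's
# flow table; here we just print what it would install.
for switch in topology:
    print(switch, shortest_paths(topology, switch))
```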

What it really looks like to me is a kind of service mesh application, meaning microservices combined with sidecars and a controlling element that ensures messages flow reliably. That architecture was described in THIS ARTICLE about an implementation of a highly reliable event-handling system created by Atlassian for its Tenant Context Service. I think something very similar could be done with Istio or Linkerd, but given the tendency to interpret telco functional diagrams as component models, I doubt much thought has been given to that notion.
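For what it’s worth, here’s a hedged sketch of the sidecar notion in plain Python rather than Istio or Linkerd configuration; nothing here is taken from the Atlassian implementation, and the retry and dead-letter behavior is just my assumption about what “ensuring messages flow reliably” would minimally involve.

```python
# Illustrative sidecar wrapper: retries delivery to a microservice and
# dead-letters messages that still fail (hypothetical; not Istio/Linkerd).
import time

class Sidecar:
    def __init__(self, service, retries=3, backoff=0.1):
        self.service = service        # the wrapped microservice (a callable)
        self.retries = retries
        self.backoff = backoff
        self.dead_letters = []        # messages that could not be processed

    def deliver(self, message):
        for attempt in range(self.retries):
            try:
                return self.service(message)
            except Exception:
                time.sleep(self.backoff * (attempt + 1))
        self.dead_letters.append(message)
        return None

# A toy control-plane function that fails on malformed messages.
def session_manager(message):
    if "session_id" not in message:
        raise ValueError("malformed control message")
    return {"status": "ok", "session": message["session_id"]}

mesh_proxy = Sidecar(session_manager)
print(mesh_proxy.deliver({"session_id": 42}))   # {'status': 'ok', 'session': 42}
print(mesh_proxy.deliver({"bogus": True}))      # None (dead-lettered)
```

The service logic stays simple because reliability concerns live in the sidecar, which is exactly the appeal of the service-mesh pattern for control-plane work.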

This illustrates the challenge we have with a standards community that’s box-centric. In NFV, the “functional block diagram” was interpreted as the literal structure of the software. That’s bad enough when we’re talking about legacy server-hosted software, but it’s awful when the target of deployment is the cloud. Cloud people talk about functional architecture in a totally different way. The cloud, in their terms, is a kind of floating reservoir of functionality that events/messages draw on. Not only does having all these little functional (virtual) boxes not translate into something that describes the cloud implementation, it often constrains the optimum way of approaching things. In my view, both the ETSI NFV ISG and the 3GPP have created a 5G model whose description (as a bunch of boxes) interferes with the stated goal of cloud compatibility.

A good example of this problem is the concept of the interface. What standards diagram these days isn’t replete with interfaces with cryptic labels consisting of a letter (or letter/number combination) and a subscript? But “interface” is really a box concept. A microservice is a function, and a function is a message processor. You send a message to a function, you don’t send it over an interface. The interface is the IP network connection. In a cloud implementation of 5G what you should be describing is message types and processing functions. But you can’t have a functional block diagram without blocks and connecting lines, and thus we fall into interface chaos.

What about the software concept of an API? APIs really describe the protocol and format of the event/message exchange. But when we show a box with three or four connecting lines that are described by interfaces, are we saying that this is a software component that exposes those APIs? Hopefully not, because we should be talking about a set of software functions, each having an API that represents one of those lines.
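To illustrate the distinction, here’s a small hypothetical sketch of “message types and processing functions” as they might look in code; the message names are invented, not 3GPP interface labels, and the dispatcher stands in for whatever the cloud platform would actually provide.

```python
# Illustrative only: message types and their processing functions, with each
# function's signature serving as its API. Message names are hypothetical,
# not 3GPP-defined interface labels.
from dataclasses import dataclass

@dataclass
class AttachRequest:
    device_id: str

@dataclass
class HandoverRequest:
    device_id: str
    target_cell: str

def handle_attach(msg: AttachRequest) -> dict:
    """API: accepts an AttachRequest, returns a session record."""
    return {"device": msg.device_id, "state": "attached"}

def handle_handover(msg: HandoverRequest) -> dict:
    """API: accepts a HandoverRequest, returns the new serving cell."""
    return {"device": msg.device_id, "serving_cell": msg.target_cell}

# A dispatcher maps message type to function; there is no "interface" in the
# box-diagram sense, just messages drawn from a pool of functions.
DISPATCH = {AttachRequest: handle_attach, HandoverRequest: handle_handover}

def process(msg):
    return DISPATCH[type(msg)](msg)

print(process(AttachRequest("ue-7")))
print(process(HandoverRequest("ue-7", "cell-12")))
```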

You can dismiss all of this as software-guy ranting against hardware-people-think, but that’s not my reason. The cloud has capabilities and benefits inherent in the cloud model. We can realize those if we adhere to that cloud model, but if we constrain our software to act like boxes, if we create software that’s a set of virtual network functions that map 1:1 to boxes (physical network functions), then we are not building the cloud model. In that case, pushing the VNFs into containers or deploying them with Kubernetes doesn’t create a cloud implementation, and we can forget creating the kind of infrastructure everyone says they want and need.

The Shape of the Future Robot

There’s no question that AI is important. There shouldn’t be a question that robots are also important, either; Amazon’s long interest in robotics, its Astro proto-robot, its desire to acquire iRobot, and the rumors I’ve heard that Google, Microsoft, and Meta are all looking at robots should be proof enough.

Amazon’s Astro and the rumors I’ve heard about the other three vendors’ programs suggest that the majority of home-robot interest focuses on a device rather than on what most of us would call a “robot”, which is something anthropomorphic. In fact, the majority of technical people I know would define the ultimate robot as the marriage of AI technology for smarts, and a humanoid form.

The marriage of these two concepts, which is what Tesla proposes to do with Optimus, a humanoid robot, could be arresting, but as usual the qualifier (“proposes”) is the critical point. The original Optimus unveiling showed a robot that one of my friends who saw it described as a “patient recovering from a paralyzing injury”. Elon Musk said that Optimus would learn to walk and to behave with a lot of autonomy, but maybe three to five years out. Still, the mere possibility that we could actually have humanoid robots raises a lot of hopes, and a lot of hackles too.

A price tag of twenty thousand dollars means Optimus isn’t going to be a fixture in every household, at least not immediately. However, that’s less than the cost of the average car and far less than a Tesla, and people still buy them. Could we actually see millions of Optimi (I guess we have to figure out what the plural of “Optimus” would be) out there? If we did, what would the risk/reward balance look like? It depends on the degree of autonomy that Musk can actually achieve, and what others (perhaps Amazon) might do in response to Tesla’s moves.

We are a very long way from being able to create a robot that could actually match human behavioral and functional standards. However, as anyone who’s encountered animals in the wild knows, you don’t have to be human to be dangerous. A chess-playing robot, as I said in a prior blog, injured a boy it was playing against, and it clearly wasn’t even attempting full autonomy. A humanoid, autonomous, robot might have a behavioral range that could include something that closely resembled human hostility. Asimov’s Three Laws of Robotics might end up coming into play after all.

Those three laws, summarized, say that a robot cannot harm a human or allow one to come to harm, must obey human commands subordinate only to the First Law, and must preserve itself subject to the First and Second. While these surely sound like worthy goals, even a moment’s thought should demonstrate that there is a presumption that Asimov’s robots were truly humanoid in “thinking”, since even interpreting what those laws mean and what violation would look like is an exercise in human judgment. If you’re a lousy driver, would your household robot be justified in holding you prisoner to keep you safe? No skiing or scuba either.

The big problem, of course, is that early robots wouldn’t be truly humanoid and couldn’t hope to apply these laws. Musk wants to test his robots by having them work in his factories, and that means that they’d have a wide functional range even though they wouldn’t be able to understand who Asimov was or what his Laws of Robotics mean. How does such a robot learn not to set a crate down on a human co-worker, or hit one with a 2×12? The challenge Musk faces is that when early robots work in the real world, they have to obey real-world rules without human thinking. We don’t have to be told not to hit someone with a heavy plank or set a crate on them, but what about our factory robot? The truth is that the number of behavioral rules such a robot would have to obey to be safe and functional would be a major test of AI in itself, and what happens if somebody forgets to tell Robbie the Robot that an I-beam is as lethal as a 2×12?
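To illustrate why enumerated rules are fragile, here’s a deliberately naive and entirely hypothetical sketch: the first check is keyed to specific objects someone thought to list, while the second reasons from physical properties the way a human implicitly does.

```python
# Deliberately naive, hypothetical sketch of rule-based robot safety.
# The rules enumerate specific hazards; anything not listed is "safe".

HAZARDOUS_OBJECTS = {"2x12 plank", "crate"}      # somebody forgot "I-beam"

def action_is_safe(obj, human_nearby):
    if human_nearby and obj in HAZARDOUS_OBJECTS:
        return False          # rule fires only for objects we thought to list
    return True

print(action_is_safe("2x12 plank", human_nearby=True))   # False
print(action_is_safe("I-beam", human_nearby=True))       # True (!)

# A more general check would reason from properties (mass, speed, proximity)
# rather than names, which is the kind of judgment humans apply without
# being told. The thresholds below are arbitrary illustrations.
def action_is_safe_general(mass_kg, speed_m_s, human_distance_m):
    kinetic_energy = 0.5 * mass_kg * speed_m_s ** 2
    return not (human_distance_m < 2.0 and kinetic_energy > 10.0)

print(action_is_safe_general(mass_kg=40, speed_m_s=1.5, human_distance_m=1.0))  # False
```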

We have self-driving vehicles, though. John Deere says it’s looking to have fully autonomous farm production vehicles by 2030. It seems logical that we could create a humanoid robot, right? Not so fast. An autonomous vehicle is a much easier nut to crack than an autonomous fully-humanoid robot, because the range of functional behaviors and the number of relevant stimuli are both limited. The same limitations mean that there’s no value to making a vehicle look human; there are other form factors that better suit the mission. Amazon and other Tesla competitors (or potential competitors) in the robotics space have apparently decided that the best strategy is to create a specialized autonomous device for the home that, like an autonomous car or harvester, isn’t expected to do everything people can do and thus doesn’t have to look and act like them.

But that’s not visionary, or even responsive to the broad view of what a robot should be and do. Tesla apparently wants a big jump, but if Musk wants a truly humanoid (human-looking) robot, it follows that it has to have a much broader set of functional behaviors and stimulus sensors than a car, or it can’t act in a way appropriate to its appearance. People expect something that looks like C3PO to behave like that Star Wars character.

If Optimus works in a factory as a humanoid, it’s going to have to take on jobs people could do, and Musk clearly expects that. That’s going to demand something that does a lot more than wave and walk, or it will end up reducing factory productivity rather than increasing it. Human workers who have to dodge 2x12s and I-beams and crates don’t get much work done. Thus, Musk’s goal demands that he somehow address the question of how to handle the wide range of things people do, and know to avoid, and I think that’s something he’s underestimating.

I also think that it’s clear that a near-term autonomous humanoid robot would have to be supported by an “out-of-body” AI agent process. In other words, the robot would not have an internal brain, or at least would have only minimal locally hosted functionality. The remainder would come from elsewhere, presumably hosted close enough that the latency associated with reacting to events wouldn’t be a problem. Human reaction time is significant in machine terms, so the control latency of this configuration should be manageable.
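A back-of-the-envelope sketch of that latency argument, using assumed figures purely for illustration (none of these numbers are measurements): even a round trip to a nearby edge host can fit comfortably inside human-scale reaction time.

```python
# Back-of-the-envelope latency budget for an "out-of-body" robot brain.
# All figures are illustrative assumptions, not measurements.

HUMAN_REACTION_TIME_MS = 200.0       # rough human visual reaction time

budget = {
    "sensor capture and encode": 10.0,
    "uplink to edge host":       15.0,
    "agent inference":           50.0,
    "downlink to robot":         15.0,
    "actuation command latency": 10.0,
}

total_ms = sum(budget.values())
print(f"round-trip control latency: {total_ms:.0f} ms")
print(f"within human-scale reaction time: {total_ms <= HUMAN_REACTION_TIME_MS}")
```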

This approach raises the question of whether Optimus might be working with other Optimi rather than with humans, and whether the central intelligence would then be controlling the entire robot workforce. That would significantly reduce the burden of creating a robot smart enough to safely and functionally interact with complex humans in complex situations. But would a “factory robot” need to look like a human? Wouldn’t it be more logical to have specialized robots for specific factory tasks, and in fact for things like the home and garden? We have robots that can pick fruit already, and nobody expects to talk with one of these or have it walk the dog.

There’s a fine line between vision and delusion, and a lot of people think Elon Musk has crossed that line a number of times, but he’s also made good on some pretty astonishing promises. Can he make C3PO real? Maybe he can, but I don’t think it’s going to happen as quickly as we might hope, or as he might believe. Still, I sure hope he makes it!

What’s the Missing Ingredient in Open-Model Networking?

I’ve blogged often on the importance of 5G function hosting to the deployment of edge computing. If operators were to create a “carrier cloud” to host virtual functions for 5G and other service missions, the resource pool created could then be available to host generalized edge applications. That could advance edge computing significantly.

It may be the risk that operators would become edge and cloud players that’s been motivating cloud providers to pursue carrier cloud hosting as an alternative to operator build-outs. It may simply be a desire to grab some incremental revenue to sustain cloud growth, and of course it may be a combination of the two. Nevertheless, operators and cloud providers aren’t the only players who are interested. The big mobile network infrastructure vendors are also working in the space, perhaps to accommodate what emerges and perhaps to attempt to influence things in a direction that favors their own business models. Light Reading has a story on this.

None of the mobile infrastructure players are really pushing edge computing products of their own. Thus, they have to see the battle between cloud-provider-hosted and operator-deployed edge resources as something that influences their own sales in a more indirect way. The way that seems most obvious is the link between cloud-provider 5G support, Open RAN, and open-model networking in general. Our Things Past and Things to Come podcast for October 3rd talks a bit about open-model networking, but not about this specific linkage.

The 3GPP 5G specifications divide mobile networks into a “control plane” and a “user plane”, as our podcast notes. The user plane represents IP network infrastructure, something that’s not likely to be hosted as virtual functions because custom devices (routers) are far better packet-pushers. The control plane is the smarts of mobile networks, and it’s more like a cloud application set. That makes it a logical target for cloud hosting, whether it’s in carrier cloud or in a public cloud provider. The Open RAN initiative, and open 5G Core implementations that similarly separate the control and user planes, provide a generalized framework for control-plane hosting, and cloud providers can gain traction by adopting open-source software and standards for the 5G control plane. If they succeed, then they encourage open-model networks, and that’s perhaps a threat to the big mobile infrastructure vendors.

Ericsson, featured in the Light Reading piece, may be especially concerned here. While Ericsson has been at least supportive of open 5G initiatives, they’ve not been as firmly linked with them as competitor Nokia. Nokia’s business has outperformed Ericsson’s, which surely puts pressure on the latter company. More support for open-model 5G could end up increasing that pressure, or forcing Ericsson to belatedly jump into the concept with both feet, behind their rival.

It’s not only Ericsson that’s at risk here, though. If operators adopt a cloud-hosted form of the 5G control plane, it could spur integration and a true open-model 5G implementation, provided that the cloud is used to host any suitable implementation of the 5G control plane. If operators instead select a public cloud vendor’s implementation rather than simply hosting software on a public cloud provider, it would have the opposite effect, because that could stifle interest in any 5G control plane implementation that wasn’t selected to serve as a public-cloud 5G element.

A true open-model 5G wouldn’t favor the big mobile infrastructure vendors. An open-model 5G defined by the software selection of the public cloud providers could make a mockery of “open” and substitute cloud lock-in for vendor lock-in. Not an attractive set of options, but if one or the other isn’t going to prevail by default, somebody has to do something. Ericsson? We can’t be sure.

Ericsson is quoted as saying that the 5G CU and DU should run in a dedicated server, and that an operator like Rakuten “does not recommend placing core network operations – those requiring low-latency connections between radios and DUs – into the public cloud.” The fact is that it’s doubtful the public cloud would be deployed far enough out toward the mobile edge to even carry that traffic. The broader question IMHO is whether a separation of user and control planes could separate CU and DU functionality similarly, and allow for cloud hosting of the control piece. This might require some specialized white-box software to push the bits.

The reason I say we can’t be sure whether Ericsson sees all of this is that the story doesn’t talk about the control/user plane separation, and doesn’t question Ericsson on the topic. But if Ericsson is willing to say that it would use Dell servers to host the CU/DU, why not say it would support a specialized white-box element combined with a control-plane function hosted elsewhere?

Is it feasible to “run a 5G network” inside a public cloud? Frankly, I don’t think so. The user plane traffic is not the kind of event/transactional stuff public clouds are designed to support. The charges for traffic ingress/egress would be daunting, and making the whole thing reliable enough is probably something that’s not being thought much about, and possibly could not be done consistent with cost constraints. Is it feasible to run the control plane inside a public cloud? Yes, but if we want to do that then we have to view both the 3GPP and O-RAN work as tentative because neither really thinks about white boxes.

White-box switching is increasingly the missing ingredient in advanced network infrastructure standardization and planning. We talk about “hosting” functions without regard for the fact that hosting means servers, and servers don’t mean a lot of traffic-handling. White-box switches could “host” functions too, and any networking standards or strategies that don’t recognize and accommodate that truth are selling the whole notion of open networks short.

The primary requirement for white-box function hosting is less a feature of the white box, or even the white-box platform, than a representation in the deployment and lifecycle management tools to be used. Clearly, if 5G networks are to be “deployed” and managed on open resources, we need unified lifecycle operations support or the opex costs could overwhelm capex savings.
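As a minimal sketch of what that representation might look like, with the node classes, capability tags, and placement logic all being my own assumptions rather than any existing tool’s model, a lifecycle system could treat white boxes as first-class hosting targets alongside servers:

```python
# Hypothetical sketch: white boxes represented as first-class hosting targets
# in a deployment inventory, alongside ordinary servers.

INVENTORY = [
    {"name": "edge-server-1", "kind": "x86-server",
     "capabilities": {"general-compute"}},
    {"name": "whitebox-7", "kind": "white-box",
     "capabilities": {"p4-dataplane", "line-rate-forwarding"}},
]

FUNCTIONS = [
    {"name": "5g-upf",          "needs": {"line-rate-forwarding"}},   # user plane
    {"name": "5g-session-mgmt", "needs": {"general-compute"}},        # control plane
]

def place(functions, inventory):
    """Pick the first node whose capabilities satisfy each function's needs."""
    plan = {}
    for fn in functions:
        for node in inventory:
            if fn["needs"] <= node["capabilities"]:
                plan[fn["name"]] = node["name"]
                break
        else:
            plan[fn["name"]] = None   # nothing suitable; a lifecycle-tool alarm
    return plan

print(place(FUNCTIONS, INVENTORY))
# {'5g-upf': 'whitebox-7', '5g-session-mgmt': 'edge-server-1'}
```

The point isn’t the code, it’s that the inventory and the placement logic have to know white boxes exist; otherwise the functions that belong on them end up on servers, or nowhere.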

The cloud used to be all about x86/x64 hosting. We added ARM and GPUs, and I think it’s inevitable that we add white boxes as well. The cloud is also the primary driver of network change, on both the supply and demand sides, and networks will always be primarily supported through specialized white-box devices. Without adding these to the cloud, at least in terms of making them an element in deploying virtual functions, our concept of the cloud and edge computing could both be crippled.