Is Microsoft Seeing the Transformation Truth IBM Misses?

Microsoft reported an astonishing quarter, and I don’t think it’s a coincidence that they mentioned “cloud” 35 times on their earnings call.  Unlike IBM, Microsoft was aligned with enterprise strategic shifts through 2020, and it showed on their bottom line.  One question Microsoft’s results raise is whether the alignment was accidental or intentional, and if the former, whether that serendipity can be counted on in quarters to come.  Another is what fundamental trends might be behind the good news.

Microsoft’s cloud revenues were up 34% year-over-year, which is surely a good performance, but that growth isn’t out of line with Microsoft’s historical cloud performance.  Microsoft characterized their success as showing that revenues were returning to pre-COVID levels, which doesn’t address the big question: why is that happening?  There are two possibilities.  First, Microsoft is simply riding a wave.  Second, Microsoft sees the future, reads the wave, and is prepping to take advantage.  Which it is will not only determine Microsoft’s fortune but perhaps the shape of the industry.

Microsoft gained in commercial and consumer computing, including PCs, which is also a possible business-as-usual sort of indicator.  If we were to think about the technology story most likely to benefit from WFH, Microsoft ticks all the boxes.  Workers need cloud technology and better home computing, and consumers stuck in the house need some outlet for their frustrations.  Gaming and online content and social media fit the bill.  This is why you could ask whether Microsoft was just lucky.

One comment Nadella made might reinforce the “serendipity” point: “We’ve always led in hybrid computing…”  That claim is true, according to what enterprises have told me from the first.  Amazon has gone after the startups, and Microsoft the hybrid-cloud enterprises.  Certainly, their history in the space benefitted them in the quarter.

The counterpoint to that possibility, perhaps, is this quote from Nadella.  “What we are witnessing is the dawn of a second wave of digital transformation sweeping every company and every industry. Digital capability is key to both resilience and growth. It’s no longer enough to just adopt technology. Businesses need to build their own technology to compete and grow.”  This is a pretty clear statement of the transformational condition that I said (in yesterday’s blog) IBM didn’t get.

Resolving the accident-or-plan point is going to require some deeper analysis.  If we took that second quote from Nadella as a mission statement, we could interpret it to mean that Microsoft believes that 2020 ushered in buyer awareness that their prevailing IT model, meaning the way they use IT to support productivity, needs a revamp.  That revamp will come by retuning the business-process-to-IT relationship, and to make that happen optimally there has to be an improvement in their “digital capability”.

But what about this two-sentence piece: “It’s no longer enough to just adopt technology. Businesses need to build their own technology to compete and grow.”  Is Microsoft advocating that enterprises trash packaged software and roll their own, or what?  There are a number of possibilities.

First, Microsoft knows that hybrid cloud is really about building cloud front-end pieces to legacy business applications, not “migrating” things to the cloud.  Microsoft has a strong development base, with tools and website support, and not only for Azure but for legacy platforms too.  They also seem less inclined to try to push cloud prospects into development strategies that would lock users into their cloud platform.  A development-centric push would also help fend off IBM, whose Cloud Paks are aimed at development.

Second, Microsoft believes that SaaS platforms like Salesforce could be their real competition.  Line organizations are more likely to acquire SaaS than PaaS/IaaS for obvious reasons.  The “digital transformation” Nadella sees might empower line departments because it has to be accompanied by a business transformation.

Third, Microsoft may think that IBM will push packaged-software, vertical-focused “Cloud Paks”, or that Oracle’s cloud application suites will prove interesting to buyers.  Microsoft, I think, believes that the hybrid cloud model really does require a development focus, but vertical application plays could muddy the waters.  Note that Microsoft plays up the database and analytics features of Azure, and IBM has married “hybrid cloud” and “AI” in their positioning, which could indicate they’re going to take a run at the space.  Microsoft would then need to nail it down.

Can the call give us any hints?  I think it can.  After his broad comments on the cloud, Nadella jumps into a listing of the PaaS elements that make Azure a strong offering.  Remember that Azure has been as focused on PaaS as on hybrid cloud, and a PaaS model facilitates development by providing a set of cloud-ready tools that add capability to applications.  Most of Microsoft’s emphasis is on “horizontal” things like governance, AI, analytics, and security.  Security is theoretically stronger in a PaaS cloud model because the cloud provider has more control over intercloud, inter-component workflows.

Speaking of horizontal, Microsoft’s Office 365 and Teams solutions are a great way to address the new digital transformation.  I’ve seen a significant uptick in Teams adoption, displacing Zoom in many cases, in just the last couple of months.  Microsoft is working to facilitate integration between the two products, and also to provide tools to jump into Azure applications from their online office and collaboration platforms.  There’s enough going on, under this scenario, to argue that Microsoft has read the direction of its digital transformation correctly.

You could indeed be justified in believing that Microsoft is planning their success and not tripping into it, but I’m not completely confident.  For one thing, they were vague about their guidance on cloud revenues, while at the same time providing hard numbers in most other segments.  For another, their positioning to prospects and customers doesn’t align as closely to the theme of a new digital transformation age as their earnings call did.

This could mean that Microsoft sees the digital transformation as a relentless, almost-natural, trend rather than something they could initiate, promote, and own.  It’s like “we’re building mills on the river”, because that’s where the water is. It’s a big step to create a canal to bring the water to where you want the wheel to be.  Microsoft might be in that first stage of insight, seeing trends better than (say) IBM, but not in control of them.  That’s consistent with positioning digital transformation to the financial analysts (we can make money on it) but not to the prospective buyers (you’re lemmings marching to the digital sea, and we don’t need to push you along).

I think Microsoft does see the future better than most, and thus has exploited it better than most.  I don’t think they grasp the why and where of the current “digital transformation” trend, though, and that means somebody else could step up and sweep Microsoft out of the way.  Who that might be is simply impossible to say at this point, but we can fairly say that whoever it is will have to exercise naked aggression.  This market isn’t waiting for slowpokes.

Microsoft Corporation (MSFT) CEO Satya Nadella on Q2 2021 Results – Earnings Call Transcript | Seeking Alpha

Why Does IBM Seem to Be Blowing It?

What happened to IBM?  Things were looking good for the computing giant as recently as last quarter, with the acquisition of Red Hat.  Now…well, not so good at all.  The most interesting thing is that it may well be that IBM has failed at the test it should have aced, and that is far worse than “not so good”; it’s downright bad.  Why that is, and why it’s important, has to be related to my thesis about the impact of COVID on IT.

The quarter that ended in July was a great one for IBM, and I blogged about that HERE.  The keynote quote from IBM’s earnings call that quarter was cited in my blog: “One of IBM’s Krishna’s early comments is a good way to start.  He indicated that ‘we are seeing an increased opportunity for large transformational projects.  These are projects where IBM has a unique value proposition….’”  Large transformational projects are IBM’s, said its CEO.

COVID launched such projects, but not in the simplistic way that’s usually assumed.  Enterprises I’ve talked with offered an interesting and unexpected insight when they said that COVID’s specific impact on them was largely over by July 2020.  That didn’t mean that its effects were over, though.  What enterprise planners were saying was that COVID taught them a systemic lesson about their IT strategy, which was that it was hidebound.  They’d allowed themselves to be lulled into a rut of sameness, and had been sustaining that rut by having IT spending focused primarily on “refresh”, which tends to perpetuate current IT plans and thinking.  Beginning in the summer of 2020, they shifted from band-aiding COVID impacts on WFH to thinking about what it should really look like in 2021 and beyond.

Who do you turn to when you want to rethink the whole basis of your IT?  Historically, the answer has been “the vendor with the most strategic influence”, and historically that vendor has been IBM.  History is the past, though, and IBM’s strategic influence has sunk over time.  Sunk, but not disappeared.  IBM still had, in 2020, the ability to punch above its weight because it was seen as a company that thought about IT, really thought about it.

The acquisition of Red Hat improved IBM’s position, not only by broadening its prospect base but also by providing an extension of its software technology to a market that’s far more cloud-centric, the open-source platform-software space.  I think that Krishna was right that users turned to IBM when they recognized the need for something transformational coming out of the COVID mess.  Where he was apparently wrong was thinking that IBM could do something with the opportunity.

The challenge facing both enterprises and vendors like IBM in 2020 was significant.  COVID showed that traditional prospecting, sales, fulfilment, and support had too many steps and people involved, which made them vulnerable to things that reduced or immobilized the workforce.  However, traditional flows that were vulnerable to COVID were also vulnerable to competition.  COVID retuned buyer expectations as well as seller practices.  People are far more likely to order online than before.  They rely more on web support than before.  Those changes mean that vendors who continued and even expanded the trends COVID created could expect competitive advantage.

The pat strategy for COVID and beyond is the cloud, but as I pointed out yesterday, buyers quickly realized that moving everything to the cloud, or even moving a lot more than they’d moved to date, created significant compliance and cost problems.  Thus, simply waving the cloud magic wand over the buyer wasn’t going to cut it.

You could say the same for AI, but perhaps even more so.  AI is a broad classification within the even-broader area of analysis tools.  Yes, the new age that started at the end of last summer needed new analytics to support the new flow of information and ordering, but enterprises generally believed (and still believe) that this is possible within the framework of their current software and databases.

This pair of realizations is the core of IBM’s challenge.  They saw themselves as hybrid cloud and AI, and while those strategies could certainly be used to help address the massive post-COVID shift in business-process-to-technology relationships, just having the tools didn’t automatically generate the desired result.  IBM, with decades of experience in mapping business needs to IT resources, should have been able to make the connection for buyers, but they couldn’t, for three reasons.

The first reason is that IBM’s knowledge of businesses has always been based on dedicated account teams.  I remember my early programming experience, mostly in IBM shops, and I remember that every one of my employers had an on-site IBM team who represented IBM’s interests and worked to engage IBM solutions wherever a problem or opportunity was presented to the user.  That kind of account support isn’t the norm today, nor is the level of strategic engagement that IBM used to have with line organizations.

The second is that IBM, like other vendors, tended to evolve to a product-aligned organization.  That’s great if the buyer wants a widget, because the widget group can be easily identified and engaged.  What about when a buyer wants a different business process?  A process that likely cuts across a swath of products?  The fact is that “hybrid cloud” and “AI” aren’t even particularly connected in terms of application, but those were IBM’s strategic priorities.

Reason number three is that IBM hasn’t managed to integrate Red Hat strategically, or even lay out a clear roadmap that leads to that outcome.  In fact, Red Hat itself hasn’t done a great job of positioning itself strategically.  The key line from their website home page is “Clouds that compete can still connect”, which implies that multi-cloud is a strategic goal.  Rival VMware does better with “Own Your Path to the Future: Run any app on any cloud on any device with a digital foundation built on VMware,” but even that falls short of addressing the now-pressing need to transform the relationship between IT and business processes.

OK, you can say I’m seeing the same troll under every bridge here, but I think the fundamental issue that links all three of these points is the lack of a top-down sense of the buyers’ own business cases.  There is no business drive to adopt technology, there’s a drive to utilize technology to solve business problems.  You can’t present the tools in a heap at the buyer’s door and hope they’ll figure out what to use.  The IBM of my early days in programming would never have done that.  The IBM of today isn’t going to get away with doing it, nor is any other vendor.

We’re on the verge of a real transformation here, one that was launched by COVID but is being sustained by the fact that companies now see that their business practices are rooted in a world of handshakes and long supply chains, a world they can no longer count on.  How many other worldviews will businesses have to abandon to face the future?  There’s no answer to that, and so the only responsible step to take is to increase information portability and workflow agility.  IBM should have known that, and so should everyone else.

What’s Behind the Negative Shift in Cloud Attitudes?

Why are enterprises more skeptical about public cloud these days?  Vendors and Wall Street research have both noted the attitude.  Why?  We just came off a period of record consideration of increased public cloud use, and all the pundits are saying that the future is the cloud.  Of course, pundits have lied to us for decades, and sometimes “consideration” identifies as many issues as benefits.  Is the cloud losing its luster, and if so, how could we get that luster back, at least to the point of a realistic view of public cloud as an IT option?

According to enterprises, the biggest problem the public cloud has today is overpromotion.  One CIO told me that “When we opened our assessment of public cloud computing, we had done our research, or so we thought.  What we found was that we knew nothing, and were being fed nonsense.”

Two-thirds of enterprises say that the view of public cloud computing presented in the media is “totally inaccurate”.  The prevailing narrative, they say, is that public cloud will replace the data center, and enterprises are almost unanimous in saying that’s not the case.  However, CIOs and IT planners spend a lot of time arguing with line organizations and the CEO/CFO about that point.  Senior management’s view of the cloud is largely set by media/analyst material, which the technical people actually doing the planning say is likely to present a universal-cloud vision that simply doesn’t fit reality.

That’s not the end of the overpromotion.  The public cloud providers, say enterprises, are rarely satisfied with selling what enterprises really want (and have wanted all along), which is hybrid cloud.  The cloud is an elastic front-end element that’s an essential piece of “application modernization” according to the technical staff.  What they get from cloud providers is something like “Sure, but when do you think you’ll move the rest of your work to the cloud?”

There shouldn’t be any secret to the fact that enterprises don’t want to get rid of data centers, not for reasons of lack of sophistication or technology skill, but because you can’t make the business case for it.  As public cloud providers got more exposure to decision-makers during 2020, their overpromotion drove broader consideration of public cloud, and created more concern among CxOs that the cloud providers were manipulating them.  In December of 2019, only 14% of enterprises told me that their public cloud providers were manipulative, pushing unrealistic missions, and trying to lock them in.  This, compared with 52% who said that their traditional IT vendors were doing that.  In January of 2021, 67% of enterprises thought their cloud providers were manipulative, and only 50% said that of their IT vendors.

This issue may be related to the second point that enterprises raise about the public cloud, which is that the cloud providers aren’t helpful in framing public cloud in hybrid missions from a technical perspective.  A CIO put it this way: “We get a lot of advice in how to optimize our applications with cloud features, but not much about how to connect the new front-end piece with our current business applications.”  This is likely why more and more enterprises seem to be turning their cloud planning around, starting from the data center and working outward.  That, in turn, may be why IT vendors are now getting more respect.

If you don’t have a hybrid mission, of course, you have a data center replacement mission or you forget the cloud completely.  Neither of these are realistic options to enterprises.  Not only do they not want to replace their data centers, they don’t want to rearchitect or replace their current mission-critical applications either.  The front-end mission of the public cloud has been the line of least resistance for the cloud all along, and the line that most cloud providers were pursuing.  In 2020, those providers got greedy and unrealistic.

There’s still a problem, though.  We saw in 2021 a major uptick in distrust of the public cloud providers.  They’re overcharging, they’re locking me in, they don’t have the reliability they claim, they don’t understand my goals…you get the picture.  At the same time, we saw only a two-point decline in distrust of the IT vendors (52% to 50%).  Why didn’t the reputation of IT jump as that of the cloud providers fell?

“These guys don’t know anything about each other,” a CIO complained.  Sure, the cloud guys were trying to grab all the money, and sure the IT guys were resisting, but IT vendor resistance wasn’t addressing the problem much better than the cloud providers were.  Remember, the mission of enterprise IT planners was simple from the start—use public cloud services where they made business sense.  That was up front, near the GUI, where smartphones were increasingly in demand as worker appliances and as application portals to support customers and partners.  Not in the back end, which is why “hybrid cloud” was the strategy from the first.

It’s still the strategy.  Enterprises have broadly declined to move mission-critical applications to the cloud for reasons too numerous to cite here, and in my view, their decision is the right one.  What do we do to support it?

Anyone who ever played “whip” as a child knows that the people closest to the point of pivot move less and those on the end move more.  The elastic benefits of public cloud services are great for the dynamic edge but not for the core application.  The real question, a question neither cloud providers nor IT providers can answer, is where along the path of the whip a particular business would find the optimal transition point between cloud and data center.

We can do elastic deployment in the data center, after all.  What we can’t do there is distribute it geographically and scale it almost infinitely.  That’s OK where we have legacy data center transactional apps doing their thing in a very limited number of places.  The elastic elements closest to the handle of our whip don’t need to move far.  Further out, we might benefit more from elasticity, so the real focus of hybrid cloud planning should be the portion of the whip somewhere around midway to the tip.  How do we decide exactly where, and what technology tools would optimize the coupling between the cloud and data center pieces?
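
As a purely illustrative way to think about that decision, here’s a minimal Python sketch that scores hypothetical application components on demand variability versus their coupling to back-end systems of record, and suggests which side of the hybrid boundary each belongs on.  The components, scores, and decision rule are invented assumptions, not a planning tool.

    # Conceptual sketch only: the components, scores, and rule are invented.
    components = {
        # name: (demand variability 0-1, coupling to systems of record 0-1)
        "web/mobile front end":        (0.9, 0.1),
        "personalization/API layer":   (0.6, 0.4),
        "order orchestration":         (0.4, 0.7),
        "core transaction processing": (0.1, 0.9),
    }

    for name, (variability, data_gravity) in components.items():
        score = variability - data_gravity      # > 0 leans cloud, <= 0 leans data center
        placement = "public cloud" if score > 0 else "data center"
        print(f"{name:29s} -> {placement}")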

This shouldn’t be rocket science, technology-wise.  It might require some realism and discipline from both cloud providers and vendors, and it’s sure going to be difficult to promote something this technological and (if we’re frank) boring to the media/analyst communities.  Without some PR, can we popularize the solution, even if we manage to find it?  That’s probably the trade-off of our age.

Cloud computing isn’t the only place we have this problem.  Edge computing, 5G and private 5G, artificial intelligence and machine learning, IoT…the list goes on…all present the same challenge: the need to inform buyers of solutions while at the same time supporting realistic paths to those solutions, however complex and boring they may be.  The cloud in 2020 showed us there’s a price to getting it wrong, but there’s still time to get it right, for the cloud and for the other kingpin technologies on the horizon.

Why Millimeter Wave May Be the Path to 5G

Why is it that the focus of 5G discussions is a space that’s relatively unimportant?  Why are we largely ignoring the most promising piece of 5G?  I’ve been talking to enterprises, operators, and vendors on 5G for months, and over the last month I’ve focused on this underrated piece—millimeter wave.  Here’s what I found, and why this particular technology option should be front-and-center for anyone looking for a 5G revolution.

I got interested in enterprises and millimeter wave when I was doing a quick survey of enterprise attitudes on private 5G, something I blogged about last week.  The problem I found was that enterprises were getting a technology-centric vision of private 5G, not a goal-centric vision.  Given that they needed business cases to adopt something, this was leaving them without a realistic path forward.  I started talking about the enterprise potential of millimeter wave, and found that most had never been pitched on it, but most could in fact see a realistic mission set that could drive its deployment.

There’s also a real mission for millimeter wave in public 5G applications.  I’ve run the numbers with many operators, and they’ve told me that there is real potential for “significant” deployment of millimeter wave.  In some areas of the world, the potential is described as “compelling”, and yet even combining this with enterprise interest doesn’t seem to be promoting millimeter wave as a concept.  That’s why I propose to look at it in more detail here.

There are two broad flavors of 5G technology.  The one most people have heard of, and some are even using, is the mobile version.  This is designed to support smartphones and other mobile devices, and is a logical evolution from earlier cellular mobile standards, notably 4G/LTE.  The other one is millimeter-wave.  Unlike mobile 5G, millimeter wave is really aimed at replacing fixed wireline broadband, particularly the relatively low-speed DSL.  It’s an alternative to fiber to the home (FTTH) and a competitor to cable.  However, you can in fact support mobile users with millimeter wave, as we’ll see.

Millimeter wave is called that because the wavelength of the spectrum used is very short—millimeters.  For those who care, a millimeter is 0.03937 inches.  Most people are more accustomed to seeing frequency measurements in Hertz, or in the mega- or giga-Hertz variant, so millimeter wave is generally considered to be between 30 and 300 GHz, whereas traditional mobile 5G typically uses spectrum below 5 GHz.
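
The arithmetic behind the name is simple: wavelength is the speed of light divided by frequency.  A quick Python back-of-envelope check (illustration only) shows why the 30-to-300 GHz band earns the “millimeter” label, and how much shorter those waves are than mid-band mobile spectrum.

    # Wavelength = speed of light / frequency
    C = 299_792_458                                # speed of light, meters per second

    for f_ghz in (3.5, 30, 300):                   # mid-band 5G versus the mm-wave band edges
        wavelength_mm = C / (f_ghz * 1e9) * 1000   # meters converted to millimeters
        print(f"{f_ghz:>5} GHz -> {wavelength_mm:5.1f} mm")
    # 3.5 GHz is about 86 mm; 30 GHz is about 10 mm; 300 GHz is about 1 mm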

The disadvantage of millimeter wave is that as radio frequencies get higher, they tend to act like radar, bouncing off stuff instead of going through it to the customer.  Millimeter wave doesn’t go through buildings or even heavy foliage well, and so it’s generally considered to be a shorter-range technology, good for up to perhaps five miles depending on things like antenna heights and large obstructions.  It’s also limited in its ability to support direct-to-user connections within a building.
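
One way to quantify the range penalty (separate from the obstruction problem just described) is free-space path loss, which grows with frequency when antennas are held constant.  The sketch below applies the standard textbook formula to an assumed 1 km link with isotropic antennas, purely for illustration; real systems claw back much of the difference with beamforming.

    import math

    def fspl_db(distance_km: float, freq_ghz: float) -> float:
        """Free-space path loss in dB for distance in km and frequency in GHz."""
        return 20 * math.log10(distance_km) + 20 * math.log10(freq_ghz) + 92.45

    # Same assumed 1 km link, mid-band versus millimeter wave (isotropic antennas)
    print(f"3.5 GHz: {fspl_db(1.0, 3.5):.1f} dB")    # ~103.3 dB
    print(f"28 GHz:  {fspl_db(1.0, 28.0):.1f} dB")   # ~121.4 dB, roughly 18 dB more loss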

There are three advantages that can more than offset these issues, for at least some missions.  The first is that the information-carrying capacity of spectrum is directly related to its frequency, so millimeter-wave stuff could carry more bandwidth per spectrum unit.  The second is that more bandwidth per allocated channel is available, further multiplying capacity, and the third is that there’s more available spectrum in that range than in the cellular ranges.
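
The channel-width advantage is the easiest of the three to quantify.  Shannon’s capacity formula ties the theoretical ceiling of a channel to its bandwidth, so the much wider channels possible in millimeter-wave spectrum translate directly into more bits per second.  The sketch below is illustrative only; the channel widths and the 20 dB SNR are assumptions, not a radio design.

    import math

    def shannon_capacity_mbps(bandwidth_mhz: float, snr_db: float) -> float:
        """Shannon limit C = B * log2(1 + SNR), returned in Mbps for B in MHz."""
        snr_linear = 10 ** (snr_db / 10)
        return bandwidth_mhz * math.log2(1 + snr_linear)

    # Same assumed 20 dB SNR, different channel widths
    for bw_mhz in (20, 100, 400):
        print(f"{bw_mhz:>3} MHz channel -> {shannon_capacity_mbps(bw_mhz, 20):,.0f} Mbps ceiling")
    # 20 MHz ~133 Mbps, 100 MHz ~666 Mbps, 400 MHz ~2,663 Mbps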

Some operators have cited another potential benefit, which is that 5G mm-wave isn’t necessarily “5G” in an implementation sense.  It’s in a kind of interesting transitional zone, between multipoint microwave and cellular networking.  This can give operators interesting options, and create a truly compelling business case for private millimeter-wave 5G.

I’ve noted in some earlier blogs that the business case for mm-wave 5G isn’t hard to find.  Wireline replacement is all about finding a way to approach the baseline bandwidth choice of consumers (around 200 Mbps) with a lower “pass cost” (the cost to get broadband to an area so that it can be sold and connected) and per-customer connection cost.  Legacy CATV cable achieves a pass cost of perhaps $180, and for new installations it’s on the order of $220.  FTTH pass costs are averaging almost $600, and neither of these technologies can be self-installed.  The 5G/FTTN hybrid’s pass costs are highly variable depending on the topography and household density of the target service area, but they can range from as low as $150 to as high as $350.
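
Those pass costs can be turned into a rough cost-per-connected-home comparison.  The pass costs below are the figures cited above; the take rate and per-connection costs are assumptions added purely for illustration, so treat the output as a shape, not a forecast.

    # Rough cost per connected home: pass cost spread over buyers, plus connection cost.
    TAKE_RATE = 0.35    # assumed fraction of passed homes that buy the service

    scenarios = {
        # technology: (pass cost per home passed, assumed connection cost per customer)
        "Legacy CATV":    (180, 100),
        "New-build CATV": (220, 100),
        "FTTH":           (600, 300),
        "5G/FTTN low":    (150,  75),   # self-install keeps connection cost down
        "5G/FTTN high":   (350,  75),
    }

    for tech, (pass_cost, connect_cost) in scenarios.items():
        per_customer = pass_cost / TAKE_RATE + connect_cost
        print(f"{tech:14s} ~${per_customer:,.0f} per connected home")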

Some operators say that you can self-install the 5G/FTTN hybrid by attaching the antenna to a window facing the right direction, though that does seem to require a bit of coordination, since only the operator knows the location of the FTTN node relative to the home and its windows.  This is very much like the installation of an over-the-air TV antenna.  Once the antenna termination is inside, it’s hooked up to a “modem” that provides in-home WiFi connectivity (or, with suitable equipment, Ethernet direct connection).

It’s this easy setup of what could be a multi-hundred-megabit broadband connection that makes this approach appealing for private 5G.  Companies that have a campus, or even that have multiple sites located relatively close to each other, could feed broadband to one and use the 5G mm-wave hybrid to cover others, either through point-to-point or using the primary feed as the “node”.  This application could be called “5GINO” (5G in name only) because the only thing it really needs from “5G” is the spectrum and radio technology.

For both public and private applications, mm-wave could be revolutionary.  It’s likely that the public 5G/FTTN mission will work out the details and provide an early market for antennas, radio elements, and so forth.  This would facilitate broader adoption by lowering costs.  Even the FTTN nodes could become almost commodity items, and broad private millimeter-wave adoption could drive costs down even more.

There are a few different requirements for private millimeter-wave installations, the most notable being the ability to frequency-hop to respond to interference in unlicensed spectrum.  This same problem occurs with WiFi (particularly in dense residential areas) so it’s not rocket science to address it.

Millimeter wave doesn’t need to be limited to wireline replacement, either.  Remember that a normal mm-wave installation would feed a local WiFi router.  Any smartphone with WiFi calling would then be able to take advantage of the wider spread of broadband across a campus or large facility, and still have traditional cellular service available.

Would it be possible for an enterprise to deploy a mobile 5G service using unlicensed spectrum?  The emerging approach for this involves two technologies.  The first is “LAA”, for “License-Assisted Access”, and the second is NR-U, for “New Radio-Unlicensed”.  LAA emerged with 4G and Evolved Packet Core, and NR-U is a pure 5G option.  You can run “standard” 5G on unlicensed spectrum with NR-U, or run it with a cellular “anchor” using LAA.  Among enterprises I’ve chatted with, there is virtually zero knowledge of either LAA or NR-U, which shows that if there’s any real presentation of private 5G going on, it’s not exactly insightful.  The reason may be that enterprises see any network capability that’s a conglomerate of standards as an invitation to integration hassles and finger-pointing when something goes wrong.

That doesn’t disqualify millimeter wave as a technology useful to enterprises, though.  In fact, it would be easier to use 5G mm-wave and WiFi to support calling from within a facility than to adopt either NR-U or LAA, provided of course that your primary cellular carrier(s) support WiFi calling.  There’s no question that a pure 5G approach could offer more features, but given the fact that WiFi calling works fine for millions today, it’s not clear whether the additional pure-5G capabilities are needed.

What about IoT, though?  Well, obviously WiFi 6 could support all the WiFi IoT applications we have today, and many more besides.  You can use Basic Service Set Coloring to balance QoS over multiple WiFi zones, and it supports both high-capacity devices and low-power-and-bandwidth devices.  Again, I don’t dispute that there are applications it won’t work for (mobile IoT) and that there are more that pure 5G could do better, but how much “better” will most IoT need?

My point is simple.  You can make a good business case for millimeter-wave 5G, in both public and private 5G applications.  It does everything we can do now, but better and cheaper, and it hybridizes nicely with WiFi 6.  That’s important because WiFi calling and WiFi IoT are proven solutions to provable opportunities—we have them now and it’s working.  Even without any attempt to co-utilize mm-wave 5G for mobile devices, it’s the form of 5G that has the cleanest business case for deployment.  Add in the mobile dimension and it could be a killer step toward not only public but private 5G as well.

A little respect, please?

Has 5G Learned from the Technology Failures of the Past?

Why did ISDN, X.25, frame relay, SMDS, and ATM go wrong?  These technologies all spawned their own forums and standards, got international media attention, and promised to be the foundation of a new model of infrastructure.  Obviously, they didn’t do that.  When we look at current technology hype around things like NFV or SDN or 5G, we have a similar build-up of expectations.  Are we doomed to a similar let-down?  It’s not enough to say “This time it will be different,” because we said that many of the last times too.  And besides, the last couple of promising technologies really were different, and it wasn’t enough.

What, then, do we say?  Let’s start by summarizing what each of those technology dodos of the past was supposed to be, and do.

ISDN (Integrated Services Digital Network) was a response to the fact that telephony had migrated to digital transport.  For years, voice calls were converted to 64 kbps digital streams using pulse-code modulation (PCM), and aggregated into trunks (the T1/E1 hierarchies in the US and Europe, respectively).  The 64 kbps timeslots were sold as 56 kbps Dataphone Digital Service and the trunks were sold as well, but “calls” were still analog.  ISDN provided a signaling adjunct to the digital network to permit switched digital connections.  “Basic Rate ISDN” or BRI let you signal for 56 kbps connections, and “Primary Rate” or PRI for T1/E1.  It was an evolution to circuit-switched digital voice networks.
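
For readers who never lived with these rates, the channel arithmetic is simple enough to sketch.  The figures below are the nominal ones (as noted above, US B channels were often delivered at 56 kbps in practice).

    # The channel arithmetic behind ISDN and the T1/E1 trunks it rode on (nominal rates, kbps).
    B = 64                       # one bearer ("B") channel

    bri     = 2 * B + 16         # Basic Rate: 2B + 16 kbps D (signaling) = 144 kbps
    pri_t1  = 23 * B + 64        # US Primary Rate: 23B + one 64 kbps D channel = 1,536 kbps
    t1_line = 24 * B + 8         # T1 line rate: 24 timeslots + 8 kbps framing = 1,544 kbps
    e1_line = 32 * B             # E1 line rate: 32 timeslots (30B + signaling + framing) = 2,048 kbps

    print(bri, pri_t1, t1_line, e1_line)   # 144 1536 1544 2048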

Circuit switching is inefficient, though.  A RAND Corporation study decades ago showed that data applications utilized a small portion of the nominal bandwidth assigned, and that by breaking data into “packets” and interleaving them with other packets from other sources, transport utilization could be significantly improved.  In the ‘70s, X.25 came along, an international standard to define how you could create shared-resource “packet” networks.  This roughly coincided with the birth of the famous OSI Model (Basic Reference Model for Open Systems Interconnection).  X.25 even offered (via a trio of companion specifications, X.3, X.28, and X.29) a way to use asynchronous (gosh, there’s a term to remember!) terminals on a packet network.

X.25 and its relatives actually got decent international support, though they weren’t as popular here in the US.  They were perhaps swamped by the advent of frame relay and, to a lesser extent, SMDS.  These technologies evolved from different legacy starting points, along different paths, to common irrelevance.

Frame relay was an attempt to simplify packet switching and introduce some notion of quality of service.  Users could, in a sense, buy a kind of envelope in which they could transport things with a guarantee, and what fell outside was then best-efforts or even blocked.  There was also a totally best-efforts service.  It was based on legacy packet, meaning X.25.

SMDS, or Switched MultiMegabit Data Service, was a packet ring technology, a kind of offshoot from Token Ring local-area network technology.  Think of it as packet on a SONET-like ring.  There were “slots” that data could be stuck in, and the sender stuck in a packet where a free slot was available, and the receiver picked it out.  The benefit of the approach was that slots could be assigned and so QoS could be assured.

Neither frame relay nor SMDS made a major impact, and telco-sponsored SMDS arguably fell flat.  That created a vacuum for the next initiative, ATM, to fill.

ATM stands for Asynchronous Transfer Mode, and unlike all the prior technologies that were designed as an overlay on existing networks, ATM was designed to replace both current circuit-switched and packet-switched networks.  That meant it was supposed to handle both low-speed voice and high-speed data, and given that data packets could be over a thousand bytes long and would generate a significant delay for any voice packets that got caught behind them on an interface, ATM broke packets into 53-byte “cells” (5 header and 48 payload).  ATM had classes of service, and even support for TDM-level QoS.  In an evolutionary sense, it was the right approach, and had operators had control of their own destinies in a “Field of Dreams” world, it is likely what would have emerged.
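
The “voice stuck behind a big data packet” argument is easy to put numbers on.  Serialization delay is just bits divided by link speed; the T1 rate used below is an assumption chosen because it was a common trunk speed of the era.

    # Serialization delay on a 1.544 Mbps (T1) link: why ATM chose tiny cells.
    LINK_BPS = 1_544_000

    def serialization_ms(frame_bytes: int) -> float:
        return frame_bytes * 8 / LINK_BPS * 1000

    print(f"1500-byte data packet: {serialization_ms(1500):.2f} ms")   # ~7.77 ms a voice cell could wait
    print(f"53-byte ATM cell:      {serialization_ms(53):.2f} ms")     # ~0.27 ms worst-case wait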

OK then, what was the reason these things didn’t work, didn’t succeed?  I think some of the reasons are obvious.

First, all these technologies were evolutionary.  They were designed to support a reasonable transition from existing network technology, to fit a new set of opportunities that operators believed would develop out of the new technologies they deployed.  One problem with evolution is that it tends to delay benefits because unless you fork-lift all the stuff you’re trying to evolve from, the scope of the changes you make limits the extent to which you can offer anything transformationally different.  Another is that evolutionary change tends to perpetuate the fundamental basis of the past, because otherwise you’re not evolving.  If that fundamental basis is flawed, then you have a problem.

The specific evolutionary thing that all these technologies were mired in was connection-oriented service.  A connection-oriented service has its value in connecting things, like people or sites or computers.  It’s a bit pipe, in short.  Not only that, connection-oriented networking is stateful, meaning that a connection is a relationship of hops set up when it’s initiated, involving all the nodes it transits.  Lose a node, you lose state, and you have to restore everything end to end.  It doesn’t scale well for zillions of relationships, which of course wasn’t the goal when evolving from dial-up telephony.
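
A toy comparison makes the scaling point concrete.  The counts below are invented, but the asymmetry is the point: a connection-oriented node must hold (and, after a failure, re-establish) state for every connection transiting it, while a connectionless node holds only a forwarding table keyed by destination.

    # Conceptual contrast, not any real protocol; the counts are assumptions.
    connections_transiting_node = 1_000_000   # active connections through one core node
    destination_prefixes        = 50_000      # distinct destinations in a forwarding table

    stateful_entries  = connections_transiting_node   # lose the node: every connection re-signaled end to end
    stateless_entries = destination_prefixes          # lose the node: routing converges around it

    print(f"connection-oriented node: {stateful_entries:,} connection-state entries")
    print(f"connectionless (IP) node: {stateless_entries:,} forwarding entries")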

Finally, evolution all too easily ignores a fundamental truth about new opportunity, which is that new opportunities will be realized using the technology options that are the fastest and cheapest to deploy incrementally.  It was cheaper to build an overlay network for consumer Internet, leveraging dial-up modems at first and then existing cable and copper loop plant, than to transform the entire public network to digital-packet form.  So, that’s what we did.

The second problem was that all these services presumed that evolution was necessary because revolution was impossible.  There was, or so the movers and shakers who dreamed up all this stuff believed, no way to support radically new network missions in a different way.  IP networks, which had their roots in the research and university community in the same general period as packet networks and the RAND study, happened to be a way to do data delivery and experience delivery better, and you could extend them via the digital trunks that TDM networks had already created.  Thus, IP came along and swept the other stuff aside.

The third problem was more financial than technical.  Every step you take toward new network technology has to prove itself in ROI terms, and with these technologies there was simply no way that could happen.  The big problem was that each step in the evolution changed a microscopic piece of the whole, and that piece was left as an island in a sea of legacy technology, helpless to do much.  The alternative, which is to fork-lift things, increases costs radically and increases risk as well.

The technologies we’ve looked at arguably failed all of these tests, and I think that technology planners realized that they had to try to think differently.  How did they do with SDN, NFV, and 5G?

SDN didn’t fall into the first two of these traps; it tried to learn from them.  But it didn’t account for the third.  SDN was surely not an evolution; rather, it was arguably a total transformation of packet networking, one that eliminated the adaptive nature of IP in favor of central control.  The problem was that in order for SDN to make a significant difference, it had to be applied on a large scale.  The centralization of the control plane also had to be proven at that large scale, and large-scale application was going to be too costly.

The “unproven paradigm” piece of SDN carried over to NFV.  Central control was unproven as a paradigm, and so was the overall ROI on a function-hosted infrastructure.  Before the first year of NFV had passed, the operators who launched it were prepared to admit that the capex savings it could deliver wouldn’t have been much more than “beating up Huawei on price” could have achieved.  The opex impact was nearly impossible to assess because it was clear that the framework for the whole NFV lifecycle process was uncertain.  Insufficient benefits, insufficient ROI.

Now we come to 5G, and here we come to a new complexity.  There’s a simple numerical truth to “5G”, which is that by implication there were four “Gs” before it.  5G is explicitly an evolution.  All evolutions can be justified by two different things.  First, there’s growth in the current user base, and second, new applications that justify different usage.  It’s the interplay between these two that makes 5G complicated.

A part of 5G deployment is as close to being written in stone as you get in carrier networking.  Operators want higher customer density per cell to reduce the risk of having some users unable to connect.  Some are interested in offering higher bandwidth per user to facilitate the use of wireless to provide what’s essentially wireline replacement in some areas.  We will therefore have 5G deployment whether there’s any credible new application set, or whether 5G is just an operator convenience.

The other part of 5G is the hope that there is something in it on the revenue side, and for that we have considerable uncertainty.  There are three general places where that could be found.

First, customers might pay more for 5G’s additional capacity.  The idea that 5G is better for smartphone or tablet users is easy to sell if it’s true, but hard to make true.  The size of the device sets the data rate needed to support high-quality video, the most capacity-consuming application we have.  Obviously, 4G works fine most of the time for video, and obviously many users aren’t watching video on their phones except in unusual situations.  Operators I talk with are privately doubtful that they’ll earn any significant revenue this way.

Second, 5G could fill the world with connected “things”, each of which is happily paying for 5G services.  Think of it as having an unlimited robotic army of prospects to supplement the human population of users, who stubbornly limit their birth rate and so thwart operator hopes of an exploding new market.  The problem is that the most pervasive “things” we have, stuff like home control and process automation, aren’t likely to have their own 5G connections.  Things like the connected car, even if we presume there’s a real application for them, are going to add to revenue only when somebody trades up to one.  IoT and related applications are a “hope” to many operators, but most of those I talk with believe it will be “many years” before this will kick in.

Where we are at this point is pretty obvious.  5G, with only the two drivers noted above, is going to be under ROI pressure early on, encouraging operators to limit their costs.  That’s the real story behind things like Open RAN.  If you have a rampant opportunity to gain new revenue, new costs aren’t an issue.  If you don’t, then you have to watch the “I” side of “ROI” because the “R” will be limited.  So do we have rampant revenue on the horizon?

If we do, then it has to come from the “new” applications.  These include things like artificial and augmented reality on the “suppositional” side, and fixed broadband replacement on the current/practical side.  I believe that it’s this third area that will decide whether there are any credible new revenue drivers for 5G.

5G’s higher capacity, particularly in millimeter wave form, hybridized with FTTN, would significantly change suburban broadband economics, and even help in some urban areas.  Operators tell me that they believe they can deliver 100Mbps service to “most” urban/suburban users, and 500Mbps or more to “many”.  Where DSL is the current broadband technology, 5G/FTTN could offer at least four times the capacity.

The problem here is that evolution-versus-revolution thing.  Operators have been in a DSL-or-FTTH vise for decades, and the cable companies have taken advantage of that in areas where cable is offered.  Forcing change on users is never possible; you have to induce it, and historically the delivery of linear TV has been the primary differentiator for home broadband.  You can’t deliver linear TV with 5G/FTTN, so operators would have to commit to a streaming strategy of their own, or share TV revenues with a streaming provider at best, or be bypassed on video at worst.

Australia, with one of the lowest demand densities in the industrial world, is already giving us a sign that 5G/FTTN could be a game-changer.  Telstra, the once-incumbent operator forced by the government to cede access infrastructure to a not-for-profit NBN, is getting aggressive in using 5G to win direct access to users again.  Rolling out FTTH in someone else’s territory is a non-starter in nearly every case, but a 5G/FTTN hybrid?  It could be done, and Telstra is proving it.  Competitive home broadband rides again, and telcos fear competition more than they pursue opportunity.

Which brings us to those suppositional things, like augmented reality, maybe connected cars, and maybe full contextual services that show us stuff based on where we are and what we’re trying to do.  These could be massive 5G drivers, but…

…they’re part of an ecosystem of applications and services that 5G is only a small piece of.  If you want to bet on these, you’re making the classic “Field of Dreams” bet that somehow the ecosystem will build itself around the service you provide.  Evolution in action.  The problem is that evolution takes millions of years, which obviously operators don’t have.

I think it’s clear that the more modern technologies attempted to address the failures of those earlier telco technology changes by focusing on new applications.  That’s made them more vulnerable to the third problem, that ever-present and ever-vexing issue of ROI.  A massive service ecosystem needs massive benefits to justify revolutionary change.  If I’m right, then it will be fixed broadband and augmented reality and contextual services in combination that would have to justify a true 5G revolution, and anyone who wants to see one should now be focusing on how to get those two opportunities out front, with developed ecosystems.  Otherwise, 5G is just another “G”.

Could Public Cloud Providers Really End Up Hosting 5G?

What might actually drive telcos to host 5G on public cloud services?  That’s another issue we’re starting to see some clarity on, as operators get a handle on their 2021 budgets and on which of their tech plans might be fulfilled.  Clarity doesn’t mean firm conviction, though, as we’ll see.

Operators list three reasons for their interest in 5G-on-public-cloud.  First, they have some service sites where they have no current physical footprint.  Out-of-region is particularly an issue for operators who have long had a wireline service footprint, and whose available real estate is associated with it.  Second, they are not confident about the state of an open solution to 5G, but want to avoid a proprietary one.  The cloud, in this case, is a kind of placeholder.  Finally, they lack confidence in their ability to deploy their own “carrier cloud”.  This is the group that intends to stay with public cloud services for considerable time, at least until that confidence is built.

An issue that cuts across all these motivations is that of the relationship between Open RAN, which is well understood, and evolving 5G services, which are not.  As I noted in my blog yesterday, it’s hard to see a lot of new 5G applications evolving with features like slicing confined to the RAN.  But for them to get out of a local RAN domain, we need an understanding of how to extend slicing and the standards and implementations needed to facilitate the extension.

Another cross-motivational issue is that of hosting OSS/BSS in the public cloud.  Most operators are at least interested in this, and many are already dabbling in the ways it could be done.  Some OSS/BSS systems have evolved away from the monolithic-and-batch model of the old era, but not all of them.  In addition, effective use of the public cloud to support any application will also involve at least some revision to the business practices being supported, and likely the extension of more autonomy to customers in service buys, changes, and support.

The operators who have the most immediate interest in hosting 5G on public cloud seem to be those with a lot of competition and limited in-house cloud skills.  Most operators agree that whether 5G is hosted in the public cloud or not, it’s still a cloud application.  If an operator without much cloud expertise is competing for an aggressive and early 5G positioning, they need a short-term option.  Even larger operators who have substantial out-of-region wireless business have a similar problem, because 5G hosting alone (even Open RAN) doesn’t justify deploying private cloud data centers everywhere you want to license spectrum.

Where this particular influence isn’t present, there’s an almost-even split among operators with immediate interest, based on whether they think they need more time to get a self-hosted 5G solution, or simply don’t have the skill.  A big part of this may be due to operator concerns about the state of “Open 5G” as opposed to Open RAN.  Another concern of this segment of the market is the credibility of 5G applications beyond those that have driven 4G.  The theory is that if there are credible new 5G revenue sources, it would be better to wait until operators know what they are, before they commit to any new 5G hosting in house.  Thus, use public cloud till the situation is clarified.

Operators in this second two-motivation group are also concerned about just who will provide a credible strategy for open 5G.  Part of that is uncertainty about the scope needed (RAN versus end-to-end), and part is that operators are seeing a major integration story associated with Open RAN, and think that a full open 5G would require even more.

From this you might believe that operators are really committing in growing numbers to hosting 5G control plane in the public cloud.  That’s sort-of-true, but not decisively so, and the reasons relate to the flip side of the motivational story behind public cloud.

One operator says that “lock-in is bad no matter who’s doing the locking.”  Public cloud players are seen, by some operators, as less trustworthy than traditional network equipment vendors.  They’re almost universally seen as being just as predatory, and they’re an unknown quantity to boot.  Then there’s the conviction that once operators decide not to host their own 5G elements, they may be destroying the best justification for starting their own carrier cloud.  You don’t need an external force to lock you in with traditional predatory tactics when you’ve surrendered all your other choices anyway.

Carrier cloud, which if fully realized could generate a hundred thousand incremental data center deployments by 2030 according to my model, is the brass ring (nay, the gold ring) to many forward-thinking operator planners.  These people think that bit-pushing has been a profit sink for decades, and there’s no reason to think that changing the technology is going to change that economic reality.  Thus, operators have to stop relying on pure connectivity and transport for profit, which means they have to rely on something that’s hosted.  For which, of course, you need hosts.  Do operators want to surrender a big cut of the kind of revenue streams that could justify a hundred thousand data centers?  For that kind of money, you could put up with some feelings of personal ignorance, or get educated.

This is another place where we have a balance of forces.  Some operators do believe they could run their own carrier clouds.  More think that if they gained some cloud experience, they could be effective, but most still have their doubts.  The lock-in risk balances against simple fear of screwing things up.

Public cloud providers have generally been careful not to accidentally trigger more lock-in fear.  Even those who offer hosted 5G as SaaS aren’t advocating the operator stay with public cloud perpetually.  But for OSS/BSS modernization, it’s a different story, and that may be the deciding factor here.

Operators have most of their IT expertise in the CIO organization, running OSS/BSS systems.  If those organizations transform to the public cloud, then “cloud knowledge” will come to mean “public cloud knowledge” not “carrier cloud knowledge”, which would mean that the operations part of the carriers would have to develop IT and cloud skills almost from scratch.

In the end, though, the real issue that will drive, or fail to drive, a transformation of 5G hosting to public cloud platforms is the integration and responsibility issue.  Media stories on Dish have listed a dozen different players involved, and operators cringe at jumping into 5G RAN with a cast of thousands trying to sing without a conductor.  Who is that conductor?  Open strategies in IT have succeeded largely because a vendor took responsibility for creating and sustaining an ecosystem, bringing order from the chaos of best-of-breed.  I think that unless some major players step up on 5G Open RAN and carrier cloud, operators may take the expedient public-cloud step as what they believe is a transitional strategy, and then hunker down for the long term.

Reviewing Recent Developments in Network Technology

There’s clearly a lot of movement in network equipment, both in technology and competitive areas.  There have been some recent developments in the Open RAN space, and other network vendor announcements that suggest changes in 2021.  This gives us an opportunity to examine how the trends I’ve talked about in the past are evolving through the present.

One announcement that got a lot of attention recently was the kerfuffle between Cisco and Acacia Communications.  The two agreed on a deal, then Acacia said it would back out because Cisco had failed to secure China’s regulatory approval on time.  Cisco said Acacia was simply maneuvering to get a better deal, and that view might be validated by the fact that Cisco and Acacia reached a new deal that gave Acacia a 64% premium over the original price.

Obviously, Cisco wanted Acacia badly, and just as obviously there had to have been changes in the market’s perception of the future to explain why Cisco made such a low initial offer, and why Acacia accepted it.  That situation is a good place to start our exploration of network market conditions and developments.

The original offer was made in July 2019, well before COVID but also before the full import of the shifts in network infrastructure planning by operators, cloud providers, and large enterprises was known.  While everyone knew that operators were suffering a profit-per-bit squeeze and were pushing down on capital spending, few at the time recognized any special urgency.

In the two service provider fall planning cycles that followed, operators were pretty clear that they intended to reduce costs, and in particular intended to do whatever they could to reduce capex.  By that time, it was pretty clear that things like NFV were not going to have a major impact, and so operators were putting a lot of price pressure on equipment, which favored (in 2019) price-leader Huawei.  But US government pressure on operators’ use of Huawei equipment may have created a sense of relief for competitor Cisco.

The 2020 planning cycle showed that operators were not going to simply shrug off the loss of a price leader.  They were instead turning aggressively toward white-box networking.  AT&T’s announcement that they were going with startup disaggregated white-box router vendor DriveNets was a critical proof point; Cisco fought hard against that deal.

For Cisco, committed to selling network equipment for most of their revenues, this generated a push for a response that could immediately reduce the pressure on their network gear.  Longer-term, Cisco also saw the need to get more directly involved in the cloud-platform business, but that move would surely take a lot of time to mature.  Acacia had been providing Cisco with optical modules, and the original Cisco driver for the deal—improved margins on the gear by owning the optical interface side—still looked good.  What looked even better was the idea that if Cisco could leverage Acacia to create a packet-optical layer that was integrated with Cisco routers, they could take a major step forward toward addressing operator capex concerns without ruining their own margins.

Router networks run on transport optics, and vendors like Ciena have traditionally supplied the gear.  Over time, it’s been clear that packet optics could tap off a lot of the handling at the IP layer, and so optical vendors have been moving into that space.  If Cisco and Acacia could create a packet-optical underlay within a router chassis, they could displace the additional layer of gear, which would reduce capex, give Cisco a bigger share of what was left, and reduce opex by reducing the number of boxes in the network.  This is what made Acacia more valuable to Cisco, I think, and both Acacia and Cisco knew that.  Hence, the new deal at a big premium.

Open-model packet optics isn’t here yet, so the Acacia deal gives Cisco a way to exploit the benefits ahead of a more competitive market.  That situation isn’t likely to last forever, though, and Cisco also faces a technology challenge in defining Acacia’s packet-optical mission.  A truly effective packet-optical layer could significantly reduce the value of IP-level grooming using things like MPLS.  In fact, packet optics could reduce the value of the IP layer overall, if it handled things like resiliency and failure recovery.  Cisco needs a strategy for framing its Acacia-based packet-optical offering without eating into the IP layer to the point where it becomes a commodity.

If IP is simple enough, then packet-optical network equipment vendors could still propose a new packet-optical layer, overlaid by a white-box IP layer, and end up with lower capex and opex than Cisco could offer with Acacia.  What’s required is a strategy to separate the control and data planes of IP and thus facilitate the deployment of data-plane-specialized white boxes, likely in a disaggregated mode.  The fact is that we have a number of initiatives that would deliver that, including the ONF’s programmable-networks concept and DriveNets’ disaggregated router model.
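To make the control/data-plane separation concrete, here’s a minimal Python sketch of the idea, assuming a hypothetical controller and white-box forwarders of my own invention (this is not any vendor’s actual API): a central control-plane process computes routes over the whole topology and simply programs per-destination next-hops into data-plane-only boxes.

import heapq
from collections import defaultdict

class WhiteBoxForwarder:
    """Data-plane-only device: holds a FIB and forwards; no routing logic of its own."""
    def __init__(self, name):
        self.name = name
        self.fib = {}                             # destination -> next-hop device name

    def install_route(self, destination, next_hop):
        self.fib[destination] = next_hop

class RouteController:
    """Separated control plane: sees the whole topology, programs every box."""
    def __init__(self, links):
        # links: list of (device_a, device_b, cost)
        self.graph = defaultdict(dict)
        for a, b, cost in links:
            self.graph[a][b] = cost
            self.graph[b][a] = cost

    def compute_fibs(self, forwarders):
        for box in forwarders:
            dist, first_hop = {box.name: 0}, {}
            queue = [(0, box.name, None)]
            while queue:                          # Dijkstra from this box
                d, node, hop = heapq.heappop(queue)
                if d > dist.get(node, float("inf")):
                    continue                      # stale queue entry
                if hop and node not in first_hop:
                    first_hop[node] = hop         # remember the first hop on the best path
                for nbr, cost in self.graph[node].items():
                    nd = d + cost
                    if nd < dist.get(nbr, float("inf")):
                        dist[nbr] = nd
                        heapq.heappush(queue, (nd, nbr, hop or nbr))
            for dest, nh in first_hop.items():
                box.install_route(dest, nh)       # control plane programs the data plane

boxes = [WhiteBoxForwarder(n) for n in ("A", "B", "C")]
RouteController([("A", "B", 1), ("B", "C", 1), ("A", "C", 5)]).compute_fibs(boxes)
print(boxes[0].fib)                               # {'B': 'B', 'C': 'B'}

The point of the sketch is the division of labor: the white boxes hold nothing but a forwarding table, which is what makes them cheap to build and easy to swap.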

The challenge is getting something to deploy in enough volume to be meaningful in terms of capex impact.  The easiest place to do that is a place in the network where considerable change is already mandated by a budgeted shift in technology.  Enter 5G in general, and Open RAN in particular.

More than anything else, Open RAN is about creating a truly open separate-control-plane concept.  However, what 5G calls the “control plane” and “user plane” doesn’t match IP control plane separation.  Open RAN is separating the 5G mobility and registration services, network slicing, and so forth.  All of this is above the IP control plane.  Open RAN stories today are told without any presumption that the 5G User Plane, which is really IP, is in any way impacted.  Similarly, players like the ONF and DriveNets are telling a separate-IP-control-plane story that doesn’t in any way involve the 5G control plane, or Open RAN.

Cisco needs to control the way IP evolves, to stave off a massive shift in the way operators see the IP layer, so that an Acacia/Cisco packet optical layer can pull through Cisco devices without heavy discounting.  Right now, the fact that 5G is the only greenfield space where massive shifts could be expected, and that 5G so far isn’t proposing to rethink the IP control plane, favors Cisco.  But that’s just right now.

The problem for Cisco is that there’s a lot of competition developing around hosting the RAN control plane.  AT&T, already using DriveNets’ disaggregated router in its core, is committed to Open RAN.  “It means a huge opportunity for us. Let me start by saying that AT&T will deploy and implement Open RAN” is an AT&T statement from a recent article.  Not surprisingly, that view is creating a bit of competitive jousting.

The general interest of cloud providers in hosting the 5G control plane got a boost with a deal involving Google Cloud and Nokia.  The focus on cloud-native here is likely to be more than blowing an NFV-laced kiss at the terminology, given Google’s involvement.  Cloud-native 5G would be a much more credible platform to extend into higher-layer services.

Cisco rival Juniper, already apparently looking at network-as-a-service as a broad strategy, cut a deal to help Turk Telekom export its 5G technology base, and Juniper obviously intends to integrate that technology with its own portfolio.  Since Juniper’s NaaS model could present an opportunity for operators to extend connectivity services without becoming OTTs, bringing the story to 5G could give Juniper a jump too.

Cisco seems to be holding back a bit on committing to their own 5G control-plane strategy.  Does this mean that they’re not seeing a threat to the IP layer?  Does it mean they think it’s too soon to take a position, and that by moving they might accelerate the very shift they fear rather than taking control of it?  We don’t know, but it seems likely that Cisco will have to make a move by late spring, or they risk losing the initiative in a space that might be defined by fast-movers.

Some Insights from the Evolution of Operator 5G Viewpoints

As 5G planning progresses, we’re getting more information from operators about how they’re seeing things like Open RAN, public-cloud 5G hosting, and new 5G-dependent services.  Things are still very preliminary, but one thing that’s already clear is that the high cost of 5G licenses (as reflected in recent auctions) is raising the risk level for operators who simply blunder along into 5G.  It’s not business as usual.  How these forces play out is critical to 5G and to network evolution overall.

The current focus of 5G is the RAN (technically, “New Radio” or NR in 5G terms), for the simple reason that most of the visible aspects of 5G (bandwidth per user and per cell) are created there, and because RAN changes are reflected in 5G devices, which have to get into the pipeline quickly or there’s no user base available for services.  As essential as this early RAN-focused work is to jump-starting the device market, it’s not likely to deliver tangible differentiators for 5G-specific services that operators could monetize.  To understand why, we’ll have to dig a bit into 5G, and perhaps invent some useful terminology.

5G, like all mobile network technologies, creates a kind of functional domain within which it offers specific features to support mobile users.  Mobility management, an element of 4G’s IMS and Evolved Packet Core (EPC), is an example of this.  We have an IP service network (the Internet, VPNs, or whatever), and the 4G functional domain is sort-of-glued onto it, to provide the additional stuff needed to support mobile users.  Think of this as a big circle (the public data network, like the Internet) with a bunch of little circles distributed around the edge.  At the points of contact, we can imagine a kind of “shim layer”, a broad outline of that central circle.  Let’s look at 5G based on this.

The little circles are the metro-area RAN and associated “on-ramp” technologies.  These include the true 5G NR/RAN implementations, as well as on-ramp facilities for things like WiFi users and (likely) 5G fixed wireless (5G/FTTN).  Within these little circles, it’s important to provide for free movement of users across 5G cells, and also between 5G and other access technologies like WiFi.

The shim-layer outline of the big circle is the control-plane extension and coordination that allows for unification of 5G mobile experiences across the little-circle RAN metro domains.  If somebody roams out of one RAN metro domain into another, there has to be a graceful way of coordinating that process so packets are properly delivered.  Ideally, sessions should be maintained.  The big inner circle, the PDN, has been the historical common connecting point, linking mobile users to the services and sites they want.  It’s not currently required to be aware of mobile behaviors.
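To make the geometry a little more tangible, here’s a toy Python rendering of the picture, with class names I’ve invented purely for illustration: metro RAN domains attach users, the core PDN just delivers packets, and the “shim” is the only thing that has to know where a user currently is.

class MetroRANDomain:
    """A 'little circle': a metro RAN domain that users attach to."""
    def __init__(self, name):
        self.name = name
        self.attached_users = set()

class MobilityShim:
    """The 'outline' of the PDN: tracks which metro domain anchors each user."""
    def __init__(self):
        self.user_anchor = {}                  # user -> current MetroRANDomain

    def attach(self, user, domain):
        domain.attached_users.add(user)
        self.user_anchor[user] = domain

    def roam(self, user, new_domain):
        old = self.user_anchor[user]
        old.attached_users.discard(user)       # coordinate the handoff...
        self.attach(user, new_domain)          # ...so delivery follows the user

class CorePDN:
    """The big circle: delivers packets wherever the shim says the user is."""
    def __init__(self, shim):
        self.shim = shim

    def deliver(self, user, packet):
        return f"{packet} -> {self.shim.user_anchor[user].name}"

shim = MobilityShim()
dallas, austin = MetroRANDomain("Dallas"), MetroRANDomain("Austin")
shim.attach("alice", dallas)
pdn = CorePDN(shim)
print(pdn.deliver("alice", "pkt1"))            # pkt1 -> Dallas
shim.roam("alice", austin)
print(pdn.deliver("alice", "pkt1"))            # pkt1 -> Austin

The important point is what the big circle doesn’t do; mobility awareness lives in the shim, not in the PDN itself.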

From a function-and-features perspective, 5G differs from 4G in large part because it offers capabilities beyond mobility.  Latency control, for example, is a special feature of 5G; others include support for simple devices (IoT), precision location services, and security.  Most of these features, which are usually envisioned as being extra-cost options, are implemented through the network-slicing capability of 5G, which creates “virtual networks” over common infrastructure.
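A hypothetical sketch of how those extra-cost features might be expressed as attributes of a slice is shown below; the field names are my own invention, not the 3GPP’s slice templates, but they show how features beyond mobility become saleable properties of a virtual network.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SliceDescriptor:
    """Illustrative only: one slice's feature commitments over shared infrastructure."""
    name: str
    max_latency_ms: Optional[float] = None        # latency control
    max_device_count: Optional[int] = None        # simple-device (IoT) support
    location_precision_m: Optional[float] = None  # precision location services
    isolation: str = "shared"                     # security/isolation posture
    extra_cost_features: List[str] = field(default_factory=list)

iot_slice = SliceDescriptor(
    name="metering-iot",
    max_device_count=1_000_000,
    isolation="dedicated",
    extra_cost_features=["low-power scheduling"],
)
ar_slice = SliceDescriptor(name="ar-gaming", max_latency_ms=10.0)
print(iot_slice)
print(ar_slice)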

These added 5G features require 5G control over more of the information flow than simple mobility management would require.  Thus, we could visualize their impact as being a “thickening” of our shim-layer outline to extend coordination and data handling directly between metro domains, and likely the introduction of similar features into the big center circle, the core PDN in current terminology.  Some might see more and more of 5G information flows staying in the 5G functional domain, making the “shim” outline thicker, to the point where it might end up being much or even most of that central core.

This relates to operator plans for two reasons.  First, the standards needed to fully outline the implementation of this added 5G feature set aren’t finalized, and second, the implementation of all of these additional features demands a considerable shift in thinking.  If “Open RAN” is a desirable goal for operators, then “Open Core” is mandatory.

We know what the 5G model of what I’m calling “Open Core” would look like.  There’s common infrastructure under all the slices, and that infrastructure might be dedicated to a slice, dedicated to 5G, or drawn from shared resources.  The model the 3GPP promotes builds all the features on NFV, but this presupposes that NFV actually represents a useful model of function hosting in an age where we already have cloud-native concepts that are far beyond it in utility.

The other challenge here, as operators see it, is just what “Open” means.  There is no question that operators believe that mobile services of the kind we’ve had for years must support intercalling across operator boundaries.  There’s no question that basic enhancements to our little-circle metro domains like WiFi roaming would have to extend to any roaming user, and also would have to support calls across operator boundaries.  But the rest?  Do operators want to invest in IoT and network-slicing-based services in cooperation with other operators?  That’s not clear to many of them yet.

Operators in areas like the EU tend to believe that they would likely face a regulatory mandate to open 5G-specific features across operator boundaries to create a uniform EU-wide service set.  In the US, that seems less likely, and in much of the rest of the world, there’s no clear consensus on the topic at all.  The likely outcome, most think, is that operators would decide based on the overlap in service areas between themselves and potential partner operators.  Big-overlap plus no-regulatory-mandate equals try an all-on-us solution rather than partnering.

One thing that almost all operators say now is that they would like to see an open 5G Core model, not just Open RAN.  Perhaps one in five are suggesting that absent that capability, Open RAN is less useful to them.  This seems congruent with the way that the proprietary 5G vendors are positioning; they focus more and more on the concept of “end-to-end” 5G slices and features.

Another area getting some attention is the “federation” of features implemented through NFV (or a cloud-native successor technology).  There are no specific standards relating to the way these features would be accessed, even by the operator who deployed them, much less how they’d be shared.  Some operators see a value in having “feature brokers” who build useful features and even services and then offer them on a wholesale basis for others to integrate.
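Since no federation standard defines any of this, the sketch below is purely speculative: a broker holds a catalog of hosted features, and a retail operator discovers and composes the ones it needs.  The class, method names, and placeholder endpoints are all assumptions of mine.

class FeatureBroker:
    """Hypothetical wholesale catalog of hosted features and services."""
    def __init__(self):
        self.catalog = {}        # feature name -> offer details

    def publish(self, name, owner, price_per_use, endpoint):
        self.catalog[name] = {"owner": owner, "price_per_use": price_per_use,
                              "endpoint": endpoint}

    def discover(self, needed):
        """Return the offers matching features a retail operator wants to compose."""
        return {name: offer for name, offer in self.catalog.items() if name in needed}

broker = FeatureBroker()
broker.publish("precision-location", "OperatorA", 0.002, "https://example.invalid/loc")
broker.publish("iot-slice-mgmt", "CloudProviderB", 0.010, "https://example.invalid/iot")
print(broker.discover({"precision-location"}))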

Many operators see public-cloud hosting of 5G service components as a kind of federation.  While there’s certainly interest in using generalized public-cloud hosting, especially for 5G services deployed outside operators’ traditional footprints, there’s also interest in a SaaS model where functional elements of 5G are offered, not just hosting of those elements.  That same approach could be taken for those above-5G features that build additional service value.

Open RAN could have a direct impact on this, because of the RIC or RAN Intelligent Controller element.  This provides orchestration for key 5G components, but it also offers a point of interface between 5G control and higher-layer add-on elements.  Thus, how capable, how evolved, the RIC turns out to be could influence how quickly Open RAN could evolve to support NaaS.

That’s where things stand.  The Open RAN discussion, the costly spectrum auctions, the continued profit pressure on network operators, and the uncertainties of a future service set that’s not wired into monolithic hardware are unsettling to operators overall, but they’re something they’re increasingly resigned to facing, and hoping to face well.  That, of course, may be the big vendor opportunity.

Could IT and Networking be Growing Together?

Are we heading toward a union of network and IT concepts, vendors, and even infrastructure?  Over the last two decades, companies have slowly pulled networking and IT into a common organization, and today the “software-defined” technologies seem to be uniting the two spaces for service providers and the cloud players.  Is this trend real, is it a good thing, and what might it mean to tech overall?

Networking has been subducted.  When I started doing enterprise surveys, I had a survey base of 300 companies, and in the first survey I did, 254 of them said that network and IT were separate buying authorities in their organizations.  In my most recent survey of 220 companies, 212 said the two were combined.  For 20 years, enterprises have said that the largest factor in establishing network policy and exercising strategic influence was the data center.  Today, more companies tell me that their key IT vendors (servers, platform software, virtualization, etc.) drive network decisions than say that Cisco and other network vendors do.

Why has this happened?  Digging through the survey history I’ve accumulated, what I find is that three major factors have driven the change we’re seeing.  I think we learn the most by taking them in the order they appear, chronologically.

The first influence I find in my surveys dates from the late 1980s, and it’s the shift to IP networks as the baseline strategy for private networking.  Up to this point, computer vendors like IBM had their own unique networking strategy, which meant that data networking tended to be divided among the computer vendors, and tightly coupled to them.  Transport networking, what companies bought in the way of services to create connections, tended to be based on TDM, and carried voice and data traffic.  Since the data side was so balkanized, it was TDM and voice that kept networking independent of IT.  When IP came along and quickly took over as the data network model, it unified all the data center forces on a single network approach, and that reduced the forces that separated voice and data.  Of course, IP ultimately started to carry voice, too.

The second influence the surveys show is the substitution of virtual private networks for private networks.  When IP started its climb toward supremacy, it was used to build private networks by combining routers with TDM connections.  VPN services, meaning primarily IP VPNs but also VLANs, created a service that included transit switching and routing, and so eliminated a lot of “interior” nodes.  What was left was a bunch of access routers in remote locations, and bigger data center routers at points where computer systems were installed.  Power follows money, and not only did this reduce pure network capex, it also put the biggest-ticket items in the place where the IT organization ran the show.

The next influence was componentization and service-and-microservice technology.  When applications are monolithic, there is a sharp boundary between network and application—the network gets you to the data center where the applications are run.  Separation of networking and IT is still possible because of that.  When you start to build applications from services, meaning that you componentize applications and link the pieces with network connections, then more and more of “the network” is internal to IT.  Not only that, this internal network is now critical in sustaining QoE, and network features like load-balancing and failover are now linked to what’s seen by users as application behavior.  Obviously, the cloud extends this particular influence, and expands it greatly.
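Here’s a small illustration of that point, with a simulated set of service replicas standing in for real networked endpoints: the retry and failover logic that used to be a network concern now sits inside the application’s own code path, and its outcome is exactly what the user perceives as application behavior.  The names here are mine, for illustration only.

import random

def flaky_instance(payload):
    """Stand-in for a service replica reached over the internal network."""
    if random.random() < 0.5:                     # simulate an unreachable replica
        raise ConnectionError("instance unreachable")
    return f"handled({payload})"

def call_with_failover(payload, instances, retries_per_instance=2):
    """Application-level retry and failover across service replicas."""
    for instance in instances:
        for _ in range(retries_per_instance):
            try:
                return instance(payload)
            except ConnectionError:
                continue                          # retry this replica, then fail over
    raise RuntimeError("all replicas failed; the user sees an application error")

replicas = [flaky_instance, flaky_instance, lambda p: f"stable({p})"]
print(call_with_failover("order-123", replicas))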

The final influence, the one that’s raising its head most dramatically today, is security.  In modern distributed and cloud applications, it’s increasingly difficult to separate network security and IT security, since both are really targeting the protection of critical IT assets and resources.  What makes this particular influence so interesting is that it’s directly driving competing initiatives from network and IT vendors already.  Not only does the space conceptually unite IT and network, the offerings are competitive too.

The competitive theme here is also moving backward, so to speak, through the earlier influences uniting network and IT.  Virtualization in general has generated virtual networking initiatives in parallel with server virtualization.  VMware’s NSX is derived from Nicira, which came about to create large-scale multi-tenant data centers with totally separated virtual networks.  Kubernetes has virtual network add-ons, so does OpenStack, and so do the service mesh technologies.

The virtual network, then, is perhaps the single thing that’s framing the union of IT and network.  A virtual network looks like a “network” to the user, but is a tenant on generalized infrastructure provided by any combination of entities playing any convenient set of roles.  Separate use from realization and you see how important this can be.  If a virtual network can be defined by any player, then an IT version of it is just as credible as one from a network vendor.  In fact, it may be more credible, because the virtual network from the IT player is free to follow the twists and turns of IT needs, as seen by the players who know those needs best.

The new competitive avenues would seem to favor the network vendors in that they have an opportunity to jump over into another space when network spending is under pressure.  Cisco, for example, has long appeared to covet a position in cloud platform software and containers.  However, the true situation might be just the opposite.  Network vendors have been trying to claw their way out of the box of connection infrastructure, and just as they’ve had some success, IT vendors are now more able to compete with them in that higher-layer space.

Here’s where Juniper’s position bears watching.  As I’ve noted in prior blogs, Juniper has done some (uncharacteristically) savvy M&A recently, picking up players like Mist, Netrounds, 128 Technology, and Apstra.  These combine to offer Juniper what may well be the most solid network-as-a-service positioning of any network vendor.  If they’re rolled into a unified NaaS-like story, it could be a game-changer for Juniper, and for the space overall.  Nokia also has a shot at NaaS through its long-mishandled Nuage asset.

Could a NaaS story create some headroom for the network vendors?  It could also, of course, offer another and perhaps more dramatic path to success for the IT players, particularly those, like VMware, that already have a virtual-network offering.  Still, I think NaaS could tip the balance of opportunity more toward network vendors, simply because the IT side isn’t used to positioning technology as a new service paradigm.

This may turn out to be the battle of the budgets, too.  Enterprises have steadily shifted their financial planning influence toward the IT side.  If those players get a stake in the network piece of the pie, there could be big trouble in store for the leading network vendors, including Cisco.  Might Cisco’s drive toward software for virtualization and the cloud be a response to this risk, an attempt to take the battle to the home ground of the IT players?  It could be, and in any event, even the potential for greater network/IT fusion could be a game-changer in the market.

Can We Make “Observability” Meaningful?

If a tree falls in the woods….  We’ve all heard that old saying, and many remember debating it in their youth.  Now, it seems, the story is skirting the edge of becoming a paradigm, and we even have a name for it: observability.  Since I blogged yesterday on the risks of tech-hype, is this another example of it?  Not necessarily, but the truths of observability could get tangled with the tales it’s generating, to the detriment of the whole concept.

If there’s a difference between information and knowledge, the difference lies in a combination of context and interpretation.  In things like network and IT monitoring, the goal is to take action based on information gathered, and that depends on somehow converting that information into knowledge.  I submit that the difference between “monitoring” and “observability” lies in that conversion.

Protocol just did a piece on observability, and while I think it’s interesting, I don’t think it grasped this essential truth.  I think they made a connection between monitoring at the operations level and interpreting information through the introduction of context at the application level, but they seem to be focusing on software changes as the target of the shift.  It’s way broader than that.

Information is the sum of the statistical outputs of network and IT operations.  You can count packets and accesses, time responses, and do a bunch of other things, but it’s usually difficult to make a useful decision on the basis of this kind of data.  As I noted above, there’s a need to inject context and then interpret the result.  What I think provides the missing link, in the sense of creating an implementation anchor for the abstract term “observability”, is workflow.

Networks and IT infrastructure serve a set of missions, the missions that justify their deployment and operation.  These missions can be related to specific information flows, usually between workers and applications but sometimes between application components, and between “things” in an IoT sense, or between these things and applications.  All these information flows are, in IT terms, workflows, and each of them has an implicit or explicit quality-of-experience relationship that describes the sensitivity of the mission(s) to variations in the workflows.  Usually, QoE is related to quantifiable properties like packet loss, latency, and availability.  I submit that observability is really about knowing the QoE associated with a given workflow, meaning knowing those quantifiable properties.

Monitoring is the process of recording statistical data on the performance of something, and that truth establishes its limitations.  I can count the number of packets I receive at a given point, without any issues.  However, that’s not really a part of my basic QoE formula; I need packet loss.  The loss of a packet can’t be counted where it didn’t show up.  It has to be determined by comparing the number of packets sent with the number received.  But even that’s an oversimplification, because it presumes that every packet “sent” was supposed to be received.  An interface supporting multiple conversations can be analyzed with a simple sent/received difference only if there are no intervening points between sender and receiver that could divert some packets to a different path.

We usually end up reflecting this truth by measuring packet loss on a hop basis, meaning over a point-to-point path.  That works, but it creates another problem: we now have to know which paths our workflow is transiting to know whether it’s exposed to a known packet loss on a hop, and even then, we don’t know whether the lost packets include packets from our workflow.
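A minimal Python sketch of the two accounting views, with made-up counter values and an assumed path, shows the difference: end-to-end loss for a single workflow is just sender count minus receiver count, while hop-level counters aggregate everybody’s traffic and only mean something for our workflow if we know which hops it crossed.

workflow_sent = 10_000            # counted at the on-ramp for this workflow
workflow_received = 9_940         # counted at the off-ramp for this workflow
e2e_loss_pct = 100 * (workflow_sent - workflow_received) / workflow_sent
print(f"end-to-end loss for the workflow: {e2e_loss_pct:.2f}%")

# Hop-level counters aggregate *all* traffic crossing each point-to-point path.
hop_counters = {                  # (ingress, egress): (packets in, packets out)
    ("A", "B"): (2_000_000, 1_999_800),
    ("B", "C"): (1_500_000, 1_499_100),
    ("C", "D"): (900_000, 900_000),
}
workflow_path = [("A", "B"), ("B", "C")]   # we have to know this to use hop data

for hop in workflow_path:
    sent, received = hop_counters[hop]
    hop_loss_pct = 100 * (sent - received) / sent
    # Even a lossy hop doesn't prove *our* packets were among those dropped.
    print(f"hop {hop}: {hop_loss_pct:.3f}% loss (shared by every workflow on it)")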

The simple answer to observability would be to eliminate the middleman in the QoE process.  If the on-ramp agent for a particular workflow measured all the QoE quantities and the destination or off-ramp agent did the same, we could decide whether packets from that workflow were lost.  Count what the agent for a particular direction sends and compare it with what’s received.  Software can do this if the process is injected into the code, and that’s not difficult to do.
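Here’s a sketch of what that injection might look like in practice, assuming hypothetical on-ramp and off-ramp agents of my own design: the sender counts and timestamps what it emits, the receiver counts arrivals and accumulates latency, and comparing the two ends yields the workflow’s QoE numbers directly.

import time

class OnRampAgent:
    """Counts and timestamps everything a workflow sends."""
    def __init__(self):
        self.sent = 0

    def send(self, message):
        self.sent += 1
        return {"sent_at": time.time(), "body": message}

class OffRampAgent:
    """Counts arrivals and accumulates per-message latency."""
    def __init__(self):
        self.received = 0
        self.latencies = []

    def receive(self, envelope):
        self.received += 1
        self.latencies.append(time.time() - envelope["sent_at"])
        return envelope["body"]

def workflow_qoe(onramp, offramp):
    """Compare the two ends to get loss and latency for this workflow."""
    loss = onramp.sent - offramp.received
    avg = sum(offramp.latencies) / len(offramp.latencies) if offramp.latencies else None
    return {"packets_lost": loss, "avg_latency_s": avg}

onramp, offramp = OnRampAgent(), OffRampAgent()
for i in range(5):
    envelope = onramp.send(f"msg-{i}")
    if i != 3:                                 # simulate one message lost in transit
        offramp.receive(envelope)
print(workflow_qoe(onramp, offramp))           # {'packets_lost': 1, 'avg_latency_s': ...}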

But does this really constitute “observability”?  We’re supposed to be able to take action on what we find, and how can we do that if all we know is that something got lost in transit?  We don’t know where the workflow was supposed to go, path-wise, or where it actually went, or what conditions existed along the path.

Software development does indeed impact the observability process.  If workflows are threads connecting microservices, as they are in the cloud-native model, then changing the software architecture would change the elements of the workflow.  A new microservice could lose data, data could be lost getting to or from it…you get the picture.  However, we still have the fundamental requirement to track what was supposed to happen versus what did, or we have no means of adopting a strategy of remediation.  Logically, can we assume that all packet loss in a multi-microservice workflow arises because a microservice logic error dropped it?

Then there’s the question of “contaminated” versus “dropped”.  A microservice might drop a packet because of a software error, but it’s probably more likely it would process it incorrectly and pass it along.  Thus, our workflow is prone to both packet loss and contamination.  Packet error checks will usually detect transmission garbling of a packet, so we could assume the latter problem was caused by software logic.  The former could be caused by either software or transmission/handling, and deciding which was the villain means either correlating packet counts at the hop level (which requires knowing where the packet went) or inferring the cause, perhaps because the problem occurs more often than network errors would make likely, or because it correlates with a new software element in the workflow or a change to an existing element.
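The triage logic, sketched below under those assumptions (the thresholds and inputs are illustrative only), is simple enough: a failed checksum points to transmission garbling, intact-but-wrong data points to software contamination, and a drop gets attributed by correlating with hop-level loss or with a recent change to the workflow’s software.

import zlib

def classify_arrival(payload: bytes, declared_crc: int, semantically_valid: bool) -> str:
    """Separate transmission garbling from software contamination on arrival."""
    if zlib.crc32(payload) != declared_crc:
        return "transmission garbling"         # the error check caught it in flight
    if not semantically_valid:
        return "software contamination"        # intact bits, incorrect processing
    return "ok"

def attribute_drop(hop_loss_on_path: float, typical_network_loss: float,
                   recent_software_change: bool) -> str:
    """Attribute a missing packet to the network or to software, by inference."""
    if hop_loss_on_path > typical_network_loss:
        return "likely network loss"
    if recent_software_change:
        return "likely software drop (correlates with a change in the workflow)"
    return "inconclusive"

print(classify_arrival(b"order-123", zlib.crc32(b"order-123"), semantically_valid=False))
print(attribute_drop(hop_loss_on_path=0.001, typical_network_loss=0.010,
                     recent_software_change=True))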

I’m not suggesting that all these issues can’t be resolved.  We can count packets everywhere they’re handled.  We can track packet paths so that even adaptive changes to network topology won’t confuse us.  If we lay out how this could be made to work, though, we’d see that the result is likely to generate more packets of statistical data than packets in the workflow, and we’ve added a whole new layer, or layers, of cost and complexity.

The alternative approach, perhaps the best approach, may already be in use.  We don’t try to correlate faults, but to prevent or address them.  If a network is self-healing, then there is no need for higher-level remediation should there be an issue with one of the network QoS parameters that relate to our mission QoE parameters.  The network will fix itself, if repair is possible.  The same could be done, through virtualization and the cloud, to fix issues with QoE created by server failures.

This aims us at a possible approach: intent modeling.  If we can break down infrastructure (network and IT, data center and cloud) into manageable functional pieces and compose application hosting and connection from them, we can link mission and function easily, and then link functions with realizations.
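A minimal sketch of that idea, with classes and field names I’ve invented for illustration, looks like this: each functional element exposes an intent (what it promises) and hides its realization, and a mission composes elements and checks its QoE budget against their promises without caring how any of them are built.

from dataclasses import dataclass, field
from typing import List

@dataclass
class FunctionalElement:
    """An intent-modeled piece of infrastructure: promise exposed, realization hidden."""
    name: str
    promised_latency_ms: float          # the "intent" the element exposes
    realization: str                    # hidden detail: device, VM, cloud, ...
    observed_latency_ms: float = 0.0

    def meets_intent(self) -> bool:
        # A self-healing realization is expected to restore this on its own.
        return self.observed_latency_ms <= self.promised_latency_ms

@dataclass
class Mission:
    """A business mission composed from functional elements, with a QoE budget."""
    name: str
    qoe_latency_budget_ms: float
    elements: List[FunctionalElement] = field(default_factory=list)

    def status(self) -> str:
        total = sum(e.observed_latency_ms for e in self.elements)
        broken = [e.name for e in self.elements if not e.meets_intent()]
        if not broken and total <= self.qoe_latency_budget_ms:
            return "QoE met"
        return (f"QoE at risk: budget {self.qoe_latency_budget_ms}ms, "
                f"observed {total}ms, elements out of intent: {broken or 'none'}")

order_entry = Mission("order-entry", qoe_latency_budget_ms=100, elements=[
    FunctionalElement("access-vpn", 20, "metro Ethernet", observed_latency_ms=18),
    FunctionalElement("app-hosting", 50, "cloud region", observed_latency_ms=65),
])
print(order_entry.status())

The remediation story stays inside each element; the mission only needs to know whether the intent is being met.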

If we presumed this approach was an acceptable path to observability, then the Protocol story makes more sense.  Take the network and data center faults out of the picture, and what’s left is software.  The problem is that users aren’t sensitive to where problems originate, only to whether their expected experiences are disrupted.  We need to address “observability” in a general way because its goal is necessarily general.

This whole debate, in my view, is another example of our failure to look at problems holistically.  Fixing one flat tire out of four is progress in one sense, but not in the sense of the mission of the tires, or the car, or the driver.  The observability discussion needs to be elevated or we’re wasting our time.