Vendors and Operators: Changing Dynamic?

Should prey help predators by cooperating with them in the hunt?  A Light Reading story on the Broadband World Forum conference highlights the view that operators want the telecom equipment vendors to “get more involved in open source projects instead of sitting on the sidelines.”  Sorry, but that sure sounds to me like proposing a cheetah ask a gazelle to slow down a bit.  Operators need to accept the basic truth that open-source tools and other transformation projects that cut operator costs are also cutting vendor revenue.

The problem here, according to operators who’ve opened up to me, is that the operators don’t know what else to do.  They’ve proved, in countless initiatives over the last fifteen years, that they have no skill in driving open-source software, or even driving standards initiatives that are based on software rather than on traditional boxes.  I’ve criticized the operators for their failure to contribute effectively to their own success, but maybe it’s time to recognize that they can’t help it.

Back in my early days as a software architect, I was working with a very vocal line department manager who believed that his organization had to take control of their destiny.  The problem was that he had nobody in the organization who knew anything about automating his operation.  His solution was to get a headhunter to line up some interviews, and he hired his “best candidate.”  When he came to me happily to say “I just hired the best methods analyst out there,” I asked (innocently) “How did you recognize him?”

If you don’t have the skill, how can you recognize who does?  That’s the basic dilemma of the operators.  In technology terms, the world of network transformation is made up of two groups.  The first, the router-heads, think of the world in terms of IP networks built from purpose-built appliances.  The second, the cloud-heads, think of the world in terms of virtualized elements and resource pools.  Operators’ technologists fall into the first camp, and if you believe that software instances of network features are the way of the future, you need people from the second.  Who recognizes them in an interview?

It’s worse than that.  I’ve had a dozen cloud types contact me in 2020 after having left jobs at network operators, and their common complaint was that they had nowhere to go, no career path.  There was no career path for them because the operators tended to think of their own transformation roles as transient at best.  The attitude amounted to: we need to get this new cloud stuff in, identify vendors or professional services firms who will support it, and then…well, you can always find another job with your skill set.

It’s not that operators don’t use IT, and in some cases even virtualization and the cloud.  The problem is that IT skills in most operators are confined to the CIO, the executive lead in charge of OSS/BSS.  That application is a world away, in technology terms, from what’s needed to virtualize network features and support transformation.  In fact, almost three-quarters of operators tell me that their own organizational structure, which separates OSS/BSS, network operations, and technology testing and evaluation, is a “significant barrier” to transformation.

About a decade ago, operators seemed to suddenly realize they had a problem here, and they created executive-level committees that were supposed to align their various departments with common transformation goals.  The concept died out because, according to the operators, they didn’t have the right people to staff the committees.

How about professional services, then?  Why would operators not be able to hire somebody to get them into the future without inordinate risk?  Some operators are trying that now, and some have already tried it and abandoned the approach.  Part of the problem comes back to those open-source projects.  Professional services firms are reluctant to spend millions participating in open initiatives that benefit their competitors as much as themselves.  Particularly when, if they’re successful, their efforts create products that operators can then adopt on their own.

Why can’t cloud types and box types collaborate?  They come at the problem, at least in today’s market, from opposite directions.  The box types, as representatives of the operators and buyers, want to describe what they need, which is logical.  That description is set within their own frame of reference, so they draw “functional diagrams” that look like old-line monolithic applications.  The cloud people, seeing these, convert them into “virtual-box” networks, and so all the innovation is lost.

Here’s a fundamental truth:  In networking, it will never be cheaper to virtualize boxes and host the results, than it would be to simply use commodity boxes.  The data plane of a network can’t roam around the cloud at will, it needs to be where the trunks terminate or the users connect.  We wasted our time with NFV because it could never have succeeded as long as it was focused only on network functions as we know them.  The problem is that the network people, the box people, aren’t comfortable with services that live above their familiar connection-level network.

This “mission rejection” may be the biggest problem in helping operators guide their own transformation.  You can’t say “Get me from New York City to LA, but don’t cross state lines” and hope to find a route.  Operators are asking for a cost-limiting strategy because they’re rejecting everything else, and that’s what puts them in conflict with vendors.

The notion of a separate control plane, and what I’ve called a “supercontrol-plane,” might offer some relief from the technical issues that divide box and cloud people.  With this approach, the data path stays comfortably seated in boxes (white boxes) and the control plane is lifted up and augmented to offer new service features and opportunities.  But here again, can you create cloud/box collaboration on this, when the operators seem to want to sit back and wait for a solution to be presented?  And when the vendors who can afford to participate in activities designed to “transform” are being asked to work against their own bottom lines, because the transformation will lower operator spending on their equipment?

New players at the table seem a logical solution.  Don Clarke, representing work from the Telecom Ecosystems Group, sent me a manifesto on creating a “code of conduct” aimed at injecting innovation into what we have to say is a stagnating mess.  I suggested another element to it, the introduction of some form of support and encouragement for innovators to participate from the start in industry initiatives designed to support transformation.  I’ve attended quite a few industry group meetings, and if you look out at the audience, you find a bunch of eager but unprepared operator types and a bunch of vendors bent on disrupting changes that would hurt their own interests.  We need other faces, because if we don’t start transformation initiatives right, there’s zero chance of correcting them later.

This is one reason why Kevin Dillon and I have formed a LinkedIn group called “The Many Dimensions of Transformation”, which we’ll use as a discussion forum.  Kevin and I are also doing a series of podcasts on transformation, and we’ll invite network operators to participate in them where they’re willing to do so and where it’s congruent with their companies’ policies.  A LinkedIn group doesn’t pose a high cost or effort barrier to participation.  It can serve as an on-ramp to other initiatives, including open-source groups and standards groups.  It can also offer a way of raising literacy on important issues and possible solutions.  A community equipped to support insightful exchanges on a topic is the best way to spark innovation on that topic.

Will this help with vendor cooperation?  It’s too early to say.  We’ve made it clear that the LinkedIn group we’re starting will not accept posts that promote a specific product or service.  It’s a forum for the exchange of concepts, not a facilitator of commerce.  It’s also possible for moderators to stop disruptive behavior in a forum, whereas that’s more difficult in a standards group or open-source community, where vendor sponsorship often gives the vendors an overwhelming majority of participants.

The vendor participation and support issue is important, not only because (let’s face it) vendors have a history of manipulating organizations and standards, but because vendors are probably critical to realizing any benefits from new network technology, or any technology, in the near term.  Users don’t build, they use.  Operators are, in a technology/product sense, users, and they’ve worked for decades within the constraint of building networks from available products, from vendor products.

I’d love to see the “Code of Conduct” initiative bear fruit, because it could expand the community of vendors.  I’d also love to see some initiative, somewhere, focus on the ecosystem-building that’s going to be essential if we build the future network from the contribution of a lot of smaller players.  You can’t expect to assemble a car if all you’re given is a pile of parts.

Would Satellite Broadband Work Better for IoT than 5G?

Should we be thinking about a Satellite Internet of Things?  The emerging battle between Elon Musk and Jeff Bezos for low-earth-orbit satellite broadband raises the question, not because satellite broadband is a universal option, but because it’s an option where other options don’t exist, and likely won’t for decades.  That could be a killer IoT opportunity.

Satellite broadband isn’t the Holy Grail of broadband in general.  In nearly every case where terrestrial options are available to consumers, they’d be better off taking them.  Furthermore, my model says that 5G/FTTN millimeter-wave and even 5G mobile technology offer a better general solution to consumer broadband problems caused by low demand density.  But IoT is different, or at least the “real” new-connection IoT opportunity is.

Where IoT is within a facility, it will almost always be cheaper to use traditional short-range wireless technology, or even sensor wiring, to connect it.  Where the IoT elements are widely spaced and, in particular, when they’re actually mobile, you need some form of wide-area solution.  The operators’ hopes for massive 5G revenue were (and in some cases, still are) based on the vain hope that companies will pay monthly 5G bills to connect what WiFi could connect already.  The real question is those things that WiFi can’t connect.

5G faces a problem that goes back as far as (gasp!) ISDN.  CIMI Corporation signed the Cooperative Research and Development Agreement with the National Institute of Standards and Technology, and so I had an opportunity to play in the early justification for ISDN.  I remember well one particular occasion when a technical representative from a big vendor came in and said, with breathless excitement, “We’ve discovered a new application for ISDN!  We call it ‘File Transfer’!”  Well, gosh, we were already doing that.  The point is that there’s always been a tendency to justify a new technology by showing what it can do, rather than what it can do better.

We can do any sort of IoT connectivity with 5G, and nobody questions that.  However, we already have IoT in millions of residential applications, applications which demand a very cost-effective and simple-to-deploy solution, and none of it uses 5G.  Rather than looking at how 5G could help where no help is needed, why not look at what current technology doesn’t do well?

The most demanding of all IoT applications involve sensors and controllers that are mobile.  5G is a mobile technology, so why wouldn’t we see it as a possible fit?  I had an opportunity to engage a big Tier One about their 5G developer program, and I mentioned this issue to one of the program heads.  The response was “Yes, that’s a great application, but there aren’t enough of those mobile sensors to create a really good market for us.”  In other words, start with the number you want and then make up stories that appear to get you there.

The problem with 5G sensors is that if they consume mobile service, they require a monthly cost.  A small percentage of home security systems have mobile-radio connectivity to the alarm center, and these will add around two hundred dollars per year (or more) to the cost of monitoring, plus the cost of the system itself.  Imagine a vast sensor network, with each sensor racking up a nice fat cellular service bill, and you see why my Tier One program head thought it would be exciting.  It would, for those getting the money.  For those spending it, not so much.

The interesting thing was that the day after I had this conversation with the operator, I was watching one of my wildlife shows, and it was about elephant tracking.  They had a satellite GPS system attached to an elephant, and it provided a means of getting regular position updates on the animal.  No 5G (or even 4G) infrastructure needed.  Why not have satellite IoT?  I’m not suggesting that the exact same technology be used, but that satellite Internet could in fact support most of the remote-site and mobile IoT applications that have been proposed for 5G.

The nice thing about satellite is that almost all the cost is in getting the bird into orbit.  Once you’ve done that, you can support users up to the design capacity at no incremental service cost.  No matter where the sensors are installed, no matter how they might roam about, you can keep in touch with them.  Since sensor telemetry is low-bandwidth, you could support a heck of a lot of sensors with one of those satellite systems.
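To make that capacity point concrete, here’s a back-of-the-envelope sketch in Python.  The link capacity, message size, and reporting rate are purely illustrative assumptions, not figures from any actual satellite service.

```python
# Back-of-the-envelope sensor capacity on a satellite link.
# Every number here is an illustrative assumption, not a real service spec.

link_capacity_bps = 50_000_000      # assume 50 Mbps of usable capacity on one beam
message_bytes = 100                 # assume a small telemetry report
reports_per_hour = 60               # assume one report per sensor per minute

bits_per_sensor_per_sec = message_bytes * 8 * reports_per_hour / 3600
sensors_per_beam = int(link_capacity_bps / bits_per_sensor_per_sec)

print(f"Average load per sensor: {bits_per_sensor_per_sec:.1f} bps")
print(f"Sensors supported on one beam: {sensors_per_beam:,}")
# Roughly 3.75 million sensors under these assumptions, which is the sense in
# which "a heck of a lot of sensors" holds up once the bird is already in orbit.
```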

Satellite-based IoT would be a great solution for the transportation industry.  Put a “goods collar” on a shipment, on every vehicle that carries goods, and on every facility that handles and cross-loads the goods, and you could track something everywhere it goes in near real time.  Wonder if your freezer car is malfunctioning and letting all your expensive lobster or tuna spoil?  You can know about the first sign of a problem and get something out to intercept and fix or replace the broken vehicle.  Vandalism?  Covered.  Satellite applications of IoT for transportation, or for anything that’s mobile, could be killer apps for those competing satellite networks.

So, probably, could many fixed-installation applications that people are also claiming as 5G opportunities.  Sensors and controllers in an in-building IoT system can be connected through local wiring or a half-dozen different industrial control RF protocols, including WiFi and WiFi 6.  Stuff that’s somewhere out there in the wild isn’t so easily connected, but it would be child’s play for satellite IoT.  Even power could be less of an issue.  Elephants are big, but they can’t carry a substation on their collars, so these satellite systems’ power requirements are clearly modest enough to be supported for a long period on a battery.

Battery?  Who knows batteries better than Musk?  Amazon already builds IoT sensor/controller devices in the Ring line.  It seems to me like these guys are missing an opportunity by not pushing IoT applications for their dueling satellite data networks.

Or are they?  I hear some whispers that there are in fact a number of initiatives either being quietly supported by one of the contenders for satellite-data supremacy, or being watched closely by one.  Some are attracting the attention of both.  We may see some action in this space quickly, and if we do, not only will it be a powerful validation of satellite broadband, it could also force some realism into claims of 5G applications.  After all, we’ve done a heck of a lot of file transfer, and it didn’t help ISDN.

Juniper Gets the Best SD-WAN, and Combined with Mist AI, it Could Take Off

Juniper Networks has announced its intention to acquire 128 Technology, the company I’ve always said was the drop-dead leader in SD-WAN and virtual network technology.  128T will apparently be integrated with Juniper’s Mist AI at some point, and the combination of the technologies opens up a whole series of options for service creation and automation, not only for enterprises, but also for service providers and managed-service providers.

The press release from Juniper isn’t the best reference for the deal.  In the release, they reference a blog entry that frames the combination of 128 Technology and Mist AI very effectively.  The fusion has both Juniper-specific implications and industry implications, and at both the tactical and strategic levels.  I want to focus mostly on the industry-strategic stuff here, with a nod here and there to the other dimensions.

We are in a virtual age.  Enterprises run applications on virtual hosts and connect to them with virtual private networks.  The cloud, arguably the most transformational thing about information technology in our time, is all about virtualization.  I doubt if there are any enterprises today who believe virtualization is unimportant; certainly, none I’ve talked with hold that view.  And yet….

….and yet we don’t seem to have a really good handle on what living in a virtual world really means.  We run on stuff, connect with stuff, that’s not really there.  How do we manage it?  How do you “optimize” or “repair” something that’s simply your current realization of an abstraction?  Could it be that until we understand the virtual world, everything we do with the cloud, the network, the data center, are like bikes on training wheels?

One of my old friends in IT admitted the other day that “virtualization gives me a headache.”  It probably does that to a lot of people, both the older ones who have now risen to senior roles and the newcomers who are inclined to see virtualization as sticking post-it notes on infrastructure and then trying to run on them.  So here’s an interesting question: if humans have a problem coming to terms with the virtual world, why not give it over to AI?

When Juniper acquired Mist Systems, it seemed from the release that Juniper was doing it to get an AI-powered WiFi platform.  Whether that would have been a good move is an open question, but Juniper evolved the Mist relationship to the “Mist AI” platform as a tool to create and optimize connectivity over all of Juniper’s platforms, in the LAN and WAN.

In order to do the stuff that Juniper/Mist promises, you need three things.  First and foremost, because they focus so much on optimization, you need objectives.  Otherwise, how do you know what’s “optimum?”  The second thing you need is constraints, because the best answer may not be one of the choices.  Finally, you need conditions, which represent the baseline from which you’re trying to achieve your objectives.  AI is potentially a great way to digest these three things and come up with solutions.
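As a purely illustrative sketch of how those three inputs fit together, consider a toy path-selection problem.  The paths, metrics, and policy limit below are hypothetical; they simply show objectives, constraints, and conditions being digested into a single answer.

```python
# Toy "objectives, constraints, conditions" optimization.  All values hypothetical.

conditions = {                                   # the measured baseline
    "path_a": {"latency_ms": 12, "utilization": 0.85},
    "path_b": {"latency_ms": 30, "utilization": 0.40},
    "path_c": {"latency_ms": 18, "utilization": 0.55},
}

constraints = {"max_utilization": 0.80}          # answers that aren't allowed

def objective(metrics):
    # what "optimum" means here: lowest latency
    return metrics["latency_ms"]

feasible = {name: m for name, m in conditions.items()
            if m["utilization"] <= constraints["max_utilization"]}

best = min(feasible, key=lambda name: objective(feasible[name]))
print("Optimum under constraints:", best)        # path_c, not the 12 ms path_a
```

Note that the constraint is what keeps the apparently “best” path from being chosen; that’s why constraints have to be inputs to the optimization, not afterthoughts.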

Mist doesn’t describe their approach to the application of AI to broad network optimization, but I think it surely involves addressing these three things.  Thus, the big question on the 128T deal, from Juniper’s side, is how the deal could help with Mist’s objectives.  There are, I think, two ways.

The first thing 128T adds to Mist AI is session awareness.  Those of you who have followed my coverage of 128 Technology know that this has always been, in my view, their secret sauce.  Yes, they can eliminate tunnel overhead, but what makes them different is that they know about user-to-application relationships.  The driver of enterprise IT and network investment is, and always has been, productivity enhancement.  Workers can’t be made productive by an application, technology, or network service that doesn’t know what they’re doing.  Except, perhaps, by accident, and accidental gains are a pretty lame story to take to a CFO.  128 Technology is based on recognizing session relationships, so it knows who’s trying to do what, and that knowledge is essential in any AI framework that wants to claim to “optimize”.

The second way 128 Technology adds to the Mist AI story is control, because knowing about something you can’t impact is an intellectual excursion, not a business strategy.  Traditional networking, including traditional SD-WAN, is all about connecting sites.  There are a lot of users in any given site, doing a lot of stuff, and much of it is more likely to be entertaining them or getting them dinner reservations than enhancing company sales and revenues.  The relationships between workers and applications are what empower them (or entertain them), so you need to be able to control those relationships to make workers more productive.  It’s not enough to know that a critical Zoom conference isn’t working because of bandwidth issues.  You need to be able to fix it, and 128 Technology can prioritize application traffic and provide preferential routing for it, based on the specific application-to-user relationships.
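To illustrate what session awareness buys in practice, here’s a hedged sketch.  The session fields, policy table, and path names are invented for illustration and don’t represent 128 Technology’s actual implementation.

```python
# Illustrative session-aware prioritization -- not 128 Technology's actual logic.
from dataclasses import dataclass

@dataclass
class Session:
    user_role: str        # who is talking
    application: str      # to what

# Hypothetical policy: importance of user-to-application relationships.
POLICY = {
    ("sales", "order-entry"): "critical",
    ("engineering", "code-review"): "high",
    ("any", "streaming-video"): "best-effort",
}

def classify(s: Session) -> str:
    # The decision keys on the user/application relationship, not on ports.
    return (POLICY.get((s.user_role, s.application))
            or POLICY.get(("any", s.application))
            or "best-effort")

def select_route(priority: str) -> str:
    # Preferential routing per class (path names are hypothetical).
    return {"critical": "low-latency-path", "high": "primary-path"}.get(priority, "bulk-path")

s = Session(user_role="sales", application="order-entry")
print(classify(s), "->", select_route(classify(s)))   # critical -> low-latency-path
```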

Sum this up, then.  Combining Mist AI and 128T’s session awareness can first extend Mist AI’s awareness of network relationships down to the user/application level, a level where productivity tuning is critical.  Most companies would likely prioritize workers’ interactions with applications based on their importance to company revenue generation or unit value of labor.  128T can gather data at that level, and feed the AI vision of where-we-are relative to where-we-should-be with the best and most relevant information.  Once that information has been AI-digested, the results can be applied in such a way as to maximize network commitment to business benefits.  What more can you ask?

Well, we could at least wonder where Juniper might take all of this.  If we presume that Mist AI and 128 Technology combine to support those three requirements of optimization, we could ask whether it creates the effect of a higher, control-plane-like, element.  Does AI and collected data combine to establish real understanding of the network below, understanding that could be molded into new services?  Could session-awareness, the key attribute of 128 Technology’s product, be used to map data flows over arbitrary infrastructure?  Since I’ve always said that 128 Technology was as much a virtual network solution, an application network solution, as an SD-WAN, could Juniper use it to augment Mist and create a Network-as-a-Service model?

Both Cisco and Juniper have unbundled their hardware and software, making it theoretically possible that they could offer hardware as a kind of “gray box” and software as a generalized routing engine.  Could Mist AI and 128 Technology provide them a way of enhancing their value in this unbundled form, and accommodating white-box and even SDN within a Juniper-built network?  Cisco has nothing comparable, which wouldn’t break hearts at Juniper.  There’s a lot of potential here, but without details on both how Mist AI works and where Juniper plans to take 128 Technology, we can’t do more than guess whether it will be realized.

Ponder, though, the title of Juniper’s blog (referenced above): “The WAN is Now in Session.”  Nice marketing, and perhaps an introduction to something far more.

My Response to the Code of Conduct Framework

I was very pleased and interested when Don Clarke, an old friend from the days of NFV, posted a link to a “code of conduct” to “boost innovation and increase vendor diversity”.  He asked me to comment on the paper, and I’m going to give it the consideration it deserves by posting a blog on LinkedIn as a comment, then following the thread to respond to any remarks others might make.

One of the biggest barriers to a real transformation of network operator business models and infrastructure is the inherent tension between incumbency and innovation.  The vendors who are most entrenched in a network are the ones least likely to see a benefit from radical change, and thus the least likely to innovate.  The vendors who are likely to innovate are probably small players with little or no current exposure in operator networks.  For transformation to occur, we either have to make big vendors innovate at their own business risk (fat chance!) or we have to somehow help smaller players engage with operators.  The barriers to that second and only viable option are profound, for four reasons.

First, startups that aim at the infrastructure space and target network operators are far from the wheelhouse of most VCs, so it’s difficult to even get started on such a mission.  There was a time perhaps 15 years ago when VCs did a blitz on the space, and nearly all the startups founded in that period failed to pay back as hoped.  The investment needed to enter the space is large, the time period needed for operators to respond is long, and the pace of deployment, even if they make a favorable decision, means payback might take years.  All this is exacerbated by the cost and complexity of creating and sustaining engagement with operators, which is the focus of the rest of the barriers below.

Second, transformation projects, because of their scope of impact, require senior executive engagement, which smaller firms often can neither establish nor sustain.  A big router vendor can probably call on the CTO or CIO of most operators with little trouble, and in many cases get a CFO or CEO meeting.  A startup?  Even if somehow the startup gets in the door, the big vendors have a team of experts riding herd on the big operators.  That kind of on-site sales attention is simply not possible for smaller companies.

Third, transformation initiatives by operators usually involve standards processes, international forums, open-source projects, and other technical activities.  Participation in any one of these is almost a full-time job for a high-level technical specialist.  Big vendors staff these activities, often with multiple people, and thus engage with operator personnel who are involved in transformation.  Small companies simply cannot donate the time needed, much less pay membership fees in the organizations that require them and pay for the travel necessary.

Finally, operator procurement of products and services imposes conditions easily (and regularly) met by major vendors, but beyond those normally imposed on startups by the smaller prospects they regularly call on.  As a result, simply complying with the song-and-dance needed to get an engagement may be a major investment of resources.  Some operators require financial disclosures that private companies might be unwilling to make, or insurance policies expensive enough to be a drain on resources.

It’s my view that if the Code of Conduct proposes to impact these areas, it has a chance of doing what it proposes to do, which I agree is very important.  Let’s look then at the paper in each area.

In the area of VC credibility for funding transformation startups, the paper makes a number of suggestions in its “Funding” subsection, the best of which would be an “investment fund”.  I do have concerns that this might run afoul of international regulatory practices, which often cite cooperative activities by the operators as collusion.  If a fund could be made to work, it would be great and the suggestions on managing it are good.

If a fund wouldn’t be possible, then I think that operator commitment to a small-vendor engagement model, such as that described in the paper, might well be enough.  VCs need the confidence that the whole process, from conception of a transforming product through either an IPO or M&A, will work because transformation opportunities will exist and can be realized.  For this to be true, though, the strategies for handling the other three issues have to be very strong.

The next issue is that appropriate engagement is difficult for startups to achieve and sustain.  Some points relating to this are covered in the paper’s “Innovation” and “Competition” subheads.  Because some of those same points relate to my third issue, participation in industry activities, I’ll cover both these issues and the paper’s sections in a single set of comments.

I like the notion of facilitating the field trials, but as my third issue points out, participation in industry events is a critical precursor to that.  Startups need to be engaged in the development of specifications, standards, software, and practices, if they’re to contribute innovation at a point where it has a chance of being realized.  You cannot take a clunker idea out of a fossilized standards process and ask for innovation on implementation.  It’s too late by then.

I’d propose that operators think about “creative sponsorships” where individuals or companies who have the potential to make major innovative contributions are funded to attend these meetings.  Individual operators could make such commitments if collective funding proves to pose anti-trust issues.  These sponsorships would require the recipients to participate, make suggestions, and submit recommendations on implementation.  From those recommendations, the “Innovation” recommendations in the paper could be applied to socialize the ideas through to trials.

This would also address the issue of “public calls” and “open procurements” cited in the Competition portion of the paper.  The problem we have today with innovation is most often not that startups aren’t seen as credible sources in an RFP process, but that the RFP is wired by the incumbent and aimed at non-innovative technology.  Operators who want an innovative solution have to be sure there’s one available, and only then can they refine their process of issuing RFIs and RFPs to address it.

A final suggestion here is to bar vendors from participation in creating RFIs and RFPs.  I think that well over 80% of all such documents I’ve seen are influenced so strongly by the major vendors (the incumbent in particular) that there’s simply no way for a startup to reflect an innovative strategy within the constraints of the document.

The final issue I raised, on the structuring of the relationship and contractual requirements, is handled in the Procurement piece of the paper, and while I like all the points the paper makes, I do think that more work could be done to grease the skids on participation early on.  There should be an “innovative vendor” pathway, perhaps linked to those creative sponsorships I mentioned, that would certify a vendor for participation in a deal without all the hoop-jumps currently required.

In summary, I think this paper offers a good strategy, and I’d be happy to work with operators on innovation if they followed it!

How Good an Idea is the “ONF Marketplace?”

The ONF may just have done something very smart.  It’s been clear for at least a decade that operators want to buy products rather than just endorsing standards, but how do the products develop in an open-source world where no single player fields a total solution?  The ONF says that the answer to that is ONF Marketplace.  The concept has a lot of merit, but it’s still not completely clear that the ONF will cover the whole network-ecosystem waterfront, and that might be an issue.

For decades there’s been a growing disconnect between how we produce technology and how we consume it.  Technology is productized, meaning that there are cohesive functional chunks of work that are turned into “applications” and sold.  Rarely do companies or people consume things this way.  Instead, they create work platforms by combining the products.  Microsoft Office is a great example of such a platform.  When technology is “platformized” like this, the natural symbiosis between the products builds a greater value for the whole.

The same thing happens in the consumer space.  Consumers don’t care about network technology, they care about experiences, and so consumer broadband offerings have to be experience platforms.  They have to deliver what the consumer wants, handle any issues in delivery quality quickly and cheaply, and evolve to support changes in consumer expectations or available experiences.  It’s not just pushing bits around.

In networks, platforms for work or experience are created by integrating all the network, hosting, and management technologies needed.

Historically, networks were built by assembling products, and to maximize their profits, vendors also produced related products that created the entire network ecosystem.  When you bought routers from Cisco, for example, you’d get not only routers but management systems and related tools essential in making a bunch of routers into a router network.  That goal—making routers into router networks—is the same goal we have today, but with open components and startups creating best-of-breed products, it’s not as easy.

Look at a virtualization-based or even white-box solution today.  You get the boxes from Vendor A, the platform software for the white boxes from Vendor B, the actual network/router software from Vendor C.  Then you have to ask where the operations and management tools come from.  The problem is especially acute if you’ve decided on a major technology shift, something like SDN.  Traditional operations/management tools probably won’t even work.  How do you convert products to platform?

The ONF Marketplace is at least an attempt to bridge the product/platform gap.  If you establish a set of standards or specifications and certify against them, and if you also align them to be symbiotic, buyers would have more confidence that getting a certified solution would mean getting an integrated, complete, solution.

The fly in the ointment is the notion of an “integrated, complete, solution”.  There are really three levels of concern with regard to the creation and sustaining of a complete transformation ecosystem.  Does the ONF Marketplace address them all, and if not, is what’s not part of the deal critical enough to threaten the goal overall?

The ONF has four suites in its sights at the moment: Aether, Stratum, SEBA, and VOLTHA.  Aether is a connectivity-as-a-service model linked to mobile (4G and 5G) networking.  Stratum is a chip-independent white-box operating system, SEBA is a virtualized PON framework for residential/business broadband and mobile backhaul, and VOLTHA is a subset of SEBA aimed at OpenFlow control of PON optics.  One thing that stands out here is that all of these missions are very low-level; there’s nothing about management and little about transformation of IP through support of alternative routing—either in “new-router” or “new-routing” form.

The ONF does have a vision for programmable IP networks, based on OpenFlow and SDN, but as I’ve noted in prior blogs, the concept doesn’t have a lot of credibility outside the data center because of SDN controller scalability and availability fears.  There is really no vision at a higher level, no management framework, no OSS/BSS, and nothing that really ties these initiatives to a specific business case.  That all raises some critical questions.

The first is whether transformation is even possible without transforming IP.  Operators don’t think so; I can’t remember any conversation I’ve had with an operator in the last five years that didn’t acknowledge the need to change how the IP layer was built and managed.  I think that makes it clear that the only players who will be able to transform anything above or below IP will have to start with an IP strategy.

In a left-handed way, that might explain why transformation has been so difficult.  The logical players to transform the IP layer would be the incumbents there, and of course those incumbents have no incentive to redesign the network so as to reduce operator spending on their products.  Any non-incumbents have to fight against entrenched giants to get traction, which is never easy.

The second question is whether something like ONF Marketplace could elevate itself to consider the infrastructure, network, and hosting management issues.  Maybe, but right now the initiative is focused on the ONF’s own work, the specifications it’s developed.  The ONF has no position in the management space, nothing to build on.  Would they be willing to at least frame partnerships above their own stuff, and then certify them in their marketplace?

Then there’s the key question, which is whether a marketplace is really a way to assemble a transformational ecosystem.  An ecosystem has to be characterized by three factors.  First, it has to be functionally complete, covering all the technical elements needed to make a complete business case.  Second, it has to be fully integrated so that it can be deployed as a single package, without a lot of incremental tuning through professional services.  Otherwise, buyers can’t really be sure it’s going to work.  Finally, it has to offer specific and credible sponsorship, some player whose credibility is sufficient to make the concept itself credible.  How does the ONF Marketplace concept fare in these areas?

It’s not functionally complete.  There’s no credible IP strategy at this point, and nothing but open sky above.  It is fully integrated within its scope, but because it’s not complete the level of integration of the whole (which isn’t available to judge at this point) can’t be assessed.  But what it does have is specific and credible sponsorship.  The ONF has done a lot of good stuff, even in the IP area where it’s taking a lead in programmable control-plane behavior, a key to new services.

From this, I think we can make a judgement on the ONF Marketplace concept.  If that concept can be anchored at the IP level, either by fixing the SDN-centric view the ONF has now or by admitting other IP-layer approaches in some way, then the concept can work.  In fact, it might be a prototype for how we could create a transformed network model and sell it to buyers who are among the most risk-averse in the world.

I hope the ONF thinks about all of this.  They’ve done good work below IP, particularly with Stratum and P4, but they need to get the key piece of the puzzle in place somehow.  If they do, they could raise themselves above the growing number of organizations who plan to do something to drive the network of the future.  If they do that, they might keep us from getting stuck in the present.

Translating the Philosophy of Complexity Management to Reality

Could we be missing something fundamental in IT and network design?  Everyone knows that the focus of good design is creating good outcomes, but shouldn’t at least equal attention go to preventing bad outcomes?  A LinkedIn contact of mine who’s contributed some highly useful (and exceptionally thoughtful) comments sent me a reference on “Design for Prevention” or D4P, a book that’s summarized in a paper available online HERE.  I think there’s some useful stuff here, and I want to try to apply it to networking.  To do that, I have to translate the philosophical approach of the referenced document into a methodology or architecture that can be applied to networking.

The document is a slog to get through (I can only imagine what the book is like).  It’s more about philosophy than about engineering in a technical sense, so I guess it would be fair to say it’s about engineering philosophy.  The technical point is that there’s always one good outcome, a goal outcome, in a project.  The evolution from simple to complex doesn’t alter the number of good outcomes (one), but it does alter the number of possible bad outcomes.  In other words, there’s a good chance that without some specific steps to address the situation, complex systems will fail for reasons unforeseen.

The piece starts with an insight worthy of consideration: “[The] old world was characterized by the need to manage things – stone, wood, iron.  The new world is characterized by the need to manage complexity. Complexity is the very stuff of today’s world. This mismatch lies at the root of our incompetence.”—Stafford Beer.  I’ve been in tech for longer than most of my readers have lived, and I remember the very transformation Beer is talking about.  To get better, we get more complicated, and if we want to avoid being buried in the very stuff of our advances, we have to manage it better.  That’s where my opening question comes in; are we doing enough?

Better management, in the D4P concept, is really about controlling and preventing the bad outcomes that arise from complexity, through application of professional engineering discipline.  Add this to the usual goal-seeking, add foresight to hindsight, and you have something useful, even compelling, provided you can turn the philosophical approach the paper takes into something more actionable.  To fulfill my goal of linking philosophy to architecture, it will be necessary to control complexity in that architecture.

D4P’s thesis is that it’s not enough to design for the desired outcome, you also have to design to prevent unfavorable outcomes.  I think it might even be fair to say that there are situations where the “right” (or at least “best”) outcome is one that isn’t one of the bad ones.  With a whole industry focused on “winning”, though, how do we look at “not-losing” as a goal?  General MacArthur was asked his formula for success in offensive warfare, and he replied “Hit them where they ain’t”.  He was then asked for a strategy for defense, and he replied “Defeat”.  Applying this to network and IT projects, it means we have to take the offense against problems, not responding to them in operation but in planning.

Hitting them where they ain’t, in the D4P approach, means shifting from a hindsight view (fix a problem) to a foresight view (prevent a problem by anticipating it).  Obviously, preventing something from happening can be said to be a “foresight” approach, but of course you could say that about seeking a successful outcome.  How, in a complex system, do you manage complexity and discourage bad outcomes by thinking or planning ahead?  There are certainly philosophers among the software and network engineering community, but most of both groups have a pretty pragmatic set of goals.  We don’t want them to develop the philosophy of networking, we want a network.  There has to be some methodology that gives us the network within D4P constraints.

The centerpiece of the methodology seems to me to be the concept of a “standard of care”, a blueprint to achieve the goal of avoiding bad outcomes.  It’s at this point that I’ll leave the philosophical and propose the methodological.  I suggest that this concept is a bit like an intent model.  That’s not exactly where D4P goes, but I want to take a step of mapping the “philosophy” to current industry terms and thinking.  I also think that intent modeling, applied hierarchically, is a great tool for managing complexity.

D4P’s goal is to avoid getting trapped in a process rather than responding to objective data.  We don’t have to look far, or hard, to find examples of how that trap gets sprung on us in the networking space.  NFV is a good one; so are SDN, ZTA, and arguably some of the 5G work.  How, exactly, does this trap get sprung?  The paper gives non-IT examples, but you could translate them into IT terms and situations, which of course is what I propose to do here.

Complexity is the product of the combination of large numbers of cooperating elements in a system and large numbers of relationships among those elements.  I think that when faced with something like that, people are forced to try to apply organization to the mess, and when they do, they often “anthropomorphize” the way the system would work.  They think of how they, or a team of humans, would do something: in-boxes, human stepwise processes, out-boxes, and somebody to carry work from “out” to “in”.  That’s how people do things, and how you can get trapped in process.

This approach, this “practice” has the effect of creating tight coupling between the cooperative elements, which means that the system’s complexity is directly reflected in the implementation of the software or network feature.  In IoT terms, what we’ve done is created a vast and complex “control loop”, and it’s hard to avoid having to ask questions like “Can I do this here, when something else might be operating on the same resource?”  Those questions, the need to ask them, are examples of not designing for prevention.

So many of our diagrams and architectures end up as monolithic visions because humans are monoliths.  The first thing that needs to be done to achieve D4P is to come up with a better way of organizing this vast complexity.  That’s where I think intent models come into play.  An intent model is a representation of functionality and not implementation.  That presents two benefits at the conceptualization stage of an IT or network project.  First, it lets you translate goal behavior to functional elements without worrying much about how the elements are implemented.  Second, it frees the original organization of the complex elements from the details that make them complex, and from implementation assumptions that could contaminate the project by introducing too much “process” and not enough “data”.
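One way to picture an intent model in code, purely as a sketch of the concept rather than any specific modeling standard, is an abstract element that exposes a name and an SLA while hiding how it’s realized.  The class names and SLA fields below are invented.

```python
# Sketch of intent models: functionality exposed, implementation hidden.
from abc import ABC, abstractmethod

class IntentModel(ABC):
    """A black box defined by what it promises (its SLA), not how it does it."""
    def __init__(self, name, sla):
        self.name = name
        self.sla = sla                      # e.g. {"availability": 0.999}

    @abstractmethod
    def deploy(self):
        """Realize the intent on whatever sits underneath."""

class CompositeIntent(IntentModel):
    """Decomposes into lower-level intents, each hiding its own complexity."""
    def __init__(self, name, sla, children):
        super().__init__(name, sla)
        self.children = children

    def deploy(self):
        for child in self.children:
            child.deploy()

class LeafIntent(IntentModel):
    def deploy(self):
        print(f"{self.name}: realized (implementation details stay inside the box)")

vpn = CompositeIntent("vpn-service", {"availability": 0.999},
                      [LeafIntent("access", {"latency_ms": 10}),
                       LeafIntent("core", {"latency_ms": 5})])
vpn.deploy()
```

The point of the hierarchy is containment: each box can be as complex as it likes inside, but the complexity visible to the rest of the system is only its interfaces and SLA.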

Artificial intelligence isn’t the answer to this problem.  An artificial human shuffling paper doesn’t do any better than a real one.  AI, applied to systems that are too complex, will have the same kind of problems that we’ve faced all along.  The virtue of modeling, intent modeling, is that you can subdivide systems, and by containing elements into subsystems, reduce the possible interactions…the complexity.

Intent models, functionality models, aren’t enough, of course.  You need functionality maps, meaning that you need to understand how the functions relate to each other.  The best way to do that is through the age-old concept of the workflow.  A workflow is an ordered set of process responses to an event or condition.  The presumption of a workflow-centric functionality map is that a description of the application or service, a “contract”, can define the relationship of the functions within the end-result service or application.  That was the essence of the TMF NGOSS Contract stuff.

In the NGOSS Contract, every “function” (using my term) is a contract element that has a state/event table associated with it.  That table identifies every meaningful operating state that the element can be in, and how every possible event the element could receive should be processed for each of those states.  Remember here that we’re still not saying how any process is implemented, we’re simply defining how the black boxes relate to each other and to the end result.
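To make that pattern concrete, here’s a minimal state/event table sketch.  The states, events, and process names are hypothetical; the shape of the thing, not the names, is what matters.

```python
# Minimal state/event table for one contract element (all names hypothetical).

def activate(element, event):   print(f"{element}: activating");          return "ACTIVATING"
def confirm(element, event):    print(f"{element}: now operational");     return "OPERATIONAL"
def remediate(element, event):  print(f"{element}: fault, remediating");  return "REMEDIATING"
def log_only(element, event):   print(f"{element}: {event} logged, no action");  return None

STATE_EVENT = {
    "ORDERED":     {"ACTIVATE": activate,  "FAULT": log_only},
    "ACTIVATING":  {"ACTIVE_OK": confirm,  "FAULT": remediate},
    "OPERATIONAL": {"FAULT": remediate,    "ACTIVATE": log_only},
    "REMEDIATING": {"ACTIVE_OK": confirm,  "FAULT": log_only},
}

def handle(element, state, event):
    # Every state/event combination maps to a defined process, so nothing
    # arrives "unexpected" -- which is the foresight D4P is asking for.
    process = STATE_EVENT[state].get(event, log_only)
    return process(element, event) or state

state = "ORDERED"
for event in ["ACTIVATE", "ACTIVE_OK", "FAULT", "ACTIVE_OK"]:
    state = handle("access-function", state, event)
print("final state:", state)       # OPERATIONAL
```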

The state/event table, in my view, is the key to introducing foresight and D4P principles to application and service design.  We can look at our elements/functions and define their meaningful states (meaningful, meaning visible from the outside), and we can define how the events associated with the elements are linked to abstract processes.  If we do this right, and the paper describes the philosophy associated with getting it right, we end up with something that not only recognizes the goal, but also handles unfavorable things.  We’ve created a goal-seeking framework for automation.

Does it really address the “design-for-prevention” paradigm, though?  We’ve done some of the work, I think, through intent-modeling and functional mapping, because we’ve organized the complexity without getting bogged down in implementation.  That reduces what I’ll call the “internal process problem”, the stuff associated with how you elect to organize your task in a complex world.  There’s another process issue, though, and we have to look at how it’s handled.

The very task of creating functional elements and functional maps is a kind of process.  The state/event table, because it has to link to processes, obviously has to define processes to link to.  In the approach I’m describing here, it is absolutely essential that the functional and functional-map pieces, and the event/process combinations, be thoroughly thought out.  One advantage of the state/event system is that it forces an architect to categorize how each event should be handled, and how events relate to a transition in operating states.  In any state/event table, there is typically one state, sometimes called “Operational”, that reflects the goal.  The other states are either steps along the way to that goal, or problems to be addressed or prevented.

At the functional map level, you prevent failures by defining all the events that are relevant to a function and associating a state/process progression to each.  Instead of having a certain event, unexpected, create a major outage, you define every event in every state so nothing is unexpected.  You can do that because you have a contained problem—your function abstraction and your functional map are all you need to work with, no matter how complex the implementation is.  In IoT-ish terms, functional separation creates shorter control loops, because every function is a black box that produces a specific set of interfaces/behaviors at the boundary. No interior process exits the function.

But what about what’s inside the black box?  A function could “decompose” into one of two things—another function set, with its own “contract” and state/event tables, or a primitive implementation.  Whatever is inside, the goal is to meet the external interface(s) and SLA of the function.  If each of the functions is designed to completely represent its own internal state/event/process relationships in some way, then it’s a suitable implementation and it should also be D4P-compliant.

I’ve seen the result of a failure to provide D4P thinking, in a network protocol.  A new network architecture was introduced, and unlike the old architecture, the new one allowed for the queuing of packets, sometimes for a protracted period of time.  The protocol running over the network was designed for a point-to-point connection, meaning that there was nothing inside the network to queue, and therefore its state/event tables didn’t accommodate the situation when messages were delayed for a long period.  What happened was that, under load, messages were delayed so much that the other end of the connection had “timed out” and entered a different state.  Context between endpoints was lost, and the system finally decided it must have a bad software load, so everything rebooted.  That made queuing even worse, and down everything came.  The right answer was simple; don’t ever queue messages for this protocol, throw them away.  The protocol state/event process could handle that, but not a delayed delivery.
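The “right answer” in that story is trivially small once the state/event thinking is in place.  A tiny sketch, with an invented timeout value:

```python
# Sketch of the fix: discard stale messages for this protocol rather than
# delivering them after the far end's timers have already fired.
import time

PROTOCOL_TIMEOUT_S = 2.0      # assumed peer timeout for the legacy protocol

def forward_or_drop(message, enqueued_at, now=None):
    now = now if now is not None else time.time()
    if now - enqueued_at > PROTOCOL_TIMEOUT_S:
        # By now the far end has timed out and changed state; a late delivery
        # would only push the two endpoints further out of sync.
        return None
    return message

print(forward_or_drop("KEEPALIVE", enqueued_at=time.time() - 5))   # None: dropped
print(forward_or_drop("KEEPALIVE", enqueued_at=time.time()))       # delivered
```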

I think this example illustrates why functionality maps and state/event/process specification is so important in preventing failures.  It also shows why it’s still not easy to get what you want.  Could people designing a protocol for direct-line connection have anticipated data delayed, intact, in flight and delivered many seconds after it was sent?  Clearly they didn’t.  Could people creating a new transport network model to replace physical lines with virtual paths have anticipated that their new architecture would introduce conditions that couldn’t have existed before, and thus fail when those conditions did happen?  Clearly they didn’t.

Near the end of the paper is another critical point: “Complexity reduction is elimination and unification.”  I think that’s what the approach I’m talking about here does, and why I think it’s a way to address D4P in the context of service and application virtualization.  That’s why it’s my way of taking D4P philosophy and converting it into a project methodology.

In the same place, I find one thing I disagree with, and it’s a thing that nicely frames the difficulty we face adopting this approach.  “Keep in mind that the best goal-seeking methods are scrutably connected to natural law and from that whence commeth your distinguishing difference and overwhelming advantage.”  Aside from validating my characterization of the piece as pretty deep philosophy, this points out a problem.  Software architecture isn’t natural law, and cloud development and virtualization take us a long way out of the “natural”.  That’s the goal, in fact.  What is “virtual” if not “unnatural”?  We have to come to terms with the unnatural to make the future work for us.

I agree with the notion of D4P, and I agree with the philosophy behind it, a philosophy the paper discusses, but I’m not much of a philosopher myself.  The practical truth is that what we need to do is generalize our thinking within the constraints of intent models, functionality maps, and state/event/process associations, to ensure that we don’t treat things that are really like geometry’s theorems as geometry’s axioms.  I think that the process I’ve described has the advantage of encouraging us to do that, but it can’t work any better than we’re willing to make it work, and that may be so much of a change in mindset that many of our planners and architects will have trouble with the transition.

How a Separate Control and Data Plane Would Work

How would a separate control plane for IP work?  What would it facilitate?  It’s pretty obvious that if you were to separate the control and data planes of IP, you could tune the implementation of each of these independently, creating the basis for a disaggregated model of routing versus the traditional node-centric IP approach, but why bother?  To answer these questions, we have to go back in time to early attempts to work non-IP “core networks” into an IP network.

The classic, and most-technology-agnostic, view of a separate control plane is shown in the figure below.  In it, and in all the other figures in this blog, the blue arrows represent data-plane traffic and the red ones the control-plane traffic.  As the figure shows, the control plane sets the forwarding rules that govern data-plane movement.  Control-plane traffic (at least that traffic that’s related to forwarding control or state) is extracted at the entry interface and passed to the control plane processing element.  At the exit interface, control-plane traffic is synthesized from the data the processing element retains.

The “Classic” Separate Control Plane
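A hedged sketch of the entry-interface behavior the figure implies might look like the following.  The protocol list and table entries are assumptions for illustration, not any particular product’s design.

```python
# Illustrative control/data split at an entry interface.

CONTROL_PROTOCOLS = {"OSPF", "BGP", "ICMP"}      # traffic about topology or status

def ingress(packet, forwarding_table, control_plane_queue):
    if packet["protocol"] in CONTROL_PROTOCOLS:
        # The "red arrow" path: hand routing/status traffic to the (possibly
        # remote) control-plane processing element.
        control_plane_queue.append(packet)
        return None
    # The "blue arrow" path: data traffic follows whatever forwarding rules
    # the control plane has installed.
    return forwarding_table.get(packet["dest_prefix"], "drop")

control_plane_queue = []
table = {"10.1.0.0/16": "port-3"}
print(ingress({"protocol": "BGP",  "dest_prefix": "10.1.0.0/16"}, table, control_plane_queue))  # None
print(ingress({"protocol": "DATA", "dest_prefix": "10.1.0.0/16"}, table, control_plane_queue))  # port-3
```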

The earliest example of something like this is decades old.  Back in the 1990s, when ATM and frame relay were just coming out, there was interest in utilizing widespread deployment of one or both of these networks for use with IP.  Either protocol created “virtual circuits” analogous to voice calls, and so the question was how to relate the destination of an IP packet that had to travel across one of these networks (called “Non-Broadcast Multi-Access” networks) with the IP exit point associated with the destination.  The result was the Next-Hop Resolution Protocol, or NHRP.

NHRP’s Approach Uses a Control Server

The figure above visualizes NHRP operation.  The IP users are expected to be in “Logical IP Subnets” or LISs, and a LIS is a Layer 2 enclave, meaning it doesn’t contain routers.  The gateway router for the LIS is an NHRP client, and each client registers its subnet with the NHRP server.  When an NHRP client receives a packet destined for another LIS, it interrogates the server for the NBMA address of the NHRP client that serves the destination.  The originating client then establishes a virtual connection with the destination client, and the packets are passed.  Eventually, if not used, the connection will time out.
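In code form, the resolution flow looks something like the sketch below.  The subnets, NBMA addresses, and class names are invented; this is a simplification of NHRP, not the protocol itself.

```python
# Simplified NHRP-style resolution (all addresses invented).

class NHRPServer:
    def __init__(self):
        self.registrations = {}                    # IP subnet -> NBMA address
    def register(self, subnet, nbma_addr):
        self.registrations[subnet] = nbma_addr
    def resolve(self, subnet):
        return self.registrations.get(subnet)

class NHRPClient:
    def __init__(self, name, subnet, nbma_addr, server):
        self.name, self.server = name, server
        self.circuits = {}                         # NBMA address -> virtual circuit
        server.register(subnet, nbma_addr)
    def send(self, dest_subnet, packet):
        nbma = self.server.resolve(dest_subnet)    # ask the server for the exit point
        if nbma not in self.circuits:
            self.circuits[nbma] = f"vc-to-{nbma}"  # set up the virtual circuit once
        print(f"{self.name}: {packet} over {self.circuits[nbma]}")

server = NHRPServer()
gw_a = NHRPClient("gw-a", "10.1.0.0/16", "nbma-101", server)
gw_b = NHRPClient("gw-b", "10.2.0.0/16", "nbma-202", server)
gw_a.send("10.2.0.0/16", "ip-packet-1")            # gw-a: ip-packet-1 over vc-to-nbma-202
```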

NHRP never had much use because frame relay and ATM failed to gain broad deployment, so things went quiet for a while.  When Software-Defined Networking was introduced, with the ONF as its champion, it proposed a different model of non-IP network than ATM or frame relay had proposed, and so it required a different strategy.

The goal of SDN was to separate the IP control plane from forwarding to facilitate centralized, effective, traffic engineering.  This was done by dividing IP packet handling into a forwarding-plane element and a control-plane element.  The forwarding plane was expected to be implemented by commodity white-box switches equipped with the OpenFlow protocol, and the control plane was to be implemented using a central SDN controller.

The SDN Model of Separating the Control Plane

In operation, SDN would create a series of protocol-independent tunnels within the SDN domain.  At the boundary, control packets that related in any way to status or topology would be handled by the SDN controller, and all other packets would be passed into an SDN tunnel based on a forwarding table that was maintained via OpenFlow from the SDN controller.
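Schematically, the split looks like the sketch below: the controller owns the forwarding rules and pushes them down, and anything the switches don’t recognize goes back up.  This is an illustration of the model, not real OpenFlow messages or APIs.

```python
# Schematic SDN controller/switch split (not real OpenFlow syntax).

class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = []                 # installed by the controller, not learned
    def install(self, prefix, action):
        self.flow_table.append((prefix, action))
    def forward(self, packet):
        for prefix, action in self.flow_table:
            if packet["dest"].startswith(prefix):
                return action
        return "send-to-controller"          # unknown traffic escalates to the control plane

class Controller:
    def program_tunnel(self, prefix, path):
        # Centralized traffic engineering: the controller decides the path end to end.
        for switch, port in path:
            switch.install(prefix, f"out:{port}")

edge, core = Switch("edge-1"), Switch("core-1")
Controller().program_tunnel("10.2.", [(edge, 7), (core, 3)])
print(edge.forward({"dest": "10.2.0.5"}))    # out:7
print(edge.forward({"dest": "10.9.0.5"}))    # send-to-controller
```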

While the principal goal of SDN was traffic engineering, it was quickly recognized that if the SDN controller provided “northbound APIs” that allowed for external application control of the global forwarding table and the individual tables of the forwarding switches, the result would allow for application control of forwarding.  This is the current SDN concept, the one presented in the recent SDN presentation by the ONF, referenced in one of my earlier blogs.

This SDN model introduced the value of programmability to the ONF approach.  Things like the IP protocols (notably BGP, the protocol used to link autonomous systems, or ASs, in IP) and even the 5G N2/N4 interfaces could now be mapped directly to forwarding rules.  However, the external applications that controlled forwarding were still external, and the IP control plane was still living inside that SDN controller.
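
To illustrate the programmability point, here’s a tiny, hypothetical “northbound” surface.  The API name and the callers are my own invention; the idea is only that an IP routing application and a 5G adapter could both drive forwarding through the same interface.

```python
class NorthboundAPI:
    """Hypothetical controller surface exposed to applications above it."""
    def __init__(self):
        self.global_forwarding = {}                  # prefix -> egress

    def set_route(self, prefix, egress, requester):
        # Any authorized application, not just IP routing code, may call this.
        self.global_forwarding[prefix] = egress
        return f"{requester}: {prefix} -> {egress}"

nb = NorthboundAPI()

# A BGP-derived update from a neighboring autonomous system...
print(nb.set_route("203.0.113.0/24", "as-peer-port-1", requester="bgp-app"))

# ...and a 5G-style request (say, steering a user-plane session) expressed
# through exactly the same interface, which is the programmability point above.
print(nb.set_route("10.42.7.0/24", "upf-site-west", requester="5g-n4-adapter"))
```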

The fact that Lumina Networks closed its doors even as it had engagements with some telcos should be a strong indicator that the “SDN controller” approach has issues that making it more programmable won’t resolve.  In fact, I think the biggest lesson to be learned from Lumina is that the monolithic controller isn’t a suitable framework.  How, then, do we achieve programmability?

Google had (and has) its own take on SDN, one that involves both traffic engineering and a form of network function virtualization.  Called “Andromeda”, the Google model was designed to create what’s turned out to be two Google backbone networks, one (B2) carrying Internet-facing traffic and the other (B4) carrying the inter-data-center traffic involved in building experiences.  Andromeda in its current form (2.2) is really a combination of SDN and what could be described as a service mesh.  Both the IP control plane and what were the “external applications” in SDN are now “microfeatures” implemented as microservices and hosted on a fabric controlled by SDN and OpenFlow.  The latency of the fabric is very low, and it’s used both to connect processes and to pass IP protocol streams (the two data sources for those two Google backbones).

Google Andromeda and Microfeatures

With Andromeda, virtual networks are built on top of an SDN “fabric”, and each of these networks is independent.  The early examples of Andromeda show these virtual networks built on “private” IP address spaces, in fact.  Networks are built from the rack upward, with the basic unit of both networking and compute being a data center.  Some of Google’s virtual networks extend across multiple locations, even throughout the world.

Andromeda could make a rightful claim to being the first and only true cloud-native implementation of virtual functions.  The reliance on microfeatures (implemented as cloud-native microservices) means that the control plane is extensible not only to new types of IP interactions (new protocols for topology/status, for example) but also “vertically” to envelop the range of external applications that might be added to IP.

An example of this flexibility can be found in support for 5G.  The N2/N4 interfaces of 5G pass between the control plane (of 5G) and the “user plane”, which is IP.  It would be possible to implement these interfaces as internal microservice APIs or events, coupled through the fabric between microfeatures.  It would also be possible to have 5G N2/N4 directly influence forwarding table entries, or draw on data contained in those tables.  For mobility management, could this mean that an SDN conduit created by OpenFlow could replace a traditional tunnel?  It would seem so.
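
Here’s a sketch of what that could mean in practice, with the event names and table layout assumed purely for illustration; the point is that a handover becomes a forwarding-table rewrite rather than a tunnel re-anchor.

```python
# Minimal illustration: a mobility event rewrites a forwarding entry directly.
forwarding = {}                                  # UE IP -> serving-site conduit

def on_handover(ue_ip, new_site):
    # The 5G control plane (via something like N2/N4) influences forwarding
    # directly; the data path is an SDN conduit, not a re-anchored GTP tunnel.
    forwarding[ue_ip] = f"conduit-{new_site}"

on_handover("10.64.1.23", "gnb-17")
print(forwarding["10.64.1.23"])                  # -> conduit-gnb-17
on_handover("10.64.1.23", "gnb-18")              # the UE moves; only the entry changes
print(forwarding["10.64.1.23"])                  # -> conduit-gnb-18
```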

It’s worthwhile to stop here for a moment to contrast the ONF/SDN approach and the Google Andromeda approach.  The contrast seems to hinge on two points: the people and the perceived mission.  The ONF SDN model was created by network people, for the purposes of building a network to connect users.  The Google Andromeda approach was created by cloud people to build and connect experiences.  In short, Google was building a network for the real mission of the Internet, experience delivery, while the ONF was still building a connection network.

I think the combination of the ONF concept and the Google Andromeda concept illustrates the evolution of networking.  If operators are confined to providing connectivity, they’re disconnected from new revenue sources.  As a cloud provider, an experience-centric company, Google designed a network model that fit the cloud.  In point of fact, they built the network around the cloud.

I’ve blogged about Andromeda dozens of times, because it’s always seemed to me to be the leading edge of network-think.  It still is, and so I think that new-model and open-model networking are going to have to converge on an Andromeda-like architecture.  Andromeda’s big contribution is that by isolating the data plane and converting it to simple forwarding, it allows cloud-hosted forwarding control in any form to be added above.  Since networks and network protocols are differentiated not by their data plane but by their forwarding control, this makes networks as agile as they can be, as agile as the cloud.

Where “above” turns out to be is a question that combines the issues of separating the control plane and the issues of “locality” (refer to my blog HERE).  I think the Andromeda model, which is only one of the initiatives Google has undertaken to improve “experience latency” in the cloud, demonstrates that there really should be cooperation between the network and any “service mesh” or microfeature cloud, to keep cumulative latency reasonable.  To make that happen, the process of placing features, or calling on feature-as-a-service, has to consider the overall latency issues, including those relating to network delay and feature deployment.

There’s also the question of what specific microfeatures would be offered in the supercontrol-plane model.  Obviously, you need a central topology map for the scope of the supercontrol-plane, and you have to be able to extract control-plane packets at the interfaces, route them to the supercontrol-plane, and then return synthesized packets to the exit interfaces.  How should all this be done, meaning how “microfeatured” should we go?  There’s a lot of room for differentiation here, in part because this is where most of the real service revenue potential of the new model would be created.  Could an entire CDN, for example, be migrated into the supercontrol-plane, or an entire 5G control-plane layer?
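
One possible (and purely hypothetical) decomposition is sketched below, just to show how separable the pieces are; the registration scheme and microfeature names are mine, not anyone’s product.

```python
class Topology:
    """The central topology map for the supercontrol-plane's scope (stubbed)."""
    def __init__(self):
        self.links = {}                           # (node-a, node-b) -> metric

class SuperControlPlane:
    def __init__(self):
        self.topology = Topology()
        self.microfeatures = {}                   # protocol -> handler

    def register(self, protocol, handler):
        # A CDN or a 5G control-plane layer could register here just as
        # easily as an IP routing protocol; that's where the revenue question lives.
        self.microfeatures[protocol] = handler

    def ingest(self, interface, packet):
        handler = self.microfeatures.get(packet["proto"])
        if handler is None:
            return []                             # nothing claims this protocol
        # Handlers return synthesized packets destined for exit interfaces.
        return handler(self.topology, interface, packet)

def bgp_microfeature(topology, interface, packet):
    # Trivially reflect reachability back out; a real handler would apply policy.
    return [{"proto": "BGP", "announce": packet["announce"], "via": interface}]

scp = SuperControlPlane()
scp.register("BGP", bgp_microfeature)
print(scp.ingest("ifc-3", {"proto": "BGP", "announce": "198.51.100.0/24"}))
```

Anything that can register a handler, from a routing protocol to a CDN request router to a 5G control-plane layer, becomes a candidate microfeature, which is exactly where the differentiation lives.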

A supercontrol-plane strategy that would allow new-model networking to be linked to both revenues in general and 5G in particular would be a powerful driver for the new model and white boxes.  By linking white boxes to both the largest source of greenfield deployment (5G) and the strongest overall driver for modernization (new revenue), supercontrol-plane technology could be the most significant driver for change in the whole of networking, and thus the greatest source of competition and opportunity.  Once vendors figure this out, there will likely be a land-rush positioning effort…to some variant on one of the approaches I’ve outlined.  Which?  Who knows?

While all of this clearly complicates the question of what the new network model might be, it’s still a good sign.  These issues were never unimportant, only unrecognized.  We’re now seeing the market frame out choices, and in doing that it’s defining what differentiates those choices from each other.  Differentiation is the combination of difference and value, and that’s a good combination to lead us into a new network age.

How Will Cisco Respond to Open-Model Networking?

Cisco is facing a revolution that would totally change their business model, a revolution that will devalue traditional routers.  They’re already seeing the signs of a business revolution, in fact.  Thus, the question isn’t whether they’ll respond (they must) but how they’ll respond, and where that response might lead the rest of the industry.

To software, obviously, or at least to more software.  There are growing signs that Cisco is going way deeper into software.  For years, Cisco has been the only major network vendor that provided servers and platform software tools, and its recent acquisitions (Portshift, BabbleLabs, Modcam) have been more in the IT space than in the network space.  It’s not surprising that Cisco would be watching the IT side, given that it faces a major challenge in its network equipment business.  What may be surprising is that Cisco seems focused not on applying IT to networks, but on applications overall.  The question is whether what “seems” to be true, really is.

For literally half a century, networking has been a darling of CFOs.  Information penned up in data centers could be released, via a network connection, to reach workers and enhance their performance.  There was so much pent-up demand that few network projects really faced much resistance.  It was almost “Build it, in case they come!”

The good times are all gone, as the song goes.  Network operators face compression between revenue and cost per bit, reducing return on infrastructure investment and putting massive pressure on capital budgets.  Enterprises, having unlocked most of that confined information resource base, are now having difficulties justifying spending more network dollars without additional, specific, benefits.  The most significant result of this combination has been a shift of focus among buyers, toward “open-model” networks.

An open-model network is a network created by a combination of white-box switches and separate network software.  White-box switches have a long history, going back at least as far as SDN, but the Open Compute Project and the Telecom Infra Project are current supporters of the concept, and so is the ONF (with Stratum).  Because the white-box device can run multiple software packages, it’s almost like a server in its ability to be generalized.  It’s built from commercially available parts, based on open specifications, and so there’s a competitive market, unlike that for proprietary switches and routers.

Network operators, in particular, have been increasingly interested in this space because it promised a break from the classic vendor-lock-in problem that they believe has driven up costs for them, even as revenue per bit has fallen.  The original SDN white-box approach was somewhat attractive in the data center, but the central controller wasn’t popular for WAN applications.  Now, with players like DriveNets pushing a cluster-cloud router based on white-box technology, it’s clear that routers will be under direct threat too.

Cisco has lost business to white boxes already, and with the AT&T/DriveNets deal demonstrating that a Tier One operator is willing to bet their core network on them, further interest among operators is inevitable.  Capital budgets for networking were already slipping, and white boxes could make things immeasurably worse.  No wonder Cisco feels pressure, especially from investors.

Logically, there are two steps that Cisco could take to relieve their own stock-price worries.  The first is to increase their revenues outside their core device sales.  They started doing this years ago with things like WebEx and Unified Computing System (UCS) servers, and they’ve also been unbundling their IOS network operating system to run on white boxes, and as a subscription offering.  The second is to try to beat the white-box movement at its own game.

Just selling white boxes, or promoting IOS as a white-box OS, wouldn’t generate much for Cisco.  You have to be able to add value, not just replicate router networks with a cheaper platform.  As I pointed out in an earlier blog (HERE), the white-box space and the SDN movement combine to argue for a strict separation of the IP control plane and the data plane.  The devices that can host the control plane look very much like standard cloud servers, and the data-plane devices are custom white-box switches with chips designed to create high-performance forwarding.  It’s very much like the SDN model of the forwarding devices and the central controller, except that the control plane isn’t necessarily implemented by a central controller at all.  DriveNets hosts the control plane via microservices in what we could describe as a “cluster-cloud”.  Google’s Andromeda composes control planes (and other experience-level components) from microfeatures clustered around an SDN core, and older concepts like the Next-Hop Resolution Protocol (NHRP) describe how to deliver IP routing from what they call an NBMA (Non-Broadcast Multi-Access) network.  In short, we have no shortage of specialized non-centralized IP control planes (I’ll get into more detail on some of these in a later blog).

Referring again to my earlier blog, the IP control plane is only one of several “control planes” that act to influence forwarding behavior and create network services.  IMS/EPC in 4G networks, and the control/user-plane separation in 5G NR and Core, have service control layers, and it’s not hard to see how these new service-control elements and the IP control plane could be combined in a cloud implementation.  Given that, the first question is, “Does Cisco see it too?”  The second is “Does Cisco think they could survive it?”  It’s obvious they can, and do, see it, so the survival question is the relevant one.

The SDN model uses white-box forwarding devices as slaves to control logic, and vendors like Cisco have generally (and reluctantly) provided for SDN OpenFlow control of their routers and switches.  It’s looked good in the media and hasn’t hurt much because there was no practical control-logic strategy that could span more than a data center.  The problem is that white-box switches like the kind AT&T describes in its press release on its disaggregated core are way cheaper than routers, so a new and practical implementation of a separate control plane to create that “control logic” would validate white boxes network-wide.

One story on the AT&T deal with DriveNets frames the risk to Cisco in terms of Cisco’s Silicon One chip strategy, which demands IOS integration.  That’s not the risk, in my view.  The risk is that a new model of network, with a separate control plane expanded to support service coordination, could make packet forwarding a total commodity and provide a mechanism for services at any level to directly manipulate forwarding behavior as needed.  You could argue that the network of the future would become not an IP network (necessarily) but a forwarding plane in search of a service.  If you want to talk commoditization, this is what it would look like if taken to the ultimate level.

And that, friends, is likely what’s on Cisco’s mind.  Cisco has always seen itself as a “fast follower” and not a leader, meaning that it’s wanted to leverage trends that have established themselves rather than try to create their own trends.  That’s probably particularly true when the trend we’re talking about could hurt Cisco, and all router vendors, significantly.  And when the market doesn’t have a clear model of how this new combined “supercontrol-plane” would work, why would Cisco want to teach it that critical lesson?  Why commoditize your own base?

Only because it’s inevitable, and that may explain Cisco’s current thinking.  Server vendors like Dell and HPE, software giants like IBM/Red Hat and VMware, and cloud providers like Amazon, Google, and Microsoft, could all field their own offerings in this area.  So could startups like DriveNets.  Once that happens, Cisco can no longer prevent the secret from getting out.  To the extent that this new-model network is truly best (which I believe it is), Cisco now has to choose between losing its current customers to Cisco’s own successor new-model implementation, or losing to someone else’s.

OK, suppose this is Cisco’s thinking.  What characterizes this new supercontrol-plane?  It’s cloud-hosted, and it integrates applications and experiences directly with forwarding.  It’s really mostly an application, right?  Things like Kubernetes, containers and container security, and even application features like text processing, all live in the cloud, and very possibly either inside (or highly integrated with) this new supercontrol-plane element.  If Cisco has to face the truth of this new element at some point, it makes sense to get its software framework ready to exploit it.

But can a fast-follower strategy work with this kind of disruption?  It might.  The whole reason behind white-box switches and disaggregation of software and hardware is to ensure that the capital assets that build network infrastructure are open.  It’s the hardware that creates a financial boat anchor on advances.  Open it up, and you cut the anchor rope.  But remember that any network operator will already have routers in place.  If they’re Cisco routers, and if Cisco can make its current routers compatible with its supercontrol-plane concept, then Cisco has a leg up, financially, on competitors who’d have to displace Cisco’s routers and force operators to take the write-down.

Finally, if Silicon One is a Cisco asset to be protected, isn’t it one that could be leveraged?  Cisco could build white-box forwarding devices, if white-box forwarding is the model of the future.  Sure, they’d lose revenue relative to selling chassis routers, but if they could make that up by feeding service applications into their supercontrol-plane, that could be OK.  In any event, they can’t stick their finger in the open-model dike and think it will hold forever.

Timing issues represent the big risk to Cisco.  Fast following when you’re doing layoffs and your stock has been downgraded can be a major risk if the player you let take the lead decides to do things perfectly.  I wouldn’t count Cisco out at this point; they still have some pathways to a strong position in the new-model network era, but they’re going to have to accept that they’ll never be the Cisco they were, and sometimes that sort of thing creates a culture shock management can’t get past.  They’ll need to overcome that shock, and be prepared to jump quickly if it looks like a serious rival for new-model network leadership is emerging.

Tracking the White-Box Revolution

Sometimes the real story in something is deeper than the apparent news.  Nobody should be surprised by AT&T’s decision to suspend any new DSL broadband connections.  This is surely proof that DSL is dead, even for the skeptics, but DSL isn’t the real issue.  The real issue is what’s behind the AT&T decision, and what it means to the market overall.  AT&T is telling us a more complex, and more important, story.

The fundamental truth about DSL is that, like a lot of telecom technology, it was focused on leveraging rather than on innovating.  “Digital subscriber loop” says it all; the goal was to make the twisted-pair copper loop plant that had been used for plain old telephone service (POTS) into a data delivery conduit.  At best, that was a tall order, and while there have been improvements to DSL technology that drove its potential bandwidth into the 25 Mbps range, that couldn’t be achieved for a lot of the current loops because of excessive length or the use of old technology to feed the digital subscriber line access multiplexers (DSLAMs).

The biggest problem DSL faced was that the technology limitations collided with the increased appetite for consumer access bandwidth.  Early attempts to push live TV over DSL involved complex systems (now largely replaced by streaming), and 25 Mbps wasn’t fast enough to support multiple HD/UHD feeds to the same household, at a time when that was a routine requirement.  Competition from cable and from fiber-based variants (including, today, millimeter-wave 5G) means that there’s little value in trying to keep those old copper loops chugging along.

OK, that’s it with DSL.  Let’s move on to AT&T and its own issues.  AT&T had the misfortune to be positioned as the only telco competitor to Verizon in wireline broadband.  As I’ve noted in past blogs, Verizon’s territory has about 7 times the potential of AT&T’s to recover costs on access infrastructure.  Early on, cable was delivering home video and data and AT&T was not, which forced them to try to provide video, leading eventually to their DirecTV deal (which AT&T is trying to sell off, and which is attracting low bids so far).  They’re now looking to lay off people at WarnerMedia, their recent acquisition, to cut costs.

AT&T has no choice but to cut costs, because its low demand density (see THIS blog for more) has already put it at or near the critical point in profit-per-bit shrinkage.  While AT&T may have business issues, they have an aggressive white-box strategy.  Their recent announcement of an open-model white-box core using DriveNets software (HERE) is one step, but if they can’t address the critical issue of customer access better than DSL can, they’re done.  The only thing that can do that is 5G, and so I think it’s clear that there will be no operator more committed to 5G in 2021 than AT&T, and that’s going to have a significant impact on the market.

Recall from my white-box blog reference that AT&T’s view is that an “open” network is one that doesn’t commit an operator to proprietary devices.  The AT&T talk at a Linux Foundation event suggests that their primary focus is on leveraging white boxes everywhere (disaggregated routing is part of the strategy).  That means that AT&T is going to be a major Tier One adopter of things like OpenRAN and open, hosted, 5G options overall.

There couldn’t be a better validation of the technology shift, though as I’ve noted, AT&T’s demand density places it closer to the point of no return (no ROI) on the profit-compression curve than most other operators.  That means that those other operators will either have more time to face the music, or need another decision driver to get things to move faster, but I think they’ll all see the handwriting on the wall.

For the major telco network vendors, this isn’t good news, and in fact it’s bad news for every major network vendor of any sort, even the OSS/BSS players.  As nearly everyone in the industry knows, vendor strategy has focused on getting their camel-nose into every buyer tent and then fattening up so there’s no room for anyone else.  The problem with open-model networking is that it admits everyone by default, because the hardware investment and its depreciation period have been the anchor point for our camel.  Take that away and it’s easy for buyers to do best-of-breed, which means a lot more innovation and a lot less account control.

We’ve already seen signs of major-vendor angst with open-model networking.  Cisco’s weak comment to Scott Raynovich on the DriveNets deal, that “IOS-XR is already deployed in a few AT&T networks as white box software,” is hardly a stirring defense of proprietary routers.  Ericsson did a blog attacking OpenRAN security.  The fact is that no matter what the major vendors say, the cat is out of the bag now, and with its escape it reveals some key questions.

The first of these questions: how much of a role will open-source network software play?  AT&T has demonstrated that it’s looking at open hardware as the key; either open-source or proprietary software is fine as long as it runs on open hardware.  That would seem to admit open-source solutions for everything and perhaps kill off proprietary stuff in all forms—like a technology version of the Permian Extinction.  The problem with that exciting (at least for those lifeforms who survive) theory is that there really aren’t any open-source solutions to the broad network feature problem.  Yes, there are tools like Kubernetes and service mesh and Linux and so forth, but those are the platforms to run the virtual features, not the features themselves.  That virtual feature space is wide open.

Can open-source fill it?  Not unless a whole generation of startups collectively sticks their heads in the sand.  Consensus advance into a revolutionary position is difficult; it’s easier to see revolution through infiltration, meaning that startups with their own vision and no committees to spend days on questions like “When we say ‘we believe…’ in a paper, who are ‘we?’” (a question I actually heard on a project), can drive things forward very quickly.

The second of these questions: are there any fundamental truths that this new open-model wave will have to embrace?  Obviously, the follow-on question would be “what are they”, so let’s admit that the answer to the first is “Yes!” and move to the follow-on.

The first of our specific fundamental truths is that all open-model software strategies, particularly those that exploit white-box technology, have to separate the control and data planes.  The control plane of a network is totally different from the data plane.  The data plane has to be an efficient packet-pusher, something that’s analogous to the flow switches in the ONF OpenFlow SDN model.  AT&T talked about those boxes and their requirements in the Linux Foundation talk.  The control plane is where pushed packets combine with connectivity behavior to become services.  It’s the petri dish where the future value propositions of the network are grown, and multiply.

The second of our specific truths is that the multiple “control planes” now defined by multiple bodies and initiatives have to somehow converge into a cooperative system.  What 5G calls a “control plane” and “user plane” defines a control plane above the IP control plane, which is actually part of 5G’s user plane.  The common nomenclature is, at one level, unfortunate; everything can’t have the same name or names have no value.  At another level, it’s evocative, a step toward an important truth.

Networking used to be about connecting things, and now it’s about delivery.  What the user wants isn’t the network, it’s what the network gets them to.  Thus, the true data plane of a network is a slave to the set of service-and-experience-coordinating things that live above, in that “control plane”.  The term could in fact be taken to mean the total set of coordinating technologies that turn bit-pushing into real value.  Because that broader control plane is all about content and experience, it’s also all about the cloud in terms of implementation.

Who and how, though?  The ONF approach uses an SDN controller with a set of northbound APIs that feed a series of service-specific implementations above, a layered approach.  But while you can use this to show a BGP “service” controlling the network via the SDN controller, BGP is almost a feature of the interfaces to the real BGP networks, not a central element.  Where do the packets originate and how are they steered to the interfaces?  In any event, the central SDN controller concept is dead.  What still lives?

This raises what might be the most profound competitive-race question of the entire open-model network area: is this “control plane” a set of layers as it’s currently implicitly built, or is it a floating web of microfeatures from which services are composed?  Why should we think of 5G and IP services as layers, when in truth the goal of both is simply to control forwarding, a data-plane function?  Is this new supercontrol-plane where all services now truly live?  Are IP and 5G user-plane services both composed from naked forwarding, rather than 5G user-plane being composed from IP?
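
Here’s the “floating web” reading of that question in sketch form, entirely illustrative: both an “IP service” and a “5G user-plane service” are just different microfeature sets programming the same naked forwarding primitive, with neither layered on the other.

```python
forwarding_rules = {}                             # flow match -> action

def program(match, action):
    forwarding_rules[match] = action              # the only primitive there is

# An "IP service" is a set of microfeatures that speak routing protocols...
def ip_routing_microfeature():
    program(("dst", "198.51.100.0/24"), "port-2")

# ...a "5G user-plane service" is a set of microfeatures that speak session
# management, but it lands on the same forwarding primitive rather than
# stacking on an IP layer beneath it.
def session_management_microfeature():
    program(("ue", "10.64.1.23"), "conduit-gnb-17")

ip_routing_microfeature()
session_management_microfeature()
print(forwarding_rules)
```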

These questions raise a third: what are the big cloud players going to do in this new open-model network situation?  By “big cloud players” I of course mean the cloud providers (Amazon, Google, IBM, Microsoft, and Oracle), and also the cloud-platform players (IBM/Red Hat, VMware, and HPE), and even players like Intel, eager for 5G revenue and whose former Wind River subsidiary offers a platform-hosting option.  And, finally, I’d include Cisco, whose recent M&A seems to be aimed at least in part at the cloud and not the network.

It’s in the cloud provider community that we see what might be the current best example of that supercontrol-plane, something that realizes a lot of the ONF vision and avoids centralized SDN controllers.  Google Andromeda is very, very, close to the goal line here.  It’s a data-plane fabric that’s tightly bound to a set of servers that host virtual features that could live anywhere in the application stack, from support for the IP control-plane features to application components.  Google sees everything as a construct of microfeatures, connected by low-latency, high-speed, data paths that can be spun up and maintained easily and cheaply.  These microfeatures are themselves resilient.  Google says “NFV” a lot, but their NFV is a long way past the ISG NFV.  In fact, it’s pretty close to what ISG NFV should have been.

Andromeda’s NFV is really “NFV as a service”, which seems to mean that microfeatures implemented as microservices can be hosted within Google’s cloud and bound into an experience as a separate feature, rather than being orchestrated into each service instance.  That means that each microfeature is scalable and resilient in and of itself.  This sure sounds like supercontrol-plane functionality to me, and it could give Google an edge.  Of course, other cloud providers know about Google Andromeda (it dates back over five years), so they may have their own similar stuff out there in the wings.
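
Here’s a sketch of the contrast as I read it (the class and function names are assumptions, not Google’s): classic NFV orchestrates a dedicated function instance into every service, while the “as-a-service” model binds services to one shared, independently scaled microfeature.

```python
class SharedMicrofeature:
    """One resilient, autoscaled feature instance shared by many services."""
    def __init__(self, name):
        self.name = name
        self.bindings = []

    def bind(self, service):
        self.bindings.append(service)             # a binding, not an instantiation
        return f"{service} bound to shared {self.name}"

def classic_nfv_orchestrate(service):
    # Per-service instantiation: each service gets (and must manage) its own copy.
    return f"deployed dedicated firewall VNF for {service}"

firewall = SharedMicrofeature("firewall-feature")
print(classic_nfv_orchestrate("service-A"))
print(firewall.bind("service-B"))
print(firewall.bind("service-C"))                 # scaling is the feature's own problem
```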

The cloud-platform vendors (IBM/Red Hat and VMware, plus the server players like Dell and HPE) would probably love to build the supercontrol-plane stuff.  They’d also love to host 5G features.  So far, though, these platform giants are unable to field the specific network microfeature solutions that ride on top of the platforms and justify the whole stack.  Instead, most cite NFV as the source of hostable functionality.  NFV, besides not being truly useful, is a trap for these vendors because it leads them to depend on an outside ecosystem to contribute the essential functionality that would justify their entry into the new open-network space.  They might as well wait for an open-source solution.

And this raises the final question:  Can any of this happen without the development of a complete, sellable, ecosystem that fully realizes the open-model network?  The answer to that, IMHO, is “No!”  There is no question that if we open up the “control plane” of IP, consolidate it with other control planes like the one in 5G, frame out the ONF vision of a programmable network and network-as-a-service, and then stick this all in the cloud, we’ve created something with a lot of moving parts.  Yes, it’s an integration challenge and that’s one issue, but a greater issue is the fact that there are so many moving parts that operators don’t even know what they are, or whether anyone provides them.   For operations focus, I recommend that people keep an eye on models and TOSCA (an OASIS standard for Topology and Orchestration Specification for Cloud Applications) and tools related to it.

Big vendors with established open-source experience (all of the vendors I named above fit that) will do the logical thing and assemble an ecosystem, perhaps developing the critical supercontrol-plane tool and perhaps contributing it as an open-source project.  They’ll then name and sell the entire ecosystem, because they already do that, and already know how it’s done.

This will be the challenge for the startups who could innovate this new open-model space into final victory.  No startup I’m aware of even knows all the pieces of that complete and sellable ecosystem, much less has a ghost of a chance of getting the VC funding (and the time) needed to build it.  Can they somehow coalesce to assemble the pieces?  Interesting question, given that many of them will see the others as arch-rivals.  Can they find a big-player partner among my list of cloud or cloud-platform vendors?  We’ll see.

Another interesting question is what Cisco will do.  They’re alone among the major network vendors in having a software/server position to exploit.  Could Cisco implement the supercontrol-plane element as cloud software, and promote it both for basic OpenFlow boxes and for Cisco routers?  We’ll get to Cisco in a blog later on.

I think AT&T’s DSL move is both a symptom and a driver of change.  It’s necessitated by the increasingly dire situation AT&T is in with respect to profit per bit, just as AT&T’s white-box strategy is driven by that issue.  But it drives 5G, and 5G then drives major potential changes in the control-to-data-plane relationship, changes that could also impact white-box networks.  Everything that goes around comes around to going around again, I guess.  We’ll see where this complex game settles out pretty quickly, I think.  Likely by 1Q21.

What’s Really Behind the IBM Spin-Out?

The news that IBM will spin off its managed infrastructure services business into a new company created a pop for its stock, but what does this mean (if anything) for the IT market?  In particular, what does it mean for cloud computing?  It’s not as simple as it might seem.

The basic coverage theme for the deal, encouraged no doubt by IBM’s own press release, is that this is going to take IBM another step away from “legacy” to “hybrid cloud”.  “IBM Slashes Legacy to Focus on Hybrid Cloud” says SDxCentral, and other sites had a similar headline.  Click bait?  From that, it wouldn’t be unreasonable to think that the deal was spinning out all of IBM’s mainframe computer business, along with related software and services.  “IBM trashes everything about itself except Red Hat and the Cloud,” right?  Wrong.  What IBM is spinning off is its “managed infrastructure services” business, which they say is $19 billion annually.  It doesn’t include either software or hardware elements.

In its SEC filing today, IBM said that customer needs for applications and infrastructure services were diverging.  It’s pretty clear what applications are, and “infrastructure services” are the collection of managed services described HERE on IBM’s site.  It’s a fair hodgepodge of technical and professional services relating to just about everything associated with data centers, but not the cloud.  Thus, it could be more accurate to say that IBM is spinning off its managed services except for those involved in hybrid cloud.

If this seems like an almost-political level of spin to you, you’re not alone.  I think IBM is doing a smart thing, but they’re wringing the best PR angle out of the move rather than providing the literal truth, which I think is more related to optimizing Red Hat and “NewCo” than anything else.  It’s about combined shareholder value.

Before Red Hat, IBM was a creaky old organization with a good self-image, credibility among aging CxOs, and little or no path to serious engagement of new customers.  Red Hat gave IBM something modern to push through its admittedly powerful sales engagements, and it also gave IBM one of the best platforms for attracting the attention of new prospects.  In short, Red Hat was, and is, IBM’s future, and one of the things it does rather nicely is to link IBM’s cloud business to that future too.

A lot of what builds and goes into the cloud is open-source, and Red Hat is a natural leader in that sector of software.  Not only that, a lot of that cloud stuff is still tied back to the data center, creating the “real hybrid cloud” model that’s been there all along and never written about much.  There will be very, very, few “cloud enterprises” if we take that to mean 100% cloud, but every enterprise will be a hybrid cloud enterprise.  No surprise, then, that IBM has picked that notion up in its PR regarding the spin-out.

But why spin anything out?  IBM has Red Hat, after all, and this isn’t some sort of nuclear business atomic-organization theory where one company coming into the IBM atom has to knock another piece out.  However, an organization offering managed infrastructure services really needs to be somewhat technology agnostic to reach its full potential, and if IBM is going whole-hog to Red Hat and associated technology, it hardly wants to present an open-minded vision of the industry on its sales calls.  There’s a mission conflict here, one that could hurt the managed infrastructure services business if it stays, as well as the rest of IBM’s business.  The new company, “NewCo” in the SEC filings, will be a pure managed infrastructure services company, free now to manage any infrastructure as an impartial professional team, without fear that IBM product/technology bias will scare non-IBM accounts away.

Before we write all IBM’s hybrid cloud singing and dancing off as market hype, though, we have to recognize that while linking this deal to hybrid cloud may be an excursion away from strict facts, it is a fact that IBM is highly committed to hybrid cloud.  Why?  Because, as I just said, every enterprise will be a hybrid cloud enterprise, and if we define “enterprise” as a ginormous company, IBM has engaged a higher percentage of them than any other IT firm, and has a deeper engagement level with most.  Given that the market hype on the cloud has totally distorted reality, IBM is in a position to pick the low apples while all its competitors are still staring into the sun…not to mix metaphors here.

Here’s a basic truth to consider.  If the cloud is an abstraction of the thing we call “servers”, then the future of software and computing isn’t based on hardware at all.  Even “infrastructure” (in the sense of computing and related platform software) is shaped by the symbiotic relationship the infrastructure has with the cloud, and applications are “cloud applications,” not mainframe or minicomputer applications, even if they actually run (in part or entirely) on mainframes or minicomputers.  The hybrid cloud is the IT of the future, period.

IBM, the old and original IBM, has solid account control with many if not most of the enterprises.  It has, with Red Hat, something to sell them, and something from which they can create a respectable hybrid cloud.  The core business of IBM-the-new-and-exciting depends on making sure that Red Hat shift works, while at the same time trying to preserve overall shareholder value.  Remember, current shareholders will get some NewCo too, and IBM wants that to do well.  Inside IBM, the goals of managed infrastructure services and hybrid cloud would likely be at cross-purposes often enough to be a problem.  Thus, spin it out.

The hybrid cloud lesson here is clear, but another lesson we can draw is also important.  The age when enterprise IT and “ordinary” IT differed at the core technology level (mainframes versus minis or even PCs) is over.  The enterprise is a hybrid cloud consumer, and hybrid cloud (as I noted earlier) is computing abstracted.  One platform to rule them all.