What Do Operators Say are the Barriers to Transformation?

What do operators think makes transformation difficult?  That’s a key question not only for operators, but for anyone who wants to drive changes to fundamental network infrastructure principles and architectures.  There are technical elements that operators see as barriers, but the biggest ones are far more financial than technical.

I’ve gathered this information from 84 discussions with operators over a period of 11 months.  A total of 44 operators were represented, which means that I had about two discussions, on the average, with different people within the operator organizations.  All those involved were either CxO-level or senior planners.  As is always the case with these contacts, I’ve had longer-term interactions with them, and I promise full protection of identities to all involved, so they can talk freely.

The number-one barrier, one cited in every discussion I had, was residual depreciation on existing infrastructure.  Network equipment is written down at different rates in different markets, based on tax laws in each location.  On the average, operators report about a five-year depreciation cycle, which means that when they buy gear, they post the cost as an expense over a five-year period.  If they displace and dispose of a device before it’s fully depreciated, they have to take a write-down of all the depreciation not yet taken.  This write-down will impact the cost, and thus the justification, for any transformation project.
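
To make the write-down arithmetic concrete, here’s a minimal sketch assuming simple straight-line depreciation; the dollar figures are illustrative, not from any operator.

```python
def remaining_write_down(purchase_cost, life_years, years_in_service):
    """Undepreciated book value that must be written down if the asset is
    retired early, under straight-line depreciation (illustrative only)."""
    annual_depreciation = purchase_cost / life_years
    depreciated = annual_depreciation * min(years_in_service, life_years)
    return max(purchase_cost - depreciated, 0.0)

# A $10M platform on a five-year cycle, displaced after two years, leaves
# roughly $6M of undepreciated value to be written off against the project.
print(remaining_write_down(10_000_000, 5, 2))  # 6000000.0
```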

It’s clear that the presence of a lot of undepreciated equipment is a barrier to transformation, but what makes it a special problem is that networks typically evolve, so there are usually a number of devices in place at every possible stage of depreciation.  One operator said their equipment was, on the average, three-and-a-half years old, but they had a quarter of it in the final (fifth) year of depreciation and a bit less than that in the first year.  If you considered the network as having three layers—access, aggregation/metro, and core—the spread of depreciation across the three was very similar, though access networks were almost always at least slightly “newer”.

The depreciation spread is important because operators were generally of the view that it was “difficult” to displace technology with two years of depreciation remaining, and “almost impossible” to displace something with four or more years remaining.  Since none of the access/metro/core network segments were made up primarily of assets with two or fewer years of depreciation remaining, it was difficult to target a transformational change that wouldn’t involve “almost impossible” write-downs.

This raises the question of how transformation could ever be accomplished, of course.  One possible answer is the greenfield deployment of mass technology related to a basic change in the network.  5G is the example every operator gives for this kind of driver.  When a new service mission comes along, that mission drives greater willingness to displace technology, in no small part because the budget for the mission will include write-down of obsolete assets.  But this raises the second-most-cited barrier to transformation, lack of a strong profit incentive to justify new service missions.

Almost half the operators who have deployed 5G already, or who are scheduled to deploy within the next year, say that their deployment plans are hampered by the lack of a clear profit direction on the service decision.  All operators who have deployed 5G or are 5G-ready say that “competition” is either the primary or second-most-significant driver.  The next-highest-rated driver is “enhancement to current mobile network capacity”.  Expected new profits on 5G services are ranked third.

This is important because every single operator discussion demonstrated a view that since 5G “mandates” a transformed model of networking, broad adoption of 5G would have the effect of driving transformation through more of the network.  It also demonstrates that vendors who offer 5G have a better chance of engaging than those who do not, since there were no other new service missions with any significant credibility at all.  This, of course, is already known by vendors, and it’s why there’s such a furor around 5G positioning, not only for traditional network vendors but also for cloud-software and public cloud providers.

We now get to the first “technical” barrier to transformation, one cited by over three-quarters of operators as an “important” issue.  To operators, loss of technical and operations continuity without compensatory features means that transformed technologies involve changes to network planning and operations practices that aren’t offset by capabilities provided by those technologies.  It was a bit surprising to me to hear this point stated this way; I’d expected them to say that “service automation” was a key goal and wasn’t being provided, but operators say service automation is an overall strategy (see below) and not an element of a transformation project.

This issue is most likely to arise in what we could call “point-project” introduction of new/transformed network infrastructure.  Router A is at end-of-life, so why not replace it with Transformed-Router B?  The problem is that the introduction of a single device within a web of legacy network elements has little chance of presenting any benefits and a significant chance of disrupting operations.  Interestingly, operators noted that this is one of the reasons why it’s so hard for a new router competitor to displace the incumbent vendor.  Absent a broad service mission change to drive broader change, you’re stuck replacing boxes, and for that, it’s hard to justify a vendor change.

This problem seems almost insurmountable to operators when considering virtual-infrastructure transformation.  The problem is that hosted features mean cloud infrastructure, which almost inevitably means major changes in operations and planning.  A bit over two-thirds of the operators indicated that they found technical/operations planning and continuity a “major issue” with NFV, and said this was why they believed the ETSI zero-touch automation initiative really got launched.  Without service automation to take up the slack, it’s easy for integration and operations changes to create so much disruption that opex costs override capex reduction benefits.

Most operators don’t believe that we have an effective service lifecycle automation approach; only about a third thought that the ONAP model was workable, a quarter thought some TMF initiatives were the way to go, and the rest really had no useful notion of how to proceed.  This was (according to that group) the primary reason why they favored “white-box” networks as a transformation vehicle.  With white boxes, the operations could be presumed to be more or less equivalent to traditional operations and integration.

That brings us to what I’ll call the “technical strategy” issues, which really represent the basic approach operators might take to transformation.  The first of these, the service mission change, has already been talked about.  The other two are the opex-driven transformation approach and the white-box transformation approach.  We’ll look at these now.

If the service mission change represents a revenue-side justification for transformation, the opex-driven transformation is obviously aiming at operations costs.  For the average operator, opex represents almost the same number of cents per revenue dollar as capex does, and when NFV came along, many believed that service lifecycle automation could eliminate the “opex tax” that the additional complexity of chained-virtual-function deployment and the maintenance of the hosting resource pool would necessarily generate.  One reason why operators have expressed so much interest in hosting 5G on public cloud resources is their concern over this opex tax.

Opex reduction strategies have other important backers.  Current router incumbents like Cisco and Juniper are lining up in favor of an opex-centric transformation because it doesn’t reduce their revenues.  In fact, by selling products (like AI enhancements to operations), they hope they might actually increase revenues.  The challenge here is that these strategies so far fail the “D4P” (Design for Prevention) test I blogged about last week.  It may be that the Juniper acquisition of 128 Technology was aimed at least in part at improving their AI/automation model.

The flip side of opex is capex, and the white-box transformation model is the poster child for capex-driven transformation.  About a quarter of the operators in my informal survey have taken a serious look at this, but commitment is still the exception.  AT&T is the only major operator who has endorsed, and actually deployed, a large-scale white-box-based network.  The majority of that network is being deployed on a 5G-related mission, but they also recently announced a transformation of their core to white-box clusters provided by DriveNets.

The operators who are still dipping their toes in white boxes, so to speak, are deterred primarily by the broad question of risk versus cost impact.  Estimates of savings attainable via white-box adoption vary from a low of 25% to a high of 50%, and these generally assume minimal increases in opex related to the change.  There’s also at least some indication that operators are using white-box networks as a way of squeezing discounts from their incumbent vendors.  Even if they’re serious, operators appear to require a fairly significant study period in which to become comfortable with the white-box strategy, and a big part of it is aligning their current business, operations, and network management frameworks to the new model.

This alignment problem is currently the largest barrier to adoption of white-box technology, for two reasons.  First, it raises the risk that new operations tools and practices, including service lifecycle automation, will be required to prevent opex costs from overrunning the capex benefits.  Most open-model white-box solutions don’t as yet have full operations solutions.  Second, because current practices are set by router incumbents, it tends to put the white-box providers in direct conflict with the router vendors from the start of the engagement.  This changes the point of interaction from the strategic level, where executives who actually have purchase authority live, to the technical level, where router-certified technologists confront the impact of not-router on their own careers.

These two issues have impacted nearly all new operator technology initiatives, and you can see the results in deployment successes.  For projects with low-level initiation, meaning modernization and replacement initiatives driven largely by network operations, only 6% of new technology trials advance to production network deployment.  For projects with senior-level personnel driving consideration, 63% achieve production deployment.

We can say that, overall, transformation is deterred by risk/reward questions.  What contributes to the risk side are concerns about technology change, the need to trust new vendors, the lack of a clear architecture for the new network ecosystem, the need for a high first-cost commitment, and low in-house skill levels with the critical new technologies.  Risks are objections at the low level of an organization, and to overcome those “risks/objections”, vendors tend to try to underplay the changes they’re proposing.  Every white-box proposal I’ve seen has emphasized the similarity/congruency between white boxes and proprietary routers.  The problem with “subtractive” risk management is that the benefits are subject to subtraction too.  In fact, these kinds of proposals always end up in a discount war, where it becomes all about price.  In the end, the incumbents usually discount enough to win.

And that raises the biggest difficulty of all, a difficulty that operators don’t really mention.  Transformation usually doesn’t present enough clear benefits to make a business case.  The goal of transformation loses steam in the realization.  A big part of that is due to the fact that operators think of transformation as the elimination of a hazy problem of profit-per-bit compression.  Hazy problems beget hazy solutions, and you can’t buy those; you can buy only concrete products.  When you try to make transformation into a series of project steps, the steps don’t convincingly deliver the operators from that fear of profit compression, usually because transformation-oriented vendors end up, as I’ve said, focusing on simple cost differences.

Transformation, overall, occurs when you have a transformational technology that can combine with a transformational mission that justifies it.  It seems to me that if we assume that there are no credible transformational service missions from which operators could derive significant new revenues, it will be difficult to sustain momentum on transformation, even if it can be started.  Cost-driven transformation can only wring out so much from the cost side of profit-per-bit compression.  The revenue side must eventually play a role.

That’s why we need to keep working on this problem.  Operators have all manner of hopes about new revenue, but for the great majority, there’s absolutely no credible reason to believe in them.  One reason I’m so enthusiastic about the network-as-a-service model I mentioned in a blog HERE is that an aggressive NaaS positioning could transport some features, value, and revenue from the over-the-network space to the network itself.  NaaS is, in a real sense, potential “service middleware”, allowing operators to build components of higher-level service rather than competing with established players in that OTT space.  While this may not be the easiest thing to conceptualize or develop, it has the advantage of keeping revenue targets in the service comfort zone of the operators.  That may be the critical requirement if anything useful is to be done.

NaaS and the Protocol-Independent Network

Over the years, we’ve had many different protocols, but the interesting truth is that within each of them, there’s a common element—the data.  Whether we’re transporting information in IP or SNA or DECnet or X.25, we’re still transporting information.  The protocol isn’t the information, it’s just a contextual wrapper that defines the rules for exchanging the information successfully.  If we accept that, then we have the beginnings of a truth about the way future networks might be built.

Let’s start with a reality check here.  Nobody is trying to get rid of IP or the Internet, as they exist as end-user services.  What they’d love to refine, or even get rid of, is the baggage associated with IP and router networks.  If you look at how IP has evolved, it’s hard not to think of it as being a very long way around the track to repurpose a protocol for stuff that was never even considered when the protocol was designed.  What we do with IP today is deliver multimedia content and support mobile users, and neither mission really existed back in the days when the protocol was invented.  The thing is, everything that consumes data today in a commercially useful sense is an IP device.  We have to do whatever we do to networks without impacting what uses the networks.

IP as an interface isn’t the same thing as IP as a protocol, and that’s not the same thing as IP as a network architecture.  IP networks are built with routers, which are purpose-built devices that talk not only the user IP interface but also “internal” interfaces that include features not made visible to the user at all.  One big problem with this model is that it’s difficult to advance features quickly or incrementally because the routers have to be upgraded to do it.

One of the big areas where network operators want to add features is in the area of traffic engineering.  Another is in fault recovery.  Both these were drivers behind the original concept of software-defined networking, and SDN started with the presumption that we’d accomplish IP advances by first separating the control and data planes.  The data plane would carry only end-to-end information, and all the rest would be pulled out and handled “over top of” the data plane.  Today, we’d say that “the cloud” would handle it, but in the original SDN approach (still the official one) it’s handled by an SDN controller.

A lot of people would say that a control plane in the cloud isn’t much of a leap beyond the centralized (and vulnerable) SDN controller concept, and that could be true.  “Could be” depending on how the concept is implemented.  Recall that I blogged a bit ago about “locality” as opposed to “edge” in considering the placement of computing resources.  A few people on LinkedIn have also noted that separating the control plane by simply hosting it somewhere on a potentially vastly distributed set of servers might not be optimal in terms of latency or stability.  I agree; there has to be a new way of thinking about the cloud if you’re going to deploy the IP control plane there.

In my blogs on the topic (especially the one referenced above), I noted that the missing ingredient in “telco-oriented” cloud computing was the notion of “locality”, meaning that some microfeatures had to be hosted proximate to the point of activity they served.  Something supporting Interface A on Device B has to be close to where that happens to be, if the “support” is expected to be near-real-time.  A randomly chosen resource location could be a disaster.

It follows, though, that if you’re actually going to have a “cloud” at all, you have to pull the microfeature-hosting resources from the device and place them nearby.  It would also be logical to assume that the operation of a control plane, especially the supercontrol-plane that I postulated as a combination of all service-coordinating features, would include microfeatures that wouldn’t be as latency sensitive.  Thus, we’d need to be able to ask for resources proximate to the point of activity but not joined at the hip to it, so to speak.

The point here is that “separating the control plane” and “virtualizing the control plane” aren’t the same thing.  If control planes did nothing beyond what IP itself requires, you could see that the implementation of a cloud-hosted control plane would likely involve a kind of local interface process set, resident with the interface in some way, and a topology-and-management process set that would be distributed.  The where and how of the distribution would depend on the implementation, which means that we’d need to think about how this new supercontrol-plane got implemented in order to decide where the microfeatures that make it up could be placed.
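
As a sketch of what that split might look like in practice, consider a simple placement rule: latency-sensitive microfeatures stay with the interface, the rest can be distributed.  The microfeature names and their classifications below are my own illustrative assumptions, not anything from a standard or product.

```python
from dataclasses import dataclass

@dataclass
class Microfeature:
    name: str
    latency_sensitive: bool  # must run proximate to the interface it serves

# Hypothetical decomposition of a separated control plane into microfeatures.
CONTROL_PLANE = [
    Microfeature("hello/keepalive handling", True),
    Microfeature("fast-reroute trigger", True),
    Microfeature("topology database maintenance", False),
    Microfeature("route policy computation", False),
    Microfeature("management/telemetry aggregation", False),
]

def place(feature: Microfeature, local_pool: str, regional_pool: str) -> str:
    """Latency-sensitive microfeatures are hosted local to the interface they
    serve; the rest can go to a regional (cheaper, more elastic) pool."""
    return local_pool if feature.latency_sensitive else regional_pool

for f in CONTROL_PLANE:
    print(f"{f.name} -> {place(f, 'edge-pod-near-interface', 'regional-cloud')}")
```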

From what Google has said about its use of this approach in its B4 backbone, we can infer that Google saw separating control and data planes at least in part as a means of simplifying what goes on with IP, MPLS, and optical networking.  Each of these has its own control plane, and in a traditional large-scale application of router networks, each would be part of a layer stack in the order I referenced them.  Google, in B4 itself, didn’t integrate higher-layer services into the picture, because B4 was an IP backbone.  In the broader Andromeda application, which includes Google’s B2 interprocess network, Google networks microfeatures directly.  The relationship, if any, between these two isn’t specified.

That may not be the best way to approach a commercial solution to optimizing the separated-control-plane and supercontrol-plane concepts.  For that goal, it would be best to either integrate commercial tools (Kubernetes, Istio) to create a microfeature framework, enhanced to support locality, or create a high-efficiency version of the two for specialized network microfeature applications.  This would be best done by integrating it with the architecture that separates the control plane in the first place, meaning the emulation of IP networking.

The role of B4 in flattening the IP, MPLS, and optical layers is an interesting topic in itself.  We’ve tended to treat layers as not only inevitable but mandatory since they came along with the OSI model in the late 1970s, but that was then and this is now.  Is a layered approach really the best way to run networks today?  For example, if we have dynamic optical connectivity and dynamic connectivity at Level 2 or 3 or both, and if we have “control-plane” processes that can manipulate features at all the levels, are we really looking at layers in the OSI sense?  Are all the higher layers of the OSI model now subsumed into a supercontrol-plane approach?  That seems to be the goal of B4.

It may seem a radical thought, but it’s very possible that we’ve outlived the usefulness of layered protocols.  While the OSI model defined seven layers, those layers are divided into two “zones”, one (Level 1-3) representing the connection network and the other (4-7) the application relationship, meaning end-to-end.  We’ve seen the world focus its attention on IP-based connection services in the first zone, and we’ve seen the second zone compressed by things like 3GPP specifications, which create an interface point between a “network” and a “service”, meaning between the two zones.  In both cases, the specifics of the zones themselves seem to be getting lost.  Maybe, in fact, the zones themselves are starting to mingle.

What I think is happening is that we’re entering (perhaps unwittingly) the age of network-as-a-service, and what NaaS is a service to is applications, meaning software.  For those who need an OSI lesson, the Application Layer was the 7th of those OSI layers, the top of the stack.  If NaaS presents a Level 7 interface to applications, then the entire OSI stack is subducted, invisible.  Present the right API at the NaaS top, and Tinker Bell can carry packets on little silver wings for all we care (as long as she can meet the SLA).

NaaS, as a conception of delivered services, could free network infrastructure from strict conformance to IP behavior, even if the service delivery is made in an IP envelope.  Networks composed from microfeatures can have a variety of internal properties as long as they present the proper NaaS interface(s) at the top.  Are some of the challenges we have in network virtualization due to the fact that, by “virtualizing” router networks, we’re also codifying routers?

For service providers, meaning network operators, NaaS could be a windfall.  The absorption of higher-layer features into “network” services lets them climb the value chain via a familiar ladder.  Connectivity is expanded to include application-specific elements—4G and 5G mobility-to-data-plane interfaces, CDN, data center interconnect, IoT services, you name it.  If the “service” serves applications directly, then a lot of application features that would benefit from as-a-service implementation can be absorbed into the service.  This might be the pathway to new revenue that avoids direct collision with the current cloud providers and OTTs.

Again, this doesn’t mean the death of IP, because network connections still require network services, but it does mean that the network could become (as I’d argue it has for Google Andromeda) almost a byproduct of a microfeature-connection architecture, an architecture that isn’t protocol-specific, but interface-defined.  I think that things like SD-WAN, SDN, and Andromeda are leading us in that direction, and I think that a shift of that sort could be the most radical change in networking that’s ever been considered.

It could also be fun; many of those who fought the transition from TDM to packet would have to admit that their lives have gotten more interesting under IP, in large part because IP is less constraining to applications.  NaaS could remove all constraints, both to applications, and perhaps to fun as well.

What is an NGOSS and Why Do I Care?

What the heck is an NGOSS?  I’ve used the term a lot, relating to the “NGOSS Contract”, but for those not familiar with the OSS/BSS space or the TMF, it may be mysterious.  Tata Consultancy Services, a large and active professional services firm, recently did a paper on “reimagining” the OSS/BSS, and the TMF published a paper asking whether there’s life beyond connectivity services.  There are interesting contrasts and similarities between the two documents, well worth exploring.

NGOSS stands for “Next-Generation Operations Support System”.  It’s a term that’s been widely used for at least 15 years, and it’s often used by the TMF in their specifications.  A lot of the TMF stuff is members-only though, and so I can’t cite it (or, if I’m not a member at the time, can’t even access it) except by summarizing.  The TMF paper is particularly helpful because of this; it’s a public summary of the way the body sees the future of “transformation”.  The Tata paper is interesting because it reflects a professional-services view of the same process of transformation.

The opening of the Tata paper has the usual platitudes about bandwidth.  “As businesses increasingly adapt to a predominantly online model in line with the new reality, the need for high bandwidth and reliable network services is soaring. Consequently, communications service providers (CSPs) are experiencing exponential demand for services and are dependent on progress in virtualization technologies to continue scaling at current rates.”

The problem with this, besides its platitude status (one more “exponential” demand statement and I’ll puke), is that the real issue isn’t demand growth, it’s profit shrinkage.  No operator would be unhappy with growth in demand for bandwidth, if they could charge incrementally for the growth.  They can’t, so they need to either find something other than bandwidth to sell, or cut the cost of producing bandwidth to offset the fact that higher bandwidth services don’t generate proportionally higher prices.

This same issue colors the next section, which is called “Rethinking the OSS Function”.  It cites TMF objectives for OSS modernization, none of which even imply dealing with that profit compression problem.  In fact, that section of the document is why a lot of operator strategists are in favor of scrapping the whole OSS/BSS thing and starting over.  It’s not that no useful goals are set (automation is one of the attributes of a future OSS/BSS), but that nothing much is said about achieving them.

That comes in the next section, which is called “Driving Well-Orchestrated OSS Functions”.  This section finally makes a recommendation that’s useful; OSS/BSS has to be made event-driven.  I had hopes here, because the TMF was in fact the source of the key concept in OSS and operations automation—that NGOSS Contract I mentioned at the start of this blog.  Sadly, neither the term nor the concept is raised in the Tata paper.  What it says is that “future OSS functions must be created and offered as services composed of microservices to support automated end-to-end orchestration of hybrid (physical, logical, and virtual) network resources.”

The TMF paper, in contrast, opens with the statement that “connectivity is rightly seen as a low margin business and is rapidly commoditizing further.”  The goal of operators is “getting their interior world right with an agile technology foundation and operating model.”  The TMF obviously includes its primary OSS/BSS target in this interior world, but just as obviously the agile technology foundation has to extend to infrastructure.

The mechanism of getting their “interior world in order” is, by implication, 5G.  The TMF says that it’s critical in order “to make the most of the hugely expensive shift to 5G.”  But that seems to be contradicted by the next paragraph, where the former CEO of the forum says “For consumers and smart home individuals, ‘connectivity’ tends to have an intrinsic value and is a valid thing to be purchasing. As you go up the scale, however, into the industry transformation realm, connectivity is only valued in so far as it is tied into the solution: They don’t want to buy connectivity from you, they want to buy a solution.”  5G is clearly a connectivity strategy for consumers, as all mass-market infrastructure is.  If we presume that, then to “go up the scale into the industry transformation realm”, meaning business services, the key is to sell solutions.

This seems to argue for the operators to focus their “digital service provider” business on enterprises, and to provide solutions there is to be a SaaS provider.  Is that really a smart approach, given that there’s a highly active and competitive public cloud business already selling solutions to those customers?  Especially when the majority of the profit-per-bit problems operators have comes from all-you-can-eat consumer services?

The resolution, says a Vodafone quote and other comments in the TMF paper, is that “what we now call ‘services’ won’t involve the telco alone, but will comprise a range of partners, including the telcos.”  Thus, the telcos aren’t really digital service providers at all, but digital service integrators or resellers.  Can a business that’s looking at transformation as a means of escaping profit squeeze afford to be a reseller of another player’s services?

It seems to me that the TMF vision is really not aiming beyond the OSS/BSS at all, but rather is suggesting that operations and services evolve to something above connectivity by partnering with those who supply services up there.  That could be defeating the whole purpose of “digital transformation”, locking the telcos not only into their current level of disintermediation and commoditization, but also into a whole new level.

Both papers seem to suggest that OSS/BSS transformation is essential, and at least imply that an event-driven approach is the answer.  That’s actually a good idea, but it misses the challenge of “how?”  In order to be event-driven, a system has to recognize both the concept of events (obviously) and the concept of “state” or context.  Anyone who’s ever looked at a protocol handler knows that the same message, in different contexts/states, has to be processed differently.  For example, getting a “data packet” message in the “orderable” state for a service is clearly an error, while it’s fine in the “data transfer” state.  For there to be states and events and related processes, you need a state/event table.
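
A toy version of such a state/event table, using the “data packet” example above, might look like this; the states, events, and process names are illustrative only, not drawn from any OSS/BSS product.

```python
# The same event ("data_packet") maps to different processes in different
# states: an error while the service is merely orderable, normal handling
# once the service is in data transfer.
STATE_EVENT_TABLE = {
    ("orderable",     "activate_order"):  "start_provisioning",
    ("orderable",     "data_packet"):     "raise_error",
    ("provisioning",  "resources_ready"): "enter_data_transfer",
    ("data_transfer", "data_packet"):     "forward_packet",
    ("data_transfer", "fault"):           "start_recovery",
}

def dispatch(state: str, event: str) -> str:
    """Steer an event to a process based on the current state."""
    return STATE_EVENT_TABLE.get((state, event), "raise_error")

print(dispatch("orderable", "data_packet"))      # raise_error
print(dispatch("data_transfer", "data_packet"))  # forward_packet
```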

State/event tables are descriptions of the collective contexts of a cooperative system, and thinking about building them is useful in that it forces software architects to consider all the possibilities, instead of having something happen that falls through the cracks.  However, there’s a potential conflict between the value of state/event tables and the number of possible states and events.  If you look at a complex network as one enormous, flat, system, you’d have way too big a table to ever implement.  The TMF and my own ExperiaSphere work dealt with this by dividing complex systems into “intent models” that each had their own state/event relationships.  Hierarchical composition, in short.  That’s what NGOSS Contract described.

The point here is that both papers miss what should be their strongest point, which is that data-model-driven steering of events to processes via component-aligned state/event tables is the way to create both event-driven behavior and microservice-compatible design.  If a service data model drives an event to a process, the process can get the information it needs from the service data model alone, which means it’s stateless and can be deployed as a microservice or even in serverless form.
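
Here’s a minimal sketch of that idea, assuming a hierarchy of intent-modeled elements that each carry their own state and state/event table.  This is my own illustration of the NGOSS-Contract principle, not TMF code or APIs.

```python
# Each element of the service data model carries its own state and its own
# state/event table.  The handler receives the model element and the event,
# pulls everything it needs from the model, and keeps no state of its own,
# so it could be deployed as a microservice or a serverless function.
service_model = {
    "vpn-service": {
        "state": "active",
        "table": {("active", "fault"): "reroute",
                  ("active", "change_order"): "modify_service"},
        "children": ["access-leg-1", "core-leg"],
    },
    "access-leg-1": {
        "state": "degraded",
        "table": {("degraded", "fault"): "escalate_to_parent"},
        "children": [],
    },
}

def handle_event(model: dict, element: str, event: str) -> str:
    node = model[element]
    process = node["table"].get((node["state"], event), "log_and_ignore")
    # In a real system the named process would be invoked here, receiving
    # only the model element as its context.
    return process

print(handle_event(service_model, "access-leg-1", "fault"))  # escalate_to_parent
```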

If you pull the NGOSS-Contract approach out of the OSS/BSS modernization story, you’re left with the thing that has plagued the whole notion of OSS/BSS modernization from the first—platitudes.  We can talk about bottom-up and top-down as long as we’re focusing on project methodologies, but a project methodology drives a project, it doesn’t write software.  A software architecture should emerge from the methodology.  That’s a separate element, a result of the right process, but it’s not the automatic consequence of waving a wand over a bunch of data and chanting “top-down, top-down!”

That sums up my problem with the Tata paper.  Project methodologies in IT and networking lead to application or service architectures, which then frame the requirements for the components of the solution and the way they’re integrated and managed.  The project is not the output, it’s the path to the output.  The problem with the Tata paper is that it’s yet another description of a project methodology (a good one, but not a transformative one), at a time when we’re long past looking for a path to OSS modernization and instead are looking for specific products, or at least architectures.  The TMF seems to be heading to the same place by a different path—transform by partnership with your former enemies.

The Tata paper does, in the section called “The Role of Industry Standards”, call out an important problem, one so important it might actually be the barrier to progress toward the OSS modernization goal.  The paper cites the TMF and ONF models for top-down design, but throughout the paper it’s clear that the “modernized” OSS/BSS has to be more tightly integrated with the rest of the network and service ecosystem.  We have standards for every possible piece of every possible network strategy, and in some cases the standards even compete.  We recently heard applause for the unification of two different operations API specifications, for example.  We should be asking how we came to have them in the first place.

The TMF paper seems to not only accept this fragmentation of the future, but depend on it.  Cede the mechanization of stuff beyond OSS/BSS, and focus on harnessing that “beyond” stuff to create services as a pseudo-integrator.  OK, that may not be an unreasonable view for the TMF (an OSS/BSS-dominated body) to take, but it’s a formula for staying disorganized while facing what almost has to be a unified initiative—transformation.

It’s my view that the TMF was the logical body to modernize the OSS/BSS, and that the TMF did (with NGOSS Contract) devise the central paradigm of event steering to processes via a data model, which is critical to this modernization.  Everything else the papers describe, every API anyone is developing or harmonizing, every standards activity aimed at any aspect of operations and management, should be fit into that NGOSS Contract framework.  If that were to be done, the result would be exactly what “NGOSS” stands for.

The model of the TMF NGOSS Contract is just as valuable, or even more valuable, if you step into the network domain.  A true “contract” state/event process could manage everything about the service lifecycle, including the network piece.  It follows that a network-centric solution could easily be extended into the service, the OSS/BSS, domain.  The universality of the approach is good for the industry, because service lifecycle automation should be universal to be useful.

It should also be based on state-of-the-art cloud-think.  Both papers seem to agree with that, and yet both skirt the question of how to bring that about.  If you’re planning to use current tools to accomplish something, you have to frame your approach in terms of those tools.  You can’t accept the notion that you can write specifications for everything, or simply translate goals at the top to arbitrary features at the bottom.  That’s especially likely to bite you given that the standards processes take years to come to a conclusion.  We’re deploying 5G today and the standards aren’t finished, and likely won’t be until 2022.  I wonder if there’s time for that stuff, given that operators are already facing falling infrastructure ROIs that are nearing the critical point.

NGOSS Contract has been around for about 13 years now, and the TMF once told me that it had gained very limited traction.  It doesn’t seem to be played up in the current TMF material, though as I’ve said, I don’t have access to the members-only stuff.  The question, then, is whether the TMF is prepared to promote its own (dazzling and unique) insight, first within the narrow OSS/BSS domain and then in the broader lifecycle automation mission.  If it does, the TMF takes its rightful role in NGOSS evolution and defines the basis for service lifecycle automation overall.  If it doesn’t, then it will be up to some other standards body or open-source group to pick up the torch, and the TMF will then have to fight for relevance in its own space.

Did Juniper Pay too Much for 128 Technology?

Did Juniper pay too much for 128 Technology?  That’s a question that’s been raised by some financial analysts, by some who hold Juniper stock, and by at least one industry pundit, HERE.  It’s not unusual for some to question the value of an acquisition like this, particularly in an industry that’s not been a dazzling success at exploiting the M&A they’ve done.  It’s a good thing, in fact, because the best way to prevent failure is to understand that it’s possible.

Since I blogged that the Juniper deal for 128T was a smart one, it’s important that I have a response to criticisms of the deal, not only to justify my view but also to guide the two companies toward the symbiosis that could make the deal a great one for all involved.  I’ll do that by looking at the article and dealing with its comments.

To me, the article questions three points offered to justify the deal, with the implication that the justifications aren’t compelling.  Let’s look at them, then, first as a list, and then in detail.  The first is that the 128T deal is part of Juniper’s desire to be more software-centric.  Second, it’s a way of exploiting the SD-WAN market, which has a very high growth rate.  Finally, it’s because 128 Technology’s smaller compute footprint makes it an ideal SD-WAN solution.

If anyone out there doubts that networking is getting to be more about software than hardware, they’ve been living in a time warp.  AT&T made a public decision to move to white-box-and-software networking a couple years ago.  Even the 3GPP standards, never ones to be on the leading edge of technology shifts, embraced a virtual implementation.  OpenRAN is software-centric.  It would be truly frightening if Juniper didn’t think it needed to think more about software, but the fact is that they’ve been doing that for some time, and so has arch-rival Cisco.  I don’t think that they’d use the “we-are-software” argument to justify a deal like this, but let me get back to this point.

OK how about the SD-WAN market, which the piece says is growing at a CAGR of 34%.  Could Juniper be jumping onto the 128T bandwagon to get a piece of the action?  Actually, Juniper had an SD-WAN solution already, but it certainly wasn’t something that was sweeping the market.  The biggest consideration, though, isn’t as much to grab a share of the market, but to track where the market is going.

SD-WAN is a terrible name for a market that’s actually evolving to be something very much more.  My own regard for 128 Technology was based on its potential for that “very much more”, which is (and was) “virtual networking”.  The majority of SD-WAN products and services don’t do anything more than connect small sites to a company VPN, or perhaps offer some cloud networking.  That’s shooting way behind the market-evolution duck at this point.

Networking has never been about sites, it’s always been about people.  Companies pay to make workers productive, and because workers are concentrated in workplaces, we networked workplaces.  The thing is, the goal was never to make networks about sites or traffic, but to make networks about facilitating relationships, relationships between people, between people and applications, and even between application components.  If you read what I’ve said about 128 Technology, that’s what they do, and what they’ve done from the first.  That’s their strategic story.

Which leaves us the last point.  The article quotes Andy Ory, co-founder and CEO of 128 Technology, saying “What separates us from every other SD-WAN player in the world is that we have a much, much smaller computational footprint and we leverage the underlay. We don’t instantiate really costly, computationally expensive tunnels.”  If this is their true differentiator, it means that 128T offers better plumbing, which might not be a great basis for a deal with Juniper.

It isn’t the true differentiator, in my view.  From the very first, well over two years ago, I pushed back on even 128 Technology’s own quotes and stories about “lower overhead” and “no-tunnel” differentiation.  The reason is that to take SD-WAN into virtual networking, those aren’t the right drivers.  The key to 128 Technology’s value is session awareness, as I’ve said from the very first.  There is value to lower tunnel header overhead, and to lower compute overhead in terminating tunnels, but the value is derived because of the limitations that high overhead could impose on session awareness.  Without session awareness, those other points are like arguing over plastic or copper pipe in the wastewater plumbing mission.

A session is a relationship, a communications path that carries some kind of value.  For enterprise applications, sessions represent information flows that support (in some way) worker productivity.  They are why we have applications, why we build networks, because without them we can’t deliver information to information consumers and enhance their productivity, their value.  Sessions are, in the OSI model, a “higher layer”, and in the OSI model, we often forget that the lower layers (including Level 2 and 3 where all the attention in networking is focused) are just plumbing to carry sessions.

Every application access is a session.  Every video you watch is a session, and so are your Zoom conferences or VoIP calls.  Given that sessions are the foundation of network value, you could almost be justified in saying that something that was aware of them, and could handle packets based on that awareness, had to be better than something that wasn’t, and couldn’t.  You don’t have to, though, because there is solid value to both session awareness and session-based handling.  If Mist AI is about optimizing, then what better goal to optimize toward than sustaining the combined value of sessions?
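
To make “session-based handling” a little more concrete, here’s a generic sketch of the idea: classify a packet’s flow to a session, then handle it according to the session’s business value.  This is purely illustrative and is not a description of 128 Technology’s actual implementation; the addresses, ports, and policy entries are assumptions.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str

# Hypothetical session policy keyed by (src, dst, dport, proto).
SESSION_POLICY = {
    ("10.0.0.5", "10.1.0.9", 443, "tcp"):   {"app": "order-entry", "value": "high"},
    ("10.0.0.7", "10.1.0.20", 5001, "udp"): {"app": "video-cache-fill", "value": "low"},
}

def classify(flow: Flow) -> dict:
    """Map a flow to the session (relationship) it belongs to."""
    return SESSION_POLICY.get((flow.src, flow.dst, flow.dport, flow.proto),
                              {"app": "unknown", "value": "best-effort"})

def forward(flow: Flow) -> str:
    """Handling follows the session's business value, not just traffic class."""
    session = classify(flow)
    return "low-latency-path" if session["value"] == "high" else "default-path"

print(forward(Flow("10.0.0.5", "10.1.0.9", 51514, 443, "tcp")))  # low-latency-path
```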

The quote that sums up the piece is this: “Raymond James analyst Simon Leopold reflected my thinking, saying the deal was expensive but that Juniper pretty much had to do something to stoke growth in its software story.”  I can’t understand that view at all.  Most enterprises probably wouldn’t even classify SD-WAN as “software” since they’d get it delivered on a device.  In any event, there is no network company on the planet who has a “software story” in a cohesive sense, because there’s no cohesion to network software.  Finally, if all you want is software, simply adopt some open-source tool and sing like a bird in front of the media.  The whole “software” story for network companies was pushed by media coverage, not by the network companies themselves.  They’re just going with the flow while they work feverishly behind the scenes.

Toward what goal?  What network companies are trying to do is to reduce their dependency on proprietary appliances, things like switches and routers.  They’re doing that because switches and routers are the target of buyer initiatives to reduce cost and vendor lock-in, in a market that seems to buyers to be demanding higher prices when there’s no compensatory higher value obtained.  White-box switches and open-source software are eating what we could call “vanilla” routing and switching, because everything with a static feature set gets commoditized.  There is not now, and there never has been, any escape from that.

That means what you need is a dynamic feature set.  Dynamism, though, has to mean changing in the right direction, not just blowing hype waves out of your PR department.  If there is a right direction in networking today, it’s maximizing the application of resources, and the response to problems, based on the revenue associated with what’s being impacted.  That’s just another way of saying “session awareness”.  If you know what relationships a session is supporting, and if you know what sessions make up a particular assembly of “traffic” that needs to be handled, you know the value of the traffic and how to trade resources against it.  Juniper’s Mist AI platform is all about using AI to manage networking.  Wouldn’t adding session-awareness to that be a big step forward for Mist AI, and for Juniper?

The last sentence of the piece is “Long-suffering Juniper investors may now have to look out to 2022 to see the elusive boost in earnings and growth.”  If the article is correct, and if Juniper is doing the 128 Technology deal to get a “software story”, they could be right.  I’m hoping, and in fact I’m predicting, that the story is really about session awareness.  I’m also hoping that Juniper is as insightful about the potential of 128T’s stuff as they could be, need to be.  That they might not be is the biggest risk to Juniper’s investors, not the deal.

There’s an alternate story here, another possible ending: “long-suffering Juniper investors” may now have a credible path to the differentiation the securities analysts have been looking for.  One that could lead them to a boost in earnings, growth, and share prices.  A path they’re never going to have if they keep hunkering down to defend the box business or mouthing silly platitudes.  I’m not saying that this is a slam dunk; Juniper doesn’t have a good record on M&A, and because there are financial analyst mutterings about the price, it’s incumbent on Juniper to deliver some clear value here.  But this is the most promising thing Juniper has done so far, period.

Vendors and Operators: Changing Dynamic?

Should prey help predators by cooperating with them in the hunt?  A Light Reading story on the Broadband World Forum conference highlights the view that operators want the telecom equipment vendors to “get more involved in open source projects instead of sitting on the sidelines.”  Sorry, but that sure sounds to me like proposing a cheetah ask a gazelle to slow down a bit.  Operators need to accept the basic truth that open-source tools and other transformation projects that cut operator costs are also cutting vendor revenue.

The problem here, according to operators who’ve opened up to me, is that the operators don’t know what else to do.  They’ve proved, in countless initiatives over the last fifteen years, that they have no skill in driving open-source software, or even driving standards initiatives that are based on software rather than on traditional boxes.  I’ve criticized the operators for their failure to contribute effectively to their own success, but maybe it’s time to recognize that they can’t help it.

Back in my early days as a software architect, I was working with a very vocal line department manager who believed that his organization had to take control of their destiny.  The problem was that he had nobody in the organization who knew anything about automating his operation.  His solution was to get a headhunter to line up some interviews, and he hired his “best candidate.”  When he came to me happily to say “I just hired the best methods analyst out there,” I asked (innocently) “How did you recognize him?”

If you don’t have the skill, how can you recognize who does?  That’s the basic dilemma of the operators.  In technology terms, the world of network transformation is made up of two groups.  The first, the router-heads, think of the world in terms of IP networks built from purpose-built appliances.  The second, the cloud-heads, think of the world in terms of virtualized elements and resource pools.  Operators’ technologists fall into the first camp, and if you believe that software instances of network features are the way of the future, you need people from the second.  Who recognizes them in an interview?

It’s worse than that.  I’ve had a dozen cloud types contact me in 2020 after having left jobs at network operators, and their common complaint was that they have nowhere to go, no career path.  There was no career path for them, because the operators tended to think of their own transformation roles as being transient at best.  We need to get this new cloud stuff in, identify vendors or professional services firms who will support it, and then…well, you can always find another job with your skill set.

It’s not that operators don’t use IT, and in some cases even virtualization and the cloud.  The problem is that IT skills in most operators are confined to the CIO, the executive lead in charge of OSS/BSS.  That application is a world away, in technology terms, from what’s needed to virtualize network features and support transformation.  In fact, almost three-quarters of operators tell me that their own organizational structure, which separates OSS/BSS, network operations, and technology testing and evaluation, is a “significant barrier” to transformation.

About a decade ago, operators seemed to suddenly realize they had a problem here, and they created executive-level committees that were supposed to align their various departments with common transformation goals.  The concept died out because, according to the operators, they didn’t have the right people to staff the committees.

How about professional services, then?  Why would operators not be able to hire somebody to get them into the future without inordinate risk?  Some operators are trying that now, and some have already tried it and abandoned the approach.  Part of the problem comes back to those open-source projects.  Professional services firms are reluctant to spend millions participating in open initiatives that benefit their competitors as much as themselves.  Particularly when, if they’re successful, their efforts create products that operators can then adopt on their own.

Why can’t cloud types and box types collaborate?  They come at the problem, at least in today’s market, from opposite directions.  The box types, as representatives of the operators and buyers, want to describe what they need, which is logical.  That description is set within their own frame of reference, so they draw “functional diagrams” that look like old-line monolithic applications.  The cloud people, seeing these, convert them into “virtual-box” networks, and so all the innovation is lost.

Here’s a fundamental truth:  In networking, it will never be cheaper to virtualize boxes and host the results, than it would be to simply use commodity boxes.  The data plane of a network can’t roam around the cloud at will, it needs to be where the trunks terminate or the users connect.  We wasted our time with NFV because it could never have succeeded as long as it was focused only on network functions as we know them.  The problem is that the network people, the box people, aren’t comfortable with services that live above their familiar connection-level network.

This “mission rejection” may be the biggest problem in helping operators guide their own transformation.  You can’t say “Get me from New York City to LA, but don’t cross state lines” and hope to find a route.  Operators are asking for a cost-limiting strategy because they’re rejecting everything else, and that’s what puts them in conflict with vendors.

The notion of a separate control plane, and what I’ve called a “supercontrol-plane” might offer some relief from the technical issues that divide box and cloud people.  With this approach, the data path stays comfortably seated in boxes (white boxes) and the control plane is lifted up and augmented to offer new service features and opportunities.  But here again, can you create cloud/box collaboration on this, when the operators seem to want to sit back and wait for a solution to be presented?  When the vendors who can afford to participate in activities designed to “transform” are being asked to work against their own bottom lines, because the transformation will lower operator spending on their equipment?

New players at the table seem a logical solution.  Don Clarke, representing work from the Telecom Ecosystems Group, sent me a manifesto on creating a “code of conduct” aimed at injecting innovation into what we have to say is a stagnating mess.  I suggested another element to it, the introduction of some form of support and encouragement for innovators to participate from the start in industry initiatives designed to support transformation.  I’ve attended quite a few industry group meetings, and if you look out at the audience, you find a bunch of eager but unprepared operator types and a bunch of vendors bent on disrupting changes that would hurt their own interests.  We need other faces, because if we don’t start transformation initiatives right, there’s zero chance of correcting them later.

This is one reason why Kevin Dillon and I have formed a LinkedIn group called “The Many Dimensions of Transformation”, which we’ll be using to act as a discussion forum.  Kevin and I are also doing a series of podcasts on transformation, and we’ll invite network operators to participate in them where they’re willing to do so and where it’s congruent with their companies’ policies.  A LinkedIn group doesn’t pose a high cost or effort barrier to participation.  It can serve as an on-ramp to other initiatives, including open-source groups and standards groups.  It can also offer a way of raising literacy on important issues and possible solutions.  A community equipped to support insightful exchanges on a topic is the best way to spark innovation on that topic.

Will this help with vendor cooperation?  It’s too early to say.  We’ve made it clear that the LinkedIn group we’re starting will not accept posts that promote a specific product or service.  It’s a forum for exchanges in concepts, not a facilitator of commerce.  It’s also possible for moderators to stop disruptive behavior in a forum, whereas it’s more difficult to do that in a standards group or open-source community, where vendor sponsorship often gives the vendors an overwhelming majority of participants.

The vendor participation and support issues are important, not only because (let’s face it) vendors have a history of manipulating organizations and standards, but because vendors are probably critical in realizing any benefits from new network technology, or any technology, in the near term.  Users don’t build, they use.  Operators are, in a technology/product sense, users, and they’ve worked for decades within the constraint of building networks from available products, from vendor products.

I’d love to see the “Code of Conduct” initiative bear fruit, because it could expand the community of vendors.  I’d also love to see some initiative, somewhere, focus on the ecosystem-building that’s going to be essential if we build the future network from the contribution of a lot of smaller players.  You can’t expect to assemble a car if all you’re given is a pile of parts.

Would Satellite Broadband Work Better for IoT than 5G?

Should we be thinking about a Satellite Internet of Things?  The emerging battle between Elon Musk and Jeff Bezos for low-earth-orbit satellite broadband raises the question, not because satellite broadband is a universal option, but because it’s an option where other options don’t exist, and likely won’t for decades.  That could be a killer IoT opportunity.

Satellite broadband isn’t the Holy Grail of broadband in general.  In nearly every case where terrestrial options are available to consumers, they’d be better off taking them.  Furthermore, my model says that 5G/FTTN millimeter-wave and even 5G mobile technology offer a better general solution to consumer broadband problems caused by low demand density.  But IoT is different, or at least the “real” new-connection IoT opportunity is.

Where IoT is within a facility, it will almost always be cheaper to use traditional short-range wireless technology, or even sensor wiring, to connect it.  Where the IoT elements are widely spaced and, in particular, when they’re actually mobile, you need some form of wide-area solution.  The operators’ hopes for massive 5G revenue were (and in some cases, still are) based on the vain hope that companies will pay monthly 5G bills to connect what WiFi could connect already.  The real question is those things that WiFi can’t connect.

5G faces a problem that goes back as far as (gasp!) ISDN.  CIMI Corporation signed the Cooperative Research and Development Agreement with the National Institute of Standards and Technology, and so I had an opportunity to play in the early justification for ISDN.  I remember well one particular occasion when a technical representative from a big vendor came in and said, with breathless excitement, “We’ve discovered a new application for ISDN!  We call it ‘File Transfer’!”  Well, gosh, we were already doing that.  The point is that there’s always been a tendency to justify a new technology by showing what it can do, rather than what it can do better.

We can do any sort of IoT connectivity with 5G, and nobody questions that.  However, we already have IoT in millions of residential applications, applications which demand a very cost-effective and simple-to-deploy solution, and none of it uses 5G.  Rather than looking at how 5G could help where no help is needed, why not look at what current technology doesn’t do well?

The most demanding of all IoT applications involve sensors and controllers that are mobile.  5G is a mobile technology, so why wouldn’t we see it as a possible fit?  I had an opportunity to engage a big Tier One about their 5G developer program, and I mentioned this issue to one of the program heads.  The response was “Yes, that’s a great application, but there aren’t enough of those mobile sensors to create a really good market for us.”  In other words, start with the number you want and then make up stories that appear to get you there.

The problem with 5G sensors is that if they consume mobile service, they incur a monthly cost.  A small percentage of home security systems have mobile-radio connectivity to the alarm center, and that connectivity adds around two hundred dollars per year (or more) to the cost of monitoring, plus the cost of the system itself.  Imagine a vast sensor network, with each sensor racking up a nice fat cellular service bill, and you see why my Tier One program head thought it would be exciting.  It would be, for those getting the money.  For those spending it, not so much.

The interesting thing was that the day after I had this conversation with the operator, I was watching one of my wildlife shows, and it was about elephant tracking.  They had a satellite GPS system attached to an elephant, and it provided a means of getting regular position updates on the animal.  No 5G (or even 4G) infrastructure needed.  Why not have satellite IoT?  I’m not suggesting that the exact same technology be used, but that satellite Internet could in fact support most of the remote-site and mobile IoT applications that have been proposed for 5G.

The nice thing about satellite is that almost all the cost is in getting the bird into orbit.  Once you’ve done that, you can support users up to the design capacity at no incremental service cost.  No matter where the sensors are installed, no matter how they might roam about, you can keep in touch with them.  Since sensor telemetry is low-bandwidth, you could support a heck of a lot of sensors with one of those satellite systems.
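To put “a heck of a lot” into rough numbers, here’s a minimal back-of-envelope sketch in Python.  Every figure in it is an assumption I’ve chosen purely for illustration (message size, reporting interval, channel capacity, overhead), not a measurement of any real satellite system.

```python
# Back-of-envelope sketch with assumed numbers: how many low-rate telemetry
# sensors could share one satellite channel dedicated to IoT traffic?

MSG_BYTES = 200           # assumed size of one telemetry report, headers included
REPORT_INTERVAL_S = 300   # assumed reporting interval: one message every 5 minutes
CHANNEL_BPS = 50_000_000  # assumed telemetry capacity carved out of the bird: 50 Mbps
OVERHEAD_FACTOR = 0.5     # assume half the raw capacity is lost to access protocol,
                          # retransmission, and contention

avg_sensor_bps = (MSG_BYTES * 8) / REPORT_INTERVAL_S   # ~5.3 bps per sensor
usable_bps = CHANNEL_BPS * OVERHEAD_FACTOR
sensors_supported = usable_bps / avg_sensor_bps

print(f"Average per-sensor load: {avg_sensor_bps:.2f} bps")
print(f"Sensors per channel:     {sensors_supported:,.0f}")
```

With these assumptions, one 50 Mbps channel carries several million such sensors, and even if you degrade the assumptions by another order of magnitude you’re still in the hundreds of thousands.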

Satellite-based IoT would be a great solution for the transportation industry.  Put a “goods collar” on a shipment, on every vehicle that carries goods, and on every facility that handles and cross-loads the goods, and you could track something everywhere it goes in near real time.  Wonder if your freezer car is malfunctioning and letting all your expensive lobster or tuna spoil?  You can know about the first sign of a problem and get something out to intercept and fix or replace the broken vehicle.  Vandalism?  Covered.  Satellite applications of IoT for transportation, or for anything that’s mobile, could be killer apps for those competing satellite networks.

So, probably, could many fixed-installation applications that people are also claiming as 5G opportunities.  Sensors and controllers in an in-building IoT system can be connected through local wiring or a half-dozen different industrial control RF protocols, including WiFi and WiFi 6.  Stuff that’s somewhere out there in the wild isn’t so easily connected, but it would be child’s play for satellite IoT.  Even power could be less of an issue.  Elephants are big, but they can’t carry a substation on their collars, so these satellite trackers already keep their power requirements modest enough to run for a long period on a battery.

Battery?  Who knows batteries better than Musk?  Amazon already builds IoT sensor/controller devices in the Ring line.  It seems to me like these guys are missing an opportunity by not pushing IoT applications for their dueling satellite data networks.

Or are they?  I hear some whispers that there are in fact a number of initiatives either being quietly supported by one of the contenders for satellite-data supremacy, or being watched closely by one.  Some are attracting the attention of both.  We may see some action in this space quickly, and if we do, not only will it be a powerful validation of satellite broadband, it could force some realism into claims of 5G applications.  After all, we’ve done a heck of a lot of file transfer, and it didn’t help ISDN.

Juniper Gets the Best SD-WAN, and Combined with Mist AI, it Could Take Off

Juniper Networks has announced its intention to acquire 128 Technology, the company I’ve always said was the drop-dead leader in SD-WAN and virtual network technology.  128T will apparently be integrated with Juniper’s Mist AI at some point, and the combination of the technologies opens up a whole series of options for service creation and automation, not only for enterprises, but also for service providers and managed-service providers.

The press release from Juniper isn’t the best reference for the deal; the blog entry it references frames the combination of 128 Technology and Mist AI far more effectively.  The fusion has both Juniper-specific implications and industry implications, at both the tactical and strategic levels.  I want to focus mostly on the industry-strategic stuff here, with a nod here and there to the other dimensions.

We are in a virtual age.  Enterprises run applications on virtual hosts and connect to them with virtual private networks.  The cloud, arguably the most transformational thing about information technology in our time, is all about virtualization.  I doubt if there are any enterprises today who believe virtualization is unimportant; certainly, none I’ve talked with hold that view.  And yet….

….and yet we don’t seem to have a really good handle on what living in a virtual world really means.  We run on stuff, connect with stuff, that’s not really there.  How do we manage it?  How do you “optimize” or “repair” something that’s simply your current realization of an abstraction?  Could it be that until we understand the virtual world, everything we do with the cloud, the network, and the data center is like riding a bike on training wheels?

One of my old friends in IT admitted the other day that “virtualization gives me a headache.”  It probably does that to a lot of people, both the older ones who have now risen to senior roles and the newcomers who are inclined to see virtualization as sticking post-it notes on infrastructure and then trying to run on them.  So here’s an interesting question: if humans have a problem coming to terms with the virtual world, why not give it over to AI?

When Juniper acquired Mist Systems, it seemed from the release that Juniper was doing it to get an AI-powered WiFi platform.  Whether that would have been a good move is an open question, but Juniper evolved the Mist relationship to the “Mist AI” platform as a tool to create and optimize connectivity over all of Juniper’s platforms, in the LAN and WAN.

In order to do the stuff that Juniper/Mist promises, you need three things.  First and foremost, because they focus so much on optimization, you need objectives.  Otherwise, how do you know what’s “optimum?”  The second thing you need is constraints, because the best answer may not be one of the choices.  Finally, you need conditions, which represent the baseline from which you’re trying to achieve your objectives.  AI is actually a potentially great way to digest these three things and come up with solutions.
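Here’s a minimal sketch of those three inputs in code, using hypothetical path data and a simple lowest-latency objective.  It’s my illustration of the general idea, not a description of how Mist AI actually works.

```python
# Conditions: the measured state of candidate paths (hypothetical numbers).
paths = {
    "mpls":      {"latency_ms": 18, "free_mbps": 40,  "loss_pct": 0.01},
    "broadband": {"latency_ms": 35, "free_mbps": 200, "loss_pct": 0.50},
    "lte":       {"latency_ms": 60, "free_mbps": 25,  "loss_pct": 1.20},
}

# Constraints: what any acceptable answer must satisfy.
def meets_constraints(p, needed_mbps=10, max_loss_pct=1.0):
    return p["free_mbps"] >= needed_mbps and p["loss_pct"] <= max_loss_pct

# Objective: among acceptable answers, what "optimum" means (lowest latency here).
def objective(p):
    return p["latency_ms"]

candidates = {name: p for name, p in paths.items() if meets_constraints(p)}
best = min(candidates, key=lambda n: objective(candidates[n])) if candidates else None
# If nothing passes the constraints, the "best answer" isn't one of the choices.
print(best or "no path satisfies the constraints")
```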

Mist doesn’t describe their approach to the application of AI to broad network optimization, but I think it surely involves addressing these three things.  Thus, the big question on the 128T deal, from Juniper’s side, is how the deal could help with Mist’s objectives.  There are, I think, two ways.

The first thing 128T adds to Mist AI is session awareness.  Those of you who have followed my coverage of 128 Technology know that this has always been, in my view, their secret sauce.  Yes, they can eliminate tunnel overhead, but what makes them different is that they know about user-to-application relationships.  The driver of enterprise IT and network investment is, and always has been, productivity enhancement.  Workers can’t be made productive by an application, technology, or network service that doesn’t know what they’re doing.  Except, perhaps, by accident, and accidental gains are a pretty lame story to take to a CFO.  128 Technology is based on recognizing session relationships, so it knows who’s trying to do what, and that knowledge is essential in any AI framework that wants to claim to “optimize”.

The second thing 128 Technology adds to the Mist AI story is the ability to act, because knowing about something you can’t impact is an intellectual excursion, not a business strategy.  Traditional networking, including traditional SD-WAN, is all about connecting sites.  There are a lot of users in any given site, doing a lot of stuff, and much of it is more likely to be entertaining them or getting them dinner reservations than enhancing company sales and revenues.  The relationships between workers and applications are what empower them (or entertain them), so you need to be able to control those relationships to make workers more productive.  It’s not enough to know that a critical Zoom conference isn’t working because of bandwidth issues.  You need to be able to fix it, and 128 Technology can prioritize application traffic and provide preferential routing for it, based on the specific application-to-user relationships.
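A minimal sketch of what session awareness makes possible, under my own simplified assumptions: the roles, applications, and policy table are hypothetical, and 128 Technology’s actual classification and routing logic isn’t public in this form.

```python
from dataclasses import dataclass

@dataclass
class Session:
    user_role: str      # e.g. "order-entry", "engineering", "guest"
    application: str    # e.g. "erp", "zoom", "streaming-video"

# Assumed business policy: the user-to-application relationship, not the
# site-to-site pipe, decides how traffic is treated.
POLICY = {
    ("order-entry", "erp"):             ("priority",    "low-latency-path"),
    ("engineering", "zoom"):            ("priority",    "low-latency-path"),
    ("guest",       "streaming-video"): ("best-effort", "cheapest-path"),
}

def treat(session: Session):
    # Unknown relationships default to best-effort rather than being dropped.
    return POLICY.get((session.user_role, session.application),
                      ("best-effort", "default-path"))

print(treat(Session("order-entry", "erp")))        # ('priority', 'low-latency-path')
print(treat(Session("guest", "streaming-video")))  # ('best-effort', 'cheapest-path')
```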

Sum this up, then.  Combining Mist AI and 128T’s session awareness can first extend Mist AI’s awareness of network relationships down to the user/application level, a level where productivity tuning is critical.  Most companies would likely prioritize workers’ interactions with applications based on their importance to company revenue generation or the unit value of labor.  128T can gather data at that level, and feed the AI vision of where-we-are relative to where-we-should-be with the best and most relevant information.  Once that information has been AI-digested, the results can be applied in such a way as to maximize the network’s commitment to business benefits.  What more can you ask?

Well, we could at least wonder where Juniper might take all of this.  If we presume that Mist AI and 128 Technology combine to support those three requirements of optimization, we could ask whether the combination creates the effect of a higher, control-plane-like, element.  Do AI and collected data combine to establish real understanding of the network below, understanding that could be molded into new services?  Could session-awareness, the key attribute of 128 Technology’s product, be used to map data flows over arbitrary infrastructure?  Since I’ve always said that 128 Technology was as much a virtual network solution, an application network solution, as an SD-WAN, could Juniper use it to augment Mist and create a Network-as-a-Service model?

Both Cisco and Juniper have unbundled their hardware and software, making it theoretically possible that they could offer hardware as a kind of “gray box” and software as a generalized routing engine.  Could Mist AI and 128 Technology provide them a way of enhancing their value in this unbundled form, and accommodating white-box and even SDN within a Juniper-built network?  Cisco has nothing comparable, which wouldn’t break hearts at Juniper.  There’s a lot of potential here, but without details on both how Mist AI works and where Juniper plans to take 128 Technology, we can’t do more than guess whether it will be realized.

Ponder, though, the title of Juniper’s blog (referenced above): “The WAN is Now in Session.”  Nice marketing, and perhaps an introduction to something far more.

My Response to the Code of Conduct Framework

I was very pleased and interested when Don Clarke, an old friend from the days of NFV, posted a link on a “code of conduct” to “boost innovation and increase vendor diversity”.  He asked me to comment on the paper, and I’m going to give it the consideration it deserves by posting a blog on LinkedIn as a comment, then following the thread to respond to any remarks others might make.

One of the biggest barriers to a real transformation of network operator business models and infrastructure is the inherent tension between incumbency and innovation.  The vendors who are most entrenched in a network are the ones least likely to see a benefit from radical change, and thus the least likely to innovate.  The vendors who are likely to innovate are probably small players with little or no current exposure in operator networks.  For transformation to occur, we either have to make big vendors innovate at their own business risk (fat chance!) or we have to somehow help smaller players engage with operators.  The barriers to that second and only viable option are profound, for four reasons.

First, startups that aim at the infrastructure space and target network operators are far from the wheelhouse of most VCs, so it’s difficult to even get started on such a mission.  There was a time perhaps 15 years ago when VCs did a blitz on the space, and nearly all the startups founded in that period failed to pay back as hoped.  The investment needed to enter the space is large, the time period needed for operators to respond is long, and the pace of deployment, even after a favorable decision, means payback might take years.  All this is exacerbated by the cost and complexity of creating and sustaining engagement with operators, which is the focus of the rest of the barriers below.

Second, transformation projects, because of their scope of impact, require senior executive engagement, which smaller firms often can neither establish nor sustain.  A big router vendor can probably call on the CTO or CIO of most operators with little trouble, and in many cases get a CFO or CEO meeting.  A startup?  Even if somehow the startup gets in the door, the big vendors have a team of experts riding herd on the big operators.  That kind of on-site sales attention is simply not possible for smaller companies.

Third, transformation initiatives by operators usually involve standards processes, international forums, open-source projects, and other technical activities.  Participation in any one of these is almost a full-time job for a high-level technical specialist.  Big vendors staff these activities, often with multiple people, and thus engage with the operator personnel who are involved in transformation.  Small companies simply cannot donate the time needed, much less pay membership fees in the organizations that require them or cover the necessary travel.

Finally, operator procurement of products and services imposes conditions easily (and regularly) met by major vendors, but well beyond those normally imposed on startups by the smaller prospects they regularly call on.  As a result, simply complying with the song-and-dance needed to get an engagement may be a major investment of resources.  Some operators require financial disclosures that private companies might be unwilling to make, or insurance policies expensive enough to be a drain on resources.

It’s my view that if the Code of Conduct proposes to impact these areas, it has a chance of doing what it proposes to do, which I agree is very important.  Let’s look then at the paper in each area.

In the area of VC credibility for funding transformation startups, the paper makes a number of suggestions in its “Funding” subsection, the best of which would be an “investment fund”.  I do have concerns that this might run afoul of international regulatory practices, which often cite cooperative activities by the operators as collusion.  If a fund could be made to work, it would be great and the suggestions on managing it are good.

If a fund wouldn’t be possible, then I think that operator commitment to a small-vendor engagement model, such as that described in the paper, might well be enough.  VCs need the confidence that the whole process, from conception of a transforming product through either an IPO or M&A, will work because transformation opportunities will exist and can be realized.  For this to be true, though, the strategies for handling the other three issues have to be very strong.

The next issue is that appropriate engagement is difficult for startups to achieve and sustain.  Some points relating to this are covered in the paper’s “Innovation” and “Competition” subheads.  Because some of those same points relate to my third issue, participation in industry activities, I’ll cover both issues and the paper’s sections in a single set of comments.

I like the notion of facilitating the field trials, but as my third issue points out, participation in industry events is a critical precursor to that.  Startups need to be engaged in the development of specifications, standards, software, and practices, if they’re to contribute innovation at a point where it has a chance of being realized.  You cannot take a clunker idea out of a fossilized standards process and ask for innovation on implementation.  It’s too late by then.

I’d propose that operators think about “creative sponsorships” where individuals or companies who have the potential to make major innovative contributions are funded to attend these meetings.  Individual operators could make such commitments if collective funding proves to pose anti-trust issues.  These sponsorships would require the recipients to participate, make suggestions, and submit recommendations on implementation.  From those recommendations, the “Innovation” recommendations in the paper could be applied to socialize the ideas through to trials.

This would also address the issue of “public calls” and “open procurements” cited in the Competition portion of the paper.  The problem we have today with innovation is most often not that startups aren’t seen as credible sources in an RFP process, but that the RFP is wired by the incumbent and aimed at non-innovative technology.  Operators who want an innovative solution have to be sure there’s one available, and only then can they refine their process of issuing RFIs and RFPs to address it.

A final suggestion here is to bar vendors from participation in creating RFIs and RFPs.  I think that well over 80% of all such documents I’ve seen are influenced so strongly by the major vendors (the incumbent in particular) that there’s simply no way for a startup to reflect an innovative strategy within the constraints of the document.

The final issue I raised, on the structuring of the relationship and contractual requirements, is handled in the Procurement piece of the paper, and while I like all the points the paper makes, I do think that more work could be done to grease the skids on participation early on.  There should be an “innovative vendor” pathway, perhaps linked to those creative sponsorships I mentioned, that would certify a vendor for participation in a deal without all the hoop-jumps currently required.

In summary, I think this paper offers a good strategy, and I’d be happy to work with operators on innovation if they followed it!

How Good an Idea is the “ONF Marketplace?”

The ONF may just have done something very smart.  It’s been clear for at least a decade that operators want to buy products rather than just endorsing standards, but how do the products develop in an open-source world where no single player fields a total solution?  The ONF says that the answer to that is ONF Marketplace.  The concept has a lot of merit, but it’s still not completely clear that the ONF will cover the whole network-ecosystem waterfront, and that might be an issue.

For decades there’s been a growing disconnect between how we produce technology and how we consume it.  Technology is productized, meaning that there are cohesive functional chunks of work that are turned into “applications” and sold.  Rarely do companies or people consume things this way.  Instead, they create work platforms by combining the products.  Microsoft Office is a great example of such a platform.  When technology is “platformized” like this, the natural symbiosis between the products builds a greater value for the whole.

The same thing happens in the consumer space.  Consumers don’t care about network technology, they care about experiences, and so consumer broadband offerings have to be experience platforms.  They have to deliver what the consumer wants, handle any issues in delivery quality quickly and cheaply, and evolve to support changes in consumer expectations or available experiences.  It’s not just pushing bits around.

In networks, platforms for work or experience are created by integrating all the network, hosting, and management technologies needed.

Historically, networks were built by assembling products, and to maximize their profits, vendors also produced related products that created the entire network ecosystem.  When you bought routers from Cisco, for example, you’d get not only routers but management systems and related tools essential in making a bunch of routers into a router network.  That goal—making routers into router networks—is the same goal we have today, but with open components and startups creating best-of-breed products, it’s not as easy.

Look at a virtualization-based or even white-box solution today.  You get the boxes from Vendor A, the platform software for the white boxes from Vendor B, the actual network/router software from Vendor C.  Then you have to ask where the operations and management tools come from.  The problem is especially acute if you’ve decided on a major technology shift, something like SDN.  Traditional operations/management tools probably won’t even work.  How do you convert products to platform?

The ONF Marketplace is at least an attempt to bridge the product/platform gap.  If you establish a set of standards or specifications and certify against them, and if you also align them to be symbiotic, buyers would have more confidence that getting a certified solution would mean getting an integrated, complete, solution.

The fly in the ointment is the notion of an “integrated, complete, solution”.  There are really three levels of concern with regard to the creation and sustaining of a complete transformation ecosystem.  Does the ONF Marketplace address them all, and if not, is what’s not part of the deal critical enough to threaten the goal overall?

The ONF has four suites in its sights at the moment: Aether, Stratum, SEBA, and VOLTHA.  Aether is a connectivity-as-a-service model linked to mobile (4G and 5G) networking.  Stratum is a chip-independent white-box operating system, SEBA is a virtualization-based PON framework for residential/business broadband and mobile backhaul, and VOLTHA is a subset of SEBA aimed at OpenFlow control of PON optics.  One thing that stands out here is that all of these missions are very low-level; there’s nothing about management and little about transformation of IP through support of alternative routing, in either “new-router” or “new-routing” form.

The ONF does have a vision for programmable IP networks, based on OpenFlow and SDN, but as I’ve noted in prior blogs, the concept doesn’t have a lot of credibility outside the data center because of SDN controller scalability and availability fears.  There is really no vision at a higher level, no management framework, no OSS/BSS, and nothing that really ties these initiatives to a specific business case.  That all raises some critical questions.

The first is whether transformation is even possible without transforming IP.  Operators don’t think so; I can’t remember any conversation I’ve had with an operator in the last five years that didn’t acknowledge the need to change how the IP layer was built and managed.  I think that makes it clear that the only players who will be able to transform anything above or below IP will have to start with an IP strategy.

In a left-handed way, that might explain why transformation has been so difficult.  The logical players to transform the IP layer would be the incumbents there, and of course those incumbents have no incentive to redesign the network so as to reduce operator spending on their products.  Any non-incumbents have to fight against entrenched giants to get traction, which is never easy.

The second question is whether something like ONF Marketplace could elevate itself to consider the infrastructure, network, and hosting management issues.  Maybe, but right now the initiative is focused on the ONF’s own work, the specifications it’s developed.  The ONF has no position in the management space, nothing to build on.  Would they be willing to at least frame partnerships above their own stuff, and then certify them in their marketplace?

Then there’s the key question, which is whether a marketplace is really a way to assemble a transformational ecosystem.  An ecosystem has to be characterized by three factors.  First, it has to be functionally complete, covering all the technical elements needed to make a complete business case.  Second, it has to be fully integrated so that it can be deployed as a single package, without a lot of incremental tuning through professional services.  Otherwise, buyers can’t really be sure it’s going to work.  Finally, it has to offer specific and credible sponsorship, some player whose credibility is sufficient to make the concept itself credible.  How does the ONF Marketplace concept fare in these areas?

It’s not functionally complete.  There’s no credible IP strategy at this point, and nothing but open sky above.  It is fully integrated within its scope, but because it’s not complete the level of integration of the whole (which isn’t available to judge at this point) can’t be assessed.  But what it does have is specific and credible sponsorship.  The ONF has done a lot of good stuff, even in the IP area where it’s taking a lead in programmable control-plane behavior, a key to new services.

From this, I think we can make a judgment on the ONF Marketplace concept.  If that concept can be anchored at the IP level, either by fixing the SDN-centric view the ONF has now or by admitting other IP-layer approaches in some way, then the concept can work.  In fact, it might be a prototype for how we could create a transformed network model, and sell it to buyers who are among the most risk-averse in all the world.

I hope the ONF thinks about all of this.  They’ve done good work below IP, particularly with Stratum and P4, but they need to get the key piece of the puzzle in place somehow.  If they do, they could raise themselves above the growing number of organizations who plan to do something to drive the network of the future.  If they do that, they might keep us from getting stuck in the present.

Translating the Philosophy of Complexity Management to Reality

Could we be missing something fundamental in IT and network design?  Everyone knows that the focus of good design is creating good outcomes, but shouldn’t at least equal attention go to preventing bad outcomes?  A LinkedIn contact of mine who’s contributed some highly useful (and exceptionally thoughtful) comments sent me a reference on “Design for Prevention” or D4P, a book that’s summarized in a paper available online HERE.  I think there’s some useful stuff here, and I want to try to apply it to networking.  To do that, I have to translate the philosophical approach of the referenced document into a methodology or architecture that can be applied to networking.

The document is a slog to get through (I can only imagine what the book is like).  It’s more about philosophy than about engineering in a technical sense, so I guess it would be fair to say it’s about engineering philosophy.  The technical point is that there’s always one good outcome, a goal outcome, in a project.  The evolution from simple to complex doesn’t alter the number of good outcomes (one), but it does alter the number of possible bad outcomes.  In other words, there’s a good chance that without some specific steps to address the situation, complex systems will fail for reasons unforeseen.

The piece starts with an insight worthy of consideration: “[The] old world was characterized by the need to manage things – stone, wood, iron.  The new world is characterized by the need to manage complexity. Complexity is the very stuff of today’s world. This mismatch lies at the root of our incompetence.”—Stafford Beer.  I’ve been in tech for longer than most of my readers have lived, and I remember the very transformation Beer is talking about.  To get better, we get more complicated, and if we want to avoid being buried in the very stuff of our advances, we have to manage it better.  That’s where my opening question comes in; are we doing enough?

Better management, in the D4P concept, is really about controlling and preventing the bad outcomes that arise from complexity, through the application of professional engineering discipline.  Add this to the usual goal-seeking, add foresight to hindsight, and you have something useful, even compelling, provided you can turn the philosophical approach the paper takes into something more actionable.  To fulfill my goal of linking philosophy to architecture, it will be necessary to control complexity in that architecture.

D4P’s thesis is that it’s not enough to design for the desired outcome; you also have to design to prevent unfavorable outcomes.  I think it might even be fair to say that there are situations where the “right” (or at least “best”) outcome is one that isn’t one of the bad ones.  With a whole industry focused on “winning”, though, how do we look at “not-losing” as a goal?  General MacArthur was asked his formula for success in offensive warfare, and he replied “Hit them where they ain’t”.  He was then asked for a strategy for defense, and he replied “Defeat”.  Applying this to network and IT projects, it means we have to take the offense against problems, addressing them not in operations but in planning.

Hitting them where they ain’t, in the D4P approach, means shifting from a hindsight view (fix a problem) to a foresight view (prevent a problem by anticipating it).  Obviously, preventing something from happening can be said to be a “foresight” approach, but of course you could say that about seeking a successful outcome.  How, in a complex system, do you manage complexity and discourage bad outcomes by thinking or planning ahead?  There are certainly philosophers among the software and network engineering community, but most of both groups have a pretty pragmatic set of goals.  We don’t want them to develop the philosophy of networking, we want a network.  There has to be some methodology that gives us the network within D4P constraints.

The centerpiece of the methodology seems to me to be the concept of a “standard of care”, a blueprint to achieve the goal of avoiding bad outcomes.  It’s at this point that I’ll leave the philosophical and propose the methodological.  I suggest that this concept is a bit like an intent model.  That’s not exactly where D4P goes, but I want to take a step of mapping the “philosophy” to current industry terms and thinking.  I also think that intent modeling, applied hierarchically, is a great tool for managing complexity.

D4P’s goal is to avoid getting trapped in a process rather than responding to objective data.  We don’t have to look far, or hard, to find examples of how that trap gets sprung on us in the networking space.  NFV is a good one; so are SDN, ZTA, and arguably some of the 5G work.  How, exactly, does this trap get sprung?  The paper gives non-IT examples, but you could translate them into IT terms and situations, which of course is what I propose to do here.

Complexity is the product of the combination of large numbers of cooperating elements in a system and large numbers of relationships among the elements.  I think that when faced with something like that, people are forced to try to apply organization to the mess, and when they do that, they often “anthropomorphize” the way the system would work.  They think of how they, or a team of humans, would do something: in-boxes, human stepwise processes, out-boxes, and somebody to carry work from “out” to “in”.  That’s how people do things, and how you can get trapped in process.

This approach, this “practice”, has the effect of creating tight coupling between the cooperative elements, which means that the system’s complexity is directly reflected in the implementation of the software or network feature.  In IoT terms, what we’ve done is create a vast and complex “control loop”, and it’s hard to avoid having to ask questions like “Can I do this here, when something else might be operating on the same resource?”  Those questions, and the need to ask them, are examples of not designing for prevention.

So many of our diagrams and architectures end up as monolithic visions because humans are monoliths.  The first thing I think needs to be done to achieve D4P is to come up with a better way of organizing this vast complexity.  That’s where I think that intent models come into play.  An intent model is a representation of functionality and not implementation.  That presents two benefits at the conceptualization stage of an IT or network project.  First, it lets you translate goal behavior to functional elements without worrying much about how the elements are implemented.  That frees the original organization of the complex elements from the details that make them complex, or from implementation assumptions that could contaminate the project by introducing too much “process” and not enough “data”.

Artificial intelligence isn’t the answer to this problem.  An artificial human shuffling paper doesn’t do any better than a real one.  AI, applied to systems that are too complex, will have the same kind of problems that we’ve faced all along.  The virtue of modeling, intent modeling, is that you can subdivide systems, and by containing elements into subsystems, reduce the possible interactions…the complexity.

Intent models, functionality models, aren’t enough, of course.  You need functionality maps, meaning that you need to understand how the functions relate to each other.  The best way to do that is through the age-old concept of the workflow.  A workflow is an ordered set of process responses to an event or condition.  The presumption of a workflow-centric functionality map is that a description of the application or service, a “contract”, can define the relationship of the functions within the end-result service or application.  That was the essence of the TMF NGOSS Contract stuff.

In the NGOSS Contract, every “function” (using my term) is a contract element that has a state/event table associated with it.  That table identifies every meaningful operating state that the element can be in, and how every possible event the element could receive should be processed for each of those states.  Remember here that we’re still not saying how any process is implemented, we’re simply defining how the black boxes relate to each other and to the end result.
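Here’s a minimal sketch of such a state/event table in code.  The states, events, and process names are illustrative, not taken from any TMF specification; the point is simply that every meaningful (state, event) pair maps to a defined process and a next state.

```python
# Illustrative state/event table for one contract element: maps
# (current_state, event) to (process_to_run, next_state).
STATE_EVENT_TABLE = {
    ("ordered",     "activate"):    ("deploy_function",  "activating"),
    ("activating",  "deploy_ok"):   ("start_monitoring", "operational"),
    ("activating",  "deploy_fail"): ("report_and_retry", "ordered"),
    ("operational", "fault"):       ("attempt_repair",   "degraded"),
    ("degraded",    "repair_ok"):   ("start_monitoring", "operational"),
    ("degraded",    "repair_fail"): ("escalate",         "failed"),
}

def handle(state, event):
    # Anything not in the table is itself handled deliberately, rather than
    # being left to become an unexpected failure.
    process, next_state = STATE_EVENT_TABLE.get(
        (state, event), ("log_unexpected_event", state))
    return process, next_state

print(handle("operational", "fault"))          # ('attempt_repair', 'degraded')
print(handle("operational", "stale_message"))  # ('log_unexpected_event', 'operational')
```

The deliberate default for unrecognized events is itself a design-for-prevention decision; as the protocol example later in this piece shows, the failures that hurt are the ones nobody defined a row for.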

The state/event table, in my view, is the key to introducing foresight and D4P principles to application and service design.  We can look at our elements/functions and define their meaningful states (meaningful, meaning visible from the outside), and we can define how the events associated with the elements are linked to abstract processes.  If we do this right, and the paper describes the philosophy associated with getting it right, we end up with something that not only recognizes the goal, but also handles unfavorable things.  We’ve created a goal-seeking framework for automation.

Does it really address the “design-for-prevention” paradigm, though?  We’ve done some of the work, I think, through intent-modeling and functional mapping, because we’ve organized the complexity without getting bogged down in implementation.  That reduces what I’ll call the “internal process problem”, the stuff associated with how you elect to organize your task in a complex world.  There’s another process issue, though, and we have to look at how it’s handled.

The very task of creating functional elements and functional maps is a kind of process.  The state/event table, because it has to link to processes, obviously has to define processes to link to.  In the approach I’m describing here, it is absolutely essential that the functional and functional-map pieces, and the event/process combinations, be thoroughly thought out.  One advantage of the state/event system is that it forces an architect to categorize how each event should be handled, and how events relate to a transition in operating states.  In any state/event table, there is typically one state, sometimes called “Operational”, that reflects the goal.  The other states are either steps along the way to that goal, or problems to be addressed or prevented.

At the functional-map level, you prevent failures by defining all the events that are relevant to a function and associating a state/process progression with each.  Instead of having some unexpected event create a major outage, you define every event in every state so nothing is unexpected.  You can do that because you have a contained problem—your function abstraction and your functional map are all you need to work with, no matter how complex the implementation is.  In IoT-ish terms, functional separation creates shorter control loops, because every function is a black box that produces a specific set of interfaces/behaviors at the boundary.  No interior process exits the function.

But what about what’s inside the black box?  A function could “decompose” into one of two things—another function set, with its own “contract” and state/event tables, or a primitive implementation.  Whatever is inside, the goal is to meet the external interface(s) and SLA of the function.  If each of the functions is designed to completely represent its own internal state/event/process relationships in some way, then it’s a suitable implementation and it should also be D4P-compliant.
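And here’s a minimal sketch of that decomposition idea: each function is a black box that either decomposes into another function set or bottoms out in a primitive implementation, with the caller seeing only the external behavior.  The service and element names are purely illustrative.

```python
class Function:
    """An intent-modeled function: decomposes into sub-functions or a primitive."""
    def __init__(self, name, children=None, primitive=None):
        self.name = name
        self.children = children or []   # another function set, or...
        self.primitive = primitive       # ...a primitive implementation

    def realize(self):
        # The caller sees only the function's external behavior; how it is
        # realized (sub-functions vs. primitive) stays inside the black box.
        if self.primitive:
            return f"{self.name}: {self.primitive}"
        return f"{self.name} -> [" + ", ".join(c.realize() for c in self.children) + "]"

vpn_service = Function("vpn-service", children=[
    Function("access",    primitive="provision CPE + access circuit"),
    Function("core-path", children=[
        Function("segment-a", primitive="configure MPLS LSP"),
        Function("segment-b", primitive="configure SR-MPLS path"),
    ]),
])
print(vpn_service.realize())
```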

I’ve seen the result of a failure to provide D4P thinking, in a network protocol.  A new network architecture was introduced, and unlike the old architecture, the new one allowed for the queuing of packets, sometimes for a protracted period of time.  The protocol running over the network was designed for a point-to-point connection, meaning that there was nothing inside the network to queue, and therefore its state/event tables didn’t accommodate the situation when messages were delayed for a long period.  What happened was that, under load, messages were delayed so much that the other end of the connection had “timed out” and entered a different state.  Context between endpoints was lost, and the system finally decided it must have a bad software load, so everything rebooted.  That made queuing even worse, and down everything came.  The right answer was simple; don’t ever queue messages for this protocol, throw them away.  The protocol state/event process could handle that, but not a delayed delivery.

I think this example illustrates why functionality maps and state/event/process specification is so important in preventing failures.  It also shows why it’s still not easy to get what you want.  Could people designing a protocol for direct-line connection have anticipated data delayed, intact, in flight and delivered many seconds after it was sent?  Clearly they didn’t.  Could people creating a new transport network model to replace physical lines with virtual paths have anticipated that their new architecture would introduce conditions that couldn’t have existed before, and thus fail when those conditions did happen?  Clearly they didn’t.

Near the end of the paper is another critical point: “Complexity reduction is elimination and unification.”  I think that’s what the approach I’m talking about here does, and why I think it’s a way to address D4P in the context of service and application virtualization.  That’s why it’s my way of taking D4P philosophy and converting it into a project methodology.

In the same place, I find one thing I disagree with, and it’s a thing that nicely frames the difficulty we face in adopting this approach.  “Keep in mind that the best goal-seeking methods are scrutably connected to natural law and from that whence commeth your distinguishing difference and overwhelming advantage.”  Aside from validating my characterization of the piece as being pretty deep philosophy, this points out a problem.  Software architecture isn’t natural law, and cloud development and virtualization take us a long way out of the “natural”.  That’s the goal, in fact.  What is “virtual” if not “unnatural”?  We have to come to terms with the unnatural to make the future work for us.

I agree with the notion of D4P, and I agree with the philosophy behind it, a philosophy the paper discusses, but I’m not much of a philosopher myself.  The practical truth is that what we need to do is generalize our thinking within the constraints of intent models, functionality maps, and state/event/process associations, to ensure that we don’t treat things that are really like geometry’s theorems as geometry’s axioms.  I think that the process I’ve described has the advantage of encouraging us to do that, but it can’t work any better than we’re willing to make it work, and that may be so much of a change in mindset that many of our planners and architects will have trouble with the transition.