Why I’m Obsessed with Architectures

If you’ve ever wondered why there are so many songs about rainbows, you might be among those who wonder why Tom Nolle talks so much about architectures.  Somebody actually raised that point with me in a joking way, but it occurred to me that not everyone shares my own background and biases, the factors that make me think so much in architecture terms.  What I propose to do is explain why I believe that architectures are so critical in the evolution of networking and computing, and why we don’t have them when we need them.

In the early days of software development, applications were both monolithic and small, constrained by the fact that the biggest computers a company would have in the early 1960s had less memory than a smartwatch has today.  You could afford to think in terms of baby-steps of processing, of chains of applications linked by databases.  As we moved into the late ‘60s and had real-time or “interactive” systems and computers with more capacity (remember, the modern mainframe era began in the mid-1960s with IBM’s System/360), we had to think about multi-component applications, and that was arguably the dawn of modern “architecture” thinking.

An architecture defines three sets of relationships.  The first is the way that application functionality is divided, which sets the “style” of the application.  The second is how the application’s components utilize both compute and operating-system-and-middleware resources, and the third is how the resources relate to each other and are coordinated and managed.  These three are obviously interdependent, and a good architect will know where and how to start, but will likely do some iterating as they explore each of the dimensions.

The role of architecture came along very early in the evolution of software practices, back in the days when much of application development was done in assembler language, so the programming languages themselves didn’t impose any structure (functions, procedures, “blocks” of code).  Everyone quickly realized that writing an application as a big chunk of code that wandered here and there based on testing variables and making decisions (derisively called “spaghetti code” in those days) created something almost unmaintainable.  Early programming classes often taught methods of structuring things for efficient development and maintainability.

Another reason why we started having architectures and architects was that it makes sense to build big, complex, systems using a team approach.  The systems are essentially the integration of related elements that serve a common goal—what I tend to call an “ecosystem”.  The challenge, of course, is to get each independent element to play its role, and that starts by assigning roles and ensuring each element conforms.  That’s what an architecture does.

The 3GPP specifications start as architectures.  They take a functional requirement set, like device connectivity and mobility, and divide it into pieces—registration of devices, mobility management, and packet steering to moving elements.  They define how the pieces relate to each other—the interfaces.  In the 3GPP case, they largely ignore the platforms because they assume the mobile ecosystem is made up of autonomous boxes whose interfaces define both their relationships and their functionality.  It doesn’t really matter how they’re implemented.

Applications also start as architectures these days, but an application architecture has to start with a processing model.  Some applications are “batch”, meaning they process data that’s stored in a repository.  Others are “transactional”, meaning that they process things that follow an input-process-update-result flow.  Still others are “event-driven” meaning that they process individual signals of real-world conditions.  Because applications are software and utilize APIs, and because the hosting, operating system, and middleware choices are best standardized for efficiency, the resource-relationship role of an architect is critical for applications—and for anything that’s software-centric.

Suppose we were to give four development teams one step each in our input-process-update-result flow and let them do their thing optimally based on that step’s individual requirements.  We might have a super-great GUI that couldn’t pass data or receive it.  That’s why architectures are essential; they create a functional collective from a bunch of individual things, and they do it by creating a model into which the individual things must fit, thereby ensuring they know how they’re supposed to relate to each other.
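To make the shared-model point concrete, here is a minimal sketch in Python of an input-process-update-result flow in which all four steps code against one shared record; the field names and rules are invented purely for illustration, not drawn from any real system.

```python
from dataclasses import dataclass, field

# The "architecture" in miniature: every team codes against this shared record,
# so the capture team's output is something the processing team can actually consume.
@dataclass
class Transaction:
    account: str
    amount: float
    status: str = "new"
    history: list = field(default_factory=list)

def input_step(raw: dict) -> Transaction:           # team 1: capture and validate
    return Transaction(account=raw["account"], amount=float(raw["amount"]))

def process_step(txn: Transaction) -> Transaction:  # team 2: apply business rules
    txn.status = "approved" if txn.amount <= 1000 else "review"
    return txn

def update_step(txn: Transaction) -> Transaction:   # team 3: record the change
    txn.history.append(f"{txn.status}:{txn.amount}")
    return txn

def result_step(txn: Transaction) -> str:           # team 4: report back to the user
    return f"Account {txn.account}: {txn.status}"

if __name__ == "__main__":
    print(result_step(update_step(process_step(input_step(
        {"account": "A-100", "amount": "250"})))))
```

Take the shared record away and each team’s piece might still be optimal on its own terms, but the flow as a whole stops working, which is the point of the paragraph above.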

You can see, from this, two challenges in architecture that have contaminated our network transformation goals.  The first is that network people are, by their history, box people.  They think in terms of functional distribution done by boxing and connecting.  When you apply that kind of thinking to software-and-cloud network infrastructure, you create “soft-box networking”, which doesn’t optimize the software transformation because it’s constrained.  The second is that if the ecosystem you’re trying to create is really large, and if it’s divided up into autonomous projects, there’s usually no overarching architecture picture at all.

NFV suffered from both these problems.  The NFV “end-to-end architecture” was a box architecture applied to a software mission.  The architecture guides the implementation, and in the case of NFV what some said was supposed to be only a “functional diagram” became an application blueprint.  Then the NFV ISG declared the surrounding pieces of telco networking, like operations, to be out of scope.  That meant that the new implementation was encouraged to simply connect to the rest of the ecosystem in the same way as earlier networks did, which meant the new stuff had to look and behave like the old—no good path to transformation comes from that approach.

Anyone who follows NFV knows about two problems now being cited—onboarding and resource requirements for function hosting.  The NFV community is trying to make it easier to convert “physical network functions” or appliance code into “virtual network functions”, but the reason it’s hard is that the NFV specs didn’t define an architecture whose goals included making it easy.  The community is also struggling with the different resource requirements of VNFs because there was never an architecture that defined a hardware-abstraction layer for VNF hosting.

Even open-source projects in middleware and cloud computing can suffer from these problems.  Vendors like Red Hat struggle to create stable platform software for user deployment because some middleware tools require certain versions of other tools, and often there’s no common ground easily achieved when there are a lot of tools to integrate.  We also normally have multiple implementations of the same feature set, like service mesh, that are largely or totally incompatible because there’s no higher-level architecture to define integration details.

What happens often in open-source to minimize this problem is that an architecture-by-consensus emerges.  Linux, containers, Kubernetes, and serverless evolved to be symbiotic, and future developments are only going to expand and enhance the model.  This takes time, though, and for the network transformation mission we always talk about, time has largely run out.  We have to do something to get things moving, and ensure they don’t run off in a hundred directions.

Networks are large, cooperative, systems, and because of that they need an architecture to define component roles and relationships.  Networks based on software elements also need the component-to-resource and resource-to-resource relationships.  One solid and evolving way of reducing the issues in the latter two areas is the notion of an “abstraction layer”, a definition of an abstract resource that everything consumes, and that is then mapped to real resources in real infrastructure.  We should demand that every implementation of a modern software-based network contain this (and we don’t).
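Here’s a minimal sketch, in Python, of what an abstraction layer means in code terms; the class and resource names are mine and purely illustrative, not a reference to any real platform.

```python
from abc import ABC, abstractmethod

# The abstract resource that everything consumes...
class HostingResource(ABC):
    @abstractmethod
    def deploy(self, function_image: str) -> str:
        """Deploy a network function and return a handle to it."""

# ...and two of the many real infrastructures it might be mapped to.
class BareMetalPool(HostingResource):
    def deploy(self, function_image: str) -> str:
        return f"bare-metal://{function_image}"

class CloudVMPool(HostingResource):
    def deploy(self, function_image: str) -> str:
        return f"cloud-vm://{function_image}"

def deploy_function(pool: HostingResource, image: str) -> str:
    # The function lifecycle logic never needs to know which pool it landed on.
    return pool.deploy(image)

print(deploy_function(BareMetalPool(), "firewall-vnf"))
print(deploy_function(CloudVMPool(), "firewall-vnf"))
```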

But who produces the architecture?  That’s the challenge we’ve had demonstrated in almost every networking project involving service provider infrastructure for as long as I’ve been involved in the space (which goes back to international packet networks in the ‘70s).  Network people do boxes and interfaces, and since it’s network people who define the functional goals of a network project, their bias then contaminates the software design.  Open-source is great to fill in the architecture, but not so great at defining it, since there are so many projects contending for the mission.

This is why we need to make some assumptions if we’re ever to get our transforming architecture right.  The assumptions should be that all network functions are hosted in containers, that cloud software techniques will be adopted, at least at the architecture level, and that the architecture of the cloud be considered the baseline for the architecture of the network, not the previous architecture of the network.  Running in place is no longer an option, if we want the network of the future to actually network us in the future.

There are a lot of songs about rainbows because they’re viscerally inspiring.  I’m always singing about architectures because, in a software-driven age, we literally cannot move forward without them.  Large, complex, systems can never accidentally converge on compatible thinking and implementation.  That’s what architectures do, and why they’re important—even in networks.

Without a Network Equipment Price Leader, What Happens?

What happens when a price leader cannot lead, or maybe even follow?  In the world of carrier networking, we may be about to find out.  Whatever you think about the war between the US Government and Huawei, the impact on Huawei seems to be increasing, and that could have a major impact on telecom, not only on the capex budgets but on network technology itself.  In fact, the only answer to the end of the price-leader paradigm may be a new architecture.

Operators today spend between 18 and 25 cents per revenue dollar on capital equipment, something that’s been an increasing burden as their revenue stagnates and costs seem to be either static or even increasing.  Of the 77 operators I’ve interacted with in the last 6 months, 73 have said that they are in a “critical profit-per-bit” squeeze.  For most, capital budgets have been an attractive target.

The problem, of course, is how you attack them.  A decade ago, some operators were experimenting with hosting router instances on servers, and about seven years ago they launched the Network Functions Virtualization initiative.  Neither of these has proved out as a means of significant capex reduction.  Only 19 of those 77 operators think that either of these initiatives will ever lower their capex significantly.

It’s obvious where Huawei comes into this picture.  They’ve consistently been the price leader in the network equipment space.  Back in 2013 when I was attending a networking conference in Europe, I met with a dozen operator experts on NFV and transformation, and one made the comment that the 25% capex improvement that some NFV proponents were promising wasn’t enough.  “If we were satisfied with 25%, we’d just beat Huawei up on price” was the key comment.  Technology change has failed; let’s go for discount pricing.

That’s the problem in a nutshell.  The best we’ve come up with using new technology so far hasn’t measured up in terms of capex reduction.  It can’t match what operators could hope to get in the way of extra discounts from Huawei.  If Huawei is off the table as a supplier, even if competitors like Ericsson or Nokia were willing to cut the same 25% discount, their starting price is often at least that 25% higher.  Operators are feeling the stress of dwindling financial options, so they need new ones to develop.

Bringing Huawei back is an option none of them can really control, so there’s no point talking about that.  Our easiest response would then be to resurrect either router instances or NFV, so we have to ask why these two failed and whether we could address those issues.

Router instances running on commercial servers, and virtual-function versions of routers, have the same issues, as it turns out.  First, commercial servers aren’t the right platform to host the data plane of high-capacity network devices.  The current-market shift of focus to white-box technology is a result of this truth.  You need specialized data-plane chips to do efficient packet-handling in multi-terabit devices.  Second, all this hosting-of-routers-and-functions stuff has been a miserable failure in an operations sense.

The NFV realists among the telcos tell me that they’re finding NFV to be so operationally complex that they believe they’d lose more in opex than they’d save in capex.  Think about it; in the router-in-a-box days, you managed a network by managing the routers.  With NFV, you still need to manage the routers, but you also have to manage the servers, the server platform software, the NFV management and orchestration tools, and the virtual-network resources that connect all the pieces of functionality.

White boxes could fix some of these problems, but not all of them.  If we were to look at the hosted-router model, assuming white-box deployments, we would expect to save the price difference between a vendor router platform and a white box (about 60-70%).  We still need the software, and many router-software vendors want to license on a per-box basis, so that eats up about a quarter or more of the capex savings.  We can get some savings, in the net, but we’ve become an integrator or we have to hire one, which further reduces the savings.  We might also have to customize the router software for our white-box choice.  This still leaves some savings, so we have to ask why it’s not being adopted, and the answer lies in the limitations of those white boxes.
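A quick back-of-the-envelope version of that math, in Python, using purely illustrative numbers that track the percentages above (the integration figure is my own assumption):

```python
# Hypothetical numbers: a $100K vendor router versus a white box priced 65% lower.
vendor_router = 100_000
white_box = vendor_router * (1 - 0.65)        # hardware saving of roughly 65%
hardware_saving = vendor_router - white_box   # $65,000

software_license = 0.25 * hardware_saving     # per-box license eats about a quarter
integration = 0.10 * hardware_saving          # assumed integrator/customization cost

net_saving = hardware_saving - software_license - integration
print(f"Net capex saving per box: ${net_saving:,.0f} "
      f"({net_saving / vendor_router:.0%} of the vendor price)")
```

There’s still real money left on the table in that sketch, which is why the limitations of the white boxes themselves, not the arithmetic, are the blocker.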

There’s a router for every mission these days.  Home, office, edge, aggregation, core, whatever.  Most router vendors have at least three models, each based on a different architecture.  There are many different white-box platforms too, most of them also based on different architectures.  Operators aren’t excited about trying to match software-routing licenses to white-box architectures themselves.

There’s a different problem at the high end, which is the lack of an available platform.  Cisco and Juniper offer models with over 200 Tbps of capacity.  Try to find a 200 Tbps white box.  In fact, the big router vendors don’t build their high-capacity routers as a monolithic box anyway; they use a chassis design that combines fabric switching and routing elements.  Operators could in theory build a chassis router from the right white boxes, but could they do the software?

Then there’s the big question; who operationalizes this whole mess?  One advantage of the single-vendor approach is that they make everything fit.  Just getting all the pieces to run within the same operational framework is a challenge (called “onboarding” in NFV with virtual functions) that’s defeated a whole industry.  A big part of the problem is that open platforms tend to develop in microcosmic pieces, and operators have to deploy ecosystems.  Nothing fits in the open community, because nothing is designed to fit.  There’s no over-arching vision.

An over-arching vision is what we need, a vision of what cloud-native network function deployment would really look like, and how it would create superior network functionality and operational efficiency.  What the heck does the network of the future, the great alternative to low-priced vendors, even look like?  Even before we had NFV, we had SDN, and it articulated or implied some principles about the network of the future.  The net of them all was that you don’t build that network of the future by redoing the same software and hardware model that you’re trying to replace.  Cisco and Juniper both talk about “cloud principles” in their stuff, but they’re mostly focusing on interfaces to cloud tools for orchestration and management, not on the way that the devices themselves are built and deployed.

You can’t easily apply cloud principles to the data plane.  That means that you can’t apply them to networks built from software instances that don’t separate the data and control planes.  It also means that once you separate the data and control planes to apply cloud principles, you have to somehow bring the two together to coordinate the overall functionality of the network.  You also have to decide how a cloud-native control plane would look, and surely it would be a collection of microservices that implement individual control-plane features.  That’s not how we build routers today; they’re monolithic and not microservice-ized.  I did a whole blog on this, as applied to 5G, HERE.

This problem was recognized early on.  At their very first meeting, the NFV ISG considered the question of whether you had to disaggregate the features of physical devices before you virtualized them.  The idea was almost literally shouted down, and we were left with the notion that a virtual network function was a hostable form of a physical network function, meaning an existing network device.  That decision has hobbled our attempts to rebuild networking by forcing us to accept current devices as our building-blocks.  If you build a network of router instances, it’s still a router network.

The telco world, in the 5G architecture, is admitting that you have to build on virtual principles and not on box principles, but the 5G work not only utilizes NFV and all its limitations, its answer to avoiding box principles is to create virtual boxes.  Gang, a box is a box.  But 5G does separate control and data planes, and it does have at least some recognition that signaling is, in the end, an event-driven application.  It would be possible to take current 5G specs and convert them into something that’s truly a virtual powerhouse.  That’s a good start, but the cloud-centric design is not being carried forward enough in our implementations, which have largely come from (you guessed it) box vendors.

Price leaders, as a concept, will always exist, but how much price leadership we’ll see is surely going to diminish as long as we stay within the old network model.  Is there truly a more economical networking paradigm out there?  If there is, I contend that it’s not created by stamping out “router device cookies” from different dough.  We need a whole new recipe.  If there’s a hope that new network architecture will create an alternative to a network-vendor price leader, then that new recipe is our only path forward.  Otherwise, the overused phrase “Huawei or the highway” may be true.

What’s the Right Network for Cloud and Function Hosting?

If containers and Kubernetes are the way of the future (which I think they are), then future network models will have to support containers and Kubernetes.  Obviously, the combination expects to have IP networking available, but there are multiple options for providing that.  One is to deploy “real” IP in the form of an IP VPN.  A second is to use a virtual network, and a third is to use SD-WAN.  Which of these choices is the best, and how are they evolving?  That’s what we’ll look at here today.

Most container networking is done via the company VPN with no real additional tools.  Classic containers presume that applications are deployed within a private IP subnet, and the specific addresses that are intended to be shared are exposed explicitly.  Kubernetes presumes that containers are all addressable to each other, but not always exposed on the company VPN.  This setup doesn’t pose any major challenges for data center applications.

When you cross this over with the cloud, the problem is that cloud providers give users an address space, which again is typically a private IP address.  They then can map this (Amazon uses what it calls “elastic IP addresses”) to a standard address space on a company VPN.  The resulting setup isn’t too different from the way that data center container hosting would be handled.

Hybrid clouds often won’t stress this approach either.  If the applications divide into permanent data-center and cloud pieces, then the combination works fine.  The only requirement is that the cloud and data center pieces should be administered as separate Kubernetes domains and then “federated” through a separate tool (like Google’s Anthos).

The rub comes when components deploy across cloud and data center boundaries, so as to treat the sum of both cloud(s) and data centers as a single resource pool.  This can create issues because a new component instance might now be somewhere the rest of its subnetwork isn’t.  It can still be made to work, but it’s not ideal.

The final issue with the standard IP model is that when users have to access cloud applications, they’d typically have to go through the VPN, and in multi-cloud, it’s now difficult to mediate access to an application that might have instances in several clouds.

The second network option for containers and the cloud is to use a virtual network, which Kubernetes supports.  The positive of the virtual-network approach is that the virtual network can now contain all the Kubernetes containers, retaining the property of universal addressability.  If the virtual network is supported across all the clouds and data centers, then everything is addressable to everything else (unless you inhibit that), and if something is moved it can still be reached.  With some technologies, it may even be possible for redeployed elements to retain their original addresses.

The obvious problem with the virtual network model is that it creates an additional layer of functionality, and in most cases that means virtual nodes to handle the virtual network, which is effectively an overlay above Level 3.  There are additional headers that have to be stripped on exit and added on entry, and this is also a function of those virtual nodes.  While the packet overhead of this may not matter to all users in all applications, hosting the virtual nodes creates a processing burden, and limitations on the number of virtual-network connections per node can also impact scalability.
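To show what “stripped on exit and added on entry” costs per packet, here’s a minimal sketch with a made-up 12-byte overlay header; it’s not VXLAN, Geneve, or any real encapsulation, just an illustration of the work a virtual node does.

```python
import struct

# Invented 12-byte overlay header: virtual source, virtual destination,
# virtual network ID, payload length.  Real overlay formats differ.
HDR = struct.Struct("!IIHH")

def encapsulate(payload: bytes, vsrc: int, vdst: int, vnet: int) -> bytes:
    # Added on entry to the overlay: extra bytes on the wire, extra CPU per packet.
    return HDR.pack(vsrc, vdst, vnet, len(payload)) + payload

def decapsulate(frame: bytes):
    # Stripped on exit from the overlay.
    vsrc, vdst, vnet, length = HDR.unpack_from(frame)
    return vsrc, vdst, vnet, frame[HDR.size:HDR.size + length]

frame = encapsulate(b"hello", vsrc=1, vdst=2, vnet=42)
print(len(frame), decapsulate(frame))   # 17 bytes on the wire for a 5-byte payload
```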

A less-obvious problem is that virtual networks are another layer to be operationalized.  In effect, a virtual-network overlay is a whole new network.  You need to be able to correlate the addresses of things in the virtual world with the things in the real world.  Real worlds, in fact, because the correlations will be different for the data center(s) and cloud(s).  Some users like the virtual network approach a lot for its ability to independently maintain the container address space, but others dislike the complexity of maintaining the independent network.

The third option is the SD-WAN, which is a complicated story in itself, not to mention the way it might impact containers and the cloud.  SD-WAN implementations are overwhelmingly based on the virtual-network model, but unlike the generalized virtual network, an SD-WAN really extends a company VPN based on MPLS technology to sites where MPLS VPNs are either not available or not cost-effective.

Recently, SD-WAN vendors have started adding cloud networking to their offerings, which means that cloud components can be placed on the company VPN just like a branch office.  The SD-WAN “virtual network” is really an extension of the current company VPN.  Because SD-WANs extend the company VPN, they use company VPN addresses, and most users find them less operationally complex than the generalized virtual networks.

Because most SD-WANs are overlay networks just like most virtual networks are (128 Technology is an exception; they use session-smart tagging to avoid the overlay), they still create a second level of networking, create a need to terminate the tunnels, and potentially generate scalability problems because a number of specific overlay on/off-ramps are needed.  Because most SD-WAN sessions terminate in a hosting point (the data center or cloud), this concentrates those incremental resources at a single point, and careful capacity planning is needed there.

SD-WAN handling of the cloud elements varies across implementations.  In some cases, cloud traffic is routed via the company VPN in all cases.  In others, cloud traffic might go directly to the cloud.  Multi-cloud is similarly varied; some implementations add a multi-cloud routing hub element, some permit direct multi-cloud routing, and some route through the company VPN.

Many virtual network implementations have provided the plugins to be Kubernetes-compatible.  At least a few SD-WAN vendors (Cisco, most recently) have announced that they’ll offer Kubernetes support for their SD-WAN, but it’s not clear just what benefit this is in the SD-WAN space; perhaps more exemplar implementations will help.

You can see that, at this point, you couldn’t declare a generally “best” approach to container/Kubernetes networking.  The deciding factor, in my view, is the way that the various options will end up handling service mesh technology associated with cloud-native deployments.

Cloud-native implementations will usually involve a service mesh like Istio or Microsoft’s new Open Service Mesh, as a means of linking all the microservices into the necessary workflows while retaining the natural dynamism of microservices and cloud-native.  The performance of the service mesh, meaning in particular its latency, will be a critical factor in the end-to-end workflow delay and the quality of experience.  While a few vendors (Cisco, recently) have announced Kubernetes compatibility with their SD-WAN, nobody is really addressing the networking implications of service mesh, and that’s likely the future battleground.

One issue/opportunity that service-mesh-centric virtual networking raises is the question of addressing elements.  It’s highly desirable that components have something that approaches logical addressing; you aim a packet at a component and the mesh or network then directs the packet to either a real component (there’s only one instance) or to a load-balancer that then selects a component.  If there’s no component available, then it gets instantiated and the packet goes to the instance.  Does the network play a role in this, or is it all handled by the service mesh?  If a component has to be instantiated because nothing is available, can a newly freed-up component then jump in if the instantiation doesn’t complete before something becomes newly available?  You can see that this isn’t a simple topic.
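A sketch of the kind of decision logic that question implies, in Python with invented service names and addresses; real meshes like Istio do this with sidecars and a control plane rather than a single resolve function.

```python
import random

# Instances currently registered for each logical service name (illustrative state).
registry = {"checkout": ["10.0.1.5"], "catalog": ["10.0.2.7", "10.0.2.8"], "billing": []}

def start_instance(service: str) -> str:
    # Stand-in for asking the orchestrator to instantiate a new copy.
    addr = f"10.0.9.{random.randint(2, 250)}"
    registry[service].append(addr)
    return addr

def resolve(service: str) -> str:
    instances = registry.get(service, [])
    if len(instances) == 1:
        return instances[0]              # one real component: send the packet there
    if len(instances) > 1:
        return random.choice(instances)  # load-balance across the instances
    return start_instance(service)       # nothing available: instantiate, then send

for name in ("checkout", "catalog", "billing"):
    print(name, "->", resolve(name))
```

The race the paragraph describes (a freed-up instance appearing while instantiation is still in flight) lives in that last branch, and deciding whether the network or the mesh owns it is exactly the open question.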

All of this also relates to the way that network functions would likely be implemented.  The control and management planes of a network are event-handlers, and lend themselves to cloud-native-and-service-mesh deployment.  The data plane likely works best if the nodes are fixed, with new instances coming only if something breaks or under strict scalability rules.  The two planes have to be functionally integrated, but their requirements are so different that they’re really parallel networks (DriveNets, who recently won one of Light Reading’s Leading Lights awards, takes this approach).

Control-/data-plane separation, IMHO, is a mandate for any cloud implementation of network functions, and both then have to be related to the network technology that will bind the elements together within a plane, and carry coordinating messages between planes.  Since traditional protocols like IP will carry control/management messages in the data plane, the network of the future may well separate them at the ingress point and rebuild the traditional mixed-model data path on egress.  That admits to the possibility of an “internal” network structure that’s different from the traditional way that routers relate to each other in current IP networks.  SDN is an example of this, but only one possible example.  There’s plenty of room for invention here.
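As a sketch of the ingress-side separation described above: OSPF is IP protocol 89, ICMP is protocol 1, and BGP rides TCP port 179, so a separator can peel those off into the control plane and leave everything else on the fast path.  The code below is illustrative only.

```python
from typing import Optional

CONTROL_IP_PROTOCOLS = {1: "ICMP", 89: "OSPF"}   # protocols that are control/management
BGP_TCP_PORT = 179
TCP = 6

def classify(ip_protocol: int, tcp_port: Optional[int] = None) -> str:
    """Decide at the ingress point which plane should carry a packet."""
    if ip_protocol in CONTROL_IP_PROTOCOLS:
        return "control-plane"
    if ip_protocol == TCP and tcp_port == BGP_TCP_PORT:
        return "control-plane"                    # TCP session carrying BGP
    return "data-plane"                           # everything else: fast path

print(classify(6, 179))   # BGP session  -> control-plane
print(classify(6, 443))   # user HTTPS   -> data-plane
print(classify(89))       # OSPF         -> control-plane
```

On egress, the same boundary would re-merge the two streams so that the outside world still sees a traditional mixed-model data path.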

There’s a big opportunity here to go along with that invention, IMHO.  Cloud-native networking has to be efficient, agile, and exceptionally secure, and likely a virtual-network technology of some sort is the right answer.  I think vendors are starting to see the value of container- and Kubernetes-specific virtual network support, but it’s far from universal, and the great service-mesh-and-cloud-native opportunity is, for now, untapped.  It probably won’t stay that way for long.

A Good Step Toward Open RAN and 5G

Is an open 5G infrastructure model really possible?  Many (and, of course, most of the 5G vendors) have said it’s not.  There are providers of pieces of the 5G software in open-source form, but one area that’s been particularly challenging has been the 5G New Radio (NR).  Now, just perhaps, it might be possible, thanks to an announcement by the ONF on August 25th, but there are still some rivers to be crossed on the way to a true, open, 5G.

The ONF is launching an open project for Software-Defined RAN, which will be an exemplar implementation of the 3GPP RAN and “consistent with” the O-RAN specification.  The key piece of this is an ONOS (the ONF’s open-source network operating system) implementation of the RAN Intelligent Controller (RIC).  The RIC implements interfaces to the O-RAN Central Unit and Distributed Unit (O-CU and O-DU, respectively) of the 5G NR protocol stack, often provided by the RAN hardware vendors.

You may wonder why it’s important to have this project at all, given all the noise we’ve been hearing around O-RAN.  Remember that O-RAN is a specification, not a software package.  You can implement O-RAN as a spec without using open-source software for any piece of it, and the current market model is largely focused on that.  Even the SD-RAN group will support “closed-source” components.  What the ONF is hoping to do is create an open-source software implementation of O-RAN, which would then support open-model 5G radio networks.  Combine this with other open stuff in the 5G Core software space, and you have a complete open-source-based 5G implementation.

There are two pieces of this story that we need to look at.  The first is why it’s going to be important, which in part relates to how this concept fits in with other 5G developments, and the second is just how the new project, and in particular the new RIC, works.  We’ll take them in that order.

One truth about 5G projects that permeates every possible 5G deployer is that nobody wants to self-integrate.  5G, like any network technology, is a rather vast ecosystem, and getting all its clocks to chime at the same time is a formidable effort.  Even Tier One operators have been reluctant to try to do a best-of-breed 5G story, to the point where companies who attempt it get a lot of ink (Dish, recently).  For Tier Twos and below, there’s little hope they could acquire and retain the skill sets.

So 5G in any semi-open form cries out for an integrator.  Get one, you might think.  The problem there is that it’s hard to be an integrator with stuff that you don’t have rights to.  If we could build 5G from totally open-source software, any company could set out to be the “Red Hat of 5G”, but that’s not been the case.  This new initiative could make it happen.

Even public cloud provider efforts to get into 5G would be facilitated by the availability of a fully open 5G NR implementation.  All of the public cloud providers are now interested in becoming the “carrier cloud” hosting point for operator 5G Core implementations.  Full 5G will require that there also be a 5G NR available, and if this new initiative really does what it promises to do, it will make a cloud-native 5G NR software stack a reality.  That might then augment the current drive toward mobile-edge computing (MEC) and allow public cloud providers to offer MEC hosting.

There are plenty who think this is essential for 5G.  A Light Reading article by Telefonica’s CTIO is all about the importance of having operators take advantage of open 5G strategies, and it’s obvious that this is a sincere hope.  The challenge is that we’ve had plenty of hoping in the past, and the results haven’t measured up to expectations.  We have to go into the technology details of this announcement with that in mind.  Not discouraged or dismissive, but wary.

That gets us through the “Why?” of this discussion, so it’s time to move into the “How?”  To do that we have to start with the O-RAN architecture, the best reference for which is HERE and HERE.  The architecture shows a service management layer (which might be any of a number of things we’ll get to in a bit) and a RIC layer that make up a central piece, and a set of distributed pieces that represent the RAN head-end elements.  In the classic 5G stack, the structure is monolithic, and the goal of the SD-RAN process is to open up the critical pieces, which is done by supporting an open “xApps” model for features, interfacing to the RIC, which then handles the down-the-stack interfaces.  This is a more disaggregated and cleaner approach, frankly, than the classic 5G NR stack, even discounting the open goals.

The specific things being implemented are pieces of the “near-real-time” layer of 5G NR, including the nRT-RIC built on the ONOS platform (µONOS-RIC), and “exemplar” xApps (starting with handover and load balancing).  Radio Resource Management (RRM) is handled at this level, and it’s the functional core of 5G NR capabilities, so it makes sense to quickly offer an open platform to provide useful/necessary features.  The RIC is also the element that controls (via the 5G CU-C and CU-U) the control- and user-plane connections between the RAN and the 5G Core.  That makes this approach particularly interesting in terms of creating 5G-wide service feature control.
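Purely as an illustration of that division of labor, here’s a hypothetical sketch of an xApp subscribing to RAN measurements through a RIC and handing back a handover decision.  None of these classes or method names come from the actual µONOS-RIC or O-RAN SDKs, which are gRPC-based and far richer; this just shows where the RRM logic would live.

```python
# Hypothetical interfaces only; the real RIC/xApp APIs differ.
class RIC:
    def __init__(self):
        self.xapps = []

    def register(self, xapp):
        self.xapps.append(xapp)

    def on_measurement(self, cell_id, ue_id, signal_dbm):
        # The RIC fans RAN measurements out to xApps and relays their decisions downward.
        for xapp in self.xapps:
            decision = xapp.handle(cell_id, ue_id, signal_dbm)
            if decision:
                print(f"RIC -> RAN: {decision}")

class HandoverXApp:
    THRESHOLD_DBM = -110   # illustrative signal threshold

    def handle(self, cell_id, ue_id, signal_dbm):
        if signal_dbm < self.THRESHOLD_DBM:
            return f"handover ue={ue_id} away from cell={cell_id}"
        return None

ric = RIC()
ric.register(HandoverXApp())
ric.on_measurement(cell_id="C1", ue_id="U7", signal_dbm=-118)
```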

That, usually, is the province of the orchestrator layer, and that raises one of the points we have to consider regarding the future of the initiative.  I’ve already noted the point that traditional vendors don’t like O-RAN or SD-RAN much (the bodies involved in the initiatives admit as much).  There is strong support from operators, software vendors, and other telecom projects (like TIP), but there’s a big chunk of telco-land that this will have to work with, including orchestration.

The “big chunk” point is important, because many of the projects aimed at telco infrastructure have suffered from myopia; they see only what’s close.  Any carrier service is based on a fairly vast ecosystem, and that’s especially true of mobile services.  Vast ecosystems make difficult projects, so even telco standards tend to break things into pieces, and that’s particularly true of projects like SD-RAN, which has an explicitly limited scope.

Pieces mean integration, though, and that’s always a boondoggle.  Operators hate to have to piece stuff together, and hate finger-pointing when something goes wrong.  That makes it essential that a new piece of a puzzle like 5G infrastructure somehow gets fitted into the Glorious Whole, and the ONF material can’t really take on that task.

The dependency on other elements also puts a new concept at risk of “failure by association”.  The diagrams you see in presentations about SD-RAN focus on things like ONAP and NFV MANO for orchestration, and some of the diagrams show an NFV component within the 5G NR stack implementation.  These are at best big-carrier concepts, and at worst, useless burdens.  Certainly, enterprise 5G, which an ONF slide cites as a market with “lots of untapped potential”, isn’t going to see much tapping if all this telco-centric baggage is part of the implementation.

A kind of associative failure would be a shame here, because the concept behind the ONF RIC is insightful, forward-looking, and cloud-modern, at a time when none of those attributes can be claimed by much in the carrier space.  I’d love to see the whole of 5G Core done this way, rethought to true cloud-native-and-service-mesh form.  Perhaps some eager cloud developers will take that on.

I think that this open-source approach to O-RAN is potentially a great step, but it’s going to have to get more populist in order to be accessible to even Tier Three operators (and some Tier Twos) much less the enterprise.  People will want open-model hardware suggestions, and most of all, lighter-weight, less big-telco-standards-centric operations elements.  Get that in place and this initiative could do something important and profound to the 5G space.

5G Pricing, Progress, and Missions

The easiest way to kill a new product or service is to price it wrong.  Buyers always compare new stuff with older stuff, and accept a higher price only if they see a clear incremental benefit.  Sellers always want to recover incremental costs with price increases.  This challenge is now coming to the fore in 5G technology, and while I don’t think it will kill 5G to get it wrong, it could impact how we use 5G for years to come.

5G is, first and foremost, a part of a long and relatively orderly evolution of mobile network services.  As such, the biggest test it has to meet is becoming the dominant if not universal framework for cellular networks.  That means that users will have to accept 5G and acquire 5G-capable devices.  Switching to a 5G device will cost money, and if 5G services are also more expensive, users will drag their feet.

Operators, of course, are looking at 5G as an almost-complete transformation of mobile infrastructure, something that’s going to cost them a lot of money.  Most are already scraping the bottom of the cost-reduction barrel to get their profit per bit up, and so a big capital boondoggle like 5G infrastructure doesn’t look very attractive.  Why not jack up the price a bit?

Publicity for 5G encouraged operators.  You can’t attract readers to a story about technology that’s invisible to them, so most 5G stories have focused first on the much higher speed of 5G networks.  This, despite the basic fact that a smartphone user might have considerable difficulty seeing any difference between 5G and 4G services (5G wireline replacement would be different, of course).  If users somehow believe that 5G will make their videos faster or better, or that it will make them cool or desirable, then maybe they’d pony up some extra bucks for it.

At the basic service level, this isn’t looking like a winning approach.  Verizon, who had planned to charge $10 per month more for 5G, recently admitted they were dropping that price hike.  In a competitive market, there was little chance they could command the higher price when there were few (if any) differences in the service.

What this has done is to force 5G operators to focus their increase-my-revenue hopes on 5G applications other than smartphone mobile service.  Network slicing and IoT are typical examples of this, and as this story in Light Reading shows, at least one operator believes that the ability to offer very dynamic and flexible 5G pricing for specialized service/application mixes will be critical.  But will it really be the opportunity driver, or is pricing here still posing the same optimization risks we’ve faced with all products and services?

IoT services were operators’ first excursion into “new applications” for 5G, and operators still have lingering hopes that a bunch of Internet-connected “things” will each get a cellular service bill each month.  Could pricing help with that?  Sure, if the price were zero, which is what alternative technologies cost today.  There are certainly applications in industries like transportation where 5G might well offer enough of a benefit to create direct mobile service subscriptions, but this is not ever going to be a mass market.  It might be able to exploit flexible pricing, but pricing capabilities aren’t going to drive adoption.  Let’s move on.

The two specific things that the LR article talks about are network slicing and dynamic pricing for capacity.  Dynamic capacity pricing mirrors pricing policies based on scarcity, like the Uber example the article cites or off-hour versus peak-period network usage.  The value of dynamic pricing is difficult to assess in 5G, given that we have virtually no exposure to it, but other attempts to offer capacity at different prices, including the “turbo button” approach and seasonal or hourly-shift-based enterprise service pricing, have not been widely accepted or successful.

Dynamic pricing seems to depend on a different application model for success, and it’s not clear just what that model might be.  Some operators have suggested that things like smartphone updates and app purchases might benefit from the dynamic model, but if updates and purchases can be deferred, they could in most cases be done over WiFi, which has no cost to the user.

Network slicing is an example of both the potential value of dynamic pricing and the risk that the value may again depend on hypothetical applications.  We can offer examples of network slicing in use, including secure networks, company secure networks, isolation for IoT services, and so forth, but we still can’t demonstrate a broad business case for those examples.

What, then, are we seeing with operators like Dish?  I think it’s related back to the fact that operators have always been doubtful of the mass-market value of 5G, to the extent that they’d hoped to be able to limit their 5G costs with things like the 5G-over-4G-infrastructure “non-standalone” or NSA approach.  When it became clear that competition was likely to force them to full 5G, they were then constrained to rely on some sort of new application set to pay back their investment.  That, in turn, means they have to be able to do very flexible service pricing.

5G Core is one of those in-for-a-penny things, in short.  If you decide you need to do it, you have to do it in such a way as to be responsive to a set of opportunities that everyone writes about and nobody currently validates.  It’s difficult to adapt to things that are formless, so you try to create the most agile implementation possible in order to maximize the chance that the emerging and real opportunities will fall in the scope of what you’ve prepared for.

This raises an interesting question, which is whether this same sort of adaptation will be required for overall service, network, and operations/business management.  Whether the people involved want to admit it or not, today’s OSS/BSS processes tend to be tightly coupled to network behaviors, because those behaviors are what create services.  If a network is built to be flexible, agile, dynamic, is it also “behavior-less” in a sense?  What will drive the overall operations models of the future if there are no fixed behavioral relationships built into the network?

Some of the operator planners who have dissed OSS/BSS systems, to the point where they want to see them scrapped and a new model adopted, believe that this is the time when we should be thinking of driving networks from services and not the other way around.  They may well be correct, and this may be how model-driven operations really comes into its own.

Are We Focused on the “Wrong” Latency Sources?

Does lower latency automatically improve transaction processing?  That may sound like a kind-of-esoteric question, but the answer may determine just how far edge computing can go.  It could also help us understand what network-infrastructure applications like 5G would mean to mobile edge computing (MEC), and even what kind of edge-computing stimulus we might expect to see from microservices and the cloud.

“Latency” is the term used to describe the time it takes to move something from point of origin to point of action.  It’s a modern replacement for (and factually a subset of) the old popular term, “round-trip delay”.  We also sometimes see the term “response time” used, as an indicator of the time between when the user does something at a user interface device and when they receive a response.  The basic premise of networking and IT alike is that reducing latency will enhance the user’s quality of experience (QoE).  Like all generalizations (yes, perhaps including this one!), it’s got its exceptions.

When we talk about latency in 5G, and combine it with MEC, what we’re doing is suggesting that by lowering the transit delay between message source and processing point, either by improving network handling or by moving the processing point closer to the user, we can improve QoE and often productivity.  The reason this discussion has started to get attention is that it’s becoming clear that things like self-driving cars (which don’t really have much to do with latency in any case) are not jumping out of showrooms and onto the roads.  If 5G latency and MEC are to gain credibility from latency reduction, we need an application that benefits from it, and can be expected to deploy on a very large scale.

Everything that we need to know about 5G and MEC latency benefits can be summed up in one word—workflows.  The user’s perception of a response comes from a number of factors, but they ultimately come down to the flow of information between handling or processing points, from user to their logical end, and then back.  We forget this all the time in talking about latency, and we should never, ever, forget it.

Let’s take a silly example.  Suppose we have a 5G on-ramp to a satellite phone connection.  We have 5G latency to the uplink, which might be just a few milliseconds, and then we have a satellite path that’s a minimum of about 44 thousand miles long, then a return trip of the same length, and then the 5G leg.  The round-trip to the satellite is 88 thousand miles, which would take (over the air only, no account for relay points) 470 milliseconds.  The point is that, in comparison with that satellite hop, nothing likely to happen terrestrially is going to make a statistically significant difference.
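The arithmetic behind that number, counting speed-of-light propagation only:

```python
SPEED_OF_LIGHT_MPS = 186_282         # miles per second, in a vacuum
round_trip_miles = 88_000            # up to a geostationary satellite and back, twice

propagation_ms = round_trip_miles / SPEED_OF_LIGHT_MPS * 1000
print(f"{propagation_ms:.0f} ms")    # about 472 ms, the ~470ms cited above,
                                     # before any relay or terrestrial delay
```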

We can apply this to transaction processing too.  Suppose we have a transaction created on a mobile phone, one that hops to a cloud location for front-end processing, and then on to a data center for the final transaction processing.  The hop to the Internet from the phone might take 10 milliseconds, and then we might have an average of about 60ms to get to the cloud.  The hop to the data center might consume another 60ms, and then we have processing delay (disk, etc.) that would require 100ms.  At this point, we go back to the phone via the same route.  Total “round-trip” delay is 360ms (2 x 130ms of transit plus 100ms of processing in the data center).  This is our baseline latency.

Suppose now that we adopt 5G, which drops our phone-to-Internet latency down to perhaps 4ms.  We’ve knocked 12ms off our 360ms round-trip, which I submit would be invisible to any user.  What this says is that 5G latency improvement is significant only in applications where other sources of delay are minimal.  In most cases, just getting to the processing point and back is going to obliterate any 5G differences.

This, of course, is where edge computing is supposed to come in.  If we move processing closer to the point of transaction generation, we eliminate a lot of handling.  However, if we go back to our example, the total “transit latency” in our picture is only 260ms.  Edge computing couldn’t eliminate all of that, but it could likely cut it to less than 50ms.  Whether that’s significant depends on the application.  For transaction processing, the 210ms eliminated is at least slightly perceptible.  For closed-loop control applications, it could be significant.
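Here’s the whole example worked as a quick script, using the same illustrative numbers; the edge figure simply assumes transit drops to about 50ms as described above.

```python
def round_trip_ms(access_ms, internet_ms, cloud_to_dc_ms, processing_ms):
    # One-way transit is access + internet + cloud-to-data-center; it's traversed twice.
    return 2 * (access_ms + internet_ms + cloud_to_dc_ms) + processing_ms

baseline  = round_trip_ms(10, 60, 60, 100)   # 360 ms, the 4G-plus-cloud case
with_5g   = round_trip_ms(4, 60, 60, 100)    # 348 ms: only 12 ms better
with_edge = 50 + 100                         # assume edge cuts total transit to ~50 ms

print(f"baseline {baseline} ms, 5G only {with_5g} ms, edge {with_edge} ms")
```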

But there’s another point to consider.  If we look at edge computing as it is today, we find that it’s implemented most often via one of two architectural models.  The first model is the local controller model, where an event is cycled through a local edge element as part of a closed-loop system.  That’s really more “local” computing than “edge” computing.  The second model is the cloud edge, and that’s the one we need to look at.

The cloud-edge model says that in a transaction dialog, there are some interactions that don’t connect all the way to the database and transaction processing element.  Complex GUIs could create some, and so could parts of the transaction that do editing of data being entered, perhaps even based on accessing a fairly static database.  If we push these to the edge, we reduce latency for “simple” things, things that the user would be more likely to get annoyed with.  After all, all I did was enter a form, not update something!

But this raises what’s perhaps the biggest issue of latency in transaction processing in our cloud and cloud-native era, which is the interior workflow.  Any multi-component application has to pass work among components, and the passage of this work creates workflows that don’t directly involve the user.  Since their nature and number depends on how the application is architected, a user wouldn’t even be aware they existed, but they could introduce a lot of latency.

This may be the most compelling problem of cloud-native applications.  If we presume that we adopt a fully microservice-ized application model with a nice service mesh to link it, we’ve taken a whole bunch of interior workflows and put them into our overall transaction workflow.  In addition, we’ve added in the logic needed to locate a given component, perhaps scale it or instantiate it…you get the picture.  It’s not uncommon for interior flows in service meshes to generate a hundred milliseconds of latency per hop.  A dozen hops and you have a very noticeable overall delay.
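The arithmetic is blunt (illustrative numbers, matching the per-hop figure above):

```python
mesh_hops = 12
per_hop_ms = 100                          # the interior per-hop latency cited above
interior_ms = mesh_hops * per_hop_ms      # 1,200 ms added by interior workflows alone
print(f"{interior_ms} ms of mesh latency versus the 360 ms transaction baseline")
```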

What this means is that dealing with “latency” in the radio network (5G) or in the first logic step (edge or local computing) doesn’t mean you’ve dealt with latency overall.  You have to follow the workflows, and when you follow them into a modern service mesh and cloud-native deployment, you may find that other latency sources swamp what you’ve been worried about.

It also means that we need to think a lot more about the latency of the network(s) that will underlie a service mesh.  There’s a lot of queuing and handling in an IP network, and we should ask ourselves if there’s a way of reducing it, so that we can efficiently hop around in a mesh.  We also need to think about making service meshes highly efficient in terms of latency (the same is true of serverless computing, by the way).  Otherwise we could see early cloud-native attempts underperform.

Why Burying Costs in Bandwidth Might be Smart

We could paraphrase an old song to promote a new network strategy.  “Just wrap your opex in bandwidth, and photon your opex away.”  From the first, a lot of network design has focused on aggregating traffic to promote economies of scale in transport.  That has translated into equipment and into protocol features.  Many (including me) believe that well over three-quarters of router code is associated with things like capacity management and path selection.

How much opex could that all represent, and how much could we save if we just buried our problems in bits?  That would depend on what path to that goal we might take, and there are several that would show promise.

What’s the “shortest” or “best” alternative path in a network whose capacity cannot be exhausted?  There is none; all paths are the same.  Thus, it doesn’t make any sense to worry about finding an optimum alternative.  If we could achieve a high level of optical meshing in a network, we could simplify the task of route selection considerably.  That could reduce the adaptive routing burden, and the impact of route adaptation on operations processes overall.  An option, for sure.

Suppose we added electrical grooming layer technology to transport optics, so that we had a full electrical mesh of all the sites.  We now have a vastly simplified routing situation; everything is a direct connect neighbor, and so we only need to know what our neighbor can connect with at the user level.  That again vastly simplifies routing burden, so it’s another option.
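A small sketch of why route selection collapses when everything is adjacent: compare hop counts on a sparse ring against a groomed full mesh (a toy five-node topology; no real IGP metrics are modeled).

```python
from collections import deque

def hops(graph, src, dst):
    # Breadth-first search for the hop count between two nodes.
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

nodes = ["A", "B", "C", "D", "E"]
ring = {n: set() for n in nodes}                  # sparse topology: a ring
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    ring[a].add(b)
    ring[b].add(a)
mesh = {n: set(nodes) - {n} for n in nodes}       # electrically groomed full mesh

print("ring A->C:", hops(ring, "A", "C"))         # 2 hops; alternate paths matter
print("mesh A->C:", hops(mesh, "A", "C"))         # always 1 hop; nothing to optimize
```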

One problem with these options is that both of them rely on an assumption, which is that routing-layer functionality would be adapted to the new situation and would create a cheaper and easier-to-operate device.  That assumption flies in the face of the interest of router vendors in sustaining their own revenue streams.

A second problem is that any change in router functionality would almost surely have to be based on a standard.  Operators hate lock-in, and they’d likely see any vendor implementation of a new router layer as proprietary unless there was a standard behind it.  Since the interfaces to the devices would be impacted, even an open-source solution wouldn’t alleviate the operator concerns on this point.

Since we’ve had these two options available for at least a decade (they’re more practical now because of the declining cost of transport bandwidth that’s come with optical improvements), we have to assume that there’s no obvious, simple, solution to the problems.  Let’s then look for a solution that’s less than obvious but still simple.

Technically, the challenge here is to define a standard mechanism for the operation of the simplified router layer elements, and provide a means of its integration into existing networks so that fork-lift upgrades are not necessary.  To meet this challenge, I propose a combination of the OpenFlow SDN model, intent-model principles, and Google’s Andromeda SDN core concept.

Google built an SDN core and surrounded it with an open-router layer based on the Quagga BGP implementation.  This layer made simple SDN paths look like an opaque BGP network, an “intent model” or black-box implementation.  It has the classic inside/outside isolation, and for our purposes we could implement any open-model IP software or white-box framework to serve as the boundary layer.  The protocol requirements would be set by what the black box represented—a router in a router network, a subarea in an OSPF/IS-IS network, or a BGP network.

This model, which is an SDN network inside a black-box representation of a router or router network, is then the general model for our new router layer in a drive to exploit capacity to reduce opex.  In practice, what you’d do is to define a geography where you could create the high-capacity optical transport network, and possibly the electrical-layer grooming that would make every network edge point adjacent to every other one within that transport domain.

You’d then define the abstraction that would let that transport domain fit within the rest of the IP network.  If you could do an entire BGP AS, for example (as Google did), you’d wrap things up in a Quagga-like implementation of BGP.  If you had a group of routers that were convenient to high-capacity transport optical paths, and they didn’t represent any discrete IP domain or subarea, you’d simply make them look like a giant virtual router.
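A sketch of that black-box abstraction in code form, with invented class and method names; in practice the boundary would be a real BGP speaker (Quagga or its FRRouting successor) fronting an SDN controller, not a Python class.

```python
class SDNCoreAsVirtualRouter:
    """Outside, this looks like one giant router; inside, it's preset SDN paths."""

    def __init__(self, edge_ports, advertised_prefixes):
        self.edge_ports = set(edge_ports)
        self.advertised_prefixes = list(advertised_prefixes)

    # The only things the rest of the IP network ever sees:
    def neighbors(self):
        return self.edge_ports

    def advertise(self):
        return self.advertised_prefixes

    # Hidden implementation: every edge-to-edge path is a preprovisioned flow
    # across the high-capacity optical mesh.
    def _internal_path(self, ingress, egress):
        return [ingress, "optical-mesh", egress]

    def forward(self, ingress, egress, packet):
        assert ingress in self.edge_ports and egress in self.edge_ports
        return self._internal_path(ingress, egress), packet

core = SDNCoreAsVirtualRouter(["e1", "e2", "e3"], ["10.0.0.0/8"])
print(core.advertise())
print(core.forward("e1", "e3", b"payload")[0])
```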

One purpose of this boundary layer is to eliminate the need for the SDN central control logic to implement all the control-plane protocols.  That would add latency and create a risk that loss of the path to the controller could create a control-plane problem with the rest of the network.  Better to handle them with local implementations.

The benefit of this approach is that it addresses the need for a phase-in of the optical-capacity-driven approach to networking.  The risk is that it partitions any potential opex improvements, and of course having efficient operations in one little place doesn’t move the needle of network- and service-wide opex by very much.  This benefit/risk could easily create an unfavorable balance unless something is done.

A small-scope optically optimized deployment would generate a minimal benefit simply because it was small.  It could still generate a significant risk, though Google’s Andromeda demonstrates that big companies have taken that risk and profited from the result.  The point is that generally you have to have a large deployment scope to do any good, but that large scope tends to make risk intolerable.  Is there a solution?

It would seem to me that you need to consider optically dominated transformation of IP networks after you’ve framed out an intent-model-based management framework.  That framework would, of course, have to focus on electrical-layer (meaning IP) devices, and so it’s out of the wheelhouse for not only the optical vendors, but the network operations types who the optical vendors engage with.

When I ask operators why they don’t plan for this sort of transformation, what they say boils down to “We don’t do systemic network planning”.  They plan by layers, with different teams responsible for different layers of technology.  Things that require a harmonization of strategy across those layers are difficult to contend with, and I’m not sure how this problem can be resolved.

One possible solution, the one I’ve always believed had the most potential, was for a vendor to frame an intent-based management model and use it to do “layer harmonization”.  That hasn’t made much progress in the real world, in no small part because management itself forms a layer of network technology, and operators’ management processes are further divided between network management and operations/business management.
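To illustrate what I mean by layer harmonization through intent models, here is a minimal sketch, assuming each technology layer can be wrapped in the same small contract (accept an intent, report whether its SLA is met).  The classes are my own invention, not any vendor’s product or Blue Planet’s actual model.

```python
from abc import ABC, abstractmethod


class LayerIntentModel(ABC):
    """Uniform wrapper each technology layer would present to management."""

    @abstractmethod
    def apply(self, intent: dict) -> None: ...

    @abstractmethod
    def sla_met(self) -> bool: ...


class OpticalLayer(LayerIntentModel):
    def apply(self, intent): print("optical: provision wavelengths for", intent)
    def sla_met(self): return True


class PacketLayer(LayerIntentModel):
    def apply(self, intent): print("IP/electrical: adjust routing for", intent)
    def sla_met(self): return True


class Harmonizer:
    """Coordinates one change across layers instead of per-layer planning silos."""

    def __init__(self, layers):
        self.layers = layers

    def apply(self, intent):
        for layer in self.layers:   # top-down decomposition of the same intent
            layer.apply(intent)
        return all(layer.sla_met() for layer in self.layers)


ok = Harmonizer([OpticalLayer(), PacketLayer()]).apply({"capacity_gbps": 400})
print("harmonized SLA met:", ok)
```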

I’d hoped that Ciena, when they made their Blue Planet acquisition, would develop this layer-harmonization approach, and I still think they had a great opportunity, but the operators blame themselves more than they blame Ciena or other vendors.  They think that their rigidly separated technology enclaves make it very difficult for any vendor to introduce a broad technology shift, or an idea that depends on such a shift for an optimal result.  They may be right, but if that’s the case, then who is likely to drive any real transport-centric revolution?

Maybe the open-model network movement?  Open-model networking assembles pieces that are united in that they’re based on open hardware/software technology, but divided in just about every other way.  Somebody has to organize the new system of components into a network, and it might be the operators themselves or an integrator.  Whoever it is may have to deal with the whole layer integration problem, and that may lead them to finally take full advantage of the transformation that simple capacity could bring to networking.

Can Cisco and Other Network Vendors Navigate the Future?

Cisco had a disappointing quarter, and there’s no getting around it.  The question then is whether Cisco can “get around it” in future quarters, and the situation in that regard is really complicated.  It’s going to depend on just how radical Cisco is willing to be in facing the future.

Networking had a golden age, the period from the ‘80s through about 2017.  Like virtually all golden ages, it’s now passed away, and candidly it’s never, ever, coming back.  Yes, that’s right, we aren’t going to see any rebirth of double-digit sales growth for network equipment.  That doesn’t mean that network companies are doomed, only that the golden-age-driven business models are doomed.  The companies will follow the business models to the toilet only if they elect to stay with them too long.

Spending on network equipment has to be justified like everything else, meaning that the equipment has to produce enough benefit to cover costs plus the company’s internal rate of return margin (in total, their “target ROI”).  On the service provider side, the benefit is the revenue paid by customers, and for enterprises, it’s the productivity benefit created by the services.  The benefits driving procurement of network technology have been slowing for both enterprises and service providers, and that’s a problem that can’t be fixed on the network side.  It would require deep thinking about applications that could drive networks, and there’s been little or no organized effort to even encourage, much less create, that kind of thinking.

Cisco’s Robbins gives us our starting point, then: “If the past year has taught us anything, it’s the need to always be nimble.”  Actually, it’s the lesson of the last decade.  That’s how long the trends that are impacting network purchases have been acting on the market, and my surveys have shown it (and these blogs have documented it).  The market isn’t changing, it has changed.  There is now no going back, because the demand-side transformation needed could never be done fast enough to allow vendors to cruise along on their old box-centric sales plans.

Cisco has recognized this for some time, but what they’ve recognized is more the symptoms than the root causes.  I actually offered Cisco a deep picture of IT spending and how it changed over macro periods, under various stimuli, almost 15 years ago, and they didn’t want to hear it (rival Juniper didn’t either).  But even if Cisco can see only a symptomatic view of the future, they have at least recognized some of the band-aids those symptoms call for.  You have to move more to an as-a-service approach.  You have to do more software than hardware.  All that is very true, but it would sure help Cisco if they understood why this is necessary.  That understanding could guide their response, and let them address the risks each response creates, because every response does create a risk.

Moving to as-a-service is a response to a combination of buyer fears of lock-in and of stranded assets.  If I can buy connectivity rather than buying technology, and do so on a short-contract basis, then I’m in the driver’s seat.  Thus, I’d favor vendors who offered a non-traditional, expense-based approach to things, right?  Yes, and vendors who supply what the buyer wants will then lose account control and participate in their own commoditization.  That’s the risk.

The future of networking, at the low level, is virtualization, and in two dimensions.  First, virtual networks, which offer ad hoc connectivity within a community or overall, are the future of connections.  SD-WAN (which Cisco does offer) is an example, but Cisco’s own offerings are functionally primitive in comparison with those of feature leaders (like 128 Technology).  Second, virtualization of the platforms to create white-box-and-software networking is inevitable, and that’s a game that no current box-network vendor can possibly win.  Unless networking is transformed by new benefits, these trends will totally erase future opportunities for “network equipment”.

Does this mean Cisco and other vendors should lie down with a rose on their chests?  Not necessarily.  Let’s go back to a previous point:  Lock-in.  Nobody fears getting locked into the right answer, the best approach.  The problem arises when there is no acknowledged right or best answer.  The whole as-a-service thing is related to the fact that networks are really just pushing 1s and 0s in the end, which is not much of a basis for differentiation.  But the fact is that buyers, as much as sellers, see the decline in network benefits and would like to address that decline.  If networks can’t improve productivity, they’d like to know what can, not just dismiss network spending.  The first thing that vendors like Cisco need to do is to elevate their pitches to the level of network applications, not network technology.

There have been vendors who have attempted this, of course, but “network applications” can’t mean stuff that, speculatively and with unspecified support from unspecified products/vendors, might be useful.  What’s needed is a driver of actual benefits, realizable benefits that don’t depend on the cooperation of a vast and undeveloped community of products and vendors.  Or, the vendor has to take steps to build that community and develop the applications.

Every one of the waves of increased network benefits and spending has been related to an IT development, not a network development.  Even the growth of the Internet would have been impossible without the explosion in personal computing.  Cisco should embrace this, because it’s one of the very few network vendors that has IT credentials.  Cisco has under-exploited its server and software business, and fixing that problem should be the highest priority for Cisco and a message to the other vendors in the industry.  If the future is software, the spending is going to go to stuff that looks more like IT gear than like big (or little) network boxes.

What will future service users pay for, meaning pay more for?  Not bandwidth; we already know mobile users expect 5G to cost no more.  Users tend to cluster at the low end of wireline broadband services to keep prices down.  This, while they pay for streaming video, streaming music, games, and other experiences.  That’s what the future of “networking” is about, moving up from being the delivery truck to being what somebody actually bought.

The real competition for network vendors like Cisco is therefore the IT vendors, and in particular the software and platform vendors like Red Hat and VMware, and the public cloud providers.  The former occupy the space that network vendors have to get into in the long term, and the latter represent the as-a-service market model that Cisco and others are trying to adopt in the near term.  Can Cisco, or any network vendor, conceptualize this IT-centric vision, productize it, and then reorient its sales/marketing to make it a success?  Against established, credible competitors?

I think Cisco has to try.  There is no salvation for the big-box network business.  Eventually, open-model networking will permeate even the enterprise and the data center.  Cisco right now has a unique ability to bridge the past (the glory days) and the future (experience networks, not connection networks).  They can’t hold onto that unique position for long.

https://seekingalpha.com/article/4368167-cisco-systems-csco-q4-2020-results-earnings-call-transcript?part=single

Resolving Network/Application Co-Dependency

A lot of our tech advances are about carts and horses, chickens and eggs.  In other words, they represent examples of co-dependency.  5G is perhaps the leading example of this, but you can make the same statement about carrier cloud and perhaps even IoT.  The common issue with these things is that there’s an application ecosystem that depends on a technology investment, and the justification for that investment is the very application ecosystem that depends on it.  Is this a deadlock we can break?

The future always depends on the present, and evolves from it.  When you read about the things that 5G Release 16 is supposed to add to 5G Core, you see features that are very interesting, and that could in fact serve to facilitate major changes in things like “Industry 4.0”, which is aimed at increasing M2M connections to automate more of industrial/manufacturing and even transportation processes.  When you read about the more radical forms of IoT, you find stuff like universal open sensor networks that create new applications in the same way the Internet, an open network, has.  Reading and aiming isn’t the same as doing and offering, though.  Something seems to be missing.

And it’s the same thing that’s often missing, the horse that pulls the cart or the egg that hatches the chicken.  To mix in another metaphor from a song, we need to prime the pump.  The Internet grew up on top of the public switched telephone network (PSTN), and it grew out of a major business imperative for the telcos, which was how to create a consumer data service to eat up some of the bandwidth that fiber technology was creating, thus preventing bit commoditization.  We have the latter force, arguably, today, but we don’t have the former.  The question of our time is how we create a self-bootstrapping set of co-dependent technologies.

Evolutionary forces are one answer, the default answer.  Eventually, when there’s a real opportunity, somebody figures out a way of getting things started, usually in a form where startup risk can be controlled.  The problem with evolution is that it takes a long time.  What we need is a way of identifying opportunities and managing startup risk that doesn’t require a few decades (or centuries) of trial and error.  To get that, we have to identify credible opportunity sources, credible risks, and then manage them—with an architecture.

The hidden thing that we had in the Internet was just that: an architecture.  TCP/IP and other “Internet protocols” are objectively not the perfect solution to universal data networking today, but they were a suitable transformation pathway.  There were three important truths about the Internet protocols.  First, they were specified.  We didn’t need to invent something new because we already had an Internet when consumer data services were emerging.  Second, they were open, so any vendor could jump on them without having to license patents and so forth.  That widened the scope of possible suppliers.  The final truth was that they were generalized in terms of mission.  The Internet firmly separated network and application, making the former generalized enough to support many different examples of the latter.

Architectures, in this context, could do two critical things.  First, they could set a long-term direction for the technology components of a complex new application or service.  That lets vendors build confidently toward a goal they can see, and for which they can assign probable benefits or business cases.  Second, they could outline how the technology pieces bootstrap, ensuring that there’s a reasonable entry threshold that doesn’t require paying for the entire future infrastructure on the back of the first application.

I’ve noted before that one of the issues we have with architecture these days is that telcos think a lot about migration and a lot less about where they’re migrating to.  You might think this contradicts my point about 5G, but while 5G outlines an “architecture” for wireless infrastructure, it conveniently declares application infrastructure out of scope.  NFV did that with management functions, and it ended up being unmanageable in any realistic sense.  So, there are two things that we’d need to have in a real architecture for the future of 5G services.  We need a long-term service vision, and we need an architecture defining the necessary elements of the technology ecosystem overall, not just the piece the telcos want to carve out for themselves.

I think that IoT, at least in the sense that it would be a specific 5G application, could be architected effectively by defining both a service interface (protocol) and data formats.  We have candidates for the first of these, but mostly for low-power 5G, which isn’t what the telcos, at least, are hoping for.  Having a strong definition of what a 5G mobile-sensor protocol and data format would be, where “strong” means well-defined and accepted, is essential to get that ecosystem moving.
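As an illustration only (this is not a proposed standard, and every field here is hypothetical), a “strong” sensor data format could be as simple as a fixed, openly published record with an agreed wire encoding:

```python
from dataclasses import dataclass, asdict
import json
import time


@dataclass
class SensorReading:
    sensor_id: str      # globally unique ID of the 5G-connected sensor
    sensor_type: str    # e.g. "temperature", "occupancy", "position"
    timestamp: float    # epoch seconds at measurement time
    value: float
    unit: str
    location: tuple     # (lat, lon), because mobile sensors move


def to_wire(reading: SensorReading) -> str:
    """Serialize to the wire format every application could rely on."""
    return json.dumps(asdict(reading))


print(to_wire(SensorReading("truck-17-temp", "temperature", time.time(),
                            3.5, "C", (40.7, -74.0))))
```

The specific fields matter far less than the fact that they are well-defined and accepted; that is what lets application developers build without negotiating formats sensor by sensor.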

If we wanted to go beyond just getting things moving, we’d have to consider the relationship between sensors and services, and then between services and service providers.  You could build an IoT application with a simple “phone-home” approach, meaning that the application would be expected to be directly connected to sensors and controllers.  That’s not likely to be the best approach for broad IoT adoption.  A better approach would be to define a “service” as the digested information resources contributed by a set of sensors, and the sum of control behaviors available through a similar set of controllers.
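A sketch of that service abstraction might look like the following; the digest logic and names are invented to show the shape of the idea, not to define it.  Applications consume a digest and invoke behaviors; they never phone home to individual devices.

```python
from statistics import mean


class IoTService:
    """Digested sensor information plus the control behaviors of a controller set."""

    def __init__(self, name):
        self.name = name
        self._readings = {}      # sensor_id -> latest numeric value
        self._behaviors = {}     # behavior name -> callable on a controller set

    def ingest(self, sensor_id, value):
        self._readings[sensor_id] = value

    def register_behavior(self, behavior, handler):
        self._behaviors[behavior] = handler

    def digest(self):
        # What applications actually see: summarized information, not raw devices.
        vals = list(self._readings.values())
        return {"count": len(vals), "mean": mean(vals) if vals else None}

    def actuate(self, behavior, **kwargs):
        return self._behaviors[behavior](**kwargs)


svc = IoTService("loading-dock-climate")
svc.ingest("temp-1", 4.0)
svc.ingest("temp-2", 6.0)
svc.register_behavior("set_setpoint", lambda degrees_c: f"setpoint={degrees_c}")
print(svc.digest())                               # {'count': 2, 'mean': 5.0}
print(svc.actuate("set_setpoint", degrees_c=5.0))
```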

This raises two challenges, the first being simply defining these services and the second being establishing an information framework for their use.  The latter then relates to the relationship between services and service providers.  Would operators take on the responsibility for turning 5G-connected sensors and controllers into services, or would they simply provide the connectivity?  If the latter, then we’re still stuck in co-dependency.  If the former, then operators would have to accept what’s essentially an OTT service mission.

For “Industry 4.0” things could be, if anything, a little more complicated.  5G, at the technology level, is straightforward.  Industry 4.0 would have to specialize based on the specific mission of the “industry” and the company within it.  Presuming that we could target applications at the industry level, we’d still have dozens of different combinations of technology elements to deal with, which would mean having dozens of blueprints or templates, each defining a basic business suite that would then be specialized as needed for a given company.
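A blueprint/template approach could be sketched like this, with entirely invented template contents, just to show the specialization step from industry blueprint to company deployment:

```python
# Hypothetical industry blueprints; real ones would enumerate far more elements.
BASE_TEMPLATES = {
    "manufacturing": {
        "connectivity": "private 5G",
        "elements": ["sensor-telemetry", "machine-control", "quality-vision"],
        "latency_ms": 10,
    },
    "transportation": {
        "connectivity": "public 5G",
        "elements": ["fleet-tracking", "cargo-telemetry"],
        "latency_ms": 50,
    },
}


def specialize(industry: str, overrides: dict) -> dict:
    """Start from the industry blueprint, then apply company-specific choices."""
    blueprint = dict(BASE_TEMPLATES[industry])   # shallow copy is enough here
    blueprint.update(overrides)
    return blueprint


print(specialize("manufacturing",
                 {"latency_ms": 5,
                  "elements": ["sensor-telemetry", "agv-control"]}))
```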

Who does this?  Software companies could work at it, but in today’s world we’d probably want to assume that an open-source initiative would drive the bus.  That initiative would have to start with a broad application architecture and then frame industry-centric applications within the framework of that architecture.

We can’t assume that solving a network problem, removing a network barrier, will automatically drive all the application development that the network would facilitate.  We can’t even assume that the network will advance in deployment scope or features without specific near-term applications.  We can’t assume those applications would develop absent the needed network support, so we’re back to that co-dependency.  Standards haven’t solved it, and neither has open-source, so we may need to invent a different kind of “architecture forum” to align all our forces along the same axis.

When Will This All Be Over?

When will this all be over?  That’s the question that’s on everyone’s mind in these pandemic days.  There are really two answers, of course.  One relates to how long it will be before human interactions won’t be significantly hampered by the virus, and the other relates to how long it will be before behaviors return to normal, even given the lessening of risk.  The latter is the one I’m best qualified to talk about, but some general comments on the first one are required to level-set.  Remember, I’m not a doctor and these are simply my views as a researcher.

It’s very unlikely that a vaccine will be available broadly before the first quarter of 2021.  First responders could get one a little sooner, but for most industrial countries, at least, the vaccine probably won’t be available to the mass population until February or early March.  Then it becomes a question of how quickly people get the vaccine, how effective it is, and what level of natural immunity is really present in the population.

In the meantime, we are seeing more and more treatments for COVID, treatments not aimed at eliminating the disease but at reducing the rate of complications or fatalities.  These are already available, and will be available in greater numbers through the fall and winter, to the point where some medical experts I’ve chatted with tell me that the fatality rate in the industrial world will likely fall to about 10% of the spring peak rate, making COVID no more serious than a bad flu season.  However, these treatments are less likely to reduce hospitalizations; my contacts expect hospitalizations to decline by no more than about a third.

The imponderable, on the medical side, is how cooperative the population will be.  Today, US surveys show only about two-thirds of people would get a vaccine if one were offered.  Other countries seem to have much higher levels of vaccine acceptance, up to 90%.  Mask wearing and social distancing, which now seem effective in halting the spread of the disease, have been politicized in the US but are better accepted elsewhere.  If we do all the right things, then we could see rapid and real recovery even this fall.  If not….

Where all this leaves us is that there will still be a perceived (and real) risk of contracting COVID well into late fall and winter.  People will almost surely tend to avoid risk by altering behavior, in proportion to what they believe their real risk to be.  The young, obviously, seem to be the bigger risk-takers here.  Among my own contacts, I found it interesting that people under 30 and unmarried were twice as likely to disregard or minimize COVID risks as those who were married or older.

Let’s leave the medical side now.  Industries that necessarily expose people to masses of others who may have COVID are going to be affected by the perceived risk of contagion.  Travel, sports, theaters, dining, and related businesses aren’t going to see full recovery until the perception of risk is much lower, meaning, I suspect, the April/May 2021 timeframe.  Even then, most are unlikely to return to their old levels.  People have learned new behaviors, become conditioned to new fears.

We’re currently in a revenue/spending positive-feedback slump.  Companies see sales fall, so they spend less, which causes other companies to see sales fall, and so forth.  This kind of situation prevails until the cycle is broken by something, and the something that’s already doing it is a shift in behavior and business methods that accommodate the risk of being proximate to a lot of potentially infected people.  It’s that adaptation to conditions that then becomes the norm, and it’s the new norm that determines whether there’s a fast exit to the old norm, or whether things change…maybe forever.

Retail may be the best example of this principle.  Smaller retailers have a lower probability of surviving extended virus impact, and even the larger retailers have been hit with traffic reductions and lower sales.  As a result, many have stopped stocking less-popular items, and that has forced buyers to turn to online sources.  Amazon’s interest in turning empty department-store sites in malls into fulfillment centers is proof that online sales are now in our blood.  I’ve found for myself that about a third of the things I used to go to a “big box” store for are now out of stock there, so I’m buying things online that I never bought that way before.  So are others.

And what about office businesses?  Employers have found that many of their employees are actually more productive working remotely than they would be in the office, and almost all have told me that “business travel” was far less efficient than online meetings and calls.  This, without a significant number of truly innovative tools to aid in virtual meetings and collaboration.  There are products in the pipeline that will do much better, and those products will further shift the balance away from in-person collaboration.  Without a benefit driver for worker collectivism, what’s the role of an office?

Remember that we don’t have to prove that an office is a lot better than remote work, or even better at all.  We just have to prove that remote work, on balance, is preferable to accepting the risk of having a big chunk of the workforce come down with COVID.  Same with shopping, same with entertainment.  Even travel is going to be impacted for years.  People are very antsy over the image of sitting on an airplane with a bunch of maskless companions, or of being delayed and possibly missing connections because a plane had to return to the gate over passenger non-compliance with mask rules.  Why not drive?

We are re-learning our living patterns.  Many countries in Asia that I’ve visited over the years embraced wearing masks just to reduce routine infections.  How many in the US and other industrial nations will do the same, especially if there’s a lingering COVID risk for years, even decades?  And this, without considering that there are plenty more new viruses where COVID came from.  There’s been long-standing scientific speculation that most of our major diseases crossed over from animals to humans because of specific conditions of close contact, often including butchering in proximity to people.

Tech is going to adapt to this, and in fact it’s likely to see a bit of a resurgence.  There are many software products that have failed, along with their companies, because new features could no longer deliver new benefits.  Look at Novell and NetWare.  If “work” is now virtual, then much of the technology that supports it will have to be optimized for that new situation.  Same with shopping, same with entertainment.  Might people who no longer feel safe at concerts buy big TVs with powerful and feature-rich sound systems?  It seems likely.

Perhaps the biggest reason we can’t say that this will all be over soon is that it shouldn’t be.  As bad as COVID has been, it could have been a lot worse.  We could, just as easily, have had a crossover virus with Ebola’s fatality rate and the contagion of the flu or measles, one that killed half the human race.  It would be nice to think that everyone will have learned from this, and that the basic vulnerabilities and risk factors will be addressed effectively.  Nice, but far from certain.  Until we see that something positive and systemic has been done, we’re going to be more standoffish in our behaviors overall, and how we work and what we work on and with will simply have to adapt…as they are already adapting, and as we are adapting too.