Balancing the Forces Driving IP into the Future

We live in an IP world, at least insofar as how the network appears to users.  IP remains a largely adaptive protocol, one that makes traffic engineering more difficult.  Some of the proposals to modernize IP networks, including SDN/OpenFlow, have focused on centralizing route control for the purpose of improving traffic engineering and potentially reducing “convergence time” as a network adapts to a failure mode.  We may now be seeing developments that combine the revolutionary aspects of separating the IP control plane with the kind of evolutionary changes that IP’s broad use almost dictates.

There have long been initiatives to improve how IP itself works, including multi-protocol label switching (MPLS) and the SPRING initiative (Source Packet Routing in Networking).  Both concepts evolved from earlier initiatives and ideas.  Have they, or will they, relieve the pressure to transform IP for better traffic efficiency?  Will some form of SDN do that?  Will something make the whole topic moot?  That’s what we’ll look at today.

IP networks are based on connectionless, per-packet forwarding.  There is really no “route” in vanilla IP, only a series of hops to the next-best place to be, which eventually gets you where you need to go.  Topology and conditions advertising build a routing table in each device, and that table is then used to determine where to send the packet next.  The problem with this is that you don’t really know where anything is going, which makes it hard to optimize resources.  SDN attempted to address this, in its OpenFlow variation, by centralizing the control of forwarding-table entries.  That way, with a god-like view of the overall topology and conditions in the network, the central controller could work to balance traffic optimally.
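To make that concrete, here’s a minimal sketch (in Python, purely illustrative and not any router’s actual code) of the per-hop decision I just described: each device consults its own table, picks the most specific match for the destination, and sends the packet on its way; no end-to-end route is ever stored.  The prefixes and next-hop names are invented for the example.

```python
import ipaddress

# Hypothetical routing table built by topology/conditions advertising.
ROUTING_TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), "to_router_B"),
    (ipaddress.ip_network("10.1.2.0/24"), "to_router_C"),
    (ipaddress.ip_network("0.0.0.0/0"),   "default_uplink"),
]

def next_hop(destination: str) -> str:
    """Longest-prefix match: the most specific matching entry wins."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTING_TABLE if dest in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("10.1.2.7"))   # -> to_router_C (more specific than the /16)
print(next_hop("192.0.2.1"))  # -> default_uplink
```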

There is, and in fact has been, another way of using a single source of routing intelligence.  Packets all enter at the edge, by definition.  Suppose we have a database (forget how it’s created for the moment) containing the labels of nodes/trunks that make up all the possible paths from source to destination.  If, on entry, a packet’s destination is looked up and the list of labels associated with its route is prepended to the packet, the packet can get from source to destination simply by following that list of labels.  That’s “source routing”.

Source routing is a very old concept, one that was adopted by both Frame Relay and ATM standards a couple of decades ago.  In its most modern form, source routing assumes that a network consists of a series of linked black-box domains.  When an “item” (a packet/message/session, depending on just what we’re talking about) has to connect across these domains, the source of the item determines the best path as a series of domain hops and codes the packet header accordingly, so the item follows the route.  At each domain entry, the domain edge device pops its own header segment and replaces it with instructions on how to transit the domain to the designated next domain on the vector.

The nice thing about this approach, from the perspective of modernizing networks, is that it allows a network to be divided into domains that can represent different implementations.  When a packet enters the network, a stack of labels defines its path, but a label can represent a domain, too.  That allows different implementations to exist within each domain, while retaining source-side path selection.

Obviously, this corresponds at least somewhat to the notion of segment routing in IP today, and SPRING is a refinement (or implementation) of source routing concepts.  The advantage of source/segment routing is that packets can be directed along a specific route, which means that it’s possible to do more traffic engineering than would be possible in a purely adaptive IP network, where packets work their way to their destination through the interaction of a series of dynamically defined routing tables they meet along the way.

MPLS is a younger concept; the first draft goes back to about 1997.  The roots of MPLS really came about because of a startup, Ipsilon, which proposed to use ATM networks as the “core” of an IP network.  When the Ipsilon edge device recognized a session (which they called a “persistent flow”), it would set up a virtual circuit to the destination’s Ipsilon edge device and forward along that, bypassing all the interior IP stuff.  Cisco picked up the concept as Tag Switching, and that became the basis for a working group that produced MPLS.

MPLS is also loosely based on source routing, but its goal was different from Ipsilon’s; MPLS paths or tunnels were designed more to carry aggregate traffic, meaning MPLS was first adopted to support traffic engineering.  It’s also been adopted as a means of controlling actual lower-layer behavior, including the optical layer, via “generalized” MPLS or GMPLS.  Most think of MPLS as a new protocol layer below IP and above any physical or data-link protocols.

There are two significant differences between segment routing (SPRING) on one hand and MPLS on the other.  The first is the control plane, and the second is the way that forwarding state is maintained.  MPLS is more IP-ish, and segment routing is a bit more source-routing-ish.

In MPLS, the routers involved in MPLS forwarding maintain forwarding state (think of it as a set of tables) that is exchanged among them via LDP or RSVP-TE, which is in effect an MPLS control plane.  These tables form a different forwarding rule set from normal IP, so to make data forwarding work in MPLS, their states are synchronized via the control-plane exchanges (LDP, RSVP-TE).  In segment routing, the packets themselves carry a label stack (source routing) that identifies the forwarding state explicitly on entry.  The intermediate segment-routing elements just process the source route, so they don’t need to maintain forwarding state for each tunnel.
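A toy sketch may help show the contrast.  The code below (again illustrative Python, not an actual SR-MPLS implementation) has the ingress edge build the label stack once from a hypothetical path database, while interior nodes simply pop their own label and follow the next one; notice that no per-tunnel table exists anywhere in the interior.

```python
# Hypothetical path database available at the ingress edge.
PATHS = {
    "dest_Z": ["node_A", "node_B", "node_Z"],
}

def ingress_encapsulate(packet: dict, destination: str) -> dict:
    """Edge behavior: look up the route once and prepend the label stack."""
    packet["label_stack"] = list(PATHS[destination])
    return packet

def interior_forward(packet: dict, my_label: str) -> str:
    """Interior behavior: pop our own label and forward toward the next one.
    There is no tunnel table here -- the state rides in the packet."""
    stack = packet["label_stack"]
    if stack and stack[0] == my_label:
        stack.pop(0)
    return stack[0] if stack else "deliver_locally"

pkt = ingress_encapsulate({"payload": "data"}, "dest_Z")
print(interior_forward(pkt, "node_A"))  # -> node_B
print(interior_forward(pkt, "node_B"))  # -> node_Z
```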

The segment routing approach lets you do path computation (building the label stack) as a cooperative function among edge routers, or through a central controller.  It’s thus philosophically a kind of bridge between IP and OpenFlow-modeled SDN, and that makes segment routing potentially critical in linking the IP-modeled network with a network in which SDN is used to build the interior routes.

Source routing, in the most flexible form I described, is good for another reason, which is that it could be a means of integrating different forwarding approaches into a common IP network.  If, in my example earlier, we think of the “domains” as different network forwarding models (some IP, some SDN, or whatever), then each is a black box.  We push a route at the domain level, and have each domain pop its contribution and convert it to something internal (SDN, MPLS, SPRING), and so on until the next domain entry point.

Right now, source routing is mostly an interior behavior, meaning it operates within a network, and extensions to the interior gateway protocols (OSPF, IS-IS) have been proposed.  BGP-LS Extensions for Segment Routing allow the necessary forwarding-state information to be carried between AS domains via BGP.  However, for this to work the SR extensions to BGP have to actually be implemented, not just defined.

What I’m not clear on, frankly, is exactly what BGP-LS is advertising.  As I see it, there would be two options.  It could advertise a label representing the entry-to-egress path in aggregate, in which case it would have to translate that to an IGP label stack on entry.  It could also advertise its segment structure overall, in which case the source route from the originator would/could include the segment label stack end to end.  The latter approach would necessarily give another AS some control over traffic engineering, which I think is unlikely to be met with much enthusiasm.

Everything in BGP, pretty much, is optional under policy control.  That gets us back to implementation.  Not every ISP will elect to support any form of source routing, and if there are indeed different approaches to what might be advertised by BGP-LS, then there are different implementations that could be chosen.  This might mean that source routing, and even the BGP-LS extensions to support it, would be most valuable in large data-center-and-VPN combinations, where everything is within a single true network administration, even if there are multiple ASes involved.

The central point to all of this change is that we’re evolving our sense of how IP networks are built, but perhaps more through “Brownian movement” than through a specific drive toward a goal.  The segment routing initiatives seem to bring out a couple of important points, particularly that a different model of the routing control plane is indeed helpful and that source routing could be applied to IP to gain many of the benefits claimed for OpenFlow SDN.  The initiatives just don’t seem to be pursuing these points directly, and segment routing could bring us a lot of benefits if some of its critical points were addressed more systematically.

But let’s now get to that “mootness” point I raised at the start of this blog.  Traffic engineering is one of two possible responses to congestion.  The other is to bury the problem in capacity.  As fiber transport improves, and more significantly as transport economies of scale grow, aggregating traffic is less likely to create congestion even with primitive mechanisms to allocate capacity.  We could, at some point, build a more meshed, more efficient, higher-capacity transport network, and that could take a lot of the pressure off traffic engineering enhancements to IP.  It could also reduce the benefit of the alternative approaches.

We can do virtual networking above IP—look at SD-WAN.  If we can minimize traffic engineering’s value through capacity augmentation, we might be able to degrade or even eliminate the MPLS mission entirely.  But if we create a formal virtual-network mission, we’ll then have to decide how that mission is harmonized across the world, across different implementations, across different sets of needs.  It’s a challenge not unlike “network slicing” in 5G.

I can’t give you a firm proof of this, but I think that segment routing is going to be important in the future, because I think we’re in for a long period of divergence in approaches to networking before we see convergence.  That alone makes segment routing valuable.  Segment routing in any form isn’t going to sweep MPLS aside; there are just too many things done with it and too large a skills commitment to it.  I do hope that what segment routing does accomplish is making the way we structure IP networks friendlier to the introduction of new technologies like SDN, and it does seem to be doing just that so far.  This could be a very important space to watch.

When Augmenting Reality Loses Its Touch

Augmented reality is a viable focus for a new model of worker empowerment.  That’s one reason why it’s important to look at places where the model appears to have failed, even before the concept of AR in empowerment got started.  Protocol gave us an opportunity to do that with an article on Daqri, a company that raised a big war chest and then fell from grace.  How did that happen, and what might Daqri’s experience teach us about next-gen empowerment?  Let’s see.

The Protocol tale of Daqri woe deals with a lot of business issues, and they’re surely valuable here.  It seems clear that the big influx of dollars came along with a total buyout of Daqri, a situation that essentially gave all the founders of the startup an exit before they’d really even had an entry, market-wise.  It also made the company totally reliant on the insights of the investors, which isn’t usually a good way to get a tech concept going.

I want to deal with the broader technology and focus issues in this blog.  The business issues may well have contributed to the company’s going under, but could Daqri have been saved by a good business framework?  Only if the technology was suitable.

Augmented reality, obviously, is about augmenting reality.  That means mixing real-world viewpoints with something generated from data and applications.  There seem to be three logical pieces to all AR.  First, something that lets the data/applications know what the real-world context is.  Second, something that frames the data/application information in that context, and finally, something that actually creates the display.  The balance of importance of these three things can vary, but it sure seems like a measure of each is always necessary for AR to work.
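To put my three pieces in concrete terms, here’s a minimal sketch (hypothetical Python, not Daqri’s or anyone else’s actual design): one function senses the real-world context, one frames the relevant information in that context, and one creates the display.  Every function name and data item is a placeholder for illustration.

```python
def sense_context(camera_frame: bytes) -> dict:
    """Piece 1: figure out what the real-world context is (e.g., a scanned QR code)."""
    return {"object_id": "pump-37", "location": "line-2"}  # hypothetical result

def frame_information(context: dict, knowledge_base: dict) -> dict:
    """Piece 2: pull the data/application information relevant to that context."""
    return knowledge_base.get(context["object_id"], {"note": "no data"})

def render_overlay(info: dict) -> str:
    """Piece 3: create the display the worker actually sees."""
    return f"OVERLAY: {info}"

KNOWLEDGE_BASE = {"pump-37": {"status": "needs inspection", "last_service": "2019-11-02"}}
print(render_overlay(frame_information(sense_context(b"..."), KNOWLEDGE_BASE)))
```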

Conceptually, Daqri was a surfer on the AR wave, starting with the idea of using QR codes on objects to superimpose something on a smartphone field of view when the code was scanned.  The QR code itself is the first of my three AR requirements, the database of QR-code information the second, and the phone the third. The early target applications were very consumeristic, and it was quickly obvious to everyone that the market for consumer AR was still a long way off.  The shift to a business focus was the result, and that’s when the bigger funding influx came.

Here’s a quick lesson in itself.  How many times (most recently in my blog yesterday on AT&T) have we seen vendors shift from a broad-market target to a business-market target because they let the hype in the market lead them too far ahead of reality?  Yes, it’s often easier to make a business case for something enterprise-targeted, because an enterprise’s value propositions are purely financial.  Consumers have more subjective benefit cases, always exciting because they’re always susceptible to hype.  Just saying….

Moving back to Daqri, the logical thing to have done at this point was to look at the general model of an AR-based business case, encapsulated in my three points.  Businesses want worker productivity and empowerment, so the best approach would have been to ask how AR might empower workers.  This, I think, might have led to a top-down exploration of the notion of AR as a bridge between the physical world in which the worker and the work coexist, and the information world that IT systems, applications, and data could create.

Instead, what Daqri really did was focus on the device that created the AR experience, the last of my three requirements for AR.  Their M&A, fueled by that windfall funding, focused on building the AR experience, too.  There was little or no effort expended on the architecture of the two-different-worlds model of worker empowerment via AR.  This architecture, you may recall, has been the subject of a number of my “information fields” blogs.  Without an architectural model, focusing on the AR device was like focusing on a killer billboard beer ad without having any beer to sell.

What they came up with was an AR helmet, a device that would surely have been a wonderful platform but for the fact that it was going to cost well over ten thousand dollars per unit.  This, at a time when businesses were going for bring-your-own-device (BYOD) to avoid buying workers smartphones.  The helmet would require a monumental business case to justify, and there was still no real application framework to exploit it.  The later decision to focus on glasses rather than helmets didn’t help as much as hoped; even the glasses cost more than three iPhones.  So, to make a long story short, down they went.

Protocol here gives a nice summation: “But ultimately, the demise also has a lot to do with factors that aren’t all that unique to Daqri: Making hardware is difficult, as is inventing products for a use case nobody yet wants.”  But that kind of understates the problem; there was no use case to invent the product for.  A productivity tool, more perhaps than any other kind of tool, is contextual.  It has to fit into the world of worker and work, and at the same time into the information and empowerment world.  That’s not a job for a display, it’s a job for a kind of software application that we can glimpse today but which we can’t just go out and buy.

The other point is that Daqri’s approach could never generate a “use case”, only a framework in which use cases could be developed.  Most of the architectural heavy lifting of a two-worlds productivity enhancement framework using AR would have been generally useful.  A little specialized tweaking could then have presented a prospect with a real use case, with modest effort.  If somebody gave me three hundred million, I’d sure believe I could generate that productivity enhancement framework.  I’m equally sure thousands of others could do it too.

Did Daqri miss the mark because the state of the art, at the critical time in 2013 when they tried to shift their focus, was simply not advanced far enough?  Kubernetes’ initial release was the following year, and the concept of a Kubernetes ecosystem, cloud-native, service mesh, and all the other good DNA, was still in the future.  Did they miss it because they never saw it, instead thinking the realization of AR was the device that displayed it?  Did they just get caught up in promoting the concept to get a profitable flip, and never quite got to the realization?  We’ll probably never know, but that’s not the question we need to answer.

The real question?  Will others repeat the failure, for the same reason(s) or new ones of their own invention?  It’s not that doing the right thing in AR is “difficult” in a technical sense, but it may well be difficult in a business sense.  Who will fund a startup well enough to take that productivity enhancement framework concept and run with it?  Who will design it?  Who will wait, through the development and prototyping, to buy it?  I think you could get the answer for three hundred million, but could you productize it successfully for that?  A harder question.

Remember the old song “Born too Late?”  Daqri may have been born too early, and we may still be too early.  Open-source software is advancing, and with the advance (and with some false starts and disorder) it’s inventing the things that are needed, because it’s need that drives it.  Maybe as soon as this year, or maybe several years down the line, the critical pieces will emerge, and that will cut the cost and risk of developing the framework Daqri needed, but never had.

Can AT&T Navigate Its New Content-Centric Path?

Barron’s ran an interesting interview with AT&T heir-apparent John Stankey, now CEO of WarnerMedia, and it offers some views on how streaming and 5G might transform the telco space.  I think these same factors will influence the direction of cable companies, and of course will then pull through influence on the vendor space.  We have to take all public comments like this with a grain of salt, of course, and I intend to provide at least some of that!

The first statement Stankey makes sounds a lot like one Cisco has been making, or at least suggesting.  Business, he thinks, will lead the business case development for 5G.  This is a return, says Stankey, to the norm in technology, with a new development first appearing in the business service space and then transferring to the consumer space.  There’s some sense in this view, but less sense in what he’s suggesting will drive those business models.

Stankey says that things like distributed manufacturing, with 5G mass-distributed sensors, will be the applications that create the early business case, and I have two problems with that view.  First, factory floors are well within the network range of current technologies like WiFi and even ZigBee, and thus it’s hard for me to see how 5G is really necessary to connect them.  Especially if you assume you have to pay for those devices to be networked, which is my second point.  If some new technology like 5G were to be needed, I think it would likely be an unlicensed form with a more local scope.  WiFi 6 would serve here, likely as well as unlicensed 5G.

The thing Stankey isn’t saying, and that could actually create a business case for 5G in the enterprise, is that worker productivity gains have driven all past enterprise IT spending waves, and will surely drive the next one.  It is very possible to link mobile empowerment with 5G services to worker productivity improvement—see all my blogs on the topic, including the service-centric “information fields” model.

Perhaps AT&T is quiet on the real opportunity to build compelling business cases because they’re too complicated to explain to the press, or perhaps because they’re dependent on ecosystem-building activities that are more likely to promote cloud software players like Red Hat or VMware than to promote service providers.  A simple “they’ll pay monthly to connect factory sensors via 5G” is a better slant for a financial publication.

The important point here, I think, is that AT&T is admitting that consumers are not going to pay more for 5G per se; they might pay more if 5G were linked to something that was compelling at the application/service level.  Thus, we need to pin early incremental 5G revenue on businesses, who have at least a theoretical path toward 5G business cases.  The current push to get 5G out there doesn’t account for any path to those business cases at all, so if real incremental 5G revenue is needed, then we need to get on the job and create something above the network.

Stankey reinforces the fact that consumers need a real-world value-add connection to new capabilities through 5G, not just more capacity.  He says that consumers won’t care about the wavelength of 5G, or whether “I can give you 110 megabytes and they can give you 105”.  He makes the point that “you’re not going to be able forever to sell connectivity based on speed.”  Thus, AT&T’s WarnerMedia activity, perhaps especially HBO Max.

For the consumer space, AT&T seems to believe that the HBO Max platform will be the unifying mechanism to deliver whatever it is that consumers will pay for, and that it’s revenue from this platform that will realize the incremental revenue of 5G.  There’s truth to that, and some fairly aggressive hopeful thinking.

The truth part is clear; nobody is really interested in paying simply to push bits to a mobile device a little faster.  In most cases, any incremental speed advantage of 5G could be seen only in the speed of a download; streaming video doesn’t require higher speeds and if you postulated 8k video delivery to smartphones, you’d have to wonder if any could display it or if people could see the difference on a small screen.

The hopeful part is that 5G will have any role in making the platform successful.  I think AT&T realizes that the HBO Max story can be told just as well for the mobile user of 4G as for 5G.  Thus, they either have to believe that there’s no real link between 5G and HBO Max other than that AT&T has to deal with them at the same time, or that millimeter-wave 5G fixed wireless is the link.

The point Stankey makes that HBO Max “is a high-value, low-priced service that we believe most U.S. households will want. It starts with a subscription video service, the HBO brand. But we can add ad-supported services, as well” is telling here.  It says that AT&T thinks that an Amazon/Netflix model of subscription video streaming is the current need, more so than live TV, because all of the latter is going to have to be ad-sponsored.  This, IMHO, says that WarnerMedia may exploit its library of content more, at least initially, and then focus on using its production facilities to create original video content, as Amazon and Netflix have done.

AT&T also believes, according to Stankey, that “cable TV”, meaning live streaming I presume, would eventually thin out to be nothing more than true “live” stuff.  The rest would migrate to a subscription model, with a series being put up for streaming all at once and streamed on demand.

All of this says to me that AT&T accepts the view that linear TV is going away, which dovetails nicely with the fact that all of this depends on being able to deliver a lot more bandwidth to the home at a lower price/cost than before.  That’s where millimeter-wave 5G/FTTN hybrids come in.  They’d give AT&T two things they don’t have now.

Thing number one is a lower-pass-cost way of delivering broadband speeds of 50 Mbps or higher to areas with higher population density (suburban, primarily).  Copper loop doesn’t have the combination of capacity and low operations cost, and FTTH has too high a pass cost for lower-density geographies.  AT&T’s HBO Max strategy has to work for TV, and work well, and 5G/FTTN could make that happen.  In more rural areas, 5G in its cellular form could provide better home broadband speeds at a lower cost, too.  The combination could be killer.

Thing number two is the potential to enter the home broadband market out of their home region.  If AT&T could leverage its content offerings to the TV in other areas, their plan for streaming would be all the more viable.  Verizon, AT&T’s big rival, has lower demand density and so can more easily deploy deep fiber and higher-speed broadband.  Why not take advantage of 5G to go into Verizon’s territory, particularly when their chief cable rival, Comcast, shares much of Verizon’s geography?

The fact is that AT&T’s content strategy could well depend on their exploiting 5G/FTTN and cellular 5G for home broadband.  People still want to “watch TV”, meaning video content of any sort, on their TV if they’re at home.  Mobile-only strategies are at risk from strategies linked to home broadband, so AT&T needs to be the one to create the linkage, which means they really have to jump on 5G/FTTN.

They also need to jump on things like advertising in a streaming world, and on integrating that with the notion of a contextualized, personalized service platform that lives above the network and below service features, providing uniform support for advertising and other services that require knowledge of the real world.  Fortunately for AT&T, they’re leading the operators in their awareness of the need to evolve the notion of what the network provides.  That gives them at least a shot at getting to the place they need to be.

Two Rules to Make Microservices and Cloud-Native Work

Oversimplification is never a good thing, and sometimes it can be downright destructive.  One of those times is when we look at “distributed” application models, which include such things as serverless and microservices.  Classical wisdom says that distributing applications down to microservices is good.  Classical wisdom says that hosted service features are best when distributed across the carrier cloud.  There’s a lot of goodness in that classical wisdom, but there are also some major oversimplifications.  The good news is that there’s almost surely a path to resolving some or all of the problems of distributed applications, and that could be the most profound thing we could do to promote a new computing model.

Software used to be “monolithic”, meaning that there was one big software application that ran in one place and did one specific thing.  A bank might run its demand deposit accounting (DDA, meaning savings and checking) application, another application for lending, another for investing, and so forth.  Because core business applications tend to be focused on database activity, it actually makes sense to think about many of these applications in monolithic terms.

There are two problems with monolithic programs.  First, they tend to be enormous, meaning that there’s a lot of code involved.  Their massive size, and the fact that structural rules to make them more maintainable are often ignored during development, makes changing or fixing them difficult and even risky.  Some financial institutions I’ve worked with had core applications that they approached with real fear, and that’s in a highly regulated industry.  The second problem is that they are vulnerable.  If the program crashes, the system it’s running on fails, or the network connection to the site where the system is located breaks, the application is down.

Componentization, meaning the division of programs into components based on some logical subdivision of the tasks the programs do, is one step in fixing this.  Componentized programs are easier to maintain because they’re smaller, and even if good structural practices aren’t followed within the components, their smaller size makes it easier to follow the code.

Where things get a bit more complicated is when we see that componentization has two dimensions.  For decades, it’s been possible to divide programs into autonomous procedures/functions, meaning smaller units of code that were called on when needed.  These units of code were all part of one program, meaning they were loaded as a unit and thus generated what looked like monolithic applications.  Inside, they were componentized.

The second dimension came along fairly quickly.  If we had separate components, say our procedures or functions, why not allow them to be separately hosted?  A distributed component model did just that, breaking programs up into components that could be linked via a network connection.  Things like remote procedure calls (RPC), the Service-Oriented Architecture (SOA), the Common Object Request Broker Architecture (CORBA), and more recent things like RESTful interfaces and service mesh are all models supporting the distribution of components.

When you have distributed components, you have to think about things differently.  In particular, you have to consider three questions related to that approach.  First, where do you put things so that they can be discovered?  Second, what defines the pattern of workflow among the components, and how does that impact quality of experience?  Third, how do distributable-model benefits like resilience or scalability impact the first two?  We need to look at all of these.

The general term used to describe the “where” point is orchestration.  The industry is converging on the notion that software is containerized, and that an orchestrator called Kubernetes does the work of putting things where they need to be.  Kubernetes provides (or supports) both deployment and discovery, and it can also handle redeployment in case of failures, and even some scaling under load.  If something more complex has to be deployed, you can add a service mesh technology (like Istio, the by-far market leader in that space) to improve the load balancing and discovery processes.

A lot of people might say that service mesh tools are what determine distributed workflows, but that’s not really true.  A service mesh carries workflow, but what really determines it is the way the application was divided into components to start with, and how those components expect to communicate among themselves.  This is the place where oversimplification can kill distributed application, or service, success.

Let’s say that we have a “transaction” that requires three “steps” to process.  One seemingly logical way to componentize the application that processes the transaction is to create a Step-1, -2 and -3 component.  Sometimes that might work; often it’s a bad idea, and here’s why.

If the transaction processing steps are all required for all transactions, then every distributed component will have to participate in order.  To start off, that means that the delay associated with getting the output of Step-1 to the input of Step-2, and so on, will accumulate.  The more components, the more steps, the more delay.
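A toy calculation makes the point.  The millisecond figures below are assumptions I’ve made up for illustration, not measurements, but the arithmetic is the same regardless: per-hop network delay multiplies with the number of chained steps.

```python
PROCESS_MS = 2.0       # assumed processing time per step
NETWORK_HOP_MS = 15.0  # assumed network delay between distributed steps

def end_to_end_delay(steps: int, distributed: bool) -> float:
    """Total delay for a transaction that must pass through every step in order."""
    hop_delay = NETWORK_HOP_MS if distributed else 0.0
    return steps * PROCESS_MS + (steps - 1) * hop_delay

print(end_to_end_delay(3, distributed=False))  # 6.0 ms, all three steps in one place
print(end_to_end_delay(3, distributed=True))   # 36.0 ms, Step-1/-2/-3 chained over the network
```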

Where this step-wise componentization makes sense is if there’s a lot of stuff done in each step, so it takes time to complete it, and in particular if some (hopefully many) transactions don’t have to go all the way through the stream of steps.  This matches the cloud front-end model of applications today.  The user spends most of the time interacting with a cloud-front-end component set that is highly elastic and resilient, and when the interaction is done, the result is a single transaction that gets posted to the back-end process, usually in the data center.

One area of distributed components that doesn’t make sense is the “service chain” model NFV gave us.  If we have a bunch of components that are just aligned in sequence, with everything that goes in the front coming out the back, we’ve got something that should be implemented in one place only, with no network delay between components.  The single-load or monolithic VNF in NFV would also make sense for another reason; it’s actually more reliable.

Suppose our Step-1 through -3 were three VNFs in a service chain.  A failure of any of the three components now breaks the service, and the MTBF of the service will be lower (the service will fail more often) than would be the case in a monolithic VNF.  For three steps, assuming the same MTBF for all the components, we’d have an MTBF a third of that base component MTBF.  In scalability planning for variable loads, consider that if everything has to go through all three steps in order, it’s going to be hard to scale one of them and have any impact on performance.
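Here’s that MTBF arithmetic in sketch form, assuming independent components whose failure rates simply add in a chain (the standard exponential-failure assumption); the 10,000-hour figure is illustrative only.

```python
def series_mtbf(component_mtbf_hours: float, components: int) -> float:
    """MTBF of a service that fails whenever any one of its chained components fails."""
    failure_rate = components / component_mtbf_hours  # failure rates add in series
    return 1.0 / failure_rate

print(series_mtbf(10_000, 1))  # 10000.0 hours, monolithic VNF
print(series_mtbf(10_000, 3))  # ~3333.3 hours, three chained VNFs -- a third of the base MTBF
```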

We can take this lesson to edge computing.  Latency is our enemy, right?  That’s why we’re looking to host a distributable component at the edge, after all.  However, if that edge component relies on a “deeper” connection to a successor component for processing (a chain that goes user>edge>metro, for example), we’ve not improved latency end to end.  Only if we host all the processing of that transaction or event at the edge do we reduce latency.

Complex workflows generate more latency and reduce overall MTBF, versus an optimized workflow designed to componentize only where the components created actually make sense in workflow terms.  Don’t separate things that are always together.  One of the most profound uses of simulation in cloud and network design is simulation of distributed component behavior in specific workflows.  This could tell you what’s going to actually improve things, and what’s likely to generate a massive, expensive, embarrassing, application or service failure.

You might conclude from this that not only is everything not a candidate for microservice implementation, most applications may not be.  Classical wisdom is wrong.  Edge hosting can increase latency.  Service chains reduce reliability and performance.  You have to plan all distributed applications, networks, and services using a new set of rules.  Then, you can make the best trade-off, and the nature of that trade-off is dependent on something we typically don’t consider, and is more dynamic than we think.

The first rule of distributability, if there is such a thing, is that there exists a “work fabric” into which components are connected, and whose properties determine the optimal trade-offs in distributability.  Workflows move through the work fabric, which in practice is the vehicle through which inter-component connections are made.  The most important property of the work fabric is its delay gradient: the farther something moves in the fabric, the more delay accumulates.

The minimum delay in the fabric is the delay associated with component calling, which is typically small.  Then comes the network binding delay, which consists of the time needed to discover where work needs to go and dispatch it on its way.  Finally, there’s network delay itself.  Since the delay gradient contribution of the first two delay elements is constant, the delay gradient is created by transit delay in the network.
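Here’s a minimal sketch of that decomposition.  All the constants are assumptions for illustration; the point is that only the transit term grows with distance, so it’s the transit term that creates the gradient.

```python
CALL_MS = 0.05            # component-calling overhead (assumed, typically small)
BINDING_MS = 1.0          # discovery/dispatch overhead (assumed constant)
TRANSIT_MS_PER_KM = 0.01  # rough propagation plus queuing/handling per km (assumed)

def workflow_delay(distance_km: float, co_hosted: bool = False) -> float:
    """Per-hop delay in the work fabric; only the transit term varies with distance."""
    if co_hosted:
        return CALL_MS  # co-hosted components skip binding and network transit entirely
    return CALL_MS + BINDING_MS + TRANSIT_MS_PER_KM * distance_km

for km in (0, 50, 500):
    print(km, "km ->", round(workflow_delay(km), 3), "ms")
```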

That means that for distributability to be optimized, you need to minimize network delay.  You can’t speed up light or electrons, but you can reduce queuing and handling, and that should be considered mandatory if you want to build an efficient work fabric.

The second rule of distributability (same qualification!) is that an optimum distributed model will adapt to the workflows by moving its elements within the work fabric.  The best way to picture this is to say that workflows are attractive; if you have two components linked via a workflow, they attract each other through the workflow.  The optimum location for a component in a distributed work fabric is reached when the attracting forces of all the workflows it’s involved in have balanced each other.

This suggests that in service meshes or any other distributed structure, we should presume that over time the workflow properties of the application will combine with the work fabric to minimize collective latency.  We move stuff, in short.  Whenever a new instance of something is needed, spin it up where it better balances those workflow attractions.  Same with redeployment.  When you have to improve load-handling by scaling, you scale in a location that’s optimum for the workflows you expect to handle.
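Here’s a toy version of that attraction idea: given the workflow volume between a movable component and its fixed partners, pick the candidate host that minimizes the traffic-weighted delay.  The site names, delays, and volumes are all invented for the example.

```python
# Delay (ms) from each candidate host to each fixed partner component.
DELAY_MS = {
    "edge_site":  {"ui_frontend": 2,  "inventory_db": 18},
    "metro_site": {"ui_frontend": 8,  "inventory_db": 6},
    "core_site":  {"ui_frontend": 20, "inventory_db": 1},
}

# Relative workflow volume (messages/sec) to each partner -- the "attraction".
WORKFLOW = {"ui_frontend": 50, "inventory_db": 10}

def best_placement(delays: dict, workflow: dict) -> str:
    """Choose the host where the attracting workflows best balance out."""
    cost = {site: sum(workflow[p] * d[p] for p in workflow) for site, d in delays.items()}
    return min(cost, key=cost.get)

print(best_placement(DELAY_MS, WORKFLOW))  # -> edge_site for this workflow mix
```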

This has major implications for the optimization of load-balancing.  You can’t just spin something up where you have the space and hope for the best.  You have to spin something up where it can create a better total delay (or at least the best of the not-better options).  You always look to minimize the delay gradient overall.

One minor implication of this is that every microservice/component model should work transparently if the components are attracted all the way into being co-hosted.  We should not demand separation of components.  If QoE demands, we should co-host them.  Any model of dynamic movement of components demands dynamic binding, and co-hosting means that we should be able to eliminate the network discovery and coupling processes in our work fabric if the components are running together.

What does this then do to “orchestration”?  If the microservices are going to rush around based on workflow attractions, is where they start even meaningful?  Might we start everything co-hosted and migrate outward if workflow attractions say that proximity of components doesn’t add enough to QoE to compensate for the risk of a facility failure taking everything out?  Do we need anything more than approximation?  I don’t think this changes the dynamic of deployment (Kubernetes), but it might well change how we try to optimize where we host things.

This isn’t the kind of thing you hear about.  We’re not looking at either of these rules today in an organized way, and if we don’t, we’ll end up creating a work fabric model that won’t deliver the best quality of experience.  A good model might come from the top, or it might (and likely will) come as an evolution to the service mesh.  I do think it will come, though, because we need it to get the most from distributed applications and services.

Cisco’s State of the Internet: A Promise or a Fable?

Cisco released their Annual Internet Report, and it’s the typical mixture of interesting stuff, insight, and blatant hype.  With Cisco, though, even their hype is interesting, so I want to look at the report and sort out the angles, hopefully adding some analysis that will clarify just what pieces of the report fit in which of my categories.

At the high level, the report says that by 2023 there will be 5.6 billion Internet users worldwide, representing two-thirds of the world’s population.  There will be 3.6 networked devices for every human on earth, and the average broadband speed will be 110 Mbps.  The report is about the Internet, of course, but there’s a very strong mobile bias to it.  That’s fair given that mobile devices, smartphones in particular, are the big instrument of change in the Internet overall.  Consumers drive the Internet, and connected-home devices likely drive consumer device growth.

This raises an important point, which is that Cisco (as it has regularly) conflates “Internet” and “connected”.  Their definition of something like IoT reflects any device that can be accessed using an Internet connection, even indirectly.  They also use the term “connected car” to mean a car with some sort of Internet connection, not to mean autonomous operations.

The next point, which I think is one of the most important, is that Cisco makes the following comment about video:  “An Internet-enabled HD television that draws couple – three hours of content per day from the Internet would generate as much Internet traffic as an entire household today, on an average. Video effect of the devices on traffic is more pronounced because of the introduction of Ultra-High-Definition (UHD), or 4K, video streaming.”  There are (forgive the pedantic phrasing) profound implications in this quote…several, in fact.

The first one is that Internet-enabled TVs don’t necessarily draw all their content from the Internet.  Most have both linear and Internet connections, and so there is a combination of streaming and off-Internet viewing.  However, the signs that even cable companies are abandoning linear TV for streaming are already highly visible.  Revenue from linear TV is falling, in fact.  The most significant transformation in Internet traffic is not the 4K or 8K transformation of sets (what the set can do is independent of what the streaming video offers in terms of resolution), it’s the shift from linear TV to streaming TV.  That’s a shift from zero traffic to potentially significant traffic, since linear TV doesn’t require an Internet connection at all.

The second point, related to the first, is that mobility is a big reason why the first implication matters.  People have learned to stream because they’ve learned to consume content as needed.  You don’t sit down in front of a TV to watch “what’s on” these days, you pick up whatever Internet device is convenient and pick what suits you.  That mobile habit has crossed over into living rooms.  We may, perhaps, see the growth in video devices that Cisco’s report predicts, but we almost certainly won’t see the bandwidth of all that stored content rise to fit the bigger screens.  New programming may be available in 4K/8K form, but not much of the old, and so the impact of higher-definition TV is lower.  In any case, mobile devices won’t benefit from 4K/8K.

Let’s move on now to what Cisco calls “M2M”, meaning machine-to-machine or IoT-like stuff.  They say “Connected home applications, such as home automation, home security and video surveillance, connected white goods, and tracking applications, will represent 48 percent, or nearly half, of the total M2M connections by 2023, showing the pervasiveness of M2M in our lives.”  Yes, it’s true that M2M connections (in the loose Cisco sense) will dominate the future.  Home automation, surveillance…you get it.  However, this misses two key points.

First point: most home automation devices don’t send or receive anything for long periods of time.  If your home controller switches the lights on at nightfall and off when you go to bed, those events aren’t cropping up every couple of seconds.  Almost all smart-home applications do next to nothing most of the time.  Thus, the fact that they’re connected isn’t highly relevant to the Internet.

Second point: message sizes on smart-home applications are typically small.  Home security devices like video doorbells are an exception; they send compressed video files.  But remember that everyone in your neighborhood isn’t likely standing on your doorstep every minute to activate your doorbell.  The things that might happen more often involve very short messages.

But IoT (or M2M, as Cisco calls it) is still important, just not so much for traffic.  The difference between “Internet” in the traditional emails-and-searching sense, and Internet as a means of controlling your home, is profound.  You need a lot better availability and QoS for M2M because it matters more.  You can’t drop events, and you can’t overload and delay things significantly (before you drop them, in many cases!).  Everyone hits send a couple of times and thinks little of it.  When the lights don’t come on or the heat won’t work, we react differently.

The next point is the explosion in mobile devices.  By 2023, Cisco says there will be 8.7 billion, which is well beyond the number of people who will be online at that point.  Unless Cisco thinks people will be juggling a couple of phones and tablets at the same time, the number of different devices is less important than the number of total devices in use at a given moment, which of course Cisco doesn’t estimate.  Maybe yuppies juggle many smartphones with a video stream on each, but not regular people.

This isn’t to say that the number of devices isn’t important for reasons other than a yuppie arms race.  Having multiple Internet-ready devices says that we’re getting more and more dependent on the Internet, and our behavior is so altered by that dependency that we optimize our usage by customizing the devices we select at a given time.  We don’t need 3.6 devices at once, but we need 3.6 (on average, obviously) to choose from.  And our devices have to be Internet devices; Cisco says non-smartphone market share will fall from 27% in 2018 to 11% in 2023, and I believe it.

Now for 5G, a topic so rife with exaggeration that it’s hard to see how Cisco could even move the ball.  Fortunately, they rise to the occasion, but with an unusually conservative start.  They say that 4G will grow from 3.7 billion to 6 billion connections by 2023, which is likely true.  By 2023, 11% of “devices and connections” will have “5G capability.”  Which means what?  Surely, new phones without 5G are going to be rare beyond 2020, and surely most operators in major markets will be deploying new mobile capacity in 5G form, but remember we have 6 billion mobile 4G connections in 2023, out of 6.7 billion by Cisco’s estimates.  That means almost 90% of mobile connections are still 4G in 2023.

But never mind, it’s not the phones that matter.  Cisco says “5G connectivity is emerging from nascency to a strong contender for mobile connectivity driven by mobile IoT growth.”  I sure admire the use of “nascency”, a word even I never considered using, but what exactly is that mobile IoT growth?  Here, Cisco merges the IoT and M2M terms and concepts.  Many (perhaps most) of us have some M2M in our homes, but how many have 5G connections to it?  WiFi absolutely dominates in-home Internet-type connectivity, and most sensors in the home are still wired or use specialty protocols rather than Internet.  The good news is that even Cisco says that WiFi will “gain momentum”.

WiFi also helps buff up the broadband speed numbers, I think.  Cisco’s forecast for a global 20% compound increase in broadband speed, which roughly doubles it by 2023, is pretty hard to reconcile with data on broadband speed from various governments, both in terms of current speed and in terms of rate of growth.   I’ve blogged a lot about the fundamental techno-economics of broadband (demand density and access efficiency), and it’s darn difficult to see how those Cisco numbers could work…unless you made the assumption that the WiFi speed to devices was the thing being measured.  The thing is, if you have 100 Mbps WiFi in the home, connected to 12 Mbps Internet, you don’t have 100 Mbps, obviously.
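For what it’s worth, the compounding arithmetic itself is simple; the quarrel is with the inputs.  The starting speed and window length below are my own illustrative assumptions, chosen to show how a 20% annual increase roughly doubles the average and lands near Cisco’s 110 Mbps figure.

```python
start_mbps = 55.0   # assumed starting average broadband speed
years = 4           # assumed forecast window
print(round(start_mbps * 1.20 ** years, 1))  # ~114 Mbps after 4 years of 20% compound growth
```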

The biggest issue I have with the Cisco report is that it treats the Internet like some vast vacuum that sucks dollars from all those who have to contribute to its improvement, in order to meet the requirements of the applications Cisco says are driving it.  Operators are clearly cutting back on infrastructure investment, as Cisco’s own earnings call showed.  How do you fund the kind of radical increases in broadband performance that Cisco says will come along?  To quote the book on the Mercury astronauts, “The Right Stuff”, “No bucks, no Buck Rogers!”  An exciting future has to be paid for.

But the thing I like most about the report is that it shows the bucks are on the table.  There is no doubt that the applications and changes Cisco touts are possible.  The challenge is making them practical, and in the end Cisco’s service to the industry in pointing out the table stakes is offset by its refusal to address the question of those bucks.

The Annual Internet Report is interesting, but it could have been so much more.  I’d like to see Cisco take it to the next level with some economic realism.

The Impact of the Cloud on Enterprise IT

The impact of the cloud on enterprise IT, either technology impact or financial impact, has been a topic difficult to navigate.  The financial impact, meaning whether cloud spending will erode IT spending, is particularly difficult, because it depends on a number of factors where popular opinion seems to diverge from objective reality.  When that happens, I try to model the thing, and that’s what I’m going to talk about today.  One thing the modeling did was convince me that the technology and financial impacts of the cloud are, if not in lock-step, at least somewhat synchronized.

In the early days of the cloud, the popular vision was that “everything” was going to “migrate to the cloud.”  That was never going to happen, nor is it going to happen now.  If it were true, then cloud-provider spending would have risen, and enterprise IT spending fallen, to respond to the shift in hosting location.  The fact that this vision was…well…dumb doesn’t negate the fact that it was newsworthy and simple in terms of predicting market trends.

The truth is way more complicated.  Enterprise IT spending has three components, cloud-wise.  The first component, which accounts for about 75% of current enterprise IT spending, is core business applications that aren’t going to migrate to the cloud for security, compliance, and pricing reasons.  The second component, which is the other 25% of current IT spending, is non-core applications that could just as well be cloud-hosted as data-center-hosted, and whose migration will depend on pricing.

If we stuck with the move-to-the-cloud technical paradigm, two things would happen.  First, we’d top out with cloud spending about a quarter of IT spending overall.  Second, overall IT spending would tend to rise a bit slower than GDP, which doesn’t generate all that much excitement in the tech world or on Wall Street.  What could help us do better?  The third component.

Since I’ve used up all 100% of current enterprise IT spending, you might wonder what’s left to say about the third component.  That component is itself divided.  One piece represents application modernization to accommodate web-portal access to core applications, mobile worker support, and so forth.  The other represents productivity enhancement applications not practical or even possible pre-cloud.  We’re currently in the early stages of realizing that first piece, which represents an increase in IT spending of around 25% from current levels.

The second piece has not yet seen any statistically significant realization, and only a very small fraction of either buyers or vendors even understand it.  Nevertheless, it’s this piece that represents the real plum, an incremental 70% in IT spending growth potential.  Most of this revolves around the exploitation of context, which I’ve blogged quite a bit about.

These components of cloud impact aren’t the decisive drivers, at least at present, for spending trends.  Enterprise IT spending has always had two elements, one being the orderly modernization of existing resources, and the other being the resources needed for new projects that delivered incremental business benefits to justify incremental costs.  Budget and project money, in short.  The latter grows and shrinks in value cyclically, depending on the introduction of new IT paradigms that can fuel investment.  We’ve not had such a cyclical augmentation since the late ‘90s, by far the longest delay in IT history.

Put in this light, what the cloud’s second productivity-based piece would involve is igniting another of those cycles.  Absent one, what we’re seeing is less a “shift to the cloud” than a complicated cost-based approach to the first piece of cloud impact, application modernization.

When you do mobile/web enhancements to core applications, you introduce the need for a “front-end” piece that should be more resilient and elastic-with-load than typical core applications.  To host this stuff in the data center would involve creating a corresponding elastic resource pool.  Core applications don’t generally need that level of elasticity, so the size of this pool would be small (limited to the front-end elements) and it would be less resource-efficient than the public cloud.  As a result, a fair percentage (a bit less than two-thirds, says the model) of these front-end elements are going to the cloud.

What this has done, and will continue to do, is to cap the levels of enterprise IT spending on data center resources, including servers, software, and networking.  When vendors say that the enterprise is cautious in spending, what they’re saying is that the productivity-project piece hasn’t developed, and the tactical front-end piece is being tapped off by the cloud.  Little movement in the net, and so little or no growth in the budget, and more price pressure on vendors.  This is likely to prevail, says my model, through 2021.

The first force acting to change this stasis is the need to operationalize applications that have both a front-end cloud piece and a back-end data center piece.  This is what “hybrid cloud” is really about these days, and the need for this was first recognized only last year.  Hybrid cloud has shaped the offerings of the cloud providers and spawned data-center-centric visions of hybrid cloud from players like IBM/Red Hat and VMware.  To get full operations compatibility in the hybrid cloud, you need to adopt a container model, Kubernetes deployment, and unified monitoring and management.  This positive force on data center IT spending is already operating, but since it’s not particularly hardware-centric, the impact isn’t easily seen.

The second force operating on IT budgets is the gradual retro-modernization of application front-ends to make them cloud-resident.  My model says that about 40% of web/mobile modernization projects have already resulted in increased data center spending, because they were realized in-house and not in the cloud.  In almost all cases, these early appmod players regret not having taken a cloud focus, and they’re trying to work a change through the budgeting.  This is already having a somewhat suppressing effect on data center spending, because moving the front-end pieces into the cloud frees capacity in the data center, meaning less incremental gear is needed to compensate for demand growth.

The third force, the one that will be the first major spending policy shift, could be characterized as the tail/dog boundary function.  The cloud is elastic; the data center core application’s transaction coupling to the cloud is far less so, if it’s elastic at all.  As a result, there’s pressure created on the boundary point, the application components just inside the data center.  These, by this point, will be under common orchestration/operations control from the hybrid cloud operationalization step I noted above, but some level of elasticity will now be included, and some data center changes will be needed to support the greater elasticity of these boundary components.  By 2021 this force will be visible, and by 2022 my model says it will have added enough to IT spending to reverse the current stagnation, raising budgets a modest two to three percent.

This is the time when it would be helpful to have that productivity-project benefit piece laid on the table.  The technical architecture to support it should be the same as would be needed to add elasticity to the cloud/data-center boundary, and it’s within the scope of current “Kubernetes ecosystem” trends to create it.

The Kubernetes ecosystem is a necessary condition for our productivity-project benefits to boost IT into another of those cyclical growth periods that filled five decades before petering out in 1999.  It’s not a sufficient condition, because we still need to create that “virtual world” I’ve blogged about.  Further productivity enhancements can only come by continuing the trend that all the prior IT waves were based upon, which is getting IT empowerment closer to the work.  Robotics, AI, augmented reality, edge computing, IoT, and 5G all depend on this single fact.  To have computers and information technology help workers, they have to share a world.  To make that work, we have to create a bridge between us and the world we inhabit, and the information world we live in parallel with.

This is the bridge to greater enterprise IT spending, that potential for 70% spending growth.  On the current side of the bridge is the old model of computing, which centralizes information collection and processing, then distributes it as a set of information elements designed to help workers.  On the future side is a newly created game-like artificial or augmented reality, one that frames work in a way that unites everything from the real world and from the old IT world.  This is what my modeling says would be the real future of both the cloud and enterprise IT, and of course all the vendors and users in the space as well.

This future is also a bit like the old networking “god-box”, the device that did everything inside a single hardware skin.  People quickly realized that 1) the god-box was going to be way too complicated and costly unless the future was totally chaotic, and 2) that since consensus on handling chaos is difficult to achieve, the real market was likely to sort out smaller options, making the box less “god-like” in mission.  What we’re seeing in the enterprise IT space is something like this sorting out, because vendors can’t sell and enterprises can’t buy something that revolutionary.

Well, they had some issues with buying past revolutions in IT as well.  On the average, the industry could only sustain previous revolutions at a rate of one every fifteen or twenty years.  It’s been 20 years since the last one, so in one sense, we’re due.  The challenge is that this virtual-world revolution is a lot more complex.  It’s an ecosystem that envelops everything, or it falls short of optimality.  Modeling, sadly, doesn’t tell me how we could get through it.

Prior IT revolutions happened because a single strategic vendor created a vision, and was willing to take a risk to advance it.  I wonder what vendor might do that today?  I’d sure like to see someone take it up, because an industry generating 1.7 times the current level of revenue in information technology would likely be a gravy train for us all.

Cisco Sees a Sea Change in Networking

In a lot of ways, Cisco is the most important company in networking.  They have the largest strategic influence on buyers in my surveys, and they have the most strident and effective marketing.  They’ve been no slouch in market performance, either.  It’s because of all of this that we need to look hard at their quarterly earnings, and what they said on the earnings call.  They might know something, and they’re surely able to communicate.

Street consensus is that Cisco disappointed, though their quarterly numbers matched or even slightly exceeded expectations.  What concerned Wall Street was soft guidance and unfavorable trends in sales in many areas.  Services did well, but the key product sectors were flat to down.  Service provider/commercial sales were a bit improved, but enterprise orders were down 7% y/y, worse than the previous quarter.

The issue for Cisco, IMHO, is the same as it is for Wall Street.  Networking has been a kind of supply-side darling for decades, where everyone anticipated the value of improved connectivity and deployed equipment to provide it.  Now, service providers are finding return on infrastructure creeping toward zero (or negative), and enterprises have shifted their focus to almost pure cost management.  From “Build it and they will come!” to “Build it cheaper or don’t build at all!”, in short.  Even though this particular shift has been visible for two decades, everyone has been successful in blinding themselves with hope that it will naturally reverse, or someone else will take the time and money to reverse it.

5G is the focus of the current hope delusion.  It’s not that 5G won’t happen, of course; it’s a modernization of the radio network that’s essential for mobile services to continue to grow, and it’s a reasonable (and perhaps compelling) alternative to wireline in the form of fixed wireless access.  The thing is that we’ve promoted 5G to be such a winner for everyone that the value chain needed to generate the victories has become implausible.

Which leaves networking with no clear driver.  As Chuck Robbins, Cisco’s CEO, said in their earnings call, “we are seeing longer decision-making cycles across our customer segments for a variety of reasons, including macro uncertainty as well as unique geographical issues.”  While he still told the Street that traditional supply-side values will return (“The long-term secular growth trends of 5G, Wi-Fi 6, 400-gig and the shift to the cloud remain and we expect to benefit from them”), Cisco is actually facing the truth internally, I think, and that just might set them apart from the rest of the network vendors.  Telling the Street the raw truth is rarely smart in confusing times, so we have to look inside Cisco’s comments to find just what that raw truth is.

Raw Truth Number One: Cisco is expecting that network equipment will not regain its former glory even three or more years out, and that whatever it does regain will come as a result of a complete transformation of the vision of (and mission of) the network.  Cisco lays this change at the foot of things the Street and media understand (“The broad adoption of multi-cloud and modern application environments is changing how the world’s largest networks are built, operated and secured, and Cisco is at the center of this transition”), but they do understand it internally.

Which leads us to Raw Truth Number Two:  Cisco knows that monolithic network hardware designed around a proprietary model of equipment will fall to open-model networking.  Open-model networking means that software and hardware will be disaggregated, and that open architectures will be available for both.  I blogged last week on the various network models, and while I’d hardly suggest that Cisco agrees with me, I think they see the same basic points.

In an open-model world, you have three choices: commoditize, differentiate in each disaggregated area, or make money on integration more than on product.  I think Cisco is demonstrating it intends to do all three of these things, but with some finesse.

That’s particularly true with the “commoditize” choice.  Cisco accepts commoditization in areas where differentiation is difficult, and so I think it intends to focus on promoting areas where Cisco’s capabilities can shine, and on contaminating others’ hopes for differentiation where Cisco itself doesn’t have any.  Last week, Cisco said the US should not invest in 5G companies, a category that, to nobody’s surprise, Cisco doesn’t fall into.  They also announced a partnership with Rakuten to “do the impossible”, and build the “world’s first fully virtualized, fully-automated, cloud-native mobile network.”  Obviously, this marks territory on the turf of Nokia, Ericsson, and Huawei.

5G is important less for the boatloads of cash vendors could gain than for the influence Cisco could lose.  The real meat of 5G lies in the radio network and mobility management; beyond that, it’s pretty much just a question of backhaul and metro infrastructure.  Cisco doesn’t make the secret sauce of 5G, and so why not change the recipe?  Be a good citizen and focus on open networking where your competitors want proprietary advantage.  That might also force them to defend their own turf, making it harder for them to try the same recipe-change strategy on you!

“In December, we introduced Cisco Silicon One, a first-ever single unified silicon architecture and the Cisco 8000 carrier class router family built on Silicon One as well as our new IOS XR7 operating system. We also announced new flexible purchasing options that enable customers to consume our technology however they choose.”  That’s how Robbins describes Cisco’s overall differentiation approach.  You start by giving yourself distinctive assets in the spaces you’re going to defend, which means disaggregated silicon, hardware, and software.

Your next step is to elevate the decision; the key points in this quote are worth calling out: “Now, let me share a brief update on our businesses, starting with infrastructure platforms. As the global leader in networking we believe we are well positioned with our intent-based networking portfolio given the strategic investments we’ve been making. Over the past several quarters we’ve made tremendous progress integrating automation, analytics and security across our enterprise networking portfolio, while at the same time shifting to a subscription-based model.”  Integration is one of the future business foundations Cisco wants to address, so they say “integrating”, and they also emphasize their global experience; you can trust Cisco.  Intent-based networking is an umbrella layer to abstract traditional hardware and support evolution, and automation, analytics, and security integration point to the higher-layer requirements and the need for integration.  Bet they spent some time wordsmithing this.

One point that cuts across multiple value propositions is the challenge of addressing the commoditizing trend in enterprise network gear.  Make it cheaper, remember?  Cisco proffers a vision of workplace transformation, meaning productivity enhancement.  “we believe we’re the only Company providing a cognitive, highly secure and analytics-driven collaboration platform, which is the foundation for their workplace transformation.”  How?  “We recently brought to market several key WebEx capabilities which combine context, AI and machine learning to enable our customers and their teams to further enhance their meeting experiences.”  That gets both the PR-glitz buzzwords (AI/ML) and the real issue (context) into one sentence.  Bravo!

The only place Cisco punted in this whole picture is in the area of addressing transformation of the service providers.  There, Cisco is accepting that the open-network model is inevitable:  “we launched Silicon One which is at the heart of these new systems called Cisco 8000 that we launched and we also announced that we would be willing to sell our silicon to go into a white-box or sell it just directly to a customer if that’s how they like to procure it.”

To Cisco, every engagement is marketing and messaging, and I poke fun at them for that, but the fact is that they’re doing the right thing here, and doing it far better than anyone else.  Sure, Cisco would use a megaphone to whisper sweet nothings into a prospect’s ear, but the prospect would hear it for sure.  Cisco competitors don’t bother whispering or shouting, and that’s the on-ramp to Cisco success for this coming, critical, period.

Open-model networking has been available for a decade, and yet it has failed to generate enough of a shift in infrastructure to create a significant digit on an operator’s bottom line.  One big reason is trust, because anyone who recommends an approach hardly anyone takes had better be darn confident that they can trust whoever’s going to convert that approach into deployment.  To quote a financial network service provider, “You gotta understand my position here; If this doesn’t work there’s a hundred banks I can never work for.”  Robbins has the answer: “we’re benefiting from our strong position as our customers’ most trusted partner.”

Is Cisco now unstoppable (again), making themselves a kind of IBM-of-networking that just keeps reinventing itself?  There is still a vulnerability, and that’s the overall challenge of return on infrastructure, the growing focus on cost avoidance.  There are things that both service providers and enterprises could do to dramatically improve return on their network investments.  Cisco has at least taken a positioning step toward workplace transformation for enterprises, but it remains bogged down in the supply-side vision with service providers.  If somebody really grabs onto the higher-layer service points and runs with them, it would compromise Cisco’s service provider dominance worse than 5G RAN focus would.  It could dribble over into the enterprise space too.

“Judge an enemy by its capabilities, not its intentions” is what Norman Schwarzkopf and others have said.  Any of Cisco’s network vendor competitors, and any of the cloud-industry players, have the capability to transform the whole network dialog.  The latter group are better positioned to do it than Cisco.  This is Cisco’s big long-term risk today.

The big short-term risk is that whatever it tries to do to alleviate the long-term issues may have a further negative impact in the short term.  If buyers are holding off decision-making, giving them a harder decision to make isn’t much of a remedy.  However, if Cisco had faced today’s truth a decade ago, they might not have either short- or long-term issues to juggle today.  They’re doing the only thing possible, which is to work to get things right while convincing everyone they’ve done that already.

Time to get out that megaphone, Chuck.

Tasting the Flavors of Virtual Networking

What is virtualization as applied to networking?  What is “open”?  One of the challenges we always seem to face in networking (and often seem to get wrong) is defining our terms.  I’ve noted in past blogs that a virtual, open, network means to some the substitution of open devices or hosted instances of device software for proprietary appliances.  Others think it’s more, and I’m in the latter category.  Could our differences here be nothing more than how we define things?  Let’s look a bit deeper and see, using some examples.

We’ll start with a hypothetical router network consisting of a hundred transit routers distributed through a couple-thousand-mile geography.  Imagine these routers to be nice, familiar, Cisco or Juniper boxes.  Everyone in the service provider network operations space knows how this kind of network works, everyone knows how to put out an RFP to expand it, and if management tools are needed, everyone knows where to get them and how to apply them.  This is the traditional network model.

Suppose now that we find some router software that’s compatible with the same standards as the real devices are.  We then find white-box hardware, not servers, that can host this software, and we build an open router model using the combination.  The network still has the same topology, the same operations procedures, the same management tools.  The only question is the white box selection, and we could assume that white box vendors are going to provide their specifications, including expansion capacity.  Maybe we miss our familiar sales and support people, and maybe we get a little culture shock when our racks of routers have different colors, but we’re OK here.

We get some servers with the right hardware acceleration features, add the software to them, and we create what we could call the router instance model.  If all the interfaces are indeed compatible with the same standards as our real routers, we can apply the same management practices to this setup, but service provider network operations types don’t know how to pick servers and server features, or how to decide what the capacity of one of our hosted instances might be.  Servers, after all, are really not designed to be routers.  We’re a little off into scary-land, but not totally in the wild, and we can always paint our racks blue to calm things down.

Now suppose that we have a range of suitable servers stashed in data centers, including all of the locations of our original traditional routers, but also including other locations.  Suppose that we can dynamically position router instances in any available data center in response to load changes or failures.  This is what NFV proposed to do, and so we could call this the NFV model, based on pool infrastructure and agile deployment.  To make the most of this, we’d have to deal with questions like whether a failure should result in adaptive reconfiguration as it would among real routers, or in some sort of dynamic scaling or recovery, as it could in a virtual-resource world.  We have to think about supplementing our notion of the control plane to address this point.  Old network rules might no longer apply, and so we might have to do some aromatherapy to calm down.
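To illustrate what “pool infrastructure and agile deployment” buys you, here’s a toy placement sketch of my own; the data center names, capacities, and router instance names are invented.  The point is simply that a site failure becomes a redeployment decision rather than a reason to rack a spare box.

```python
# Toy sketch of NFV-style placement: router instances live in whichever data
# center has capacity, and a site failure triggers redeployment elsewhere.
# Site names, capacities, and instance names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class DataCenter:
    name: str
    capacity: int                       # how many router instances it can host
    hosted: list = field(default_factory=list)

    def has_room(self):
        return len(self.hosted) < self.capacity

def place_instance(instance, sites):
    """Put the instance in the least-loaded site that still has room."""
    candidates = [s for s in sites if s.has_room()]
    if not candidates:
        raise RuntimeError("no capacity anywhere in the pool")
    best = min(candidates, key=lambda s: len(s.hosted))
    best.hosted.append(instance)
    return best

def handle_site_failure(failed, sites):
    """Redeploy everything that was on the failed site to surviving sites."""
    survivors = [s for s in sites if s is not failed]
    for instance in list(failed.hosted):
        new_home = place_instance(instance, survivors)
        print(f"{instance}: {failed.name} -> {new_home.name}")
    failed.hosted.clear()

if __name__ == "__main__":
    pool = [DataCenter("dc-east", 2), DataCenter("dc-central", 2), DataCenter("dc-west", 2)]
    for router in ["vrouter-1", "vrouter-2", "vrouter-3"]:
        place_instance(router, pool)
    handle_site_failure(pool[0], pool)
```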

Then, think about a model where we have simple forwarding machines, white boxes or servers, that do nothing but push packets in response to forwarding tables that are loaded by a central controller.  This is the SDN model.  For the first time, we have actually separated the data and control planes, and we now have a completely different kind of network.  The old management models, the old operations models, the tools we need, and so forth, are all likely to change, and change significantly.  If we look around our ops center or network equipment racks, we see a brave new world.  Our grizzled netops people may need a lot more of a morale boost than smelling the pretty flowers now.
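To show what that control/data separation actually looks like, here’s a toy sketch of a central controller of my own invention (not OpenFlow or any real controller API): it holds the whole topology, computes shortest-path next hops, and would push nothing more than a per-destination forwarding table to each dumb node.  The topology is invented.

```python
# Toy SDN-style controller: the controller knows the topology, computes routes,
# and pushes per-destination next-hop tables to forwarding nodes that do nothing
# but look up and forward.  The topology below is hypothetical.
from collections import deque

TOPOLOGY = {                  # adjacency list: node -> neighbors
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}

def compute_forwarding_table(node, topology):
    """BFS from `node`; for each destination, record the first hop to take."""
    table = {}
    queue = deque((neighbor, neighbor) for neighbor in topology[node])
    visited = {node}
    while queue:
        current, first_hop = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        table[current] = first_hop
        for neighbor in topology[current]:
            if neighbor not in visited:
                queue.append((neighbor, first_hop))
    return table

def push_tables(topology):
    """What the controller would 'push' south to each forwarding node."""
    return {node: compute_forwarding_table(node, topology) for node in topology}

if __name__ == "__main__":
    for node, table in push_tables(TOPOLOGY).items():
        print(node, table)
```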

For those bold and brave enough, we can then contemplate the next step.  Suppose that we have a black box inside, connecting to our edge routers/devices.  There is no specific structure, no model at all.  We have floating control, data, and management functions described as a set of interface specifications.  These make us an IP network because that’s what we look like, and we have a service level agreement (SLA) to show what we’ll commit to do, but inside we’re as mysterious as that stranger in the shadows.  This is the black box model, where anything goes as long as you can look like you’re supposed to look to users, and do what you’re supposed to do.

Which of these is “virtual networking”?  Which is fully exploitive of the cloud?  We can learn a lot by doing some grouping.

Our first group includes the first three models (traditional, open device, software instance), and what characterizes it is that while the nodes in the network change their implementation, the network itself remains the same.  We’ll call this the node group.  We make the same assumptions about the way we’d decide on routes for traffic, and we handle failures and congestion in the same way.  All of these models are device networks.

Our second grouping includes the NFV and SDN models, and it represents a soft group.  In NFV’s case, we soften the topology constraints—you can have nodes pop up or disappear as needed, which isn’t the case with device networks.  In SDN’s case, you have a softening of the functional association with devices—the control plane moves off the forwarding device, and the dumbing down of the nodes could make it possible to have “empty nodes” deployed in a pool, to be loaded up with forwarding rules and run as needed.

Our final grouping includes only the black box model, so we’ll call it the black box group.  The significance here is that only the connections to the outside world—the properties—matter.  This may seem, on the surface, to be either a cop-out on implementation or a box that contains all my other models, but it’s not.  The black-box group says that any implementation that can meet the interconnect requirements is fine.  Thus, this group proposes that a “network” is defined by a hierarchical “intent model” that shows how the various technology-specific pieces (presumably created from one or more of the prior models) combine.
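Here’s a small sketch of that hierarchical intent-model idea, using names and classes I’ve invented for illustration (nothing here comes from a standard): every domain, whatever its internals, exposes the same “properties plus connect” contract, and a network is just a composition of such black boxes.

```python
# Toy intent-model sketch: each domain is a black box that exposes only its
# properties (endpoints it can reach, an SLA it commits to) and a connect()
# operation.  Internals can be legacy routers, SDN, NFV, anything.
# All class and domain names are hypothetical.
from abc import ABC, abstractmethod

class Domain(ABC):
    """The black-box contract: properties exposed, behavior committed, internals hidden."""
    def __init__(self, name, endpoints, latency_ms):
        self.name = name
        self.endpoints = endpoints
        self.latency_ms = latency_ms          # the SLA the box commits to

    @abstractmethod
    def connect(self, src, dst):
        ...

class LegacyRouterDomain(Domain):
    def connect(self, src, dst):
        print(f"[{self.name}] adaptive IP routing will carry {src} -> {dst}")

class SdnDomain(Domain):
    def connect(self, src, dst):
        print(f"[{self.name}] controller installs explicit path {src} -> {dst}")

class CompositeNetwork(Domain):
    """A 'network' is itself a domain: a hierarchy of black boxes."""
    def __init__(self, name, domains):
        endpoints = set().union(*(d.endpoints for d in domains))
        latency = sum(d.latency_ms for d in domains)
        super().__init__(name, endpoints, latency)
        self.domains = domains

    def connect(self, src, dst):
        for domain in self.domains:           # naive: traverse the domains in order
            domain.connect(src, dst)

if __name__ == "__main__":
    metro = LegacyRouterDomain("metro", {"user-1"}, latency_ms=5.0)
    core = SdnDomain("core", {"cdn-1"}, latency_ms=2.0)
    CompositeNetwork("service-net", [metro, core]).connect("user-1", "cdn-1")
```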

Since the black box group admits current (legacy) network structures, it might seem that it shouldn’t be considered a virtualization approach at all, but I think that it’s the key to how we should be thinking about network virtualization.  We’re not going to snap our fingers and find ourselves in a brave new (virtual) world, we’re going to evolve to it.  It’s almost certain that this evolution will come through the introduction of “enclaves” of new technology, which means that the overall model has to accommodate this.  We need defined interfaces between the enclaves, and we need a mechanism to harmonize enclave-processes into network-processes at the management level.

This universal need for black-box harmonization doesn’t mean we don’t have to understand the value of the other groups under it.  I think that the node group of virtual-network models offers the greatest ease of migration, but the smallest overall benefit.  You save on capex by the difference between legacy device prices and new-node prices, and that’s really about it.  I think the soft group, the group that represents both defining discrete functions rather than devices, and recognizing control/data separation, is really table stakes for network virtualization.

It’s interesting to note that the virtualization models in my critical-focus “soft group” include SDN and NFV, two of the early approaches to the problem.  Given that neither of the two has swept legacy networking aside, it’s fair to ask how they can be as critical as I’m suggesting, or if they are, why they’ve not transformed networking already.  I think the answer lies in that higher-level black-box approach.  Google adopted SDN as the center of its content network, and they did so by making it a black box—surrounding SDN with a BGP edge that provided that critical enclave interface I mentioned above.  SDN and NFV worked on the enclaves without working on the black box, and as a result they missed the broader issues in assembling networks from technology-specific pieces.

You could obviously add in some black-box-layer thinking to both SDN and NFV, and there has been some effort directed at doing that.  It’s always hard to fit a fender into a car when you didn’t start with how fenders and cars related, though, and so there are aspects of both SDN and NFV that would probably have been done differently had the broader picture been considered.  How much tuning might be required to achieve optimality now is difficult to say.

What isn’t difficult, in my view, is to say that the current industry practice of turning functional specifications into a box-centric architecture, as has been done with 5G, isn’t helping us achieve the most from new technologies in networking.  It’s not that the 5G specs demand boxes, but that the natural path from their functional diagrams is to assume that functional blocks equal devices.  That specific problem hit NFV, and I think our priority with 5G in general, and open-model 5G in particular, should be to avoid that fate.

The Real Relationship Between IoT and Edge Computing

The relationship between edge computing and IoT is complex, to say the least.  One major factor is that we don’t really have either one of them at the moment, and another is that there are strident claims that each could be the killer driver of the other.  Is this just circular marketing logic, or is there something inside the combination that merits a further look?  To find out, we have to start with some premises to winnow down the complexity of the topic.

The value of IoT lies primarily in its ability to report real-world conditions and relationships into the “virtual world”.  This virtual world is a kind of not-necessarily-visible form of augmented reality, where things that we can sense directly (see, hear, etc.) are combined with things that are known or inferred, but cannot be directly sensed.  It’s the augmentation piece that’s critical here, because it’s the basis for most of the purported IoT applications.

I believe that this virtual world has a center, which is us.  Every person, of course, has a sensory-based view of the real world, and since that view dominates our lives, it follows that the virtual world has to maintain that focus.  There may be a lot going on in the real world a thousand miles away, and similarly distant places might have virtual-world contributions, but those things are important to people (or processes) local to the conditions.  We’re each the center of our real, and virtual, universes, and this is a critical point I’ll come back to.

Most of the value of the virtual world lies in how it supplements real-world sensory information, which changes as we move around.  That’s why I tend to use the concept of information fields to describe how the virtual world is constructed and used.  As we move through the real world, and as real-world things come to our attention, we also move through information fields that represent the augmented information that the virtual world can link to our current context.  Walk by a shop and the shop’s information field intersects with our own, with the combination potentially generating additional information for both us and for the shop.

The content of information fields would likely be somewhat static; it’s our movement through them (our real-world movement) and events from the real world (like calls, texts, web searches) that create shifts.  The current context, our current context, is as dynamic as our lives.  A call, a sound, a sight, a reminder, can all change our context in a moment.  In addition, the richer the augmentation our virtual world creates, and in particular the more directly it’s reflected to our senses (via VR/AR glasses, for example), the tighter the integration we need between real and virtual.

Let me offer an example.  Suppose an AR display is showing us where something (or several somethings) is, by superimposing a marker or label on our field of view in the proper place.  As we turn and move, that proper place changes position in our field of view.  If our label/marker doesn’t track it, it’s jarring.  Anyone who’s used both a DSLR with a direct optical viewfinder and a mirrorless camera with an electronic viewfinder (EVF) knows that if you turn your body and camera with both setups, the EVF will “lag” the turn just enough to feel uneasy.  Same with AR.

This is where edge computing comes in, at least potentially, and perhaps (gasp!) even 5G.  Because everyone’s context is self-centered, it’s reasonable to assume that context-hosting would be done local to each user.  Let’s call whatever does that hosting our context agent.  The context agent creates the virtual world and then delivers it (selectively) to us.  If the agent were hosted at the “edge”, meaning in a place with a very low-latency path between us and the agent, we’d reduce the risk of that annoying and possibly dangerous lag.
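For the sake of tangibility, here’s a toy sketch of what a context agent’s core job might reduce to: track the user’s position, intersect it with nearby information fields, and surface only what currently applies.  Every name, coordinate, and field in it is my own invention for illustration, not an implementation of any real system.

```python
# Toy context-agent sketch: information fields are regions of augmented
# information; the agent intersects the user's real-world position with them
# and surfaces only what currently applies.  All data below is invented.
from dataclasses import dataclass
from math import hypot

@dataclass
class InformationField:
    owner: str             # e.g. a shop or an intersection
    x: float
    y: float
    radius_m: float
    payload: str           # the augmentation the field offers

    def covers(self, px, py):
        return hypot(px - self.x, py - self.y) <= self.radius_m

class ContextAgent:
    """Hosted near the user (device or edge) to keep the real/virtual link tight."""
    def __init__(self, fields):
        self.fields = fields

    def update_position(self, x, y):
        """Return the augmentations relevant to the user's current position."""
        return [f.payload for f in self.fields if f.covers(x, y)]

if __name__ == "__main__":
    fields = [
        InformationField("coffee-shop", 10.0, 0.0, radius_m=15.0,
                         payload="Espresso half price until 10am"),
        InformationField("intersection-5th", 200.0, 40.0, radius_m=30.0,
                         payload="Crosswalk signal changes in 8s"),
    ]
    agent = ContextAgent(fields)
    print(agent.update_position(12.0, 3.0))    # walking past the shop
    print(agent.update_position(190.0, 35.0))  # approaching the intersection
```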

Where, though, is that edge place?  One obvious possibility is that it’s in the possession of each of us, our mobile device.  Another possibility is that it’s located where our cellular signal terminates—the classic definition of “the edge”.  Other locations between us and the “RF edge” or deeper inside are also possible; it would depend on the sensitivity of our virtual-world applications to latency and the cost of hosting the context agent and supplying it with information.  That depends on the balance and time sensitivity of the information flow.

IMHO, self-drive vehicles are a perfect candidate for a locally hosted context agent.  A car is big, so there’s no difficulty finding a place to put one.  The cellular network can extend an information field to the car-resident agent, and that agent can then use something local to link to the driver, including direct visualization on a console or on a cellphone via Bluetooth.  A car-resident agent makes sense because at highway speeds, a millisecond is long enough to travel an inch, and a couple of these is enough to make the difference between hitting someone/something and not.  The edge, I think, is not going to drive the car, the car itself will…or will guide us to do it.
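The arithmetic behind that inch-per-millisecond claim is easy to check; here’s a quick calculation using example highway speeds I picked:

```python
# Quick check of the distance-per-millisecond claim at highway speeds.
MPH_TO_INCHES_PER_MS = 5280 * 12 / 3600 / 1000   # miles/hour -> inches/millisecond

for mph in (55, 65, 75):
    inches_per_ms = mph * MPH_TO_INCHES_PER_MS
    print(f"{mph} mph is about {inches_per_ms:.2f} inches per millisecond")
# 55 mph ~ 0.97 in/ms, 65 mph ~ 1.14 in/ms, 75 mph ~ 1.32 in/ms
```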

This doesn’t rule out an edge computing element in self-drive, though.  Self-drive applications are perfect use cases for the notion of layers of edge, based on information fields and movement.

We are the center of our context.  Might it also be true that shops or intersections, or other physical places, have a context?  Visualizing information fields as emanating from a context is a helpful approach, and in our self-drive example, a car moving through a series of intersections could be seen as moving through a series of information fields created by the intersections’ contexts.  Now, instead of each car figuring out what every other vehicle is doing, they interact with a context agent (belonging to the intersection) that figures all that out and communicates it as an information field.

Shops and intersections don’t move, of course, which makes their context more one of assimilating what’s moved into their fields.  Like personal contexts, you could host them in edge facilities, deeper, or even out to the context source—a shop might have a practical hosting mission for its own context, just as a vehicle would.  Things like intersections would likely not have a local context host, so edge hosting would be a logical assumption.

I hope this all shows that “IoT” or “edge computing” or even “5G” aren’t going to pull each other through.  What pulls them all through is the combination of a mission and a model of the application at the high level, an application architecture.  It’s not difficult to define these things (I just did, at a high but useful level, here), but I think that proponents of the various technologies want the technology deployments to come first, and then people rush around figuring out cool things to do with everything.

We aren’t likely to get that, and certainly we’re not going to get it any time soon.  The problem, of course, is that when you have to define an ecosystem, who does the heavy lifting?  There are hundreds or thousands of technology pieces, and procurements, involved in the sort of thing I described, a major task selling all the stakeholders, and a similarly major problem selling regulators.  It’s easy to understand why nobody would want to do it, but that doesn’t mean that the edge, IoT, or 5G market can ever reach optimality without it.

We actually have the technology to create something like I’ve described.  Some will argue that this is a good reason to support the “Field of Dreams” model, to hope that somebody will spread IoT sensors around like Johnny Appleseed’s apple seeds and that edge computing will fill all the vacant spaces in or near cell sites and central offices.  Surely, if somebody did those things, we would in fact get a model something like I’ve described.  Who, though, will fall on their capex sword?  I think it would be easier for a player like Amazon or Google to simply assemble the architecture, at which time we might actually find some business cases out there to justify deployment.

How “New” is the Newest Technology Publication?

Remember when I asked if we needed a new tech news site?  Well, we got one.  Protocol launched on February 5th, and it certainly looks different from the mainstream tech sites.  The question, which only time will answer, is whether it really offers not only more than we can get now, but what we need now.

The tag line for Protocol is “A new media publication focused on the people, power and politics of tech.”  Since this is coming from the publisher of Politico, a politics site I read every day, the emphasis isn’t surprising.  Is Protocol something that aims, in the end, to be a kind of tech-industry gossip column, or does it actually intend to serve a need?  If the latter, what need does it serve and does it look like it’s going to succeed?

Perhaps not surprisingly, Protocol looks a bit like Politico, and Source Code (a section also available for email delivery) looks a bit like the Playbook newsletters on Politico.  Source Code is news snippets like Playbook, but if Protocol is to set itself apart from the current crowd of technical publications and fill a need in tech overall, the rest of it has to offer more insight, the kind of depth Politico delivers.

There is a need in tech for context and insight, and that’s because we need to both energize a mass market and educate an elite group, at the same time.  Ad sponsorship tends to push material toward the mass market, and so we don’t get to the information that the real movers and shakers need to have, and that’s hurting us as we try to advance networking and IT to the next wave of growth.  Buyers of network and IT products today tell me that they’re aware of new developments, but have a problem fitting them into an organized plan.  It wasn’t always that way.

In the early ‘80s, the total circulation of the best of the networking rags of the time (BCR, Data Communications, Network Magazine) was roughly equal to the number of true decision and planning professionals in the industry.  These publications talked their language.  Today, everyone with a phone or a computer makes technical decisions, and this mass market now dominates advertiser thinking.  Extensive detail on something new, such as cloud-native, doesn’t generate the clicks that some flashy story does, and so we end up with more flash than substance.

We can’t move to complex things like cloud-native or services composed from virtual features by exploiting the average.  The smartest people in the industry are coming up with good new stuff, and they need to be able to communicate it effectively to the best and brightest on the buyer side.  If I want to teach quantum physics to a company, I don’t hold a mass class in the auditorium, I get the people with physics training together in a classroom.  Hopefully that elite group will exploit the technology in such a way as to support a mass-market impact.

Can Protocol give us an ad-sponsored site that can somehow build knowledge and insight in those who need it most, and on whom we’ll rely in building our tech future?  Let’s look at some of the early material to see.

In the first issue, Protocol covers an important topic, saying that the US (and in fact the networking industry) has been looking for an alternative to Huawei for 5G.  The main focus is to say that industry interest in an open-model networking solution to 5G is now getting government support.  It’s useful, but it could have been better.  I did my own take on this topic in my blog yesterday, by the way.

The piece says that “no American company is set up to compete head-on with Huawei in the 5G infrastructure business,” and that’s not really true.  Huawei doesn’t have any technology magic; their advantage lies in pricing.  Network equipment vendors have difficulty making a profit if they cut their prices to match Huawei, and that’s been true from the first.  Everyone knows the reasons Huawei’s critics give for Huawei’s price advantage, but whatever is true, cheapness sells.  Startups in the space have the same problem; no VC wants to fund an entry into a commoditizing market.  The only solution is open-model 5G.

Open 5G is only an extension of open-model networking, which is the combination of commodity “white-box” hardware and open-source software.  Everything in 5G is really part of either “network equipment” or the 5G New Radio (NR) space.  We’ve had the former for almost a decade now; the Open Compute people have network device specs, and there are a number of open-source projects on the software side.  The OpenRAN initiative is in the process of giving us the latter.

We’re not 18 months from a solution here; parts of it are already deploying.  The real challenge for open-model networking is credibility.  Who stands behind it?  Who’s to say it will advance in the directions needed?  Who integrates all the pieces?  What we need for open-model 5G to work is first a credible OpenRAN model, which I think we’ll have by year-end, and a credible set of integrators, which may be harder to get.  Will operators pay for integration when they want 5G on the cheap?  That doubt is why, to me, open-model 5G supported by one of the major cloud vendors is the only answer.

The second issue offers another piece that’s interesting but not as insightful as it could be.  A Google spinoff, Replica, is offering statistical information on the movement of people and vehicles to urban planners, by using cellphone location information.  The article points out that the wealth of information available is actually intimidating and confusing to many planning boards, and that the future of the company is uncertain because the value proposition isn’t clear.  All true, I think, but it only gets tantalizingly close to the key point.

All of advertising, all of augmented reality, all of productivity enhancement, depends on the ability to create, for each of us, a parallel and contextually linked online universe.  The real world has to be known even to an AI process that wants to exploit or augment it in some way.  What Replica is showing is that it’s possible to know a lot about the overall movement of people, and by inference the movement of any given set of people, through cellphone location data.  This is somewhat helpful for urban planners, but how much depends on just what the knowledge could be used for.  In terms of providing my “information fields” in this parallel online universe, it’s critical.  Nothing matters more in contextualization than the physical location of things.

This truth ties into a parallel truth, which is that there’s probably nothing more sensitive than that location information.  The big wireless providers have been accused of selling user location data, and the selling or even using of the location of any specific person is certainly a major privacy risk.  Suppose, though, that you construct a set of services that instead of providing the raw (and dangerous) data, provides insights from it.

There are hundreds of things that could be done with that, from real-time traffic avoidance to helping friends meet.  In the former case, you don’t care who you’re about to collide with, only that a collision is imminent.  Anonymized data is fine.  In the latter example, if two people agree to share location information, then services that use both locations to facilitate a meeting (or even avoidance) are also fine.  It’s easier to manage oversight and permissions through service abstractions than to make IoT elements themselves aware of the need.
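Here’s a minimal sketch of what “insights rather than raw data” could look like as a service abstraction, with invented names and thresholds: the service holds the raw locations internally and hands back only the derived answer that both parties consented to.

```python
# Toy sketch of a location-insight service: callers get derived answers
# ("are we close?"), never raw coordinates.  Names, thresholds, and the
# in-memory store are all invented for illustration.
from math import hypot

class LocationInsightService:
    def __init__(self):
        self._positions = {}      # user_id -> (x_m, y_m), held internally only
        self._consents = set()    # unordered pairs of users who agreed to share

    def report_position(self, user_id, x_m, y_m):
        self._positions[user_id] = (x_m, y_m)

    def grant_mutual_consent(self, a, b):
        self._consents.add(frozenset((a, b)))

    def are_nearby(self, a, b, threshold_m=100.0):
        """Derived insight only; raises if the pair never consented."""
        if frozenset((a, b)) not in self._consents:
            raise PermissionError("no mutual consent for this pair")
        ax, ay = self._positions[a]
        bx, by = self._positions[b]
        return hypot(ax - bx, ay - by) <= threshold_m

if __name__ == "__main__":
    svc = LocationInsightService()
    svc.report_position("alice", 0.0, 0.0)
    svc.report_position("bob", 60.0, 45.0)
    svc.grant_mutual_consent("alice", "bob")
    print(svc.are_nearby("alice", "bob"))   # True: 75 m apart, within 100 m
```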

Replica could be an example of a critical step forward for so many of the technologies we’re trying to promote that the list would be as complicated as Replica’s urban planning value proposition.  It’s this broader utility that makes the story important, and this is what I’d have liked Protocol to have pointed out.

My final example is very pertinent to the cloud-focus I mentioned earlier in this blog.  Protocol ran, on Friday of last week, a piece on “What earnings reports tell us about the state of the cloud”.  Those who read my blog know I regularly analyze these documents for insights, and I think they’re a valuable source.  In fact, I analyzed some of the same things that Protocol did, but I think my take was rather different.  I’m not saying that I’m right and they’re wrong, but that I tried to get under the surface facts, and I think they missed the boat.

The Important Truths about cloud computing are that 1) very few enterprises will ever move totally to the cloud, 2) hybridizing public cloud services and data center applications is the real future, and 3) we’re still in search of the broad software architectural model needed for that.  I don’t think any of these are captured in the piece, and they may even be contradicted.

“Cloud computing is still on the rise and starting to eclipse traditional enterprise technologies” is the thing that Protocol says is the takeaway from earnings.  The truth, I think, is the opposite.  Cloud computing is changing because it’s being fit into traditional enterprise technology, via the hybrid cloud.  That’s what’s driving an increase in cloud adoption.  The cloud is adapting to us, not the other way around.

How that adaptation is working, or should be, is our key cloud challenge.  We’re too fixated on the idea that stuff is “moving to the cloud” when the key to cloud growth is what’s being written for the cloud.  The highly visual and user-interactive pieces of applications, particularly the mobile and web front-end piece, are “new” development, and the cloud is the perfect place for them.  The traditional transaction processing and business reporting that’s still the key to enterprise IT has stringent security/compliance requirements that the cloud doesn’t meet (and may never meet), and the pricing model of the cloud would make moving these apps impossible without significantly increasing costs.  Accommodating both these points means creating explicit hybrids.

Microsoft, who has long had an enterprise data center presence, got the message a bit faster than Amazon, and Amazon has been focusing on expanding support for new cloud development (in that front-end piece) rather than on hybrid cloud.  It’s paid off for Microsoft, at least somewhat, but it won’t fully level the playing field with Amazon until Microsoft can present the hybrid application model framework that would govern the hybrid cloud overall.

The article also implies that enterprises’ turning to SaaS is a big factor, and that includes things like Microsoft’s Office 365 and various online video collaboration services.  The former is a better example of the software industry’s move to subscription licensing than of SaaS success, IMHO, and the latter is unsurprising, since collaboration is a natural fit for delivery as a service given its scope, the specialization of needs by application, and the variability of usage.  Enterprises are interested in SaaS, demonstrably so given Salesforce’s success, but primarily for applications peripheral to their core business, and the core business drives tech policy and spending.

Where does this leave us with Protocol and its mission, then?  I had a brief exchange with one of the editors when the publication was first announced last year.  In it, I said “What I think is most important in tech coverage is context.”  He responded “I completely agree — this is one of the things we want to do really well, making sure we try to tell the whole story instead of tiny pieces of it.”  So far, I don’t see that happening in the stories.

News with context is actionable insight, something we surely need in tech these days.  News without it is glorified gossip.  I don’t think Protocol has fulfilled its promise, but it’s early days, and they’re still finding their footing.  I’ll be keeping an eye on things, and I may change my mind if they adjust their approach a bit; if so, I’ll blog on the topic again.