Can NFV Rise Above vCPE to Reach For the Carrier Cloud?

Many vendors have found hope in the NFV opportunity, including network vendors, software vendors, server vendors, and chip vendors.  At VMworld the CEOs of Dell and VMware held a kind of NFV love-fest, and Intel has long been promoting NFV for the obvious reason that hosting anything consumes hosts, which consume chips.  While all of this is going on, though, we hear that operator projects in NFV have largely focused on premises-hosted vCPE applications.  Are these going to evolve to “real” NFV, or are all these vendors dreaming?  Do they even have to?  Could NFV be a “success” if it never goes beyond virtual CPE?

Let’s deal with that last question first.  Operators’ consensus on defining NFV success is that it would have to improve their overall profit per bit by 10% or more.  Except for business-service-only players, that clearly cannot be achieved with the business-directed vCPE services that are the current priority.  We have to get beyond those, and the question is how we do that.

NFV is about virtualizing network functions, meaning extracting features from dedicated appliances and making them available in software form so that they can be hosted on something.  The original NFV model focused on hosting in the cloud, or at least on virtualization-equipped data centers of some sort.  Hosting on “COTS” (commercial off-the-shelf servers) could credibly lead to carrier cloud deployments that my models forecast could add over one hundred thousand new data centers worldwide.  That’s the kind of opportunity that would engage Dell or Intel or HPE or anyone with a server business.

The challenge up to now has been the limited number of business prospects suitable for vCPE.  While you can credibly host any VNFs in a cloud data center, the vCPE initiatives to date have largely focused on business buyers with Ethernet access.  There aren’t enough of these opportunities to create a big demand for hosting or drive deployment of a hundred thousand data centers.  In fact, one of the reasons vCPE has been popular as an application of NFV is that it doesn’t require data centers at all.  Operators like the idea of putting a general-purpose device on the premises, a kind of mini-server-CPE box, and then deploying the software in it.  The benefit is less economy of scale than it is agility in adding features for managed services.  But that agility value is hard to put a number on, and it doesn’t build masses of data centers that consume servers and chips.

Operators who love the idea of vCPE will generally admit (or at least have admitted to me) that these applications don’t seem to lead to a carrier cloud.  A very few hope that the repertoire of VNFs that could be hosted will expand, but they don’t have convincing candidates for the expansion or market data to validate the opportunity.  Some think that consumer-level vCPE might get them there, but the benefit of deploying cloud-hosted access features to consumers when a typical broadband hub costs less than fifty bucks is limited.  Particularly when you still need home termination of the broadband connection and WiFi.

If vCPE is going to build carrier cloud, it would have to extend to the consumer, and that extension would obviously depend on having a large number of new service opportunities that would justify cloud hosting rather than edge hosting.  Most operators say that things like home monitoring could help, but marketing these against established incumbents is a challenge, and if you need hosting you’d have to deal with the first-cost question, which is getting the hosting out there to fulfill the opportunities marketing creates.

This is why most operators believe that it will take something other than vCPE to drive carrier cloud.  What that could be divides them somewhat, but mostly in terms of the priority given to each option.  The most obvious and most credible is mobile infrastructure, particularly the way that infrastructure would change leading up to 5G.  IoT ranks second, and the generic hosting of application components ranks third.  Let’s look at them in reverse.

Application hosting (meaning offering cloud services) has been attractive to operators as a carrier-cloud driver for almost as long as cloud computing has existed.  Verizon tried it, and by most accounts failed, and nothing much has really changed.  The cloud is about marketing, positioning, and brand.  Verizon shot behind the duck, or ahead of it, depending on what you thought the opportunity was.  Most operators still yearn for cloud revenue, but they still seem uncertain as to how to get it.  That makes the cloud computing driver the least dependable in terms of building infrastructure.  However, it is a logical way to extend vCPE services if you could get those started, which is why it makes the list of drivers in the first place.  My model says you can’t drive an initial deployment with cloud services.

IoT is the most interesting and probably the most compelling of all the drivers.  My model says that IoT alone could drive a deployment of a hundred thousand or more data centers, making it the only driver of carrier cloud that could stand on its own.  If it could stand at all, that is.  The problem with IoT is that it’s a poster child for what’s called a nascent opportunity.  IoT is to carrier cloud what a whiff of perfume is to a walk down the aisle.  It’s a major commitment in the long term, but in the short term it’s just a vague promise.  There are so many things that would have to come together to make IoT a driver for carrier cloud that the combination looks unlikely in the near term.  In the long term, it rules.

My current model says that mobile infrastructure would likely add no more than about 30,000 data centers worldwide if taken alone.  That doesn’t get us to nirvana, but it would be enough base deployment to facilitate other applications’ use of the data centers, which could then promote those applications, and that would bootstrap us to the necessary level of deployment.  If you believe in 5G, which I do for “arms race” reasons, then it’s going to happen in some form.  The trick would be making sure that the form that happens is a driver for carrier cloud and not just an abstract change in the RAN.

One other thing to consider is the combinatory value of the carrier cloud computing services and IoT drivers, particularly if they’re combined with a consumer target and maybe even consumer vCPE.  Home control, after all, could be framed as an IoT application as long as we don’t get religious about demanding all the sensors be directly on the Internet.  It’s not a major step from home control to home financial management, home photo management, and so forth.  Thus, those applications kind of have a foot in multiple doors.  Could you target them collectively?

You could.  Verizon rolled out FiOS in a cherry-picking way, focusing on the areas where they had the highest probability of earning early return on infrastructure.  You could do carrier cloud the same way, providing that you had credible services with provable opportunities and that you had a very strong marketing plan to promote them.

All my modeling and conversations with operators converge on the point that to make NFV successful, to make it into something other than a niche approach to business services, you have to build out carrier cloud at enough scale to enable a cascade of other applications that can exploit but not justify the cloud.  So doing that is critical, and it’s going to either mean betting on 5G and mobile infrastructure or constructing a more complicated service set around a combination of cloud hosting and IoT.  That’s why I’ve been saying that I think 5G may be the critical NFV driver.

That would seem to spell success for players who are incumbent in the mobile space, but operators tell me that none of these mobile incumbents are really swinging for the carrier-cloud bleachers.  Instead, they’re bunting by aiming at very limited mobile missions.  The opportunity is still there for any of the full-spectrum NFV players to step up and claim the space.  Which might mean claim the market.

Can Rackspace Reinvent Itself as a Private Company?

Rackspace knows a lot about the cloud.  Maybe they know more than the pundits do, and very possibly more than the consortium of investors (led by Apollo Global Management) that is taking them private.  The question now is whether they know more than those who think that the managed cloud services space is a great, latent, independent opportunity.

For at least two decades, the world of technology has been driven far more by hype than by reality.  Arguably this started with the Frame Relay Forum, which was essentially a marketing promotion activity hiding under the label of “standards group”.  In most cases, including that of frame relay, there was at least some real substance underneath the covers, but the hype tended to blur things so much that it was hard to ferret that substance out.  The cloud is like that today.  There is no question that the cloud has substance (no pun intended) and in fact it could develop into the IT advance of the age.  There’s so much hype, though, that the market often responds to the hype more than to the opportunity.

Rackspace was a hosting company when the two-decade hype-dominated period I just referenced began.  It transformed itself into a cloud company as the cloud wave exploded, but here I think the hype started to catch up with it, and the going-private step is just the latest reaction.

Being a cloud provider of any sort puts you in a vise.  On the one hand, cloud services are attractive to the extent that they’re cheap—cheaper than traditional purchased IT devices.  On the other hand, you have to make a profit to stay in business, so you have to sell the cloud for more than it costs you.  If your clients are big companies, you can expect their own economies of scale (which follow an Erlang-C-like curve, rising slowly, then more quickly, then plateauing) to nearly match yours.  How do you respond?

Managed cloud services, meaning professional services offered to facilitate cloud adoption.  These can admit a company to the sacred field of cloud computing without requiring them to invest in massive infrastructure in competition with others.
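
To put a rough number on the economies-of-scale point above, here is a minimal sketch (my own illustration, not anything from Rackspace) that uses the Erlang C formula to estimate how much utilization a resource pool can sustain while keeping the probability of queuing under an assumed target.  Small pools have to run cold; big pools can run hot, and the gain flattens as pools get large, which is why a big enterprise can come close to matching a provider.

```python
# A minimal sketch of the Erlang-style pooling argument.  The 5% queue target
# and the pool sizes are illustrative assumptions, not market data.

def erlang_c(servers: int, offered_load: float) -> float:
    """Probability an arriving request must wait (Erlang C), computed via the
    numerically stable Erlang B recurrence."""
    if offered_load >= servers:
        return 1.0
    b = 1.0
    for k in range(1, servers + 1):      # Erlang B recurrence
        b = offered_load * b / (k + offered_load * b)
    return servers * b / (servers - offered_load * (1.0 - b))

def max_utilization(servers: int, queue_target: float = 0.05) -> float:
    """Highest per-server utilization that keeps queuing probability below target."""
    lo, hi = 0.0, float(servers)
    for _ in range(60):                  # bisection on offered load
        mid = (lo + hi) / 2.0
        if erlang_c(servers, mid) < queue_target:
            lo = mid
        else:
            hi = mid
    return lo / servers

for n in (5, 20, 100, 500):
    print(f"{n:4d} servers -> ~{max_utilization(n):.0%} usable utilization")
```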

Professional services and managed cloud could work, but they face an early and serious challenge—marketing.  You can’t sell managed cloud door-to-door like it was lawn service.  That’s particularly true with SMBs, because the SMB market is a very large number of prospects with a small sales value each.  Imagine knocking on a couple million doors.  No, the customer has to come to you because of marketing, and where Rackspace fell down was in its ability to stay in the public eye.  Amazon got all the good ink, and companies like IBM were able to leverage their brand and account presence.

Well, that explains why the Rackspace strategy didn’t take off.  Their stock peaked in early 2012 and has generally trended downward as “real” cloud momentum built.  But taking a company private is a bet that they’re seriously undervalued and you can turn things around.  Is that true?  Actually, it might be.

Like most tech companies, Rackspace has undervalued marketing.  The hype around things like the cloud makes it seem to vendors that there’s some great ecosystemic tidal wave sweeping buyers toward inevitable adoption, and nothing need be done but toss a windmill in the stream and use it to drive a press to print money.  “Build it and they will come,” as the saying goes.  Had Rackspace been smart in marketing they could probably have built a business to rival Amazon.

That was then.  What about today?  Well, the truth is that the cloud is kind of old news.  Every day, reporters have to write stories people will click on.  “The Cloud Saves You Money” might have worked 20 years ago, but it’s been said too many times now to generate much interest.  If the only thing that Rackspace and its new investors do is better marketing against the traditional cloud messages, I don’t think they’ll succeed.

Obviously, then, there would have to be other things that might make this deal sensible, and there might be two of them.  One is the systemic change to cloud usage that we’re now starting to see, and the other is the potential entry of a bunch of new cloud aspirants, some of whom might be happy to do some M&A.

IaaS stinks as a business model, and even Amazon knows and is proving that.  There’s minimal differentiation possible, and because all you’re doing is hosting a VM you’re not displacing much in the way of opex.  The problem is that PaaS in the true model (Azure, for example) isn’t as versatile in addressing opportunity.  What’s emerging as the alternative is creating an ad hoc PaaS by adding a set of web services to IaaS.  These web services, accessible to applications that are built to link to their APIs, can provide horizontal or vertical tools to build cloud-centric or cloud-specific applications.  Because customers pay for their use, they add to provider revenue and that’s a direct benefit.

The indirect benefit for a managed-cloud provider is that the web services can facilitate application-building for customers, for specialized VARs and resellers, or even for the cloud-MSP itself.  For example, Amazon and Microsoft, both of whom offer these services in their clouds, include tools to build IoT applications.  These kinds of tools could be expanded to provide a much bigger slice of application functionality, which would make them even more attractive to VARs or SMB users.  From them, Rackspace could even build a kind of IoT shell application that could be customized as needed.
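
As a purely hypothetical sketch of the “web services on top of IaaS” idea, here is what an application built to link to such an API might look like.  The endpoint, header, and payload fields are all invented for illustration; they do not correspond to any provider’s actual IoT service.

```python
# Hypothetical illustration of the "ad hoc PaaS" idea: an application hosted on
# plain IaaS calls a provider-supplied web service (here an invented device
# registry) instead of building that feature itself.  URL, key, and fields are
# placeholders, not a real provider API.
import json
import urllib.request

CLOUD_SERVICE_URL = "https://api.example-cloud.test/iot/v1/devices"  # placeholder
API_KEY = "YOUR-TENANT-KEY"                                          # placeholder

def register_device(device_id: str, location: str) -> dict:
    """Register a device with the provider's hosted IoT web service."""
    payload = json.dumps({"deviceId": device_id, "location": location}).encode()
    req = urllib.request.Request(
        CLOUD_SERVICE_URL,
        data=payload,
        headers={"Content-Type": "application/json", "X-Api-Key": API_KEY},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Every call is billable, so each use of the web service is incremental
# revenue for the cloud provider, on top of the basic IaaS hosting fee.
```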

The challenge for Rackspace, as a “cloud services provider” rather than a cloud hosting company, is that they really don’t have a great place to put such services.  They do have servers for hosting, but not the kind of structure optimal to host web services.  They resell other providers’ public cloud services.  That reduces their chances of profiting directly from hosting web-service enhancements or specializations like Amazon or Google or Microsoft.  Thus, the goal has to be developing skills that buyers value highly.  Those skills are credible only if we step beyond IaaS to web services.

However, leveraging skills doesn’t always provide a fast path to profit.  The challenge of marketing remains, and so does the challenge your own cloud-host partners could pose if they decided to get into the same space.  Which, if it’s a good one, they should.  That’s what raises the second issue, of selling yourself off.  To exit private ownership, you can either go public again at a profit, or sell out.

The telcos are the only players in all the world who actually have a path to building massive distributed clouds that could fully open the opportunity for new-style cloud applications.  They’d be crazy not to want to exploit those kinds of applications to build value on their clouds, which is why I think “carrier cloud” is the real goal for telcos and not “SDN” or “NFV”.  Exiting to a telco could be a very smart play for Rackspace, providing that Rackspace builds the essential skills to validate the carrier cloud—then sings the song beautifully and builds brand recognition there.

Here, the problem is that telcos already have favored integrators—the integration arms of their big equipment providers and particularly Ericsson, who has been almost a professional services company with limited equipment sidelines.  Why would telcos want to buy a telco professional services company when so far they’ve been content to hire one?  Rackspace would have to answer that question the only place it can be answered—in the media via marketing and positioning and singing pretty.

That brings us full circle, doesn’t it?  Rackspace didn’t push itself effectively, and still doesn’t.  They now need to develop a new asset to push, and expertise in “fog-cloud” applications would surely qualify.  They can then position that new asset and make themselves attractive as a target.  All of this is possible, but it’s going to take a lot of enlightenment to drive it.  Do they have that?  Look at Dell, who went private.  With carrier cloud the largest single server opportunity out there, has Dell been revolutionary in their offerings or positioning?  Not by my standards, for sure.  Can Rackspace do better?  We’ll have to wait and see.

What, When, and How to Use T&M With SDN and NFV

There have been a number of online articles recently on the relationship between testing and monitoring and NFV.  In the CloudNFV project I headed in 2013 to early 2014, I did some work defining these issues in more detail, and though the results were never incorporated in the PoC they do offer some input on the issue.  The most interesting thing is that we have to review our entire testing, monitoring, and management practice set in light of virtual-network technologies.

One clear truth about T&M in a virtual age is that just as virtualization separates the abstract from the real in a clearly defined way, so it separates T&M tools and practices—or should.  There is “Service T&M” and “Resource T&M”, each focusing on a specific area and each with a specific mission.  The focus and mission differences dictate the technology implications of both.

Service T&M, since it operates at the service layer, lives above the abstract service/resource boundary, or in a sense on the “outside” of the intent models, where only guaranteed behaviors are visible and not how they’re fulfilled.  Obviously Resource T&M has to focus on the real resources, and so should always live “inside” the intent models.  Put another way, Service T&M tests and monitors virtual/functional things and Resource T&M tests actual things.

A complication to both kinds of T&M is the boundary point.  It’s not unreasonable to think that Service T&M practices would eventually lead to an intent model boundary, and to penetrate it you’d have to know service-to-resource correlations.  Similarly, it’s not unreasonable to think that tracing a resource issue might lead you to wonder what service impacts it was having.  We’ll deal with boundary-crossings in both directions.

Let’s start with the easy stuff.  Resource T&M, being aimed at testing real paths and devices, is at one level at least similar to or identical with traditional T&M.  In my consideration of T&M for NFV, my conclusion was that the only area where NFV (and, in my view, SDN) differed in Resource T&M was the introduction of a new class of resources—the hosting sites or servers.  These are the foundation points for NFV, and they also represent a complexity in the boundary-condition problem.

If a service like VPN is viewed from the top (outside the intent model) it can easily be represented as a single virtual router.  That’s true of most network-connection services.  Similarly, a chain of virtual functions that make up vCPE could be visualized (you guessed it!) as “virtual CPE”.  However, the inside-the-model view would be (in both cases) a bunch of VMs or containers, likely linked by SDN paths.  The transition from service to resource is not obvious.

The solution I proposed in CloudNFV (and ExperiaSphere) was to have a “soft” boundary between the service and resource layer, where instead of having a bottom-level service intent model decompose into resources directly, it decomposes into virtual devices.  A resource architect can then formulate how they want to expose virtual devices for composition into services, and however that happens the relationship between the virtual devices and the real resources is set.  Orchestration and decomposition then take place on both sides of the boundary, but driven by different but related missions.
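
Here is a minimal sketch of that soft boundary as a data structure, my own rendering for illustration and not actual CloudNFV or ExperiaSphere code.  Service-side intent models decompose into virtual devices, and only the virtual device knows what real resources it maps to.

```python
# Toy rendering of the service/resource "soft boundary": intent models on the
# service side decompose into virtual devices; the virtual device hides the
# mapping to real resources.  All names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class RealResource:
    name: str                 # e.g. a server, vSwitch port, or SDN path
    status: str = "up"

@dataclass
class VirtualDevice:
    """Resource-side abstraction a resource architect chooses to expose."""
    name: str                 # e.g. "virtual-router-east"
    resources: List[RealResource] = field(default_factory=list)

    def status(self) -> str:
        return "up" if all(r.status == "up" for r in self.resources) else "degraded"

@dataclass
class IntentModel:
    """Service-side element; decomposes into child models or virtual devices."""
    name: str
    children: List["IntentModel"] = field(default_factory=list)
    devices: List[VirtualDevice] = field(default_factory=list)

    def status(self) -> str:
        states = [c.status() for c in self.children] + [d.status() for d in self.devices]
        return "up" if all(s == "up" for s in states) else "degraded"

# A VPN service seen from outside as a single virtual router:
vpn = IntentModel("vpn-service", devices=[
    VirtualDevice("virtual-router", [RealResource("vm-0423"), RealResource("sdn-path-17")])])
print(vpn.status())
```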

Service T&M is more complicated, and the top reason for that is that it’s far from clear whether you actually want to do it.  Everyone used to crank the engine by hand to start a car, but we don’t put cranks on cars any longer.  The point is that many of the old-line ways of managing networks and dealing with problems don’t really relate to the way we have to do it in an age of orchestration and service automation.

Operators themselves were (and still are) very conflicted on the role of T&M, even at the Resource level.  The trend seems clear; they would rather manage resources against a capacity plan based on policies and perhaps on admission control—the mobile model, in short.  If that’s what you’re doing, then you probably don’t want to do specific resource testing much, if at all.  On the service side, the issue is a little more complicated because it’s largely driven by the fear of a major enterprise client calling and shouting “What do you mean all your lights are green?  My &#$* service is down!” when they have no way of “testing” or “measuring” just what’s going on.

After some deliberation, my own conclusion was that service-layer T&M should really consist of the following, in order of value and lack of complications:

  1. The ability to trace the binding to resources by tracing the state of the intent models from service down to the resources. Any inconsistencies in state could then be determined.
  2. The ability to employ packet inspection on a path/connection and gather statistics.
  3. The ability to “see” or tap a path/connection, subject to stringent governance.

In the model I believe to be workable, the central thesis is that it’s useless to test a virtual resource; you can’t send a real tech to fix one.  The goal, then, is to validate the bindings by looking at how each intent model in the service structure has been decomposed, and establishing whether the state of each is logical given the conditions below.  For example, if a model representing a virtual router shows a fault, then the higher-level models that include it should also be examined for their fault state.  This lets you uncover problems that relate to the transfer of status in a virtual-network model, before you start worrying about resource state.
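
As an illustration of the first capability on the list above (tracing bindings and state), here is a minimal sketch that walks a nested intent-model structure top-down and flags any model whose reported state does not agree with what lies beneath it.  The structure and state names are assumptions for the sketch, not a specific implementation.

```python
# Illustrative binding trace: walk a nested intent-model structure top-down and
# flag any model whose reported state is inconsistent with its children.
from typing import List, Optional

class ModelNode:
    def __init__(self, name: str, reported: str, children: Optional[List["ModelNode"]] = None):
        self.name = name
        self.reported = reported            # state the model itself reports: "ok" or "fault"
        self.children = children or []      # what the model decomposed into

def trace_bindings(node: ModelNode, path: str = "") -> List[str]:
    """Return inconsistencies found while walking from service down to resources."""
    here = f"{path}/{node.name}"
    problems = []
    if node.children:
        derived = "fault" if any(c.reported == "fault" for c in node.children) else "ok"
        if derived != node.reported:
            problems.append(f"{here}: reports '{node.reported}' but children imply '{derived}'")
        for child in node.children:
            problems += trace_bindings(child, here)
    return problems

service = ModelNode("vpn", "ok", [
    ModelNode("virtual-router", "ok",
              [ModelNode("vm-0423", "fault"), ModelNode("sdn-path-17", "ok")]),
])
for issue in trace_bindings(service):
    print(issue)   # catches a status-transfer problem before touching resources
```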

At the resource level, the tracing of state can link the status of the intent model that represents the resource (in my example here, the virtual router model) to the status of the resources that were committed.  This could involve a dip into a status database, or a direct query of the management interface of the resource(s) involved.

The packet-inspection mission has interesting SDN/NFV connotations, for “inspection-as-a-service” in particular.  Inspection is a combination of creating a tap point and then linking it to a packet inspection engine, and both these could be done using virtual elements.  Any virtual switch/router could be made to expose a tap, and once there is one it’s not difficult to pipe the data to a place where you have static instances of inspectors, or to spawn a local inspector where you need it.  You could extend this to data injection without too much of a problem, but data injection in today’s network protocols has always been more problematic; it’s easy to do something that creates an instability.
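
A rough sketch of what that orchestration might look like follows.  All the functions are stubs I have invented to show the sequence (create a tap, then attach it to a standing inspector or spawn a local one); none of this corresponds to a specific vendor API, and a real deployment would wrap the whole thing in governance checks.

```python
# Pseudo-orchestration for "inspection-as-a-service": expose a tap on a virtual
# switch/router, then pipe the mirrored traffic to a packet-inspection engine.
# All functions are illustrative stubs.

def create_tap(vswitch: str, port: str) -> str:
    """Ask the virtual switch to mirror a port; returns a tap endpoint id."""
    print(f"[tap] mirroring {vswitch}:{port}")
    return f"tap-{vswitch}-{port}"

def spawn_inspector(site: str) -> str:
    """Deploy a packet-inspection VNF near the tap (stub)."""
    print(f"[vnf] spawning inspector at {site}")
    return f"inspector-{site}"

def attach(tap_id: str, inspector_id: str) -> None:
    """Connect the mirrored stream to the inspection engine (stub)."""
    print(f"[link] {tap_id} -> {inspector_id}")

def inspect_connection(vswitch: str, port: str, site: str,
                       static_inspector: str = "") -> None:
    """Governance/approval checks would gate this call in a real deployment."""
    tap = create_tap(vswitch, port)
    inspector = static_inspector or spawn_inspector(site)
    attach(tap, inspector)

inspect_connection("vsw-metro-3", "vpn-cust-17", site="fog-dc-3")
```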

Based on some real experience, I think that any discussion of SDN/NFV T&M that doesn’t focus first on binding is a waste of time.  In SDN, you need to know how a route is constructed from successive forwarding instructions in devices.  In NFV you need to know where something is hosted and what connection resources are used to connect the pieces.  I believe that if service models are constructed logically, the models themselves will provide access to the information you need to trace functionality, and little more will be required.  Where more is required, the packet-inspection-as-a-service approach can supplement binding tracing as needed.

If bindings are important, then service models and the nested-intent-model approach are critical.  The state of a service today is directly related to the state of the devices that make it up.  Whether the service of the future is built from virtual functions, virtual devices, or virtual routes, the same dependency will exist.  The most logical way to determine the status of a given intent model is to look at the state of the things underneath it, what it decomposes to, and continue that downward progression until you find the problem.  If you can’t do that, then you might as well throw darts at a network resource map and see where each one lands.

But let’s get to the top of the issue.  All of this, IMHO, demonstrates how convincingly virtualization technology changes network operations and management, or should, or perhaps must change it.  Nobody should doubt that virtualization is more complex than old-fashioned fixed assets.  That additional complexity will swamp any capex benefits in opex unless we’re very careful with service automation.  T&M as we know it is irreconcilable with service automation; you can’t remedy problems with low-touch opex practices by touching them.  However, those who want to practice them on real resources can continue to do so, perhaps as a last resort.  What we should be worrying about is reconnecting the automated analysis of service/resource behavior at that “slash point”, the boundary that virtualization will always create.

Could it be that the biggest opportunity in SDN and NFV is one that’s related to doing the kind of deep-thinking-and-heavy-lifting stuff that’s essential in framing virtual technologies in profitable service practices?  If so, then I think that the modeling and binding approaches are the most critical things in any of these new technologies, and ironically the least developed.  I looked at the major vendors who can make a business case for broad deployment of NFV, for example, and found that of the seven who now exist, all have either incomplete modeling/binding approaches or have recently made changes in theirs.  Yet these should be the top of the heap in terms of software design and evolution.  We still have a lot of catching up to do.

Metro-Networking in the Fog and Optical Respect

If the future is as “foggy” (yes, I mean in the sense of being edge-distributed, not murky!) as I have suggested, and if networks have to be adapted to the mission of “fog-connection”, then how does this impact metro networking explicitly?  In particular, would this new mission create opportunities for the optical vendors, and could it help them “get more respect” as I’ve suggested they need to do?  Let’s look at the way things seem to be developing.

A rational fog-distributed model would be a set of CORD-modeled data centers linked with a dense optical grid that offered essentially inexhaustible capacity.  This model would coexist with current metro infrastructure, and logically you’d expect it to share metro connectivity with that infrastructure for the period of evolution to the fog model, which could last for five to seven years.

The connectivity mission for next-gen metro infrastructure, then, would consist of four services.  First, DCI-like service between fog data centers, for the purpose of connecting the resources there into uniform virtual pools.  These could be “service chains” or links between virtual functions or horizontal connections in carrier IoT or cloud computing.  Second, SDN trunk/path connection services created between fog data centers for the purpose of building connection services at the retail level.  Third, traffic from legacy sources housed in the same fog data centers but not part of the actual fog infrastructure.  Fourth, wholesale or bulk fiber connectivity sold to other operators or large enterprises.

I’ve listed these in order of their importance to operator planners, most of whom see “carrier cloud” missions as their primary revenue goal.  Operators a decade ago had great hopes for cloud computing services, and even though most now believe these will be harder to promote and less profitable to offer, they still realize that some form of service-hosting revenue is the only credible way for them to boost their top lines.  So you can take the list, from the operator perspective, as a model for how to transition, or fund the transition, to metro-cloud deployments.

The most important point about these service targets is that the mixture of needs and the evolution they represent are the main reasons why optics could hope to escape plumbing status.  If the future were nothing but fog-connection, you could run glass right to the SDN- or service-layer devices and skip an independent optical layer.  At the least, this could reduce the feature demand on optical equipment to the point where low-price players would be the only winners.  If, for example, we had only service missions one and three above, we could satisfy the need for optical transport with minimalist add-drop wavelength multiplexing.

That one-and-three example also illustrates an important point for the optical space, because if you want to get more out of the market than carrying the bits that empower other vendors, you have to somehow climb up the food chain to the services.  Mission four, which is direct optical service fulfillment, isn’t the answer either.  The margins on these connections are very limited, and in many cases operators are actually wary of selling metro transport for fear it will empower competitors and generate regulatory pressures on pricing practices.  Mission three is just merging in legacy, so elevating optics has to focus on missions one and two.

If, as I’ve suggested in an earlier blog, the metro area is destined to be an enormous virtual data center, then somehow the abstraction/virtualization/orchestration stuff has to be handled.  Missions one and two are network-as-a-service (NaaS) missions, with the services delivered to software residents of the fog data centers through available computer-network interfaces and conforming to the specific feature, service, and application missions those software elements support.  To see how that could work, we have to look at missions one and two in more detail.

It’s tempting to look at DCI as being a fat-pipe mission, and of course it could be.  That’s part of the “optics dilemma”; you always have the opportunity to lie down and accept being just a route others travel on.  If you go back to the CORD approach, though, you would see that DCI services should be abstracted into a virtual network-as-a-service model.  That abstraction would accept connectivity requests from applications or from virtual-function deployment and lifecycle management.  Ideally, optical players would provide the full virtualization this represents, but at the very least they’d have to be able to fit under the SDN-service-layer technology that another vendor provided.  In short, if you present a pipe interface you’re a pipe.

This is even more true for the second mission, which is to provide actual new-age service connectivity.  Think of this as a mission like connecting a metro area where there’s one customer demarcation point to another where the main concentration of user access points is found, or where the HQ of the enterprise is located.  Here again, in modern terms, this is a NaaS mission.  Can an optical vendor be a virtual network onramp here?  Or at least, can they synchronize optical configuration and reconfiguration with the higher-layer service mission?

In theory, you could be a player in the virtual network game either at the data-plane/control-plane level or at the control-plane level alone.  A data-plane NaaS optical player would focus on creating an on-ramp to optical paths that could groom some set of electrical interfaces to optical wavelengths.  It would make no sense to do this without adding in control or management interfaces to connect and reconnect stuff.  A control-plane-only player would provide a means of connecting optical lifecycle management to service-layer lifecycle management.
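
Here is a minimal sketch of what a NaaS-style request for missions one and two might look like from the consuming software’s point of view.  The request fields and class names are my own assumptions, not CORD’s or any vendor’s interfaces.

```python
# Illustrative NaaS abstraction for DCI: lifecycle software asks for connectivity
# between fog data centers in service terms, and the optical/SDN layers decide
# how to realize it.  Request fields are invented for illustration.
from dataclasses import dataclass

@dataclass
class ConnectivityRequest:
    endpoints: tuple          # e.g. ("fog-dc-3", "fog-dc-7")
    bandwidth_gbps: float
    max_latency_ms: float
    lifetime: str             # "persistent" or "per-deployment"

class DciNaas:
    """Presenting a service interface instead of a pipe interface."""
    def request(self, req: ConnectivityRequest) -> str:
        # A real implementation would map this to wavelengths or SDN paths;
        # here we just acknowledge and hand back a service handle.
        print(f"provisioning {req.bandwidth_gbps}G, <{req.max_latency_ms}ms "
              f"between {req.endpoints[0]} and {req.endpoints[1]}")
        return "naas-conn-0001"

handle = DciNaas().request(
    ConnectivityRequest(("fog-dc-3", "fog-dc-7"), 100.0, 2.0, "persistent"))
```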

The linkage with the service layer raises a question I’ve discussed in an earlier blog, which is the relationship between optical-layer provisioning and electrical-layer service changes.  I stand by my original views here, which are that there is no value in having a service request at the electrical layer modify optical configuration directly.  In fact, there’s a negative value, because you’d have to be concerned about whether a single user could then impact shared resources.  At the most, you could use an electrical-layer service request to trigger a policy-based reconsideration of optical capabilities.
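
To make that concrete, here is a minimal sketch of the policy gate: the electrical-layer request only raises an event, and a capacity policy decides whether the optical layer gets asked to re-groom.  The link names, utilization figures, and threshold are all illustrative.

```python
# Illustrative policy gate: an electrical-layer service request never touches
# optical configuration directly; it raises an event that a capacity policy
# evaluates.  Thresholds and names are assumptions for the sketch.
OPTICAL_UTILIZATION = {"fog-dc-3<->fog-dc-7": 0.62}   # current share of lit capacity
POLICY_THRESHOLD = 0.80                               # re-groom only above this

def on_electrical_service_request(link: str, added_load: float) -> None:
    projected = OPTICAL_UTILIZATION.get(link, 0.0) + added_load
    if projected > POLICY_THRESHOLD:
        print(f"[policy] {link}: projected {projected:.0%} -> request optical re-grooming")
        # an optical-layer lifecycle action would be triggered here
    else:
        print(f"[policy] {link}: projected {projected:.0%} -> no optical change")
        OPTICAL_UTILIZATION[link] = projected

on_electrical_service_request("fog-dc-3<->fog-dc-7", 0.05)
on_electrical_service_request("fog-dc-3<->fog-dc-7", 0.30)
```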

You can see from this that the ability to configure and reconfigure optical paths is valuable in context rather than directly, which means that obtaining service context from above is critical.  That context can be expressed in a NaaS service request if the optical player has a data/control connection, or in a pure control/management request if the optical player couples to services indirectly by policy support.  Without context, either the optical layer has to respond to conditions visible only to itself, or it has to rely on some external (and possibly human-driven) coupling.  Neither is optimal; both reduce optical value and potentially increase opex.

I think we’re at a crossroads here for metro optics, which means for optics overall since the metro space is going to be the bright spot in optical capex.  Take the path to the left and you focus only on the lowest generated cost per bit, and become truly plumbing.  Take the path to the right and you have to shoot tendrils upward to intersect with the trends that are driving fog dispersal of compute resources in the carrier cloud.  I’m planning to do an assessment of where pure-play optical types seem to be heading, and over time look at how the market and vendor plans seem to be meshing as we move forward.

Networking the Fog

I blogged yesterday about the economics-driven transformation of networking from the data center out.  My point was that as operators drive toward greater profits, they first concentrate on higher-layer content and cloud-hosted services, then concentrate these service assets in the metro areas, near the edge.  This creates a kind of metro-virtual-data-center structure that users connect to via access lines, and whose metro sites connect to each other with metro links.  Traditional hub-and-spoke networking is phased out.

What, in detail, is phased in?  In particular, how are services and service management practices impacted by this model?  If network transformation is going to be driven by economics, remember that saving on opex is the lowest apple available to operators.  How would the metro model impact service automation targeting opex?

Going back to the primary principle for fog-distributed data centers and DCI, you’d want to oversupply your links to eliminate the risk of reducing the size of your on-call resource pool because some resources would be accessible only through congested links.  This concept would have a pretty significant impact on resource management and service management practices.

One of the high-level impacts is on equalizing the resource pool.  Both SDN and NFV have to allocate bandwidth, and NFV also has to allocate hosting resources.  Allocation is easy when resources are readily available, because you can grab almost anything that’s preferred for some reason other than residual capacity and know that capacity won’t get in the way of your choice.  Want to avoid power-substation dependency?  No problem.  Want data center redundancy in hosting?  No problem.  NFV benefits from having everything in the fog look equivalent in terms of delay and packet loss, which makes allocating fog resources as easy as allocating those in a single central data center.

The next layer down, impact-wise, is in broad management practices.  There are two general models of management—one says that you commit resources to a service and manage services and resources in association; the other says that you commit a pool to a service, then presume that the pool will satisfy service needs as long as the resources in the pool are functional.  With virtualization, it’s far easier to do service management using the latter approach.  Not only is it unnecessary to build management bridges between resource status and service status, you can use resources that aren’t expected to be a part of the service at all without confusing the user.  Servers, for example, are not normal parts of firewalls, but they would be with NFV.

With fog-distributed resource pools, you’d want to do capacity planning to size your pools and network connections.  You’d then manage the resources in their own terms, with technology-specific element management.  You’d use analytics to build a picture of the overall traffic state, and compare this with your “capacity-planned” state. If the two were in correspondence, you’d assume services are fine.  If not, you’d have some orchestration-controlled response to move from the “real” state to one that’s at least acceptable.  Think “failure modes”.
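
A minimal sketch of that comparison loop follows; the links, utilization figures, and tolerance are invented for illustration.

```python
# Compare the analytics-derived traffic state against the capacity plan and
# trigger an orchestrated "failure mode" response only where they diverge.
CAPACITY_PLAN = {"fog-dc-1<->fog-dc-2": 0.40, "fog-dc-1<->fog-dc-3": 0.35}
MEASURED      = {"fog-dc-1<->fog-dc-2": 0.43, "fog-dc-1<->fog-dc-3": 0.71}
TOLERANCE = 0.15   # how far reality may drift from the plan before we react

def reconcile(plan: dict, measured: dict) -> None:
    for link, planned in plan.items():
        actual = measured.get(link, 0.0)
        if actual - planned > TOLERANCE:
            print(f"{link}: planned {planned:.0%}, seeing {actual:.0%} -> run failure-mode playbook")
        else:
            print(f"{link}: within plan ({actual:.0%})")

reconcile(CAPACITY_PLAN, MEASURED)
```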

If we expand from “management” to “management and orchestration” or “lifecycle management”, the impact could be even more profound.  For SDN, for example, you could use the analytics-derived picture of network traffic state to select optimum routes.  This is important because you want to have a holistic view of resources to pick a route, not find out about a problem with one when you’ve committed to it.  The SDN Controller, equipped with the right analytics model, can make good choices based on a database dip, not by running out and getting real-time status.

For NFV this is even more critical.  First, the “best” or “lowest-cost” place to site a VNF will depend not only on the hosting conditions but on the network status for the paths available to connect that VNF to the rest of the service.  The NFV ISG has wrestled with the notion of having a request for a resource made “conditionally”, meaning that MANO could ask for the status of something and deploy conditionally based on it.  That’s a significant issue if you’re talking about “real-time specific” status, because of course there’s a risk that a race condition would develop among resource requests.  If we assume overprovisioning, then the condition is so unlikely to arise that an exception could be treated as an error.
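
Here is a minimal sketch of placement under that assumption: pick a site from an analytics snapshot (a database dip), commit, and treat a capacity miss as an exception rather than re-negotiating in real time.  The site data and policy are invented for illustration; this is not the ETSI MANO interface.

```python
# Illustrative VNF placement under the overprovisioning assumption.
SITES = {
    "fog-dc-3": {"free_cores": 220, "path_delay_ms": 1.1},
    "fog-dc-7": {"free_cores": 140, "path_delay_ms": 0.6},
}

class PlacementError(Exception):
    pass

def place_vnf(needed_cores: int, max_delay_ms: float) -> str:
    candidates = [s for s, d in SITES.items() if d["path_delay_ms"] <= max_delay_ms]
    if not candidates:
        raise PlacementError("no site meets the delay bound")
    # Overprovisioning lets us pick on delay alone and assume capacity is there.
    site = min(candidates, key=lambda s: SITES[s]["path_delay_ms"])
    if SITES[site]["free_cores"] < needed_cores:
        raise PlacementError(f"{site} unexpectedly short of capacity")  # treat as an error
    SITES[site]["free_cores"] -= needed_cores
    return site

print(place_vnf(needed_cores=8, max_delay_ms=1.0))
```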

The evolutionary issues associated with SDN and NFV deployment are also mitigated.  Migrating a VNF from virtual CPE on the customer prem to a fog-edge cloud point is going to have a lot less impact on the service than migrating to a central cloud that might take a half-dozen hops to reach.  For SDN, having edge hosts means you have a close place to put controllers, which reduces the risk of having network state issues occur because of telemetry delays.

Even regulatory problems could be eased.  The general rule that regulators have followed is that paths inside higher-layer services are exempt from most regulations.  That includes CDN and cloud facilities, so our fog-net connectivity would be exempt from neutrality and sharing requirements.  Once you hop off the access network and enter the fog, the services you obtain from it could be connected to your logical access edge point with minimal regulatory interference.

The primary risk to this happy story is the difficulty in getting to the fog-distributed state, for which there’s both an economic dimension and a technical one.  On the economic side, the challenge is to manage the distributed deployment without breaking the bank on what operators call “first cost”, the investment needed just to get to a break-even run rate.  On the technical side, it’s how to transition to the new architecture without breaking the old one.  Let’s start with the technical side, because it impacts costs.

The logical way to evolve to a fog model is to deploy a small number of servers in every edge office (to reprise the operator remark, “Everywhere we have real estate.”) and connect these with metro fiber.  Many paths are already available, but you’d still have to run enough to ensure that no office was more than a single hop from any other.  Here’s where the real mission of hyperconverged data centers comes in; you need to be able to tuck away these early deployments because they won’t displace anything in the way of existing equipment.

The CORD model, standing for Central Office Re-architected as a Data Center, does a fairly nice job of presenting what this would look like, but the weakness is that it doesn’t focus on “new” revenue or benefits enough.  I think it’s most likely that the evolution to a fog-distributed metro model would come about through 5G deployment, because that will require a lot of fiber and it’s already largely budgeted by operators.  But even if we figure out a way to get fog deployed, we still have to make money on it, which means finding service and cloud-computing missions for the new fog.

All we need to do here is get started, I think.  The operational and agility benefits of the fog model could be realized pretty easily if we had the model in place.  But the hardest thing to prove is the benefit of a fully distributed infrastructure, because getting that full distribution represents a large cost.  The rewards would be considerable, though.  I expect that this might happen most easily in Europe, where demand density is high, or in Verizon’s territory here in the US where it’s high by US standards.  I also think we’ll probably see some progress on this next year.  If so, it will be the first real indicator that there will be a truly different network in our futures.

How Opportunity Will Change the Data Center, and the Global Network

What does the data center of the future look like?  Or the cloud of the future?  Are they related, and if so, how?  Google has built data centers and global networks in concert, to the point where arguably the two are faces of the same coin.  Amazon and Microsoft have built data centers to be the heart of the cloud.  You could say that Google had an outside-in or top-down vision, and that Amazon and Microsoft have a bottom-up view.  What’s best?

We started with computing focused on a few singular points, the habitat of the venerable mainframe computers.  This was the only way to make computing work at a time when a 16kb computer cost a hundred grand, without any I/O.  As computing got cheaper, we entered the distributed age.  Over time, the cost of maintaining all the distributed stuff created an economic break, and we had server consolidation that re-established big data centers.  The cloud today is built more from big data centers than from distributed computing.

That is going to change, though.  Anything that requires very fast response times, including some transactional applications and probably most of IoT, requires “short control loops”, meaning low propagation delay between the point of collection, the point of processing, and the point of action.  NFV, which distributes pieces of functionality in resource pools, could be hampered or even halted if traffic had to go far afield between hosting points and user connections—what’s called “hairpinning” in networking of old.  Cisco’s notion of “fog computing”, meaning distribution of computing to the very edge of a network, matches operators’ views that they’d end up with a mini-data-center “everywhere we have real estate.”

The more you distribute resources, the smaller every pool data center would be.  Push computing to the carrier-cloud limit of about 100 thousand new data centers hosting perhaps ten to forty million servers, and you have between one hundred and four hundred servers per data center, which isn’t an enormous amount.  Pull that same number of servers back to metro data centers and you multiply server count per data center by a hundred, which is big by anyone’s measure.  So does hyperconvergence become hype?

No, we just have to understand it better.  At the core of the data center of the future, no matter how big it is, there are common technical requirements.  You have racks of gear for space efficiency, and in fog computing you probably don’t have a lot of edge space to host in.  My model says that hyperconvergence is more important in the fog than in the classic cloud or data center.
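
Spelling out the arithmetic behind those per-site server counts (the metro-site count used for the comparison is my own illustrative assumption, not a figure from the model):

```python
# Back-of-envelope arithmetic for the edge-versus-metro comparison above.
edge_data_centers = 100_000
servers_low, servers_high = 10_000_000, 40_000_000
print(servers_low // edge_data_centers, "to", servers_high // edge_data_centers,
      "servers per edge data center")                  # 100 to 400

metro_data_centers = 1_000   # illustrative assumption: ~1,000 metro sites worldwide
print(servers_low // metro_data_centers, "to", servers_high // metro_data_centers,
      "servers per metro data center")                 # roughly a hundred times larger
```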

You have very fast I/O or storage busses to connect to big data repositories, whether they’re centralized or cluster-distributed a la Hadoop.  You also, in today’s world, have exceptionally fast network adapters and connections.  You don’t want response time to be limited by propagation delay, and you don’t want work queuing because you can’t get it in or out of a server.  This is why I think we can assume that we’ll have silicon photonics used increasingly in network adapters, and why I think we’ll also see the Google approach of hosted packet processors that handle basic data analysis in-flight.

You can’t increase network adapter speeds only to lose performance in aggregation, and that’s the first place where we see a difference in the architecture of a few big data centers versus many small ones.  You can handle your four hundred servers in fog data centers with traditional two-tier top-of-rack-and-master-switch models of connectivity.  Even the old rule that a trunk has to be ten times the capacity of a port will work, but as your need grows to mesh data centers to allow for application-component or virtual-function exchanges (“horizontal traffic”), you find that larger data centers need either much faster trunks than we have today, or a different model, like a fabric that provides any-to-any non-blocking connectivity.  I think that even in mini (or “fog”) data centers, we’ll see fabric technology ruling by 2020.
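
A quick worked example of the ten-times-port rule in a two-tier design; the port speed and server counts are illustrative, not figures from the post.

```python
# Worked example of the "trunk = 10x port" rule in a two-tier ToR design.
port_gbps = 25                  # per-server adapter speed (assumed)
servers_per_rack = 40           # assumed
racks = 10                      # a ~400-server fog data center

tor_uplink_gbps = 10 * port_gbps
print(f"ToR uplink per the 10x rule: {tor_uplink_gbps} Gbps")

# Horizontal (east-west) meshing is what breaks the model at larger scale:
# a non-blocking any-to-any fabric has to be sized against the total
# server-facing bandwidth, not just the uplink rule.
total_access_gbps = port_gbps * servers_per_rack * racks
print(f"Total server-facing bandwidth: {total_access_gbps} Gbps")
```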

The whole purpose of the ten-times-port-equals-trunk rule was to design so that you didn’t have to do a lot of complicated capacity management to ensure low latencies.  For both mini/fog and larger data centers, extending that rule to data center interconnect means generating a lot of bandwidth.  Again by 2020, the greatest capacity between any two points in the network won’t be found in the core, but in metro DCI.  In effect, DCI becomes the higher-tier switching in a fog computing deployment because your racks are now distributed.  But the mission of the switching remains the same—you have to support any-to-any, anywhere, and do so with minimal delay jitter.

Future applications will clearly be highly distributed, whether the resource pools are or not.  The distribution demands that inter-component latency is minimal lest QoE suffer, and again you don’t want to have complicated management processes deciding where to put stuff to avoid performance jitter.  You know that in the next hour something will come up (a sporting event, a star suddenly appearing in a restaurant, a traffic jam, a natural disaster) that will toss your plans overboard.  Overbuild is the rule.

Beyond fast-fiber paths and fabric switching, this quickly becomes the classic SDN mission.  You can stripe traffic between data centers (share multiple wavelengths by distributing packets across them, either with sequence indicators or by allowing them to pass each other because higher layers will reorder them), and eventually we may see wavelengths terminating directly on fabrics, using silicon photonics again.  More likely, though, we’ll have to control logical connectivity, and white-box forwarding is probably going to come along in earnest in the 2020 period to accommodate the explosion in the number of distributed data centers and servers.
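
A toy illustration of the striping idea with sequence indicators: packets are sprayed across several wavelengths and reordered at the far end.  The wavelength names and packets are placeholders.

```python
# Toy illustration of striping DCI traffic across wavelengths: packets are
# sprayed round-robin with sequence numbers and reordered at the receiver.
from itertools import cycle

def stripe(packets, wavelengths):
    """Tag each packet with a sequence number and spray across wavelengths."""
    lanes = {w: [] for w in wavelengths}
    spray = cycle(wavelengths)
    for seq, payload in enumerate(packets):
        lanes[next(spray)].append((seq, payload))
    return lanes

def reorder(lanes):
    """Receiver merges lanes and restores original order via sequence numbers."""
    merged = [pkt for lane in lanes.values() for pkt in lane]
    return [payload for _, payload in sorted(merged)]

lanes = stripe(["p0", "p1", "p2", "p3", "p4"], ["lambda-1", "lambda-2", "lambda-3"])
print(reorder(lanes))   # original order restored despite the parallel paths
```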

You can see that what’s happening here is more like the creation of enormous virtual data centers that map to the fog, and the connectivity that creates all of that is “inside” the fog.  It’s a different model of network-building, different from the old notion that computers sit on networks.  Now networks are inside the virtual computer, and I don’t think vendors have fully realized just what that is going to mean.

Do you want content?  Connect to a cache.  Do you want processing?  Connect to the fog.  Whatever you want is going to be somewhere close, in fact often at the inside edge of your access connection.  The whole model of networking as a vast concentrating web of devices goes away because everything is so close.  Gradually, points of service (caching and cloud) are going to concentrate in each metro where there’s an addressable market.  The metros will be connected to be sure, and there will still be open connectivity, but the real bucks will be made where there’s significant economic opportunity, which means good metros.  Short-haul optics, agile optics, white-box-supplemented optics are the way of the future.

This adds to the issues facing the legacy switch/router vendors, because you aren’t going to build classic networks at all.  The metro model begs for an overlay approach because you have an access network (which is essentially a tunnel) connected through a fog edge to a DCI network.  Where’s the IP or Ethernet?  Both are basically interfaces, which you can build using software instances and SD-WAN.

The key point here is that economic trends seem to be taking us along the same path as technology optimization.  Even if you don’t believe that tech will somehow eradicate networks as we knew them, economics sure as heck can do that.  And it will, starting with the data center.

Pathways to Network Capex Reduction: Do Any Lead to a Good Place?

Everyone is talking about carrier capex, in some sense or another.  If you’re a vendor you know that your buyers have been pinching pennies, and if you’re an operator you know that return on infrastructure is threatening your capital plans.  The Street has worried about it too, though more for vendors’ profit impact than for operators, since the traditional EBITDA model discounts capital spending by operators.  The problem with all these discussions is that they don’t really address the critical question of balancing the chances of capex reduction with the risks of transformation.

Capex reduction is the most commonly stated goal for new technologies too, so I wondered how much capex could actually be saved, and pushed some numbers through my market model to find out.  My goal was to test some possible capex-controlling approaches to see how they might do.  I focused on electrical-layer devices only, and looked at the US and international markets.

The most obvious way to control capex is to put price pressure on vendors.  Operators say that’s been done for years, and that they believe that Huawei in particular is a good way of pushing prices down.  In the US, where operators can’t buy from Huawei, equipment prices seem to be about 14% higher, and operators believe that Huawei could be hit for an additional 20% and other vendors by 25% or even 30%.

My model says that current network technology pricing could decline by about 17% outside the US and 24% in the US under optimum price pressure, which in the US would include releasing Huawei to sell here.  Enterprise data is a bit harder to get, but the model suggests that enterprises could reduce their capex by about 19% through pricing pressure, and this is fairly consistent across the market segments.

The next approach is to substitute “white-box” technology for traditional electrical-layer devices.  There is at present no direct white-box alternative for all the electrical-layer devices in use; generally, it’s available for the mid-range and below products but not for the terabit-level stuff.  Where you can get white-box gear and do a direct 1:1 substitution without a technology shift (to OpenFlow, for example), the savings potential is slightly better, meaning 19% internationally and 27% in the US.

We now enter the gray area, because the other options for capex reduction have to be considered with some skepticism.  SDN and NFV create a different network model, one that would have to be assumed would impact operations costs as well.  There are no really good ways of estimating the impact because we don’t have many vendors who even tell a complete story, and in any case I’m trying to look at capex here.  So for the stuff below, I’m making a critical assumption, which is that SDN and NFV opex impact is neglected.  Service automation creates most of the opex benefit in any case, and you don’t need SDN or NFV to employ that.

Moving up, the “hosted-router-switch model” would substitute software instances of L2/L3 devices for the physical appliances.  The software would be hosted on “commercial off the shelf” or COTS servers.  This isn’t NFV yet, it’s a static model of software switch/routing.  Like white-box products, this approach isn’t extensible to the very top end of the range of devices, but where it’s linked with SD-WAN edge technology to create overlay networks, the model says it saves 21% of capex internationally and 29% in the US.

True OpenFlow SDN is next.  This model combines white-box devices with central controller logic and generates a lower capex not only because the devices are cheaper but because the stuff is less complex.  However, server cost and complexity for the controller have to be considered.  If we stay with traditional network applications and forego cloud multi-tenancy for the moment, the model says that this approach would save 23% of capex internationally and just under 30% in the US (US data center switch prices are lower, even without considering Huawei).

NFV comes next.  Here, the challenge is that NFV even more than OpenFlow SDN should really be considered not box for box but on an alternative-architecture approach, and the benefits will vary considerably depending on the mission.  I’ve modeled only two for this blog.  The first is “vCPE” or “service chaining” where software features substitute for premises equipment.  In the model, a premises-device-hosted vCPE approach showed only a 15% capex savings, with little variation by market area.  Cloud-hosting the software features proved impossible to model because it’s impossible to assess the cloud cost without knowing the volume of deployed servers, and vCPE can’t drive mass deployment alone.

In the mobile application and CDN area, the model says that NFV could save almost 20% of mobile-or-CDN-specific capex internationally and 25% in the US.  This benefit, I think, understates the value of NFV in mobile and CDN applications; much of the benefit would come from creating more agile deployments that (without NFV) would cost more than today’s mobile infrastructure.  If you try to factor that in, the savings jump to 29%, but these numbers are softer because of the difficulty in establishing a model census for data center resources.

What’s the best case?  This is really a modeling stretch because of the host of assumptions you have to make, not the least of which is that because there’s no vendor offering this solution, you kind of have to make up a product set.  However, if you assume that white-box OpenFlow SDN is used as an overlay on agile optics, and that this segments networks to the point where hosted router/switch technology can be applied throughout, and that NFV then redefines metro and feature hosting, you get an international savings of 33% of capex and a US savings of 38%.

The important point here is that L2/L3 boxes don’t make up all of network capex; the actual percentage for L2/L3 varies by operator type, but a good benchmark is that access makes up about 40% of capex (and isn’t impacted by L2/L3 changes) and fiber makes up about 30%.  That means roughly 30% of total capex is impacted by these measures.  Since, globally, capex makes up about 18 cents of every revenue dollar on the average, optimum capex measures could save about 7 cents of each revenue dollar.

My model had previously shown that service automation techniques, applied to all services in an optimum way, would save 7.7 cents of every revenue dollar by 2020 without changing capital equipment spending or practices.  Projecting further (and obviously with less confidence!) into the future, the service automation approach appears to plateau at approximately 12 cents per revenue dollar, which could be achieved in 2024.

You can draw a lot of interesting stuff from this.  First, it would be relatively easy for vendors to secure greater savings for operators through service automation than the operators could get by “saving money” on equipment.  In fact, the potential savings is nearly double.  The ROI on this savings is considerably better too.  But the second point is also critical; capex reduction savings and opex savings are additive.  Operators could totally transform themselves by harnessing both, and by 2024 could hope to have saved a total of 22 cents of every revenue dollar, which means they’ve cut costs overall by more than a quarter.

The best ROI could be achieved by combining service automation with the hosted router/switch model and SD-WAN principles, which would generate significant capex savings and could easily be funded out of the opex-side savings.  The model suggests that, apart from an SDN virtual-wire underlay with an SD-WAN overlay, there aren’t many other capital-focused steps that could really pay a big dividend.

It’s essential to realize that these pathways to transformation aren’t cumulative in their savings, though.  The same model says that the most you could achieve in capex reduction is the 33%/38% of my “optimum” model, so the other approaches are valuable only to the extent that they lead to that happy place.  That shows you have to understand your transformation goal before you start chasing it.

It’s Time to Get to the Real 5G Issues and Architectures

Everyone probably knows the old story about trying to identify an elephant behind a curtain.  If you grab a leg, you think it’s a tree; grab the trunk and it’s a snake.  I’ve been reading the week’s stories on 5G, and it’s hard not to believe that we’re back to groping the elephant.  If so, we’re again at risk of under-supporting and misunderstanding a massive market shift, because we persist in focusing on the pieces and not the whole.

I’ve noted in earlier blogs that 5G is a lot like an arms race; operators know that a simple “4-is-better-than-3-so-5-is-better-yet” market value proposition is effective.  Customer acquisition and retention is the largest component of what I call “process opex”, and most operators don’t think they could survive as the only 5G holdout.  Given that, something that we’d be able to call a 5G evolution is inevitable.  What that means is a bit murkier, because a lot of the 5G revolution has to play out in the rest of network infrastructure, where uncertainties abound.  And completing or harmonizing 5G standards won’t address these issues at all.

Going back to the groping of elephants, just look at the wish list for 5G.  For many, this is about the radio network’s capacity; 5G could potentially deliver over a gigabit per second.  For others, it’s about the number of devices; 5G is supposed to support IoT by allowing many more devices to be connected.  Network equipment vendors think 5G is about getting rid of a lot of specialized mobile metro gear and going back to something more like vanilla IP routing.  Intel, according to a Light Reading piece, sees 5G as an opportunity to push its chips into wireless prominence, and operators think it might be an opening to introduce white-box and NFV technology.  Has there ever been such a laundry list of wishes?  Sure; with SDN, NFV, and IoT, and of course frame relay and ATM before them.  Is there a better chance of wish fulfillment with 5G?  Maybe.

Money is one big thing 5G has going for it—meaning budget money and not just vendor revenue hockey-stick aspirations.  The big difference in 5G is that market evolution in at least a titular sense seems inevitable for the customer acquisition and retention reasons I’ve already noted.  Unlike other stuff like SDN and NFV, which are still largely science projects seeking a business case, 5G has the best business case possible—do it or you lose all your business.

There are, however, worse things than losing your business by not investing in the future.  You could invest a boatload of cash and lose it anyway.  The big worry now for operators is whether they can make that 5G investment into more than table stakes, and here some old and new truths could help them plan.

The old truth is that for several decades, network profits have been (like, it’s said, “all politics” is) local.  About 80% of profitable traffic is carried 40 miles or less.  Not only that, efforts to improve the price/performance of the big traffic headache, OTT video, have turned the Internet from a global network in the traditional sense into a bunch of loosely connected metro enclaves.  Hope what you want is cached, because if it isn’t then you’re trusting it to the Great Beyond.

Cisco, who often sees the future correctly and then blurs the vision for opportunistic reasons, predicted another trend with what they called “fog computing”.  The notion was that if the cloud is dispersed, fog is more dispersed still, and that ultra-dispersal puts your stuff, your information, your content, closer to you.  You can see content delivery networks (CDNs) as a poster child for the cloud-to-fog evolution.  People are now talking about caching in the neighborhood or even in the home.
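
As a purely illustrative sketch of that cloud-to-fog caching hierarchy, the snippet below walks a request outward from the most local cache toward the origin; the tier names and cached items are assumptions invented for the example, not anyone’s product.

    # "Fog" caching sketch: look in the most local tier first and only escape
    # to the metro node or the origin (the "Great Beyond") on a miss.
    CACHE_TIERS = [
        ("home hub",       {"show-123"}),
        ("neighborhood",   {"show-123", "game-456"}),
        ("metro CDN node", {"show-123", "game-456", "movie-789"}),
    ]

    def fetch(content_id):
        for tier_name, cached_items in CACHE_TIERS:
            if content_id in cached_items:
                return f"{content_id} served from {tier_name}"
        return f"{content_id} fetched from origin over long-haul transit"

    print(fetch("game-456"))   # served from the neighborhood cache
    print(fetch("news-000"))   # falls all the way through to the origin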

All of this is driven by personalization, which is a nice way of saying that we want what we want when we want it.  Every whim demands satisfaction, and every whim is expressed through a little portable device that has more compute power than a data center did fifty years ago and is connected through a broadband pipe faster than 90% of business sites had only 20 years ago.  These people aren’t doing abstract research; they’re trying to tie IT and the Internet into their lives at this very moment.  Not only that, they’re moving around while they do it.  You can see this as a world-class driver of infrastructure agility, and it doesn’t matter what “G” they’re connected on.

NFV and SDN have been the technology targets in achieving that agility.  Virtualization, properly used, lets operators spin up instances of stuff that’s being over-used, change routes to accommodate traffic patterns created when all our personalization gets aggregated by some concert or game, and deal with the expected new demands of contextual services based on social connectivity and IoT.  You create a kind of blank slate of resources, and then write on it with virtualization.  To make that work, you have to host features as software rather than solidify them in appliances, and that’s a cloud mission.
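
A hedged sketch of the kind of agility decision described above follows; the class, the threshold, and the scale-out rule are hypothetical placeholders made up for illustration, not any orchestrator’s actual API.

    # Hypothetical agility loop: when aggregated demand (a concert, a game)
    # pushes a hosted feature past a load threshold, spin up another software
    # instance on the shared resource pool.
    SCALE_OUT_THRESHOLD = 0.80            # fraction of an instance's capacity in use

    class HostedFeature:
        def __init__(self, name, instances=1, demand=0.5):
            self.name, self.instances, self.demand = name, instances, demand

        def load_per_instance(self):
            return self.demand / self.instances

    def agility_pass(features):
        for f in features:
            if f.load_per_instance() > SCALE_OUT_THRESHOLD:
                f.instances += 1          # "write on the blank slate": add an instance
                print(f"scaled out {f.name} to {f.instances} instances")

    features = [HostedFeature("video-optimizer", demand=1.7),
                HostedFeature("hosted-firewall", demand=0.4)]
    agility_pass(features)                # video-optimizer scales out; the firewall does not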

If we presume a distribution of caches, if we then presume that cloud computing disperses along the same lines, and if we presume successful mobile virtualization via NFV adaptations to mobile services, we create a mesh of in-metro data centers.  It’s into this mesh that we have to fit 5G.  Yes, 5G has to fit in, fit into a profitable, responsive infrastructure plan.  It’s going to fund a lot of the plan, but it’s not the direct driver.  Personalization and metrofication are driving both agility and 5G.

To avoid getting G-centric, we have to ask where all this agility would lead without 5G, or we risk stranding ourselves in sub-optimal technology until 5G becomes pervasive.  We have today, in mobile NFV, perhaps a half-dozen truly useful VNFs.  We have in vCPE, presuming we started using 5G for the wireline last 500 meters, another half-dozen truly useful VNFs.  From there, we move to things that are less useful and less compelling, when we need to be finding things that are more compelling.  The only way to add value to this picture is to forget the limiting notion of a virtual network function and embrace the notion of a general cloud component.  Some of those components will be network functions, some application functions, some IoT or collaboration functions.

Whatever operators use to connect with their customers, whether it’s fiber or RF or tin cans and string, they will need something of a higher level to make money on.  That lesson should have been brought home by Cisco’s quarter and restructuring.  The value in networking is in the cloud, and getting the cloud and the network together, not separated into layers that end up being operator and OTT, is the mission of network planners (vendor and operator) in the near future.  5G isn’t going to change that picture, only the details of how it’s applied.  The stuff we’re worrying about in the 5G evolution is trivial, table stakes.  Standards will define it; all vendors will support it.  All it does is connect, at the end of the day.  We need to build the stuff that empowers, and by doing that get the “value” back into “value proposition.”

The metro network is going to turn into a cloud enclave, a place that’s connected with fiber and SDN and over which we build extemporaneous connection networks for various users and applications, including mobile, IoT, and the infrastructure elements of 5G.  There will be new issues associated with the dynamic adaptation of infrastructure to match what we expect 5G to deliver in the RF domain with C-RAN.  All these things will depend on standard approaches, but all of them have to be hosted in an agile metro platform domain, and deploying that domain and making it profitable for every access mechanism it supports is the challenge, not simply “5G”.

Getting back, then, to our elephant, the most important point about 5G is that it’s the endpoint (for now, of course) of a set of trends in demand and infrastructure evolution.  What’s important is first that set of trends and only second the details of how 5G accommodates them.  That’s because without the trends, 5G is just an expensive wireless transformation of supply, not an accommodation of demand.

One of the interesting things about all of this, presuming you agree with my points, is that it suggests a much broader 5G opportunity than the current radio-dominated vision would permit.  Few vendors have credibility with the radio network, but a lot of vendors would benefit mightily from the broader transformation vision I’m describing.  If that’s true, then one or more of them could step up and promote a change of mindset, and that could be revolutionary in terms of 5G benefits.  By semi-disconnecting those benefits from actual 5G, they could be realized in an evolutionary way even now, and by framing 5G in benefit terms instead of trying to find benefits to suit our conception of 5G, we have a better chance of actually making it happen.

Cisco’s Reorganization and Network Transformation: Do They Fit?

It is now being widely reported that Cisco is going to undertake a major restructuring.  The Street said Cisco would lay off between 12% and 20% of its global workforce, and that the move was related to shifting resources from hardware to software.  On its earnings call, Cisco confirmed the shift to software but said that only about 5,400 people would be laid off.  Since Cisco is the market leader in IP, which everyone seems to think is invincible as a technology, any significant restructuring would create considerable market shock and raise the inevitable “What now?” question.  Let’s try to answer it.

Cisco and all the other network equipment companies got to where they are by ignoring the future to feed the present.  The primary reason for this was the knee-jerk reaction of legislators and regulators to the NASDAQ bubble of the late ‘90s.  Sarbanes-Oxley (SOX) in 2002 made it very difficult for companies or the Street to push anything but the current quarter.  If you reward a particular class of behaviors, or more dramatically if you demand them by law, you get what you asked for.

Feeding the future means doing what’s needed to strengthen your position with your buyers in the long term.  Sometimes that means taking a hit in the near term, supporting product and technology transitions that could hurt your revenue for multiple quarters.  Because of SOX, in large part, that doesn’t happen much and certainly didn’t with Cisco, who is notoriously sales-driven.  As a result, they face an immediate threat in the operator space, and a delayed one in the enterprise market.

The root of Cisco’s operator problem, and the market’s, is that despite hope and hype, operator profit per bit carried has been declining for two decades, for as long as SOX has been operative, though I think the timing is coincidental.  Operators have been talking about the problem openly for a full decade, and proposing various projects to relieve the pressure under the name “transformation”.  Everything that’s been going on in the network equipment space for the last decade, including things like SDN and NFV, the cloud, mobility, and 4G and 5G, has been driven largely by the converging cost/price curves for bits.  Operators have pled for relief.  They’ve shifted increasingly to Huawei as a price leader, to the point where Huawei is probably the only equipment company doing well.  Still, everyone, Cisco included, has stuck their head in the sand.

If revenue per bit and cost per bit are converging to exclude profit and ROI, you spend less.  I’ve read transcripts of a decade’s worth of earnings calls, and I never saw any vendor acknowledge this obvious point or claim a strategic shift explicitly aimed at correcting it.  Where I will fault Cisco specifically is that it instead implicitly promoted the notion that as long as Internet traffic was exploding, operators were obliged to carry it, profitable or not.  Their Internet traffic index never called for any steps to address ROI as traffic increased, and they’ve backed regulatory positions that would favor settlement for Internet traffic and “non-neutral” concessions.

But that was then and this is now.  Even now, nobody including Cisco is stepping up to say that things can’t go on as they are, but restructuring sure implies that, and the promise of a software shift is explicit.  The question is what they mean by “software”.  You can do a software transformation of your network gear, shifting to a hosting and subscription model.  You can elevate your story to a higher, above-network layer too.  Cisco seems to be focusing on the former, which changes their business less but changes their revenue prospects more.

Shifting to software-based networking and cutting costs by cutting staff is an admission of change, but it doesn’t specifically make a critical point, which is that operators are going to put capex under stringent pressure if they can’t raise ROI.  Yes, a software-centric vision could be more appealing in a cost sense, but if it is then you’re making an important point in a too-subtle way.  We are shifting to software to allow operators to spend less.  Thus, we will make less.

The interesting thing is that while constraining capital spending seems to be the pathway of choice for operators, the fact is that it’s not one that they generally believe in.  Way back in 2013 a group of global Tier Ones told me that they didn’t believe that NFV or any other new technology would reduce capex by more than 25% (they now tell me less than 20% is likely) and that they could get that much by “beating up Huawei on price.”

That poses the central question raised by all of this, which is whether there is a way to transform networks that solves the profit-per-bit problem a lot better than just buying cheaper switches and routers.  If the answer is “Yes!” then Cisco could hope to restructure itself to fit the new demands of the transformed network space.  Cisco has the capability of supporting this kind of transformation, but they seem unwilling to get involved in “transformation education” for their buyers.  If it’s “No!” then it’s doubtful that operators could be induced to try any radical new infrastructure models, and network connection and transport will totally commoditize.

On the enterprise side there’s also a “driver” problem.  While customer revenues are the top-line engine for operator profits, worker productivity is the top-line engine for enterprises.  The problem is that we haven’t had an effective new driver for productivity enhancement since the late 1990s.  Historically, networking has followed the rest of IT in that spending was a combination of sustaining current infrastructure (“budget”) and supporting new and beneficial activities (“projects”).  Over the last decade, project spending as a percent of total spending has steadily declined, which means that most spending by enterprises just keeps the network lights on.  How do you, as a buyer, make that better?  By spending less.

It’s pretty clear that software would have to drive enterprise productivity changes, and Cisco has in the past aimed at productivity, but not very insightfully.  The answer, they suggested, was full-room videoconferencing.  The number of workers that such a technology would touch is minimal, neglecting the question of whether even for them there’d be a productivity benefit to justify the cost.  Work is populist by nature, and if you want to empower workers you have to empower a lot of them or nothing much happens to your bottom line.

Collaboration is an option for empowerment.  So is mobility, and contextual communications, and IoT.  However, you can’t empower people by just connecting them, which means the mission for productivity software has to go beyond simply generating traffic.  You have to get to the worker, what the worker needs in terms of information and relationships, and only then worry about connections.  This is a complicated undertaking, requiring a lot of planning and a lot of sales effort educating the buyer.  It’s not the sort of thing you do when regulations demand that you be rewarded only for your current quarter.

This is Cisco’s challenge now, and the industry’s challenge too.  For decades we’ve thought about the problem of the future as being “connected”.  More networking meant more value, because the network was what was holding things back.  That’s not the problem anymore, and Cisco’s moves prove it.  The problem is being relevant, being valuable in what we present through the connections.  You can’t outrun product creation with product delivery.

Software can define a network.  Software can automate manual tasks, and software can create information and contextual value.  But a software-defined network is still really defined by a connectivity mission, and manual-task automation eventually runs out of humans to displace.  Cisco’s software changes will help it only if they follow the network-to-value trail farther than they have in the past.  The rest of the industry will have to learn the same lesson.

They could have, and still can.  Cisco is unique in the industry in the breadth of its technology offerings.  The cloud, both the technological and “spiritual” fusion of network and IT, is something they probably get better than any other network vendor.  Certainly they’re better positioned to leverage it than any other network vendor, but they have been bound by their own constraints, and by the ubiquitous SOX.  They now have to break free.

Apply software to connection networks and you commoditize faster and get eaten by open source.  Cisco has to focus on the leading edge of the productivity and profit value proposition in both the enterprise and operator spaces, or they will never again be a high-margin company with command of their markets and their destiny.  There is absolutely nothing to stop them from doing all of this…but themselves.  We’ll see whether they get in their own way.

What Do the Network Operators Think About IoT?

In my blog yesterday, I talked about what operators thought about vCPE and how their views and the demographics of the service market could impact an NFV vision driven by service chaining.  I also had a chance to talk with the same operators about the Internet of Things, and I think their views in that area are at least as interesting.  Most interesting, perhaps, are some of the preconceptions they hold, or at least profess to hold.

The most important question about IoT, as it is for anything else in the tech space, is how the business case will be made.  What do operators think?  Dive through their views and it comes down to “LSODDI”, meaning “Let Some Other Dude Do It.”  You could say that some think IoT will simply happen, and that others think it will emerge as OTT-like players enter the space.  Virtually none of the operators think that they, meaning the network operators, will be the ones driving it.

This interesting view may explain why operators are so focused on the connection side of IoT.  If IoT is going to happen automatically, all they really need to do is sit back and let the 4/5G connection revenue roll in.  I don’t have to tell you, I’m sure, how illogical and destructive I think this particular view is, but in a sense it’s not surprising.

The most significant phenomenon in networking in modern times was the Internet, which created a residential data application that transformed residential communications, and eventually just about everything.  The Internet sprang up on its own, essentially, independent of any operator support and exploiting only the connection services then available.  Operators then developed Internet access to suit the emerging market.  Why not think that would work with IoT?

Good question, in many regards.  Could a Google or Microsoft or Amazon be the genesis of the new IoT wave?  All of these companies have IoT initiatives, but you can see that they’re mostly focused on supporting someone else’s IoT applications rather than developing their own IoT framework (at least for now).  So who do operators think will drive things, if they aren’t going to do it?

About a third think that the current OTT giants I just named will be the IoT giants.  Another third think that there will be some new startup wave, a kind of social-media-like explosion.  The final third admit they don’t know who will do it, but they’re confident somebody will and that they’ll know IoT when they see it happen.

So could this work?  It depends on a lot of undependable factors that all net out to the question of whether IoT is more like OTT, or more like the old Bell System.

The big challenge with IoT versus the original Internet is the lack of a single, clear core technology concept.  The consumer Internet was created not by a network at all, but by HTML.  There was an Internet before anything we’d recognize as one today, and it was consumerized by the notion of the web, which was based on HTML.  A simple HTML engine—a browser—made anyone capable of rendering pretty pages, and with that we were off.  What is the simple centerpiece of IoT?

The next challenge is a massive ecosystemic dependence.  For IoT to work you need a huge collection of sensors and controllers, networking to connect them, applications to exploit them, security and compliance, and partners—all of which means an enormously complicated business case.  Anyone who wanted to put a server behind a bank of modems could have created an Internet of One.  By nature, IoT is more complicated.

Complicated, but this may be the core of what operators think will happen.  Most operators with IoT aspirations are supporting IoT with developer communities, which is the classic strategy for building co-dependencies.  Sadly, most of these communities are focused around managing the connection side of IoT, which apparently is about all the operators can visualize.  Is that true?  Do they see the rest?

No, not much.  The majority of operators actually do see IoT as the media describes it.  There are a zillion sensors and controllers on the Internet, each with a spanking new 4/5G radio (when you ask who’s paying for this, they have only vague answers).  People write applications to exploit the data, sell the applications, and oh yes, pay the “thing-phone-bills.”

You’d think that if anyone believed that IoT was “Bell-like” it would be the telcos, but comments from operators suggest that they’ve become used to the notion that regulations aren’t a way of protecting investment but rather a way of putting it at risk.  Even those who think that another regulated monopoly is the way to go (they exist, but in a decided minority) don’t think there’s any chance that the government would take that kind of action today.

Absent regulated monopolies of some sort, I don’t think any party would invest in open sensors in the quantity needed to drive IoT forward.  There would be a way, though.  All you have to do is forget the notion of sensors on the Internet and move to the notion of “virtual” sensors online.  If we leveraged the vast number of sensor/controller devices out there, selectively posting their data under the proper level of social control, we could build a community of information large enough to drive an OTT-like opportunity.  But then we get rid of all those lovely-to-the-operator “thing-phone-bills”.
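
As a hedged illustration of the “virtual sensor” idea, the sketch below has existing devices publish readings through a broker that enforces an owner-set sharing policy; the class names and policy fields are assumptions invented for this example, not any real IoT platform.

    # "Virtual sensors": existing devices post readings through a broker that
    # applies the owner's sharing policy, instead of each sensor sitting
    # directly (and billably) on the Internet with its own radio.
    class VirtualSensor:
        def __init__(self, owner, kind, share_publicly=False):
            self.owner, self.kind, self.share_publicly = owner, kind, share_publicly
            self.latest = None

        def post(self, reading):
            self.latest = reading

    class SensorBroker:
        def __init__(self):
            self.sensors = []

        def register(self, sensor):
            self.sensors.append(sensor)

        def query(self, kind):
            # Only readings the owner has agreed to share are visible.
            return [s.latest for s in self.sensors
                    if s.kind == kind and s.share_publicly and s.latest is not None]

    broker = SensorBroker()
    traffic_cam = VirtualSensor("city", "traffic", share_publicly=True)
    home_thermo = VirtualSensor("resident", "temperature", share_publicly=False)
    broker.register(traffic_cam)
    broker.register(home_thermo)
    traffic_cam.post("congestion: moderate")
    home_thermo.post("21C")
    print(broker.query("traffic"))        # ['congestion: moderate']
    print(broker.query("temperature"))    # [] -- not shared by the owner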

In my view, operators are stuck in an IoT dilemma of their own making.  On the one hand, they want to collect new revenue from wireless attachment of “things”.  On the other hand, they don’t want to drive “thing” deployment even though, financially speaking, it would be easier for them to do that given their public-utility roots.

What this suggests to me is that despite the promise of IoT, realization is likely to be delayed because of a lack of a sensible way of moving forward, a way that deals realistically with the investments and risks that IoT will pose.  And when it does move, it will likely be driven by a non-operator player or players, and operators will then complain about being “disintermediated.”  This time, at least, they’ve had plenty of warning and a direct example from their past OTT experiences, and they’ll have no one to blame but themselves.