What, When, and How to Use T&M With SDN and NFV

There have been a number of online articles recently on the relationship between testing and monitoring (T&M) and NFV.  In the CloudNFV project I headed from 2013 to early 2014, I did some work defining these issues in more detail, and though the results were never incorporated in the PoC, they do offer some input on the issue.  The most interesting thing is that we have to review our entire testing, monitoring, and management practice set in light of virtual-network technologies.

One clear truth about T&M in a virtual age is that just as virtualization separates the abstract from the real in a clearly defined way, so it separates T&M tools and practices—or should.  There is “Service T&M” and “Resource T&M”, each focusing on a specific area and each with a specific mission.  The focus and mission differences dictate the technology implications of both.

Service T&M, since it operates at the service layer, lives above the abstract service/resource boundary, or in a sense on the “outside” of the intent models, where only guaranteed behaviors are visible and not how they’re fulfilled.  Obviously Resource T&M has to focus on the real resources, and so should always live “inside” the intent models.  Put another way, Service T&M tests and monitors virtual/functional things, and Resource T&M tests actual things.

A complication to both kinds of T&M is the boundary point.  It’s not unreasonable to think that Service T&M practices would eventually lead to an intent model boundary, and to penetrate it you’d have to know service-to-resource correlations.  Similarly, it’s not unreasonable to think that tracing a resource issue might lead you to wonder what service impacts it was having.  We’ll deal with boundary-crossings in both directions.

Let’s start with the easy stuff.  Resource T&M, being aimed at testing real paths and devices, is at one level at least similar to or identical with traditional T&M.  In my consideration of T&M for NFV, my conclusion was that the only area where NFV (and, in my view, SDN) differed in Resource T&M was the introduction of a new class of resources—the hosting sites or servers.  These are the foundation points for NFV, and they also represent a complexity in the boundary-condition problem.

If a service like VPN is viewed from the top (outside the intent model) it can easily be represented as a single virtual router.  That’s true with most network-connection services.  Similarly, a chain of virtual functions that make up vCPE could be visualized (you guessed it!) as “virtual CPE”.  However, the inside-the-model view would be (in both cases) a bunch of VMs or containers, likely linked by SDN paths.  The transition from service to resource is not obvious.

The solution I proposed in CloudNFV (and ExperiaSphere) was to have a “soft” boundary between the service and resource layer, where instead of having a bottom-level service intent model decompose into resources directly, it decomposes into virtual devices.  A resource architect can then formulate how they want to expose virtual devices for composition into services, and however that happens the relationship between the virtual devices and the real resources is set.  Orchestration and decomposition then take place on both sides of the boundary, but driven by different but related missions.
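To make the soft-boundary idea concrete, here’s a minimal sketch of nested intent models in which a bottom-level service model decomposes into a virtual device, and only the virtual device carries resource bindings.  All of the class and field names here are hypothetical illustrations, not any real orchestration API.

```python
# Hypothetical sketch: the service side decomposes into virtual devices;
# the virtual device (not the service model) holds the resource bindings.

class IntentModel:
    """An opaque node: exposes guaranteed behaviors, hides fulfillment."""
    def __init__(self, name, sla, children=None):
        self.name = name
        self.sla = sla                  # the guaranteed behaviors
        self.children = children or []  # what this model decomposes into

class VirtualDevice(IntentModel):
    """The boundary object: the service-side face of a resource commitment."""
    def __init__(self, name, sla):
        super().__init__(name, sla)
        self.bindings = []              # real resources committed below

# Service-side decomposition stops at the virtual device...
vpn = IntentModel("vpn-service", sla={"availability": 0.9999},
                  children=[VirtualDevice("virtual-router",
                                          sla={"latency_ms": 10})])

# ...and a resource architect separately maps it to VMs/containers and
# SDN paths, however they choose to expose it.
vpn.children[0].bindings = ["vm-cluster-17", "sdn-path-a12"]
```

The point of the structure is that orchestration above the virtual device never sees the bindings, and orchestration below never sees the service.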

Service T&M is more complicated, and the top reason for that is that it’s far from clear whether you actually want to do it.  Everyone used to crank the engine by hand to start a car, but we don’t put cranks on cars any longer.  The point is that many of the old-line ways of managing networks and dealing with problems don’t really relate to the way we have to do it in an age of orchestration and service automation.

Operators themselves were (and still are) very conflicted on the role of T&M, even at the Resource level.  The trend seems clear; they would rather manage resources against a capacity plan based on policies and perhaps on admission control—the mobile model, in short.  If that’s what you’re doing, then you probably don’t want to do specific resource testing much, if at all.  On the service side, the issue is a little more complicated because it’s largely driven by the fear of a major enterprise client calling and shouting “What do you mean all your lights are green?  My &#$* service is down!” when they have no way of “testing” or “measuring” just what’s going on.

After some deliberation, my own conclusion was that service-layer T&M should really consist of the following, in order of value and lack of complications:

  1. The ability to trace the binding to resources by tracing the state of the intent models from service down to the resources. Any inconsistencies in state could then be determined.
  2. The ability to employ packet inspection on a path/connection and gather statistics.
  3. The ability to “see” or tap a path/connection, subject to stringent governance.

In the model I believe to be workable, the central thesis is that it’s useless to test a virtual resource; you can’t send a real tech to fix one.  The goal, then, is to validate the bindings by looking at how each intent model in the service structure has been decomposed, and establishing whether the state of each is logical given the conditions reported below it.  For example, if a model representing a virtual router shows a fault, then the higher-level models that include it should also be examined for their fault state.  This lets you uncover problems that relate to the transfer of status in a virtual-network model before you start worrying about resource state.

At the resource level, the tracing of state can link the status of the intent model that represents the resource (in my example here, the virtual router model) to the status of the resources that were committed.  This could involve a dip into a status database, or a direct query of the management interface of the resource(s) involved.
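A hedged sketch of what that binding trace could look like: walk the intent-model tree from the service downward, flag any model whose reported state is inconsistent with what its children report, and let the bottom of the walk dip into a status store for the committed resources.  The structures and names are illustrative, not any product’s API.

```python
# Illustrative only: a status database is modeled as a simple dict.
from collections import namedtuple

Node = namedtuple("Node", ["name", "children"])

def trace_bindings(node, status):
    """Yield (model, expected, reported) wherever states disagree."""
    reported = status.get(node.name, "unknown")
    child_states = [status.get(c.name, "unknown") for c in node.children]
    # If anything below is faulted, this model shouldn't show "normal";
    # a mismatch points at broken status transfer, not a broken resource.
    expected = "fault" if "fault" in child_states else "normal"
    if node.children and reported != expected:
        yield (node.name, expected, reported)
    for child in node.children:
        yield from trace_bindings(child, status)

# Example: the committed resource is faulted, but the layer above shows green.
vrouter = Node("virtual-router", [Node("vm-cluster-17", [])])
service = Node("vpn-service", [vrouter])
status = {"vpn-service": "normal", "virtual-router": "normal",
          "vm-cluster-17": "fault"}
for name, expected, got in trace_bindings(service, status):
    print(f"{name}: expected {expected}, reported {got}")
```

The same walk could just as easily end in a direct query of a resource’s management interface instead of a database dip.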

The packet-inspection mission has interesting SDN/NFV connotations, for “inspection-as-a-service” in particular.  Inspection is a combination of creating a tap point and then linking it to a packet inspection engine, and both these could be done using virtual elements.  Any virtual switch/router could be made to expose a tap, and once there is one it’s not difficult to pipe the data to a place where you have static instances of inspectors, or to spawn a local inspector where you need it.  You could extend this to data injection without too much of a problem, but data injection in today’s network protocols has always been more problematic; it’s easy to do something that creates an instability.
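Here’s a rough illustration of how that tap-plus-inspector wiring could be composed from virtual elements.  The classes are stand-ins I’ve made up for illustration; real tap mechanisms (vSwitch port mirroring, for instance) differ in detail.

```python
# Hypothetical "inspection-as-a-service" wiring.

class VirtualSwitch:
    def create_tap(self, path_id):
        # Mirror the path's packets to a tap point near the switch.
        print(f"mirroring packets on {path_id}")
        return f"tap:{path_id}"

class InspectorPool:
    def attach(self, tap, spawn_local=False):
        # Spawning locally avoids hauling mirrored traffic across the metro;
        # piping to a static pool is simpler but adds transport.
        where = "locally spawned instance" if spawn_local else "static pool"
        print(f"{tap} piped to {where}; statistics feed started")

vswitch, pool = VirtualSwitch(), InspectorPool()
pool.attach(vswitch.create_tap("vpn-path-42"), spawn_local=True)
```

Note that this stops at observation; injection would reuse the same tap point but carries the instability risks described above.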

In my view, based on some real experience, I think that any discussions on SDN/NFV T&M that don’t focus first on binding are a waste of time.  In SDN, you need to know how a route is constructed from successive forwarding instructions in devices.  In NFV you need to know where something is hosted and what connection resources are used to connect in the pieces.  I believe that if service models are constructed logically, the models themselves will provide access to the information you need to trace functionality, and little more will be required.  Where it is required, then the packet-inspection-as-a-service approach can supplement binding tracing as needed.

If bindings are important, then service models and the nested-intent-model approach are critical.  The state of a service today is directly related to the state of the devices that make it up.  Whether the service of the future is built from virtual functions, virtual devices, or virtual routes, the same dependency will exist then.  The most logical way to determine the status of a given intent model is to look at the state of the things underneath, what it decomposes to, and continue that downward progression until you find the problem.  If you can’t do that, then you might as well throw darts at a network resource map and see where each one lands.

But let’s get to the top of the issue.  All of this, IMHO, demonstrates convincingly how virtualization technology changes network operations and management, or should, or perhaps must change them.  Nobody should doubt that virtualization is more complex than old-fashioned fixed assets.  That additional complexity will swamp any capex benefits in opex unless we’re very careful with service automation.  T&M as we know it is irreconcilable with service automation; you can’t remedy problems with low-touch opex practices by touching them.  However, those who want to practice traditional T&M on real resources can continue to do so, perhaps as a last resort.  What we should be worrying about is reconnecting the automated analysis of service/resource behavior at that “slash point”, the boundary that virtualization will always create.

Could it be that the biggest opportunity in SDN and NFV is one that’s related to doing the kind of deep-thinking-and-heavy-lifting stuff that’s essential in framing virtual technologies in profitable service practices?  If so, then I think that the modeling and binding approaches are the most critical things in any of these new technologies, and ironically the least developed.  I looked at the major vendors who can make a business case for broad deployment of NFV, for example, and found that of the seven who now exist, all have either incomplete modeling/binding approaches or have recently made changes in theirs.  Yet these should be the top of the heap in terms of software design and evolution.  We still have a lot of catching up to do.

Metro-Networking in the Fog and Optical Respect

If the future is as “foggy” (yes, I mean in the sense of being edge-distributed, not murky!) as I have suggested, and if networks have to be adapted to the mission of “fog-connection”, then how does this impact metro networking explicitly?  In particular, would this new mission create opportunities for the optical vendors, and could it help them “get more respect” as I’ve suggested they need to do?  Let’s look at the way things seem to be developing.

A rational fog-distributed model would be a set of CORD-modeled data centers linked with a dense optical grid that offered essentially inexhaustible capacity.  This model would coexist with current metro infrastructure, and logically you’d expect it to share metro connectivity with that infrastructure for the period of evolution to the fog model, which could last for five to seven years.

The connectivity mission for next-gen metro infrastructure, then, would consist of four services.  First, DCI-like service between fog data centers, for the purpose of connecting the resources there into uniform virtual pools.  These could be “service chains” or links between virtual functions or horizontal connections in carrier IoT or cloud computing.  Second, SDN trunk/path connection services created between fog data centers for the purpose of building connection services at the retail level.  Third, traffic from legacy sources in the same fog data centers, not part of the actual fog infrastructure, and finally wholesale or bulk fiber connectivity sold to other operators or large enterprises.

I’ve listed these in order of their importance to operator planners, most of whom see “carrier cloud” missions as their primary revenue goal.  Operators a decade ago had great hopes for cloud computing services, and even though most now believe these will be harder to promote and less profitable to offer, they still realize that some form of service-hosting revenue is the only credible way for them to boost their top lines.  So you can take the list, from the operator perspective, as a model for how to transition, or fund the transition, to metro-cloud deployments.

The most important point about these service targets is that the mixture of needs and the evolution they represent are the main reasons why optics could hope to escape plumbing status.  If the future is going to be fog-connection, then you can run glass right to the SDN- or service-layer devices and skip an independent optical layer.  At the least, this could reduce the feature demand on optical equipment to the point where low-price players would be the only winners.  If, for example, we had only service missions one and three above, we could satisfy the need for optical transport with minimalist add-drop wavelength multiplexing.

That one-and-three example also illustrates an important point for the optical space, because if you want to get more out of the market than carrying the bits that empower other vendors, you have to somehow climb up the food chain to the services.  Mission four, which is direct optical service fulfillment, isn’t the answer either.  The margins on these connections are very limited, and in many cases operators are actually reluctant to sell metro transport for fear it will empower competitors and generate regulatory pressures on pricing practices.  Mission three is just merging in legacy, so that means that elevating optics has to focus on missions one and two.

If, as I’ve suggested in an earlier blog, the metro area is destined to be an enormous virtual data center, then somehow the abstraction/virtualization/orchestration stuff has to be handled.  Missions one and two are network-as-a-service (NaaS) missions, with the services delivered to software residents of the fog data centers through available computer-network interfaces and conforming to the specific feature, service, and application missions those software elements support.  To see how that could work, we have to look at missions one and two in more detail.

It’s tempting to look at DCI as being a fat-pipe mission, and of course it could be.  That’s part of the “optics dilemma”; you always have the opportunity to lie down and accept being just a route others travel on.  If you go back to the CORD approach, though, you would see that DCI services should be abstracted into a virtual network-as-a-service model.  That abstraction would accept connectivity requests from applications, or from virtual-function deployment and lifecycle management.  Ideally, optical players would provide the full virtualization this represents, but at the very least they’d have to be able to fit under the SDN-service-layer technology that another vendor provided.  In short, if you present a pipe interface you’re a pipe.
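As a hedged illustration of the difference between a pipe interface and a NaaS abstraction, here’s the shape such a request interface might take.  The API is my assumption for the sketch, not any vendor’s product.

```python
# Hypothetical DCI NaaS facade: deployment software asks for connectivity,
# and grooming onto wavelengths happens inside the abstraction.

class DciNaas:
    def __init__(self):
        self.next_id = 0

    def request_connection(self, a, b, bandwidth_gbps):
        """Accept an abstract connectivity ask between two fog data centers."""
        self.next_id += 1
        print(f"connection {self.next_id}: {a} <-> {b} "
              f"at {bandwidth_gbps} Gbps")
        return self.next_id

# A VNF deployment process asks for connectivity, not for a wavelength:
naas = DciNaas()
naas.request_connection("fog-dc-03", "fog-dc-11", bandwidth_gbps=40)
```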

This is even more true for the second mission, which is to provide actual new-age service connectivity.  Think of this as a mission like connecting one point in a metro area, where a customer demarcation sits, to another where the main concentration of user access points is found, or where the HQ of the enterprise is located.  Here again, in modern terms, this is a NaaS mission.  Can an optical vendor be a virtual network on-ramp here?  Or at least, can they synchronize optical configuration and reconfiguration with the higher-layer service mission?

In theory, you could be a player in the virtual network game either at the data-plane/control-plane level or at the control-plane level alone.  A data-plane NaaS optical player would focus on creating an on-ramp to optical paths that could groom some set of electrical interfaces to optical wavelengths.  It would make no sense to do this without adding in control or management interfaces to connect and reconnect stuff.  A control-plane-only player would provide a means of connecting optical lifecycle management to service-layer lifecycle management.

The linkage with the service layer raises a question I’ve discussed in an earlier blog, which is the relationship between optical-layer provisioning and electrical-layer service changes.  I stand by my original views here, which are that there is no value to having a service request at the electrical layer modify optical configuration directly.  In fact, there’s a negative value, because you’d have to be concerned about whether a single user could then impact shared resources.  At the most, you could use an electrical-layer service request to trigger a policy-based reconsideration of optical capabilities.
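A small sketch of that “trigger, don’t touch” principle, with made-up stand-ins for the policy engine and optical layer: the service request never configures optics; it only generates an event the policy engine may act on.

```python
# Hypothetical stand-ins; thresholds and names are assumptions.

class PolicyDecision:
    def __init__(self, action, route=None):
        self.action, self.route = action, route

class PolicyEngine:
    def evaluate(self, demand_gbps, utilization):
        # Add capacity only when the *pool*, not one user, justifies it.
        if utilization + demand_gbps / 1000 > 0.8:
            return PolicyDecision("add_wavelength", route="metro-ring-1")
        return PolicyDecision("none")

def on_service_request(request, policy, optics):
    decision = policy.evaluate(request["bandwidth_gbps"],
                               optics["utilization"])
    if decision.action == "add_wavelength":
        print(f"policy adds a wavelength on {decision.route}")
    # Otherwise the optical layer is untouched; the request is satisfied
    # (or refused) entirely at the electrical layer.

on_service_request({"bandwidth_gbps": 100}, PolicyEngine(),
                   {"utilization": 0.75})
```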

You can see from this that the ability to configure and reconfigure optical paths is valuable in context rather than directly, which means that obtaining service context from above is critical.  That context can be expressed in a NaaS service request if the optical player has a data/control connection, or in a pure control/management request if the optical player couples to services indirectly by policy support.  Without context, either the optical layer has to respond to conditions visible only to itself, or it has to rely on some external (and possibly human-driven) coupling.  Neither is optimal; both reduce optical value and potentially increase opex.

I think we’re at a crossroads here for metro optics, which means for optics overall since the metro space is going to be the bright spot in optical capex.  Take the path to the left and you focus only on the lowest generated cost per bit, and become truly plumbing.  Take the path to the right and you have to shoot tendrils upward to intersect with the trends that are driving fog dispersal of compute resources in the carrier cloud.  I’m planning to do an assessment of where pure-play optical types seem to be heading, and over time look at how the market and vendor plans seem to be meshing as we move forward.

Networking the Fog

I blogged yesterday about the economics-driven transformation of networking from the data center out.  My point was that as operators drive toward greater profits, they first concentrate on higher-layer content and cloud-hosted services, then concentrate these service assets in the metro areas, near the edge.  This creates a kind of metro-virtual-data-center structure that users connect to via access lines, and that connect to each other with metro links.  Traditional hub-and-spoke networking is phased out.

What, in detail, is phased in?  In particular, how are services and service management practices impacted by this model?  If network transformation is going to be driven by economics, remember that saving on opex is the lowest-hanging fruit available to operators.  How would the metro model impact service automation targeting opex?

Going back to the primary principle for fog-distributed data centers and DCI, you’d want to oversupply your links to eliminate the risk of reducing the size of your on-call resource pool because some resources would be accessible only through congested links.  This concept would have a pretty significant impact on resource management and service management practices.

One of the high-level impacts is on equalizing the resource pool.  Both SDN and NFV have to allocate bandwidth and NFV also has to allocate hosting resources.  Allocation is easy when resources are readily available, and that means that you can grab almost anything that’s preferred for some reason other than residual capacity and know that capacity won’t interfere with your choice.  Want to avoid power-substation dependency?  No problem.  Want data center redundancy in hosting?  No problem.  NFV benefits from having everything in the fog look equivalent in terms of delay and packet loss, which makes allocating fog resources as easy as allocating those in a single central data center.

The next layer down, impact-wise, is in broad management practices.  There are two general models of management—one says that you commit resources to a service and manage services and resources in association, and the other says that you commit a pool to a service, then presume that the pool will satisfy service needs as long as the resources in the pool are functional.  With virtualization, it’s far easier to do service management using the latter approach.  Not only is it unnecessary to build management bridges between resource status and service status, you can use resources that aren’t expected to be a part of the service at all without confusing the user.  Servers, for example, are not normal parts of firewalls, but they would be with NFV.

With fog-distributed resource pools, you’d want to do capacity planning to size your pools and network connections.  You’d then manage the resources in their own terms, with technology-specific element management.  You’d use analytics to build a picture of the overall traffic state, and compare this with your “capacity-planned” state. If the two were in correspondence, you’d assume services are fine.  If not, you’d have some orchestration-controlled response to move from the “real” state to one that’s at least acceptable.  Think “failure modes”.
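In code terms, that comparison could be as simple as the sketch below, with assumed names and an arbitrary tolerance threshold; the point is that divergence invokes a pre-planned failure mode, not a per-service alarm.

```python
# Illustrative stand-ins; the orchestrator and tolerance are assumptions.

class Orchestrator:
    def run_failure_mode(self, mode, **ctx):
        print(f"failure mode {mode!r} invoked with {ctx}")

def reconcile(planned, observed, orchestrator, tolerance=0.15):
    """planned/observed: dicts of link -> utilization (0..1)."""
    for link, plan in planned.items():
        actual = observed.get(link, 0.0)
        if actual > plan * (1 + tolerance):
            # Resource-level corrective action chosen from pre-planned
            # failure modes, not a service-by-service diagnosis.
            orchestrator.run_failure_mode("rebalance", link=link)

reconcile({"metro-a-b": 0.5}, {"metro-a-b": 0.9}, Orchestrator())
```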

If we expand from “management” to “management and orchestration” or “lifecycle management”, the impact could be even more profound.  For SDN, for example, you could use the analytics-derived picture of network traffic state to select optimum routes.  This is important because you want to have a holistic view of resources to pick a route, not find out about a problem with one when you’ve committed to it.  The SDN Controller, equipped with the right analytics model, can make good choices based on a database dip, not by running out and getting real-time status.
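For example, a controller could run an ordinary shortest-path computation over the analytics snapshot instead of polling devices.  This toy version uses Dijkstra over an illustrative topology; the snapshot format is my assumption.

```python
import heapq

def best_route(snapshot, src, dst):
    """snapshot: {node: {neighbor: cost}} captured by analytics, not live."""
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, c in snapshot.get(node, {}).items():
            heapq.heappush(heap, (cost + c, nbr, path + [nbr]))
    return None  # no route in the snapshot

# The "database dip": route selection from stored state, no device polling.
print(best_route({"a": {"b": 1, "c": 5}, "b": {"c": 1}, "c": {}}, "a", "c"))
```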

For NFV this is even more critical.  First, the “best” or “lowest-cost” place to site a VNF will depend not only on the hosting conditions but on the network status for the paths available to connect that VNF to the rest of the service.  The NFV ISG has wrestled with the notion of having a request for a resource made “conditionally”, meaning that MANO could ask for the status of something and deploy conditionally based on it.  That’s a significant issue if you’re talking about “real-time specific” status, because of course there’s a risk that a race condition would develop among resource requests.  If we assume overprovisioning, then the condition is so unlikely to arise that an exception could be treated as an error.
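The overprovisioning point can be expressed as an optimistic placement pattern: deploy without a status dip and treat refusal as a rare exception.  The deployer here is a hypothetical stand-in, not MANO.

```python
# Hypothetical deployer; the point is the control flow, not the API.

class ResourceUnavailable(Exception):
    pass

class Deployer:
    def deploy(self, vnf, site):
        if site == "full-site":
            raise ResourceUnavailable(site)
        return f"{vnf}@{site}"

def place_vnf(deployer, vnf, site, fallback):
    try:
        return deployer.deploy(vnf, site)   # optimistic: no pre-check dip
    except ResourceUnavailable:
        # Under overbuild this is so rare it's an error path, not part of
        # normal placement logic, so no race-prone conditional request.
        return deployer.deploy(vnf, fallback)

print(place_vnf(Deployer(), "firewall-vnf", "full-site", "fog-dc-02"))
```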

The evolutionary issues associated with SDN and NFV deployment are also mitigated.  Migrating a VNF from virtual CPE on the customer prem to a fog-edge cloud point is going to have a lot less impact on the service than migrating to a central cloud that might take a half-dozen hops to reach.  For SDN, having edge hosts means you have a close place to put controllers, which reduces the risk of having network state issues occur because of telemetry delays.

Even regulatory problems could be eased.  The general rule that regulators have followed is that paths inside higher-layer services are exempt from most regulations.  That includes CDN and cloud facilities, so our fog-net connectivity would be exempt from neutrality and sharing requirements.  Once you hop off the access network and enter the fog, the services you obtain from it could be connected to your logical access edge point with minimal regulatory interference.

The primary risk to this happy story is the difficulty in getting to the fog-distributed state, for which there’s both an economic dimension and a technical one.  On the economic side, the challenge is to manage the distributed deployment without breaking the bank on what operators call “first cost”, the investment needed just to get to a break-even run rate.  On the technical side, it’s how to transition to the new architecture without breaking the old one.  Let’s start with the technical side, because it impacts costs.

The logical way to evolve to a fog model is to deploy a small number of servers in every edge office (to reprise the operator remark, “Everywhere we have real estate.”) and connect these with metro fiber.  Many paths are already available, but you’d still have to run enough to ensure that no office was more than a single hop from any other.  Here’s where the real mission of hyperconverged data centers comes in; you need to be able to tuck away these early deployments because they won’t displace anything in the way of existing equipment.

The CORD model, standing for Central Office Re-architected as a Data Center, does a fairly nice job of presenting what this would look like, but the weakness is that it doesn’t focus on “new” revenue or benefits enough.  I think it’s most likely that the evolution to a fog-distributed metro model would come about through 5G deployment, because that will require a lot of fiber and it’s already largely budgeted by operators.  But even if we figure out a way to get fog deployed, we still have to make money on it, which means finding service and cloud-computing missions for the new fog.

All we need to do here is get started, I think.  The operational and agility benefits of the fog model could be realized pretty easily if we had the model in place.  But the hardest thing to prove is the benefit of a fully distributed infrastructure, because getting that full distribution represents a large cost.  The rewards would be considerable, though.  I expect that this might happen most easily in Europe, where demand density is high, or in Verizon’s territory here in the US where it’s high by US standards.  I also think we’ll probably see some progress on this next year.  If so, it will be the first real indicator that there will be a truly different network in our futures.

How Opportunity Will Change the Data Center, and the Global Network

What does the data center of the future look like?  Or the cloud of the future?  Are they related, and if so, how?  Google has built data centers and global networks in concert, to the point where arguably the two are faces of the same coin.  Amazon and Microsoft have built data centers to be the heart of the cloud.  You could say that Google had an outside-in or top-down vision, and that Amazon and Microsoft have a bottom-up view.  What’s best?

We started with computing focused on a few singular points, the habitat of the venerable mainframe computers.  This was the only way to make computing work at a time when a computer with 16KB of memory cost a hundred grand, without any I/O.  As computing got cheaper, we entered the distributed age.  Over time, the cost of maintaining all the distributed stuff created an economic break, and we had server consolidation that re-established big data centers.  The cloud today is built more from big data centers than from distributed computing.

That is going to change, though.  Anything that requires very fast response times, including some transactional applications and probably most of IoT, requires “short control loops”, meaning low propagation delay between the point of collection, the point of processing, and the point of action.  NFV, which distributes pieces of functionality in resource pools, could be hampered or even halted if traffic had to go far afield between hosting points and user connections—what’s called “hairpinning” in networking of old.  Cisco’s notion of “fog computing”, meaning distribution of computing to the very edge of a network, matches operators’ views that they’d end up with a mini-data-center “everywhere we have real estate.”

The more you distribute resources, the smaller every pool data center would be.  Push computing to the carrier-cloud limit of about 100 thousand new data centers hosting perhaps ten to forty million servers, and you have between one hundred and four hundred servers per data center, which isn’t an enormous number.  Pull that same number of servers back to metro data centers and you multiply server count per data center by a hundred, which is big by anyone’s measure.  So does hyperconvergence become hype?
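The arithmetic behind those counts, under the stated assumptions:

```python
# ~100,000 edge data centers sharing 10-40 million servers.
edge_dcs = 100_000
for total_servers in (10_000_000, 40_000_000):
    print(total_servers // edge_dcs, "servers per edge data center")
# Prints 100 and 400.  Pull the same servers back to ~1,000 metro data
# centers and the per-site count multiplies by a hundred (10,000-40,000).
```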

No, we just have to understand it better.  At the core of the data center of the future, no matter how big it is, there are common technical requirements.  You have racks of gear for space efficiency, and in fog computing you probably don’t have a lot of edge-space to host in.  My model says that hyperconvergence is more important in the fog than in the classic cloud or data center.

You have very fast I/O or storage buses to connect to big data repositories, whether they’re centralized or cluster-distributed a la Hadoop.  You also, in today’s world, have exceptionally fast network adapters and connections.  You don’t want response time to be limited by propagation delay, and you don’t want work queuing because you can’t get it in or out of a server.  This is why I think we can assume that we’ll have silicon photonics used increasingly in network adapters, and why I think we’ll also see the Google approach of hosted packet processors that handle basic data analysis in-flight.

You can’t increase network adapter speeds only to lose performance in aggregation, and that’s the first place where we see a difference in the architecture of a few big data centers versus many small ones.  You can handle your four hundred servers in fog data centers with traditional two-tier top-of-rack-and-master-switch models of connectivity.  Even the old rule that a trunk has to be ten times the capacity of a port will work, but as your need grows to create data center meshing to allow for application-component or virtual-function exchanges (“horizontal traffic”), you find larger data centers need either much faster trunks than we have today, or a different model, like a fabric that provides any-to-any non-blocking connectivity.  I think that even in mini (or “fog”) data centers, we’ll see fabric technology ruling by 2020.

The whole purpose of the ten-times-port-equals-trunk rule was to design so that you didn’t have to do a lot of complicated capacity management to ensure low latencies.  For both mini/fog and larger data centers, extending that rule to data center interconnect means generating a lot of bandwidth.  Again by 2020, the greatest capacity between any two points in the network won’t be in the core, but in metro DCI.  In effect, DCI becomes the higher-tier switching in a fog computing deployment because your racks are now distributed.  But the mission of the switching remains the same—you have to support any-to-any, anywhere, and do so with minimal delay jitter.

Future applications will clearly be highly distributed, whether the resource pools are or not.  The distribution demands that inter-component latency is minimal lest QoE suffer, and again you don’t want to have complicated management processes deciding where to put stuff to avoid performance jitter.  You know that in the next hour something will come up (a sporting event, a star suddenly appearing in a restaurant, a traffic jam, a natural disaster) that will toss your plans overboard.  Overbuild is the rule.

Beyond fast-fiber paths and fabric switching, this quickly becomes the classic SDN mission.  You can stripe traffic between data centers (share multiple wavelengths by distributing packets across them, either with sequence indicators or by allowing them to pass each other because higher layers will reorder them), and eventually we may see wavelengths terminating directly on fabrics, using silicon photonics again.  We’ll probably have to control logical connectivity as well, and white-box forwarding is likely to come along in earnest in the 2020 period to accommodate the explosion in the number of distributed data centers and servers.
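Striping is mechanically simple; here’s a toy illustration of distributing packets across wavelengths with sequence indicators and reordering them at the far end.  It’s a sketch of the mechanism, not a transport implementation.

```python
import itertools

def stripe(packets, lambdas):
    """Round-robin packets across wavelengths, tagging each with a sequence."""
    assignments = {lam: [] for lam in range(lambdas)}
    for seq, pkt in enumerate(packets):
        assignments[seq % lambdas].append((seq, pkt))
    return assignments

def reassemble(assignments):
    """A higher layer restores order using the sequence indicators."""
    tagged = itertools.chain.from_iterable(assignments.values())
    return [pkt for _, pkt in sorted(tagged)]

striped = stripe(["p0", "p1", "p2", "p3", "p4"], lambdas=3)
assert reassemble(striped) == ["p0", "p1", "p2", "p3", "p4"]
```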

You can see that what’s happening here is more like the creation of enormous virtual data centers that map to the fog, and the connectivity that creates all of that is “inside” the fog.  It’s a different model of network-building, different from the old notion that computers sit on networks.  Now networks are inside the virtual computer, and I don’t think vendors have fully realized just what that is going to mean.

Do you want content?  Connect to a cache.  Do you want processing?  Connect to the fog.  Whatever you want is going to be somewhere close, in fact often at the inside edge of your access connection.  The whole model of networking as a vast concentrating web of devices goes away because everything is so close.  Gradually, points of service (caching and cloud) are going to concentrate in each metro where there’s an addressable market.  The metros will be connected to be sure, and there will still be open connectivity, but the real bucks will be made where there’s significant economic opportunity, which means good metros.  Short-haul optics, agile optics, white-box-supplemented optics are the way of the future.

This adds to the issues of the legacy switch/router vendors, because in this model you aren’t going to build classic networks at all.  The metro model begs for an overlay approach, because you have an access network (which is essentially a tunnel) connected through a fog edge to a DCI network.  Where’s the IP or Ethernet?  Both are basically interfaces, which you can build using software instances and SD-WAN.

The key point here is that economic trends seem to be taking us along the same path as technology optimization.  Even if you don’t believe that tech will somehow eradicate networks as we knew them, economics sure as heck can do that.  And it will, starting with the data center.

Pathways to Network Capex Reduction: Do Any Lead to a Good Place?

Everyone is talking about carrier capex, in some sense or another.  If you’re a vendor you know that your buyers have been pinching pennies, and if you’re an operator you know that return on infrastructure is threatening your capital plans.  The Street has worried about it too, though more for vendors’ profit impact than for operators, since the traditional EBITDA model discounts capital spending by operators.  The problem with all these discussions is that they don’t really address the critical question of balancing the chances of capex reduction with the risks of transformation.

Capex reduction is the most commonly stated goal for new technologies too, so I wondered how much capex could actually be saved, and pushed some numbers through my market model to find out.  My goal was to test some possible capex-controlling approaches to see how they might do.  I focused on electrical-layer devices only, and looked at the US and international markets.

The most obvious way to control capex is to put price pressure on vendors.  Operators say that’s been done for years, and that they believe that Huawei in particular is a good way of pushing prices down.  In the US, where operators can’t buy from Huawei, equipment prices seem to be about 14% higher, and operators believe that Huawei could be hit for an additional 20% and other vendors by 25% or even 30%.

My model says that current network technology pricing could decline by about 17% outside the US and 24% in the US under optimum price pressure, which in the US would include releasing Huawei to sell here.  Enterprise data is a bit harder to get, but the model suggests that enterprises could reduce their capex by about 19% through pricing pressure, and this is fairly consistent across the market segments.

The next approach is to substitute “white-box” technology for traditional electrical-layer devices.  There is at present no direct white-box alternative for all the electrical-layer devices in use; generally, it’s available for the mid-range and below products but not for the terabit-level stuff.  Where you can get white-box gear and do a direct 1:1 substitution without a technology shift (to OpenFlow, for example), the savings potential is slightly better, meaning 19% internationally and 27% in the US.

We now enter the gray area, because the other options for capex reduction have to be considered with some skepticism.  SDN and NFV create a different network model, one that would have to be assumed to impact operations costs as well.  There are no really good ways of estimating the impact because we don’t have many vendors who even tell a complete story, and in any case I’m trying to look at capex here.  So for the stuff below, I’m making a critical assumption, which is that SDN and NFV opex impact is neglected.  Service automation creates most of the opex benefit in any case, and you don’t need SDN or NFV to employ that.

Moving up, the “hosted-router-switch model” would substitute software instances of L2/L3 devices for the physical appliances.  The software would be hosted on “commercial off the shelf” or COTS servers.  This isn’t NFV yet, it’s a static model of software switch/routing.  Like white-box products, this approach isn’t extensible to the very top end of the range of devices, but where it’s linked with SD-WAN edge technology to create overlay networks, the model says it saves 21% of capex internationally and 29% in the US.

True OpenFlow SDN is next.  This model combines white-box devices and central controller logic, and generates a lower capex not only because the devices are cheaper but because the stuff is less complex.  However, server cost and complexity for the controller has to be considered.  If we stay with traditional network applications and forego cloud multi-tenancy for the moment, the model says that this approach would save 23% of capex internationally and just under 30% in the US (US data center switch prices are lower, even not considering Huawei).

NFV comes next.  Here, the challenge is that NFV, even more than OpenFlow SDN, should really be considered not box for box but on an alternative-architecture basis, and the benefits will vary considerably depending on the mission.  I’ve modeled only two for this blog.  The first is “vCPE” or “service chaining”, where software features substitute for premises equipment.  In the model, a premises-device-hosted vCPE approach showed only a 15% capex savings, with little variation by market area.  Cloud-hosting the software features proved impossible to model, because you can’t assess the cloud cost without knowing the volume of deployed servers, and vCPE can’t drive mass deployment alone.

In the mobile application and CDN area, the model says that NFV could save almost 20% of mobile-or-CDN-specific capex internationally and 25% in the US.  This benefit, I think, understates the value of NFV in mobile and CDN applications; much of the benefit would come from creating more agile deployments that (without NFV) would cost more than today’s mobile infrastructure.  If you try to factor that in, the savings jump to 29%, but these numbers are softer because of the difficulty in establishing a model census for data center resources.

What’s the best case?  This is really a modeling stretch because of the host of assumptions you have to make, not the least of which is that because there’s no vendor offering this solution, you kind of have to make up a product set.  However, if you assume that white-box OpenFlow SDN is used as an overlay on agile optics, and that this segments networks to the point where hosted router/switch technology can be applied throughout, and that NFV then redefines metro and feature hosting, you get an international savings of 33% of capex and a US savings of 38%.

The important point here is that L2/L3 boxes don’t make up all of network capex; the actual percentage for L2/L3 varies by operator type, but a good benchmark is that access makes up about 40% of capex (and isn’t impacted by L2/L3 changes) and fiber makes up about 30%.  That means roughly 30% of total capex is impacted by these measures.  Since, globally, capex makes up about 18 cents of every revenue dollar on average, optimum capex measures could save about 7 cents of each revenue dollar.

My model had previously shown that service automation techniques, applied to all services in an optimum way, would save 7.7 cents of every revenue dollar by 2020 without changing capital equipment spending or practices.  Projecting further (and obviously with less confidence!) into the future, the service automation approach appears to plateau at approximately 12 cents per revenue dollar, which could be achieved in 2024.

You can draw a lot of interesting stuff from this.  First, it would be relatively easy for vendors to secure greater savings for operators through service automation than the operators could get by “saving money” on equipment.  In fact, the potential savings is nearly double.  The ROI on this savings is considerably better too.  But the second point is also critical; capex reduction savings and opex savings are additive.  Operators could totally transform themselves by harnessing both, and by 2024 could hope to have saved a total of 22 cents of every revenue dollar, which means they’ve cut costs overall by more than a quarter.

The best ROI could be achieved by combining service automation with the hosted router-switch model and SD-WAN principles, which would generate a significant capex savings and could easily be paid forward through savings from the opex side.  The model suggests that short of SDN virtual wire underlay and SD-WAN overlay, there aren’t many other capital-focused steps that could really pay a big dividend.

It’s essential to realize that these pathways to transformation aren’t cumulative in savings, though.  The same model says that the most you could achieve in capex reduction would be from my “optimum” model that forecasts 33%/38%, so all the others are going to be optimal only to the extent that they lead to this happy place.  That shows that you have to understand your transformation goal before you start chasing it.

It’s Time to Get to the Real 5G Issues and Architectures

Everyone probably knows the old story about trying to identify an elephant behind a curtain.  If you grab a leg, you think it’s a tree; grab the trunk and it’s a snake.  I’ve been reading the stories on 5G for the week, and it’s hard not to believe that we’re back to groping the elephant here.  If so, we’re again at risk of under-supporting and misunderstanding a massive market shift, because we persist in focusing on its pieces and not the whole.

I’ve noted in earlier blogs that 5G is a lot like an arms race; operators know that a simple “4-is-better-than-3-so-5-is-better-yet” market value proposition is effective.  Customer acquisition and retention is the largest component of what I call “process opex”, and most operators don’t think they could survive as the only 5G holdout.  Given that, something that we’d be able to call a 5G evolution is inevitable.  What that means is a bit murkier, because a lot of the 5G revolution has to play out in the rest of network infrastructure, where uncertainties abound.  And completing or harmonizing 5G standards won’t address these issues at all.

Going back to the groping of elephants, just look at the wish list for 5G.  For many, this is about the difference in the radio network’s capacity; 5G could potentially deliver over a gigabit per second.  For others, it’s about the number of devices; 5G is supposed to support IoT by allowing many more devices to be connected.  Network equipment vendors think 5G is about getting rid of a lot of specialized mobile metro gear and going back to something more like vanilla IP routing.  Intel, according to a Light Reading piece, sees 5G as an opportunity to push their chips into wireless prominence, and operators think it might be an opening to introduce white-box and NFV technology.  Has there ever been such a laundry list of wishes?  Sure; with SDN, NFV, and IoT, and of course frame relay and ATM before them.  Is there a better chance of wish fulfillment with 5G?  Maybe.

Money is one big thing 5G has going for it—meaning budget money and not just vendor revenue hockey-stick aspirations.  The big difference in 5G is that market evolution in at least a titular sense seems inevitable for the customer acquisition and retention reasons I’ve already noted.  Unlike other stuff like SDN and NFV, which are still largely science projects seeking a business case, 5G has the best business case possible—do it or you lose all your business.

There are, however, worse things than losing your business by not investing in the future.  You could invest a boatload of cash and lose it anyway.  The big worry now for operators is whether they can make that 5G investment into more than table stakes, and here some old and new truths could help them plan.

The old truth is that for several decades, network profits have been (like, it’s said, “all politics” is) local.  About 80% of profitable traffic is carried 40 miles or less.  Not only that, efforts to improve the price/performance of the big traffic headache, OTT video, have resulted in turning the Internet from a global network in the traditional sense into a bunch of loosely connected metro enclaves.  Hope what you want is cached, because if it isn’t then you’re trusting it to the Great Beyond.

Cisco, who often sees the future correctly then blurs the vision for opportunistic reasons, predicted another trend with what they called “fog computing”.  The notion was that if a cloud is dispersed, fog is more so, and because of that ultra-dispersal it puts your stuff, your information, your content, closer to you.  You can see content delivery networks (CDNs) as a poster child for cloud-to-fog evolution.  People are now talking about caching in the neighborhood or even in the home.

All of this is driven by personalization, which is a nice way of saying that we want what we want when we want it.  Every whim demands satisfaction, and every whim is expressed through a little portable device that has more compute power than a data center did fifty years ago, and is connected through a broadband pipe faster than 90% of business sites had only 20 years ago.  These people aren’t doing abstract research, they’re trying to tie IT and the Internet into their lives at this very moment.  Not only that, they’re moving around while they do it.  You can see this as a world-class driver of infrastructure agility and it doesn’t matter what “G” they’re connected on.

NFV and SDN have been the technology targets in achieving that agility.  Virtualization, properly used, lets operators spin up instances of stuff that’s being over-used, change routes to accommodate traffic patterns created when all our personalization gets aggregated by some concert or game, and deal with the expected new demands of contextual services based on social connectivity and IoT.  You create a kind of blank slate of resources, and then write on it with virtualization.  To make that work, you have to host features as software rather than solidify them in appliances, and that’s a cloud mission.

If we presume a distribution of caches, if we then presume that cloud computing disperses the same way, and if we presume successful mobile virtualization via NFV adaptations to mobile services, we create a mesh of in-metro data centers.  It’s this into which we have to fit 5G.  Yes, 5G has to fit in, fit into a profitable, responsive infrastructure plan.  It’s going to fund a lot of the plan but it’s not the direct driver.  Personalization and metrofication are driving both agility and 5G.

To avoid getting G-centric, we have to ask where all this agility would lead without 5G or we risk stranding ourselves in sub-optimal technology till 5G becomes pervasive.  We have today, in mobile NFV, perhaps a half-dozen truly useful VNFs.  We have in vCPE, presuming we started using 5G to do wireline last-500-meters, another half-dozen truly useful VNFs.  From there, we move to things less useful and compelling when we need to be finding things that are better.  The only way to add value to this picture is to forget the limiting notion of a virtual network function and embrace the notion of a general cloud component.  Some of them will be network functions, some application functions, some IoT or collaboration functions.

Whatever operators use to connect with their customers, whether it’s fiber or RF or tin cans and strings, they will need to have something of a higher level to make money on.  That lesson should have been brought home by Cisco’s quarter and restructuring.  The value in networking is in the cloud, and getting the cloud and the network together and not separated into layers that will end up being operator and OTT is the mission of network planners (vendor and operator) in the near future.  5G isn’t going to change the picture, only change the details on how it’s applied.  The stuff we’re worrying about in the 5G evolution is trivial, table stakes.  Standards will define it; all vendors will support it.  All it does is connect, at the end of the day.  We need to build the stuff that empowers, and by doing that get the “value” back into “value proposition.”

The metro network is going to turn into a cloud enclave, a place that’s connected with fiber and SDN and over which we build extemporaneous connection networks for various users and applications, including mobile, IoT, and the infrastructure elements of 5G.  There will be new issues associated with the dynamic adaptation of infrastructure to match what we expect 5G to deliver in the RF domain with C-RAN.  All these things will depend on standard approaches, but all of them have to be hosted in an agile metro platform domain, and deploying that domain and making it profitable for every access mechanism it supports is the challenge, not simply “5G”.

Getting back, then, to our elephant, the most important point about 5G is that it’s the endpoint (for now, of course) of a set of trends in demand and infrastructure evolution.  What’s important is first that set of trends and only second the details of how 5G accommodates them.  That’s because without the trends, 5G is just an expensive wireless transformation of supply, not an accommodation of demand.

One of the interesting things about all of this, presuming you agree with my points, is that it would suggest a much broader 5G opportunity than the current radio-dominated vision would permit.  Few vendors have credibility with the radio network, but a lot of vendors would benefit mightily from the broader transformation vision I’m describing.  If that’s true, then one or more of them could step up and promote a change of mindset, and that could be revolutionary in terms of 5G benefits.  By semi-disconnecting those benefits from actual 5G, they could be realized in an evolutionary way even now, and by framing 5G in benefit terms instead of trying to find benefits to suit our conception of 5G, we have a better chance of actually making it happen.

Cisco’s Reorganization and Network Transformation: Do They Fit?

It is now being widely reported that Cisco is going to undertake a major restructuring.  The Street said Cisco would lay off between 12% and 20% of its global workforce, and that it’s related to shifting resources from hardware to software.  In their earnings call, Cisco confirmed the latter but said that only about 5,400 would be laid off.  Since Cisco is the market leader in IP, which everyone seems to think is invincible as a technology, any significant restructuring would create considerable market shock and raise the inevitable “What now?” question.  Let’s try to answer it.

Cisco and all the other network equipment companies got to where they are by ignoring the future to feed the present.  The primary reason for this was the knee-jerk reaction of legislators and regulators to the NASDAQ bubble of the late ‘90s.  Sarbanes-Oxley (SOX) in 2002 made it very difficult for companies or the Street to push anything but the current quarter.  If you reward a particular class of behaviors, or more dramatically if you demand them by law, you get what you asked for.

Feeding the future means doing what’s needed to strengthen your position with your buyers in the long term.  Sometimes that means taking a hit in the near term, supporting product and technology transitions that could hurt your revenue for multiple quarters.  Because of SOX, in large part, that doesn’t happen much and certainly didn’t with Cisco, who is notoriously sales-driven.  As a result, they face an immediate threat in the operator space, and a delayed one in the enterprise market.

The root of Cisco’s operator problem, and the market’s, is that despite hope and hype, operator profit per bit carried has been declining for two decades, for as long as SOX was operative, though the timing I think is coincidental.  Operators have been talking about the problem openly for a full decade, and proposing various projects to relieve the pressure under the name “transformation”.  Everything that’s been going on in the network equipment space for the last decade, including things like SDN and NFV, the cloud, mobility, 4G and 5G and so forth, has been driven largely by the converging cost/price curves for bits.  Operators have pled for relief.  They’ve shifted increasingly to Huawei as a price leader, to the point where Huawei is probably the only equipment company doing well.  Still everyone, Cisco included, stuck their head in the sand.

If revenue per bit and cost per bit are converging to exclude profit and ROI, you spend less.  I’ve read transcripts on a decade’s worth of earnings calls and I never saw any vendor point out this obvious point or claim a strategic shift that was explicitly aimed at correcting it.  Where I will fault Cisco specifically is that it instead implicitly promoted the notion that as long as Internet traffic was exploding, operators were obliged to carry it, profitable or not.  Their Internet index stuff never called for any steps to address ROI with the increases in traffic, and they’ve backed regulatory positions that would favor settlement for Internet traffic and “non-neutral” concessions.

But that was then and this is now.  Even now, nobody including Cisco is stepping up to say that things can’t go on as they are, but restructuring sure implies that, and the promise of a software shift is explicit.  The question is what they mean by “software”.  You can do a software transformation of your network gear, shifting to a hosting and subscription model.  You can elevate your story to a higher, above-network layer too.  Cisco seems to be focusing on the former, which changes their business less but changes their revenue prospects more.

Shifting to software-based networking and cutting costs by cutting staff is an admission of change, but it doesn’t specifically make a critical point, which is that operators are going to put capex under stringent pressure if they can’t raise ROI.  Yes, a software-centric vision could be more appealing in a cost sense, but if it is then you’re making an important point in a too-subtle way.  We are shifting to software to allow operators to spend less.  Thus, we will make less.

The interesting thing is that while constraining capital spending seems to be the pathway of choice for operators, the fact is that it’s not one that they generally believe in.  Way back in 2013 a group of global Tier Ones told me that they didn’t believe that NFV or any other new technology would reduce capex by more than 25% (they now tell me less than 20% is likely) and that they could get that much by “beating up Huawei on price.”

That poses the central question raised by all of this, which is whether there is a way to transform networks that solves the profit-per-bit problem a lot better than just buying cheaper switches and routers.  If the answer is “Yes!” then Cisco could hope to restructure itself to fit the new demands of the transformed network space.  Cisco has the capability of supporting this kind of transformation, but they seem unwilling to get involved in “transformation education” for their buyers.  If it’s “No!” then it’s doubtful that operators could be induced to try any radical new infrastructure models, and network connection and transport will totally commoditize.

On the enterprise side there’s also a “driver” problem.  While customer revenues are the top-line engine for operator profits, worker productivity is the top-line engine for enterprises.  The problem is that we haven’t had an effective new driver for productivity enhancement since the late 1990s.  Historically, networking has followed the rest of IT in that spending was a combination of sustaining current infrastructure (“budget”) and supporting new and beneficial activities (“projects”).  Over the last decade, project spending as a percent of total spending has steadily declined, which means that most spending by enterprises just keeps the network lights on.  How do you, as a buyer, make that better?  By spending less.

It’s pretty clear that software would have to drive enterprise productivity changes, and Cisco has in the past aimed at productivity, but not very insightfully.  The answer, they suggested, was full-room videoconferencing.  The number of workers that such a technology would touch is minimal, neglecting the question of whether even for them there’d be a productivity benefit to justify the cost.  Work is populist by nature, and if you want to empower workers you have to empower a lot of them or nothing much happens to your bottom line.

Collaboration is an option for empowerment.  So is mobility, and contextual communications, and IoT.  However, you can’t empower people by just connecting them, which means the mission for productivity software has to go beyond simply generating traffic.  You have to get to the worker, what the worker needs in terms of information and relationships, and only then worry about connections.  This is a complicated undertaking, requiring a lot of planning and a lot of sales effort educating the buyer.  It’s not the sort of thing you do when regulations demand that you be rewarded only for your current quarter.

This is Cisco’s challenge now, and the industry’s challenge too.  For decades we’ve thought about the problem of the future as being “connected”.  More networking means more value, because the network was what was holding things back.  That’s not the problem anymore, and Cisco’s moves prove it.  The problem is being relevant, being valuable in what we present through the connections.  You can’t outrun product creation with product delivery.

Software can define a network.  Software can automate manual tasks, and software can create information and contextual value.  But a software-defined network is still really defined by a connectivity mission, and manual-task automation eventually runs out of humans to displace.  Cisco’s software changes will help it only if they follow the network-to-value trail farther than they have in the past.  The rest of the industry will have to learn the same lesson.

They could, and still can.  Cisco is unique in the industry in the breadth of its technology offerings.  The cloud, both the technological and “spiritual” fusion of network and IT, is something they probably get better than any other network vendor.  Certainly they’re better positioned to leverage it than any other network vendor, but they have been bound by their own constraints, and by the ubiquitous SOX.  They now have to break free.

Apply software to connection networks and you commoditize faster and get eaten by open-source.  Cisco has to focus on the leading edge of the productivity and profit value proposition in both the enterprise and operator spaces, or they will never again be a high-margin company with command of the markets and their destiny.  There is absolutely nothing to stop them from doing all of this…but themselves.  We’ll see if they do.

What Do the Network Operators Think About IoT?

In my blog yesterday, I talked about what operators thought about vCPE and how their views and the demographics of the service market could impact an NFV vision driven by service chaining.  I also had a chance to talk with the same operators about the Internet of Things, and I think their views in that area are at least as interesting.  Most interesting, perhaps, are some of the preconceptions they hold, or at least profess to hold.

The most important question about IoT, as it is for anything else in the tech space, is how the business case will be made.  What do operators think?  Dive through their views and it comes down to “LSODDI”, meaning “Let Some Other Dude Do It.”  You could say that some think IoT will simply happen, and that others think it will emerge as OTT-like players enter the space.  Virtually none of the operators think that they, meaning the network operators, will be the ones driving it.

This interesting view may explain why operators are so focused on the connection side of IoT.  If IoT is going to happen automatically, all they really need to do is sit back and let the 4/5G connection revenue roll in.  I don’t have to tell you, I’m sure, how illogical and destructive I think this particular view is, but in a sense it’s not surprising.

The most significant phenomenon in networking in modern times was the Internet, which created a residential data application that transformed residential communications, and eventually just about everything else.  The Internet sprang up essentially on its own, independent of any operator support and exploiting only the connection services then available.  Operators then developed Internet access to suit the emerging market.  Why not think that would work with IoT?

Good question, in many regards.  Could a Google or Microsoft or Amazon be the genesis of the new IoT wave?  All of these companies have IoT initiatives, but you can see that they’re mostly focused on supporting someone else’s IoT applications rather than developing their own IoT framework (at least for now).  So who do operators think will drive things, if they aren’t going to do it?

About a third think that the current OTT giants I just named will be the IoT giants.  Another third think that there will be some new startup wave, a kind of social-media-like explosion.  The final third admit they don’t know who will do it, but they’re confident somebody will and that they’ll know IoT when they see it happen.

So could this work?  It depends on a lot of undependable factors that all net out to the question of whether IoT is more like OTT, or more like the old Bell System.

The big challenge with IoT versus the original Internet is the lack of a single, clear, technology core concept.  The Internet was created not by a network at all, but by HTML.  There was an Internet before anything we’d recognize as one, and it was consumerized by the notion of the web, which was based on HTML.  A simple HTML engine—a browser—makes anyone capable of rendering pretty pages, and with that we were off.  What is the simple centerpiece of IoT?

The next challenge is a massive ecosystemic dependence.  For IoT to work you need a huge collection of sensors and controllers, networking to connect them, applications to exploit them, security and compliance, and partners—all of which means an enormously complicated business case.  Anyone who wanted to put a server behind a bank of modems could have created an Internet of One.  By nature, IoT is more complicated.

Complicated, but this may be the core of what operators think will happen.  Most operators with IoT aspirations are supporting IoT with developer communities, which is the classic strategy for building co-dependencies.  Sadly, most of these communities are focused around managing the connection side of IoT, which apparently is about all the operators can visualize.  Is that true?  Do they see the rest?

No, not much.  The majority of operators actually do see IoT as the media describes it.  There are a zillion sensors and controllers on the Internet, each with a spanking new 4/5G radio (when you ask who’s paying for this, they have only vague answers).  People write applications to exploit the data, sell the applications, and oh yes, pay the “thing-phone-bills.”

You’d think that if anyone believed that IoT was “Bell-like” it would be the telcos, but comments from operators suggest that they’ve become used to the notion that regulations aren’t a way of protecting investment but rather a way of putting it at risk.  Even those who think that another regulated monopoly is the way to go (they exist, but in a decided minority) don’t think there’s any chance that the government would take that kind of action today.

Absent regulated monopolies of some sort, I don’t think any party would invest in open sensors in the quantity needed to drive IoT forward.  There would be a way, though.  All you have to do is forget the notion of sensors on the Internet and move to the notion of “virtual” sensors online.  If we leveraged the vast number of sensor/controller devices out there, selectively posting their data under the proper level of social control, we could build a community of information large enough to drive an OTT-like opportunity.  But then we get rid of all those lovely-to-the-operator “thing-phone-bills”.

In my view, operators are stuck in an IoT dilemma of their own making.  On the one hand, they want to collect new revenue from wireless attachment of “things”.  On the other hand, they don’t want to drive “thing” deployment even though, financially speaking, it would be easier for them to do that given their public-utility roots.

What this suggests to me is that despite the promise of IoT, realization is likely to be delayed because of a lack of a sensible way of moving forward, a way that deals realistically with the investments and risks that IoT will pose.  And when it does move, it will likely be driven by a non-operator player or players, and operators will then complain about being “disintermediated.”  This time, at least, they’ve had plenty of warning and a direct example from their past OTT experiences, and they’ll have no one to blame but themselves.

What’s the Latest on NFV Justification?

The big question about any new network technology has always been whether it could raise revenues.  Cost reduction is nice, but revenue augmentation is always a lot nicer—if it’s real.  With NFV, the focus of the revenue hopes of operators has been virtual CPE (vCPE) that could offer rapid augmentation of basic connection services with add-ons for security, monitoring and management, and so forth.  In fact, because vCPE is a pretty easy service to understand, it’s also been the focus of NFV PoCs and of many vendors.

Operators aren’t completely sold on the concept, though, and the reason is that many have been encountering early issues in vCPE-based services.  Some have told me that they are now of the view that they’ll have to reimagine their vCPE deployment strategy, in fact.  Nearly all are now “cautious” where many had been “hopeful” on the prospects of the service.  What’s wrong, and how can it be fixed?

The value proposition for vCPE seems simple.  You have a customer who needs some higher-layer services at their point of network connection.  Security-related services like firewall are the biggest draw, followed by VPN, application acceleration, and so forth.  These services would normally be provided by specialized appliances at each site, and with vCPE they can be hosted in the cloud and deployed on demand.

There are two presumptions associated with this model.  First, you have to presume that there are enough prospects for this kind of service.  Second, you have to assume that you can deliver the service to the customer at a lower cost point than the traditional appliance model could hit.  Operators report some issues in both areas.

Despite some vendors (Adtran, recently) advocating the use of NFV and cloud hosting as the platform for delivering services to SMBs, operators are indicating that interest in the vCPE model doesn’t seem to extend very far down-market.  To offer some numbers from the US market: we have about seven and a half million business sites here, which seems like a lot.  Less than half of them are network-connected in any way, and of that connected half, only about 150,000 are connected with anything other than consumer-like broadband Internet access.  Globally, there are a bit more than half a million such sites.

Business broadband using consumer-like technology (DSL, cable, FiOS, etc.) is almost always sold with an integral broadband access device that includes all the basic features.  My surveys have always shown that these devices are very low-cost in both opex and capex terms, whoever buys them.  In fact, the number of SMBs who reported incurring any significant cost in broadband attachment, including add-on elements like security, was insignificant.  What this means is that only about one site in fifty (150,000 of 7.5 million) is a prospect for any sort of vCPE service unless you step outside the realm of what’s currently being used.

Operators also tell me that even those one-in-fifty odds can be optimistic.  The problem is that most companies who have network connectivity today had it a year ago, or more.  The need for a firewall or a VPN isn’t new, and thus it has probably been accommodated using traditional devices.  Users who already have what they need are uninterested in new services whose cost is higher because those services include vCPE features.

You can see the pattern of vCPE success already: where “managed services” are the opportunity, vCPE is much more likely to succeed.  MSPs offer the service and the CPE together, and if you can reduce the cost (both capex and opex) of fulfilling managed services, you can earn more money.  However, most enterprises aren’t interested in managed services because they have professional network staffs available.  That squeezes the vCPE opportunity into the high end of the SMB space, perhaps into sites dominated by professionals, where you have valuable people who aren’t particularly tech-savvy.

The cost problem remains, or at least the impacts remain.  The presumption with NFV has been that “cloud hosting” of virtual functions would offer significant economies of scale.  That’s probably true, providing you have a cloud to host in.  Most operators not only have no such cloud in place, they don’t have the opportunity density for vCPE to justify building one.  You can’t backhaul VPN access very far without incurring too much cost and delay, so centralized hosting isn’t easy.

This has given rise to the idea that “vCPE” really means an agile premises box, a kind of mini-server into which you load features you call “VNFs”.  In point of fact, there’s no need to use any of the standard NFV features at all in such a configuration, unless you believe you can evolve to a cloud-hosted model for vCPE, or can drive toward true NFV another way and then reuse those facilities for your vCPE deployments.

There is absolutely no question that there is an opportunity for vCPE created with this agile device, but it’s not a server or cloud opportunity.  There’s no question that it could evolve into a cloud opportunity if you have some other means of driving cloud deployment to reach a reasonable level of data center distribution near the access edge.  The problem is that this all raises the question of what’s going to create that driver for cloud deployment en masse.

This suggests that we’re spending too much time focusing on vCPE, because it’s not going to be the thing that really drives NFV success.  For that, you have to look to an application of NFV that has a lot more financial legs.  As I’ve noted in the past, there are two pathways toward broad-based NFV deployment—mobile infrastructure and IoT.

Operators love the idea of mobility as a driver for NFV; every mobile operator I’ve talked with believes that NFV would improve their agility, reliability, and capability.  They’re most interested in NFV as part of a 5G rollout plan, since most of them believe that they’ll have to adopt 5G and will also have to transform their mobile core (IMS, EPC) infrastructure to accommodate and optimize 5G.  They also tell me that they are getting 5G-centric NFV positioning from at least two vendors, which means there’s already competition in this area.

The challenge with 5G as a driver is twofold.  First, you have to wait for it to happen; most believe it will roll out no earlier than the end of 2018.  That’s a long time for NFV vendors to wait.  Second, the 5G driver seems to favor the mobile-network-equipment vendors, which leaves everyone else pressing their noses against the candy-store window.

IoT looks more populist, more cloud-like, but the problem there is that operators are far from confident that they should take a cloud-hosting role in IoT.  They’d love to simply connect all the “things” using expensive wireless services and let the money roll in.  It’s not a totally stupid concept, if you presume that over time every home and office and factory with security, environment, and process sensors will end up being connected wirelessly.  If you think every sensor gets a 4/5G link, you’re imbibing something.

The problem with the IoT model is that unlike mobile infrastructure, operators don’t have anyone in the vendor space ringing the dinner bell for the revenue feeding.  There are a few IoT players (GE Digital with Predix) that actually have the right model, but operators don’t seem to be getting the full-court press on solutions they can apply to their own IoT services.

The net of all of this is that we are still groping for something that could create a large enough NFV service base to actually justify full-scope NFV as the standards people have conceptualized it.  We’re in the “NFV lite” era today, and vCPE isn’t going to get us out of that.  The winners in the NFV vendor space will be companies who figure out that the key to NFV success is justifying a cloud.

What’s the Connection Between “Open” and “Open-Source”?

The transformation of telecommunications and the networks that underlie the industry is coming to grips with what may seem a semantic barrier: what’s the relationship between “open” and “open-source”?  To many, this seems a frustratingly vague problem to be circling at this critical time, something like the classic arguments about how many angels can dance on the head of a pin or how big a UFO is.  There’s more to it than that; there’s a lot we need to address if we’re going to meet operator transformation goals.

Operators have, in modern times at least, demanded “open” network models.  Such a model allows operators to pick devices to fit in on the basis of price and features because the interfaces between the devices are standardized so that the boxes are at least largely interchangeable.  Nobody can lock in a buyer by selling one box which then won’t work properly unless all the boxes come from the same source.

I’ve been involved in international standards for a couple of decades, and it’s my view that networking standards have focused on the goal of openness primarily as a defense against lock-in.  It’s been a successful defense, too, because we have many multi-vendor networks today, and openness in this sense is a mandate of virtually every RFI and RFP issued by operators.

The problems with “open” arise when you move from the hardware domain to the software domain.  In software, the equivalent of an “interface” is the application program interface or API.  In the software world, you can build software by defining an API template (which, in Java, is called an “interface”) and then define multiple implementations of that same interface (Java does this by saying a class “implements” an interface).  On the surface this looks pretty much like the hardware domain, but it’s not as similar as it appears.
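To make the distinction concrete, here’s a minimal Java sketch of the interface/implementation split just described.  The names (PacketForwarder, SimpleForwarder) are hypothetical, not drawn from any real SDN or NFV codebase:

    // The API "template": in Java terms, an interface. Callers see only
    // this contract, not how it's fulfilled.
    public interface PacketForwarder {
        void forward(byte[] packet, int egressPort);
    }

    // One of potentially many classes that "implements" the interface.
    public class SimpleForwarder implements PacketForwarder {
        @Override
        public void forward(byte[] packet, int egressPort) {
            // Trivial stand-in for real forwarding logic.
            System.out.printf("forwarding %d bytes to port %d%n",
                    packet.length, egressPort);
        }
    }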

The big difference is what could be called mutability.  A device is an engineered symphony of hardware, firmware, and perhaps software, that might have taken years to perfect and that can be changed only with great difficulty, perhaps even only by replacement.  A software element is literally designed to allow facile changes to be made.  Static, versus dynamic.

One impact of this is elastic functionality.  A router, by definition, routes; you connect to a router for that purpose, right?  In software, a given “interface” defines what in my time was jokingly called the “gozintas” and “gozoutas”, meaning inputs and outputs.  The function performed is implicit, not explicit, just as it is with routers.  But if it’s easy to tweak the function, then it’s easy to create two different “implementations” of the same “interface” that don’t do the same thing at all.  Defining the interface, then, doesn’t make the implementations “open”.
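In the Java sketch above, that problem looks like this: a hypothetical second implementation that compiles against the identical PacketForwarder contract but doesn’t forward anything at all:

    // Same "gozintas and gozoutas" as SimpleForwarder, entirely
    // different function: this implementation records the packet and
    // silently drops it. The shared interface guaranteed nothing about
    // interchangeable behavior.
    public class AuditingSink implements PacketForwarder {
        @Override
        public void forward(byte[] packet, int egressPort) {
            System.out.printf("audited and dropped %d bytes bound for port %d%n",
                    packet.length, egressPort);
        }
    }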

On the positive side, mutability means that even where different “interfaces” to the same function are defined, it’s not difficult to convert one into another.  You simply write a little stub of code that takes the new request and formats it as the old function expects, then invokes it.  Imagine converting hardware interfaces that way!  What this means is that a lot of the things we had to focus on in standardizing hardware are unimportant in software standards, and some of the things we take for granted in hardware are critical in software.  We have to do software architecture to establish open software projects, not functional architecture or “interfaces”.
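In Java terms, that “little stub” is just an adapter.  Here’s a minimal sketch, again with hypothetical names (Flow, FlowForwarder), that maps a new-style request onto the PacketForwarder interface from earlier:

    // A tiny stand-in data type so the sketch is self-contained.
    record Flow(java.util.List<byte[]> packets, int egressPort) {}

    // The "new" interface somebody has standardized on.
    interface FlowForwarder {
        void forwardFlow(Flow flow);
    }

    // The stub: reformat the new request into what the old function
    // expects, then invoke it.
    class FlowToPacketAdapter implements FlowForwarder {
        private final PacketForwarder legacy;

        FlowToPacketAdapter(PacketForwarder legacy) {
            this.legacy = legacy;
        }

        @Override
        public void forwardFlow(Flow flow) {
            for (byte[] packet : flow.packets()) {
                legacy.forward(packet, flow.egressPort());
            }
        }
    }

A dozen lines of glue, versus respinning a line card; that’s the mutability difference in a nutshell.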

IMHO, both SDN and NFV have suffered delays because what were explicitly software projects were run as though they were hardware projects.  Open-source initiatives like OpenDaylight and OPNFV were kicked off to try to fix that problem, which is how open-source got into the mix.

Open-source is a process, not an architecture.  The software is authored and evolved through community action, with the requirement that (with some dodges and exceptions) the stuff be free and the source code made available.  There are many examples of strong open-source projects, and the concept goes back a very long way.

You could argue that the genesis of open-source was in the university and research communities, the same people who launched the Internet.  The big early winner in the space was the UNIX operating system, popularized by UC Berkeley in what became known as “BSD”, the Berkeley Software Distribution.  What made UNIX popular was that at the time it was emerging (the ‘80s), computer vendors were recognizing that you had to have a large software base to win in the market, and that only the largest vendors (actually, only IBM) had one large enough.  How could the other vendors compete, short of antitrust action?  Adopt UNIX.

The defining property of open-source is that the source code is available, not just the executable.  However, very few users of open-source software opt even to receive the source code, and fewer still do anything with it.  The importance of the openness is that nobody can hijack the code by distributing only executables.  Still, there have been many complaints through the years that vendors who can afford to put a lot of developers on an open-source project can effectively control its direction.

For network operators, open-source projects can solve a serious problem: that old bugaboo of anti-trust intervention.  Operators cannot form a group and work together to solve technical problems.  I was involved in an operator-dominated group, and one of the big Tier Ones came in one day and said their lawyers had told them to either pull out of the group, wrap the group into a larger industry initiative that wasn’t operator-dominated, or face anti-trust action.  The problem, of course, is that nobody buys carrier routers except carriers, and how can you preserve openness if you have to join forces with the people who are trying to create closed systems for their own proprietary benefit?

An open-source project is a way to build network technology in collaboration with competitors, without facing anti-trust charges.  However, it poses its own risks, and we can see those risks developing already.

Perhaps the zero-day risk to creating openness with open-source is that openness wasn’t even a goal.  Not all software is designed to support open substitution of components, or free connection with other implementations.  Even today, we lack a complete architecture for federation of SDN and NFV implementations for open-source projects to draw on.  Anyone who’s looked at the various open-source office suites knows that pulling a piece out of one and sticking it into another won’t likely work at all.  The truth is that for software to support open interchange and connection, you have to define the points where you expect that to happen up front.
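What “defining the points up front” looks like in practice is an explicit extension point.  Here’s a minimal Java sketch using the standard ServiceLoader mechanism; PacketForwarder is the hypothetical interface from the earlier sketches:

    import java.util.ServiceLoader;

    // An explicit, declared substitution point: any implementation
    // registered under META-INF/services on the classpath can be
    // dropped in without touching this code.
    public final class ForwarderRegistry {
        public static PacketForwarder load() {
            return ServiceLoader.load(PacketForwarder.class)
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException(
                            "no PacketForwarder implementation found"));
        }
    }

The mechanism itself doesn’t matter; the point is that substitution only works where the design declared, in advance, that substitution was expected.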

Then there’s the chef-and-broth issue.  Let’s start with a “software router” or “VR” concept in open-source.  The VR would consist of some number of components, each defined with an “interface” and an “implementation” for the interface.  A bunch of programmers from different companies would work cooperatively to build this.  Suppose they disagree?  Suppose the requirements for a VR aren’t exactly the same among all the collaborators?  Suppose some of the programmers work for vendors who’d like to see the whole project fail or get mired in never-ending strife?

Well, it’s open-source, so the operators could each go their own way, right?  Yes, but that creates parallel implementations (“forks”) that, if not managed, will end up breaking any common ground between them; we’d have every operator building their own virtual devices.  Even with coordination, how far can the forks diverge before there’s not much left that’s common among them?  Forking is still important to open-source, though, because it means that somebody with a good idea can create an alternative version of something which, if it’s broadly accepted, can become mainstream.  We see a fork evolving today in the OpenStack space, with Mirantis creating a fork of OpenStack that uses Kubernetes for lifecycle orchestration.

Operators have expressed concern over one byproduct of community development and forking, which is potentially endless change cycles, version problems, and instability.  I’ve run into OpenStack dependency issues myself: you need specific middleware to run a specific OpenStack version, which you need because of a specific feature, and then you find that the middleware version you need isn’t supported in the OS distro you need.  Central office switches used to have one version change every six months, and new features were advertised five or six versions in advance.  The casual release populism of open-source is a pretty sharp contrast.

The next issue is the “source of inspiration.”  We’ve derived the goals and broad architectures for things like SDN and NFV from standards, and we already know these were developed from the bottom up, focusing on interfaces and not really on architecture.  No matter how many open-source projects we have, they can shed the limitations of their inspirational standards only if they deliberately break from those standards.

The third issue is value.  Open-source is shared.  No for-profit company is likely to take a highly valuable, patentable, notion and contribute it freely.  If an entire VR is open-source, that would foreclose using any of those highly valuable notions.  Do operators want that?  Probably not.  If there are proprietary interfaces in the network today, can we link to them with open-source without violating license terms?  Probably not, since the code would reveal the interface specs.

The bottom line is that you cannot guarantee an effective, commercially useful, open system with open-source alone.  You can have an open-source project that’s started from the wrong place, is run the wrong way, and is never going to accomplish anything at all.  You can also have a great one.  Good open-source projects probably have a better chance of also being “open”, but only if openness in the telco sense was a goal.  If it wasn’t, then even open-source licensing can inhibit the mingling of proprietary elements, and that could impact utility at least during the transformation phase of networking, and perhaps forever.

“Open” versus “open-source” is the wrong way to look at things because this isn’t an either/or.  Open-source is not the total answer, nor is openness.  In a complex software system you need both.  Based on my own experience, you have to start an “open system” with an open design, an architecture for the software that creates useful points where component attachment and interchange are assured.  Whether the implementation of that design is done with open-source or not, you’ll end up with an open system.