Which of the Many NFVs are Important?

Sometimes words trip us up, and that’s particularly true in tech these days.  Say we start with a new term, like NFV.  It has a specific technical meaning, but we have an industry-wide tendency to overhype things in their early stages, and vendors jump onto the concept with offerings and announcements that really aren’t strongly related to the original.  Over time, it gets harder to know what’s actually going on with that original concept, and the question arises whether the “original concept” is really important, or whether we should accept the dynamic process of the market as the relevant source of the definition.  So it is with NFV.

The specific technical meaning of NFV would be “the implementation of virtual function hosting in conformance with the ETSI ISG recommendations.”  Under this narrow definition, there is relatively little deployment and, frankly, IMHO little opportunity, but there are some important forks in the road that are already established and will probably be very important.  In fact, NFV will leave a mark on networking forever through one or more of these forks in the evolutionary pathway, and that would be true even if there were never a single fully ETSI-compliant deployment.

One fork is the “agile CPE” fork.  This emerged from the NFV notion of “virtual CPE”, which initially targeted cloud-hosted instances of virtual functions to replace premises-based appliances.  You could frame a virtual premises device around any set of features that were useful, and change them at will.  Vendors quickly realized two things.  First, you needed to have something on premises just to terminate the service.  Second, there were sound financial reasons to think about the virtual hosting as an on-premises device, especially given that first point.

The result, which I’ll call “aCPE”, is a white-box appliance designed to accept loaded features.  These features may be “VNFs” in the ETSI sense, or they may simply be something that can be loaded easily, following perhaps a cloud or container model.  aCPE using a simple feature-loading concept could easily be a first step in deploying vCPE; you could upgrade to the ETSI spec as it matured.  aCPE-to-vCPE transformation would also prep you for using the cloud instead of that agile device, or using a hybrid of the two.
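
To make the aCPE idea a bit more concrete, here’s a minimal sketch in Python of a hypothetical white-box appliance that loads and unloads packaged features on demand, roughly following the container model mentioned above.  The class names, feature names, and registry paths are purely illustrative assumptions, not any vendor’s or ETSI’s actual interface.

```python
# A minimal sketch of the aCPE feature-loading idea, assuming a hypothetical
# white-box device that pulls packaged features (firewall, NAT, SD-WAN, etc.)
# much as a container host pulls images.  Names and fields are illustrative.

from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str          # e.g. "firewall", "nat", "sdwan"
    image: str         # where the packaged feature lives (registry URL or local path)
    running: bool = False

@dataclass
class AgileCPE:
    """White-box premises appliance that terminates the service and hosts features."""
    device_id: str
    features: dict = field(default_factory=dict)

    def load(self, feature: Feature) -> None:
        # On a real device this would pull and start a container or VM image;
        # here we just record the feature and mark it running.
        self.features[feature.name] = feature
        feature.running = True

    def unload(self, name: str) -> None:
        # Features can be swapped at will, which is the "agile" part.
        self.features.pop(name, None)

# Usage: start with a firewall, later add SD-WAN without replacing the box.
box = AgileCPE(device_id="cust-0042")
box.load(Feature("firewall", "registry.example.com/fw:1.2"))
box.load(Feature("sdwan", "registry.example.com/sdwan:0.9"))
```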

Most of what we call “NFV” today is a form of aCPE.  Since it would be fairly wasteful to demand all of the cloud-hosted elements, including “service chaining,” when all your functions are in the same physical box, most of it doesn’t conform to the ETSI spec.  I suspect that over time it might, provided that a base of fully portable VNFs emerges from the ongoing ETSI activity.

Another form is the “virtual device instance”, which I’ll call vDI.  Virtual functions are features that are presumably somewhat dynamic.  The vDI is a replacement for a network device, not an edge feature, and so it’s much more like a hosted instance of device software.  A “virtual router” is usually a vDI, because it’s taking the place of a physical one and once it’s instantiated it behaves pretty much like the physical equivalent.

Perhaps the most significant attribute of the vDI is that it’s multi-service and multi-tenant.  You don’t spin up a set of Internet routers for every Internet user; you share them.  The same is true of vDIs that replace the real routers.  There are massive differences between the NFV model of service-and-tenant-specific function instantiation and a multi-tenant vDI model, and you can’t apply service-specific processes to multi-tenant applications unless you really do want to build a separate Internet for every user.
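
Here’s a minimal sketch of that distinction, again in Python and with purely illustrative names: the NFV model deploys one function instance per tenant’s service, while the vDI model keeps one shared instance and attaches tenants to it, so lifecycle processes have to operate on the shared device rather than per customer.

```python
# Illustrative contrast between per-tenant function instantiation (NFV-style)
# and a shared multi-tenant instance (vDI-style).  Names are assumptions.

class PerTenantVNF:
    """NFV-style: one hosted instance deployed for each tenant's service."""
    def __init__(self, tenant: str):
        self.tenant = tenant

class MultiTenantVDI:
    """vDI-style: one shared 'cloud router' instance serving many tenants."""
    def __init__(self, name: str):
        self.name = name
        self.tenants = set()  # tenant identifiers sharing this one instance

    def attach(self, tenant: str) -> None:
        # Attaching a tenant is a configuration change on a shared device, not a
        # new deployment; detaching one tenant must not disturb the others.
        self.tenants.add(tenant)

    def detach(self, tenant: str) -> None:
        self.tenants.discard(tenant)

# NFV model: three tenants mean three deployments to manage.
vnfs = [PerTenantVNF(t) for t in ("acme", "globex", "initech")]

# vDI model: three tenants mean one deployment and three attachments.
router = MultiTenantVDI("edge-router-1")
for t in ("acme", "globex", "initech"):
    router.attach(t)
```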

Issues notwithstanding, we’re starting to see some activity in the vDI space, after a number of tentative beginnings.  Brocade’s Vyatta router software (since acquired by AT&T) was an early vDI, to some extent subsumed into the NFV story.  However, a vDI is really more like a “cloud router” than a virtual network function.  I believe that most of the IMS/EPC/5G software instances of functionality will really be vDIs because they’ll deploy in a static configuration.  In the 5G space, the NFV ISG seems to be taking up some multi-tenant issues in the context of its 5G network slicing work, but it’s too early to say what that work will call for.

The real driver of vDI, and also perhaps a major driver for NFV, depends on reshaping the role of the lower OSI layers.  The original OSI model (the Basic Reference Model for Open Systems Interconnection) was created in an age when networking largely depended on modems, and totally depended on error-prone electrical paths.  Not surprisingly, it built reliable communications on a succession of layers that dealt with their own specific issues (physical-link error control was Layer 2, for example).  In TCP/IP and the Internet, a different approach was taken, one that presumed a lot of lower-level disorder.  Neither really fits a world of fiber and virtual paths.

If we were to create, using agile optics and virtual paths/wires, a resilient and flexible lower-layer architecture, then most of the conditions that we now handle at Layers 2 and 3 would never arise.  We could segment services and tenants at those lower layers too, and that combination would allow us to downsize the Layer 2/3 functionality needed for a given user service, perhaps even for the Internet.  This would empower the vDI model.  Even a less-radical rethinking of VPN services as a combination of tunnel-based vCPE and network-resident routing instances could do the same, and if any of that happens we could have an opportunity explosion here.  If the applications were dynamic enough, we could even see an evolution from vDI to VNFs, and to NFV.

Another emerging version of NFV is what could be called “multi-layer orchestration”, which I’ll call “MLO” here.  NFV pioneered the notion of orchestrated software automation of the virtual function deployment lifecycle, which was essential if virtual network functions were to be manageable in the same way as physical network functions (PNFs).  However, putting VNFs on the same operational plane as PNFs doesn’t reduce opex, since the overall management and operations processes are the same (the VNFs mimic the PNFs in management).  The best you can hope for is to keep opex the same.  To improve opex, you have to automate more of the service lifecycle than just the VNFs.

MLO is an add-on to NFV’s orchestration, an elevation of the principle of NFV MANO to the broader mission of lifecycle orchestration for everything.  A number of operators, notably AT&T with ECOMP, have accepted the idea that higher-layer operations orchestration is necessary.  Some vendors have created an orchestration model that works both for VNFs and PNFs, and others have continued to offer only limited-scope orchestration, relying on MLO features from somewhere else.  Some OSS/BSS vendors have some MLO capability too.
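
As a rough illustration of what “orchestration for everything” means, here’s a hedged Python sketch in which one lifecycle orchestrator drives both virtual and physical elements through the same deploy-and-heal steps.  The element types and actions are my own assumptions for illustration, not the ETSI MANO or ECOMP object model.

```python
# A minimal sketch of the MLO idea as described here: one lifecycle model that
# orchestrates everything in a service, whether the element is virtual (VNF)
# or physical (PNF).  The classes and actions are illustrative assumptions.

from abc import ABC, abstractmethod

class ServiceElement(ABC):
    """Anything in the service lifecycle, virtual or physical."""
    @abstractmethod
    def deploy(self) -> None: ...
    @abstractmethod
    def heal(self) -> None: ...

class VNFElement(ServiceElement):
    def deploy(self) -> None:
        print("spin up a hosted function on carrier cloud capacity")
    def heal(self) -> None:
        print("redeploy the function on healthy capacity")

class PNFElement(ServiceElement):
    def deploy(self) -> None:
        print("push configuration to the physical box")
    def heal(self) -> None:
        print("reroute around the failed device and open a ticket")

class MultiLayerOrchestrator:
    """Drives the whole service lifecycle, not just the VNF portion."""
    def __init__(self, elements: list[ServiceElement]):
        self.elements = elements
    def deploy_service(self) -> None:
        for element in self.elements:
            element.deploy()
    def handle_fault(self) -> None:
        for element in self.elements:
            element.heal()

# One service, mixed virtual and physical elements, one lifecycle process.
service = MultiLayerOrchestrator([VNFElement(), PNFElement()])
service.deploy_service()
```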

NFV plus MLO can make a business case.  MLO, even without NFV, could also make a business case and in fact deliver a considerably better ROI in the first three or four years.  Add that to the fact that there is no standard set of MLO capabilities and no common mechanism to coordinate between MLO and NFV MANO, and you have what’s obviously fertile ground for confusion and misinformation.  You also have a classic “tail-or-dog” dilemma.

NFV drove our current conception of orchestration and lifecycle management, but it didn’t drive it far enough, which is why we need MLO.  It’s also true that NFV is really a pathway to carrier cloud, not an end in itself, and that carrier cloud is likely to follow the path of public cloud services.  That path leads to event-driven systems, functional programming, serverless computing, and other stuff that’s totally outside the realm of NFV as we know it.  So, does NFV have to evolve under MLO and carrier cloud pressure, or do we need to rethink NFV in those terms?  Is virtual function deployment and lifecycle management simply a subset of MLO?  This may be something that the disorderly market process I opened with ends up deciding.

Perhaps it’s not a bad thing that we end up with an “NFV” that doesn’t relate all that much to the original concept.  Certainly, it would be better to have that than to have something that stuck to its technical roots and didn’t generate any market-moving utility.  I can’t help but think that it would still be better to have a formal process create the outcome we’re looking for, though.  I’m pretty sure it would be quicker, and less expensive.  Maybe we need to think about that for future tech revolutions.

Open source seems to be the driver of a lot of successes, and perhaps of a lot of the good things circulating around the various NFV definitions.  Might we, as an industry, want to think about what kind of formal process is needed to launch and guide open-source initiatives?  Just saying.

Dissecting the Details of the Carrier Cloud Opportunity

The “carrier cloud” should be the real focus of transformation.  For operators, it epitomizes the shift from connectivity technology to hosting technology, and for vendors a change from network equipment to servers and software.  Things like SDN and NFV are not goals; they are important only insofar as they can be linked to a migration to the carrier cloud.

I’ve been working hard at modeling the carrier cloud trend, using a combination of operator data and my own “decision model” software.  I’ve shared some early work in this area with you in prior blogs, but I have more refined data now and more insights on what’s happening, so it’s time to take a deeper look.  What I propose to do is to set the stage in a couple paragraphs, and then look at carrier cloud evolution over the next couple decades.

There are six credible drivers of carrier cloud.  NFV, largely in the form of virtual CPE (vCPE), is the one with the longest history and the most overall market interest.  Virtualization of advertising and video features is the second.  The third is the mobile broadband evolution to 5G, and the fourth is network operators’ own cloud computing services.  Driver five is “contextual” services for consumers and workers, and six is the Internet of Things.  These drivers aren’t terribly important in terms of where they take carrier cloud; their overall strength determines market timing, and their relative strength determines the things most likely to be successful in any period.

We have a notion, set by “cloud computing”, that the carrier cloud is cloud computing owned by carriers.  The truth is more complicated, and fluid.  There is a strong drive toward hosting not in centralized cloud data centers, but in distributed edge points.  Edge cloud, so to speak, doesn’t depend on economies of scale but on the preservation of a short control loop.  If edge beats out central cloud, then we might see Linux boards in devices providing the majority of the hosting horsepower, and that would favor network vendors (who are already at the edge) over server vendors.

So the net is that the “where” of carrier cloud determines the shape of the actual technology, and there are two basic options: edge-centric and centralized.  Were we to find that something like operator cloud computing services was a near-term dominant driver, we could expect to see more centralized deployments, followed by a migration toward the edge.  If any other driver dominates, then the early impetus is for edge hosting, and that will tend to sustain an edge-centric structure over time.  Got it?  Let’s look, then, at “time” in intervals to see what my model says.

The period between now and 2019 is the most transformative, in the sense of determining both the architecture of deployment (edge versus center) and the opportunity for a driver and a vendor to dominate.  In this period, we could have expected to see NFV emerge as the dominant driver because it had a major head start, but NFV has failed to generate any significant carrier cloud momentum.  What is actually creating the biggest opportunity is the consumer video space.

Over half of all carrier cloud opportunity through 2019 is video-content related.  A decent piece of the early opportunity in this period is edge caching of content, aimed at improving video QoE while at the same time conserving metro mobile backhaul bandwidth.  Over time, we’ll see customization of video delivery, in the form of a shift from looking for a specific content element to socializing content among friends or simply communities of interest.  The mobile broadband connection to this creates the second-most-powerful driver, which is the mobile evolution driven by 5G issues; not yet a true 5G convergence but rather the impact of 5G requirements on 4G.  Nothing else really matters in this critical period, including NFV.

Between 2020 and 2022 we see a dramatic shift.  Instead of having a single opportunity driver that represents half of all opportunity for carrier cloud, we have four opportunities that each contribute over 15%, and none more than 24%.  Cloud services, contextual services, and IoT all roughly double their opportunity contributions.  Video content and 5G evolution continue to be strong, but in this period 5G opportunity is focused increasingly on actual deployment, and video content begins to merge into being a 5G application.

This is the period where the real foundation of carrier cloud is laid, because it’s where process hosting begins to separate from content hosting.  The vCPE activity drops off as a percentage of total opportunity, and at this stage there is still no broad systemic vision to drive NFV forward in other applications.  IoT and contextual services, the most direct drivers of process hosting, begin to gather their strength, and other drivers are nearing or at their high-water mark, excepting video content, which is increasingly driven by personalization, socialization, and customization of both ads and QoE.

The next period, 2023 to 2025, is the beginning of the real carrier cloud age, the phase in which process and application hosting clearly dominate carrier cloud opportunity.  Video-content services, now largely focused on socialization and personalization rather than caching, join contextual services and IoT in reaching 20% opportunity contributions, while most other areas begin to lose ground.  But it is in this period that NFV concepts finally mature and take hold, and most NFV applications beyond vCPE are fulfilled here.  This marks the end of the period in which NFV specifications and tools are significant drivers of opportunity.

The largest number of new carrier cloud data centers are added, according to my model, in 2025.  In part, this is because of the multiplicity of opportunity drivers and in part because the economy of scale in edge data centers allows for a reduction in service prices, which drives new applications into the carrier cloud.

The period to 2025 is the final period where my model retains any granular accuracy.  Beyond that, the model suggests that operator cloud services and IoT will emerge as the drivers of 30% or more of data center growth.  In this period, point-of-activity empowerment for workers and IoT control processes that focus attention on the edge are the primary drivers of process hosting.  Operators, with ample real estate at the network edge, will continue to expand their hosting there.  From 2025 onward, it appears that process hosting applications become more generalized, less tied to a specific driver, and that this is the period when we can truly say that carrier cloud is the infrastructure strategy driving operator capex.

Carrier cloud, in a real sense, wins because it throws money at the places where a good return is earned.  That’s hosted experiences and content.  What my model doesn’t address is the question of whether fundamental transformation of the network—the stuff that creates end-to-end connectivity—will happen, and how that might impact things like SDN and NFV as substitutes for network equipment as we know it.  I’m not yet able to model this question because we have no organized approach to such a transformation, and my buyer-decision model needs things buyers can actually decide on.

Broadly, it appears that you could reduce connection network capex by about 20% through the optimum use of SDN and NFV.  It also appears that you could reduce connection network opex by about 30%, beyond what you’d already achieved in process opex through service lifecycle automation.  However, getting to this state would demand a complete rethinking of how we use IP, the building of new layers, and the elimination of a lot of OSI-model-driven sacred cows.  All that could be done only if we assumed that we had a reason to displace existing technology before its lifecycle had ended, and that will be hard to find, particularly given that we’re going to ramp up on a 5G investment cycle before there’s much chance of promoting a new network model.

5G could transform access networking by creating RF tail connections on fiber runs to neighborhoods, and these same RF points could also support mobile 5G access.  A total convergence of mobile and fixed metro networks would likely follow, and since consumer video is the dominant traffic driver for both, we’d expect to see hosting (in the form of CDN) move outward.  We could see “the Internet” become more of a thin boundary between CDNs and access networks, which would of course open the door for total transformation of IP networks.

What that means in practical terms is that the best shot SDN and NFV have at transforming how we build connection networks is the period up to about 2022, when 5G investment peaks.  If nothing happens to how we do metro networking, then transformation of the connection architecture of future networks will come about naturally, because of the edge hosting of content and processes and the radical shift from traditional trunking to data center interconnect as the driver of bulk bandwidth deployment.  That means there is only a very short period in which to put connection-network transformation into the picture as a driver of change, but even were it fully validated, it lacks the opportunity value to significantly accelerate carrier cloud deployment.  All it could do is make network vendors more important influences on the process, particularly Cisco, which has both connection and server assets.

If you are not a network equipment vendor but instead a server provider, chip company, or software house, then the carrier cloud bet says forget the network and focus on hosted processes and the software and hardware dynamics associated with that.  For example, given that in the cloud the notion of functional computing (lambdas, microservices, etc.) and dynamic event-driven processes is already developing, these would seem likely to figure prominently in process hosting.  That’s a different cloud software architecture, with different hardware and even chip requirements.
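
For a sense of what that different architecture looks like in code, here’s a minimal, hedged sketch of the event-driven, “serverless” style: small stateless handlers triggered by events rather than long-running device-like software.  The event format and dispatcher are illustrative assumptions, not any specific cloud provider’s API.

```python
# A minimal sketch of event-driven, functional-style process hosting: a small
# stateless handler invoked per event, the way a serverless platform would run
# it.  The event shape and the dispatcher are illustrative assumptions.

import json

def handle_sensor_event(event: dict) -> dict:
    """A stateless process-hosting function: no local state survives the call."""
    reading = event.get("reading", 0)
    action = "raise_alarm" if reading > 80 else "ignore"
    return {"device": event.get("device"), "action": action}

def dispatch(raw_event: str) -> str:
    # On a real functional platform, the framework would route the event,
    # scale handler instances on demand, and bill per invocation.
    result = handle_sensor_event(json.loads(raw_event))
    return json.dumps(result)

# Usage: one event in, one decision out, nothing left running afterward.
print(dispatch('{"device": "meter-17", "reading": 91}'))
```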

IT players might also want to consider that there is zero chance that process hosting in the carrier cloud wouldn’t impact enterprise IT.  If edge-distributed computing power is good, then either enterprises have to accept cloud providers owning the edge, or they have to think about how to host processes close to the edge themselves (think Amazon’s Greengrass).  The latter would mean rethinking “server farms” as perhaps “server fog”, and that would impact server design (more CPU and less I/O) and also the software, from operating system and middleware to the virtualization/cloud stacks.  And of course, every software house would have to recast their products in process-agile terms.

It’s also fair to say that discussions of SDN and NFV would be more relevant if they addressed the needs of carrier cloud and the future revenues it generates, rather than focusing on current services that have no clear revenue future.  NFV would also benefit from thinking about the way that functional programming, function hosting, and “serverless” computing could and should be exploited.  It makes little sense to look for NFV insight in the compute model that’s passing away, rather than the one that will shape the future.

Vendors have their own futures to look to.  Carrier cloud will create the largest single new application for servers, the largest repository for the hosting of the experiences that drive Internet growth.  It’s hard to see how that critical issue I opened with—central or edge—won’t determine the natural winners in the race to supply the new compute technology, and vendors need to be thinking ahead to what the carrier cloud issue will create in terms of opportunity (and risk) for them.  I think Cisco’s recent announcements of a shift in strategy show that they, at least, know that things are changing.  They (and others) have to prove now that they know where those changes will take the industry.

In all, I think carrier cloud reality proves a broader reality, which is that we tend to focus on easy and familiar pathways to success rather than on plausible pathways.  There is an enormous amount of money at stake here, and enormous power in the industry.  It will be interesting to see which (if any) vendors really step up to take control of the opportunity.