Finding the Right Path to Virtual Devices

One of the early points of interest for NFV was “virtual CPE”, meaning the use of cloud hosting of features that would normally be included in a device at the customer edge of services.  I’ve blogged a number of times on the question of whether this was a sensible approach, concluding that it isn’t.  The real world may agree, because most “vCPE” as it’s known is really not hosted in the cloud at all.  Instead it involves the placement of features in an agile edge device.  Is this a viable approach, and if so, how important might it be?

Agile or “universal” CPE (uCPE) is really a white-box appliance that’s designed (at least in theory) to be deployed and managed using NFV features.  Virtual network functions (VNFs) are loaded into the uCPE as needed, and in theory (again) you could supplement uCPE features with cloud-hosted features.  One benefit of the uCPE concept is that features could be moved between the uCPE and the cloud, in fact.

What we have here are two possible justifications for the uCPE/vCPE concept.  One is that we should consider this a white-box approach to service edge devices, and the other that we’d consider it an adjunct to carrier-cloud-hosted NFV.  If either of these approaches presents enough value, we could expect the uCPE/vCPE concept to fly, and if neither does, we’ll need to fix some problems to get the whole notion off the ground.

White-box appliances are obviously a concept with merit, as far as lowering costs is concerned.  However, they depend on someone creating features to stuff in them, and on the combined price of the features and the uCPE being no greater than, and hopefully at least 20% less than, the price of traditional fixed appliances.  According to operators I’ve talked with, that goal hasn’t been easy to achieve.
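
The pricing test above is simple arithmetic, and a minimal sketch makes the two thresholds explicit.  The function name and the sample prices are hypothetical, chosen only for illustration; the 20% target is the one stated above.

```python
def ucpe_price_check(ucpe_hw, feature_licenses, appliance_price, target_saving=0.20):
    """Compare a white-box bundle against a traditional fixed appliance.

    All inputs are prices in the same currency.
    Returns (saving, viable, meets_target), where saving is a fraction
    of the appliance price (negative if the bundle costs more).
    """
    total = ucpe_hw + sum(feature_licenses)
    saving = (appliance_price - total) / appliance_price
    viable = total <= appliance_price          # "no greater than"
    meets_target = saving >= target_saving     # "at least 20% less than"
    return saving, viable, meets_target


# Hypothetical numbers, for illustration only.
saving, viable, meets_target = ucpe_price_check(400, [150, 150], 1000)
print(f"saving={saving:.0%} viable={viable} meets_target={meets_target}")
```

The point of separating the two flags is that a bundle can clear the first bar (no more expensive) and still miss the second, which is where operators say feature licensing tends to push it.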

The biggest problems operators cite for the white-box model are 1) the high cost that feature vendors want for licensing the features to be loaded into the uCPE, and 2) the difficulties in onboarding the features.  It’s likely the two issues are at least somewhat related; if feature vendors have to customize features for a uCPE model, they want a return.  If in fact there are no “uCPE models” per se, meaning that there are no architecture or embedded operating system standards for uCPE, then the problem is magnified significantly.

You could argue that the NFV approach is a way out of at least the second of these two problems, and thus might impact the first as well.  Logical, but it doesn’t seem to be true, because both licensing costs and onboarding difficulties are cited for VNFs deploying in the cloud as well.  Thus, I think we have to look in a different direction.  In fact, two directions.

First, I think we need a reference architecture for uCPE, a set of platform APIs that would be available to any piece of software on any uCPE device regardless of implementation or vendor.  Something like this has been done with Linux and with the Java Virtual Machine.  Suppose we said that all uCPE had to offer an embedded Linux or JVM implementation?  Better yet, suppose we adopted the Linux Foundation’s DANOS?  Then a single set of APIs would make any feature compatible with any piece of uCPE, and we have at least that problem solved.  There are also other open-device operating systems emerging, and in theory one of them would serve, as long as it was open-source.  Big Switch announced an open-source network operating system recently, and that might be an alternative to DANOS.
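
To make the idea of a reference architecture concrete, here is a minimal sketch of a feature written against a common platform API and run unchanged on two different boxes.  Every name in it (UcpePlatform, attach_port, report_status, the vendor classes) is hypothetical; none of it is drawn from DANOS or any real uCPE product.

```python
from abc import ABC, abstractmethod


class UcpePlatform(ABC):
    """Hypothetical reference API that every uCPE device would expose."""

    @abstractmethod
    def attach_port(self, name: str) -> str:
        """Bind a feature to a named port and return a device-specific handle."""

    @abstractmethod
    def report_status(self, feature: str, status: str) -> None:
        """Publish a feature's status to the device's management layer."""


class VendorABox(UcpePlatform):
    """One illustrative implementation of the reference API."""

    def __init__(self):
        self.log = []

    def attach_port(self, name):
        return f"vendorA:{name}"

    def report_status(self, feature, status):
        self.log.append((feature, status))


class VendorBBox(UcpePlatform):
    """A second implementation with different internals."""

    def __init__(self):
        self.log = []

    def attach_port(self, name):
        return f"vendorB/{name}"

    def report_status(self, feature, status):
        self.log.append((feature, status))


def start_firewall(platform: UcpePlatform) -> str:
    """A feature that depends only on the reference API, not on the vendor."""
    port = platform.attach_port("wan0")
    platform.report_status("firewall", "running")
    return port


for box in (VendorABox(), VendorBBox()):
    print(start_firewall(box), box.log)
```

The feature code never names a vendor, which is the whole value of the reference architecture: onboarding is done once, against the API, rather than once per device.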

The second thing we need is an early focus on open-source features to be added to the uCPE.  I’ve always believed that NFV’s success depended on getting open-source VNFs to force commercial VNF providers to set rational prices based on benefits they can offer.  No real effort to do that has been made, to the detriment of the marketplace.

These steps are necessary conditions, IMHO, but not sufficient conditions.  The big problem with uCPE is the relatively narrow range of customers where the concept is really viable.  Home devices are simply too cheap to target, which means only business sites would be likely candidates for adopting the technology.  Then you have the question of whether agile features are valuable in the first place.  Most enterprise customers tell me that they believe their sites would require a single static feature set, and a straw poll I took in 2018 said that the same feature set (firewall, SD-WAN, encryption) was suitable for almost 90% of sites.  We’ll have to see if a value proposition emerges here.

Let’s move on, then, to our second uCPE possibility.  The notion of uCPE as being a kind of outpost to the NFV carrier cloud has also presented issues.  Obviously, it’s more complicated to populate a uCPE device with features if you have to follow the ETSI NFV model of orchestration and management, and so having uCPE be considered a part of NFV is logical only if you actually gain something from that approach.  What, other than harmonious management where features might move between uCPE and cloud, could we present as a benefit?  Not much.

Operators tell me that they have concerns over the VNF licensing fees, just as they have for the white-box model.  Some are also telling me that the notion of chaining VNFs together in the cloud to create a virtual device is too expensive in hosting resources and too complex operationally to be economical.  Onboarding VNFs is too complex, again as it is for white-box solutions.  They also say their experience is that enterprises don’t change the VNF mixture that often, which means it would be more cost-effective to simply combine the most common VNF configurations into a single machine image.

The solution to these problems seems straightforward.  First, you need that common framework for hosting, and you need to encourage open-source VNFs, the same steps as with white-box uCPE.  Second, you need to abandon the notion of service chains of VNFs in favor of packaging the most common combinations as machine images.  One operator told me that just doing the latter improves the resource efficiency and opex efficiency by 50% or more.
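
Picking which combinations to package is a counting exercise.  The sketch below assumes an operator has a list of per-site VNF sets and simply ranks the combinations by how many sites use them; the function name and the sample site data are hypothetical and for illustration only.

```python
from collections import Counter


def common_images(site_configs, top_n=2):
    """Return the top_n most common VNF combinations with their site counts.

    Order within a site's list is ignored, so ["firewall", "sd-wan"] and
    ["sd-wan", "firewall"] count as the same combination.
    """
    combos = Counter(frozenset(cfg) for cfg in site_configs)
    return [(sorted(combo), count) for combo, count in combos.most_common(top_n)]


# Hypothetical site inventory, for illustration only.
sites = [
    ["firewall", "sd-wan", "encryption"],
    ["sd-wan", "firewall", "encryption"],
    ["firewall", "sd-wan", "encryption"],
    ["firewall", "sd-wan"],
    ["firewall"],
]
for combo, count in common_images(sites):
    print(combo, count)
```

Each combination near the top of that list is a candidate for a single prebuilt machine image, deployed as one unit instead of as a chain of separately hosted VNFs.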

The common thread here is pretty clear.  First, more work needs to be done to standardize the platform for hosting vCPE, both on-prem (in uCPE) and in the cloud.  If that isn’t done, then it’s likely that neither framework for vCPE will be economically viable on a broad enough scale to justify all the work being put into it.  Second, the best source for VNFs is open-source, where there are ample business models built on the sale of support for operators to mimic.  In addition, commercial software providers would be more likely to be aggressive in VNF pricing if they knew they had a free competitor.

It would have been easy to adopt both these recommendations, and the “one-machine-image” one as well, right at the start, and I know the points were raised because I was one of those who raised them.  Now, the problem is that a lot of the VNF partnerships created don’t fit these points, and operators would have to frame their offerings differently in order to adopt them today.  The biggest problem, I think, would be for the NFV community to accept the changes in strategy given the time spent on other approaches.

It would be smart to do that, though, because the DANOS efforts alone would seem to be directing the market toward a white-box approach.  If that’s the case, then the APIs available in DANOS should be accepted as the standard to be used even for cloud-hosted VNFs, which would make VNFs portable between white boxes and the cloud.  It would also standardize them to the point where onboarding would be much easier.

To make this all work, we’d need to augment DANOS APIs to include the linkages needed to deploy and manage the elements, and we’d also need to consider how to get DANOS APIs to work in VMs and containers.  A middleware tool would do the job, and so the Linux Foundation should look at that as part of their project.  With the combination of DANOS available for devices, VMs, and containers, operators would have the basis for portable data-plane functions hosted in the cloud or uCPE.

The icing on the cake would be to provide P4 flow-language support on DANOS in all these configurations.  P4 could be used to create specialized switch/router features anywhere, and could also (perhaps with some enhancements) be used to build things like firewalls and VPN on-ramps.  Given that the ONF at least is promoting P4 on DANOS and that AT&T originated DANOS (as dNOS), getting broad operator support for this approach should be possible.

Vendors with competitive network operating systems (like Big Switch) would need something like the points I’ve cited here just to bootstrap themselves into credibility.  There are already enough options in the space to confuse prospective users, and none of them so far have really taken aim at the major value propositions that would justify them.  If we had a bit of a feature-value war among these vendors, it would help elevate the discussion overall, even if it did magnify near-term confusion over selection.

If all this is technically possible and if we could get the framework into place, who would buy into it?  Some operators like AT&T likely would, but another strong possibility is the US Government.  The feds are always looking for ways to get more for less, and they’re supporters of open-source and standard frameworks with interchangeable parts.  Even if operators might drag their feet, government support could create direct deployments and at the same time push operators to support the same architecture.

I firmly believe that this is the right way to do virtual devices, including vCPE.  Despite the fact that things didn’t get off to a smooth start, I think the approach could still be promoted and help drive vCPE, uCPE, and even NFV forward.