How Much “Carrier” versus “Cloud” Should We Have in “Carrier Cloud”?

Should carrier cloud look like a cloud or like a carrier?  In past blogs I’ve pointed out many places where the two aren’t converging and probably should be.  Another such area is virtual networking.  In cloud computing, including public cloud services, hybrid cloud, multi-cloud, and even in data center computing, there’s increased attention being paid to virtual networks.  Arguably, the whole SD-WAN craze is about virtual networking too, and yet we’ve not been hearing much about how virtual networks would figure in carrier cloud.

It’s not that operators (carriers) don’t understand virtual networks, at least at one level.  VPN services are a fixture for most big operators and many smaller ones specialize in business services of some sort.  The increased interest in SD-WAN shows that operators are receptive to different virtual network paradigms, at least for the creation of “VPNs” in a different way.  Operators have also generally adopted virtual network technology in data centers, mostly in support of the use of OpenStack.  With all of that, though, they’re still not comfortable about virtual networking at the level it’s discussed in cloud circles.

Broadly speaking, virtual networking in the cloud world is an extension of “tenant networking”, introduced back in the early SDN days by Nicira (it’s now VMware’s NSX).  Tenant networking was designed to create an overlay network to segment data center networks used in the cloud, without resorting to the more limited formal Ethernet-related mechanisms like VLANs.

In these early applications, the goal was to create application/tenant subnetworks that would be isolated from each other, and exposed onto tenant VPNs through a gateway or NAT process.  Thus, the virtual network was really a virtual LAN, something that built subnets in an IP world but really itself lived at Layer 2.

Three factors have moved things out of that limited conception.  First, vendors like Nokia/Nuage came along and offered a virtual network that could extend beyond the data center.  This actually happened before most buyers really understood why they’d even want to do that, and Nokia/Nuage (and other vendors who followed, like Juniper) weren’t marketing giants, so this first factor waited in the wings for a time.  Second, startups realized that virtual-network concepts could be used to build or extend VPNs over the Internet or other non-MPLS network resources.  That launched SD-WAN, and created a market awareness of the value of virtual networking in the wide area.

It’s really the third factor that’s setting the pace today.  That factor is the inherent elasticity of the cloud.  Physical networks have no business connecting virtual resources, for the obvious reason that virtual resources like those in the cloud require connection of logical entities that can be put anywhere and can move often.  If you look at how virtual networking is evolving in the cloud, you see that it’s increasingly becoming an integrated piece of “virtual resources”, a recognized co-equal to hosting.

That’s what carrier cloud is still missing, in part because operators still tend to think in terms of NFV, and NFV has never had a broad conception of virtual networking.  Today, we build real networks with real devices, “physical network functions” or PNFs.  The conception NFV brought along was that you built the same networks using virtual network functions, VNFs.  Thus, whatever you used to connect your PNFs is what you used to connect your VNFs.  A few realized that some VNFs might need to deploy within a private network space, and so you might need a subnet something like the ones that spawned the whole cloud virtual network initiative (but got left behind in the cloud).  Very few have gotten beyond that.

In the cloud, you can see the progress in one simple evolution: the virtual network’s evolution to the service mesh.  A virtual network, even one with a network-wide vision, still provides connectivity.  You connect virtual stuff, of course, but you connect.  If your virtual stuff is moved (by orchestration, for example) or redeployed, you have to reconnect, and accommodating things like load balancing for scaling is probably going to be a task outside virtual networking.  Service mesh is different; instead of presuming your goal is agile connectivity, the assumption is that your goal is agile service delivery, with “service” here meaning software services/microservices.  Spawn a component in response to a need to scale capacity, and service mesh is responsible for getting it connected correctly and for load-balancing work across it and its partner component instances.
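To make the distinction concrete, here’s a minimal sketch of the behavior described above, written as a toy in-memory model in Python.  It is not how any real service mesh is implemented; the class name ToyMesh, the service name "session-control", and the addresses are all hypothetical, and simple round-robin is assumed as the balancing policy.  The point is only that the caller asks for a service by name, and a newly spawned instance starts receiving work as soon as it’s registered.

```python
class ToyMesh:
    """Hypothetical in-memory stand-in for a mesh's registry plus load balancer."""

    def __init__(self):
        self._instances = {}  # service name -> list of instance addresses
        self._next = {}       # service name -> index of the next instance to use

    def register(self, service, address):
        # Called when orchestration spawns (or moves) an instance.
        self._instances.setdefault(service, []).append(address)
        self._next.setdefault(service, 0)

    def deregister(self, service, address):
        # Called when an instance is retired or redeployed elsewhere.
        self._instances[service].remove(address)

    def route(self, service):
        # Callers name the service, never an instance; round-robin is assumed.
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = self._next[service] % len(instances)
        self._next[service] = index + 1
        return instances[index]


mesh = ToyMesh()
mesh.register("session-control", "10.0.0.5:8080")
before = [mesh.route("session-control") for _ in range(2)]

# Scale out: the new instance is picked up with no change on the caller's side.
mesh.register("session-control", "10.0.0.9:8080")
after = [mesh.route("session-control") for _ in range(4)]

print(before)
print(after)
```

With a plain virtual network, the register and route steps would be separate tasks handled by separate tools; the mesh view folds them into the act of deploying the component.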

If you think about it, even NFV has “service mesh” applications, things where the goal is not just to connect something but rather to fit that something into an application/service-feature framework.  It is possible to do that without a service mesh, using multiple steps and multiple products, but it’s additional complexity and integration.  It would make more sense to use a mesh, and that’s even more likely true if you consider that most of what carrier cloud does resembles cloud applications more than it does a network of physical devices.  Even 5G is likely to deploy more “control plane” VNFs than data-plane VNFs, and things like content delivery or IoT will be even more cloud-centric.

I think we got into the carrier-cloud-versus-cloud disconnect on virtual networking because the operator community has tended to look to bottom-up standards efforts to advance its state of technology.  This tends to harden operators on the details before they’ve really established the requirements.  The labyrinthine standards processes also take years to advance, whereas cloud computing is based on open-source initiatives and advances at ten times that pace.  What the cloud is with respect to network and hosting technology today, for example, is so far beyond where it was in 2013 that it would be hard for someone of that day to even visualize what’s happened.  NFV, launched at the same time, is still churning along with the same vision as before.

Another contributing problem is the excessive focus on virtual CPE.  vCPE is the simplest and probably overall the least-valuable mission you could project for carrier cloud.  Service chaining, the linking of separately hosted components to create a virtual form of a multi-feature appliance, is the worst possible way of implementing a virtual device.  It’s too expensive in its use of hosting resources, less reliable, and much more operationally complex.  Sticking all the VNFs inside a universal CPE device or a single virtual machine (better yet, a container) would be smarter.  But focusing on vCPE has hidden the connectivity and hosting needs of broader applications of NFV, and so they’re not yet being understood, much less addressed.
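The reliability point can be illustrated with some back-of-the-envelope arithmetic, sketched here in Python.  The figures are assumptions chosen purely for illustration, not measurements: each separately hosted component is taken to be 99.9% available, failures are taken to be independent, and a chain is taken to work only if every link in it works.

```python
def chain_availability(per_component: float, components: int) -> float:
    """Availability of a service chain that needs every component to be up.

    Assumes independent failures, so the availabilities simply multiply.
    """
    return per_component ** components


ASSUMED_AVAILABILITY = 0.999  # illustrative figure, not a measured value

# Four features chained across four separately hosted VNFs.
chained = chain_availability(ASSUMED_AVAILABILITY, 4)

# The same four features co-hosted in one uCPE box, VM, or container,
# treated here as a single point of failure with the same assumed figure.
co_hosted = chain_availability(ASSUMED_AVAILABILITY, 1)

print(f"chained:   {chained:.4%}")
print(f"co-hosted: {co_hosted:.4%}")
```

Under these assumptions, every hop added to the chain multiplies in another chance of failure, and also adds another hosting point to pay for and operate.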

What did we miss?  Well, if you look at a “real” NFV opportunity, it’s hard to find one better than IMS, which in fact was my personal pick for a proof-of-concept for NFV back in 2013.  Everyone has seen a diagram of the components of IMS, and if you look carefully at one, you’ll see there are two places (the cell site interface to devices and the network gateway to the Internet) where there’s a clear and exposed public interface.  All of the other pieces of IMS talk only to each other.  Logically, then, what you’d like to have is a private subnetwork hosting “IMS” and exposing the two interfaces that are actually public.  That’s exactly how container hosting works in the cloud (Docker and Kubernetes have different conceptions of “private subnetworks”, but both require explicit exposure of the public addresses).  Where are the discussions of private subnetworks in carrier cloud?
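Here’s a small Python sketch of that private-subnetwork idea, offered as a toy model rather than as Docker’s or Kubernetes’ actual networking.  The class PrivateSubnet and the component names are hypothetical placeholders (they’re not real IMS element names), and the only rule modeled is the one described above: components inside the subnet can reach each other freely, while outsiders can reach only what has been explicitly exposed.

```python
class PrivateSubnet:
    """Hypothetical model: everything is private unless explicitly exposed."""

    def __init__(self, name):
        self.name = name
        self._components = set()
        self._exposed = set()

    def add(self, component, exposed=False):
        # Components are private by default; exposure has to be requested.
        self._components.add(component)
        if exposed:
            self._exposed.add(component)

    def can_reach(self, source, target):
        if target not in self._components:
            return False                # target isn't hosted in this subnet
        if source in self._components:
            return True                 # internal traffic is always allowed
        return target in self._exposed  # outsiders see only exposed components


ims = PrivateSubnet("ims")
ims.add("device-access", exposed=True)     # placeholder for the cell-site side
ims.add("internet-gateway", exposed=True)  # placeholder for the Internet side
ims.add("session-control")                 # internal only
ims.add("subscriber-db")                   # internal only

print(ims.can_reach("handset", "device-access"))
print(ims.can_reach("handset", "subscriber-db"))
print(ims.can_reach("session-control", "subscriber-db"))
```

The design choice worth noticing is the default: nothing is public unless somebody says so, which is the reverse of connecting VNFs the way you’d connect physical devices.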

The money, the ROI, is the real answer to my question about the bias of carrier cloud.  Anyone who believes that there is any significant incremental revenue to be had from things like virtual CPE is simply wrong.  The cloud industry is proving there is revenue in hosting parts of business applications and entirely new application models.  There’s also money to be earned from things like streaming video advertising and personalization, IoT, and so forth.  These are not connection applications; they’re experience applications, much like the applications we’re building in public cloud services today.  If operators want their share of the future, they need to do what the cloud is already doing.