Why the Critical Piece of VMware’s NFV 2.0 is the “Network Model” NSX MIGHT Support

I mentioned in my blog yesterday that a network and addressing model was critical to edge computing and NFV.  If that’s true, then could it also be true that having a virtual-network model was critical to vendor success in the NFV space?  The VMware NFV 2.0 announcement may give us an opportunity to test that, and it may also ignite more general interest in the NFV network model overall.

A “network model” in my context is a picture of how connectivity is provided to users and applications using shared infrastructure.  The Internet represents a community network model, one where everyone is available, and while that’s vital for the specific missions of the Internet, it’s somewhere between a nuisance and a menace for other network applications.  VPNs and VLANs are proof that you need to have some control over the network model.

One of the truly significant challenges of virtualization is the need to define an infinitely scalable multi-tenant virtual network model.  Any time you share infrastructure you have to be able to separate those who share from each other, to ensure that you don’t create security/governance issues and that performance of users isn’t impacted by the behavior of other users.  This problem arose in cloud computing, and it was responsible for the Nicira “SDN” model (now VMware’s NSX), an overlay-network technology that lets cloud applications/tenants have their own “private networks” that extend all the way to the virtual components (VMs and now containers).

NFV has a multi-tenant challenge too, but it’s more profound than the one that spawned Nicira/NSX.  VMware’s inclusion of NSX in its NFV 2.0 announcement means it has a chance, perhaps even an obligation, to resolve NFV’s network-model challenges.  That starts with a basic question that’s been largely ignored: “What is a tenant in NFV?”  Is every user a tenant, every service, every combination of the two?  Answer: All of the above, which is why NFV needs a network model so badly.

Let’s start with what an NFV network model should look like.  Say that we have an NFV service hosted in the cloud, offering virtual CPE (vCPE) that includes a firewall virtual function, an encryption virtual function, and a VPN on-ramp virtual function of some sort.  These three functions are “service chained” according to the ETSI ISG’s work, meaning that traffic passes through them in a specific order, with the “inside” function connected to the network service and the “outside” function connected to the user.  All nice and simple, right?  Not so.
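
To make that ordering concrete, here’s a minimal sketch in Python of the vCPE chain as an ordered list of functions.  The names and image identifiers are purely illustrative, not ETSI-defined artifacts:

```python
# A sketch (illustrative names only) of the vCPE chain as an ordered list:
# traffic entering from the user side must pass through the functions in
# this order before reaching the network service.
from dataclasses import dataclass

@dataclass
class VirtualFunction:
    name: str    # e.g. "firewall"
    image: str   # hypothetical VNF software image identifier

# Order matters: user-facing ("outside") function first, network-facing ("inside") last.
vcpe_chain = [
    VirtualFunction("firewall", "vnf/firewall:1.0"),
    VirtualFunction("encryption", "vnf/ipsec:1.0"),
    VirtualFunction("vpn-onramp", "vnf/vpn-onramp:1.0"),
]

def describe_chain(chain):
    """Return the hop-by-hop path a user packet takes through the chain."""
    hops = ["user"] + [vf.name for vf in chain] + ["network service"]
    return " -> ".join(hops)

print(describe_chain(vcpe_chain))
# user -> firewall -> encryption -> vpn-onramp -> network service
```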

You can’t connect something without having a connection service, which you can’t have without a network.  We can presume chaining of virtual functions works if we have a way of addressing the “inside” and “outside” ports of each of these functions and creating a tunnel or link between them.  So we have to have an address for these ports, which means we have an address space.  Let’s assume it’s an IP network and we’re using an IP address space.  We then have an address for Function 1 Input and Output and the same for Functions 2 and 3.  We simply create a tunnel between them (and to the user and network) and we’re done.
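
Here’s a sketch of that addressing step, assuming an illustrative 192.168.10.0/24 address space: each function gets an “in” and “out” port address, and the tunnels are built hop by hop from user to network:

```python
# Give every function an "in" and "out" port address from one address space
# and build the tunnels hop by hop. The 192.168.10.0/24 space and the port
# layout are assumptions for illustration.
import ipaddress

space = ipaddress.ip_network("192.168.10.0/24")
addrs = space.hosts()   # iterator over usable host addresses

functions = ["firewall", "encryption", "vpn-onramp"]
ports = {f: {"in": next(addrs), "out": next(addrs)} for f in functions}

# user -> firewall -> encryption -> vpn-onramp -> network service
tunnels = []
previous = "user port"
for f in functions:
    tunnels.append((previous, ports[f]["in"]))
    previous = ports[f]["out"]
tunnels.append((previous, "network service port"))

for a, b in tunnels:
    print(f"tunnel: {a} <-> {b}")
```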

The problem is that if this is a normal IP address set, it has to be in an address space.  Whose?  If this is a public IP address set, then somebody online could send something (even if it’s only a DDoS packet) to one of the intermediary functions.  So presumably what we’d do is make this a subnet that uses a private IP address space.  Almost everyone has one of these; if you have a home gateway it probably gives your devices addresses in the range 192.168.x.x.  This would keep the function addresses hidden, but you’d have to expose the ports used to connect to the user and the network service to complete the path end to end, so there’s a “gateway router” function that does an address translation for those ports.
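
A rough illustration of that “hide the subnet, expose only the endpoints” idea, with every address and port an assumption for illustration; the gateway’s translation table is the only way in from outside:

```python
# The functions sit in a private RFC 1918 subnet, and the gateway's
# translation table exposes exactly two endpoints: the chain's user-side
# entry and its network-side exit. Nothing else is reachable from outside.
import ipaddress

private_subnet = ipaddress.ip_network("192.168.10.0/24")

nat_table = {
    ("203.0.113.10", 5000): ("192.168.10.1", 5000),   # user-facing port of the firewall
    ("203.0.113.10", 5001): ("192.168.10.6", 5001),   # network-facing port of the VPN on-ramp
}

def reachable_from_outside(inside_addr, inside_port):
    """True only for addresses the gateway explicitly exposes."""
    assert ipaddress.ip_address(inside_addr) in private_subnet
    return (inside_addr, inside_port) in set(nat_table.values())

print(reachable_from_outside("192.168.10.1", 5000))   # True: exposed endpoint
print(reachable_from_outside("192.168.10.3", 8080))   # False: hidden intermediary function
```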

Underneath the IP subnet in a practical sense is an Ethernet LAN, and if it’s not an independent VLAN then the functions are still addressable there.  There are limits to the number of Ethernet VLANs you can have (the 802.1Q tag allows only about four thousand), and this is why Nicira/NSX came along in the first place.  With their approach, each of the IP subnets rides independently on top of infrastructure, and you don’t have to segment Ethernet.  So far, then, NSX solves our problems.
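
The arithmetic behind that ceiling is simple: the 802.1Q VLAN tag carries a 12-bit ID, while a VXLAN-style overlay segment identifier (the kind of encapsulation the NSX family of overlays uses) carries 24 bits:

```python
# Why overlays remove the segmentation ceiling: 12-bit VLAN IDs versus a
# 24-bit overlay segment identifier.
VLAN_ID_BITS = 12
OVERLAY_SEGMENT_BITS = 24

usable_vlans = 2 ** VLAN_ID_BITS - 2          # IDs 0 and 4095 are reserved
overlay_segments = 2 ** OVERLAY_SEGMENT_BITS

print(f"Ethernet VLANs available:   {usable_vlans:,}")       # 4,094
print(f"Overlay segments available: {overlay_segments:,}")   # 16,777,216
```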

But now we come to deploying and managing the VNFs.  We know that we can use OpenStack to deploy VNFs and that we can use Nicira/NSX along with OpenStack’s networking (Neutron) to connect things.  What address space does all this control stuff live in?  We can’t put shared OpenStack into the service’s own address space or it’s insecure.  We can’t put it inside the subnet because it has to build the subnet.  So we have to define some address space for all the deployment elements, all the resources, and that address space has to be immune from attack, so it has to be separated from the normal public IP address space, the service address space, and the Internet.  Presumably it also has to be broad enough to address all the NFV processes of the operator wherever they are, so it’s not an IP subnetwork at all, it’s a VPN.  This isn’t discussed much, but it is within the capabilities of the existing NFV technology.
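
A sketch of that separation rule, with assumed ranges: the operator-wide control space must never overlap a per-service subnet or anything publicly reachable, and that’s easy to check mechanically:

```python
# The control/deployment address space must not overlap any service subnet
# or public-facing block. The ranges below are assumptions chosen purely
# to illustrate the check.
import ipaddress

control_vpn  = ipaddress.ip_network("10.0.0.0/8")         # operator-wide control/deployment space
service_nets = [ipaddress.ip_network("192.168.10.0/24"),  # one private subnet per deployed service
                ipaddress.ip_network("192.168.11.0/24")]
public_block = ipaddress.ip_network("203.0.113.0/24")     # addresses exposed toward users/Internet

for net in service_nets + [public_block]:
    assert not control_vpn.overlaps(net), f"control space collides with {net}"

print("control VPN is isolated from service and public address spaces")
```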

The next complication is the management side.  To manage our VNFs we have to be able to connect to their management ports.  Those ports are inside our subnet, so could we just provide a gateway translation of those port addresses to the NFV control process address space?  Sure, but if we do that, we have created a pathway where a specific tenant can “talk” into the control network.  We also have to expose resource management interfaces, and the same problem arises.
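
One way to picture the risk is as a gateway policy that should match only flows initiated from the control side toward specific management ports, never the reverse.  The rule format, addresses, and ports here are illustrative assumptions of mine, not an NSX or VMware API:

```python
# A gateway between the control VPN and a tenant subnet should only permit
# control-to-tenant management flows; a tenant-originated packet aimed at
# the control network must find no matching rule.
import ipaddress

ALLOWED = [
    # (source network, destination address, destination port)
    ("10.0.0.0/8", "192.168.10.1", 830),   # control -> firewall management port
    ("10.0.0.0/8", "192.168.10.5", 830),   # control -> VPN on-ramp management port
]

def permitted(src, dst, dport):
    """True only if a rule covers this control-to-tenant flow."""
    for src_net, rule_dst, rule_port in ALLOWED:
        if (ipaddress.ip_address(src) in ipaddress.ip_network(src_net)
                and dst == rule_dst and dport == rule_port):
            return True
    return False

print(permitted("10.1.2.3", "192.168.10.1", 830))   # True: control reaches a management port
print(permitted("192.168.10.9", "10.1.2.3", 830))   # False: the tenant cannot talk into control
```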

I think that NSX in VMware’s NFV 2.0 could solve these problems.  There is no reason why an overlay network technology like NSX couldn’t build IP subnets, VPNs, and anything else you’d like without limitations.  We could easily define, using the private Class A address space (10.x.x.x), an operator-wide NFV control network.  We could use the private Class B spaces (172.16.x.x through 172.31.x.x) to define facility-wide networks, and use the private Class C networks (192.168.x.x) to host the virtual functions.  We could gateway between these—I think.  What I’d like to see is for VMware to take the supposition out of the picture and draw the diagrams to show how this kind of address structure would work.
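
Here’s what that layered plan might look like on paper; the carve-up sizes are my own assumptions, not anything VMware has drawn up:

```python
# One private Class A space for operator-wide NFV control, the private Class B
# range split per facility, and private Class C subnets handed to individual
# services' virtual functions. Sizing is illustrative.
import ipaddress

operator_control = ipaddress.ip_network("10.0.0.0/8")      # operator-wide NFV control VPN
facility_space   = ipaddress.ip_network("172.16.0.0/12")   # private Class B range, split per facility
service_space    = ipaddress.ip_network("192.168.0.0/16")  # private Class C range, split per service

facilities = list(facility_space.subnets(new_prefix=16))[:3]   # e.g. one /16 per data center
services   = list(service_space.subnets(new_prefix=24))[:3]    # e.g. one /24 per service chain

print("control VPN:  ", operator_control)
print("facility nets:", ", ".join(map(str, facilities)))
print("service nets: ", ", ".join(map(str, services)))
```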

Why?  The answer is that without this there’s no way we can have a uniform system of deployment and management for NFV, because we can’t verify that everything can talk to what it needs to and that conversations that should never happen are in fact prevented.  Also, because such a move would start a competitive drive to dig into the whole question of the multi-network map that’s an inherent (but so far invisible) part of not only NFV but also cloud computing and IoT.

Finally, because some competitor is likely to do the right thing here even if VMware doesn’t.  Think Nokia, whose Nuage product is still in my view the best overlay technology out there.  Think HPE, who just did their own next-gen NFV announcement and have perhaps the most to gain (and lose) of any vendor in the space.  This is such a simple, basic part of any virtualized infrastructure and service architecture that it’s astonishing nobody has talked about it.

Ah, but somebody has thought about it—Google.  And guess who is now starting to work with operators on the elements of a truly useful virtual model for services?   Google just announced a partnership with some mobile operators, and they have the necessary network structure already.  And vendors wonder why they’re falling behind!