NFV’s “Performance Problem” Isn’t NFV’s Problem

PR often follows a very predictable cycle.  When you launch a new technology, the novelty (which is what “news” means) drives a wave of interest, hype, and exaggeration.  Whatever this new thing is, it becomes the single-handed savior of western culture, perhaps of life as we know it.  Eventually all the positive story lines run out, and you start to get the opposite.  No, it’s not going to save western culture; it’s actually a bastion of international communism or something.  You’re hot, or you’re not, and you can see a bit of that in the performance concerns now being raised about NFV.  Those concerns are valid, but not necessarily in the way we’re framing them.

We could take a nice multi-core server with a multi-tasking OS and load it with all of the applications that a Fortune 500 company runs.  They’d run very badly.  We could take the same server, convert it into virtual machines, and then run the same mix.  It would run worse.  We could then turn it into a cloud server and get even worse performance.  The point here is that all forms of virtualization are means of dealing with under-utilization.  They don’t create CPU resources or I/O bandwidth, and in fact the process of subdividing resources takes resources, so adding layers of software to do that will reduce what’s available for the applications themselves.
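To put rough numbers on that, here’s a minimal back-of-the-envelope sketch in Python.  The per-layer overhead fractions are illustrative assumptions, not measurements of any platform; the point is only that each layer multiplies away a slice of what’s left for the applications.

    # Illustrative arithmetic only: the overhead fractions below are
    # assumptions for the sake of the example, not benchmark results.
    LAYER_OVERHEAD = {
        "multi-tasking OS": 0.05,  # scheduler, context switches
        "hypervisor/VMs":   0.10,  # virtual CPU, memory, and I/O mediation
        "cloud stack":      0.05,  # orchestration agents, shared services
    }

    def remaining_capacity(layers):
        """Fraction of a server left for applications after stacking layers."""
        capacity = 1.0  # normalize bare-metal capacity to 1.0
        for layer in layers:
            capacity *= 1.0 - LAYER_OVERHEAD[layer]
        return capacity

    stack = []
    print(f"bare metal: {remaining_capacity(stack):.2f}")
    for layer in LAYER_OVERHEAD:
        stack.append(layer)
        print(f"+ {layer}: {remaining_capacity(stack):.2f}")

Each layer is worth having only if the utilization it recovers exceeds the slice it consumes.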

What this means is that NFV can exploit virtualization only to the extent that the virtual functions we’re assigning are single-tenant software components and don’t fully utilize a bare-metal server.  Where either of those two things isn’t true, NFV’s core concept of hosting on VMs (or in containers, or whatever) isn’t going to hold water.

An application component that serves a single user and consumes a whole server has to recover that server’s cost (capex and opex) in the pricing of the service it supports.  A multi-tenant application spreads its cost across all the tenant users/services, and so has less to be concerned about, efficiency-wise.  Thus, something like IMS, which is inherently multi-tenant, can’t be expected to gain a lot from being stuck onto a VM.  We’re not going to give every cellular customer their own IMS VM, after all, and it’s hard to see how an IMS application wouldn’t easily consume a single server on its own.
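The arithmetic behind that is worth a quick sketch.  The dollar figure is hypothetical, chosen only to show how tenancy drives the cost each user has to recover.

    # Hypothetical figure: a server costing $12,000 a year in combined
    # capex and opex, amortized across however many tenants share it.
    ANNUAL_SERVER_COST = 12_000.0

    def annual_cost_per_tenant(tenants):
        return ANNUAL_SERVER_COST / tenants

    print(annual_cost_per_tenant(1))        # single-tenant function: $12,000 per user
    print(annual_cost_per_tenant(100_000))  # multi-tenant IMS-style core: $0.12 per user

At $0.12 per user per year, shaving hosting overhead is a rounding error; at $12,000, it’s the whole business case.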

However you overload a server, you’ll degrade its performance.  In many cases, the stuff we’re talking about as NFV applications won’t wash if transparent, virtualization-based multi-tenancy is the justification.  They’re already multi-tenant, and we’d expect to size their servers according to the traffic load when they run on conventional platforms.  The same is true with NFV; we can’t create a set of VMs whose applications collectively consume more resources than the host offers.
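That last constraint amounts to a feasibility check.  Here’s a minimal sketch with made-up resource figures; real cloud schedulers apply the same kind of test across more dimensions.

    # Do the VMs we want to co-host actually fit?  All figures are
    # invented for illustration.
    HOST = {"vcpus": 32, "ram_gb": 256, "net_gbps": 40}

    VMS = [
        {"name": "vFirewall", "vcpus": 8,  "ram_gb": 32, "net_gbps": 10},
        {"name": "vDPI",      "vcpus": 16, "ram_gb": 64, "net_gbps": 20},
        {"name": "vRouter",   "vcpus": 16, "ram_gb": 64, "net_gbps": 20},
    ]

    def fits(host, guests):
        """True only if the guests' combined demands stay within the host."""
        return all(
            sum(g[resource] for g in guests) <= available
            for resource, available in host.items()
        )

    print(fits(HOST, VMS))  # False: 40 vCPUs and 50 Gbps asked of a 32-vCPU, 40 Gbps host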

What we do have to be concerned about are cases where virtualization efficiency is inhibited not by the actual application resource requirements but by resources lost to the virtualization process itself.  Early on in my CloudNFV activity, I recruited 6WIND to deal with data-plane performance in virtualized applications, which their software handled very effectively.  But even data-plane acceleration isn’t going to make every application suitable for virtual-machine hosting under NFV.  We are going to need some bare-metal servers for applications that demand a lot of resources.

Our real problem here is that we’re not thinking.  Virtualization, cloud computing, even multi-tasking, are all ways of dealing with inefficient use of hardware.  We seem to believe that moving everything to the cloud would be justified by hardware efficiencies, and yet the majority of mission-critical applications running today are not inefficient in their resource usage.  That’s true with the cloud, and it will be true with NFV.  Virtualization is the cure for low utilization.

So what does this mean?  That NFV is nonsense?  No, what it means is that (as usual) we’re trapped in an oversimplification of a value proposition.  We are moving to services that are composed as much or more, in value terms, of hosted components as of transport/connection components.  You need to host “hosted components” on something, and so you need to manage the efficiency of resource usage.  Where we’re missing a point is that managing efficiency means dealing with every level of inefficiency from “none” to “a lot”.  Where components barely load a server, we’re going to want lightweight options like Docker that impose less overhead, so we can manage large numbers of components per server.  Where an application can fill a server on its own, we want bare metal.  And in between, we want all possible flexibility.
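One way to picture that flexibility is as a placement rule keyed to how much of a server a component is expected to consume.  A sketch follows; the 10% and 80% thresholds are arbitrary assumptions, not recommendations.

    def hosting_model(expected_utilization):
        """Pick a hosting model from a component's expected share of a server.

        The 10% and 80% thresholds are illustrative assumptions.
        """
        if expected_utilization < 0.10:
            return "container (e.g. Docker): pack many light components per server"
        if expected_utilization < 0.80:
            return "virtual machine: moderate sharing, stronger isolation"
        return "bare metal: the component needs the whole server anyway"

    for u in (0.02, 0.40, 0.95):
        print(f"{u:.0%} -> {hosting_model(u)}")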

NFV’s value has to come not from simply shoehorning more apps onto a server (which it can’t do anyway; it can only support what the underlying virtualization technology supports), but from managing the deployment of the service components, both connection and hosted applications or content, that make up the experiences people will pay for.  MANO should have been seen not as a mechanism to achieve hosting, but as the goal of the whole process.  We could justify MANO, done right, even if we gained nothing from virtualization at all.

IMS and EPC are applications that, as I’ve said, gain little or nothing from the efficiency-management mechanisms of virtualization.  They could gain a lot from the elasticity benefits of componentization and horizontal scaling.  Virtual routing is easiest to apply on a dedicated server; it’s hard to see why we’d want to virtualize a router just to share a server with another virtualized router, unless each router were handling a relatively low traffic level.  But again, elastic router positioning and capacity could be valuable in the network of the future.
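The elasticity point can be sketched too: instead of packing tenants onto a server, you scale the number of instances with the offered load.  The per-instance capacity and headroom figures here are assumptions.

    import math

    INSTANCE_CAPACITY_GBPS = 10.0  # assumed throughput of one virtual-router instance

    def instances_needed(offered_load_gbps, headroom=0.25):
        """Horizontal scaling: enough instances for the load plus headroom."""
        return max(1, math.ceil(offered_load_gbps * (1.0 + headroom) / INSTANCE_CAPACITY_GBPS))

    for load in (2.0, 18.0, 95.0):
        print(f"{load:5.1f} Gbps -> {instances_needed(load)} instance(s)")

The benefit isn’t squeezing more routers onto one box; it’s adding and removing instances as traffic shifts.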

It’s unfair to suggest that NFV has resource issues; whatever issues it has were inherited from virtualization and the cloud, whose resource issues we’re not talking about.  Even data-plane processing is not uniquely an NFV issue.  Any transactional application, any content server, has to worry about the data plane.  Even web servers do, which is why at some point you stop sharing the hosting of websites on a single server and go to a dedicated server per site, or even several servers per site.  But it is fair to say that the “resource problem” discussion that’s arising demonstrates that the simplistic view of how NFV will work and save money has feet of clay.  We can justify NFV easily if we focus on how it actually can benefit us.  If we invent easy-to-explain benefits, we invent things that won’t stand up to close examination.

There are two important things here.  First, componentized applications and features are more agile in addressing both new market opportunities and problems with infrastructure.  Second, componentization creates an exponential increase in complexity that, if left untreated, will destroy any possible benefit case.  NFV is the operationalization of componentized service features.  Ultimately it will morph into the operationalization of the cloud, joining up with things like DevOps and application lifecycle management (ALM) to form a new application architecture.  And that architecture, like every one before it, will match software to hardware, mindful of the fact that capacity is what it is and you have to manage it thoughtfully whether you virtualize it or not.