Stepping Up to, and Beyond, NFV

As we start to hear more about NFV from the application and services side, it’s becoming clear that there are different views of how NFV relates to device/appliance networks, hosted functions, and the cloud.  From a benefits perspective it’s important to understand these differences, because any specific NFV benefit can drive things forward only as far as the overall NFV model is accepted.  That acceptance will depend in part on NFV’s marginal utility, meaning how much better it is than the alternatives.  And how well NFV does, once it gets going, will depend on the utility it can demonstrate toward the step beyond.

IMS and EPC are popular targets of NFV today.  Both arise out of the 3GPP evolution of mobile networking from circuit voice to packet multimedia.  Both specifications define functional elements, such as the CSCF in IMS and the MME, SGW, and PGW in EPC, that are typically mapped into appliances in current deployments.  What we’re now hearing are proposals to translate these into virtual functions to be managed and orchestrated according to NFV principles.

But how, exactly?  What we see today is mostly a set of 1:1 mappings of IMS/EPC elements to software images, to be loaded in the places the devices were installed before.  If the goal of NFV is indeed what it was when it kicked off back in October 2012 (reduce capex by substituting hosting for appliances), then this is fine.  But operators have said many times that operations efficiency and service agility are really the goals.  So how far can NFV go to achieve them in the “mapped IMS/EPC” example?

We can do some things, of course.  We know from recent announcements that you can spin up additional EPC nodes to handle call loads, and that’s a good thing.  But there’s an obvious question: would we build EPC using the same functional divisions today, knowing what we know about how to build agile applications and services?  Probably not.  You can’t spawn pieces of a PGW or SGW device, for example, so in a software design you would more likely create functional subdivisions that optimize the ability to scale components horizontally.  Metaswitch’s Project Clearwater IMS shows that optimized functionality doesn’t necessarily fit the device boundaries of the past; you have to support the critical interfaces, but you can still do the interior stuff in a more modern and virtual way.

When you map 1:1 between appliances and hosted functions you’re really not doing much more than porting elements of IMS/EPC to servers.  Perhaps, if you support cloud hosting, you can say that you’re a cloud player.  But to say you’re NFV you really have to be able to do something more dynamic than just sticking an app in a server VM and leaving it there till it breaks; at least you have to if you hope to drive NFV forward in a useful way.  So you can say that what differentiates the cloud model of network functions from the hosted model is resource dynamism, and what differentiates the NFV model from the cloud model is functional dynamism.  That’s why I think that you have to assume functional modeling and orchestration are part of service-building in an NFV world.

Functional dynamism is an expression of the variability of component relationships in a service.  A service can be highly dynamic because it has to scale or vary significantly for performance/reliability reasons, or because its actual functionality varies over a short period of time.  The former is a response to a combination of resource reliability and variation in workloads, and the latter is a function of the duration of the service relationship.

Functional dynamism is an important concept for NFV because it expresses the extent to which a service really benefits from virtualization of resources.  The range across which functional dynamism operates also determines just how sophisticated an NFV MANO function would have to be in order to optimize the binding of resources to services.  The fact is that no matter what you do to provide scaling, functional dynamism potential isn’t at its highest for multi-tenant applications like IMS/EPC for the simple reason that these applications aren’t instantiated per user but per service, and variability is somewhat dampened by the law of large numbers.  Yes, you’ll have macro events like a sporting event letting out, but these events happen on a schedule of days, not seconds.
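The dampening effect of large numbers can be illustrated with a back-of-the-envelope sketch.  It assumes, purely for illustration, that each user is independently active with some fixed probability, so the number of simultaneously active users follows a binomial distribution.  The function name and the values chosen for the user counts and the activity probability are assumptions, not measurements.

```python
import math

def load_variability(users: int, p_active: float) -> float:
    """Coefficient of variation (std dev / mean) of the number of
    simultaneously active users, assuming independent users that are
    each active with probability p_active (a binomial model)."""
    mean = users * p_active
    std_dev = math.sqrt(users * p_active * (1.0 - p_active))
    return std_dev / mean

# Hypothetical populations: a per-user instance versus shared,
# multi-tenant instances of increasing size.
for users in (1, 100, 10_000, 1_000_000):
    cv = load_variability(users, 0.1)
    print(f"{users:>9} users: relative variability {cv:.4f}")
```

Under these assumptions the relative variability falls with the square root of the number of tenants, which is one way to see why a shared IMS/EPC instance has less to gain from second-by-second resource rebinding than a per-user function would.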

If I’m right about functional dynamism, then it’s also fair to say that the popular “service chaining” applications may not be ideal for NFV either.  Even though business branch access isn’t inherently multi-tenant (you’re likely to see everyone with their own instance of a firewall, for example), the stuff is sold on a multi-year contract.  That means the components are likely to be put into place and stay there unless something breaks, so in terms of variability we’re almost back to the multi-tenant model.  And think about it.  How valuable are resource optimization and orchestration if you do them once a year?  You can’t save enough per occurrence to make much of a difference in revenue or cost, so there’s no driving benefit.
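The once-a-year argument is simple arithmetic, sketched here with made-up numbers.  The function name and every cost figure below are hypothetical placeholders chosen only to show the shape of the comparison; they are not operator data.

```python
def annual_automation_benefit(events_per_year: float,
                              manual_cost_per_event: float,
                              automated_cost_per_event: float) -> float:
    """Yearly saving per service from automating an orchestration event."""
    saving_per_event = manual_cost_per_event - automated_cost_per_event
    return events_per_year * saving_per_event

# Hypothetical figures, in arbitrary currency units.
MANUAL_COST = 50.0     # assumed cost of one manual (re)deployment
AUTOMATED_COST = 5.0   # assumed cost of the same event when automated

static_branch = annual_automation_benefit(1, MANUAL_COST, AUTOMATED_COST)
dynamic_service = annual_automation_benefit(365, MANUAL_COST, AUTOMATED_COST)

print(f"once-a-year service:  {static_branch:.2f} saved per year")
print(f"daily-change service: {dynamic_service:.2f} saved per year")
```

Whatever the real per-event saving turns out to be, the annual benefit scales with how often the event happens, so a service that is orchestrated once per contract year contributes very little to the business case.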

What all of this would mean is that the low-hanging apples for NFV may be too low; they may already be on the ground, and so they don’t justify automating the “picking” process.  Yes, I know operators are all excited about the potential for service automation, but the benefit case for it is fragile if you focus your attention on things that are just not done that often.  That’s a good reason why you can’t afford to confine your MANO benefits to the part of a service that’s actually based on virtual functions; you won’t address enough in your automation efforts to change costs or improve agility enough to drive things forward.

But here’s the thing.  We’re forgetting IMS and EPC.  We don’t build an IMS/EPC instance for every call or even every customer.  They’re multi-tenant, remember?  Well, think on this point.  Suppose we make our functional relationships more and more dynamic, and view services and applications as momentary ships passing in the IT night, created on demand.  Then as our dynamism increases, we reach a point where it’s less difficult to pass work among static instances of processes than to spawn processes for the purpose of doing work.  We’ve made the management workflow larger and the functional workflow smaller, and eventually they cross.  That’s why I think it’s important to think about the fusion of service logic and service management.  They’re inevitably going to fuse in a mass market, because we’ll address dynamism at scale by eliminating “provisioning” completely.
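The crossing point can be sketched with a toy model.  Assume, hypothetically, that spawning and tearing down a dedicated process costs a fixed amount of time, while the useful work shrinks as the service relationship gets shorter.  The function name and the 30-second overhead are illustrative assumptions, not figures from any NFV implementation.

```python
def management_share(spawn_overhead_s: float, activity_duration_s: float) -> float:
    """Fraction of total effort spent on management (spawning and
    tearing down a dedicated process) rather than on the activity."""
    return spawn_overhead_s / (spawn_overhead_s + activity_duration_s)

SPAWN_OVERHEAD_S = 30.0  # assumed fixed cost to deploy one process

# Hypothetical service lifetimes, from a one-year contract down to
# a one-second interaction.
lifetimes_s = {
    "one year": 365 * 24 * 3600.0,
    "one day": 24 * 3600.0,
    "one hour": 3600.0,
    "30 seconds": 30.0,
    "one second": 1.0,
}

for label, duration in lifetimes_s.items():
    share = management_share(SPAWN_OVERHEAD_S, duration)
    print(f"{label:>10}: management is {share:.1%} of total effort")
```

In this toy model the two workflows cross when the activity lasts about as long as the spawning overhead.  Below that point most of the effort goes into provisioning, which is the regime where routing work to static, shared instances starts to look better than spawning a process per activity.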

NFV is the implementation of a critical step, a step toward function-defined workflows and dynamic associations of processes with activities.  That’s what mobility is, what the thing I’ve called “point-of-activity empowerment” is, and even what the not-nonsense-and-hype part of the Internet of Things/Everything is.  It’s important we know that because it’s important that we judge NFV not just by what it is, but by what it will necessarily become.