Exploiting the New Attention NFV is Getting

You might be wondering whether NFV is getting a second wind.  The fact that Verizon is looking at adopting ONAP, whose key piece is rival AT&T’s ECOMP, is one data point.  Amdocs’ ONAP-based NFV strategy is another.  Certainly many operators are still interested in making NFV work, but we still have two important questions to answer.  First, what is NFV going to be, and do?  Second, what does “work” mean here?

NFV does work, in a strict functional sense.  We have virtual CPE (vCPE) deployed in real customer services.  We have some NFV applications in the mobile infrastructure space.  What we don’t have is enough NFV to make any noticeable difference in operator spending or profit.  We don’t have clear differentiation between NFV and cloud computing, and we don’t have a solid reason why that differentiation should even exist.  We won’t get those things until we frame a solid value proposition for “NFV”, even if it means admitting that NFV is really only the cloud.

Which it is.  At the heart of NFV’s problems and opportunities is the point that its goal is to host some network features in the cloud.  By rights, that should mean 99% defining feature hosting as a cloud application and 1% handling whatever special requirements demand more than public cloud tools provide.  What are the differences?  They are what has to justify incremental NFV effort, or justify cloud effort to expand current cloud-computing thinking to embrace more of NFV’s specific mission.

The biggest difference between a cloud application and an NFV application is that cloud applications don’t sit in a high-volume data plane.  The cloud hosts business applications and event processing, meaning what would look more like control-plane stuff in data networking terms.  NFV’s primary applications sit on the data plane.  They carry traffic rather than process transactions.

Traffic handling is a different breed of application.  In a traffic application you can’t simply scale out under load, because adding a parallel pathway for the data to follow invites problems like out-of-order arrivals.  Doesn’t TCP reorder?  Sure, but not all traffic is TCP.  You also have to think a lot more about security, because traffic between two points can be intercepted and someone could inject something into the flow.  And authenticating traffic on a per-packet basis is simply not practical.
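To make the reordering point concrete, here is a minimal Python sketch, not drawn from any NFV specification, that sprays one flow’s packets round-robin across two parallel paths. The path latencies, the send interval, and the helper names arrival_order and count_reordered are all hypothetical. With a 5 ms path and a 1 ms path, three of six packets arrive behind a higher-numbered packet; with a single path, nothing is reordered.

```python
# Hypothetical illustration: packets of one flow are sprayed round-robin
# across parallel paths with different one-way latencies (assumed values).

def arrival_order(num_packets, send_interval_ms, path_latencies_ms):
    """Return packet sequence numbers in the order they reach the far end."""
    arrivals = []
    for seq in range(num_packets):
        # Round-robin: packet `seq` takes path `seq % number_of_paths`.
        path = seq % len(path_latencies_ms)
        send_time = seq * send_interval_ms
        arrivals.append((send_time + path_latencies_ms[path], seq))
    # Sort by arrival time; ties are broken by sequence number.
    return [seq for _, seq in sorted(arrivals)]


def count_reordered(order):
    """Count packets that arrive after a higher-numbered packet already did."""
    highest_seen = -1
    late = 0
    for seq in order:
        if seq < highest_seen:
            late += 1
        else:
            highest_seen = seq
    return late


two_paths = arrival_order(num_packets=6, send_interval_ms=1, path_latencies_ms=[5, 1])
print(two_paths, count_reordered(two_paths))

one_path = arrival_order(num_packets=6, send_interval_ms=1, path_latencies_ms=[5])
print(one_path, count_reordered(one_path))
```

A TCP receiver would put these packets back in sequence from its buffer, at some cost in delay; a UDP-based application sees them exactly as they arrive.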

NFV applications probably require different management practices, in part because of the traffic mission we just noted, and in part because there are specific guarantees (SLAs) that have to be met.  Many network services today have fairly stringent SLAs, far more stringent than you’d find in the cloud.  You can’t host network functions successfully if you can’t honor SLAs.

So, we have proved that you do need something—call it “NFV”—to do what the cloud doesn’t do, right?  I think so, but I also think that the great majority of NFV is nothing more than cloud computing, and that the right course would be to start with that and then deal with the small percentage that’s different.  We’ve not done that; much of NFV is really about specifying things that the cloud already takes care of.  Further, at least some of those “NFV things” really should be reflected in cloud improvements overall.  Let’s look at some of the issues, including some that are really cloud enhancements and some that are not, to see what our second-wind NFV would really have to be able to address if it’s real.

Cloud deployment today is hardly extemporaneous.  Even deploying a single virtual function with cloud technology takes seconds, and a seconds-long outage on a traffic-handling connection would likely create a fault that gets kicked up to the application level.  There are emerging cloud applications that have similar needs.  Event processing supposes that the control loop from sensor back to controller is fairly short, probably milliseconds and not seconds.  So how do we deploy a serverless event function in the right place to handle an event, when deploying an app takes ten times or more the acceptable time?
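The timing problem is simple arithmetic. The Python sketch below compares the cost of deploying a function on demand against a control-loop budget. The 2-second deployment time, 5 ms processing time, and 100 ms loop budget are assumed values chosen only to match the “seconds versus milliseconds” framing above; real figures vary widely with platform and placement, and the function names are hypothetical.

```python
# Hypothetical timing check: can an event function be deployed on demand
# and still answer within the control-loop budget? All values are assumed.

def fits_control_loop(deploy_ms, processing_ms, loop_budget_ms):
    """True if deployment plus processing fits inside the loop budget."""
    return deploy_ms + processing_ms <= loop_budget_ms


def overrun_factor(deploy_ms, processing_ms, loop_budget_ms):
    """How many times the loop budget the on-demand path actually takes."""
    return (deploy_ms + processing_ms) / loop_budget_ms


# Assumed: ~2 s to deploy, 5 ms to process, 100 ms sensor-to-controller budget.
print(fits_control_loop(2000, 5, 100), overrun_factor(2000, 5, 100))

# Same function already deployed (no deployment on the critical path).
print(fits_control_loop(0, 5, 100), overrun_factor(0, 5, 100))
```

Under those assumptions, on-demand deployment overruns the budget by a factor of about 20, while a function that is already in place fits easily. That gap is the reason “on demand” has to mean something much faster than today’s deployment model.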

Networks are run by events, even if traffic-handling is the product.  Clouds are increasingly aimed at event processing.  What makes “serverless” computing in the cloud revolutionary isn’t the pricing mechanism; it’s the fact that we can run something on demand, where it’s needed.  “On demand” doesn’t mean seconds after demand, either.  We need a lot better event-handling to make event-based applications and the hosting of network functions workable.

Then there’s the problem of orchestrating a service.  NFV today has all manner of problems with the task of onboarding VNFs.  We have identified at this point perhaps a hundred discrete types of VNF.  We have anywhere from one to as many as about a hundred implementations for a given type.  None of the implementations have the same control-and-management attributes.  None of the different types of VNF have any common attributes.  Every service is an exercise in software integration.
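One way to see why onboarding turns every service into an integration project is to count. The Python sketch below is an illustrative model, not a measured result: it assumes a service chain of three hypothetical VNF types with 10, 25, and 4 implementations, and compares the number of implementation combinations an integrator might face (if each combination has to be integrated on its own) with the number of adapters needed if each implementation were wrapped once to a common per-type model.

```python
from math import prod

# Hypothetical counts of available implementations for each VNF type
# in one service chain (e.g. a firewall, a router, a WAN optimizer).
impls_per_type = [10, 25, 4]


def bespoke_combinations(counts):
    """Combinations to integrate if every mix of implementations is its own project."""
    return prod(counts)


def adapters_needed(counts):
    """Adapters needed if each implementation is wrapped once to a common per-type model."""
    return sum(counts)


print(bespoke_combinations(impls_per_type))  # 10 * 25 * 4
print(adapters_needed(impls_per_type))       # 10 + 25 + 4
```

The product grows multiplicatively with every VNF type added to the chain, while the sum grows only additively, which is one way to frame the cost of having no common control-and-management attributes.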

But what about the evolving cloud?  Today we have applications that stitch components together via static workflows.  The structure is fixed, so we don’t have to worry excessively about replacing one component.  Yet we already have issues with version control in multi-component applications.  Evolve to the event-chain model, where an event is shot into the network to meet with an appropriate process set, and you can see how the chances of those processes actually being interoperable reduce to zero.  It’s the same problem NFV has.

Then we have lifecycle management.  Cooperative behavior of service and application elements is essential in both the cloud and NFV, and so we have to be able to remediate if something breaks or overloads.  We have broad principles like “policy management” or “intent modeling” that are touted as the solution, but at this point policy management and intent modeling are themselves nothing more than broad principles.  What specific things have to be present in an implementation to meet the requirements?

Part of our challenge in this area is back to those pesky events.  Delay a couple of seconds in processing an event, and the process of coordinating a response to a fault in a three-or-four-layer intent model starts climbing toward the length of an average TV commercial.  Nobody likes to wait through one of those, do they?  But I could show you how just that kind of delay would arise even in a policy- or intent-managed service or application.
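The arithmetic behind the TV-commercial comparison can be sketched with a simple model. In the Python below, which is a hypothetical model rather than any specific policy or intent framework, a fault detected at the bottom layer is processed by each layer on the way up until the owning layer decides on a response, and the response is then processed by each lower layer on the way back down. Each processing step is assumed to cost 2 seconds, in line with the “couple of seconds” above; the function name and parameters are illustrative.

```python
def remediation_delay_s(layers, per_event_delay_s, handled_at_layer=None):
    """
    Estimate the time to coordinate a fault response in a layered intent model.

    Hypothetical model: the fault is detected at layer 1 (the bottom). Each
    layer processes the event before escalating it, up to the layer that owns
    remediation (default: the top). The response then flows back down, and
    each layer below the handler processes it. Every processing step costs
    per_event_delay_s.
    """
    if handled_at_layer is None:
        handled_at_layer = layers
    if not 1 <= handled_at_layer <= layers:
        raise ValueError("handled_at_layer must be between 1 and layers")
    steps_up = handled_at_layer        # layers 1..handler process the fault
    steps_down = handled_at_layer - 1  # layers below the handler process the response
    return (steps_up + steps_down) * per_event_delay_s


for layers in (3, 4):
    print(layers, "layers:", remediation_delay_s(layers, per_event_delay_s=2.0), "s")
```

Under those assumptions a three-layer model takes 10 seconds and a four-layer model 14 seconds just to coordinate the response, before any redeployment time is added.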

There is progress being made in NFV.  We have an increased acceptance of the notion that some sort of modeling is mandatory, for example.  We have increased acceptance of the notion that a service model has to somehow guide events to the right processes based on state.  We even have acceptance of an intent-modeled, implementation-agile approach.  We still need to refine these notions to ensure that they’ll work at scale, handling the number of events that could come along.  We also need to recognize that events aren’t limited to NFV, and that we have cloud applications evolving that will be more demanding than NFV.
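One common way for a model to steer events to processes based on state is a state/event table. The Python sketch below shows the idea for a single model element; the states, events, handler names, and the vFirewall element are all hypothetical, and this is not presented as the ONAP or ETSI NFV mechanism.

```python
from typing import Callable, Dict, Tuple

# A handler takes the model element, may change its state, and reports what it did.
Handler = Callable[[dict], str]


def deploy(element: dict) -> str:
    element["state"] = "deploying"
    return "deploy requested"


def activate(element: dict) -> str:
    element["state"] = "active"
    return "element active"


def remediate(element: dict) -> str:
    element["state"] = "remediating"
    return "remediation started"


def ignore_unexpected(element: dict) -> str:
    # An event that makes no sense in the current state is not acted on.
    return "ignored"


# Hypothetical state/event table: the model data, not hard-coded logic,
# decides which process handles an event given the element's current state.
STATE_EVENT_TABLE: Dict[Tuple[str, str], Handler] = {
    ("ordered", "activate"): deploy,
    ("deploying", "deployed"): activate,
    ("deploying", "fault"): remediate,
    ("active", "fault"): remediate,
    ("remediating", "deployed"): activate,
}


def dispatch(element: dict, event: str) -> str:
    """Route an event to the process the table names for (state, event)."""
    handler = STATE_EVENT_TABLE.get((element["state"], event), ignore_unexpected)
    return handler(element)


element = {"name": "vFirewall", "state": "ordered"}
for event in ["activate", "deployed", "fault", "deployed", "activate"]:
    print(event, "->", dispatch(element, event), "| state:", element["state"])
```

Because the table is data, the same dispatcher works for any element, and an event that arrives in an unexpected state is caught rather than triggering the wrong process. Making this hold up at the event volumes mentioned above is the open question.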

My net point here is that NFV is, and always was, a cloud application.  The solutions to NFV problems are ultimately solutions to broader cloud problems.  That’s how we need to be thinking, or we risk having a lot of problems down the line.