Out with the Real, In with the Virtual

The attendance at the NFV meeting in Santa Clara seems a pretty solid indication that NFV has arrived, at least as a topic of interest.  It’s not a surprise given the support that’s obvious among the big network operators.  They run the meetings and are driving the agenda, an agenda that’s also clear in its goal of shifting network features from costly specialized devices to open platforms.

A move like that has its challenges.  We don’t engineer general-purpose servers to be either stellar data movers or high reliability devices.  There is interest among both operators and vendors, right down to the chip level, in improving server performance and reliability, but the move can only go so far without threatening to invent special-purpose network devices.  Every dollar spent making a COTS server more powerful and available makes it less COTS, and every dollar in price increase reduces the capital cost benefit of migrating something there.

I think it’s pretty obvious that you’re not going to replace nodal devices with servers; data rates of a hundred gig or more per interface are simply not practical without special hardware.  We could perhaps see branch terminations in server-hosted virtual devices, though.  How this limitation would apply in using servers to host ancillary network services like NAT or firewall is harder to say because it’s not completely clear how you’d implement these functions.  While we might ordinarily view a flow involving firewall, NAT, and load-balancing as being the pipelining of three virtual functions, do we actually pipe through three or do we have one virtual device that hosts them all with the pipeline managed only at the software level?  The latter seems more likely to be a suitable and scalable design.
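To make the "one virtual device, pipeline in software" option a bit more concrete, here is a minimal sketch in Python.  Everything in it is an assumption for illustration: the packet is a plain dict, the addresses come from documentation ranges, and the names firewall, nat, load_balance, and make_virtual_device are hypothetical, not anything NFV defines.

```python
from typing import Callable, Optional

# Hypothetical packet model: a plain dict of header fields.
Packet = dict
Stage = Callable[[Packet], Optional[Packet]]

# Illustrative values only (documentation address ranges).
BLOCKED_PORTS = {23}
PUBLIC_ADDR = "203.0.113.7"
BACKENDS = ["192.0.2.10", "192.0.2.11"]


def firewall(pkt: Packet) -> Optional[Packet]:
    """Drop packets aimed at a blocked destination port."""
    return None if pkt["dst_port"] in BLOCKED_PORTS else pkt


def nat(pkt: Packet) -> Optional[Packet]:
    """Rewrite private 10.x source addresses to one public address."""
    if pkt["src"].startswith("10."):
        return {**pkt, "src": PUBLIC_ADDR}
    return pkt


def load_balance(pkt: Packet) -> Optional[Packet]:
    """Pick a backend from the source port so a flow stays on one host."""
    backend = BACKENDS[pkt["src_port"] % len(BACKENDS)]
    return {**pkt, "dst": backend}


def make_virtual_device(*stages: Stage) -> Stage:
    """Chain stages inside one process; a None result means 'dropped'."""
    def process(pkt: Packet) -> Optional[Packet]:
        for stage in stages:
            result = stage(pkt)
            if result is None:
                return None
            pkt = result
        return pkt
    return process


# One virtual device hosting all three functions.
edge = make_virtual_device(firewall, nat, load_balance)
```

The packet never leaves the host between functions here, which is the attraction of this design; piping through three separately hosted functions would put a network hop where each loop iteration sits.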

Availability issues also have to be looked at.  You can’t make a single COTS server 99.999% available, but you could make a complex of multiple parallel hosts that available.  The challenge is that the complex wouldn’t be available in the same way as our original five-nines box.  A packet stream might be load-balanced among multiple interfaces to spread across a server complex, but unless the servers are running in parallel the result will still be at least a lost packet or two if one unit fails and you have to switch to another.  That wouldn’t happen if you were five-nines and didn’t fail in the first place.  As I said, it is possible to build a virtual application that has the same practical failure-mode characteristics and availability, but again you’re forced to ask whether you need to do that.  Do even modern voice services have to meet traditional reliability standards given how much voice is now carried on a best-effort Internet or a mobile network that still creates “can you hear me?” issues every day at some point or another?  We’ll have to decide.
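The arithmetic behind the parallel-host point is easy to sketch.  The Python below assumes host failures are independent and that any one surviving host can carry the load, which is a best case; it says nothing about the packet or two lost during a switchover.  The function names are mine, chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(single: float, hosts: int) -> float:
    """Availability of `hosts` independent hosts when any one suffices.

    The complex is down only when every host is down at once.
    """
    return 1.0 - (1.0 - single) ** hosts


def hosts_needed(single: float, target: float) -> int:
    """Smallest number of independent hosts that meets `target`."""
    if not (0.0 < single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("availabilities must be strictly between 0 and 1")
    n = 1
    while parallel_availability(single, n) < target:
        n += 1
    return n


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime in a 365-day year."""
    return (1.0 - availability) * MINUTES_PER_YEAR
```

Under those assumptions, three hosts that are each 99% available clear the five-nines bar on paper, since all three are down together only about one millionth of the time.  Five nines itself works out to a little over five minutes of downtime a year.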

Security may or may not be an issue with hosted functions, including hosting the SDN control plane.  If we assume that virtual functions are orchestrated to create a service, there are additional service access points created at the boundaries and these could in theory be targets of attack.  However, you can likely protect internal interfaces among components pretty easily.  A more significant concern is what I’ve called the DoR or Denial of Resources attack, which is an attack aimed at loading up a virtual function host with work in one area to force a failure of another service being hosted there.  If you can partition resources absolutely, this isn’t a significant risk either.

One area that could be a risk is where a data-plane interface can force a control-plane action and a function execution.  The easiest example to visualize is that of the SDN switch-to-controller inquiry when a packet arrives that’s not in the forwarding table.  The switch has to hand it off to Mother, and if you could force that handoff at a high rate by sending a lot of packets in a short period that don’t have a forwarding entry, you might end up loading down the controller or the telemetry link.
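One common way to blunt that kind of flood is to rate-limit the handoffs at the switch.  The sketch below uses a token bucket, which is my choice of technique rather than anything the SDN specs mandate, and the class and function names are hypothetical.  Time is passed in explicitly so the behavior is easy to reason about.

```python
class TokenBucket:
    """Allow at most `rate` handoffs per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # time of the previous check, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a table-miss packet may go to the controller."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate_flood(packets: int, duration: float,
                   rate: float, burst: float) -> int:
    """Count handoffs when `packets` table misses arrive evenly over `duration`."""
    bucket = TokenBucket(rate, burst)
    forwarded = 0
    for i in range(packets):
        now = duration * i / packets
        if bucket.allow(now):
            forwarded += 1
    return forwarded
```

With a limit of 100 handoffs per second and a burst of 10, a flood of 1,000 unmatched packets in one second reaches the controller as roughly a tenth of that.  The cost is that legitimate new flows arriving during the flood get throttled too, so this trades one failure mode for a milder one.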

I don’t think that virtual function or SDN security is going to be worse on balance, but it will almost surely be different.  Same with availability and even performance.  There are things we can do in a hosted model that we can’t do in an iron-box model after all.  Even if, as seems likely for migration/transition reasons, NFV first defines a network of virtual devices that mirrors the network of real devices, it can evolve to one where all network functions would appear to be performed by a single virtual superdevice.

That has operational issues of course.  If your goal is to evolve from a real-box network, you’ll likely need your virtual boxes to mirror the real ones even at the management interface level.  But you can’t get deluded into starting to track failure alerts on virtual devices and dispatching real field techs to fix them!  A virtual device is “fixed” by instantiating it again somewhere else, and it might well be that is done automatically without reporting a fault at all.  It probably should be.  And remember that if we have one virtual device doing everything, we have only one management interface and less management complexity!

The point is that the virtual world is different in that it’s whatever you want it to be.  Any kid who ever daydreamed knows that.  We’ll learn it in the real world too.

