Platform Issues in a Virtual-Networking World

There are always two components to any software-based solution to a problem—the software that implements the solution and the platform that software runs on.  It’s not uncommon for us to take the latter for granted, but in the world of virtualization that’s a big mistake.  For SDN and NFV in particular, platforms are really important stuff.

Servers are where virtualization starts, of course, and everyone knows that things like total memory and the number of cores/processors will impact system performance, but it’s more complicated than that.  The availability of a complex system drops as you add components unless those components are redundant and effectively load-balanced (meaning the load balancer isn’t itself a single point of failure), so server reliability is actually likely to matter even more for SDN/NFV.  Remember, the atomic appliances being replaced were engineered for high availability, so you may actually lose uptime if you pick a cheap server to host the equivalent function on.
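To see why, consider some simple availability arithmetic.  All the figures below are illustrative assumptions, not vendor data, but the shape of the math is what matters:

```python
# Illustrative availability arithmetic (all numbers are assumptions).
# Components in series are only as available as the product of their
# availabilities; redundancy helps only if the load balancer in front
# isn't itself a single point of failure.

def series(*availabilities):
    """Availability of components that must ALL be up."""
    a = 1.0
    for x in availabilities:
        a *= x
    return a

def parallel(a, n=2):
    """Availability of n redundant copies where any ONE suffices."""
    return 1.0 - (1.0 - a) ** n

appliance = 0.99999        # assumed "five nines" purpose-built appliance
cheap_server = 0.999       # assumed "three nines" commodity server
balancer = 0.9999          # the balancer sits in series with the pair

redundant_pair = series(balancer, parallel(cheap_server, 2))

print(f"appliance:      {appliance:.6f}")
print(f"single server:  {cheap_server:.6f}")
print(f"balanced pair:  {redundant_pair:.6f}")
```

Even the load-balanced pair of cheap servers ends up below the assumed appliance number here, because the balancer itself drags the chain down.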

Another server issue, one that transitions us into software, is input/output.  Most commercial servers are optimized for disk I/O, for the good reason that disk systems are typically critical to the performance of the applications they run.  SDN and NFV may use storage for images and parameters, but in general they will be pushing their I/O through network connections.  Network data path performance is fairly sad on a typical system; you need not only optimized adapters but also optimized software for the data path.
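A back-of-the-envelope packet budget shows the scale of the problem.  The link speed, frame overhead, and clock rate below are assumptions for illustration:

```python
# Packet budget for one 10 Gbps port (all numbers are assumptions).
# Minimum-size Ethernet frames: 64 bytes of frame plus 20 bytes of
# on-the-wire overhead (preamble + inter-frame gap) = 84 bytes.

LINK_BPS = 10e9
WIRE_BYTES = 64 + 20
pps = LINK_BPS / (WIRE_BYTES * 8)       # packets per second at line rate

CPU_HZ = 3.0e9                          # assumed 3 GHz core
cycles_per_packet = CPU_HZ / pps        # cycle budget per packet, one core

print(f"{pps / 1e6:.2f} Mpps line rate")
print(f"{cycles_per_packet:.0f} CPU cycles per packet")
```

Roughly 200 cycles per packet is a brutal budget when a general-purpose kernel network stack can easily burn an order of magnitude more than that per packet, which is exactly the gap data-path acceleration software aims to close.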

A recent article in SDNCentral about software data path acceleration firm 6WIND (http://www.sdncentral.com/education/6wind-nfv-interview-2013/2013/08/) has a pretty good explanation of why network I/O needs path optimization in SDN and NFV applications.  Virtual switching and routing, even outside SDN/NFV, will also benefit significantly from data path acceleration, to the point where operators tell me that some of their virtual switch/router applications won’t make the business case without it.

Hypervisors and Linux are two other big factors in efficient virtual-function or virtual-network applications.  Users report some pretty significant differences in hypervisor performance, and with both hypervisors and Linux there may be issues with support of the installed platforms, since many of the products are open-source.  Most operators say they expect to get their open-source tools from vendors who will offer support contracts, and many want the software and hardware pre-integrated to ensure that there’s no finger-pointing.

Data center networking is also very important in virtual-function/network applications.  I’ve been working pretty hard on the architecture of NFV applications, for example, and it’s clear that these applications not only have more horizontal (east-west) traffic paths than a typical commercial business application, they are much higher in speed.  Fewer than 20% of commercial data centers get any significant benefit from horizontally optimized switching, and even for the cloud the number isn’t much more than 25%, but my model says that virtual-function hosting would require better (fewer-hop) horizontal performance in over 75% of cases.
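A toy hop-count comparison makes the "fewer-hop" point concrete.  The two topologies sketched here, a classic three-tier tree and a flatter leaf-spine fabric, are idealized assumptions; real designs vary:

```python
# Toy hop-count model for east-west (server-to-server) traffic.
# Both topologies are idealized assumptions for illustration.

def hops_three_tier(same_access, same_aggregation):
    """Switch hops in a classic access/aggregation/core tree."""
    if same_access:
        return 1        # both servers on one access switch
    if same_aggregation:
        return 3        # access -> aggregation -> access
    return 5            # access -> agg -> core -> agg -> access

def hops_leaf_spine(same_leaf):
    """Switch hops in a two-tier leaf-spine fabric."""
    return 1 if same_leaf else 3   # leaf -> spine -> leaf for any pair

print(hops_three_tier(False, False))   # 5 hops across the tree
print(hops_leaf_spine(False))          # 3 hops anywhere in the fabric
```

When most traffic is east-west and fast, the difference between a worst case of five hops and a uniform three matters a lot more than it does in a north-south-dominated commercial data center.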

In SDN and NFV, I also think that metro connectivity is part of the “platform”.  Operators tell me that NFV would eventually involve deploying at least mini-data-centers everywhere they have real estate, meaning in every central office and many depot facilities as well.  In the US, for example, we could end up with between 14,000 and 35,000 of these data centers, and future services would be composed by linking virtual functions and application components across data center boundaries.  That means those data centers will have to be networked very efficiently, at a high level of performance and availability, to ensure that operations costs or QoS/availability problems don’t explode.
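Some hypothetical chain arithmetic shows why.  A service stitched from functions in different metro data centers accumulates both unavailability and latency with every function and every inter-data-center link; the figures below are assumptions, not measurements:

```python
# Toy service-chain budget (all numbers are illustrative assumptions).
# Every hosted function and every inter-data-center path is in series,
# so each one multiplies down availability and adds latency.

FUNC_AVAIL = 0.9999     # assumed availability of each hosted function
LINK_AVAIL = 0.99999    # assumed availability of each metro path
LINK_MS = 2.0           # assumed metro round-trip per hop, milliseconds

def chain(n_functions):
    """Availability and added latency of a chain of n functions."""
    links = n_functions - 1
    avail = FUNC_AVAIL ** n_functions * LINK_AVAIL ** links
    return avail, links * LINK_MS

for n in (2, 5, 10):
    a, ms = chain(n)
    print(f"{n} functions: availability {a:.5f}, added latency {ms:.0f} ms")
```

The point isn’t the specific numbers; it’s that the erosion compounds with chain length, which is why the inter-data-center network has to be engineered to a higher standard than the services riding on it.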

From a management perspective, you can see this is a bit of a brave new world too.  In a network of real devices, what you have today is typically an adaptive system that fixes itself to the greatest possible degree.  Yes, you have to “manage” it in the sense of sending a tech out to replace or repair something, but that’s a different management mission than sustaining a system of virtual functions.  Where you put the functions in the first place matters, and there’s a bunch of connective tissue (the network paths) that stitches a virtual device together, and each of these elements has to be managed.  That means we may have to rethink the notion of autonomous processes for service remediation, and we’ll darn sure have to think hard about how we even find out the status of a virtual-based service.
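One hypothetical sketch of that status problem: the “state” of a virtual device has to be derived from the hosted functions plus the paths stitching them together, rather than read from a single box.  All the names and states here are assumptions for illustration:

```python
# Hypothetical sketch: deriving the status of a virtual "device" from
# its constituent functions and connecting paths.  Element names and
# the three-state model are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Element:
    name: str
    state: str              # "up", "degraded", or "down"

def derived_status(functions, paths):
    """Worst-of aggregation: the virtual device is only as healthy
    as its least healthy function or connecting path."""
    rank = {"up": 0, "degraded": 1, "down": 2}
    worst = max(functions + paths, key=lambda e: rank[e.state])
    return worst.state

funcs = [Element("firewall-vf", "up"), Element("nat-vf", "degraded")]
links = [Element("path-a-b", "up")]
print(derived_status(funcs, links))   # degraded
```

Even this trivial worst-of rule shows the shift: management has to correlate across hosting and connectivity domains that today are monitored separately.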

The final question for all of this platform stuff is, as I suggested in my opening, the functionality that runs on it.  What exactly has to be done to write virtual components of services?  Is a service cloud close to an IaaS service, in that it offers little more than an operating system and basic OS-related middleware, or does it also include APIs to manipulate the environment, manage components, and so forth?  Is it like Azure for services, a PaaS?  If the answer is the latter, then we have no virtual functions out there today and we’ll have to write or rewrite to create them.  For that to work, we’d need some standards for the platform APIs, or no development would be portable among platforms and we’d have chaos in trying to put pieces of services together.  But if a virtual-function system is based on standard software, how do we add in the NFV-specific or SDN-specific elements needed to organize and manage services?

The cloud, SDN, and NFV are fusions of network and IT.  Most vendors don’t cover the full range of products needed to build this new model, and so not surprisingly most of them are dissecting it to pick out what they can sell.  The problem is that we’ve not done much of a job of putting the system together in the first place, so the dissection is breaking down the ecosystem into disconnected pieces.  Whatever is different about the network of the future versus that of the present, both of them will still rely on connected pieces.
