One of the (many) implicit contradictions in SDN, NFV, and even cloud deployment is the conflict between infrastructure capital cost and infrastructure TCO. The issue stems from a well-known truth: it's more expensive to make something work well than to just make it work. How much more? How good is "well"? Those are the issues.
One area where there seems to be some happy agreement is the notion that network infrastructure will be held to a higher standard of availability and operationalization than a simple corporate server platform. I blogged last year that NFVI, the NFV infrastructure framework, would have to deliver better networking performance from its servers and associated platform software than standard systems do. I also blogged more recently about the Wind River Carrier Grade Communications Server (CGCS), a platform intended to combine the right open-source ingredients into a suitable framework for SDN, NFV, and perhaps even the carrier cloud.
Wind River has now provided benchmark data on the specific topic of vSwitch performance. This is especially critical to NFV because most commercially valuable NFV deployments would have more horizontal integration of components than traditional IaaS cloud apps, and so would exercise vSwitches rather heavily. The data shows a 20x performance improvement with up to 33% fewer CPU cycles, which leaves more residual processing capacity available for hosting VNFs. Jitter is reduced, and performance scales predictably (and almost linearly) with the number of cores allocated.
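To make the capacity point concrete, here's a back-of-envelope sketch of what a cycle reduction like that means for hosting headroom. The server core count and the baseline vSwitch footprint are my own illustrative assumptions, not Wind River's benchmark figures; only the 33% reduction comes from the data above.

```python
# Back-of-envelope arithmetic: what a vSwitch cycle reduction means for VNF headroom.
# All figures except the 33% reduction are illustrative assumptions, not benchmark data.

total_cores = 16                  # cores in a hypothetical NFVI server
baseline_vswitch_cores = 6.0      # assumed vSwitch footprint at high packet rates
cycle_reduction = 0.33            # the "up to 33% fewer CPU cycles" figure

optimized_vswitch_cores = baseline_vswitch_cores * (1 - cycle_reduction)
freed_cores = baseline_vswitch_cores - optimized_vswitch_cores

print(f"vSwitch footprint: {baseline_vswitch_cores:.2f} -> {optimized_vswitch_cores:.2f} cores")
print(f"VNF headroom:      {total_cores - baseline_vswitch_cores:.2f} -> "
      f"{total_cores - optimized_vswitch_cores:.2f} cores (+{freed_cores:.2f})")
```

With those assumed numbers, the vSwitch's footprint drops from six cores to about four, and the two cores freed up go straight to revenue-generating VNF hosting.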
There’s no doubt that a traditional NFV hosting platform would benefit from this sort of thing, which is achieved without special NICs or hardware acceleration. The interesting question the story raises is whether this means that a CGCS with the right NICs and dedicated to switching via OVS might perform well enough to displace a legacy switch or router in a wider range of applications.
Operators are fairly interested in the idea of having branch or service-edge routing functions displaced by their virtual equivalents. They are becoming more interested in broader use of hosted switching/routing, though my latest survey shows they are more willing to accept virtual switch/router elements where the element is dedicated to a single customer. The question of performance is one reason why "more interested" hasn't become "highly interested"; in fact, it is the largest reason, but there are others.
Next on the issue hit parade is the intersection between availability and operationalization. It's fairly clear to operators that there are benefits to being able to spin up router/switch instances ad hoc, to create distributed functionality that improves availability, and to significantly improve MTTR by stamping out a fresh collection of virtual functions to replace a failed element faster than you could pull out a legacy box and install a new one. What is less clear is just how big these benefits are and what would be required operationally to make them happen.
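It's worth sketching why the MTTR lever is so powerful. Steady-state availability of a repairable element is MTBF / (MTBF + MTTR), so collapsing repair time from a hardware swap measured in hours to a virtual re-instantiation measured in minutes moves the needle dramatically. The figures below are hypothetical, chosen only to show the shape of the benefit:

```python
# Availability of a repairable element: A = MTBF / (MTBF + MTTR).
# All numbers are hypothetical, chosen only to show the shape of the MTTR benefit.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single repairable element."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

MTBF = 10_000.0                    # assumed mean time between failures, in hours

scenarios = {
    "legacy box swap": 4.0,        # truck roll plus physical replacement: hours
    "virtual re-spin": 5.0 / 60.0, # re-instantiate a virtual function: minutes
}

for label, mttr in scenarios.items():
    a = availability(MTBF, mttr)
    downtime_min = (1 - a) * 8760 * 60   # expected downtime per year, in minutes
    print(f"{label:16s} MTTR={mttr:6.3f}h  A={a:.6f}  downtime/yr={downtime_min:6.1f} min")
```

Under these assumptions, the same element goes from roughly 210 minutes of expected downtime per year to under five, without any change in how often it fails. That's the size of the prize operators are trying to pin down.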
A fixed platform like CGCS has an advantage in that its "code train" is organized, synchronized, and integrated for stability. I've run into many cases where network integration has been limited by issues in versioning the various software elements to support a common set of network features or services. That addresses some of the variables in the operationalization and availability calculation, but not all. The work of the NFV ISG is demonstrating that a virtual switch or router is just the tip of an operational infrastructure iceberg, and there are a lot of questions raised by what might be lurking below the surface.
All forms of redundancy-based availability management and all forms of virtual-replacement MTTR management rely on an effective process of replacing what's broken (virtually) and rerouting things as needed. In many cases this means providing for distributed virtual load-balancing and failover. We know this sort of thing is needed, and there are plenty of ways to accomplish it, but we're still in the infancy of proving out the effectiveness of these strategies and picking the best of breed in cost/benefit terms.
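As a rough illustration of what that replace-and-reroute process implies, here's a minimal watchdog sketch. All three helper functions here are invented placeholders, not any real orchestrator's API; in practice they would be calls into the orchestration and management stack, and the hard part, as noted above, is making them fast, reliable, and cost-effective at scale.

```python
# Minimal watchdog sketch for virtual-replacement MTTR management. The three helper
# functions are invented placeholders; in practice they would be calls into the
# orchestration/management stack (the part still being proven out).

import random
import time

def probe(instance: str) -> bool:
    """Placeholder liveness check; a real one might probe the dataplane or a heartbeat."""
    return random.random() > 0.05          # simulate an occasional failure

def spin_up_replacement(template: str) -> str:
    """Placeholder: ask the orchestrator to instantiate a fresh virtual function."""
    return f"{template}-{int(time.time())}"

def reroute(old: str, new: str) -> None:
    """Placeholder: repoint load balancers and tunnels from the failed instance."""
    print(f"rerouting traffic: {old} -> {new}")

def watch(instance: str, template: str, interval_s: float = 1.0, cycles: int = 30) -> None:
    """Detect failure, stamp out a replacement, reroute, then keep watching."""
    for _ in range(cycles):
        if not probe(instance):
            replacement = spin_up_replacement(template)
            reroute(instance, replacement)
            instance = replacement         # the replacement is now the watched element
        time.sleep(interval_s)

watch("vrouter-7", template="vrouter")
```

The loop itself is trivial; the cost/benefit question lives inside those placeholders, which is exactly where the industry hasn't yet picked its winners.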
This is where Wind River might take a step that elevates it from being a kind of shrink-wrapped middleware player to being a network infrastructure solutions player. There is no reason why other operationalizing features or tools couldn’t be added to CGCS. If I’m right in my assertion that a complete high-level management and orchestration system can be created using open standards and open source software, then there’s a path for Wind River to follow in that critical direction.
Imagine a complete orchestration, operations, and management solution integrated into a server platform for SDN, NFV, and the cloud. Rolling out virtual elements for any mission would become a lot easier, and the results a lot more predictable. Pre-integrated operational features, if they were vertically integrated with the rest of the infrastructure and horizontally integrated across domains (separated for administrative reasons, for performance, or because they represent different operator partners), could make all of NFV from top to bottom into a plug-and-play domain. That would be a significant revolution in NFV, perhaps enough to move the whole notion along much faster.
Five of every seven operators who admit to looking at NFV say they would like to field-trial something in 2014 and have production deployments covering at least a third of their target footprint by the end of 2015. If that scale of deployment were reached, it would make NFV the largest incremental builder of datacenter complexes in the market. However, almost all of the operators with these lofty hopes say they don't expect them to be realized, and issues of operational integration are at the top of their list of reasons why.
Wind River has proved that you can make a server into a network platform. They are very close to the point of being able to prove that servers can be a network too, and the only barrier to that is the operational integration. Get that right, and they have the secret sauce.
And remember, Intel owns Wind River. I noted in a recent blog that commoditization trends on the hardware side could drive hardware momentum down to the chip level, and even encourage chip players like Intel to become integrated server/platform providers. That would truly drive change in network operator deployment of all of our revolutionary technologies.