NFV has two “visible” components—the virtual network functions that provide service features and the management and orchestration logic that deploys and sustains them. It also has a critical, complicated, often-ignored third element, the network functions virtualization infrastructure (NFVI). NFVI is the pool of resources that VNFs and MANO can exploit, and its contents and its relationship with the rest of NFV could turn out to be very important. I’ve blogged a bit on elements of NFVI like data path acceleration, and now I want to look at the topic more broadly.
The first point on NFVI is that it’s not clear what’s in it. Obviously whatever hosts VNFs must be part of NFVI, and so must the resources to connect among the VNFs in a service. But even these two statements aren’t as definitive as they seem.
Operators tell me that they intend to host VNFs in the cloud (100%), on virtualized server pools not specifically part of the cloud (94%), and on bare metal (80%). All of these hosting resources are generally accepted as being part of NFVI. However, almost three-quarters of operators also say they expect to host VNFs on network devices—either on boards slipped into device chassis slots or directly on a device or chip. There’s a lot more ambiguity as to whether these additional hosting resources are valid NFVI, and that could have significant consequences.
One such consequence relates to the “connect among VNFs” mission. If everything in NFVI is a cloud element, then cloud network-as-a-service technology can provide connectivity among the VNFs. If we presume that some of the stuff we’re “stuffing” into NFVI isn’t cloud-driven, we have to ask at a minimum how we’d drive the connectivity. OpenStack Neutron connects among cloud elements, but what provides the connections where there’s no cloud? And “Neutron-or-no-Neutron” matters less than the question of legacy network elements, a question that hits you right between the eyes as soon as you say that NFVI includes anything that’s on or in a network device.
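Inside a cloud data center the answer is straightforward. Here’s a minimal sketch of the Neutron case, assuming the openstacksdk Python library and a clouds.yaml entry named “vnf-cloud”; the cloud name and resource names are illustrative, not drawn from any NFV specification:

```python
# Minimal sketch: asking Neutron (via openstacksdk) for a tenant network that
# co-hosted VNFs can attach to. All names here are invented for illustration.
import openstack

conn = openstack.connect(cloud="vnf-cloud")  # credentials come from clouds.yaml

# A network and subnet to carry traffic between VNFs in the same cloud
net = conn.network.create_network(name="vnf-service-net")
conn.network.create_subnet(
    network_id=net.id,
    name="vnf-service-subnet",
    ip_version=4,
    cidr="10.10.0.0/24",
)

# Each VNF instance would get a port on this network when it's deployed
port = conn.network.create_port(network_id=net.id, name="vnf1-port")
print(f"network {net.id} ready, VNF port {port.id} allocated")
```

The point of the sketch is what it leaves out: nothing in it can reach a hosting point that isn’t already under the cloud’s control, and that’s exactly the gap a board in a remote device opens up.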
Even if we were to make an edge router into a cloud element by running Linux on a board, installing KVM, and supporting OpenStack, we’d still have the problem of the gadget being somewhere out there, separated from the traditional virtual-network interconnection capabilities available inside a cloud data center. This configuration, which would be typical for edge-hosted services that extend business VPNs, demands that the connection between edge-hosted VNFs and any interior VNFs be made over “normal” infrastructure. It’s not a simple matter of making the cloud board a member of the customer VPN, because that would make it accessible and hackable, and it would also raise the problem of how the board is managed as a VPN edge device when the other service VNFs are presumably inside the carrier cloud.
Even this isn’t the end of it. It’s almost certain that even cloud-hosted VNFs will have to be integrated with legacy service elements to create a complete user offering. If VNF MANO is fully automated and efficient but the rest of the service operationalization processes are still stuck in the dark ages, how do we achieve any service agility or operating efficiencies? There’s a simple truth few have accepted for NFV: everything that has to be orchestrated has to be managed as a cooperative unit, to ensure that the cooperation set up at deployment is sustained through the life of the service.
All of this is daunting enough, but there’s more. One of the things I learned through two projects building service-layer structures (ExperiaSphere and CloudNFV) is that the “cooperative systems” being deployed are downright uncooperative sometimes. Linux has versions, Python has versions, OpenStack has versions, and we have guest OSs, host OSs, hypervisors, and accelerators. Every piece of platform software is a puzzle element that has to be assembled with the rest, and any time something changes you have to fit things together again. People involved in software design and deployment know that and apply structured principles (application lifecycle management, or ALM) to it, but systemic applications like NFV add complexity to the problem because their platforms (or at least their components) will surely evolve during service operation.
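To make that concrete, here’s a hypothetical sketch of the kind of bookkeeping involved: a pinned platform manifest and a check that a node’s reported stack matches it before the node is admitted to the resource pool. The component names and version strings are invented for illustration, not a recommended NFV baseline:

```python
# Hypothetical platform manifest: the "puzzle pieces" a hosting node is
# expected to present, pinned to specific versions. Values are illustrative.
EXPECTED_PLATFORM = {
    "host_os_kernel": "3.10",
    "hypervisor": "qemu-kvm 1.5",
    "openstack": "icehouse",
    "vswitch": "ovs 2.1",
}

def check_node(reported: dict) -> list:
    """Return the mismatches between a node's reported stack and the baseline."""
    problems = []
    for component, expected in EXPECTED_PLATFORM.items():
        actual = reported.get(component)
        if actual != expected:
            problems.append(f"{component}: expected {expected}, found {actual}")
    return problems

if __name__ == "__main__":
    node = {"host_os_kernel": "3.10", "hypervisor": "qemu-kvm 1.5",
            "openstack": "icehouse", "vswitch": "ovs 2.3"}
    for issue in check_node(node):
        print("out of sync:", issue)
```

Trivial as it looks, every “something changes” event means regenerating the manifest and re-fitting the pieces, which is the ALM discipline described above applied to infrastructure rather than to an application.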
I had a recent call with Wind River about this, and I think they have a good idea. Wind River is best known for embedded software technology, but they have a product they call the “Carrier-Grade Communications Server” (CGCS), a platform for NFV that’s designed to address three very important NFVI requirements. First, stability. Wind River proposes to advance the whole platform software stack in a coordinated way so that the underlying elements always stay in sync with each other. Second, optimization. Things like OpenStack and the data plane are hardened for availability and accelerated in CGCS, which can make a huge difference in both performance and uptime. Third, universality. Wind River supports any guest OS (even Windows), so they don’t impose constraints on the virtual-function machine images or on the management and orchestration components that will have to run on the platform.
You could integrate your own NFV platform, but I suspect that anyone who hasn’t already done this doesn’t know what they’re in for. NFVI is going to require that a distro strategy be implemented—each component is going to need a formal “release level”, whether it’s a software component or a physical device or switch. That release level has to mate functionally with the release levels of the things expected to use the element, and coordinated changes to release levels and to partner control functions will be critical if NFV is to run at acceptable levels of performance and availability.
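To illustrate what “mating functionally” might look like, here’s a hypothetical sketch of a release-level compatibility table and a check that a proposed change keeps every partner pairing inside it; the element names and level pairings are invented:

```python
# Hypothetical compatibility table: which partner release levels each
# (element, level) pair has been validated against. All values are invented.
COMPATIBLE = {
    ("vim", "R2"): {"vswitch": {"R3", "R4"}, "hypervisor": {"R5"}},
    ("vim", "R3"): {"vswitch": {"R4"}, "hypervisor": {"R5", "R6"}},
}

def can_advance(element: str, new_level: str, deployed: dict) -> bool:
    """True if moving `element` to `new_level` keeps every partner in a validated pairing."""
    required = COMPATIBLE.get((element, new_level))
    if required is None:
        return False  # no validated pairings recorded for this level
    return all(deployed.get(partner) in levels
               for partner, levels in required.items())

if __name__ == "__main__":
    current = {"vswitch": "R3", "hypervisor": "R5"}
    # False: vswitch R3 hasn't been validated against vim R3, so the vswitch
    # would have to be advanced in the same coordinated change.
    print(can_advance("vim", "R3", current))
```

Advancing the whole platform stack in lockstep, as Wind River proposes, is one way to keep a table like this from exploding: there’s only ever one validated row to maintain.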
There’s still a major open question on how NFVI relates to the rest of NFV, and one of my questions is whether Wind River might build a layer on top of the integrated platform it now calls CGCS to add at least some form of what the ETSI activity calls the “Virtualized Infrastructure Manager” function. I also wonder whether they might provide tools to harmonize the management of each orchestrated, cooperating set of components in a service. The further they go with CGCS in a functional sense, the more compelling their strategy will be to operators…and even to enterprises committed to cloud computing as a future IT framework.