I’ve been blogging over the last week about things related to the SDN World Congress event and what the activity there shows about the emerging SDN and NFV space. I’ve covered issues of deployment and management of complex virtualized systems, high-performance network data paths, and DPI as a service. For this, my last blog on the Congress topics, I want to look at the virtual network functions themselves.
Every substantial piece of network software today likely runs on some sort of platform, meaning there’s a combination of hardware for which the software is compiled into machine code, and platform software that exposes operating system and middleware services to the network software through APIs. If you have source code for something, you can hope to recompile it for a new environment, but even then there will be a need to resolve the APIs that are used. For example, if a firewall or an EPC SGW expects to use a specific chip for the network interface, the platform software is written for that chip and the firewall or EPC is written for the APIs that expose chip access. You can’t just plunk the software on a commercial x86 server under Linux and expect it to run, and if you don’t own the rights to the software or have a very permissive license, you can’t do anything at all.
Given that the ETSI NFV process expects to have virtual functions to run, it’s clear that they’ll have to come from some source. The most logical place for a quick fix of software would be the vast open-source pool out there. We have literally millions of software packages, and while not all of them are suitable for network-function translation, there’s no question that there are hundreds of thousands that could be composed into services if we can somehow make them into network functions. But…there’s a lot more to it. The ISG wants virtual network functions (VNFs) to be scalable, have fail-over capability, and be efficient enough to create capex savings. How does that happen?
When I was looking at candidates for a CloudNFV demo, I wanted something that addressed one of the specific ETSI use cases, that could provide the performance and availability enhancements that the ISG was mandating, and that could be deployed without making changes to the software itself just to make it NFV-compliant. The only thing I could find was Metaswitch’s Project Clearwater open-source IMS.
Metaswitch’s Project Clearwater was an effort to demonstrate that you could do a modern cloud-based version of an old 3GPP concept and get considerable value out of the process. The idea was to create not just an IMS that could run in the cloud but rather one that was optimized for the cloud. There are a number of key accommodations needed for that, and fortunately we get a look at them all in Project Clearwater.
First, the application has to self-manage its cloud behavior. Metaswitch’s Project Clearwater contains a provisioning server that can manage the spin-up of new instances and handle the way that the DNS is used to load balance among multiple instances of a given component. Because of this, the external management system doesn’t have to get involved in these functions, which means that there’s no need to expose special APIs to make that all happen.
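To make the self-management idea concrete, here is a minimal Python sketch of a provisioner that spins up instances and registers them under a single name, with lookups rotating across whatever is registered. This is not Clearwater’s code; `DnsRegistry`, `Provisioner`, the name `sip.example.net`, and the addresses are all hypothetical, and a dictionary stands in for a real DNS zone.

```python
from collections import defaultdict


class DnsRegistry:
    """Toy stand-in for a DNS zone: one name maps to many instance addresses."""

    def __init__(self):
        self._records = defaultdict(list)
        self._cursor = defaultdict(int)

    def register(self, name, address):
        if address not in self._records[name]:
            self._records[name].append(address)

    def deregister(self, name, address):
        if address in self._records[name]:
            self._records[name].remove(address)

    def resolve(self, name):
        """Return one address per lookup, rotating across registered instances."""
        addresses = self._records[name]
        if not addresses:
            raise LookupError(f"no instances registered for {name}")
        index = self._cursor[name] % len(addresses)
        self._cursor[name] += 1
        return addresses[index]


class Provisioner:
    """Hypothetical provisioning component owned by the application itself."""

    def __init__(self, registry, name):
        self.registry = registry
        self.name = name
        self._next_id = 1

    def spin_up(self):
        # A real system would start a VM or container here; this sketch only
        # invents an address and publishes it under the shared name.
        address = f"10.0.0.{self._next_id}"
        self._next_id += 1
        self.registry.register(self.name, address)
        return address

    def retire(self, address):
        self.registry.deregister(self.name, address)


registry = DnsRegistry()
provisioner = Provisioner(registry, "sip.example.net")
first = provisioner.spin_up()
second = provisioner.spin_up()
print([registry.resolve("sip.example.net") for _ in range(4)])
# → ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
```

The point of the sketch is that nothing outside the application has to know how instances are added or how traffic is spread across them; clients just resolve a name.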
Second, the application has to have stateless processes to replace the normally stateful behavior of IMS elements. Call processing is contextual, so if you’re going to share it across multiple modules you have to be sure that the modules don’t store call context internally, because switching modules will then lose the context and the call. Metaswitch’s Project Clearwater keeps call state in a distributed back-end store rather than in the modules themselves, which allows for switching modules without loss of call context. That works for both horizontal scaling and for fail-over.
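The stateless pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Clearwater’s actual design: `SharedStateStore`, `CallWorker`, and the three-step call flow are hypothetical, the event names are a simplification rather than real SIP handling, and an in-memory dictionary stands in for a distributed store.

```python
import copy


class SharedStateStore:
    """Stand-in for a distributed back-end store; a dict keeps the sketch runnable."""

    def __init__(self):
        self._calls = {}

    def load(self, call_id):
        default = {"state": "idle", "history": []}
        return copy.deepcopy(self._calls.get(call_id, default))

    def save(self, call_id, context):
        self._calls[call_id] = copy.deepcopy(context)


# Simplified call flow, assumed for illustration only.
TRANSITIONS = {
    ("idle", "INVITE"): "ringing",
    ("ringing", "ANSWER"): "connected",
    ("connected", "BYE"): "ended",
}


class CallWorker:
    """Holds no per-call data; all context is read from and written to the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, call_id, event):
        context = self.store.load(call_id)
        next_state = TRANSITIONS.get((context["state"], event))
        if next_state is None:
            raise ValueError(f"{event} is not valid in state {context['state']}")
        context["state"] = next_state
        context["history"].append((self.name, event))
        self.store.save(call_id, context)
        return next_state


store = SharedStateStore()
worker_a = CallWorker("worker-a", store)
worker_b = CallWorker("worker-b", store)

print(worker_a.handle("call-1", "INVITE"))  # → ringing
print(worker_b.handle("call-1", "ANSWER"))  # → connected (a different worker picks up the call)
print(worker_a.handle("call-1", "BYE"))     # → ended
```

Because either worker can handle any event for any call, the same arrangement covers both cases mentioned above: adding workers to scale out, and replacing a worker that has failed.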
Third, the application has to have the least possible customization needed to conform to VNF requirements. Even open-source software has licenses, and if we were to grab a network component and specialize it for use in NFV we’d have to fork the project to accommodate any special APIs needed. If those APIs were to involve connecting with proprietary interfaces inside an NFV implementation, we’d have a problem with many open-source licenses. Metaswitch’s Project Clearwater can be managed through simple generic interfaces and there’s no need to extend them to provide NFV feature support.
It was a primary goal for CloudNFV to be able to run anything that can be loaded onto a server, virtual machine, container, or even a board or chip. Meeting that goal is largely a matter of wrapping software in a kind of VNF shell that translates between the software’s management requirements and broader NFV and TMF requirements. I think we designed just that capability, but we also learned very quickly that software that was never intended to fail over or horizontally scale wasn’t going to do that even with our shell surrounding it. That’s why we started with Metaswitch Project Clearwater—it’s the only thing we could find that was almost purpose-built to run in an NFV world even though NFV wasn’t there at the time the project launched.
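As a rough illustration of the shell idea, the Python sketch below maps a small set of generic lifecycle operations onto whatever native commands a piece of software already understands. It is not the CloudNFV implementation; `VnfShell`, the operation names, and the `fw-ctl` commands are hypothetical, and the runner is a fake that only records what it was asked to do.

```python
class VnfShell:
    """Hypothetical wrapper translating generic operations into native commands."""

    GENERIC_OPERATIONS = ("deploy", "start", "stop", "status")

    def __init__(self, name, native_commands, runner):
        missing = [op for op in self.GENERIC_OPERATIONS if op not in native_commands]
        if missing:
            raise ValueError(f"{name} has no native mapping for: {', '.join(missing)}")
        self.name = name
        self._native = dict(native_commands)
        self._runner = runner

    def invoke(self, operation):
        if operation not in self._native:
            raise ValueError(f"unsupported operation: {operation}")
        # The shell only translates; it adds no behavior the software lacks.
        return self._runner(self._native[operation])


executed = []


def fake_runner(command):
    """Records the command instead of running it, so the sketch has no side effects."""
    executed.append(command)
    return "ok"


shell = VnfShell(
    "firewall",
    {
        "deploy": "fw-ctl install",   # hypothetical native commands
        "start": "fw-ctl up",
        "stop": "fw-ctl down",
        "status": "fw-ctl show",
    },
    fake_runner,
)
shell.invoke("deploy")
shell.invoke("start")
print(executed)  # → ['fw-ctl install', 'fw-ctl up']
```

Note what the wrapper cannot do: there is no "scale" or "fail over" operation to map, because the wrapped software has nothing to map it to. That is the limitation described above, and the reason the choice of VNF matters as much as the shell.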
We also learned a couple of things about VNF portability along the way. There is always a question of how stuff gets managed as a virtual function, how it asks for a copy to be spun up, or how it is deployed with the right setup and parameterization. It’s tempting to define management interfaces to address these things, but right now at least the NFV ISG has not standardized all the APIs required. Since the scope of the ISG is limited to the differences in deployment and management created by virtualization of functions, it may never define them all. Any specialized API creates a barrier to porting current software to NFV use, and any non-standard APIs mean VNFs themselves are not portable at all. That could be a big problem; creating software silos instead of appliance silos isn’t what most ISG founders had in mind, I’m sure.
The point here is that there are a lot of things we have to learn about VNFs and how they’re deployed. I learned a bunch of them in working with Metaswitch on our Project Clearwater demo, and I’m fairly confident that Metaswitch learned even more in its Project Clearwater trials and tests. I’d bet they’re looking to apply this to other virtual function software, and if they do then how they build this stuff will be a valuable reference for the industry, a critical element in meeting the goals of NFV.