The state of technologies like SDN and NFV is important, but it seems we only get at it in little snippets or sound bites. A couple of recent ones, spoken at conferences, come to mind. First, AT&T commented that they wanted VNFs to be like “Legos” and not “snowflakes,” and then we had a comment from DT that you don’t want to “solve the biggest and most complex problems first.” Like most such statements, both have their positives and negatives, and something to learn as well.
The AT&T comment reflects frustration with the fact that NFV’s virtual functions all seem to be one-offs, requiring what amounts to customized integration to work. That’s true, of course, but it should hardly be unexpected: not only did the NFV specifications not try to create a Lego model, they almost explicitly required snowflakes. I bring this up not to rant about past problems, but to show that a course change of some consequence would be required to fix things.
What we need for VNFs (and should have had all along) is what I’ve called a “VNFPaaS,” meaning a set of APIs that represents the connection between VNFs and the NFV management and operations processes. Yes, this set of APIs (being new) wouldn’t likely be supported by current VNF providers, but it would provide a specific target for integration and would standardize the way VNFs are handled. Over time, I think vendors would be induced to self-integrate with the model.
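To make the idea concrete, here’s a minimal sketch (in Go, purely my own illustration and not drawn from the ETSI specifications) of what a VNFPaaS contract might look like. The names, hooks, and parameters are hypothetical; the point is that the management side drives a single, uniform interface rather than each VNF bringing its own manager.

```go
// Hypothetical sketch of a "VNFPaaS" contract: the NFV control software calls
// these hooks on every VNF, rather than each VNF embedding its own manager.
// Names and parameters are illustrative only; nothing here comes from the
// ETSI NFV specifications.
package vnfpaas

import "context"

// DeploySpec describes where and how a VNF instance should be stood up.
type DeploySpec struct {
	Image     string            // packaged VNF software image
	Resources map[string]string // e.g. "vcpu": "2", "memory": "4Gi"
	Params    map[string]string // service-specific configuration
}

// Health is a normalized status report every VNF returns the same way.
type Health struct {
	Ready  bool
	Detail string
}

// VNFLifecycle is the single integration target a VNF vendor would implement.
// The management and orchestration side drives it; the VNF never reaches back
// into NFV core resources on its own.
type VNFLifecycle interface {
	Deploy(ctx context.Context, spec DeploySpec) (instanceID string, err error)
	Configure(ctx context.Context, instanceID string, params map[string]string) error
	CheckHealth(ctx context.Context, instanceID string) (Health, error)
	Scale(ctx context.Context, instanceID string, replicas int) error
	Teardown(ctx context.Context, instanceID string) error
}
```

The specific methods matter far less than the fact that there’s only one contract. Integration work is then done once, against the contract, not once per VNF.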
What we have instead is the notion of a VNF Manager that’s integrated with each virtual function and provides its lifecycle services. This model is, IMHO, not only difficult to turn into a Lego, it’s positively risky from a security and stability perspective. If lifecycle management lives in the VNF itself, then the VNF has to be able to access NFV core elements and resources, which should never happen. The approach ties details of the NFV core implementation to the VNF implementation, which in my view is why everything ends up being a snowflake.
An open, agile architecture for NFV always had three components: VNFs, infrastructure, and the control software. The first of the three needed a very explicit definition of its relationship with the other two, and we didn’t get it. We need to fix that now.
Snowflakes are also why the notion of not solving the biggest and most complex problems first is a real issue. Yes, you don’t want to “boil the ocean” in a project, but you can’t ignore basic requirements just because they’re hard to address, not without putting the whole solution at risk. The architecture of a software system should reflect, as completely as possible, the overall mission. If you don’t know what that mission is because you’ve deferred the big, complex problems, then you can end up with snowflakes, or worse.
What exactly is an NFV low apple? Based on current market attitudes, you’d have to say it’s vCPE and the hosting of the associated functions on premises devices designed for that purpose. There are a lot of benefits to this test-the-waters approach to NFV, not the least of which is that the model avoids an enormous first-cost burden for the operator. The problem is that, as it’s being implemented, the model really isn’t NFV at all.
There is no resource pool when you’re siting the VNFs in a CPE device. The lifecycle management issues are minimal because you have no alternatives in terms of where to locate a function and no place to put it in the event of a failure. You can’t scale in or out without violating the whole point of the premises-hosted vCPE model, which would mean putting multiple devices in parallel by sending out a new one. Management issues are totally different because you have a real box that can become the management broker for the functions being hosted.
It’s also fair to say that the VNF snowflake problem is glossed over here, perhaps even caused here. Nearly all the vendors who offer the CPE boxes have their own programs to integrate VNF partners. That’s logical, because the VNFs are really just applications running in a box. But do these boxes have to provide a virtual infrastructure manager (VIM)? Is it compatible with a cloud-hosted VIM? Leaving aside the fact that we don’t really have a hard spec for the VIM at all, you can see that if vCPE hosting isn’t even a hard-and-fast VIM-based approach, there’s little hope that we could avoid the flakes of falling snow.
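To see why the compatibility question matters, here’s a purely illustrative Go sketch of how a cloud VIM and a vCPE box vendor might each expose deployment. Neither interface comes from any spec; that gap is exactly what turns each pairing into a custom integration project.

```go
// Purely illustrative: two "VIM-like" deployment interfaces that a cloud VIM
// and a vCPE box vendor might each expose. Neither shape comes from a spec,
// and that's the point: without a hard VIM definition, every integration
// between them has to be custom-built.
package vimgap

// What a cloud-hosted VIM might offer: pick a host from a pool, attach the
// instance to virtual networks, return a handle orchestration can act on later.
type CloudVIM interface {
	AllocateHost(vcpus, memoryMB int) (hostID string, err error)
	LaunchVNF(hostID, image string, networks []string) (instanceID string, err error)
	Migrate(instanceID, newHostID string) error
}

// What a vCPE box vendor might offer instead: load an application into the
// box's local service chain. No host selection, no migration, no pool.
type CPEBoxAPI interface {
	InstallApp(imageURL string, position int) error
	RemoveApp(position int) error
}
```

Map the second onto the first and you have one integration project; repeat that per vendor and per box and you have falling snow.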
The other early NFV application, mobile infrastructure (IMS and EPC), has the same problem in a way, coming from a different direction. Some of the operators testing virtualized IMS/EPC admit that the implementations really look like a static configuration of hosted functions, without the dynamism that NFV presupposes. If you think of a network of virtual routers, you can see that you could go two ways. Way One is that you have computers in place to host software router instances. Way Two is that you have a cloud resource pool in which the instances are dynamically allocated. There’s a lot more potential in Way Two, but is the early applications’ attempt to avoid difficulty and complexity going to favor Way One?
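As a rough illustration of the two ways (my own hypothetical sketch, not something from an operator trial), Way One amounts to a fixed table that maps each router instance to a dedicated server, while Way Two asks a resource pool for a placement at deploy time and again after a failure.

```go
// Illustrative contrast between the two deployment models for virtual routers.
// Names and structures are hypothetical.
package placement

import "errors"

// Way One: a static configuration. Each virtual router has one pre-assigned
// host; if that host fails, there is nowhere else for the instance to go.
var staticAssignment = map[string]string{
	"edge-router-1": "server-a",
	"edge-router-2": "server-b",
}

func wayOnePlace(routerID string) (string, error) {
	host, ok := staticAssignment[routerID]
	if !ok {
		return "", errors.New("no host configured for this router")
	}
	return host, nil
}

// Way Two: a resource pool. Placement is decided, and re-decided after a
// failure, from whatever capacity is currently available.
func wayTwoPlace(pool []string, failed map[string]bool) (string, error) {
	for _, host := range pool {
		if !failed[host] {
			return host, nil
		}
	}
	return "", errors.New("pool exhausted")
}
```

Way One is simpler to stand up, which is exactly why a difficulty-avoiding early application drifts toward it, but it forecloses the redeployment and scaling benefits that were supposed to justify NFV in the first place.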
For both the snowflake-avoiders and the difficulty/complexity-avoiders, we also have the specter of operations cost issues. It’s hard to imagine how you could do efficient software automation of snowflake-based NFV; after all, lifecycle tasks are embedded in the VNFMs and their host VNFs. Does this then mean that all of the operations integration would also have to be customized around the resident VNFMs? And surely operations automation is a major goal, and a major complexity. Can we continue to ignore it by assuming that dynamic virtual processes can be represented to the OSS/BSS as static virtual devices?
I think we’re on the verge of doing with NFV what we’ve done with a host of other technical “revolutions.” We start with a grandiose scope of goals and expectations. We are stymied by the difficulty of defining and then justifying the models. We accept a minimal step to test the waters, then we redefine “success” as the achievement of that step and forget our overall goals. If that happens, NFV can never touch enough services, customers, and infrastructure to have the impact those ten operators who launched it expected it to deliver. Recognition of the problem is the first step in solving it, as they say. It’s not the only step.