What Can We Learn from the Light Reading NFV Tests?

Light Reading has published the first of what they promise will be a series of NFV tests, run in combination with EANTC, and the results are summarized HERE.  I think there are some useful insights in the results, but I also think that there are fundamental limitations in the approach.  I’m not going to berate Light Reading for the latter, nor simply parrot the former.  Instead I’ll pick some main points for analysis.  First, I have to summarize what I perceive was done.

This first set of tests was targeted at interoperability at the deployment level, meaning VNFs deploying on NFVI.  Most of the functional focus, meaning software, would thus fall into the area of the Virtual Infrastructure Manager (VIM), which in all the tests was based on OpenStack.  I can’t find any indication of testing of MANO, nor of any program to test operations/management integration at a higher level.

This is the basis for my “fundamental limitation” point.  NFV is a technology, which means that you have to be able to make it functional, and the Light Reading test does make some good points on the functional side.  It’s also a business solution, though; something that has to address a problem or opportunity by delivering benefits and meeting ROI goals.  We cannot know from the test whether NFV could do that, and I contend that implementations that can’t make a business case are irrelevant no matter how well they perform against purely technical standards.

Now let’s get to the points made by the tests themselves.  As I said, I think some of the results were not only useful but highly evocative, though I don’t think what I saw as important matched Light Reading’s priorities.

The first point is that OpenStack is not a plug-and-play approach to deployment.  This is no surprise to me because we had issues of this sort in the CloudNFV project.  The problem is that a server platform plus OpenStack is a sea of middleware and options, any of which can derail deployment and operation.  The report quotes an EANTC source:  “There were tons of interop issues despite the fact that all NFVIs were based on OpenStack.”

The lesson here isn’t that OpenStack isn’t “ready to play an interop role” (to quote the report) but that it’s not sufficient to guarantee interoperability.  That’s true of a lot of (most of, in my view) network-related middleware.  There are dependencies that have to be resolved, options that have to be picked, and none of this is something that operators or VNF vendors really want to worry about.

What we have here isn’t an OpenStack failure but an abstraction failure.  The VIM should represent the NFV Infrastructure (NFVI) no matter what is below, presenting a common set of features to support deployment and management.  Clearly that didn’t happen, and it’s a flaw not in OpenStack but in the ISG specifications.  All infrastructure should look the same “above” the VIM, converted by the VIM into an intent model that can represent anything on which VNFs can deploy.   The specifications are not sufficient for that to happen, and the notion of a fully abstract intent model is absent in any event.
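
To make the abstraction point concrete, here’s a minimal sketch of what a fully abstract VIM interface might look like.  It’s in Python, and every name in it is my own invention for illustration; nothing here comes from the ISG specifications or from the test itself.  The point is that everything above this interface sees only intent-level operations, never Nova, Heat, or any other implementation detail.

```python
from abc import ABC, abstractmethod

class VimIntentModel(ABC):
    """Hypothetical intent-model facade for a VIM.

    Everything above this abstraction (MANO, VNFMs, operations
    systems) sees only these operations; whether the NFVI
    underneath is OpenStack, VMware, or bare metal is invisible
    by design.
    """

    @abstractmethod
    def deploy(self, vnf_descriptor: dict) -> str:
        """Instantiate a VNF from a descriptor; return an opaque handle."""

    @abstractmethod
    def apply_intent(self, handle: str, intent: dict) -> None:
        """Express a goal (scale, heal, connect); the VIM decides
        how to realize it with Nova, Heat, or anything else."""

    @abstractmethod
    def status(self, handle: str) -> dict:
        """Report lifecycle state in implementation-neutral terms."""

# An OpenStack realization would subclass this and keep every
# Nova and Heat call private; a VNF or VNFM would never see them.
```

If something like this were normative, the Nova-versus-Heat differences EANTC encountered would be an implementation concern for the VIM vendor, not an interoperability issue for every VNF.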

You can see another indication of this abstraction failure in the discussion of interoperability issues.  There are discussions of OpenStack Nova, Heat scripts, and so forth.  None of this should matter; a VNF should never “see” or be impacted by specifics of the implementation.  That’s what VIMs are supposed to cover, creating a universal platform for hosting.  It is unacceptable that this did not happen, period.

The next point is that NFV management is broken.  I’ve noted all along that the decision to create a hazy management framework that includes external or VNF-integrated VNF Managers (VNFMs) has negative ramifications.  The tests show a ramification I didn’t predict: a dependence on a tie between VIM management and OpenStack that was never fully described and isn’t really present.  The VIM abstraction must represent NFVI management in a consistent way regardless of how the NFVI is implemented and how the VIM uses it.  The tests show that too much of OpenStack is exposed in the deployment and management processes, which makes all of the service lifecycle stages “brittle,” meaning subject to failure if anything changes underneath.

The model that’s been adopted for VNFM almost guarantees that lifecycle management will have to be integrated tightly with the implementation of the VIM, and perhaps (reading between the lines of the report) even with the details of the actual OpenStack deployment.  That means it will be very difficult to port VNFs across implementations of NFVI.  The VNFMs would likely not port because they can’t exercise an intent-model-level set of management facilities.
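
Here’s what an “intent-model-level set of management facilities” could mean in practice, again as a sketch using hypothetical names of my own (continuing the VimIntentModel facade above), not anything from the ISG documents.  A VNFM written this way would port to any NFVI whose VIM implemented the facade.

```python
class PortableVnfm:
    """Hypothetical VNFM that manages its VNF only through the
    abstract VimIntentModel sketched earlier, never through raw
    OpenStack APIs or Heat stacks."""

    def __init__(self, vim, handle: str):
        self._vim = vim        # any VimIntentModel implementation
        self._handle = handle  # opaque handle returned by deploy()

    def scale_out(self, instances: int = 1) -> None:
        # State the goal; how it happens is the VIM's business.
        self._vim.apply_intent(self._handle,
                               {"action": "scale", "delta": instances})

    def heal(self) -> None:
        self._vim.apply_intent(self._handle, {"action": "heal"})

    def health(self) -> dict:
        # Implementation-neutral status, not raw Nova/Heat state.
        return self._vim.status(self._handle)
```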

I also have concerns that this level of coupling between VNFM and infrastructure will create major security and compliance problems.  If VNFMs have to run Heat scripts, then how do we insulate an OpenStack instance from incorrect or insecure practices?  Can we prevent one VNFM (which represents one vendor’s notion of a service for a single user) from diddling with stuff that impacts a broader range of services and users?
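
If management did flow through an intent facade instead of raw Heat scripts, scoping would at least have a natural enforcement point.  A minimal sketch, hypothetical and continuing the names from the earlier sketches, of a guard that confines a VNFM to its own resources:

```python
class ScopedVim:
    """Hypothetical guard wrapper: confines a VNFM to the handles
    belonging to its own service instance, so it cannot touch
    resources shared with other services or users."""

    def __init__(self, vim, owned_handles: set):
        self._vim = vim              # the real VimIntentModel
        self._owned = set(owned_handles)

    def apply_intent(self, handle: str, intent: dict) -> None:
        if handle not in self._owned:
            raise PermissionError(f"VNFM has no rights to {handle}")
        self._vim.apply_intent(handle, intent)

    def status(self, handle: str) -> dict:
        if handle not in self._owned:
            raise PermissionError(f"VNFM has no rights to {handle}")
        return self._vim.status(handle)
```

With direct Heat access there is no such choke point; the VNFM’s authority is whatever OpenStack credentials it was handed.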

The third issue raised in the tests was that NFV spends too much time describing logical inter-VNF relationships and not enough time describing how the VNFs themselves expect to be deployed on the network.  This is a problem that came up very early in the ISG’s deliberations.  Every software-implemented network feature expects certain network connections; they’re written into the software itself.  What the Light Reading test showed is that nobody really thought about the network framework in which VNFs run, and that made it very difficult to properly connect the elements of VNFs or link them to the rest of the service.

The most common example of a VNF deployment framework would be an IP subnet, which would be a Layer 2 network (Ethernet) served by a DHCP server for address assignment, a DNS server for name resolution, and a default gateway to reach the outside world.  The virtual function components could be connected within this by tunneling between ports or simply by parameterizing them to know where to send their own traffic.  To know that traffic is supposed to follow a chain A-B-C without knowing how any of these elements are actually connected does no good, and the testing showed that.
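
To illustrate what a descriptor for that kind of deployment framework might contain, here’s an illustrative Python structure.  Every field name is hypothetical, my own construction rather than anything from an NFV specification.  The point is that the chain A-B-C is only meaningful once the subnet context, addressing, and traffic-steering mechanism are pinned down alongside it.

```python
import ipaddress

# Illustrative deployment framework for a chained VNF service;
# all field names here are hypothetical, not from the ISG specs.
deployment_framework = {
    "subnet": {
        "cidr": ipaddress.ip_network("10.0.5.0/24"),
        "dhcp_server": "10.0.5.2",      # assigns VNF addresses
        "dns_server": "10.0.5.3",       # name resolution inside the subnet
        "default_gateway": "10.0.5.1",  # reach the outside world
    },
    # The logical chain the specs do describe ...
    "service_chain": ["vnf-a", "vnf-b", "vnf-c"],
    # ... and the connection detail they leave out: how traffic
    # actually gets from one VNF to the next.
    "steering": {
        ("vnf-a", "vnf-b"): {"method": "vxlan-tunnel", "vni": 1001},
        ("vnf-b", "vnf-c"): {"method": "next-hop-param",
                             "param": "EGRESS_PEER"},
    },
}
```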

But this is only the tip of the iceberg.  As I’ve said in prior blogs, you need a specific virtual networking address and membership model for NFV, just as you need one for the cloud.  Amazon and Google have their own, and Google has described its approach in detail (Andromeda).  Without such a model you can’t define how management elements address each other, for example, and how NFV components are separated from service components.
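
As one trivial example of what such a model has to nail down, consider keeping management addressing separate from service addressing.  A minimal sketch, my own construction and not Andromeda’s or anyone else’s actual scheme:

```python
import ipaddress

# Hypothetical, disjoint address spaces: one for NFV's own
# management elements, one for the service data plane.
MANAGEMENT_SPACE = ipaddress.ip_network("192.168.100.0/24")
SERVICE_SPACE = ipaddress.ip_network("10.0.0.0/16")

def may_connect(src: str, dst: str) -> bool:
    """Membership rule: management talks to management, service to
    service; the two planes never address each other directly."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    same_mgmt = s in MANAGEMENT_SPACE and d in MANAGEMENT_SPACE
    same_svc = s in SERVICE_SPACE and d in SERVICE_SPACE
    return same_mgmt or same_svc

assert may_connect("192.168.100.5", "192.168.100.9")  # mgmt-to-mgmt
assert not may_connect("192.168.100.5", "10.0.3.7")   # crossing planes
```

Without even this much agreed on, every implementation invents its own answer, and no two of them interoperate.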

All of this results from the bottom-up approach taken for NFV specifications.  Nobody would design software like that these days, and while nobody disputes that NFV implementation is going to come through software, we’ve somehow suspended software design principles.  What we are seeing is the inevitable “where does that happen?” problem that always arises when you build things from primitive low-level elements without a guiding set of principles that converge you on your own benefit cases.

So where does this leave us?  First, I think the Light Reading test found more useful things than I’d thought it might.  That value was diluted a bit by the fact that the most useful findings weren’t recognized or interpreted properly, but tests aren’t supposed to solve problems, only to identify them.  Second, I think the test shows not that NFV is fairly interoperable (they say 64% of test combinations passed) but that we have not defined a real, deployable model for NFV at all.  Truth be told, the tests show that NFV silos are inevitable at this point, because operators could never wait for the issues above to be resolved through a consensus process of any sort.

But this isn’t bad (surprisingly).  The fact is that the operators are largely reconciled to service-specific, trial-specific NFV silos, likely to be integrated by operations processes down the road.  The points of the test are helpful in identifying what those unifying operations processes will have to contend with.  However, I think that PoCs and trials are the real forums for validating the functionality of anything, particularly NFV, and that these vehicles will have to show results for operators no matter what third-party processes like Light Reading’s tests might show.