Stories about network transformation tend to focus on capex, despite the fact that network operators have consistently indicated that opex is likely more important. One reason for this is that opex is one of those giant fuzzball sort of areas where you can make almost any claim and get the numbers to work in your favor. Another is that the whole OSS/BSS space tends to be dull, in no small part because every time you try to talk about it, the example you get is “billing”.
Carol Wilson of Light Reading did a story on CloudNFV yesterday that brings home some of the realities of operations and the challenges of next-gen networking overall. The focus of the piece is how the concept of CloudNFV evolved as the project matured, and in particular how the project found it necessary to expand its scope to cover enough of the network problem set to be able to present some true benefits. My role in CloudNFV is known and I’m not going to reprise it here, but I do want to make some points about that critical question of “scope”.
Let’s say that I’m a builder of nice upscale homes, maybe four or five thousand square feet. These homes contain all manner of carpentry, electrical work, plumbing, flooring, painting…you get the picture. So now let’s say that somebody invents a new way of doing bathroom floors that claims to reduce floor cost by 25%. That kind of thing might induce me as a builder to run out and commit to the new approach.
The problem is that a bathroom floor isn’t the product here, a home is. I have to explore at the minimum two critical questions. First, does that new floor paradigm impact the cost of the surrounding/supporting elements? Suppose the floor costs 25% less but the cost of running plumbing through it doubles. Second, is this paradigm of flooring applicable to a larger part of the house? Atomically, I can’t make a decision on my floor; I need to think along a broader scope.
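To put rough numbers on the floor example, here is a minimal sketch. The dollar figures are hypothetical and chosen only for illustration; the 25% saving and the doubling of plumbing cost are the only values taken from the example above.

```python
# Hypothetical figures for illustration; only the 25% floor saving and the
# doubled plumbing cost come from the example in the text.
def net_change(floor_cost, plumbing_cost, floor_saving=0.25, plumbing_multiplier=2.0):
    """Return the change in combined cost; a negative value is a net saving."""
    before = floor_cost + plumbing_cost
    after = floor_cost * (1 - floor_saving) + plumbing_cost * plumbing_multiplier
    return after - before


# A $4,000 floor with $1,500 of plumbing running through it:
# the floor saves $1,000, but the plumbing adds $1,500.
print(net_change(4000, 1500))  # 500.0, a net increase

# Break-even is where the floor saving equals the added plumbing cost.
print(net_change(4000, 1000))  # 0.0
```

The point is the one the builder faces: the new floor only pays off if what it saves is larger than what it adds to everything around it.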
Remember the “first telephone problem?” It goes, “Phones will never be successful because nobody will ever buy the first one—there’d be nobody to call.” NFV and SDN are not going to sweep into networking overnight and displace legacy technology for the very good reason that this legacy stuff has nearly five years of residual depreciation to be accounted for. We’ll have pockets of new stuff embedded in the cotton ball of legacy networking for years to come. That means that the business cases for SDN and NFV will have to be met inside an operations framework that’s been established by legacy gear for an almost-epochal period of time.
The “First SDN” or “First NFV” has to make a business case, but it has to make it when there’s just a little island here and there. If opex savings are the goal, how do these islands pay back? The majority of the network won’t be “new” and the new stuff will, if anything, present higher costs because it’s different. So what does this mean? It means that if the justification for SDN or NFV is going to shift from capex to opex, as nearly all operators say it will, we need to be looking primarily at opex, and at opex for the whole network rather than for the islands alone. In our just-completed fall survey, all but one Tier One said opex improvements were the benefits that would drive both SDN and NFV forward. But even if we know how SDN or NFV can achieve these benefits inside their little initial enclaves, how do those benefits manifest in the network at large?
There are two pieces to NFV, conceptually. One is the issue of creating network features from virtual functions—what we could call “incremental NFV”. This is what the ETSI ISG is working on. The other issue is creating a management framework that can not only sustain current opex costs/practices as increments of NFV deploy here and there, but actually create a new paradigm for management overall—a paradigm that accommodates NFV islands and then rewards operators for deploying them.
It should be clear to everyone that if we were to define NFV management by simply creating virtual versions of every current device, creating virtual MIBs to correspond with real MIBs, and then linking up to the same management systems we had all along, there’s no change in operations practices and no change in opex. So why do we continually hear about that approach? It can never deliver meaningful opex savings.
The TMF may have the critical elements here. The GB922 specification, known as the “SID”, has proved (in CloudNFV) to be a highly useful framework for modeling customer services and service elements. The GB942 specification, sometimes called the “NGOSS Contract”, defines how a data model of a service that includes resource commitments can then become a conduit for channeling management events to the right resource lifecycle processes. The challenge is that neither of these two specs is used this way today. I think they have potential far beyond anything we’ve tried to exploit so far and I hope NFV (CloudNFV and every other implementation) can exploit that potential.
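The idea of a service model acting as a conduit for management events can be sketched in a few lines. This is an illustration of the principle only, not the GB942 data model or anything taken from CloudNFV; the class, its methods, and the resource and service names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A lifecycle process takes (service_id, element) and returns a description
# of the action it took. Hypothetical signature, for illustration only.
Handler = Callable[[str, str], str]


@dataclass
class ServiceContract:
    """Hypothetical service model that records resource commitments and
    steers events on those resources to lifecycle processes."""
    service_id: str
    # resource_id -> service element the resource was committed to
    commitments: Dict[str, str] = field(default_factory=dict)
    # (element, event_type) -> lifecycle process
    handlers: Dict[Tuple[str, str], Handler] = field(default_factory=dict)

    def commit(self, resource_id: str, element: str) -> None:
        self.commitments[resource_id] = element

    def on(self, element: str, event_type: str, handler: Handler) -> None:
        self.handlers[(element, event_type)] = handler

    def dispatch(self, resource_id: str, event_type: str) -> str:
        # The contract, not the device type, decides where the event goes.
        element = self.commitments.get(resource_id)
        if element is None:
            return f"ignored: {resource_id} is not committed to {self.service_id}"
        handler = self.handlers.get((element, event_type))
        if handler is None:
            return f"no process for {element}/{event_type}"
        return handler(self.service_id, element)


def redeploy(service_id: str, element: str) -> str:
    return f"redeploy {element} for {service_id}"


contract = ServiceContract("vpn-001")
contract.commit("vm-17", "firewall")      # could equally be a legacy appliance
contract.on("firewall", "fault", redeploy)
print(contract.dispatch("vm-17", "fault"))  # redeploy firewall for vpn-001
```

Notice that the handler never learns whether "vm-17" is a virtual function or a physical box. That is the sense in which a model like this could let the same operations process cover NFV islands and legacy gear alike.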
I think that the right answer to the operationalizing of our future network, including a network that’s rich in SDN and NFV capabilities, is going to be based on the principles of GB922/942. I also think that as we adopt these principles, both the NFV ISG and the TMF are going to have to make some accommodations to the principle of management unity. The most efficient operations practices are those that work for everything. Every exception is a cost center. Only effective mechanisms for abstraction can automate unified management over evolving infrastructure. To me, the most critical lesson that CloudNFV and other NFV implementations can teach us is how to model services so that we achieve efficient operations without creating resource-specific service definitions. We’re not addressing that now. We need to be.