What’s Missing in Operator SDN/NFV Visions?

The news that AT&T and Orange are cooperating to create an open SDN and NFV environment is only the latest in a series of operator activities aimed at moving next-gen networks forward.  These add up to a serious changing-of-the-guard in a lot of ways, and so they’re critically important to the network industry…if they can really do what they’re supposed to.  Let’s take a look at what the key issues are so we can measure the progress of these operator initiatives.

“Box networking” has created current infrastructure, and boxes are hardware elements with long useful lives and long development cycles.  To ensure they can build open networks from boxes, operators have relied on standards bodies to define features, protocols, and interfaces.  Think of box networks as Lego networks; if the pieces fit, they fit, and so you focus on the fitting and the function.

Today’s more software-centric networks are just that, software networks.  With software you’re not driving a five-to-seven-year depreciation stake in the ground.  Your software connections are a lot more responsive to change, and so software networks are a bit more like a choir, where you want everything to sound good and there are a lot of ways of getting there.

The biggest challenge we’ve faced with SDN and NFV is that they have to be developed as software architectures, using software projects and methods, not box-network mechanisms.  In both SDN and NFV we have applied traditional box-network processes to the development, which I’ve often characterized as a “bottom-up” approach.  The result was visible to me way back in September of 2013, when operators at an SDN/NFV event were digging into my proposals for open infrastructure and easy onboarding of VNFs—two of the things operators are still trying to achieve.  When you try to do the details before the architecture, things don’t fit right.

The problem with SDN and NFV isn’t competing standards or proprietary implementations as much as standards that don’t really address the issues.  The question is whether the current operator initiatives will make them fit better; there are a number of political, technical, and financial issues that have to be overcome.

The first problem is that operators have traditionally done infrastructure planning in a certain way, a way shaped by product and technology initiatives that are largely driven by vendors.  This might sound like operators are just punting their caveat emptor responsibilities, but the truth is that it’s not helpful in general for buyers to plan the consumption of stuff that’s not on the market.  Even top-down, for operators, has always had an element of bottom-ness to it.

You can see this in the most publicized operator architectures for SDN/NFV, where we see a model that still doesn’t really start with requirements as much as with middle-level concepts like layers of functionality.  We have to conform to current device capabilities for evolutionary reasons.  We have to conform to OSS/BSS capabilities for both political and technical reasons.  We have to blow kisses at standards activities that we’ve publicly supported for ages, even if they’re not doing everything we need.

The second problem is that we don’t really have a solid set of requirements to start with; what we have is more like a set of hopes and dreams.  There is a problem that we can define—revenue per bit is falling faster than cost per bit.  That’s not a requirement, nor is saying “We have to fix it!” one.  NFV, in particular, has been chasing a credible benefit driver from the very first.  Some operators tell me that’s better than SDN, which hasn’t bothered.  We know that we can improve the revenue/cost gap either by increasing revenue or by reducing cost.  Those aren’t requirements either.

Getting requirements is complicated by technical, financial, and political factors.  We need to know the specific things that next-gen technology will do in order to assign costs and benefits, but we can’t decide what technology should do without knowing what benefits are needed.  Operators know their current costs, for example, while vendors seem to know nothing about them.  Neither operators nor vendors seem to have a solid idea of the market opportunity size for new services.  In the operator organizations, the pieces of the solution spread out beyond the normal infrastructure planning areas, and vendors specialize enough that few have a total solution available.

Despite this, the operator architectures offer our best, and actually a decent, chance of getting things together in the right way.  The layered modeling of services is critical, and so is the notion of having orchestration happen in multiple places.  Abstracting resources so that existing and new implementations of service features are interchangeable is also critical.  There are only two areas where I think there’s still work to be done, and where I’m not sure operators are making the progress they’d like.  One is the onboarding of virtual network functions, and the other is the management of next-gen infrastructure and service elements.  There’s a relationship between the two that makes both more complicated.

All software runs on something, meaning that there’s a “platform” that normally consists of middleware and an operating system.  In order for a virtual function to run correctly it has to be run with the right platform combination.  If the software is expected to exercise any special facilities, including for example special interfaces to NFV or SDN software, then these facilities should be represented as middleware so that they can be exercised correctly.  A physical interface is used, in software, through a middleware element.  That’s especially critical for virtualization/cloud hosting where you can’t have applications grabbing real elements of the configuration.  Thus, we need to define “middleware” for VNFs to run with, and we have to make the VNFs use it.
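To make that concrete, here’s a rough sketch, purely my own illustration with hypothetical names (VnfPlatform, get_port, and so on, none of which come from any spec), of the kind of middleware a VNF would be written against rather than grabbing host resources directly:

```python
# Hypothetical sketch: a "middleware" interface a VNF would code against,
# instead of touching real host NICs or configuration directly.
from abc import ABC, abstractmethod


class VnfPlatform(ABC):
    """Abstract platform services exposed to a hosted virtual function."""

    @abstractmethod
    def get_port(self, logical_name: str):
        """Return a handle for a logical data port (e.g. 'wan', 'lan')."""

    @abstractmethod
    def get_parameter(self, key: str, default=None):
        """Return a configuration parameter injected by the NFV environment."""

    @abstractmethod
    def emit_event(self, event_type: str, detail: dict):
        """Report a lifecycle or management event to the NFV environment."""


class FirewallVnf:
    """A VNF written against the middleware, not against the host."""

    def __init__(self, platform: VnfPlatform):
        self.platform = platform
        self.wan = platform.get_port("wan")
        self.lan = platform.get_port("lan")
        self.rules = platform.get_parameter("acl_rules", default=[])

    def start(self):
        self.platform.emit_event("vnf_started", {"rules": len(self.rules)})
```

The point of the abstraction is that the VNF never sees a real NIC or a host file system; whatever the platform binds to “wan” or “acl_rules” is the platform’s business.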

The normal way to do this in software development would be to build a “class” representing the interface and import it.  That would mean that current network function applications would have to be rewritten to use the interface.  It appears (though the intent isn’t really made clear) that the NFV ISG proposes to address this rewriting need by adding an element to a VNF host image.  The presumption is that if the network function worked before, and if I can build a “stub” between that function and the NFV environment that presents the function with every interface it would have had on its original platform, my new VNF platform will serve.  This stub function has to handle whatever the native VNF hosting environment won’t handle.
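Here’s a rough sketch of that stub idea—my own illustration, not the ISG’s design.  The LegacyFirewall and VnfStub names, the config-file convention, and the port mapping are all hypothetical:

```python
# Hypothetical sketch of the "stub" idea: wrap an unmodified network function
# by reproducing the environment it expects, fed from the NFV side.
import json
import tempfile


class LegacyFirewall:
    """Stand-in for an unmodified function: it expects a config file path
    and fixed interface names, exactly as on its original appliance."""

    def __init__(self, config_path: str, wan_if: str, lan_if: str):
        with open(config_path) as f:
            self.config = json.load(f)
        self.wan_if, self.lan_if = wan_if, lan_if


class VnfStub:
    """Bridges the NFV environment to the legacy function's expectations."""

    def __init__(self, orchestrator_params: dict, port_map: dict):
        # Re-create the config file the legacy code insists on reading.
        self._cfg = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
        json.dump(orchestrator_params.get("firewall_config", {}), self._cfg)
        self._cfg.flush()
        # Map the logical NFV ports onto the interface names the function expects.
        self.vnf = LegacyFirewall(
            config_path=self._cfg.name,
            wan_if=port_map.get("wan", "eth0"),
            lan_if=port_map.get("lan", "eth1"),
        )


# Usage sketch: the NFV side supplies parameters and a port mapping.
stub = VnfStub({"firewall_config": {"default_policy": "deny"}},
               {"wan": "net0", "lan": "net1"})
```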

This is a complicated issue for several reasons.  The biggest is that different applications require different features from the operating system and middleware, some of which work differently as versions of the platform software evolve.  It’s possible that two different implementations of a given function (like “Firewall”) won’t work with the same OS/middleware versions.  This can be accommodated when the machine image is built, but with containers, unlike VMs, you don’t have complete control over the middleware.  Do we understand that some of our assumptions won’t work for containers?
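One way to picture the problem is as a compatibility check between what an implementation needs and what a VM image or container base actually offers.  This is a hypothetical sketch; the descriptor fields and version strings are invented for illustration:

```python
# Hypothetical sketch: recording each VNF implementation's platform needs so
# a deployment tool could check them against a VM image or container base.
from dataclasses import dataclass, field


@dataclass
class PlatformRequirements:
    os_name: str
    os_versions: set = field(default_factory=set)   # versions known to work
    middleware: dict = field(default_factory=dict)  # package -> required version


@dataclass
class HostingOffer:
    os_name: str
    os_version: str
    middleware: dict                                # package -> provided version


def compatible(req: PlatformRequirements, offer: HostingOffer) -> bool:
    """True if the offered VM/container platform satisfies the VNF's needs."""
    if req.os_name != offer.os_name or offer.os_version not in req.os_versions:
        return False
    return all(offer.middleware.get(pkg) == ver
               for pkg, ver in req.middleware.items())


# Two "Firewall" implementations that do NOT share a workable platform:
fw_a = PlatformRequirements("linux", {"5.10"}, {"dpdk": "21.11"})
fw_b = PlatformRequirements("linux", {"6.1"}, {"dpdk": "23.11"})
host = HostingOffer("linux", "5.10", {"dpdk": "21.11"})
print(compatible(fw_a, host), compatible(fw_b, host))   # True False
```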

Management is the other issue.  Do all “Firewall” implementations have the same port and trunk assignments, do they have the same management interfaces, and do you parameterize them the same way?  If the answer is “No!” (which it usually will be) then your stub function will either have to harmonize all these things to a common reference or you’ll have to change the management for every different “Firewall” or other VNF implementation.
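In practice, the management side of the stub ends up being a per-vendor translation to a common reference model, something like this sketch (the vendor data formats and the common keys here are invented for illustration, not taken from any product):

```python
# Hypothetical sketch: harmonizing two vendors' firewall management data
# to one common reference model, so higher layers see a single "Firewall".
COMMON_KEYS = {"admin_state", "rule_count", "dropped_packets"}


def harmonize_vendor_a(raw: dict) -> dict:
    # Vendor A exposes SNMP-style names and numeric codes.
    return {
        "admin_state": "up" if raw.get("ifAdminStatus") == 1 else "down",
        "rule_count": raw.get("aclEntries", 0),
        "dropped_packets": raw.get("pktsDenied", 0),
    }


def harmonize_vendor_b(raw: dict) -> dict:
    # Vendor B exposes a REST-style status document.
    return {
        "admin_state": raw.get("status", "unknown"),
        "rule_count": len(raw.get("rules", [])),
        "dropped_packets": raw.get("counters", {}).get("drops", 0),
    }


a = harmonize_vendor_a({"ifAdminStatus": 1, "aclEntries": 40, "pktsDenied": 12})
b = harmonize_vendor_b({"status": "up", "rules": ["r1", "r2"],
                        "counters": {"drops": 7}})
assert set(a) == set(b) == COMMON_KEYS   # both now look like the same "Firewall"
```

Either you write this translation once per implementation, or you rewrite your management processes for every different “Firewall” you admit—which is exactly the choice the paragraph above describes.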

I think that operators are expecting onboarding to be pretty simple.  You get a virtual function from a vendor and you can plug it in where functions of that type would fit, period.  All implementations of a given function type (like “Firewall”) are the same.  I don’t think that we’re anywhere near achieving that, and to get there we have to take the fundamental first step of defining exactly what we think we’re onboarding, what we’re onboarding to, and what level of interchangeability we expect to have among implementations of the same function.
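What that first step might produce is a function-type “contract” that any candidate implementation gets checked against at onboarding time.  This is only a sketch of the idea, with hypothetical contract fields and a made-up package descriptor:

```python
# Hypothetical sketch: a function-type "contract" that any onboarded
# implementation of that type would have to satisfy.
FIREWALL_CONTRACT = {
    "function_type": "Firewall",
    "required_ports": {"wan", "lan"},
    "required_parameters": {"acl_rules"},
    "required_management": {"admin_state", "rule_count", "dropped_packets"},
}


def check_onboarding(contract: dict, package_descriptor: dict) -> list:
    """Return a list of gaps; an empty list means the package could be
    onboarded as an interchangeable implementation of the function type."""
    gaps = []
    for key in ("required_ports", "required_parameters", "required_management"):
        provided = set(package_descriptor.get(key.replace("required_", ""), []))
        missing = contract[key] - provided
        if missing:
            gaps.append(f"{key}: missing {sorted(missing)}")
    return gaps


candidate = {"ports": ["wan", "lan"], "parameters": ["acl_rules"],
             "management": ["admin_state", "rule_count"]}
print(check_onboarding(FIREWALL_CONTRACT, candidate))
# Reports one gap: the candidate doesn't expose "dropped_packets".
```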

The situation is similar for infrastructure, though not as difficult to solve.  Logically, services are made up of features that can be implemented in a variety of ways.  Operators tell me that openness to them means that different implementations of the same feature would be interchangeable, meaning VPNs are VPNs and so forth.  They also say that they would expect to be able to use any server or “hosting platform” to host VNFs and run NFV and SDN software.

This problem is fairly easy to solve if you presume that “features” are the output of infrastructure and the stuff you compose services from.  The challenge lies on the management side (again) because the greater the difference in the technology used to implement a feature, the less natural correspondence there will be among the management needs of the implementations.  That creates a barrier both to the reflection of “feature” status to users and to the establishment of a common management strategy for the resources used by the implementation.  It’s that kind of variability that makes open assembly of services from features challenging.

Infrastructure has to effectively export a set of infrastructure features (which, to avoid confusion in terms, I’ve called “behaviors”) that must include management elements as well as functional elements.  Whether the management elements are harmonized within the infrastructure with a standard for the type of feature involved, or whether that harmonization happens externally, there has to be harmony somewhere or a common set of operations automation practices won’t be able to work on the result.  We see this risk in the cloud DevOps market, where “Infrastructure-as-Code” abstractions and event exchanges are evolving to solve the problem.  The same could be done here.
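In code terms, a “behavior” would pair a functional element with a management element that has already been harmonized to the feature type, so service composition can treat very different implementations the same way.  Here’s a sketch under that assumption; the Behavior structure and the two VPN examples are my own invention:

```python
# Hypothetical sketch: infrastructure exporting a "behavior" that carries both
# a functional element and a harmonized management element, regardless of how
# the feature is actually implemented underneath.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Behavior:
    name: str                               # e.g. "VPN"
    connect: Callable[[str, str], None]     # functional element
    status: Callable[[], dict]              # management element (harmonized)


def mpls_vpn_behavior() -> Behavior:
    def connect(a, b):
        print(f"provisioning MPLS path {a}<->{b}")
    def status():
        # Raw router MIB data would be translated to the common form here.
        return {"state": "up", "sites": 2}
    return Behavior("VPN", connect, status)


def sdwan_vpn_behavior() -> Behavior:
    def connect(a, b):
        print(f"building SD-WAN tunnel {a}<->{b}")
    def status():
        # Controller telemetry translated to the same common form here.
        return {"state": "up", "sites": 2}
    return Behavior("VPN", connect, status)


# Service composition sees only "VPN" behaviors; the implementations differ,
# but the management view they export is the same.
for impl in (mpls_vpn_behavior(), sdwan_vpn_behavior()):
    impl.connect("siteA", "siteB")
    print(impl.name, impl.status())
```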

Given all of this, will operator initiatives resolve the barriers to SDN/NFV deployment?  The barrier to that happy outcome remains the tenuous link between the specific features of an implementation and the benefits needed to drive deployment.  None of the operator announcements offer the detail we’d need to assess how they propose to reap the needed benefits, and so we’ll have to reserve judgment on the long-term impact until we’ve seen enough deployment to understand the benefit mechanisms more completely.