Why SDN and NFV Shouldn’t Force Us to Abandon OSI Layers

In the idealistic vision of the future network (a vision I still hope can be realized), NFV forms an operational- and feature-enhancing umbrella over SDN to create agile services that improve efficiency and add greater value than the basic connection services of today.  This vision would require some significant expansions in scope for NFV; primarily, NFV would have to be given the ability to orchestrate and manage legacy elements of infrastructure.  That's not a pipe dream, because some vendors already offer this broader scope.

There will always be real network elements, and likely there will always be legacy L2/L3 services as well.  Most of the “SDN” and all of the “NFV” to date are hybrids with legacy technology.  We’ve started to hear about how legacy technology extends NFV hosting of features by connecting the feature subnetworks with the users of the service.  It also creates the underlayment for many of the virtual connectivity features, and this raises the interesting question of whether NFV should be considered to be multi-layered and vertically integrated.  NFV, via SDN, makes vSwitch connections.  Could it also control the real switches underneath?

One reason this is important is that fault management in even today’s services has to contend with the problem of “common cause”.  If a fiber trunk is attacked by the classic “cable-seeking backhoe”, then there’s a physical layer outage, a Layer 2 outage, and a Layer 3 outage even if we limit ourselves to classic OSI.  The break could generate a flood of failures at the service level.  If NFV is responsible for remediation of service faults, does NFV have to “know” that fiber trunk outages should be addressed at the fiber level through rerouting and not by re-framing every service over the trunk individually to use a different path?  In a network operations center today, we’d see fault correlation activity to try to prevent this fight-the-symptoms-not-the-disease syndrome.  How would that work in NFV?
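To make the "common cause" problem concrete, here is a minimal sketch of the kind of correlation a NOC performs, assuming a simple map from services to the trunk they ride on.  The service names, trunk names, the `correlate` function, and the threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical topology: which fiber trunk each service rides on.
SERVICE_TO_TRUNK = {
    "vpn-a": "trunk-1",
    "vpn-b": "trunk-1",
    "voice-c": "trunk-1",
    "vpn-d": "trunk-2",
}

def correlate(failed_services, service_to_trunk, threshold=2):
    """Group service-level alarms by the trunk the services share.

    Returns (root_causes, isolated):
      root_causes: trunks that explain `threshold` or more alarms
      isolated:    services whose alarms no such trunk explains
    """
    by_trunk = defaultdict(list)
    for svc in failed_services:
        by_trunk[service_to_trunk[svc]].append(svc)

    root_causes = {t: s for t, s in by_trunk.items() if len(s) >= threshold}
    isolated = [svc for t, s in by_trunk.items() if len(s) < threshold for svc in s]
    return root_causes, isolated

causes, isolated = correlate(
    ["vpn-a", "vpn-b", "voice-c", "vpn-d"], SERVICE_TO_TRUNK
)
print(causes)    # one trunk-level fault to fix, not three service faults
print(isolated)  # alarms that still need service-level handling
```

The point of the sketch is only that three service alarms collapse into one trunk-level action; a real correlation engine would work from a live topology model rather than a static dictionary.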

Even at a more mundane level, we have to wonder whether a universally capable and operationally optimizing NFV implementation wouldn’t be used to provision underlying facilities so their operation would be optimized too.  In our hypothetical data center with vSwitches, why wouldn’t we use NFV to provision the physical devices?  Don’t say there aren’t any, either; even SDN would demand white-box devices, and SDN depends on having control paths to those switches.

Then there’s multi-tenancy.  Suppose we decide to set up a kind of super-IMS using NFV, as many are already proposing and as Alcatel-Lucent is already being contracted to do.  IMS isn’t single-service-per-instance.  You set up an IMS for an operator, not for every call, but how does NFV’s deployment of a multi-tenant resource provide for integrating that resource with other services and applications?  If super-IMS exposed APIs for other services to use, how would those APIs be made available to the other services?

One concept that I think is absolutely critical in addressing these kinds of issues is that all network services have to be framed as network-as-a-service (NaaS) abstractions.  A service at any level is a black box, known by its properties, not by its contents (which are invisible).  What follows from that is that the NaaS abstraction has management properties and state which are derived from those invisible contents.  The user of the service “knows” whether the black box has failed, but not how the failure happened.  I think this vision is at least somewhat accepted in both SDN and NFV, but not completely, because we don’t address the notion of layered services even though all services today are layered.

In layered services, a given level (the “retail service” for example) is composited from lower layers.  The user layer is a NaaS, but so are the lower layers.  The retail service would not then have visibility down to the bottom of the stack of infrastructure actually used, but only to the level of the black box combination below.  It would be responsible for remedying faults reported by the black boxes it uses, and for presenting a fault to its own (retail) user if that isn’t possible.

In fault management terms, this could have profound implications.  If a “retail NaaS” sits on a couple of “component NaaS” services that in turn exercise “transport NaaS” services, then a lower-layer fiber fault causes not a retail fault but a transport NaaS fault.  The lower layer would be given the opportunity to correct the problem, in which case you’d have a report of an interruption but not a failure at the retail level.  If the lower layer (transport) can’t fix things, then the problem would escalate to the “component NaaS” level for remediation, reaching the retail level only if nothing can be done below.
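The escalation path described above can be sketched in a few lines.  This is an illustration under stated assumptions, not a proposed implementation: the `NaaS` class, its `fixable` set, and the fault names are all hypothetical stand-ins for whatever remediation logic a real layer would have.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class NaaS:
    """A black-box service layer that knows only its own remedies
    and the layer that consumes it (its parent)."""
    name: str
    fixable: Set[str]                 # faults this layer can remedy itself
    parent: Optional["NaaS"] = None   # the layer built on top of this one

    def report(self, fault: str) -> Optional[str]:
        """Walk upward from this layer until some layer absorbs the fault.

        Returns the name of the layer that remedied it, or None if no
        layer could, meaning the retail user sees a failure.
        """
        layer: Optional[NaaS] = self
        while layer is not None:
            if fault in layer.fixable:
                return layer.name
            layer = layer.parent
        return None

# Retail sits on component, which sits on transport.
retail = NaaS("retail", fixable=set())
component = NaaS("component", fixable={"path-degraded"}, parent=retail)
transport = NaaS("transport", fixable={"fiber-cut"}, parent=component)

print(transport.report("fiber-cut"))        # transport
print(transport.report("path-degraded"))    # component
print(transport.report("site-power-loss"))  # None
```

In the first case the retail layer would see at most an interruption; only in the last case, where nothing below can help, does a failure reach the retail user.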

Our visions of SDN and NFV are both, by this measure, too vertically integrated.  We are expecting to allocate resources at a primitive level and not through adopting the NaaS services created by lower layers.  One thing that I think the “right” vision of NaaS would have done is make the whole SDN/NFV integration question moot.  NFV does not, ever, exercise SDN.  It exercises NaaS abstractions that can be fulfilled by SDN.  We need not, should not, focus on the implementation because that would mean our NFV principles would violate the principles of layered networking that are the foundation of packet communication of all types today.
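One way to picture "NFV exercises NaaS, not SDN" is as an interface with interchangeable implementations.  The sketch below is hypothetical throughout: `ConnectivityNaaS`, `SdnBacked`, `LegacyBacked`, and `deploy_chain` are invented names, and the returned strings merely stand in for whatever each technology would really provision.

```python
from abc import ABC, abstractmethod
from typing import List

class ConnectivityNaaS(ABC):
    """The black box the orchestrator sees: it can ask for a
    connection, but cannot see how the connection is made."""

    @abstractmethod
    def connect(self, a: str, b: str) -> str:
        ...

class SdnBacked(ConnectivityNaaS):
    def connect(self, a: str, b: str) -> str:
        # Stand-in for pushing forwarding rules via an SDN controller.
        return f"flow-rule:{a}->{b}"

class LegacyBacked(ConnectivityNaaS):
    def connect(self, a: str, b: str) -> str:
        # Stand-in for provisioning a VLAN on legacy switches.
        return f"vlan:{a}->{b}"

def deploy_chain(functions: List[str], naas: ConnectivityNaaS) -> List[str]:
    """Orchestrator-side logic: connect each function to the next.
    Nothing here depends on which technology fulfills the NaaS."""
    return [naas.connect(x, y) for x, y in zip(functions, functions[1:])]

chain = ["firewall", "nat", "load-balancer"]
print(deploy_chain(chain, SdnBacked()))
print(deploy_chain(chain, LegacyBacked()))
```

The orchestration code is identical in both calls; swapping SDN for legacy gear changes only which implementation is handed in.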

In this framework, the absolutely critical element is that black box.  A black box is defined by its properties, as seen from the outside, so what we need to be thinking about is how we describe this in technology terms.  The most logical answer, I’ve suggested in the past, is the notion of a “recipe”.  If you’re making Margarita Shrimp, you have a black-box abstraction in hand.  The recipe name is the name of the abstraction, and the recipe is a procedure that realizes the outcome (produces the dish) when invoked.

We don’t need to name every possible dish to cook, nor does NFV or SDN have to name all its possible abstractions.  We have to be able to assign a “dish name” and provide a recipe for it no matter what it is.  The biggest hole in the notions of SDN and NFV as they are today is that we’re not focused on the notion of either producing or consuming black-box abstractions.  Without that notion we can’t do layers, and we vertically integrate services so that a common low-level fault blows up into an avalanche of service failures before anything really tries to deal with it.
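A named-abstraction-with-recipe scheme could look something like the following sketch.  Everything in it is an assumption made for illustration: the `RECIPES` registry, the `recipe` decorator, `realize`, and the two abstraction names are not drawn from any SDN or NFV specification.

```python
from typing import Any, Callable, Dict

# Registry mapping an abstraction's name (the "dish") to its recipe.
RECIPES: Dict[str, Callable[..., Any]] = {}

def recipe(name: str):
    """Decorator that registers a function as the recipe for `name`."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        RECIPES[name] = fn
        return fn
    return register

def realize(name: str, **params: Any) -> Any:
    """Invoke an abstraction by name; the caller never sees the recipe."""
    try:
        fn = RECIPES[name]
    except KeyError:
        raise KeyError(f"no recipe named {name!r}") from None
    return fn(**params)

@recipe("transport-path")
def transport_path(a: str, b: str) -> Dict[str, Any]:
    return {"kind": "transport-path", "ends": (a, b)}

@recipe("site-to-site-vpn")
def site_to_site_vpn(a: str, b: str) -> Dict[str, Any]:
    # A higher-layer recipe consumes the lower layer by name only.
    return {
        "kind": "site-to-site-vpn",
        "uses": [realize("transport-path", a=a, b=b)],
    }

print(realize("site-to-site-vpn", a="nyc", b="chi"))
```

Note that the VPN recipe never touches transport internals; it asks for "transport-path" by name, which is what lets the layers stay isolated and lets new dishes be added without changing the consumers.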

To me, the lessons of our layered past dictate that we have a layered future.  That means that we have to think about the basic principles of isolation, and adhere to them on one hand while making sure that we don’t constrain future services by depending on fixed service models based on older technologies.  The way to accomplish that is simple: named abstractions with recipes.  If that concept can be brought successfully into both SDN and NFV, we’ll take a giant step toward saving ourselves from a lot of problems, including those scalability issues I blogged about yesterday.