What I Learned About SDN and NFV (That’s Not Pretty)

The Service Architect Lifecycle tutorial I just completed for ExperiaSphere taught me a lot about management and orchestration for SDN, NFV, and the cloud.  There’s nothing like having to explain how something would be used to focus your attention on what needs to be done!  I don’t want to dive into all the issues; the tutorial itself will hopefully expose them, and they aren’t appropriate to my public blog in any case.  But I do want to do something that’s outside the scope of the tutorial, which is to talk about how the insights it induced couple with industry trends.

Don Clarke, perhaps the spiritual father of NFV overall and now at CableLabs, said nearly a year ago that operators were going to need to understand an NFV strategy in the context of a complete service lifecycle in order to validate its benefits.  The first step in that lifecycle process is the Architect phase, the place where a specialist who understands the NFV implementation builds the elements from which services will be created, by harnessing the behaviors of resources and systems of resources.  Every operator knows this is essential, and yet we don’t really hear much about lifecycles and Architects in NFV announcements or see them illustrated in PoCs.  Architects do service and resource modeling up front, creating structures that can then support service automation when the service is ordered and as it’s being used.

We don’t hear about this because it’s complicated and most NFV proponents don’t want to address that complexity.  Building a complete picture of NFV is complicated because network services and infrastructure are both complicated.  But the fact that reality is complicated doesn’t justify oversimplification.  If NFV is going to deploy, if SDN and the cloud are going to succeed, we have to come up with an approach for building applications and services that is as agile as these revolutionary technologies allow.  We also have to support our new agile framework, and our evolution to it, at such a high level of operations automation as to make even the complex easy and cheap to do.  Which is why we can’t start off our processes by defining that complexity as “out of scope” or “provided at a higher level”.

I propose a revolutionary thought.  We are the “higher level” here.  Anyone who wants our industry to get better, to be as vibrant and valuable in the future as it was in its golden age, has to step up and try to solve complicated problems by facing them down.  There are those who will disagree with the way I’ve approached SDN, NFV and the Cloud in my open-source-biased ExperiaSphere architecture, but of one thing I’m confident: they will have enough information about the complex stuff that we must address to know what they’re disagreeing with.  It took me over sixty slides to describe the Architect lifecycle stage.  Most “NFV” product presentations use fewer than a third of that number to describe everything they do.

ExperiaSphere is targeted at universal management and orchestration, which is a superset of NFV, but I don’t think that my scope is too broad—NFV’s scope is too narrow, and so is the scope of SDN and other “revolutionary” activities in our industry.  The original goal of reducing capex by exploiting software and COTS has for most operators given way to a new goal of improving service agility and operations efficiency.  But even if that evolution of goals hadn’t happened, NFV has to be a lot more than we think it is.  I believed that from the first, when the NFV Call for Action was published.  I’m certain of it now.

Service automation can only work if we have an abstract model of a service that is first used to marshal the necessary resources (deployment) and then sustain each resource and each level of cooperation through the life of the service.  If we do anything else, then we can’t interpret an event, a change in conditions, and respond in a way that restores normalcy because we don’t know what normal is and in what direction it lies.  A service is a finite-state machine not only at the high level, but at the level of each functional element.  It’s a clockwork-like interplay of interdependent pieces.  If you focus on a single way that a single function is implemented (firewall, for example) you can’t change the overall agility or economics, any more than you can make a clock by making one wheel in the mechanism.
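To make the state-machine point concrete, here is a minimal sketch of a service modeled as a tree of elements, each with its own state machine and all sharing one definition of "normal".  The state names, event names, and the Element class are my own illustrative assumptions; they are not taken from ExperiaSphere or from any NFV specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical states and events, chosen only for illustration.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ordered", "deploy_ok"): "active",
    ("ordered", "deploy_failed"): "failed",
    ("active", "fault"): "degraded",
    ("degraded", "repaired"): "active",
    ("degraded", "repair_failed"): "failed",
}

GOAL_STATE = "active"  # the model's definition of "normal"


@dataclass
class Element:
    """One functional element of a service, with its own state machine."""
    name: str
    state: str = "ordered"
    children: List["Element"] = field(default_factory=list)

    def handle(self, event: str) -> str:
        """Apply an event to this element; reject events the state does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{self.name}: event {event!r} is not valid in state {self.state!r}"
            )
        self.state = TRANSITIONS[key]
        return self.state

    def is_normal(self) -> bool:
        """Normal means this element and every element below it are in the goal state."""
        return self.state == GOAL_STATE and all(c.is_normal() for c in self.children)

    def off_normal(self) -> List[str]:
        """Names of the elements in this subtree that are not in the goal state."""
        found = [] if self.state == GOAL_STATE else [self.name]
        for child in self.children:
            found.extend(child.off_normal())
        return found


# Usage: a two-function service where one function reports a fault.
firewall = Element("firewall")
vpn = Element("vpn")
service = Element("business-access", children=[firewall, vpn])
for element in (service, firewall, vpn):
    element.handle("deploy_ok")
firewall.handle("fault")
print(service.is_normal())   # False
print(service.off_normal())  # ['firewall']
```

Because the model itself says what normal is, a fault in one wheel of the clock shows up as a service-level condition, and the model can say which element has to move back toward the goal state.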

The NFV ISG took a critical, seminal step into the future with the concept of MANO, management and orchestration.  They introduced the idea that you had to build NFV elements the way software architects build applications, by binding components using tools that in the software space would have been called “DevOps”.  The problem is that they didn’t go far enough.  They’ve limited their work to the enclaves of hosted functionality within a service, and agility and efficiency have to be service-wide to be relevant.  The work being done there is good, but it may be only part of a solution, with the rest left as an appeal to that mythical “higher layer”.  The OSS/BSS hooks in the reference architecture look like the same stuff that manages devices today.  How does that improve agility and efficiency?

The same thing can be said for the ONF and for OpenDaylight.  Technically both are doing good work, but we’re still muddling around in the basement of service creation and ceding all of the visible pieces of the service and service management to higher-layer applications, north of the famous “northbound APIs”.  We could, in theory, use apps to build new and unheard-of services based on explicit forwarding rules.  We could make “virtual EPC” a reality, transform our notions of security and access control, and open whole new retail opportunities.  All of this stuff is up north, where everyone fears to tread, and so we’re proposing to transform networking by using totally new forwarding technologies to replicate what we can already do.  And somehow, this new technology is going to be so operationally compelling that efficiency will justify deployment.  How?  Where do we address those efficiencies?  With a hook to OSS/BSS that looks just like what we have today.

Operations is the ultimate stumbling block for everyone.  The TMF had a number of very strong ideas about the evolution of operations as a component of service agility and management costs.  They had an initiative to create open operations processes based on componentized software principles, called OSSJ for “OSS Java”.  They had an initiative to define federation among operators, called “IPsphere”.  They had a vision of steering events to service processes based on the service contract, “NGOSS Contract”.  All of these are theoretically still projects, but none has changed the basic structure of the TMF: the SID data model and the eTOM operations map.  ExperiaSphere has shown me that rigid data models are an impediment and that operations processes don’t have a native flow or structure; they’re simply components in a service-wide state/event engine.
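One way to picture operations processes as components of a state/event engine is to let the contract carry the table that decides which process handles which event.  The sketch below assumes exactly that; the process names, states, events, and the steer function are hypothetical illustrations, not TMF or ExperiaSphere interfaces.

```python
from typing import Callable, Dict

# Hypothetical operations processes.  Each is a plain component with no flow
# of its own: it does its work and returns the contract's next state.
def start_billing(contract: dict, event: str) -> str:
    contract["log"].append("billing started")
    return "in_service"

def open_ticket(contract: dict, event: str) -> str:
    contract["log"].append("trouble ticket opened")
    return "impaired"

def close_ticket(contract: dict, event: str) -> str:
    contract["log"].append("trouble ticket closed")
    return "in_service"

PROCESSES: Dict[str, Callable[[dict, str], str]] = {
    "start_billing": start_billing,
    "open_ticket": open_ticket,
    "close_ticket": close_ticket,
}

def new_contract(customer: str) -> dict:
    """Build a contract that carries its own state/event table (assumed layout)."""
    return {
        "customer": customer,
        "state": "provisioning",
        "log": [],
        "table": {
            ("provisioning", "activated"): "start_billing",
            ("in_service", "sla_violation"): "open_ticket",
            ("impaired", "sla_restored"): "close_ticket",
        },
    }

def steer(contract: dict, event: str) -> str:
    """Look up (state, event) in the contract and run whatever process it names."""
    process_name = contract["table"].get((contract["state"], event))
    if process_name is None:
        contract["log"].append(f"ignored {event} in state {contract['state']}")
        return contract["state"]
    contract["state"] = PROCESSES[process_name](contract, event)
    return contract["state"]

# Usage: the same engine drives activation, a fault, and recovery.
contract = new_contract("acme")
for event in ("activated", "sla_violation", "sla_restored"):
    steer(contract, event)
print(contract["state"])  # in_service
print(contract["log"])
```

Nothing in the processes knows what runs before or after them.  The sequence lives entirely in the table, so a different contract could bind the same components into a different flow.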

I think we have to look at all of our revolutions in a new way; why not, if we believe they really are revolutions?  One thing I learned is that it’s not about interfaces or APIs or data models, it’s about flows and bindings.  Information flows through object-modeled structures to build and sustain services.  A model of a service is a set of objects through which parameters flow downward to drive resource behavior and management information flows upward to automate changes in response to problems and to involve human operators where needed.  So the TMF’s work should be focused on binding objects into services with flows, as I told them.  The NFV ISG’s work should be focused on information flows through logical elements, a virtual management view, as I also told the ISG.
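Here is a rough sketch of what I mean by flows and bindings: objects bound into a tree, with parameters pushed down and status pulled up.  The ModelObject class, the severity ordering, and the bandwidth arithmetic are all assumptions made up for the illustration, not a description of any real implementation.

```python
from typing import Callable, Dict, List, Optional

# Assumed severity ordering for status values, lowest to highest.
SEVERITY = ["ok", "degraded", "failed"]

Params = Dict[str, object]


class ModelObject:
    """A node in a service model: parameters flow down, status flows up."""

    def __init__(self, name: str, derive: Optional[Callable[[Params], Params]] = None):
        self.name = name
        # The derive function shapes the downward flow by turning this object's
        # parameters into the parameters its children receive.  The default
        # passes a copy through unchanged.
        self.derive = derive or (lambda params: dict(params))
        self.children: List["ModelObject"] = []
        self.params: Params = {}
        self.local_status = "ok"

    def bind(self, child: "ModelObject") -> "ModelObject":
        """Bind a child object into this object's structure."""
        self.children.append(child)
        return child

    def push_down(self, params: Params) -> None:
        """Downward flow: store parameters, then pass derived ones to children."""
        self.params = dict(params)
        child_params = self.derive(self.params)
        for child in self.children:
            child.push_down(child_params)

    def pull_up(self) -> str:
        """Upward flow: report the worst status found in this subtree."""
        statuses = [self.local_status] + [c.pull_up() for c in self.children]
        return max(statuses, key=SEVERITY.index)


# Usage: an order expressed in service terms becomes resource-level parameters.
service = ModelObject(
    "vpn-service",
    derive=lambda p: {"bandwidth_mbps": p["sites"] * p["mbps_per_site"]},
)
access = service.bind(ModelObject("access"))
core = service.bind(ModelObject("core"))

service.push_down({"sites": 4, "mbps_per_site": 50})
core.local_status = "degraded"

print(access.params)      # {'bandwidth_mbps': 200}
print(service.pull_up())  # degraded
```

The interesting part is not the class but the two flows: the order never mentions resources, and the resources never mention the order, yet each side sees what it needs because of how the objects are bound.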

OK, why am I the standard here?  Maybe I’ve got this wrong.  Maybe operators don’t want agile services or efficient operations.  Maybe somehow the high-level stuff everyone seems to be dodging is getting done.  Maybe vendors have a better approach to this than the one I’m advocating.  But I don’t think we’re going to revolutionize networking and transform its benefits by taking all of the limitations of the past and re-implementing them using new technology choices.  Why not take the missions of the future and optimize technology to support both those missions and the evolution to them?  That is what the TMF and the NFV ISG and the ONF and OPN and everyone else should be doing.  Complexity, if faced squarely, can be minimized by making the right choices in architecture.  I’m happy to put my concept out there for everyone to assess.  I’d like others to do the same.  Let’s stop doing SDN or NFV platitudes and start doing architectures.  Show me your sixty slides on Service Architect and we can talk on level ground.