Who Will Orchestrate the Orchestrators?

Most people would agree that NFV crosses over a lot of subtle technology boundaries.  It’s clearly something that could (and IMHO likely would in most cases) be hosted in the cloud.  It’s something that is certain to consume SDN if there’s any appreciable SDN deployment, and it’s something whose principles of orchestration and management are likely to slop over into the broad (meaning legacy device) network.

The orchestration and management implications of NFV are particularly important, in part because we don’t have a fully articulated strategy for orchestration of services today and we don’t really have a handle on management of virtual, as-a-service, or cloud-based resources.  There have been some announcements of element/network management systems that incorporate orchestration principles; Amartus, Overture Networks, and Transition Networks all have management platforms that are converging functionally on a holistic model for management and orchestration.  What’s not yet clear is how far all these players will go and how long it will take to get there.

NFV presumes virtual network functions (VNFs) deployed on servers (in the cloud, or in some other way) will be linked to create cooperative functional subsystems that can then be composed into services.  This process is a deploy-connect model that has to be carried out in a systematic way so that all the pieces that have to participate are there somewhere and are mutually addressable.  This is very much like cloud application deployment, so much so that a number of the current NFV implementations (CloudBand and CloudNFV) use cloud technology to support the actual deploy-connect process (orchestration decides what to deploy and connect, and where, so it’s still a higher-level function).
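To make the deploy-connect idea concrete, here’s a rough sketch of an orchestrator making placement decisions and then wiring a VNF chain together.  All of the names, the capacity units, and the greedy placement rule are my own hypothetical simplifications, not anything from an actual NFV implementation; the point is only that “deploy” and “connect” are distinct, systematic steps that a cloud layer could then carry out.

```python
# Hypothetical sketch of the deploy-connect model: orchestration decides
# what to deploy and where; a cloud layer would do the actual hosting.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    free_cpu: int

@dataclass
class Service:
    vnfs: list                                  # ordered chain of VNF names
    placements: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

def deploy_connect(service, hosts, cpu_per_vnf=2):
    """Place each VNF on the host with the most free capacity, then
    connect consecutive VNFs so every piece is mutually addressable."""
    for vnf in service.vnfs:
        host = max(hosts, key=lambda h: h.free_cpu)
        if host.free_cpu < cpu_per_vnf:
            raise RuntimeError(f"no capacity for {vnf}")
        host.free_cpu -= cpu_per_vnf
        service.placements[vnf] = host.name     # the "deploy" step
    for a, b in zip(service.vnfs, service.vnfs[1:]):
        service.links.append((a, b))            # the "connect" step
    return service

svc = deploy_connect(Service(vnfs=["firewall", "nat", "dpi"]),
                     [Host("h1", 4), Host("h2", 8)])
```

Note that the placement choice here is the “higher-level function” the paragraph above describes; handing `svc.placements` and `svc.links` to OpenStack or a similar cloud stack would be the deploy-connect execution itself.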

The complication is that a realistic service consists of a bunch of elements that are probably not VNFs.  The access network, metro aggregation, optical transport, and other network elements are either not immediate NFV targets or not suitable for server support at all.  There are also likely to be legacy devices in networks that have to be used during a transition to NFV, even where such a transition will eventually be complete.  That means that the orchestration process has to handle not only virtual functions but legacy devices.

There are two basic models for implementing orchestration.  One is to orchestrate the same way you manage, meaning that a series of element/network management systems would each organize service resources within its own domain.  The other is a master orchestrator that takes a complete service description and a complete network/resource map, then places and connects everything itself.
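The contrast between the two models can be sketched in a few lines.  Everything here is hypothetical illustration (the domain names, the resource map, the callables standing in for EMS/NMS systems), but it shows the structural difference: in the first model no single party ever sees the whole service, while in the second one orchestrator holds both the full description and the full map.

```python
# Model 1: each EMS/NMS "sub-orchestrates" within its own domain.
def domain_orchestrate(service_parts, domain_managers):
    """Hand each part of the service to its domain's manager; no one
    component sees the complete picture."""
    return {domain: domain_managers[domain](part)
            for domain, part in service_parts.items()}

# Model 2: a master orchestrator with global knowledge.
def master_orchestrate(service_description, resource_map):
    """Place and connect everything from one complete service
    description and one complete resource map."""
    plan = []
    for element in service_description:
        resource = resource_map[element["type"]]  # global placement choice
        plan.append((element["name"], resource))
    return plan

per_domain = domain_orchestrate(
    {"access": "eline", "core": "mpls"},
    {"access": lambda p: f"provisioned:{p}",
     "core": lambda p: f"provisioned:{p}"})

master_plan = master_orchestrate(
    [{"name": "fw", "type": "vnf"}, {"name": "vpn", "type": "legacy"}],
    {"vnf": "server-1", "legacy": "router-9"})
```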

Vendors have tended to support the first of these models, in no small part because NFV as a standard focuses on orchestration of VNFs, which leaves the orchestration of legacy components to existing management systems.  What’s not clear at this point is what the implications of this situation would be at the service management level.

The TMF and OSS/BSS vendors have generally worked to create service templates that describe how a service would be put together.  These templates model the service, but not necessarily at the level needed to create a complete recipe for optimized deployment.  Certainly it’s likely that these templates would need to be updated to hold the information necessary to deploy VNFs, unless it was possible to define each element of the model of a service as being targeted at a single specific NMS/EMS that could “sub-orchestrate” the deployment of that element.  Even there, the question is whether that sort of decision could be made atomically; would it not be true that the optimum placement of VNFs depends on the way the rest of the service, the legacy part, is supported?

Another issue is that orchestration is the point where resources are assigned to fulfill a service mission.  You can’t do management in a meaningful way without an understanding of the mission (meaning the service) and the state of the resources.  Since orchestration sets that relationship, it’s the logical place to put the management processes into place.  The optimum management vision for a network that has to be extensively orchestrated for any reason is one that recognizes the need to create service-relevant management and resource-relevant management at the same time, in a multi-tenant way, and at full network scale.  If everyone is orchestrating their heart out at a low NMS/EMS level, how do you provide cohesive management?  That problem occurs even today in multi-vendor networks.  Imagine it in a network whose “devices” are a mixture of real and virtual, and where the MIBs that describe a real device reflect variables that don’t even exist on servers or in data centers!

My personal view has always been that the TMF models can handle this.  You can define service and resource models using the SID (GB922) specification, and the principles of GB942 contract-arbitrated management (where management flows through contract-defined orchestration commitments to find the real resources) seem to resolve many of the management problems.  But even here, the question is whether there’s a way to aggregate NMS/EMS-level orchestration to create a unified service model, without creating some higher-level orchestration process to do it.
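The contract-arbitrated idea can be sketched simply: a management query about a service element is resolved through the commitments the contract recorded at orchestration time, rather than by asking anyone to guess which devices matter.  The structures below are my own toy simplification in the GB942 spirit, not anything defined by the TMF specifications themselves.

```python
# Hypothetical sketch of contract-arbitrated management: the contract,
# populated at orchestration time, binds service elements to the real
# resources that fulfill them; management flows through those bindings.

contract = {
    "service": "biz-vpn-42",
    "commitments": {                       # recorded when orchestrated
        "vpn-core": ["router-3", "router-7"],
        "firewall": ["server-12/vm-4"],
    },
}

resource_status = {
    "router-3": "up", "router-7": "degraded", "server-12/vm-4": "up",
}

def service_element_status(contract, element):
    """Derive an element's state from the resources the contract commits
    to it; the worst resource state wins."""
    states = [resource_status[r] for r in contract["commitments"][element]]
    return "degraded" if "degraded" in states else "up"
```

The useful property is that the same query works whether a commitment points at a router or at a VM; the contract, not the MIB, is what relates service to resource.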

SDN presents similar issues.  It’s likely for a number of reasons that SDN will deploy in a series of “domains”, encompassing a data center here or there and perhaps some metro functionality in between.  Maybe in some cases there will even be SDN in branch locations.  The management of SDN has to change because it’s not possible to look at a device and derive much information about the service model as a whole; that knowledge is centralized.  Yet the central knowledge of what was commanded to happen doesn’t equate to the state of the devices—if it always did, you’d not need to manage anything.
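That gap between what was commanded and what the devices are actually doing can be made concrete as a reconciliation check.  This is a generic sketch of the idea, not any particular controller’s API; the flow tuples are invented for illustration.

```python
# Intended state (what the controller commanded) versus actual state
# (what the switches report) -- the differences are exactly what
# management has to act on.

def reconcile(intended_flows, actual_flows):
    """Return the flows that were commanded but are absent, and the
    flows that are present but were never commanded."""
    missing = intended_flows - actual_flows
    unexpected = actual_flows - intended_flows
    return missing, unexpected

intended = {("sw1", "10.0.0.0/24", "port2"), ("sw2", "10.0.1.0/24", "port1")}
actual   = {("sw1", "10.0.0.0/24", "port2"), ("sw2", "0.0.0.0/0", "port3")}
missing, unexpected = reconcile(intended, actual)
```

If the central knowledge always matched the device state, both sets would always come back empty, and, as the paragraph above says, you’d not need to manage anything.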

So what we’re seeing here is that the two networking revolutions of the current age—SDN and NFV—both demand a different model of management and orchestration.  What model is that?  We’re going to need to answer that question pretty quickly, and while current orchestration offerings by vendors may aim too much at the NMS/EMS level, they’re a useful start.

SDN: Growth or Just Changes?

The world of SDN continues to evolve, and as is usually the case many of the evolutions have real utility.  The challenge continues to be the conceptualization of a flexible new network framework that exploits what SDN can do, and at an even more basic level, the creation of a framework by which the different SDN models can be assessed.

One of the most potentially useful announcements came from HP, who say they want to build an SDN ecosystem by providing an SDN app store and a developer environment complete with toolkit and validation simulator.  This is built on top of HP’s SDN controller of course, but it’s arguably the first framework designed to promote a true SDN ecosystem.

I don’t have access to the SDK for this yet; the URL provided for the developer center is broken until November when the tools arrive.  As a result I can’t say what the inherent strengths and limitations of the framework are.  Obviously it’s disappointing that the program doesn’t have the pieces in place, but it’s not unreasonable.  I do think that HP should publish at least the API guide in the open, though.  People need time to assess the tools and their potential before they commit to an ecosystem, particularly one as potentially complex as SDN.

The challenge with any developer ecosystem is the level at which the developers are expected to function.  OpenFlow, as I’ve said before, is simply a protocol to manipulate switch forwarding tables.  To presume that developers would be building services by pushing per-switch commands directly is to presume anarchy, so HP has to be providing higher-level functionality that lets programmers manipulate routes or flows and not tables and switches.  Even there, a basic challenge of SDN is that applications that can manipulate switches, even indirectly, can create serious security and instability flaws.  Logically there have to be two levels of SDN, one that lets applications control basic connectivity on a “domain” basis and another that lets infrastructure providers manage QoS, availability, etc.  How that gets done is critical to any ecosystem, IMHO, and I’d sure like to see HP document their model here.
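Here’s one way the two-level split could look.  The API names, the policy structure, and the admission rule are all my own invention (HP hasn’t published theirs, which is the point of the paragraph above); what matters is that the application layer talks in routes within a domain, and the infrastructure layer gates every request against policy the application cannot bypass.

```python
# Hypothetical two-level SDN split: an application-facing connectivity
# call sits above an infrastructure-owned policy gate.

INFRA_POLICY = {"max_bandwidth_mbps": 100, "allowed_domains": {"tenant-a"}}

def infrastructure_admit(domain, bandwidth_mbps):
    """Infrastructure level: tenancy and QoS checks owned by the
    provider, invisible to and unbypassable by applications."""
    return (domain in INFRA_POLICY["allowed_domains"]
            and bandwidth_mbps <= INFRA_POLICY["max_bandwidth_mbps"])

def app_connect(domain, src, dst, bandwidth_mbps):
    """Application level: asks for a route between endpoints in its own
    domain; never sees tables or switches."""
    if not infrastructure_admit(domain, bandwidth_mbps):
        raise PermissionError("infrastructure policy rejects request")
    return {"domain": domain, "route": (src, dst), "bw": bandwidth_mbps}
```

A controller would translate the returned route handle into per-switch forwarding entries; the application never issues those commands itself, which is what keeps the “anarchy” out.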

Another SDN development is Version 1.4 of OpenFlow, which enhances the flexibility of OpenFlow considerably but also raises some questions.  The new version has features that are so different from those of the previous version that it will be essential that switches and controllers know whether they’re running the same version.  That sort of change is always hard to make because “old” software rarely prepares for “new” functionality.  It’s also virtually certain that some of the features of the new OpenFlow will have to be exposed via changes in the controller APIs, which means that applications that run on top of controllers may also have to be changed.  This collides with the notion of building ecosystems, since nothing aggravates a developer like having the platform change underneath.
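The “do we run the same version” question reduces to the OpenFlow HELLO handshake: each side advertises what it supports and the connection runs at the highest version common to both, or fails.  The simplified sketch below uses the wire version numbers from the OpenFlow specifications (1.0 is 0x01 through 1.4 at 0x05); the function itself is my own boiled-down illustration, not a protocol implementation.

```python
# Simplified OpenFlow version negotiation: highest mutually supported
# wire version wins; no overlap means the connection cannot proceed.

OF_VERSIONS = {0x01: "1.0", 0x02: "1.1", 0x03: "1.2",
               0x04: "1.3", 0x05: "1.4"}

def negotiate(controller_versions, switch_versions):
    """Return the highest version both peers support, or None (in real
    OpenFlow, an error message followed by connection teardown)."""
    common = set(controller_versions) & set(switch_versions)
    if not common:
        return None
    return OF_VERSIONS[max(common)]
```

So a 1.4 controller facing a fleet of 1.3 switches quietly runs at 1.3, and any 1.4-only feature exposed through the controller API simply isn’t there, which is exactly the ecosystem friction the paragraph above describes.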

Still, it’s pretty obvious that SDN is growing up.  Not surprisingly, players like Cisco and rival Huawei are promoting more SDN-ready technology, perhaps even starting to build things that go beyond exploiting SDN in a limited way toward actually depending on it to fully access features and capabilities.  We’re also hearing about SDN layers a bit, but in what I think is an unfortunate context.  We hear about “data center”, “carrier”, or “transport” SDN, and I think that this division blurs some pretty significant boundaries and issues.

At the top of the network, where applications live, the notion of software-defined networking is fairly logical.  What you want to do is allow for the creation of new service models (connectivity control based on something other than legacy L2/L3 principles; see my blog yesterday) and at the same time support the notion of multi-tenancy, since applications are for users and there’s a load of users to support.  As you get deeper, though, you are supporting not an application but a community.  It’s always been my view that something like OpenFlow, designed for specific forwarding control, gets riskier as you go down the stack.  Further, at some point you’re really dealing with routes at the transport level, even TDM or optical paths that don’t expose packet headers and aren’t forwarded packet-by-packet but as a unit.  Here we have both a technical and a functional/strategic disconnect with classic OpenFlow.

The OSI model has layers, and I suspect that the SDN model will need them for the same reason, which is that you have to divide the mission of networking up into functional zones to accommodate the difference between network services as applications see them, and network services as seen by the various devices that actually move information.  We’re not there yet on what the layers might be, and arguably there’s a real value in “flattening” the OSI layers down to something more manageable in number and more logical in mission.  We aren’t going to harmonize these goals if we never have real discussions on the topic, though, and we’re not having them now.

We also need to understand how SDN and NFV relate, and how both relate to the cloud.  If operators are going to host a bunch of centralized SDN functionality or a bunch of virtual functions, it seems to me that they’d elect to use proven cloud technology to do that.  How does that technology get applied, though?  SDN supports service models that cloud architectures like OpenStack’s Neutron don’t, because SDN in theory supports any arbitrary connection model.  How do we use the cloud to distribute “centralized” SDN control so it’s reliable and can be exercised across a global network?  How does NFV, itself hosted in the cloud, support both centralized SDN technology and its own function mission?  Can it also deploy cloud application components, and build services from both applications and network functions?  There are a lot of questions to consider here, and a lot of opportunity for those who can answer them correctly.
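The “arbitrary connection model” point deserves one concrete illustration.  Neutron’s model is essentially ports on a shared network, where any port can reach any other; an SDN forwarding graph can instead express exactly the connectivity a service model names, down to one-way or pairwise-only reachability.  The sketch below is my own abstraction of that contrast, not Neutron’s actual API.

```python
# Contrast of connection models: any-to-any (the Neutron-style network/
# subnet/port abstraction) versus an arbitrary set of permitted flows.

def neutron_style(ports):
    """Any-to-any: every distinct ordered pair of ports can talk."""
    return {(a, b) for a in ports for b in ports if a != b}

def sdn_style(allowed_pairs):
    """Arbitrary model: connectivity is exactly the pairs the service
    model names, and nothing more."""
    return set(allowed_pairs)

full = neutron_style(["p1", "p2", "p3"])
partial = sdn_style([("p1", "p2")])   # p1 may send to p2; nothing else
```

A cloud stack that can only describe `full` cannot express `partial`, which is why hosting SDN control in the cloud and describing services through the cloud’s own network model are two different problems.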