The Best Marriage of SDN and NFV

The thing about revolutions is that they’re…well…revolutionary.  Out-with-the-old-in-with-the-new is a popular concept (especially if you’re not among the “old”).  They’re exciting because they shake things up, create new media fodder, all sorts of good stuff.  But they’re expensive and tiring, and so it’s hard to have more than one of them in the same market area at the same time.

SDN is supposed to be a revolution.  So is NFV.  Which of the two might have the most impact were it alone in the revolutionary stratosphere is hard to say, but they’re not alone.  It’s likely that if there aren’t harmonizing moves, common steps, to be taken to unify the execution if not the technology, both will suffer.

To SDN, most of NFV is north of those northbound APIs, an area that’s not likely to be defined.  To NFV, SDN is south of those Virtual Infrastructure Managers, and the fact is that this area must be defined if NFV is going to work.  Thus, while NFV may be no more important than SDN, NFV will have to accommodate SDN quickly, or lose relevance.  Not because SDN is critical but because it represents something critical.

SDN provides the service of connectivity to its users.  That means that within NFV, SDN has to be viewed as a connection option, a connection service element.  Our first challenge in alignment is that NFV doesn’t really focus on connections to users.  An early decision in NFV was to focus on the specific service elements that were based on software and deployed on servers—the VNFs.  The “end-to-end architecture” for NFV doesn’t actually encompass the ends at all.  It starts inside the service, where NFV hosting starts.  No access, no transport.

This focus is critical because a decision to include everything that’s part of the user’s end-to-end service view would have induced the ISG to address how to model and orchestrate legacy network elements that will surely be part of any NFV service.  Since SDN is intended to displace legacy switch/routing in some or all of these legacy missions, the approach taken for legacy could have been applied to SDN as a baseline.  But don’t despair.  We can extrapolate a bit.

There are two kinds of connections in NFV.  One addresses the data paths that are visible to the user of the service, and the other the data paths that are needed within the structure of virtual functions NFV builds in order to connect the pieces.  In today’s world, it’s likely that almost all the visible, external-to-NFV, data paths would be supported using legacy technology.  It’s also likely that if VNF hosting points were distributed in any way, the internal data paths would today contain some legacy elements.

Internal paths could migrate to SDN fairly quickly because it would be possible to incorporate SDN-based networking in the “cloud networking” part of NFV.  Substitution of SDN for external paths would be harder to justify and would also take longer.

The implication of NFV architecture today is that NFV Infrastructure would include both hosting and cloud-like connection services.  This appears to be supported by the PoCs to date, which tend to supply connectivity among VNFs by using something like Neutron or vSwitches.  I suggested in a prior blog that the logical approach to SDN within NFV is to generalize the VIM into an Infrastructure Manager (IM) and assign the connection function to such an element.  This would be compatible with either the internal or external connection missions, so NFV models and MANO could be extended to the true service edge.  But there are also some issues to consider.

The first issue is whether there is such a thing as a “legacy IM” and “SDN IM” or whether SDN and legacy are choices that an IM makes down below.  This might seem like one of those “how-many-angels-can-dance-on-the-head-of-a-pin” arguments, but it’s not.  If we have to segregate IMs by the underlying technology, then the details of network domains that use multiple technologies or even multiple vendors might have to be exposed upward into MANO so it could call the correct IM.  That makes service models very infrastructure-specific and means any change to the network would demand corresponding changes to all the models.  I think that’s outlandishly inefficient and error-prone.

But that makes the second issue even more important.  If we want MANO to ask for a connection service at a generic level, how do we define it?  This is one of the many reasons I’ve rejected the YANG vision, which proposes to model services at a nodal level.  We should not be asking for IMs to build services by handing them the topology; those details should be opaque to MANO.  So what do we hand it?  In both CloudNFV and ExperiaSphere I proposed we define “service models” that describe a service as the connection relationship of the endpoints.  The classic example is LINE, LAN, and TREE for point-to-point, multi-point, and multicast.  If I know the OSI layer (meaning the header component) associated with a service and the connection model I want, I can build it no matter what my underlying infrastructure might look like.
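Here’s a minimal Python sketch of what such a service model might look like.  The names and structure are mine, not from any standard; the point is only that a connection model plus an OSI layer plus endpoints is a complete, technology-neutral service description.

```python
from dataclasses import dataclass, field

# Connection models from the text: LINE (point-to-point), LAN (multipoint),
# TREE (multicast).  The OSI layer identifies the header the service acts on.
CONNECTION_MODELS = {"LINE", "LAN", "TREE"}

@dataclass
class ServiceModel:
    """A technology-neutral description of a connection service."""
    model: str                  # "LINE" | "LAN" | "TREE"
    layer: int                  # OSI layer, e.g. 2 or 3
    endpoints: list = field(default_factory=list)

    def __post_init__(self):
        if self.model not in CONNECTION_MODELS:
            raise ValueError(f"unknown connection model: {self.model}")
        if self.model == "LINE" and len(self.endpoints) != 2:
            raise ValueError("LINE requires exactly two endpoints")

# A point-to-point Layer 2 service, described with no reference to the
# nodes, vendors, or technology that will eventually realize it.
vpn = ServiceModel(model="LINE", layer=2, endpoints=["siteA", "siteB"])
```

Nothing in this description names a device or a protocol, which is exactly what lets the same model decompose onto legacy or SDN infrastructure.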

If you take this approach, then SDN becomes an option to support connectivity in any part of any network domain represented by an IM.  What’s required is that the IM itself understand the topology of the network it supports, in terms of the nodes, paths, and protocol choices.  It’s hard to see how you could control something without that knowledge, so I don’t think this is a heavy burden.  In simple terms, a service model would be decomposed into a set of relationships built via NMS (for legacy services) or SDN Controllers for SDN.

You can also see an example of what I previously called “functional” versus “structural” orchestration here.  The service model at the functional level is decomposed into IM-jurisdictional atoms, and the resulting sub-models are passed to the appropriate IMs.  These IMs then perform a structural-level decomposition, creating the requested connectivity (or whatever) by marshaling resources based on topology, capability, connectivity, and policy.
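A hedged sketch of that two-level decomposition, with class and method names that are illustrative rather than drawn from the ISG documents: MANO splits the functional service model into per-domain sub-models, and each IM decides internally whether to drive an SDN controller or a legacy NMS, so the technology choice never leaks upward.

```python
class InfrastructureManager:
    """One IM per network domain; the technology choice stays inside it."""
    def __init__(self, domain, has_sdn):
        self.domain = domain
        self.has_sdn = has_sdn

    def build(self, sub_model):
        # Structural decomposition: the IM knows its own topology and picks
        # the control mechanism without exposing that choice to MANO.
        via = "sdn-controller" if self.has_sdn else "legacy-nms"
        return f"{self.domain}:{sub_model}:{via}"

def orchestrate(service_model, ims):
    """Functional decomposition: split the service into IM-jurisdictional
    atoms and hand each sub-model to the IM for its domain."""
    return [ims[domain].build(sub_model)
            for domain, sub_model in service_model.items()]

ims = {"core": InfrastructureManager("core", has_sdn=True),
       "access": InfrastructureManager("access", has_sdn=False)}
# A LINE service decomposed into per-domain atoms; MANO never sees
# whether a given domain is SDN or legacy.
built = orchestrate({"access": "LINE[siteA->edge]",
                     "core": "LINE[edge->edge]"}, ims)
```

If the core is later converted to SDN, only the IM changes; the service models and the orchestration code above are untouched, which is the whole argument against segregating IMs by technology.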

My point here should be clear by now.  If you do NFV right, then making it work with SDN is an automatic byproduct of the approach.  SDN gurus can address the structural processes that build resource commitments (forwarding rules and paths) based on the service models.  NFV gurus can worry about the functional modeling that supports decomposition, which as I said in a prior blog is done by binding functional objects to the resource pools responsible for fulfilling them.  We don’t need two revolutions.

However, as I’ve also noted, we don’t need to call the top-level revolution in modeling “NFV”.  The fact is that this kind of functional model could reside in an NFV implementation, or above it in some operations-to-resource boundary layer, or even in the OSS/BSS.  It’s the model that matters, so what we’re saying is that this is a virtualization revolution, using virtualization-framed modeling and decomposition principles.  NFV is a carrier, a vehicle for driving this change because it can link it to some near-term benefits.  Both that concept and that linkage to benefits are critical.

Is This the “Grand Alliance of NFV?”

SDN and NFV have been media events for sure, like the cloud.  But like the cloud, SDN and NFV are technology revolutions that require both the technology part (the right architectures and elements) and some revolutionaries.  We’ve been sadly lacking in both, but that may now be changing.  Two of the most credible of all the NFV players, HP and Wind River, have formed an alliance.

I’ve blogged about both these companies before.  In its Titanium ecosystem, Wind River has what I believe to be the best foundation technology for NFV infrastructure, a software platform that can deliver the kind of reliability and availability that operators will need if they’re to exchange appliances for applications.  HP’s OpenNFV provides the best NFV implementation of any major vendor, and includes the critical orchestration element of NFV that’s been fobbed off by most players.

The partnership between the companies is essentially an integration of Wind River into HP’s Helion OpenStack platform, which means that the two companies will be offering enhanced platform support for the cloud as well.  That’s good news because NFV and the cloud are kissing cousins.  But it also raises the big question on the alliance—how much will the two companies really cooperate to move the ball with NFV?

The great contradiction of NFV is that all the value comes from one component and all the money will be made on another.  I’ve blogged several times about the importance of NFV’s management/orchestration (MANO) element.  Michael Howard of Infonetics sent me a presentation he gave at the SDN OpenFlow World Congress in Dusseldorf in October, and he also makes the point that it’s MANO that’s going to determine whether NFV means anything or is just a media hype wave.  Without MANO, NFV is really just hosting stuff and hoping for the best.

But will MANO make money?  It’s a software function, it could be done largely with open-source tools, it’s complicated to do and to sell.  At the end of the day, NFV will shift capex from devices to servers, which means that it’s the NFV Infrastructure (NFVI) that matters commercially.  The fact that servers are the winner in NFV isn’t lost on the vendors, who have jumped out to support the NFVI foundation of NFV and largely ignored the critical MANO layer.  Why?  Because if they try to drive their own MANO strategy they’ll alienate other potential NFV partners who also have one.

Most network equipment vendors don’t have servers (Cisco being the obvious exception).  What they do have is a reservoir of network functionality and an understanding of management and orchestration.  They’re also network incumbents.  So does a server vendor field their own complete NFV strategy, one that would compete with the network vendors who might do the same?  That could be a risk.

The flip side is that if everyone decides that NFVI makes money and nobody wants to take MANO risk, NFV never develops at all because its critical core logic isn’t done.  You can’t compete for a market that doesn’t exist, and without MANO the value propositions for NFV cannot be satisfied, period.  In the days of the Mercury Astronauts in the US Space Program there was a saying, “No bucks, no Buck Rogers”.  That meant that without funding there’s no revolution.  In NFV, it means that without compelling benefits all the PoCs we have now are just science projects.

A Wind River/HP alliance could change that.  The closer we get to somebody packaging NFV as a complete product, the more likely it is that others in the market will take notice and respond.  A good example is the NFV ISG and OPNFV activities, both of which got mired in the bottom-up problem I’ve described many times.  Will the vendors/operators who make up these activities stand by while one solution is articulated by a credible pair of vendors and then runs away with the market?  Maybe we’ll see some real movement toward useful dialog.

Of course, I have to address the “and maybe not” side.  If you read the press release on the alliance, it headlines “HP and Wind River Partner to Create Carrier Grade HP Helion OpenStack Solutions for NFV” but the details are about offering Wind River stuff as an option in Helion OpenStack.  NFV is the stated target, but there’s nothing about cooperating on MANO, at least not yet.  It would be fairly easy for this alliance to fall into the “attractive billboard” model, something to get publicity for everyone but something that generates little progress toward really getting NFV deployed.

For that, the big test is still the test of management.  Orchestration isn’t a concept new to NFV; every DevOps tool ever developed is arguably an orchestration strategy, and the TOSCA standard, which I believe is the best model for NFV services, was created for the cloud.  The question is how far orchestration will go.  Will we virtualize resources alone, or will we virtualize everything including management models, views, and process relationships?

In the virtual world of the future, things like “OSS”, “BSS”, “NMS”, “NOC”, and “MIB” don’t exist except as instantiated abstractions.  We atomize everything and assemble what we need from the pool, “everything” here including all the tools we need to build cloud services, NFV, SDN, and pretty much everything else.  NFV didn’t invent MANO, but it described a management/orchestration mission that should have given us a glimpse of what MANO could, or should, or even must, mean.  If realized, that vision will transform networking and pull along new technologies like SDN and NFV because it enables them.

So what we have to look for here from our Grand Alliance for NFV is whether the allies see the potential.  They have a big head start because they have MANO and also have a credible framework for creating functionally composed services and service/operations processes.  You can’t have orchestra music without a conductor.  But you also need music and musicians, and we’ll have to wait to see whether HP and Wind River—and Intel behind the latter—are really determined to move the ball, not only for NFV but for the whole of the virtualization revolution.  It could happen.

Did the NFV ISG Bite Off Too Little?

Light Reading’s Carol Wilson did another nice interview this week, and documented it in a story about the progress and future of the ETSI NFV ISG.  A good part of the story focuses on the assertion that the ISG had to constrain its scope to get its work done.  Respectfully, I disagree.

Telecom is an industry that makes capital investments on a cycle as long as 20 years or more, and this isn’t the sort of financial situation that breeds quick changes in course.  Any revolutionary technology has to be able to prove it’s better in the long run, but it also has to somehow generate enough benefit to cover what might be a formidable set of near-term financial hurdles.  It’s those hurdles, and overcoming them, that I’m going to talk about today.

Capex reduction always seems to be an appealing justification for something, but capex is where you run hard into that issue of “sunk costs”.  Network equipment currently installed at Levels 2 and 3 and above has almost four years of residual depreciation.  That means that even if you were to say that a new SDN or NFV approach could cut the cost of equipment by a third, it would take considerable time to make up the savings.  During that time, there’s a risk that something else would come along.  It’s for this reason that operators have tended to distrust revolutionary technology changes justified by capex.

The classic defense against sunk cost is an evolutionary strategy.  If you can gradually introduce something new, you can replace the old stuff that’s fully depreciated as it becomes eligible.  The fact that you don’t touch stuff that still has write-down to be taken means you don’t have those sunk costs.  Evolution can also reduce risk by limiting the amount of infrastructure you change at any given moment.

The problem with the evolutionary-displacement model, at least with respect to SDN and NFV, is that the revolutions themselves may not support it.  NFV, for example, presumes reasonable economies of scale from the pool of resources that would be used to host network features.  If you start off with small pools, you may secure no savings at all.  SDN’s challenge is that control plane separation for two or three devices inside a network isn’t likely to bring about very much change.

The SDN community, in my view, has never really faced up to the issue of financial justification.  I found in a survey this spring that neither enterprises nor service providers cited “SDN benefits” that could be readily quantified.  If you present your CFO with a non-quantified benefit, you may as well say that Gandalf approved of your choice.  Real costs, real risks, demand real dollar-signs benefits.

In the NFV space, operators graduated from a notion that capex reduction would justify NFV, to the current view that it will be justified by creating operational efficiencies and improving service agility.  We’re in an age where running a network costs about as much as buying it, so operations costs are a real concern.  Operators are also concerned that the OTT players appear to be able to deploy services on a dime where they take years to do the same thing.  But there are two problems with these modern justifications.

Problem number one is that old bugaboo of real-dollar benefits.  Operations savings can justify NFV (or SDN), right?  Well, how much operations savings would it take?  How would we go about securing the savings?  And with service agility, given that OTT services are really hosted experiences delivered over the network and not network services per se, how exactly does SDN or NFV make these more agile?  If the OTTs can build new services without SDN and NFV, the operators could build the same ones without the new technologies too.

The second problem is one I’ve been harping on (ranting about, some may say) since NFV’s inception.  If you want operations efficiency and service agility, then you want changes in service operations and management.  You also have to think about the often-neglected but always horrific issue of first cost.

Service operations and management are out of scope to the ETSI NFV ISG, and that’s been true all along.  I didn’t agree with that, and I still don’t.  I understand that the scope of an effort like NFV has to be contained to ensure it doesn’t become a decades-long project, but the very first thing that any technical project has to do if it’s to succeed is to secure its own driving benefits.  Otherwise you come up with a “solution” that you can’t implement.  NFV has embraced value propositions it doesn’t own.

But even if it could own them, there’s still that first-cost problem.  Network operators, like public utilities, are cash-flow machines.  People buy their stock not because they are going to double their sales in a year or so (by stimulating the birth-rate, perhaps?) but for their return in dividends.  So operators faced with a technology shift will draw this chart of free cash flow.  When you start deploying your new thing, you face immediate costs and there is little chance early service benefits will offset them.  Your cash curve dips negative and stays that way for a while, till finally the benefits start to catch up.  Eventually the curve turns around, goes positive, and the CFO and shareholders are happy.

The problem with this first-cost thing, applied to SDN and NFV and to the notion of operations-and-agility benefits, is that you can’t secure the benefits of the new technologies without wrapping both legacy and new stuff in a common operations framework.  The need for evolution, in short, combines with the need to reduce first cost.  How can we change everything about operations and service creation with SDN and NFV when we can’t fork-lift our whole infrastructure and start over?  By making the changes “above” the infrastructure, where they apply to the old and the new.

That raises the final and perhaps most interesting truth.  If service agility and operations efficiencies are the goal, and if those goals have to be realized through high-level realignment of operations practices and processes that are implemented in a technology-neutral way, are these two “benefits” really benefits of SDN and NFV?  The answer is, “in part, no they are not”.

Every single network operator out there could benefit from an abstraction-virtualization model of services and infrastructure even if they didn’t change a single device in their network.  They could secure a big chunk of the benefits they want without changing infrastructure at all.  They could, by building high-level cover that’s technology-neutral, ease their way past evolutionary sunk costs and formidable first costs.  Cisco could defend its status quo goal better this way than by inventing new slogans and sticking its fingers in various weak points.

We don’t need to modernize service operations and management practices to implement SDN and NFV; we have to modernize them before we implement either one, and then use what we’ve done to pave the way.  That’s why operations/management was never, could never be, out of scope for SDN and NFV.

Looking at NFV From the VNF Side Now

Last week, and in prior blogs, I mentioned the fact that virtual network functions (VNFs) have to be recipients of NFV services, and that the sum of these services may determine the ease with which current network code could migrate to become VNFs.  It’s also a determinant in the portability of VNFs across multiple platforms, of course.  Today I’d like to talk about what “NFV services” to VNFs are, and how they might develop.

The foundation of all services offered to VNFs is the set of execution platform services that actually provide for VNF hosting.  If a piece of network code runs on Linux with a given set of middleware tools, and if this combination was needed in a machine image to deploy the VNF in a virtualization or cloud environment, then that stuff is platform services.  The responsibility of NFV in this case is to permit the assembly of correct machine images or similar artifacts representing the VNFs, and deploy them on suitable resources.  This process is well understood because it happens in every virtualized data center, every cloud.

The only thing that NFV adds to the mix is possible refinement of the “suitable resources” stuff.  Suitability has two basic dimensions—what does the VNF need for technical execution and what optimization policies might be applied to select technically suitable resources.  We’ve had lots of discussions about how it would be really great to have all manner of policy-based decision-making here, but remember that anything you do to optimize the use of a resource pool makes the effective size of the pool smaller and the efficiency of your virtual resources lower.  I personally think we’re gilding the lily here in most discussions.  Yes, there will be some broad optimization policies imposed, but the business case for NFV simply can’t afford excessive complexity at the optimization level.
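A toy illustration of that pool-shrinking point—purely my own, not an NFV placement algorithm: every optimization policy is a filter, and each filter you stack on top of the technical requirements shrinks the effective pool a VNF can land on.

```python
def effective_pool(hosts, policies):
    """Apply placement policies in order; each one can only shrink the pool."""
    pool = list(hosts)
    for policy in policies:
        pool = [h for h in pool if policy(h)]
    return pool

# A hypothetical 100-host pool with varying free capacity and two zones.
hosts = [{"name": f"h{i}", "free_cores": i % 8,
          "zone": "east" if i < 50 else "west"}
         for i in range(100)]

technical = lambda h: h["free_cores"] >= 2    # what the VNF needs to run at all
zone_policy = lambda h: h["zone"] == "east"   # an added optimization policy

base = effective_pool(hosts, [technical])                 # the larger pool
optimized = effective_pool(hosts, [technical, zone_policy])  # policy shrinks it
```

Each added policy lowers the probability that spare capacity somewhere in the pool is usable, which is why over-elaborate optimization works directly against the economy-of-scale argument for NFV.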

The next layer to consider in our NFV services picture is connection services.  VNFs are communications software, and thus they expect to run in a given network framework.  Most, for example, think they are running inside a subnetwork that’s then connected through a default gateway to a broader WAN (or the Internet).  They assume that they have the DHCP and DNS services needed for typical application networking.

This is the area where I think current thoughts about NFV architectures need the most work.  We’ve talked about “forwarding graphs” as diagrams representing the flow of information between VNFs as though these graphs were the solution to connection services.  In point of fact, in virtually all the network applications today, the thing that determines what a given application forwards traffic to or gets traffic from is the application logic.  We don’t pipeline applications today, we simply connect them onto subnetworks and let the applications “find” each other through normal IP means.  They then communicate as they were written to communicate.  Thus, the priority in connection services is to replicate the network environment in which the VNFs are expecting to run.  If that happens to include pipelines/tunnels between components, fine, but in virtually no case would that be sufficient.  You have IP VNFs?  Then you need to put them into an IP framework or they won’t work.
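To make the point concrete, here’s a hedged sketch (the structures and names are illustrative, not from any NFV document) of describing connection services as the subnet environment the VNFs are dropped into, rather than as an explicit forwarding graph:

```python
# Instead of wiring VNF-to-VNF pipes, describe the network environment the
# VNFs expect: a subnet with a default gateway, DHCP, and DNS, into which
# each VNF is simply attached.  The VNFs then find each other by normal
# IP means, exactly as the original application code was written to do.
subnet_env = {
    "cidr": "10.0.1.0/24",
    "gateway": "10.0.1.1",
    "dhcp": True,
    "dns": ["10.0.1.2"],
}

def attach(env, vnf_name, members):
    """Attach a VNF to the subnet; no inter-VNF pipeline is specified."""
    members.append({"vnf": vnf_name, "network": env["cidr"]})
    return members

members = []
for vnf in ["firewall", "nat", "dpi"]:
    attach(subnet_env, vnf, members)
# All three VNFs now share one IP environment and communicate as written.
```

If a particular service chain does need an explicit tunnel between two components, that can be layered on, but the subnet environment is the necessary baseline; the tunnel alone would not be sufficient.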

The next and final layer to consider is also the most complicated—network-addressable services.  Some of these (DNS, DHCP, gateway) are lumped into my connection services category because they are a normal part of “the network”.  However, applications may expect to contact other network-connected application or device features and services to run.  They may also be expected to expose some of their own features as network-addressable services to others.

Management offers us the best example of network-addressable services, but it also demonstrates how we have to interwork among our three elements of VNF services to get something to actually work when deployed.

When an application is loaded, it gets most of the services it uses through the mechanism of local APIs that either address operating system or middleware features.  These features would often include management interfaces to provide the application some information about local resources, so “management” has to include some resource-to-VNF views.  The complication here, as it always is with shared resources, is that no application/VNF can be allowed to have unfettered access to shared resources or they contaminate the behavior of other applications.

The connection of the application is the next step, and in traditional IP networking applications don’t set up their own networks, they are deployed onto something and connected into something with a parallel set of tools.  OpenStack deploys via the Nova APIs and connects via Neutron.  But while we have to start our consideration of connection services with what the application code that became our VNF expected, we also have to recognize that in many cases multiple VNFs will create a “virtual device”.  In the real world, the components of such a virtual device wouldn’t be accessible from the outside, and we can’t let them be accessible from the inside either.  Thus, we have to assume that somehow service data paths and address spaces are separated from intra-package among-the-VNF pathways.
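A minimal sketch of that separation, with entirely hypothetical names and addresses: the virtual device built from multiple VNFs keeps its among-the-VNFs plumbing on an internal address space and exposes only its service-facing points to the outside.

```python
# A virtual device built from several VNFs exposes only its service ports;
# the intra-package plumbing lives on a separate, internal address space
# that is unreachable from the service data path.
virtual_device = {
    "name": "vFirewallAppliance",
    "service_network": "203.0.113.0/29",     # visible to the service user
    "internal_network": "192.168.100.0/24",  # invisible outside the package
    "vnfs": {
        "classifier": {"internal_ip": "192.168.100.10"},
        "policy":     {"internal_ip": "192.168.100.11"},
        "forwarder":  {"internal_ip": "192.168.100.12",
                       "service_ip": "203.0.113.2"},  # the only exposed point
    },
}

def exposed_addresses(dev):
    """Only addresses on the service network are reachable from outside."""
    return [v["service_ip"] for v in dev["vnfs"].values() if "service_ip" in v]

reachable = exposed_addresses(virtual_device)
```

In a real deployment the same separation would be realized with distinct virtual networks (in OpenStack terms, separate Neutron networks), but the principle is what matters: the internal components of a virtual device are addressable only from within the package.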

I’m of the view that cloud deployment models (like TOSCA from OASIS) represent the most logical way to describe VNF platform and connection services because NFV will be a kind of cloud-plus, an extension of cloud capabilities to include improvements in management and multi-tenancy but still fundamentally a cloud.  We should not be inventing new constructs like “forwarding graphs” to describe something that already has software tools supporting an alternate model in the real world.  So VNF management has to look more like cloud management at both the platform and connection services level.

But what about “NFV services?”  Does NFV demand that we create a set of services (presumably primarily network-addressable services) that are exercised by VNFs to support NFV’s MANO functions?  If we make that decision, then we have a couple of problems.  First, current code wouldn’t contain the references to these NFV services, so we’d have to retrofit any logic we expected to use as the basis for VNFs.  That could mean either abandoning or forking a lot of open-source projects.  Second, unless we define these NFV services and provide a standard way of interfacing with them, incorporating references to them would make VNFs non-portable.  I believe that a baseline rule for NFV should be my NFV implementation can run, as a VNF, any current network application.  Yes, vendors could provide optional NFV services of their own, but they should always be able to support the baseline—the open-source inventory of network tools we can now run in the cloud or in virtualized data centers.

So that’s a picture of NFV from the VNF perspective.  I’d suggest that operators, journalists, analysts, and everyone who’s interested in building or using NFV tools think about these points.  If we lose sight of the basic mission of NFV—running VNFs—we lose everything.