Can NFV and SDN Standards Learn from the Market?

I’ve commented in a number of my blogs that the standards processes for both SDN and NFV have failed to address their respective issues to full effect.  The result is that we’re not thinking about either technology in the optimum way, and we’re at risk of undershooting the opportunities both could generate.  There are some lessons on what the right way might be out there in the market, too.

One of the most interesting aspects of this swing-and-miss is that the problem may well come simply from the fact that we have “standards processes for both SDN and NFV.”  There’s more symbiosis between the two than many think, and it may be true that neither can succeed without the other.  There’s some evidence of this basic truth in, of all things, the way that OTT giants Amazon and Google have approached the problem.

Amazon and Google are both cloud providers, and both have the challenge of building applications for customers in a protected multi-tenant way.  That sounds a lot like the control and feature hosting frameworks of SDN and NFV.  When the two cloud giants conceptualized their model, their first step was to visualize components of applications running in a virtual network, which they presumed would be an IP subnet based on an RFC 1918 address space.

RFC 1918 is a standard that sets aside some IP address space for “private” or internal use.  These addresses (there’s one Class A network, 16 Class B networks, and 256 Class C networks) are not routed on public networks, and so can’t be accessed from the outside except through NAT.  The presumption of both Amazon and Google is that you build complexes of components into apps or services within a private address space and expose (via NAT) only the addresses that should be accessible from the outside.
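
As a rough sketch of that model (the component names and addresses here are my own illustration, not anything from Amazon’s or Google’s actual tooling), Python’s standard ipaddress module is enough to show the gating decision: components live in RFC 1918 space, and only those explicitly mapped through NAT are reachable from outside.

    import ipaddress

    # The three RFC 1918 private blocks: one Class A, 16 Class B, and 256 Class C networks.
    PRIVATE_BLOCKS = [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
    ]

    def is_private(addr: str) -> bool:
        """True if the address falls inside an RFC 1918 block."""
        ip = ipaddress.ip_address(addr)
        return any(ip in block for block in PRIVATE_BLOCKS)

    def exposed_endpoints(components: dict, nat_map: dict) -> dict:
        """Only components explicitly listed in the NAT map are reachable from outside."""
        return {name: nat_map[name] for name, addr in components.items()
                if is_private(addr) and name in nat_map}

    # Example: two components deployed privately; only the web front-end is NATed out.
    app = {"web": "10.1.0.5", "db": "10.1.0.6"}
    nat = {"web": "203.0.113.10"}
    print(exposed_endpoints(app, nat))   # {'web': '203.0.113.10'}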

Logically this should have been done for management/control in both SDN and NFV, and NFV in particular should have established this private virtual network model for the hosting of VNFs.  The notion of “forwarding graphs” that’s crept into NFV is IMHO an unnecessary distraction from a basic truth that major cloud vendors have accepted from the first.

OpenStack, which most NFV implementations use, has also accepted this model in a general sense; cloud applications are normally deployed on a subnet (via Neutron) and exposed through a gateway.  Within such a subnet or private virtual network, application components communicate as they like.  You could still provision tunnels between components where the relationship between elements was determined by provisioning rather than by how the elements functioned, of course, but in most cases complex topologies would be created not by defining them but by how the components of an application/service naturally interrelated.
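
Here’s a minimal sketch of that pattern using the openstacksdk library.  It’s illustrative only: the clouds.yaml profile “mycloud”, the external network “public”, and the CIDR are placeholders for whatever a real deployment uses.

    import openstack

    # Assumes a clouds.yaml profile named "mycloud" and an external network named "public".
    conn = openstack.connect(cloud="mycloud")

    # A private virtual network for the service's components (RFC 1918 space).
    net = conn.network.create_network(name="svc-private")
    subnet = conn.network.create_subnet(
        network_id=net.id, name="svc-subnet",
        ip_version=4, cidr="10.0.1.0/24")

    # A router acts as the gateway: only what is mapped through it is reachable outside.
    public = conn.network.find_network("public")
    router = conn.network.create_router(
        name="svc-gw",
        external_gateway_info={"network_id": public.id})
    conn.network.add_interface_to_router(router, subnet_id=subnet.id)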

A service/application model like the private virtual network model of Amazon and Google could provide a framework in which security, compliance, and management issues could be considered more effectively.  When you create a “VNF” and “host” it, it has to be addressable, and how that happens will both set your risk profile and expose your connectivity requirements.  For example, if you expect a VNFM component of a VNF to access resource information about its own platforms, you’d have to cross over the NAT boundary with that data—twice perhaps if you assume that your resources were all protected in private address spaces too.  This is exactly the situation Google describes in detail in its Andromeda network virtualization approach.

Another lesson to be learned is the strategy for resource independence.  Amazon and Google abstract infrastructure through a control layer, so that hosting and connection resources appear homogeneous.  A collection of resources with a control agent to manage the abstraction-to-reality connection is the way that new resources get presented to the cloud.  NFV doesn’t quite do that.

In NFV, we have four issues with resource abstraction:

  1. A Virtual Infrastructure Manager (VIM) is only now evolving into a general “Infrastructure Manager” model that admits anything into the orchestration/management mix, not just virtualized stuff. Everyone in the operator space has long realized that you need to mix virtual and real resources in deployments, so that generalization is critical.
  2. In the ETSI model, a VIM/IM is a part of MANO, when logically the VIM/IM and the NFVI components it represents should be a combined plug-and-play unit. Anyone who offers NFVI should be able to pair their offering with an IM that supports some set of abstractions, and the result should be equivalent to anything else that offers those abstractions.
  3. You can’t have resource abstraction without a specific definition of abstractions you expect to support. If a given “offering” has hosting capability, then it has to define some specific virtual-host abstraction and map that to VMs or containers or bare metal as appropriate.  We should have a set of required abstractions at this point, and we don’t (a sketch of what an IM advertising such abstractions might look like follows this list).
  4. You can’t manage below an abstraction unless you manage through that abstraction, meaning that abstraction management is decomposed into and composed from management of what’s underneath, what’s realized. Unless you want to assume that actual resources are forever opaque to NFV management and that the pool of resources is managed independently without service correlation, you need to exercise management of realized abstractions through the element that realizes them, the IM.  The current ETSI model doesn’t make that clear.
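
To make items 2 and 3 concrete, here’s a hypothetical sketch (my own structure, not anything drawn from the ETSI documents) of an Infrastructure Manager that advertises the abstractions it can realize, so that any NFVI-plus-IM pairing offering the same abstractions looks interchangeable to orchestration.

    from abc import ABC, abstractmethod

    class InfrastructureManager(ABC):
        abstractions: set = set()          # e.g. {"virtual-host", "virtual-link"}

        @abstractmethod
        def realize(self, abstraction: str, params: dict):
            """Map an abstract request onto whatever this IM actually controls."""

    class CloudIM(InfrastructureManager):
        abstractions = {"virtual-host", "virtual-link"}
        def realize(self, abstraction, params):
            if abstraction == "virtual-host":
                return {"type": "vm", "image": params["image"], "flavor": params["flavor"]}
            return {"type": "overlay-tunnel", "endpoints": params["endpoints"]}

    class LegacyIM(InfrastructureManager):
        abstractions = {"virtual-link"}    # real boxes: no hosting, but it still offers connectivity
        def realize(self, abstraction, params):
            return {"type": "provisioned-circuit", "endpoints": params["endpoints"]}

    def pick_im(managers, abstraction):
        """Orchestration treats any IM that advertises the abstraction as equivalent."""
        return next(m for m in managers if abstraction in m.abstractions)

    ims = [CloudIM(), LegacyIM()]
    print(pick_im(ims, "virtual-host").__class__.__name__)   # CloudIM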

Google’s Andromeda, in particular, seems to derive NFV and SDN requirements from a common mission.  What would Andromeda say about SDN, for example?  It’s an abstraction for execution of NaaS.  It’s also a mechanism for building those private virtual networks.

There are some things NFV could learn from other sources, including the TMF.  I’ve long been a fan of the “NGOSS Contract” approach to management, where management of events is directed to processes through the intermediation of the service contract.  That should have been a fundamental principle for virtualization management from the first.  The TMF also has a solution for defining the myriad of service characteristics without creating an explosion of variables that threatens to bloat all of the parameter files.  IMHO, the ETSI work is at risk of exactly that right now.

For quite a while the TMF data model (SID) has supported the use of “Characteristics”, which means dynamic run-time assignment of variables, a dynamic attribute pattern.  It should be possible, using the TMF approach, to define resource- or service-specific variables and pass them along without making specific by-name accommodation.  What’s required is consistency in production and consumption of the variables, which is needed in any case.
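
Here’s a toy rendering of that Characteristic pattern in Python, just to show the shape of the idea; the class and attribute names are mine, not the SID’s.

    class Entity:
        """Carries variables as named pairs rather than as fixed, by-name schema fields."""
        def __init__(self, name):
            self.name = name
            self.characteristics = {}      # dynamic, run-time attributes

        def set_characteristic(self, name, value):
            self.characteristics[name] = value

        def get_characteristic(self, name, default=None):
            return self.characteristics.get(name, default)

    # A resource-specific variable can be produced by one party and consumed by another
    # without either having compiled-in knowledge of it; consistency in the name is the
    # only contract that matters.
    vnf = Entity("firewall-vnf")
    vnf.set_characteristic("max-sessions", 200000)
    print(vnf.get_characteristic("max-sessions"))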

I think there are plenty of operators who agree these capabilities are needed, at least in some form.  I don’t think they’re going to get them from the NFV ISG because standards groups in any form are dominated by vendors because there are more vendors and because vendors have more money to spend.  They’re not going to get them from the OPNFV effort either, because open-source projects are dominated by contributors (who are most likely vendors for all the reasons just cited).

The New IP Agency might help here by shining a light on the difference between where we are in NFV and SDN and where we need to be.  Most likely, as I’ve said before, vendors will end up driving beneficial changes as well as being barriers.  Some vendors, particularly computer vendors, have nothing to lose in a transition to virtual technologies in networking.  While the incumbent equipment vendors surely do, they can’t sit on their hands if there are other vendors who are happy (for opportunistic reasons) to push things forward.

In some way or another, all these points are going to have to be raised.  They should have been considered earlier, of course, and the price of that omission is that some of the current stuff is sub-optimal to the point where it may have to be modified to be useful.  I think standards groups globally should explore the lessons of SDN and NFV and recognize that we have to write standards in a different way if we’re to gain any benefit in a reasonable time.

Will Mobile Partnerships Address the Mobility Opportunity?

Red Hat’s deal with Samsung to co-develop mobile applications for business (and of course for Red Hat and Samsung hosting) mirrors in many ways a deal made last year between Apple and IBM.  As I said in an earlier blog, I think it’s very likely that the next wave of productivity improvement that will drive IT spending to grow faster than GDP will come from mobility.  Of course, not everyone who makes ice cream at home ends up being Ben and Jerry.  What is it that’s actually needed to drive mobility to its full potential?  Again, I would suggest we need to focus on the differences.

A mobile worker is different from a desktop worker primarily in that they are literally on the job, meaning that the mobile device is with them at every point of activity.  It is not there to support planning for work or researching things for later action, but to support current action.  It’s a portal through which the worker gets what they need, the things that will help do the job.  The worker, naturally, will come to regard the device as a kind of companion or co-worker—“Hey, Siri/Cortana, can you help me with this?”

I think this is the starting point for mobile empowerment.  For a mobile application to be effective, it has to be humanized.  The interface with the device/application has to look as much as possible like a human, conversational interface.  I’m not saying that it has to be speech recognition and speech output, but the option to do that has to be available, and the alternative to speech has to look more like SMS than like traditional PC forms input.

If humanization is the key to mobile empowerment then Samsung should be trying to push forward its rumored deal with Nuance (whose Dragon engine is the functional core of Siri) to acquire better personal assistant technology to replace its own S Voice, widely seen as a poor competitor to Siri or Cortana.  They could easily kill their long-term chances in the Red Hat deal without a strong UI.

Another thing that could kill the Red Hat deal and also hamper Apple/IBM success is my next point, which is that mobile applications have to be totally mission-driven.  While you can at times have a mobile application that takes you through steps, you have to visualize this process of leading as though a co-worker was setting the scene.  In that situation, the primary worker could interrupt to ask questions, go back to an earlier process, ask for confirmation, and so forth.

To me this introduces the notion we could call “taskflow”.  A worker’s “job” is divided into a number of “tasks”.  Traditional IT tends to support the “job” and mobile empowerment should be aimed at supporting the “tasks”.  When a worker begins to use a mobile device for empowerment, they must be able to quickly identify the task they’re undertaking, and the mobile application(s) have to then frame themselves around that task to provide the support.

Tasks have steps, hence the “taskflow” concept.  I can visualize a taskflow engine, designed to support the sequencing of steps in a task, based on a kind of process flow chart.  “Is Valve A on?  If YES do THIS, if NO do THAT.”  You could probably author this sort of thing in BPEL or a similar language, or view it as a kind of web page that just isn’t shown directly.  The important thing is that beyond all the specific sequencing, you still have to support things like “WAIT” or “STOP” or “BACK” or “CHECK” in some form.
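
Here’s a toy sketch of what such a taskflow interpreter might look like, with the step outline expressed as a simple branching table and the global controls honored at every step.  The step names and control words are purely illustrative.

    TASK = {
        "check_valve": {"prompt": "Is Valve A on?", "YES": "close_valve", "NO": "open_valve"},
        "open_valve":  {"prompt": "Open Valve A, then confirm.", "DONE": "verify"},
        "close_valve": {"prompt": "Leave Valve A closed, then confirm.", "DONE": "verify"},
        "verify":      {"prompt": "Is pressure in the green band?", "YES": None, "NO": "check_valve"},
    }

    def run(task, start, answer):
        """answer() is whatever front end the worker has: speech, SMS-style text, and so on."""
        history, step = [], start
        while step is not None:
            reply = answer(task[step]["prompt"]).strip().upper()
            if reply == "STOP":
                return "stopped"
            if reply == "WAIT":
                continue                      # re-present the same step later
            if reply == "BACK":
                if history:
                    step = history.pop()
                continue
            if reply not in task[step]:
                continue                      # unrecognized answer: ask the step again
            history.append(step)
            step = task[step][reply]          # None means the task is finished
        return "task complete"

    # run(TASK, "check_valve", answer=input)  # wire answer() to speech or SMS input in practice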

Within taskflow, a mobile user has to have the steps in the task supported in a way that balances the need for specific visual/audible interaction needed for that specific step with the need to present the step in a contextually relevant and functionally similar way to the steps around it.  An example here is that if you’re driving a worker through the process of tightening a big greasy nut with a big dirty wrench, you probably don’t want to ask the worker to touch “continue” when the step is done.  You also don’t want the worker to signal completion explicitly on one step in one way and on the next in a different way.

Feedback is critical for mobile empowerment, and the most useful feedback of all is where the app can sense the result of the task’s progress in some way.  If a worker is turning a valve, is there a way to tell whether the fluid pressure or flow is responding as expected?  If so, feedback can be provided like “That’s working—keep turning” or “I’m not seeing any change; let’s turn that back and check the valve label again.”

I think that taskflow management is likely to be done in the device itself, meaning that there’s a taskflow interpreter there and an outline of the task steps.  When a process has to be invoked, the interpreter would message a partner agent in the cloud to ask for the step support needed.  If something happens external to the worker, the cloud partner agent would signal the taskflow interpreter to take an action like the “that’s working” or “I’m not seeing” or even “STOP!”
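
A very small sketch of that device/cloud split follows, with invented names and a simulated sensor reading standing in for real telemetry.

    import queue, threading

    # The on-device interpreter owns the step outline; a cloud "partner agent" pushes
    # external events into the same queue that the worker's own answers arrive on.
    events = queue.Queue()

    def cloud_partner_agent(pressure_change):
        if pressure_change > 90:
            events.put(("cloud", "STOP!"))
        elif pressure_change > 0:
            events.put(("cloud", "That's working, keep turning"))
        else:
            events.put(("cloud", "I'm not seeing any change; check the valve label"))

    def device_interpreter():
        source, message = events.get()        # blocks until worker input or a cloud signal
        print(f"[{source}] {message}")

    threading.Thread(target=cloud_partner_agent, args=(42,)).start()
    device_interpreter()                       # prints: [cloud] That's working, keep turning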

We’re not hearing about this sort of thing from the mobile partnerships so far, and that has me worried.  Most of the IT revolutions we’ve had were hampered at first by the fact that we took old practices and transliterated them into new IT technology form without any regard for whether the practices themselves were suited to the technology.  Mobile empowerment isn’t about giving a worker a mobile version of a desktop screen.

We do get some useful insight from some of Google’s work, particularly with its “Andromeda” project for network virtualization.  I think the lesson of Google’s SDN and NFV evolution is that you really have to consider both at the same time, and also that you have to include the cloud in the mix.  Only a combination of the three can generate dynamism at the application/user level.

I also think we have to get out of the label-inspired boxes and understand that this is about the general challenges and opportunities of virtualization.  We should not be talking about “SDN” or “NFV” but about function virtualization (FV).  We can virtualize routing/switching functions, or we can virtualize firewall and NAT functions, or we can virtualize application functions.  We can mix all this virtualization into a highly personalized vision of the world, or of corporate information and human resources.

The mobility-inspired partnerships are on the right track in that they recognize that mobility is likely the demand driver for all of our revolutionary technologies.  They’re on the wrong track if they think that assembling a lot of legacy parts is going to create a revolutionary whole.

NFV Takes a Critical (but not Conclusive) Step Toward the Real World

The end game for any new technology is deployment based on business value, and I’ve noted many times that the biggest challenge faced by NFV is getting out of the lab and into the network.  That’s now starting to happen, at least as a small step, through a Masergy service called Virtual f(n).  It’s not yet a proof point for NFV overall, but it suggests that there are real new service revenues out there for NFV.

Masergy is a private company that’s specializing in cloud networking or perhaps better yet in NaaS.  They’ve been well-regarded in the industry, and as an “alternative” provider of business services have always focused on agile deployment and service enhancement as a differentiator.  Virtual f(n) is a continuation of that approach.

The foundation concept in Virtual f(n) is an agile vCPE box, provided by Overture Networks, in which network features can be hosted.  In the initial deployment, Masergy is hosting a router from Brocade and a firewall from Fortinet.  The Overture box is their 65vSE, which is based on Intel’s Atom CPU, and which Masergy says is one of the very few that can achieve line-rate performance in a service chain application.

The Masergy business model is a big driver for Virtual f(n).  Their customers aren’t concentrated in a few metro areas but rather are scattered over a truly global geography, which makes it highly important to be able to start up and modify new installations without a lot of truck rolls.  With the 65vSE, Masergy can load features into a premises device as needed, and couple these features with the highly agile cloud network service they offer overall.  It seems pretty clear that Masergy intends to go beyond just two virtual functions, though it’s not yet clear just what their next steps will be.

This is an important step forward for NFV, obviously.  Operators have been telling me for two years now that for NFV to roll out they’ll need benefits beyond the original notion of saving money by displacing appliances with COTS-hosted software.  Proving those benefits hasn’t been easy, and in fact Masergy is the first true commercial NFV-based service deployment I know about.

There are two technical things about the Masergy deployment that I think are important.  First, Masergy picked a company that has an open VNF hosting model.  Overture has partners, of course, but they have worked hard to make the 65vSE an open device that’s essentially a locally sited piece of NFVI.  Second, Overture is one of only four vendors that have what I believe to be a truly credible orchestration story as part of their Ensemble suite.  I think both of these played a role in their selection, and I think that in the long run orchestration and management are what will make the Virtual f(n) concept operationally efficient enough to be profitable.

It would be nice to say that this is a giant step toward NFV deployment, but it wouldn’t be a fair comment.  It’s an important step absolutely, and the vCPE-driven or edge-hosted model offers carriers a major advantage in controlling first costs and delivering benefits that scale with costs.  It’s also obviously somewhat specialized to business services, and Masergy is most likely to benefit from it because of new customer turn-ons and changes in service topology.  None of those factors would be enough to create a major NFV explosion, and none have a clear pathway to evolve toward a centralized, cloud-hosted, NFVI that’s still seen as the essential element in NFV future.

What I think this deal does is demonstrate a point I’d made a couple blogs back.  We are now to the point where “standards” are not going to drive NFV, and where NFV PoCs are going to be of very limited value.  There are too many things that have to be done in order to make NFV beneficial enough, and there’s precious little being done to get these things completed.  Vendors, I’ve suggested, will have to take the lead and drive NFV forward by boldly supporting needs even if it means stepping outside the scope of traditional NFV standards.

Overture, for example, has developed a repository-and-analytics management framework that is much closer to the ideal for NFV than the ETSI vision of per-VNF managers.  They’ve also promoted an edge-hosted, generalized-appliance, vision for VNF deployment rather than a centralized pool of resources.  But they’ve done something useful, very useful.

The thing to do now, I think, is watch where Masergy takes Virtual f(n).  I’m particularly interested in what happens with the Brocade/Vyatta router stuff.  Right now, Masergy has a lot of real iron in its networks.  If it takes SDN and NFV principles seriously it will start placing customer-network-specific cloud-hosted router instances inside the cloud at topologically convenient places.  It will start to use Overture’s box to provide multiple Ethernet service virtual terminations for a variety of services.  And if it does all of that it could be the first example of a real SDN/NFV network to create global commercial services.

It could also become the first to prove out the operations questions.  vCPE has an advantage over “traditional NFV” in that it’s loading virtual functions into customer-located-and-dedicated devices.  That simplifies management considerably by eliminating the shared-pool resource management issues that NFV would normally create.  However, it’s very possible to see new VNFs getting deployed that are both more dynamic than edge routers and firewalls would be, and even to see Overture’s orchestration being spread to aspects of interior network control.  If that happens, then Overture might gain some significant experience in operational integration, something that’s absolutely critical for NFV.

So far, none of the players here are declaring this to be the Dawn of a New Age, which I like because it’s still a bit too early to say whether the lessons of Virtual f(n) will be broadly useful in promoting NFV to other operators and in other missions.  However, this is an absolutely critical step from vendors (Brocade, Intel, and Overture) demonstrating that you can boldly go beyond the too-narrow scope of traditional NFV and find gold.

Cloud and NFV Revolution: Arista’s CloudVision and Beyond

One of the persistent challenges SDN and NFV have faced is the conflict between their “revolutionary” label and the pedestrian applications that tend to come to light.  While there’s much new and different that could be done with either technology, most of what is done looks an awful lot like what you could do with standard devices.

One reflection of this conflict came out in a recent network conference, where a question on the loss or contamination of an SDN controller was answered by one provider with the comment that they’d fall back to legacy mode on all the devices—meaning that the SDN features were being used only to augment normal adaptive device behavior.

Arista Networks is an SDN vendor that has at least taken things further than the traditional.  Their high-level strategy has been based on a single distributed network operating system that provides uniform functionality across devices.  Just this week they announced “CloudVision”, an application of their EOS to enterprise cloud computing.  There’s a lot of interesting stuff in CloudVision, not the least of which are the features it exposes that could be valuable—even critical—to NFV.  This, when Arista doesn’t seem to be targeting the NFV space at all.

At a high level, CloudVision is a network automation or orchestration tool.  Arista characterizes it as a kind of third-generation approach to automation, with generation one being a do-it-yourself customized approach and generation two being something based more on DevOps tools that already exist.  Generation three, says Arista, is the network-wide approach CloudVision represents: automation built on a centralized view of network state rather than on device-by-device scripting.

If you see the cloud as a kind of static server-consolidation host, there’s not much that has to be done to networking to make it work.  The challenges arise when you make cloud processes dynamic, meaning that you can move them, replace them, or most of all scale them.  It gets worse when you design applications that create elastic relationships between work and processes, something like the grid computing of old.  If the cloud creates a kind of enormous and flexible virtual computer in the Great Somewhere, the distribution of the components of that virtual computer creates a problem with what we call state.

“State” is another way of saying “process context”.  We have discussions about state every day, when we ask somebody to do something and they say “I’m busy”.  It’s easy to see how you could represent the state of a processing system that has one element—your requested resource.  What if there are two, or a dozen?

The state problem hits network management right between the eyes because a workflow or path or whatever you’d like to call it is the result of a series of stateful behaviors.  Each switch has to participate in a flow through the means of a forwarding table entry.  The state of a flow is the sum of the states of all the switches, and of course all the trunks that connect them.  The fact that there’s nowhere to go to get the real state—or perhaps there are too many places to go—means that you can’t really know the status of the workflow you’re interested in.

Arista builds an abstraction, in a sense, that represents the “virtual” or workflow network.  This abstraction can be pushed down onto devices, and as that happens CloudVision keeps the state of the devices associated with the workflows they support.  It’s now possible to recover the state of the workflow itself, which means it can be managed.  This is why the basic feature of CloudVision, the first one that Arista lists in their documentation, is centralized control of network state.
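
To illustrate the idea (this is not Arista’s API, just the shape of the concept), a central store that binds per-device state to the workflows each device carries can recover a workflow’s state on demand.

    class CentralState:
        def __init__(self):
            self.device_state = {}            # device -> "up" | "degraded" | "down"
            self.bindings = {}                # workflow -> set of devices in its path

        def report(self, device, state):
            self.device_state[device] = state

        def bind(self, workflow, devices):
            self.bindings[workflow] = set(devices)

        def workflow_state(self, workflow):
            """The state of a flow is derived from the states of all the switches in it."""
            states = [self.device_state.get(d, "unknown") for d in self.bindings[workflow]]
            if any(s in ("down", "unknown") for s in states):
                return "down-or-unknown"
            return "degraded" if "degraded" in states else "up"

    cs = CentralState()
    cs.bind("order-entry-flow", ["leaf1", "spine2", "leaf4"])
    for dev, st in [("leaf1", "up"), ("spine2", "degraded"), ("leaf4", "up")]:
        cs.report(dev, st)
    print(cs.workflow_state("order-entry-flow"))   # degraded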

If you have centralized state for your workflow abstractions you can manage SDN services in a way that’s very similar to how networks are typically managed today.  That’s a powerful capability for SDN networks, particularly for enterprises who are used to a given management approach and don’t want to rock the boat.  But as powerful as CloudVision is with this centralized control of network state, I think there’s greater power to be had, and it’s in this greater power stuff we find some areas that NFV and even broader cloud computing use might need.

There’s more to a cloud application than the network.  Distributed components, linked with CloudVision, form a higher-level abstraction than the workflow—they form the workprocess in effect.  If Arista could extend CloudVision as it is—central control of network state—into central control of process state, they could “know” all about an application and its relationship to work.

Imagine a cloud that could know when to autoscale, to replace a component, to reconfigure itself at the process hosting level to accommodate changes in traffic.  Something like that could be done by extending CloudVision to the workprocess level.  Some cloud users are already in a position where something like this would be helpful for work management reasons.

Also imagine a sure, effective, distributed way of handling load balancing, something that could reflect the needs for stateful processes and not just stateless web-like exchanges.  NFV introduces this requirement in every single application where horizontal scaling is proposed.  Without distributed load balancing that can reflect process state, you can’t really scale anything.
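
A toy example of the distinction, with invented instance and session names: stateless requests can go to any instance, but a request belonging to an existing session has to follow the instance that holds its state.

    class StatefulBalancer:
        def __init__(self, instances):
            self.instances = list(instances)
            self.sessions = {}                 # session_id -> pinned instance

        def route(self, session_id=None):
            if session_id and session_id in self.sessions:
                return self.sessions[session_id]        # stateful: stick with the owner
            target = min(self.instances, key=lambda i: i["load"])
            target["load"] += 1
            if session_id:
                self.sessions[session_id] = target      # pin new sessions
            return target

        def scale_out(self, instance):
            # New capacity helps new sessions; existing ones stay pinned unless state moves.
            self.instances.append(instance)

    lb = StatefulBalancer([{"name": "vnf-1", "load": 3}, {"name": "vnf-2", "load": 1}])
    print(lb.route("sess-42")["name"])   # pins sess-42 to the least-loaded instance
    print(lb.route("sess-42")["name"])   # same instance again, because it holds the state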

The point here is that the distributed-state-centrally-represented concept (which is part of Arista’s DNA) is pretty significant for the future of the cloud and for NFV, but you have to extend it from where Arista focuses (the network) upward to that workprocess level.

SDN creates what are effectively virtual devices, distributed over a wide area and made up of many elements.  The state of a service in SDN is dependent on the state of the cooperative device elements that participate.  NFV and the advanced cloud applications create what are effectively virtual devices, too.  They look like giant virtual computers, and their work capabilities have to be represented centrally just as the networks’ capabilities have to be so represented.

I think we’ve missed a lot of this discussion in NFV.  Some NFV supporters, like Metaswitch, have always recognized the need for stateful load balancing in their Project Clearwater IMS implementation.  That’s good, but it would be better if the industry at large understood what Arista does at the network level, and did something about it.

As far as I can see, there’s no commitment on Arista’s part to extend CloudVision up to workprocess state management, but I think it’s something the company should look long and hard at doing.  If they can pull that off, they could be not only the “best” approach to cloud workflow orchestration, they could be the only way to build a truly agile cloud, and to support NFV along the way.

Some General Thoughts on SDN/NFV Security

SDN and NFV security are issues that rank high with network operators and users, but “security” is an issue that ranks high with everyone and ranking doesn’t always equate with rational action.  Of 48 operators I’ve talked with over the last six months, all place SDN and NFV security among their top three issues.  Enterprises, in my past surveys, have also ranked network security in the top three in over 80% of cases.  But only 38% of enterprises in my survey said they had specific and effective strategies for network security, and only four of 48 operators in SDN/NFV said they even had a handle on it.

A big part of the problem of coming to terms with SDN or NFV security is the lack of a really effective model of security issues.  My surveys have shown that enterprises and operators accept that security includes connection security that provides freedom from interception or injection of information on networks, information security that protects information in database form, and software security that protects programs and components from induced bad behavior.  There’s clearly an overlap in the categories, indicating to me that our notion of security is to secure things rather than to address classes of risks.

Another issue with SDN and NFV security is that it’s easy to get wrapped around what could be called the “legacy axle.”  SDN and NFV both produce network services, and all network services have some common security issues—the need to prevent either interception of traffic or unauthorized injection of traffic, for example.  One might be tempted, with both SDN and NFV, to catalog all of the network security issues currently recognized and then rush out to address them in an SDN or NFV context.  I’d submit that exercise could take quite a while, and it might or might not really hit all the SDN/NFV issues.

There might be another way, which is what I’ll call “differential security”.  SDN and NFV are different in certain ways, and those differences are what will generate incremental differences in SDN and NFV security.  If we ensure that SDN and NFV implementations deal with securing their differential issues, then we should end up with at least what we’d have had with legacy services.  If SDN or NFV have to do more, then we’ll need a map of specific security issues to be addressed and mechanisms proposed to address them.

All software-controlled or software-defined network technologies have an obvious security dependence on the software.  Anything that’s going to be loaded and run can immediately start doing stuff, at least some of which might be malicious.  For SDN and NFV, the paramount security concern should be software authentication.  Every piece of software run has to be authenticated, meaning that the software copy must be a true copy of an explicit release by an accredited organization/entity.  Furthermore, every parameter submitted to control the operation of the software must be similarly authenticated.
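As a sketch of the gating logic only (a real system would use an asymmetric signature chain and a hardware-rooted key store rather than the shared-secret HMAC used here for brevity), the check might look like this:

    import hashlib, hmac, json

    def file_digest(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def may_load(image_path, params_path, manifest_bytes, manifest_mac, owner_key):
        """Refuse to load software or parameters that aren't true copies of an accredited release."""
        # 1. The release manifest itself must be authentic (keyed to the network owner).
        expected = hmac.new(owner_key, manifest_bytes, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, manifest_mac):
            return False
        approved = json.loads(manifest_bytes)       # {"filename": "sha256-hex", ...}
        # 2. Both the software image and its parameter file must match approved digests.
        return all(approved.get(p.rsplit("/", 1)[-1]) == file_digest(p)
                   for p in (image_path, params_path))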

There’s a corollary rule here that I think could be almost as important as the primary rule.  No piece of software and no software parameter file can ever be loaded except by the control software of the network owner.  There can be no situations where service-level software can load other service-level software.  If that happens then the authentication process is broken because there’s no guarantee that proper procedures will be followed.

Software authentication is essential but not sufficient to protect SDN or NFV.  The problem is that software-defining as a process has a natural span of control—a given piece of software is entitled to do specific stuff.  To let it do more is to invite errors at best and malice at worst.  Thus, all SDN and NFV software must be sandboxed to ensure that it can’t spread its wings beyond its intended scope.  In most cases this will mean controlling the software’s interfaces, the connections to the outside world.

The most logical way to build a sandbox for SDN and NFV is to presume that you’re going to expose all your interfaces onto a virtual private network or within a private address space, and then explicitly “gate” selected interfaces into the wide world.  You absolutely cannot make your control interfaces accessible in the public address space, and that includes interfaces between virtual components.

This is particularly important for “management interfaces” because NFV in particular presents a risk of “leakage” of untrusted elements into a trust-heavy environment.  VNFs are caught between two worlds; they are not part of the carrier’s NFV software any more than applications bought from third parties are part of a company’s data center.  If management interfaces are exposed within a service then VNFs become a possible portal for malware.

Even in a world of sandboxed, authentic, software we still have some risks of an inside job.  When you can’t prevent a problem, at least identify the culprit reliably so you can punish them as a deterrent.  That means that every operational change made to a software system has to be attributable via a state audit process.  You sign a “change request” and submit it, and the change is then stamped with your signature.  In fact, you could make a strong argument for all management-plane messages to be digitally signed.
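
Here’s a minimal sketch of that signing idea, again using a shared-secret HMAC purely for brevity; in practice you’d want per-user asymmetric keys and an append-only audit log.

    import hashlib, hmac, json, time

    def sign_change(request: dict, user: str, key: bytes) -> dict:
        """Stamp a management-plane change request with the requester's identity."""
        body = dict(request, user=user, timestamp=time.time())
        payload = json.dumps(body, sort_keys=True).encode()
        body["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
        return body

    def verify_change(signed: dict, key: bytes) -> bool:
        """Anyone auditing the change can confirm who authorized it."""
        unsigned = {k: v for k, v in signed.items() if k != "signature"}
        payload = json.dumps(unsigned, sort_keys=True).encode()
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signed["signature"])

    change = sign_change({"action": "update-forwarding", "device": "edge-7"}, "ops-jdoe", b"per-user-key")
    print(verify_change(change, b"per-user-key"))   # True; alter any field and it fails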

Attributability is one of the SDN/NFV topics that I think have gotten short shrift in discussions.  One reason it’s important to talk about it is that not all signature strategies are really capable of being applied to a broad and loosely coupled community of elements.  Yet that’s what SDN and NFV would create.  I’d like to see some strong recommendations in this area.

Boundary issues are my final point.  SDN and NFV have to interwork with legacy technology at various points, and these points represent places where management and control are different on each side of the mirror.  Information like reachability and topology data in IP and even Ethernet may have to be generated or converted at border points, and with this comes the risk of something spoofing itself into creating a failure or even a breach.  To the extent that SDN and NFV create borders, they have to be examined carefully to ensure that they’re secured.

The sum of the SDN/NFV security situation is that SDN and NFV create multi-element “black box ecosystems” that represent device functionality of the past.  We have to ensure that these black boxes don’t create increased levels of risk versus the devices they displace.  Otherwise black boxes become black hats.

How SDN and NFV Impact Netops

The impact of SDN and NFV on OSS/BSS is a topic that obsesses many operators and also a topic I’ve blogged about extensively.  There’s no question it’s important, but there’s another kind of operations too—network operations.  It’s not always obvious, but both SDN and NFV would have profound impacts on network operations and the operations center—the NOC.  Some of the impacts could even threaten the classic model we call “FCAPS” for “Fault, Configuration, Accounting, Performance, and Security”.

In today’s world, network operations is responsible for sustaining the services of the network and planning network changes to respond to future (expected) conditions.  The ISO definition of the management model is the source of the FCAPS acronym and it reflects the five principal management tasks that make up network operations.  For enterprises, this model is pretty much all of operations, since most enterprises don’t have OSS/BSS elements.

To put netops, as many call it, into a broader perspective, it’s a function that’s typically below OSS/BSS and is made up of three layers—element management, network management, and service management.  Most people would put OSS/BSS layers on top, which means that service management on the netops stack is interfacing or interconnecting with the bottom of OSS/BSS.  Operations support systems and business operations “consume” netops services.  Netops practices can be divided by the FCAPS categories, but both enterprises and service providers tend to employ a kind of mission-based framework based more on the three layers.

Virtualization in any form distorts the classic management picture because it breaks the convenient feature-device connection.  A “router” isn’t a discrete device in SDN or NFV, it’s a behavior set imposed on a forwarding device by central intelligence (SDN) or it’s a software function hosted on some VM or in a container (NFV).  So, in effect, we could say that both SDN and NFV create yet another kind of layered structure.  At the bottom is the resource pool, in the middle are the realized virtualizations of functions/features, and at the top are the cooperative feature relationships.  In a general way, the top layer of this virtualization stack maps to the bottom (element) layer of the old netops stack.

It’s easy to apply the five FCAPS disciplines to the old netops stack, or at least it’s something everyone is comfortable with and something that’s well supported by tools.  If SDN and NFV could just as easily map to FCAPS we’d be home free, but it’s pretty obvious that they don’t.

Take the classic “fault”.  In traditional netops, a fault is something that happens to a device or a trunk, and it represents aberrant behavior, something outside its design range of conditions.  At one level this is true for SDN and NFV as well, but the problem is that there is no hard correlation between fault and feature, so you can’t track the issue up the stack.  A VM fails, which means that the functionality based on it disappears.  It may be difficult to tell, looking (in management terms) at the VM, just what functionality that was and where it was being used.
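
A toy binding table shows what’s missing; the resource and service names here are invented.

    # When a VM dies, the management system needs to know which virtualized functions,
    # and therefore which services, it was carrying; without that binding the fault
    # can't be tracked up the stack.
    bindings = {
        "vm-1017": [("vRouter-acme-east", "acme-vpn-service")],
        "vm-2044": [("vFirewall-acme", "acme-vpn-service"),
                    ("vFirewall-beta", "beta-internet-service")],
    }

    def impact_of(resource_fault):
        affected = bindings.get(resource_fault, [])
        return sorted({service for _function, service in affected})

    print(impact_of("vm-2044"))   # ['acme-vpn-service', 'beta-internet-service']
    print(impact_of("vm-9999"))   # [] -- no binding, so no service correlation is possible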

We can still use classic FCAPS, meaning classic netops, with SDN and NFV as long as we preserve the assumption that the top of our SDN/NFV stack is the “element” at the bottom of the ISO model.  That’s what I’ve called the “virtual device” model of the past.  The problem is that when we get to the ISO “element” virtualization has transformed it from a box we can send a tech to work on, into a series of software relationships.  Not only that, most of the relationships involve multi-tenant resource pools and are designed to be self-healing.

One logical response to this problem at the enterprise level is to re-target netops management practices at the virtualization stack, particularly at the pooled resources, and treat the ISO netops stuff almost like an enterprise OSS/BSS.  This could be called asynchronous management because the presumption is that pooled resources would be managed to conform to capacity planning metrics and self-healing processes (scaling, failover) would be presumed to do everything possible for service restoration within those constraints.  A failure of the virtualized version of a service/device would then be a hard fault.

This seems to me to be a reasonable way of approaching netops, but it does open the question of how those initial capacity planning constraints are developed.  Analytics would be the obvious answer, but to get reasonable capacity planning boundaries for a resource pool would require both “service” and “resource” information and a clear correlation between the two.  You’d want to have data on service use and quality of experience, and correlate that with resource commitments and loading states.

Not only that, we’d probably need resource/service correlations to trace the inevitable problems that get past the statistical management of pooled resources.  Everyone knows that absent solid resource commitments per service, SLAs are just probability games.  What happens when you roll snake-eyes?  It’s important in pool planning to be able to analyze when the plans failed, and understand what has to be done (one option being roll the dice again, meaning accept low-probability events) when they do.

There’s also the question of what happens when somebody contacts the NOC to complain about a problem with a network service.  In the past, NOC personnel would have a reasonable chance of correlating a report of a service problem with a network condition.  In a virtualized world, that correlation would have to be based on these same service/resource bindings.  Otherwise an irate VP calling the NOC about loss of service for a department might get a response like, “Gee, it appears that one of your virtual routers is down; we’re dispatching a virtual tech to fix it!”

To take this to the network operator domain now, we can see that if we presume that there exists a netops process and toolkit, and if we assume that it has the ability to track resource-to-service connections, we could then ask the question of whether OSS/BSS needed to know much about things like SDN and NFV.  If we said that the “boundary” element between the old ISO stack and the new virtualization stack was our service/resource border, we could assume that operations processes could work only with these abstract boundary elements, which are technology-opaque.

This then backs into operators’ long-standing view that you could orchestrate inside OSS/BSS, inside network management, or at the boundary of the two.  The process of orchestration would change depending on where you put the function, and the demands on orchestration would also change.  For example, if OSS/BSS “knows” anything about service technology and can call for SDN or legacy resources as needed, lower-level network processes don’t have to pick one versus the other for a given service.  If operations doesn’t know, meaning orchestration is below that level, then lower-level orchestration has to distinguish among implementation options.  And of course the opposite is true; if you can’t orchestrate resources universally at a low level, then the task of doing that has to be pushed upward toward the OSS/BSS in network operators, or perhaps totally out of the network management process of enterprises, into limbo.  There is no higher layer in enterprise management to cede orchestration to, so it would end up being a vendor-specific issue.

This point puts the question of NFV scope into perspective.  If you can’t orchestrate legacy, SDN, and NFV behaviors within NFV orchestration you have effectively called for another layer of orchestration but not defined it or assigned responsibility to anybody in particular.  That not only hurts NFV for network operators, it might have a very negative impact on SDN/NFV applications in the enterprise.

An Operator’s View of Service/Resource Modeling

I had an interesting exchange with a big national carrier on the subject of management integration and unified modeling of services.  I’ve noted in past blogs that I was a fan of having a unified service model, something that described everything from the tip-top retail experience to the lowest-level deployment.  I pointed out that such a model should make management integration easier, but also that it would be possible to have integrated management without a unified model.  You’d need only firm abstractions understood by both sides at critical boundary points.

My operator friend had given some thought to the same point, and reached somewhat different conclusions.  According to the operator, there is a distinct boundary point between service management and resource management.  Above the line, so to speak, you are focused on conformance to an explicit or implied SLA.  It’s about what the user of the service is experiencing, not about the state of the resources underneath.  In fact, you could have significant faults at the resource level that might not generate any perceptible service impact, in which case nothing needs to be done in service terms.

Taking this even further, the operator says that as you move into virtualized resources, where a service feature is created by assembling hardware/software tools that probably don’t directly bear on what that original feature was, in management terms, the boundary between services and resources gets even more dramatic.  So dramatic, says the operator, that you may not want anyone to cross it at all.

Let’s presume that we have a “service layer” and a “resource layer” both of which are modeled for automated deployment and assurance.  The boundary between the two layers is a set of abstractions that are the products of resource behaviors.  An example might be a “VPN” or a “subnet” or a “VLAN”.  The services are built from these abstractions, and it’s the role of the resource layer to make them real.

The operator’s vision here can now be explained a bit more clearly.  A “VPN” abstraction has a set of management variables that are logical for the abstraction.  The service management side has to be able to relate the state of those variables up the line to the customer or the customer service rep.  However, there is no reason for the customer or CSR to dive down through the boundary to see how a given “abstraction property” was derived from the state of resources.  After all, what’s underneath the abstraction’s covers is likely a shared resource pool that you don’t want people diddling in to begin with.
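
A small sketch of that boundary, with invented names and values: the abstraction advertises a fixed set of management variables, and the service layer sees nothing else.

    class VPNAbstraction:
        ADVERTISED = ("availability", "latency_ms", "packet_loss_pct")

        def __init__(self, realizer):
            self._realizer = realizer          # whatever actually builds the VPN: MPLS, SDN, overlay...

        def management_view(self):
            raw = self._realizer.current_state()
            return {k: raw[k] for k in self.ADVERTISED}   # nothing else crosses the boundary

    class SdnRealizer:
        def current_state(self):
            # Derived from controller and pool telemetry; the details stay below the line.
            return {"availability": 0.9999, "latency_ms": 31.0,
                    "packet_loss_pct": 0.02, "tenant_count": 412, "spine_util": 0.63}

    vpn = VPNAbstraction(SdnRealizer())
    print(vpn.management_view())   # only the advertised variables reach the service layer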

There’s a lot to say for this view, and it may be particularly important in understanding the two camps of operations modernization, the “I want to preserve my OSS/BSS processes and tools” and the “I want to start OSS/BSS over from scratch” groups.  If you take my operator friend’s perspective, you can see that things like SDN and NFV can be viewed as “below the line”, down in the resource domain where operations systems and processes that are (after all) primarily about selling and sustaining the service don’t go.

I kind of alluded to this approach in a past blog as the “virtual device” mode, but I think it’s a bit more complicated than the term suggests.  The operator is saying that “networks” are responsible for delivering abstract features from which we construct services.  There may be many ways a given feature could be created, but however many there are and however different they might be, the goal should be to harmonize them to a single common abstraction with a single set of properties.  Management above the line relates those properties to the overall state of the service as defined by the service’s SLA, and management below the line tries to insure that the abstraction’s “abstraction-level agreement” (again, explicit or implied) is met.  Both parties build to the same boundary, almost the way an NNI would work.

The difference between this and what I was thinking of as a virtual device approach is that in general the abstractions would be network behaviors, cooperative relationships involving multiple devices and connections.  My thought was to make “virtual devices” that mapped primarily to real devices.  I have to agree with my operator’s view that the abstraction-boundary model makes more sense than the virtual device model because it fixes the role of OSS/BSS at the service level and lets fulfillment go forward as it has to, given the resources used.

The value of this independence is that “services” know only of network behaviors that are “advertised” at the boundary.  Most new network technologies, including SDN and NFV, are considered primarily as alternative ways of doing stuff that’s already being done.  We have IP and Ethernet networks; SDN gives us a different way to build them.  We have firewalls and NAT and DNS; NFV gives us a different way to deploy those features.  In both cases, though, the service is made up of features and not devices or technologies.

Oracle, interestingly, has been touting something it’s working on in conjunction with the MEF, called lifecycle service orchestration or LSO.  The Oracle model shares much with Alcatel-Lucent’s vision of how operations and NFV or SDN coexist, but interestingly my operator friend says that Oracle, Alcatel-Lucent, and the MEF don’t articulate that critical notion of a boundary abstraction to the satisfaction of at least this particular operator.

The abstraction-boundary approach would make a two-tier management and orchestration model easier to do and set boundaries on implementation that would largely eliminate the risks of not having a single modeling approach to build both sides of the border.  In fact you could argue that it would allow vendors or operators to build service-layer and resource-layer structures using the stuff that made the most sense.  Two models to rule them all instead of one.

Or maybe two dozen?  In theory, anything that could present an abstraction in the proper form to the service layer would be fine as an implementation of the resource layer.  The abstraction model could at least admit to multiple vendor implementations of lower-layer orchestration and resource cooperation.  It could even in theory encourage them because the abstraction and the existence of a higher-layer service model would harmonize them into a single service.  It’s a bit like telling a programmer to write code to a specific functional interface; it doesn’t matter what language is used because from the outside looking in, all you see is the interface.

I’m not quite ready to give up on my “single model” approach, though.  Obviously you can create an abstract-based boundary between services and resources using a single model too; you can create such a boundary anywhere you like whatever model you use.  The advantages of the single model are, first, that you can traverse the model from top to bottom, and second, that you are using the same tools throughout.  I confess that current practices might make traversing an entire model less useful than I’ve believed it would be, but we still have to see how practices adapt to new technologies before we can be sure.

This is the sort of input I think we should be trying to solicit from operators.  There are many out there who are giving serious thought to how all of our revolutionary technologies would work, and many are basing their views on early experimentation.  This sort of thing can only strengthen our vision of the future, and the strength of that vision will be a big part in getting it financed.

What Oracle Teaches Us About the Cloud

Oracle reported their numbers on Wednesday, and the results weren’t pretty by Street standards.  The company missed pretty much across the board, and in particular in Europe.  Oracle blamed foreign exchange for much of their problem, but the general financial-industry consensus is that it’s deeper than that, including dragging hardware, poor execution, and perhaps a lack of an effective cloud strategy overall.  One Street report said that few would doubt that Oracle will be a major cloud player.  That may well be true, but many (including me) doubt what the profit implications of being a major cloud player will turn out to be.

The overwhelming majority of business IT executives and general management teams say that the reason for cloud adoption is reduced cost.  OK, but if that’s true then we also have to ask how vendors and providers will end up making money on the cloud.  Remember that old joke, “I’m losing money on every deal but I’m making it up in volume?”  For any provider of cloud services or equipment, particularly the CFO in such a company, this is a bad joke indeed.

Business spending on IT (compute and networking) is made up of two components; what can be called “budget” spending and what can be called “project” spending.  Budget spending is allocated to sustain the IT operations in place, operations that were previously justified by business improvements they supported.  Project spending is allocated to reap new business benefits by adding IT features or capabilities.

Any buyer would look at this picture and come to two important conclusions.  First, the smartest thing for me to do with budget spending is to reduce my budget by getting a better deal on technology.  In fact, on the average, companies say they would like to see budget spending decline by about 5% year over year.  Second, I need to justify my project spending carefully because it first has to meet my company targets for ROI and second it will be contributing (through modernization and replacement) to budget spending for some time forward.

CIOs and IT planners understand budget spending pretty well; it boils down to getting more bang for your buck.  They generally don’t understand project spending all that well because in many cases they don’t have a good opportunity model in place.  Projects are driven by benefits, and if you can’t identify opportunities to reap benefits you’ll not justify projects.  That means specific benefits, with quantifiable outcomes, not catch-phrases like “agility”.  Oracle’s first problem is that they have not effectively presented project-oriented benefits, so their cloud offerings tend to fall into the budget buying that’s based only on reducing cost.  Oracle is the cost getting reduced.

Historically, project spending has been driven by what could be called “waves”.  When computers came along and batch processing of information contributed to better management reports and operations, we had a wave.  Distributed computing produced a second wave, the minicomputer and real-time applications.  Personal computing produced a third wave.  With each wave, the new compute paradigm was widely socialized in technical media and the essence of the value proposition for that new paradigm was quickly spread through management.  Project spending ramped up, until eventually all of the new benefits were reaped.  At that point, project spending declined and IT spending overall dipped relative to GDP growth.  You can see this sinusoidal set of peaks and valleys for the three waves we’ve had.

The challenge of project spending today is that there is no clear fourth wave, nor IMHO is there a clear way to socialize a candidate for it.  There’s no real trade media insight any more.  The climate of technology today is controlled by budget spending, which you’ll recall is an area where the target is always to lower IT cost.  If the cloud is important, it’s because it lowers IT costs.  Same with SDN, NFV, or whatever.  No new benefits, no new spending, and what you face is a period of commoditization and decline.  Oracle’s second problem is that the natural direction of IT coverage in the media today, the natural focus of CIOs, is cost reduction.

But does it have to be this way?  The interesting thing about past cycles is that each of them has shown about the same peak/valley relationship.  The problem is that while in the past one valley was followed within a few years by a ramp to a new peak, since 2002 we’ve had no such ramping—we’ve stayed near the historically lowest ratio of IT spending growth to GDP growth.  There has been no driver for a new cycle, and because we’ve had over a decade of cost-driven IT, most of the management on both the seller and buyer sides have lost the charts to productivity benefits’ safe harbor.  That includes Oracle.

Part of the problem is that each past productivity cycle, and likely all future cycles, are driven by a new way of supporting workers.  In the past, the change was easily visualized by everyone.  No computing to batch computing—easy to understand.  Batch to real-time—also easy.  Same with remote real time to personal computing.  You can see that each of these trends seem to move computing forward in the worker production cycle—closer to the worker in a physical sense.  But with personal computing we’re there at the worker.  Do you go inside for the next cycle?  You see the visualization problem.

I’ve grappled myself with understanding what the next cycle driver might be.  Clearly it has to have something to do with work and workflow in an IT sense, because the best way to look at past cycles is really not where the computer is but how it plays in the worker’s activity.  My current thinking is that the next cycle would be driven by point-of-activity empowerment.  The difference would be that the worker would no longer even be going to IT, they would be taking IT with them.  Thus, it’s about mobility.

Even if I’m right about mobility, though, it’s not as easy a thing to exploit.  If PCs are a new productivity wave, you buy PCs.  What do you buy if mobility is the next wave?  Mobile phones or tablets aren’t the answer.  First, users have them already—from their company or personally.  Second, the application of PCs to productivity was clear.  Microsoft Office or Adobe’s Creative Suite were the embodiment of PC-based productivity enhancement.  What’s the brass-ring mobile app?  For the first time, perhaps, our next wave is enabled by something (mobility) but is realized in a more complicated way, through a lot of changes in a lot of areas.

Sales stories based on complicated value propositions have always been difficult, and in an age where an insightful online article is about 500 words long, it’s almost impossible to guide planners through mobile-enhanced productivity changes.  Oracle’s sales failure is probably more a marketing failure, because complex new stories have to be told first, broadly, and inspirationally, in the marketing channel.  A salesperson will never be able to drag a buyer out of total ignorance of a benefit case into the state of good customer-hood.

It’s not that Oracle got to the cloud too late, but that they’re apparently (at the marketing and sales levels) getting to it wrong.  At the root of Oracle’s problems is the fact that they’re seeing the future as a cloud transition.  It’s not; it’s a mobile transition that will convert the cloud from a cost-saving-and-revenue-killing strategy to a strategy to gain new benefits and increase total IT spending.  They’re not the only ones with that problem.

The cloud, mobility, and virtualization can change the world and probably will eventually.  The question for companies is whether they’ll be able to navigate those changes, and the long-standing tendency to take an easy sales argument like “Saves money!” and run with it in the near term in the hope of gaining market share is hurting them.  You’ve got to be committed to revolution and not contraction.  True for Oracle, true for us all.

NFV’s Virtual Node Opportunity Could be Significant

I’ve blogged now about “edge-based” and “interior” NFV service opportunities, and in the latter I noted that I was going to treat the case of “interior nodes” separately.  Many of you will probably understand why that is the case, but I hope to show everyone why nodal services are different, and perhaps generate some discussion even among those who’ve known that all along.

A network is a structure designed to generate connectivity through a combination of trunks and nodes.  Nodes provide data passage among trunks, which in the general sense is called “forwarding” and is based on addresses.  Trunks connect the nodes.  In the old days before virtualization, it was pretty obvious what each of these two things was and, most important, where they were.
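
To make “forwarding based on addresses” concrete, here’s a minimal Python sketch (using only the standard ipaddress module; the prefixes and trunk names are invented) of a node picking an outbound trunk by longest-prefix match.  It illustrates the general idea, not any particular router’s logic.

  import ipaddress

  # Hypothetical forwarding table: prefix -> outbound trunk name.
  FORWARDING_TABLE = {
      ipaddress.ip_network("10.0.0.0/8"): "trunk-to-core",
      ipaddress.ip_network("10.1.0.0/16"): "trunk-to-metro-east",
      ipaddress.ip_network("0.0.0.0/0"): "trunk-default",
  }

  def forward(dest_ip: str) -> str:
      """Pick the outbound trunk by longest-prefix match on the destination."""
      addr = ipaddress.ip_address(dest_ip)
      matches = [(net, trunk) for net, trunk in FORWARDING_TABLE.items() if addr in net]
      net, trunk = max(matches, key=lambda m: m[0].prefixlen)  # most specific prefix wins
      return trunk

  print(forward("10.1.2.3"))    # trunk-to-metro-east
  print(forward("192.0.2.10"))  # trunk-default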

The thing we call “virtualization” (again in its most general sense) has changed networking gradually by allowing “virtual trunks” that are really paths composited from segments of media.  The segments can be parallel, like lambdas, or sequential, strung along end to end.  In the OSI model, a given layer offers abstract services to the layer above and creates them within its own domain as needed, so “virtual circuits,” paths, or tunnels are equivalent to physical trunks as far as the layer above is concerned.  We’ve had networks based on that for ages.

We’ve also had virtual nodes, in a sense, through VLAN and VPN services.  These services look to the user like a kind of device and replace the classic node-and-trunk per-user structures.  SDN can augment these virtual-node services because it can provide customized forwarding control that could either structure traditional services in a different way or build new ones with new forwarding rules.  You can also simply host a bridge or router instance in a server or VM and create a software node.
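
As a minimal, library-free sketch of what “customized forwarding control” could look like: rules that match on more than a destination address and map to explicit actions.  The field names, priorities, and actions below are hypothetical, not any real controller’s API.

  from dataclasses import dataclass

  @dataclass
  class FlowRule:
      priority: int
      match: dict    # e.g. {"in_port": 1, "tcp_dst": 80}
      action: str    # e.g. "out:2", "out:service-chain", "drop"

  def apply_rules(rules, packet):
      """Return the action of the highest-priority rule whose match fields
      all agree with the packet's headers; fall back to legacy flooding."""
      for rule in sorted(rules, key=lambda r: -r.priority):
          if all(packet.get(k) == v for k, v in rule.match.items()):
              return rule.action
      return "flood"

  rules = [
      FlowRule(200, {"tcp_dst": 80}, "out:service-chain"),  # steer web flows to a feature
      FlowRule(100, {"in_port": 1}, "out:2"),               # plain port-to-port forwarding
  ]
  print(apply_rules(rules, {"in_port": 1, "tcp_dst": 80}))  # out:service-chain
  print(apply_rules(rules, {"in_port": 1, "tcp_dst": 22}))  # out:2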

The thing that’s common to both legacy and virtual-node models today is that the topology of the network, including the placement of nodes, is fairly static.  You may have a real router or an SDN white box or a software router, and you may let it work adaptively or control forwarding explicitly, but it kind of is where it is.  In theory, NFV could change that, and the question is under what circumstances a change would be useful.

There are a lot of reasons why a network topology could change, the most obvious being that the optimum location for nodes shifts, either because traffic flows change or because the underlying transport properties on which the trunks are based change.  The former is obviously the more “dynamic” driver, but you can see the problem immediately; “traffic flows” in aggregate may not be that dynamic.  On the other hand, suppose we returned to the notion of networks of old, trunks and nodes, and simply prefixed “virtual” to each term?  On a per-user basis, topology optimization would be a lot more useful.
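
Here’s a small sketch of what per-service topology optimization could mean in practice: given entirely hypothetical per-site traffic volumes and latencies to candidate hosting points, pick the placement that minimizes traffic-weighted delay, and re-run the calculation when the flows change.

  # Hypothetical sites, traffic volumes (Mbps), candidate hosting points,
  # and one-way latencies (ms) from each candidate to each site.
  site_traffic = {"siteA": 400, "siteB": 150, "siteC": 50}
  latency_ms = {
      "dc-east": {"siteA": 5, "siteB": 20, "siteC": 35},
      "dc-west": {"siteA": 30, "siteB": 8, "siteC": 12},
  }

  def best_hosting_point(traffic, latency):
      """Place the virtual node where traffic-weighted latency is lowest."""
      def cost(dc):
          return sum(traffic[s] * latency[dc][s] for s in traffic)
      return min(latency, key=cost)

  print(best_hosting_point(site_traffic, latency_ms))  # "dc-east" for this traffic mix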

If we view VPNs and VLANs as being created by special features at Layer 3 and Layer 2 (respectively) in routers and switches, then we lose the value of dynamism by aggregating everything.  If we build private networks, which is what VPNs and VLANs are, in the “old” way with virtual switches and routers, then we personalize the topology to the service and its users.

Virtual trunks of any sort could be married to NFV-deployed virtual routers/switches to create an instant and different model of a virtual private network.  Not only that, the nodes could be moved around, replicated for performance, and so on, provided you could tolerate the disruption in the packet flow.  Obviously a performance problem or a device failure would create a disruption anyway, so it’s a matter of degree.

This model of a virtual private network/LAN could be connected to user sites through a virtual pipe, which would mean their on-ramp router was in the cloud, or through a virtual trunk from a (likely virtual) router hosted on premises.  That could be customer- or carrier-owned.  Since the interior router on-ramp would exist in either case, this looks like what I’ve called the “agent model” of service access; a cloud agent represents each network site and you connect with your agent to get the service.
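
As a rough data-model sketch of how I picture the agent model (all names invented): each site gets a cloud-resident on-ramp object in a nearby data center, and interior services are attached behind it.

  from dataclasses import dataclass

  @dataclass
  class CloudAgent:
      site: str
      hosting_dc: str   # where this site's on-ramp virtual router lives
      services: set

      def attach(self, service: str):
          """Connect the site, via its agent, to an interior service."""
          self.services.add(service)

  # Each site is represented by an agent; the access pipe terminates on the
  # agent and interior services are added behind it.
  agents = {
      "branch-1": CloudAgent("branch-1", "dc-east", set()),
      "branch-2": CloudAgent("branch-2", "dc-west", set()),
  }
  agents["branch-1"].attach("vpn")
  agents["branch-1"].attach("firewall")
  print(agents["branch-1"].services)  # {"vpn", "firewall"} (set order may vary)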

One of the interesting consequences of this model is that connection services live totally in NFV data centers, since you can only host virtual routers where hosting resources are available.  That facilitates the introduction of other NFV services, because you have direct access to user traffic flows from the places where new service features could be placed.  You’d never have to “hairpin” traffic to get to a service chain, for example.
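
A back-of-envelope illustration of the hairpin point, using the rough rule of thumb of about 5 microseconds of fiber propagation per kilometer and purely hypothetical distances:

  # Rough rule of thumb: fiber propagation is about 5 microseconds per km.
  US_PER_KM = 5.0

  def added_round_trip_us(detour_km: float) -> float:
      """Extra round-trip propagation from hairpinning traffic out to a
      service chain hosted detour_km away and back again."""
      return 2 * detour_km * US_PER_KM

  print(added_round_trip_us(0))    # co-located with the virtual router: 0.0
  print(added_round_trip_us(300))  # detour to a site 300 km away: 3000.0 us (3 ms)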

If the model is carried to its logical conclusion and all business virtual network services are hosted this way, you also have a natural way of connecting to applications the operator hosts in their own cloud computing service.  The carrier would have a competitive advantage because they’d have direct connection with customer flows; no additional delay or risk of failure would be introduced if cloud applications were co-located with virtual routers carrying the private network traffic.

There are also, obviously, operational questions that would have to be answered.  One big one is whether multiplying the router instances needed to build private networks (versus providing for them out of shared devices) creates a risk of greater opex.  I think that if we assumed the trunk underlay was engineered for reliability and also had path recovery capabilities, you might actually end up with lower operations costs.  You’d also eliminate some security issues by partitioning traffic by customer/service below the router layer.  But this point needs more work.

We also need work on understanding how the model would apply to multi-user service subnets.  If a service is supported on its own subnet like an application, and if users are gated onto that subnet as needed, how would the fact that the users aren’t from the same company affect the topology and the costs?  That would also help answer the question of how mobile users would impact the VPN/VLAN picture.  Does the operator provide a mobile virtual on-ramp in multiple metro areas, and if so, how would that be priced and how would it affect traffic and operations?

I believe that virtual routers and switches could have a profound impact on network services, starting with VPNs and VLANs, but I also believe that both operators and vendors have tended to think in too linear a way regarding how they could help.  You can never take full advantage of virtualization if you create virtual things that exactly mirror the capabilities of physical devices already in use, then place the virtual things in the same place you put the physical devices.

In some ways, virtual switching and routing deployed by NFV and organized and accessed via SDN could be a killer app for both SDN and NFV.  That would be particularly true where an operator had a lot of new customers or moves, adds, and changes and so would be refreshing infrastructure more regularly.  Of all the NFV opportunities, deployment of virtual nodes has received the least attention.  I think that should change.

Exploring “Natural-Interior” Applications of NFV

I blogged yesterday about the vCPE model of services, talking both about its role in NFV and how it might have a life outside/beyond NFV.  Some of you were interested in what might be said about other new service models, and in particular how “interior” rather than edge models could work.  My information on this topic is more speculative because operators have focused on vCPE and have really looked at only a few “interior” models, but I’m happy to offer what I have and draw some tentative conclusions.

Edge-related services, in my vernacular, are services that are normally associated with devices connected at or near the point of service connection.  Interior services are thus the opposite: ones that don’t normally look like an extension of the access device.  I’m not classifying service chains that support natural-edge functions as “interior services”; they’re virtualized services that happen to be hosted deeper.  For my discussions I want to focus on the natural habitat of the service, not its hosted manifestation.  I also want to exclude, for a later blog, the topic of interior services of routing, switching, and other connection functions.

Operators who have looked at NFV-driven service evolution have identified a number of possibilities for interior services, which I’ll list in order of operator mentions:

  1. Mobility, meaning IMS and EPC. This is the runaway winner in terms of interest.
  2. Content delivery networks (CDN) and the associated elements.
  3. “Distributed” network features such as load balancing and application delivery control.
  4. Hosting services, including web, email, collaboration, and cloud computing.

While there’s widespread interest in the mobility category of interior NFV targets, operators confess that there’s still a lot of confusion in terms of the “what” and the “why”.  Operators have tended to divide into three groups by their primary goal.  The largest is the “agility, elasticity, and cost” group that’s targeting a virtualized replacement of fixed IMS/EPC elements.  This group isn’t seeing a major change in the conceptual structure of IMS/EPC, only elastic hosting of RAN, mobility management elements, HSS elements, and EPC components.  The second “re-architect” group would like to see all the mobility elements redone architecturally based on current insights and current technology.  The final group, “applications”, wants to focus on extending mobile services with applications and look to NFV to host the application components and integrate them with IMS/EPC.

Perhaps the most interesting of the “re-architect” concepts is framing the whole notion of EPC in SDN terms.  Tunneling in the classic EPC sense may not be needed if you can control forwarding explicitly.  In addition, if you presume that EPC components like PGWs and SGWs are virtual and cloud-hosted, might they not end up being adjacent for all practical purposes?  If so, what’s the value of having them as separate entities?
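
As a deliberately over-simplified sketch of the contrast (nothing like a complete EPC, and the node names are invented): classic EPC anchors user traffic in GTP tunnels stitched between eNodeB, SGW, and PGW, while an SDN-style flattening could treat attachment and handover as updates to a per-UE forwarding rule.

  # Classic EPC carries user packets inside GTP tunnels identified by TEIDs.
  # An SDN re-architecture could instead push a forwarding rule keyed on the
  # UE's IP toward its current serving node, so mobility becomes a rule
  # update rather than a tunnel teardown and rebuild.
  forwarding_rules = {}  # ue_ip -> egress toward the UE's current serving node

  def attach(ue_ip: str, serving_node: str):
      forwarding_rules[ue_ip] = serving_node

  def handover(ue_ip: str, new_serving_node: str):
      forwarding_rules[ue_ip] = new_serving_node  # a single state change

  attach("10.20.0.7", "enb-metro-east-3")
  handover("10.20.0.7", "enb-metro-east-9")
  print(forwarding_rules["10.20.0.7"])  # enb-metro-east-9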

The content delivery area is less conflicted but still somewhat diverse in its interests.  It divides into two segments: operators primarily worried about mobile content delivery, and operators looking at IP video primarily as a home viewing offering.  The first group is blending CDN and IMS/EPC virtualization into a single goal, and the second is looking at CDN and perhaps set-top box virtualization.

TV Everywhere is the primary focus of the first group, obviously.  Operators who have multichannel video services are very interested in leveraging their assets, and those who don’t are perhaps looking to partnerships with others who do.  A CDN, IMS-like user certification, and a strong virtual EPC could make TV Everywhere, well, go “everywhere.”

Interest in distributed network features is the least focused, obviously in no small part because the category itself is so diffuse.  For each of the specific feature targets there are two sub-classes: a group that wants to re-architect the concept for the virtual age, and a group that simply wants to host the existing elements on virtual resources.

Some distributed features, like DHCP and DNS, already exist and would likely be simply virtualized and made scalable in some form.  But in both cases it would be helpful to explore how scaling for performance would be handled, given that both DHCP and DNS servers hold state that gets updated.  Other features like load balancing and application performance management (“higher OSI layer” services) really have to be re-architected to be optimally useful when they’re employed in the cloud, particularly when they sit in front of a scalable community of components.
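
One hedged illustration of that scaling question for DHCP: if the server is scaled out, the instances must not hand out the same address twice, and the simplest answer is to shard the pool.  The pool and instance count below are made up.

  import ipaddress

  def partition_pool(pool_cidr: str, instances: int):
      """Shard a DHCP pool across scaled-out server instances so that
      concurrently running copies never offer the same lease."""
      hosts = list(ipaddress.ip_network(pool_cidr).hosts())
      return {f"dhcp-{i}": hosts[i::instances] for i in range(instances)}

  shards = partition_pool("192.168.10.0/28", 2)
  for name, addrs in shards.items():
      print(name, [str(a) for a in addrs])
  # Each instance owns a disjoint slice of the pool; scaling up means
  # re-sharding, which is exactly the kind of state question raised above.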

Many operators believe that the logical structure of these higher-layer, traffic-and-application-aware services is a distributed DPI capability.  Most don’t see DPI being added to every trunk, but rather see it as something that would be hosted in multiple data centers and then used in a cooperative system of forwarding management.  Some believe that DPI would also be coordinated with DNS/DHCP services, particularly for load balancing.

The last category of interior features is the hosted elements of today—things like web servers, mail, SMS/IM, collaboration, and so forth.  Even cloud computing services could fall into this category.  You might find it surprising that this item is last on the list, but the fact is that these services are typically not provided by the regulated network entity, and most NFV planners come from that side of the house.

The questions that these services raise in the network sense are only starting to emerge.  If you assume that the hosted element/service is either “on the Internet” or on a VPN/VLAN, then you could assume that the service simply needs to be made visible in the proper address space.  However, some operators are looking at things like “bicameral email,” where an email server has a VPN presence and a public SMTP presence with a security element between them.  Here we have something that lives in both worlds, the Internet and the VPN.
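
A sketch of the “bicameral” idea expressed as an exposure policy, with invented addresses and ports: the same element keeps its mailbox interfaces on a private VPN address and exposes only SMTP, behind the security element, on a NAT’d public address.

  # Purely illustrative exposure policy for a dual-presence mail element.
  mail_service = {
      "vpn_presence": {
          "address": "10.8.4.25",     # RFC 1918, reachable only on the customer VPN
          "ports": [143, 587],        # mailbox access and submission stay private
      },
      "public_presence": {
          "address": "203.0.113.10",  # NAT-exposed public address (documentation range)
          "ports": [25],              # only inbound SMTP is visible from the Internet
          "front_end": "smtp-security-gateway",  # the security element in between
      },
  }

  def is_publicly_exposed(port: int) -> bool:
      return port in mail_service["public_presence"]["ports"]

  print(is_publicly_exposed(25))   # True: Internet mail exchange works
  print(is_publicly_exposed(143))  # False: mailbox access only from inside the VPN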

That model is of growing interest for hosted elements and applications.  I’ve mentioned the idea of having “application-specific VPNs” created through SDN, where every application or class of application has its own subnet and users are connected onto subnets based on what they’re allowed to access.  This same approach, the creation of what are essentially service-specific subnetworks, would also be valid for NaaS based on SDN technology.  Some technology planners in the operator world think that an SDN/NFV hybrid of connectivity and features is likely what all future services will look like.
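
A minimal sketch of the gating logic behind application-specific VPNs, with hypothetical applications, subnets, and entitlements: each application class gets its own subnet, and a user is attached only to the subnets their entitlements allow.

  # Each application class gets its own (hypothetical) subnet; a user is
  # attached only where an entitlement exists, which is the "users are gated
  # onto subnets" idea expressed as data.
  app_subnets = {
      "erp": "10.64.1.0/24",
      "collab": "10.64.2.0/24",
      "analytics": "10.64.3.0/24",
  }
  entitlements = {
      "alice": {"erp", "collab"},
      "bob": {"collab"},
  }

  def subnets_for(user: str):
      """Return the application-specific subnets this user may join."""
      allowed = entitlements.get(user, set())
      return {app: cidr for app, cidr in app_subnets.items() if app in allowed}

  print(subnets_for("alice"))  # erp and collab subnets
  print(subnets_for("bob"))    # collab only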

This model could unite the edge and interior visions.  It might also be the key to creating a useful mobile services model, one that can exploit things like IoT, location, social context, and so forth.  That may seem counter-intuitive, so let’s look a bit at why that’s true.

If you think about NFV services, and in particular the notion of “interior” models, what you see is the notion of a service that’s almost everything-independent.  It lives out there on its own hosted by some set of network-linked systems and represents connectivity or other experiences that a buyer finds valuable.  The service is delivered by making an edge connection.  We’d think of that connection as being one of a number of fixed-branch-site access pipes because that’s the typical VPN or VLAN model, but it would be perfectly feasible to envision a service as a kind of “meet-me” where users access a service on-ramp and obtain the service when needed.

That model suits mobility in two ways: first, it is easily adapted to a roaming user, and second, it supports the notion of a “personal assistant” or mobile-agency service model.  It also suits the notion of a universal dmarc with NaaS on-ramps ready to be connected when needed, and with NFV-based services also there to be accessed as an application/service-specific subnetwork.  In fact, if the model were in place, users would have a virtual persona in the network that they could contact from any device to obtain the services they count on.

One interesting question to which (like many, I confess) I don’t have a good answer is whether edge and interior models of NFV services can grow naturally together to support this symbiotic model.  So far, trials and use cases for both SDN and NFV are fairly contained, so something like this would be a major change of approach.  But without a vehicle that moves both edge and interior forward, we might end up missing the chance to help both missions, operators and vendors, and the market overall.