Could SD-WAN Be the Most Disruptive Network Technology?

There are a lot of network options out there, and while it’s fun to talk about them in technology terms there may be a more fundamental issue to address.  Network services have always involved multiple layers, and it’s always been convenient to think of infrastructure as being either “connection” or “transport”.  In the classic services of today (Ethernet and IP) and in the venerable OSI model, these two things are both in the operator domain.  That might change, and we might see some (and just maybe, all) services migrate to a more user-hosted connection model.  Which, in turn, could change the network service market dynamic.  And all because of SD-WAN.

Networks provide information transport and connectivity, meaning that they can move bits and address endpoints.  The old conceptualization of network service was simple—everything that consumed a service had an address in the service address space.  It’s not that simple any more, for a variety of reasons.  Nearly every Internet user employs Network Address Translation (NAT) to allow all the home devices to use the Internet without requiring they all have their own unique IP addresses.  Nearly every enterprise employs virtual private networks (VPNs) or LANs (VLANs) because they don’t want their company addresses to be visible on the Internet.

An even broader and more interesting idea is also an old one, which is “tunnel networking”.  If you use traditional “network services” like the Internet or Ethernet for transport, you could build tunnels using some protocol (MPLS, L2TP, PPTP…you name it) and treat these tunnels as though they were wires, my “virtual wire” concept.  That means you could build a connection network on top of a network service, providing “routing” or “switching” across your tunnels just as you might once have done with real private line connections.
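
To make the “virtual wire” idea concrete, here’s a minimal sketch in Python; the names and structures are mine, purely illustrative, and not any product’s API.  It shows an overlay “router” forwarding between sites across tunnels as if the tunnels were physical circuits.

```python
# Minimal sketch: an overlay "router" that treats tunnels as virtual wires.
# Names and structures are illustrative, not any product's API.

class Tunnel:
    """A tunnel over some underlying service (Internet, Ethernet, MPLS...)."""
    def __init__(self, name, underlay, remote_endpoint):
        self.name = name
        self.underlay = underlay          # e.g. "internet", "carrier-ethernet"
        self.remote_endpoint = remote_endpoint

    def send(self, packet):
        # A real implementation would encapsulate the packet
        # (GRE, L2TP, IPsec, VXLAN...) and hand it to the underlay.
        print(f"[{self.name}/{self.underlay}] -> {self.remote_endpoint}: {packet}")

class OverlayRouter:
    """Routes between overlay prefixes using tunnels as if they were wires."""
    def __init__(self):
        self.routes = {}                  # overlay prefix -> Tunnel

    def add_route(self, prefix, tunnel):
        self.routes[prefix] = tunnel

    def forward(self, dest_prefix, packet):
        tunnel = self.routes.get(dest_prefix)
        if tunnel is None:
            raise KeyError(f"no overlay route for {dest_prefix}")
        tunnel.send(packet)

# Two sites joined over different underlying services, one overlay network.
hq = OverlayRouter()
hq.add_route("10.1.0.0/16", Tunnel("t-branch1", "internet", "203.0.113.10"))
hq.add_route("10.2.0.0/16", Tunnel("t-branch2", "carrier-ethernet", "pe-22"))
hq.forward("10.1.0.0/16", "payload-for-branch-1")
```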

Nicira took this a step forward by bundling the “tunnel” protocol and the “routing” processes into a software package and calling it software-defined networking.  The model is incredibly powerful in virtualization, cloud, and NFV applications because it lets you build a bunch of tenant/application networks in parallel and share the real infrastructure among them.

In a broad sense (which isn’t always how vendors present it), Software-Defined WAN (SD-WAN) arguably uses this same model but in one or more different ways.  With SD-WAN the goal may be to collect sites/users onto a single “virtual network” when there is no single common physical network service available to do that.  You could combine people on the Internet with those who had a private VPN or even a VLAN connection.  In some cases, you might create a virtual connection by building multiple parallel tunnels (over different networks or even over the same one, but with diverse routing) and combining them.
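
A hedged sketch of that multi-tunnel idea, with made-up thresholds and tunnel names: the SD-WAN edge steers traffic to whichever of its parallel tunnels currently meets policy, so the bundle behaves like a single virtual connection.

```python
# Illustrative sketch of the SD-WAN "parallel tunnels" idea: several tunnels
# (possibly over different networks) present themselves as one virtual link,
# and traffic is steered to whichever tunnel currently meets policy.
# Thresholds and names are hypothetical.

def pick_tunnel(tunnels, max_loss=0.02, max_latency_ms=80):
    """Return the best tunnel that meets policy, else the least-bad one."""
    eligible = [t for t in tunnels
                if t["loss"] <= max_loss and t["latency_ms"] <= max_latency_ms]
    candidates = eligible or tunnels
    return min(candidates, key=lambda t: (t["loss"], t["latency_ms"]))

paths = [
    {"name": "internet-a", "loss": 0.005, "latency_ms": 45},
    {"name": "internet-b", "loss": 0.030, "latency_ms": 30},
    {"name": "mpls-vpn",   "loss": 0.000, "latency_ms": 70},
]
print(pick_tunnel(paths)["name"])   # steers to a tunnel that meets policy
```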

Finally, the Metro Ethernet Forum has proposed its “Third Network” model, which not surprisingly makes Ethernet connections the physical framework of networking and builds other services by creating some form of overlay network—back to tunnels or virtual paths.  Unlike the other approaches, the MEF model is an inside-out or operator-driven vision, a way of creating infrastructure that takes the most dynamic aspect of networking (connectivity) out of the hands of traditional technology.

If we leave the Internet aside for the moment, it’s easy to see that we could move all current network services to an overlay approach.  The user could be given a choice of hosting their own router/switch elements (as devices, as software, as VNFs), buying internal-to-the-real-network instances from operators, or both.  We could create VPNs that would look as they do today but wouldn’t require Layer 3 services from operators at all.

All of this seems part of a broad trend toward the separation of function and structure.  Oracle just announced a version of its public cloud software designed to be inserted into customers’ own data centers to bypass problems (real regulatory or policy problems, or just executive resistance) to moving key applications into the public cloud.  This frames the notion of “public cloud” not as a business model but as a service technology layer that could then ride on whatever infrastructure technology is optimum.

This division, which corresponds to my last phase of virtual network evolution, is interesting because it could come about both through the actions of network buyers, at least for enterprise services, and through the actions of the network operators.  If service-layer technology is a relatively inexpensive overlay rather than an expensive collection of devices, then the operators might indeed want to promote it.  If operators were to deploy virtual-wire technology in SDN or other form (including the MEF’s “third network”) then it would reinforce the service/infrastructure dualism.

Enterprise buyers could do this on their own, and SD-WAN concepts lead in that direction.  The notion of multi-infrastructure service is at the least a path to infrastructure-independent services, and some implementations (Silver Peak, for example) are explicitly dualistic (or multiplistic) in terms of what they can run on.  These bridge across infrastructures with a service, so they could be used by enterprises to create something like the MEF’s third-network vision even if the operators who are the intended target somehow don’t see the light.

Another force that could influence a move to an overlay model is the managed service provider market.  We already know from NFV experience that MSPs are a growing force, largely because they address a market segment that needs networking but can’t retain (or afford) the skilled labor needed to run a network on its own.  In NFV, MSPs have been able to lead the market for vCPE services because their value proposition is to substitute a managed service for technology that requires internal support.  The same thing could happen with overlay services.

If we look at things this way, then the SD-WAN space could be the most disruptive service technology out there.  It could transform the network model, work from both the supply side and the demand side, and it’s currently largely driven by startups and companies outside the L2/L3 mainstream, whose incumbents have a vested interest in keeping things as they are.  Since the overlay model favors SDN and fiber, it might be the perfect match for a player like ADVA, Ciena, or Infinera, and all of these companies have the technology to promote the notion.  We’ll see if they do.

How to Make SDN and NFV About Zeros Instead of Nines

We chase a lot of phantoms in the tech space, none as dangerous as the old “five-nines” paradigm.  Everyone obsesses about getting reliability/availability up to the standards of TDM.  That’s not going to happen if we do the kind of network transformation we’re talking about.  Five-nines is too expensive to meet, and we don’t have it anyway with mobile services.  What we have to worry about isn’t too few nines, but too many zeros.

Telstra, the Australian telecom giant, has been hammered in the press there for having multiple major outages in just a few days, outages where the number of customers with service was zero.  To me this proves that SDN, NFV, cloud, or other network technology evolutions are going to be judged by customers not by the number of dropped calls or video glitches (who doesn’t see those regularly?) but by the number of no-service periods and the number of impacted consumers.  That’s a whole different game.

The overall effect of our proposed network infrastructure changes would be to centralize and software-ize more things, to move away from the adaptive and unpredictable to the more manageable and from the proprietary and expensive to the commodity.  All of this is being proposed to reduce costs, so it’s ridiculous to think that operators would then engineer in that old five-nines standard.  Packet networks in general, and centralized-and-software networks in particular, are not going to meet that except by over-engineering that compromises the whole business case for change.  That’s not a problem, but what is a problem is the fact that the five-nines debate has covered up the question of the major outage.

One of my enterprise consulting engagements of the past involved a major healthcare company that had a “simple” network problem of delayed packets during periods of congestion.  The problem was that the protocol involved was very sensitive to delay, and when confronted by a delay of more than a couple of seconds it tended to let the endpoints get out of synchronization.  The endpoints then reset, which took down the device/user and forced a restart and recovery process—which generated more packets and created more delay.  What ended up happening was that over ten thousand users, everyone in the whole medical complex, lost service and the vendor could not get it back.  They limped along for days until I showed them that it would be better to drop the packets than delay them.  One simple change and it worked again.

Think now of a central control or management process.  It’s doing its thing, and perhaps there’s a trunk problem or a problem with a data center, and a bunch of VNFs or SDN switches fail.  The controller/lifecycle manager now has to recover them.  The recovery takes resources, which creates a waiting list of service incidents to address, which leaves more switches or VNFs disconnected, which creates more failures…you can see where this goes.

There are quite a few “zero-service-creating” conditions in emerging models of the network.  There are also some pretty profound recovery questions.  If an SDN controller has to talk to switches to change forwarding tables, what happens when the failure breaks the switch-to-controller linkage?  If an NFV domain is being lifecycle-managed by a set of processes, what happens if they get cut off from what they manage?

I’m not a fan of adaptive device behavior as a means of addressing problems, and I’m not proposing that we engineer in the five-nines I’ve been pooh-poohing above.  What I think is clear is that we’ve left out an important concept in our advances in network technology, which is the notion of multi-planar infrastructure.  In the old days we had a control/signaling plane and a data plane.  With SDN and NFV we need to reaffirm these two planes, and add in the notion of a management plane because of lifecycle management dependencies.  The control/signaling plane and the management plane, and the processes that live there, do have to be five-nines or maybe even more, because if they’re not, there’s a risk that a failure will cascade into an explosion of problems that overwhelms remediation by swamping or breaking the signaling/management connectivity.  Then we’re in zero-land.

We don’t really have an explicit notion of signaling/control and management planes in either SDN or NFV.  In SDN, we don’t know whether it would be possible to build a network that didn’t expose operators to having large chunks cut off from the controller.  In NFV we don’t know whether we can build a service whose signal/control/management components can’t be hacked.  We haven’t addressed the question of authenticating and hardening control exchanges.  Financial institutions do multi-phase commit and fail-safe transaction processing, but we haven’t introduced those requirements into the control/management exchanges of SDN or NFV.

What do we have to do?  Here are some basic rules:

  1. Management and control processes have to be horizontally scalable themselves, and the hardest part of that is being able to somehow prevent collisions when several instances of those processes try to change the network at the same time. See my last point below.
  2. Every management/control connection must live on a virtual network that is isolated and highly reliable, not subject to problems with hacking or cross-talk resource competition from the data plane. This network has to connect the instances of management/control processes as they expand and contract with load.
  3. Every control/management transaction has to be journaled, signed for authenticity, and timestamped for action, so we know when we’ve fallen behind and how to handle the case where a resource reports a problem, requests help, and then hears nothing from its control/management process for a protracted period (see the sketch after this list).
  4. There can never be multiple restoration/management processes running at the same instant on the same resources. One process has to own remediation and coordinate with other processes who need it.
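
Here’s a minimal sketch of what rule 3 might look like in practice, using only standard-library primitives; the field names are illustrative and the key handling is deliberately simplified (a real deployment would use PKI rather than a shared secret).

```python
# Sketch of rule 3: every control/management transaction is journaled,
# signed, and timestamped so a receiver can detect staleness or tampering.
# Illustrative only; journal storage and key management are simplified.

import hmac, hashlib, json, time

SHARED_KEY = b"demo-only-key"        # illustrative; a real system would use PKI

def record_transaction(journal, actor, action, params):
    entry = {
        "actor": actor,
        "action": action,
        "params": params,
        "timestamp": time.time(),     # lets receivers detect stale directives
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["signature"] = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    journal.append(entry)             # append-only journal for replay/audit
    return entry

def verify(entry, max_age_seconds=30):
    body = {k: v for k, v in entry.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    authentic = hmac.compare_digest(
        entry["signature"],
        hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest())
    fresh = (time.time() - entry["timestamp"]) <= max_age_seconds
    return authentic and fresh

journal = []
tx = record_transaction(journal, "controller-1", "set-forwarding",
                        {"switch": "sw-17", "rule": "drop-port-9"})
print(verify(tx))   # True while the entry is fresh and untampered
```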

There are two general ways of doing what’s needed.  One is to approach the problem as one of redundant centralization, meaning you stay with the essential SDN/NFV model.  The other is to assume that you really don’t have centralized lifecycle management at all, but rather a form of distributed management.  It’s this second option that needs to be explored a bit, given that the path to the first has already been noted: you apply OLTP principles to SDN/NFV management and control.

If you’re going to distribute lifecycle management, you have two high-level options too.  One is to divide the network infrastructure into a series of control domains and let each of these domains manage the service elements that fall inside them.  The other is to forget “service lifecycles” for the most part, and manage the resource pools against a policy-set SLA that, if met, would then essentially guarantee that the services themselves were meeting their own SLAs.

A resource-management approach doesn’t eliminate the need for management/control, since presumably at least some of the resource-remediation processes would fail and require some SLA-escalation process at the least.  It could, however, reduce the lifecycle events that a service element had to address, and the chances that any lifecycle steps would actually require changes to infrastructure.  That could mitigate the difficulties of implementing centralized management and control by limiting what you’re actually managing and controlling.

The forget-lifecycles approach says that you use capacity planning to keep resources ahead of service needs, and you then manage resources against the capacity plan.  Services dip into an anonymous pool of resources and if something breaks you let resource-level management replace it.  Only if that can’t be done do you report a problem at the service level.

Some services demand the second approach, including most consumer services, but I think that in the end a hierarchy of management is the best idea.  My own notion was to assign management processes at the service model level, with each object in the model capable of managing what happens at its own level, and with each object potentially assignable to its own independently hosted management process.  It’s not the only way to do this—you can apply generalized policy-dissemination-and-control mechanisms too.  But I think that we’re going to end up with a hierarchy of management for SDN and NFV, and that working toward that goal explicitly would help both technologies advance.
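
A rough sketch of that hierarchy, with hypothetical object and event names: each model object remediates what it can at its own level and escalates only what it can’t.

```python
# Sketch of a management hierarchy: each service-model object handles events
# at its own level and escalates the rest. Class and event names are made up.

class ModelObject:
    def __init__(self, name, parent=None, can_fix=()):
        self.name = name
        self.parent = parent
        self.can_fix = set(can_fix)      # event types this level can remediate

    def handle(self, event):
        if event in self.can_fix:
            print(f"{self.name}: remediating '{event}' locally")
        elif self.parent is not None:
            print(f"{self.name}: escalating '{event}'")
            self.parent.handle(event)
        else:
            print(f"{self.name}: reporting '{event}' to the service-level SLA process")

service = ModelObject("VPN-Service")
region  = ModelObject("Region-East", parent=service, can_fix={"capacity-shortfall"})
host    = ModelObject("HostPool-7",  parent=region,  can_fix={"vm-failure"})

host.handle("vm-failure")            # fixed at the bottom of the hierarchy
host.handle("capacity-shortfall")    # escalated one level
host.handle("fiber-cut")             # escalated to the service level
```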

Looking at the New Cisco (and the Reason Behind It)

Cisco’s apparent reorganization and retargeting of its strategic initiatives is certainly newsworthy, given Cisco’s position both as the premier provider of IP technology and as a data center player with its UCS systems.  It’s also newsworthy that Cisco is clearly emphasizing the cloud and IoT over traditional networking missions.  The question is what these newsworthy events mean in the context of the markets, and by association whether Cisco is behaving in a strategic way or just trying a different tactic.

For a decade, Cisco’s implicit story to both enterprises and service providers was simple—more traffic is coming so suck it up and spend on more capacity to carry it.  Forget ROI, forget benefits, even forget costs.  Traffic means transport, period.  As stupid as this may sound when stated as baldly as I’ve stated it, the message has value because it’s simple.  The implicit question being asked of network buyers is “What traffic will you elect not to carry as traffic growth continues?”  That’s a tough question to answer.

I think that two factors have convinced Cisco that their old story was too old for the market.  One is the increased resistance to network spending; the best proof that something is a bad approach is that it’s not working.  The other is the explosion of interest in both SDN and NFV.  Cisco has worked pretty effectively to blunt the immediate impact of both technologies, but in the end they prove that buyers are willing to look outside the box to control cost or improve network ROI.

If all we had to deal with was increased cost pressure, then a lot of Cisco’s old cost-management steps would be enough.  You cut internal cost and waste by consolidating groups who duplicated effort.  You improve cost-efficiency of devices with better semiconductor support.  Those steps have been taken, and they’ve borne some fruit for Cisco, but they apparently have not been enough to future-proof Cisco’s financials.  Thus, we come to the current Cisco moves.

NFV and SDN show that networking in the future will incorporate a lot more hosted functionality rather than just devices.  This isn’t the same thing as saying that the cloud eats the world, a statement attributed to a VC in a news story yesterday.  It’s saying that it’s better to value the functional components of devices than the hardware they run on.  It’s saying not to compete in areas you can’t be competitive, and that are price-driven and margin-less in any event.

If you build a router as a device, your differentiation is largely your software features and anything special you might have in semiconductors to handle the networking-specific tasks.  You don’t then want to have to spend a lot to build a hardware platform that’s not going to be cost-competitive with commodity server platforms just to hold the good stuff.  The long-term direction Cisco is likely plotting is to divide network equipment into two categories—one where the mission justifies customized semiconductor hardware for real feature value, and one where it does not.  You focus then on devices for the former mission, and for the latter you shift to a hosted model.  Modular software/hardware pricing is clearly a step toward this goal, as is a lot of Cisco’s recent M&A.

Security, provider edge, and enterprise networking in general seem to fall into the second of these two categories, meaning that the real value is features and the best way to deliver those features is through a software license.  This is a big shift for Cisco by itself, but one dictated by market direction.  If anyone is going to offer software-license networking, then the cat’s out of the bag and Cisco has to respond.

The fact that essentially all of enterprise networking falls into that second category is why Cisco would incorporate something that’s clearly a provider concept—NFV—into their “DNA” (Digital Network Architecture) announcement.  They need a broad model for the enterprise because the enterprise is likely to be where the impact of the transition from device-resident features to hosted features happens first, and most broadly.

There’s nothing wrong with Cisco categorizing the hosting part as “cloud”, nor of course with the media either accepting Cisco’s slant or adopting it on their own.  Most hosting of network features on servers will involve resource pools, and so the cloud label is fair.  It’s also true, despite some of Cisco’s own marketing messages, that IoT if it succeeds will be more a success for the cloud than for connected devices or traffic.  However, it’s important to go back to that “cloud eats the world” comment, because while the label is fine an underlying presumption of cloud-eating-world could be a major problem for Cisco.

Enterprises will probably not be hosting their network features in the public cloud.  Most of their mission-critical applications won’t be leaving the data center, in fact.  If we apply the term “hybrid cloud” to the hybridization of public and private cloud deployments, we probably have an issue.  If we apply the term to public cloud and private IT, then we’re fine.  Cisco’s risk with DNA is not that they don’t support “the cloud” but that they rely too much on cloud-specific internal IT trends.  Way more servers in enterprise data centers will be running virtualization than private cloud.  Dell/VMware, remember, is the king of enterprise virtualization.  Cisco could position itself into a relative backwater of IT evolution if they’re not careful.

They may also have a risk in pursuing an enterprise-centric vision of hosted functionality rather than a more literal implementation of the formal service provider initiatives like SDN and NFV.  If operators do adopt NFV widely and/or accept and support a cloud-centric IoT vision, recall that my model says it would generate over 100,000 incremental data centers worldwide.  That would be an enormous market to miss out on, and a lot of what operators want in NFV would not be present in an enterprise version of NFV—like OSS/BSS integration.

To my mind, the big question with Cisco’s “cloudification” of strategy is the area of external process integration with cloud-feature lifecycle processes.  If lifecycle management manages only the lifecycle of the new stuff, then it won’t integrate into enterprise IT operations any better than it would into service provider OSS/BSS.  We don’t know at this point just how much external process integration Cisco has in mind, either in DNA or in future service-provider-directed offerings.  If it’s a lot, then Cisco is truly committed to a software transformation.  If it’s only a little, then Cisco may still have its heart in the box age of networking, and that would negate any organizational or positioning changes it announces that pledge allegiance to the cloud.

We also don’t know whether Cisco’s DNA-enterprise vision of NFV will be able to support service-provider goals, and depending on the pace of development of NFV and IoT, those goals may mature faster than the enterprises’ own connectivity needs.  The service provider space is also driving the modern conception of intent modeling and data-driven application architectures, things I think will percolate over into the enterprise eventually.  Can Cisco lead, or even be a player, in these critical technologies if they lag with the market segment that’s driving them?

Cisco is changing, there’s no question about that.  There’s no question the reason is that Cisco knows it has to change, and change quickly.  The only question is whether a company that’s prided itself on being a “fast follower” can transition into being a leader.  It’s very difficult for market leaders to lead market change.

The Evolution of the Metro Network and DCI

What exactly is the data center interconnect (DCI) opportunity, and how does it relate to cloud computing, SDN, and NFV?  That’s a question that should be asked, isn’t asked often, and might have a significant impact on the way we build the most important part of our network.  Obviously it’s going to have an impact on vendors too.

Microsoft kicked this off by announcing an optical partnership with Inphi that would allow direct optical coupling of data center switches, eliminating the need for a separate device.  The result, according to Wall Street, was bad for Infinera because DCI was considered a sweet spot for them.  Infinera isn’t an enormous company so it’s reasonable to expect that any alternative to the Infinera DCI approach would be a threat.  It’s also reasonable to say that direct optical connection between data center switches could cut out a lot of cost and disrupt the DCI market.  Does that matter?

DCI doesn’t make up more than about 7% of metro optical deployment today, and while it’s a growth market in terms of enterprise services, the growth doesn’t compare with what could be expected (under the right circumstances) from other opportunities.  Metro networking is the hotbed of change, not only for optical equipment but also for new technologies like SDN and NFV, and for new services like cloud computing.

Let’s start with the enterprise.  We have today about 500 thousand true enterprise data center locations worldwide.  As these businesses migrate to hybrid cloud (which the great majority will do), about 40% of them will want cloud-to-data-center connectivity at a bandwidth level that justifies fiber.  Not only does that generate about 200k connections, it also encourages public cloud providers to distribute their own data centers to be accessible to enterprises in at least the major metro areas.  My model says that we’d see a major-metro distribution of over 1,500 cloud provider data centers generated from hybrid cloud interest.  All of these get connected to those 200k enterprise data centers, but they also get linked with each other.

Mobile broadband is another opportunity, by far the biggest already and still growing.  Today it accounts for about 55% of all metro fiber deployment, and that number is going to increase to over 65% by 2020.  In technology evolution terms, more and more of mobile fiber deployment is associated with video caching, which is a server activity.  This is shifting the focus of mobile-metro from aggregation to content delivery, from trunking to connecting cloud data centers.

Service provider shifts to NFV would generate an enormous opportunity.  Globally, as I’ve said before, it could add over a hundred thousand data centers, and all of the centers within a given metro area would likely be linked to create a single resource pool.  My model says that this could generate well over 3 million links globally, which is a lot of DCI.  The question, of course, is just how far NFV will deploy given the lack of a broad systemic vision for its deployment.
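
To show where numbers like that come from, here’s a back-of-envelope sketch; the per-metro distribution below is my illustrative assumption, not the published parameters of the model.

```python
# Back-of-envelope sketch of how per-metro meshing multiplies DCI links.
# The metro count is an illustrative assumption, not the model's actual input.

def mesh_links(n):
    """Full-mesh link count among n data centers."""
    return n * (n - 1) // 2

total_dcs = 100_000          # the ~100k incremental NFV data centers cited
metros    = 1_500            # assumed number of major metro areas
per_metro = total_dcs // metros

links = metros * mesh_links(per_metro)
print(per_metro, links)      # roughly 66 DCs per metro -> ~3.2 million links
```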

IoT and contextual services would help drive metro data center deployments, and my model says that the same traffic and activity factors that would justify the 100k NFV data centers are amplified by IoT and contextual services.  Thus, the two drivers would tend to combine to create a large distributed pool of servers, the very kind of thing that would be likely to need more interconnection.  That means that our 3 million links would, under the influence of IoT/contextual evolution, rise to nearly 4.5 million.

Ironically, metro shares NFV’s lack of an ecosystemic focus.  While all the drivers of metro deployment may be evolving toward connecting data centers, the largest driver (mobile broadband) is not only still focused on traffic aggregation, it’s often planned and funded independently.  The very fact that everything converges on metro means that metro is responding not only to different mission pressures (converging or not) but also different administrative/business goals.

Still, there are some things we can see as common metro requirements.  Even DCI missions are surely going to consume multiple optical pathways, and physical meshing of large numbers of data centers would be expensive in terms of laying glass.  A smarter strategy would be to groom wavelengths (lambdas) optically so that you could create low-latency, low-cost pathways that hopped from fiber strand to fiber strand as they transited a metro area.  We’re nearly at the point where DCI could justify this approach.

The “virtual wire” model that I’ve been talking about as a union of optical and SDN-forwarding elements at the electrical layer would evolve out of this approach, and could also in theory drive it.  At the end of the day, lambdas versus SDN paths is a matter of transport bandwidth utilization and relative cost.  You could see an evolution from SDN-virtual-wire to lambda-virtual-wire as traffic increased, or you might see one or more of my previously cited drivers develop fast enough to skip the SDN stage.

The deciding factor here isn’t going to be DCI but metro broadband.  We’re investing a lot of money in metro broadband backhaul already, and 5G and content delivery will increase this.  If operators build out backhaul quickly then they’ll commit to infrastructure decisions before services are highly data-center-focused, and we’ll probably see more SDN deployment besides.  This would combine to keep DCI and mobile broadband from coalescing into a common technical model.  If we were to see CDN focus or one of the other drivers develop, we might see DCI win it all.

That’s really why near-term wins and losses in the DCI space are important.  We may not see any of the drivers I’ve cited develop quickly, and we may then see DCI remain a specialized enterprise-service or IaaS-public-cloud-infrastructure element.  We might also see cloud data center deployment explode, DCI links explode even more, and the whole scheme displace all other metro applications.  That would be huge for operators, and for vendors.

Optical vendors, or at least the pure-play ones, could do little to drive things here because all the compelling drivers of metro evolution lie in accelerating network operator use of the cloud—things like NFV and IoT.  What they could do, technically, is to focus on creating lambda-virtual-wire options and means of evolving to them from SDN-virtual-wire.  Some will see that as being dependent on the optical standards of OpenFlow from the ONF, but we should know by now that low-level specifications aren’t going to drive market transformation.  It may be that a generalized virtual-wire model that works for optics and SDN, and that can be operationalized in concert with all the evolving drivers of data center deployment, is what will win the day, for somebody.

Can We Fix the Technology, Vendor, and Service Silos Growing in NFV?

“E Pluribus Unum” is the motto found on US coins and it means “From Many, One”.  Will that work for NFV?  We may have to find out, because it seems that instead of converging on specifications NFV is flying off in a bunch of different directions.  Is that appearance reality, and if so what’s going to happen?

It is true that there’s more difference developing in NFV than commonality.  Just the fact that we have two open-source orchestration initiatives is interesting enough, but we also have the AT&T ECOMP project, and today HPE, Anuta Networks and Logicalis announced a partnership deal that will link HPE NFV Infrastructure with Anuta orchestration.  HPE, you may recall, has its own orchestration product, and in fact it’s one of the six that’s actually complete.  We also have OPNFV, which might end up doing yet another open-source implementation.

There is one ETSI specification for NFV, and at the detail level it’s fair to say that it’s been weighed and found wanting.  The ETSI spec issue is the core of the explosion of implementations and approaches.  If we had a spec that everyone agreed could make the business case at least close to optimally, I think it would have converged a lot of stuff on a single approach.  Since it doesn’t, what we’re seeing is a combination of competitive opportunism and special-interest focusing of efforts.

You can’t differentiate in a service provider market by abandoning an accepted standard, but if there are major omissions then those will attract differences in approach simply to help sales people make their own strategy stand out.  That’s why, with six implementations of NFV that can make the business case, we have minimal compatibility among them.  Since we’ve now embarked on competitive approaches to the NFV problem, even harmony at the spec level wouldn’t be likely to drive convergence in approaches.   And we’re not going to get specs that will generate harmony any time in the next year.

The targeting factor is more complicated and more pervasive.  We do not, at this point, have anybody who’s presenting operators with an NFV story that would justify wholesale adoption of NFV.  Even the vendors who can make the business case for NFV aren’t doing that on a broad scale because such a deal would take a lot of sales cycles, during which nobody would make quota.  As in all new technologies, though, there are places where NFV can be made to fit more easily.  Not only does focusing on one of these things shorten the selling cycle for those vendors who could do the whole thing, it admits a bunch of other vendors who can’t do it all but who can cobble a service-specific story together.

Given the number of issues we’ve already uncovered in NFV with integration at the infrastructure, function on-boarding, and management levels, it might seem that this explosion of service-specific NFV silos would only add to the angst.  Actually it might help things.

NFV has from the first been a multi-level structure built from abstractions representing service features or behaviors, whether vendors have optimized that principle in implementation or not.  This structural model predates NFV, going back to the days of IPsphere and the TMF’s Service Delivery Framework, and even its SID modeling.  Each of the abstractions is essentially a black box, opaque.

One property of a black-box abstraction is that it can contain a bunch of other abstractions.  That means that a “service” created by one vendor’s NFV solution would contain within it a lot of low-level models that eventually ended up with stuff that represents deployment of resources.  From the top, or the outside, it would still look like a single block in a diagram.  It should be possible, then, to represent the product of silo deployments in a way that allows it to be incorporated as an element in another silo.  Yes, you’d have to jiggle the APIs or data formats, perhaps, but it is possible.
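
A hedged illustration of that nesting point, with invented class and element names: because a service element is opaque from above, an element produced by another vendor’s silo (or another operator) can sit inside a service model behind exactly the same interface.

```python
# Sketch of black-box nesting/federation: a service element is opaque from
# above, so a "foreign" element built by another silo can be wrapped and
# composed just like a local one. Names are hypothetical.

class ServiceElement:
    """An opaque 'block in the diagram'; may contain other blocks."""
    def __init__(self, name, deploy_fn=None, children=None):
        self.name = name
        self.deploy_fn = deploy_fn          # leaf recipe, if any
        self.children = children or []      # nested elements, possibly foreign

    def deploy(self):
        print(f"deploying {self.name}")
        if self.deploy_fn:
            self.deploy_fn()
        for child in self.children:
            child.deploy()                  # same call whether local or federated

# A "foreign" element built by another silo, wrapped behind the same interface.
foreign_vcpe = ServiceElement(
    "vCPE-from-vendor-B",
    deploy_fn=lambda: print("  (handed to vendor B's orchestrator)"))

vpn_service = ServiceElement("Enterprise-VPN", children=[
    ServiceElement("core-VPN", deploy_fn=lambda: print("  (provisioning MPLS VPN)")),
    foreign_vcpe,
])
vpn_service.deploy()
```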

This issue is the one I’ve called “federation” in past blogs, and it could go a long way toward eliminating the lock-ins and other optimality issues associated with silo deployment of NFV; it could also help make the broad NFV business case.  Best of all, it could encourage vendors to think more in terms of integration at a level where the results could actually be applied.

We are not going to fix the sins of omission in the NFV specification process in time to do much good.  We are not going to be able to harmonize the integration of VNFs or even the portability of resources with the current open-source activities either, at least not in time to address operator concerns about profit per bit.  We are going to continue to have low-apple, service-confined, NFV deployments because that’s the only way to balance risk and reward at this point.  We need to focus elsewhere.

The NFV ISG could prioritize effective “service object federation” right now.  The various open-source groups could do likewise.  A third body could undertake the task.  Any way it happens, having a way to link the service-based silos and implementation enclaves that we’re building would help to generate a broad solution.  The problem is that none of these ways are going to move fast enough.

For vendors, there would be an obvious advantage that accrues to those who could absorb other silo objects into their own service models.  That benefit could be the thing that moves vendors to do more on their own here.  Make no mistake; nothing is going to save NFV’s ability to drive better profit per bit except vendor-driven solutions.  Which means that we can only build more silos.  Which means we have to address silo-binding right now before we collapse our emerging NFV house of cards.

Any intent-modeled structure is self-federating, meaning that since the stuff inside a model element is invisible, it could just as easily have been created by some other service group or even another operator.  Operators who have been pushing for intent-modeled NFV could, if they push with a bit more emphasis on federation, force vendors to accommodate that silo-binding mission without much additional effort.

Of course, operators themselves have had tunnel vision with NFV.  Vendors have worried more about protecting any advantage they gain from early participation in silo-building deployments than building the broad business case by creating an NFV ecosystem.  Some, I think, are starting to realize that there’s an issue here, and I hope these players will lead the charge to change.

 

The Latest in the New-Service Modeling Game

Modeling is critical for both SDN and NFV, and I’ve blogged on the topic a number of times.  In particular, I’ve been focusing on “intent models” as a general topic and on the OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications) in particular.  The two, as we’ll see, have a relationship.

A recent set of comments on LinkedIn opened a discussion of models that I think is important enough to share here.  Chris Lauwers of Ubicity offered his view of the modeling world: “TOSCA models are ‘declarative’ in that they describe WHAT it is you’re trying to provision (as opposed to ‘prescriptive’ models that describe HOW you’re going to get there). ONF uses the term ‘intent’ to describe the same concept, and yet others refer to these types of models as ‘desired-state’ models. While there are subtle nuances between these terms, they all effectively describe the same idea. As far as I know, TOSCA is currently the only standard for declarative models which is why it is gaining so much traction. One of the main benefits of TOSCA is that it provides abstraction, which reduces complexity tremendously for the service designer. TOSCA also use the same declarative approach to model services/applications as well as infrastructure, which eliminates the types of artificial boundaries one often sees in various industry ‘reference models’.”

This is great stuff, great insight.  The modeling distinction between prescriptive and declarative is an attribute of the first DevOps tools, and today’s most popular DevOps tools (Chef, which is prescriptive, versus Puppet, which is declarative) still reflect the division.  When you apply the terms to SDN and NFV, there are factors other than DevOps roots that come into play, and these explain the industry shift (IMHO) toward intent modeling.  They also (again, IMHO) illustrate the balance that network/service modeling has to strike.

A prescriptive model, in the context of SDN or NFV, would describe how you do something like provision.  It’s easy to create one of these in the sense that the prescription is a reflection of manual processes or steps that would be needed.  Any time you create a new service or element, you can create a prescriptive model by replicating those manual steps.

Declarative network/service models, in contrast, reflect the functional goal—Chris’s “What?” element.  I’m trying to build a VPN—that’s a declaration.  Because the intent is what’s modeled, this is now often called “intent modeling”.  With intent modeling you can create a model without even knowing how you build it—the model represents a true abstraction.  But to use the model, you have to create a recipe for the instantiation.
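
To make the contrast concrete, here’s an illustrative pair of fragments, expressed as Python data structures rather than TOSCA syntax and with made-up field names: the prescriptive version lists the steps, the declarative version states only the intent and an SLA.

```python
# Two illustrative ways to express "give me a VPN". The prescriptive version
# enumerates the HOW; the declarative/intent version states the WHAT and
# leaves instantiation to a separate recipe. Field names are invented.

prescriptive_vpn = [
    {"step": 1, "do": "allocate PE ports", "sites": ["NYC", "LON", "TOK"]},
    {"step": 2, "do": "configure MPLS LSPs between PE ports"},
    {"step": 3, "do": "apply QoS profile 'gold' to each LSP"},
]

declarative_vpn = {
    "intent": "VPN",
    "sites": ["NYC", "LON", "TOK"],
    "sla": {"availability": "99.9%", "latency_ms": 100},
    # no steps here: a recipe bound to the "VPN" intent decides whether this
    # becomes MPLS, SD-WAN tunnels, or something else entirely
}
```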

Prescriptive equals recipe.  Declarative equals picture-of-the-dish.  The value proposition here is related to this point.  If you were planning a banquet you might like to know what you’re serving and how to arrange the courses before you go to the trouble of working out how to make each dish.  You might also like to be able to visualize two or three versions of a dish, perhaps one that’s gluten-free and another lactose-free, but set up the menu and presentation and change the execution to reflect a diner’s preference.

This is the reason why I think intent modeling and not prescriptive modeling is the right approach to SDN and NFV, and in fact for service modeling overall.  You should be able to manipulate functional abstractions for as long as possible, and resort to the details only where you commit or manage resources.  You should also be able to represent any element of a service (including access) and also represent what I’ll call “resource behaviors” like VPN or VPLS, then combine them in an easy function-feature-assembly way to create something to sell.

What’s messed up a lot of vendors/developers on the intent-declarative approach is the notion that the dissection of intent into deployment has to be authored into the software.  A widget object, in other words, is decomposed because the software knows what a widget is.  That’s clearly impractical because it forecloses future service creation without software development.  But declarative models don’t have to be decomposed this way; you can do the decomposition in terms of the model itself.  The recipe for instantiating a given model, then, is included in the model itself.

This can be carried even further.  A given intent-model represents a piece of a service, right?  It can be deployed, scaled, decommissioned, managed as a unit, right?  OK, then, it has lifecycle processes of its own.  Those processes can be linked to service events within the intent model, so you have a recipe for deployment, sure, but also a recipe for everything else in the lifecycle.  If we then presume that objects adjacent to a given object can receive and generate service events to/from it, we now can synchronize lifecycle processes through the whole structure, across all the intent-modeled elements.
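
Here’s a minimal sketch of that idea, with hypothetical event and recipe names: the intent element carries references to its own lifecycle recipes, and a generic dispatcher simply runs whatever the model names for a given service event.

```python
# Sketch of "recipes live in the model": each intent element carries handlers
# for its own lifecycle events. Event names and structure are hypothetical.

firewall_element = {
    "intent": "Firewall",
    "parameters": {"throughput": "1Gbps"},
    "lifecycle": {
        # each event maps to a recipe reference, resolved at run time
        "deploy":       "recipes/firewall/deploy_vnf",
        "scale_out":    "recipes/firewall/add_instance",
        "fail":         "recipes/firewall/redeploy_and_notify_parent",
        "decommission": "recipes/firewall/teardown",
    },
}

def handle_event(element, event, run_recipe):
    """Dispatch a service event to the recipe the model itself names."""
    recipe = element["lifecycle"].get(event)
    if recipe is None:
        raise ValueError(f"{element['intent']} has no handler for '{event}'")
    return run_recipe(recipe, element["parameters"])

# e.g. handle_event(firewall_element, "fail", my_recipe_runner)
```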

You can even express the framework in which VNFs run as an intent model.  Most network software thinks it’s running in either an IP subnet or simply on a LAN.  The portals to the outside world are no different from VPN service points in terms of specifications.  Your management interfaces on the VNFs can be connected to management ports in this model, so essentially you’re composing the configuration/connection of elements as part of a model, meaning automatically.

Everything related to an element of a service, meaning every intent element model in a service model (“VPN”, “EthernetAccess”, etc.) can be represented in the model itself, as a lifecycle process.  You can do ordering, pass parameters, instantiate things, connect things, spawn sub-models, federate services to other implementations across administrative boundaries, whatever you like.  That includes references to outside processes like OSS/BSS/NMS because, in a data model recipe, everything is outside.  The model completely defines how it’s supposed to be processed, which means that you can extend the modeling simply by building a new model with new references.

This is what makes intent-model integration so easy.  Instead of having to write complex management elements specialized to a virtual network function, and then integrating them with the VNFs when you deploy, you simply define a handler that’s designed to connect to the VNF’s own management interfaces and reference that in the model.  If you have a dozen different “firewalls” you can have one intent-model-object to represent the high-level classification, and that decomposes into the specific implementation needed.

Another interesting thing about intent modeling is that it makes it easier to mix implementations.  Intent models discourage tight coupling between lower and higher levels in a structure, and that in turn means that a model in one implementation can be adapted for use in another.  You could in theory cross silo boundaries more easily, and this could be important for both SDN and NFV because neither standard has developed a full operations vision, which means their benefit cases are likely to be more specialized.  That promotes service-specific deployment and silos.

Even among the vendors who can make a broad operations business case that couples downward to the infrastructure, talking about intent models is rare and detailed explanations of strategy are largely absent.  I hope we see more modeling talk, because we’re getting to the point where the model is the message that matters.

 

It’s Time to Take a New Look at Transformation (Because We’ve Messed Up the Old Way)

If the biggest question asked by kids in a car is “Are we there yet”, it’s a question that could be applied with a small modification to our network transformation activity.  My suggestion is “Are we going to get there?”  The turn of the tide in publicity on NFV and even SDN is clear; we now see a lot of quotes about failures to meet business cases or issues with operations costs or agility.  I’ve suggested that it may be time to jump off the NFV and SDN horse in favor of something like CORD (Central Office Re-architected as a Datacenter) just to get some good ink.  What’s really happening with the SDN and NFV opportunity?

I ran my models over the last couple of days, and interestingly enough the forecasts of the pace of benefit realization from SDN and NFV haven’t changed materially from last quarter’s run.  So if somebody asks you whether SDN or NFV have failed, they have not—providing we judge them by my numbers on their realization of benefits.  They’re still on track.  One thing happening now is that my numbers, which were modest to the point of being unappealing to vendors, are now (without changing) becoming mainstream.

It’s the other thing that’s happening that’s important.  The impact of SDN and NFV on vendors is becoming more unfavorable, and that is probably the reason why we’re seeing a change in the tone of coverage on SDN and NFV.  There are more vendors than operators, more salespeople trying to make quota and disillusioned with their progress.  All that griping is coming to the surface.

The long-term impact of SDN and NFV, implemented properly, would be to realize a reduction of about a third in “process opex”, which equates to about ten cents per revenue dollar.  Right now, operators would be prepared to spend about six cents more to secure those ten cents of benefits, so the opex improvements could increase capex by that much, an increase of more than 27%.  That’s an opportunity that would make the average sales management type salivate, and while these exact numbers aren’t usually shared or even known, it’s why SDN and NFV have been exciting to vendors.

When SDN and NFV first came along, it was generally expected that there would be a period of technology build-up during which costs would actually run ahead of benefits.  Operators know that “first cost”, meaning pre-deployment spending in advance of benefits, is a given in their world.  As a result of this, my original forecast for 2015 and 2016 showed capex actually increasing due to SDN/NFV, and then tapering down after the positioning deployments were complete in 2018.  Obviously we didn’t get SDN or NFV in quantity in either year, but most significantly the SDN and NFV we did get was the “Lite” version that didn’t really change infrastructure much overall.

Had we gotten a strong SDN/NFV benefit story in 2014 we could have expected to see an improvement in operations efficiency and service agility two years earlier than the model now says we will.  That would have created enough opex/agility benefits in 2015 and beyond to largely eliminate pressure by operators to improve profits through capex reduction.  Since we didn’t get the benefits early enough, operators have embarked on a new approach that’s likely to reduce capex, and we’re seeing the signs of it now.

The biggest indicator is the interest in open-source.  Why do we now have four primary open-source activities associated with NFV?  Answer:  Because we didn’t get the right result through specs alone.  Operators have relied on vendors to support transformation, to field offerings that made business cases and to take the risks and absorb the R&D costs associated with transformation.  It’s harder to do that when a technology is expected to reduce spending, because those vendors are now being asked to invest in their future loss, not their future profit.  Operators now want to drive the bus for themselves, and they can’t do that in a standards body because the US and EU (for example) would then accuse them of anti-competitive collusion.  Open source offers a possible solution.

It also creates a pathway that further erodes benefits for the vendors.  Can you make money selling a full-spectrum NFV solution?  Yes, as long as somebody isn’t giving it away, and it’s very obvious that operators are advancing their open-source approach to the critical aspects of SDN and NFV—the high-level service operations and orchestration stuff that makes the business case.  If they succeed, then SDN and NFV will be commoditized where it matters most—where the story meets the budget.

This isn’t a slam dunk, though.  The SDN/NFV vendors could still pull their chestnuts out of the fire, providing they took some critical steps.  Open-source isn’t a rapid-response approach, and it may be years before the right solutions to all the critical problems are in place.  Right now, we don’t see an indicator that open-source projects already out there are addressing everything they need to address.

The first problem is opening the ecosystem.  With both SDN and NFV we have a new architecture that’s composed of a bunch of elements from different sources.  NFV is the most complicated—virtual functions, servers, managers, orchestrators, not to mention operations and management processes and lifecycle tools.  All of this has to fit into a defined structure, and that structure has yet to be defined.  What isn’t defined isn’t going to connect, and that means custom integration and substituting professional services costs for capex.  A vendor could create an open framework broad enough to fit all the elements, and systematize the building of the new network.

Every component of SDN and NFV has a place in an overall model.  There should be a set of APIs that represent all the connections, and these APIs should be defined well enough to translate into software without additional work.  Everything that has to connect to a given API set then has a blueprint for the changes that will be needed to support those connections.  Vendors could produce this, at least those with a fairly complete NFV implementation.
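
As a purely hypothetical illustration of what such a blueprint might look like (these interfaces and method names are invented, not ETSI or vendor APIs), the point is that every element type would implement an agreed contract, so integration becomes conformance rather than custom engineering.

```python
# Hypothetical sketch of "defined APIs for every connection point": if each
# element type implements an agreed interface, fitting into the framework is
# a matter of conformance. These names are invented for illustration only.

from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def reserve(self, resource_spec: dict) -> str: ...
    @abstractmethod
    def release(self, reservation_id: str) -> None: ...

class LifecycleManager(ABC):
    @abstractmethod
    def deploy(self, model: dict) -> str: ...
    @abstractmethod
    def on_event(self, service_id: str, event: dict) -> None: ...

class OperationsBridge(ABC):
    """The outward-facing hooks OSS/BSS or enterprise IT would consume."""
    @abstractmethod
    def report_status(self, service_id: str) -> dict: ...
```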

The second problem is educating the masses.  While it’s an exaggeration to say that vendors think all they need to do to sell SDN or NFV is to go to the operators’ headquarters with an order pad in hand, it’s not that much of one.  In the last month, almost two-thirds of vendor sales personnel who’ve emailed me on their activity say that the problem is the operator doesn’t understand the need for modernization.  When you dig in, you find that means that the vendor sales type doesn’t know how to make a business case, so they’d like the buyer to forget it and just transform on faith.

We need people to understand where NFV and SDN benefits come from and how they can be achieved.  When you say “capex reduction” how much money is really being spent on the equipment you’re impacting, as a percentage of total capex, and what realistic percentage of change can be expected?  If there are assumptions about economies of scale, how much will it cost to reach that scale?  If you’re proposing to reduce opex, what is it now and how is it made up?  You’d be amazed at the small number of vendors who have any idea what opex is today.  Without that, how could you reduce it?

A related issue is riding the horse you created.  The vendors have created the hype wave for SDN and NFV with advertising dollars and favorable editorial participation.  Now it’s clear that this has been a hype wave, and there’s a need to create favorable publicity again and in a different way—like CORD.  But it has to be toward a better goal, which is the framework of SDN and NFV that can actually deploy and make a difference.

This isn’t easy.  This blog will run more than three times as long as the average story on SDN or NFV or CORD, yet it doesn’t begin to explain the real issues and opportunities and benefits.  It would take over 10,000 words to do that, which no reporter is going to write and no editor is going to run.  What those reporters and editors could do is accentuate the real and not just the positive.  Most of what’s said about SDN and NFV has been, and is, absolute crap.  Find what’s not crap and write about it if you’re in the media, and say true things (even if it’s just to play to the climate of reality) if you’re a vendor.

The final problem is setting realistic expectations.  Vendors have to make a choice here.  There is enough opex savings available in 2016 and 2017 to fund a fairly aggressive SDN/NFV deployment, if you can support the right capabilities and if you’re prepared to take the time to make the business case.  The problem is that you won’t see a single dollar of that this year unless you’ve already gotten smart and already started selling the right way.  If you have, none of your customers have recognized it and told me!  On the other hand, if you want to make some money on SDN/NFV today you can do that by whacking at all the low apples.  If that happens you will build nothing extensible and you’ll run out of steam next year, with no chance of improvement thereafter.

Sales people tell me their management expects too much, and they’re right.  I don’t know of a single vendor who has a realistic and optimal strategy for SDN/NFV sales, even the half-dozen who have all the pieces needed.  They see transformation as being self-justifying.  They cite “IP convergence” as an example of why nothing complicated is needed, forgetting that IP convergence was a direct result of consumer data service opportunity created by the Internet.  We had a business case ready-made.  We don’t have that now.

We can still get it.  All the elements of a good business case for transformation are available right now in the marketplace.  All the knowledge needed to apply them is also there.  Yes, it’s going to be harder and yes, you’re going to have to wait a couple years for the money to roll in.  The thing is, you are not winning now as a vendor or as an operator, in the game you’re playing.  Reality has to be better.

 

Learning Some Lessons from AT&T’s ECOMP

Any time a Tier One decides to open the kimono a bit on their planning for transformation it’s important.  AT&T did that with its paper on ECOMP, which stands for “Enhanced Control, Orchestration, Management, and Policy”, and the topic makes the paper doubly important.  As operators seem to be looking to take on more of the heavy lifting in transformation, ECOMP is a signpost to what a major operator thinks it has to do.  Which of course means it’s an exemplar for doing the right thing.

ECOMP, like so many other things we now seem to be seeing, is a top-down, service-lifecycle-focused approach.  AT&T links it to their Domain 2.0 (D2) project, which has been evolving for almost five years now, and which guides infrastructure procurement.  It seems that ECOMP is an indicator that service lifecycles exist outside the pure infrastructure realm; “ECOMP is critical in achieving AT&T’s D2 imperatives” according to the white paper.

As an architecture for D2, ECOMP is more than SDN, more than NFV, and even more than OSS/BSS, and in fact it displaces some of the functions of all of these things.  ECOMP-style service lifecycle management draws lifecycle features from all three, and so it can be seen as a kind of shim that connects the next generation of infrastructure and the next generation of services, facilitating evolution in both the service and infrastructure spaces.

The structure of ECOMP is interesting.  At the top, it’s a multi-dimensional portal that offers both direct access to design and operations functions and an interface through which current OSS/BSS systems and presumably NOC tools and procedures can access information.  There’s also provision for big-data analytics integration at the top.  Below that, on the “left” of the figure, are the new elements ECOMP introduces, primarily in the area of design and policy specifications.  On the “right” are the collection of applications, services, and tools from legacy and new sources that form the engine of ECOMP, under the control of the Master Service Orchestrator (MSO).  Controllers and infrastructure managers fall into this portion.

The main diagram for ECOMP doesn’t name SDN or NFV (NFV’s VNFs are listed as managed element examples) but it’s pretty clear that ECOMP and MSO live well above both these technologies and that legacy management interfaces and the devices they represent are co-equal with the new stuff in terms of creating service resources.  Thus, like Verizon, AT&T is creating a model of future networking that embraces current technology.  That’s in part for evolutionary reasons, but also I think to keep their technology options open and to introduce orchestration for efficiency and agility without committing to major infrastructure changes.

According to the paper, “Orchestration is the function defined via a process specification that is executed by an orchestrator component which automates sequences of activities, tasks, rules and policies needed for on-demand creation, modification or removal of network, application or infrastructure services and resources.”  No technology specificity and no indication of reliance on higher-level OSS/BSS processes.  The process specification drives orchestration.  It’s also clear in a later section that ECOMP so extends NFV specifications as to totally subsume them, creating a higher-level structure that the NFV ISG might have created directly had they taken a top-down approach.

The biggest advance ECOMP specifies, IMHO, is the metadata-driven generic VNF Manager, which presumably eliminates the need for the one-off VNF integration during onboarding that the VNF-specific VNFM concept of the ETSI ISG leads to.  This, says the paper, “allows us to quickly on-board new VNF types, without going through long development and integration cycles and efficiently manage cross-dependencies between various VNFs. Once a VNF is on-boarded, the design time framework facilitates rapid incorporation into future services.”  This concept of metadata-driven VNF management is critical, and while the paper doesn’t say so, it would appear that the same model could be applied to legacy network elements, which means management could be generic overall.
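
To make that concrete, here is a minimal sketch, in Python, of what a metadata-driven generic VNF manager might look like.  The descriptor fields, class names, and lifecycle policy are my own illustrative assumptions, not anything taken from the AT&T paper; the point is only that onboarding becomes a metadata registration rather than an integration project.

    # Hypothetical sketch of a metadata-driven, generic VNF manager.
    # The descriptor carries everything the manager needs; no per-VNF code.
    from dataclasses import dataclass, field

    @dataclass
    class VnfDescriptor:
        name: str
        image: str                      # where the function's software lives
        vcpu: int
        memory_gb: int
        heal_action: str = "restart"    # generic lifecycle policy, not custom code
        depends_on: list = field(default_factory=list)

    class GenericVnfManager:
        def __init__(self):
            self.catalog = {}           # onboarded descriptors, keyed by name

        def onboard(self, descriptor: VnfDescriptor):
            # "Onboarding" is just registering metadata, not a development cycle.
            self.catalog[descriptor.name] = descriptor

        def deploy(self, name: str):
            d = self.catalog[name]
            for dep in d.depends_on:    # cross-VNF dependencies come from metadata too
                self.deploy(dep)
            print(f"deploying {d.name}: image={d.image} vcpu={d.vcpu} mem={d.memory_gb}GB")

        def heal(self, name: str):
            d = self.catalog[name]
            print(f"healing {d.name} via generic policy '{d.heal_action}'")

    mgr = GenericVnfManager()
    mgr.onboard(VnfDescriptor("firewall", "vfw:1.2", vcpu=2, memory_gb=4))
    mgr.onboard(VnfDescriptor("vpn-gw", "vgw:3.0", vcpu=4, memory_gb=8, depends_on=["firewall"]))
    mgr.deploy("vpn-gw")

The essential point of the sketch is that deploy and heal never branch on the VNF type; everything that varies lives in the descriptor, which is why the same logic could in principle cover legacy elements as well.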

AT&T Service Design and Creation (ASDC) is the modeling application that manages all this metadata, and metadata also controls the integration of the components of ECOMP themselves.  Thus, ECOMP is a realization of a data-model-driven approach, the very thing I think the ETSI ISG, the ONF, and the TMF should have worked toward from the first.  It appears that metadata from both the resource side and the service side is model-bound, which makes deployment of services resource-independent as long as the modeling defines conditional selection of infrastructure based on things like location or performance, which appears to be the case.
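
As a further illustration of what resource-independent deployment could mean, here is a small sketch in which the service model carries only constraints like location and performance, and the deployment logic picks whatever candidate infrastructure satisfies them.  The field names and the selection rule are my own assumptions, not anything drawn from ASDC.

    # Hypothetical sketch: the service model names constraints, not resources.
    candidate_pools = [
        {"name": "nj-edge-dc",  "location": "us-east",  "latency_ms": 8,  "free_vcpu": 120},
        {"name": "dallas-core", "location": "us-south", "latency_ms": 22, "free_vcpu": 800},
        {"name": "sv-edge-dc",  "location": "us-west",  "latency_ms": 9,  "free_vcpu": 40},
    ]

    service_model = {
        "service": "business-vpn-site",
        "constraints": {"location": "us-east", "max_latency_ms": 15, "min_vcpu": 16},
    }

    def select_pool(model, pools):
        c = model["constraints"]
        eligible = [p for p in pools
                    if p["location"] == c["location"]
                    and p["latency_ms"] <= c["max_latency_ms"]
                    and p["free_vcpu"] >= c["min_vcpu"]]
        # Prefer the lowest-latency pool among those that satisfy the model.
        return min(eligible, key=lambda p: p["latency_ms"]) if eligible else None

    print(select_pool(service_model, candidate_pools))   # -> the nj-edge-dc entry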

The modeling approach taken by ECOMP seems to draw from a lot of sources.  There are separate service, resource, and product models, which is a feature of the TMF’s SID approach.  ECOMP uses inheritance and some of the cloud-centric features of TOSCA too, and for configuration it uses YANG.  There’s nothing wrong with multiple modeling approaches, though as I’ve said in the past, I believe there are benefits to having a common modeling language down to the level right above the resources themselves.

There’s a lot to like about ECOMP even if you forget for the moment that it’s an operator’s solution to operators’ problems.  There’s also at least a chance (if there’s enough interest, says AT&T) that it will end up open-sourced.  Obviously that would create competition for vendor approaches, but ECOMP could have an impact on vendor solutions whether AT&T opens it up or not.

There is nothing here that a full-spectrum, business-case-ready NFV solution could not provide.  Every one of the six vendors I’ve cited in that space could frame something functionally close enough to ECOMP to be suitable.  It would be tempting to suggest that if vendors had jumped out quickly enough, AT&T might have adopted a vendor approach, but from what I know of ECOMP’s evolution that’s not true.  ECOMP concepts go back to 2013, before any of the NFV implementations from those six vendors had even been announced.

What is relevant at this point is how ECOMP could impact the market dynamic.  If you add it to the CORD interest I’ve recently blogged about, it’s obvious that operators are well into a new stage of SDN and NFV, one where CTOs and standards people have to take a back seat to CFOs, CIOs, and product/service management.  The result of that is a quick upward welling of focus toward the business-case issues.

There are similarities between CORD and ECOMP, mostly at an architecture or goal level.  CORD seems to take things further and in a slightly different direction, helping operators assemble infrastructure by assembling feature-repository offices instead of trying to glue together zillions of technical elements.  The CORD orchestration model, XOS, binds all the functionality and resources.  With ECOMP, the binding and lifecycle processes themselves are the goal, but you probably end up in the same place.

CORD, and now at least for AT&T, ECOMP, represent the new generation in transformation visions.  At least a dozen operators I’ve heard from are looking at something bigger, broader, and more directly aimed at securing benefits than either SDN or NFV.  While some may believe this kind of thinking would kill the two recent industry favorites, the fact is that it could save them.  If SDN and NFV are incorporated in a benefit-capable framework they have a chance.  If not….

If I were a vendor in the NFV space, what I’d do now is jump on CORD and forget any specific worries about SDN and NFV standards.  ECOMP blows kisses at NFV in particular, but it is clearly a redo of an awful lot, and it ignores even more.  I think any vendor who sticks with the ETSI vision as sufficient at this point is dooming themselves.  But you can’t roll your own NFV as a vendor, which means you need to lock onto another architecture.  CORD is then the only game in town.

CORD isn’t complete.  SDN and NFV aren’t complete.  Nothing much is “complete” in the transformation space or we’d have transformed by now.  But ECOMP shows what at least one form of completeness would look like and would include.  It’s close enough to CORD that it can be reached from there.  ECOMP also shows how badly we’ve messed up what we’re saying are our most transformational network technologies.  I hope we don’t mess up the next one too.

Building a Technology and Regulatory Model for SDN/NFV “New Services”

Suppose that network operators and vendors accepted the notion that profitable “new” operator services had to be built in a network-integrated way, like IMS/EPC or CDN?  What would the framework of these services then look like?  How would they have to be offered and used?  It’s time to dig deeper into a network-operator-centric view of services, something not so much “over the top” as on the top—of the network.

We have to start this from a regulatory perspective because that frames how operators have to consume technology and price and offer services.  Right now, regulators tend to see services as “Internet” and “not-Internet”, with very different rules applying to the two.  The most significant truth is that Internet services are bill-and-keep and pretty much everything else is subject to normal global settlement practices—everyone in the service food chain shares the retail pie.  There are two factors that complicate this division—one favorably and the other unfavorably.

The unfavorable one is that regulators tend to warn operators against creating new services outside the Internet that offer Internet-like capabilities.  In the US, the FCC has warned explicitly against that behavior, and in Europe regulators have also signaled their concern in this area.  That means it’s not feasible to think of a new service as being an outside-the-Internet application of IP technology.  If it can be done over the Internet and is offered to consumers, then it’s probably close enough to being an Internet service that separating it based on its mission, and expecting no net neutrality rules to apply, is a dream.

The favorable thing is that regulators have accepted some aspects of Internet service where separation of features from neutrality regulation is already in place.  Service control planes such as those offered in IMS/EPC are one example, and content delivery networks (CDNs) are perhaps a better one.  Most traffic delivered to Internet users is content, and most content delivered is handled by a CDN.  CDN providers (Akamai, for example) are paid for their service by content owners and sometimes by ISPs, so we have a model in CDNs where a network service feature is paid for by someone other than the Internet user who consumes the service the feature is a part of.  That, to me, establishes the CDN as the model for what a new service feature has to look like.

A CDN is a combination of a control-plane behavior (customized decoding of a URL) and a cache, which is a hosting/server behavior.  Think of a CDN as being a set of microservices.  One is used to get an IP address that represents a content element in a logical sense, not a resource that’s tied to a specific IP address, server, etc.  The other is simply a delivery engine for content elements.
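
A minimal sketch of that two-microservice view, purely my own simplification and not how any production CDN is built, might look like the following; one function resolves a logical content name to whichever cache is appropriate for the requester, and the other simply serves the bytes.

    # Hypothetical two-microservice CDN sketch: resolution is a control-plane
    # behavior, delivery is a plain hosting behavior.
    caches = {
        "cache-nyc": {"region": "us-east", "objects": {"/movies/clip42": b"...bytes..."}},
        "cache-sfo": {"region": "us-west", "objects": {"/movies/clip42": b"...bytes..."}},
    }

    def resolve(content_url, client_region):
        # Control-plane microservice: map a logical name to a concrete cache
        # based on where the client is, not on any fixed server address.
        for name, cache in caches.items():
            if cache["region"] == client_region and content_url in cache["objects"]:
                return name
        return None

    def deliver(cache_name, content_url):
        # Data-plane microservice: just hand back the cached object.
        return caches[cache_name]["objects"][content_url]

    cache = resolve("/movies/clip42", "us-east")
    payload = deliver(cache, "/movies/clip42")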

Let’s now look at how this approach could be applied to a new opportunity, like IoT.  For the example, let’s assume you’re in a vehicle driving through a city and you want to get traffic status/guidance.  The CDN model says that you might make an information request (the content URL click analogy) for traffic data ahead of you (the cache analogy).  The control-plane element has to know where “ahead of you” is, and from that knowledge, plus perhaps some policies on how distant “ahead” might be (just because you’re heading north doesn’t mean you want to look at traffic in Greenland!), associate your request with the proper resource.

The approach here, both for CDN and my IoT driving example, can be described as “I know what I want and don’t know where it is”, which is a request for a service and not a resource.  That’s the critical point in a regulatory sense.  Resources are on the Internet; services are integrated with it.  It’s my view that explicit following of that model would be the pathway to solving the regulatory problems with advanced services.

Let’s look at the technical framework here.  We have two components to deal with—a kind of service control plane and a set of feature microservices that are used to compose overall service behaviors behind the scenes.  These two pieces are critical, I think, not only for positioning our network evolution and transformation in a regulatory sense, but also for framing what SDN and NFV should evolve to support.

Both SDN and NFV advocates push their wares in the context of current services; we create “new” services only in the sense that we use new mechanisms.  In a future of “on top” services we would still use new mechanisms, but to support new service behaviors, behaviors that are more agile because they’re signaled dynamically.  In this model, we justify SDN and NFV by the agility they provide, by their enabling of the two-tier control-and-feature-plane model.

This is not to say that we wouldn’t use SDN and NFV for current services, or couldn’t.  The goal would be to frame those services in the on-top model described here.  In short, what we’d be doing is creating a framework for services, a kind of next-gen architecture, that harmonizes with regulatory reality and at the same time builds a model for both current and next-gen services.

The service signaling part of this model can be based on the CDN model and/or on what I’ll call the “IMS model,” meaning the model used in mobile networks for subscriber and mobility management.  What that means is that the signaling plane of a new network could be reached either because it was extended explicitly to the subscriber (phone registration, dialing, movement) or because it was “gatewayed” from the customer data plane (clicking a media URL).

The hosted feature plane of the model would differ from Internet models in that it would not be directly addressable at the customer data plane level.  You can’t pick your own CDN cache point.  Instead, the service feature would be connected/delivered through a gateway, which we could visualize as a kind of virtual microservice.

Let’s look at IoT in this context.  We presume that there is a “traffic” service that is represented on the Internet.  The service offers users the ability to either assess a current route or to optimize a path to a destination from a specified (or current) location.  Our hypothetical driver would exercise this service to see what’s happening ahead, by clicking on what would look something like a content URL.  The service request would be gated into the signaling plane, where a suitable route analyzer can be found.  This gating process could involve access to the customer records, a billing event, etc.

The route analyzer would create a response to the request and return it to the service in the form of the results of the click, just like a CDN does, and the result is then available to either a mobile app or a web page display.  Any of the data paths (except to the customer via the Internet service) could be prioritized, built with SDN.  Any feature could be deployed with NFV.  So we have blended IMS and CDN principles into a single model, used our new network technologies, and created something that could dodge neutrality problems.
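
Pulling the example together, here is a hedged sketch of the gating flow; every name in it (the gateway function, the billing counter, the route analyzer stub) is hypothetical, and it is only meant to show how the customer check, the billing event, and the feature microservice fit into one request path.

    # Hypothetical sketch of the service-gateway flow for the traffic example.
    customers = {"veh-1234": {"plan": "traffic-premium", "billed_events": 0}}

    def route_analyzer(location, heading, horizon_km):
        # Stand-in for the feature microservice: returns a traffic summary
        # for the road ahead of the requester, out to the policy horizon.
        return {"horizon_km": horizon_km, "segments_ahead": 3,
                "worst_delay_min": 12, "advice": "stay on route"}

    def service_gateway(customer_id, location, heading):
        cust = customers.get(customer_id)
        if cust is None:
            return {"error": "unknown subscriber"}   # gating: customer record check
        cust["billed_events"] += 1                   # gating: a billing event
        horizon = 10 if cust["plan"] == "traffic-premium" else 3   # "how far ahead" policy
        # The feature microservice is reached only through the gateway, never directly.
        return route_analyzer(location, heading, horizon)

    print(service_gateway("veh-1234", (40.7, -74.0), "north"))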

“Could”, because if operators or others were to adopt this model the wrong way (letting priority pathways bleed onto the customer data plane) they’d be at risk in at least some major regulatory jurisdictions.  You can’t get greedy here and try to re-engineer the Internet without bill-and-keep and with settlement for traffic handling.

The point of all of this is to demonstrate two things.  First, we can make “new services” work even with neutrality/regulatory barriers in place.  Second, we have to think about these new services differently to get us there.

 

SDN and NFV Pass the Torch to CORD

The interest being shown in ONOS’s CORD architecture (see this Light Reading piece) isn’t a surprise to me (see my own blog on it here), but it’s an indication that CORD might be even more influential because of a singular fact: integration and “packaged solutions” are much more a part of CORD’s DNA.  I don’t agree that it’s a simple cloud CO in a box, but it’s much closer to that than even most proprietary strategies would offer.  That could be important in popularizing cloud support of operator transformation.  I’ve referenced my prior blog for those who’d like a digest of what CORD and the related technologies (ONOS and XOS) are, so I can jump off from that technology summary without repeating it.

Redefining the CO in cloud terms, meaning applying virtualization principles to its infrastructure and services, is a useful way of positioning network evolution.  Adopting SDN or NFV might sound exciting, but network operators have to look at the end game, meaning what they’d end up with.  Enterprises have built their IT architectures from data centers, and operators have built their networks from “serving offices,” the most common of which are the central offices that form the operators’ edge.  That’s a big factor in CORD’s acceptance, but we’re starting to see another factor.

SDN and NFV are both incomplete architectures, meaning that they don’t define enough of the architectural framework to cement their own benefit case.  In fact, neither really defines enough of the management framework in which they’d have to operate to make operators comfortable with SLA management, and NFV doesn’t define the execution framework for virtual functions either.

The fashionable thing to worry about in that situation is that you’d end up with a “pre-standard” implementation.  In the real world of SDN and NFV, the real risk is that you end up in a black hole of integration.  There are too many players and pieces to be fitted together, and the chances of their forming a useful infrastructure that makes the benefit case are near zero unless somebody jiggles all the pieces till they fall into place.  That’s what CORD proposes to do, at least in part.

A cloud-adapted CO is by definition integrated; COs are facilities, after all.  ONOS and CORD have made integrated structures an explicit goal, while the SDN and NFV standards groups have failed integration totally.  However, wishing won’t make it so.  CORD may make integration an explicit goal, but that doesn’t get it realized; it only focuses people on it.  The first question now is how long it will take for that focus to mean something.  Not the last question, though.  We’ll get to that one in time.

What I’ve called VNFPaaS, the execution platform for VNFs that NFV desperately needs, is logically within CORD’s scope, but CORD isn’t there yet.  CORD also needs to deliver the details of the implementation of the Infrastructure Manager intent-model concept that’s critical to resource independence.  Again, it’s an element that’s in scope, which is more than we can say for SDN and NFV.
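
For readers who want a picture of what the intent-model idea implies, here is a minimal sketch under my own assumptions (the class and method names are invented, not from CORD or the ETSI ISG); the caller states what it wants, and each infrastructure manager decides how to realize it, which is exactly the resource independence at issue.

    # Hypothetical intent-model sketch: the caller states what it wants,
    # and each infrastructure manager decides how to realize it.
    from abc import ABC, abstractmethod

    class InfrastructureManager(ABC):
        @abstractmethod
        def realize(self, intent: dict) -> str:
            """Turn an abstract intent into a concrete deployment; return a handle."""

    class CloudHostIM(InfrastructureManager):
        def realize(self, intent):
            return f"vm-instance({intent['function']}, vcpu={intent['vcpu']})"

    class LegacyApplianceIM(InfrastructureManager):
        def realize(self, intent):
            return f"configured-appliance({intent['function']})"

    def deploy(intent, im: InfrastructureManager):
        # Resource independence: the same intent works against either manager.
        return im.realize(intent)

    intent = {"function": "firewall", "vcpu": 2, "sla": {"availability": "99.99"}}
    print(deploy(intent, CloudHostIM()))
    print(deploy(intent, LegacyApplianceIM()))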

What might be helpful in getting to the right place is vendor interest in CORD as a way of packaging their own solutions.  Ciena’s promise of a turnkey CORD implementation is particularly interesting given that Ciena is one of the vendors with all the pieces needed to make an SDN/NFV business case.  Ciena alone could make a difference for SDN and NFV, even if its other five business-case-ready competitors don’t jump on CORD (which they should).

This is where the “how long…” question comes in, though.  Another Light Reading article illustrates the growing cynicism of operators.  Too much NFV and SDN hype for too long has created expectations in the CIO and CEO offices that technologists have not been able to meet.  At one level the cynics are right; both technologies have been mercilessly hyped and the hypothetical (dare we say!) potential has little chance of being met in the real world.  However, the success of something has to be measured, here as always, against the business case and not against conformance to fables full of sound and fury (as Shakespeare said, and perhaps tales told by idiots is also a fair parallel).  Have we so poisoned the well that it no longer matters whether we can make a business case because too much is expected?

That’s the second question I promised to get to.  Will SDN or NFV make the operators into OTTs?  That question is asked by the second Light Reading piece, but it’s not the right one.  Neither SDN nor NFV is needed to do that.  Anyone can be an OTT.  What’s hard is being a network operator.

Let’s forget the OTT goal and focus on reality.  Operators cannot leave their space or there’s no Internet to be on top of.  Operators cannot be profitable above the network they’re losing money in, while competing with others who have no such boat-anchor on their profits.  Google won’t be a network operator; why would they?  They’ll try to scare the media and operators into thinking they might, but they won’t.  So operators are what we have left to carry the water.

SDN and NFV are not about making operators into OTTs; they’re about making networks into something that, if it’s not profitable, is at least not a boat anchor.  What’s needed now is a transformation that improves the profitability of network services.  A lot of that has to be cost management, opex efficiency.  Some can also come from redefining “services” to include higher-layer features (like IMS/EPC and CDN).  Very little will come from new models of selling connection services, which is why it’s fruitless to try to change connection technology without changing connection operations economics.  If it’s not cheaper, there’s not much in connection services that buyers value.

This brings us to the “best of CORD” because if we can’t create a service ecosystem into which optimized pieces can be introduced cheaply, nothing good is going to come out of either SDN or NFV.  The right way to do both, top-down, was not adopted and it’s clear that rising to the top of the problem is beyond both the ONF and the ETSI ISG.  All OPNFV has managed to do is create a platform for NFV to run on, without any feature value to make the business case.  ONOS is at least heading in the right direction, and every vendor and operator backer takes us closer to the point where we reach a critical mass of utility.

And yet we are not there at this point, and we’re running out of time.  SDN and NFV can still be redeemed—I still believe that we can extract significant value from both—but no technology is useful if it reaches its potential when its buyers have moved on.  The lesson of the two Light Reading pieces is that the buyers are moving on, reluctantly.  To CORD for now, which is at least SDN and NFV compatible.  Eventually, if CORD can’t harness the business value of transformed data centers, to something else.  In that case we will have spent a lot of time and money on nothing, and wasted an enormous opportunity.