Could Brocade’s StackStorm Deal Be the Start Of Something?

The acquisition of StackStorm by Brocade raises (again) the question of the best way to approach the automation of deployment and operations processes.  We’ve had a long evolution of software-process or lifecycle automation and I think this deal shows that we’re far from being at the end of it.  Most interesting, though, is the model StackStorm seems to support.  Might Brocade be onto a new approach to DevOps, to service lifecycle management, and to software automation of event processing with a broader potential than just servers and networks?

The end goal for all this stuff is what could be generalized as software-driven lifecycle management.  You deploy something and then keep it running using automated tools rather than humans.  As I’ve noted in past blogs, we’ve had a basic form of this for as long as we’ve had operating systems like UNIX or Linux, which used “scripts” or “batch files” to perform complex and repetitive system tasks.

As we’ve moved forward to virtualization and the cloud, some have abandoned the script model (also called “prescriptive”, “procedural” or “imperative” to mean “how it gets done”) in favor of an approach that describes the desired end-state.  This could be called the “declarative” or model-driven approach.  We have both today in DevOps (Chef and Puppet, respectively, are examples) and while it appears that the declarative model is winning out in the cloud and NFV world, we do have examples of both approaches here as well.

One thing that gets glossed over in all this declaring and prescribing is that software automation of any set of lifecycle processes is really about associating events in the lifecycle with the actions that handle them.  This is an example of classic state/event programming, where lifecycle processes are divided into states and, within each state, each discrete event triggers an appropriate software element.  StackStorm is really mostly about this state/event stuff.
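
To make the state/event idea concrete, here’s a minimal Python sketch of the classic pattern: a table that maps (state, event) pairs to a handler and a next state.  The states, events, and handlers are invented for illustration, not drawn from StackStorm or any other product.

```python
# Hypothetical lifecycle states, events, and handlers for a deployed service element;
# none of this is drawn from StackStorm or any other product.
def start_deploy(ctx): print("allocating resources for", ctx["name"])
def confirm_active(ctx): print(ctx["name"], "is active")
def start_repair(ctx): print("re-deploying failed parts of", ctx["name"])

# Classic state/event table: (current_state, event) -> (handler, next_state)
STATE_EVENT_TABLE = {
    ("ordered",   "activate"): (start_deploy,   "deploying"),
    ("deploying", "deployed"): (confirm_active, "active"),
    ("active",    "fault"):    (start_repair,   "repairing"),
    ("repairing", "deployed"): (confirm_active, "active"),
}

def handle_event(element, event):
    """Dispatch an event against the element's current lifecycle state."""
    handler, next_state = STATE_EVENT_TABLE[(element["state"], event)]
    handler(element)
    element["state"] = next_state

svc = {"name": "vpn-site-1", "state": "ordered"}
for ev in ("activate", "deployed", "fault", "deployed"):
    handle_event(svc, ev)
```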

StackStorm doesn’t fall cleanly into either the declarative or prescriptive models, IMHO.  You focus on building sensors that generate triggers, and rules that bind these triggers to actions or workflows (sequences of actions).  There are also “packs” that represent packaged combinations of all of this that can be deployed and shared.  It would appear to me that you could use StackStorm to build what looked like either a declarative or prescriptive implementation of lifecycle management.  Or perhaps something that’s not really either one.
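
Here’s a toy Python sketch of that sensor/trigger/rule/action pattern as described above.  To be clear, this is not StackStorm’s actual API or rule syntax; the names and structures are invented purely to show how a rule binds a trigger to an action.

```python
# A toy illustration of the sensor/trigger/rule/action pattern described above.
# This is NOT StackStorm's API; names and structures are invented to show the concepts.

def restart_vm(payload):
    print("restarting VM", payload["vm_id"])

# A "rule" binds a trigger type, matching criteria, and an action (or workflow).
RULES = [
    {
        "trigger": "vm.heartbeat_lost",
        "criteria": lambda p: p.get("severity") == "critical",
        "action": restart_vm,
    },
]

def emit_trigger(trigger_type, payload):
    """What a 'sensor' would do: publish a trigger that rules can match."""
    for rule in RULES:
        if rule["trigger"] == trigger_type and rule["criteria"](payload):
            rule["action"](payload)

# A sensor watching a hypervisor might emit:
emit_trigger("vm.heartbeat_lost", {"vm_id": "vnf-fw-03", "severity": "critical"})
```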

All of my own work on automating lifecycle processes has shown that the thing that’s really important is the notion of state/event handling.  A complex system like a cloud-hosted application or an SDN/NFV/legacy service would have to be visualized as a series of functional atoms linked in a specific way.  Each of these would then have its own state/event processing, and each would have to be able to generate events to adjacent atoms to synchronize the lifecycle progression.  All of this appears to be possible using StackStorm.

In effect, StackStorm is a kind of PaaS for software automation of lifecycle processes.  As such, it has a lot of generalized features that would let you do a lot of different things, including addressing virtually any software-based feature deployment and utilizing scripts, tools, and other stuff that’s already out there.  The flip side of flexibility is always the potential for anarchy, and that means that StackStorm users will have to do a bit more to conceptualize their own software lifecycle management architecture than users of something like TOSCA (I think you could implement TOSCA through StackStorm, though).

As a pathway to automating cloud deployment, StackStorm could be huge.  As a mechanism for automating service lifecycle management in SDN, NFV, and even legacy services, StackStorm could be huge.  Played properly, it could represent a totally new and probably better approach to DevOps, ServOps, and pretty much everythingOps.  Of course, there’s that troublesome qualifier “properly played….”

I didn’t find a press release on the deal on Brocade’s site, but the one on StackStorm’s offered only this comment: “Under Brocade, the StackStorm technology will be extended to networking and new integrations will be developed for automation across IT domains such as storage, compute, and security.”  While this pretty much covers the waterfront in terms of the technologies that will be addressed, it doesn’t explicitly align the deal with the cloud, virtualization, SDN, NFV, IoT, etc.  Thus, there’s still a question of just how aggressive Brocade will be in realizing StackStorm’s potential.

The key to this working out for Brocade is the concept of “packs”, but these would have to be expanded a bit to allow them to define a multi-element model.  If that were done, then you could address all of the emerging opportunities with StackStorm.  What would be particularly helpful to Brocade would be if there were some packs associated with SDN and NFV, because Brocade is too reliant on partners or open operator activities to provide the higher-level elements to supplement Brocade’s own capabilities.

It would be interesting to visualize these packs as composable lifecycle management elements, things that could be associated with a service feature, a resource framework, and the binding between them.  If this were to happen, you’d need to have a kind of PackPaaS framework that could allow generalized linkage of the packs without a lot of customization.  This is something where a data framework could be helpful to provide a central way of establishing parameters, etc.

It would also be interesting to have a “data-model-compiler” that could take a data representation of a service element and use it to assemble the proper packs, and thus provide the complete lifecycle management framework.  This would certainly facilitate the implementation of something like TOSCA (and perhaps the ETSI NFV model with the VNF Descriptor) where there’s a data framework defined.
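
As a thought experiment, a “data-model compiler” might look something like the sketch below: it reads a data description of a service and maps each element type to the packs needed to deploy and manage it.  Every name here (the registry, the pack names, the descriptor fields) is hypothetical, not an actual Brocade or StackStorm interface.

```python
# Hypothetical "data-model compiler": take a data description of a service element
# and assemble the packs (lifecycle bundles) needed to manage it. All names are
# invented for illustration.

PACK_REGISTRY = {
    "vpn":      ["vpn-deploy-pack", "vpn-monitor-pack"],
    "firewall": ["fw-deploy-pack", "fw-monitor-pack", "fw-policy-pack"],
}

def compile_service(descriptor):
    """Map each element type in a service descriptor to its packs."""
    plan = {}
    for element in descriptor["elements"]:
        plan[element["name"]] = PACK_REGISTRY[element["type"]]
    return plan

descriptor = {
    "service": "branch-office",
    "elements": [
        {"name": "corp-vpn", "type": "vpn"},
        {"name": "edge-fw",  "type": "firewall"},
    ],
}
print(compile_service(descriptor))
```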

The last “interesting” thing is a question, which is whether something like this could effectively take the “Dev” out of “DevOps”.  Many of the tools used for deployment and lifecycle management were designed to support a developer-operations cooperation.  That’s not in the cards for packaged software products.  Is StackStorm agile enough to be able to accommodate packaged-software deployment?  Could “packs” represent packages to be deployed, perhaps even supplied by the software vendors?  Interesting thoughts.

We may have to wait a while to see whether Brocade develops StackStorm along these lines, or even to see what general direction they intend to take with it.  But it’s certainly one of the most interesting tech M&As in terms of what it could show for the buyer, and for the market.

How Did SDN/NFV Vendors Lose the Trust of their Buyers (And Can they Reclaim It?)

If you look at or listen to network operator presentations on next-gen networking, you’re struck by the sense that operators don’t trust vendors any more.  They don’t come out and say that, but all the discussions about “open” approaches and “lock-in” demonstrate a healthy skepticism about their suppliers’ trustworthiness, and just the fact that major operators (like both Verizon and AT&T in the US) are driving their own buses in their network evolution speaks volumes.  What’s going on here, and are operators right in their view that vendors can’t be trusted to serve their interests?

My surveys have shown operator trust in vendors eroded a long time ago.  Ten years ago, operators in overwhelming numbers said that their vendors did not understand or support their transformation goals.  They were right, of course.  The challenge that the evolution of network services has posed for operators is mass consumption, where cost is paramount and where public policy and regulations have constrained the range of operator responses.  Vendors simply didn’t want to hear that the network of the future would have to cost a lot less per bit, and that stagnant revenue growth created by market saturation and competition could only result in stagnant capital budgets.  Cisco, under Chambers, was famous for their “traffic-is-coming-so-suck-it-up-and-buy-more-routers” story, but every vendor had their variant on that theme.

When NFV was first launched, I was surprised to see that even at one of the first meetings of the ETSI ISG, there was a very direct conflict between operators and vendors.  Tellingly, the conflict was over whether the operators could dictate their views in an ETSI group where ETSI rules gave vendors an equal voice.  Ultimately the vendors won that argument and the ISG went off in a direction that was far from what major operators wanted.

Vendor domination of the standards processes, generated by the fact that there are more vendors than operators and that vendors are willing to spend more to dominate than operators are, is the proximate cause of the current state of distrust of vendors.  Since large operators, the former regulated incumbents, are still carefully watched for signs of anti-trust collusion, operators themselves can’t form bodies to solve their collective problems, and open standards seemed to be the way out.  It didn’t work well, and so the next step was to try to drive open-source projects with the same goals.  That’s showing signs of issues too.

So far in this discussion, “vendors” means “network vendors” because operators’ concerns about intransigence driven by greed focused obviously on the vendors whose business cases were threatened by stagnant capital spending or a shift in technology.  In that same early ISG period, operators were telling me that there were three vendors they really wanted to get strong solutions from—Dell, HP (now HPE) and IBM.  Eventually they ended up with issues with all three of these new vendors too, not because they were obstructing the process but because they weren’t perceived by operators as fully supporting it.  Neither Dell nor IBM, IMHO, fields a complete transformation solution, and while HPE has such a solution they’ve not completely exploited their own capabilities.  Operators, in their view, had no vendors left and no viable paths to standardization or community development.  As a last resort you have to do your own job yourself.

If you look at the public comments of AT&T and Verizon as examples, operators are increasingly focusing on self-directed (though not always self-resourced) integration rather than on collective specifications or development.  They’re fearful even of what I’d personally see as successful open-source projects like ODL for SDN, but they’re willing to adopt even commercial products as long as they can frame their adoption in an open model that prevents lock-in.

Open models prevent lock-in.  Integration links elements into open models.  That’s the formula that’s emerging from early examples of operator-driven network evolution.  They’re willing to accept even proprietary stuff as an expedient path to deployment but it has to be in an open context, because down the line they intend to create a framework where vendor differentiation can never generate high profit margins for vendors.  Their own bits are undifferentiable, and so their bit production must also be based on commodity technology.  Open-source software, Open Compute Project servers, white-box switches—these are their building blocks.

So does this mean the End of Vendors as We Know Them?  Yes, in a sense, because it means the end of easy differentiation and the end of the notion of camel’s-nose selling, where you have something useful that pulls through a lot of chaff.  In a way that’s a good thing because this industry, despite tech’s reputation for being innovative, has let itself stagnate.  I was reading a story on the new Cisco organization, and the author clearly believed that one of Cisco’s great achievements was to use MPLS to undermine the incentive for thorough SDN transformation.  That’s not exactly innovation in action, and innovation, of course, is both one of the operators’ current issues and the path to vendor salvation.

Innovation doesn’t mean just doing something different, it means doing something both different and differentiable in a value or utility sense.  If a vendor brought something to the operator transformation table that would truly change the game and improve operator profits, the operators would be happy to adopt it as long as it didn’t carry with it so much proprietary baggage that the net benefit would be zero (or less).

Ironically, the operators may be setting out to prove the truth of the old saw “We have met the enemy and they are us!”  Operator vision has been just as lacking as vendor vision, and in many ways it shows the same failure to ground plans in market realism rather than greed.  IoT is the classic example of operator intransigence.  They promote a vision of IoT whose problems in an economic, security, and public policy sense are truly insurmountable when at least one alternative vision addresses all the problems.

We have IoT pioneers, ranging from credible full-spectrum giants like GE Global Research’s Predix activity to startup innovator Bright Wolf.  Network vendors want in on the game (one of Cisco’s new key groups is focused on IoT) and IT vendors like IBM and HPE have ingredients for the right Big Picture of IoT.  I think it may be that IoT will be the catalyst to educate both sides of the vendor/operator face-off.  It might also be an indicator that even the current SDN/NFV transformation initiatives of the operators will suffer serious damage from operators’ own shallow thinking.  If the transformed network doesn’t promote a vision for what’s likely to be the major new application of network services, it has little hope of helping operator profits.

Because some of the biggest drivers of change, like IoT, are yet to be “service-ized” as they must be, there’s still a chance for vendors to redeem themselves there.  The right answer is always valuable, especially if it can help you move on market opportunities faster.  But vendors don’t have to wait for these big opportunities; there is still plenty of time to offer a better approach than operators are devising on their own, and at a fair profit margin for vendors.  It will take some work, though, and I’m not sure at this point that vendors are willing to invest the effort needed.

Could SD-WAN Be the Most Disruptive Network Technology?

There are a lot of network options out there, and while it’s fun to talk about them in technology terms there may be a more fundamental issue to address.  Network services have always involved multiple layers, and it’s always been convenient to think of infrastructure as being either “connection” or “transport”.  In the classic services of today (Ethernet and IP) and in the venerable OSI model, these two things are both in the operator domain.  That might change, and we might see some (and just maybe, all) services migrate to a more user-hosted connection model.  Which, in turn, could change the network service market dynamic.  And all because of SD-WAN.

Networks provide information transport and connectivity, meaning that they can move bits and address endpoints.  The old conceptualization of network service was simple—everything that consumed a service had an address in the service address space.  It’s not that simple any more, for a variety of reasons.  Nearly every Internet user employs Network Address Translation (NAT) to allow all the home devices to use the Internet without requiring they all have their own unique IP addresses.  Nearly every enterprise employs virtual private networks (VPNs) or LANs (VLANs) because they don’t want their company addresses to be visible on the Internet.

An even broader and more interesting idea is also an old one, which is “tunnel networking”.  If you use traditional “network services” like the Internet or Ethernet for transport, you could build tunnels using some protocol (MPLS, L2TP, PPTP…you name it) and treat these tunnels as though they were wires, my “virtual wire” concept.  That means you could build a connection network on top of a network service, providing “routing” or “switching” across your tunnels just as you might once have done with real private line connections.
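
Here’s a minimal sketch of the tunnel-as-virtual-wire idea, assuming nothing more than a list of tunnels between sites: once tunnels are treated as wires, “routing” across them is just pathfinding over the overlay graph.  The sites and tunnels are invented for illustration.

```python
# A sketch of "tunnel networking": treat tunnels over a transport service as
# virtual wires, then route across them. Endpoints and tunnels are invented.
from collections import deque

# Each entry is a "virtual wire" (a tunnel) between two sites, regardless of
# whether the underlay is the Internet, MPLS, or Ethernet.
VIRTUAL_WIRES = [
    ("hq", "branch-east"),
    ("hq", "branch-west"),
    ("branch-east", "datacenter"),
]

def route(src, dst):
    """Shortest path across the overlay of virtual wires (simple BFS)."""
    adj = {}
    for a, b in VIRTUAL_WIRES:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(route("branch-west", "datacenter"))  # ['branch-west', 'hq', 'branch-east', 'datacenter']
```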

Nicira took this a step forward by bundling the “tunnel” protocol and the “routing” processes into a software package and calling it software-defined networking.  The model is incredibly powerful in virtualization, cloud, and NFV applications because it lets you build a bunch of tenant/application networks in parallel and share the real infrastructure among them.

In a broad sense (which isn’t always how vendors present it), Software-Defined WAN (SD-WAN) arguably uses this same model but in one or more different ways.  With SD-WAN the goal may be to collect sites/users onto a single “virtual network” when there is no single common physical network service available to do that.  You could combine people on the Internet with those who had a private VPN or even a VLAN connection.  In some cases, you might create a virtual connection by building multiple parallel tunnels (over different networks or even over the same one, but with diverse routing) and combining them.

Finally, the Metro Ethernet Forum has proposed its “Third Network” model, which not surprisingly makes Ethernet connections the physical framework of networking and builds other services by creating some form of overlay network—back to tunnels or virtual paths.  Unlike the other approaches, the MEF model is an inside-out or operator-driven vision, a way of creating infrastructure that takes the most dynamic aspect of networking (connectivity) out of the hands of traditional technology.

If we leave the Internet aside for the moment, it’s easy to see that we could move all current network services to an overlay approach.  The user could be given a choice of hosting their own router/switch elements (as devices, as software, as VNFs), buying internal-to-the-real-network instances from operators, or both.  We could create VPNs that would look as they do today, but that didn’t require Level 3 services from operators at all.

All of this seems part of a broad trend toward the separation of function and structure.  Oracle just announced a version of its public cloud software designed to be inserted into customers’ own data centers to bypass problems (real regulatory or policy problems, or just executive resistance) to moving key applications into the public cloud.  This frames the notion of “public cloud” not as a business model but as a service technology layer that could then ride on whatever infrastructure technology is optimum.

This division, which corresponds to my last phase of virtual network evolution, is interesting because it could come about both through the actions of network buyers, at least for enterprise services, and through the action of the network operators.  If service-layer technology is a relatively inexpensive overlay rather than an expensive collection of devices, then the operators might indeed want to promote it.  If operators were to deploy virtual-wire technology in SDN or other form (including the MEF’s “third network”) then it would promote the service/infrastructure dualism.

Enterprise buyers could do this on their own, and SD-WAN concepts lead in that direction.  The notion of multi-infrastructure service is at the least a path to infrastructure-independent services, and some implementations (Silver Peak, for example) are explicitly dualistic (or multiplistic) in terms of what they can run on.  These bridge across infrastructures with a service, so they could be used by enterprises to create something like the MEF’s third-network vision even if the operators who are the intended target somehow don’t see the light.

Another force that could influence a move to an overlay model is the managed service provider market.  We already know from NFV experience that MSPs are a growing force, largely because they address a market segment that needs networking but can’t retain (or afford) the skilled labor needed to run one on their own.  In NFV, MSPs have been able to lead the market for vCPE services because their value proposition is to substitute technology that needs no in-house support for technology that does.  The same thing could happen with overlay services.

If we look at things this way, then the SD-WAN space could be the most disruptive service technology out there.  It could transform the network model, work from both the supply side and the demand side, and it’s currently largely driven by startups or companies who aren’t part of the L2/L3 mainstream, which has a vested interest in keeping things as they are.  Since the overlay model favors SDN and fiber, it might be the perfect match for a player like ADVA, Ciena, or Infinera, and all of these companies have the technology to promote the notion.  We’ll see if they do.

How to Make SDN and NFV About Zeros Instead of Nines

We chase a lot of phantoms in the tech space, none as dangerous as the old “five-nines” paradigm.  Everyone obsesses about getting reliability/availability up to the standards of TDM.  That’s not going to happen if we do the kind of network transformation we’re talking about.  Five-nines is too expensive to meet, and we don’t have it anyway with mobile services.  What we have to worry about isn’t too few nines, but too many zeros.

Telstra, the Australian telecom giant, has been hammered in the press there for having multiple major outages in just a few days, outages where the number of customers with service was zero.  To me this proves that SDN, NFV, cloud, or other network technology evolutions are going to be judged by customers not by the number of dropped calls or video glitches (who doesn’t see those regularly?) but by the number of no-service periods and the number of impacted consumers.  That’s a whole different game.

The overall effect of our proposed network infrastructure changes would be to centralize and software-ize more things, to move away from the adaptive and unpredictable to the more manageable and from the proprietary and expensive to the commodity.  All of this is being proposed to reduce costs, so it’s ridiculous to think that operators would then engineer in that old five-nines standard.  Packet networks in general, and centralized-and-software networks in particular, are not going to meet that except by over-engineering that compromises the whole business case for change.  That’s not a problem, but what is a problem is the fact that the five-nines debate has covered up the question of the major outage.

One of my enterprise consulting engagements of the past involved a major healthcare company that had a “simple” network problem of delayed packets during periods of congestion.  The problem was that the protocol involved was very sensitive to delay, and when confronted by a delay of more than a couple of seconds it tended to let the endpoints get out of synchronization.  These endpoints then reset, which took down the device/user and forced a restart and recovery process—which generated more packets and created more delay.  What ended up happening is that over ten thousand users, everyone in the whole medical complex, lost service and the vendor could not get it back.  They limped along for days until I showed them that it would be better to drop the packets than delay them.  One simple change and it worked again.

Think now of a central control or management process.  It’s doing its thing, and perhaps there’s a trunk problem or a problem with a data center, and a bunch of VNFs or SDN switches fail.  The controller/lifecycle manager now has to recover them.  The recovery takes resources, which creates a waiting list of service incidents to address, which leaves more switches or VNFs disconnected, which creates more failures…you can see where this goes.
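
Here’s a toy simulation of that feedback loop, with numbers chosen only to show the shape of the problem rather than to model any real controller:

```python
# A toy discrete-time model of the feedback loop described above: failures arrive,
# the central recovery process has fixed capacity, and the backlog itself creates
# additional failures. All the numbers are arbitrary, chosen only to show the shape.

base_failures_per_tick = 5   # "normal" failure arrivals per interval
recovery_capacity = 8        # incidents the controller can clear per interval
knock_on_factor = 0.5        # extra failures caused per backlogged incident

backlog = 40                 # a trunk or data center failure dumps 40 incidents at once
for tick in range(8):
    new_failures = base_failures_per_tick + int(knock_on_factor * backlog)
    backlog = max(0, backlog + new_failures - recovery_capacity)
    print(f"tick {tick}: new={new_failures}, backlog={backlog}")
```

With these made-up numbers the backlog roughly doubles every couple of intervals once a large burst arrives; start the backlog at zero and the same capacity drains everything immediately.  That’s the difference between a routine glitch and a zero-service event.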

There are quite a few “zero-service-creating” conditions in emerging models of the network.  There are also some pretty profound recovery questions.  If an SDN controller has to talk to switches to change forwarding tables, what happens when the failure breaks the switch-to-controller linkage?  If an NFV domain is being lifecycle-managed by a set of processes, what happens if they get cut off from what they manage?

I’m not a fan of adaptive device behavior as a means of addressing problems, and I’m not proposing that we engineer in the five-nines I’ve been pooh-poohing above.  What I think is clear is that we’ve left out an important concept in our advances in network technology, which is the notion of multi-planar infrastructure.  In the old days we had a control/signaling plane and a data plane.  With SDN and NFV we need to reaffirm these two planes, and add in the notion of a management plane because of lifecycle management dependencies.  The control/signaling plane and the management plane, and the processes that live there, do have to be five-nines or maybe even more, because if they are not, there’s a risk that a failure will cascade into an explosion of problems that overwhelms remediation by swamping or breaking the signaling/management connectivity.  Then we’re in zero-land.

We don’t really have an explicit notion of signaling/control and management planes in either SDN or NFV.  In SDN, we don’t know whether it would be possible to build a network that didn’t expose operators to having large chunks cut off from the controller.  In NFV we don’t know whether we can build a service whose signal/control/management components can’t be hacked.  We haven’t addressed the question of authenticating and hardening control exchanges.  Financial institutions do multi-phase commit and fail-safe transaction processing, but we haven’t introduced those requirements into the control/management exchanges of SDN or NFV.

What do we have to do?  Here are some basic rules:

  1. Management and control processes have to be horizontally scalable themselves, and the hardest part of that is being able to somehow prevent collision when several of the instances of the processes try to change the network at the same time. See my last point below.
  2. Every management/control connection must live on a virtual network that is isolated and highly reliable, not subject to problems with hacking or cross-talk resource competition from the data plane. This network has to connect the instances of management/control processes as they expand and contract with load.
  3. Every control/management transaction has to be journaled, signed for authenticity, and timestamped for action, so we know when we’ve gotten behind and we know how to treat situations when a resource reports a problem and requests help, and then for a protracted period hears nothing from its control/management process (a minimal sketch of such a journal entry follows this list).
  4. There can never be multiple restoration/management processes running at the same instant on the same resources. One process has to own remediation and coordinate with other processes who need it.
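
As an illustration of rule 3, here’s a minimal sketch of a journaled, signed, and timestamped control/management transaction using only the Python standard library.  The record format and the key handling are placeholders, not a proposal for an SDN or NFV standard.

```python
# A minimal sketch of rule 3: every control/management transaction is journaled,
# signed for authenticity, and timestamped. The key handling and record format
# are placeholders for illustration only.
import hmac, hashlib, json, time

SHARED_KEY = b"demo-key-distributed-out-of-band"
journal = []

def record_transaction(actor, operation, target):
    entry = {
        "actor": actor,            # which control/management process issued this
        "operation": operation,    # e.g., "redeploy", "update-forwarding"
        "target": target,          # the resource being acted on
        "timestamp": time.time(),  # lets us detect when we've fallen behind
    }
    body = json.dumps(entry, sort_keys=True).encode()
    entry["signature"] = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    journal.append(entry)
    return entry

def verify(entry):
    body = json.dumps({k: v for k, v in entry.items() if k != "signature"},
                      sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["signature"])

e = record_transaction("nfv-lcm-2", "redeploy", "vnf-fw-03")
print(verify(e), time.time() - e["timestamp"] < 5.0)  # authentic and fresh?
```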

There are two general ways of doing what’s needed.  One is to approach the problem as one of redundant centralization, meaning you stay with the essential SDN/NFV model.  The other is to assume that you really don’t have centralized lifecycle management at all, but rather a form of distributed management.  It’s this second option that needs to be explored a bit, given that the path to the first has already been noted—you apply OLTP principles to SDN/NFV management and control.

If you’re going to distribute lifecycle management, you have two high-level options too.  One is to divide the network infrastructure into a series of control domains and let each of these domains manage the service elements that fall inside them.  The other is to forget “service lifecycles” for the most part, and manage the resource pools against a policy-set SLA that, if met, would then essentially guarantee that the services themselves were meeting their own SLAs.

A resource-management approach doesn’t eliminate the need for management/control, since presumably at least some of the resource-remediation processes would fail and require some SLA-escalation process at the least.  It could, however, reduce the lifecycle events that a service element had to address, and the chances that any lifecycle steps would actually require changes to infrastructure.  That could mitigate the difficulties of implementing centralized management and control by limiting what you’re actually managing and controlling.

The forget-lifecycles approach says that you use capacity planning to keep resources ahead of service needs, and you then manage resources against the capacity plan.  Services dip into an anonymous pool of resources and if something breaks you let resource-level management replace it.  Only if that can’t be done do you report a problem at the service level.

Some services demand the second approach, including most consumer services, but I think that in the end a hierarchy of management is the best idea.  My own notion was to assign management processes at the service model level, with each object in the model capable of managing what happens at its own level, and with each object potentially assignable to its own independently hosted management process.  It’s not the only way to do this—you can apply generalized policy-dissemination-and-control mechanisms too.  But I think that we’re going to end up with a hierarchy of management for SDN and NFV, and that working toward that goal explicitly would help both technologies advance.
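
Here’s a small sketch of what that hierarchy might look like, with each object in the service model handling events at its own level and escalating only when local remediation fails.  The classes, fault types, and outcomes are invented for illustration.

```python
# A sketch of the management hierarchy idea: each object in the service model
# manages events at its own level and escalates to its parent only when local
# remediation fails. The classes and outcomes are invented for illustration.

class ModelObject:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

    def remediate(self, event):
        """Try to fix the problem locally; return True on success."""
        print(f"{self.name}: local remediation of '{event}'")
        return event != "site-power-loss"   # pretend some faults can't be fixed here

    def handle(self, event):
        if not self.remediate(event) and self.parent:
            print(f"{self.name}: escalating '{event}' to {self.parent.name}")
            self.parent.handle(event)

service = ModelObject("vpn-service")
site    = ModelObject("site-access", parent=service)

site.handle("link-flap")        # handled at the site level
site.handle("site-power-loss")  # escalates to the service-level manager
```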

Looking at the New Cisco (and the Reason Behind It)

Cisco’s apparent reorganization and retargeting of its strategic initiatives is certainly newsworthy, given Cisco’s position as the premier provider of IP technology and the vendor behind the UCS data center systems.  It’s also newsworthy that Cisco is clearly emphasizing the cloud and IoT over traditional networking missions.  The question is what these newsworthy events mean in the context of the markets, and by association whether Cisco is behaving in a strategic way or just trying a different tactic.

For a decade, Cisco’s implicit story to both enterprises and service providers was simple—more traffic is coming so suck it up and spend on more capacity to carry it.  Forget ROI, forget benefits, even forget costs.  Traffic means transport, period.  As stupid as this may sound when stated as baldly as I’ve stated it, the message has value because it’s simple.  The implicit question being asked of network buyers is “What traffic will you elect not to carry as traffic growth continues?”  That’s a tough question to answer.

I think that two factors have convinced Cisco that their old story was too old for the market.  One is the increased resistance to network spending; the best proof that something is a bad approach is that it’s not working.  The other is the explosion of interest in both SDN and NFV.  Cisco has worked pretty effectively to blunt the immediate impact of both technologies, but in the end they prove that buyers are willing to look outside the box to control cost or improve network ROI.

If all we had to deal with was increased cost pressure, then a lot of Cisco’s old cost-management steps would be enough.  You cut internal cost and waste by consolidating groups who duplicated effort.  You improve cost-efficiency of devices with better semiconductor support.  Those steps have been taken, and they’ve borne some fruit for Cisco, but they apparently have not been enough to future-proof Cisco’s financials.  Thus, we come to the current Cisco moves.

NFV and SDN show that networking in the future will incorporate a lot more hosted functionality rather than just devices.  This isn’t the same thing as saying that the cloud eats the world, a statement attributed to a VC in a news story yesterday.  It’s saying that it’s better to value the functional components of devices than the hardware they run on.  It’s saying not to compete in areas you can’t be competitive, and that are price-driven and margin-less in any event.

If you build a router as a device, your differentiation is largely your software features and anything special you might have in semiconductors to handle the networking-specific tasks.  You don’t then want to have to spend a lot to build a hardware platform that’s not going to be cost-competitive with commodity server platforms just to hold the good stuff.  The long-term direction Cisco is likely plotting is to divide network equipment into two categories—one where the mission justifies customized semiconductor hardware for real feature value, and one where it does not.  You focus then on devices for the former mission, and for the latter you shift to a hosted model.  Modular software/hardware pricing is clearly a step toward this goal, as is a lot of Cisco’s recent M&A.

Security, provider edge, and enterprise networking in general seem to fall into the second of these two categories, meaning that the real value is features and the best way to deliver those features is through a software license.  This is a big shift for Cisco by itself, but one dictated by market direction.  If anyone is going to offer software-license networking, then the cat’s out of the bag and Cisco has to respond.

The fact that all of enterprise networking falls into that second category is why Cisco would incorporate something that’s clearly a provider concept—NFV—into their “DNA” (Digital Network Architecture) announcement.  They need a broad model for the enterprise because the enterprise is likely to be where the impact of the transition from device-resident features to hosted features happens first, and most broadly.

There’s nothing wrong with Cisco categorizing the hosting part as “cloud”, nor of course with the media either accepting Cisco’s slant or adopting it on their own.  Most hosting of network features on servers will involve resource pools, and so the cloud label is fair.  It’s also true, despite some of Cisco’s own marketing messages, that IoT if it succeeds will be more a success for the cloud than for connected devices or traffic.  However, it’s important to go back to that “cloud eats the world” comment, because while the label is fine an underlying presumption of cloud-eating-world could be a major problem for Cisco.

Enterprises will probably not be hosting their network features in the public cloud.  Most of their mission-critical applications won’t be leaving the data center, in fact.  If we apply the term “hybrid cloud” to the hybridization of public and private cloud deployments, we probably have an issue.  If we apply the term to public cloud and private IT, then we’re fine.  Cisco’s risk with DNA is not that they don’t support “the cloud” but that they rely too much on cloud-specific internal IT trends.  Way more servers in enterprise data centers will be running virtualization than private cloud.  Dell/VMware, remember, is the king of enterprise virtualization.  Cisco could position itself into a relative backwater of IT evolution if they’re not careful.

They may also have a risk in pursuing an enterprise-centric vision of hosted functionality rather than a more literal implementation of the formal service provider initiatives like SDN and NFV.  If operators do adopt NFV widely and/or accept and support a cloud-centric IoT vision, recall that my model says it would generate over 100,000 incremental data centers worldwide.  That would be an enormous market to miss out on, and a lot of what operators want in NFV would not be present in an enterprise version of NFV—like OSS/BSS integration.

To my mind, the big question with Cisco’s “cloudification” of strategy is the area of external process integration with cloud-feature lifecycle processes.  If lifecycle management manages only the lifecycle of the new stuff, then it won’t integrate into enterprise IT operations any better than it would into service provider OSS/BSS.  We don’t know at this point just how much external process integration Cisco has in mind, either in DNA or in future service-provider-directed offerings.  If it’s a lot, then Cisco is truly committed to a software transformation.  If it’s only a little, then Cisco may still have its heart in the box age of networking, and that would negate any organizational or positioning changes it announces that pledge allegiance to the cloud.

We also don’t know whether Cisco’s DNA-enterprise vision of NFV will be able to support service-provider goals, and depending on the pace of development of NFV and IoT, those goals may mature faster than the enterprises’ own connectivity needs.  The service provider space is also driving the modern conception of intent modeling and data-driven application architectures, things I think will percolate over into the enterprise eventually.  Can Cisco lead, or even be a player, in these critical technologies if they lag with the market segment that’s driving them?

Cisco is changing, there’s no question about that.  There’s no question the reason is that Cisco knows it has to change, and change quickly.  The only question is whether a company that’s prided itself on being a “fast follower” can transition into being a leader.  It’s very difficult for market leaders to lead market change.

The Evolution of the Metro Network and DCI

What exactly is the data center interconnect (DCI) opportunity, and how does it relate to cloud computing, SDN, and NFV?  That’s a question that should be asked, isn’t asked often, and might have a significant impact on the way we build the most important part of our network.  Obviously it’s going to have an impact on vendors too.

Microsoft kicked this off by announcing an optical partnership with Inphi that would allow direct optical coupling of data center switches, eliminating the need for a separate device.  The result, according to Wall Street, was bad for Infinera because DCI was considered a sweet spot for them.  Infinera isn’t an enormous company so it’s reasonable to expect that any alternative to the Infinera DCI approach would be a threat.  It’s also reasonable to say that direct optical connection between data center switches could cut out a lot of cost and disrupt the DCI market.  Does that matter?

DCI doesn’t make up more than about 7% of metro optical deployment today, and while it’s a growth market in terms of enterprise services, the growth doesn’t compare with what could be expected (under the right circumstances) from other opportunities.  Metro networking is the hotbed of change, not only for optical equipment but also for new technologies like SDN and NFV, and for new services like cloud computing.

Let’s start with enterprise.  We have today about 500 thousand true enterprise data center locations worldwide.  As these businesses migrate to hybrid cloud (which the great majority will do) about 40% of them will want to have cloud-to-data-center connectivity at a bandwidth level that justifies fiber connectivity.  Not only does that generate about 200k connections, it also encourages public cloud providers to distribute their own data centers to be accessible to enterprises in at least the major metro areas.  My model says that we’d see a major-metro distribution of over 1,500 cloud provider data centers generated from hybrid cloud interest.  All of these get connected to those 200k enterprise data centers, but they also get linked with each other.

Mobile broadband is another opportunity, by far the biggest already and still growing.  Today it accounts for about 55% of all metro fiber deployment, and that number is going to increase to over 65% by 2020.  In technology evolution terms, more and more of mobile fiber deployment is associated with video caching, which is a server activity.  This is shifting the focus of mobile-metro from aggregation to content delivery, from trunking to connecting cloud data centers.

Service provider shifts to NFV would generate an enormous opportunity.  Globally, as I’ve said before, it could add over a hundred thousand data centers, and all of the centers within a given metro area would likely be linked to create a single resource pool.  My model says that this could generate well over 3 million links globally, which is a lot of DCI.  The question, of course, is just how far NFV will deploy given the lack of a broad systemic vision for its deployment.

IoT and contextual services would help drive metro data center deployments, and my model says that the same traffic and activity factors that would justify the 100k NFV data centers are amplified by IoT and contextual services.  Thus, the two drivers would tend to combine to create a large distributed pool of servers, the very kind of thing that would be likely to need more interconnection.  That means that our 3 million links would, under the influence of IoT/contextual evolution, rise to nearly 4.5 million.

Ironically, metro shares NFV’s lack of an ecosystemic focus.  While all the drivers of metro deployment may be evolving toward connecting data centers, the largest driver (mobile broadband) is not only still focused on traffic aggregation, it’s often planned and funded independently.  The very fact that everything converges on metro means that metro is responding not only to different mission pressures (converging or not) but also to different administrative/business goals.

Still, there are some things we can see as common metro requirements.  Even DCI missions are surely going to consume multiple optical pathways, and physical meshing of large numbers of data centers would be expensive in terms of laying glass.  A smarter strategy would be to groom wavelengths (lambdas) optically so that you could create low-latency-and-cost pathways that hopped from fiber strand to fiber strand as they transited a metro area.  We’re nearly at the point where DCI could justify this approach.

The “virtual wire” model that I’ve been talking about as a union of optical and SDN-forwarding elements at the electrical layer would evolve out of this approach, and could also in theory drive it.  At the end of the day, lambdas versus SDN paths is a matter of transport bandwidth utilization and relative cost.  You could see an evolution from SDN-virtual-wire to lambda-virtual-wire as traffic increased, or you might see one or more of my previously cited drivers develop fast enough to skip the SDN stage.

The deciding factor here isn’t going to be DCI but metro broadband.  We’re investing a lot of money in metro broadband backhaul already, and 5G and content delivery will increase this.  If operators build out backhaul quickly then they’ll commit to infrastructure decisions before services are highly data-center-focused, and we’ll probably see more SDN deployment besides.  This would combine to keep DCI and mobile broadband from coalescing into a common technical model.  If we were to see CDN focus or one of the other drivers develop, we might see DCI win it all.

That’s really why near-term wins and losses in the DCI space are important.  We may not see any of the drivers I’ve cited develop quickly, and we may then see DCI remain a specialized enterprise-service or IaaS-public-cloud-infrastructure element.  We might also see cloud data center deployment explode, DCI links explode even more, and the whole scheme displace all other metro applications.  That would be huge for operators, and for vendors.

Optical vendors, or at least pure ones, could do little to drive things here because all the compelling drivers of metro evolution lie in accelerating network operator use of the cloud—things like NFV and IoT.  What they could do, technically, is to focus on creating lambda-virtual-wire options and means of evolving to them from SDN-virtual-wire.  Some will see that as being dependent on the optical standards of OpenFlow from the ONF, but we should know by now that low-level specifications aren’t going to drive market transformation.  It may be that a generalized virtual-wire model that works for optics and SDN, and that can be operationalized in concert with all the evolving drivers of data center deployment, is what will win the day, for somebody.

Can We Fix the Technology, Vendor, and Service Silos Growing in NFV?

“E Pluribus Unum” is the motto found on US coins and it means “From Many, One”.  Will that work for NFV?  We may have to find out, because it seems that instead of converging on specifications NFV is flying off in a bunch of different directions.  Is that appearance reality, and if so what’s going to happen?

It is true that there’s more difference developing in NFV than commonality.  Just the fact that we have two open-source orchestration initiatives is interesting enough, but we also have the AT&T ECOMP project, and today HPE, Anuta Networks and Logicalis announced a partnership deal that will link HPE NFV Infrastructure with Anuta orchestration.  HPE, you may recall, has its own orchestration product, and in fact it’s one of the six that’s actually complete.  We also have OPNFV, which might end up doing yet another open-source implementation.

There is one ETSI specification for NFV, and at the detail level it’s fair to say that it’s been weighed and found wanting.  The ETSI spec issue is the core of the explosion of implementations and approaches.  If we had a spec that everyone agreed could make the business case at least close to optimally, I think it would have converged a lot of stuff on a single approach.  Since it doesn’t, what we’re seeing is a combination of competitive opportunism and special-interest focusing of efforts.

You can’t differentiate in a service provider market by abandoning an accepted standard, but if there are major omissions then those will attract differences in approach simply to help sales people make their own strategy stand out.  That’s why, with six implementations of NFV that can make the business case, we have minimal compatibility among them.  Since we’ve now embarked on competitive approaches to the NFV problem, even harmony at the spec level wouldn’t be likely to drive convergence in approaches.   And we’re not going to get specs that will generate harmony any time in the next year.

The target factor is more complicated and more pervasive.  We do not, at this point, have anybody who’s presenting operators with an NFV story that would justify wholesale adoption of NFV.  Even the vendors who can make the business case for NFV aren’t doing that on a broad scale because such a deal would take a lot of sales cycles, during which nobody would make quota.  As in all new technologies, though, there are places where NFV can be made to fit more easily.  Not only does focusing on one of these things shorten the selling cycle for those vendors who could do the whole thing, it admits a bunch of other vendors who can’t do it all but who can cobble a service-specific story together.

Given the number of issues we’ve already uncovered in NFV with integration at the infrastructure, function on-boarding, and management levels, it might seem that this explosion of service-specific NFV silos would only add to the angst.  Actually it might help things.

NFV has from the first been a multi-level structure built from abstractions representing service features or behaviors, whether vendors have optimized that principle in implementation or not.  This structural model predates NFV, going back to the days of IPsphere and the TMF’s Service Delivery Framework, and even its SID modeling.  Each of the abstractions is essentially a black box, opaque.

One property of a black-box abstraction is that it can contain a bunch of other abstractions.  That means that a “service” created by one vendor’s NFV solution would contain within it a lot of low-level models that eventually ended up with stuff that represents deployment of resources.  From the top, or the outside, it would still look like a single block in a diagram.  It should be possible, then, to represent the product of silo deployments in a way that allows it to be incorporated as an element in another silo.  Yes, you’d have to jiggle the APIs or data formats, perhaps, but it is possible.
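
Here’s a hypothetical sketch of that nesting: a service element produced by one silo (or even another operator) is embedded in another silo’s model, and the outer model never needs to look inside it.  The classes, names, and the vCPE/VPN pairing are all invented for illustration.

```python
# A sketch of the black-box nesting described above: a service element built by
# one silo (or one operator) can be embedded inside another silo's model and,
# from the outside, still looks like a single opaque block. Names are invented.

class ServiceElement:
    def __init__(self, name, owner, children=None, deploy_fn=None):
        self.name, self.owner = name, owner
        self.children = children or []
        self.deploy_fn = deploy_fn      # only leaf elements commit resources

    def deploy(self):
        # The caller never sees whether this is a leaf or a whole foreign subtree.
        print(f"deploying {self.name} (owned by {self.owner})")
        for child in self.children:
            child.deploy()
        if self.deploy_fn:
            self.deploy_fn()

# A vCPE service built by Silo A embeds a VPN core produced by Silo B's NFV stack.
vpn_core = ServiceElement("vpn-core", owner="silo-B",
                          deploy_fn=lambda: print("  silo-B resources committed"))
vcpe     = ServiceElement("managed-vcpe", owner="silo-A", children=[vpn_core])
vcpe.deploy()
```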

This issue is the one I’ve called “federation” in past blogs, and it could go a long way toward eliminating the lock-ins and other optimality issues associated with silo deployment of NFV; it could also help make the broad NFV business case.  Best of all, it could encourage vendors to think more in terms of integration at a level where the results could actually be applied.

We are not going to fix the sins of omission in the NFV specification process in time to do much good.  We are not going to be able to harmonize the integration of VNFs or even the portability of resources with the current open-source activities either, at least not in time to address operator concerns about profit per bit.  We are going to continue to have low-apple, service-confined, NFV deployments because that’s the only way to balance risk and reward at this point.  We need to focus elsewhere.

The NFV ISG could prioritize effective “service object federation” right now.  The various open-source groups could do likewise.  A third body could undertake the task.  Any way it happens, having a way to link the service-based silos and implementation enclaves that we’re building would help to generate a broad solution.  The problem is that none of these ways are going to move fast enough.

For vendors, there would be an obvious advantage that accrues to those who could absorb other silo objects into their own service models.  That benefit could be the thing that moves vendors to do more on their own here.  Make no mistake; nothing is going to save NFV’s ability to drive better profit per bit except vendor-driven solutions.  Which means that we can only build more silos.  Which means we have to address silo-binding right now before we collapse our emerging NFV house of cards.

Any intent-modeled structure is self-federating, meaning that since the stuff inside a model element is invisible, it could just as easily have been created by some other service group or even another operator.  Operators who have been pushing for intent-modeled NFV could, if they push with a bit more emphasis on federation, force vendors to accommodate that silo-binding mission without much additional effort.

Of course, operators themselves have had tunnel vision with NFV.  Vendors have worried more about protecting any advantage they gain from early participation in silo-building deployments than building the broad business case by creating an NFV ecosystem.  Some, I think, are starting to realize that there’s an issue here, and I hope these players will lead the charge to change.

 

The Latest in the New-Service Modeling Game

Modeling is critical for both SDN and NFV, and I’ve blogged on the topic a number of times.  In particular, I’ve been focusing on “intent models” as a general topic and on the OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications) in particular.  The two, as we’ll see, have a relationship.

A recent set of comments on LinkedIn opened a discussion of models that I think is important enough to share here.  Chris Lauwers of Ubicity offered his view of the modeling world: “TOSCA models are ‘declarative’ in that they describe WHAT it is you’re trying to provision (as opposed to ‘prescriptive’ models that describe HOW you’re going to get there). ONF uses the term ‘intent’ to describe the same concept, and yet others refer to these types of models as ‘desired-state’ models. While there are subtle nuances between these terms, they all effectively describe the same idea. As far as I know, TOSCA is currently the only standard for declarative models which is why it is gaining so much traction. One of the main benefits of TOSCA is that it provides abstraction, which reduces complexity tremendously for the service designer. TOSCA also use the same declarative approach to model services/applications as well as infrastructure, which eliminates the types of artificial boundaries one often sees in various industry ‘reference models’.”

This is great stuff, great insight.  The modeling distinction between prescriptive and declarative is an attribute of the first DevOps tools, and today’s most popular DevOps tools (Chef, which is prescriptive versus Puppet, which is declarative) still reflect the division.  When you apply the terms to SDN and NFV, there are factors other than DevOps roots that come into play, and these explain the industry shift (IMHO) toward intent modeling.  They also (again, IMHO) illustrate the balance that network/service modeling has to strike.

A prescriptive model, in the context of SDN or NFV, would describe how you do something like provision.  It’s easy to create one of these in the sense that the prescription is a reflection of manual processes or steps that would be needed.  Any time you create a new service or element, you can create a prescriptive model by replicating those manual steps.

Declarative network/service models, in contrast, reflect the functional goal—Chris’s “What?” element.  I’m trying to build a VPN—that’s a declaration.  Because the intent is what’s modeled, this is now often called “intent modeling”.  With intent modeling you can create a model without even knowing how you build it—the model represents a true abstraction.  But to use the model, you have to create a recipe for the instantiation.

Prescription equals recipe.  Declarative equals picture-of-the-dish.  The value proposition here is related to this point.  If you were planning a banquet you might like to know what you’re serving and how to arrange the courses before you go to the trouble of working out how to make each dish.  You might also like to be able to visualize two or three versions of a dish, perhaps one that’s gluten-free and another lactose-free, but set up the menu and presentation and change the execution to reflect a diner’s preference.
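
Here’s a small, invented illustration of the contrast: the prescriptive model is the list of steps itself, while the declarative model states the desired end-state and leaves a resolver to derive the steps.  Neither side of the example is TOSCA or any vendor’s syntax.

```python
# A side-by-side sketch of the two styles. Both describe "a VPN with two sites";
# the details (step names, fields) are invented purely to contrast the approaches.

# Prescriptive/imperative: the model IS the sequence of steps (the recipe).
prescriptive_vpn = [
    ("create_vrf",  {"name": "corp-vpn"}),
    ("attach_site", {"site": "hq"}),
    ("attach_site", {"site": "branch-east"}),
    ("enable_qos",  {"profile": "gold"}),
]

# Declarative/intent: the model states the desired end-state; something else
# (a decomposer keyed by the model) works out the steps.
declarative_vpn = {
    "intent": "vpn",
    "sites": ["hq", "branch-east"],
    "sla": {"availability": "99.9", "qos": "gold"},
}

def decompose(intent_model):
    """Trivial resolver: derive prescriptive steps from declared intent."""
    steps = [("create_vrf", {"name": "corp-vpn"})]
    steps += [("attach_site", {"site": s}) for s in intent_model["sites"]]
    steps.append(("enable_qos", {"profile": intent_model["sla"]["qos"]}))
    return steps

print(decompose(declarative_vpn) == prescriptive_vpn)  # True: same "how", different "what"
```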

This is the reason why I think intent modeling and not prescriptive modeling is the right approach to SDN and NFV, and in fact for service modeling overall.  You should be able to manipulate functional abstractions for as long as possible, and resort to the details only where you commit or manage resources.  You should also be able to represent any element of a service (including access) and also represent what I’ll call “resource behaviors” like VPN or VPLS, then combine them in an easy function-feature-assembly way to create something to sell.

What’s messed up a lot of vendors/developers on the intent-declarative approach is the notion that the dissection of intent into deployment has to be authored into the software.  A widget object, in other words, is decomposed because the software knows what a widget is.  That’s clearly impractical because it forecloses future service creation without software development.  But declarative models don’t have to be decomposed this way; you can do the decomposition in terms of the model itself.  The recipe for instantiating a given model, then, is included in the model itself.

This can be carried even further.  A given intent-model represents a piece of a service, right?  It can be deployed, scaled, decommissioned, managed as a unit, right?  OK, then, it has lifecycle processes of its own.  Those processes can be linked to service events within the intent model, so you have a recipe for deployment, sure, but also a recipe for everything else in the lifecycle.  If we then presume that objects adjacent to a given object can receive and generate service events to/from it, we can now synchronize lifecycle processes through the whole structure, across all the intent-modeled elements.
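
Here’s a hypothetical sketch of that idea: each intent element carries its own (state, event) recipes as data that name external processes, and a state change generates events to adjacent elements.  Nothing here is TOSCA or any product’s model language; the element names and processes are invented.

```python
# Hypothetical: intent elements whose lifecycle recipes are part of their own data,
# and which notify adjacent elements when their state changes. Not TOSCA syntax.

ELEMENTS = {
    "EthernetAccess": {
        "state": "ordered",
        "adjacent": ["VPN"],
        "recipes": {("ordered", "activate"): "deploy_access"},
    },
    "VPN": {
        "state": "ordered",
        "adjacent": [],
        "recipes": {("ordered", "adjacent-active"): "deploy_core"},
    },
}

PROCESSES = {  # external processes referenced by name from the model
    "deploy_access": lambda name: print(f"{name}: provisioning access"),
    "deploy_core":   lambda name: print(f"{name}: provisioning VPN core"),
}

def send_event(name, event):
    elem = ELEMENTS[name]
    recipe = elem["recipes"].get((elem["state"], event))
    if recipe:
        PROCESSES[recipe](name)          # the model names the process; code stays generic
        elem["state"] = "active"
        for neighbor in elem["adjacent"]:
            send_event(neighbor, "adjacent-active")   # synchronize adjacent lifecycles

send_event("EthernetAccess", "activate")
```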

You can even express the framework in which VNFs run as an intent model.  Most network software thinks it’s running in either an IP subnet or simply on a LAN.  The portals to the outside world are no different from VPN service points in terms of specifications.  Your management interfaces on the VNFs can be connected to management ports in this model, so essentially you’re composing the configuration/connection of elements as part of a model, meaning automatically.

Everything related to an element of a service, meaning every intent element model in a service model (“VPN”, “EthernetAccess”, etc.) can be represented in the model itself, as a lifecycle process.  You can do ordering, pass parameters, instantiate things, connect things, spawn sub-models, federate services to other implementations across administrative boundaries, whatever you like.  That includes references to outside processes like OSS/BSS/NMS because, in a data model recipe, everything is outside.  The model completely defines how it’s supposed to be processed, which means that you can extend the modeling simply by building a new model with new references.

This is what makes intent-model integration so easy.  Instead of having to write complex management elements specialized to a virtual network function, and then integrating them with the VNFs when you deploy, you simply define a handler that’s designed to connect to the VNF’s own management interfaces and reference that in the model.  If you have a dozen different “firewalls” you can have one intent-model-object to represent the high-level classification, and that decomposes into the specific implementation needed.
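
A rough sketch of that integration pattern, with invented handler names: one generic “Firewall” classification object, many implementation-specific handlers referenced from the model rather than coded into the management system.

    # Sketch: a generic classification object decomposes to a referenced handler.

    FIREWALL_IMPLEMENTATIONS = {
        "vendorA": {"handler": "rest_handler",    "mgmt_url": "https://fw-a.example/api"},
        "vendorB": {"handler": "netconf_handler", "mgmt_host": "fw-b.example"},
    }

    HANDLERS = {   # small adapters that talk to each VNF's own management interface
        "rest_handler":    lambda cfg, params: print("REST push to", cfg["mgmt_url"], params),
        "netconf_handler": lambda cfg, params: print("NETCONF edit on", cfg["mgmt_host"], params),
    }

    def deploy_firewall(implementation, params):
        cfg = FIREWALL_IMPLEMENTATIONS[implementation]
        HANDLERS[cfg["handler"]](cfg, params)    # no per-VNF management code to write

    deploy_firewall("vendorA", {"policy": "deny-all-inbound"})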

Another interesting thing about intent modeling is that it makes it easier to mix implementations.  Intent models discourage tight coupling between lower and higher levels in a structure, and that in turn means that a model in one implementation can be adapted for use in another.  You could in theory cross silo boundaries more easily, and this could be important for both SDN and NFV because neither standard has developed a full operations vision, which means their benefit cases are likely to be more specialized.  That promotes service-specific deployment and silos.

Even among the vendors who can make a broad operations business case that couples downward to the infrastructure, talking about intent models is rare and detailed explanations of strategy are largely absent.  I hope we see more modeling talk, because we’re getting to the point where the model is the message that matters.

 

It’s Time to Take a New Look at Transformation (Because We’ve Messed Up the Old Way)

If the biggest question asked by kids in a car is “Are we there yet?”, it’s a question that could be applied, with a small modification, to our network transformation activity.  My suggestion is “Are we going to get there?”  The turn of the tide in publicity on NFV and even SDN is clear; we now see a lot of quotes about failures to meet business cases or issues with operations costs or agility.  I’ve suggested that it may be time to jump off the NFV and SDN horse in favor of something like CORD (Central Office Re-architected as a Datacenter) just to get some good ink.  So what’s really happening with the SDN and NFV opportunity?

I ran my models over the last couple of days, and interestingly enough the forecasts of the pace of benefit realization from SDN and NFV haven’t changed materially from last quarter’s run.  So if somebody asks you whether SDN or NFV have failed, the answer is that they have not—providing we judge them by my numbers on their realization of benefits.  They’re still on track.  One thing happening now is that my numbers, which were modest to the point of being unappealing to vendors, are now (without changing) becoming mainstream.

It’s the other thing that’s happening that’s important.  The impact of SDN and NFV on vendors is becoming more unfavorable, and that is probably the reason why we’re seeing a change in the tone of coverage on SDN and NFV.  There are more vendors than operators, and more salespeople trying to make quota who are disillusioned with their progress.  All that griping is coming to the surface.

The long-term impact of SDN and NFV, implemented properly, would be a reduction of about a third in “process opex”, which equates to about ten cents per revenue dollar.  Right now, operators would be prepared to spend about six cents additional to secure those ten cents of benefits, so the opex improvements could fund a capex increase of that much, which is more than 27% of current capex.  That’s an opportunity that would make the average sales management type salivate, and while these exact numbers aren’t usually shared or even known, it’s why SDN and NFV have been exciting to vendors.
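
The arithmetic behind those figures, as I read it; note that the roughly 22-cent capex base in this little check is my own assumption, inferred from the 27% figure rather than stated anywhere.

    # Back-of-the-envelope check of the cents-per-revenue-dollar figures above.
    process_opex = 0.30                 # process opex per revenue dollar (approx.)
    opex_saving = process_opex / 3      # "about a third" -> roughly ten cents
    capex_base = 0.22                   # assumed current capex per revenue dollar
    extra_spend = 0.06                  # what operators would pay to secure the saving
    print(round(opex_saving, 2))                    # ~0.10
    print(round(extra_spend / capex_base * 100))    # ~27% capex increase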

When SDN and NFV first came along, it was generally expected that there would be a period of technology build-up during which costs would actually run ahead of benefits.  Operators know that “first cost”, meaning pre-deployment spending in advance of benefits, is a given in their world.  As a result, my original forecast for 2015 and 2016 showed capex actually increasing due to SDN/NFV, and then tapering down after the positioning deployments were complete in 2018.  Obviously we didn’t get SDN or NFV in quantity in either year, but most significantly the SDN and NFV we did get was the “Lite” version that didn’t really change infrastructure much overall.

Had we gotten a strong SDN/NFV benefit story in 2014 we could have expected to see an improvement in operations efficiency and service agility two years earlier than the model now says we will.  That would have created enough opex/agility benefits in 2015 and beyond to largely eliminate pressure by operators to improve profits through capex reduction.  Since we didn’t get the benefits early enough, operators have embarked on a new approach that’s likely to reduce capex, and we’re seeing the signs of it now.

The biggest indicator is the interest in open-source.  Why do we now have four primary open-source activities associated with NFV?  Answer:  Because we didn’t get the right result through specs alone.  Operators have relied on vendors to support transformation, to field offerings that made business cases and to take the risks and absorb the R&D costs associated with transformation.  It’s harder to do that when a technology is expected to reduce spending, because those vendors are now being asked to invest in their future loss, not their future profit.  Operators now want to drive the bus for themselves, and they can’t do that in a standards body because the US and EU (for example) would then accuse them of anti-competitive collusion.  Open source offers a possible solution.

It also creates a pathway that further erodes benefits for the vendors.  Can you make money selling a full-spectrum NFV solution?  Yes, as long as somebody isn’t giving it away, and it’s very obvious that operators are advancing their open-source approach to the critical aspects of SDN and NFV—the high-level service operations and orchestration stuff that makes the business case.  If they succeed, then SDN and NFV will be commoditized where it matters most—where the story meets the budget.

This isn’t a slam dunk, though.  The SDN/NFV vendors could still pull their chestnuts out of the fire, provided they take some critical steps.  Open-source isn’t a rapid-response approach, and it may be years before the right solutions to all the critical problems are in place.  Right now, we see no indication that the open-source projects already out there are addressing everything they need to address.

The first problem is opening the ecosystem.  With both SDN and NFV we have a new architecture that’s composed of a bunch of elements from different sources.  NFV is the most complicated—virtual functions, servers, managers, orchestrators, not to mention operations and management processes and lifecycle tools.  All of this has to fit into a defined structure, and that structure has yet to be defined.  What isn’t defined isn’t going to connect, and that means custom integration and substituting professional-services costs for capex.  A vendor could create an open framework broad enough to fit all the elements, and systematize the building of the new network.

Every component of SDN and NFV has a place in an overall model.  There should be a set of APIs that represent all the connections, and these APIs should be defined well enough to translate into software without additional work.  Everything that has to connect to a given API set then has a blueprint for the changes that will be needed to support those connections.  Vendors could produce this, at least those with a fairly complete NFV implementation.
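
What that could look like in practice, as a sketch only (the interface and method names are hypothetical, not drawn from any spec): a published set of interfaces that any element must implement in order to plug into the framework without custom integration.

    # Sketch of an "API blueprint": any infrastructure manager implementing this
    # interface can be connected without custom integration.  Names are invented.
    from abc import ABC, abstractmethod

    class InfrastructureManager(ABC):
        @abstractmethod
        def deploy(self, blueprint: dict) -> str:
            """Commit resources for a model element; return a deployment id."""

        @abstractmethod
        def status(self, deployment_id: str) -> dict:
            """Report lifecycle state in a framework-defined format."""

        @abstractmethod
        def release(self, deployment_id: str) -> None:
            """Tear down the resources."""

    class LegacyEmsAdapter(InfrastructureManager):
        """Example plug-in wrapping an existing EMS behind the common API."""
        def deploy(self, blueprint): return "ems-job-1"
        def status(self, deployment_id): return {"state": "active"}
        def release(self, deployment_id): pass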

The second problem is educating the masses.  While it’s an exaggeration to say that vendors think all they need to do to sell SDN or NFV is to go to the operators’ headquarters with an order pad in hand, it’s not much of one.  In the last month, almost two-thirds of vendor sales personnel who’ve emailed me about their activity say that the problem is that the operator doesn’t understand the need for modernization.  When you dig in, you find that this means the vendor sales type doesn’t know how to make a business case, so they’d like the buyer to forget it and just transform on faith.

We need people to understand where NFV and SDN benefits come from and how they can be achieved.  When you say “capex reduction” how much money is really being spent on the equipment you’re impacting, as a percentage of total capex, and what realistic percentage of change can be expected?  If there are assumptions about economies of scale, how much will it cost to reach that scale?  If you’re proposing to reduce opex, what is it now and how is it made up?  You’d be amazed at the small number of vendors who have any idea what opex is today.  Without that, how could you reduce it?
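
A simple worked example of the sanity check I mean, using made-up numbers purely to show the shape of the calculation: if the gear you’re displacing is only a fifth of capex and you cut its cost by a quarter, total capex moves by about five percent.

    # Hypothetical numbers, purely to illustrate the calculation.
    total_capex = 1.00           # normalize total capex to 1
    affected_share = 0.20        # fraction of capex on the equipment being displaced
    unit_cost_reduction = 0.25   # price/TCO cut on that equipment
    savings = total_capex * affected_share * unit_cost_reduction
    print(f"Total capex reduction: {savings:.0%}")   # -> 5%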

A related issue is riding the horse you created.  The vendors have created the hype wave for SDN and NFV with advertising dollars and favorable editorial participation.  Now it’s clear that this has been a hype wave, and there’s a need to create favorable publicity again and in a different way—like CORD.  But it has to be toward a better goal, which is the framework of SDN and NFV that can actually deploy and make a difference.

This isn’t easy.  This blog will run more than three times as long as the average story on SDN or NFV or CORD, yet it doesn’t begin to explain the real issues and opportunities and benefits.  It would take over 10,000 words to do that, which no reporter is going to write and no editor is going to run.  What those reporters and editors could do is accentuate the real and not just the positive.  Most of what’s said about SDN and NFV has been, and is, absolute crap.  Find what’s not crap and write about it if you’re in the media, and say true things (even if it’s just to play to the climate of reality) if you’re a vendor.

The final problem is setting realistic expectations.  Vendors have to make a choice here.  There is enough opex savings available in 2016 and 2017 to fund a fairly aggressive SDN/NFV deployment, if you can support the right capabilities and if you’re prepared to take the time to make the business case.  The problem is that you won’t see a single dollar of that this year unless you’ve already gotten smart and already started selling the right way.  If you have, none of your customers have recognized it and told me!  On the other hand, if you want to make some money on SDN/NFV today you can do that by whacking at all the low apples.  If that happens you will build nothing extensible and you’ll run out of steam next year, with no chance of improvement thereafter.

Sales people tell me their management expects too much, and they’re right.  I don’t know of a single vendor who has a realistic and optimal strategy for SDN/NFV sales, even among the half-dozen who have all the pieces needed.  They see transformation as being self-justifying.  They cite “IP convergence” as an example of why nothing complicated is needed, forgetting that IP convergence was a direct result of the consumer data service opportunity created by the Internet.  We had a business case ready-made.  We don’t have that now.

We can still get it.  All the elements of a good business case for transformation are available right now in the marketplace.  All the knowledge needed to apply them is also there.  Yes, it’s going to be harder and yes, you’re going to have to wait a couple years for the money to roll in.  The thing is, you are not winning now as a vendor or as an operator, in the game you’re playing.  Reality has to be better.

 

Learning Some Lessons from AT&T’s ECOMP

Any time a Tier One decides to open the kimono a bit on their planning for transformation, it’s important.  AT&T did that with its paper on ECOMP, which stands for “Enhanced Control, Orchestration, Management, and Policy”, and the topic makes the comments doubly important.  As operators seem to be looking to take on more of the heavy lifting in transformation, ECOMP is a signpost on what a major operator thinks it has to do, which of course makes it an exemplar for doing the right thing.

ECOMP, like so many other things we now seem to be seeing, is a top-down, service-lifecycle-focused approach.  AT&T links it to their Domain 2.0 (D2) project, which has been evolving for almost five years now and which guides infrastructure procurement.  It seems that ECOMP is an indicator that service lifecycles exist outside the pure infrastructure realm; “ECOMP is critical in achieving AT&T’s D2 imperatives,” according to the white paper.

As an architecture for D2, ECOMP is more than SDN, more than NFV, and even more than OSS/BSS; in fact it displaces some of the functions of all of them.  ECOMP-style service lifecycle management borrows lifecycle features from all of these things, and so it can be seen as a kind of shim that connects the next generation of infrastructure with the next generation of services and facilitates evolution in both the service and infrastructure spaces.

The structure of ECOMP is interesting.  At the top, it’s a multi-dimensional portal that offers both direct access to design and operations functions and an interface through which current OSS/BSS systems, and presumably NOC tools and procedures, can access information.  There’s also provision for big-data analytics integration at the top.  Below that, on the “left” of the figure, are the new elements ECOMP introduces, primarily in the area of design and policy specifications.  On the “right” is the collection of applications, services, and tools from legacy and new sources that form the engine of ECOMP, under the control of the Master Service Orchestrator (MSO).  Controllers and infrastructure managers fall into this portion.

The main diagram for ECOMP doesn’t name SDN or NFV (NFV’s VNFs are listed as managed element examples) but it’s pretty clear that ECOMP and MSO live well above both these technologies and that legacy management interfaces and the devices they represent are co-equal with the new stuff in terms of creating service resources.  Thus, like Verizon, AT&T is creating a model of future networking that embraces current technology.  That’s in part for evolutionary reasons, but also I think to keep their technology options open and to introduce orchestration for efficiency and agility without committing to major infrastructure changes.

According to the paper, “Orchestration is the function defined via a process specification that is executed by an orchestrator component which automates sequences of activities, tasks, rules and policies needed for on-demand creation, modification or removal of network, application or infrastructure services and resources.”  No technology specificity and no indication of reliance on higher-level OSS/BSS processes.  The process specification drives orchestration.  It’s also clear in a later section that ECOMP so extends NFV specifications as to totally subsume them, creating a higher-level structure that the NFV ISG might have created directly had they taken a top-down approach.

The biggest advance ECOMP specifies, IMHO, is the metadata-driven generic VNF Manager, which presumably eliminates the need for one-off VNF integration during onboarding, something that the VNF-specific VNFM concept of the ETSI ISG leads to.  This, says the paper, “allows us to quickly on-board new VNF types, without going through long development and integration cycles and efficiently manage cross-dependencies between various VNFs. Once a VNF is on-boarded, the design time framework facilitates rapid incorporation into future services.”  This concept of metadata-driven VNF management is critical, and while the paper doesn’t say so it would appear that the same model could be applied to legacy network elements, which means management could be generic overall.
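
A hedged sketch of what “metadata-driven” could mean in practice (the descriptor fields below are invented; the paper doesn’t publish a schema): one generic manager interprets a per-VNF descriptor instead of being rewritten for every VNF.

    # Sketch: one generic VNF manager, driven entirely by per-VNF metadata.
    # Descriptor fields are illustrative, not taken from the ECOMP paper.

    firewall_descriptor = {
        "vnf_type": "firewall",
        "lifecycle": {
            "instantiate": {"image": "fw-image-3.2", "vcpus": 4, "memory_gb": 8},
            "configure":   {"protocol": "netconf", "port": 830},
            "heal":        {"action": "restart", "max_retries": 2},
        },
        "monitoring": {"metrics": ["cpu", "sessions"], "interval_s": 30},
    }

    class GenericVnfManager:
        def onboard(self, descriptor):
            # No VNF-specific code: validation and wiring come from the metadata.
            assert "lifecycle" in descriptor and "instantiate" in descriptor["lifecycle"]
            print("Onboarded", descriptor["vnf_type"])

        def execute(self, descriptor, phase):
            step = descriptor["lifecycle"][phase]
            print(f"{descriptor['vnf_type']}: {phase} with {step}")

    vnfm = GenericVnfManager()
    vnfm.onboard(firewall_descriptor)
    vnfm.execute(firewall_descriptor, "instantiate")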

AT&T Service Design and Creation (ASDC) is the modeling application that manages all this metadata, and metadata also controls the integration of the components of ECOMP themselves.  Thus, ECOMP is a realization of a data-model-driven approach, the very thing I think the ETSI ISG, the ONF, and the TMF should have worked toward from the first.  It appears that metadata from the resource side and from the service side are model-bound, which makes deployment of services resource-independent as long as the modeling defines conditional selection of infrastructure based on things like location or performance—which appears to be the case.
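
A sketch of that conditional selection (the policy fields here are mine, not ASDC’s): the service model states requirements, the resource metadata states capabilities, and the binding is made at deployment time.

    # Illustrative resource-independent deployment: service metadata states needs,
    # resource metadata states what each site offers; selection happens at runtime.

    service_requirements = {"region": "us-east", "max_latency_ms": 20}

    resource_pools = [
        {"name": "dc-nyc", "region": "us-east", "latency_ms": 12, "free_capacity": 0.4},
        {"name": "dc-chi", "region": "us-east", "latency_ms": 25, "free_capacity": 0.7},
        {"name": "dc-sfo", "region": "us-west", "latency_ms": 8,  "free_capacity": 0.9},
    ]

    def select_pool(req, pools):
        candidates = [p for p in pools
                      if p["region"] == req["region"]
                      and p["latency_ms"] <= req["max_latency_ms"]]
        # prefer the candidate with the most headroom
        return max(candidates, key=lambda p: p["free_capacity"]) if candidates else None

    print(select_pool(service_requirements, resource_pools)["name"])   # -> dc-nyc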

The modeling approach taken by ECOMP seems to draw from a lot of sources.  They have separate service, resource, and product models and that’s a feature of the TMF’s SID approach.  They use inheritance and some of the cloud-centric features of TOSCA too, and in configuration they use YANG.  There’s nothing wrong with multiple modeling approaches, though as I’ve said in the past I believe there are benefits to having a common modeling language down to the level right above the resources themselves.

There’s a lot to like about ECOMP even if you forget for the moment that it’s an operator’s solution to operators’ problems.  There’s also at least a chance (if there’s enough interest, says AT&T) that it will end up open-sourced.  Obviously that would create competition for vendor approaches, but ECOMP could have an impact on vendor solutions whether AT&T opens it up or not.

There is nothing here that a full-spectrum business-case-ready NFV solution could not provide.  Every one of the six vendors I’ve cited in that space could frame something functionally close enough to ECOMP to be suitable.  It would be tempting to suggest that if vendors had jumped out quickly enough, AT&T might have adopted a vendor approach, but from what I know of ECOMP evolution that’s not true.  ECOMP concepts go back to 2013 before any of the implementations of NFV from those six vendors was even announced.

What is relevant at this point is how ECOMP could impact the market dynamic.  If you add it to the CORD interest I’ve recently blogged about, it’s obvious that operators are well into a new stage of SDN and NFV, one where CTOs and standards people have to take a back seat to CFOs, CIOs, and product/service management.  The result of that is a quick upward welling of focus toward the business-case issues.

There are similarities between CORD and ECOMP, mostly at an architecture or goal level.  CORD seems to take things further and in a slightly different direction, helping operators assemble infrastructure by assembling feature-repository offices instead of trying to glue together zillions of technical elements.  The CORD orchestration model, XOS, binds all the functionality and resources.  With ECOMP, the binding and lifecycle processes themselves are the goal, but you probably end up in the same place.

CORD, and now at least for AT&T, ECOMP, represent the new generation in transformation visions.  At least a dozen operators I’ve heard from are looking at something bigger, broader, and more directly aimed at securing benefits than either SDN or NFV.  While some may believe this kind of thinking would kill the two recent industry favorites, the fact is that it could save them.  If SDN and NFV are incorporated in a benefit-capable framework they have a chance.  If not….

If I were a vendor in the NFV space, what I’d do now is jump on CORD and forget any specific worries about SDN and NFV standards.  ECOMP blows kisses at NFV in particular, but it clearly redoes an awful lot and ignores even more.  I think any vendor who treats the ETSI vision as sufficient at this point is dooming themselves.  But you can’t roll your own NFV as a vendor, which means you need to lock onto another architecture.  CORD is the only game in town.

CORD isn’t complete.  SDN and NFV aren’t complete.  Nothing much is “complete” in the transformation space or we’d have transformed by now.  But ECOMP shows what at least one form of completeness would look like and what it would include.  It’s close enough to CORD that it can be reached from there.  ECOMP also shows how badly we’ve messed up what we’re saying are our most transformational network technologies.  I hope we don’t mess up the next one too.