Have We Had the Solution to SDN Control All Along?

The question of how the network of the future could work, how SDN in particular could be introduced and managed, needs to be answered.  What’s really interesting is that it might have been answered already, and that not only are we not running to explore the solution, we might be running away.

One of the goals of SDN (at least the OpenFlow version) was to substitute central and explicit control over forwarding tables for adaptive protocols and behavior.  An SDN controller is expected to “rule” on explicit requests for routes or to set up default routes based on some policies.  The challenge is that central control of a very large network with a single controller obviously raises major scalability issues.  The solution so far has been to apply SDN to relatively limited domains and to rely on legacy interworking techniques to connect those domains.  But that limits SDN’s benefits.  Is there a better way?

SDN displaces adaptive routing.  Could it be that there were developments in the past that could guide how SDN networks might work?  Remember the line “Return with us now to those thrilling days of yesteryear?”  Well, it’s not the Lone Ranger I’m talking about here, but maybe the “Lonely Protocol.”  Let’s go back in time to the late ’90s and early 2000s.  We were in the age of frame relay and ATM, and startups like Ipsilon, and we were struggling to figure out how to make connection-oriented protocols work with IP.  Along came a solution: NHRP, the Next Hop Resolution Protocol.

“Call-oriented” services like ATM required that a connection be made by setting up a virtual circuit to something specific.  If we imagine an IP user at the edge of a frame relay network wanting to do an IP session with another such user, you can see the problem.  Both users are on the network but the network doesn’t support the normal IP discovery processes.  The standards (in the IETF) at the time even came up with a name for this kind of network—an “NBMA” or “non-broadcast multi-access” network.

NHRP was a solution to the NBMA problem.  You had a “next-hop server” that was a lot like a DNS server.  The server would provide an NHRP agent at the edge of an NBMA with the network address (in the NBMA) of the “next hop”.  The agent would then call the designated partner agent, make the connection, and we’d have a packet path.  Even if the network were made up of multiple subnet-NBMAs we could run NHRP at each border to find a way across.
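
To make the mechanics concrete, here’s a toy Python sketch of what a next-hop server does: map a destination address to the NBMA address of the egress point, so the requesting edge agent can place a call.  The table contents and the signal_svc call are my inventions; real NHRP is a binary protocol defined in RFC 2332.

    # Hypothetical next-hop server: maps destination IP prefixes to the
    # NBMA (ATM or frame relay) address of the agent serving them.
    import ipaddress

    NEXT_HOP_TABLE = {
        ipaddress.ip_network("10.1.0.0/16"): "nbma:47.0005.80ff.e100.0000.f21a.26d8",
        ipaddress.ip_network("10.2.0.0/16"): "nbma:47.0005.80ff.e100.0000.f21a.3c01",
    }

    def resolve_next_hop(dest_ip: str) -> str:
        """Return the NBMA address an edge agent should call for dest_ip."""
        addr = ipaddress.ip_address(dest_ip)
        for prefix, nbma_addr in NEXT_HOP_TABLE.items():
            if addr in prefix:
                return nbma_addr
        raise LookupError(f"no NBMA binding for {dest_ip}")

    # The edge agent resolves, then signals a virtual circuit:
    # circuit = signal_svc(resolve_next_hop("10.2.4.9"))   # signal_svc is notional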

No, we’re not going back to ATM here, but consider for a moment the question that SDN should be considering; is an SDN network something like an NBMA?  There are no forwarding paths in the network in its initial state, right?  There’s a “server” that can establish what needs to be done for a given packet at a given node, but no systemic notion of destination.  Could we adopt NHRP principles at the edge of an SDN network to establish pathways for packets based on source/destination?

This probably seems like a lot of work, going to next-hop servers and all.  Remember, though, that we go to DNS servers to decode the URLs that start most web activity.  We also go to an SDN controller to get forwarding instructions, in some models on a per-packet basis.  At the very least, something like NHRP could be used for high-value services, and it could easily be used in carrier Ethernet, VPNs, and so forth.  Could it scale to the Internet?  As easily as SDN could without it, and perhaps more easily.

An NHRP-ish concept could in fact be combined with DNS.  We could get an IP address the way we do now when we decode a URL, but we could also get routing information, almost like the source-route vector that was a part of frame relay and ATM and also a part of MPLS.  Suppose the DNS server returned to us an ordered list of NHRP-ish domains that we had to transit to reach the destination.  We’d then move the packet to the edge of the first domain, let that domain carry it across as needed, and then do the same with the rest.
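
As a sketch of what that reply and the forwarding loop might look like (the transit_domains field is my invention, not anything DNS supports today):

    # Hypothetical "enhanced DNS" reply: an address plus an ordered list of
    # NHRP-ish domains to transit, much like a source-route vector.
    reply = {
        "name": "server.example.com",
        "address": "203.0.113.27",
        "transit_domains": ["metro-east", "core-2", "metro-west"],  # invented field
    }

    def transit(domain, packet):
        print(f"crossing {domain}")   # each domain gets the packet across internally
        return packet

    def deliver(packet, address):
        print(f"delivered {packet} to {address}")

    def forward(packet, reply):
        for domain in reply["transit_domains"]:
            packet = transit(domain, packet)
        deliver(packet, reply["address"])

    forward("packet-1", reply)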

With this sort of model (and again I stress I’m citing this as an example, just something to start discussions) we have a mechanism for inter-SDN-domain linkage.  We also have a way of using SDN and the model to improve security.  The enhanced DNS-and-NHRP stuff could also be used to validate source addresses, so that packets can’t be emitted into the network unless they originate within the “source tree” their address implies.  You could also quench a source at its origin, by telling its home domain not to connect it.

This would work for SDN, but it would also work for any tunneling or “connection” protocol as well, at any of the OSI levels.  We could tunnel through something Ethernet-like, tunnel Ethernet through something IP-like, tunnel both through SDN.  Add in the microservices concept that I talked about yesterday to handle control-layer and management-layer interactions, and you could compose a service protocol from bare connectivity, which is what SDN provides.

There are obviously questions about how we’d set up a DNS-like hierarchy for this model, and how we’d integrate it with current IP, but I think you can see that we could route “normally” where we have adaptive discovery and using the new model where we don’t.  There may indeed be scalability issues but those wouldn’t be any worse than we face with DNS now, and that we’d face with SDN controllers in the future.

The net of all of this is that we’d establish a model of “destination finding” that isn’t dependent on discovery, and that we would be able to apply that method to any technology that has forwarding tables that allow for packet movement and delivery.  That includes “connection-mode” stuff like RSVP/MPLS and OpenFlow SDN.  We can even fit the model to legacy technology.

NHRP has been around a long time, so you might think that all this is in place and ready to be used.  Well, while researching it I found a note that said Cisco was pulling NHRP support from its products.  It seems to me that we should be looking instead at how either NHRP or its principles could support SDN.  Of course, Cisco’s not a big fan of a revolutionary SDN transition.

I know a lot of people besides Cisco are going to tell me this is an awful idea (or at least they’ll think that!) and that we should be using the pure notions of IP.  I don’t have a problem with that assertion.  If the industry wants IP forever, go for it.  The problem is that we’re saying we want “new” technologies and benefits.  If we plan on doing everything the old way, we have to square that attitude with the notion that we’re going to do something revolutionary with SDN.

Can NaaS and Microservices Shape a Generalized SDN/NFV Service Model?

In my blog on Friday of last week, I talked about the pitfalls of not examining the details of current network services when considering how SDN or NFV might be used to implement or replace them.  Some of you have noticed that the blog opens a door to considering network services of all sorts as a kind of hybrid involving two hot topics—NaaS and “microservices”.  I want to use a discussion of those to start a deeper dive into a service model for SDN and NFV that will carry through both today’s blog and the one tomorrow.

Let’s imagine an access connection, something like Carrier Ethernet or a consumer broadband connection.  It’s just been installed and it doesn’t do anything yet.  Pump in a typical packet and it goes into the bit bucket.  But let’s also suppose that this connection has one special IP address, like 10.0.0.1.  If you go to port 80 (HTTP) at that address, you would see a portal, and through that portal you could set up a service.  Maybe it’s consumer Internet, or maybe a VLAN, or even perhaps both.  Click and you have forwarding as requested, and the ability to receive packets from specified places.  You could also say that you wanted a firewall, DNS, DHCP, VPN client, special gaming or business collaboration.  Whatever you pick is strung out on your access line and now accessible to you.
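
A toy version of that portal interaction, using the address from my example and an invented order format, might look like this:

    # Hypothetical service-activation call to the access-line portal.
    import json, urllib.request

    order = {
        "service": "vlan",                                # or "internet", or both
        "sites": ["hq.example.com", "branch7.example.com"],
        "features": ["firewall", "dns", "dhcp"],          # microservices to add
    }
    req = urllib.request.Request(
        "http://10.0.0.1/services",                       # the one special address
        data=json.dumps(order).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.status, resp.read())                   # e.g., an order ID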

This sort of thing is where I think that NaaS, SDN, and NFV could get us, with a healthy dose of “microservices” that would be a lot like VNFs but also a lot more.

All technology innovations aside, it’s time we started thinking about access connections as assets that we, and operators, can leverage freely.  They should not be dedicated to a specific service, but to an elastic set of service relationships that might be delivered using any convenient protocol.  A blank slate.  This model would suggest that our access portal could deliver to us a set of network-as-a-service capabilities, but if you look deeper each of these would consist of a connection model and a microservice set.

The connection model is simply a description of the forwarding behavior and site community associated with a NaaS.  This is what would define the addressing mechanism used and how traffic emitted at a given site would be handled (line, LAN, and tree are the classics).  The microservice set would provide higher-layer features, and could be extended to offer even more—which I’ll get to below.
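
In rough data-structure terms (all names invented), a NaaS in this model might decompose like this:

    # Sketch: a NaaS as a connection model plus a microservice set.
    from dataclasses import dataclass, field

    @dataclass
    class ConnectionModel:
        topology: str                       # "line", "LAN", or "tree"
        addressing: str                     # e.g., "ipv4-private", "ethernet-mac"
        sites: list = field(default_factory=list)

    @dataclass
    class NaaS:
        connection: ConnectionModel
        microservices: list = field(default_factory=list)

    vpn = NaaS(
        connection=ConnectionModel("LAN", "ipv4-private", ["hq", "branch1", "branch2"]),
        microservices=["firewall", "dhcp", "web-management-portal"],
    )
    print(vpn)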

What this would create would be something like an on-ramp to an expressway that’s lined with useful little shops.  You could access only the shops, you could access only the expressway (picking your route), or you could do a bit of both.  I’m not saying this is the only way to do it, but it would create the architectural model of an “enhanced” service.

Management of this could be done through the portal, meaning that we could set this up to deliver web-based management.  We could also deliver SNMP management by simply opening an SNMP microservice, and we could deliver an arbitrary management API as a microservice too, something like TMF MTOSI.

The microservices in this model could be hosted in the cloud, of course, and they could either be deployed per-tenant on demand or be multi-tenant services.  In addition, they could be deployed on the customer premises, either in a private cloud or in CPE that provides service termination.  The model doesn’t care what a microservice is or does, so it blends cloud applications and NFV features and makes it clear that what’s needed for either is increasingly needed for both.

There are obviously a lot of ways of looking at this besides the one I’m proposing, but I hope my comments make my central point.  We need to examine the end-game of SDN, NFV, and the cloud.  We need a picture of what networks and services and applications will look like at that point, because without it we’re going to have real trouble in the evolution to SDN and NFV.

Remember my comments on how all this arguably started with an operator vision of transformation?  Well, you can’t get much traction without having some sense of what you’re transforming to.  Part of that is simple merchandising.  Advocates for SDN and NFV can talk about “starting small” or “early applications” or “basic services”, but transformation isn’t about limited change, it’s about massive change.  To what?

Then there are the specific benefits.  If NFV or SDN is to improve capex or opex or agility, it has to spread widely enough for the improvements it offers within its scope to be significant at the business level.  Nobody will bet their job on a migration to something that saves one percent.

Where I think the big problem with limited-scope thinking comes in is in hiding the need for and even value of systemic strategies.  I talked last week about the protocol issues of current services and their impact on SDN and NFV.  The model of NaaS and microservices that I described here could address those issues.  But how about the problems of SDN and NFV?

Let’s look at SDN as an example.  We have three possible modes of SDN operations.  One is where connectivity doesn’t exist and forwarding tables don’t exist, and the introduction of packets stimulates devices to ask the controller what to do with them, thus building up the routes.  Clearly this would never work on the Internet with some gigantic central server; it would likely take weeks to converge on full connectivity.  Another mode is where the central controller preconfigures routes.  This is fine for “interior” routing, but users appear and disappear at the edge and it’s getting to their addresses that forwarding is all about.  The final mode is adaptive, which gets us to building something we say is SDN but is actually just doing legacy routing/switching a little differently.
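
Here’s a toy rendering of that first mode to show why it stresses a central controller: every table miss punts to the controller, so the first packet of every flow everywhere converges on one place.  This is a simulation sketch, not real OpenFlow code.

    # Toy reactive SDN: a switch with no forwarding state asks the controller
    # on a table miss; the answer is cached as a flow entry.
    class Controller:
        def __init__(self, port_map):
            self.port_map = port_map            # dest -> egress port, precomputed

        def packet_in(self, switch, dest):
            port = self.port_map[dest]
            switch.install_flow(dest, port)     # push a flow entry downstream
            return port

    class Switch:
        def __init__(self, controller):
            self.flows, self.controller = {}, controller

        def install_flow(self, dest, port):
            self.flows[dest] = port

        def forward(self, dest):
            if dest not in self.flows:          # table miss: punt to controller
                return self.controller.packet_in(self, dest)
            return self.flows[dest]

    sw = Switch(Controller({"10.0.0.2": 3}))
    print(sw.forward("10.0.0.2"), sw.flows)     # first packet punts; later ones hit cache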

I think that future services will be NaaS-like, meaning pure forwarding with no inherent control/management behavior.  I think that control-plane activity will be supported then through microservices, and that microservices will also offer management connections.  I’d guess that many agree with these points, but I’d be happy if someone presented an alternative model.  Happy because it would get us to the discussion of what the ultimate SDN/NFV/cloud network would look and work like, and how we’d use our revolutionary technologies to get to that state.  We need that.

The Difference Between Software-Hosted and Software-Defined

I doubt anyone would disagree if I said that we had a strong tendency to oversimplify the impacts of changes or new concepts in networking.  It’s a combination of the desire by network users to focus on what they want, not how to get it, and the desire of the media to turn every development into a one-liner.  The problem is that this tendency can result in under-appreciation of both issues and opportunities.

Network connectivity at its core is pretty simple.  You squirt a packet into the network and based on addressing it’s delivered to the correct destination.  If you look at this level of behavior, it seems pretty easy to convert this to “virtual switching and routing” or “white box” or “OpenFlow”.  The problem is that the real process of networking is a lot more complicated.

To start with, networks have to know where that correct destination is.  That’s done today through a discovery process that involves a combination of finding users at the edge and then propagating knowledge of where they are through the network so each device knows where to send stuff.  In theory I could eliminate discovery completely but if I do that then I have to tell “the network” the exact location of every user and work out how to reach them.  The more users and the more routes, the more difficult it would be.
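
A toy flood-based discovery sketch shows the cost (real IGPs are far more sophisticated, but the propagation burden is the point: more users and more nodes mean more work):

    # Toy discovery: flood each edge user's location until every node has a
    # next hop for it.
    from collections import deque

    neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    attached = {"A": ["alice"], "C": ["carol"]}          # users found at the edge

    def build_tables(neighbors, attached):
        tables = {node: {} for node in neighbors}
        for origin, users in attached.items():
            for user in users:
                tables[origin][user] = "local"
            frontier, seen = deque([origin]), {origin}
            while frontier:
                node = frontier.popleft()
                for nbr in neighbors[node]:
                    if nbr not in seen:
                        seen.add(nbr)
                        for user in users:
                            tables[nbr][user] = node     # next hop back toward user
                        frontier.append(nbr)
        return tables

    print(build_tables(neighbors, attached))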

Discovery is part of what’s usually called “control plane processes”, which means it’s supported by protocols that communicate with the network rather than with users.  There are a decent number of other control-plane processes out there.  “Ping” and “traceroute” are common examples.  If we were to eliminate control-plane processes then these protocols wouldn’t work if users exercised them.  If we don’t eliminate them then we have to spoof in some way what the protocol would do.  Do we let users discover “real” routes in an SDN/OpenFlow network using traceroute or do we show them a virtual route?  And whatever choice we make we have to introduce a process somewhere that delivers what’s expected.
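
To illustrate the “virtual route” choice, here’s a sketch of a spoofing process that answers traceroute by mapping each TTL to a hop on an invented virtual path rather than exposing the real fabric:

    # Sketch: synthesizing traceroute responses for a virtual route.
    VIRTUAL_PATH = ["198.51.100.1", "198.51.100.2", "198.51.100.3"]  # invented hops

    def traceroute_reply(ttl: int, real_dest: str) -> str:
        """Source address of the ICMP Time Exceeded we'd synthesize."""
        if ttl <= len(VIRTUAL_PATH):
            return VIRTUAL_PATH[ttl - 1]    # pretend hop N expired the packet
        return real_dest                    # final hop answers as the destination

    for ttl in range(1, 5):
        print(ttl, traceroute_reply(ttl, "203.0.113.9"))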

Then there’s management.  At a simple level, users would typically expect to have some management access to network elements, something like an SNMP port.  So now what do we give them in the world of SDN or NFV?  The “real” network element they expect, like a fully functioning router/switch or a firewall, isn’t there.  Instead we have a collection of distributed functions hosted on computers that the user has no idea are in place, connected by virtual links or chains that used to be internal data paths on a device, and representing shared facilities that the user can’t be allowed to control even if they knew how.

If we look at the implications of this on “virtual routing” we see perhaps for the first time the difference between two solutions we hear about all the time but never seem to get straight.  “SDN” virtual routing using OpenFlow doesn’t have real routers.  There are no embedded features to provide control-plane responses or management features.  It forwards stuff, and if you want more you have to use almost-NFV-like function hosting to add the capabilities in.  Non-SDN virtual routing (Brocade’s Vyatta stuff is a good example) is a virtualized, hostable, router.  It’s a real router, functionally, and you can build networks with it just like you’d do with router devices.  You have no issues of control or management because your virtual router looks like the real thing—because it is.

The first important conclusion you can draw from this is that the more you expect virtualized network elements to represent every aspect of the behavior of real devices, the more you need a hosted software version of the real device and not a “white-box” or forwarding-level equivalent.  Rightly, the Brocade virtual routers are neither SDN nor NFV.  They’re software routers and not software-defined routers.  That, I’d assert, is true with NFV too.  If we have to present a virtual CPE element exactly like real CPE would look at all control and management levels, then we need an agile box on the premises into which we can load features, not a cloud-distributed chain of features.

This might sound like I’m saying that SDN and NFV are intrinsically crippled, but what I’m really saying is that we’re intrinsically crippling them.  The fallacy in our thinking is that we have to “represent every aspect of the behavior of real devices.”  If we toss that aside we could imagine a lot of interesting things.

Suppose we had a network with no control-plane packets at all.  Suppose that any given network access port was disconnected until we sent an XML message to a management system that provided ownership credentials and the address mechanism to be used there.  Suppose that a user who wanted NaaS simply provided the network with an XML manifest of the points to be connected using logical addresses, real addresses, any addresses.   We can now define a NaaS on the basis of completely managed connectivity.
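
As a sketch (the element and attribute names are my invention), the manifest might be built like this:

    # Building the hypothetical NaaS manifest as XML.
    import xml.etree.ElementTree as ET

    naas = ET.Element("naas", owner="acme-corp", credentials="token-1234")
    for site, addr in [("hq", "logical:acme.hq"), ("plant", "real:192.0.2.17")]:
        ET.SubElement(naas, "endpoint", name=site, address=addr)
    ET.SubElement(naas, "connectivity", model="closed-user-group")

    print(ET.tostring(naas, encoding="unicode"))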

Suppose that we define management processes as a data relationship set that binds the status of real resources to the status of the NaaS services these resources support.  We can now find out what the state of a network service is without ever giving direct access to the devices, and we can use the same MIBs and processes to manage hosted software elements or real appliances.

Network protocols have created those velvet ribbons that tie us to stuff without our realizing it, and the way that we use these protocols in the future will have an enormous impact on the extent to which we can optimize our use of SDN or NFV.  We should be able to define services in every respect in a virtual world, and our changes in definition shouldn’t have an enormous impact on cost or operations efficiency.  IP should not be the underlying connection/forwarding fabric—that should be the simple OpenFlow-like packet forwarding based on a mystical forwarding table.  We should be able to build IP, or Ethernet, or any arbitrary collection of network services, equally well on this agile base.

Why can’t we?  Because it’s complicated, and so everyone focuses on making something far more trivial work quickly.  That generates little islands or silos, but it doesn’t build a network and networks are what we’re supposed to be doing.  I think that both SDN and NFV can do a lot more for us than we think, if we start thinking seriously not only about what networks do but how they do it.

What We Can Learn From Chambers’ White-Box-is-Dead Comment

John Chambers said a while ago that the “white box” players were dead and that Cisco had at least helped to kill them.  This is the sort of Chamberesque statement that always gets ink, but we always have to dig into those sorts of statements.  “News” means “novelty” not “truth”.

The whole white-box thing was in large part a creation of the hype engine (media, analysts, and VCs) linked to SDN and later NFV.  The idea was that SDN and NFV would displace proprietary network devices.  Since software has to run in something, the “something” left behind when the last vestiges of proprietarism were stamped out was white (for purity, no doubt) boxes.  SDN, in particular, was seen as the launch point for a populist revolution against big router/switch vendors.

Why didn’t this work?  Well, let me start by posing two questions to you.  First, “What brand of car do you drive?”  Second, “What brand of milk do you buy?”  You likely have a ready answer to the first and no idea on the second.  The reason is that milk is a commodity.  White-box switches are by definition featureless commodities, right?  One instance is as good as another.  So let’s now pose a third question.  “Who spends money to advertise and promote a featureless commodity?”  Answer, “Someone soon to be unemployed.”

Not good enough?  Let’s look at another angle, then.  You’re going into your CFO’s office to buy a million dollars’ worth of network goodies.  “Who makes it?” asks the CFO.  Choice 1:  “Cisco.”  Choice 2:  “I don’t know, it could be anybody.”  Which answer is best?

Still not good enough?  Third slant, then.  You’re called into the CEO’s office because the network is down and nothing that was supposed to be working is working.  CEO’s question:  “Well, who do our lawyers call to file the suit?”  Choice 1:  “Cisco.”  Choice 2:  “Gosh, I think their name is ‘FlownAway Switching’ this week but it’s changed a couple times since we installed.”  How far into orbit does the CEO go for each of these choices?

There are solid facts behind the whimsy here.  Companies want stuff to work, and most would admit they could never hope to test every choice fully.  They rely on “reputation”, which means that the name is widely publicized, the product concept seems to be market-accepted, and the vendor is large enough to settle a lawsuit if it comes to that.

The point here is that it’s true in one sense that Cisco killed the white-box movement.  Had Cisco spent its marketing money and skill promoting featureless, commodity, switch/routers, the movement would have succeeded—and Chambers would have been gone a lot quicker.  Incumbents will never fund their own demise, so the deep truth is that natural market forces killed white boxes, for now at least.

These same market forces impact SDN and NFV more broadly.  One of the interesting things about NFV, for example, is that the standards-science-and-technology teams tend to like network vendor implementations of NFV while the CIO and CFO tend to like computer-vendor implementations best.  The reason is simple; you want NFV to be offered by a vendor with a lot of skin in the game.  If infrastructure transformation through NFV means a shift to servers, why not pick a server vendor?  They have more to gain and lose.

I think that announcements of server products or partnerships by Alcatel-Lucent and Ericsson reflect this truth, and also the flip side, which is that if NFV is going to be a consultative sell (and if it’s not then I don’t know what would be!) it makes sense that vendors would be more inclined to stay the course if they could benefit strongly from a successful outcome.  Think of an orchestration player going through all the hoops of NFV orchestration and management validation to get perhaps a couple million out of a multi-billion-dollar deployment.

I also think there are support issues here.  NFV and SDN both present the operators with a dilemma.  On one hand they are all committed to a “no vendor lock-in” position and extremely wary about having a vendor create one or more NFV silos.  On the other hand they’re wary about being gouged on integration and professional services.

What they should want is what you could characterize as “open product suites”, meaning that they want SDN and NFV to be based on principles/architectures that at least open up the areas where most product spending will happen.  Thus, perhaps the most important single piece of NFV is what I’ve been calling the “Infrastructure Manager”, a subset of which is the ETSI VIM.  If every piece of current or future hardware that’s useful in service deployment can be represented by an IM and orchestrated/managed, then operators have openness where it counts.

Getting to that point is really about the MANO/IM interface, just as getting to a universal and open vision of SDN is really about how the northbound interfaces talk with whatever is northbound.  An SDN controller is a broker for NaaS.  You need to be able to describe the connection model you’re looking for in a totally implementation-independent way, and then translate that model into network setup for whatever is underneath.
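
In sketch form, the “broker” idea means one implementation-independent connection model with per-substrate translators underneath (all names invented):

    # Sketch: one abstract connection model, several realizations.
    request = {"model": "LAN", "endpoints": ["portA", "portB", "portC"]}

    def to_openflow(req):
        eps = req["endpoints"]              # toy rendering: full mesh of flow entries
        return [("flow", a, b) for a in eps for b in eps if a != b]

    def to_legacy_vlan(req):
        return [("vlan", 100, ep) for ep in req["endpoints"]]

    TRANSLATORS = {"openflow": to_openflow, "legacy": to_legacy_vlan}

    def broker(req, substrate):
        """Translate the abstract model into setup for whatever is underneath."""
        return TRANSLATORS[substrate](req)

    print(broker(request, "openflow"))
    print(broker(request, "legacy"))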

That works with NFV too, but it’s more complicated because NFV deploys functional elements and connects things, so the scope of what has to be modeled is greater.  People have proposed YANG as a modeling language for NFV MANO interfaces, but YANG is best used to describe a connection model.  We have some insight into what process models should look like with OpenStack, but OpenStack also has connection models (via Neutron), and those models are currently limited to things that cloud applications would want to set up, which means IP subnets and VPNs and so forth.

It’s the management of these abstractions that creates the challenge for both, however.  If I have an intent model in the abstract, and I’m going to realize that intent through resource commitments, then I need three things.  First, I need to know what the management of my intent model would look like.  Yes, I have to manage it, because that intent model is what the customer is buying.  Second, I have to know what the management of the committed resources looks like, because you can’t fix virtual problems, only real ones.  Finally, I have to know the derivations or bindings that describe how intent-model management can be derived from real resource management.
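
A minimal sketch of that third piece, the bindings, assuming invented resource and intent names:

    # Sketch: deriving intent-model status from real resource status.
    resource_status = {"vm-17": "up", "vm-18": "failed", "tunnel-9": "up"}

    intent_bindings = {
        "customer-firewall": ["vm-17", "vm-18"],   # resources realizing the intent
        "customer-vpn": ["tunnel-9"],
    }

    def intent_status(intent: str) -> str:
        """The customer sees the intent model's state, never the resources."""
        states = [resource_status[r] for r in intent_bindings[intent]]
        return "up" if all(s == "up" for s in states) else "degraded"

    print(intent_status("customer-firewall"))      # "degraded": vm-18 failed underneath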

Where is all this stuff in SDN and NFV?  Nowhere that’s even proximate to utility.  Thus, we’re forced to presume that if SDN and NFV are adopted, current management practices and tools will be less useful because we’re losing the ability to manage what we’re selling, or at least disconnecting that from the management of the real resources used for fulfillment.  We can’t do fault tracking when we can’t see the connection; we can’t present a user who bought a firewall or a VPN with the need to manage real resources that wouldn’t in the user’s mind even be part of such a service.  Virtualization, to work, must be transparent, and transparency can be achieved only by abstraction.

I see some signs that there are operators in the NFV process who agree with this position, and we may be able to fix things at the specification level.  The question is whether we could define a suitable abstraction-based vision of IM/VIM and its interface with MANO or northbound apps in time to test the full range of service lifecycle processes for NFV and SDN.  If not, we’re going to have a hard time putting together a convincing field trial, and a hard time ever making anything other than “current boxes” winners.

HP Boosts their NFV Position

HP ranks among, if not on top of, my selection of bona fide NFV providers.  Their OpenNFV architecture is comprehensive in its support for operations integration and legacy device control, both of which are critical to making an early NFV business case.  Now they’re taking on another issue, one that’s been raised in a number of places including recent LinkedIn comments on my blogs.  The issue is VNF integration.

What HP has announced is HP NFV System, which is a kind of NFVI-in-a-box/package approach.  HP takes its own NFV Virtual Infrastructure Manager and server hardware, includes its carrier-grade OpenStack, pairs that with Wind River platform software and HP’s own resource management tools, and creates a handy resource bundle that can be packaged with VNFs to create either a complete service solution or a set of components that can build a variety of services.  Or, of course, all of the above.  The four “kits” that make up NFV System customize it for trial and standard missions and expand its capacity as needed.

Of course, NFV doesn’t need good strategies as much as it needs solutions to problems.  The problem that NFV System may help solve is the orphaning of VNFs in the whole NFV onrush.  Everything about NFV is really about hosting VNFs and composing them into services, but some VNF providers, and likely more prospective providers, have a hard time figuring out how they engage at all.

VNFs, according to the ETSI model, run on NFV Infrastructure (NFVI).  This in turn is represented to the rest of NFV through a Virtual Infrastructure Manager or VIM.  Some of the latest models of NFV seem to want to make the VIM a part of MANO, which I disagree with.  A VIM, and the superset of the VIM that ETSI is dabbling with recognizing (what we might call an Infrastructure Manager), are what resources look like to the other elements of NFV.  If you have an IM that recognizes your hardware, then you have something that can be MANOed, so to speak.

That’s the principle NFV System is based on.  It’s an NFVI atom that should plug into a standard NFV implementation because it’s a VIM with conforming technology underneath.  That means that a software provider who wants to package their stuff for deployment could buy NFV System, add their VNFs, and sell the result.

This could invigorate the VNF space, which has so far been more posturing than substance.  Without a VIM, something that purports to be NFV-compatible is just hardware, and without a VIM to run through, software functionality isn’t a VNF; it’s simply a cloud application.  HP has given voice, perhaps, to the VNF community at long last.

I say “perhaps” because we are still facing what IMHO is an imperfect management model for NFV and also an incomplete or inadequate model for how MANO talks to a VIM/IM.

The problem with the NFV management model is one I’ve blogged about before; here let me confine myself to saying that there is no explicit way that VNFs are pushed through the service lifecycle; there are only options.  Even if those options are all workable (which I don’t believe to be the case), the problem is that there are multiple options and everyone won’t pick the same one.  Right now a VNF provider would likely have to package some integrated internal VNFMs with their stuff.  How those link to systemic management is still fuzzy.

The model problem is simpler to describe and as hard or harder to solve.  Does a VIM represent only one model of deployment or connection?  That would seem wasteful, but if there are several things a VIM can ask for, it’s not entirely clear how it asks.  There are examples of using YANG to describe service chains and OpenStack to host them.  But OpenStack Neutron recognizes a number of network models, so do we use that, or YANG, or maybe something else entirely?  Without a firm view of what MANO tells a VIM to do, how do we know that a given VIM (and the underlying NFVI) plugs into a given MANO?

This isn’t HP’s problem to solve at the industry level, of course.  HP’s OpenNFV has a broader VIM/IM model than ETSI defines, and a much broader set of options for integrating management and operations.  Their NFV System would plug into their own OpenNFV software easily, of course, and it would likely be at least more adaptable to a generic ETSI-compliant MANO than off-the-shelf network functions would be, given that there’s a VIM available.  The ETSI process isn’t plug-and-play with respect to VNFs, though, and HP alone can’t force a repair.

This is a really interesting concept for a number of reasons, the obvious one being the empowerment of VNF providers.  Some of the less obvious reasons may end up being the most compelling.

The first is that it highlights why VIMs need to be generalized to IMs and decisively made a part of NFVI and not part of MANO.  Anyone should be able to create a seat at the table for their VNFs by providing a suitable VIM to deploy them and suitable NFVI to run them on.  HP, by offering NFV System, may make it harder for the industry to dodge these points.

Next, we are significantly behind the eight-ball with respect to both how MANO describes a desired service to a VIM and how NFV and NFV-legacy hybrid services are managed.  HP happens to have a good approach to both, and by making NFV System available they’re showing the world that there are sockets and plugs in NFV that just might not connect.  Competition might induce ETSI or OPNFV to clean things up, or induce other vendors to respond with proprietary approaches.  Either is better than a “wing and a prayer.”

Finally, this may well be the first example of somebody coming out with commercial NFV.  This is a product, designed to support specific goals and to be used by specific prospective customers.  It may not be needed in a lab trial or PoC, or even be critical in early field trials, but it does show that HP is taking NFV seriously.  Given that an optimum NFV implementation would be the largest source of new data centers globally, that gives HP a big advantage in itself.

HP collaterally announced enhancements to its NFV Director orchestration product that make a good NFV strategy even better.  It’s worth citing the release on this:

HP NFV Director 3.0 provides enhanced management and orchestration (MANO) capabilities that streamline bridging of NFV and telecommunications resources, as well as enable faster VNF on-boarding. It combines operations support systems (OSS) and IT management capabilities in a comprehensive, multi-vendor NFV orchestration solution that automate service and infrastructure management to enhance the flexibility and scalability of CSPs current OSS systems.

Bridging telecom and NFV resources means supporting both legacy and NFV-driven infrastructure, critical in making a transition to NFV and to supporting realistic future services that will always draw at least some services from traditional equipment.  Faster VNF onboarding means getting functional value to operators faster.  Combining OSS and IT management in a comprehensive multi-vendor orchestration solution is what efficient operations means.  Director aligns HP even more with the critical early benefit drivers for NFV deployment.

I don’t have the details on all the latest Director stuff at this stage, but what’s out is impressive.  It’s certainly something that other vendors need to be worrying about.  Standards don’t seem likely to create NFV success, and a thoughtful blog by Patrick Lopez published HERE suggests that OPNFV may not be the magic formula for NFV success either.  I suggested all last week that it might be time to let vendor innovation contribute the critical pieces that standards haven’t provided.  Maybe HP is doing that, which could be good for the industry and very good for HP.

What Will Cisco-Under-Robbins Be Like?

I remember a Cisco before John Chambers but I suspect most in the industry today do not.  For those people it might seem a frightening prospect.  For some who have been disappointed by Cisco’s seemingly lackluster support of new initiatives like SDN and NFV, it may seem like an opportunity.  Obviously we have to see what will happen under Chuck Robbins, the new CEO, but there’s a strong chance it will be business as usual.  If it is, then Cisco is making a big bet.

At a time when networking was about technology and strategy, and when network equipment vendors were usually run by technology-strategy people, Chambers brought in the sales side.  He was always a salesman first, even as CEO of Cisco.  One (now-ex) Cisco VP told me “John was the master of tactics over strategy.”  That this worked in the enterprise market is hardly surprising, but that it also worked in the long-cycle-planning-dominated carrier market is more so.  The two questions all Cisco-watchers will ask are “Is Robbins going to stay the Chambers course?” and “Will what worked for Cisco in the past work in the future?”

It’s interesting to contrast Chuck Robbins with a previous heir-apparent, Charlie Giancarlo (another “Charles”).  Charlie was an intellectual, a strategist, a thinker and a technologist.  Chuck has been a sales and channel guy in his past jobs with Cisco, a mover and shaker in those spaces but more a newcomer than some on the Street had expected.  The point is that on the surface Cisco has shied away from the strategists in favor of another tactician.  One might see that as a vote against change.

I think it’s a vote against radical change, but the jury is out on a broader implication.  There’s no reason for Cisco to rush to exit a game it’s winning.  Operators seem to be inclined to stay the course with respect to their switch/router vendors, a fact that’s benefitted Cisco and perhaps Juniper and Alcatel-Lucent as well.  Operationally and integration-wise, that’s the easiest course to take.  SDN and NFV could both ease the pain of a shift in vendor, product, or even architectural strategy down the line but we’d have to have those technologies in place to do that.  Cisco benefits from dragging its own feet, and even by actively suppressing motion to SDN and NFV where it can.

For now.  At some point, operators will have the tools they need to accomplish a shift in strategy, and the benefit case to drive the change on a broad basis.  One might be tempted to speculate that Cisco sees that point coming soon, and sees the need for Cisco to change faces to adapt to the new picture as a good time to change CEOs.  What better way to justify a shift in strategy?

Some of Chuck’s early comments about the “Internet of Everything” seem to be a specific commitment to the old Cisco, though.  Cisco’s primary marketing thesis for a decade has been “Traffic is going to increase, operators, so suck it up and buy Cisco stuff to carry it!”  The Internet of Everything is an example of that: so what if nothing that’s being done even today is profitable for operators?  There’s more coming.  If a new CEO fossilizes this positioning it’s going to be hard to embrace logic and reality down the line.

Of course it’s impossible for a new CEO to sweep the Chambers Legacy down the tube.  I wouldn’t either.  I’ve talked with John a number of times and he’s an impressive guy.  In a different frame of reference he’s like Steve Jobs (who I’ve also talked with).  He understands the buyer, so even if Cisco were prepared to dis the legacy of their own chieftain (which clearly they won’t do), you can’t abandon a “sell them what they want” value proposition.

Which gets us back to the question of whether what they want is durable.  Cisco or a number of other vendors could promote a vision of future networking in which the current paradigms would be largely preserved for perhaps as long as a decade.  The trick is to establish the principle that operations efficiencies can cure all ills.  There’s no way anyone could say that Cisco would win in a future driven by the quest for reduced capex, after all.

Which is what competitors should be thinking about.  If Cisco needs the opex horse to ride to victory on, then competitors need to saddle it for their own gains.  Remember the numbers from the sum of operator surveys I’ve done.  Twenty years ago about a third of TCO was opex, and by the 2020s it appears that two-thirds will be opex.  Worse, if we focus on “marginal infrastructure” meaning the stuff that’s already being purchased, the average TCO is already almost two-thirds opex.  By 2020 it will be three-quarters.  Worst of all, the high-level services or service features have the highest opex contribution to TCO.

Ironically, Cisco has all the right DNA to be a leader in service automation.  From the very early days of DevOps, they favored a declarative model rather than a script-based approach, for example.  It’s my view that it’s impossible to do effective and flexible service automation any other way.  Cisco’s current approach, which is a policy-based declarative model, isn’t how I’d approach the problem (nor is it how I have approached it in past NFV work) but it’s at least compatible with SDN and legacy infrastructure.  Its failure lies in the fact that it’s not really addressing operational orchestration.
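
The declarative-versus-script distinction is worth a quick illustration (a toy contrast, not Cisco’s actual implementation):

    # Imperative script: a fixed sequence of steps; recovery is manual.
    def imperative_deploy():
        for step in ["create_vm", "attach_vnf", "wire_tunnel"]:
            print("do", step)

    # Declarative model: state what should exist; a reconciler works out the
    # steps and can re-converge from any starting point.
    desired = {"vm": 1, "vnf": "firewall", "tunnel": "site-a<->site-b"}
    actual = {"vm": 1, "vnf": None, "tunnel": None}

    def reconcile(desired, actual):
        for key, want in desired.items():
            if actual.get(key) != want:
                print("converge", key, "->", want)
                actual[key] = want

    imperative_deploy()
    reconcile(desired, actual)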

Which is where I think the brass ring is.  If you think about service automation, you realize that a “service” is a kind of two-headed beast, one representing the technical fulfillment and the other representing the business process coordination.  While it may be possible to automate technical tasks, it will be difficult to do that effectively if you don’t collaterally automate the service processes that deliver everything to the user and manage billing, etc.  You have to collaterally virtualize management to manage virtualization, in any form.

That’s the other point here.  We have ignored operations for the cloud just as fervently as we’ve ignored it for NFV and SDN.  Cloud services can’t squander resources to minimize faults and still be profitable, any more than you can do that with networking.

So here’s the challenge that Chuck Robbins faces.  Can you take Cisco’s decent-but-disconnected-and-incomplete story on service automation and turn it into something that addresses SDN, NFV, and the cloud?  If so, and if you kind of hand this area over to the UCS people, you have a shot at being what Chambers wanted Cisco to be—the number one IT company—and also at attaining a top spot in the network space.  Otherwise, you’re delaying a reckoning that will only be harder to deal with down the line.

Service PaaS versus Opto-Electrical Layer: Which Leads to NFV Success?

It’s nice to have a sounding-board news trigger to launch a discussion from, and Oracle has obligingly provided me that with its Evolved Communications Application Server.  This is a product that I believe is driven by the same industry trends that Alcatel-Lucent’s Rapport is, and potentially could deliver services that could compete with Google’s Fi.  The new services could be valuable.  A competitive response to Google could be valuable.  But is this path going to create value enough for the future?  That’s the question, and particularly relevant given the Ciena/Cyan deal announced today.

Operators face a double challenge in preparing for networking’s new age.  The first challenge is to find new services, and to beat the OTTs in leading with them.  The second is validating hundreds of billions of dollars in sunk costs.  It’s easy to be an OTT.  It’s relatively easy to be an operator who offers OTT services.  It’s hard for operators to build next-gen services in a way that builds on their current investment and their current incumbencies.

Mobile and IMS are poster children for this.  One of the ideas behind IMS was that it could support new applications that would extend basic mobile services.  However, mobile service today means “getting on the Internet”, which is hardly an extension of IMS capabilities.  Operators have longed for a practical way to make IMS into an OTT competitor.  Nothing has worked.

Google’s Fi is a shot across the bow of operators.  Coming from the OTT side, using an MVNO relationship as the basis, Google wants to create value-add without the IMS legacy.  Their position is that you can use 3GPP standards to control handsets below the service layer.  That’s not operator-friendly.

Both Oracle’s new ECAS and Alcatel-Lucent’s Rapport build on IMS, Rapport because it includes it and ECAS because it builds on VoLTE, which depends on IMS.  It’s interesting that Oracle talks about ECAS in VoLTE terms rather than in IMS terms; it shows how far IMS has fallen strategically.

Yeah, but IMS is still necessary for 4G services, so the issue of how to extend new services to 4G users remains.  I like the idea of building a kind of “service PaaS” that operators could build to (or buy into) to extend their investment in mobile services.  The question is whether this will lead the operators somewhere useful.

Both ECAS and Rapport start with a “universal session” concept that allows calls and session services to be connected across WiFi or cellular and roam between the two (or among them).  Google Fi also has that.  Both ECAS and Rapport would, in my view, allow operators to build Fi-like competitive services and extend them.  All good so far.

The question that remains for both ECAS and Rapport is “what next”.   Both Alcatel-Lucent and Oracle point to the compatibility of their approaches with NFV.  To me that demonstrates that vendors and therefore the buyers think that NFV compatibility is a bridge to the future.  You can deploy elements of Rapport and ECAS with NFV-like orchestration.

The problem is that a “service PaaS” is still a PaaS, meaning that it’s a kind of silo.  If there are special features to be included in a service PaaS, even if they are deployed using NFV as VNFs, are these features open parts of an NFV architecture?  Would the operators’ investment in the platform be extensible to other services, particularly non-session services?

Maybe they would, but it seems likely that NFV would have to be extended explicitly to optimize the picture.  There’s absolutely no reason why NFV couldn’t deploy “platform services” that would become part of a service framework to be used by specialized applications.  What would be bad is if this framework created a different service lifecycle management process.  Operations has to be unified overall, across everything, or evolution to a new service infrastructure will be difficult.

The question may be critical for NFV because this year is the watershed for the concept in my view.  If we can’t promote an NFV vision that can prove ecosystemic benefits in a real field trial by year’s end, then operators may have to find another way to manage their profit-on-infrastructure squeeze.

The Ciena acquisition of Cyan might be a path to that.  You may recall that I’ve proposed the network of the future would be built on an agile opto-electrical foundation that would take over the heavy lifting in terms of aggregation and transport efficiency.  As this layer builds from optical pipes to path grooming at the electrical level, it could well cross a bunch of different technologies, ranging from fibers or wavelengths to tunnels and Ethernet.  Orchestration and operational integration would be very valuable, even essential, in harmonizing this structure.

But like our service PaaS, the vision of agile opto-electrical layers isn’t a slam dunk.  I’ve not been a fan of Cyan’s approach, considering it lacking in both scope of orchestration and integration of management and OSS/BSS.  It does have potential for the more limited mission of orchestrating the bottom layer.  To meet even that mission, however, there will have to be a significant improvement in operations integration.

Ciena had previously announced an NFV strategy, and their opening statement on the proposed Cyan deal is worth reading (from their press release):

Cyan offers SDN, NFV, and metro packet-optical solutions, which have built a strong customer base that is complementary to Ciena. Cyan also provides multi-vendor network and service orchestration and next-generation network management software with advanced visualization. When combined with Ciena’s Agility software portfolio, Cyan’s next-generation software and platforms enable greater monetization for network operators through more efficient utilization of network assets and faster time-to-market with differentiated and profitable services.

This seems a pretty clear statement that the deal is driven in no small way by SDN and NFV, and I do think that the companies could combine to present a strong opto-electric agility story for the metro.  Such a story might not offer new services to operators like the Alcatel-Lucent and Oracle visions of a Service PaaS, but it might offer cost management and it has the advantage of being targeted at the place where operator infrastructure investment would naturally be higher—metro.

To me, these positions raise the critical question of cost-leads or revenue-leads.  Any contribution of new revenue or competitive advantage at the feature level would have a profound positive impact on the NFV benefit case.  However, benefits are harder to develop and socialize when they’re secured way above the infrastructure capabilities and corporate culture of the operators.  Both Alcatel-Lucent and Oracle seem to be working to ease operators into a service-benefit-driven vision by making the connection with mobile and IMS.  They bring to the table a solid understanding of service PaaS and mobile.  Ciena and Cyan bring a solid understanding of optical.  We may now be in a race to see which camp can create a solid vision of operations, because that’s where the results of either approach will be tested most.

How We Get to Where SDN, NFV, and Carrier Cloud Have to Go

In my blog yesterday I talked about the need for something “above” SDN and NFV, and in the last two blogs about the need for an architecture to define the way that future cloud and NGN goals could be realized.  What I’d like to do to end this week is flesh out what both those things might mean.

Networking today is largely built on rigid infrastructure where service behaviors are implicit in the characteristics of the protocols.  That means that services are “low level” and that changes are difficult to make because it requires framing different protocol/device relationships.  Operators recognize that they need to evolve to support higher-level services and also that they need to make their forwarding processes more agile.  NFV and the cloud, and SDN (respectively) are aimed at doing that.

The challenge for operators at my “boundary” between technology and benefits has been in securing an alignment at reasonable levels of cost and risk.  We can define transformative principles but can’t implement them without a massive shift in network infrastructure.  We can contain risks, but only if we contain benefits and contaminate the whole reason for making the change in the first place.

What we need is an architecture for transformation that can be adopted incrementally.  We take steps as we want, but we remain assured that those steps are heading in the right direction.

Let me propose three principles that have to guide the cloud, NFV, and SDN in how they combine to create our next-gen network:

  1. The principle of everything-as-a-service. In the cloud and NGN of the future we will compose both applications and services from as-a-service elements.  Everything is virtualized and consumed through an API.
  2. The principle of explicit vertical integration in the network. OSI layers create an implicit stack with implicit features like connectivity.  In the future, all network layers will be integrated only explicitly and all features will be explicit as well.
  3. The principle of universal orchestration. All applications and services will be composed through the use of multi-level orchestration, orchestration that will organize functionality, commit resources, and compose management/operational behaviors.

You can marry these principles to a layered structure that will for all intents and purposes be an evolution of the cloud and is generally modeled on both SDN and NFV principles:

  1. The lowest layer is transport, hosting, and appliances: physical media like fiber and copper, plus servers and any “real” devices like data center switches. The sensor/control elements of the IoT, user access devices, and mobile devices also live here.  Most capex will be focused on this layer.
  2. The second layer is the connectivity, identity, security, and compliance layer which is responsible for information flows and element relationships. This layer will be built from overlay protocols (tunnels, if you like, or overlay SDN).  You can think of it as “virtual networking”.
  3. The third layer is the feature/component layer, where pieces of software functionality are presented (in as-a-service form) to be used by services and applications. This is where the VNFs of NFV, application components, and the product catalogs from which architects build stuff live.
  4. The top layer is the service and application composition layer which builds the features and applications users pay for.

If we combine these principles and our structural view, we can propose a framework for an NGN implementation.

First, network and compute resources at the bottom layer will inevitably be compartmentalized according to the technology they represent, the vendor who provided them, and perhaps even the primary service target.  That means that the implementation of this resource layer (which NFV calls the NFVI) has to be visualized as a set of “domains”, each represented by its own Infrastructure Manager.  That manager is responsible for creating a standard set of resource-facing (in TMF terms) services.  There will thus be many IMs, and unlike the model of the NFV ISG, the IM is part of infrastructure.

The resource services of these domains are “exported” upward, where they join the second element of the implementation, which is a universal model of services and applications.  The purpose of the model is to describe how to deploy, connect, and manage all of the elements of a service from the domain resource services, through intermediary building-blocks that represent useful “components”, up to the retail offerings.  The important thing about this model is that it’s all-enveloping.  The same rules for describing the assembly of low-level pieces apply to high-level pieces.  We describe everything.
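
A minimal sketch of such a recursive model, with invented element names, might look like this:

    # Sketch: one model grammar from retail service down to domain resource
    # services; the same decomposition rules apply at every level.
    from dataclasses import dataclass, field

    @dataclass
    class ModelElement:
        name: str
        deploy: str = "inherit"                      # how this element is realized
        children: list = field(default_factory=list)

        def decompose(self, depth=0):
            print("  " * depth + f"{self.name} [{self.deploy}]")
            for child in self.children:
                child.decompose(depth + 1)

    service = ModelElement("business-vpn", deploy="orchestrate", children=[
        ModelElement("connectivity", children=[
            ModelElement("fiber-path", deploy="domain:optical-IM"),
            ModelElement("overlay-vpn", deploy="domain:sdn-IM"),
        ]),
        ModelElement("firewall-feature", deploy="domain:cloud-IM"),
    ])
    service.decompose()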

One of the main goals of this model is to provide for service- and application-specific network connectivity that is built from a mixture of domain resource services (like fiber pipes) and virtual and completely controllable switching and routing.  Every application and service can have its own connectivity model or can share a model, so the scope of connectivity can be as refined or large as needed.  This layer is based on an SDN, virtual routing, and virtual switching model and I’d expect it would use an overlay-SDN protocol on top of traffic engineered paths and tunnels (used as resources from below).

Above this, we have a set of functions/components that can be harnessed to create that “higher-layer” stuff we always hear about.  Firewall, NAT, and other traditional NFV-service-chain stuff lives here, but so do the components of CRM, ERP, CDN, and everything else.  Some of these elements will be multi-tenant and long-lived (like DNS) and so will be “cloud-like”, while others will be customer-specific and transient and the sort of thing that NFV can deploy.  NFV’s value comes in what it can deploy, not what it does as service functionality (because it doesn’t have any).

Applications and services are at the top level.  These are assembled via the model from the lower components, and can live persistently or appear and disappear as needed.  The models that define them assemble not only the resources but also the management practices, so anything you model is managed using common processes.

Users of this structure, consumer or worker, are connected not to a service with an access pipe, but to a service agent.  Whether you have a DSL, cable, FiOS, mobile broadband, carrier Ethernet, or any other connection mechanism, you have an access pipe.  The things you access share that pipe, and the service agent (which could be in-band control protocol driven or management API driven) would help you pick models for things you want to obtain.

Universal orchestration orchestrates the universal model in this picture.  The purpose of the model is to take all current service-related tasks and frame them into a data-driven description which universal orchestration can then put into place.  Management tasks, operations processes, and everything related to the service lifecycle create components at that third layer, components orchestrated into services just as functional elements like firewalls or CRM tools would be.

I don’t know how farfetched this sounds, but I believe you could build this today.  I also think that there are four or five vendors who have enough of the parts that with operator “encouragement” they could do enough of the right thing to keep things moving.  Finally, I think that any test or trial we run on carrier cloud, SDN, or NFV that doesn’t explicitly lead to this kind of structure and cannot demonstrate the evolutionary path that gets there is taking operators and the industry in the wrong direction.