SDN and NFV: Beyond Serendipity

The concept of serendipity is dear to everyone.  You find a winning lottery ticket, dig up a pipe and find a stash of buried treasure, and (if you’re a network vendor) do a bunch of stupid stuff that somehow adds up to putting you in the right place at the right time.  Well, it’s nice to win the lottery but it would probably be a bad life choice to invest your retirement in it.  Being sensible, addressing real issues to realize real opportunities, is a better bet in the long run.  Sometimes that’s hard because “real issues” can be well-hidden, and that’s the case with SDN and NFV.

We’ve seen a number of purported SDN and NFV business cases published.  I don’t agree with the great majority of them, because they don’t address some of these real issues.  In many cases, they don’t address any of them, which means that the assumptions in the documents can’t be supported in the real world.  I don’t intend to criticize any specifically, but I do want to raise the hidden issues.  Look for these the next time you see SDN/NFV numbers.

The first hidden issue is the limited scope of application.  You can’t revolutionize infrastructure if the new technology can command only a couple percent of total capex.  Neither SDN nor NFV will displace optical transport, so their biggest impact is at Level 2 and above.  There, the largest investment today is in switching/routing, not in the higher-layer appliances.  If you can’t point to a path to broad deployment based on an SDN/NFV “use case” then you’re only proving that one specific service can be done differently.  That won’t change overall infrastructure costs or even (for most operators) the bottom-line profit per bit.

SDN could be used to create tunnels over optical transport, replacing both switching and routing with simple white boxes.  These tunnels could then be combined with NFV-deployed switch/router instances (probably mostly at the edge of the network) to create many services.  However, we aren’t postulating this combination because the big vendors don’t want to see switches and routers decline.  Without something like optical-and-SDN-tunnel-management to boost the number of places where SDN and NFV can be used, we’ll have a difficult time deploying enough of either to make a major difference.  This is why vCPE is potentially a dead end; if what you do is support add-ons to business Ethernet you don’t change enough overall capex and opex to matter.

The second of the hidden issues is operations costs for the resource pool.  If we host an instance of a function in a carrier cloud, we have to be able to maintain that cloud.  Everyone thinks that the cost of operations for virtual elements would be less than for appliances, or at least would be comparable.  Well, here’s the truth.

According to government statistics, network operators spend about 92% of their capital budgets on network equipment and only 8% on IT (servers, software, etc.).  That’s probably not a surprise to anyone given the fact that network operator IT is focused today on running OSS/BSS systems.  But here’s the interesting point; IT service operations costs are almost the same as network operations costs according to the operators themselves!  What would happen, given the fact that 8% of the tail is wagging a very large dog, if we were to shift 20% or 30% of capex to servers and software?
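To put rough numbers on that question, here’s a back-of-the-envelope sketch in Python.  The only inputs are the 92/8 capex split and the equal-opex claim above; the assumption that opex scales with the capex it supports is mine, purely for illustration.

```python
# A back-of-the-envelope sketch of the point above, not operator data.
# Assumption: opex in each domain scales in proportion to the capex it
# supports, so "equal IT and network opex" on an 8%/92% capex split implies
# IT opex per capex dollar is about 11.5 times the network's.

def total_opex(network_share, it_share, it_rate, network_rate=1.0):
    """Relative opex for a given capex split (network rate normalized to 1)."""
    return network_share * network_rate + it_share * it_rate

# Solve for the IT rate that makes today's IT opex equal network opex.
it_rate = 0.92 / 0.08                       # about 11.5x per capex dollar

baseline = total_opex(0.92, 0.08, it_rate)  # today's capex split
shifted = total_opex(0.67, 0.33, it_rate)   # roughly 25 points moved to IT

print(f"Relative opex today:           {baseline:.2f}")
print(f"Relative opex after the shift: {shifted:.2f} ({shifted / baseline:.1f}x)")
# Roughly a 2.4x rise in opex unless IT operations efficiency improves,
# which is exactly why service automation matters.
```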

What this shows is that we cannot presume that moving something to a hosted model would even sustain current operations costs.  The numbers say we’re moving from a fairly opex-efficient model (the network) into a far less efficient model—the data center.  It shows that one of the major requirements for the success of SDN or NFV is to manage the IT operations costs of the server resources being deployed.  Service automation is absolutely critical for the success of SDN or NFV, and anything that doesn’t address that is just blowing smoke.  But because service automation is “above” SDN and NFV in the OSS/BSS layer, it gets ignored in every single business case.

The third hidden issue is resource pool economy of scale and efficiency.  Hosting functions means having a cloud/virtualized resource pool, and that raises two questions.  First, where are these pools located, and second how economical are they?  The challenge is that these two factors are interdependent.

Logically speaking, if we assumed that every service was going to have a hosted element to it, we’d expect the hosted elements to be located proximate to the offices where the service access connections are terminated.  In the US, we have about ten thousand such sites.  The problem is obvious; if credible SDN and NFV deployment means establishing local resources in each of these central offices, we’d need to deploy ten thousand new data centers to get into the game.  Nobody believes that can happen, so the presumption is that we’d start perhaps with a metro data center or two and then grow with opportunity.

This happy picture is at risk because of what’s called “hairpinning”.  If we don’t have resources close to where we need them—at the service edge—then we have to haul traffic to a centralized point for efficient hosting and then return it to its normal path.  Imagine a service chain that runs from a user fifty miles to a data center for hosting, then fifty miles back to the user, when the two “user” points might themselves be just a couple of miles apart.  We’re wasting network bit-miles with this centralization, and worse, we’re increasing operations risks and costs because we’re transiting more network devices.
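A toy calculation makes the bit-mile point concrete; the fifty-mile and two-mile distances are the hypothetical ones above, and the traffic figure is an arbitrary placeholder.

```python
# A toy illustration of the hairpinning penalty; the distances are the
# hypothetical ones in the text, and the traffic figure is arbitrary.

direct_miles = 2            # the two "user" points a couple of miles apart
hairpin_miles = 50 + 50     # out to a metro data center and back again

traffic_gbps = 1.0          # assumed service traffic, purely illustrative

direct_bit_miles = traffic_gbps * direct_miles
hairpin_bit_miles = traffic_gbps * hairpin_miles

print(f"Bit-mile multiplier from centralized hosting: "
      f"{hairpin_bit_miles / direct_bit_miles:.0f}x")
# Fifty times the transport burden, and every extra device transited along
# the way adds operations cost and risk.
```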

This problem is why so many early SDN and NFV applications are based on natural concentration of hosting points.  SDN is often a data center network technology, which means it’s local to a hosting point.  NFV’s top application is vCPE, which hosts the function on the customer premises or at the carrier edge in a network device.  But you still need server blades or cards inside whatever you plan to use to host software, and you have now turned your edge devices into little servers.  Remember that IT server operations are more expensive than network operations?  How much of that expense will you now incur?  Try to find someone who accounts for that.

The final hidden issue is the hypothetical revenue gains to be expected.  I’ve seen all kinds of numbers on the extravagant new revenue opportunity associated with agile services and how the new features and faster deployment will enable operators to reduce customer acquisition and retention costs.  The challenge is that none of our current data supports the assumptions.

One problem here is that accelerating the time to deploy something only increases revenue for those customers who are newly deploying.  If I have ten thousand customers with service already, how many of those will accept an interruption in service while I replace features like firewalls with virtual ones?  How many of these firewalls-in-place were owned by the customer and are working fine?  We assume that if we’re talking about ten thousand customers and a two-week enhancement in provisioning time, we’re talking about a total of twenty thousand weeks of new revenue.  We might, in a static customer environment, be talking about no additional revenue at all.
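Here’s that arithmetic as a trivial sketch, using the hypothetical figures above; the point is that the multiplier is the number of newly provisioning customers, not the installed base.

```python
# A sketch of the provisioning-acceleration arithmetic; all inputs are the
# hypothetical values from the text.

def accelerated_revenue_weeks(newly_provisioning_customers, weeks_saved):
    """Extra billed weeks gained by faster provisioning."""
    return newly_provisioning_customers * weeks_saved

# The optimistic claim treats all 10,000 customers as if they were new.
print(accelerated_revenue_weeks(10_000, 2))   # 20,000 weeks of "new" revenue

# A static customer base has nobody newly provisioning.
print(accelerated_revenue_weeks(0, 2))        # 0 weeks; no additional revenue
```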

Another problem is the notion that this is all going to help operators compete with OTTs.  Well, most OTT services aren’t candidates for SDN or NFV deployment and we know that because the OTTs that sell them (successfully, I might add) don’t use SDN or NFV.  And here’s another point.  If we have created a model of managed service sale where CPE hosts functions sold through a central portal, doesn’t that look exactly like what an OTT might want to provide?  A shift to this model might well do as much or more for OTT revenues as for operator revenues.

I want to say here what I’ve said before, which is that I firmly believe that both SDN and NFV can be justified and that the potential impact on carrier infrastructure, services, and costs, can be profound.  But I also want to say that shallow business cases that rely on unproven assumptions and that ignore real issues will never get us there.  We have six vendors who can make an NFV business case, for example, and yet even those vendors are prone to making business-case statements that are simply not valid.  The problem seems to be that everyone wants to make their SDN or NFV numbers right now, in this quarter, and the fact is that there are too many real issues to be faced to make that happen.

What does SDN or NFV success look like, do you think?  Would you say the technologies are successful if they make up 2% or less of capex?  I don’t think many would call that “success” and yet that’s exactly what will happen if we don’t face these hidden issues, and probably another half-dozen that are more subtle and perhaps have narrower impact.  I think all that stands in the way of that is recognition that we really do have to make business sense out of this, because I think we already have technology from at least six vendors that could be harnessed to do just that.

Are you an SDN or NFV hopeful?  If so, then you need to decide whether you’re going to bet your long-term success on playing the lottery or invest in the future.  I’d strongly suggest that those who can address the full SDN/NFV business case do so quickly to make their own offerings compelling.  Wishing, as the saying goes, won’t make it so.

What Three Earnings Reports Show Us About Tech Progress

We are starting to see earnings reports from technology giants and the results so far aren’t particularly pretty.  We are what we sell, in this industry as in all industries, and so it’s important to look at the trends reflected in quarterly reports to see what’s really happening in our industry.  Today I want to focus on three specific firms—Apple, Ericsson, and IBM.

Apple reported its slowest growth in iPhone sales since the product was announced, and revenues were below Street consensus based on that and on slower-than-expected growth in iPad and Mac systems as well.  This was surprising and frightening to those who believed that Apple would either find or make the Next Big Thing and ride each of its innovation waves perpetually.  That, of course, was never realistic.  It’s always been a question of when the model would run out of steam.  Will Apple, whose Watch sales are disappointing, now move to TVs or cars or maybe both?  Perhaps, but that’s not the issue I want to comment on here.

Software is where innovation now is in the consumer space.  New technologies at the device level are important only to the extent that they open new software options.  If that’s true for phones, as I think Apple is showing it is, then we are on a direct path toward the personal-agent and cloud-hosted-feature model that I’ve talked about before.  It makes no sense to ask consumers to pay more for a phone, and to pay for more traffic (or to load networks with it, reducing performance) just to drag data to the device for cogitation, when you can do all that stuff in the cloud.

IBM is showing something similar in the enterprise space.  The company reported software revenues below consensus and hardware revenues above, which is pretty much exactly what they didn’t want to see.  They do have a bright spot in their strategic target areas like the cloud and analytics, but it’s clear that these areas won’t sustain growth for IBM if they can’t either turn around their core business or sell it off.

The challenge IBM faces is that of transitioning from mining key accounts to evangelizing a totally new market.  While this is a simplistic example, those who sell to the Fortune 500 have only 500 prospects.  What is happening in computing is a shift from a central-batch model that’s sustained us (with modifications) for fifty years to a more personalized and distributed model.  IBM’s deal with Apple has helped it grow in the mobile space, but they haven’t yet considered the fact that if the end-game is to empower workers then you need to aim at approaches that empower all workers, not just those in big current IBM accounts.

Then there’s Ericsson.  They didn’t report bad numbers, but they didn’t show investors any sign of a fundamental change in business strategy and most on the Street agree that one is needed.  Ericsson has relied more on professional services and integration than on having a leadership position in key product areas like L2/L3.  That makes them vulnerable to players like Huawei (the price leader) and Nokia (who now has Alcatel-Lucent’s Ethernet/IP portfolio to add to their own operations/integration skills).

Ericsson’s big vulnerability lies in their lack of product mass.  While it’s possible to win deals on integration alone, it’s harder.  The company that stands to gain the most in total sales will be able to spend more resources on getting the deals, and it will also have a higher level of credibility with buyers.  The Street view, which I think is correct, is that Ericsson is now fighting two major rivals in a space that won’t hold more than one leader.  Ericsson is hoping to address its product insufficiency with its deal with Cisco, but Cisco is hardly dedicated to network change and it may be an uneasy relationship.

The dominant common issue here is one of innovation.  All of these vendors are seeing their primary markets commoditize because new and useful features are simply not being added.  Whether you’re talking about a consumer or a giant network operator, your posture on future technology purchases is set by whether there are new benefits that could be harnessed to justify incremental spending.

Incumbents avoid transformational features because it might transform their incumbency out of existence.  Vendors in general fear that transformation opens up enormously long selling cycles that by themselves would defer buying, and bigger vendors fear that their efforts to educate the market will open the door for other competitors who will simply wait for big-vendor education to do its job.  The bigger the transformation, the greater all these risks seem to be.  For all three of these vendors, a big transformation seems to be the only solution.

The second challenge that all these firms face is openness.  Apple has already spawned open-source (Android) competition, and in the cloud everything’s a microservice so brand loyalty is hard to achieve without skirting anti-trust.  IBM and Ericsson both have to contend with open hardware initiatives (the telcos just joined the Open Compute group) and everyone seems to be backing open-source for the cloud, SDN, and NFV.  I think that the initial attractiveness of openness developed out of vendors’ reluctance to accept declining prices as the inevitable consequence of static feature sets.  Open source and open computing promise to reduce or eliminate hardware and software costs, which is important if cost management is the only improvement you can make.

Open approaches erode the already-limited appetite of vendors for market education, which means that a technology revolution “led” by open technology might never go anywhere because buyers don’t know what to do with the new technology options.  Third-party resources like publications, once the most trusted source on new technology, are now largely ad sponsored and thus not to be trusted at all.  Same for analyst firms.  No wonder that buyers consider “experience of a trusted peer” to be the only validation one can accept from the outside.  But where do experienced trusted peers come from in the early days of a technology?

Apple, Ericsson, and IBM can be forgiven for not running rampant into the brave new world, but they may now have reached the point where the risk of standing still is greater.  IBM in particular has let its brand erode significantly, and you can see in media and Street comments that many Apple fans are frantically looking for good news to counter what might be a slip in Apple’s fortune large enough to raise questions about its future.  Ericsson, already under pressure because of a decision to seek a services-driven revenue model, now faces formidable competitors who have services and products.  So what do these companies, and their peers, have to do?

For Apple, the answer is to build a cloud business model.  I’ve criticized Apple for having such a pedestrian cloud story for two years now, and we may be reaching the point where the omission becomes critical.  Network operators and some competitors (Mozilla, for example) have tried to move to a “thin appliance” phone model, but the efforts have failed because the goal was to do nothing more than replicate current features.  There has to be a Silver Bullet to kill off the old notions, a benefit that justifies the change to the consumer.  I think IoT and other forces are already supplying examples that might be sufficient, and Apple needs to take action before somebody like Amazon or Google grabs the opportunity.

For Ericsson, the obvious answer is to jump out in front of the new-age network transformation movement.  SDN and NFV, taken alone, are not only not going to be enough to move the ball, they’re already in the hands of competitors.  Ericsson has OSS/BSS, and they could easily build that full-range-of-benefits story that operators want to hear.  Look at ONOS, CORD, and XOS, Ericsson.  They are a higher-layer vision of what we know operators want to do, because some (like Verizon) are already starting to do it.  Above all, you cannot let Huawei lead in the practical application of these technologies, and there’s a strong indication that Huawei is moving to do just that.

For IBM, you have to stand for business transformation in the most radical sense.  The business of the future will surely use the cloud, and analytics, and workflow, and mobility.  Everyone knows that and supports all these initiatives—independently.  What IBM has to do is support them collectively, to build the framework of the business of the future by starting with what that business will use technology for, at the highest level.  That means stellar marketing more than anything else, and an association of “IBM” as a brand with the new wave of worker productivity.

We’ve had productivity waves in the past, since the dawn of computing in fact, but none since 2000.  This is the longest period we’ve ever gone without a major new benefit to drive change.  It’s time for vendors to realize their golden ages all depend on benefit augmentation on a periodic basis.  Otherwise they’re all fighting for scraps, which isn’t pretty.

Will OSS/BSS Love or OSS/BSS Hate Win?

SDxCentral cites an important truth in an article and report (the latter from the MEF and the Rayno Report), which is that there’s a lot of dissatisfaction with OSS/BSS out there.  It’s more complicated than that, though.  I mentioned in a prior blog my own experience at a major Tier One, where I got an “I want to preserve the OSS/BSS” and an “I want to trash OSS/BSS and start over” comment set from two people sitting next to each other.  Operations people (the CIO and reports) are largely in the former group and CTO and marketing teams in the latter, to no surprise.  What may be surprising is that it’s not clear which approach would really be best.

Nobody disagrees that virtualization has put pressure on existing systems, for reasons I’ve gone over in detail in prior blogs.  The heart of the problem is that we’re substituting networks built in two steps for networks built in one.  Where in the past we’d coerce services from cooperating devices, the virtual network of the future has to first deploy functions and then link them into services.  The essential question is how interdependent the steps in our new two-step process have to be.

What I’ve called the “virtual device model” says that the two steps can be independent.  You build network services as before, by coercing service behaviors from functional communities.  The fact that those functional communities are themselves built is taken care of outside the realm of OSS/BSS and even NMS.  Why do network management on server farms, after all?

Nearly all the OSS/BSS deployed today supports the virtual-device model.  Services are provisioned and changed through management interfaces exposed by network hardware.  These same interfaces can report faults or congestion, so any commercial impact (SLA violations etc.) can be addressed this way.  OSS/BSS vendors make a nice piece of change selling adapters that link their products to the underlying network equipment.

What the report/article calls “lifecycle service orchestration” or LSO is what I’ve been calling “service lifecycle management” or SLM.  Whatever term you like, the goal is to automate the deployment of functions and their connection into something, which might be a virtual device or a service that is itself presented through a management API.  Since the deployment of hosted features, the connecting of these pieces, and the threading of user connections to the result is a complicated multi-stage process, the term “orchestration” is a good way to describe it.

Where I think “LSO” and “SLM” diverge is actually the place where the virtual device model becomes unwieldy.  I can deploy something through orchestration, and I can even change its configuration, but I still have to address the question of just what’s happening to the functionality of the service itself.  How do we relate conditions in a server resource pool to conditions at the service level?  How do operations practices like the customer service rep (CSR) gain access to the state of the service, or how does a portal relate it to the customer?  Functionally my service looks like discrete boxes (routers, firewalls, etc.) but it isn’t.

The virtual device model says that when I deploy virtual functionality I deploy or commit collateral management processes whose responsibility it is to translate between the “functional management” state that the virtual device has to present upward toward the user, and the “structural management” of real resources that anyone who wants to really see what’s going on or to fix it necessarily has to be able to see.  The biggest problem with this is that every deployed function has to obtain enough information on its resources to be able to represent its state.  That poses risks in overwhelming the real management interfaces of the resource pool, and also risks that service functions with access to collective resource management interfaces could (deliberately or by accident) do something broadly destructive.
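As a rough illustration of what such a collateral management process has to do (this is a sketch of my description, not any vendor’s implementation; the resource names, metrics, and thresholds are hypothetical):

```python
# A sketch of the virtual-device model's "collateral" management process:
# it reads structural (resource) state and derives the functional state the
# virtual device presents upward.  Resource names, metrics, and thresholds
# are hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass
class ResourceStatus:
    name: str
    up: bool
    cpu_load: float              # 0.0 to 1.0

class VirtualDeviceManager:
    def __init__(self, resources):
        self.resources = resources   # VMs, vNICs, chain links, and so on

    def functional_state(self):
        """Translate structural state into the device's outward-facing state."""
        if any(not r.up for r in self.resources):
            return "FAILED"
        if any(r.cpu_load > 0.9 for r in self.resources):
            return "DEGRADED"        # SLA at risk, but still forwarding
        return "OPERATIONAL"

# Every deployed function needs something like this, and each one has to poll
# the shared resource pool, which is the scaling and security risk noted above.
vfw = VirtualDeviceManager([ResourceStatus("vm-7", True, 0.95),
                            ResourceStatus("vnic-3", True, 0.20)])
print(vfw.functional_state())        # DEGRADED
```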

Another less obvious risk is that we’ve made lifecycle management a two-level process whose levels are not really visible to each other.  We might be recovering something down inside a virtual device when the best approach would be to reroute to a different function that’s already deployed elsewhere.  If we make a routing decision at a high level, we might really want to instantiate new processes inside virtual devices.  How does this knowledge get passed between independent lifecycle management processes, independent orchestrations?

There’s a really subtle risk here, too, which is that of promiscuous exchanges of status between components of virtual devices, effectively a kind of back-channel.  If a vendor builds their own management tools, could those tools communicate across the standard flow of state/event information?  Could we dip down from a functional level to a resource level?  If any of that happens, we’re exposing implementation details upward into what’s supposed to be a function-driven process, and that means other solutions can’t be freely substituted.

From an OSS/BSS perspective, the real problem is that it’s very likely that every vendor who offers a virtual function will offer a different lifecycle management strategy for it.  There is then no consistency in how a virtual device looks as it’s being commissioned and as it evolves through its operating life.  There may be no consistency in terms of what a given “functional command” to the virtual device would do in terms of changing the state of the device.  Thus, it’s very difficult to do effective customer support or drive the operations tasks associated with the commercial deployment of a service.

The alternative to the virtual device model is the managed model approach.  With this approach, a service is built as a hierarchical set of functions that start with basic resources and grow upward to commercial visibility.  Each level of the hierarchy can assert management variables whose value is derived from what it expects its components to be.  If you say that 1) each function-model can send stuff only to its direct neighbors and 2) each function-model can define its own local state and event-handling, then you can build a service from functions, not devices, and the only place where you have to worry about mapping to real-world stuff is at the very bottom, where “nuclear functions” have to be deployed.
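Here’s a minimal sketch of that hierarchy under the two rules just stated; the function names and states are illustrative, not drawn from any standard.

```python
# A minimal sketch of the managed-model hierarchy described above.
# Rule 1: a function-model exchanges information only with its direct
#         neighbors (parent and children).
# Rule 2: each function-model defines its own local state and event handling.
# Names and states are illustrative, not drawn from any standard.

class FunctionModel:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.parent = None
        self.state = "ORDERED"
        for child in self.children:
            child.parent = self

    def handle_event(self, event):
        """Local state/event handling; only direct neighbors are ever told."""
        if event == "deploy":
            for child in self.children:
                child.handle_event("deploy")      # downward, one level at a time
            self.state = "ACTIVE"
        elif event == "child_fault":
            self.state = "DEGRADED"
            if self.parent:
                self.parent.handle_event("child_fault")   # upward, one level only

# Only the bottom-level "nuclear functions" ever touch real resource MIBs.
service = FunctionModel("vpn-service",
                        [FunctionModel("access", [FunctionModel("vOLT")]),
                         FunctionModel("core-routing")])
service.handle_event("deploy")
print(service.state)   # ACTIVE
```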

The difference in these approaches is that in the virtual device approach, we secure lifecycle management by deploying lifecycle management processes as part of the features.  Nothing will ever interwork properly that way.  The second approach says that you define lifecycle management steps in a hierarchical model that builds upward from the resources to manageable functions, all under model control and without specialized processes.  If you want to substitute a feature, the only thing that has to be controlled is the initial mapping of the resource MIBs to the lowest-level functional model, the level that represents the behavior of that resource.

This relates to the OSS/BSS evolution in that with the virtual device model, you really can’t make the OSS/BSS evolve because it doesn’t know anything about the network evolution.  This satisfies the “leave-my-OSS/BSS-alone” school because there’s no other choice.  With the “virtual function” model, you can take the entire suite of operations processes (defined in the TMF’s enhanced Telecommunications Operations Map or eTOM) and make them into microservices run at the state/event intersection of each function in the service model.  You can build a data model to totally define service automation, in other words.

There is definitely a split in opinion on the role that OSS/BSS should play in the future.  The biggest barrier to OSS-centric visions for orchestration is the feeling that the OSS/BSS needs a face-lift, new technologies like SDN and NFV notwithstanding.  The biggest positive for the OSS/BSS vendor community is that there doesn’t seem to be an understanding of how you’d “modernize” an OSS/BSS in the first place.

The TMF outlined the very model-coupled-event approach I’ve outlined here, about eight years ago, in the NGOSS Contract/GB942 work.  They didn’t even present that approach to the ISG in the recent meeting of SDOs to harmonize on service modeling.  That may be the big problem facing the OSS/BSS community.  If they can’t adopt what was the seminal standard on event-based service automation, what would it take to move the ball?  But unless some vendor steps up to implement that second approach, they’ll be safe.

What We Can Learn From the NFV ISG Modeling Symposium’s Presentations

The ETSI NFV ISG has made a public release of the presentations made in the NFV modeling and information model workshop held recently and hosted by CableLabs.  I’ve referenced some aspects of this meeting in past blogs, but the release of the presentation material supports a look at the way the modeling issue is developing, and perhaps supports some conclusions on just what various standards development organizations (SDOs) are proposing and what the best approach might be.

Happy convergence doesn’t seem to be one of the options.  When you look at the material it’s hard to miss the fact that all of the bodies represented have taken slightly or significantly different approaches to “modeling” of service information.  Where there are explicit connections among the models (the TMF, ITU, MEF, and ONF models for example), the symbiosis seems to be an opportunistic way of addressing all of the issues of services through referencing someone else rather than by hammering out a cooperative approach.   The general picture we have is one of multiple approaches that in some cases deal with different aspects of the overall service modeling problems, and in other ways deal with the same aspects but differently.

Top-down service-oriented modeling also seems a casualty in all the SDO proposals.  I think all are essentially bottom-up in their approach, and that most if not all seem linked with a “componentized” view of service automation, one where there are a small number of software processes that act on model data to produce a result.  This is what the ETSI architecture reflects; we have MANO and VNFM and VIMs and so forth.  Only the OASIS TOSCA presentation seems to break free of these limitations; more on that later.

We’re not replete with broad-based models either.  Many of the modeling approaches are quite narrow; they focus on the specific application of vCPE and service chaining rather than on the broader issue of making a business case for an NFV-driven transformation.  Some of this bias is justified; service chaining presents a different model of component connectivity than we see in applications in the cloud (as I said in a prior blog).  However, there is obviously a risk in focusing on a single application for NFV, particularly one that could well end up being supported by edge hosting of virtual functions, which would make some of the issues moot.

What this adds up to is that we have not only the problem of synchronizing the SDO modeling equivalent of Franklin’s 13 clocks, we have our clocks in different time zones.  If a harmonious model for NFV services, or even for service modeling at a higher level, is the goal then I don’t think there’s much chance of achieving it by rationalizing the viewpoints that were represented by the presentations.  Many of the bodies have long histories with their own approach, and even more have specific limited missions that only overlap the broad service-modeling problem here and there rather than resolving it in a broad sense.  My conclusion is that we are not going to have a unified model of services, period, so the question is what we should or could have.

The core issue the material raises, in terms of the quality of a model rather than the probability of harmonizing on one, is the role of the model itself.  As I said, most of the material presents the modeling as an input to a specific process set.  This means that “operations”, “network management”, “data center management”, and “NFV orchestration and management” are all best visualized as functional boxes with relationships/interfaces to each other, defined by models.  The question is whether this approach is ideal.

A “service” in a modernizing network is a complex set of interrelated features, hosted on an exploding set of resources.  The functional building blocks of service processes we have today were built around device-centric practices of the past.  In a virtual world, we could certainly sustain these practices by building “virtual devices” and then mapping them into our current management and operations models.  That, in my view, is what the great majority of the presentations made in the CableLabs session propose to do.  The models are simply ways of communicating among those processes.  This approach also tends to reduce the impact of SDN or NFV on surrounding management and operations systems, particularly the latter, and that simplification is helpful to vendors who don’t want to address that aspect of infrastructure modernization.  However, there are significant risks associated with managing what are now non-existent devices.

TOSCA, in my view, and my own CloudNFV and ExperiaSphere approaches, would either permit or mandate a different model-to-process relationship, where processes are driven by the models and the models act as a conduit to link events to the processes.  The OASIS TOSCA presentation was the only one that I think addressed the specific point of lifecycle management and thus could argue it would allow automation of service operations and management processes.  The model-steering of events was the innovation of the TMF’s NGOSS Contract and GB942 approach, but interestingly the TMF itself didn’t present any reference to that approach in their submission to the sessions CableLabs hosted.  That’s a shame, because with a state/event approach all operations and management processes are converted to microservices invoked by events and mediated by service element states.  Thus, we break the static boundaries of operations and management “applications.”  That lets us virtualize management as easily as we virtualize functions.
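To show what model-steered events look like in practice, here’s a minimal sketch in the NGOSS-Contract spirit described above; the states, events, and process names are hypothetical, not taken from the TMF or OASIS material.

```python
# A minimal sketch of model-steered events: each service model element
# carries a state/event table that names the operations "microservice" to
# run.  States, events, and process names here are hypothetical.

STATE_EVENT_TABLE = {
    ("Deploying", "DeployComplete"): "activate_billing",
    ("Active", "ResourceFault"):     "initiate_remediation",
    ("Active", "CustomerCancel"):    "tear_down_and_settle",
}

PROCESSES = {
    "activate_billing":     lambda: "billing record opened",
    "initiate_remediation": lambda: "redeploying the faulted function",
    "tear_down_and_settle": lambda: "service released and settled",
}

def dispatch(element_state, event):
    """Invoke whatever process the model binds to this state/event pair."""
    handler = STATE_EVENT_TABLE.get((element_state, event))
    if handler is None:
        return "event not meaningful in this state"
    return PROCESSES[handler]()

print(dispatch("Active", "ResourceFault"))
# Operations and management logic becomes a pool of microservices selected by
# the model, not a fixed OSS/BSS application boundary.
```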

What makes this notion different is that operations/management microservices can then be invoked for any service/resource event at any point, directly from the model.  What you model, you manage.  A service model then defines everything associated with the deployment and management of a service; there is no need for other descriptors.  This isn’t a philosophical point.  The fundamental question is whether virtualization, which builds services from “functions” that are then assigned to resources, can be managed without managing the functions explicitly.  The virtual device model combines functions and their resources, but there are obvious limitations to that approach given that the implementation of a function would look nothing like the device from which the function was derived.  How do you reflect that in a virtual-device approach?

I think that it would be possible to nest different models below service-layer models, and for sure it would be possible to use network-centric modeling like YANG/NETCONF “below” the OpenStack or VIM layer in NFV.  Where systems of devices that already define their own rules of cooperation exist (as they do in all legacy networks) there’s nothing wrong with reflecting those rules in a specialized way.  I don’t see that point called out in the material explicitly, though it does seem an outcome you could draw from some of the hierarchies of standards and models that were presented.

The first milestone defined by the “CableLabs Concord”, which is just having other SDOs get back to the ETSI ISG with proposals, isn’t until March.  Significant progress, as I noted in an earlier blog, is expected by the end of this year, but that doesn’t mean that any resolution will be reached then, or even that any is possible.  Frankly, I don’t think it is.

What all this says to me is that we have a dozen chefs making dishes without a common recipe or even a common understanding of the ingredients to be used.  There are three possibilities that I see for harmonization.  One is to simply accept that TOSCA is the only model that can really accommodate all the challenges of service modeling, including and especially the event-to-process steering.  TOSCA, after all, is rooted in the cloud, which is where virtualization, SDN, and NFV all need to be rooted.  The second is to let vendors, each likely championing their own approach loosely based on some SDO offering or another, fight it out and pick the best of the remaining players after markets have winnowed things down.

What about number three?  Well, that would be vendor sponsorship of TOSCA.  A number of vendors have made a commitment to evolve to a TOSCA approach, including Ciena and HPE whose commitments to TOSCA are most public.  I cited a university-sponsored project to use TOSCA to define network services in my ExperiaSphere material, and there are a number of open-source TOSCA tools available now.  However, I think OASIS’ presentation softballed their benefits a bit, and I’m not sure whether—absent market recognition of TOSCA’s value—any vendor would bet a lot of positioning collateral on TOSCA.  They should, and if they did I think they could take the lead in the service-modeling race, and in the race to make the SDN and NFV business case.

Could ONOS Be the Right Way Forward for SDN and NFV?

Most of those who follow topics in SDN and NFV will recognize the ONOS or Open Network Operating System project.  It’s been covered regularly in the media, but much of the coverage seems to present it as a kind of alternative OPNFV.  It’s true that ONOS and OPNFV, as projects, could be considered to have overlapping goals, but if you look at the two side by side, you see profound differences.  Those differences may be critical to the evolution of next-gen network technologies—both SDN and NFV.

ONOS, as the name suggests, is an operating system, a hardened version of Linux with middleware added but designed with the goal of standardizing white-box software switching.  In this respect it is very similar to OPNFV, which has started (at the bottom or infrastructure level) with its own OS approach.  Where the difference lies isn’t in these OSs, but in the framework in which they’re evolving.

OPNFV is in many ways an explicit satellite of the ETSI NFV efforts, and ONOS is aimed at a broader mission.  To quote their website: “The Open Network Operating System (ONOS) is bringing the promise of software defined networking (SDN) to service providers.”  The details make it clear that ONOS generalizes the term “SDN” to mean all software-based virtualization of networking.  If you want to understand the value of ONOS, you need to skip the low-level OS details and look at two projects within the group, CORD and XOS.

CORD stands for “Central Office Rearchitected as a Data Center”.  That’s important because operators build central offices and because CORD is really a higher-level goal and not just a technology, like SDN or NFV.  The details are doubly important because, as the CORD presentation points out, “CORD=SDN+NFV+Cloud”, which means that CORD unites the technology revolutions of our time.  It’s never been possible for operators to visualize a next-gen network as the somehow-combination of three revolutions; CORD makes it one revolution.

Hardware-wise, CORD is a set of server resources combined with white-box switches, connecting to GPON on the access side and ROADM on the trunk side.  The “virtual functions” of CORD include NFV’s VNFs but also white-box virtual-features created through central control of forwarding.  OpenStack is presumed to deploy VNFs and ONOS (from which the name of the body is taken) is the switch operating system.

Toward the customer CORD establishes Access-as-a-Service (ACCaaS) through a series of subscriber VLANs under central access control that creates virtual line terminations (vOLT).  Virtual Ethernet connections are provided, but switching is not directly virtualized but rather is a service of the central white-box SDN array.  CPE is logically a part of the access side, of course, but “CPE” here means the real and necessary service termination; vCPE functions would be hosted in the server farm of CORD.

This structure, as I’ve noted, is designed around a GPON access network, but it would be applicable to cable and even DSL technology as well; the key point is that access is a collective service of a communal infrastructure under central control.  Similarly, the trunk side is presumed to be ROADM but in theory is abstracted as a BNG (broadband network gateway) and so could represent any suitable technology.

What sits on top of both OpenStack and ONOS is XOS, which stands for, apparently, “anything-as-a-service operating system”.  XOS can be visualized as a manager or orchestrator, but first and foremost it’s an abstraction of SDN, NFV, and the cloud.  XOS is to services what CORD is to infrastructure, so the two combine to create the “network” of the future.  Services are the highest level of abstraction, and are made up of multiple Slices, each of which is a set (zero to some number) of VMs hosted in CORD and VNs (virtual networks) which are part of the SDN fabric of CORD.  VMs are committed through OpenStack and VNs through a Neutron-like interface to ONOS.

Services are hierarchical, meaning that a retail service has Tenant Services, and this lets you model complex service structures made up of several high-level components (what the TMF would call “Customer-Facing Services” or CFSs).  XOS manages the data plane connections needed to stitch Tenant Services together, but also the control/management connections.  Each Service is represented by a Service Controller through which its functionality can be accessed.  XOS takes its approach from UNIX/LINUX, meaning it’s very software-centric; a nice touch given that all this stuff is software.
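For those who think better in code, here’s a rough data-structure sketch of the XOS abstractions as I’ve described them (Service, Slice, VM, VN); the field and class names are mine, not the XOS API.

```python
# A rough data-structure sketch of the XOS abstractions described above
# (Service, Slice, VM, VN); field and class names are mine, not the XOS API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class VirtualNetwork:        # a VN carved from CORD's SDN fabric (via ONOS)
    vn_id: str

@dataclass
class VirtualMachine:        # a VM committed through OpenStack
    vm_id: str

@dataclass
class Slice:                 # zero or more VMs plus their virtual networks
    name: str
    vms: List[VirtualMachine] = field(default_factory=list)
    vns: List[VirtualNetwork] = field(default_factory=list)

@dataclass
class Service:               # the highest-level abstraction; may nest tenants
    name: str
    slices: List[Slice] = field(default_factory=list)
    tenant_services: List["Service"] = field(default_factory=list)
    controller_url: str = "" # the Service Controller access point

# A retail service composed of tenant services, as in the hierarchy above.
residential = Service("residential-broadband",
                      tenant_services=[Service("access-as-a-service"),
                                       Service("vOLT")])
print([tenant.name for tenant in residential.tenant_services])
```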

Another very interesting feature of XOS is the “view” concept, which roughly corresponds to what I’ve called “derived operations”.  Services expose management views appropriate to the community, which means that cloud services are managed like EC2 and a service operator has roughly what a customer service rep or NOC agent would have.

Unlike CORD, which defines a single CO and thus a single site, XOS has the ability to define, deploy, and manage services that span multiple locations and (at least in theory) multiple jurisdictions.  Services built on CORD are logically “above” any single office, living in XOS, and thus can also spread across sites.  Since hosted functions and virtual network services are tenants of XOS, XOS can also unify and combine both SDN and NFV across the same scope.

The relationship between XOS and NFV is described in the XOS paper in what I think is a limiting and apologetic way.  They show XOS as being equivalent to the VNF Manager (VNFM) and running “under” MANO.  I think that XOS could easily supplant both VNFM and MANO and that would make more sense, since XOS would give NFV that scope of service across sites (and maybe networks) that it needs.

For all the gains ONOS has secured from its approach, it still has challenges to face.  It doesn’t have the broad vendor support that the NFV ISG has, though that can be a benefit as well as a challenge.  The programmatic approach it takes is hard to relate to non-programmers and the documents available don’t rise to the challenge.  Finally, there is an implicit reliance on OpenStack that could be a problem for operators looking for a different cloud strategy or who want to move networking forward faster or in a different direction than OpenStack Neutron will take it.

The last of these items is the most critical.  You can see in the base services defined for CORD that ONOS is highly public-Internet-centric.  There’s no model for private Ethernet services or clear provision to expand the base of services.  Part of this may be due to the fact that Neutron is highly IP-Internet centric, or it may be that Neutron and Internet-centricity are inherited from a common bias.  Offsetting this is the fact that some very important operators, including Verizon, seem committed to ONOS.

There’s no reason why ONOS, ONF, the ETSI NFV ISG, and OPNFV couldn’t cooperate and harmonize.  The recent modeling session hosted at CableLabs may be an indication that ETSI at least is trying to make that happen.  The risk I see is that ONOS could be dragged into the morass of standardization, with all the delays and vendor-centric accommodations.  If that happens then ONOS might lose its advantage, to the disadvantage of the SDN and NFV markets.

What Projections Can Tell Us About Opex Efficiency and its Impact on Network Modernization

Over the last year I have been trying to understand how network operators make a business case for a migration to SDN and NFV.  It’s easy to see how you can build them into a few services or projects here and there, but we seem to be lacking the impetus to drive a wholesale shift in infrastructure.  Server and chip vendors, in particular, need to try to exploit the potential hundreds of thousands of incremental servers that such a shift could justify.  The problem is that it’s not clear how all the pieces come together.

I’ve gathered good cost data from operators as well as constraints on a shift in technology, and pushed them through my model of the market.  The result is an interesting view of network evolution, one I intend to share here.

From a benefit-driver perspective, network evolution has two paths forward.  One says that you make the equipment needed to support services cheaper—this is the “capex reduction” benefit.  The other says that you make network operations processes more efficient and agile—the “opex and agility” benefit.  The two can be symbiotic, but there’s a basic tension between them as well.

If we presumed that networking were to be remade by deployment of SDN and NFV in some mix, then the pace of evolution is constrained by the pace at which SDN/NFV changes could be introduced into the network.  One barrier to this is the financial inertia of current infrastructure, and another is the need to establish a suitable operations framework that could at least guarantee no loss in efficiency during the SDN/NFV transition.  Even if we solved this second problem (which neither the ONF nor the ETSI ISG has put into their scope), the first of these problems would limit the rate of SDN/NFV introduction, particularly in the next three years.

If, on the other hand, we presume that remaking networking was going to happen by remaking services to optimize the use of service automation to improve efficiency and agility, we would have no sunk-cost barriers to deployment of our solution but also have no direct capex reduction.  An operations-centric service-automation approach can happen above the current network, though, and also above any future infrastructure technology changes.  In that position it could facilitate the introduction of new equipment and technologies like SDN and NFV.

Operators have told me consistently that they do not believe that capex reductions for SDN or NFV are compelling reasons to shift to these new technologies.  They are particularly concerned that the more complex service topology created by function-hosting and service chaining will add more operations costs than it reduces capex.  Operators are also concerned about new-revenue service agility benefits, which they believe vendors have been over-hyping.  It’s easy to say that users would buy X or Y or Z if you presented them in a certain way, but until that presentation is made and you can test the theory, it’s only a guess.  Thus, for this discussion, we’ll focus on opex.

The opex category of operator costs includes customer acquisition/retention, network operations, customer technical support, IT operations, metro backhaul charges paid, and energy costs.  The first four of these are what I call “process opex” meaning that they directly relate to human operations costs and thus can be directly impacted by service automation.  These costs make up about 88% of opex and in 2016 will account in total for about 23 cents of every revenue dollar for operators.  Capex accounts for only about 19-20 cents.  Operators also say that these costs will grow to 28 cents per revenue dollar in 2020, which contrasts to expected growth of less than 2% annually in capex.  Some costs, like customer technical support, are expected to increase by 22% by 2020, a daunting prospect for operators.
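Restating those figures per revenue dollar makes the pressure obvious; the sketch below uses only the numbers cited above, rounded.

```python
# The cost picture above, restated in cents per revenue dollar; the figures
# are the ones cited in the text, rounded.

capex_2016 = 19.5           # cents per revenue dollar (roughly 19-20 cents)
process_opex_2016 = 23.0    # the four human-driven "process opex" categories
process_opex_2020 = 28.0    # the level operators expect by 2020

total_opex_2016 = process_opex_2016 / 0.88   # process opex is ~88% of all opex

print(f"Total opex, 2016: about {total_opex_2016:.0f} cents per revenue dollar")
print(f"Process opex already exceeds capex by "
      f"{process_opex_2016 - capex_2016:.1f} cents per revenue dollar")
print(f"Expected growth in process opex, 2016 to 2020: "
      f"{(process_opex_2020 / process_opex_2016 - 1) * 100:.0f}%")
```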

My model shows that by 2022 either a service-operations-driven or SDN/NFV-driven modernization can reasonably be expected to reduce total opex by about a third, which is about 7.5 cents per revenue dollar.  How the two approaches get to that savings is very different, however.

With SDN/NFV, the challenge is timing.  Because the pace of infrastructure change needed to adopt SDN and NFV is so limited by the depreciation of current assets, the SDN/NFV path secures less than a third of its total 2022 savings even in 2018, and this presumes aggressive adoption without any limitations created by perceived risk or operations cost questions.  If we presume the current pace of risk management for SDN/NFV, the adoption rate is only about 40% of this optimum rate, and even in 2022 we’d see less than two-thirds of the target 7.5 cents per revenue dollar benefit.

Service automation shows a totally different picture.  Early savings growth is very rapid because there are no significant financial inertia factors to deal with.  On this path to opex reduction, savings grow so fast that they achieve the SDN/NFV savings rates two years earlier.  By 2020 the service automation path has saved operators twice the opex of the SDN/NFV path.  There’s also the issue of ROI.  As I pointed out in my blog yesterday, achieving opex reduction through SDN/NFV modernization costs about five times as much for the same level of savings.

It’s not all beer and roses for service automation, though.  A pure service-automation transformation hits the wall in about 2023 because of limits in how agile and efficient current network technologies can be.  SDN/NFV transformation continues beyond that point, and by about 2025 has cut opex in half and by 2027 SDN/NFV has made up the cumulative savings deficit it had relative to service automation.

This, I think, makes it clear that the best solution is to combine these two initiatives.  If we had the service automation benefits created in such a way that they worked now with legacy infrastructure and facilitated SDN/NFV evolution (what I’ve called the “operations-first” approach), we could actually fund SDN/NFV transition with early service automation savings and achieve the 2025 optimum savings targets as much as three years sooner.  More and more operators are turning to this approach, though they don’t publicly link their plans to the kind of savings progression I’ve cited here.  Verizon made a presentation that specifically detailed plans to first abstract current network features and then evolve them.  That’s the right answer.

Of course, you also have to abstract in such a way as to facilitate realizing the service automation goal.  It would seem that the ideal situation would be to convert current OSS/BSS/NMS processes into microservices that could then be integrated with service lifecycle management states as a service is deployed and moves through the various conditions of its life.  The one risk that I see in focusing on abstracting legacy technology is missing the need for these microservices.  After all, we already have management orchestration in classic networks—the international model of EMS-NMS-SMS progression implies layers of increasing service awareness.  It would be all too easy to simply rename stuff that’s already out there, declare it a service automation solution, and forget that we’ve not moved the ball.

This risk is most likely a factor for big network vendors and for OSS/BSS vendors—the former because they really don’t want much evolution in infrastructure and the latter because they don’t want to do too much new stuff.  That’s why having an operationalizing software model and orchestration within SDN/NFV would have been so helpful.  We still may get it, but it’s getting harder every day.

That’s the real lesson of these numbers.  We could do the best thing possible by uniting the software automation and SDN/NFV movements, but if the latter stubbornly refuses to address an opex-dominated end game, then harmonization will be very difficult.  Salespeople tell me that 1) it’s very hard to do a unified sell of SDN/NFV justified by a broad capex/opex business case and 2) their own companies don’t want them to do that because they won’t make near-term sales goals.  Well, those goals are at risk now in any event, and so is the long-term SDN/NFV future if software automation gets out in front.

What Does “Software-Defined Networking” Really Mean?

I know that in past blogs I’ve noted that we often create unnecessary problems in the industry by overloading hot new technology terms.  “Software-Defined Network” is a great example; some people have used the term to describe a major shift from distributed device-adapted networking to centralized software networking.  Others think you get to SDN by simply adding an API or a service catalog on top of a management system.

There’s no question that “riding the hype wave” is an established means for vendors to get good PR, but I think it’s also true that software definition of networking is naturally kind of imprecise.  Since I’m such an advocate of top-down thinking, let’s look at software’s role in networking from the top and see if we can dig out some significant truths.

Any product or service has “price/demand elasticity” in an economic sense.  You can set a price, and at that price the demand you can access will depend on how many prospective buyers can justify the price by making a business case.  In the early days of data networking, the price of data connectivity was very high—so much so that fewer than ten thousand sites in the US could justify the cost of a data circuit whose capacity was about a tenth that of consumer broadband today.  Those data services were profitable to the seller, but limited in their upside.

The whole of broadband evolution since then has been about trying to make it possible to continue to bring broadband to more users, so as to increase the total addressable market (TAM).  A big part of that is lowering the price of bandwidth without making it unprofitable, and this is where software comes in.

If we looked at the networking of 1990, we could easily suppose that cutting the cost of broadband circuits through better use of fiber optics (for example) would let us lower prices.  We might at first be able to reduce costs enough to drive major reductions in price, keeping profits steady.  However, there are costs of service other than cost per bit, meaning there are costs beyond capital costs.  As we make generating bits cheaper, we increase the component of total service cost that non-bit costs represent.

Most of those non-bit costs involve what would today be called “opex” meaning operating expenses.  Most opex cost is human cost.  Furthermore, the need for efficient human intervention in service processes generates delays in making services available or changing them.  Those delays make it more difficult to apply network-centric solutions to business problems that you can’t anticipate well in advance.  What this all adds up to is that to take the next step in cost management we need to operationalize services differently, in a less human-intensive way.  That’s the broad topic of software automation of services.

This point is rarely if ever recognized, but software automation of services is necessarily based on two principles.  First, you must virtualize all service-layer assets so that they can be controlled by software without manual intervention.  Second, you must preposition access assets that have to serve as your service conduit so that you aren’t forced to roll trucks to deliver an agile virtual service through a new access connection.

Prepositioning access assets is a business decision, one that’s historically been strongly influenced by public policy.  If we neglected regulatory impact, a free market for access would mean a kind of bit-arms-race among access providers to get the fattest and most versatile pipe to the highest-value service buyers.  Every operator would still exploit their own access pipes.  Many have seen this (right or wrong) as an invitation to monopolistic behavior on the part of large operators, and some countries (Australia with their NBN project, for example) have attempted to “open” access networking by disconnecting it from competing operators and making it a not-for-profit pseudogovernmental asset.  Most have implemented open network mandates of some sort.

We’ll need to sort out the regulatory position on “open access” if we want to get the most from software automation of services.  If the model of net neutrality that’s emerged for the Internet were to be applied generally to access infrastructure, it’s hard to say whether the infrastructure could be profitable enough to induce operators to oversupply capacity.  Since it’s also hard to see how many buyers would be willing to pay for capacity oversupply against potential future need, we have a problematic leg for the access dimension of cost-effective networking that needs to be fixed, but whose fixing is beyond technology’s capabilities.

Given that, let’s focus on the service side.

I think the best way to view service-layer automation is to say that it has two layers of its own.  One, the top layer, is OSS/BSS and portal-related and it’s designed to manipulate abstractions of network capabilities.  The next layer is designed to coerce behaviors from those network capabilities.  A network can be said to be software-defined if its lower-level capabilities (“behaviors” as I call them) can be abstracted under a generalized (intent) model.  That means that an operator can say that they have software-defined or even virtual networking as long as they can abstract their management interfaces so as to permit operations systems or retail portals to call for a “service” and have the request properly mapped to the management APIs.

SDN and NFV are mechanisms to address the virtualization of service-layer assets from that critical abstraction downward.  They do this by making service connection features and higher-layer features software-hostable and thus more easily deployed and changed.  SDN allows for the building of new connection models (the “chain” model example I offered in a previous blog on SDN/NFV is one such new connection model) and NFV allows for the injection of new service features/functions without deploying specific appliances.  Both these things augment the versatility of pooled resources as the basis for services, which makes them more agile and more easily automated.

This structure explains what I think is a critical point, which is that if your goal is operations efficiency and service agility you can realize a lot of that goal at the OSS/BSS level.  If you use ODL-type modeling to encapsulate/virtualize legacy management interfaces you can control current services with software and radically improve agility and efficiency.  You could say the same thing in NFV terms: an Infrastructure Manager that supported the right northbound abstractions and linked to legacy management interfaces on the southbound side would look to NFV processes pretty much the way virtual-function-based infrastructure would.

This could have been done all along, of course.  The TMF took most of the critical steps within the last decade, but in more recent projects they seem to have gotten themselves mired in politics or vendor collisions and lost their edge.  They reignited their efforts when it became clear that NFV might end up subducting operations into MANO.  Now it’s become clear that NFV MANO isn’t even addressing its own issues thoroughly, but the TMF is still not leading the higher-layer work as it should be.  That sets the stage for a vendor to jump out and take advantage of the opportunity, but of course most vendors have had the same opportunity as the standards groups and have failed to exploit it.  We’ll have to see if things change.

I think that recognizing the elements associated with successful software automation of services would be a giant step forward.  Telecom is an enormous industry with a disproportionately large sunk cost in equipment.  It takes a lot of benefits to bring about systemic change in this situation, and perhaps isolating the operations side from the network side could help operators harvest benefits that depend less on changing installed equipment.  This would, of course, come at least to a degree at the expense of the SDN and NFV revolution, because it would tap off benefits these two technologies might otherwise claim as their own.

What would be the difference between a service-automation-driven and an SDN/NFV-driven modernization approach?  The big difference is that SDN/NFV address efficiency through infrastructure change, and the pace of infrastructure change is limited by the depreciation of current assets.  My model says that the SDN/NFV path cannot make any meaningful contribution to opex reduction until 2018, while the service automation approach achieves in 2016 what SDN/NFV could achieve only in 2018.  By 2022 the two models converge, largely because operations modernization that doesn’t happen external to SDN/NFV will surely happen as a result of it.  So the big difference over the next eight years is the timing of savings, which service automation delivers a full two years sooner.  Another difference is ROI.  The SDN/NFV investment needed to drive savings in 2018, its first “good” year, is five times that needed to achieve much the same level of savings through service automation in 2016.

All this shows that if you take a service-automation view of modernization, orchestrate service processes and abstract current infrastructure, you get by far the best result.  In point of fact this approach would actually reverse the carrier revenue/cost-per-bit convergence now set for 2017, and in my model the two never converge at all.  With the SDN/NFV approach, operators have to wait till 2019 to restore balance between revenue and cost.

I’m going to blog more on this later, but for now let me say that the optimum approach is (and always has been) to prioritize the service-automation side of the picture and build a framework in which infrastructure modernization can pay back in both opex and capex.  That’s why it’s so important for SDN and NFV proponents to push operations integration, and so sad that most of them do not.

If Four Operators are Going their Own Way with NFV MANO, What Way Should They Go?

One of the outcomes of the recent meeting of NFV ISG members, hosted by CableLabs, was recognition of the value of open-source implementations of NFV and a need to harmonize efforts there.  Ironically, it’s also been reported that four operators have formed their own open-source activity to develop a MANO implementation.  It’s worth looking both at why that is and what these activities will have to address.

To a software guy, one of the big problems with the ETSI MANO concept is that, like the rest of the ETSI framework, it was synthesized from the bottom up instead of analyzed from the top down.  One thing this has generated is too many explicit components and interfaces—it builds in its own complexity.  The multiplicity of interfaces and the inefficient assignment of features to elements generate a lot of seemingly shared responsibilities.  Look at the definition of what MANO does in the ETSI documents versus what a VNF Manager (VNFM) does and you’ll see the problem.

Operators have now started to hit the wall of the specs’ limitations because they’re attempting to move beyond simple proof-of-technology PoCs into broader trials that demand benefits be proved.  That exposes the NFV structure to a wide range of service needs and a broad requirement for integration with current and future equipment and operations practices and tools.  I believe that what we are seeing today with our four operators, and others as well, is a growing understanding that the process isn’t working as it must.  The question is whether they understand just how it must work.  The current model is wrong, but what’s the right one?

What I propose to do is try to bend the ETSI vision into something that might have come out of a proper top-down software design.  I’ll reference THIS FIGURE to do this, so you might want to download it for reference as you read the rest of this.

What this figure shows is a kind of modernized view of the ETSI NFV model, trying to focus on the pieces and not on the implementation details that I think are inappropriate for the mission of the body.  NFV starts with infrastructure (both network and server resources) that are represented in the NFV process through a series of Virtual Infrastructure Managers (VIMs).  These VIMs assert a set of intent models that describe the functional capabilities of the infrastructure: to host virtual functions, connect things, and so forth.  The intent models describe the connection points, the SLA parameters, and a set of generic functions that the models support, such as “deploy”, “redeploy”, “scale”, etc.
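
Purely as an illustration (the field names and values are my own invention, not ETSI’s), a VIM-asserted intent model might reduce to something like this in code: connection points, SLA parameters, and a handful of generic lifecycle verbs whose implementation the service layer never sees.

# Hypothetical sketch of a VIM-asserted intent model: connection points,
# SLA parameters, and generic lifecycle operations.  All names illustrative.
from dataclasses import dataclass, field


@dataclass
class IntentModel:
    name: str
    connection_points: list            # e.g. ["mgmt", "data-in", "data-out"]
    sla: dict                          # e.g. {"latency_ms": 20, "availability": 0.999}
    operations: dict = field(default_factory=dict)   # verb -> callable

    def invoke(self, verb, **kwargs):
        # The service layer only knows the verb, never the implementation.
        return self.operations[verb](**kwargs)


def _deploy(**kwargs):
    print("deploying on this VIM's infrastructure:", kwargs)

def _scale(**kwargs):
    print("scaling:", kwargs)


hosting = IntentModel(
    name="Generic-Hosting",
    connection_points=["mgmt", "data-in", "data-out"],
    sla={"latency_ms": 20, "availability": 0.999},
    operations={"deploy": _deploy, "redeploy": _deploy, "scale": _scale},
)

hosting.invoke("deploy", image="vFirewall-1.2", vcpus=2)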

The VIM intent models are referenced in a service model, and this model is decomposed and parsed for lifecycle management by what I’m calling Service Lifecycle Management but which is in ETSI divided into the VNF Manager (VNFM) and Management and Orchestration (MANO).  VNFs contribute and consume management data through another intent-modeled interface, and they also can exercise some lifecycle management processes through this interface.  This whole process links to current operations and management systems through a third intent-modeled interface.

In this framework, the role of what ETSI and the operators call “MANO” is fairly clear, providing you generalize “MANO” to my Service Lifecycle Management.  It is the set of NFV software elements that mediate between the abstractions of resources presented by the VIM and by VNFs (which are software resources) and the operational framework of the operator.  Operators build service models, and when a service is ordered the model is turned over to MANO/SLM to instantiate and sustain it.

If you look at things this way, then MANO/SLM is a pretty straightforward example of state/event programming.  Every element in a service model has a state, and recognizes a set of events.  Ordering something sets up the model of the instance in the ORDERED state and the activation of the order generates an ACTIVATE event (example names only!) to command the software to decompose the model, assign resources, connect things, etc.  When the service is deployed it enters the OPERATING state, where it remains until something unusual occurs, or until the service term expires.
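
A minimal sketch of that state/event pattern, using made-up state and event names just as the text does, might look like this; the point is only that lifecycle handling reduces to a table mapping (state, event) pairs to processes.

# Hypothetical state/event skeleton for service lifecycle management.
# State and event names are examples only, as in the text.

def activate(svc):
    print(f"{svc['name']}: decomposing model, assigning resources...")
    svc["state"] = "OPERATING"

def fault(svc):
    print(f"{svc['name']}: fault detected, redeploying affected element...")
    svc["state"] = "OPERATING"          # back to normal after recovery

def terminate(svc):
    print(f"{svc['name']}: releasing resources")
    svc["state"] = "TERMINATED"

# The (state, event) table is the heart of MANO/SLM in this view.
LIFECYCLE = {
    ("ORDERED",   "ACTIVATE"): activate,
    ("OPERATING", "FAULT"):    fault,
    ("OPERATING", "EXPIRE"):   terminate,
}

def handle(svc, event):
    process = LIFECYCLE.get((svc["state"], event))
    if process:
        process(svc)
    else:
        print(f"{svc['name']}: no handler for {event} in state {svc['state']}")

service = {"name": "vCPE-Site-42", "state": "ORDERED"}
handle(service, "ACTIVATE")
handle(service, "FAULT")
handle(service, "EXPIRE")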

The process of lifecycle management described here and the intent-model interfaces shown allow you to say that as long as a piece of infrastructure 1) is represented by a VIM and 2) asserts one or more intent models to represent the services it can deliver, then that infrastructure can be used by any service model that references one of its intent models.  If the operators, vendors, or standards groups create a number of standardized intent models (Service-Chain, for example), any implementation of that intent model should be fully interoperable.

Similarly, a VNF can be onboarded using a standard process if 1) it references, directly or indirectly, the state of its resources and the services of NFV through a standard intent model and 2) that VNF PaaS intent model is linked to a service model by the process of deployment.  To make this work, a reference lifecycle state/event table is required.  This could be provided by an operator or standards group as the template to be used for a given function (“Firewall”) or it could be offered by the vendor who supplied the VNF.  In either case, the processes needed at the state/event intersections could either be drawn from a repertoire of “standard” NFV processes or provided by the vendor or operator.

The central piece of this is the service model, and here I offer a simple hypothesis.  If the efficiencies and agility of NFV come from software automation of lifecycle processes, then there must be a model of what the desired state of a service is.  NFV software must then use that model to achieve a desired state.  The best of the cloud DevOps stuff works this way already.  Besides showing that it was madness to start talking about NFV without defining a service model, this hypothesis says that no MANO/SLM implementation that doesn’t start with a model has any hope of providing service automation.
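
Here’s a hedged sketch of that desired-state principle, with invented component names: the service model declares what should exist, and the software’s only job is to reconcile reality against it, which is how the better cloud DevOps tools already behave.

# Hypothetical desired-state reconciliation: the service model declares
# what should be deployed; the software converges reality toward it.

desired_state = {                # the service model (illustrative)
    "firewall": {"instances": 2},
    "router":   {"instances": 1},
}

actual_state = {                 # what the resource pool reports
    "firewall": {"instances": 1},
}

def reconcile(desired, actual):
    for component, spec in desired.items():
        have = actual.get(component, {}).get("instances", 0)
        want = spec["instances"]
        if have < want:
            print(f"deploy {want - have} more instance(s) of {component}")
        elif have > want:
            print(f"remove {have - want} instance(s) of {component}")
        else:
            print(f"{component} already in desired state")

reconcile(desired_state, actual_state)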

I would offer a second hypothesis:  Service models allow for resources and service components to be represented and manipulated in an abstract implementation-independent form represented by a functionally derived intent model.  Then whatever can realize a given abstraction can be used to deploy it.  Integration of NFV can only be achieved by this abstraction/realization combination, again linked integrally with the service models.  Thus, without both service and intent modeling, NFV integration can never really be standardized and NFV can never be truly open.

Here’s another interesting point.  A service model builds services by assembling abstractions.  One abstraction type is a set of virtual functions, another a set of virtual resources.  Could a third kind of abstraction be a complete MANO implementation of another party?  Why could we not create a “VIM” that talked not to resources but to the order/OSS/BSS intent model interface of another MANO?  If that were the case, then an implementation of NFV could consist of service-modeled-and-connected MANO silos.  So even if we had incompatible details on NFV implementations, we could harmonize them at a higher level.
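
That “MANO as an abstraction” idea could be sketched as a simple adapter, and everything in this snippet is hypothetical: a class that presents the same deployment verb as a VIM but fulfills the request by calling another implementation’s order interface.

# Hypothetical sketch: a "VIM" adapter that fronts another party's MANO.
# Instead of driving resources directly, it forwards the request to the
# other implementation's order/OSS/BSS interface.

class ForeignMANO:
    """Stand-in for somebody else's complete NFV implementation."""
    def submit_order(self, service_model):
        print("foreign MANO deploying:", service_model["name"])
        return {"status": "deployed", "by": "foreign-MANO"}


class MANOBackedVIM:
    """Looks like a VIM to our service models; delegates to another MANO."""
    def __init__(self, foreign_mano):
        self.foreign = foreign_mano

    def deploy(self, service_model):
        # Same verb our own VIMs expose, different realization underneath.
        return self.foreign.submit_order(service_model)


silo_a = MANOBackedVIM(ForeignMANO())
print(silo_a.deploy({"name": "IMS-core"}))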

Application of these principles creates an NFV model that has three broad elements (VIM, MANO/SLM, and VNFs) and three intent-modeled interfaces.  It is manageable to standardize these interfaces, but in some ways it’s less critical that this be done because it’s not difficult to transform one abstraction into another.  The service data model guides the software components through the service lifecycle, and everything that’s needed to deploy and manage services is simply a set of “microservices”, in cloud terms.

All of these points are, or should be, critical to these four operators looking at their own contribution to open-source MANO.  If my two hypotheses are correct, and I believe very strongly that they are, then not only would further evolution of the existing MANO approach be a waste, but everything that’s been done so far should be reconsidered in the light of the new reality.  Or, truthfully, the reality that’s been there all along and is only now (and slowly) being accepted.

This is where operators can help their own cause.  They complain that vendors have dragged their feet, and that’s generally true.  They’ve complained that opportunistic vendor squabbling has derailed important progress, and that’s less true.  The responsibility for ensuring buyer requirements are met rests with the buyer.  Operators, especially those now involved in open-source MANO, need to make it clear that something completely new is needed.  Yes, that will be embarrassing, but all the time and effort in the world is not going to rehab the MANO notion.  The operators say they aren’t tied to the ISG MANO process.  OK, prove it.

What is “Virtual Networking”, Why Do We Need It, and How Do We Get There?

Network virtualization is unique in our virtualizing world in that it’s both a goal and a facilitator.  We want to use network virtualization to change both the services networks can offer and the way that familiar services are created, and at the same time we expect virtualization to answer new questions on resource connection and reconnection, emerging in cloud computing and NFV.  Consistent with our industry’s regrettable tendency to look at everything from the bottom up, network virtualization seems almost disconnected from both missions by sensationalistic positioning—white-box takeovers that drive Cisco into the grave.  Chambers was right to dismiss that threat, but it was never what he needed to be worrying about in the first place.

The basic principle of networking today is that of the connection network.  Things that want to communicate have an “address” that’s part of a structured address space.  The network, by some means or another, learns the way that these network users are connected, meaning where the users are in a topological sense.  The network then figures out routes for traffic delivery.  All this discovery is today based on adaptive exchanges of route/reachability/connectivity information.  SDN proposes to change that to exercise central control of connection points and routing policy.

Network virtualization, whether you think of it in SDN terms or as an evolution of legacy devices and protocols, is all about changing the connection network model.  Rather than starting with the “how” and taking the same bottom-up approach I’ve criticized others for taking, let me start by dealing with the “why?”

Yes, it’s true that you could use SDN to displace existing switching/routing, and it’s also true that this displacement could create a new model where traffic engineering and SLAs were created more at the “virtual wire” level than at L2/L3.  That would facilitate building networks with virtual switch/router instances and thus really threaten Cisco.   The problem is that this is a complex evolution that, like most paths toward NFV, presents limited benefits until you’ve committed thoroughly to the transition.  No operator likes that kind of bet.  What would be helpful is a means of justifying SDN deployment without massive churning of infrastructure, which means addressing SDN as a facilitator.  Of what?  The cloud, and NFV.

In a world of cloud computing and virtual functions, there’s an obvious problem with the simple notion of a static address space.  When a cloud component or VNF is deployed, it has to be given an address through which it can be reached, which is a requirement for “normal” deployments as well.  The problem that cloud or cloud-like approaches (including server virtualization) introduce is dynamism with respect to network location.  It was here a minute ago, and now it’s somewhere else.  Or maybe it’s several places.

When these changes in the mapping of an addressed component to a network access point occur, they have to be reflected somehow.  If I change the location of the NSAP without changing the address itself, then packets to that address have to be routed differently, which means routing tables have to change.  If I decide to change the address, then everyone who knew the function by its “old” location has to be given the new one, and I still have to build routability to that new location.

A second problem introduced by both the cloud and NFV is that of public versus private addresses.  Any time I build up applications or services from assembled components, the assembly process creates routes for traffic among the pieces.  If those routes (based on addresses of ports exposed by the components) are “public” meaning that they’re visible in the same address space as the users would be, then it’s possible to attack or hack them.  But some of the component addresses actually represent connections to the outside world, connections that have to be visible.  Where assembly of components is more dynamic, it’s possible that what’s seen from the outside would depend on exactly how the interior components were assembled.

We’re not done yet; there’s a third problem.  The most popular application of NFV at this moment is virtual CPE (vCPE), which uses something called “service chaining”.  In service chaining, traffic is stitched between selected functions in the same way that function-based appliances might be cabled together.  A firewall device, for example, might be connected to a WAN port, and then to a VPN appliance, and so forth.  What makes this difficult is that there is no “natural order” of appearance along the traffic path; you have to specify it.

These challenges have to be addressed in a framework that balances the need to solve the problem and the need to retain compatibility with current software practices.  We have some notions of what the results might look like in Amazon’s elastic IP and Google’s Andromeda.

A starting point to a virtual networking strategy to address these problems is support for multiple addresses for network elements.  Google Andromeda shows this by depicting a network service point as a vertical silo with an appearance in several different virtual networks.  Each of the virtual networks would have an address space that’s a subset of a larger space so that the address of a packet would implicitly identify the virtual network.  Within a collection of components, open addressing on one virtual network could occur based on a routing scheme linked to cloud topology, and public access could be provided on another.
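
Here’s a rough sketch of that multi-network silo, using invented address ranges: one service point gets an appearance in several virtual networks, each network is a distinct slice of a larger address space, and the destination address alone identifies the network.

# Hypothetical sketch of the multi-virtual-network idea: one endpoint
# appears in several virtual networks, each network a distinct slice of a
# larger address space, so the address itself identifies the network.
import ipaddress

VIRTUAL_NETWORKS = {                       # illustrative prefixes only
    "tenant-internal": ipaddress.ip_network("10.1.0.0/16"),
    "management":      ipaddress.ip_network("10.2.0.0/16"),
    "public-facing":   ipaddress.ip_network("10.3.0.0/16"),
}

# One service point, one appearance (address) per virtual network.
service_point = {
    "tenant-internal": "10.1.0.5",
    "management":      "10.2.0.5",
    "public-facing":   "10.3.0.5",
}

def network_of(address):
    addr = ipaddress.ip_address(address)
    for name, net in VIRTUAL_NETWORKS.items():
        if addr in net:
            return name
    return None

# A packet's destination address implicitly selects the virtual network.
print(network_of("10.2.0.5"))     # -> management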

Amazon’s mechanism of elastic IP makes the process clearer, though it seems less general.  All cloud-hosted components are assigned an IP address from the RFC1918 space but related to the location of the component in the Amazon cloud.  These can be mapped selectively to public IP addresses.  The mapping can be changed to reflect movement of an element.
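
The elastic-IP-style approach reduces to a mutable table from public to private (RFC1918) addresses; moving a component means updating one mapping rather than renumbering everyone who knows the public address.  A hypothetical sketch, using documentation-range addresses:

# Hypothetical sketch of elastic-IP-style mapping: components keep
# cloud-internal RFC1918 addresses; a mutable table maps selected ones to
# public addresses, and the mapping follows the component when it moves.

elastic_map = {
    "203.0.113.10": "10.0.1.5",    # public -> private (illustrative addresses)
}

def remap(public_ip, new_private_ip):
    """Component moved: point the public address at its new location."""
    old = elastic_map.get(public_ip)
    elastic_map[public_ip] = new_private_ip
    print(f"{public_ip}: {old} -> {new_private_ip}")

def deliver(public_ip, payload):
    private_ip = elastic_map[public_ip]
    print(f"forwarding '{payload}' to {private_ip}")

deliver("203.0.113.10", "hello")
remap("203.0.113.10", "10.0.7.22")      # instance redeployed elsewhere
deliver("203.0.113.10", "hello again")  # outside world sees no change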

Neither of these strategies addresses the issue of explicit function chains, the third problem noted above.  Explicit routes are supported in various IP standards through a routing stack or “source routing”, but in service chaining there is no assurance that we’d know the first component along the path, or whether that component would know to originate the stack, and if so with whom.

SDN would allow the creation of an explicit service chain because the central network control element would (or could) understand the notion of a “chain” element set, which would be an ordered list of ingress-egress ports.  As various sources have pointed out before, the only issue might be efficiency in interconnecting vSwitch and “real” switch elements to avoid hairpinning traffic.
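
In code, the CHAIN abstraction is little more than an ordered list of ingress/egress hops from which a controller could derive forwarding rules; this sketch is mine, not anything from the OpenFlow specs.

# Hypothetical sketch of a CHAIN connection model: an ordered list of
# (function, ingress-port, egress-port) hops that a central controller
# could turn into forwarding rules.

chain = [
    ("firewall",    "fw-in",  "fw-out"),
    ("vpn-gateway", "vpn-in", "vpn-out"),
    ("wan-optim",   "opt-in", "opt-out"),
]

def forwarding_rules(wan_port, lan_port, chain):
    """Stitch WAN -> chain -> LAN by pairing each egress with the next ingress."""
    rules = []
    previous_egress = wan_port
    for function, ingress, egress in chain:
        rules.append((previous_egress, ingress))   # forward prior output into this hop
        previous_egress = egress
    rules.append((previous_egress, lan_port))
    return rules

for rule in forwarding_rules("wan0", "lan0", chain):
    print("forward", rule[0], "->", rule[1])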

It appears that it would be possible to define a virtual networking approach to address all of the issues noted above and retain compatibility with current software.  Interestingly, this could involve using a new SDN connection model to augment the familiar LINE, LAN, and TREE—the CHAIN model just noted.  If so, virtual networking might represent the first specific enhancement to the connection model we’ve managed to propose, despite the fact that in theory SDN could support an arbitrary set of such new models.

Another interesting question is whether SDN could be augmented to control the address management part of virtualization.  Certainly it would be possible to build forwarding rules to support traffic division by address, but public/private mapping could require the use of SET and STRIP features of OpenFlow that are not mandatory.

Network virtualization opens some interesting topics in intent modeling of services too.  Obviously the CHAIN example I just cited is one, but also consider the question of simply adding an instance to some component or VNF.  You have to deploy that instance and connect it, and the latter mission will involve both the simple point of running a pipe or joining a subnet and the potentially more complicated task of adding load balancing and changing DNS.  Might an intent model for a service then include functions like “scale-out-component” or “redeploy?”  If we add those things in, we significantly facilitate open implementation because any solution that functionally meets requirements could be adopted with confidence that the implementation of the model itself would do all that’s required.
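
A hedged sketch of what exposing those verbs might look like, with invented names throughout: the caller invokes “scale-out-component” and never sees the load-balancer and DNS work hidden behind it.

# Hypothetical intent model that exposes "scale-out" and "redeploy" as
# first-class functions, hiding load-balancing and DNS changes inside.

class ScalableServiceIntent:
    def __init__(self, name):
        self.name = name
        self.instances = ["instance-1"]

    def scale_out_component(self):
        new = f"instance-{len(self.instances) + 1}"
        self.instances.append(new)
        # Everything below is invisible to whoever invoked the intent.
        print(f"{self.name}: deployed {new}")
        print(f"{self.name}: added {new} to load balancer pool")
        print(f"{self.name}: updated DNS records")

    def redeploy(self, instance):
        print(f"{self.name}: redeploying {instance} on fresh resources")


svc = ScalableServiceIntent("vEPC-user-plane")
svc.scale_out_component()       # caller needs no knowledge of LB/DNS details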

I think that virtual networking could be the strongest argument for SDN/OpenFlow adoption, but I also think that the state of the industry lags the potential virtual networking represents.  There are some indications (such as the recent meeting of ISG parties at CableLabs to talk about SDN integration) that thinking will advance to include these topics, but it’s also very possible that other industry groups with collateral interest (like the IETF) will take this up before the ISG gets to it.  Whoever manages to come up with something clear and useful here will do the industry and themselves a major service.

What Can We Draw from HPE’s Service Director Announcement?

Back in October 2012 a group of network operators issued a paper called “Network Functions Virtualisation:  An Introduction, Benefits, Enablers, Challenges & Call for Action”.  There were a lot of prescient statements in that first paper, but one that seems particularly relevant today is “Network operators need to be able to “mix & match” hardware from different vendors, hypervisors from different vendors and virtual appliances from different vendors without incurring significant integration costs and avoiding lock-in.”  That statement implies an open but unified framework for deployment and management, and that’s been the goal for most operators.

It’s not a goal that’s been met.  NFV exists today in a number of specialized, service-specific, models.  An operator who’s spent time and money running a trial or PoC can’t confidently extend the technology they’ve used to other services, nor can they extend that model upward to operations and management.  Even the six vendors whose NFV solutions can make an overall business case don’t control enough of the action to deliver a business case for a large operator.  Every NFV project looks like extensive custom integration, and none make that broad business case.

The reason I’m going through all of this is that there’s an obvious solution to this mess.  If we presumed that NFV was modeled from the top down, integrated with the operations processes, and defined so that even MANO/VIM implementations of competing vendors could be abstracted into an effective role in a broad NFV implementation, we could harmonize the current disorder and build that broad business case.  Up to now, of the six vendors who have the ability to make the broad NFV business case, only Ciena has asserted this ability to abstract and integrate.  Last week HPE joined that group by announcing Service Director.

Service Director is a high-level modeling and orchestration layer that runs at the service level, above not only traditional NFV MANO implementations but also above legacy management stacks.  The modeling and decomposition in Service Director appear to be consistent with what’s in HPE’s Director MANO implementation.  In fact, you could probably have done some of the things Service Director does even before its announcement.  What’s significant is that HPE is productizing it now.  For the first time, HPE is emphasizing the strength of its service modeling and the features available.  That’s smart, because HPE has the strongest modeling approach for which I have details, and operators report it’s the best overall.  They’ve underplayed that asset for too long.  As we’ll see, though, they still seem reluctant to make modeling the lead in their own announcement.

Because Service Director models legacy or NFV services, it can also model multiple NFV services, meaning it can build a service model that represents not only an HPE Director MANO implementation but the MANO implementation of somebody else.  This is the feature that could be most important because of that NFV silo problem I opened with.  Operators who have a host of disconnected services using incompatible tools can unite them with Service Director.  Service Director also codifies the specifics of a feature HPE has claimed for its OpenNFV strategy from the first—the ability to manage legacy devices as part of a service.  Since nobody is going to go cold-turkey to NFV, that’s critical for evolutionary reasons alone, and I think personally that we’ll have legacy devices in services for longer than I’ll be in the industry.

HPE is focusing its management and operations integration on Service Director too, which means that higher-layer functions essential in building service agility and operations efficiency are supported above traditional ETSI NFV.  Service Director can build a management strategy for the elements it models, integrated with the modeling, and since those can include legacy devices and even competitors’ MANO, Service Director can provide a path to a high-level comprehensive business case for an operator no matter how many different virtual and legacy functions it might be committed to using.

The management part of this is based on a combination of the powerful data model, a management database, and analytics tools that support a closed-cycle “from-here-to-there” set of capabilities that are easily applied to VNF scaling and redeployment problems as well as to managing legacy devices.  The data model is IMHO the secret sauce here, and I think that HPE underplays that asset in their presentation material.

The basis for Director (both the NFV form and Service Director) is the combination of a process-centric service modeling approach and a comprehensive information model.  The service model starts with a catalog, and moves down to Descriptors, Action Building Blocks, and Virtual/Physical Infrastructure Models.  The Information Model, which in my own work I combine with the service model, describes properties, relationships, and behaviors of service objects—all needed if you’re going to do an intent model.  The modeling supports hierarchy and inheritance, just as a good programming language does.
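
This is emphatically not HPE’s data model, just a minimal illustration of what hierarchy and inheritance buy you in service modeling: a “Firewall” descriptor can refine a generic “NetworkFunction” without restating its properties or behaviors.

# Hypothetical illustration of hierarchy and inheritance in service
# modeling; not HPE's model, just the general principle.

class NetworkFunction:
    """Generic descriptor: properties and behaviors shared by all functions."""
    properties = {"vendor": None, "version": None}

    def deploy(self):
        print(f"deploying {self.__class__.__name__}")

class Firewall(NetworkFunction):
    """Refines the generic descriptor; inherits deploy() unchanged."""
    properties = dict(NetworkFunction.properties, rule_set="default-acl")

class ManagedFirewall(Firewall):
    """Further specialization: adds a behavior, keeps the rest."""
    def deploy(self):
        super().deploy()
        print("attaching management probe")

service_catalog = [Firewall(), ManagedFirewall()]
for element in service_catalog:
    element.deploy()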

I like the content of the HPE announcement because it seems to address the problems I’m hearing from operators.  I also have to wonder how it relates to the issues reported recently between HPE and Telefonica.  I don’t think HPE could cobble Service Director together that fast, but they might have started on it based on what they were learning about NFV’s integration problems.  Service Director might even be a central piece of a new HPE approach to integration.

The more important question, of course, is whether evolved thinking on service modeling has altered Telefonica’s perceptions on integration.  Remember that the deal HPE lost is being rebid, that HPE has been invited to bid again, and that all of this is supposed to gel by the end of January.  It would be logical for HPE to move in a modeling-centric direction if the Telefonica bid mandated or even favored that.

Service Director is about modeling, make no mistake, but as I said earlier HPE hasn’t led with that point.  The announcement text seems to be about the role Service Director plays in legacy element integration, which is important but really just a proof point for the modeling.  Are they unwilling to suggest that services be modeled in their language to support integration, for fear of appearing to promote a proprietary model?  They shouldn’t be, given that ETSI has booted the modeling issue, and that in the recent meeting at CableLabs the ISG agreed to prioritize the modeling issues.  The problem is that the ISG has set a pretty low bar: significant progress by the end of 2016.  That’s way too long for service modeling benefits to be realized, so it’s going to be up to vendors.  HPE is also migrating to TOSCA to express the service models, and that’s a standard.  They should sing all this proudly.

There is nothing wrong with having proprietary modeling strategies that support VNF, NFVI, and legacy element integration if it comes to that.  We have a lot of programming languages today, a lot of ways of representing data, and nobody tears out their hair.  We accept that the best way to approach the task is one that maximizes benefits and minimizes costs.

The HPE announcement, and even more a Telefonica endorsement of service intent modeling in their rebid, could spur ETSI to address the issues of service modeling before its self-imposed and conservative end-of-year deadline, but the sad truth is that it’s too late now.  The body could never hope to make satisfactory progress given the vendor dynamic within, and there’s no time for a protracted process in any event.  Maybe the ISG should agree to an open modeling approach and let vendors work to prove their own is best.  That would probably be the effect of any specific service intent modeling focus in the Telefonica rebid, and it would surely generate the most useful shift in NFV we’ve seen since its inception.