The Path to “Service Agility” and “Operations Efficiency”

Over the last year, we’ve seen a significant transition in expectations for things like SDN and NFV that are aimed at transforming networks.  Where once it was believed that moving to white-box switches and functionality hosted on cheap servers was a major driver of change, it’s now broadly accepted that something else has to drive us forward.  The things most often cited now are “service agility” and “operations efficiency”.  I happen to agree, but it’s more complicated than it seems.

First, my surveys and work with operators strongly suggest that we’ve attained our heightened understanding of drivers of change through the somewhat (at least) cynical path of having exhausted the easier-to-sell options.  Every salesperson, everyone who has tried to sell “management” on an idea, knows that the best starting point is something that’s easily understood and difficult to disprove.  Capex savings fit that bill, so that’s where we started.  Since we have not completed any useful research on how agile SDN or NFV would be, or how much operational savings they could generate, we’re not rushing (armed with stunning insights) to the right answer.  We’re just moving on past what we know won’t work to the easiest thing that might.

You can see that from the fact that “service agility” is the top requirement today, rather than operational efficiency.  People love service agility because they can postulate nearly any upside they like (“Hey, we could add 50% in revenues if we could address market trends faster!”) and because there are a bunch of easy, harmless things that can be said to address agility needs.  A good example is the notion that we could shorten service turn-on times by two (or maybe four, or even six) weeks.  First, all the service automation you can name won’t string a wire, so the things you can do quickly through an automated process are things that augment wires, not create them.  Second, the benefit of starting to bill sooner for a feature added to a wire is a one-time gain that doesn’t repeat.  Operators tell me that shortening turn-on will likely add no more than three tenths of one percent in service revenues in any given year.  Over time, as feature commitments inevitably stabilize (once you get a firewall you tend to keep it), there’s nothing left to be agile with.

I think that the admittedly more complicated truth here is that service agility and operations efficiencies go hand in hand.  Once we’ve addressed the low-hanging apples in agility (and found them little and maybe a bit sour) we have to start looking at the complicated agility issues.  Nearly any service that could be conceptualized and sold today could be built today.  The problem isn’t lack of services, it’s lack of profits.  If something can be done for 148% of the current willingness to pay, then it’s darn sure not going to be sold enough to make anyone care about it, or how agile you were in getting it turned up.  The fact is that more valuable services are more complicated services, and complexity always costs in operations terms.  Thus, we really need to be conceptualizing operations efficiencies first, and that poses two specific and significant challenges.

The first challenge is that any new technology will necessarily make up a relatively small part of the infrastructure pool we’re building from.  We worry about how to operationalize SDN or NFV when that’s not the problem.  We have to operationalize legacy because that’s what we’re starting with in any service roll-out.  Who among our operator friends will deploy an enormous NFV cloud to secure some opex benefits, given that the NFV pieces are little nubbins of functionality buried in a sea of traditional technology?

SDN’s biggest problem, in its OpenFlow purist form at least, is this point.  We have no credible proof you can replace everything in the Internet with SDN and most people don’t believe we can.  Yet without replacing at least a whole heck of a lot, we can’t make any major change to operations costs, and in the early stages of deployment the new and different SDN practices are going to be more expensive to unify with legacy tools and practices.  So we prove a negative benefit and hope that people believe it will magically get positive?  Good luck with that.

The second challenge is that we don’t know what the agility barriers are, because we don’t know what our service opportunity targets will be.  When anyone who touts agility talks, they are forced into pedestrian examples that generate boredom in the hearts of our media friends, and we get nowhere because nobody even knows we’re trying.  We have to be able to solve agility operationalization challenges inside a framework of service creation that could address any credible service target.  That is a very big order for a marketplace obsessed with staring at its feet and next quarter rather than at the horizon and its opportunity to become the next IBM or Cisco.

The fact is that what we should be doing now isn’t directly related to operations efficiency or service modeling.  It’s something that Diego Lopez from Telefonica talked about at the Light Reading network event recently.  It’s abstraction.  We need to be able to create services by manipulating abstractions.  We then need to be able to manage the result.  Service agility and operations efficiencies, translated into practical terms, mean two things: model-driven service creation/composition, and the linkage of service operations processes to service/network/IT events based on a service-instance-driven model of how resources and functionality are related.
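
To make that last sentence a little more concrete, here is a minimal Python sketch of a service instance built from abstractions, with resource events linked back to operations processes through the model.  Everything in it is an assumption made for illustration: the ModelNode class, the index_resources and dispatch helpers, and the resource names are hypothetical and don’t come from any standard or product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ModelNode:
    """One abstraction in a service-instance model (hypothetical structure)."""
    name: str
    children: List["ModelNode"] = field(default_factory=list)
    resource: Optional[str] = None  # only leaf nodes bind to a real resource
    # event name -> operations process to run for this node
    handlers: Dict[str, Callable[["ModelNode"], str]] = field(default_factory=dict)
    parent: Optional["ModelNode"] = field(default=None, repr=False)

    def add(self, child: "ModelNode") -> "ModelNode":
        """Attach a child abstraction and return it for chaining."""
        child.parent = self
        self.children.append(child)
        return child


def index_resources(root: ModelNode) -> Dict[str, ModelNode]:
    """Map each bound resource to the model node that depends on it."""
    index: Dict[str, ModelNode] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.resource is not None:
            index[node.resource] = node
        stack.extend(node.children)
    return index


def dispatch(index: Dict[str, ModelNode], resource: str, event: str) -> List[str]:
    """Walk up from the affected node, running every handler for this event."""
    actions: List[str] = []
    node = index.get(resource)
    while node is not None:
        handler = node.handlers.get(event)
        if handler is not None:
            actions.append(handler(node))
        node = node.parent
    return actions


# A made-up service instance: a business VPN with an access leg and a firewall.
vpn = ModelNode(
    "business-vpn",
    handlers={"link_down": lambda n: f"notify customer of {n.name}"},
)
access = vpn.add(ModelNode("access"))
access.add(ModelNode(
    "site-a-access",
    resource="router-7/port-3",  # illustrative legacy resource name
    handlers={"link_down": lambda n: f"reroute {n.name}"},
))
vpn.add(ModelNode("firewall", resource="fw-appliance-2"))

index = index_resources(vpn)
print(dispatch(index, "router-7/port-3", "link_down"))
```

The point of the sketch is that the event arrives tagged only with a resource, and the model of the service instance is what decides which operations processes care about it.  Nothing here depends on whether the resource is a legacy router port or something hosted.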

This is the reason I like TOSCA but don’t yet love it.  TOSCA (Topology and Orchestration Specification for Cloud Applications) is a model-based approach to defining how functional atoms of a service would decompose into resources deployed in the cloud.  Since deployment of software-based features has got to be the core of any credible future service, that aligns the tool with the primary new problem.  What I don’t love is the fact that TOSCA is still “cloudy” both in that it’s aimed at the cloud and in that it’s an emerging spec with little current practical history of modeling services in the broad sense.  I think you can make TOSCA into the right answer, at first by augmenting it and later by enhancing it, but I think you have to define that as a goal, and the first step in proving your effectiveness is to take a network that has no SDN or NFV in it and prove you can model and operationalize services.  Because it’s not whether “agility” or “operations efficiency” will drive SDN or NFV, but whether there’s a way to get both and get them from day one in our evolution to the future.
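
As a rough illustration of that first step, the Python sketch below decomposes the functional atoms of a service into deployment steps, first against a network with no SDN or NFV in it and then against one where a hosted option exists.  This is not TOSCA syntax or any TOSCA tooling; the CATALOG contents, the decompose function, and every step string are hypothetical, chosen only to show the same service model resolving to different resources.

```python
from typing import Dict, List

# Hypothetical catalog: each functional atom lists the ways it can be realized.
CATALOG: Dict[str, Dict[str, List[str]]] = {
    "vpn-core": {
        "legacy": ["configure MPLS VRF on PE routers"],
    },
    "firewall": {
        "legacy": ["provision appliance fw-edge"],
        "nfv": ["deploy VNF image fw-vnf", "attach to service chain"],
    },
}


def decompose(atoms: List[str], available: List[str]) -> List[str]:
    """Turn functional atoms into ordered steps, using the first
    realization in `available` (a preference order) that the atom supports."""
    plan: List[str] = []
    for atom in atoms:
        options = CATALOG[atom]
        chosen = next((kind for kind in available if kind in options), None)
        if chosen is None:
            raise ValueError(f"no available realization for {atom}")
        for step in options[chosen]:
            plan.append(f"{atom} [{chosen}]: {step}")
    return plan


service = ["vpn-core", "firewall"]

# Day one: a network with no SDN or NFV in it.
legacy_plan = decompose(service, available=["legacy"])

# Later: hosted functions are preferred where the catalog offers them.
mixed_plan = decompose(service, available=["nfv", "legacy"])

for line in legacy_plan + mixed_plan:
    print(line)
```

The service definition never changes between the two plans; only the set of available realizations does, which is the property the modeling approach would have to demonstrate on a legacy-only network before SDN or NFV enters the picture.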