How to Avoid Management Silos in a Virtual World

The need to modernize operations practices to make them more agile and efficient is pretty obvious.  The need to organize complex software deployments, particularly those involving componentized applications, is also obvious.  So is the need to allocate features and components efficiently to virtualized infrastructure.  What is not yet obvious is just how to do it.

A lot of this modernizing and organizing is a matter of APIs.  Generalized management tools have to accommodate a variety of network equipment, and this is usually accomplished using management APIs.  But if every vendor has its own approach and its own APIs, then too much customization is needed for high-level tools to be readily applied to a given multi-vendor network.  Standardization is one way of addressing this; Light Reading did an article on the MEF’s APIs for carrier Ethernet and CenturyLink’s goal of standardizing on these APIs.  Another approach is a vendor platform that embraces at least the major legacy vendors; Cyan offers an orchestration platform for SDN/NFV that was recently expanded to include the ability to control Cisco and Juniper switches.
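To make the customization problem concrete, here is a minimal sketch of the adapter idea behind both the standards and the platform approaches: high-level tools code against one interface, and each vendor gets a thin translation layer.  Everything in it is hypothetical; the class names, the `set_port_state` method, and the command strings are invented for illustration and do not represent the MEF APIs, Cyan’s platform, or any real vendor’s syntax.

```python
from abc import ABC, abstractmethod


class DeviceAdapter(ABC):
    """Hypothetical common interface that a high-level tool codes against."""

    @abstractmethod
    def set_port_state(self, port: str, enabled: bool) -> str:
        """Return the vendor-specific command for the requested change."""


class VendorAAdapter(DeviceAdapter):
    # Invented command syntax, for illustration only.
    def set_port_state(self, port: str, enabled: bool) -> str:
        state = "up" if enabled else "down"
        return f"vendor-a: interface {port} admin-state {state}"


class VendorBAdapter(DeviceAdapter):
    # A second invented syntax, to show the per-vendor translation.
    def set_port_state(self, port: str, enabled: bool) -> str:
        verb = "enable" if enabled else "disable"
        return f"vendor-b: {verb} port {port}"


def enable_everywhere(adapters, port):
    """The high-level tool sees one API no matter which vendor is underneath."""
    return [adapter.set_port_state(port, True) for adapter in adapters]


if __name__ == "__main__":
    for command in enable_everywhere([VendorAAdapter(), VendorBAdapter()], "ge-0/0/1"):
        print(command)
```

Without a shared interface, every new vendor means touching every tool.  With one, a new vendor means writing one adapter, which is the economy that both standard APIs and a multi-vendor platform are trying to buy.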

There are obviously a lot of vendor strategies (likely as many as there are vendors) but there are also divergences in the standards approach.  At a high level, you could divide operations efficiency tools according to whether they were “network-centric” or “OSS/BSS-centric” in their evolution.  The former seek to establish a common means of controlling devices, leaving the existing OSS/BSS to define how this common-control approach links to operations.  The latter assume that operations systems themselves have to evolve in some way.

Any multiplicity of approach leads to user confusion and to potentially higher costs and substandard results, but it’s not this high-level division that worries me.  I’m more concerned about the service-specific approach that both the MEF APIs (and CenturyLink’s adoption of them) and the Cyan Blue Planet focus seem to suggest is evolving.

Arguably, the real value behind the IP convergence was the elimination of service-specific network hardware silos.  Five networks for five services invite not only inefficient operations but inefficient use of capacity, since what’s available for one service is lost to others that don’t share the infrastructure.  But five operations/management silos on top of converged infrastructure don’t make much sense either, and that’s what we might end up with here.

On the surface, it might seem very logical to develop operations tools and practices for something like carrier Ethernet.  We have a body (the MEF) focused on the service, we have specific providers who offer it, customers who depend on it, and equipment that’s designed to support it.  It’s also probably easy to make a business case for agility here, and to define very specific goals regarding the kinds of service additions and changes expected and the tolerance for cost and delay.  But thirty or more years ago, it was also easy to justify multiple service-specific networks for many of the same reasons.

Capital costs are a declining piece of overall service costs, and cost management isn’t the only path to building revenues and profit margins.  Operators in my own surveys have valued both service agility and operations efficiency higher than capital cost management for about five years now, reflecting no doubt the idea that they have a handle on how to manage capex but are a lot less sure about the agility/efficiency stuff.  We could well see a flood of initiatives to address agility/efficiency, and we could well create a lot of silos by following through on each.

I’m worried that if we start looking at how to make Ethernet services efficient, or maybe VPN services efficient, we’ll find an answer for both, but not the same answer.  The carrier Ethernet service could well use much the same equipment as pieces of a cloud computing or content delivery service.  Buyers may mix multiple “services” into a retail offering.  Do we expect the people who use Ethernet in the cloud or VPNs to come up with their own management strategies or to use ones that evolve out of carrier Ethernet?  If the latter is the goal, how likely is it that we’ll be able to do what we want if we’ve given no thought to the requirements of these additional service areas?

Another source of risk here, IMHO, is the notion of managing devices and virtual devices as the path to managing services.  There is, in SDN or NFV, nothing that necessarily corresponds to a “router” or a “switch”.  There is functionality that can mimic the devices, but that functionality isn’t likely to have the properties of the devices themselves.  It may well be distributed rather than localized; it may even move around dynamically.  The point is that a “virtual device” is an abstraction.  We might elect to recognize virtual devices that map 1:1 to current real devices, but would we constrain our evolution forever by demanding that mapping?  If I’m using a synthesizer to create music, why say that my orchestra contains only brass, woodwinds, and strings?  A rapoor (to cite an old science-fiction yarn) might be an imaginary instrument, but if we can do whatever we imagine, why not do one?

What’s needed here is some higher-level organization, something that could in fact come (or could have come) from multiple sources.  In modern terms, we need abstractions that represent manageable elements of functionality.  In short, what we need to do is to assemble what we can control about a set of hosted or installed behaviors, and then represent them as something like a virtual device.  The problem, therefore, is not that we have virtual devices, it’s that we insist that all virtual devices are based on limited real-device elements.  That limits us to current networking concepts.
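One way to picture such an abstraction is a virtual device that is nothing more than a named collection of whatever we can control, with status derived from its members.  The sketch below is my own illustration under stated assumptions, not something drawn from the TMF, ONF, or NFV ISG work; the class names, the `status` rules, and the `relocate` method are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """One controllable piece of behavior: a hosted function, a set of flows, a real box."""
    name: str
    location: str
    healthy: bool = True


@dataclass
class VirtualDevice:
    """An abstraction over any set of elements; it need not map 1:1 to a real device."""
    name: str
    elements: list = field(default_factory=list)

    def status(self) -> str:
        # Status is derived from the members, so management survives changes
        # in what the virtual device is actually built from.
        if not self.elements:
            return "unknown"
        healthy = sum(1 for element in self.elements if element.healthy)
        if healthy == len(self.elements):
            return "up"
        return "degraded" if healthy else "down"

    def relocate(self, element_name: str, new_location: str) -> bool:
        # Functionality can move around; the virtual device's identity does not.
        for element in self.elements:
            if element.name == element_name:
                element.location = new_location
                return True
        return False


if __name__ == "__main__":
    edge = VirtualDevice("branch-edge", [
        ManagedElement("firewall-function", "datacenter-1"),
        ManagedElement("forwarding-flows", "switch-7"),
        ManagedElement("legacy-router", "central-office-3"),
    ])
    edge.relocate("firewall-function", "datacenter-2")
    print(edge.name, edge.status())
```

The hypothetical “branch-edge” here mixes a hosted function, SDN flows, and a legacy box, and it corresponds to no single device in today’s networks.  Operations still sees one manageable thing, which is the point.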

An elastic model of virtual devices lets us embrace what we have, define whatever we decide we’re evolving to, and sustain management through all the transitions.  Those are properties critical in an age where falling profits are pressuring everyone to be more efficient.  Whatever we waste in operations costs or opportunity benefits isn’t available to fund network expansion, dear vendors.

The TMF could define this.  The OMG, or the ONF, or the NFV ISG or even OPNFV, could all define something like this.  However, the time to do the defining is early in the process of building the framework for management, and we may have passed that optimum time already for most or all of these bodies.  If that’s the case, we need to think hard about the possibilities of going back to do it right.