Domain 2.0, Domains, and Vendor SDN/NFV

Last week we had some interesting news on AT&T’s Domain 2.0 program and some announcements in the SDN and NFV space.  As is often the case, there’s an interesting juxtaposition between these events that sheds some light on the evolution of the next-gen network.  In particular, it raises the question of whether either operators or vendors have got this whole “domain” thing right.

Domain 2.0 is one of those mixed-blessing things.  It’s good that AT&T (or any operator) recognizes that it’s critical to look for a systematic way of building the next generation of networks.  AT&T has also picked some good thinkers in its Domain 2.0 partners (I particularly like Brocade and Metaswitch), and its current-infrastructure suppliers are represented there as well.  You need both the future and the present to talk evolution, after all.  The part that’s less good is that Domain 2.0 seems a bit confused to me, and also to some AT&T people who have sent me comments.  The problem?  Again, it seems to be the old “bottom-up-versus-top-down” issue.

There is a strong temptation in networking to address change incrementally, and if you think in terms of incremental investment, then incremental change is logical.  The issue is that “incremental change” can turn into the classic problem of trying to cross the US by making random turns at intersections.  You may make a locally optimal choice at each turn based on what you can see, but you never see the destination.  Domains without a common goal end up being silos.

What Domain 2.0 or any operator evolution plan has to do is begin with some sense of the goal.  We all know that we’re talking about adding a cloud layer to networking.  For five years, operators have made it clear that whatever else happens, they’re committed to evolving toward hosting stuff in the cloud.

The cloud, in the present, is a means of entering the IT services market.  NFV also makes it a way of hosting network features in a more agile and elastic manner.  So we can say that our cloud layer of the future will have some overlap with the network layer of the future.

Networking, in the sense most think of it (Ethernet and IP devices), is caught between two worlds, change-wise.  On the one hand, operators are very interested in getting more from lower-layer technology like agile optics.  They’d like to see core networking and even metro networking handled more through agile optical pipes.  By extension, they’d like to create an electrical superstructure on top of optics that can do whatever happens to be 1) needed by services and 2) not yet fully efficient if implemented in pure optical terms.  Logically, SDN could create this superstructure.

At the top of the current IP/Ethernet world we have increased interest in SDN as well, mostly to secure two specific benefits—centralized control of forwarding paths to eliminate the current adaptive route discovery and its (to some) disorder, and improved traffic engineering.  Most operators also believe that if these are handled right, they can reduce operations costs.  That reduction, they think, would come from creating a more “engineered” version of Level 2 and 3 to support services.  Thus, current Ethernet and IP devices would be increasingly relegated to on-ramp functions—at the user edge or at the service edge.

At the service level, it’s clear that you can use SDN principles to build more efficient networks to offer Carrier Ethernet, and it’s very likely that you could build IP VPNs better with SDN as well.  The issue here is more on the management side; the bigger you make an SDN network the more you have to consider the question of how well central control could be made to work and how you’d manage the mesh of devices. Remember, you need connections to manage stuff.

All of this new stuff has to be handled with great efficiency and agility, say the operators.  We have to produce what one operator called a “third way” of management that somehow bonds network and IT management into managing “resources” and “abstractions” and how they come together to create applications and services.  Arguably, Domain 2.0 should start with the cloud layer, the agile optical layer, and the cloud/network intersection created by SDN and NFV.  To that, it should add very agile and efficient operations processes, cutting across all these layers and bridging current technology to the ultimate model of infrastructure.  What bothers me is that I don’t get the sense that’s how it works, nor do I get the sense that such a goal has driven which vendors get invited.

Last week, Ciena (a Domain 2.0 partner) announced a pay-as-you-earn NFV strategy, and IMHO the approach has both merit and issues.  Even if Ciena resolves the issue side (which I think would be relatively easy to do), the big question is why the company would bother with a strategy way up at the service/VNF level when its own equipment is down below Level 2.  The transformation Ciena could support best is the one at the optical/electrical boundary.  Could there be an NFV or SDN mission there?  Darn straight, so why not chase that one?

If opportunity isn’t a good enough reason for Ciena to try to tie its own strengths into an SDN/NFV approach, we have another—competition.  HP announced enhancements to its own NFV program, starting with a new version of its Director software, moving to a hosted version of IMS/EPC, and then on to a managed API program with components offered in VNF form.  It would appear that HP is aiming at creating an agile service layer in part by creating a strong developer framework.  Given that HP is a cloud company and that it sells servers and strong development tools already, this sort of thing is highly credible from HP.

It’s hard for any vendor to build a top-level NFV strategy, which is what VNFs are a part of, if they don’t really have any influence in hosting and the cloud.  It’s hard to tie NFV to the network without any strong service-layer networking applications, applications that would likely evolve out of Level 2/3 behavior and not out of optical networking.  I think there are strong things that optical players like Ciena or Infinera could do with both SDN and NFV, but they’d be different from what a natural service-layer leader would do.

Domain 2.0 may lack high-level vision, but its lower-level fragmentation is proof of something important, which is that implementation of a next-gen model is going to start in different places and engage different vendors in different ways.  As things evolve, they’ll converge.  In the meantime vendors will need to support their own strengths to maximize their influence on the evolution of their part of the network, but also keep in mind the longer-term goals of the operator, even when the operator may not have articulated them clearly, or even recognized them fully.

Public Internet Policy and the NGN

The FCC is now considering a new position on Net Neutrality, and also a new way of classifying multi-channel video programming distributors (MVPDs) that would let streaming providers offering “linear” programming (continuous distribution, similar to channelized RF) rather than on-demand content qualify as MVPDs.  That would enable them to negotiate for licensing deals on programming as cable companies do.  The combination could raise significant issues, and problems for ISPs.  It could even create a kind of side-step of the Internet, and some major changes in how we build networks.

Neutrality policy generally has two elements.  The first defines what exactly ISPs must do to be considered “neutral”, and the second defines what is exempt from the first set of requirements.  In the “old” order published under Chairman Genachowski, the first element said you can’t interfere with lawful traffic, especially to protect some of your own service offerings, can’t generally meter or throttle traffic except for reasons of network stability, and can’t offer prioritization or settlement among ISPs without facing FCC scrutiny.  In the second area, the order exempted non-Internet services (business IP) and Internet-related services like (explicitly) content delivery networks and (implicitly) cloud computing.

The DC Court of Appeals trashed this order, leaving the FCC with what it said was sufficient authority to prevent interference with lawful traffic but not much else.  Advocates of a firmer position on neutrality want to see an order that bars any kind of settlement or payment other than for access, and implicitly bars settlement among providers and QoS (unless someone decided to do it for free).  No paid prioritization, period.  Others, including most recently a group of academics, say that this sort of thing could be very destructive to the Internet.

How?  The obvious answer is that if neutrality rules were to force operators into a position where revenue per bit fell below acceptable margins on cost per bit, they’d likely stop investing in infrastructure.  We can see from both AT&T’s and Verizon’s earnings reports that wireline capex is expected to decline, and this is almost surely due to the margin compression created by the converging cost and price.  Verizon just indicated it would grow wireless capex, and of course profit margins are better in wireless services.

You can see that a decision to rule that OTT players like Aereo (now in Chapter 11) could now negotiate for programming rights provided they stream channels continuously might create some real issues.  It’s not certain that anyone would step up to take on this newly empowered OTT role, that programming rights would be offered to this sort of player, that consumers would accept the price, or that the new OTT competitors could be profitable at the margin, but suppose it happened.  What would be the result?

Continuous streaming of video to a bunch of users over the Internet would surely put a lot of additional strain on the ISPs.  One possible outcome would be that they simply reach price/cost crossover faster and let the network degrade.  The FCC can’t order a company to do something that’s not profitable, but it could in theory put ISPs to the choice “carry at a loss or get out of the market”.  I don’t think that would be likely, but it’s possible.  Since that would almost certainly result in companies exiting the Internet market, it would have a pretty savage impact.

There’s another possibility, of course, which is that the ISPs shift their focus to the stuff that’s exempt from neutrality.  That doesn’t mean inventing a new service, or even shifting more to something like cloud computing.  It means framing what we’d consider “Internet” today as something more cloud- or CDN-like.

Here’s a simple example.  The traditional scope of neutrality rules as they relate to video content would exclude CDNs.  Suppose operators pushed their CDNs to the central office, so that content jumped onto “the Internet” a couple miles at most from the user, at the back end of the access connection.  Operator CDNs could now provide all the video quality you wanted as long as you were using them.  Otherwise, you’re flowing through infrastructure that would now be unlikely to be upgraded very much.

Now look at my postulated opportunity for mobile/behavioral services through the use of a cloud-hosted personal agent.  The mobile user asks for something and the request is carried on the mobile broadband Internet connection to the edge of the carrier’s cloud.  There it hops onto exempt infrastructure, where all the service quality you need could be thrown at it.  No sharing required here, either.  In fact, even if you were to declare ISPs to be common carriers, cloud and CDN services are information services separate from the Internet access and sharing regulations would not apply.  It’s not even clear that the FCC could mandate sharing because the framework of the legislation defines Title II services to exclude information services.

You can see from this why “carrier cloud” and NFV are important.  On the one hand, the future will clearly demand operators rise above basic connection and transport, not only because of current profit threats but because it’s those higher-level things that are immune from neutrality risks.  The regulatory uncertainty only proves that the approach to the higher level can’t be what I’ll call a set of opportunity silos; we need to have an agile architecture that can accommodate the twists and turns of demand, technology, and (now) public policy.

On the other hand, the future has to evolve, if not gracefully then at least profitably, from the past.  We have to be able to orchestrate everything we now have, we have to make SDN interwork with what we now have, and we have to operationalize services end to end.  Further, legacy technology at the network level (at least at the lower OSI layers) isn’t displaced by SDN and NFV, it’s just morphed a bit.  We’ll still need unified operations even inside some of our higher-layer cloud and CDN enclaves, and that unified operations will have to unify the new cloud and the old connection/transport.

One of the ironies of current policy debates, I think, is that were we to have let the market evolve naturally, we’d have had settlement on the Internet, pay for prioritization by consumer or content provider, and other traditional network measures for a decade or more.  That would have made infrastructure more profitable to operators, and stalled out the current concerns about price/cost margins on networks.  The Internet might look a little different, the VCs might not have made as much, but in the end we’d have something logically related to the old converged IP model.  Now, I think, our insistence on “saving” the Internet has put more of it—and its suppliers—at risk.

What’s Involved in Creating “Service Agility?”

“Service agility” or “service velocity” are terms we see more and more every day.  NFV, SDN, and the cloud all rely to a degree—even an increasing degree—on this concept as a primary benefit driver.  There is certainly a reason to believe that in the most general case, service agility is very powerful.  The question is whether that most general case is what people are talking about, and are capable of supporting.  The sad truth is that our hype-driven industry tends to evolve drivers toward the thing most difficult to define and disprove.  Is prospective execution of our agility/velocity goal that nebulous?

Services begin their life in the marketing/portfolio-management part of a network operator’s organization, where the responsibility is to identify things that could be sold profitably and in enough volume to justify the cost.  Ideally, the initial review of the new service opportunity includes a description of the features needed, the acceptable price points, how the service will get to market (prospecting and sales strategies) and the competition.

From this opportunity-side view, a service has to progress through a series of validations.  The means of creating the service has to be explored and all options costed out, and the resulting choice(s) run through a technology trial to validate that the stuff will at least work.  A field trial would then normally be run, aimed at testing the value proposition to the buyer and the cost (capex and opex) to the seller.  From here, the service could be added to the operator’s portfolio and deployed.

Today, this process overall can often take several years.  If the opportunity is real, then it’s easy to see how others (OTT competitors for example) could jump in faster and gain a compelling market position before a network operator even gets their stuff into trial.  That could mean the difference between earning billions in revenue and spending a pile of cash to gain little or no market share.  It’s no wonder that “agility” is a big thing to operators.

But can technologies like SDN, NFV, and the cloud help here?  The service cycle can be divided into four areas—opportunity and service conceptualization, technology validation and costing, field operations and benefit validation, and deployment.  How do these four areas respond to technology enhancements?  That’s the almost-trillion-dollar question.

There are certainly applications that could be used to analyze market opportunities, but those applications exist now.  If new technology is to help us in this agility area, it has to be in the conceptualization of a service—a model of how the opportunity would be addressed.  Today, operators have a tendency to dive too deep too fast in conceptualizing.  Their early opportunity analysis is framed in many cases by a specific and detailed execution concept.  That’s in part because vendors influence service planners to think along vendor-favorable lines, but also in part because you have to develop some vision of how the thing is going to work, and operators have few options beyond listening to vendor approaches.

If we think of orchestration correctly, we divide it into “functional” composition of services from features, and “structural” deployment of features on infrastructure.  A service architect conditioned to this sort of thinking could at the minimum consider the new opportunity in terms of a functional composition.  At best, they might have functional components in their inventory that could serve in the new mission.  Thus, NFV’s model of orchestration could potentially help with service conceptualization.
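
To make that functional/structural split concrete, here’s a minimal sketch in Python; the catalog entries, names, and SLA fields are purely hypothetical and aren’t drawn from any NFV specification.

```python
# Illustrative only: a service composed "functionally" from catalog atoms.
# How each atom is realized "structurally" is deliberately left for later.

FUNCTIONAL_CATALOG = {
    "firewall":   {"ports": ["in", "out"],    "sla": {"availability": 0.999}},
    "nat":        {"ports": ["in", "out"],    "sla": {"availability": 0.999}},
    "vpn-onramp": {"ports": ["site", "core"], "sla": {"availability": 0.9995}},
}

def compose_service(name, atom_names):
    """Functional composition: pick atoms from the catalog and chain them."""
    missing = [a for a in atom_names if a not in FUNCTIONAL_CATALOG]
    if missing:
        raise ValueError(f"no functional atom(s) for: {missing}")
    return {"service": name,
            "chain": [{"atom": a, **FUNCTIONAL_CATALOG[a]} for a in atom_names]}

# A service architect could explore a new opportunity entirely at this level,
# before anyone decides how or where each atom will actually be deployed.
offer = compose_service("business-internet-plus", ["vpn-onramp", "firewall", "nat"])
print([step["atom"] for step in offer["chain"]])
```

The richer that catalog gets, the closer “conceptualizing a service” comes to assembling pieces rather than commissioning a custom build.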

Where orchestration could clearly help, again presuming we had functional/structural boundaries, would be in the formulation of a strategy and the initiation of a technology trial.  The key point here is that some sort of “drag-and-drop” functional orchestration to test service structures could be easy if you had 1) functional orchestration, 2) drag-and-drop or an easy GUI, and 3) actual functional atoms to work with.  A big inventory of functional elements could be absolutely critical for operators, in short, because it could make it almost child’s play to build new services.

Structural orchestration could also help here.  If a service functional atom can be realized in a variety of ways as long as the functional requirements are met (if the abstraction is valid, in other words), then a lab or technology trial deployment could tell operators a lot more because it could be a true functional test even if the configuration on which it deployed didn’t match a live/field operation.  Many DevOps processes are designed to be pointed at a deployment environment—test or field.  It would be easy to do that with proper orchestration.

The transition to field trials, and to deployment, would also be facilitated by orchestration.  A functional atom can be tested against one configuration and deployed on another by changing the structural recipes, which is easier to test with and accommodates variations in deployment better.  In fact, it would be possible for an operator to ask vendors to build structural models of operator functional atoms and test them in vendor labs, or to use third parties.  You do have to ensure what I’ll call “structure-to-function” conformance, but that’s a fairly conventional linear test of how exposed features are realized.
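
A sketch of what “changing the structural recipes” might look like, assuming (hypothetically) that a functional atom carries no knowledge of its own realization; the recipe contents and helper names are invented for illustration.

```python
# Sketch only: the same functional atom bound to different structural recipes
# for a lab trial versus a field deployment.  The exposed function is identical;
# only the binding changes.

STRUCTURAL_RECIPES = {
    "lab":   {"firewall": {"realize": "vm",         "image": "fw-trial-image",
                           "host_pool": "lab-rack"}},
    "field": {"firewall": {"realize": "vm-cluster", "image": "fw-prod-image",
                           "host_pool": "metro-cloud", "min_instances": 2}},
}

def deploy(atom, environment):
    """Structural orchestration: decompose a functional atom into the recipe
    that matches the target environment."""
    recipe = STRUCTURAL_RECIPES[environment][atom]
    # A real implementation would call OpenStack, an NMS API, and so on here.
    return {"atom": atom, "environment": environment, **recipe}

print(deploy("firewall", "lab"))
print(deploy("firewall", "field"))
```

That’s why a trial result stays meaningful even when the field configuration doesn’t match the lab: it’s structure-to-function conformance that gets tested, not a particular box.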

We now arrive at the boundary between what I’d call “service agility” and another thing with all too many names.  When a service is ordered, it takes a finite time to deploy it.  That time is probably best called “time to revenue” or “provisioning delay”, but some are smearing the agility/velocity label over this process.  The problem is that reducing time-to-revenue has an impact only on services newly ordered or changed.  In addition, our surveys of buyers consistently showed that most enterprise buyers actually have more advance notice of a service need than even current operator provisioning delays would require.  How useful is it to be able to turn up a service on 24 hours’ notice when the buyer has had months to plan the real estate, staffing, utilities, etc.?

The big lesson to be learned, in my view, is that “service agility” is a lot more than “network agility”.  Most of the processes related to bringing new services to market can’t be impacted much by changes in the network, particularly in changes to only part of the network as “classic NFV” would propose.  We are proposing to take a big step toward agile service deployment and management, but we have to be sure that it’s big enough.

We also have to be sure that measures designed to let network operators “compete with OTTs” don’t get out of hand.  OTTs have one or both of two characteristics; their revenues come from ads rather than from service payments, and their delivery mechanism is a zero-marginal-cost pipe provided by somebody else.  The global adspend wouldn’t begin to cover network operator revenues even if it all went to online advertising, so the operators actually have an advantage over the OTTs—they sell stuff consumers pay for, bypassing the issues of indirect revenues.  Their disadvantage is that they have to sustain that delivery pipe, and that means making it at least marginally profitable no matter what goes on above.

That’s what complicates the issue of service agility for operators, and for SDN or NFV or even the cloud.  You have to tie services to networks in an explicit way, to make the network valuable at the same time that you shift the focus of what is being purchased by the buyer to things at a higher level.  Right now, we’re just dabbling with the issues and we have to do better.

Is Ciena’s Agility Matrix Agile Enough?

NFV, as I’ve said before in blogs, is a combination of three things—the MANO platform that orchestrates and runs services, the NFV Infrastructure on which stuff is run/hosted, and the VNFs that provide the functionality.  You need all of them to have “NFV” and it’s not always clear just where any of them will come from, what exactly will be provided, or what the price will be.  Uncertainty is an enemy of investment, so that could inhibit NFV deployment.

VNFs have been a particular problem.  Many of the network functions that are targets for early virtualization are currently offered as appliances, and the vendors of these appliances aren’t anxious to trash their own revenues and profits to help the operators save money.  One issue that’s come up already is the fact that many VNF providers want to adopt a “capital license” model for distribution.  This would mean that the operator pays for a license up front, much as it pays for an appliance.  It’s easy to see how this suits a vendor.

From the perspective of the network operator, the problem with this is that it’s dangerously close to being benefit-neutral and at the same time risk-generating.  The VNF licensing charges, according to at least some operators, are close to the price of the box the VNF replaces; certainly the combined cost of the license and the servers needed for hosting comes very close.  This, at a time when it’s not certain just how much it will cost to operationalize VNFs, how much they might impact customer SLAs, or even how efficient the hosted resource pool will be.

Ciena has a proposed solution for operators in its Agility Matrix, a kind of combination of VNF platform and partnership program.  VNF providers put their offerings in a catalog which becomes the foundation for the creation of NFV services.  The VNFs are orchestrated into services when ordered, and the usage of the VNFs is metered to establish charges paid by the operator.  What this does is create what Ciena calls a “pay as you earn” model, eliminating VNF licensing fees.

There is no question that Agility Matrix addresses a problem, which is the combination of “first risk” and “first cost” that accompanies any new service.  The question is whether operators will find this approach compelling, not so much in the short term (all that “first” stuff) but in the longer term.  That may be complicated.

The first point is that Ciena doesn’t propose to host the VNFs themselves, but to use carrier resources to host and connect.  NFVI, in short, is still the operator’s, so the operator will still have to deploy resources to offer services in those “first” days.  That means that some cost and risk are not going to be displaced by Agility Matrix.  However, most operators would probably run screaming from a vendor proposal to host VNFs—to provide what would be essentially a “SaaS” framework of VNFs for operators to integrate—because operators would fear the higher cost of hosting and the commitment to a third party.

The second point is the utility of having VNF choices.  Obviously not all VNFs will be in the catalog.  It’s also true that many operators already know who they want their VNF partners to be and are already in relationships with them, either for CPE or in some cases for hosted elements.  The biggest value of Agility Matrix comes when the operator is flexible enough to grab functionality from the catalog for most of their VNF needs.  If the VNF they want is already available to them, or isn’t in the catalog, then they have to go outside Agility Matrix for their solution, and every such step makes the concept less useful.

The third point is that network operators want an exit strategy from these pay-as-you-go systems since they perceive that in most cases their risk will decline as their customer volume mounts, and their own leverage with VNF vendors to negotiate license charges will increase.  While the fact that Ciena’s not trying to take over hosting, only licensing, makes things easier, Agility Matrix doesn’t so far present an option to shift to a licensed approach down the line.  The operator could work through the process of taking VNF control in-house on their own (there are no contractual lock-ins), but it might create service disruptions and would likely involve a change in service-building and management.  Perpetual pay-as-you-go is a risk; Alcatel-Lucent had an Open API Service designed to build a cross-provider development framework by charging a small usage fee, and it wasn’t successful.
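
The economics behind that exit-strategy concern are easy to sketch.  Here’s a back-of-the-envelope comparison, with every number purely hypothetical, showing why a successful service eventually makes a negotiated license cheaper than perpetual usage fees.

```python
# Hypothetical numbers only: compare a one-time VNF license against cumulative
# pay-as-you-earn charges over a planning horizon.

upfront_license = 100_000.0      # assumed negotiated one-time license fee
usage_fee_per_customer = 25.0    # assumed monthly per-customer usage charge
months = 24                      # planning horizon

def cumulative_usage_cost(customers_by_month):
    return sum(c * usage_fee_per_customer for c in customers_by_month)

slow_ramp = [10 * m for m in range(1, months + 1)]    # adds 10 customers/month
fast_ramp = [100 * m for m in range(1, months + 1)]   # adds 100 customers/month

for label, ramp in (("slow ramp", slow_ramp), ("fast ramp", fast_ramp)):
    usage = cumulative_usage_cost(ramp)
    print(f"{label}: usage-based cost {usage:,.0f} vs. up-front license {upfront_license:,.0f}")
```

On the slow ramp, pay-as-you-earn wins; on the fast one, the operator is soon paying several times the license price, which is exactly when they’d want that in-house option.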

The fourth point is the onboarding process.  What Ciena is offering is a VNF framework to be bound into an operator’s NFV deployment platform and NFVI.  It’s certainly true that Ciena can offer a single approach to onboarding and even to management—which Agility Matrix promises through its own Director tool.  We don’t at this point know how many different MANO platforms there will be and what the onboarding requirements for each will look like.  Yes, Ciena’s Director element provides ETSI MANO functionality, but I’ve questioned whether this is sufficient for orchestration.  If it’s not, then it’s not clear how the additional features (primarily related to management, IMHO) would be integrated.  And even if Director is solid, additional MANO/NFV tools may be pulled into the operator because some VNFs from network vendors may not be available in any way except by license to the operator and deployment and management by the network vendor’s own platform.  For Ciena and the operator alike, this could generate some complexity in onboarding.

The final point is what I’ll call “brand connection.”  Who do you think of when you think of an NFV infrastructure?  Probably not Ciena.  Network operators in my spring survey didn’t even mention them as a candidate.  That doesn’t mean that Ciena couldn’t be a credible supplier of NFV platforms and VNF catalogs, but it does mean that a lot of other vendors are going to have their opportunity to push their wares as well, many before Ciena gets to bat.

The reason Ciena isn’t a strong brand in the NFV platform space is that it’s not clear what role Ciena’s own gear plays in the NFV world.  There is a linkage between the Agility Matrix and Ciena’s network equipment, but I think the link could be stronger and more compelling if Ciena outlined just how you’d build NFV based largely on agile optics and electrical grooming.  As I said in my Monday blog, vendors like Ciena are potentially in the cat-bird seat with respect to controlling the outcome of network evolution.  They could exploit this position with a good NFV approach, but such an approach would have to be more along the line of program plus product.  Operators should be able to use a pay-as-you-earn program as an on-ramp where they need it.

Agility Matrix is a useful concept.  Tier Two and Three operators might find it especially compelling and might even want Ciena to partner with some cloud providers to host stuff.  Even Tier Ones would see this as a way to control early cost and risk.  However, right now operators see NFV as the framework for all their future higher-level services.  They want their NFV provider to be helpful but not intrusive, and I think Ciena could do more to fulfill these two attributes.  They should try, because the basic idea is sound.

OSI Layers, Policy Control, Orchestration, and NGN

If you look at any model of network evolution, including the one I presented for 2020 yesterday in my blog, you find that it involves a shifting of roles between the familiar layers of the OSI model, perhaps even the elimination of certain layers.  That begs the question of how these new layers would cooperate with each other, and that has generated some market developments, like the work to apply OpenFlow to optical connections.  Is that the right answer?  Even the only one?  No, to the second, and maybe to the first as well.

Layered protocols are a form of abstraction.  A given layer consumes the services of the layers below and presents its own service to the layer above.  By doing so, it isolates that higher layer from the details of what’s underneath.  There is a well-known “interface” between the layers through which that service advertising and consumption takes place, and that becomes the input/output to the familiar “black box” or abstraction.

Familiar, from the notion of virtualization.  I think the most important truth about network evolution is that virtualization has codified the notion of abstraction and instantiation as a part of the future of the network.  The first question we should ask ourselves is whether we are supporting the principles of the “old” abstraction, the OSI model, and the “new” abstractions represented by SDN and NFV, with our multi-layer and layer evolution strategies.  The second is “how?”

Let’s assume we have deployed my stylized future network: foundation agile optics, plus electrical SDN grooming, plus an SDN overlay for connection management.  We have three layers here, only the top of which represents services for user consumption.  How would this structure work, and how would it be controlled?

When a user needs connection services, the user would place an order.  The order, processed by the provider, would identify the locations at which the service was to be offered and the characteristics of the service—functional and in terms of SLA.  This service order process could then result in service-level orchestration of the elements needed to fulfill the request.  Since my presumptive 2020 model is based on software/SDN at the top, there is a need to marshal SDN behaviors to do the job.

Suppose this service needs transport between Metro A and D for part of its topology.  Logically the service process would attempt to create this at the high level, and if that could not be done would somehow push the request down to the next level—the electrical grooming.  Can I groom some capacity from an optical A/D pipe?  If not, then I have to push the request down to the optical level and ask for some grooming there.  It’s this “if-I-can’t-do-it-push-down” process that we have to consider.

One approach we could take here is to presume central control of all layers from common logic.  In that case, a controller has complete cross-layer understanding of the network, and when the service request is processed that layer “knows” how to coordinate each of the layers.  It does so, and that creates the resource commitments needed.

A second approach is to assume cross-layer abstraction and control.  Here, each layer is a black box to the layers below, with each layer controlled by its own logic.  A layer offers services to the higher-layer partner, and takes service requests from that partner, so our service model says that the connection layer would “ask” for electrical grooming from SDN if it didn’t have pipes, and SDN in turn would ask for optical grooming.
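
Here’s a minimal sketch of that second, per-layer model; the class, layer names, and capacity numbers are illustrative placeholders, not a proposal for an actual control protocol.

```python
# The "if-I-can't-do-it-push-down" pattern: each layer is a black box that
# either satisfies a request from its own resources or asks the layer below.

class Layer:
    def __init__(self, name, capacity_gbps, lower=None):
        self.name = name
        self.capacity = capacity_gbps   # spare capacity this layer can groom itself
        self.lower = lower              # the layer it can push requests down to

    def request(self, a, b, gbps):
        if self.capacity >= gbps:
            self.capacity -= gbps
            return f"{self.name}: groomed {gbps}G {a}->{b}"
        if self.lower is None:
            raise RuntimeError(f"{self.name}: no capacity and nothing below")
        # Push the request down; the answer from below becomes this layer's pipe.
        provisioned = self.lower.request(a, b, gbps)
        return f"{self.name}: fulfilled via [{provisioned}]"

optics     = Layer("agile-optics", capacity_gbps=400)
grooming   = Layer("sdn-electrical-grooming", capacity_gbps=0, lower=optics)
connection = Layer("connection-overlay", capacity_gbps=0, lower=grooming)

print(connection.request("MetroA", "MetroD", 10))
```

The central-control alternative would collapse those three objects into one controller with a cross-layer view, but either way the layers need not speak the same control protocol.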

I think that a glance at these classic choices shows something important, which is that whether we presume we have central control of all the layers or that the layers are independently controlled, there is no reason to presume that the layers have to be controlled the same way, with the same protocol.  The whole notion of adapting OpenFlow to optics, then (and in my view), is a waste of time.  Any control mechanism that lets layer services be made to conform to the request of the layer above works fine.

Is there a preferred approach, though?  Would central control or per-layer control be better?  That question depends a lot on how you see things developing, and I’m not sure we can pick the “best” option at this point.  However, I think it is clear that there are concerns about the scalability and availability of controllers in SDN, concerns that lead to the conclusion that it would be helpful to think of SDN networks as federations of control zones.  Controllers, federated by cross-domain processes/APIs, would have to organize services that spread out geographically and thus involve multiple controllers.  In this model, it wouldn’t make much sense to concentrate multi-layer networking in a single controller.  In fact, given that connection networks, electrical SDN grooming, and agile optics would all likely have different geographical scopes, that kind of combination might be really hard to organize.

So here’s my basic conclusion: network services in the future would be built by organizing services across both horizontal federations of controllers and down through vertical federations representing the layers of network protocol/technology.  You can do this in three ways: policy-linked structures, domain federation requests, and orchestration.

The policy approach says that every controller has policies that align its handling of requests from its users.  It enforces these policies within its domain, offering what are effectively abstract services to higher-level users.  These policies administer a pool of resources used for fulfillment, and each layer expects the layer below to be able to handle requests within the policy boundaries it’s been given.  There is no explicit need to communicate between layers, or controllers.  If specific service quality is needed, the policies needed to support it can be exchanged by the layers.

The domain federation request approach says that when Layer “A” runs out of resources, it knows what it needs and asks some combination of lower layer controllers to provide it—say “B” and “C”.  The responsibility to secure resources from below is thus explicit and if the lower layer can’t do it, it sends a message upward.  All of this has to be handled via an explicit message flow across the federated-controller boundary, horizontally or vertically.

The orchestration model says that the responsibility for creating a service doesn’t lie in any layer at all, but in an external process (which, for example, NFV would call “MANO”).  The service request from the user invokes an orchestration process that commits resources.  This process can “see” across layers and commit the resources where and when needed.  The continuity of the service and the cooperative behavior of the layers or controller domains is guaranteed by the orchestration and not by interaction among the domains.  It is not “presumptive” as it would be in a pure-policy model.
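
To contrast with the per-layer push-down sketch above, here’s an equally hypothetical sketch of the orchestration model, where an external process (what NFV would call MANO) commits resources across domains itself.

```python
# Sketch only: cross-layer continuity is the orchestrator's responsibility,
# not the result of interactions among the domains.

class Domain:
    def __init__(self, name, capacity_gbps):
        self.name, self.capacity = name, capacity_gbps

    def commit(self, gbps):
        if self.capacity < gbps:
            raise RuntimeError(f"{self.name}: insufficient capacity")
        self.capacity -= gbps
        return f"{self.name}: committed {gbps}G"

def orchestrate(request, domains):
    """The orchestrator 'sees' every layer and zone and commits each one directly."""
    return [domains[d].commit(request["gbps"]) for d in request["path"]]

domains = {
    "optics-east":        Domain("optics-east", 400),
    "grooming-east":      Domain("grooming-east", 100),
    "connection-overlay": Domain("connection-overlay", 100),
}

print(orchestrate({"path": ["optics-east", "grooming-east", "connection-overlay"],
                   "gbps": 10}, domains))
```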

Multiple mechanisms could be applied here; it’s not necessary to pick just one.  The optical layer might, for example, groom capacity to given metro areas based on a policy to maintain overall capacity at 150% of demand.  Adjacent electrical SDN grooming zones might exchange controller federation requests to build services across their boundaries, and the user’s connection layer might be managed as a policy-based pool of resources for best-effort and an orchestrated pool for provisioned services.

None of this requires unanimity in terms of control mechanisms, and I think that demands for that property have the effect of making a migration to a new model more complicated and expensive.  If we can control optics and SDN and connections, and if we can harmonize their commitment horizontally and vertically, we have “SDN”.  If we can orchestrate it we have “NFV”.  Maybe it’s time to stop gilding unnecessary lilies and work on the mechanisms to create and sustain this sort of structure.

Is There a Radical Shift in Networking on Tap?

In 2020, what will “the network” or “the Internet” look like, in terms of infrastructure?  I’ve gotten that question a lot lately, as well as the question of whether it’s SDN or NFV or the cloud, or mobility or maybe content or even Cisco, that’s driving the evolution.  Of course, people also wonder who will win, either in terms of sectors of vendors or specific ones.  Nobody can consistently predict the future, but I do think there are signs we can read to give us a shot at answering some of these questions.

The key point to start with is that network change is driven by investment change, which is driven by ROI.  Technology changes happen because there’s a financial reason to drive them.  We will deploy what’s profitable, and in the end that will create new infrastructure because profit trends are better sustained by new technology choices.  The profits can be created by raising revenues, lowering costs, or (almost certainly) a mixture of both at once.

For this discussion, I want to focus on the network, meaning the elements of infrastructure that actually carry traffic.  Networks in the classic OSI sense have three layers (4-7 are in the endpoints not the networks).  Of these three, bits are created in the bottom Physical Layer (Level 1) and are steered around to create connectivity by the other two layers.  If we look at the extreme case, we can say that there’s nothing much that can be done to replace the physical layer—virtual bits don’t make much sense.  Everything above that is up for grabs.

So let’s grab a little.  Suppose we create an Optical Foundation Network using agile optical principles.  Every traffic concentration point, which includes central and tandem offices, service points of presence, mobile SGW/PGW points, and cloud or NFV data centers, would be a hub of optical connectivity.  Cheap bit paths, in short.  Suppose then that we have SDN-based electrical-layer grooming in each of these locations, and that from that we have groomed tunnel meshing of all the edge locations that don’t justify optical hubbing.  This is our new Level 1, a network that provides “site connectivity” and rich optical capacity, and that is self-healing so that connection issues are resolved here and don’t appear above.

Then, above this, we have the Virtual Layer.  In this layer we build application-, customer-, and service-specific virtual subnetworks employing something like overlay SDN technology and virtual switching/routing.  We have instances of Level 2/3 technology hosted as software on cloud servers.  This layer feeds standard L2/3 interfaces to applications and user service access points.  This is where user and application connectivity lives, exploiting the transport optics below.

It’s pretty easy to see how this transforms things.  For residential/consumer services we are really aggregating to service access points, where we would need something like a BRAS (let’s call it an “SPAS”, for “Service Point Access Server”) to provide tunnel-to-service linkage.  VPN and business services in general could map to this model very easily by simply having a virtual switch/router instance set dedicated to the service, with the virtual instances hosted where traffic patterns dictate.

The business side of this is easy to conceptualize, as I’ve said, but some may still think that “the Internet” needs more.  Can we support the enormous dynamism of the Internet with a model like this?  To address the Internet we can exploit two trends—“metrofication” and “personal agency”.

Even today, about 80% of profitable Internet traffic stays inside a metro area.  Thus, most of the Internet traffic would be handled as service point of presence content caching or cloud traffic.  All we have to do is make a resource addressable and get a user tunnel to the on-ramp “SPAS”.  For the stuff that’s not fulfilled locally (my model says that only about 15% of traffic will fit that model) we’d have an Internet SPAS and a tunnel-and-optics model of interconnect of the major metro areas.

It’s also already clear that as mobile devices take over from desktops or laptops and as M2M evolves, we’re entering an age where users don’t interact with information resources any longer.  They ask an agent in the cloud to get answers for them.  This means that “the Internet” evolves to a structure where a bunch of information and analysis servers in the cloud form the core of a virtual information resource, whose edge is the personal agents.  User conversations are always with their agents, which means that the only traffic between “the Internet” and the users is their requests and responses.  That, in turn, means that Internet traffic is now internalized where it can easily be mapped to our optical foundation.

Our hypothetical model of network evolution leads us to a future dominated by agile optics, SDN in its pure OpenFlow or overlay forms, and virtual switching/routing.  Traditional devices at Levels 2 and 3 would be largely displaced by virtual behaviors except within the data centers, which means that data center switching might be all that’s left of traditional electrical-layer networking by 2020.  Obviously this is something Cisco might not like much (or Juniper or Alcatel-Lucent or the other big-iron players).

While this would be a seismic shift in network equipment, it might or might not result in a seismic shift in the competitive landscape.  It depends on how our future architecture comes about—will networks drive the change or will servers drive it?  Is this about SDN or cloud/NFV?

Software is a common element in SDN and NFV, and of course the software giants could decide to tune their cloud offerings to serve in both.  IBM has good cloud credentials and so do Microsoft and Oracle, but they share two challenges.  One is that operators really want as much open source code in both SDN and NFV as possible to protect against being led to siloed SDN/NFV implementations by clever vendor tactics.  The other is that both SDN and NFV are going to be educational sells, with a lot of hand-holding.  There’s just not enough software money on the table to make that attractive to the vendors, at least at first.  Software players will need alliances with vendors with more skin in the game in order to stay the course and win.

A network-driven change, one where agile optics and SDN are combined in an intelligent way, could give major network equipment providers a chance.  We have minimal investment in agile optics and virtually none in electrical/SDN grooming today, so both spaces are up for grabs.  The brass ring here would be easiest to access for vendors with strong optical positions already.  Among the giants, Alcatel-Lucent would be well-positioned, and Ciena among the second tier.  However, even Cisco could gain some leverage if they were aggressive enough in supporting the evolution “downward” from traditional L2/3 to SDN.

Any transition to hosted virtual networking is going to generate a lot of server sales, and this may be the compelling truth about who wins.  While, as I’ve said, management and orchestration will make the business case for either an SDN or NFV revolution, or even for the cloud, the big bucks will come from selling the servers.  Cisco is a double threat because they have networking and IT.  HP would be the more logical favorite here, though, because the evolution of the network I’ve described would be all positive for them where Cisco would clearly see it as robbing next-quarter Peter to pay 2020 Paul in terms of revenue and profits.  HP could leverage both its cloud positioning and NFV positioning easily in such a transformation of networking, and it could also grab some of the virtual layer with its SDN.  Brocade, a player in data center networking and virtual routing at the same time, would also have a shot at greatness.

Whoever wins in my evolved model, they’ll play in a world where both networking and IT are very different.  We’re heading for a fusion of the cloud and networking in all respects, driven not by the arguable and variable forces of technology but by the relentless drive of economics.

With all these choices of drivers and vendors, it’s convenient to think we might muddle along without bringing this new model about.  That’s possible, of course, but I think that my opening view is ultimately going to prove correct.  Networks are built by ROI not by abstract technology evolution, and there is simply no way to make an industry that’s dependent on consumer experience delivery for its primary growth into an industry that relies on higher bandwidth pricing.  You have to get to the top of the food chain to sustain profits, both as a provider and as a vendor.  And if I’m right, the positions of all the players will have been set by the inevitable capital inertia of evolving infrastructure by 2018, so we’ll soon know who will be the giant in the network of the future.

What the Heck is “Carrier Grade?”

One of the interesting issues that I encountered at the HP Discover 2014 event this week was that of “carrier grade”, and I even had someone make a related comment on a prior blog of mine.  For ages, people have talked about how important it was to be “carrier grade” and offer “five-nines” reliability.  NFV certainly has to support the standard for reliability, and so does SDN, but do we know what that standard is?

There are two factors that influence carrier requirements for reliability.  One is the service-level agreement offered for the service (explicit or implicit) and the other is the operational cost of an outage.  You don’t want SLA violations because they hurt your churn rate and often cost you money in reparations, and you don’t want failures that drive up opex.  So the question is how to achieve enough availability to suit these two requirements.

In the SLA area, we inherited the notion of five-nines from the old days of TDM.  In TDM networks, operators measured “significantly errored seconds” and “error-free seconds” and corporate SLAs were stated in these terms.  Clearly this micro-managed SLA notion was going to create major reliability concerns, and if you’re writing SLAs with one-second granularity you can’t take the time to fail over to another device or path if something breaks.

Just try to buy an SLA with second-level granularity in VPNs or Ethernet.  In packet services of all types, we rely on what I’ll call “statistical SLAs” and not on highly granular ones.  A statistical SLA says that any event has some probability, including an outage, but that probability is low over time.  You write an SLA in order to reduce the violation rate, partly by managing availability but also partly by managing the granularity.  Packet SLAs usually measure outages over a long period—a week or a month.
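
The arithmetic of measurement windows is worth seeing once; the figures below are just the standard availability math, not any particular operator’s SLA terms.

```python
# How much outage a given availability "allows" depends on the window you
# measure it over; that's what makes packet SLAs statistical.

WINDOWS = {"month": 30 * 24 * 3600, "year": 365 * 24 * 3600}   # seconds

def allowed_outage_seconds(availability, window_seconds):
    return (1.0 - availability) * window_seconds

for label, availability in (("three nines", 0.999), ("five nines", 0.99999)):
    month = allowed_outage_seconds(availability, WINDOWS["month"])
    year = allowed_outage_seconds(availability, WINDOWS["year"])
    print(f"{label}: about {month:.0f}s of outage per month, {year / 60:.1f} minutes per year")
```

A five-nines budget of roughly 26 seconds a month is something you can engineer against a monthly SLA window; against one-second granularity it’s an entirely different problem.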

It’s not totally about contractual stuff, though.  Anyone who uses a mobile phone knows that five nines is a joke; I don’t think I get one nine myself and I had major issues in hearing the other party in the last phone call I made (yesterday).  I also have regular issues with voice services over IP even with wireline access; maybe I have a nine or two there.  The Internet?  Forget it, nines-wise.  The fact is that we have accepted “best-efforts” services with no SLA at all for most of our communications usage.

So here we come to what could be called “the progress of the mythology” of SLAs.  We say “Five nines is crap; we don’t need highly available devices at all.  What we do is to fail over.”  That isn’t necessarily true either, for three reasons.

Reason one is that most alternate-path or alternate-device responses to failure will in fact break the service connection for at least some period of time.  If a packet connection breaks because of the loss of a device, packets in the flow are dropped and there is often a period of time when no path exists at all (before adaptive recovery finds a new one) during which more packets are dropped.  The point is that while we may accept this level of impairment, we cannot make a service five-nines in most cases through redundancy unless we have essentially hot standby.

Which brings us to reason number two.  The process of making something fail over fast enough to create a reasonable alternative to not failing at all involves both redundancy of facilities and agility of operations response.  Neither of these are free.  With NFV, we have an expectation that benefits will come from capex reduction, opex reduction, and service agility.  Two of the three benefits are impacted by infrastructure changes intended to improve availability.  So what we’re saying is that at some point, the cost of making something resilient is higher than making it reliable.  That’s more likely to happen as overall network complexity increases, and NFV’s substitution of chained resources for simple boxes creates more complexity in itself.

The third reason is that shared resources multiply problems as fast as they multiply users.  One device failing in the old days creates one SLA violation.  A device supporting a thousand streams of service might fault a thousand services.  The process of recovery for all thousand services is unlikely to be fast enough and effective enough to resolve all SLA problems, and in fact finding new resources for them all may spread the issue toward all the service endpoints, which can create a storm of management interventions that further add to cost and complexity, and reduce those nines.

My point here isn’t that NFV is a bad idea, that it’s not reliable.  What I’m saying is that all this talk about how many nines make a carrier grade is kind of useless.  We left absolutes behind when we moved to IP infrastructure from TDM.  We have to manage everything now, including availability.  Services have SLAs, and we have to guarantee SLAs within the economic framework set by the acceptable price of the service.  How reliable is NFV?  As reliable as users are willing to pay for it to be, and as carriers are willing to dip into profit margins to make it.  There is no standard to hold to here other than the standard of the market place—overall utility.

So what about servers?  First, there’s a myth that carrier-grade means NEBS-compliant.  NEBS was about power and RFI, not about availability.  You can have a piece of NEBS gear that needs a live-in tech.  Second, you can never make a box that isn’t five-nines into a virtual box that is five-nines by combining boxes.  The combined risk of two essential devices with the same reliability requirements is higher than the risk either of them poses alone—read the combinatory rules for MTBF and you’ll see what I mean.
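
Those combinatory rules reduce to a couple of lines of arithmetic; the numbers here are illustrative, and the parallel case assumes instantaneous, perfect failover, which real systems don’t achieve.

```python
# Availability combination in rough numbers.

def series(*avail):      # every element must be up for the service to be up
    a = 1.0
    for x in avail:
        a *= x
    return a

def parallel(*avail):    # the service survives if any one element is up
    all_down = 1.0
    for x in avail:
        all_down *= (1.0 - x)
    return 1.0 - all_down

box = 0.99999                        # a single "five nines" element
print(series(box, box))              # ~0.99998: chaining two of them loses ground
print(parallel(box, box))            # ~0.9999999999: redundancy gains, if failover were free
```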

Servers that will support stringent SLAs in NFV will have to have higher availability than those that don’t, or to be more accurate it will be more cost-effective to support SLAs through server reliability measures than by operational measures as the SLAs become more stringent.  Servers that will support a lot of tenants need to be more available too, because the multi-tenancy multiplies the overall number of SLAs at risk and complicates recovery.  So while every server vendor who wants to populate NFV infrastructure may not need “carrier grade” technology, you can be darn sure that every operator who deploys NFV is going to have to deploy some of it via carrier-grade servers.

A Deeper Dive into HP’s OpenNFV

I’ve blogged before about HP’s OpenNFV strategy, important IMHO because it’s not only from perhaps the most inherently credible of all possible NFV sources but also the functionally most complete—at least in terms of planned features.  I’ve attended the HP Discover 2014 event in Barcelona this week, spoken to all the key HP NFV people, and I have a better understanding of OpenNFV than before.  I’d like to share it with you, and also take a moment to thank all of you who came up to me at various points to say they’d enjoyed my blog.  It’s great to get two-way feedback, and much appreciated!

I think what has set HP apart from the start is the fact that the company has its head in the clouds, NFV-wise.  The head of the HP NFV program is Saar Gillai, also SVP/COO of HP Cloud, and everyone involved in OpenNFV knows that it’s an application of cloud principles to the hosting of network functions.  You all know that from the first I’ve said that the cloud has to be the root and heart of NFV, and HP clearly believes that.

At a high level, OpenNFV is a cloud project, an open initiative designed to build an ecosystem in the very same way you’d expect a cloud activity to work.  HP proposes to provide a platform, roughly corresponding to the ETSI NFV framework at one level but extending it as well.  This platform is then a host to partners who extend its functionality in two primary directions—the VNFs and NFVI.

HP has a strong belief that VNFs have to be open, meaning that an NFV platform can’t be tuned to support only its own vendor’s VNFs and thus create a kind of NFV silo.  They’re signing up partners to contribute VNFs and VNF management tools, tools that don’t require that the VNF code be customized to be on-boarded.  Given that you don’t get much business value from NFV without virtual functions, that’s an important step.

In the world of NFV Infrastructure, HP’s partners bring deployment and management tools that broaden what VNFs can be run on.  An example I found particularly interesting was Genband’s “Metal-as-a-Service” concept.  This creates a deployment framework for VNFs that can deploy on VMs but also on dedicated servers.  In either case, the same horizontal scaling and adaptive replacement of hosts is provided.  This is smart because, as I’ve pointed out before, many proposed applications of NFV are already multi-tenant in nature and would normally not run in a VM at all.  IMS and perhaps parts of EPC come to mind.

The NFVI dimension is also interesting, because HP itself also extends the NFVI piece, and in perhaps the most critical dimension—to legacy network infrastructure.

The OpenNFV Director, which is HP’s orchestrator and management framework, sprang from work HP did in the OSS/BSS space to extend service activation functionality to the control of legacy network devices.  Because Director inherited this capability, HP is providing what is in effect a parallel “Infrastructure Manager” to the “Virtual Infrastructure Manager” mandated by ETSI.

IM/VIM elements are critical to NFV because they’re the link between service logic models and infrastructure, the boundary between what I’ve called “functional” and “structural” orchestration.  With HP’s approach, service functions exposed by networks can take the form of hosted functionality (deployed through OpenStack, in the HP model) or embedded features of legacy devices and networks accessed through NMS APIs.

The reason that this legacy integration is critical was exposed in a number of the panel sessions on the NFV track of the conference.  The issue in a nutshell is that NFV’s most credible benefits—operations efficiency improvements and service agility enhancements to advance operators into new service areas for revenue augmentation—are inherently ecosystemic in nature.  You can’t make part of a service efficient or agile and gain the full benefits you’re targeting.  In fact, since NFV is likely to initially roll out over a small piece of the total infrastructure pie, you may not even see any local benefits to NFV in these areas until the deployment of NFV is well along.  That begs the question of how you prove the benefits of NFV in those early “not-enough-to-notice” phases to justify that larger roll-out in the first place.

I’ve pointed out that in my own surveys operators were unanimous in their view that they could not at this point make an NFV business case, nor could they see how any of their current PoCs would allow them to do that.  A Current Analysis survey showed similar results; on a scale where 5.0 is “full agreement”, “Concerned about NFV ROI” ranked about 4.4 with operators.

The net result of the HP extension of VIM to IM is that you can apply the OpenNFV Director to all of the network, not just the NFV part.  That would mean that operators can immediately spread the service automation that NFV demands to the whole of a service.  It also means that you could in theory “orchestrate” networks that had no NFV at all.

I’ve blogged before that MANO principles were so important to networking as it is today that they’d arguably create a massive benefit even if NFV deployment never happened.  That’s still true, but the goal would not be to eliminate NFV but to support an on-ramp to NFV’s benefit expressway.

Let’s say an operator plans to offer enhanced higher-layer services to augment their revenue stream in Carrier Ethernet.  They would normally sell users a box or boxes to provide edge functions like NAT, VPN on-ramp, firewall, virus scanning, and even extended management of site networks or facilities management and control.  With the OpenNFV approach, this operator could roll out the service using legacy facilities where they are in place or convenient, but immediately manage them using the automated processes that NFV would normally provide.  This early “native box” deployment could evolve to a distributed edge-hosted version of NFV where VNFs are loaded as needed into CPE.  That could in turn give way to VNFs hosted in the cloud.

This progression would let operators set operating practices and tools immediately and uniformly for an evolving set of service resources, deploying NFV where and when it’s most valuable.  That contains operations impact in the early stages, reduces first costs, and eliminates the need to strand assets just to homogenize infrastructure, unify operating practices, and secure optimum opex benefits.

All of this demands robust service modeling, and in a number of panel discussions and conversations HP articulated its commitment to that functional/structural modeling process.  At the functional level, a level most NFV vendors still won’t accept, HP proposes to use a flexible XML framework to describe services and their management connections.  This would create a hierarchy of service elements similar to the one the TMF defined in its notion of “customer- and resource-facing” services.  The “customer-facing” part would be functionally modeled, and those models decomposed on deployment to map to structurally orchestrated deployments—either of legacy network features or VNFs.
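To illustrate the shape of that kind of model, here’s a small Python stand-in for the XML hierarchy; the service name, function list, and decomposition rules are invented for the example and don’t reflect HP’s actual schema.

    # An illustrative data structure (not HP's actual XML schema) showing the
    # TMF-style split: a customer-facing "functional" model that decomposes,
    # at deployment time, into structural pieces realized on legacy gear or as VNFs.
    service_model = {
        "service": "business-internet-gold",          # customer-facing name
        "functions": [
            {"name": "access",   "realize_as": "legacy", "device": "ce-switch"},
            {"name": "firewall", "realize_as": "vnf",    "image": "vFirewall"},
            {"name": "nat",      "realize_as": "vnf",    "image": "vNAT"},
        ],
    }

    def decompose(model: dict) -> list:
        """Turn the functional model into structural deployment steps."""
        steps = []
        for f in model["functions"]:
            if f["realize_as"] == "vnf":
                steps.append(f"host {f['image']} in the cloud for '{f['name']}'")
            else:
                steps.append(f"activate '{f['name']}' on existing {f['device']}")
        return steps

    for step in decompose(service_model):
        print(step)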

HP believes that a central management tool beats having management deployed ad hoc by including it in or binding it to VNFs.  I agree, but I’d like to convince them that this central tool should be logically central but physically distributed—linked to the service data model as a state/event-activated process set.  I’d also like to see them evolve from their initial position of presenting VNF features as virtual devices to make them compatible with legacy management.  The end-game in my view is one where both legacy and VNF-based functionality is virtualized and presented in all the forms needed by the operator or service user.
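A tiny sketch of what “logically central but physically distributed” could look like follows; the states, events, and handlers are hypothetical, but the point is that management behavior lives in a state/event table bound to the service data model, so any copy of the process set can act on it consistently.

    # A minimal sketch of the "logically central, physically distributed" idea:
    # management processes are selected from a state/event table keyed on the
    # service data model, rather than hard-wired into any single manager.
    service = {"id": "svc-42", "state": "ordering"}

    def activate(svc, event):   svc["state"] = "active";   return "deploy resources"
    def repair(svc, event):     svc["state"] = "degraded"; return "rerun structural orchestration"
    def tear_down(svc, event):  svc["state"] = "retired";  return "release resources"

    STATE_EVENT_TABLE = {
        ("ordering", "activate"): activate,
        ("active",   "fault"):    repair,
        ("active",   "cancel"):   tear_down,
    }

    def handle(svc: dict, event: str) -> str:
        """Look up the process to run from the service's current state and the event."""
        return STATE_EVENT_TABLE[(svc["state"], event)](svc, event)

    print(handle(service, "activate"))   # ordering + activate -> deploy resources
    print(handle(service, "fault"))      # active + fault -> rerun structural orchestration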

HP is serious about this, and so are their partners.  I attended a partner reception, met many of this blog’s readers there, and observed and took part in a lot of NFV discussion.  All of this at a museum location with a stunning view of Barcelona and an overall ambience that hardly encouraged technical discourse.  Yet these partners were talking NFV—all the aspects, from technology to deployment models and business cases.  There were so many interesting discussions that I couldn’t even listen to half of them.  It’s engagement here, engagement of the truly committed.

If you look into any NFV deployment, no matter who provides the technology, you’re not going to see a network, you’ll see a cloud.  But if you look at NFV even in mature installations, you’re going to see it as part of a network.  HP has harmonized these critically different perspectives, and by accommodating both created a credible pathway to getting NFV out of the lab and into the business.  So far they’re the only vendor who can say that.

NFV: Where Do I Start?

I blogged a year ago that optimum implementation of NFV could create between 80,000 and 130,000 new data centers, making NFV potentially the largest source of new server and data center component installations.  There is little doubt that NFV could touch every aspect of virtually every service, so there are plenty of reasons for operators to plan for it.  The big issue in that planning, according to my own research, is just how to get from lab to field.  Testing NFV technology doesn’t exercise the business case.

So how can operators make the transition?  It’s not an easy question when you consider it in the general sense, because the answer depends on just how opportunity and infrastructure combine to create motivations and inhibitions.  One thing is clear, though; NFV has to develop proof points quickly or time will pass it by.  That means that what’s important about NFV may be more where to start than where it might take you in the long term.  I call this the “first-NFV” position.

There are three motives for NFV: capital equipment savings, operations efficiencies, and service agility enhancements.  There are two primary areas of consideration for carriers too.  One is the opportunity presented in their geography and the second is the nature of their current infrastructure.  Over time, I think it’s likely that all three motives will be used by operators, but if we stay with our first-NFV theme, we can see how some operators’ mixture of infrastructure and opportunity could make one of them a better starting point.

Let’s start with the two areas of consideration, and the primary one of opportunity.  Opportunities depend on both the customer base and the prospect pool.  Carriers operating in a mature geography whose prospects are themselves mature in their network and IT plans have a limited ability to exploit new revenue streams.  They’d want to focus on making what they can already do more profitable, meaning lowering both capex and opex.  Carriers who have significant upside in their primary geographies may want to focus first on new services, to avoid the risk that competitors would step in and sweep all the easy prospects off the table.

Most operators have only a limited notion of the opportunities presented in their own geographies.  When I’m asked to evaluate NFV business cases, the first thing I usually ask for is an opportunity analysis, and usually they have to develop one.  Without such an analysis it’s impossible to size NFV new-service benefits.

If opportunity presents the whip of change, then the anchor is current infrastructure.  Where infrastructure assets are fairly well depreciated, the carrier has the option of making changes quickly without facing major write-downs.  That opens the door to lower-cost solutions even for current problems, so capex reduction is an option.  It also lets operators optimize operations efficiencies by changing technologies as needed, and it ensures that new service opportunities can be properly supported as soon as they’re credible.  Where assets have a long residual depreciation period, any changes demanded by NFV may be difficult to justify because the prospective savings will be eroded by the write-down of current gear.

Into this mix we now introduce the three benefit drivers for NFV.  In most cases, a carrier’s current NFV project focus should be finding something that can prove out one or more of these three benefit drivers to the extent needed to get a project into field trials and onward into deployment.  That means addressing what NFV can do functionally while considering how doing it will impact costs (both long-term and first-cost) and operations practices.

Capital equipment savings through NFV will likely depend largely on the efficiencies of resource hosting.  Those efficiencies are easiest to demonstrate if you can justify a large resource pool, meaning that you’d like not to have to distribute stuff too far afield.  There seem to be two factors that promote a first-NFV strategy based on capex savings; cloud computing commitment and carrier Ethernet opportunity.

If you have a cloud data center you already have a resource pool to play with, and that will make even your early NFV deployments more capital-efficient.  Operators who have several cloud data centers in major metro areas are in even better shape because this will reduce the distances that intra-component connections have to travel.  Ideally your NFV deployments would start where you had cloud resource pools available, and that reduces first cost and risk as well.

The carrier Ethernet dimension relates to the use of service chaining to create virtual CPE to offer advanced services, including security and facility monitoring.  The best place to start these services is with the high-value customers, meaning those with big data pipes.  Carrier Ethernet CPE that can act as a service hosting element initially can be helpful if your accounts are scattered over a lot of central offices, but if you have concentrated customers you may be able to go directly to server hosting.
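For illustration, here’s a minimal sketch of a vCPE service chain with functions split between the customer’s Ethernet CPE and a cloud resource pool; the function names and hosting choices are invented for the example.

    # A simple sketch of service chaining for virtual CPE: an ordered list of
    # functions, each hosted on the customer's Ethernet CPE or in a metro data
    # center, stitched into one logical path.
    chain = [
        {"function": "facility-monitoring", "host": "cpe"},
        {"function": "firewall",            "host": "cloud"},
        {"function": "nat",                 "host": "cloud"},
    ]

    def render_chain(chain: list) -> str:
        """Produce the logical forwarding path for the chained functions."""
        hops = [f"{link['function']}@{link['host']}" for link in chain]
        return "customer -> " + " -> ".join(hops) + " -> network"

    print(render_chain(chain))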

It’s probably clear by this point that if you are a capex-driven first-NFV thinker, you’re really building an IaaS cloud and you should think of your NFV strategy as an IaaS strategy first and a network strategy second.  Get a lot of cloud-planner input here and you’re likely to have a more successful NFV launch.

Skipping to the last driver on our list, most operators find increased service agility a major justification for NFV.  “Service agility” means two things—getting functional elements deployed and integrated into cohesive service offerings, and finding the elements in the first place.  Service agility is really all about virtual network functions, in short.

The key question to ask NFV platform providers if agility is your driver is just how VNFs are on-boarded.  Some NFV strategies are highly parochial in the VNFs they’d run, which means that you may be either limited by the choices your vendor supports or forced to deploy “silo NFV”, meaning multiple NFV platforms, to ensure that you can get a rich supply of virtual functions to build services from.  That course of action is likely to complicate service creation and deployment considerably.
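One way to keep the on-boarding question concrete is a simple checklist applied to each candidate VNF.  The sketch below is hypothetical (the descriptor fields aren’t any vendor’s actual format), but it captures the questions that separate open on-boarding from silo NFV.

    # A hypothetical on-boarding checklist; the descriptor fields are
    # illustrative only, not any vendor's real packaging metadata.
    def onboarding_issues(vnf_descriptor: dict) -> list:
        issues = []
        if vnf_descriptor.get("requires_vendor_runtime"):
            issues.append("VNF only runs on its vendor's platform (silo risk)")
        if not vnf_descriptor.get("standard_packaging"):
            issues.append("no standard packaging; code changes needed to on-board")
        if not vnf_descriptor.get("management_interface"):
            issues.append("no declared management interface for lifecycle control")
        return issues or ["looks open enough to on-board as-is"]

    print(onboarding_issues({"requires_vendor_runtime": True, "standard_packaging": False}))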

But what functions?  An agility-driven carrier needs to be thinking first and foremost about the opportunity space.  Who do they have to sell to?  How will the prospects be reached?  How long will the sales cycle be?  Can you contain your geographic exposure by limiting the marketing scope, or will you have to be prepared to roll out a service anywhere an order comes in?  The intersection of what VNFs you can get and who you can sell services to will determine whether service agility is a real driver or one of those field-of-dreams things.

Ultimately, service agility is a form of software-as-a-service (SaaS).  It’s a partner and development process, one that will require that you carefully match targets to software in order to build early profits and gain management buy-in.  Logically, agile services should even include cloud-application components—VPNs and CRM together are a stronger offering than either would be separately.  Above all, it is not a place to go because you can’t really validate other drivers and you need something to justify NFV.  Service agility is the most challenging driver to realize, though it’s the one with the largest upside.

If you aspire to use NFV to increase operations efficiencies, then you need to be thinking of your NFV strategy more in terms of how it can harmonize and automate service management than in terms of what virtual functions you need or how you’ll build resource pools.  If opex is your thing, then you are really building an extension to OSS/BSS, a CIO-type activity.

NFV operations and management is the least mature and least described of all the NFV capabilities/features.  That means that if you’re after opex advantage you’ll need to do some real digging, not only into how NFV elements (VNFs and resources) are managed but also into how NFV element management is linked into overall management practices.

But first, you’ll need to assess the management scope of your NFV offerings.  NFV is easiest to manage where it makes up most of the service logic, where SLAs are not rigorous (best-efforts is nice), and where service-to-resource management correlation is not a strong requirement.  NFV management is also easier where resource-sharing is limited, so too many VNFs and too much interconnection make it more complicated.  Remember that opex and capex represent approximately equal costs, so if you introduce too much complication into VNF deployment to reduce capex you’re likely to raise operations costs more than you’ve saved.
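A back-of-envelope illustration of that trade-off, with invented numbers, makes the point:

    # A back-of-envelope illustration (numbers are invented for the example)
    # of the point above: if opex and capex are roughly equal, a design that
    # shaves capex but complicates operations can raise total cost.
    def total_cost(capex: float, opex: float) -> float:
        return capex + opex

    baseline    = total_cost(capex=100.0, opex=100.0)                # appliances, simple ops
    complex_nfv = total_cost(capex=100.0 * 0.8, opex=100.0 * 1.3)    # 20% capex saved, 30% opex added
    print(baseline, complex_nfv)   # 200.0 vs 210.0 -- the "saving" costs money overall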

So what’s the best approach?  Most operators will want to look at three service areas first.

First, business services that enhance Carrier Ethernet through the introduction of access-point services like security or facilities monitoring.  If these services are viewed as a combination of edge-hosted and cloud-hosted features, you can control first costs better, and business services offer higher revenues to improve early returns.  They’re also easier to manage using traditional virtual-device strategies.

Second, multi-tenant services for content delivery and mobility.  IMS and EPC can both benefit from the introduction of NFV (and SDN) technology, but here it’s important to look beyond a simple translation of service delivery platform services into hosted services.  We need to think about IMS and EPC differently to wring out the benefits we need.

Third, integrated cloud services.  This is in my view the big opportunity.  Users who want cloud computing want it delivered to the workers, not just whizzing about in the aether.  Integrating cloud services with VPN or Carrier Ethernet services not only moves the operator up the revenue chain, it moves infrastructure toward large efficient resource pools, making the capex reduction argument stronger.  It also encourages cloud-centric operations and management.  Win-win.

In 2015 we’re going to test NFV where it counts.  I think focusing tests on these three areas is the strategy most likely to pay off for NFV, and for each carrier thinking about it.