The New Vision, and Players, of CORD Could Change the Transformation Game

The news that Google has joined the CORD (“Central Office Re-Architected as a Data Center”) project, and that the project is now independent under the Linux Foundation, is good news for next-gen networks.  I’ve always thought CORD was important because it describes what operators would face in an infrastructure transformation—a massive shift in data center architecture.  CORD doesn’t have an easy path, though, and it’s hard to say at this point just how much influence it will really exercise.

CORD faces the “Does the Destination Define the Trip?” challenge.  Suppose you were a promoter of boats, and you traveled back in time to the days of dugout canoes to meet with the chief dugout architect.  Suppose you presented a plan for a jet boat as the logical end-game for the evolution of the dugout.  Yes, you could darn sure expect to create a lot of interest, but how much momentum could be created just by describing the end-game to somebody who’s still in the first inning?

The point here is that while it’s important to understand just what a CO re-architected as a data center would look like, if only to validate the direction of evolution, it’s the evolution that will probably be the focus in the near term.  Can we get to a rational evolution from the end-state?  Perhaps.  Can we get there from where we are now?  Also, perhaps.

The basic CORD model is a series of “PODs”, which are server/switch complexes that are combined into a data center through additional switch connections.  This structure isn’t a service-specific product set as it would be today, but rather a resource pool for the hosting of virtual stuff.  What we see today as elements of a CO, including the subscribers, access, Internet, etc., are framed in as-a-service form and hosted on the POD complex.  Users, whether they’re consumers or enterprises, wireline or mobile, consume services that are composed and delivered from these -aaS elements.
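
As a loose illustration (all names here are invented for the sketch, not drawn from the CORD spec), the composition idea looks something like this in Python: PODs form a pool, CO functions become -aaS elements hosted on that pool, and services are compositions of those elements:

```python
# Hypothetical sketch of the CORD composition model; names are illustrative.
class Pod:
    """A server/switch complex contributing capacity to the resource pool."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity

class AasElement:
    """A CO function recast in as-a-service form, hosted on a POD."""
    def __init__(self, name, pod):
        self.name = name
        self.pod = pod

def compose_service(name, elements):
    """A service is a composition of -aaS elements, not a box chain."""
    return {"service": name, "elements": [e.name for e in elements]}

pod = Pod("pod-1", capacity=16)
access = AasElement("access-aas", pod)
internet = AasElement("internet-aas", pod)
service = compose_service("residential-broadband", [access, internet])
print(service["elements"])  # ['access-aas', 'internet-aas']
```

The point of the sketch is the inversion it shows: the service references functions hosted on a generic pool, rather than being defined by the devices that happen to implement it.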

The CORD documentation describes CORD evolution as follows: “Given this hardware/software foundation, transforming today’s Central Office into CORD can be viewed as a two-step process. The first step is to virtualize the devices, that is, turn each purpose-built hardware device into its software counterpart running on commodity hardware. The key to this step is to disaggregate and refactor the functionality bundled in the legacy devices, deciding what is implemented in the control plane and what is implemented in the data plane.  The second step is to provide a framework that these virtualized software elements—along with any cloud services that the operator wants to run in the Central Office—can be plugged into, producing a coherent end-to-end system. This framework starts with the components just described, but also includes the unifying abstractions that forge this collection of hardware and software elements into a system that is economical, scalable, and agile.”

If you took this statement literally, you’d say it describes a fork-lift upgrade.  If you assume that it describes not a one-shot transformation but a controlled set of evolutions, then it’s not a radical shift from the NFV notion of defining network functions and then virtualizing them.  The primary difference with NFV would be the second step, which presumes establishing a complete ecosystem (the transformed CO) into which the individual steps make contributions.

CORD also inherits the business issue NFV faces, which is how you justify even controlled sets of evolutions.  By now, only unbridled optimists think telcos would simply commit to an evolution to a hosted framework for infrastructure with no specific drivers.  We’ve seen a lot of attempts to promote NFV broadly, and none have been anything more than exercises in self-delusion.  We’ve also seen some limited NFV service validations emerge, and it’s these that would probably have to form those controlled sets of evolutions.

Of all the things that have emerged as candidates for transformation, virtualization, and hosting, it’s mobile infrastructure that stands out.  The reason is simple, and in fact it’s the same thing that makes mobile infrastructure a prime NFV target—scale.  Virtual CPE is so far deploying more as open premises devices than as cloud-hosted elements.  IoT is promising in the long term, but in the short term it’s not clear anyone is yet presenting a viable service model.  But mobile services are ubiquitous, growing in volume and competitive importance, already receiving a massive share of capex, on the cusp of 5G, and 5G is emerging as a copper-loop alternative.  What more could you ask?

Well, you could ask for incredibly broad support for an open model, and that’s another problem CORD shares with NFV.  We have not, as an industry, even defined a complete “platform vision” for portability of resources and hosted functions (we’re nearer to the former than to the latter).  Vendors who can deliver a strong mobile-centric CORD would have a massive temptation to try to leverage it in other services, rather than to share it selflessly with others.

If CORD can drive a business case, the question of where it drives one still arises.  There are three areas where CORD has the potential to generate significant progress in transformation.  One is the creation of an open framework for hosted functions, the second is the use of Docker/containers as the platform for control software overall, and the third is resource-pool and service networking.

The ONOS project, from which CORD was spawned, was about a network OS that would provide middleware services that included stuff appropriate to hosted functions.  That could easily turn into an initiative to define a set of standard middleware functions that VNFs could call for lifecycle management and for information about the resources they’re hosted on.  If all VNFs could be expected to run on a specific function set, then the onboarding of VNFs could be made trivial.
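
To make the idea concrete, here’s a hedged sketch of what such a standard middleware function set might look like.  No such set has actually been defined, so every name here is hypothetical:

```python
# Hypothetical VNF middleware: a fixed function set every VNF could
# assume exists, making onboarding uniform across implementations.
class VnfMiddleware:
    """Lifecycle and resource-query services exposed to a hosted VNF."""
    def __init__(self, resources):
        self._resources = resources   # what the platform allocated
        self._state = "instantiated"

    def report_state(self, state):
        """VNF tells the platform where it is in its lifecycle."""
        self._state = state
        return self._state

    def query_resources(self):
        """VNF asks what it is hosted on, without platform-specific calls."""
        return dict(self._resources)

mw = VnfMiddleware({"vcpu": 4, "memory_gb": 8})
mw.report_state("active")
print(mw.query_resources()["vcpu"])  # 4
```

If every VNF were written against an interface like this, onboarding would reduce to packaging, because no VNF would carry its own assumptions about the hosting platform.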

The control functions of CORD are expected to be Docker-hosted, and while that doesn’t demand that they be designed for elasticity, resilience, and scalability, it at least supports those goals.  CORD seems to mandate them, so it seems likely that reference and real implementations of CORD would be made up of microservices (or “microfunctions”) that were hosted in a highly dynamic ecosystem.  This is a departure from the framework for NFV, which doesn’t demand any such capability set.

Interestingly, one of the OpenStack vendors (Mirantis) has just pledged to create a version of OpenStack that runs in containers and is orchestrated by the popular Docker/container tool Kubernetes.  The argument is that it would make OpenStack scalable, which it currently isn’t.  One of the backers of this model for OpenStack is Google, who also happens to have just joined the CORD initiative.

The final point, networking, is a serious issue for all virtual environments, and one that nobody has really addressed properly.  OpenStack Neutron, which is currently called out by CORD, has some significant limitations, to the point where I tell clients that they should adopt one of the vendor SDN approaches rather than rely on native Neutron.  I believe that Kubernetes could be used to orchestrate third-party SDN tools, and I think that if OpenStack becomes containerized it would likely mean that CORD would use it in that form, which might then create momentum to address the whole virtual-network question as it applies to hosted service features.

Virtual networking in an SDN sense or overlay sense is important to any infrastructure that has to separate tenants and maintain the isolation of its own control and management planes.  However, for all these applications it’s also important to address the question of how something could maintain membership in multiple VPNs, in effect.  It’s worth noting here that Google’s Andromeda project is arguably the most cloud-driven model of virtual network access out there, and it specifically addresses these “multiplicity” issues.  If Google contributes and enhances Andromeda, it could give CORD capabilities that the OTT giant has used for its own services, and that could be a major boon to operators wanting to enter the OTT service space (listening, Verizon?) and to NFV as well.

In the end, CORD is perhaps the most promising path to next-gen infrastructure, but it comes back to that “the destination isn’t the route” problem, and that comes down to making a business case.  In a sense, a commitment to transforming the CO sets the bar for CORD higher than for NFV, which can still be said to succeed with virtual CPE and no real infrastructure transformation at all.  But that’s also a virtue; CORD has to promote a holistic vision of benefits because it’s a total transformation.  We darn sure need to find such a vision somewhere.

How Huawei’s Growth Highlights an Industry Challenge

Huawei is certainly on a roll.  It reported revenues up 40% in the latest quarter, up from a 30% gain the quarter before, and that’s certainly the best in the whole industry.  At a time when rival Ericsson is shedding its CEO and rivals Alcatel-Lucent and Nokia are merging for efficiency, Huawei seems to be heading yet again for the deposit window at the bank.  What’s really interesting is that this isn’t “news” in a literal sense.  It’s not novel, not surprising.  It’s also not over.

What did we think was going to happen here?  It’s been four years since I heard Stu Elby (then of Verizon) talk about the converging revenue- and cost-per-bit curves.  The sense of that talk, and of others made by other operators since, was that operators had to do something to push those curves apart or infrastructure would become unprofitable.  Vendors have, in general, shrugged.  Cisco’s ongoing story has been “People demand more bandwidth, and you’re a bandwidth supplier.  Suck it up and buy more routers!”  Well, the operators are doing that.  Except for here in the US where Huawei has a political problem, they’re buying more routers and other gear from Huawei.

The giant Chinese networking vendor doesn’t break out its market sectors so you can’t draw a specific conclusion on carrier sales from the overall numbers, but it’s hard to see how any credible level of smartphone sales growth would push up revenue that much.  It’s also interesting to note that gross margins declined, which is what you’d expect to see if your buyers are under cost pressure.  Telecom equipment is a commoditizing market.

Consolidating to be more competitive is a fine response in such a market, but I don’t think there are any CFOs or CEOs out there who believe that they can match Huawei’s pricing.  If they do, then why haven’t they done so and why isn’t Huawei seeing a revenue plateau?  But unless those CEOs and CFOs are sticking their heads in the sand (or elsewhere) they can’t escape the fact that their business is going to decline because their buyers’ own business is already declining, and Huawei is living proof.

So, what do vendors do?  What should they do, since clearly they’ve not been doing anything much that’s useful?

First, recognize that there is no easy, painless solution to the declining profit per bit.  You can’t fix this problem with SDN or NFV or white or gray boxes, and you can’t cover it up with bogus TCO calculations and meaningless technology proofs of concept.  Ten well-integrated elements that don’t make a business case still don’t make one.  Every vendor except Huawei is going to be under pressure unless they have a specific, credible, short-term-result-generating approach to pressure release.

Replacing gear with new technology isn’t going to solve the problem because it would take too long given the long depreciation cycles of network equipment.  Chambers was right in not fearing white or gray boxes.  NFV hosting doesn’t impact even 5% of equipment budgets based on current credible virtual function targets.  You can get 20% off from Huawei with negotiations, and with no risk to current investment, practices, or infrastructure—and you can get it right now.

Second, recognize that the only short-term technology initiative that could create a significant financial impact is operations automation.  The challenge is that you cannot make operations automation dependent on the very fork-lift infrastructure change you’re trying to avoid.  Operators spend fifty percent more on “process opex” today, the cost of support and the prevention of churn through enhanced features and accelerated programs, than they do on capex.  They have to automate today’s processes, the ones dependent on today’s infrastructure.

Everyone says that network operations is about fighting fires.  Well, you don’t get fires to follow an orderly workflow.  They’re events, and in order for operations automation to work the software has to manage events.  The problem we have is that standards bodies and operations software companies tend not to think in event terms.  I blogged earlier about the NFV ISG’s framework and the fact that the literal application of the structure of their end-to-end model would yield a workflow rather than an event-driven software application.  The detailed work being done now, even though this wasn’t supposed to be an implementation description but a functional specification, is taking the ISG deeper into a flow-based implementation.
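
The difference is easy to see in code.  This minimal Python sketch (the event and handler names are invented) dispatches events to registered handlers as they arrive, rather than pushing work through a fixed serial flow:

```python
# Event-driven dispatch: events arrive in any order and are steered to
# handlers, instead of following a predetermined workflow sequence.
handlers = {}

def on(event_type):
    """Decorator that registers a handler for an event type."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register

@on("link_down")
def handle_link_down(event):
    return f"reroute around {event['link']}"

@on("link_up")
def handle_link_up(event):
    return f"restore traffic to {event['link']}"

def dispatch(event):
    """The 'fire' shows up whenever it likes; we react, we don't schedule."""
    return handlers[event["type"]](event)

print(dispatch({"type": "link_down", "link": "trunk-7"}))
# reroute around trunk-7
```

A flow-based implementation would hard-code the order of these steps; an event-driven one only hard-codes the reactions, which is what operations automation actually needs.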

The final step is to focus next-gen changes on the cloud.  As I’ve noted in other blogs, operators have an enormous asset in the number of places where they can locate cloud data centers.  Event-driven processes are better in cloud-hosted form in any event, but even were the capabilities equal the cloud option would promote a shift of processing toward the service edge that would benefit operators.  That in turn could improve their position in the longer term.

Operations automation improvements will help operators relieve the pressure of profit compression, but once you’ve automated operations you’ve reaped the benefits and you’ll probably still face compression down the line.  If services are migrating to a hosted form, via whatever technology you like, then operators, who already have experience achieving infrastructure economies of scale in other areas, should be able to do the same in hosting—if they prepare by making their stuff cloud-dependent.

Things also have to be benefit-centric, both in positioning and architecture, for operators to utilize them.  Since SDN and NFV got started, we’ve been stuck in a capex focus when every operator I talked with said that capex reductions wouldn’t be enough to drive either SDN or NFV.  That’s why operations automation really has to lead, to guide the way we apply all these points.  You can see progress in the opex-centricity of things in Netcracker’s emphasis on NFV (and their leading position in Current Analysis’ survey), and most recently in a white paper by Amdocs whose title (Beyond Next-Generation OSS) says it all.

Will this, if other vendors follow the outline, help them compete with Huawei?  Perhaps, but it’s far from sure that any of the other vendors really want to compete as much as to somehow (against all odds and logic) preserve the status quo.  In fact, of all the network equipment vendors out there, Huawei is probably the most active in building the kind of vision I’ve described here.  Competitors aren’t shut out, but they’re losing first-mover advantage.  When you lose that to a price leader, you’re in big trouble.

Router vendors like Cisco and Juniper (who just reported their quarter) are benefitting from protection from Huawei competition in the US market.  Juniper said on its earnings call that of their top ten customers, only two were outside the US.  There are signs of resistance to the current ban on Huawei in the US carrier market, and if it’s lifted then sales and margins for a lot of deals will be at risk.  But Juniper, at least, made a very big point about service automation on its call, and it’s also retargeting its Contrail SDN offering for the cloud, wrapping agility and automation into one package.  That may be a helpful trend if it spreads.

The key point, in my view, is that every visible signpost in the industry is pointing to cost-pressured capital spending by operators.  Staying strictly with current equipment and practices will simply cede the market to the price leader, Huawei.  It’s way past time to propose a helpful evolution of networking, if you don’t like the Huawei-wins-it-all outcome.

Can Verizon Make a Yahoo Deal Work?

Just a little short of a decade ago I opened a connection with Yahoo at the request of a group of big Tier Ones.  They wanted to create a cooperation between Yahoo and operators to fend off the issues of OTT competition by essentially joining the enemy camp.  Jerry Yang was running Yahoo at the time, and my contact told me that he saw the telcos as enemies and my initiative was dead on arrival.  Now a telco is buying Yahoo, and you really have to go back to that early interest in partnership to understand why, and what could go wrong.

The Internet has revolutionized our lives, and it also revolutionized the business of networking.  Up to the time of the Internet, telecommunications services were paid for based on a combination of the bandwidth of the connection (implicit or explicit) and the duration of service.  The Internet ushered in bill-and-keep and usage-insensitive pricing, and from the point where consumer broadband became popular, operators’ revenue-per-bit began to fall faster than cost improvements could match.

The reason for this was that for over-the-top players and applications, the Internet was a kind of free delivery.  You paid to attach to an ISP and you bought access to every consumer on the planet.  The applications that made the Internet a personal revolution made Internet traffic a losing game for those who carried it.  What operators hoped for a decade ago was a way of partnering with the OTTs to share revenue, share cost of transport, or both.  Given that the status quo was highly favorable to OTTs, it’s no surprise that Yahoo wasn’t interested, nor were other OTT giants.

Underneath the whole bits-and-settlement thing, though, there was another economic truth.  Ads were what sponsored OTTs, and ad budgets have actually fallen from their global high.  At best, it’s a zero-sum game.  On the other hand, the telco market is for paid services and the global revenues of all telcos are at least an order of magnitude greater than total online ad spending.  Limited total market, increasing startup competition—somebody had to lose and Yahoo was one (AOL was another, and Verizon bought them last year).

At a high level, this points out the negative side of Verizon’s decision.  Why buy into a market that’s capped at a fraction of the market you’re in, where competition has already undermined the player you’re buying?  The argument is that Yahoo is cheap for the potential it offers Verizon, but what potential exactly is that, and what does “cheap” mean?  The financial analyst consensus is that Yahoo would be experiencing almost 7% negative growth in the next five years, and both Yahoo’s and Verizon’s stocks were off the morning after the deal was announced.

Then there’s another more “underlying” truth, literally, to the deal.  No network operator can enter the OTT space with the thought that the move will overcome their declining profit-per-bit.  If you and your competitor both depend on network transport to reach eyeballs for ads, and if you have to subsidize a loss on that transport while your competitor does not, you’re in trouble.

What’s in this for Verizon, then?  The probable answer is the same as it was with the AOL deal, which is two things—platform and brand.

Verizon is like a lot of telcos in that it has a specific footprint.  Yes, its mobile operations can span the globe with roaming, but it has a TV presence where it has FiOS, which is its old territory in the northeast.  Wireline infrastructure is local; advertising is a mass market, and that’s particularly true when delivered via web applications/pages.  Verizon probably reasons that having ad capability in general, at the web level and across screens, will help it sell advertising and monetize some of the traffic they’re losing money on carrying.  AOL and Yahoo also give Verizon a national video and ad delivery capability, and a baseline set of customers and services.

The platforms Verizon has acquired could also be very useful—obviously for the services they already support but also for related services.  They could be leveraged in Verizon’s mobile base and also perhaps in their FiOS broadband and TV properties.  They could also, in theory, form the basis for self-care offerings for consumers and even be adapted to delivering business services.

The challenge, obviously, is whether all this potential can/will be realized and whether the two properties are worth a combined ten billion.  That’s harder to say, but it is possible to see how Verizon might be able to maximize their opportunity.

First, OTT assets are uniquely software-based because they are “over the top” of the network.  The influx of software assets and expertise could be of great value to a network operator at a time when it’s clear that service value and operations efficiency will in the future depend on software.

The question is whether Verizon will keep their AOL and Yahoo assets so compartmentalized that they won’t have a chance to do anything they’re not already doing.  Given that what they were already doing wasn’t enough to prevent both companies from losing market share rapidly, that’s not a good outcome.  What Verizon has to do now is work to define a unified service platform architecture that can host all this agile OTT asset stuff, and that provides a means of selectively linking it with cloud features, SDN connectivity, NFV functions, and so forth.

Such a platform would also promote the second pathway to optimizing opportunity, which is to leverage the OTT stuff to jumpstart carrier cloud.  What all telcos need, including Verizon, is a credible and quick way of generating ROI from a widespread, edge-focused, cloud deployment.  If you could get AOL and Yahoo stuff to lead the charge they could in theory build enough mass to create credible economies of scale, and thus lead to other exploitations of those edge data centers.  That would be a huge step toward both NFV deployment and cloud computing success for the telcos.

Yahoo could help Verizon exploit an unheralded asset—real estate.  There are about ten thousand central office locations in the US, for example, and they’re all natural places to house small cloud data centers.  They’re at the end of the broadband wireline connection and they also often provide trunking for mobile services.  They’re close to the subscriber, and that’s a potentially enormous benefit, not for video (you can cache to overcome transit delay as long as you don’t have a lot of jitter) but for transactional activities including IoT.  And, of course, including ad service.  How many times have you experienced page-loading issues that arise from ad network delay?

This isn’t going to be easy for Verizon.  Buying AOL and then Yahoo doubles down on a strategy that depends for success on reversing a negative trend in the market share of its acquired properties, in an industry where Verizon has no experience.  Most of all, it depends on creating symbiosis.  I hope Verizon can demonstrate how that can happen, but it’s too early to tell.

A TOSCA-and-Intent-Model Approach Could Save Software Automation of Operations

Chris Lauwers is one of the thought leaders on the TOSCA standard, and he has a nice blog on the way that TOSCA might fill the role of defining intent models.  I’ve advocated the use of TOSCA and also the use of intent models in both SDN and NFV, and so I want to look at what Chris proposes and align it with what I’m seeing and hearing from operators.  I think he’s on to something, so let’s start with defining some terms and dig in.

TOSCA stands for “Topology and Orchestration Specification for Cloud Applications,” and it’s an OASIS standard that, as the name suggests, was created to describe how you would orchestrate the deployment and lifecycle management of cloud applications.  I think the fact that NFV is a “cloud application” set is clear to most, and so it’s not surprising that vendors are starting to adopt TOSCA and operators are starting to think of their own service modeling in TOSCA terms.

An “intent model” is an abstraction of a system that describes it to its users by describing the properties it exposes and not by the way those properties are achieved through implementation.  The ONF, which has been moving to intent models (influenced by another thought leader, David Lenrow, now at Huawei, and described in an ONF blog he did) says that they define the “what” rather than the “how” of a function.

TOSCA has what I think is the critical property for the proper expression of intent models, which is that it can define a model hierarchy where something abstract and functional can be decomposed (based on defined policies) to one of a number of possible substructures, all the way down to the point that what you’re defining is an application deployment, or in this case a VNF deployment.
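
To make the decomposition idea concrete, here’s a small Python sketch—not actual TOSCA—of policy-driven decomposition from an abstract node down to a deployable element.  All the node and policy names are hypothetical:

```python
# Policy-driven hierarchical decomposition in the TOSCA spirit: an
# abstract node resolves, by policy, into one of several possible
# substructures until only concrete deployments remain.
def decompose(node, policies):
    """Recursively resolve an abstract node into concrete deployments."""
    if node["concrete"]:
        return [node["name"]]
    chosen = policies[node["name"]](node["options"])
    result = []
    for child in chosen:
        result.extend(decompose(child, policies))
    return result

vnf = {"name": "vFirewall-deploy", "concrete": True}
cpe = {"name": "cpe-firewall", "concrete": True}
firewall = {"name": "firewall", "concrete": False,
            "options": {"cloud": [vnf], "premises": [cpe]}}

# Policy: prefer cloud hosting where it's available.
policies = {"firewall": lambda opts: opts["cloud"]}
print(decompose(firewall, policies))  # ['vFirewall-deploy']
```

Swap the policy to select `opts["premises"]` and the same abstract “firewall” decomposes to the CPE-hosted form, with no change to the model above it—which is the property that makes the hierarchy worth having.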

If we accept the idea that TOSCA is a natural fit for deploying virtual functions because they look like applications, the question is whether it could be used to define intent models where there was no cloud deployment, and the answer to that is “Yes!”.  Since TOSCA expects to be able to control SDN it’s perfect where an SDN controller can be used to control legacy elements and probably (with perhaps a little diddling) everywhere.

A great program for drawing organization charts won’t organize your business, and so with TOSCA or any other service model you still have to decide what exactly your model should look like.  We’ve had relatively few discussions on this point, unfortunately.  Once you know what you want, you can apply modeling principles from things like TOSCA to get you there.  Do we know what we want?

As Chris points out in his blog, we need to visualize services as a hierarchical model, each element of which is an intent model that could decompose arbitrarily into a string of other models, a cloud deployment of VNFs, a management API call for a legacy network configuration change, or whatever.  The openness of the approach doesn’t depend on the mechanism alone, though; you have to define your model correctly.

As a general rule, two different intent models are not interchangeable unless they were built from the same template and neither extends that template.  Since an intent model is used based on its exposed properties, anything that exposes exactly the same properties is (normally) interchangeable.  This means that if you define a template for the functional class “firewall” and if everything that implements a firewall conforms to that intent model template, you could presume them all to be interchangeable and your approach to be open.

If you define two intent models for “firewall” that do not expose the same properties, then obviously you can’t substitute one for the other, though you might be able to create a super-model above and harmonize the two different implementations there.  Given this point and the one above, it would make sense to start your model-building process (with TOSCA or another suitable approach) by laying out these super-models or function class models.  An implementation could then be based on that template, and if a given service had to accommodate multiple implementations of a single model, you could avoid having to re-spin the service model by having the super-model pick the proper lower-level model based on whatever criteria was appropriate.  This is how you could deploy one kind of firewall in one area and another kind elsewhere, reflecting perhaps whether you had cloud hosting available, CPE hosting available, or had to send a real firewall device because nothing else was available.
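
A minimal Python sketch of the template-conformance test described above; the property names are invented for illustration:

```python
# Interchangeability check: an implementation is substitutable only if
# it exposes exactly the properties of its function-class template.
FIREWALL_TEMPLATE = {"port_filter", "address_filter", "log_events"}

def conforms(model_properties, template=FIREWALL_TEMPLATE):
    """True iff the model exposes exactly the template's properties."""
    return set(model_properties) == template

cloud_fw = ["port_filter", "address_filter", "log_events"]
vendor_fw = ["port_filter", "address_filter", "log_events", "deep_inspect"]

print(conforms(cloud_fw))   # True: substitutable for any other conformer
print(conforms(vendor_fw))  # False: it extends the template
```

The non-conforming vendor firewall isn’t useless; it just can’t be swapped transparently, which is exactly the case where the super-model layer above would have to do the harmonizing.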

It seems to me that this illustrates the importance of what in software (Java) would be called defining functions in terms of “interfaces” which are then “implemented” or “extended”.  If you start at the top and say that services are created from features, you could start the service model with intent models that describe these features, then look down a layer at a time to define new structures and eventually implement them on suitable resources.  If you follow this approach, then it’s relatively easy to map what you’re doing to TOSCA.
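
In Python terms the “interface” idea might be sketched with abstract base classes; the class and method names here are illustrative, not taken from any standard:

```python
# The feature is the "interface"; each hosting option "implements" it.
from abc import ABC, abstractmethod

class Firewall(ABC):
    """The intent-model template for the firewall feature class."""
    @abstractmethod
    def deploy(self):
        """Every implementation must deliver the same deploy property."""

class CloudFirewall(Firewall):
    def deploy(self):
        return "deploy VNF on cloud host"

class PhysicalFirewall(Firewall):
    def deploy(self):
        return "ship and configure appliance"

print(CloudFirewall().deploy())  # deploy VNF on cloud host
```

A service model written against `Firewall` never changes when the decomposition picks `CloudFirewall` over `PhysicalFirewall`, which is the top-down property the paragraph above argues for.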

The notion of what Java calls “interfaces” is common to modern programming languages, and the fact that it’s applicable to SDN and NFV and to service modeling suggests that we should indeed be looking more at service models and orchestration from a software design perspective.  It also shows that it’s really important to create the higher-level models correctly, by dividing services into “commercial feature components” and then defining those components as lower-level network functions.  Each high-level element defines the properties that service architects (or users through self-serve portals) would “see”, and that each decomposition/implementation would be expected to deliver.

A service-model approach has to be more than a jazzy name.  When you’re using service models, you are processing models in software, which is a different way of looking at software than the traditional flow-and-component approach.  For example, the proper way to consider an “event” in a model-driven approach is that the event is steered via the model, not “sent” to a process through an interface.  In fact, the software interfaces to operations and management processes should be defined in the model and opaque to those who use it.  If you can define a model (in TOSCA, for example) that establishes a link to a software process using the native interface for that process, the application is no less open than one that uses a standard interface.  What you have to focus on standardizing is the event, meaning what it represents and how the information that describes it is structured.  Since an intent model has to define events it receives from the outside as part of its properties, that shouldn’t be an issue.
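
A small Python sketch of event steering through the model rather than direct process calls; the model structure and process names are invented for illustration:

```python
# Model-steered events: the model binds event types to processes, so
# the software interfaces stay opaque to whoever uses the model.
model = {
    "name": "vpn-service",
    "events": {
        "fault": "ops.restore_service",
        "scale": "ops.add_capacity",
    },
}

processes = {
    "ops.restore_service": lambda ev: f"restoring after {ev['cause']}",
    "ops.add_capacity":    lambda ev: f"scaling to {ev['target']}",
}

def steer(model, event):
    """Route the event via the model, never 'send' it to a process."""
    return processes[model["events"][event["type"]]](event)

print(steer(model, {"type": "fault", "cause": "link failure"}))
# restoring after link failure
```

Note that only the event structure is standardized here; swapping `ops.restore_service` for a different implementation changes the model’s binding, not anything visible to the model’s users.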

The issue of events and flows, and the nature of interfaces, is critical not only for data-modeled services but for SDN and NFV and operations orchestration and automation.  The world is not a synchronized process—stuff happens.  It’s very difficult to automate the operations of anything unless you accept that what you’re automating is the response to events.

From a data-driven software perspective, the functional model published by the NFV ISG is weak because it doesn’t specify an event-driven process.  If you are interfacing between processes and not through data models, you’re not model-driven.  By providing functional blocks linked together with interface points, the ISG has encouraged a literal translation of the functional into software structure.  That leads to a serial workflow vision that, in places like OpenStack Neutron, has known limitations.  It’s not that the ISG mandates the wrong thing, but their description has led many astray.

The TMF, interestingly, is supposed to be promoting the right thing in its latest evolution.  The “Platform Economy” announcement just made is supposed to be an evolution of the original NGOSS Contract work that I’ve long said was the seminal development in operations automation.  The NGOSS Contract was data-model-driven relative to event-to-process coupling, but it’s not been widely implemented.  The TMF article I referenced here doesn’t describe a model-and-event approach either, but some of my operator friends tell me that’s what’s intended.  I’m trying to get the TMF to release something about their initiative so I can blog on it, and if they do provide something that’s open and public I’ll do a detailed analysis.  If it’s heading in the right direction it could be a positive for an event-driven approach, but of course TOSCA is already there and the TMF might still take years to get their latest approach finalized and implemented.

It’s been almost a decade since the TMF’s NGOSS Contract work launched respectable consideration for a framework of software automation of operations and management processes and we’re not there yet.  TOSCA, at the least, can show us what a modern event-driven model looks like.  At most, it could deliver the right approach and do it right now.  That would save a lot of operator angst over the next couple years.

What’s Missing in Operator SDN/NFV Visions?

The news that AT&T and Orange are cooperating to create an open SDN and NFV environment is only the latest in a series of operator activities aimed at moving next-gen networks forward.  These add up to a serious changing-of-the-guard in a lot of ways, and so they’re critically important to the network industry…if they can really do what they’re supposed to.  Let’s take a look at what the key issues are so we can measure the progress of these operator initiatives.

“Box networking” has created current infrastructure, and boxes are hardware elements that have long useful lives and development cycles.  To ensure they can build open networks from boxes, operators have relied on standards bodies to define features, protocols, and interfaces.  Think of box networks as Lego networks; if the pieces fit together, the network works, so you focus on the fit and the function.

Today’s more software-centric networks are just that, software networks.  With software you’re not driving a five-to-seven-year depreciation stake in the ground.  Your software connections are a lot more responsive to change, and so software networks are a bit more like a choir, where you want everything to sound good and there are a lot of ways of getting there.

The biggest challenge we’ve faced with SDN and NFV is that they have to be developed as software architectures, using software projects and methods, and not by box mechanisms.  In both SDN and NFV we have applied traditional box-network processes to the development, which I’ve often characterized as a “bottom-up” approach.  The result of this was visible to me way back in September of 2013 when operators at an SDN/NFV event were digging into my proposals for open infrastructure and easy onboarding of VNFs—two of the things operators are still trying to achieve.  When you try to do the details before the architecture, things don’t fit right.

The problem with SDN and NFV isn’t competing standards or proprietary implementations as much as standards that don’t really address the issues.  The question is whether the current operator initiatives will make them fit better, and there are a number of political, technical, and financial issues that have to be overcome before they can.

The first problem is that operators have traditionally done infrastructure planning in a certain way, one shaped by product and technology initiatives that are largely driven by vendors.  This might sound like operators are just punting their caveat emptor responsibilities, but the truth is that it’s not generally helpful for buyers to plan the consumption of stuff that’s not on the market.  Even top-down planning, for operators, has always had an element of bottom-ness to it.

You can see this in the most publicized operator architectures for SDN/NFV, where we see a model that still doesn’t really start with requirements as much as with middle-level concepts like layers of functionality.  We have to conform to current device capabilities for evolutionary reasons.  We have to conform to OSS/BSS capabilities for both political and technical reasons.  We have to blow kisses at standards activities that we’ve publicly supported for ages, even if they’re not doing everything we need.

The second problem is that we don’t really have a solid set of requirements to start with; what we have is more like a set of hopes and dreams.  There is a problem that we can define—revenue per bit is falling faster than cost per bit.  That’s not a requirement, nor is saying “We have to fix it!” one.  NFV, in particular, has been chasing a credible benefit driver from the very first.  Some operators tell me that’s better than SDN, which hasn’t bothered.  We know that we can either improve the revenue/cost gap by increasing revenue or reducing cost.  Those aren’t requirements either.

Getting requirements is complicated by technology, financial, and political factors.  We need to have specific things that next-gen technology will do in order to assign costs and benefits, but we can’t decide what technology should do without knowing what benefits are needed.  Operators know their current costs, for example, and vendors seem to know nothing about them.  Neither operators nor vendors seem to have a solid idea of the market opportunity size for new services.  In the operator organizations, the pieces of the solution spread out beyond the normal infrastructure planning areas, and vendors specialize enough that few have a total solution available.

Despite this, the operator architectures offer our best, and actually a decent, chance of getting things together in the right way.  The layered modeling of services is critical, and so is the notion of orchestration happening in multiple places.  Abstracting resources so that existing and new implementations of service features are interchangeable is also critical.  There are only two areas where I think there’s still work to be done, and where I’m not sure operators are making the progress they’d like.  One is the area of onboarding of virtual network functions, and the other is in management of next-gen infrastructure and service elements.  There’s a relationship between the two that makes both more complicated.

All software runs on something, meaning that there’s a “platform” that normally consists of middleware and an operating system.  In order for a virtual function to run correctly it has to be run with the right platform combination.  If the software is expected to exercise any special facilities, including for example special interfaces to NFV or SDN software, then these facilities should be represented as middleware so that they can be exercised correctly.  A physical interface is used, in software, through a middleware element.  That’s especially critical for virtualization/cloud hosting where you can’t have applications grabbing real elements of the configuration.  Thus, we need to define “middleware” for VNFs to run with, and we have to make the VNFs use it.

The normal way to do this in software development would be to build a “class” representing the interface and import it.  That would mean that current network function applications would have to be rewritten to use the interface.  It appears (though the intent isn’t really made clear) that the NFV ISG proposes to address this rewriting need by adding an element to a VNF host image.  The presumption is that if the network function worked before, and if I can build a “stub” between that function and the NFV environment that interfaces with the function in every respect as its original platform would have, then my new VNF platform will serve.  This stub function has to handle whatever the native VNF hosting environment won’t.
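The stub idea is easier to see in code than in prose. The sketch below is hypothetical (the class names and the single logging call are my own illustration, not anything the ISG has specified): a legacy function expects a native platform facility, and an adapter bridges that expectation to whatever the hosting environment actually provides, so the function itself runs unmodified.

```python
# Illustrative sketch of a VNF "stub": it presents the native interface the
# legacy function expects while translating to the NFV environment's facilities.

class NfvEnvironment:
    """Stand-in for the hosting environment's facilities."""
    def send_log(self, source, message):
        return f"[{source}] {message}"

class LegacyFirewall:
    """A 'network function' written against a native one-argument log call."""
    def __init__(self, platform_log):
        self._log = platform_log  # expects the original platform's interface

    def block(self, address):
        return self._log(f"blocked {address}")

class VnfStub:
    """Bridges the function's native expectations to the NFV environment."""
    def __init__(self, env, vnf_name):
        self.env = env
        self.vnf_name = vnf_name

    def native_log(self, message):
        # Whatever the hosting environment won't handle natively, the stub
        # handles here (tagging, translation, routing, and so on).
        return self.env.send_log(self.vnf_name, message)

env = NfvEnvironment()
stub = VnfStub(env, "fw-1")
fw = LegacyFirewall(stub.native_log)   # the function runs unmodified
print(fw.block("10.0.0.5"))            # [fw-1] blocked 10.0.0.5
```

The design point is that all rewriting pressure lands on the stub, not on the network function, which is exactly the bargain the ISG approach seems to be aiming for.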

This is a complicated issue for several reasons.  The biggest issue is that different applications require different features from the operating system and middleware, some of which work differently as versions of the platform software evolve.  It’s possible that two different implementations of a given function (like “Firewall”) won’t work with the same OS/middleware versions.  This can be accommodated when the machine image is built, but with containers versus VMs you don’t have complete control over middleware.  Do we understand that some of our assumptions won’t work for containers?

Management is the other issue.  Do all “Firewall” implementations have the same port and trunk assignments, do they have the same management interfaces, and do you parameterize them the same way?  If the answer is “No!” (which it usually will be) then your stub function will either have to harmonize all these things to a common reference or you’ll have to change the management for every different “Firewall” or other VNF implementation.
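Harmonizing to a common reference is essentially a translation table per implementation. The sketch below is an assumption-laden illustration (the parameter names, `COMMON_KEYS`, and the two vendor maps are invented for the example); it shows the stub's management job in miniature: the operations side speaks one vocabulary, and each implementation's quirks are absorbed in a map.

```python
# Sketch of harmonizing two firewall implementations' differing management
# parameters to one common reference model. All names are illustrative.

COMMON_KEYS = {"wan_port", "lan_port", "rule_limit"}

# Per-implementation maps from the common reference to each vendor's own names.
VENDOR_A = {"wan_port": "trunk0", "lan_port": "eth1", "rule_limit": "max_acl"}
VENDOR_B = {"wan_port": "outside_if", "lan_port": "inside_if", "rule_limit": "policy_cap"}

def harmonize(common_params, vendor_map):
    """Translate common management parameters into vendor-specific ones."""
    unknown = set(common_params) - COMMON_KEYS
    if unknown:
        raise ValueError(f"not in the common reference: {unknown}")
    return {vendor_map[k]: v for k, v in common_params.items()}

params = {"wan_port": "ge-0/0/0", "rule_limit": 500}
print(harmonize(params, VENDOR_A))  # {'trunk0': 'ge-0/0/0', 'max_acl': 500}
print(harmonize(params, VENDOR_B))  # {'outside_if': 'ge-0/0/0', 'policy_cap': 500}
```

The alternative the text describes, changing management for every different “Firewall” implementation, amounts to maintaining N operations processes instead of N small maps.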

I think that operators are expecting onboarding to be pretty simple.  You get a virtual function from a vendor and you can plug it in where functions of that type would fit, period.  All implementations of a given function type (like “Firewall”) are the same.  I don’t think that we’re anywhere near achieving that, and to get there we have to take the fundamental first step of defining exactly what we think we’re onboarding, what we’re onboarding to, and what level of interchangeability we expect to have among implementations of the same function.

The situation is similar for infrastructure, though not as difficult to solve.  Logically, services are made up of features that can be implemented in a variety of ways.  Operators tell me that openness to them means that different implementations of the same feature would be interchangeable, meaning VPNs are VPNs and so forth.  They also say that they would expect to be able to use any server or “hosting platform” to host VNFs and run NFV and SDN software.

This problem is fairly easy to solve if you presume that “features” are the output of infrastructure and the stuff you compose services from.  The challenge lies on the management side (again) because the greater the difference in the technology used to implement a feature, the less natural correspondence there will be among the management needs of the implementations.  That creates a barrier both to the reflection of “feature” status to users and to the establishment of a common management strategy for the resources used by the implementation.  It’s that kind of variability that makes open assembly of services from features challenging.

Infrastructure has to effectively export a set of infrastructure features (which, to avoid confusion in terms, I’ve called “behaviors”) that must include management elements as well as functional elements.  Whether the management elements are harmonized within the infrastructure with a standard for the type of feature involved, or whether that harmonization happens externally, there has to be harmony somewhere or a common set of operations automation practices won’t be able to work on the result.  We see this risk in the cloud DevOps market, where “Infrastructure-as-Code” abstractions and event exchanges are evolving to solve the problem.  The same could be done here.
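One way to picture a “behavior” is as a pairing of a functional element with a harmonized management element, so that unlike implementations of the same feature look identical to operations automation. The sketch below is my own illustration of that idea (the `Behavior` class and status fields are invented, not from any specification):

```python
# Sketch of infrastructure exporting "behaviors": each behavior couples a
# functional element with a management element harmonized to a common shape.

class Behavior:
    def __init__(self, feature_type, implement, get_status):
        self.feature_type = feature_type  # e.g. "vpn"
        self._implement = implement       # functional element (implementation-specific)
        self._get_status = get_status     # raw, implementation-specific status source

    def status(self):
        """Present status in a common shape regardless of implementation."""
        raw = self._get_status()
        up = raw.get("up", raw.get("active"))  # absorb naming differences here
        return {"feature": self.feature_type, "operational": bool(up)}

# Two implementations of the same feature with unlike management data:
mpls_vpn = Behavior("vpn", lambda: "mpls path", lambda: {"up": 1})
sdn_vpn = Behavior("vpn", lambda: "overlay path", lambda: {"active": True})

# Operations automation sees one thing, not two:
assert mpls_vpn.status() == sdn_vpn.status() == {"feature": "vpn", "operational": True}
```

Whether that harmonization lives inside the infrastructure or in an external layer is the open question the paragraph above raises; the code only shows that it has to live somewhere.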

Given all of this, will operator initiatives resolve the barriers to SDN/NFV deployment?  The barrier to that happy outcome remains the tenuous link between the specific features of an implementation and the benefits needed to drive deployment.  None of the operator announcements offer the detail we’d need to assess how they propose to reap the needed benefits, and so we’ll have to reserve judgment on the long-term impact until we’ve seen enough deployment to understand the benefit mechanisms more completely.

Looking Ahead to the Operators’ Big Fall Technology Planning Cycle

Every year in the middle of September operators launch a technology planning cycle.  By mid-November it’s done, and the conclusions reached in the interim have framed the spending and business plans for the operators in the coming year.  We’re about two months from the start of the cycle, and it’s at this point that vendors need to be thinking about how their key value propositions will fare, and what they can do by early October at the latest to get the most bang from their positioning bucks.

According to operators, the big question they have to consider this fall is how to reverse the declining profit-per-bit curve.  Network cost per bit has fallen for decades but revenue per bit has fallen faster, largely driven by the all-you-can-eat Internet pricing model and the lack of settlement among Internet and content providers.  SD-WAN, which is increasingly an Internet overlay that competes with provisioned VPNs, threatens to further undermine revenue per bit and introduce new service competitors who look like OTT players.

The challenge, say the operators I’ve talked with regularly, is that there’s no clear path to profit-per-bit success, either at the approach level or at the technology level.  Cost management is a credible approach, but can you actually achieve significant cost reductions?  Revenue gains would be even better, but how exactly do you introduce a new service that’s not just a cheaper version of a current one, and thus likely to exacerbate rather than solve the problem?

Capex isn’t the answer.  Operators say that new technology choices like SDN and NFV might reduce capex somewhat, but early calculations by CFOs suggest that the magnitude of the reduction wouldn’t be any larger than the potential savings achieved by putting pricing pressure on vendors.  Huawei has clearly benefitted from this everywhere except the US, and some US operators are starting to lobby for a relaxation of the restrictions on Huawei’s sales.  In any case, a fork-lift change to current infrastructure is impossible and a gradual change doesn’t move the needle enough.

Opex efficiency is a better candidate, but CFOs tell me that nobody has shown them a path to achieving broad operations economies through SDN or NFV.  Yes, they’ve had presentations on how one particular application might generate operations improvements, but the scope of the improvement is too small to create any real bottom-line changes.  Not only that, the math in the “proofs” doesn’t even get current costs right, much less present a total picture of future costs.  However, this is the area where CFOs and CEOs agree that a change in the profit curve is most likely to be possible.  One priority for the fall is exploring just how operations efficiencies could be achieved quickly and at an acceptable level of risk.

One interesting point on the opex front is that operators are still not prioritizing an operations-driven efficiency transformation that would improve cost and agility without changing infrastructure.  Part of the reason is that while vendors (especially Cisco) are touting a message of conserving current technology commitments, none of the equipment vendors are touting an operations-driven attack on opex.  In fact, only a few OSS/BSS vendors are doing that, and their engagement with the CIO has for some reason limited their socialization of the approach.  Technology planning, for operators, has always meant infrastructure technology planning.  Will some big vendor catch on to that, and link operations transformation to the opex benefit chain?  We’ll see.

On the revenue side, most operators believe that trying to invent a new service and then promote it to buyers is beyond their capabilities.  Simple agility-driven changes to current services haven’t proved out in their initial reviews; they tend to be sellable only to the extent that they lower network spending.  The current thinking is that services that develop out of cloud trends, IoT, agile development and continuous application delivery, and other IT trends are more reasonable.  This fall they’d like to explore what specific technology shifts would be needed to exploit these kinds of trends, and which would present the best ROI.

There are two challenges operators face in considering new revenue.  First is the fact that the credible drivers for new services, the ones I noted in the last paragraph, are all still waiting to mature even as market targets, much less as places where real prospects are seeking real services.  Second, operators think it could take several years for something to develop on the revenue side, which means that something else has to be done first to reduce profit pressure.

Another issue operators are struggling with this fall is 5G.  Nobody really expects it to be a target for major investment in 2017, and all the operators see a 5G transition as a kind of new-weapon-in-an-arms-race thing.  You can’t not do it because someone for sure will.  It would be nice if it were extravagantly profitable, but fully three-quarters of mobile operators think that they’d have to offer 5G services for about the same price as 4G.  Further, operators in the EU and the US think that regulatory changes are more likely to reduce profits on mobile services than to increase them.  Roaming fees are major targets, for example, and neutrality pressure seems likely to forever kill the notion of content providers paying for peering.  In fact, content viewing that doesn’t impact mobile minutes is becoming a competitive play.

One question operators who retain copper loop have is whether 5G could be used as an alternative last-mile technology, making every user who isn’t a candidate for FTTH or at least FTTC into a wireless user.  Another question is whether 5G might help generate opportunity in IoT or other areas.  Unfortunately, few of them think there will be much in this area to consider this fall.  It’s too early, and there are regulatory questions that vendors are never much help in dealing with.

The big question with 5G is whether it introduces an opportunity to change both the way mobile infrastructure works and the way that mobile backhaul and metro networks converge.  One operator commented that the mobile infrastructure components IP Multimedia Subsystem (IMS) and Evolved Packet Core (EPC) were “concepts of the last century”, referring to the 3G.IP group that launched the initiatives in 1999.  There’s little question in the mind of this operator, and others, that were we to confront the same mission today we’d do it differently, but nobody believes that can be done now.  Instead they wonder whether application of SDN and NFV to IMS and EPC could also transform metro, content delivery, and access networking.

This is the big question for the fall, because such a transformation would lay the groundwork for a redesign of the traditional structure of networks, a redesign focused on networks as we know them becoming virtual overlays on infrastructure whose dominant cost is fiber and RAN.  If operators made great progress here, they could revolutionize services and networks, but the operators I’ve talked with say that’s going to be difficult.  The problem in their view is that the major winners in a transformation of this kind, the optical vendors, have been slow to promote a transformative vision.  While some operators like AT&T have taken it on themselves to build software to help with virtualization, most believe that vendors are going to have to field products (including software and hardware) with which the new-age 5G-driven infrastructure could be built.

The transformation of metro is the most significant infrastructure issue in networking, and SDN or NFV are important insofar as they play a role in that, either driving it or changing the nature of the investments being considered.  Operators believe that too, and their greatest concern coming into this fall is how to reshape metro networking.  They know they’ll need more capacity, more agility, and they’re eager to hear how to get it.

I’ve been an observer and in some cases contributor to these annual planning parties for literally decades, and this is one of the most interesting I’ve ever seen.  I have no idea what’s going to come out of this because there’s less correspondence between goals and choices in this cycle than in any other.  One thing is for sure; if somebody gets things right here there could be a very big payday.

IBM, VMware, the Cloud, and IT Trends

IBM and VMware both reported their numbers and the results look like a win for the cloud.  The question remains whether a win for the cloud translates into a win for vendors and for IT overall, though.  Neither of the two companies really made a compelling case for a bright future, and when you combine this with the Softbank proposal to buy ARM, you have a cloudy picture of the future.

IBM beat expectations, but the expectations were far from lofty.  The company continues to contract quarter-over-quarter, for the very logical reason that if you’re an IT giant who’s bet on a technology whose benefit to the user is lower IT costs, you’re facing lower revenue.  VMware can buck this trend because they’re not a server incumbent and clouds and virtualization reduce server spending.  Broad-based players can’t.

Perhaps the big news from IBM was that it appears that, adjusting for M&A, software numbers are declining and business services are soft.  Let’s see here; you divest yourself of hardware and focus on technology that stresses a shift of enterprises from self-deployed to public hosting, and you’re also shrinking in software and services.  What else is there in IT?

IBM has some solid assets.  They have a good analytics portfolio, exceptional account control in large accounts, a number of emerging technology opportunities like IoT and Watson, and strong R&D with things like quantum computing.   Its challenge is to somehow exploit these assets better, and that’s where IBM still has problems.

The first problem is marketing.  Sales account control is not a growth strategy because there’s no significant growth in the number of huge enterprises that can justify on-site teams.  IBM needs to be down-market and getting there is a matter of positioning and marketing.  They used to be so good at marketing that almost everything IBM did or said became industry mainstream.  Not anymore.

The second problem is strategic cohesion.  OK, perhaps quantum computing will usher in a new age of computing, but “computing” means “computers”, right?  Hasn’t IBM been exiting the hardware business?  Software today means open-source, and IBM has a lot of that, but you don’t sell free software.  Watson and IoT are emerging opportunities, but they could take years to emerge and it’s obvious that IBM needs to do a better job of promoting them.

The third problem is the lack of a clear productivity-benefit dimension to IBM’s story.  The cloud isn’t by nature accretive to IT spending, it’s the opposite.  To get a net gain from the cloud you have to associate the cloud with a productivity benefit that your buyers don’t realize now, without it.  IBM’s story in this area is muddled when it could be strong.

How about VMware and its implications?  First, clearly, VMware, as a virtualization-and-cloud player without a stake in the product areas that the cloud consolidates, would see the cloud as a net gain.  It can focus on the cost driver and exploit it, which is a considerably easier task from a sales and marketing perspective.  How long does it take to say “This will be 30% cheaper” versus explaining the ins and outs of productivity enhancement through some new technology?

VMware is also reaping the benefits of the combination of cloud publicity and OpenStack issues.  There is no question in my mind, nor in the minds of most cloud types I’ve talked with, that OpenStack is where the cloud is going.  The problem is that like all open-source software, OpenStack is a bit disorderly compared to commercial products.  VMware has a clear market opportunity in supporting the evolution of limited-scope virtualization to virtual-machine-agile hosting anywhere and everywhere.  They can develop feature schedules and marketing plans to support their goals, where OpenStack as a community project is harder to drive in a given direction.  VMware can also be confident they’ll reap the benefit of their own education efforts with their stuff, which can’t be said for vendors who rely on OpenStack because it is open-sourced.  VMware’s Integrated OpenStack is in fact at least one of two leaders in the OpenStack space, so VMware has little to fear from OpenStack success.

VMware also seems to be gaining some ground over open-source in the container space.  Docker gets all the container PR, but enterprises tell me that Docker adoption isn’t easy and that securing a supported Docker package is costly enough that VMware’s container solution isn’t outlandish.  Not only that, most enterprises interested in virtual hosting have already implemented VMware.

Then there’s VMware’s NSX (formerly Nicira) SDN overlay technology, which VMware is rolling into a broad software-defined data center (SDDC) positioning and also exploiting for the cloud.  VMware announced a management platform specifically for NSX, which makes NSX far easier for enterprises to adopt in the data center.  There’s plenty of run room for expansion of NSX functionality and plenty of room for market growth.

Finally, VMware’s alliance with IBM is helping VMware exploit IBM’s commitment to the cloud perhaps better than IBM can.  IBM’s SoftLayer cloud is powered by VMware and IBM’s support is a highly merchandisable reference.  How this will fare when/if Dell picks up VMware is another matter.

That may, in fact, be the big question for VMware.  Dell, and its rival HPE, seem to be taking the opposite tack to IBM, focusing on hardware and platform software rather than divesting hardware.  That means there could be no long-term conflict with SoftLayer even if the Dell buy happens.  However, it’s not clear whether getting into, or getting rid of, hardware is the best approach, and it may well be that neither extreme is viable.  And VMware’s benefit in the cloud-and-virtualization space is transitory.  First, being the supplier of what enterprise buyers are evolving from is useful only until they’re done evolving.  Second, all OpenStack sources will eventually converge on the same baseline product with the same stability.  Can Dell strategize an evolution, and exploit things like NSX?

The cloud isn’t hurting IT, it’s simply an instrument of applying the hurt factor, which is the sense that the only thing good about future products and technologies is their ability to control costs.  Commodity servers and open-source software, hosted virtual machines and server consolidation—you get the picture.  This is where the Softbank deal for ARM comes in.  ARM is a leader in embedded systems, which today means largely mobile devices and smart home devices.  Any commoditization trend in hardware tends to favor chips, which are an irreplaceable element.  We already use more compute power outside businesses than inside them, and that trend is likely to continue.  Softbank is betting on it, in fact.

Even the cloud will be impacted by the current cost-driven vision of change.  My model has said consistently that a cost-based substitution of hosted resources for data center resources would impact no more than 24% of IT spending.  A productivity-driven vision, if one could be promoted, could make public cloud spending half of total IT spending, and transform over half the total data centers to private clouds.  That’s the outcome that IBM should be promoting, of course.

The point here is that we’re in a tech-industry-wide transformation driven by a lack of new and valuable things to sell.  All the players, whether they’re in IT or networking, are struggling to realign their businesses to the evolving conditions.  Some like VMware are currently favored, others like ARM are favored more in the long term, and some like IBM seem to be trying to find favor.  Eventually, everyone will have to change if current trends continue, and it’s hard to see what will reverse the cost focus any time soon.

A Compromise Model of IoT Might Be Our Best Shot

One of the problems with hype is that it distorts the very market it’s trying to promote, and that is surely the case with the Internet of Things.  The notion of a bunch of open sensors deployed on the Internet and somehow compliant with security/privacy requirements is silly.  But we’re seeing announcements now that reflect a shift toward a more realistic vision—from GE Digital’s Predix deals with Microsoft and HPE to Cisco’s Watson-IBM edge alliance.  The question is whether we’re at risk of throwing the baby out with the bathwater in abandoning the literal IoT model.

The Internet is an open resource set, where “resources” are accessed through simple stateless protocols.  There’s no question that this approach has enriched everyone’s lives, and few question that even with the security and privacy issues the net impact has been positive.  In a technical sense, IoT in its purest form advocates treating sensors and controllers as web resources, and it’s the risk that sensors and controllers would be even more vulnerable to security and privacy problems that has everyone worried.  You can avoid that by simply closing the network model, making IoT what is in effect a collection of VPNs.  Which, of course, is what current industrial control applications do already.

We need a middle ground.  Call it “composable resources” or “policy-driven resource access” or whatever you like, but what’s important here is to preserve as much of the notion of openness as we can, consistent with the need to generate ROI for those who expose sensors/controllers and the need to protect those who use them.  If we considered this in terms of the Internet resource model, we’d be asking for virtual sensors and controllers that could be protected and rationed under whatever terms the owners wanted to apply.  How?

A rational composable IoT model would have to accomplish four key things:

  1. The sensors and controllers have to be admitted to the community through a rigorous authentication procedure that guarantees everyone who wants to use them that they’d know who really put them up and what they really represent, including their SLA.
  2. The sensors and controllers have to be immunized against attack, including DDoS, so that applications that depend on them and human processes that depend on the applications can rely on their availability.
  3. The information available from sensors and the actions that can be taken through controllers have to be standardized so that applications don’t have to be customized for the devices they use. It’s more than standardizing protocols, it’s standardizing the input/output and capabilities so the devices are open and interchangeable.
  4. Access to information has to be policy-managed so that fees (if any) can be collected and so that public policy security/privacy controls can be applied.

If you look at the various IoT models that have been described in open material, I think you can say that none of these models address all these points, but that most or all of them could be made to address them by adding a kind of “presentation layer” to the model.

The logical way to address this is to transliterate the notion of an “Internet of Things” into an “Internet of Thingservices”.  We could presume that sensors and controllers were represented by microservices, little nubbins of logic that respond to the same sort of HTTP requests that web servers do.  A microservice could look, to a user, like a sensor or controller, but since it’s a software element it’s really only representing one, or maybe many, or maybe an analytic result of examining a whole bunch of sensors or sensor trends.

This kind of indirection has an immediate benefit in that it can apply any kind of policy filtering you’d like on access to the “device’s microservice”.  The device itself can be safely hidden on a private network and you get at it via the microservice intermediary, which then applies all the complicated security and policy stuff.  The sensor isn’t made more expensive by having to add that functionality.  In fact, you can use any current sensor through a properly connected intermediary.
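The indirection is simple enough to sketch. Everything below is hypothetical (the `SensorProxy` class, the policy callable, and the requester names are invented for illustration), but it captures the architecture: the real device stays on a private network, and the proxy applies all the policy machinery before any reading escapes.

```python
# Sketch of a sensor "microservice" proxy: the device is hidden on a private
# network; the intermediary applies policy before exposing any data.

class SensorProxy:
    def __init__(self, read_device, policy):
        self._read = read_device  # private-network access to the real sensor
        self._policy = policy     # callable deciding who may see the data

    def get(self, requester):
        """Serve a reading only if policy admits the requester."""
        if not self._policy(requester):
            return {"error": "access denied by policy"}
        return {"reading": self._read()}

# The device itself needs no new functionality; policy lives in the proxy.
traffic_sensor = SensorProxy(
    read_device=lambda: 42,                          # the hidden device
    policy=lambda who: who in {"city-traffic-app"},  # example policy
)
print(traffic_sensor.get("city-traffic-app"))  # {'reading': 42}
print(traffic_sensor.get("random-bot"))        # {'error': 'access denied by policy'}
```

Fee collection, authentication, or rate limiting would slot into the same policy hook without touching the sensor, which is why the indirection keeps cheap devices cheap.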

The microservice can also represent a logical device, just as a URL represents a logical resource.  In content delivery applications a user clicks a URL that decodes to the proper cache based on the user’s location (and possibly other factors).  That means that somebody could look for “traffic-sensor-on-fifth-avenue-and-33rd” and be connected (subject to policy) to the correct sensor data.  That data could also be formatted in a standard way for traffic sensor data.
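The logical-device lookup works like any name resolution. The sketch below assumes an invented registry and a made-up standard format for traffic-sensor data (neither exists anywhere); the point is that a logical name resolves to a concrete source and the answer is validated against the standard shape before it’s returned.

```python
# Sketch of "logical device" resolution: a logical sensor name resolves,
# like a URL to a cache, to a concrete data source in a standard format.

REGISTRY = {
    "traffic-sensor-on-fifth-avenue-and-33rd": lambda: {"vehicles_per_min": 31},
}

STANDARD_FIELDS = {"vehicles_per_min"}  # hypothetical traffic-sensor standard

def resolve(logical_name):
    """Map a logical sensor name to standard-format data."""
    source = REGISTRY.get(logical_name)
    if source is None:
        raise KeyError(f"no sensor registered as '{logical_name}'")
    data = source()
    if not set(data) <= STANDARD_FIELDS:
        raise ValueError("source does not conform to the traffic-sensor format")
    return data

print(resolve("traffic-sensor-on-fifth-avenue-and-33rd"))  # {'vehicles_per_min': 31}
```

Because the registry entry is just a callable, the same name could later resolve to an aggregate of several physical sensors without any consumer noticing.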

You could also require that the microservices be linked to a little stub function that you make into a service on a user’s private network.  That means that any use of IoT data could be intermediated through a network-resident service, and that access to any data could be made conditional on a suitable service being included in the private network.  There would then be no public sensor at all; everyone would have to get a proxy.  You could attack your own service but not the sensor, or even the “real” sensor microservice.

I know that a lot of people will say that this sort of thing is too complicated, but the complications here are in the requirements, not in the approach.  What you don’t do in microservice proxies you have to do in the sensors themselves, if you put them directly online, or you have to presume it would be built in some non-standard way into every application that exposes sensor/control information or capabilities.  That’s how you lose the open model of the Internet.  That, or by presuming that people are going to deploy sensors and field privacy and security lawsuits out of the goodness of their hearts with no possibility of profit.

I’d love to see somebody like Microsoft (who has a commitment to deploy GE Digital’s Predix IoT on Azure) describe something along these lines and get the market thinking about it.  There are ways to achieve profitable, rational, policy-compliant IoT and we need to start talking about them, validating them, if we want IoT to reach its full potential.

Tapping the Potential for Agile, Virtual, Network and Cloud Topologies

You always hear about service agility as an NFV goal these days.  Part of the reason is what might cynically be called “a flight from proof”; the other benefits touted for NFV have proven to be difficult to validate or to size.  Cynicism notwithstanding, there are valid reasons to think that agility at the service level could be a positive driver, and there are certainly plenty who claim it.  I wonder, though, if we’re not ignoring a totally different kind of agility in our discussions—topology agility.

For most vendors and operators, service agility means reducing time to revenue.  In the majority of cases the concept has been applied specifically to the provisioning delay normally associated with business services, and in a few cases to the service planning-to-deployment cycle.  The common denominator for these two agility examples is that they don’t necessarily have a lot to do with NFV.  You can achieve them with composable services and agile CPE.

If we step back to a classic vision of NFV, we’d see a cloud of resources deployed (as one operator told me) “everywhere we have real estate.”  This model may be fascinating for NFV advocates, but proving out that large a commitment to NFV is problematic when we don’t really even have many service-specific business cases.  Not to mention proof that one NFV approach could address them all.  But the interesting thing about the classic vision is that it would be clearly validated if we presumed that NFV could generate an agile network topology, a new service model.  Such a model could have significant utility, translating to sales potential, even at the network level.  It could also be a way of uniting NFV and cloud computing goals, perhaps differentiating carrier cloud services from other cloud providers.

Networks connect things, and so you could visualize a network service as being first and foremost a “connection network” that lets any service point on the service exchange information with any other (subject to security or connectivity rules inherent in the service).  The most straightforward way of obtaining full connectivity is to mesh the service points, but this mechanism (which generates n*(n-1)/2 paths) would quickly become impractical if physical trunks were required.  In fact, any “trunk” or mesh technology that charged per path would discourage this approach.  The classic solution has been nodes.
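The arithmetic behind that objection to meshing is worth making concrete; a few lines show how quickly the path count runs away:

```python
# Quick arithmetic behind the full-mesh objection: meshing n service
# points needs n*(n-1)/2 paths, while hanging them all off a single
# node needs only n.

def mesh_paths(n):
    """Number of point-to-point paths in a full mesh of n endpoints."""
    return n * (n - 1) // 2

for n in (5, 50, 500):
    print(n, "endpoints:", mesh_paths(n), "mesh paths vs", n, "via a node")
# 5 endpoints need 10 mesh paths; 500 need 124,750 — hence nodes.
```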

A network node is an intermediate point of traffic concentration and distribution that accepts traffic from source paths and delivers it to destination paths.  For nodes to work, each node has to understand where service points are relative to each other and to the nodes, which means some topology-aware forwarding process.  In Ethernet it’s a bridging approach, in IP it’s routing, and in SDN it’s a central topology map maintained by an SDN controller.  Nodes let us build a network that’s not topologically a full mesh but still achieves full connectivity.

Physical-network nodes are where trunks join, meaning that the node locations are linked to the places where traffic paths concentrate.  Virtual network nodes that are based on traditional L2/L3 protocols are built by real devices and thus live in these same trunk-collection locations.  The use of tunneling protocols, which essentially create a L1/L2 path over an L2/L3 network, can let us separate the logical topology of a network from the physical topology.  We’d now have two levels of “virtualization”.  First, the service looks like a full mesh.  Second, the virtual network that creates the service looks like a set of tunnels and tunnel-nodes.  It’s hard to see why you’d have tunnel nodes where there was no corresponding physical node, but there are plenty of reasons why you could have a second-level virtual network with virtual nodes at only a few select places.  This is what opens the door for topology agility.

Where should virtual nodes be placed?  It depends on a number of factors, including the actual service traffic pattern (who talks to whom, and how much?) and the pricing mechanism applied.  Putting a virtual node in a specific place lets you concentrate traffic at that point and distribute from that point.  Users close to a virtual node have a shorter network distance to travel before they can be connected with a partner on that same node.  Virtual nodes can also be used to aggregate traffic between regions to take advantage of transport pricing economies of scale.  In short, they can be nice.
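The placement decision can be framed as a tiny optimization. The sketch below picks the candidate site that minimizes traffic-weighted distance to users; the sites, coordinates, and traffic volumes are all invented, and a real placement would also weigh transport pricing and capacity:

```python
# Toy 1-median placement: given candidate sites and per-user traffic
# volumes, pick the virtual-node site minimizing traffic-weighted
# distance. Data is illustrative only.

users = {"A": (0, 0, 10.0), "B": (10, 0, 2.0), "C": (0, 8, 3.0)}  # x, y, traffic
sites = {"east-pop": (9, 1), "metro-pop": (1, 1)}

def cost(site):
    """Total traffic-weighted distance from all users to this site."""
    sx, sy = sites[site]
    return sum(t * ((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
               for x, y, t in users.values())

best = min(sites, key=cost)
print(best)  # → metro-pop: the heavy traffic at A pulls the node its way
```

Rerun this with evening traffic weights and the winner can change, which is exactly the time-of-day repositioning argument made below.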

They can also be in the wrong place at any given moment.  Traffic patterns change over each day and through a week, month, or quarter.  Some networks might offer different prices for evening versus daytime use, which means price-optimized virtual-node topologies might have to change by time of day.  Some traffic might even want a different structure than other traffic: “TREE” or multicast services typically “prune” themselves for efficient distribution, minimizing both the generation of multiple copies of packets and delivery to network areas where no users are receiving the multicast.

NFV would let you combine tunnels and virtual nodes to create any arbitrary topology and to change topology at will.  It would enable companies to reconfigure VPNs to accommodate changes in application topology, like cloudbursting or failover.  It could facilitate the dynamic accommodation of cloud/application VPNs that have to be linked to corporate VPNs, particularly when the nature of the linkage required changed over time to reflect quarterly closings or just shifting time zones for users in their “peak period.”

This has applications for corporate VPNs but also for provider applications like content delivery.  Agile topology is also the best possible argument for virtualizing mobile infrastructure, though many of the current solutions don’t exploit it fully.  If you could place signaling elements and perhaps even gateways (PGW, SGW, and their relationships) where current traffic demanded, you could respond to unusual conditions like sporting or political events and even traffic jams.

These applications would work either with SDN-explicit forwarding tunnels or with overlay tunnels of the kind used in SD-WAN.  Many of the vendors’ SDN architectures that are based on overlay technology could also deliver this sort of capability; what’s needed is either a capability to deliver a tunnel as a virtual wire to a generic virtual switch or router, or a virtual router or switch capability included in the overlay SDN architecture.

Agile topology services do present some additional requirements that standards bodies and vendors would have to consider.  The most significant is the need to decide where you want to exercise your agility, and what should trigger changes.  Networks are designed to adapt to conditions, but roaming nodes and trunks aren’t the sort of thing early designers considered.  To exploit agility, you’d need to harness analytics to decide when something needed to be done, and then reconfigure things to meet the new conditions.
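One simple form that analytics trigger could take is a rolling-average drift detector; the thresholds, window size, and readings below are invented for illustration:

```python
# Hedged sketch of an analytics trigger: watch a rolling traffic average
# and flag a topology review when load drifts far from the baseline the
# current node placement was engineered for.

from collections import deque

class TopologyTrigger:
    def __init__(self, baseline, window=5, tolerance=0.5):
        self.baseline = baseline
        self.samples = deque(maxlen=window)
        self.tolerance = tolerance

    def observe(self, load):
        """Return True when the rolling average drifts past tolerance."""
        self.samples.append(load)
        avg = sum(self.samples) / len(self.samples)
        return abs(avg - self.baseline) / self.baseline > self.tolerance

trig = TopologyTrigger(baseline=100.0)
readings = [100, 105, 180, 220, 260]
fired = [trig.observe(r) for r in readings]
print(fired)  # → [False, False, False, True, True]
```

The averaging matters: a single spike shouldn’t roam your nodes around, but a sustained shift should.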

Another requirement is the ability to control the way topology changes are reflected in the network dynamically, to avoid losing packets during a change.  Today’s L2/L3 protocols will sometimes lose packets during reconfiguration, and agile topologies should at minimum do no worse.  Coordinating the establishment of new paths before decommissioning old ones isn’t rocket science, but it is something that’s not typically part of network engineering.
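That make-before-break coordination is simple in miniature; the sketch below (with an invented, trivially simplified "network" that is just a set of live paths) shows the ordering that matters:

```python
# Make-before-break in miniature: the new path is established and
# verified before the old one is torn down, so there is never a moment
# with no forwarding path.

live_paths = {"old-route"}

def establish(path):
    live_paths.add(path)
    return path in live_paths          # stand-in for a verification probe

def decommission(path):
    live_paths.discard(path)

def reconfigure(old, new):
    """Swap paths without a window in which neither is up."""
    if establish(new):                 # bring up the new path first...
        decommission(old)              # ...and only then retire the old
    assert live_paths                  # never empty mid-change

reconfigure("old-route", "new-route")
print(live_paths)  # → {'new-route'}
```

Reverse the two steps inside reconfigure and you get exactly the transient packet loss the paragraph above warns about.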

Perhaps the biggest question raised by agile-topology services is whether the same thing will be needed in the cloud overall.  If the purpose of agile topology is to adapt configuration to demand changes, it’s certainly something that’s valuable for applications as well as for nodes—perhaps more so.  New applications like a logical model of IoT could drive a convergence of “cloud” with SDN and NFV, but even without it it’s possible operators would see competitive advantages in adding agile-topology features to cloud services.

The reason agile topology could be a competitive advantage for operators is that public cloud providers are not typically looking at highly distributed resource pools, but at a few regional centers.  The value of agile topology is greatest where you have a lot of places to put stuff, obviously.  If operators were able to truly distribute their hosting resources, whether to support agile-node placement or just NFV, they might be able to offer a level of agility that other cloud providers could never hope to match.

The challenge for topology agility is those “highly distributed resource pools”.  Only mobile infrastructure can currently hope to create enough distributed resources to make agile topology truly useful, and so that’s the place to watch.  As I said, today’s virtual IMS/EPC applications are only touching the edges of the potential for virtualization of mobile topology, and it’s hard to know how long it will take for vendors to do better.

Can Cisco Succeed with an SDN-and-NFV-less Transformation Model?

Cisco has always been known for aggressive sales strategies and cynical positioning ploys.  Remember the day of the “five phase plan” that was always in Phase Two when it was announced (and that never got to Phase Five)?  When SDN and NFV came along, Cisco seemed to be the champion of VINO, meaning “virtualization in name only”.  Certainly Cisco Live shows that the company is taking software, virtualization, the cloud, and APIs more seriously, but it doesn’t answer the question of whether Cisco’s emerging vision takes things far enough.

Cisco’s approach to networking, embodied in things like Application-Centric Infrastructure (ACI) and Digital Network Architecture (DNA), has been to use a combination of policy management and software control through enhanced APIs to create as many of the benefits of SDN and NFV as possible without actually mandating a transition from current switch/router technology.

ACI and DNA may seem like meaningless “five phase plan” successors, another round of cynical positioning, but they’re not.  They are specific defenses of the status quo from a company that benefits more from that status quo than others do.  They’re also an exploitation of what Cisco knows from experience is likely to be a highly visible but largely ill-planned and ineffective initiative to change things.

Obviously, the risk of VINO is that any true benefits of infrastructure transformation are lost.  However, that risk is relevant only if those kinds of benefits are demonstrably realizable.  Very early in the SDN/NFV game, operators and enterprises found that capex reduction wouldn’t deliver any real infrastructure-transformation benefits.  Beat the vendors up on price instead!  But not much later, everyone realized that operations and agility benefits could be truly compelling.  I think Cisco, at this point, started to shift their general ACI positioning from “software control” to “software automation”, emphasizing the importance of software in reducing opex and bringing services to market faster.  DNA, as the later architecture, shows more of that view.

The truly interesting flex point came along within the last year, when it became increasingly clear that you not only could gain significant opex economy and service agility through operations automation, you probably should.  My own modeling shows that you can create a bigger impact on network operator profits with operations automation than with infrastructure transformation, do it sooner, and present a better ROI along the way.  Maybe I’m reading things into the Cisco event speeches, but I think Cisco may now be accepting this shift, and looking to capitalize on it.

Operations automation implemented in an open way should be able to control the service lifecycle independent of the details of infrastructure.  Yes, that could aid the transition to SDN or NFV or both, but it could also be used simply to improve operations on legacy infrastructure.  That would play to what Cisco wanted all along—some way of improving operator profit per bit that didn’t involve shifting to a new set of network technologies that Cisco wasn’t the incumbent for.

You could also argue that it could play into the Tail-f acquisition, which gives Cisco a way of managing multi-vendor networks.  Earlier this month, Cisco won a deal with SDN/NFV thought-leader Telefonica to use Cisco’s Network Services Orchestrator (NSO) for business services.  This product is derived from Tail-f’s technology and built on the YANG modeling language.  In a real sense, NSO is a kind of superset of the NFV Virtual Infrastructure Manager and NFV Orchestrator rolled into one.  What it does, and what I’m sure Cisco intended it to do, is let operators orchestrate/automate legacy configurations just as NFV MANO and other tools would do for NFV.

Which is the challenge Cisco now faces.  In fact, the move generates several challenges to Cisco’s positioning and ultimate success.

The most obvious challenge is that a “Network Service Orchestrator” will have to orchestrate SDN and NFV as well as legacy technology.  Cisco will have to let the new-infrastructure camel at least blow some sand under the tent, if not actually get the nose in.  If compelling SDN/NFV business cases could be made (which so far has not happened, but could happen) then Cisco might end up facilitating the very transition it’s been trying to position against.

This challenge leads into the second challenge, which is a fast start to achieve thought and deployment leadership.  Cisco has a credible NFV Orchestration product in NSO, as a recent report on NFV MANO from Current Analysis shows.  The problem is that NFV orchestration isn’t a business case, it’s just a way of making NFV work.  If Cisco’s goal is to fend off NFV transition it first has to make it clear that NSO opens an alternative path to benefits, then convince operators to take that path and prove out its validity.

Meeting these challenges, in my view, means making a direct connection between Cisco’s architectures (ACI, DNA) and products (NSO) and service, network, and operations automation.  I think some of that came out in the announcements and talks at the Cisco event, but this isn’t something you can win by blowing kisses.  NSO is delivering value below the OSS/BSS, and it’s a single-level model at a time when operators are recognizing the need for multi-layered orchestration.  Other vendors have a broader, better, story in that area than Cisco.

Better-ness equates to the ability to make a compelling near-term business case for software automation of operations.  NSO and YANG evolve from an initiative to fix SNMP management by creating CLI-like flexibility in a standardized way.  NETCONF is the protocol that operates on YANG models, and it’s ideal for network device management, particularly in multi-vendor environments.  As an operations modeling language it is, in my view, sub-optimal.  I know of no operator who likes the idea of doing cloud or OSS/BSS orchestration using YANG.  TOSCA, the OASIS cloud standard, is the emerging choice, in fact.  Cisco has to either prove that YANG is a good choice for generalized multi-layer service orchestration or explain where those other layers and those other kinds of orchestration come from.

Ciena, Cloudify, and others have provided some good insight into how TOSCA and YANG, for example, could relate.  Some operator architectures for NFV also suggest a symbiotic application of these technologies.  For Cisco to get its approach going, it needs to lay out this kind of approach and make it an official Cisco policy.  But it also has to tie this to benefits.  Operators I’ve talked with have been continually frustrated by the lack of insight vendors have into operations efficiency and agility benefits.  Vendors don’t know what the current costs are, how any specific approach would target them, or how the targets could be validated.  Most of the “validations” or “use cases” so far have been inside NFV lab activities, where the specific goal isn’t operations automation at all and where most of the interesting stuff has been declared out of scope.

Cisco is making several critical bets here.  First, they’re betting that SDN and NFV will not get their acts together (a fairly safe bet as I’d see it).  Second, they’re betting that they can deliver meaningful operations automation at the network level, meaningful enough to drive adoption of the Cisco NSO model without much operations integration elsewhere.  Third, they’re betting that nobody delivers operations automation from the top down and cuts off Cisco’s NSO layer.  Neither of these last two bets is particularly good, and so Cisco is going to have to figure out a way of taking a lead in operations automation and service agility that sticks its nose outside the network and beyond the influence of Cisco’s typical buyers.  A shift in attitude, which we may be seeing, is important to reaching that goal, but it won’t be enough.  Cisco has to step up and make the business case like everyone else.