The “New ONF” Declares a Critical Mission, but Can They Fulfill It?

Yesterday the “New ONF” formed by the union of the old ONF and ON.Labs announced its new mission and its roadmap to achieving it.  I’m a guy who has worked in standards for well over two decades, and the experience has made me perhaps more cynical about standards than I am about most things (which, most of my readers will agree, is pretty darn cynical).  The new ONF actually excites me by stating a goal set and some key points that are spot on.  It also frightens me a little because there’s still one thing that the new group is doing that has been a major cause of failure for all the other initiatives in the service provider transformation space.

The “new ONF” is the union of the Open Networking Foundation and ON.Labs, the organization that created the ONOS operating system and CORD, both of which I’ve talked about in the past.  I blogged about the importance of CORD early on (see THIS blog) and again when Comcast jumped into the consortium, HERE, and everyone probably knows that the ONF is the parent of OpenFlow SDN.  The new ONF seems more focused on the ON.Labs elements, from which they hope to create a way to use software-based or software-defined elements to build market-responsive networks and network services.

Networks of old were a collection of boxes joined by very standardized hardware interfaces.  Then, enter virtualization, software definition, the cloud, and all the other good stuff that’s come along in the last decade.  Each of these new initiatives had/have their champions in terms of vendors, buyers, and standardization processes.  Each of these initiatives had a very logical mission, and a logical desire to contain scope to permit timely progress.  Result?  Nothing connects in this wonderful new age.

This is a perhaps-flowery restatement of the opening positioning that the ONF offers for its new concept of the Open Innovation Pipeline.  The goal of the process is the notion of the “Software-Defined Standard”, something that by itself brings tears to the eyes of an old software architecture guy like me.  We’ve gone way too far along the path of supposed software-defined stuff with little apparent concern for software design principles.  The ONF says they want to fix that, which has me excited.

Digging into the details, what the ONF seems to be proposing is the creation of an open ecosystem that starts (at least in many cases) with the ONOS operating system, on which is added the XOS orchestration layer (which is a kind of service middleware).  This is used to build the variety of CORD models (R-CORD, M-CORD, etc.), and it can also be used to build new models.  If this approach were to be followed, it would create a standardized open-source platform that builds from the bottom to the top, and that provides for easy customization and integration.

But it’s at the top of the architectural heap that I find what makes me afraid.  The architectural slide in all of this shows the open structure with a programmable forwarding plane at the bottom, a collection of Global Orchestrators at the top, and the new ONF focus as a box in between.  This vision is of course device-centric, and in the real world you’d be assembling conforming boxes and presumably other boxes, virtual or real, to create networks and services.  I don’t have a problem with the idea that there’s a forwarding plane at the bottom, because even service elements that are outside the service data plane probably have to forward something.  I’m a bit concerned about that Global Orchestrator thing at the top.

I’ve been a part of a lot of standards processes for decades, and it seems like all of them tend to show a diagram that has some important function sitting god-like at the top, but declared safely out of scope.  That’s what the ONF has done with those Global Orchestrators.  The problem with those past bodies and their past diagrams is that all of them failed their critical mission to make the business case, and all of them failed because they didn’t include elements that were critical to their business case in their scope of work.  So the fact that the ONF seems to do this is discouraging.

The ONF is right in saying that there’s an integration problem with the new-generation virtualization-based services.  They are also right in saying that a common implementation platform, on which the elements of those new services are built, would solve that problem.  However, the past says that’s not enough, for two reasons.

First, not everything is built on the ONF’s architecture.  Even if we presumed that everything new was built that way, you would still have to absorb all the legacy hardware and accommodate the open-source initiatives for other virtualized-element models, none of which are based on the ONF’s elements.  We have learned the bitter truth in NFV in particular: you can’t exclude the thing you are evolving from (legacy devices in particular) from your model of a future service, unless you never want to get there from here.  You could accommodate the legacy and “foreign” stuff in the ONF approach, but the details aren’t there yet.

Second, there’s the issue of the business case.  I can have a wonderful architecture for building standardized car parts, but it won’t do me a whit of good if nobody wants to buy a car.  I’ve blogged a lot about the business case behind a new virtual service element—SDN, NFV, or whatever you like.  Most of that business case is going to come from the automation of the full service lifecycle, and most of that lifecycle and the processes that automate it live in that Global Orchestrators element that’s sitting out of scope on top of the ONF target functionality.

All of this could be solved in a minute with the inclusion of a model-based service description of the type I’ve been blogging about.  I presented just this notion to the ONF, in fact, back in about 2014.  A model like that could organize all of the pieces of ONF functionality, and it could also organize how they relate to the rest of the service processes, whether they’re NFV processes, OSS/BSS processes, or cloud computing.  Yes, this capability would be in a functional Global Orchestrator, but none are actually available today; we know that because nobody has successfully made the business case with one, nor integrated all the service lifecycle processes.

There is a modeling aspect to the XOS layer, and it’s got all the essential pieces, as I said in my first blog on it (see above).  However, in execution, XOS seems to have changed its notion of “service” from a high-level one to something more like the TMF’s “Resource-Facing Services” or my ExperiaSphere “Behaviors”.  They’re what a network or infrastructure can do, more than a functional assembly that when decomposed ends up with these infrastructure capabilities.  That seems to be what created the Global Orchestrator notion; the lost functionality is pushed up into the out-of-scope part.  That’s what frightens me, because it’s the mistake that so many others have made.

I’m not knocking the new ONF here, because I have high hopes for it.  They, at least, seem to grasp the simple truth that software defined stuff demands a definition of stuff in software terms.  I also think that, at a time when useful standards to support integration in SDN and NFV seem to be going nowhere, the notion of a common platform seems unusually attractive.  Is it the best approach?  No, but it’s a workable one, which says a lot at this point.

There has been a lot of recent re-launching of standards bodies, industry groups, and activities, brought about because their original efforts generated interest, hype, and media extravagance, but not much in the way of deployment or transformation.  The new ONF now joins the group of industry mulligans, and the question is whether it will jump off what’s unquestionably a superior foundation and do the right thing, or provide us with another example of how to miss the obvious.  I’ll offer my unbiased view on that as the details of the initiative develop.

What Will Become of Test and Measurement in a Virtual World?

One of the joke statements of the virtual age is “You can’t send a real tech to fix a virtual problem!”  Underneath the joke is a serious question, which is just what happens to test and measurement in a virtual world?  Virtualization opens two issues—how do you test the virtual processes and flows, and how does virtualization impact T&M’s usual missions?  We’ll look at both today.

Test and measurement (T&M) differs from “management” in that the latter focuses on the ongoing status of things and the reporting of changes in status.  Management is status monitoring and not network monitoring.  T&M, in contrast, aims at supporting the “craft processes” or human activity that’s associated with taking a refined look at something—is it working, and how well—with the presumptive goal of direct remediation.

Many people, including me, remember the days when resolving a network problem involved looking at a protocol trace, and that practice is a good place to start our exploration.  Whether you have real or virtual devices, the data flows are still there and so are the issues of protocol exchanges.  However, a virtual device is fundamentally different from a real one, and the differences have to be accommodated in any realistic model of T&M for the virtual age.

There’s an easy-to-see issue that we can start with.  A real device has a location.  A virtual device has one too, in the sense that it’s hosted somewhere, but the hosting location isn’t the same thing as the location of a box.  A box is where it is; a virtual router instance is where it was convenient to put it.  At the least, you’d have to determine where an instance was being hosted before you could run out and look at it.  But that initial check of location isn’t enough in a virtual world.  Imagine a tech dispatched to stick a monitor in a virtual router path, only to find that while en route, the virtual router “moved”.  It’s common to have a soft collision between management-driven changes in a network and remediation, but in the traditional world the boxes at least stay put.  T&M in a virtual world has to deal with the risk of movement of the instance while the tech is setting up or during the test.

Simplicity is comforting even when it’s not quite as simple as it looks, but this simple point of “where is it?” isn’t the real problem.  If software automation to improve opex is the goal (which operators say it is) for virtualization, then we’d have to assume that the goal is to move away from “T&M” to “management”, since the former is explicitly a human activity.  That means that in the future, not only would it be more likely that a virtual router got moved, it would be likely that if there were a problem with it the first goal would be to simply replace it—an option that’s fine if you’re talking about a hosted software element but problematic if you’re dealing with a real box.  So, we’re really saying that virtualization first and foremost alters the balance between management and T&M.

When do you send a tech, or at least involve a tech?  The only satisfactory answer in a time when opex reduction is key is “When you’ve exhausted all your other options.”  One operator told me that their approach was something like this:

  1. If there’s a hard fault or an indication of improper operation, you re-instantiate and reroute the service as needed. It’s like saying that if your word processor is giving you a problem, save and reload it.
  2. If the re-instantiation doesn’t resolve things, you check to see if there was any change to software versions in the virtual device or its platform that, in timing, seems possibly related to the issue. If so, you roll back to the last configuration that worked.
  3. If neither of these steps resolves things, or if they’re not applicable, then you have to try remediation. The operator says that they’d first try to reroute or redeploy the service around the whole faulty function area and then try to recreate the problem in a lab under controlled conditions.  If that wasn’t possible they’d assume T&M was needed.
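To make that escalation ladder concrete, here’s a rough Python sketch of the logic the operator described.  The function and state names are my own illustrative assumptions, not the operator’s actual tooling.

```python
from enum import Enum, auto

class Resolution(Enum):
    REINSTANTIATED = auto()
    ROLLED_BACK = auto()
    LAB_RECREATION = auto()
    TM_REQUIRED = auto()          # only now does test & measurement get involved

def handle_fault(service, fault):
    """Escalation ladder: cheapest automated action first, T&M (and a tech) last."""
    # Step 1: re-instantiate and reroute; the virtual "save and reload".
    if service.reinstantiate_and_reroute(fault):
        return Resolution.REINSTANTIATED

    # Step 2: look for a software/platform change whose timing suggests a link,
    # and roll back to the last configuration that worked.
    change = service.recent_change_near(fault.timestamp)
    if change is not None and service.rollback(change):
        return Resolution.ROLLED_BACK

    # Step 3: reroute or redeploy around the whole faulty function area, then
    # try to recreate the problem in a lab under controlled conditions.
    service.reroute_around(fault.scope)
    if service.recreate_in_lab(fault):
        return Resolution.LAB_RECREATION

    # Only if all of that fails is T&M actually needed.
    return Resolution.TM_REQUIRED
```

The point of the ordering is simply that T&M, and certainly a truck roll, is the last resort rather than the first reflex.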

The same operator says that if we assumed a true virtual network, the goal would be to avoid dispatching a tech in favor of some kind of testing and monitoring from the network operations center (NOC).  The RMON specification from the IETF can be implemented in most real or virtual devices, and there are still a few companies that use hardware or software probes of another kind.  This raises the question of whether you could do T&M in a virtual world using virtual monitoring and test injection, which would eliminate the need to dispatch someone to hook up an analyzer.  A “real” dispatch would be needed only if there were a hardware failure of some sort on site, or a situation where a manual rewiring of the network connections of a device or server was needed.

One advantage of the virtual world is that you could instantiate a monitoring point as software somewhere convenient, and either connect it to a “T” you kept in place at specific locations, or cross-connect by rerouting.  The only issue with this approach is the same one you can run into with remote monitoring today—the time delay introduced between the point of “tapping” the flow and the point of viewing the monitoring data.  However, if you aren’t doing test injection at the monitoring point the issues should be minimal, and if you are, you’d need to install a more sophisticated remote probe that could execute trigger-based responses locally.
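As a toy illustration of that software “T”, here’s a minimal Python relay that forwards a single TCP flow while mirroring it to a remote monitor.  The addresses are made up, and a production tap would of course operate at the packet level rather than as a TCP relay, but it shows how little is needed to stand up a monitoring point wherever it’s convenient.

```python
import socket
import threading

def pipe(src, dst, monitor):
    """Copy bytes from src to dst, mirroring every chunk to the monitor socket."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
        try:
            monitor.sendall(data)    # best-effort: monitoring must never break the flow
        except OSError:
            pass

def virtual_tap(listen_port, forward_addr, monitor_addr):
    """Accept one connection, forward it, and mirror both directions to a monitor."""
    listener = socket.create_server(("", listen_port))
    client, _ = listener.accept()
    upstream = socket.create_connection(forward_addr)
    monitor = socket.create_connection(monitor_addr)
    threading.Thread(target=pipe, args=(client, upstream, monitor), daemon=True).start()
    pipe(upstream, client, monitor)

if __name__ == "__main__":
    # Hypothetical addresses; the tap itself could be instantiated anywhere convenient.
    virtual_tap(9000, ("10.0.0.2", 9000), ("10.0.0.99", 5000))
```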

Another aspect of “virtual T&M” is applying T&M to the control APIs and exchanges associated with SDN or NFV.  This has been a topic of interest for many of the T&M vendors, and certainly the failure of a control or management path in SDN or NFV could present a major problem.  Operators, in fact, are somewhat more likely to think they need specialized T&M support for control/management exchanges in SDN and NFV than in the service data path.  That’s because of expected issues with integration among the elements at the control/management protocol level.

Most of the technology and strategy behind virtual T&M is the same whether we’re talking about the data path or the control/management plane.  However, there are profound issues of security and stability associated with any monitoring or (in particular) active intervention in control/management activity.  We would assume that T&M would have to live inside the same security sandbox as things like an SDN controller or NFV MANO, to ensure nothing was done to compromise the mass of users and services that could be represented.

Overall, the biggest impact of virtualization trends on T&M is the fact that a big goal for virtualization is service lifecycle automation.  If that’s taken seriously, then more of what T&M does today would migrate into a management function that generated events to drive software processes, not technicians.  In addition, the T&M processes related to device testing are probably far less relevant in an age where the device is virtual and can be reinstantiated on demand.  But virtualization also lets T&M create what is in effect a virtual technician because it lets you push a probe and test generator anywhere it’s needed.  Will the net be positive or negative?  I think that will depend on how vendors respond to the challenge.

Could Modeling Be the Catalyst for OSS/BSS Transformation?

I can vividly recall one of my early telco transformation meetings.  It was just after NFV had launched, but before any real work had been done.  At the meeting, two of the telco experts sitting next to each other expressed their views on OSS/BSS.  One wanted to transform it, retaining as much as possible of the current systems.  The other wanted to start over.  This polarized view of OSS/BSS futures, it turned out, was fairly pervasive among operators and it’s still dominant today.

The notion of transforming OSS/BSS has a long history, going back more than a decade in fact.  The first transformational notion I saw was the TMF’s NGOSS Contract work, something I’ve cited often.  This was an early attempt to reorganize operations processes into services (SOA, at the time) and to use the contract data model to steer service events to the right process.  This, obviously, was the “event-driven OSS/BSS” notion, and also the “service-based” or “component-based” model.

We sort of did services and components, but the event-driven notion has been harder to promote.  There are some OSS/BSS vendors who are now talking about orchestration, meaning the organization of operations work through software automation, but not all orchestration is event-driven (as we know from the NFV space and the relatively mature area of DevOps for software deployment).  Thus, it would be interesting to think about what would happen should OSS/BSS systems be made event-driven.  How would this impact the systems, and how would it impact the whole issue of telco transformation?

We have to go back, as always, to the seminal work on NGOSS Contract to jump off into this topic.  The key notion was that a data model coupled events to processes, which in any realistic implementation means that the OSS/BSS is structured as a state/event system with the model recording state.  If you visualized the service at the retail level as a classic “black box” or abstraction, you could say that it had six states: Orderable, Activating, Active, Terminating, Terminated, and Fault.  An “order” event transitions it to the Activating state, and a report that the service is properly deployed would transition it to the Active state.  Straightforward, right?  In each state, there’s a key event that represents its “normal” transition driver, and there’s also a logical progression of states.  All except “Fault”, of course, which would presumably be entered on any report of an abnormal condition.
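Here’s a minimal sketch of that state/event structure in Python.  The six states come from the paragraph above; the events and process names are placeholders of mine, not anything defined in GB942.

```python
# The retail-level service as a state/event machine.  The table maps
# (current state, event) to the operations process to run and the next state.
STATE_EVENT_TABLE = {
    ("Orderable",   "Order"):     ("start_deployment", "Activating"),
    ("Activating",  "Deployed"):  ("start_billing",    "Active"),
    ("Active",      "Terminate"): ("start_teardown",   "Terminating"),
    ("Terminating", "TornDown"):  ("close_account",    "Terminated"),
}

def dispatch(service, event, processes):
    """Steer an event via the table; any abnormal combination drops into Fault."""
    key = (service["state"], event)
    if key not in STATE_EVENT_TABLE:
        service["state"] = "Fault"
        return processes["raise_fault"](service, event)
    process_name, next_state = STATE_EVENT_TABLE[key]
    processes[process_name](service, event)    # run the associated operations process
    service["state"] = next_state
```

The table, not the code, carries the service logic, which is exactly what makes the contract data model the steering mechanism.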

You can already see this is too simplistic to be useful, of course.  If the service at the retail level is an abstract opaque box, it can’t be that at the deployment level in most cases.  Services have access and transport components, features, and different vendor implementations at various places.  So inside our box there has to be a series of little subordinate boxes, each of which represents a path along the way to actually deploying.  Each of these subordinates is connected to the superior in a state/event sense.

When you send an Order event to a retail service, the event has to be propagated to its subordinates so they are all spun up.  Only when all the subordinates have reported being Active can you report the service itself to be Active.  You can see that the state/event process also synchronizes the cooperative tasks that are needed to build a service.  All of this was implicit in the NGOSS Contract work, but not explained in detail in the final documents (GB942).
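A sketch of that propagation and synchronization, again with illustrative names only:

```python
def deploy(subordinate):
    """Placeholder for whatever actually commits resources for this piece."""
    subordinate["state"] = "Activating"

def propagate_order(service):
    """An Order event at the retail level spins up every subordinate."""
    service["state"] = "Activating"
    for sub in service["subordinates"]:
        deploy(sub)

def on_subordinate_active(service, sub):
    """A subordinate reports Active; the service goes Active only when all have."""
    sub["state"] = "Active"
    if all(s["state"] == "Active" for s in service["subordinates"]):
        service["state"] = "Active"   # now, and only now, the service itself is Active
```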

Operations processes, in this framework, are run in response to events.  When you report an event to a subordinate (or superior) component of a service, the state that component is in and the event itself combine to define the processes to be run.  The way that an OSS/BSS responds to everything related to a service is by interpreting events within the state/event context of the data models for the components.

This approach contrasts with what could be described as the transactional or workflow approach that has been the model for most business applications, including most OSS/BSS.  In a transactional model, operations tasks are presumed to be activated by something (yes, we could think of it as an event) and once activated the components will then run in a predefined way.  This is why we tend to think of OSS/BSS components like “Order Management” or “Billing”; the structure mirrors normal business software elements.

To make the OSS/BSS operate as an event-driven system, you need to do three things.  First, you need a data model that defines a service and its subordinate elements in a structured way, so that each of the elements can be given a specific state/event table to define how it reacts to events.  Second, you need events for the system to react to, and finally you need to have OSS/BSS processes defined as services or components that can be invoked from the intersection of current state and received event, in any given state/event table.

Most OSS/BSS systems are already modular, and both operators and vendors have told me that there’s little doubt that any of them could be used in a modular-as-service way.  Similarly, there are plenty of business applications that are event-driven, and we have all manner of software tools to code conditions as events and associate them with service models.  What we lack, generally, are the models themselves.  It’s not that we don’t have service modeling, but that the models rarely have state/event tables.  Those would have to be authored as part of service-building.

You can see from this description that the process of modernizing OSS/BSS based on NGOSS-Contract state/event principles is almost identical to the process of defining virtualized function deployments as described by the NFV ISG, or the way that AT&T’s ECOMP proposes to build and manage services.  That has three important consequences.

First, it would clearly be possible to organize both SDN/NFV service lifecycle management and OSS/BSS modernization around the same kind of model, meaning of course that it could be the same model.  Properly done, a move in one space would move you in the other, and since automation of both operations and the lower-level lifecycle management processes are essential for opex efficiency and service agility, the combined move could meet transformation goals.

Second, the model could be defined either at the OSS/BSS level or “below” that, perhaps as independent NFV orchestration.  From wherever it starts, it could then be percolated up or down to cover the other space.  Anyone in the OSS/BSS space, the SDN/NFV space, or the DevOps/orchestration space could play this role.

Third, this level of model-driven integration of operations processes with service and resource management processes at the lower level isn’t being touted today.  We see services and service modeling connected to OSS/BSS, presumably through basic order interfaces.  If that’s accidental, it seems to suggest that even advanced thinkers in the vendor and operator communities aren’t thinking about full-scope service automation.  If it’s deliberate, then it isolates operations modernization from the service modeling and orchestration trends, which in my view would marginalize OSS/BSS and hand victory to those who wanted to completely replace it rather than modernize it.

That returns us to those two people at the meeting table, the two who had diametrically opposed views of the future of OSS/BSS.  Put in the terms of the modeling issue we’ve been discussing here, the “modernize” view would favor incorporating OSS/BSS state/event handling into the new service automation and modeling activity that seems to be emerging in things like ECOMP.  The “trash it and start over” view says that the differences in the role of OSS/BSS in a virtual world are too profound to be accommodated.

My own view falls between these two perspectives.  There are a lot of traditional linear workflows involved in OSS/BSS today, and many of them (like billing) really don’t fit a state/event model.  However, the old workflow-driven thinking doesn’t match cloud computing trends, distributed services, and virtualization needs.  What seems to be indicated (and which operators tell me vendors like Amdocs and Netcracker are starting to push) is a hybrid approach where service management as an activity is visualized as a state/event core built around a model, and traditional transactional workflow tasks are spawned at appropriate points.  It’s not all-or-nothing, it’s fix-what’s-broken.

Or, perhaps, it’s neither.  The most challenging problem with OSS/BSS modernization, and with the integration of OSS/BSS with broader virtualization-driven service management, is the political one created by the organization of most operators.  So far, SDN and NFV have been CTO projects.  OSS/BSS is a CIO domain, and there is usually a fair degree of tension between these two groups.  Even where the CIO organization has a fairly unanimous vision of OSS/BSS evolution (in the operator I opened this blog with, both views on operations evolution were held within the CIO organization) there’s not much drive so far to unite that vision with virtualization at the infrastructure level.

Could standardization help this?  The standards efforts tend to align along these same political divides.  The TMF is the go-to group for CIO OSS/BSS work, and the CTO organizations have been the participants in the formal bodies like the NFV ISG.  Underneath it all is the fact that all these activities rely on consensus, which has been hard to come by lately as vendor participants strive for competitive advantage.  We may need to look to a vendor for the startling insights needed.  Would we have smartphones today without Steve Jobs, if a standards process had to create them?  Collective insight is hard, and we’ve not mastered it.

Could We Unify CORD and ECOMP to Accelerate Infrastructure Transformation?

If you like the idea of somehow creating a union between CORD and ECOMP then the next obvious question is just where that union has to start.  The answer, in my view, isn’t a place where both architectures contribute something that could be united, but one where neither does enough and an external unifying force is essential.  That’s the notion of modeling, not of resources but of functions.

In my last blog, I noted that integration depends on the ability to freely substitute different implementations of the same function without changing the service definitions or the management practices.  To make that happen, you need to have some Platonic shapes that define all the functions you intend to use in composing services…or even applications.  Each of these models then represents the “look and feel” of the function as seen from above.  The vendors who want to contribute those functions are responsible for building downward from the abstract model to make sure what they do fits seamlessly.

The goal is to make a function’s “object” into a representation of that function through the service lifecycle.  You manipulate the function at the model level, and the manipulation is coupled downward into whatever kind of implementation happens to be used.  That way, things that have to view or control a “router” don’t have to worry (or even know) whether it’s an instance of software, a physical device, or a whole system of routing features either built by SDN forwarding or by combining devices/software into a “network”.
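A minimal sketch of the idea in Python follows; the class and method names are mine, chosen to illustrate the point rather than taken from any specification.

```python
from abc import ABC, abstractmethod

class Router(ABC):
    """The abstract 'router': the look and feel every implementation presents upward."""
    @abstractmethod
    def connect(self, endpoints):
        """Create connectivity among the given endpoints."""
    @abstractmethod
    def status(self):
        """Report functional state in implementation-neutral terms."""

class PhysicalRouter(Router):
    """Binds downward to a real box through its own management interface."""
    def connect(self, endpoints):
        pass    # e.g., push device-specific configuration
    def status(self):
        return "up"

class SdnRouterNetwork(Router):
    """A distributed system of SDN forwarding that collectively behaves like one router."""
    def connect(self, endpoints):
        pass    # e.g., ask a controller to install forwarding paths
    def status(self):
        return "up"

def lifecycle_step(router: Router, endpoints):
    # Everything above the model manipulates only the abstraction; it neither
    # knows nor cares which implementation was chosen underneath.
    router.connect(endpoints)
    return router.status()
```

Whether the implementation is a box, a software instance, or a whole SDN-built network, the thing above it sees only “router”.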

The TMF really got a lot of this started back in the 2006-2010 timeframe, with two initiatives.  One was the “NGOSS Contract”, which proposed that events would be steered to the appropriate lifecycle processes through the intermediary of the model service contract.  That approach was the first to make a contract (which the TMF modeled as a series of connected service elements) into a state/event machine.  The other was the Service Delivery Framework (SDF), which explicitly targeted the lifecycle management of services that consist of multiple functions/features.

To me, the union of these two concepts required the notion that each service element or model element (my “router”) be represented as an object that had properties determined by the class of feature it defined.  That object was then a little “engine” that had state/event properties and that translated standard class-based features (“a router does THIS”) into implementation-specific methods (“by doing THIS”).  A service was a structured assembly of these objects, and each service was processed by a lifecycle management software element that I called a “Service Factory”, a term the TMF briefly adopted.

Service lifecycle management lives above the model.  It starts by instantiating a service model onto real infrastructure, making the connections between the “objects” that define the model and a service-instance-specific way of deploying or committing resources.  It never has to worry about implementation because it manipulates only the abstract vision (“router”).  The first step in lifecycle management is responsible for deployment, connecting the general object vision of available features (probably in the form of APIs) with the way each object is actually deployed in the referenced service.

When a model is deployed, the abstract “model” has to be changed from a template that describes something to an instance that represents something.  There are two basic approaches to doing this.  One is to actually spawn a set of software objects that will then run to process service lifecycle events.  In this approach, a service is a real software application made up of modules for the features.  The second approach is to use a general software tool that interprets the model as needed, meaning that there is in the instance of a service model a set of references to software, not the software itself.  The references could be real pointers to software processes, or they could be a data model that would be passed to a generic software element.
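To illustrate the second approach, here’s a toy model instance that holds references to software rather than the software itself, interpreted by a generic engine.  The handler module names are hypothetical.

```python
import importlib

# The instance of a service model: data plus references to software, not software itself.
SERVICE_INSTANCE = {
    "name": "business-vpn-1234",
    "elements": [
        {"type": "router",   "handler": "handlers.sdn_router",   "params": {"sites": 12}},
        {"type": "firewall", "handler": "handlers.vnf_firewall", "params": {"policy": "std"}},
    ],
}

def interpret(model, event):
    """A generic engine walks the model and invokes the referenced software per element."""
    for element in model["elements"]:
        module_name, func_name = element["handler"].rsplit(".", 1)
        handler = getattr(importlib.import_module(module_name), func_name)
        handler(element["params"], event)
```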

CORD uses abstractions to represent things like the access network and the service trunking.  There are also arguably standard models for resources.  The former are useful but not sufficient to model a service because they don’t have the functional range needed to support all the service features.  The latter open the question of “standardization” below the service objects, which I’ll get to in a bit.

ECOMP also contributes elements.  It has the notion of a service model, though I’d argue it’s not as specific as the approach I’ve described.  It has the notion of service lifecycle management, again not as detailed.  Much of ECOMP’s detail is in the management and resource portion of the issue, again below the service model I’ve described.

If CORD describes the CO of the future and ECOMP describes the integration of elements, then the thing that would unite them in a logical sense is a complete statement of the service models that relate the processes of ECOMP with the resources of CORD.  To consider that, it’s now time to address the question of what happens underneath a service model.  Here we have three basic options to consider:

  1. We could use the same modeling approach below as we had used for service models, so that the decomposition of a “router” object into a network of “router” objects would use the same tools.
  2. We could use some other standardized modeling approach to describe how an “object of objects” is represented.
  3. We could let anything that works be used, foregoing standardization.

The best approach here, in my view, would depend on how many of the “other standardized modeling” approaches would be fielded in the market.  Below the service model, the mandate is to pick an implementation strategy and then connect it to the service-model’s object-level APIs.  You could see the work of the NFV ISG and MANO living down here, and you could also see modeling options like TOSCA, TMF SID, and YANG, and even more general API or data languages like XML or JSON.  The more options there are, the more difficult it would be to get a complete model from the underside of our highest-level service objects to the resources that will have to be committed.  That’s because it’s likely that vendors would support only a few model choices—their own gladly and everything else with great reluctance.

Clearly the last option leads to chaos in integration.  So does the second option, unless we can define only a very limited set of alternative approaches.  That leaves us with the first option, which is to find a general modeling approach that would work top to bottom.  However, that approach fields about as many different choices as my second one did—and it then demands we pick one before we can go very far in modeling services.  Given all of this, what I’d suggest is that we focus on defining what must be standardized—the structure of those abstract functional objects like “router”.  From there, we’d have to let the market decide by adopting what works best.

It should be easy to unify CORD and ECOMP with service modeling because both require and even partially define it, but neither seems to be firmly entrenched in a specific approach.  It’s also something that the NFV ISG might be ideally positioned to provide, since the scope of objects that need to be defined for the model are all within the range of functions considered by NFV.  It could also be done through open-source activities (including CORD and ECOMP), and it could be done by vendors.  Perhaps with all these options on the table, at least one could come to fruition.

There’s a lot at stake here.  Obviously, this could make both CORD and ECOMP much more broadly relevant.  It could also re-ignite the relevance of the NFV ISG.  It could help the TMF turn its ZOOM project into something other than a lifetime tenure for its members.  I also think that carrier cloud adoption could be accelerated significantly, perhaps by as much as two years, if something like this were done.  But make no mistake, carrier cloud is going to happen and result in a lot of new money in the IT world.  Once that’s clear (by 2020 certainly) I think there will be a rush to join in.  For some, it will be too late to reap the full benefits.

Some Specific Recommendations on Boosting the Role of NFV in the Carrier Cloud

In the last several blogs I developed the theme of making NFV relevant and explored its relationship with the drivers of “carrier cloud”.  One point that I raised, but without numerical detail, is the contribution that NFV would actually make to carrier cloud.  If you look at the model results, there’s a firm connection with NFV in only about 6% of carrier cloud deployment.  That doesn’t really tell the story, though, because there is a credible way of connecting over 80% of carrier cloud deployment to NFV, meaning making NFV relevant to almost all server deployments by operators.  If that’s the case, then the risk that NFV proponents face today is failing to realize that credible connection.

The challenge in realization comes down to one of several forms of integration.  There’s been a lot said about the problems of NFV integration, but most of it has missed all of the real issues.  If we look at the goal of realizing the incremental 74% link to carrier cloud that’s on the table, and if we start from that top goal, we can get some more useful perspectives on the integration topic, and maybe even some paths to solution.

The next four years of carrier cloud evolution are critical because, as I noted in yesterday’s blog, there’s no paramount architectural driver, or even any single paramount application or service, behind the deployments of that period.  The risk (again, citing yesterday’s blog) is that all the stuff that happens, stuff that will end up deploying almost seven thousand new data centers globally, won’t organize into a single architecture model that can then be leveraged further.  If “carrier cloud” is cohesive architecturally, or if cohesion can somehow be fitted onto whatever happens, then the foundation goes a long way toward easing the rest of the deployment.  This is the first level of integration.

The minimum operator/infrastructure goal for the next four years should be to build a single resource pool based on compatible technology and organized and managed through a common framework.  The resources that make up the carrier cloud must be the foundation of the pool of resources that will build the base for future services, for the later phases of cloud evolution.  That means that:

  1. Operators should presume a cloud host basis for all applications that involve software hosting, whether it’s for features, management, operations support, databases, or whatever. Design everything for the cloud, and insist that everything that’s not cloud-ready today be made so.
  2. There should be a common framework for resource management imposed across the entire carrier cloud pool, from the first, and that framework should then expand as the carrier cloud expands.
  3. Network connectivity with, and to, the carrier cloud resource pool should fit a standard model that is SDN-ready and that is scalable to the full 100-thousand-data-center level that we can expect to achieve globally by 2030.
  4. Deployment of anything that runs on the carrier cloud must be based on an agile DevOps approach that recognizes the notion of abstraction and state/event modeling. It’s less important to define what the model is than to say that the model must be used everywhere and for everything.  Deploy a VNF?  Use the model.  Same with a customer’s cloud application, an element of OSS/BSS, or anything else that’s a software unit.

The next point builds on this one, and relates to the integration of the functionality of a service or application using software automation.  Here I want to draw on my own experience in the TMF SDF project, the CloudNFV initiative, my ExperiaSphere work, and work with both operators and vendors in software automation and modeling.  The point is that deployment and application or service lifecycle management must be based on an explicit multi-layer model of the service/application, which serves as the connection point between the software that manages the lifecycle and the management facilities that are offered by the functional stuff being deployed.

A real router or a virtual software router or an SDN network that collectively performs like a router are all, functionally, routers.  There should then be a model element called “router” that represents all of these things, and that decomposes into the implementation to be used based on policy.  Further, a “router network” is also a router—a big abstract and geographically distributed one.  If everything that can route is defined by that single router object, then everything that needs routing, needs to manage routing, or needs to connect with routing can connect to that object.  It becomes the responsibility of the software automation processes to accommodate implementation differences.

The second level of integration we need starts with this set of functional model abstractions, and then demands that vendors who purport to support the NFV process supply the model software stubs that harmonize their specific implementation to that model’s glorious standard.  The router has a management information base.  If your implementation doesn’t conform exactly, then you have a stub of code to contribute that harmonizes what you use to that standard MIB.

This helps define what the model itself has to be.  First, the model has to be an organizer for those stubs of stuff.  The “outside” of the model element (like “router”) is a software piece that exposes the set of APIs that you’ve decided are appropriate to that functional element.  Inside that is the set of stub code pieces that harmonize the exposed API to the actual APIs of whatever is being represented by the model—a real router, a management system, a software element—and that performs the function.  Second, the model has to be able to represent the lifecycle states of that functional element, and the events that have to be responded to, such events coming from other elements, from “above” at the user level, or from “below” at the resource level.
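A small sketch of the stub idea, with an invented “standard MIB” and an invented vendor data format just to show the shape of the harmonization:

```python
# The "outside" of the router model element: the standard view everything above sees.
STANDARD_ROUTER_MIB = ("if_count", "if_oper_up", "forwarding_table_size")

class RouterModelElement:
    """Organizer for vendor stubs; exposes only the standard APIs agreed for 'router'."""
    def __init__(self, stub):
        self.stub = stub
    def get_mib(self):
        data = self.stub.read_harmonized()
        return {key: data[key] for key in STANDARD_ROUTER_MIB}

class VendorXStub:
    """Vendor-contributed code that harmonizes VendorX's own data to the standard MIB."""
    def read_harmonized(self):
        native = {"ifTotal": 48, "ifUp": 47, "fibEntries": 120000}   # what the box really says
        return {
            "if_count": native["ifTotal"],
            "if_oper_up": native["ifUp"],
            "forwarding_table_size": native["fibEntries"],
        }

router = RouterModelElement(VendorXStub())
print(router.get_mib())   # the management side never sees the vendor-specific names
```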

This also defines what integration testing is.  You have a test jig that attaches to the “interfaces” of the model—the router object.  You run that through a series of steps that represent operation and the lifecycle events that the functional element might be exposed to, and you see whether it does what it’s supposed to do.
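In the same spirit, the test jig amounts to replaying a lifecycle script against the element’s standard interfaces and checking the responses; the events and expected states below are placeholders of mine.

```python
# A lifecycle script: events to inject and the state the element should end up in.
LIFECYCLE_SCRIPT = [
    ("Order",     "Activating"),
    ("Deployed",  "Active"),
    ("Fail",      "Fault"),
    ("Recovered", "Active"),
]

def integration_test(element):
    """Attach to the element's standard interfaces and replay the lifecycle script."""
    results = []
    for event, expected_state in LIFECYCLE_SCRIPT:
        element.handle_event(event)                  # the model element's standard API
        results.append((event, element.state == expected_state))
    return all(ok for _, ok in results), results
```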

Infrastructure is symbiotic almost by definition; elements of deployed services should prepare the way for the introduction of other services.  Agile orchestration and portals mean nothing if you can’t host what you want for the future on what your past services justified.  CORD has worked to define a structure for future central offices, but hasn’t done much to define what gets us to that future.  ECOMP has defined how we bind diverse service elements into a single operational framework, but there are still pieces missing in delivering the agility needed early on, and of course ECOMP adoption isn’t universal.

To me, what this means is that NFV proponents have to forget all their past stuff and get behind the union of ECOMP and CORD.  That’s the only combination that can do what’s needed fast enough to matter in the critical first phase of carrier cloud deployment.  I’d like to see the ISG take that specific mission on, and support it with all their resources.

The Evolution of the Carrier Cloud

The concept of carrier cloud has taken some hits over the last months.  Media coverage for network functions virtualization (NFV) shows a significant negative shift in operator attitudes about the progress of NFV.  Verizon sold off the cloud business it had purchased, and is now reported to be selling off its cloud computing business.  Cisco, which had announced an ambitious Intercloud strategy aimed at anchoring a federation of operator clouds, dropped the fabric notion completely.  Vendors are quietly reassessing just what could be expected from sales of cloud infrastructure to network operators.

Do we have a problem here, and could it be reversed?  There have always been two broad drivers for “carrier cloud”.  One is cloud computing services, and the other is the application of cloud hosting to communications services infrastructure.  The history of carrier cloud has been written by the balance of interest in these drivers, and so will its future be.

A decade ago, when operators were getting serious about transformation and looking for new service vehicles to embody their goals, they believed that public cloud services were going to be the driver of carrier cloud.  By 2012 support for that view had declined sharply.  Verizon did the Terremark deal in 2011, a year after the high-water mark of operator interest in public cloud services.

What soured operators on public cloud services was the lack of any credible revenue opportunity.  Many of the operators I’d surveyed back at the start were presuming that the success of cloud computing was written in the stars.  Actually, it was only written in the media.  The presumption that every application run in data centers would shift to the cloud would have opened a trillion dollars in incremental revenue, which certainly was enough to make everyone sit up and take notice.  The presumption was wrong, and for three reasons.

The first reason is that the total addressable market presumption was nonsense.  Cloud computing as a replacement for current IT spending is an economy-of-scale play.  Enterprises, in their own data centers, achieve between 92% and 99% of cloud provider economy of scale.  There are some users whose operations costs are enough to add more benefit to the pie, but for most the efficiency difference won’t cover cloud provider profit goals.  Taking this into consideration, the TAM for business migration of software to the public cloud was never more than 24% of total IT spending.

The second reason is that even the real TAM for cloud services is largely made up of SMB and SaaS.  SMBs are the class of business for whom IaaS hosting can still offer attractive pricing, because SMBs have relatively poor economies of scale.  Enterprise cloud today is mostly SaaS services because these services are easily adopted by line departments and displace nearly all the support costs as well as server capital costs.  Since operators want to sell IaaS, they can’t sell to enterprises easily and direct sales to SMBs is inefficient and impractical.

The final reason is that real public cloud opportunity depends on platform service features designed to support cloud-specific development.  These features are just emerging, and development practices to exploit them are fairly rare.  An operator is hardly a natural partner for software development, and so competitors have seized this space.

For all these reasons, operator-offered cloud computing hasn’t set the world afire, and it’s not likely to any time soon.  What’s next?  Well, on the communications services side of carrier cloud drivers, the Great Hope was NFV, but here again the market expectations were unrealistic.  Remember that NFV focuses primarily on Layer 4-7 features for business virtual CPE and what are likely primarily control-plane missions in content and mobile networks (CDN, IMS, EPC).  The first of these missions doesn’t really create carrier cloud opportunities because it is directed primarily at hosting features on agile CPE.  The remaining missions are perhaps more credible as direct drivers of carrier cloud than as drivers of NFV, and it’s these missions that set the stage for the real carrier cloud opportunity.  Unfortunately for vendors, these missions are all over the place in terms of geography, politics, and technology needs.  A shift from box sales to solution sales might help vendors address this variety, but we all know the trend is in the opposite direction.

Virtualization will build data centers, and at a pace that depends first on the total demand/opportunity associated with each service mission and second on the hostable components of the features of each service.  Our modeling of carrier cloud deployment through 2030 shows a market that develops in four distinct phases.  Keep in mind that my modeling generates an opportunity profile, particularly when it’s applied to a market that really has no specific planning drive behind it yet.  These numbers could be exceeded with insightful work by buyers and/or sellers, and of course we could also fall short.

In Phase One, which is where we are now and which will last through 2020, CDN and advertising services drive the primary growth in carrier cloud.  NFV and cloud computing services will produce less than an eighth of the total data centers deployed.  It’s likely, since there are really no agreed architectures for deploying cloud elements in these applications, that this phase will be a series of ad hoc projects that happen to involve hosting.  At the end of Phase One, we will have deployed only 6% of the carrier cloud opportunity.

Phase Two starts in 2021, ushered in by the transformation in mobile and access infrastructure that’s usually considered to be part of 5G.  This phase lasts through 2023, and during it the transformation of services and infrastructure to accommodate mobile/behavioral services will generate over two-thirds of the carrier cloud data center deployments.  This phase is the topic, in a direct or indirect way, for most of the planning now underway, and so it’s this phase that should be considered the prime target for vendors.  At the end of Phase Two, we will have deployed 36% of the total number of carrier cloud data centers.

This is perhaps the most critical phase in carrier cloud evolution.  Prior to it, a diverse set of missions drives carrier cloud, and there’s a major risk that this will create service-specific silos even in data center deployment.  Phase Two is where we’ll see the first true architecture driver—5G.  Somehow this driver has to sweep up all the goodies that were driven before it, or somehow those goodies have to anticipate 5G needs.  How well that’s managed will likely decide how much gets done from 2021 onward.

The next phase, Phase Three, is short in chronological terms, lasting from 2024 through 2025.  This phase represents the explosion of carrier cloud opportunity driven by the maturation of contextual services for consumers and workers, in large part through harnessing IoT.  Certainly, IoT-related big-data and analytics applications will dominate the growth in carrier cloud, which by the end of the phase will have reached 74% of the opportunity.  In numbers terms, it is Phase Three that will add the largest number of data centers and account for the fastest growth in carrier cloud capex.  It’s worth noting that cloud computing services by operators will see their fastest growth in this period as well, largely because most operators will have secured enough cloud deployment to have compelling economies of scale and low-latency connections between their data centers and their users.

The final phase, Phase Four, begins in 2026 and is characterized by an exploitive application of carrier cloud to all the remaining control-plane and feature-based missions.  Both capital infrastructure and operations practices will have achieved full efficiency at this point, and so the barrier to using carrier cloud for extemporaneous missions will have fallen.  Like the network itself, the carrier cloud will be a resource in service creation.

The most important point to be learned from these phases is that it’s service missions that drive carrier cloud, not SDN or NFV or virtualization technology.  Benefits, when linked to a credible realization path, solve their own problems.  Realizations, lacking driving benefits, don’t.  SDN and NFV will be deployed as a result of carrier cloud deployments, not as drivers to it.  There is an intimate link, but not a causal one.

If all this is true, then supporters of the carrier cloud have to forget the notion that technology changes automatically drive infrastructure changes.  Technology isn’t the driver; we’ve seen that proven in every possible carrier cloud application already.  The disconnect between tech evolution and service evolution only disconnects technology evolution from real drivers for change, and thus from realization of transformation opportunities.

We are eventually going to get to the carrier cloud, but if the pace of that transformation is as slow as it has been, there will be a lot of vendor pain and suffering along the way, and not just for network equipment vendors.  Open source has been the big beneficiary of the slow pace of NFV, and open white-box servers could be the result of slow-roll carrier cloud.  Only a fast response to opportunity or need justifies early market movement, and creates early vendor profit.  You don’t get a fast response by tossing tech tidbits to the market, you get there by justifying a revolution.

How Can SDN and NFV Prove Their 5G Relevance?

I said in a number of blogs last week that 5G wasn’t an automatic savior for SDN and NFV.  It’s not that neither concept could support 5G, or even that 5G wouldn’t be better if SDN and NFV were incorporated.  There are instead two critical truths to deal with.  First, there isn’t anything currently in 5G that would demand the use of SDN or NFV.  Second, the best applications of SDN and NFV within 5G would be just about as compelling if they were deployed outside, and before, 5G.  To win, it’s my contention that SDN and NFV can’t rely on 5G for support; they have to anticipate it.

Presuming this is true (which I believe, but which you’ll have to decide for yourself) then what has to be done in SDN and NFV to do that essential anticipating?  How could the two technologies evolve and change in focus to make the most of the 5G windfall, and perhaps even earn some respectable bucks before it?  That’s what we’re going to deal with today, sometimes with SDN and NFV together and sometimes considering one or the other.

There is one central truth for both technologies.  Nothing that doesn’t pull through carrier cloud has any chance of generating a significant opportunity for either SDN or NFV.  In fact, carrier cloud is so much the lead technology here that both SDN and NFV in operator missions should be considered only in that context.  That particular truth is good for SDN, whose principal success has come in the cloud computing data center, but not so good for NFV.

The first and foremost point for NFV is think multi-tenant and not service chain.  Too much time has been spent focusing on NFV virtual CPE (vCPE) missions that are useful primarily to business customers, and that are probably best supported via versatile mini-server-like CPE, not carrier cloud.  The difference between NFV and carrier cloud is that NFV has a single-tenant focus and nearly all the applications that are credible drivers of carrier cloud are multi-tenant.

All of mobile infrastructure is multi-tenant, and not just today.  Do you think 5G network slicing is going to slice every mobile user their own network?  Nonsense.  Yes, you could use NFV principles to deploy elements of IMS or EPC, but aren’t these two applications really carrier cloud and not NFV?  What role could NFV constructively play in multi-tenant feature deployments that OpenStack by itself couldn’t do just as well?

On the SDN side, Issue One is get out of the data center!  Network slicing in 5G is just a cry for efficient partitioning of networks using some form of virtualization, which could be either something done below Ethernet/IP (virtual wire) or something above (tunnels, overlay network).  Do you think 5G planners sat around in a bar and dreamed up stuff to ask for absent any actual interest among the operators they represent?  More nonsense.  There is a current mission for network slicing—several in fact.

If you look at the market from a distance, it’s clear that there’s been continued groping toward a model of networking to replace OSI’s seven layers.  Over a decade ago, Huawei presented their “Next-Generation Service Overlay Network.”  Nicira made news with an overlay-based “SDN” technology for cloud data centers.  SD-WAN is being presented as a successor to MPLS, and Nuage’s overlay SDN is in my view the best current model of virtual networking available.  All overlay concepts, all based on the presumption that it’s better to create service, user, or application networks using an overlay than to require that the user participate in what’s essentially transport-network behavior.

The Third Network model of the MEF is the right model in perhaps too limited a forum.  An overlay has to be underlay-independent.  Should we even try to define a specific approach, or should we agree that any overlay is fine among consenting adults?  The SDN world needs to step back and address goals and benefits.  The operators need to push for that, and those network vendors who aren’t too wrapped around current-network revenue streams need to push it too.
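To put the underlay-independence point in concrete terms, here’s a small Python sketch in which the same overlay definition rides on whatever underlay it’s handed.  The underlay classes are stand-ins, not real tunnel implementations.

```python
from abc import ABC, abstractmethod

class Underlay(ABC):
    """Anything that can carry an overlay tunnel between two attachment points."""
    @abstractmethod
    def tunnel(self, a, b):
        """Return an opaque handle for a path from a to b."""

class MplsUnderlay(Underlay):
    def tunnel(self, a, b):
        return f"mpls-lsp:{a}->{b}"

class PlainInternetUnderlay(Underlay):
    def tunnel(self, a, b):
        return f"ipsec-over-internet:{a}->{b}"

class OverlayNetwork:
    """The service/user/application network, built entirely above the underlay."""
    def __init__(self, underlay):
        self.underlay = underlay
        self.links = []
    def connect(self, a, b):
        self.links.append(self.underlay.tunnel(a, b))

# The same overlay definition rides on either transport, which is the whole point.
for transport in (MplsUnderlay(), PlainInternetUnderlay()):
    net = OverlayNetwork(transport)
    net.connect("site-1", "site-2")
    print(net.links)
```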

Coming back around to NFV, the next point is the need to find opportunity in dynamism.  NFV as it has evolved harnesses cloud computing tools in a service-feature context.  That’s wrong in two dimensions.  First, limiting NFV to a service feature context—limiting it to virtual functions inherited from physical network devices—pulls it out of the truly valuable mission of supporting capabilities that not only were never part of a device, they couldn’t be so.  Second, if you’re going to harness a technology capability to frame your own features, your first market obligation is to differentiate yourself from your parent, which NFV hasn’t done.

A function that’s put someplace and lives there till it breaks is more like a cloud function than a virtual network function.  Scaling, resiliency, and other attributes of features can help to create essential dynamism in a mission, but has NFV really done anything for scaling and resiliency other than to specify that they have to be there?  What cloud application isn’t supposed to have those attributes?  In the cloud, we know these attributes are most likely created via DevOps.  We know how DevOps works.  What creates them in NFV?

5G is supposed to be able to provide low-latency paths by providing for local hosting of features.  OK, what features?  Most of the stuff we propose to host comes from things like IoT, whose “features” are a lot more like application elements than like network functions.  But IoT does potentially expose both per-user “agent processes” and multi-tenant shared services.  It would be a great place to start in building a useful model of NFV.

That same area of IoT and local processes unites SDN and NFV in another point—it’s time to separate functions.  Data-plane activities should be primarily SDN elements.  If we’re going to stick a firewall process into the data plane, then we should be seeing it as an adjunct to a broader SDN mission to enhance security.  If we’re going to talk latency, we have to decide if we’re talking about data-plane latency—which only data-path elements can assure—or transaction/control latency, which local hosting of functions can actually help with.

As I suggested earlier in this blog, there’s not much credibility to spawning single-tenant VNFs for the broad consumer market.  That market almost demands multi-tenant elements, and the fact that this was missed in planning for NFV means we really need to ask ourselves whether NFV missions were associated with market niches whose cost tolerance would never permit separate function deployment and management.

All of the things that SDN and NFV need to do could be done today, without a whiff of 5G justification.  They could have been done from the first, and had they been, I think we’d have seen far more adoption of SDN and NFV today.  We need to learn a lesson here, which is that good ideas are ideas that deliver on benefits.  You shouldn’t need some other vehicle to pull them through.  I know that I’ve been hopeful that 5G investment would provide an opportunity for SDN and NFV, and I believe it will—but only an opportunity.  Just as the two technologies have failed to target their own key business case in the past, they could fail in the 5G future.  If that happens, then 5G will advance without them.

But not as far.  The concepts that differentiate 5G from 4G are the same concepts that should have differentiated SDN and NFV from legacy networking.  Can 5G redevelop these notions, notions that were ignored or underplayed in the past, without SDN and NFV?  It’s a dilemma because 5G would either have to explicitly reference SDN and NFV and so become dependent on their fulfilling a benefit mission they’ve not fulfilled so far, or ignore them and define the technologies anew.  Neither is going to be easy, but one or the other has to be done.

This isn’t a technology problem, either.  SDN and NFV technology and “carrier cloud” could be done right almost instantly.  It’s a positioning problem.  Vendors have taken the view that operators are “demanding” all this new stuff, when the truth is that operators (at the CFO level in particular) aren’t even sure the new stuff is useful.  A vendor who supports NFV supports nothing in particular; same with SDN and carrier cloud.  A vendor who can take the time to develop the business case for transformation to SDN, NFV, carrier cloud, and 5G is a contender.  However, vendors have always been reluctant to do market education because it benefits their competitors.  So, do the media or analyst communities do the educating?  Nonsense again, and that’s the core of our real challenge in transformation.  I hope it gets solved with the 5G process, but I’m still skeptical.

Could 5G Promote SDN and NFV?

Continuing on my recent theme of 5G assessment, I want to look today at the relationship between 5G, SDN, and NFV.  Because 5G evolution is considered a given in the industry, many vendors in the SDN and NFV space have been eager to link their wares to 5G as a means of ensuring they’re not trapped by slow discrete SDN and NFV adoption.  Can SDN and NFV be pulled through by 5G, or would the linkage be as likely to slow 5G down?

We have to start with what’s almost a standard disclaimer, which is that the details of 5G architecture won’t be completely defined until Release 16, which isn’t due until mid-2020.  The core architecture, however, will be available with Release 15 late in 3Q18.  That means we’re not really going to know the exact nature of 5G for about a year and a half.  Even granting that this vagueness offers an unparalleled opportunity for positioning agility among vendors, that’s still a long time to wait.

The high-level model of 5G (an example is found HERE) envisions a five-layer structure with a Business Services Layer on top, then a Business Functions Layer, an Orchestrator layer next, then a Network Function Layer, and an Infrastructure Layer on the bottom.  Operators who own infrastructure would implement all five layers, while those who offered “virtual network operator” or “over-the-top” services would implement the top two and would access infrastructure via a specifically called-out Northbound Interface exposed by the Orchestrator layer.

The NBI is important here because it’s the most specific (in most diagrams, the only specific) inter-layer interface that’s defined in the preliminary model.  The presumption is that NBI-exported features (which I’ve been calling “behaviors”) are assembled into “vertical business functions” that are then combined within the Business Functions Layer to create retail services.  Thus, in 5G, the NBI represents the boundary between what I’ve been calling the “service domain” and “resource domain”, referencing SDN/NFV or virtualization architectures.  Below Orchestration, the Network Function layer presumably exports Network Functions for composition into my “behaviors”.  These are assembled from the raw infrastructure.
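Reduced to a sketch, the composition flow the model implies looks something like the following.  Every name here is my own illustration of the layering, not an object the 5G documents define.

    # My reading of the layering, as a sketch: NBI-exported "behaviors" are bound into
    # vertical business functions, which are composed into retail services.
    behaviors = {                               # exported by the Orchestrator layer across the NBI
        "mobile-slice": {"latency_ms": 20, "tenancy": "isolated"},
        "edge-hosting": {"latency_ms": 5},
    }

    def vertical_business_function(name: str, required: list) -> dict:
        """Business Functions Layer: a bundle of NBI behaviors plus business logic."""
        return {"vbf": name, "uses": {b: behaviors[b] for b in required}}

    def retail_service(name: str, vbfs: list) -> dict:
        """Business Services Layer: the thing that actually gets sold."""
        return {"service": name, "composed_of": vbfs}

    connected_car = retail_service("connected-car", [
        vertical_business_function("telemetry-collection", ["mobile-slice"]),
        vertical_business_function("local-analytics", ["edge-hosting"]),
    ])
    print(connected_car)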

Orchestration is a familiar NFV concept, of course, and many of the models of NFV orchestration map fairly nicely to the Orchestration layer of 5G.  It also maps to AT&T’s ECOMP, Verizon’s SDN/NFV architecture, and a bunch of OSS/BSS models.  In fact, the position of the Orchestration layer in 5G seems to make it clear that this isn’t NFV MANO—it has to deal with more generalized deployment processes.  That’s where the ambiguity in the relationship between 5G and NFV comes in.

A Virtual Network Function is a hosted cousin of the Physical Network Function from which it is derived, according to classic NFV ISG thinking.  It’s my view that the 5G model’s presumption is that a given “network function” in either its “V” or “P” flavor would be abstracted equivalently and exposed identically, so that the Orchestration process could use them interchangeably.  From this, it’s clear that there is no reason why 5G couldn’t be built from infrastructure in which nothing was virtualized at all, as long as whatever PNFs were there were exposed correctly.  All that requires is that the function, as a device, can be deployed and managed.
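Here is a minimal sketch of what “abstracted equivalently” would mean in practice: orchestration programs against one function interface, and whether the implementation is a box or a hosted instance is invisible above that line.  The interface is hypothetical, not something from the ISG specs or the 5G model.

    # Hypothetical interface; the point is interchangeability above the abstraction line.
    from abc import ABC, abstractmethod

    class NetworkFunction(ABC):
        @abstractmethod
        def deploy(self) -> str: ...
        @abstractmethod
        def configure(self, params: dict) -> str: ...

    class PhysicalNetworkFunction(NetworkFunction):
        def __init__(self, chassis_id: str): self.chassis_id = chassis_id
        def deploy(self) -> str: return f"provisioned existing chassis {self.chassis_id}"
        def configure(self, params: dict) -> str: return f"pushed {params} to {self.chassis_id}"

    class VirtualNetworkFunction(NetworkFunction):
        def __init__(self, image: str): self.image = image
        def deploy(self) -> str: return f"instantiated {self.image} on the resource pool"
        def configure(self, params: dict) -> str: return f"configured instance of {self.image} with {params}"

    def orchestrate(fn: NetworkFunction) -> None:
        print(fn.deploy()); print(fn.configure({"mtu": 1500}))

    orchestrate(PhysicalNetworkFunction("edge-router-3"))   # 5G built on real boxes...
    orchestrate(VirtualNetworkFunction("vrouter:1.2"))      # ...or on hosted instances, identically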

In this approach, 5G orchestration is indeed NFV-like, but not like the ISG flavor of NFV that focuses explicitly on virtual function deployment.  Instead it’s like the top layer of the ECOMP model, the Master Services Orchestrator or MSO.  That means that from operators’ perspectives, 5G architecture and NFV depend on a common high-level concept of service orchestration.  NFV does not define it, but it needs it.  5G will presumably define it, at least in terms of its NBIs.

You don’t need NFV to do the 5G architecture, but that doesn’t necessarily mean that NFV would have no value in 5G.  To understand what the value could be, we have to look at the differences between VNFs and PNFs.

PNFs, meaning devices, are real boxes that tend to be put somewhere and sustained there until they break or a network topology change requires they be moved.  Their functionality is typically fairly static, meaning that while they can be updated to add features their main mission is consistent.  A “router” can have protocols and features added to it, but it typically stays a router.

VNFs, as hosted versions of PNFs, could also be viewed this way.  You could build a network of routers or router instances, and if we assume that the hosting of the instances supported the performance requirements of the mission, the two would be equivalent.  Many VNF missions would be like this, and such a mission is really more like that of a cloud-hosted router element than of a VNF, because none of the complexity of dynamic deployment or redeployment really applies to it.  Virtual boxes of this type stay put like their PNF counterparts.

What would validate a VNF versus a PNF is dynamism.  If a network of VNFs could benefit from different node placements and topologies, you could create those different models by simply re-instantiating and re-connecting, presuming you have a rich resource pool that’s highly interconnected.  5G doesn’t necessarily demand this kind of thing, however, and it’s difficult to say how much 5G would favor it over an earlier network model for applications like IoT.  We don’t have the details of 5G services or their architectural framework yet, nor will we for that year-and-a-half period.
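For what it’s worth, here is the dynamism argument reduced to a sketch: placement is re-derived from current demand and instances are re-spun to match, a step a fixed-box network simply can’t take.  The placement logic is deliberately naive and the names are illustrative.

    # Illustrative only: instances follow the traffic rather than traffic following fixed boxes.
    def place_instances(traffic_by_site: dict, capacity_per_instance: float) -> dict:
        """Re-derive node placement from current demand; re-instantiate to match."""
        placement = {}
        for site, load in traffic_by_site.items():
            count = max(1, round(load / capacity_per_instance))
            placement[site] = [f"{site}-vnf-{i}" for i in range(count)]
        return placement

    # Two demand profiles (say, business hours versus evening) yield two different topologies.
    print(place_instances({"edge-dc-1": 2.0, "metro-dc-1": 0.4}, capacity_per_instance=1.0))
    print(place_instances({"edge-dc-2": 3.0, "core-dc-1": 1.2}, capacity_per_instance=1.0))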

But it’s SDN that is the big deal for 5G in my view, though it’s not clear just what “SDN” we’re talking about.  Network slicing, as I noted earlier this week, is a multi-tenancy issue.  SDN can support that in any OpenFlow, tunneling, or overlay model.  To me, the big question about the value of 5G is how valuable network slicing will turn out to be.  In addition, SDN could be a big part of a modern vision of mobility management to replace the old EPC concepts of previous wireless generations.  A supercharged mobility management strategy could almost justify 5G on its own.  We don’t need NFV for either of these, but we could sure use SDN.
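Whatever mechanism wins, the slicing idea itself is simple enough to sketch: a slice is an isolation tag plus a resource commitment, and OpenFlow rules, tunnels, or an overlay are interchangeable ways to enforce it.  The structures below are my own illustration, not anything from the 5G work.

    # Illustrative sketch: slicing as multi-tenancy, with the mechanism left interchangeable.
    slices = {}

    def create_slice(name: str, isolation_tag: int, committed_mbps: int, mechanism: str = "overlay"):
        if any(s["tag"] == isolation_tag for s in slices.values()):
            raise ValueError("isolation tags must be unique, or tenants can see each other's traffic")
        slices[name] = {"tag": isolation_tag, "committed_mbps": committed_mbps, "mechanism": mechanism}

    create_slice("mobile-broadband", isolation_tag=100, committed_mbps=5000)
    create_slice("iot-low-power", isolation_tag=101, committed_mbps=50, mechanism="openflow")
    create_slice("mvno-partner", isolation_tag=102, committed_mbps=2000, mechanism="tunnel")
    print(slices)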

We can’t wait forever, though.  As I noted at the start of this blog, even foundation specs for 5G are a year and a half away, and the industry isn’t waiting.  We’re already seeing things like the “fronthaul” movement in fiber, meaning rich deployment of edge dark fiber to feed microcells in urban areas.  Will an industry sit around and wait for formal specs when it has assets that could be exploited?  An outrun standard is an irrelevant standard, except insofar as it drives industry attention and moves markets.  5G could be another indication that formal standardization just cannot keep up with the information age, and if that’s true then it’s the ad hoc market reaction to 5G, and not 5G itself, that we have to watch for SDN/NFV impact.

What Do the Operator CFO Organizations Think About 5G?

We have all heard about 5G at this point, mostly through the heady “you’ll-love-it” pieces that have been done.  Not surprisingly, most of these tout features that are either never going to happen (some of the speed claims are for point-to-point millimeter-wave applications only, not for mobile users) or are still in a state of definition flux and so may not happen in the near term.  The big question in 5G progress is less what the standards might be aiming for than what operator CFOs are really looking at in the near term.  So I asked a bunch of CFOs or their direct reports about 5G, and here’s what I found.

Obviously, CFOs aren’t technologists, so it’s not surprising that their views are more directed at the impact of 5G on their business.  It remains for the CTO, COO, and CFO to bring these benefit-biased views into harmony with technology options, or perhaps the vendors will have to do that.  Certainly there’s a risk that natural alignment between CFO goals and technology choices won’t develop.  The lack of that connection has hurt both SDN and NFV, after all.

The first key CFO hope for 5G is the elimination of the burdens of specialized mobile infrastructure.  Mobile connectivity is supported over what’s essentially a specialized overlay network that handles signaling traffic and supports user movement across cell boundaries—the Evolved Packet Core or EPC.  EPC has been a bit of a trial from the first because of its cost, and as mobile traffic and the number of mobile devices increase, the CFO sees red ink flying in great clouds.

There is a technical goal for 5G that seems connected to this, but the precise way in which 5G would address mobility is still up in the air.  To actually eliminate EPC would mean dealing with mobility management in a new way, which many 5G proponents think is a given if the architecture is to make any sense.  There are evolutions to IP to disconnect location from user address in a routing sense, and of course traditional overlay network technology could also address mobility more flexibly if you virtualized the elements of EPC or simply used overlay VPNs.  The idea that virtual EPCs would pave the way to 5G embodies this last view (and proponents see EPC networks living inside network slices).
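However the standards land, any EPC replacement has to provide some form of identity/locator separation, which is simple to sketch: the user’s identity and address stay constant while the attachment locator is rewritten as the user moves.  This is my illustration of the principle, not a 5G or EPC specification.

    # Illustrative only: handover becomes a mapping update rather than a tunnel-anchor ritual.
    mobility_map = {}                      # user identity -> current attachment locator

    def attach(user_id: str, locator: str) -> None:
        mobility_map[user_id] = locator    # a handover just rewrites this entry

    def deliver(user_id: str, packet: bytes) -> None:
        locator = mobility_map[user_id]    # resolved at forwarding time, not fixed at session setup
        print(f"tunnel {len(packet)} bytes to {locator} for {user_id}")

    attach("user-001", "cell-site-17")
    deliver("user-001", b"payload")
    attach("user-001", "cell-site-18")     # the user moves; the identity and address never change
    deliver("user-001", b"payload")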

I think it’s premature to say that virtual EPC paves the way for 5G, given that we don’t seem to have a clear idea of what “native” or end-game 5G would look like in the area of mobility management.  Since this area seems to be the top CFO priority, getting quick clarity on this point should be a priority with vendors who want to do more than support 5G trials.

The second hope CFOs have for 5G is creation of a new model of fixed access that combines fiber and 5G radio (perhaps even millimeter-wave) to more efficiently connect homes.  FTTH is practical in some areas where user density is high enough, but where it isn’t the fallback has been FTTN (for “node” or “neighborhood”), with copper forming the last segment.  While DSL technology has advanced considerably, it still falls short of CATV cable or fiber potential, and CFOs fear a competitive war based on capacity might force them to less profitable fixed-network deployments.

Obviously small 5G cells could be used in an FTTN deployment to reach nearby users.  That doesn’t seem to demand anything revolutionary from the standards process or vendors.  However, there’s also a lot of interest among CFOs for the idea that these FTTN cells would also be 5G mobile cells, leveraging the FTTN nodes to improve coverage and performance.  This could be particularly valuable, according to about half the CFOs, in the IoT space for both home control and vehicular 5G.

Network slicing and perhaps function virtualization could be the technical features promoted by this goal, and I think we’re far enough along to get some traction on the point in early trials—provided carriers and vendors converge on trying the application.

The third hope for 5G is creation of a single architecture for access/service networking, one that lets services live above both current and future wireless and wireline access connectivity.  One reason for this is to facilitate roaming seamlessly between WiFi (local or public) and cellular networks.  That issue has gotten hot since Comcast said it would be doing a WiFi-based mobile service that’s widely expected to involve MVNO relationships with some cellular providers (like Google Fi already does).

In addition to facilitating WiFi roaming, network multi-tenancy, or slicing, is a requirement to support MVNO relationships, and CFOs also think that competition is going to drive the need for those relationships as a means of reducing customer acquisition and retention costs.  However, CFOs seem to think that slicing has to be extensible broadly—a kind of explicit overlay/underlay or multi-tenant structure that crosses from wireless to wireline, access to core.  That vision would favor an SDN-driven transformation on a broader scale.  While some vendors would surely like to see that (or should, particularly ones like ADVA or Ciena), nobody seems to be pushing it according to the CFOs.

What’s the last CFO hope for 5G?  That 5G will be the last “generational” change, launching a less dramatic, evolutionary path to the future.  Wireless services have evolved as a series of major lurches, each posing major changes in technology.  The CFOs hope that with 5G we can make the components of mobile infrastructure more modular, so that new services and features can be introduced ad hoc, on demand.

This hope is last not because it’s narrowly held but because it’s vague.  What the CFOs are concerned about is that the exploding changes driven by consumer broadband are only beginning to hit network design, and that IoT could create even more dramatic shifts in requirements.  Generations, if one applies the human timeframe, are supposed to be 20 years.  One CFO suggested that maybe for networking demand, 20 months was more realistic.  If that’s true, then the cost of wireless modernization to keep pace would be prohibitive unless a modular-substitution model were applied.

You can kind of draw this capability from the diagrams of vendors like Ericsson, Huawei, and Nokia, but the points aren’t explicit.  I think the biggest problem in creating an explicit evolution-versus-generation plan is the lack of a specific model that represents where 5G itself would be starting.  We can envision backward compatibility with LTE, but what is LTE then being compatible with?  It has to be more than consuming a slice of 5G; what does 5G itself do without the LTE slice?

The single-architecture goal doesn’t mean that CFOs think 5G is a fork-lift upgrade or an all-or-nothing proposition.  In fact, their hope is that its modularity will mean that it will be possible to introduce 5G for limited missions, with the single-architecture attribute ensuring that no matter where you start, you end up in the right place.  I think CFOs have felt that way about SDN and NFV as well, and I hope their aspirations are more easily met with 5G.

Is the Huawei Win at Telefonica Decisive for NFV and Equipment Competitors?

The announcement that Telefonica has turned to Huawei for its virtual EPC is certainly of major significance.  Huawei is the only major telecom vendor who’s managed to gain significantly in revenue over the last several years, largely due to its price-leader status.  Competitors like Cisco, Ericsson, and Nokia have hoped to wrestle away some market share through product innovation, and aspiring NFV vendors like HPE saw virtualization as an entry point into the networking space.  Now, Huawei seems to be taking a product innovation lead too.  Is there no stopping them?

Not if you don’t address the reason that they got started.  I had a memorable strategy session with a major network vendor about eight years ago, and during the meeting I was asked to list out their competitors and their strengths and weaknesses.  When I did, one of the strategists called me out triumphantly—“You forgot the most important competitor—Huawei!”  My response was “Huawei is not a competitor, they’re a winner if you decide not to compete.”  That was a more critical point than I think that vendor recognized.  Eight years ago, there was plenty of time to take an aggressive position on feature-driven network evolution.  Absent such a position, you have price-driven evolution, and that’s what happened.

But it’s also important to recognize that Huawei has been active in the feature side of the game, almost from the first.  Perhaps their most important contribution is something most have never heard of—the “Next-Generation Service Overlay Network” or NGSON.  This was something pursued about a decade ago, through the IEEE rather than through one of the more prominent network standards bodies.  It presaged the MEF’s Third Network, SD-WAN, and many of the concepts now emerging in 5G.  It’s also, in my view at least, an element in the formulation of the CloudEPC solution that Telefonica bought into.

NGSON’s failing isn’t technology, it’s publicity.  The IEEE, of course, isn’t as vocal and effective a body for promoting new technology as a nice fresh forum like the MEF, ONF, etc.  Huawei, no slacker in sales or technology, is surely weak in marketing and positioning, and when you combine the two it’s no wonder that NGSON didn’t shake the earth.

Service overlay network technology is the foundation of the Nuage SDN strategy at Alcatel-Lucent, which then merged with Nokia, and it’s the thing I’ve already called the primary asset Nokia has in the SDN/NFV space.  However, it’s obvious that Nuage was a bit of a Cinderella at ALU, and it seems to be the same at Nokia.  Certainly neither parent has done a lot in positioning Nuage as a strategic asset, so Huawei’s failure to promote NGSON hasn’t hurt as much as it might have.

Which, I think, is the key point here.  Certainly Huawei or any other telco equipment vendor doesn’t need the media to engage with customers, but the lack of aggressive positioning of key feature assets, for any vendor, lets competitors dodge embarrassing questions from the media.  Those questions might impact sales success were they to be asked.  Could that happen even with the Telefonica CloudEPC success?

Virtual overlays could be a key to making things like EPC work, in no small part because the whole notion of EPC (which relies on tunnel networks) is a form of overlay network.  It’s also very likely that virtual overlays would make sense in a multi-tenant network-slicing solution, because overlay networking (Nicira, now VMware’s NSX) was the first big step in cloud multi-tenancy, and it’s an overlay solution.

One critical result of all the positioning mismanagement going on is that there’s no clear vision of just what a next-generation EPC would look like.  Most vendors call their next-gen EPC an NFV element, but there’s a lot more in common with traditional cloud computing than with NFV, since the elements of EPC are not single-tenant and that’s been the focus of NFV work so far.  And nobody is really dealing with the fact that EPC is about accommodating mobility, and the EPC mobility accommodation is supposed to be obsoleted by 5G.

If a key goal of 5G is to eliminate EPC altogether, then the smart money would bet on a mobile core architecture that did both 5G and EPC seamlessly.  Something like that would also introduce the notion of a “selective multi-tenant” approach to NFV, and it could exercise the flexibility in packet forwarding of OpenFlow and the tenancy management capabilities inherent in overlay networks.  Why isn’t that the right answer?  Could it be that vendors have just not bothered to position their stuff, or that they don’t really have a solution?

We can’t tell from Huawei.  There is documentation on CloudEPC on its website, but there’s little there beyond a vague diagram.  There is customer product data, but it’s not generally available to the media/analyst community and it’s not really a tutorial on the conceptual framework of CloudEPC.  We can’t tell from Ericsson or Nokia either.  Might a competitor with little more in the way of a product than Huawei has jump off with a strong campaign and take thought leadership?  Yes, but none has done that so far.

Huawei’s win with Telefonica could be huge for both companies; as far as I can tell it would be the largest-scale “NFV” strategy yet deployed, and it certainly hurts all the other vendors who have been hoping to win big with Telefonica as the leading EU operator in the NFV world.  That’s especially true for HPE, which was the first vendor to appear to have an inside track with Telefonica, and for Ericsson, which seemed to have the more recent lead.  Is it really seminal for “NFV” or “EPC” or “5G?”  I don’t think so, and it’s not just because there’s no substance to the positioning of the vendors.

5G looms over everything, and yet we really don’t know what it will do or how it will do it.  We’re only a couple years from the supposed deployments, and the specifications are still swimming in uncertainty.  The biggest question for Huawei and Telefonica is how far their deployment will get without 5G capability, or perhaps how Huawei could introduce 5G evolution into their offering when there aren’t firm standards to work from.

In any event, mobile evolution is the last gasp of opportunity for Huawei’s telco equipment competitors.  As I’ve pointed out in the past, there are no other service transformations that could credibly position a critical mass of NFV assets in the near term.  Even IoT, which has great NFV potential, is likely to be subsumed into the 5G transformation.  Given all of that, Huawei needs to leverage its mobile NFV position to the utmost, and its competitors need to knock them down before it’s too late.