Can We Build Agile Infrastructure with the Overlay/Underlay Model?

Let us suppose for a moment that the goal of operators is to reduce equipment and operations costs in concert, while at the same time increasing their ability to provision current services quickly and flexibly and to develop new services just as quickly.  Let us further suppose that they have addressed the higher-level operations/portal implications of this.  What would the ideal network approach be?

Since it’s clear that operators do want exactly what’s presented in the last paragraph, this is a fair question.  Since the answer to the question will dictate infrastructure spending in the future, it’s an important one.  Interestingly, we have an answer for it, and it’s been around for a fair period of time.

If we go back to a point in my last blog, operators need to be able to make changes to costs and revenues without forcing a forklift, large-scale change-out of infrastructure.  There is simply no way to bear the risk of a large transformation, and at this point no time to prove out alternative infrastructure technologies to the degree needed to contain that risk.  We have to evolve with some grace into the future.

My conversation with the MEF’s CTO convinced me that their Third Network model has merit, provided that the model embraces something that is strongly hinted at but not featured—the concept of an overlay technology.  If the lower three layers of the OSI model (what the model says is actually in the network) are Levels 1 through 3, then let’s call this overlay layer Level I, or Li for short.

The basic notion for Li is that services would be defined and delivered at this new layer, which would then consume tunnels (“virtual wires”) created at the layers below.  Since services would now be using existing network technology only as a physical layer, you’d be able to change out any or all of that stuff at whatever pace you find optimal because lower-layer implementations are opaque to the higher layers.

Overlay connections are based on a header that’s appended to data payloads before they’re encapsulated for handling by the tunnel protocol.  The header subdivides the traffic at any tunnel-point, and at each such tunnel-point the handling process can either extract the traffic with a given header and deliver it to a user access point, or “cross-connect” it to another tunnel.  It’s in how this is done that the efficiency and value of the Li model is determined.
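To make the extract-or-cross-connect decision concrete, here’s a minimal sketch of the logic at a single tunnel-point.  All the names (header IDs, port and tunnel labels, the table structure) are invented for illustration and don’t reflect any vendor’s implementation:

```python
# Sketch of the tunnel-point decision: each arriving payload carries an
# overlay header that either maps to a local access point (extract and
# deliver) or to an onward tunnel (cross-connect).

def handle_frame(overlay_id, payload, access_points, cross_connects):
    """Deliver locally or forward onward based on the overlay header."""
    if overlay_id in access_points:
        return ("deliver", access_points[overlay_id], payload)
    if overlay_id in cross_connects:
        return ("forward", cross_connects[overlay_id], payload)
    return ("drop", None, payload)  # unknown overlay header: discard

# Example tables for one tunnel-point
access_points = {"svc-42": "port-3"}        # header -> local user port
cross_connects = {"svc-17": "tunnel-east"}  # header -> next tunnel

assert handle_frame("svc-42", b"data", access_points, cross_connects)[0] == "deliver"
assert handle_frame("svc-17", b"data", access_points, cross_connects)[1] == "tunnel-east"
```

The point of the sketch is that everything service-specific lives in those two small tables; the tunnels underneath carry opaque payloads and can be re-implemented freely.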

In the original Nicira overlay-SDN model, a LAN or VLAN or VPN architecture created the tunnel paths, and these connected physical network/IT elements like servers.  The SDN overlay then subdivided access by tenant.  In theory, each server could either extract header-identified traffic for its local users or cross-connect it onward.  This is not unlike how lower OSI layers relate to higher layers; you can pull traffic from a LAN (Level 2) and connect it to another LAN through a WAN connection, via a router.

The current SD-WAN products have a slightly different approach but use the same overlay concept.  Here, a series of connections made at a lower level to the same access point are effectively united by a higher overlay that can ride on any of the low-level options.  This higher layer then presents the user interface.

The general overlay model that might be viewed as the basis for MEF’s Third Network should be able to work with any of the following tunnel-models:

  1. The lower-level tunnels can connect all the way to the access points, creating a virtual mesh. The overlay technology would then provide only service-specific handling and addressing, and each tunnel access point would simply forward a packet on the right tunnel.  This would work for modest-scale virtual networks where a fully scalable forwarding technology (like SDN switches) was used.
  2. The lower-level tunnels connect to some number of aggregation points hosted within the network based on traffic topology. At these points, forwarding rules would cross-connect them.  This is the structural model that would optimize the use of hosted/virtual router instances.
  3. The lower-level tunnels, in addition to one of the above approaches, cross a protocol or administrative boundary where tunnel-to-tunnel connection is not available, and where tunnels from each side must therefore terminate. The Li layer now has to cross-connect the tunnels appropriately just to pass across the boundary.

The issue that can mess up a good overlay strategy could be called “tunnel granularity”.  If you have too little tunnel granularity, then you can’t create tunnels to the access points for an overlay-based service without a lot of tunnel cross-connecting.  Not only does this process increase delay and packet loss risk, but because it happens where a concentration of users shares an inadequate number of lower-level tunnels, demand might well grow to the point where addressing it with a hosted router instance would be difficult.  You’d like to get your lower-level tunnel mesh as close to serving all the access points as possible.  The MEF has been working to improve Ethernet’s ability to support connected-path multiplicity efficiently, and that’s good.

Here is where “universal SDN” might be very helpful.  If you think of an OpenFlow-driven concatenation of forwarding table entries as a kind of “naked tunnel”, you see that SDN could create any arbitrary tunnel configuration end to end if desired.  If you combine this with agile optics (ROADMs) then you’d have a highly functional physical layer over which you could overlay any convenient L2/L3 service protocol while largely ignoring issues like topology and even path failures (because they’d be handled or controlled below).
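The “naked tunnel” idea above can be sketched as a controller building an end-to-end path out of a concatenation of per-switch forwarding entries, in the style (though not the literal wire format) of OpenFlow flow mods.  Switch names, port numbers, and the entry layout here are all illustrative assumptions:

```python
# Sketch: an arbitrary end-to-end "naked tunnel" expressed as a chain of
# per-switch match/action entries, as an SDN controller might install them.

def build_naked_tunnel(tunnel_id, hops):
    """hops: list of (switch, in_port, out_port) along the chosen path."""
    entries = []
    for switch, in_port, out_port in hops:
        entries.append({
            "switch": switch,
            "match": {"in_port": in_port, "tunnel_id": tunnel_id},
            "action": {"output": out_port},
        })
    return entries

# A three-hop path from an ingress switch to an egress switch
path = [("sw1", 1, 2), ("sw2", 3, 4), ("sw3", 1, 9)]
rules = build_naked_tunnel("li-tunnel-7", path)
assert len(rules) == 3
assert rules[0]["action"]["output"] == 2
```

Because the path is just a list of hops, rerouting around a failure below is a matter of recomputing the list and reinstalling the entries; the overlay service above never sees the change.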

The overlay approach would be easy to apply to mobile infrastructure because it’s already heavily based on tunnels (EPC).  It would also be easy to apply to business virtual network services and to cloud application services.  It’s not as clear that you could adopt an overlay model for the Internet, which suggests that either you’d want to retain standard Internet routing at least in the core and augment it with SDN forwarding, or retain it only for non-content services, since content delivery is already handled largely by CDNs.

There’s no shortage of potential vendors to support the model, starting with the classic overlay-SDN Nicira/VMware play and extending to SD-WAN vendors like Talari, Citrix/CloudBridge, Silver Peak, and Riverbed/SteelConnect.  In addition, most virtual routers (software router instances) can interconnect tunnels and so could be used to build an overlay-modeled service framework.  However, vendors have been shy so far in committing to the approach, preferring to sell to enterprises in more limited missions rather than to operators.  Even the SD-WAN vendors whose products could easily frame an overlay model (even within the Third Network approach) haven’t played that capability as a differentiator.

The likely reason for this is that selling SD-WAN to enterprises is working, and selling it as a mainstay for next-gen networking is a Great Unknown, particularly for vendors who don’t call on operator CTOs and don’t participate in emerging-network standards.  Despite the resistance, I think it’s clear that overlay networking could play a major role in next-gen infrastructure, perhaps the dominant one.  It may be that the evolution of the MEF’s Third Network will finally legitimize the approach and address the critical question of overlay/underlay relationships.

How Equipment Vendors Can Counter Cautious Operator Spending

With the exception of Huawei, network equipment vendors are facing tightening spending by operators.  The reason, obviously, is the compression in profit-per-bit that I’ve been talking about—the compression that’s led to operator support for “transformation” and their interest in SDN and NFV.  Since SDN and NFV have not evolved fast enough and far enough to generate the kind of radical improvements in cost and revenue operators had hoped for, their only response is to slow capital spending.  The impact is greatest in wireline, because wireless is too competitive for anyone to skimp on improvements.  Vendors like Juniper, with little credible wireless contribution to make, suffer most obviously, but nearly every vendor who isn’t a price leader is feeling the pinch.

So what’s to be done?  While I might be (and am) confident that there’s a way out of the compression problem, I’m not the guy who’s going to be stuck with an enormous technical albatross if the method doesn’t pan out.  Operators have long capital cycles, and so they’re unusually risk-averse with respect to writing down failures.  Operators either have to reduce the risk that they’re doing the wrong thing, or they have to do nothing—or more precisely, do nothing different or risky in terms of architecture, and just build with cheaper components.  Hence, Huawei.

Vendors across the board have failed to deal with this resistance to failure, and some add insult to injury by promoting the OTT notion of “fast fail” as a model the operators need to adopt.  There is no way of fast-failing a trillion-dollar infrastructure.  What operators need, and have always needed, is a set of new-approach hypotheses that link directly to a benefit, and that can be proven out in a modest-scale trial.  The biggest casualty of bottom-up specifications is the potential to fulfill this need; the early work has no business context in which it can prove either benefits or realization.

But OK, we’re here and it’s now.  AT&T and Verizon have both issued papers describing an architecture model and these should resolve a lot of the issues, right?  Not so fast.

What AT&T has done is set goals, which can define benefits.  What Verizon has done is frame a solution set inside an architecture.  You can do a lot with these two, particularly if you could somehow combine them, but guess what?  Vendors are singularly unimpressed with, perhaps even unhappy with, the two approaches.  “Selling” has become not a fitting of your product plans to the demands of your buyers, but rather a coercion or manipulation of your buyers to accept what you’ve decided to produce.  The reason, of course, is that vendors want to make money and their fondest wish is that operators just suck it up and buy stuff and forget all this newfangledness.

What we need now is recognition that resolving the problem of cost reduction without killing vendor support for your initiatives must lie in focusing on costs other than equipment costs, and in achieving transformation on a large scale without making infrastructure changes on a large scale.  As I’ve said in past blogs, this means focusing on opex reductions that can be achieved by a higher layer of orchestration, one that accommodates both legacy and new SDN/NFV technologies.  That would let you test and realize benefit-based changes without forcing you to commit to major infrastructure upgrades that could only be justified if you were sure they’d work.

The problem getting to that (happy?) goal is a combination of the fact that when you do a bottom-up spec you don’t get to the top till the end of the process (if at all) and that same old issue of vendor self-interest.  Network equipment vendors are reluctant to embrace top-down abstraction-based operations because it anonymizes network equipment and threatens incumbencies.  IT vendors in the SDN/NFV space are similarly reluctant because these top-down approaches don’t sell servers right away.  And everyone is reluctant because, in the main, they don’t have the top-layer tools in place.

One of the most important developments in this area is the emergence of operator-driven initiatives to define holistic SDN/NFV architectures.  Verizon, in particular, has emphasized the notion of a layered orchestration model that would allow a higher-level orchestrator to harmonize not only legacy and emerging network technologies but also multiple vendor-specific implementations.  This overcomes the fact that neither SDN nor NFV standards include modernizing operations practices or incorporating or evolving future networks from legacy deployments.

Another potential solution is the use of a generalized orchestration model, championed by some of the six vendors who have complete NFV solutions (ADVA, Ciena, and HPE in particular).  This approach could in theory be applied two ways—a top-to-bottom orchestration architecture and a selective architecture.  With the former, the vendors’ solutions would be accepted as the only orchestration approach, and this seems to run afoul of the current service-specific SDN/NFV evolution trends.  With the latter, you’d adopt the generalized orchestration model where there’s no competing implementation, and use a stub/adapter to incorporate the competing models by supporting abstractions that represent them.

It’s this last approach that shows the most promise, but vendors have not been enthusiastic in promoting it.  Part of the reason is that most still hope to achieve their own “lock-in” of early NFV deployments, and fear that embracing an open model would hurt them as often as help them.  Part is the fact that you have to implement the stubs to represent “foreign” models, which of course means that there has to be some foreign model structure to represent.  At this point, absent any specific intent-model requirement for NFV or SDN (SDN’s is coming along), that could be challenging.  In particular, it would leave the top-level orchestration vendor at risk to changes made by vendors below.

That problem, unfortunately, can happen in any multi-layer orchestration approach, and that’s why in the end the operator models may be the only hope.  Verizon or another Tier One could compel vendors to open and stabilize their models so that lower-level service-specific implementations would fit inside an end-to-end orchestration model.  Vendors themselves almost surely could not.

Everything comes back to the point I made about vendor differentiation and model-based abstraction.  If operators think of equipment as simply a realization of a given abstract model, then it’s harder for vendors to differentiate.  Operator-driven models would probably not include special differentiating features from vendors, given operator demands for an open approach.  Vendors need to somehow support open-network goals and retain some opportunity to exploit their own special sauce.

The first-quarter slump in operator spending (which vendors want to believe is just a blip on an otherwise untroubled horizon of spending growth despite ROI compression) argues for taking decisive action.  A vendor could develop an operations-savings approach that would at least mitigate the problem of loss of differentiation.  For example, vendors could develop their own models to link their lower-level management systems to end-to-end orchestration (EEO) tools, models that could then exploit their own differentiation.  As long as their models only enabled these special capabilities and didn’t mandate them, would operators refuse them?  Probably not, and they might even use the features if they were valuable, even at the cost of openness.

Remember too that Verizon and AT&T are emphasizing a shift to white-box products, which means products non-differentiable at the data plane level.  Verizon has also explored the notion of displacing physical routers with software instances, recognizing that hardware acceleration may be required for the hosting.  I think that widespread use of router instances will also require “virtual-wire” partitioning of traffic at L1/L2 to eliminate large L3 aggregation missions that servers are never likely to be able to support efficiently.

I said early on that if vendors did not find a way to secure significant non-capex benefits through SDN and NFV, operators would re-architect networks to reduce spending on switching and routing, and also achieve opex savings through L2/L3 simplification.  I think that’s happening.  I think that everything happening in the network market today demonstrates a need for vendors to push an operations-savings approach, and to take control of the way their own orchestration and management tools integrate with emerging high-level EEO tools.  In fact, I think that vendors have already lost millions by not having this capability, money operators would have spent on infrastructure had the profit compression pressure been relieved by operations savings.  Not losing any more should be a priority.

Exploring the Operations Implications of the Verizon Model

The issue of operationalizing next-gen networks and services is critical for operators, and it’s thus fitting to close this week’s review of the Verizon architecture with comments on OSS/BSS integration.  There are two questions to be answered: can the approach deal with the efficiency/agility goals that will have to be met to justify SDN/NFV, and can it accommodate the political divisions over the future of OSS/BSS?

Every Tier One I know is conflicted on the issue of OSS/BSS.  It’s not that the functions themselves are not needed (you have to sell services to make money) but that the functions are wrapped in an architecture that seems in every respect to be a dinosaur.  OTTs who have adopted modern cloud and web principles seem to be a lot more efficient and agile, and thus there’s a camp within each operator that wants to toss the OSS/BSS and remake the functions along web/cloud lines.  On the other hand, every OSS/BSS expert is going to resist this sort of thing (just as router experts resist transformation to SDN), and in any event, changing out your core business systems is always going to present risks.

The Verizon paper introduces the notion of layered service abstractions, starting at the top with an end-to-end retail-driven vision and ending at the bottom with virtual features/functions/devices.  These layers can be used to assemble a spectrum of network-service-operations relationships, and if these relationships are broad enough they could cover all the bases needed for benefit generation and political consensus-building.  Do they?  Let’s use the extreme cases to see, and let’s assume that we could graph the resulting structural relationship between OSS/BSS, independent EEO, and resources as a small-letter “y” whose left, shorter, branch could connect either up top or down lower.

One extreme case is to consider the OSS/BSS system to end with the high-level retail model.  A service is a set of commercially defined functions represented by SLAs.  Those functions, from an OSS/BSS perspective, are atomic—think of them as “virtual devices” if you like.  OSS/BSS systems deploy them, and customer portals and service lifecycle management processes treat the SLA parameters as representing service and resource behavior.

If you charted this in our “y” model, the left bar would join the main bar very close to the top.  EEO, then, would generate the vision of the service that operations systems saw, reducing the role of OSS/BSS to the commercial management of the service and leaving the issues of deployment and lifecycle management to EEO.

The other extreme case is to consider the OSS/BSS system to be responsible for the lowest-level virtual function/device models.  This would open two options for our conceptual structure.  The first would drop the left bar of the “y” downward to touch the resources, and the second would make the “y” into an “l” with a single branch.

The first approach would say that while EEO should be responsible for the deployment and lifecycle management, the OSS/BSS would see the lowest-level virtual devices.  This approach is friendly to current OSS/BSS models and perhaps to the legacy TMF approach, because it retains contact between “resources” and “operations” in its current form.  We have management systems today that operate, somewhat at least, in parallel with operations, and this would perpetuate that.

The second says that it’s fine to have EEOs but they need to be inside the OSS/BSS.  That component then has a linear relationship with all the model layers of the Verizon architecture.  This, I think, is the essential model for the TMF ZOOM project, though the details of that architecture aren’t fully open to the public at this point.  It would magnify the role of OSS/BSS, obviously, and preference the OSS/BSS vendors.  It also perpetuates the OSS/BSS and its role.

If you look at these extremes, particularly in terms of my “y” topology, you see that the Verizon approach of layers of models opens the opportunity to connect the OSS/BSS in at any of the modeling layers, meaning that you can slide that left bar up and down.  It also lets operators elect to integrate the functions “above the junction” of the left bar with the OSS/BSS, turning the model into an “I”.

All this capability isn’t automatic, though.  To understand the issues of implementation, let’s move from our “y” model to another one, resembling an “H”.

The left vertical bar of our “H” represents the OSS/BSS flow and visibility, and the right the EEO or incremental orchestration view.  Both these can coexist, but to make sure they don’t end up as ships-in-the-night competitors to management, there has to be a bridge between the domains—the crossbar.  The purpose of this is to establish a kind of visibility bridge—at the model layer where this crossbar is provided, there is a set of processes that convert between the two sets of abstractions that drive the OSS/BSS and EEO flows.  What is above the crossbar is invisible to the other side, and what is below it is harmonized—reflecting the likelihood that the two verticals represent different but necessarily correlated views.

Wherever this crossbar exists, the model for the associated layer has to provide both operations-friendly and orchestration-friendly parameters to represent the status, and that has to include lifecycle state if that state has to be coordinated between EEO and OSS/BSS.  Where the bar is set higher, meaning where the model layers represent more functional and less structural abstractions, the same parameters would likely serve both sides and little work is needed.  If you drive the bar lower, then you encounter a point where it’s desirable to have the operations view composed from the EEO view.
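A small sketch may help show what the crossbar’s “parametric derivation” amounts to in the read direction.  The parameter names here (lifecycle states, SLA fields) are invented for illustration; the point is only that the OSS/BSS-facing view is composed from the EEO-side structural view at the chosen model layer:

```python
# Sketch of the crossbar's read-side conversion: derive an operations-
# friendly SLA view from the EEO/orchestration structural view.

def eeo_to_oss_view(eeo_status):
    """Compose an OSS/BSS-facing view from per-VNF EEO status."""
    vnfs = eeo_status["vnfs"]
    return {
        # Service is "up" only if every constituent VNF is active
        "service_state": "up" if all(v["lifecycle"] == "active" for v in vnfs)
                         else "degraded",
        # SLA availability is bounded by the weakest component
        "availability_pct": min(v["availability"] for v in vnfs),
    }

eeo = {"vnfs": [{"lifecycle": "active", "availability": 99.99},
                {"lifecycle": "active", "availability": 99.95}]}
view = eeo_to_oss_view(eeo)
assert view["service_state"] == "up"
assert view["availability_pct"] == 99.95
```

A write-side companion function would do the reverse derivation (an OSS/BSS order or change decomposed into EEO terms), which is what makes the crossbar bidirectional.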

In an operator-defined architecture like Verizon’s, there’s no need to support a range of options for positioning the bar because the operator can make a single choice.  For a general architecture, vendors and operators would have to expect that the modeling at every layer provide for the coequal viewing of OSS/BSS and EEO elements, and support the necessary parametric derivations—both in read and write terms—to provide that capability.

The bidirectionality of the bar is important because it illustrates the fact that there are two parallel paths—operations and orchestration—and that these can be harmonized in part by “exporting” functionality from one to the other.  This is the answer to the political dilemma that OSS/BSS modernization seems to pose, because if you use the bar to shunt orchestration into OSS/BSS you “modernize” it, and if you use the bar to move OSS/BSS functions (by making them orchestrable) into the orchestration side, you essentially replace the current OSS/BSS concept.

I don’t see specific evidence of this “bidirectional bar” in the Verizon approach, but I think that all of the six vendors with current full-spectrum NFV capability could provide it.  It will be interesting to see if the emergence of carrier-developed models (like those of AT&T and Verizon) will raise recognition of the multiplicity of possible OSS/BSS-to-orchestration relationships, and create some momentum for a solution that can accommodate more, or even all, the options.

The Implications and Impacts of Verizon’s End-to-End Hierarchical Modeling

It has always been my view that NFV would be better and more efficient if there were a common modeling approach from the top layer of services to the bottom layer of infrastructure.  I still feel that way, but I have serious doubts on whether such a happy situation can now arise.  The service-centric advance to NFV now seems the only path, and that advance almost guarantees a multiplicity of modeling approaches.  They might be harmonized, though, by adopting some of the principles outlined in the Verizon paper I’ve blogged about, and that’s the topic of the day.

A model is an abstraction that represents a complex interior configuration.  In NFV, the decomposition of a model into that internal complexity is the responsibility of the orchestration process.  Everywhere you have a model, you have an orchestrator, and of course if you have different modeling approaches then you have multiple orchestrators.  I’ve always felt that this introduced complexity and inefficiency, which is why I favored a single approach—but we’ve already noted that’s probably no longer practical.

A generalized NFV architecture should, and likely would, have contained a single modeling/orchestration implementation, but the ETSI work hasn’t defined all the layers and only a few (six) vendors have a unified architecture to date.  Further, there’s been little (well, let’s be honest, no) progress toward a full-scope NFV business case.  That’s what got us on a service-driven path, and service-driven approaches rarely develop holistic modeling/orchestration visions, because individual services don’t expose the full set of issues.

There are two pieces of good news in this.  First, most service-driven NFV starts at the bottom and goes no higher than necessary—which isn’t very far.  Second, a proper modeling and orchestration approach at a higher level can envelop and harmonize the layers below even if they’re not based on the same approach.  This is one of the features of Verizon’s End-to-End Orchestration or EEO approach, but it also applies down deeper, and it’s the basis for coercing order from service-specific NFV chaos.  But it’s not without effort and issues.

Let’s suppose we have a “classic ISG” implementation of NFV, which means that we have some sort of model that represents the VNF deployment requirements of a service, which means we have individual VNFs and the “forwarding graph” that somehow connects them.  This combination, represented using something like YANG, represents a specific model at a low level.  It’s the sort of thing you might find in a vCPE business service deployment.

Now let’s suppose we want to incorporate this in a broader service vision, one that for example includes some legacy service elements that the first model/orchestration didn’t support.  We could add a new layer of model/orchestration above the first.  If our first model is M0 we could call this second one M1 to show its relationship.  The M1 model would have to be able to properly decompose requests for legacy provisioning, and it would have to be able to recognize a model element that represented an M0 structure and decompose that structure into a request to pass a low-level model to the appropriate low-level orchestrator.  This is a hierarchical decomposition—one model can reference another as an interior element.

In my example, I assumed that the M1 model had orchestration that would directly process legacy deployments, but you could just as easily have had a second M0 level, this one for legacy, and had the M1 level reference the models of either of the two options below.  Thus, even if you assumed that you had two different ways of implementing NFV (deploying VNFs) you could still envelop them both in a higher-level model, as long as each of the two options below could be identified.  Either give them a different model element, or have the decomposition logic determine which of the two was needed.
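The hierarchical decomposition described above can be sketched in a few lines.  This is an illustration only; the element kinds, model layout, and the stand-in M0 orchestrator are all assumptions, not any standard’s schema:

```python
# Sketch of hierarchical decomposition: an M1 orchestrator walks its model,
# provisions legacy elements directly, and delegates any element that
# references an interior M0 model to that model's own orchestrator.

def deploy_m0_nfv(element):
    """Stand-in for a lower-level (ETSI-style) VNF orchestrator."""
    return [f"vnf-deploy:{vnf}" for vnf in element["vnfs"]]

def deploy_m1(model, m0_orchestrators):
    actions = []
    for element in model["elements"]:
        if element["kind"] == "legacy":
            actions.append(f"legacy-provision:{element['name']}")
        else:
            # Reference to an interior M0 model: hand it downward
            actions.extend(m0_orchestrators[element["kind"]](element))
    return actions

service = {"elements": [
    {"kind": "legacy", "name": "access-ethernet"},
    {"kind": "m0-nfv", "vnfs": ["firewall", "nat"]},
]}
result = deploy_m1(service, {"m0-nfv": deploy_m0_nfv})
assert result == ["legacy-provision:access-ethernet",
                  "vnf-deploy:firewall", "vnf-deploy:nat"]
```

Adding a second M0 implementation is just another entry in the `m0_orchestrators` table, which is the “either give them a different model element” option in miniature.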

What this shows is that layered modeling and orchestration can accomplish all sorts of useful stuff, including harmonizing different implementations or addressing the deployment of things that a given model doesn’t include/support.  And it can be carried on to any number of layers, meaning that you could orchestrate a dozen different model layers.  This (sorry to beat a dead horse here!) is another reason I liked the idea of a single model/orchestration approach.  It would have let us decompose a model using recursive processing by a single piece of software.  But, onward!

Verizon’s paper calls for two layers of service modeling, one representing the retail view of the service as it might be seen by a customer or the OSS/BSS, and the second representing the input to a connection controller (SDN controller with legacy capability).  I think it would be helpful to generalize this to allow any number of layers, and to recognize that each “leaf/branch” on the tree of a service hierarchy would pass from service-abstract to resource-abstract in its own way at its own time, subject to the higher-layer service/process synchronization needed to ensure pieces get set up in the right order (which Verizon’s paper includes as a requirement).

How about Verizon’s EEO?  The Verizon paper has an interesting point in its section on E2E service descriptors:

Apart from some work in ETSI/NFV, which will be discussed below, there has not been much progress in the industry on standardizing EENSDs. That is not considered an impediment, for the following reasons:

  1. EEO functionality and sophistication will improve over time
  2. Operators can start using EEO solutions in the absence of standard EENSDs

Since an EENSD is essentially a service model, what Verizon is saying is that you could hope that the industry would converge on a standard approach there, but if it didn’t, operators could still use proprietary or service-specific EEO strategies.  True, but they could also simply overlay their “EEO” models with a “super-EEO” model and harmonize that way.

I said earlier that this wasn’t necessarily a slam-dunk approach, and the reason should be obvious.  If a given Mx is to superset a lower-level model then the decomposition to that lower-level model and the invocation of its orchestration process has to be incorporated into the modeling/orchestration at the Mx level.  Somebody would have to write the code to do this, and even if we assume that the orchestrator at our Mx level is open-source, there’s still work to be done.  If it isn’t, then only the owner of the software could do the modification unless there was a kind of plug-in hook mechanism provided.

To make this kind of model-accommodation easier, the first requirement is that all modeling approaches provide the documentation (and if needed, licensing) to allow their model to be enveloped in one at a higher layer.  The second requirement is that any modeling layer have that plug-in hook or open-source structure such that it can be expanded to include the decomposition of new lower-level models.

All of this could be accomplished in two broad ways.  First, any of the six vendors with a comprehensive implementation could focus on “de-siloing” and service harmonization in their development and positioning.  Second, some standards group or open-source activity could address it as an explicit goal.  I think AT&T and Verizon have both made the goal implicit in their announced approaches, but real progress here is going to depend on somebody picking up the standard of harmonization and making a commitment.

A final interesting point is that this approach appears to offer an opportunity to offer “modeling-and-orchestration-as-a-service”.  Higher level models could be linked to cloud service portals, passing off lower-level provisioning and lifecycle management to operators’ own implementations.  This could create a whole new set of NFV opportunities and competition among model providers could move the whole service-first approach ahead, to the benefit of all.

Lessons from Taking a Service-Inward View of NFV

Getting closer to the buyer and to the dollars is always good advice in positioning a product or service.  For network operators, that means looking at what services they sell, and for network operators reviewing the potential of SDN/NFV, it means looking at how these new technologies can improve their services.  But “services” doesn’t necessarily mean “all services.”  In my last two blogs, I used a combination of operator comments to me on their view of NFV’s value and future, and Verizon’s SDN/NFV architecture paper, to suggest that operators were looking at NFV now mostly in a service-specific sense.  Holistic NFV, then, could arise as an almost-accidental byproduct of the sum of the service-specific deployments.

One question this raises is just how far “holistic NFV” can go, given that early projects might tend to wipe low-hanging benefits off the table.  Another is whether silos of NFV solutions, per service, might dilute the whole holistic notion.  I don’t propose to address either of these at this point, largely because I’ve talked about these problems in prior blogs.  What I want to do instead is look at what “service-driven NFV” might look like.

Service-driven NFV has to start with service objectives, first and foremost.  There is relatively little credibility for capex reduction as an NFV driver overall, and I think less in the case of service-driven NFV.  Few “services” offer broad opportunities to reduce capex, broad enough to impact the bottom line and justify taking some risks.  There are some credible opex benefits that might be attained on a per-service basis, but again the issue of breadth comes in.  In addition, there’s a risk that specialized NFV operations within a single service could create islands of opex practices that would end up being confusing and inefficient.  That means revenue-side, or “service agility” benefits would have to be the key.

That’s a conclusion consistent with both my operator survey and the content of the Verizon paper, I think.  Operators told me they liked mobile services and business services as NFV targets, and their specific comments focused on portal provisioning and customer care, agile deployment of incremental managed service features, etc.  The big focus in Verizon seems to be the same, with the specific adjustment that “mobile” probably means “5G”.  In fact, about half of the over-200 page Verizon paper is devoted to mobile issues and applications.

Everyone these days thinks “services” means “portals”, and that’s true at one level.  You need to have self-service user interfaces to improve agility; otherwise the operator’s customer service processes are just delays and overhead.  However, a portal is a means of activation and presentation, and that’s all it is.  You still need to have something to process the activations and to generate what you propose to present.

If customers are going to have a portal that provides them both service lifecycle information and the ability to make changes to services or add new ones, then the critical requirement is to have a retail representation of a service that can be decomposed into infrastructure management, including the deployment of virtual functions (NFV) and the creation of ad hoc network connectivity (SDN).  In the Verizon paper this is accomplished through a series of hierarchical models.  There is a model that represents the service as a portal or operations system would offer it—the retail vision.  Another model represents the connectivity options available from the underlying infrastructure, and yet another the “abstract devices” that create the connectivity.  The models build up to or decompose from (depending on your perspective) their neighbors.

Implicitly, the service-driven vision of NFV would start with the creation of this model hierarchy.  The retail presentation (analogous to the TMF’s “Product”) would decompose into functional elements (TMF Customer-Facing Services?) and then into the infrastructure connectivity elements (TMF Resource-Facing Services?) that would be built from the abstract devices.  To make this process amenable to portal-driven ordering, you’d need the model hierarchy to define the service based on all of the possible infrastructure options that might be associated with a given order, meaning that the model would have to support selective decomposition based on a combination of the order parameters and the service-versus-network topology-to-infrastructure relationships.  A portal order could then initiate a selective decomposition of the model, ending in a deployment.
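To make "selective decomposition" concrete, here is a small sketch of the idea in Python. The model structure and the order parameter ("hosting") are assumptions of mine for illustration, not Verizon's actual schema.

```python
# A retail service model whose lower-level branches are selected by an
# order parameter. Only the branch matching the order is decomposed,
# ending in concrete deployment and connectivity steps.

RETAIL_MODEL = {
    "name": "managed-firewall",
    "options": {
        # "hosting" in the order selects which branch decomposes
        "cpe":   {"resources": ["agile-cpe-device"], "connectivity": []},
        "cloud": {"resources": ["vnf:firewall"],
                  "connectivity": ["tunnel:site-to-pop"]},
    },
}

def decompose(model, order):
    """Selectively decompose the retail model for one specific order."""
    branch = model["options"][order["hosting"]]
    steps = [("deploy", r) for r in branch["resources"]]
    steps += [("connect", c) for c in branch["connectivity"]]
    return steps
```

A portal order simply supplies the parameters, and the decomposition picks its way down to the deployment steps; the retail model itself has to enumerate all the infrastructure options an order might select among.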

Automated deployment doesn’t address all the issues, of course.  A user who depends on a portal for service orders and changes is likely to depend on it for service status as well.  Thus, it’s reasonable to assume that the retail model of the service defines parameters on which the service is judged by the buyer—the SLA.  The model would then have to create a connection between these parameters and the actual state of the service, either by providing a specific derivation of SLA statistics from lower-level model statistics, or by doing some sort of status query on an analytics database.

Both the deployment and lifecycle management activities associated with service-driven NFV pose a risk because a “service” may not expose the full set of requirements for either step, and thus new services might not fit into current models of NFV as operators seek to expand their NFV story.  Put another way, each service could end up being a one-off, sharing little in the way of software with others.  It’s even possible that early services would not develop generally reusable resource pools.

vCPE is a good example of this.  There is no question that the best model for managed service deployment would be an agile CPE device on which feature elements could be loaded.  Every model I’ve run on this suggests that it would always beat a cloud-hosted model of deployment for business service targets, which is where most operators want to use it.  Obviously, though, CPE resources to host VNFs wouldn’t be generally helpful in building a resource pool.  Less obvious is the fact that software to deploy VNFs on CPE wouldn’t have to consider all the issues of general pooled-resource infrastructure.  There’s no selection optimization, no connection of features through the network, and the derivation of management data is much easier (the CPE device knows the status of everything).

A general solution to service-specific deployments via evolving SDN/NFV technology could be created by expanding the OSS/BSS role, but Verizon’s architecture seems to focus on the opposite—containing that role.  Operators seem to think that pushing more details of SDN/NFV into the OSS/BSS is a bad move overall, and the TMF has yet to publish an open model that supports the necessary absorption.  At this point, I think that’s a dead issue, which is why I appended the question-marks on TMF references earlier.

A final consideration in a service-driven model of NFV is whether new services might be a major contributor.  Even in mobile NFV, Verizon and other operators seem to think that it would be difficult to drive NFV without a broader mobile change, like 5G, to help bear the cost and justify the disruption.  That suggests that a green-field service would be even more helpful.  IoT is the obvious one, and Nokia has recently suggested it thinks that some medical applications (which could be considered a subset of IoT) could also drive network change.  However, focusing on NFV as a platform for new services would be facilitated if there were a generic NFV model to build on, and getting that model in place may be more than new services can justify.

What We Can Learn from Verizon’s SDN/NFV Paper

Verizon has just released a white paper on its SDN/NFV strategy, developed with the help of a number of major vendors, and the paper exposes a number of interesting insights into Tier One next-gen network planning.  Some are more detailed discussions of things Verizon has revealed in the past, and some are new and interesting.  This document is over 200 pages long and far too complicated to analyze here, so I’m going to focus on the high-level stuff, and along the way make some comments on the approach.  Obviously Verizon can build their network the way they want; I’m only suggesting places where I think they might change their minds later on.

The key point of the paper, I think, is that Verizon is targeting improvements in operations efficiency and service agility, which means that they’ve moved decisively away from the view that either SDN or NFV are primarily ways of saving money on infrastructure.  This is completely consistent with what operators globally have told me for the last several years; capex simply won’t deliver the benefits needed to transform the business.  And, may I add, business transformation is an explicit Verizon goal, and they target it with both SDN and NFV.

On the SDN side, Verizon is particularly focused on the independence of the control and data planes (“media planes”, as Verizon puts it, reflecting the increased focus on video delivery).  This is interesting because it validates the purist SDN-controller-and-OpenFlow model of SDN over models that leverage software control of current switches and routers.  They also say that they are expecting to use white-box products for the switches/routers in their network, but note here that “white box” means low-feature commodity products and not necessarily products from startups or from non-incumbent network vendors.  It would be up to those vendors to decide if they wanted to get into the switch game with Verizon at the expense of putting their legacy product revenue streams at risk.

On the NFV side, things are a bit less explicit at the high level.  Verizon recognizes the basic mission of NFV as that of decoupling functional software from specific appliances to allow its hosting on server pools.

One of the reasons why the NFV mission is lightweight at the high level, I think, is that Verizon includes an End to End Orchestration layer in its architecture, sitting above both NFV MANO and the SDN controller(s).  This layer is also responsible for coordinating the behavior of legacy network elements that make up parts of the service, and it demonstrates how critical it is to support current technology and even new deployments of legacy technology in SDN/NFV evolution.

Verizon also makes an interesting point regarding SDN and NFV, in the orchestration context.  NFV, they point out, is responsible for deploying virtual functions without knowing what they do—the VNFs appear as equivalent to physical devices in their vision.  Their WAN SDN Controller, in contrast, knows what a function does but doesn’t know whether it’s virtualized or not.  SDN controllers then control both virtual and physical forms of “white box” switches.

One reason for this approach is that Verizon wants an architecture that evolves to SDN/NFV based on benefits, often service-specific missions.  That relates well with the point I made in yesterday’s blog about operators looking more at SDN or NFV as a service solution than as an abstract architecture goal.  All of this magnifies the role of the service model, which in Verizon’s architecture is explicitly introduced in three places.  First, as how an OSS/BSS sees the service, which presumably is a retail-level view.  Second, as the way of describing resource-behavior-driven cooperative service elements, and third (in the form of what Verizon calls “device models”) as an abstraction for the functional units that can be mapped either to VNFs or physical network functions (PNFs).  End-to-End Orchestration (EEO) then manages the connection of models and doesn’t have to worry about the details of how each model is realized.  This is a firm vote in favor of multi-layer, divided, orchestration.
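The "device model" abstraction is worth sketching, because it carries the whole division of labor. The class below is my own illustration of the principle, not Verizon's definition: one functional unit, two possible realizations, and a deploy path that is opaque to the layer above.

```python
# Sketch of a "device model": an abstraction for a functional unit that can
# be mapped either to a VNF or to a physical network function (PNF). The
# orchestration layer above calls deploy() without caring which it gets.

class DeviceModel:
    def __init__(self, function, realization):
        self.function = function        # what it does, e.g. "firewall"
        self.realization = realization  # how it's built: "vnf" or "pnf"

    def deploy(self):
        if self.realization == "vnf":
            return f"nfv-mano: instantiate {self.function}"
        return f"ems: configure physical {self.function}"
```

An end-to-end orchestrator in this scheme just connects DeviceModel instances; swapping a PNF for a VNF changes the realization field, not the service model above it.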

Management in the VNF sense is accommodated in the Service Assurance function, which Verizon says “collects alarm and monitoring data. Applications within SA or interfacing with SA can then use this data for fault correlation, root cause analysis, service impact analysis, SLA management, security monitoring and analytics, etc.”  This appears to be a vote for a repository model for management.  However, they don’t seem to include the EMS data for legacy elements in the repository, which I think reflects their view that how a function is realized (and thus how it is managed) is abstracted in their model.

I do have concerns about these last two points.  I think that a unified modeling approach to services is both possible and advantageous, and I think that all management information should be collected in the same way to facilitate a unified model and consistent service automation.  It may be that Verizon is recognizing that no such unified model has emerged in the space, and thus are simply accommodating the inevitability of multiple implementations.

An interesting feature of the architecture on the SDN side is the fact that Verizon has three separate SDN controller domains—access, WAN, and data center.  This, I think, is also an accommodation to the state of SDN (and NFV) progress, because a truly powerful SDN domain concept (and a related one for NFV) would support any arbitrary hierarchy of control and orchestration.  Verizon seems to be laying out its basic needs to help limit the scope of integration needed.  EEO is then responsible for harmonizing the behavior of all the domains involved in a service—including SDN, NFV, and legacy devices.

Another area where I have some concerns is in the infrastructure and virtualization piece of the architecture.  I couldn’t find an explicit statement that the architecture would support multiple infrastructure managers, other than that both virtual and physical infrastructure managers are required.  But does this multiplicity also extend within each category?  If not, then it may be difficult to accommodate multi-vendor solutions, given that we already have proprietary management in the physical network device sense, and that the ETSI specs aren’t detailed enough to ensure that a single VIM could manage anyone’s infrastructure.

My management questions continue in the VNF Manager space.  Verizon’s statement is that “Most of the VNF Manager functions are assumed to be generic common functions applicable to any type of VNF. However, the NFV-MANO architectural framework needs to also support cases where VNF instances need specific functionality for their lifecycle management, and such functionality may be specified in the VNF Package.”  This allows an arbitrary split model of VNF management, particularly given that there are no specifications for how “generic functions” are defined or how VNF providers can support them.  It would seem that vendors could easily spin most management into something VNF-specific, which could then complicate integration and interchangeability goals.

EEO is the critical element of the architecture overall.  According to the document, “The objective of EEO is to realize zero-touch provisioning: a service instantiation request — from the operations crew or from the customer, through a self-service portal – results in an automatically executed work flow that triggers VNF instantiation, connectivity establishment and service activation.”  This appears to define a model where functions are assembled to create a service, and then lifecycle management for each function is expected to keep the service in an operating state.  However, Service Assurance interfaces with EEO to respond to SLA failures, which seems to create the potential for multi-level responses to problems that would then have to be organized through fault correlation or response analysis.  All of that could be handled through policy definition and distribution, which Verizon’s architecture also requires.
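The zero-touch flow the quote describes can be sketched as a simple workflow: an order event drives instantiation, connectivity, and activation with no manual steps in between. The class and function names below are placeholders of my own, not the Verizon EEO interface.

```python
# Hedged sketch of zero-touch provisioning: a service order (from the ops
# crew or a self-service portal) triggers an automatically executed workflow.

class StubMano:
    """Stand-in for NFV MANO: instantiates VNFs, returns instance IDs."""
    def __init__(self):
        self.count = 0
    def instantiate(self, vnf_name):
        self.count += 1
        return f"{vnf_name}-{self.count}"

class StubSdnController:
    """Stand-in for an SDN controller: establishes connectivity."""
    def connect(self, a, b):
        return f"link:{a}<->{b}"

def zero_touch_provision(order, mano, sdn):
    """Order in, active service out: instantiate, connect, activate."""
    vnf_ids = [mano.instantiate(v) for v in order["vnfs"]]
    links = [sdn.connect(a, b) for a, b in order["links"]]
    return {"state": "active", "vnfs": vnf_ids, "links": links}
```

The hard part, of course, is everything this sketch hides: what happens when a step fails mid-workflow, which is exactly where the Service Assurance/EEO interaction I mention above comes in.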

The final interesting element in the Verizon paper is its statement on Intent-Based Networking (IBN), which is their way of talking about intent modeling and the ONF initiatives in that direction.  The paper makes it clear that Verizon sees IBN as a future approach to popularized network control rather than as a specific principle of the current architecture, but on the other hand their models (already referenced above) seem to apply intent-based principles throughout their architecture.  It may be that Verizon is using the term “IBN” to refer only to the evolving ONF initiatives, and that it expects to use intent principles in all its model layers.

The most important thing that comes out of the Verizon document, in my view, is that neither the current ONF nor NFV ISG work is sufficient to define an architecture for the deployment of even SDN and NFV (respectively) much less for the deployment of a service.  Integration testing, product assessment, and even SDN/NFV transformation planning need to look a lot further afield to be useful, and that’s going to involve in many cases making up rules rather than identifying standards.  This means, IMHO, that Verizon is not only willing but determined to move beyond the current processes and make their own way.

For players like Ericsson, this could be good news because if every operator follows the Verizon lead and defines their own next-gen architecture, there will be considerable integration work created.  That might diminish in the long term if standards bodies and open-source initiatives start to harmonize the implementation of SDN and NFV and incorporate the required higher-level concepts.

The six vendors I’ve identified as being capable of supporting a complete NFV business case could also learn something from Verizon’s paper.  One clear lesson is that a failure to develop a total benefit picture in positioning, which I think all six vendors have been guilty of, has already exacted a price.  I don’t think operators, including Verizon, would have gone so far in self-integration if they’d had a satisfactory strategy offered in productized form.  However, all six of the key NFV vendors can make the Verizon model work.  Who makes it work best?  I think Nokia wins that one.  I know one of the key Nokia contributors to the Verizon paper, and he’s both an IMS and federation expert.  And, no matter what Verizon may feel about vCPE, it is clear to me that their broad deployment of NFV will start with mobile/5G.

Overall, Verizon’s paper proves a point I discussed yesterday—vendors who want to succeed with NFV will need to have a strong service-based story that resonates with each operator’s market, and a broad architecture that covers all the bases from operations to legacy infrastructure.  Verizon has clearly taken some steps to open up the field to include many different vendors, but most operators will have to rely on a single vendor to at least get the process started, and everyone is going to want to be that vendor.  The bar has been high for that position from the first, and it’s getting higher every day.

What Buyers Think about NFV and the Cloud

I got back from a holiday to a flood of data from both enterprises and network operators/service providers—the former group focusing on cloud and network service views, and the latter group focusing on NFV.  Because all the data is so interesting I thought it was important to get it into a blog ASAP, so here we go!

Let’s start with my last group, the operators/providers.  The issue I was working on was the expectations for NFV in 2016, now that we’re a third of the way through the year.  I got responses from key people in the CIO, CFO, and CTO areas, and I had some surprises—perhaps even a few big ones.

The biggest surprise was that all three groups said they were more sure now that they would deploy some NFV infrastructure in 2016 than they had been at the start of the year.  Nearly five out of six operators responded that way, which is a pretty optimistic view of NFV progress.  What was particularly interesting was that the three key groups all responded about the same way, a sharp departure from last year when CFOs weren’t convinced there was any future in NFV at all.

The second-biggest surprise, perhaps related to the first, was that the amount of NFV infrastructure expected to be deployed was almost universally minimal.  None of the operators said they believed that NFV spending would reach 3% of capital spending even in 2017.  This suggests that operators weren’t rushing to NFV, but perhaps waving earnestly in its direction.

The reason for the limited commitment expected even in 2017 is complicated, and you could approach it in two ways—what makes up the commitment and what has impacted the planning.  Let’s take the first of these first.

There are two paths for NFV that are considered viable by all the key operator constituencies—NFV targeting business customers with premises-hosted virtual CPE, and NFV targeting mobile infrastructure.  The first of these opportunities is not seen as a driver of large-scale systemic NFV at all, but rather a way of addressing the needs of businesses better, particularly through accelerated order cycles, portals for service orders and changes, etc.  The second is seen as potentially ground-shaking in terms of impact, but NFV in mobile infrastructure is very complicated and risky, in operators’ eyes, and thus they expect to dabble a bit before making a major investment.

Add these together and you can see what operators are seeing in 2016 and 2017.  vCPE is going to happen, but nobody thinks that it’s going to shake their spending plans except perhaps MSPs who lease actual service from another operator and supplement it with CPE and management features.  Mobile applications could be very big indeed, but that very bigness means it’s going to happen at a carefully considered pace.

While all the “Cs” in the operator pantheon are in accord on these basic points, they still differ on the next one, which is the factors that got them to where they are.  The CIO and CFO organizations feel that a business case has been made for vCPE—enough said.  They also feel that there’s enough meat in mobile NFV to justify real exploration of the potential, including field trials.  But they do not believe that a broad NFV case has been made, and in fact these two groups believe that no such broad case for NFV can be made based on current technology.  The CTO, perhaps not surprisingly, thinks that NFV’s broad impact is clear and that it’s just a matter of realizing its potential.

Everyone is back in sync when it comes to what might realize NFV potential—open-source software.  In fact, the CIO and CFO are starting to think that their transformation is more generally about open-source use than about NFV in particular.  This, perhaps, is due to the fact that the CTO organizations have shifted their focus to open-source projects aiming at NFV software, from the standards process that’s still the titular basis for NFV.  Getting all the pieces of NFV turns out to involve getting a lot of related open-source stuff that can be applied to NFV, but also to other things.

Everyone has their own example of this.  The CIOs are starting to see the possibility of basing OSS/BSS transformation more on open-source tools, at least among those CIOs interested in OSS/BSS transformation at all.  It’s clear from the responses I got that CIOs are split on just how much the fundamentals of their software need to change.  A slight majority still see their job as simply accommodating SDN/NFV changes, but it seems more and more CIOs think a major change in operations software is needed.

There’s also more harmony of views than I’d expected on just how far NFV will go in the long run and what will drive it there.  Only about a quarter of CIO/CFO/CTO responses suggest that systemic infrastructure change will drive broad adoption of NFV.  The remainder say that it will be service-specific, and the service that has the most chance of driving broad deployment is mobile.  Those operators with large mobile service bases think they’ll be adopting NFV for many mobile missions, and this will then position more hosting infrastructure for other services to exploit.  Those without mobile infrastructure see NFV mostly as a business service agility tool, as I’ve said.

My impression from this is that operators have accepted a more limited mission for NFV, one that’s driven by specific service opportunities, over the notion of broad infrastructure transformation.  That doesn’t mean that we wouldn’t get to the optimum deployment of NFV, but it does suggest that getting there will be a matter of a longer cycle of evolution rather than an aggressive deployment decision.  The fact that the CIOs are not united on a course for OSS/BSS transformation seems the largest factor in this; you need those opex benefits to justify aggressive NFV roll-out.

Services that prove a broad benefit case—that’s the bottom line for NFV.

On the cloud side, enterprises’ views are related to media coverage of cloud adoption, but only just related.  Nearly all enterprises and about a third of mid-sized businesses report they use cloud services directly (not just hosting, but real cloud deployment of their apps).  Only one enterprise, a smaller one, said they had committed more than 15% of IT spending to the cloud, and none of the enterprises expected to shift more than 30%.  The cloud will be pervasive, but not decisive.

The problem seems to be that enterprises still visualize the sole benefit of cloud computing to be lower costs, in some sense at least, and that the cloud will host what they already run.  Given this picture, it’s not surprising that both Microsoft and IBM reported that while cloud sales have grown for them, it hasn’t replaced losses on the traditional IT side.  Users who adopt the cloud because it’s cheaper will always spend less.  The only way to get out of that box is to unlock new benefits with new cloud-specific techniques, but users have almost zero understanding of that potential.  Part of that is due to lack of vendor support for new productivity-benefit-driven cloud missions, and part to the fact that current enterprise IT management has grown up in a cost-driven planning age and can’t make the transition to productivity benefits easily.

There’s one common thread between the operator NFV and enterprise cloud stories, and that’s over-expectation.  Well over 90% of both groups say that they’ve had to work hard to counter expectations and perspectives that just didn’t jibe with the real technology the market was presenting or the realistic benefits that technology could generate.  We live in an ad-sponsored age, where virtually everything you read is there because some seller has paid for it in some way.  That’s obviously going to promote over-positioning, and while the results aren’t fatal for either technology, I think it’s clear that NFV and the cloud would have progressed (and would continue to progress) further and faster if buyers could get a realistic view of what can be expected.

The Best Approach to SDN and NFV isn’t from ETSI or Open-Something, but From the MEF

I had a very interesting talk with the MEF and with their new CTO (Pascal Menezes), covering their “Third Network”, “Lifecycle Service Orchestration” and other things.  If you’ve read my stuff before, you know that there are many aspects of their strategy that I think are insightful, even compelling.  I’m even more sure about that after my call with them, and also more certain that they intend to exploit their approach fully.  I’m hoping that they can.

The Third Network notion comes because we have two networks today—the Internet which is an everybody-to-everybody open, best-efforts, fabric that mingles everything good and bad about technology or even society, and business connectivity which is more trustworthy, has an SLA, and supports an explicit and presumably trusted community.  One network will let me reach anything, do anything, in grand disorder. The other orders things and with that order comes a massive inertia that limits what I can do.

We live in a world where consumerism dominates more and more of technology, and where consumer marketing dominates even businesses.  An opportunity arises in moments, and is lost perhaps in days.  In the MEF vision (though not explicitly in their positioning) the over-the-top players (OTTs) have succeeded and threatened operators because the OTTs could use the everything-everywhere structure of the Internet to deliver service responses to new opportunities before operators could even schedule a meeting of all their stakeholders.

The operators can hardly abandon their networks, for all the obvious reasons.  They need to somehow adapt their network processes to something closer to market speed.  I think that the MEF concept of the Third Network reflects that goal nicely in a positioning sense.

At a technical level, it could look even better.  Suppose we take MEF slides at the high level as the model—the Third Network is an interconnection of provider facilities at Level 2 that creates a global fabric that rivals the Internet in reach without its connectivity promiscuity and its QoS disorder.  If you were to build services on the Third Network you could in theory have any arbitrary balance of Internet-like and Carrier-Ethernet-like properties and costs.  You could establish Network-as-a-Service (NaaS) in a meaningful sense.

In my view the obvious, logical, simple, and compelling architecture is to use the Third Network as the foundation for a set of overlay networks.  Call them Nicira-SDN-like, or call them tunnels, or virtual wires, or even SD-WANs.  Tunnels would create a service framework independent of the underlayment, which is important because we know that L2 connectivity isn’t agile or scalable on the level of the Internet.  The point is that these networks would use hosted nodal functionality combined with overlay tunnels to create any number of arbitrary connection networks on top of the Ethernet underlayment.  This model isn’t explicit in the MEF slides but their CTO says it’s their long-term goal.

A combination of overlay/underlay and an interconnected-metro model of the network of the future would be in my view incredibly insightful, and if it could be promoted effectively, it could be a revolution.  The MEF is the only body that seems to be articulating this model, and that makes them a player in next-gen infrastructure in itself.

What’s needed to make this happen?  The answer is two things, and two things only.  One is a public interconnection of Level 2 networks to create the underlayment.  The other is a place to host the nodal features needed to link the tunnels into virtual services.  We can host features at the user edge if needed, and we know how to do network-to-network interfaces (NNIs).  The operators could field both these things if they liked, but so could (and do, by the way) third parties like MSPs.

What would make this notion more valuable?  The answer is “the ability to provide distributed hosting for nodal functionality and other features”.  Thus, philosophically above our Third Network connection fabric would be a tightly coupled cloud fabric in which we could deploy whatever is needed to link tunnels into services and whatever features might be useful to supplement the basic connectivity models we can provide that way.  These, classically, are “LINE”, “LAN”, and “TREE”, which the MEF recognizes explicitly, as well as ACCESS and NNI.

If the Third Network is going to provide QoS, then it needs to support classes of service in the L2 underlayment, and be able to route tunnels for services onto the proper CoS.  If it’s going to provide security then it has to be sure that tunnels don’t interfere or cross-connect with each other, and that a node that establishes/connects tunnels doesn’t get hacked or doesn’t create interfering requests for resources.  All of that is well within the state of the art.  It also has to be able to support the deployment of nodes that can concentrate tunnel traffic internally to the network for efficiency, and also to host features beyond tunnel cross-connect if they’re useful.

You don’t need either SDN or NFV for this.  You can build this kind of structure today with today’s technology, probably at little incremental cost.  That to my view is the beauty of the Third Network.  If, over time, the needs of all those tunnels whizzing around and all that functionality hunkering down on hosting points can be met better with SDN or NFV, or cheaper with them, or both—then you justify an evolution.

What you do need in the near term is a means of orchestrating and managing the new overlay services.  Lifecycle Service Orchestration (LSO) is the MEF lifecycle process manager, but here I think they may have sunk themselves too far into the details.  Yes it is true that tunnels will have to be supported over legacy infrastructure (L2 in various forms, IP/MPLS in various forms), SDN, and NFV.  However, that should be only the bottom layer.  You need a mechanism for service-level orchestration because you’ve just created a service overlay independent of the real network.

The details of LSO are hard to pull from a slide deck, but it appears that it’s expected to act as a kind of overmind to the lower-level management and orchestration processes of NMS, SDN, and NFV.  If we presumed that there was a formal specification for the tunnel-management nodes that could be resident in the network (hosted in the cloud fabric) or distributed to endpoints (vCPE) then we could say this is a reasonable presentation of features.  The slides don’t show that, and in fact don’t show the specific elements for an overlay network—those tunnel-management nodes.

It all comes down to this, in my view.  If the MEF’s Third Network vision is that of an overlay network on top of a new global L2 infrastructure, then they need tunnel-management nodes and they need to orchestrate them at least as much as the stuff below (again, they assure me that this is coming).  You could simply let CoS do what’s needed, if you wanted minimalist work.  If they don’t define those tunnel-management nodes and don’t orchestrate them with LSO, then I think the whole Third Network thing starts to look like slideware.

The Third Network’s real state has special relevance in the seemingly endless battle over the business case for network evolution.  In my own view, the Third Network is a way of getting operators close to the model of future services that they need, without major fork-lift modernization or undue risk.  It could even be somewhat modular in terms of application to services and geographies.  Finally, it would potentially not only accommodate SDN and NFV but facilitate them—should it succeed.  If the Third Network fails, or succeeds only as a limited interconnect model, then operators will inevitably have to do something in the near term, and what they do might not lead as easily to SDN and NFV adoption.

This could be big, but as I’ve noted already the model isn’t really supported in detail by the MEF slideware, and in fact I had to have an email exchange with the CTO to get clarifications (particularly on the overlay model and overlay orchestration) to satisfy my requirement for written validation of claims.  He was happy to do that, and I think the MEF’s direction here is clear, but the current details are sparse because the long term is still a work in progress.

The MEF is working to re-invent itself, to find a mission for L2 and metro in an age that seems obsessed with virtualizing and orchestrating.  Forums compete just like vendors do, after all, and the results of some of this competition are fairly cynical.  I think that the MEF has responded to the media power of SDN and NFV, for example, by featuring those technologies in its Third Network, when the power of that approach is that it doesn’t rely on either, but could exploit both.  Their risk now lies in posturing too much and addressing too little, of slowing development of their critical and insightful overlay/underlay value proposition to blow kisses at technologies that are getting better ink.  There’s no time for that.

Whether the foundation of the Third Network was forum-competition opportunism or market-opportunity realization is something we may never know, but frankly it would matter only if the outcome was questionable.  I’m more convinced than ever that the MEF is really on to something with the Third Network.  I hope they take it along the path they’ve indicated.

How to Get NFV On Track

You can certainly tell from the media coverage that progress on NFV isn’t living up to press expectations.  That’s not surprising on two fronts: first, press expectations border on an instant-gratification fetish that nothing could live up to, and second, the transformation of a three-trillion-dollar industry with average capital cycles of almost six years won’t happen overnight.  The interesting thing is that many operators were just as surprised as the press has been at the slow progress.  Knowing more about their perceptions might be a great step to getting NFV going, so let’s look at the views and the issues behind them.

In my recent exchanges with network operator CFO organizations, I found that almost 90% said that NFV was progressing more slowly than they had hoped.  That means that senior management in the operator space had really been committed to the idea that NFV could solve their declining profit-per-bit problems before the critical 2017 point when the figure falls low enough to compromise further investment.  They’re now concerned it won’t meet their goals.

Second point:  The same CFO organizations said that their perception was that NFV progress was slower now than in the first year NFV was launched (2013).  That means that senior management doesn’t think that NFV is moving as fast as it was, which means that as an activity it’s not closing the gap to achieving its goals.

Third point:  Even in the organizations that have been responsible for NFV development and testing, nearly three out of four say that progress has slowed and that they are less confident that “significant progress” is being made on building a broad benefit case.

Final point: Operators are now betting more on open-source software and operator-driven projects than on standards and products from vendors.  Those CFO organizations said that they did not believe they would deploy NFV based on a vendor’s approach, but would instead be deploying a completely open solution.  How many?  One hundred percent.  The number was almost the same for the technologists who had driven the process.  Operators have a new horse to ride.

I’m obviously reporting a negative here, which many vendors (and some of my clients) won’t like.  Some people who read my blog have contacted me to ask why I’m “against” NFV, which I find ironic because I’ve been working to make it succeed for longer than the ETSI ISG has even existed.  Further, I’ve always said (and I’ll say again here and now) that I firmly believe that a broad business case can be made for NFV deployment.  I’ve even named six vendors who can make it with their own product sets.  But you can’t fix a problem you refuse to acknowledge.  I want to fix it and so I want to acknowledge it.

The first problem was that the ETSI ISG process was an accommodation to regulatory barriers to operators working with each other to develop stuff.  I’ve run into this before; in one case operator legal departments literally said they’d close down an activity because it would be viewed as regulatory collusion as it was being run.  The collusion issue was fixed by absorption into another body (dominated by vendors) but the body never recovered its relevance.  That also happened with NFV, run inside ETSI and eventually dominated by vendors.

The second problem was that standards in a traditional sense are a poor way to define what has to be a software structure.  Software design principles are well-established; every business lives or dies on successful software after all.  These principles have to be applied by a software design process, populated by software architects.  That didn’t happen, and so we have what’s increasingly a detailed software design created indirectly and without any regard for what makes software open, agile, efficient, or even workable.

The third problem was that you can’t boil the ocean, and so early NFV work focused on two small issues—did the specific notion of deploying VNFs to create services work at the technical level, and could that be proved for a “case study”.  Technical viability should never have been questioned at all because we already had proof from commercial public cloud computing that it did work.  Case studies are helpful, but only if they represent a microcosm of the broad targets and goal sets involved in the business case.  There was never an attempt to define that broad business case, and so the case studies turned into proofs of concept that were totally service-specific.  No single service can drive infrastructure change on a broad scale.

All of this is what’s generated the seemingly ever-expanding number of “open” or “open-source” initiatives.  We have OPNFV, ONOS, OSM, OPEN-O, and operator initiatives like ECOMP from AT&T.  In addition, nearly all the vendors who have NFV solutions say their stuff is at least open, and some say it’s open-source.  The common thread here is that operators are demanding effective implementations, have lost faith that vendors will generate them on their own, and so are working through open-source to do what their legal departments wouldn’t let them do in a standards initiative.

The open-source approach is what should have been done from the first, because in theory it can be driven by software architecture and built to address the requirements first, in a top-down way.  However, software design doesn’t always proceed as it should, and so even this latest initiative could fail to deliver what’s needed.  What’s necessary to make that happen?  That’s our current question.

The goal now, for the operators and for vendors who want NFV to succeed, is to create an open model for NFV implementation and back that model with open-source implementations.  That model has to have these two specific elements:

  1. There must be an intent-model interface that identifies the relationship between the NFV MANO process and OSS/BSS/NMS, and another that defines the “Infrastructure Manager” relationship to MANO.
  2. There must be a Platform-as-a-Service (PaaS) API set that defines the “sandbox” in which all Virtual Network Functions (VNFs) run, and that provides linkage between VNFs and the rest of the NFV software.
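To make the intent of these two elements concrete, here is a minimal sketch of what such interfaces might look like.  All the names here (IntentModel, VnfPlatform, InMemoryIntent) are hypothetical illustrations of the principle, not anything drawn from the ETSI specs or any real project:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: IntentModel, VnfPlatform, and InMemoryIntent are
# illustrative names, not part of the ETSI ISG specs or any real project.

class IntentModel(ABC):
    """The 'what, not how' boundary between OSS/BSS/NMS and MANO."""

    @abstractmethod
    def deploy(self, service_intent: dict) -> str:
        """Accept an abstract service description; return an instance ID."""

    @abstractmethod
    def status(self, instance_id: str) -> dict:
        """Report service state without exposing internal resources."""

class VnfPlatform(ABC):
    """The PaaS 'sandbox' API that every VNF would code against."""

    @abstractmethod
    def publish_metric(self, name: str, value: float) -> None:
        """Push management data to the rest of the NFV software."""

    @abstractmethod
    def get_parameter(self, key: str) -> str:
        """Read deployment-time configuration from the platform."""

class InMemoryIntent(IntentModel):
    """Toy realization, just enough to show the interface in action."""

    def __init__(self):
        self._services = {}

    def deploy(self, service_intent: dict) -> str:
        instance_id = f"svc-{len(self._services) + 1}"
        self._services[instance_id] = {"intent": service_intent,
                                       "state": "active"}
        return instance_id

    def status(self, instance_id: str) -> dict:
        return self._services[instance_id]

intent = InMemoryIntent()
svc = intent.deploy({"type": "LINE", "endpoints": ["siteA", "siteB"]})
print(intent.status(svc)["state"])   # active
```

The value of this kind of boundary is exactly what the list above describes: anything behind the interface can be swapped without touching what’s in front of it.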

There are three elements to NFV.  One is the infrastructure on which stuff is deployed and connected, and this is represented by an infrastructure manager (IM, in my terms, VIM for “virtual” infrastructure manager in the ETSI ISG specs).  One is the management and orchestration component itself, MANO, and one is the VNFs.  The goal is to standardize the functionality of these three things and to control the way they connect among themselves and to the outside.  This is critical in reducing integration issues and providing for open, multi-vendor, implementations.

We can’t simply collect the ETSI material into a set of specs to define my three elements and their connections; the details don’t exist in ETSI material.  This puts anything that’s firmly bound to the ETSI model at risk of being incomplete.  While an open-source implementation could expose and fix the problems, it’s not totally clear that any do (ONOS, CORD, and XOS among the open groups, or ECOMP for operators, seem most likely to be able to do what’s needed).

Vendors have to get behind this process too.  They can do so by accepting the componentization I’ve noted, and by supporting the intent models and PaaS, by simply aligning their own implementations that way.  Yes, it might end up being a pre-standards approach, but the kind of API-and-model structure I’ve noted can be transformed to a different API format without enormous difficulty—it’s done in software so often that there’s a process called an “Adapter Design Pattern” (and some slightly different but related ones too) to describe how it works.  The vendors, then, could adapt to conform to the standards that emerged from the open-source effort.  They could also still innovate in their own model if they wanted, providing they could prove the benefit and providing they still offered a standard approach.
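The Adapter Design Pattern point is easy to illustrate.  In this sketch, a hypothetical vendor API with its own method names is wrapped to present an equally hypothetical “standard” interface; nothing here is a real product API:

```python
# A minimal Adapter Design Pattern sketch: a vendor-specific deployment
# API is wrapped to present the standard interface that eventually
# emerges from the open-source effort.  All names are hypothetical.

class VendorOrchestrator:
    """Pre-standards vendor API with its own method names and arguments."""
    def launch_vnf(self, image_name: str, cpu_count: int) -> dict:
        return {"image": image_name, "vcpus": cpu_count, "state": "RUNNING"}

class StandardDeployer:
    """The (hypothetical) standard API the industry settles on."""
    def deploy(self, descriptor: dict) -> dict:
        raise NotImplementedError

class VendorAdapter(StandardDeployer):
    """Translates the standard call into the vendor's native one."""
    def __init__(self, vendor: VendorOrchestrator):
        self._vendor = vendor

    def deploy(self, descriptor: dict) -> dict:
        # The adapter is the only code that knows both API formats.
        return self._vendor.launch_vnf(descriptor["image"], descriptor["cpus"])

result = VendorAdapter(VendorOrchestrator()).deploy({"image": "fw-vnf",
                                                     "cpus": 2})
print(result["state"])   # RUNNING
```

The adapter is the only component that knows both API formats, which is why conforming to an emerging standard this way isn’t an enormous lift for vendors.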

This open approach isn’t essential in making the business case for NFV.  In some respects, it’s an impediment because it will take time for any consensus process to work out an overall architecture that fits (in general) my proposed model.  A single-vendor strategy could do that right now—six of them, in fact.  The problem is that vendors have lost the initiative now, and even if they got smart in their positioning it’s not clear that they could present a proprietary strategy that had compelling benefits.  They need an open model, a provable one.  That’s something that even those six might struggle a bit with; I don’t have enough detail on about half of the six to say for sure that they could open theirs up in a satisfactory way.  All of them will need some VNF PaaS tuning.

I think that it is totally within the capabilities of the various open-source organizations to solve the architecture-model problem and generate relevant specs and APIs, as well as reference implementations.  It is similarly well within vendor capabilities to adopt a general architecture to promote openness—like the one I’ve described here—and to promise to conform to specific standards and APIs as they are defined.  None of this would take very long, and if it were done by the end of the summer (totally feasible IMHO) then we’d remove nearly all the technical barriers to NFV deployment.  Since I think applying the new structure to the business side would also be easy, we’d quickly be able to prove a business case.

Which is why I think this impasse is so totally stupid.  How does this benefit anyone, other than perhaps a few vendors who believe that even if operators end up losing money on every service bit they carry they’ll sustain their spending or even grow it?  A small team of dedicated people could do everything needed here, and we have thousands in the industry supposedly working on it.  That makes no sense if people really want the problem solved.

My purpose here is to tell the truth as I see it, which is that we are threatening a very powerful and useful technology with extinction with no reason other than stubborn refusal to face reality.  NFV can work, and work well, if we’re determined to make that happen.


Is Ericsson’s NodePrime Deal Even Smarter Than it Looks?

Ericsson has made some pretty smart moves in the past, long before their smartness was obvious to the market.  They may have made another one with their acquisition of NodePrime, a hyperscale data center management company that could be a stepping stone for Ericsson to supremacy in a number of cloud-related markets, including of course IoT.

The theory behind the deal seems clear: if IoT or NFV or any other cloud-driven technology is going to succeed on a large scale, then data centers are going to explode.  Thus, dealing with that explosion would be critical in itself, but to make matters worse (or better, from Ericsson’s view) just getting to large-scale success will certainly require enormous efficiency in operationalizing the data center resources as they grow.  Hence, NodePrime.

Data centers don’t lack sources of operations statistics; there are a couple dozen primary statistics points in any given operating system and at least a half-dozen in middleware packages.  In total, one operator found they had 35 sources of data to be analyzed per server and 29 per virtual machine, and then of course there’s the network.  The basic NodePrime model is to acquire, timestamp, and organize all of this into a database that can then be used for problem and systems management and lifecycle management for installed applications.

Hyperscale data centers aren’t necessarily the result of SDN, NFV, or IoT.  While NodePrime positioning calls out that target, they also make it clear that they can manage data centers of any size, which means that they could probably manage not only the individual distributed data centers operators are likely to deploy in SDN/NFV/IoT applications, but also the collective virtual data center (that’s a layer in NodePrime, in fact).  The NodePrime model also has three functional dimensions.  You can manage data centers this way.  You can manage service infrastructure this way, and feed the results into something like NFV management.  And you could even build an IoT service by collecting sensor data the way you collect server statistics.  I’m told that some operators have already looked at all three of these functional dimensions, and that NodePrime has said it supports them all.

If we presumed that the management of an application or service was based on the analysis of the resource management data that was committed in support, then any complicated service could have insurmountable management problems.  If we presumed that a smartphone had to query a bunch of traffic sensors directly and analyze the trends and movements to figure a route, the problems are similarly insurmountable.  The fact is that any rational application based on using information has to be designed around an information analysis framework.

A framework has to do three things.  First, it has to gather information from many different sources using many different interfaces.  Second, it has to harmonize the data into a common model and timestamp and contextualize the information, and finally it has to support all the standard analytics and query tools to provide data views.  NodePrime does all of this.
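Those three duties reduce to a simple pipeline, sketched below in Python.  This is purely illustrative of the principle; none of the function names or record shapes reflect NodePrime’s actual interfaces:

```python
import time

# Hypothetical sketch of the three framework duties described above:
# gather from heterogeneous sources, harmonize into timestamped records,
# and answer queries.  Nothing here is NodePrime's actual API.

def gather(sources):
    """Each 'source' is any callable returning raw (name, value) pairs,
    standing in for the many different collection interfaces."""
    for source in sources:
        yield from source()

def harmonize(raw_records, origin):
    """Stamp and normalize raw pairs into one common record shape."""
    now = time.time()
    return [{"metric": name, "value": float(value),
             "origin": origin, "ts": now}
            for name, value in raw_records]

def query(records, metric):
    """A stand-in for the analytics/query layer: filter by metric name."""
    return [r for r in records if r["metric"] == metric]

# Two mock sources: OS statistics and middleware statistics.
os_stats = lambda: [("cpu_util", "0.42"), ("mem_used", "2048")]
mw_stats = lambda: [("queue_depth", 7)]

db = harmonize(gather([os_stats, mw_stats]), origin="server-01")
print(query(db, "cpu_util")[0]["value"])   # 0.42
```

The harmonization step is the critical one: once everything shares a record shape and a timestamp, the same query tools work whether the raw data came from a server, a VM, or (in the IoT case) a sensor.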

The NodePrime model could represent a management abstraction very easily (datahub, in NodePrime).  The resources at the bottom are collected and correlated, passed into an analytics layer in the middle (directive), and used to create an abstraction of the resource framework that’s useful (virtual datacenter in the current model, but why not “virtual infrastructure”?).  This abstraction could then be mapped to service management, VNF management, and so forth.

It also works for IoT and contextual services.  Collect basic data at the bottom, use queries to generate abstractions that are helpful to users/applications, then expose these abstractions through microservices at the top.  NodePrime supports this too.

Well, it sort of does.  The meat of the value of NodePrime will come from the variety of information resources it can link and the range of analytics it can support.  For SDN, NFV, IoT, and other cloud applications and services, a lot of this is going to be developed by an integrator—like Ericsson.  Ericsson can enrich the capabilities of NodePrime through custom development and specialized professional services, which of course is what it likely wanted all along.

This isn’t the first time that a vendor has come at the notion of a network revolution driven by data centers and not networks.  Brocade had this message as the foundation of their NFV position in 2013 and gained a lot of traction with operators as a result.  They didn’t carry through with enough substance and they gradually lost influence.  Brocade has recently been making some acquisitions of its own, and one in the DevOps space that could arguably be an orthogonal shot at the data center space, because it’s targeting deployment and lifecycle management.

An inside-out vision of network evolution is logical, then, but it’s also a climb.  The further you are from the primary benefit case with a given technology, the longer it takes for you to build sales messaging that carries you through.  That’s been the problem with SDN and NFV, both of which in simplistic terms postulate a completely new infrastructure that would be cheaper to run and more agile.  How do you prove that without making a massive unsupported bet?

That’s where an Ericsson initiative to connect NodePrime decisively with IoT could be extravagantly valuable.  Industrial IoT isn’t really IoT at all, it’s simply an evolution of the established process control and M2M trends of the recent past.  However, the model that’s optimal for industrial IoT happens to be the only model that’s likely to be viable for “broad IoT”, and also a useful model for evolving services toward hosted components.  Ericsson could have a powerful impact with NodePrime.

The question with something like this is always “but will they?” of course.  There’s enough value in hyperscale or even normal data center management for cloud providers and operators to justify the buy without any specific mission for NFV, SDN, or IoT.  NodePrime was part of Ericsson’s Hyperscale Datacenter System 8000 before the deal was made.  However, the press release focuses on what Ericsson calls “software-defined infrastructure” in a way that seems to lead directly to NFV.

It’s not clear that Ericsson sees NodePrime’s potentially crucial role in IoT, or how success there might actually drive success with “software-defined infrastructure” by short-cutting the path to benefits and a business case.  NodePrime had some industrial IoT engagements before the Ericsson deal and Ericsson is surely aware of them, but there was no specific mention of IoT in the press release.  I had some comments from operators on the deal that suggested Ericsson/NodePrime had raised the vision with them at the sales level, but it’s not clear whether that was a feeler or the start of an initiative.

The qualifier “industrial IoT” used by NodePrime and some publications covering the story may simply reflect the fact that “industrial IoT” uses internal sensor networks and IoT fanatics aren’t prepared to let go of the notion of promiscuous Internet sensors quite yet.  We’ll have to see how this evolves.