Where are We in the SD-WAN Evolution/Revolution?

SD-WAN remains a very hot topic and, say the enterprises I’ve chatted with, a very confusing one.  The biggest source of the confusion is the fact that we use the single term “SD-WAN” to describe what are actually three different architectural models.  Not only do those three models differ in capabilities and focus, they can also be looked at in different ways based on different SD-WAN value propositions.  It adds up to too many subtleties for many users.

About the only thing that’s fairly consistent in the SD-WAN space is that, in some way using some technology base, SD-WANs create an overlay network framework.  That means that they define an address space and connectivity rule set that uses traditional Ethernet/IP networks (including the Internet) as a transport network.  Years ago, you could have created something that today would have been called “SD-WAN” using an edge router that could support some “tunnel” or “virtual wire” mechanism.
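
To make the overlay idea concrete, here’s a minimal Python sketch of what any SD-WAN edge does at its core: it keeps its own overlay routing table and maps overlay destinations onto whatever underlay transport happens to be available.  The prefixes, site names, and tunnel names are invented for illustration; no vendor implements it this simply.

```python
import ipaddress

# Overlay routing table: overlay prefix -> (remote edge, underlay transport).
# The overlay address space is the enterprise's own; the transports are just
# paths across the Internet or an MPLS VPN.
OVERLAY_ROUTES = {
    "10.1.0.0/16": ("branch-london", "internet-tunnel-1"),
    "10.2.0.0/16": ("branch-sydney", "mpls-path-2"),
}

def forward(overlay_dest: str) -> str:
    """Pick the underlay transport for a packet addressed in the overlay."""
    addr = ipaddress.ip_address(overlay_dest)
    for prefix, (edge, transport) in OVERLAY_ROUTES.items():
        if addr in ipaddress.ip_network(prefix):
            return f"encapsulate for {edge} via {transport}"
    return "drop: no overlay route"

print(forward("10.1.4.7"))   # -> encapsulate for branch-london via internet-tunnel-1
```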

The dominant model of SD-WAN still looks a lot like that old tunnel-routing approach.  The dominating feature for this model is the ability to use the Internet as an extension to or replacement of a corporate MPLS VPN.  Since universal connectivity of all company facilities is important to most information empowerment strategies, this has been the primary driver for SD-WAN interest.  That’s why there hasn’t been an enormous amount of pressure for the vendors supporting this first SD-WAN model to shift their focus.

Some shifting has been required, though, because while thinly connected branch offices are a general problem for enterprises, the ability to connect to the cloud has arguably become a greater need.  As we look to integrate public clouds and data centers into hybrid cloud and multi-cloud, it’s important to have control over your applications’ address space and connectivity wherever you happen to host the application components.  Adding a software-hostable SD-WAN node instance for the cloud is now almost table stakes in any SD-WAN.

The second model of SD-WAN has been mindful from the first of the address-space-management and virtual-network-creation requirements the cloud introduced.  The vendors who support this model (no more than three, currently, as far as I can tell) provide explicit control over forwarding policies rather than replicating the promiscuous routing that’s a fixture in IP networks overall.  These vendors not only provide inherent security features, essentially combining firewall capability with SD-WAN; they can also offer policy control over traffic priority, and even over failover and trunk usage priorities where a node has multiple connections available or where wireless backup for wireline connections is available.
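
To show what “explicit control over forwarding policies” means in practice, here’s a hedged sketch in Python.  The policy table, application classes, and link names are all invented; the point is simply that nothing is forwarded unless a policy says so, and the policy also carries priority and failover choices.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    action: str          # "allow" or "deny"
    priority: str        # e.g. "high", "normal"
    preferred_link: str
    backup_link: str

# Illustrative policy table; real products express this very differently.
POLICIES = {
    ("voip", "employee"): Policy("allow", "high", "mpls", "internet"),
    ("saas", "employee"): Policy("allow", "normal", "internet", "lte"),
    ("p2p",  "guest"):    Policy("deny",  "n/a", "-", "-"),
}

def handle_flow(app: str, user: str, link_up: dict) -> str:
    policy = POLICIES.get((app, user))
    if policy is None or policy.action == "deny":
        return "drop"                          # default-deny, unlike promiscuous IP routing
    link = policy.preferred_link
    if not link_up.get(link, False):           # fail over if the preferred trunk is down
        link = policy.backup_link
    return f"forward on {link} at {policy.priority} priority"

print(handle_flow("voip", "employee", {"mpls": False, "internet": True}))
```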

The third model of SD-WAN is one where additional connectivity features are provided through integration, most commonly in the form of “universal CPE” (uCPE) and virtual network functions (VNFs).  Service providers, meaning the top-tier telcos and even some cable operators, are interested in this model because it leverages the long-standing NFV work many have been involved with.  These products can provide connectivity control, but do so by adding an incremental firewall component.  They can also provide other encryption and security features the same way.

The big failing of this model, in my view, is the fact that NFV is a big-operator strategy, not well-suited for managed service providers and not suited at all for do-it-yourself enterprise adoption.  You can already see that some of the vendors in this space are becoming more uCPE-centric than NFV-centric, meaning that they’re allowing for hosting of features in a white box but not mandating NFV technology be used for deploying and managing those features.  Since there are no hard standards for how independent components would share a piece of uCPE, this evolution tends to create a lot of risk of fragmentation, silos, and compatibility issues that could lock users into a limited set of virtual features.

Another issue with the VNF approach is the licensing cost for the elements needed.  Operators have been upset by vendors’ pricing policies for licensing VNFs, though of course vendors see this as ensuring their own business case is met by unbundling their software from proprietary devices.  The only logical solution to this, as I proposed back in 2013, is relying more on open-source VNFs.  There are open-source implementations available for nearly every useful network function, but NFV integration in general, and uCPE integration in particular, has been a barrier.

Other feature issues are emerging that cut across these three models.  One is the use of wireless (often 5G in the media sense, even though 5G isn’t available enough to serve the role at this point) to back up wireline connections or to connect ultra-thin sites where no wireline may be available.  It’s likely that 5G fixed-site connectivity will be supported by some 5G access point.  For business sites, the device would almost surely have to include an Ethernet port, and so an SD-WAN product with two Ethernet ports could support both a wireline Internet path and a 5G (or 4G) path.  A 5G hotspot that was dual WiFi/5G for local connectivity, or one with an Ethernet port, could also serve.

The key point is that it’s easy to support 5G or any “G” that offers an Ethernet port and presents a standard broadband access interface upstream.  Optimizing that capability is another matter.  Enterprises tell me that they expect wireless backup for thin sites to be available at a lower speed and higher cost, and therefore they want to have special control over the traffic policies while in backup mode.  That implies an integrated connectivity management capability, since a separate firewall would probably not understand backup status.
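
Here’s what that integrated capability could look like, reduced to a toy Python sketch: the node applies a stricter policy set when it knows it has failed over to the wireless path.  The traffic classes and the per-mode decisions are invented for illustration.

```python
# Per-mode policy sets: the same node, but a stricter profile on backup.
POLICY_BY_MODE = {
    "primary": {"voip": "allow", "saas": "allow", "backup-jobs": "allow", "video": "allow"},
    "backup":  {"voip": "allow", "saas": "allow", "backup-jobs": "defer", "video": "deny"},
}

def admit(traffic_class: str, on_backup_link: bool) -> str:
    mode = "backup" if on_backup_link else "primary"
    return POLICY_BY_MODE[mode].get(traffic_class, "deny")

# A separate firewall couldn't make this call because it doesn't know which
# link is carrying the traffic; the SD-WAN node does.
print(admit("video", on_backup_link=True))   # -> deny
```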

Another interesting development is the notion of an open-source SD-WAN solution.  A new company, flexiWAN, is now engaged in PoC trials with an open-source implementation.  I’ve reviewed the white paper on this, and the idea is good but there aren’t enough feature details for me to assess just what the new open product can do.  According to the company, flexiWAN will be a kind of “Android for routers”, a baseline set of features and a set of APIs that facilitate the integration of third-party components that add functionality.

flexiWAN sees SD-WAN as a foundation element, a virtual-network connection framework in which other things can then be used.  This has some similarity to the way Kubernetes has evolved, as the basis for not only deployment but also a full ecosystem of capabilities.  There’s not nearly enough detail available at this point to see how far they’ve gone in fulfilling their goals, but the notion is surely intriguing, especially for someone like me who’s a fan of open-source software.

SD-WAN today has multiple dimensions of divergence, in short.  That in itself creates a certain disorder in the market, and I think one of the main reasons is another dimension to the space—the “media” dimension.  The biggest SD-WAN vendors, the ones with account control and sales presence, don’t really have any incentive to push the envelope.  The up-and-comers are, almost without exception, failing to use the media to promote any strong differentiating position.  As a result, we have two market dimensions to align with the three technical models.  One dimension is the “incumbent” players, who are largely leveraging their positions with customers and promoting a simple value proposition that leads to quick sales, and the other is the startups, who need to convince buyers of a new paradigm but so far aren’t carrying that burden.  We’ll probably not see any order in the SD-WAN space until this final two-dimensional face-off resolves itself.

What’s Needed to make AT&T’s 5G Infrastructure Initiatives a Success

AT&T shared their vision for an open-source, white-box, and 5G future in a blog post, and any time a major Tier One does that it’s worth a look.  It’s fair to say that AT&T has been the most proactive of all the Tier One operators in the use of open technology to reduce network costs and improve service automation and agility.  I haven’t always agreed with their approach (ONAP is an example of a place where I don’t), but I truly admire their determination.  I also admire the fact that their efforts alone, even the unsuccessful ones, are changing the dialog on the future of network infrastructure.

The AT&T blog opens with the comment that 5G demands an entirely new approach to networks, and I agree with that statement.  I think AT&T does hype the impact of 5G on IoT and other applications, but under the hype there’s a fundamental reality.  Network usage, and our dependence on network-related applications, has been increasing steadily and is certain to continue to do so.  What we want is always the same—more for less.  That’s a common buyer desire, but one that flies in the face of the reality of being a seller.  There, you need profit.

What AT&T’s statement comes down to is that we need to reduce the cost per bit of network services in order to sustain and improve current service levels (capacity and QoS) while at the same time maintaining a reasonable profit per bit for the operator.  For years now I’ve been blogging about the decline in profit per bit, a decline that’s the result of the fact that price per bit has fallen sharply as broadband speed expectations have risen, and that cost per bit has declined much more slowly.  Had the curves stayed on their track of 2013, we’d have crossed into a zone where ROI on infrastructure was too low to sustain investment.  All operators have taken measures to reduce costs faster, and AT&T is one of the leaders in that effort.

Cost per bit is made up of both a capital (capex) component and an operations (opex) component.  In networks today, capex accounts for about 20 cents of each revenue dollar and opex for about 31 cents.  However, that opex isn’t all associated with what most would think of as “network operations”.  The largest component is customer acquisition and retention, and these costs are impacted both by the value and agility of the services and by the effectiveness of customer support.  Most operators have recognized that the easiest way to improve cost per bit is to radically reduce the cost of network infrastructure.

NFV tried that with hosting of functions, but the problem that poses is that complexity rises quickly with NFV, increasing opex and limiting the net benefit.  A more promising approach is to attempt to control device costs more directly using white-box technology.  A white box is no more complex than a traditional vendor device, so the opex impacts are minimal.  Further, if you could build white-box deployment into a good service lifecycle automation approach, you could hit both cost components at the same time.

That’s what I think AT&T is working to do.  At the Open Networking Summit, as their blog says, they announced four specific initiatives, and they’ve already been very active in service lifecycle automation, so it’s fair to say they have five pieces to their strategy that we need to consider.  We’ll look at them starting with the most 5G-specific and moving to the most general.

The first AT&T open initiative is the RAN Intelligent Controller (RIC) for 5G, based on the work of the O-RAN group.  Mobile services have long been dominated by the vendors who could provide the RAN technology, and so an open initiative would potentially break the control these vendors have on 5G.  This is potentially the most significant initiative of the group for 5G because it could significantly reduce 5G deployment costs.  It’s also perhaps the hardest to realize, since open RAN (New Radio or NR in 5G) is a combination of hardware and software, and it’s not progressing particularly fast.  Many operators tell me they don’t believe there will be a practical deployment possible before 2020, and pressure on operators to deploy 5G quickly for competitive reasons could delay the impact further.  The software side of RIC is being turned over to the Linux Foundation.

The second initiative is the white-box router technology AT&T has been talking about for the last year.  These devices are intended to be service gateways for business customers, and AT&T has been deploying them in two international locations, hoping to expand these quickly to over 70 new locations by the end of the year.  These routers are cheaper by far than proprietary traditional routers, and so AT&T can deliver more capacity for less capex.  Operationally they’re roughly equivalent to the traditional devices they replace, and so there’s no negative opex hit as there would likely be had AT&T deployed the same logic as a VNF.

Initiative number three is the “Network Cloud” white-box switch, designed for edge missions as the data center switch.  This device is software-controlled by AT&T’s ONAP, and that illustrates the extent to which AT&T is relying on ONAP for its 5G mission, and overall for its operations automation.  Having a standard framework for data center switches, with switch software consistent and (because it’s open-source) more controllable, is an important piece of AT&T’s data center evolution.

The final initiative is the heavy use of fiber connectivity in metro infrastructure.  If in fact 5G will require more capacity per user and per cell, and obviously require more cells, then getting the connections in place, and making them fast enough to virtually eliminate the risk of congestion, simplifies traffic management and operations automation significantly.

I think all of these moves, and other moves to use white-box cell routers, are both smart and likely to be effective.  The only question I have about the AT&T strategy, in fact, is whether ONAP’s architecture is up to the task.

Lifecycle automation at its very foundation is an event-handling process.  Events represent signals of a condition that changes the status of a service or service element, and that therefore requires handling.  I’ve worked on various projects for the handling of events in telecom services for about 15 years, and I developed a Java-based exemplar implementation for data-model-driven state/event coupling of events to processes.  This early work was based on the TMF’s NGOSS Contract and Service Delivery Framework (SDF) activity, and it proved the value of the event-to-process mapping in creating a distributable, scalable, resilient model for service lifecycle automation.

Model-driven state/event handling requires a model to do the driving, and ONAP was not designed around that principle, nor have they so far included the model-based approach in upgrades (I’ve asked them to brief me when they do, asked at each release if that feature was included, and have yet to be briefed).  It’s my view that without the model-driven approach, ONAP is just a monolithic management system.  Such a system poses a variety of risks, ranging from integration challenges when new gear or software is introduced, to scalability problems that could limit the system’s ability to manage a flood of events that might arise from the failure of some fundamental network component.

I don’t know whether ONAP will ever become truly event-driven, and obviously I’m unlikely to be able to influence that decision.  AT&T could.  What I’d like to see now from AT&T is a push to modernize ONAP, to absorb the cloud-native principles emerging and the model-driven state/event coupling of a decade or more ago.  If AT&T can manage to do that, or make it happen (since ONAP is open-source), I think their 5G strategy is ready for whatever happens in the 5G market.

Lean NFV: Under the Covers

Last week at the Open Networking Summit, we got our first look at the concept of “Lean NFV”.  This initiative is the brain-child of two California academics, and they have an organization, a website, and a white paper.  Light Reading did a nice piece on it HERE, and in it they provide a longer list of those involved.  The piece in Fierce Telecom offers some insights from the main founders of the initiative, and I’d advise everyone interested in NFV to read all of these.

Anyone who’s read my blog since 2013 knows that I’ve had issues with the way the NFV ISG approached their task.  Thus, I would really be happy to see a successor effort that addressed the shortcomings of the current NFV model. There are obviously shortcomings, because nearly everyone (including, obviously, the people involved in Lean NFV) agrees that NFV has fallen far short of expectations.  I’ve reviewed the white paper, and I have some major concerns.

My readers also know that from the first, I proposed an NFV implementation based on the TMF concept of NGOSS Contract, where a service data model was used to maintain state on each functional element of the service, and from that steer events to the appropriate processes.  This concept is over a decade old and remains the seminal work on service automation.  I applied it to the problem of automating the lifecycle management of services made up of both hosted resources and appliances in my ExperiaSphere work.  I’ve been frank from the first in saying that I don’t believe there’s any other way to do service lifecycle automation, and I’m being frank now in admitting that here I’m going to compare the Lean NFV approach both to the ETSI NFV ISG model and to the ExperiaSphere model.  Others, with other standards, are obviously free to make their own comparisons.

Many have said that the NFV ISG did it wrong, me surely among them.  In my own case, I have perhaps historical primacy on the point, since I responded to the 2012 white paper referred to in the quote below and cited the risks, then told the ISG at the first US meeting in 2013 that they were getting it wrong.  The point is that we don’t need recognition of failure here, we need a path to success.  Has the Lean NFV initiative provided that?  Let’s look at the paper to see if they correctly identified the faults of the initial effort, and have presented a timely and effective solution.

The opening paragraph of the white paper is something few would argue with, and surely not me: “All new technologies have growing pains, particularly those that promise revolutionary change, so the community has waited patiently for Network Function Virtualization (NFV) solutions to mature. However, the foundational NFV white paper is now over six years old, yet the promise of NFV remains largely unfulfilled, so it is time for a frank reassessment of our past efforts. To that end, this document describes why our efforts have failed, and then describes a new approach called Lean NFV that gives VNF vendors, orchestration developers, and network operators a clear path forward towards a more interoperable and innovative future for NFV.”

The “Where did we go wrong” section of the paper says that the NFV process is “drowning in complexity” because there are “so many components”.  I don’t think that’s the problem.  The ISG went wrong because it started at the bottom.  They had a goal—hosting functions to reduce capex by reducing the reliance on proprietary appliances.  A software guy (like me and many others) would start by taking that goal and dissecting it into specific presumptions that would have to be realized, then work down to the details of how to realize them.  The ISG instead created an “End-to-End Architecture” first, and then solicited proofs-of-concept based on the architecture.  That got implementations going before there was any concerted effort to design anything or defend the specific points that attaining the goal would have demanded.

The paper goes on to “What Is the Alternative”, which identifies three pieces of NFV, the “NFV Manager”, the “computational infrastructure” and the “VNFs”.  These three align in general with the NFV ISG’s concept of MANO/VNFM, the NFV Infrastructure (NFVi), and the Virtual Network Functions (VNFs).  It then proposes these three as the three places where integration should focus.  But didn’t the paper just do the very thing that I criticized the ISG for doing?  Where is the goal, the dissection of the goal into presumptions to be realized?  We start with what are essentially the same three pieces the ISG started with.

The paper proposes that the infrastructure manager be stripped down to “a core set of capabilities that are supported by all infrastructure managers, namely: provide NFV with computational resources, establish connectivity between them, and deliver packets that require NFV processing to these resources.”  This statement in itself is problematic in my view, because it conflates a deployment mission (providing the resources) with a run-time mission (deliver packets).  That would mean that either the Lean NFV equivalent of the Virtual Infrastructure Manager was instantiated for each service, or that all services passed through a common element.  Neither of these would be efficient.

But moving on, the technical explanation for the Lean NFV approach starts with a diagram with three layers—the NFV Manager running a Compute Controller and also an SDN Controller.  Right here I see a reference to another disquieting linkage between Lean NFV and its less-than-illustrious predecessor—“Service Chains”.  In NFV ISG parlance, a service chain is a linear string of VNFs along a data path, used to implement a multifunctional piece of virtual CPE (vCPE).  According to the paper, declarative service chain definitions are passed to the lower layers, and to me that’s problematic.

A service chain is only one of many possible connection relationships among deployed VNFs, and as such should be a byproduct of a generalized connection modeling strategy and not a singular required relationship model.  vCPE isn’t even a compelling application of NFV, and if you look at the broader requirements for carrier cloud, you’d find that most applications would expect to be deployed as cloud application components already are—within an IP subnet.  You don’t need or even want to define “connections” between elements that are deployed in a fully connective framework.

In the cloud, we already do orchestration and connection by deploying components within a subnetwork that supports open connectivity.  That means that members are free to talk with each other in whatever relationship they like.  Service chaining should have been defined that way, because there’s no need to create tunnels between VNFs when all the VNFs can reach each other using IP or Ethernet addresses.  Especially since the tunnels would have to use those addresses anyway.

More troubling is the fact that the three-layer structure described here doesn’t indicate how the NFV Manager knows what the architecture of the service is, meaning the way that the elements are deployed and cooperate.  The fact that a service chain is called out suggests that there is no higher-level model structure that represents the deployment and lifecycle management instructions.  The only element in the paper that might represent that, what’s called the “key-value store”, isn’t offered in enough detail to understand what keys and values are being stored in the first place.  The Fierce Telecom story features commentary from one of the founders of Lean NFV on the concept, and it’s clear it’s important to them.

But what the heck is it?  The paper says “Finally, as mentioned before, the NFV manager itself should be architected as a collection of microcontrollers each of which addresses one or more specific aspect of management. These microcontrollers must coordinate only through the KV store, as shown in Figure 3 below (depicting an example set of microcontrollers). This approach improves modularity and simplifies innovation, as individual management features can be easily added or replaced.”

The TMF has a data model for services, and it’s a hierarchical model.  In my ExperiaSphere stuff, I outline a hierarchical model for service descriptions.  Many standards and initiatives in and out of the cloud have represented services and applications as a hierarchy—“Service” to “Service Component” to “Subcomponent” and so forth.  These models show the decomposition of a service to its pieces, and the relationships between components of features and the features themselves.  In my view and in the view of the TMF, that is critical.

KV stores are typically not used to represent relationships.  Intent modeling and decomposition, in my own experience, are better handled with a relational or hierarchical database model.  It’s my view that if Lean NFV contemplated basing NFV deployment and management on a model-driven approach, which I believe is absolutely the only way to do it with the kind of scalability and agility the paper says they’re looking for, they’d have featured that in the paper and highlighted it in their database vision.  They don’t, so I have to assume that like traditional NFV, they’re not modeling a service but rather deploying VNFs without knowledge of the specific relationship they have to the service overall.
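
The difference is easy to see in miniature.  Below is a toy Python rendering of the two representations: a hierarchical model that a lifecycle process can walk, versus a flat key-value view of the same parameters that has lost the decomposition relationships.  The element names echo examples used elsewhere in this blog; the structure is mine, not Lean NFV’s.

```python
# A toy hierarchical service model: each element knows its subordinates.
service_model = {
    "name": "EnterpriseVPN", "state": "orderable",
    "children": [
        {"name": "VPN", "state": "orderable",
         "children": [{"name": "VPN-via-MPLS", "state": "orderable", "children": []}]},
        {"name": "vCPE", "state": "orderable",
         "children": [{"name": "vCPE-as-uCPE", "state": "orderable", "children": []}]},
    ],
}

def walk(element, depth=0):
    """A lifecycle process can traverse the structure and act per element."""
    print("  " * depth + f"{element['name']} [{element['state']}]")
    for child in element["children"]:
        walk(child, depth + 1)

walk(service_model)

# The flat key-value equivalent holds the same parameters but not the
# parent/child relationships a decomposition or remediation process needs.
flat_kv = {"EnterpriseVPN/state": "orderable", "VPN-via-MPLS/state": "orderable"}
```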

Could KV work for management/configuration data missions?  I’ve advocated a single repository for management data, using “derived operations” based on the IETF proposal for “infrastructure to application exposure” or i2aex.  I suggested this was separate from the service model, and that it was supported by a set of “agents” that both stored management data and delivered it through a query in whatever format was needed.

The Lean NFV paper describes something similar.  Continuing with the quote above, “The KV store also functions as the integration point for VNFs (depicted on the left) and their (optional) EMSs; in such cases, coordination through the KV store can range from very lightweight (e.g., to notify the EMS of new VNF instances or to notify the NFV manager of configuration events) to actually storing configuration data in the KV store. Thus, use of the KV store does not prevent NF-specific configuration, and it accommodates a range of EMS implementations. Note that because the KV store decouples the act of publishing information from the act of consuming it, information can be gathered from a variety of sources (e.g., performance statistics can be provided by the vswitch). This decreases reliance on vendor-specific APIs to expose KPIs, and hence allows a smooth migration from vendor- and NF-specific management towards more general forms of NFV management.”

Certainly a KV store could be used for the same kind of “derived operations” I’ve been blogging about, with the same benefits in adapting a variety of resources to a variety of management interfaces.  However, “derived operations” was based on the notion of an intent model hierarchy that allowed each level of the model to define its management properties based on the properties of the subordinate elements.  If you don’t have a model, then you have only static relationships between deployed virtual functions and (presumably) the EMSs for the “real” devices on which they’re based.  That gets you back to building networks from virtual versions of real devices, which is hardly a cloud-native approach.
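
For completeness, here’s a sketch of the “derived operations” notion as I’ve described it, assuming the model hierarchy supplies the list of subordinate elements.  The repository here is just a dictionary standing in for whatever store (KV or otherwise) holds the published data; the metric names are invented.

```python
repository = {}   # shared management-data store; a KV store could play this role

def publish(source: str, metric: str, value):
    """Agents publish raw status; they never talk to consumers directly."""
    repository[(source, metric)] = value

def derived_status(members: list) -> str:
    """A higher-level element derives its state from its subordinates' data."""
    states = [repository.get((m, "oper-state"), "unknown") for m in members]
    if all(s == "up" for s in states):
        return "up"
    return "degraded" if "up" in states else "down"

publish("vswitch-1", "oper-state", "up")
publish("firewall-vnf-1", "oper-state", "down")
print(derived_status(["vswitch-1", "firewall-vnf-1"]))   # -> degraded
```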

To me, the relationship concept or hierarchical modeling is the key to NFV automation and cloud-native behavior.  You have to do model-driven event-to-process steering to do service automation; how else would changes in condition be reflected?  If you do that steering, you need to understand the relationship between the specific feature element you’re working on, and the rest of the service.  The TMF recognized that with its NGOSS Contract model a decade or more ago, and it’s still the basic truth of service automation today.  If you have a model, you can define functional elements of services in any way that’s convenient, based on devices, on features, or on both at the same time.  That’s what we’d expect from a cloud-native approach to services.  That I don’t see the modeling in the Lean NFV material is very troubling.

We have a cloud-native service/application modeling approach, TOSCA, that’s already used in some NFV implementations and by some orchestration/automation vendors.  We have a cloud-proven technology, GitOps and GitHub, to organize configuration data and unify deployment strategies.  We have a proposal for a repository-based management system, i2aex, that would allow flexible storage and retrieval of status information and support any device and any EMS.  Why do we need something new in these areas, like KV?  Are we not just repeating the NFV ISG’s mistake of failing to track the cloud?  Lean NFV seems to be another path to the same approach, when another approach is what we need.  What they seem to say is that the implementation of the NFV architecture was wrong, but the implementation is wrong because the architecture is wrong.

Could you mold the Lean NFV picture to make it compatible with the model-driven vision I’ve been promoting for years?  Possibly, but you could have done that with the NFV E2E architecture too (I had a slide that did it, in fact).  To me, the problems here are first that they’ve accepted too much of the NFV ISG methodology and approach and not enough of what we’ve learned both before the ISG (the TMF NGOSS Contract) and after (the Kubernetes ecosystem for the cloud).  They’ve inherited the issues that have arisen from ETSI and ignored solutions from the cloud.  Second, they have not described their own approach in enough detail to judge whether it will make the same mistake that the ISG made, which was to literally interpret what their own material said was a functional model.  Even if this is only a starting point for Lean NFV, it’s the wrong place to start from.

I asked on Thursday of last week to join the group and introduce my thinking; no response so far, but I’ll let you know how that works out.

Is Intel’s “Innovation Day” Innovative Enough for Carrier Cloud?

Intel’s “Data-Centric Innovation Day” announcements take specific aim at 5G and NFV, seemingly at a time when operators themselves are doubting whether they really want to deploy their own clouds for hosting features in either area.  There is no question that Intel, server vendors, software vendors, and practically everyone else would love to see operators spend on deploying over a hundred thousand data centers.  There is increasing question whether operators are willing to do it, and in my last blog I noted two drivers that seem to be influencing operator planning for carrier cloud.  One is the lack of understanding of “the cloud” at a technical and cultural level, and the other the issue of “first cost”.

If vendors want carrier cloud to happen, it’s my view that they will need to resolve these issues for their prospective carrier buyers.  That’s actually been clear, even to Intel, for quite some time, but what seems to be missing is a strategy to do what they know they need to do.  Intel is almost a poster child for this, because they have an unusually complete set of the pieces operators need, and their having missed the boat on execution is particularly unfortunate and perhaps incomprehensible.

Intel and others share a common problem, which is that they want to sell their tools to deploy carrier cloud without selling a business case for carrier cloud.  That means that the operators themselves have to first come up with something that actually justifies a carrier cloud build-out and second assemble the pieces from the collection of tools that vendors feel like providing.  Given the point that operators don’t understand cloud technology or culture, you can guess how likely that is to succeed.

You need servers and chips and all the things that Intel included in its Innovation Day announcement, but only if you need carrier cloud.  Operators believe they will, but clearly don’t have the proof points yet, and part of the reason is that people are selling pieces of the solution.  They don’t even realize that there are really six different problems.

If we look at carrier cloud as an opportunity, we see a deployment driven by six distinct market factors, each of which is (today, at least) seen as independent.  In NFV’s vCPE, streaming video and ad personalization, mobile infrastructure and 5G, contextual services, public cloud services to enterprises, and IoT (the six drivers), we have technology initiatives that lack coordination.  Even within a given driver, as we can already see with NFV, the lack of a cohesive architecture is making every initiative into an integration project.  This is a proof point for the assertion that operators lack cloud technology and culture, and it exacerbates the second issue of first cost.

Suppose we decided to support application development on a broad scale but without any specific operating system or middleware tools.  That would mean that each application would end up picking the framework in which it ran, and if we were thinking about operations efficiency and a nice harmonious resource pool, it’s hard to see how we’d get there from that anarchistic starting point.  We’ve kind of done that with carrier cloud, starting with NFV.  By working hard to preserve the management interfaces to “real” devices through our transformation into the cloud-virtual world, we’ve saved something that wasn’t working particularly well by making something vitally important work poorly.

You can see, in a loose and indirect way, how the cloud community has been dealing with this problem.  Containers are the natural partner to componentized applications, including hosted service features.  Orchestration via Kubernetes has already become the go-to strategy for container orchestration, and foresight on the part of the early architects of Kubernetes created the open interfaces that make Kubernetes suitable to be the core of a growing ecosystem.  Something similar is needed with carrier cloud, an ecosystem that would frame how all applications are authored, orchestrated, and managed.  Something to standardize the carrier cloud platform so that the things running on it would be more easily integrated and managed.  Something above chips and servers and operating systems and even middleware.

I told both Intel and their open-source Wind River group four or five years ago that platform tools were not going to be enough.  They needed to pull together what I called “VNFPaaS” for “Virtual Network Function Platform as a Service”, including the management, orchestration, and lifecycle automation pieces.  They didn’t, of course, and now they face exactly the same situation they did back then, but with less time to remedy the problems and a sign that the opportunity might be passing away.

The “might be” here relates to the point I made in my earlier blogs on the public cloud and carrier cloud symbiosis.  Operators are growing more interested in outsourcing carrier cloud because they can’t build enough of a business case for building it themselves.  The question is whether the cloud providers understand what makes carrier cloud different, and so far, I think it’s fair to say they don’t.  Can public cloud providers like Amazon frame their own virtual-carrier-cloud vision before vendors like Intel start selling a business case rather than disconnected tools?  That’s the question.

For the cloud providers, the big problem isn’t technical as much as pricing policy.  Many carrier cloud applications involve both application-like service-and-management-plane components and components that are part of the service data plane.  If cloud providers were to apply traffic charges to the service data plane piece, they’d price themselves out of the market.  If they didn’t, they run the risk of increasing their network capacity requirements enormously without adding any revenue to pay for the upgrade.

There could be a technical solution to the pricing problem, though.  Almost every service that includes a data-plane carriage requirement could have that requirement satisfied outside the cloud.  vCPE isn’t a great cloud application in the first place, but if you wanted to support it you could use uCPE universal premises devices.  Mobile evolved packet core (EPC) is currently a tunnel-based application, but the data-plane-tunnel part could be implemented in a white-box SDN device.

A partnership between an agile SDN connection network and a public cloud control and management plane would be a profound shift in thinking, but it would actually be easier in an evolutionary sense.  We have network equipment today that lives outside the cloud, in carrier infrastructure.  If we could come up with the right management automation abstractions, we could link that external data-plane investment to the cloud, then enhance the linkage with a gradual migration to SDN.  SDN, remember, centralizes the control plane and distributes the forwarding plane.  Doesn’t this sound like our cloud-control-and-SDN-data model?
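
A minimal sketch of that split, with everything invented for illustration: the control logic runs in the public cloud and sees only events and topology, and what crosses the cloud boundary is a compact rule update, not the user traffic itself.  The install_rule call below stands in for whatever SDN protocol the white box actually speaks.

```python
class WhiteBoxSwitch:
    """Stands in for an external SDN forwarding element in carrier infrastructure."""
    def __init__(self):
        self.flow_table = {}
    def install_rule(self, match: str, action: str):
        self.flow_table[match] = action      # in reality: OpenFlow, P4Runtime, etc.

class CloudControlPlane:
    """Hosted in a public cloud; handles control/management events only."""
    def __init__(self, switch: WhiteBoxSwitch):
        self.switch = switch
    def on_subscriber_attach(self, subscriber_prefix: str, bearer: str):
        # The outcome of the event is a small rule pushed to the data plane;
        # the packets themselves never transit the cloud provider's network.
        self.switch.install_rule(subscriber_prefix, f"tunnel via {bearer}")

switch = WhiteBoxSwitch()
CloudControlPlane(switch).on_subscriber_attach("10.50.1.0/24", "user-plane-endpoint-3")
print(switch.flow_table)
```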

The tricky truth is that something like this would also benefit Intel, in that it would establish a baseline for that VNFPaaS model I mentioned earlier, and at the same time separate out the tricky legacy equipment and data plane issues.  Why all this “trickiness”?  The equipment/data-plane trick is a challenge because of the impact on cloud pricing and network capacity, as I noted above, but why is the benefit to Intel tricky?  Because Intel benefits not from the offloading of carrier cloud applications onto the public cloud, but from carriers building the cloud.  Thus, the smart position Intel could take to make its own carrier cloud story more appealing may also end up sending at least the early carrier cloud applications to the public cloud instead.

At this point, it’s going to take Intel some time to make that carrier cloud business case, if it makes it at all.  During that time, it’s likely that the public cloud will steal some of the early demand that might have driven early carrier cloud infrastructure investment.  There’s no helping that; Intel and others should have gotten smart sooner.  Their best approach now would be to frame the right architecture for both early outsourcing of carrier cloud and for the later clawing back of the opportunity to fuel network operator data center (in particular, edge) buildouts.

“Innovation” implies something radical, and it’s hard to see how a decision to buttress the features of a platform that still can’t make a business case within carrier cloud can qualify.  The door is still open for Intel to do more, to build the ecosystem that would create and justify carrier cloud, but time is passing and eventually operators will either offload their cloud mission to public providers, or accept the role of “plumber” in the network of the future.

Could Amazon Want to Host the “Carrier Cloud?”

One of the truths about carrier cloud is that it might be more cloud than carrier.  I noted in THIS blog the possibility that some network operators might prefer to have their carrier cloud hosted by one of the public cloud providers (IBM specifically in the referenced blog), and according to Light Reading, there’s a sign that Amazon might be interested in being a carrier-cloud host too, but perhaps aiming at a different target application.

I’ve used the term “carrier cloud” to refer to the infrastructure operators would need in order to supply hosted features as part or all of current and future services.  My model says that carrier cloud would be, if fully realized, the largest single source of new data center deployments globally.  Services created from that infrastructure, again referring to my model, would approach an incremental trillion dollars per year in revenue.  But realizing carrier cloud has been a problem for operators for two reasons.  First, they don’t have the faintest idea of how to do cloud infrastructure.  Second, the “first cost” (cost of initial deployments needed to create even a basic service framework) is frighteningly high.  Thus, more and more operators are quietly looking at the idea of outsourcing at least the early phase of their carrier cloud.  Amazon’s interest could be both credible and timely.

Tetsuya Nakamura has been a fixture in the NFV ISG from the first, though he wasn’t one of the creators of the original “Call for Action” paper published in 2012.  As a former vice-chair of the ISG he’s familiar with the people and politics of the NFV initiative, and as an NTT DoCoMo thought leader, he knows the network operator space and its unique problems and challenges.  His role is to represent the telecom space in Amazon’s partner solutions team, and his switch from CableLabs to Amazon raises some important questions.

First, questions about NFV in the cable industry.  Nakamura and Don Clarke, another NFV pioneer, were both hired by CableLabs for their own NFV program, and both appear to have departed at about the same time, Nakamura to Amazon and Clarke to take a breather in the industry.  It’s hard not to read that as an indicator that CableLabs wants to cut back on NFV.  I don’t have any specific comments on the CableLabs implications from either of the two, but I do have some views shared in non-attributable form from a cable company executive.

It’s no secret that many in the network operator space feel that NFV hasn’t paid off, and certainly hasn’t brought a return on the enormous effort that’s been devoted to it.  My cable company friend tells me that this is the general view of companies in the cable side of the network operator space.  According to him, most execs in his own company think NFV is unlikely to help them either reduce costs or improve profits.  If that’s the case broadly, which he says he believes it is, then CableLabs as an industry technology body would be under pressure to reduce its efforts in the NFV space.

Another problem that my cable company contact cites is the “telco-centricity” of the NFV work.  Most of the operators in the NFV ISG are telcos, most of the vendors are chasing telco opportunities, and most of the initiatives are arguably more valuable to telcos than to cable companies.  Not only that, telcos are former public utilities with low internal rates of return and modest ROI expectations, while cable companies tend to have higher IRRs and thus want better ROIs.  That fundamental difference in financial policy makes it hard to align technical programs between the two constituencies within the ISG.  That’s a good reason for CableLabs to look hard at its NFV commitment.

The second question is what this might mean for Amazon initiatives in the carrier cloud space.  IBM’s deal with Vodafone Business seemed directed at providing operators with a way to offer enterprises public cloud services, a move that would presume that Vodafone Business would have a preferential marketing relationship with business customers and could serve as a “virtual cloud provider” using IBM’s infrastructure instead of building out its own.  Amazon hardly needs another party to take over some or all of its enterprise marketing, and in fact would probably fear a collision in sales strategy and messaging between that virtual-cloud partner and its own AWS.  Might Amazon be instead looking at the “internal” use of carrier cloud, a mission that’s obviously inclusive of hosting NFV?

The obvious question is how an NFV focus would do Amazon much good given that NFV is widely viewed as having fallen short of expectations.  My carrier cloud opportunity model says that even in 2019, when most other carrier cloud drivers are still only shaping up, NFV doesn’t command a lot of incremental hosting opportunity.  Why, given both these points, would Amazon even consider taking an NFV position, which hiring a former NFV ISG vice-chair suggests it is in fact taking?  Two possibilities come to mind, the first that Amazon has been deluded by the market hype, and the second that there’s another dimension to NFV to be considered here.  It’s hard for me to see how Amazon is deluded about much of anything, so I favor the latter.  NFV has credibility, or at least mindshare, in two applications—virtual CPE and mobile infrastructure.  Both these have potential value to a public cloud provider like Amazon.

One of the ongoing challenges operators have faced, both telco and cable, is that of pan-provider services.  Nearly all operators of both types have a physical infrastructure footprint that’s short of their target market area.  Why would AT&T buy DirecTV, if not to extend TV service beyond its wireline footprint?  Why has wireless, which doesn’t rely as much on fixed infrastructure, been so competitive when few operators are trying to deploy copper or cable in the ground?  Back a decade or more ago, the IPsphere Forum tackled the issue of pan-provider services at the request of operators, and it was tough going because operators interested in pan-provider services all want to be the master player in the pan-provider game.  It’s thus an attempt to federate competitive players.

Amazon and other public cloud providers have hosting presence in a fairly wide geography.  Suppose that a network operator wanted to offer virtual CPE on another continent?  Rather than build out a data center, they could either try to cut a federation deal with a competing carrier on that continent, or they could go to someone like Amazon to host there.  The former could fall apart on competitive fear (why not, the “partner” might think, have me be the primary and the other guy the wholesale partner?) or on the more obvious problem that the prospective partner carrier doesn’t even have cloud infrastructure to host in.  Thus, Amazon.  vCPE might be an on-ramp for telcos looking for carrier cloud hosting instead of carrier cloud deployment.  Not big, but timely.

Mobile infrastructure is the other side.  This is the application that operators told me was their top priority way back in 2013 (which is one reason why the first NFV PoC I worked on was with Metaswitch on hosted IMS).  It has more than double the potential for driving carrier cloud as other NFV applications have, and it’s an application that could be hosted in public cloud infrastructure fairly easily if you can resolve the issue of data plane traffic.

NFV in the form of vCPE isn’t a compelling opportunity, but it might be a gateway to the mobile infrastructure opportunity that’s a pretty good launch point for carrier cloud.  That’s particularly true once we get past 2021 into the 5G Core deployment period.  And if carrier cloud and 5G are connected to IoT, then Amazon’s position with NFV might lead to a carrier cloud IoT coup.

Nakamura is a credible resource, a credible leader, in this kind of quest, providing that he can quickly absorb the obvious and necessary software-centricity of Amazon.  Standards processes are about as far from cloud processes as glaciers are from wildfires.  Amazon, in turn, is going to have to gain enough understanding of the carrier cloud opportunity to frame their strategies reasonably.  Given the total lack of insight that’s being offered in the media on the topic of NFV and carrier cloud, that’s not going to be easy either.

Digging Deeper into Data-Driven Event-to-Process Coupling

In yesterday’s blog, I raised two points that I think are particularly critical for lifecycle automation.  The first is the notion of event coupling to processes via the service data model, something that came not from me but from the TMF’s NGOSS Contract work.  The second is the notion of service/resource domain separation, which you could infer from the TMF’s SID and its customer-facing versus resource-facing services (CFS and RFS, respectively).  Today I’d like to build a bit on these.

If we model a “service” (and, in theory, an application in the cloud), we could say that it consists of two sets of things.  The first is the functional elements that make up the service’s behavior, and which in many cases are orderable or at least composable elements.  The second is the resource bindings that link functional elements to resources.  I’ve called these the “service domain” and “resource domain” respectively.

Each domain consists of a hierarchical representation of elements.  At the top of the service domain is an element representing the service itself.  This would decompose into a series of parallel functional elements that directly made up the service.  An example might be “virtual-CPE” and “VPN”.  These highest-level elements could be decomposed into different service types—“vCPE-as-uCPE” for agile premises white-box devices and “vCPE-as-Service-Chain”, or “VPN-via-MPLS” versus “VPN-via-SDWAN”.

The division here would be within the service domain if it represented something that a customer or customer service rep could select, and would be priced differently or differ in some other meaningful contractual sense.  In some cases, the customer or CSR would select the specific low-level elements, if those were what the operator sold.  In others, the customer/CSR might select only the top-level functional elements, and the decomposition of those elements would then change depending on what the parameters associated with each functional element happened to be.

Within the resource domain, there would be a high-level object that represented a behavior of a resource or collection of resources.  An example of a behavior could be “IP-any-to-any”, which could then decompose into things like “MPLS-VPN”, “VLAN”, “RFC2547” or whatever, or these behaviors might be exposed directly.  Either way, the decomposition of resource domain elements would eventually lead to parameterization of a network management system (to induce a VPN) or deployment of hosted elements that provided the specified feature.

The service data model for a service is the collection of data associated with the hierarchy of elements in the service.  In my own work, the presumption was that the service data model represented every element in the service, even if that element was a secondary decomposition option not taken for this particular service instance.  Thus, an enterprise VPN service would have the MPLS and SD-WAN options present in the service data model even though only one was selected.

The resource data model is likewise a collection of the data associated with the model hierarchy representing the behavior to which the service (at its “bottom” elements) was bound.  This model would have represented all the decomposition options at each level too, but not all the possible options for all the possible bindings between service and resource.

Binding is something I believe to be critical in service modeling.  It allows the service domain and resource domain to develop autonomously, with the former focusing on functionality and the latter on resource commitments to securing that functionality.  As long as the resource layer can generate behaviors that can be bound to the necessary functionality, it can support the service goals.  Correspondingly, a service function can be bound to any resource behavior that matches its functional needs, regardless of what technology is used or what administration offers it.  Hosted or appliance, software or hardware, wholesale or retail, my infrastructure or a partner—all are supported through behaviors and binding.
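
A compressed Python rendering of the service/resource split and the binding step might look like this.  The element and behavior names mirror the examples above; the matching logic is deliberately trivial and is mine, not a standard.

```python
# Service domain: functional elements and their decomposition options.
SERVICE_DOMAIN = {
    "VPN": ["VPN-via-MPLS", "VPN-via-SDWAN"],
    "virtual-CPE": ["vCPE-as-uCPE", "vCPE-as-Service-Chain"],
}

# Resource domain: behaviors advertised upward, and the options they can satisfy.
RESOURCE_BEHAVIORS = {
    "IP-any-to-any/MPLS":    {"VPN-via-MPLS"},
    "IP-any-to-any/overlay": {"VPN-via-SDWAN"},
    "hosting/white-box":     {"vCPE-as-uCPE"},
}

def bind(service_element: str, chosen_option: str) -> str:
    """Bind a decomposed service element to any behavior matching its needs."""
    assert chosen_option in SERVICE_DOMAIN[service_element]
    for behavior, supports in RESOURCE_BEHAVIORS.items():
        if chosen_option in supports:
            return f"{service_element} -> {chosen_option} bound to {behavior}"
    raise LookupError("no resource behavior matches the functional need")

print(bind("VPN", "VPN-via-SDWAN"))   # the service side never sees how it's implemented
```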

When a service is ordered, a service data model is created and populated, decomposing the elements as they’re encountered.  At the point where a service element has to be bound to resources, the process would stop to await an “activation”.  If, later, the service terminated, the “terminate” event would tear down the resource domain model but leave the service data model intact and in a “terminated” state.

This gets us to states and events.  Each model element has a state/event table representing the way it responds to outside conditions.  I found that it was possible to define a generic set of states and events that would serve all of the services and behaviors I was asked to consider, and so the state/event table in my work was made up of the same states/events for each service element, then again for each resource element.

The first event in my sequence was always the “Order” event and the first state the “Orderable” state.  A new service starts in the Orderable state, and when the customer order is complete an Order event is sent to the top object.  That object would consult its state/event table (for the Order event in the Orderable state), and run the associated process, which would always decompose the subordinate elements.  Each of them, as instantiated, would be sent their own Order event by the superior process, and this would eventually result in the service entering the Ordered state, where an Activate event would start the next progression.

Events of this sort flow “downward” from service toward resources, but events could also flow the other direction.  Suppose that there’s a problem with the uCPE hosting of a specific function.  The error would be reported by the uCPE in the resource domain, and the first responsibility for any element is to remediate itself.  If the problem couldn’t be fixed (by, let’s say, rebooting the device) then the next step would be to report the error up the chain to the superior object.  That report would signal that object to remediate, which would likely mean seeing if there was another possible decomposition that would still work.  Perhaps we’d substitute a cloud-hosted instance of the element for the uCPE instance.
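
Here is a deliberately tiny Python reduction of the mechanics described in the last few paragraphs: each element carries a state/event table, an “order” event cascades downward through the decomposition, and a fault that can’t be fixed locally is reported upward for the parent to remediate.  It is a toy, not the ExperiaSphere implementation, and the activation progression is omitted for brevity.

```python
class Element:
    def __init__(self, name, children=()):
        self.name, self.state, self.parent = name, "orderable", None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def table(self):
        # state/event table: (state, event) -> process; other progressions omitted
        return {
            ("orderable", "order"): self.do_order,
            ("ordered", "fault"): self.do_remediate,
        }

    def handle(self, event):
        process = self.table().get((self.state, event))
        if process:
            process()

    def do_order(self):
        self.state = "ordered"
        for child in self.children:          # events flow downward on decomposition
            child.handle("order")
        print(f"{self.name}: ordered")

    def do_remediate(self):
        fixed = False                        # pretend local remediation (a reboot, say) failed
        if not fixed and self.parent:
            print(f"{self.name}: can't fix locally, reporting fault to {self.parent.name}")
            self.parent.handle("fault")      # events flow upward on failure
        elif not fixed:
            print(f"{self.name}: top of the model, try an alternate decomposition")

svc = Element("EnterpriseVPN", [Element("vCPE", [Element("vCPE-as-uCPE")])])
svc.handle("order")                              # cascade of Order events
svc.children[0].children[0].handle("fault")      # uCPE fault escalates upward
```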

All this decomposition and remediation is controlled by the processes that the events trigger.  Those processes would operate only on the parameters associated with the objects that are steering the events, and since those parameters are stored in the service order instance in a repository, any instance of the process would serve.  That means that the processes are effectively microservices and are fully scalable and resilient.  Further, since everything is operating off a service order instance, that instance is a complete record of the state of the service and everything connected with it.  Functions like billing and reporting could operate from the service order instance and need not be state/event-driven.

The integration potential of all of this falls out of this process-to-data-model correlation.  A resource element for “uCPE”, for example, would define the external properties of uCPE.  A uCPE vendor would be responsible for providing an implementation of the uCPE resource object that would represent their specific implementation.  That implementation would take the form of a set of parameters (that would be filled into the service data model instance) and a set of processes, which would either be “stock” processes representing non-specific processes against the generalized uCPE model or their own specific processes that supported their own implementation.  With both of those in place, all uCPE implementations would be interchangeable.

Non-specific processes, meaning processes that would apply to generalized objects and not specific ones, could be authored by the provider of the framework, by the operator, or by third parties.  Each process, remember, works only on the object parameters associated with it and so the process is really specific to the parameters (meaning the implementation).  Many of these non-specific processes would do things like the basic state/event management, and others might be associated with “class” definitions like “uCPE” that represent a reference that vendors would be expected to conform to in their implementations of the class.
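
As a sketch of that class-and-implementation idea, assuming none of the names here are real products: a generic “uCPE” class supplies stock processes, and each vendor registers parameters plus any processes it chooses to override.

```python
# Stock processes work against the generalized uCPE model.
STOCK_PROCESSES = {"deploy": lambda params: f"generic deploy of {params['model']}"}

REGISTRY = {}

def register_ucpe(vendor, params, processes=None):
    """A vendor's implementation is just parameters plus optional process overrides."""
    REGISTRY[vendor] = {"params": params,
                        "processes": {**STOCK_PROCESSES, **(processes or {})}}

register_ucpe("vendor-a", {"model": "wb-100", "ports": 4})
register_ucpe("vendor-b", {"model": "wb-200", "ports": 8},
              processes={"deploy": lambda p: f"vendor-b installer for {p['model']}"})

def deploy(vendor):
    impl = REGISTRY[vendor]      # any conforming implementation is interchangeable
    return impl["processes"]["deploy"](impl["params"])

print(deploy("vendor-a"))
print(deploy("vendor-b"))
```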

A monolithic implementation of zero-touch lifecycle automation could be converted to an event-and-model-driven implementation by extracting the processes and converting them to microservices that operated from the data model.  This would be complicated where the implementation had progressed a long time in the monolithic direction, as would be the case with ONAP, for example, but it would be possible.

There are a lot of ways to make event-driven processes work, but all of the effective ways have common properties.  First, they are asynchronous in that you dispatch an event to a process and the process takes over, perhaps generating an event at the completion.  You don’t wait for a response, and anything that allows waiting for a response is the wrong approach.  Second, they are microservices that don’t store anything internally so that any instance of a given process can handle its work.  I’ve offered examples here of how events can be used, but in theory you could make every request into an event.  For example, generating a bill could be an event, or generating a trouble ticket.  The entire OSS/BSS system could be a series of event-linked processes, which is what operators who want an “event-driven” operations system are looking for.
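
Those two properties, asynchronous dispatch and stateless processes, are easy to show in a few lines of Python.  The queue here is an in-process stand-in for whatever event bus a real implementation would use, and the handlers read everything they need from the stored service order instance rather than from internal state.

```python
from queue import Queue

service_orders = {"order-42": {"state": "active", "customer": "acme"}}   # the repository
events = Queue()

def generate_bill(order_id):
    order = service_orders[order_id]      # all context comes from the repository
    print(f"billing {order['customer']} for {order_id}")

def open_trouble_ticket(order_id):
    print(f"ticket opened for {order_id}, state={service_orders[order_id]['state']}")

HANDLERS = {"bill": generate_bill, "trouble": open_trouble_ticket}

def emit(event_type, order_id):
    events.put((event_type, order_id))    # fire-and-forget: the sender never waits

def worker():
    while not events.empty():
        event_type, order_id = events.get()   # any worker instance could handle this
        HANDLERS[event_type](order_id)

emit("bill", "order-42")
emit("trouble", "order-42")
worker()
```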

The biggest truth here is that you can’t do event-to-process coupling without a service/resource model that includes state/event tables.  That’s been my quarrel with ECOMP from the first; you can’t retrofit what’s supposed to be the foundational step into a design after the fact.  You either start right or you pay a big price in delay and effort to convert later, and that’s what we’re facing with both ECOMP and OSS/BSS.
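For clarity, here’s a minimal sketch of what a state/event table carried in the model might look like, in the NGOSS Contract spirit; the states, events, and process names are placeholders, not any standard.

```python
# Sketch: the model names the process to run and the next state for each
# (current state, event) pair, so event-to-process coupling is data-driven.
STATE_EVENT_TABLE = {
    ("ordered",     "activate"):        ("deploy_process",    "deploying"),
    ("deploying",   "deploy-complete"): ("notify_process",    "active"),
    ("active",      "fault"):           ("remediate_process", "remediating"),
    ("remediating", "fault-cleared"):   ("notify_process",    "active"),
}

def dispatch(element, event_type):
    process, next_state = STATE_EVENT_TABLE[(element["state"], event_type)]
    print(f"{element['name']}: running {process}, moving to {next_state}")
    element["state"] = next_state

elem = {"name": "vpn/access/uCPE", "state": "ordered"}
dispatch(elem, "activate")           # -> deploy_process, state "deploying"
dispatch(elem, "deploy-complete")    # -> notify_process, state "active"
```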

That’s where separating services and resources comes in.  If you use service/resource modeling to create an abstraction layer that presents simple virtual functions upward to the OSS/BSS and that manages all the good stuff I’ve described within itself, you insulate the operations systems from the complexity of zero-touch lifecycle management.  That won’t help ECOMP, which is supposed to be doing lifecycle automation itself, but it would provide a rational path to OSS/BSS evolution.
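A tiny sketch of that insulation, with invented class and method names: the OSS/BSS invokes function-level operations and never sees the decomposition or lifecycle machinery behind them.

```python
# Sketch of the northbound abstraction: the OSS/BSS works at the service level.
class ServiceAbstractionLayer:
    def order_service(self, service_type: str, site: str) -> str:
        # A real implementation would decompose the model, commit resources,
        # and start the lifecycle state machines; here we just return an id.
        return "ORD-1001"

    def service_status(self, order_id: str) -> str:
        # Derived from the service order instance, not from device-level detail.
        return "active"

oss_view = ServiceAbstractionLayer()
order_id = oss_view.order_service("managed-firewall", "site-12")
print(oss_view.service_status(order_id))   # the OSS never sees resource specifics
```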

I’d like to see the TMF pick up its own lead in this space.  As I said in my opening, the TMF has contributed what I think are the two critical pieces needed for effective service lifecycle automation, but hasn’t developed or modernized either one of them.  My own initiatives in the area have focused on “cloudifying” the TMF notion, which of course the TMF could do better on its own.  So how about it, TMF?  Can you show the same truly dazzling insights you showed in the past?  We surely need them.

Resources, Services, and Operations Support Systems

The TMF has done some very insightful things in the last decade.  One of the places where I split from the TMF is the boundary point between operations support systems (OSS) and network technology.  My view has been that OSS/BSS should deal with functional entities, high-level service-related intent models, and never dig into the implementation of functions.  The TMF has a more OSS-centric model, though it may be moving slowly to at least acknowledge the possibility that function-based OSS/BSS thinking is acceptable.  That would make it easier to reflect transformations in infrastructure into the service domain without breaking the OSS, and that’s a priority for many operators.

Light Reading did a piece on Telstra’s work in this space.  Telstra is Australia’s dominant network operator, and was (and arguably is) enmeshed in the complicated, politicized, and (many say) doubtful National Broadband Network (NBN) initiative there.  NBN has effectively removed the under-network from operators and planted it in a separate “national business”, and so Telstra is forced to consider “overlay” services more seriously than most big network operators.  For some time, they’ve grappled with issues that every operator will face, arising from the decline in profits from connection services.

Right now, practically every operator in the world is struggling to create a new network model that optimizes capex through hosting and white-box devices, and opex through service lifecycle automation.  At the same time, nearly all are trying to make their operations support and business support systems (OSS/BSS) more agile and responsive.  That’s something like trying to thread a needle while break dancing.  Linking these goals, meaning making the OSS/BSS handle the new infrastructure, continues a resource-to-service tie that’s been a thorn in the side of operators for a decade or more.

The article says that Telstra is hoping to use the TMF’s Open API definitions to build an OSS that’s more agile and that creates reusable services to limit the number of specialized implementations of the same thing that Telstra is currently finding in its operations systems.  The barrier cited is that there’s not yet broad industry support for those APIs, which means that Telstra can’t expect to find OSS elements that support them and can be integrated.  That’s an issue, but to my mind there’s another issue that I’ll get to later.

The no-support issue shouldn’t be surprising for two reasons.  First, the TMF is first and foremost a body to support operations software vendors and provide them a forum for customer engagement.  This isn’t to say that it’s not doing some useful work; I’ve said many times that the NGOSS Contract work the forum did a decade ago was the definitive, seminal introduction to event-driven service automation.  The problem is that a lot of the useful work, including NGOSS Contract by the way, doesn’t seem to get off the launch pad when it comes to actual deployment in the real world.

OSS/BSS vendors are like network equipment vendors; they want all the money for themselves.  An open set of APIs to support the integration of operations software features would at least encourage buyers to roll their own OSSs from piece parts.  It would lower barriers to market entry by letting new players introduce pieces of the total software package, and let buyers do best-of-breed implementations.  Good for buyers, bad for sellers, in short, and in almost every market the sellers spend the most on industry groups, trade shows, and other forums.

If Telstra is hoping for a quick industry revolution in acceptance of the TMF’s Open API stuff, I think they’ll be disappointed, but I don’t think that’s the biggest problem that Telstra faces.  For that, let me quote the Light Reading article briefly as it quotes Telstra’s Johanne Mayer: “We started with one technology and added another and then NFV came along and helped us add more spaghetti lines. It was supposed to make things easier and cheaper because it would build on white boxes, but in terms of management it is more spaghetti lines and that makes it hard to deliver anything fast.”  The goal is to remove resource-level information from OSS systems and emplace it in service domains, and then to assure that all the service domains map their visions of a given function/feature to a single implementation (or small set thereof).  It’s not easy to do that.

The problem here is one I’ve blogged about many times, and one that’s reflected in my ExperiaSphere work (see THIS tutorial slide set).  Operations systems (in the kind of view Telstra wants) are decoupled from resources by recognizing that there’s a “service domain” where functions are modeled and composed into services, and a “resource domain” where the functions are decomposed into resource commitments.  If this explicit separation is maintained, then it’s possible to build services without making the results brittle in terms of how the services are implemented, and also to provide for whatever multiplicity of implementations of a given function might suit the operator involved.  If there is no separation here, then you can’t remove resource-level information from OSSs because it would intertwine with service composition.

It’s fairly easy to convey the notion of service and resource domains, and the mechanism for composition in each, if you presume a model-based approach in which any given “element” of a service is a black box that might decompose into resources (the resource domain) or into other elements (which might sit in either domain).  My presumption has always been that there are “service models” that describe functional composition and “resource models” that describe resource commitments, and that the juncture of the two is always “visible”, meaning that service-model “bottoms” must mate with resource-model “tops” through some agile process (that’s also described in the referenced presentation, but more detail is available HERE, where you can also find a description of service automation, another Telstra goal).
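To illustrate the “bottoms mate with tops” idea, here’s a small, hypothetical sketch in which a service-domain element decomposes into named functions, and a binding step matches each function to any resource-domain model that advertises it; the model contents are invented.

```python
# Sketch of binding service-model "bottoms" to resource-model "tops".
SERVICE_MODEL = {
    "vpn-service": {"decomposes_to": ["access-function", "firewall-function"]}
}

RESOURCE_MODELS = [
    {"offers": "firewall-function", "implementation": "uCPE-vFW"},
    {"offers": "firewall-function", "implementation": "cloud-vFW"},
    {"offers": "access-function",   "implementation": "MPLS-access"},
]

def bind(function_name, policy=None):
    # Any resource model advertising the function is a candidate; a policy
    # (cost, location, availability) could choose among them.
    candidates = [r for r in RESOURCE_MODELS if r["offers"] == function_name]
    return candidates[0] if candidates else None

for fn in SERVICE_MODEL["vpn-service"]["decomposes_to"]:
    print(fn, "->", bind(fn)["implementation"])
```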

Service/resource separation and compositional modeling are critical if you intend to expose service-building through a customer portal.  If the modeling follows the approach I’ve described, then lifecycle management tasks are integrated with the models, which means that support is assembled in parallel with the composition of the service features.  Moreover, the tie-in between the service and resource layers creates a set of model elements that present service-abstracted properties to service-builders, and permits any implementation that meets those properties to be linked.

That’s an essential piece of any goal to reduce the integration burden, which Telstra says is now 80% of the cost of features and which they’d like to see reduced to 20%.  Without some master target for feature implementations to meet, it’s impossible to present a consistent view of features to the service layer when implementations differ, unless all those implementations are somehow harmonized.  That would mean some resource details could not be separated from OSS concerns, as Telstra wants them to be.

The frustrating part of all of this is that nothing Telstra wants should come as any surprise, nor should the fact that none of it is currently available in a uniform way.  These issues predate NFV, as the timing of the TMF’s NGOSS Contract work proves.  The TMF also grappled with them in its own Service Delivery Framework (SDF) work, another project that did good stuff but somehow didn’t move the market to implement it, and that again predated NFV.

Back in March of 2014, I did a final report to the NFV ISG on NFV objectives as I saw them, opening with these four goals:

  • To utilize cloud principles and infrastructure to the greatest extent possible, but at the same time to ensure that functions could be hosted on everything from the cloud through a chip, using common practices.
  • To support the use of any software component that provided network functionality or any other useful functionality, without change, as a VNF, providing only that the rights to do this were legally available.
  • To provide a unified model for describing the deployment and management of both VNF-based infrastructure and legacy infrastructure.
  • To create all of this in an open framework to encourage multiple implementations and open source components.

These seem to me to be the things Telstra is looking for, and they’re half a decade old this month.  What happened?  Did the ISG reject these goals, or did they simply go unrealized, as so many good TMF projects have?  The point to me is that there is a systemic problem here, not with our ability to define models that meet our long-term objectives in service lifecycle automation, but with our ability to recognize when we’ve done that, and to implement them.

The challenge the TMF faces with its Open API Program in extracting resource specificity from services is similar to the one that the NFV ISG faced in integrating NFV with operations systems.  You have two different communities here—the OSS/BSS stuff with its own vendors and internal constituencies, and the NFV, SDN, and “network technology” stuff with its own standards groups and CTO people.  We’ve seen both groups try to advance without paying enough attention to the other, and we’re seeing both fail.

There are some players out there who have taken steps in the right direction.  Cloudify has done some good things in orchestration, and Apstra has made some strides in resource-layer modeling.  Ubicity has a good TOSCA modeling framework that could be applied fairly easily to this problem, and there are startups beginning to do work in cloud orchestration and even “NetOps”, or network operations orchestration.  All of these initiatives, I submit, are at risk to the same problems the TMF’s Open API program faces: sellers drive our market, and sellers aren’t interested in open frameworks.

Buyers aren’t much better.  The ETSI work on service lifecycle automation should have started with a discussion of modeling, service and resource separation, and event-driven, model-guided operations processes.  That was the state of the art, but that’s not what was done, and service providers themselves drove it.  Same for ECOMP.  If we’re not getting convergence on the things network operator buyers of operations automation technology need, it’s because the operators themselves are not wresting effective control of the process.  They’ve tried with standards, with open-source initiatives, and they still miss the key points of modeling that you can’t miss if you want to meet your service lifecycle automation goals.

The Light Reading article ends with Telstra’s Mayer expressing frustration with the lack of those function-to-implementation connectors discussed above.  “There is no such standard as firewall-as-a-service and I am looking for a forum where we can agree on these things and speak the same language.”  Well, we’ve had four that I’m aware of (the TMF, the IPsphere Forum, the NFV ISG, and the ETSI ZTA initiative), and somehow the desired outcome didn’t happen, despite the fact that there were explicit examples created of just what Telstra wants.  I think Telstra and other operators need to look into what did happen in all those forums, and make sure it doesn’t happen again.

Without an open, effective, way of creating the function-to-implementation linkage, there is no chance that resource independence or no-integration service-building is going to happen, and the TMF alone doesn’t provide that.  My recommendation is that those who like the points I’ve made here use the ExperiaSphere project material I’ve cited and the other four presentations to either assess the commercial tools (some of which I’ve mentioned here) or to frame their own vision.  All the concepts are open for all to use without licensing, permission, or even attribution (you can’t use the presentation material except for your own reference, or use the ExperiaSphere term, without my consent).  At the least, this will offer a concrete vision you can align with your requirements.  That may be the most important step in achieving what Telstra, and other operators, want to achieve.