Another Slant on the Service Lifecycle Automation Elephant

I asked in an earlier blog whether the elephant of service automation was too big to grope.  The Light Reading 2020 conference this week raised a different slant on that old parable, which is whether you can build an elephant from the parts you’ve perceived by touching them.

Wikipedia cites the original story of the elephant as:

A group of blind men heard that a strange animal, called an elephant, had been brought to the town, but none of them were aware of its shape and form. Out of curiosity, they said: “We must inspect and know it by touch, of which we are capable”. So, they sought it out, and when they found it they groped about it. In the case of the first person, whose hand landed on the trunk, said “This being is like a thick snake”. For another one whose hand reached its ear, it seemed like a kind of fan. As for another person, whose hand was upon its leg, said, the elephant is a pillar like a tree-trunk. The blind man who placed his hand upon its side said, “elephant is a wall”. Another who felt its tail, described it as a rope. The last felt its tusk, stating the elephant is that which is hard, smooth and like a spear.

My original point was that it’s possible to envision an elephant/problem that’s just too big to be grasped (or groped!) in pieces.  The question raised by the Light Reading piece is whether you can assemble an elephant from what you thought the pieces were.  Snakes and fans and walls and spears don’t seem likely to provide the right tools for elephant-assembly, right?  Might it be that the tools of our current transformation process don’t add up to the whole at all?

Most of the key initiatives to support transformation of the network operator business model have been deliberately localized in terms of scope.  SDN and NFV, for example, didn't cover the operations side at all; it was out of scope from the first.  It's also out of scope for 5G and IoT, and both of these initiatives talk about SDN and NFV but acknowledge that the standards for them are created elsewhere.

Like the blind men in the Wikipedia version of the elephant story, the standards bodies are dealing with what’s in scope to them, and that has totally fragmented the process of transforming things.  This fragmentation isn’t critical when the standards cover an area fully enough to guide and justify deployment, but remember that “transformation” is a very broad goal.  Addressing it will require very broad initiatives, and we have none of them today.

If the whole of transformation can't be created by summing the parts of initiatives we've undertaken, can we really expect to get there?  I've been involved in three international standards activities aimed at some part of transformation, and indirectly associated with a fourth.  I've seen three very specific problems that fragmentation of the standards initiatives has created, and any one of them could compromise our goals.

The first problem is the classical "dropped ball" problem.  For example, if the NFV ISG decided that operations is out of scope, how does the body ensure that the operations impacts of its activity are addressed by somebody?  The classic standards-group answer is "liaisons" between the groups, but we're still seeing liaison requests submitted and approved by the ISG four years after the process started.

What we’re lacking to address this problem effectively is a single vision of the parts that have to add up to our glorious transformation whole.  Not the details, just the identification of the total requirements set and how that set is divided up among the bodies doing the work.  This could, of course, guide liaison by identifying what is essential in the way of the relationships across the various groups.  It could also bring to the fore the understanding that there are areas in the work Group A is doing that can be expected to heavily impact Group B, thus showing that there needs to be special attention given to harmonization.

There’s nowhere this is more obvious than in the relationship between NFV and the cloud.  What is a VNF, if not a network application of cloud computing principles?  We were doing cloud computing before NFV ever started, and logically should have used cloud computing standards as the basis for NFV.  I firmly believe (and have believed from the first) that the logical way to do NFV was to presume that it was a lightweight organizing layer on top of cloud standards.  That’s not how it’s developed.

The second problem is the “Columbus problem”.  If you start off looking for a route to the East and run into an entirely new continent instead, how long does it take for you to realize that your original mission has resulted in myopia, and that your basic premise was totally wrong?

We have that problem today in the way we're looking at network transformation.  Anyone who looks at the way operator costs are divided, or who talks with operators about where the benefits of new technologies would have to come from, knows that simple substitution of a virtual instance of a function (a VNF) for a device (a "physical network function" or PNF) isn't going to save enough.  In 2013, most of the operators who signed off on the original Call for Action admitted they could get at least that level of savings by "beating Huawei up on price".  They needed opex reduction and new service agility to do the job, and yet this realization didn't impact the scope of the work.

The final problem, perhaps the most insidious, is the "toolbox problem."  You start a project with a specific notion of what you're going to do.  You have the tools to do it in your toolbox.  You find some unexpected things, and at first you can make do with your toolbox contents, but the unexpected keeps happening, and eventually you realize that you don't have what you need at all.  I've made a lot of runs to Lowe's during do-it-yourself projects, so I know this problem well.

The current example of this problem is the notion of orchestration and automation.  You can perform a specific task, like deploying a VNF, with a script that lets you specify some variable parameters.  But then you have to be able to respond to changes and problems, so you need the notion of "events", which means event-handling.  Then you increase the number of different things that make up a given service or application, so the complexity of the web of elements increases, and so does the number of events.  If you started off thinking that you had a simple model-interpreter as the basis for your orchestration, you now find that it can't scale to large, event-dense situations.  If you'd expected those from the start, you'd have designed your toolbox differently.

Architecturally speaking, everything we do in service lifecycle processing should be a fully scalable microservice.  Every process should scale with the complexity of the service we’re creating and selling, and the process of coordinating all the pieces through exchange of events should be structured so that you can still fit in a new/replacement piece without having to manually synchronize the behavior of the new element or the system as a whole.  That’s what the goals for service lifecycle automation, zero-touch automation, closed-loop automation, or whatever you want to call it, demand.  We didn’t demand it, and in many cases still aren’t demanding it.
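The event-coordination idea above can be sketched as a small state/event table, where each lifecycle process is a handler keyed by (current state, event type).  This is a hypothetical Python sketch, not any standard's actual mechanism, and all names are invented; the point it illustrates is that a handler can be swapped or replicated without manually resynchronizing the rest of the system.

```python
class LifecycleElement:
    """One modeled service element with a lifecycle state."""
    def __init__(self, name):
        self.name = name
        self.state = "ordered"

# The (state, event) table *is* the coordination: adding or replacing
# a handler is a table change, not a code change anywhere else.
HANDLERS = {}

def on(state, event):
    """Register a handler for a (state, event) combination."""
    def register(fn):
        HANDLERS[(state, event)] = fn
        return fn
    return register

@on("ordered", "deploy")
def start_deploy(elem):
    elem.state = "deploying"

@on("deploying", "deploy_complete")
def finish_deploy(elem):
    elem.state = "active"

@on("active", "fault")
def handle_fault(elem):
    elem.state = "recovering"

def dispatch(elem, event):
    """Route an event to the handler for the element's current state."""
    handler = HANDLERS.get((elem.state, event))
    if handler:
        handler(elem)
    return elem.state

# Drive one element through a deployment.
vpn = LifecycleElement("vpn-edge-1")
dispatch(vpn, "deploy")
dispatch(vpn, "deploy_complete")
```

Because the handlers share no state other than the element they operate on, each could in principle run as a separately scaled microservice, which is the property the paragraph above demands.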

None of these problems are impossible to solve; some are already being solved by some implementations.  But because we haven't placed much value on these issues, or on how they're covered, vendors have paid little attention to explaining how they address them.  Buyers don't know who does and who doesn't, which reduces the benefit of doing the right thing.

We also need to take steps in both the standards area and the transformation-related open-source projects to stop these kinds of issues from developing or worsening.  Some sort of top-down, benefit-to-project association would be a good start, an initiative to start with what operators expect from transformation and align the expectation with specific steps and architectural principles.  This wouldn’t be difficult, but it might then be hard to get the various standards bodies to accept the results.  We could try, though, and should.  Could some major standards group or open-source activity not step up and announce something along these lines, or even a credible vendor or publication?

Nobody wants to commit to things that make their work more complicated.  Nobody likes complexity, but if you set about a complex process with complex goals, then complexity you will have, eventually.  If you face that from the first, you can manage things at least as well as the goals you've set permit.  If you don't, expect to make a lot of trips to Lowe's as you try to assemble your elephant, and they probably won't have the right parts.

What are the Options and Issues in AI in Networking?

It looks like our next overhyped concept will be AI.  To the “everything old is new again” crowd, this will be gratifying.  I worked on robotics concepts way back in the late 1970s, and also on a distributed-system speech recognition application that used AI principles.  Wikipedia says the idea was introduced in 1956, and there was at one time a formal approach to AI and everything.  Clearly, as the term gains media traction, we’re going to call anything that has even one automatic response to an event “AI”, so even simple sensor technologies are going to be cast in that direction.  It might be good to look at the space while we can still see at least the boundaries of reality.

In networking, AI positioning seems to be an evolution of “automation” and “analytics”, perhaps an amalgamation of the two concepts.  Automation is a broad term applied to almost anything that doesn’t require human intervention; to do something on a computer that was once done manually is to “automate” it.  Analytics is typically used to mean the application of statistical techniques to draw insight from masses of data.  “Closed-loop” is the term that’s often used to describe systems that employ analytics and automation in combination to respond to conditions without requiring a human mediator between condition and action.  AI is then an evolution of closed-loop technology, enhancing the ability to frame the “correct” response to conditions, meaning events.
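The analytics-plus-automation combination described above can be reduced to a toy closed loop: analytics condenses raw telemetry into a named condition, and automation maps that condition to an action with no human between them.  Everything in this sketch, thresholds and names alike, is invented for illustration.

```python
def analyze(link_utilization_samples, threshold=0.9):
    """Analytics: reduce raw measurements to a named condition."""
    avg = sum(link_utilization_samples) / len(link_utilization_samples)
    return "congested" if avg > threshold else "normal"

# Automation: a direct condition-to-action mapping; the "closed loop"
# is the absence of any human mediator between the two.
ACTIONS = {
    "congested": "reroute_traffic",
    "normal": "no_action",
}

def closed_loop(samples):
    """Condition in, action out, nobody in the path."""
    return ACTIONS[analyze(samples)]
```

An AI layer, in this framing, would replace the static `ACTIONS` table with something that can frame the "correct" response to conditions it hasn't seen before.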

There have been many definitions and “tests” for artificial intelligence, but they all seem to converge on the notion that AI systems have the ability to act as a human would act, meaning that they can interpret events and learn behaviors.  We seem to be adopting a bit broader meaning today, and thus we could say that in popular usage, AI divides into two classes of things—autonomous or self-organizing systems that can act as a human would based on defined rules, and self-learning systems that can learn rules through observation and behavior.

The dividing line between these two categories is fuzzy.  For example, you could define a self-drive car in a pure autonomous sense, meaning that the logic of the vehicle would have specific rules (“If the closure rate with what is detected by front sensors exceeds value x, apply the brakes until it reaches zero.”) that would drive its operation.  You could, in theory, say that the same system could “remember” situations where it was overridden.  Or you could say that the car, by observing driver behavior, learned the preferred rules.  The first category is autonomous, the second might be called “expanded autonomous” and the final one “self-learning”.  I’ll use those terms in this piece.
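The braking rule quoted above, in its pure autonomous form, is just an explicit condition-to-action mapping.  Here's a toy version of that single rule, with the threshold and return values invented for illustration.

```python
def brake_command(closure_rate_mps, threshold_mps=5.0):
    """Pure autonomous rule: brake while the closure rate with the
    object detected ahead exceeds the threshold; release once the
    closure rate has reached zero (or the object is pulling away)."""
    if closure_rate_mps > threshold_mps:
        return "brake"
    if closure_rate_mps <= 0:
        return "release"
    return "hold"
```

An "expanded autonomous" system would log overrides of this rule, and a "self-learning" one would adjust `threshold_mps` itself from observed driver behavior.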

Equipped now, at least if you accept these terms as placeholders, to classify AI behaviors, we can look at what I think is the top issue in AI: the rule of override.  Nobody wants a self-driving car that goes maverick.  To adopt AI in any form you have to provide the system with a manual override, not only in the sense that there might be a "panic button" but in the sense that the information the user needs to make an override decision is available in a timely way.

This rule is at the top because it’s not only the most important but the most difficult.  You can see that in a self-drive car, the rule means simply that the controls of the vehicle remain functional and that the driver isn’t inhibited from either observing conditions around the vehicle or using the override function if it’s necessary.  In a network, the problem is that the objective of network automation is to replace manual activity.  If no humans remain to do the overriding, you clearly can’t apply our rule, but in the real world, network operations center personnel would likely always be available.  The goal of automation, then, would be to cut down on routine activity in the NOC so that only override tasks would be required of them.

That’s where the problem arises.  Aside from the question of whether NOC personnel would be drinking coffee and shooting the bull, unresponsive to the network state, there’s the question of what impact you would have on automation if you decided to even offer an override.  The network sees a failure through analysis of probe data.  It could respond to that failure in milliseconds if it were granted permission, but if an override is to be made practical you’d have to signal the NOC about your intent, provide the information and time needed for the operator to get the picture, and then either take action or let the operator decide on an alternative, which might mean you’d have to suggest some other options.  That could take minutes, and in many cases make the difference between a hiccup in service and an outage.

This problem isn’t unsolvable; router networks today do automatic topology discovery and exercise remedial behavior without human intervention.  However, the more an AI system does, meaning the broader its span of control, the greater the concern that it will do something wrong—very wrong.  To make it AI both workable and acceptable, you need to provide even “self-learning” systems with rules.  Sci-Fi fans will perhaps recall Isaac Asimov’s “Three Laws of Robotics” as examples of policy constraints that operate even on highly intelligent AI elements, robots.  In network or IT applications, the purpose of the rules would be to guide behavior to fit within boundaries, and to define where crossing those boundaries had to be authorized from the outside.

An alternative approach in the AI-to-network-or-IT space would be to let a self-learning system learn by observation and build its own rules, with the understanding that if something came up for which no rule had been created (the system didn’t observe the condition) it could interpolate behavior from existing rules or predefined policies, and at least alert the NOC that something special had happened that might need manual review.  You could also have any action such a system takes be “scored” by impact on services overall, with the policy that impacts below a certain level could be “notify-only” and those above it might require explicit pre-authorization.
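The impact-scoring policy suggested above is straightforward to express: score each proposed action, execute-and-notify below some threshold, hold for explicit pre-authorization above it.  The threshold and function names in this sketch are assumptions for illustration, not any product's actual behavior.

```python
def disposition(impact_score, notify_threshold=0.3):
    """Policy gate for a proposed automated action.

    impact_score: estimated service impact, normalized to 0..1
    (how it's computed is a separate, and harder, problem).
    """
    if impact_score < notify_threshold:
        # Low impact: act immediately, tell the NOC afterward.
        return "execute_and_notify"
    # High impact: wait for a human to pre-authorize.
    return "await_authorization"
```

The hard part, of course, is the scoring itself; the gate is trivial, but a credible impact estimate requires knowing what services depend on the resource being acted on.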

All of this is going to take time, which is why I think that we’ll likely see “AI” in networking applications focusing mostly on my “autonomous” system category.  If we look at the whole notion of intent modeling, and manifest that in some sort of reality, we have what should be autonomous processes (each intent-modeled element) organized into services, likely through higher layers of model.  If all of this is somehow constrained by rules and motivated by goals, you end up with an autonomous system.

This leads me back to my old AI projects, particularly the robotics one.  In that project, my robot was a series of interdependent function controllers, each responsible for doing something "ordered" from the system level.  You said "move north" and the movement controller set about carrying that out, and if nothing intervened it would just keep things moving.  If something interfered, the "context controller" would report a near-term obstacle to avoid, and the movement controller would get an order to move around it, after which its original order of northward movement would prevail.  This illustrates the autonomous process, but it also demonstrates that when there are lots of layers of stuff going on, you need to be able to scale autonomy like you'd scale any load-variable element.

Returning to our network mission for AI, one potential barrier to this is the model of the service.  If events and processes join hands in the model, so to speak, then the model is an event destination that routes the event to the right place or places.  The question becomes whether the model itself can become a congestion point in the processing of events, whether events can pile up.  That’s more likely to happen if the processes that events are ultimately directed to are themselves single-threaded, because a given process would have to complete processing of an event before it could undertake the processing of a new one.
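A back-of-the-envelope sketch makes the congestion risk concrete: if the model routes every event to one single-threaded process, the wait for the last queued event grows linearly with event count, while N replicated workers divide it.  This toy calculation assumes simultaneous arrivals and uniform service time, both simplifications I'm making for illustration.

```python
def worst_case_wait(event_count, service_time, workers=1):
    """Time the last queued event waits before its processing begins,
    assuming all events arrive at once and each takes service_time
    seconds on a single-threaded worker."""
    events_ahead = (event_count - 1) // workers
    return events_ahead * service_time

# One single-threaded process handling a 100-event storm at 50 ms each
# keeps the last event waiting for seconds; ten workers cut that to a
# fraction, which is why the processes behind the model must scale.
single = worst_case_wait(100, 0.05)
pooled = worst_case_wait(100, 0.05, workers=10)
```

This is exactly the "event density" point: the model's routing is cheap, but single-threaded destination processes turn an event storm into a queue.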

This additional dimension of AI systems, which we could call “event density”, is something that’s slipped through the cracks, largely because so far most of the “orchestration” focus has been on NFV-like service-chain deployments.  If you move from two or three chained elements to services with hundreds of elements, add in the business processes that surround the network parts, and then try to automate the entire mess, you have an awful lot of things that generate events that could change the requirements for a lot of other things.  We need to take event density seriously, in short, when we assess automation and orchestration goals that go beyond basic NFV MANO.

And maybe even there.  There’s nothing more frustrating than a system with self-limiting faults that are hidden until you get really committed to it.  New applications of NFV will be more complex than the old ones, because nobody starts with the most complicated stuff.  We don’t want to find out, a year or so into an NFV commitment, that our solutions have run out of gas.

A Deeper Dive into ONAP

When I blogged about the ONAP Amsterdam release, I pointed out that the documentation available on ONAP didn't address the questions I had about its architecture and capabilities.  The ONAP people contacted me and offered a call to explain things, and also provided links and documentation.  As I said in my prior blog, there is a lot of material on ONAP, and there's no way I could explain it all.  It would also be difficult to frame the answers to my questions purely in terms of that documentation.

I had my call, and my proposal was to take three points that I believed (based on operator input) were critical to ONAP’s value in service lifecycle automation.  I asked them to respond to these points in their own terms, and it’s that set of questions and responses that I’m including here.  For the record, the two ONAP experts on the call were Arpit Joshipura, GM Networking, Linux Foundation and Chris Donley, senior director of Open Source Technology at Huawei and chair of the ONAP Architecture Committee.

I got, as reference, a slide deck titled “ONAP Amsterdam Release” and a set of document links.

The ONAP people were very helpful here, and I want to thank them for taking the time to talk with me.  They pointed out at the start that their documentation was designed for developers, not surprising given that ONAP is an open-source project, and they were happy to cooperate in framing their approach at a higher level, which was the goal of my three points.  I framed these as “principles” that I believed had been broadly validated in the industry and by my own work, and I asked them to respond to each with their views and comments on ONAP support.

The first point is that Network Functions (NFs) are abstract components of a service that can be virtual (VNF), physical (PNF), or human (HNF).  This is an architectural principle that I think is demonstrably critical if the scope of ONAP is to include all the cost and agility elements of carrier operations.

My ONAP contacts said this was the path that ONAP was heading down, with their first priority being the VNF side of the picture.  In the architecture diagram on Page 4 of the Amsterdam Architecture White Paper referenced above, you’ll see a series of four gray boxes.  These represent the Amsterdam components that are responsible for framing the abstractions that represent service elements, and realizing them on the actual resources below.

The notion of an HNF is indirectly supported through the Closed Loop Automation Management Platform (CLAMP), which is the ONAP component responsible for (as the name suggests) closed-loop automation.  CLAMP provides for an escape from a series of automated steps into an external manual process either to check something or to provide an optional override.  These steps would be associated with any lifecycle process as defined in the TOSCA models, and so I think they could provide an acceptable alternative to composing an HNF into a service explicitly and separately.

An abstraction-driven, intent-based approach is absolutely critical to ONAP’s success.  I don’t think there’s any significant difference between how I see industry requirements in this area and what ONAP proposes to do.  Obviously, I think they should articulate this sort of thing somewhere, but articulation in terms that the industry at large could understand is a weakness with ONAP overall.  They appear to recognize that, and I think they’re eager to find a way to address it.

The second point is that all network functions of the same type (Firewall, etc.) would be represented by the same abstraction, and implementation details and differences would be embedded within.  Onboarding something means creating the implementation that will represent it within its abstraction.  Abstractions should be a class/inheritance structure to ensure common things across NFs are done in a common way.

The ONAP people say they’re trying to do this with the VNFs, and they have a VNF requirements project whose link reference I’ve provided above.  VNF development guidelines and an SDK project will ensure that VNF implementations map into a solid common abstraction.  This works well if you develop the VNF from scratch, but while the architecture supports the notion of creating a “wrapper” function to encapsulate either an existing software component to make it a VNF, or to encapsulate a PNF to make it an implementation of the same NF abstraction, this hasn’t been a priority.  However, they note that there are running implementations of ONAP that contain no VNFs at all; the user has customized the abstractions/models to deploy software application elements.

I don’t see any technical reason why ONAP could not support the kind of structure my second point describes, but I don’t think they’ve established a specific project goal to identify and classify NFs by type and create a kind of library of these classes.  It can be done with some extensions to the open-source ONAP framework and some additional model definition from another party.  Since most of the model properties are inherited from TOSCA/YAML, the notion of extending ONAP in this area is practical, but it is still an extension and not something currently provided.
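The class/inheritance structure my second point calls for can be illustrated with a minimal sketch: one abstraction per NF type, with the VNF/PNF implementation choice hidden inside.  All class and attribute names here are hypothetical, invented for the sketch; this is not ONAP's actual object model.

```python
class NetworkFunction:
    """Base abstraction: behavior common to every NF type lives here,
    so common things across NFs are done in a common way."""
    def __init__(self, implementation):
        # "vnf", "pnf", or conceivably "hnf"; the service model
        # above this abstraction never needs to know which.
        self.implementation = implementation

    def deploy(self):
        return f"deploy {type(self).__name__} via {self.implementation}"

class Firewall(NetworkFunction):
    nf_type = "firewall"

class LoadBalancer(NetworkFunction):
    nf_type = "load-balancer"

# Two realizations, one abstraction: a service composed of "Firewall"
# is unchanged whether the instance below is virtual or a box.
fw_virtual = Firewall("vnf")
fw_box = Firewall("pnf")
```

Onboarding, in these terms, is the act of writing the mapping between a given product and the inside of its type's class, which is why a library of agreed NF classes matters.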

The final point is that lifecycle processes should operate on the abstractions, both within them and among them.  The former processes can be type-specific or implementation-specific or both.  The latter should always be generalized for both NFs and services created from them.

If we go back to that architecture diagram I referenced in my first point, you can see that the processes “above the line”, meaning above those four gray blocks, are general service processes that operate on abstractions (modeled elements) and not on the specific way a given network function is implemented.  That means that it’s a function of modeling (and assigning the abstraction to a gray box!) to provide the link between some NF implementation and the service processes, including closed-loop automation (CLAMP).

The key piece of lifecycle or closed-loop automation is the handling of events.  In ONAP, it’s OK for VNFs (or, presumably, PNFs) to operate on “private resources”, but they can access and control shared-tenant facilities only through the Data Collection, Analytics, and Events (DCAE) subsystem and the Active and Available Inventory (A&AI) subsystem.  There’s API access to the latter, and publish-and-subscribe access to DCAE.

The workings of these two components are fairly complicated, but the combination appears to deal with the need to identify events (even if correlation is needed) and to pass them to the appropriate processes, where handling is presumably guided by the TOSCA models.  I like the A&AI notion because it decouples process elements from real-time access to actual multi-tenant resources.

In our discussions we touched on a couple of things not part of my list of points.  One was the issue of the relationship between open-source projects like ONAP and standards bodies that were tasked with creating something in the same area.  Obviously ONAP and the ETSI NFV ISG have such a relationship.  According to the ONAP people, the coders are supposed to follow standards where they are available and work properly, and to kick the problem upstairs for liaison with the appropriate body if that isn’t possible.

The tension here is created, in my view, by the fact that “standards” in the carrier space are developed by a team of people who are specialists in the standards process.  Open-source development is populated by programmers and software architects.  My own work in multiple standards groups has taught me that there is a real gulf between these two communities, and that it’s very difficult to bridge it.  I don’t think that the ETSI structural model for NFV software is optimal or even, at scale, workable, but I also don’t think that ONAP has been religious in enforcing it.  As long as they’re prepared to step outside the ETSI specs if it’s necessary to do good code, they should be OK.

Which is how I’d summarize the ONAP situation.  Remember that in my earlier blog I questioned whether ONAP did what they needed to do, and said that I wasn’t saying they did not, but that I couldn’t tell.  With the combination of my conversation with the ONAP experts and my review of the material, I think that they intend to follow a course that should lead them to a good place.  I can’t say if it will because by their own admission they are code/contribution-driven and they’re not in that good place yet.

There is a lot that ONAP is capable of, but doesn’t yet do.  Some of it is just a matter of getting to things already decided on, but other things are expected to be provided by outside organizations or the users themselves.  Remember that ONAP is a platform not a product, and it’s always been expected that it would be customized.  Might it have been better to have brought more of that loosely structured work inside the project?  Perhaps, but god-boxes or god-packages are out of fashion.  ONAP is more extensible for the way it’s conceptualized, but also more dependent on those extensions.

This is the second, and main, risk that ONAP faces.  The operators need a solution to what ONAP calls “closed-loop” automation of the operations processes, and they need it before any significant modernization of infrastructure is undertaken.  The advent of 5G creates such a modernization risk, and that means that ONAP will have to be ready in all respects for use by 2020.  The extensions to the basic ONAP platform will be critical in addressing the future, and it’s always difficult to say whether add-on processes can do the right thing fast enough to be helpful.

Is Service Lifecycle Management Too Big a Problem for Orchestration to Solve?

Everyone has probably heard the old joke about a person reaching behind a curtain and trying to identify an elephant by touching various parts.  The moral is that sometimes part-identification gets in the way of recognizing the whole.  That raises what I think is an interesting question for our industry in achieving the transformation goals everyone has articulated.  Has our elephant gotten too big to grope, at least in any traditional way?  Is the minimum solution operators need beyond the maximum potential of the tools we’re committed to?

The steady decline in revenue per bit, first reported by operators more than five years ago, has reached the critical point for many.  Light Reading did a nice piece on DT’s cost issues, and it makes two important points.  First, operators need to address the convergence of cost and price per bit quickly, more quickly than credible new service revenue plans could be realized.  That leaves operators with only the option of targeting costs, near-term.  Second, operator initiatives to address costs have proven very complex because many of their costs aren’t service- or even technology-specific.  They can push their arms behind the curtain and grab something, but it’s too small a piece to deal with the glorious whole.

This is an interesting insight because it may explain why so many of our current technologies are under-realizing their expected impacts.  What operators have been seeking goes back a long way, about ten years, and the term we use for it today is “zero-touch automation”, which I’ve been calling “service lifecycle management automation” to reflect a bit more directly what people are trying to automate. Here, “zero touch” means what it says, the elimination of human processes that cost a lot and create errors, and the substitution of automated tools.

Like SDN and NFV?  Probably not.  Neither SDN nor NFV themselves address service lifecycle automation fully, they address only a substitution of one technical element for another.  Putting that in elephant terms, what we’ve been trying to do is apply what we learned from a single touch of some elephant part to the broad problem of dealing with the beast as a whole.  SDN and NFV are just too narrow as technologies to do that.

The next thing we tried was to apply some of the technology-specific automation strategies that emerged from SDN and NFV to that broader problem.  Orchestration in the NFV form of “MANO” (Management and Orchestration) was a critical insight of the NFV ISG, but the big question is whether the approach to automation that MANO takes can be broadened to address the whole of operator cost targets, “zero touch”.  If you touch an elephant’s toe, you can manicure it, but can you learn enough from that to care for the whole beast?

Simple scripting, meaning the recording of the steps needed to do something so they can be repeated consistently, isn’t enough here; there are too many steps and combinations.  That is what has already led cloud DevOps automation toward an intent-modeled, event-driven approach.  But now we have to ask whether even that is enough.  The problem is interdependence.  With intent-and-event systems, the individual processes are modeled and their lifecycle progression is synchronized by events.  The broader the set of processes you target, the more interdependent cycles you create, and the more combinations of conditions you are forced to address.  At some point, it becomes very difficult to visualize all the possible scenarios.

MANO orchestration has a simple, highly compartmentalized goal of deploying virtual functions.  Once deployed, it leaves the management of those functions to traditional processes.  It doesn’t try to orchestrate OSS/BSS elements or human tasks, and if you add in those things you create the interdependence problem.  You can visualize a service deployment as being access deployment plus service core deployment, which is a hierarchical relationship that’s fairly easy to model and orchestrate.  When you add in fault reports, journaling for billing, human tasks to modernize wiring, and all manner of other things, you not only add elements, you add relationships.  At some point you have more of a mesh than a hierarchy, and that level of interdependence is very difficult to model using any of the current tools.  Many can’t even model manual processes, and we’re going to have those in service lifecycle management until wires can crawl through conduits on their own.

What I am seeing is a growing realization that the problem of zero-touch is really, at the technical level, more like business process management (BPM) than it is about “orchestration” per se.  No matter how you manage the service lifecycle, sticking with the technical processes of deployment, redeployment, and changes will limit your ability to address the full range of operations costs.  BPM attempts to first model business processes and then automate them, which means it focuses directly on processes, and so it can focus directly on costs, since processes are what incur them.

What we can’t do is adopt the more-or-less traditional BPM approaches, based on things like service buses or SOA (service-oriented architecture) interfaces that have a high overhead.  These are way too inefficient to permit fast passing of large numbers of events, and complex systems generate exactly that.  Buses and SOA are better for linear workflows, and while the initial deployment of services could look like that, ongoing failure responses are surely not going to even remotely resemble old-fashioned transactions.

How about intent modeling?  In theory, an intent model could envelop anything.  We already know you can wrap software components like virtual network functions (VNFs) and SDN in intent models, and you can also wrap the management APIs of network and IT management systems.  There is no theoretical reason you can’t wrap a manual process in an intent model too.  Visualize an intent model for “Deploy CPE” which generates a shipping order to send something to the user, or a work order to dispatch a tech, or both.  The model could enter the “completed” state when a network signal/event is received to show the thing you sent has been connected properly.  If everything is modeled as a microservice, it can be made more efficient.
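As a sketch of that “Deploy CPE” idea, here’s a minimal Python state machine; the state names, events, and order types are all hypothetical, not drawn from any product or standard:

```python
# Hypothetical intent model wrapping a manual process. Activating the model
# triggers the human-side work (ship a device, dispatch a technician), and
# the model completes only when a network event confirms the connection.
class DeployCPE:
    def __init__(self):
        self.state = "ordered"
        self.orders = []

    def activate(self):
        # Entering the model generates the manual-process work items.
        self.orders.append("shipping-order")
        self.orders.append("tech-dispatch")
        self.state = "awaiting-connection"

    def on_event(self, event):
        # Completion is driven by a network signal, not by the manual steps
        # themselves, so the human work stays hidden inside the model.
        if self.state == "awaiting-connection" and event == "cpe-connected":
            self.state = "completed"

cpe = DeployCPE()
cpe.activate()
cpe.on_event("cpe-connected")
print(cpe.state)  # "completed"
```

From the outside, this element looks exactly like any software-implemented intent model: it has states, it consumes events, and its internals (here, people and trucks) are invisible.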

This seems to be a necessary condition for true zero-touch automation, particularly given that even if you eventually intend to automate a lot of stuff, it won’t be done all at once.  Even non-service-specific processes may still have to be automated on a per-service basis to avoid creating transformational chaos.  Some tasks may never be automated; humans still have to do many things in response to problems because boards don’t pull themselves.

It’s probably not a sufficient condition, though.  As I noted above, the more interdependent things you have in a given process model, the harder it is to synchronize the behavior of the system using traditional state/event mechanisms.  Even making it more efficient in execution won’t make it scalable.  I’m comfortable that the great majority of service deployments, at the technical level, could be automated using state/event logic, but I’m a lot less comfortable—well, frankly, I’m uncomfortable—saying that all the related manual processes could be synchronized as well.  Without synchronizing those broader processes, you miss too much cost-reduction opportunity and you risk having human processes getting out of step with your automation.

This is a bigger problem than it’s appeared to be to most, including me.  We’re going to need bigger solutions, and if there’s anything the last five years have taught me, it’s that we’re not going to get them from inside the telecom industry.  We have to go outside, to the broader world, because once you get past the purchasing and maintenance of equipment and some specific service-related stuff, business is business.  Most of the costs telcos need to wring out are business costs not network costs.  To mix metaphors here, we’re not only shooting behind the duck with SDN and NFV, we’re shooting at the wrong duck.

I’ve said for some time that we need to think of NFV, orchestration, and lifecycle automation more in terms of cloud processes than specialized network processes, and I think the evolving cost-reduction goals of operators reinforce this point.  If zero-touch automation is really an application of BPM to networking businesses, then we need to start treating it that way, and working to utilize BPM and cloud-BPM tools to achieve our goals.

Why Is Carrier Cloud on the Back Burner for Carriers?

I noted in my blog yesterday that I was surprised and disappointed by the fact that the network operators didn’t seem to have given much thought to carrier cloud in their fall technology planning cycle.  I had a decent number of emails from operators, including some that I’d surveyed, to explain why that seemed to be (and for most, was) the case.  I thought I’d share a summary with you.

The net of the comments was that for better or worse, the operators have come to view “carrier cloud” as the outcome of things they’re doing rather than as a technical objective.  About half those who commented to me said that they believed that over time as much as a quarter of all their capex would be spent on servers and cloud technology.  However, they were all over the map in terms of how they believed they’d get to that point.

NFV remains the favored technology to drive carrier cloud, despite the fact that there is relatively little current indication that the connection exists, much less is growing.  This is in sharp contrast to my own modeling of the carrier cloud opportunity, which says that nothing will happen with NFV/carrier cloud in 2017 and that only about 4% of carrier cloud opportunity in 2018 comes from NFV.  In fact, the attitudes on carrier cloud demonstrate how difficult it is to survey users on many technology trends.  Two years ago, an even larger percentage of operators told me that in 2017 NFV would be driving carrier cloud.  My model always said “No!”

The second-favored technology to drive carrier cloud is 5G, and interestingly the percentage of operators who say 5G will be the driver in 2020 is almost exactly the same as the percentage who said, two years ago, that NFV would be.  The majority of this group still think that NFV is the real driver, and they believe carrier cloud comes about because of NFV involvement in 5G implementation.

It’s really difficult to say what 5G would do for carrier cloud, largely because it’s difficult to say what 5G will do overall, both functionally and in a business sense.  A third of the comments I got from operators that favored 5G as a carrier cloud driver admitted that 5G “has a long way to go” before real adoption can be expected.  In other dialogs I’ve had with operators, they indicated that their current 5G planning focused on the radio network (RAN).  Some said they wanted to extend FTTN with 5G instead of DSL/copper, but most thought they’d do the upgrade for competitive and capacity reasons.

Those who think 5G boosts NFV, which boosts carrier cloud, are thinking mostly of a 5G goal of making network service features interconnectable and “roamable” to the same extent that connectivity is.  The problems with this vision are 1) there is no currently approved approach for VNF federation in NFV, 2) there’s no significant VNF deployment except in premises devices, 3) many operators don’t like the notion of constructing services from components like that, fearing it would eliminate a large-provider advantage, and 4) we still don’t have a 5G standard in this area (and probably won’t get one till next year).

The actual place where 5G might help carrier cloud is in the residential broadband space.  I’ve been blogging for almost a year on the fact that operators told me the most interesting 5G application was DSL replacement in FTTN deployments, and Verizon has now announced it will be starting to deploy that way halfway through 2018.  Clearly the thing that needs 5G capacity versus DSL capacity would be video, and video turns out to be the thing my model says is the best near-term driver of carrier cloud.

In 2017, video delivery enhancements and advertising caching (CDN and related tools) accounted for almost 60% of the opportunity driver for carrier cloud, and you’ve got to go out to 2020 before it drops below the 50% level.  Obviously there hasn’t been much uptick in the adoption of carrier cloud for video/ad hosting, but here’s an important point—you can’t deliver traditional FiOS video over 5G/FTTN.  You’d have to stream; thus, it is very likely that the Verizon-style 5G/FTTN model would require augmented caching for video and ad delivery.

The good thing about this particular carrier cloud driver is that it would very likely create a demand for edge-caching, meaning edge-hosting, meaning edge-concentrated carrier cloud.  FTTN terminates in central offices where there’s real estate to establish carrier cloud data centers.  These data centers could then be expected to serve as hosting points for other carrier cloud applications that are not yet capable of justifying one-off deployments of their own.

By 2020, when video/ad support finally drops below 50%, the biggest uptick in carrier cloud driver contribution comes from the 5G/IMS/EPC area, meaning the virtual hosting of 5G-and-mobile-related elements.  This is because as wireline DSL/FTTN is replaced by 5G/FTTN, there’s certain to be symbiotic use of that home 5G.  One of the easiest ways to stall out a new RAN technology is to have no handsets capable of using it, which happens in part because there are no 5G cells to use those handsets in.  If many homes have local 5G, then why not let those same 5G connections support the homeowner?  In fact, why not let those FTTN links to 5G-for-home also serve as 5G RAN cells for mobile services?  You end up with a lot of 5G deployment, enough to justify handset support for 5G.

The big carrier cloud opportunity starts to show at about this same point (2020) and by 2022 it makes up half of the total carrier cloud driver opportunity.  It’s the shift to event/contextual services, including IoT.  The edge data centers that are driven by 5G/FTTN are available for event processing and the creation of contextual, event-driven, services that most cloud providers won’t be able to supply for lack of edge data centers.  This is what finally gives the network operators a real edge in cloud services.

Of course, they may not take the opportunity and run with it.  You can fairly say that the big problem with carrier cloud is that it’s driven by a bunch of interdependent things and not one single thing, and that’s probably why operators don’t think of it as a specific technology planning priority.  They need to learn to think a different way, and I’m trying now to find out if there are real signs that’s going to happen.  Stay tuned!

Operators’ Technology Plans for 2018: In a Word, “Murky”

We are now past the traditional fall technology planning cycle for the network operators, and I’ve heard from the ones that cooperate in providing me insight into what they expect to do next year and beyond.  There are obviously similarities between their final plans and their preliminary thinking, but they’ve also evolved their positions and offered some more details.

Before they got started, operators told me there were three issues they were considering.  First, could SDN and NFV be evolved to the point where they could actually make a business case and impact declining profit per bit?  Second, was there hope that regulatory changes would level the playing field with the OTTs?  Third, what could really be done with, and expected of, 5G?  They’ve addressed all of these, to a degree.

There’s some good news with respect to NFV, and some not-so-good.  The best news in a systemic sense is that operators have generally accepted the notion that broader service lifecycle automation could in fact make a business case.  The not-so-good news in the same area is that operators are still unconvinced that any practical service lifecycle automation strategy is being offered by anyone.  For SDN and NFV, the good news is that there is gradual acceptance of the value of both technologies in specific “low-apple” missions.  The bad news is that operators aren’t clear as to how either SDN or NFV will break out of the current limitations.

From a business and technology planning perspective, operators think they have the measure of vCPE in the context of business edge services.  They believe that an agile edge device could provide enough benefits to justify vCPE deployment, though most admit that the ROIs are sketchy.  They also believe that the agile-edge approach is a reasonable way to jump off to cloud-edge hosting of the same functions, though most say that their initiatives in this area are really field trials.  That’s because virtually no operators have edge-cloud deployments to exploit yet.

It’s interesting to me that SDN and NFV haven’t introduced a new set of vendors, at least not yet.  About two-thirds of the operators say that the vendors they’re looking hardest at for SDN and NFV are vendors who are incumbents in their current infrastructure.  The biggest place that’s not true is in “white-box” switching, and in that space, operators are showing more interest in rolling their own based on designs from some open computing and networking group than in buying from a legacy or new vendor.

In the NFV space, computer vendors are not showing any major gains in strategic influence, which is interesting given that hosting is what separates NFV from device-based networking.  The reason seems to be that “carrier cloud” is where servers deploy, and so far, NFV is confined to agile CPE and doesn’t contribute much to proactive carrier-cloud (particularly edge-cloud) deployment.  Somewhat to my own surprise, I didn’t see much push behind “carrier cloud” in the planning cycle.  I think that’s attributable to a lack of strategic focus among the computer vendors, and lack of a single decisive driver.

The lack of a decisive driver is reflected in my own modeling of the market opportunity for carrier cloud.  Up to 2020, the only clear opportunity driver is video and advertising, and operators have regulatory and competitive concerns in both these areas.  Video on demand and video streaming are both slowly reshaping content delivery models, but there seems little awareness of the opportunity to use operator CDNs as a carrier-cloud on-ramp, and frankly I’m disappointed to see this.  I hope something changes in 2018.

On the regulatory side, note my blog on Monday on the FCC’s move.  Operators are both hopeful and resigned on the proposed change, which is somewhat as I’d feared.  They recognize that unless the FCC were to impose Title II regulation on operators, it would have little chance of imposing restrictions on settlement and paid prioritization.  They also believe that whatever happens in the US on “net neutrality” up to 2020, it’s not going to reimpose Title II.  Thus, their primary concern is that a change in administration could result in a reversal of the Title II ruling in 2020.  That limits the extent to which they’d make an aggressive bet on paid prioritization and settlement.

The US operators I’ve talked with are cautious about even moving on settlement, fearing that a decision to charge video providers (in particular) for delivery would result in consumer price hikes and potential backlash on the whole regulatory scheme.  Thus, they seem more interested in the paid prioritization approach, offering at least content providers (and in some very limited cases, consumers) an opportunity to pay extra for special handling.

Outside the US, operators believe that if the US applies paid prioritization and settlement to the Internet, many or even most other major markets would follow suit.  However, they don’t think it would happen overnight, and that makes the confidence that US operators feel in the longevity of the regulatory shift very important.

For 2018 and probably 2019, I don’t see any signs that regulatory changes will have a major impact on technology choices.  SDN could be facilitated by paid prioritization, but current plans don’t include SDN because the shift couldn’t be easily reversed if policies changed.  Fast lanes may come, but they won’t drive near-term technology changes.

Any hopes of changes, at least technology changes, come down to 5G.  In areas where mobile services are highly competitive (including the US and EU), 5G deployment may be mandatory for competitive reasons alone.  In the US and some other markets, 5G/FTTN combinations offer the best hope of delivering “wireline” broadband at high speeds to homes and small business/office locations.  All of this adds up to the likelihood that 5G investment is baked into business plans, and that’s what I’ve been told.

Baking it into technology plans is a given at one level (it’s mandated) but difficult at another (what do you bake in?).  Like almost everything else in tech, 5G has been mercilessly overhyped, and associated with a bunch of stuff whose 5G connection is tenuous at best.  Let me give you some numbers to illustrate this.

Of the operators who’ve talked to me on the topic, 47% say that there’s a credible link between 5G and NFV, and 53% say 5G doesn’t require NFV.  On SDN, 36% say there’s a credible link and 64% say there isn’t.  In carrier cloud 68% say there’s a credible link to 5G and 32% say “No!”  Fully 85% say that you could do 5G credibly with nothing more than changes to the radio access network (RAN).  So where does this leave 5G technology planning?

5G core specifications won’t be ratified for almost a year, and it’s not clear to operators how much of the 5G capabilities that are then finalized in standards form will be translated into deployed capabilities, or when.  Much of 5G core deals with feature/service composability, and some operators argued that this sort of capability hasn’t been proved in the higher-value wireline business services market.

Where this has left operators this fall is a position of management support for 5G deployment but only limited technical planning to prepare for it.  The sense I get is that operators are prepared to respond to competitive 5G pressure and do what the market demands, but they really hope (and perhaps believe) that 5G won’t be a technical planning issue before 2020 or even 2021.

Across the board, that same confusion seems to hold.  In fact, this year’s planning cycle is less decisive than any I can recall in 30 years, though some of the winning technologies of prior cycles never really made any impact (ATM comes to mind).  Might the lack of direction, the emphasis on response rather than on tactical planning, be a good sign?  Perhaps the market can pick better technologies than the planners, and it appears that for 2018 at least, the planners are looking to the market for direction.

Sorry, ONAP, I Still Have Questions

The ONAP Amsterdam release is out, and while there are reports that the modular structure eases some of the criticisms made of ONAP, I can’t say that it’s done anything to address my own concerns about the basic architecture.  I’ve tried to answer them by reviewing the documentation on ONAP, without success.  They’re important, basic, questions, and so I’ll address them here and invite someone from ONAP to answer them directly.

Let me start by talking about a VNF, which is the functional unit of NFV.  A VNF is a hosted feature, something that has to be deployed and sustained, like any software component.  VNFs have network connections, and these can be classified into three general categories.  First, there are the data-plane connections that link a VNF into a network.  Firewall VNFs, for example, would likely have two of these, one pointing toward the network service and the other toward the user.  Second, there are management connections representing portals through which the management of the element can be exercised.  SNMP ports and CLI ports are examples.  Third, there may be a connection provided for user parametric control, to do things like change the way a given TCP or UDP port is handled by a firewall.
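The three connection categories could be captured in a simple descriptor; the field names and port names below are illustrative assumptions of mine, not anything drawn from the NFV specifications:

```python
# Hypothetical VNF descriptor distinguishing the three connection categories
# described above: data-plane, management, and user-parametric.
from dataclasses import dataclass, field

@dataclass
class Connection:
    name: str
    category: str  # "data-plane", "management", or "user-parametric"

@dataclass
class VNFDescriptor:
    name: str
    connections: list = field(default_factory=list)

firewall = VNFDescriptor("firewall-vnf", [
    Connection("service-side", "data-plane"),          # toward the network service
    Connection("user-side", "data-plane"),             # toward the user
    Connection("snmp-port", "management"),             # SNMP/CLI management portal
    Connection("port-policy-api", "user-parametric"),  # per-port handling rules
])

data_plane = [c.name for c in firewall.connections if c.category == "data-plane"]
print(data_plane)  # the two data-plane links a firewall would typically have
```

The point of a descriptor like this is that deployment tooling can treat every VNF identically, categorizing connections without knowing what the function itself does.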

When we deploy a VNF, we would do a bunch of stuff to get it hosted and make whatever connections it has accessible.  We would then exercise some sort of setup function to get the management and user parametrics set up to make the function operational.  Lifecycle processes might have to renew the connections and even change management and user parameters.  The connection management part of deployment involves making the connections addressable in an internal (private) address space, and exposing into the “service address space” any connections that are really visible to the user.

I think it’s clear that the process of “deployment”, meaning getting the VNF hosted and connected, has to be a generalized process.  There is no reason why you’d have to know you were doing a firewall versus an instance of IMS, just to get it hosted and connected.  A blueprint has to describe what you want, not why you want it.

In the management and user parameterization area, it is possible that you will not have a generalized interface.  All SNMP MIBs aren’t the same, and certainly all the possible firewall implementations don’t have the same interface (often it’s a web portal the device exposes) to change its parameters.  If we don’t need to set up the VNF because the user is expected to do that, then management and parameterization are non-issues.  If we do have to set it up (whether there’s a standard interface or not) then we need to have what I’ll call a proxy that can speak the language of that interface.  Logically, we’d ask that proxy to translate from some standard metamodel to the specific parameter structure of the interface.
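Here’s a minimal sketch of such a proxy, with an invented “standard metamodel” and invented vendor parameter names (neither is any real interface):

```python
# Hypothetical proxy translating a standard parameter metamodel into one
# vendor's device-specific dialect. Both key sets are illustrative only.
STANDARD_PARAMS = {"block_port": 23, "log_level": "warn"}

class VendorAFirewallProxy:
    """Speaks 'Vendor A' dialect: different names, different value encodings."""
    TRANSLATION = {"block_port": "deny-tcp-port", "log_level": "syslog-severity"}
    SEVERITY = {"warn": 4, "error": 3}

    def translate(self, params):
        out = {}
        for key, value in params.items():
            vendor_key = self.TRANSLATION[key]
            if key == "log_level":
                # Vendor A wants numeric syslog severities, not names.
                value = self.SEVERITY[value]
            out[vendor_key] = value
        return out

proxy = VendorAFirewallProxy()
print(proxy.translate(STANDARD_PARAMS))
```

Another vendor’s proxy would implement the same `translate` contract against its own interface, which is what keeps the orchestration layer free of per-VNF logic.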

Given this, the process of onboarding a VNF would be the process of describing the hosting/connection blueprint (which probably can be done with existing virtualization and DevOps software) and providing or identifying the proper proxy.  I would submit that there is nothing needed that’s VNF-specific beyond this, and nothing that’s specific to the mission of the VNF.

OK, so given this, what are my concerns with Amsterdam?  The answer is that a good, promotional, industry-responsive description of VNF-specific processes would look like what I just offered.  I’d start with that, a powerful comment on my generalized approach.  I might then say “We are releasing, with Amsterdam, the metamodel for VoLTE and residential gateway (vCPE), and also a proxy for the following specific management/parameter interfaces.”  I’d make it clear that any VNF provider could take one of the proxies and rebuild it to match their own interfaces, thus making their stuff deployable.  This would be a highly satisfactory state of affairs.

ONAP hasn’t done that.  They talk about the two “use case” applications but don’t say that their support for them is a sample adaptation of what’s a universal VNF lifecycle management capability.  So is it?  That’s my question.  If there is any degree of VNF or service specificity in the ONAP logic, specificity that means that there really is a different set of components for VoLTE versus consumer broadband gateway, then this is being done wrong and applications and VNFs may have to be explicitly integrated.

The blueprint that describes deployment is the next question.  Every VNF should deploy pretty much as any other does, using the same tools and the same set of parameters.  Every service or application also has to be composable to allow that, meaning a blueprint has to be created that describes not only the structure of the service or application, but also defines the lifecycle processes in some way.

People today seem to “intent-wash” everything in service lifecycle management.  I have an intent model, therefore I am blessed.  An intent model provides a means of hiding dependencies, meaning that you can wrap anything that has the same external properties in one and it looks the same as every other implementation.  If something inside such a model breaks, you can presume that the repair is as hidden (in a detail sense) as everything else is.  However, that doesn’t mean you don’t have to author what’s inside to do the repair.  It doesn’t mean that if the intent-modeled element can’t repair itself, you don’t have to somehow define what’s supposed to happen.  It doesn’t mean that there isn’t a multi-step process of recommissioning intent-modeled components, and that such a process doesn’t need to be defined.
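The point that repair behavior still has to be authored inside the model can be sketched like this (the states, events, and escalation path are hypothetical):

```python
# Hypothetical intent-modeled element: the wrapper hides repair details from
# the outside, but someone still has to write the repair logic, and someone
# still has to define what happens when self-repair isn't possible.
class IntentElement:
    def __init__(self, can_self_repair):
        self.can_self_repair = can_self_repair
        self.state = "active"
        self.log = []

    def on_fault(self):
        if self.can_self_repair:
            # The authored internal repair: invisible outside the model,
            # but it had to be written all the same.
            self.log.append("internal-repair")
            self.state = "active"
        else:
            # No self-repair defined: the element must escalate, and the
            # multi-step recommissioning process must be defined somewhere.
            self.log.append("escalate-to-parent")
            self.state = "failed"

healthy = IntentElement(can_self_repair=True)
healthy.on_fault()
broken = IntentElement(can_self_repair=False)
broken.on_fault()
print(healthy.state, broken.state)
```

Wrapping something in an intent model answers the “how does it look from outside?” question; it answers none of the “what happens inside on failure?” questions that zero-touch automation actually depends on.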

I don’t see this stuff either.  I’m not saying it’s not there, but I do have to admit that since operators tell me that this is the sort of thing they’d need to know to place a bet on ONAP, it’s hard to see why it wouldn’t be described triumphantly if it is there.

ONAP may not like my criticism and doubt.  I accept that, and I accept the possibility that there’s some solid documentation somewhere on the ONAP wiki that explains all of this.  OK, ONAP people, assume I missed it (there is a lot of stuff there, candidly not structured to be easily reviewed at the functional level), and send me a link.  Send me a presentation that addresses my two points.  Whichever you do, I’ll read it, and alter my views as needed based on what I find.  If I didn’t miss it, dear ONAP, then I think you need to address these questions squarely.

While I’m offering ONAP an opportunity to set me straight, let me do the same for the NFV ISG.  This is how NFV deployment should work.  Show me where this is explained.  Show me how this model is referenced as the goal of current projects, and how current projects align with it.  Do that and I’m happy to praise your relevance in the industry.  If you can’t do that, then I humbly suggest that you’ve just defined what your next evolutionary phase of work should be targeting.

To both bodies, and to everyone else in the NFV space who talks with the media and analyst communities and wants publicity, I also have a point to make.  You are responsible for framing your story in a way that can be comprehended by the targets of your releases and presentations.  You should never presume that everyone out there is a member of your group, or can take what might be days or weeks to explore your material.  Yes, this is all complicated, but if it can’t be simplified into a media story then asking for publicity is kind of a waste of time, huh?  If it can’t be explained on your website with a couple diagrams and a thousand words of text, then maybe it’s time to revisit your material.

The FCC Neutrality Order: It’s Not What You Think

We have at least the as-yet unvoted draft of the FCC’s new position on Net Neutrality, and as accustomed as I am to reading nonsense about developments in tech, the responses here set a new low.  I blogged about the issues that the new FCC Chairman (Pai) seemed to be addressing here, and I won’t reprise all the details.  I’ll focus instead on what the draft says and how it differs from the position I described in the earlier blog, starting with some interesting “two-somes” behind the order.

There are really two pieces of “net neutrality”.  The first could be broadly called the “non-discrimination” piece, and this is what guarantees users of the Internet non-discriminatory access to any lawful website.  The second is the “prioritization and settlement” piece, and this one guarantees that no one can pay to have Internet traffic handled differently (prioritized) or be required to pay settlement among ISPs who carry the traffic.  The public debate has conflated the two, but in fact the current action is really aimed at the second.

There are also two competing issues in net neutrality.  The first is the interest of the consumers and OTTs who are using the Internet, and the second the profit interest of the ISPs who actually provide the infrastructure.  The Internet is almost totally responsible for declining profit per bit, and at some point this year or next, it will fall below the level needed to justify further investment.  While everyone might like “free Internet”, there will be no race to provide it.  A balance needs to be struck between consumer interest and provider interest.

As a practical matter, both the providers and the OTTs have powerful financial interests they’re trying to protect, and they’re simply manipulating the consumers.  Stories on this topic, as I said in my opening paragraph, have been simply egregious as far as conveying the truth is concerned.  The New York Attorney General is investigating whether some comments on the order were faked, generated by a third party usurping the identities of real consumers.  Clearly there’s a lot of special interest here.

Finally, there are two forums in which neutrality issues could be raised.  The first is the FCC and the second the Federal Trade Commission (FTC).  The FCC has a narrow legal mandate to regulate the industry within the constraints of the Communications Act of 1934 as amended (primarily amended by the Telecommunications Act of 1996).  The FTC has a fairly broad mandate of consumer protection.  This is a really important point, as you’ll see.

So, what does the new order actually do?  First and foremost, it reverses the previous FCC decision to classify the Internet as a telecommunications service (regulated under Title II of the Communications Act of 1934).  This step essentially mandates an FCC light touch on the Internet because the Federal Courts have already invalidated many of the FCC’s previous rules on the grounds they could be applied only to Telecommunications Services.

All “broadband Internet access services”, independent of technology, would be classified as information services.  It includes mobile broadband, and also includes MVNO services.  People/businesses who provide broadband WiFi access to patrons as a mass consumer service are included.  It excludes services to specialized devices (including e-readers) that use the Internet for specialized delivery of material and not for broad access.  It also excludes CDNs, VPNs, or Internet backbone services.  The rule of thumb is this; if it’s a mass-market service to access the Internet, then it’s an information service.

The classification is important because it establishes the critical point of jurisdiction for the FCC.  The FCC is now saying that it would be restrictive to classify the Internet as Title II, but without that classification the FCC has very limited authority to regulate the specific behavior of the ISPs.  Thus, the FCC won’t provide much in the way of specific regulatory limits and penalties.  It couldn’t enforce them, and perhaps it could never have done so.  Everything they’ve done in the past, including non-discrimination, has been appealed by somebody based on lack of FCC authority, and the Title II classification was undertaken to give the FCC authority to do what it wanted.  Absent Title II, the FCC certainly has no authority to deal with settlement and prioritization, and probably has insufficient authority to police non-blocking and discrimination.  That doesn’t mean “net neutrality” goes away, as the stories have said.

The FCC will require that ISPs publish their terms of service in clear language, including performance details, and this is where the FCC believes that “neutrality” will find a natural market leveling.  The order points out that broadband is competitive, and that consumers would respond to unreasonable anti-consumer steps (like blocking sites, slowing a competitor’s offerings, etc.) by simply moving to another provider.

The order also points out that the “premier consumer protection” body, the FTC, has authority to deal with situations where anti-competitive or anti-consumer behavior arise and aren’t dealt with effectively by competitive market forces.  Thus, the FCC is eliminating the “code of conduct” that it had previously imposed, and is shifting the focus of consumer protection to the FTC.  As I noted earlier, it’s never been clear whether the FCC had the authority to impose “neutrality” except through Title II, and so the fact is that we’ve operated without strict FCC oversight for most of the evolution of the Internet.

The FTC and the marketplace are probably not enough to prevent ISPs from offering paid prioritization or from requiring settlement to deliver high-volume traffic.  In fact, one of the things I looked for in the order was its treatment of settlement among ISPs, a topic particularly dear to my heart since I’ve opposed the current “bill and keep” practice for decades, and even co-authored an RFC on the topic.  The order essentially says that the FCC will not step in to regulate the way that ISPs settle for peering with each other or through various exchanges.  Again, the FCC says that other agencies, including DoJ antitrust and the FTC, have ample authority to deal with any anti-competitive or unreasonable practices that might arise.

Paid prioritization is similarly treated; the FCC has eliminated the rules against it, so ISPs are free to work to offer “fast-lane” behavior either directly to the consumer or to OTTs who want to pay on behalf of their customers to improve quality of experience.  This may encourage specific settlement, since the bill-and-keep model can’t compensate every party in a connection for the additional cost of prioritization.  We should also note that paid prioritization could be a true windfall for SD-WAN-type business services, since the economics of high-QoS services created over the top with paid prioritization would surely be a lot better than current VPN economics.  You could argue that SD-WAN might be the big winner in the order.

The OTTs will surely see themselves as the big losers.  What they want is gigabit broadband at zero cost for everyone, so their own businesses prosper.  Wall Street might also be seen as a loser, because they make more money on high-flyers like Google (Alphabet) or Netflix than on stodgy old AT&T or Verizon.  VCs lose too because social-media and other OTT startups could face higher costs if they have to pay for priority services.  That might mean that despite their grumbling, players like Facebook and Netflix could face less competition.

It will be seen as an improvement for the ISPs, but even there a risk remains.  Network operators have a very long capital cycle, so they need stability in the way they are regulated.  This order isn’t likely to provide it, for two reasons.  First, nobody believes that a “new” administration of the other party would leave this order in place.  Second, only legislation could create a durable framework, and Congress, which has been unable to act even on major issues, has avoided weighing in on Internet regulation for 20 years now.  Thus, the full benefits of the order may prove elusive, because operators might be reluctant to believe the changes will persist long enough to justify changing their plans for investment in infrastructure.

The long-term regulatory uncertainty isn’t the only uncertainty here.  The Internet is global, and its regulation is a hodgepodge of competing national and regional authorities, most of whom (like the FCC) haven’t had a stable position.  “We brought in one side and gave them everything they wanted, then we brought in the other side and gave them everything they wanted,” is how a lawmaker in the US described the creation of the Telecom Act in 1996.  That’s a fair statement of regulatory policy overall; the policies favor the firms who favor the current political winners.

My view, in the net?  The FCC is taking the right steps with the order, and that view shouldn’t surprise those who’ve read my blog over the last couple of years.  Net neutrality is not being “killed”; enforcement of the first critical part of it (what consumers think neutrality really is) is shifted to the FTC, whose power of enforcement is clear.  There is no more risk that ISPs could decide what sites you can visit than there has always been, which is to say none.  It’s not a “gift to telecom firms”, as one media report says; it’s a potential lifeline for the Internet overall.  It might reverse the steady decline in profit per bit, and might restore interest in infrastructure investment.  “Might”, that is, if the telcos believe the order will stand.

It’s not going to kill off the OTTs either.  There is a risk that the OTTs will be less profitable, or that some might raise their rates to cover the cost of settlement with the ISPs.  Will it hurt “Internet innovation?”  Perhaps, if you believe we need another Facebook competitor, but it might well magnify innovation where we need it most, which is in extending broadband at as high a rate and low a cost as possible.

If the ISPs are smart, they’ll go full bore into implementing the new position, offering paid prioritization and settlement and everything similar or related, and demonstrating that it doesn’t break the Internet but promotes it.  That’s because there could be only about three years remaining on the policy before a new FCC threatens to take everything back.  The only way to be sure the current rules stay in place is to prove they do good overall.

Cisco’s Quarter: Are They Really Facing the Future at Last?

Cisco reported its quarterly numbers, which were still down in revenue terms, but they forecast the first revenue growth the company has had in about two years of reports.  “Forecast” isn’t realization, of course, and the big question is whether the gains represent what one story describes as “providing an early sign of success for the company’s transition toward services and software”, whether they’re mostly a systemic recovery in network spending, or whether Cisco is just moving the categories of revenue around.  I think it’s a bit of everything.

Most hardware vendors have been moving to a subscription model for all their software elements, which creates a recurring revenue stream.  New software, of course, is almost always subscription-based, and Cisco is somewhat unusual among network vendors in having a fairly large software (like WebEx) and server/platform business.

Cisco’s current-quarter year-over-year data shows a company that’s still feeling the impact of dipping network equipment opportunity.  Total revenue was off 2%, infrastructure platforms off 4%, and “other products” off 16%.  Security was the big winner, up 8%, with applications up 6% and services up 1%.  If you look at absolute dollars (growth/loss times revenue), the big loser was infrastructure and the big winner was applications.

Here’s the key point, the point that I think at least invalidates the story that this is an “early sign of success” for the Cisco shift in emphasis.  Infrastructure platforms are over 57% of revenue as of the most recent quarter.  Applications are about 10%, security about 5%, and services about 25%.  The two categories of revenue showing significant growth, applications and security, combine to make up only 15% of revenue, and that 57% infrastructure platforms sector is showing a significant loss.  How can gains in categories that account for only 15% of revenue offset losses in a category that accounts for almost four times as much revenue?

Two percent of current revenues for Cisco, the reported y/y decline, is about $240 million.  To go from a 2% loss to a 2% gain, which is where guidance is, would require $480 million more revenue from those two gainer categories, which now account for about $1.8 billion in total.  Organic growth in TAM of that magnitude is hardly likely in the near term, and a change in market share in Cisco’s favor similarly so.  What’s left?
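That guidance arithmetic is easy to check.  The sketch below assumes a quarterly revenue base of roughly $12 billion, inferred from the statement that 2% is about $240 million; all figures are illustrative approximations, not Cisco’s reported numbers.

```python
# Back-of-envelope check on the revenue swing Cisco's guidance implies.
# Assumption: quarterly revenue of ~$12B, inferred from "2% is about $240M".
quarterly_revenue = 240e6 / 0.02               # ~$12.0B

# Moving from a 2% decline to a 2% gain is a four-point swing.
required_new_revenue = quarterly_revenue * 0.04  # ~$480M

# The two growing categories (applications ~10% + security ~5% of revenue).
gainer_base = quarterly_revenue * 0.15           # ~$1.8B

# Growth rate those categories would need to supply the whole swing.
required_growth = required_new_revenue / gainer_base

print(f"Quarterly revenue:  ${quarterly_revenue / 1e9:.1f}B")
print(f"Swing needed:       ${required_new_revenue / 1e6:.0f}M")
print(f"Gainer base:        ${gainer_base / 1e9:.1f}B")
print(f"Implied growth in gainer categories: {required_growth:.0%}")
```

On these assumptions the two small categories would need to grow by roughly a quarter, year over year, just to deliver the swing on their own, which is why M&A looks like the only plausible answer.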

The essential answer is M&A.  Cisco has a decent hoard of cash, which it can use to buy companies that will contribute a new revenue stream.  However Cisco classifies the revenue, getting about half a billion more would cover everything Cisco needs.  Cisco is being smart by using cash and M&A to diversify, to add products and revenue to offset what seems the inevitable diminution of Cisco’s legacy, core products’ contribution.  So yes, Cisco is transforming, but less by a transition toward software and services than by the acquisition of revenues from outside.

It may seem this is an unimportant distinction, but it’s not.  The problem with “buying revenue” through M&A is that you easily run out of good options.  It would be better if Cisco could fund its own R&D to create innovative products in other areas, but there are two problems with that.  First, what would attract an innovator in another “area” to a job with Cisco?  Cisco probably has experts in its current focus areas, which doesn’t help if those areas are in perpetual decline.  Second, it might take too long; if current infrastructure spending (at 57% of revenue) is declining at a 4% rate, then Cisco’s total revenue will take a two-and-a-quarter-percent hit.  To offset that in sectors now representing 15% of revenue, Cisco would need gains there of about 15%, right now.  That means that at least for now, Cisco needs M&A.
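The offset arithmetic can be sketched the same way, using the approximate revenue shares quoted above:

```python
# How much growth the small categories need just to offset legacy decline.
infra_share = 0.57     # infrastructure platforms' share of total revenue
infra_decline = 0.04   # its year-over-year rate of decline
gainer_share = 0.15    # applications + security share of total revenue

# Drag on total revenue from the infrastructure decline.
total_drag = infra_share * infra_decline       # ~2.3 points

# Growth rate the 15% "gainer" categories need merely to stand still.
offset_growth = total_drag / gainer_share      # ~15%

print(f"Total revenue drag:        {total_drag:.2%}")
print(f"Offsetting growth needed:  {offset_growth:.1%}")
```

A two-and-a-quarter-point drag, concentrated against a 15% revenue base, demands roughly 15% growth in that base every year the decline continues.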

Most of all, it needs a clear eye to the future.  You can’t simply run out to the market and look for people to buy when you need to add something to the bottom line.  The stuff you acquire might be in at least as steep a decline as the stuff whose decline you’re trying to offset.  If you know where things are going you can prevent that, and you can also look far enough out to plan some internal projects that will offer you better down-line revenue and reduce your dependence on M&A.

Obviously, it’s not easy to find acquisitions to make up that needed half-billion dollars.  Cisco would have to be looking at a lot of M&A, which makes it much harder to pick out winners.  And remember that the losses from legacy sectors, if they continue, will require an offset every year.  A better idea would be to look for acquisitions that Cisco could leverage through its own customer relationships, and that would represent not only that clear current symbiosis but also future growth opportunity.  That kind of M&A plan would require a whole lot of vision.

Cisco has spent $6.6 billion this year on the M&A whose prices have been disclosed, according to this article, of which more than half was for AppDynamics.  Did that generate the kind of revenue gains they need?  Hardly.  It’s hard to see how even symbiosis with Cisco’s marketing, products, and plans could wring that much from the M&A they did.  If it could, it surely would take time and wouldn’t help in the coming year to get revenues from 2% down to 2% up.

To be fair to Cisco, this is a tough time for vision for any network vendor, and a tough industry to predict.  We have in networking an industry that’s eating its heart to feed its head.  The Internet model under-motivates the providers of connectivity in order to incentivize things that consume connectivity.  Regulations limit how aggressively network operators can pursue those higher-layer services, which leaves them to try to cut costs at the lower level, which inevitably means cutting spending on equipment.

That which regulation has taken away, it might give back in another form.  The FCC will shortly announce its “end of net neutrality”, a characterization that’s fair only if you define “net neutrality” much more broadly than I do, and only if you assume the FCC was the right place to enforce real net neutrality in the first place.  Many, including Chairman Pai of the FCC, think that the basic mission of non-discrimination and non-blocking that forms the real heart of net neutrality belongs with the FTC.  What took it out of there was less about consumer protection than about OTT and venture capital protection.

The courts said that the FCC could not regulate pricing and service policy on services that were “information services” and explicitly not subject to that kind of regulation.  The previous FCC then reclassified the Internet as a telecommunications service, and the current FCC is now going to end that.  Whether the FCC would end all prohibitions on non-neutral behavior is doubtful.  The most it would be likely to do is accept settlement and paid prioritization, which the OTT players hate but which IMHO would benefit the ISPs to the point of improving their willingness to capitalize infrastructure.

What would network operators do if the FCC let them sell priority Internet?  Probably sell it, because if one ISP didn’t and another did, the latter would have a competitive advantage with respect to service quality.  Might the decision to create Internet QoS hurt business VPN services?  No more than SD-WAN will, inevitably.

Operators could easily increase their capex enough to change Cisco’s revenue growth problems into opportunities.  Could Cisco be counting on the reversal of neutrality?  That would seem reckless, particularly since Cisco doesn’t favor the step.  What Cisco could be doing is reading the tea leaves of increasing buyer confidence; they do report an uptick in order rates.  Some of that confidence might have regulatory roots, but most is probably economic.  Network spending isn’t tightly coupled to GDP growth in the long term (as I’ve said in other blogs), but its growth path relative to GDP still takes it higher in good times.

The question is what tea leaves Cisco is reading.  Their positioning, which is as strident as always, is still lagging the market.  Remember that Cisco’s strategy has always been to be a “fast follower” and not a leader.  M&A is a better way to do that because an acquired solution can be readied faster than a developed one, and at lower cost.  But fast following still demands knowing where you’re going, and it also demands that you really want to be there.  There is nowhere network equipment can go in the very long term but down.  Value lies in experiences, which means software that creates them.  I think there are players out there that have a better shot at preparing for an experience-driven future than any Cisco has acquired.

What Cisco probably is doing is less preparing for “the future” than slapping a band-aid on the present.  They are going to leak revenue from their infrastructure stuff.  The market is going to create short-term wins for other companies as the networking market twists and turns, and I think Cisco is grabbing some of the wins to offset the losses.  Regulatory relief would give them a longer period in which to come to terms with the reality of networking, but it won’t fend off the need to do that.  The future doesn’t belong to networking at this point, and Cisco has yet to show it’s faced that reality.


MEF 3.0: Progress but not Revolution

We have no shortage of orchestration activity in standards groups, and the MEF has redoubled its own Lifecycle Service Orchestration (LSO) efforts with its new MEF 3.0 announcement.  The overall approach is sound at the technical level, meaning that it addresses things like the issues of “federation” of service elements across provider boundaries, but it also leaves some gaps in the story.  Then there’s the fact that the story itself is probably not completely understood.

Virtualization in networking is best known through the software-defined network (SDN) and network functions virtualization (NFV) initiatives.  SDN replaces a system of devices with a different system, one based on different forwarding principles.  NFV replaces devices with hosted instances of functionality.  The standards activities in the two areas are, not surprisingly, focused on the specific replacement mission of each.  SDN focuses on how forwarding substitutes for switching/routing, and NFV on how you make a hosted function look like a device.

The problem we’ve had is that making a substitution workable doesn’t make it desirable.  The business case for SDN or NFV is hard to make if at the end of the day the old system and the new are equivalent in every way, yet that’s the “replacement” goal each area has been pursuing.  Operators have shifted their view from the notion that they could save enough in capital costs by the change to justify it, to the notion that considerable operational savings and new-service-opportunity benefits would be required.  Hence, the SDN and NFV debates have been shifting toward a debate on service lifecycle management automation (SLMA).

Neither SDN nor NFV put SLMA in-scope for standardization, which means that the primary operations impact of both SDN and NFV is to ensure that the opex and agility of the new system isn’t any worse than that of the old.  In fact, NFV in particular is aiming at simple substitution; MANO in NFV is about getting a virtual function to the state of equivalence with a physical function.  It’s the lack of SLMA capability that’s arguably hampering both SDN and NFV deployment.  No business case, no business.

The MEF has taken a slightly different approach with its “third network”, and by implication with MEF 3.0.  The goal is to create not so much a virtual device or network as a virtual service.  To support that, the LSO APIs are designed to support “federated” pan-provider control of the packet and optical elements of a service, and also the coordination of higher-layer features (like security) that are added to basic carrier Ethernet.

There are three broad questions about the MEF approach.  First is the question of federation; will the model address long-standing operator concerns about it?  Second is the question of carrier-Ethernet-centricity; does the MEF really go far enough in supporting non-Ethernet services?  Finally, there’s the overarching question of the business case; does MEF 3.0 move the ball there?  Let’s look at each.

Operators have a love/hate relationship with federation, and I’ve worked for a decade trying to help sort things out in the space.  On one hand, federation is important for operators who need to provide services larger than their own infrastructure footprint.  On the other, federation might level the playing field, creating more competitors by helping them combine to offer broader-scope services.  There’s also the problem of how to ensure that federation doesn’t create a kind of link into their infrastructure for others to exploit, by seeing traffic and capacity or by competing with their own services.

Facilitating service federation doesn’t address these issues automatically, and I don’t think that the MEF takes substantive steps to do that either.  However, there is value to facilitation, and in particular for the ability to federate higher-layer features and to integrate technology domains within a single operator.  Thus, I think we can say that MEF 3.0 is at least useful in this area.

The second question is whether the MEF goes far enough in supporting its own notion of the “third network”, the use of carrier Ethernet as a platform for building services at Level 3 (IP).  I have the launch presentation for the MEF’s Third Network, and the key slide says that Carrier Ethernet lacks agility and the Internet lacks service assurance (it’s best-effort).  Thus, the Third Network has to be both agile and deterministic.  Certainly, Carrier Ethernet can be deterministic, but for agility you’d have to be able to deliver IP services and harmonize with other VPN and even Internet technologies.

While the basic documents on MEF 3.0 don’t do much to validate the Third Network beyond claims, the MEF wiki does have an example of what would almost have to be the approach—SD-WAN.  The MEF concept is to use an orchestrated, centrally controlled, implementation of SD-WAN, and they do define (by name at least) the associated APIs.  I think more detail in laying out those APIs would be helpful, though.  The MEF Legato, Presto, and Adagio reference points are called out in the SD-WAN material, but Adagio isn’t being worked on by the MEF, and as a non-member I’ve not been able to pull the specs for the other two.  Thus, it’s not clear to me that the interfaces are defined enough in SD-WAN terms.

Here again, though, the MEF does something that’s at least useful.  We’re used to seeing SD-WAN as a pure third-party or customer overlay, and usually only on IP.  The MEF extends the SD-WAN model both to different technologies (Ethernet and theoretically SDN, but also involving NFV-deployed higher-layer features), and to a carrier-deployed model.  Another “useful” rating.

The final point is the business-case issue.  Here, I think it’s clear that the MEF has focused (as both SDN and NFV did) on exposing service assets to operations rather than on defining any operations automation or SLMA.  I don’t think you can knock them for doing what everyone else has done, but I do think that if I’ve declared SDN and NFV to have missed an opportunity in SLMA, I have to do the same for the MEF 3.0 stuff.

Where this leaves us is hard to say, but the bottom line is that we still have a business-case dependency on SLMA and still don’t have what operators consider to be a solution.  Would the MEF 3.0 and Third Network approach work, functionally speaking?  Yes.  So would SDN and NFV.  Can we see an easy path to adoption, defined and controlled by the MEF itself?  Not yet.  I understand that this sort of thing takes time, but I also have to judge the situation as it is and not how people think it will develop.  We have waited from 2012 to today, five years, for a new approach.  If we can’t justify a candidate approach at the business level after five years, it’s time to admit something was missed.

There may be good news on the horizon.  According to a Light Reading story, Verizon is betting on a wholesale SD-WAN model that would exploit the MEF 3.0 approach, and presumably wrap it in some additional elements that would make it more automated.  I say “presumably” because I don’t see a specific framework for the Verizon service, but I can’t see how they’d believe a wholesale model could be profitable to Verizon and the Verizon partner, and still be priced within market tolerance, unless the costs were wrung out.

We also have news from SDxCentral that Charter is looking at Nuage SD-WAN as a means of extending Ethernet services rather than of creating IP over Ethernet.  That would be an enhanced value proposition for the Third Network vision, and it would also establish that SD-WAN is really protocol-independent at the service interface level, not just in its support for underlay transport options.  This is the second cable company (after Comcast) to define a non-MPLS VPN service, and it might mean that this will be a differentiator between telco and cableco VPNs.

How much the MEF vision alone could change carrier fortunes is an issue for carriers and for vendors as well.  Carrier Ethernet is about an $80 billion global market by the most optimistic estimates, and that is a very small piece of what’s estimated to be a $2.3 trillion communications services market globally.  Given that, the MEF’s vision can succeed only if somehow Ethernet breaks out of being a “service” and takes a broader role in all services.  There’s still work needed to support that goal.