How Did NFV Get “Down But Not Out”?

I just had a chance to read the Light Reading story on NFV (“Down But Not Out”) and it’s a nice summary of the NFV situation.  It also documents my own frustration with the issues, because while everything the article says is true, every issue that the article presents was well-known in 2013.  I know that because I produced the first approved ETSI NFV Proof of Concept, called CloudNFV, and in the PoC document I described all the things that needed to be done, and what steps we’d already taken within the project to do them.  We need to look at the question of how things got off-track, because future initiatives are at risk for the same problems.

When the “Call for Action” paper that launched NFV came out in the fall of 2012, it proposed the substitution of hosted functions on commercial servers for proprietary appliances.  This seemed to me to be a straightforward cloud mission, and I sent the operators who produced the paper a set of comments pointing to that issue.  In those comments I noted that the goals of NFV had to be harmonized with the direction already being taken in the cloud, and emphasized that we needed to develop a kind of “Platform-as-a-Service” middleware toolkit to organize NFV components, and most of all the Virtual Network Functions (VNFs), into something that could be open and would harmonize differences in implementation.  It was critical, I said, to approach NFV as a top-down process, defining the high-level relationships and goals, turning them into features, and then defining implementations and data/interface models.

In March of 2013, I provided the newly formed ISG a slide presentation that outlined the specific issues I believed had to be dealt with.  In that presentation, I proposed that NFV be “positioned as a cloud application set” that drew on cloud capabilities for resource management and deployment, what today is called a “cloud-native” model.  I also stated again that it was critical not to create so much complexity in building services that operations cost increases overran capital cost reductions.

From the very first meeting of the ETSI NFV ISG in the US in the spring of 2013, it was clear that there were two problems.  One was that the NFV process was proceeding from the bottom up, meaning that the ISG was defining low-level functional elements without having established the requirements framework to justify them.  That risks creating an implementation that doesn’t solve the critical problems of the new technology, because the problems weren’t defined.  The second was that to contain the time needed to complete work, the ISG elected to declare the operations processes of NFV out-of-scope.  That meant that the body would not propose a specific means of managing a service lifecycle, which means one could not be guaranteed to emerge.

That March presentation contained a slide titled “Mapping Cloud Resource Control to NFV”, and it suggested that everything in the scope of NFV could be addressed either by cloud infrastructure tools (VMs, containers, and multi-tenant web-service-like tools) or by cloud DevOps tools.  In other words, all the ISG really needed to do was lay out an object model for NFV and map that to current cloud technology.
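To make the “object model” point concrete, here’s a minimal Python sketch of the kind of mapping I mean: a service is a hierarchy of elements, and each deployable element points at a cloud mechanism that existing tools already know how to handle.  The class and field names are purely illustrative assumptions, not drawn from any ETSI or cloud specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical object model: a service is a hierarchy of elements, and each
# deployable (leaf) element maps onto an existing cloud mechanism -- a VM,
# a container, or a multi-tenant web service -- rather than anything NFV-specific.

@dataclass
class DeploymentTarget:
    kind: str              # e.g. "vm", "container", "web-service"
    image: str             # artifact the cloud tooling already knows how to deploy
    orchestrator: str      # e.g. "openstack-heat", "kubernetes" -- existing DevOps tools

@dataclass
class ServiceElement:
    name: str
    target: Optional[DeploymentTarget] = None          # set only for leaf elements
    children: List["ServiceElement"] = field(default_factory=list)

    def deploy(self):
        """Walk the hierarchy and hand each leaf to its cloud orchestrator."""
        if self.target:
            print(f"deploy {self.name} as {self.target.kind} via {self.target.orchestrator}")
        for child in self.children:
            child.deploy()

# Example: a hypothetical "vCPE" service decomposed into functions that existing cloud tools deploy.
service = ServiceElement("vCPE-service", children=[
    ServiceElement("firewall", DeploymentTarget("container", "fw:latest", "kubernetes")),
    ServiceElement("router",   DeploymentTarget("vm", "vrouter.qcow2", "openstack-heat")),
])
service.deploy()
```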

I spoke up frequently at that meeting, objecting to these and other points, and it was during a break at the meeting that the group of vendors who formed the CloudNFV initiative gathered and agreed to cooperate on an implementation.  I laid out a model based on my previous contributions and on work I’d done in my ExperiaSphere project, whose first phase was a Java implementation of the TMF’s Service Delivery Framework (SDF), an early orchestration initiative.  It was the CloudNFV team that generated the PoC submission I cited above.

I’d invite everyone who thinks that NFV has fallen short of its goals to read that document, because it shows that the goals we now recognize were understood and accepted by some operators and vendors even then.  The project defined in that PoC addressed many of the issues and framed a process to explore the details of others, with the goal of defining the right approach and then implementing it.  I think that if this project had been fully supported by the ISG and had run to its completion as planned, it would have created a totally open platform that fulfilled the goals of the original white paper, and more.

One of the precepts of CloudNFV was open-source VNFs.  The PoC was based on one: Metaswitch’s Project Clearwater, an open-source implementation of IMS.  There was no question that without competition from open-source tools, commercial sources of VNFs would likely impose licensing fees high enough to contaminate the benefits of NFV deployment.  They did.  Another precept was service lifecycle management based on a data-model hierarchy.  There was no question that without this, realistic operations practices and costs could not be attained and sustained.  They weren’t.

I suggested to NFV ISG leadership that the entire NFV operator community join as sponsors to the project, to put their weight behind the concepts that operators seemed to agree were essential to NFV’s success.  This wasn’t to promote myself; I was working unpaid on CloudNFV, as I’d promised to do for the first year.  Only two operators stepped up as sponsors, which wasn’t enough to generate momentum for a complete NFV solution.  Without full support, the PoC testing took time, and eventually the project passed the one-year period in which I’d promised everyone I could supply my time at no cost.  When I withdrew from the project early in 2014, the scope of the PoC was drastically cut back and the goals of the original PoC were never realized.

Even a couple years into the initiative, there was hope for an open-source-centric vision of VNFs.  In a blog I did in 2016 I outlined a pathway to creating more open-source VNFs and improving the competitive environment.  I also authored an extension of the ExperiaSphere project that gave rise to CloudNFV, and developed a whole series of presentations describing an open implementation with open VNFs and management orchestration, based on the TOSCA model.

I agree with the points made in the Light Reading article.  NFV is down but not out.  However, I think that “not out” doesn’t necessarily mean NFV will eventually fulfill all its promises.  Too much time has been spent doing what wasn’t needed, or even useful, and too little spent addressing the fundamental points that had to work if NFV were to meet any respectable business case.  If the ISG is serious about an extension of its mandate by ETSI, then defining the right approach, even if it means backing off from some of the things already done (always hard in a standards process), is what the additional time should be focused on.

CloudNFV was an open process.  At the time of the PoC there were 9 vendors formally involved and another vendor’s tools (Red Hat’s) were used in deployment.  There were only six founding members.  We had also launched an initiative, including documentation, to recruit other members and integrate new features, including test and measurement and support for multi-tenant VNFs (like IMS and EPC in mobile networks, and like those used in content and ad delivery).  It could have been exploited widely, becoming an early and open implementation that could then evolve to meet market requirements as they were exposed by ongoing dialog and testing.

There was no need to fall short with NFV.  The basic concept was sound, but the realization of that concept fell short because it didn’t reflect good software or good business practices.  It can still be fixed, but the fixing would have to step back from many of the details that have been added in the last four years, and rely more on outside tools and practices.

It’s not easy to say that the wrong thing has been done, and to go back and fix it.  Hard or not, that’s what’s going to have to happen if NFV is to achieve all it could achieve.  But I need to make an important point here, which is that time marches on.  In the four wasted years of NFV, when it left what should have been its cloud roots to duplicate a lot of cloud initiatives, its opportunity to drive carrier cloud has largely passed, and even a completely compliant and fully effective NFV spec couldn’t turn back the clock now.  We now need, even more, to think of NFV as nothing but a cloud application, because carrier cloud is going to drive whatever NFV deployment there is, not the other way around.

Device Models for Future Open Network Infrastructure

What will our next-generation network devices really look like?  I know we’ve heard a lot of stories about the future, but are they all true?  Are any of them, in fact?  I had some interesting operator discussions on this point, and they yielded some surprising information.  The best way to review it is to list the device models operators saw as part of future infrastructure and discuss the benefits and risks of each.

The first and most obvious of the device models is the traditional vendor-proprietary appliance.  Despite the fact that everyone has been predicting the death of these devices, every operator I talked with believed they would have these devices in their networks five and even ten years out.  The primary reasons are a tie: the devices are already there and their use and operations tools are understood, and the proprietary model offers a known risk both in terms of support and technology.

Operators do think their spending on this type of device will decline, both as a percentage of total spending and in absolute terms.  The reason for the decline is a combination of pricing pressure they’ll place on vendors and displacement of appliances in some missions.  Operators expect to rely on the traditional boxes deeper in their networks, but not as much in access/edge and metro.  There, because of the number of boxes needed, they predict they’ll adopt another model.

The model they think they’ll adopt is the open traditional-protocols device, which means an open switch/router platform combined with hosted open-source functionality.  The best-known example of this is the commitment AT&T has made to its DANOS software platform for open devices, which it expects to deploy in the tens of thousands in its mobile infrastructure, notably at/near cell sites.  These devices are not replacements for routing, but rather for vendor-proprietary routers.  They’d still run routing protocols and work with other routers elsewhere in the network.

Nearly every operator thinks that this is the class of device that will experience the fastest growth and attain the largest total deployment over time.  A bit over half of the operators said they thought that within 10 years they’d be spending more on this class of device than any other.  That, again, is owing to the expectation that this class will dominate metro/edge deployments.

The next device class is roughly tied with the fourth in terms of expectations, but it’s already experiencing deployment growth.  That class is the OpenFlow SDN device, and operators think it will deploy predominantly in carrier cloud data centers, but also, somewhat later, as the foundation for electrical-level grooming of transport trunks, replacing traditional core routers.

Slightly less than half of my operator contacts think that OpenFlow SDN will see action in business services, but this view doesn’t seem firmly held.  Their issue is a combination of lack of proof of scalability to VPN levels and concern over the operations impact—both on practices and on cost.  That suggests that were there a strong service lifecycle management framework in place, and were that framework to have proven itself with OpenFlow, greater deployment could be expected.

The fourth device class is the overlay-edge device, which is the foundation of an overlay network.  We have these today both in the “SDN” space (VMware’s NSX and Nokia/Nuage) and in the SD-WAN space, and both have various uses in operator infrastructure.  Operators tend to see SD-WAN as a pure business service framework, a supplement to or replacement for MPLS VPNs.  They see overlay SDN as a potential alternative to OpenFlow SDN in the data center, and some even see a value in transport grooming missions.  The big opportunity for the “SDN” part of overlay devices is believed to be 5G network slicing.

This device class sees the greatest uncertainty in deployment terms.  Almost two-thirds of operators say they expect to see most of their business services delivered via an overlay model within 5 years, owing in no small part to the ease with which MSPs or even enterprises themselves could create SD-WAN services.  Almost 15% see no use of overlay technology at all, and a small number see it as their primary grooming and service-layer architecture.  It’s clear that operator perspectives are evolving here, and that the slow pace of evolution can be linked to the limited vision vendors are presenting in the space.

The fifth class of device is the hosted/virtual function.  There are two subclasses in this group, one being the virtual functions of Network Functions Virtualization (NFV’s VNFs) and the other being simply hosted instances of software.  Operators here are sharply divided, with about a sixth saying they think NFV will deploy for CPE in all classes of users (including consumers) and throughout 5G and carrier cloud, a sixth thinking that strict NFV will never amount to anything, and a sixth believing that they’ll adopt cloud technology in 5G and elsewhere, but without the formal NFV trappings.  The rest hold no firm view at this point.

One reason for the confusion in this class is the “virtual CPE” or vCPE concept.  Most NFV today is really a limited application of the ETSI NFV ISG specifications aimed at hosting functions in a universal edge device (uCPE).  These devices may be simple Linux machines or use an embedded OS that supports at least a limited Linux/POSIX API set.  The functions can be dynamically loaded and changed, and the deployment doesn’t require a cloud data center or formal “service chaining” of multiple functions across hosts.  Most operators recognize that this model isn’t ETSI NFV at all, but they’re grappling with how standardization and openness can be achieved without a formal framework.

That point opens the question of the physical devices involved in these models.  If we look at the hardware foundation for the classes, we find there are three primary hardware platforms deemed credible by operators, besides the proprietary appliances.  One is the server, the second the open network device, and the third the “universal CPE” device.

Most of the operators I’ve talked with believe that while it would be possible to host routing/switching instances on general-purpose computers, the trend is toward open but specialized devices.  These would have custom ASICs or other specialized chips (including GPUs) intended to handle the heavy lifting of forwarding.  Servers are best for hosting persistent, multi-tenant functions, which includes most of what operators expect would be involved with 5G.

Open network devices are the inheritors of the special-device expectations.  There are already quite a few open-platform switch/router products that don’t depend on OpenFlow, and operators like these devices as the basis for at least edge missions in the future.  They’re more reserved about using open-model hardware in deeper missions, but that may change as they become more comfortable with the approach.

uCPE devices have inherited most of the “NFV” expectations, and this may be at the core of the general trend to use “NFV” to mean any method of hosting network functions, not necessarily the specific approach mandated by the NFV ISG.  If you’re going to put functions into a box on prem, much of the ISG stuff isn’t relevant.

The mission of uCPE is potentially a mixture of open-switch-router and function-hosting server, which makes this device class difficult to pin down.  It is possible to host the flow-programming language P4 on a server, and to build a P4 interpreter or “virtual switch” that runs under a standard operating system, so perhaps that would be the ideal approach for uCPE.  However, open and agile deployment of virtual functions implies that anything that hosts one should be able to host a standard model/configuration.
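To illustrate what a software “virtual switch” interpreter of that kind actually does, here’s a toy Python sketch of a match-action table, the abstraction P4 programs define.  It isn’t P4, and the table is deliberately simplified (exact match only); it’s just meant to show the forwarding model.

```python
# Illustrative only: a toy match-action table of the kind a P4 program defines
# and a software "virtual switch" would execute on a server or uCPE box.
# Real P4 targets compile the table structure; this just mimics the lookup logic.

from typing import Callable, Dict, Tuple

class MatchActionTable:
    def __init__(self, default_action: Callable[[dict], None]):
        self.entries: Dict[Tuple, Callable[[dict], None]] = {}
        self.default_action = default_action

    def add_entry(self, key: Tuple, action: Callable[[dict], None]):
        self.entries[key] = action

    def apply(self, packet: dict):
        # Exact match on (dst_ip,) here; real tables also support LPM/ternary matches.
        action = self.entries.get((packet["dst_ip"],), self.default_action)
        action(packet)

def forward(port):
    return lambda pkt: print(f"forward {pkt['dst_ip']} -> port {port}")

def drop(pkt):
    print(f"drop {pkt['dst_ip']}")

ipv4_table = MatchActionTable(default_action=drop)
ipv4_table.add_entry(("10.0.0.2",), forward(1))
ipv4_table.add_entry(("10.0.0.3",), forward(2))

ipv4_table.apply({"dst_ip": "10.0.0.2"})   # forwarded
ipv4_table.apply({"dst_ip": "192.0.2.9"})  # dropped by the default action
```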

There’s a bit too much variability in the uCPE notion, in my view.  There is a presumption in standard NFV that you’re going to put functions into a VM, but that might not be the best approach with uCPE given the excessive resource requirements for per-function VMs.  A better model, still standardized, might be a container, but if you wanted optimum efficiency you’d probably not bother with any sort of isolation of functions from each other, given that uCPE is sitting on one customer’s premises.

Perhaps we need something even lighter-weight, given that even container technology presumes a level of complexity in application distributability and lifecycle processes that sticking all your function eggs in a uCPE box doesn’t easily justify.  That’s especially true if we presume that IoT and other event-driven applications will draw function hosting and microservices to the network edge, and presumably that means either the operator or user side of the service demarc.  Functions are simple pieces of code, requiring little in the way of middleware or operating system features.  If function-hosting uCPE is the future, then function-based VNFs should be considered.
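Here’s a minimal sketch of what “function-based VNFs” could look like: each function is a plain callable, and the uCPE composes them into a chain inside one process.  This is an assumption about a possible model, not anything the ISG has specified.

```python
# A sketch (not any standard's model) of "function-based VNFs": each network
# function is a plain callable, and the uCPE composes them into a chain inside
# a single process -- no per-function VM or container overhead.

def firewall(packet):
    # Hypothetical policy: block a single port, purely for illustration.
    return None if packet.get("dst_port") == 23 else packet

def nat(packet):
    packet["src_ip"] = "203.0.113.1"   # illustrative public address
    return packet

def monitor(packet):
    print(f"pass {packet['src_ip']} -> {packet['dst_ip']}:{packet['dst_port']}")
    return packet

def service_chain(packet, functions):
    for fn in functions:
        packet = fn(packet)
        if packet is None:          # a function decided to drop the packet
            return None
    return packet

chain = [firewall, nat, monitor]
service_chain({"src_ip": "192.168.1.10", "dst_ip": "198.51.100.5", "dst_port": 443}, chain)
service_chain({"src_ip": "192.168.1.10", "dst_ip": "198.51.100.5", "dst_port": 23}, chain)
```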

Any open model for networking demands true openness in hardware and software, and a pretty high level of consistency in terms of operating system and middleware features, at least within the appropriate device class and mission.  I think that some operators, notably AT&T, are pushing to achieve that, and if they succeed I think they’ll drive an open model of networking forward much faster than we’re currently seeing.

Is Tech News “Fake?”

GDPR has just gone into effect, and the national media is obsessed with issues of privacy and the impact of advertising.  There’s also the matter of the spread of biased information on social media, what’s called “fake news”.  Maybe it’s time to address a touchy topic, which is the impact of advertising and fake news in tech.  I’m not talking about using social media to distribute “fake news”, but about fake tech news.  Is there such a thing, how common is it, and how did we get there?  Is there a common element joining fake tech news to “fake news” overall?  Most important, is it hurting us, and the industry?

Thirty years ago, technology was starting the most important transition of its time.  Not the most important technology transition, but the most important transition in how users and potential users of technology learned about it.  At that time, the premier outlets for technology information were paid-subscription publications.  My surveys of the time showed that publications like Business Communications Review (BCR) were top of the heap in terms of credible channels for information on technology.  When I visited enterprise CIOs or network operator planners, I’d often see the latest issue of BCR on their desks.  You don’t see it today; BCR isn’t published anymore.  The reason why is “ad sponsorship” or “controlled circulation”.

I can recall when controlled circulation publications and “qualification” cards emerged.  The theory was simple; if advertisers believed that only those who were qualified buyers of a particular technology sector would receive the publication, they’d be more likely to advertise in it, and pay more for the ads.  When the process started, the most important question was something like “How much network equipment budget are you responsible for?”  Too little and you didn’t get the publication, so editors used to joke with me that if you added up the claimed budget responsibilities of all their qualification cards, the total would exceed the global GDP.

As media evolved from print to web, the controlled circulation model evolved to ad-sponsored websites.  There’s no question that this has improved the availability of information on technology.  But is it good information?  The challenge with this new model is pretty obvious, framed by the fairly recognizable statement that “Whose bread I eat, his song I sing.”  When readers pay for news, they get news useful to readers.  When vendors pay, not only do the vendors get news they like, the rest of us get that same story.  It doesn’t mean that the story being told is a lie, but that it reflects the view of an interested party other than the reader.

Ads on web pages are served when you see the page, so you have to click on an article to see the associated ads.  Why did you click?  Because the headline looked interesting, so writing headlines is really important in getting you engaged and presenting you with ads.  The story?  In advertising terms, not so important, because most of the ads you’ll see will come up in the first screen.  Give somebody a two or three-thousand-word story to read and they’ll likely see no more ads than if you’d given them 500.  However, two thousand words on a single topic will take a lot more work to generate than 500 on four topics, and you’ll get four times the ad impressions with the latter.  It’s “better” to have a headline (like “self-driving cars” or “robots”) that grabs interest than one that promises complex solutions to mundane problems.

There’s also a problem of content bias.  Who are the sources of information for articles?  Many end users of technology have policies that forbid them to talk with the media about project details or plans, for good reasons.  Who doesn’t have such policies?  Vendors.  In addition, vendor announcements usually form the basis for “news”, which after all means “novelty”.  Remember the old cartoons where news hucksters had issues whose headline was “Man Bites Dog!”?  Dogs biting men would hardly be news, but it would probably have been true a lot more often.

Sometimes what the vendor wants to push in a media briefing doesn’t look like it’s going to generate an interesting story, so what’s written isn’t what the vendor intended.  I got a call from a reporter about a vendor’s announcement to drop a particular product, and I remarked that given the product had limited value and was likely to have less over time, that was probably a smart move.  There was a long pause, then the reporter asked “So you’re saying that the vendor’s product strategy is in disarray?”  No, obviously I was saying the opposite, but obviously the story was already done in a “more interesting” form and just a quote or two was needed—if the quotes said the right thing.

Then there are the analysts, a group in which I’m generally classified.  I got into this space way back in 1982 with a big survey project for a network vendor that involved talking with 300 users.  Getting that many people to talk about the topic was such a challenge that I kept asking them questions about every six months to keep them engaged, and that formed the basis for my own surveys and reports.  Surveys are hard work, expensive, and time-consuming, so it’s important that they provide something that’s going to be useful, something that people will pay for.

In my early days, I also did some contract work for other analyst firms, and in the 1990s I started to see RFPs from them saying something like “Write a report detailing the five billion dollar per year market for xyz.”  They’d determined that if the market wasn’t at least five billion, the report wouldn’t sell.  Well, gosh, that sounded more like a plot for a novel to me.  However, it was exactly what the major buyers of reports—vendors, again—wanted.  I got out of the syndicated research business.

Just five years ago, I had another dose of reality.  A vendor asked me to confidentially review a research report they’d commissioned (from another analyst firm).  They were concerned about the responses and wanted to know if my own work corroborated them.  I did the review, and the responses and research were not even remotely like what I’d obtained.  The reason was that the target topic was appropriate to enterprise CIOs, but the people surveyed weren’t even working in enterprises for the most part, they were in SMBs.

The problem here isn’t that people are making up tech stories (or at least that’s not a significant piece of the problem) as much as that vendors and the search for ad delivery opportunities have too much influence on what gets published.  Vendors pay for ads, they pay for reports, and not surprisingly they have significant influence on each.  Does that influence mean the buyer is being fed fake news in the form of information favoring the seller and not the reader?  Sometimes it does.

This is the common element between fake tech news and fake news in the current popular dialog.  Everything has a perspective.  Everything has a central truth.  Is that central truth the goal, or is it the goal to present all the perspectives?  Right now, according to the ratings, the three major news networks (CNN, Fox, and MSNBC) have two (the last two) that favor a particular (and opposite) political view, and one that favors presenting people advocating both views.  The two extreme ones are more popular.  Why?  Because people who are dedicated news junkies are more likely to want their own positions validated, or their opponents trashed, than to hear what’s actually going on.  More viewers, more ad money.  In tech, are you better off with a story about a hot topic, or about something that’s really critical to advance the industry?  If you want to serve ads, you know what the answer is.

What, then, about our “central truth” that’s supposed to be at the core of every story?  Remember “Man bites dog!”  It’s often lost in the desire to say something catchy, and so to catch readers.  New stuff is interesting.  I remember when “Popular Science” ran a story on how we could use nuclear bombs to dig underground reservoirs.  It was sure interesting, but I don’t see those reservoirs today.  If you planned to dig yours that way, the story didn’t advance the real world a whit, and in fact might have overshadowed practical options and problem-solving techniques.

We can’t advance this industry based on exciting stories about the far future.  “Popular Science” in the past is no excuse for letting “Popular Networking” dominate in the present.  Networking today needs the opposite of pure promotion, which is pure validation.  It’s something that touches everyone’s life, but that hardly anyone understands.  We need to make networking understandable, particularly to the people we expect to fork out dollars to make it happen.  We also need to understand what those people need in the way of justification, before they pull out their checkbooks.

What do we do about it?  There’s nothing that’s going to change the ad-sponsorship business model, but what readers/buyers can do is to recognize that the information they get is often sourced from the people trying to sell them something and influenced by the selling process overall.  An “editorial mention” for a company or product should never convince you to buy it, only to visit the vendor’s website and do some direct research there and in related sites.

Remember too that the “sponsorship taint” doesn’t taint everything equally.  Some publications, including those I write for (Tech Target’s sites and No Jitter), don’t exert any editorial control on their authors, so if the writers are unbiased you can generally rely on the stories being likewise.  They firewall editorial organizations from the ad sales part of their company, which at least reduces and sometimes eliminates the risk of influence.  Analyst reports may be sponsored but may also be sold and thus more accountable to the reader.  When you read something, ask how it was paid for and what the interest of the paying party is.  If it’s not congruent with your interest as a buyer, factor that in as a negative on the objectivity and value of the information.

I could be a problem too, of course, which means I have to explain myself.  My own stand is two-pronged.  My blog, which this piece is a part of, is totally ad-free and influence-free, and that’s a pledge I make on the blog itself.  You can rely on it.  The second prong is that in the small number of cases where I write something under my own name for publication elsewhere, I’m almost always paid for it, but what I write is what I believe to be true, and nobody influences it or changes it.  If I can’t be assured that my views are published as I’ve written them, I won’t write for that publication or company.  I demand real documentation for claims, not verbal comments.  Sometimes I’ll demand a demonstration, and if I’m not satisfied then I’m not going to let my name go on anything.  Could I be deceived?  Possibly, but it will have to happen despite my best efforts, and if I find out I have been, I’ll blog about it in detail.  That’s another pledge.  I may be wrong about something—I have been in the past—but if I am, it’s my own error and not that somebody is paying me to think differently.

That’s often how bloggers work, since the barriers to publishing in the Internet age are much lower.  I’ve found blogs from truly independent sources to be the best pathway to getting unbiased information, at least if “unbiased” means “free of explicit influence”.  Everyone has biases, including me, and you should keep that in mind.  The biases arise from limited exposure to the issues, from the background and experience of the writers, and from their own vision of how things should be and should work.  Those biases are unavoidable, and you can sometimes net them out of your research by consulting multiple sources.  I’m not saying vendors, or reporters or editors or analysts, are lying.  I’m simply saying that they have their own agenda, explicitly (vendors) or implicitly (sponsored outlets and reports).

You’re always free to ignore the truth, to find opinions that fit your preconceptions, of course.  Fake news works because people do just that.  The biggest problem here isn’t media, analysts, or vendors, though sadly they are part of the problem.  The biggest part is you the reader, the user, the buyer.  In the Internet age you can find a validating resource for any point of view, no matter how limited, stupid, or extreme it may be.  If that’s what you’re looking for, then you’re in luck in today’s world.  If you want real information, you’ll have to put aside your own prejudices and do some research.  In the end, you are what you know, and you buy and use what you decide.  Make the most of that power.

Some Truths About the Challenge of Operator Transformation

There is always a risk in using buzzwords or catch-phrases.  They tend to propagate through the market, losing contact with reality along the way.  We hear, for example, that operators are “bullish” on transformation but don’t follow through.  Is that a fair comment, or is there something deeper going on?  Four global operators have recently shared transformation issues with me, and as long as I don’t identify the operators, they’re happy to let me address them here.

The first issue, one all four operators mentioned, was “everyone forgets we’re network operators.”  You and I have been reading stories about transformation for half-a-decade, and most of them reduce to things like “why not just adopt the OTT (or cloud provider) business model?”  Answer: because network operators aren’t OTTs or cloud providers.  Most people who advocate this kind of transformation aren’t really suggesting that operators become OTTs, only that they rely on profits from OTT services.

Operators cannot run network services at a loss and then try to make it up with profitable OTT services.  The incremental cost they’d have to assess to overcome the losses in network services would make their OTT services non-competitive.  The axiom for the network operator is now, and will always be, that network services must be at least marginally profitable no matter what other services they may offer.

In truth, what operators think of as “transformation” or “digital transformation” is really “network service transformation”.  They know they can offer OTT services; some, like Verizon, have bought OTT players.  They also know that they have to make network services into something beyond a boat anchor on their bottom line.  So let’s stop telling them to be OTTs and focus on the actual goal of transformation for an operator.

The second issue was also universally mentioned: “Doesn’t anyone understand the notion of undepreciated assets?”  Network infrastructure has a financial useful life of between 5 and 20 years.  If a project comes along that requires a hundred million dollars’ worth of undepreciated equipment be written off, the CFO will say the project’s cost has to be increased by that hundred million.  Few infrastructure project business cases would survive that added cost.
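A simple worked example, with assumed numbers and straight-line depreciation, shows why:

```python
# Illustrative arithmetic only (assumed numbers, straight-line depreciation):
# how undepreciated assets inflate the cost of a "transformational" project.

purchase_price = 150_000_000     # what the boxes cost when bought
useful_life_years = 10
years_in_service = 4

annual_depreciation = purchase_price / useful_life_years
book_value = purchase_price - annual_depreciation * years_in_service   # still on the books

project_capex = 200_000_000      # hypothetical cost of the replacement technology
effective_project_cost = project_capex + book_value

print(f"undepreciated (written-off) value: ${book_value:,.0f}")
print(f"effective project cost the CFO sees: ${effective_project_cost:,.0f}")
```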

Remember the old song “If I knew then…What I know now…”?  There are surely a number of network investments that operators (and users) regret, but just because the right answer comes along later down the road doesn’t mean you get a do-over.  Technology revolution in network infrastructure, folks, is not possible.  That means that proponents of this or that network transformation strategy have to pursue an evolution-driven business case.  That’s problematic, because early deployments don’t offer much benefit when most of the infrastructure can’t be touched for depreciation reasons.

Issue number three was cited by three of the four operators: “Saving five, ten, or even twenty-five percent of capex doesn’t justify a major technology shift.”  I’ve noted in prior blogs that a group of operators at an industry event told me that 25% wasn’t enough for NFV to save in capex—they could get that by “beating up Huawei on price.”  One reason a big savings is needed is that new technology means redoing operations practices, and that may end up killing a lot of the benefit.  Operators have said publicly that NFV is way more complicated, and managing complexity is more expensive.  Another reason is risk of failure.  We have no global networks based on SDN or NFV, which means that someone who decides to risk it all on one of those technologies has little defense if it fails.

Why is AT&T deploying white-box products in cell sites that are based on traditional routing protocols but implemented with an open platform?  Answer: because the new open stuff saves money without requiring changes in operations, and it doesn’t expose the operator to a technology that’s never been used at scale.  The AT&T/Linux Foundation DANOS project or ONF Stratum, combined with P4 forwarding programs, offer a much greater chance of capex-driven transformation than SDN white boxes or hosted functions ever did, or will.

Three of the operators also agreed with the next point, which is that the Internet is a service, not a network.  It’s a confederation of global providers who interconnect IP infrastructure into a global service based on a public address space and near-universal connectivity.  When people say that operators should be able to change this or that quickly because “that’s what happens on the Internet”, they’re forgetting that those providers are the Internet in an infrastructure sense.  The agility of “the Internet” is really the agility of what’s delivered over it, and that doesn’t have the depreciation constraints or requirement for massive global infrastructure deployment that real networks have.

The infrastructure that creates the Internet is complicated, and how things like content delivery and even VoIP work isn’t well understood.  The relationship between the Internet, broadband wireline access, and mobile access is likewise not understood.  All of this creates confusion in what operators actually do, what they are responsible for fixing, and what payments made to them actually cover.  One operator said that almost 100% of their customer base believed content delivery glitches were always the operator’s problem, when operator data showed that most were actually CDN issues, and that DNS problems were as often the cause of wireline access complaints as broadband connectivity was.  Customer care is more expensive because of this problem, and visibility into customer services has to be deeper than the operators’ own service in order to respond.

The final problem, which gets 100% buy-in, is that vendors and users forget that network operators are businesses, not public utilities.  Public companies are responsible for making decisions that their shareholders support.  If they don’t, they face the risk of a hostile board of directors.  You can’t just cut prices if it kills your stock price.  You can’t toss undepreciated assets for the same reason.  You have to worry about the next quarter, not five or six quarters in the future.

This last point resonated with me because countless vendors have complained to me about how operators don’t think about the future, just the present.  At the very moment these vendor complaints are flying, the vendors themselves are hunkering down on obsolete products to keep current sales up, and hoping something will bail them out when the “future” comes along.

Operators agree with some of the popular comments on their transformation efforts.  One such comment that stands out is that operators are not organizationally prepared for transformation.  The current separation of operators into three groups—CIO for OSS/BSS, operations to run the network, and CTO to plan for adoption of new technologies—is almost universally seen as a barrier.  It works for planning an orderly service set using fairly static technology, but not in the modern age.

This is borne out in my broader research.  For example, operators across my own survey/contact base say that less than 20% of the technology projects/trials launched actually result in meaningful deployment.  Almost three-quarters of operators say that OSS/BSS needs to be modernized, and half think it should be merged with operations.  A third believe that the CTO organization should be a smaller team, more subordinate than it is today.  Almost two-thirds liked the idea of a three-piece organization consisting of sales and marketing, infrastructure, and product/service management.  The CTO group of today would be a part of the last of these, the CIO and COO groups the second, and current largely diffuse sales/marketing activity would be coequal with the others, and in fact be the lead group.

Network operators have been in the connectivity business for a century.  It’s now clear that connectivity in the traditional sense isn’t going to keep the lights on.  Experience delivery is a different world from a tech perspective, but it’s all the more different when a lot of what you’re delivering is ad-sponsored and thus free.  ISP and common carrier and cable services are a vast industry, one that’s not going to turn on a dime, and we need to understand what’s really happening to understand how we can make transformation work better.

Might Overlay SDN and SD-WAN Help Operators with Profit-Per-Bit?

If there is a profit-per-bit problem, how much of it could SD-WAN solve?  How much could SDN or NFV solve?  What are the fundamental attributes of a strategy to address profit-per-bit, and where can we expect to see features to fulfill the potential?  These are all critical questions at a time when operators are struggling with the issue of ROI on infrastructure investments, meaning capex.  They matter to operators, of course, but they also matter to the SDN and SD-WAN vendors.

The overlay-network space revolves around sales, like everything else.  For SDN and SD-WAN, success depends on sales to network operators and cloud providers, not to end users or even MSPs.  For SD-WAN, in particular, focusing on that market is going to require some convulsive changes in positioning and features.  If anyone finds a profit-per-bit hook, they’re way ahead of the pack.

For operators and even users, overlay technology could also be critical.  It frees user services from a direct protocol relationship with transport/access infrastructure.  That can allow services to federate across multiple providers and even technologies, which is the current most-common mission for SD-WAN.  It can provide a different model of logical connectivity and support the integration of new features (the universal CPE or uCPE model).  It’s an easy new revenue source for network operators, and a defense against technology complexity at the service level complicating management of infrastructure.  It could be huuuuuuuge, if it works.

Let’s start with what I don’t think will work, at least not in a practical timeframe.  I do not believe that SDN (in the form of OpenFlow white-box switching/routing) or NFV in any form can have any meaningful impact on network operator capex overall, certainly not fast enough to stave off profit-per-bit pressure over the next five years.  I think operator comments quoted by networking websites and my own conversations with operators are totally consistent with that view.

What I think might help, if it’s done right, is adoption of a variant SDN/SD-WAN model based on overlay technology.  This would have the effect of creating a service layer that’s transport-independent, and that would create three specific benefits in the fight to improve operator profit per bit.

The first benefit would be taking infrastructure out of the service-feature business.  All overlay technology relies on an “agent element” that forms the new service edge.  This element can construct an interface to the user, add in features that might otherwise be supplied by additional appliances, and simplify the features actually used in infrastructure-network devices.  Think, for example, of an “MPLS router” that does only MPLS and nothing else, or of packet-optical pipes.

All of that would combine to facilitate the paring down of complexity and cost, both capex and opex, for network infrastructure.  New “services” could be built by augmenting only the service layer.  There’s less risk of constant service changes to infrastructure, less risk that network operations would have to adapt to different features in the network, and less risk of updates creating instability.

The corollary benefit to this is that any infrastructure transport service or combination of services that offer basic connectivity and required QoS/SLA could be used.  A service overlay can overlay today’s IP, today’s Ethernet, emerging OpenFlow SDN (to the extent it emerges, wherever it emerges), the MEF’s “third network” model, anything built on the P4 flow-programming protocol, etc.  As long as different network technologies had some mechanism to gateway across the differences (which could be offered at the service level, of course), anything goes.

The biggest value this brings is the ability to exploit the present equipment and at the same time facilitate the introduction of new stuff.  There are no fork-lift upgrades, because the service layer glosses over everything underneath.  The biggest problem that “transformational” technologies have is that you have to deploy a lot of a given technology to gain a benefit, and that increases both premature asset displacement and technology risk.  No such risk with the overlay.

The next benefit is that service overlays introduce a great place to make management and test and measurement connections.  You can get end-to-end information easily with a service overlay, and you have a convenient set of places to connect management tools and probes for detailed analysis and even test data injection.  One of the paradigms of SLA enforcement is that there has to be a place where provider and user come together on what’s being seen/experienced.  The service edge is the only logical place that can happen.

You can do management of transport services at a transport demarcation point, of course, including one inside a service-layer termination.  But you can also health-check other features.  For IP services, for example, you need at the minimum a gateway, DNS, and DHCP to create connectivity.  It’s easy, if those features are projected to or provided by a service-layer element, to validate their health.  If the service layer does dynamic QoS measurements for transport selection, that’s easy to leverage, and if it’s session-aware you may be able to get dynamic packet loss data.  Obviously, you can also do protocol analysis and test data injection.
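A minimal sketch of that kind of service-edge health checking, using nothing more than standard socket calls (the addresses are placeholders):

```python
# A minimal sketch of the kind of health checks a service-edge element could run
# for the IP-service dependencies mentioned above. Addresses are placeholders.

import socket

def check_dns(hostname="example.com"):
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False

def check_tcp_reachability(host="192.0.2.1", port=443, timeout=2.0):
    # A TCP connect to a known service verifies that the gateway/transport path is usable.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

status = {
    "dns": check_dns(),
    "gateway_path": check_tcp_reachability(),
}
print(status)   # the service-edge element would report this through its management API
```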

This should be making service overlays hot buttons, and in fact they sort-of are.  The qualifier is needed because the products that offer overlay service-layer capability divide into two families that are typically directed at missions only peripheral to these mainstream issues.  SDN in overlay form is more common in data centers, and SD-WAN is most often used simply to unify connectivity in enterprises where VPN services are either impractical or unavailable in some sites.

What’s needed to get things moving is a vision of the mission I’m describing, plus movement toward exposing some specific APIs from service-edge elements to facilitate the development of ecosystems that support and exploit service-edge deployments.  These are far more important than defining transport interfaces in developing momentum and new service-layer features quickly.

For service-layer technology targeted for network operator deployment, one of the API sets should facilitate integration between the service and transport layers, as an option.  Not all providers of service-layer technology will have the option to integrate with transport, either because they’re not incumbent network operators or because regulatory issues constrain the integration.  Where it’s available, though, it could be a powerful incentive for network operators to adopt a service overlay model, because it would differentiate their offering from that of third parties.

In the short term, management APIs and features will be the primary requirement to help meet profit-per-bit goals for operators.  The APIs have to expose the state of the service, of each connected site, and each connecting transport resource.  They should also be wrapped in policy management to control the nature of information that can be viewed/changed according to role, since operators may want a deeper management view for their own personnel if they sell the service.
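Here’s a hedged sketch of that management-API idea: one state model covering the service, its sites, and its transport links, filtered by role before it’s exposed.  The roles and fields are hypothetical.

```python
# A sketch (names and roles are hypothetical) of the management-API idea:
# one state model for the service, its sites, and its transport links,
# filtered by role before it is exposed.

SERVICE_STATE = {
    "service": {"id": "svc-001", "status": "up", "sla_violations": 0},
    "sites": [
        {"site": "HQ",     "status": "up",       "edge_sw_version": "3.2.1"},
        {"site": "Branch", "status": "degraded", "edge_sw_version": "3.1.9"},
    ],
    "transport": [
        {"link": "mpls-1", "status": "up",   "loss_pct": 0.1},
        {"link": "inet-1", "status": "down", "loss_pct": None},
    ],
}

# Policy: what each role is allowed to see.
ROLE_VIEWS = {
    "customer": ["service", "sites"],              # end-to-end view, no transport detail
    "operator": ["service", "sites", "transport"]  # deeper view for the selling operator
}

def get_state(role: str) -> dict:
    allowed = ROLE_VIEWS.get(role, [])
    return {key: value for key, value in SERVICE_STATE.items() if key in allowed}

print(get_state("customer"))
print(get_state("operator"))
```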

In the longer term, logical networking features will be the primary requirement. The cloud, containers, virtualization, and componentization of applications are combining to create a whole new model of connectivity, a model that direct IP network features can’t address easily.  Overlay technology could address all these new requirements easily, and for vendors getting something very strong in this area should be viewed as a co-equal requirement to the management features.

Right now, SDN and SD-WAN vendors have a roughly equal shot at the feature space; SDN players generally have better management and may also have better integration with transport resources, but SD-WAN players tend to have better logical-network support.  Both classes of vendor are about equally represented in operator plans for service-layer deployment at present, so it’s fair to say that there’s no convincing leader in the overlay space.  That’s likely to change, even this year, as both operators and potential service buyers do more planning and understand their requirements better.  It’s going to make for an interesting fall, in particular.

Is the IoT Market Entering a New (Realistic) Phase?

IoT isn’t about 5G, it’s about events and edge computing.  A recent story on Rigado’s “edge-as-a-service” is a more realistic take on what IoT means, but is it a full-on example of a winning IoT approach?  I have to confess that another “as-a-service” positioning raises my hackles, but we’ll have to look at it, and measure it against other news and trends, to see if there’s value.

We already have IoT, in what is probably the dominant sense of the term.  Many homes and offices have security, environmental monitoring, and industrial or other control systems deployed today.  Nearly all of them are based on a controller that talks with applications using the Internet and WiFi, combined with a local wired or wireless protocol that talks to sensors and controllers.

There are huge advantages to this approach.  Sensor technology is highly cost-sensitive, and providing a sensor with 5G, Internet, and security/governance elements means jacking up the price from thirty bucks or so to perhaps five to ten times that.  One large provider of smart home technology told me they sell fifty times as many low-cost non-Internet devices as those that are actually Internet-connected.  Even where no special sensor protocol is used, WiFi and not cellular are the most common technologies.

The model I’ve described provides the features necessary for virtually any “private facility” IoT application out there, and I don’t think that model is going to change just because somebody deploys 5G.  There’s no reason for it to, because it provides what people want, at the lowest possible cost and lowest risk.  You can put a lot more security into a controller than into a sensor without breaking the bank.

A controller, in this context, is a gadget that can talk short-range wired or wireless protocols, including things like X10, Insteon, Zigbee, Bluetooth, and WiFi, to connect with devices on premises, and then provide a combination of local processing and Internet access.  There are controllers that meet these requirements offered by many companies already, and widely used.  You can buy a system in many local retail stores.

These controllers are, in a cloud sense, an “edge”.  There is no practical way to provide connections via local protocols without a device on premises to do that, but it would be possible to dumb down that device and pair it with cloud intelligence.  If that intelligence offered local sensor/controller abstractions via API, you could consider it “edge-as-a-service” in at least the IoT context.  Since this is a pretty accurate picture of what Rigado provides, their marketing/positioning claim is fair.

The best way to build a controller, and in fact the way some are already built, is to combine an embedded-control device with local compute power and sensor/controller protocol support, with deeper processing via APIs.  That way, you have the option of hosting some basic processes on the controller to shorten the control loop and ensure that you have control during periods when you may lose Internet access.  What separates Rigado from the others is less the specific technology of the controller than a combination of positioning and the set of tools and features (which they package as the “Cascade” solution, their real “edge-as-a-service” offering) to enable cloud application integration, security, and compliance.  You can cloud-extend both their basic controller and their more programmable one.
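Here’s an illustrative sketch of that split, with no vendor API implied: simple rules run locally so the control loop stays short and survives an Internet outage, and everything else is escalated to a cloud endpoint (the URL is a placeholder).

```python
# Illustrative controller logic only (no vendor API implied): handle simple
# rules locally to keep the control loop short, and escalate to the cloud
# when the event needs deeper processing.

import json
import urllib.request

LOCAL_RULES = {
    # sensor_id: (threshold, local_action)
    "thermostat-1": (30.0, "turn_on_cooling"),
}

CLOUD_ENDPOINT = "https://cloud.example.com/iot/events"   # placeholder URL

def handle_locally(event):
    rule = LOCAL_RULES.get(event["sensor_id"])
    if rule and event["value"] > rule[0]:
        print(f"local action: {rule[1]}")   # works even if Internet access is down
        return True
    return False

def escalate_to_cloud(event):
    try:
        req = urllib.request.Request(
            CLOUD_ENDPOINT, data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=3)
    except OSError:
        print("cloud unreachable; event queued locally")

event = {"sensor_id": "thermostat-1", "value": 32.5}
if not handle_locally(event):
    escalate_to_cloud(event)
```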

In my view, Rigado’s approach is useful mainly to either OEMs who want to build IoT applications or to large enterprises who are prepared to custom-develop their own facility/industrial control.  That’s not a bad thing, though.  The IoT market is likely to be dominated by specialty providers who exploit tools and hardware to build applications that then target end users.  Residential IoT is an example of this, of course.

Cascade integrates with Amazon, including AWS Lambda, Amazon’s serverless, event-driven function service.  Obviously the idea is to use the IoT gateway as an event source for Lambda, which is one of the technical differentiators of the Rigado approach versus other controllers.  You build two-tier applications that can cede as much event control to the cloud as you’d like.
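As a sketch of the two-tier idea, here’s what a minimal Lambda handler for gateway events might look like.  The event field names are my assumptions, not Rigado’s or AWS IoT’s actual schema.

```python
# A minimal sketch of the two-tier idea: the gateway forwards events into the
# cloud, and an AWS Lambda function handles the deeper processing. The event
# field names here are assumptions, not Rigado's or AWS IoT's actual schema.

import json

HIGH_TEMP_C = 30.0   # hypothetical threshold for the deeper, cloud-side decision

def lambda_handler(event, context):
    # 'event' arrives from the gateway (e.g., via an AWS IoT rule or an API call).
    reading = float(event.get("value", 0.0))
    device = event.get("device_id", "unknown")

    if reading > HIGH_TEMP_C:
        # Cloud-side logic: correlate, store, or trigger a command back to the edge.
        action = {"device_id": device, "command": "raise_alert"}
    else:
        action = {"device_id": device, "command": "none"}

    return {"statusCode": 200, "body": json.dumps(action)}
```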

This is one of two possible edge models for IoT; the other model would be to extend cloud-specific process hosting (AWS Lambda, in this case) out of the cloud and into the edge.  Amazon’s Greengrass offers that capability, and as far as I can see, Cascade doesn’t include that.  It’s not a crippling problem in a functional sense because most users would be just as comfortable (or more comfortable) writing controller apps in another language anyway.  What could be a crippling problem is that cloud providers can easily extend their event strategy to the premises, and if that were to happen then firms like Rigado would be competing with the provider of part of their solution.

Amazon has, in other areas like unified communications and collaboration, seemed to focus on being a supplier to specialized OEMs rather than a provider of direct end-user solutions.  That might continue in the IoT space, but remember that Amazon, Google, and Microsoft are all dabbling in direct sales of event-based and IoT services.  What Amazon’s strategy is matters less than what the overall competitive dynamic turns out to be.  If Microsoft or Google field a device (and remember that they both make devices), would Amazon (who also makes devices) sit back and wait to see what happens?

The other risk for Rigado is the rest of the controller/IoT marketplace.  I know of four or five vendors who offer controllers that have nearly identical features, but they’ve so far not positioned their capability in the same way Rigado has.  If they were to start to offer extended functionality via cloud hosting (perhaps in partnership with an Amazon competitor), they would put a lot of pressure on the space.  Some of them have been around for a decade and have a large installed base and good distribution channels.  Unlike Rigado, they don’t seem to be aiming at an OEM space themselves.

What Rigado represents is a recognition of IoT reality, not a totally new approach or new technology.  The important thing about the story is that it might show that the “real” side of IoT is going to get some ink from major industry news sites, and that could shift the IoT market emphasis in a real, helpful, direction.

Are Network Vendors Waiting Too Long in Accepting Change?

Cisco is always a bellwether for the router market, so when their earnings call gives the Street angst, it’s not just about Cisco.  Cisco said that every sector in its product space grew in the last quarter, except the most important one, which was service provider routing.  Softness there obviously impacts Cisco’s forecast, whose weakness relative to expectations has led some Street analysts to question the stock (others still defend it).  Whether Cisco has broader revenue problems or not, service provider spending weakness is an issue for Cisco and its competitors alike.

Cisco’s quarter wasn’t bad, in fact, and it showed that in other product areas like switching (data center switching), the company posted strong sales.  What some on Wall Street didn’t like was the softness in router sales, which Cisco attributed to network operator spending softness.  There’s always an implication (especially on earnings calls) that issues are temporary, but we know that’s not necessarily the case with operator spending.

Since 2012, operators have been saying that their cost- and revenue-per-bit curves were going to cross over in the late part of this decade.  They’ve been pursuing significant operations cost reductions as a means of preventing their net profit per bit from going negative, but there’s no question that operators think they need to spend less on networking.  Commodity boxes and open-source software are their solution, and that threatens not only Cisco but every vendor.

If this isn’t a surprise, why hasn’t it been taken care of?  Cisco and its competitors have had six years of opportunity to frame their business model differently in some way, focusing perhaps on things that would have boosted their revenues outside of routing.  The problem, I think, is twofold.  First, growth in other product areas hasn’t shown signs of replacing losses (real or potential) in the routing area.  Second, every public company at every level is focused on making their current-quarter numbers.  Talking about what comes next is the same as talking about how the current situation is going to change, and that suppresses spending today.

The truth is that Cisco has, more than any other vendor in all of service provider networking, tried earnestly to prepare for a time when routing won’t carry them.  To do any better would mean deliberately focusing buyers on what they don’t want or need today, to prepare them for a shift in the market.  Realistically, that’s not going to happen for Cisco or anyone else.  Was there a mistake Cisco made?  Two, I think.  Can they fix them, even now?  Possibly, I think.

The first Cisco error was not pushing software and the cloud fast enough.  Cisco has developed a range of software strategies, but they’ve never seemed fully engaged with them.  Absent a very strong software position, their UCS server business was just another potential commoditization-cannibalized product space.  Remember the old-line Sun theme that “the network is the computer”?  Cisco could have pushed that strongly, as the player best equipped to create that fusion.

The second Cisco error was not jumping on the automation bandwagon in an effective way.  Today, we know that for service providers in particular, service and application automation converge.  Early on, when NFV first reared its head and suggested that carrier cloud could be a real opportunity, Cisco bought an automation player, Tail-f.  The problem was that it was a network guy’s view of automation, focusing on things like configuring routers.  In a world transitioning to carrier cloud, that’s clearly not the mission.  But Cisco let their Tail-f deal define them, and so failed to promote broad-spectrum automation at a time when it could have staved off capital budget pressure.

What Cisco (and of course its competitors, at least those who lack RAN credentials) face now is a buyer base that wants open-platform hardware like open switch/routers (white boxes) and open servers, with both supplemented by open-source software.  If all this openness succeeds, then a few percent shortfall in revenue forecasts is the least of vendors’ worries.  Their business could totally implode.

But will it succeed, and if so, when?  The fact is that network operators have booted their opportunities for taking control of their infrastructure destiny even more decisively than vendors have.  Very few operators have actually taken any initiative in framing their technology future, and those that have tended to apply network-centric concepts to a server/software-centric age.  ONAP, evolved from AT&T’s ECOMP, is the best example of that operator-driven but network-centric software.  Is it workable?  Yes, almost surely.  Is it optimal?  Absolutely not, nor is it likely to be made optimal by open-source-driven software evolution.

Even in the cloud, operators are missing their chance.  A commodity cloud-hosting service market will always be dominated by the player with the lowest internal rate of return, the bar set by the CFO to decide which projects meet ROI targets and which don’t.  Operators have the lowest bar of anyone, because they used to be public utilities.  They could have dominated the cloud market on price alone.  They had the facilities needed for edge computing at the access edge, where it’s most valuable.  They could have framed IoT as what it really is, which is an opportunity for sensor-driven correlation and contextualization services, and then been the only player to provide them.  Instead they let the public cloud providers take the lead, and Telefonica has now announced a cloud deal with Amazon.
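
To see why that hurdle rate matters so much, consider a simple annuity calculation.  The numbers below are purely hypothetical (a $100M build written off over ten years); the only point is that a lower IRR bar means a lower breakeven annual revenue, and so a lower sustainable price.

```python
# Hypothetical numbers: the same $100M build, written off over ten years, at
# two different hurdle rates.  A lower IRR bar means a lower breakeven annual
# revenue, and therefore a lower sustainable price.
def required_annual_cash(capital, hurdle_rate, years):
    """Level annual cash flow whose present value equals the capital spent."""
    return capital * hurdle_rate / (1 - (1 + hurdle_rate) ** -years)

capex_millions = 100.0
print(round(required_annual_cash(capex_millions, 0.15, 10), 1))  # ~19.9 $M/yr
print(round(required_annual_cash(capex_millions, 0.07, 10), 1))  # ~14.2 $M/yr
```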

Server vendors haven’t done any better.  Dell had a golden opportunity to take the lead in carrier cloud back in 2013, and they didn’t step up even though they told the big Tier Ones they would.  HPE had a primo integration contract for carrier cloud and couldn’t see the forest for the trees, and ended up losing the deal.  Today, Dell/VMware is finally catching on with “virtual cloud networking” and HPE is buying Plexxi for application networking.  Both moves could have been made five years ago.

Are open initiatives then the answer?  If so, then standards and open-source groups should be stepping up, and in general they’re not setting the world on fire either.  The TMF has tried to get beyond OSS/BSS while keeping hold of its own incumbency, without much success.  The ONF and MEF are working to rebrand themselves and are coming out with some new approaches, but there’s the question of whether what they do will be both clearly useful and artfully promoted.  Open-source bodies are grappling with the fact that you make the most progress by exploiting stuff that’s available, but stuff that’s available is likely to reflect the thinking and needs of the past.

It shouldn’t be surprising that sellers in every market expect buyers to suck it up and continue spending no matter what.  The alternative of their not spending is too awful to contemplate, and the alternative of changing the product/service mix in anticipation of future trends isn’t much better.  Operators and vendors have, on this point, a different risk/reward profile.

For vendors, the problem is balancing risks and rewards between the short term and the long term.  If revolutionary services came along, they could generate revolutionary spending.  However, that spending on new gear would benefit the vendors only if they actually got the money, and if operators are going to be tentative and careful (as I argued in the last paragraph), then vendors who push too hard for that future pie in the sky could end up killing short-term sales momentum.

For operators, there is the reality that a new service set could not only involve vast new first-cost issues but also threaten a trillion dollars in sunk infrastructure costs.  That means that they have to be careful not to get too far ahead of real opportunity.  In today’s hype-driven market, how are they going to know what “real opportunity” is?  Everything is either a revolution or it’s not mentioned at all.  That’s more than anything the reason why operators have focused on cost reduction; they know what costs are, and future revenue is still something at the end of the Yellow Brick Road.

The problem for everyone here is that, if capex isn’t to be the thing that’s cut, cost reduction has to mean service lifecycle or zero-touch automation.  It’s been a popular notion for some time, but we’re still grappling with what it means.  Cisco talked a lot about being a leader in “intent-based” networking, but what the heck is that, and how does it relate to operations efficiency?  Intent modeling as an attribute of service modeling is critical to ZTA, but does Cisco or anyone else have a true, complete, model-driven ZTA approach?  No.  Do bodies like the TMF, which has defined service-layer technology, have any vision of what ZTA would mean to operations at all levels?  Not that they’ve shown.  Even ETSI seems to have a ZTA strategy that’s a formula for multi-year research and consideration, not for timely solutions.

Timely ZTA solutions could be critical to Cisco and everyone else, because of the onrush of interest in white-box, open-source devices to replace vendor hardware.  AT&T’s DANOS and the ONF Stratum project embrace a P4-language forwarding architecture that could, over time, significantly impact proprietary network devices.  There are always risks to new technologies, and the more revolutionary they are, the greater that risk.  There’s also a risk in watching profit per bit go negative, so if the operations efficiencies of ZTA can’t be achieved, then open-architecture network devices are going to look very good, and vendor fortunes will then be at risk.
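
For those who haven’t looked at P4, the abstraction it programs is a match-action forwarding pipeline.  The toy Python below is only meant to illustrate that abstraction (match on a header field, apply a forwarding action); it isn’t P4 syntax, and it isn’t drawn from DANOS or Stratum.

```python
# Toy match-action table, meant only to illustrate the abstraction a P4
# program expresses (match on header fields, apply a forwarding action).
import ipaddress

TABLE = [
    ("10.1.0.0/16", ("forward", "port3")),
    ("0.0.0.0/0",   ("forward", "port1")),   # default route, matched last
]

def lookup(dst_ip, table=TABLE):
    """Return the first matching action for a destination address."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, action in table:
        if addr in ipaddress.ip_network(prefix):
            return action
    return ("drop", None)

print(lookup("10.1.2.3"))   # ('forward', 'port3')
```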

The operators will end up setting the tone and pace of “transformation”.  They may think they’re “getting control” of the process, and they may indeed be exercising more influence, but that’s not going to matter if they let themselves fall into depending on the familiar stuff, the familiar vendors, the familiar technologies.  We have nothing in view today regarding future services or costs that wasn’t just as visible in 2013.  Nothing we’ve done since then has transformed us.  We have to look harder today, and find different things, and that’s where operators have to lead.

Organizing All Our “Automation” Concepts

Can we get some definitions here?  I’m as interested in software-directed operations processes as the next person (maybe more than most), but I confess that I’m getting buried in terms and concepts that clearly relate to the software-directed operations goal at the high level but don’t seem to relate well, or consistently, to each other.  Time out for a taxonomy, I think.  This one is my own, gleaned from my long involvement in the space and an attempt at drawing consistent definitions from current usage.

What seems to be the universal goal of all of this is what’s called “zero-touch automation”, meaning the application of software technology to respond without human intervention to abnormal operations conditions in applications or services.  Something happens and software fixes it.  Humans would get involved only if something arose so out-of-scope to the goals that the automated system couldn’t be relied on to do the right thing, or perhaps anything at all.

There are other terms that seem to mean just about the same thing.  “Closed-loop automation” simply means that a report or event is processed to generate a response.  However, the term has been used most often in association with simple processes, something like “Light-beam-breaks-so-open-gate”.  Service (or application) lifecycle automation is a term that I’ve used, and ZTA does seem to have the same meaning that I at least assigned to the service lifecycle automation concept.  However, service lifecycle automation implies a cooperative system (the application or service and its associated resources) that has a specific lifecycle, meaning a progression of states that together form the evolution of functionality from “I-want-it” to “make-it-go-away.”

ZTA could be a broader concept, perhaps, but I don’t think so.  Most people who separate the two terms seem to apply the ZTA term to a “lifecycle system” that doesn’t have a beginning or end, meaning a steady-state thing like a wide-area network.  You’re going to build it and run it, and ZTA then means automating the response to the stuff that interferes with running it.  Services and applications have this “run” state as part of what we could call a “goal-state sequence”, the part that (hopefully) the service or application lives in the longest.

Whether we call our goal ZTA or service/application lifecycle automation, I do think that there are three elements that make it up.  The first element is our goal-state sequence, meaning some set of conditions that define what things should be or look like.  The second is an exception source, meaning a report of something that’s not as it should be, probably in the form of an event.  The final element is the response process, the software function that does something to remedy the exception.
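
A minimal sketch of how those three elements might fit together is shown below.  Every state, event, and process name in it is hypothetical, chosen only to keep the goal-state sequence, the event source, and the response process visible as separate pieces.

```python
# A minimal sketch, with hypothetical state, event, and process names, of the
# three elements: a goal-state sequence, an exception/event source, and a
# response process selected by a state/event table.

GOAL_STATE_SEQUENCE = ["ordered", "deploying", "running", "decommissioned"]

# (current_state, event) -> (response_process, next_state)
TRANSITIONS = {
    ("ordered",   "deploy_request"):   ("deploy_components",  "deploying"),
    ("deploying", "deploy_complete"):  ("activate_service",   "running"),
    ("running",   "component_fault"):  ("redeploy_component", "deploying"),
    ("running",   "teardown_request"): ("release_resources",  "decommissioned"),
}

def handle(state, event):
    """Dispatch an event to its response process without human intervention."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # An out-of-scope condition is the one case where humans get involved.
        return ("escalate_to_operator", state)

print(handle("running", "component_fault"))   # ('redeploy_component', 'deploying')
```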

The idea of associating an exception with a process can be applied at many levels.  At the basic level, it could be an expansion of a simple closed-loop process, and here it would be fair to call it “machine learning”.  The idea is to let the system learn what a valid response to a condition is, mostly by watching what operators do but also perhaps by analyzing the results of best-effort estimates.  If you expand this idea to a complex system, you can see the glimmer of the meaning of ZTA.
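
A toy version of that watch-the-operators-and-learn idea might look like this, assuming nothing more than a running history of which action followed which condition (all the names are invented):

```python
# A toy "watch the operators and learn" loop: record which action a human
# takes for each observed condition, then recommend the most common one.
from collections import Counter, defaultdict

class ResponseLearner:
    def __init__(self):
        self.history = defaultdict(Counter)

    def observe(self, condition, action):
        self.history[condition][action] += 1

    def recommend(self, condition):
        seen = self.history.get(condition)
        if not seen:
            return None                      # unknown condition: escalate
        return seen.most_common(1)[0][0]     # most frequently seen response

learner = ResponseLearner()
learner.observe("link_flap", "reroute_traffic")
learner.observe("link_flap", "reroute_traffic")
learner.observe("link_flap", "reset_port")
print(learner.recommend("link_flap"))        # 'reroute_traffic'
```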

A network or a cloud is a fairly vast interdependent system of components.  A lot of things can happen, a lot of them could happen at one time, and in many cases the response to a “happening” would be something that could involve several steps, any of which might fail during execution.  ZTA has to deal with the complex goal of restoring the goal-state, or at least getting as close to it as possible.  That is almost certain to involve a bunch of closed-loop processes coordinated in some way.  Think of machine learning at a larger scale, a system of machines rather than a single machine.

Look at the problem this way.  Suppose we have a machine we need to keep running.  We could define a “running state” in terms of the conditions of each component in the machine as they would be in that desired state.  We could then watch component conditions and respond, per component, when something wasn’t right.  But if the machine is truly complex and if the conditions we can measure don’t necessarily relate to a simple, single, component fault, how do we do things?

This is where I think the notion of lifecycle automation offers a better story.  Lifecycle automation presumes that there are multiple components, and components of components, forming a hierarchy that would be called a service or application model.  Each component is responsible for working, fixing itself based on internal processes, or reporting a fault.  If the last of the three happens, then the component that contains the faulting component has to remediate or report a fault in turn.
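
A minimal sketch of that escalation pattern might look like the code below, with a hypothetical ModelElement class standing in for an element of the service model; a real implementation would be state/event-driven rather than a simple call chain.

```python
class ModelElement:
    """Hypothetical service-model element: try to fix a fault locally,
    otherwise report it upward for the containing element to handle."""
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

    def remediate(self, fault):
        # Placeholder for local repair logic; return True if handled here.
        return False

    def on_fault(self, fault):
        if self.remediate(fault):
            return f"{self.name}: handled locally"
        if self.parent:
            return self.parent.on_fault(f"{self.name} failed: {fault}")
        return f"service-level fault: {fault}"   # top of the hierarchy

service = ModelElement("vpn-service")
site = ModelElement("site-access", parent=service)
print(site.on_fault("tunnel down"))   # escalates to the service level
```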

We can inject “analytics” (another term) into our process now.  It’s easy to see how analytics might be a source of the exceptions we talked about, but the problem is the classic one of fault correlation.  You can learn about problems, but are the point at which remediation has to occur, the nature of the remediation, and the sequencing of multiple steps (some of which might fail) things analytics can provide you?

Where analytics might help is where a systemic problem is the sum of lower-level conditions that might not be a problem at the element level.  Nobody is “faulting” but perhaps everyone is on the outer edge of their delay budget, which means that somewhere up the line the budget might be exceeded collectively.  But even here, just knowing that’s happened doesn’t necessarily close any automation loops.  Nobody is violating their SLA, so do you tighten everyone’s SLA?  Not without breaking down the service, probably.
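
A trivial numeric example, with every number invented, shows the everyone-is-within-budget-but-the-sum-isn’t situation:

```python
# All numbers invented: each element is inside its own delay budget, but the
# end-to-end total still exceeds the service-level budget.
elements = {                      # name: (measured_ms, per_element_budget_ms)
    "access": (18, 20),
    "metro":  (9, 10),
    "core":   (14, 15),
}
end_to_end_budget_ms = 40

per_element_ok = all(m <= b for m, b in elements.values())
total_ms = sum(m for m, _ in elements.values())

print(per_element_ok)                   # True: nobody is "faulting"
print(total_ms > end_to_end_budget_ms)  # True: 41 ms against a 40 ms budget
```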

Analytics and closed-loop processes are probably used more often as attractive window-dressing on positioning strategies than as true solutions to the problem of operations automation.  The same is true of script-based tools, models that define configuration but don’t align with events and corresponding process invocations, and models that define only the singular state of “working”.  Too much interpretation is needed on any of these to get to the right response to a condition, and getting to that response is what ZTA is supposed to be about.

So AI is the answer, right?  That’s a whole definitional black hole in itself.  What does “artificial intelligence” mean?  Does it mean self-learning systems, or autonomous systems, or rule-based systems?  In one sense, to say that you’re going to use AI for ZTA is circular, because ZTA is zero-touch and so is AI.  In another sense it’s falling back on analytics, because the presumption is that we can simply identify a problem and let AI solve it.  Could we teach an AI system the right way to diagnose and fix problems in a global network?  Perhaps, but if we did I suspect we’d be applying AI at the functional-element level, as a substitute for lower-level state/event handling.

A model-based service and application automation strategy can easily be distributed, since the data model contains everything that every process at every level needs.  Do we know what stateless, distributed, AI would look like?  We really don’t know what centralized AI would look like at this point, and while there have been impressive gains in AI, the basic concepts have changed little, and to say that we’re on the verge of a breakthrough that would let AI resolve ZTA issues is to say what’s been said so many times before…and proved to be too optimistic.

Where are we, then?  First, the difference between ZTA/lifecycle automation and the other concepts is that the former two reflect a service- or application-level vision where the other approaches reflect a fault/event vision.  The more complex the system, the more difficult it is to simply fix things that pop up, and so in our era of the cloud and virtualization, we have long exceeded the limits of “basic” automation strategies, including classic closed-loop or machine learning.  The key to ZTA is the goal-state sequence model, and that’s a key we seem curiously reluctant to turn.

This is why I get frustrated on this topic.  You have two choices with ZTA.  You can define a hierarchical model of the service, with each element in the model representing a functional system with defined states and faults, or you can define an AI process so smart that it can replace a whole operations center’s worth of humans in doing problem isolation and resolution.  Which of these do you think is possible today?  Unless we want to wait an indefinite period for ZTA benefits, we have to get real, today.

Taking Another Look at the 5G Emergence Issues

What’s the difference between a use case and a business case?  The answer might involve some subtle thinking, but it might also be a key to understanding what’s going on in the 5G space here in the US, and even in global markets.  It’s a question that’s plagued our industry for three decades now, at least, and I’m not sure we have a good answer even today.  The Street is reporting on emerging use cases, but research admits a critical problem: “The revenue models cannot develop until the infrastructure is in place.”  Build it, and they will come?

Back in the early days of ISDN, CIMI Corporation signed a Cooperative Research and Development Agreement (CRADA) with the National Institute of Standards and Technology (NIST).  As a result, I was involved in a lot of the early ISDN work, and I remember one meeting where a vendor technical rep came in all excited and said “I’ve discovered a new ISDN application!  It’s called file transfer!”  Well, gosh, we had that, and yes, ISDN could do it better than voice lines and modems, so it was a use case for ISDN.  Would people deploy ISDN just to get it?  Clearly not, so it wasn’t a good business case.

There are plenty of things you could do with ISDN, or frame relay, or ATM, or SDN or NFV or SD-WAN…or 5G.  In point of fact, nearly any connectivity/communications technology can do most of what the others can do.  It works, in short.  The real question, the question that’s been validated by all the past technologies that didn’t meet expectations, should have been can this technology do enough to create a business case for deployment that meets ROI expectations?

One reason this question isn’t asked and answered is that vendors have long understood that new technologies generate sales calls because buyers want to hear about them.  Even if you can’t sell the new widget, you can at least get a sit-down and put the full-court press on the prospect, who may be talked into getting something else that’s a “stepping stone” to that widget-that-will-never-be.

Cynical reasons should never be put aside in our current age, but there are also technical complications that are now coming to the fore in the 5G space.  Many of them arise out of a comment that a New York banking-industry CIO made to me two decades ago.  “We don’t spend much time assessing the value of something that we can’t buy yet.”  Sure, they may answer survey questions or even vendor questions about how much 5G they’d consume, but the fact is that they have no real idea in nearly all cases.  It’s not an issue till it’s sold.

Some say the big factor driving 5G is a lower cost per bit, which some studies say would be three percent of today’s mainstream LTE cost.  But if lower costs lead to lower service pricing, then operator revenues would decline, not grow.  If this is a competitive market today, why would 5G not usher in a period of greater competition, erasing the benefits of the deployment?
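
The arithmetic behind that concern, using only the three-percent figure, is easy to sketch:

```python
# Back-of-the-envelope arithmetic using only the cited three-percent figure.
lte_cost_per_bit = 1.0                         # index value
nr_cost_per_bit = 0.03 * lte_cost_per_bit      # the cited 5G cost per bit

# If competition drags price per bit down toward cost, traffic has to grow
# by roughly the inverse ratio just to keep revenue flat.
required_traffic_growth = lte_cost_per_bit / nr_cost_per_bit
print(round(required_traffic_growth, 1))       # 33.3x the bits for the same revenue
```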

5G can’t be sold without a big, obvious business case for the operators who deploy it, because without that case it won’t be offered in the first place.  We have lots of use cases for 5G, almost all of which are waiting for somebody to put the critical touch on them: a financial projection of cost and revenue.  The “realists” on 5G deployment tend to fall into three camps with respect to the business case side.  All three camps deliver 5G; they differ in the “when?” and “how much?” dimensions.

Camp One is the Modernization camp.  5G will come about because operators will inevitably be investing in their wireless infrastructure and it makes zero sense to invest in something that’s being obsoleted by new/better specifications, even if you can’t prove they’re needed now.  I ran a model on this, and it says that the radio network in first-world economies would be 5G-ready in about four years based on modernization alone.  However, little would happen until late in 2019.

The second camp is the Competition camp.  5G will come about because “5” is greater than “4” and consumers like nice, easy numerical progressions.  An operator who says they have 5G is on top of the marketing world, and so every operator will want to have it.  One player launching an arms race pretty much guarantees more players will follow.  My model says this strategy will get action in a few key markets, by a few key operators, this year, and it would get 5G deployed in a limited form in those markets by about 2021.  It would have little impact on less-developed areas until 2022.

The final camp is the Live TV camp.  5G will come about because viewing habits and content industry dynamics will rapidly promote delivery of live TV over wireless, which will create both exhaustion pressure on older wireless technology and new revenue opportunities for new services.  My model says this could create significant 5G deployments in key markets by 2020, and advance 5G the most broadly across even third-world economies.

Most operators actually sit in multiple camps.  In the US, for example, AT&T and Verizon are wireless rivals, and their positions in wireless competition in general, and 5G in particular, are set by the way their own opportunities are perceived.  That depends on the nature of their goals and the demographic characteristics of their own home geographic regions.

AT&T has a fairly low “demand density” in its own wireline area, a lot of which is made up of rural low-density zones.  They see live TV and content largely as a way of boosting the popularity of their wireless services.  They’ve gone to the live-TV position (with DirecTV Now) because it leverages their wireless service through no-data-charges delivery, and because it can attack Verizon for home TV in Verizon’s home region without requiring AT&T to deploy wireline infrastructure there.  They also have satellite TV (DirecTV) to deliver linear video in and out of region.  A wireless focus is smart for them.

Verizon has a much higher demand density; concentrated urban and suburban zones make up much of the population, and Verizon has sold off areas where they can’t secure high-density demand for video.  They do not have content properties or satellite video delivery.  For Verizon, their high-density, high-value wireline base is too attractive to leave undefended.  Their early focus for 5G is thus delivery of broadband at better-than-DSL speeds.  They’ll also add selective 5G mobile support, but it’s not clear how fast or far that part of their plans will progress.  The first two locations they’ve announced for 5G mobile are both in AT&T’s back yard.  Competition, meaning staying even with AT&T’s mobile-centric plans?

In a very real sense, these are both dip-the-toe-in-the-pool strategies, but with very different reasons for risk management.  AT&T can’t be sure how much of a competitive advantage 5G would bring, and they also face the risk of having a change of government create a change in the FCC and a return of stringent neutrality rules, even rules that would forbid off-data-plan video services as being non-neutral.  Verizon has to realize that 5G/FTTN will end up creating a nice way to deliver DirecTV Now unless they launch their own streaming service.

Despite the looming streaming TV issue, Verizon probably has a lower risk overall in pushing 5G.  Their demand density means more customers per unit area, which means 4G would run out of capacity faster, which means the modernization driver would impact them faster.  They don’t have a committed streaming live TV strategy, which means they could tune up an approach based on the current market dynamic.  AT&T may have cobbled their DirecTV Now approach together for more tactical reasons.  Finally, Verizon’s higher density means more competition, so competitive-driven 5G strategies make more sense.  AT&T may be the PR leader in 5G, but Verizon may end up doing more, and quicker.

The big competitive question may not be the incumbent operators at all.  If 5G can deploy more efficient cells, and if it can support home broadband by combining with FTTN, then why couldn’t rival operators or even other players (like Amazon?) decide to deploy some of it in key market areas?  Cell towers on top of Whole Foods sites?  Google Fiber becoming Google 5G?  There are plenty of places where there’s little competition for broadband and video delivery, and plenty of players might see the focus on MVNO cellular that’s intrinsic to network slicing as an opportunity to wholesale capacity to bigger MVNO players.  This may end up being the thing that drives 5G forward fastest, forcing other players to step up too.  But Google and others have postured on entering the space and done little or nothing, so don’t hold your breath.