Service Lifecycle Modeling: More than Just Intent

I blog about a lot of things, but the topic that seems to generate the most interest is service lifecycle automation.  The centerpiece of almost every approach is a model, a structure that represents the service as a collection of components.  The industry overall has tended to look at modeling as a conflict of modeling languages; are you a YANG person, a TOSCA person, a TMF SID person?  We now have the notion of “intent modeling”, which some see as the super-answer, and there are modeling approaches that could be adopted from the software/cloud DevOps space.  How do you wade through all of this stuff?

From the top, of course.  Service lifecycle automation must be framed on a paramount principle, which is that “automation” means direct software handling of service events via some mechanism that associates events with actions based on the goal state of each element and the service overall.  The notion of a “model” arises because it’s convenient to represent the elements of a service in a model, and define goal states and event-to-process relationships based on that.
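
To make that concrete, here is a minimal sketch (Python, with purely hypothetical element names, states, and events) of what an event-to-process table keyed on the state and goal of each element might look like.  It illustrates the idea only; it is not any particular standard's API.

```python
# A purely hypothetical event-to-process table: (current_state, event) picks the
# process to run, driving each element toward its goal state.
LIFECYCLE_TABLE = {
    ("ordered",    "activate"):        "deploy_element",
    ("deploying",  "deploy_done"):     "mark_active",
    ("active",     "fault"):           "attempt_local_recovery",
    ("recovering", "recovery_failed"): "escalate_to_parent",
}

def handle_event(element, event):
    """Dispatch an event against the element's current state; in a real system
    the returned name would invoke the associated lifecycle process."""
    return LIFECYCLE_TABLE.get((element["state"], event))

# Example: a fault against an "active" element triggers local recovery.
firewall = {"name": "FirewallVNF", "state": "active", "goal": "active"}
print(handle_event(firewall, "fault"))   # -> attempt_local_recovery
```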

The problem with this definition as a single modeling reference is the term “service elements”.  A service is potentially a collection of thousands of elements.  Many of the elements are effectively systems of lower-level elements (like a router network), or complex elements like hosted virtual functions that have one logical function and multiple physical components.  The structural reality of networks generates four very specific problems.

Problem number one is deciding what you are modeling.  It is possible to model a service by modeling the specific elements and their relationships within the service itself.  Think of this sort of model as a diagram of the actual service components.  The problem this poses is that the model doesn’t represent the sequencing of steps that may be needed to deploy or redeploy, and it’s more difficult to use different modeling languages if some pieces of the process (setup of traditional switches/routers, for example) already have their own effective modeling approaches.  This has tended to emphasize the notion of modeling a service as a hierarchy, meaning you are modeling the process of lifecycle management, not the physical elements.

The second problem is simple scale.  If we imagine a model as a single structure that represents an entire service, it’s clear in an instant that there’s way too much going on.  Remember those thousands of elements?  You can imagine that trying to build a complete model of a large service, as a single flat structure, would be outlandishly difficult.  The issue of scale has contributed to the shift from modeling the physical service to modeling the deployment/redeployment steps.

Problem three is the problem of abstraction.  Two different ways of doing the same thing should look the same from the outside.  If they don’t, then changing how some little piece of a model is implemented could mean you have to change the whole model.  Intent modeling has become a watchword, and one of its useful properties is that it can collect different implementations of the same functionality under a common model, and can support hierarchical nesting of model elements, an essential property when you’re modeling steps or relationships, not the real structure.
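
As an illustration of that abstraction property, here is a small sketch (again Python, with invented class and method names) of two implementations of the same intent presenting one external contract, and of intents nesting inside other intents:

```python
# Hypothetical illustration: two implementations of the same intent present the
# same external properties, so the model above them never changes.
class IntentModel:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []     # nesting: an intent can contain intents

    def properties(self):
        # The external contract every implementation must honor.
        return {"function": self.name, "state": "active"}

    def deploy(self):
        raise NotImplementedError          # hidden interior, supplied per implementation

class FirewallOnVM(IntentModel):
    def deploy(self):
        return "boot VM image, attach virtual ports"

class FirewallOnContainer(IntentModel):
    def deploy(self):
        return "schedule container, wire service mesh"

# A parent element composes intents without knowing how each one is realized.
service = IntentModel("vCPE-Service",
                      children=[FirewallOnContainer("FirewallVNF")])
for child in service.children:
    print(child.name, "->", child.deploy())
```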

Problem four is suitability and leveraging.  We have many software tools already available to deploy hosted functions, connect things, set up VPNs, and so forth.  Each of these tools has proved itself in the arena of the real market; they are suitable to their missions.  They are probably not suitable for other missions; you wouldn’t expect a VPN tool to deploy a virtual function.  You want to leverage good stuff where it’s available, meaning you may have to adopt multiple approaches depending on just what you’re modeling.  I think that one of the perhaps-fatal shortcomings of SDN and NFV work to date is the failure to exploit things that were already developed for the cloud.  That can be traced to the fact that we have multiple modeling approaches to describe those cloud-things, and picking one would invalidate the others.

As I noted above, it may well have been the recognition of these points that promoted the concept of intent models.  An intent model is an abstraction that asserts specific external properties related to its functionality, and hides how they’re implemented.  There’s no question that intent models, if properly implemented, offer a major advance in the implementation of service lifecycle automation, but the “properly implemented” qualifier here is important, because they don’t do it all.

Back in the old IPsphere Forum days, meaning around 2008, we had a working-group session in northern NJ to explore how IPsphere dealt with “orchestration”.  The concept at the time was based on a pure hierarchical model, meaning that “service” decomposed into multiple “subservices”, each of which was presumed to be orchestrated through its lifecycle steps in synchrony with the “service” itself.  Send an “Activate” to the service and it repeats that event to all its subservices, in short.  We see this approach even today.

One of the topics of that meeting was a presentation I made, called “meticulous orchestration”.  The point of the paper was that it was possible that the subordinate elements of a given model (an “intent model” in today’s terminology) would have to be orchestrated in a specific order and that the lifecycle phases of the subordinates might not just mimic those of the superior. (Kevin Dillon was the Chairman of the IPSF at the time, hope he remembers this discussion!).

The important thing about this concept, from the perspective of modeling, is that it demonstrated that you might need to have a model element that had no service-level function at all, but rather simply orchestrated the stuff it represented.  It introduced something I called in a prior blog “representational intent.”  If you are going to have to deploy models, and if the models have to be intent-based and so contain multiple implementations at a given level, why not consider thinking in two levels: the model domain and the service domain?

In traditional hierarchical modeling, you need a model element for every nexus, meaning the end of every fork and every forking point.  The purpose of that model element is to represent the collective structure below, allowing it to be an “intent model” with a structured interior that will vary depending on the specific nature of the service and the specific resources available at points where the service has to be delivered or host functionality.  It ensures that when a recovery process for a single service element is undertaken and fails to complete, the recovery of that process at a higher level is coordinated with the rest of the service.

Suppose that one virtual function in a chain has a hosting failure and the intent model representing it (“FirewallVNF” for example) cannot recover locally, meaning that the place where the function was formerly supported can no longer be used.  Yes, I can spin up the function in another data center, but if I do that, won’t the connection that links it to the rest of the chain be broken?  The function itself doesn’t know that connection, but the higher-level element that deployed the now-broken function does.  Not only that, it’s possible that the redeployment of the function can’t be done in the same way in the new place because of a difference in technology.  Perhaps I now need a FirewallVNF implementation that matches the platform of the new data center.  Certainly the old element can’t address that; it was designed to run in the original place.
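
A minimal sketch of that escalation, with invented names and data, might look like this; the point is simply that the parent, not the failed child, owns the chain connections and the choice of a new implementation:

```python
# Invented data structures: the parent knows the chain links and the candidate
# sites; the failed child knows only its own function.
def parent_handle_failure(parent, failed_child, candidate_sites):
    for site in candidate_sites:
        impl = site["implementations"].get(failed_child["function"])
        if impl:                                      # an implementation fits this platform
            return {
                "redeploy":  impl,
                "site":      site["name"],
                "reconnect": parent["chain_links"],   # re-stitch the service chain
            }
    return {"escalate": "service-level"}              # nothing fits: go up another level

parent = {"chain_links": ["access->FirewallVNF", "FirewallVNF->router"]}
child  = {"function": "FirewallVNF"}
sites  = [{"name": "dc-2", "implementations": {"FirewallVNF": "fw-image-kvm"}}]
print(parent_handle_failure(parent, child, sites))
```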

You see how this has to work.  The model has to provide not only elements that represent real service components, but also elements that represent the necessary staging of deployment and redeployment tasks.  Each level of such a structure models context and dependency.

There are other approaches to modeling a service, of course, but the hierarchical approach that defines structure through successive decomposition is probably the best understood and most widely accepted.  But even that popular model has to be examined in light of the wide-ranging missions that transformation would be expected to involve, to be sure that we’re doing the right thing.

You could fairly say that a good modeling approach is a necessary condition for service lifecycle automation, because without one it’s impractical or even impossible to describe the service in a way that software can be made to manage.  Given that, the first step in lifecycle automation debates should be to examine the modeling mechanism to ensure it can describe every service structure that we are likely to deploy.

There are many modeling languages, as I said at the opening.  There may be many modeling approaches.  We can surely use different languages, even different approaches, at various places in a service model, but somehow we have to have a service model, something that describes everything, deploys everything, and sustains everything.  I wonder if we’re taking this as seriously as we must.

Missions and Architectures: Can the Two Meet?

What do Juniper and Nokia have in common, besides the obvious fact that both are network equipment providers?  The answer may be that the two are both trying to gain traction by making generalized SDN products more mission-specific.  “Laser focus?”  Juniper has announced a multi-cloud application mission for Contrail, and Nokia’s Nuage SDN product is getting a lot of operator traction as an SD-WAN platform.

What do they have in common with the major operator focus?  Apparently not much.  At this very moment, ETSI has formalized its zero-touch automation initiative, which appears to be aimed at broadening the architectural target of automation.  Is this “laser unfocus?”  Is there something going on here that we need to be watching?  I think so.

If you’ve followed the path of SDN and NFV, you know that both concepts burst on the scene to claims and stories that were nothing short of total infrastructure revolution.  Neither of the two has achieved anything like that.  SDN has had success in cloud data centers and in a few other applications, but has had no measurable impact on network infrastructure or operations so far.  NFV has been adopted in limited virtual-CPE applications and in some mobile missions, and in both cases has achieved these limited goals by narrowing its focus.  For vendors who need something to happen, this is a reasonable thing.

The common issue with SDN and NFV is one I’ve blogged about often.  “Transformation” isn’t a half-hearted thing you can tiptoe into.  The definition of the term is “a thorough or dramatic change”, after all.  If you apply it as network operators have for a decade now, it means a revolution in the business model of network operators, created and sustained through a comparable revolution in infrastructure.  In short, it’s big, and in the interest of making progress, neither SDN nor NFV has proved big enough.

Big change demands big justifying benefits to achieve a reasonable return on investment, and the problem with both SDN and NFV is that they have too narrow a scope to deliver those benefits.  In particular, both technologies focus on the infrastructure, not the total business model, and that’s where transformation has to start.  That decision by ETSI to launch a new zero touch automation group (properly called “Zero touch network and Service Management Industry Specification Group” or for short, ZSM ISG) is an acceptance of the need for a wider swath of realizable benefits, and also I think of the fact that the current processes, including both the ETSI NFV ISG and the TMF, are not going to achieve that goal fast enough, if at all.

Vendors aren’t going to throw themselves on the fire, though, so you’d have to assume that there was buyer receptivity for narrower missions, and there is.  Operators want transformation at one level, but at another level they also want to, even have to, do something right now.  Vendors, who are already seeing SDN and NFV take on the trappings of multi-generational abstract research, are certainly interested in making their numbers in the coming quarter.  It’s these two levels of behavior that we’re seeing in the news I cited in my opening, and the “right now” camp is resonating with vendors with that same short-term goal.

That leads to the question of whether an architecture can even work at this point, given the mission-focused, disconnected progress.  In past blogs, I’ve pointed out that it’s difficult to synthesize a total service automation solution from little disconnected pieces.  Yet we are seeing one camp moving toward bigger architectures and another moving to smaller applications of current architectures.  Surely having the goals rise while the applications of technology sink isn’t a happy situation.

The need to unify architectures and missions created the “proof-of-concept-low-apple” view of transformation.  You sing a broad and transforming tune, but you focus your attention on a set of limited, clearly useful, impact-conserving pieces of the problem.  The theory is that you can apply what you’ve learned to the problem at large.  We know now that isn’t working; if you focus on small things you tend to design for limited scope, and transformation (you will recall) is revolutionary by definition.  Hence our split, with vendors narrowing even the already-limited-in-scope stuff like SDN and NFV to specific missions, and operators looking to a new and broader body to handle the real transformation problem.

Is either of these things going to end up in the right place?  Can we, at this point, address the broader goals of proving “necessity and benefit of automating network and service operation in next generation networks”, as the ZSM ISG white paper suggests?  Will vendors, seeking quarterly returns through limited applications, be able to later sum these limited things into a glorious whole?

Those who won’t learn history are doomed to repeat it, so the saying goes.  The operators have now accepted the biggest problem with their earlier NFV initiative—it didn’t take a big enough bite out of even the low apple.  We can assume, since their quote demonstrates the correct scoping of the current effort, that mistake won’t be repeated.  Vendors like Juniper and Nokia should see that enterprises and service providers all want transforming changes, so we can assume that they will frame their local solutions so as to make them extensible.  What we can’t assume is that operators won’t make a different mistake by failing to understand the necessity of a strong technical architecture for their transformed future.  Or that vendors will somehow synthesize an overall approach from steps whose limited size and scope were designed to avoid facing the need for one.

Recall that we have two camps in the industry today: the “architecture” camp and the “mission” camp.  Whatever you think the balance between the two should be, there is one crystal truth, which is that only missions can get business cases.  You can’t talk about transformation without something to transform.  What is less accepted as a truth, but is true nevertheless, is that absent an architecture, a bunch of mission-specific advances leads to mission-specific silos.  That’s never efficient, but if what you’re trying to do is transform a whole ecosystem, it’s fatal.  The pieces of your service won’t assemble at all, much less optimally, and you’ll have to constantly convert and adapt.

Right now, we have years of work in technologies for network transformation without comparable thinking on the question of how to combine them to create an efficient operational ecosystem.  We are not going back to redo what’s been done, so we have to figure out a way of retrofitting operations to the sum of the needs the missions create.  This is a very different problem, and perhaps a new body will be able to come at it from a different direction.  It’s not an easy fix, though.  The mission guys don’t speak software and the architecture guys can’t understand why we’re not talking about abstract programming techniques and software design patterns.  The two don’t talk well to each other, because neither really understands the other.

So, do we do kumbaya songfests by the fire to introduce them?  No, because remember it’s the mission guys who have the money.  If we want to get a right transformation answer out of the ZSM ISG, then software people will have to learn to speak mission.  They’ll have to talk with the mission people, frame a set of mission stories with their architecture at the center of each, and convince the mission people that the approach makes the business case for their mission.  Not that it makes all business cases, not that it’s the superior, modern, sexy way of doing things.  That it makes the business case for each buyer and still, well-hidden under the technical covers, makes it for all and in the right way.

Is there a pathway to doing this?  If there is, then getting on it quickly has to be the goal of the ZSM ISG process, or we’ve just invented NFV ISG 2.0, and vendors will be carving out little missions that match incomplete solutions.

What Does Verizon’s Dropping IPTV FiOS Mean for Streaming Video?

Verizon is reportedly abandoning its streaming video platform, say multiple online technology sources.  That, if true, raises some very significant questions because it could mean that Verizon has abandoned streaming as a delivery strategy for FiOS TV.  If that’s true, then what does it mean for the 5G/FTTN hybrid model of broadband that Verizon has been very interested in?

I can’t confirm the story that FiOS IPTV is dead, but it sure seems from the coverage that there are multiple credible sources.  Verizon has been dabbling with the notion of IPTV for FiOS for some time, and for most of the period it was seen as a way of competing with the AT&T DirecTV franchise, which can deliver video outside AT&T’s wireline footprint, meaning in Verizon’s region.  The best defense is a good offense, so IPTV could have let Verizon take the fight back to AT&T.  I think that AT&T was indeed the primary force in the original Verizon IPTV plan.

AT&T has further complicated the situation since FiOS IPTV was first conceptualized.  DirecTV Now, which for at least some AT&T mobile customers is available cheaply with no charge against data usage, elevates DirecTV competition into the mobile space.  You could argue that what Verizon really needs is a way of offering off-plan viewing of TV shows to its mobile customers, to counter what’s clearly a trend toward special off-plan content deals from competitors.

A true unlimited plan without any video throttling would support streaming without off-plan special deals, of course, but for operators who have content properties in some form, the combination of those content elements and mobile services offers better profit than just allowing any old third-party video streaming player to stream over you.  Even, we point out, a competitor.

On the other side of the equation is the fact that if Verizon really plans to replace DSL tail circuits from FTTN nodes with 5G millimeter wave and much better broadband, would it not want to be able to sell “FiOS” to that customer group?  Some RF engineers tell me that it is theoretically possible to broadcast a full cable-TV channel complement over 5G/FTTN.  However, you use up a lot of millimeter-wave bandwidth with all those RF channels, and remember that viewers are more interested in bundles with fewer channels, even in a la carte video.  Surely it would be easier to just stream shows over IP.  Streaming over IP would also be compatible with mobile video delivery, something that could be helpful if Verizon elected to pair up its 5G/FTTN millimeter-wave stuff with traditional lower-frequency 5G suitable for mobile devices.  Or went traditional 5G to the home.

So does the fact (assuming again the story is true) that Verizon is bailing on IPTV FiOS mean it’s not going to do 5G/FTTN or won’t give those customers video?  I think either is incredibly unlikely, and there is another possible interpretation of the story that could be more interesting.

Look at the home TV when FiOS first came out.  It had an “antenna” and a “cable” input, and the latter accommodated a set-top box that delivered linear RF video.  Look at the same TV today.  It has an Internet connection and increasingly a set of features to let you “tune” to Internet streaming services.  A growing number have features to assemble a kind of streaming channel guide.  The point is that if we presumed that everything we watched was streamed, we wouldn’t need an STB at all unless we lacked both a smart TV and a device (Apple TV, Chromecast, Fire TV, Roku, or whatever) that would let us adapt a non-Internet TV to streaming.

In this light, a decision by Verizon to forego any IPTV version of FiOS looks a lot smarter.  Why invent an STB for video technology that probably every new model of TV could receive without such a device?  In my own view, the Verizon decision to drop IPTV FiOS plans is not only non-destructive to its 5G/FTTN strategy, it serves that strategy well.  So well, in fact, that when enough 5G/FTTN rolls out, Verizon is likely to start phasing in the streaming model to its new FiOS customers, then to them all.

Even the competitive situation favors this kind of move.  A pure STB-less streaming model is much easier to introduce out of area, to competitive provider mobile customers, etc.  It has lower capex requirements, it’s more suited to a la carte and “specialized bundle” models, and thus gives the operator more pricing flexibility.  Add to that the fact that the cable operators, who currently have to assign a lot of CATV capacity to linear RF channels, are likely to themselves be forced to adopt IP streaming, and you can see where Verizon would be if they somehow tried to stay with RF.

You might wonder why all of this seems to be coming to a head now, when it was at least a possible model for the last decade.  I think the answer is something I mentioned in a recent blog; mobile video has essentially separated “viewers” from “households.”  If you give people personal video choices, they tend to adopt them.  As they do, they reduce the “watching TV as a family” paradigm, which is what’s sustained traditional viewing.  My video model has suggested that it’s the early-family behavior that sets the viewing pattern for households.  If you give kids smartphones, as many already do, then you disconnect them from family viewing very quickly.

Time-shifting has also been a factor.  The big benefit of channelized TV is that you only have to transport a stream once.  If you’re going to time-shift, the benefit of synchronized viewing is reduced, and probably to the level where caching is a suitable way of optimizing delivery bandwidth.  Anyway, if you presumed that “live” shows were cached at the serving office level, you could multi-cast them to the connected homes.  Remember, everyone needs to have a discrete access connection except where you share something like CATV channels.

I think that far from signaling that Verizon isn’t committed to streaming, the decision to drop the IPTV FiOS platform is a signal that they’re committed to where streaming is heading, rather than to a channelized view of a streaming model.  If channels are obsolete, so for sure are set-top boxes.

Enterprise Budgets in 2018: More Questions but Some Clarity

Network operators obviously buy a lot of network gear, but so do enterprises.  In my past blogs I’ve tended to focus on the operator side, largely because my own practice involves more operators and their vendors than it does enterprises.  To get good enterprise data, I have to survey explicitly, which is too time-consuming to do regularly.  I did a survey this fall, though, and so I want to present the results, and contrast them with what I’ve gotten on the operators this year.

Enterprises have a shorter capital cycle than operators, typically three years rather than five, and they experience faster shifts in revenue than operators usually do.  As a result, their IT spending is more variable.  They also traditionally divide IT spending into two categories—“budget” spending that sustains current IT commitments, and “project” spending that advances the use of IT where a favorable business case can be made.

The biggest question that I’ve tried to answer with respect to enterprise IT has always been where that balance of budget/project spending can be found.  In “boom” periods, like the ‘90s, I found that project spending led budget spending by almost 2:1.  Where IT spending was under pressure, the ratio shifted back to 1:1, then below, and that’s what has happened since 2008 in particular.  In 2006, which was my last pre-recession survey of enterprises, project spending was 55% of total IT spending.  It slipped to its recession low of 43% in 2009, gained ground through 2015, and then began to slip again.

This year, project spending was 49% of total IT spending, and enterprises suggest that it could fall as low as 39% in 2018, which if true would be the lowest level since I surveyed first in 1982.  It could also rise to as much as 54%, which would be good, and this is obviously a fairly broad range of possibilities.  Interestingly, some Wall Street research is showing the same thing, though they don’t express their results in exactly the same terms.  The point one Street report makes is that IT was once seen as an “investment area” and is now seen as a “cost area”, noting that the former state was generated because it was believed that IT investment could improve productivity.

CIOs and CFOs in my survey agreed that 2018 would see more IT spending, but they disagreed on the project/budget balance, with CIOs thinking there would be more project spending to take advantage of new productivity opportunities, and CFOs thinking that they’d simply advance the regular modernization of infrastructure.  It’s this difference in perspective that I think accounts for the wider range of project/budget balance projections for next year.

Where this aligns with network operator thinking is fairly obvious.  I noted that operators had a lot of good technology plans and difficulty getting them budgeted as recommended.  That seems to be true for enterprises too.  CIOs think that there’s more that IT could do, but CFOs aren’t yet convinced that these new applications can prove out in business terms.

That’s the heart of the problem with the enterprise, a problem that in a sense they share with the operators.  Absent a benefit, a business case, you can’t get approval for new tech projects in either sector.  In the enterprise, it’s productivity gains that have to be proved, and with operators it’s new revenue.  In both sectors, we have a mature market where the low apples, the high-return opportunities, have already been picked.  What’s left either depends on unproven technology or offers benefits that are less easily quantified.  In either case, approvals are harder now.  That won’t change until a new paradigm emerges.

Tech isn’t a paradigm, it’s the mechanization of one.  You can improve software, software architecture, data mining, or whatever you like, and what you have done is valuable only if you can use that change to make a business case, to improve productivity or revenues.  We’re good at proposing technology changes, less good at validating the benefits of the changes.  Till that improves, we’ll probably under-realize on our key technology trends.

Except, perhaps, in one area.  Technology that reduces cost is always credible, and enterprises tell me that an astounding nine out of ten cost-saving technology projects proposed this year are budgeted for 2018.  This includes augmented cloud computing, some hyperconvergence projects, and in networking the application of SD-WAN.  In productivity-driven projects, only three in ten were approved.

It’s interesting to see how vendor influence interacts with project priority, and here there are some differences between operators and enterprises.  Operators have always tended to be influenced most by the vendors who are the most incumbent, the most represented in their current infrastructure and recent purchases.  Enterprises have tended to shift their vendor focus depending on the balance of future versus past, and the project/budget balance is an indicator there too.  This year, the vendors who gained influence in the enterprise were the ones that the enterprise associated with the cloud—Microsoft, Cisco, and (trailing) IBM.  There’s a different motivation behind each of the three.

Microsoft has been, for some time, the dominant cloud computing influence among enterprises, not Amazon.  I’ve noted in the past that a very large chunk of public cloud revenues comes from social media and content companies, not from enterprises.  Microsoft’s symbiotic cloud positioning, leveraging data center and public cloud hybridization, has been very favorably received.  Twice as many enterprises say that Microsoft is a trusted strategic partner in the cloud as name Amazon.

Microsoft has some clear assets here.  First, they have a data center presence and a cloud presence.  Vendors who rely totally on Linux servers have the disadvantage of sharing Linux with virtually every other server vendor, whereas Microsoft has its own software technology on prem.  They also have, as a result of that, a clear and long-standing hybrid cloud vision.  Finally, they can exploit their hybrid model to use the cloud as a tactical partner for apps that need more elastic resources, faster deployment, more agility.  It’s winning for Microsoft, so far.

Cisco as a leading influence frankly surprised me, but when I looked at the reason behind the choice it makes sense.  To a CIO, the transformation to a hybrid cloud is a given.  That is seen as being first and foremost about the network accommodation of more complex and diverse hosting options, which implicates the corporate VPN, which usually means Cisco.  Cisco is also the only prime network vendor seen as having direct cloud computing credentials.

Cisco doesn’t have the clear public-cloud link that Microsoft has, which means that they can’t reap the full benefit of hybridization in hosting.  Some enterprises think that this makes Cisco pull back from opportunities that need to be developed at the cloud service level, confining them more to the network than they might like.  Others note that Cisco is getting better at cloud advocacy.  Their recent purchase of cloud-management provider Cmpute.io may be a sign of that; it could give them entrée into hybridization deals.

Third-place IBM didn’t surprise me, in large part because IBM has always had very strong strategic account control among enterprises.  IBM did slip in terms of influence, though.  Its major competitors, HPE and Dell, slipped less and thus gained a bit of ground.  Still, both companies have only started to recover from a fairly long slide in terms of enterprise strategic influence.  There’s at least some indication that either or both could displace IBM by the end of the year.

IBM’s assets, besides account control, lie in its software resources, but it’s still struggling to exploit them in a cloud sense.  Having divested themselves of a lot of hardware products, they have the skin-in-the-game problem Cisco has, and unlike Microsoft their own cloud services haven’t exactly blown competition out of the market.  Among the largest enterprises, IBM is still a power.  Elsewhere is another story.

Enterprises will spend more on tech in 2018, largely because they tend to budget more directly in relationship to revenue expectations than operators do.  Their biggest focus will be the modernization of what they have, and that’s what will drive them to things like higher-density servers; container software to improve hosting efficiency is second.  The cloud is third, and that’s where some potential for productivity enhancement and some focus on cost management collide.  If that collision generates good results in 2018, we can expect a decisive shift to productivity-directed measures, a shift toward the cloud.

Network Operator Technology Plan to Budget Transition: First Look

I continue to get responses from operators on the results of their fall planning cycle, on my analysis of those results, and on their planning for 2018.  The interesting thing at this point is that we’re in the period when technology planning, the focus of the fall cycle, collides with the business oversight processes that drive budgeting, the step that’s now just starting and will mature in January and February.  In fact, it’s this transition/collision that’s generating the most interesting new information.

Typically, the fall technology cycle leads fairly directly to budget planning.  In the ‘90s, the first decade for which I had full data, operators told me that “more than 90% of their technology plans were budgeted as recommended”.  That number fell just slightly in the first decade of this century, and in fact it held at over 80% for every year up to last year.  In 2016, only about three-quarters of the technology plans were budgeted, and operators told me this year that they expected less than two-thirds of their technology plans to be funded next year.

The problem, say the operators, is that it is no longer fashionable or feasible for CFOs to blindly accept technical recommendations on network changes.  In the ‘90s, half the operators indicated that CFOs took technology plans as the primary influence on budget decisions.  By 2009 that had fallen to a third, and today only 17% of operators said that technology plans were the primary driver.  What replaced the technology plans was business case analysis, which in 1985 (when I specifically looked at this issue) was named as an important factor in just 28% of cases.

Part of the reason for the shift here, of course, was the breakup of the Bell System and the Telecom Act of 1996.  Regulated monopolies don’t really need to worry much about business case, after all.  But remember that the biggest difference in behavior has come since 2015, and through all that period the regulatory position of operators was the same.  The simple truth is that operators are finally transitioning from being de facto monopolies, with captive markets, into simple competitors, and as such they can’t just toss money to their network weenies.

So how did this impact technology plans?  Well, 5G plans were budgeted at a rate over 80%, but nearly all the stuff that’s been approved relates to the millimeter-wave 5G hybridization with FTTN, and the remainder still relates to the New Radio model.  Everything beyond those two topics is just trials at this stage, and in many cases very limited trials at that.  But 5G, friends, is the bright spot.

SDN was next on the list of stuff that got more budget than average.  Operators said that almost half of SDN projects were budgeted, but remember that the majority of these projects involved data center switching.  If we looked outside the data center and restricted ourselves to “SDN” meaning “purist ONF-modeled” SDN, about a quarter of the technology plans were budgeted.

NFV fared worse, to no surprise.  It had less than a third of its planned projects budgeted, and nearly all of those that won were related to virtual CPE and business edge services.  The actual rate of mobile-related (IMS/EPC) success was higher, but the number of these projects was small and the level of commitment beyond a simple field trial was also limited.

Worst on the list was “service lifecycle automation”, which had less than a quarter of the technology plans budgeted.  According to operators, the problem here is a lack of a consistent broad focus for the projects.  The fuzzier category of “OSS/BSS modernization” that I’ve grouped into this item did the best, but the goals there were very narrow and inconsistent.  Three operators had ambitious closed-loop automation technology plans, none of which were approved.

Interestingly, the results in all of these categories could be improved, say the CFO organizations, if CTO, CIO, and COO teams could come up with a better business case, or make a better argument for the benefits being claimed.  According to the operators, these adjustments could be made with little reduction in budgets even as late as May, but if nothing significant were done by then, it would be likely that fixing the justifications would result in limited spending next year and more budget in 2019.

The second thing that came out of the comments I received is that even operators who weren’t among the 77 in my survey base were generally in accord with the results of the fall technology planning survey.  There were a few that were not, and those I heard from were generally associated with atypical marketing opportunities or competitive situations.  National providers with no competition and business-only providers made up the majority of the dissenters.  I suspect, but can’t prove, that those who said their own views/results had been very different were expressing technology optimism more than actual different experiences, but since I don’t have past survey results to validate or invalidate this judgment, I have to let the “why” go.  Still, I do need to say that among non-surveyed operators, the view on SDN and NFV is a bit more optimistic.

A couple of other points that emerged are also interesting.  None of the operators who got back to me after the fall planning cycle thought that they would likely take aggressive action in response to any relaxation in net neutrality rules.  They cited both a fear of a return to current regulations and a fear of backlash, with the former a general concern and the latter related mostly to things like paid prioritization and settlement.  Operators need consistency in policy, and so far they don’t see a lot of that in most global regulatory jurisdictions.  I’d point out that in most markets, a commission is responsible for policy but operates within a legislative framework, and thus it would take a change of law to create more consistent policy.

Another interesting point that I heard from operators was that they’re becoming convinced that “standards” in the traditional sense are not going to move the ball for them going forward.  In fact, about ten percent of operators seem to be considering reducing their commitment to participation in the process, which means sending fewer people to meetings or assigning them to work specifically on formal standards.  On the other hand, three out of four said they were looking to commit more resources to open-source projects.

Operators have had a love/hate relationship with standards for at least a decade, based on my direct experience.  On the one hand, they believe that vendors distort the formal standards process by pushing their own agendas.  Operators, they point out, cannot in most markets control a standards body or they end up being guilty of anti-trust collusion.  They hope that open-source will be better for them, but they point out that even in open-source organizations the vendors still tend to dominate with respect to resources.  That means that for an operator to advance their specific agenda, they have to do what AT&T has done with ECOMP, which is develop internally and then release the result to open-source.

The final point was a bit discouraging; only one in ten operators thought they’d advance significantly on transformation in 2018, but there was never much hope that 2018 would be a big year.  The majority of operators said in 2016 that transformation would get into high gear sometime between 2020 and 2022.  That’s still what they think, and I hope that they’re right.

Are We Seeing the Sunset of Channelized Live TV?

There is no question that the video space and its players are undergoing major changes.  It’s not clear where those are leading us, at least not yet.  For decades, channelized TV has been the mainstay of wireline service profit, and yet it’s more threatened today than ever before.  Where video goes, does wireline go?  What then happens to broadband?  These are all questions that we can explore, but perhaps not quite answer.

With VoIP largely killing traditional voice service profit growth and Internet revenue per bit plummeting, operators have come to depend on video in the form of live TV to carry their profits forward.  At the same time, the increased reliance on mobile devices for entertainment has radically increased interest in streaming non-live video from sources like Amazon, Hulu, and Netflix.  The combination has also generated new live-TV competition from various sources, including Hulu and AT&T, and more ISPs are planning to offer some streaming options in the future.

At the same time, the fact that streaming means “over the Internet” and the Internet is agnostic to the specific provider identity means that many content sources can now think about bypassing the traditional TV providers.  The same sources are looking to increase profits, and so increase the licensing fees charged to the TV providers.  Those providers, also looking for a profit boost, add their own tidbit to the pricing to users, which makes users unhappy with their channelized services and interested in streaming alternatives.

I’ve been trying to model this for about five years, and I think I’ve finally managed to get some semi-useful numbers.  Right now, in the US market, my research and modeling says that about a third of all TV viewers regularly use some streaming service, and about 12% have a “live” or “now” option.  It appears that 8% or so exclusively stream, meaning they have truly cut the cord.  This data has to be viewed with some qualifiers, because many young people living at home have only streaming service but still get live TV from the household source.  It’s this that accounts for what Nielsen has consistently reported as fairly constant household TV viewing and at the same time accounts for a rise in streaming-only users.  Households and users aren’t the same thing.

Wireline services serve households and mobile services serve users.  The fact that users are adopting streaming because of increased mobile dependence isn’t news.  What is news is that this year it became clear that traditional channelized viewing was truly under pressure at the household level.  People seem to be increasingly unhappy with the quantity and quality of original programming on traditional networks, and that translates to being less willing to pay more every year for the service.

In my own limited survey of attitudes, what I found was that about two-thirds of viewers don’t think their TV service offers good value.  That number has been fairly steady for about four years.  Of this group, the number who are actively looking to reduce their cost has grown over that four years from about a fifth to nearly half.  Where TV providers have offered “light” or “a la carte” bundles, users tend to adopt them at a much higher rate than expected.  All of this is a symptom that TV viewing is under growing pressure, and that the “growing” is accelerating.

The most obvious consequence of this is the desire of cable/telco giants and even streaming video players to get their own video content.  Comcast buys NBC Universal, AT&T wants Time Warner, and Amazon and Netflix are spending a boatload on original content.  I don’t think anyone would doubt that this signals the belief of the TV delivery players that content licensing is going to be ever more expensive, so they need to be at least somewhat immune to the cost increases.  Ownership of a TV network is a great way to limit your licensing exposure, and also to hedge cost increases because you’ll get a higher rate from other players.

Another obvious impact of a shift toward streaming is that you don’t need to own the wireline infrastructure that touches your prospects.  You don’t need to own any infrastructure at all, and that means that every operator who streams can feed on the customers of all its competitors.  Those who don’t will become increasingly prey rather than predator.  And the more people start to think in terms of finding what they want when they need it, rather than viewing what’s on at a given time, the less value live TV has as a franchise.

I think it’s equally obvious that the TV industry has brought this on themselves, in a way.  For wireline TV players, their quest for mobile service success has promoted a whole new kind of viewing that’s seducing users away from traditional TV, even where it leaves household connections intact.  A family of four would likely select a fairly fat bundle to satisfy everyone, but if half the family is out with friends viewing mobile content, will they need the same package?  Competition for higher access speeds as a differentiator also creates more bits for OTTs to leverage, and encourages home consumption of streaming.

The quality of material is also an issue.  If you want “old” material you clearly can get most of it from somebody like Amazon, Hulu, or Netflix.  If you want new material, you’re facing much shorter production seasons, a reported drop in the quality of episodes, and higher prices.  Every time someone finds that their favorite shows have already ended their fall season (which apparently ran about 2 months this year) and goes to Amazon to find something to view, they are more likely to jump to streaming even when something is new, because they expect more.

To me, the revolutionary truth we’re finally seeing is that “viewers” are increasingly separating from “households”.  We’ve all seen families sitting in a restaurant, with every one of them on their phones, ignoring the others.  Would they behave differently at home?  Perhaps many think they should, but markets move on reality and not on hopes, and it seems that personal mobile video is breaking down the notion of collective viewing, which means it’s breaking down channelized TV bundles, which means it’s eroding the whole model of channelized TV.

If you need to stream to reach mobile users, if mobile users are the ones with the highest ARPU and the greatest potential in producing future profits, and if streaming is going to reshape viewer behavior more to favor stored rather than live shows, then when streaming hits a critical mass it will end up reshaping the industry.  That probably won’t happen in 2018, but it could well happen in 2019, accelerated by 5G/FTTN deployments.

I’ve been mentioning 5G and fiber-to-the-node hybrids in my recent blogs, and I think it’s warranted.  This is the part of 5G that’s going to be real, and quickly, and it has many ramifications, the shift toward streaming being only one of them.  5G/FTTN, if it truly lowers the cost of 100 Mbps and faster “wireline” Internet radically, could even boost competition in the access space.  New York City did an RFI on public/private partnership strategies for improving broadband speed and penetration, and other communities have been interested in municipal broadband.  The 5G/FTTN combination could make it possible in many areas.

Ah, again the qualifying “could”.  We don’t know how much 5G/FTTN could lower broadband cost, in part because we do know that opex is the largest cost of all.  Current players like Verizon have the advantage of an understanding of the total cost picture, and a disadvantage in that they have so many inefficient legacy practices to undo.  New players would have to navigate the opex side in a new way, to be a pioneer in next-gen closed-loop practices that nobody really has proved out yet.  We’ll surely see big changes in video, but the critical transition of “wireline” from channelized to streaming will take some time.

Another Slant on the Service Lifecycle Automation Elephant

I asked in an earlier blog whether the elephant of service automation was too big to grope.  The Light Reading 2020 conference this week raised a different slant on that old parable, which is whether you can build an elephant from the parts you’ve perceived by touching them.

Wikipedia cites the original story of the elephant as:

A group of blind men heard that a strange animal, called an elephant, had been brought to the town, but none of them were aware of its shape and form. Out of curiosity, they said: “We must inspect and know it by touch, of which we are capable”. So, they sought it out, and when they found it they groped about it. In the case of the first person, whose hand landed on the trunk, said “This being is like a thick snake”. For another one whose hand reached its ear, it seemed like a kind of fan. As for another person, whose hand was upon its leg, said, the elephant is a pillar like a tree-trunk. The blind man who placed his hand upon its side said, “elephant is a wall”. Another who felt its tail, described it as a rope. The last felt its tusk, stating the elephant is that which is hard, smooth and like a spear.

My original point was that it’s possible to envision an elephant/problem that’s just too big to be grasped (or groped!) in pieces.  The question raised by the Light Reading piece is whether you can assemble an elephant from what you thought the pieces were.  Snakes and fans and walls and spears don’t seem likely to provide the right tools for elephant-assembly, right?  Might it be that the tools of our current transformation process don’t add up to the whole at all?

Most of the key initiatives to support transformation of the network operator business model have been deliberately localized in terms of scope.  SDN and NFV, for example, didn’t cover the operations side at all; it was out-of-scope from the first.  It’s also out-of-scope for 5G and IoT, and both these initiatives talk about SDN and NFV but acknowledge that the standards for them are created elsewhere.

Like the blind men in the Wikipedia version of the elephant story, the standards bodies are dealing with what’s in scope to them, and that has totally fragmented the process of transforming things.  This fragmentation isn’t critical when the standards cover an area fully enough to guide and justify deployment, but remember that “transformation” is a very broad goal.  Addressing it will require very broad initiatives, and we have none of them today.

If the whole of transformation can’t be created by summing the parts of initiatives we’ve undertaken, can we really expect to get there?  I’ve been involved in three international standards activities aimed at some part of transformation, and indirectly associated with a fourth.  I’ve seen three very specific problems that fragmentation of the standards initiatives has created, and any one of them could compromise our goals.

The first problem is the classical “dropped ball” problem.  For example, if the NFV ISG decided that operations is out of scope, how does the body ensure that any specific operations impacts of their activity are met by somebody?  The classic standards-group answer is “liaisons”, between the groups, but we’re still seeing liaison requests submitted and approved by the ISG four years after the process started.

What we’re lacking to address this problem effectively is a single vision of the parts that have to add up to our glorious transformation whole.  Not the details, just the identification of the total requirements set and how that set is divided up among the bodies doing the work.  This could, of course, guide liaison by identifying what is essential in the way of the relationships across the various groups.  It could also bring to the fore the understanding that there are areas in the work Group A is doing that can be expected to heavily impact Group B, thus showing that there needs to be special attention given to harmonization.

There’s nowhere this is more obvious than in the relationship between NFV and the cloud.  What is a VNF, if not a network application of cloud computing principles?  We were doing cloud computing before NFV ever started, and logically should have used cloud computing standards as the basis for NFV.  I firmly believe (and have believed from the first) that the logical way to do NFV was to presume that it was a lightweight organizing layer on top of cloud standards.  That’s not how it’s developed.

The second problem is the “Columbus problem”.  If you start off looking for a route to the East and run into an entirely new continent instead, how long does it take for you to realize that your original mission has resulted in myopia, and that your basic premise was totally wrong?

We have that problem today in the way we’re looking at network transformation.  Anyone who looks at the way operator costs are divided, or that talks with operators about where benefits of new technologies would have to come from, knows that simple substitution of a virtual instance of a function (a VNF) for a device (a “physical network function” or PNF) isn’t going to save enough.  In 2013, most of the operators who signed off on the original Call for Action admitted they could get at least that level of savings by “beating Huawei up on price”.  They needed opex reduction and new service agility to do the job, and yet this realization didn’t impact the scope of the work.

The final problem, perhaps the most insidious, is the “toolbox problem.”  You start a project with a specific notion of what you’re going to do.  You have the tools to do it in your toolbox.  You find some unexpected things, and at first you can make do with your toolbox contents, but the unexpected keeps happening, and eventually you realize that you don’t have what you need at all.  I’ve done a lot of runs to Lowes during do-it-yourself projects so I know this problem well.

The current example of this problem is the notion of orchestration and automation.  You can perform a specific task, like deploying a VNF, with a script that lets you specify some variable parameters.  But then you have to be able to respond to changes and problems, so you need the notion of “events”, which means event-handling.  Then you increase the number of different things that make up a given service or application, so the complexity of the web of elements increases, and so does the number of events.  If you started off thinking that you had a simple model-interpreter as the basis for your orchestration, you now find that it can’t scale to large-sized, event-dense, situations.  If you’d expected them from the start, you’d have designed your toolbox differently.

Architecturally speaking, everything we do in service lifecycle processing should be a fully scalable microservice.  Every process should scale with the complexity of the service we’re creating and selling, and the process of coordinating all the pieces through exchange of events should be structured so that you can still fit in a new/replacement piece without having to manually synchronize the behavior of the new element or the system as a whole.  That’s what the goals for service lifecycle automation, zero-touch automation, closed-loop automation, or whatever you want to call it, demand.  We didn’t demand it, and in many cases still aren’t demanding it.
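
Here is a tiny sketch of what that microservice property implies, assuming (hypothetically) that all per-service state lives in a shared model store rather than in the handler process itself, so any number of handler copies can run in parallel as event volume grows:

```python
import json

def handle_lifecycle_event(event, model_store):
    """Stateless worker: read the element's record, act on the event, write it
    back.  Because no state lives in the process, N copies can run in parallel."""
    record = json.loads(model_store[event["element_id"]])
    if event["type"] == "fault" and record["state"] == "active":
        record["state"] = "recovering"
    elif event["type"] == "recovery_done":
        record["state"] = "active"
    model_store[event["element_id"]] = json.dumps(record)
    return record["state"]

# A plain dict stands in for a shared datastore here.
store = {"FirewallVNF-01": json.dumps({"state": "active"})}
print(handle_lifecycle_event({"element_id": "FirewallVNF-01", "type": "fault"}, store))
```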

None of these problems are impossible to solve; some are already being solved by some implementations.  But because we’ve not particularly valued these issues, or how they’re covered, vendors haven’t paid much attention to explaining how they address them.  Buyers don’t know who does and who doesn’t, which reduces the benefit of doing the right thing.

We also need to take steps in both the standards area and the transformation-related open-source projects to stop these kinds of issues from developing or worsening.  Some sort of top-down, benefit-to-project association would be a good start, an initiative to start with what operators expect from transformation and align the expectation with specific steps and architectural principles.  This wouldn’t be difficult, but it might then be hard to get the various standards bodies to accept the results.  We could try, though, and should.  Could some major standards group or open-source activity not step up and announce something along these lines, or even a credible vendor or publication?

Nobody wants to commit to things that make their work more complicated.  Nobody likes complexity, but if you set about a complex process with complex goals, then complexity you will have, eventually.  If you face that from the first, you can manage things at least as well as the goals you’ve set permit.  If you don’t, expect to make a lot of trips to Lowes as you try to assemble your elephant, and they probably won’t have the right parts.

What are the Options and Issues in AI in Networking?

It looks like our next overhyped concept will be AI.  To the “everything old is new again” crowd, this will be gratifying.  I worked on robotics concepts way back in the late 1970s, and also on a distributed-system speech recognition application that used AI principles.  Wikipedia says the idea was introduced in 1956, and there was at one time a formal approach to AI and everything.  Clearly, as the term gains media traction, we’re going to call anything that has even one automatic response to an event “AI”, so even simple sensor technologies are going to be cast in that direction.  It might be good to look at the space while we can still see at least the boundaries of reality.

In networking, AI positioning seems to be an evolution of “automation” and “analytics”, perhaps an amalgamation of the two concepts.  Automation is a broad term applied to almost anything that doesn’t require human intervention; to do something on a computer that was once done manually is to “automate” it.  Analytics is typically used to mean the application of statistical techniques to draw insight from masses of data.  “Closed-loop” is the term that’s often used to describe systems that employ analytics and automation in combination to respond to conditions without requiring a human mediator between condition and action.  AI is then an evolution of closed-loop technology, enhancing the ability to frame the “correct” response to conditions, meaning events.
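
As a toy illustration of how these terms fit together (the numbers and actions below are invented), the “analytics” step reduces raw samples to a condition and the “automation” step maps the condition to an action, with no human mediator in between; that’s all “closed-loop” really means at this level.

# Toy closed loop: analytics draws an inference, automation acts on it.
# Thresholds and actions are illustrative placeholders only.
from statistics import mean

def analyze(latency_samples_ms):
    """Analytics: reduce raw samples to an insight."""
    avg = mean(latency_samples_ms)
    return "DEGRADED" if avg > 50 else "NORMAL"

def act(condition):
    """Automation: map the insight to an action, no human mediator."""
    if condition == "DEGRADED":
        return "reroute-traffic"     # placeholder remedial action
    return "no-op"

samples = [42, 48, 67, 73, 55]
print(act(analyze(samples)))         # closed loop: condition -> action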

There have been many definitions and “tests” for artificial intelligence, but they all seem to converge on the notion that AI systems have the ability to act as a human would act, meaning that they can interpret events and learn behaviors.  We seem to be adopting a bit broader meaning today, and thus we could say that in popular usage, AI divides into two classes of things—autonomous or self-organizing systems that can act as a human would based on defined rules, and self-learning systems that can learn rules through observation and behavior.

The dividing line between these two categories is fuzzy.  For example, you could define a self-drive car in a pure autonomous sense, meaning that the logic of the vehicle would have specific rules (“If the closure rate with what is detected by front sensors exceeds value x, apply the brakes until it reaches zero.”) that would drive its operation.  You could, in theory, say that the same system could “remember” situations where it was overridden.  Or you could say that the car, by observing driver behavior, learned the preferred rules.  The first category is autonomous, the second might be called “expanded autonomous” and the final one “self-learning”.  I’ll use those terms in this piece.
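
Here’s a hedged little sketch of the three flavors, with made-up thresholds: a fixed rule, the same rule with a memory of overrides, and a rule whose threshold is learned from watching the driver.

# Illustrative only: three flavors of "AI" braking logic, invented numbers.

def autonomous_brake(closure_rate):
    """Pure autonomous: fixed, predefined rule."""
    return closure_rate > 5.0        # brake if closing faster than 5 m/s

class ExpandedAutonomous:
    """Autonomous, but remembers when the driver overrode the rule."""
    def __init__(self):
        self.overrides = []
    def brake(self, closure_rate, driver_overrode=False):
        if driver_overrode:
            self.overrides.append(closure_rate)
        return closure_rate > 5.0

class SelfLearning:
    """Learns its own threshold from observed driver braking."""
    def __init__(self):
        self.observed = []
    def observe(self, closure_rate_when_driver_braked):
        self.observed.append(closure_rate_when_driver_braked)
    def brake(self, closure_rate):
        learned = sum(self.observed) / len(self.observed) if self.observed else 5.0
        return closure_rate > learned

print(autonomous_brake(6.0))                  # True, fixed rule fires
ea = ExpandedAutonomous()
print(ea.brake(6.0, driver_overrode=True))    # True, and the override is remembered
learner = SelfLearning()
for sample in (4.2, 3.8, 4.5):                # watch the driver brake three times
    learner.observe(sample)
print(learner.brake(4.0))                     # False; learned threshold is about 4.2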

Equipped, at least if you accept these terms as placeholders, to classify AI behaviors, we can now look at what I think is the top issue in AI: the rule of override.  Nobody wants a self-driving car that goes maverick.  To adopt AI in any form you have to provide the system with a manual override, not only in the sense that there might be a “panic button” but in the sense that the information the user needs to make an override decision is available in a timely way.

This rule is at the top because it’s not only the most important but the most difficult.  You can see that in a self-drive car, the rule means simply that the controls of the vehicle remain functional and that the driver isn’t inhibited from either observing conditions around the vehicle or using the override function if it’s necessary.  In a network, the problem is that the objective of network automation is to replace manual activity.  If no humans remain to do the overriding, you clearly can’t apply our rule, but in the real world, network operations center personnel would likely always be available.  The goal of automation, then, would be to cut down on routine activity in the NOC so that only override tasks would be required of them.

That’s where the problem arises.  Aside from the question of whether NOC personnel would be drinking coffee and shooting the bull, unresponsive to the network state, there’s the question of what impact you would have on automation if you decided to even offer an override.  The network sees a failure through analysis of probe data.  It could respond to that failure in milliseconds if it were granted permission, but if an override is to be made practical you’d have to signal the NOC about your intent, provide the information and time needed for the operator to get the picture, and then either take action or let the operator decide on an alternative, which might mean you’d have to suggest some other options.  That could take minutes, and in many cases make the difference between a hiccup in service and an outage.
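
A simple sketch shows the trade-off; the names and the two-second veto window are invented.  The automatic path acts immediately, while the override path notifies the NOC, waits out the window, and only then acts.

# Sketch of the override trade-off: giving the NOC a veto window delays
# an action the system could otherwise take immediately. Invented names.
import time

def remediate(failure):
    print(f"applying automatic fix for {failure}")

def remediate_with_override(failure, noc_response, window_seconds=2):
    """Offer the NOC a chance to veto or approve before acting."""
    print(f"NOC notified: {failure}; proposed fix ready, waiting {window_seconds}s")
    deadline = time.time() + window_seconds
    while time.time() < deadline:
        decision = noc_response()          # poll for an operator decision
        if decision == "veto":
            print("operator vetoed the automatic fix")
            return
        if decision == "approve":
            break
        time.sleep(0.1)
    remediate(failure)                     # timeout or approval: proceed

remediate("link-17 down")                              # milliseconds
remediate_with_override("link-17 down", lambda: None)  # seconds, at best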

This problem isn’t unsolvable; router networks today do automatic topology discovery and exercise remedial behavior without human intervention.  However, the more an AI system does, meaning the broader its span of control, the greater the concern that it will do something wrong—very wrong.  To make it AI both workable and acceptable, you need to provide even “self-learning” systems with rules.  Sci-Fi fans will perhaps recall Isaac Asimov’s “Three Laws of Robotics” as examples of policy constraints that operate even on highly intelligent AI elements, robots.  In network or IT applications, the purpose of the rules would be to guide behavior to fit within boundaries, and to define where crossing those boundaries had to be authorized from the outside.

An alternative approach in the AI-to-network-or-IT space would be to let a self-learning system learn by observation and build its own rules, with the understanding that if something came up for which no rule had been created (the system never observed the condition), it could interpolate behavior from existing rules or predefined policies, and at least alert the NOC that something special had happened that might need manual review.  You could also have any action such a system takes be “scored” by its impact on services overall, with the policy that impacts below a certain level are “notify-only” and those above it require explicit pre-authorization.
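
Something like the following sketch is what I have in mind; the impact scores and the threshold are pure placeholders.

# Hedged sketch of impact-scored gating: low-impact actions proceed with a
# notification, high-impact actions wait for explicit pre-authorization.
NOTIFY_ONLY_LIMIT = 10      # affected-service count below which we just notify

def gate(action, affected_services, authorize):
    if affected_services < NOTIFY_ONLY_LIMIT:
        print(f"NOTIFY: executing '{action}' (impact {affected_services})")
        return True
    if authorize(action):
        print(f"AUTHORIZED: executing '{action}' (impact {affected_services})")
        return True
    print(f"HELD: '{action}' awaiting manual review")
    return False

gate("restart-edge-router", 3, authorize=lambda a: False)
gate("reroute-core-trunk", 250, authorize=lambda a: False)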

All of this is going to take time, which is why I think that we’ll likely see “AI” in networking applications focusing mostly on my “autonomous” system category.  If we look at the whole notion of intent modeling, and manifest that in some sort of reality, we have what should be autonomous processes (each intent-modeled element) organized into services, likely through higher layers of model.  If all of this is somehow constrained by rules and motivated by goals, you end up with an autonomous system.

This leads me back to my old AI projects, particularly the robotics one.  In that project, my robot was a series of interdependent function controllers, each responsible for doing something “ordered” from the system level.  You said “move north” and the movement controller set about carrying that out, and if nothing intervened it would just keep things moving.  If something interfered, the “context controller” would report a near-term obstacle to avoid, and the movement controller would get an order to move around it, after which its original order of northward movement would prevail.  This illustrates the autonomous process, but it also demonstrates that when there are lots of layers of stuff going on, you need to be able to scale autonomy like you’d scale any load-variable element.
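
A loose sketch of that layered behavior, with invented orders, looks like this: a standing order persists, the context controller interposes a temporary one, and the original order resumes once the detour is done.

# Loose sketch of layered controllers: a standing order persists, a detour
# can be interposed, and the original order resumes. Names are invented.
class MovementController:
    def __init__(self):
        self.standing_order = None
        self.detour = None
    def order(self, heading):
        self.standing_order = heading
    def interpose(self, detour):
        self.detour = detour          # e.g. reported by a context controller
    def step(self):
        if self.detour:
            current, self.detour = self.detour, None   # detour, then resume
        else:
            current = self.standing_order
        print(f"moving: {current}")

mc = MovementController()
mc.order("north")
mc.step()                      # moving: north
mc.interpose("east around obstacle")
mc.step()                      # moving: east around obstacle
mc.step()                      # moving: north (original order prevails)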

Returning to our network mission for AI, one potential barrier to this is the model of the service.  If events and processes join hands in the model, so to speak, then the model is an event destination that routes the event to the right place or places.  The question becomes whether the model itself can become a congestion point in the processing of events, whether events can pile up.  That’s more likely to happen if the processes that events are ultimately directed to are themselves single-threaded, because a given process would have to complete processing of an event before it could undertake the processing of a new one.
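
You can see the concern in a trivial sketch (all names and timings invented): the model routes events to the responsible process, and if that process is single-threaded the backlog simply queues behind it, while multiple instances of the same process drain it in parallel.

# Sketch of the congestion concern: the model routes events to handler
# processes, and a single-threaded handler becomes the choke point.
from concurrent.futures import ThreadPoolExecutor
import time

def slow_handler(event):
    time.sleep(0.2)                      # stand-in for real event handling
    return f"handled {event}"

# "Model": routes an element's events to the process responsible for it.
model_routing = {"vnf-1": slow_handler, "vpn-edge": slow_handler}

events = [("vnf-1", f"ALARM-{i}") for i in range(10)]

start = time.time()
for element, alarm in events:            # single-threaded: events pile up
    model_routing[element]((element, alarm))
print(f"single-threaded: {time.time() - start:.1f}s")

start = time.time()                      # concurrent handler instances
with ThreadPoolExecutor(max_workers=5) as pool:
    list(pool.map(lambda ev: model_routing[ev[0]](ev), events))
print(f"five instances:  {time.time() - start:.1f}s")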

This additional dimension of AI systems, which we could call “event density”, is something that’s slipped through the cracks, largely because so far most of the “orchestration” focus has been on NFV-like service-chain deployments.  If you move from two or three chained elements to services with hundreds of elements, add in the business processes that surround the network parts, and then try to automate the entire mess, you have an awful lot of things that generate events that could change the requirements for a lot of other things.  We need to take event density seriously, in short, when we assess automation and orchestration goals that go beyond basic NFV MANO.

And maybe even there.  There’s nothing more frustrating than a system whose limiting faults stay hidden until you’re really committed to it.  New applications of NFV will be more complex than the old ones, because nobody starts with the most complicated stuff.  We don’t want to find out, a year or so into an NFV commitment, that our solutions have run out of gas.

A Deeper Dive into ONAP

When I blogged about the ONAP Amsterdam release, I pointed out that the documentation that was available on ONAP didn’t address the questions I had about its architecture and capabilities.  The ONAP people contacted me and offered to have a call to explain things, and also provided links and documentation.  As I said in my prior blog, there is a lot of material on ONAP, and there’s no way I could explain it all.  It would also be difficult to frame the answers to my questions purely in terms of that documentation.

I had my call, and my proposal was to take three points that I believed (based on operator input) were critical to ONAP’s value in service lifecycle automation.  I asked them to respond to these points in their own terms, and it’s that set of questions and responses that I’m including here.  For the record, the two ONAP experts on the call were Arpit Joshipura, GM Networking, Linux Foundation and Chris Donley, senior director of Open Source Technology at Huawei and chair of the ONAP Architecture Committee.

I got, as reference, a slide deck titled “ONAP Amsterdam Release” and a set of document links.

The ONAP people were very helpful here, and I want to thank them for taking the time to talk with me.  They pointed out at the start that their documentation was designed for developers, not surprising given that ONAP is an open-source project, and they were happy to cooperate in framing their approach at a higher level, which was the goal of my three points.  I framed these as “principles” that I believed had been broadly validated in the industry and by my own work, and I asked them to respond to each with their views and comments on ONAP support.

The first point is that Network Functions (NFs) are abstract components of a service that can be virtual (VNF), physical (PNF), or human (HNF).  This is an architectural principle that I think is demonstrably critical if the scope of ONAP is to include all the cost and agility elements of carrier operations.

My ONAP contacts said this was the path that ONAP was heading down, with their first priority being the VNF side of the picture.  In the architecture diagram on Page 4 of the Amsterdam Architecture White Paper referenced above, you’ll see a series of four gray boxes.  These represent the Amsterdam components that are responsible for framing the abstractions that represent service elements, and realizing them on the actual resources below.

The notion of an HNF is indirectly supported through the Closed Loop Automation Management Platform (CLAMP), which is the ONAP component responsible for (as the name suggests) closed-loop automation.  CLAMP provides for an escape from a series of automated steps into an external manual process either to check something or to provide an optional override.  These steps would be associated with any lifecycle process as defined in the TOSCA models, and so I think they could provide an acceptable alternative to composing an HNF into a service explicitly and separately.

An abstraction-driven, intent-based approach is absolutely critical to ONAP’s success.  I don’t think there’s any significant difference between how I see industry requirements in this area and what ONAP proposes to do.  Obviously, I think they should articulate this sort of thing somewhere; articulating their approach in terms the industry at large can understand is a weakness with ONAP overall.  They appear to recognize that, and I think they’re eager to find a way to address it.

The second point is all network functions of the same type (Firewall, etc.) would be represented by the same abstraction, and implementation details and differences would be embedded within.  Onboarding something means creating the implementation that will represent it within its abstraction.  Abstractions should be a class/inheritance structure to ensure common things across NFs are done in a common way.
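
To illustrate what I mean by a class/inheritance structure (and to be clear, this is my own hypothetical sketch, not ONAP code), here’s a “Firewall” abstraction with a VNF and a PNF implementation hidden behind it; service logic touches only the abstraction.

# Hedged sketch of the class/inheritance idea: one "Firewall" abstraction,
# with VNF and PNF implementations hidden behind it. Names are illustrative.
from abc import ABC, abstractmethod

class NetworkFunction(ABC):
    """Common lifecycle contract shared by every NF type."""
    @abstractmethod
    def deploy(self) -> None: ...
    @abstractmethod
    def teardown(self) -> None: ...

class Firewall(NetworkFunction):
    """Type-level abstraction: all firewalls look like this from outside."""
    def set_rules(self, rules) -> None:
        raise NotImplementedError

class VirtualFirewall(Firewall):
    """VNF implementation: deploys as hosted software."""
    def deploy(self): print("spin up firewall VM/container")
    def teardown(self): print("destroy firewall instance")
    def set_rules(self, rules): print(f"push {len(rules)} rules via API")

class ApplianceFirewall(Firewall):
    """PNF implementation: the same abstraction wraps a physical box."""
    def deploy(self): print("activate port on existing appliance")
    def teardown(self): print("release appliance port")
    def set_rules(self, rules): print(f"push {len(rules)} rules via CLI/NETCONF")

def onboard(fw: Firewall):
    """Service logic sees only the abstraction, never the implementation."""
    fw.deploy()
    fw.set_rules(["allow 443", "deny all"])

onboard(VirtualFirewall())
onboard(ApplianceFirewall())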

The ONAP people say they’re trying to do this with the VNFs, and they have a VNF requirements project whose link reference I’ve provided above.  VNF development guidelines and an SDK project will ensure that VNF implementations map into a solid common abstraction.  This works well if you develop the VNF from scratch.  The architecture also supports the notion of creating a “wrapper” function to encapsulate either an existing software component (making it a VNF) or a PNF (making it an implementation of the same NF abstraction), but this hasn’t been a priority.  However, they note that there are running implementations of ONAP that contain no VNFs at all; the user has customized the abstractions/models to deploy software application elements.

I don’t see any technical reason why ONAP could not support the kind of structure my second point describes, but I don’t think they’ve established a specific project goal to identify and classify NFs by type and create a kind of library of these classes.  It could be done with some extensions to the open-source ONAP framework and some additional model definition from another party.  Since most of the model properties are inherited from TOSCA/YAML, the notion of extending ONAP in this area is practical, but it is still an extension and not something currently provided.

The final point is lifecycle processes should operate on the abstractions, both within them and among them.  The former processes can be type-specific or implementation-specific or both.  The latter should always be generalized for both NFs and services created from them.

If we go back to that architecture diagram I referenced in my first point, you can see that the processes “above the line”, meaning above those four gray blocks, are general service processes that operate on abstractions (modeled elements) and not on the specific way a given network function is implemented.  That means that it’s a function of modeling (and assigning the abstraction to a gray box!) to provide the link between some NF implementation and the service processes, including closed-loop automation (CLAMP).

The key piece of lifecycle or closed-loop automation is the handling of events.  In ONAP, it’s OK for VNFs (or, presumably, PNFs) to operate on “private resources”, but they can access and control shared-tenant facilities only through the Data Collection, Analytics, and Events (DCAE) subsystem and the Active and Available Inventory (A&AI) subsystem.  There’s API access to the latter, and publish-and-subscribe access to DCAE.

The workings of these two components are fairly complicated, but the combination appears to deal with the need to identify events (even if correlation is needed) and to pass them to the appropriate processes, where handling is presumably guided by the TOSCA models.  I like the A&AI notion because it decouples process elements from real-time access to actual multi-tenant resources.
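
As a purely illustrative sketch, and emphatically not DCAE’s or A&AI’s actual interfaces, here’s the kind of correlation step I’m describing: raw alarms that share a probable cause get collapsed into a single service-level event before anything is published for model-guided handling.

# Illustrative only, not ONAP APIs: correlate raw alarms that share a
# probable cause into one service-level event before publishing it.
from collections import defaultdict

raw_alarms = [
    {"resource": "port-3/1", "cause": "fiber-cut-span-12"},
    {"resource": "port-3/2", "cause": "fiber-cut-span-12"},
    {"resource": "vnf-9",    "cause": "host-reboot"},
]

def correlate(alarms):
    grouped = defaultdict(list)
    for a in alarms:
        grouped[a["cause"]].append(a["resource"])
    # One event per root cause, instead of one per symptom.
    return [{"event": "SERVICE_IMPACT", "cause": c, "resources": r}
            for c, r in grouped.items()]

for event in correlate(raw_alarms):
    print(event)     # a subscriber would route this via the service model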

In our discussions we touched on a couple of things not part of my list of points.  One was the issue of the relationship between open-source projects like ONAP and standards bodies that were tasked with creating something in the same area.  Obviously ONAP and the ETSI NFV ISG have such a relationship.  According to the ONAP people, the coders are supposed to follow standards where they are available and work properly, and to kick the problem upstairs for liaison with the appropriate body if that isn’t possible.

The tension here is created, in my view, by the fact that “standards” in the carrier space are developed by a team of people who are specialists in the standards process.  Open-source development is populated by programmers and software architects.  My own work in multiple standards groups has taught me that there is a real gulf between these two communities, and that it’s very difficult to bridge it.  I don’t think that the ETSI structural model for NFV software is optimal or even, at scale, workable, but I also don’t think that ONAP has been religious in enforcing it.  As long as they’re prepared to step outside the ETSI specs if it’s necessary to do good code, they should be OK.

Which is how I’d summarize the ONAP situation.  Remember that in my earlier blog I questioned whether ONAP did what they needed to do, and said that I wasn’t saying they did not, but that I couldn’t tell.  With the combination of my conversation with the ONAP experts and my review of the material, I think that they intend to follow a course that should lead them to a good place.  I can’t say if it will because by their own admission they are code/contribution-driven and they’re not in that good place yet.

There is a lot that ONAP is capable of, but doesn’t yet do.  Some of it is just a matter of getting to things already decided on, but other things are expected to be provided by outside organizations or the users themselves.  Remember that ONAP is a platform, not a product, and it’s always been expected that it would be customized.  Might it have been better to have brought more of that loosely structured work inside the project?  Perhaps, but god-boxes or god-packages are out of fashion.  ONAP is more extensible because of the way it’s conceptualized, but also more dependent on those extensions.

This is the second, and main, risk that ONAP faces.  The operators need a solution to what ONAP calls “closed-loop” automation of the operations processes, and they need it before any significant modernization of infrastructure is undertaken.  The advent of 5G creates such a modernization risk, and that means that ONAP will have to be ready in all respects for use by 2020.  The extensions to the basic ONAP platform will be critical in addressing the future, and it’s always difficult to say whether add-on processes can do the right thing fast enough to be helpful.

Is Service Lifecycle Management Too Big a Problem for Orchestration to Solve?

Everyone has probably heard the old joke about a person reaching behind a curtain and trying to identify an elephant by touching various parts.  The moral is that sometimes part-identification gets in the way of recognizing the whole.  That raises what I think is an interesting question for our industry in achieving the transformation goals everyone has articulated.  Has our elephant gotten too big to grope, at least in any traditional way?  Is the minimum solution operators need beyond the maximum potential of the tools we’re committed to?

The steady decline in revenue per bit, first reported by operators more than five years ago, has reached the critical point for many.  Light Reading did a nice piece on DT’s cost issues, and it makes two important points.  First, operators need to address the convergence of cost and price per bit quickly, more quickly than credible new service revenue plans could be realized.  That leaves operators with only the option of targeting costs, near-term.  Second, operator initiatives to address costs have proven very complex because many of their costs aren’t service- or even technology-specific.  They can push their arms behind the curtain and grab something, but it’s too small a piece to deal with the glorious whole.

This is an interesting insight because it may explain why so many of our current technologies are under-realizing their expected impacts.  What operators have been seeking goes back a long way, about ten years, and the term we use for it today is “zero-touch automation”, which I’ve been calling “service lifecycle management automation” to reflect a bit more directly what people are trying to automate. Here, “zero touch” means what it says, the elimination of human processes that cost a lot and create errors, and the substitution of automated tools.

Like SDN and NFV?  Probably not.  Neither SDN nor NFV by itself addresses service lifecycle automation fully; each addresses only the substitution of one technical element for another.  Putting that in elephant terms, what we’ve been trying to do is apply what we learned from a single touch of some elephant part to the broad problem of dealing with the beast as a whole.  SDN and NFV are just too narrow as technologies to do that.

The next thing we tried was to apply some of the technology-specific automation strategies that emerged from SDN and NFV to that broader problem.  Orchestration in the NFV form of “MANO” (Management and Orchestration) was a critical insight of the NFV ISG, but the big question is whether the approach to automation that MANO takes can be broadened to address the whole of operator cost targets, “zero touch”.  If you touch an elephant’s toe, you can manicure it, but can you learn enough from that to care for the whole beast?

Simple scripting, meaning the recording of the steps needed to do something so they can be repeated consistently, isn’t enough here; there are too many steps and combinations.  That is what has already led cloud DevOps automation toward an intent-modeled, event-driven approach.  But now we have to ask whether even that is enough.  The problem is interdependence.  With intent-and-event systems, the individual processes are modeled and their lifecycle progression is synchronized by events.  The broader the set of processes you target, the more interdependent cycles you create, and the more combinations of conditions you are forced to address.  At some point, it becomes very difficult to visualize all the possible scenarios.
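
A single intent-modeled element’s state/event logic is simple enough to sketch (the states, events, and process names below are invented); the trouble is that a real service has thousands of these tables exchanging events with each other, and it’s the combinations, not any one table, that get out of hand.

# Minimal state/event table for one intent-modeled element. Invented names.
STATE_EVENT_TABLE = {
    ("ORDERED",   "DEPLOY"):      ("DEPLOYING", "start_deployment"),
    ("DEPLOYING", "DEPLOY_OK"):   ("ACTIVE",    "notify_parent"),
    ("DEPLOYING", "DEPLOY_FAIL"): ("FAULT",     "notify_parent"),
    ("ACTIVE",    "FAULT_SEEN"):  ("REPAIRING", "start_redeploy"),
    ("REPAIRING", "DEPLOY_OK"):   ("ACTIVE",    "notify_parent"),
}

def dispatch(state, event):
    """Look up the next state and the process to run for this event."""
    next_state, process = STATE_EVENT_TABLE.get((state, event), (state, "log_only"))
    print(f"{state} + {event} -> {next_state} via {process}")
    return next_state

state = "ORDERED"
for ev in ("DEPLOY", "DEPLOY_OK", "FAULT_SEEN", "DEPLOY_OK"):
    state = dispatch(state, ev)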

MANO orchestration has a simple, highly compartmentalized goal of deploying virtual functions.  Once deployed, it leaves the management of those functions to traditional processes.  It doesn’t try to orchestrate OSS/BSS elements or human tasks, and if you add in those things you create the interdependence problem.  You can visualize a service deployment as being access deployment plus service core deployment, which is a hierarchical relationship that’s fairly easy to model and orchestrate.  When you add in fault reports, journaling for billing, human tasks to modernize wiring, and all manner of other things, you not only add elements, you add relationships.  At some point you have more of a mesh than a hierarchy, and that level of interdependence is very difficult to model using any of the current tools.  Many can’t even model manual processes, and we’re going to have those in service lifecycle management until wires can crawl through conduits on their own.

What I am seeing is a growing realization that the problem of zero-touch is really, at the technical level, more like business process management (BPM) than it is about “orchestration” per se.  No matter how you manage the service lifecycle, sticking with the technical processes of deployment, redeployment, and changes will limit your ability to address the full range of operations costs.  BPM attempts to first model business processes and then automate them, which means it focuses directly on processes, and therefore directly on costs, since processes are what incur them.

What we can’t do is adopt the more-or-less traditional BPM approaches, based on things like service buses or SOA (service-oriented architecture) interfaces that have a high overhead.  These are way too inefficient to permit fast passing of large numbers of events, and complex systems generate exactly that.  Buses and SOA are better for linear workflows, and while the initial deployment of services could look like that, ongoing failure responses are surely not going to even remotely resemble old-fashioned transactions.

How about intent modeling?  In theory, an intent model could envelope anything.  We already know you can wrap software components like virtual network functions (VNFs) and SDN in intent models, and you can also wrap the management APIs of network and IT management systems.  There is no theoretical reason you can’t wrap a manual process in an intent model too.  Visualize an intent model for “Deploy CPE” which generates a shipping order to send something to the user, or a work order to dispatch a tech, or both.  The model could enter the “completed” state when a network signal/event is received to show the thing you sent has been connected properly.  If everything is modeled as a microservice, it can be made more efficient.
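
A hedged sketch of that “Deploy CPE” idea, with invented names and events: activating the model emits the shipping and work orders, and the model then waits in a pending state until the connection event arrives, so the manual work is invisible from the outside.

# Hedged sketch of an intent model wrapping a manual process: "Deploy CPE"
# emits a shipping order and a dispatch ticket, then sits in a pending
# state until an installation event arrives. Names and events are invented.
class DeployCPE:
    def __init__(self, site):
        self.site = site
        self.state = "ORDERED"
    def activate(self):
        print(f"shipping order created for {self.site}")
        print(f"work order created: dispatch tech to {self.site}")
        self.state = "PENDING_INSTALL"
    def on_event(self, event):
        # The manual work is invisible from outside; only the event matters.
        if self.state == "PENDING_INSTALL" and event == "CPE_CONNECTED":
            self.state = "COMPLETED"
        return self.state

cpe = DeployCPE("site-443")
cpe.activate()
print(cpe.on_event("CPE_CONNECTED"))    # -> COMPLETED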

This seems to be a necessary condition for true zero-touch automation, particularly given that even if you eventually intend to automate a lot of stuff, it won’t be done all at once.  Even non-service-specific processes may still have to be automated on a per-service basis to avoid creating transformational chaos.  Some tasks may never be automated; humans still have to do many things in response to problems because boards don’t pull themselves.

It’s probably not a sufficient condition, though.  As I noted above, the more interdependent things you have in a given process model, the harder it is to synchronize the behavior of the system using traditional state/event mechanisms.  Even making it more efficient in execution won’t make it scalable.  I’m comfortable that the great majority of service deployments, at the technical level, could be automated using state/event logic, but I’m a lot less comfortable—well, frankly, I’m uncomfortable—saying that all the related manual processes could be synchronized as well.  Without synchronizing those broader processes, you miss too much cost-reduction opportunity and you risk having human processes get out of step with your automation.

This is a bigger problem than it’s appeared to be to most, including me.  We’re going to need bigger solutions, and if there’s anything the last five years have taught me, it’s that we’re not going to get them from inside the telecom industry.  We have to go outside, to the broader world, because once you get past the purchasing and maintenance of equipment and some specific service-related stuff, business is business.  Most of the costs telcos need to wring out are business costs not network costs.  To mix metaphors here, we’re not only shooting behind the duck with SDN and NFV, we’re shooting at the wrong duck.

I’ve said for some time that we need to think of NFV, orchestration, and lifecycle automation more in terms of cloud processes than specialized network processes, and I think the evolving cost-reduction goals of operators reinforce this point.  If zero-touch automation is really an application of BPM to networking businesses, then we need to start treating it that way, and working to utilize BPM and cloud-BPM tools to achieve our goals.