A Deeper Dive into AT&T ECOMP

Even a superficial review of AT&T ECOMP shows it’s a whole different way of looking at the virtualization/softwarization of networking.  The master architecture diagram is a picture of start-to-finish service lifecycle management: the bottom is a redrawing of SDN and NFV concepts, and the middle is a modeling approach that seems to draw from multiple sources.  There are no specific references to any standards in the diagram, and the term “ECOMP” appears only twice.

The reason for this is simple, I think.  ECOMP is a new glue that binds and somewhat reshapes things that AT&T had always believed would be part of its Domain 2.0 (D2) architecture.  That architecture was designed to create a network where vendors fit into a specific set of silos and were strongly discouraged from sprawling across multiple zones in ways that would lock AT&T into their approach, and ECOMP inherits that design goal.  In fact, it goes beyond D2 in that regard.  By framing services in a D2 way, ECOMP makes D2 real and not just a set of zones and boundaries.

It’s a fascinating model, and I have to open by saying I’m reviewing ECOMP based on less information than I’d like.  Until the full source code is released we won’t have all the details of the current implementation, and I also expect that ECOMP will evolve as AT&T gains experience.  It will evolve faster, and further, if Orange and other operators now looking at (or trialing) ECOMP decide to go with it.  That could make ECOMP the most important development in SDN/NFV.

Functionally, ECOMP divides into two parallel frameworks, a design-time framework that builds and sustains the models and policies, and a runtime framework that applies them to the service lifecycle and network infrastructure.  The latter is repeatedly linked to D2 with references like “manages the full lifecycle of D2 infrastructure”, and in the various diagrams and texts, it’s possible to see many holdovers from the early D2 work.

The “left side” or design part of the service lifecycle process is rooted in the AT&T Service Design and Creation (ASDC) element, and also includes policy and analytics application design.  The models themselves seem to be somewhat based on the TM Forum model, combined with a dash of TOSCA, but the details are still murky because the code hasn’t yet been released for public review.  This element feeds a related component that acts as a model and policy repository and distributor, and a common-services element at the heart of ECOMP provides the interface to that repository.

There are two high-layer ECOMP elements, the Master Service Orchestrator (MSO) and the Active and Available Inventory (A&AI).  The former does the orchestrating based on a catalog of “recipes” and is roughly analogous to an expanded NFV MANO function.  The latter is the real-time view into the D2 environment, including both resources and services, and in my view it represents something that’s not explicit in the ETSI model at all.

The lower-layer elements of ECOMP are perhaps the most critical to vendors because it’s here that D2 largely assigns outside servers, devices, and software.  Again, there are two primary components in this area.  One is the Data Collection, Analytics, and Events (DCAE) element that handles telemetry and analysis, and the other is perhaps the most critical element of all, the Controllers.

Orchestration in ECOMP is a two-level process.  The MSO handles high-level, end-to-end service orchestration and then hands off its needs to one of a number of Controllers, each of which is a resource-domain-specialist orchestrator that turns high-level requests into specific commitments of resources.  ECOMP defines three Controller types: the Infrastructure (or cloud), the Network, and the Application.
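
To make the division of labor concrete, here’s a minimal sketch of how a two-level orchestration scheme like this could be structured.  The class and method names (MasterOrchestrator, Controller, realize, and the recipe format) are my own illustration, not ECOMP’s actual APIs.

```python
class Controller:
    """A resource-domain orchestrator: infrastructure (cloud), network, or application."""
    def __init__(self, domain):
        self.domain = domain

    def realize(self, step):
        # Turn a high-level request into specific commitments of resources.
        print(f"[{self.domain}] committing resources for: {step['action']}")
        return {"step": step["action"], "domain": self.domain, "status": "active"}

class MasterOrchestrator:
    """High-level, end-to-end orchestration driven by a catalog of recipes."""
    def __init__(self, controllers):
        self.controllers = controllers     # keyed by domain type

    def deploy(self, recipe):
        results = []
        for step in recipe["steps"]:
            controller = self.controllers[step["domain"]]
            results.append(controller.realize(step))
        return results

mso = MasterOrchestrator({
    "cloud": Controller("cloud"),
    "network": Controller("network"),
    "application": Controller("application"),
})

vpn_recipe = {"service": "business-vpn", "steps": [
    {"domain": "cloud", "action": "host the vFirewall function"},
    {"domain": "network", "action": "build the site-to-site tunnels"},
    {"domain": "application", "action": "load the firewall policy"},
]}

mso.deploy(vpn_recipe)
```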

This structure divides responsibility between the MSO and the Controllers, with the former handling high-level deployment and redeployment actions that have broad scope and minute-level response requirements, and the latter handling events that demand responses within seconds.  The Controllers are like ETSI Infrastructure Managers, except that ECOMP explicitly assigns some level of orchestration responsibility to them, which ETSI should do but does not (so far).

AT&T seems to envision “orchestration” at the Controller level to be a combination of specific orchestration steps (DevOps- or “NetOps”-like) and policy distribution.  The combination seems appropriate given that the domains for each Controller could involve both legacy and virtual elements.  The implication of the structure is that a Controller is given a mission to deploy and sustain a service element (from its associated domain) and will be responsible for event-handling as long as the Controller can do what it’s expected to do.  If it can’t, then it would buck the case upward to the MSO for handling.

The MSO seems responsible for activating individual elements, meaning that the recipes it works on would have to be fairly detailed in terms of the steps needed.  The Controllers carry out the requests of the MSO, but as I noted they also respond to lifecycle management events.  This makes the Controllers a mixture of ETSI functions.  The Infrastructure Controller is surely similar to ETSI’s Virtualized Infrastructure Manager (VIM), but the ETSI VIM is singular while the ECOMP model divides the role into Network and Infrastructure (meaning cloud).  The Application Controller is analogous to the ETSI VNF Manager in some ways.

This approach sounds strange, particularly for those versed in the ETSI approach, but it’s more logical than the ETSI original.  Controllers are passed “policies” from the MSO, and they have their own orchestration/policy mechanisms to cycle through the lifecycles of the stuff they build.  It’s supported by telemetry that’s collected everywhere and distributed to where it’s needed, using a publish-and-subscribe model.
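
Here’s a minimal sketch of the publish-and-subscribe telemetry idea, assuming a simple in-process event bus; the topic names and handlers are illustrative, not anything DCAE actually defines.

```python
from collections import defaultdict

class EventBus:
    """A toy publish-and-subscribe bus: topics in, handlers out."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()

# A network controller cares about telemetry from its own domain...
bus.subscribe("telemetry.network", lambda e: print("network controller sees", e))
# ...while an analytics function subscribes to the same topic for correlation.
bus.subscribe("telemetry.network", lambda e: print("analytics archives", e))

bus.publish("telemetry.network", {"link": "core-7", "loss_pct": 0.4})
```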

ECOMP is a big advance in SDN/NFV architecture, a major improvement in nearly every way.  That doesn’t mean that it’s entirely without questions, perhaps even issues.  Again I have to stress that the details here are sketchy because the code’s not released, but I think there’s enough to comment on.

The big issue in ECOMP remains the VNFs themselves.  A “virtual network function” is generally seen as a function transplanted from an appliance and strung into a service chain.  Every appliance is different, and as a result there’s no standard way of hosting one of these functions.  Each presumably has its own set of interfaces, each requires its own parameters, and all of this variability could be handled in only two ways—require that it be shed in favor of a standard set of API dependencies (what I’ve called a “VNF PaaS” model) or nest custom code with each VNF to provide what it needs and interface with the outside world.  Even the latter would require some standard approach to harmonization.  Neither ETSI’s work nor ECOMP mandates either approach, and without one there’s still too much “snowflake” variability and not enough “Lego” interchangeability in the VNFs.
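
To show what the first option might look like, here’s a hedged sketch of a “VNF PaaS” contract: a single lifecycle and management interface every VNF would implement, regardless of the appliance it came from.  The interface is hypothetical; as the paragraph says, neither ETSI nor ECOMP defines one.

```python
from abc import ABC, abstractmethod

class VirtualNetworkFunction(ABC):
    """The hypothetical standard contract every VNF would implement."""

    @abstractmethod
    def configure(self, parameters: dict) -> None:
        """Accept parameters in a standard schema rather than a vendor CLI."""

    @abstractmethod
    def start(self) -> None:
        """Begin processing traffic."""

    @abstractmethod
    def stop(self) -> None:
        """Stop processing traffic."""

    @abstractmethod
    def health(self) -> dict:
        """Report status in a standard form the lifecycle manager understands."""

class FirewallVNF(VirtualNetworkFunction):
    """One concrete VNF; the platform only ever sees the interface above."""
    def configure(self, parameters):
        self.rules = parameters.get("rules", [])
    def start(self):
        self.running = True
    def stop(self):
        self.running = False
    def health(self):
        return {"running": getattr(self, "running", False),
                "rule_count": len(getattr(self, "rules", []))}

fw = FirewallVNF()
fw.configure({"rules": ["deny tcp any any 23"]})
fw.start()
print(fw.health())
```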

The second issue is in the Controller design.  It appears from the ECOMP material that while there are three types of controllers, there could be multiple instances of each type corresponding to specific domains.  I’d take “domains” here to mean resource/administrative domains, meaning areas under a common management jurisdiction.  That’s a good idea, and it could also contribute to federation if some of the domains were “foreign”, which appears to be possible given the implementation description.

What’s not quite clear is whether the instances of a given Controller type all share a common implementation.  In some places the material seems to suggest that they do, and in others that there might be different implementations to accommodate different technologies deployed in the controlled regions.  This isn’t a simple semantic point; if there is only one controller implementation for each type, then every controller would have to know a lot of things to reflect all the variability in implementation within its domain.  Or, the domains would have to be totally homogeneous from a control and functional perspective.

The final point is that orchestration of operations functions is still undefined.  It’s not that ECOMP precludes it, but that it doesn’t presume a common modeling and orchestration flow that starts at the OSS/BSS and moves downward to the services.  Operators will vary on how much they rely on OSS/BSS tools for specific service lifecycle processes, and thus it’s not clear how much operations efficiency benefits might be left on the table.

OK, overall, ECOMP isn’t perfect, but it’s a massive forward step in an industry that’s been dawdling around the key points for all too long.  I’m sure it will mature with adoption, and if ECOMP is successfully promoted beyond AT&T, that will push things further and faster.  I’ll probably revisit ECOMP down the line as it evolves and more details become available.

An Overview of the AT&T and Verizon SDN/NFV Architectures

When AT&T and Verizon released their architecture models for SDN and NFV, I did a quick blog overview of the two.  I’ve had a chance to talk with operators and vendors about the approach now, and I’d like to revisit the two architectures based on what I’ve heard.  This is going to be a three-part series, with this (the first) focused on broad architecture and goals and the other two on the specifics of the two architectures.

To me, the important starting point for the discussion is the fact that both operators believe that they have to step up and define an architecture on their own.  We’ve seen in other areas (like 5G and IoT) that operators are increasingly unhappy with the pace of standards development and the support they believe they’re getting from vendors.  The market doesn’t wait for I-dotting and T-crossing, nor can the operators.  They need something for SDN and NFV that delivers benefits that justify deployment and, at deployment, solves the problem of profit-per-bit compression.  Standards haven’t delivered.

On the other hand, operators have spun a lot of cycles in activities like the TMF, ONF, and NFV ISG and they don’t want to just toss the work (and money) away.  Thus, it’s not surprising that these two operators think that they have to blow a few kisses in the appropriate directions.  The biggest risk that these two operator initiatives share is the one they’ve inherited from those other activities.  It’s not just a matter of not having gotten enough done, or of not working fast enough to do more.  Some of what was done, having been approached wrong, was done wrong.  Some break with the past may be essential in avoiding the pitfalls that have already been created.

My read on the AT&T and Verizon models is that they have some similarities but also some fundamental differences, and these differences reflect what I think are basic differences in the goal of each of the operators.  AT&T is defining an open-source project that effectively puts vendors in their place, where Verizon is defining an open framework to encourage vendors.  Keep these missions in mind because I’ll refer back to them and even expand a bit on their implications.

AT&T’s ECOMP is an element in its Domain 2.0 (D2) strategy, and that’s important because D2 tries to divide up networking into functional pieces where AT&T can then establish a number of specific bidder/competitors for future deals.  Fundamental to D2 is the idea that vendors can’t create vast connected ecosystems that put the operator in an all-or-nothing position.  ECOMP, as an architecture for the software elements that create the service lifecycle and manage infrastructure, has to ensure that vendors stay where they’re put.  Thus, it doesn’t try to create open elements within itself, which would risk the substitution of a vendor element that might pull through other stuff.  Instead it creates specific low-level points where equipment and software can be placed.

Because D2 isn’t limited to virtual elements, the AT&T strategy has relevance to all its services and infrastructure.  You could orchestrate a service made up completely of legacy equipment with the AT&T approach, though it seems clear that AT&T intends to use both SDN and NFV as broadly as it can.  Still, the D2 focus when combined with the open-source implementation implies that AT&T is truly building a “software-defined” network.

Verizon’s “SDN/NFV Reference Architecture” has no company-offered convenient acronym, so I’m going to refer to it as “SNRA” here.  Whatever you call it, the contributors and acknowledgements page makes it clear that there were more vendor people involved than Verizon people.  I’m not suggesting that outside pressure is what creates the “encourage vendors” model; from what I hear Verizon’s explicit goal was to frame the way vendors approached Verizon with offerings.  Vendor participation in something that’s supposed to encourage vendors is essential, and that’s why there were seven vendors listed (Cisco, Ericsson, HPE, Intel, Nokia, Red Hat, and Samsung).  Verizon aims at open implementation by a community that contributes entire pieces, and so pays a lot of attention to both interfaces and to the ETSI NFV E2E model that created at least a rough (if not totally functional) consensus.

SNRA is what the name suggests, meaning that it’s really about SDN and NFV.  While you can definitely provision legacy elements with the model, it seems pretty clear that Verizon sees SNRA as deploying in lock-step with SDN and NFV and not being prominent where those technologies haven’t reached.  Again, I’m not surprised by this given Verizon’s apparent goal of creating an architecture into which vendors can fit their wares.  Displacing legacy gear or reducing its differentiation would be a hard sell to many of the contributors on the list.

The scope of both models is set by the need to harness benefits more effectively.  One of the issues I’ve raised repeatedly in both SDN and NFV is that deployment depends more on operations improvements than on capital savings.  Since neither the ONF nor the ETSI NFV ISG chose to address full-scope operations and the full service lifecycle in their models, and since the TMF has yet to provide its own approach to operational orchestration, one of the key questions that both Verizon and AT&T will have to address to make their approaches successful is “can this do opex efficiency?”  The answer has to start with the relationship between the two architectures and OSS/BSS systems.

In both the ECOMP and SNRA models, infrastructure (including both legacy elements and virtualization-enhanced SDN/NFV elements) is composed into what could fairly be called “service-specific virtual devices” that present a simple model of the service to the OSS/BSS layer.  Neither approach proposes to modernize the OSS/BSS by introducing event-driven concepts or OSS/BSS functional orchestration.  In my past characterization of the ways in which service orchestration could be applied, both operate below the OSS/BSS boundary.  While this doesn’t foreclose securing opex benefits, it doesn’t fully address the question of how you’d orchestrate OSS/BSS processes into service operations.  I’ll comment on some approaches as this series develops the specifics of each architecture.
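
A rough sketch of the “service-specific virtual device” idea, assuming a facade object that hides the orchestration machinery below the boundary; the class and method names are mine, not anything defined in ECOMP or SNRA.

```python
class StubOrchestrator:
    """Stand-in for the ECOMP- or SNRA-style machinery below the OSS/BSS boundary."""
    def deploy(self, service_id):
        return f"{service_id}: deployed"
    def service_state(self, service_id):
        return {"service": service_id, "state": "up"}
    def teardown(self, service_id):
        return f"{service_id}: removed"

class VirtualDevice:
    """What the OSS/BSS layer is shown: one simple, device-like service object."""
    def __init__(self, service_id, orchestrator):
        self.service_id = service_id
        self.orchestrator = orchestrator

    def activate(self):
        return self.orchestrator.deploy(self.service_id)

    def status(self):
        # The orchestration layer composes resource state into one service state.
        return self.orchestrator.service_state(self.service_id)

    def terminate(self):
        return self.orchestrator.teardown(self.service_id)

device = VirtualDevice("business-vpn-001", StubOrchestrator())
print(device.activate())
print(device.status())
```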

We have competing approaches here, and what’s particularly helpful I think is the fact that what they compete on is fundamental to the evolution of networking.  One operator is embodying the “go it alone” trend, taking control of its software evolution, subordinating hardware to software, and limiting vendor contributions so they fit into the overall scheme.  The other has tried to rehab a vendor-participation process that has broken down in the standards bodies.  Verizon realizes that it is very difficult for operators to drive progress alone, and in many areas impossible for them to cooperate outside formal standards groups without risking anti-trust action.

How is this all going so far?  Obviously, operators are happy with the AT&T model; Orange is running trials of ECOMP.  What may be particularly telling is that the vendors whose chestnuts Verizon may be trying to pull out of the fire aren’t being particularly helpful in promoting the Verizon approach.  For example, I have had a number of vendors ask for details on ECOMP and (so far) none have asked for similar details on SNRA.

Vendors have done a generally abysmal job with SDN and NFV at the positioning and benefit-securing level, and they may now be shooting down their last, best, chance to redeem themselves.

Can Nokia Really Make Itself Software-Centric?

If network operators want their vendors to embrace software, stories in SDxCentral and Light Reading hint that Nokia may be thinking of doing just that.  Details on the notion are incredibly scarce, so we’re left not only to ask the usual question “Is this the right approach?” but also the question “What is the approach?”

All the fuss comes from a single sentence in a Nokia release on its Capital Markets Day discussions.  What they said was that one goal was to “Move beyond our current product-attached software model and create a software business with the margin profile of large software companies, focused on areas including enterprise software and IoT platforms.”  So, to the questions: Is this “software business” a business unit, a spin-off, or what?  And what does “software model” mean?

Clearly, the goal is a software model that’s not the “current product-attached software model”, by which Nokia means the practice of tying software that’s associated with equipment in some way to the equipment business unit.  Many vendors have found that this approach tended to subordinate software advances to hardware sales, and also offered such a limited career path for software professionals that it was hard to get the right people, particularly in leadership roles.

The product attachment issue probably got especially acute when Nokia acquired Alcatel-Lucent.  That company was almost legendary for its product silos and isolationist policies, many of which were holdovers from the original Alcatel and Lucent merger.  The silo problem was never really resolved after that first M&A, and that at least complicates achieving the “software business” goal.

As it happens, Alcatel-Lucent actually has an “independent software business” in Nuage, its SDN property.  I’ve made it clear in a number of my blogs that Nuage is my pick for the top prize in the SDN and virtual network space.  Despite this, it never got the kind of support from corporate that it should have, and many operators believed that this was to protect the router business, headed by the powerful executive Basil Alwan.  The point is that just making something a business unit, even an independent one carrying its own brand, doesn’t guarantee its independence.

There’s also the question of the Ericsson example.  I suggested in an earlier blog that Ericsson probably had abandoned a hardware-centric model too soon, and tied itself to a model of network evolution that was still too radical to be immediately successful.  If that’s true for Ericsson, it certainly could pose a threat to Nokia’s software-business approach, whatever it turns out to be.

The big question for Nokia, then, is what software-centricity turns out to mean.  They can pull the software out of the product units, but they then have to do something specific with it.  There has to be some new and specific focus for all that product-centric stuff; otherwise breaking it out just disconnects the old connections without creating anything new.  Having it as an independent business unit would essentially replicate the Nuage model, and I think it’s clear that Nuage didn’t really have the independence and senior management support it needed.  Having the software business actually spin out as a separate company could create shareholder value, but it poses the risk of having two parts that need each other and don’t have a clear path to symbiosis.

Nuage may be the functional key to what Nokia should do in terms of planning a unified software strategy, whatever the relationship between the software unit and the rest of the company.  In Nuage, Nokia has an outstanding virtual network model, one into which you could fit SDN services, NFV, cloud services, and legacy infrastructure.  If Nuage is the universal constant, then it would make sense to build an architecture positioning around it and fit other software (and even hardware) products into the architecture.

If virtualization is the future, then you have to start by virtualizing the network.  That means creating a model of networking that’s as infrastructure-independent as you can make it.  It’s fairly easy to do that with an overlay-SDN approach like Nuage has.  You can then apply virtual network services directly as NaaS or even as SD-WAN (which Nokia is already doing), and you can also apply them to the cloud and to cloud-centric missions like NFV’s function deployment and connection.  Nuage could have enriched Alcatel-Lucent’s NFV strategy, which was pretty good in any case, and it could enhance Nokia’s too.

The big benefit of a virtual-network approach is its ability to abstract resources, including legacy equipment.  The traditional SDN model has been “fork-lift-and-install” where the virtual-network approach lets you “abstract-and-adapt”.  You can create a unified model of networking and adapt the real-product element relationships as far as hardware-level virtualization and the cloud can take you.  Overlay SDN works fine over Ethernet or IP, and also fine over OpenFlow-forwarded virtual wires.
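
A small sketch of the “abstract-and-adapt” point, assuming an overlay network object that maps onto whichever underlay happens to be available; the class names are my own illustration, not Nuage’s product model.

```python
class Underlay:
    """Whatever transport actually exists under the virtual network."""
    def provision_path(self, a, b):
        raise NotImplementedError

class IPUnderlay(Underlay):
    def provision_path(self, a, b):
        return f"tunnel {a}<->{b} over routed IP"

class EthernetUnderlay(Underlay):
    def provision_path(self, a, b):
        return f"VLAN segment {a}<->{b} over Ethernet"

class OpenFlowUnderlay(Underlay):
    def provision_path(self, a, b):
        return f"flow-forwarded virtual wire {a}<->{b}"

class OverlayNetwork:
    """The service-facing virtual network; it never sees underlay detail."""
    def __init__(self, underlay):
        self.underlay = underlay
    def connect(self, a, b):
        return self.underlay.provision_path(a, b)

for underlay in (IPUnderlay(), EthernetUnderlay(), OpenFlowUnderlay()):
    print(OverlayNetwork(underlay).connect("branch-12", "dc-east"))
```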

The fact that you can base SD-WAN service, which is probably the most logical way to deliver NaaS, on overlay SDN and Nuage is to me a proof point for its value.  It lets you deliver a service to users while at the same time exploring the questions of the relationship between virtual-overlay and underlay technology.  It also lets you explore the relationship between virtual networks and the carrier cloud, which could consume those virtual networks either to deliver services or to connect its own elements—or both, of course.

The cloud is the key to all of this, in my view.  Any operator software will end up running in it.  All virtual functions and application elements will be hosted there.  But first and foremost, the cloud is green field.  It’s far easier to apply virtualization where you aren’t displacing something else.  If operators were convinced that the Nuage model was the right approach for cloud networking, and if they also believed it could create an infrastructure-independent abstraction of connection infrastructure, they would have a single path forward to the networking model of the future.

Everything isn’t all beer and roses just because you build a virtual network, of course.  All of the challenges of virtualization in the areas of operations and management remain.  What a virtual network does is create a connection abstraction in a consistent way, which opens the door to using an expanded form of that abstraction (a “service model”) to bind the virtual and the real and to integrate it with OSS/BSS/NMS and onward to the service users.

Ultimately that has to be the goal of Nokia’s software business.  Just finding a home for open-source or DevOps initiatives, as a Nokia person suggested in one of the articles, isn’t a real strategy.  You have to find a role for all this stuff, a way to create one or more deployable ecosystems.  And we know from Broadway that you have to understand what the play is so you can organize the players.  Figuring that out will be Nokia’s challenge.

Competitors’ challenge, too.  Other vendors have played with software-centric visions, including Nokia competitors Cisco and Juniper.  The initiatives haven’t really moved the ball much for either company, in no small part because they haven’t figured out the answer to that first problem I mentioned—the box business generates a lot of revenue that the software business doesn’t.  Software-centricity also frames another problem, a dimension of the Ericsson problem.  Software is moving increasingly to open source, which means it’s a commodity and isn’t differentiating.  What, then, is the software business?  If it’s professional services, then Nokia and every other company that wants software-centricity are surely heading into uncharted territory.

Is an Open-Source Framework For Next-Gen Network Software Possible?

Network operators have accepted open source software.  Or, sort of accepted.  I don’t want to minimize the fact that operators have indeed made the culture shift necessary to consider the adoption of open-source software, but that’s Step One.  There are other steps to be taken, and you can see some of the early attempts at those other steps emerging now.

The decisive driver for Step One here seems to have been the fact that cost reduction needs were far outpacing any possible, credible, path to a software-driven revolution.  Operators see a much higher probability that they’ll adopt open hardware models like that of the Open Compute Project.  If more and more of the data center iron is an open commodity, how many vendors who are denied the big hardware revenues will then jump out to address software needs?  Open source, like it or not, may be the only option.

No operator is naïve enough to think that they can just run out into the marketplace and say “InstallOpenSourceBusiness” and have it happen.  There are literally millions of open-source packages, and even narrowing the field to the possible packages that operators could adopt probably wouldn’t cut the number below a million.  Operators are used to buying solutions to problems, usually specific problems and not huge ones like “Make Me Profitable”, so one of the issues that has been facing operators is finding the components to their solution suites.  Hopefully from a single provider, but certainly at least a set of open-source tools that combine properly.

All of the operators I talked with in the last two years agree on two things.  First, they need to transform their business to be leaner and meaner and to open the opportunity for new revenues.  Second, achieving the first goal will surely mean becoming more software-centric rather than device-centric in their planning.  That’s not easy, and it’s only Step Two.

I had a nice talk with an operator a couple months ago, and they indicated that their planners had divided their technology processes into an operations layer, a service feature layer, and a virtual infrastructure layer.  The first focused on OSS/BSS modernization, the second on adding software instances of features to the current appliances/devices in use, and the last on the platform used to host and connect those software features.

In the first area, this operator said that while there were in theory open-source tools for OSS/BSS, they did not believe these tools could possibly do their job, or that transforming to use them could be managed without significant disruption.  That means that they have to confront a stark choice of either discarding the traditional OSS/BSS model completely and building up operations features by area, or staying with (likely their current) proprietary OSS/BSS strategy.

In the second area, which is the software feature-building process, this operator confesses that their current focus in NFV has been entirely on proprietary virtual functions.  They offered two reasons for that.  First, their marketing people argued that picking virtual functions from “familiar” vendors would grease the sales skids on NFV service offerings.  Second, everyone they found offering virtual functions was offering only proprietary ones.  The argument was that there was no unified approach to creating open-source virtual functions.

The final area has been the most perplexing for this particular operator.  They had envisioned, when they started, that both SDN and NFV would create virtual devices through a process of feature lifecycle management that lived within the third, infrastructure or platform, area.  Early on, this operator felt that cloud tools like OpenStack and some DevOps solution would build this virtual device, but they accepted quickly that management and orchestration processes would also be needed.  That’s because OpenStack doesn’t do everything operators need.

To make matters worse in this last area, this operator realized that by staying with legacy OSS/BSS systems, they couldn’t do much to ensure that management of these new services would be possible within the OSS/BSS.  That requirement had to be added elsewhere, and the only place that seemed to fit was that third area of infrastructure or platform.  What this has done is to concentrate a lot of change in a single space, while leaving the issues of the other two largely unresolved, especially with regard to open source.

There are really three layers of “infrastructure”.  At the bottom are the physical resources and the cloud deployment tools, including OpenStack, SDN control, and legacy device management.  In the middle are the things that drive deployment below, which would be DevOps and orchestration tools, and at the top are the mystical orchestration processes that create the virtual devices that have to be presented upward.

OpenStack has issues, especially in terms of networking, even for cloud computing.  While there are initiatives to harden it for carrier NFV use, these are aimed more at the deployment side than at broader concerns of performance and scalability limits, or at network constraints.  All of these could be resolved outside OpenStack by the next layer up, and probably should be.  OpenNFV seems likely to harmonize this layer a bit, but its role in organizing the next one up is unclear and so is how it would treat legacy elements.

In that next layer, we have current DevOps technology that could control deployment, but today’s DevOps isn’t rich in network configuration and control features.  NFV has at least two MANO projects underway.  Open-Source MANO (OSM) seems to me to be aimed at the limited mission of deploying virtual functions and service chains.  OPEN-O also wants to control legacy elements and orchestrate higher processes.  Neither of them is fully developed at this point.

In the top, final, layer, the only specific contribution we have is OPEN-O, but I think you could argue that whatever happens here would probably be based on a service modeling architecture.  One candidate is TOSCA, which is in my view very generalized and very suitable.  There is a basic open-source TOSCA implementation, and there’s been some university work on building service deployment based on it and on the Linked USDL XML variant.
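
To give a feel for what a declarative service model looks like, here’s a minimal, TOSCA-flavored example expressed as plain Python data rather than real TOSCA YAML; the node names and properties are invented for illustration.

```python
# A TOSCA-flavored service model as plain Python data.  Real TOSCA is YAML and
# far richer; the point is the shape: typed nodes, properties, and requirements
# that a higher-layer orchestrator can walk and hand to the right deployer.

service_template = {
    "description": "Two-site VPN with a hosted firewall (illustrative only)",
    "node_templates": {
        "vpn": {
            "type": "tosca.nodes.network.Network",
            "properties": {"cidr": "10.10.0.0/16"},
        },
        "firewall": {
            "type": "tosca.nodes.SoftwareComponent",
            "properties": {"image": "vfw:1.2"},
            "requirements": [{"host": "edge_server"}, {"connects_to": "vpn"}],
        },
        "edge_server": {
            "type": "tosca.nodes.Compute",
            "capabilities": {"host": {"properties": {"mem_size": "4 GB"}}},
        },
    },
}

# An orchestrator would resolve each node's requirements and dispatch it to the
# right mechanism (OpenStack for Compute, an SDN controller for Network, etc.).
for name, node in service_template["node_templates"].items():
    print(name, "->", node["type"])
```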

There is nothing I know of underway that suitably links the management chains through all of this, allowing for higher-level elements to control what happens below and allowing lower-level elements to pass status upward for action.  There is also no accepted architecture to tie this all together.

You can see how “open source” is challenged in Step Two, and we’ve not even gotten to the last step, where we can actually field an integrated suite of software elements.  We’ve also not addressed the question of how virtual functions could be authored in open source without them all becoming what one operator has characterized as “snowflakes”, every one unique and demanding custom integration.

Operators like AT&T and Verizon are trying to create models to address the whole next-gen ecosystem, but even they are risking falling into the same trap that has hurt the ETSI NFV efforts and the ONF’s SDN.  In the name of simplification and acceleration, and in the name of showing results quickly to sustain interest, they make assumptions without following the tried-and-true top-down modeling from goals to logic.  I’m hopeful that the operators can avoid this trap, but it’s not a done deal yet.

Still, it’s these operator architectures that will likely carry the open-source banner as far forward as it gets.  Other open-source SDN and NFV elements will be relevant to the extent they can map into one or both of these carrier models.  The reason for their primacy is simple; they represent the only real ecosystem definitions available.  All of the issues I’ve noted above will have to be addressed within these architectures, and so they’ll be a proving-ground for whether any open-source solution can make a complete business case.

Looking at the Future of IT Through the “Whirlpool” Model

Changes in how we build data centers and networks, and in how we deploy applications and connect them, are really hard to deal with in the abstract.  Sometimes a model can help, something to help visualize the complexity.  I propose the whirlpool.

Imagine for the moment a whirlpool, swirling about in some major tidal flow.  If you froze it in an instant, you would have a shape that at the top is wide, slowly narrowing down, then more quickly, till you get to a bottom where again the slope flattens out.  This is a decent illustration of the relationship between distributed IT and users, and it can help appreciate the fundamental challenges we face with the cloud and networking.

In our whirlpool, users are strung about on the very edge, where there’s still no appreciable slope.  They’re grouped by their physical location, meaning those who are actually co-located are adjacent on the edge, and the further they are away from each other, the further they are spaced on that whirlpool edge.

Compute resources are distributed further down the sides.  Local resources, close to the users, are high up on the whirlpool near that user edge, and resources that are remote from workers are further down.  The bottom is the data center complex, generally equidistant from users but at the bottom of a deep well that represents the networking cost and delay associated with getting something delivered.

When you deploy an application to support users, you have to create a connection between where it’s hosted and where it’s used, meaning you have to do a dive into the whirlpool.  If the application is distributed, you have multiple places where components can live, and if those places aren’t adjacent you have to traverse between the hosting points.  Where multiple users are supported, every user has to be linked this way.

This illustrates, in a simple but at least logical way, why we tend to do distributed computing in a workflow-centric way.  We have computing local to workers for stuff that has relevance within a facility and requires short response times.  We may then create a geographic hierarchy to reflect metro or regional specialization—all driven by the fact that workers in some areas create flows that logically converge at some point above the bottom of the whirlpool.  But the repository of data and processing for the company still tends to be a place everyone can get to, and where economies of scale and physical security can be controlled.

Now we can introduce some new issues.  Suppose we move toward a mobile-empowerment model or IoT or something that requires event-handling with very fast response times.  We have to push processing closer to the worker, to any worker.  Since in this example the worker is presumably not in a company facility at all, caching processing and data in the local office may not be an option.  The cost efficiency is lost and resources are likely to be under-utilized.  Also, a worker supported on a public broadband network may be physically close to a branch office, but the network connection may be circuitous.  In any event, one reason why the cloud is much easier to justify when you presume point-of-activity empowerment is that the need for fast response times can’t be met unless you can share hosting resources that are better placed to do your job.

The complication here is the intercomponent connectivity and access to central data resources.  If what the worker needs is complex analysis of a bunch of data, then there’s a good chance that data is in a central repository, and analyzing it with short access delays would have to be done local to the data (a million records, each retrieved over a path with 100 milliseconds of delay, add up to more than a day of communication latency alone).  Thus, what you’re calling for is a microservice-distributed-processing model, and you now have to think about interprocess communications and delay.
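
The arithmetic behind that parenthetical is worth spelling out, assuming each record requires its own 100-millisecond round trip:

```python
records = 1_000_000
round_trip_seconds = 0.100        # 100 ms of delay per record retrieval

total_hours = records * round_trip_seconds / 3600
print(f"{total_hours:.1f} hours of pure communication latency")   # about 27.8 hours

# Moving the analysis next to the data (or batching the requests) removes almost
# all of that, which is why the processing has to be local to the repository.
```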

The purpose of composable infrastructure and of networking, data center networking in particular, is to make the whirlpool shallower by reducing overall latency.  That has the obvious value of improving response times and interprocess communications, but it can also help cloud economics in two ways.  One is by reducing the performance penalty for concentration of resources—metro might transition to regional, for example.  That leads to better economies of scale.  The other is by expanding the practical size of a resource pool, letting more distant data centers host things because they’re close in delay terms.  That can reduce the need to oversupply resources in each center to anticipate changes in demand.

Information security in this picture could change radically in response to process and information distribution.  Centralized resources are more easily protected, to be sure, because you have a mass of things that need protecting and justify the costs.  But transient process and data caches are difficult to find, and the substitution of a user agent process for a direct device link to applications allows you to impose more authentication at all levels.  It’s still important to note that as you push information toward the edge, you eventually get to a point where current practices would make workers rely on things like hard copy, which is perhaps the least secure thing of all.

Do you like having refinery maps tossed around randomly?  Do you like having healthcare records sitting in open files for anyone to look at?  We have both today, and probably in most cases.  We don’t have to make electronic distribution of either one of these examples perfect to make it better than what we have.  The problem is not the level of security we have but the pathway to getting to it.  A distributed system has to be secured differently.

Suppose now that we take things to the extreme: the whirlpool is just a swirl that presents no barriers to transit from any point to any other.  Resources are now fully equivalent no matter where they are located and from where they’re accessed.  Now companies could compete to offer process and data cache points, and even application services.

So am I making an argument against my own position, which is that you can’t drive change by anticipating its impact on infrastructure and pre-building?  Not at all.  The investment associated with a massive shift in infrastructure would be daunting, and there would be no applications or business practices in place that could draw on the benefits of the new model.  A smarter approach would be to start to build toward a distributable future, and let infrastructure change as fast as the applications can justify those changes.  Which, circling back to my prior comments, means that this has to be first and foremost about the cloud as an IT model, and only second about the data center.

Wise Counsel from the Past

Whatever your party, if you are concerned about the country’s future, I recommend this poem, one I’ve quoted to friends in the past.  Henry Wadsworth Longfellow:

Thou, too, sail on, O Ship of State!
Sail on, O Union, strong and great!
Humanity with all its fears,
With all the hopes of future years,
Is hanging breathless on thy fate!
We know what Master laid thy keel,
What Workmen wrought thy ribs of steel,
Who made each mast, and sail, and rope,
What anvils rang, what hammers beat,
In what a forge and what a heat
Were shaped the anchors of thy hope!
Fear not each sudden sound and shock,
‘T is of the wave and not the rock;
‘T is but the flapping of the sail,
And not a rent made by the gale!
In spite of rock and tempest’s roar,
In spite of false lights on the shore,
Sail on, nor fear to breast the sea!
Our hearts, our hopes, are all with thee,
Our hearts, our hopes, our prayers, our tears,
Our faith triumphant o’er our fears,
Are all with thee, — are all with thee!

Is There a Business Benefit Driving “Hyperconvergence” or “Composable Infrastructure?”

The cloud is a different model of computing, a combination of virtualization and network hosting.  We all recognize that “the cloud” is something apart from virtual machines or containers, OpenStack or vCloud, IaaS or PaaS or SaaS.  It’s also something apart from the specific kind of servers you might use or the data center architecture you might adopt.  Or so it should be.

I had a question last week on LinkedIn (which is where I prefer my blog questions to be asked) on what I thought would drive “Infrastructure 2.0.”  My initial response was that the term was media jargon and that I’d need a more specific idea of what the question meant in order to respond.  When I got that detail, it was clear that the person asking the question was wondering how an agile, flexible, infrastructure model would emerge.  Short answer; via the cloud.  Long answer?  Well, read on.

The biggest mistake we make in technology thinking and planning is perpetuating the notion that “resources suck”, meaning that if we simply supply the right framework for computing or networking (forgetting for the moment how we’d know what it was and how to pay for it), the new resource model would just suck in justifying applications.  Does that sound superficial?  I hope so.

IT or network resources are the outcome of technology decisions, which are the outcome of application and service decisions, which are the outcome of benefit targeting, which is the outcome of demand modeling.  We can’t just push out a new infrastructure model, because the layers above it are not in place to connect it.  The best we could do at this point is to say that the new compute model, the cloud, could be an instrument of change.  The challenge even there is deciding just what kind of changes would then drive the cloud, and you have to do that before you decide how the cloud drives compute infrastructure.

If you did a true top-down model of business IT, you’d have to start with an under-appreciated function, the Enterprise Architect (EA).  This is a role that evolved from the old “methods analysts” of the past, responsible for figuring out what the elements of a job were as a precursor to applying automation.  But it’s here that we expose a big risk, because the best way to do a job may not be the way it’s been done in the past, and often past practices have guided high-level business architectures.

An alternative to this approach is the “focus-on-change” model, which says that if you’re going to do something transformational in IT these days, you will probably have to harness something that’s very different.  I cite, as change-focus options, mobility and IoT.  No, not analytics; they apply to current practices and try to make better decisions through better information.  Mobility and IoT are all about a higher-level shift, which is away from providing a worker an IT-defined framework for doing a job and toward providing a way to help a worker do whatever they happen to be doing.

Any business has what we could call elemental processes, things that are fundamental to doing the business, but there’s a risk even in defining these.  For example, we might say that a sales call is fundamental, but suppose we allow the buyer to purchase online?  Same with delivery, or even with production.  There are basic functional areas, though.  Sales, production, delivery, billing and accounting, all are things that have to be done in some way.  Logically, an EA should look at these functional areas and then define a variety of models that group them.  An online business would have an online sales process, and it might dispatch orders directly to a producer for drop-shipping to the customer or it might pick goods from its own warehouse.  The goods might then be manufactured by the seller or wholesaled.

When you have a map of processes you can then ask how people work to support them.  The most valuable change that appears to be possible today is the notion of “point-of-activity empowerment” I’ve blogged about many times.  Information today can be projected right to where the worker is at the moment the information is most relevant.

Another mistake we make these days is in presuming that changing the information delivery dynamic is all we need to do.  A mobile interface on an application designed for desktop use is a good example.  But would the worker have hauled the desk with them?  No, obviously, and that means that the information designed for desktop delivery isn’t likely to be exactly what the worker wants when there’s a mobile broadband portal that delivers it to the actual job site.  That’s why we have to re-conceptualize our information for the context of its consumption.

That’s why you could call what we’re talking about here contextual intelligence.  The goal is not to put the worker in a rigid IT structure but to support what the worker is actually doing.  The first step in that, of course, is to know what that is.  You don’t want to interfere with productivity by making the worker describe every step they intend to take.  If we defined a process model, we could take that model as a kind of state/event description of a task, a model of what the worker would do under all the conditions encountered.  That model could create the contextual framework, and we could then input into it events from the outside.

Some events could result from the worker’s own action.  “Turn the valve to OFF” might be a step, and there would likely be some process control telemetry that would verify that had been done, or indicate that it didn’t happen.  In either case, there is a next step to take.  Another event source might be worker location; a worker looking for a leak or a particular access panel might be alerted when they were approaching it, and even offered a picture of what it looked like.

From an application perspective, this is a complete turnaround.  Instead of considering information the driver of activity, activity is the driver of information.  Needless to say, we’d have to incorporate process state/event logic into our new system, and we’d also have to have real-time event processing and classification.  Until we have that, we have no framework for the structure of the applications of the future, and no real way of knowing the things that would have to be done to software and hardware to do them.
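
Here’s a minimal sketch of what that state/event logic might look like, using a hypothetical field task; the states, events, and guidance strings are invented for illustration.

```python
# The process model as a (state, event) table: each entry names the next state
# and the information to push to the worker at that moment.

state_table = {
    ("find_valve", "arrived_at_valve"):
        ("turn_valve", "Show valve photo and the shut-off procedure"),
    ("turn_valve", "telemetry_confirms_off"):
        ("inspect_line", "Show the inspection checklist for this line"),
    ("turn_valve", "telemetry_still_on"):
        ("turn_valve", "Warn: valve still reads open; retry or escalate"),
}

def handle_event(state, event):
    next_state, guidance = state_table.get(
        (state, event), (state, "No guidance defined; log for review"))
    print(f"{state} + {event} -> {next_state}: {guidance}")
    return next_state

state = "find_valve"
state = handle_event(state, "arrived_at_valve")        # fired by location context
state = handle_event(state, "telemetry_still_on")      # fired by process-control telemetry
state = handle_event(state, "telemetry_confirms_off")
```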

The converse is true too.  We could change infrastructure to make it hyperconverged or composable or into Infrastructure 2.0 or 3.0 or anything else, and if the applications are the same and the worker behavior is the same, we’re probably going to see nothing but a set of invoices for changes that won’t produce compensatory benefits.

Obviously, it’s difficult to say what the best infrastructure model is until we’ve decided on the applications and the way they’ll be structured.  We can, though, raise some points or questions.

First, I think that the central requirement for the whole point-of-activity picture is a cloud-hosted agent process that’s somewhere close (in network latency terms) to the worker.  Remember that this is supposed to be state/event processing so it’s got to be responsive.  Hosting this in the mobile device would impose an unnecessary level of data traffic on the mobile connection.  The agent represents the user, maintains user context, and acts as the conduit through which information and events flow.

We also need a set of context-generating tools, which can be in part derived from things like the location of the mobile device and in part from other local telemetry that would fall into the IoT category.  Anything that has an association with a job could be given a sensor element that at the minimum reports where it is.  The location of the worker’s device relative to the rest of this stuff is the main part of geographic context.

The agent process is then responsible for drawing on contextual events, and also drawing on information.  Instead of the worker asking for something, the job context would simply deliver it.  The implication of this is that the information resources of the company would be best considered as microservices subservient to the agent process map (the state/event stuff).  “If I’m standing in front of a service panel with the goal of flipping switches or running tests, show me the panel and what I’m supposed to do highlighted in some way.”  That means the “show-panel” and “highlight-elements” microservices are separated from traditional inquiry contexts, which might be more appropriate to what a worker would look at from the desk, before going out into the field.
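
A small sketch of the agent-and-microservices relationship described here, using the hypothetical “show-panel” and “highlight-elements” services from the example; none of this is a real API.

```python
# Hypothetical information microservices; in practice these would front the
# company's repositories and document systems.
def show_panel(panel_id):
    return f"<image of panel {panel_id}>"

def highlight_elements(panel_id, elements):
    return f"highlight {elements} on panel {panel_id}"

class WorkerAgent:
    """Cloud-hosted agent that holds the worker's context and handles events."""
    def __init__(self, job_context):
        self.context = job_context    # where the worker is, which step they're on

    def on_event(self, event):
        if event == "worker_at_panel":
            panel = self.context["panel"]
            # Compose exactly what this step needs and push it to the worker.
            return [show_panel(panel),
                    highlight_elements(panel, self.context["switches"])]
        return []

agent = WorkerAgent({"panel": "P-114", "switches": ["S3", "S7"]})
print(agent.on_event("worker_at_panel"))
```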

You can see how the cloud could support all of this, meaning that it could support an application model where components of logic (microservices) are called on dynamically based on worker activity.  The number of instances of a given service you might need, and where you might need them, would depend on instantaneous workload.  That’s a nice cloud-friendly model, and it pushes dynamism deeper than just a GUI, back to the level of supporting application technology and even information storage and delivery.

Information, in this model, should be viewed as a combination of a logical repository and a series of cache points.  The ideal approach to handling latency and response time is to forward-cache things that you’ll probably need as soon as that probability rises to a specific level.  You push data toward the user to lower delivery latency.
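
A minimal sketch of probability-driven forward caching, assuming some analytic process has already estimated the likelihood that a data set will be needed at a given cache point; the threshold and names are illustrative.

```python
PUSH_THRESHOLD = 0.7      # push data toward the edge once need is this likely

edge_cache = {}           # the cache point close to the worker

def central_fetch(dataset):
    return f"records for {dataset}"      # stand-in for the logical repository

def maybe_forward_cache(dataset, need_probability):
    """Forward-cache a data set ahead of the request if it will probably be needed."""
    if need_probability >= PUSH_THRESHOLD and dataset not in edge_cache:
        edge_cache[dataset] = central_fetch(dataset)

maybe_forward_cache("site-114-maintenance-history", 0.85)   # pushed to the edge
maybe_forward_cache("unrelated-archive", 0.10)              # stays in the repository
print(list(edge_cache))
```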

The relationship between this productivity-driven goal set, which would at least hold a promise of significant increases in IT spending, and things like “hyperconvergence” or “composable infrastructure” is hard to establish.  Hyperconvergence is a data center model, and so is composable infrastructure.  It’s my view that if there is any such thing as either (meaning if they’re not totally hype-driven rehashes of other technology) then it would have to be a combination of a highly integrated resource virtualization software set (network, compute, and storage) and a data center switching architecture that provided for extremely low latency.  A dynamic application, a dynamic cloud, could in theory favor one or both, but it would depend on how distributed the data centers were and how the cloud itself supported dynamism and composability.  Do you compose infrastructure, really, or virtualize it?  The best answer can’t come from below, only from above where the real benefits—productivity—are generated.

Which leads back to one of my original points.  You can’t push benefits by pushing platforms that can attain them.  You have to push the entire benefit food chain.  The cloud, as I said in opening this blog, is a different model of computing, but no model of computing defines itself.  The applications, the business roles, we support for the cloud will define just what it is, how it evolves, and how it’s hosted.  We need to spend more time thinking about Enterprise Architecture and software architecture and less time anticipating what we’d come up with.

What Should the Next FCC Do?

Telecom is really more about regulatory posture than technology.  The US is now looking at a change in the FCC Chairmanship and the overall political balance of the body, and so there’s a chance that regulatory policy will shift.  It might even shift sharply.  However, FCC workings are murky so it’s not always easy to say what will happen or to relate realistic chances to what’s being said.

The current FCC, under Wheeler, has been characterized as being “pro-competitive” but that’s actually probably giving him credit for a more consistent, strategic, vision than he actually has exhibited.  The FCC has been pro-OTT, pro-consumer, in mindset but in the latter case the question has always been whether the stances it took were actually beneficial to the consumer in the long run.  Like most bodies in the political world, the FCC was probably responding more to lobbyists than anything else, and policy biases were really more about what lobbyists it listened to most.

Making things even harder, the FCC has been about as polarized as government overall.  The two “Republican” commissioners have opposed what the three “Democratic” ones proposed.  Given the likelihood of knee-jerk opposition, we can’t draw much from how the Republican commissioners voted or what they said.

Then there’s the Trump-versus-Republican problem.  Traditionally, Republicans have supported M&A as being consistent with their pro-business position.  Trump said he would block the TW/AT&T deal, and the FCC Commissioners are appointed by the President.  DoJ policies on anti-trust are also largely driven by the views of the Executive branch, and the now-to-be-Republican Attorney General.

Given all the uncertainty, any prediction here could be a waste of time, but I think there are at least a couple points that might emerge as new policy, and they could have a major impact.

The most important point is that the Internet and OTT community favored, and was favored by, the Democrats while the telcos were more on the Republican side.  Net neutrality policy had shifted rather far in favor of the OTTs, much further than I personally believed was smart.  It would be very reasonable to expect that it would now start to shift back, even in areas where the FCC’s rulings had passed court review.

The two most controversial elements in the FCC neutrality policy were the banning of paid prioritization and settlement for the Internet, and the application of the same neutrality rules to mobile services as to wireline.  I think it’s likely that the new FCC will reverse at least elements of both.

The current FCC policies, to the extent that there’s a systematic foundation for them, are based on the presumption that anything that favors incumbents over startups hurts innovation.  If Netflix had to pay for premium delivery, they might have to increase prices a bit or reduce profits a bit, but it would (so the FCC believes) limit other media-related startups’ ability to raise money because they couldn’t afford the charges, and so wouldn’t be competitive.  If settlement were required, then OTTs would have to pay for traffic to access ISP customers, which could limit OTT growth.

The flip side of this is our current problem with return on infrastructure, a problem that has cut operator spending and reduced equipment vendor revenues.  The telecom industry would be transformed in an instant if the FCC restored paid prioritization (by anyone, consumer or OTT) and required settlement among Internet players to replace the bill-and-keep model.  I’ve said that both prioritization and settlement would net out to be better for the industry, so in my view this shift would be a good thing.

On the mobile side, the challenge we currently face is the transition to 5G.  Our use of mobile devices continues to outrun our infrastructure planning.  5G, if it were done right, would go a long way toward fixing that problem.  Where regulations could impact this is in the doing-it-right part.

Any issues in return on infrastructure will end up slow-rolling infrastructure, and 5G is arguably a very critical transformation.  The most important part of it is its attempt to harmonize infrastructure between mobile and wireline, and that would improve efficiency and also potentially build an alternative to FTTH by uniting fiber and 5G last-football-field technology.

Mobile operators also believe that they face greater risks in capacity exhaustion than wireline operators, due in no small part to the tendency of phone vendors and OTTs to push applications that would expend a lot of capacity.  That risk could be particularly significant facing a 5G transition because if 5G is to obsolete the old model of mobile backhaul, the more we spend now on that area, the more we waste.  Regulations today tend to favor just giving the consumer and the OTT everything.

Where regulatory change could really impact things is in IoT.  In the model of IoT that most people seem to hope for, the “things” are free to be exploited by all, much as the Internet is.  That’s a nice model if you’re not one of those expected to supply the exploited things, but if you are it’s a non-starter.  The telcos would be perhaps the most logical investors in thing-tech (forgetting their notion of everything-on-5G), and if they could frame an IoT proposal that included thing-utility status, it might get approval under the new FCC regime.

The question is whether even a change in policy would help either of these major initiatives.  5G is a 2020-and-beyond phenomenon, and IoT is even further out.  There’s another election in 2020, and if the FCC changed political hands then, even policy changes made today wouldn’t assure telcos that those changes would endure through another transition, so they’d be reluctant to invest on that assumption.

Networking is a long-cycle industry, and that’s a problem in the real world, including the political and regulatory world.  If we want operators to continue to fund broadband, we have to be sure they get an adequate ROI, or at least that we don’t foreclose all the options for achieving it.  We don’t have to use public policy to save the telcos yet, because significant opex reduction is still available.  That won’t change infrastructure much, though.

If we want to see modernized infrastructure, then we have to look to service-revenue drivers to create incremental benefits beyond opex.  That’s where regulatory policy can help, or hurt.  I think that an FCC that was willing to focus “net neutrality” on the original non-discrimination goals and allow the same amount of business experimentation in the connection services as we have in OTT would help us all in the long run.  Listening, FCC?

Do We Need New Infrastructure for New Services?

Does a new set of services for network operators imply a new network infrastructure?  That’s a question some of you asked me after the series of blogs I’ve just done.  I’ve talked about software automation of the service lifecycle, and that has focused primarily on cost management.  Obviously, software automation could also facilitate the introduction of new services, but how effective that would be depends on whether the new services could be delivered from “stock” infrastructure.  Agile ordering of something that will take a year to deploy and test isn’t going to move the ball.

The problem being raised here is one operators have raised too.  Nobody likes to base network evolution on the notion of static services, focusing totally on improving profit by reducing costs.  There may not be an easy alternative, though.  A “new service” is a vague term, as I said in a prior blog.  We have multiple categories of “newness”, ranging from enhanced connection features, to connection-supporting things like security, to hosted or experiential features like cloud computing.  Operators are aware of them all, and interested in them to the extent that they can be validated.

Can they be, and if they can, how?  It depends on the class of new service we’re talking about.

Connection service innovation has focused primarily on elasticity of bandwidth or dynamic connection of endpoints.  People have been talking about the notion of a “turbo button” for some time; you push it to get a speed boost when you need it.  Turbo buttons are a consumer access feature largely killed off by neutrality regulation, but for business the equivalent concept is at least legal.  Workable might be another matter.

Remember the rule that automatically delivered services have to be built from stock resources?  I can only dial capacity up or down to the extent that I’ve got access assets able to support both the capacity range and the agility to change it.  For most business services, that would mean selling an access pipe fat enough to handle the highest capacity you intend to offer, then throttling it up and back based on a service-order change.  Operators are prepared to pre-position access capacity in support of this or other services, if there’s a revenue upside.
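
To make the “fat pipe plus service order” idea concrete, here is a minimal sketch in Python of how a turbo-button request might be handled against a pre-positioned access pipe.  Every name and number here is mine and purely illustrative, not any operator’s actual provisioning API; the point is that a rate change within the pre-positioned ceiling is just a parameter change, while anything beyond the ceiling can’t be delivered from stock resources at all.

# Hypothetical sketch of bandwidth-on-demand against a pre-positioned access pipe.
class AccessPipe:
    def __init__(self, ceiling_mbps, committed_mbps):
        self.ceiling_mbps = ceiling_mbps      # physical capacity pre-positioned at install
        self.committed_mbps = committed_mbps  # rate the customer currently pays for

    def apply_service_order(self, requested_mbps):
        """Throttle the pipe up or down in response to a service-order change."""
        if requested_mbps > self.ceiling_mbps:
            # Not deliverable from stock resources; this would mean new access plant.
            raise ValueError("requested rate exceeds pre-positioned access capacity")
        self.committed_mbps = requested_mbps
        return self.committed_mbps

# A "turbo button" is then just a service-order change within the ceiling.
pipe = AccessPipe(ceiling_mbps=1000, committed_mbps=100)
pipe.apply_service_order(300)    # succeeds instantly; the capacity was already in place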

The challenge here has been that enterprises are interested in dynamic bandwidth only to the extent that it lowers their overall costs.  They get frustrated when a salesperson says “Wouldn’t you like to be able to dial in some extra speed?” or “Wouldn’t a little extra capacity be helpful at end-of-quarter?”  Yeah, they’d also like their local tax authorities to declare a dividend instead of sending a bill, and it would be helpful if some government regulation made buyers purchase their goods or services.  It’s not realistic, though.  Buyers say that business needs drive their information exchange, and at the moment they see a case for dynamism only if being able to throttle down for lower performance and up for higher would reduce their net costs.

The situation isn’t all that different at the connection-augmenting feature level either.  Yes, it would be nice to be able to get a virtual firewall installed with a click when you need it, but once you figured out you needed one, it’s unlikely you’d then say “Well, let’s just throw the old doors open to hacking!” and pull it out.  Most of the credible features, once installed, would tend to stay that way, which isn’t exactly a dynamic service model.

So, the answer to the opening question is “Yes!”  New network infrastructure is needed, but not to do what we’re already doing.  Yes, the real service opportunities arise in the cloud, and we know that because that’s where those successful OTT competitors we always hear about are living.  I don’t disagree with those who say that operators would have to build something more OTT-like.  However, that doesn’t necessarily mean they have to build it instead of what they already have.  Operators could build OTT infrastructure over their own tops.  The question is whether building that infrastructure, or sustaining it, would be easier to justify economically if some of the current connection-service features were hosted on it.  SDN and NFV have to prove that could be true if they are to be useful to operators in the long term.

If we were to envision NFV’s contribution as hosting of virtual CPE features, it should be clear to anyone with a calculator that there’s no way that’s going to be broadly useful.  You can’t host vCPE for consumer services when the amortized cost of the feature as part of a cable modem or other broadband gateway is a couple bucks a year—which is just what it would be.  Business services might or might not benefit from cloud-hosting of vCPE, but there are only 1.5 million business sites in the US that are satellite sites of multi-site businesses.  Three-quarters of these don’t need business-grade access like Carrier Ethernet.  They’re not going to contribute either.
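
For anyone who wants to run the calculator themselves, here is the back-of-envelope arithmetic using only the figures above; the per-feature amortization and the site counts come from the text, and the code itself is just an illustration of how thin the addressable base really is.

# Back-of-envelope check of the vCPE argument, using only the figures in the text.
consumer_feature_cost_per_year = 2.0       # "a couple bucks a year" amortized in a gateway
us_satellite_business_sites = 1_500_000    # satellite sites of multi-site businesses
share_without_carrier_ethernet = 0.75      # three-quarters don't need business-grade access

addressable_business_sites = us_satellite_business_sites * (1 - share_without_carrier_ethernet)
print(int(addressable_business_sites))     # 375000 sites, a thin base for cloud-hosted vCPE

# On the consumer side, cloud hosting has to beat roughly $2 per feature per year just to
# break even, before any data center, operations, or connectivity costs are counted.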

So, what provides the opportunity?  The big truth here is that there is no credible service a network operator could deploy that would, by itself, justify a transition to NFV.  That doesn’t mean they couldn’t adopt it, only that new services couldn’t get them there.  The transition from not-NFV to NFV has to start with a big infusion of opex savings, and we should acknowledge that.  Operator costs for “process opex” involving service and network management and related costs (like churn) currently run about 31 cents on each revenue dollar, while OTT costs in the same area run less than six cents.

Once you get an efficient and agile service layer, you could start to build out to optimize what it can deliver, but even there you need help.  We cannot simply build mobile services based on NFV, because too much of mobile infrastructure is already deployed and we’d have to displace it.  We have to piggyback on an initiative that would refresh a lot of infrastructure as it rolled out, which means 5G.

Beyond mobile, the obvious opportunity and in fact the brass ring is IoT.  IoT could by itself build enough carrier cloud data centers to jumpstart the whole next wave of services.  However, operators are stuck in a transcendentally stupid vision of IoT based on giving every sensor a 5G radio (and the media is more than happy to play along).  As long as operators don’t have a realistic vision of the future, they’re not going to adopt a realistic strategy to deal with it.

Most of all, though, we don’t have a plan.  You can evolve to a lot of things in networking, but it’s not easy to evolve to a national or global change in network technology.  You need a vision of the end-game and a way of addressing the evolution as a series of technical steps linked to real and provable ROIs.

I firmly believe that I could justify the carrier cloud.  I firmly believe that I know the pieces needed to get there and the steps to take.  I firmly believe that there are six or seven vendors who could provide everything that the operators would need, and do it starting right now.  But I firmly believe that vendors wouldn’t promote the approach because the sale would be too complicated, and without strong vendor backing of a revolution, everyone ends up sitting in coffee shops instead of marching.

Do we want something to happen here?  If so, dear vendors, you need to stop asking buyers to take you on faith.  Prove your worth.

Are the NFV Pieces Combining Optimally on the Current Trials?

I just got some interesting information from operators on the business-case implications of early NFV tests and trials.  I’d been trying to harmonize a series of, let’s face it, inconsistent stories on what was going on.  The results perhaps explain some of the disconnect between the CIO and the CTO, and perhaps even some of the coverage of NFV stories.  They certainly bring some clarity to what might happen in NFV deployment.

The most important point operators made was that their early NFV activity was not designed to prove a broad business case for NFV, but rather to prove out the technology and perhaps (yes, that word again) get some insight into low-apple opportunities for actual field deployments.  With very few exceptions, operators said that they didn’t yet have a service lifecycle process in place into which they could integrate NFV, and that limited where it could be used.

That turns out to be an important point.  A large majority of operators (outside the CTO area) believe that the primary attribute associated with advancing NFV out of lab trials and PoCs is contained impact.  That’s why things like virtual CPE have advanced, but more specifically why the CPE-hosted version has advanced.

According to operators, those low-apple NFV opportunities have two specific attributes.  First, they involve static deployment of functions, not dynamic deployment and function migration, and second, they almost always involve something at the customer demarcation—CPE on the customer premises.  These attributes help operators avoid dependence on broader lifecycle management and exposure to large first costs.

The first dynamic, cloud-hosted, virtual-function-based application could cost a fortune because it’s the only thing dipping into the resource pool.  Given that full service lifecycle management isn’t available, there’s little to be gained from cloud-hosted agility either.  In fact, it’s not clear that you could even onboard a service-chained virtual element, much less manage it.  If, however, you stuck a commodity general-purpose box on the premises, you solve both problems at a stroke.  You have no resource pool to manage—a customer’s features get dumped into the customer’s box.  You have no complicated management problem either, because management agents could be placed in the box, making it little different from managing any other device.
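
To illustrate why that model contains its impact, here is a hypothetical sketch (the names and structure are mine, not any vendor’s software) of a premises box whose local management agent fronts whatever features are loaded, so the operator’s management systems see one ordinary device rather than a shared resource pool.

# Illustrative only: a premises box that looks like a single managed device.
class Feature:
    def __init__(self, name):
        self.name = name
    def health(self):
        return "up"    # a real feature would report its own state

class PremisesBox:
    def __init__(self, device_id):
        self.device_id = device_id
        self.features = {}    # features are "dumped into the customer's box"; no shared pool
    def load_feature(self, feature):
        self.features[feature.name] = feature
    def management_agent_status(self):
        # One device-style status report, little different from any appliance.
        return {"device": self.device_id,
                "features": {name: f.health() for name, f in self.features.items()}}

box = PremisesBox("customer-123")
box.load_feature(Feature("firewall"))
box.load_feature(Feature("router"))
print(box.management_agent_status())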

Where things get a little complicated is the question of what specific functions go in the box.  Operators who are incumbent providers of business services note that their customers stabilize on the features they want fairly quickly, so there’s not a lot of real value in dynamic changes.  The corollary is that the biggest benefit you’ll reap is capex reduction, which means you’d like to target a function that’s pervasive and displaces a high-priced appliance.

What operators are really interested in is replacing edge routing functions for VPNs, meaning that they’d host the router in a general-purpose box on the premises, and then perhaps evolve to hosting it in an edge-office server.  You can see that this approach benefits both customers (who could dodge all the MPLS VPN management) and operators (who could dodge expensive routers in favor of software routers).  The downside is that this model starts to look a lot like SD-WAN, which operators are still trying to come to terms with.

The problem SD-WAN presents is that it’s an overlay technology, meaning that OTTs could build VPNs on top of some other service, including the Internet, and compete with expensive operator VPN offerings.  As is always the case, the threat is double-sided.  If operators jump into the space, they cannibalize their own service base.  If they don’t, somebody else cannibalizes it, just perhaps later on.  But the big question with SD-WAN is broader than that: could the model obsolete a lot of SDN and NFV notions completely?  If it could, operators need to think harder about its long-term impact.
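
For readers who want “overlay” made concrete, here is a deliberately simplified sketch, not any real SD-WAN product’s encapsulation: the customer’s packet is wrapped with a small header and carried as ordinary traffic over whatever underlay is available, including the Internet, which is exactly why an OTT can sell it without owning the transport.

# Illustrative overlay encapsulation: a tenant/VPN identifier prepended to the original
# packet, with the result carried as ordinary payload over any underlay network.
import struct

def encapsulate(tenant_id, inner_packet):
    return struct.pack("!I", tenant_id) + inner_packet

def decapsulate(payload):
    (tenant_id,) = struct.unpack("!I", payload[:4])
    return tenant_id, payload[4:]

wrapped = encapsulate(42, b"customer IP packet")
assert decapsulate(wrapped) == (42, b"customer IP packet")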

If an SD-WAN model, or even an approach to one, is in the air, network technology would shift from a service mission to a transport mission.  This is generally what the MEF has envisioned (and positioned badly) in its Third Network model.  Such a network would have a lower cost base for business services, both in capex and opex, and so adopting it would defuse some of the other initiatives aimed at cost reduction.  I’ve said in prior blogs that if you were to conceptualize the optimum way of doing networking based on all available technologies, you’d do an overlay SDN.

Overlay SDN doesn’t address all the issues, though.  Just because something might be more agile at the infrastructure level doesn’t mean that agility will show up in service lifecycle delays and costs.  What operators’ NFV initiatives tell me is that they are implicitly reacting to that long-standing separation of service operations (OSS/BSS, under the CIO) and network operations (the physical network and NMS).

With proper support from the ETSI ISG, from the TMF, or even from vendors who had a full solution for service automation, we could have advanced infrastructure and operations in lockstep.  It now seems to me that we’re defaulting to separate evolutions, in which network technology, and the operator organizations that support it, won’t be able to influence service automation much.  It may have to be a CIO move.

Does this delay transformation?  It depends on how the transitions are timed.  SD-WAN coupled with a strong service automation and orchestration story could realize nearly all the goals that have been set for SDN, NFV, and everything else at the infrastructure level.  Realize them, in fact, at a lower investment and risk.  If CIOs were to see (or be made to see) this, they could become the drivers of the new age.  If not, then we would have to see coordinated benefits from two largely discontinuous activities in order to get the optimum outcome.  Given that we’ve not seen either of the activities take off to date, that could be a problem.

I said in a recent blog that, in theory, the optimum place for orchestration aimed at service automation benefits was a boundary layer that overlaps infrastructure and network management below and service management (via OSS/BSS) above.  Either OSS/BSS types could play in this, or full-spectrum service-automation NFV vendors could take a shot.  However, what operators are telling me today is that they aren’t pushing NFV initiatives that drive broader operations automation.  I believe that’s because they can’t socialize the solutions the NFV vendors would present with their operations counterparts.

The role the TMF could and should play here, then, looms large.  If operations needs to transform separately, it’s difficult to see how that can happen when the mouthpiece organization for the CIOs isn’t playing its best game.  The TMF just dumped its CEO, but it still has a lot of people who were there through the current stagnation.  Can they find momentum now?  If they don’t, transformation probably stalls until they do.