Would Savings from NFV or Lifecycle Automation Fund Innovation?

SDxCentral raised an interesting point in an article on how Nokia thinks operators would use savings created by virtualization and automation.  The point is that operators, having saved on both opex and capex with these strategies, would then spend more on innovation.  I believe that the potential for this shift exists, but I also think there are some barriers that would have to fall to realize it.  The biggest, perhaps, is facing exactly what NFV really is.

One of the problems with the save-here-to-spend-there approach is that, according to the operators’ own CFOs, the savings that have been proposed for virtualization and automation don’t stand up to close examination.  In one of my sweeps of CFO attitude, I found that none had seen a credible demonstration of net savings.  Strategies aimed at capex reduction didn’t consider the fact that the alternative infrastructure almost certainly created additional operations expense.  Strategies aimed at opex reduction didn’t correctly estimate even the current opex costs, much less what could be saved.

Part of this problem is the effect of the media on claims and research, which I’ve irreverently described as a “Bull**** bidding war.”  One vendor says “I can demonstrate a savings of 15%!”  The reporter goes to a competitor and says “Charlie over there says he can save 15%, what can you save?”  Now this competitor knows darn well that either they beat Charlie’s number, or the story is going to be about Charlie.  What do you suppose happens?

The bigger factor, though, is the fact that you cannot even attempt a credible estimate of the cost of a network unless you understand in detail how that network is built.  We say “adopt SDN” or “adopt NFV”, but does that mean you do everything with those technologies?  We know that SDN and NFV will have a limited impact on fiber or access technology, but how limited?  Is the impact in other areas limited too?  We can’t know unless we understand just what areas of the network would really be influenced.

On the opex side, I’ve never seen a use case or report that cited how operations costs are actually distributed, or even what they are.  One common problem is to take the entire “operations” portion of a carrier and assume it’s network equipment.  Hey, guys, they have a bunch of expenses like real estate, vehicles, and so forth that don’t even represent direct network costs.  OAM&P costs run to about 64 cents on every revenue dollar, but most of that doesn’t have any connection with network operations and can’t be addressed by automation.

The good news is that while most of the numbers are just smoke, the reality is that there is considerable opportunity to create savings.  My own estimates put the achievable goal at about 12% of capex and between a third and a half of opex, and the result of that combination would exceed the total network capex budget of some operators.  You could buy a lot of innovation with that.

That raises the second point, though.  What exactly does spending on innovation mean?  Is innovation even monetizable?  If you get a windfall savings of perhaps 15 to 18 cents on every revenue dollar, do you run out into the street and scatter money?  You’d invest in something that offered a good return, and “good return” to a CFO means noticeably above the return of legacy infrastructure and services.  What that “something” might be isn’t easy to determine.

A massive investment in innovation would mean a massive shift in infrastructure architecture, say from spending on boxes that create connection services to servers and software that create experiences.  Historically, operators see this kind of shift as being guided by some standards initiative, aimed at defining the architecture and elements in an open way.  Like, one might say, 5G.

5G is a poster-child for the issues of network innovation.  Intel calls it “the next catalyst.”  We are years along in the effort.  We’ve defined all of the architectural goals.  We are just now starting to see people talking about the business case for pieces like network slicing.  How did we get to this point without knowing what the benefits were going to be?  Innovation has to mean more than “doing something different.”

It’s easy to slip from “benefit” to “feature”.  There are a lot of things that next-gen infrastructure could do, but it’s far from clear that all of them (or even any of them) offer a high-enough ROI to meet CFO requirements.  In the case of 5G, we know that higher customer speeds and cell capacity, FTTN/5G combinations to enhance wireline service delivery, unification of wireline/wireless metro architecture to eliminate separate evolved packet core (EPC), and some aspects of network slicing have at least credible benefits.

Credible, but are they compelling?  Most people would agree that an “innovation transformation” would shift much more focus to hosting and data centers.  My work on carrier cloud, drawing on the input of about 70 operators, shows that all of 5G would drive only about 12% of potential carrier cloud data center deployment.  The biggest factors in carrier cloud deployment are IoT, personalization of advertising and video, and advanced cloud computing services.  We should then look to architectures for each of these.

We actually have them.  The big OTT players like Amazon, Google, Microsoft, Twitter, and Uber have all framed architectures to deal with the kind of thing all of these true carrier cloud drivers will require.  All we need to do is to frame them in the context of carrier cloud, which should actually not be that difficult.  The thing that I think has made it challenging is that it’s software-driven.  In fact, it would be accurate to say that all “innovation” in the network operator space is really about transformation to a software-driven vision of technology.

Software-centric planning is hard for operators, and you don’t have to look further than SDN, NFV, or 5G to see that.  None of these initiatives were done the way software architects would have done them; we fell back on hardware-standards thinking.  The problem with the drivers of the carrier cloud is that there’s no real hardware-centricity to fall back on.  How do you approach these drivers if you don’t have a software vision?

Traditional NFV plays, including the open-source solutions, have a problem of NFV-centricity in general, and in particular a too-literal adherence to the ETSI ISG’s end-to-end model.  Most are incomplete, even for the specific issues of NFV, and can’t drive enough change to really make a business case on their own.  There are players now emerging that are doing better, but the problem we have now is that all “orchestration” or “NFV” or “intent modeling” represents is a claim.  Like, I might say, “innovation”.  Perhaps what we need to do first is catalog the space, look at how software automation of services has evolved and how the solutions have divided themselves.  From there, we can see what, and who, is actually doing something constructive.  I’ll work on that blog for next week.

Exploiting the New Attention NFV is Getting

You might be wondering whether perhaps NFV is getting a second wind.  The fact that Verizon is looking at adopting ONAP, whose key piece is rival AT&T’s ECOMP, is a data point.  Amdocs’ ONAP-based NFV strategy is another.  Certainly there is still interest among many operators in making NFV work, but we still have two important questions to answer.  First, what is NFV going to be, and do?  Second, what does “work” mean here?

NFV does work, in a strict functional sense.  We have virtual CPE (vCPE) deployed in real customer services.  We have some NFV applications in the mobile infrastructure space.  What we don’t have is enough NFV to make any noticeable difference in operator spending or profit.  We don’t have clear differentiation between NFV and cloud computing, and we don’t have a solid reason why that differentiation should even exist.  We won’t get those things till we frame a solid value proposition for “NFV” even if it means that we have to admit that NFV is really only the cloud.

Which it is.  At the heart of NFV’s problems and opportunities is the point that its goal is to host some network features in the cloud.  That, by rights, should be 99% defining feature hosting as a cloud application and 1% handling whatever special requirements arise that demand more than public cloud tools would provide.  What are the differences?  These are the things that have to justify incremental NFV effort, or justify expanding the current thinking on cloud computing to embrace more of NFV’s specific mission.

The biggest difference between a cloud application and an NFV application is that cloud applications don’t sit in a high-volume data plane.  The cloud hosts business applications and event processing, meaning what would look more like control-plane stuff in data networking terms.  NFV’s primary applications sit on the data plane.  They carry traffic, not process transactions.

Traffic handling is a different breed of application.  You cannot, in a traffic application, say that you can scale under load, because adding a parallel pathway for data to follow invites things like out-of-order arrivals.  Doesn’t TCP reorder?  Sure, but not all traffic is TCP.  You have to think a lot more about security, because traffic between two points can be intercepted and you could introduce something into the flow.  Authenticating traffic on a per-packet basis is simply not practical.

NFV applications probably require different management practices, in part because of the traffic mission we just noted, and in part because there are specific guarantees (SLAs) that have to be met.  Many network services today have fairly stringent SLAs, far more stringent than you’d find in the cloud.  You can’t support hosting network functions successfully if you can’t honor SLAs.

So, we have proved that you do need something—call it “NFV”—to do what the cloud doesn’t do, right?  I think so, but I also think that the great majority of NFV is nothing more than cloud computing, and that the right course would be to start with that and then deal with the small percentage that’s different.  We’ve not done that; much of NFV is really about specifying things that the cloud already takes care of.  Further, at least some of those “NFV things” really should be reflected in cloud improvements overall.  Let’s look at some of the issues, including some that are really cloud enhancements and some that are not, to see what our second-wind NFV would really have to be able to address if it’s real.

Cloud deployment today is hardly extemporaneous.  Even to deploy a single virtual function via cloud technology would take seconds, and an outage on a traffic-handling connection that’s seconds long would likely create a fault that would be kicked up to the application level.  There are emerging cloud applications that have similar needs.  Event processing supposes that the control loop from sensor back to controller is fairly short, probably in the milliseconds and not seconds.  So how do we deploy a serverless event function in the right place to handle the event, given that we can’t deploy an app without spending ten times or more the acceptable time?

Networks are run by events, even if traffic-handling is the product.  Clouds are increasingly aimed at event processing.  What makes “serverless” computing in the cloud revolutionary isn’t the pricing mechanism, it’s the fact that we can run something on demand where needed.  “On demand” doesn’t mean seconds after demand, either.  We need a lot better event-handling to make event-based applications and the hosting of network functions workable.

Then there’s the problem of orchestrating a service.  NFV today has all manner of problems with the task of onboarding VNFs.  We have identified at this point perhaps a hundred discrete types of VNF.  We have anywhere from one to as many as about a hundred implementations for a given type.  None of the implementations have the same control-and-management attributes.  None of the different types of VNF have any common attributes.  Every service is an exercise in software integration.

But what about the evolving cloud?  Today we have applications that stitch components together via static workflows.  The structure is fixed, so we don’t have to worry excessively about replacing one component.  Yet we already have issues with version control in multi-component applications.  Evolve to the event-chain model, where an event is shot into the network to meet with an appropriate process set, and you can see how the chances of those appropriate processes actually being interoperable reduce to zero.  The same problem as with NFV.

Then we have lifecycle management.  Cooperative behavior of service and application elements is essential in both the cloud and NFV, and so we have to be able to remediate if something breaks or overloads.  We have broad principles like “policy management” or “intent modeling” that are touted as the solution, but all policy management and intent modeling are, at this point, “broad principles”.  What specific things have to be present in an implementation to meet the requirements?

Part of our challenge in this area is back to those pesky events.  Delay a couple of seconds in processing an event, and the process of coordinating a response to a fault in a three-or-four-layer intent model starts climbing toward the length of an average TV commercial.  Nobody likes to wait through one of those, do they?  But I could show you how just that kind of delay would arise even in a policy- or intent-managed service or application.
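
Just to show the arithmetic, here’s a quick back-of-the-envelope sketch in Python.  The layer count, the number of event exchanges per layer, and the per-event delay are all illustrative assumptions, not measurements, but they show how quickly a couple of seconds per event compounds.

```python
# Back-of-the-envelope sketch of fault-escalation delay in a layered
# intent model.  All numbers are illustrative assumptions.

def escalation_delay(layers, events_per_layer, seconds_per_event):
    """Total delay if each layer tries (and fails) local remediation
    before the fault is kicked up to the next layer."""
    return layers * events_per_layer * seconds_per_event

# Assume a four-layer model, three event exchanges per layer (detect,
# attempt remediation, report failure), and two seconds per event.
total = escalation_delay(layers=4, events_per_layer=3, seconds_per_event=2.0)
print(f"Worst-case remediation delay: {total:.0f} seconds")  # 24 seconds
```

Twenty-some seconds is commercial-break territory, and that’s before any queuing from an event flood is added in.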

There is progress being made in NFV.  We have an increased acceptance of the notion that some sort of modeling is mandatory, for example.  We have increased acceptance of the notion that a service model has to somehow guide events to the right processes based on state.  We even have acceptance of an intent-modeled, implementation-agile, approach.  We still need to refine these notions to ensure that they’ll work at scale, handling the number of events that could come along.  We also need to recognize that events aren’t limited to NFV, and that we have cloud applications evolving that will be more demanding than NFV.
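
To make the state/event notion concrete, here’s a minimal sketch of the kind of table a modeled service element might carry.  The states, events, and process names are hypothetical, not drawn from any specification.

```python
# A minimal state/event-to-process mapping for one modeled service element.
# Keys are (current state, event); values are (process to run, next state).

STATE_EVENT_TABLE = {
    ("deploying", "deploy-complete"): ("activate_service", "active"),
    ("active", "fault"):              ("attempt_remediation", "remediating"),
    ("remediating", "fault-cleared"): ("resume_service", "active"),
    ("remediating", "fault"):         ("escalate_to_parent", "failed"),
}

def handle_event(element_state, event):
    """Look up the process to run and the next state for this element."""
    process, next_state = STATE_EVENT_TABLE[(element_state, event)]
    return process, next_state

print(handle_event("active", "fault"))  # ('attempt_remediation', 'remediating')
```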

My net point here is that NFV is, and always was, a cloud application.  The solutions to NFV problems are ultimately solutions to broader cloud problems.  That’s how we need to be thinking, or we risk having a lot of problems down the line.

Comcast is Signaling a Sea Change in the SD-WAN Space

Comcast has started to push in earnest at business services with SD-WAN, and they’re far from the only player in the space.  In fact, one question that’s now being raised in the space is whether the future of SD-WAN will be tied more to service providers than to CPE products bought directly by enterprises, or by managed service providers.  That question is also extending to the broader area of vCPE, which then ties in with NFV.  Service-provider SD-WAN is also a means of linking SDN services to the user, and even linking enterprise management systems with WAN services.

There are a lot of ways of offering business services, and the one that’s dominated for decades is the “virtual private network” (VPN) at Level 3 or the “virtual LAN” or VLAN at Level 2.  Both these service types have been deployed largely by adding features to native routers and switches (respectively) that allow network segmentation.  These “device-plus” features provide low overhead, but they also impact the native behavior of the protocol layer they work at, and that can create cost, compatibility, and management issues.

SD-WAN is an overlay technology, meaning that it’s created on top of L2/L3 (usually the latter) network services.  The nodes of the service provider’s networks see SD-WAN as traffic, just like all other traffic, and that’s true even where SD-WAN is overlaid on VPN/VLAN services.  Many SD-WAN services extend traditional VPN/VLAN services by spreading a new network layer on top of both VPN/VLAN and Internet services.

Service providers like the telcos have had mixed views of SD-WAN from the first.  Yes, it could offer an opportunity to create business services at any scale, to leverage Internet availability and pricing, and to unify connectivity between large sites and small sites, even portable sites.  The problem is that SD-WAN services can be deployed by MSPs and the users themselves, over telco Internet services, and so cannibalize to at least a degree the traditional “virtual-private” LAN and network/WAN business.  Comcast isn’t an incumbent in VPN/VLAN services so they have no reason to hold back.  In fact, they could in theory offer SD-WANs that span the globe by riding on competitive Internet services.

Once you have a bunch of telcos who face SD-WAN cannibalization from competitors like Comcast, from MSPs, and even from enterprises rolling their own VPNs, you pose the question of whether it’s better, if you’re going to lose VPN/VLAN business, to lose it to your own SD-WAN or to someone else’s.  Obviously losing it to your own is better, at least once it’s clear that the market is aware of the SD-WAN alternative.  That could mean that all the network operators will get into the SD-WAN space for competitive reasons alone.

If network operators decide, as Comcast has, to compete in the SD-WAN space, it makes little sense for them to squabble about in the dirt on pricing alone.  They would want to differentiate, and one good way to do that (again, a way Comcast has used) is by linking their SD-WAN service to underlying network features, which most often will mean QoS control, but also likely includes management capability.  That promotes a cooperative model of SD-WAN to replace the overlay model.  To understand how that works, you’d have to look at the SD-WAN service from the outside.

A service like SD-WAN has the natural capacity to abstract, meaning that it separates service-level behavior from the resource commitments that actually provide connectivity.  An SD-WAN service looks like an IP VPN, without any of the stuff like MPLS that makes VPNs complicated, and regardless of whether IP/MPLS or Internet (or any other) transport is used.  You can provide service-level management features, you can do traffic prioritization and application acceleration, and it’s all part of the same “service”, and it’s the same whatever site you happen to be talking about.  This uniformity is a lot more valuable than you might think at a time when businesses spend on the average about 2.7 times as much on network support as they do on network equipment.

The general trend in SD-WAN has been to add on features like application acceleration and prioritization, and those additions beg a connection to network services that would offer variable QoS.  An SD-WAN service with that traffic-expediting combination is a natural partner to operator features.   The management benefits of SD-WAN can also be tied to management of the underlying WAN services, which is a benefit both in user-managed and managed service provider applications.

SD-WAN prioritization features are also a camel’s nose for NFV’s virtual CPE (vCPE) model.  A unified service vision at the management level means it’s easier to integrate other features without adding undue complexity, and so it encourages buyers to think in modular feature terms, playing into the vCPE marketing proposition.  If operators could promote an SD-WAN model that relied on elastic cloud-hosted features for vCPE rather than a general-purpose premises box as is the rule today, they could end up with a service model that neither MSPs nor direct buyers of SD-WAN could easily replicate.  Since linking their SD-WAN service to network prioritization features is also something that third parties can’t do easily, this can create a truly unique offering.  Differentiation at last!

Of course, everyone jumps on differentiation, so all this adds up to the possibility, or probability, that SD-WAN will be increasingly dominated by network operators who exploit network features under the covers to differentiate themselves.  That’s been clear for some time, and it’s why the players in the crowded SD-WAN startup market are trying so hard to elevate themselves out of the pack.  There will be perhaps four or five that will be bought, and four or five times that number exist already.

There is little or no growth opportunity for business VPNs that require carrier Ethernet access and MPLS.  Big sites of big companies are about it, and in any business total addressable market (TAM) is everything.  Add that truth to the two differentiating paths of SD-WAN for network operators (linkage to network services including SDN and linkage to NFV hosting of features) and you have the story that will dominate the future of SD-WAN.  Which means that every SD-WAN startup had better understand how to tell that story or they’ll have no exit.

In the second half of 2018 we’ll probably start to see the signs of this in the SD-WAN space, with fire-sale M&A followed by outright “lost-funding” exits.  There are way too many players in the space to sustain when the market is going to focus on selling to network operators, and startups have only a limited opportunity to prepare for that kind of SD-WAN business.  There’s only one hope for them to avoid this musical-chairs game, and it’s government.

No, not the government market, though that does present an opportunity.  Regulators, if they were to allow for settlement and paid prioritization on the Internet, would create an SD-WAN underlayment that anyone could exploit.  That would keep SD-WAN an open opportunity and prevent the constriction in opportunity to network operators that will drive consolidation.  The question is whether it could happen fast enough.  Even in the US, where regulatory changes have been in the wind since January, it will almost surely take more than six months to get something new in place.  Elsewhere it could be even longer, and operators like Comcast aren’t waiting.  If the big operators get control of SD-WAN before regulatory changes gel, it will be too late for most of the SD-WAN players.  So, if you are one, you might want to start prepping for an operator-dominated future right now, or you may run out of runway.

Some Further Thoughts on Service Lifecycle Automation

Everyone wants service lifecycle automation, which some describe as a “closed-loop” of event-to-action triggering, versus an open loop where humans have to link conditions to action.  At one level, the desire for lifecycle automation is based on the combined problem of reducing opex and improving service agility.  At another level, it’s based on the exploding complexity of networks and services, complexity that would overwhelm manual processes.  Whatever its basis, it’s hardly new in concept, but it may have to be new in implementation.

Every network management system deployed in the last fifty years has had at least some capability to trigger actions based on events.  Often these actions were in the form of a script, a list of commands that resemble the imperative form of DevOps.  Two problems plagued the early systems from the start, one being the fact that events could be generated in a huge flood that overwhelmed the management system, and the other being that the best response to an event usually required considerable knowledge of network conditions, making the framing of a simple “action” script very difficult.

One mechanism proposed to address the problems of implementing closed-loop systems is that of adaptive behavior.  IP networks were designed to dynamically learn about topology, for example, and so to route around problems without specific operations center action.  Adaptive behavior works well for major issues like broken boxes or cable-seeking backhoes, but not so well for subtle issues of traffic engineering for QoS or efficient use of resources.  Much of the SDN movement has been grounded in the desire to gain explicit control of routes and traffic.

Adaptive behavior is logically a subset of autonomous or self-organizing networks.  Network architecture evolution, including the SDN and NFV initiatives, has given rise to two other approaches.  One is policy-based networking, where policies defined centrally and then distributed to various points in the network enforce the goals of the network owner.  The other is intent-modeled service structures, which divide a service into a series of domains, each represented by a model that defines the features it presents to the outside and the SLA it’s prepared to offer.  There are similarities and differences in these approaches, and the jury is still out on what might be best overall.

Policy-based networks presume that there are places in the network where a policy on routing can be applied, and that by coordinating the policies enforced at those places it’s possible to enforce a network-wide policy set.  Changes in policy have to be propagated downward to the enforcement points as needed, and each enforcement point is largely focused on its own local conditions and its own local set of possible actions.  It’s up to higher-level enforcement points to see a bigger picture.

Policy enforcement is at the bottom of policy distribution, and one of the major questions the approach has to address is how you balance the need for “deep manipulation” of infrastructure to bring about change, with the fact that the deeper you go the narrower your scope has to be.  Everybody balances these factors differently, and so there is really no standard approach to policy-managed infrastructure; it depends on the equipment and the vendor, not to mention the mission/service.

Intent-modeled services say that both infrastructure and services created over it can be divided into domains that represent a set of cooperative elements doing something (the “intent”).  These elements, because they represent their capabilities and the SLA they can offer, have the potential to self-manage according to the model behavior.  “Am I working?”  “Yes, if I’m meeting my SLA!”  “If I’m not, take unspecified internal action to meet it.”  I say “unspecified” here because in this kind of system, the remediation procedures, like the implementation, are hidden inside a black box.  If the problem isn’t fixed internally, a fault occurs that breaks the SLA and creates a problem in the higher-level model that incorporates the first model.  There the local remediation continues.
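
Here’s a minimal sketch, in Python, of that behavior.  The class and method names are my own illustration, not anything from a standard, and the SLA check and local remediation are just placeholders for whatever the black box actually does.

```python
class IntentModel:
    """One layer of an intent-modeled service; parent is the enclosing model."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def meeting_sla(self):
        # Placeholder: compare measured behavior against the offered SLA.
        return True

    def remediate_locally(self):
        # Placeholder: the "unspecified internal action" hidden in the black box.
        return False

    def check(self):
        if self.meeting_sla():
            return f"{self.name}: working"
        if self.remediate_locally() and self.meeting_sla():
            return f"{self.name}: remediated locally"
        # Local remediation failed: the SLA break surfaces as a fault in the
        # higher-level model that incorporates this one.
        if self.parent:
            return self.parent.check()
        return f"{self.name}: SLA violated, no higher model to escalate to"
```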

You can see that there’s a loose structural correspondence between these two approaches.  Both require a kind of hierarchy—policies in one case and intent models in another.  Both presume that “local” problem resolution is tried first, and if it fails the problem is kicked to a successively higher level (of policy, or of intent model).  In both cases, therefore, the success of the approach will likely depend on how effectively this hierarchy of remediation is implemented.  You want any given policy or model domain to encompass the full range of things that could be locally manipulated to fix something, or you end up kicking too many problems upstairs.  But if you have a local domain that’s too big, it has too much to handle and ends up looking like one of those old-fashioned monolithic management systems.

I’m personally not fond of a total-policy-based approach.  Policy management may be very difficult to manipulate on a per-application, per-user, per-service basis.  Most solutions simply don’t have the granularity, and those that do present very complex policy authoring processes to treat complicated service mixes.  There is also, according to operators, a problem when you try to apply policy control to heterogeneous infrastructure, and in particular to hosted elements of the sort NFV mandates.  Finally, most policy systems don’t have explicit events and triggers from level to level, which makes it harder to coordinate the passing of a locally recognized problem to a higher-level structure.

With intent-based systems, it’s all in the implementation, both at the level of the modeling language/approach and the way that it’s applied to a specific service/infrastructure combination.  There’s an art to getting things right, and if it’s not applied then you end up with something that won’t work.  It’s also critical that an intent system define a kind of “class” structure for the modeling, so that five different implementations of a function appear as differences inside a given intent model, not as five different models.  There’s no formalism to ensure this happens today.

You can combine the two approaches, and in fact an intent-model system could envelop a policy system, or a policy system could drive an intent-modeled system.  This combination seems more likely to succeed where infrastructure is made up of a number of different technologies, vendors, and administrative domains.  Combining the approaches is often facilitated by the fact that inside an intent model there almost has to be an implicit or explicit policy.

We’re still some distance from having a totally accepted strategy here.  Variability in application and implementation of either approach will dilute effectiveness, forcing operators to change higher-level management definitions and practices because the lower-level stuff doesn’t work the same way across all vendors and technology choices.  I mentioned in an earlier blog that the first thing that should have been done in NFV in defining VNFs was to create a software-development-like class-and-inheritance structure; “VNF” as a superclass is subsetted into “Subnetwork-VNF” and “Chain-VNF”, and the latter perhaps into “Firewall”, “Accelerator”, and so forth.  This would maximize the chances of logical and consistent structuring of intent models, and thus of interoperability.
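
For what it’s worth, here’s a rough sketch of what that class-and-inheritance structure might look like in software terms.  The method names are purely illustrative assumptions; the point is that every implementation of a given type presents the same interfaces.

```python
class VNF:
    """Superclass: attributes every VNF exposes to the orchestrator."""
    def deploy(self, host): ...
    def get_status(self): ...
    def set_parameters(self, params): ...

class SubnetworkVNF(VNF):
    """VNFs that present a subnetwork (a virtual IMS or EPC element, say)."""
    def connect_subnet(self, subnet): ...

class ChainVNF(VNF):
    """VNFs designed to sit in a service chain, passing traffic through."""
    def set_ingress(self, port): ...
    def set_egress(self, port): ...

class Firewall(ChainVNF):
    def set_rules(self, rules): ...

class Accelerator(ChainVNF):
    def set_acceleration_profile(self, profile): ...

# Any vendor's firewall would then be onboarded and managed through the
# same ChainVNF/VNF interfaces, rather than as its own one-off integration.
```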

The biggest question for the moment is whether all the orderly stuff that needs to be done will come out of something like NFV or SDN, where intent modeling is almost explicit but where applications are limited, or from broader service lifecycle automation, where there’s a lot of applications to work with but no explicit initiatives.  If we’re going to get service lifecycle automation, it will have to come from somewhere.

What’s the Real Relationship Between 5G and Edge Computing?

According to AT&T, 5G will promote low-latency edge computing.  Is this another of the 5G exaggerations we’ve seen for the last couple of years?  Perhaps there is a relationship that’s not direct and obvious.  We’ll see.  This is a two-part issue, with the first part being whether low latency really matters that much, and the second being whether edge computing and 5G could reduce it.

Latency in computing is the length of the closed-feedback control loop that characterizes almost every application.  In transaction processing, we call it “response time”, and IBM for decades promoted the notion that “sub-second” response time was critical for worker productivity improvement.  For things like IoT, where we may have a link from sensor to controller in an M2M application, low latency could mean a heck of a lot, but perhaps not quite as much as we’d think.  I’ll stick with the self-drive application for clarity here.

It’s easy to seem to justify low latency with stuff like self-driving cars.  Everyone can visualize the issue where the light changes to red and the car keeps going for another 50 feet or so before it stops, which is hardly the way to make intersections safe.  However, anyone who builds a self-drive car that depends on the response of an external system to an immediate event is crazy.  IoT and events have a hierarchy in processing, and the purpose of that hierarchy is to deal with latency issues.

The rational way to handle self-drive events is to classify them according to the needed response.  Something appearing in front of the vehicle (a high closing speed) or a traffic light changing are examples of short-control-loop applications.  These should be handled entirely on-vehicle, so edge computing and 5G play no part at all.  In fact, we could address these events with no network connection or cloud resources at all, which is good or we’d kill a lot of drivers and pedestrians with every cloud outage.

The longer-loop events arise more from collective behaviors, such as the rate at which vehicles move again when a light changes.  This influences the traffic following the light and whether it would be safe to pull out or not.  It’s not unreasonable to suggest that a high-level “traffic vector” could be constructed from a set of sensors and then communicated to vehicles along a route.  You wouldn’t make a decision to turn at a stop sign based on that alone, but what it might do is set what I’ll call “sensitivity”.  If traffic vector data shows there’s a lot of stuff moving, then the sensitivity of motion-sensing associated with entering the road would be correspondingly high.  For this, you need to get the sensor data in, digested, and distributed within a couple seconds.

This is where edge computing comes in.  We have sensors that provide the traffic data, and we have two options.  The first is to let every vehicle tickle the sensors for status and interpret the result.  Even leaving the interpretation step aside, the direct-access part is totally impractical: a sensor that a vehicle could access directly would be swamped by requests unless it had the processing power of a high-end server, and somebody would attack it via DDoS so nobody would get a response at all.  The better approach is to have an edge process collect sensor data in real time and develop those traffic vectors for distribution.  This reduces sensor load (one controller accesses the sensor) and improves security.  If we host the control process near the edge, the control loop length is reasonable.  Thus, edge computing.
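
A quick, purely illustrative calculation shows the shape of the load problem.  The numbers here are assumptions I’ve picked for the example, not survey data.

```python
# Rough illustration of why per-vehicle sensor polling doesn't scale.
# All numbers are assumptions chosen only to show the shape of the problem.

vehicles_in_range = 500          # vehicles that might query a roadside sensor
polls_per_vehicle_per_sec = 2    # each vehicle polling for fresh status
direct_load = vehicles_in_range * polls_per_vehicle_per_sec

controller_polls_per_sec = 1     # one edge controller polling the sensor

print(f"Direct-access load on the sensor: {direct_load} requests/sec")
print(f"Edge-controller load on the sensor: {controller_polls_per_sec} request/sec")
# The edge process absorbs the fan-out (and the attack surface), then
# distributes digested traffic vectors to the vehicles along the route.
```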

The connection between this and 5G is IMHO a lot more problematic.  Classical wisdom (you all know how wise I think that is!) says that you need 5G for IoT.  How likely that is to be true depends on just where you think the sensors will be relative to other technology elements, like stoplights.  If you can wire a sensor to a subnet that the control process can access, you reduce cost and improve security.  If you can’t, there are other approaches that could offer lower wireless cost.  I think operators and vendors have fallen in love with the notion that IoT is a divine mandate, and that if you link it with 5G cellular service you get a windfall in monthly charges and buy a boatload of new gear.  Well, you can decide that one for yourself.

However, 5G might play a role, less for its mobile connection than for the last-mile FTTN application so many operators are interested in.  If you presume that the country is populated with fiber nodes and 5G cells to extend access to homes and offices, then linking in sensors is a reasonable add-on mission.  In short, it’s reasonable to assume that IoT and short-loop applications could exploit 5G (particularly in FTTN applications) but not likely reasonable to expect them to drive 5G.

In my view, this raises a very important question about 5G, which is the relationship between the FTTN/5G combo for home and business services, and other applications, including mobile.  The nodes here are under operator control, and are in effect small cells serving a neighborhood.  They could also support local-government applications like traffic telemetry, and could even be made available for things like meter reading.  These related missions pose a risk for operators because the natural response of a telco exec would be to try to push these applications into higher-cost 5G mobile services.

The possibility that these neighborhood 5G nodes could serve as small-cell sites for mobile services could also be a revolution.  First, imagine that 5G from the node could support devices in the neighborhood in the same way as home WiFi does.  No fees, high data rate, coverage anywhere in the neighborhood without the security risks of letting friends into your WiFi network.  Second, imagine that these cells could be used, for a fee, to support others in the neighborhood too.  It has to be cheaper to support small-cell this way than to run fiber to new antenna locations.

There’s a lot of stuff that could be done to help both the IoT and small-cell initiatives along.  For IoT what we need more than anything is a model of an IoT environment.  For example, we could start with the notion of a sensorfield, which is one or more sensors with common control.  We could then define a controlprocess that controls a sensorfield and is responsible for distributing sensor data (real-time or near-term-historical) to a series of functionprocesses that do things like create our traffic vectors.  These could then feed a publishprocess that provided publish-and-subscribe capabilities, manual or automatic, to things like our self-drive vehicles.
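
Here’s a minimal sketch of how those pieces might relate in software.  The class names follow the model above, but everything else is my own illustrative assumption.

```python
class SensorField:
    """One or more sensors under common control."""
    def __init__(self, sensors):
        self.sensors = sensors

class PublishProcess:
    """Publish-and-subscribe distribution to consumers (e.g., vehicles)."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, digest):
        for callback in self.subscribers:
            callback(digest)

class TrafficVectorFunction:
    """A function process: digests raw readings into a traffic vector."""
    def __init__(self, publisher):
        self.publisher = publisher

    def ingest(self, readings):
        digest = {"vehicles_per_minute": sum(r["count"] for r in readings)}
        self.publisher.publish(digest)

class ControlProcess:
    """Controls a sensor field and feeds the function processes."""
    def __init__(self, field, functions):
        self.field = field
        self.functions = functions

    def cycle(self):
        readings = [sensor.read() for sensor in self.field.sensors]
        for fn in self.functions:
            fn.ingest(readings)
```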

I think too much attention is being paid to IoT sensor linkage, a problem which has been solved for literally billions of sensors already.  Yes, there are things that could make sensor attachment better, such as the FTTN/5G marriage I noted above.  The problem isn’t there, though, it’s with the fact that we have no practical notion of what to do with the data.  Edge computing will be driven not by the potential it has, but by real, monetized, applications that justify deployment.

Can the Drivers of Carrier Cloud Converge on a Common Cloud Vision?

One of the issues that should be driving the fall operator planning cycle, carrier cloud, isn’t making a really strong appearance so far.  Those of you who’ve read my blog on what is likely to be a big planning focus no doubt saw that carrier cloud wasn’t on the list.  Many would find this a surprise, considering that by 2030 we’ll likely add over one hundred thousand data centers to support it, most at the edge.  I was surprised too, enough to ask for a bit more information.  Here’s what’s going on.

The key point holding back carrier cloud is the lack of a clear, achievable, driving application.  Operators have become very antsy about the field-of-dreams approach to new services.  Before they build out infrastructure like carrier cloud, they want to understand exactly what they can expect to drive ROI, and get at least a good idea of what they’ll earn on their investment.  There are six drivers of carrier cloud, as I’ve noted before, and while operators generally understand what they are, they’re not yet able to size up the opportunity for any of them.

Two of the six drivers for carrier cloud were on the hot-button list for the fall.  One was NFV and the other was 5G, but these account for only 5% and 16% of carrier cloud incentive, respectively, through 2020.  The majority, more than three-quarters, falls to applications not on the advance planning radar.  The biggest driver available in that timeframe is the virtualization of video and advertising features for ad and video personalization and delivery.  It accounts for about half the total opportunity for the near term.  Why is this driver not being considered, even prioritized, in the fall cycle?  Several reasons, and none easy to justify or fix.

First, operators have been slow to get into advertising.  Most who have moved have done so by purchasing somebody who had an ad platform (Verizon’s purchases of AOL and Yahoo, for example).  As a result, there’s been less focus on just how the ad space should be handled in infrastructure (meaning carrier cloud) terms.  Operators who have tried to move their own approach here (mostly outside the US) have found it very difficult to get the right people onboard and to drive their needs through a connection-services-biased management team.

The second factor is that operators have tended to see video delivery in terms of using caching to conserve network capacity.  These systems have been driven from the network side rather than from the opportunity side, and they’ve ignored issues like socialization and personalization.  Operators see the latter as being linked more to video portals, which (so far) they’ve either pushed into the web-and-advertising space just noted here, covered in their IPTV OTT service plans, or have not been particularly hot about at all.

What these points add up to is a dispersal of responsibility for key aspects of our demand driver, and a lack of a cohesive set of requirements and opportunities linked to infrastructure behavior.  In short, people aren’t working together on the issues and can’t align what’s needed with a specific plan to provide it.

This explains, at least at a high level, why the main carrier cloud driver isn’t in the picture for the fall cycle.  What about the two that are?  Could they take up the slack?

The SDN-and-NFV-drives-change approach, as I’ve already noted in the referenced blog, hasn’t really delivered much in the way of commitment from senior management.  The biggest problem is that neither technology has been linked to an opportunity with a credible scope of services and infrastructure.  SDN today is primarily either a data center evolution toward white-box devices, or a policy management enhancement to legacy switches and (mostly) routers.  NFV today is primarily virtual CPE hosted not in the cloud but on a small edge box at the service demarcation point.  It’s hard to see how these could change the world.

What SDN and NFV prove is the difficulty in balancing risk and reward for technology shifts.  For these technologies to bring about massive change in the bottom line, they have to create massive impact, which means touching a lot of services and infrastructure.  That creates massive risk, which leads operators to dabble their toes in the space with very limited trials and tests.  Those, because they are very limited, don’t prove out the opportunity or technology at scale.  By the time we get a convincing model of SDN and NFV that has the scope to do something, carrier cloud deployment will have been carried forward by something else, and SDN/NFV will just ride along.

5G is even more complicated.  Here we have the classic “There’s a lot you can do with a widget” problem; that may be true but it doesn’t address the question of how likely it is you’d want to do one of those things, or how profitable it would be to do it.  In many ways, 5G is just a separate infrastructure justification problem on top of carrier cloud.  We have to figure out just how it’s going to be useful, then deploy it, and only then see what parts of its utility bear on a justification for carrier cloud.

Nobody doubts that 5G spectrum and the RAN are useful, but it’s far from clear that either would have any impact on carrier cloud.  In fact, it’s far from clear what positive benefits would come from the additional 5G elements, including network slicing.  Remember, there’s a difference between “utility” (I can find something to do with it) and “justification” (I gain enough to proactively adopt it).

An Ixia survey cited by SDxCentral says that carrier cloud plans are rooted in NFV.  Actually the data cited doesn’t seem to show that to me.  The cited drivers for 5G are quite vague, as disconnected from explicit benefits as any I’ve heard.  “Flexible and scalable network” is the top one.  What does that mean and how much return does it generate?  My point is that the survey doesn’t show clear drivers, only abstract stuff that’s hard to quantify.

That’s consistent with what operators have told me.  In fact, 5G planning focus is more on trying to nail down actual, quantifiable, actionable benefits than on driving anything in particular forward.  Don’t get me wrong; everything in 5G has potential value, circumstances under which it would make a critical contribution to operators.  How much of that potential is real is very difficult to say, which is what’s giving operators so much grief.

What exactly is all that rooting of 5G in NFV?  The article quotes an Ixia executive as follows: “You are going to deploy the 5G core in an NFV-type model. There’s no doubt it will all be virtualized.”  Well, gosh, everything virtualized isn’t necessarily NFV.  So, is this an attempt by a vendor (who has an NFV strategy) to get into the 5G story?  You decide.

It all circles back to the notion of that field of dreams.  In order for the drivers of carrier cloud to operate effectively, they all have to drive things to the same place.  We need to have a common cloud model for the network of the future, because we’re not going to build one for every possible demand source.  The difficulties for the moment lie less in making the drivers real in an ROI sense, but in making the drivers credible enough to motivate planners to look for a single cloud solution that can address them all.  That’s what’s missing today, and sadly I don’t see the fall planning cycle providing it.

So the NFV ISG Wants to Look at Being Cloud-Like: How?

The ETSI NFV ISG is having a meeting, one of whose goals is to explore a more cloud-like model of NFV.  Obviously, I’d like to see that.  The question is what such a model would look like, and whether it (in some form) could be achieved from where we are now, without starting another four-year effort.  There are certainly some steps that could be taken.

A “cloud model of NFV” has two functional components.  First, the part of NFV that represents a deployed service would have to be made very “cloud-friendly”.  Second, the NFV software itself would have to be optimized to exploit the scalability, resiliency, and agility of the cloud.  We’ll take these in order.

The first step would actually benefit the cloud as well as NFV.  We need a cloud abstraction on which we can deploy, that represents everything that can host functions and applications.  The model today is about hosts or groups of hosts, and there are different mechanisms to deploy containers versus VMs and different processes within each.  All of this complicates the lifecycle management process.

The biggest NFV challenge here is dealing with virtual CPE (vCPE).  Stuff that’s hosted on the customer prem, in a cloud world, should look like a seamless extension of “the cloud”, and the same is true for public cloud services.  This is a federation problem, a problem of agreeing on a broad cloud abstraction and then agreeing to provide the mechanisms to implement it using whatever mixture of technology happens to be available.  The little boxes for vCPE, the edge servers Amazon uses in its Greengrass Lambda extension, and big enterprise data centers are all just the edge of “the cloud” and we need to treat them like that.

If we had a single abstraction to represent “the cloud” then we would radically simplify the higher-level management of services.  Lifecycle management would divide by “in-cloud” and “not-in-cloud” with the latter being the piece handled by legacy devices.  The highest-level service manager would simply hand off a blueprint for the cloud piece to the cloud abstraction and the various domains within that abstraction would be handed their pieces.  This not only simplifies management, it distributes work to improve performance.

Our next point is that “cloudy VNFs”, to coin an awkward term, should be for all intents and purposes cloud application components, no different from a piece of a payroll or CRM system.  If one breaks you can redeploy it somewhere, and if it runs out of capacity you can replicate and load-balance it.  Is this possible?  Yes, but only potentially, because the VNF attributes that would make those behaviors available aren’t necessarily there.

If I have a copy of an accounting system that runs out of capacity, can I just spin up another one?  The problem is that I have a database to update here, and that update process can’t be duplicated across multiple instances unless I have some mechanism for eliminating collisions that could result in erroneous data.  Systems like that are “stateful” meaning that they store stuff that will impact the way that subsequent steps/messages are interpreted.  A “stateless” system doesn’t have that, and so any copy can be made to process a unit of work.

A pure data-plane process, meaning get-a-packet-send-a-packet, is only potentially stateless.  Do you have the chance of queuing for congestion, or do you have flow control, or do you have ancillary control-plane processes invoked to manage the flow between you and partner elements?  If so then there is stateful behavior going on.  Some of these points have to be faced in any event; queuing creates a problem with lost data or out-of-order arrivals, but that also happens just by creating multiple paths or by replacing a device.  The point is that a VNF would have to be examined to determine if its properties were consistent with scaling, and new VNFs should be designed to offer optimum scalability and resiliency.
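
A tiny sketch makes the distinction clearer.  The first function below is genuinely stateless and could be replicated freely; the second adds a congestion queue, and that queue is state.  Both are illustrative, not real forwarding code.

```python
from collections import deque

def stateless_forward(packet, route_table):
    """Any instance can process any packet: the output depends only on the inputs."""
    return route_table[packet["dest"]], packet

class QueueingForwarder:
    """Adding a congestion queue makes the element stateful: a scaled-out copy
    would not share this queue, inviting loss or out-of-order arrivals."""
    def __init__(self, route_table, max_depth=1000):
        self.route_table = route_table
        self.queue = deque(maxlen=max_depth)

    def ingest(self, packet):
        self.queue.append(packet)

    def emit(self):
        if self.queue:
            packet = self.queue.popleft()
            return self.route_table[packet["dest"]], packet
        return None
```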

We see this trend in the cloud with functional programming, lambdas, and microservices.  It’s possible to create stateless elements by keeping state and context in a back-end store, but the software that’s normally packaged in a single device never faced the scalability/resiliency issue and so probably doesn’t do what’s necessary for statelessness.

Control-plane stuff is much worse.  If you report your state to a management process, it’s probably because it requested it.  Suppose you request state from Device Instance One, and Instance Two is spun up, and it gets the request and responds.  You may have been checking on the status of a loaded device to find out that it reports being unloaded.  In any event, you now have multiple devices, so how do you obtain meaningful status from the system of devices rather than from one of them, or each of them (when you may not know about the multiplicity)?
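
One way to handle this, sketched below with hypothetical status fields, is to answer management queries from a composite of all the instances rather than from whichever instance happens to catch the request.

```python
def aggregate_status(instances):
    """Combine per-instance status into one view of the scaled-out function."""
    statuses = [instance.get_status() for instance in instances]
    return {
        "instances": len(statuses),
        "total_load": sum(s["load"] for s in statuses),
        "worst_health": min(s["health"] for s in statuses),
    }
```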

All this pales into insignificance when you look at the second piece of cloud-centric NFV, which is the NFV software itself.  Recall that the ETSI E2E model describes a transactional-looking framework that controls what looks like a domain of servers.  Is this model a data-center-specific model, meaning that there’s a reasonably small collection of devices, or does this model cover an entire operator infrastructure?  If it’s the former, then services will require some form of federation of the domains to cover the full geography.  If it’s the latter, then the single-instance model the E2E diagram describes could never work because it could never scale.

It’s pretty obvious that fixing the second problem would be more work than fixing the first, and perhaps would involve that first step anyway.  In the cloud, we’d handle deployment across multiple resource pools by a set of higher-layer processes, usually DevOps-based, that would activate individual instances of container systems like Docker (hosts or clusters) or VM systems like OpenStack.  Making the E2E model cloud-ready would mean creating fairly contained domains, each with their own MANO/VNFM/VIM software set, and then assigning a service to domains by decomposing and dispatching to the right place.

The notion of having “domains” would be a big help, I think.  That means that having a single abstraction for “the cloud” should be followed by having one for “the network”, and both these abstractions would then decompose into domains based on geography, management span of control, and administrative ownership.  Within each abstraction you’d have some logic that looks perhaps like NFV MANO—we need to decompose a service into “connections” and “hosting”.  You’d also have domain-specific stuff, like OpenStack or an NMS.  A high-level manager would orchestrate a service into high-level requests against those abstractions, and those requests would invoke a second-level manager that would divide things by domain.
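
To illustrate the structure, here’s a rough sketch of that two-level decomposition.  The class names, and the split into “hosting” and “connections” pieces, follow the description above; everything else is an assumption.

```python
class Domain:
    """A geography/ownership domain with its own local orchestration
    (OpenStack, an NMS, a contained MANO instance, and so on)."""
    def __init__(self, name, local_orchestrator):
        self.name = name
        self.orchestrator = local_orchestrator

    def deploy(self, piece):
        return self.orchestrator.deploy(piece)

class Abstraction:
    """Represents 'the cloud' or 'the network' as a single deployment target."""
    def __init__(self, domains):
        self.domains = domains   # dict: domain name -> Domain

    def deploy(self, blueprint):
        # The blueprint arrives already divided by domain; dispatch each piece.
        return {name: self.domains[name].deploy(piece)
                for name, piece in blueprint.items()}

class ServiceManager:
    """Top level: splits a service model into abstract hosting and connection requests."""
    def __init__(self, cloud, network):
        self.cloud, self.network = cloud, network

    def deploy(self, service_model):
        self.cloud.deploy(service_model["hosting"])
        self.network.deploy(service_model["connections"])
```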

We don’t have that now, of course.  Logically, you could say that if we had a higher-layer system that could model and decompose, and if we created those limited NFV domains, we could get to the good place without major surgery on NFV.  There are some products out there that provide what’s needed to do the modeling and decomposing, but they don’t seem to be mandatory parts of NFV.

I’d love to be able to go to meetings like this, frankly, but the problem is that as an independent consultant I have to do work that pays the bills, and all standards processes involve a huge commitment in time.  To take a proposal like this to a meeting, I’d have to turn it into a contribution, defend it in a series of calls, run through revision cycles, and then face the probability that the majority of the body isn’t ready to make radical changes anyway.  So, instead I offer my thoughts in a form I can support, which is this blog.  In the end, the ISG has the ability to absorb as much of it as they like, and discard what they don’t.  That’s the same place formal contributions would end up anyway.

Who Will Orchestrate the Orchestrators (and How)?

What exactly is “service automation” and who does it?  Those are the two questions that are top of the list for network operators and cloud providers today, and they’re ranking increasingly high on the list of enterprises as well.  As the complexity of networks increases, as technology changes introduce hosted elements in addition to discrete devices, and as cloud computing proliferates, everyone is finding that the cost of manual service operations is rising too fast, and the error rate even faster.  Something obviously needs to be done, but it’s not entirely clear what that something is.

Part of the problem is that we are approaching the future from a number of discrete “pasts”.  Application deployment and lifecycle management have been rolled into “DevOps”, and the DevOps model has been adopted in the cloud by users.  Network service automation has tended to be supported through network management tools for enterprises and service providers alike, but the latter have also integrated at least some of the work with OSS/BSS systems.  Now we have SDN and NFV, which have introduced the notion of “orchestration” of both application/feature and network/connection functions into one process.

Another part of the problem is that the notion of “service” isn’t fully defined.  Network operators tend to see services as being retail offerings that are then decomposed into features (the TMF’s “Customer-Facing Services”, or CFSs).  Cloud providers sometimes see the “service” as the ability to provide platforms to execute customer applications, which separates application lifecycle issues from service lifecycle issues.  The trend in cloud services is adding “serverless” computing, which raises the level of features that the operator provides and makes their “service” look more application-like.  Enterprises see services as being something they buy from an operator, and in some cases what they have to provide to cloud/container elements.  Chances are, there will be more definitions emerging over time.

The third piece of the problem is jurisdictional.  We have a bunch of different standards and specifications bodies out there, and they cut across the whole of services and infrastructure rather than embracing it all.  As a result, the more complex the notion of services becomes, the more likely it is that nobody is really handling it at the standards level.  Vendors, owing perhaps to the hype magnetism of standards groups, have tended to follow the standards bodies into disorder.  There are some vendors who have a higher-level vision, but most of the articulation at the higher level comes from startups because the bigger players tend to focus on product-based marketing and sales.

If we had all of the requirements for the service automation of the future before us, and a greenfield opportunity to implement them, we’d surely come up with an integrated model.  We don’t have either of these conditions, and so what’s been emerging is a kind of ad hoc layered approach.  That has advantages and limitations, and balancing the two is already difficult.

The layered model says, in essence, that we already have low-level management processes that do things like configure devices or even networks of devices, deploy stuff, and provide basic fault, configuration, accounting, performance, and security (FCAPS) management.  What needs to be done is to organize these into a mission context.  This reduces the amount of duplication of effort by allowing current management systems to be exploited by the higher layer.

We see something of this in the NFV approach, where we have a management and orchestration (MANO) function that interacts with a virtual infrastructure manager (VIM), made up presumably of a set of APIs that then manage the actual resources involved.  But even in the NFV VIM approach we run into issues with the layered model.

Some, perhaps most, in the NFV community see the VIM as being OpenStack.  That certainly facilitates the testing and deployment of virtual network functions (VNFs) as long as you consider the goal to be one of simply framing the hosting and subnetwork connections associated with a VNF.  What OpenStack doesn’t do (or doesn’t do well) is left to the imagination.  Others, including me, think that there has to be a VIM to represent each of the management domains, those lower-layer APIs that control the real stuff.  These VIMs (or more properly IMs, because not everything they manage is virtual) would then be organized into services using some sort of service model.  The first of these views makes the MANO process very simple, and the second makes it more complicated because you have to model a set of low-level processes to build a service.  However, the second view is much more flexible.
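
To make the second view concrete, here’s a minimal sketch of the “one infrastructure manager per management domain, composed by a service model” idea.  The class names, domains, and service elements are my own invention, not anything the ETSI specs define; the point is only the structure.

```python
# A minimal sketch, with invented class and domain names, of the
# "one infrastructure manager per management domain" view.  Nothing here
# comes from the ETSI specs; it only illustrates the structure.

from abc import ABC, abstractmethod


class InfrastructureManager(ABC):
    """Wraps one management domain behind a uniform deploy() call."""

    @abstractmethod
    def deploy(self, element: dict) -> str:
        ...


class OpenStackIM(InfrastructureManager):
    def deploy(self, element: dict) -> str:
        # A real IM would call the hosting APIs (e.g. Nova/Neutron) here.
        return f"hosted {element['name']} via the OpenStack domain"


class LegacyNetworkIM(InfrastructureManager):
    def deploy(self, element: dict) -> str:
        # A real IM would drive an existing NMS/EMS here.
        return f"provisioned {element['name']} via the legacy network domain"


# The service model names a domain for each element; MANO just walks it.
SERVICE_MODEL = [
    {"name": "vFirewall", "domain": "hosting"},
    {"name": "access-circuit", "domain": "legacy-net"},
]

DOMAINS = {"hosting": OpenStackIM(), "legacy-net": LegacyNetworkIM()}


def orchestrate(model):
    return [DOMAINS[element["domain"]].deploy(element) for element in model]


if __name__ == "__main__":
    for line in orchestrate(SERVICE_MODEL):
        print(line)
```

The extra modeling work is what makes this view more complicated, but it’s also what lets legacy and virtual domains sit side by side in one service.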

There are also layers in the cloud itself.  OpenStack does what’s effectively per-component deployment, and there are many alternatives to OpenStack, as well as products designed to overcome some of its basic issues.  To deploy complex things, you would likely use a DevOps tool (Chef, Puppet, Ansible, Kubernetes, etc.).  Kubernetes is the favored DevOps tool for container systems like Docker, which by the way does its own subnetwork building and management and also supports “clusters” of components natively.  Some users layer Kubernetes for containers with other DevOps tools, and to make matters even more complex, we have cloud orchestration standards like TOSCA, which is spawning its own set of tools.

What’s emerging here is a host of “automation” approaches, many overlapping, and those that don’t overlap covering a specific niche problem, technology, or opportunity.  This is perhaps both a good thing and a bad thing.

The good news is that if we visualize deployment and lifecycle management as distributed, partitioned processes, we allow for a certain amount of parallelism.  Different domains could be doing their thing at the same time, as long as there’s coordination to ensure that everything comes together.  We’d also be able to reuse technology that’s already developed and in many cases fully proven out.

The bad thing is that coordination requirement I just mentioned.  Ships passing in the night is not a helpful vision of the components of a service lifecycle automation process.  ETSI MANO, SDN controllers, and most DevOps tools are “domain” solutions that still have to be fit into a higher-level context, and that’s something we don’t really have at the moment.  We need a kind of “orchestrator of orchestrators” approach, and that is in fact one of the options.  Think of an uber-process that lives at the service level and dispatches work to all of the domains, then coordinates their work.  That’s probably how the cloud would do it.
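
Here’s a rough sketch of what that uber-process might look like, reduced to its essentials.  The domain functions and work items are placeholders I’ve made up; the point is the dispatch-then-coordinate pattern, not any particular API.

```python
# A hedged sketch of an "orchestrator of orchestrators": a service-level
# process dispatches work to domain orchestrators in parallel and then
# waits for all of them to finish.  Domain names and work items are
# invented for illustration.

from concurrent.futures import ThreadPoolExecutor
import time


def nfv_mano(work):
    time.sleep(0.1)  # stand-in for VNF deployment
    return f"NFV MANO deployed {work}"


def sdn_controller(work):
    time.sleep(0.1)  # stand-in for flow/connection setup
    return f"SDN controller connected {work}"


def devops_domain(work):
    time.sleep(0.1)  # stand-in for application deployment
    return f"DevOps tooling deployed {work}"


DOMAINS = {"nfv": nfv_mano, "sdn": sdn_controller, "devops": devops_domain}


def orchestrate_service(plan):
    """plan: list of (domain, work) pairs; domains run concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(DOMAINS[domain], work) for domain, work in plan]
        return [f.result() for f in futures]  # the coordination point


if __name__ == "__main__":
    results = orchestrate_service(
        [("nfv", "vFirewall"), ("sdn", "service-VPN"), ("devops", "portal-app")]
    )
    print("\n".join(results))
```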

The cloud, in fact, is contributing a lot of domain-specific solutions that should be used where available, and we should also be thinking about whether the foundation of the OofO I just mentioned should be built in the cloud and not outside it, in NFV or even OSS/BSS.  That’s a topic for my next blog.

Can We Make ETSI NFV Valuable Even If It’s Not Optimal?

Network Functions Virtualization (NFV) has been a focus for operators for five years now.  Anyone who’s been following my blog knows I have disagreed with the approach the NFV ISG has taken, but that’s the approach we now have.  The current model will never, in my view, be optimal, as I’ve said many times in past blogs and media interviews.  The question now is whether it can be useful in any way.  The answer is “Yes”, providing that the industry, and the ISG, take some steps quickly.  The goal of these steps is to address what could be serious issues without mandating a complete redesign of the software, which is now largely based on a literal interpretation of the ETSI ISG’s End-to-End model.

The current focus of NFV trials and deployments is virtual CPE (vCPE), the use of NFV to substitute for traditional network-edge appliances.  This focus has, IMHO, dominated the ISG to the point where they’ve framed the architecture around it.  However, actual deployments suggest that real-world vCPE differs from the conceptual model of the specs, and because of the central role of vCPE in early NFV activity, it’s important that these differences be addressed.

What was conceptualized for vCPE was a series of cloud-hosted features, each in its own virtual machine, and each linked to the others in a “service chain”.  What we actually see today for most vCPE is a general-purpose edge device that is capable of receiving feature updates remotely.  This general-purpose edge device is more agile than a set of fixed, purpose-built appliances.  Furthermore, the facilities for remote feature loading make a general-purpose edge device less likely to require field replacement when the user upgrades functionality.  If this is what vCPE is actually becoming, then we need to optimize our concept for it without major changes to the ETSI model or implementation.

Let’s start with actual hosting of vCPE features in the cloud, which was the original ETSI model.  The service-chain notion of features is completely impractical.  Every feature adds a hosting point and chain connection, which means every feature adds cost and complexity to the picture.  My suggestion here is that where cloud-hosting of features is contemplated, abandon service chaining in favor of deploying/redeploying a composite image of all the features used.  If a user has a firewall feature and adds an application acceleration feature, redeploy a software image that contains both to substitute for the image that supports only one feature.  Use the same VMs, the same connections.

Some may argue that this is disruptive at the service level.  So is adding something to a service chain.  You can’t change the data plane without creating issues.  The point is that the new-image model versus new-link model has much less operations intervention (you replace an image) and it doesn’t add additional hosting points and costs.  If the cost of multi-feature vCPE increases with each feature, then the price the user pays has to cover that cost, and that makes feature enhancement less attractive.  The ETSI ISG should endorse the new-image model for cloud-hosted vCPE.
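
To illustrate the difference, here’s a toy sketch of the new-image model.  The image catalog and names are hypothetical; the point is that adding a feature means swapping one composite image on the existing hosting point rather than adding a hosting point and a new chain link.

```python
# A minimal sketch of the "new-image" model: adding a feature rebuilds a
# single composite image and redeploys it in place, rather than adding a
# hosting point and a chain connection.  Image names are purely illustrative.

CATALOG = {
    frozenset({"firewall"}): "vcpe-fw:1.0",
    frozenset({"firewall", "acceleration"}): "vcpe-fw-accel:1.0",
}


class VcpeInstance:
    def __init__(self, features):
        self.features = set(features)
        self.image = CATALOG[frozenset(self.features)]

    def add_feature(self, feature):
        """Swap the composite image; same VM, same connections."""
        self.features.add(feature)
        self.image = CATALOG[frozenset(self.features)]
        print(f"redeploying image {self.image} on the existing hosting point")


if __name__ == "__main__":
    box = VcpeInstance({"firewall"})
    box.add_feature("acceleration")   # one redeploy, no new chain link
```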

Let’s now move to the market-dominant vCPE approach, which is a general-purpose edge device that substitutes for cloud resources.  Obviously, such a hosting point for vCPE doesn’t need additional hosting points and network connections to create a chain.  Each feature is in effect inserted into a “virtual slot” in an embedded-control computing device, where it runs.

One of the primary challenges in NFV is the onboarding of virtual functions and the interoperability of VNFs.  If every general-purpose edge device vendor takes their own path in terms of the device’s hosting features and local operating system, you could end up needing a different VNF for every vCPE device.  You need some standard presumption of a local operating system, a lightweight device-oriented Linux version for example, and you need some standard middleware that links the VNF to other VNFs in the same device, and to the NFV management processes.

What NFV could do here is define a standard middleware set to provide those “virtual slots” in the edge device and support the management of the features.  There should be a kind of two-plug mechanism for adding a feature.  One plug connects the feature component to the data plane in the designated place, and the other connects it to a standard management interface.  That interface then links to a management process that supplies management for all the features included.  Since the whole “chain” is in the box, it would be possible to cut in a new feature without significant (if any) data plane interruption.
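
A simple sketch of that “two-plug” idea might look like the following.  The classes and method names are my own assumptions, not anything the ISG has defined: each feature exposes a data-plane hook and a management hook, and the device composes the features in its virtual slots.

```python
# A hedged sketch of the "two-plug" virtual-slot idea: every feature plugs
# into the data plane at a designated place and into one standard management
# interface.  Class and method names are assumptions for illustration.

class Feature:
    def __init__(self, name):
        self.name = name

    def process(self, packet):        # data-plane plug
        return f"{packet}|{self.name}"

    def status(self):                 # management plug
        return {"feature": self.name, "state": "up"}


class EdgeDevice:
    """General-purpose CPE with ordered virtual slots and one mgmt interface."""

    def __init__(self):
        self.slots = []

    def insert(self, feature, position=None):
        pos = len(self.slots) if position is None else position
        self.slots.insert(pos, feature)   # no external chain to rebuild

    def forward(self, packet):
        for feature in self.slots:
            packet = feature.process(packet)
        return packet

    def management_view(self):
        return [feature.status() for feature in self.slots]


if __name__ == "__main__":
    box = EdgeDevice()
    box.insert(Feature("firewall"))
    box.insert(Feature("accelerator"))
    print(box.forward("pkt-1"))
    print(box.management_view())
```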

This same approach could be taken for what I’ll call the “virtual edge device”.  Here, instead of service-chaining a bunch of features to create agility, the customer buys a virtual edge device, which is a cloud element that will accept feature insertion into the same image/element.  Thus, the network service user is “leasing” a hosting point into which features can be dynamically added.  This provides a dynamic way of inserting features that would preserve the efficiency of the new-image model, and potentially offer feature insertion with no disruption at all.

The second point where the NFV community could inject some order is in that management plug.  The notion here is that there is a specific, single, management process that’s resident with the component(s) and interacts with the rest of the NFV software.  That process has two standard APIs, one facing the NFV management system (VNFM) and the other facing the feature itself.  It is then the responsibility of any feature or VNF provider to offer a “stub” that connects their logic to the feature-side API.  That simplifies onboarding.

In theory, it would be possible to define a “feature API” for each class of feature, but I think the more logical approach would be to define an API whose data model defines parameters by feature class, and includes all the feature classes to be supported.  For example, the API might define a “Firewall” device class and the parameters associated with it, and an “Accelerator” class that likewise has parameters.  That would continue as a kind of “name-details” hierarchy for each feature class.  You would then pass parameters only for the class(es) you implemented.
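
Putting the last two points together, here’s a hedged illustration of a class-keyed parameter schema and a vendor-supplied stub that reports only the feature class it implements.  The schema contents, class names, and API shape are assumptions made purely to show the structure.

```python
# A sketch of the "one management process, two APIs" idea combined with the
# class-keyed parameter model: the schema lists supported feature classes and
# their parameters, and a vendor stub reports only the classes it implements.
# All names and parameters are invented for illustration.

from abc import ABC, abstractmethod

# "Name-details" hierarchy: one entry per supported feature class.
FEATURE_SCHEMA = {
    "Firewall":    {"rules": list, "default_action": str},
    "Accelerator": {"cache_mb": int, "compression": bool},
}


class FeatureStub(ABC):
    """Feature-side API; the VNF vendor supplies this stub."""

    @abstractmethod
    def get_parameters(self) -> dict:
        ...


class VendorFirewallStub(FeatureStub):
    def get_parameters(self):
        # Only the "Firewall" class is implemented, so only it is reported.
        return {"Firewall": {"rules": ["allow 443", "deny *"],
                             "default_action": "deny"}}


class ManagementAgent:
    """VNFM-facing side: checks the reported classes against the schema."""

    def __init__(self, stub: FeatureStub):
        self.stub = stub

    def poll(self):
        params = self.stub.get_parameters()
        unknown = set(params) - set(FEATURE_SCHEMA)
        if unknown:
            raise ValueError(f"unsupported feature classes: {unknown}")
        return params   # handed up to the VNFM


if __name__ == "__main__":
    print(ManagementAgent(VendorFirewallStub()).poll())
```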

The next suggestion is to formalize and structure the notion of a “virtual infrastructure manager”.  There is still a question in NFV as to whether there’s a single VIM for everything or a possible group of VIMs.  The single-VIM model is way too restrictive because it’s doubtful that vendors would cooperate to provide such a thing, and almost every vendor (not to mention every new technology) has different management properties.  To make matters worse, there’s no organized way in which lifecycle management is handled.

VIMs should become “infrastructure managers” or IMs, and they should present the same kind of generalized API set that I noted above for VNFM.  This time, though, the API model would present only a set of SLA-type parameters that would then allow higher-level management processes to manage any IM the same way.  The IM should have the option of either handling lifecycle events internally or passing them up the chain through that API to higher-level management.  This would organize how diverse infrastructure is handled (via separate IMs), how legacy devices are integrated with NFV (via separate IMs), and how management is vertically integrated while still accommodating remediation at a low level.
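
A minimal sketch of that IM contract, with made-up parameter names and thresholds, might look like this: every IM presents the same SLA-style view, and a lifecycle event is either remediated locally or passed up the chain.

```python
# A hedged sketch of the generalized infrastructure-manager (IM) contract:
# each IM exposes the same SLA-style parameters and can either remediate a
# lifecycle event itself or pass it up to higher-level management.  The
# parameter names and event types are assumptions for illustration.

from abc import ABC, abstractmethod


class InfrastructureManager(ABC):
    @abstractmethod
    def sla_state(self) -> dict:
        """Uniform SLA view: availability, latency, and so on."""

    @abstractmethod
    def remediate(self, event: dict) -> bool:
        """Return True if the event was handled locally."""

    def handle_event(self, event: dict, escalate):
        # Try local remediation first; otherwise pass the event upward.
        if not self.remediate(event):
            escalate(event)


class HostingIM(InfrastructureManager):
    def sla_state(self):
        return {"availability": 99.95, "latency_ms": 12}

    def remediate(self, event):
        return event["type"] == "vm_failure"   # redeploy locally


class LegacyDeviceIM(InfrastructureManager):
    def sla_state(self):
        return {"availability": 99.99, "latency_ms": 3}

    def remediate(self, event):
        return False   # nothing handled locally; always escalate


if __name__ == "__main__":
    escalated = []
    HostingIM().handle_event({"type": "vm_failure"}, escalated.append)
    LegacyDeviceIM().handle_event({"type": "link_down"}, escalated.append)
    print("escalated to service level:", escalated)
```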

The final suggestion is aimed at the problem I think is inherent in a strict implementation of the ETSI E2E model, which is scalability.  Software framed literally on the functional model of NFV would be a serialized set of elements whose performance would be limited and which would not be easily scalable under load.  This could create a major problem should the failure of some key piece of infrastructure cause a “fault cascade” that requires a lot of remediation and redeployment.  The only way to address this is by fragmenting NFV infrastructure and software into relatively contained domains which are then harmonized above.

In ETSI-modeled NFV, we would have to assume that every data center has at least one NFV software instance, including MANO, VNFM, and VIM.  If it’s a large data center, the number of instances would depend on the number of servers; IMHO, you would want to presume an instance for every 250 servers or so.

To make this work, a service would have to be decomposed into instance-specific pieces and each piece then dispatched to the proper spot.  That means you would have a kind of hierarchy of implementation.  The easiest way to do this is to say that there is a federation VIM that’s responsible for taking a piece of service and, rather than deploying it, sending it to another NFV instance for deployment.  You could have as many federation VIMs and layers thereof as needed.
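
Here’s a toy illustration of that federation idea: a “federation VIM” looks like a VIM from above, but instead of deploying anything it routes each piece of the service to the NFV instance (or to another federation layer) that owns it.  The routing keys, regions, and instance names are all invented.

```python
# A minimal sketch of the "federation VIM": it presents the same deploy()
# interface as a local NFV instance, but forwards each service piece to the
# instance that should handle it.  Layers can be nested as deeply as needed.
# Field names, regions, and instance names are invented for illustration.

class LocalNfvInstance:
    def __init__(self, name):
        self.name = name

    def deploy(self, piece):
        return f"{piece['name']} deployed by {self.name}"


class FederationVIM:
    """Routes service pieces to the right child; can itself be nested."""

    def __init__(self, key, children):
        self.key = key              # which field of the piece to route on
        self.children = children    # value -> LocalNfvInstance or FederationVIM

    def deploy(self, piece):
        return self.children[piece[self.key]].deploy(piece)


if __name__ == "__main__":
    east = FederationVIM("dc", {"dc1": LocalNfvInstance("dc1-pod-a"),
                                "dc2": LocalNfvInstance("dc2-pod-a")})
    national = FederationVIM("region", {"east": east})

    service = [{"name": "vFirewall", "region": "east", "dc": "dc1"},
               {"name": "vRouter",   "region": "east", "dc": "dc2"}]
    for piece in service:
        print(national.deploy(piece))
```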

All of this doesn’t substitute completely for an efficient NFV software architecture.  I’ve blogged enough about that to demonstrate what I think the problems with the current NFV model are, and what I think would have to be done at the bottom to do it properly.  These fixes won’t do that, but as I said at the opening of this blog, my goal isn’t to make current NFV great or even optimal, but rather to make it workable.  If that’s done, then we could at least hope that some deployment could occur, that fatal problems with NFV wouldn’t arise, and that successor implementations would have time to get it right at last.

What to Expect in Network Operators’ Fall Planning Cycle

Network operators generally do a fall technology plan to frame their following-year budget.  The timing varies with geography and operator, but most are active between mid-September and mid-November.  This year, a fair number of operators have done some pre-planning, and we can actually see the results in their quarterly earnings calls, as well as the calls of the network equipment vendors.  I’ll track the plans as they evolve, but this is a good time to baseline things.

Nearly all the operators reported that lower capex could be expected for 2017, and most have actually spent a bit ahead of their budget plans.  As a result, the fourth quarter is looking a bit soft, and you can see that in the guidance of the equipment vendors and of the operators themselves.  This shouldn’t come as a surprise, given that operators are feeling the pressure of declining profit per bit, which makes investment in infrastructure harder to justify.

Among the operators who have done some pre-planning, three issues have been raised.  First is whether SDN and NFV could bring about any meaningful change in revenue or profit, and, for some at least, if the answer is “no”, then what might?  Second is whether there is a potential for a change in regulatory climate that could help their profits, and third is just what to expect (if anything) from 5G.  We’ll look at each of these to get a hint of what might happen this fall and next year.

What operators think of either SDN or NFV is difficult to say because the response depends on who you’re talking to.  The CTO people are the most optimistic (not surprisingly, given that they include the groups working on the standards), and the CFO people tend to be the least.  Among the specific pre-plan operators, the broad view is “hopeful but not yet committed”.  There is general agreement that neither technology has yet made a business case for broad adoption, and that means neither has a provable positive impact on the bottom line.

Perhaps the biggest issue for this fall, based on the early input, is how a better business case could be made.  Nobody disagrees that both SDN and NFV will play a role in the future, but most operators now think that “automation”, by which they mean the automated service lifecycle management I’ve been blogging about, is more important.  Full exploitation of automation is outside the scope of both SDN and NFV in current projects and plans, and there is no standards body comparable to the ONF or ETSI NFV ISG to focus efforts.

“No standards body” here is interesting because the TMF is of course a body that could drive full service lifecycle automation.  It didn’t come up much among the pre-planning operators, in large part because only the CIO organizations of operators seem to have much knowledge of or contact with the TMF.  In my view, the TMF also tends to generate its documents for consumption by its own members, using their own terminology.  That makes it harder for operator personnel who aren’t actively involved to understand them, and it reduces their media coverage as well.  In any event, the TMF doesn’t seem to be pushing “automation”, and so we’re a bit adrift on the SDN/NFV side for the fall planning cycle.

The regulatory trends are another up-in-the-air issue.  In the US, the Republican-led FCC seems intent on reversing the pro-OTT mindset of previous FCCs, particularly the Wheeler chairmanship that preceded the current (Pai) one.  Under Wheeler, the FCC declared that the Internet was a telecommunications service regulated under Title II, which gave the FCC the ability to control settlement and pricing policies.  Wheeler took that status as a launching-pad for ruling against settlement among ISPs and against paid prioritization, both of which could have helped ISP (and thus network operator) business models.  Pai seems determined to eliminate that classification, but even if he does, the position could change with a change in administration in Washington.  There’s talk of Congress passing something to stabilize the net neutrality stance, but that might never happen.

Outside the US, regulatory trends are quite diverse, as has been the case for a decade or more.  However, operators in both Europe and Asia tell me that they see signs of interest in a shift to match the US in accepting paid prioritization and settlement.  If that were to happen, it could at least provide operators with temporary relief from profit compression by opening a revenue flow from OTTs to operators for video.  That would probably boost both legacy infrastructure spending and work on a longer-term revenue and cost solution.  However, operators don’t know how to handicap the shift of policy, and so far it’s not having a big impact on planners.

The final area is the most complicated—5G.  Generally, operators have accepted that they’ll be investing in 5G, with the impact probably peaking in 2021-2022, but the timing and the confidence operators have in a specific infrastructure plan varies considerably.  In the US, for example, there is considerable interest in using 5G with FTTN as a means of delivering high bandwidth to homes in areas where FTTH payback is questionable.  Operators in other countries, particularly those where demand density is high, are less interested in that.  Absent the 5G/FTTN connection, there isn’t a clear “killer justification” or business case for 5G in the minds of many operators.  “We may be thinking about an expensive deployment justified by being able to use the ‘5G’ label in ads,” one operator admits.

The 5G issue is where pre-planners think the overall focus for fall planning will end up.  Some would like to see a 5G RAN-only evolution, including those with FTTN designs.  Others would like to see the convergence of wireless and wireline in the metro, meaning the elimination or diminution of investment in Evolved Packet Core for mobile.  Still others with MVNO partner aspirations like network slicing.  Everyone agrees that it’s not completely clear to them that 5G evolution will improve things, and they say they’ll go slow until that proof is out there.  The pre-planners didn’t see IoT support as a big near-term driver for 5G, interestingly.

The 4G transition came along, operators say, at a critical point in market evolution, when the advent of smartphones and the growth in mobile phone usage drove demand sharply upward and outstripped the older technologies.  There’s a question among operators as to whether that kind of demand driver will materialize for 5G, in no small part because it’s not clear whether competition will stall ARPU growth or even drive it down.  Operators would invest to fend off competition as long as service profits overall were promising, but it’s not clear to them whether they will be.  They’ll try to find out this fall.

Which raises the last point, the last difficulty.  Operators have historically relied on vendor input for their technology planning, under the logical assumption that it did little good to speculate about technologies that nobody was offering.  The problem is that the vendors have demonstrably failed to provide useful technology planning support in areas like SDN and NFV, and by most accounts are failing in 5G as well.  The pre-planners believe that vendors still see operators as public utilities engaged in supply-side market expansion.  Build it, and they will come.  The operators know that’s not a reasonable approach, but their own efforts to move things along (such as the open-source movements in both SDN and NFV) seem to have very long realization cycles and significant technology uncertainties.

We’re in an interesting time, marketing-wise.  We have a group of buyers who collectively represent hundreds of billions in potential revenue.  We have a group of sellers who don’t want to do what the buyers want and need.  The good news is that there are some signs of movement.  Cisco, who more than any other vendor represents a victory of marketing jive over market reality, is reluctantly embracing a more realistic position.  Other vendors are taking steps, tentatively to be sure, to come to terms with the new reality.  All of this will likely come to focus this fall, whether vendors or operators realize it or not.  There’s a real chance for vendors here, not only the usual chance to make the most of the fall planning cycle, but a broader chance to fill market needs and boost their own long-term success.