Google Fi Could Be Big, or It Could Be Another “Wave”

One of the questions that seems to get asked annually is “When is Google going to build its own network?”  After all, Google has deployed fiber in some areas, and from time to time it’s said it was going to bid on mobile spectrum.  Is it just a matter of time before Google takes over the network as well as the Internet?  Not if you believe in financial reality.

We should be suspicious about a Google-eats-the-world vision based on financials alone.  Google is currently priced at about 6.4 times sales, where telcos are priced at about 1.7 times sales.  The Price/Earnings ratio for Google is double that of telcos, and so is its return on assets.  So let’s see: I’m going to make my stock price go up and reward my investors (and myself) by getting into a business where opportunities are lower?  Why not just call Carl Icahn and invite him to tea and takeover?

Then there’s the fact that while Google has threatened to buy spectrum and has dabbled in fiber, it hasn’t really acquired licenses and it’s cherry-picking fiber cities.  All the indications are, and have always been, that Google is trying to force network operators to sustain broadband improvement cycles in the face of their declining profit per bit.  The media takes this seriously, but I doubt that the telcos worry that Google is trying to get into their business.  They’re more worried about how to get into Google’s.  We hear a lot about network operators trying to learn the “OTT mindset” but not much is written about how OTTs are trying to learn to be like telcos.

But if I intended to blog about how Google wasn’t going to become a telco, I’d be done already.  The strongest indicator of Google’s plans is its Project Fi initiative, and that initiative also shows just where telco vulnerability lies and how vendors will have to think about their customers’ “transformations” in the future.

Project Fi is a lot of things.  At its most fundamental level, it’s a “network anonymizer”.  Google combines WiFi access and selective 4G and future 5G partnerships to create its own virtual network that the customer sees as being their broadband service.  The website is evocative; it shows a smartphone (Android, of course) with the network identifier as “Fi Network”.

Fi also makes Google a Mobile Virtual Network Operator (MVNO) in its relationship with cellular wireless providers.  MVNOs piggyback their service on a “real” mobile broadband network rather than deploy their own.  Apple has long been rumored to covet an MVNO role, but Tim Cook seems to have ruled that course out earlier this year.  If that’s true and not just classic misdirection, then we’d have to address why Google would be seeing opportunity there when Apple isn’t.

Apple wants to sell iPhones and other “iStuff”.  The mobile operators who supply the phones to customers are Apple’s most valuable conduit, and if Apple decided to be an MVNO they’d make perhaps one operator in a given market happy (the partner in their underlay mobile network) and alienate all the others.  Google may be a competitor to Apple with Android, but Android isn’t a phone, it’s a phone OS, and the Android phone market is already fragmented.  If Google wanted to compete with the iPhone directly, why not just build phones and not license them to third parties?

What Google wants is, I think, clear.  Right now they’re an Internet OTT giant.  It’s hard to visualize what’s above “the top” in both a semantic sense and a realistic sense.  They don’t want to be on the bottom, where all the current bottom-feeders look covetously upward at Google.  What they want is to be “just-under-the-top”.  Project Fi is about JUTT.

A JUTT approach is consistent with Google’s Android approach.  I don’t want to own the dirt, Google says, I just want to make sure the dirt doesn’t own me.  Apple can sell phones as long as they don’t threaten the OTT space.  Operators can sell broadband if they stay in their profit cesspool and play nice.  Android was designed to poison Apple initiatives, and so is Fi designed to poison operator aspirations in Google’s top layer.

But JUTT is also an offensive play.  A simple MVNO approach exploits a brand.  The notion is that you have a good brand, good enough to be an automatic win in a certain number of mobile deals, so your marketing costs are minimal.  Being an MVNO gives you a nice little chunk of the mobile bill for doing not much (as long as it doesn’t turn your operator partners against you).  But what I think Google wants with Fi is to establish itself as a mobile brand.  Android is a brand unto itself, Samsung is an Android brand.  Fi is a Google brand.  Every time a Fi user looks at that smartphone, they’ll see the brand reinforced.

And they’ll see it exploited.  Apple has proved (with things like FaceTime) that you can make a phone into a community.  Fi could let you make broadband service into a community.  Social networks and video chat are already creating what’s essentially a mobile-service-free vision of communication.  Users always see their service as what they directly use, and smart devices with OTT services cover up the broadband underneath.  But if the broadband JUTT has the services within itself, then those services could pull through JUTT just like wanting to call grandma pulled through POTS voice.

Which brings up the final point: your “service provider” perception is set by your service.  If Google can use Fi to establish a Google-slanted vision of social-collaborative services, of “real” IoT, then they can make that vision the default standard for the new area.  That could make Google a leader in even the evolved form of “communications services”, the things the operators believe have to be tied specifically to the network.

This isn’t going to be an easy road, though.  Google has not been very successful in direct marketing; their DNA is in ad-sponsored services and they’re not going to make Fi work without charging for the network services consumed—because Google will have to pay to expand them and keep them strong.  I think Project Fi is just that right now, a “project” and not a product.  Google will have to work out the promotion and make a go of this, or it will fall by the wayside as so many other good Google concepts have.  Remember Google Wave?  That could be Fi’s future too.

Will IT Giants Slim Down to Nothing or Rebuild Around a New Driving Architecture?

For decades, there’s been a view that a one-stop IT shop or full-service vendor was the best approach.  Now it seems like nobody wants to be that any more.  IBM, once the vendor with the largest strategic influence in the industry, has seen its product line and customer base shrink.  Dell and HPE seem to be getting out of the service and software business in favor of servers and platform software.  What kind of IT market are we facing now, anyway?  There are a number of basic truths that I think answer the “why” and “what’s coming” questions.

Truth number one is there are only 500 companies in the Fortune 500.  OK, I guess you think that’s obvious, but what I’m getting at is that very large enterprises aren’t getting more numerous.  The success of technology hasn’t come by making the big companies spend more, but by getting smaller companies to spend something.  That’s a whole different kind of market than the one that created Big Blue (IBM, for those who don’t recognize the old reference!)

With big customers, you can count on internal technical planning and support.  Your goal as a vendor is to secure account control, meaning that you have a strong ability to influence buyers’ technology planning processes.  You have an account team dedicated to these giant buyers and your team spends a lot of time buying drinks and kissing babies.  They can afford to because they’ll make their quota and your numbers from their single customer.

Down-market, the situation is the opposite.  The buyers don’t even do technology planning, and they can’t provide internal technical support for what they buy.  If you give them a hand, you’re holding them up—often spending days on a deal that might earn you a tenth of your sales quota if the deal is done.  You have a dozen or more of these accounts and you have to spend time with them all, doing little tiny deals that don’t justify much support.  You can’t sell down-market, you have to market to these buyers and that was something new to IBM and to other IT giants.

Which brings up truth number two:  you can sell solutions, but not market them.  If the small buyer doesn’t do technology planning, then they don’t know what they want unless they can somehow connect goals and needs to products.  If you try to help them in this by “solution selling” you end up spending days or weeks on the account for little gain in the end—because they are small companies, they buy IT on a small scale.

Software, particularly business software, is a solution sell.  Imagine trying to take out an ad in a small-business rag to promote analytics.  How many calls do you think it would generate, and how long would it take to turn a “suspect” into a customer?  Too long.  Thus, the down-market trend in tech tends to make software a more difficult sell than hardware.  Another problem is that new applications that demand new hardware pose the classic sales problem of helping the other guy win.  A big project budget is usually about 70% equipment and 30% software and services, and the software company has to do the heavy lifting.  That’s a tough pill to swallow.

Trying to sell both doesn’t help either.  If you try to push a solution that includes both software and hardware, or even if you just have software in your portfolio, you end up doing that endless hand-holding and the software decision holds up hardware realization.  Your management thinks “Hey, why not wait till these guys decide they want analytics and then sell them what it runs on?”  Why not, indeed.

The solution seems to be to either sell hardware or sell software.  In the first case you rely on the market for orderly compute capacity growth or on new applications that software players spawn.  You assume competition rather than account control.  In the software case, you focus on selling something that’s differentiable and you make hardware-only vendors into partners.

The third truth is that there is only a finite number of IT professionals, and most of them are looking for vendor jobs, not end-user jobs.  Nike and Reebok make sneakers, not servers or software.  An IT professional has a better career path in a technical company, so it’s hard to keep great performers in end-user organizations.  The very people a company needs to plan for the best IT aren’t likely to stay with the company, if they join it in the first place.  That makes it much harder to promote complex solutions or propose significant changes.

I’ve told the story of the bank executive who interrupted a discussion I was having with the comment “Tom, you have to understand my position.  I don’t want decision support.  I want to be told what to do!”  Every expansion in consumer IT, every new startup, every vendor hiring spree, reduces the labor pool for the very people on whom business IT depends.  Those left behind want to be told what to do.

This problem could have been mitigated.  There was a time when vendors spent a lot on technical material to help buyers learn and use their stuff.  There is no similar level/quality of material available today, in large part because vendors who produce it would simply be educating their competition—most product areas are open and competitive.  There was a time when you could read good, objective, comments on products and strategies.  No more; those days were the days of subscription publications, and we’re living in an ad-sponsored world.

This brings us to where we are, which is that the trend to maximize profits in the coming quarter at the expense of the more distant future is alive and well.  None of these moves by the IT giants are smart in the long run.  Spending on business IT can improve only if new applications realize new benefits, and that really takes a company with some skin in the game across the board—hardware, software, and services.  But taking the long view means getting punished in quarterly earnings calls in the short term at best, and at worst having some “vulture capitalist” take a stake in your company as a springboard to a hostile takeover.

But what about the future?  Nature, they say, abhors a vacuum, and that’s true in information terms or in terms of driving change.  Software innovation happens under the current myopic conditions, but not nearly as fast as it should.  As the potential for a revolution builds, it tends to create revolutionaries.  In some cases (NFV is a good example), we have a wave of enthusiasm that outruns any chance of realization and we fall short.  In others we have enough functional mass to actually do what people get excited about.

For business IT this is hopeful but not decisive.  You can’t easily create business revolution as a startup, and decades of tripping over our own feet because we’re afraid of looking forward haven’t created an environment where even enthusiasm will be enough.  What I think is about to happen is a fundamental shift in power among IT firms, away from those who shed the broad opportunities by shedding critical elements, and toward those who keep some key stuff.  All of our IT kingpins could still be in that latter category, but by cutting a lot of things out they make it critically important to exploit what they retain.  Will they?  That’s what we’ll have to watch for.

Where, though, could we see a real opportunity to exploit something, given the stripping down that’s happening?  The answer, I think, lies in architecture, a fundamental relationship between hardware and applications that’s set by a software platform.  Nobody who sells hardware can avoid selling platform software, and there are already big platform-software players with little (Oracle) or no (Microsoft, Red Hat) position in hardware.  Microsoft just announced a deal with GE Digital to host its Predix IoT platform on Azure.  The PR around the deal has been weak on all sides, but it’s a sign that architecture plays are already emerging to link critical new applications (and IoT is an application not a network) with…you guessed it…platforms.

IoT, or mobility, or contextual productivity support, or even NFV, could be approached architecturally, and the result could lead to a new wave of dominance.  Almost certainly one of these drivers will succeed, and almost certainly it will anoint some player, perhaps a new one.  When that happens it will likely end our cost-management doldrums and announcements of slimming down.  I lived through three waves of IT growth and I’m looking forward to number four.

Building On the Natural Cloud-to-NFV Symbiosis

From almost the first meeting of the NFV Industry Specification Group, there’s been a tension between NFV and the cloud.  Virtual Network Functions (VNFs) are almost indistinguishable from application components in the cloud, and so platforms like OpenStack or Docker and tools like vSwitches and DevOps could all be considered as elements of NFV implementation.  Actually, “should be considered” is a more appropriate phrase because it makes no sense to assume that you’d duplicate most of the capabilities of the cloud for NFV.

What doesn’t get duplicated, or shouldn’t?  Our vision of NFV is increasingly one of a multi-layer modeled structure where the top is service-specialized and the bottom is infrastructure-generalized.  The cloud is a hosting model at the bottom layer, but current virtual CPE trends show that virtual functions can be hosted without the cloud.  Our vision of the cloud is also increasingly multi-layer, with virtualization or cloud stack platforms at the bottom, an “Infrastructure-as-Code” or IaC model between, and DevOps tools above.  You’d think the first step in harmonizing the cloud and NFV would be to solidify the models and their relationships.  That’s not yet happening.

In a rough sense, the cloud’s IaC concept corresponds with NFV’s notion of a Virtual Infrastructure Manager.  Either of them would wrap around the underlying platforms and management tools needed to deploy/connect stuff.  The cloud vision of IaC, which includes the specific notion of “events” that trigger higher-layer processes based on resource conditions, is more advanced in my view.  Most significantly, it’s advanced in that it presumes that IaC has to work with what the platform does, and the NFV ISG seems to think that it needs to do a gap analysis on OpenStack (for example) and submit changes.  That opens the possibility of long delays in coordinating implementations, and also raises the question of whether many of the NFV-related features belong in a general cloud implementation.

Which raises the question of the next layer, because if you don’t have something in the bottom you probably want to look at putting it a layer above.  In the cloud, DevOps is a broad and modular approach to deployment that could (and in most cases, does) offer a range of options from deploying a whole system of applications to deploying a single component.  In NFV, you have Management and Orchestration (MANO) and the Virtual Network Function Manager (VNFM), with the first (apparently) deploying stuff and the second (apparently) doing lifecycle management.  However, these are subservient to a model of a service that presumably exists higher up, unlike the cloud which makes the DevOps layer extensible as high as you’d like.

Operators like AT&T, Orange, Telefonica, and Verizon have been working through their own models for service-building that start at the top with operations software (OSS/BSS) and extend down to touch SDN, NFV, and legacy infrastructure.  Even here, though, they seem unwilling or unable to define something DevOps-ish as a uniform higher-layer approach.  TOSCA, as a data model, would clearly be suitable and is already gaining favor among vendors too, but some (Cisco included) have orchestration tools that fit into a lower-level model (based on YANG) and don’t really have a clearly defined higher-level tie-in.

One of the impacts of the confusion here is a lack of a convincing service-wide operations automation strategy.  I’ve blogged about this recently, so I won’t repeat the points here except to say that without that strategy you can’t realize any of the NFV benefits and you can’t even ensure that NFV itself could be operationalized with enough efficiency and accuracy to make it practical.  Another impact is an inconsistent, often impractical, and sometimes entirely omitted integration model.  The whole cloud DevOps/IaC concept was built to let applications/components deploy on generalized infrastructure.  Without agreeing to adopt this model for NFV, or to replace it with something equally capable, you have no broad framework for harmonizing different implementations of either network functions or infrastructure elements, which puts everything in the professional-services boat.

Interface standards like the ones described in the NFV documents aren’t enough to assure interoperability or openness.  Software is different from hardware, and the most important thing in software is not how the elements connect (which can often be adapted simply with some stubs of code) but how the elements interrelate.  That’s what describes the features and functions, which is the primary way in which an open approach can be architected.  We need this high-level model.

Another reason we need the model is that since it makes no sense to assume that we’d duplicate cloud efforts in NFV, we need to understand where NFV requirements are introduced and how they’re realized.  Much of what the ISG is working on relates to the description of parameters.  What functionality, exactly, is expected to use them?  Do we want OpenStack to check for whether a given VNF can be deployed on a given host, or do we want some DevOps-like capability to decide which host to put something on given any exclusionary requirements?  If we wait till we try to deploy something to find out it doesn’t go there, deployment becomes a series of trial-and-error activities.
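To make the distinction concrete, here’s a minimal sketch of the “decide before you deploy” approach: candidate hosts are filtered against a VNF’s exclusionary requirements up front, instead of discovering a mismatch at deploy time.  All the names and fields here are illustrative, not taken from any OpenStack or ISG API.

```python
# Hypothetical sketch: pre-filter candidate hosts against a VNF's
# exclusionary requirements, rather than trial-and-error deployment.

def eligible_hosts(vnf_requirements, hosts):
    """Return only the hosts that satisfy every requirement up front."""
    def satisfies(host):
        return (host["free_vcpus"] >= vnf_requirements.get("vcpus", 0)
                and host["free_mem_gb"] >= vnf_requirements.get("mem_gb", 0)
                and vnf_requirements.get("features", set()) <= host["features"])
    return [h for h in hosts if satisfies(h)]

hosts = [
    {"name": "h1", "free_vcpus": 4,  "free_mem_gb": 8,  "features": {"dpdk"}},
    {"name": "h2", "free_vcpus": 16, "free_mem_gb": 64, "features": {"dpdk", "sriov"}},
]
firewall_vnf = {"vcpus": 8, "mem_gb": 16, "features": {"sriov"}}

candidates = eligible_hosts(firewall_vnf, hosts)
print([h["name"] for h in candidates])  # only h2 meets all the requirements
```

The point isn’t the specific checks, it’s where they live: above the deployment layer, where a scheduling decision can be made, rather than inside it, where a failure can only be reported.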

Both the “declarative” (Puppet-like) and “imperative” (Chef-like) models of DevOps allow for the creation of modular elements that can be built upward to create a larger structure.  Both also have IaC features, and both allow for community development of either IaC or application/VNF elements and the sharing of these among vendors and users.  Thus NFV could ride a lot of this process and get to a useful state a lot faster.

It could also get to the cloud state, which may be the most critical point of all.  The difference between a VNF and an application component is minimal, as I’ve noted above.  If operators want to offer features like facility monitoring, is that feature a “VNF” or a cloud application?  Wouldn’t it make sense to assume that “carrier cloud” was the goal, and not that the goal was NFV?  And IoT is just one of several examples of things that would logically blur the boundary between NFV and the cloud even further.

The good news here is that operators are framing NFV in a logical and cloud-like way in their own architectures.  It’s possible that these initiatives will eventually drive that approach through the open-source NFV initiatives, but the operator approaches themselves are the real drivers for change.  NFV is not going to deploy if it has to re-invent every wheel, and those who are expected to deploy it know that better than anyone.

Coupling Resource Conditions and Service SLAs in the Automation of Operations/Management

In a couple of past blogs, I’ve noted that operations automation is the key to both improved opex and to SDN/NFV deployment.  I’ve also said that to make it work, I think you have to model services as a series of hierarchical intent models synchronized with events through local state/event tables.  The goal is to be able to build everything in a service lifecycle as a state/event intersection, or a set of synchronized intersections in linked models.  The key to this, of course, is being able to respond to “abnormal” service conditions, and that’s a complex problem.

If you looked at a single service object in a model, say “Firewall”, you would expect to see a state/event table to respond to things that could impact the SLA of “Firewall”.  Each condition that could impact an SLA would be reflected as an event, so that in the “operational” state, each of the events/conditions could trigger an appropriate action to remedy the problem and restore normal operation.  This framework is the key to operationalizing lifecycles through software automation.
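A quick sketch may make the state/event idea tangible.  This isn’t any standard’s API, just a hypothetical “Firewall” object whose entire lifecycle is a table mapping (state, event) pairs to a process and a next state:

```python
# Minimal illustration (not a real NFV/TMF API): a "Firewall" intent-model
# object driven entirely by a state/event table.  Every lifecycle action
# is the process found at a state/event intersection.

class Firewall:
    def __init__(self):
        self.state = "ordered"
        self.log = []
        # state/event table: (state, event) -> (process, next_state)
        self.table = {
            ("ordered", "activate"):      (self.deploy,    "deploying"),
            ("deploying", "deploy_done"): (self.confirm,   "operational"),
            ("operational", "fault"):     (self.remediate, "repairing"),
            ("repairing", "repair_done"): (self.confirm,   "operational"),
        }

    def handle(self, event):
        process, next_state = self.table[(self.state, event)]
        process(event)
        self.state = next_state

    def deploy(self, event):    self.log.append("deploying firewall")
    def confirm(self, event):   self.log.append("SLA confirmed")
    def remediate(self, event): self.log.append("remediating fault")

fw = Firewall()
for ev in ["activate", "deploy_done", "fault", "repair_done"]:
    fw.handle(ev)
print(fw.state)  # back to "operational" once the fault is remediated
```

The processes hung off the table could just as easily be OSS/BSS components or NFV MANO elements; the table is what makes the lifecycle automatable.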

Now, if you look “inside” the object “Firewall”, you might find a variety of devices, hosted software elements and the associated resources, or whatever.  You can now set a goal that however you decompose (meaning deploy or implement) “Firewall” you need to harmonize the native conditions of that implementation with the events that drive the “Firewall” object through its lifecycle processes.  If you can do that, then any implementation will look the same from above, and can be introduced freely as a means of realizing “Firewall” when it’s deployed.

This approach is what I called “derived operations” in many past blogs and presentations.  The principle, in summary, means that each function is an object or abstraction that has a set of defined states and responds to a set of defined events.  It is the responsibility of all implementations of the function to harmonize to this so that whatever happens below is reported in a fixed, open, interoperable framework.  This creates what’s effectively a chain of management derivations from level to level of a hierarchical model, so that a status change below is reflected upward if, at each level, it impacts the SLA.
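The derivation chain can be sketched in a few lines.  In this hedged illustration (names and the two-status scheme are my own, not from any specification), each node in the hierarchy recomputes its own status from what’s below and reports upward only when the change actually affects its SLA:

```python
# Sketch of "derived operations": each level of a hierarchical service
# model derives its own status from its children, and a status change
# propagates upward only if it impacts the SLA at that level.

class ServiceObject:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.status = name, parent, "ok"

    def child_status_changed(self, child_status):
        # Derive this object's status; any non-ok condition below is
        # treated here as an SLA impact worth reporting upward.
        new_status = "ok" if child_status == "ok" else "degraded"
        if new_status != self.status:
            self.status = new_status
            if self.parent:
                self.parent.child_status_changed(self.status)

service = ServiceObject("Service")
firewall = ServiceObject("Firewall", parent=service)

# A resource fault inside "Firewall" is derived upward, level by level;
# a repeat of the same condition changes nothing, so nothing propagates.
firewall.child_status_changed("failed")
print(service.status)  # "degraded"
```

In a real model the derivation logic at each level would consult that level’s own SLA, which is exactly what keeps irrelevant resource noise from flooding the top of the service.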

This sort of approach is good for services that have an explicit SLA, and in particular for services where the SLA is demanding or where customers can be expected to participate in monitoring and enforcing it.  It’s clearly inappropriate for consumer services because the resources associated with linking the service objects and deriving operations would be too expensive for the service cost to justify.  Fortunately, the approach of an intent-model object can be applied in other ways.

The notion of an SLA is imprecise, and we could easily broaden it to cover any level of guarantee or any framework of desired operations responses to service or network conditions.  Now our “Firewall” has a set of events/conditions that represent not necessarily guarantees but actions.  Something breaks, and you generate an automatic email of apology to the user.  The “response” doesn’t have to be remedial, after all, and that opens a very interesting door.

Suppose that we build our network, including whatever realizes our “Firewall” feature, to have a specific capacity of users and traffic and to deliver a specific uptime.  Suppose that we decide that everything that deals with those assumptions is contained within our resource pool, meaning that all remediation is based on remedying resource problems and not service problems.  If the resources are functioning according to the capacity plan, then the services are also functioning as expected.  In theory, we could have a “Firewall” object that didn’t have to respond to any events at all, or that only had to post a status somewhere that the user could access.  “Sorry there’s a problem; we’re working on it!”

There are other possibilities too.  We could say that an object like “Firewall” could be a source of a policy set that would govern behavior in the real world below.  The events that “Firewall” would have to field would then represent situations where the lower-layer processes reported a policy violation.  If the policies were never violated, no report is needed, and if the policy process was designed not to report violations but to handle them “internally” then this option reduces to the hands-off option just described.

It’s also possible to apply analytics to resource-level conditions, and from the results obtain service-level conditions.  This could allow the SLA-related processes to be incorporated in the model at a high level, which would simplify the lower-level model and also reduce or eliminate the need to have a standard set of events/conditions for each function class that’s composed into a service.

Finally, if you had something like an SD-WAN overlay and could introduce end-to-end exchanges to obtain delay/loss information, you could create service-level monitoring even if you had no lower-level resource events coupled up to the service level.  Note that this wouldn’t address whether knowing packet loss was occurring (for example) could be correlated with appropriate resource-level remediation.  The approach should be an adjunct to having fault management handled at the resource level.

The point of all of this is that we can make management work in everything from a very tight coupling with services to no coupling at all, a best-efforts extreme on the cheap end and a tight SLA on the other.  The operations processes that we couple to events through the intent-modeled structure can be as complicated as the service revenues can justify.  If we can make things more efficient in hosting operations processes we can go down-market with more specific service-event-handling activity and produce better results for the customer.

The examples here also illustrate the importance of the service models and the state/event coupling to processes through the model.  A service is built with a model set, and the model set defines all of the operations processes needed to do everything in the service lifecycle.  SDN and NFV management, OSS/BSS, and even the NFV processes themselves (MANO, VNFM) are simply processes in state/event tables.  If you have a service model and its associated processes you can run the service.

Resource independence is also in there.  Anything that realizes an object like “Firewall” is indistinguishable from anything else that realizes it.  You can realize the function with a real box, with a virtual function hosted in agile CPE, or with a function hosted in the cloud (your cloud or anyone else’s).

Finally, VNF onboarding is at least facilitated.  A VNF that claims to realize “Firewall” has to be combined with a model that describes the state/event processes the VNF needs to be linked with, and the way that “Firewall” is defined as an intent model defines the things the VNF’s implementation has to expose as unified features above.
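In the simplest terms, the model that accompanies an onboarded VNF is a harmonization map.  Here’s a deliberately tiny sketch, with invented vendor event names, of translating native VNF conditions into the fixed event set an abstract “Firewall” object exposes:

```python
# Illustrative only: onboarding harmonization, mapping vendor-native VNF
# conditions onto the open event set defined by the "Firewall" intent model.
# The native event names here are invented for the example.

NATIVE_TO_ABSTRACT = {
    "vFW_PROC_DOWN": "fault",        # vendor alarm -> open lifecycle event
    "vFW_PROC_UP":   "repair_done",
    "vFW_READY":     "deploy_done",
}

def harmonize(native_event):
    """Translate a vendor-native event into the open 'Firewall' event set."""
    return NATIVE_TO_ABSTRACT.get(native_event, "unmapped")

print(harmonize("vFW_PROC_DOWN"))  # reported upward as "fault"
```

Two different vendors’ firewalls would ship different maps, but everything above the “Firewall” object would see exactly the same events, which is what makes the implementations interchangeable.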

Operations automation can work this way.  I’m not saying it couldn’t work other ways as well, but this way is IMHO the way a software type would architect it if the problem were handed over.  Since service automation is a software task, that’s how we should be looking at it.

The TMF got part-way to this point with its NGOSS Contract approach, which linked processes to events using SOA (Service-Oriented Architecture, a more rigid predecessor to modern microservices) through the service contract as a data model.  It really hasn’t caught on, in part I think because the details weren’t addressed.  That’s what the TMF’s ZOOM project, aimed at operations automation, should be doing in my view, and whether they do it or not, it’s what the industry should be doing.

I think some are doing it.  Operators tell me that Netcracker uses something like this in their presentations, and a few tell me that Ericsson is now starting to do that too.  I think the presentation made last year by Huawei at the fall TMF meeting in the US exposes a similar viewpoint, and remember that Huawei has both the tools needed for low-level orchestration and also an OSS/BSS product.

It’s a shame that it’s taking so long to address these issues properly, because the lack of software automation integration with operations and management has taken the largest pool of benefits off the table for now.  It’s also hampered both SDN and NFV deployment by making it difficult to prove that the additional complexity these technologies introduce won’t end up boosting opex.  If we’re going to see progress on transformation in 2017 we need to get going.

Are Opex Savings Delays Threatening SDN/NFV, or Are We Thinking About Opex Savings the Wrong Way?

Is “the latest and greatest” always great?  There are definitely many examples of fad-buying in the consumer space.  In business, though, it would probably be a career-killing move to suggest a project whose only benefit was adopting “the latest thing”.  That doesn’t mean that there’s not still a bit of latest-thing hopefulness in positioning new technologies, but it suggests that these hopes will be dashed.  According to a Light Reading piece, that’s now happening with SDN/NFV, but I think there’s a bigger question on the table.

The article documents a survey of operators asked when they expected “virtualization” to lower their opex by at least 10%, run in November and again in May.  One result is pretty much predictable; in the most recent survey a lot of operators jumped ship on hopes that this level of savings would be generated in 2017, since that’s now only half-a-year away.  The other result that’s interesting is that the largest group of operators think they won’t see 10% opex savings till 2021 or beyond.  To understand what this means, we have to first look at what we mean by “opex” and analyze from there.

The kind of “opex” that technology could address is what I call “process opex”, meaning the costs associated with service sale, delivery, and support.  That differs from what I’ll call “financial opex”, which includes my process-opex items but also any other service-related costs that are expensed rather than capitalized.  The biggest pieces of the difference are things like roaming charges paid to other operators, or backhaul leases.  But if we presume process opex to be the target, then for this year the total opex is about 27 cents per revenue dollar.

Is setting a 10% opex target realistic?  A 10% savings in process opex would yield about three cents per revenue dollar.  To put this in perspective, total capex across all operators averages around 19 cents per revenue dollar, so a 3 cent opex savings corresponds to a reduction in capex of about 16%.  That’s just shy of what operators have said they think the maximum capex reduction for deploying NFV might be.  My own model of potential savings, which I introduced last year, predicted that SDN/NFV deployment could achieve the 10% savings level in 2019, which happens to be roughly the median time of the survey responses.  Thus, I think 10% is a good target.
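The arithmetic here is simple enough to check in a few lines, using the per-revenue-dollar figures cited above:

```python
# Figures from the text, expressed in cents per revenue dollar.
process_opex_cents = 27.0   # process opex
capex_cents = 19.0          # total capex

# A 10% process-opex savings:
savings_cents = 0.10 * process_opex_cents
print(round(savings_cents, 1))             # 2.7 -- "about three cents"

# Rounding to 3 cents, express that as an equivalent capex reduction:
print(round(3.0 / capex_cents * 100))      # 16 (percent)
```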

It’s the next point that’s critical.  Is “virtualization” just SDN/NFV?  I also said last year that if you were to apply intent-model-based software automation to services and service lifecycle management, you could achieve that same 10% savings two years earlier, which means 2017.  That means that applying orchestration principles and service modeling principles to current and new services, even presuming no infrastructure transformation at all, could generate more benefit than SDN and NFV.  SDN and NFV would of course pull through these changes in time, but not as the primary technical goal.  Not only that, the delay in adoption inherent in linking opex transformation to massive transformation of infrastructure would slow the ramp of savings.

You cannot achieve opex reduction with hardware, period.  You can get it with software automation that is pulled through by hardware, or by software automation on its own.  I think the biggest factor in the delay of realization of “virtualization” opex benefits is the fact that we don’t really have a handle on the software-automation side, in part because SDN and NFV are driven by the CTO group while operations (OSS/BSS) belongs to the CIO.  It’s not so much that operators aren’t seeing the benefits realized as that they’re not even seeing a path to securing them.  We are only now starting to see operators themselves try to put SDN and NFV into a complete operations context.

We do have, with some initiatives like the Orange NaaS story, indications that operators are elevating their vision of services to the point where the vision disconnects from infrastructure transformation.  Because NaaS tends to be based on overlay technology (as I discussed in an earlier blog this week) it disconnects service processes from infrastructure technology—either the current state or its evolution.  This could mean that NaaS would drive consideration of operations automation separate from infrastructure transformation, bringing us closer to first facing the issue and second addressing it independently of SDN and NFV.

NaaS disconnects service control from real infrastructure by introducing an intermediate layer, the overlay network.  That lets service operations automation operate on something other than real devices, which preserves the current infrastructure until something comes along that really justifies changes there.  When it does, the NaaS model insulates operators and customers from the technology transition/evolution.  But it still doesn’t create operations efficiency.  For that you need to virtualize and software-automate operations itself.

In a software-centric vision of operations, you’d have a data structure made up of “objects” that represent services, features, sub-features, implementation options, infrastructure, etc.  This structure would consist of a series of “intent models” that, like all good abstractions, hide the details below from the users above.  Operations, management, lifecycle processes, and anything else that’s service-related would be defined for each of the elements of the structure in a state/event table.  This kind of model is composable, and it’s compatible with emerging cloud trends as well as with SDN evolution.  NFV hasn’t spoken on the issue yet.
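To make the idea concrete, here is a minimal sketch of that composable, state/event-driven intent-model structure.  All of the class names, states, and events are my own illustrative inventions, not drawn from any standard:

```python
# A toy intent model: each object hides its children and drives its own
# lifecycle from a state/event table. Names are illustrative only.

class IntentModel:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []   # sub-features, implementations...
        self.state = "ordered"
        # state/event table: (state, event) -> (process to run, next state)
        self.table = {
            ("ordered",    "activate"): ("deploy",     "activating"),
            ("activating", "ready"):    ("bill_start", "active"),
            ("active",     "fault"):    ("remediate",  "degraded"),
            ("degraded",   "ready"):    ("bill_start", "active"),
        }

    def handle(self, event):
        process, next_state = self.table[(self.state, event)]
        print(f"{self.name}: run {process}")   # invoke the bound process
        self.state = next_state

# Composition: the service object abstracts the features below it.
vpn = IntentModel("vpn-service", [IntentModel("access"), IntentModel("core")])
vpn.handle("activate")
```

The point of the structure is that lifecycle behavior is data, not code: changing a service means editing the model and its tables, not rewriting operations software.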

Despite a lack of clarity on how NFV could address this model, there does seem to be some operator momentum on making it work for NFV and also for SDN.  Since we don’t really have much in the way of SDN/NFV deployment there’s plenty of opportunity to put something into place for it, when it comes along.  The difficulty has been above, with the OSS/BSS processes.  NaaS could bring clarity to that by defining a “network function” pairing—physical and virtual, or PNF and VNF.  That function could then become the bottom of an OSS/BSS service model, and the SDN/NFV orchestration process could be tasked with decomposing it into management commands (for PNFs) or deployment (VNFs).
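The PNF/VNF decomposition step could be sketched roughly as follows; the dispatch logic and field names here are hypothetical, intended only to show how a single NF abstraction at the bottom of the service model could fan out into either a management command or a deployment:

```python
# Hedged sketch: orchestration decomposes a "network function" object
# into a management command (physical NF) or a deployment (virtual NF).
# Field names are illustrative, not from the NFV ISG specs.

def decompose(nf):
    if nf["type"] == "PNF":
        return f"send management command to {nf['device']}"
    elif nf["type"] == "VNF":
        return f"deploy {nf['image']} on the carrier cloud"
    raise ValueError("unknown NF type")

print(decompose({"type": "PNF", "device": "edge-router-7"}))
print(decompose({"type": "VNF", "image": "vfirewall:2.1"}))
```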

Having a demarcation between abstract features and real infrastructure has benefits, one of which is that you can evolve operations at both levels with a high level of independence.  For example, instead of having a single service automation and orchestration platform for everything from old to new, top to bottom, you could in theory have different platforms responsible for decomposing different objects at various places in the model.  That means you could define something with one model (TOSCA for example) at the top and another (YANG) at the bottom for legacy, and stay with TOSCA for cloud-deployed elements.  Of course you could also still adopt a single model, if the industry could agree on one and all vendors accepted it!

An xNF demarcation also has risks, because the xNF boundary is a likely barrier to tight integration between operations and infrastructure.  How big a risk that is depends on your perspective; today most network lifecycle processes are not managed by OSS/BSS systems.  However, a fully integrated approach could let operations tasks be assigned to handle events even fairly close to (or at the level of) real network/server hardware.  It’s hard to say how useful that would be, so it’s hard to say what we’d lose by foreclosing it.

The Light Reading piece exposes a problem, but I think it’s more than just a transient shortfall of realization versus need in opex management.  The problem isn’t that SDN/NFV is not delivering opex benefits fast enough.  The problem is that opex benefits aren’t in scope for SDN or NFV in the first place.  We’re blowing kisses at operations when we have to, hoping that buyers don’t really dig into details.  What we need to do now is face reality, and recognize that if we want opex efficiency we’re going to get it by transforming operations not transforming the network.  Until we do that, we’re going to undershoot everyone’s expectations.

Unraveling Our NaaS Options

One of the useful trends in network services these days is the trend to retreat from the technology basis for a service and focus on the retail attributes.  You can see this in announcements from operators that they’re supporting “network-as-a-service” or “self-service”, but in fact these same trends are a critical part of the “virtual CPE” (vCPE) movement in the NFV space.  They’re also tied in to managed services and SD-WAN.  The NaaS space, then, might be the unifier of all the trends we’re talking about these days.  So where is it now, and where is it going?

There would seem to be a lot of value in a more dynamic model of network services.  Users even today report that service changes take an average of 17 days to be completed, with some requiring over 30 days.  The problem is acute for networks that cross international borders, but it’s present even where you only have to change operators from place to place or adapt to different access technologies.

The delay frames two hypothetical problems—one being that the cost of all the activity packed into an average of 17 days surely reduces profits, and the other being that those days represent non-billing days that could have been billed.  I say “hypothetical” here because it’s clear that you don’t have 17 days of frantic activity, and that even if all 17 days could be made billable, that revenue is available only per-change, not per-customer-per-year.  How much of the 17-day delay is accounted for in customer planning (I know it takes three weeks so I place an order three weeks before I need service) and wouldn’t have been paid for anyway is impossible to determine.

The challenge that NaaS presents, then, starts with setting realistic goals and frameworks for achieving them.  You definitely need a portal system to let customers interact with fast-track service provisioning, changes, and self-maintenance, but clearly having an online way of initiating a 17-day wait is counterproductive.  Obviously such a strategy would set an even higher level of expectations for instant response, and I think that frames the way NaaS has to be approached.

Business services today (and consumer services as well) are provided mostly as Ethernet or IP services.  Today, the services are “native” byproducts of the devices that are deployed, and the time it takes to configure and plan the services is impacted by the fact that the setup of real devices will have to be changed in some way to make service changes.  If you wanted to give a user a self-service portal, you’d risk the user asking for something by mistake that would destabilize the real infrastructure and impact other users.  There are ways to mitigate these problems, but obviously they’re not satisfactory or operators wouldn’t be looking at new technologies to create agility.

New technology isn’t the answer either, in part because you’d have to evolve to it and somehow support all the intermediate network states and in part because even new network technology would still give self-service users an opportunity to truly mess something up.  Logically the service network has to be independent of infrastructure.  You need an overlay network of some sort, and of course the physical network of today (and in all the intermediate states through which it evolves, to whatever ultimate technology you target) forms the underlay.

The big points about NaaS from an underlay or physical-network perspective are ubiquity, internetwork gateways, and headroom.  You can’t sell an overlay service unless you can reach all the locations in some way.  You can’t assume uniform physical facilities so you have to be able to jump your overlay between different physical networks, whether the networks are different for reasons of technology or administration.  Finally, you have to ensure that you have enough underlay capacity to carry the sum of the overlay services you want to sell.

If we look at this from the perspective of the overlay network, we have four principal requirements.  First, every customer network service access point has to be part of some connection network and have a unique address.  Otherwise you can’t get them anything.  Second, the overlay has to be able to ride uniformly on whatever combination of connection networks exist for the prospect base that forms the service target.  Third, if network technology is likely to evolve, the overlay has to be able to accommodate the new technology and the transition states.  Finally, the mechanisms for setting up the overlay network have to be fully automated, meaning software-based.
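The first two requirements lend themselves to a simple pre-flight check at service-setup time.  This is a toy illustration, with invented field names, of validating that every endpoint sits on some underlay connection network and carries a unique overlay address before the automated setup proceeds:

```python
# Toy pre-flight check for the overlay requirements above. The mixed
# underlay types in the example reflect the second requirement: the
# overlay must ride uniformly over whatever connection networks exist.

def check_overlay_endpoints(endpoints):
    addresses = [e["address"] for e in endpoints]
    assert all(e.get("connection_network") for e in endpoints), \
        "every endpoint needs an underlay connection network"
    assert len(addresses) == len(set(addresses)), \
        "overlay addresses must be unique"
    return True

sites = [
    {"address": "site-a", "connection_network": "carrier-mpls"},
    {"address": "site-b", "connection_network": "internet"},
]
print(check_overlay_endpoints(sites))   # True
```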

There is no technical reason why all business and consumer network services we have today, including the Internet, couldn’t be built as an overlay/underlay.  We already see managed services being offered in this form, and this model is what players ranging from Orange to AT&T and Verizon are either offering now or plan to introduce.  Most are also looking to augment overlay connection services with hosted value-add features, which is what virtual CPE is all about.

One reason for the interest in vCPE is that you really need to have some gadget to sit on the “underlay demarcation point” of the physical network and groom off the overlay stuff.  Given that this device is the binding element between infrastructure and service-layer technology, it’s also a logical point where end-to-end management and SLAs would be applied.  And given that, you might as well host some stuff there and make a buck along the way.

In fact, the real question with respect to this vCPE element is less whether it should offer hosting than whether it should ever give the mission up.  While it’s true that cloud-hosting edge-provided services could improve capital efficiency versus a premises device, we can’t eliminate the premises device, and so only the incremental mission of NFV-like vCPE hosting is at risk.  Cloud economies probably can’t be created with only business vCPE services to drive opportunity.

NaaS could be proven out readily using overlay technology and supplemented effectively using premises hosting of virtual functions.  From that starting point we could then build out in both an SDN and NFV direction at a pace determined by the benefits of adoption.  Because overlay NaaS could be extended to a wide service area immediately with costs that scale to customer adoption (because the CPE is really the only incremental element in many configurations), we could obtain the benefits we need without huge investments up front.

Overlay NaaS is a perfect complement to operations-driven software automation.  You have to change the front-end service processes to achieve efficiency or agility, and once you do that you could drive an overlay NaaS service set fairly easily.  If I were an OSS/BSS player I’d be excited enough about the possibilities to promote the idea—it could give me a seat at the early service modernization table.

That doesn’t mean that SDN and NFV would be killed off either.  A layer of overlay NaaS, as noted above, insulates the network from disruptions created by technology evolution, but it could also reduce the feature demand on lower layers by elevating the connectivity features and related higher-layer features (firewall, etc.) to the overlay.  This could promote adoption of virtual wires, accelerate the use of SDN to groom agile optical paths, and shift infrastructure investment decisively downward.  Ciena, an optical vendor, is a participant in the Orange NaaS approach, and their Blue Planet is one of the few fully operationally integrated orchestration solutions.  Coincidence?  I don’t think so.

Getting SDN and NFV to Be Truly Symbiotic

The relationship between SDN and NFV has always been complicated and often a bit competitive.  SDN had an early lead for mindshare and vendor support but NFV captured the media’s attention quickly and today it seems to be leading in the field of strategic interest to operators.  However, nearly all the operators I’ve talked with believe that it’s not an either/or or a “this-first” question at all, but a struggle to find a useful symbiosis.

SDN is a connection technology, a way of controlling forwarding and route determination explicitly through centralized control.  In theory it would be usable at any level in networking, though it’s always been my view that SDN isn’t particularly useful at the optical layer.  Since service agility and operations efficiencies are the driving benefits needed for any new technology, and since those benefits are highest at the top of the stack where most services are created, it would make sense to adopt SDN there.  The problem is that operators aren’t convinced that central control scales to large networks where efficiency and agility are the most useful.

NFV is a function hosting technology.  While it makes clear sense if you presume that hosting functions on commodity servers would eliminate expensive proprietary gear, the capex-benefit paradigm has long been discredited as a prime driver because savings aren’t large enough and there are concerns that the additional complexity associated with function hosting would raise opex more than it would lower capex.  In addition, servers are simply not powerful enough to replace all network devices—optical transport and even big switches/routers would require specialized hardware that would probably end up looking like the very proprietary devices NFV is supposed to displace.

There is a clear point of symbiosis between SDN and NFV, or more accurately between cloud hosting of anything that’s multi-tenant and SDN.  The utility of SDN to create multi-tenant data center connections is well established, and if we were to see a major operator commitment to carrier cloud (including NFV) we’d surely see SDN use expand.  However, this mission isn’t really going to move the ball a lot for SDN, in part because we’re still struggling to make a large-scale carrier-cloud-and-NFV business case and in part because SDN inside the data center isn’t a major technology shift.  For that, we need SDN to get into the WAN.

There are two possible approaches to creating an SDN/NFV symbiosis beyond the obvious in-data-center one just noted.  One is to use SDN to address the big-box limitations of NFV by building “virtual-wire” networks that partition connectivity so that at least VPN/VLAN services and possibly other non-Internet services could be built using hosted switch/router instances.  The other is to use NFV to host control/signaling plane functions and let SDN do the data-plane lifting.

I’ve blogged in the past about using SDN to partition private networks by creating per-tenant “virtual wires” that would then be combined with SD-WAN-like edge forwarding rules and hosted-router-instance nodes to create VPNs.  This approach could meld SDN and NFV nicely, with NFV deploying edge and nodal elements and SDN doing the connectivity, but operators say that it has the usual disadvantage—too much first-cost and early risk because it requires fairly wide deployment of both SDN devices and NFV hosting points.

It seems to me that the SD-WAN evolution approach would be the easiest way to get to the virtual-wire-partition model of symbiosis.  You could build an overlay network based on today’s infrastructure and services, including the Internet, and use NFV to deploy both edge features and interior nodes.  SDN could be used to control the connectivity, and if white-box electrical-layer technology expanded to further groom agile optics, this could then create virtual wires that could gradually displace the legacy elements.

The second place where the control/data symbiosis has been demonstrated is in mobile infrastructure.  Anyone who’s followed mobile standards knows that there’s a lot of buzz around the use of SDN and NFV in the IP Multimedia Subsystem (IMS) and Evolved Packet Core (EPC) and that SDN and NFV are also routinely linked to 5G and VoLTE.  If you look at the functional diagram of IMS/EPC you see something that seems to be of mind-boggling complexity, and since mobile infrastructure is about the only place where rapid changes are being funded these days it would be easy to introduce something new at the next point of investment.

IMS and EPC are really signaling-plane activities that rely on a virtual-network overlay even with today’s technology.  There are a dozen functional elements that are there to register devices and associate them with accounts for charging purposes, and another half-dozen that are there to control a set of moving tunnels that link a user in any of a collection of cells with a fixed point of access to services.  IMS and EPC were designed before either SDN or NFV came along, and they are happily running today without any support from SDN or NFV.  The question is whether they could run better.

At the signaling level, the big advantage that NFV could bring is support for the placement of IMS/EPC functional elements in the optimum places, and at the optimum numbers, needed for current traffic and activity.  You can spin up a piece of virtual IMS or EPC as needed, and then decommission it if traffic/activity declines.  Since mobile services are unusually signaling-intensive because of mobility management, the elasticity of control-plane topology and capacity would be especially valuable.
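The spin-up/decommission logic described here can be sketched as a simple capacity calculation; the thresholds and units are invented for illustration and the real policy would obviously be far richer:

```python
# Illustrative sketch of elastic control-plane topology: size the pool
# of hosted IMS/EPC function instances to current signaling load.
# "target" is a hypothetical per-instance load budget.

def scale_instances(current, load_per_instance, target=100, minimum=1):
    """Return the instance count needed to keep load near `target`."""
    total_load = current * load_per_instance
    needed = max(minimum, -(-int(total_load) // target))  # ceiling division
    return needed

print(scale_instances(current=2, load_per_instance=180))  # busy hour: grow
print(scale_instances(current=4, load_per_instance=20))   # quiet hour: shrink
```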

SDN could in theory be used to create the kind of tunnels used in EPC to link the Packet Gateway with the cell sites (via the Serving Gateway).  Since mobile infrastructure, as I’ve noted, is a major target for investment (and backhaul is likely to be more demanding as we evolve to 5G) there’s a good opportunity to drive change.  The question is the business case.  It’s easier to validate NFV for hosting signaling-plane elements in an agile way than to substitute SDN for a traditional tunneling technology.  Not impossible, but it would demand a lot of work be done on operational efficiency, resiliency, etc.  Right now we don’t have the story in complete form.

I think that finding a symbiotic mission for SDN and NFV is rendered more difficult by the fact that any network technology change tends to pose a problem of scope, risk, and benefit.  You can do a little of the new stuff at a low cost and risk, and accomplish little in return.  You can do a lot of the new stuff, which poses high first costs and probably unacceptable risk, but would deliver a significant benefit.  Since this is true of both SDN and NFV, getting the two to dance when you can’t teach either of them the steps is doubly challenging.

The cloud is probably the only solution here.  Widespread use of cloud services could create a marriage of SDN in the data center, NFV to deploy agile components of applications, and SD-WAN to deliver tenant-segmented network connectivity into VPNs.  The only rub is that it would be far easier to drive symbiotic SDN/NFV deployment with the cloud if it were the network operators deploying that cloud.  Operators, while they were initially bullish on cloud offerings, have found it to be a very difficult business.  Some think that relying on the cloud to unify SDN and NFV is just adding another point of difficulty to a situation that’s already hard enough to stymie experts.

I don’t know where this is going, but operators tell me that neither SDN nor NFV can reach full potential without the other.  It may be that the real unifier, if there is one, is the operationalization and service automation revolution that could provide a starting point for all SDN, NFV, and carrier cloud alike.  The question is whether that revolution can get past the inertia of OSS/BSS, which has proved at least as problematic as the inertia of long-lived capital infrastructure.

Event-Driven Operations, OSS/BSS Evolution, and Virtualization

All of the discussions of service modeling and management or operations integration that I’ve recently had raise the question of OSS/BSS modernization.  This is a topic that’s as contentious as that of infrastructure evolution, but it involves a different set of players in both the buyer and seller organizations.  Since operations practices and costs will have to figure in any realistic vision of transformation, we have to consider this side of the picture.

Fundamentally, OSS/BSS systems today are transactional in nature, meaning that they are built around workflows that are initiated by things like service orders and changes.  This structure makes sense given the fact that in the past, OSS/BSS systems were about orders and billing, and frankly it would make sense in the future if we assumed that the role of OSS/BSS systems were to remain as it has been.

A bit less than a decade ago, the OSS/BSS community and the Telemanagement Forum (TMF) started to consider the expansion of the OSS/BSS role to become more involved in the automation of service lifecycle management.  A number of initiatives were launched out of this interest, including the Service Delivery Framework (SDF) stuff and the critical GB942 “NGOSS Contract” work.  The latter introduced the concept of using a contract data model to steer events to operations processes, and so it was arguably the framework for “making OSS/BSS event-driven”, a stated industry goal.

Since that time, there has been a continuous tension between those who saw OSS/BSS evolving as infrastructure changed, and those who saw OSS/BSS architectures as totally obsolete and needing revision, in many cases whatever happens at the infrastructure level.  My own modeling of operations efficiency gains has shown that operators could gain more from automating operations practices without changing infrastructure than by letting infrastructure change drive the bus.  So which option is best: evolve or revolt?  You have to start by asking what an “event-driven” OSS/BSS would look like.

An event-driven OSS/BSS would be a collection of microservices representing operations tasks, placed into a state/event structure in a service data model and driven by service and resource events that occurred during the service lifecycle.  This approach collides with current reality in a number of ways.

First, a good event-driven structure has stateless service components like microservices.  That means that the software is designed like a web server—you send it something and it operates on that something without regard for what might have gone before.  Today, most OSS/BSS software is stateful, meaning that it processes transactions with some memory of the contextual relationship between that transaction and other steps, maintained in the process.  In event-driven systems, context is based on “state” and it’s part of the data model.  Thus, you’d have to restructure software.
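The stateless pattern is easy to show in miniature.  In this sketch (states, events, and field names are all illustrative), the handler carries no memory of its own; all context lives in the service data model that arrives with the event, exactly as a web server treats a request:

```python
# Stateless, web-server-style event handler: its output depends only on
# its inputs, and "state" is a field of the service data model.

def handle_event(service_record, event):
    transitions = {
        ("active",   "port_down"): "degraded",
        ("degraded", "port_up"):   "active",
    }
    key = (service_record["state"], event)
    if key in transitions:
        service_record["state"] = transitions[key]  # context stays in the model
    return service_record

record = {"service_id": "vpn-42", "state": "active"}
record = handle_event(record, "port_down")
print(record["state"])   # degraded
```

Because the handler holds no context, any copy of it, anywhere, can process the next event for this service, which is what makes the approach scalable and resilient in a way stateful transactional software is not.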

The second issue with event-driven structures is the data model.  Data models are critical in event-driven systems not only to maintain the state/event tables but also to collect the “contextual variables” that have to be passed among the software elements.  The standard data model for OSS/BSS is the venerable TMF SID, and while on the surface the structure of the SID seems to lend itself to the whole event-driven thing, the problem is that modernization of the SID hasn’t followed software-practices-driven thinking.  Attempting to preserve the structure has overridden logical handling of new things, which means that a lot of what seems intuitively good turns out not to work.  I’ve had years of exposure to the SID, and I still get trapped in “logical” assumptions about what can be done that turn out to work completely differently than logic would dictate.

The third issue in event-driven operations is the events.  Few operators would put OSS/BSS systems into mainstream network management.  Most of the network equipment today, particularly at Levels 2 and 3, is focused on adaptive behavior that does fault correction autonomously.  Virtualization adds in another layer of uncertainty by making service-to-resource relationships potentially more complicated (horizontal scaling is a good example).  What events are supposed to be included in event-driven operations, and how do we actually get them connected to a service contract?

All of these issues are coming to the fore in things like cloud computing, SDN, and NFV—meaning “virtualization” is driving change.  That means that there are infrastructure-driven standards and practices that are doing most of the heavy lifting in service lifecycle automation, in the name of “orchestration”.

You could orchestrate infrastructure events and virtualization-based remediation of problems below traditional operations.  You could create an event-driven OSS/BSS by extending orchestration into operations.  You could do both levels of orchestration independently, or in a single model.  An embarrassment of riches in terms of choice usually means nobody makes a choice quickly, which is where we seem to be now.

I’m of the view that operations could be integrated into a unified model with infrastructure orchestration.  I’m also of the view that you could have two layered state/event systems, one for “service events” and one for “infrastructure events”.  Either could address the issues of event-driven operations, in theory.  In practice, I think that separation is the answer.

Operators seem to be focusing on a model of orchestration where infrastructure events are handled in “management and orchestration” (MANO, in NFV ISG terms) below OSS/BSS.  This probably reflects in part the current organizational separation of CIO and COO, but it also probably preserves current practices and processes better.  In my top-down-modeling blog earlier this week, I proposed that we formalize the separation of OSS/BSS/MANO by presuming that the network of the future is built on network functions (NFs) that bridge the two worlds.  Services and infrastructure unite with the NF and each have their own view—views related in the NF logic.
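The bridging role of the NF can be sketched as a simple two-way event translator.  Everything here—the class, the event shapes, the vocabulary—is my own illustrative invention, not something defined by the TMF or the NFV ISG:

```python
# Sketch of the NF as the bridge between the two worlds: service events
# flow down from OSS/BSS, infrastructure events flow up from MANO, and
# the NF translates each into the other domain's view.

class NetworkFunction:
    def to_infrastructure(self, service_event):
        # an OSS/BSS order becomes a MANO deployment request
        return {"deploy": service_event["feature"]}

    def to_service(self, infra_event):
        # a resource fault surfaces upward as an abstract SLA event
        return {"sla_event": "feature_degraded",
                "detail": infra_event["fault"]}

nf = NetworkFunction()
print(nf.to_infrastructure({"feature": "vfirewall"}))
print(nf.to_service({"fault": "vm_failed"}))
```

The design point is that neither domain sees the other's internals: OSS/BSS deals only in features and SLAs, MANO only in deployments and faults, and the NF logic relates the two views.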

Some OSS/BSS vendors seem to be heading in this direction—Netcracker for one.  An OSS/BSS vendor can still provide orchestration software (maybe they should, given the disorder in the vendor space) and if they do, they can define that critical intermediary-abstraction thing I called an NF and define how it exchanges events between the two domains.  If you dig through material from these OSS/BSS leaders, that’s the structure that at least I can interpret from the diagrams, but they don’t say it directly.

This is a critical time for OSS/BSS and all the people, practices, and standards that go along with it.  As network infrastructure moves to virtualization, it moves to the notion of abstract or virtual devices and intent models and other logical stuff that’s not part of OSS/BSS standards today and very possibly never will be.  OSS/BSS types have always told me that you can’t easily stick an intent model into the TMF SID; if that’s true then SID has a problem because an intent model is a highly flexible abstraction that should fit pretty much anywhere.

An NF intermediary wouldn’t mean that OSS/BSS can’t evolve or can’t support events, but it would take the event heavy lifting associated with resource management out of the mix, and it would maintain the notion that OSS/BSS and network management and infrastructure are independent/interdependent and not co-dependent.  Perhaps that’s what we need to think about when we talk about operations’ progress into the future.