So the NFV ISG Wants to Look at Being Cloud-Like: How?

The ETSI NFV ISG is having a meeting, one of whose goals is to explore a more cloud-like model of NFV.  Obviously, I’d like to see that.  The question is what such a model would look like, and whether it (in some form) could be achieved from where we are now, without starting another four-year effort.  There are certainly some steps that could be taken.

A “cloud model of NFV” has two functional components.  First, the part of NFV that represents a deployed service would have to be made very “cloud-friendly”.  Second, the NFV software itself would have to be optimized to exploit the scalability, resiliency, and agility of the cloud.  We’ll take these in order.

The first step would actually benefit the cloud as well as NFV.  We need a cloud abstraction on which we can deploy, that represents everything that can host functions and applications.  The model today is about hosts or groups of hosts, and there are different mechanisms to deploy containers versus VMs and different processes within each.  All of this complicates the lifecycle management process.

The biggest NFV challenge here is dealing with virtual CPE (vCPE).  Stuff that’s hosted on the customer prem, in a cloud world, should look like a seamless extension of “the cloud”, and the same is true for public cloud services.  This is a federation problem, a problem of agreeing on a broad cloud abstraction and then agreeing to provide the mechanisms to implement it using whatever mixture of technology happens to be available.  The little boxes for vCPE, the edge servers Amazon uses in its Greengrass Lambda extension, and big enterprise data centers are all just the edge of “the cloud” and we need to treat them like that.

If we had a single abstraction to represent “the cloud” then we would radically simplify the higher-level management of services.  Lifecycle management would divide by “in-cloud” and “not-in-cloud” with the latter being the piece handled by legacy devices.  The highest-level service manager would simply hand off a blueprint for the cloud piece to the cloud abstraction and the various domains within that abstraction would be handed their pieces.  This not only simplifies management, it distributes work to improve performance.
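
To make that concrete, here’s a minimal Python sketch of the idea, with every name invented for illustration: a single “cloud” abstraction takes the in-cloud piece of a service blueprint in one hand-off and lets its domains do their own deployment, while the not-in-cloud piece goes to legacy device management.

```python
# Illustrative only: names and structures are invented, not from any spec.
SERVICE_BLUEPRINT = [
    {"name": "vFirewall", "in_cloud": True,  "region": "us-east"},
    {"name": "vRouter",   "in_cloud": True,  "region": "eu-west"},
    {"name": "edge-mux",  "in_cloud": False, "region": "us-east"},  # legacy device
]

class CloudAbstraction:
    """One abstraction for everything that can host functions, whether it is
    a data center, a vCPE box, or a public cloud edge."""
    def __init__(self, domains):
        self.domains = domains            # region -> deployment callable

    def deploy(self, elements):
        for element in elements:
            self.domains[element["region"]](element)   # each domain deploys its piece

def legacy_manager(elements):
    for element in elements:
        print(f"configuring legacy device {element['name']}")

def service_manager(blueprint, cloud, legacy):
    # Lifecycle management divides by "in-cloud" and "not-in-cloud".
    cloud.deploy([e for e in blueprint if e["in_cloud"]])
    legacy([e for e in blueprint if not e["in_cloud"]])

if __name__ == "__main__":
    cloud = CloudAbstraction({
        "us-east": lambda e: print(f"us-east domain hosts {e['name']}"),
        "eu-west": lambda e: print(f"eu-west domain hosts {e['name']}"),
    })
    service_manager(SERVICE_BLUEPRINT, cloud, legacy_manager)
```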

Our next point is that “Cloudy VNFs”, to coin an awkward term, should be for all intents and purposes cloud application components, no different from a piece of a payroll or CRM system.  If one breaks you can redeploy it somewhere else, and if it runs out of capacity you can replicate and load-balance it.  Is this possible?  Yes, but only potentially, because the attributes of a VNF that would make those behaviors available aren’t necessarily there.

If I have a copy of an accounting system that runs out of capacity, can I just spin up another one?  The problem is that I have a database to update here, and that update process can’t be duplicated across multiple instances unless I have some mechanism for eliminating collisions that could result in erroneous data.  Systems like that are “stateful” meaning that they store stuff that will impact the way that subsequent steps/messages are interpreted.  A “stateless” system doesn’t have that, and so any copy can be made to process a unit of work.
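
A tiny, deliberately simplified Python contrast makes the point; the shared dictionary stands in for a back-end store that would, in real life, need exactly the collision control just described.

```python
# Deliberately simplified: the shared dict stands in for a back-end store
# that would need real collision control (locks, transactions) in practice.

class StatefulCounter:
    """State lives inside the instance, so a second copy would diverge."""
    def __init__(self):
        self.total = 0
    def process(self, amount):
        self.total += amount
        return self.total

class StatelessCounter:
    """Any replica can process any unit of work; state lives in the back end."""
    def __init__(self, store):
        self.store = store
    def process(self, key, amount):
        self.store[key] = self.store.get(key, 0) + amount
        return self.store[key]

if __name__ == "__main__":
    shared = {}
    replica_a = StatelessCounter(shared)
    replica_b = StatelessCounter(shared)
    replica_a.process("acct-1", 10)
    print(replica_b.process("acct-1", 5))   # 15: replicas agree via shared state
```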

A pure data-plane process, meaning get-a-packet-send-a-packet, is only potentially stateless.  Do you have the chance of queuing for congestion, or do you have flow control, or do you have ancillary control-plane processes invoked to manage the flow between you and partner elements?  If so then there is stateful behavior going on.  Some of these points have to be faced in any event; queuing creates a problem with lost data or out-of-order arrivals, but that also happens just by creating multiple paths or by replacing a device.  The point is that a VNF would have to be examined to determine if its properties were consistent with scaling, and new VNFs should be designed to offer optimum scalability and resiliency.

We see this trend in the cloud with functional programming, lambdas, and microservices.  It’s possible to create stateless elements by doing back-end state and context control, but software that was written to run in a single device never faced the scalability/resiliency issue and so probably doesn’t do what’s necessary for statelessness.

Control-plane stuff is much worse.  If you report your state to a management process, it’s probably because it requested it.  Suppose you request state from Device Instance One, and Instance Two is spun up, and it gets the request and responds.  You may have been checking on the status of a loaded device to find out that it reports being unloaded.  In any event, you now have multiple devices, so how do you obtain meaningful status from the system of devices rather than from one of them, or each of them (when you may not know about the multiplicity)?

All this pales into insignificance when you look at the second piece of cloud-centric NFV, which is the NFV software itself.  Recall that the ETSI E2E model describes a transactional-looking framework that controls what looks like a domain of servers.  Is this model a data-center-specific model, meaning that there’s a reasonably small collection of devices, or does this model cover an entire operator infrastructure?  If it’s the former, then services will require some form of federation of the domains to cover the full geography.  If it’s the latter, then the single-instance model the E2E diagram describes could never work because it could never scale.

It’s pretty obvious that fixing the second problem would be more work than fixing the first, and would perhaps involve that first step anyway.  In the cloud, we’d handle deployment across multiple resource pools by a set of higher-layer processes, usually DevOps-based, that would activate individual instances of container systems like Docker (hosts or clusters) or VM systems like OpenStack.  Making the E2E model cloud-ready would mean creating fairly contained domains, each with their own MANO/VNFM/VIM software set, and then assigning a service to domains by decomposing and dispatching to the right place.

The notion of having “domains” would be a big help, I think.  That means that having a single abstraction for “the cloud” should be followed by having one for “the network”, and both these abstractions would then decompose into domains based on geography, management span of control, and administrative ownership.  Within each abstraction you’d have some logic that looks perhaps like NFV MANO—we need to decompose a service into “connections” and “hosting”.  You’d also have domain-specific stuff, like OpenStack or an NMS.  A high-level manager would decompose a service into high-level requests against those abstract services, and each abstraction would then invoke a second-level manager that divides the work by domain.
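
Here’s a rough sketch, in Python and with invented names, of that two-level decomposition: the high-level manager hands hosting to “the cloud” and connectivity to “the network”, and each abstraction picks a domain for every request.

```python
# Invented names throughout; the point is the shape of the decomposition.
SERVICE = {
    "hosting":     [{"component": "vCDN-cache", "geo": "us"},
                    {"component": "vCDN-cache", "geo": "eu"}],
    "connections": [{"endpoints": ("us", "eu"), "qos": "video"}],
}

class Abstraction:
    """Either "the cloud" or "the network"; it divides requests by domain."""
    def __init__(self, name, pick_domain):
        self.name = name
        self.pick_domain = pick_domain     # the second-level manager's logic
    def decompose(self, requests):
        for request in requests:
            domain = self.pick_domain(request)
            print(f"{self.name}: domain '{domain}' handles {request}")

def high_level_manager(service, cloud, network):
    cloud.decompose(service["hosting"])          # hosting goes to "the cloud"
    network.decompose(service["connections"])    # connectivity to "the network"

if __name__ == "__main__":
    cloud = Abstraction("the-cloud", lambda r: r["geo"])
    network = Abstraction("the-network", lambda r: "-".join(r["endpoints"]))
    high_level_manager(SERVICE, cloud, network)
```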

We don’t have that now, of course.  Logically, you could say that if we had a higher-layer system that could model and decompose, and if we created those limited NFV domains, we could get to the good place without major surgery on NFV.  There are some products out there that provide what’s needed to do the modeling and decomposing, but they don’t seem to be mandatory parts of NFV.

I’d love to be able to go to meetings like this, frankly, but the problem is that as an independent consultant I have to do work that pays the bills, and all standards processes involve a huge commitment in time.  To take a proposal like this to a meeting, I’d have to turn it into a contribution, defend it in a series of calls, run through revision cycles, and then face the probability that the majority of the body isn’t ready to make radical changes anyway.  So, instead I offer my thoughts in a form I can support, which is this blog.  In the end, the ISG has the ability to absorb as much of it as they like, and discard what they don’t.  That’s the same place formal contributions would end up anyway.

Who Will Orchestrate the Orchestrators (and How)?

What exactly is “service automation” and who does it?  Those are the two questions that are top of the list for network operators and cloud providers today, and they’re ranking increasingly high on the list of enterprises as well.  As the complexity of networks increases, as technology changes introduce hosted elements in addition to discrete devices, and as cloud computing proliferates, everyone is finding that the cost of manual service operations is rising too fast, and the error rate even faster.  Something obviously needs to be done, but it’s not entirely clear what that something is.

Part of the problem is that we are approaching the future from a number of discrete “pasts”.  Application deployment and lifecycle management have been rolled into “DevOps”, and the DevOps model has been adopted in the cloud by users.  Network service automation has tended to be supported through network management tools for enterprises and service providers alike, but the latter have also integrated at least some of the work with OSS/BSS systems.  Now we have SDN and NFV, which have introduced the notion of “orchestration” of both application/feature and network/connection functions into one process.

Another part of the problem is that the notion of “service” isn’t fully defined.  Network operators tend to see services as being retail offerings that are then decomposed into features (the TMF’s “Customer-Facing Services”, or CFSs).  Cloud providers sometimes see the “service” as the ability to provide platforms to execute customer applications, which separates application lifecycle issues from service lifecycle issues.  The trend in cloud services is toward adding “serverless” computing, which raises the level of features that the operator provides and makes their “service” look more application-like.  Enterprises see services as being something they buy from an operator, and in some cases something they have to provide to cloud/container elements.  Chances are, there will be more definitions emerging over time.

The third piece of the problem is jurisdictional.  We have a bunch of different standards and specifications bodies out there, and they cut across the whole of services and infrastructure rather than embracing it all.  As a result, the more complex the notion of services becomes, the more likely it is that nobody is really handling it at the standards level.  Vendors, owing perhaps to the hype magnetism of standards groups, have tended to follow the standards bodies into disorder.  There are some vendors who have a higher-level vision, but most of the articulation at the higher level comes from startups because the bigger players tend to focus on product-based marketing and sales.

If we had all of the requirements for the service automation of the future before us, and a greenfield opportunity to implement them, we’d surely come up with an integrated model.  We don’t have either of these conditions, and so what’s been emerging is a kind of ad hoc layered approach.  That has advantages and limitations, and balancing the two is already difficult.

The layered model says, in essence, that we already have low-level management processes that do things like configure devices or even networks of devices, deploy stuff, and provide basic fault, configuration, accounting, performance, and security (FCAPS) management.  What needs to be done is to organize these into a mission context.  This reduces the amount of duplication of effort by allowing current management systems to be exploited by the higher layer.

We see something of this in the NFV approach, where we have a management and orchestration (MANO) function that interacts with a virtual infrastructure manager (VIM), made up presumably of a set of APIs that then manage the actual resources involved.  But even in the NFV VIM approach we run into issues with the layered model.

Some, perhaps most, in the NFV community see the VIM as being OpenStack.  That certainly facilitates the testing and deployment of virtual network functions (VNFs) as long as you consider the goal to be one of simply framing the hosting and subnetwork connections associated with a VNF.  What OpenStack doesn’t do (or doesn’t do well) is left to the imagination.  Others, including me, think that there has to be a VIM to represent each of the management domains, those lower-layer APIs that control the real stuff.  These VIMs (or more properly IMs, because not everything they manage is virtual) would then be organized into services using some sort of service model.  The first of these views makes the MANO process very simple, and the second makes it more complicated because you have to model a set of low-level processes to build a service.  However, the second view is much more flexible.
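
A hedged sketch of the second view might look like the Python below; the registry, domain names, and service-model format are all mine, not anything from the ETSI specs.

```python
# Nothing here mirrors real ETSI interfaces; it just shows the shape of
# "one infrastructure manager per management domain, composed by a model".

class InfrastructureManager:
    def __init__(self, name):
        self.name = name
    def deploy(self, item):
        print(f"[{self.name}] deploying {item}")

IM_REGISTRY = {
    "openstack-dc1": InfrastructureManager("openstack-dc1"),  # virtual resources
    "legacy-nms":    InfrastructureManager("legacy-nms"),     # real devices too
}

SERVICE_MODEL = [                      # the service model names the domains
    ("openstack-dc1", "vFirewall VM"),
    ("legacy-nms",    "MPLS path A-B"),
]

def mano(service_model):
    # More complicated than "VIM = OpenStack", but it can mix hosting and
    # legacy domains in a single service.
    for domain, item in service_model:
        IM_REGISTRY[domain].deploy(item)

if __name__ == "__main__":
    mano(SERVICE_MODEL)
```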

There are also layers in the cloud itself.  OpenStack does what’s effectively per-component deployment, and there are many alternatives to OpenStack, as well as products designed to overcome some of its basic issues.  To deploy complex things, you would likely use a DevOps tool (Chef, Puppet, Ansible, Kubernetes, etc.).  Kubernetes is the favored DevOps tool for container systems like Docker, which by the way does its own subnetwork building and management and also supports “clusters” of components in a native way.  Some users layer Kubernetes for containers with other DevOps tools, and to make matters even more complex, we have cloud orchestration standards like TOSCA, which is spawning its own set of tools.

What’s emerging here is a host of “automation” approaches, many of them overlapping and those that don’t each covering a specific niche problem, technology, or opportunity.  This is, perhaps, both a good thing and a bad thing.

The good things are that if we visualize deployment and lifecycle management as distributed partitioned processes we allow for a certain amount of parallelism.  Different domains could be doing their thing at the same time, as long as there’s coordination to ensure that everything comes together.  We’d also be able to reuse technology that’s already developed and in many cases fully proven out.

The bad thing is that coordination requirement I just mentioned.  Ships passing in the night is not a helpful vision of the components of a service lifecycle automation process.  ETSI MANO, SDN controllers, and most DevOps tools are “domain” solutions that still have to be fit into a higher-level context.  That’s something that we don’t really have at the moment.  We need a kind of “orchestrator of orchestrators” approach, and that is in fact one of the options.  Think of an uber-process that lives at the service level and dispatches work to all of the domains, then coordinates their work.  That’s probably how the cloud would do it.

The cloud, in fact, is contributing a lot of domain-specific solutions that should be used where available, and we should also be thinking about whether the foundation of the OofO I just mentioned should be built in the cloud and not outside it, in NFV or even OSS/BSS.  That’s a topic for my next blog.

Can We Make ETSI NFV Valuable Even If It’s Not Optimal?

Network Functions Virtualization (NFV) has been a focus for operators for five years now.  Anyone who’s following my blog knows I have disagreed with the approach the NFV ISG has taken, but taken it they have.  The current model will never, in my view, be optimal, as I’ve said many times in past blogs and media interviews.  The question now is whether it can be useful in any way.  The answer is “Yes”, provided that the industry, and the ISG, take some steps quickly.  The goal of these steps is to address what could be serious issues without mandating a complete redesign of the software, now largely based on a literal interpretation of the ETSI ISG’s End-to-End model.

The current focus of NFV trials and deployments is virtual CPE (vCPE), which is the use of NFV to substitute for traditional network-edge appliances.  This focus has, IMHO, dominated the ISG to the point where they’ve framed the architecture around it.  However, the actual deployments of vCPE suggest that the real-world vCPE differs from the conceptual model of the specs.  Because of the central role of vCPE in early NFV activity, it’s important that these issues be addressed.

What was conceptualized for vCPE was a series of cloud-hosted features, each in its own virtual machine, and each linked to the others in a “service chain”.  What we actually see today for most vCPE is a general-purpose edge device that is capable of receiving feature updates remotely.  This new general-purpose edge device is more agile than a set of fixed, purpose-built, appliances.  Furthermore, the facilities for remote feature loading make a general-purpose edge device less likely to require field replacement if the user upgrades functionality.  If vCPE is what’s happening, then we need to optimize our concept without major changes to the ETSI model or implementation.

Let’s start with actual hosting of vCPE features in the cloud, which was the original ETSI model.  The service-chain notion of features is completely impractical.  Every feature adds a hosting point and chain connection, which means every feature adds cost and complexity to the picture.  My suggestion here is that where cloud-hosting of features is contemplated, abandon service chaining in favor of deploying/redeploying a composite image of all the features used.  If a user has a firewall feature and adds an application acceleration feature, redeploy a software image that contains both to substitute for the image that supports only one feature.  Use the same VMs, the same connections.
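
As a simple illustration of the new-image model (the image names and the feature-to-image map are invented), the deployment step becomes a lookup and an in-place image swap rather than a new hosting point and chain connection.

```python
# Image names and the feature-to-image map are invented for illustration.
COMPOSITE_IMAGES = {
    frozenset(["firewall"]):                          "vcpe-fw:1.0",
    frozenset(["firewall", "acceleration"]):          "vcpe-fw-accel:1.0",
    frozenset(["firewall", "acceleration", "sdwan"]): "vcpe-full:1.0",
}

def add_feature(current_features, new_feature, vm_id):
    features = frozenset(current_features) | {new_feature}
    image = COMPOSITE_IMAGES[features]
    # Same VM, same connections; only the software image is replaced.
    print(f"replacing image on {vm_id} with {image}")
    return features

if __name__ == "__main__":
    add_feature(["firewall"], "acceleration", vm_id="vcpe-cust-42")
```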

Some may argue that this is disruptive at the service level.  So is adding something to a service chain.  You can’t change the data plane without creating issues.  The point is that the new-image model versus new-link model has much less operations intervention (you replace an image) and it doesn’t add additional hosting points and costs.  If the cost of multi-feature vCPE increases with each feature, then the price the user pays has to cover that cost, and that makes feature enhancement less attractive.  The ETSI ISG should endorse the new-image model for cloud-hosted vCPE.

Let’s now move to the market-dominant vCPE approach, which is a general-purpose edge device that substitutes for cloud resources.  Obviously, such a hosting point for vCPE doesn’t need additional hosting points and network connections to create a chain.  Each feature is in effect inserted into a “virtual slot” in an embedded-control computing device, where it runs.

One of the primary challenges in NFV is onboarding virtual functions and ensuring interoperability of VNFs.  If every general-purpose edge device vendor takes their own path in terms of the device’s hosting features and local operating system, you could end up with a need for a different VNF for every vCPE device.  You need some standard presumption of a local operating system, a lightweight device-oriented Linux version for example, and you need some standard middleware that links the VNF to other VNFs in the same device, and to the NFV management processes.

What NFV could do here is define a standard middleware set to provide those “virtual slots” in the edge device and support the management of the features.  There should be a kind of two-plug mechanism for adding a feature.  One plug connects the feature component to the data plane in the designated place, and the other connects it to a standard management interface.  That interface then links to a management process that supplies management for all the features included.  Since the whole “chain” is in the box, it would be possible to cut in a new feature without significant (if any) data plane interruption.
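
Here’s a minimal Python sketch of that “virtual slot” middleware, assuming nothing about any real vendor implementation: each feature exposes a data-plane plug and a management plug, and the device composes the chain internally so insertion needs no new hosting points.

```python
# Hypothetical middleware shape; interfaces are illustrative, not standards.

class Feature:
    name = "feature"
    def handle_packet(self, packet):     # the data-plane plug
        return packet
    def get_status(self):                # the management plug
        return {"feature": self.name, "state": "up"}

class Firewall(Feature):
    name = "firewall"
    def handle_packet(self, packet):
        return None if packet.get("blocked") else packet

class EdgeDevice:
    def __init__(self):
        self.slots = []                  # ordered virtual slots
    def insert(self, feature, position=None):
        # The whole "chain" lives in the box, so cutting in a new feature
        # adds no hosting points and no network connections.
        self.slots.insert(len(self.slots) if position is None else position, feature)
    def forward(self, packet):
        for feature in self.slots:
            packet = feature.handle_packet(packet)
            if packet is None:
                return None
        return packet
    def management_status(self):
        return [f.get_status() for f in self.slots]

if __name__ == "__main__":
    device = EdgeDevice()
    device.insert(Firewall())
    print(device.forward({"dst": "10.0.0.1"}))
    print(device.management_status())
```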

This same approach could be taken for what I’ll call the “virtual edge device” approach.  Here, instead of service-chaining a bunch of features to create agility, the customer buys a virtual edge device, which is a cloud element that will accept feature insertion into the same image/element.  Thus, the network service user is “leasing” a hosting point into which features could be dynamically added.  This provides a dynamic way of feature-inserting that would preserve the efficiency of the new-image model but also potentially offer feature insertion with no disruption.

The second point where the NFV community could inject some order is in that management plug.  The notion here is that there is a specific, single, management process that’s resident with the component(s) and interacts with the rest of the NFV software.  That process has two standard APIs, one facing the NFV management system (VNFM) and the other facing the feature itself.  It is then the responsibility of any feature or VNF provider to offer a “stub” that connects their logic to the feature-side API.  That simplifies onboarding.

In theory, it would be possible to define a “feature API” for each class of feature, but I think the more logical approach to take would be to define an API whose data model defines parameters by feature, and includes all the feature classes to be supported.  For example, the API might define a “Firewall” device class and the parameters associated with it, and an “Accelerator” class that likewise has parameters.  That would continue as a kind of “name-details” hierarchy for each feature class.  You would then pass parameters only for the class(es) you implemented.
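
A toy version of that data model might look like this; the feature classes and parameter names are purely illustrative.

```python
# Hypothetical "name-details" hierarchy: parameters are defined per feature
# class, and a VNF passes only the classes it actually implements.

FEATURE_CLASSES = {
    "Firewall":    {"rule_count": int, "default_action": str},
    "Accelerator": {"compression": str, "max_sessions": int},
}

def validate(report):
    """Check a VNF's management report against the class definitions."""
    for feature_class, params in report.items():
        schema = FEATURE_CLASSES[feature_class]        # unknown class -> KeyError
        for key, value in params.items():
            if not isinstance(value, schema[key]):
                raise TypeError(f"{feature_class}.{key} has the wrong type")
    return True

if __name__ == "__main__":
    # This VNF implements only the Firewall class, so that's all it passes.
    print(validate({"Firewall": {"rule_count": 200, "default_action": "drop"}}))
```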

The next suggestion is to formalize and structure the notion of a “virtual infrastructure manager”.  There is still a question in NFV as to whether there’s a single VIM for everything or a possible group of VIMs.  The single-VIM model is way too restrictive because it’s doubtful that vendors would cooperate to provide such a thing, and almost every vendor (not to mention every new technology) has different management properties.  To make matters worse, there’s no organized way in which lifecycle management is handled.

VIMs should become “infrastructure managers” or IMs, and they should present the same kind of generalized API set that I noted above for VNFM.  This time, though, the API model would present only a set of SLA-type parameters that would then allow higher-level management processes to manage any IM the same way.  The IM should have the option of either handling lifecycle events internally or passing them up the chain through that API to higher-level management.  This would organize how diverse infrastructure is handled (via separate IMs), how legacy devices are integrated with NFV (via separate IMs), and how management is vertically integrated while still accommodating remediation at a low level.
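
A sketch of what such an IM contract could look like follows; again, the APIs and parameter names here are illustrative, not proposed standards.

```python
# Every IM, virtual or legacy, answers with the same SLA-style parameters and
# can either remediate a lifecycle event locally or pass it up the chain.

class InfrastructureManager:
    SLA_PARAMETERS = ("availability", "latency_ms", "throughput_mbps")

    def __init__(self, name, local_remediation=True):
        self.name = name
        self.local_remediation = local_remediation

    def get_sla(self):
        # Higher-level management sees the same parameter set from any IM.
        return {"availability": 0.999, "latency_ms": 12, "throughput_mbps": 950}

    def handle_event(self, event, escalate):
        if self.local_remediation and event["severity"] == "minor":
            print(f"[{self.name}] remediating {event['type']} locally")
        else:
            escalate(self.name, event)    # vertical integration of management

def higher_level_manager(source, event):
    print(f"higher layer: received {event['type']} from {source}")

if __name__ == "__main__":
    im = InfrastructureManager("optical-domain")
    im.handle_event({"type": "link-degraded", "severity": "minor"}, higher_level_manager)
    im.handle_event({"type": "node-down", "severity": "major"}, higher_level_manager)
```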

The final suggestion is aimed at the problem I think is inherent in the strict implementation of the ETSI E2E model, which is scalability.  Software framed based on the functional model of NFV would be a serialized set of elements whose performance would be limited and which would not be easily scalable under load.  This could create a major problem should the failure of some key component of infrastructure cause a “fault cascade” that requires a lot of remediation and redeployment.  The only way to address this is by fragmenting NFV infrastructure and software into relatively contained domains which are harmonized above.

In ETSI-modeled NFV, we have to assume that every data center has a minimum of one NFV software instance, including MANO, VNFM, and VIM.  If it’s a large data center, then the number of instances would depend on the number of servers.  IMHO, you would want to presume that you had an instance for each 250 servers or so.

To make this work, a service would have to be decomposed into instance-specific pieces and each piece then dispatched to the proper spot.  That means you would have a kind of hierarchy of implementation.  The easiest way to do this is to say that there is a federation VIM that’s responsible for taking a piece of service and, rather than deploying it, sending it to another NFV instance for deployment.  You could have as many federation VIMs and layers thereof as needed.
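
Here’s a rough Python sketch of the federation idea; because a routing target can itself be another federation VIM, the layers nest as deeply as the geography demands.  Everything here is invented for illustration.

```python
# A "federation VIM" deploys nothing itself: it splits a service into
# per-domain pieces and forwards each piece to the NFV instance that owns
# the target domain.

def make_nfv_instance(name):
    # A local MANO/VNFM/VIM set, responsible for roughly one data center.
    def deploy(pieces):
        for piece in pieces:
            print(f"{name} deploys {piece['descriptor']}")
    return deploy

def make_federation_vim(routing_table):
    # Dispatch only; a target may itself be another federation VIM.
    def deploy(pieces):
        for piece in pieces:
            routing_table[piece["domain"]]([piece])
    return deploy

if __name__ == "__main__":
    east = make_federation_vim({
        "dc-east-1": make_nfv_instance("dc-east-1"),
        "dc-east-2": make_nfv_instance("dc-east-2"),
    })
    top = make_federation_vim({
        "dc-east-1": east,
        "dc-east-2": east,
        "dc-west-1": make_nfv_instance("dc-west-1"),
    })
    top([{"domain": "dc-east-1", "descriptor": "vEPC control plane"},
         {"domain": "dc-west-1", "descriptor": "vEPC user plane"}])
```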

All of this doesn’t substitute completely for an efficient NFV software architecture.  I’ve blogged enough about that to demonstrate what I think the problems with current NFV models are, and what I think would have to be done at the bottom to make things really good again.  These fixes won’t do that, but as I said at the opening of this blog, my goal isn’t to make current NFV great or even optimal, but rather to make it workable.  If that’s done, then we could at least hope that some deployment could occur, that fatal problems with NFV wouldn’t arise, and that successor implementations would have time to get it right at last.

What to Expect in Network Operators’ Fall Planning Cycle

Network operators generally do a fall technology plan to frame their following-year budget.  The timing varies with geography and operator, but most are active between mid-September and mid-November.  This year, a fair number of operators have done some pre-planning, and we can actually see the results in their quarterly earnings calls, as well as the calls of the network equipment vendors.  I’ll track the plans as they evolve, but this is a good time to baseline things.

Nearly all the operators reported that lower capex could be expected for 2017, and most have actually spent a bit ahead of their budget plans.  As a result, the 4th quarter is looking a bit soft, and you can see that in the guidance of the equipment vendors and in that of the operators themselves.  This shouldn’t come as a surprise, given that operators are feeling the pressure of declining profit per bit, which makes investment in infrastructure harder to justify.

Among the operators who have done some pre-planning, three issues have been raised.  First is whether SDN and NFV could bring about any meaningful change in revenue or profit, and for some at least, if “not” then “what might?”  Second is whether there is a potential for a change in regulatory climate that could help their profits, and third is just what to expect (if anything) from 5G.  We’ll look at each of these to get a hint of what might happen this fall and next year.

What operators think of either SDN or NFV is difficult to say because the response depends on who you’re talking to.  The CTO people are the most optimistic (not surprisingly, given that they include the groups working on the standards), and the CFO people tend to be the least.  Among the specific pre-plan operators, the broad view is “hopeful but not yet committed”.  There is general agreement that neither technology has yet made a business case for broad adoption, and that means neither has a provable positive impact on the bottom line.

Perhaps the biggest issue for this fall, based on the early input, is how a better business case could be made.  Nobody disagrees that both SDN and NFV will play a role in the future, but most operators now think that “automation”, by which they mean the automated service lifecycle management I’ve been blogging about, is more important.  Full exploitation of automation is outside the scope of both SDN and NFV in current projects and plans, and there is no standards body comparable to the ONF or ETSI NFV ISG to focus efforts.

“No standards body” here is interesting because of course the TMF is a body that could drive full service lifecycle automation.  It didn’t come up as much among pre-planning users, in large part because only the CIO organizations of operators seem to have much knowledge of or contact with the TMF.  In my view, the TMF also tends to generate its documents for consumption by its own members, using their own terminology.  That makes it harder for operator personnel who aren’t actively involved to understand them, and it reduces their media coverage as well.  In any event, the TMF doesn’t seem to be pushing “automation”, and so we’re a bit adrift on the SDN/NFV side for the fall planning cycle.

The regulatory trends are another up-in-the-air issue.  In the US, the Republican takeover of the FCC seems to be intent on reversing the pro-OTT mindset of previous FCCs, particularly the Wheeler Chairmanship that preceded the current (Pai) one.  Under Wheeler the FCC declared that the Internet was a telecommunications service regulated under Title II, which gave the FCC the ability to control settlement and pricing policies.  Wheeler took that status as a launching-pad for ruling against settlement among ISPs and paid prioritization, both of which could help ISP (and thus network operator) business models.  Pai seems determined to eliminate that classification, but even if he does the position could change with a change in administration in Washington.  There’s talk of Congress passing something to stabilize the net neutrality stance, but that might never happen.

Outside the US, regulatory trends are quite diverse, as has been the case for a decade or more.  However, operators in both Europe and Asia tell me that they see signs of interest in a shift to match the US in accepting paid prioritization and settlement.  If that were to happen, it could at least provide operators with temporary relief from profit compression by opening a revenue flow from OTTs to operators for video.  That would probably boost both legacy infrastructure spending and work on a longer-term revenue and cost solution.  However, operators don’t know how to handicap the shift of policy, and so far it’s not having a big impact on planners.

The final area is the most complicated—5G.  Generally, operators have accepted that they’ll be investing in 5G, with the impact probably peaking in 2021-2022, but the timing and the confidence operators have in a specific infrastructure plan varies considerably.  In the US, for example, there is considerable interest in using 5G with FTTN as a means of delivering high bandwidth to homes in areas where FTTH payback is questionable.  Operators in other countries, particularly those where demand density is high, are less interested in that.  Absent the 5G/FTTN connection, there isn’t a clear “killer justification” or business case for 5G in the minds of many operators.  “We may be thinking about an expensive deployment justified by being able to use the ‘5G’ label in ads,” one operator admits.

The 5G issue is where pre-planners think the overall focus for fall planning will end up.  Some would like to see a 5G RAN-only evolution, including those with FTTN designs.  Others would like to see the convergence of wireless and wireline in the metro, meaning the elimination or diminution of investment in Evolved Packet Core for mobile.  Still others with MVNO partner aspirations like network slicing.  Everyone agrees that it’s not completely clear to them that 5G evolution will improve things, and they say they’ll go slow until that proof is out there.  The pre-planners didn’t see IoT support as a big near-term driver for 5G, interestingly.

The 4G transition came along, operators say, at a critical point in market evolution, where the advent of smartphones and the growth in mobile phone usage drove demand upward sharply and outstripped old technologies.  There’s a question among operators whether that kind of demand drive will work for 5G, in no small part because it’s not clear whether competition will stall ARPU growth or even drive it down.  Operators would invest to fend off competition as long as service profits overall were promising, but it’s not clear to them whether they will be.  They’ll try to find out this fall.

Which raises the last point, the last difficulty.  Operators have historically relied on vendor input for their technology planning, under the logical assumption that it did little good to speculate about technologies that nobody was offering.  The problem is that the vendors have demonstrably failed to provide useful technology planning support in areas like SDN and NFV, and are failing in 5G by most accounts.  The pre-planners think that vendors still think that operators are public utilities engaged in supply-side market expansion.  Build it, and they will come.  The operators know that’s not a reasonable approach, but their own efforts to move things along (such as the open-source movement in both SDN and NFV) seem to have very long realization cycles and significant technology uncertainties.

We’re in an interesting time, marketing-wise.  We have a group of buyers who collectively represent hundreds of billions in potential revenue.  We have a group of sellers who don’t want to do what the buyers want and need.  The good news is that there are some signs of movement.  Cisco, who more than any other vendor represents a victory of marketing jive over market reality, is reluctantly embracing a more realistic position.  Other vendors are taking steps, tentatively to be sure, to come to terms with the new reality.  All of this will likely come to focus this fall, whether vendors or operators realize it or not.  There’s a real chance for vendors here, not only the usual chance to make the most of the fall planning cycle, but a broader chance to fill market needs and boost their own long-term success.

What Are We Missing About “Multi-Cloud?”

What’s the thing (well, one of the things) I’m sick of hearing?  It is that, as Michael Dell and others have said recently, “It’s definitely a multi-cloud world.”  Why am I sick of it?  Two reasons.  First, because it’s never been anything else, and the fact that’s only now recognized is a sad commentary on the industry.  Second, because the statement has no value to planners, only to publications who want to sell ads based on story clicks.  We’ve missed the technical point by focusing on glitz…as usual.

It’s always nice to put things into a financial perspective.  Global total IT spending is about a trillion dollars a year.  I ran the cloud through my modeling process five years ago, and it churned out a prediction that the migration of current applications to the cloud could never exceed 23% of that total.  Today, total cloud spending is about a tenth of that 23%, and more than half isn’t business applications at all, but web companies serving the consumer market.

What this says is that unless you believe that enterprises have scrapped over 95% of their current IT and gone back to high stools and green eyeshades, the cloud has not displaced the data center.  Nor will it, ever.  Given that, the data center will always be a “private cloud” supplemented by a “public cloud”, which makes it both a hybrid cloud and a multi-cloud.

The application view of the same picture creates a similar result.  What are the top reasons for enterprise use of the cloud?  Resiliency and scalability.  If I want to use the cloud as a backup resource, to replace something that’s broken or to scale something that’s overloaded, where does the original application live?  In the data center, by long odds.  Thus, users expect the public cloud to look like an extension of the data center, which is a multi-cloud environment.

Even if you want to say that “multi-cloud” is multiple public cloud providers, that kind of vision is the explicit goal of almost three-quarters of all enterprises I’ve talked with.  Most feel that way because they don’t want to be “locked in” to a single provider, but the second-place answer is that they believe that they would find the “optimum” provider for different geographies or different applications to be…well…different.

These are all “why?” reasons to say that multi-cloud is the de facto approach.  There’s also a “why not?” reason, meaning that there is a set of technology requirements and trends that would tend to erase any distinction among multiple clouds, which raises the question of why you wouldn’t want to adopt that model.  We met one already—users want to be able to move applications and their components freely to wherever they need to be hosted.  There are more, and in particular one giant one.

The largest use of public cloud services for enterprises today is as a front-end for business applications.  The public cloud hosts web and mobile elements of applications, and it can spin up another to replace or supplement what’s there.  Public cloud providers know this and have offered a lot of support for the application, in the form of web services.  They are now offering a set of web services aimed at what’s being called “serverless computing”.  The right kind of component (a functional process or “lambda” or a microservice, depending on the cloud provider) can be run on demand anywhere, with no reserved resources at all.  Wouldn’t “anywhere” logically mean “in any cloud or in the data center?”  You can’t believe in serverless without believing in multi-cloud.

OK, hopefully this all demonstrates that anyone who looked at the cloud seriously and logically would have concluded from the first that multi-cloud was where things had to go.  What about my second reason?

If you dip into multi-cloud drivers and requirements, what you see is a vision of the cloud as a kind of seamless compute fabric.  You want to run something?  You make some policy decisions that include QoE, price, security, and so forth, and you deploy.  Every option you have doesn’t have its own unique deployment and lifecycle requirements because the differences would make operationalizing the picture impossible.  What do you do, then?  Answer: You rely on the principles of abstraction and virtualization.

In IaaS services, a “host” is a virtual machine.  The services from different public cloud providers or different cloud stacks for the private cloud differ from each other in their management, but they’re all supposed to run things the same way.  That property should be even more apparent in serverless computing.  In effect, what cloud users want is a kind of “virtual cloud” layer that’s above the providers, describing component hosting and connectivity in a uniform, universal, way.  This is what we should have realized we needed from the first, and might have realized had everyone recognized that multi-cloud was where we’d end up (which they should have).

We also need to be thinking about how “serverless” computing is represented at the functional level, as well as how various cloud provider web services are represented.  If you want something to be portable, you’d also like for it to be able to take advantage of service features where they’re available, or to limit hosting options to where you can get them.  That suggests a middleware-like tool that’s integrated with the virtualization layer to allow developers to build code that dynamically adapts to different cloud frameworks.  If we had all of that, then multi-cloud would be a giggle, as they say.
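
To illustrate what that “virtual cloud” layer might do, here’s a speculative Python sketch in which a uniform requirement description is matched against per-provider capabilities; the provider names and capability labels are invented.

```python
# One uniform description of what a component needs, matched against
# per-provider capabilities, so the same request can land on any cloud or
# in the data center. Labels are illustrative only.

PROVIDERS = {
    "public-cloud-a": {"serverless", "vm", "object-store"},
    "public-cloud-b": {"vm", "object-store"},
    "data-center":    {"vm", "bare-metal"},
}

def place(component, policy_order):
    """Pick the first provider (per policy) that offers what's required."""
    required = set(component["requires"])
    for provider in policy_order:
        if required <= PROVIDERS[provider]:
            return provider
    raise RuntimeError(f"no provider satisfies {required}")

if __name__ == "__main__":
    web_front_end = {"name": "mobile-api", "requires": ["serverless"]}
    batch_job     = {"name": "nightly-etl", "requires": ["vm"]}
    # Policy: prefer the data center, spill to public clouds as needed.
    order = ["data-center", "public-cloud-b", "public-cloud-a"]
    print(place(web_front_end, order))   # -> public-cloud-a (only serverless host)
    print(place(batch_job, order))       # -> data-center
```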

The frustrating thing about the one-two combination of insightless promotion of the cloud is how much it’s probably cost us.  We still don’t have a realistic picture of what a true multi-cloud architecture would look like.  We don’t have a software development framework that lets enterprises or software houses serving enterprises build the optimum software.  Who was the innovator that launched functional/lambda/microservice serverless computing?  Twitter.  Even today, more than two years after a Twitter blog described their model, most enterprises don’t know about it, what it could mean, and how they should plan to use it.

This has infected areas beyond the enterprise.  NFV kicked off in a real sense in 2013, so the Twitter blog came along a couple years after the start.  Have we fit the model, which supports what’s approaching ten billion sessions per day, into NFV to make it scalable?  Nope.  Probably most people involved, in both the vendor and operator communities, don’t even know about the concept.

Nor do they know that Amazon, Google, IBM, and Microsoft now all have serverless options based on that original Twitter concept.  The efforts by network operators and network vendors to push networking into the cloud era is falling further behind the players who have defined the cloud era.  This may be the last point in market evolution where network operators can avoid total, final, irreversible, disintermediation.  NFV will not help them now.  They have to look to Twitter’s and Google’s model instead.

Can Service Providers Really Win With an API Strategy?

Everyone loves to talk about APIs, so much so that you could be forgiven for thinking that they were the solution to all of the problems of the tech world.  I did a straw poll, and wasn’t too surprised to find that only about 40% of network professionals had a good grasp of what APIs were and could do, and that almost 20% couldn’t even decode the acronym properly.  All this, and yet there are many who say that APIs are the key to monetizing service provider networks, or the key to new services, or both.  Are we leaping to API conclusions here, based on widespread misunderstanding?  To answer that we have to dig into that “what-are-they-and-what-do-they-do” question.

API stands for “Application Program Interface”, and the term has been used for decades to describe the way that one program or program component passed a request to another.  The concept is even older; many programming languages of the 1970s supported “procedures” or “processes” or “functions” that represented semi-independent and regularly used functionality.  And even in the 1960s and the days of assembler-language programs, old-timers (including me) were taught to structure their logic as a “main routine” and “subroutines”.

All this shows that first and foremost, APIs are about connecting modular components of something.  Back in the older times, those components were locally assembled, meaning that an API was a call between components of the same program or “machine image” in today’s cloud terms.  What happened in the ‘70s is that we started to see components distributed across systems, which meant that the APIs had to represent a call to a remote process.  The first example of this was the “remote procedure call” or RPC, which just provided a middleware tool to let what looked like a local API reference connect instead to a remote component.  Web services and Service Oriented Architecture (SOA) evolved from this.

The Internet introduced a different kind of remote access with HTTP (Hypertext Transfer Protocol) and HTML/XML.  With this kind of access, a user process (a browser) accessed a resource (a web page) through a simple “get” and updated it (if it was a form that could be updated) with a “post”.  This kind of thing was called “Representational State Transfer” or REST.  Most procedure calls are “stateful” in that they are designed to transfer control and wait for a response; RESTful procedures are stateless and the same server can thus support many parallel conversations.
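
A toy example using Python’s standard library shows the statelessness point: each request carries everything the server needs, so no per-client conversation state is held and any replica behind a load balancer could answer (resource state, of course, still lives behind the API).  This is illustrative only, not a production server.

```python
# Minimal RESTful handler: GET retrieves a representation of a resource,
# POST updates it, and nothing is remembered about any client between calls.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/hello": {"message": "Hello, world"}}   # the "resources"

class RestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The request itself identifies the resource; no session state exists.
        found = self.path in PAGES
        body = json.dumps(PAGES.get(self.path, {"error": "not found"})).encode()
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        PAGES[self.path] = json.loads(self.rfile.read(length) or b"{}")
        self.send_response(201)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), RestHandler).serve_forever()
```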

So we have APIs that can represent remote processing functions or resources.  Why is this so hot?  The reason is that businesses that have processing functions or resources (cloud providers, web providers, and network operators) could sell access to these functions/resources.  For network operators in particular, there’s been a theory that selling access to network services/features through APIs instead of selling traditional connection services, could be a new business model.  Some pundits think that exposing all the network and management/operations features through APIs might be a significant revenue source.  Could it?

Well, it depends.  You can sell a service feature profitably if there are buyers and if the price the buyers are willing to pay generates a profit for you even after you factor in any loss of revenue created by having the features either create competition for or displace higher-level services you sell.  In other words, the question is whether others would leverage the stuff you sell through APIs to either compete with you broadly, or replace a composite offering you make with a cheaper set of piece-parts.  We can see this in two examples.

In our first case, Operator A exposes operations services via an API.  These provide for robust service ordering, billing, and customer care.  A startup operator might never be able to establish these services on their own, but could they add them to a bare network and create a credible competitive offering?  Yes, they could.  Thus, the cost of the services delivered through the API would have to factor in this risk, and that might end up pricing them out of the market.

In our second case, Operator A exposes a simple message service among sites via an API.  A customer who purchases connectivity services could take this message service and use it to carry transactions, which might allow them to replace the connectivity services.  Unless the message service was priced higher, the result would be a net loss to the operator.

The point here is that the most likely way for APIs to pay off is if they represent new capabilities rather than exposing old ones.  In the latter case, there will always be some risk that the exposure will in some way threaten the services that contributed the capabilities in the first place.

APIs that represent new services open a question, not of just what the APIs should look like but what the new services really should be.  An example is IoT.  Should an operator build a complete IoT service, or provide a set of low-level features for sale, enabling third parties to turn those features into a complete set of retail offerings?  In short, should the operator use APIs to create retail-model or wholesale-model services?

The “classic wisdom” (which, as regular readers of my blog will know, I contend isn’t wisdom at all) has been that operators should fall into a wholesale API model and expose their current service components.  In other words, get a fillet knife and cut off pieces of yourself for OTTs to eat.  The smart money says that operators have to get quickly to new features to expose, new service components, and then make a retail/wholesale decision based on the nature of the element.

IoT represents the best source of examples for that smart-money choice.  Operators could look at the entire IoT event-to-experience food chain, and formulate an architecture to host key processes.  They could then see how much work it would take to turn that into a retail service, what the revenue potential might be for that service, and whether there would be a risk that others might pick a better retail service choice to fund their own deployment of the basic processes.

IMHO, operators should look at something like IoT to frame a vision of an event-driven, context-enhanced, service future.  That would give them a retail outlet on one hand, an outlet that might have enough profit potential to significantly reduce the magnitude of investment in infrastructure that operators would make before they saw enough revenue to break even and then show a profit.  That’s “first cost” in carrier parlance.  They could then, with retail value established there, expand at the retail level where they had market opportunity to exploit, and at the wholesale level where others could do it better.

The value of this approach is clear; you have a specific service target and revenue opportunity with which to justify the deployment of servers and software.  The problem is nearly as clear; you need to convince the operators of a linkage, and that’s something I think vendors would ordinarily be expected to do.  They’ve not done it yet, particularly in NFV, and while vendors fiddled, the operators were focusing on an open-source solution.  Today, five times as many operators think NFV will emerge from open-source projects like ONAP as think it will come from vendors.  That’s bad because it would be very difficult to get the right architecture out of an open-source project.

It’s not that open-source isn’t a player.  Most of the technology that will shape the kind of service-centric software infrastructure I’ve described comes from open-source.  What doesn’t is the glue, the organized middleware tools and application notes, and that will require a unique marriage of software and network expertise.  I don’t doubt that most vendors have the right individual skills somewhere, and that it exists in the open-source community, but the combining of the skills is going to be a challenge, particularly in development activity that has to start from scratch.

So, are APIs over-hyped?  Surely; they are not a source of opportunity but rather a step in realizing a revenue model from new service features.  In that role, though, they are very important, and it’s worth taking the time to plan an API strategy carefully—once you have planned the underlying services even more carefully!  A gateway into a useless, profitless, service isn’t progress.

Who’s the Biggest Force for Network Technology Change and Why?

It’s always popular to talk about who’s going to lead the next big step in something.  In networking these days, you might look to Cisco’s Robbins, for example.  I have my own candidate, one many of you may never have heard of.  It’s Ajit Pai.

Pai is the new Chairman of the FCC, and like all Federal commissions, the FCC is changing leadership and tone with the change of the party controlling the Presidency.  Under the previous Democratic Chairman, Wheeler, the FCC took dramatic steps to impose net neutrality.  Pai is already dedicated to relaxing those rules.  On May 23rd, the FCC took the mandatory first step of issuing a Notice of Proposed Rulemaking (NPRM) that outlines what the FCC is looking to do, which is essentially to reverse the Wheeler FCC decision to declare the Internet a telecommunications service subject to full FCC regulation.  This could make a radical change in the business of the Internet, and a similarly radical change in infrastructure.  We’ll see what that might look like, and what’s driving my “could” qualifier, below.

There’s not much point in doing a deep analysis of the regulations at this point because there are still steps to be taken, and the final order probably won’t come along until well into 2018.  However, one of the key differences between the positions of the two FCC party factions is the issue of paid prioritization and settlement on the Internet.  So, let’s not try to handicap the outcome of the FCC’s current action.  Let’s also forego the question of whether this is “good” or “bad” for an open Internet.  Instead let’s look at what specific technology impacts we might see were the FCC to reverse the policy on prioritization and settlement.

Way back in the ‘80s, I was involved with the then-CTO of Savvis, Mike Gaddis, on an RFC to introduce settlement to ISPs.  This was obviously in an earlier and less polarized time, and many in the Internet community believed that for the Internet to prosper as a network you had to introduce some QoS, which can’t happen if each ISP bills its own connection customers and keeps all the revenue they gain.  Why prioritize when you’re not paid?  I still think this principle is a good one, and in any event, it opens a great avenue to discuss the technology implications of changes in neutrality policy.

The thing we call “the Internet” here, of course, is virtual.  In the real world, the Internet is a federation of operators, and it’s this fact that makes the whole QoS thing important, and difficult.  Remember that we started this discussion with a “suppose…”  Just because the US creates QoS and settlement within this community of operators doesn’t mean everyone does.  We still might have places where there is neither settlement nor QoS, and the more such places exist, the harder it would be to totally eclipse private networks globally.  But let’s carry on with our supposition to see what else could happen.

Suppose that you could ask for specific QoS from the Internet and get it?  There would be two impacts, one a leveling of business services into an Internet model, and the other the popularization of QoS by its extension into the consumer market.  Both could be significant, but the second could be seismic.  In fact, the changes that Chairman Pai may be contemplating would change the business structure of the Internet, perhaps taking such a long step toward establishing a rational business framework that it would reverse the pressure on capex.

With full QoS on the Internet, the notion of VPNs now separates from the network and focuses instead on edge devices and the SD-WAN model.  SD-WANs can manage the prioritized services and request priority when needed, balancing traffic between best-efforts services and various levels of priority.  SD-WANs can also add security to the picture, creating what is much closer to being a true “virtual private network” than just using a subset of Internet addresses would create.
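
A purely hypothetical sketch of that edge behavior: classify the flow, and only request (and presumably pay for) priority handling when the application class and QoE risk justify it.  The priority-request call below stands in for whatever API an operator might expose; none of the names are real.

```python
# Hypothetical SD-WAN policy: balance traffic between best-efforts service
# and paid priority classes, requesting priority only when it's warranted.

PRIORITY_CLASSES = {"voice": "low-latency", "video": "assured-rate"}

def request_priority(flow_id, service_class):
    # Stand-in for an operator-exposed QoS API; always "succeeds" here.
    print(f"requesting '{service_class}' handling for {flow_id}")
    return True

def route_flow(flow):
    service_class = PRIORITY_CLASSES.get(flow["app"])
    if service_class and flow["qoe_at_risk"]:
        if request_priority(flow["id"], service_class):
            return "priority-path"
    return "best-efforts-path"            # default: today's Internet handling

if __name__ == "__main__":
    print(route_flow({"id": "f1", "app": "video", "qoe_at_risk": True}))
    print(route_flow({"id": "f2", "app": "email", "qoe_at_risk": False}))
```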

Consumer QoS, in either the subscriber-initiated form (premium handling subscriptions or the “turbo button”) or provider-paid (Netflix or Amazon paying for premium delivery) would mean that QoS would have to be a much broader capability, touching many more users and impacting much more traffic.  It’s likely that this would drive operators to seek the most effective way of offering QoS, especially since consumer price tolerance would be lower.  Thus, prioritization and settlement could boost things like fiber, agile optics, and SDN virtual wires.

Prioritization doesn’t mean that you don’t still have the current model, but I’m sure many would argue that ISPs would all collude to make best-efforts services unavailable or so bad that they were useless.  Well, we have best-efforts now and that’s not the case.  Just being able to charge for special handling doesn’t eliminate all other handling options.

QoS and settlement would tend to favor larger operators with either a lot of reach or with market power to enter into agreements with other operators.  Regulations aimed at preventing that would end up looking much like common carrier regulations, and if the FCC is getting us settlement and QoS by declaring that Internet services are not common carrier services, those additional regulations to prevent large-operator dominance might be hard to impose.  However, the experiences I had myself, and those of others still involved in brokered peering, suggest that the ISPs overall would be happy to adopt an open brokered peering strategy, where everyone could do QoS peering at designated points.

All of this, of course, depends on there actually being a demand for Internet QoS.  If there were no consumer demand, then operators would obviously have no incentive to offer it even if regulators allowed for it, because operators would lose money on a business switch from MPLS VPNs or Ethernet VLANs to SD-WAN Internet VPNs.  Research suggests that in order for consumer QoS to pay, it’s essential that a “third-party payment” mechanism be validated.  If Netflix can charge customers for premium delivery, then settle with the ISPs for the QoS, there’s a very strong chance that this can all work.  The “turbo button” approach has much less appeal.

The Wheeler FCC took the position that no paid prioritization was acceptable.  The Genachowski FCC (before Wheeler) said that it was OK if consumers paid.  What Pai may end up with is that any form of prioritization is OK as long as it’s non-discriminatory, meaning everyone can pay for it if they want, and pay based on the same pricing structure.  That’s because the absence of Title II common carrier status for the Internet means that the FCC has no jurisdiction to regulate pricing or pricing policies there.  That’s what the Federal Appeals Courts told the FCC, which is why we ended up with Title II status to begin with.  Thus, we are probably heading for a prioritization and settlement decision that would lift all barriers, creating the largest business impact on operators and vendors, and potentially the largest technical impact as well.

In the near term, the prioritization-and-settlement policy would, as I noted above, reduce the pressure of price/cost crossover for operators.  That would likely open up capital budgets, raising revenues for network equipment vendors.  The increased spending would also be directed mostly at currently validated infrastructure and devices, meaning that it wouldn’t immediately result in a flood of SDN or NFV spending.

NFV would benefit from the SD-WAN process, but only in the limited premises-hosted vCPE model that we already see dominating.  Operators now realize that even if you had multiple features to deploy (firewall, SD-WAN, etc.) you would almost certainly elect to use a composite image for all the current features rather than service-chain multiple separately hosted features.  The latter approach would cost more in hosting, and generate more delay.  It would also raise operational complexity considerably; a two-host chain is twice as complex as a single-host image to deploy and sustain operationally.

If you believe the operators, though, the relaxation in profit pressure that prioritization and settlement would create would further both SDN and NFV innovation.  The operators recognize that anything that’s Internet-related and consumer-driven is going to be subject to price pressure, which means that it will have to be cost-managed carefully.  Operators tell me that both SDN and NFV innovation would be accelerated by the regulatory shift, but that these would not be the first or primary focus points.

What would be?  Number one is service lifecycle automation.  The nice thing about the prioritization and settlement shift is that it would allow operators to undertake a change in their service management practices without the pressure of creating an immediate return in terms of cost reduction.  Operators know, of course, that Internet prioritization would not be constant but on-demand and episodic, driven by content viewing.  That means it has to be invoked and removed quickly and cheaply.

The problem with this area is that operators really don’t have a solid strategy.  Most of their automation vision comes from pieces of SDN and NFV, and neither was designed as a full-range lifecycle automation project or based on advanced cloud principles.  Not all operators (in fact, fewer than half) accept the need to frame automation on advanced cloud principles, but nearly all know that they have to cover the whole of the service lifecycle and the full range of operations tasks.

The second focus area is carrier cloud service-layer deployment.  Operators are coming to realize that their best long-term strategy is to mimic the OTTs in framing higher-level (meaning non-connection) services, but they have struggled with how to get started, both in targeting terms and in infrastructure terms.  I think it’s likely that the second problem needs to be solved in a way that delivers an agile, generally capable, infrastructure model that they can then trial-target as they build up confidence.

The problem in this area is obvious; they don’t have that model of infrastructure, they don’t know how to get it, and no vendor seems to be offering it.  NFV and SDN make sense in a context of an increasingly cloud-centric infrastructure model, but neither can really drive operators there.  They can only exploit.

The third focus is SDN and NFV, which operators have not abandoned but rather simply re-prioritized.  Even that comment may be, on my part, reading motive into what they’ve expressed.  I think operators know that both SDN and NFV will play a big role in their future, but they’re coming to realize that, as I noted above, SDN and NFV will matter mainly as ways to exploit the cloud to do more with legacy services as infrastructure planning becomes more cloud-centric.  In short, though I don’t think any operator planner would say this, they see themselves migrating to a more Google-like, service-centric infrastructure model on which they’ll simply run some legacy stuff for continuity.

How many operators really see this?  I can’t say, of course, but I have fairly good contact with 57 of them at the moment, and three or perhaps four would see things as I’ve just described.  But that’s not really the question.  The question is how many would buy in were they to be offered a pathway to that sort of future.  I think all of them would.

There are some missing pieces in all this happy realization, not the least being that while operators may be willing to step into the future, there’s still no pathway to be had.  The problem isn’t a hardware problem but a software problem, and it’s not strictly operations software or even service lifecycle management or MANO or SDN controllers.  What’s really needed is what in the software world is called middleware.  The future has to be built on software that’s designed to be infinitely agile and scalable.  Google, Amazon, and Microsoft all know that now, and Google in particular has been framing their infrastructure to support the agile model we’re talking about.  So, can operators follow?

No.  Operators don’t have the kind of software people to do it, because they’ve not recognized they need them.  Even vendors don’t have a lot of the right stuff, but they do have enough to make something happen here.  It’s critical for vendors that they do that, because open source projects aren’t going to get us to the right place quickly enough.  Chairman Pai is going to give the industry a gift, a gift of time.  But it’s not going to last forever.

Does Microsoft’s CycleComputing Deal Have Another Dimension?

They say that Microsoft is acquiring CycleComputing for its high-performance computing (HPC) capabilities, combating Amazon and Google.  They’re only half-right.  It’s combating Amazon and Google, but not so much about HPC.  It’s mostly about coordinating workflows in an event-driven world.

Traditional computing is serial in nature; you run a program from start to finish.  Even if you componentize the program and run pieces of it in the cloud, and even if you make some of the pieces scalable, you’re still talking about a flow.  That is far less true in functional computing and even less in pure event-driven computing, and if you don’t have a natural sequence to follow for a program, how do you decide what to run next?

Functional computing uses “lambda” processes that return the same results for the same inputs; nothing is stored within that can alter the way a process works from iteration to iteration.  This is “stateless” processing.  What this means is that as soon as you have the input to a lambda, you could run it.  The normal sequencing of things isn’t as stringent; it’s a “data demands service” approach.  You could almost view a pure functional program as a set of data slots.  When a process is run or something comes in from the outside, the data elements fit into the slots, and any lambda functions that have what they need can then run.  These could fill other slots, and so the process continues till you’re done.
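
To make the “data slots” idea concrete, here’s a minimal sketch of my own (the slot registry, fill(), and function names are all invented for illustration, not any real framework): each stateless function declares the slots it needs, and it runs as soon as those slots are filled, regardless of arrival order.

# A minimal sketch of "data demands service": stateless functions declare the
# input slots they need and fire as soon as those slots are filled.
functions = []          # list of (needed_slots, fn, output_slot)
slots = {}              # slot name -> value

def lambda_fn(needs, produces):
    """Register a stateless function that runs when its input slots are filled."""
    def wrap(fn):
        functions.append((needs, fn, produces))
        return fn
    return wrap

def fill(name, value):
    """Fill a slot, then run any function whose inputs are now all available."""
    slots[name] = value
    for needs, fn, produces in functions:
        if produces not in slots and all(n in slots for n in needs):
            fill(produces, fn(*[slots[n] for n in needs]))

@lambda_fn(needs=["price", "quantity"], produces="total")
def compute_total(price, quantity):
    return price * quantity          # same inputs always yield the same output

@lambda_fn(needs=["total", "tax_rate"], produces="amount_due")
def compute_amount_due(total, tax_rate):
    return total * (1 + tax_rate)

# Inputs can arrive in any order; results appear when their slots are ready.
fill("tax_rate", 0.07)
fill("quantity", 3)
fill("price", 10.0)
print(slots["amount_due"])           # ~32.1

The point of the sketch is only that nothing inside the functions persists between runs, so any copy of them could process a unit of work.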

This may sound to a lot of people who have been around the block, software-wise, like “parallel computing”.  In scientific or mathematical applications, it’s often true that pieces of the problem can be separated, run independently, and then combined.  The Google MapReduce query processing from which the Hadoop model for big data emerged is an example of parallelizing query functions for data analysis.

Event-driven applications are hardly massive database queries, but they do have some interesting parallelism connections.  If you have an event generated, say by an IoT sensor, there’s a good chance that the event is significant to multiple processes.  A realistic event-driven system would trigger all the applications/components that were “registered” for the event, and when those completed they could be said to generate other events that would be similarly processed.

In a true event-driven system you don’t sequence events so much as contextualize them.  Events generate other events, fill data fields, and trigger processes.  The process triggers, like the processes in our functional example, are a matter of setting the conditions a process needs before it runs.  You don’t ask for five fields; you generate an event as each one arrives, and when all five are in, you do what you wanted to do with the data.
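
Here’s a small illustrative sketch of that pattern (entirely my own; the event bus and handler names are hypothetical): processes register for the events they care about, and a handler fires only once every field it needs has arrived.

# Illustrative sketch of event registration plus condition-based triggering.
from collections import defaultdict

subscribers = defaultdict(list)     # event name -> list of registered handlers

def register(event_name, handler):
    subscribers[event_name].append(handler)

def publish(event_name, payload):
    # The same event can be significant to multiple registered processes.
    for handler in subscribers[event_name]:
        handler(payload)

def when_complete(required_fields, action):
    """Collect fields across events and run 'action' once all are present."""
    collected = {}
    def handler(payload):
        collected.update(payload)
        if all(f in collected for f in required_fields):
            action(dict(collected))
    return handler

process = when_complete(
    ["sensor_id", "temperature", "humidity", "location", "timestamp"],
    lambda data: print("all fields in, processing:", data),
)
register("sensor_reading", process)
register("sensor_reading", lambda p: print("audit log:", p))

publish("sensor_reading", {"sensor_id": 42, "temperature": 21.5})
publish("sensor_reading", {"humidity": 0.4, "location": "hall"})
publish("sensor_reading", {"timestamp": "2017-08-21T10:00:00Z"})   # triggers the action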

This is very much like parallel computing.  You have this massive math formula, a simple example of which might be:

A = f(x(f(y))) / f(z)

This breaks down into three separate processes.  You need f(z), f(y), and f(x(f(y))).  You can start on your f(z) and f(y) when convenient, and when you get f(y) and the value of x you can run that last term and solve for A.  The coordination of what runs in parallel with what, and when, is very much like deciding what processes can be triggered in an event-driven system.  Or to put it another way, you can take some of the HPC elements and apply them to events.
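
In code, that decomposition is the familiar future/promise pattern: start the independent terms as soon as their inputs exist, and combine them when they resolve.  A sketch, with the functions and values made up purely for illustration:

# Parallelizing A = f(x(f(y))) / f(z): f(y) and f(z) have no mutual dependency,
# so they can run concurrently; only the outer term has to wait on f(y).
from concurrent.futures import ThreadPoolExecutor
import time

def f(v):
    time.sleep(0.1)        # stand-in for real work
    return v + 1

def x(v):
    return v * 2

with ThreadPoolExecutor(max_workers=3) as pool:
    fy = pool.submit(f, 3)                            # start f(y) immediately
    fz = pool.submit(f, 5)                            # start f(z) in parallel
    outer = pool.submit(lambda: f(x(fy.result())))    # can't start until f(y) resolves
    A = outer.result() / fz.result()

print(A)    # f(x(f(3))) / f(5) = f(8) / 6 = 9 / 6 = 1.5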

If you follow the link to their website above, then on to “Key Features” you find that besides the mandatory administrative features, the only other feature category is workflow.  That’s what’s hard about event processing.

I’m not saying that big data or HPC is something Microsoft could just kiss off, but let’s face it, Windows and Microsoft are not the names that come unbidden to the lips of HPC planners.  Might Microsoft want to change that?  Perhaps, but is it possible that such an attempt would be just a massive diversion of resources?  Would it make more sense to do the deal if there was something that could help Microsoft in the cloud market overall?  I think so.

Even if we neglect the potential of IoT to generate massive numbers of events, I think that it’s clear from all the event-related features being added to the services of the big public cloud providers (Amazon, Google, and Microsoft) that these people think that events are going to be huge in the cloud of the future.  I think, as I’ve said in other blogs, that events are going to be the cloud of the future, meaning that all the growth in revenue and applications will be from event-driven applications.  I also think that over the next decade we’ll be transforming most of our current applications into event-driven form, making events the hottest thing in IT overall.  Given that, would Microsoft buy somebody to get some special workflow skills applicable to all parallel applications?

In fact, any cloud application that is scalable at the component level could benefit from HPC-like workflow management.  If I’ve got five copies of Component A because I have a lot of work for it, and ten of Component B for the same reason, how do I get work from an Instance of A to an Instance of B?  How do I know when to spawn another instance of either?  If I have a workflow that passes through a dozen components, all of which are potentially scalable, is the best way to divide work to do load-balancing for each component, or should I do “path-flow” selection that picks once up front?  Do I really need to run the components in series anyway?  You get the picture.
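
As a rough sketch of that routing question (my own illustration; the component and instance names are invented), the difference between per-component load balancing and up-front “path-flow” selection might look like this:

# Two ways to route work through scalable components: re-balance at every hop,
# or pick the whole path once when the flow is admitted.  Purely illustrative.
import itertools, random

instances = {
    "A": ["A1", "A2", "A3", "A4", "A5"],
    "B": ["B%d" % i for i in range(1, 11)],
    "C": ["C1", "C2"],
}

# Per-hop: a round-robin balancer sits in front of each component.
balancers = {name: itertools.cycle(pool) for name, pool in instances.items()}

def route_per_hop(workflow):
    # Every unit of work re-balances at every component it passes through.
    return [next(balancers[component]) for component in workflow]

# Path-flow: the complete path is chosen once per flow and then reused.
flow_paths = {}

def route_path_flow(flow_id, workflow):
    if flow_id not in flow_paths:
        flow_paths[flow_id] = [random.choice(instances[c]) for c in workflow]
    return flow_paths[flow_id]

print(route_per_hop(["A", "B", "C"]))             # a fresh instance pick per hop
print(route_path_flow("flow-1", ["A", "B", "C"])) # one up-front pick for the flow
print(route_path_flow("flow-1", ["A", "B", "C"])) # same path reused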

We’ve had many examples of parallel computing in the past, and they’ve proven collectively that you can harness distributed resources to solve complex problems if you can “parallelize” the problems.  I think that the cloud providers have now found a way to parallelize at least the front-end part of most applications, and that many new applications like IoT are largely parallel already.  If that’s true, we may see a lot of M&A driven by this same trend.

Does AT&T’s Digital Life Prove There’s No Life in Digital?

The Street report that AT&T is considering the sale of its Digital Life division should have a lot of telco transformation people on edge.  This is the division that handles consumer offerings like home security, long seen as the basis for any shift of a network operator into non-connection services.  Is it not working?  Worse, AT&T is a poster-child for SDN/NFV transformation at the infrastructure level.  Is that transformation then not producing what’s needed to support a shift to services beyond connection?  If so, then this could be very bad news.  The question is just what the “news” really is.

AT&T has been spending a lot on M&A, most notably and recently the still-pending-approval deal with Time Warner, but earlier the DirecTV deal.  The media deals make AT&T the largest pay-TV provider.  In contrast, the Digital Life stuff is about six-tenths of a percent of AT&T revenues, and AT&T decided to sell off DirecTV’s home security business when it did the acquisition.  On the surface, it looks like the home security and even home services space doesn’t merit operator attention.  Verizon dropped its own offering several years ago, remember.

It’s always difficult to get an official reason why a given service idea seems to be heading for the trash.  In the case of home security, some of my operator friends have been willing to comment off the record.  They say that there are three reasons for the problems with home security as a service.  First, incumbent competition.  Second, low ARPU.  Finally, unfavorable cross-contamination of other services.  Let’s take them one at a time.

Most of the people who read this blog probably have a home security system.  Most upscale developments include them at least as an option, and many communities have 100% penetration.  The homes are wired when built, or by the first security firm that comes in.  The homeowner will then go back to that company for changes in the system, including the inevitable repairs for the sensor pieces.  Increasingly, the aftermarket for systems is supported by wireless models that involve self-installation by the homeowner.  Data suggests that this is a down-market segment, with less revenue potential overall.

The problem here is that unless you want to try to be a pure wireless-self-install player, you need to have installation services.  Operators generally contract these out, which means there is effectively no profit for them in the installation.  Since the operators’ names aren’t household words in security services, they have to advertise heavily to even get a play, and that means that, given the zero-profit installation, the initial sale probably won’t even pay back marketing costs for several years.

The revenue side is a big issue in other ways.  Most of the money in security systems comes from the monitoring process.  Operators obviously have call centers, so they in theory should be able to monitor the home sensors and act, but their costs for this have typically run well above the costs of independent security firms.  Some of my contacts told me that if they matched monitoring prices with incumbent firms, their profits on monitoring would be about half what they’d like, and well below the profits of those incumbents for the same services.

Perhaps the biggest issue is the downward price pressure coming into the market.  The operator contacts I’ve listened to on this tell me that their customers are not the high-end users in most cases, but perhaps a bit below mid-market.  This space is already under price pressure from increased competition, and if strike prices for services continue to fall, operators are in another market where profit declines seem baked in.  Your customer gets worse every day.

In more ways than one, perhaps.  People are way more likely to get rabid over a problem with a security system provider than even one with their Internet or TV.  There are inevitably callbacks with home security, often decades after the sale.  Many of them don’t result in incremental revenue, and if the operator has contracted for installation they’d likely have to contract for some of this stuff.  The rest would end up going to that call center where operators already have higher base costs.  In short, it’s going to be hard to provide quality support.

What happens if they don’t?  Sure, they could drop the security service, but how many customers do you suppose would stop there rather than threatening the whole relationship?  So, for a service you might make a minimal profit on, you could be risking the whole bundle.  Let me see…little ARPU upside, big customer-loss downside…why did I think this was a good idea?

Probably because you thought that “moving up the food chain” from connections to OTT services would be easier.  Perhaps it looked like a technical problem, or (if you read the tech news) a political one within your own organization.  Apparently, it’s not that easy.  The truth is that what makes Google or Facebook or Amazon winners isn’t just that they offered something over the top.  It was because they offered something unique in the market.  You don’t find those niches by going out to look for services others now sell that you could also sell.

The reason this stuff is relevant is that the concept of NFV is almost totally dependent on virtual CPE, which in turn can’t be a broad-based service if you can only sell it to businesses.  You could argue in favor of consumer vCPE provided you could offer some service kickers for it.  The services of security (firewall) and DNS/DHCP are already present in under-fifty-buck home gateways.  At best, operators would have to give them away, and that assumes they could even justify cloud-hosting features that can be purchased that cheaply.  What services would be credible to consumers beyond those gateway services?  Obviously, home monitoring and security would be on top of the list, which is why the hint that those services can’t be profitable enough is critical.

However, it’s not NFV that’s the problem here, only vCPE, and that’s a problem for the same reason home security as an OTT service is a problem.  It aims at stuff already being done, and all of that stuff is very likely to pose the very same challenges as home security does.  NFV is only threatened to the extent that it relies on “basic” vCPE, which unfortunately it probably does way too much.  If NFV wants to ride the vCPE train, they’d need something that is unique.

SD-WAN, in a form that links the edge elements (usually boxes today, but often cloud components, and easily translated into virtual network functions for NFV) to internal service features for added capability and differentiation, is an easy answer.  If operators linked SD-WAN with vCPE they could create an offering that had real sticking power.  They’d also reduce the risk that they’ll lose customers to Internet VPNs, a likely outcome of their current (non-) strategy.  Versa follows this general model in their relationships with CenturyLink, Comcast, and Verizon, but I think it could be tied better with infrastructure-level services.  And, in any event, SD-WAN is still a connection service with a very limited (business) appeal.  The Internet took us out of the age where business services dominated.  SD-WAN can ease operators out of traditional connection services, but they have to know what they’re easing into.

You could take a similar view of home security and monitoring.  Why would operators elect to jump in and go head-to-head with incumbent providers in a market that’s facing declining prices already?  Don’t offer customers the same thing they already have or can get elsewhere at a bargain price.  Offer something unique.  Tie in external sensors and analytics to predict security risks as they develop.  Correlate multiple sensor inputs to help define what’s likely happening.  Correlate alerts in nearby homes, and IoT sensor information.  Think about what advanced technology, applied by operators at massive scale, could do for home monitoring.  It beats scrabbling in the market dust for a few tenths of a point of profit margin.

Agility is what this means, pure and simple.  You have to be able to frame new services to meet market opportunities, not to try to catch up with the competition.  The whole value proposition for things like SDN and NFV and even service automation is tied to agile response to market opportunity, because even cost control is just a short-term way of getting a payback for an agility and automation investment.  That means that firms need to be looking at a reasonable platform for delivery of OTT services, one that can be reused and exploited.  SDN and NFV can be part of that platform, but they’re not the whole story.

What we’ve learned in the last two decades is that what users want from broadband isn’t connectivity, it’s information and experiences.  “Climbing the OSI stack” to add connection functionality isn’t a long-term answer.  In fact, these kinds of services are really best as means of translating current services to exploit carrier cloud.  If you don’t have carrier cloud to exploit, then you don’t have the best growth medium for things like SDN and NFV.

Google built its network to deliver services.  They’re totally open about its structure.  Maybe the network operators should take a look at it.

What Should We Expect from Controllers and Infrastructure Managers?

One of the key pieces of network functions virtualization (NFV) is the “virtual infrastructure manager” or VIM.  In the E2E ETSI model for NFV, the VIM takes instructions from the management and orchestration element (MANO) and translates them to infrastructure management and control processes.  One of the challenges for NFV implementation is just what shape these instructions take and just how much “orchestration” is actually done in MANO versus in the VIM.  To understand the challenges, we have to look at the broader issue of how services as abstractions are translated to infrastructure.

A service, in a lifecycle sense, is a cooperative behavior set impressed on infrastructure through some management interface or interfaces.  Thus, a service is itself an abstraction, but the tendency for decades has been to view services as a layer of abstractions, the higher being more general than the lower.  Almost everything we see today in service lifecycle management or service automation is based on an abstraction model.

The original concept probably came from the OSI management standards, which established a hierarchy of element, network, and service management.  It’s pretty clear that the structure was intended to abstract the notion of “service” and define it as being a set of behaviors that were first decomposed to network/administrative subsets, and finally down to devices.  This was the approach used by almost all router and Ethernet vendors from the ‘90s onward.

If we presume that there’s a service like “VPN”, it’s not hard to see how that service could be decomposed first by the administrative (management) domains needed to cover the scope of the service, and then down to the elements/devices involved.  Thus, we could even say that “decomposition” is an old concept, even if it was largely forgotten along the way to newer developments.
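
To make the hierarchy concrete, here’s a hypothetical sketch of that decomposition; the domain and device names are invented, and the point is only the service-to-domain-to-device structure:

# Hypothetical service -> administrative domain -> device decomposition,
# mirroring the old OSI-style service/network/element hierarchy.
service_model = {
    "VPN": {                                                     # service layer
        "us-east-domain": ["edge-router-1", "core-router-3"],    # network layer
        "eu-west-domain": ["edge-router-7", "core-router-9"],
    }
}

def decompose(service):
    """Walk the hierarchy: the service decomposes to domains, domains to devices."""
    for domain, devices in service_model[service].items():
        print("domain:", domain)
        for device in devices:                                   # element layer
            print("  coerce behavior from element:", device)

decompose("VPN")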

The Telemanagement Forum (TMF) largely followed this model, which became known as “MTNM” for Multi-Technology Network Management.  An implicit assumption in both the old service/network/element hierarchy and the MTNM concept was that the service was a native behavior of the underlying networks/devices.  You just had to coerce cooperation.  What changed the game considerably was the almost-parallel developments of SDN and NFV.

SDN networks don’t really have an intrinsic service behavior that can be amalgamated upward to create retail offerings.  A white-box switch without forwarding policy control sits there and eats packets.  NFV networks require that features be created by deploying and connecting software pieces.  Thus, the “service behaviors” needed can’t be coerced from devices, they have to be explicitly created/deployed.  This is the step that leads to the abstract concept of an “infrastructure manager”.

Which is what we should really call an NFV VIM.  All infrastructure isn’t virtual; obviously today most is legacy devices that could still be managed and service-coordinated the old way.  Even in the future it’s likely that a big piece of networks will have inherent behavior that’s managed by the old models.  So an “IM” is a VIM that doesn’t expect everything to be virtual, meaning that on activation it might either simply command something through a legacy management interface or deploy and connect a bunch of features.  In SDN, an IM is the OpenFlow controller, and in particular those infamous northbound interfaces (NBIs).
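
A hedged sketch of what that generalized “IM” contract might look like, with all class and method names my own invention rather than anything from the ETSI spec or a real controller API: on activation the IM either pushes a command through a legacy management interface, deploys and connects hosted features, or programs forwarding through a controller’s northbound interface.

# Illustrative Infrastructure Manager (IM) abstraction; names are hypothetical.
from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def activate(self, behavior, parameters):
        """Realize an abstract behavior on whatever this domain contains."""

class LegacyDomainIM(InfrastructureManager):
    def activate(self, behavior, parameters):
        # Inherent device behavior: coerce cooperation via the old management
        # interface (CLI, SNMP, NETCONF, whatever the domain uses).
        print("legacy domain: provision %s via %s"
              % (behavior, parameters.get("mgmt_if", "netconf")))

class VirtualDomainIM(InfrastructureManager):
    def activate(self, behavior, parameters):
        # Virtualized domain: the behavior must be created, not coerced --
        # deploy the feature software and connect it.
        print("virtual domain: deploy image %s and connect to %s"
              % (parameters["image"], parameters["network"]))

class SdnDomainIM(InfrastructureManager):
    def activate(self, behavior, parameters):
        # SDN domain: the IM is effectively the controller's northbound interface.
        print("sdn domain: install forwarding policy for %s" % behavior)

# The higher layer sees one abstraction; the realization differs per domain.
for im, params in [
    (LegacyDomainIM(), {"mgmt_if": "snmp"}),
    (VirtualDomainIM(), {"image": "vFirewall-1.2", "network": "tenant-net"}),
    (SdnDomainIM(), {}),
]:
    im.activate("vpn-access", params)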

It’s comforting, perhaps, to be able to place the pieces of modern network deployment and management into a model that can also be reconciled to the past.  However, we can’t get cocky here.  We still have issues.

I can abstract a single management interface, at a low level.  I can abstract a high-level interface.  The difference is that if I do abstraction at a low level, then I have to be able to compose the service myself, and issue low-level commands as needed to fulfill what I’ve composed.  If I can abstract at a high level, I have the classic “do-job” command—I can simply tell a complex system to do what I want.  In that case, though, I leave the complexity of composition, optimization, orchestration, or whatever you’d like to call it, to that system.

This is a natural approach to take in the relationship between modern services and OSS/BSS systems.  Generally, service billing and operations management at the CRM level depend on functional elements, meaning services and meaningful, billable, components.  Since billable elements are also a convenient administrative breakdown, this approach maps to the legacy model of network management fairly well.  However, as noted, this supposes that there’s a sophisticated service modeling and lifecycle management process that lives below the OSS/BSS.

That’s not necessarily a bad thing, because we’ve had a pretty hard separation between network management and operations and service management and operations for decades.  However, having two ships-in-the-night operations processes running in parallel can create major issues of coordination in a highly agile environment.  I’m not saying that the approach can’t work, because I think it can.  I am saying that you have to co-evolve OSS/BSS and NMS to make it work through a virtualization transition in infrastructure.

The thing that seems essential is the notion of a service plane separate from the resource plane.  This separation acknowledges the way operators have organized themselves (CIOs run the former, and COOs the latter), and it also acknowledges the fact that services are compositions built from resource behaviors.  The infrastructure has a set of domains, a geographic distribution, and a set of technical capabilities.  These are framed into resource-level offerings (which I’ve called “behaviors” to separate them from the “service” elements), and the behaviors are composed in an upward hierarchy to the services that are sold.

Infrastructure managers, then, should be exporters of those “behaviors”.  You should, in your approach to service modeling, be able to ask an IM for a behavior, and have it divide the request across multiple management domains.  You should also be able to call for a specific management domain in the request.  In short, we need to generalize the IM concept even more than we’re working to generalize it today, to allow for everything from “do-job” requests for global services to “do-this-specifically” requests for an abstract feature from a single domain.
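
Continuing the same purely illustrative sketch (the broker, behaviors, and domain names are all hypothetical), the request model might carry an optional domain selector: omit it and the IM layer divides a “do-job” request across its domains; name a domain and you get the “do-this-specifically” case.

# Sketch of a behavior broker sitting above per-domain IMs.  Hypothetical names.
class DomainIM:
    """Stand-in for the per-domain infrastructure managers sketched earlier."""
    def __init__(self, name):
        self.name = name
    def activate(self, behavior, parameters):
        print("%s: realize behavior %s with %s" % (self.name, behavior, parameters))

class BehaviorBroker:
    def __init__(self, domains):
        self.domains = domains              # domain name -> IM

    def request(self, behavior, parameters, domain=None):
        if domain is not None:
            # "Do this specifically": the request is pinned to one named domain.
            self.domains[domain].activate(behavior, parameters)
        else:
            # "Do job": divide the request across every domain that exports
            # the behavior (here, trivially, all of them).
            for im in self.domains.values():
                im.activate(behavior, parameters)

broker = BehaviorBroker({
    "legacy-metro": DomainIM("legacy-metro"),
    "cloud-core": DomainIM("cloud-core"),
    "sdn-edge": DomainIM("sdn-edge"),
})

broker.request("vpn-access", {"bandwidth": "100M"})                     # global "do-job"
broker.request("vpn-access", {"bandwidth": "100M"}, domain="sdn-edge")  # pinned request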

But we can’t dive below that.  The basic notion of intent modeling demands that we always keep a functional face on our service components.  Behaviors are functional.  Service components are functional.  In the resource domain, they are decomposed into implementations.

I do think that the modeling approach to both service and resource domains should be the same.  Everything should be event-driven because that is clearly where the cloud is going, and if service providers are going to build services based on compute-hosted features, they’re darn sure not going to invent their own architecture to do the hosting and succeed.  The cloud revolution is happening and operators first and foremost need to tap it.  Infrastructure management and controller concepts have to be part of that tapping.