What’s Missing in Operator SDN/NFV Visions?

The news that AT&T and Orange are cooperating to create an open SDN and NFV environment is only the latest in a series of operator activities aimed at moving next-gen networks forward.  These add up to a serious changing-of-the-guard in a lot of ways, and so they’re critically important to the network industry…if they can really do what they’re supposed to.  Let’s take a look at what the key issues are so we can measure the progress of these operator initiatives.

“Box networking” has created current infrastructure, and boxes are hardware elements that have long useful lives and development cycles.  To ensure they can build open networks from boxes, operators have relied on standards bodies to define features, protocols, and interfaces.  Think of box networks as Lego networks; if the pieces fit then they fit, and so you focus on the fitting and function.

Today’s more software-centric networks are just that, software networks.  With software you’re not driving a five-to-seven-year depreciation stake in the ground.  Your software connections are a lot more responsive to change, and so software networks are a bit more like a choir, where you want everything to sound good and there are a lot of ways of getting there.

The biggest challenge we’ve faced with SDN and NFV is that they have to be developed as software architectures, using software projects and methods, and not by box mechanisms.  In both SDN and NFV we have applied traditional box-network processes to the development, which I’ve often characterized as a “bottom-up” approach.  The result of this was visible to me way back in September of 2013 when operators at an SDN/NFV event were digging into my proposals for open infrastructure and easy onboarding of VNFs—two of the things operators are still trying to achieve.  When you try to do the details before the architecture, things don’t fit right.

The problem with SDN and NFV isn’t competing standards or proprietary implementations as much as standards that don’t really address the issues.  The question is whether the current operator initiatives will make them fit better; there are a number of political, technical, and financial issues that have to be overcome.

The first problem is that operators have traditionally done infrastructure planning in a certain way, a way that is driven by product and technology initiatives largely driven by vendors.  This might sound like operators are just punting their caveat emptor responsibilities, but the truth is that it’s not helpful in general for buyers to plan the consumption of stuff that’s not on the market.  Even top-down, for operators, has always had an element of bottom-ness to it.

You can see this in the most publicized operator architectures for SDN/NFV, where we see a model that still doesn’t really start with requirements as much as with middle-level concepts like layers of functionality.  We have to conform to current device capabilities for evolutionary reasons.  We have to conform to OSS/BSS capabilities for both political and technical reasons.  We have to blow kisses at standards activities that we’ve publicly supported for ages, even if they’re not doing everything we need.

The second problem is that we don’t really have a solid set of requirements to start with; we have more like a set of hopes and dreams.  There is a problem that we can define—revenue per bit is falling faster than cost per bit.  That’s not a requirement, nor is saying “We have to fix it!” one.  NFV, in particular, has been chasing a credible benefit driver from the very first.  Some operators tell me that’s better than SDN, which hasn’t bothered.  We know that we can either improve the revenue/cost gap by increasing revenue or reducing cost.  Those aren’t requirements either.

Getting requirements is complicated by technology, financial, and political factors.  We need to have specific things that next-gen technology will do in order to assign costs and benefits, but we can’t decide what technology should do without knowing what benefits are needed.  Operators know their current costs, for example, and vendors seem to know nothing about them.  Neither operators nor vendors seem to have a solid idea of the market opportunity size for new services.  In the operator organizations, the pieces of the solution spread out beyond the normal infrastructure planning areas, and vendors specialize enough that few have a total solution available.

Despite this, the operator architectures offer our best, and actually a decent, chance of getting things together in the right way.  The layered modeling of services is critical, and so is the notion of having orchestration happen in multiple places.  Abstracting resources so that existing and new implementations of service features are interchangeable is also critical.  There are only two areas where I think there’s still work to be done, and where I’m not sure operators are making the progress they’d like.  One is the area of onboarding of virtual network functions, and the other is in management of next-gen infrastructure and service elements.  There’s a relationship between the two that makes both more complicated.

All software runs on something, meaning that there’s a “platform” that normally consists of middleware and an operating system.  In order for a virtual function to run correctly it has to be run with the right platform combination.  If the software is expected to exercise any special facilities, including for example special interfaces to NFV or SDN software, then these facilities should be represented as middleware so that they can be exercised correctly.  A physical interface is used, in software, through a middleware element.  That’s especially critical for virtualization/cloud hosting where you can’t have applications grabbing real elements of the configuration.  Thus, we need to define “middleware” for VNFs to run with, and we have to make the VNFs use it.

The normal way to do this in software development would be to build a “class” representing the interface and import it.  That would mean that current network function applications would have to be rewritten to use the interface.  It appears (though the intent isn’t really made clear) that the NFV ISG proposes to address this rewriting need by adding an element to a VNF host image.  The presumption is that if the network function worked before, and if I can build a “stub” between that function and the NFV environment that presents every interface the function expected on its original platform, then my new VNF platform will serve.  This stub function has to handle whatever the native VNF hosting environment won’t handle.
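
To make that concrete, here’s a minimal illustrative sketch of the “class-and-stub” idea in Python.  Every name in it is invented for this example; nothing here corresponds to anything the NFV ISG has actually specified.

    # Purely illustrative.  "VnfPlatform" is the middleware a new VNF would
    # import and code against; "LegacyFunctionStub" wraps an unmodified network
    # function, presenting the environment it expects while mapping everything
    # onto the platform underneath.
    class VnfPlatform:
        """Middleware the VNF uses instead of grabbing real host resources."""
        def get_port(self, name):
            raise NotImplementedError
        def get_parameter(self, key):
            raise NotImplementedError
        def report_status(self, status):
            raise NotImplementedError

    class LegacyFunctionStub:
        """Presents a legacy function's original environment, backed by VnfPlatform."""
        def __init__(self, platform):
            self.platform = platform
        def open_interface(self, legacy_name):
            # the legacy code thinks it's opening a physical NIC; it gets a virtual port
            return self.platform.get_port(legacy_name)
        def read_config(self, key):
            # the legacy code expects its old parameter source; we map it to the platform
            return self.platform.get_parameter(key)
        def send_alarm(self, alarm):
            # legacy alarms become status reports the NFV management side can see
            self.platform.report_status(alarm)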

This is a complicated issue for several reasons.  The biggest issue is that different applications require different features from the operating system and middleware, some of which work differently as versions of the platform software evolve.  It’s possible that two different implementations of a given function (like “Firewall”) won’t work with the same OS/middleware versions.  This can be accommodated when the machine image is built, but with containers versus VMs you don’t have complete control over middleware.  Do we understand that some of our assumptions won’t work for containers?

Management is the other issue.  Do all “Firewall” implementations have the same port and trunk assignments, do they have the same management interfaces, and do you parameterize them the same way?  If the answer is “No!” (which it usually will be) then your stub function will either have to harmonize all these things to a common reference or you’ll have to change the management for every different “Firewall” or other VNF implementation.
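
Here’s a rough sketch of what harmonizing to a common reference could mean in practice.  The vendor field names below are invented for illustration; the point is simply that operations tooling sees one model no matter which VNF was onboarded.

    # Two hypothetical "Firewall" VNFs report the same facts under different
    # names; a harmonizing function maps each native form onto one common
    # management model.
    VENDOR_MAPPINGS = {
        "vendorA": {"rule_count": "rules", "dropped_packets": "drops", "cpu_percent": "cpu"},
        "vendorB": {"rule_count": "aclEntries", "dropped_packets": "denyCount", "cpu_percent": "cpuLoad"},
    }

    def harmonize(vendor, raw_stats):
        mapping = VENDOR_MAPPINGS[vendor]
        return {common: raw_stats.get(native) for common, native in mapping.items()}

    # harmonize("vendorA", {"rules": 42, "drops": 7, "cpu": 12})
    # -> {"rule_count": 42, "dropped_packets": 7, "cpu_percent": 12}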

I think that operators are expecting onboarding to be pretty simple.  You get a virtual function from a vendor and you can plug it in where functions of that type would fit, period.  All implementations of a given function type (like “Firewall”) are the same.  I don’t think that we’re anywhere near achieving that, and to get there we have to take the fundamental first step of defining exactly what we think we’re onboarding, what we’re onboarding to, and what level of interchangeability we expect to have among implementations of the same function.

The situation is similar for infrastructure, though not as difficult to solve.  Logically, services are made up of features that can be implemented in a variety of ways.  Operators tell me that openness to them means that different implementations of the same feature would be interchangeable, meaning VPNs are VPNs and so forth.  They also say that they would expect to be able to use any server or “hosting platform” to host VNFs and run NFV and SDN software.

This problem is fairly easy to solve if you presume that “features” are the output of infrastructure and the stuff you compose services from.  The challenge lies on the management side (again) because the greater the difference in the technology used to implement a feature, the less natural correspondence there will be among the management needs of the implementations.  That creates a barrier both to the reflection of “feature” status to users and to the establishment of a common management strategy for the resources used by the implementation.  It’s that kind of variability that makes open assembly of services from features challenging.

Infrastructure has to effectively export a set of infrastructure features (which, to avoid confusion in terms, I’ve called “behaviors”) that must include management elements as well as functional elements.  Whether the management elements are harmonized within the infrastructure with a standard for the type of feature involved, or whether that harmonization happens externally, there has to be harmony somewhere or a common set of operations automation practices won’t be able to work on the result.  We see this risk in the cloud DevOps market, where “Infrastructure-as-Code” abstractions and event exchanges are evolving to solve the problem.  The same could be done here.

Given all of this, will operator initiatives resolve the barriers to SDN/NFV deployment?  The barrier to that happy outcome remains the tenuous link between the specific features of an implementation and the benefits needed to drive deployment.  None of the operator announcements offer the detail we’d need to assess how they propose to reap the needed benefits, and so we’ll have to reserve judgment on the long-term impact until we’ve seen enough deployment to understand the benefit mechanisms more completely.

Looking Ahead to the Operators’ Big Fall Technology Planning Cycle

Every year in the middle of September operators launch a technology planning cycle.  By mid-November it’s done, and the conclusions reached in the interim have framed the spending and business plans for the operators in the coming year.  We’re about two months from the start of the cycle, and it’s at this point that vendors need to be thinking about how their key value propositions will fare, and what they can do by early October at the latest to get the most bang from their positioning bucks.

According to operators, the big question they have to consider this fall is how to stem the decline in profit per bit.  Network cost per bit has fallen for decades but revenue per bit has fallen faster, largely driven by the all-you-can-eat Internet pricing model and the lack of settlement among Internet and content providers.  SD-WAN, which is increasingly an Internet overlay that competes with provisioned VPNs, threatens to further undermine revenue per bit and introduce new service competitors who look like OTT players.

The challenge, say the operators I’ve talked with regularly, is that there’s no clear path to profit-per-bit success, either at the approach level or at the technology level.  Cost management is a credible approach, but can you actually achieve significant cost reductions?  Revenue gains would be even better, but how exactly do you introduce a new service that’s not just a cheaper version of a current one, and thus likely to exacerbate rather than solve the problem?

Capex isn’t the answer.  Operators say that new technology choices like SDN and NFV might reduce capex somewhat, but the early calculations by CFOs suggest that the magnitude of the reduction wouldn’t be any larger than the potential savings achieved by putting pricing pressure on vendors.  Huawei has clearly benefitted from this everywhere except the US, and some US operators are starting to lobby for a relaxation of the restrictions on Huawei’s sales.  In any case, a fork-lift change to current infrastructure is impossible and a gradual change doesn’t move the needle enough.

Opex efficiency is a better candidate, but CFOs tell me that nobody has shown them a path to achieving broad operations economies through SDN or NFV.  Yes, they’ve had presentations on how one particular application might generate operations improvements, but the scope of the improvement is too small to create any real bottom-line changes.  Not only that, the math in the “proofs” doesn’t even get current costs right, much less present a total picture of future costs.  However, this is the area where CFOs and CEOs agree that a change in the profit curve is most likely to be possible.  One priority for the fall is exploring just how operations efficiencies could be achieved quickly and at an acceptable level of risk.

One interesting point on the opex front is that operators are still not prioritizing an operations-driven efficiency transformation that would improve cost and agility without changing infrastructure.  Part of the reason is that while vendors (especially Cisco) are touting a message of conserving current technology commitments, none of the equipment vendors are touting an operations-driven attack on opex.  In fact, only a few OSS/BSS vendors are doing that, and their engagement with the CIO has for some reason limited their socialization of the approach.  Technology planning, for operators, has always meant infrastructure technology planning.  Will some big vendor catch on to that, and link operations transformation to the opex benefit chain?  We’ll see.

On the revenue side, most operators believe that trying to invent a new service and then promote it to buyers is beyond their capabilities.  Simple agility-driven changes to current services haven’t proved out in their initial reviews; they tend to be sellable only to the extent that they lower network spending.  The current thinking is that services that develop out of cloud trends, IoT, agile development and continuous application delivery, and other IT trends are more reasonable.  This fall they’d like to explore what specific technology shifts would be needed to exploit these kinds of trends, and which would present the best ROI.

There are two challenges operators face in considering new revenue.  The first is that the credible drivers for new services, the ones I noted in the last paragraph, are all still waiting to mature even as market targets, much less as places where real prospects are seeking real services.  The second is timing: operators think it could take several years for something to develop on the revenue side, which means that something else has to be done first to reduce profit pressure.

Another issue operators are struggling with this fall is 5G.  Nobody really expects it to be a target for major investment in 2017, and all the operators see a 5G transition as a kind of new-weapon-in-an-arms-race thing.  You can’t not do it because someone for sure will.  It would be nice if it were extravagantly profitable, but fully three-quarters of mobile operators think that they’d have to offer 5G services for about the same price as 4G.  Further, operators in the EU and the US think that regulatory changes are more likely to reduce profits on mobile services than to increase them.  Roaming fees are major targets, for example, and neutrality pressure seems likely to forever kill the notion of content providers paying for peering.  In fact, content viewing that doesn’t impact mobile minutes is becoming a competitive play.

One question operators who retain copper loop have is whether 5G could be used as an alternative last-mile technology, making every user who isn’t a candidate for FTTH or at least FTTC into a wireless user.  Another question is whether 5G might help generate opportunity in IoT or other areas.  Unfortunately, few of them think there will be much in this area to consider this fall.  It’s too early, and there are regulatory questions that vendors are never much help in dealing with.

The big question with 5G is whether it introduces an opportunity to change both the way mobile infrastructure works and the way that mobile backhaul and metro networks converge.  One operator commented that the mobile infrastructure components IP Multimedia Subsystem (IMS) and Evolved Packet Core (EPC) were “concepts of the last century”, referring to the 3G.IP group that launched the initiatives in 1999.  There’s little question in the mind of this operator, and others, that were we to confront the same mission today we’d do it differently, but nobody believes that can be done now.  Instead they wonder whether application of SDN and NFV to IMS and EPC could also transform metro, content delivery, and access networking.

This is the big question for the fall, because such a transformation would lay the groundwork for a redesign of the traditional structure of networks, a redesign focused on networks as we know them becoming virtual overlays on infrastructure whose dominant cost is fiber and RAN.  If operators made great progress here, they could revolutionize services and networks, but the operators I’ve talked with say that’s going to be difficult.  The problem in their view is that the major winners in a transformation of this kind, the optical vendors, have been slow to promote a transformative vision.  While some operators like AT&T have taken it on themselves to build software to help with virtualization, most believe that vendors are going to have to field products (including software and hardware) with which the new-age 5G-driven infrastructure could be built.

The transformation of metro is the most significant infrastructure issue in networking, and SDN or NFV are important insofar as they play a role in that, either driving it or changing the nature of the investments being considered.  Operators believe that too, and their greatest concern coming into this fall is how to reshape metro networking.  They know they’ll need more capacity and more agility, and they’re eager to hear how to get them.

I’ve been an observer and in some cases contributor to these annual planning parties for literally decades, and this is one of the most interesting I’ve ever seen.  I have no idea what’s going to come out of this because there’s less correspondence between goals and choices in this cycle than in any other.  One thing is for sure; if somebody gets things right here there could be a very big payday.

IBM, VMware, the Cloud, and IT Trends

IBM and VMware both reported their numbers and the results look like a win for the cloud.  The question remains whether a win for the cloud translates into a win for vendors and for IT overall, though.  Neither of the two companies really made a compelling case for a bright future, and when you combine this with the Softbank proposal to buy ARM, you have a cloudy picture of the future.

IBM beat expectations, but the expectations were far from lofty.  The company continues to contract quarter-over-quarter, for the very logical reason that if you’re an IT giant who’s bet on a technology whose benefit to the user is lower IT costs, you’re facing lower revenue.  VMware can buck this trend because they’re not a server incumbent and clouds and virtualization reduce server spending.  Broad-based players can’t.

Perhaps the big news from IBM was that it appears that, adjusting for M&A, software numbers are declining and business services are soft.  Let’s see here; you divest yourself of hardware and focus on technology that stresses a shift of enterprises from self-deployed to public hosting, and you’re also shrinking in software and services.  What else is there in IT?

IBM has some solid assets.  They have a good analytics portfolio, exceptional account control in large accounts, a number of emerging technology opportunities like IoT and Watson, and strong R&D with things like quantum computing.  Their challenge is to somehow exploit these assets better, and that’s where IBM still has problems.

The first problem is marketing.  Sales account control is not a growth strategy because there’s no significant growth in the number of huge enterprises that can justify on-site teams.  IBM needs to be down-market and getting there is a matter of positioning and marketing.  They used to be so good at marketing that almost everything IBM did or said became industry mainstream.  Not anymore.

The second problem is strategic cohesion.  OK, perhaps quantum computing will usher in a new age of computing, but “computing” means “computers”, right?  Hasn’t IBM been exiting the hardware business?  Software today means open-source, and IBM has a lot of that, but you don’t sell free software.  Watson and IoT are emerging opportunities, but they could take years to emerge and it’s obvious that IBM needs to do a better job of promoting them.

The third problem is the lack of a clear productivity-benefit dimension to IBM’s story.  The cloud isn’t by nature accretive to IT spending, it’s the opposite.  To get a net gain from the cloud you have to associate the cloud with a productivity benefit that your buyers don’t realize now, without it.  IBM’s story in this area is muddled when it could be strong.

How about VMware and its implications?  First, clearly, VMware, as a virtualization-and-cloud player that doesn’t have a stake in the product areas that the cloud consolidates, would see the cloud as a net gain.  It can focus on the cost driver and exploit it, which is a considerably easier task from a sales and marketing perspective.  How long does it take to say “This will be 30% cheaper” versus explaining the ins and outs of productivity enhancement through some new technology?

VMware is also reaping the benefits of the combination of cloud publicity and OpenStack issues.  There is no question in my mind, nor in the minds of most cloud types I’ve talked with, that OpenStack is where the cloud is going.  The problem is that like all open-source software, OpenStack is a bit disorderly compared to commercial products.  VMware has a clear market opportunity in supporting the evolution of limited-scope virtualization to virtual-machine-agile hosting anywhere and everywhere.  They can develop feature schedules and marketing plans to support their goals, where OpenStack as a community project is harder to drive in a given direction.  VMware can also be confident they’ll reap the benefit of their own education efforts with their stuff, which can’t be said for vendors who rely on OpenStack because it is open-sourced.  VMware’s Integrated OpenStack is in fact one of the two leaders in the OpenStack space, so VMware has little to fear from OpenStack success.

VMware also seems to be gaining some ground over open-source in the container space.  Docker gets all the container PR, but enterprises tell me that Docker adoption isn’t easy and that securing a supported Docker package is costly enough that VMware’s container solution isn’t outlandish.  Not only that, most enterprises interested in virtual hosting have already implemented VMware.

Then there’s NSX (formerly Nicira), the SDN overlay technology that VMware is rolling into a broad software-defined-data-center (SDDC) positioning and also exploiting for the cloud.  VMware announced a management platform specifically for NSX, and that makes NSX far easier for enterprises to adopt in the data center.  There’s plenty of running room for expansion of NSX functionality and plenty of room for market growth.

Finally, VMware’s alliance with IBM is helping VMware exploit IBM’s commitment to the cloud perhaps better than IBM can.  IBM’s SoftLayer cloud is powered by VMware and IBM’s support is a highly merchandisable reference.  How this will fare when/if Dell picks up VMware is another matter.

Perhaps that’s the big question for VMware, in fact.  Dell, and its rival HPE, seem to be taking the opposite tack to IBM, focusing on hardware and platform software rather than divesting hardware.  That means that there could be no long-term conflict with SoftLayer even if the Dell buy happens.  However, it’s not clear whether getting into, or getting rid of, hardware is the best approach, and it may well be that neither extreme is viable.  And VMware’s benefit in the cloud-and-virtualization space is transitory.  First, being the supplier for what enterprise buyers are evolving from is useful only until they’re done evolving.  Second, all OpenStack sources will eventually converge on the same baseline product with the same stability.  Can Dell strategize an evolution, and exploit things like NSX?

The cloud isn’t hurting IT; it’s simply an instrument of applying the hurt factor, which is the sense that the only thing good about future products and technologies is their ability to control costs.  Commodity servers and open-source software, hosted virtual machines and server consolidation—you get the picture.  This is where the Softbank deal for ARM comes in.  ARM is a leader in embedded systems, which today means largely mobile devices and smart home devices.  Any commoditization trend in hardware tends to favor chips, which are an irreplaceable element.  We already use more compute power outside of businesses than inside them, and that trend is likely to continue.  Softbank is betting on it, in fact.

Even the cloud will be impacted by the current cost-driven vision of change.  My model has said consistently that a cost-based substitution of hosted resources for data center resources would impact no more than 24% of IT spending.  A productivity-driven vision, if one could be promoted, could make public cloud spending half of total IT spending, and transform over half the total data centers to private clouds.  That’s the outcome that IBM should be promoting, of course.

The point here is that we’re in a tech-industry-wide transformation that’s driven by a lack of new and valuable things to sell.  All the players, whether they’re in IT or networking, are struggling to realign their businesses to the evolving conditions.  Some like VMware are currently favored, others like ARM are favored more in the long term, and some like IBM seem to be trying to find favor.  Eventually, everyone will have to change if current trends continue, and it’s hard to see what will reverse the cost focus any time soon.

A Compromise Model of IoT Might Be Our Best Shot

One of the problems with hype is that it distorts the very market it’s trying to promote, and that is surely the case with the Internet of Things.  The notion of a bunch of open sensors deployed on the Internet and somehow compliant with security/privacy requirements is silly.  But we’re seeing announcements now that reflect a shift toward a more realistic vision—from GE Digital’s Predix deals with Microsoft and HPE to Cisco’s Watson-IBM edge alliance.  The question is whether we’re at risk of throwing the baby out with the bathwater in abandoning the literal IoT model.

The Internet is an open resource set, where “resources” are accessed through simple stateless protocols.  There’s no question that this approach has enriched everyone’s lives, and few question that even with the security and privacy issues the net impact has been positive.  In a technical sense, IoT in its purest form advocates treating sensors and controllers as web resources, and it’s the risk that sensors and controllers would be even more vulnerable to security and privacy issues that has everyone worried.  You can avoid that by simply closing the network model, making IoT what is in effect a collection of VPNs.  Which, of course, is what current industrial control applications do already.

We need a middle ground.  Call it “composable resources” or “policy-driven resource access” or whatever you like, but what’s important here is to preserve as much of the notion of openness as we can, consistent with the need to generate ROI for those who expose sensors/controllers and the need to protect those who use them.  If we considered this in terms of the Internet resource model, we’d be asking for virtual sensors and controllers that could be protected and rationed under whatever terms the owners wanted to apply.  How?

A rational composable IoT model would have to accomplish four key things:

  1. The sensors and controllers have to be admitted to the community through a rigorous authentication procedure that guarantees everyone who wants to use them that they’d know who really put them up and what they really represent, including their SLA.
  2. The sensors and controllers have to be immunized against attack, including DDoS, so that applications that depend on them and human processes that depend on the applications can rely on their availability.
  3. The information available from sensors and the actions that can be taken through controllers have to be standardized so that applications don’t have to be customized for the devices they use. It’s more than standardizing protocols; it’s standardizing the input/output and capabilities so the devices are open and interchangeable (a sketch of what that might look like follows this list).
  4. Access to information has to be policy-managed so that fees (if any) can be collected and so that public policy security/privacy controls can be applied.
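
As an illustration of the third point, a standardized reading for one device class might look something like the sketch below.  The field names are invented for this example, not drawn from any actual IoT standard.

    # One hypothetical common format for any "traffic sensor", whoever built it.
    from dataclasses import dataclass

    @dataclass
    class TrafficSensorReading:
        sensor_id: str             # authenticated identity of the contributing sensor
        location: str              # logical location, e.g. "fifth-avenue-and-33rd"
        vehicles_per_minute: float
        timestamp: str             # ISO 8601, so applications never parse vendor formats
        sla_class: str             # the service level the sensor's owner committed to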

If you look at the various IoT models that have been described in open material, I think you can say that none of these models address all these points, but that most or all of them could be made to address them by adding a kind of “presentation layer” to the model.

The logical way to address this is to transliterate the notion of an “Internet of Things” to an “Internet of Thingservices”.  We could presume that sensors and controllers were represented by microservices, which are little nubbins of logic that respond to the same sort of HTTP requests that web servers do.  A microservice could look, to a user, like a sensor or controller, but since it’s a software element it’s really only representing one, or maybe many, or maybe an analytic result of examining a whole bunch of sensors or sensor trends.

This kind of indirection has an immediate benefit in that it can apply any kind of policy filtering you’d like on access to the “device’s microservice”.  The device itself can be safely hidden on a private network and you get at it via the microservice intermediary, which then applies all the complicated security and policy stuff.  The sensor isn’t made more expensive by having to add that functionality.  In fact, you can use any current sensor through a properly connected intermediary.
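
Here’s a minimal sketch of such an intermediary, built on nothing but Python’s standard library; the policy check and the sensor fetch are placeholders for whatever the sensor’s owner actually wants to enforce.

    from http.server import BaseHTTPRequestHandler, HTTPServer
    import json

    def read_real_sensor():
        # placeholder: fetch from the actual device, hidden on a private network
        return {"location": "fifth-avenue-and-33rd", "vehicles_per_minute": 14.2}

    def policy_allows(token):
        # placeholder: authentication, fee collection, and privacy policy go here
        return token == "demo-token"

    class SensorProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            if not policy_allows(self.headers.get("Authorization", "")):
                self.send_response(403)
                self.end_headers()
                return
            body = json.dumps(read_real_sensor()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    # HTTPServer(("", 8080), SensorProxy).serve_forever()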

The microservice can also represent a logical device, just as a URL represents a logical resource.  In content delivery applications a user clicks a URL that decodes to the proper cache based on the user’s location (and possibly other factors).  That means that somebody could look for “traffic-sensor-on-fifth-avenue-and-33rd” and be connected (subject to policy) to the correct sensor data.  That data could also be formatted in a standard way for traffic sensor data.
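
A minimal sketch of that kind of logical-name resolution, with invented names and a plain dictionary standing in for whatever directory service would really do the job:

    # A logical sensor name decodes to the microservice endpoint that currently
    # represents it, just as a CDN URL decodes to the nearest cache.  Only the
    # directory knows where the real devices live.
    SENSOR_DIRECTORY = {
        "traffic-sensor-on-fifth-avenue-and-33rd": "https://iot.example.net/proxy/nyc/5th-33rd",
        "traffic-sensor-on-broadway-and-42nd":     "https://iot.example.net/proxy/nyc/bway-42nd",
    }

    def resolve(logical_name):
        # policy could be applied here too: who may even learn the endpoint?
        return SENSOR_DIRECTORY.get(logical_name)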

You could also require that the microservices be linked to a little stub function that you make into a service on a user’s private network.  That means that any use of IoT data could be intermediated through a network-resident service, and that access to any data could be made conditional on a suitable service being included in the private network.  There would then be no public sensor at all; everyone would have to get a proxy.  You could attack your own service but not the sensor, or even the “real” sensor microservice.

I know that a lot of people will say that this sort of thing is too complicated, but the complications here are in the requirements and not in the approach.  What you don’t do in microservice proxies you have to do in the sensors themselves, if you put them directly online, or you have to presume it would be built in some non-standard way into the applications that expose sensor/control information or capabilities.  That’s how you lose the open model of the Internet.  That, or by presuming that people are going to deploy sensors and field privacy and security lawsuits out of the goodness of their hearts with no possibility of profits.

I’d love to see somebody like Microsoft (who has a commitment to deploy GE Digital’s Predix IoT on Azure) describe something along these lines and get the market thinking about it.  There are ways to achieve profitable, rational, policy-compliant IoT and we need to start talking about them, validating them, if we want IoT to reach its full potential.

Tapping the Potential for Agile, Virtual, Network and Cloud Topologies

You always hear about service agility as an NFV goal these days.  Part of the reason is what might cynically be called “a flight from proof”; the other benefits touted for NFV have proven to be difficult to validate or to size.  Cynicism notwithstanding, there are valid reasons to think that agility at the service level could be a positive driver, and there are certainly plenty who claim it.  I wonder, though, if we’re not ignoring a totally different kind of agility in our discussions—topology agility.

For most vendors and operators, service agility means reducing time to revenue.  In the majority of cases the concept has been applied specifically to the provisioning delay normally associated with business services, and in a few cases to the service planning-to-deployment cycle.  The common denominator for these two agility examples is that they don’t necessarily have a lot to do with NFV.  You can achieve them with composable services and agile CPE.

If we step back to a classic vision of NFV, we’d see a cloud of resources deployed (as one operator told me) “everywhere we have real estate.”  This model may be fascinating for NFV advocates, but proving out that large a commitment to NFV is problematic when we don’t really even have many service-specific business cases.  Not to mention proof that one NFV approach could address them all.  But the interesting thing about the classic vision is that it would be clearly validated if we presumed that NFV could generate an agile network topology, a new service model.  Such a model could have significant utility, translating to sales potential, even at the network level.  It could also be a way of uniting NFV and cloud computing goals, perhaps differentiating carrier cloud services from other cloud providers.

Networks connect things, and so you could visualize a network service as being first and foremost a “connection network” that lets any service point on the service exchange information with any other (subject to security or connectivity rules inherent in the service).  The most straightforward way of obtaining full connectivity is to mesh the service points, but this mechanism (which generates n*(n-1)/2 paths) would quickly become impractical if physical trunks were required.  In fact, any “trunk” or mesh technology that charged per path would discourage this approach.  The classic solution has been nodes.
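
A quick back-of-the-envelope illustration of how fast full-mesh connectivity gets out of hand:

    # Full-mesh paths grow quadratically with the number of service points,
    # while a single-node (hub-and-spoke) design needs only one path per site.
    def full_mesh_paths(n):
        return n * (n - 1) // 2

    for n in (5, 20, 100):
        print(n, "sites:", full_mesh_paths(n), "meshed paths vs", n, "paths to a node")

    # 5 sites: 10 meshed paths vs 5 paths to a node
    # 20 sites: 190 meshed paths vs 20 paths to a node
    # 100 sites: 4950 meshed paths vs 100 paths to a node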

A network node is an intermediate point of traffic concentration and distribution that accepts traffic from source paths and delivers it to destination paths.  For nodes to work the node has to understand where service points are relative to each other, and to the nodes, which means some topology-aware forwarding process.  In Ethernet it’s a bridging approach, with IP it’s routing, and with SDN it’s a central topology map maintained by an SDN controller.  Nodes let us build a network that’s not topologically a full mesh but still achieves full connectivity.

Physical-network nodes are where trunks join, meaning that the node locations are linked to the places where traffic paths concentrate.  Virtual network nodes that are based on traditional L2/L3 protocols are built by real devices and thus live in these same trunk-collection locations.  The use of tunneling protocols, which essentially create a L1/L2 path over an L2/L3 network, can let us separate the logical topology of a network from the physical topology.  We’d now have two levels of “virtualization”.  First, the service looks like a full mesh.  Second, the virtual network that creates the service looks like a set of tunnels and tunnel-nodes.  It’s hard to see why you’d have tunnel nodes where there was no corresponding physical node, but there are plenty of reasons why you could have a second-level virtual network with virtual nodes at only a few select places.  This is what opens the door for topology agility.
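
Here’s a tiny data-structure sketch of those two levels, with invented site and node names: the service sees a mesh, while the overlay that delivers it is a couple of virtual nodes and tunnels placed wherever it’s convenient to host them.

    # Level 1: what the service looks like to its users.
    service_view = {"type": "full-mesh",
                    "endpoints": ["siteA", "siteB", "siteC", "siteD"]}

    # Level 2: the virtual network that actually creates the service.
    overlay_network = {
        "virtual_nodes": ["vnode-east", "vnode-west"],    # hosted wherever NFV places them
        "tunnels": [("siteA", "vnode-east"), ("siteB", "vnode-east"),
                    ("siteC", "vnode-west"), ("siteD", "vnode-west"),
                    ("vnode-east", "vnode-west")],        # all riding over L2/L3 transport
    }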

Where should virtual nodes be placed?  It depends on a number of factors, including the actual service traffic pattern (who talks to who and how much?) and the pricing mechanism applied.  Putting a virtual node in a specific place lets you concentrate traffic at that point and distribute from that point.  Users close to a virtual node have a shorter network distance to travel before they can be connected with a partner on that same node.  Virtual nodes can be used to aggregate traffic between regions to take advantage of transport pricing economies of scale.  In short, they can be nice.

They can also be in the wrong place at any given moment.  Traffic patterns change over each day and through a week, month, or quarter.  Some networks might offer different prices for evening versus day use, which means price-optimizing virtual-node topologies might have to change by time of day.  Some traffic might even want a different structure than another—“TREE” or multicast services typically “prune” themselves for efficient distribution with minimal generation of multiple copies of packets or delivery to network areas where there are no users receiving the multicast.

NFV would let you combine tunnels and virtual nodes to create any arbitrary topology and to change topology at will.  It would enable companies to reconfigure VPNs to accommodate changes in application topology, like cloudbursting or failover.  It could facilitate the dynamic accommodation of cloud/application VPNs that have to be linked to corporate VPNs, particularly when the nature of the linkage required changed over time to reflect quarterly closings or just shifting time zones for users in their “peak period.”

This has applications for corporate VPNs but also for provider applications like content delivery.  Agile topology is also the best possible argument for virtualizing mobile infrastructure, though many of the current solutions don’t exploit it fully.  If you could place signaling elements and perhaps even gateways (PGW, SGW, and their relationships) where current traffic demanded, you could respond to unusual conditions like sporting or political events and even traffic jams.

These applications would work either with SDN-explicit forwarding tunnels or with overlay tunnels of the kind used in SD-WAN.  Many of the vendors’ SDN architectures that are based on overlay technology could also deliver this sort of capability; what’s needed is either a capability to deliver a tunnel as a virtual wire to a generic virtual switch or router, or a virtual router or switch capability included in the overlay SDN architecture.

Agile topology services do present some additional requirements that standards bodies and vendors would have to consider.  The most significant is the need to decide where you want to exercise your agility, and what triggers changes.  Networks are designed to adapt to conditions, but roaming nodes and trunks aren’t the sort of thing early designers considered.  To exploit agility, you’d need to harness analytics to decide when something needed to be done, and then to reconfigure things to meet the new conditions.

Another requirement is the ability to control the way that topology changes are reflected in the network dynamically, to avoid losing packets during a change.  Today’s L2/L3 protocols will sometimes lose packets during reconfiguration, and agile topologies should at the minimum do no worse.  Coordinating the establishment of new paths before decommissioning old ones isn’t rocket science, but it is something that’s not typically part of network engineering.
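
Here’s a sketch of that make-before-break sequencing, assuming a hypothetical controller object with create/verify/redirect/remove operations; the only thing being illustrated is the ordering.

    def reconfigure(controller, old_paths, new_paths):
        # 1. Build the new topology alongside the old one.
        for path in new_paths:
            controller.create_path(path)
        # 2. Only switch traffic once the new paths are confirmed working.
        if all(controller.verify(path) for path in new_paths):
            controller.redirect_traffic(old_paths, new_paths)
            # 3. Decommission the old paths last, so nothing is lost mid-change.
            for path in old_paths:
                controller.remove_path(path)
        else:
            # Roll back: the old topology is still carrying the traffic.
            for path in new_paths:
                controller.remove_path(path)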

Perhaps the biggest question raised by agile-topology services is whether the same thing will be needed in the cloud overall.  If the purpose of agile topology is to adapt configuration to demand changes, it’s certainly something that’s valuable for applications as well as for nodes—perhaps more so.  New applications like a logical model of IoT could drive a convergence of “cloud” with SDN and NFV, but even without it it’s possible operators would see competitive advantages in adding agile-topology features to cloud services.

The reason agile topology could be a competitive advantage for operators is that public cloud providers are not typically looking at highly distributed resource pools, but at a few regional centers.  The value of agile topology is greatest where you have a lot of places to put stuff, obviously.  If operators were able to truly distribute their hosting resources, whether to support agile-node placement or just NFV, they might be able to offer a level of agility that other cloud providers could never hope to match.

The challenge for topology agility is those “highly distributed resource pools”.  Only mobile infrastructure can currently hope to create enough distributed resources to make agile topology truly useful, and so that’s the place to watch.  As I said, today’s virtual IMS/EPC applications are only touching the edges of the potential for virtualization of mobile topology, and it’s hard to know how long it will take for vendors to do better.

Can Cisco Succeed with an SDN-and-NFV-less Transformation Model?

Cisco has always been known for aggressive sales strategies and cynical positioning ploys.  Remember the days of the “five phase plan” that was always in Phase Two when it was announced (and that never got to Phase Five)?  When SDN and NFV came along, Cisco seemed to be the champion of VINO, meaning “virtualization in name only”.  Certainly Cisco Live shows that the company is taking software, virtualization, the cloud, and APIs more seriously, but it doesn’t answer the question of whether Cisco’s emerging vision takes things far enough.

Cisco’s approach to networking, embodied in things like Application-Centric Infrastructure (ACI) and Digital Network Architecture (DNA), has been to use a combination of policy management and software control through enhanced APIs to create as many of the benefits of SDN and NFV as possible without actually mandating a transition from current switch/router technology.

ACI and DNA may seem like meaningless “five phase plan” successors, another round of cynical positioning, but they’re not.  They are specific defenses of the status quo from a company who benefits more from that status quo than others.  They’re also an exploitation of what Cisco knows from experience is likely to be a highly visible but largely ill-planned and ineffective initiative to change things.

Obviously, the risk of VINO is that any true benefits of infrastructure transformation are lost.  However, that risk is relevant only if those kinds of benefits are demonstrably realizable.  Very early in the SDN/NFV game, operators and enterprises found that capex reduction wouldn’t deliver any real infrastructure-transformation benefits.  Beat the vendors up on price instead!  But not much later, everyone realized that operations and agility benefits could be truly compelling.  I think Cisco, at this point, started to shift their general ACI positioning from “software control” to “software automation”, emphasizing the importance of software in reducing opex and bringing services to market faster.  DNA shows more of that view, for example, as the later architecture.

The truly interesting flex point came along within the last year, when it became increasingly clear that you not only could gain significant opex economy and service agility through operations automation, you probably should.  My own modeling shows that you can create a bigger impact on network operator profits with operations automation than with infrastructure transformation, do it sooner, and present a better ROI along the way.  Maybe I’m reading things into the Cisco event speeches, but I think Cisco may now be accepting this shift, and looking to capitalize on it.

Operations automation implemented in an open way should be able to control the service lifecycle independent of the details of infrastructure.  Yes, that could aid the transition to SDN or NFV or both, but it could also be used simply to improve operations on legacy infrastructure.  That would play to what Cisco wanted all along—some way of improving operator profit per bit that didn’t involve shifting to a new set of network technologies that Cisco wasn’t the incumbent for.

You could also argue that it could play into the Tail-f acquisition, which gives Cisco a way of managing multi-vendor networks.  Earlier this month, Cisco won a deal with SDN/NFV thought-leader Telefonica to use Cisco’s Network Services Orchestrator (NSO) for business services.  This product is derived from Tail-f and the YANG modeling language.  In a real sense, NSO is a kind of superset of the NFV Virtual Infrastructure Manager and NFV Orchestrator rolled into one.  What it does, and what I’m sure Cisco intended it to do, is let operators orchestrate/automate legacy configurations just as NFV MANO and other tools would do for NFV.

Which is the challenge Cisco now faces.  In fact, the move generates several challenges to Cisco’s positioning and ultimate success.

The most obvious challenge is that a “Network Service Orchestrator” will have to orchestrate SDN and NFV as well as legacy technology.  Cisco will have to let the new-infrastructure camel at least blow some sand under the tent, if not actually get the nose in.  If compelling SDN/NFV business cases could be made (which so far has not happened, but could happen) then Cisco might end up facilitating the very transition it’s been trying to position against.

This challenge leads into the second challenge, which is a fast start to achieve thought and deployment leadership.  Cisco has a credible NFV Orchestration product in NSO, as a recent report on NFV MANO from Current Analysis shows.  The problem is that NFV orchestration isn’t a business case, it’s just a way of making NFV work.  If Cisco’s goal is to fend off NFV transition it first has to make it clear that NSO opens an alternative path to benefits, then convince operators to take that path and prove out its validity.

Meeting these challenges, in my view, means making a direct connection between Cisco’s architectures (ACI, DNA) and products (NSO) and service, network, and operations automation.  I think some of that came out in the announcements and talks at the Cisco event, but this isn’t something you can win by blowing kisses.  NSO is delivering value below the OSS/BSS, and it’s a single-level model at a time when operators are recognizing the need for multi-layered orchestration.  Other vendors have a broader, better, story in that area than Cisco.

Better-ness equates to the ability to make a compelling near-term business case for software automation of operations.  NSO and YANG evolve from an initiative to fix SNMP management by creating CLI-like flexibility in a standardized way.  NETCONF is the protocol that operates on YANG models, and it’s ideal for network device management, particularly in multi-vendor environments.  As an operations modeling language it is, in my view, sub-optimal.  I know of no operator who likes the idea of doing cloud or OSS/BSS orchestration using YANG.  TOSCA, the OASIS cloud standard, is the emerging choice, in fact.  Cisco has to either prove that YANG is a good choice for generalized multi-layer service orchestration or explain where those other layers and those other kinds of orchestration come from.

Ciena, Cloudify, and others have provided some good insight into how TOSCA and YANG, for example, could relate.  Some operator architectures for NFV also suggest a symbiotic application of these technologies.  For Cisco to get its approach going, it needs to lay out this kind of approach and make it an official Cisco policy.  But it also has to tie this to benefits.  Operators I’ve talked with have been continually frustrated by the lack of insight vendors have into operations efficiency and agility benefits.  Vendors don’t know what the current costs are, how any specific approach would target them, and how the targets could be validated.  Most of the “validations” or “use cases” so far have been inside NFV lab activities, where the specific goal isn’t operations automation at all and where most of the interesting stuff has been declared out of scope.

Cisco is making several critical bets here.  First, they’re betting that SDN and NFV will not get their acts together (a fairly safe bet as I’d see it).  Second, they’re betting that they can deliver meaningful operations automation at the network level, meaningful enough to drive adoption of the Cisco NSO model without much operations integration elsewhere.  Third, they’re betting that nobody delivers operations automation from the top down and cuts off Cisco’s NSO layer.  Neither of these last two bets is a particularly good one, and so Cisco is going to have to figure out a way of taking a lead in operations automation and service agility that sticks its nose outside the network and beyond the influence of Cisco’s typical buyers.  A shift in attitude, which we may be seeing, is important to reaching that goal, but it won’t be enough.  Cisco has to step up and make the business case like everyone else.

Google Fi Could Be Big, or It Could Be Another “Wave”

One of the questions that seems to get asked annually is “When is Google going to build its own network?”  After all, Google has deployed fiber in some areas, and from time to time it’s said it was going to bid on mobile spectrum.  Is it just a matter of time for Google to take over the network as well as the Internet?  Not if you believe in financial reality.

We should be suspicious about a Google-eats-the-world vision based on financials alone.  Google is currently priced at about 6.4 times sales, where telcos are priced at about 1.7 times sales.  The Price/Earnings ratio for Google is double that of telcos, and so is its return on assets.  So let’s see, I’m going to make my stock price go up and reward my investors (and myself) by getting into a business where opportunities are lower?  Why not just call Carl Icahn and invite him to tea and takeover?

Then there’s the fact that while Google has threatened to buy spectrum and has dabbled in fiber, it’s never really acquired licenses, and it’s cherry-picking fiber cities.  All the indications are, and have always been, that Google is trying to force network operators to sustain broadband improvement cycles in the face of their declining profit per bit.  The media takes this seriously, but I doubt that the telcos worry that Google is trying to get into their business.  They’re more worried about how to get into Google’s.  We hear a lot about network operators trying to learn the “OTT mindset” but not much is written about how OTTs are trying to learn to be like telcos.

But if I intended to blog about how Google wasn’t going to become a telco, I’d be done already.  The strongest indicator of Google’s plans is its Project Fi initiative, and that initiative also shows just where telco vulnerability lies and how vendors will have to think about their customers’ “transformations” in the future.

Project Fi is a lot of things.  At its most fundamental level, it’s a “network anonymizer”.  Google combines WiFi access and selective 4G and future 5G partnerships to create its own virtual network that the customer sees as being their broadband service.  The website is evocative; it shows a smartphone (Android, of course) with the network identifier as “Fi Network”.

Fi also makes Google a Mobile Virtual Network Operator (MVNO) in its relationship with cellular wireless providers.  MVNOs piggyback their service on a “real” mobile broadband network rather than deploy their own.  Apple has long been rumored to covet an MVNO role, but Tim Cook seems to have ruled that course out earlier this year.  If that’s true and not just classic misdirection, then we’d have to address why Google would be seeing opportunity there when Apple isn’t.

Apple wants to sell iPhones and other “iStuff”.  The mobile operators who supply the phones to customers are Apple’s most valuable conduit, and if Apple decided to be an MVNO they’d make perhaps one operator in a given market happy (the partner in their underlay mobile network) and alienate all the others.  Google may be a competitor to Apple with Android, but Android isn’t a phone, it’s a phone OS, and the Android phone market is already fragmented.  If Google wanted to compete with the iPhone directly, why not just build phones rather than licensing Android to third parties?

What Google wants is, I think, clear.  Right now they’re an Internet OTT giant.  It’s hard to visualize what’s above “the top” in both a semantic sense and a realistic sense.  They don’t want to be on the bottom, where all the current bottom-feeders look covetously upward at Google.  What they want is to be “just-under-the-top”.  Project Fi is about JUTT.

A JUTT approach is consistent with Google’s Android approach.  I don’t want to own the dirt, Google says, I just want to make sure the dirt doesn’t own me.  Apple can sell phones as long as they don’t threaten the OTT space.  Operators can sell broadband if they stay in their profit cesspool and play nice.  Android was designed to poison Apple initiatives, and Fi is designed to poison operator aspirations in Google’s top layer.

But JUTT is also an offensive play.  A simple MVNO approach exploits a brand.  The notion is that you have a good brand, good enough to be an automatic win in a certain number of mobile deals, so your marketing costs are minimal.  Being an MVNO gives you a nice little chunk of the mobile bill for doing not much (as long as it doesn’t turn your operator partners against you).  But what I think Google wants with Fi is to establish itself as a mobile brand.  Android is a brand unto itself, Samsung is an Android brand.  Fi is a Google brand.  Every time a Fi user looks at that smartphone, they’ll see the brand reinforced.

And they’ll see it exploited.  Apple has proved (with things like FaceTime) that you can make a phone into a community.  Fi could let you make broadband service into a community.  Social networks and video chat are already creating what’s essentially a mobile-service-free vision of communication.  Users always see their service as what they directly use, and smart devices with OTT services cover up the broadband underneath.  But if the broadband JUTT has the services within itself, then those services could pull through JUTT just like wanting to call grandma pulled through POTS voice.

Which brings up the final point: your “service provider” perception is set by your service.  If Google can use Fi to establish a Google-slanted vision of social-collaborative services, of “real” IoT, then they can make that vision the default standard for the new area.  That could make Google a leader in even the evolved form of “communications services”, the things the operators believe have to be tied specifically to the network.

This isn’t going to be an easy road, though.  Google has not been very successful in direct marketing; their DNA is in ad-sponsored services and they’re not going to make Fi work without charging for the network services consumed—because Google will have to pay to expand them and keep them strong.  I think Project Fi is just that right now, a “project” and not a product.  Google will have to work out the promotion and make a go of this, or it will fall by the wayside as so many other good Google concepts have.  Remember Google Wave?  That could be Fi’s future too.

Will IT Giants Slim Down to Nothing or Rebuild Around a New Driving Architecture?

For decades, there’s been a view that a one-stop IT shop or full-service vendor was the best approach.  Now it seems like nobody wants to be that any more.  IBM, once the vendor with the broadest strategic influence in the industry, has seen its product line and customer base shrink.  Dell and HPE seem to be getting out of the service and software business in favor of servers and platform software.  What kind of IT market are we facing now, anyway?  There are a number of basic truths that I think answer the “why” and “what’s coming” questions.

Truth number one is there are only 500 companies in the Fortune 500.  OK, I guess you think that’s obvious, but what I’m getting at is that very large enterprises aren’t getting more numerous.  The success of technology hasn’t come by making the big companies spend more, but by getting smaller companies to spend something.  That’s a whole different kind of market from the one that created Big Blue (IBM, for those who don’t recognize the old reference!).

With big customers, you can count on internal technical planning and support.  Your goal as a vendor is to secure account control, meaning that you have a strong ability to influence buyers’ technology planning processes.  You have an account team dedicated to these giant buyers and your team spends a lot of time buying drinks and kissing babies.  They can afford to because they’ll make their quota and your numbers from their single customer.

Down-market, the situation is the opposite.  The buyers don’t even do technology planning, and they can’t provide internal technical support for what they buy.  If you give them a hand, you end up propping them up—often spending days on a deal that might earn you a tenth of your sales quota if the deal is done.  You have a dozen or more of these accounts and you have to spend time with them all, doing little tiny deals that don’t justify much support.  You can’t sell down-market; you have to market to these buyers, and that was something new to IBM and the other IT giants.

Which brings up truth number two:  you can sell solutions, but not market them.  If the small buyer doesn’t do technology planning, then they don’t know what they want unless they can somehow connect goals and needs to products.  If you try to help them make that connection through “solution selling”, you end up spending days or weeks on the account for little gain in the end, because small companies buy IT on a small scale.

Software, particularly business software, is a solution sell.  Imagine trying to take out an ad in a small-business rag to promote analytics.  How many calls do you think it would generate, and how long would it take to turn a “suspect” into a customer?  Too long.  Thus, the down-market trend in tech tends to make software a more difficult sell than hardware.  Another problem is that new applications that demand new hardware pose the classic sales problem of helping the other guy win.  A big project budget is usually about 70% equipment and 30% software and services, and the software company has to do the heavy lifting.  That’s a tough pill to swallow.

Trying to sell both doesn’t help either.  If you try to push a solution that includes both software and hardware, or even if you just have software in your portfolio, you end up doing that endless hand-holding and the software decision holds up hardware realization.  Your management thinks “Hey, why not wait till these guys decide they want analytics and then sell them what it runs on?”  Why not, indeed.

The solution seems to be to either sell hardware or sell software.  In the first case you rely on the market for orderly compute capacity growth or on new applications that software players spawn.  You assume competition rather than account control.  In the software case, you focus on selling something that’s differentiable and you make hardware-only vendors into partners.

The third truth is that there is only a finite number of IT professionals and most of them are looking for vendor jobs, not end-user jobs.  Nike and Reebok make sneakers, not servers or software.  An IT professional has a better career path in a technical company, so it’s hard to keep great performers in end-user organizations.  The very people with the qualities a company needs to plan for the best IT aren’t likely to stay with the company, if they join it in the first place.  That makes it much harder to promote complex solutions or propose significant changes.

I’ve told the story of the bank executive who interrupted a discussion I was having with the comment “Tom, you have to understand my position.  I don’t want decision support.  I want to be told what to do!”  Every expansion in consumer IT, every new startup, every vendor hiring spree, reduces the labor pool for the very people on whom business IT depends.  Those left behind want to be told what to do.

This problem could have been mitigated.  There was a time when vendors spent a lot on technical material to help buyers learn and use their stuff.  There is no similar level/quality of material available today, in large part because vendors who produce it would simply be educating their competition—most product areas are open and competitive.  There was a time when you could read good, objective, comments on products and strategies.  No more; those days were the days of subscription publications, and we’re living in an ad-sponsored world.

This brings us to where we are, which is that the trend to maximize profits in the coming quarter at the expense of the more distant future is alive and well.  None of these moves by the IT giants are smart in the long run.  Spending on business IT can improve only if new applications realize new benefits, and that really takes a company with some skin in the game across the board—hardware, software, and services.  But taking the long view means getting punished in quarterly earnings calls in the short term at best, and at worst having some “vulture capitalist” take a stake in your company as a springboard for a hostile takeover.

But what about the future?  Nature, they say, abhors a vacuum, and that’s true in information terms or in terms of driving change.  Software innovation happens under the current myopic conditions, but not nearly as fast as it should.  As the potential for a revolution builds, it tends to create revolutionaries.  In some cases (NFV is a good example), we have a wave of enthusiasm that outruns any chance of realization and we fall short.  In others we have enough functional mass to actually do what people get excited about.

For business IT this is hopeful but not decisive.  You can’t easily create a business revolution as a startup, and decades of tripping over our own feet because we’re afraid of looking forward haven’t created an environment where even enthusiasm will be enough.  What I think is about to happen is a fundamental shift in power among IT firms, away from those who shed the broad opportunities by shedding critical elements, and toward those who keep some key stuff.  All of our IT kingpins could still be in that latter category, but by cutting a lot of things out they make it critically important to exploit what they retain.  Will they?  That’s what we’ll have to watch for.

Where, though, could we see a real opportunity to exploit something, given the stripping down that’s happening?  The answer, I think, lies in architecture, a fundamental relationship between hardware and applications that’s set by a software platform.  Nobody who sells hardware can avoid selling platform software, and there are already big platform-software players with little (Oracle) or no (Microsoft, Red Hat) position in hardware.  Microsoft just announced a deal with GE Digital to host its Predix IoT platform on Azure.  The PR around the deal has been weak on all sides, but it’s a sign that architecture plays are already emerging to link critical new applications (and IoT is an application, not a network) with…you guessed it…platforms.

IoT, or mobility, or contextual productivity support, or even NFV, could be approached architecturally, and the result could lead to a new wave of dominance.  Almost certainly one of these drivers will succeed, and almost certainly it will anoint some player, perhaps a new one.  When that happens it will likely end our cost-management doldrums and announcements of slimming down.  I lived through three waves of IT growth and I’m looking forward to number four.

Building On the Natural Cloud-to-NFV Symbiosis

From almost the first meeting of the NFV Industry Specification Group, there’s been a tension between NFV and the cloud.  Virtual Network Functions (VNFs) are almost indistinguishable from application components in the cloud, and so platforms like OpenStack or Docker and tools like vSwitches and DevOps could all be considered as elements of NFV implementation.  Actually, “should be considered” is a more appropriate phrase because it makes no sense to assume that you’d duplicate most of the capabilities of the cloud for NFV.

What doesn’t get duplicated, or shouldn’t?  Our vision of NFV is increasingly one of a multi-layer modeled structure where the top is service-specialized and the bottom is infrastructure-generalized.  The cloud is a hosting model at the bottom layer, but current virtual CPE trends show that virtual functions can be hosted without the cloud.  Our vision of the cloud is also increasingly multi-layer, with virtualization or cloud stack platforms at the bottom, an “Infrastructure-as-Code” or IaC model between, and DevOps tools above.  You’d think the first step in harmonizing the cloud and NFV would be to solidify the models and their relationships.  That’s not yet happening.

In a rough sense, the cloud’s IaC concept corresponds with NFV’s notion of a Virtual Infrastructure Manager.  Either of them would wrap around the underlying platforms and management tools needed to deploy/connect stuff.  The cloud vision of IaC, which includes the specific notion of “events” that trigger higher-layer processes based on resource conditions, is more advanced in my view.  Most significantly, it’s advanced in that it presumes IaC has to work with what the platform already does, where the NFV ISG seems to think it needs to do a gap analysis on OpenStack (for example) and submit changes.  That opens the possibility of long delays in coordinating implementations, and also raises the question of whether many of the NFV-related features belong in a general cloud implementation at all.
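To make that “events trigger higher-layer processes” idea concrete, here’s a minimal sketch in Python of the kind of coupling I mean.  The event names, the bus, and the handler are my own illustrations, not anything defined by an IaC tool or the ISG’s VIM work.

```python
# A minimal sketch of an IaC-style event coupling: resource-level conditions
# are published as events, and higher-layer processes subscribe to them.
# Event names and handlers here are illustrative assumptions only.
from collections import defaultdict

class ResourceEventBus:
    """Routes resource-condition events to whatever higher-layer processes registered for them."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, detail):
        for handler in self._subscribers[event_type]:
            handler(detail)

def rehost_vnf(detail):
    # A higher-layer (DevOps/MANO-like) process reacting to a resource condition.
    print(f"Redeploying affected functions away from {detail['host']}")

if __name__ == "__main__":
    bus = ResourceEventBus()
    bus.subscribe("host_degraded", rehost_vnf)
    # The "VIM/IaC" layer reports a resource condition; the higher layer decides what to do.
    bus.publish("host_degraded", {"host": "compute-node-7", "metric": "packet_loss"})
```

The point of the sketch is the division of labor: the bottom layer reports conditions, and what to do about them is decided above, which is exactly the structure the cloud already presumes.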

Which raises the question of the next layer, because if you don’t have something at the bottom you probably want to look at putting it a layer above.  In the cloud, DevOps is a broad and modular approach to deployment that could (and in most cases, does) offer a range of options from deploying a whole system of applications to deploying a single component.  In NFV, you have Management and Orchestration (MANO) and the Virtual Network Function Manager (VNFM), with the first (apparently) deploying stuff and the second (apparently) doing lifecycle management.  However, these are subservient to a model of a service that presumably exists higher up, unlike the cloud, which makes the DevOps layer extensible as high as you’d like.

Operators like AT&T, Orange, Telefonica, and Verizon have been working through their own models for service-building that start at the top with operations software (OSS/BSS) and extend down to touch SDN, NFV, and legacy infrastructure.  Even here, though, they seem unwilling or unable to define something DevOps-ish as a uniform higher-layer approach.  TOSCA, as a data model, would clearly be suitable and is already gaining favor among vendors, but some (Cisco included) have orchestration tools that fit into a lower-level model (based on YANG) and don’t really have a clearly defined higher-level tie-in.

One of the impacts of the confusion here is the lack of a convincing service-wide operations automation strategy.  I’ve blogged about this recently, so I won’t repeat the points here except to say that without that strategy you can’t realize any of the NFV benefits, and you can’t even ensure that NFV itself could be operationalized with enough efficiency and accuracy to make it practical.  Another impact is an inconsistent, often impractical, and sometimes entirely omitted integration model.  The whole cloud DevOps/IaC concept was built to let applications/components deploy on generalized infrastructure.  Without agreeing to adopt this model for NFV, or to replace it with something equally capable, you have no broad framework for harmonizing different implementations of either network functions or infrastructure elements, which puts everything in the professional-services boat.

Interface standards like the ones described in the NFV documents aren’t enough to assure interoperability or openness.  Software is different from hardware, and the most important thing in software is not how the elements connect (which can often be adapted simply with some stubs of code) but how the elements interrelate.  That’s what describes the features and functions, which is the primary way in which an open approach can be architected.  We need this high-level model.

Another reason we need the model is that since it makes no sense to assume we’d duplicate cloud efforts in NFV, we need to understand where NFV requirements are introduced and how they’re realized.  Much of what the ISG is working on relates to the description of parameters.  What functionality, exactly, is expected to use them?  Do we want OpenStack to check whether a given VNF can be deployed on a given host, or do we want some DevOps-like capability to decide which host to put something on, given any exclusionary requirements?  If we wait till we try to deploy something to find out it doesn’t go there, deployment becomes a series of trial-and-error activities.
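As a sketch of the alternative I’m suggesting, here’s what a DevOps-like placement step might look like if exclusionary requirements were checked before deployment rather than discovered by failure.  The attribute and requirement names here are hypothetical, invented only to illustrate the filtering step.

```python
# Sketch: filter candidate hosts against a VNF's exclusionary requirements
# *before* handing the deployment to the cloud stack, rather than letting
# deployment fail and retrying.  Attribute names are illustrative assumptions.
def eligible_hosts(hosts, requirements):
    """Return the names of hosts that satisfy every stated requirement for the VNF."""
    def satisfies(host):
        return (host["vcpus_free"] >= requirements.get("vcpus", 0)
                and host["mem_free_gb"] >= requirements.get("mem_gb", 0)
                and requirements.get("dpdk", False) <= host.get("dpdk", False)
                and host["zone"] not in requirements.get("excluded_zones", []))
    return [h["name"] for h in hosts if satisfies(h)]

if __name__ == "__main__":
    hosts = [
        {"name": "hostA", "vcpus_free": 8, "mem_free_gb": 32, "dpdk": True,  "zone": "edge-1"},
        {"name": "hostB", "vcpus_free": 2, "mem_free_gb": 8,  "dpdk": False, "zone": "core-1"},
    ]
    reqs = {"vcpus": 4, "mem_gb": 16, "dpdk": True, "excluded_zones": ["core-1"]}
    print(eligible_hosts(hosts, reqs))   # -> ['hostA']
```

Whether that filtering lives in a DevOps-like layer or gets pushed down into the platform is exactly the architectural question the parameter work doesn’t answer.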

Both the “declarative” (Puppet-like) and “imperative” (Chef-like) models of DevOps allow for the creation of modular elements that can be built upward to create a larger structure.  Both also have IaC features, and both allow for community development of either IaC or application/VNF elements and the sharing of these among vendors and users.  This means NFV could ride a lot of this existing process and get to a useful state a lot faster.

It could also get to the cloud state, which may be the most critical point of all.  The difference between a VNF and an application component is minimal, as I’ve noted above.  If operators want to offer features like facility monitoring, is that feature a “VNF” or a cloud application?  Wouldn’t it make sense to assume that “carrier cloud” was the goal, and not that the goal was NFV?  And IoT is just one of several examples of things that would logically blur the boundary between NFV and the cloud even further.

The good news here is that operators are framing NFV in a logical and cloud-like way in their own architectures.  It’s possible that these initiatives will eventually drive that approach through the open-source NFV initiatives, but the operator approaches themselves are the real drivers for change.  NFV is not going to deploy if it has to re-invent every wheel, and those who are expected to deploy it know that better than anyone.

Coupling Resource Conditions and Service SLAs in the Automation of Operations/Management

In a couple of past blogs, I’ve noted that operations automation is the key to both improved opex and to SDN/NFV deployment.  I’ve also said that to make it work, I think you have to model services as a series of hierarchical intent models synchronized with events through local state/event tables.  The goal is to be able to build everything in a service lifecycle as a state/event intersection, or a set of synchronized intersections in linked models.  The key to this, of course, is being able to respond to “abnormal” service conditions, and that’s a complex problem.

If you looked at a single service object in a model, say “Firewall”, you would expect to see a state/event table to respond to things that could impact the SLA of “Firewall”.  Each condition that could impact an SLA would be reflected as an event, so that in the “operational” state, each of the events/conditions could trigger an appropriate action to remedy the problem and restore normal operation.  This framework is the key to operationalizing lifecycles through software automation.
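Here’s a minimal sketch of what such a state/event table might look like in software.  The states, events, and process names are hypothetical illustrations of the pattern, not drawn from any specification.

```python
# Sketch of a state/event table for a "Firewall" intent-model object.
# States, events, and process names are hypothetical illustrations.
STATE_EVENT_TABLE = {
    ("ordered",     "deploy_request"): ("deploying",   "start_deployment"),
    ("deploying",   "deploy_done"):    ("operational", "activate_billing"),
    ("operational", "sla_violation"):  ("degraded",    "remediate_resources"),
    ("degraded",    "repair_done"):    ("operational", "notify_sla_restored"),
    ("operational", "teardown"):       ("terminated",  "release_resources"),
}

def handle_event(state, event):
    """Look up the state/event intersection and return (next_state, process_to_run)."""
    try:
        return STATE_EVENT_TABLE[(state, event)]
    except KeyError:
        return (state, "log_unexpected_event")   # unknown intersections are logged, not acted on

if __name__ == "__main__":
    state = "operational"
    state, process = handle_event(state, "sla_violation")
    print(state, process)   # -> degraded remediate_resources
```

Every lifecycle step is just a lookup: a condition arrives as an event, the table names the process to run, and the object moves to its next state.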

Now, if you look “inside” the object “Firewall”, you might find a variety of devices, hosted software elements and the associated resources, or whatever.  You can now set a goal that however you decompose (meaning deploy or implement) “Firewall” you need to harmonize the native conditions of that implementation with the events that drive the “Firewall” object through its lifecycle processes.  If you can do that, then any implementation will look the same from above, and can be introduced freely as a means of realizing “Firewall” when it’s deployed.

This approach is what I called “derived operations” in many past blogs and presentations.  The principle, in summary, means that each function is an object or abstraction that has a set of defined states and responds to a set of defined events.  It is the responsibility of all implementations of the function to harmonize to this so that whatever happens below is reported in a fixed, open, interoperable framework.  This creates what’s effectively a chain of management derivations from level to level of a hierarchical model, so that a status change below is reflected upward if, at each level, it impacts the SLA.
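A minimal sketch of that derivation chain, assuming hypothetical object names and a deliberately simplified SLA-impact test (spare capacity absorbing a condition), might look like this:

```python
# Sketch of "derived operations": a status change at a lower level is reported
# upward only if it impacts the SLA at each level.  Object names and the
# SLA test are hypothetical simplifications.
class ServiceObject:
    def __init__(self, name, parent=None, redundancy=1):
        self.name = name
        self.parent = parent
        self.redundancy = redundancy   # spare capacity that can absorb a failure

    def report_status_change(self, severity):
        """Absorb the condition if redundancy covers it; otherwise derive an event upward."""
        if severity <= self.redundancy:
            print(f"{self.name}: condition absorbed locally, SLA intact")
            return
        print(f"{self.name}: SLA impacted, deriving event upward")
        if self.parent:
            self.parent.report_status_change(severity - self.redundancy)

if __name__ == "__main__":
    service  = ServiceObject("VPN-Service")
    firewall = ServiceObject("Firewall", parent=service, redundancy=1)
    hosting  = ServiceObject("Firewall-Hosting", parent=firewall, redundancy=0)
    hosting.report_status_change(severity=2)
```

The real logic would live in each object’s own state/event handling, but the shape is the same: each level decides whether a lower-level condition matters to its SLA before passing anything up.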

This sort of approach is good for services that have an explicit SLA, and in particular for services where the SLA is demanding or where customers can be expected to participate in monitoring and enforcing it.  It’s clearly inappropriate for consumer services because the resources associated with linking the service objects and deriving operations would be too expensive for the service cost to justify.  Fortunately, the approach of an intent-model object can be applied in other ways.

The notion of an SLA is imprecise, and we could easily broaden it to cover any level of guarantee or any framework of desired operations responses to service or network conditions.  Now our “Firewall” has a set of events/conditions that represent not necessarily guarantees but actions.  Something breaks, and you generate an automatic email of apology to the user.  The “response” doesn’t have to be remedial, after all, and that opens a very interesting door.

Suppose that we build our network, including whatever realizes our “Firewall” feature, to have a specific capacity of users and traffic and to deliver a specific uptime.  Suppose that we decide that everything that deals with those assumptions is contained within our resource pool, meaning that all remediation is based on remedying resource problems and not service problems.  If the resources are functioning according to the capacity plan, then the services are also functioning as expected.  In theory, we could have a “Firewall” object that didn’t have to respond to any events at all, or that only had to post a status somewhere that the user could access.  “Sorry there’s a problem; we’re working on it!”

There are other possibilities too.  We could say that an object like “Firewall” could be a source of a policy set that would govern behavior in the real world below.  The events that “Firewall” would have to field would then represent situations where the lower-layer processes reported a policy violation.  If the policies are never violated, no report is needed, and if the policy process is designed not to report violations but to handle them “internally”, then this option reduces to the hands-off option just described.

It’s also possible to apply analytics to resource-level conditions, and from the results obtain service-level conditions.  This could allow the SLA-related processes to be incorporated in the model at a high level, which would simplify the lower-level model and also reduce or eliminate the need to have a standard set of events/conditions for each function class that’s composed into a service.
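For illustration, a very rough sketch of that analytics-to-service derivation could look like the following; the metric names and thresholds are invented for the example and stand in for whatever a real capacity plan would define.

```python
# Sketch: derive a service-level condition from resource-level telemetry via
# simple analytics.  Metric names and thresholds are illustrative assumptions.
def service_condition(resource_metrics, loss_threshold=0.01, util_threshold=0.9):
    """Roll resource metrics up into a single service-level condition."""
    worst_loss = max(m["packet_loss"] for m in resource_metrics)
    worst_util = max(m["utilization"] for m in resource_metrics)
    if worst_loss > loss_threshold or worst_util > util_threshold:
        return "at_risk"      # a high-level model element could field this as an event
    return "normal"

if __name__ == "__main__":
    metrics = [
        {"node": "agg-switch-3",  "packet_loss": 0.002, "utilization": 0.95},
        {"node": "edge-router-1", "packet_loss": 0.000, "utilization": 0.40},
    ]
    print(service_condition(metrics))   # -> at_risk (utilization exceeds the threshold)
```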

Finally, if you had something like an SD-WAN overlay and could introduce end-to-end exchanges to obtain delay/loss information, you could create service-level monitoring even if you had no lower-level resource events coupled up to the service level.  Note that this wouldn’t address whether knowing packet loss was occurring (for example) could be correlated with appropriate resource-level remediation.  The approach should be an adjunct to having fault management handled at the resource level.

The point of all of this is that we can make management work in everything from a very tight coupling with services to no coupling at all, a best-efforts extreme on the cheap end and a tight SLA on the other.  The operations processes that we couple to events through the intent-modeled structure can be as complicated as the service revenues can justify.  If we can make things more efficient in hosting operations processes we can go down-market with more specific service-event-handling activity and produce better results for the customer.

The examples here also illustrate the importance of the service models and the state/event coupling to processes through the model.  A service is built with a model set, and the model set defines all of the operations processes needed to do everything in the service lifecycle.  SDN and NFV management, OSS/BSS, and even the NFV processes themselves (MANO, VNFM) are simply processes in state/event tables.  If you have a service model and its associated processes you can run the service.

Resource independence is also in there.  Anything that realizes an object like “Firewall” is indistinguishable from anything else that realizes it.  You can realize the function with a real box, with a virtual function hosted in agile CPE, or with a function hosted in the cloud (your cloud or anyone else’s).

Finally, VNF onboarding is at least facilitated.  A VNF that claims to realize “Firewall” has to be combined with a model that describes the state/event processes the VNF needs to be linked with, and the way “Firewall” is defined as an intent model determines what the VNF’s implementation has to expose as unified features to the layer above.
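As a sketch of what that onboarding binding might look like, a VNF package could carry a mapping from its own native signals to the events the “Firewall” intent model expects, and onboarding could validate the mapping.  All descriptor fields, event names, and the validation step here are hypothetical.

```python
# Sketch of an onboarding descriptor that binds a vendor VNF's native signals
# to the events the "Firewall" intent model expects.  All field names and
# event mappings are hypothetical.
FIREWALL_INTENT_EVENTS = {"deploy_done", "sla_violation", "repair_done", "teardown"}

VENDOR_VNF_DESCRIPTOR = {
    "vnf_name": "acme-fw-vnf",
    "claims_function": "Firewall",
    "event_map": {                      # native VNF signal -> intent-model event
        "PROCESS_UP": "deploy_done",
        "THROUGHPUT_ALARM": "sla_violation",
        "ALARM_CLEARED": "repair_done",
    },
}

def validate_onboarding(descriptor, required_events):
    """Check that every intent-model event the function needs is covered by the VNF's mapping."""
    covered = set(descriptor["event_map"].values())
    return required_events - covered

if __name__ == "__main__":
    missing = validate_onboarding(VENDOR_VNF_DESCRIPTOR, FIREWALL_INTENT_EVENTS)
    print("missing mappings:", missing or "none")   # -> {'teardown'} in this sketch
```

An onboarding check like this is what turns “we have interfaces” into “this implementation actually behaves like the function it claims to be.”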

Operations automation can work this way.  I’m not saying it couldn’t work other ways as well, but this way is IMHO the way a software type would architect it if the problem were handed over.  Since service automation is a software task, that’s how we should be looking at it.

The TMF got part-way to this point with its NGOSS Contract approach, which linked processes to events using SOA (Service-Oriented Architecture, a more rigid predecessor to modern microservices) through the service contract as a data model.  It really hasn’t caught on, in part I think because the details weren’t addressed.  That’s what the TMF’s ZOOM project, aimed at operations automation, should be doing in my view, and whether they do it or not, it’s what the industry should be doing.

I think some are doing it.  Operators tell me that Netcracker uses something like this in their presentations, and a few tell me that Ericsson is now starting to do that too.  I think the presentation made last year by Huawei at the fall TMF meeting in the US exposes a similar viewpoint, and remember that Huawei has both the tools needed for low-level orchestration and also an OSS/BSS product.

It’s a shame that it’s taking so long to address these issues properly, because the lack of software automation integration with operations and management has taken the largest pool of benefits off the table for now.  It’s also hampered both SDN and NFV deployment by making it difficult to prove that the additional complexity these technologies introduce won’t end up boosting opex.  If we’re going to see progress on transformation in 2017 we need to get going.