Cisco Has More Reason to Push NFV Forward Than They Think

Cisco turned in a decent quarter, particularly considering the state of the network equipment market and the growing concern about the global economy.  It’s always nice to have good results to tout to investors, and nice in particular when you need to navigate the usually treacherous path toward next-gen technology without trashing your current revenue streams.  Many companies wait too long to shift strategies, and Cisco now has a bit of headroom—if they have the will to use it.

Everything Cisco makes is under some form of commoditization pressure as network operators and enterprises alike try to control costs.  Further, we have white-box alternatives emerging for switching/routing and open server architectures that could threaten Cisco’s UCS, and competitors Alcatel-Lucent and Nokia have consolidated, forcing Cisco to tighten its alliance with Ericsson to respond.  For Cisco, there could be no greater danger than to assume that its happy quarter is replicable in the long run, without some trauma as the company frames a new model for networking.

I think Cisco knows it has to change.  The Jasper deal may be showing that Cisco is aligning with the poster child for higher-layer service drivers, IoT, but the fact that they’ve bought into only the limited connection-device management vision with the deal could be a bad sign.  They might think that’s all IoT will be, in which case somebody is going to eat their lunch, or they might have had to move quickly to take some position because they’d diddled too long.

What IoT and other industry developments are showing is that service value is flying out of connectivity.  This is largely due to the fact that residential broadband is stuck in zero-marginal-cost all-you-can-eat pricing and that mobile broadband is really mostly about watching videos.  On the business side, higher-priced connectivity services are really site-connection services not people-connection services, and the number of new sites to connect isn’t really elastic with price—nobody builds offices because it’s cheap to connect them.  Since you can’t really do much to feature-differentiate connection services, you can see there’s nowhere to go but up.

Despite the hype, this doesn’t mean going up in OSI layers.  The fact is that higher-layer OSI protocols aren’t even “in-network”, they’re on-site by definition, and in any case hardly anyone really knows what the real OSI layers above Layer 3 are.  We’re really talking about something more like cloud service than network service.

Cisco’s earnings call transcript doesn’t exactly resonate with this higher-layer non-connectivity view.  Robbins’ first priority as stated was ACI, then security, third cloud/SaaS services (WebEx, for example) and finally M&A to expand Cisco’s strategic scope.  I think Cisco needs to reorder this list while it can, taking advantage of the window its decent results have opened to prepare for the inevitable shift.

Cisco didn’t mention SDN or NFV at all on their call.  That’s not automatically a bad thing, because it’s far from clear that the vision of SDN and NFV the industry is moving toward will ever compel major changes in costs or revenues, and thus offer any relief from the profit-per-bit squeeze operators face.  However, both SDN and NFV are to a degree symptoms of network issues that Cisco does have to face, and failing to mention them means Cisco may be trying to address the future without referencing the technologies that operators think are the drivers of that future.  NFV, as something the operators themselves started, is a particularly dangerous thing to ignore.

Perhaps, you might think, Cisco is deferring to its new partnership with Ericsson.  Well, Cisco didn’t mention that partnership either until a Street analyst asked about it, and in their response they focused on the tactical/sales impact (minimal in the near term according to Cisco) and not on how it might be a pathway to bridging Cisco to the future.  That’s in-character for Cisco, always known for a “tomorrow is the focus, not the day after so make your quota” mindset, but risky in two dimensions.

Right now, without anyone really driving the bus, SDN and NFV are both focusing on the low apples of opportunity.  That means focus on cost, and cost focus is really problematic for a vendor whose equipment sales are a big part of the cost picture.  It’s also building more credibility for white-box and open solutions because nobody is really driving a story of new benefits, new revenues, that’s meaningful or even credible.  So we’re actually promoting commoditization.

Cisco also needs to remember that Ericsson is an integrator in these new technology areas.  Integrators make money on professional services and they have relatively little incentive to push high-priced hardware because a fixed budget with higher equipment costs inside leaves less for professional services.  Anyway, nearly all the major carrier integration projects these days have a strong open-solution target, which doesn’t favor Ericsson pulling Cisco through.  Despite Cisco’s comment that there are already deals in the pipeline, Cisco may gain less from the partnership than Ericsson.

Logically, what Cisco needs to be doing is at least epitomized by the NFV benefit case, even if it isn’t directly tied to making that case.  Operations efficiency and new revenues are the current priorities for NFV, but neither has to be NFV-specific.  Cisco has no real position in OSS/BSS, and no real position in NFV management and orchestration.  It’s hard to see how Cisco could promote new efficiencies and revenues while ignoring what operators think is most likely to generate them, but if Cisco at least blew kisses at NFV while addressing efficiency/revenue goals, they could hope to win something.

This is all about new “R” in an ROI sense.  If operators have no real hope of raising revenues, they have to lower costs.  Conversely, a credible new-revenue position helps a lot in taking pressure off for cost reduction, pressure that could mean spending less on network gear or maybe switching to price-leader Huawei.  Who, by the way, has become one of the leaders in carrier SDN and NFV.

The thing is, Cisco has faced fast-moving competitors before, and they may think that this time is no different.  They’re wrong.  What’s at stake here is the notion of proprietary technology as the foundation for networks.  The enemy for Cisco is the “open” movement, from software to server designs.  It’s going to take time to get that movement on track, but when and if it finally gels, it’s going to be the worst threat Cisco has ever faced.

Which means Cisco should be focused on minimizing that risk.  They can’t expect to get away with Dirty Open Source Tricks, but they do have an easy path forward—speed.  If Cisco can accelerate the market instead of hanging back, they could deliver key benefits before an open alternative exists.  The smart path for Cisco would be to push hard on an NFV approach that embraces Web-layer services to augment basic NFV, delivers on OSS/BSS integration, and accelerates NFV deployment.  They have little to lose in a higher-layer-focused deployment of NFV, and in fact if they could hasten deployment they’d likely beat any effort to standardize white-box servers and switches and make key features open-source.  If they wait, as they have been, events will overtake them.

What’s Needed to Make vCPE Pave the Way to NFV?

Everyone wants NFV to succeed, including both vendors and operators, and surely the media and analyst communities will find themselves under revenue pressure if NFV fails and something else doesn’t emerge to hype up.  From the first, one of the most-promoted examples of NFV has been virtual CPE (vCPE), which proposes to substitute hosted connection-point features like security for appliance-based features.  The question is whether there’s really hope for vCPE, and if there is, whether it will drive NFV deployment.  Both those questions may seem to have been answered, but that’s not the case.

The principle behind vCPE is that there are a bunch of service-edge features that currently would be implemented using separate devices, but that could be hosted via NFV instead.  The idea is that the sum of the costs of the appliances would be far greater than the cost of hosting the features, and that the use of hosting could radically reduce the number of truck rolls needed to install and support the features.  Some believe that users would be induced to buy more features if they could be delivered and installed instantly.

The most obvious question about vCPE is whether it’s really NFV at all.  The vCPE model that prevails today is a smart edge device or board into which features are loaded.  This model is a sharp contrast to the cloud-resource-pool model envisioned by the ETSI NFV specifications.  You don’t have to decide where to host something; the only option is the service edge device.  Management isn’t an issue because you manage a box just like you always did.  Orchestration is trivial because everything you deploy is co-located.

To be fair, there is an application of vCPE as an on-ramp to NFV services, a means of addressing the “first-cost” of NFV where you have to deploy a massive resource pool to achieve economy of scale before you have even a single customer.  The notion is that you start with vCPE then move to a pool-hosted solution when you gain service mass.  That’s fine, but if you’re going to do that then you need to have a real NFV management and orchestration platform in place to control services before and after your move to the cloud, or you don’t have continuity of management practices.

You also have to ask whether you have enough demand to justify a resource pool in the first place.  In the US, there are about 150,000 businesses that have multiple sites, perhaps a half-million worldwide.  These are the primary targets for vCPE.  If we take the US number and look at the distribution by metro area, the average is 50 sites per metropolitan statistical area, or about 20 per central office.  About half of all MSAs and COs have fewer than five business sites.  Is there enough here to build a pool?  I don’t think so.

That raises the biggest question with vCPE, which is just what CPE we’re talking about.  There’s a lovely (and not uncharacteristic) fuzziness about just who the vCPE user might be, perhaps because that’s not an easy question to answer.

Carrier Ethernet is the place where vCPE has been tested and applied most often.  Business users do in fact often buy service-edge features by buying appliances.  We’ve already talked about the limited number of business sites, so let’s look at the total service revenue opportunity.  The global revenue from carrier Ethernet services this year will likely be only about $38 billion.  Even if we presume that these users spend as much on vCPE features as they do on services, we haven’t exactly generated an explosive opportunity for new services.  Telecom services overall run well over two trillion dollars, after all.

This raises an important point for vCPE and NFV, which is that there simply isn’t enough money on the business vCPE table to drive significant NFV deployment, even without considering whether there’s a pathway for evolution to a pool of resources from the edge-hosting model.  You’d need consumer CPE to make it worthwhile.

OK, why not?  Well, remember that what we’re talking about is virtualizing the service features, not the entire box.  Consumer broadband needs to terminate the broadband access connection and generate local WiFi connectivity.  Those are physical functions you don’t displace with vCPE; we still need a box on premises.  Yes, that box has perhaps DNS, DHCP, and firewall support, but those features are basic and rarely change.  How many truck rolls do you get because you have to change out a consumer firewall?

Then, for both business and consumer, there’s box cost.  You can buy a minimalist Ethernet termination box with firewall, DHCP, etc. for under $500, and a consumer broadband gateway for under $40, retail.  Cable terminations are about a hundred bucks.  One network operator told me they buy a broadband gateway with WiFi for $17.  Most operators say the average installed life for such a consumer device is 5-7 years, and for businesses 4-6 years.  Do the amortization math here and you see that the total cost of providing the service-edge features with a custom appliance is perhaps $7 per year per consumer or perhaps $100 per year for a business.  Do you think you can provide a feature-hosted alternative for a lot less than that?
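
To make that arithmetic concrete, here’s a minimal Python sketch of the amortization, assuming simple straight-line depreciation and midpoint lifetimes taken from the ranges above; the dollar figures are the ones cited in the text, and everything else is an illustrative assumption rather than operator data:

  # Straight-line amortization of the edge-device purchase price.
  # Lifetimes are midpoints of the quoted ranges; illustrative only.
  def annual_cost(device_cost, installed_life_years):
      return device_cost / installed_life_years

  consumer_gateway = annual_cost(40.0, 6.0)     # ~$40 retail gateway, 5-7 year life
  business_edge_box = annual_cost(500.0, 5.0)   # ~$500 Ethernet termination, 4-6 year life

  print(f"Consumer edge: ~${consumer_gateway:.0f}/year")    # roughly $7 per year
  print(f"Business edge: ~${business_edge_box:.0f}/year")   # roughly $100 per year

Against numbers like these, a hosted alternative has to beat single-digit dollars per consumer per year, and that’s the heart of the problem.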

So does this prove that vCPE is a bad idea?  Not at all.  It’s a good way of adding feature agility and dynamism to an existing carrier Ethernet service, and it might be a good way of supporting home-video-broadband gateways.  If you can expand the repertoire of features beyond the basic security/connectivity features of today, you can make a case for vCPE.  What’s not easy is making a case that vCPE drives NFV deployment.

So am I saying that vCPE is a dead end, NFV-wise?  No, but vCPE won’t justify NFV on its own.  You can make vCPE an on-ramp to NFV provided that you deploy it within a real NFV framework that can do top-down orchestration, one that in particular integrates operations (OSS/BSS) and management (NMS, NOC) and some higher-layer service order and support portal.  Very few vCPE vendors have the tools to extend their solution to the broad NFV opportunity.

The largest problem vCPE faces today is the risk of being stranded in a silo if you deploy it.  Everyone is assuming that we’ll somehow converge on a strong, universal, NFV solution and that everything that purports to be NFV today will fit in it.  That is a worthy hope, but not an assumption that we can validate.  If you want to do vCPE and you want to evolve from it to real NFV, you’ll need to pick a solution that can make the broad business case first, then fit your vCPE strategy inside it.

The Hidden Issues of IoT

If there are hidden issues in SDN and NFV then there surely are in the Internet of Things.  IoT, after all, is way ahead in the race to over-hype and perhaps the reason is that the outcome everyone would like to see is decidedly unlikely but decidedly newsworthy.  Why not sing for the press and hope for the best?  Never mind that it never really works.  The good news is that we almost certainly will see IoT success.  The question is whether we’ll recognize it when we see it.

The first of IoT’s hidden issues is Internet literalism.  If we think the Internet of Things means having a zillion “things” directly on the Internet then we’re doomed to disappointment.  Almost all the barriers to IoT success come from literal interpretation of the acronym.  Direct Internet connection would mean you’d have to get IPv6 fully deployed, which has yet to happen.  You’d have to use cellular connectivity for the devices, which would raise the cost per device and the monthly connectivity cost to the point where many devices wouldn’t make sense.  You would have devices open to attack and misuse to the point where the number of highly publicized disasters would surely kill any momentum.

The IoT of the future will be made up of IoT proxies that are visible on the Internet and a whole series of controllers and sensors that live “underneath” these proxies.  In a few cases these will be on WiFi and use private IP addresses in the same way that most home computers and printers do.  In the majority of cases they’ll use control protocols like Zigbee or X10 or Insteon.  Since there would only have to be one IoT proxy per home or location, this holds down cost and also improves security by making it practical to protect the sensor/controllers without giving each of them a firewall and policy management system.

This architecture raises a second hidden issue, which is that IoT is standardizing the wrong stuff.  Instead of looking at how we talk to the sensors and controllers, we should be focusing on how we talk to IoT proxies.  Right now virtually every home control environment uses its own specific controller, even if they’re all accessed via the Internet.  Anyone who has fiddled with home control in any form knows that, for example, you may not be able to get Amazon’s Echo to do everything you want unless you have a “sub-controller” that can talk to the devices and be controlled by Echo.  Same with any other system.  Programs for the sub-controllers are not compatible with each other, and they don’t have the same interface to the Internet, though many will give you a web page and let you control things that way.

What would really be useful for IoT is a sensor and controller model or API set to be asserted by all the IoT proxies for all the devices under them.  With something like that you could expect to exercise control over a home/facility without having to worry about whether your control program had to be changed for every IoT proxy out there.  Remember, IoT proxies are the visible part of IoT, so they’re what we have to specify and standardize.
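
As a purely hypothetical illustration of what such a uniform proxy-level model might look like (none of these class or method names belong to any real standard or product), here’s a Python sketch in which each local control protocol gets an adapter and the proxy exposes one API regardless of what sits underneath it:

  # A minimal sketch of a uniform device model an IoT proxy might expose.
  # The adapter classes and method names are hypothetical illustrations.
  from abc import ABC, abstractmethod

  class DeviceAdapter(ABC):
      """Wraps one local control protocol (Zigbee, X10, Insteon, WiFi...)."""
      @abstractmethod
      def read(self, device_id: str) -> dict: ...
      @abstractmethod
      def command(self, device_id: str, action: str, **params) -> bool: ...

  class IoTProxy:
      """The single Internet-visible element; local devices stay private."""
      def __init__(self):
          self._adapters = {}   # protocol name -> DeviceAdapter

      def register(self, protocol: str, adapter: DeviceAdapter):
          self._adapters[protocol] = adapter

      # The only API an outside application ever sees, regardless of what
      # protocol the device behind it actually speaks.
      def get_state(self, protocol: str, device_id: str) -> dict:
          return self._adapters[protocol].read(device_id)

      def set_state(self, protocol: str, device_id: str, action: str, **params) -> bool:
          return self._adapters[protocol].command(device_id, action, **params)

With something in this shape, a control program written against the proxy API wouldn’t care whether the device behind it speaks Zigbee, X10, Insteon, or WiFi.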

The next hidden issue is actually implied by the last.  It’s dependency on Internet service continuity.  Do you want everything that’s being controlled to be exposed, and every sensor that feeds those control decisions likewise?  If you do that then the IoT proxy is a simple API translator, and all your intelligence is outside the facility, connected by the Internet.  Which means that if you lose Internet access your control doesn’t work.  Most people would want a local control agent that could translate sensor events into control decisions.

Most home alarm systems, at least the decent ones, have battery backup and will sustain home protection for at least a couple hours even through a blackout.  Internet outages are, for most of us, far more common than power blackouts, so it’s not unreasonable to assume that you’d want to have local control at least as the basic mechanism, to be overridden or supplemented from the outside.  But if that’s the case then you’ve totally eliminated the classic notion of IoT.

Where does that classic notion belong?  That’s the next issue; we don’t recognize the distinction between open IoT elements and private ones.  There are places where open-model, on-the-Internet, IoT can make sense.  If we’re to use IoT in traffic management, retail marketing, and general augmenting of mobile contextual services we have to make the sensors generally visible.  If we’re going to use it in home/facility security then an open model is the last thing we want.  How do we manage to miss the vast difference here?

Most of the privacy issues of IoT, in the general sense, can be addressed by the IoT proxy model because we’re already addressing them successfully that way.  We do need a public model too, but we’re hampered by the fact that we don’t recognize the difference between the two.  That difference starts with the need to provide open access, moves through security/privacy, and ends up….

…in our last hidden issue, which is that direct public access to sensors doesn’t scale.  If there are a thousand shoppers trying to decide if the light a block away favors their crossing without breaking stride, there is little chance the sensor will withstand the barrage of inquiries.  Further, public sensor data has great value when considered historically, but if we expect sensors to store six months’ worth of data we’re back in deep cost problems.

Public IoT is now, and always has been, a big-data-and-analytics problem.  Anyone who’s rational knows that even where sensor data is public, exposing the sensors themselves to public view might not be the smartest approach.  If we have, even for public sensors, a protected-network architecture and big-data collection and categorization, we’d have something we could hope to scale to the levels needed, and hope to secure besides.
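
A rough Python sketch of that “query the repository, not the sensor” pattern follows; the identifiers are made up, and a trivial in-memory store stands in for the big-data layer:

  # The public queries the repository; the sensor answers one collection poll.
  import time
  from collections import deque

  class SensorRepository:
      """Collects each sensor reading once, then serves any number of queries."""
      def __init__(self, history_limit=100_000):
          self._history = deque(maxlen=history_limit)   # bounded historical store

      def ingest(self, sensor_id, value, ts=None):
          # Called by the protected collection network, never by the public.
          self._history.append((ts or time.time(), sensor_id, value))

      def latest(self, sensor_id):
          # A thousand shoppers hit this method; the sensor itself saw one poll.
          for ts, sid, value in reversed(self._history):
              if sid == sensor_id:
                  return value
          return None

  repo = SensorRepository()
  repo.ingest("light:main-and-3rd", "WALK")
  print(repo.latest("light:main-and-3rd"))   # answered from the store, not the sensor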

Will IoT withstand a continuing policy of ignoring these hidden issues?  In one sense it will, because those who hype it will do what they’ve always done, which is to redefine what we mean by “IoT” to match whatever happens and then declare the revolution complete.  Everyone knows that an old revolution is an oxymoron anyway, so a quick dismissive article followed by the Next Great Thing will be enough.  However, in the financial and benefit sense, we can’t hope to get the most from IoT without addressing these points.  And what’s the point of a revolution with no real favorable outcome?

Transformation from the Top is Gaining Ground with Operators

While there has certainly been a lot of interest in what the next-generation infrastructure of operators might look like, there’s some indication that operators are thinking less about infrastructure these days.  Transformation is still their goal, but more and more are setting their sights higher than the network, and this could have a major impact on network and service evolution over the next five years.  There are four critical aspects to the current trend, and I want to explore them briefly here.

The first big thing driving a higher-level view is an operator realization that mobile services are probably where their real opportunity for new revenues and significant cost reduction lies.  There are still operators interested in more systemic infrastructure changes and in business services based on vCPE, but what I’m hearing is that more and more now realize there isn’t going to be enough money on the table for this kind of service to pay off in improved profits.

Mobile services, as I’ve pointed out often, are inherently contextual, something that ItsOn, a recent winner at Telefonica, highlights on its website.  You can do a lot with mobile services by recognizing more things about the customer and then reflecting those new things as a contextual framework into which you could deliver ads and services.  The beauty of this approach is that it’s a service overlay that’s loosely tied to IMS for customer information but that doesn’t require much integration with or changes to the rest of the mobile network.

A few operators are also seeing the natural connection between contextual mobile services and IoT.  While operators are probably more guilty of being 4/5G-connection-revenue-focused in their IoT plans (I’ll bet we see that reinforced at MWC) than even vendors, a small number realize that heady forecasts of billions of new devices in three or four years are just smoke.  This group is looking for the real value of IoT, and that real value is in IoT’s ability to enrich the notion of context.

Verizon’s interest in Yahoo might well be another indicator of this trend.  If contextual mobile services are the way of the future, you need to be able to exploit them as directly as possible.  Operators have wanted to get into the ad business, which is what funds the OTTs, for a long time.  Context combined with Yahoo?  Could be big.

The second big thing is that OSS/BSS has as big an inertia problem as the network does.  Most operators have wrestled with the question of whether operations systems could be transformed or whether they should just be trashed in favor of something new.  Some of the mobile-service contextual trends are showing operators that you can add web-like service overlays on mobile broadband in particular without even having much impact on operations systems.  In fact, the current drive to contextualize mobile services seems largely disconnected from the OSS/BSS as well as from the network.

This opens a new vision of “transformation of operations”.  Instead of trying to do a fork-lift there, which operators are finding is just as impractical as a network fork-lift, you surround your operations systems with a lot of service-overlay technology and let what remains wither on the vine.  Whether the goal for OSS/BSS transformation is simplification or replacement, reducing the things you’re depending on these systems for in transformation terms makes the evolution of operations easier and less risky.

The third big thing is that integration of NFV based on the current state of the specifications is not going to be practical.  It’s not that you can’t do it, but that the steps that are involved are much more work than operators had expected, and the results are likely to be a lot more brittle.  The story I get on Telefonica’s much-discussed integration RFP is that the operator has been confronted with costs much higher than they had expected because the work involved is much more complicated.

Paying for integration isn’t unreasonable when you get something helpful, but the problem with NFV is that VNF and resource openness are simply not properly accommodated in the architecture.  I think the operators themselves know this and are trying to drive the NFV ISG toward a different approach (intent modeling and a specific service-model-driven vision of management and orchestration) but by their own admission they can’t expect more than “significant progress” this year.  That’s just not fast enough to help operators deal with their imminent revenue/cost-per-bit crossover.

Thing number four is that SDN and optical networking can combine to create a revolution of their own, perhaps a bigger one than NFV could.  I’ve always believed that NFV could be the senior partner of the SDN/NFV pairing, but the limitations in the NFV model have limited the extent to which NFV can deploy and have rendered the support NFV might offer to SDN moot.  In the meantime, operators are starting to see ways where SDN grooming can combine with agile optics to create a virtual-wire-network model that would itself be more agile and operationally efficient, and that this model might then create an opening for NFV to be used to deploy switch/router instances.

Two of the six vendors who could make an NFV business case are primarily optical network equipment players, and two more have optical capabilities along with other network elements.  One of the signposts to watch for on the optical-and-SDN path is whether Nokia drops Alcatel-Lucent’s tendency to protect its routing/switching business at the expense of its (excellent) SDN product.  If we see more optical/SDN-centricity in Nokia, it’s a sure sign that this is going to be a real driver for change.

We probably have the most disorderly Presidential political scene in living memory, and we may have a similar level of disorder when it comes to transformation.  The casualty of all of these things/trends is the notion that a single technology is going to change everything.  I think that things like ONOS/CORD/XOS are going to offer pathways to unite a high-level vision of change with SDN and NFV, but I think the opportunity for SDN or NFV to drive the bus by themselves is gone.  The network will not transform internally, but because of external pressures from the top—service revenue opportunity shifts.

A Review of NFV Vendors

I think that 2016 is a critical year for NFV because this is the year that real deployment models have to be proven out if they’re to impact 2017 costs and revenues.  That fact also makes it a critical year for vendors, and so this is a good time to take a look at where vendors are and what they have to do to optimize their place in an NFV future.

Alcatel-Lucent/Nokia has a strong position going into 2016 because it’s one of the vendors who have sufficient product scope to make a complete operations-to-equipment business case.  The only challenge that the new combined company has is that it is a combined company.  The merger has focused Nokia even more on mobile broadband services, which means that its NFV strategy almost certainly has to win there.

The big question for Nokia is whether it can look beyond connectivity for NFV and for IoT in particular.  It’s one thing to do mobile infrastructure better, but another to do NFV on a broad scale or impact revenues as well as costs.  IoT is an example of something that can be looked at (as most look at it today) as simply a 4/5G connectivity problem with some IPv6 thrown in, or as an example of adding cloud features to services.  Nokia has to do the latter, because if anyone else does that better than they do, then Nokia’s whole NFV approach is at risk, including mobile broadband.

HPE is another vendor who can make the complete NFV business case, and I think they have the strongest service modeling approach—certainly the best-documented one.  Their biggest asset is that they can attack any NFV opportunity equally—they are not type-cast into a specific NFV mission.  That makes it hard for competitors to shunt them into a backwater of positioning.

It also means that HPE doesn’t have a real linchpin to revolve their marketing around.  I think that the company has focused too much on the ETSI process, and in particular the PoCs.  Because HPE doesn’t really offer a good story on how NFV evolves, this ETSI-centricity tends to link them to the theme of the PoCs (which, remember, come from a bunch of vendors).  Right now, that theme is vCPE, and vCPE is actually perhaps the worst NFV application in terms of driving server deployments.  HPE sells servers, and they need to sell them through NFV positioning if they’re going to win.

Dell is a sort-of-competitor to HPE, and I qualify that because Dell does not have the ability to make an NFV business case with their own offerings.  Their currently-in-progress EMC/VMware acquisition puts them in a funny position because the bulk of NFV has been directed at OpenStack and VMware is a competing technology.  Until Dell positions VMware there’s little it could hope to do with NFV, and it may take some time for Dell to accomplish that.  Meanwhile they have to figure out how to sell servers, just like HPE does.

Red Hat is also a sort-of-competitor to HPE, but with a more complicated set of issues.  They do not sell servers, they sell support-inclusive licenses to open-source software.  That has always driven Red Hat to try to ensure that their successes pull through additional software licenses, which makes it hard for them to embrace a fully open framework for something like NFV.  Could an NFV solution have to deploy machine images based on many other operating systems than Red Hat’s?  Sure could.  But right now they really don’t have the ability to make the NFV business case on their own, which makes it much harder to create that initial value island that would pull through other Red Hat licenses in an NFV data center.

Oracle is one of the six vendors who can make the NFV business case, but they may be the only one who doesn’t have a clear major financial win coming out of a business-case success.  Oracle sells servers, but they’re unlikely to be a power player in server deployments.  There aren’t any signs that they are trying to do that, in fact.  Since operators really want an open NFV strategy, can Oracle find stuff to sell if they successfully drive an NFV deployment, or will they end up making somebody else rich?  That’s a question that we can expect to have answered in the next six months, I think.

Ericsson is a kind of NFV maverick.  They have one of the strongest operations systems product suites on the market, and they have impeccable professional services capability.  They don’t have much in the way of infrastructure products at this point, and many operators believe that Ericsson thinks they’ll make money on NFV primarily through integration and not product sales.  The wild card is their alliance with Cisco, who those same operators feel is more an obstruction in the path of SDN and NFV than a supporter of either.  Since Cisco is focusing on making current devices “software-defined” or controlled, it is very possible Ericsson and Cisco could find common ground in the service-to-infrastructure boundary, which might be very appealing to operators looking to migrate from router-centric installed bases.

ADVA and Ciena are both interesting because both of them could make a complete NFV business case and both are obviously committed to optical networking.  If you model all the possible ways that NFV could transform infrastructure, the one that deploys the most servers is where SDN first creates virtual-wire grooming of transport paths that are then augmented by virtual switch/router instances deployed by NFV.  Both ADVA and Ciena could do this, but so far it’s been Ciena who has been working to position itself in the space.  The challenge here is that this sort of transformation goes right to the heart of current service practices and may scare operators off.  There’s a lot of marketing work to be done, particularly by ADVA, whose only aspiration with its NFV buy of Overture seems to be extending vCPE to carrier Ethernet.

Huawei is the wildest of wild cards in this game.  While they’re known primarily as a price leader in network equipment, Huawei has been incredibly active in NFV and SDN and has recently started to build what might well be the most insightful and powerful open-source team in the industry.  Their big problem has been the effective ban on their network gear for US operators.  However, Sprint has been winking at that issue and there are hints that it might be up for review in the next administration.  If Huawei can sell here in the US it would put incredible pressure on Nokia, Ericsson, Cisco, and Juniper.  Even if they can’t reverse their US ban, they are making great strides in Europe.  Their big problem remains marketing/positioning, and if they fixed that then they could influence even US operator strategy.  An open-source win in NFV, for example, could undermine everyone else.

My final vendor is one you don’t always think of in NFV terms—Intel.  There’s nobody out there that has more to gain from optimal deployment of NFV.  Imagine a hundred thousand new data centers, all filled with servers running Intel chips.  NFV could more than double Intel’s profits if it worked.  And Intel’s Wind River group could fill the functional gap, give operators top-down NFV, open-source technology, everything they want.  But if they do that they step on every other NFV vendor, some of whom are already major conduits for Intel sales.  Somehow, perhaps through positioning support for things like a rational cloud-intensive IoT model, Intel has to promote data center NFV deployment or it runs the risk of being the biggest loser.

The first test of all of this will be the announcement of the Telefonica integrator contract award, now long past its expected late January date.  Rumor has it that three of the vendors I’ve named above are on the short list for that, and whoever gets named will gain a lot of experience in NFV integration and perhaps have a chance to build a broad benefit model and business case.  It will also be enlightening to see what Telefonica tries to do.  A business-case-driven and practical approach to NFV would advance the chances that other operators will take a similar position and move up the chances of a big NFV success in trials this year, which would set the stage for deployment at scale in 2017.

Just How Serious is Cisco about an IoT Transformation?

Cisco’s announcement it was buying Jasper Technologies for its IoT cloud service platform created a lot of buzz, not all of it favorable.  Many on the Street think Cisco is paying too much for a market entry play, given that Cisco already has IoT offerings.  My view is that there’s good and bad in the Jasper deal, so whether it was smart for Cisco or not will depend on whether Cisco sees the balance of the two.

IoT has overtaken the cloud and NFV as the most-hyped technology of the current age, and like all technologies that get hyped, the boundaries of IoT have gotten very fuzzy.  That’s particularly bad in the IoT case because the real opportunity, which is M2M or “environmental intelligence”, is often submerged in the “I” part of IoT.  How many people think that sensors and controllers will be placed directly on the Internet, available for hacking and malicious use by anyone?  Most, as it turns out.

I’ve cited my view of IoT in prior blogs.  We will never make universal connection of sensor devices to networks practical if we insist on taking these devices directly to the Internet.  Yes, there are applications where direct Internet presence is logical—I’ve personally seen examples in transportation where cellular Internet access is essential to track big expensive cargo carriers, ships, etc.  But for most home, business, and industrial applications, sensors that work on low-cost local technology and are networked internally rather than right onto the Internet are much more sensible.  And despite what the “I-centric” IoT proponents say, the real opportunity for a controllable, sensible, IoT model is bigger than the opportunity for a direct-on-the-Internet model.

Jasper is a control platform for IoT, something that does some of the things that Verizon’s IoT program also provides, like managing the online devices and access plans.  It appears to me that it could do a lot more than this—linking applications, analytics, and sensor/control applications into a cloud community, but the focus of the company seems to have been dragged into that I-centric, LTE-connections-for-everything, approach.

The most-cited IoT example these days is the “connected car”, which is essentially a mobile broadband client in a vehicle, where it can both provide Internet-app access (including WiFi to in-car devices) and connectivity to the car’s own computer for vehicle status, control, and diagnostics.  We’ve had the capability to do all of this for years, of course, but what’s significant about the connected-car example is that it’s all about managing what is effectively a bunch of handsets “owned” by cars instead of people.  Only a small part is related to true M2M, and that part was supported by services like OnStar years ago.  It appears that most of Jasper’s focus is on the management of the connected-car model for IoT, where you have a community of cellular-Internet elements that represent fairly static sets of information.

The good news for Cisco is that this is the kind of IoT that network operators really love.  How could you, as the CFO for a big mobile operator, not salivate at the thought of a zillion machines each having their own phones (in effect)?  Forget about competing for all the human users, who stubbornly refuse to reproduce at the growth rate needed to keep everyone fat and happy in total-addressable-market terms!  Sell to machines instead, or rather to their manufacturers, which is easier.

I think Jasper is focusing on the notion that IoT in general can be viewed as a “connected-x” model where “x” is “home” or “factory” or even “worker”.  In short, they’ve seen IoT as focusing on a mobile broadband connection, and their software focuses on managing the connecting device more than on providing higher-layer “cloud-like” services over it.  Is that practical, though?  The “home” application of the model may be a good way to look at its viability.  Home security and control are often, these days, linked to wireless services to call into the monitoring service.  This is another nice broadband opportunity, and if Cisco thinks that they can expand Jasper into this space then they might have made a smart buy.  The problem arises, both for connected-home and business applications beyond and for Cisco as a seller, when you consider the sensors behind the connection.  If we think that every window, door, and motion sensor is going to be connected with its own mobile broadband plan we’re smoking something.  So does Jasper, and Cisco, believe that?  It’s hard to say.

Jasper’s material doesn’t refer specifically to a model of IoT where a controller that has a cellular connection might act as a proxy for local sensors and controllers that use a more economical communications technology for their connections.  It doesn’t foreclose such a model, though, and I think that it would be possible for Cisco to promote a two-level networking solution for IoT that looks just like this controller-on-the-Internet-and-sensors-on-the-cheap model.

To do that, Cisco is going to have to make Jasper into a lot more than a manager of devices and cellular data plans, which is where most of its effort goes today.  Cisco has analytic tools that could be used for IoT big-data collection and reporting.  They could build the kind of big-data-centric IoT that I believe is the real hope for the IoT market.  That’s the model that GE Digital’s Predix already supports, but of course GE Digital may not be well-known to service providers.  If Cisco could promote “real” IoT to network operators using Jasper-Plus technology they might grab a lot of market share in the IoT space.

That would do wonders for Cisco in SDN and NFV, if they wanted to excel in those two areas.  A cloud-modeled big-data-centric IoT would be the largest consumer of cloud services, of NFV, and of SDN.  If somebody like Cisco had the golden key to that IoT model they could pull through their own cloud/SDN/NFV solution with it.  Presuming, of course, that they wanted to do that.  In both SDN and NFV it’s not clear that’s the case.

Jasper leaves IoT trapped in conventional connected-car mode in a positioning sense.  Jasper talks about the cloud model of IoT.  If the cloud is the goal then for carriers the goal translates into carrier cloud and implicates NFV as the means of service feature deployment.  Cisco can break out of this any time they like with a nice PR blitz of the kind Cisco knows well.  If they do, they redefine IoT forever and make fools out of Street skeptics.  Cisco does have a chance to redefine itself with Jasper, but it’s going to take some significant work, and it’s far from clear that Cisco’s prepared to commit the resources.

However, competitors like Nokia or HP could jump out and articulate a true two-tier strategy before Cisco got around to it, which would then force Cisco to say “Me too!” or heap disdain on the two-tier model and bet on everything-on-mobile-broadband.  Since that can’t possibly penetrate the IoT opportunity as far or as fast, Cisco would be behind in its own market.  And GE Digital might take a more aggressive stance and preempt all these guys.  Or everyone might sit on their hands, in which case we’ll have a very long delay before we see an IoT revolution.

“Only the brave deserve the fair,” was the saying of old.  We’ll see who’s going to be brave here.

Could Public Policy Delay or Derail the “Carrier Cloud?”

When I did a survey of operator priorities back in 2013, the top of the list was mobile broadband (88% of operators listed it) and second with 86% was cloud computing.  At the end of 2015, mobile broadband’s score was almost identical but cloud computing had fallen astonishingly—to only 71%.  Nothing in the history of my surveys ever sunk that far that fast.  Is this another example of operators’ having little understanding of the future, or is there more to it?

Cloud computing got to its dizzy heights of operator strategic focus primarily because it was hot in the media.  The most important thing to remember about network operators is that, as a class of business, they are behind the pack in terms of strategic marketing and proactive sales.  The days when they were nothing but order-takers are gone, but they’re not that far in the past and their shadow still falls on the organizations today.  The smart thing for an organization who can’t develop opportunity to do is to look to exploit it.  That’s why cloud was hot.

If you’d asked people what percentage of IT spending would be directed at public cloud services in 2015 back in 2013, you’d have gotten numbers ranging from a low of 22% to a high of almost half.  Actually the number is about 6% even today, and that includes money that might otherwise have gone to simple web hosting.  One reason the operators lost some cloud enthusiasm was that real customer contact was telling them that the public perception of cloud adoption was too optimistic, but that wasn’t the only reason.

The big advantage operators would have in the cloud is their lower expectation for ROI.  While companies like Google want ROIs in the 20% or higher range, most operators are happy to get 15%.  If you presume that cloud computing adoption is purely cost-driven, which it is in today’s market, then the guy who can invest at the lowest ROI—all other factors being equal—wins in a price war and thus in the marketplace.
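
To put a rough number on that advantage, here’s a trivial Python calculation; the 20% and 15% ROI targets come from the paragraph above, while the capital figure is purely hypothetical:

  # Why a lower ROI target matters in a cost-driven price war.
  capital_invested = 100_000_000   # hypothetical cloud build-out

  def required_annual_return(capital, roi_target):
      return capital * roi_target

  ott_return = required_annual_return(capital_invested, 0.20)       # a Google-class ROI target
  operator_return = required_annual_return(capital_invested, 0.15)  # a typical operator target

  print(f"OTT must clear:      ${ott_return:,.0f} per year")
  print(f"Operator must clear: ${operator_return:,.0f} per year")

On the same asset base, the operator needs five million dollars a year less in return, which in a purely cost-driven market is the whole ballgame.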

Nobody wants to win a market by commoditizing it, which means that operators really want to have differentiating features.  For IaaS, the dominant cloud model today, that’s a tough row to hoe.  Some have suggested that operators could position their server assets locally, which would give them better performance, or that they could use premium connectivity to achieve that same goal.  Those notions seem logical but they ignore the fact that the big operators with the best cloud credibility are highly regulated.

Regulations complicate a lot about network services because they impose special obligations and restrictions on the major-market incumbents for basic telecommunications services.  Generally, cloud computing is considered an information service, and generally it’s got to be offered by a separate subsidiary rather than the regulated entity.  This creates issues when it comes to sharing real estate, which is a factor in forward placement of assets.  Regulatory challenges vary by country, though, and are most likely to impact the developed markets.  In the emerging markets this is less of a problem, which is why Tier Two and Three interest in the cloud has dropped only slightly in contrast to the big decline among Tier Ones.

The next problem operators report in fulfilling their early cloud hopes is the challenge in building the resource pool.  Cloud computing, as a cost-driven market, is critically sensitive to the cloud provider economy of scale, and that’s particularly true where you want to target enterprises.  Since most operators would have difficulty in targeting lower-end market segments, this means that building a resource pool is a prerequisite for market entry.  That makes it a big first-cost risk.  You spend a lot to build your cloud data center(s) and then you hope you can fill them with paying customers.

When NFV came along, which was about the time that hopes for carrier cloud were highest, a lot of operators saw it as a way of creating a resource pool in advance of having to sell cloud services and with lower risk.  There would be, for most major-market providers at least, a need to dance through regulatory hoops to avoid problems, but if you could harness NFV to build resource pools that would then be available as an element of cloud services, you’d have something great.  Now, of course, we’re seeing the greatest level of NFV success in CPE-hosted VNFs, which don’t build cloud resource pools at all.

We could build resource pools by hosting vCPE functions centrally, but it seems likely that this will create a bootstrapping challenge.  Where there’s a good density of vCPE prospects, you could expect to host functions centrally rather quickly if you could ramp up service sales quickly.  If you were substituting vCPE for fixed devices, you might even be able to skip the premises-hosting step if you could sign on a critical mass of customers then deploy a data center to support them.  But the time lag is significant, which means that as a practical matter you’d need to start with premises-hosted vCPE.  Then you have to decide if there’s enough benefit to move to central hosting, which again would depend on customer density.

Even if you can justify feature hosting, some operators are concerned about the regulatory implications.  Remember that services bundled with CPE were declared to be anti-competitive in some markets long ago.  Can operators host CPE features, then, without making that capability open to competitors?  And as you climb up from “firewall” to something that looks like a server, including DNS and DHCP, are you now entering the realm of information services?  Who knows?

This all illustrates the challenge associated with efficient resource-pool building.  It’s particularly acute for vCPE and mobile infrastructure, two of our NFV missions, because these applications are obviously linked to services that are considered to be “regulated”.  The implications of sharing resources between these services and information services, as I’ve noted, are worrying, and that might make the IoT mission for NFV and the IoT service target for operators critically important.

If IoT is nothing but connectivity, as we tend to think of it today, then it’s not really much about NFV either.  At least, it’s not about NFV any more than any other connectivity service.  If, on the other hand, IoT is really big data and cloud analytics and so forth (as I believe it has to be to succeed) then it is an information service and a candidate to build resource pools that virtually any operator could leverage to offer cloud computing services.

IoT is an example of what I’ve called the “agent model” for services.  You have an agent that’s responsible for browsing through information and calling for analytics.  Something (the user, an application, whatever) communicates with an agent to secure answers, and the agent then does its thing.  This separates user-to-agent connectivity and services (which are clearly “traditional” in all ways) and agent-to-processes-and-data connectivity and services, which seem to be something different.  Is this cloud-like stuff now an information service, and thus in the domain of a subsidiary, or is it something like a CDN and thus exempt from even neutrality regulations?  Who knows?
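
A minimal Python sketch of that separation, with hypothetical names, just to show where the two legs (and perhaps the regulatory line) fall:

  # Agent-model sketch: the user talks only to the agent; the agent talks
  # to data and analytics over a separate, non-user-visible path.
  class AnalyticsBackend:
      """The agent-to-processes-and-data side: repository plus analytics."""
      def query(self, question):
          # A real system would hit a big-data store and analytics here.
          return {"question": question, "answer": "derived from stored sensor history"}

  class ServiceAgent:
      """The only element the user (or user application) ever talks to."""
      def __init__(self, backend):
          self._backend = backend   # private connectivity, not user-visible

      def ask(self, question):
          # User-to-agent leg: a traditional connectivity service.
          # Agent-to-backend leg: arguably an "information service".
          return self._backend.query(question)

  agent = ServiceAgent(AnalyticsBackend())
  print(agent.ask({"topic": "traffic", "where": "Main St & 3rd"}))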

Among operators, some worry about this stuff and some don’t.  Even inside individual operators, I’ve found that the CTO-level people don’t think about this much and may even be planning stuff that would give the Regulatory Affairs types some radical gastric distress.  At the least, these questions illustrate the fact that our technology aspirations may have to wait a bit till public policy and regulations catch up.

Four Missions for NFV and How They’d Impact Deployment

In my blog yesterday I looked at how an optimum evolution of physical and logical networking would impact SDN and NFV.  I also pointed out that this kind of radical futuristic outcome would be difficult to bring about.  That raises the logical question of just where NFV (in particular) really is today, and where it’s going.  This year is critical for NFV; if it can prove itself then it can have a big play in operators’ attempt to restore profit-per-bit.  How’s it going?  We’ll look at missions and benefits to find out.

The clearest NFV mission in terms of trial activity and market interest is virtual CPE (vCPE).  This is also the purest application of the NFV mission articulated in the first NFV white paper in the fall of 2012.  Virtual CPE means converting CPE that currently has embedded high-level service features (firewall, NAT, DHCP, etc.) to a model where these features are hosted rather than embedded.

While the mission is clear here, the details are murky.  For example, many CPE service-termination devices today have firmware that can be updated and features that can be enabled and disabled.  Many CPE devices also have essential service features that can’t be virtualized; home broadband gateways have WiFi, for example.  There have been dozens of cost/benefit and business case analyses published for vCPE, but I have to say that I don’t accept the numbers for any that I’ve seen.  The fact is that we don’t have a good handle on just what the differential cost for a CPE device that can host features might be, relative to the cost of a custom appliance of the type we use today.

One factor that makes vCPE particularly difficult to assess is the variability in the hosting model.  The majority of vCPE offerings and trials focus not on cloud-hosting the features we’ve extracted from appliances, but on hosting them locally on a generalized device.  Thus, we really don’t have virtualized CPE as much as feature-elastic CPE.  Can we build a box that lets you load features remotely for less than we’d build one based on updatable firmware?  Should we see premises hosting of VNFs as just a first-cost stepping-stone to cloud hosting?  If so, how do we get to that future?  If not, does what we’ve created really deserve to be called NFV in the first place?  There’d be no resource pool created by vCPE then; nothing to exploit for future services.

The next clearest mission for NFV is mobile infrastructure.  What makes mobile infrastructure a potentially compelling opportunity is that this is where most of the capex is going and where most of the network changes being planned are concentrated.  It’s far easier to redo the technology of something you’re replacing anyway than to replace something just to redo the technology.  5G evolution in particular offers a critical opportunity because it’s going to happen, it’s probably going to coincide with the critical revenue/cost-per-bit crossover in 2017, and it will drive enough network change to allow new technologies to gain a substantial footprint—if they can prove themselves out.

One question that mobile infrastructure raises is whether a new mobile model based on overlay networking (logical connectivity overlaid on arbitrary physical infrastructure) could be proved out here.  A mobile user is in many senses a candidate for a logical-connectivity service.  A user would have a fixed logical address and that address would be mapped for delivery to the cell or hotspot the user happened to be in.  Might we transform not only Evolved Packet Core (EPC) with this, but also WiFi offload, home WiFi, and maybe even home broadband?  Could the mobile model be extended to what we’d normally think of as fixed or wireline devices?
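
As a sketch of what that logical-address mapping might amount to (the identifiers and attachment points below are invented for illustration), the core of such an overlay is little more than a re-homing table:

  # Logical/physical separation for mobile users: a fixed logical address
  # is re-homed to whatever cell or hotspot the user is attached to now.
  class MobilityMapper:
      def __init__(self):
          self._attachment = {}   # logical address -> current cell or hotspot

      def update_location(self, logical_addr, attachment_point):
          # Called when the user moves: re-home the logical address.
          self._attachment[logical_addr] = attachment_point

      def resolve(self, logical_addr):
          # Called per delivery: where does traffic for this user go right now?
          return self._attachment[logical_addr]

  mapper = MobilityMapper()
  mapper.update_location("user:alice", "cell:lte-4471")
  mapper.update_location("user:alice", "hotspot:wifi-home")   # WiFi offload re-homes her
  print(mapper.resolve("user:alice"))    # traffic now delivered via the home hotspot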

Mobile sounds like the hot corner for NFV, but there are issues.  If anything is proposed that would alter the way devices themselves see the service, the standards changes needed would take so long that they would make geological timescales look positively hasty.  Some of the most compelling mobile infrastructure opportunities might be stalled by this challenge.  There’s also the problem that any new mobile network is going to have to support all the current phones, many of which (like my own) aren’t even 4G.  No operator is going to kick a double-digit percentage of their customers off just to transform.

The third NFV mission is virtual networking, where NFV is used to deploy instances of switches and routers to build a virtual topology whose trunks are tunnels (created by whatever is convenient, but likely evolving to SDN).  This mission could offer enormous opportunity because if it were managed optimally it could displace all of the current L2/L3 devices in the carrier network, most of them with virtual instances created by loading software onto server pools.

This is the NFV mission that generates the radical changes, the huge increase in the number of data centers and servers, and the shift of focus of carrier capex from L2/L3 boxes to servers and software.  The greatest potential change in costs and benefits occurs with this mission, and of course the greatest risk.  You have to figure out how to migrate gracefully to this new virtual state, and that means both having an understanding of what the end-game would look like and having a graceful evolutionary pathway to get there.

The last of the NFV missions is the most hypothetical—NFV as the platform for IoT-related services.  If IoT develops to its full potential, it’s less about connecting sensors via 5G and IPv6 than it is about creating a secure and comprehensive repository of sensor data to be made available to applications and control functions.  This application-and-control mission is very cloud-like but with the kind of SLA needs and dynamism that service features deployment would have, and that justify (maybe even compel) NFV use.

The challenge here is obvious.  Nobody thinks about IoT this way and the other model, which is just a bunch of expensive promiscuous sensors stuck on the Internet, has so many economic, regulatory, and legal pitfalls to address that it would end up being nothing more than wearable technology and connected cars.

To my mind, these four missions define the benefit boundaries of NFV, and thus the extent to which it could actually hope to deploy.  Here we have four possible scenarios:

  1. NFV is an intellectual exercise. This scenario comes about if the only NFV success we can point to in 2017 is the premises-hosted model of vCPE.  With no compelling need for a resource pool, this doesn’t develop any assets-in-place to exploit with future services, and so NFV contracts to a CPE software management strategy that impacts, probably, only carrier Ethernet services and customers.
  2. NFV lite. This scenario comes into play if we get either enough vCPE central hosting or a combination of vCPE and mobile infrastructure deployment to build a small resource pool—perhaps a data center in each metro area.  NFV would not account for more than about 8% of carrier capex under this scenario.
  3. Respectable NFV comes along if we get virtual networking for private network services along with vCPE and mobile infrastructure. In this scenario we’d see NFV represent about 15% of carrier capex over time.
  4. Optimum NFV is achieved if we get a logical IoT mission for NFV, or if the other three missions all succeed in an optimal way. If we get this, we end up with NFV representing about 27% of carrier capex as a steady state, and if we got all four missions to succeed we’d hit 35% of carrier capex.

It’s pretty obvious that the likelihood of these outcomes is highest for the early ones on the list above, and lower for the later ones.  The biggest issue is that of benefits, where two things would contribute to greater NFV penetration.  First, a solid service management vision that covered everything from top to bottom would accelerate the operations efficiency and agility benefits.  Second, a vision of the future network to which operators could track evolution would reduce risks and build a commitment to something more than a few diddles and dabs.

Operators are still grappling with the question of how to improve profit per bit.  I think the trend globally is to take a holistic view of services and infrastructure, and that means that even a tentative commitment to NFV has to be linked to a broader vision of transformation.  The thing that started to change in 2015 and is surely changing in 2016 is that operators are demanding more than just claims of cost reduction or revenue improvement.  I think those demands can be met, but I also think that the demands will transform the NFV vendor landscape over this year.

The Ultimate Future Infrastructure: Do We Want It?

What exactly might a next-generation network look like?  If we actually try to answer that question it becomes clear that there are two parallel visions to contend with.  On the physical side, we have to build the network with fiber transport and something electrical above it, whether it’s a set of special-function devices or hosted software.  On the “logical” side, it’s probably an IP or Ethernet network—something narrow like a VPN or VLAN or broad like the Internet.  The fact that these sides exist may be troubling in a sense, promising more complications, but those sides are the essence of what “next-generation” is really about, and they may dictate or be dictated by how we get there.

The simplest approach to our network dualism was promised years ago by Nicira.  You build an overlay network, something that has its own independent nodes, addressing, and forwarding, and that uses arbitrary physical-network underlayment as the transport framework only.  Connectivity is never managed in that underlayment, and so every piece of transport infrastructure can be optimized for the mission represented by its own place and time in the picture: what’s best for here and now.

This is a good way to start NGN discussions because it shows the contrast that virtualization can bring.  If everything is virtual, then whatever does the hosting is just plumbing, and all the service value migrates upward into the virtualization layer.  This is a sharp contrast with networking today, where services are properties of the infrastructure deployed.  That tight binding between services and infrastructure is what makes network investment look like sunk cost any time a major shift in the market happens.  We’re not at the point of full virtualization yet, in no small part because the technology shifts on the horizon have limited goals and don’t take us to the future except in combination.  But if the future is where we’re going, then we should consider it now to be sure we’re heading in the right direction with our short-term steps.

In our virtual-service world, the primary truth we have to address is how hosted/virtual nodal elements can handle the traffic of the virtual networks they support.  I can build a VPN or VLAN using independent software router/switch instances hosted on standard servers, and in fact build it in a number of ways, but only if I’m sure that the traffic load can be managed.  This suggests that we could use overlay virtualization for services other than the Internet, but probably not for the Internet itself unless we could re-envision how “the Internet” works.

There are two primary models for a “connection network” of any sort.  One model says that you have a tunnel mesh of endpoints, meaning that everything has a direct pipe to everything else.  While this doesn’t scale to Internet size, it works fine for private LANs and VPNs.  The other model says that you can look at the traffic topology of the endpoints (what gets exchanged with whom) and then place nodes at critical points to create aggregate trunks.  This mechanism lets you reduce the number of tunnels you have to build to get full connectivity.  It’s how L2/L3 networks tend to be built today, in the “real” as opposed to “virtual” age.

Would Internet-scale traffic kill this model?  It depends on how many virtual pipes you’re prepared to tolerate.  If I can host router instances anywhere I like, I could build a multi-layer aggregation network from the user edge inward, to the point where I’d aggregated as much traffic as a hosted instance could carry.  At that point I’d hop on a set of tunnels that would link me to all the other instances at the same level, so no deeper level of aggregation/routing would be required and no software instance would have to handle more traffic than my preset maximum.
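
To see why aggregation matters, here’s a back-of-the-envelope calculation, assuming (purely for illustration) a million edge endpoints and hosted instances that can each terminate the traffic of a thousand endpoints.  The numbers are hypothetical; the point is the scaling contrast between a full mesh and an aggregated core.

```python
# Illustrative arithmetic for the two connection models.
endpoints = 1_000_000
per_instance_limit = 1_000   # assumed max endpoints' traffic one hosted instance can carry

# Model 1: full tunnel mesh of endpoints
full_mesh_tunnels = endpoints * (endpoints - 1) // 2

# Model 2: aggregate at the edge, then mesh only the aggregation instances
instances = -(-endpoints // per_instance_limit)      # ceiling division
core_mesh_tunnels = instances * (instances - 1) // 2
edge_tunnels = endpoints                             # one tunnel per endpoint to its aggregator
aggregated_total = core_mesh_tunnels + edge_tunnels

print(f"full mesh:  {full_mesh_tunnels:,} tunnels")  # roughly 500 billion
print(f"aggregated: {aggregated_total:,} tunnels")   # roughly 1.5 million
```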

We could build something like this.  Not with physical routes because they’d be inefficient, but if we had virtual pipes that had effectively no overhead we could fan out as many such pipes as we liked.  This is an important point in NGN design because it says that we’d probably need white-box-SDN tunnel switching above optical-layer transport to groom out enough virtual pipes to keep the traffic per router instance down to a manageable level.  We’d then push the switching/routing closer to the edge and mesh the instances we place there.

This also suggests a new model for the tunnel-core we’ve now created.  At the edge of that element we find something that might be considered the core network of what used to be called an “NBMA” or “non-broadcast multi-access” network.  This concept was developed in the old days of ATM and frame relay, with their switched virtual circuits.  The goal was to let a switched pathway augment an edge route by creating a passage across a multi-access core whose “endpoints” might not all have enough traffic with each other to justify a tunnel.

Suppose we used NBMA concepts with our tunnel-core?  We could create the tunnels between our inner-edge aggregating instances when traffic came along, and then leave them in place forever or until they aged out based on some policy.  A directory would let each of these aggregation-edge instances find its partners if it had traffic to route.  If we presumed these aggregation-edge elements were white-box SDN switches, modified as needed or given controller intelligence to provide NBMA capability, we could extend aggregation deeper than with hosted router instances alone.
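
A rough sketch of that NBMA-style behavior, under the assumption of a simple directory lookup and an idle-timeout aging policy, might look like this.  The class, the policy, and the identifiers are all hypothetical; a real implementation would program SDN tunnels rather than print messages.

```python
import time

class TunnelCore:
    """Sketch: tunnels between aggregation-edge instances, built on demand, aged out by policy."""
    def __init__(self, directory, idle_timeout_s=3600.0):
        self.directory = directory        # destination prefix -> far-end aggregation edge
        self.idle_timeout_s = idle_timeout_s
        self.tunnels = {}                 # (local_edge, remote_edge) -> last-used timestamp

    def forward(self, local_edge, dest_prefix):
        remote_edge = self.directory[dest_prefix]    # directory lookup for the partner edge
        key = (local_edge, remote_edge)
        if key not in self.tunnels:
            print(f"setting up tunnel {local_edge} -> {remote_edge}")
        self.tunnels[key] = time.monotonic()         # refresh last-used time
        return remote_edge

    def age_out(self):
        now = time.monotonic()
        stale = [k for k, t in self.tunnels.items() if now - t > self.idle_timeout_s]
        for local_edge, remote_edge in stale:
            print(f"tearing down idle tunnel {local_edge} -> {remote_edge}")
            del self.tunnels[(local_edge, remote_edge)]

core = TunnelCore({"10.2.0.0/16": "agg-edge-east"}, idle_timeout_s=1800)
core.forward("agg-edge-west", "10.2.0.0/16")         # first traffic triggers tunnel setup
```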

This is the model that seems to have the most promise for an NGN, because it’s a model that could scale from private networks to the public Internet.  It also demonstrates the role for various new-age network technologies: SDN is the tunnel-manager, NFV is the instance-deployer.  We could build the network of the future with our two key technologies using this model.  Will we?  That could be more difficult.

The problem with this sort of far-out network planning is that it presumes a future architecture that’s accepted by operators and supported by vendors, when in truth we have neither of those things in place.  The fact is that the greenest of all the network fields, the time at which a transition from TDM was essential and operators accepted that, has passed.  We have exited the network revolution and entered the evolution.  Evolution, as we know, isn’t always the most logical driver of change.  We may take optimum steps that don’t add up to an optimum result.

I’m not sure how we could get back to a revolutionary mindset.  We might evolve into some backwater position so unwieldy that mass change looked good by comparison, or some new player might step in and do the right thing.  Even one operator, one vendor of substance, could change our fortunes.

Do we want that?  That’s the last question.  What would this future network do for us?  How much cheaper could it be?  Would its greater agility to generate totally new services be valuable in a market that seems to have hunkered down on Ethernet and IP?  Those are difficult questions to answer, and that’s too bad, because whether we think we’re approaching the future in a single bound or in little baby steps, it’s what’s different about it that makes it exciting.

Some Hidden Truths About Service Automation

The number one issue with making an NFV business case is that of service automation.  Any talk about “operations efficiency” or “service agility” that doesn’t start with the assumption that service activity is automated is just a waste of breath.  In addition, if we don’t have very precise notions of how we’d manage the incremental aspects of deployment and ongoing operation for VNF-based services, we can’t argue these services would reduce TCO relative to current device-based approaches.

While service automation is hardly a hidden issue, there are hidden aspects to its implementation, both in the way standards address it and in the way vendors do.  It’s easy to claim service automation success or management completeness when there’s nothing to measure either claim against.  What I propose to do here is to offer some reference points, things that have been there all along but not discussed to any extent.

First and foremost, you cannot do service automation if you can’t tie processes to service events.  Automation means taking software-driven steps instead of human steps, in response to a condition requiring action.  Those conditions are signaled by events.  Automated platforms for service and network management have to be event-driven, period.  This is why, back in early 2013, I said that the foundation of the CloudNFV initiative I launched was the TMF’s NGOSS Contract/GB942.  That was the first, and is still the most relevant, of the specifications that provided for event-steering to processes via a data model.

The notion of handling events, of being “event-driven” as it’s often called, has been on the list of OSS/BSS changes users have wanted to see for years, perhaps more than a decade.  It’s been in the TMF lexicon for at least seven or eight years, but my friends in the TMF tell me that implementations of the approach are rare to non-existent.  That’s a shame, because the NGOSS Contract notion should be the foundation of modern network operations and management.

Second, being event-driven means recognizing the notion of state or context in a service-related process.  Events have to be interpreted based on what’s expected, and the notion of expectation is reflected in a “state”.  A service that’s ordered but not yet deployed is in the “Ordered” state, for example, and so an “Activate” event could be expected to deploy it.  An “Order” event, in contrast, would be a process error.

What makes state complicated is that services are made up of a bunch of related, connected functions.  A VPN has a central VPN service and a bunch of access services, one for each endpoint.  You have to recognize that there is a “state” for each of these elements, because they all have to be managed independently first, then collectively.  Thus, we have to visualize a service as a structure at a high level, broken down into lower-level structures, and recognize that each of these has to have “state” and “events” that then have to be associated with service processes.
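
A minimal sketch of event-steering via a per-element state/event table, in the spirit of the NGOSS Contract approach described above, could look like the following.  The states, events, and process names are illustrative only, not drawn from any specification or product.

```python
# Each service element carries its own state and a table that steers
# (state, event) pairs to named processes.  Names are illustrative.
ORDER_TABLE = {
    ("Ordered",   "Activate"): "deploy_service",
    ("Deploying", "Ready"):    "mark_active",
    ("Active",    "Fault"):    "remediate",
    ("Active",    "Cease"):    "tear_down",
}

class ServiceElement:
    """One piece of a service (the VPN core, one access leg, and so on)."""
    def __init__(self, name, table):
        self.name, self.state, self.table = name, "Ordered", table

    def handle(self, event):
        process = self.table.get((self.state, event))
        if process is None:
            print(f"{self.name}: '{event}' unexpected in state {self.state} (process error)")
            return
        print(f"{self.name}: state {self.state}, event {event} -> run {process}")
        # a real implementation would dispatch the named process here
        self.state = {"deploy_service": "Deploying", "mark_active": "Active",
                      "remediate": "Active", "tear_down": "Ceased"}[process]

vpn_core = ServiceElement("vpn-core", ORDER_TABLE)
vpn_core.handle("Order")      # process error: the service is already ordered
vpn_core.handle("Activate")   # runs deploy_service, moves to Deploying
```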

The next service automation issue is that multiple services and multiple users/tenants cannot exist in the same address space.  Service A and Service B might both use exactly the same virtual functions, but they have to be separated from each other or neither users nor management systems can securely and reliably control both of them.

In the cloud, players like Amazon and Google launched their own virtual networking initiatives to ensure that tenants in the cloud could deploy and connect things as though those applications and users were alone in the world, even though they shared resources.  NFV should have taken that issue up from the first, but it’s still not described in a satisfactory way.  The logical answer would be to adopt the Amazon/Google model, which is based on the notion of a series of private address spaces (RFC 1918) and selective mapping of elements/ports from one such space to another (from tenant/service to NFV management, for example) and to public (user-visible) addresses.  But if this is going to be done, we need to know who assigns the addresses and how the multiple mappings are done.
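
Here’s a sketch of that per-tenant private-address-space idea, assuming each tenant/service gets its own RFC 1918 space and only selected elements/ports are mapped into another space.  The addresses and names are illustrative assumptions, not any cloud provider’s actual API.

```python
class TenantSpace:
    """One tenant/service with its own private address space and selective exposure."""
    def __init__(self, tenant, cidr):
        self.tenant, self.cidr = tenant, cidr
        self.elements = {}   # element name -> private address in this space
        self.exposed = {}    # (element, port) -> address visible in another space

    def add_element(self, name, private_addr):
        self.elements[name] = private_addr

    def expose(self, name, port, mapped_addr):
        """Selectively map one element/port out of the private space."""
        self.exposed[(name, port)] = mapped_addr

# Service A and Service B can reuse the same private addresses without colliding.
svc_a = TenantSpace("service-A", "10.1.0.0/16")
svc_b = TenantSpace("service-B", "10.1.0.0/16")
svc_a.add_element("firewall-vnf", "10.1.0.5")
svc_b.add_element("firewall-vnf", "10.1.0.5")
svc_a.expose("firewall-vnf", 161, "192.0.2.10")   # only this port is visible to management
```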

This problem is particularly incendiary in the area of management.  If a virtual function is to reflect its state through a management interface like SNMP (exposing a MIB, for example), then you’d have to let it see the state of the resources used to host it.  How do you let that happen without opening the management portals of the resources to the VNF?  If you do that, you risk at the least flooding resource management APIs with requests from thousands of VNFs, and at worst letting one VNF change the state of shared resources so that other users and VNFs are affected.  This problem needed to be fixed up front; it’s not fixed yet.

The fourth issue is that of brittle implementations of service descriptions or models.  Ideally, a service should be seen at two levels, the logical or functional level and the deployed or structural level.  This has been reflected in the TMF SID data model as “customer-facing” and “resource-facing” services.  The goal of this kind of explicit modeling is to prevent implementation dependencies at the deployment level from breaking service descriptions.  If you have a service in your catalog with structural references included, changes in the network or server pool could break the definition, making it impossible to deploy.
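
A sketch of that two-level idea, with the catalog entry holding only functional references and the binding to actual resources happening at deployment time, might look like this.  The class and field names are illustrative; this is not the TMF SID itself.

```python
from dataclasses import dataclass, field

@dataclass
class CustomerFacingService:      # what goes in the catalog: functional references only
    name: str
    functions: list               # e.g. ["vpn-core", "access-leg"]

@dataclass
class ResourceFacingService:      # built per deployment, never stored in the catalog
    function: str
    bound_resources: list = field(default_factory=list)

def deploy(cfs, resolver):
    """Bind each functional reference to whatever resources are available right now."""
    return [ResourceFacingService(f, resolver(f)) for f in cfs.functions]

# If the server pool changes, only the resolver changes; the catalog entry survives.
catalog_vpn = CustomerFacingService("business-vpn", ["vpn-core", "access-leg"])
deployment = deploy(catalog_vpn, resolver=lambda f: [f"host-pool/{f}/instance-1"])
```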

Implementation dependencies can also create the problem of “ships-in-the-night” or “colliding” remediation processes.  When something breaks in a service, it’s common to find that the failure causes other elements elsewhere to perceive a failure.  When a high-level piece of a service can see a low-level condition directly, it’s easy to find the high-level process taking action at the same time as other, lower-level automated activities are responding.  Fault correlation is difficult if you have to apply it across what should be opaque pieces of service functionality without risking this sort of collision.

A final aspect of “brittleness” is that if management processes are defined on a per-function basis (which is the ISG recommendation) then it’s very possible for two different implementations of a virtual function to present two different sets of management tools, and require different craft or automation processes to handle them.  The classic way to avoid this, which is to build service automation intelligence into the per-function management element, can create enormous duplication of code and execution overhead and still leave pieces of management processes inconsistently implemented.

The last of the service automation issues is the lack of an effective service modeling framework.  A service model that represents the natural hierarchy of service/resource elements can contain, per element, the management hooks and state/event tables needed for service automation.  We have no such model with NFV, even though that should have been done up front.  You can see that in the fact that the NFV ISG convened a conference of SDOs to address the need for a unified model.  Without a model, we have no framework in which process automation can operate, so everything would have to be custom-integrated in order to work.  Or it would have to be very limited in scope and in its ability to deliver benefits.

This is going to be a hard problem to solve because SDOs are essentially companies these days, and companies need profits.  SDOs get revenue from membership, events, and so forth, and this means there’s an incentive to jump into areas that are hot and would attract opportunities for new revenue.  So these bodies are now supposed to parcel out service modeling responsibility fairly?  Good luck with that.  Most of them also hold material internally, for members only, until the issues are fully resolved, which means nobody gets to comment on or see what’s evolving (unless, of course, they join and often pay).

The good news in the model sense is that there are only a few rules (which I’ve cited in other blogs) that a model would have to support to ensure that process automation strategies could be made portable even across model implementations.  You need processes to be defined as microservices, and you need each model element itself to be a microservice-implemented intent model.  But even these easy steps are yet to be taken in any organized way.  I know that ADVA/Overture, Alcatel-Lucent/Nokia, Ciena, HP, Huawei, and Oracle have the general tools needed to do what’s needed, but it’s hard to pull the detail from their public material.  Thus, it’s hard to say exactly how their service automation solutions would work in the real world.  Huawei, at the fall TMF event, offered the greatest insight into service automation, but it wasn’t clear how much of it was productized.
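
As a deliberately simplified illustration of those two rules, here’s a sketch of a model element implemented as an intent model whose event-handling processes are external microservices reached by URL rather than linked-in code.  The endpoint URL, class names, and SLA fields are hypothetical; this is not any vendor’s implementation.

```python
import json
from urllib import request

class IntentElement:
    """A model element exposing intent (SLA, state) while hiding its implementation."""
    def __init__(self, name, sla, process_endpoints):
        self.name, self.sla, self.state = name, sla, "Active"
        self.processes = process_endpoints   # event -> microservice URL

    def handle(self, event, detail):
        url = self.processes.get(event)
        if url is None:
            return                           # event not meaningful for this element
        body = json.dumps({"element": self.name, "state": self.state,
                           "sla": self.sla, "detail": detail}).encode()
        req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
        request.urlopen(req)                 # invoke the process as a microservice

# Swapping a vendor's remediation process means changing a URL, not the model.
access_leg = IntentElement("access-leg-nyc", {"availability": "99.99%"},
                           {"Fault": "https://ops.example.com/processes/remediate"})
```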

The real world, of course, is where this has to be proven out.  Almost every operator I’ve talked with tells me that they need a strong NFV trial that incorporates full-scope service automation by sometime in Q3 if they’re to deploy in quantity in 2017.  Time is getting shorter.  We already see operators thinking about broad modernization rather than specifically about NFV, and if NFV doesn’t address the critical service automation issues, then it may make up a smaller piece of operator deployment than many think, or hope.