Cisco’s Quarter: Are They Really Facing the Future at Last?

Cisco reported its quarterly numbers, which were still down in revenue terms, but it forecast the first revenue growth the company has seen in about two years.  “Forecast” isn’t realization of course, but the big question is whether the gains represent what one story describes as “providing an early sign of success for the company’s transition toward services and software”, whether it’s mostly a systemic recovery in network spending, or just a reshuffling of revenue among categories.  I think it’s a bit of everything.

Most hardware vendors have been moving to a subscription model for all their software elements, which creates a recurring revenue stream.  New software, of course, is almost always subscription-based, and Cisco is unusual among network vendors in having a fairly large software (like WebEx) and server/platform business.

Cisco’s current-quarter year-over-year data shows a company that’s still feeling the impact of dipping network equipment opportunity.  Total revenue was off 2%, infrastructure platforms off 4%, and “other products” off 16%.  Security was the big winner, up 8%, with applications up 6% and services up 1%.  If you look at absolute dollars (growth/loss times revenue), the big loser was infrastructure and the big winner was applications.

Here’s the key point, the point that I think at least invalidates the story that this is an “early sign of success” for the Cisco shift in emphasis.  Infrastructure platforms are over 57% of revenue as of the most recent quarter.  Applications are about 10%, Security about 5%, and Services about 25%.  Two categories of revenue—applications and security—that are showing significant growth combine to make up only 15% of revenue, and that 57% infrastructure platforms sector is showing a significant loss.  How can gains in categories that account for only 15% of revenue offset losses in a category that accounts for almost four times as much revenue?

Two percent of current revenues for Cisco, the reported year-over-year decline, is about $240 million.  To go from a 2% loss to a 2% gain, which is where guidance is, would require $480 million more revenue from those two gainer categories, which now account for about $1.8 billion in total.  Organic growth in TAM of that magnitude is hardly likely in the near term, and a change in market share in Cisco’s favor similarly so.  What’s left? [see note below]

The essential answer is M&A.  Cisco has a decent hoard of cash, which it can use to buy companies that will contribute a new revenue stream.  However Cisco classifies the revenue, getting about half a billion dollars more would give it everything it needs.  Cisco is being smart by using cash and M&A to diversify, to add products and revenue to offset what seems the inevitable diminution of Cisco’s legacy, core, products’ contribution.  So yes, Cisco is transforming, but less by a transition toward software and services than by the acquisition of revenues from outside.

It may seem this is an unimportant distinction, but it’s not.  The problem with “buying revenue” through M&A is that you easily run out of good options.  It would be better if Cisco could fund its own R&D to create innovative products in other areas, but there are two problems with that.  First, why would an innovator in another “area” want a job with Cisco?  Cisco probably has experts in its current focus areas, which doesn’t help if those areas are in perpetual decline.  Second, it might take too long; if current infrastructure spending (at 57% of revenue) is declining at a 4% rate, Cisco’s total revenue will take roughly a two-and-a-quarter-percent hit.  To offset that in sectors now representing 15% of revenue, Cisco would need gains there of about 15%, right now.  That means that at least for now, Cisco needs M&A.
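
Roughly, using the rounded category shares above (a back-of-envelope check of the numbers, not Cisco’s own math), the infrastructure decline takes about 2.3% off total revenue, and offsetting that from categories making up 15% of revenue requires those categories to grow about 15%:

\[ 0.57 \times 4\% \approx 2.3\%, \qquad \frac{2.3\%}{15\%} \approx 15\%. \]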

Most of all, it needs a clear eye to the future.  You can’t simply run out to the market and look for people to buy when you need to add something to the bottom line.  The stuff you acquire might be in at least as steep a decline as the stuff whose decline you’re trying to offset.  If you know where things are going you can prevent that, and you can also look far enough out to plan some internal projects that will offer you better down-line revenue and reduce your dependence on M&A.

Obviously, it’s not easy to find acquisitions to make up that needed $350 million.  Cisco would have to be looking at a lot of M&A, which makes it much harder to pick out winners.  And remember that the losses from legacy sectors, if they continue, will require an offset every year.  A better idea would be to look for acquisitions that Cisco could leverage through its own customer relationships, and that would represent not only that clear current symbiosis but also future growth opportunity.  That kind of M&A plan would require a whole lot of vision.

Cisco has spent $6.6 billion this year on the M&A whose prices have been disclosed, according to this article, of which more than half was for AppDynamics.  Did that generate the kind of revenue gains they need?  Hardly.  It’s hard to see how even symbiosis with Cisco’s marketing, products, and plans could wring that much from the M&A they did.  If it could, it surely would take time and wouldn’t help in the coming year to get revenues from 2% down to 2% up.

To be fair to Cisco, this is a tough time for vision for any network vendor, and a tough industry to predict.  We have in networking an industry that’s eating its heart to feed its head.  The Internet model under-motivates the providers of connectivity in order to incentivize things that consume connectivity.  Regulations limit how aggressively network operators could elect to pursue those higher-layer services, which leaves them to try to cut costs at the lower level, which inevitably means cutting spending on equipment.

That which regulation has taken away, it might give back in another form.  The FCC will shortly announce its “end of net neutrality”, a characterization that’s fair only if you define “net neutrality” much more broadly than I do, and also assume that the FCC was the right place to enforce real net neutrality in the first place.  Many, including Chairman Pai of the FCC, think that the basic mission of preventing discrimination and blocking that forms the real heart of net neutrality belongs in the FTC.  What took it out of there was less about consumer protection than OTT and venture capital protection.

The courts said that the FCC could not regulate pricing and service policy on services that were “information services” and explicitly not subject to that kind of regulation.  The previous FCC then reclassified the Internet as a telecommunications service, and the current FCC is now going to end that.  Whether the FCC would end all prohibitions on non-neutral behavior is doubtful.  The most it would be likely to do is accept settlement and paid prioritization, which the OTT players hate but which IMHO would benefit the ISPs to the point of improving their willingness to capitalize infrastructure.

What would network operators do if the FCC let them sell priority Internet?  Probably sell it, because if one ISP didn’t and another did, the latter would have a competitive advantage with respect to service quality.  Might the decision to create Internet QoS hurt business VPN services?  No more than SD-WAN will, inevitably.

Operators could easily increase their capex enough to change Cisco’s revenue growth problems into opportunities.  Could Cisco be counting on the reversal of neutrality?  That would seem reckless, particularly since Cisco doesn’t favor the step.  What Cisco could be doing is reading tea leaves of increasing buyer confidence; they do report an uptick in order rates.  Some of that confidence might have regulatory roots, but most is probably economic.  Networking spending isn’t tightly coupled to GDP growth in the long term (as I’ve said in other blogs) but its growth path relative to GDP growth still takes it higher in good times.

The question is what tea leaves Cisco is reading.  Their positioning, which is as strident as always, is still lagging the market.  Remember that Cisco’s strategy has always been to be a “fast follower” and not a leader.  M&A is a better way to do that because an acquired solution can be readied faster than a developed one, and at lower cost.  But fast following still demands knowing where you’re going, and it also demands that you really want to be there.  There is nowhere network equipment can go in the very long term but down.  Value lies in experiences, which means software that creates them.  I think there are players out there that have a better shot at preparing for an experience-driven future than any Cisco has acquired.

What Cisco probably is doing is less preparing for “the future” than slapping a band-aid on the present.  They are going to leak revenue from their infrastructure stuff.  The market is going to create short-term wins for other companies as the networking market twists and turns, and I think Cisco is grabbing some of the wins to offset the losses.  Regulatory relief would give them a longer period in which to come to terms with the reality of networking, but it won’t fend off the need to do that.  The future doesn’t belong to networking at this point, and Cisco has yet to show it’s faced that reality.

[The paragraph in italics had errors in its original form and is corrected here!]

MEF 3.0: Progress but not Revolution

We have no shortage of orchestration activity in standards groups, and the MEF has redoubled its own Lifecycle Service Orchestration (LSO) efforts with its new MEF 3.0 announcement.  The overall approach is sound at the technical level, meaning that it addresses things like the issues of “federation” of service elements across provider boundaries, but it also leaves some gaps in the story.  Then there’s the fact that the story itself is probably not completely understood.

Virtualization in networking is best known through the software-defined network (SDN) and network functions virtualization (NFV) initiatives.  SDN replaces a system of devices with a different system, one based on different principles in forwarding.  NFV replaces devices with hosted instances of functionality.  The standards activities in the two areas are, not surprisingly, focused on the specific replacement mission of each.  SDN focuses on how forwarding substitutes for switching/routing, and NFV on how you make a hosted function look like a device.

The problem we’ve had is that making a substitution workable doesn’t make it desirable.  The business case for SDN or NFV is hard to make if at the end of the day, the old system and the new are equivalent in every way, yet that’s the “replacement” goal each area has been pursuing.  Operators have shifted their view from the notion that they could save enough in capital costs by the change to justify it, to the notion that considerable operational savings and new-service-opportunity benefits would be required.  Hence, the SDN and NFV debates have been shifting toward a debate on service lifecycle management automation (SLMA).

Neither SDN nor NFV put SLMA in-scope for standardization, which means that the primary operations impact of both SDN and NFV is to ensure that the opex and agility of the new system isn’t any worse than that of the old.  In fact, NFV in particular is aiming at simple substitution; MANO in NFV is about getting a virtual function to the state of equivalence with a physical function.  It’s the lack of SLMA capability that’s arguably hampering both SDN and NFV deployment.  No business case, no business.

The MEF has taken a slightly different approach with its “third network”, and by implication with MEF 3.0.  The goal is to create not so much a virtual device or network, but a virtual service.  To support that, the LSO APIs are designed to support “federated” pan-provider control of packet and optical elements of a service, and also the coordination of higher-layer features (like security) that are added to basic carrier Ethernet.

There are three broad questions about the MEF approach.  First is the question of federation; will the model address long-standing operator concerns about it?  Second is the question of carrier-Ethernet-centricity; does the MEF really go far enough in supporting non-Ethernet services?  Finally, there’s the overarching question of the business case; does MEF 3.0 move the ball there?  Let’s look at each.

Operators have a love/hate relationship with federation, and I’ve worked for a decade trying to help sort things out in the space.  On one hand, federation is important for operators who need to provide services larger than their own infrastructure footprint.  On the other, federation might level the playing field, creating more competitors by helping them combine to offer broader-scope services.  There’s also the problem of how to ensure that federation doesn’t create a kind of link into their infrastructure for others to exploit, by seeing traffic and capacity or by competing with their own services.

Facilitating service federation doesn’t address these issues automatically, and I don’t think that the MEF takes substantive steps to do that either.  However, there is value to facilitation, and in particular for the ability to federate higher-layer features and to integrate technology domains within a single operator.  Thus, I think we can say that MEF 3.0 is at least useful in this area.

The second question is whether the MEF goes far enough in supporting its own notion of the “third network”, the use of carrier Ethernet as a platform for building services at Level 3 (IP).  I have the launch presentation for the MEF’s Third Network, and the key slide says that Carrier Ethernet lacks agility and the Internet lacks service assurance (it’s best-efforts).  Thus, the Third Network has to be agile and deterministic.  Certainly, Carrier Ethernet can be deterministic, but for agility you’d have to be able to deliver IP services and harmonize with other VPN and even Internet technologies.

While the basic documents on MEF 3.0 don’t do much to validate the Third Network beyond claims, the MEF wiki does have an example of what would almost have to be the approach—SD-WAN.  The MEF concept is to use an orchestrated, centrally controlled, implementation of SD-WAN, and they do define (by name at least) the associated APIs.  I think more detail in laying out those APIs would be helpful, though.  The MEF Legato, Presto, and Adagio reference points are called out in the SD-WAN material, but Adagio isn’t being worked on by the MEF, and as a non-member I’ve not been able to pull the specs for the other two.  Thus, it’s not clear to me that the interfaces are defined enough in SD-WAN terms.

Here again, though, the MEF does something that’s at least useful.  We’re used to seeing SD-WAN as a pure third-party or customer overlay, and usually only on IP.  The MEF extends the SD-WAN model both to different technologies (Ethernet and theoretically SDN, but also involving NFV-deployed higher-layer features), and to a carrier-deployed model.  Another “useful” rating.

The final point is the business-case issue.  Here, I think it’s clear that the MEF has focused (as both SDN and NFV did) on exposing service assets to operations rather than on defining any operations automation or SLMA.  I don’t think you can knock them for doing what everyone else has done, but I do think that if I’ve declared SDN and NFV to have missed an opportunity in SLMA, I have to do the same for the MEF 3.0 stuff.

Where this leaves us is hard to say, but the bottom line is that we still have a business-case dependency on SLMA and still don’t have what operators consider to be a solution.  Would the MEF 3.0 and Third Network approach work, functionally speaking?  Yes.  So would SDN and NFV.  Can we see an easy path to adoption, defined and controlled by the MEF itself?  Not yet.  I understand that this sort of thing takes time, but I also have to judge the situation as it is and not how people think it will develop.  We have waited from 2012 to today, five years, for a new approach.  If we can’t justify a candidate approach at the business level after five years, it’s time to admit something was missed.

There may be good news on the horizon.  According to a Light Reading story, Verizon is betting on a wholesale SD-WAN model that would exploit the MEF 3.0 approach, and presumably wrap it in some additional elements that would make it more automated.  I say “presumably” because I don’t see a specific framework for the Verizon service, but I can’t see how they’d believe a wholesale model could be profitable to Verizon and the Verizon partner, and still be priced within market tolerance, unless the costs were wrung out.

We also have news from SDxCentral that Charter is looking at Nuage SD-WAN as a means of extending Ethernet services rather than of creating IP over Ethernet.  That would be an enhanced value proposition for the Third Network vision, and it would also establish that SD-WAN is really protocol-independent at the service interface level, not just in its support for underlay transport options.  This is the second cable company (after Comcast) to define a non-MPLS VPN service, and it might mean that this will be a differentiator between telco and cableco VPNs.

How much the MEF vision alone could change carrier fortunes is an issue for carriers and for vendors as well.  Carrier Ethernet is about an $80 billion global market by the most optimistic estimates, and that is a very small piece of what’s estimated to be a $2.3 trillion communications services market globally.  Given that, the MEF’s vision can succeed only if somehow Ethernet breaks out of being a “service” and takes a broader role in all services.  There’s still work needed to support that goal.

Are Fiber Network Players Really Playing Well Enough?

We are seeing more signs of the fiber challenge and opportunity, and more uncertainty about how it will play out, especially in terms of winners and losers.  Ciena continues to take sensible steps, Infinera continues to stumble, and making sense of these seeming contradictions is the challenging part of assessing fiber’s future.

It’s not like we don’t all know that fiber deployment has nowhere to go but up.  Wireless alone could double fiber in service by 2025, and there’s a lot of global interest in increasing the commitment to fiber access, especially FTTN combined with 5G.  The challenge for fiber network players like Ciena and Infinera is that they don’t sell glass, but systems, and the role of those systems in a fiber-rich future is much more difficult to determine.

Most network hardware includes fiber interfaces, so single-mission point-to-point or even multipoint connections don’t require the equipment fiber networking vendors offer.  What you need their gear for is building “fiber networks”, which are connective Layer-One structures that provide optical multi-hop paths, aggregation, and distribution of capacity.  If you’re a fiber vendor, you either have to focus on expanded applications of fiber networking, or you have to bet on expansion in the few areas of fiber deployment that are essentially point-to-point but do require or justify specialized devices.

Infinera seems to have taken the second option, talking more about things like subsea cables for intercontinental connection.  Yes, we’re likely to have more of that, but no, it’s not likely to be a huge growth opportunity.  Data center interconnect is another area that they’ve identified, and while surely the cloud will increase the need for that, it’s not exactly a household-scale opportunity.  Of the 7.5 million business sites in the US, for example, only about 150,000 represent any scale of data center, and my surveys say that only about 8,000 businesses even have multiple data centers.

Ciena has done a better job in positioning optical networking as a target, and focusing on what I think is the fundamental truth for optical network vendors—you need a connective, multi-hop, complex Layer One infrastructure opportunity if you want a market for discrete optical network products rather than just glass connected to the interfaces of electrical-layer stuff.  Even Ciena, though, may not be going quite far enough or being quite explicit enough.

It’s helpful here to look at the extreme case.  What would magnify the value of optical networking in its true sense?  Answer: diminution of electrical networking in the same sense.  Put the other way, the more connectivity we manage at the optical layer, the more the electrical layer looks like a simple edge function.

This is a clear description of what a combination of agile optics and “virtual wires” would be.  If Level 1 is fully connective (in virtual wire form), fully resilient in recovery from faults, and fully elastic in terms of capacity, then higher protocol layers are just the stuff that creates an interface and divides traffic up among the virtual pipes.  SD-WAN is a good example; if you’re going to build services on an overlay principle you’d achieve the lowest cost and simplest operation by overlaying them on the most primitive underlay you can build—a virtual wire.

Virtual wires can be distinguished from optical paths by the presumption that a virtual wire is a Level 1 element that carries traffic but doesn’t participate in any data-plane protocol exchange.  Optical paths can be viewed as an implementation option for virtual wires, but probably not one broadly applicable enough to fulfill their potential.  The problem is that everyone can’t have a piece of an optical pipe serving them; you need to have some electrical-layer tail connectivity that aggregates onto the higher-capacity optical routes.  That’s what Ciena just announced, with the notion of a packet-technology edge function.

“Edge” is important here, because the closer you can get a fiber network—even a “fiber network” that’s including electrical/packet tail connections—to the edge, the more you can absorb into it in terms of features, functions, and benefits.  That absorption is what increases the value of fiber networks, and networking, and raises the revenue potential for vendors in the space.

If we look at edge computing in abstract, it’s tempting to see connectivity requirements as nothing more than a greater number of DCI paths, because edge computing is computer and data center connection if considered on its own.  The thing is, we have to consider it in the context of what else is at the edge.

The majority of edge computing sites will be sites where telecom/cablecom and wireless services converge.  Think telco central office.  There is already considerable traffic in and out of these locations, much of which is concentrated using its own specific equipment.  Historically, the “metro network” was a network created with optics (SONET) and supported through on-ramp add-drop multiplexers that offered operators a way of clumping a variety of traffic sources onto fast fiber paths.  If edge computing comes along, it adds to the stuff that needs clumping, and could potentially further justify the notion of a separate optical-layer network.

Ciena and Infinera already have “metro network” products and strategies, and it seems to me that edge computing is effectively an update to these strategies, a way of providing virtual wires to extend optics, perhaps even virtual-wire services to end users.  Ciena talks about some of the specific value propositions for 10 and 100GigE to the edge, but they really should explore two issues.  First, how do you keep the various higher-speed packet interfaces that the future will demand from being realized as simple glass between boxes and not elements of an optical network?  Second, how can you turn packet-edge into service-virtual-wire?

Virtual private networks can be created without switches/routers in a variety of ways, all of which are likely to offer lower service costs and greater operator profits.  Even things like content delivery networks and mobile packet core can be built that way, and we’re already seeing examples of this.  The logical pathway for operators to achieve better profits is to use cheaper technology—both in capex and opex terms—to create services.  Virtual wires would be a good way to start, because they can link in with SD-WAN, with virtual switch/router instances, and even with NFV-hosted service elements.

Optical players like Ciena and Infinera have an opportunity to anticipate what is likely an inevitable shift in how services like these are created, but it’s not one that will be automatically realized.  Vendors have to sing their own song, and sing effectively, if they want their buyers to listen.  Ciena has taken more positive steps in this direction, but even they’re not quite where they need to be.  Infinera has some hard choices to face.

A good, and sadly deceased, friend of mine, Ping Pan, was an advocate of a virtual wire concept.  He was one of the architects of the IETF effort on “pseudowires”, in fact, and if we’d had all the mature thinking on the cloud, virtual switches, virtual routers, instances of forwarding software, and SD-WAN, that we now have, he’d have seen the connection.  Edge instances of forwarding processes can combine with virtual wires to create all but the largest-scale services.  Interestingly, he was working at Infinera during some of his work on pseudowires.  They should have listened.

Exploiting the Full Scope of IoT Opportunity

IoT has been contending for the title of most-hyped technology of our time, and at a recent T3C Summit event that cause got a big boost.  According to SDxCentral’s summary of a panel at the event, “…it makes sense that in the Internet of Things (IoT) boom, with its expected 20 billion to 50 billion connected devices by 2020, there’s money to be made by telcos.”  The title of the article characterizes this as a “multi-billion-dollar opportunity.”  Not necessarily, or even probably, unless you look way beyond the obvious.

IoT suffers, as most modern technology developments do, from “bracket creep”.  It gets good ink.  Therefore vendors append the IoT tag to anything that remotely resembles it.  Therefore there’s a constant advance in claims and stories that attract reporters and editors.  Therefore it gets good ink.  You get the picture.  So, yes, we may well end up with 20 to 50 billion connected devices by 2020, but my model says that far less than a tenth of 1% of those devices will be in any way visible to the telcos, much less earning them revenue.

The reason I’m harping on this is that we’re seeing another great market opportunity suffer from stupid positioning.  Any telco who thinks they’ll make their future numbers from IoT is not only doomed to disappointment on that specific opportunity, they’re probably overlooking the real opportunity in IoT.  The wrong focus is not only wrong, it usually precludes having the right focus, which is edge computing.

Another article, this one from Light Reading’s Carol Wilson, quotes the PCCW VP of Engineering as saying that “Competing in the digital services space doesn’t mean going up against web-scalers, it means doing edge-computing….It all comes back to FOG and edge cloud architecture.”  That’s the real point for sure, and IoT would surely be able to earn operators billions if they listened.

Operators have one unique advantage in the fog space—they have real estate.  There are about 70,000 edge offices of telcos worldwide, and another 30,000 deeper-sited offices, for a total of about a hundred thousand locations.  It’s tough to put a data center in the back of a truck and make all the necessary connections; you need permanent real estate, so operators have the space for about a hundred thousand incremental fog data centers without buying any new buildings.  Amazon, Google, and other OTTs don’t have that advantage, so it would make sense for operators to exploit their real estate assets.

This ties into IoT for two reasons.  First, IoT isn’t about on-the-Internet sensors at all, because the majority of sensors are designed to be used in private applications.  If we put those billions of connected devices directly on the Internet, we’d have billions of hacks and spoofs and spend tens of billions on security making it look like the devices weren’t really there at all.  The fact is that the model of IoT we’ll see dominating is one where the sensors are on a private network that might not even use Internet technology at all (home sensor networks typically don’t).  The sites where they’re located are already connected, so there’s zero revenue associated with connecting those sensors.

Where the revenue comes from is digesting, summarizing, and correlating sensor data.  As I’ve said in other blogs, nobody is going to succeed in IoT if every application has to deal with raw sensor data.  Apart from access, security, and sheer ability to field all the requests, it would be too much work to write something like that and the result would be so sensor-specific it would be brittle.  An army would be needed to keep it up to date.

A better approach would be to presume that there are trusted providers who subscribe to sensor information and do various cuts and splices to create insight.  For example, if there’s a sensor that records ambient temperature in a bunch of places, you could look at patterns of change to detect conditions that range from a sudden cold front to a power failure.  In traffic terms, you could assess traffic patterns at a high level and even predict when a mess of cars in one area was going to translate to a mess in another because of movement along predictable routes.  There are many, many, types of insight that could be drawn, and many applications that would want to take a stab at drawing it.
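
To make that concrete, here’s a minimal Python sketch of the kind of “digest-and-correlate” service a trusted provider might run at the edge.  The data layout, thresholds, and event names are my own invented illustration, not anything a real IoT platform defines; the point is that subscribers buy the insight, not the raw readings.

    # Hypothetical edge "insight" service: subscribers get conclusions, not raw sensor data.
    def correlate_temperatures(readings_by_zone):
        # readings_by_zone maps a zone ID to its recent temperature samples, oldest first.
        insights = []
        for zone, samples in readings_by_zone.items():
            if len(samples) < 2:
                insights.append(("NO_DATA", zone))        # dropout: possible power or comms failure
            elif samples[-1] - samples[0] < -5.0:
                insights.append(("RAPID_COOLING", zone))  # sharp local temperature drop
        cooling = [z for tag, z in insights if tag == "RAPID_COOLING"]
        if len(cooling) > len(readings_by_zone) / 2:      # a widespread drop looks like weather, not failure
            insights.append(("COLD_FRONT", "region"))
        return insights

    # A subscriber sees a handful of events, not thousands of raw sensor messages.
    print(correlate_temperatures({"zone-1": [21.0, 14.5], "zone-2": [20.0, 13.0], "zone-3": []}))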

Who provides all this good stuff, and where is it run?  The second reason is that edge computing is close to the source of telemetry.  Quick access means timely analysis and correlation, which means edge-processed IoT events can lead to more timely insights.  That makes these event-analysis processes valuable in themselves, meaning something others would subscribe to for a fee.  Not only that, edge locations are able to initiate responses with a lower delay, so if the application demands reaction and not just analysis, you could sell the hosting of the reactive process at the edge more easily than somewhere deeper.

Connecting IoT devices is a silly mission.  Sure, operators could offer 5G connectivity (at a cost) to users, but would the users pay when some vendor offered them a free local connection to the same devices by utilizing WiFi or ZigBee or some other protocol?   Picture AT&T going to New York City and telling them they can telemetrize every intersection by adding 5G sensors, while meanwhile ConEd says that they’ll simply carry the traffic on the power connection.  Everyone with a current Internet connection can simply use it to get access to sensors connected to some local control point.  Not a good business for operators to get into, in short.

Turning sensor data into useful, actionable, intelligence?  That’s a whole different story.  Here we have an opportunity to add value, which is the surest way to add revenue.  The challenge is that it’s not at all clear how regulators would treat this kind of telco mission.  Regulatory policy on higher-level services has traditionally said that telcos have to offer such things through a separate subsidiary.  That could preclude their exploiting regulated assets, which in most cases would include real estate.  How that subsidiary was capitalized might also be an issue, and this combination makes it much harder for operators to exploit their advantages.

It also makes it a lot harder for IoT to happen, at least happen in an optimal way.  It’s hard to pick a group that has better assets to develop the market, and enlightened policy would try to encourage them to do that rather than put barriers in place.  I don’t know what other group of companies could even make the kind of investment needed in edge computing, and I don’t know whether we can really get to IoT without it.  Perhaps this is something regulators in major markets need to think about while planning policy changes.

Can NFV Make the Transition from vCPE to “Foundation Services?”

Suppose we decided that it was time to think outside the virtual CPE box, NFV-wise.  The whole of NFV seems to have fixated on the vCPE story, so much so that it’s fair to ask whether there’s anything else for NFV to address, and if so what exactly would the other options look like.

vCPE has two aspects that make it a subset (perhaps a small one) of NFV overall.  One is that it’s been focused on hosting in a general-purpose box that sits on the customer premises, outside the carrier cloud.  The other is that it’s a single-tenant, single-service model.  The first point means that unless NFV moves beyond vCPE, NFV can’t promote the carrier cloud overall.  The second means that it’s very difficult to extend NFV to anything but business services, which limits bottom-line impact.  If these are the limitations, then we should expect that “extended” NFV has to address both.

In theory, there’s nothing to prevent “vCPE” from being virtualized inside the carrier cloud, and many operators and vendors will hasten to say that even as they focus on premises-device-based implementations.  The practical truth is that unless you have a fairly extensive edge-hosted carrier cloud in place, it would be difficult to find a good spot to put the vCPE VNFs other than on premises.  You don’t want to pull traffic too far from the natural point of connection to add features like firewall and encryption, and any extensive new connection requirements would also increase operations complexity and cost.

There’s also an incremental-cost issue to address.  A service has to be terminated in something, meaning that it has to offer a demarcation interface that users can consume, and whatever premises features are expected for the service.  An example is consumer or even small-branch broadband; you need to terminate cable or FiOS, for example, and provide a WiFi router, which means that you probably have to cover most of the device cost with the baseline features.  Adding in firewalls and other elements won’t add much, so removing them to the cloud won’t save much.

The “tenancy” question is even more fundamental.  Obviously, something hosted on a customer’s premises isn’t likely to be multi-tenant, and it’s very possible that the focus on vCPE has inadvertently created an NFV fixation on single-tenant VNFs.  That’s bad because the great majority of service provider opportunity is probably based on multi-tenant applications.

If you want to host a website, you don’t spin up an Internet to support it.  In many cases you don’t even spin up a new server, because the hosting plan for most businesses uses shared-server technology.  If you believe in wireless, do you believe that every customer gets their own IMS and EPC?  Is 5G network slicing likely to be done on a slice-per-phone basis?  IoT presumes shared sensors, virtual or real.  Almost everything an OTT offers is multi-tenant, and the operators want to reap the service opportunities that OTTs now get almost exclusively.  Might that demand multi-tenant thinking?  Thus, might it demand multi-tenant NFV?

There are huge differences between a vCPE application and virtual IMS or EPC.  The one that immediately comes to mind is that “deployment” is something that’s done once, not something done every time a contract is renewed.  The fact is that multi-tenant VNFs would probably have to be deployed and managed as cloud components rather than through the traditional NFV MANO processes, for the simple reason that the VNFs would look like cloud components.

This raises an important question for the cloud and networking industries, and one even more important for “carrier cloud” because it unites the two.  The question is whether NFV should be considered a special case of cloud deployment, or whether NFV is something specific to per-user-per-service vCPE-like deployments.  Right now, it’s the latter.  We have to look at whether it should or could become the former.

The first step is to ask whether you could deploy a multi-tenant service element using NFV.  At the philosophical level this would mean treating the network operator as the “customer” and deploying the multi-tenant elements as part of the operator’s own service.  There’s no theoretical reason why the basic NFV processes couldn’t handle that.  If we made this first-stage assumption, then we could also presume that lifecycle management steps would serve to scale it or replace faulted components.  The key is to ensure that we don’t let the operator’s customers have control over any aspect of shared-tenant element behavior.  Again, no big deal; users of a corporate network service wouldn’t have control over that service as a shared-tenant process; the network group would control it.

One fly in the ointment that I came across early on is that many of these advanced shared-tenant features are themselves pieces of a larger application.  IMS and EPC go together in 4G networks, for example.  If you deploy them independently, which likely you would do since they are separate pieces of the 3GPP mobile infrastructure model, then you’d have to know where one was put so you could integrate it with the other.  In the original CloudNFV plan, these kinds of features were called “foundation services” because they were deployed once and then built into multiple missions.

Foundation services are like applications in the cloud.  They probably have multiple components and they probably have to be integrated in an access or workflow sense with other applications.  The integration process at the minimum would have to support a means of referencing foundation services from other services, including other foundation services.  In “normal” NFV, you would expect the service elements to be invisible outside the service; not so here.
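
As a sketch of what that cross-referencing might look like (my own illustration in Python; the names and fields are hypothetical, not anything the NFV specs define), a foundation service could register its access points in a shared catalog when it’s deployed, and later deployments could resolve them by name rather than having to know where anything was hosted:

    # Hypothetical foundation-service catalog; entries are created once, at deployment time.
    catalog = {}

    def register_foundation_service(name, endpoints):
        # Called when the operator deploys a shared element such as an IMS core.
        catalog[name] = endpoints

    def resolve(name):
        # Called by any later deployment (an EPC, for example) that needs to bind to it.
        if name not in catalog:
            raise LookupError(f"foundation service '{name}' is not deployed yet")
        return catalog[name]

    register_foundation_service("ims-core", {"cscf": "10.0.1.10:5060"})
    epc_config = {"ims_binding": resolve("ims-core")["cscf"]}  # the EPC integrates with IMS by reference
    print(epc_config)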

This relationship between foundation services and NFV may be at the heart of NFV’s future.  Somebody asked me, on my blog yesterday, what the value proposition was for the deployment of cloud elements via NFV.  The only possible answer is improved lifecycle management, meaning management across the spectrum of legacy and software-hosted elements.  That’s not handled by NFV today, though it should be, and so NFV is not clearly useful in foundation service applications.  Despite people in places like AT&T saying that NFV is fundamental to 5G, then, it’s not clear NFV is needed or even useful there.

You can’t create the future by declaring it, people.  If we want NFV to take the next step, then it has to do what’s necessary.  We have ample evidence of both the truth of this and the direction that step has to be taken.  Is it easier to do nothing?  Sure, but “nothing” is what will result.

Are the “Issues” With ONAP a Symptom of a Broader Problem?

How do you know that software or software architectures are doing the wrong thing?  Answer: They are doing something that only works in specific cases.  That seems to be a problem with NFV software, including the current front-runner framework, ONAP.  The initial release, we’re told by Light Reading, will support a limited set of vCPE VNFs.  One application (vCPE) and a small number of functions not only doesn’t make NFV successful, it raises the question of how the whole project is coming together.

Linux is surely the most popular and best-known open-source software product out there.  Suppose that when Linux came out, Linus Torvalds said “I’ve done this operating system that only works for centralized financial applications and includes payroll, accounts receivable, and accounts payable.  I’ll get to the rest of the applications later on.”  Do you think that Linux would have been a success?  The point is that a good general-purpose tool is first and foremost general-purpose.  NFV software that “knows” it’s doing vCPE or that has support for only some specific VNFs isn’t NFV software at all.

NFV software is really about service lifecycle management, meaning the process of creating a toolkit that can compose, deploy, and sustain a service that consists of multiple interdependent pieces, whether they’re legacy technology elements or software-hosted virtual functions.  If every piece of a service has to be interchangeable, meaning support multiple implementations, then you either have to be able to make each alternative for each piece look the same, or you have to author the toolkit to accommodate every current and future variation.  The latter is impossible, obviously, so the former is the only path forward.

To make different implementations of something look the same, you either have to demand that they be the same looking from the outside in, or you have to model them to abstract away their differences.  That’s what “intent modeling” is all about.  Two firewall implementations should have a common property set that’s determined by their “intent” or mission—which in this case is being a firewall.  An intent model looks like “firewall” to the rest of the service management toolkit, but inside the model there’s code that harmonizes the interfaces of each implementation to that abstract intent-modeled reference.
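
Here’s a minimal Python sketch of that structure.  The vendor classes and their internal details are invented for illustration; the point is that the lifecycle toolkit sees only the abstract “firewall” behavior, and the harmonizing code lives inside the model.

    # The abstract "intent": what any firewall must look like to the rest of the toolkit.
    class FirewallIntent:
        def apply_rules(self, rules): raise NotImplementedError
        def is_up(self): raise NotImplementedError

    # Hypothetical vendor A implementation: wants rules as an ordered ACL list.
    class VendorAFirewall(FirewallIntent):
        def __init__(self): self.acl = []
        def apply_rules(self, rules): self.acl = list(rules)
        def is_up(self): return True

    # Hypothetical vendor B implementation: wants rules as a policy dictionary.
    class VendorBFirewall(FirewallIntent):
        def __init__(self): self.policy = {}
        def apply_rules(self, rules): self.policy = {r: "deny" for r in rules}
        def is_up(self): return bool(self.policy)

    # The service lifecycle toolkit never sees vendor specifics; either class can be swapped in.
    def deploy(firewall, rules=("tcp/23", "udp/161")):
        firewall.apply_rules(rules)
        return firewall.is_up()

    print(deploy(VendorAFirewall()), deploy(VendorBFirewall()))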

If there’s anything that seems universally accepted in this confusing age of SDN and NFV, it’s the notion that intent models are critical if you want generalized tools to operate on non-standardized implementations of service components.  How did that get missed here?  Does this mean that there are some fundamental issues to be addressed in ONAP, and perhaps in NFV software overall?  Can they be addressed at this point?

From the very first, NFV was a software project being run by a traditional standards process.  I tried to point out the issues in early 2013, and the original CloudNFV project addressed those issues by defining what came to be known as “intent modeling”.  EnterpriseWeb, the orchestration partner in CloudNFV, took that approach forward into the TMF Catalyst process, and has won awards for its handling of the process of “onboarding” and “metamodels”, the implementation guts of intent modeling.  In short, there’s no lack of history and support for the right approach here.  Why then are we apparently on the wrong track?

I think the heart of the problem is the combination of the complexity of the problem and the simplicity of ad-sponsored media coverage.  Nobody wants to (or probably could) write a story on the real NFV issues, because a catchy title gets all the ad servings you’re ever going to get on a piece.  Vendors know that and so they feed the PR machine, and their goal is to get publicity for their own approach—likely to be minimalistic.  And you need a well-funded vendor machine to attend standards meetings or run media events or sponsor analyst reports.

How about the heart of the solution?  We have intent-model implementations today, of course, and so it would be possible to collect a good NFV solution from what’s out there.  The key piece seems to be a tool to facilitate the automated creation of the intent models, to support the onboarding of VNFs and the setting of “type-standards” for the interfaces.  EnterpriseWeb has shown that capability, and it wouldn’t be rocket science for other vendors to work out their own approaches.

It would help if we accepted the fact that “type-standards” are essential.  All VNFs have some common management properties, and all have to support lifecycle steps like horizontal scaling and redeployment.  All VNFs that have the same mission (like “firewall”) should also have common properties at a deeper level.  Remember that we defined SNMP MIBs for classes of devices; why should it be any harder for classes of VNF?  ETSI NFV ISG: If you’re listening and looking for useful work, here is the most useful thing you could be doing!
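
A “type-standard” could be as simple as a declared set of properties and lifecycle operations that every member of a class must expose, much as a MIB declares what any device of a class must report.  The Python definition below is purely illustrative, not a proposal from any standards body:

    # Illustrative type-standard: every VNF claiming the "firewall" class must expose these.
    FIREWALL_TYPE_STANDARD = {
        "management_properties": ["packets_in", "packets_out", "rules_loaded", "cpu_load"],
        "lifecycle_operations":  ["deploy", "scale_out", "scale_in", "redeploy", "heal"],
        "service_interfaces":    ["wan_port", "lan_port"],
    }

    def conforms(vnf_descriptor):
        # Onboarding check: does a candidate VNF advertise everything the class demands?
        return all(op in vnf_descriptor.get("operations", [])
                   for op in FIREWALL_TYPE_STANDARD["lifecycle_operations"])

    print(conforms({"operations": ["deploy", "scale_out", "scale_in", "redeploy", "heal"]}))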

The media could help here too.  Light Reading has done a number of NFV articles, including the one that I opened with.  It would be helpful if they’d cover the real issues here, including the fact that no international standards group or body with the same biases as the NFV ISG has a better chance of getting things right.  This is a software problem that software architectures and architects have to solve for us.

It may also help that we could get a new body working on the problem.  ETSI is setting up a zero-touch automation group, which is interesting given that the NFV ISG should have addressed that in their MANO work, that the TMF has had a ZOOM (Zero-touch Orchestration, Operation, and Management) project since 2014, and that automation of the service lifecycle is at least implicit in almost all the open-source MANO stuff out there, including ONAP.  A majority of the operators supporting the new ETSI group tell me that they’d like to see ONAP absorbed into it somehow.

These things may “help”, but optimal NFV demands optimal software, which is hard to achieve if you’ve started off with a design that doesn’t address the simple truth that no efficient service lifecycle management is possible if all the things you’re managing look different and require specific and specialized accommodation.  This isn’t how software is supposed to work, particularly in the cloud.  We can do a lot by adding object-intent-model abstractions to the VNFs and integrating them that way, but it’s not as good an approach as starting with the right software architecture.  We should be building on intent modeling, not trying to retrofit it.

That, of course, is the heart of the problem and an issue we’re not addressing.  You need software architecture to do software, and that architecture sets the tone for the capabilities in terms of functionality, composability, and lifecycle management.  It’s hard to say whether we could effectively re-architect the NFV model the right way at this point without seeming to invalidate everything done so far, but we may have to face that to keep NFV on a relevant track.

Does Nokia’s AirGile Advance Stateless Event-Based VNFs?

The notion of stateless microservices for networking and the cloud is hardly new.   I introduced some of the broad points on state in my blog last week, but the notion is much older.  Twitter pioneered the concepts, and Amazon, Google, and Microsoft have all deployed web services to support the model, which is aimed at event processing.  Now, Nokia has announced what it calls “AirGile”, which wraps the stateless-microservice notion into a massive 5G package.  It’s a bold approach, and there are some interesting points made in the material, but is this just too big…not to fail but to succeed?  Or is it something else?

I’ve blogged often on functional programming, lambdas, and microservices, and I won’t bore everyone by repeating what I said (you can do a search on the terms on my blog to find the references).  The short version is that the goal of these concepts, which are in many ways just different faces of the same coin, is to create software components that can be spun up as needed, where needed, and in any quantity, then disappear till the next time.  You can see how this would be perfect for event-handling, since events are themselves transitory stuff.

Events are the justification for AirGile, and certainly event-based systems are critical for the cloud.  It’s also likely that many of the NFV applications are really event applications, though this is less true of the vCPE stuff that dominates the NFV space today.  vCPE VNFs are in the data path, and microservices and functional programming are less relevant to that space than to control-plane or transactional stuff.  Nokia doesn’t make the distinction in their material.

Overall, the AirGile story is a bit hard to pull out of the material; the press release is HERE.  I pick out three elements—a developer program and API set, a microservice-based model that’s more “cloud-agile”, and a target application set that seems to be primarily 5G but might also include NFV (Alcatel-Lucent’s CloudBand).  Two of the three things have been around all along and are presumably tweaked for the AirGile story, so it’s actually the microservices stuff that’s new.  Unfortunately, there is virtually nothing said about that part in the material.  As a result, I’m going to have to do some heavy lifting to assess what’s going on here, meaning presume that there is actual useful thinking behind the story and see what it might be.

I think that this is really mostly about NFV, because NFV is mentioned in the release (CloudBand), is a proposed element in 5G deployment, and is based on cloud technology (OpenStack, for example).  NFV and the cloud have a common touch-point in that components of functionality are deployed in both—NFV as virtual network functions and the cloud as applications.  Microservices are components of software, and so you could say that a microservice architecture could serve both NFV and the cloud.  However, Nokia is a network vendor and not an application provider, so it’s the NFV side that seems to be the linchpin.  There, mobile services and 5G offer an alternative target to that vCPE stuff I mentioned above, an alternative that is easier to cast as an event-based application.  That, in the simplest terms, is how I see AirGile; do the next generation of NFV and focus on control-plane events.

If for Nokia AirGile is mostly an NFV story, then what is being deployed are microservices as VNFs, and in fact they do make that point in their material.  Paraphrasing, operators could create services by assembling microservice-functions, presumably components of VNFs, and do this in a more agile way.  True in general, since composition of applications from microservices is widely accepted as promoting agility.  So let’s take this “truth” (if I’m correct) and run with it.

VNFs today are not microservices, and Nokia can’t do anything from the outside to make them so.  A piece of software is stateless and “functional” only because it was written to be.  Thus, a focus on microservice-VNFs means a focus on NFV applications that don’t depend on transporting legacy physical device code or current network software into VNF form.  You can transport that stuff to a VNF, but you can’t make it a microservice without rewriting it.

Stateless, microservice-based, VNFs are then the components of 5G implementations and other network advances Nokia hopes to exploit.  This supposes a model of NFV that’s very different from today’s model, but remember that NFV today is really all about the single application we’d call “virtual CPE” or vCPE, created by service-chaining VNFs that support things like app acceleration, firewalls, encryption, and so forth.  vCPE is valuable if it can exploit the range of CPE features that are already out there, and so it’s essentially nailed to a legacy software model, not a microservice model.  Nokia, if AirGile is important, has to find new stuff to do with VNFs, and new development to support it.

The advantage of microservice-VNFs, which I’ll shorthand to mVNFs, is that they are inherently stateless.  A stateful component stores, internally, information that reflects its awareness of past events.  If you replace a stateful component, you lose that information.  If you scale it, the new copies don’t have what the original had, and thus they might interpret the next message differently.  However, most network functions need “state”, at least in the sense that they store some variables, and Nokia seems to be planning to handle state by providing a back-end database where the variables are stored, keeping them out of the components.  This back-end state control is used routinely in the cloud, so this isn’t a radical departure from industry norms.
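
A minimal sketch of that pattern, assuming nothing about Nokia’s actual implementation: the handler below keeps no variables of its own, so any instance of it, including a freshly scaled or replacement copy, can process the next event by reading and writing a shared back-end store.

    # The dictionary stands in for a real back-end state database shared by all instances.
    STATE_STORE = {}

    def handle_event(session_id, event):
        # Stateless handler: everything it needs comes in with the event or from the store.
        state = STATE_STORE.get(session_id, {"count": 0})
        state["count"] += 1
        state["last_event"] = event
        STATE_STORE[session_id] = state
        return state["count"]

    # Any copy of handle_event gives the same answer, because no context lives in the code itself.
    handle_event("sess-42", "ATTACH")
    print(handle_event("sess-42", "BEARER-SETUP"))  # prints 2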

Still, we don’t have this sort of VNF hanging around the vCPE world, as I’ve said.  I don’t think that Nokia believes that vCPE is going to set the carrier world on fire, opportunity-wise, and you all know I don’t think so either.  They do, however, need some carrier for their strategy or it’s just a software architecture that only programmers would care about.  Their carrier is 5G.

To quote myself from a recent discussion with an operator, “5G is a UFO.  It’s not going to land on your desk and present itself for inspection, so you’re free to assign whatever characteristics to it that you like!”  There are more moving parts to 5G than perhaps to any other networking standard, and most of them are not yet baked.  Thus, just what role mVNFs might play, or even NFV might play, is uncertain, and vendors like Nokia can spin their tale without fear of (at least near-term) contradiction.  If NFV is big in 5G, and if mVNFs are a good idea for greenfield network functions implementation, then 5G is the right place for AirGile.  Before you decide that I’ve written off this whole AirGile thing as a cynical play, let me make two points that in many ways expose the greater truth.

First, operators are already questioning the value of an NFV future based on porting old device functionality to the cloud.  If everyone was happy to hunker down on old physical-device stuff and stay there for a decade, Nokia and others would have a major uphill battle to push an approach that requires a VNF rewrite to mVNFs.  That’s not the case, even today, and it will obviously be less so over time as NFV goals are tied to things like 5G or IoT.  5G is important to NFV to get VNFs out of the old vCPE model, which frankly I don’t think will ever result in a big success for NFV.  Rather than address something like IoT, which probably has more potential, Nokia is aiming at a target that has near-term operator commitment and standardization support.

Second, whether or not 5G ever deploys, it is still very smart for Nokia to link AirGile to it.  Nobody has AirGile budgets or plans, but they do have both for 5G.  Further, 5G lets Nokia say things about AirGile in the context of an accepted problem/opportunity set, using 5G to explain the need for AirGile’s features.  It’s fine to say that VNFs will migrate to mVNFs, but many won’t believe that without an example of where that would likely happen.  5G is that place, and AirGile is at least on the right track.

The question then is what the mVNF concept will do to/for 5G and NFV, and even more how it might impact IoT, which is the big event-driven champion market issue.  I think that if NFV is to play any role whatsoever in 5G, it will have to be in mVNF form because the simple monolithic VNF model just doesn’t do the job in a large-scale dynamic deployment.  Thus, while we can’t say at this stage what 5G will look like, exactly, or when it will happen (even approximately), we can say that without mVNFs it probably won’t have much NFV inside.  And IoT without mVNFs is just not going to happen, period.

I think that we’re long overdue in thinking about the hardware, network, and software platform needed for a realistic carrier cloud platform, and 5G and IoT combine to represent almost 40% of the opportunity there.  Event-driven applications are even more important, representing a minimum of 74% of carrier cloud opportunity in the long term.  But that 74% isn’t NFV as we think of it, and that’s perhaps the biggest challenge for AirGile and Nokia.  They need to think not so much of NFV but of carrier cloud, and the story there isn’t really well developed.  Might Nokia have exposed the importance of event-driven carrier cloud and not owned the opportunity?  If so, they could have greased the skids for competitors.

We don’t have enough detail on AirGile to say whether it has that golden set of features needed, but it will probably stimulate a lot of reaction from other vendors, and we will hopefully end up a lot closer to a full event-driven architecture than we are today.  I think that Nokia may help drive that closure, but I wish they’d have offered more detail on their microservices framework.  That’s where the real news and value lies, and until we understand what Nokia plans with respect to events overall, we can’t evaluate just how important it could be to Nokia and to the industry.

That’s the “something else”.  AirGile might be vague because the topic of stateless software is pretty complex, certainly beyond the typical media story.  It might also be vague because it’s “slideware” or “vaporware”, or a placeholder for future detail and relevance.  We don’t know based on what’s been released, and I hope Nokia steps up and tells its story completely.

Why “State” Matters in NFV and the Cloud

It’s worth spending a bit more time on the subject of “state”, not in a governmental sense but in the way that software elements behave, or should behave.  State, in a distributed system, is everything.  The term “state” is used in software design to indicate the notion of context, meaning where you are in a multi-step process.  Things that are “stateful” have specific context and “stateless” things don’t.  When you have states, you use them to mark where you are in a process that involves multiple steps, and you move from one state to another in response to some outside condition we could call an “event”.  Sounds simple, right?  It is anything but.  Where we’ve run into state issues a lot in the networking space is NFV, because NFV deploys software functions and expects to provide resiliency by replacing them, or scalability by duplicating them.  There are two dimensions of state in NFV, and both of them could be important.
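To make the distinction concrete, here is a minimal sketch in Python; the states and events are my own invention for illustration, not anyone’s standard.  The stateful element remembers its own context, while the stateless version has to be handed that context with every event.

# A tiny state/event table: (current state, event) -> next state.
LIFECYCLE = {
    ("ordered",   "deploy_request"): "deploying",
    ("deploying", "deploy_done"):    "active",
    ("active",    "fault"):          "repairing",
    ("repairing", "repair_done"):    "active",
}

class StatefulElement:
    def __init__(self):
        self.state = "ordered"                      # remembered context

    def handle(self, event):
        # The element itself remembers where it is in the process.
        self.state = LIFECYCLE.get((self.state, event), self.state)
        return self.state

def stateless_handler(state, event):
    # The stateless version keeps nothing; the caller supplies the current
    # state and gets the next state back.
    return LIFECYCLE.get((state, event), state)

Both do the same job; the difference is where the memory of “where we are” lives, and that difference is what the rest of this discussion turns on.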

When I’ve described NFV deployment as a hierarchical model structure, I’ve noted that each of the elements in the model is an independent state machine, meaning that each piece of the model has its own state.  That state represents the lifecycle progress of the modeled service, so we can call it “lifecycle state”.  Lifecycle state is critical to any NFV framework because there are many places in a service lifecycle where “subordinate” behaviors have to complete before “superior” ones can.  A service, at a high level, isn’t operational until all its pieces are, and so lifecycle state is critical in resolving dependencies like that.  Where lifecycle state gets complicated is during special events like horizontal scaling of the number of instances, or replacement of an instance because something broke.
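Here’s a rough Python sketch of that hierarchy; the element names are made up, but the point is that every modeled element carries its own lifecycle state, and a “superior” element only goes operational once every “subordinate” reports that it is.

class ModelElement:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.state = "inactive"

    def activate(self):
        # Subordinate behaviors have to complete before the superior can.
        for child in self.children:
            child.activate()
        if all(c.state == "operational" for c in self.children):
            self.state = "operational"

service = ModelElement("vpn-service",
                       [ModelElement("access-vnf"), ModelElement("core-vnf")])
service.activate()
print(service.state)    # "operational" only after both subordinates are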

The complexity of scaling, in lifecycle state, lies in the scope of the process and the mechanism for selecting an instance to receive a particular packet—load balancing.  When you instantiate a second instance of a scalable VNF, you probably have to introduce a load-balancer because you now have a choice of instances to make.  In effect, we have a service model with a load-balancer in it, but not yet active, and we have to activate it and connect it.
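As a sketch of that scale-out step (the model layout here is hypothetical, not any standard NFV descriptor), adding the second instance also forces a dormant load-balancer element in the model to be activated and connected to all the instances.

def scale_out(model, vnf_name):
    vnf = model[vnf_name]
    vnf["instances"].append(vnf_name + "-" + str(len(vnf["instances"])))
    lb = model["load-balancer"]
    lb["state"] = "active"                      # activate the dormant element...
    lb["targets"] = list(vnf["instances"])      # ...and connect it to the instances

service_model = {
    "firewall-vnf":  {"instances": ["firewall-vnf-0"]},
    "load-balancer": {"state": "inactive", "targets": []},
}
scale_out(service_model, "firewall-vnf")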

In replacement, the problem depends on just how widespread the impact of the replacement has to be.  If you can replace a broken server with another in the same rack, there is a minimal amount of reconnection, and the deployment of the new VNF could make the correct connections itself.  However, if you had to put the new VNF somewhere quite distant, there are WAN connection requirements that local deployment could not hope to fulfill.  That means you have to buck part of the replacement work “upward” to another element, which of course means you had to have modeled that other element in the first place.

The rightful meaning of the term “orchestration” is the coordination of separate processes, and that’s what’s needed for lifecycle management.  Lifecycle state is an instrument in that coordination, a way of telling whether something is set up as expected and working as planned, and if it isn’t, of tracking it through a series of steps to get it working correctly.

The individual virtual network functions (VNFs) of NFV also have functional state, meaning that the VNF software, as part of its data-plane dialog with users and/or other VNFs, may have a state as well.  For example, a VNF that’s “session-aware”, meaning that it recognizes when a TCP session is underway, has to remember that the session has started and that it hasn’t yet ended.  If you’re actually processing a TCP flow, you will have to implement slow-start, recognize out-of-order arrivals, etc.  All of this is stateful behavior.
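A simplified sketch of that kind of functional state, with invented names and logic, might look like this; the point is that the memory of which sessions are open lives inside the VNF instance itself.

class SessionAwareVNF:
    def __init__(self):
        self.sessions = {}                      # functional state, held inside this instance

    def on_packet(self, flow_id, tcp_flags):
        if "SYN" in tcp_flags:
            self.sessions[flow_id] = "open"     # remember the session started...
        elif "FIN" in tcp_flags or "RST" in tcp_flags:
            self.sessions.pop(flow_id, None)    # ...and forget it when it ends
        return self.sessions.get(flow_id, "untracked")

vnf = SessionAwareVNF()
vnf.on_packet("10.0.0.1:443", {"SYN"})          # this copy now "knows" the session; a new copy wouldn't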

Stateful behavior in VNF functionality means that you can’t readily spawn a new or additional copy of a VNF and have it substitute for the original, because the new copy won’t necessarily “know” about things like a TCP session, and thus won’t behave as the original copy did.  Thus, functional statefulness can limit the ability of lifecycle processes to scale or replace VNFs.

Functional state is difficult because it’s written into the VNFs.  You can impose lifecycle state from above, so to speak, because the VNFs themselves aren’t generally doing lifecycle stuff.  You can’t impose functional state because it’s part of the programming of the VNF.  This is why “functional programming” has to address state in some specific way; it’s used to create things that can be instantiated instantly, replaced instantly, and scaled in an unfettered way.  The processes of instantiating, replacing, and scaling are still lifecycle-state-driven, but the techniques the programmer uses to manage functional state still have to be considered, or you may create a second copy of something only to find that it breaks the process instead of helping performance.
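The functional-programming version of the earlier session example (my own illustration, not anyone’s product code) would hold nothing internally; state goes in as an argument and the updated state comes back out, so any copy of the function can process any packet.

def process_packet(session_state, flow_id, tcp_flags):
    new_state = dict(session_state)             # no hidden internal memory
    if "SYN" in tcp_flags:
        new_state[flow_id] = "open"
    elif "FIN" in tcp_flags or "RST" in tcp_flags:
        new_state.pop(flow_id, None)
    return new_state, new_state.get(flow_id, "untracked")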

To make things a bit more complex, you can have things that are “stateless” in a true sense, and things that have no internal state but are still stateful.  This is what Nokia is apparently proposing in its AirGile model (I’ll blog more on AirGile next week).  Most such models rely on what’s called “back-end state”, where an outside element like a database holds the stateful variables for a process.  That way, when you instantiate a new copy of something, you can restore the state of the old copy.

The only negative about back-end state control is that there may be a delay associated with transporting the state, both in saving the state from the “master” copy and in moving that saved state to the point where a new copy is going to be instantiated.  This may have to be considered in applications where the master state can change quickly, but in both scaling and fault recovery you can usually tolerate a reasonable delay.
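A back-end-state sketch of the same session logic (a plain Python dict stands in here for a real external database or state store) shows why any copy can pick up where another left off, and also where the delay would live.

BACKEND_STORE = {}     # stand-in for an external database or state store

def handle_packet(flow_id, tcp_flags):
    state = BACKEND_STORE.get(flow_id)          # restore state; this read is where delay appears
    if "SYN" in tcp_flags:
        state = "open"
    elif "FIN" in tcp_flags or "RST" in tcp_flags:
        state = None
    if state is None:
        BACKEND_STORE.pop(flow_id, None)
    else:
        BACKEND_STORE[flow_id] = state          # save state so any other copy can use it
    return state or "untracked"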

Every NFV service has lifecycle state, but not every service or service element has functional, internal state.  Things that are truly stateless, referring back to functional programming, can be instantiated as needed and replicated as indicated, and nothing bad happens because every copy of the logic can stand in for every other copy; nothing is saved during operation.  True stateless logic is harder to write but easier to operationalize, because you don’t have to worry about back-end state control, which adds at least one lifecycle state to reflect the process of restoring state from that master copy.

While state is important for NFV, it’s not unique to NFV.  Harkening back to my opening paragraph, NFV isn’t the only thing that has state; it’s an attribute of nearly all distributed systems because the process of deploying such systems will always, at the least, involve lifecycle states on the components.  That means that logically we might want to think about cloud systems, applications, and services as being the same thing under the covers, and look for a solution to managing both lifecycle state and functional state that can be applied to any distributed (meaning, these days, cloud-distributed) system.

Lifecycle state, as I’ve noted in earlier blogs, can be managed by using what I’ve called representational intent, a model that stands in for the real software component and manages the lifecycle process as it relates both to the service management system overall and to the individual components.  In effect, the model becomes a proxy for the real stuff, letting something that doesn’t necessarily have a common lifecycle management capability (or even any lifecycle awareness) fit into a generalized service management or application management framework.

Data models, or small software stubs, can provide representational intent modeling, and there have been a number of examples of this sort of lifecycle state modeling, which I’ve discussed in earlier blogs.  It’s not clear whether modeling could handle functional state, however, beyond perhaps setting up the state in a back-end state control system.  The statefulness of logic is a property of the logic itself, and even back-end state control would be difficult, to say the least, if the underlying software didn’t anticipate it.
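For what it’s worth, here’s a minimal Python sketch of the proxy idea; the component name and actions are invented, and a real implementation would be data-model-driven rather than hand-coded.  The proxy holds the lifecycle state and translates generic lifecycle events into whatever the real component actually understands.

class IntentProxy:
    def __init__(self, name, deploy_action, teardown_action):
        self.name = name
        self.state = "ordered"                  # lifecycle state lives in the proxy
        self._deploy = deploy_action            # component-specific actions
        self._teardown = teardown_action

    def handle(self, event):
        if event == "deploy" and self.state == "ordered":
            self._deploy()                      # drive the real element
            self.state = "active"
        elif event == "decommission" and self.state == "active":
            self._teardown()
            self.state = "retired"
        return self.state

# The management system sees only the proxy's uniform lifecycle interface:
proxy = IntentProxy("legacy-firewall",
                    lambda: print("push device config"),
                    lambda: print("remove device config"))
proxy.handle("deploy")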

I think it’s clear that for broad distributed-system applications, some unified way of managing both lifecycle and functional state would be very valuable.  We don’t at present have a real consensus on how to manage either one separately, so that goal may be difficult to reach quickly.  In particular, functional state will demand a transition to functional programming or stateless microservices, and that may require rewriting existing software.  That, in turn, demands a new programming model and perhaps new middleware to harmonize development.

We’ve not paid nearly enough attention to state in NFV or in the cloud.  If we want to reap the maximum benefit from either, that has to change.