The FCC Neutrality Order: It’s Not What You Think

We have at least the as-yet unvoted draft of the FCC’s new position on Net Neutrality, and as accustomed as I am to reading nonsense about developments in tech, the responses here set a new low.  I blogged about the issues that the new FCC Chairman (Pai) seemed to be addressing here, and I won’t reprise all the details.  I’ll focus instead on what the draft says and how it differs from the position I described in the earlier blog, starting with some interesting “two-somes” behind the order.

There are really two pieces of “net neutrality”.  The first could be broadly called the “non-discrimination” piece, and this is what guarantees users of the Internet non-discriminatory access to any lawful website.  The second is the “prioritization and settlement” piece, which guarantees that no one can pay to have Internet traffic handled differently (prioritized) and that no one can be required to pay settlement to the ISPs who carry the traffic.  The public debate has conflated the two, but in fact the current action is really aimed at the second.

There are also two competing interests in net neutrality.  The first is the interest of the consumers and OTTs who use the Internet, and the second is the profit interest of the ISPs who actually provide the infrastructure.  The Internet is almost totally responsible for declining profit per bit, and at some point this year or next, profit per bit will fall below the level needed to justify further investment.  While everyone might like “free Internet”, there will be no race to provide it.  A balance needs to be struck between consumer interest and provider interest.

As a practical matter, both the providers and the OTTs have powerful financial interests they’re trying to protect, and they’re simply manipulating the consumers.  Stories on this topic, as I said in my opening paragraph, have been simply egregious as far as conveying the truth is concerned.  The New York Attorney General is investigating whether some comments on the order were faked, generated by a third party usurping the identities of real consumers.  Clearly there’s a lot of special interest here.

Finally, there are two forums in which neutrality issues could be raised.  The first is the FCC and the second the Federal Trade Commission (FTC).  The FCC has a narrow legal mandate to regulate the industry within the constraints of the Communications Act of 1934 as amended (primarily amended by the Telecommunications Act of 1996).  The FTC has a fairly broad mandate of consumer protection.  This is a really important point, as you’ll see.

So, what does the new order actually do?  First and foremost, it reverses the previous FCC decision to classify the Internet as a telecommunications service (regulated under Title II of the Communications Act of 1934).  This step essentially mandates an FCC light touch on the Internet, because the federal courts have already invalidated many of the FCC’s previous rules on the grounds that they could be applied only to telecommunications services.

All “broadband Internet access services”, independent of technology, would be classified as information services.  That includes mobile broadband, and also MVNO services.  People and businesses who provide broadband WiFi access to patrons as a mass consumer service are included.  It excludes services to specialized devices (including e-readers) that use the Internet to deliver specific material rather than to provide broad access.  It also excludes CDNs, VPNs, and Internet backbone services.  The rule of thumb is this: if it’s a mass-market service to access the Internet, then it’s an information service.

The classification is important because it establishes the critical point of jurisdiction for the FCC.  The FCC is now saying that classifying the Internet under Title II would be too restrictive, but without that classification the FCC has very limited authority to regulate the specific behavior of the ISPs.  Thus, the FCC won’t provide much in the way of specific regulatory limits and penalties.  It couldn’t enforce them, and perhaps it never could have.  Everything the FCC has done in the past, including non-discrimination, has been appealed by somebody on the grounds of lack of FCC authority, and the Title II classification was undertaken to give the FCC authority to do what it wanted.  Absent Title II, the FCC certainly has no authority to deal with settlement and prioritization, and probably has insufficient authority to police blocking and discrimination.  That doesn’t mean “net neutrality” goes away, as the stories have said.

The FCC will require that ISPs publish their terms of service in clear language, including performance information, and this is where the FCC believes that “neutrality” will find a natural market leveling.  The order points out that broadband is competitive, and that consumers would respond to unreasonable anti-consumer steps (like blocking sites or slowing a competitor’s offerings) by simply moving to another provider.

The order also points out that the “premier consumer protection” body, the FTC, has authority to deal with situations where anti-competitive or anti-consumer behavior arises and isn’t dealt with effectively by competitive market forces.  Thus, the FCC is eliminating the “code of conduct” it had previously imposed, and is shifting the focus of consumer protection to the FTC.  As I noted earlier, it’s never been clear whether the FCC had the authority to impose “neutrality” except through Title II, so the fact is that we’ve operated without strict FCC oversight for most of the evolution of the Internet.

The FTC and the marketplace are probably not enough to prevent ISPs from offering paid prioritization or from requiring settlement to deliver high-volume traffic.  In fact, one of the things I looked for in the order was the treatment of settlement among ISPs, a topic particularly dear to my heart since I’ve opposed the current “bill and keep” practice for decades, and even co-authored an RFC on the topic.  The order essentially says that the FCC will not step in to regulate the way that ISPs settle for peering with each other or through various exchanges.  Again, the FCC says that other agencies, including DoJ antitrust and the FTC, have ample authority to deal with any anti-competitive or unreasonable practices that might arise.

Paid prioritization is similarly treated; the FCC has eliminated the rules against it, so ISPs are free to offer “fast-lane” behavior either directly to the consumer or to OTTs who want to pay on behalf of their customers to improve quality of experience.  This may encourage specific settlement, since the bill-and-keep model can’t compensate every party in a connection for the additional cost of prioritization.  We should also note that paid prioritization could be a true windfall for SD-WAN-type business services, since the economics of high-QoS services created over the top with paid prioritization would surely be a lot better than current VPN economics.  You could argue that SD-WAN might be the big winner in the order.

The OTTs will surely see themselves as the big losers.  What they want is gigabit broadband at zero cost for everyone, so their own businesses prosper.  Wall Street might also be seen as a loser, because they make more money on high-flyers like Google (Alphabet) or Netflix than on stodgy old AT&T or Verizon.  VCs lose too because social-media and other OTT startups could face higher costs if they have to pay for priority services.  That might mean that despite their grumbling, players like Facebook and Netflix could face less competition.

The order will be seen as an improvement for the ISPs, but even there a risk remains.  Network operators have a very long capital cycle, so they need stability in the way they are regulated.  This order isn’t likely to provide that, for two reasons.  First, nobody believes that a “new” administration of the other party would leave the order in place.  Second, only legislation could create a durable framework, and Congress has been unable to act even on major issues; it has avoided weighing in on Internet regulation for 20 years now.  Thus, the full benefits of the order may prove elusive, because operators might be reluctant to believe the changes will persist long enough to justify changing their plans for investment in infrastructure.

The long-term regulatory uncertainty isn’t the only uncertainty here.  The Internet is global, and its regulation is a hodgepodge of competing national and regional authorities, most of whom (like the FCC) haven’t had a stable position.  “We brought in one side and gave them everything they wanted, then we brought in the other side and gave them everything they wanted,” is how a lawmaker in the US described the creation of the Telecom Act in 1996.  That’s a fair statement of regulatory policy overall; the policies favor the firms who favor the current political winners.

My view, in the net?  The FCC is taking the right steps with the order, and that view shouldn’t surprise those who’ve read my blog over the last couple of years.  Net neutrality is not being “killed”; enforcement of the first, critical part of it (what consumers think neutrality really is) is being shifted to the FTC, whose power of enforcement is clear.  There is no more risk that ISPs could decide what sites you can visit than there has ever been—none, in other words.  It’s not a “gift to telecom firms”, as one media report says; it’s a potential lifeline for the Internet overall.  It might reverse the steady decline in profit per bit and restore interest in infrastructure investment.  “Might”, that is, if the telcos believe the order will stand.

It’s not going to kill off the OTTs either.  There is a risk that the OTTs will be less profitable, or that some might raise their rates to cover the cost of settlement with the ISPs.  Will it hurt “Internet innovation?”  Perhaps, if you believe we need another Facebook competitor, but it might well magnify innovation where we need it most, which is in extending broadband at as high a rate and low a cost as possible.

If the ISPs are smart, they’ll go full bore into implementing the new position, offering paid prioritization and settlement and everything similar or related, and demonstrating that it doesn’t break the Internet but promotes it.  That’s because there could be only about three years remaining on the policy before a new FCC threatens to take everything back.  The only way to be sure the current rules stay in place is to prove they do good overall.

Cisco’s Quarter: Are They Really Facing the Future at Last?

Cisco reported its quarterly numbers, which were still down in revenue terms, but it forecast the first revenue growth the company has seen in about two years of reports.  “Forecast” isn’t realization, of course, but the big question is whether the gains represent what one story describes as “providing an early sign of success for the company’s transition toward services and software”, whether they’re mostly a systemic recovery in network spending, or whether they’re just moving the categories of revenue around.  I think it’s a bit of everything.

Most hardware vendors have been moving to a subscription model for all their software elements, which creates a recurring revenue stream.  New software, of course, is almost always subscription-based, and Cisco is unusual among network vendors in having a fairly large software business (WebEx, for example) and a server/platform business.

Cisco’s current-quarter year-over-year data shows a company that’s still feeling the impact of dipping network equipment opportunity.  Total revenue was off 2%, infrastructure platforms off 4%, and “other products” off 16%.  Security was the big winner, up 8%, with applications up 6% and services up 1%.  If you look at absolute dollars (growth/loss times revenue), the big loser was infrastructure and the big winner was applications.

Here’s the key point, the point that I think at least invalidates the story that this is an “early sign of success” for the Cisco shift in emphasis.  Infrastructure platforms are over 57% of revenue as of the most recent quarter.  Applications are about 10%, Security about 5%, and Services about 25%.  The two categories of revenue showing significant growth—applications and security—combine to make up only 15% of revenue, and that 57% infrastructure platforms sector is showing a significant loss.  How can gains in categories that account for only 15% of revenue offset losses in a category that accounts for almost four times as much revenue?

Two percent of current revenues for Cisco, the reported year-over-year decline, is about $240 million.  To go from a 2% loss to a 2% gain, which is where guidance is, would require about $480 million more revenue from those two gainer categories, which now account for about $1.8 billion in total.  Organic growth in TAM of that magnitude is hardly likely in the near term, and a change in market share in Cisco’s favor similarly so.  What’s left?
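For readers who want to check the arithmetic, here is a minimal sketch using only the figures cited above; the roughly $12 billion quarterly base is simply what a 2% decline of about $240 million implies, and the required growth rate in the gainer categories is my own derivation from those numbers.

```python
# Rough arithmetic behind the revenue-swing argument, using only the
# figures cited above.  The ~$12B quarterly base is implied by the
# statement that a 2% decline is about $240 million.

quarterly_revenue = 240e6 / 0.02            # ~$12.0 billion
swing = 0.04 * quarterly_revenue            # going from -2% to +2% is a 4-point swing
gainer_base = 0.15 * quarterly_revenue      # applications (~10%) plus security (~5%)

print(f"Implied quarterly revenue: ${quarterly_revenue / 1e9:.1f}B")
print(f"Revenue swing needed:      ${swing / 1e6:.0f}M")
print(f"Gainer-category base:      ${gainer_base / 1e9:.1f}B")
print(f"Implied growth required:   {swing / gainer_base:.0%}")   # about 27%
```

A near-term growth requirement of roughly 27% in those two categories is what the swing implies, which is why organic TAM growth or share shifts alone look so unlikely.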

The essential answer is M&A.  Cisco has a decent hoard of cash, which it can use to buy companies that will contribute a new revenue stream.  However Cisco classifies the revenue, getting about half a billion more would give Cisco everything it needs.  Cisco is being smart by using cash and M&A to diversify, to add products and revenue to offset what seems the inevitable diminution of the contribution from its legacy core products.  So yes, Cisco is transforming, but less by a transition toward software and services than by the acquisition of revenues from outside.

It may seem this is an unimportant distinction, but it’s not.  The problem with “buying revenue” through M&A is that you easily run out of good options.  It would be better if Cisco could fund its own R&D to create innovative products in other areas, but there are two problems with that.  First, why would an innovator in another “area” want a job with Cisco?  Cisco probably has experts in its current focus areas, which doesn’t help if those areas are in perpetual decline.  Second, it might take too long; if current infrastructure spending (at 57% of revenue) is declining at a 4% rate, then Cisco’s total revenue will take roughly a two-and-a-quarter-percent hit.  To offset that in sectors now representing 15% of revenue, Cisco would need gains there of about 15%, right now.  That means that at least for now, Cisco needs M&A.
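The same kind of quick check works for that offset calculation, again using only the percentages cited in the paragraph above:

```python
# The offset math from the paragraph above: a 4% decline in a category
# that is 57% of revenue, offset by growth in categories totaling 15%.

infrastructure_share = 0.57
infrastructure_decline = 0.04
gainer_share = 0.15

total_hit = infrastructure_share * infrastructure_decline   # ~2.3% of total revenue
required_gain = total_hit / gainer_share                     # ~15%

print(f"Hit to total revenue:      {total_hit:.2%}")
print(f"Growth needed in gainers:  {required_gain:.1%}")
```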

Most of all, it needs a clear eye to the future.  You can’t simply run out to the market and look for people to buy when you need to add something to the bottom line.  The stuff you acquire might be in at least as steep a decline as the stuff whose decline you’re trying to offset.  If you know where things are going you can prevent that, and you can also look far enough out to plan some internal projects that will offer you better down-line revenue and reduce your dependence on M&A.

Obviously, it’s not easy to find acquisitions to make up that needed $350 million.  Cisco would have to be looking at a lot of M&A, which makes it much harder to pick out winners.  And remember that the losses from legacy sectors, if they continue, will require an offset every year.  A better idea would be to look for acquisitions that Cisco could leverage through its own customer relationships, acquisitions that would represent not only clear current symbiosis but also future growth opportunity.  That kind of M&A plan would require a whole lot of vision.

Cisco has spent $6.6 billion this year on the M&A whose prices have been disclosed, according to this article, of which more than half was for AppDynamics.  Did that generate the kind of revenue gains they need?  Hardly.  It’s hard to see how even symbiosis with Cisco’s marketing, products, and plans could wring that much from the M&A they did.  If it could, it would surely take time, and it wouldn’t help in the coming year to get revenues from 2% down to 2% up.

To be fair to Cisco, this is a tough time for vision for any network vendor, and a tough industry to predict.  We have in networking an industry that’s eating its heart to feed its head.  The Internet model under-motivates the providers of connectivity in order to incentivize things that consume connectivity.  Regulations limit how aggressively network operators could elect to pursue those higher-layer services, which leaves them to try to cut costs at the lower level, which inevitably means cutting spending on equipment.

That which regulation has taken away, it might give back in another form.  The FCC will shortly announce its “end of net neutrality”, a characterization that’s fair only if you define “net neutrality” much more broadly than I do, and only if you assume the FCC was the right place to enforce real net neutrality in the first place.  Many, including Chairman Pai of the FCC, think that the basic mission of non-discrimination and non-blocking that forms the real heart of net neutrality belongs in the FTC.  What took it out of there was less about consumer protection than about OTT and venture capital protection.

The courts said that the FCC could not regulate pricing and service policy on services that were “information services” and explicitly not subject to that kind of regulation.  The previous FCC then reclassified the Internet as a telecommunications service, and the current FCC is now going to end that.  Whether the FCC would end all prohibitions on non-neutral behavior is doubtful.  The most it would be likely to do is accept settlement and paid prioritization, which the OTT players hate but which IMHO would benefit the ISPs to the point of improving their willingness to capitalize infrastructure.

What would network operators do if the FCC let them sell priority Internet?  Probably sell it, because if one ISP didn’t and another did, the latter would have a competitive advantage with respect to service quality.  Might the decision to create Internet QoS hurt business VPN services?  No more than SD-WAN will, inevitably.

Operators could easily increase their capex enough to change Cisco’s revenue growth problems into opportunities.  Could Cisco be counting on the reversal of neutrality?  That would seem reckless, particularly since Cisco doesn’t favor the step.  What Cisco could be doing is reading tea leaves of increasing buyer confidence; they do report an uptick in order rates.  Some of that confidence might have regulatory roots, but most is probably economic.  Networking spending isn’t tightly coupled to GDP growth in the long term (as I’ve said in other blogs) but its growth path relative to GDP growth still takes it higher in good times.

The question is what tea leaves Cisco is reading.  Their positioning, which is as strident as always, is still lagging the market.  Remember that Cisco’s strategy has always been to be a “fast follower” and not a leader.  M&A is a better way to do that because an acquired solution can be readied faster than a developed one, and at lower cost.  But fast following still demands knowing where you’re going, and it also demands that you really want to be there.  There is nowhere network equipment can go in the very long term but down.  Value lies in experiences, which means software that creates them.  I think there are players out there that have a better shot at preparing for an experience-driven future than any Cisco has acquired.

What Cisco probably is doing is less preparing for “the future” than slapping a band-aid on the present.  They are going to leak revenue from their infrastructure stuff.  The market is going to create short-term wins for other companies as the networking market twists and turns, and I think Cisco is grabbing some of the wins to offset the losses.  Regulatory relief would give them a longer period in which to come to terms with the reality of networking, but it won’t fend off the need to do that.  The future doesn’t belong to networking at this point, and Cisco has yet to show it’s faced that reality.

MEF 3.0: Progress but Not Revolution

We have no shortage of orchestration activity in standards groups, and the MEF has redoubled its own Lifecycle Service Orchestration (LSO) efforts with its new MEF 3.0 announcement.  The overall approach is sound at the technical level, meaning that it addresses things like the issues of “federation” of service elements across provider boundaries, but it also leaves some gaps in the story.  Then there’s the fact that the story itself is probably not completely understood.

Virtualization in networking is best known through the software-defined network (SDN) and network functions virtualization (NFV) initiatives.  SDN replaces a system of devices with a different system, one based on different principles in forwarding.  NFV replaces devices with hosted instances of functionality.  The standards activities in the two areas are, not surprisingly, focused on the specific replacement mission of each.  SDN focuses on how forwarding substitutes for switching/routing, and NFV on how you make a hosted function look like a device.

The problem we’ve had is that making a substitution workable doesn’t make it desirable.  The business case for SDN or NFV is hard to make if at the end of the day the old system and the new are equivalent in every way, yet that’s the “replacement” goal each area has been pursuing.  Operators have shifted their view from the notion that they could save enough in capital costs to justify the change, to the notion that considerable operational savings and new-service-opportunity benefits would be required.  Hence, the SDN and NFV debates have been shifting toward a debate on service lifecycle management automation (SLMA).

Neither SDN nor NFV put SLMA in-scope for standardization, which means that the primary operations impact of both SDN and NFV is to ensure that the opex and agility of the new system isn’t any worse than that of the old.  In fact, NFV in particular is aiming at simple substitution; MANO in NFV is about getting a virtual function to the state of equivalence with a physical function.  It’s the lack of SLMA capability that’s arguably hampering both SDN and NFV deployment.  No business case, no business.

The MEF has taken a slightly different approach with its “third network”, and by implication with MEF 3.0.  The goal is to create not so much a virtual device or network as a virtual service.  To support that, the LSO APIs are designed to support “federated” pan-provider control of the packet and optical elements of a service, and also the coordination of higher-layer features (like security) that are added to basic carrier Ethernet.

There are three broad questions about the MEF approach.  First is the question of federation; will the model address long-standing operator concerns about it?  Second is the question of carrier-Ethernet-centricity; does the MEF really go far enough in supporting non-Ethernet services?  Finally, there’s the overarching question of the business case; does MEF 3.0 move the ball there?  Let’s look at each.

Operators have a love/hate relationship with federation, and I’ve worked for a decade trying to help sort things out in the space.  On one hand, federation is important for operators who need to provide services larger than their own infrastructure footprint.  On the other, federation might level the playing field, creating more competitors by helping them combine to offer broader-scope services.  There’s also the problem of how to ensure that federation doesn’t create a kind of link into their infrastructure for others to exploit, by seeing traffic and capacity or by competing with their own services.

Facilitating service federation doesn’t address these issues automatically, and I don’t think that the MEF takes substantive steps to do that either.  However, there is value to facilitation, and in particular for the ability to federate higher-layer features and to integrate technology domains within a single operator.  Thus, I think we can say that MEF 3.0 is at least useful in this area.

The second question is whether the MEF goes far enough in supporting its own notion of the “third network”, the use of carrier Ethernet as a platform for building services at Level 3 (IP).  I have the launch presentation for the MEF’s Third Network, and the key slide says that Carrier Ethernet lacks agility and the Internet lacks service assurance (it’s best-efforts).  Thus, the Third Network has to be agile and deterministic.  Certainly, Carrier Ethernet can be deterministic, but for agility you’d have to be able to deliver IP services and harmonize with other VPN and even Internet technologies.

While the basic documents on MEF 3.0 don’t do much to validate the Third Network beyond claims, the MEF wiki does have an example of what would almost have to be the approach—SD-WAN.  The MEF concept is to use an orchestrated, centrally controlled, implementation of SD-WAN, and they do define (by name at least) the associated APIs.  I think more detail in laying out those APIs would be helpful, though.  The MEF Legato, Presto, and Adagio reference points are called out in the SD-WAN material, but Adagio isn’t being worked on by the MEF, and as a non-member I’ve not been able to pull the specs for the other two.  Thus, it’s not clear to me that the interfaces are defined enough in SD-WAN terms.

Here again, though, the MEF does something that’s at least useful.  We’re used to seeing SD-WAN as a pure third-party or customer overlay, and usually only on IP.  The MEF extends the SD-WAN model both to different technologies (Ethernet and theoretically SDN, but also involving NFV-deployed higher-layer features), and to a carrier-deployed model.  Another “useful” rating.

The final point is the business-case issue.  Here, I think it’s clear that the MEF has focused (as both SDN and NFV did) on exposing service assets to operations rather than on defining any operations automation or SLMA.  I don’t think you can knock them for doing what everyone else has done, but I do think that if I’ve declared SDN and NFV to have missed an opportunity in SLMA, I have to do the same for the MEF 3.0 stuff.

Where this leaves us is hard to say, but the bottom line is that we still have a business-case dependency on SLMA and still don’t have what operators consider to be a solution.  Would the MEF 3.0 and Third Network approach work, functionally speaking?  Yes.  So would SDN and NFV.  Can we see an easy path to adoption, defined and controlled by the MEF itself?  Not yet.  I understand that this sort of thing takes time, but I also have to judge the situation as it is and not how people think it will develop.  We have waited from 2012 to today, five years, for a new approach.  If we can’t justify a candidate approach at the business level after five years, it’s time to admit something was missed.

There may be good news on the horizon.  According to a Light Reading story, Verizon is betting on a wholesale SD-WAN model that would exploit the MEF 3.0 approach, and presumably wrap it in some additional elements that would make it more automated.  I say “presumably” because I don’t see a specific framework for the Verizon service, but I can’t see how they’d believe a wholesale model could be profitable to Verizon and the Verizon partner, and still be priced within market tolerance, unless the costs were wrung out.

We also have news from SDxCentral that Charter is looking at Nuage SD-WAN as a means of extending Ethernet services rather than of creating IP over Ethernet.  That would be an enhanced value proposition for the Third Network vision, and it would also establish that SD-WAN is really protocol-independent at the service interface level, not just in its support for underlay transport options.  This is the second cable company (after Comcast) to define a non-MPLS VPN service, and it might mean that SD-WAN will become a differentiator between telco and cableco VPNs.

How much the MEF vision alone could change carrier fortunes is an issue for carriers and for vendors as well.  Carrier Ethernet is about an $80 billion global market by the most optimistic estimates, and that is a very small piece of what’s estimated to be a $2.3 trillion communications services market globally.  Given that, the MEF’s vision can succeed only if somehow Ethernet breaks out of being a “service” and takes a broader role in all services.  There’s still work needed to support that goal.

Are Fiber Network Players Really Playing Well Enough?

We are seeing more signs of the fiber challenge and opportunity, and more uncertainty about how it will play out, especially in terms of winners and losers.  Ciena continues to take sensible steps, Infinera continues to stumble, and making sense of these seeming contradictions is the challenging part of assessing fiber’s future.

It’s not like we don’t all know that fiber deployment has nowhere to go but up.  Wireless alone could double fiber in service by 2025, and there’s a lot of global interest in increasing the commitment to fiber access, especially FTTN combined with 5G.  The challenge for fiber network players like Ciena and Infinera is that they don’t sell glass, but systems, and the role of those systems in a fiber-rich future is much more difficult to determine.

Most network hardware includes fiber interfaces, so single-mission point-to-point or even multipoint connections don’t require the equipment fiber networking vendors offer.  What you need their gear for is building “fiber networks”, which are connective Layer-One structures that provide optical multi-hop paths, aggregation, and distribution of capacity.  If you’re a fiber vendor, you either have to focus on expanded applications of fiber networking, or you have to bet on expansion in the few areas of fiber deployment that are essentially point-to-point but do require or justify specialized devices.

Infinera seems to have taken the second option, talking more about things like subsea cables for intercontinental connection.  Yes, we’re likely to have more of that, but no, it’s not likely to be a huge growth opportunity.  Data center interconnect is another area that they’ve identified, and while surely the cloud will increase the need for that, it’s not exactly a household-scale opportunity.  Of the 7.5 million business sites in the US, for example, only about 150,000 represent any scale of data center, and my surveys say that only 8,000 even represent multiple data centers of a single business.

Ciena has done a better job in positioning optical networking as a target, and focusing on what I think is the fundamental truth for optical network vendors—you need to have a connective, multi-hop, complex Layer One infrastructure opportunity if you want to have an opportunity for discrete optical network products versus glass connected to the interfaces of electrical-layer stuff.  Even Ciena, though, may not be going quite far enough or being quite explicit enough.

It’s helpful here to look at the extreme case.  What would magnify the value of optical networking in its true sense?  Answer: the diminution of electrical networking in the same sense.  Put the other way around, the more connectivity we manage at the optical layer, the more the electrical layer looks like a simple edge function.

This is a clear description of what a combination of agile optics and “virtual wires” would be.  If Level 1 is fully connective (in virtual wire form), fully resilient in recovery from faults, and fully elastic in terms of capacity, then higher protocol layers are just the stuff that creates an interface and divides traffic up among the virtual pipes.  SD-WAN is a good example; if you’re going to build services on an overlay principle you’d achieve the lowest cost and simplest operation by overlaying them on the most primitive underlay you can build—a virtual wire.

Virtual wires can be distinguished from optical paths by the presumption that a virtual wire is a Level 1 element that carries traffic but doesn’t participate in any data-plane protocol exchange.  Optical paths can be viewed as an implementation option for virtual wires, but probably not one broadly applicable enough to fulfill their potential.  The problem is that everyone can’t have a piece of an optical pipe serving them; you need to have some electrical-layer tail connectivity that aggregates onto the higher-capacity optical routes.  That’s what Ciena just announced, with the notion of a packet-technology edge function.
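To make the virtual-wire distinction concrete, here is a minimal illustrative sketch in Python; the class and field names are my own invention, not anything Ciena or the IETF defines.  The service layer sees only pipes with endpoints and capacity; whether a pipe is realized as an optical path or an electrical packet tail is an implementation detail below it.

```python
# Illustrative data model only: how an overlay service might ride on
# "virtual wires" whose implementation (optical path or packet tail)
# is invisible to the service layer.  Names here are hypothetical.

from dataclasses import dataclass, field

@dataclass
class VirtualWire:
    """A Level 1 pipe: carries traffic, takes part in no data-plane protocol."""
    a_end: str
    z_end: str
    capacity_gbps: float
    implementation: str = "optical-path"   # could also be "packet-edge-tail"

@dataclass
class OverlayService:
    """An SD-WAN-style service built purely by mapping onto virtual wires."""
    name: str
    wires: list = field(default_factory=list)

    def add_site_pair(self, site_a: str, site_b: str, gbps: float,
                      impl: str = "optical-path") -> VirtualWire:
        wire = VirtualWire(site_a, site_b, gbps, impl)
        self.wires.append(wire)
        return wire

# A branch aggregates onto the optical core via a packet tail, but the
# overlay neither knows nor cares which implementation carries which leg.
vpn = OverlayService("acme-vpn")
vpn.add_site_pair("branch-17", "metro-edge-3", 10, impl="packet-edge-tail")
vpn.add_site_pair("metro-edge-3", "core-hub-1", 100)
```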

“Edge” is important here, because the closer you can get a fiber network—even a “fiber network” that’s including electrical/packet tail connections—to the edge, the more you can absorb into it in terms of features, functions, and benefits.  That absorption is what increases the value of fiber networks, and networking, and raises the revenue potential for vendors in the space.

If we look at edge computing in the abstract, it’s tempting to see its connectivity requirements as nothing more than a greater number of DCI paths, because edge computing, considered on its own, is just computers and data centers to connect.  The thing is, we have to consider it in the context of what else is at the edge.

The majority of edge computing sites will be sites where telecom/cablecom and wireless services converge.  Think telco central office.  There is already considerable traffic in and out of these locations, much of which is concentrated using its own specific equipment.  Historically, the “metro network” was a network created with optics (SONET) and supported through on-ramp add-drop multiplexers that offered operators a way of clumping a variety of traffic sources onto fast fiber paths.  If edge computing comes along, it adds to the stuff that needs clumping, and could potentially further justify the notion of a separate optical-layer network.

Ciena and Infinera already have “metro network” products and strategies, and it seems to me that edge computing is effectively an update to these strategies, a way of providing virtual wires to extend optics, perhaps even virtual-wire services to end users.  Ciena talks about some of the specific value propositions for 10 and 100GigE to the edge, but they really should explore two issues.  First, how do you keep the various higher-speed packet interfaces the future will demand from being realized as simple glass between boxes rather than as elements of an optical network?  Second, how can you turn packet-edge into service-virtual-wire?

Virtual private networks can be created without switches/routers in a variety of ways, all of which are likely to offer lower service costs and greater operator profits.  Even things like content delivery networks and mobile packet core can be built that way, and we’re already seeing examples of this.  The logical pathway for operators to achieve better profits is to use cheaper technology—both in capex and opex terms—to create services.  Virtual wires would be a good way to start, because they can link in with SD-WAN, with virtual switch/router instances, and even with NFV-hosted service elements.

Optical players like Ciena and Infinera have an opportunity to anticipate what is likely an inevitable shift in how services like these are created, but it’s not one that will be automatically realized.  Vendors have to sing their own song, and sing effectively, if they want their buyers to listen.  Ciena has taken more positive steps in this direction, but even they’re not quite where they need to be.  Infinera has some hard choices to face.

A good, and sadly deceased, friend of mine, Ping Pan, was an advocate of a virtual wire concept.  He was one of the architects of the IETF effort on “pseudowires”, in fact, and if we’d had all the mature thinking on the cloud, virtual switches, virtual routers, instances of forwarding software, and SD-WAN, that we now have, he’d have seen the connection.  Edge instances of forwarding processes can combine with virtual wires to create all but the largest-scale services.  Interestingly, he was working at Infinera during some of his work on pseudowires.  They should have listened.

Exploiting the Full Scope of IoT Opportunity

IoT has been contending for the title of most-hyped technology of our time, and that cause got a big boost at a recent T3C Summit event.  According to SDxCentral’s summary of a panel at the event, “…it makes sense that in the Internet of Things (IoT) boom, with its expected 20 billion to 50 billion connected devices by 2020, there’s money to be made by telcos.”  The title of the article characterizes this as a “multi-billion-dollar opportunity.”  Not necessarily, or even probably, unless you look way beyond the obvious.

IoT suffers, as most modern technology developments do, from “bracket creep”.  It gets good ink.  Therefore vendors append the IoT tag to anything that remotely resembles it.  Therefore there’s a constant advance in claims and stories that attract reporters and editors.  Therefore it gets good ink.  You get the picture.  So, yes, we may well end up with 20 to 50 billion connected devices by 2020, but my model says that far less than a tenth of 1% of those devices will be in any way visible to the telcos, much less earning them revenue.

The reason I’m harping on this is that we’re seeing another great market opportunity suffer from stupid positioning.  Any telco who thinks they’ll make their future numbers from IoT is not only doomed to disappointment on that specific opportunity, they’re probably overlooking the real opportunity in IoT.  The wrong focus is not only wrong, it usually precludes having the right focus, which is edge computing.

Another article, this one from Light Reading’s Carol Wilson, quotes the PCCW VP of Engineering as saying that “Competing in the digital services space doesn’t mean going up against web-scalers, it means doing edge-computing….It all comes back to FOG and edge cloud architecture.”  That’s the real point for sure, and IoT would surely be able to earn operators billions if they listened.

Operators have one unique advantage in the fog space—they have real estate.  There are about 70,000 edge offices of telcos worldwide, and another 30,000 deeper-sited offices, for a total of about a hundred thousand locations.  It’s tough to put a data center in the back of a truck and make all the necessary connections; you need permanent real estate, and operators have the place to put roughly a hundred thousand incremental fog data centers without buying any new buildings.  Amazon, Google, and other OTTs don’t have that advantage, so it would make sense for operators to exploit their real estate assets.

This ties into IoT for two reasons.  First, IoT isn’t about on-the-Internet sensors at all, because the majority of sensors are designed to be used in private applications.  If we put those billions of connected devices directly on the Internet, we’d have billions of hacks and spoofs and spend tens of billions on security making it look like the devices weren’t really there at all.  The fact is that the model of IoT we’ll see dominating is one where the sensors are on a private network that might not even use Internet technology at all (home sensor networks typically don’t).  The sites where they’re located are already connected, so there’s zero revenue associated with connecting those sensors.

Where the revenue comes from is digesting, summarizing, and correlating sensor data.  As I’ve said in other blogs, nobody is going to succeed in IoT if every application has to deal with raw sensor data.  Apart from access, security, and sheer ability to field all the requests, it would be too much work to write something like that and the result would be so sensor-specific it would be brittle.  An army would be needed to keep it up to date.

A better approach would be to presume that there are trusted providers who subscribe to sensor information and do various cuts and splices to create insight.  For example, if there’s a sensor that records ambient temperature in a bunch of places, you could look at patterns of change to detect conditions that range from a sudden cold front to a power failure.  In traffic terms, you could assess traffic patterns at a high level and even predict when a mess of cars in one area was going to translate to a mess in another because of movement along predictable routes.  There are many, many, types of insight that could be drawn, and many applications that would want to take a stab at drawing it.
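To illustrate what “digesting, summarizing, and correlating” might look like in practice, here is a toy sketch; the sensor names, thresholds, and event format are invented, and a real edge service would subscribe to a message bus rather than take a dictionary.  The point is that applications consume one insight event per area rather than thousands of raw readings.

```python
# Toy illustration of edge-side correlation: raw readings in, summarized
# insight out.  Sensor names, thresholds, and the event format are invented.

from statistics import mean

def summarize(readings, drop_threshold=5.0, quorum=0.6):
    """readings: {area: [(sensor_id, prev_temp, curr_temp), ...]}
    Returns one insight event per area instead of a flood of raw samples."""
    insights = []
    for area, samples in readings.items():
        deltas = [curr - prev for _, prev, curr in samples]
        dropping = sum(1 for d in deltas if d <= -drop_threshold)
        if samples and dropping / len(samples) >= quorum:
            insights.append({
                "area": area,
                "event": "coordinated-temperature-drop",
                "mean_delta": round(mean(deltas), 1),
                "sensors": len(samples),
            })
    return insights

raw = {"midtown": [("s1", 21.0, 14.5), ("s2", 20.5, 13.9), ("s3", 22.0, 21.8)]}
print(summarize(raw))   # applications subscribe to this, not to s1/s2/s3
```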

Who provides all this good stuff, and where is it run?  The second point I talked about is that edge computing is close to the source of telemetry.  Quick access means timely analysis and correlation, which means edge-processed IoT events can lead to more timely insights.  That makes these event-analysis processes valuable in themselves, meaning something others would subscribe to for a fee.  Not only that, edge locations are able to initiate responses with a lower delay, so if the application demands reaction and not just analysis, you could sell the hosting of the reactive process at the edge more easily than somewhere deeper.

Connecting IoT devices is a silly mission.  Sure, operators could offer 5G connectivity (at a cost) to users, but would the users pay when some vendor offered them a free local connection to the same devices by utilizing WiFi or ZigBee or some other protocol?   Picture AT&T going to New York City and telling them they can telemetrize every intersection by adding 5G sensors, while meanwhile ConEd says that they’ll simply carry the traffic on the power connection.  Everyone with a current Internet connection can simply use it to get access to sensors connected to some local control point.  Not a good business for operators to get into, in short.

Turning sensor data into useful, actionable, intelligence?  That’s a whole different story.  Here we have an opportunity to add value, which is the surest way to add revenue.  The challenge is that it’s not at all clear how regulators would treat this kind of telco mission.  Regulatory policy on higher-level services has traditionally said that telcos have to offer such things through a separate subsidiary.  That could preclude their exploiting regulated assets, which in most cases would include real estate.  How that subsidiary was capitalized might also be an issue, and this combination makes it much harder for operators to exploit their advantages.

It also makes it a lot harder for IoT to happen, at least happen in an optimal way.  It’s hard to pick a group that has better assets to develop the market, and enlightened policy would try to encourage them to do that rather than put barriers in place.  I don’t know what other group of companies could even make the kind of investment needed in edge computing, and I don’t know whether we can really get to IoT without it.  Perhaps this is something regulators in major markets need to think about while planning policy changes.

Can NFV Make the Transition from vCPE to “Foundation Services”?

Suppose we decided that it was time to think outside the virtual CPE box, NFV-wise.  The whole of NFV seems to have fixated on the vCPE story, so much so that it’s fair to ask whether there’s anything else for NFV to address, and if so what exactly would the other options look like.

vCPE has two aspects that make it a subset (perhaps a small one) of NFV overall.  One is that it’s been focused on hosting in a general-purpose box that sits on the customer premises, outside the carrier cloud.  The other is that it’s a single-tenant, single-service model.  The first point means that unless NFV moves beyond vCPE, NFV can’t promote the carrier cloud overall.  The second means that it’s very difficult to extend NFV to anything but business services, which limits bottom-line impact.  If these are the limitations, then we should expect that “extended” NFV has to address both.

In theory, there’s nothing to prevent “vCPE” from being virtualized inside the carrier cloud, and many operators and vendors will hasten to say that even as they focus on premises-device-based implementations.  The practical truth is that unless you have a fairly extensive edge-hosted carrier cloud in place, it would be difficult to find a good spot to put the vCPE VNFs other than on premises.  You don’t want to pull traffic too far from the natural point of connection to add features like firewall and encryption, and any extensive new connection requirements would also increase operations complexity and cost.

There’s also an incremental-cost issue to address.  A service has to be terminated in something, meaning that it has to offer a demarcation interface that users can consume, plus whatever premises features are expected for the service.  An example is consumer or even small-branch broadband; you need to terminate cable or FiOS, for example, and provide a WiFi router, which means that you probably have to cover most of the device cost with the baseline features.  Adding in firewalls and other elements won’t add much, so removing them to the cloud won’t save much.

The “tenancy” question is even more fundamental.  Obviously, something hosted on a customer’s premises isn’t likely to be multi-tenant, and it’s very possible that the focus on vCPE has inadvertently created an NFV fixation on single-tenant VNFs.  That’s bad because the great majority of service provider opportunity is probably based on multi-tenant applications.

If you want to host a website, you don’t spin up an Internet to support it.  In many cases you don’t even spin up a new server, because the hosting plan for most businesses uses shared-server technology.  If you believe in wireless, do you believe that every customer gets their own IMS and EPC?  Is 5G network slicing likely to be done on a slice-per-phone basis?  IoT presumes shared sensors, virtual or real.  Almost everything an OTT offers is multi-tenant, and the operators want to reap the service opportunities that OTTs now get almost exclusively.  Might that demand multi-tenant thinking?  Thus, might it demand multi-tenant NFV?

There are huge differences between a vCPE application and virtual IMS or EPC.  The one that immediately comes to mind is that “deployment” is something that’s done once, not something done every time a contract is renewed.  The fact is that multi-tenant VNFs would probably have to be deployed and managed as cloud components rather than through the traditional NFV MANO processes, for the simple reason that the VNFs would look like cloud components.

This raises an important question for the cloud and networking industries, and one even more important for “carrier cloud” because it unites the two.  The question is whether NFV should be considered a special case of cloud deployment, or whether NFV is something specific to per-user-per-service vCPE-like deployments.  Right now, it’s the latter.  We have to look at whether it should or could become the former.

The first step is to ask whether you could deploy a multi-tenant service element using NFV.  At the philosophical level this would mean treating the network operator as the “customer” and deploying the multi-tenant elements as part of the operator’s own service.  There’s no theoretical reason why the basic NFV processes couldn’t handle that.  If we made this first-stage assumption, then we could also presume that lifecycle management steps would serve to scale it or replace faulted components.  The key is to ensure that we don’t let the operator’s customers have control over any aspect of shared-tenant element behavior.  Again, no big deal; users of a corporate network service wouldn’t have control over that service as a shared-tenant process; the network group would control it.

One fly in the ointment that I came across early on is that many of these advanced shared-tenant features are themselves pieces of a larger application.  IMS and EPC go together in 4G networks, for example.  If you deploy them independently, which you likely would since they are separate pieces of the 3GPP mobile infrastructure model, then you’d have to know where one was put so you could integrate it with the other.  In the original CloudNFV plan, these kinds of features were called “foundation services” because they were deployed once to be built into multiple missions.

Foundation services are like applications in the cloud.  They probably have multiple components and they probably have to be integrated in an access or workflow sense with other applications.  The integration process at the minimum would have to support a means of referencing foundation services from other services, including other foundation services.  In “normal” NFV, you would expect the service elements to be invisible outside the service; not so here.
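As a sketch of the minimum “referencing” capability this implies (my own illustration, not an ETSI or CloudNFV mechanism), picture a registry that the deployment process writes to when a foundation service goes live, and that later deployments read in order to bind to it:

```python
# Sketch of the minimum "reference a foundation service" capability:
# a registry written at deployment time and read by later deployments.
# This illustrates the concept; it is not an ETSI-defined mechanism.

class FoundationRegistry:
    def __init__(self):
        self._services = {}   # service_type -> descriptor

    def register(self, service_type: str, endpoint: str, site: str):
        """Called once, when the multi-tenant element is deployed."""
        self._services[service_type] = {"endpoint": endpoint, "site": site}

    def resolve(self, service_type: str) -> dict:
        """Called by any later deployment that needs to bind to it."""
        try:
            return self._services[service_type]
        except KeyError:
            raise LookupError(f"foundation service '{service_type}' not deployed")

registry = FoundationRegistry()
registry.register("ims", "sip://ims.core.operator.net", site="metro-edge-3")

# An EPC deployed later binds to the IMS wherever it was actually placed.
ims = registry.resolve("ims")
print(f"Integrate EPC with IMS at {ims['endpoint']} ({ims['site']})")
```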

This relationship between foundation services and NFV may be at the heart of NFV’s future.  Somebody asked me, on my blog yesterday, what the value proposition was for the deployment of cloud elements via NFV.  The only possible answer is improved lifecycle management, meaning management across the spectrum of legacy and software-hosted elements.  That’s not handled by NFV today, though it should be, and so, despite people in places like AT&T saying that NFV is fundamental to 5G, it’s not clear NFV is needed or even useful in foundation service applications.

You can’t create the future by declaring it, people.  If we want NFV to take the next step, then it has to do what’s necessary.  We have ample evidence of both the truth of this and the direction that step has to be taken.  Is it easier to do nothing?  Sure, but “nothing” is what will result.

Are the “Issues” With ONAP a Symptom of a Broader Problem?

How do you know that software or software architectures are doing the wrong thing?  Answer: they do something that works only in specific cases.  That seems to be a problem with NFV software, including the current front-runner framework, ONAP.  The initial release, we’re told by Light Reading, will support a limited set of vCPE VNFs.  One application (vCPE) and a small number of functions not only doesn’t make NFV successful, it raises the question of how the whole project is coming together.

Linux is surely the most popular and best-known open-source software product out there.  Suppose that when Linux came out, Linus Torvalds said “I’ve done this operating system that only works for centralized financial applications and includes payroll, accounts receivable, and accounts payable.  I’ll get to the rest of the applications later on.”  Do you think that Linux would have been a success?  The point is that a good general-purpose tool is first and foremost general-purpose.  NFV software that “knows” it’s doing vCPE or that has support for only some specific VNFs isn’t NFV software at all.

NFV software is really about service lifecycle management, meaning the process of creating a toolkit that can compose, deploy, and sustain a service that consists of multiple interdependent pieces, whether they’re legacy technology elements or software-hosted virtual functions.  If every piece of a service has to be interchangeable, meaning support multiple implementations, then you either have to be able to make each alternative for each piece look the same, or you have to author the toolkit to accommodate every current and future variation.  The latter is impossible, obviously, so the former is the only path forward.

To make different implementations of something look the same, you either have to demand that they be the same looking from the outside in, or you have to model them to abstract away their differences.  That’s what “intent modeling” is all about.  Two firewall implementations should have a common property set that’s determined by their “intent” or mission—which in this case is being a firewall.  An intent model looks like “firewall” to the rest of the service management toolkit, but inside the model there’s code that harmonizes the interfaces of each implementation to that abstract intent-modeled reference.
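Here is a hedged sketch of what that harmonization can look like in code; the vendor names and parameter conventions are invented, and a real implementation would generate or configure the adapters rather than hand-code them.  The lifecycle toolkit is written against the abstract “firewall” intent, and a thin adapter per implementation translates that intent into each product’s own interface.

```python
# Sketch of an intent model: the toolkit sees one abstract "firewall";
# adapters hide the differences between implementations.  Vendor names
# and parameter conventions below are invented for illustration.

from abc import ABC, abstractmethod

class FirewallIntent(ABC):
    """What any firewall must look like from the outside."""
    @abstractmethod
    def allow(self, src_cidr: str, dst_port: int): ...
    @abstractmethod
    def block_all(self): ...

class VendorAFirewall(FirewallIntent):
    def allow(self, src_cidr, dst_port):
        print(f"vendorA: add-rule permit {src_cidr} port {dst_port}")
    def block_all(self):
        print("vendorA: set-default deny")

class VendorBFirewall(FirewallIntent):
    def allow(self, src_cidr, dst_port):
        print(f"vendorB: POST /acl {{'src': '{src_cidr}', 'port': {dst_port}}}")
    def block_all(self):
        print("vendorB: POST /policy {'default': 'drop'}")

def compose_service(firewall: FirewallIntent):
    """The lifecycle toolkit is written against the intent, never the vendor."""
    firewall.block_all()
    firewall.allow("10.0.0.0/8", 443)

compose_service(VendorAFirewall())
compose_service(VendorBFirewall())
```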

If there’s anything that seems universally accepted in this confusing age of SDN and NFV, it’s the notion that intent models are critical if you want generalized tools to operate on non-standardized implementations of service components.  How did that get missed here?  Does this mean that there are some fundamental issues to be addressed in ONAP, and perhaps in NFV software overall?  Can they be addressed at this point?

From the very first, NFV was a software project being run by a traditional standards process.  I tried to point out the issues in early 2013, and the original CloudNFV project addressed them by defining what came to be known as “intent modeling”.  EnterpriseWeb, the orchestration partner in CloudNFV, took that approach forward into the TMF Catalyst process, and has won awards for its handling of “onboarding” and “metamodels”, the implementation guts of intent modeling.  In short, there’s no lack of history or support for the right approach here.  Why then are we apparently on the wrong track?

I think the heart of the problem is the combination of the complexity of the problem and the simplicity of ad-sponsored media coverage.  Nobody wants to (or probably could) write a story on the real NFV issues, because a catchy title gets all the ad servings you’re ever going to get on a piece.  Vendors know that and so they feed the PR machine, and their goal is to get publicity for their own approach—likely to be minimalistic.  And you need a well-funded vendor machine to attend standards meetings or run media events or sponsor analyst reports.

How about the heart of the solution?  We have intent-model implementations today, of course, and so it would be possible to assemble a good NFV solution from what’s out there.  The key piece seems to be a tool to facilitate the automated creation of the intent models, to support the onboarding of VNFs and the setting of “type-standards” for the interfaces.  EnterpriseWeb has shown that capability, and it wouldn’t be rocket science for other vendors to work out their own approaches.

It would help if we accepted the fact that “type-standards” are essential.  All VNFs have some common management properties, and all have to support lifecycle steps like horizontal scaling and redeployment.  All VNFs that have the same mission (like “firewall”) should also have common properties at a deeper level.  Remember that we defined SNMP MIBs for classes of devices; why should it be any harder for classes of VNF?  ETSI NFV ISG: If you’re listening and looking for useful work, here is the most useful thing you could be doing!
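Extending the firewall sketch above, a “type-standard” would layer a class-specific contract on top of a lifecycle contract that every VNF shares, much as a MIB defines the common properties of a device class.  This is my own hedged illustration, not an ISG-defined interface.

```python
# Hedged sketch of "type-standards": a lifecycle contract every VNF meets,
# plus a per-class contract on top (think: a MIB for a device class).
# Nothing here is an ETSI-defined interface; it illustrates the idea.

from abc import ABC, abstractmethod

class VNFTypeStandard(ABC):
    """Lifecycle operations every VNF must support, whatever its mission."""
    @abstractmethod
    def scale_out(self, instances: int): ...
    @abstractmethod
    def redeploy(self, target_host: str): ...
    @abstractmethod
    def health(self) -> dict: ...

class FirewallTypeStandard(VNFTypeStandard, ABC):
    """Adds the properties common to every firewall, on top of the base."""
    @abstractmethod
    def allow(self, src_cidr: str, dst_port: int): ...
    @abstractmethod
    def block_all(self): ...

# A vendor firewall VNF would implement FirewallTypeStandard; generic
# lifecycle tooling needs to know only about VNFTypeStandard.
```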

The media could help here too.  Light Reading has done a number of NFV articles, including the one that I opened with.  It would be helpful if they’d cover the real issues here, including the fact that no international standards group or body with the same biases as the NFV ISG has a better chance of getting things right.  This is a software problem that software architectures and architects have to solve for us.

It may also help that we’re getting a new body working on the problem.  ETSI is setting up a zero-touch automation group, which is interesting given that the NFV ISG should have addressed that in its MANO work, that the TMF has had a ZOOM (Zero-touch Orchestration, Operation, and Management) project since 2014, and that automation of the service lifecycle is at least implicit in almost all the open-source MANO stuff out there, including ONAP.  A majority of the operators supporting the new ETSI group tell me that they’d like to see ONAP absorbed into it somehow.

These things may “help”, but optimal NFV demands optimal software, which is hard to achieve if you’ve started off with a design that doesn’t address the simple truth that no efficient service lifecycle management is possible if all the things you’re managing look different and require specific and specialized accommodation.  This isn’t how software is supposed to work, particularly in the cloud.  We can do a lot by adding object-intent-model abstractions to the VNFs and integrating them that way, but it’s not as good an approach as starting with the right software architecture.  We should be building on intent modeling, not trying to retrofit it.

That, of course, is the heart of the problem and an issue we’re not addressing.  You need software architecture to do software, and that architecture sets the tone for the capabilities in terms of functionality, composability, and lifecycle management.  It’s hard to say whether we could effectively re-architect the NFV model the right way at this point without seeming to invalidate everything done so far, but we may have to face that to keep NFV on a relevant track.

Does Nokia’s AirGile Advance Stateless Event-Based VNFs?

The notion of stateless microservices for networking and the cloud is hardly new.   I introduced some of the broad points on state in my blog last week, but the notion is much older.  Twitter pioneered the concepts, and Amazon, Google, and Microsoft have all deployed web services to support the model, which is aimed at event processing.  Now, Nokia has announced what it calls “AirGile”, which wraps the stateless-microservice notion into a massive 5G package.  It’s a bold approach, and there are some interesting points made in the material, but is this just too big…not to fail but to succeed?  Or is it something else?

I’ve blogged often on functional programming, lambdas, and microservices, and I won’t bore everyone by repeating what I said (you can do a search on the terms on my blog to find the references).  The short version is that the goal of these concepts, which are in many ways just different faces of the same coin, is to create software components that can be spun up as needed, where needed, and in any quantity, then disappear till the next time.  You can see how this would be perfect for event-handling, since events are themselves transitory stuff.
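To illustrate the pattern (this is a toy of my own, not anything drawn from Nokia’s material), here’s what a stateless event handler looks like in Python.  The event types and field names are hypothetical; what matters is that the output depends only on the input, so any copy of the function can handle any event, and copies can appear and disappear freely.

    def handle_event(event: dict) -> dict:
        """Stateless handler: no context is kept between calls, so any
        instance of this function can process any event."""
        if event.get("type") == "attach":
            return {"action": "allocate_bearer", "subscriber": event["imsi"]}
        if event.get("type") == "detach":
            return {"action": "release_bearer", "subscriber": event["imsi"]}
        return {"action": "ignore"}

    # Scaling is just running more copies behind a dispatcher, because
    # nothing is remembered from one call to the next.
    print(handle_event({"type": "attach", "imsi": "001010123456789"}))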

Events are the justification for AirGile, and certainly event-based systems are critical for the cloud.  It’s also likely that many of the NFV applications are really event applications, though this is less true of the vCPE stuff that dominates the NFV space today.  vCPE VNFs are in the data path, and microservices and functional programming are less relevant to that space than to control-plane or transactional stuff.  Nokia doesn’t make the distinction in their material.

Overall, the AirGile story is a bit hard to pull out of the material; the press release is HERE.  I pick out three elements—a developer program and API set, a microservice-based model that’s more “cloud-agile”, and a target application set that seems to be primarily 5G but might also include NFV (Alcatel-Lucent’s CloudBand).  Two of the three things have been around all along and are presumably tweaked for the AirGile story, so it’s actually the microservices stuff that’s new.  Unfortunately, there is virtually nothing said about that part in the material.  As a result, I’m going to have to do some heavy lifting to assess what’s going on here, meaning I’ll presume that there’s actually useful thinking behind the story and try to work out what it might be.

I think that this is really mostly about NFV, because NFV is mentioned in the release (CloudBand), is a proposed element in 5G deployment, and is based on cloud technology (OpenStack, for example).  NFV and the cloud have a common touch-point in that components of functionality are deployed in both—NFV as virtual network functions and the cloud as applications.  Microservices are components of software, and so you could say that a microservice architecture could serve both NFV and the cloud.  However, Nokia is a network vendor and not an application provider, so it’s the NFV side that seems to be the linchpin.  There, mobile services and 5G offer an alternative target to that vCPE stuff I mentioned above, an alternative that is easier to cast as an event-based application.  That, in the simplest terms, is how I see AirGile; do the next generation of NFV and focus on control-plane events.

If AirGile is mostly an NFV story for Nokia, then what is being deployed are microservices as VNFs, and in fact they do make that point in their material.  Paraphrasing, operators could create services by assembling microservice-functions, presumably components of VNFs, and do this in a more agile way.  That’s true in general, since composition of applications from microservices is widely accepted as promoting agility.  So let’s take this “truth” (if I’m correct) and run with it.

VNFs today are not microservices, and Nokia can’t do anything from the outside to make them so.  A piece of software is stateless and “functional” only because it was written to be.  Thus, a focus on microservice-VNFs means a focus on NFV applications that don’t depend on transporting legacy physical device code or current network software into VNF form.  You can transport that stuff to a VNF, but you can’t make it a microservice without rewriting it.

Stateless, microservice-based VNFs are then the components of 5G implementations and other network advances Nokia hopes to exploit.  This supposes a model of NFV that’s very different from today’s model, but remember that NFV today is really all about the single application we’d call “virtual CPE” or vCPE, created by service-chaining VNFs that support things like app acceleration, firewalls, encryption, and so forth.  vCPE is valuable if it can exploit the range of CPE features that are already out there, and so it’s essentially nailed to a legacy software model, not a microservice model.  Nokia, if AirGile is important, has to find new stuff to do with VNFs, and new development to support it.

The advantage of microservice-VNFs, which I’ll shorthand to mVNFs, is that they are inherently stateless.  A stateful component stores, internally, information that reflects its awareness of past events.  If you replace a stateful component, you lose that information.  If you scale it, the new copies don’t have what the original had, and thus they might interpret the next message differently.  However, most network functions need “state”, at least in the sense that they store some variables, and Nokia seems to be planning to handle state by providing a back-end database where the variables are stored, keeping them out of the components.  This back-end state control is used routinely in the cloud, so this isn’t a radical departure from industry norms.
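Here’s a minimal sketch of the back-end-state idea, again my own illustration rather than anything Nokia has published.  A plain dictionary stands in for the external database; in a real deployment it would be some shared store.  The key point is that the handling code itself remembers nothing, so any instance can pick up any session.

    # Stand-in for the external state database; in practice this would be
    # a shared back-end store, not a local dictionary.
    STATE_STORE: dict = {}

    def mvnf_instance(session_id: str, event: str) -> str:
        """Any instance can handle any event, because per-session context
        lives in the back-end store rather than in the instance."""
        ctx = STATE_STORE.get(session_id, {"open": False, "packets": 0})
        if event == "start":
            ctx["open"] = True
        elif event == "packet" and ctx["open"]:
            ctx["packets"] += 1
        elif event == "stop":
            ctx["open"] = False
        STATE_STORE[session_id] = ctx          # write the state back out
        return f"{session_id}: open={ctx['open']} packets={ctx['packets']}"

    # Two calls standing in for two different instances share one context.
    mvnf_instance("session-1", "start")
    print(mvnf_instance("session-1", "packet"))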

Still, we don’t have this sort of VNF hanging around the vCPE world, as I’ve said.  I don’t think that Nokia believes that vCPE is going to set the carrier world on fire, opportunity-wise, and you all know I don’t think so either.  They do, however, need a vehicle for their strategy or it’s just a software architecture that only programmers would care about.  Their vehicle is 5G.

To quote myself from a recent discussion with an operator, “5G is a UFO.  It’s not going to land on your desk and present itself for inspection, so you’re free to assign whatever characteristics to it that you like!”  There are more moving parts to 5G than perhaps to any other networking standard, and most of them are not yet baked.  Thus, just what role mVNFs might play, or even NFV might play, is uncertain, and vendors like Nokia can spin their tale without fear of (at least near-term) contradiction.  If NFV is big in 5G, and if mVNFs are a good idea for greenfield network functions implementation, then 5G is the right place for AirGile.  Before you decide that I’ve written off this whole AirGile thing as a cynical play, let me make two points that in many ways expose the greater truth.

First, operators are already questioning the value of an NFV future based on porting old device functionality to the cloud.  If everyone was happy to hunker down on old physical-device stuff and stay there for a decade, Nokia and others would have a major uphill battle to push an approach that requires a VNF rewrite to mVNFs.  That’s not the case, even today, and it will obviously be less so over time as NFV goals are tied to things like 5G or IoT.  5G is important to NFV to get VNFs out of the old vCPE model, which frankly I don’t think will ever result in a big success for NFV.  Rather than address something like IoT, which probably has more potential, Nokia is aiming at a target that has near-term operator commitment and standardization support.

Second, whether or not 5G even deploys fully, it is still very smart for Nokia to link AirGile to it.  Nobody has AirGile budgets or plans, but they do have both for 5G.  Further, 5G lets Nokia say things about AirGile in the context of an accepted problem/opportunity set, using 5G to explain the need for AirGile’s features.  It’s fine to say that VNFs will migrate to mVNFs, but many won’t believe that without an example of where that would likely happen.  5G is that place, and AirGile is at least on the right track.

The question then is what the mVNF concept will do to/for 5G and NFV, and even more how it might impact IoT, which is the biggest event-driven market opportunity of all.  I think that if NFV is to play any role whatsoever in 5G, it will have to be in mVNF form, because the simple monolithic VNF model just doesn’t do the job in a large-scale dynamic deployment.  Thus, while we can’t say at this stage what 5G will look like, exactly, or when it will happen (even approximately), we can say that without mVNFs it probably won’t have much NFV inside.  And IoT without mVNFs is just not going to happen, period.

I think that we’re long overdue in thinking about the hardware, network, and software platform needed for a realistic carrier cloud platform, and 5G and IoT combine to represent almost 40% of the opportunity there.  Event-driven applications are even more important, representing a minimum of 74% of carrier cloud opportunity in the long term.  But that 74% isn’t NFV as we think of it, and that’s perhaps the biggest challenge for AirGile and Nokia.  They need to think not so much of NFV but of carrier cloud, and the story there isn’t really well developed.  Might Nokia have exposed the importance of event-driven carrier cloud and not owned the opportunity?  If so, they could have greased the skids for competitors.

We don’t have enough detail on AirGile to say whether it has that golden set of features needed, but it will probably stimulate a lot of reaction from other vendors, and we will hopefully end up a lot closer to a full event-driven architecture than we are today.  I think that Nokia may help drive that closure, but I wish they had offered more detail on their microservices framework.  That’s where the real news and value lies, and until we understand what Nokia plans with respect to events overall, we can’t evaluate just how important it could be to Nokia and to the industry.

That’s the “something else”.  AirGile might be vague because the topic of stateless software is pretty complex, certainly beyond the typical media story.  It might also be vague because it’s “slideware” or “vaporware”, or a placeholder for future detail and relevance.  We don’t know based on what’s been released, and I hope Nokia steps up and tells its story completely.

Why “State” Matters in NFV and the Cloud

It’s time we spent a bit more time on the subject of “state”, not in a governmental sense but in the way that software elements behave, or should behave.  State, in a distributed system, is everything.  The term “state” is used in software design to indicate the notion of context, meaning where you are in a multi-step process.  Things that are “stateful” have specific context and “stateless” things don’t.  When you have states, you use them to mark where you are in a process that involves multiple steps, and you move from one state to another in response to some outside condition we could call an “event”.  Sounds simple, right?  It is anything but.  Where we’ve run into state issues a lot in the networking space is NFV, because NFV deploys software functions and expects to provide resiliency by replacing them, or scalability by duplicating.  There are two dimensions of state in NFV, and both of them could be important.

When I’ve described NFV deployment as a hierarchical model structure, I’ve noted that each of the elements in the model was an independent state machine, meaning that each piece of the model had its own state.  That state represented the lifecycle progress of the modeled service, so we can call it “lifecycle state”.  Lifecycle state is critical to any NFV framework because there are many places in a service lifecycle where “subordinate” behaviors have to be done before “superior” ones can be.  A service, at a high level, isn’t operational till all its pieces are, and so lifecycle state is critical in resolving dependencies like that.  Where lifecycle state gets complicated is during special events like horizontal scaling of the number of instances or replacement of an instance because something broke.
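A minimal sketch may help here, using hypothetical state names of my own rather than anything from a standard: each modeled element is its own little state machine, and a superior element only reports itself operational once every subordinate element does.

    from enum import Enum

    class LifecycleState(Enum):        # hypothetical lifecycle states
        ORDERED = 1
        DEPLOYING = 2
        OPERATIONAL = 3

    class ModelElement:
        """Each element of the service model runs its own state machine."""
        def __init__(self, name, children=None):
            self.name = name
            self.state = LifecycleState.ORDERED
            self.children = children or []

        def on_event(self, event: str):
            if event == "deploy":
                self.state = LifecycleState.DEPLOYING
                for child in self.children:
                    child.on_event("deploy")
            elif event == "ready":
                # A superior element is operational only when every
                # subordinate element already is.
                if all(c.state == LifecycleState.OPERATIONAL for c in self.children):
                    self.state = LifecycleState.OPERATIONAL

    vnf = ModelElement("vnf-instance")
    service = ModelElement("service", [vnf])
    service.on_event("deploy")
    vnf.on_event("ready")                      # the subordinate reports in first
    service.on_event("ready")
    print(service.state.name)                  # OPERATIONAL

Scaling and replacement are then just additional events driving additional states in the same little machines.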

The complexity of scaling, in lifecycle state, lies in the scope of the process and the mechanism for selecting an instance to receive a particular packet—load balancing.  When you instantiate a second instance of a scalable VNF, you probably have to introduce a load-balancer because you now have a choice of instances to make.  In effect, we have a service model with a load-balancer in it, but not yet active, and we have to activate it and connect it.

In replacement, the problem depends on just how widespread the impact of your replacement has to be.  If you can replace a broken server with another in the same rack, there is a minimal amount of reconnection.  In that case, the deployment of the new VNF could make the correct connections.  However, if you had to put the new VNF somewhere quite distant, there are WAN connection requirements that local deployment could not hope to fulfill.  That means that you have to buck part of the replacement work “upward” to another element.  Which, of course, means that you had to model another element in the first place.

The rightful meaning of the term “orchestration” is the coordination of separate processes, and that’s what’s needed for lifecycle management.  Lifecycle state is an instrument in that coordination, a way of telling whether something is set up as expected and working as planned, and, if it isn’t, tracking it through a series of steps to get the thing going correctly.

The individual virtual network functions (VNFs) of NFV also have functional state, meaning that the VNF software, as part of its data-plane dialog with users and/or other VNFs, may have a state as well.  For example, a VNF that’s “session-aware”, meaning that it recognizes when a TCP session is underway, has to remember that the session has started and that it hasn’t yet ended.  If you’re actually processing a TCP flow, you will have to implement slow-start, recognize out-of-order arrivals, etc.  All of this is stateful behavior.
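To show what functional state looks like in code, here’s a toy session-aware element of my own devising.  The internal set it keeps is exactly what a freshly spawned copy would not have, which is the scaling problem described in the next paragraph.

    class SessionAwareVNF:
        """Stateful: remembers which TCP sessions it has seen start."""
        def __init__(self):
            self._open_sessions = set()        # internal, functional state

        def on_packet(self, flow: str, tcp_flags: str) -> str:
            if "SYN" in tcp_flags:
                self._open_sessions.add(flow)
                return f"{flow}: session started"
            if "FIN" in tcp_flags:
                self._open_sessions.discard(flow)
                return f"{flow}: session ended"
            if flow in self._open_sessions:
                return f"{flow}: in-session packet"
            return f"{flow}: packet outside any known session"

    original = SessionAwareVNF()
    original.on_packet("10.0.0.1->10.0.0.2:443", "SYN")
    scaled_copy = SessionAwareVNF()            # a newly spawned second copy
    print(scaled_copy.on_packet("10.0.0.1->10.0.0.2:443", "ACK"))
    # Prints "packet outside any known session": the new copy lacks the
    # original's internal state, so it behaves differently.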

Stateful behavior in VNF functionality means that you can’t readily spawn a new or additional copy of a VNF and have it substitute for the original, because the new copy won’t necessarily “know” about things like a TCP session, and thus won’t behave as the original copy did.  Thus, functional statefulness can limit the ability of lifecycle processes to scale or replace VNFs.

Functional state is difficult because it’s written into the VNFs.  You can impose lifecycle state from above, so to speak, because the VNFs themselves aren’t generally doing lifecycle stuff.  You can’t impose functional state because it’s part of the programming of the VNF.  This is why “functional programming” has to address state in some specific way; it’s used to create things that can be instantiated instantly, replaced instantly, and scaled in an unfettered way.  The processes of instantiating, replacing, and scaling are still lifecycle-state-driven, but the techniques used by the programmer to manage functional state still have to be considered, or you may create a second copy of something only to find that it breaks the process instead of helping performance.

To make things a bit more complex, you can have things that are “stateless” in a true sense, and things that have no internal state but are still stateful.   This is what Nokia is apparently proposing in its AirGile model (I’ll blog more on AirGile next week).  Most such models rely on what’s called “back-end state”, where an outside element like a database holds the stateful variables for a process.  That way, when you instantiate a new copy of something, you can restore the state of the old copy.

The only negative about back-end state control is that there may be a delay associated with transporting the state—both saving the state in the “master” copy and moving that saved state to the point where a new copy is going to be instantiated.  This may have to be considered in some applications where the master state can change quickly, but in both scaling and fault recovery you can usually tolerate a reasonable delay.

Every NFV service has lifecycle state, but not every service or service element has functional, internal, state.  Things that are truly stateless, referencing back to functional programming, can be instantiated as needed, replicated as indicated, and nothing bad happens because every copy of the logic can stand in for every other copy since nothing is saved during operation.  True stateless logic is harder to write but easier to operationalize because you don’t have to worry about back-end state control, which adds at least one lifecycle state to reflect the process of restoring state from that master copy.

While state is important for NFV, it’s not unique to NFV.  Harkening back to my opening paragraph, NFV isn’t the only thing that has state; it’s an attribute of nearly all distributed systems because the process of deploying such systems will always, at the least, involve lifecycle states on the components.  That means that logically we might want to think about cloud systems, applications, and services as being the same thing under the covers, and look for a solution to managing both lifecycle state and functional state that can be applied to any distributed (meaning, these days, cloud-distributed) system.

Lifecycle state, as I’ve noted in earlier blogs, can be managed by using what I’ve called representational intent, a model that stands in for the real software component and manages the lifecycle process as it relates both to the service management system overall and to the individual components.  In effect, the model becomes a proxy for the real stuff, letting something that doesn’t necessarily have a common lifecycle management capability (or even have any lifecycle awareness) be fit into a generalized service management or application management framework.

Data models, or small software stubs, can provide representational intent modeling, and there have been a number of examples of this sort of lifecycle state modeling, all discussed HERE.  It’s not clear whether modeling could handle functional state, however, beyond perhaps setting up the state in a back-end state control system.  The statefulness of logic is a property of the logic itself, and even back-end state control would be difficult, to say the least, if the underlying software didn’t anticipate it.
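As a sketch of what one of those small software stubs might look like (the names and methods are my own, not drawn from any of the implementations referenced above), the proxy below exposes a uniform lifecycle interface and translates it into whatever the wrapped component actually supports.

    class LegacyComponent:
        """Something with no lifecycle awareness of its own."""
        def start_process(self):
            print("legacy component running")
        def kill_process(self):
            print("legacy component stopped")

    class IntentProxy:
        """Representational intent: stands in for the real component and
        presents the uniform lifecycle interface management expects."""
        def __init__(self, wrapped):
            self.wrapped = wrapped
            self.state = "ordered"

        def activate(self):
            self.wrapped.start_process()       # translate intent into action
            self.state = "operational"

        def deactivate(self):
            self.wrapped.kill_process()
            self.state = "decommissioned"

    proxy = IntentProxy(LegacyComponent())
    proxy.activate()            # the management system only ever sees the proxy
    print(proxy.state)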

I think it’s clear that for the broad distributed-system applications, some unified way of managing both lifecycle and functional state would be very valuable.  We don’t at present have a real consensus on how to manage either one separately, so that goal may be difficult to reach quickly.  In particular, functional state will demand a transition to functional programming or stateless microservices, and that may require a rewriting of software.  That, in turn, demands a new programming model and perhaps new middleware to harmonize development.

We’ve not paid nearly enough attention to state in NFV or in the cloud.  If we want to reap the maximum benefit from either, that has to change.

What’s Really Needed to “Simplify” NFV

Intel says it will simplify NFV by creating reference NFVIs.  Is there a need for simplification with NFV, and does Intel’s move actually address it in the optimum way?  It depends on what you think NFV is and what NFVI is, and sadly there’s not full accord on that point.  It also depends on where you think NFV is going—toward more service chaining or to 5G and IoT.  In the ETSI NFV E2E model, “NFVI”, or NFV Infrastructure, is something that hosts virtual functions.  It lives underneath a whole set of components, and it’s really those components, in particular the Virtual Infrastructure Manager (VIM), that frame the relationship between hardware and NFV.  We can’t start with NFVI; we have to look at things from the top.

If there’s NFV confusion, it doesn’t start with NFVI because logically the infrastructure itself should be invisible to NFV.  Why that’s true is based on the total structure.  It’s difficult to pull a top-down vision from the ETSI material, but in my view the approach is fairly straightforward.  Virtual Network Functions (VNFs) are the hosted analog of Physical Network Functions (PNFs), which are the devices we already use.  The goal of NFV is to deploy these VNFs and connect the result to current management and operations systems and practices.  Referencing a prior blog of mine, there is a hosting layer and a network layer, and the goal of NFV is to elevate a function from the hosting layer into the network layer, as a 1:1 equivalent of a real device that might otherwise be there.

If that is the mission of NFV, then the role of Management and Orchestration (MANO) is to take what might be a device that consists of multiple functions and is defined as a “service chain” and convert that into successive hosting interactions.  With what?  The VIM.  MANO doesn’t do generalized orchestration.  The VNF Manager’s (VNFM’s) role is to harmonize VNF management with management of PNFs.  Service lifecycles are out of scope, as is the control of the PNFs that remain.  In this model, a VIM is the cloud/virtualization software stack that deploys an application on infrastructure.  Many in the NFV ISG say that the VIM is OpenStack.

Logically, an operator would want to be able to deploy VNFs on whatever hosting resources it found optimal, including resources from open architectures like the Open Compute Project or Telecom Infrastructure Project.  Logically, they’d want to be able to embrace servers based on different chips (we just saw an announcement of a server partnership between Nvidia and the biggest server players, focusing on AI), and many think operators should be able to use something other than OpenStack as a VIM (see below).  We should be able to have portions of our resource pool focused on accelerated data plane, and portions focused on high compute power.  We should have what’s needed, in short.

Logically, we should be thinking of a VIM as being (in modern terms) an “intent model”, exposing standard SLAs and interfaces to MANO and implementing them via whatever software and hardware resources were desired.  If this is the goal, then it seems like a lot of the NFVI confusion is really a symptom of an inadequate VIM model.  If the goal of a VIM is to totally abstract the implementation of the deployment platform and underlying hardware, then there can’t be “confusion” in the NFVI because it’s invisible.  If there is confusion then it should be resolved by the implementation of the VIM.
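Here’s what that “VIM as an intent model” idea might look like as code; the interface and method names are hypothetical, meant only to show that MANO would talk to one abstraction while each implementation maps it to a particular stack.

    from abc import ABC, abstractmethod

    class VIM(ABC):
        """The abstraction MANO sees: standard requests, standard SLA reporting."""
        @abstractmethod
        def deploy(self, vnf_image: str, sla: dict) -> str: ...
        @abstractmethod
        def report_sla(self, deployment_id: str) -> dict: ...

    class OpenStackVIM(VIM):
        def deploy(self, vnf_image, sla):
            # would drive Nova/Neutron here; the details stay hidden
            return "openstack-instance-1"
        def report_sla(self, deployment_id):
            return {"availability": 0.999}

    class VMwareVIM(VIM):
        def deploy(self, vnf_image, sla):
            # would drive vSphere APIs here instead
            return "vmware-vm-7"
        def report_sla(self, deployment_id):
            return {"availability": 0.999}

If the VIM really did abstract things this completely, “NFVI confusion” would be an implementation detail hidden behind the model, which is the point of the paragraph above.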

This happy situation is compromised if you assume that there has to be one and only one VIM.  If the NFVI isn’t homogeneous, or if the operator elects to use virtualization software other than OpenStack, you end up with the problem of having all the options supported inside one VIM, which means that somehow a VIM would have to be vendor-neutral.  What vendor will provide that?  Is it a mandate for an open-source implementation of NFV?  Not yet.

I think that it’s particularly critical to be able to use VMware’s solutions rather than depend exclusively on OpenStack.  VMware is widely used, favored by many operators, and solid competition in the virtualization/cloud-stack space would be very helpful to network operators.  A VMware-modeled VIM would be helpful to NFV overall.

Whatever the motive behind VIM multiplicity, having more than one VIM means having some means of selecting which VIM you use.  If there is indeed diverse NFVI, then you need to have a mechanism to decide what part of the diverse pool you plan to use.  This isn’t complicated to do in theory if you have the right implementation, but a requirement to do it would have complicated the work of the NFV ISG, and they elected not to address that issue.  I’ve called this a “resource domain” problem: the intent model representing hosting/deployment has to contain a sub-model structure that can then link to the right deployment stack.
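For what it’s worth, here’s a sketch of that sub-model structure in the same hypothetical terms as the VIM example above: a hosting-level model holds several resource domains, each tied to the VIM that drives it, and picks the domain whose features match what the VNF needs.

    class ResourceDomain:
        """Links one class of infrastructure to the VIM that drives it."""
        def __init__(self, name, features, vim):
            self.name = name
            self.features = set(features)
            self.vim = vim                    # any object with a deploy() method

    class HostingModel:
        """Top-level hosting intent model: select a domain, then delegate."""
        def __init__(self, domains):
            self.domains = domains

        def deploy(self, vnf_image, needs, sla):
            for domain in self.domains:
                if set(needs) <= domain.features:
                    return domain.name, domain.vim.deploy(vnf_image, sla)
            raise RuntimeError("no resource domain satisfies " + str(needs))

    # Hypothetical usage, reusing the VIM sketches from earlier:
    # hosting = HostingModel([
    #     ResourceDomain("edge-dpdk", {"fast-dataplane"}, OpenStackVIM()),
    #     ResourceDomain("core-compute", {"high-compute"}, VMwareVIM()),
    # ])
    # hosting.deploy("vFirewall.qcow2", {"fast-dataplane"}, {"availability": 0.999})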

The VIM issues are bad enough when you consider the problem of deploying a VNF in the right place and right way, but they’re a lot worse when you consider redeployment.  Suppose I need a server resource with a widget to properly host a given VNF.  It’s certainly a problem if the only way I can make that available is to put it on all servers, because my VIM can’t select a server with it from a diverse pool.  But imagine now that my server breaks and there’s no other widget-equipped server in the same location.  I now have to reconfigure the service to route the connection to a different data center.  This is almost surely going to require not only configuring virtual switch ports for OpenStack, but also configuring WAN paths.  Remember, I don’t have that capability, because my MANO is focused on deploying the virtual elements and not on setting up the WAN.

Intel could address this problem, but not with reference NFVIs.  What they needed to do was to hand it over to Wind River (part of Intel) and ask them to frame an open architecture for a VIM that had that internal, nested-model capability needed to control diverse infrastructure using diverse virtualization software.  That would be a huge step forward, not only for Intel but for NFV overall.  It would also, of course, tend to open up the NFVI market, which may not be in Intel’s interest.

The need to have a very agile approach to managing virtual infrastructure goes beyond just different implementations of cloud hosting or CPUs.  Nokia has recently announced AirGile, what it calls a “cloud-native” model for NFV hosting that incorporates many of the attributes of functional/lambda/microservice programming that I’ve been talking about.  If you want to be truly agile with stateless elements for 5G and IoT (which is what Nokia is aiming at) then you need to be a lot more efficient in deployment, scaling, and redeployment.  Taking advantage of the AirGile model means having statelessly designed VNFs.  If we’re going to do that, we should rethink some of the use cases for NFV as well as the management model.

Vendors, including Intel and Nokia, clearly have their own visions of how NFV should work.  Add these to the multiplicity of open-source solutions, many with strong operator support, and it’s clear that things aren’t going to converge on a single NFV model any time soon.  That means we have to be able to assess the relative merits of each approach, and the only way to fully understand or assess NFV is to take it from the top, in the most general case, and explore the implications.  I think that the biggest problem NFV had was starting from the bottom.  The second-biggest problem was excessive fixation on a single use case, virtual CPE.  Top-down, all-use-cases, is the way to go.

NFV’s problems can be solved, and in fact there are proposals in various forms and venues to do that.  One candidate is ONAP, and the first of several pieces explaining why can be found HERE.  Certainly ONAP needs to be tested, in particular in terms of how its use of TOSCA can address the modeling needs.  Is it best?  What’s needed is to explore the capabilities of these solutions in that general case I noted, testing them against the variety of service configurations and mixtures of VNF and PNF, and over a range of deployment/redeployment scenarios.  If we do that, we can ensure that all the pieces of NFV fit the mission, and that we simplify the process of onboarding VNFs, infrastructure, and everything else.  The ETSI ISG probably won’t be the forum for that to happen, and the use-case focus that has biased the ISG is also biasing other activities.  We may have to wait for broader NFV applications (like 5G, as Nokia suggests, and IoT) to emerge and force a more general approach to the problem.