A Somewhat-New Approach to VNFs

Most of you know that I like the concept of a VNF platform as a service (VNFPaaS) as a mechanism for facilitating onboarding and accelerating the pace of NFV deployment.  That’s still true today, but I had some recent conversations with big network operators, and they tell me that it’s just not in the cards.  The time it would take to redirect industry efforts along those lines is too great; NFV would be obsolete by the time the steps were taken.  They did have an alternative that I think has merit: the notion of supporting a series of cloud PaaS frameworks as coequal VNFPaaS models.  There may even be a sign of commercial (vendor) interest in that approach.

The NFV specifications don’t define a specific hosting environment for VNFs.  Presumptively, they are deployed on a VM with a complete set of supporting platform software (operating system and middleware) bundled with them.  There are also only minimal definitions of how a VNF in such a framework would link to the rest of the NFV management and orchestration structure.  This combination generates a lot of variability, and that means that prepping VNFs for deployment is hardly a standard process.  VNFPaaS would declare a single framework for deployment and execution, one that could provide a single set of APIs to which VNFs could be written.  Obviously, that would facilitate the onboarding and use of VNFs, but whose framework would the VNFPaaS be?  That question can’t be answered in a world where consensus has to be reached for progress, and where consensus means competitive trade-offs by all those involved.

The “multi-VNFPaaS” approach says that while software development takes place on multiple platforms, there are a small number that dominate.  One example is Java, which is supported on nearly all the hardware out there.  Suppose, instead of trying to get everyone to agree on one platform, we took the group of popular ones and prepped a VNFPaaS for each?  Staying with the Java example, we could define a specific set of Classes that represented the extensions to basic Java needed to make VNFs work with the central NFV processes.  You could have cooperative efforts define these platform-specific VNFPaaSs or you could let the platform’s owners do the heavy lifting to qualify their stuff.
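
To make that a little more concrete, here’s the kind of thing I have in mind (my own sketch, not anything the NFV ISG or any platform owner has defined; all of the names are invented).  The point is simply that a VNF written to a small, platform-flavored Class library like this could be onboarded to any NFV core that supplied an implementation of it:

```java
/**
 * Hypothetical sketch of a Java-flavored VNFPaaS extension layer.  None of
 * these names come from the NFV specs; they only illustrate the idea of a
 * platform-specific Class library that binds a VNF to the NFV core processes.
 */
public interface VnfPlatformServices {

    enum VnfState { INITIALIZING, ACTIVE, DEGRADED, FAILED, TERMINATED }

    @FunctionalInterface
    interface LifecycleEventHandler {
        // "SCALE_OUT", "HEAL", "TERMINATE" and so forth would be platform-defined.
        void handle(String eventType, java.util.Map<String, String> parameters);
    }

    // Report health/status to whatever VNFM implementation sits behind the platform.
    void reportStatus(String vnfInstanceId, VnfState state, String detail);

    // Ask the platform for a configuration parameter resolved by MANO at deploy time.
    String getDeploymentParameter(String name);

    // Register a callback so the central NFV processes can drive lifecycle events.
    void onLifecycleEvent(LifecycleEventHandler handler);
}
```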

Operators might prefer having full portability of VNFs across platforms, but that’s not in the cards.  They say that because most software is written to a single platform, it would be difficult to port software from one platform to another; the basic service APIs wouldn’t line up.  Today, for example, we’re still working on a perfect solution for porting between Java and Microsoft’s .NET, even though we have more than a decade of experience.  Thus, while it might seem that we’d be giving up portability by accepting multiple VNFPaaSs, the truth is that portability probably wasn’t a realistic option anyway.

Even if we accept the notion of multiple platforms for VNFPaaS, there’s still the question of coming up with those Classes or APIs that provide the links between the VNFs and NFV.  We end up with a set of “basic service APIs” that access operating system and middleware services and that are outside the VNFPaaS scope, and another set of APIs that have to connect to the NFV foundation or core services and infrastructure.  This other set of APIs should be “adapters” that convert a convenient API structure for the platform involved to whatever API is used by the NFV core implementation.
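
Here, building on the sketch above, is roughly what one of those adapters might look like.  Again, this is purely illustrative; the “NfvCoreClient” is an imagined stand-in for whatever client library a particular NFV core implementation actually provides:

```java
/**
 * Hypothetical adapter sketch: the platform-facing API stays stable for VNF
 * developers, while the adapter translates calls into whatever the particular
 * NFV core implementation actually exposes (REST, message bus, etc.).
 */
public class NfvCoreAdapter implements VnfPlatformServices {

    /** Imagined facade over a particular NFV core's management API. */
    public interface NfvCoreClient {
        void sendNotification(String instanceId, String kind, String payload);
        String lookupParameter(String name);
        void subscribe(java.util.function.BiConsumer<String, java.util.Map<String, String>> listener);
    }

    private final NfvCoreClient core;

    public NfvCoreAdapter(NfvCoreClient core) { this.core = core; }

    @Override
    public void reportStatus(String vnfInstanceId, VnfState state, String detail) {
        // Map the platform's simple status call onto the core's own notification format.
        core.sendNotification(vnfInstanceId, "STATUS", state.name() + ":" + detail);
    }

    @Override
    public String getDeploymentParameter(String name) {
        return core.lookupParameter(name);
    }

    @Override
    public void onLifecycleEvent(LifecycleEventHandler handler) {
        // Events arriving from the NFV core are simply forwarded to the VNF's handler.
        core.subscribe(handler::handle);
    }
}
```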

Which is?  We really need to have some specific API here, which means that we really need to have a specific structure in mind for the VNF management process (VNFM).  There is a presumption in the current specs that VNFM has two pieces, one that is co-resident with the VNF and the other that is centralized and shared.  Leaving aside for the moment just how functionality is divided between these and what the specific APIs are for each, making this approach work in a multi-platform VNFPaaS would be a matter of how that API is implemented within each platform.  You could envision a platform that took responsibility for all VNFs using a remote service, one that left everything to a local process embedded in the VNF, one that used a local service bound to the VNF, or any combination of these.

What has to be standardized in this case is the relationship between the VNF-bound VNFM and the central VNFM.  It wouldn’t be too hard to do that if we had a complete picture of what a VNF deployment looked like in terms of network address space and components, but we really don’t have that as yet.  What could be done without it?  Well, the problem is that unless we assume that every NFV implementation exposes a common central VNFM API, we’re stuck with customizing the platform’s management APIs to match the APIs of the core NFV software.  That would mean that VNFs wouldn’t be fully portable even within a given platform, because different NFV MANO/VNFM implementations might use different APIs.

The good news in this area is that we have a number of activities working on streamlining the onboarding of VNFs.  The specific details of these are difficult to obtain because most of the initiatives haven’t produced public output, but since it’s hard to see how you could automate a service lifecycle without any standard interfaces to work with, they should at least expose the issues.  That would probably be a useful step in solving the problems.

The reason for the “probably” qualifier is that we’re still not thinking systemically about the problem of service lifecycle management.  There are multiple management models, ranging from the “resource management only” approach that says you just keep things working as planned and it will all work out, to the “service-driven” approach that says all problems are detected at the service layer and remediation is driven from the top.  Which are we talking about?  There are issues of the network model for NFV—you have services and customers who might be “tenants” and have to be kept separate at one level, yet in most cases will resolve onto the Internet address space together.  You also have your own NFV control processes, which have to be isolated from tenants and yet have to provide them services.  Till all of this stuff is laid out, it’s going to be difficult to apply the lessons of onboarding VNFs to the broader problem of service lifecycle management.

I don’t want to put on rose-colored glasses here, though.  Facilitating VNF onboarding removes a barrier to VNF deployment, but a removed barrier is significant to a driver only if they are on a journey where the barrier is encountered.  We still have major issues with making a broad business case for NFV.  My view that such a business case demands addressing service lifecycle management and its automation hasn’t changed.  Some of the new initiatives on onboarding could also expose issues and opportunities for lifecycle management overall, and this could be instrumental in proving (or disproving) whether operations efficiencies are enough to drive NFV forward.

The Future of Satellite Broadband

I blogged about cable and telco broadband last week, which leaves us a third significant broadband source—satellite.  The advantages of satellite broadband are obvious; you can get it anywhere in the world.  The disadvantages have been equally obvious—higher cost and performance issues on at least some delay-sensitive applications.  There are rumors that NFV or 5G or both will promote satellite, and also rumors that 5G could deal it a mighty blow.  As usual, we’ll have to wade through the stories to get a realistic picture.

Satellite broadband for commercial applications (leaving out military, in other words) consists largely of three categories—fixed satellite broadband, mobile broadband to ships, aircraft, etc., and “leased line” broadband offering point-to-point bandwidth.  We can expect demand in all these spaces to increase over the next five years, and for some specific markets demand could easily triple.  All that is good news.

The bad news, for the satellite players at least, is that there’s been a surge of launch activity, both completed and scheduled, and most of the new birds are the HTS (high-throughput satellite) variety.  How much?  Start with the fact that current satellite broadband capacity is on the order of a terabit per second.  One satellite design (ViaSat-3), which will see multiple launches, has a per-bird capacity greater than that terabit, which is more than the total capacity of all the current broadband data satellites in use.  The industry will probably see capacity grow by five times or more by 2020, and then double or triple again by 2022.  The result is that the Street expects the cost of satellite bandwidth to decline to about a quarter of current levels by 2020, and full HTS deployment alone could drive it down to half of that again by 2022.

All of this is the familiar geosynchronous market, too.  Low- and medium-earth-orbit (LEO/MEO) plans are even more ambitious; they could add four or five terabits of capacity by 2022, and offer lower latency for real-time communications and other applications that are sensitive to geosynchronous propagation delay.  It’s harder to estimate the total impact of the LEO/MEO satellites because the usage and geometry of the path can involve multiple satellites and thus make load predictions difficult.  Suffice it to say that if the plans are carried out, thousands of new LEO/MEO satellites could be up there by 2022.

Obviously, the big question is how the demand growth and the supply growth will play together.  Satellite data service pricing tends to be negotiated for the longer term, so the biggest changes in price will probably begin in 2020 when current data carriage contracts are expiring in significant numbers.  Current contract pricing already shows a steep discount, and so it’s reasonable to expect to see signs of price/demand elasticity by the end of this year.  However, you can’t judge how that will impact the market without knowing what the total addressable market is.  A lot of that depends on rather subtle aspects of geography, demography, and consumer behavior.

The consensus of the Street’s positive analysts on the space is that all three of the satellite broadband commercial data opportunities have large upsides.  The negative analysts all say (of course) just the opposite.  I’ll look at the segments and try to sort out a realistic view of each.

Satellite fixed consumer broadband is potentially the most interesting of all the spaces because of price/demand elasticity.  A sharp increase in capacity and a corresponding drop in prices would enable as much as 30x growth in this space, most of it coming from emerging market areas.  The challenge is that the equipment needed on the ground side is still costly for these markets, though there may be an opportunity to “hybridize” satellite broadband with other (fixed wireless access) services to reduce per-household cost.

Some Street analysts say that there could be a billion new satellite broadband customers unlocked at the new price points.  I’m doubtful.  A billion new users, meaning a billion new VSAT terminals?  I don’t see the data to prove that such a market exists, and I’m particularly doubtful that the initial cost of the terminal fits developing-market budgets.  FWA would lower the cost of access but reduce the number of satellite users by sharing the VSAT.  Then there’s the fact that the concentrations of population you’d have to presume for a billion new users would be attractive enough to justify looking at terrestrial solutions instead.

I think that the realistic growth opportunity for this space is on the order of 5x by 2022, the time when the pricing curve will be declining the most.  That’s good, but the interesting thing is that my model suggests that the cost reductions needed to boost satellite broadband in developing markets would down-price bandwidth enough to cut the revenue from current satellite broadband applications, to the point where in the pre-2022 period you’d see a slight decline in revenue.  That means you’d need to make up the revenue loss by increasing deployment in major markets, and for satellite broadband that doesn’t seem to be in the cards.

The aircraft and marine broadband space looks more promising, and could even provide some of that revenue relief.  As people get more dependent on their phones, they come to expect to be able to get WiFi anywhere they’re spending time.  Today we see an uptake rate of less than 10% on most aircraft broadband services, and slightly less for maritime services.  The model says that maritime broadband could attain 100% penetration if it were free, and could achieve 40% penetration if costs were half what they are today.  That is within the range of reduction possible with HTS and LEO/MEO technology.  This could make every cruise ship a candidate for significant bandwidth—hundreds of megabits even for mid-sized ones.  This is probably the opportunity that has the best chance of creating demand enough to sustain overall revenue for the industry in the face of slipping bandwidth cost.

Aircraft is another big opportunity for satellite, because there are already airlines that offer satellite-based WiFi free, and because consumers who are able to slake their thirst for social media on planes could well become even more dependent on broadband and thus demand it any time they travel or stay anywhere.  Early data suggests that the uptake rate for broadband on aircraft is only about 30% even when it’s free, and while my model says that changing social trends could bring in-flight WiFi use to 100% on all flights, that wouldn’t be likely until after 2022.  Some consumer survey data suggests that the greatest near-term opportunity would lie in flights between 3 and 6 hours’ duration.  This would equate to a trip of about 1,500 to 2,500 miles, which is a bit higher than the average flight mileage in the US.  International market data seems to show slightly lower mileage per flight.

One important trend in the airline space is the move to offer consumer broadband and video streaming as an alternative to traditional in-flight entertainment.  Airline policies on entertainment vary widely by market, but in general the airlines offer it on longer flights, flights falling into that 3-to-6-hour sweet spot.  By avoiding the cost of entertainment systems, airlines can justify subsidizing WiFi onboard, which then gets the airlines closer to the “free” WiFi that could bring radical changes to the market.  However, we can be fairly certain based on terrestrial content delivery practices that aircraft would end up caching their feature videos aboard, reducing the need to support a separate satellite stream per viewer.

Satellite “leased line” services, meaning point-to-point broadband, are in my view the most problematic of the opportunities.  There are absolutely locations where industrial operations or tourism demand broadband service but are too isolated to justify building out terrestrial infrastructure.  However, we all know that fiber optic cables span most of the world’s oceans today, and multiple times over.  I think there is a clear disaster recovery opportunity here, and I also think that beyond 2022 we could see satellite leased line services supplementing terrestrial services for mobile coverage, etc.  I don’t see this application contributing much before that date.

The challenge we have here is that even if we saw total satellite broadband use double by 2020, which is possible, that would still represent demand equal only to about 40% of the bandwidth that will be available by then, neglecting any contribution from LEO/MEO.  Remember that one future HTS satellite could provide as much capacity as everything we have in orbit now.  The big question is whether anything else could come along.  What that might be falls into two categories, one a “supply-side” story and the other a “demand-side” story.

The supply-side theory is that modern virtualization initiatives (SDN, NFV, 5G) will level the playing field with respect to including satellite services in a higher-level retail (or MVNO) framework.  The problem with this is that, as all three of these technologies have proven in their own spaces, just making something technically possible doesn’t mean you’ve made business sense of it.  Absent a demand driver that makes the lack of a technology an impediment to making a boatload of money, I don’t see supply-side initiatives opening any meaningful opportunity.  However, the principles of these three initiatives might help operationalize evolving satcom infrastructure more effectively, which as we’ll see could be important.

The demand-side story runs up against the reality that the really good broadband markets are served today with terrestrial technology precisely because they are really good.  Satellite is not going to get cheaper than terrestrial options, particularly if we do start deploying FWA to extend FTTN, starting just beyond 2020.  The aviation and marine markets are significant in terms of the number of VSAT terminals, and both will likely contribute to considerable demand growth after 2020, but unless the pace of HTS deployment slows (unlikely) the gap between demand and supply will widen as we move into the next decade.

From a technology/equipment perspective, as opposed to a satellite provider/capacity perspective, the picture we’ve painted generates some interesting shifts and opportunities.  As unit bandwidth cost falls and satellite provider profits are pressured, the role of operations efficiency grows.  That’s particularly true when what satellite networks have to deliver is in effect what a terrestrial ISP delivers, with all the subnet addressing, gateway definitions, DNS/DHCP, video caching and CDN, and so forth.  The demand growth we’re seeing is all going to happen in spaces where even more IP management is linked into satellite service management.  There are definitely things emerging from SDN, NFV, and (in the future) 5G that could help control and even reduce the cost of creating and sustaining IP connectivity rather than just satellite connectivity.

So where do we end up here?  Leading up to about 2020, I think it’s clear that supply of satellite bandwidth will outstrip demand, and at the same time operations costs for the terrestrial part of the link (including onboard aircraft) will rise with the complexity of the delivery models being adopted.  The result will be profit compression on the part of the satellite providers, which is consistent with the negative thesis of the Street.

Beyond 2020 things get more complicated.  The negatives will continue to pile up unless there are steps taken to radically increase the delivery demand in the areas of aircraft and marine satellite broadband; no other options offer much.  5G will kick in here, but in my view 5G offers more negatives to the satellite industry than positives, because the FWA hybridization with FTTN will lower fiber-infrastructure delivery costs further and make it nearly impossible to extend satellite usage significantly except in the rural parts of developing economies, which clearly have fairly low revenue potential.  We could see satellite play a role in things like IoT, but absent a clear and practical IoT model (which we don’t have) we can’t say how much of a role that will be.

The net, in my view, is that satellite broadband will face trying times and profit compression for at least five years, and very possibly longer.  That will start to impact the launches planned beyond 2022, unless we figure out a new mission.  If that’s to be done, then we need to start thinking about it pretty quickly.

Who’s Winning the Telco/Cable Battle?

There has recently been a lot of media attention focused on the cable providers, not only because they’ve been emerging as players in some next-gen technologies like SDN, NFV, and the cloud, but because they’ve been gaining market share on the telcos after losing it for years.  All of this seems tied to trends in television viewing and broadband usage, but it’s hard to say exactly what factors are driving the bus, and so hard to know where it’s heading.

One thesis for the shift is that because cable infrastructure is fairly constant throughout the service area, cable companies can deliver broadband services more consistently.  Telcos usually have zones where they can justify high-capacity broadband infrastructure, where customer density and economic status are high, but others where it’s plain DSL.  There can easily be a factor of 100 between the fastest and slowest broadband available, and cable rarely has anything like that ratio.

Another thesis is television viewing.  Because TV is dominated by channelized video services, the competition for “broadband” was really a competition for video.  Cable infrastructure is inherently superior to DSL (and, most agree, to satellite) for delivering channelized video.  The slower DSL connections have to husband programming to avoid congestion on the access line, and I think this was a major factor in inducing AT&T to move to satellite video delivery.

The third theory is that it’s really mobile broadband that’s the culprit.  Telcos have been focusing increasingly on mobile services because they’re more profitable, and as a result they’ve been scrimping on modernization of their wireline services, both Internet and video.  The cable companies’ primary revenue and profit center is the delivery of TV and wireline broadband, so it’s not surprising that they’ve put more into those areas, and are reaping the reward.

There are other factors too, each of which might form the basis of a thesis of its own or might complicate one or more of the others.  Telcos came late to the TV delivery market, and had an initial advantage in being the new kid, able to cherry-pick geographies and tune services to beat competitive offerings.  Those benefits have now passed.  Cable companies have been a bit more successful in consolidating than telcos, and up to the AT&T/Time Warner deal (yet to be approved, but it probably will be) the cable companies have had a leg up on getting their own content properties.  All of these points are factors.

The current situation is that cable companies, who lost customers to telcos from the time when telco TV launched, have started to gain market share back.  The shift is slow because it generally requires some considerable benefits to drive consumers to go through the hassle of changing their TV, Internet, and phone, but it’s already quite visible among new customers.  At the same time, there is an indication that TV isn’t the powerful magnet that it used to be.  Verizon reported that its vanilla-minimum-channel offering ended up taking about 40% of renewals and new service deals.  Streaming video has changed the game.

Streaming video’s immediate impact on both cable companies and telcos is to shift viewing away from channelized programming, even in the home.  This means that the inherent advantage of cable for channelized delivery is minimized, but it also means that satellite TV isn’t going to save low-speed DSL companies from cable predation and that you’ll need better WiFi and data service to the home.  The phone or tablet, or the streaming stick or smart TV, is the TV of the future, and it needs a data connection.  So far, the net advantage is with cable companies.

The next level of impact here is the mobile/TV symbiosis.  AT&T’s plan to offer unmetered mobile streaming to its DIRECTV customers, and the possibility of symbiotic features/services that enhance viewing of a telco offering on the telco’s own mobile network, would open ways to empower TV and fixed broadband providers who have a mobile service, which cable companies do not.  This is almost certainly why Comcast is looking to offer some sort of MVNO service that, like Google’s Fi, feeds on WiFi wherever possible.  Comcast has public WiFi hubs, and could certainly deploy more.

In my view, the future of wireline services is tied to the mobile device, which means that if cable companies don’t secure some form of MVNO offering that can give them some latitude in pricing video streaming, they are going to lose market share again, and probably very quickly.  Some on the Street think cable companies will romp wild and free for as much as five or six years, but I think they could end up losing share even in 2017.

All of this frames infrastructure planning too.  For telcos, it means that there is a renewed reason to look at streaming video to the home, but in the form of a pure on-demand service.  Things like sporting events and news could remain “magnetic” enough to justify channelized video, but you’d be better off using your streaming bandwidth to support on-demand consumption.  Five or six people aren’t going to watch the same show at the same time on individual phones or tablets, after all.  For cable companies, it means you need a WiFi-centric MVNO or you’re dead.

This could all frame some of the 5G issues.  One of the applications of 5G that operators want to see is enhanced mobile speed—which would make video delivery easier and lower the operator cost of supporting a given number of streaming consumers in a cell.  Another is Fixed Wireless Access (FWA), which would use 5G radio technology at the end of an FTTN connection to make the last jump to homes and businesses.  These drive, in a sense, wireless and wireline convergence.  They also make network slicing more valuable, because all of a sudden we could see a lot of new MVNO candidates.  Operators like Sprint and T-Mobile would almost surely be candidate partners for the cable companies because they’re not wireline competitors.  These two, by the way, are partners with Google in Fi.

The net here, in my view, is that there is no winner and no truly meaningful trend in wireline broadband or video at all; there is only a set of mobile-driven trends.  The people who can be players in the mobile space can pick their features and battles like the telcos did a decade ago in channelized video.  Those who can’t play in mobile are now going to face major problems, and if 5G or 5G-like convergence emerges by 2020, they’ll have a serious problem creating a survivable business model by 2022.

Ciena’s Liquid Spectrum: Are They Taking It Far Enough?

The Ciena announcement of what they call Liquid Spectrum has raised again the prospect of a holistic vision of network capacity management and connectivity management that redefines traditional OSI-modeled protocol layers.  It also opens the door for a seismic shift in capex from Layers 2 and 3 downward toward the optical layer.  It could jump-start SDN in the WAN, make 5G hopes of converging wireline and wireless meaningful, and introduce true end-to-end software automation.  All this, if it’s done right.  So is it?  I like Ciena’s overall story, including Liquid Spectrum, but I think they could tell, and perhaps do, more.

Liquid Spectrum, not surprisingly, is about systematizing agile optics to make the optical network into more of a collective resource than a set of dedicated and often service-specific paths.  There is no question that this could generate significantly better spectrum utilization, meaning that more traffic could be handled overall versus traditional optical networking.  There’s also no question that Liquid Spectrum does a better job of operationalizing spectrum management versus what even agile-optics systems have historically provided.

Liquid Spectrum is a better optical strategy, but that point raises two questions of its own.  First, is it the best possible optical strategy?  Second, and most important, should it be an “optical” strategy at all?  The two questions are related, and the second is harder to answer, so let’s start with the simple case and work up.

The most basic use of optical networking for services would be to provide optical routes to enterprise or cloud/web customers for things like data center interconnect (DCI).  For this mission, Liquid Spectrum is a significant advance in terms of simple provisioning of the connections, monitoring the SLA, and effecting restoration processes if the SLA is violated.  If the operator has latent revenue opportunity for this kind of service, then Ciena is correct in saying that it can bring in considerable new money.

As interesting as DCI is to those who consume (or sell) it, it’s hardly the basis for vast optical deployments even in major metro areas.  The primary optical application is mass-market service transport.  Here, the goal isn’t to create new services as much as new service paths, since truly new connection services would be very difficult to define in an age where IP and Ethernet are so totally adopted.  Liquid Spectrum’s ability to improve overall spectrum efficiency could mean that more transport capacity for services would be available per unit of capex, which is an attractive benefit.  The metrics/analytics enhancements coming to Liquid Spectrum will strengthen this benefit further.

It should be possible to apply some of the principles of intent-modeled networking, meaning SLA-dependent hierarchies, to define optical transport as a specific sub-service with an SLA that the optical agility offered by Liquid Spectrum could then meet.  Since optical congestion management and path resiliency would be addressed invisibly within these SLAs and model elements, the higher layers would see a more reliable network, and the operations cost of that configuration should be lower.  It’s hard to say exactly how much, because the savings are so dependent on network topology and service dynamism, but we’re probably talking about something on the order of a 10% reduction in network operations costs, which would equate to saving about a cent of every revenue dollar.
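
As a rough illustration of what I mean (my own sketch, not anything Ciena has published), an optical-transport sub-service expressed as an intent might carry nothing more than the endpoints and the SLA terms, leaving the “how” entirely to the optical layer:

```java
/**
 * Sketch of an optical-transport "sub-service" expressed as an intent.  The
 * layers above ask for an outcome and an SLA; the optical layer is free to
 * meet it however it likes (re-routing, spectrum changes, and so on).
 */
public final class OpticalTransportIntent {
    private final String aEnd;
    private final String zEnd;
    private final double capacityGbps;      // what the higher layer needs
    private final double maxLatencyMs;      // SLA terms the optical layer must hold
    private final double minAvailability;   // e.g. 0.99999

    public OpticalTransportIntent(String aEnd, String zEnd,
                                  double capacityGbps, double maxLatencyMs,
                                  double minAvailability) {
        this.aEnd = aEnd;
        this.zEnd = zEnd;
        this.capacityGbps = capacityGbps;
        this.maxLatencyMs = maxLatencyMs;
        this.minAvailability = minAvailability;
    }

    public String endpoints() { return aEnd + " <-> " + zEnd; }

    // The higher layer never sees wavelengths or spectrum, only whether its
    // intent is currently being met.
    public boolean isSatisfiedBy(double measuredGbps, double measuredLatencyMs,
                                 double measuredAvailability) {
        return measuredGbps >= capacityGbps
            && measuredLatencyMs <= maxLatencyMs
            && measuredAvailability >= minAvailability;
    }
}
```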

That’s not insignificant, but it’s not profound given that other strategies under review could save ten times that amount.  The reason optical networking, even with Liquid Spectrum, falls short of other cost-reduction approaches is its weak tie to automation of the service lifecycle.  Obviously, you can’t automate the service lifecycle from a layer so deep that services aren’t even visible.  Service automation starts at the protocol layer where the service is delivered, because that’s where the money meets the network.  Optics is way down the stack, invisible unless something breaks, which means that to make something like Liquid Spectrum a meaningful pathway to opex savings, you have to tie it to service lifecycle management.

Ciena provides APIs to do just that, and they cite integration with their excellent Blue Planet orchestration platform.  There’s not much detail on the integration; Blue Planet is mentioned only in the title of a slide in the analyst deck and the slide itself shows the most basic of diagrams—a human, a box (Blue Planet) and the network.  This leaves open the critical question of how optical agility is exploited to improve service lifecycle management.  Should we look at optical agility as the tail of the service lifecycle automation dog?

You absolutely, positively, do not want to build a direct connection between service-layer changes and agile optics, because you risk having multiple service requests collide with each other or make inconsistent requests for transport connectivity.  What needs to happen is an analysis of the transport conditions based on service changes, and the way that has to happen would necessarily be reflected in how you model the “services” of the optical layer and the services of the layers above.  We don’t have much detail on Blue Planet’s modeling approach, and nothing on the specific way that Liquid Spectrum would integrate with it, so I can’t say how effective the integration would be.

Another thing we don’t have is a tie between Liquid Spectrum and SDN or “virtual wire” electrical-layer technology.  There are certainly cases where connectivity might require optical-level granularity in capacity and connection management, but even today those are rare, and if we move more to edge-distributed computing they could become rarer still.  It would be logical to assume that optical grooming was the bottom of a grooming stack that included electrical virtual-wire management as the layer above.  I think Ciena would have been wise to develop a virtual-wire strategy to unite Blue Planet and their optical products.  Logically, Ciena’s packet-optical approach could be integrated with modern SDN thinking, and it’s a referenced capability for Blue Planet, but nothing is said in the preso about packet optical or Ciena’s products in that space.

There have been a lot of optical announcements recently, and to be fair to Ciena none of them are really telling a complete network-infrastructure-transformation story.  ADVA, who also has a strong orchestration capability, did an even-more-DCI-centric announcement too, and Nokia told an optical story despite having, in Nuage, an exceptional SDN story to tell.  Product compartmentalization is driven by a lot of things, ranging from the way media and analysts cover technology to the desire to accelerate the sales cycle by focusing on a product rather than boiling the ocean.  However, it can diminish the business case for something by demanding that it be considered alone when it’s really part of a greater whole.

You have to wonder whether this compartmentalization issue is a part of a lot of technology problems.  Many emerging technologies, even “revolutions”, have been hampered by compartmentalization.  NFV and SDN both missed many (perhaps most) of the benefits that could drive them forward because they were “out of scope”.  It seems that biting off enough, these days at least, is equated to biting off too much.

I think Ciena needs to bite a bit deeper.  They have an almost unparalleled opportunity here, an opportunity to create a virtual-wire-and-optics layer that would not only improve operations efficiency but reduce the features needed in Layers 2 and 3 of the network.  That would make it easier to replace Ethernet and IP devices with basic SDN forwarding.  Sure, these moves would be ambitious, but Ciena’s last quarter didn’t impress the Street, and they need some impressive quarters to follow.  Competition is tough in optics, and the recent success of the open-optical Facebook Voyager initiative shows that it would be easy to subsume optical networking in L2/L3 devices rather than absorb electrical-layer features in optical networks.  If Ciena and other optical vendors lose that battle, it’s over for them, and only a preemptive, broad service-to-optics strategy can prevent the loss.

Ciena has the products to do the job, and Liquid Spectrum is a functional step along the way.  It’s also an example of sub-optimal positioning.  You can argue that the major challenge Ciena faces is that it wants to be strategic but sells and markets tactically.  If you have a broad product strategy you need to articulate it so that your product symbiosis is clear.  If that doesn’t happen, it looks like you’re a hodgepodge and not a unified company.  Ciena has a lot of great stuff and great opportunities, including Liquid Spectrum.  They still need to sing better.

The Gap Between NFV Sellers and Buyers and the Three Things Needed to Bridge It

The more things change, the more they stay the same, as the saying goes.  That certainly seems to be true with NFV, based on what I’ve heard over the last couple weeks from both vendors and network operators.  Two years ago, I noted that vendor salespeople were frustrated by the unwillingness of their buyers to transform their businesses by buying NFV technology.  There was clearly a fault in the operators’ thinking, and there were plenty of media articles that agreed that operators had to modernize the way they thought.  Operators have consistently said that they’d be happy to transform if somebody presented them with a business case.  Same today, for both groups.  Take a look at the media this week and you’ll find the same kinds of stories, about “digital mindset” or “breaking through the fog” or how an NFV strategy is only a matter of defining what it’s hosted on.

Five different vendor NFV sales types or sales executives told me this month that buyers were “resisting” the change to a virtual world, or a cloud business model, or something.  I asked each of them what they believed the problem was, and not a single one mentioned an issue with business case, cost/benefit, or anything that would normally be expected to drive a decision at the executive level.  Seven operator strategists in the same period said that vendors were “lagging” in producing an NFV solution that could validate a business case.

I think that the biggest problem here is one of focus.  The operators don’t have an NFV goal at the senior exec level, nor should they.  They do have a goal of getting more engaged in higher-level services and another in reducing the cost of their network connection services, both capex and opex.  While most operators think that NFV can play a role in achieving these goals, the technology that they think would do the most is “carrier cloud.”  They believe that somewhere between a quarter and a half of their total capex should refocus onto hosted functionality, partly to offer valuable new services (all of which are above the connectivity layer of the network) and partly to reduce costs.  They “believe”, but nobody is proving it to them yet.

What would get operators to the grand carrier cloud end-game?  At the CIO or CEO levels, the operators themselves think that the three primary drivers are OTT video, mobile contextual services, and IoT.  At the CTO level, the drivers are said to be NFV and 5G.  A disconnect between CIO/CEO and CTO isn’t uncommon, but what I think is offering some hope is that some operators and some vendors are seeing the disconnect and working to bridge it.  The question is whether their efforts will spread.

AT&T has, with Domain 2.0 and ECOMP, done more than other operators in framing a strategy that is capable of transforming to the carrier cloud.  Interestingly, it does that by creating converging cloud models that target each of the two goals—new higher-level revenue and cost management.  For revenue, AT&T has an aggressive and insightful hybrid cloud strategy that includes the ability to bond AT&T VPNs with cloud services offered by all the major providers (NetBond).  It also has content caching and IoT services.  ECOMP and D2 are aimed at the cost side and at creating a holistic service lifecycle automation process.  They have taken a business-cost-and-opportunity path toward carrier cloud without making it the specific goal, just the convergence of two separately justified evolutions.

Vendors may have taken hold of this same notion of converging approaches, but with less success.  AT&T’s D2 model divides infrastructure into zones and works to prevent vendors from gaining too much control by becoming dominant in too many of them.  That places AT&T in the role of a “benefit integrator”, because no vendor can propose a full solution without gaining too much control.  Most operators haven’t taken this approach, and thus are still looking for vendors to connect the dots to benefits.

What I see from vendor discussions is that they’re dodging that mission, for several reasons.  First, it creates a broader sales discussion that takes longer to reach a conclusion and requires more collateralization.  Vendors (salespeople especially) want a quick sale.  Second, most vendors don’t really have the complete answer to the revenue-and-cost-reduction story.  They rely instead on the notion that “standards define the future” and that operators should accept that, which lets the vendors place their offerings in a purely technical framework.  Third, vendors don’t really have a clear picture of the technology framework they’re trying to be a part of.  That’s largely because the standards don’t really draw one.

The central element in carrier cloud is an agile virtual network that, in my own view, almost has to be a form of overlay SDN.  Operators have accepted a very limited number of candidates here; Cisco, Juniper, Nokia, and VMware lead their lists.  Juniper (with Contrail) and Nokia (with its Nuage SDN product) just won deals with Vodafone.  Interestingly, the first three on this list have tended to position their SDN assets cautiously to avoid overhanging product sales, and VMware is still promoting NSX more as a data center strategy than as a total virtual network.  Because of that cautious positioning, the vendors with credentials in the virtual-network space haven’t really promoted a virtual-network vision, and the lack of that vision, more than anything else, muddies the infrastructure model.  Vendors with no offering in this critical area are at risk of marginalization.

The second thing that’s needed is an updated model for deploying hosted (virtual) functions.  NFV has long focused on OpenStack and VMs, but the industry is migrating to containers, microservices, and even functional (Lambda) programming.  Much of the credible growth in opportunity lies in event-driven applications.  In fact, you can argue that without event-driven applications, there probably isn’t a large enough new-revenue pool to drive more than a quarter to a third of the carrier cloud opportunity to fruition.  This is a whole new kind of component relationship, one that Amazon, Google, and Microsoft have all proven (with their functional programming features) to be incompatible with past notions of hosting, even DevOps.

The final thing is that old song, service-wide lifecycle management and automation.  The current practices cost too much, take too long, and tend to create inflexible service-to-resource relationships that limit operators’ ability to respond to market conditions.  This has to be based on a very strong service/application model and it has to be integrated with the other two points without being inflexible with respect to their state of implementation.  Abstraction is a wonderful tool in creating an evolving system even if you’re not totally sure where it’s evolving to or how fast it’s going.

No vendor really has all the pieces here, which of course explains why operators who don’t have their own vision and glue are frustrated and why vendor salespeople are likewise.  I don’t think there’s any chance that 5G standards will develop in a way that helps carrier cloud in general and NFV in particular, at least not before about 2021.  I don’t think NFV standards will evolve to address the key issues here, at least not much faster than that.  So, vendors have to hope that operators converge on their own approach (which will probably commoditize all of the elements of NFV and carrier cloud, working against vendor interest) or move more effectively to promote a solution with the right scope.  Complaining about operators’ lack of insight won’t help.

Nor will complaining about lack of vendor support.  Operators have to decide if they’re willing to sit around and wait for something to be handed to them, or work (perhaps with the ECOMP/Open-O activity) to create a useful model that they can use to guide their own evolution.

Factory Processes and Functional Elements in NFV and IoT: Connecting the Dots

Today I want to take up the remaining issue with edge-centric and functional programming for event processing, both for IoT and NFV.  That issue is control of distributed state and stateless processes.  Barring changes in the landscape, this will be the last of my series of blogs on this topic.

As always, we need to start with an example from the real and familiar world.  Let’s assume that we’re building a car in five different factories, in five different places in the world.  Each of these factories has manufacturing processes that generate local events, and our goal is to end up with a car that can be sold, is safe, and is profitable.  What we have to do is to somehow get those factories to cooperate.

There are a few things that clearly won’t work.  One is to just let all the factories do their own thing.  If that were done you might end up with five different copies of some parts and no copies of others, and there would be little hope that they’d fit.  Thus, we have to have some master plan for production that imposes specific missions on each factory.  The second thing we can’t have is to drive our production line with events that have to move thousands of miles to get to a central event control point, and have a response return.  We could move through several stages of production during the turnaround.  These two issues frame our challenge.

What happens in the real world is that each of our five factories is given a mission (functional requirements) and a timeline (an SLA).  The presumption we have in manufacturing is that every producing point builds stuff according to that combination of things, and every other factory can rely on that.  Within a given factory, the production processes, including how the factory handles events like materials shortages or stoppages in the line, are triggered by local events.  These events are invisible to the central coordination of the master plan; that process is only interested in the mission—the output—of the factories and their meeting the schedule. A broad process is divided into pieces that are individually coordinated and then combined based on a central plan.

If we replace our factory processes with hosted functional processes, meaning Lambdas or microservices, and we replace factory conditions with specific generated IoT-like events, we have a picture of what has to happen in distributed event processing.  We have to presume that events are part of some system of function creation.  That system has presumptive response times, meaning the total time it takes for an event to be analyzed and a reaction created.  The event-response exchange defines a control loop, whose length is determined by what we’re doing.  Things that happen fast require short control loops, and that means we have to be able to host supporting processes close to where the events are generated.
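
A quick back-of-the-envelope shows why that matters.  Light in fiber covers roughly 200 km per millisecond, so round-trip propagation alone adds about a millisecond for every hundred kilometers of separation; the distances and processing time below are just illustrative assumptions:

```java
/**
 * Back-of-the-envelope control-loop estimate.  Round-trip propagation in fiber
 * is roughly distanceKm / 100 milliseconds, before any queuing or processing.
 */
public final class ControlLoopEstimate {
    private static final double KM_PER_MS_ONE_WAY = 200.0;  // approximate, light in fiber

    public static double roundTripMs(double distanceKm, double processingMs) {
        return (2.0 * distanceKm / KM_PER_MS_ONE_WAY) + processingMs;
    }

    public static void main(String[] args) {
        // An edge host 100 km from the event source vs. a central host 3,000 km away,
        // each assumed to spend 5 ms actually handling the event.
        System.out.printf("Edge:    %.1f ms%n", roundTripMs(100, 5));    // about 6 ms
        System.out.printf("Central: %.1f ms%n", roundTripMs(3_000, 5));  // about 35 ms
    }
}
```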

In both NFV and IoT we’ve tended to presume that the events generated by functions (including their associated resources) are coupled directly to service-specific processes.  The function of NFV management is presumptively centralized, and if IoT is all about putting sensors on the Internet, then it’s all about having applications that directly engage with events.  If our car-building exercise is an accurate reflection of the NFV/IoT world, this isn’t practical, because we either create long control loops to a centralized process or create disconnected functions that don’t add up to stable, profitable activity.

The path to a solution here has been around for a decade; it’s hidden inside a combination of the TMF’s Shared Information and Data model (SID) and the Next-Generation OSS Contract (NGOSS Contract).  SID divides what we’d call a “service” into elements.  Each of these elements could correspond to one of our “factories” in the auto example.  If there’s a blueprint for a car that shows how the various assemblies like power train, passenger compartment, etc. fit, then there would be a blueprint for how each of these assemblies was constructed.  The “master blueprint” doesn’t need the details of each of these sub-blueprints; they only need to conform to a common specification.  With a blueprint at any level, we can employ NGOSS Contract principles to steer local events to their associated processes.

What this says is that breaking up services or IoT processes into a hierarchy isn’t just for convenience in modeling deployment, it’s a requirement in making event processing work.  With this model, you don’t have to send events around the world, only through the local process system.  But what, and where, is that local process system?

The answer here is intent modeling.  A local process system is an intent-modeled black-box “factory” that produces something specific (functional behavior) under specific guarantees (an SLA).  Every NFV service or IoT application would be made up of some number of intent models, and hidden inside each would be a state/event engine that linked local events to local processes, with “local” here meaning “within the domain”.  If one of these black boxes has to signal the thing above that uses it, it signals through its own event set.  A factory full of sensors might be aggregated into a single event that reports “factory condition.”
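
To illustrate the idea (this is my own sketch, not a TMF artifact), the engine inside one of those black boxes could be as simple as a table that maps the element’s current state and a local event to a local process, escalating a single summary event upward only when there’s no local handler:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Illustrative only: a minimal state/event engine of the kind that could live
 * inside each intent-modeled element ("factory"), steering local events to
 * local processes in the NGOSS Contract sense.  Names and states are invented.
 */
public final class ElementStateMachine {
    public enum State { DEPLOYING, ACTIVE, DEGRADED, FAILED }

    private State state = State.DEPLOYING;
    // (current state, event name) -> the process that handles it locally.
    private final Map<String, Consumer<ElementStateMachine>> handlers = new HashMap<>();

    public void on(State inState, String event, Consumer<ElementStateMachine> process) {
        handlers.put(inState + "/" + event, process);
    }

    public void raise(String event) {
        Consumer<ElementStateMachine> process = handlers.get(state + "/" + event);
        if (process != null) {
            process.accept(this);   // handled inside the black box...
        } else {
            reportUp(event);        // ...or summarized to the layer above
        }
    }

    public void setState(State next) { this.state = next; }

    private void reportUp(String event) {
        // A real model would raise one aggregated event to the parent element
        // rather than exposing this internal detail.
        System.out.println("escalate: " + state + "/" + event);
    }

    public static void main(String[] args) {
        ElementStateMachine vpnElement = new ElementStateMachine();
        vpnElement.on(State.DEPLOYING, "deploy-complete", m -> m.setState(State.ACTIVE));
        vpnElement.on(State.ACTIVE, "link-down", m -> m.setState(State.DEGRADED));
        vpnElement.raise("deploy-complete");   // handled locally
        vpnElement.raise("link-down");         // handled locally
        vpnElement.raise("sla-violation");     // no local handler, escalated upward
    }
}
```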

From this, you can see that not only isn’t it necessary to build a single model of a service or an IoT application that describes everything, it’s not even desirable.  The top-level description should only reference the intent models of the layer below—just like in the OSI Reference Model for network protocols, you never dip into how the layer below does something, only the services it exposes.  Services and applications are composed not from the details of every local event-handling process, but from the functional elements that collect these processes into units of utility.

The “factory” analogy is critical here.  Every element, every intent model, is a factory.  It has its own blueprint for how it does its thing, and nothing outside it has any reason to know or care what that blueprint is.  It should not be exposed because exposing it would let something else reference the “how” rather than the “what”, creating a brittle implementation that any change in technology would break.

This brings us to the “where”, both in a model-topology sense and in a geographic sense.  If what we’re after is a set of utility processes that process local events, then we could in theory define the factories based on geography, or administration, or functionality, or a combination of those things.  We can have multiple factories that produce the same utility process, perhaps in a different way or in a different place.

To make this work, you need a standard approach to intent modeling, so that a “factory abstraction” at a higher level can map to any suitable “factory instance” below.  That means standardized APIs to communicate the intent and SLA, and a standard way to exchange events and responses.  Strictly speaking, you don’t need to standardize what happens inside the factory.  However, if you also standardize the state/event structure that creates the implementation, linking local events to local processes in a standardized way, then every intent model at every level looks the same, and processes used in one could also be used in others that required the same behavior.
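
What might that standard contract look like?  Something as simple as this sketch (mine, not a standard), where every factory, whatever is inside it, accepts an intent and an SLA and exchanges events in the same way:

```java
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Sketch of the standardized "factory" contract the text argues for.  Every
 * implementation, however it works internally, presents exactly this face.
 */
public interface Factory {
    /** What the factory is being asked to produce, e.g. "vpn", "vlan", "optical-path". */
    void setIntent(String function, Map<String, String> parameters);

    /** The guarantees the factory commits to; how it meets them is its own business. */
    void setSla(Map<String, String> slaTerms);

    /** Events flowing down from the layer above (change, scale, tear down). */
    void accept(String event, Map<String, String> detail);

    /** Register for the factory's own, deliberately coarse, upward event set. */
    void onEvent(BiConsumer<String, Map<String, String>> listener);
}
```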

If a high-level structure, a service or application, needs to reference one of our utility processes, it would represent it as an intent model and leave the decoding to the implementation.  If that structure wanted to specify a specific factory it could, or it could leave the decision on what factory to use (Pittsburgh or Miami, VPN or VLAN) to a lower-level abstraction that might make the selection based on the available technology or the geography of the service.

If you presume this approach, then every element of a service is an abstraction first and an implementation second.  Higher-layer users see only the abstraction, and all who provide implementations must build to that abstraction as their “product specification”.  There’s no difference whether an abstraction is realized internally or externally, or with legacy or new technology.  There’s no difference, other than perhaps connectivity, optimality, or price, in where it’s implemented.  Within a given functional capability set, you pick factories, or instantiate them, based on optimality.

In the IoT space, there could also be abstractions created based on geography or on functionality.  I used the example of driving a couple blogs back; you could envision traffic-or-street-related IoT as being a series of locales that collected events and offered common route and status services.  A self-drive car or an auto GPS user might exercise a local domain’s services in abstract from a distance, but shift to a lower-level service as they approached the actual geography.  That suggests that you might want to be able to allow an abstraction to offer selective exposure of lower-level abstractions.

It’s harder to lay out a specific structure of what a state/event model might look like for IoT, but I think the easiest way to approach it is to say that IoT is a service that can be decomposed, and that the decomposition process will balance issues like control loop length and geographic hosting efficiency to decide just where to put things, which frames how to abstract them optimally.  However, I think that the goal is always to create a model approach that lets you model an intersection, a route, a city, a country, a fleet of vehicles, or whatever using the same approach, the same tools, the same APIs and event and process conventions.

Even a self-driving car should, in my view, have a model that lives in the vehicle and receives and generates events.  That’s something we’ve not talked about, and I think we’re missing an opportunity.  Such an approach would let you define behavior when the vehicle has no access to IoT sensors outside it, but also how it could integrate the “services” of city, route, and intersection models to create a safe and optimal experience for the passengers.

This raises the very interesting question of whether the vehicle itself, as something capable of being directed and changing speeds, should also be modeled.  A standard model for a vehicle would facilitate open development of autonomous vehicle systems and also cooperative navigation between vehicles and street-and-traffic IoT.  It shouldn’t be difficult; there are only a half-dozen controls a driver can manipulate, and they tend to fall into two groups—switches with specific states (like on/off), and “dials” that let you set a value within a specified range.
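
A minimal sketch of that kind of vehicle model, purely as an illustration of the switches-and-dials idea and not a proposed standard, might look like this:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of a vehicle modeled as controls: every control is either a
 * switch (named states) or a dial (a value within a declared range).
 */
public final class VehicleModel {
    private final Map<String, Boolean> switches = new HashMap<>();    // e.g. headlights on/off
    private final Map<String, double[]> dialRanges = new HashMap<>(); // name -> {min, max}
    private final Map<String, Double> dialValues = new HashMap<>();

    public VehicleModel() {
        switches.put("headlights", false);
        switches.put("ignition", false);
        dialRanges.put("throttle", new double[] {0.0, 100.0});   // percent
        dialRanges.put("steering", new double[] {-45.0, 45.0});  // degrees
        dialRanges.put("brake", new double[] {0.0, 100.0});      // percent
    }

    public void setSwitch(String name, boolean state) {
        switches.put(name, state);
    }

    public void setDial(String name, double value) {
        double[] range = dialRanges.get(name);
        // Clamp to the declared range so no external model can command an out-of-range value.
        double clamped = Math.max(range[0], Math.min(range[1], value));
        dialValues.put(name, clamped);
    }

    public double getDial(String name) {
        return dialValues.getOrDefault(name, 0.0);
    }

    public static void main(String[] args) {
        VehicleModel car = new VehicleModel();
        car.setSwitch("ignition", true);
        car.setDial("throttle", 30.0);
        car.setDial("steering", 90.0);   // out of range, clamped to 45 degrees
        System.out.println("steering = " + car.getDial("steering"));
    }
}
```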

With proper factory support, both IoT and NFV can distribute state/event systems and processes to take advantage of function scaling and healing without risking loss of state control or of the ability to correlate the context of local things into a master context.  That combination is essential to getting the most from either of these advances—in fact, it may be the key to making the “advances” really advance anything at all.

Google Steps Into Lambdas: What More Proof Do We Need?

I write a lot about things that aren’t mentioned often elsewhere, and that might rightfully make you wonder whether I’m just off in the lunatic fringe.  I did a series of blogs talking about the shift in software in general, and the cloud in particular, to “functional” or “Lambda” programming, and a few of you indicated it was a topic you’d never heard of.  So, was I on the edge or over it here?  I think the latest news shows which.

Google, finally awakening to the battle with Amazon and Microsoft for cloud supremacy, is making changes to its cloud services.  One of the new features, announced at Google’s Cloud Next event, is extending Google’s “elastic pricing” notion to fixed machine instances, but the rest focus on functional changes.  In one case, literally.

Even the basic innovations in Google’s announcement were indicators of a shift in the cloud market.  One very interesting one is the new Data Loss Protection, which takes advantage of Google’s excellent image analysis software to identify credit card images and block out the number.  There are other security APIs and features as well, and all of these belong to the realm of hosted features that extend basic cloud services (IaaS).  In combination, they prove that basic cloud hosting is not only a commodity, it’s a dead end as far as promoting cloud service growth is concerned.  The cloud of the future is about cloud-based features, used to develop cloud-specific applications.

Which leads us to what I think is the big news, the service Google calls “Cloud Functions”.  This is the same functional/Lambda programming support that Amazon and Microsoft have recently added to their hosted-feature inventory.  Google doesn’t play up the Lambda or functional programming angle; they focus instead on the more popular microservice concept.  A Cloud Function is a simple atomic program that runs on demand wherever it’s useful.

Episodic usage isn’t exactly the norm for business applications, and Google makes it clear (as do Amazon and Microsoft) that the sweet spot for functional cloud is event processing.  When an event happens, the associated Cloud Functions can be run and you get charged for that.  When there’s no event, there’s no charge.

There are a lot of things Google could focus a new competitive drive on, and making Cloud Functions a key element of that drive says a lot about what Google believes will be the future of cloud computing.  That future, I think, could well be built on a model of computing that’s a variant on the popular web-front-end approach now used by most enterprises.  We could call it the “event-front-end” model.

Web front-end application models take the data processing or back-end elements and host them in the data center as usual.  The front-end part, the thing that shows screens and gives users their GUI, is hosted in the cloud as a series of web servers.  Enterprises are generally comfortable with this approach, and while you may not hear a lot about this, the truth is that most enterprise cloud computing commitments are built on these web front-ends.

It seems clear that Amazon, Google, and Microsoft all see the event space as the big driver for enterprise cloud expansion beyond the web front-end model.  The notion of an event front-end is similar in that both events and user GUI needs are external units of work that require an intermediary level of functionality, before they get committed to core business applications.  You don’t want your order entry system serving web pages, only processing orders.  Similarly, an event-driven system is likely to have to do something quickly to address the event, then hand off some work to the traditional application processes.
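A crude sketch of that separation might look like the following; the event fields, the "lock down" reaction, and the queue standing in for the back-end hand-off are all illustrative assumptions, not anyone's product.

```python
from queue import Queue

# Illustrative sketch of an "event front-end": react quickly near the event source,
# then hand the durable work to the traditional back-end application.
back_office_queue: Queue = Queue()   # stands in for whatever the back end actually consumes

def event_front_end(event: dict) -> None:
    # 1. Short-loop reaction: whatever must happen right now.
    if event["kind"] == "door_open" and event["zone"] == "restricted":
        print("alarm: lock down zone", event["zone"])
    # 2. Hand-off: record the event for the core business application to process later.
    back_office_queue.put(event)

event_front_end({"kind": "door_open", "zone": "restricted", "ts": 1})
```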

I doubt that even Google, certainly geeky enough for all practical purposes, thinks that microservice programming or Lambda programming or any other programming technique is going to suddenly sweep the enterprise into being a consumer of Cloud Functions.  I don't think they believe there's a runaway revenue opportunity in converting web front-ends to Cloud Functions either (though obviously user-generated HTTP interactions can be characterized as "events").  What is driving this is a realization that there's a big onrushing trend that has been totally misunderstood, and whose realization will drive a lot of cloud computing revenue.  That trend is IoT.

The notion that IoT is just about putting a bunch of sensors and controllers on the Internet is (as I've said many times) transcendentally stupid even for our hype-driven, insight-starved market.  What all technology advances for IT are about is reaping some business benefit, which means processing business tasks more effectively.  Computing has moved through stages in supporting productivity gains (three past ones, to be exact), and in each case the result was moving computing closer to the worker.  Moving computing to process business events moves computing not only closer to workers but, in many cases, ahead of them.  You don't wait for a request from a worker to do something; you do it in response to the event stimulus that would have (in the past) triggered worker intervention.  Think of it as "functional robotics"; you don't build robots to displace humans, you simply replace them as the key element in event processing.

This approach, if taken, would offer cloud providers a chance to get themselves into the driver’s seat on the next wave of productivity enhancement, an activity that would generate incremental business benefits (improved productivity) and thus generate new IT spending rather than displacing existing spending.  That would be an easier sell—politically, because there’s no IT pushback caused by loss of influence or jobs, and financially because unlocking improved business operations has more long-term financial value than cutting spending for a year or so.

Event processing demands edge hosting.  Functional programming is most effective as an edge-computing tool, because the closer you get to the edge of the network in any event-driven system, the sparser the events to process are likely to be.  You can't plant a VM everywhere you think you might eventually find an event.  Amazon recognized that with Greengrass, a way of pushing function hosting outside the cloud.  I think Google recognizes it too, but remember that Google has edge cache points already and could readily develop more.  I think Google's cloud will be more distributed than either Amazon's or Microsoft's, because Google has designed its network from the first to support edge-distributed functionality.  Its competitors focused on central economies of scale.

The functional/event dynamic is what should be motivating the network operators.  Telcos have a lot of edge real estate to play with in hosting stuff.  The trick has been getting something going that would (in the minds of the CFOs) justify the decision to start building out.  The traditional expectation was that things like NFV would generate the necessary stimulus, but NFV didn't develop fast enough or in the right way.  Now 5G is supposed to do the job, but there is really no clear, broad edge-hosting mandate in 5G as it exists, and in any case we could well be five years away from meaningful specs in that area.

Amazon, Google, and Microsoft think that edge-hosting of functions for event processing is already worth going after.  Probably they see IoT as the driver.  Operators like IoT, but for the short-sighted reason that they think (largely incorrectly) that it’s going to generate zillions of new customers by making machines into 4/5G consumers.  They should like it for carrier cloud, and what we’re seeing from Google is a clear warning sign that operators are inviting another wave of disintermediation by being so passive on the event opportunity.

Passivity might seem to be in order if all the big cloud giants are already pushing Lambdas.  Despite their interest, not all the challenges of event processing through functions/microservices/Lambdas have been resolved.  Stateless processes are fine, but events are only half the picture of event handling, and the states in state/event descriptions show that the other half isn't within the realm of the functional processes themselves.  We need to somehow bring states, bring context, to event handling, and that's something that operators (and the vendors who support them) could still do right, and first.

State/event processing is a long-standing way of making sense out of a sequence of events that can't be properly interpreted without context.  If you just disabled something, sensors that record its state could be expected to report a problem.  If you're expecting that same something to be controlling a critical process, then having it report a problem is definitely not a good thing.  Same event, different reactions, depending on context.  Since Lambdas are stateless, they can't be the thing that maintains state.  What does?  This is the big, perhaps paramount, question for event processing in the future.  We need distributed state/event processing if we expect to distribute Lambdas to the edge.
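A minimal sketch of what that separation means in practice: the handler itself is stateless, so the context has to live in some external, shared store.  Here a plain dictionary stands in for that store, and the device name and states are illustrative assumptions.

```python
# Sketch of distributed state/event handling: the handler is stateless, so the
# context ("maintenance" vs "in_service") lives in an external store. A dict stands
# in for that store here; in practice it would be shared and replicated.
state_store = {"pump-7": "maintenance"}

def on_sensor_fault(device_id: str) -> str:
    context = state_store.get(device_id, "in_service")
    if context == "maintenance":
        return "ignore: fault expected while device is disabled"
    return "dispatch: fault on a device controlling a critical process"

print(on_sensor_fault("pump-7"))     # same event...
state_store["pump-7"] = "in_service"
print(on_sensor_fault("pump-7"))     # ...different reaction, because the context changed
```

Whoever owns and distributes that store owns the hard part of the problem, which is exactly the opening I'm arguing operators still have.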

I didn't exaggerate the importance of the Lambda-and-event paradigm in my past blogs.  I'm not exaggerating it now, and I think Google just proved that.  There aren't going to be any more opportunities for operators to reap IoT and edge-hosting benefits once the current one passes.  This is evolution in action: a shift from predictable workflows to dynamic event-driven systems, and from a connecting economy to a hosting economy.  Evolution doesn't back up, and both operators and vendors need to remember that.

Applying Edge Programming and Lambdas to OSS/BSS Modernization (and IoT)

Most of you will recall that there has been a persistent goal to make OSS/BSS “event-driven”.  Suppose we were to accept that was the right approach.  Could we then apply some of the edge-computing and IoT principles of software structure and organization of work to the OSS/BSS?  Let’s take a look at what would happen if we did that.

The theoretical baseline for OSS/BSS event-driven modernization is the venerable “NGOSS Contract” notion, which describes how the service contract (modeled based on the TMF SID model) can act as a kind of steering mechanism to link service events to operations/management processes (using, by the way, Service Oriented Architecture or SOA principles).  This concept is a major step forward in thinking about operations evolution, but it’s not been widely adopted, and in many ways it’s incomplete and behind the times.

The most obvious issue with the NGOSS Contract approach is that it doesn't address where the events come from.  Today, most services are inherently multi-tenant with respect to infrastructure use, which means that a given infrastructure event might involve multiple services, or in some cases none at all.  To make matters worse, most modern networks and all modern data centers have resource-level management and remediation processes that at the least supplement and at most replace service- or application-specific fault and performance management.  The flow of events differs in each of the scenarios these event-related approaches create.

The second problem is SOA.  SOA principles don't dictate that a given "service", which in our discussion is an operations process, be stateless, meaning that it stores no information between executions.  It's the stateless property that lets you horizontally scale components under load or replace them when they break without interfering with operations.  We also have software concepts that many believe will supersede (or already have superseded) SOA: microservices and functional (Lambda) programming.  Why would we "modernize" OSS/BSS using software concepts already deprecated?

The third problem with the approach is harder to visualize: distributability.  I don't mean simply that the software processes could be hosted anywhere, but that there has to be a specific architecture that lets operators strike a balance between keeping control loops short for some events and retaining service-contract-focused control over event steering.  If I have an event in Outer Oshkosh that I want to handle quickly, I can put a process there, but will that distribution of the process then defeat my notion of model-driven event steering?  If I put the contract there, how will I support events in Paris efficiently?  If I put the contract in multiple places, have I lost true central control because service state is now multiply represented?

Reconciling all of this isn’t something that software principles like Lambda programming can fix by itself.  You have to go to the top of the application ladder, to the overall software architecture and the way that work flows and things are organized.  That really starts with the model that describes a network service or a cloud application as a distributed system of semi-autonomous components.

Outer Oshkosh and Paris, in my example, are administrative domains where we have a combination of raw event sources and processing resources.  Each of these places is making a functional contribution to my service, and thus the first step in creating a unified, modern, event-driven OSS/BSS process is to model services based on functional contributions.  There are natural points of function concentration in any service, created by user endpoints, workflow/traffic, or simply by the fact that there's a physical facility there to hold things.  These should be recognized in the service model.

The follow-on point is that these modeled function concentration points are also intent-based systems: they have states, and they both process and generate events.  If something is happening in Paris or Outer Oshkosh that demands local event handling, then rather than forcing a central model to record the specifics of that handling, have a "local" model of the function's behavior do it.  A service, then, would have a model element representing each of these functions, and would define the event-to-process mappings not inside each function (they're black boxes) but for the events those functions generate at the service level.
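A minimal sketch of what that service-level steering could look like follows; the element names, states, events, and process names are all illustrative assumptions, not any TMF or ETSI structure.

```python
# Sketch of model-driven event steering: each function in the service model is a
# black box with its own state, and the service model maps (state, event) pairs to
# processes only for the events those functions surface upward. Names are illustrative.

service_model = {
    "paris-access": {
        "state": "active",
        "on_event": {
            ("active", "degraded"): "rebuild_local_path",
            ("active", "failed"):   "escalate_to_service_level",
        },
    },
    "oshkosh-access": {
        "state": "active",
        "on_event": {
            ("active", "failed"):   "escalate_to_service_level",
        },
    },
}

def steer(element: str, event: str) -> str:
    model = service_model[element]
    return model["on_event"].get((model["state"], event), "log_and_ignore")

print(steer("paris-access", "degraded"))   # -> rebuild_local_path
```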

This kind of structure is a bit like the notion of hierarchical management.  You don't try to run a vast organization from a single central point; you build sub-structures that have their own missions and methods, let each of them fill its role its own way, and coordinate the results.  This notion illustrates another important point in my example; it's likely you would have a "US" and an "EU" structure coordinating the smaller function concentrations in those geographies.  In short, you have a hierarchy that sits between the raw event sources and the central model, and each level of that hierarchy absorbs the events below it and generates new events that represent collective, unhandled issues to the level above.

Edge processes in this model are essentially event-translators.  They absorb local events, accommodate the need for immediate short-loop reaction, and maintain functional state as a means of generating appropriate events to higher-level elements.  Thus, HighLevelThing is good if all its IntermediateLevelThings are good, and each of those in turn depends on its LowLevelThings.
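That rollup is easy to sketch; the class and element names below are just placeholders echoing the ones in the paragraph above, not any specification.

```python
# Minimal sketch of hierarchical state rollup: an element is "good" only if it and
# all of its children are, so a change at the bottom surfaces to the level above as
# a single derived state change rather than as a flood of raw local events.

class Thing:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.local_ok = True

    def is_good(self) -> bool:
        return self.local_ok and all(c.is_good() for c in self.children)

low = [Thing("sensor-a"), Thing("sensor-b")]
intermediate = Thing("edge-paris", low)
high = Thing("service", [intermediate])

low[0].local_ok = False      # a local fault...
print(high.is_good())        # ...rolls up to the service level as a single False
```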

This approach has the interesting property of letting you deploy elements of service lifecycle management to the specific places where events are being generated.  In theory, you could even marshal extra processing resources to accommodate a failure, or to help you expedite the change from one service configuration to another.

The interesting thing about this sort of modeling and event-handling is that it also works with IoT.  Precious little actual thought has gone into IoT; it's all been hype and self-serving statements from vendors.  The reality of IoT is that there will be little application-to-sensor interaction.  Something like that neither scales nor provides security and privacy assurance, and it isn't cost-effective either.

The real-world IoT will be a series of function communities linked to sensors and using common basic event processing and event generation strategies.  There might be a "Route 95 Near the NJ Bridge" community, for example, which would subscribe to events that are processed somewhere local to that point and refined into new events that relate specifically to traffic conditions at that location.  This community might in turn be part of the larger "US 95" community as well as the "NJ Turnpike" and "PA Turnpike" communities.

Function communities in IoT are hierarchical just like they are in network services, and for the same reason.  If you’re planning a trip along the East Coast, you might need to know the overall conditions on Route 95, but you surely don’t need to know them further ahead than your travel timeline dictates.  Such a trip, in IoT terms, is a path through function communities, and as you enter one you become interested in the details of what’s happening (traffic-wise) there, and more interested than before in conditions ahead.  An “event” from the nearby community might relate to what’s happening now, but events from the next community in your path are interesting only if they’re likely to persist for the time you’ll need to get there.
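A rough sketch of how a community hierarchy refines events is below; the community names come from the example above, but the thresholds, speed readings, and publish/subscribe mechanics are my own illustrative assumptions.

```python
# Sketch of IoT "function communities": a local community refines raw sensor events
# into traffic-condition events, and the corridor-level community subscribes to those
# refined events rather than to the raw sensors. Thresholds and values are illustrative.

class Community:
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, fn):
        self.subscribers.append(fn)

    def publish(self, event):
        for fn in self.subscribers:
            fn(self.name, event)

nj_bridge = Community("Route 95 Near the NJ Bridge")
us_95 = Community("US 95")

# The local community turns raw speed readings into a refined condition event.
def refine(raw_speeds_mph):
    avg = sum(raw_speeds_mph) / len(raw_speeds_mph)
    condition = "congested" if avg < 25 else "flowing"
    nj_bridge.publish({"condition": condition, "avg_mph": round(avg, 1)})

# The corridor community only ever sees the refined event, never the sensors.
nj_bridge.subscribe(lambda src, ev: us_95.publish({"segment": src, **ev}))
us_95.subscribe(lambda src, ev: print(f"{src}: {ev}"))

refine([12, 18, 22])
```

A traveler's application would subscribe at whatever level of the hierarchy matched its planning horizon, rather than polling sensors directly.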

Contrast the I95 trip approach I’ve described with what would be needed if every driver needed to query sensors along the route.  Just figuring out which ones they needed, and which were real, would be daunting.  The same is true for OSS/BSS or cloud computing or service orchestration.  You need to divide complex systems into subsystems, so that each level in the hierarchy poses reasonable challenges in terms of modeling and execution.

The combination of a hierarchical modeling approach, functional/Lambda programming to create easily-migrated functions, and event-driven processes synchronized by the former and implemented through the latter, gives you an OSS/BSS and IoT approach that could work, and work far better than what we've been spinning up to now.  If this approach could deliver better operational efficiency, then it's what we need to be talking about.

Why the Critical Piece of VMware’s NFV 2.0 is the “Network Model” NSX MIGHT Support

I mentioned in my blog yesterday that a network and addressing model was critical to edge computing and NFV.  If that’s true, then could it also be true that having a virtual-network model was critical to vendor success in the NFV space?  The VMware NFV 2.0 announcement may give us an opportunity to test that, and it may also ignite more general interest in the NFV network model overall.

A “network model” in my context is a picture of how connectivity is provided to users and applications using shared infrastructure.  The Internet represents a community network model, one where everyone is available, and while that’s vital for the specific missions of the Internet, it’s somewhere between a nuisance and a menace for other network applications.  VPNs and VLANs are proof that you need to have some control over the network model.

One of the truly significant challenges of virtualization is the need to define an infinitely scalable multi-tenant virtual network model.  Any time you share infrastructure you have to be able to separate those who share from each other, to ensure that you don’t create security/governance issues and that performance of users isn’t impacted by the behavior of other users.  This problem arose in cloud computing, and it was responsible for the Nicira “SDN” model (now VMware’s NSX), an overlay-network technology that lets cloud applications/tenants have their own “private networks” that extend all the way to the virtual components (VMs and now containers).

NFV has a multi-tenant challenge too, but it's more profound than the one that spawned Nicira/NSX.  VMware's inclusion of NSX in its NFV 2.0 announcement means it has a chance, perhaps even an obligation, to resolve NFV's network-model challenges.  That starts with a basic question that's been largely ignored: "What is a tenant in NFV?"  Is every user a tenant, every service, every combination of the two?  Answer: All of the above, which is why NFV needs a network model so badly.

Let's start with what an NFV network model should look like.  Say that we have an NFV service hosted in the cloud, offering virtual CPE (vCPE) that includes a firewall virtual function, an encryption virtual function, and a VPN on-ramp virtual function of some sort.  These three functions are "service chained" according to the ETSI ISG's work, meaning that they are connected to one another in a specific order, with the "inside" function connected to the network service and the "outside" function connected to the user.  All nice and simple, right?  Not so.

You can’t connect something without having a connection service, which you can’t have without a network.  We can presume chaining of virtual functions works if we have a way of addressing the “inside” and “outside” ports of each of these functions and creating a tunnel or link between them.  So we have to have an address for these ports, which means we have an address space.  Let’s assume it’s an IP network and we’re using an IP address space.  We then have an address for Function 1 Input and Output and the same for Functions 2 and 3.  We simply create a tunnel between them (and to the user and network) and we’re done.
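A minimal sketch of that addressing exercise is below.  The subnet, the ordering of the functions, and the names are illustrative assumptions; the point is only that every port needs an address in some address space before any tunnel can be built.

```python
import ipaddress

# Sketch of vCPE chain addressing: each function gets "inside" and "outside" port
# addresses from one subnet, and the chain is just tunnels between adjacent ports.
# The subnet and the inside-to-outside ordering are illustrative assumptions.

subnet = ipaddress.ip_network("192.168.10.0/24")
hosts = iter(subnet.hosts())

chain = []
for name in ("vpn_onramp", "encryption", "firewall"):   # inside -> outside order
    chain.append({"function": name,
                  "inside": str(next(hosts)),
                  "outside": str(next(hosts))})

# Tunnels: network service <-> innermost function, function to function, user <-> outermost.
tunnels = [("network_service", chain[0]["inside"])]
for a, b in zip(chain, chain[1:]):
    tunnels.append((a["outside"], b["inside"]))
tunnels.append((chain[-1]["outside"], "user_port"))

for t in tunnels:
    print("tunnel:", t)
```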

The problem is that if this is a normal IP address set, it has to be in an address space.  Whose?  If this is a public IP address set, then somebody online could send something (even if it’s only a DDoS packet) to one of the intermediary functions.  So presumably what we’d do is make this a subnet that uses a private IP address space.  Almost everyone has one of these; if you have a home gateway it probably gives your devices addresses in the range 192.168.x.x.  This would keep the function addresses hidden, but you’d have to expose the ports used to connect to the user and the network service to complete the path end to end, so there’s a “gateway router” function that does an address translation for those ports.

Underneath the IP subnet in a practical sense is an Ethernet LAN, and if it's not an independent VLAN then the functions are still addressable there.  There are limits to the number of Ethernet VLANs you can have (the 802.1Q tag allows only about 4,000 of them), and this is why Nicira/NSX came along in the first place.  With their approach, each of the IP subnets rides independently on top of infrastructure, and you don't have to segment Ethernet.  So far, then, NSX solves our problems.

But now we come to deploying and managing the VNFs.  We know that we can use OpenStack to deploy VNFs and that we can use Nicira/NSX along with OpenStack’s networking (Neutron) to connect things.  What address space does all this control stuff live in?  We can’t put shared OpenStack into the service’s own address space or it’s insecure.  We can’t put it inside the subnet because it has to build the subnet.  So we have to define some address space for all the deployment elements, all the resources, and that address space has to be immune from attack, so it has to be separated from the normal public IP address space, the service address space, and the Internet.  Presumably it also has to be broad enough to address all the NFV processes of the operator wherever they are, so it’s not an IP subnetwork at all, it’s a VPN.  This isn’t discussed much, but it is within the capabilities of the existing NFV technology.

The next complication is the management side.  To manage our VNFs we have to be able to connect to their management ports.  Those ports are inside our subnet, so could we just provide a gateway translation of those port addresses to the NFV control process address space?  Sure, but if we do that, we have created a pathway where a specific tenant can “talk” into the control network.  We also have to expose resource management interfaces, and the same problem arises.

I think that NSX in VMware's NFV 2.0 could solve these problems.  There is no reason why an overlay network technology like NSX couldn't build IP subnets, VPNs, and anything else you'd like without limitations.  We could easily define, using the private Class A address space (10.x.x.x), an operator-wide NFV control network.  We could use the private Class B space (172.16.x.x through 172.31.x.x) to define facility-wide networks, and use the private Class C networks (192.168.x.x) to host the virtual functions.  We could gateway between these, I think.  What I'd like to see is for VMware to take the supposition out of the picture and draw the diagrams to show how this kind of address structure would work.
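Just to make the idea tangible, here is an illustrative address plan along those lines; how the RFC 1918 ranges are carved up is my assumption, not a VMware or NFV specification.

```python
import ipaddress

# Illustrative address plan (an assumption about how the RFC 1918 private ranges
# might be carved up; not a VMware or NFV specification):
plan = {
    "nfv_control_vpn":  ipaddress.ip_network("10.0.0.0/8"),       # operator-wide control network
    "facility_network": ipaddress.ip_network("172.16.0.0/12"),    # per-site resource pools
    "service_subnets":  ipaddress.ip_network("192.168.0.0/16"),   # per-tenant VNF subnets
}

# Each tenant service gets its own /24 carved from the service range.
tenant_subnets = list(plan["service_subnets"].subnets(new_prefix=24))
print("tenant 0 subnet:", tenant_subnets[0])
print("control overlaps service?", plan["nfv_control_vpn"].overlaps(plan["service_subnets"]))
```

The gateways between these spaces, and the rules about which ports cross them, are exactly the part I'd like to see VMware (or anyone) diagram explicitly.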

Why?  The answer is that without this there's no way we can have a uniform system of deployment and management for NFV, because we can't tell whether everything can talk to what it needs to and whether the conversations that should never happen are in fact prevented.  Also, because such a move would start a competitive drive to dig into the whole question of the multi-network map that's an inherent (but so far invisible) part of not only NFV but also cloud computing and IoT.

Finally, because some competitor is likely to do the right thing here even if VMware doesn’t.  Think Nokia, whose Nuage product is still in my view the best overlay technology out there.  Think HPE, who just did their own next-gen NFV announcement and has perhaps the most to gain (and lose) of any vendor in the space.  This is such a simple, basic, part of any virtualized infrastructure and service architecture that it’s astonishing nobody has talked about it.

Ah, but somebody has thought about it—Google.  And guess who is now starting to work with operators on the elements of a truly useful virtual model for services?   Google just announced a partnership with some mobile operators, and they have the necessary network structure already.  And vendors wonder why they’re falling behind!

Taking a Longer Look at 5G Infrastructure and Services

It seems possible, based on the results of the MWC show, to speculate a bit on what infrastructure and service considerations are likely to arise out of the 5G specs.  "Speculate" is the key word here; I've already noted that the show didn't address the key realities of 5G, IoT, or much of anything else.  I also want to point out that we don't have firm specifications here, and in my view we don't even have convincing indicators that all the key issues are going to be addressed in the specs that do develop.  Thus, we can't say whether these "considerations" will be considered, outside this blog and those who respond on LinkedIn or to me directly.

Three things that 5G is supposed to do according to both the operators and what I read as “show consensus” are to support a unified service framework for wireline and wireless, support “network slicing” to separate services and operators who share infrastructure, and allow mobile services to incorporate elements of other connectivity resources, including wireline and satellite.  These three factors seem to frame one vision of the future that’s still not accepted widely—the notion of an explicit overlay/underlay structure for 5G.

Traditional networking is based on two notions: that services are built on layers that abstract a given layer from the details of implementing the layers below, and that within a layer the protocols of the layer define the features of the service.  When you have an IP network, for example, you rely on some Level 2 and Level 1 service, but you don't "see" those layers directly.  You do "see" the features of the IP network in the features of your service.

Overlay/underlay networking is similar to the layered structure of the venerable OSI model, but it extends it a bit.  We have overlay/underlay networking today in “tunnel networks” that build connectivity based on the use of virtual paths or tunnels supported by a protocol like Ethernet or IP, and we now have formalized overlays built using SDN or SD-WAN technology.  Most overlay/underlay networks, in contrast to typical OSI-layer models, don’t rely on any feature of the layer below other than connectivity.  There are no special protocols or features needed.  Also, overlay/underlay networking has from the first been designed to allow multiple parallel overlays on a single underlay; most OSI-modeled networks have a 1:1 relationship between L2 and L3 protocols.

In a 5G model, the presumption of overlay/underlay services would be that there would be some (probably consistent) specification for an overlay, both in terms of its protocols and features.  This specification would be used to define all of the “service networks” that wireline and wireless services currently offer, and so the overlay/underlay framework would (with one proviso I’ll get to) support any “service network” over any infrastructure.  That satisfies the first of our three points.

The second point is also easily satisfied, because multiple parallel overlay networks are exactly what network slicing would demand.  If we expanded the “services” of the underlay network to include some class-of-service selectivity, the overlays could be customized to the QoS needs of the services they represent in turn.
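A sketch of that "slices as parallel overlays" idea follows; the slice names, endpoint names, and class-of-service labels are illustrative assumptions about what an underlay might expose.

```python
# Sketch of network slices as parallel overlays: one shared underlay offers a few
# classes of service, and each overlay (slice) is just a named tunnel mesh bound to
# one of them. Slice names, endpoints, and CoS labels are illustrative assumptions.

underlay_classes = {"best_effort": 0, "low_latency": 1, "high_throughput": 2}

def build_slice(name: str, endpoints: list[str], cos: str) -> dict:
    if cos not in underlay_classes:
        raise ValueError(f"underlay does not offer class {cos}")
    # Full mesh of tunnels between the slice's endpoints, all marked with its CoS.
    tunnels = [(a, b, cos) for i, a in enumerate(endpoints) for b in endpoints[i + 1:]]
    return {"slice": name, "tunnels": tunnels}

print(build_slice("massive_iot", ["gw-paris", "gw-newark"], "best_effort"))
print(build_slice("critical_comms", ["gw-paris", "gw-newark", "gw-berlin"], "low_latency"))
```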

In both SD-WAN and SDN overlays, the connectivity of the overlay is managed independently of the underlay; the OSI model tends to slice across layer boundaries or partition the devices to create overlay/underlay connectivity.  In most SD-WAN applications the presumption is that the edge devices (where the user is attached) terminate a mesh of tunnels that create connectivity.  In SDN, there may be a provision for intermediary steering, meaning that an endpoint might terminate some tunnels and continue others.  For proper 5G support, we need to review these options in the light of another element, which is explicit network-to-network interconnect.

Most protocols have some mechanism for NNI, but these are usually based on creating a connection between those singular top-of-the-stack OSI protocols.  In overlay/underlay networks, an NNI element lives at the overlay level, and simply connects across what might be a uni-protocol (same protocol for the underlay) or a multi-protocol (a different underlay on each side) border.  Alternatively, you could have an underlay gateway that binds the two networks together and harmonizes connectivity and QoS, and this could allow the overlay layer to treat the two as the same network.

The border concept could also describe how an underlay interconnect would be shared by multiple overlays, and that concept could be used to describe how a fiber trunk, satellite link, or other “virtual wire” would be represented in an overlay/underlay structure and how it could be used by multiple services.  On- and off-ramps to links like this are a form of gateway, after all.

The question that’s yet to be addressed here is the role that virtual function hosting might play.  There’s nothing explicitly in 5G discussions to mandate NFV beyond hopefulness.  On the other hand, the existence of an overlay technology could well create the beginning of an NFV justification, or at least a justification for cloud-hosting of these overlay components rather than dedicating devices to that role.  An overlay network should be more agile than the underlay(s) that support it.  That agility could take the form of having nodes appear and disappear at will, based on changes in traffic or connectivity, and also in response to changes in the state of the underlay network.  Virtual nodes fit well into the overlay model, even NFV-hosted virtual nodes.

Beyond that it’s harder to say, not because hosting more features isn’t beneficial but because hosting alone doesn’t justify NFV.  NFV was, from the first, fairly specialized in terms of its mission.  A “virtual network function” is a physical network function disembodied.  There really aren’t that many truly valuable physical network functions beyond nodal behavior.  Yes, you can hypothesize things like virtual firewalls and NATs, but you can get features like that for a few bucks at the local Staples or Office Depot, at least for the broad market.  Moving outside nodal (connectivity-routing) features to find value quickly takes you outside the realm of network functions and into application components.  Is a web server a network function, or a mail server?  Not in my view.

From the perspective of 5G and IoT, though, the requirements for hosting virtual functions or hosting cloud processes are very similar; there is a significant connectivity dimension.  We have done very little work in the NFV space to frame what network model is required to support the kind of function-hosting-and-management role needed.  The work that's been done in the cloud space has focused on a pure IP-subnet model that's too simple to address all the issues of multi-tenant functions that have to be securely managed as well.  In fact, the issue of addressing and address management is probably the largest issue to be covered, even in the overlay/underlay model.  If operators and vendors are serious about 5G then they need to get serious about this issue too.