Who’s Winning the Telco/Cable Battle?

There has recently been a lot of media attention focused on the cable providers, not only because they’ve been emerging as players in some next-gen technologies like SDN, NFV, and the cloud, but because they’ve been gaining market share on the telcos after losing it for years.  All of this seems tied to trends in television viewing and broadband usage, but it’s hard to say exactly what factors are driving the bus, and so hard to know where it’s heading.

One thesis for the shift is that because cable infrastructure is fairly constant throughout the service area, cable companies can deliver broadband services more consistently.  Telcos usually have zones where they can justify high-capacity broadband infrastructure, where customer density and economic status are high, but others where it’s plain DSL.  There can easily be a factor of 100 between the fastest and slowest broadband available, and cable rarely has anything like that ratio.

Another thesis is television viewing.  Because TV is dominated by channelized video services, the competition for “broadband” was really a competition for video.  Cable infrastructure is inherently superior to DSL (and, most agree, to satellite) for delivering channelized video.  The slower DSL connections have to husband programming to avoid congestion on the access line, and I think this was a major factor in inducing AT&T to move to satellite video delivery.

The third theory is that it’s really mobile broadband that’s the culprit.  Telcos have been focusing increasingly on mobile services because they’re more profitable, and as a result they’ve been scrimping on modernization of their wireline services, both Internet and video.  The cable companies’ primary revenue and profit center is the delivery of TV and wireline broadband, so it’s not surprising that they’ve put more into those areas, and are reaping the reward.

There are other factors too, which might form the basis of their own thesis or might be a complication in one or more of the others.  Telcos came late to the TV delivery market, and had an initial advantage in being the new kid, able to cherry-pick geographies and tune services to beat competitive offerings.  Those benefits have now passed on.  Cable companies have been a bit more successful in consolidating than telcos, and up to the AT&T/Time Warner deal (yet to be approved, but it probably will be) the cable companies have had a leg up on getting their own content properties.  All of these points are factors.

The current situation is that cable companies, who lost customers to telcos from the time when telco TV launched, have started to gain market share back.  The shift is slow because it generally requires some considerable benefits to drive consumers to go through the hassle of changing their TV, Internet, and phone, but it’s already quite visible among new customers.  At the same time, there is an indication that TV isn’t the powerful magnet that it used to be.  Verizon reported that its vanilla-minimum-channel offering ended up taking about 40% of renewals and new service deals.  Streaming video has changed the game.

Streaming video’s immediate impact on both cable companies and telcos is to shift viewing away from channelized programming, even in the home.  This means that the inherent advantage of cable for channelized delivery is minimized, but it also means that satellite TV isn’t going to save low-speed DSL companies from cable predation and that you’ll need better WiFi and data service to the home.  The phone or tablet, or the streaming stick or smart TV, is the TV of the future, and it needs a data connection.  So far, the net advantage is with cable companies.

The next level of impact here is the mobile/TV symbiosis.  AT&T’s plan to offer unmetered mobile streaming to its DIRECTV customers, along with possible symbiotic features and services that enhance viewing of a telco’s video offering on that telco’s own mobile network, would empower TV and fixed broadband providers who have a mobile service, which cable companies do not.  This is almost certainly why Comcast is looking to offer some sort of MVNO service that, like Google’s Fi, feeds on WiFi wherever possible.  Comcast has public WiFi hubs, and could certainly deploy more.

In my view, the future of wireline services is tied to the mobile device, which means that if cable companies don’t secure some form of MVNO offering that can give them some latitude in pricing video streaming, they are going to lose market share again, and probably very quickly.  Some on the Street think cable companies will romp wild and free for as much as five or six years, but I think they could end up losing share even in 2017.

All of this frames infrastructure planning too.  For telcos, it means that there is a renewed reason to look at streaming video to the home, but in the form of a pure on-demand service.  Things like sporting events and news could remain “magnetic” enough to justify channelized video, but you’d be better off using your streaming bandwidth to support on-demand streaming consumption.  Five or six people aren’t going to watch the same show at the same time on individual phones or tablets, after all.  For cable companies, it means you need to have a WiFi-centric MVNO or you’re dead.

This could all frame some of the 5G issues.  One of the applications of 5G that operators want to see is enhanced mobile speed, which would make video delivery easier and lower the operator cost to support a given number of streaming consumers in a cell.  Another is Fixed Wireless Access (FWA), which would use 5G radio technology at the end of an FTTN connection to make the last jump to homes and businesses.  These drive, in a sense, wireless and wireline convergence.  They also make network slicing more valuable, because all of a sudden, we could see a lot of new MVNO candidates.  Operators like Sprint and T-Mobile would almost surely be candidate partners for the cable companies because they’re not wireline competitors.  These two, by the way, are partners with Google in Fi.

The net here in my view is that there is no winner, and no truly meaningful trend, in wireline broadband or video at all; there is only a set of mobile-driven trends.  The people who can be players in the mobile space can pick their features and battles like the telcos did a decade ago in channelized video.  Those who can’t play in mobile are now going to face major problems, and if 5G or 5G-like convergence emerges by 2020, they’ll have a serious problem creating a survivable business model by 2022.

Ciena’s Liquid Spectrum: Are They Taking It Far Enough?

The Ciena announcement of what they call Liquid Spectrum has raised again the prospect of a holistic vision of network capacity management and connectivity management that redefines traditional OSI-modeled protocol layers.  It also opens the door for a seismic shift in capex from Layers 2 and 3 downward toward the optical layer.  It could jump-start SDN in the WAN, make 5G hopes of converging wireline and wireless meaningful, and introduce true end-to-end software automation.  All this, if it’s done right.  So is it?  I like Ciena’s overall story, including Liquid Spectrum, but I think they could tell, and perhaps do, more.

Liquid Spectrum, not surprisingly, is about systematizing agile optics to make the optical network into more of a collective resource than a set of dedicated and often service-specific paths.  There is no question that this could generate significantly better spectrum utilization, meaning that more traffic could be handled overall versus traditional optical networking.  There’s also no question that Liquid Spectrum does a better job of operationalizing spectrum management versus what even agile-optics systems have historically provided.

Liquid Spectrum is a better optical strategy, but that point raises two questions of its own.  First, is it the best possible optical strategy?  Second, and most important, should it be an “optical” strategy at all?  These two questions are related, and harder to answer, so let’s start with the simple case and work up.

The most basic use of optical networking for services would be to provide optical routes to enterprise or cloud/web customers for things like data center interconnect (DCI).  For this mission, Liquid Spectrum is a significant advance in terms of simple provisioning of the connections, monitoring the SLA, and effecting restoration processes if the SLA is violated.  If the operator has latent revenue opportunity for this kind of service, then Ciena is correct in saying that it can bring in considerable new money.

As interesting as DCI is to those who consume (or sell) it, it’s hardly the basis for vast optical deployments even in major metro areas.  The primary optical application is mass-market service transport.  Here, the goal isn’t to create new services as much as new service paths, since truly new connection services would be very difficult to define in an age where IP and Ethernet are so totally adopted.  Liquid Spectrum’s ability to improve overall spectrum efficiency could mean that more transport capacity for services would be available per unit of capex, which is an attractive benefit.  The coming improved metrics/analytics of Liquid Spectrum will improve this area further.

It should be possible to combine some of the principles of intent-modeled networking, meaning SLA-dependent hierarchies, to define optical transport as a specific sub-service with an SLA that optical agility offered by Liquid Spectrum could then meet.  Since optical congestion management and path resiliency would be addressed invisibly within these SLAs and model elements, the higher layers would see a more reliable network, and the operations cost of that configuration should be lower.  It’s hard to say exactly how much because the savings are so dependent on network topology and service dynamism, but we’re probably talking about something on the order of a 10% reduction in network operations costs, which would equate to saving about a cent of every revenue dollar.
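The arithmetic behind that estimate can be sketched in a few lines.  The baseline below (network operations costs of roughly ten cents per revenue dollar) is an assumption inferred from the two figures just quoted, not a measured number, and the function name is purely illustrative.

```python
# Assumed baseline: network operations cost per dollar of revenue.
# Inferred from the text's figures (a 10% cut is about one cent); not measured.
NETWORK_OPEX_PER_REVENUE_DOLLAR = 0.10


def opex_saving_per_revenue_dollar(reduction_fraction,
                                   opex_per_dollar=NETWORK_OPEX_PER_REVENUE_DOLLAR):
    """Dollars saved per revenue dollar for a fractional cut in network opex."""
    return reduction_fraction * opex_per_dollar


optical_saving = opex_saving_per_revenue_dollar(0.10)
print(f"{optical_saving * 100:.1f} cents per revenue dollar")
```

On that assumed baseline, a 10% reduction works out to about one cent of every revenue dollar.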

That’s not insignificant, but it’s not profound given that other strategies under review could save ten times that amount.  The reason why optical networking, even Liquid Spectrum, falls short of other cost reduction approaches is the tie to automation of the service lifecycle.  Obviously, you can’t automate the service lifecycle from a layer so deep that services aren’t visible.  Service automation starts at the protocol layer where service is delivered because that’s where the money meets the network.  Optics is way down the stack, invisible unless something breaks, which means that to make something like Liquid Spectrum a meaningful pathway to opex savings, you have to tie it to service lifecycle management.

Ciena provides APIs to do just that, and they cite integration with their excellent Blue Planet orchestration platform.  There’s not much detail on the integration; Blue Planet is mentioned only in the title of a slide in the analyst deck and the slide itself shows the most basic of diagrams—a human, a box (Blue Planet) and the network.  This leaves open the critical question of how optical agility is exploited to improve service lifecycle management.  Should we look at optical agility as the tail of the service lifecycle automation dog?

You absolutely, positively, do not want to build a direct connection between service-layer changes and agile optics, because you risk having multiple service requests collide with each other or make inconsistent requests for transport connectivity.  What needs to happen is an analysis of the transport conditions based on service changes, and the way that has to happen would necessarily be reflected in how you model the “services” of the optical layer and the services of the layers above.  We don’t have much detail on Blue Planet’s modeling approach, and nothing on the specific way that Liquid Spectrum would integrate with it, so I can’t say how effective the integration would be.
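One way to picture that intermediate analysis step is a planner that collects service-layer changes and reduces them to a single, consistent transport target per route before anything touches the optical layer.  The sketch below is illustrative only: the class and function names, the route tuples, and the 20% headroom figure are all assumptions, and it does not represent the actual APIs or models of Blue Planet or Liquid Spectrum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceChange:
    """A service-layer request, expressed as a capacity change on a route."""
    service_id: str
    route: tuple        # hypothetical (site_a, site_b) pair
    delta_gbps: float   # positive adds demand, negative releases it


def plan_transport(changes, current_demand_gbps, headroom=0.2):
    """Net all pending changes per route, then size transport once per route."""
    net_delta = defaultdict(float)
    for change in changes:
        net_delta[change.route] += change.delta_gbps

    targets = {}
    for route, delta in net_delta.items():
        demand = max(current_demand_gbps.get(route, 0.0) + delta, 0.0)
        targets[route] = demand * (1 + headroom)  # assumed safety margin
    return targets


pending = [
    ServiceChange("vpn-17", ("NYC", "CHI"), +40.0),
    ServiceChange("dci-03", ("NYC", "CHI"), -10.0),
    ServiceChange("vpn-22", ("CHI", "DEN"), +20.0),
]
current = {("NYC", "CHI"): 100.0, ("CHI", "DEN"): 50.0}
targets = plan_transport(pending, current)
print(targets)
```

Because the two NYC-to-CHI requests are netted before a target is computed, the optical layer sees one capacity request for that route rather than two that might conflict.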

Another thing we don’t have is a tie between Liquid Spectrum and SDN or “virtual wire” electrical-layer technology.  There are certainly cases where connectivity might require optical-level granularity in capacity and connection management, but even today those are rare, and if we move more to edge-distributed computing they could become rarer still.  It would be logical to assume that optical grooming was the bottom of a grooming stack that included electrical virtual-wire management as the layer above.  I think Ciena would have been wise to develop a virtual-wire strategy to unite Blue Planet and their optical products.  Logically, Ciena’s packet-optical approach could be integrated with modern SDN thinking, and it’s a referenced capability for Blue Planet, but nothing is said in the preso about packet optical or Ciena’s products in that space.

There have been a lot of optical announcements recently, and to be fair to Ciena none of them are really telling a complete network-infrastructure-transformation story.  ADVA, who also has a strong orchestration capability, did an even-more-DCI-centric announcement too, and Nokia told an optical story despite having, in Nuage, an exceptional SDN story to tell.  Product compartmentalization is driven by a lot of things, ranging from the way media and analysts cover technology to the desire to accelerate the sales cycle by focusing on a product rather than boiling the ocean.  However, it can diminish the business case for something by demanding that it be considered alone when it’s really part of a greater whole.

You have to wonder whether this compartmentalization issue is a part of a lot of technology problems.  Many emerging technologies, even “revolutions”, have been hampered by compartmentalization.  NFV and SDN both missed many (perhaps most) of the benefits that could drive them forward because they were “out of scope”.  It seems that biting off enough, these days at least, is equated to biting off too much.

I think Ciena needs to bite a bit deeper.  They have an almost unparalleled opportunity here, an opportunity to create a virtual-wire-and-optics layer that would not only improve operations efficiency but reduce the features needed in Layers 2 and 3 of the network.  That would make it easier to replace Ethernet and IP devices with basic SDN forwarding.  Sure, these moves would be ambitious, but Ciena’s last quarter didn’t impress the Street.  They need some impressive quarters to follow.  Competition is tough in optics, and the recent success of the open-optical Facebook Voyager initiative shows that it would be easy to subsume optical networking in L2/L3 devices rather than absorb electrical-layer features in optical networks.  If Ciena and other optical vendors lose that battle, it’s over for them, and only a preemptive broad service-to-optics strategy can prevent the loss.

Ciena has the products to do the job, and Liquid Spectrum is a functional step along the way.  It’s also an example of sub-optimal positioning.  You can argue that the major challenge Ciena faces is that it wants to be strategic but sells and markets tactically.  If you have a broad product strategy you need to articulate it so that your product symbiosis is clear.  If that doesn’t happen, it looks like you’re a hodgepodge and not a unified company.  Ciena has a lot of great stuff and great opportunities, including Liquid Spectrum.  They still need to sing better.

The Gap Between NFV Sellers and Buyers and the Three Things Needed to Bridge It

The more things change, the more they stay the same, as the saying goes.  That certainly seems to be true with NFV, based on what I’ve heard over the last couple weeks from both vendors and network operators.  Two years ago, I noted that vendor salespeople were frustrated by the unwillingness of their buyers to transform their businesses by buying NFV technology.  There was clearly a fault in the operators’ thinking, and there were plenty of media articles that agreed that operators had to modernize the way they thought.  Operators have consistently said that they’d be happy to transform if somebody presented them with a business case.  Same today, for both groups.  Take a look at the media this week and you’ll find the same kinds of stories, about “digital mindset” or “breaking through the fog” or how an NFV strategy is only a matter of defining what it’s hosted on.

Five different vendor NFV sales types or sales executives told me this month that buyers were “resisting” the change to a virtual world, or a cloud business model, or something.  I asked each of them what they believed the problem was, and not a single one mentioned an issue with business case, cost/benefit, or anything that would normally be expected to drive a decision at the executive level.  Seven operator strategists in the same period said that vendors were “lagging” in producing an NFV solution that could validate a business case.

I think that the biggest problem here is one of focus.  The operators don’t have an NFV goal at the senior exec level, nor should they.  They do have a goal of getting more engaged in higher-level services and another of reducing the cost of their network connection services, both capex and opex.  While most operators think that NFV can play a role in achieving these goals, the technology that they think would do the most is “carrier cloud.”  They believe that somewhere between a quarter and a half of their total capex should refocus onto hosted functionality, partly to offer valuable new services (all of which are above the connectivity layer of the network) and partly to reduce costs.  They “believe”, but nobody is proving it to them yet.

What would get operators to the grand carrier cloud end-game?  At the CIO or CEO levels, the operators themselves think that the three primary drivers are OTT video, mobile contextual services, and IoT.  At the CTO level, the drivers are said to be NFV and 5G.  A disconnect between CFO/CEO and CTO isn’t uncommon, but what I think is offering some hope is that some operators and some vendors are seeing the disconnect and working to bridge it.  The question is whether their efforts will spread.

AT&T has, with Domain 2.0 and ECOMP, done more than other operators in framing a strategy that is capable of transforming to the carrier cloud.  Interestingly, it does that by creating converging cloud models that target each of the two goals—new higher-level revenue and cost management.  For revenue, AT&T has an aggressive and insightful hybrid cloud strategy that includes the ability to bond AT&T VPNs with cloud services offered by all the major providers (NetBond).  It also has content caching and IoT services.  ECOMP and D2 are aimed at the cost side and at creating a holistic service lifecycle automation process.  They have taken a business-cost-and-opportunity path toward carrier cloud without making it the specific goal, just the convergence of two separately justified evolutions.

Vendors may have taken hold of this same notion of converging approaches, but with less success.  AT&T’s D2 model divided infrastructure into zones and works to prevent vendors from gaining too much control by becoming dominant in too many zones.  That places AT&T in the role of a “benefit integrator”, since a vendor can’t propose a full solution without gaining too much control.  Most operators haven’t taken this approach and thus are still looking for vendors to connect the dots to benefits.

What I see from vendor discussions is that they’re dodging that mission, for several reasons.  First, it creates a broader sales discussion that takes longer to reach a conclusion and requires more collateralization.  Vendors (salespeople especially) want a quick sale.  Second, most vendors don’t really have the complete answer to the revenue-and-cost-reduction story.  They rely instead on the notion that “standards define the future” and that operators should accept that, which lets the vendors place their offerings in a purely technical framework.  Third, vendors don’t really have a clear picture of the technology framework they’re trying to be a part of.  That’s largely because the standards don’t really draw one.

The central element in carrier cloud is an agile virtual network that, in my own view, almost has to be a form of overlay SDN.  Operators have accepted a very limited number of candidates here: Cisco, Juniper, Nokia, and VMware lead their lists.  Juniper (Contrail) and Nokia (with its Nuage SDN product) just won deals with Vodafone.  Interestingly, the first three of this list have tended to position their SDN assets cautiously to avoid overhanging product sales, and VMware is still promoting NSX more as a data center strategy than as a total virtual network.  Because of cautious positioning, vendors with credentials in the virtual-network space haven’t really promoted a virtual network vision, and the lack of that vision more than anything else muddies up the infrastructure model.  Vendors with no offering in this critical area are at risk of marginalization.

The second thing that’s needed is an updated model for deploying hosted (virtual) functions.  NFV has long focused on OpenStack and VMs, while the industry is migrating to containers, microservices, and even functional (Lambda) programming.  Much of the credible growth in opportunity lies in event-driven applications.  In fact, you can argue that without event-driven applications, there probably isn’t a large enough new-revenue pool to drive more than a quarter to a third of carrier cloud opportunity to fruition.  This is a whole new kind of component relationship, one that Amazon, Google, and Microsoft have all proven (with their functional programming features) to be incompatible with past notions of hosting, even DevOps.

The final thing is that old song, service-wide lifecycle management and automation.  The current practices cost too much, take too long, and tend to create inflexible service-to-resource relationships that limit operators’ ability to respond to market conditions.  Lifecycle automation has to be based on a very strong service/application model and it has to be integrated with the other two points without being inflexible with respect to their state of implementation.  Abstraction is a wonderful tool in creating an evolving system even if you’re not totally sure where it’s evolving to or how fast it’s going.

No vendor really has all the pieces here, which of course explains why operators who don’t have their own vision and glue are frustrated and why vendor salespeople are likewise.  I don’t think there’s any chance that 5G standards will develop in a way that helps carrier cloud in general and NFV in particular, at least not before about 2021.  I don’t think NFV standards will evolve to address the key issues here, at least not much faster than that.  So, vendors have to hope that operators converge on their own approach (which will probably commoditize all of the elements of NFV and carrier cloud, working against vendor interest) or move more effectively to promote a solution with the right scope.  Complaining about operators’ lack of insight won’t help.

Nor will complaining about lack of vendor support.  Operators have to decide if they’re willing to sit around and wait for something to be handed to them, or work (perhaps with the ECOMP/Open-O activity) to create a useful model that they can use to guide their own evolution.

Factory Processes and Functional Elements in NFV and IoT: Connecting the Dots

Today I want to take up the remaining issue with edge-centric and functional programming for event processing, both for IoT and NFV.  That issue is control of distributed state and stateless processes.  Barring changes in the landscape, this will be the last of my series of blogs on this topic.

As always, we need to start with an example from the real and familiar world.  Let’s assume that we’re building a car in five different factories, in five different places in the world.  Each of these factories has manufacturing processes that generate local events, and our goal is to end up with a car that can be sold, is safe, and is profitable.  What we have to do is to somehow get those factories to cooperate.

There are a few things that clearly won’t work.  One is to just let all the factories do their own thing.  If that were done you might end up with five different copies of some parts and no copies of others, and there would be little hope that they’d fit.  Thus, we have to have some master plan for production that imposes specific missions on each factory.  The second thing we can’t do is drive our production line with events that have to move thousands of miles to get to a central event control point, and have a response return.  We could move through several stages of production during the turnaround.  These two issues frame our challenge.

What happens in the real world is that each of our five factories is given a mission (functional requirements) and a timeline (an SLA).  The presumption we have in manufacturing is that every producing point builds stuff according to that combination of things, and every other factory can rely on that.  Within a given factory, the production processes, including how the factory handles events like materials shortages or stoppages in the line, are triggered by local events.  These events are invisible to the central coordination of the master plan; that process is only interested in the mission—the output—of the factories and their meeting the schedule. A broad process is divided into pieces that are individually coordinated and then combined based on a central plan.

If we replace our factory processes by hosted functional processes, meaning Lambdas or microservices, and we replace conditions by specific generated IoT-like events, we have a picture of what has to happen in distributed event processing.  We have to presume that events are part of some system of function creation.  That system has presumptive response times, the total time it takes for an event to be analyzed and a reaction created.  The event-response exchange defines a control loop, whose length is determined by what we’re doing.  Things that happen fast require short control loops, and that means we have to be able to host supporting processes close to where the events are generated.
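The control-loop constraint can be made concrete with a small calculation.  This is a minimal sketch under stated assumptions: the hosting locations, the latency and processing figures, and the function names are all hypothetical, and the loop is modeled simply as a round trip plus processing time.

```python
def loop_time_ms(one_way_latency_ms, processing_ms):
    """Control loop length: event travels in, is processed, response travels back."""
    return 2 * one_way_latency_ms + processing_ms


def eligible_hosts(hosts, budget_ms, processing_ms):
    """Return the hosting points whose control loop fits within the budget."""
    return [
        name
        for name, latency_ms in hosts.items()
        if loop_time_ms(latency_ms, processing_ms) <= budget_ms
    ]


# Hypothetical one-way latencies from the event source to each hosting point.
hosts = {"on-premises edge": 1, "metro data center": 8, "regional cloud": 40}

fast_loop = eligible_hosts(hosts, budget_ms=25, processing_ms=5)
slow_loop = eligible_hosts(hosts, budget_ms=100, processing_ms=5)
print(fast_loop)
print(slow_loop)
```

With these assumed numbers, the tighter budget rules out the most distant hosting point, which is the sense in which short control loops force processes toward the edge.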

In both NFV and IoT we’ve tended to presume that the events generated by functions (including their associated resources) are coupled directly to service-specific processes.  The function of NFV management is presumptively centralized, and if IoT is all about putting sensors on the Internet, then it’s all about having applications that directly engage with events.  If our car-building exercise is an accurate reflection of the NFV/IoT world, this isn’t practical because we either create long control loops to a centralized process or create disconnected functions that don’t add up to stable, profitable activity.

The path to solution here has been around for a decade; it’s hidden inside a combination of the TMF’s Shared Information and Data model (SID) and the Next-Generation OSS Contract (NGOSS Contract).  SID divides what we’d call a “service” into elements.  Each of these elements could correspond to our “factories” in the auto example.  If there’s a blueprint for a car that shows how the various assemblies like power train, passenger compartment, etc. fit, then there would be a blueprint for how each of these assemblies was constructed.  The “master blueprint” doesn’t need the details of each of these sub-blueprints.  They only need to conform to a common specification.  With a blueprint at any level, we can employ NGOSS Contract principles to steer local events to their associated processes.

What this says is that breaking up services or IoT processes into a hierarchy isn’t just a convenience in modeling deployment; it’s a requirement for making event processing work.  With this model, you don’t have to send events around the world, only through the local process system.  But what, and where, is that local process system?

The answer here is intent modeling.  A local process system is an intent-modeled black-box “factory” that produces something specific (functional behavior) under specific guarantees (an SLA).  Every NFV service or IoT application would be made up of some number of intent models, and hidden inside them would be a state/event engine that linked local events to local processes, with “local” here meaning “within the domain”.  If one of these black boxes has to signal the thing above that uses it, it signals through its own event set.  A factory full of sensors might be aggregated into a single event that reports “factory condition.”
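A minimal sketch of such a black box might look like the following.  Everything here is an illustrative assumption rather than a standard: the class name, the shape of the state/event table, the event strings, and the SLA field are hypothetical.  The point is only that local events are resolved inside the model and that the layer above sees nothing but summary events.

```python
class IntentModel:
    """Black-box element: exposes a name and SLA, hides its state/event table."""

    def __init__(self, name, sla, table, initial_state, notify_parent=None):
        self.name = name
        self.sla = sla
        self._table = table          # {(state, event): (process, next_state)}
        self._state = initial_state
        self._notify_parent = notify_parent or (lambda event: None)

    def handle(self, event):
        """Steer a local event to its local process for the current state."""
        entry = self._table.get((self._state, event))
        if entry is None:
            return  # event has no meaning in this state; ignore it
        process, next_state = entry
        summary = process(event)
        self._state = next_state
        if summary is not None:
            self._notify_parent(summary)  # only summary events go upward


def restart_line(event):
    return None  # handled locally, nothing to report upward


def report_degraded(event):
    return "factory-condition:degraded"


def report_normal(event):
    return "factory-condition:normal"


table = {
    ("running", "line-stopped"): (restart_line, "running"),
    ("running", "materials-short"): (report_degraded, "degraded"),
    ("degraded", "materials-restocked"): (report_normal, "running"),
}

parent_events = []
factory = IntentModel("engine-plant", {"units_per_day": 400}, table,
                      "running", parent_events.append)
for local_event in ["line-stopped", "materials-short", "materials-restocked"]:
    factory.handle(local_event)
print(parent_events)
```

Three local events go in, but the layer above receives only the two that change the reported factory condition.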

From this, you can see that not only isn’t it necessary to build a single model of a service or an IoT application that describes everything, it’s not even desirable.  The top-level description should only reference the intent models of the layer below—just like in the OSI Reference Model for network protocols, you never dip into how the layer below does something, only the services it exposes.  Services and applications are composed not from the details of every local event-handling process, but from the functional elements that collect these processes into units of utility.

The “factory” analogy is critical here.  Every element, every intent model, is a factory.  It has its own blueprint for how it does its thing, and nothing outside it has any reason to know or care what that blueprint is.  It should not be exposed because exposing it would let something else reference the “how” rather than the “what”, creating a brittle implementation that any change in technology would break.

This brings us to the “where”, both in a model-topology sense and in a geographic sense.  If what we’re after is a set of utility processes that process local events, then we could in theory define the factories based on geography, or administration, or functionality, or a combination of those things.  We can have multiple factories that produce the same utility process, perhaps in a different way or in a different place.

To make this work, you need to have a standard approach to intent modeling so that a “factory abstraction” at a higher level can map to any suitable “factory instance” below.  That means standardized APIs to communicate the intent and SLA, and a standard way to exchange events/responses.  Strictly speaking you don’t need to standardize what happens inside the factory.  However, if you also standardize the state/event structure that creates the implementation, linking local events to local processes in a standardized way, then every intent model at every level looks the same, and processes that are used in one could also be used in others that required the same behavior.

If a high-level structure, a service or application, needs to reference one of our utility processes, it would represent it as an intent model and leave the decoding to the implementation.  If that structure wanted to name a specific factory it could, or it could leave the decision on what factory to use (Pittsburgh or Miami, VPN or VLAN) to a lower-level abstraction that might make the selection based on the available technology or the geography of the service.
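The selection step can be sketched as a simple resolver.  The instance names, regions, and cost figures below are hypothetical, and choosing the lowest-cost match is just one assumed policy among many a lower-level abstraction might apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactoryInstance:
    """One concrete implementation of a factory abstraction."""
    name: str
    technology: str
    region: str
    cost: float


def select_factory(instances, region=None, technology=None, pinned=None):
    """Resolve an abstraction to an instance: pinned by name, or best match."""
    if pinned is not None:
        for instance in instances:
            if instance.name == pinned:
                return instance
        raise LookupError(f"no factory named {pinned!r}")

    candidates = [
        instance for instance in instances
        if (region is None or instance.region == region)
        and (technology is None or instance.technology == technology)
    ]
    if not candidates:
        raise LookupError("no factory satisfies the constraints")
    return min(candidates, key=lambda instance: instance.cost)  # assumed policy


instances = [
    FactoryInstance("pittsburgh-vpn", "VPN", "east", 5.0),
    FactoryInstance("miami-vlan", "VLAN", "south", 3.0),
    FactoryInstance("miami-vpn", "VPN", "south", 4.0),
]

by_geography = select_factory(instances, region="south")
by_technology = select_factory(instances, technology="VPN")
explicit = select_factory(instances, pinned="pittsburgh-vpn")
print(by_geography.name, by_technology.name, explicit.name)
```

The caller's view is the same in all three cases; only the amount of the decision it delegates changes.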

If you presume this approach, then every element of a service is an abstraction first and an implementation second.  Higher-layer users see only the abstraction, and all who provide implementations must build to that abstraction as their “product specification”.  There’s no difference whether an abstraction is realized internally or externally, or with legacy or new technology.  There’s no difference, other than perhaps connectivity, optimality, or price, in where it’s implemented.  Within a given functional capability set, you pick factories, or instantiate them, based on optimality.

In the IoT space, there could also be abstractions created based on geography or on functionality.  I used the example of driving a couple blogs back; you could envision traffic-or-street-related IoT as being a series of locales that collected events and offered common route and status services.  A self-drive car or an auto GPS user might exercise a local domain’s services in abstract from a distance, but shift to a lower-level service as they approached the actual geography.  That suggests that you might want to be able to allow an abstraction to offer selective exposure of lower-level abstractions.

It’s harder to lay out a specific structure of what a state/event model might look like for IoT, but I think the easiest way to approach it is to say that IoT is a service that can be decomposed, and that the decomposition process will balance issues like control loop length and geographic hosting efficiency to decide just where to put things, which frames how to abstract them optimally.  However, I think that the goal is always to create a model approach that lets you model an intersection, a route, a city, a country, a fleet of vehicles, or whatever using the same approach, the same tools, the same APIs and event and process conventions.

Even a self-driving car should, in my view, have a model that lives in the vehicle and receives and generates events.  That’s something we’ve not talked about, and I think we’re missing an opportunity.  Such an approach would let you define behavior when the vehicle has no access to IoT sensors outside it, but also how it could integrate the “services” of city, route, and intersection models to create a safe and optimal experience for the passengers.

This raises the very interesting question of whether the vehicle itself, as something capable of being directed and changing speeds, should also be modeled.  A standard model for a vehicle would facilitate open development of autonomous vehicle systems and also cooperative navigation between vehicles and street-and-traffic IoT.  It shouldn’t be difficult; there are only a half-dozen controls a driver can manipulate and they tend to fall into two groups—switch with specific states (like on/off), and “dial” that lets you set a value within a specified range.
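As a rough illustration of the two control groups, a vehicle model could be sketched in Python along these lines. The class names (Switch, Dial) and the particular controls and ranges are my own assumptions for the example, not part of any vehicle standard.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Switch:
    """A control with a fixed set of named states, such as on/off."""
    states: Tuple[str, ...]
    value: str

    def set(self, new_value: str) -> None:
        if new_value not in self.states:
            raise ValueError(f"{new_value!r} is not one of {self.states}")
        self.value = new_value


@dataclass
class Dial:
    """A control that takes a numeric value within a specified range."""
    low: float
    high: float
    value: float

    def set(self, new_value: float) -> None:
        if not self.low <= new_value <= self.high:
            raise ValueError(f"{new_value} is outside [{self.low}, {self.high}]")
        self.value = new_value


Control = Union[Switch, Dial]

# Hypothetical control set; a real vehicle model would define its own.
vehicle: Dict[str, Control] = {
    "ignition": Switch(("off", "on"), "off"),
    "headlights": Switch(("off", "low", "high"), "off"),
    "steering_deg": Dial(-30.0, 30.0, 0.0),
    "throttle_pct": Dial(0.0, 100.0, 0.0),
    "brake_pct": Dial(0.0, 100.0, 0.0),
}

vehicle["ignition"].set("on")
vehicle["throttle_pct"].set(25.0)
```

Because every control is one of two types with the same set operation, an external model (a city, a route, an intersection) could direct the vehicle without knowing anything about how the vehicle implements the control.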

With proper factory support, both IoT and NFV can distribute state/event systems and processes to take advantage of function scaling and healing without risk of losing state control and the ability to correlate the context of local things into a master context.  That combination is essential to get the most from either of these advances—in fact, it may be the key to making the “advances” really advance anything at all.

Google Steps Into Lambdas: What More Proof Do We Need?

I write a lot about things that aren’t mentioned often elsewhere, and that might rightfully make you wonder whether I’m just off in the lunatic fringe.  I did a series of blogs talking about the shift in software in general, and the cloud in particular, to “functional” or “Lambda” programming, and a few of you indicated it was a topic you’d never heard of.  So, was I on the edge or over it, here?  I think the latest news shows which.

Google, finally awakening to the battle with Amazon and Microsoft for cloud supremacy, is making changes to its cloud services.  One of the new features, announced at Google’s Cloud Next event, is extending Google’s “elastic pricing” notion to fixed machine instances, but the rest focus on functional changes.  In one case, literally.

Even the basic innovations in Google’s announcement were indicators of a shift in the cloud market.  One very interesting one is the new Data Loss Prevention service, which takes advantage of Google’s excellent image analysis software to identify credit card images and block out the number.  There are other security APIs and features as well, and all of these belong to the realm of hosted features that extend basic cloud services (IaaS).  In combination, they prove that basic cloud hosting is not only a commodity, it’s a dead end as far as promoting cloud service growth is concerned.  The cloud of the future is about cloud-based features, used to develop cloud-specific applications.

Which leads us to what I think is the big news, the service Google calls “Cloud Functions”.  This is the same functional/Lambda programming support that Amazon and Microsoft have recently added to their hosted-feature inventory.  Google doesn’t play up the Lambda or functional programming angle; they focus instead on the more popular microservice concept.  A Cloud Function is a simple atomic program that runs on demand wherever it’s useful.

Episodic usage isn’t exactly the norm for business applications, and Google makes it clear (as do Amazon and Microsoft) that the sweet spot for functional cloud is event processing.  When an event happens, the associated Cloud Functions can be run and you get charged for that.  When there’s no event, there’s no charge.

There are a lot of things Google could focus a new competitive drive on, and making Cloud Functions a key element of that drive says a lot about what Google believes will be the future of cloud computing.  That future, I think, could well be built on a model of computing that’s a variant on the popular web-front-end approach now used by most enterprises.  We could call it the “event-front-end” model.

Web front-end application models take the data processing or back-end elements and host them in the data center as usual.  The front-end part, the thing that shows screens and gives users their GUI, is hosted in the cloud as a series of web servers.  Enterprises are generally comfortable with this approach, and while you may not hear a lot about this, the truth is that most enterprise cloud computing commitments are built on these web front-ends.

It seems clear that Amazon, Google, and Microsoft all see the event space as the big driver for enterprise cloud expansion beyond the web front-end model.  The notion of an event front-end is similar in that both events and user GUI needs are external units of work that require an intermediary level of functionality, before they get committed to core business applications.  You don’t want your order entry system serving web pages, only processing orders.  Similarly, an event-driven system is likely to have to do something quickly to address the event, then hand off some work to the traditional application processes.

I doubt that even Google, certainly geeky enough for all practical purposes, thinks that microservice programming or Lambda programming or any other programming technique is going to suddenly sweep the enterprise into being a consumer of Cloud Functions.  I don’t think they believe that there’s a runaway revenue opportunity converting web front-ends to Cloud Functions either (though obviously user-generated HTTP interactions can be characterized as “events”).  What is happening to drive this is a realization that there’s a big onrushing trend that has been totally misunderstood, and whose realization will drive a lot of cloud computing revenue.  That trend is IoT.

The notion that IoT is just about putting a bunch of sensors and controllers on the Internet is (as I’ve said many times) transcendentally stupid even for our hype-driven, insight-starved, market.  What all technology advances for IT are about is reaping some business benefit, which means processing business tasks more effectively.  Computing has moved through stages in supporting productivity gains (three past ones, to be exact) and in each the result was moving computing closer to the worker.  Moving computing to process business events moves computing not only close to workers, but in many cases moves it ahead of them.  You don’t wait for a request from a worker to do something, you do it in response to the event stimulus that would have (in the past) triggered worker intervention.  Think of it as “functional robotics”; you don’t build robots to displace humans, you simply replace them as the key element in event processing.

This approach, if taken, would offer cloud providers a chance to get themselves into the driver’s seat on the next wave of productivity enhancement, an activity that would generate incremental business benefits (improved productivity) and thus generate new IT spending rather than displacing existing spending.  That would be an easier sell—politically, because there’s no IT pushback caused by loss of influence or jobs, and financially because unlocking improved business operations has more long-term financial value than cutting spending for a year or so.

Event processing demands edge hosting.  Functional programming is most effective as an edge-computing tool, because the closer you get to the edge of the network in any event-driven system, the sparser the events to process are likely to be.  You can’t plan a VM everywhere you think you might eventually find an event.  Amazon recognized that with Greengrass, a way of pushing the function hosting outside the cloud.  I think Google recognizes it too, but remember that Google has edge cache points already and could readily develop more.  I think Google’s cloud will be more distributed than either Amazon’s or Microsoft’s, because Google has designed their network from the first to support edge-distributed functionality.  Its competitors focused on central economies of scale.

The functional/event dynamic is what should be motivating the network operators.  Telcos have a lot of edge real estate to play with in hosting stuff.  The trick has been getting something going that would (in the minds of the CFOs) justify the decision to start building out.  The traditional approach has been that things like NFV would generate the necessary stimulus.  It didn’t develop fast enough or in the right way.  We then have 5G somehow doing the job, but there is really no clear broad edge-hosting mandate in 5G as it exists, and in any case we could well be five years away from meaningful specs in that area.

Amazon, Google, and Microsoft think that edge-hosting of functions for event processing is already worth going after.  Probably they see IoT as the driver.  Operators like IoT, but for the short-sighted reason that they think (largely incorrectly) that it’s going to generate zillions of new customers by making machines into 4/5G consumers.  They should like it for carrier cloud, and what we’re seeing from Google is a clear warning sign that operators are inviting another wave of disintermediation by being so passive on the event opportunity.

Passivity might seem to be in order, if all the big cloud giants are already pushing Lambdas.  Despite the interest from them, all the challenges of event processing through functions/microservices/Lambdas have not been resolved.  Stateless processes are fine, but events are only half the picture of event handling, and the states in state/event descriptions show that the other half isn’t within the realm of the functional processes themselves.  We need to somehow bring states, bring context, to event-handling and that’s something that operators (and the vendors who support them) could still do right, and first.

State/event processing is a long-standing way of making sense out of a sequence of events that can’t be properly interpreted without context.  If you just disabled something, sensors that record its state could be expected to report a problem.  If you’re expecting that something to control a critical process, then having it report a problem is definitely not a good thing.  Same event, different reactions, depending on context.  Since Lambdas are stateless, they can’t be the thing that maintains state.  What does?  This is the big, perhaps paramount, question for event processing in the future.  We need to be able to have distributed state/event processing if we expect to distribute Lambdas to the edge.
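A small Python sketch may help show how the same event can produce different reactions depending on context. The state names, event names, and handler functions below are all assumptions made up for the example. The handlers are stateless, in the Lambda sense; the context lives in the state/event table and in whatever holds the current state.

```python
from typing import Callable, Dict, Tuple


# Stateless handlers: each takes the event payload and returns the next state.
def ignore_fault(event: dict) -> str:
    return "disabled"   # a fault from something we just disabled is expected


def raise_alarm(event: dict) -> str:
    return "failed"     # a fault from an element we depend on is a real problem


def activate(event: dict) -> str:
    return "active"


def disable(event: dict) -> str:
    return "disabled"


# The state/event table carries the context; the handlers hold none of their own.
TABLE: Dict[Tuple[str, str], Callable[[dict], str]] = {
    ("disabled", "fault"): ignore_fault,
    ("active", "fault"): raise_alarm,
    ("disabled", "enable"): activate,
    ("active", "disable"): disable,
}


def dispatch(state: str, event_type: str, event: dict) -> str:
    """Look up the handler for (state, event) and return the resulting state."""
    handler = TABLE.get((state, event_type))
    if handler is None:
        return state    # unmodeled combination: leave the state unchanged
    return handler(event)


# Same event, different reactions, depending on context.
state_when_disabled = dispatch("disabled", "fault", {"source": "sensor-1"})
state_when_active = dispatch("active", "fault", {"source": "sensor-1"})
```

What the sketch leaves open is exactly the question raised above: where the current state is kept, and how it stays consistent, when the handlers are distributed to the edge.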

I didn’t exaggerate the importance of the Lambda-and-event paradigm in my past blogs.  I’m not exaggerating it now, and I think Google just proved that.  There aren’t going to be any more opportunities for operators to reap IoT and edge-hosting benefits once the current one passes.  This is evolution in action: a shift from predictable workflows to dynamic event-driven systems, and from a connecting economy to a hosting economy.  Evolution doesn’t back up, and both operators and vendors need to remember that.

Applying Edge Programming and Lambdas to OSS/BSS Modernization (and IoT)

Most of you will recall that there has been a persistent goal to make OSS/BSS “event-driven”.  Suppose we were to accept that was the right approach.  Could we then apply some of the edge-computing and IoT principles of software structure and organization of work to the OSS/BSS?  Let’s take a look at what would happen if we did that.

The theoretical baseline for OSS/BSS event-driven modernization is the venerable “NGOSS Contract” notion, which describes how the service contract (modeled based on the TMF SID model) can act as a kind of steering mechanism to link service events to operations/management processes (using, by the way, Service Oriented Architecture or SOA principles).  This concept is a major step forward in thinking about operations evolution, but it’s not been widely adopted, and in many ways it’s incomplete and behind the times.

The most obvious issue with the NGOSS Contract approach is that it doesn’t address where the events come from.  Today, most services are inherently multi-tenant with respect to infrastructure use, which means that a given infrastructure event might involve multiple services, or in some cases none at all.  To make matters worse, most modern networks and all modern data centers have resource-level management and remediation processes that at the least supplement and at most replace service- or application-specific fault and performance management.  The flow of events differs in each of the scenarios these event-related approaches create.

The second problem is SOA.  SOA principles don’t dictate that a given “service”, which is an operations process in our discussion, be stateless, meaning that it doesn’t store information between executions.  It’s the stateless property that lets you horizontally scale components under load or replace them when they break without interfering with operations.  We have software concepts that many believe will supersede (or have superseded) SOA: microservices and functional (Lambda) programming.  Why would we “modernize” OSS/BSS using software concepts already deprecated?

The third problem with the approach is harder to visualize—it’s distributability.  I don’t mean that the software processes could be hosted anywhere, but that there is a specific architecture that lets operators strike a balance between keeping control loops short for some events, and retaining service-contract-focused control over event steering.  If I have an event in Outer Oshkosh that I want to handle quickly, I can put a process there, but will that distribution of the process then defeat my notion of model-driven event steering?  If I put the contract there, how will I support events in Paris efficiently?  If I put the contract in multiple places, have I lost true central control because service state is now multiply represented?

Reconciling all of this isn’t something that software principles like Lambda programming can fix by themselves.  You have to go to the top of the application ladder, to the overall software architecture and the way that work flows and things are organized.  That really starts with the model that describes a network service or a cloud application as a distributed system of semi-autonomous components.

Outer Oshkosh and Paris, in my example, are administrative domains where we have a combination of raw event sources and processing resources.  Each of these places is making a functional contribution to my service, and thus the first step in creating a unified, modern, event-driven OSS/BSS process is to model services based on functional contributions.  There are natural points of function concentration in any service, created by user endpoints, workflow/traffic, or simply by the fact that there’s a physical facility there to hold things.  These should be recognized in the service model.

The follow-on point to this is that function concentration points that are modeled are also intent-based systems that have states and both process and generate events.  If something is happening in Paris or Outer Oshkosh that demands local event handling, then rather than forcing a central model to record the specifics of that handling, have a “local” model of the function behavior that does that.  A service, then, would have a model element representing each of these functions, and would presumably be defining the event-to-process mappings not inside each of the functions (they’re black boxes) but rather the event-to-process mappings for the events those functions each generate at the service level.

This kind of structure is a bit like the notion of hierarchical management.  You don’t try to run a vast organization from a single central point; you build sub-structures that have their own missions and methods, let each of them fill their roles their own way, and coordinate the results.  This notion illustrates another important point on my example; it’s likely you would have a “US” and an “EU” structure that would be coordinating the smaller function concentrations in those geographies.  In short, you have a hierarchy that sits between the raw event sources and the central model, and each level of that hierarchy absorbs the events below and generates new events that represent collective, unhandled, issues to the stuff above.

Edge processes in this model are essentially event-translators.  They absorb local events and accommodate the need for immediate short-loop reaction, and they maintain functional state as a means of generating appropriate events to higher-level elements.  Thus, HighLevelThing is good if all its IntermediateLevelThings are good, and each of these depends on LowLevelThings.
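The state roll-up can be sketched in a few lines of Python. The Element class and its fields are hypothetical, chosen only to mirror the HighLevelThing/IntermediateLevelThing/LowLevelThing naming; a real model would carry far more than a single good/bad flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)
    local_ok: bool = True   # this element's own state, set by local event handling

    def is_good(self) -> bool:
        """Good only if this element and everything below it is good."""
        return self.local_ok and all(child.is_good() for child in self.children)

    def failing(self) -> List[str]:
        """Names of elements whose own state is bad, for events passed upward."""
        found = [] if self.local_ok else [self.name]
        for child in self.children:
            found.extend(child.failing())
        return found


low1 = Element("LowLevelThing1")
low2 = Element("LowLevelThing2")
low3 = Element("LowLevelThing3")
mid_a = Element("IntermediateLevelThingA", [low1, low2])
mid_b = Element("IntermediateLevelThingB", [low3])
high = Element("HighLevelThing", [mid_a, mid_b])

# A local event marks one low-level element as bad; the condition rolls upward.
low2.local_ok = False
```

After the last line, the high-level element and one intermediate element report bad while the other intermediate element is unaffected, which is the behavior the hierarchy is meant to provide.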

This approach has the interesting property of letting you deploy elements of service lifecycle management to the specific places where events are being generated.  In theory, you could even marshal extra processing resources to accommodate a failure, or to help you expedite the change from one service configuration to another.

The interesting thing about this sort of modeling and event-handling is that it also works with IoT.  Precious little actual thought has gone into IoT; it’s all been hype and self-serving statements from vendors.  The reality of IoT is that there will be little application-to-sensor interaction.  Something like that neither scales nor can provide security and privacy assurance, not to mention being cost-effective.

The real-world IoT will be a series of function communities linked to sensors and using common basic event processing and event generation strategies.  There might be a “Route 95 Near the NJ Bridge” community, for example, which would subscribe to events that are processed somewhere local to that point and refined into new events that relate specifically to traffic conditions at the specified intersection.  This community might be part of both the larger “US 95” community and the “NJ Turnpike” and “PA Turnpike” communities.

Function communities in IoT are hierarchical just like they are in network services, and for the same reason.  If you’re planning a trip along the East Coast, you might need to know the overall conditions on Route 95, but you surely don’t need to know them further ahead than your travel timeline dictates.  Such a trip, in IoT terms, is a path through function communities, and as you enter one you become interested in the details of what’s happening (traffic-wise) there, and more interested than before in conditions ahead.  An “event” from the nearby community might relate to what’s happening now, but events from the next community in your path are interesting only if they’re likely to persist for the time you’ll need to get there.
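The "likely to persist until you get there" test is simple enough to sketch in Python. The community names, the TrafficEvent fields, and the idea of an expected-duration estimate attached to each event are all assumptions for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrafficEvent:
    community: str
    description: str
    expected_duration_min: float   # how long the condition is expected to last


def relevant_events(
    path: List[Tuple[str, float]],   # (community, minutes until we arrive there)
    events: List[TrafficEvent],
) -> List[TrafficEvent]:
    """Keep events on our path that should still be in effect when we arrive."""
    eta = dict(path)
    keep = []
    for ev in events:
        if ev.community not in eta:
            continue                 # not on our route at all
        if ev.expected_duration_min >= eta[ev.community]:
            keep.append(ev)          # it will still matter when we get there
    return keep


# Hypothetical trip: we are in the first community now and reach the others later.
trip = [("NJ Bridge", 0.0), ("NJ Turnpike", 45.0), ("US 95 South", 180.0)]
reported = [
    TrafficEvent("NJ Bridge", "lane closed", 10.0),
    TrafficEvent("NJ Turnpike", "minor fender-bender", 20.0),
    TrafficEvent("NJ Turnpike", "overnight construction", 300.0),
    TrafficEvent("PA Turnpike", "accident", 60.0),
]
of_interest = relevant_events(trip, reported)
```

The driver never touches a sensor here; each community refines raw readings into events, and the trip simply filters those events by path and timing.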

Contrast the I95 trip approach I’ve described with what would be needed if every driver needed to query sensors along the route.  Just figuring out which ones they needed, and which were real, would be daunting.  The same is true for OSS/BSS or cloud computing or service orchestration.  You need to divide complex systems into subsystems, so that each level in the hierarchy poses reasonable challenges in terms of modeling and execution.

The combination of a hierarchical modeling approach, functional/Lambda programming to create easily-migrated functions, and event-driven processes synchronized by the former and implemented through the latter, gives you an OSS/BSS and IoT approach that could work, and work far better than what we’ve been spinning up to now.  If this could deliver operational efficiencies better, then it’s what we need to be talking about.

Why the Critical Piece of VMware’s NFV 2.0 is the “Network Model” NSX MIGHT Support

I mentioned in my blog yesterday that a network and addressing model was critical to edge computing and NFV.  If that’s true, then could it also be true that having a virtual-network model was critical to vendor success in the NFV space?  The VMware NFV 2.0 announcement may give us an opportunity to test that, and it may also ignite more general interest in the NFV network model overall.

A “network model” in my context is a picture of how connectivity is provided to users and applications using shared infrastructure.  The Internet represents a community network model, one where everyone is available, and while that’s vital for the specific missions of the Internet, it’s somewhere between a nuisance and a menace for other network applications.  VPNs and VLANs are proof that you need to have some control over the network model.

One of the truly significant challenges of virtualization is the need to define an infinitely scalable multi-tenant virtual network model.  Any time you share infrastructure you have to be able to separate those who share from each other, to ensure that you don’t create security/governance issues and that performance of users isn’t impacted by the behavior of other users.  This problem arose in cloud computing, and it was responsible for the Nicira “SDN” model (now VMware’s NSX), an overlay-network technology that lets cloud applications/tenants have their own “private networks” that extend all the way to the virtual components (VMs and now containers).

NFV has a multi-tenant challenge too, but it’s more profound than the one that spawned Nicira/NSX.  VMware’s inclusion of NSX in its NFV 2.0 announcement means it has a chance, perhaps even an obligation, to resolve NFV’s network-model challenges.  That starts with a basic question that’s been largely ignored: “What is a tenant in NFV?”  Is every user a tenant, every service, every combination of the two?  Answer: All of the above, which is why NFV needs a network model so badly.

Let’s start with what an NFV network model should look like.  Say that we have an NFV service hosted in the cloud, offering virtual CPE (vCPE) that includes a firewall virtual function, an encryption virtual function, and a VPN on-ramp virtual function of some sort.  These three functions are “service chained” according to the ETSI ISG’s work, meaning that they are connected through each other in a specific order, with the “inside” function connected to the network service and the “outside” to the user.  All nice and simple, right?  Not so.

You can’t connect something without having a connection service, which you can’t have without a network.  We can presume chaining of virtual functions works if we have a way of addressing the “inside” and “outside” ports of each of these functions and creating a tunnel or link between them.  So we have to have an address for these ports, which means we have an address space.  Let’s assume it’s an IP network and we’re using an IP address space.  We then have an address for Function 1 Input and Output and the same for Functions 2 and 3.  We simply create a tunnel between them (and to the user and network) and we’re done.

The problem is that if this is a normal IP address set, it has to be in an address space.  Whose?  If this is a public IP address set, then somebody online could send something (even if it’s only a DDoS packet) to one of the intermediary functions.  So presumably what we’d do is make this a subnet that uses a private IP address space.  Almost everyone has one of these; if you have a home gateway it probably gives your devices addresses in the range 192.168.x.x.  This would keep the function addresses hidden, but you’d have to expose the ports used to connect to the user and the network service to complete the path end to end, so there’s a “gateway router” function that does an address translation for those ports.

Underneath the IP subnet in a practical sense is an Ethernet LAN, and if it’s not an independent VLAN then the functions are still addressable there.  There are limits to the number of Ethernet VLANs you can have, and this is why Nicira/NSX came along in the first place.  With their approach, each of the IP subnets rides independently on top of infrastructure, and you don’t have to segment Ethernet.  So far, then, NSX solves our problems.

But now we come to deploying and managing the VNFs.  We know that we can use OpenStack to deploy VNFs and that we can use Nicira/NSX along with OpenStack’s networking (Neutron) to connect things.  What address space does all this control stuff live in?  We can’t put shared OpenStack into the service’s own address space or it’s insecure.  We can’t put it inside the subnet because it has to build the subnet.  So we have to define some address space for all the deployment elements, all the resources, and that address space has to be immune from attack, so it has to be separated from the normal public IP address space, the service address space, and the Internet.  Presumably it also has to be broad enough to address all the NFV processes of the operator wherever they are, so it’s not an IP subnetwork at all, it’s a VPN.  This isn’t discussed much, but it is within the capabilities of the existing NFV technology.

The next complication is the management side.  To manage our VNFs we have to be able to connect to their management ports.  Those ports are inside our subnet, so could we just provide a gateway translation of those port addresses to the NFV control process address space?  Sure, but if we do that, we have created a pathway where a specific tenant can “talk” into the control network.  We also have to expose resource management interfaces, and the same problem arises.

I think that NSX in VMware’s NFV 2.0 could solve these problems.  There is no reason why an overlay network technology like NSX couldn’t build IP subnets, VPNs, and anything else you’d like without limitations.  We could easily define, using the private Class A IP address space (10.x.x.x), an operator-wide NFV control network.  We could use one of the private Class B spaces (172.16.x.x through 172.31.x.x) to define a facility-wide network, and use the private Class C networks (192.168.x.x) to host the virtual functions.  We could gateway between these, I think.  What I’d like to see is for VMware to take the supposition out of the picture and draw the diagrams to show how this kind of address structure would work.
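Here is a small Python sketch, using only the standard ipaddress module, of how such an address plan might be laid out with the three RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The split into one /16 per facility and one /24 per service chain, and the port names for the firewall, encryption, and VPN functions, are my assumptions for illustration; this says nothing about how NSX itself allocates addresses.

```python
import ipaddress

# The three private IPv4 ranges, assigned to the three roles.
CONTROL_NET = ipaddress.ip_network("10.0.0.0/8")         # operator-wide NFV control
FACILITY_POOL = ipaddress.ip_network("172.16.0.0/12")    # facility-wide networks
FUNCTION_POOL = ipaddress.ip_network("192.168.0.0/16")   # virtual-function subnets

# Assumption: one /16 per facility, one /24 per service chain.
facility_nets = list(FACILITY_POOL.subnets(new_prefix=16))
chain_subnets = FUNCTION_POOL.subnets(new_prefix=24)     # generator of /24 networks

# Address the inside and outside ports of a three-function vCPE chain.
chain_subnet = next(chain_subnets)
hosts = chain_subnet.hosts()
port_names = [
    "firewall_in", "firewall_out",
    "encrypt_in", "encrypt_out",
    "vpn_in", "vpn_out",
]
ports = {name: next(hosts) for name in port_names}


def reachable_from_control(addr: ipaddress.IPv4Address) -> bool:
    """True if an address sits in the control network rather than a tenant subnet."""
    return addr in CONTROL_NET


# The pools must not overlap, or separation between the roles is lost.
pools_disjoint = not (
    CONTROL_NET.overlaps(FACILITY_POOL)
    or CONTROL_NET.overlaps(FUNCTION_POOL)
    or FACILITY_POOL.overlaps(FUNCTION_POOL)
)
```

Note that this only lays out the address spaces. The gateways between them, and the rules about which conversations they permit, are the part that still needs the diagrams.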

Why?  The answer is that without this there’s no way we can have a uniform system of deployment and management for NFV, because we can’t tell whether everything can talk to what it needs to, or whether the conversations that should never happen are in fact prevented.  Also, because such a move would start a competitive drive to dig into the whole question of the multi-network map that’s an inherent (but so far invisible) part of not only NFV but also cloud computing and IoT.

Finally, because some competitor is likely to do the right thing here even if VMware doesn’t.  Think Nokia, whose Nuage product is still in my view the best overlay technology out there.  Think HPE, who just did their own next-gen NFV announcement and has perhaps the most to gain (and lose) of any vendor in the space.  This is such a simple, basic, part of any virtualized infrastructure and service architecture that it’s astonishing nobody has talked about it.

Ah, but somebody has thought about it—Google.  And guess who is now starting to work with operators on the elements of a truly useful virtual model for services?   Google just announced a partnership with some mobile operators, and they have the necessary network structure already.  And vendors wonder why they’re falling behind!

Taking a Longer Look at 5G Infrastructure and Services

It seems possible, based on the results of the MWC show, to speculate a bit on what infrastructure and service considerations are likely to arise out of the 5G specs.  “Speculate” is the key word here; I’ve already noted that the show didn’t address the key realities of 5G, IoT, or much of anything else.  I also want to point out that we don’t have firm specifications here, and in my view, don’t even have convincing indicators that all the key issues are going to be addressed in the specs that do develop.  Thus, we can’t say if these “considerations” will be considered, outside this blog and those who respond on LinkedIn or to me directly.

Three things that 5G is supposed to do according to both the operators and what I read as “show consensus” are to support a unified service framework for wireline and wireless, support “network slicing” to separate services and operators who share infrastructure, and allow mobile services to incorporate elements of other connectivity resources, including wireline and satellite.  These three factors seem to frame one vision of the future that’s still not accepted widely—the notion of an explicit overlay/underlay structure for 5G.

Traditional networking is based on two notions: that services are built on layers that abstract a given layer from the details of implementing the layers below, and that within a layer the protocols of the layer define the features of the service.  When you have an IP network, for example, you rely on some Layer 2 and Layer 1 service, but you don’t “see” those layers directly.  You do “see” the features of the IP network in the features of your service.

Overlay/underlay networking is similar to the layered structure of the venerable OSI model, but it extends it a bit.  We have overlay/underlay networking today in “tunnel networks” that build connectivity based on the use of virtual paths or tunnels supported by a protocol like Ethernet or IP, and we now have formalized overlays built using SDN or SD-WAN technology.  Most overlay/underlay networks, in contrast to typical OSI-layer models, don’t rely on any feature of the layer below other than connectivity.  There are no special protocols or features needed.  Also, overlay/underlay networking has from the first been designed to allow multiple parallel overlays on a single underlay; most OSI-modeled networks have a 1:1 relationship between L2 and L3 protocols.

In a 5G model, the presumption of overlay/underlay services would be that there would be some (probably consistent) specification for an overlay, both in terms of its protocols and features.  This specification would be used to define all of the “service networks” that wireline and wireless services currently offer, and so the overlay/underlay framework would (with one proviso I’ll get to) support any “service network” over any infrastructure.  That satisfies the first of our three points.

The second point is also easily satisfied, because multiple parallel overlay networks are exactly what network slicing would demand.  If we expanded the “services” of the underlay network to include some class-of-service selectivity, the overlays could be customized to the QoS needs of the services they represent in turn.

In both SD-WAN and SDN overlays, the connectivity of the overlay is managed independent of the underlay; the OSI model tends to slice across layer boundaries or partition the devices to create overlay/underlay connectivity.  In most SD-WAN applications the presumption is that the edge devices (where the user is attached) terminate a mesh of tunnels that create connectivity.  In SDN, there may be a provision for intermediary steering, meaning that an endpoint might terminate some tunnels and continue others.  For proper 5G support, we need to review these options in the light of another element, which is explicit network-to-network interconnect.

Most protocols have some mechanism for NNI, but these are usually based on creating a connection between those singular top-of-the-stack OSI protocols.  In overlay/underlay networks, an NNI element lives at the overlay level, and simply connects across what might be a uni-protocol (same protocol for the underlay) or a multi-protocol (a different underlay on each side) border.  Alternatively, you could have an underlay gateway that binds the two networks together and harmonizes connectivity and QoS, and this could allow the overlay layer to treat the two as the same network.
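The overlay-level NNI can be sketched in a few lines of Python.  The function name, the domain labels, and the "nni-gw-1" element are all assumptions made for illustration.  The only point is that the overlay stitches two tunnels at a border element and doesn't care whether the underlay on each side uses the same protocol.

```python
def overlay_route(src, dst, domain_of, nni_between):
    """Return the overlay hops between two endpoints.

    domain_of: endpoint name -> underlay domain name (hypothetical labels).
    nni_between: frozenset of two domain names -> overlay NNI element name.
    """
    d_src, d_dst = domain_of[src], domain_of[dst]
    if d_src == d_dst:
        # Same underlay: a single tunnel connects the endpoints.
        return [src, dst]
    border = nni_between.get(frozenset((d_src, d_dst)))
    if border is None:
        raise LookupError(f"no overlay NNI between {d_src} and {d_dst}")
    # Different underlays: two tunnels are stitched at the overlay NNI element.
    return [src, border, dst]


domain_of = {
    "branch-1": "mpls-east",
    "branch-2": "mpls-east",
    "branch-3": "internet-west",
}
nni_between = {frozenset(("mpls-east", "internet-west")): "nni-gw-1"}

same_domain = overlay_route("branch-1", "branch-2", domain_of, nni_between)
cross_domain = overlay_route("branch-1", "branch-3", domain_of, nni_between)
```

The alternative described above, an underlay gateway that harmonizes the two networks, would correspond to putting both sets of endpoints in one domain so the overlay never sees the border.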

The border concept could also describe how an underlay interconnect would be shared by multiple overlays, and that concept could be used to describe how a fiber trunk, satellite link, or other “virtual wire” would be represented in an overlay/underlay structure and how it could be used by multiple services.  On- and off-ramps to links like this are a form of gateway, after all.

The question that’s yet to be addressed here is the role that virtual function hosting might play.  There’s nothing explicitly in 5G discussions to mandate NFV beyond hopefulness.  On the other hand, the existence of an overlay technology could well create the beginning of an NFV justification, or at least a justification for cloud-hosting of these overlay components rather than dedicating devices to that role.  An overlay network should be more agile than the underlay(s) that support it.  That agility could take the form of having nodes appear and disappear at will, based on changes in traffic or connectivity, and also in response to changes in the state of the underlay network.  Virtual nodes fit well into the overlay model, even NFV-hosted virtual nodes.

Beyond that it’s harder to say, not because hosting more features isn’t beneficial but because hosting alone doesn’t justify NFV.  NFV was, from the first, fairly specialized in terms of its mission.  A “virtual network function” is a physical network function disembodied.  There really aren’t that many truly valuable physical network functions beyond nodal behavior.  Yes, you can hypothesize things like virtual firewalls and NATs, but you can get features like that for a few bucks at the local Staples or Office Depot, at least for the broad market.  Moving outside nodal (connectivity-routing) features to find value quickly takes you outside the realm of network functions and into application components.  Is a web server a network function, or a mail server?  Not in my view.

From the perspective of 5G and IoT, though, the requirements for hosting virtual functions or hosting cloud processes are very similar; there is a significant connectivity dimension.  We have done very little work in the NFV space to frame what network model is required to support the kind of function-hosting-and-management role needed.  The work that’s been done in the cloud space has focused on a pure IP-subnet model that’s too simple to address all the issues of multi-tenant functions that have to be securely managed as well.  In fact, the issue of addressing and address management is probably the largest issue to be covered, even in the overlay/underlay model.  If operators and vendors are serious about 5G then they need to get serious about this issue too.

What Would Edge-Hosting Mean to Infrastructure and Software Design?

If computing in general and carrier cloud in particular are going to become more edge-focused over time, then it’s time to ask just what infrastructure features will be favored by the move.  Even today we see variations in server architecture and the balance of compute and I/O support needed.  It is very likely that there will be even more variations emerging as a host of applications compete to dominate the cloud’s infrastructure needs.  What are the factors, and what will the result be?  I’m going to have to ask you to bear with me, because understanding the very important issues here means going way beyond 140-character Tweets.

It’s always a challenge to try to predict how something that’s not even started will turn out in the long term.  Carrier cloud today is almost pre-infancy; nearly all carrier IT spending is dedicated to traditional OSS/BSS, and what little is really cloud-building or even cloud-ready is so small that it’s not likely representative of broader, later, commitment.  Fortunately, we have some insight we can draw from the IT world, insight that’s particularly relevant given the fact that things like microservices are already a major driver of change in IT, and are of increasing interest in the carrier cloud.  To get to these insights we need to look a bit at the leading edge of cloud software development.

Microservices are in many ways a kind of bridge between traditional componentized applications (including those based on the Service Oriented Architecture of almost two decades ago) and the “bleeding edge” of computing architecture, the functional programming or Lambda function wave.  A Lambda function is a software element that processes an input and produces an output without relying on the storage of internal pieces—it has a single function regardless of the context of its use.  What makes this nice is that because nothing is ever saved inside a Lambda function, you can give a piece of work to any copy of the function and get exactly the same result.  I’m going to talk a lot about the Lambda functions in this blog, so to save typing I’m going to call them “Lambdas” with apologies to the people who use the term (without capitalizing) to mean “wavelength”.

In the broader development context, this kind of behavior is known as “stateless” behavior, because there are no “states” or differences in function outcome depending on the sequence of events or messages being processed.  Stateless behavior is mandatory for Lambdas, and also highly recommended if not mandated for microservices.  Stateless stuff is great because you can replace it, scale it, or use any convenient copy of it and there’s no impact, no cross-talk.  It’s limiting because many processes aren’t stateless at all; think of taking money out of the bank if you need an easy example.  What you have left depends on what you’ve put in or taken out before.
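The bank example is easy to show in code.  This is a minimal Python sketch with made-up names (Account, withdraw), meant only to contrast the two behaviors, not to represent how any cloud provider implements Lambda functions.

```python
class Account:
    """Stateful: the result of withdraw() depends on everything that came before."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        self.balance -= amount
        return self.balance


def withdraw(balance, amount):
    """Stateless: the output depends only on the inputs passed in."""
    return balance - amount


# The same request to the stateless function always gives the same answer,
# so it doesn't matter which copy of the function receives it.
results_stateless = [withdraw(100, 30), withdraw(100, 30)]

# The same request to the stateful object gives a different answer each time.
acct = Account(100)
results_stateful = [acct.withdraw(30), acct.withdraw(30)]
```

The stateless version hasn't made the balance go away; it has pushed the job of remembering it onto whoever calls the function.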

The reason for this little definitional exercise is that both Amazon and Microsoft have promoted Lambda programming as a pathway to event-driven IT, and the same is being proposed for microservices.  In Amazon’s case, they linked it with distributing functions out of the cloud and into an edge element (Greengrass).  Event-driven can mean a lot of things, but it’s an almost-automatic requirement for what are called “control loop” applications, where something is reported and the report triggers a process to handle it.  IoT is clearly a control-loop application, but there are others even today, which is why Amazon and Microsoft have focused on cloud support for Lambda functions.  You can write a little piece of logic to do something and just fire it off into the network somewhere it can meet the events it supports.  You don’t commit machine image resources or anything else.

If IoT and carrier cloud will focus on being event-driven, it follows they would likely become at least Lambda-like, based on stateless microservices that are pushed toward the edge to shorten the control loop while traditional transactional processes stay deeper in the compute structure.  Applications, then, could be visualized as a cloud of Lambdas floating around, supporting collectively a smaller number of stateful repository-oriented central applications.  The latter will almost surely look like any large server complex dedicated to online transaction processing (OLTP).  What about the former?

The Lambda vision is one of functional units that have no specific place to live, remember.  It’s a vision of migration of capabilities to assemble them along the natural path of work, at a place that’s consistent with their mission.  If they’re to be used in event-handling, this process of marshaling Lambdas can’t take too long, which means that you’d probably have a special system that’s analyzing Lambda demand and caching them, almost like video is cached today.  You’d probably not want to send a Lambda somewhere as much as either have it ready or load it quickly from a local resource.  Once it’s where it needs to be, it’s simply used when the appropriate event shows up.
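The caching idea can be sketched as a small least-recently-used cache of functions.  This is a hypothetical illustration in Python: the LambdaCache class, the event names, and the dictionary standing in for a "local resource" are all assumptions, and a real system would presumably stage functions based on predicted demand rather than simple recency.

```python
from collections import OrderedDict


class LambdaCache:
    """Hypothetical edge-node cache holding a bounded number of ready-to-run functions."""

    def __init__(self, capacity, local_store):
        self.capacity = capacity
        self.local_store = local_store  # name -> callable; stands in for local storage
        self.ready = OrderedDict()      # staged functions, least recently used first
        self.loads = 0                  # times a function had to be loaded on demand

    def get(self, name):
        if name in self.ready:
            self.ready.move_to_end(name)  # mark as most recently used
            return self.ready[name]
        fn = self.local_store[name]       # load from the local resource
        self.loads += 1
        self.ready[name] = fn
        if len(self.ready) > self.capacity:
            self.ready.popitem(last=False)  # evict the least recently used function
        return fn

    def handle(self, event_type, payload):
        # The function is simply used when the matching event shows up.
        return self.get(event_type)(payload)


store = {
    "door-open": lambda p: f"alert:{p}",
    "temp-reading": lambda p: p > 30,
    "motion": lambda p: f"log:{p}",
}
cache = LambdaCache(capacity=2, local_store=store)
first = cache.handle("door-open", "front")
hot = cache.handle("temp-reading", 35)
cache.handle("door-open", "back")   # already staged, no load needed
cache.handle("motion", "hall")      # staging this evicts "temp-reading"
```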

This should make it obvious that running a bunch of Lambdas is different from running applications.  You don’t need a lot of disk I/O for most such missions, unless the storage is for non-volatile referential data rather than a dynamic database.  What you really want is powerful compute capabilities, a lot of RAM capacity to hold functions-in-waiting, and probably flash disk storage so you can quickly insert a function that you need, but hadn’t staged for use.  Network I/O would be very valuable too, because restrictions on network capacity would limit your ability to steer events to a convenient Lambda location.

How Lambda and application hosting balance each other, requirements-wise, depends on how far you are from the edge.  At the very edge, the network is more personalized and so the opportunity to host “general-use Lambdas” is limited.  As you go deeper, the natural convergence of network routes along physical facilities generates places where traffic combines and Lambda missions could reasonably be expected to be shared across multiple users.

This builds a model of “networking” that is very different from what we have now, perhaps more like that of a CDN than like that of the Internet.  We have a request for event-processing, which is an implied request for a Lambda stream.  We wouldn’t direct the request to a fixed point (any more than we direct a video request that way), but would rather assign it to the on-ramp of a pathway along which we had (or could easily have) the right Lambdas assembled.

I noted earlier in this blog that there were similarities between Lambdas and microservices.  My last paragraph shows that there is also at least one difference, at least in popular usage, between Lambdas and microservices.  The general model for microservices is based on extending componentization and facilitating the use of common functions in program design.  A set of services, as independent components, support a set of applications.  Fully exploiting the Lambda concept would mean that there really isn’t a “program” to design at all.  Instead there’s a kind of ongoing formula that’s developed based on the source of an event, its ultimate destination, and perhaps the recent process steps taken by other Lambdas.  This model is the ultimate in event-driven behavior, and thus the ultimate in distributed computing and edge computing.

There’s another difference between microservices and Lambdas, more subtle and perhaps not always accepted by proponents of the technologies.  Both are preferred to be “stateless” as I noted, but in microservices it’s acceptable to use “back-end” state control to remove state/context from the microservices themselves.  With Lambdas, this is deprecated because in theory different copies of the same Lambdas might try to alter state at the same time.  It would be better for “state” or context to be carried as a token along with the request.
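Carrying state as a token might look something like the following Python sketch.  The request layout and the "token" field are assumptions made for illustration.  The sketch also sidesteps the hard question of who holds the authoritative copy of the token; it only shows that the handler itself keeps nothing between calls.

```python
def withdraw(request):
    """Stateless handler: all context arrives in, and leaves with, the request token."""
    token = request["token"]
    new_balance = token["balance"] - request["amount"]
    approved = new_balance >= 0
    # Return a fresh token rather than modifying the one that came in.
    return {
        "approved": approved,
        "token": {"balance": new_balance if approved else token["balance"]},
    }


# Each call could be served by a different copy of the function; the token
# returned by one call is simply attached to the next request.
first = withdraw({"amount": 30, "token": {"balance": 100}})
second = withdraw({"amount": 80, "token": first["token"]})
```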

We don’t yet really have a framework to describe it, though.  Here’s an event, pushed out by some unspecified endpoint.  In traditional programming, something is looking for it, or it’s being posted somewhere explicitly.  Maybe it’s publish-and-subscribe.  However, in a pure Lambda model, something Out There is pushing Lambdas out along the path of the event.  What path is that?  How does the Something know what Lambdas are needed or where to put them?

If you applied the concepts of state/event programming to Lambda control, you could say that when an event appears it is associated with some number of state/event tables, tables that represent contexts that need to process that event.  The movement of the event through Lambdas could be represented as the changing of states.  Instead of the traditional notion of an event arriving at a process via a state/event table, we have a process arriving at the event for the same reason.  But it’s still necessary to know what process is supposed to arrive.  Does the process now handling an event use “state” information that’s appended to it and identify the next process down the line?  If so, how does the current process know where the next one has been dispatched, and how does the dispatcher know to anticipate the need?  You can see this needs a lot of new thinking.
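One way to picture the state/event idea is a table that maps the state carried with an event to the next process and the next state.  The Python sketch below is purely illustrative: the table contents, the process names, and the event layout are invented, and it deliberately ignores the open questions above about where the next process has been dispatched and how the dispatcher anticipates the need.

```python
# Hypothetical state/event table: (state, event type) -> (process name, next state).
TABLE = {
    ("new", "sensor-trip"): ("validate", "validated"),
    ("validated", "sensor-trip"): ("correlate", "correlated"),
    ("correlated", "sensor-trip"): ("notify", "done"),
}


def step(event):
    """Look up the process for the event's current state and advance the state."""
    process, next_state = TABLE[(event["state"], event["type"])]
    # State travels with the event; "trail" records which processes handled it.
    return {**event, "state": next_state, "trail": event["trail"] + [process]}


def run(event):
    """Keep stepping until the table has no entry for the event's state."""
    while (event["state"], event["type"]) in TABLE:
        event = step(event)
    return event


result = run({"type": "sensor-trip", "state": "new", "trail": []})
```

Here the state appended to the event is enough to identify the next process, but only because a single table is visible everywhere, which is exactly the assumption a distributed Lambda model can't take for granted.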

IoT will really need this kind of edge-focused, Lambda-migrating, thinking.  Even making OSS/BSS “event-driven” could benefit from it.  Right now, as far as I can see, all the good work is being done in abstract with functional programming, or behind the scenes of web-focused, cloud-hosted startups who probably have stimulated both Amazon and Microsoft to offer Lambda capabilities in their clouds.  It will be hard to make IoT the first real use case for this—it’s a very big bite—but maybe that’s what has to happen.

A Slightly Early MWC Retrospective

The iconic MWC conference is now pretty much history.  The big announcements have been made, the attendees have largely exhausted themselves (the exhibitors certainly have!), and it’s time to take stock and decide whether anything important was really said and shown.  In terms of point announcements, it’s rare for something huge to come out at an event like MWC—too much crosstalk.  The buzz of the show is another matter; we can pick out some important points by looking across all the announcements and demonstrations to detect shifts and non-shifts.

The most important thing that I take away from MWC is that there is an enormous gap between 5G expectation and the current state of the technology.  The goal of 5G is service- and infrastructure-shaking, and the reality of 5G at the moment struggles to be a major shift in the RAN.  Part of the reason for this gap is the (usual) slow progress of the specifications, but another part is the fact that standards groups have a habit of grabbing the low apples or focusing on the most visible questions.

5G RAN improvements are important, but operators I talk with have consistently said that their biggest priority was to standardize the metro and access models for wireless and wireline, and to support wireless 5G extensions of fiber networks.  Without these capabilities, many operators said that it would be difficult to justify 5G versus enhanced 4G.  Ironically, the early “5G trials” have all focused on RAN and on modest adjustments to 4G, like supporting 5G frequencies, to “prove out” the technology.  Some operators have been public in their rejection of this approach, but that’s what’s been happening.

One public approach to pre-standard 5G even retains the Evolved Packet Core, which most operators told me was something that they wanted (as a number-one or number-two priority) to eliminate.  Clearly the focus of many 5G proponents is to move the process ahead even if there’s less utility in what’s produced.  That also was a criticism that’s been made in public.

The next point is that we have not yet abandoned our short-sighted and stupid vision of IoT as being all about wireless connections.  There were plenty of examples of this, but two figured particularly prominently in the overall stream of hype.  The first is a broadening of the notion that this is all about RF, which makes IoT all about connections.  The second is the almost hypnotic attraction to “connected car” as the prototypical IoT application.

I’m almost tired of saying that getting devices connected is the least of our IoT worries, but it is.  The majority of IoT applications will almost certainly use devices that not only aren’t directly on the Internet at all, but don’t even use Internet-related technology for connections.  Home control today relies on technologies that aren’t related to Ethernet, IP, or the Internet.  Only the home controller is an Internet device, and this model of connectivity is likely to dominate for a long time to come.  If we insist that all our sensors and controllers be IP devices that are Internet-connected, we’re building a barrier to adoption that will take unnecessary years to jump.

The connected car is another potential trap.  Most of what a connected car will do is offer WiFi to consumer mobile devices that passengers and drivers (the latter, hopefully, not while moving) are using in the vehicle.  Yes, there are other features, but the value proposition is really more like a moving WiFi hotspot than a real IoT mission.  There’s always pressure to pick something that’s actually happening and then broaden your definition of revolutionary technology to envelop it, justifying your hype.  That’s not helpful when there are real questions and issues that are not addressed by the billboard-technology example, but will have to be addressed for the market to develop.

The first positive point from the show is that both network operators and equipment vendors realize that mobile broadband personalization is the only relevant demand driver.  Wireline broadband for both consumers and businesses is really just a matter of wringing as much profit as possible out of something that’s already marginal at best.  If there is new revenue to be had for operators, that revenue is going to come from the exploitation of mobile broadband in both enterprise and consumer markets.

There’s a sad side even to this happiness, though.  For all the fact that the explosion of interest in MWC demonstrates the victory of mobile broadband, or that many who exhibit and probably even more who attend MWC are there for things not directly related to cellular networks, we’re still missing a lot of the key points that justify the mobile focus.

A mobile device is a direct extension of the user, a kind of technological third leg or second head.  It brings the knowledge and entertainment base of the Internet and the power of cloud computing right into the hands of everybody.  The best way to look at IT evolution since the ‘50s is that each new wave brought processing closer to people.  Mobile broadband fuses the two.

Also in my view a positive was the talk from FCC Chairman Ajit Pai, where he said what shouldn’t really have surprised anyone—that the FCC planned a “lighter touch” under the new administration.  The FCC had already taken steps that indicated it would retreat from the very activist position taken by the body under the previous Chairman (Wheeler), but Pai voted against the neutrality ruling and his comments at MWC suggest he has specific moves in mind.  Reinforcing the “lighter touch” was the comment (referencing neutrality) that “It has become evident that the FCC made a mistake.  Our new approach injected tremendous uncertainty into the broadband market. And uncertainty is the enemy of growth.”

Net neutrality is important, insofar as it protects OTT competitors from operators cutting favorable deals with their own subsidiaries.  The current rules, though, were not enough to prevent AT&T from offering outside-data-plan video to its TV customers.  On the other hand, the extension of the rules that Wheeler promoted has made the relationship between subsidiaries and ISPs confusing to say the least, and it’s probably limited willingness of operators to pursue initiatives that would have promoted broadband infrastructure investment.

I have to agree with Pai here.  I think that the FCC in the last term overstepped simple neutrality goals and took a stand on the broadband business that favored one party—the OTTs—over the other, to a degree the FCC had never done before.  A dynamic broadband market—the kind that MWC and 5G propose to support—demands a symbiosis and not an artificial financial boundary.  Through almost my whole consulting career I’ve supported the notion of Internet settlement, and I still support it.  I think it’s time to take some careful, guarded, steps toward trying it out.