The Gap Between NFV Sellers and Buyers and the Three Things Needed to Bridge It

The more things change, the more they stay the same, as the saying goes.  That certainly seems to be true with NFV, based on what I’ve heard over the last couple weeks from both vendors and network operators.  Two years ago, I noted that vendor salespeople were frustrated by the unwillingness of their buyers to transform their businesses by buying NFV technology.  There was clearly a fault in the operators’ thinking, and there were plenty of media articles that agreed that operators had to modernize the way they thought.  Operators have consistently said that they’d be happy to transform if somebody presented them with a business case.  Same today, for both groups.  Take a look at the media this week and you’ll find the same kinds of stories, about “digital mindset” or “breaking through the fog” or how an NFV strategy is only a matter of defining what it’s hosted on.

Five different vendor NFV sales types or sales executives told me this month that buyers were “resisting” the change to a virtual world, or a cloud business model, or something.  I asked each of them what they believed the problem was, and not a single one mentioned an issue with business case, cost/benefit, or anything that would normally be expected to drive a decision at the executive level.  Seven operator strategists in the same period said that vendors were “lagging” in producing an NFV solution that could validate a business case.

I think that the biggest problem here is one of focus.  The operators don’t have an NFV goal at the senior exec level, nor should they.  They do have a goal of getting more engaged in higher-level services and another in reducing the cost of their network connection services, both capex and opex.  While most operators think that NFV can play a role in achieving these goals, the technology that they think would do the most is “carrier cloud.”  They believe that somewhere between a quarter and a half of their total capex should refocus onto hosted functionality, partly to offer valuable new services (all of which are above the connectivity layer of the network) and partly to reduce costs.  They “believe”, but nobody is proving it to them yet.

What would get operators to the grand carrier cloud end-game?  At the CFO or CEO levels, the operators themselves think that the three primary drivers are OTT video, mobile contextual services, and IoT.  At the CTO level, the drivers are said to be NFV and 5G.  A disconnect between CFO/CEO and CTO isn’t uncommon, but what I think is offering some hope is that some operators and some vendors are seeing the disconnect and working to bridge it.  The question is whether their efforts will spread.

AT&T has, with Domain 2.0 and ECOMP, done more than other operators in framing a strategy that is capable of transforming to the carrier cloud.  Interestingly, it does that by creating converging cloud models that target each of the two goals—new higher-level revenue and cost management.  For revenue, AT&T has an aggressive and insightful hybrid cloud strategy that includes the ability to bond AT&T VPNs with cloud services offered by all the major providers (NetBond).  It also has content caching and IoT services.  ECOMP and D2 are aimed at the cost side and at creating a holistic service lifecycle automation process.  They have taken a business-cost-and-opportunity path toward carrier cloud without making it the specific goal, just the convergence of two separately justified evolutions.

Vendors may have taken hold of this same notion of converging approaches, but with less success.  AT&T’s D2 model divides infrastructure into zones and works to prevent vendors from gaining too much control by becoming dominant in too many zones.  That places AT&T in the role of a “benefit integrator”, because no vendor can propose a full solution without gaining too much control.  Most operators haven’t taken this approach and thus are still looking for vendors to connect the dots to benefits.

What I see from vendor discussions is that they’re dodging that mission, for several reasons.  First, it creates a broader sales discussion that takes longer to reach a conclusion and requires more collateralization.  Vendors (salespeople especially) want a quick sale.  Second, most vendors don’t really have the complete answer to the revenue-and-cost-reduction story.  They rely instead on the notion that “standards define the future” and that operators should accept that, which lets the vendors place their offerings in a purely technical framework.  Third, vendors don’t really have a clear picture of the technology framework they’re trying to be a part of.  That’s largely because the standards don’t really draw one.

The central element in carrier cloud is an agile virtual network that, in my own view, almost has to be a form of overlay SDN.  Operators have accepted a very limited number of candidates here: Cisco, Juniper, Nokia, and VMware lead their lists.  Juniper (Contrail) and Nokia (with its Nuage SDN product) just won deals with Vodafone.  Interestingly, the first three of this list have tended to position their SDN assets cautiously to avoid overhanging product sales, and VMware is still promoting NSX more as a data center strategy than as a total virtual network.  Because of cautious positioning, vendors with credentials in the virtual-network space haven’t really promoted a virtual network vision, and the lack of that vision more than anything else muddies up the infrastructure model.  Vendors with no offering in this critical area are at risk of marginalization.

The second thing that’s needed is an updated model for deploying hosted (virtual) functions.  NFV has long focused on OpenStack and VMs, while the industry is migrating to containers, microservices, and even functional (Lambda) programming.  Much of the credible growth in opportunity lies in event-driven applications.  In fact, you can argue that without event-driven applications, there probably isn’t a large enough new-revenue pool to drive more than a quarter to a third of carrier cloud opportunity to fruition.  This is a whole new kind of component relationship, one that Amazon, Google, and Microsoft have all proven (with their functional programming features) to be incompatible with past notions of hosting, even DevOps.

The final thing is that old song, service-wide lifecycle management and automation.  The current practices cost too much, take too long, and tend to create inflexible service-to-resource relationships that limit operators’ ability to respond to market conditions.  This has to be based on a very strong service/application model and it has to be integrated with the other two points without being inflexible with respect to their state of implementation.  Abstraction is a wonderful tool in creating an evolving system even if you’re not totally sure where it’s evolving to or how fast it’s going.

No vendor really has all the pieces here, which of course explains why operators who don’t have their own vision and glue are frustrated and why vendor salespeople are likewise.  I don’t think there’s any chance that 5G standards will develop in a way that helps carrier cloud in general and NFV in particular, at least not before about 2021.  I don’t think NFV standards will evolve to address the key issues here, at least not much faster than that.  So, vendors have to hope that operators converge on their own approach (which will probably commoditize all of the elements of NFV and carrier cloud, working against vendor interest) or move more effectively to promote a solution with the right scope.  Complaining about operators’ lack of insight won’t help.

Nor will complaining about lack of vendor support.  Operators have to decide if they’re willing to sit around and wait for something to be handed to them, or work (perhaps with the ECOMP/Open-O activity) to create a useful model that they can use to guide their own evolution.

Factory Processes and Functional Elements in NFV and IoT: Connecting the Dots

Today I want to take up the remaining issue with edge-centric and functional programming for event processing, both for IoT and NFV.  That issue is control of distributed state and stateless processes.  Barring changes in the landscape, this will be the last of my series of blogs on this topic.

As always, we need to start with an example from the real and familiar world.  Let’s assume that we’re building a car in five different factories, in five different places in the world.  Each of these factories has manufacturing processes that generate local events, and our goal is to end up with a car that can be sold, is safe, and is profitable.  What we have to do is to somehow get those factories to cooperate.

There are a few things that clearly won’t work.  One is to just let all the factories do their own thing.  If that were done you might end up with five different copies of some parts and no copies of others, and there would be little hope that they’d fit.  Thus, we have to have some master plan for production that imposes specific missions on each factory.  The second thing we can’t have is to drive our production line with events that have to move thousands of miles to get to a central event control point, and have a response return.  We could move through several stages of production during the turnaround.  These two issues frame our challenge.

What happens in the real world is that each of our five factories is given a mission (functional requirements) and a timeline (an SLA).  The presumption we have in manufacturing is that every producing point builds stuff according to that combination of things, and every other factory can rely on that.  Within a given factory, the production processes, including how the factory handles events like materials shortages or stoppages in the line, are triggered by local events.  These events are invisible to the central coordination of the master plan; that process is only interested in the mission—the output—of the factories and their meeting the schedule. A broad process is divided into pieces that are individually coordinated and then combined based on a central plan.

If we replace our factory processes by hosted functional processes, meaning Lambdas or microservices, and we replace conditions by specific generated IoT-like events, we have a picture of what has to happen in distributed event processing.  We have to presume that events are part of some system of function creation.  That system has presumptive response times, the total time it takes for an event to be analyzed and a reaction created.  The event-response exchange defines a control loop, whose length is determined by what we’re doing.  Things that happen fast require short control loops, and that means we have to be able to host supporting processes close to where the events are generated.
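To make the control-loop constraint concrete, here is a minimal sketch in Python.  The site names, latency figures, and cost units are invented for illustration, as are the `HostingSite` and `pick_site` names.  The only point is that a shorter loop budget forces the supporting process closer to where the events are generated.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HostingSite:
    name: str
    round_trip_ms: float  # network round trip to the event source (assumed)
    process_ms: float     # time to run the event handler there (assumed)
    cost: float           # relative hosting cost (assumed)

    @property
    def loop_ms(self) -> float:
        # Control loop length: event goes out, is processed, response returns.
        return self.round_trip_ms + self.process_ms


def pick_site(sites: Sequence[HostingSite], budget_ms: float) -> Optional[HostingSite]:
    """Return the cheapest site whose control loop fits the budget, or None."""
    feasible = [s for s in sites if s.loop_ms <= budget_ms]
    return min(feasible, key=lambda s: s.cost) if feasible else None


SITES = [
    HostingSite("edge", round_trip_ms=2, process_ms=5, cost=10),
    HostingSite("metro", round_trip_ms=15, process_ms=4, cost=4),
    HostingSite("central", round_trip_ms=120, process_ms=3, cost=1),
]

for budget in (200, 20, 10, 5):
    site = pick_site(SITES, budget)
    print(budget, site.name if site else "no site fits")
```

With these made-up numbers a relaxed budget lands on the central site, a tighter one moves to the metro site and then the edge, and a budget below the edge’s own loop time cannot be met anywhere.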

In both NFV and IoT we’ve tended to presume that the events generated by functions (including their associated resources) are coupled directly to service-specific processes.  The function of NFV management is presumptively centralized, and if IoT is all about putting sensors on the Internet, then it’s all about having applications that directly engage with events.  If our car-building exercise is an accurate reflection of the NFV/IoT world, this isn’t practical because we either create long control loops to a centralized process or create disconnected functions that don’t add up to stable, profitable, activity.

The path to solution here has been around for a decade; it’s hidden inside a combination of the TMF’s Shared Information and Data model (SID) and the Next-Generation OSS Contract (NGOSS Contract).  SID divides what we’d call a “service” into elements.  Each of these elements could correspond to our “factories” in the auto example.  If there’s a blueprint for a car that shows how the various assemblies like power train, passenger compartment, etc. fit, then there would be a blueprint for how each of these assemblies was constructed.  The “master blueprint” doesn’t need the details of each of these sub-blueprints.  They only need to conform to a common specification.  With a blueprint at any level, we can employ NGOSS Contract principles to steer local events to their associated processes.
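A rough sketch of that steering idea, in Python: the model element carries a state/event table, and the table (not the process code) decides which process runs.  The states, events, and process names here are hypothetical and are not taken from the TMF documents.

```python
class ModelElement:
    """One element of a service model, holding its own state/event table."""

    def __init__(self, name, table, state="ordered"):
        self.name = name
        self.table = table   # (state, event) -> (process, next_state)
        self.state = state
        self.log = []        # record of what was run, for illustration

    def handle(self, event):
        entry = self.table.get((self.state, event))
        if entry is None:
            # No mapping for this event in the current state.
            self.log.append(f"ignored:{event}")
            return
        process, next_state = entry
        process(self, event)
        self.state = next_state


def deploy(element, event):
    element.log.append("deploy")


def repair(element, event):
    element.log.append("repair")


def release(element, event):
    element.log.append("release")


# The "contract": data that steers events to processes.
TABLE = {
    ("ordered", "activate"): (deploy, "active"),
    ("active", "fault"): (repair, "active"),
    ("active", "cancel"): (release, "retired"),
}

power_train = ModelElement("power-train", TABLE)
for ev in ("fault", "activate", "fault", "cancel"):
    power_train.handle(ev)
print(power_train.state, power_train.log)
```

The first fault arrives before activation and is ignored, while the second one triggers the repair process.  The process functions themselves never decide what comes next; that stays in the table.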

What this says is that breaking up services or IoT processes into a hierarchy isn’t just for convenience in modeling deployment, it’s a requirement in making event processing work.  With this model, you don’t have to send events around the world, only through the local process system.  But what, and where, is that local process system?

The answer here is intent modeling.  A local process system is an intent-modeled black-box “factory” that produces something specific (functional behavior) under specific guarantees (an SLA).  Every NFV service or IoT application would be made up of some number of intent models, and hidden inside them would be a state/event engine that linked local events to local processes, with “local” here meaning “within the domain”.  If one of these black boxes has to signal the thing above that uses it, it signals through its own event set.  A factory full of sensors might be aggregated into a single event that reports “factory condition.”
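Here is one way that aggregation could look, as a sketch rather than a design.  The class name, the fault threshold standing in for the SLA, and the event fields are all my own assumptions for the example.

```python
class FactoryIntentModel:
    """Black box: absorbs local sensor events, exposes one condition event."""

    def __init__(self, name, max_faulty, notify):
        self.name = name
        self.max_faulty = max_faulty  # stand-in for the SLA (assumed)
        self._notify = notify         # callback to whatever uses this model
        self._faulty = set()
        self._condition = "normal"

    def on_sensor_event(self, sensor_id, ok):
        # Local handling: track which sensors are currently reporting faults.
        if ok:
            self._faulty.discard(sensor_id)
        else:
            self._faulty.add(sensor_id)
        condition = "normal" if len(self._faulty) <= self.max_faulty else "degraded"
        # Only a change in overall condition is visible outside the box.
        if condition != self._condition:
            self._condition = condition
            self._notify({"source": self.name,
                          "event": "factory-condition",
                          "value": condition})


upstream = []
plant = FactoryIntentModel("pittsburgh", max_faulty=1, notify=upstream.append)
for sensor, ok in [("a", False), ("b", False), ("c", False), ("a", True), ("b", True)]:
    plant.on_sensor_event(sensor, ok)
print(len(upstream), [e["value"] for e in upstream])
```

Five local sensor events produce only two events for the layer above, one when the factory becomes degraded and one when it recovers.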

From this, you can see that not only isn’t it necessary to build a single model of a service or an IoT application that describes everything, it’s not even desirable.  The top-level description should only reference the intent models of the layer below—just like in the OSI Reference Model for network protocols, you never dip into how the layer below does something, only the services it exposes.  Services and applications are composed not from the details of every local event-handling process, but from the functional elements that collect these processes into units of utility.

The “factory” analogy is critical here.  Every element, every intent model, is a factory.  It has its own blueprint for how it does its thing, and nothing outside it has any reason to know or care what that blueprint is.  It should not be exposed because exposing it would let something else reference the “how” rather than the “what”, creating a brittle implementation that any change in technology would break.

This brings us to the “where”, both in a model-topology sense and in a geographic sense.  If what we’re after is a set of utility processes that process local events, then we could in theory define the factories based on geography, or administration, or functionality, or a combination of those things.  We can have multiple factories that produce the same utility process, perhaps in a different way or in a different place.

To make this work, you need to have a standard approach to intent modeling so that a “factory abstraction” at a higher level can map to any suitable “factory instance” below.  That means standardized APIs to communicate the intent and SLA, and a standard way to exchange events/responses.  Strictly speaking you don’t need to standardize what happens inside the factory.  However, if you also standardize the state/event structure that creates the implementation, linking local events to local processes in a standardized way, then every intent model at every level looks the same, and processes that are used in one could also be used in others that required the same behavior.

If a high-level structure, a service or application, needs to reference one of our utility processes, it would represent it as an intent model and leave the decoding to the implementation.  If that structure wanted to specify a specific factory it could, or it could leave the decision on what factory to use (Pittsburgh or Miami, VPN or VLAN) to a lower-level abstraction that might make the selection based on the available technology or the geography of the service.
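The selection step can be sketched in a few lines of Python.  Everything named here (`Factory`, `Intent`, `select_factory`, the latency figures) is an assumption made for the example; the point is only that the service states an intent and either pins a factory or lets a lower level choose one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Factory:
    name: str
    function: str     # the abstraction this factory implements
    technology: str
    region: str
    latency_ms: int   # what it can guarantee (assumed SLA measure)


@dataclass(frozen=True)
class Intent:
    function: str
    max_latency_ms: int
    factory: Optional[str] = None  # set only if the service pins a factory


def select_factory(intent, factories, region):
    """Resolve an intent to a factory instance that can meet it."""
    fits = [f for f in factories
            if f.function == intent.function
            and f.latency_ms <= intent.max_latency_ms]
    if intent.factory is not None:
        # The higher layer named a specific factory.
        fits = [f for f in fits if f.name == intent.factory]
    else:
        # Otherwise the choice is left to geography and available technology.
        fits = [f for f in fits if f.region == region]
    if not fits:
        raise LookupError(f"no factory can realize {intent}")
    return min(fits, key=lambda f: f.latency_ms)


FACTORIES = [
    Factory("pittsburgh-vpn", "connectivity", "VPN", "pittsburgh", 30),
    Factory("pittsburgh-vlan", "connectivity", "VLAN", "pittsburgh", 10),
    Factory("miami-vpn", "connectivity", "VPN", "miami", 25),
]

open_choice = select_factory(Intent("connectivity", 50), FACTORIES, "pittsburgh")
pinned = select_factory(Intent("connectivity", 50, factory="miami-vpn"),
                        FACTORIES, "pittsburgh")
print(open_choice.name, pinned.name)
```

The caller never sees whether it got a VPN or a VLAN unless it asks for a specific factory, which is the whole idea of building to the abstraction.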

If you presume this approach, then every element of a service is an abstraction first and an implementation second.  Higher-layer users see only the abstraction, and all who provide implementations must build to that abstraction as their “product specification”.  There’s no difference whether an abstraction is realized internally or externally, or with legacy or new technology.  There’s no difference, other than perhaps connectivity, optimality, or price, in where it’s implemented.  Within a given functional capability set, you pick factories, or instantiate them, based on optimality.

In the IoT space, there could also be abstractions created based on geography or on functionality.  I used the example of driving a couple blogs back; you could envision traffic-or-street-related IoT as being a series of locales that collected events and offered common route and status services.  A self-drive car or an auto GPS user might exercise a local domain’s services in abstract from a distance, but shift to a lower-level service as they approached the actual geography.  That suggests that you might want to be able to allow an abstraction to offer selective exposure of lower-level abstractions.

It’s harder to lay out a specific structure of what a state/event model might look like for IoT, but I think the easiest way to approach it is to say that IoT is a service that can be decomposed, and that the decomposition process will balance issues like control loop length and geographic hosting efficiency to decide just where to put things, which frames how to abstract them optimally.  However, I think that the goal is always to create a model approach that lets you model an intersection, a route, a city, a country, a fleet of vehicles, or whatever using the same approach, the same tools, the same APIs and event and process conventions.

Even a self-driving car should, in my view, have a model that lives in the vehicle and receives and generates events.  That’s something we’ve not talked about, and I think we’re missing an opportunity.  Such an approach would let you define behavior when the vehicle has no access to IoT sensors outside it, but also how it could integrate the “services” of city, route, and intersection models to create a safe and optimal experience for the passengers.

This raises the very interesting question of whether the vehicle itself, as something capable of being directed and changing speeds, should also be modeled.  A standard model for a vehicle would facilitate open development of autonomous vehicle systems and also cooperative navigation between vehicles and street-and-traffic IoT.  It shouldn’t be difficult; there are only a half-dozen controls a driver can manipulate and they tend to fall into two groups—switch with specific states (like on/off), and “dial” that lets you set a value within a specified range.
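To show how small such a model could be, here is a sketch with the two control types.  The control names, states, and ranges are illustrative guesses on my part, not a proposed standard.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """A control with a fixed set of states."""
    states: tuple
    value: str

    def set(self, value):
        if value not in self.states:
            raise ValueError(f"{value!r} is not one of {self.states}")
        self.value = value


@dataclass
class Dial:
    """A control that takes any value within a range."""
    low: float
    high: float
    value: float

    def set(self, value):
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        self.value = value


class VehicleModel:
    def __init__(self):
        # Control names and ranges are illustrative only.
        self.controls = {
            "ignition": Switch(("off", "on"), "off"),
            "gear": Switch(("park", "reverse", "neutral", "drive"), "park"),
            "steering": Dial(-1.0, 1.0, 0.0),
            "throttle": Dial(0.0, 1.0, 0.0),
            "brake": Dial(0.0, 1.0, 0.0),
        }

    def on_event(self, control, value):
        # A command event, from the driver or from a city/route model.
        self.controls[control].set(value)

    def snapshot(self):
        return {name: c.value for name, c in self.controls.items()}


car = VehicleModel()
car.on_event("ignition", "on")
car.on_event("gear", "drive")
car.on_event("throttle", 0.3)
print(car.snapshot())
```

Because every control validates its own input, anything that drives the model, human or software, is held to the same limits.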

With proper factory support, both IoT and NFV can distribute state/event systems and processes to take advantage of function scaling and healing without risk of losing state control and the ability to correlate the context of local things into a master context.  That combination is essential to get the most from either of these advances—in fact, it may be the key to making the “advances” really advance anything at all.

Google Steps Into Lambdas: What More Proof Do We Need?

I write a lot about things that aren’t mentioned often elsewhere, and that might rightfully make you wonder whether I’m just off in the lunatic fringe.  I did a series of blogs talking about the shift in software in general, and the cloud in particular, to “functional” or “Lambda” programming, and a few of you indicated it was a topic you’d never heard of.  So, was I on the edge or over it, here?  I think the latest news shows which.

Google, finally awakening to the battle with Amazon and Microsoft for cloud supremacy, is making changes to its cloud services.  One of the new features, announced at Google’s Cloud Next event, is extending Google’s “elastic pricing” notion to fixed machine instances, but the rest focus on functional changes.  In one case, literally.

Even the basic innovations in Google’s announcement were indicators of a shift in the cloud market.  One very interesting one is the new Data Loss Protection, which takes advantage of Google’s excellent image analysis software to identify credit card images and block out the number.  There are other security APIs and features as well, and all of these belong to the realm of hosted features that extend basic cloud services (IaaS).  In combination, they prove that basic cloud hosting is not only a commodity, it’s a dead end as far as promoting cloud service growth is concerned.  The cloud of the future is about cloud-based features, used to develop cloud-specific applications.

Which leads us to what I think is the big news, the service Google calls “Cloud Functions”.  This is the same functional/Lambda programming support that Amazon and Microsoft have recently added to their hosted-feature inventory.  Google doesn’t play up the Lambda or functional programming angle; they focus instead on the more popular microservice concept.  A Cloud Function is a simple atomic program that runs on demand wherever it’s useful.

Episodic usage isn’t exactly the norm for business applications, and Google makes it clear (as do Amazon and Microsoft) that the sweet spot for functional cloud is event processing.  When an event happens, the associated Cloud Functions can be run and you get charged for that.  When there’s no event, there’s no charge.

There are a lot of things Google could focus a new competitive drive on, and making Cloud Functions a key element of that drive says a lot about what Google believes will be the future of cloud computing.  That future, I think, could well be built on a model of computing that’s a variant on the popular web-front-end approach now used by most enterprises.  We could call it the “event-front-end” model.

Web front-end application models take the data processing or back-end elements and host them in the data center as usual.  The front-end part, the thing that shows screens and gives users their GUI, is hosted in the cloud as a series of web servers.  Enterprises are generally comfortable with this approach, and while you may not hear a lot about this, the truth is that most enterprise cloud computing commitments are built on these web front-ends.

It seems clear that Amazon, Google, and Microsoft all see the event space as the big driver for enterprise cloud expansion beyond the web front-end model.  The notion of an event front-end is similar in that both events and user GUI needs are external units of work that require an intermediary level of functionality, before they get committed to core business applications.  You don’t want your order entry system serving web pages, only processing orders.  Similarly, an event-driven system is likely to have to do something quickly to address the event, then hand off some work to the traditional application processes.
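A toy version of the event front-end pattern might look like this.  The event types, the reaction names, and the in-memory queue standing in for the hand-off to core applications are all assumptions made for the sketch.

```python
from collections import deque


def event_front_end(event, backlog):
    """Edge-hosted function: react at once, then queue work for the back end."""
    if event["type"] == "door-forced":
        immediate = "sound-alarm"   # the quick reaction
    else:
        immediate = "none"
    # Hand the slower business processing to the traditional application.
    backlog.append({"task": "record-incident", "event": event})
    return immediate


def back_end_drain(backlog):
    """Stand-in for the core application working through queued tasks."""
    done = []
    while backlog:
        done.append(backlog.popleft()["task"])
    return done


EVENTS = [
    {"type": "door-forced", "site": "dock-3"},
    {"type": "temperature", "site": "dock-3", "value": 4},
]

backlog = deque()
reactions = [event_front_end(e, backlog) for e in EVENTS]
queued = len(backlog)
processed = back_end_drain(backlog)
print(reactions, queued, processed)
```

The front end never waits on the back end, which is the same separation that keeps an order entry system from serving web pages.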

I doubt that even Google, certainly geeky enough for all practical purposes, thinks that microservice programming or Lambda programming or any other programming technique is going to suddenly sweep the enterprise into being a consumer of Cloud Functions.  I don’t think they believe that there’s a runaway revenue opportunity converting web front-ends to Cloud Functions either (though obviously user-generated HTTP interactions can be characterized as “events”).  What is happening to drive this is a realization that there’s a big onrushing trend that has been totally misunderstood, and whose realization will drive a lot of cloud computing revenue.  That trend is IoT.

The notion that IoT is just about putting a bunch of sensors and controllers on the Internet is (as I’ve said many times) transcendentally stupid even for our hype-driven, insight-starved, market.  What all technology advances for IT are about is reaping some business benefit, which means processing business tasks more effectively.  Computing has moved through stages in supporting productivity gains (three past ones, to be exact) and in each the result was moving computing closer to the worker.  Moving computing to process business events moves computing not only close to workers, but in many cases moves it ahead of them.  You don’t wait for a request from a worker to do something, you do it in response to the event stimulus that would have (in the past) triggered worker intervention.  Think of it as “functional robotics”; you don’t build robots to displace humans, you simply replace them as the key element in event processing.

This approach, if taken, would offer cloud providers a chance to get themselves into the driver’s seat on the next wave of productivity enhancement, an activity that would generate incremental business benefits (improved productivity) and thus generate new IT spending rather than displacing existing spending.  That would be an easier sell—politically, because there’s no IT pushback caused by loss of influence or jobs, and financially because unlocking improved business operations has more long-term financial value than cutting spending for a year or so.

Event processing demands edge hosting.  Functional programming is most effective as an edge-computing tool, because the closer you get to the edge of the network in any event-driven system, the sparser the events to process are likely to be.  You can’t plan a VM everywhere you think you might eventually find an event.  Amazon recognized that with Greengrass, a way of pushing the function hosting outside the cloud.  I think Google recognizes it too, but remember that Google has edge cache points already and could readily develop more.  I think Google’s cloud will be more distributed than either Amazon’s or Microsoft’s, because Google has designed their network from the first to support edge-distributed functionality.  Its competitors focused on central economies of scale.

The functional/event dynamic is what should be motivating the network operators.  Telcos have a lot of edge real estate to play with in hosting stuff.  The trick has been getting something going that would (in the minds of the CFOs) justify the decision to start building out.  The traditional approach has been that things like NFV would generate the necessary stimulus.  It didn’t develop fast enough or in the right way.  We then have 5G somehow doing the job, but there is really no clear broad edge-hosting mandate in 5G as it exists, and in any case we could well be five years away from meaningful specs in that area.

Amazon, Google, and Microsoft think that edge-hosting of functions for event processing is already worth going after.  Probably they see IoT as the driver.  Operators like IoT, but for the short-sighted reason that they think (largely incorrectly) that it’s going to generate zillions of new customers by making machines into 4/5G consumers.  They should like it for carrier cloud, and what we’re seeing from Google is a clear warning sign that operators are inviting another wave of disintermediation by being so passive on the event opportunity.

Passivity might seem to be in order, if all the big cloud giants are already pushing Lambdas.  Despite the interest from them, all the challenges of event processing through functions/microservices/Lambdas have not been resolved.  Stateless processes are fine, but events are only half the picture of event handling, and the states in state/event descriptions show that the other half isn’t within the realm of the functional processes themselves.  We need to somehow bring states, bring context, to event-handling and that’s something that operators (and the vendors who support them) could still do right, and first.

State/event processing is a long-standing way of making sense out of a sequence of events that can’t be properly interpreted without context.  If you just disabled something, sensors that record its state could be expected to report a problem.  If you’re expecting that something to control a critical process, then having it report a problem is definitely not a good thing.  Same event, different reactions, depending on context.  Since Lambdas are stateless, they can’t be the thing that maintains state.  What does?  This is the big, perhaps paramount, question for event processing in the future.  We need to be able to have distributed state/event processing if we expect to distribute Lambdas to the edge.
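The “same event, different reactions” point is easy to show in code.  In this sketch the handler is a pure function, as a Lambda would be, and the state lives in a separate store.  The state names, actions, and the dictionary standing in for a real distributed store are my assumptions.

```python
def handle_event(state, event):
    """Stateless handler: everything it knows arrives as arguments."""
    if event == "disable":
        return "disabled", ["log"]
    if event == "enable":
        return "in-service", ["log"]
    if event == "fault-report":
        if state == "disabled":
            # We expected this; nothing to do.
            return state, ["ignore"]
        return "failed", ["raise-alarm", "start-repair"]
    return state, ["ignore"]


class ContextStore:
    """Holds state outside the function; a dict stands in for a real store."""

    def __init__(self):
        self._states = {}

    def dispatch(self, device, event):
        state = self._states.get(device, "in-service")
        new_state, actions = handle_event(state, event)
        self._states[device] = new_state
        return actions


store = ContextStore()
store.dispatch("pump-1", "disable")
disabled_reaction = store.dispatch("pump-1", "fault-report")
working_reaction = store.dispatch("pump-2", "fault-report")
print(disabled_reaction, working_reaction)
```

Any copy of `handle_event` anywhere can process the next event, because nothing is remembered inside it.  The open question in the text is where the equivalent of `ContextStore` lives once the functions are spread across the edge.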

I didn’t exaggerate the importance of the Lambda-and-event paradigm in my past blogs.  I’m not exaggerating it now, and I think Google just proved that.  There aren’t going to be any more opportunities for operators to reap IoT and edge-hosting benefits once the current one passes.  This is evolution in action: a shift from predictable workflows to dynamic event-driven systems, and from a connecting economy to a hosting economy.  Evolution doesn’t back up, and both operators and vendors need to remember that.

Applying Edge Programming and Lambdas to OSS/BSS Modernization (and IoT)

Most of you will recall that there has been a persistent goal to make OSS/BSS “event-driven”.  Suppose we were to accept that was the right approach.  Could we then apply some of the edge-computing and IoT principles of software structure and organization of work to the OSS/BSS?  Let’s take a look at what would happen if we did that.

The theoretical baseline for OSS/BSS event-driven modernization is the venerable “NGOSS Contract” notion, which describes how the service contract (modeled based on the TMF SID model) can act as a kind of steering mechanism to link service events to operations/management processes (using, by the way, Service Oriented Architecture or SOA principles).  This concept is a major step forward in thinking about operations evolution, but it’s not been widely adopted, and in many ways it’s incomplete and behind the times.

The most obvious issue with the NGOSS Contract approach is that it doesn’t address where the events come from.  Today, most services are inherently multi-tenant with respect to infrastructure use, which means that a given infrastructure event might involve multiple services, or in some cases none at all.  To make matters worse, most modern networks and all modern data centers have resource-level management and remediation processes that at the least supplement and at most replace service- or application-specific fault and performance management.  The flow of events differs in each of the scenarios these approaches create.

The second problem is SOA.  SOA principles don’t dictate that a given “service”, which is an operations process in our discussion, be stateless, meaning that it doesn’t store information between executions.  It’s the stateless property that lets you horizontally scale components under load or replace them when they break without interfering with operations.  We have software concepts that many believe will supersede (or already have superseded) SOA: microservices and functional (Lambda) programming.  Why would we “modernize” OSS/BSS using software concepts already deprecated?

The third problem with the approach is harder to visualize—it’s distributability.  I don’t mean that the software processes could be hosted anywhere, but that there is a specific architecture that lets operators strike a balance between keeping control loops short for some events, and retaining service-contract-focused control over event steering.  If I have an event in Outer Oshkosh that I want to handle quickly, I can put a process there, but will that distribution of the process then defeat my notion of model-driven event steering?  If I put the contract there, how will I support events in Paris efficiently?  If I put the contract in multiple places, have I lost true central control because service state is now multiply represented?

Reconciling all of this isn’t something that software principles like Lambda programming can fix by themselves.  You have to go to the top of the application ladder, to the overall software architecture and the way that work flows and things are organized.  That really starts with the model that describes a network service or a cloud application as a distributed system of semi-autonomous components.

Outer Oshkosh and Paris, in my example, are administrative domains where we have a combination of raw event sources and processing resources.  Each of these places is making a functional contribution to my service, and thus the first step in creating a unified, modern, event-driven OSS/BSS process is to model services based on functional contributions.  There are natural points of function concentration in any service, created by user endpoints, workflow/traffic, or simply by the fact that there’s a physical facility there to hold things.  These should be recognized in the service model.

The follow-on point to this is that function concentration points that are modeled are also intent-based systems that have states and both process and generate events.  If something is happening in Paris or Outer Oshkosh that demands local event handling, then rather than forcing a central model to record the specifics of that handling, have a “local” model of the function behavior that does that.  A service, then, would have a model element representing each of these functions, and would presumably be defining the event-to-process mappings not inside each of the functions (they’re black boxes) but rather the event-to-process mappings for the events those functions each generate at the service level.

This kind of structure is a bit like the notion of hierarchical management.  You don’t try to run a vast organization from a single central point; you build sub-structures that have their own missions and methods, let each of them fill their roles their own way, and coordinate the results.  This notion illustrates another important point in my example; it’s likely you would have a “US” and an “EU” structure that would be coordinating the smaller function concentrations in those geographies.  In short, you have a hierarchy that sits between the raw event sources and the central model, and each level of that hierarchy absorbs the events below and generates new events that represent collective, unhandled, issues to the stuff above.

Edge processes in this model are essentially event-translators.  They absorb local events and accommodate the need for immediate short-loop reaction, and they maintain functional state as a means of generating appropriate events to higher-level elements.  Thus, HighLevelThing is good if all its IntermediateLevelThings are good, and each of these depends on LowLevelThings.
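A minimal sketch of that roll-up may help make it concrete.  Everything here is an assumption made for illustration: the ModelElement class, its field names, and the rule that an element escalates a summarized event only when it can't fix the problem locally.  It is not drawn from any NGOSS or ETSI specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelElement:
    """One node in a hypothetical hierarchical service model."""
    name: str
    children: List["ModelElement"] = field(default_factory=list)
    parent: Optional["ModelElement"] = field(default=None, repr=False, compare=False)
    locally_good: bool = True                           # state of this element's own function
    reported: List[str] = field(default_factory=list)   # summarized events received from below

    def add(self, child: "ModelElement") -> "ModelElement":
        child.parent = self
        self.children.append(child)
        return child

    def is_good(self) -> bool:
        # An element is good only if it and everything beneath it is good.
        return self.locally_good and all(c.is_good() for c in self.children)

    def handle(self, event: str, can_fix_locally: bool) -> None:
        """Absorb a raw event; pass something upward only if it stays unhandled."""
        if can_fix_locally:
            return  # short control loop: nothing is sent to the level above
        self.locally_good = False
        self._escalate(f"{self.name} degraded ({event})")

    def _escalate(self, summary: str) -> None:
        # Each level above records the summarized event, not the raw one.
        if self.parent is not None:
            self.parent.reported.append(summary)
            self.parent._escalate(summary)
```

With a "Service" element over "US" and "EU" elements, a link flap that Paris repairs on its own never reaches the service level, while an unrepaired host failure in Outer Oshkosh marks both "US" and the service as not good.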

This approach has the interesting property of letting you deploy elements of service lifecycle management to the specific places where events are being generated.  In theory, you could even marshal extra processing resources to accommodate a failure, or to help you expedite the change from one service configuration to another.

The interesting thing about this sort of modeling and event-handling is that it also works with IoT.  Precious little actual thought has gone into IoT; it’s all been hype and self-serving statements from vendors.  The reality of IoT is that there will be little application-to-sensor interaction.  Something like that neither scales nor can provide security and privacy assurance, not to mention being cost-effective.

The real-world IoT will be a series of function communities linked to sensors and using common basic event processing and event generation strategies.  There might be a “Route 95 Near the NJ Bridge” community, for example, which would subscribe to events that are processed somewhere local to that point and refined into new events that relate specifically to traffic conditions at the specified intersection.  This community might in turn be part of larger ones: the “US 95” community and the “NJ Turnpike” and “PA Turnpike” communities.

Function communities in IoT are hierarchical just like they are in network services, and for the same reason.  If you’re planning a trip along the East Coast, you might need to know the overall conditions on Route 95, but you surely don’t need to know them further ahead than your travel timeline dictates.  Such a trip, in IoT terms, is a path through function communities, and as you enter one you become interested in the details of what’s happening (traffic-wise) there, and more interested than before in conditions ahead.  An “event” from the nearby community might relate to what’s happening now, but events from the next community in your path are interesting only if they’re likely to persist for the time you’ll need to get there.
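The persistence rule in that last sentence can be sketched in a few lines.  This is an illustration under stated assumptions: the CommunityEvent type, the fixed travel time per community, and the community names are all hypothetical, and a real system would estimate arrival times far more carefully.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CommunityEvent:
    community: str                 # function community that published the event
    description: str
    expected_duration_min: float   # how long the condition is expected to last


def relevant_events(route: List[str],
                    events: List[CommunityEvent],
                    minutes_per_community: float) -> List[CommunityEvent]:
    """Keep an event only if it should still be in effect on arrival.

    `route` is the ordered list of communities ahead, starting with the one
    the traveler is in now (arrival time zero).
    """
    eta: Dict[str, float] = {
        name: index * minutes_per_community for index, name in enumerate(route)
    }
    return [
        e for e in events
        if e.community in eta and e.expected_duration_min >= eta[e.community]
    ]
```

Events in the current community always pass, because the arrival time there is zero.  Events from communities further along pass only if they outlast the trip to reach them, and events from communities off the route are never seen at all.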

Contrast the I-95 trip approach I’ve described with what would be needed if every driver needed to query sensors along the route.  Just figuring out which ones they needed, and which were real, would be daunting.  The same is true for OSS/BSS or cloud computing or service orchestration.  You need to divide complex systems into subsystems, so that each level in the hierarchy poses reasonable challenges in terms of modeling and execution.

The combination of a hierarchical modeling approach, functional/Lambda programming to create easily-migrated functions, and event-driven processes synchronized by the former and implemented through the latter, gives you an OSS/BSS and IoT approach that could work, and work far better than what we’ve been spinning up to now.  If this could deliver operational efficiencies better, then it’s what we need to be talking about.

Why the Critical Piece of VMware’s NFV 2.0 is the “Network Model” NSX MIGHT Support

I mentioned in my blog yesterday that a network and addressing model was critical to edge computing and NFV.  If that’s true, then could it also be true that having a virtual-network model was critical to vendor success in the NFV space?  The VMware NFV 2.0 announcement may give us an opportunity to test that, and it may also ignite more general interest in the NFV network model overall.

A “network model” in my context is a picture of how connectivity is provided to users and applications using shared infrastructure.  The Internet represents a community network model, one where everyone is available, and while that’s vital for the specific missions of the Internet, it’s somewhere between a nuisance and a menace for other network applications.  VPNs and VLANs are proof that you need to have some control over the network model.

One of the truly significant challenges of virtualization is the need to define an infinitely scalable multi-tenant virtual network model.  Any time you share infrastructure you have to be able to separate those who share from each other, to ensure that you don’t create security/governance issues and that performance of users isn’t impacted by the behavior of other users.  This problem arose in cloud computing, and it was responsible for the Nicira “SDN” model (now VMware’s NSX), an overlay-network technology that lets cloud applications/tenants have their own “private networks” that extend all the way to the virtual components (VMs and now containers).

NFV has a multi-tenant challenge too, but it’s more profound than the one that spawned Nicira/NSX.  VMware’s inclusion of NSX in its NFV 2.0 announcement means it has a chance, perhaps even an obligation, to resolve NFV’s network-model challenges.  That starts with a basic question that’s been largely ignored: “What is a tenant in NFV?”  Is every user a tenant, every service, every combination of the two?  Answer: All of the above, which is why NFV needs a network model so badly.

Let’s start with what an NFV network model should look like.  Say that we have an NFV service hosted in the cloud, offering virtual CPE (vCPE) that includes a firewall virtual function, an encryption virtual function, and a VPN on-ramp virtual function of some sort.  These three functions are “service chained” according to the ETSI ISG’s work, meaning that they are connected through each other in a specific order, with the “inside” function connected to the network service and the “outside” to the user.  All nice and simple, right?  Not so.

You can’t connect something without having a connection service, which you can’t have without a network.  We can presume chaining of virtual functions works if we have a way of addressing the “inside” and “outside” ports of each of these functions and creating a tunnel or link between them.  So we have to have an address for these ports, which means we have an address space.  Let’s assume it’s an IP network and we’re using an IP address space.  We then have an address for Function 1 Input and Output and the same for Functions 2 and 3.  We simply create a tunnel between them (and to the user and network) and we’re done.
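To make the addressing step concrete, here is a small sketch that hands out "outside" and "inside" port addresses for each function in the chain and lists the tunnels that join them.  The build_chain helper, the function names, and the 192.168.10.0/24 subnet are hypothetical; only Python's standard ipaddress module is used.

```python
import ipaddress
from typing import Dict, List, Tuple


def build_chain(subnet_cidr: str, functions: List[str]) -> Tuple[Dict, List[Tuple]]:
    """Assign two port addresses per function and link them in chain order."""
    hosts = iter(ipaddress.ip_network(subnet_cidr).hosts())
    ports: Dict[str, Dict[str, ipaddress.IPv4Address]] = {}
    for name in functions:
        # "outside" faces the user side of the chain, "inside" faces the network side
        ports[name] = {"outside": next(hosts), "inside": next(hosts)}

    # The user attaches to the first function's outside port.
    tunnels: List[Tuple] = [("user", ports[functions[0]]["outside"])]
    # Each function's inside port is tunneled to the next function's outside port.
    for current, following in zip(functions, functions[1:]):
        tunnels.append((ports[current]["inside"], ports[following]["outside"]))
    # The last function's inside port attaches to the network service.
    tunnels.append((ports[functions[-1]]["inside"], "network-service"))
    return ports, tunnels
```

Three functions produce six port addresses and four tunnels: user to firewall, firewall to encryption, encryption to VPN on-ramp, and VPN on-ramp to the network service.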

The problem is that if this is a normal IP address set, it has to be in an address space.  Whose?  If this is a public IP address set, then somebody online could send something (even if it’s only a DDoS packet) to one of the intermediary functions.  So presumably what we’d do is make this a subnet that uses a private IP address space.  Almost everyone has one of these; if you have a home gateway it probably gives your devices addresses in the range 192.168.x.x.  This would keep the function addresses hidden, but you’d have to expose the ports used to connect to the user and the network service to complete the path end to end, so there’s a “gateway router” function that does an address translation for those ports.

Underneath the IP subnet in a practical sense is an Ethernet LAN, and if it’s not an independent VLAN then the functions are still addressable there.  There are limits to the number of Ethernet VLANs you can have, and this is why Nicira/NSX came along in the first place.  With their approach, each of the IP subnets rides independently on top of infrastructure, and you don’t have to segment Ethernet.  So far, then, NSX solves our problems.

But now we come to deploying and managing the VNFs.  We know that we can use OpenStack to deploy VNFs and that we can use Nicira/NSX along with OpenStack’s networking (Neutron) to connect things.  What address space does all this control stuff live in?  We can’t put shared OpenStack into the service’s own address space or it’s insecure.  We can’t put it inside the subnet because it has to build the subnet.  So we have to define some address space for all the deployment elements, all the resources, and that address space has to be immune from attack, so it has to be separated from the normal public IP address space, the service address space, and the Internet.  Presumably it also has to be broad enough to address all the NFV processes of the operator wherever they are, so it’s not an IP subnetwork at all, it’s a VPN.  This isn’t discussed much, but it is within the capabilities of the existing NFV technology.

The next complication is the management side.  To manage our VNFs we have to be able to connect to their management ports.  Those ports are inside our subnet, so could we just provide a gateway translation of those port addresses to the NFV control process address space?  Sure, but if we do that, we have created a pathway where a specific tenant can “talk” into the control network.  We also have to expose resource management interfaces, and the same problem arises.

I think that NSX in VMware’s NFV 2.0 could solve these problems.  There is no reason why an overlay network technology like NSX couldn’t build IP subnets, VPNs, and anything else you’d like without limitations.  We could easily define, using the private Class A IP address space (10.x.x.x), an operator-wide NFV control network.  We could use one of the Class B spaces to define a facility-wide network, and use the Class C networks to host the virtual functions.  We could gateway between these, I think.  What I’d like to see is for VMware to take the supposition out of the picture and draw the diagrams to show how this kind of address structure would work.
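As a rough illustration of that three-tier plan, the sketch below carves the RFC 1918 private ranges into a control network, per-facility networks, and per-service function subnets.  The constant names, the helper functions, and the choice of a /16 per facility and a /24 per service chain are assumptions made for the example, not anything VMware has published.

```python
import ipaddress

# RFC 1918 private ranges, assigned to the three tiers described above.
CONTROL_NET = ipaddress.ip_network("10.0.0.0/8")        # operator-wide NFV control
FACILITY_POOL = ipaddress.ip_network("172.16.0.0/12")   # assumed: one /16 per facility
FUNCTION_POOL = ipaddress.ip_network("192.168.0.0/16")  # assumed: one /24 per service chain


def facility_net(index: int) -> ipaddress.IPv4Network:
    """Return the index-th facility-wide network from the Class B pool."""
    return list(FACILITY_POOL.subnets(new_prefix=16))[index]


def function_subnet(index: int) -> ipaddress.IPv4Network:
    """Return the index-th function-hosting subnet from the Class C pool."""
    return list(FUNCTION_POOL.subnets(new_prefix=24))[index]


def scope_of(address: str) -> str:
    """Say which tier an address belongs to, so a gateway can apply policy."""
    ip = ipaddress.ip_address(address)
    if ip in CONTROL_NET:
        return "control"
    if ip in FACILITY_POOL:
        return "facility"
    if ip in FUNCTION_POOL:
        return "function"
    return "outside-plan"
```

A gateway between tiers could use something like scope_of to decide which conversations are allowed, for example refusing anything that originates in a "function" subnet and targets the "control" network.  With an overlay, the same function subnets could be reused for every tenant, which is part of what would need to be drawn out in the diagrams.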

Why?  The answer is that without this there’s no way we can have a uniform system of deployment and management for NFV, because we can’t tell whether everything can talk to what it needs to, or whether the conversations that should never happen are in fact prevented.  Also, because such a move would start a competitive drive to dig into the whole question of the multi-network map that’s an inherent (but so far invisible) part of not only NFV but also cloud computing and IoT.

Finally, because some competitor is likely to do the right thing here even if VMware doesn’t.  Think Nokia, whose Nuage product is still in my view the best overlay technology out there.  Think HPE, who just did their own next-gen NFV announcement and has perhaps the most to gain (and lose) of any vendor in the space.  This is such a simple, basic, part of any virtualized infrastructure and service architecture that it’s astonishing nobody has talked about it.

Ah, but somebody has thought about it—Google.  And guess who is now starting to work with operators on the elements of a truly useful virtual model for services?   Google just announced a partnership with some mobile operators, and they have the necessary network structure already.  And vendors wonder why they’re falling behind!

Taking a Longer Look at 5G Infrastructure and Services

It seems possible, based on the results of the MWC show, to speculate a bit on what infrastructure and service considerations are likely to arise out of the 5G specs.  “Speculate” is the key word here; I’ve already noted that the show didn’t address the key realities of 5G, IoT, or much of anything else.  I also want to point out that we don’t have firm specifications here, and in my view, don’t even have convincing indicators that all the key issues are going to be addressed in the specs that do develop.  Thus, we can’t say if these “considerations” will be considered, outside this blog and those who respond on LinkedIn or to me directly.

Three things that 5G is supposed to do according to both the operators and what I read as “show consensus” are to support a unified service framework for wireline and wireless, support “network slicing” to separate services and operators who share infrastructure, and allow mobile services to incorporate elements of other connectivity resources, including wireline and satellite.  These three factors seem to frame one vision of the future that’s still not accepted widely—the notion of an explicit overlay/underlay structure for 5G.

Traditional networking is based on two notions: that services are built on layers that abstract a given layer from the details in implementing the layers below, and that within a layer the protocols of the layer define the features of the service.  When you have an IP network, for example, you rely on some Layer 2 and Layer 1 service, but you don’t “see” those layers directly.  You do “see” the features of the IP network in the features of your service.

Overlay/underlay networking is similar to the layered structure of the venerable OSI model, but it extends it a bit.  We have overlay/underlay networking today in “tunnel networks” that build connectivity based on the use of virtual paths or tunnels supported by a protocol like Ethernet or IP, and we now have formalized overlays built using SDN or SD-WAN technology.  Most overlay/underlay networks, in contrast to typical OSI-layer models, don’t rely on any feature of the layer below other than connectivity.  There are no special protocols or features needed.  Also, overlay/underlay networking has from the first been designed to allow multiple parallel overlays on a single underlay; most OSI-modeled networks have a 1:1 relationship between L2 and L3 protocols.

In a 5G model, the presumption of overlay/underlay services would be that there would be some (probably consistent) specification for an overlay, both in terms of its protocols and features.  This specification would be used to define all of the “service networks” that wireline and wireless services currently offer, and so the overlay/underlay framework would (with one proviso I’ll get to) support any “service network” over any infrastructure.  That satisfies the first of our three points.

The second point is also easily satisfied, because multiple parallel overlay networks are exactly what network slicing would demand.  If we expanded the “services” of the underlay network to include some class-of-service selectivity, the overlays could be customized to the QoS needs of the services they represent in turn.
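A toy model of that arrangement, with several overlays sharing one underlay and each asking for its own class of service, might look like the following.  The Underlay and Overlay types, the build_slice helper, and the full-mesh tunnel assumption are all hypothetical; nothing here comes from a 5G specification.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Underlay:
    name: str
    classes_of_service: FrozenSet[str]   # the only "features" the underlay exposes


@dataclass
class Overlay:
    name: str
    required_cos: str
    tunnels: List[Tuple[str, str, str]] = field(default_factory=list)


def build_slice(overlay: Overlay, underlay: Underlay, edges: List[str]) -> Overlay:
    """Create one slice as a full mesh of tunnels between edge points."""
    if overlay.required_cos not in underlay.classes_of_service:
        raise ValueError(
            f"{underlay.name} offers no '{overlay.required_cos}' class of service"
        )
    # Each tunnel is (edge A, edge B, class of service requested from the underlay).
    overlay.tunnels = [(a, b, overlay.required_cos) for a, b in combinations(edges, 2)]
    return overlay
```

Any number of overlays can be built over the same Underlay object, and each one sees only connectivity plus the class of service it asked for.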

In both SD-WAN and SDN overlays, the connectivity of the overlay is managed independent of the underlay; the OSI model tends to slice across layer boundaries or partition the devices to create overlay/underlay connectivity.  In most SD-WAN applications the presumption is that the edge devices (where the user is attached) terminate a mesh of tunnels that create connectivity.  In SDN, there may be a provision for intermediary steering, meaning that an endpoint might terminate some tunnels and continue others.  For proper 5G support, we need to review these options in the light of another element, which is explicit network-to-network interconnect.

Most protocols have some mechanism for NNI, but these are usually based on creating a connection between those singular top-of-the-stack OSI protocols.  In overlay/underlay networks, an NNI element lives at the overlay level, and simply connects across what might be a uni-protocol (same protocol for the underlay) or a multi-protocol (a different underlay on each side) border.  Alternatively, you could have an underlay gateway that binds the two networks together and harmonizes connectivity and QoS, and this could allow the overlay layer to treat the two as the same network.

The border concept could also describe how an underlay interconnect would be shared by multiple overlays, and that concept could be used to describe how a fiber trunk, satellite link, or other “virtual wire” would be represented in an overlay/underlay structure and how it could be used by multiple services.  On- and off-ramps to links like this are a form of gateway, after all.

The question that’s yet to be addressed here is the role that virtual function hosting might play.  There’s nothing explicitly in 5G discussions to mandate NFV beyond hopefulness.  On the other hand, the existence of an overlay technology could well create the beginning of an NFV justification, or at least a justification for cloud-hosting of these overlay components rather than dedicating devices to that role.  An overlay network should be more agile than the underlay(s) that support it.  That agility could take the form of having nodes appear and disappear at will, based on changes in traffic or connectivity, and also in response to changes in the state of the underlay network.  Virtual nodes fit well into the overlay model, even NFV-hosted virtual nodes.

Beyond that it’s harder to say, not because hosting more features isn’t beneficial but because hosting alone doesn’t justify NFV.  NFV was, from the first, fairly specialized in terms of its mission.  A “virtual network function” is a physical network function disembodied.  There really aren’t that many truly valuable physical network functions beyond nodal behavior.  Yes, you can hypothesize things like virtual firewalls and NATs, but you can get features like that for a few bucks at the local Staples or Office Depot, at least for the broad market.  Moving outside nodal (connectivity-routing) features to find value quickly takes you outside the realm of network functions and into application components.  Is a web server a network function, or a mail server?  Not in my view.

From the perspective of 5G and IoT, though, the requirements for hosting virtual functions or hosting cloud processes are very similar; there is a significant connectivity dimension.  We have done very little work in the NFV space to frame what network model is required to support the kind of function-hosting-and-management role needed.  The work that’s been done in the cloud space has focused on a pure IP-subnet model that’s too simple to address all the issues of multi-tenant functions that have to be securely managed as well.  In fact, the issue of addressing and address management is probably the largest issue to be covered, even in the overlay/underlay model.  If operators and vendors are serious about 5G then they need to get serious about this issue too.

What Would Edge-Hosting Mean to Infrastructure and Software Design?

If computing in general and carrier cloud in particular is going to become more edge-focused over time, then it’s time to ask just what infrastructure features will be favored by the move.  Even today we see variations in server architecture and the balance of compute and I/O support needed.  It is very likely that there will be even more variations emerging as a host of applications compete to dominate the cloud’s infrastructure needs.  What are the factors, and what will the result be?  I’m going to have to ask you to bear with me, because understanding the very important issues here means going way beyond 140-character Tweets.

It’s always a challenge to try to predict how something that’s not even started will turn out in the long term.  Carrier cloud today is almost pre-infancy; nearly all carrier IT spending is dedicated to traditional OSS/BSS, and what little is really cloud-building or even cloud-ready is so small that it’s not likely representative of broader, later, commitment.  Fortunately, we have some insight we can draw from the IT world, insight that’s particularly relevant given the fact that things like microservices are already a major driver of change in IT, and are of increasing interest in the carrier cloud.  To get to these insights we need to look a bit at the leading edge of cloud software development.

Microservices are in many ways a kind of bridge between traditional componentized applications (including those based on the Service Oriented Architecture of almost two decades ago) and the “bleeding edge” of computing architecture, the functional programming or Lambda function wave.  A Lambda function is a software element that processes an input and produces an output without relying on the storage of internal pieces—it has a single function regardless of the context of its use.  What makes this nice is that because nothing is ever saved inside a Lambda function, you can give a piece of work to any copy of the function and get exactly the same result.  I’m going to talk a lot about the Lambda functions in this blog, so to save typing I’m going to call them “Lambdas” with apologies to the people who use the term (without capitalizing) to mean “wavelength”.

In the broader development context, this kind of behavior is known as “stateless” behavior, because there are no “states” or differences in function outcome depending on the sequence of events or messages being processed.  Stateless behavior is mandatory for Lambdas, and also highly recommended if not mandated for microservices.  Stateless stuff is great because you can replace it, scale it, or use any convenient element of it and there’s no impact, no cross-talk.  The downside is that many processes aren’t stateless at all; think of taking money out of the bank if you need an easy example.  What you have left depends on what you’ve put in or taken out before.
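The bank example is easy to show in code.  This is only a sketch, and the withdraw function and Account class are made up for the illustration.

```python
def withdraw(balance: int, amount: int) -> int:
    """Stateless: the result depends only on the two inputs."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


class Account:
    """Stateful: the result depends on every call that came before."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> int:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount   # remembered between calls
        return self.balance
```

Any copy of the stateless function gives the same answer for the same inputs, which is why it can be scaled or replaced freely.  Two copies of an Account object would disagree as soon as one of them handled a withdrawal the other didn't see.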

The reason for this little definitional exercise is that both Amazon and Microsoft have promoted Lambda programming as a pathway to event-driven IT, and the same is being proposed for microservices.  In Amazon’s case, they linked it with distributing functions out of the cloud and into an edge element (Greengrass).  Event-driven can mean a lot of things, but it’s an almost-automatic requirement for what are called “control loop” applications, where something is reported and the report triggers a process to handle it.  IoT is clearly a control-loop application, but there are others even today, which is why Amazon and Microsoft have focused on cloud support for Lambda functions.  You can write a little piece of logic to do something and just fire it off into the network somewhere it can meet the events it supports.  You don’t commit machine image resources or anything else.

If IoT and carrier cloud will focus on being event-driven, it follows they would likely become at least Lambda-like, based on stateless microservices that are pushed toward the edge to shorten the control loop while traditional transactional processes stay deeper in the compute structure.  Applications, then, could be visualized as a cloud of Lambdas floating around, supporting collectively a smaller number of stateful repository-oriented central applications.  The latter will almost surely look like any large server complex dedicated to online transaction processing (OLTP).  What about the former?

The Lambda vision is one of functional units that have no specific place to live, remember.  It’s a vision of migration of capabilities to assemble them along the natural path of work, at a place that’s consistent with their mission.  If they’re to be used in event-handling, this process of marshaling Lambdas can’t take too long, which means that you’d probably have a special system that’s analyzing Lambda demand and caching them, almost like video is cached today.  You’d probably not want to send a Lambda somewhere as much as either have it ready or load it quickly from a local resource.  Once it’s where it needs to be, it’s simply used when the appropriate event shows up.
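One way to picture that staging system is as a small cache of functions at each hosting point.  The sketch below assumes a least-recently-used eviction policy, which is a stand-in chosen for simplicity rather than anything a cloud provider documents; the LambdaCache class and the REPOSITORY of sample functions are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict


class LambdaCache:
    """Keeps a limited number of functions resident at one hosting point."""

    def __init__(self, capacity: int, loader: Callable[[str], Callable]) -> None:
        self.capacity = capacity
        self.loader = loader                 # fetches a function from a local store
        self.resident: "OrderedDict[str, Callable]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> Callable:
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return self.resident[name]
        self.misses += 1
        function = self.loader(name)         # the slower path: load on demand
        self.resident[name] = function
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used
        return function


# A stand-in for the local store that functions are loaded from.
REPOSITORY: Dict[str, Callable] = {
    "to_celsius": lambda f: (f - 32) * 5 / 9,
    "is_alarm": lambda reading: reading > 100,
    "double": lambda x: 2 * x,
}
```

A demand-analysis process would replace the simple eviction rule here by deciding in advance which functions to stage, much as a CDN decides which video to hold near the viewer.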

This should make it obvious that running a bunch of Lambdas is different from running applications.  You don’t need a lot of disk I/O for most such missions, unless the storage is for non-volatile referential data rather than a dynamic database.  What you really want is powerful compute capabilities, a lot of RAM capacity to hold functions-in-waiting, and probably flash disk storage so you can quickly insert a function that you need, but hadn’t staged for use.  Network I/O would be very valuable too, because restrictions on network capacity would limit your ability to steer events to a convenient Lambda location.

How Lambda and application hosting balance each other, requirements-wise, depends on how far you are from the edge.  At the very edge, the network is more personalized and so the opportunity to host “general-use Lambdas” is limited.  As you go deeper, the natural convergence of network routes along physical facilities generates places where traffic combines and Lambda missions could reasonably be expected to be shared across multiple users.

This builds a model of “networking” that is very different from what we have now, perhaps more like that of a CDN than like that of the Internet.  We have a request for event-processing, which is an implied request for a Lambda stream.  We wouldn’t direct the request to a fixed point (any more than we direct a video request that way), but would rather assign it to the on-ramp of a pathway along which we had (or could easily have) the right Lambdas assembled.

I noted earlier in this blog that there were similarities between Lambdas and microservices.  My last paragraph shows that there is also at least one difference, at least in popular usage, between Lambdas and microservices.  The general model for microservices is based on extending componentization and facilitating the use of common functions in program design.  A set of services, as independent components, support a set of applications.  Fully exploiting the Lambda concept would mean that there really isn’t a “program” to design at all.  Instead there’s a kind of ongoing formula that’s developed based on the source of an event, its ultimate destination, and perhaps the recent process steps taken by other Lambdas.  This model is the ultimate in event-driven behavior, and thus the ultimate in distributed computing and edge computing.

There’s another difference between microservices and Lambdas, more subtle and perhaps not always accepted by proponents of the technologies.  Both are preferred to be “stateless” as I noted, but in microservices it’s acceptable to use “back-end” state control to remove state/context from the microservices themselves.  With Lambdas, this is deprecated because in theory different copies of the same Lambdas might try to alter state at the same time.  It would be better for “state” or context to be carried as a token along with the request.

We don’t yet really have a framework to describe it, though.  Here’s an event, pushed out by some unspecified endpoint.  In traditional programming, something is looking for it, or it’s being posted somewhere explicitly.  Maybe it’s publish-and-subscribe.  However, in a pure Lambda model, something Out There is pushing Lambdas out along the path of the event.  What path is that?  How does the Something know what Lambdas are needed or where to put them?

If you applied the concepts of state/event programming to Lambda control, you could say that when an event appears it is associated with some number of state/event tables, tables that represent contexts that need to process that event.  The movement of the event through Lambdas could be represented as the changing of states.  Instead of the traditional notion of an event arriving at a process via a state/event table, we have a process arriving at the event for the same reason.  But it’s still necessary to know what process is supposed to arrive.  Does the process now handling an event use “state” information that’s appended to it and identify the next process down the line?  If so, how does the current process know where the next one has been dispatched, and how does the dispatcher know to anticipate the need?  You can see this needs a lot of new thinking.
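A conventional state/event table, with the state carried along as a token rather than stored inside any process, can be sketched as follows.  It shows only the familiar half of the problem (an event finding its process), not the open question of how processes get dispatched toward events.  The states, event types, and handler names are invented for the illustration.

```python
from typing import Callable, Dict, Tuple

Context = Dict[str, str]   # the state token that travels with the request


def start_activation(ctx: Context, payload: Dict[str, str]) -> Context:
    return {**ctx, "state": "activating", "site": payload["site"]}


def confirm_active(ctx: Context, payload: Dict[str, str]) -> Context:
    return {**ctx, "state": "active"}


def begin_repair(ctx: Context, payload: Dict[str, str]) -> Context:
    return {**ctx, "state": "repairing", "fault": payload["fault"]}


# (current state, event type) -> stateless process to run
STATE_EVENT_TABLE: Dict[Tuple[str, str], Callable[[Context, Dict[str, str]], Context]] = {
    ("ordered", "activate"): start_activation,
    ("activating", "ready"): confirm_active,
    ("active", "fault"): begin_repair,
}


def dispatch(ctx: Context, event_type: str, payload: Dict[str, str]) -> Context:
    """Pick the process from the table; return a new token, never mutate the old."""
    handler = STATE_EVENT_TABLE.get((ctx["state"], event_type))
    if handler is None:
        return ctx   # the event means nothing in this state, so it is ignored
    return handler(ctx, payload)
```

Because each handler takes the token in and hands a new one back, any copy of a handler anywhere can process the next event, and the token's "state" entry is what identifies which process should come next.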

IoT will really need this kind of edge-focused, Lambda-migrating, thinking.  Even making OSS/BSS “event-driven” could benefit from it.  Right now, as far as I can see, all the good work is being done in abstract with functional programming, or behind the scenes of web-focused, cloud-hosted startups who probably have stimulated both Amazon and Microsoft to offer Lambda capabilities in their clouds.  It will be hard to make IoT the first real use case for this—it’s a very big bite—but maybe that’s what has to happen.

A Slightly Early MWC Retrospective

The iconic MWC conference is now pretty much history.  The big announcements have been made, the attendees have largely exhausted themselves (the exhibitors certainly have!), and it’s time to take stock and decide whether anything important was really said and shown.  In terms of point announcements, it’s rare for something huge to come out at an event like MWC—too much crosstalk.  The buzz of the show is another matter; we can pick out some important points by looking across all the announcements and demonstrations to detect shifts and non-shifts.

The most important thing that I take away from MWC is that there is an enormous gap between 5G expectation and the current state of the technology.  The goal of 5G is service- and infrastructure-shaking, and the reality of 5G at the moment struggles to be a major shift in the RAN.  Part of the reason for this gap is the (usual) slow progress of the specifications, but another part is the fact that standards groups have a habit of grabbing the low apples or focusing on the most visible questions.

5G RAN improvements are important, but operators I talk with have consistently said that their biggest priority was to standardize the metro and access models for wireless and wireline, and to support wireless 5G extensions of fiber networks.  Without these capabilities, many operators said that it would be difficult to justify 5G versus enhanced 4G.  Ironically, the early “5G trials” have all focused on RAN and on modest adjustments to 4G, like supporting 5G frequencies, to “prove out” the technology.  Some operators have been public in their rejection of this approach, but that’s what’s been happening.

One public approach to pre-standard 5G even retains the Evolved Packet Core, which most operators told me was something that they wanted (as a number-one or number-two priority) to eliminate.  Clearly the focus of many 5G proponents is to move the process ahead even if there’s less utility in what’s produced.  That also was a criticism that’s been made in public.

The next point is that we have not yet abandoned our short-sighted and stupid vision of IoT as being all about wireless connections.  There were plenty of examples of this, but two figured particularly prominently in the overall stream of hype.  The first is a broadening of the notion that this is all about RF, which makes IoT all about connections.  The second is the almost hypnotic attraction to “connected car” as the prototypical IoT application.

I’m almost tired of saying that getting devices connected is the least of our IoT worries, but it is.  The majority of IoT applications will almost certainly use devices that not only aren’t directly on the Internet at all, but don’t even use Internet-related technology for connections.  Home control today relies on technologies that aren’t related to Ethernet, IP, or the Internet.  Only the home controller is an Internet device, and this model of connectivity is likely to dominate for a long time to come.  If we insist that all our sensors and controllers be IP devices that are Internet-connected, we’re building a barrier to adoption that will take unnecessary years to jump.

The connected car is another potential trap.  Most of what a connected car will do is offer WiFi to consumer mobile devices that passengers and drivers (the latter, hopefully, not while moving) are using in the vehicle.  Yes, there are other features, but the value proposition is really more like a moving WiFi hotspot than a real IoT mission.  There’s always pressure to pick something that’s actually happening and then broaden your definition of revolutionary technology to envelop it, justifying your hype.  That’s not helpful when there are real questions and issues that are not addressed by the billboard-technology example, but will have to be addressed for the market to develop.

The first positive point from the show is that both network operators and equipment vendors realize that mobile broadband personalization is the only relevant demand driver.  Wireline broadband for both consumers and businesses is really just a matter of wringing as much profit as possible out of something that’s already marginal at best.  If there is new revenue to be had for operators, that revenue is going to come from the exploitation of mobile broadband in both enterprise and consumer markets.

There’s a sad side even to this happiness, though.  Even though the explosion of interest in MWC demonstrates the victory of mobile broadband, and even though many who exhibit and probably even more who attend MWC are there for things not directly related to cellular networks, we’re still missing a lot of the key points that justify the mobile focus.

A mobile device is a direct extension of the user, a kind of technological third leg or second head.  It brings the knowledge and entertainment base of the Internet and the power of cloud computing right into the hands of everybody.  The best way to look at IT evolution since the ‘50s is that each new wave brought processing closer to people.  Mobile broadband fuses the two.

Also in my view a positive was the talk from FCC Chairman Ajit Pai, where he said what shouldn’t really have surprised anyone—that the FCC planned a “lighter touch” under the new administration.  The FCC had already taken steps that indicated it would retreat from the very activist position taken by the body under the previous Chairman (Wheeler), but Pai voted against the neutrality ruling and his comments at MWC suggest he has specific moves in mind.  Reinforcing the “lighter touch” was the comment (referencing neutrality) that “It has become evident that the FCC made a mistake.  Our new approach injected tremendous uncertainty into the broadband market. And uncertainty is the enemy of growth.”

Net neutrality is important, insofar as it protects OTT competitors from operators cutting favorable deals with their own subsidiaries.  The current rules, though, were not enough to prevent AT&T from offering outside-data-plan video to its TV customers.  On the other hand, the extension of the rules that Wheeler promoted has made the relationship between subsidiaries and ISPs confusing to say the least, and it has probably limited the willingness of operators to pursue initiatives that would have promoted broadband infrastructure investment.

I have to agree with Pai here.  I think that the FCC in the last term overstepped simple neutrality goals and took a stand on the broadband business that favored one party—the OTTs—over the other, to a degree the FCC had never done before.  A dynamic broadband market—the kind that MWC and 5G propose to support—demands a symbiosis and not an artificial financial boundary.  Through almost my whole consulting career I’ve supported the notion of Internet settlement, and I still support it.  I think it’s time to take some careful, guarded, steps toward trying it out.

How Could We Accelerate the Pace of New Edge-Deployed Data Centers?

There should be no question that I am a big fan of edge computing, and so I’m happy that Equinix is pushing storage to the edge (according to one story yesterday) or that Vapor IO supports micro-data-centers at the wireless edge.  I just wish we had more demand focus to explain our interest in the supply.  There are plenty of things happening that might drive a legitimate migration of hosting/storage to the network edge, and I can’t help but feel we’d do a better job with deployment if there were a specific business case behind the talk.

Carrier cloud is the definitive network-distributed IT model, and one of the most significant questions for server vendors who have aspirations there is just how the early carrier cloud data centers will be distributed.  A central model of hosting, even a metro-central one, would tend to delay the deployment of new servers.  An edge-centric hosting model would build a lot of places where servers could be added, and encourage operators to quickly support any missions where edge hosting offered a benefit.  So, in the balance, where are we with this?

Where you host something is a balance between economy of scale, economy of transmission, and propagation issues.  Everyone knows that a pool of servers offers lower unit cost than single servers do, and most also know that the cost per bit of transmission tends to fall to a point, then rise again as you pass the current level of economical fiber transport.  Most everyone knows that traversing a network introduces both latency (delay) and packet loss that grows with the distance involved.  The question is how these things combine to impact specific features or applications.

Latency is an exercise in physics: the speed of light and electrons, plus the delay introduced by queuing and handling in network devices.  The only way to reduce it is to shorten the path, which means pushing stuff to the edge.  Arguably, the only reason to edge-host something is because of latency (though we’ll explore that point later), and most applications run today aren’t enormously latency-sensitive.  M2M telemetry and control applications, which involve handling an event and sending a response, are often critically sensitive to latency.

That means that IoT is probably the obvious reason to think about edge-hosting something.  The example of self-driving cars is trite here, but effective.  You can imagine what would happen if a vehicle was controlled by something half-a-continent away.  You can easily get a half-second control loop, which would mean almost fifty feet of travel at highway speed.
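
To make the arithmetic concrete, here is a minimal sketch of the distance a vehicle covers while it waits on one control loop.  The 65 mph value for “highway speed” and the function name are my own assumptions for illustration; the half-second loop is the figure cited above.

```python
# Feet per second for each mile per hour (5280 ft per mile, 3600 s per hour).
MPH_TO_FPS = 5280 / 3600


def travel_during_loop(speed_mph: float, loop_seconds: float) -> float:
    """Return the feet a vehicle covers during one control-loop round trip."""
    return speed_mph * MPH_TO_FPS * loop_seconds


HIGHWAY_MPH = 65  # assumed highway speed; not a figure from the text

# Compare a distant controller with progressively shorter control loops.
for loop in (0.5, 0.1, 0.01):
    feet = travel_during_loop(HIGHWAY_MPH, loop)
    print(f"{loop:5.2f} s loop -> {feet:5.1f} ft traveled")
```

At the assumed 65 mph, the half-second loop works out to a bit under 48 feet, which is consistent with the “almost fifty feet” above; a 10-millisecond loop cuts that to about a foot.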

Responses to human queries, particularly voice-driven personal assistants, are also delay sensitive.  I saw a test run a couple years ago that demonstrated that people got frustrated if their queries were delayed more than about two seconds, resulting in their repeating the question and creating a context disconnect with the system.  Since you have to factor in actual “think time” to a response, a short control loop would be helpful here, but you can accommodate longer delay by having your assistant say “Working on that….”

Content delivery in any form is an example of an application where latency per se isn’t a huge issue, but it raises another important point—resource consumption or “economy of transmission”.  If you were to serve (as a decades-old commercial suggested you could) all the movies ever made from a single point, the problem you’d hit is that multiple views of the same movie would quickly explode demands for capacity.  You also expose the stream to extreme variability in network performance and packet loss, which can destroy QoE.  Caching in content delivery networks is a response to both these factors, and CDNs represent the most common example of “edge hosting” we see today.

Let’s look at the reason we have CDNs to explore the broader question of edge-hosting economies versus more centralized hosting.  Most user video viewing hits a relatively contained universe of titles, for a variety of reasons.  The cost of storing these titles in multiple places close to the network edge, thus reducing network resource consumption and the risk of performance issues, is minimal.  What makes it so is the fact that so many content views hit a small universe of content.  If you imagine for a moment that every user watched their own unique movie, you’d see that content caching would quickly become unwieldy.  Unique resources, then, are better candidates for “deep hosting” if all other traffic and scale economies are equal.
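
The effect of a concentrated viewing pattern can be sketched with a toy model.  This assumes title popularity follows a Zipf-like distribution, which is a common modeling assumption and not something measured here; the catalog size, the exponent, and the function names are all hypothetical.

```python
def zipf_weights(n_titles: int, exponent: float = 1.0) -> list[float]:
    """Return view probabilities where the title at rank r gets weight 1/r**exponent."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, n_titles + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def hit_ratio(weights: list[float], cache_size: int) -> float:
    """Share of views served by a cache holding the cache_size most popular titles."""
    return sum(sorted(weights, reverse=True)[:cache_size])


CATALOG = 10_000  # hypothetical catalog size

skewed = zipf_weights(CATALOG)            # a few titles draw most of the views
uniform = [1.0 / CATALOG] * CATALOG       # every title is equally (un)popular

for size in (100, 1_000):
    print(f"cache {size:5d} titles: "
          f"skewed {hit_ratio(skewed, size):.0%}, "
          f"uniform {hit_ratio(uniform, size):.0%}")
```

Under these assumptions, caching just 1% of the catalog serves more than half the views when popularity is skewed, but only 1% of them when every title is equally popular, which is the “unique movie” case in miniature.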

That brings us to scale.  I’ve mentioned in many blogs that economies of scale don’t follow an exponential or even linear curve, but an Erlang C curve.  That means that when you get to a certain size data center, further efficiency gains from additional servers are minimal.  For an average collection of applications I modeled for a client, you reached 95% optimality at about 800 servers, and there are conditions under which fewer than half that number would achieve 90% efficiency.  That means that supersized cloud data centers aren’t necessary.  Given that, my models have always said that by around 2023, operators would have reached the point where there was little benefit to augmenting centralized data centers, and would move to edge hosting.  The biggest growth in new data centers occurs in the model between 2030 and 2035, where the number literally doubles.  If I were a vendor, I would want to accelerate that shift to the edge.
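
The shape of that curve can be illustrated with the standard Erlang C formula.  This is a generic queuing sketch and not the client model mentioned above: it treats each server as one service channel, and the 5% probability-of-waiting target is an assumption of mine, so the numbers it prints won’t match the 800-server figure.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Erlang B blocking probability, computed with the usual stable recursion."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


def erlang_c(servers: int, offered_load: float) -> float:
    """Erlang C probability that arriving work must wait (load in Erlangs)."""
    if offered_load >= servers:
        return 1.0  # overloaded pool: everything queues
    b = erlang_b(servers, offered_load)
    return servers * b / (servers - offered_load * (1.0 - b))


def max_utilization(servers: int, wait_target: float = 0.05) -> float:
    """Highest per-server utilization that keeps P(wait) at or below wait_target."""
    lo, hi = 0.0, float(servers)
    for _ in range(60):  # bisection; Erlang C rises monotonically with load
        mid = (lo + hi) / 2
        if erlang_c(servers, mid) <= wait_target:
            lo = mid
        else:
            hi = mid
    return lo / servers


for n in (10, 50, 100, 200, 400, 800):
    print(f"{n:4d} servers -> usable utilization {max_utilization(n):.1%}")
```

The point of the sketch is the flattening: usable utilization climbs steeply as a small pool grows, then gains very little from each doubling once the pool is in the hundreds of servers.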

Centralization of resources is necessary for unique resources.  Edge hosting is necessary where short control loops are essential to application performance.  If you model the processes, you find that up to about 2020, carrier cloud is driven more by caching and video/ad considerations than anything else, and that tends to encourage a migration of processing toward the edge.  From 2020 to about 2023, enhanced mobile service features begin to introduce more data center applications that are naturally central or metro-scoped, and beyond 2023 you have things like IoT that magnify the need for edge hosting again.

Video, then, is seeding the initial data center edge locations for operators.  Metro-focused applications will probably use a mixture of space in these existing edge hosting points and new and more metro-central resources.  The natural explosion in the number of data centers will occur when the newer short-control-loop stuff emerges, perhaps four to five years from now.  It would be hard to advance something like this; the market change is pretty profound.

Presuming this is all true, then current emphasis on caching of data is smart and edge hosting of processing features may be premature.  What could accelerate the need for edge hosting?  This is where something like NFV could be a game-changer, providing a mission for edge-centric hosting before broad-market changes in M2M and IoT emerge and building many more early data centers.  If you were to double the influence of NFV, for example, in the period up to 2020, you would add a thousand edge data centers worldwide.

NFV will never drive carrier cloud, but what it could do is to promote edge-placement of many more data centers between now and 2020, shifting the balance of function distribution in the 2020-2023 period toward the edge, simply because the resources are there.  That could accelerate the number of hosting points (and slightly increase the number of servers) through 2025, and it would be a big windfall for vendors.

IT vendors looking at the carrier cloud market should examine the question of how this early NFV success could be motivated by specific benefits, and what specific steps in standardization, operationalization, or whatever might be essential in supporting that motivation.  There are few applications that could add as much to the data center count, realistically, in the next three years.

Are You in the Mood for Indigo? AT&T’s New Concept Could Change Your Mind!

When you have an architecture that set the standard for NFV, what do you do for an encore?  AT&T’s answer to that question is “Network 3.0 Indigo” or in short terms, just “Indigo”.  It’s another of those huge concepts that’s difficult to describe or to understand, and its sheer scope is certain to create healthy skepticism on whether AT&T can meet the goals.  Whatever happens in realization, though, Indigo is profoundly important because it frames operators’ views of the future exceptionally well.

Operators have consistently been telling me that their biggest problem with technology initiatives, from SDN and NFV to 5G, is that they seem to be presented as justified for their own sake.  What operators need is a business goal that can be met, an opportunity addressed, and in the complex world of networking most technologies proposed lack the critical property of scope.  They just don’t do the job by themselves, which is why integration is becoming such an issue.  AT&T advanced NFV with ECOMP by incorporating more into it, and they hope to do even more with Indigo.

Let’s start with a quote from AT&T’s Indigo vision statement: “The network of the future will be more than just another “G”, moving from 2G to 3G to 4G and beyond.  It’s about bundling all the network services and capabilities into a constantly evolving and improving platform powered by data. This is about bringing software defined networking and its orchestration capabilities together with big data and an emerging technology called microservices, where small, discrete, reusable capabilities can team up as needed to perform a task. And, yes, it’s about so-called ‘access’ technologies like 5G and our recently-announced Project AirGig. Put all that together, and you have a new way to think about the network.”

Feel better, more educated?  Most people who read the above statement don’t, so don’t feel inadequate.  In simple terms, what Indigo is about is creating agility and efficiency, which you’ll probably recognize as the two paramount (credible) NFV goals.  AT&T is making an important statement here, even if it’s not easy to parse.  The network isn’t going to evolve as a series of disconnected technical shifts, but as a result of serving a clear set of business requirements.  Given that, it makes no sense to keep on talking about “SDN” or “NFV” or “5G” as though they were the only games in town.  There has to be a holistic vision, which is why the quote above ends with the statement that Indigo is “a new way to think about the network.”  It’s about creating something that becomes what’s needed.

Faster access, which is pretty much all anyone thinks about these days when they hear about telecom changes, is rapidly reaching a point where further gains in performance will be difficult to notice.  I’ve said many times that most users could not actually exploit even 25 Mbps; you need multiple people sharing a connection to actually use that much.  AT&T correctly points out that once “more bits equals superior service” stops impressing anyone, it’s the overall experience that counts.  Indigo is therefore an experience-based network model.

But, you might rightfully ask, what the heck is it technically?  The kind of detailed Indigo information that we all might like isn’t available, but it’s possible to interpret the high-level data AT&T has provided to gather some useful insight into their approach.  As you might expect from the notion of “experience-based” network services, Indigo steps out beyond connections, to an intermediary position that AT&T calls a “Data-Powered Community”.  Inside this new artifact are the usual access network options, and the now-common commitment to SDN, but there’s also identity management, AI, a data platform that in my view will emerge as the framework for AT&T’s IoT model, and the software orchestration and management tools that tie all this together.

From what I can see, the key technology concept in Indigo is the breaking down of monolithic software structures and service structures into microservices, which are then orchestrated (presumably using ECOMP).  Just as ECOMP can deploy an NFV-based service, it could deploy a function-based application.  Want an operations tool?  Compose it from microservices.  Want to sell a cloud service?  Compose it.  A Community in Indigo is an ad-hoc composition of functional and connection elements.

The Communities Indigo defines are the frameworks that house the experiences customers value.  That means that traditional networking ends up merging more with network-related features like agile bandwidth and connectivity, but also with cloud computing and applications.  I think Indigo is a promise that to AT&T, a virtual function and a cloud application will be two sides of the same coin, and that services will use both of these feature packages to add value for users and revenue for AT&T.

One important feature of Indigo is the ability to support services whose pieces are drawn from a variety of sources.  “Federation” isn’t just a matter of interworking connectivity services, it’s a full-blown trust management process that lets third-party partners create elements of services and publish them for composition.  This doesn’t mean that AT&T won’t offer their own advanced service features, but that they expect to have to augment what they can build by incorporating useful stuff from outside.

If you look at the use cases for Indigo that AT&T has already presented, you don’t see more than a hint of what I’m describing.  There are four such use cases, and most of them are pretty pedestrian.  What’s really needed is a broader and clearer picture of this federation approach, and in particular examples of how it might be integrated with IoT services.  If there’s a giant revenue pie that AT&T needs to bite into, IoT will likely create it.  Given this, and given that AT&T cites IoT trends twice in its lead-in to justifying Indigo, it’s surprising that they don’t offer any IoT-specific or even related use cases.  In fact, beyond the two justifying mentions, IoT doesn’t appear in the rest of the AT&T technical document on Indigo.

Which, frankly, is my big concern about Indigo.  Yes, all the framing points AT&T makes about the evolution of services and service opportunity are true.  Yes, a framework that envelops both connectivity and the experiences users want to be connected with is where we’re heading.  And, yes, it’s true that IoT services are still off in the future.  However, they are the big focus of opportunity and Indigo will stand or fall based on whether it supports IoT-related services well.  It’s IoT that offers AT&T and other operators an application so big that most competitors (including OTTs) will be afraid to capitalize it.  They can own IoT, if they really can frame it in Indigo terms.

Indigo’s greatest near-term contribution may well be its impact on ECOMP.  Universal orchestration and software decomposition to microservices would mean a significant enhancement to the ECOMP model of defining services and managing their lifecycle.  A broader goal for orchestration is critical for NFV’s success because the scope needed to deliver the business case is larger than the bite the NFV ISG has taken of the issues.  Indigo is big, which is a risk, but here, bigness could be a precursor to greatness.