Some General Thoughts on Service Modeling

Everyone tells us that service composition based on software tools is critical.  Operators say that “agility” in service creation would help them address opportunities faster and better, and vendors think it would promote their products in a market that’s increasingly price-competitive.  Perhaps it’s surprising, then, that there doesn’t seem to be a unified position on just what “service composition” means or what’s needed to provide it.

Network services are created from cooperative relationships among network devices.  Historically this has been simplified by the fact that the “service” and the cooperative relationships were created at the device level, meaning that the system of devices we’d call a “network” had its own rules for cooperative behavior, an offshoot of which was one or more retail services.  Services in this framework are somewhat coerced rather than composed, because the behavior of the network is really the natural underpinning of service behavior.

Even where operators didn’t have continuity of control, as would be the case for multi-provider services, it was routine to define network interconnect points with policy-managed traffic exchange.  As long as the service protocol was the same, NNIs largely eliminated the composition problem.  I say “largely” because multi-provider services still required the coordination of VPN services (for example) and access connections in various regions.  The fact that most early service composition thinking came from OSS/BSS providers and CIOs is likely due to this; the technical provisioning was less an issue than order coordination.

Virtualization messed things up because it created two levels of elasticity.  First, the relationship between features and network resources changed from being static (purpose-built devices) to dynamic (hosted).  Second, the feature composition of services also became elastic.  We can see both these levels of dynamism in virtualization today—you have “functions” that have to be assembled into services, but for those services to work, each function has to be hosted somewhere.  Virtualization, then, mandates explicit service composition.  Most probably agree with that; the issue is in what kind and how much.

At the “light touch” end of the spectrum of possibilities is the administration-centric view.  This is an evolution from the OSS/BSS TMF approach, one that focuses on the idea of assembling functionally complete elements but leaving the implementation of each up to the operator/administrator that owns them.  You can visualize the implementation as being a combination of commanding networks to do something, much as they would do it today, and instantiating software features on servers.

The opposite pole is the “complete composition” approach.  Here, the assumption is that composition actually builds a functioning service by building every feature, hosting it as necessary, and making internal and to-the-user connections in a fairly explicit way.  Services still require features/functions, but the processes of including them and hosting them on something are blurred into two faces of the same coin.

There are a lot of differences between these polar approaches, but they tend to be rather subtle and many depend on just how the composition process (and the underlying resources and overarching services) are structured.  Let me give a single example to illustrate the issues, and I’m sure you can work out others on your own.

Where services are modeled at a low level of granularity—the light-touch approach—it’s very difficult to reflect conditions between parts of the model.  That’s because the modeling process abstracts the features/behaviors at the detail level, making it impossible to see what’s happening inside.  As the granularity of the model increases, in the complete composition approach, you have the chance to reflect conditions across the elements of the model.  That changes optimization and management.

In a light-touch scenario, the presumption is that the feature selection process divides the service by domain, meaning geography or administration.  A “service” like VPN, for example, consists of a core VPN service capability coupled with access elements, one for each endpoint.  If something breaks, the presumption is that either the function itself can self-heal (the VPN core can repair around a node or trunk fault) or there’s no healing possible without manual intervention (you can’t fix the cable-cutting backhoe problem without sending a tech to splice things).  Further, the presumption is that the elements can be managed independently.  Doing something to optimize one won’t impact the others.

The complete composition scenario relieves these presumptions.  If the entire service is built from some software-readable recipe, then the decisions made anywhere can be reflected into the decisions to be made in other places.  You can optimize both hosting and connectivity among hosting points together, because your composition recipe defines both connecting and hosting, and does it across the service overall.
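To make the optimization point concrete, here is a minimal sketch in Python.  The function names, site names, and cost figures are all invented for illustration; nothing here comes from a real orchestrator.  It compares placing each function on its own cheapest host (the light-touch presumption of independence) with placing them together while also counting the cost of the connection between hosting points.

```python
from itertools import product

# Hypothetical hosting costs per function and site (illustrative numbers only).
HOST_COST = {
    "firewall": {"east": 1, "west": 2},
    "router": {"east": 3, "west": 1},
}
# Hypothetical cost of connecting functions hosted in different sites.
INTER_SITE_LINK_COST = 5


def total_cost(placement):
    """Hosting cost plus a connection cost if the functions span sites."""
    hosting = sum(HOST_COST[func][site] for func, site in placement.items())
    sites = set(placement.values())
    return hosting + (INTER_SITE_LINK_COST if len(sites) > 1 else 0)


def independent_placement():
    """Each function picks its own cheapest host, ignoring connectivity."""
    return {func: min(costs, key=costs.get) for func, costs in HOST_COST.items()}


def joint_placement():
    """Search every placement and score hosting and connectivity together."""
    functions = list(HOST_COST)
    sites = ["east", "west"]
    candidates = (
        dict(zip(functions, combo))
        for combo in product(sites, repeat=len(functions))
    )
    return min(candidates, key=total_cost)
```

With these made-up numbers the independent choice splits the functions across sites and pays for the link, while the joint choice keeps them together.  A real recipe-driven composer would face a far larger search space, so exhaustive search is only a stand-in here.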

Even within the two basic approaches to modeling, you still have the question of what the model looks like.  Modeling that is functionally generalized, meaning that all “access” has a common set of model properties and implementations of “access” all meet them, is key if you want to support interoperability and integration.  Intent modeling, the current favored approach, is an example of functionally generalized modeling.  Where modeling isn’t functionally generalized, you have the risk of exposing something implementation-specific in the model, which would then mean that the model structure is “brittle”.  That term means that it’s easy to break the model by making a small change at a low-level point, in implementation.  Brittle stuff means a lot of work and a lot of errors down the line, because a small technical change can mean that a lot of models and their associated services don’t work.

Even having a strong intent structure doesn’t ensure you have interoperability and integration support.  Intent modeling, if it’s to be done right, means that we have to first define our functions to support functional generalization.  We need a kind of model class library that says that “access” is a high-level model that can represent “roaming-enabled access” or “tethered access”, for example, or that “function-hosting” can mean “VM” or “container” subclasses.  With this kind of approach, we can apply intent modeling without having our non-brittle goals stymied by inconsistent definitions and integration.  If Vendor A defines a “my-access-strategy” model, for example, what’s the likelihood that Vendor B will adopt the same approach?
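A model class library of this kind might look something like the following Python sketch.  The class names mirror the “access” examples above, but the properties (site, bandwidth_mbps) and the methods are assumptions made purely for illustration, not any standard’s definition.

```python
from abc import ABC, abstractmethod


class Access(ABC):
    """High-level 'access' model: every subclass exposes the same properties."""

    def __init__(self, site, bandwidth_mbps):
        self.site = site
        self.bandwidth_mbps = bandwidth_mbps

    @abstractmethod
    def realize(self):
        """Return a description of how this kind of access is implemented."""

    def describe(self):
        # Only functionally general properties are exposed here;
        # nothing implementation-specific leaks into the model.
        return {
            "function": "access",
            "site": self.site,
            "bandwidth_mbps": self.bandwidth_mbps,
        }


class TetheredAccess(Access):
    def realize(self):
        return f"wireline port at {self.site}"


class RoamingEnabledAccess(Access):
    def realize(self):
        return f"mobile attachment for {self.site}"


def build_service(endpoints):
    """The composer works only with the generalized 'access' view."""
    return [endpoint.describe() for endpoint in endpoints]
```

Because the composer only ever sees describe(), swapping a tethered endpoint for a roaming one changes nothing at the service level, which is the non-brittle property the class library is meant to protect.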

A final point in modeling is just what you’re expecting the model to drive.  Do you want to model a “service lifecycle” for the purposes of automating it?  If so, then you are accepting the notion that service conditions can arise and can be detected at the service level.  While we surely have examples of that, the broad trend in the industry since the replacement of TDM services by packet services has been to manage resources at the resource level and remediate based on resource reconfiguration and substitution.  The presumption is that resources are inherently multi-tenant, so individual per-service remediation isn’t practical.

All of this circles back to that original light-versus-full-composition division.  We tend to see network connection services modeled at a light-touch level because we build connection services largely by coercing naturally cooperative device communities to do something.  The question we have yet to answer in our modeling debates is whether the cloud-like virtualization additions we now contemplate will follow that same model: whether a pool of servers augments a pool of functions that augments or replaces a pool of devices.

I think the question of “intent” is critical in modeling for more reasons than the classic one.  What do we intend service lifecycle management to be, to mean?  If systems of hosting elements, functions, and connections are to assemble self-managing service elements that are then composed into services, we have one level of problem.  If we want to have greater granularity of composition and control to allow for per-service lifecycle management, we’ll have to model differently.  Now’s the time to think about this, because we are clearly coming to a point where our lack of a modeling strategy will either limit virtualization’s impact, or force operators and vendors to latch onto something—which might or might not fit the long-term needs of the industry.

Taking a Deeper Look at the Evolution of SD-WAN

There has been a lot of recent discussion about SD-WAN technology and its potential.  Not surprisingly, most of it has been marred by our industry tendency to over-generalize, to seize on a term that describes a host of options and presume that all the options are really the same.  SD-WAN is really important, but not all its options have the same mission or potential.

The common theme of SD-WAN is the use of edge devices to establish what are effectively VPN services.  The earliest examples of SD-WAN focused on using Internet connectivity to supplement traditional (MPLS) VPNs in places where either there were no MPLS options or where MPLS VPN pricing was prohibitive.  These SD-WANs usually supported multiple (MPLS and Internet) connectivity, and often also allowed their users to use multiple ISPs to improve performance and reliability.

This particular SD-WAN mission is clearly tied to arbitraging the price difference and SLA differences between MPLS VPNs and the Internet.  That differential depends on a bunch of factors, perhaps the largest being the presumption that there will never be Internet QoS because there will never be paid prioritization on the Internet.  That’s a regulatory policy issue, and in the US at least it’s likely that the mood of regulators is now shifting the other way.

You can state the SD-WAN mission another way, though.  You could say that the goal of SD-WAN is to present a uniform IP-connective service over a variety of lower-level connectivity options.  The MEF people told me over a year ago that SD-WAN could be a big part of a successful “Third Network” deployment because it could support consistent services as operators shifted from one underlayment (MPLS, Ethernet, whatever) to another.

It’s this second mission that I think will really shape SD-WAN over time.  It doesn’t have radically different technology requirements, but it would weigh the requirements differently and would also be marketed and deployed based on different drivers.  Today, the primary driver of the service is the control of VPN profit per bit.  For the future, operators see it as a way of making network technology evolution more seamless.

Up to now, SD-WAN has been either a managed service play or an option for connectivity deployed by users themselves.  There’s been, recently, just a hint of the broader second mission, driven, according to operators I’ve talked with, by the need to cope with continued price pressure on VPN services.  One operator told me that in the last seven years, the capital cost of VPNs has declined by about 18% and the opex has almost doubled.  A big part of this is related to the need to go down-market to sell new customers, given that current ones expect their services to get cheaper with every renewal of the contract.

This same profit-per-bit pressure is behind the drive to virtualize things, to build services a different way.  SDN technology is a good way to create services that do essentially nothing but forwarding, for example.  An SDN switch doesn’t really know anything about topology or even about IP; it has a forwarding table that it matches things against, and handles each packet accordingly.  Is this an IP network?  No, and Google among others has demonstrated that if you want to use SDN in place of IP you have to add in some things that an IP user would “see” that SDN won’t provide.  You can do that if you have a piece of CPE that creates the VPN service edge, which means if you have SD-WAN.
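The “forwarding table that it matches things against” idea can be sketched in a few lines of Python.  This is a toy, not OpenFlow: the Rule and FlowTable names, the match fields, and the action strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    match: dict        # header fields that must all match, e.g. {"in_port": 1}
    action: str        # what to do with a matching packet, e.g. "output:2"
    priority: int = 0  # higher priority rules are checked first


class FlowTable:
    def __init__(self):
        self.rules = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)
        # Keep the highest-priority rule at the front of the list.
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def handle(self, packet: dict) -> Optional[str]:
        """Return the action of the first rule whose fields all match."""
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        # Table miss: the switch itself has no idea what to do next.
        return None
```

Note that nothing in the table knows about topology or IP routing.  A miss returns nothing at all, which is exactly where some outside element (a controller, or the SD-WAN edge) has to supply what an IP user would expect to see.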

If you have a piece of “new-age” SD-WAN CPE, you could say that it divides itself into two pieces.  One is the user-side functionality, which is responsible for creating a network interface that looks like the kind of VPN or VLAN or whatever service the user expects to see.  The other is the network-side functionality, which is primarily responsible for framing the connectivity of the service in the terms of the actual network capability.  If your SD-WAN uses MPLS, this is where the MPLS link has to be made.  If it uses some kind of secure tunnel over the Internet, it’s supported here.  If it expects SDN connectivity, or optical virtual wires, it’s connected here.

It seems pretty likely to me that future services will tend to be constructed using SD-WAN technology like this, because future services will likely evolve from the use of different connection services at the network level.  It also seems to me that there are many different SD-WAN technologies today that either don’t fit this approach at all, or fit it with significant limitations.  Those may or may not be useful, depending on the way that connectivity evolves.

If the FCC in the US were to sweep away all restrictions on inter-ISP settlement and paid prioritization of traffic, we’d end up seeing an Internet that had QoS.  That would quickly become the baseline for providing VPN services because it would be significantly less costly.  In this scenario, we would see SD-WAN VPNs evolve to exploit IP features that are naturally visible at the edge, and you would want your boxes to have some form of box-to-box high-level (Level 4) signaling to mediate the services at the user level.  If, on the other hand, we never see any QoS on the Internet, then it’s likely IMHO that VPNs will split off from IP to exploit SDN, NFV, agile optics, and so forth.

Logically, what we should want from SD-WAN is a kind of modular structure to handle both situations.  You have a plugin for the user connection, and this is the feature set that defines the retail service.  You have another for the network side, matched to the specific technology you’re exploiting there, and you mix and match them as needed.  This would be easy if you had software plugins, features/functions that could be installed in an agile premises device.
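Here is a rough Python sketch of that two-plugin structure.  All of the class names and the string “framing” are invented for illustration; a real premises device would obviously be handling packets, not strings.

```python
from typing import Protocol


class UserSide(Protocol):
    """Plugin that defines the retail service interface the user sees."""
    def present(self) -> str: ...


class NetworkSide(Protocol):
    """Plugin matched to the network technology being exploited."""
    def carry(self, payload: str) -> str: ...


class IpVpnInterface:
    def present(self) -> str:
        return "IP VPN"


class MplsUnderlay:
    def carry(self, payload: str) -> str:
        return f"MPLS[{payload}]"


class InternetTunnelUnderlay:
    def carry(self, payload: str) -> str:
        return f"secure-tunnel[{payload}]"


class SdWanEdge:
    """Hypothetical premises device that pairs one plugin of each kind."""

    def __init__(self, user: UserSide, network: NetworkSide):
        self.user = user
        self.network = network

    def send(self, data: str) -> str:
        # The user-side view is framed in whatever the network side provides.
        return self.network.carry(f"{self.user.present()}:{data}")

    def swap_network(self, network: NetworkSide) -> None:
        # Changing the underlay leaves the retail service interface untouched.
        self.network = network
```

The point of the split is visible in swap_network: the underlay changes, and the service interface the user plugs into does not.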

This may be the future mission of vCPE, and the most powerful stimulus for NFV-like deployments.  What users want and need on the premises, first and foremost, is a service interface to plug into.  If there is only one option for service, then the value of agility is limited.  If we’re in a serious state of network technology evolution, then agility is everything, and the SD-WAN model may be the best, even only, way to meet future goals.

What is a Model and Why Do We Need One in Transformation?

After my blog on Cisco’s intent networking initiative yesterday, I got some questions from operator friends on the issue of modeling.  We hear a lot about it in networking—“service models” or “intent models”, but typically with a prequalifier.  What’s a “model” and why have one?  I think the best answer to that is to harken back to what I think are the origins of the “model” concept, then look at what those origins teach us about the role of models in network transformation.

At one level, modeling starts with a software concept called “DevOps”.  DevOps is short for “Development/Operations”, and it’s a software design and deployment practice aimed at making sure that when software is developed, there’s collateral effort undertaken to get it deployed the way the developers expected.  Without DevOps you could write great software and have it messed up by not being installed and configured correctly.

From the first, there were two paths toward DevOps, what’s called the “declarative” or “descriptive” path, and what’s called the “prescriptive” path.  With the declarative approach, you define a software model of the desired end-state of your deployment.  With the prescriptive path, you define the specific steps associated with achieving a given end-state.  The first is a model, the second is a script.  I think the descriptive or model vision of DevOps is emerging as the winner, largely because it’s more logical to describe your goal and let software drive processes to achieve it, than to try to figure out every possible condition and write a script for it.
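The difference between the two paths is easy to see in a small Python sketch.  The component names and step strings are hypothetical.  The script is a fixed list of steps that is only right for one starting point; the model is an end-state that software compares with current conditions to work out the steps.

```python
# Prescriptive: a fixed script, correct only if nothing is deployed yet.
SCRIPT = ["install db", "install web", "install web"]


def declarative_deploy(current, desired):
    """Derive the steps needed to move from the current state to the model.

    Both arguments map a component name to an instance count.
    """
    steps = []
    for component, want in sorted(desired.items()):
        have = current.get(component, 0)
        if have < want:
            steps += [f"install {component}"] * (want - have)
        elif have > want:
            steps += [f"remove {component}"] * (have - want)
    # Anything running that the model does not mention gets removed.
    for component in sorted(set(current) - set(desired)):
        steps += [f"remove {component}"] * current[component]
    return steps
```

From an empty start the model produces the same steps as the script, but from any other starting condition it produces only what is needed, which is the reason describing the goal scales better than scripting every case.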

Roughly concurrent with DevOps were two telecom-related activities that also promoted models.  One was the Telemanagement Forum’s “NGOSS Contract”, and the other the IPsphere Forum’s notion of “elements”.  The TMF said that a contract data model could serve as the means of associating service events and service processes, and the IPSF said that a service was made up of modular elements assembled according to a structure, and “orchestrated” to coordinate lifecycle processes.

What’s emerged from all of this is the notion of “models” and “modeling” as the process of describing the relationship between components of what’s a logically multi-component, cooperative, system that provides a service.  The idea is that if you can represent all suitable alternative implementation strategies for a given “model”, you can interchange them in the service structure without changing service behavior.  If you have a software process that can perform NGOSS-contract-like parsing of events via the service model represented by a retail contract, you can use that to manage and automate the entire service lifecycle.

I think that most operators accept the idea that future service lifecycle management systems should be based on “models”, but I’m not sure they all recognize the features of the models that model derivation as I explained it would require.  A model has to be a structure that can represent as two separate things the properties of something and the realization of those properties.  It’s a “mister-outside-mister-inside” kind of thing.  The outside view, the properties view, is what we could call an “intent model” because it focuses on what we want done and not on how we do it.  Inside might be some specific implementation, or it might be another nested set of models that eventually decompose into specific implementations.
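The outside/inside split, including nesting, might be sketched like this in Python.  The IntentModel class, its fields, and the example implementation names are assumptions for illustration only.

```python
class IntentModel:
    """A model element with an outside (properties) and an inside (realization)."""

    def __init__(self, name, properties, children=None, implementation=None):
        self.name = name
        self.properties = properties          # outside: what we want done
        self.children = children or []        # inside: nested intent models
        self.implementation = implementation  # inside: a specific realization

    def outside(self):
        """The intent view; nothing about the realization is visible here."""
        return {"name": self.name, **self.properties}

    def decompose(self):
        """Walk the inside view down to the specific implementations."""
        if self.implementation is not None:
            return [self.implementation]
        result = []
        for child in self.children:
            result.extend(child.decompose())
        return result
```

A VPN could then be an IntentModel whose children are a core element and one access element per site, each carrying its own implementation.  Anything that consumes outside() never needs to change when a child’s implementation does.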

One of the big mistakes made in modeling is neglecting the requirement for event integration.  Each model element has an intent and a realization, and the realization is the management of the lifecycle of that element.  Thus, every model element has its own events and operating states, and these define the processes that the model requires to handle a given event at a given time.  If you don’t have state/event handling in a very explicit way, then you don’t have a model that can coordinate the lifecycle of what you’re modeling, and you don’t have service automation.
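Explicit state/event handling usually comes down to a table.  The Python sketch below is a minimal illustration; the states, events, and process names are hypothetical and far simpler than a real service lifecycle.

```python
# (current state, event) -> (process to run, next state).
# All names here are hypothetical and only illustrate the table structure.
STATE_EVENT_TABLE = {
    ("ordered", "activate"): ("deploy", "activating"),
    ("activating", "deployed"): ("start_billing", "active"),
    ("active", "fault"): ("redeploy", "activating"),
    ("active", "terminate"): ("tear_down", "terminated"),
}


class ModelElement:
    """One element of a service model, with its own state and event handling."""

    def __init__(self, name, state="ordered"):
        self.name = name
        self.state = state
        self.log = []

    def handle(self, event):
        """Look up the process for this event in the current state."""
        key = (self.state, event)
        if key not in STATE_EVENT_TABLE:
            # The event has no meaning in this state, so nothing is run.
            self.log.append(f"ignored {event} in {self.state}")
            return None
        process, next_state = STATE_EVENT_TABLE[key]
        self.log.append(f"run {process}")
        self.state = next_state
        return process
```

The same event can mean different things, or nothing, depending on the state it arrives in.  That is the piece that gets lost when a model describes structure but not lifecycle.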

One of the things I look for when vendors announce something relating to SDN or NFV or cloud computing or transformation is what they do for modeling.  Absent a modeling approach that has the pieces I’ve described, you can’t define a complete service lifecycle in a way that facilitates software automation, so you can’t have accurate deployments and you can’t respond to network or service conditions efficiently.  So, no opex savings.

Models also facilitate integration.  If a service model defines the elements of a service, each through its own model, and defines the service events and operating states, then you can look at the model and tell what’s supposed to happen.  Any two implementations that fit the same intent model description are equivalent.  Integration is implicit.  Absent a model, every possible service condition has to somehow figure out what the current service state is, and what the condition means in that state, and then somehow invoke the right processes.  The service model can define even the APIs that link process elements; with no model, what defines them and ensures all the pieces can connect?

Where something like policy management fits into this is a bit harder to say, because while we know what policies are at a high level (they are rules that govern the handling of conditions), unlike models it may not be clear how these rules relate to specific lifecycle stages or what specific events the conditions of the policies represent.  It’s my view that policy management is a useful way of describing self-organizing systems, usually ones that have a fairly uniform resource set on which they depend.

Router networks are easily managed using policies.  With NFV-deployed router instances, you have to worry about how each instance gets deployed and how it might be scaled or replaced.  It’s much more difficult to define policies to handle these dependencies, because most policy systems don’t do well at communicating asynchronous status between dependent pieces.  I’m not saying that you can’t write policies this way, but it’s much harder than simply describing a TMF-IPSF-DevOps declarative intent model.

Policies can be used inside intent models, and in fact a very good use for policies is describing the implementation of “intents” that are based on legacy homogeneous networks like Ethernet or IP.  A policy “tree” emerging from an intent model is a fine way of coordinating behavior in these situations.  As a means of synchronizing a dozen or hundred independent function deployments, it’s not good at all.

This all explains two things.  First, why SDN and NFV haven’t delivered on their promises.  What is the model for SDN or NFV?  We don’t have one, and so we don’t have a consistent framework for integration or service lifecycle management.  Second, why I like the OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications).  Because it’s all about doing the very thing that’s too dynamic and complex to control via policies.  Remember, we generally deploy cloud applications today using some sort of model.

Integration is fine.  API specifications are fine.  Without models, neither of them is more than a goal, because there’s no practical way to systematize, to automate, what you end up with.  We will never make controlled services and service infrastructure substitute for autonomous and adaptive infrastructure without software automation, and it’s models that can get us there.  So forget everything else in SDN and NFV and go immediately to the model step.  It’s the best way to get everything under control.

What Does Cisco Intend with “Intent Networking?”

Cisco has announced it’s going to support, and perhaps even focus on, “intent-based” networking.  At one level this could be viewed as a vindication of a widely held view that intent-modeling is the essential (and perhaps under-supplied or even missing) ingredient in the progression of virtualization.  At another level, it could be seen as another Cisco marketing strategy.  The truth is that it’s a little of both.

At the heart of today’s issue set is a whole different notion, that of determinism.  The old-day time-division-multiplexed networks were deterministic; they worked in a specific way and provided very specific capacity and SLAs.  As packet networks, and particularly the Internet, evolved, networking tossed out strict determinism in favor of lower cost.  We had “best efforts” networks, which is what dominates today.

So what does this have to do with “intent?”  Well, best efforts is increasingly not good enough in a competitive market, but nobody wants to go back to full determinism to achieve something better—the cost would be excessive.  The alternative is to somehow couple service requirements into packet networks in a way that doesn’t break the bank.  In an intent model, elements of infrastructure are abstracted into a black box that asserts interfaces and an SLA but hides the details.  Intent modeling is therefore a way of looking at how to express how deterministic a network has to be.  It also leaves it to the vendor (and presumably the network-builder) to decide how to fulfill the intent.

Intent modeling is an incredibly important tool in realizing the benefits of virtualization and infrastructure transformation, because it lets operators create abstract building-blocks (intent-based black boxes) that combine to build networks, and that then evolve internally from legacy to modern technology.  A good evolutionary intent model has to be anchored in the present, and support the future.

Cisco’s approach to transformation has always been what cynics would call “cosmetic”.  Instead of focusing on building SDN or building NFV, Cisco has focused on achieving the goals of those technologies using behaviors coerced from current technology.  At one level, this is the same kind of marketing gloss Cisco has been famous for, for decades in fact.  At another it’s reflective of a simple truth, which is that transformational technologies that do the transforming by displacing legacy infrastructure are exceptionally difficult to promote because of cost and risk.

There really isn’t much new in the Cisco intent approach.  Cisco has always been an advocate of “policy-based” networking, meaning a form of determinism where the goals (the “intent”) are translated into a hierarchy of policies that then guide how traffic is handled down below.  This is still their approach, and so you have to wonder why they’d do a major announcement that included the financial industry to do little more than put another face on a concept they’ve had around for almost a decade.

One reason is marketing, of course.  “News”, as I’ve always said, means “novelty”.  If you want coverage in the media rags (or sites, in modern terms) then you have to do something different, novel.  Another reason is counter-predation.  If a competitor is planning on eating its way along a specific food chain to threaten your dominance, you cut them off by eating a critical piece yourself.  Intent modeling is absolutely critical to infrastructure transformation.  If you happen to be a vendor winning in legacy infrastructure, and thus want to stall competitors’ reliance on intent modeling as a path to displacing you, then you eat the concept yourself.

OK, yes, I’m making this sound cynical, and it is.  That’s not necessarily bad, though, and I’d be the first to admit that.  In one of my favorite media jokes, Spielberg, when asked what had been the best advice he’d received as a director, said “When you talk to the press, lie.”  But to me the true boundary between mindless prevarication and effective marketing is the buyers’ value proposition.  Is Cisco simply doing intent models the only way they are likely to get done?  That, it turns out, is hard to say “No!” to.

We have struggled with virtualization for five years now, and during that period we have done next to nothing to actually seize the high-level benefits.  In effect, we have as an industry focused on what’s inside the black-box intent model even though the whole purpose of intent models is to make that invisible.  Intent modeling as a driving concept for virtualization emerged in a true sense only within the last year.  Cisco, while they didn’t use the term initially, jumped onto that high-level transformation mission immediately.  Their decision to do that clearly muddies the business case for full transformation via SDN and NFV, but if the proponents of SDN and NFV weren’t making (and aren’t making) the business case in any event, what’s the problem?

Cisco has done something useful here, though of course they’ve done it in an opportunistic way.  They have demonstrated the real structure of intent models—you have an SLA (your intent) on top, and you have an implementation that converts intent into network behavior below.  Cisco does it with policies, but you could do the same thing with APIs that passed SLAs, and then have the SLAs converted internally into policies.  Cisco’s model works well for homogeneous infrastructure that has uniform dependence on policy control; the other approach of APIs and SLAs is more universal.  So Cisco could be presenting us with a way to package transformation through revolution (SLAs and APIs) and transformation through coercion (policies) as a single thing—an intent model.

They could also be stimulating the SDN and NFV world to start thinking about the top of the benefit pyramid.  If Cisco can make the business case for “transformation” without transforming infrastructure, bring service control and a degree of determinism to networking without changing equipment, then more radical approaches are going nowhere unless they can make a better business case.

Is Cisco sowing the seeds of its own competition?  More likely, as I suggested above, Cisco is seeing the way that a vulnerability might be developing and working to cut it off.  But one way or the other, Cisco is announcing that the core concept of SDN and NFV isn’t just for SDN and NFV to realize.  Those who don’t want five years of work to be a science project had better start thinking about those high-level benefits that Cisco is now chowing down on.  There are only so many prey animals in the herd, and Cisco is a very hungry predator.

Solving the Problem that Could Derail SDN and NFV

Back in the days of the public switched telephone network, everyone understood what “signaling” was.  We had an explicit signaling network, SS7, that mediated how resources were applied to calls and managed the progression of connections through the hierarchy of switches.  The notion of signaling changed with IP networks, and I’m now hearing from operators that it changes even more when you add in things like SDN and NFV.  I’m also hearing that we’ve perhaps failed to recognize just what those changes could mean.

You could argue that the major difference between IP networks and the old circuit-switched networks is adaptive routing.  In traditional PSTN we had routing tables that expressed where connections were supposed to be routed.  In IP networks, we replaced that with routing tables that were built and maintained by adaptive topology discovery.  Nodes told each other who they could reach and how good the path was, simply stated.

The big advantage of adaptive routing is that it adapts, meaning that issues with a node or a trunk connection can be accommodated because the nodes will discover a better path.  This takes time, to be sure, for what we call “convergence”, meaning a collective understanding of the new topology.  The convergence time is a period of disorder, and the more complicated convergence is, the longer that disorder lasts.
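To get a feel for why convergence takes time, here is a simplified distance-vector style exchange in Python.  It is a teaching sketch rather than any real routing protocol: nodes exchange tables in lockstep rounds, every node starts knowing only itself, and the topologies are made up.

```python
INF = float("inf")


def converge(links, nodes):
    """Exchange distance tables between neighbors until nothing changes.

    links maps a (node, node) pair to a link cost.
    Returns the final tables and the number of rounds in which tables changed.
    """
    # Each node starts out knowing only how to reach itself.
    dist = {n: {m: (0 if n == m else INF) for m in nodes} for n in nodes}
    neighbors = {n: {} for n in nodes}
    for (a, b), cost in links.items():
        neighbors[a][b] = cost
        neighbors[b][a] = cost

    rounds = 0
    changed = True
    while changed:
        changed = False
        # Everyone advertises the table it held at the start of the round.
        snapshot = {n: dict(table) for n, table in dist.items()}
        for n in nodes:
            for nb, cost in neighbors[n].items():
                for dest, d in snapshot[nb].items():
                    if cost + d < dist[n][dest]:
                        dist[n][dest] = cost + d
                        changed = True
        if changed:
            rounds += 1
    return dist, rounds
```

In this toy, a four-node chain needs more rounds to settle than the same four nodes in a ring, because knowledge of distant nodes has to be relayed hop by hop.  Real protocols also have to unlearn failed paths, which this sketch does not attempt to model.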

SDN sought to replace adaptive routing with predetermined, centrally managed routes.  The process whereby this determination and management happens is likely not the same as it was for the PSTN, but at least one of the goals was to do a better job of quickly settling on a new topology map that efficiently managed the remaining capacity.  The same SDN central processes could also be used to put the network into one of several operating modes that were designed to accommodate special traffic conditions.  A great idea, right?

Yes, but.  What some operators have found is that SDN has implicitly reinvented the notion of signaling, and because the invention was implicit and not explicit they’re finding that the signaling model for SDN isn’t fully baked.

Some operators and even some vendors say NFV has many of the same issues.  A traffic path and a set of feature hosts are assembled in NFV to replace discrete devices.  The process of assembling these parts and stitching the connections, and the process of scaling and sustaining the operation of all the pieces, happens “inside” what appears at the service level to be a discrete device.  That interior stuff is really signaling too, and like SDN signaling it’s not been a focus of attention.

It’s now becoming a focus, because when you try to build extensive SDN topologies that span more than a single data center, or when you build an NFV service over a practical customer topology, you encounter several key issues.  Most can be attributed to the fact that both SDN and NFV depend on a kind of “out of band” (meaning outside the service data plane) connectivity.  Where does that come from?

SDN’s issue is easily recognized.  Say we have a hundred white-box nodes.  Each of these nodes has to have a connection to the SDN controller to make requests for a route (in the “stimulus” model of SDN) and to receive forwarding table updates (in either model).  What creates that connection?  If the connection is forwarded through other white-boxes, creating what would be called a “signaling network”, then the SDN controller also has to maintain its signaling paths.  But if something breaks along such a path and the path is lost, how does the controller reach the nodes to tell them how the new topology is to be handled?  You can, in theory, define failure modes for nodes, but you then have to ensure that all the impacted nodes know that they’re supposed to transition to such a mode.
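
The in-band signaling problem can be sketched in a few lines. This is a toy model, assuming the controller reaches each white-box only through other white-boxes; the node names, the topology, and the helper `reachable_from` are hypothetical and stand in for whatever a real controller would track.

```python
from collections import deque


def reachable_from(controller, links):
    """Breadth-first search over undirected links.

    Returns the set of nodes the controller can still signal.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {controller}
    queue = deque([controller])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical signaling network: the controller hangs off node n1.
links = [("ctl", "n1"), ("n1", "n2"), ("n2", "n3"), ("n1", "n4")]
all_nodes = {"n1", "n2", "n3", "n4"}

# Fail the n1-n2 trunk, then see which nodes the controller can no longer reach.
surviving = [link for link in links if link != ("n1", "n2")]
stranded = all_nodes - reachable_from("ctl", surviving)
print(sorted(stranded))  # ['n2', 'n3']
```

The stranded nodes are exactly the ones that most need new forwarding instructions, and they are the ones the controller cannot talk to.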

In NFV, the problem is harder to explain in simple terms and it’s also multifaceted.  Suppose you have to scale out, meaning instantiate a new copy of a VNF to absorb additional load.  You have to spin up a new VNF somewhere, which means you need to signal a deployment of a VNF in a data center.  You also have to connect it into the data path, which might mean spinning up another VNF that acts as a load-balancer.  In NFV, if we’re to maintain security of the operations processes, we can’t expose the deployment and connection facilities to the tenant service data paths or they could be hacked.  Where are they then?  Like SS7, they are presumably an independent network.  Who builds it, with what, and what happens if that separate network breaks?  Now you can’t manage what you’ve deployed.

I opened this blog with a comment on SS7 because one EU operator expert explained the problem by saying “We’re finding that we need an SS7 network for virtualization.”  The fact is that we do, and a collateral fact is that we’ve not been thinking about it.  Which means that we are expecting a control process that manages resources and connectivity to operate using the same resources it’s managing, and never lose touch with all the pieces.  If that were practical we’d never have faults to manage.

The signaling issue has a direct bearing on a lot of the SDN and NFV reliability/availability approaches.  You don’t need five-nines devices with virtualization, so it’s said, because you can replace a broken part dynamically.  Yes, if you can contact the control processes that do the replacing, then reconnect everything.  To me, that means that if you want to accept the dynamic replacement approach to availability management, you need to have a high-reliability signaling network to replace the static-high-availability-device approach.

Even the operators who say they’ve seen the early signs of the signaling issue say that they see only hints today because of the limited scope of SDN and NFV deployments.  We are doing trials in very contained service geographies, with very small resource pools, and with limited service objectives.  Even there, we’re running into situations where a network condition cuts the signaling connections and prevents a managed response.  What’s going to happen when we spread out our service deployments?

I think SDN and NFV signaling is a major issue, and I know that some operators are seeing signs of that too.  I also think that there are remedies, but in order to apply them we have to take a much higher-level view of virtualization and think more about how we provide the mediation and coordination needed to manage any distributed system.  Before we get too far into software architectures, testing, and deployment we should be looking into how to solve the signaling problem.

The Drive to Add Feature Value to Networks is Changing How Networks are Built

Every vendor wants to sit astride the critical value propositions, and in networking that’s particularly true.  With capital spending under pressure, it’s crucial to have some strong value propositions you can spout to impress buyers.  The problem has been that “value” really means either cost or revenue, and much of networking is insulated from both these areas by the structure of services.  But the problem is increasingly in the past, because the drive to differentiate is creating innovative solutions.

Traditional networks build upward from physical media through a series of “layers”, each of which sees only the services/features of the layer below.  The user experience is created at the top layer, which for the network (despite the media hype to the contrary) is “Level 3” most times, and Level 2 for the rest.  Operators tell me that there are very few services asserted at other layers.  They also tell me that almost three-quarters of their operations costs are incurred at the service layer, the top.

Virtualization has thrown out the basis of the traditional division of features, because virtualization allows for the creation of a virtual network with virtual layers.  For example, you can use virtualization to build a “virtual wire” from one continent to another, transiting all manner of facilities.  It still looks like a wire to the higher layers, but the traditional mission of Level 3, which is to tack physical media spans together to create routes and paths, really isn’t necessary any more.

Lower-layer network features that replace features normally provided at a higher layer are subtractive feature examples; your new features subtract requirements from higher layers, and by doing that potentially simplify them.  If, for example, we had total resilience at the optical layer, would we be able to eliminate not only error recovery but even dynamic routing at higher layers?

Another thing virtualization could do is support a new model of services.  NFV is an example of what could be called feature addition; you can build services that add features to basic connectivity.  These features could be related to connectivity, or they might be elements of applications, as would be the case with cloud computing.

Finally, you could think of parallelism as an attribute of services.  Today we get most IP services by partitioning IP infrastructure.  Might we instead use virtual wires to create truly independent subnetworks at the lower layers, and then pair these with hosted instances of switches and routers, or any other set of feature-driving elements, above?  Why not?

Virtualization isn’t the only driver here, either.  Every lower layer in the network, and every vendor who has products there, has aspirations to add features and capabilities.  These features generate impacts that fall into these same three categories.  Chip vendors want programmable forwarding, which is another set of features whose value proposition demands they change how “services” would be built.

How do we build them?  Each of the virtualization impact models requires different accommodations, with a final common element.

Subtractive models are the simplest and most complicated at the same time.  They’re simple because a problem removed at a lower layer automatically simplifies the higher layers.  If an error doesn’t occur, then higher-layer steps aren’t required and operations is automatically less complicated and expensive.  They’re complicated because full exploitation of subtractive feature benefits requires that you remove the features from the higher layer.  As an example, SDN could take advantage of subtractive reliability and availability features at the optical layer because it has no intrinsic capability to recover from problems—you’d have to include that capability explicitly.  With proper lower-layer features you simplify SDN software and presumably lower cost.

The additive models are easier to understand because we have examples of them in both the cloud and NFV.  The challenge here is to identify incremental features that are valuable, and then cull them to eliminate those that are not expensive enough to justify hosting in the network.  Business firewalls are expensive, so you can probably host them.  Residential features of the same type are part of a device that might cost forty bucks and includes the essential WiFi feature.  Host that?

The parallel model is really all about benefits and costs.  We build VPNs today using network-resident features (MPLS).  We could build them by partitioning capacity through virtual wires at the optical or optics-plus level, and then add in the L2/L3 features using hosted instances of routing/switching or simply by using SD-WAN features.  Would the cost of this alternative approach be lower in the long run?  Would the security benefits make the service inherently more valuable to buyers?

The common element in all of this is that future services are less likely to be “coerced” from the cooperative behavior of a system of compatible devices and more likely to be deployed as a collection of features.  You start with infrastructure that is inherently more service-independent and make it service-specific by targeting the services you can sell and then deploying the behaviors that make up those services.

This requires a violation of the old OSI assumption of building on lower layers, because with the new model the goal may well be to exploit and thus to expose features of lower layers.  We’re seeing some of the new debate on this point already, because if you have (for example) the ability to expand transport bandwidth at Level 1, do you allow service creation to do that, or do you wait until the sum of higher-layer capacity demands signals a need?

If you decide to let service creation cross OSI layers, you buy into a much more complex approach to service management today, but one that better prepares for those three virtualization feature models down the road.  If we want to see every aspect of networking freed to develop its own optimum cost/benefit relationships, then we have to free every aspect of networking from traditional constraints—constraints applied at a time when we had no novel capabilities to exploit, so we had nothing to lose.

Experiences are the way of the future, service-wise.  We already know that universal connectivity provides a platform for enriching our experiences by supporting easy delivery.  Easy delivery alone isn’t the experience, though.  Eventually, we will have to see all services as composed feature sets married with connectivity.  Eventually, service management will have to be able to model those complex relationships and make them efficient and reliable.  The sooner we look at how that’s done, the better.

This is IMHO the strongest reason to be looking at cloud-intent-model-driven service models (like OASIS TOSCA) rather than models that were derived to control connectivity.  The most connective network you could ever have is simply an invitation for disintermediation if what the buyer wants is the experience that the network can deliver.  Those experiences can always be modeled as functions-plus-connectivity, and that looks like a cloud and not like a network.

Exploring the Operators’ Views on Transformation Drivers and Time-Line

In past blogs, I’ve said that there were three dominant drivers for transformational change in networking.  One is carrier cloud (which of course has its own drivers), one is 5G, and the last is IoT.  Given that the industry is fast-paced, it’s a good time to look at where we stand on each, based on the project information I’ve seen from the network operators.

This is particularly important given that traditionally cited drivers like SDN and NFV are clearly a moving finish line according to surveys.  Light Reading published a summary of a survey that said, in essence, that operators were pushing back their expected realization of virtualization plans.  Yes, because technology change without business drivers is just a science project.  So, what are the real drivers doing?  In some cases, a lot and in others not much.

Carrier cloud is a bit of a catch-all driver, but it is the meaningful consequence of all the carrier interest in transformation.  The net-net of this driver is that it represents a goal of shifting the carrier business model to owning experiences and delivering them on the network, rather than owning just the network and letting others provide the experience.  NFV, operator entry into cloud computing, and content and advertising ventures could all create specific carrier cloud opportunities.

Based on operator inputs, I think carrier cloud as a driver is suffering from a large dose of the vagues.  To say that you want to transform to, or lead in, carrier cloud is like saying you want your profits to go up.  The goal is inarguable but the realization of it is difficult because there’s no specific defined pathway you can expect to follow.  Carrier cloud activity is dominantly linked to lab and market research projects that are still largely in the hoped-for stage.

What might get carrier cloud out of the fuzziness that plagues it now is IoT.  Smart operators have been gradually coming to realize that the media vision of IoT as “billions of new devices on the Internet generating billions of new cellular service bills” (as one cynical operator staff type puts it) is a pipe dream.  That realization killed off the one obvious IoT business model but didn’t suggest anything more rational.  Fortunately, Amazon has come along (joined by Microsoft and Google) in presenting IoT as an application of event-based cloud computing—in the form of functional or “lambda” programming.

Event-driven systems, as I’ve said, pose a dilemma for cloud providers.  On the one hand, events are likely widely distributed, so they might be a very logical source of new cloud demand.  On the other hand, they demand a short control loop for at least the early stages of their processing.  That’s why Amazon introduced its Greengrass on-premises hosting option for the Amazon event-centric Lambda service.  Carriers have convenient real estate to host event processing near the source.

The problem for both IoT as a driver and for carrier cloud as something IoT could drive, is that operators are still hesitant about getting into process hosting, particularly in the form of a generalized cloud service.  Remember that carriers have been generally unsuccessful in cloud computing ventures.  The Amazon example might seem a clear direction to mimic, but not if it’s an example of cloud computing.  Thus, carriers would have to see IoT as a set of functions aimed at facilitating M2M or something.  They may not see it as a cellular-billing opportunity any longer, but they still don’t have the process perspective.

The “Why?” of that is best answered by relating the question of a Tier One.  “What do we deploy?”  Operators have a major problem planning things that don’t have any hard-deployable elements.  This is the sort of thing they’ve traditionally turned to vendors to provide. If IoT is about event processing and function hosting, then will somebody please sell me the stuff?  If you can point it out, tell me what it takes to run, price it, etc. then I can figure out whether I can make a business case.  The notion of formulating an IoT event-and-process strategy from scratch, then going out and assembling the pieces, is pretty well out of the carrier comfort zone.

The good news is that smart staff types in at least the major Tier One operators are accepting event-and-function reality for the first time, and in no small part because of the initiatives of the cloud providers.  They are looking at some packages (including GE Digital’s Predix) and some middleware and hosting options.  I think we may see some movement in this space by late in 2018.

Surprisingly, the carrier cloud impact of IoT planning might come along before the real IoT application.  One reason is that NFV’s “favorite” application, virtual CPE, is logically targeted at business sites, and these are also the targets for event processing.  A “function” in IoT terms isn’t the same as an NFV virtual function, but there are many similarities at the hosting level, and it would be possible to present IoT event-function hosting as an NFV application (were any vendors fully rational in this space, at least).  Vendors and operator planners eager to find a reason to deploy some servers in edge offices might find the IoT process hosting mission an attractive add-on to the vCPE hosting opportunity.

The final driver to be explored is 5G, and it’s the hardest of the three to get any handle on.  On the one hand, mobile operators are at least titularly committed to 5G at an almost unprecedented rate.  On the other hand, hardly any of them think that the specifications are even fully baked, and when you ask them what their 5G drivers are, you often get vague responses.

In fact, there are only two solid drivers for 5G at this point, one technological and one marketing.  On the technological side, many operators think that 5G is an opportunity for them to create fiber tail-connections to make fiber-to-the-node truly useful.  It’s a bonus that these nodal 5G cells could also provide better mobile coverage.  On the marketing side, it’s competition.  “More G’s win,” according to one operator.

If the only value in 5G is “more G’s” then it’s likely any deployment would be designed to have minimal impact on the network overall, simply to avoid introducing new costs.  A 5G tail connection doesn’t necessarily need a lot of the novel 5G features either, so it’s safe to say that 5G is still in a very early deliberation-and-planning stage.  Operators think that’s the case too; most say they can “see the value” of things like network slicing, but they can’t put a number on it.  This reminds me of the state of NFV a couple years ago.

The bad news here is that none of the transformational drivers in networking show any sign of driving major changes in infrastructure in the near term.  The good news is that there does seem to be recognition that at least IoT and some aspects of 5G could drive value, and my personal view is that vendors or operators who want action before 2020 will need to somehow promote the edge-hosting model of event-driven IoT to get it.

How Do We Define Software-Defined Network Models?

If networks are truly software-defined, what defines the software that defines them?  This is not only the pivotal question in the SDN and NFV space, but perhaps the pivotal question in the evolution of networks.  We knew how to build open, interoperable networks using fixed devices like switches and routers, but it’s increasingly clear that these old methods won’t work in the new age of virtualization and software.  What does?  There are a variety of answers out there, but it may be that none are really complete.

The classic network solution is the formal standards process, which we have seen for both NFV and SDN.  The big question with the standards approach is “What do you standardize?”  SDN focused on standardizing a protocol, OpenFlow, and presumed that by doing that they would achieve openness and interoperability among the things that supported the protocol.  NFV originally said they weren’t going to write standards at all, but simply select from those already available.  That approach isn’t being followed IMHO, and arguably what NFV did do was to standardize a framework, an architecture, that identified “interfaces” it then proposed to standardize.

I don’t think that either SDN or NFV represents a successful application of formal standards to the software-defined world.  You could kick around the reasons why that’s true, but I think the root cause is that software design doesn’t work like formal standards work.  You really need to start software at the top, with your benefits and the pathway you propose to follow in achieving them.  Both SDN and NFV defined their model before they had identified the critical benefits and the critical steps that would be needed to secure them.

A second approach, this one from the software side, is the open-source model.  Open source software is about community development, projects staffed by volunteers who contribute their efforts and aimed at producing a result that’s open for all to use without payment for the software that results.  It’s worked with Linux, and so why not here?

I’m a fan of open source, but it has its limitations, the primary one being that the success of the project depends on the right software architecture, and it’s hard to say where that architecture vision comes from.  In Linux, it came from one man, and his inspiration was a running operating system (UNIX) mired in commercial debates and competing versions.  But for SDN and NFV there’s not just a division but a set of divisions on the open-source side that make things even more complicated.

One obvious division is among competing projects that have the same goal.  For both SDN and NFV we have that already, and even if all the projects are open, they are different and so threaten interoperability by creating competing software models that could be the targets for integration and deployment.  What works with one will probably not work with others, without special integration work at least.

Another division is the end-game-versus-evolutionary-path approach conflict.  We have projects like CORD (Central Office Re-architected as a Datacenter) that define the future end-state of software-driven networking, and others like the various NFV MANO projects that define a stepping stone toward that future.  It’s not clear to many (including me) just what all the MANO projects would generate as a future network model, and it’s not clear what actionable steps toward CORD would look like.

All of this uncertainty is troubling at best, but it’s intolerable if you want operators to commit to a big-budget transformation.  Add to this the fact that the important work being done today is de facto committed to one of these (flawed) approaches, and you can see that we could have a big problem.  It may even be too big to solve for current software-driven initiatives, but at the least we should try to lay out the right approach so future initiatives will have a better shot at success.  If we can then apply the right future answer to current work, retrofit it, we have at least a rational pathway forward.  If we can’t make the retrofit work, then we’ll have to accept that current initiatives are not likely to be fully successful.

Software projects have to start with an architecture, because you have to build software by successively decomposing missions and goals into functions and processes.  The architecture can’t be a “functional” one in the sense of the ETSI End-to-End model because it has to describe the organization of the software.  ETSI ended up doing a high-level software architecture perhaps without intending it, because you can’t interpret a functional model into software any other way.  A software expert would not have designed NFV that way, and that problem cannot be fixed by tighter interface descriptions, etc.  The software design based on the model isn’t optimum, period.  SDN has a similar problem, but the OpenDaylight work there, combined with the fact that SDN is a much more contained strategy, has largely redeemed the SDN approach.  Still, the fact that there has to be an “optical” version of the spec demonstrates that the approach was wrong; the right design would have covered any path types without needing extensions.

Standards, in the traditional network sense, aren’t going to generate software architectures.  That requires software architects, who are rarely involved in formal standards processes.  It would certainly be possible to target the creation of a software architecture as a part of a standards process, though, and we should do that for any future software-defined activities.  We didn’t do it with SDN and NFV, though, and it’s exceptionally difficult to retrofit a new architecture onto an existing software project.  That means that open-source software would have to evolve into an optimum direction, based on recognized issues and opportunities.  Which, of course, could take time.

We may have to let nature take its course now with SDN and NFV, but in my view, it’s time to admit that we can’t fit the right model onto the current specification, and that in any case we’re past the point where standards and specifications will help us.  Once we have an implementation model we need to pursue it.  If we have several, we need to let market conditions weed them out.  That means that current SDN and NFV standards shouldn’t drive the bus at all, but rather should undertake specific and limited missions to harmonize the multiplicity of approaches being taken by open source.

Specs and standards guide vendor implementations, and it’s clear that in the case of SDN and NFV we are not going to get implementations that fully address the benefit goals of the operators.  We have to start with things that do, and in my own view there is only one that does, which is AT&T’s ECOMP, now part of the ONAP project in the Linux Foundation, along with OPEN-O.  ECOMP provides the total-orchestration model that the ETSI spec and other MANO implementations lack.  That’s true not only for NFV, but also for SDN.

It’s time for a change here, and the thing we need to change to is the new ONAP platform.  The best role for ETSI here would be to map their stuff to ONAP and facilitate the convergence of MANO alternatives with it.  The best role for the ONF would be to do the same with SDN.  Then, we need to get off the notion that traditional standards can ever successfully drive software virtualization projects.

The Role of As-a-Service in Event Processing, and its Impact on the Network

We seem to be in an “everything as a service age”, or at least in an age where somebody is asserting that everything might be made available that way.  Everything isn’t a service, though.  Modern applications tend to divide into processes stimulated by a simple event, and processes that introduce context into event-handling.  We have to be able to host both kinds of processes, and host them in the right place, and we also have to consider the connection needs of these processes (and not just of the “users” of the network) when we build the networks of the future.

The purpose of an as-a-service model is to eliminate (or at least reduce) the need for specialized hardware, software, and information by using a pool of resources whose result is delivered on demand.  The more specialized, and presumably expensive, a resource is, the more valuable the “aaS” delivery model could be.  The value could come in the cost of the resource, or because analytic processes needed to create the resource would be expensive to replicate everywhere the results are needed.

You can easily envision an as-a-service model for something like calculating the orbit of something, or the path of an errant asteroid in space, but the average business or consumer doesn’t need those things.  They might need to have air traffic control automated, and there are obvious advantages to having a single central arbiter of airspace, at least for a metro area.  On the other hand, you don’t want that single arbiter to be located where the access delay might be long enough for a jet to get into trouble while a solution to a traffic issue was (so to speak) in flight to it.

Which might happen.  The most obvious issue that impacts the utility of the “aaS” option is the economy the resource pool can offer.  This is obviously related not only to the cost of the resource, but also to how likely it’s going to be used, by how many users, and in what geography.  It’s also related to the “control loop”, or the allowable time between a request for the resource in service form and the delivery of a result.  I’d argue that the control loop issue is paramount, because if we could magically suspend any latency between request and response, we could serve a very large area with a single pool, and make the “aaS” model totally compelling.

The limiting factor in control loop length is the speed of light in fiber, which is about 120 thousand miles per second, or 120 miles per millisecond.  If we wanted to ensure a control loop no more than 50 milliseconds long, and if we presumed 20 milliseconds for a lightweight fulfillment process, we’re left with 30 milliseconds for a round trip in fiber, or a one-way distance of about 1800 miles.  A shorter control loop requirement would obviously shorten the distance our request/response loop could travel.  So would any latency introduced by network handling.  As a practical matter, most IoT experts tell me that process control likely can’t be managed effectively at more than metro distances because there’s both a short control loop requirement and a lot of handling that happens in the typical access and metro network.
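
The arithmetic here is simple enough to put in a few lines, which makes it easy to try other budgets. This is a back-of-the-envelope sketch using the 120 miles per millisecond figure from the text; the function name, its parameters, and the 10 millisecond handling figure in the second call are assumptions for illustration, not measured values.

```python
# Approximate speed of light in fiber, per the figure quoted in the text.
FIBER_MILES_PER_MS = 120


def max_one_way_miles(loop_budget_ms, processing_ms, handling_ms=0.0):
    """Return how far away a service host can sit for a given control loop.

    loop_budget_ms: total allowable request-to-response time
    processing_ms:  time the fulfillment process itself consumes
    handling_ms:    assumed access/metro handling delay (both directions combined)
    """
    round_trip_ms = loop_budget_ms - processing_ms - handling_ms
    if round_trip_ms <= 0:
        return 0.0  # nothing left in the budget for transit
    return (round_trip_ms / 2) * FIBER_MILES_PER_MS


print(max_one_way_miles(50, 20))      # 1800.0, the case worked in the text
print(max_one_way_miles(50, 20, 10))  # 1200.0, with a hypothetical 10 ms of handling
```

Every millisecond of handling delay costs 60 miles of reach, which is why the access and metro handling matters so much more than the long-haul fiber.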

Still, once you’ve paid the price for access/metro handling and have your request for a resource/service on fiber, you can haul it a long way for an additional millisecond or two.  Twenty milliseconds could get you to a data center in the middle of the US from almost anywhere else in the country, and back again.  That is, in my view, the determining factor in the as-a-service opportunity.  You can’t do everything as a service with that long a control loop, which means that event-driven processes in the cloud or as a part of a carrier service will have to be staged in part to resources nearer the edge.  But with proper software and network design you can do a lot, and the staging that’s needed for resource hosting is probably the driver behind most network changes over the next decade or so.

One obvious truth is that if electrical handling adds a lot to the delay budget, you want to minimize it.  Old-day networks were an electrical hierarchy to mass up traffic for efficient handling.  If fiber is cheap enough, no such massing up is needed.  If we could mesh hosting points with fiber connections, then we could make more seldom-used (and therefore not widely distributed) features available in service form without blowing our control loop budget.

In a given metro area, it would make sense to mesh as many edge hosting points as possible with low-latency fiber paths (wavelengths on DWDM are fine as long as you can do the mux/demux without a lot of wasted time).  I’d envision a typical as-a-service metro host network as being a redundant fiber path to each edge point from a metro data center in the center, with optical add/drop to get you from any edge point to any other with just a couple milliseconds of add/drop insertion delay.  Now you can put resources to support any as-a-service element pretty much anywhere in the metro area, and everything ties back with a low-latency path to a metro (and therefore nearby) data center for hosting processes that don’t require as short a control loop.  You could carry this forward to state/regional and central data centers too.

All this hosting organization is useless if the software isn’t organized that way, and it’s not enough to use “functional” techniques to make that happen.  If the context of an event-driven system has to be determined by real-time correlation of all relevant conditions, then you end up with everything at the edge, and everything has to have its own master process for coordination.  That doesn’t scale, nor does it take advantage of short-loop-long-loop segregation of processes.  Most good event-driven applications will combine local conditions and analytic intelligence to establish conditions in terms of “operating modes”, which are relevant and discrete contexts that establish their own specific rules for event-handling.  This is fundamental to state/event processing, but it also lets you divide up process responsibility efficiently.

Take the classic controlled-car problem.  You need something in the car itself to respond to short-loop conditions like something appearing in front of you.  You need longer-loop processes to guide you along a determined route to your destination.  You can use a long-loop process to figure out the best path, then send that path to the car along with a set of conditions that would indicate the path is no longer valid.  That’s setting a preferred state and some rules for selecting alternate states.  You can also send alerts to cars if something is detected (a traffic jam caused by an accident, for example) well ahead, and include a new route.  We have this sort of thing in auto GPSs today; they can receive broadcast traffic alerts.  We need an expanded version in any event-driven system so we can divide the tasks that need local hosting from those that can be more efficiently handled deeper in the cloud.
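The division of labor in the car example might be sketched like this.  A long-loop process builds a plan that carries its own invalidation conditions, and the short-loop process in the car only checks local conditions against that plan.  The class, field names, and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoutePlan:
    """Plan computed by a long-loop process and sent to the car (hypothetical)."""
    segments: list          # ordered road-segment identifiers
    max_delay_min: float    # plan is invalid if expected delay exceeds this


def plan_still_valid(plan, expected_delay_min, closed_segments):
    """Apply the invalidation conditions that were shipped with the plan."""
    if expected_delay_min > plan.max_delay_min:
        return False
    return not any(seg in closed_segments for seg in plan.segments)


def short_loop_step(plan, obstacle_ahead, expected_delay_min, closed_segments):
    """Decision made locally in the car on every cycle."""
    if obstacle_ahead:
        return "brake"                # short loop: must be handled locally
    if not plan_still_valid(plan, expected_delay_min, closed_segments):
        return "request_new_route"    # hand back to the long-loop process
    return "follow_route"
```

Only the "request_new_route" outcome needs to leave the car, so the deeper process is consulted when the preferred state no longer holds rather than on every event.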

We also need to be thinking about securing all this.  An as-a-service framework is subject to hacking as much as a sensor would be, though it’s likely easier to secure it.  There is one risk that’s particular to such a framework, though, and that’s the risk of impersonation.  If you have an event-driven system sensitive to external messages, you have a system that can be manipulated by spoofing.  Since event processing is about flows, we need to understand how to secure all the flows to prevent impersonation.
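One common way to resist impersonation is to authenticate each event message with a keyed hash.  The sketch below uses Python's standard hmac module and assumes the sender and receiver already share a secret key; how that key is distributed, and the event fields themselves, are outside the sketch and purely illustrative.

```python
import hashlib
import hmac
import json


def sign_event(key: bytes, event: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_event(key: bytes, event: dict, signature: str) -> bool:
    """Accept the event only if the tag matches (constant-time comparison)."""
    return hmac.compare_digest(sign_event(key, event), signature)


if __name__ == "__main__":
    key = b"shared-secret-for-illustration-only"
    alert = {"type": "traffic_alert", "segment": "s1"}
    tag = sign_event(key, alert)
    print(verify_event(key, alert, tag))                          # genuine message
    print(verify_event(key, {**alert, "segment": "s2"}, tag))     # altered message
```

A tag like this shows that a message came from a holder of the key and wasn't altered, but it doesn't stop an attacker from replaying an old, genuine message; a timestamp or sequence number inside the signed event would be needed for that.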

As-a-service is critical for future cloud applications, but particularly so for event-driven systems.  By presenting as “services” the information requirements that are derived from analytics, and not just triggered by a simple event, we can simplify applications and help divide tasks so that we use more expensive edge resources more efficiently.  To get the most from this model, we’ll need to rethink how we network our edge, and how we build applications to use those services optimally.

Amazon Signals a Major Shift in Software and the Cloud

Amazon is making its Greengrass functional programming cloud-to-premises bridge available to all customers, and Nokia has already announced its support on Nokia Multi-Access Edge Computing (MEC) gear.  This is an important signal to the market in the area of IoT, and also a potentially critical step in deciding whether edge (fog) computing or centralized cloud will drive cloud infrastructure evolution.  It could also have profound impact on chip vendors, server vendors, and software overall.

Greengrass is a concept that extends Amazon’s Lambda service outside the cloud to the premises.  For those who haven’t read my blogs on the concept, the Lambda service applied functional programming principles to support event processing and other tasks.  A “lambda” is a unit of program functionality that runs when needed, offering “serverless” computing in the cloud.  Amazon and Microsoft support the functional/lambda model explicitly, and Google does so through its microservices offering.

The challenge that Amazon faced with Lambda was a poster child for the edge/central cloud issue I’ve blogged about (most recently last week).  The most compelling application for Lambda is event processing, including IoT.  Most event processing is associated with what are called “control loop” applications, meaning that an event triggers a process control reaction.  These applications typically demand a very low latency for the obvious reason that if, for example, you get a signal to kick a defective product off an assembly line, you have a short window to do that before the product moves out of range.  Short control loops are difficult to achieve over hosted public cloud services because the cloud provider’s data center isn’t local to the process being controlled.  Greengrass is a way of moving functions out of the cloud data center and into a server that’s proximate to the process.
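For readers who want to see what such a function looks like, here is a minimal sketch of the assembly-line case, written in the general shape of a Python function handler (an event plus a context argument).  The event fields, the threshold, and the returned action format are assumptions for illustration, not anything defined by Amazon.

```python
# Hypothetical stateless function for the assembly-line example.
# Assumption: an inspection sensor emits events like
#   {"item_id": "A17", "defect_score": 0.93}
DEFECT_THRESHOLD = 0.8


def handler(event, context=None):
    """React to one inspection event; no state is kept between invocations."""
    item_id = event.get("item_id")
    score = event.get("defect_score", 0.0)
    if score >= DEFECT_THRESHOLD:
        return {"action": "kick", "item_id": item_id}
    return {"action": "pass", "item_id": item_id}
```

Because the function depends only on the event it's handed, the same code can run in a cloud data center or on a box next to the conveyor; what changes is the length of the control loop, not the logic.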

The obvious message here is that Amazon knows that event-processing applications will need edge-hosting of functions.  Greengrass solves that problem by moving them out of the public cloud, which is good in that it solves the control-loop-length problem and bad in that it denies Amazon the revenue associated with the running of the functions.  To me, this is a clear signal that Amazon eventually wants to offer “edge hosting” as a cloud service.  That, in turn, means the cloud event-processing opportunity creates the need for edge hosting, and so IoT creates it.

There are few credible IoT applications that aren’t related to some kind of event processing since IoT is all about handling sensor-created events.  Thus, a decisive shift toward IoT as a driver of cloud deployments could shift the focus of those deployments to the edge.  This could change a lot of power balances.

In the cloud provider space, edge hosting is problematic because of real estate.  Cloud providers have traditionally focused on a small number of large data centers, not only for reasons of economy of scale in hosting, but to avoid having to acquire facilities in every metro area.  Amazon may be seeing Greengrass as an opportunity to enter the event fray with an “integrated hybrid cloud” approach, where they could license a cloud service that includes the option for premises hosting.  However, facility-based service providers (telcos, ISPs, cablecos, etc.) would have edge-hosting-ready real estate to exploit, and that could force the traditional cloud providers to look for their own space.

On the vendor side, edge hosting would be a boon to the chip vendors, particularly vendors who focus not on chips for “servers” but chips designed to run the more compute-intensive functional programming components associated with event processing.  The event-cloud model could look like a widely distributed set of compute nodes, requiring what could be millions of new chips.

At the same time, edge hosting divides the chip opportunity, or perhaps even totally changes it.  Functional programming is highly compute-intensive, to the point where strict adherence to its principles would make it a totally compute-driven process.  General-purpose server chips can still execute functional programs, but it’s likely that you could design a function-specific chip that would do better, and be cheaper.

At the server design level, we could see the possibility of having servers made up of more of these specialized chips, either by having dense multi-chip boards or by having a bunch of “micro-boards” hosting a traditional number of chips per board.  The combination would provide an entry point for a lot of new vendors.

This shift would favor (as I pointed out last week) network equipment vendors as providers for “hosting”.  A network-edge device is a logical place to stick a couple compute boards if you are looking for event processing support.  This wouldn’t eliminate the value of and need for deeper hosting, since even event-driven applications have back-end processes that look more like traditional software than like functional programming, but it would make the back-end the tail to the event-edge dog.

On the software side, event-focused application design that relies on functional programming techniques could shift the notion of cloud applications radically too.  You don’t need a traditional operating system and middleware; functional components are more like embedded control software than like traditional programs.  In fact, I think that most network operating systems used by network equipment vendors would work fine.

That doesn’t mean there aren’t new software issues.  Greengrass itself is essentially a functional-middleware tool, and Microsoft offers functional-supporting middleware too.  There are also special programming languages for functional computing (Haskell, Elm, and F# are the top three by most measures, with F# likely having a momentum edge in commercial use), and we need both a whole new set of middleware tools and a framework in which to apply them to distributed application-functional design.

The issues of functional software architectures for event-handling are complicated, probably too complicated to cover in a blog that’s not intended purely for programmers.  Suffice it to say that functional event programming demands total awareness of workflows and latency issues, and that it’s probably going to be used as a front-end to traditional applications.  Since events are distributed by nature, it’s reasonable to expect that event-driven components of applications would map better to public cloud services than relatively centralized and traditional IT applications.  It’s therefore reasonable to expect that public cloud services would shift toward event-and-functional models.  That’s true even if we assume nothing happens with IoT, and clearly something will.

What we can say about functional software impacts is that almost any business activity that’s stimulated by human or machine activity can be viewed as “event processing”.  A transaction is an event.  The model of software we’ve evolved for mobile-Internet use, which is to front-end traditional IT components with a web-centric set of elements, is the same basic model that functional event software implements.  Given that, it is also very possible that functional logic will evolve to be a preferred tool in any application front-end processes, IoT and machine-driven or human-based.

That means that Amazon’s Greengrass might be a way for Amazon to establish a role for itself in broader IT applications.  Since Amazon (and Microsoft, and Google) also have mobile front-end tools, this might all combine to create a distinct separation of applications between a public-cloud-front-end component set and a traditional data-center-centric back end.  This, and not the conversion of legacy applications to run in the cloud, would then be the largest source of public cloud opportunity.

A “functional cloud” would also have an impact on networking.  If we assume that event processing is a driver for the future of cloud services, then we have to assume that there is a broad need to control the length of that control loop, meaning network latency.  Edge-hosting accomplishes that for functional handling that occurs proximate to the event source, but remember that all business applications end up feeding more traditional deeper processes like database journaling.  In addition, correlation of events from multiple sources has to draw from all those sources, which means that the correlation has to be sited conveniently to all, and have low-latency paths.  All of this suggests that functional clouds would have to be connected with a lot of fiber, and that “data center interconnect” would become a subset of “function-host-point interconnect”.
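The siting problem for correlation can be sketched as a simple minimax choice: pick the host point whose worst-case latency to any of the event sources is smallest.  The host names, source names, and latency figures below are made up for illustration, and a real placement decision would weigh cost and capacity too.

```python
def worst_case_ms(latencies_to_sources):
    """Worst-case latency from one host point to any event source."""
    return max(latencies_to_sources.values())


def best_correlation_host(latency_ms):
    """Pick the host whose worst-case latency to the sources is smallest.

    latency_ms maps host -> {source: one-way latency in ms}.
    """
    return min(latency_ms, key=lambda host: worst_case_ms(latency_ms[host]))


if __name__ == "__main__":
    # Illustrative figures only.
    latency_ms = {
        "edge_east": {"sensor_a": 1.0, "sensor_b": 9.0},
        "edge_west": {"sensor_a": 8.0, "sensor_b": 1.5},
        "metro_dc": {"sensor_a": 4.0, "sensor_b": 4.5},
    }
    print(best_correlation_host(latency_ms))
```

In this made-up case the metro data center wins even though each edge point is closer to one source, which is the argument for low-latency paths among host points and not just to the nearest edge.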

Overall, the notion of a function-and-event-driven cloud could be absolutely critical.  It would change how we view the carrier cloud because it would let carriers take advantage of their edge real estate to gain market advantage.  It’s been validated by all the major public cloud providers, including OTT giant Google.  Now, Amazon is showing how important edge hosting is.  I think it’s clear that Amazon’s decision alone would carry a lot of predictive weight, and of course it’s only the latest step on an increasingly clear path.  The times, they are a-changin’.