Can Multiple Paths to Carrier Cloud Get Us to the Destination?

Video and advertising are the most credible drivers for carrier cloud over the next five years, and they may be the most critical even in the long run.  They’re also the best hope for early deployment of edge computing.  The question is what kind of software and infrastructure video and advertising might end up deploying, and how the deployed stuff might then facilitate other applications of carrier cloud.

One useful way of assessing how carrier cloud drivers might interact is to look at the very near term (2019) versus medium-term (2020-2022) potential for justifying carrier cloud deployments, and see what the most important symbiosis points would be.  This year and next, video/advertising represents half the total opportunity drivers for carrier cloud, but for the following three years it represents only about a quarter of the opportunity.  Three other areas (operator cloud services, contextual and personalization services, and IoT) will each double their influence on carrier cloud between 2019 and 2022.  Realizing these mid-term opportunities would be most likely if video/contextual services prepped infrastructure for them.

The requirements for video/contextual services are evolving largely due to the pressure of streaming video in general, and streaming live TV in particular.  The classic “what’s on” model of scheduled viewing is still the dominant source of ad revenues, and so if we transition to streaming for live TV we can expect to have to at least sustain, if not improve, advertising revenues through the shift.  This seems to depend on three major factors: content delivery network (CDN) caching for both programming and commercials, personalization of ad delivery based on all possible monetizable factors, and efficient insertion of ads within the commercial slots.

CDN use in live programming favors edge caching because the fact that programming is delivered on a specific schedule makes it possible to get reasonable estimates of audience size and avoid the classic streaming problem of having a thousand viewers of the same material distributed over just enough time to make efficient delivery difficult.  There is some evidence that even time-shifted (cloud DVR or on-demand) viewing of scheduled content is more predictable, and thus makes edge caching more efficient.

Edge caching is beneficial to any form of ad insertion in live video, for the simple reason that it’s difficult to accommodate any time slippage in delivery when you’ve got to fit things seamlessly into time slots.  Advertising (commercials), at least in personalized form, doesn’t run on as dependable a schedule as live TV, but there are generally fewer commercials on tap for delivery and it’s not unreasonable to assume that you can cache most of them close to the edge in most metro areas where demand density is high.

Caching near the edge would be a significant driver for carrier cloud and edge computing, because the current trend in CDNs is toward hosted functionality rather than custom appliances.  Furthermore, hosted CDN functionality allows operators to allocate a pool of resources to CDN and other carrier cloud applications, creating the potential for that positive leverage I noted earlier.  However, it doesn’t mandate either carrier cloud or edge caching, because in theory operators could cede CDN responsibility to the content providers.  This doesn’t seem likely if the operators are using CDN to support their own streaming live TV services, though.

Ad insertion raises the question of personalization, our second factor.  If ads are selected to fit into predefined programming slots, the selection can consider the specific demographics and interests of each consumer, which means ads can command higher prices.  That’s important in an age where the number of commercial minutes per show is already annoying many viewers, inducing them to move to time-shifted viewing or even to abandon live TV in favor of streaming libraries from firms like Hulu, Netflix, or Amazon.

The current state of the art in personalization of ads doesn’t require real-time intervention or perhaps even much specific customization.  Anyone in the content delivery chain who has the right to insert ads could provide profile information to facilitate ad selection, and that information would be sufficient to pick the right commercials from an inventory of cached material.  Propagation delay in this process wouldn’t be a meaningful factor.
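As a rough illustration of how little real-time machinery this requires, here’s a minimal sketch of picking a commercial from an edge cache using profile information supplied by whoever holds the ad rights.  The fields, the ad inventory, and the scoring rule are all hypothetical; the point is only that selection is a simple lookup against cached material.

```python
# A minimal sketch, with made-up fields, of picking a commercial from an
# edge cache using a viewer profile. Inventory and scoring are illustrative.

CACHED_ADS = [
    {"id": "ad-101", "topics": {"autos"},  "demographic": "25-54", "cpm": 32.0},
    {"id": "ad-102", "topics": {"travel"}, "demographic": "18-34", "cpm": 27.5},
]

def pick_ad(profile):
    # Score by overlap with the viewer's interests, then demographic match,
    # then the price the ad commands; the best-scoring cached ad wins.
    def score(ad):
        return (len(ad["topics"] & profile["interests"]),
                ad["demographic"] == profile["demographic"],
                ad["cpm"])
    return max(CACHED_ADS, key=score)["id"]

print(pick_ad({"interests": {"autos", "sports"}, "demographic": "25-54"}))
```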

Both advertisers and content providers tell me that they expect to see more precise targeting in the future.  One thing of particular interest is linking commercials to current online search and social media activity, and another is linking ads to programming based on the specifics of the content at the time of the commercial insertion.  These accommodate the specific interest of a viewer, perhaps stimulated by other ads or by paid product appearances in the content itself.  In either case, a more real-time selection of commercials would be required, and this could (but wouldn’t necessarily) promote edge computing.  It would surely be a good carrier cloud application.

Ad personalization is obviously a lead-in to what I’ve called “contextualization and personalization”.  This collective application involves understanding more about what a user/viewer is doing at the moment as well as demographic or historical factors, and using the combination to anticipate what the user’s wants and interests could be at any moment.  Providing for this could cover a lot of ground in terms of data gathering and even event processing, but much of the personalization goals of video/advertising don’t require event-handling at all, and therefore wouldn’t necessarily promote edge computing.

That seems to open the key carrier cloud evolution issue.  Right now, video/advertising would promote carrier cloud and the CDN/caching mission could promote edge computing, but neither is guaranteed to require or promote event-based applications.  That would mean that event-driven contextualization of user interests and the integration of IoT might not fall out of early carrier cloud development, which could slow their impact on further promotion of carrier cloud.

This illustrates one of the big issues we face in promoting technology revolutions; multiple drivers are often needed to create the massive benefits needed to fund the revolution, and yet the drivers may not fuel a common technology shift that benefits everything that we’re depending on.  If we can’t harness video/advertising to build out the correct model at the edge, then future carrier cloud applications will have to justify their own smaller revolutions to deploy, creating a less-than-optimum path to the future.

The Multidimensional Changes in the Cloud Market

The cloud has been a kind of cultural influence on the IT space for years, but the influence is now becoming more direct, and the pace of that shift is picking up very quickly.  Each of the factors that seem to be driving this shift is important, and in combination they might be transformative.

The biggest factor in the shift is the growing influence of enterprise buyers on cloud provider revenues.  While many have seen the public cloud as the natural destination for every corporate application, the truth is that most corporate applications are staying where they started—in the data center.  As a result, early cloud revenues have been dominated by startups, particularly those involved in some way in ad-sponsored services, like social networking.  Amazon’s supremacy in the public cloud space owes much to its dominance of that particular niche.  However, selling cloud services to startups is clearly not a growth space; most of the big winners end up going to their own infrastructure eventually, too.  Enterprises are now working their way into the market.

Microsoft has a natural advantage in enterprise cloud, because nearly all enterprise cloud aspirations aim at supplementing data center or in-house-hosted applications with specialized front-ends.  Microsoft has server and desktop dominance, and so can create the kind of symbiosis between the cloud and the premises that users want to see.  As a result, my own surveys show Microsoft’s Azure leads Amazon’s AWS in enterprise cloud sales, and there’s some indication that lead is growing.

Amazon’s initial reaction to this was to attempt to extend its own cloud outside its boundaries using Greengrass, a server technology capable of running AWS code.  This approach hasn’t proved to be sufficient, and most recently, Amazon has extended its relationship with VMware to create a stronger alliance to counter Microsoft and Azure.  I think even this step is too tentative because it’s not backed by any strident new positioning, the thing likely needed to get enterprises to pay attention to the moves.

Both Amazon and Microsoft are facing a common enemy in the form of IoT-and-event platforms designed for hosting functions/lambdas outside the public cloud.  Serverless or functional computing has been available from all the major cloud providers, but users have reported that the delays associated with loading functions on demand make many event-driven applications less than impressive.

VMware has its own challenges, of course.  The champion of virtualization lost momentum to both OpenStack and containers, and has been struggling to get it back.  They’ve now recognized that SDN management of hybrid and multi-cloud address spaces could be the secret sauce, and so they’ve been promoting NSX as the premier SDN strategy—something that in terms of total users and total connected applications seems to be true.

SDN and networking have confounded the cloud providers too, at least so far.  VMware has been promoting its ability to connect AWS applications, but most of the major SD-WAN players can do that as well, so the differentiation value of VMware’s positioning has been limited.  Amazon hasn’t jumped on VMware, it’s said, because VMware also wants to connect Azure clouds, for the good reason that doing so would serve VMware’s own cloud interests better.

The network side of the cloud seems destined (eventually) to be what I’ve called “logical networking” or “network-as-a-service” (NaaS).  The elasticity and agility of the cloud make traditional location-based networking obsolete, but very few vendors have the features needed to promote an alternative, and none seems willing to position it aggressively.  The question, in my view, is whether public cloud providers or network operators will step in and take up the NaaS mantle.  Certainly the cloud providers are more agile and traditionally more aggressive early market movers, but the operators probably know they need a strong SD-WAN and NaaS approach badly to avoid complete cannibalization of their business services, and to support their own cloud computing initiatives (both internal use of carrier cloud and enterprise cloud computing offerings).

Google, so far, has been a laggard in all the missions I’ve talked about here, despite the fact that they have arguably the best technology for their cloud and also the best SDN/NaaS model.  It seems often that Google wants the market to come to them, recognizing the error of the “old ways” and embracing Google’s model even if they’re not totally sure what that model is.  They may be right in the long run, but they stand to lose a lot of positioning value near-term, and also risk having customers get entrenched in alternative models.

Microsoft seems to have the best position in the ongoing enterprise cloud services war.  They are leading at the moment, they have an established enterprise data center and desktop position, and they have a strong public cloud offering.  All this gives them the enormous advantage of a broad and simple message, one nobody else controls.  Amazon and Google, their rivals, have the disadvantage of relying on things that are really not in their control at all.  Amazon doesn’t control any credible premises strategy, and even an alliance with VMware won’t give them one.  Google’s best known for Kubernetes, which is the centerpiece of so many strategies that it’s hard for enterprises to really understand what it is and where it’s going.

VMware has a little of Microsoft’s benefits in the “singleness” sense.  OpenStack and Docker, like Kubernetes, are pulled in a million different directions, and even though VMware is behind the collective movements they’re probably ahead of any of the individual players in either space.  Add that to the fact that their whole future strategy can speak with one voice because they own it, and you could see them developing into something useful, even commanding.  Add to that VMware’s recent purchase of a multi-cloud management company, its reduced AWS pricing, and its aggressive courting of both Microsoft and Amazon, and you see a player who thinks they have an opportunity.

Do they?  It’s the application models that will likely set the course for the cloud.  There’s a lot of good work being done, but much of it focuses on developers and is (by management/executive reckoning) incredibly geeky in terms of documentation.  As a former software architect I can understand the idea that tools will reshape development, which will reshape applications, which will reshape business practices.  It just takes a long time, much longer than it would if executive mandates for change were created by jumping right to the business practices and benefits side.  Every player who’s looking for cloud supremacy has to deal with the evolution of the application model, not only in a technical/software sense but in a positioning and education sense.  The latter has been a tough nut for today’s market to crack, and it’s not going to get easier.

A Wrap-Up on Event-and-Model-Driven ZTA

Continuing with my model-and-event-driven service automation theme, some of you wondered whether I could dig a little deeper into how state/event processes were hosted, and also why ExperiaSphere mandated the separation of “orchestration” into a service and resource layer.  This also gets into how federation of services across provider boundaries (or across administratively separate parts of the same provider) would work.  I reference my general description of the architecture HERE.  I don’t plan to take these issues deeper in blogs unless there’s a new development, but feel free to ask questions on LinkedIn!

The presumptions I’ve consistently made in modeling are that service models should be infrastructure-independent and that each model element be a “black box” or intent model.  You’ll recall that a model element, when “decomposed” could yield either other model elements (a hierarchy) or an atomic commitment of resources.  You may also recall that a given model element could decompose selectively, meaning it would yield different sub-components (either hierarchical model elements or resource commitments) depending on things like where the service was delivered or what parameters were provided in the order.

An example of a selective decomposition is where something like a cloud-hosted firewall, a uCPE-hosted firewall, or a firewall appliance might be selected depending on the service order.  All these options are resource-specific, which means that if we allowed a “service model” to contain them, the model would be brittle because changes in hardware options might change how the model had to describe the decomposition to the resource level.

My solution to this was to say that the decomposition into resources was never part of a service model, but rather a part of a resource layer.  The service model would then say (essentially) “I need a firewall in Oshkosh” and an Oshkosh-specific resource-layer model would then undertake to fulfill that need based on the parameters and what was locally available.  Since what the service layer asks for in this example is a binding to resources, I’ve said that the resource layer asserts behaviors that are abstract functional properties, and these are then linked to requests for those behaviors in service models.  There are a number of benefits to this approach.

First and most obvious, this lets service models stay functional in nature, immune to changes in infrastructure that could otherwise require that models be reformed when technologies evolve.  The service model at the bottom of a hierarchy would bind to a behavior in each area where it needed resources.  Oshkosh, in my example, is a location where customer access is needed and therefore where a firewall is needed.  Whatever firewall options are available in Oshkosh are then candidates for selection.
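To make the binding concrete, here’s a minimal sketch of selective decomposition in the resource layer.  This is not ExperiaSphere code: the class names, the candidate firewall implementations, and the selection rule are all hypothetical.  The real point is that the service model only ever names a behavior and a location, and the resource layer decides how to satisfy it.

```python
# A minimal sketch: the resource layer for a domain asserts Behaviors, and
# binding one selectively decomposes to whatever implementation is locally
# available. All names here are hypothetical.

class Behavior:
    """An abstract functional property asserted by the resource layer."""
    def __init__(self, name, implementations):
        self.name = name
        self.implementations = implementations   # ordered candidate list

    def bind(self, params):
        # Selective decomposition: pick an implementation based on the
        # order parameters and what this domain can actually host.
        for impl in self.implementations:
            if impl.can_satisfy(params):
                return impl.commit(params)       # opaque handle plus SLA
        raise RuntimeError(f"no implementation of {self.name} available")

class CloudFirewall:
    def can_satisfy(self, params): return params.get("hosting") == "cloud"
    def commit(self, params): return {"handle": "fw-cloud-001", "sla": {"availability": 0.999}}

class ApplianceFirewall:
    def can_satisfy(self, params): return True   # the fallback option
    def commit(self, params): return {"handle": "fw-appliance-042", "sla": {"availability": 0.9995}}

# The "Oshkosh" resource domain asserts a Firewall behavior; the service
# model asks for it by name and sees only the returned handle and SLA.
oshkosh = {"Firewall": Behavior("Firewall", [CloudFirewall(), ApplianceFirewall()])}
print(oshkosh["Firewall"].bind({"hosting": "premises"}))
```

Because the service layer sees only the handle and SLA, changing the candidate list in Oshkosh never touches the service model.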

A less obvious point is that if the resource layer and the service layer are connected only by a binding to a behavior, anyone could own the resources a service called on, as long as they exposed the proper behavior. That makes this process “self-federating”, meaning that you don’t have to do anything special to cross a provider/administrative boundary other than what you always do, which is to support service/behavior binding.

A further benefit of this model is that because the resource layer exports the behavior from an intent-modeled object that then decomposes to the correct resource commitments, the resource details are hidden and the only thing the service layer sees is the SLA and parameters.  That’s what a wholesale/retail relationship requires.

There’s still more.  If something breaks in the implementation of a bound resource behavior, the service layer could simply re-bind the resource, which would redeploy things and presumably select something that works.  Further related to things breaking, the process of relating actual infrastructure conditions to the state/event processes of a model only happens within the resource layer; the service layer only operates on events generated by other intent-modeled service elements or on the events generated by the model element that represents the bound resources.

This approach to resources and Behaviors could also put some service-specific teeth into the notion of intent-modeled infrastructure.  You can model resource behavior in hierarchical layers too, so it would be perfectly reasonable to declare some Behaviors associated with hosting, for example.  These Behaviors could then be satisfied by intent models for cloud, container, and different implementations of both.  Even bare metal would be a reasonable option.

My final point on separating services and resources in modeling is the benefit this can bring to securing the service lifecycle processes themselves.  Resource commitments are likely to include data-plane connections, which must be in an address space visible to the user.  In addition, many of the management interfaces and tools designed for either network or server elements are visible in the data plane.  It’s desirable to keep the data plane isolated from the management/control plane of our lifecycle processes for security reasons, and it’s easier to do that if the service portion is separated in a different address space.  Control/resource separation facilitates that.

The fact that service models and resource models are independent doesn’t mean that they couldn’t be orchestrated by common Factory processes where the functionality was the same, or even share state/event processes.  The point is to isolate to the extent that isolation is valuable.

Let’s look now at the nature and implementation of state/event processes, which is where the lifecycle logic resides.  The general model I used in ExperiaSphere was that the entire service data model (either service or resource) would be passed to any referenced process.  This could be done by passing the actual data structure, or by passing a reference to the model in the repository.  The process needs to know the object it’s being invoked on and the data associated with that object and nothing else, so it would be possible to pass only that, but it raises the question of synchronization should several processes be operating on the model at the same time.

The process itself would surely be represented in the data model as a URI (Uniform Resource Identifier, the general case of the familiar URL), and the only requirement would be that the referenced process be visible in the network address space of the Factory that’s invoking it.  My vision was that there would be a “Factory address space” where all the service factories and related processes were hosted, but you could segment this by service or any other useful means, subject only to the fact that this might restrict the reuse of common processes.

The processes themselves could in theory be hosted anywhere, at least for those referenced in a service model.  Resource processes might be best hosted in proximity to the resources themselves, which is why I took the view that each “hosting domain” or “cluster” of servers would likely have its own Service Factory (or Factories) and would host the appropriate lifecycle processes for the Behaviors it hosted and the models it decomposed.  Remember that a Factory is general logic to parse a model; the service lifecycle processes are just referenced in the model and invoked by the Factory as appropriate.  In theory, both Factories and processes could be totally stateless and thus scalable and resilient.
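A minimal sketch of that dispatching idea follows.  The registry, the URI scheme, and the model layout are assumptions made for illustration; the essential points are that the Factory logic is generic, processes are located by URI, and neither has to hold state between events.

```python
# A minimal sketch of generic, stateless Factory dispatch. The registry
# stands in for whatever is reachable in the "Factory address space";
# the URIs and model fields are hypothetical.

PROCESS_REGISTRY = {
    "proc://decompose/primary": lambda model, elem, event: print(f"decomposing {elem}"),
    "proc://deploy/behavior":   lambda model, elem, event: print(f"deploying {elem}"),
}

def factory_handle(model, element_name, event):
    """Generic Factory logic: parse the model, find the state/event entry,
    and invoke the referenced process. No state is retained here."""
    element = model["elements"][element_name]
    process_uri = element["state_event_table"][(element["state"], event)]
    PROCESS_REGISTRY[process_uri](model, element_name, event)

model = {"elements": {"P": {"state": "Ordered",
                            "state_event_table": {("Ordered", "Activate"): "proc://decompose/primary"}}}}
factory_handle(model, "P", "Activate")
```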

 This again raises the point of addressing and address spaces, a topic I think we’ve underthought in looking at modern virtualized or partially virtualized service infrastructure.  Everything that communicates has to be addressed, and the address space of the service itself, the data plane, also has to be considered.  If traditional management APIs are to be supported, they have to be connected.

I think the easiest way to think of a service is to think of it as a private subnet that contains the actual service-specific resources, much like the way a container-based application is deployed.  Addresses within that subnet are then selectively exposed to a broader space, which might be the service data plane or the service lifecycle process (the resource layer) plane.  Many models are possible, but we need to structure things the same way or we’ll have diverse implementations that require diversions in lifecycle practices, and that will make universal management much more difficult.
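Here’s one way to picture that, as a minimal sketch: a service subnet holds everything privately, and only named members are published into the data plane or the lifecycle plane.  The addresses and plane names are purely illustrative.

```python
# A minimal sketch of the "service as a private subnet" idea: members get
# private addresses, and only selected ones are exposed into other address
# spaces (planes). The addresses and plane names are illustrative only.

class ServiceSubnet:
    def __init__(self, cidr):
        self.cidr = cidr
        self.members = {}    # name -> private address
        self.exposed = {}    # plane -> {name: private address}

    def add(self, name, private_addr):
        self.members[name] = private_addr

    def expose(self, name, plane):
        # Publish just this member's address into the named plane;
        # everything else in the subnet stays hidden.
        self.exposed.setdefault(plane, {})[name] = self.members[name]

svc = ServiceSubnet("10.200.1.0/24")
svc.add("firewall-vnf", "10.200.1.10")     # data-plane element
svc.add("fw-mgmt-api", "10.200.1.11")      # management interface
svc.expose("firewall-vnf", "data-plane")
svc.expose("fw-mgmt-api", "lifecycle-plane")
print(svc.exposed)
```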

For those who want to dig deeper into the modeling, there’s a whole series of tutorial slides on the ExperiaSphere website.  I recommend you view the annotated version of the presentations where they’re provided, because the notes per slide are helpful.  As I said above, I’m not going to pursue these points further in a blog unless (as I hope there will be!) there are new developments in the space.

An Example of a State/Event Implementation of ZTA

When I did my blog yesterday on the problems with the ETSI ZTA software architecture, I had a number of emails asking how you could do lifecycle management using state/event principles.  They showed me that one problem we have in coming to a good consensus on ZTA software is the general lack of understanding on how to view “lifecycles” in event terms.

I built the original ExperiaSphere project in Java based on Service Factories, state/event, and componentized data models, and the same principle was used to define the second stage of the project, documented in a series of PowerPoint presentations.  You can get a specific and detailed explanation of what I’ll go over in this blog by reviewing THIS presentation.

People’s questions showed that most think of a service as being something you “deploy” in a step, then manage by fielding any faults that come along.  This is an intrinsic retreat to the functional view, and the way my email contacts suggested I reverse it is to explain how you do state/event processing of a simple lifecycle.  I’ll do that, but to make it work I need to set the stage in a FIGURE.

In this figure, we see a service order describing the service as a series of intent models (the pentagons).  We also see a provider network “oval” and within it a series of “Service Factories” where the logic to associate events with a service model is hosted.  The Factories contain the necessary code to parse/decompose the data model itself, and also to pass events within the model.

The service data model is a structure that holds service data, either fixed or inherited from the service order, and also the structural relationship between intent-modeled service components.  Each component of a service (the lettered pentagons in the figure) is an intent model, and each intent model can contain either a deeper hierarchical structure (as the “B’s” do in the figure) or actual resource commitments.  Each component has an SLA that it administers internally, and each is responsible for generating an event only to superior/subordinate elements, to coordinate changes in overall state.
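For reference in the walkthrough that follows, here’s a minimal sketch of what such a data model instance might look like.  The field names are hypothetical; only the shape matters: a hierarchy of intent-modeled elements, each carrying its own SLA, state, and state/event table.

```python
# A minimal sketch of a service data model instance. Element names follow
# the example in the text ("P", "A", "B", "B1"..."B3"); every field name
# here is hypothetical.

service_model = {
    "order_id": "ORD-1001",
    "elements": {
        "P":  {"state": "Ordered", "sla": {"availability": 0.999}, "children": ["A", "B"],
               "state_event_table": {("Ordered", "Activate"): "proc://decompose/primary"}},
        "A":  {"state": "Ordered", "sla": {"availability": 0.999}, "children": []},
        "B":  {"state": "Ordered", "sla": {"availability": 0.999},
               "children": ["B1", "B2", "B3"], "select_one": True},
        "B1": {"state": "Ordered", "sla": {"availability": 0.999}, "children": []},
        "B2": {"state": "Ordered", "sla": {"availability": 0.999}, "children": []},
        "B3": {"state": "Ordered", "sla": {"availability": 0.999}, "children": []},
    },
}
```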

When an order is taken, the order entry process will retrieve a standard blank order template for the service and populate it with service-specific data.  The instance of the service data model is then created and stored in a repository, and all the intent models within it are in the “Ordered” state.  A primary factory (here, Factory 101) is assigned, and an “Activate” event is sent to Factory 101 referencing the order.  Factory 101 then retrieves the order instance from the repository.

Within the “P” element is a state/event table, and that table defines an entry for the Activate event in the Ordered state.  This entry would point to the logic needed for “primary decomposition” of the data model.  That decomposition would take place based on the location where the service is to be delivered, for example; in our example it identifies that components “A” and “B” are actually needed, assigns a factory to each (Factories 111 and 121, respectively), and dispatches an Activate to each, again referencing the service order and element.

Factory 111 has all it needs, and so it decomposes its “A” object to direct provisioning.  Factory 121 has a choice of sub-elements (B1 through B3) and decides that B1 is what it needs, so it then sends an Activate to Factory 151 referencing Element B1 of the order instance.  The B1 element is decomposed there, and it yields the deployment logic for that component.  As each of the subordinate factories completes its task, it changes its element’s state from Ordered to Active, and it generates an event to the superior object and factory, signaling “Active”, which then changes the state of the superior element to Active.  This rolls back up to the primary “P” element, which then reports the service as active.

Now let’s assume we have a fault condition within the domain of Factory 151, where the responsibility for the B1 service element resides.  Factory 151 deployed infrastructure as a result of decomposing B1, and since B1 is intent-modeled, that infrastructure is responsible for meeting its SLA or reporting a fault.  Thus, if the problem that occurs can be addressed within B1, nothing has to happen outside.  If the fault cannot be corrected internally, then a “Fault” event is generated for B1 for this service.  In theory, that fault could be handled by any Factory, but let’s assume it’s handled by the one that deployed it (which is likely in most cases).

Factory 151 then processes the Fault: it sets B1 into the Fault state, and it then reports a Fault to its parent component “B”, which you’ll recall is handled by Factory 121.  That factory can then undertake whatever actions would be suitable to correct things.  For example, it could generate a “Teardown” to B1, restoring it to the Ordered state, and reactivate it.  It could also decide (if possible) to deploy an alternative configuration like B2 or B3, sending an Activate to the selected component at one of the other Factories.

Every service condition here is an event.  Every event is associated with a given service/element, and its data model has a state/event table that identifies the process that’s to handle that particular combination.  The processes can all be stateless because all the data associated with the service is recorded in its record in the repository, which includes everything that every process did or needs, for every component of the service.
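That fault flow can be sketched in a few lines of code.  This is a toy illustration, not ExperiaSphere code: the handler table stands in for the per-element state/event tables, and the choice of an alternative is simplified to picking the first candidate.

```python
# A minimal sketch of the fault flow just described: B1 raises a Fault, its
# Factory marks it and escalates to parent "B", which tears B1 down and
# activates an alternative (B2). All structures and names are illustrative.

model = {
    "B":  {"state": "Active",  "parent": None, "alternatives": ["B2", "B3"]},
    "B1": {"state": "Active",  "parent": "B"},
    "B2": {"state": "Ordered", "parent": "B"},
    "B3": {"state": "Ordered", "parent": "B"},
}

def send_event(element, event):
    print(f"event {event} -> {element}")
    HANDLERS[(element, event)](element)

def handle_fault(element):                      # runs in the deploying Factory
    model[element]["state"] = "Fault"
    send_event(model[element]["parent"], "ChildFault")

def handle_child_fault(parent):                 # runs in the parent's Factory
    faulted = next(name for name, e in model.items()
                   if e.get("parent") == parent and e["state"] == "Fault")
    send_event(faulted, "Teardown")                           # back to Ordered
    send_event(model[parent]["alternatives"][0], "Activate")  # try B2 instead

def handle_teardown(element): model[element]["state"] = "Ordered"
def handle_activate(element): model[element]["state"] = "Active"

HANDLERS = {
    ("B1", "Fault"):      handle_fault,
    ("B",  "ChildFault"): handle_child_fault,
    ("B1", "Teardown"):   handle_teardown,
    ("B2", "Activate"):   handle_activate,
}

send_event("B1", "Fault")   # the infrastructure inside B1 couldn't meet its SLA
```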

There are some important points to note here:

  1. There is a single common repository for the service data model for each instance of each service. That repository is used to prepare a Service Factory to process an event, so any service factory can in theory be given any event to process.  In practice, you’d probably pick a factory near where the actual deployment was taking place.
  2. Both Service Factory processes (which just run the state/event tables) and the processes at each state/event intersection can be stateless, spun up where needed and as often as needed. The model is fully scalable.
  3. Decomposition of a model, the base process for a Service Factory, is service-independent. It just parses the data model and gets the proper service element to use in state/event processing.
  4. Activation, meaning deployment, like every other service lifecycle process, is simply the process associated with a specific state/event intersection. You could, in theory, specialize a process to one service, generalize it to a specific event, or whatever.
  5. The functional logic that’s defined in a model like NFV or ZTA is distributed through the state/event descriptions, not monolithic.

The “Service Factory” software element here is specialized to the decomposition of the model structure.  The rest of the software can be whatever works.  Every service model could have its own software processes at the state/event intersections.  Vendors or third parties could provide pre-packaged services complete with the necessary software.  Standardization is needed only at the level of the Service Factory and the mechanism for event connection to the proper services/processes.

Functionally, this structure could conform to the ETSI ZTA model, or the ETSI NFV E2E model, but it doesn’t impose an old software approach on a problem that should be solved with state-of-the-art thinking.  This is the approach I tested out in the first ExperiaSphere project, done in connection with the TMF’s Service Delivery Framework activity.  The lessons I learned in the implementation are the basis for the later ExperiaSphere framework presentations I referenced here.

My point in all of this is that this is the right way to do any network software, because networking is event-driven and because the implementation has to be resilient, scalable, and based on intent-modeled discrete elements, any implementation of which can serve the overall service needs equally.  This is what the ZTA people need to do now, what the NFV people should have done.  It’s probably too late for that second group, but not for the first, and because ZTA is so broadly critical I think the ETSI activity needs to take a second look at the architecture.

ETSI ZTA Architecture Shows Some Real Risks

In past blogs I’ve talked about abstract threats to the ETSI zero-touch automation (ZTA) project, but referencing one of the open documents, I want to talk here about the real threats that are now visible in the early documentation.  ETSI’s reference architecture for ZTA balances the new and old of standards, but I think one particular part of it biases the ZTA process toward the “old”, and in a way we already know can be fatal.

I want you to think for a moment about a network service in the hypothetical SDN/NFV future.  We would likely see a collection of cooperating features and devices, some appliances and some hosted software functions.  Service lifecycle management in this situation is a combination of order-driven activity from above and event responses from below.  Every customer, every service, has a corresponding implicit or explicit resource commitment associated with it, reaching across not only the primary network provider’s infrastructure but also probably the infrastructure of “federated” providers as well.

The reason this is important is that it almost guarantees that the only software architecture that’s going to work is one designed for event processing.  I also believe that when you tie in event processing, the modern notion of functional/microservice components that are inherently stateless for scalability, and the need to handle a very large number of concurrent services, you end up with a prescription for a model-driven architecture.  Event-driven, model-driven systems are a collection of functions whose contextual handling of things is determined by model-hosted state/event relationships.

If you look at the referenced ETSI document, you can actually see a lot of this spelled out (in a very limited way) in Section 4, which is “Architecture Principles”.  The section calls for model-driven, scalable, intent-based implementations.  It even calls for stateless components.  The first problem is that even this section has contradictions.

Look at Principle 1: “A modular architecture defines logical functional components that interoperate with each other via interfaces. A modular architecture avoids monoliths and tight coupling [italics mine; I’ll come back to this], and consists of self-contained, loosely-coupled services, each with a confined scope.”  This seems fine on the surface but it’s not.

There is a logical contradiction between the first principle and some of the others, at least at the potential level, though that may not seem obvious.  The problem is that event-driven systems really don’t have interfaces between components; they have components activated by state/event tables in the data model.  Yes, there might be cases where a state/event-defined process had a series of components linked through conventional interfaces, but the important stuff is state/event-driven.  Coupling components through interfaces, “tightly” or otherwise, can (and often does) create a “monolith”.

Where that really hits home is in the “reference architecture” shown as Figure 6.2-1.  This model is just the kind of thing that the ETSI NFV group created, calling it the “end-to-end” or functional model of NFV.  While “functional models” don’t purport to describe an implementation, what happens in many cases is that people take the model as a literal guide to the software structure.  That creates the very “monoliths” and “tight coupling” that Principle 1 says must be avoided.  All the blocks in the diagram are monoliths.

Another figure, 6.5.4.2.1-1, has in my view even greater risk of misuse.  The figure describes the relationship between services and the management domain, and it again shows a task-oriented view rather than a state/event view.  In a real event-driven system, events drive processes directly, so you don’t have the compartmentalized functional divisions the figure shows; functionality is the result of the process executions that events (via the state/event mappings in the service data model) trigger.  The notion that you get an event and then analyze it in some way, perhaps with an AI process, is a batch/transactional vision, not an event vision.  Since this figure is supposed to represent real implementation relationships, it can’t even hide behind the notion that it’s just explaining a functional vision.

I’m also concerned about the notion of inter- and intra-domain fabrics.  An event-driven system built on intent modeling would always let each component of the service (modeled as an intent model) manage itself to its SLA and generate an event if it couldn’t.  That would be true within an administrative domain or between them, and the only thing that moves around in an event-driven system is the events themselves.  What’s a fabric supposed to be doing?  You don’t need to integrate a service model that’s defined by a data structure as a hierarchy of intent models.

I don’t want to understate the challenge that a standards group faces with this sort of thing.  Very few members are software architects, and in any event one of the greatest strengths of a data-model-coupled state/event-driven system is that its functionality is almost totally determined by the state/event relationships in the model and how they steer events to processes.  You can do a lot to define service lifecycle automation by just changing the relationships, but it’s very difficult to describe what exactly is being done, because it’s the state/event/process relationships and not the software that determine that.  How do you then describe the “functionality”?

The only sure solution I can see to the impasse we have here is for standards bodies to describe both the functional model (making it clear that’s not an implementation model) and the application architecture, which would be a description of the data model and how it steers events to processes based on per-element state/event relationships.  If both were presented, you could visualize the “what” and the “how” from independent models framed from the same requirements and have a pathway both to explain things and to implement them.

State/event systems are quite old.  I wrote a computer-to-computer protocol system for IBM 360 computers way back in the 1960s and used state/event logic.  Every protocol handler I ever did, or saw, used that principle.  The TMF embodied the notion of service data models coupling events to processes in its work a full decade ago.  It’s hard to believe that some vendor, some player who has hopes of being a kingpin in the carrier cloud, isn’t going to catch on to the right principle here and do something simple and yet having a profound impact on the whole ZTA space.  If that happens, somebody is going to catapult into a lead role in very little time, and the whole market dynamic of carrier cloud could change.

Modeling Pools of Resources for Carrier and Other Clouds

Virtualization is all about abstraction, and in most cases that means abstracting resources and building resource pools.  The ideal vision of “the cloud”, whether it’s a private cloud, a public cloud provider or carrier cloud, is one of a vast pool of resources that can be tapped as needed to provide the optimum in economy and performance. That pool is a hard thing to create in practice, and we’re just starting to see how many dimensions there are to the process.

A resource pool, a server pool, has to decompose into servers, so we can infer some of the requirements for resource-pool abstraction from the server side.  An abstraction of a server has to look like a real server, at least to the point that it has to run applications as a real server would.  Making that happen involves three interrelated things.  One is having a model of the abstract element itself, which we could call a “virtual machine” or a “container”.  The second is having a way of getting the application deployed into the abstraction, which we’d call “orchestration”, and the final one is getting the abstraction mapped to the real resource pool.  This can be a part of orchestration, or something outside it, including the so-called “infrastructure as code” or IaC.

Public and private cloud services all accept this approach and provide tools to handle the three interrelated things needed to make virtualized resource pools work.  However, those complications I opened with can make implementing the three necessary abstraction facets much more difficult.  The complications arise from managing a fundamental requirement for resource pools, which is resource equivalence.

A virtual machine that has to be handled differently depending on what “real” machine it’s mapped to is a heck of a lot less useful.  Differences in the quality of resources, meaning how they’ll perform for the task of hosting the abstraction, can mean that some applications may have to deploy not in the general resource pool, but in a kind of sub-pool.  As you start subdividing resource pools to accommodate specialized application needs, you start to lose the efficiency of resource utilization that pools bring.  If one server in the cloud is all you can deploy your app on, you don’t have a cloud any longer, just a server.
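The three pieces, plus the resource-equivalence requirement, can be sketched very simply.  In this hypothetical fragment the container-like spec is the abstraction model, the scheduler is the mapping step, and any host with capacity is treated as equivalent to any other; the names and numbers are made up.

```python
# A minimal sketch of the three interrelated pieces: an abstraction (the
# spec), a mapping of that abstraction onto a pool, and the equivalence
# assumption that makes any qualifying host acceptable. Names are illustrative.

from dataclasses import dataclass

@dataclass
class ContainerSpec:              # the abstraction the application deploys into
    image: str
    cpu: float
    memory_gb: float

@dataclass
class Host:                       # a member of the resource pool
    name: str
    free_cpu: float
    free_memory_gb: float

def schedule(spec, pool):
    """Map the abstraction onto a real resource; with true resource
    equivalence, any host with capacity is as good as any other."""
    for host in pool:
        if host.free_cpu >= spec.cpu and host.free_memory_gb >= spec.memory_gb:
            host.free_cpu -= spec.cpu
            host.free_memory_gb -= spec.memory_gb
            return host.name
    raise RuntimeError("pool exhausted, or the application needs a sub-pool")

pool = [Host("srv-1", 8, 32), Host("srv-2", 16, 64)]
print(schedule(ContainerSpec("web:1.4", cpu=2, memory_gb=4), pool))
```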

Resource equivalence spills into operations too.  If orchestration has to “know” about differences in deployment requirements, it makes orchestration more complicated.  It might be necessary to describe not one orchestration sequence, but one for each of the possible resource types.  Many of the older DevOps and scripting tools had this problem; change just a little in how something was hosted and the deployment broke.

There have been a number of approaches taken to hide a lack of complete resource equivalence from orchestration, so as to make practices more portable, but they fit into two main classes.  One is multi-level orchestration and the other is intermediary abstraction.

The NFV ISG’s Virtual Infrastructure Manager is an example of a kind of multi-level orchestration.  NFV’s primary Management and Orchestration (MANO) deals with a VIM, and the VIM then “orchestrates below” that level to accommodate differences in the resource pool.  The implementation of the VIM concept is a bit hazy now, and not all the options are workable.

A “single-VIM” model says that there is only one VIM, and it has to then handle all the differences in the resource pool.  That not only means dealing with differences in the cloud/container stack or hardware configuration of each real server, it also means dealing with the difference between the physical-network-function (PNF) version of a device like a firewall, and its virtual-network-function (VNF) version.  This packs a lot of diverse logic into a single software element, and it means that if there are licensing issues associated with the APIs needed to control a particular resource, there may be difficulties getting a tool that can actually manage everything.

The “multi-VIM” model resolves that problem, but by doing so creates another.  In this model, a given kind of infrastructure has its own VIM, perhaps one for each vendor or each class of device.  That means anyone who wants to sell infrastructure to an NFV-equipped operator would have to provide the VIM needed to manage it.  However, it also means that something “above” the VIM has to decode the service model to the point where the correct, specific, VIM is invoked when appropriate.  That might mean “nested VIMs” where one super-VIM did the initial screening and then ran the right “sub-VIM”.
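A “nested VIM” arrangement might look something like the sketch below.  The sub-VIM classes, the “resource_class” field, and the command strings are all hypothetical; what matters is that a super-VIM does the initial screening and hands the descriptor to the resource-specific sub-VIM.

```python
# A minimal sketch of nested VIMs: a super-VIM screens the request and
# dispatches to the sub-VIM for that resource class. Everything here is
# illustrative; real VIMs would call OpenStack, Kubernetes, or device APIs.

class OpenStackVIM:
    def deploy(self, descriptor): return f"boot VM for {descriptor['name']}"

class ContainerVIM:
    def deploy(self, descriptor): return f"schedule container for {descriptor['name']}"

class PNFVIM:   # physical network function: configure a device rather than host software
    def deploy(self, descriptor): return f"configure appliance for {descriptor['name']}"

class SuperVIM:
    def __init__(self):
        self.sub_vims = {"vm": OpenStackVIM(), "container": ContainerVIM(), "pnf": PNFVIM()}

    def deploy(self, descriptor):
        # Primary screening: pick the sub-VIM that owns this resource class.
        return self.sub_vims[descriptor["resource_class"]].deploy(descriptor)

print(SuperVIM().deploy({"name": "vFirewall", "resource_class": "container"}))
```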

Another mechanism to deal with the problem is to employ an intermediary abstraction.  OpenStack does this by defining, for Neutron’s networking support, a “plugin” interface that can be written to by vendors to allow their gear to bind with the rest of Neutron.  In Kubernetes, the networking feature itself is a kind of “plugin” that, as long as the network implementation matches the integration with Kubernetes, makes Kubernetes network-independent.

A more sophisticated and complete solution to the problem is to formalize this intermediate abstraction into that third map-to-infrastructure piece of the puzzle.  Apache Mesos does that by creating a functional layer running on each machine and exposing APIs that then let any kind of deployment orchestration map to a common resource abstraction Mesos creates.

You might wonder how this maps to the notion of “intent-based” data center tools, and the answer is that it’s complicated.  I blogged recently about the overall problem of using the term “intent”, and one area where the problem is especially acute is in data center resource abstraction.  An article quoting the new head of marketing for Apstra, a company that’s championed “intent-based” networking, says it “allows a network operator to state a business intent for the network and then use software to automatically implement this intent and take corrective actions if needed.”  That’s really more a definition of policy-based networking, which Cisco has always championed.

Real intent modeling could fit into the virtualization-and-resource-pool problem, at several levels, in fact.  As I noted above, all of virtualization involves abstraction, and it is perfectly possible to expand the notion of any abstraction to include a definition of all the interfaces and lifecycle stages.  Any interior or implementation of the abstraction would then be responsible for meeting those external properties, just like any intent-modeled implementation.

The missing piece in this, the piece that could connect “intent-based” stuff with virtualization and orchestration, is some solid reference abstractions.  We know what the abstraction for a server is—a VM or container.  What’s the abstraction that represents a useful element of a resource pool?  In most orchestration tools, we’d call a cohesive set of resources that kind of operates in harmony a “cluster”.  Thus, it’s possible that intent-based data centers could define clusters as abstractions, and provide a way of mapping them to cluster-based orchestration (Kubernetes) or cluster-infrastructure mappings like Mesos.
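As a sketch of what an intent-modeled cluster abstraction might expose, consider the following.  The class names and fields are hypothetical; the point is that the outside sees capacity, an SLA, and a deploy interface, while the interior (Kubernetes, Mesos, bare metal, whatever) stays hidden behind the model.

```python
# A minimal sketch of a cluster exposed as an intent model: external parties
# see capacity, SLA, and a deploy interface; the interior implementation is
# hidden and replaceable. All names here are hypothetical.

class KubernetesInterior:
    def deploy(self, workload): return f"scheduled {workload} on the k8s interior"
    def healthy(self): return True

class ClusterIntentModel:
    def __init__(self, name, capacity, sla, interior):
        self.name = name
        self.capacity = capacity        # what the abstraction promises
        self.sla = sla
        self._interior = interior       # hidden implementation detail

    def deploy(self, workload):
        return self._interior.deploy(workload)

    def status(self):
        # Report against the SLA, never the interior details.
        return {"cluster": self.name, "meets_sla": self._interior.healthy()}

edge = ClusterIntentModel("metro-edge-1", {"vcpu": 512}, {"availability": 0.999}, KubernetesInterior())
print(edge.deploy("cache-vnf"))
print(edge.status())
```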

This, I think, would be a very useful exercise, something that bodies like ETSI should have taken a look at.  A uniform intent-modeled cluster management abstraction would be a huge benefit in any sort of cloud, and it could be critically important in carrier cloud.  It would open the very important conversation about how to measure “resource equivalence” when we have differences in geography, connectivity, and regulations to deal with.  As I’ve pointed out in my ExperiaSphere stuff, it would also be the optimum way of connecting services that depend on specific resource behaviors with the places where that behavior can be provided.

The modeling of resources is just another of those areas where we’re not thinking deeply enough about the technology requirements of the concepts we’re proposing.  Without more foresight, it’s difficult to ensure that we take the optimum path toward implementing some of our most critical technology shifts, and without optimality the long-term future of those technologies is threatened.

Is Tech Wilting Under Unrealistic Expectations?

There was a thoughtful piece on NFV in Light Reading last week, and it raises a lot of points that are becoming important as carrier cloud opportunity awareness grows in the industry.  The big point the article makes is one of unrealistic expectations, and surely in our hype-driven tech world, that’s a problem.  I do think that there’s a deeper problem lurking behind the hype screen, though.

There isn’t a single technology in our industry that doesn’t have unrealistic expectations associated with it.  I used to joke that in the media, everything had to be the single-handed savior of western civilization or the last bastion of international communism.  Anything less extreme was too hard to write about.  When something new like NFV comes along, it gets the full “savior” treatment, so we’ve surely over-hyped NFV, and so I totally agree with the article’s conclusion in that sense.

The key part of the article opens with a comment that service lifecycle automation is becoming a big part of nearly all the network revolutions, then says “Still, I am hoping we don’t make the same mistake with automation as we did with NFV and saddle it with unrealistic implementation objectives. What I mean is the theme that NFV has not lived up to expectations is still making the rounds based on carrier implementation frustrations.”

This is where I think we have to transition from “expectations” in a hype sense to “expectations” in a goal sense.  NFV is like any technology; it depends on a business case for what it proposes to do.  There’s a lot wrong with living up to hype (like, it’s impossible), but living up to the goals set for a technology is never unrealistic.  Much of the hype surrounding NFV was never linked to any real business case, any specific goal of the NFV ISG.  However, the NFV ISG has to propose a technology evolution that meets some business case, and there I think there’s a problem that goes beyond over-hyping.

I totally agree that operators are frustrated with their NFV experience, but in my view the problem isn’t unrealistic implementation objectives in the “goals” sense.  The fact is that NFV didn’t have enough implementation objectives, which is why service lifecycle automation is now seen as so critical.

The very first US meeting of the NFV ISG was the spring of 2013, and I attended that meeting.  I commented fairly often on what I saw as important issues, but the one I was most concerned about was the lack of a true “end-to-end” vision.  Service management is critical to achieving any kind of operations efficiency, and the NFV ISG decided to rule it, and in fact management of anything other than VNFs, out of scope.  That was the single worst decision in the whole NFV process because it disconnected NFV from the service lifecycle automation framework that was essential not only to improve opex, but to ensure NFV complexity didn’t end up increasing costs more than it saved.

NFV also has a problem architecturally.  The original architecture, a functional end-to-end description, encouraged a literal interpretation in early implementations, and that resulted in what was almost a batch system, a big software monster that was not only very complex but very brittle.  Limited opportunity to define new services or functions through a simple data model makes the architecture unrealistic.  We’ve proposed to build NFV management and orchestration processes like we built OSS/BSS systems a couple decades ago, and that’s the wrong approach (the right one is a microservice-based, model-driven, state/event system).

The next point in the piece I’d like to look at references the view that the issues with NFV are growing pains related to business practices and pricing.  “While this is not trivial, these growing pains indicate that NFV has achieved the technology maturity to be commercially deployed on a massive scale.”  Here I have to disagree totally.

First, absent a compellingly capable service lifecycle automation framework, it’s difficult to see how NFV could really deploy in any volume.  By the fall of 2013, six of the operators who originally launched the NFV ISG told me at a meeting that the original capex-reduction justification of NFV could not work because the cost reduction wasn’t enough.  “If I want a 25% reduction in capex, I’ll just beat Huawei up on price” was the statement (Huawei was there at the time, by the way).

Architecture is the second problem.  Because of the way NFV has been implemented, you almost have to try to impose a data-driven, state/event model onto an implementation that really focused in a different direction.  Nobody likes to go back and do something over, but we’re faced with the choice of doing NFV right, optimally, or having it fall short.

I think the view that there are business practice and pricing problems associated with NFV stems from the dogmatic clinging to the capex justification, because there’s no solution to the opex reduction challenge and because open integration doesn’t fall out of the NFV implementations.  There are a lot of people who seem to think that the market has to accommodate the technology choices.

Sure, if we could get vendors to give away software and support white-box devices with free integration, we could make it easier to substitute commodity technology for proprietary appliances.  But is it realistic for operators, who got into NFV to cut their spending on vendor technology, to then expect vendors to suck it up and make NFV work at their own financial expense?  Technology has to accommodate, has to optimize for, the market.

Early decisions were short-sighted in that area too.  I told the NFV ISG several times that if they wanted virtual network functions to be truly inexpensive, they needed to focus on open-source software as the basis for VNFs.  That would not only make VNFs “free” in open-source form, it would put pricing pressure on vendors who wanted to promote a proprietary version of VNFs.  I also recommended that there be a “class” and “inheritance” modeling of VNFs as a part of an intent-modeled, data-driven approach.  That way all “firewalls”, for example, could be deployed in the same way, further increasing competition.  I don’t think this would have solved the problem that the lack of operations automation created for NFV, but it would have helped.
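A hedged sketch of what that class-and-inheritance idea could look like in practice is below.  The class names, methods, and rule formats are invented for illustration; the point is that the deployment side is written once against the “firewall” class, and any implementation, open-source or proprietary, is deployed the same way.

```python
# A minimal sketch of class/inheritance modeling for VNFs: every firewall,
# open-source or vendor-supplied, honors the same deployment contract, so
# the deployment logic is written once. All names here are hypothetical.

from abc import ABC, abstractmethod

class Firewall(ABC):                      # the VNF "class"
    @abstractmethod
    def deploy(self, site): ...
    @abstractmethod
    def apply_rules(self, rules): ...

class OpenSourceFirewall(Firewall):       # e.g. an iptables-based image
    def deploy(self, site): return f"hosting open-source firewall at {site}"
    def apply_rules(self, rules): return f"loaded {len(rules)} rules"

class VendorFirewall(Firewall):           # a licensed, proprietary VNF
    def deploy(self, site): return f"hosting vendor firewall at {site}"
    def apply_rules(self, rules): return f"pushed {len(rules)} rules via vendor API"

def provision(fw: Firewall, site, rules):
    # Written once, against the class, not against any one product.
    print(fw.deploy(site))
    print(fw.apply_rules(rules))

provision(OpenSourceFirewall(), "Oshkosh", ["allow tcp/443", "deny any"])
```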

What I think is the closing summary statement is “And let’s be honest, NFV, SDN and the cloud fabric they deliver are the future. Without this fabric, 5G wouldn’t be achievable in a few years. It’s unlikely the telecom industry would have designed and standardized a 5G next-gen core (NGC) and new radio (NR) in only two years without the foundations of NFV and SDN.”  It is absolutely true that carrier cloud is the future.  It is similarly true that SDN will play a role in connecting the data center elements of carrier cloud.  It is not true, at least based on my modeling, that NFV has any real role in carrier cloud deployment.  It’s a follower technology, something that could exploit carrier cloud if it were available, but not something that could justify it.

My model says that NFV could never hope to justify carrier cloud, even if you assume that somehow it could be linked into 5G deployment as some propose.  NFV, for its own mysterious reasons, probably linked to taking an easily visualized path, focused almost from the start on the application of virtual CPE (vCPE).  It is very difficult to make vCPE useful beyond enterprise sites, because of the price pressure of readily available small-site and home technology.  I can buy a superb WiFi router for a home or branch office for less than $300, and it would almost certainly last for five years at least.  That’s $60 per year, or five dollars a month.  Does anyone think we could deliver a vCPE solution for that?  And we need WiFi on premises, not hosted in some remote cloud location, so the costliest element of that small-office/home device can’t be virtualized.

I agree that 5G could be a driver for NFV, but here again we’ve got two problems.  First, 5G is a technology that needs a justifying business case in itself.  We have no industry obligation to adopt a shred of it.  Second, we focused on the wrong kind of VNF for 5G.  vCPE is a per-customer, per-site technology.  Anything that goes in a 5G network is going to be explicitly multi-user in nature.  You won’t deploy a VNF to make a call, but where’s the thinking about how multi-user elements of a service could be deployed and sustained with NFV?  Would a “VNF” for 5G really be a VNF at all, or would it be simply a cloud application that supports multiple users like a web server does?  Surely the latter.

I completely agree with the sense of this piece; we’ve over-hyped NFV and expected too much from it, and we’re at risk of expecting too much from service lifecycle automation too.  However, I don’t think the problem is the goal as much as the mechanism.  Everything is going to be over-hyped in one sense, and we have to live with the reality of our industry.  Everything still has to make a rational business case for itself.  We fell short on NFV not because we had unrealistic, meaning unachievable, expectations, but because we failed to design an approach that was capable of achieving even the right and reasonable business case expectations.  Zero-touch service lifecycle automation and 5G have exactly the same problem, and the article is dead on in saying we need to fear that outcome.

What’s the real future of NFV?  I think that like SDN, which will deploy significantly but not in the form that was originally defined by the ONF, we’ll see that most “NFV” stuff out there has nothing to do with the ETSI specs.  That’s not necessarily a bad thing; the market makes its own decisions after all.  I do think that doing things right from the first would have ended up taking us to a better place, and with less effort overall, than we’ll eventually get to in NFV.  Clearly zero-touch service lifecycle automation is going to follow the same path, and I’ve said often that 5G in NSA form or in millimeter-wave/FTTN hybrid form will be the 5G most of us see.  Maybe these exercises in futility are necessary, but they just seem so wasteful.

We have been assuming that technologies are self-justifying and that everyone needs to get with the program and somehow make them work.  Not true.  Technologies aren’t even useful if they don’t present an optimum path to a set of goals that all the stakeholders can get behind, at least to the extent needed to play their role in the process.  We didn’t get that with NFV, we don’t seem (in my view) to be getting that with zero-touch service lifecycle automation, and we’re probably not getting it with 5G either.  There’s still time to make things better, but the lesson we need to learn is that optimality has to be designed in from the first.

VMware is Demonstrating Awareness of Carrier Cloud

VMware has been in the news this week, both for its plans to make its Virtual Cloud Network “real” and for the acquisition of the Dell EMC Service Assurance Suite’s technology group.  The two moves, as I’ve noted in an earlier blog, seem directed primarily at the network operator, managed service provider, and cloud provider spaces.  As I blogged recently, that could put VMware in the path of Cisco’s enterprise-centric SD-WAN initiative, but it also poses some real challenges and risks for VMware itself.

The Virtual Cloud Network concept is a good one, as I’ve said.  The goal is to create a unified architecture to address two specific problem sets arising out of the massive shift toward virtualization in IT.  The first problem is segmentation of networking at a logical level, both to separate tenants/applications and to allow application networks to be hosted in any number of independent places without impacting addressing.  The second is the creation of a virtual user network, a VPN if you like, that can address virtual resources as well as real ones and that can exercise access and QoE control over selected connections.

This seems, on the surface, to be something enterprises would love as much as operators or MSPs.  The first question that the Virtual Cloud Network initiatives raise is why, then, the positioning seems so provider-centric.  Are we seeing an accidental collision of two different marketing goals, or a convergence of prior strategies into one unified push?

VMware has been a bit of a kid-in-the-candy-store-window regarding carrier cloud.  When the NFV initiative got going in late 2012, VMware was active in the ETSI NFV Industry Specification Group (NFV ISG), but it always seemed to me to be a kind of eager outsider trying to break into the big leagues.  NFV is hardly a touchstone even for carrier cloud today (my model says it will account for less than 6% of the driver influence over the next three years), much less for virtualization overall.  Yet VMware talks about NFV in its current Virtual Cloud Network and network operator positioning.

The second question is how symbiotic the solutions to those two problems really are.  Data center application and tenant segmentation, or hypersegmentation, or whatever you like to call it, is something that’s very real to cloud providers and to some enterprises, but so far the ETSI NFV ISG has been pretty slow in even thinking about the network model of carrier cloud infrastructure.  Does VMware hope to change that?  I tried for several years, without success, but perhaps they’re a better influence.

Virtualization in the WAN, via SD-WAN, is today focused primarily on simply extending the corporate VPN to locations that can’t get or can’t afford MPLS, and in a few cases on providing a backup for MPLS VPN connectivity at some sites.  What I’ve been calling “logical networking” or “network-as-a-service” (NaaS) is a logical evolution of SD-WAN capabilities that has some support among vendors today.  It could provide enhanced connection control, security, and prioritization, but it requires a knowledge of user/application identity not (currently) provided in the VeloCloud product that VMware acquired.

It’s tempting to say that a combined solution to both problems would be compelling, but I struggle to justify the view.  Logical networking requires that application interfaces exposed to the outside be controlled by the same VPN structure that controls user connections, or it can’t recognize, secure, and prioritize traffic appropriately.  Application segmentation is primarily associated with building application subnets, which are largely made up of components that are not exposed to users or other applications.  SD-WAN today already has many examples of cloud network connectivity that provides control of the exposed interfaces.  What’s the need to integrate the solution to our two problems, then?

Another problem with integrated positioning is that data center networking and “application” or “wide-area” networking are typically sold to different constituencies, even in the provider market.  VMware’s major software-defined competitor, Nokia/Nuage, has separate SD-WAN and SDN offerings, in contrast to VMware’s evolving unified positioning under one umbrella.  Nokia sells almost exclusively to providers and surely knows the market very well, so its decision on positioning has to be considered an indicator of the best path.

There’s a flip side to all of this, of course.

On the positioning side, carrier cloud is going to be the largest source of new data centers through 2030, and so anyone who wants to sell data center networking is obliged to take it seriously.  Furthermore, while NFV may be a failure as a driver for carrier cloud (even in the long term), it does still command some interest among operators.  The true drivers of carrier cloud are a lot more complicated to sell than NFV in any event.  Finally, operators who want to sell network and cloud services to the enterprise certainly aren’t hurt by using a software-defined vendor who has nice enterprise positioning too.

On the technology side, I’ve already seen in surveys that VPN, SD-WAN, or NaaS prospects aren’t particularly clear about what the relationship between data center virtual networks and virtual WANs should be.  They don’t see a linkage as being critical, but they don’t really understand how cloud/virtual-network connectivity works in the WAN or data center anyway.  They surely don’t think the integration is a bad thing, and they could probably be sold on the notion that it’s really very good.  The difference between the VMware-is-right and VMware-is-spinning-its-wheels positions here is a combination of two things.

First, the “right” track has to include a real shift by VMware toward logical networking and NaaS as the future of networking.  That means implementing identity-based connection control in both the WAN and the data center, and integrating the latter into VM and container networking.  It also demands that VMware collateralize the way it sees the future of networking in a virtual world.
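
To make that concrete, here is a minimal sketch (in Python, with entirely hypothetical role names and policy entries; nothing here reflects an actual VMware or VeloCloud API) of what identity-based connection control looks like when the same policy logic governs both a WAN user and a data-center service:

    # Sketch of identity-based connection control (hypothetical names, not a real API).
    # One policy store governs WAN (user-to-app) and data center (app-to-app) flows.
    from dataclasses import dataclass

    @dataclass
    class Endpoint:
        identity: str      # authenticated user or service identity, not an IP address
        role: str          # e.g. "sales-user", "crm-frontend"
        location: str      # "wan-edge" or "dc-segment-7"

    # Policy table keyed on (source role, destination role): action and priority class.
    POLICY = {
        ("sales-user", "crm-frontend"): ("permit", "interactive"),
        ("crm-frontend", "crm-db"):     ("permit", "bulk"),
        ("guest", "crm-db"):            ("deny",   None),
    }

    def admit(src: Endpoint, dst: Endpoint):
        # Return (permit/deny, QoS class) for a requested connection.
        return POLICY.get((src.role, dst.role), ("deny", None))

    # A WAN user and a data-center microservice are evaluated by the same logic:
    print(admit(Endpoint("alice@corp", "sales-user", "wan-edge"),
                Endpoint("crm-web-3", "crm-frontend", "dc-segment-7")))

The point of the sketch is that identity, not address, is the key, and the same check could just as easily govern a flow between two data-center components; that unification only works if identity is known on both sides.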

The second requirement is that VMware get a lot more sophisticated in their thinking about carrier cloud.  NFV is not going to carry them to the golden future of provider success against the formidable competition of both broad IP players like Cisco and specialized network-operator specialists like Nokia.  NFV has carrier-cloud driver strength that’s at the very bottom of the list, a third of the next-lowest driver.  Even 5G is more credible in the near term, and VMware has no horse in the 5G race.  They need to get very smart, very fast.

Either a technology shift to logical networking or a major positioning shift for carrier cloud would be a formidable challenge for a vendor.  Doing both at the same time would require a level of planning and execution rarely mustered these days.  It’s far more likely that VMware will talk a good game here, and then (like so many others) take the easy way out on the execution.  That, if they do it, could put their whole operation at risk.

VMware’s big asset has been the combination of the data center incumbency of its virtualization software and NSX as application SDN.  We are now seeing a bit of a changing of the guard in application virtualization, a shift toward containers.  Kubernetes has the ability to support network plugins, and VMware has provided one, but being one of a field of open solutions is a far more precarious position than being the incumbent.  This raises what I think is the real driver behind Virtual Cloud Networking: VMware needs a new strategy because we’re in a new age of IT and networking, and its strengths lie in past practices.

Whatever happens with VMware’s technology and positioning choices, its movement is a sign that more and more vendors are waking up to the carrier cloud opportunity; it’s just as clear that vendors are still fumbling for the best approach.  The networking market has, for decades, been based on static technical requirements and price-driven expansion.  We’ve come to the end of that, and so everyone needs to be thinking about what new demand sources are out there.  In the end, the vendors who get that assessment right will be the ones who prosper, because you can’t make the right technology and positioning choices based on the wrong market presumptions.

What Application Changes will Drive the Cloud and Network?

Maybe you believe in edge computing, or in IoT or AI, or maybe carrier cloud, or all of the above.  The main thing is you believe the future requirements for applications and hosting will be different, and so the data centers and servers of the future will be different.  The obvious question is what these differences will be, and how might they impact vendors and even software.

In any given economy, the traditional business applications tend to grow in their need for processing power and information movement capability at a rate proportional to GDP growth.  Transaction volumes relate directly to the business exchanges associated with commerce, which map pretty well to overall economic growth.  Furthermore, these transactional exchanges are little different from those of the past, so the future of IT would be simple evolutionary growth if we assumed that future data processing was purely transactional.

What IoT and AI represent is, first and foremost, a non-transactional paradigm, and each is really just an individual driver of the broader paradigm of event-based computing.  What event-based computing does is couple IT processes closely to the real world, via things like sensors (IoT) or a form of analytics (AI).  The idea is that if computer systems could respond to real-world conditions, they could provide direct support for a lot more than business transactions.  Worker productivity might be enhanced significantly, we could introduce automation to things like driving, and we could contextualize user/worker requests to provide insight without so much work predefining search criteria or asking questions.

Transactions of the kind we’ve had for decades are “pre-contextualized” and “pre-packaged”.  The originator of the transaction has already made the decision and has framed it in the form of a well-understood sequence like look-update-check.  Future computing seeks to involve IT earlier in that sequence, which means that the information the worker used to make the decision that launched the transaction is instead used by software to help with, or do, the heavy lifting.

What, then, can we say about the event-based future, in terms of requirements?  I think there are several specific differences that will evolve if IT and platforms are really going to change.

The first is that the future will be more network-intensive than the present.  Think of an application like opening a gate: the process of making that happen when the driver of a truck asks for it, versus when software analyzes the contextual relationship between truck and gate and makes the decision itself, represents a shift in network involvement.  The former is a simple transaction; the latter requires gathering information on truck position, gate position, relative speed, authorization/expectation, and a bunch of other things.

The key phrase here is “network-intensive”, which doesn’t necessarily mean traffic-intensive.  Events typically don’t carry a lot of data, and so a large volume of events doesn’t necessarily mean a lot of traffic.  In fact, M2M exchanges that don’t have to worry about human readability may actually have less data associated with them than a similar “human” exchange.  It also doesn’t necessarily mean “low-latency”, but it does mean that there is a latency budget associated with a series of events, based on the human tolerance for delay and the impact of total delay on the system that’s being controlled.

The speed of light in fiber means that a signal would move 127 miles in a millisecond and cross the continent in about 20 milliseconds.  In that same 20 milliseconds, a person walking would cover about an inch.  A vehicle at 60 mph would travel about 21 inches, and a jet at 600 mph would move almost 18 feet.  The importance of network latency is obviously dependent on what you’re expecting to be happening during the delay.  However, network latency is almost surely not the major factor in the latency budget for an application, and that raises our next impact point.
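
Those figures are easy to check.  The little calculation below (a back-of-the-envelope sketch assuming a typical fiber refractive index of about 1.47, which is my assumption rather than a measured value) reproduces them:

    # Back-of-the-envelope check of the propagation-versus-motion numbers above.
    C_MILES_PER_SEC = 186_282
    fiber_speed = C_MILES_PER_SEC / 1.47           # ~126,700 miles per second

    print(round(fiber_speed / 1000))               # ~127 miles per millisecond
    print(round(2500 / (fiber_speed / 1000)))      # ~20 ms to cross ~2,500 miles

    def inches_moved(mph, seconds=0.020):
        return mph * 5280 * 12 / 3600 * seconds    # miles/hour to inches traveled

    print(round(inches_moved(3), 1))               # walker: about 1 inch in 20 ms
    print(round(inches_moved(60), 1))              # car at 60 mph: about 21 inches
    print(round(inches_moved(600) / 12, 1))        # jet at 600 mph: about 17.6 feet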

In order to manage latency budgets, future applications will have to be designed for event anticipation and hierarchical event processing.  Getting a record from a large shared RDBMS might well take 20 milliseconds or more in itself.  Paging in memory could take that long too, and according to users who’ve shared their functional/lambda testing results, spinning up a serverless app in the cloud takes much longer than that.  The point is that there’s little value in worrying about the latency difference between a server 10 miles from the point of activity and one a thousand miles away when the application itself will spend much longer than that processing the event.
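
To illustrate the proportions, here is a toy latency budget built from the numbers in the paragraph above plus an assumed 200-millisecond serverless cold start (an illustrative placeholder, not a benchmark):

    # Toy latency budget: propagation distance versus processing components.
    FIBER_MILES_PER_MS = 127

    def latency_budget(server_miles, db_ms=20, cold_start_ms=200):
        propagation = 2 * server_miles / FIBER_MILES_PER_MS    # round trip, in ms
        return propagation, propagation + db_ms + cold_start_ms

    for miles in (10, 1000):
        prop, total = latency_budget(miles)
        print(f"{miles:>5} miles: propagation {prop:6.2f} ms, total ~{total:6.1f} ms")

    # Moving the server from 10 to 1,000 miles adds roughly 15 ms round trip,
    # which is small next to the processing terms; that is the point being made.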

Event anticipation is the process of getting ready for an event, having the compute resources available and waiting, and having the necessary data in place instead of having to go get it.  In order to do event anticipation, you need to have some understanding of the flow of the system you’re processing events for, and that means a mechanism for hierarchical process modeling.

Say you have an event-driven system to open a gate.  If you visualize this as a transactional system, the driver pulls up to a keypad and enters a code that then authorizes the gate to open.  That wastes a lot of time, so suppose you put an RFID tag in the truck and have it read at the same kiosk where the code entry was done?  The problem is that the driver still has to slow down or the gate won’t open fast enough.  The right approach is to read the ID perhaps a hundred yards down the road, at the point where the facility is being entered rather than at the gate itself.  The authorization to open the gate (or the denial) can then be pre-staged to the gate point, and if entry is authorized the gate opens.
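
Here is a minimal sketch of that pre-staging flow (the event names and the authorization source are hypothetical illustrations, not anything drawn from a real product):

    # Sketch of event anticipation for the gate example (all names hypothetical).
    AUTHORIZED_TAGS = {"TRUCK-0042", "TRUCK-0107"}   # assumed authorization source
    prestaged = {}                                   # gate-local cache of decisions

    def on_upstream_rfid_read(tag_id):
        # Fired at the reader ~100 yards up the road: do the slow check now.
        prestaged[tag_id] = tag_id in AUTHORIZED_TAGS

    def on_gate_arrival(tag_id):
        # Fired at the gate itself: no remote lookup, just the pre-staged answer.
        if prestaged.pop(tag_id, False):
            return "open gate"
        return "hold gate and alert an operator"

    on_upstream_rfid_read("TRUCK-0042")   # happens seconds before the truck arrives
    print(on_gate_arrival("TRUCK-0042"))  # -> "open gate", with no added latency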

You can also use hierarchical modeling to do complex event processing (CEP) behind a primary event handler.  You get Sensor A input, and while you’re doing the immediate processing needed for that event you also send it along to a CEP element, which decides that the next event will probably come from Sensor C or D.  The necessary information for those events can then be pre-positioned and the resources prepared.  All of this does more for latency than simply running the processes close to the events rather than far away.
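
A skeletal version of that two-tier arrangement might look like this (the flow-prediction table and sensor names are assumptions for illustration; a real CEP element would run outside the critical path rather than inline):

    # Sketch of hierarchical event handling with a CEP stage behind the handler.
    NEXT_LIKELY = {"sensor-A": ["sensor-C", "sensor-D"]}   # assumed flow model
    prepared = set()                                       # contexts staged in advance

    def primary_handler(event):
        # Immediate, latency-critical work happens first...
        handle_locally(event)
        # ...then the event is handed to the CEP element (inline here for brevity).
        cep_element(event)

    def cep_element(event):
        # Deeper analysis: predict the probable next events and prepare for them.
        for sensor in NEXT_LIKELY.get(event["source"], []):
            prepared.add(sensor)            # e.g. pre-fetch data, warm a process

    def handle_locally(event):
        print("handled", event["source"], "| prepared:", prepared or "nothing yet")

    primary_handler({"source": "sensor-A"})
    primary_handler({"source": "sensor-C"})   # arrives to find its context staged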

This doesn’t mean you don’t need edge computing, at least in some situations.  The problem that edge computing solves isn’t event latency but event volume, which is our third point.  Haul events from the geography of a telephone exchange to a central-office data center and you’re serving a population in the tens of thousands.  Haul them in from a metro area and you have a thousand times that population, which means a thousand times (or more) the number of events.  Can you process that level of event density with enough efficiency to maintain your event latency budget?

Edge computing has another benefit, which is to allow for a difference in architecture between “edge” and “core” data centers.  Toward the edge, workloads are biased toward episodic event handling; toward the core, they’re biased toward more traditional database and analytical processing, with an emphasis on process latency control.  You might want to employ AI in a different context in each place, and in between.  You’re very network-centric at the edge, more database-centric in the core.  You’re doing almost embedded-systems work at the edge and traditional data center work further inward.

The issue with event processing that cuts across everything else is that these systems need more memory.  You can’t afford to do a lot of paging or you create latency issues, especially if you have limited I/O speeds on the systems.  The deeper you go, the more memory you need, because there will always be more processes running when there are more users, and more users are inevitable with deeper data center positioning.

With the memory issue comes efficient task management.  One of the problems with cloud/serverless today is the latency that develops from the scheduling and loading of the functions/lambdas.  If event processing is to reach its full potential, we need things to run on demand without that load delay.  I think the problems here arise from the fact that the media latched onto the “serverless” model as the important one rather than the “functional” model, and as a result cloud providers are reluctant to do anything (like scheduling persistence) that would undermine that vision.
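
Here is a toy model of what scheduling persistence buys you, with the load delay simulated rather than measured (an illustration of the principle, not any cloud provider’s mechanism):

    # Toy model of "scheduling persistence": keep a handler warm between events
    # instead of paying a load delay on every invocation (illustrative only).
    import time

    def load_handler():
        time.sleep(0.2)                  # stand-in for the cold-start/load delay
        return lambda event: f"processed {event}"

    _warm_handler = None                 # persists between events

    def on_event(event, keep_warm=True):
        global _warm_handler
        if _warm_handler is None or not keep_warm:
            handler = load_handler()     # cold path: pays the delay before any work
            if keep_warm:
                _warm_handler = handler
        else:
            handler = _warm_handler      # warm path: effectively immediate
        return handler(event)

    on_event("gate-open request")        # the first event pays the load cost
    on_event("gate-open request")        # later events do not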

Obviously, all of this is complex, even before we consider how any major change in network or IT paradigm gets funded.  Event-based services are difficult to conceptualize when there’s no model infrastructure in place on which they could deploy.  That infrastructure is difficult to deploy when there’s no confidence in a profitable mission, and no real sense of the architecture of these futuristic IT systems.

Perhaps the focus on “edge computing” or “AI” or “IoT” or even “5G” is a trivialization of the real issues, driven by the simple fact that the real issues are too complex for our current buyer/seller/media symbiosis to deal with.  It’s the mission that ultimately sets the requirements for the technology, not the other way around.

There’s Intent, Then There’s Intent

I blogged quite a while ago about “intent-washing”, the tendency to use the terms of intent models and modeling where that’s not really the story being told.  Like all “washing”, intent-washing is driven by a desire to ride a positive news wave even when you’re not really doing exactly what the term means.  Cisco is one of the most recent intent-washers with its “intent-based networking”, but it is far from the first or the only one.

Part of the problem here is that “intent” is perhaps a poor term to have chosen to describe the original model concept.  As I’ve noted in several blogs, intent modeling is a form of function abstraction in which a network or application is modeled as a series of interdependent functions, each of which is known by the properties (the features and interfaces) it exposes while the specific implementation is hidden.  Presumably these functions and properties are the “intent”, but intent in the strict sense is “intention or purpose”, and thus doesn’t convey the notion of a whole being assembled from modeled parts based on functionality.  I intended to mow the lawn, so is it (or the mower) an intent model?  The intent of a network is to connect, so does that make every network that supports what the user intends “intent-based”?
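
To make the distinction concrete, here is a small sketch (my own illustration, in Python; nothing here is drawn from a standard or a product) of function abstraction in the intent-model sense: the exposed properties and interface are the “intent”, and any implementation that honors them can sit, hidden, behind the abstraction.

    # Sketch of function abstraction: exposed intent, hidden implementation.
    from abc import ABC, abstractmethod

    class FirewallIntent(ABC):
        # The exposed "intent": what the function does, never how it is built.
        exposed_properties = {"throughput_gbps": 10, "resilient": True}

        @abstractmethod
        def apply_rules(self, rules): ...

    class ApplianceFirewall(FirewallIntent):       # one hidden implementation
        def apply_rules(self, rules):
            print("pushing", len(rules), "rules to a hardware appliance")

    class HostedFirewall(FirewallIntent):          # another, equally valid, one
        def apply_rules(self, rules):
            print("configuring", len(rules), "rules on a hosted instance")

    def assemble_service(firewall: FirewallIntent):
        # The service model sees only the abstraction; either implementation works.
        firewall.apply_rules(["permit tcp/443", "deny any"])

    assemble_service(ApplianceFirewall())
    assemble_service(HostedFirewall())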

Cisco’s take on this is embodied in the introduction to the referenced Cisco document: “Intent-based networking is the difference between a network that needs continuous attention and one that simply understands what your organization needs and makes it happen. It’s the difference between doing thousands of tasks manually and having an automated system that helps you focus on business goals.”  They also say “It allows you to continuously align your network to your organization’s needs.”  From an implementation perspective, it’s a feature or property of Cisco’s Digital Network Architecture (DNA).

DNA, in my view, is best visualized as a distributed-policy system for controlling network behavior.  There’s nothing wrong with that approach, in my view; certainly centralized policy control with local enforcement is a very reasonable way of achieving “software-defined” networking.  It has plusses and minuses with regard to the other approach that’s gaining popularity—the overlay SDN model—but it’s easier on current infrastructure investment and offers a potential for deep service control integration.  You can say that “intent” sets policies, which then define network behaviors.

Is this enough to make it “intent-based”, though?  What I don’t see from Cisco is functional modeling according to intent-model principles, or model-driven service lifecycle management.  Those are the things that I think really define intent-based networks, or at least what differentiates them from implementations that seem to say that you’re intent-based if you recognize intentions.

You might wonder whether I’m splitting hairs here, or whether I’m just picking on Cisco, but neither is the case.  I believe firmly that a data-driven model of service lifecycle management, meaning service automation, is absolutely critical.  Without that, I do not believe that we’ll make significant strides in dealing with the additional complexity that virtualization necessarily brings to all things networking.  A data-driven model is a way of describing intent-based functionality, and by this I mean real intent-based functionality.
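
To show what I mean by “data-driven”, here is a minimal sketch of a state/event lifecycle model (my own illustration of the principle, not the TMF’s or any product’s schema): the service model carries each element’s state, and a table of state/event pairs selects the process to run and the next state.

    # Sketch of a data-driven service lifecycle: the model, not code, decides.
    STATE_EVENT_TABLE = {
        ("ordered",   "deploy"):        ("allocate_resources", "deploying"),
        ("deploying", "deploy_done"):   ("start_monitoring",   "active"),
        ("active",    "fault"):         ("redeploy_element",   "deploying"),
        ("active",    "decommission"):  ("release_resources",  "retired"),
    }

    def handle(element, event):
        # Look up the (state, event) pair in the model and run the named process.
        action, next_state = STATE_EVENT_TABLE.get(
            (element["state"], event), ("log_unexpected_event", element["state"]))
        print(f"{element['name']}: {element['state']} + {event} -> {action}")
        element["state"] = next_state

    vpn = {"name": "vpn-service", "state": "ordered"}
    for ev in ("deploy", "deploy_done", "fault", "deploy_done"):
        handle(vpn, ev)

Because the behavior lives in the data model rather than in monolithic code, the same handler can be distributed and restarted anywhere the model is visible, which is what makes the approach resilient.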

This intent-washing thing is also demonstrating a problem we have in the industry, which is the growing gulf between the technologists who are bringing us new and critical concepts, and the whole process of buyer education and enlightenment, from the media/analyst community right down to the person who signs the checks.  People can’t adopt what they don’t understand, and it’s fair to say that the vision I’ve advocated for service management and intent modeling isn’t one that’s easily understood.  That lends itself to casual omission of the details that are critically important.

Education in the details isn’t something vendors are generally happy to help users with.  I noted as recently as yesterday’s blog that the larger and more successful network vendors are incentivized to stay the course with technology, on the reasonable argument that that’s where they’re winning.  Why promote revolution if you’re the one in power?  This is another reason why “washing” is popular and at the same time destructive.  If you think you’re getting the future technology from your vendor because you see the terms you recognize, and you can’t understand the details, then you’re lulled into believing you’re ready for what’s to come.  Maybe you aren’t.

It’s not terribly difficult to create real intent-modeled networks.  We have standards like TOSCA that can describe them fully as intent models.  We have guidance from people like the TMF on how to use intent-modeled data structures to create event-driven implementations of service lifecycle management that are both resilient and fully distributable.  We know how to build current technology into these futuristic models.  Or, perhaps I should admit, some know those things.  You can get a better look at “real intent” in a network-centric, model-driven form (TOSCA) from Cloudify or Ubicity.

Another truly interesting source for a more general software vision is Avi Networks.  Avi has what’s essentially the same policy-based, intent-driven approach as Cisco, but it also has all the software elements needed to deploy network applications in an event-driven form.  Avi defines its progression from the perspective of a “declarative model”, but it doesn’t provide much detail on what the model looks like.  If it did, I’d say Avi had a formula for successfully building generalized intent-based software, not just network software.

The full execution framework is important.  My early projects in networking were state/event oriented, but they were based on the architecture of the time: a monolithic application and an event queue.  Avi takes event processing to a microservice level and includes the ability to scale software and load balancing to millions of transactions per second.  I’d like to see Avi get more detailed on its declarative modeling and state control.  I’d sure like to see somebody build a service lifecycle automation system based on the Avi platform’s principles, or on the platform itself, because (if they have the right modeling) I think they have all the pieces to do what’s needed.

One reason I’m bringing Avi up here is that Cisco is an investor in the company.  Perhaps we can expect Cisco to do a management system using their technology, but Avi’s also a partner with Nokia/Nuage, and they could learn some techniques from Avi’s approach too.  Somebody will have to build model-driven networking around Avi, rather than use a more architected strategy like Cloudify’s or Ubicity’s, but I think the result would be very interesting.  Cisco, Nokia/Nuage?  Want to start a service automation arms race?  Here’s your chance!