Taking the Carrier Cloud Beyond CORD and the Central Office

CORD, the new darling of telco transformation using open source, is a great concept and one I’ve supported for ages.  I think it’s a necessary condition for effective transformation, but it’s not a sufficient condition.  There are two other things we need to look at.  The first is what makes up the rest of the carrier-cloud data centers, and the second is what is actually driving central offices to become data-center-like.

If we re-architect all the world’s central offices into a CORD model, we’d end up with about 50,000 new carrier-cloud hosting points.  If we added all the edge offices associated with mobile transformation, we’d get up to about 65,000.  My latest model says we could get to about 102,000 carrier cloud data centers, so it’s clear that fully a third of carrier-cloud data centers aren’t described by central office evolution.  We need to describe them somehow or we’re leaving a big hole in the story.

An even bigger hole results if we make the classic mistake technology proponents have made for at least twenty years: focusing on what changes and not why.  The reason COs would transform to a CORD model is that services come to focus on hosting things rather than connecting things.  The idea that this hosting results because we’ve transformed connection services from appliance-based to software-based is specious.  We’ve made no progress in creating a business justification for that kind of total-infrastructure evolution, nor will we.  The question, then, is what does create the hosting.

Let’s start (as I like to do) at the top.  I think most thinkers in the network operator space agree that the future of services is created by the thing that made the past model obsolete—the OTT services.  Connection services have been commoditized by a combination of the Internet pricing model (all you can eat, bill and keep) and the consumerization of data services.  Mobile services are accelerating the trends those initial factors created.

A mobile consumer is someone who integrates network-delivered experiences into their everyday life, and in fact increasingly drives their everyday life from mobile-delivered experiences.  All you have to do is walk down a city street or visit any public place, and you see people glued to their phones.  We can already see how mobile video is changing how video is consumed, devaluing the scheduled broadcast channelized TV model of half-hour shows.  You can’t fit that sort of thing into a mobile-driven lifestyle.

One thing this has already done is undermine the sacred triple-play model.  The profitability of video delivery has fallen to the point where operators like AT&T and Verizon are seeing major issues.  AT&T has moved from an early, ambitious, and unrealistic notion of universal IPTV to a current view that they’ll probably deliver only via mobile and satellite in the long term.  Verizon is seeing its FiOS TV customers rushing to adopt the package plan that has the lowest possible cost, eroding their revenues with each contract renewal.

Mobile users demand contextual services, because they’ve elected to make their device a partner in their lives.  Contextual services are services that recognize where you are and what you’re doing, and by exploiting that knowledge make themselves relevant.  Relevancy is what differentiates mobile services, what drives ARPU, and what reduces churn.  It’s not “agility” that builds revenue, it’s having something you can approach in an agile way.  Contextual services are that thing.

There are two primary aspects of “context”: geographic and social.  We have some notion of both of these contextual aspects today, with the geographic location of users being communicated from GPS and social context coming from the applications and relationships we’re using at any given moment.  We also have applications that exploit the context we have, but mining social context from social networks, searches, and so forth, and expanding geographic context by adding a notion of mission and integrating location with social relationships, will add the essential dimension.  IoT and the next generation of social network features will come out of this.

And it’s these things that operators have to support, so the question is “How?”  We have to envision an architecture, and what I propose we look at is the notion of process caching.  We already know that content is cached, and it seems to follow that applications that have to “know” about user social and location context would be staged far enough forward toward the (CORD-enabled) CO that the control loop is reasonable.  If you like things like self-driving cars, they require short control loops, so you stage them close.  Things moving at walking speed can deal with longer delay, and so forth.
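
To make the process-caching idea a bit more concrete, here’s a minimal sketch of the placement decision; the tier names and latency figures are my own illustrative assumptions, not values from CORD or any operator model.

    # Hypothetical sketch: pick a hosting tier from the application's control-loop budget.
    # Tier names and round-trip figures are illustrative assumptions only.
    TIERS = [
        ("edge-CO", 5),     # CORD-style central office, roughly 5 ms round trip
        ("metro", 20),      # metro-area repository
        ("regional", 60),   # deeper, specialized data center
    ]

    def place_process(control_loop_budget_ms):
        """Return the deepest (most economical) tier whose latency fits the budget."""
        fits = [name for name, rtt in TIERS if rtt <= control_loop_budget_ms]
        return fits[-1] if fits else "edge-CO"   # nothing fits: stage as close as possible

    print(place_process(10))    # self-driving-car class loop -> "edge-CO"
    print(place_process(100))   # walking-speed context -> "regional"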

Beyond the roughly two-thirds of carrier-cloud data centers that are essentially edge-located, the model adds about 25,000 second-tier process cache points and about 4,000 metro-area repositories globally.  From there we have roughly 7,000 deeper, specialized information cache points and places where analytics are run, which gets us up to the 102,000 cloud data centers in the model.

All of the edge office points would have to be homed to all of the second-tier repositories in their metro area, and my model says you home directly to three and you have transit connectivity to them all.  The metro points would connect to a global network designed for low latency, and these would also connect the specialized data centers.  This is basically how Google’s cloud is structured.

In terms of software structure, it’s my view that you start with the notion of an agent process that could live inside a device or be hosted in an edge cloud.  This process draws on the information/contextual resources and then frames both queries into the resource pool (“How do I get to…”) and responses to the device user.  These agent processes could be multi-threaded with user-specific context, or they could be dedicated to users—it depends on the nature of the user and thus the demands placed on the agent.
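
Here is a rough sketch of what such an agent might look like; the class, method, and resource names are hypothetical and purely for illustration.

    # Hypothetical agent process: one instance per user, or multi-threaded with per-user context.
    class ContextAgent:
        def __init__(self, user_id, context_store, resource_pool):
            self.user_id = user_id
            self.context = context_store     # source of geographic + social context
            self.resources = resource_pool   # deeper information/analytics services

        def handle(self, request):
            """Frame a device request with the user's current context and query the pool."""
            ctx = self.context.lookup(self.user_id)       # e.g. location, activity, mission
            query = {"user": self.user_id, "ask": request, "context": ctx}
            answer = self.resources.query(query)          # "How do I get to..." style request
            return self.frame_response(answer, ctx)

        def frame_response(self, answer, ctx):
            # Tailor the raw answer to the user's situation (walking, driving, and so on).
            return {"for": self.user_id, "mode": ctx.get("mode", "unknown"), "result": answer}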

This same thing is true for deeper processes.  You would probably handle lightweight stuff in a web-like way—multiple users accessing a RESTful resource.  These could be located fairly centrally.  When you start to see more demand, you push processes forward, which means first that there are more of them, and second that they are closer to users.

The big question to be addressed here isn’t the server architecture, but how the software framework works to cache processes.  Normal caching of content is handled through the DNS, and at least some of that mechanism could work here, but one interesting truth is that DNS processing takes a fairly long time if you have multiple hierarchical layers as you do in content delivery.  That’s out of place in applications where you’re moving the process to reduce delay.  It may be that we can still use DNS mechanisms, but we have to handle cache poisoning and pushing updates differently.
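
One way to picture a DNS-like mechanism adapted to process caching (short TTLs plus pushed updates rather than a full hierarchical walk on every reference) is sketched below; it illustrates the requirement rather than any actual implementation.

    # Hypothetical process-location resolver: cached lookups plus pushed updates, so a
    # re-cached (moved) process is found without waiting on TTL expiry or a hierarchy walk.
    import time

    class ProcessResolver:
        def __init__(self, authoritative_lookup, ttl_seconds=2.0):
            self.lookup = authoritative_lookup   # function: process name -> hosting point
            self.ttl = ttl_seconds
            self.cache = {}                      # name -> (hosting_point, expiry)

        def resolve(self, name):
            entry = self.cache.get(name)
            if entry and entry[1] > time.time():
                return entry[0]                  # fast local answer
            host = self.lookup(name)             # slower authoritative path
            self.cache[name] = (host, time.time() + self.ttl)
            return host

        def push_update(self, name, new_host):
            """Called when a process is re-cached closer to the user."""
            self.cache[name] = (new_host, time.time() + self.ttl)

    resolver = ProcessResolver(lambda name: "metro-dc-12")
    print(resolver.resolve("nav.agent"))           # -> metro-dc-12
    resolver.push_update("nav.agent", "edge-co-7")
    print(resolver.resolve("nav.agent"))           # -> edge-co-7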

There is a rather obvious question that comes out of the distribution of carrier cloud data centers.  Can we start with a few regional centers and then gradually push applications outward toward the edge as economics dictates?  That’s a tough one.  Some applications could in fact be centrally hosted and then distributed as they catch on and earn revenue, but without edge hosting I think the carrier cloud is going to be impossible to differentiate versus the cloud providers like Google and Amazon.  Operators are already behind in experience-based applications, and they can’t afford to adopt an approach that widens the gap.

A less obvious problem is how revenue is earned.  Everyone expects Internet experiences to be free, and what that really means is that they’d be ad-sponsored.  The first issue there is that ads are inappropriate for many contextual applications—self-driving cars come to mind, but any corporate point-of-activity empowerment application would also not lend itself to ad sponsorship.  The second issue is that the global advertising budget is well under a fifth of total operator revenues.  We have to pick stuff people are willing to directly pay for to make all this work, and that may be the knottiest problem of all.

Comcast Joins ONOS/CORD: Why We Should Care a Lot

Comcast just joined the ONOS project, and I think that raises an important question about SDN, NFV, and the whole top-down or bottom-up model of transformation.  Couple that with the obvious fact that you read less about SDN and NFV these days, and you get a clear signal that something important might be happening.  Several things are, in fact.

For those who haven’t followed the ONOS and CORD projects (I blogged on the concept here), they’re a software-centric model for service provider evolution that presumes the future will be created by software hosted on generic virtualized servers.  CORD is “Central Office Re-architected as a Datacenter,” and it’s a conceptual architecture that has been realized through the Open Network Operating System (ONOS).  What I liked about this approach is that it’s a kind of top-down, vision-driven way of approaching transformation.  Your goal is to make your CO more data-center-centric, right?  Then CORD is clearly applicable.  Apparently to Comcast too, which raises a broad and a narrow point.

The broad point is the “why” of CORD overall.  Why is the CO being architected as a data center interesting?  Clearly, the superficial reason is that’s what people think they’re going to do, and the deeper question is why they think that.

There isn’t a single network market segment today that’s not already seeing “bit commoditization”.  Bandwidth isn’t intrinsically valuable to consumers or businesses—it’s a resource they can harness to do something that is valuable.  I’ve talked for years about the problem of the convergence of the price-per-bit and cost-per-bit curves.  The key now is to forget causes for the moment and focus on the facts:  This is happening everywhere and this is never going to reverse itself.  Transport, meaning connecting bandwidth between points, is not a growth market.  We all sort of know this because we all use the Internet for what’s there, not how we get it.

Which means, of course, that the profit of the future lies in providing stuff that people want to get to, not the means of getting to it.  OTT stuff, by definition, is the way of the future as much as transport is not.  The “top” that it’s “over” is transport networking.  So, what is an OTT’s “central office?”  Answer: A data center.  Google has a bunch of SDN layers, but they’re not to provide SDN services, they’re to connect Google data centers, and link the result onward to users, advertisers, and so forth.  In this light, CORD is just a response to a very clear market trend.

It’s a response in name, for sure, but how realistic is it?  Realistically, CORD is about virtualization at a level above what SDN and NFV describe.  You virtualize access, for example, in CORD.  You don’t get specific about how that’s done yet.  CORD also has an orchestration concept, and that concept is aimed at the same higher level, meaning that it’s above things like SDN and NFV.  But even higher than that is the simple truth that there are real network devices in data centers.  CORD isn’t trying to get rid of them, it’s trying to harness them in a way that submits to software automation of the lifecycle processes of the resources and the services built on them.

If I take a CORD-like approach to transformation, I might say to an operator CFO “Focus your transformation investment on software automation of service lifecycles and in the first year you can expect to obtain opex savings of about 2 cents per revenue dollar.  Focus the same on SDN/NFV transformation of infrastructure and in Year One your savings will be one tenth of that.  To achieve even that, you’ll spend 30 times as much.”  Even in 2020, the CORD-like approach would save more opex than SDN/NFV transformation would, and with the same efficiency of investment.
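
To put rough numbers on that comparison (the figures are the ones just quoted; the arithmetic is mine):

    # Back-of-envelope comparison using the figures quoted above.
    cord_savings, cord_cost = 0.020, 1.0          # 2 cents per revenue dollar, baseline spend
    sdnnfv_savings, sdnnfv_cost = 0.002, 30.0     # one tenth the savings, 30 times the spend

    print(cord_savings / cord_cost)               # 0.02 benefit per unit of investment
    print(sdnnfv_savings / sdnnfv_cost)           # roughly 0.00007, about 300 times less efficient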

Which brings us to Comcast.  Do they advertise the beauty of DOCSIS and the elegance of a CATV cable, or do they push Xfinity, which is a platform, meaning it’s hosted software?  Even if you go to Comcast for Internet, you’re not going there for SDN or NFV.  You’d not see that level of transformation, but you already see (if you’re a Comcast customer) the transformation to a data-center-centric service vision.

Which raises the most interesting point of all.  If the transformation future of network operators is to look more like OTTs in their service formulation, and if re-architecting their COs to look like data centers is the high-level goal, then what role do SDN and NFV play?  Answer: Supporting roles.  SDN’s success so far has been almost entirely in the data center.  NFV is a partially operator-centric feature cloud strategy.  If my CO is a data center I can for sure connect things with SDN and host virtual functions.

Given this, is Comcast buying into a specific data-center-centric approach to future services by joining CORD/ONOS?  Or is it simply acknowledging that’s where they’re already committed to going, and looking for standards/community help along the way?  I think it’s the latter, and I think that’s a profound shift for network operators, equipment vendors, and those promoting infrastructure modernization as a step toward transformation.

Future service revenues will not come from tweaking the behaviors of connection services, but from the creation of an agile OTT platform.  That platform may then utilize connection services differently, but the platform transformation has to come first.  We have connectivity today, using legacy technologies.  We have successful OTTs today, depending for their business on that connectivity.  Operators who want the future to be bright have to shine an OTT light on it, not try to avoid OTT commitments by burying their heads in the sands of tweaking the present services.

And by “operators” here, I mean all operators.  If you run a network and provide connection services using fiber or copper, mobile or satellite, IP or Ethernet or maybe even TDM, then you have the same basic challenge of bandwidth commoditization.  The Financial Times ran a piece on this in the satellite space, for example, saying that if capacity was what the industry sold, then capacity demand was already outstripped by capacity supply even before a new generation of higher-capacity birds started to fly.

How do you meet that challenge?  You reduce current service cost and you chase new service revenues.  How do you do that?  You evolve from a business model of connecting stuff (which provably means you connect your OTT competitors to customers and disintermediate yourself) to being the stuff that users want to connect with.  Which is why CORD is important, and why Comcast’s support for it is also important.

Verizon’s SDN/NFV Architecture in Depth

I noted in my introductory blog on AT&T’s and Verizon’s SDN/NFV approaches that Verizon has taken a totally different tack with its architecture.  Where AT&T is building open-source glue to bind its vendor-controlling D2 architecture, Verizon is defining an open architectural framework for vendor integration.  Standards from the ONF, TMF, and NFV ISG fit deep in the AT&T ECOMP model, but the Verizon model is built around them.  That’s a critical difference to keep in mind as you read this.

Just as there’s a critical difference between the Verizon and AT&T models, there’s a critical commonality.  Both operators are saying that the current standards work isn’t going far enough fast enough.  In particular, both admit that the scope of the current work is too narrowly focused, so assuring broad-based benefits is nearly impossible.  So, in either architecture, standards are somehow supplemented to create something with enough functional breadth to be truly useful.

One obvious extension in the Verizon model is its functional heart, which is End-to-End (E2E) Orchestration.  Verizon builds E2E around the NFV Orchestration (usually called “MANO”) element, and from E2E it extends control downward in two forks—the SDN and NFV forks.  Network connectivity of all types is managed along the SDN path, with real devices (Physical Network Functions or PNFs in the Verizon diagram) and SDN-OpenFlow elements both under a set of SDN controllers.  On the NFV side, the structure is fairly standard, except for the critical point of SDN/NFV separation that we’ll get to.  There are also two “flanks”, one for FCAPS NMS and the other for OSS/BSS.

The way that SDN is handled is the most different thing about the Verizon approach.  Rather than proposing to have a single “Network-as-a-Service” model that then decomposes into vendor- or technology- or geographic-specific domains, Verizon has created three firm subdivisions—Access, WAN, and Data Center (DC) (along with an undefined “other”).  They appear to link the DC with the NFV elements only.

The classic interpretation of the NFV ISG model is that connection services are a subset of infrastructure services, meaning that they’d be expected to be supported by a (Virtual) Infrastructure Manager or VIM.  Verizon’s splitting of the data center connections off into the SDN control space firmly divides the SDN from the NFV, with cooperative behavior created above in the E2E function.  This somewhat mirrors the separation that AT&T proposes in its ECOMP model, where “Controllers” handle the cloud, the network, and applications.

The management flank is “Service Assurance”, and it consists of the traditional NMS FCAPS applications plus log management and correlation tools.  There are NMS links into both the SDN and NFV forks we’ve already described, and the links are both to E2E and to the lower forks, which implies a complex management relationship.  The OSS/BSS flank comprises connections to the OSS/BSS system from E2E and also from PNFs.  The “management” functions in the Verizon model are designed around the notion that function management is the same for either PNFs or VNFs.  Thus, you deploy a VNF using NFV tools, but you manage the functional aspects using a management toolset evolving from today’s EMSs to something like SDN control.

Verizon’s document starts its detailed description with the NFV Infrastructure (NFVI) element.  Verizon goes into great detail explaining the relationship between hardware elements (physical infrastructure) and software elements (virtual).  They also explain things like how a VIM “discovers” what its infrastructure is capable of, which is a nice advance in thinking from the ETSI starting point.  They do the same on the SDN side, including framing the future direction around intent-based interfaces.  All of this facilitates the interworking of components of the architecture with each other, critical if your intent is (as I think Verizon’s is) defining a framework in which vendor elements can be interworked confidently.

This is one area where Verizon’s document shines.  They’ve gone a long way toward defining the hardware for NFV, right down to CPU features, and they’ve also done well in defining how the physical infrastructure would have to be managed for consistency and reliability.  Every operator interested in carrier cloud should look at the document for this reason alone.

Another area where Verizon has the right approach is service modeling.  Verizon’s architecture shows a kind of succession of layers—service to functional to structural.  Each layer is governed by a model, and that allows vendors to incorporate model-driven deployment they may already have offered.  You can also model different layers in different ways, or even use two different models in the same layer.  YANG, for example, is well suited to modeling real network configurations, but I firmly believe that TOSCA is better for cloud deployments and functional/service-layer work.
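
Purely as an illustration of that layering (the layer contents and names below are my own, and real models would be expressed in TOSCA, YANG, or whatever fits each layer):

    # Illustrative three-layer model: service -> functional -> structural.
    # Each layer could be governed by a different modeling language.
    service_model = {
        "service": "business-vpn",
        "functional": [
            {"function": "vpn-core", "modeled_in": "TOSCA",
             "structural": {"modeled_in": "YANG", "targets": ["MPLS-PE", "SDN-underlay"]}},
            {"function": "vFirewall", "modeled_in": "TOSCA",
             "structural": {"modeled_in": "TOSCA", "targets": ["edge-cloud-NFVI"]}},
        ],
    }

    def structural_targets(model):
        """Walk the layers to find what actually gets deployed or configured."""
        return [t for f in model["functional"] for t in f["structural"]["targets"]]

    print(structural_targets(service_model))   # ['MPLS-PE', 'SDN-underlay', 'edge-cloud-NFVI']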

As always, there are issues, and the big one here starts with the goal (can you hope to define an open architecture for vendors, and if so, can you move the ball relative to the standards groups?) and moves down to some details.  I think two issues are paramount.

One area I think may pose a problem is the lack of specific support for multiple Infrastructure Managers, virtual or otherwise.  The biggest risk of lock-in in NFV comes because a vendor provides a VIM/IM for its own gear to the exclusion of all other gear.  If multiple VIM/IMs are allowed that’s not a major problem, but clearly it’s a killer if you can have only one VIM/IM in the architecture and several (incompatible) vendors want to be it!

In both my CloudNFV architecture and my ExperiaSphere architecture, I proposed that the equivalent of the VIM/IM be explicitly referenced in the model element that connects to the infrastructure.  That would allow any suitable VIM/IM implementation to be used, no matter how many, but it does require that the E2E model have the ability to include the specific reference, which Verizon says it doesn’t do.  I think they’ll need to fix this.
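
A minimal sketch of what I mean, assuming (hypothetically) that each infrastructure-facing model element carries an explicit reference to the VIM/IM that deploys it:

    # Sketch: a model element names its own VIM/IM, so multiple (even incompatible)
    # vendor managers can coexist within one service model.  Names are hypothetical.
    VIM_REGISTRY = {
        "vendor-a-vim": lambda element: f"deployed {element['name']} via vendor A",
        "vendor-b-vim": lambda element: f"deployed {element['name']} via vendor B",
    }

    def deploy(element):
        vim = VIM_REGISTRY[element["vim_ref"]]   # explicit manager reference carried in the model
        return vim(element)

    print(deploy({"name": "vFW-instance-1", "vim_ref": "vendor-a-vim"}))
    print(deploy({"name": "vRouter-2", "vim_ref": "vendor-b-vim"}))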

My other area of concern is the VNF Manager.  Verizon has retained the ETSI approach, which defines both “generic” VNFMs that can support multiple VNFs, and Specific VNFMs (S-VNFMs) that are specific to a given VNF.  I’ve cataloged all my concerns about this approach in previous blogs, and those interested can use the Search function on my blog page to find them.  For now, let me just say that if you don’t have a standardized way of managing all VNFs, you’ll end up with a significant onboarding issue, which is where we are now with the ISG model.

Part of the VNFM issue, I think, arises from a bit of vCPE myopia on Verizon’s part.  Yes, Verizon has the geography where vCPE is most likely to deploy (they have over three times the opportunity that AT&T has in its own home territory, for example).  However, Verizon’s customers are also long-standing users of business WAN services, and it’s therefore less likely that they 1) need a managed service approach and 2) lack a solution for it already if they do.  The focus on NFV in the model, and then on vCPE as the NFV application of choice, could fall short of justifying a major NFV commitment, which would make the architecture moot.

I think it’s clear from the Verizon material that the goal is to guide vendor implementations of what’s supposed to be an open architecture.  Candidly, I think this is going to be a hard road to travel for Verizon.  First, it’s far from clear that vendors are interested in an open approach.  Second, once you get outside the very limited boundaries of SDN and NFV standards, there’s nothing to guide an open model or even to pick a specific approach.  Verizon’s architecture identifies a lot of things that are critical, essential, to a business case.  The problem is that they don’t define them in detail, and so implementations in these extended areas have no reference to guide converging approaches.

Whether this will work depends on the vendors.  Those same vendors, it must be said, who have not stepped up to NFV, in no small part because most of them either see a small reward at the end of a long road, or no reward at all.  Creating a framework for vendor participation does little good if they don’t want to participate, and whether they do is still an open question.  That question, if answered in the affirmative, will only expose another question—whether the Verizon framework delivers on its technical mission, and there I have questions of my own.  The dependence on formal standards that Verizon has built into its architecture is risky when those standards don’t cover enough to make the business case.  Will Verizon fix that?  I don’t know.

A Deeper Dive into AT&T ECOMP

Even a superficial review of AT&T ECOMP shows it’s a whole different way of looking at the virtualization/softwarization of networking.  The master architecture diagram is a picture of start-to-finish service lifecycle management, the bottom is a redrawing of SDN and NFV concepts, and the middle is a modeling approach that seems to draw from multiple sources.  There are no specific references to any standards in the diagram, and the term “ECOMP” appears only twice.

The reason for this is simple, I think.  ECOMP is a new glue that binds and somewhat reshapes things that AT&T had always believed would be part of its Domain 2.0 (D2) architecture.  That architecture was designed to create a network where vendors fit into a specific set of silos and were strongly discouraged from slopping across multiple zones to lock AT&T into their approach, and ECOMP is built on the same principle.  In fact, it goes beyond D2 in that regard.  By framing services in a D2 way, ECOMP makes D2 real and not just a set of zones and boundaries.

It’s a fascinating model, and I have to open by saying I’m reviewing ECOMP based on less information than I’d like.  Until the full source code is released we won’t have all the details of the current implementation, and I expect that ECOMP will evolve as AT&T gains experience.  It will evolve faster, and further, if Orange and other operators now looking at (or trialing) ECOMP decide to go with it.  That could make ECOMP the most important development in SDN/NFV.

Functionally, ECOMP divides into two parallel frameworks, a design-time framework that builds and sustains the models and policies, and a runtime framework that applies them to the service lifecycle and network infrastructure.  The latter is repeatedly linked to D2 with references like “manages the full lifecycle of D2 infrastructure”, and in the various diagrams and texts, it’s possible to see many holdovers from the early D2 work.

The “left side” or design part of the service lifecycle process is rooted in the AT&T Service Design and Creation (ASDC) element, and also includes policy and analytics application design.  The models themselves seem to be somewhat based on the TM Forum model, combined with a dash of TOSCA, but the details are still murky because the code hasn’t yet been released for public review.  This element feeds another related component that’s responsible for acting as a model and policy repository and distributor.  The common services at the heart of ECOMP provide an interface to this repository.

There are two high-layer ECOMP elements, the Master Service Orchestrator (MSO) and the Active and Available Inventory (A&AI) elements.  The former does the orchestrating based on a catalog of “recipes” and is roughly analogous to an expanded NFV MANO function.  The latter is the real-time view into the D2 environment, including both resources and services, and in my view it represents something that’s not explicit in the ETSI model at all.

The lower-layer elements of ECOMP are perhaps the most critical to vendors because it’s here that D2 largely assigns outside servers, devices, and software.  Again, there are two primary components in this area.  One is the Data Collection, Analytics, and Events (DCAE) element that handles telemetry and analysis, and the other is the collection of what’s perhaps the most critical element—the Controllers.

Orchestration in ECOMP is a two-level process.  The MSO handles high-level, end-to-end service orchestration and then hands off its needs to one of a number of Controllers, each of which is a resource-domain-specialist orchestrator that turns high-level requests into specific commitments of resources.  ECOMP defines three Controller types: the Infrastructure (or cloud), the Network, and the Application.

This structure divides responsibility between the MSO and the Controllers, with the former doing stuff that involves high-level deployment and redeployment actions that have broad scope and minute-level response requirements, and the latter doing the seconds-of-time responses to things.  The Controllers are like ETSI Infrastructure Managers, except that ECOMP explicitly assigns some level of orchestration responsibility to them, which ETSI does not (so far), though it should.

AT&T seems to envision “orchestration” at the Controller level to be a combination of specific orchestration steps (DevOps- or “NetOps”-like) and policy distribution.  The combination seems appropriate given that the domains for each Controller could involve both legacy and virtual elements.  The implication of the structure is that a Controller is given a mission to deploy and sustain a service element (from its associated domain) and will be responsible for event-handling as long as the Controller can do what it’s expected to do.  If it can’t, then it would buck the case upward to the MSO for handling.
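
A hedged sketch of that division of labor follows; the class names and events are mine, not AT&T’s, and the point is only the escalation pattern.

    # Sketch of two-level lifecycle handling: a domain Controller resolves events
    # locally where it can and escalates to the MSO only when it can't.
    class MasterServiceOrchestrator:
        def escalate(self, controller, event):
            return f"MSO: redeploying service after '{event}' reported by {controller}"

    class DomainController:
        def __init__(self, name, local_remedies):
            self.name = name
            self.remedies = local_remedies         # event -> local policy/orchestration step

        def handle_event(self, event, mso):
            remedy = self.remedies.get(event)
            if remedy:
                return f"{self.name}: {remedy}"    # seconds-scale, local-scope response
            return mso.escalate(self.name, event)  # broad-scope, minute-scale response

    mso = MasterServiceOrchestrator()
    infra = DomainController("infrastructure", {"vm-failed": "restart VM in the same host pool"})
    print(infra.handle_event("vm-failed", mso))
    print(infra.handle_event("zone-outage", mso))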

The MSO seems responsible for activating individual elements, meaning that the recipes that it works on would have to be fairly detailed in terms of the steps needed.  The Controllers carry out the requests of the MSO, but as I noted they also respond to lifecycle management events.  This makes the controllers a mixture of ETSI functions.  The Infrastructure controller is surely similar to the Infrastructure Manager, but the ETSI IM (virtual IM in ETSI) is singular while the ECOMP model divides it into Network and Infrastructure (meaning cloud).  The Application Controller is analogous to the ETSI VNF Manager in some ways.

This approach sounds strange, particularly for those versed in the ETSI approach, but it’s more logical than the ETSI original.  Controllers are passed “policies” from the MSO, and they have their own orchestration/policy mechanisms to cycle through the lifecycles of the stuff they build.  It’s supported by telemetry that’s collected everywhere and distributed to where it’s needed, using a publish-and-subscribe model.
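
The telemetry side might look roughly like this generic publish-and-subscribe sketch; it illustrates the idea, not the actual DCAE interfaces.

    # Generic pub/sub sketch: collectors publish telemetry, and whichever controller
    # or analytics process needs a topic subscribes to it.
    from collections import defaultdict

    class TelemetryBus:
        def __init__(self):
            self.subscribers = defaultdict(list)

        def subscribe(self, topic, handler):
            self.subscribers[topic].append(handler)

        def publish(self, topic, event):
            for handler in self.subscribers[topic]:
                handler(event)

    bus = TelemetryBus()
    bus.subscribe("vnf.health", lambda e: print("application controller sees:", e))
    bus.subscribe("vnf.health", lambda e: print("analytics process stores:", e))
    bus.publish("vnf.health", {"vnf": "vFW-1", "state": "degraded"})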

ECOMP is a big advance in SDN/NFV architecture, a major improvement in nearly every way.  That doesn’t mean that it’s entirely without questions, perhaps even issues.  Again I have to stress that the details here are sketchy because the code’s not released, but I think there’s enough to comment on.

The big issue in ECOMP remains the VNFs themselves.  A “virtual network function” is generally seen as a function transplanted from an appliance and strung into a service chain.  Every appliance is different, and there’s no standard way of hosting one of these functions as a result.  Each presumably has a set of interfaces, each requires parameters, and all of this variability could be handled in only two ways—require that it be shed in favor of a standard set of API dependencies (what I’ve called a “VNF PaaS” model) or nest custom code with each VNF to provide it what it needs and interface with the outside world.  Even that would require some standard means of harmonization.  Neither ETSI’s work nor ECOMP mandates either of these two approaches, and without them there’s still too much “snowflake” variability and not enough “Lego” interchangeability in the VNFs.
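
Loosely sketched, a “VNF PaaS” would amount to something like the following: a fixed platform API surface that every VNF is written against.  The API names here are hypothetical.

    # Hypothetical "VNF PaaS": a standard set of platform APIs every VNF depends on,
    # so onboarding doesn't require per-VNF custom glue.
    class VnfPlatform:
        """The standard API surface a compliant VNF is allowed to depend on."""
        def get_parameter(self, key): ...
        def report_health(self, status): ...
        def request_scale_out(self): ...

    class CompliantFirewallVnf:
        def __init__(self, platform: VnfPlatform):
            self.platform = platform             # the only interface to the outside world

        def start(self):
            rules = self.platform.get_parameter("firewall.rules")
            self.platform.report_health("running")
            return rules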

The second issue is in the Controller design.  It appears from the ECOMP material that while there are three types of controllers, there could be multiple instances of each type corresponding to specific domains.  I’d take “domains” here to mean resource/administrative domains, meaning areas under a common management jurisdiction.  That’s a good idea, and it could also contribute to federation if some of the domains were “foreign”, which appears to be possible given the implementation description.

What’s not quite clear is whether the instances of a given Controller type all share a common implementation.  In some places the material seems to suggest that they do, and in others that there might be different implementations to accommodate different technologies deployed in the controlled regions.  This isn’t a simple semantic point; if there is only one controller implementation for each type, then every controller would have to know a lot of things to reflect all the variability in implementation within its domain.  Or, the domains would have to be totally homogeneous from a control and functional perspective.

The final point is that orchestration of operations functions is still undefined.  It’s not that ECOMP precludes it, but that it doesn’t presume a common modeling and orchestration flow that starts at the OSS/BSS and moves downward to the services.  Operators will vary on how much they rely on OSS/BSS tools for specific service lifecycle processes, and thus it’s not clear how much operations efficiency benefits might be left on the table.

OK, overall, ECOMP isn’t perfect, but it’s a massive forward step in an industry that’s been dawdling around the key points for all too long.  I’m sure it will mature with adoption, and if ECOMP is successfully promoted beyond AT&T, that will push things further and faster.  I’ll probably revisit ECOMP down the line as it evolves and more details become available.

An Overview of the AT&T and Verizon SDN/NFV Architectures

When AT&T and Verizon released their architecture models for SDN and NFV, I did a quick blog overview of the two.  I’ve had a chance to talk with operators and vendors about the approach now, and I’d like to revisit the two architectures based on what I’ve heard.  This is going to be a three-part series, with this (the first) focused on broad architecture and goals and the other two on the specifics of the two architectures.

To me, the important starting point for the discussion is the fact that both operators believe that they have to step up and define an architecture on their own.  We’ve seen in other areas (like 5G and IoT) that operators are increasingly unhappy with the pace of standards development and the support they believe they’re getting from vendors.  The market doesn’t wait for I-dotting and T-crossing, nor can the operators.  They need something for SDN and NFV that delivers benefits that justify deployment and, at deployment, solve the problem of profit-per-bit compression.  Standards haven’t delivered.

On the other hand, operators have spun a lot of cycles in activities like the TMF, ONF, and NFV ISG and they don’t want to just toss the work (and money) away.  Thus, it’s not surprising that these two operators think that they have to blow a few kisses in the appropriate directions.  The biggest risk that these two operator initiatives share is the one they’ve inherited from those other activities.  It’s not just a matter of not having gotten enough done, or of not working fast enough to do more.  Some of what was done, having been approached wrong, was done wrong.  Some break with the past may be essential in avoiding the pitfalls that have already been created.

My read on the AT&T and Verizon models is that they have some similarities but also some fundamental differences, and these differences reflect what I think are basic differences in the goal of each of the operators.  AT&T is defining an open-source project that effectively puts vendors in their place, where Verizon is defining an open framework to encourage vendors.  Keep these missions in mind because I’ll refer back to them and even expand a bit on their implications.

AT&T’s ECOMP is an element in its Domain 2.0 (D2) strategy, and that’s important because D2 tries to divide up networking into functional pieces where AT&T can then establish a number of specific bidder/competitors for future deals.  Fundamental to D2 is the idea that vendors can’t create vast connected ecosystems that put the operator in an all-or-nothing position.  ECOMP, as an architecture for the software elements that create the service lifecycle and manage infrastructure, has to ensure that vendors stay where they’re put.  Thus, it doesn’t try to create open elements within itself, which would risk the substitution of a vendor element that might pull through other stuff.  Instead it creates specific low-level points where equipment and software can be placed.

Because D2 isn’t limited to virtual elements, the AT&T strategy has relevance to all its services and infrastructure.  You could orchestrate a service made up completely of legacy equipment with the AT&T approach, though it seems clear that AT&T intends to use both SDN and NFV as broadly as it can.  Still, the D2 focus when combined with the open-source implementation implies that AT&T is truly building a “software-defined” network.

Verizon’s “SDN/NFV Reference Architecture” has no company-offered convenient acronym, so I’m going to refer to it as “SNRA” here.  Whatever you call it, the contributors and acknowledgements page makes it clear that there were more vendor people involved than Verizon people.  I’m not suggesting that outside pressure is what creates the “encourage vendors” model; from what I hear Verizon’s explicit goal was to frame the way vendors approached Verizon with offerings.  Vendor participation in something that’s supposed to encourage vendors is essential, and that’s why there were seven vendors listed (Cisco, Ericsson, HPE, Intel, Nokia, Red Hat, and Samsung).  Verizon aims at open implementation by a community that contributes entire pieces, and so pays a lot of attention to both interfaces and to the ETSI NFV E2E model that created at least a rough (if not totally functional) consensus.

SNRA is what the name suggests, meaning that it’s really about SDN and NFV.  While you can definitely provision legacy elements with the model, it seems pretty clear that Verizon is seeing SNRA as deploying lock-step with SDN and NFV and not being prominent where those technologies haven’t reached.  Again, I’m not surprised with this given Verizon’s apparent goal of creating an architecture into which vendors can fit their wares.  Displacing legacy gear or reducing its differentiation would be a hard sell to many of the contributors on the list.

The scope of both models is set by the need to harness benefits more effectively.  One of the issues I’ve raised repeatedly in both SDN and NFV is that deployment depends more on operations improvements than on capital savings.  Since neither the ONF nor the ETSI NFV ISG chose to address full-scope operations and the full service lifecycle in their models, and since the TMF has yet to provide its own approach to operational orchestration, one of the key questions that both Verizon and AT&T will have to address to make their approaches successful is “can this do opex efficiency?”  The answer has to start with the relationship between the two architectures and OSS/BSS systems.

In both the ECOMP and SNRA models, infrastructure (including both legacy elements and virtualization-enhanced SDN/NFV elements) is composed into what could fairly be called “service-specific virtual devices” that present a simple model of the service to the OSS/BSS layer.  Neither of the approaches propose to modernize OSS/BSS by introducing event-driven concepts or OSS/BSS functional orchestration.  In my past characterization of the ways in which service orchestration could be applied, they both operate below the OSS/BSS boundary.  While this doesn’t foreclose securing opex benefits, it doesn’t fully address the question of how you’d orchestrate OSS/BSS processes into service operations.  I’ll comment on some approaches as we develop this series to cover the specifics of each approach.

We have competing approaches here, and what’s particularly helpful I think is the fact that what they compete on is fundamental to the evolution of networking.  One operator is embodying the “go it alone” trend, taking control of its software evolution, subordinating hardware to software, and limiting vendor contributions so they fit into the overall scheme.  The other has tried to rehab a vendor-participation process that has broken down in the standards bodies.  Verizon realizes that it is very difficult for operators to drive progress alone, and in many areas impossible for them to cooperate outside formal standards groups without risking anti-trust action.

How is this all going so far?  Obviously, operators are happy with the AT&T model; Orange is running trials of ECOMP.  What may be particularly telling is that the vendors whose chestnuts Verizon may be trying to pull out of the fire aren’t being particularly helpful in promoting the Verizon approach.  For example, I have had a number of vendors ask for details on ECOMP and (so far) none have asked for similar details on SNRA.

Vendors have done a generally abysmal job with SDN and NFV at the positioning and benefit-securing level, and they may now be shooting down their last, best, chance to redeem themselves.

Can Nokia Really Make Itself Software-Centric?

If network operators want their vendors to embrace software, stories in SDxCentral and Light Reading hint that Nokia may be thinking of doing just that.  Details on the notion are incredibly scarce, so we’re left not only to ask the usual question “Is this the right approach?” but also the question “What is the approach?”

All the fuss comes from a single sentence in a Nokia release on its Capital Markets Day discussions.  What they said was that one goal was to “Move beyond our current product-attached software model and create a software business with the margin profile of large software companies, focused on areas including enterprise software and IoT platforms.”  So, to the questions: Is this “software business” a business unit, a spin-off, or what?  And what does “software model” mean?

Clearly, the new software model is not the “current product-attached software model,” by which Nokia means the practice of tying software that’s associated with equipment in some way into the equipment business unit.  Many vendors have found that this approach tended to subordinate software advances to hardware sales, and also offered such a limited career path for software professionals that it was hard to get the right people, particularly in leadership roles.

The product-attachment issue probably got especially acute when Nokia acquired Alcatel-Lucent.  That company was almost legendary for its product silos and isolationist policies, many of which held over from the original Alcatel and Lucent merger.  The siloing was never really resolved after that first M&A, and that at least complicates achieving the “software business” goal.

As it happens, Alcatel-Lucent actually has an “independent software business” in Nuage, its SDN property.  I’ve made it clear in a number of my blogs that Nuage is my pick for the top prize in the SDN and virtual network space.  Despite this, it never got the kind of support from corporate that it should have, and many operators believed that this was to protect the router business, headed by the powerful executive Basil Alwan.  The point is that just making something a business unit, even an independent one carrying its own brand, doesn’t guarantee its independence.

There’s also the question of the Ericsson example.  I suggested in an earlier blog that Ericsson probably had abandoned a hardware-centric model too soon, and tied itself to a model of network evolution that was still too radical to be immediately successful.  If that’s true for Ericsson, it certainly could pose a threat to Nokia’s software-business approach, whatever it turns out to be.

The big question for Nokia, then, is what software-centricity does turn out to mean.  They can grab the software out of the product units, but they then have to do something specific with it.  There has to be some new and specific focus for all that formerly product-attached software; otherwise, breaking it out just disconnects the old connections.  Having it as an independent business unit would essentially replicate the Nuage model, and I think it’s clear that Nuage didn’t really have the independence and senior management support it needed.  Having the software business actually spin out as a separate company could create shareholder value but poses the risk of having two parts that need each other and don’t have a clear path to symbiosis.

Nuage may be the functional key to what Nokia should do in terms of planning a unified software strategy, whatever the relationship between the software unit and the rest of the company.  In Nuage, Nokia has an outstanding virtual network model, a model into which you could fit SDN services, NFV, cloud services, and legacy infrastructure.  If Nuage is the universal constant, then it would make sense to build an architecture positioning around it and fit other software (and even hardware) products into the architecture.

If virtualization is the future, then you have to start by virtualizing the network.  That means creating a model of networking that’s as infrastructure-independent as you can make it.  It’s fairly easy to do that with an overlay-SDN approach like Nuage has.  You can then apply virtual network services directly as NaaS or even as SD-WAN (which Nokia is already doing), and you can also apply them to the cloud and to cloud-centric missions like NFV’s function deployment and connection.  Nuage could have enriched Alcatel-Lucent’s NFV strategy, which was pretty good in any case, and it could enhance Nokia’s too.

The big benefit of a virtual-network approach is its ability to abstract resources, including legacy equipment.  The traditional SDN model has been “fork-lift-and-install” where the virtual-network approach lets you “abstract-and-adapt”.  You can create a unified model of networking and adapt the real-product element relationships as far as hardware-level virtualization and the cloud can take you.  Overlay SDN works fine over Ethernet or IP, and also fine over OpenFlow-forwarded virtual wires.
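
As a loose illustration of “abstract-and-adapt”, where one overlay abstraction is mapped onto whichever underlay happens to exist, consider the sketch below; nothing here represents Nuage’s actual interfaces.

    # Sketch: one overlay "virtual network" abstraction adapted to different underlays.
    class EthernetUnderlay:
        def connect(self, a, b):
            return f"VLAN path {a}<->{b}"

    class OpenFlowUnderlay:
        def connect(self, a, b):
            return f"OpenFlow virtual wire {a}<->{b}"

    class OverlayNetwork:
        def __init__(self, underlay):
            self.underlay = underlay
            self.tunnels = []

        def add_endpoint_pair(self, a, b):
            # The service sees only the overlay; the underlay choice is an adaptation detail.
            self.tunnels.append(self.underlay.connect(a, b))
            return self.tunnels[-1]

    net = OverlayNetwork(OpenFlowUnderlay())
    print(net.add_endpoint_pair("branch-17", "cloud-vpc-3"))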

The fact that you can base SD-WAN service, which is probably the most logical way to deliver NaaS, on overlay SDN and Nuage is to me a proof point for its value.  It lets you deliver a service to users while at the same time exploring the questions of the relationship between virtual-overlay and underlay technology.  It also lets you explore the relationship between virtual networks and the carrier cloud, which could consume those virtual networks either to deliver services or to connect its own elements—or both, of course.

The cloud is the key to all of this, in my view.  Any operator software will end up running in it.  All virtual functions and application elements will be hosted there.  But first and foremost, the cloud is green field.  It’s far easier to apply virtualization where you aren’t displacing something else.  If operators were convinced that the Nuage model was the right approach for cloud networking, and if they also believed it could create an infrastructure-independent abstraction of connection infrastructure, they would have a single path forward to the networking model of the future.

Everything isn’t all beer and roses just because you build a virtual network, of course.  All of the challenges of virtualization in the areas of operations and management remain.  What a virtual network does is create a connection abstraction in a consistent way, which opens the door to using an expanded form of that abstraction (a “service model”) to bind the virtual and the real and to integrate it with OSS/BSS/NMS and onward to the service users.

Ultimately that has to be the goal of Nokia’s software business.  Just finding a home for open-source or DevOps initiatives, as a Nokia person suggested in one of the articles, isn’t a real strategy.  You have to find a role for all this stuff, a way to create one or more deployable ecosystems.  And we know from Broadway that you have to understand what the play is so you can organize the players.  Figuring that out will be Nokia’s challenge.

Competitors’ challenge, too.  Other vendors have played with software-centric visions, including Nokia competitors Cisco and Juniper.  The initiatives haven’t really moved the ball much for either company, in no small part because they haven’t figured out the answer to that first problem I mentioned—the box business generates a lot of revenue that the software business doesn’t.  Software-centricity also frames another problem, a dimension of the Ericsson problem.  Software is moving increasingly to open source, which means it’s a commodity and isn’t differentiating.  What then is the software business?  If it’s professional services, then Nokia and every other company that wants software-centricity are surely heading into uncharted territory.

Is an Open-Source Framework For Next-Gen Network Software Possible?

Network operators have accepted open source software.  Or, sort of accepted.  I don’t want to minimize the fact that operators have indeed made the culture shift necessary to consider the adoption of open-source software, but that’s Step One.  There are other steps to be taken, and you can see some of the early attempts at those other steps emerging now.

The decisive driver for Step One here seems to have been the fact that cost reduction needs were far outpacing any possible, credible, path to a software-driven revolution.  Operators see a much higher probability that they’ll adopt open hardware models like that of the Open Compute Project.  If more and more of the data center iron is an open commodity, how many vendors who are denied the big hardware revenues will then jump out to address software needs?  Open source, like it or not, may be the only option.

No operator is naïve enough to think that they can just run out into the marketplace and say “InstallOpenSourceBusiness” and have it happen.  There are literally millions of open-source packages, and even narrowing the field to the possible packages that operators could adopt probably wouldn’t cut the number below a million.  Operators are used to buying solutions to problems, usually specific problems and not huge ones like “Make Me Profitable”, so one of the issues that has been facing operators is finding the components to their solution suites.  Hopefully from a single provider, but certainly at least a set of open-source tools that combine properly.

All of the operators I talked with in the last two years agree on two things.  First, they need to transform their business to be leaner and meaner and to open the opportunity for new revenues.  Second, achieving the first goal will surely mean becoming more software-centric rather than device-centric in their planning.  That’s not easy, and it’s only Step Two.

I had a nice talk with an operator a couple months ago, and they indicated that their planners had divided their technology processes into an operations layer, a service feature layer, and a virtual infrastructure layer.  The first focused on OSS/BSS modernization, the second on adding software instances of features to the current appliances/devices in use, and the last on the platform used to host and connect those software features.

In the first area, this operator said that while there were in theory open-source tools for OSS/BSS, they did not believe these tools could possibly do their job, or that transforming to use them could be managed without significant disruption.  That means that they have to confront a stark choice of either discarding the traditional OSS/BSS model completely and building up operations features by area, or staying with (likely their current) proprietary OSS/BSS strategy.

In the second area, which is the software feature-building process, this operator confesses that their current focus in NFV has been entirely on proprietary virtual functions.  They offered two reasons for that.  First, many on the marketing side argued that picking virtual functions from “familiar” vendors would grease the sales skids on NFV service offerings.  Second, everyone they found offering virtual functions was offering only proprietary ones.  The argument was that there was no unified approach to creating open-source virtual functions.

The final area has been the most perplexing for this particular operator.  They had envisioned, when they started, that both SDN and NFV would create virtual devices through a process of feature lifecycle management that lived within the third, infrastructure or platform, area.  Early on, this operator felt that cloud tools like OpenStack and some DevOps solution would build this virtual device, but they accepted quickly that management and orchestration processes would also be needed.  That’s because OpenStack doesn’t do everything operators need.

To make matters worse in this last area, this operator realized that by staying with legacy OSS/BSS systems, they couldn’t do much to ensure that management of these new services would be possible within the OSS/BSS.  That requirement had to be added elsewhere, and the only place that seemed to fit was that third area of infrastructure or platform.  What this has done is to concentrate a lot of change in a single space, while leaving the issues of the other two largely unresolved, especially with regard to open source.

There are really three layers of “infrastructure”.  At the bottom are the physical resources and the cloud deployment stuff, including OpenStack, SDN control, and legacy device management.  In the middle are the things that drive deployment below, which would be DevOps and orchestration tools, and at the top are the mystical orchestration processes that create the virtual devices that have to be presented upward.

OpenStack has issues, especially in terms of networking, even for cloud computing.  While there are initiatives to harden it for carrier NFV use, these are aimed more at the deployment side than at broader concerns of performance and scalability limits, or at network constraints.  All of these could be resolved outside OpenStack by the next layer up, and probably should be.  OPNFV seems likely to harmonize this layer a bit, but its role in organizing the next one up is unclear, and so is how it would treat legacy elements.

In that next layer, we have current DevOps technology that could control deployment, but today’s DevOps isn’t rich in network configuration and control features.  NFV has at least two MANO projects underway.  Open Source MANO (OSM) seems to me to be aimed at the limited mission of deploying virtual functions and service chains.  OPEN-O also wants to control legacy elements and orchestrate higher processes.  Neither of them is fully developed at this point.

In the top, final, layer, the only specific contribution we have is OPEN-O, but I think you could argue that whatever happens here would probably be based on a service modeling architecture.  Something like TOSCA would fit; it is, in my view, very generalized and very suitable.  There is a basic open-source TOSCA implementation, and there’s been some university work building service deployment based on it and on the Linked USDL XML variant.

There is nothing I know of underway that suitably links the management chains through all of this, allowing for higher-level elements to control what happens below and allowing lower-level elements to pass status upward for action.  There is also no accepted architecture to tie this all together.
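
Purely to illustrate the requirement, here is a minimal sketch of that kind of management linkage, with commands flowing down and status flowing up; the layer names and methods are my own invention, not an existing tool.

    # Sketch: each layer can command the one below and receives status from it,
    # so conditions propagate upward to wherever they can be acted on.
    class Layer:
        def __init__(self, name, below=None):
            self.name, self.below, self.above = name, below, None
            if below:
                below.above = self

        def command(self, action):                 # control flows downward
            print(f"{self.name} -> {action}")
            if self.below:
                self.below.command(action)

        def status_up(self, status):               # status flows upward for action
            print(f"{self.name} reports: {status}")
            if self.above:
                self.above.status_up(status)

    physical = Layer("physical/OpenStack layer")
    middle = Layer("DevOps/orchestration layer", below=physical)
    top = Layer("service-model layer", below=middle)

    top.command("deploy virtual-device X")
    physical.status_up("host failure in zone 2")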

You can see how “open source” is challenged in Step Two, and we’ve not even gotten to the last step, where we can actually field an integrated suite of software elements.  We’ve also not addressed the question of how virtual functions could be authored in open source without them all becoming what one operator has characterized as “snowflakes”, every one unique and demanding custom integration.

Operators like AT&T and Verizon are trying to create models to address the whole next-gen ecosystem, but even they are risking falling into the same trap that has hurt the ETSI NFV efforts and the ONF’s SDN.  In the name of simplification and acceleration, and in the name of showing results quickly to sustain interest, they make assumptions without following the tried-and-true top-down modeling from goals to logic.  I’m hopeful that the operators can avoid this trap, but it’s not a done deal yet.

Still, it’s these operator architectures that will likely carry the open-source banner as far forward as it gets.  Other open-source SDN and NFV elements will be relevant to the extent they can map into one or both of these carrier models.  The reason for their primacy is simple; they represent the only real ecosystem definitions available.  All of the issues I’ve noted above will have to be addressed within these architectures, and so they’ll be a proving-ground for whether any open-source solution can make a complete business case.

Looking at the Future of IT Through the “Whirlpool” Model

Changes in how we build data centers and networks, and in how we deploy applications and connect them, are really hard to deal with in the abstract.  Sometimes a model can help, something to help visualize the complexity.  I propose the whirlpool.

Imagine for the moment a whirlpool, swirling about in some major tidal flow.  If you froze it in an instant, you would have a shape that is wide at the top, narrowing slowly at first and then more quickly, until you get to a bottom where the slope flattens out again.  This is a decent illustration of the relationship between distributed IT and users, and it can help us appreciate the fundamental challenges we face with the cloud and networking.

In our whirlpool, users are strung along the very edge, where there’s still no appreciable slope.  They’re grouped by their physical location, meaning those who are actually co-located are adjacent on the edge, and the farther they are from each other in the real world, the more widely they are spaced along that whirlpool edge.

Compute resources are distributed further down the sides.  Local resources, close to the users, are high up on the whirlpool near that user edge, and resources that are remote from workers are further down.  The bottom is the data center complex, generally equidistant from users but at the bottom of a deep well that represents the networking cost and delay associated with getting something delivered.

When you deploy an application to support users, you have to create a connection between where it’s hosted and where it’s used, meaning you have to do a dive into the whirlpool.  If the application is distributed, you have multiple places where components can live, and if those places aren’t adjacent you have to traverse between the hosting points.  Where multiple users are supported, every user has to be linked this way.

This illustrates, in a simple but at least logical way, why we tend to do distributed computing in a workflow-centric way.  We have computing local to workers for stuff that has relevance within a facility and requires short response times.  We may then create a geographic hierarchy to reflect metro or regional specialization—all driven by the fact that workers in some areas create flows that logically converge at some point above the bottom of the whirlpool.  But the repository of data and processing for the company still tends to be a place everyone can get to, and where economies of scale and physical security can be controlled.

Now we can introduce some new issues.  Suppose we move toward a mobile-empowerment model or IoT or something that requires event-handling with very fast response times.  We have to push processing closer to the worker, to any worker.  Since in this example the worker is presumably not in a company facility at all, caching processing and data in the local office may not be an option.  The cost efficiency is lost and resources are likely to be under-utilized.  Also, a worker supported on a public broadband network may be physically close to a branch office, but the network connection may be circuitous.  In any event, one reason why the cloud is much easier to justify when you presume point-of-activity empowerment is that the need for fast response times can’t be met unless you can share hosting resources that are better placed to do your job.

The complication here is the intercomponent connectivity and access to central data resources.  If what the worker needs is complex analysis of a bunch of data, then there’s a good chance that data is in a central repository, and analyzing it with short access delays would have to be done local to the data (a million records retrieved one at a time over a path with 100 milliseconds of delay racks up over a day in communication latency alone).  Thus, what you’re calling for is a microservice-distributed-processing model, and you now have to think about interprocess communications and delay.
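The arithmetic behind that parenthetical is worth seeing, because it assumes the worst case of one round trip per record:

```python
# Back-of-the-envelope check of the latency figure above, assuming each record
# is fetched with its own request/response over a path with 100 ms of delay.

records = 1_000_000
per_record_delay_s = 0.100          # one round trip per record
total_hours = records * per_record_delay_s / 3600
print(f"{total_hours:.1f} hours")   # ~27.8 hours, i.e. over a day
```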

The purpose of composability and of networking, including data center networking, is to make the whirlpool shallower by reducing the overall latency.  That has the obvious value of improving response times and interprocess communications, but it can also help cloud economics in two ways.  One is by reducing the performance penalty for concentrating resources (metro might transition to regional, for example), which leads to better economies of scale.  The other is by expanding the practical size of a resource pool, letting more distant data centers host things because they’re close in delay terms.  That can reduce the need to oversupply resources in each center to anticipate changes in demand.
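A toy example shows the pool effect; the sites and latency numbers are invented:

```python
# Illustrative only: lowering latency widens the usable resource pool.
# Data center names and millisecond figures are made up.

data_centers = {"metro-A": 2, "metro-B": 6, "regional-1": 12, "national": 35}  # ms from the user

def usable_pool(latency_budget_ms: float) -> list:
    """Return the data centers close enough, in delay terms, to host the work."""
    return [dc for dc, ms in data_centers.items() if ms <= latency_budget_ms]

print(usable_pool(10))   # ['metro-A', 'metro-B']
print(usable_pool(20))   # shaving latency pulls 'regional-1' into the pool
```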

Information security in this picture could change radically in response to process and information distribution.  Centralized resources are more easily protected, to be sure, because you have a mass of things that need protecting, which justifies the cost.  But transient process and data caches are harder for an attacker to find, and the substitution of a user agent process for a direct device link to applications lets you impose more authentication at every level.  It’s also important to note that, as you push information toward the edge, you eventually reach the point where current practice is to hand workers things like hard copy, which is perhaps the least secure option of all.

Do you like having refinery maps tossed around randomly?  Do you like having healthcare records sitting in open files for anyone to look at?  We have both today, and probably in most places.  We don’t have to make electronic distribution of either of these examples perfect to make it better than what we have.  The problem is not the level of security we have but the pathway to getting there.  A distributed system simply has to be secured differently.

Suppose now that we take things to the extreme: the whirlpool is just a shallow swirl that presents no barriers to transit from any point to any other.  Resources are now fully equivalent no matter where they are located and from where they’re accessed.  Now companies could compete to offer process and data cache points, and even application services.

So am I making an argument against my own position, which is that you can’t drive change by anticipating its impact on infrastructure and pre-building?  Not at all.  The investment associated with a massive shift in infrastructure would be daunting, and there would be no applications or business practices in place that could draw on the benefits of the new model.  A smarter approach would be to start to build toward a distributable future, and let infrastructure change as fast as the applications can justify those changes.  Which, circling back to my prior comments, means that this has to be first and foremost about the cloud as an IT model, and only second about the data center.

Wise Counsel from the Past

Whatever your party, if you are concerned about the country’s future, I recommend this poem, one I’ve quoted to friends in the past.  It is from Henry Wadsworth Longfellow’s “The Building of the Ship”:

Thou, too, sail on, O Ship of State!
Sail on, O Union, strong and great!
Humanity with all its fears,
With all the hopes of future years,
Is hanging breathless on thy fate!
We know what Master laid thy keel,
What Workmen wrought thy ribs of steel,
Who made each mast, and sail, and rope,
What anvils rang, what hammers beat,
In what a forge and what a heat
Were shaped the anchors of thy hope!
Fear not each sudden sound and shock,
‘T is of the wave and not the rock;
‘T is but the flapping of the sail,
And not a rent made by the gale!
In spite of rock and tempest’s roar,
In spite of false lights on the shore,
Sail on, nor fear to breast the sea!
Our hearts, our hopes, are all with thee,
Our hearts, our hopes, our prayers, our tears,
Our faith triumphant o’er our fears,
Are all with thee, — are all with thee!

Is There a Business Benefit Driving “Hyperconvergence” or “Composable Infrastructure?”

The cloud is a different model of computing, a combination of virtualization and network hosting.  We all recognize that “the cloud” is something apart from virtual machines or containers, OpenStack or vCloud, IaaS or PaaS or SaaS.  It’s also something apart from the specific kind of servers you might use or the data center architecture you might adopt.  Or so it should be.

I had a question last week on LinkedIn (which is where I prefer my blog questions to be asked) on what I thought would drive “Infrastructure 2.0.”  My initial response was that the term was media jargon and that I’d need a more specific idea of what the question meant in order to respond.  When I got that detail, it was clear that the person asking was wondering how an agile, flexible infrastructure model would emerge.  Short answer: via the cloud.  Long answer?  Well, read on.

The biggest mistake we make in technology thinking and planning is perpetuating the notion that “resources suck”, meaning that if we simply supply the right framework for computing or networking (forgetting for the moment how we’d know what it was and how to pay for it), the new resource model would just suck in the applications that justify it.  Does that sound superficial?  I hope so.

IT or network resources are the outcome of technology decisions, which are the outcome of application and service decisions, which are the outcome of benefit targeting, which is the outcome of demand modeling.  We can’t just push out a new infrastructure model, because the layers above it aren’t in place to connect it to benefits.  The best we could do at this point is to say that the new compute model, the cloud, could be an instrument of change.  The challenge even there is deciding just what kind of changes would then drive the cloud, and you have to do that before you decide how the cloud drives compute infrastructure.

If you did a true top-down model of business IT, you’d have to start with an under-appreciated function, the Enterprise Architect (EA).  This is a role that evolved from the old “methods analysts” of the past, responsible for figuring out what the elements of a job were as a precursor to applying automation.  But it’s here that we expose a big risk, because the best way to do a job may not be the way it’s been done in the past, and often past practices have guided high-level business architectures.

An alternative to this approach is the “focus-on-change” model, which says that if you’re going to do something transformational in IT these days, you will probably have to harness something that’s very different.  I cite, as change-focus options, mobility and IoT.  No, not analytics; they apply to current practices and try to make better decisions through better information.  Mobility and IoT are all about a higher-level shift, which is away from providing a worker an IT-defined framework for doing a job and toward providing a way to help a worker do whatever they happen to be doing.

Any business has what we could call elemental processes, things that are fundamental to doing the business, but there’s a risk even in defining these.  For example, we might say that a sales call is fundamental, but suppose we allow the buyer to purchase online?  Same with delivery, or even with production.  There are basic functional areas, though.  Sales, production, delivery, billing and accounting, all are things that have to be done in some way.  Logically, an EA should look at these functional areas and then define a variety of models that group them.  An online business would have an online sales process, and it might dispatch orders directly to a producer for drop-shipping to the customer or it might pick goods from its own warehouse.  The goods might then be manufactured by the seller or wholesaled.

When you have a map of processes you can then ask how people work to support them.  The most valuable change that appears to be possible today is the notion of “point-of-activity empowerment” I’ve blogged about many times.  Information today can be projected right to where the worker is at the moment the information is most relevant.

Another mistake we make these days is in presuming that changing the information delivery dynamic is all we need to do.  A mobile interface on an application designed for desktop use is a good example.  But would the worker have hauled the desk with them?  No, obviously, and that means that the information designed for desktop delivery isn’t likely to be exactly what the worker wants when there’s a mobile broadband portal that delivers it to the actual job site.  That’s why we have to re-conceptualize our information for the context of its consumption.

That’s why you could call what we’re talking about here contextual intelligence.  The goal is not to put the worker into a rigid IT structure but to support what the worker is actually doing.  The first step in that, of course, is to know what the worker is doing, and you don’t want to hurt productivity by making them describe every step they intend to take.  If we defined a process model, we could treat that model as a kind of state/event description of a task, a model of what the worker would do under all the conditions encountered.  That model could create the contextual framework, and we could then feed it events from the outside.

Some events could result from the worker’s own action.  “Turn the valve to OFF” might be a step, and there would likely be some process control telemetry that would verify that had been done, or indicate that it didn’t happen.  In either case, there is a next step to take.  Another event source might be worker location; a worker looking for a leak or a particular access panel might be alerted when they were approaching it, and even offered a picture of what it looked like.
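Here is a minimal sketch of what such a state/event task model might look like, using the valve step above as the example.  The state names, events, and actions are all hypothetical:

```python
# Minimal state/event sketch of the "turn the valve to OFF" step. States,
# events, and actions are invented for illustration.

task_model = {
    "await_valve_off": {
        "telemetry.valve_closed":  ("confirm_and_advance", "inspect_gauge"),
        "telemetry.valve_open":    ("prompt_retry",        "await_valve_off"),
        "location.near_panel":     ("show_panel_photo",    "await_valve_off"),
    },
    "inspect_gauge": {
        "telemetry.pressure_ok":   ("log_and_finish",      "done"),
    },
}

def handle_event(state: str, event: str):
    """Look up the action and next state for an event in the current state."""
    action, next_state = task_model[state].get(event, ("ignore", state))
    return action, next_state

print(handle_event("await_valve_off", "telemetry.valve_closed"))
# ('confirm_and_advance', 'inspect_gauge')
```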

From an application perspective, this is a complete turnaround.  Instead of considering information the driver of activity, activity is the driver of information.  Needless to say, we’d have to incorporate process state/event logic into our new system, and we’d also have to have real-time event processing and classification.  Until we have that, we have no framework for the structure of the applications of the future, and no real way of knowing the things that would have to be done to software and hardware to do them.

The converse is true too.  We could change infrastructure to make it hyperconverged or composable or into Infrastructure 2.0 or 3.0 or anything else, and if the applications are the same and the worker behavior is the same, we’re probably going to see nothing but a set of invoices for changes that won’t produce compensatory benefits.

Obviously, it’s difficult to say what the best infrastructure model is until we’ve decided on the applications and the way they’ll be structured.  We can, though, raise some points or questions.

First, I think that the central requirement for the whole point-of-activity picture is a cloud-hosted agent process that’s somewhere close (in network latency terms) to the worker.  Remember that this is supposed to be state/event processing so it’s got to be responsive.  Hosting this in the mobile device would impose an unnecessary level of data traffic on the mobile connection.  The agent represents the user, maintains user context, and acts as the conduit through which information and events flow.
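A bare-bones sketch of that agent, with invented names, might look like the code below; the point is simply that context lives in the agent, not in the device or the back-end applications:

```python
# Hypothetical edge-hosted agent: it holds the worker's context and acts as
# the conduit for events and information, so the device sees only what it needs.

import queue

class WorkerAgent:
    def __init__(self, worker_id: str, task_state: str):
        self.worker_id = worker_id
        self.context = {"task_state": task_state, "location": None}
        self.events = queue.Queue()

    def ingest(self, event: dict) -> None:
        """Events from telemetry, location feeds, or back-end systems land here."""
        self.events.put(event)

    def step(self):
        """Process one event against the current context and decide what to deliver."""
        if self.events.empty():
            return None
        event = self.events.get()
        if event["type"] == "location":
            self.context["location"] = event["position"]
        # A real agent would consult the state/event task model shown earlier.
        return {"deliver_to_device": f"update for {self.worker_id}", "cause": event["type"]}
```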

We also need a set of context-generating tools, which can be in part derived from things like the location of the mobile device and in part from other local telemetry that would fall into the IoT category.  Anything that has an association with a job could be given a sensor element that at the minimum reports where it is.  The location of the worker’s device relative to the rest of this stuff is the main part of geographic context.
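As a simple illustration, geographic context could be little more than comparing the device’s reported position against the positions reported by sensor-tagged assets.  The coordinates and the 10-meter threshold below are arbitrary:

```python
# Sketch of geographic context: the worker's device position is compared
# against sensor-tagged asset positions to raise proximity events.
# Coordinates and the 10 m radius are arbitrary.

import math

assets = {"access-panel-7": (40.7130, -74.0061), "valve-12": (40.7142, -74.0070)}

def proximity_events(device_pos, radius_m=10.0):
    lat, lon = device_pos
    events = []
    for name, (alat, alon) in assets.items():
        # small-distance approximation; good enough for a few tens of meters
        dx = (alon - lon) * 111_320 * math.cos(math.radians(lat))
        dy = (alat - lat) * 111_320
        if math.hypot(dx, dy) <= radius_m:
            events.append({"type": "location.near_asset", "asset": name})
    return events

print(proximity_events((40.71305, -74.00615)))
# [{'type': 'location.near_asset', 'asset': 'access-panel-7'}]
```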

The agent process is then responsible for drawing on contextual events, and also drawing on information.  Instead of the worker asking for something, the job context would simply deliver it.  The implication of this is that the information resources of the company would be best considered as microservices subservient to the agent process map (the state/event stuff).  “If I’m standing in front of a service panel with the goal of flipping switches or running tests, show me the panel and what I’m supposed to do highlighted in some way.”  That means the “show-panel” and “highlight-elements” microservices are separated from traditional inquiry contexts, which might be more appropriate to what a worker would look at from the desk, before going out into the field.
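The dispatch logic might be as simple as a table keyed by context, as in this sketch.  The “show_panel” and “highlight_elements” functions stand in for the microservices named above; they’re placeholders, not real APIs:

```python
# Illustrative mapping from job context to the microservices the agent invokes.
# The service functions are placeholders for what the text calls "show-panel"
# and "highlight-elements".

def show_panel(asset_id, step):
    return {"render": f"diagram-of-{asset_id}"}

def highlight_elements(asset_id, step):
    return {"highlight": f"{step}-targets-on-{asset_id}"}

context_dispatch = {
    ("near_asset", "await_valve_off"): [show_panel, highlight_elements],
    ("at_desk", "plan_field_visit"): [],  # desk work falls back to traditional inquiry
}

def fulfill(context_key, asset_id, step):
    """Call the microservices appropriate to the worker's current context."""
    return [svc(asset_id, step) for svc in context_dispatch.get(context_key, [])]

print(fulfill(("near_asset", "await_valve_off"), "access-panel-7", "flip-switches"))
```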

You can see how the cloud could support all of this, meaning that it could support an application model where components of logic (microservices) are called on dynamically based on worker activity.  The number of instances of a given service you might need, and where you might need them, would depend on instantaneous workload.  That’s a nice cloud-friendly model, and it pushes dynamism deeper than just a GUI, back to the level of supporting application technology and even information storage and delivery.

Information, in this model, should be viewed as a combination of a logical repository and a series of cache points.  The ideal approach to handling latency and response time is to forward-cache things that you’ll probably need as soon as that probability rises to a specific level.  You push data toward the user to lower delivery latency.
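In code terms, forward caching is just a probability test in front of a push, something like the sketch below; the threshold and the probability model behind it are assumptions:

```python
# Sketch of probability-driven forward caching: once the estimated chance an
# item will be needed crosses a threshold, it is pushed to an edge cache ahead
# of the request. How the probability is estimated is out of scope here.

CACHE_THRESHOLD = 0.7
edge_cache = {}

def maybe_forward_cache(item_id: str, need_probability: float, fetch) -> None:
    """Push an item toward the user once it becomes likely enough to be needed."""
    if need_probability >= CACHE_THRESHOLD and item_id not in edge_cache:
        edge_cache[item_id] = fetch(item_id)   # one long-haul trip now...

def read(item_id: str, fetch):
    """...so that the actual read is served at edge latency."""
    return edge_cache.get(item_id) or fetch(item_id)
```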

The relationship between this productivity-driven goal set, which would at least hold a promise of significant increases in IT spending, and things like “hyperconvergence” or “composable infrastructure” is hard to establish.  Hyperconvergence is a data center model, and so is composable infrastructure.  It’s my view that if there is any such thing as either (meaning if they’re not totally hype-driven rehashes of other technology) then it would have to be a combination of a highly integrated resource virtualization software set (network, compute, and storage) and a data center switching architecture that provided for extremely low latency.  A dynamic application, a dynamic cloud, could in theory favor one or both, but it would depend on how distributed the data centers were and how the cloud itself supported dynamism and composability.  Do you compose infrastructure, really, or virtualize it?  The best answer can’t come from below, only from above where the real benefits—productivity—are generated.

Which leads back to one of my original points.  You can’t push benefits by pushing platforms that can attain them.  You have to push the entire benefit food chain.  The cloud, as I said in opening this blog, is a different model of computing, but no model of computing defines itself.  The applications, the business roles, we support for the cloud will define just what it is, how it evolves, and how it’s hosted.  We need to spend more time thinking about Enterprise Architecture and software architecture and less time anticipating what we’d come up with.