The Technical Pieces of a Successful NGN

What do we need, in a technical sense, to advance to next-generation networking?  Forget trite suggestions like “carriers need to change their culture” or “we need to focus on customer experience.”  When has any of that been in doubt, and how long has it been said?  If there are problems that need to be solved, what are they?  Three, in my view.  We need a good service modeling architecture, we need a framework for automating the service lifecycle, and we need to have a strong and scalable management architecture to bind services and resources.

To my mind, defining a good service modeling architecture is the primary problem.  We need one that starts with what customers buy, dives all the way down to resource commitments, covers every stage of the service lifecycle, and embraces both the present and the future.  Forget technology in this effort; we should be able to do this in the abstract…because service models are supposed to be abstract.  That abstraction should cover four key points.

Point number one is hierarchical structure.  An object in the modeling architecture has to be able to represent a structure of objects that successively decompose from those above.  “Service” might decompose into “Core” and “Access”, and each of those might decompose further based on technology and/or geography.
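To make that concrete, here’s a minimal sketch of what such a hierarchy might look like in code; the class and element names are my own illustrations, not anything drawn from a standards body:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: a model object that carries the subordinate
// objects it decomposes into.
public class ModelElement {
    private final String name;
    private final List<ModelElement> children = new ArrayList<>();

    public ModelElement(String name) { this.name = name; }

    public ModelElement addChild(ModelElement child) {
        children.add(child);
        return this;
    }

    public static void main(String[] args) {
        // "Service" decomposes into "Core" and "Access", and each of those
        // decomposes further by technology or geography.
        ModelElement service = new ModelElement("Service")
            .addChild(new ModelElement("Core").addChild(new ModelElement("Core-MPLS")))
            .addChild(new ModelElement("Access").addChild(new ModelElement("Access-East")));
    }
}
```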

Point number two is intent-based elements.  An object in the architecture should have properties that are based on what it does, not how it does it.  Otherwise the object is linked to a particular implementation, which then limits your ability to support evolving infrastructure, multiple vendors, etc.
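Here’s a tiny illustration of the difference, again with invented names: the element publishes targets for what it has to deliver and says nothing about the devices or protocols underneath:

```java
// Illustrative intent-style element: it describes what the element delivers,
// not how.  Any implementation (MPLS, SD-WAN, fiber, 5G) that meets these
// targets can stand behind it, so vendors and technology can change freely.
public class AccessIntent {
    public final int bandwidthMbps;       // what must be delivered
    public final double availabilityPct;  // SLA-level availability target
    public final int maxLatencyMs;        // SLA-level latency target

    public AccessIntent(int bandwidthMbps, double availabilityPct, int maxLatencyMs) {
        this.bandwidthMbps = bandwidthMbps;
        this.availabilityPct = availabilityPct;
        this.maxLatencyMs = maxLatencyMs;
    }
}
```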

The third point is per-element state/event-based process mapping.  Each object needs to have a state/event table that defines the operating states it can be in, the conditions it expects to handle, and the processes associated with the state/event combinations.  “If State A and Event B, then Run Process C and Set State X” might be a form of expression.
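In code, such a table could be little more than a lookup from a state/event pair to a process and a next state.  The sketch below uses hypothetical states and events of my own choosing:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative per-element state/event table: "if State A and Event B,
// run Process C and set State X."
public class StateEventTable {
    enum State { ORDERED, ACTIVATING, ACTIVE, FAULT }
    enum Event { ACTIVATE, ACTIVATION_COMPLETE, SLA_VIOLATION, REPAIRED }

    // The process to run and the state to move to for one state/event pair.
    record Transition(Runnable process, State nextState) {}

    private final Map<State, Map<Event, Transition>> table = new HashMap<>();

    public void on(State state, Event event, Runnable process, State next) {
        table.computeIfAbsent(state, k -> new HashMap<>())
             .put(event, new Transition(process, next));
    }

    public State handle(State current, Event event) {
        Transition t = table.getOrDefault(current, Map.of()).get(event);
        if (t == null) return current;   // no entry: the event is ignored in this state
        t.process().run();               // "Run Process C"
        return t.nextState();            // "Set State X"
    }
}
```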

Point four is that the parameters an element accepts and exposes must be related to the parameters of the elements above and below it.  If “Access” and “Core” report normal states, then “Service” does likewise.  Any numerical or Boolean property set on an object should result in something being sent downward to its subordinates, and anything reported from below has to be transformed into the common form the layer above publishes.
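The upward half of that relationship can be illustrated very simply; in the sketch below (the state names are invented), the parent derives its own reported state from the common-form states its subordinates publish, so “Service” is normal only when everything below it is:

```java
import java.util.List;

// Illustrative upward state derivation: a parent element's reported state is
// computed from the common-form states published by its subordinates.
public class StateRollup {
    enum ElementState { NORMAL, DEGRADED, FAILED }

    public static ElementState rollUp(List<ElementState> subordinateStates) {
        ElementState worst = ElementState.NORMAL;
        for (ElementState s : subordinateStates) {
            if (s.ordinal() > worst.ordinal()) worst = s;   // take the worst subordinate state
        }
        return worst;   // NORMAL only if every subordinate reports NORMAL
    }
}
```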

The single biggest failing in our efforts to transform services and networks is the fact that we have not done the modeling work.  Let me make this point clearly: without a strong service model, there is no chance of transformation success.  And, folks, we still do not have one.

Service automation is the second problem we have to resolve.  If you have a good service model, then you can define good software processes to decompose the model and link lifecycle processes to states/events.  We’ve only recently started accepting the importance of service automation, but let me make it clear in financial terms.

This year, “process operations” costs, meaning costs directly attributable to service and network operations, will account for 28 cents of each revenue dollar.  If we were to adopt SDN and NFV technology in its limited, pure, standards-based form, at the largest plausible scale, we could reduce 2018 costs by about 2.4 cents per revenue dollar.  If we were to adopt service automation principles absent any technology shifts whatsoever, we could reduce 2018 costs by roughly 5.4 cents per revenue dollar, more than double the SDN/NFV savings.  Furthermore, the investment needed to secure the 2.4 cents of SDN/NFV savings would be thirty-eight times the investment needed to secure the 5.4 cents of operations savings.

Perhaps one reason it’s complicated is that “service automation” is really a combination of two problems.  First is the already-mentioned lack of a good service modeling architecture.  The second is the lack of a scalable software architecture with which to process the model.  It does little good to have one without the other, and so far, we have neither.

I’ve been involved in three international standards activities and two software projects, aimed in part at service automation.  From the latter, in particular, I learned that a good model for scalable service management is the “service factory” notion, which says that a given service is associated with an “order” that starts as a template and is then solidified when executed.  I’ve tried out two different approaches to combining software and models, and both seem to work.

One approach is to use a programming language like Java to author services by assembling Java classes into the service models.  This creates a combination of an order template that can be filled in, and a “factory” that, when supplied with a filled-in template (an “instance”), will deploy and manage the associated service.  You can deploy as many factories as you like, and since all the service data (including state/event data) lives in the instance, any factory can process any event for any service it handles.
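Here’s a much-simplified sketch of that first approach.  The names are my own and a real factory would do far more than set a state string, but it shows the division of labor between template, instance, and factory:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative "service factory": the factory hands out a blank order template,
// deploys from a filled-in template (an "instance"), and handles lifecycle
// events.  All service state lives in the instance, so any running copy of the
// factory can process any event for any instance of the service it defines.
public class VpnServiceFactory {

    // The blank order template for this service type.
    public Map<String, String> orderTemplate() {
        Map<String, String> template = new HashMap<>();
        template.put("customer", "");
        template.put("accessBandwidthMbps", "");
        template.put("state", "OFFERED");
        return template;
    }

    // Deploy the service described by a filled-in template.
    public void deploy(Map<String, String> instance) {
        instance.put("state", "ACTIVATING");
        // ... decompose the model and commit resources here ...
        instance.put("state", "ACTIVE");
    }

    // Handle a lifecycle event against an instance; the factory itself is stateless.
    public void handleEvent(Map<String, String> instance, String event) {
        if ("SLA_VIOLATION".equals(event) && "ACTIVE".equals(instance.get("state"))) {
            instance.put("state", "REMEDIATING");
            // ... redeploy or repair, then return the instance to ACTIVE ...
        }
    }
}
```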

The second approach is to have generalized software process a service data model and execute processes based on states and events.  To make this work, you need to make all the service lifecycle steps into state/event behaviors, so things might start with an “Offer” state to indicate a service can be ordered, and progress from there.
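A sketch of that second approach is below, again with hypothetical states and process names.  The point is that the engine itself is generic; everything service-specific lives in the data model and in the table of processes keyed by state and event:

```java
import java.util.Map;
import java.util.function.BiConsumer;

// Illustrative model-driven engine: generalized software that reads a service
// data model and dispatches lifecycle processes purely from state/event pairs,
// starting from an "OFFER" state that marks the service as orderable.
public class ModelDrivenEngine {

    // Processes keyed by "STATE:EVENT"; they know nothing about any one service.
    private final Map<String, BiConsumer<Map<String, Object>, String>> processes;

    public ModelDrivenEngine(Map<String, BiConsumer<Map<String, Object>, String>> processes) {
        this.processes = processes;
    }

    public void dispatch(Map<String, Object> serviceModel, String event) {
        String state = (String) serviceModel.getOrDefault("state", "OFFER");
        BiConsumer<Map<String, Object>, String> process = processes.get(state + ":" + event);
        if (process != null) {
            process.accept(serviceModel, event);   // the process may set a new state in the model
        }
    }
}
```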

My personal preference is for the second of the two approaches, but you can see that neither one bears any resemblance to the ETSI End-to-End structure, though I’ve been told by many that the model was never intended to be taken as a literal software architecture.  I think you can fit either approach to the “spirit of ETSI”, meaning that the functional progression can be made to align.

The final technical problem to be resolved in getting to next-gen networking is a management model.  Think for a moment about a global infrastructure, with tens of thousands of network devices, millions of miles of fiber, tens of thousands of data centers holding millions of servers, each running a dozen or so virtual machines that in turn run virtual functions.  All of this stuff needs to be managed, which means it needs attention when something happens.  How?

The first part of the “how?” question is less about method detail than about overall policy.  There are really two distinct management models, “service management” and “resource management”.  Service management says that management policies are set at the point where services are offered to the customer, in the form of SLAs.  You thus report conditions that violate or threaten SLAs, and you use service policies to decide what to do.  Resource management says that resources themselves assert “SLAs” that define their designed range of behavior.  You manage the resources against those SLAs, and if you’ve assigned services to resources correctly, you’ll handle the services along the way.

We’ve rather casually and seemingly accidentally moved away from formal service management over time, largely because it’s difficult in adaptive multi-tenant services to know what’s happening to a specific service in case of a fault.  Or, perhaps, it would be more accurate to say that it’s expensive to know that.  The problem is that when you start to shift from dedicated network devices to hosted software feature instances, you end up with a “service” problem whether you want one or not.

The goal of management is remediation.  The scope of management control, and the range of conditions it responds to, has to fit that goal.  We’re not going to be focusing on rerouting traffic if a virtual function goes awry; we’re going to redeploy it.  The conditions that could force us to do that are broad: the software could break, and so could the server or some of the internal service-chain connections.  The considerations relating to where to redeploy are equally diverse.  So in effect, virtualization tends to move us back at least a bit toward the notion of service management.  It surely means that we have to look at event-handling differently.

Real-device management is straightforward; there’s a set of devices that are normally controlled by a management stack (“element, network, service”).  Conditions at the device or trunk level are reported and handled through that stack, and if those conditions are considered “events” then what we have are events that seek processes.  Those processes are large monolithic management systems.

In a virtual world, management software processes drift through a vast cloud of events.  Some events are generated by low-level elements or their management systems, others through analytics, and others are “derived” events that link state/event processes in service models of lifecycle management.  In the cloud world, the major public providers see this event model totally differently.  Processes, which are now microcomponents, are more likely to be thrown out in front of events, and the processes themselves may create fork points where one event spawns others.

The most important events in a next-generation management system aren’t the primary ones generated by the resources.  These are “contextless” and can’t be linked to SLAs or services.  Instead, low-level model elements in modern systems will absorb primary events and generate their own, but this time they’ll be directing them upward to the higher-level model elements that represent the composition of resource behaviors and services.  Where we run the event processes for a given model element determines the source of the secondary events, and influences where the secondary processes are run.
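Here’s an illustrative sketch of that derivation step, with invented element and condition names: a low-level model element absorbs a raw, contextless resource event and sends a context-carrying event upward to its parent:

```java
import java.util.function.Consumer;

// Illustrative event derivation: a low-level element absorbs a primary event
// from a resource and publishes a derived event, now tied to a model element,
// upward to the element above it.
public class EventDerivation {

    record PrimaryEvent(String resourceId, String condition) {}    // e.g. "server-42", "DOWN"
    record DerivedEvent(String modelElement, String condition) {}  // e.g. "Access-East", "DEGRADED"

    static class LowLevelElement {
        private final String name;
        private final Consumer<DerivedEvent> parent;   // where derived events are directed

        LowLevelElement(String name, Consumer<DerivedEvent> parent) {
            this.name = name;
            this.parent = parent;
        }

        void absorb(PrimaryEvent event) {
            // Interpret the raw resource condition in this element's context,
            // then send a secondary event upward rather than the raw one.
            if ("DOWN".equals(event.condition())) {
                parent.accept(new DerivedEvent(name, "DEGRADED"));
            }
        }
    }
}
```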

“Functional programming” or “lambda processing” (and sometimes even “microservices”, in Google’s case) is the software term used to describe the style of development that supports these microcomponent, relocatable, serverless, event-driven systems.  We don’t hear about this in next-gen networking, and yet it’s pretty obvious that the global infrastructure of a major operator would be a bigger event generator than enterprises are likely to be.
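To show the flavor of that style without tying it to any particular provider’s API, here’s a hypothetical stateless handler: everything it needs arrives with the event, so copies of it can be spun up wherever, and whenever, the events are:

```java
import java.util.function.Function;

// Illustrative functional-style handler: a stateless function from an event to
// a derived event, suitable for running as a relocatable, on-demand component.
public class FunctionalHandlers {

    record LifecycleEvent(String element, String condition) {}

    // The handler holds no state of its own; it can run anywhere, in parallel.
    static final Function<LifecycleEvent, LifecycleEvent> faultHandler =
        event -> "FAULT".equals(event.condition())
            ? new LifecycleEvent(event.element(), "REDEPLOY_REQUESTED")
            : event;

    public static void main(String[] args) {
        LifecycleEvent out = faultHandler.apply(new LifecycleEvent("Access-East", "FAULT"));
        System.out.println(out);   // LifecycleEvent[element=Access-East, condition=REDEPLOY_REQUESTED]
    }
}
```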

The event management part of next-generation networks is absolutely critical, and so it’s critical that the industry take note of the functional programming trends in the cloud industry.  If there’s anything that truly makes the difference between current- and next-generation networks, it’s “cloudification”.  Why then are we ignoring the revolutionary developments in cloud software?

That should be the theme of next-gen networking, and the foundation on which we build the solutions to all three of the problems I’ve cited here.  You cannot test software, establish functional validity of concepts, or prove interoperability in a different software framework than you plan to use, and need, for deployment.  The only way we’re going to get all this right is by accepting the principles evolving in cloud computing, because we’re building the future of networking on cloud principles.  Look, network people, this isn’t that hard.  We have the luxury of an industry out there running interference for us in the right direction.  Why not try following it instead of reinventing stuff?  I’ll talk more about the specifics of a cloud-centric view in my blog tomorrow.