One Operations Model for Networks and Services

OSS/BSS is part and parcel of the business of network operators, the way they manage their people, sell their services, bill and collect, plan…you get the picture.  The question of how OSS/BSS will accommodate new stuff like SDN and NFV is thus critical, and anything critical generates a lot of comment.  Anything that generates comment generates misinformation and hype these days, unfortunately, so it’s worth taking a look at the problem and trying to sort out the real issues.

To start with, the truth is that accommodating SDN and NFV isn’t really the issue.  The challenge operations systems face these days arises from changes in the business of the operators more than the technology.  Operators acknowledge that their traditional connection services are returning less and less on investment in infrastructure.  That means that they have to improve their revenue line and control costs.  It’s those requirements that are really framing the need for operations changes.  They’re also what’s driving SDN and NFV, so the impact of new technologies on OSS/BSS is really the impact of common drivers.

Human intervention in service processes is expensive, and that’s particularly true when we’re talking about services directed at consumers or services with relatively short lifespans.  You can afford to manually provision a 3-year VPN contract covering a thousand sites, but not a 30-minute videoconference.  We already know that site networking is what’s under the most price pressure, so most useful new services would have to move us to shorter-interval commitments (like our videoconference).  That shifts us from depending on human processes to depending on automated processes.

Human processes are easily conceptualized as workflows because people attack tasks in an orderly and sequential way.  When we set up services from an operations center, we drive a human process that, when finished, likely records its completion.  When we automate service setups, there’s a tendency to follow that workflow notion and visualize provisioning as an orderly, sequential series of actions.

If we look at the problem from a higher perspective, we’d see that automated provisioning should really be based on events and states.  A service order is in the “ordered” state when it’s placed.  When we decide to “activate” it, we would initiate tasks to commit resources, and as these tasks evolved they’d generate events representing success or failure.  Those events would normally change the state of the service to “activated” or “failed” over time.  The sum of the states and the events represents the handling model for a given service.  This is proven logic, usually called “finite-state machine” behavior.  Protocol handlers are written this way, even described this way with diagrams.

The processes associated with setting up a service can now be integrated into the state/event tables.  If you get an “activate” event in the “ordered” state, you initiate the provisioning process and you transition to the “activating” state.  If that provisioning works, the success event then transitions you to the “activated” state, where you initiate the process of starting billing and notifying the customer of availability.  You then move to “in-service”.  If provisioning fails, you can define how you want to handle the failure and what processes you want invoked.  This is what event-driven means.

The reason for this discourse is that OSS/BSS systems need to be event-driven to support service automation, for the simple reason that you can’t assume that automated activity is going to generate orderly progression.  A failure during service activation is not handled the same way as a failure that occurs once the service is “in-service” to the customer, and we can’t use the same processes to handle the two.  So operations systems have to become event-driven, and that is an architectural issue.
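The state/event handling described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the state names come from the text, but the event names (“provision_ok”, “provision_failed”, “customer_notified”, “fault”) and the process functions are hypothetical stand-ins, not anything an actual OSS/BSS product defines.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical operations processes; each just records that it was invoked.
invoked: List[str] = []

def start_provisioning() -> None:
    invoked.append("start_provisioning")

def start_billing_and_notify() -> None:
    invoked.append("start_billing_and_notify")

def handle_activation_failure() -> None:
    invoked.append("handle_activation_failure")

def handle_in_service_fault() -> None:
    invoked.append("handle_in_service_fault")

def no_action() -> None:
    pass

# (current state, event) -> (next state, process to run).
# Event names are assumptions made for this sketch.
STATE_EVENT_TABLE: Dict[Tuple[str, str], Tuple[str, Callable[[], None]]] = {
    ("ordered", "activate"): ("activating", start_provisioning),
    ("activating", "provision_ok"): ("activated", start_billing_and_notify),
    ("activating", "provision_failed"): ("failed", handle_activation_failure),
    ("activated", "customer_notified"): ("in-service", no_action),
    ("in-service", "fault"): ("in-service", handle_in_service_fault),
}

class Service:
    """One service instance driven entirely by the table above."""

    def __init__(self) -> None:
        self.state = "ordered"

    def handle(self, event: str) -> str:
        entry = STATE_EVENT_TABLE.get((self.state, event))
        if entry is None:
            return self.state  # no entry for this state/event pair: ignore it
        next_state, process = entry
        self.state = next_state
        process()
        return self.state
```

The point of the table is that a failure is handled by whatever process is mapped to the state the service is in when the failure arrives.  Changing how activation failures are treated means changing one table entry, not rewriting a workflow.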

We always hear about conferences on things like billing systems and their response to SDN or NFV.  That’s a bad topic, because we should not be talking about how processes respond to technologies.  We should be talking about a general model for event-driven operations.  If we have one, billing issues resolve themselves when we map billing processes into our state/event structure.  If we don’t have one, then we’d have to make every operations process technology-aware, and that’s lousy design, not to mention impractical in terms of costs and time.

But a “service state/event table” isn’t enough.  If we have a VPN with two access points, we have three interdependent services, each of which would have to have its own state/event processes, and each of which would have to contribute and receive events to/from the “master” service-level table.  What I’m saying is that every level of service modeling needs to have its own state/event table, each synchronized with the higher-layer tables and each helping synchronize subordinate tables.  The situation isn’t unlike how multi-layer protocols work.  Every protocol layer has a state/event table, and all the tables are synchronized by event-passing across the layer boundaries.
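Here is a minimal sketch of that layered synchronization, again in Python and again under assumptions.  The class name `ModelElement`, the element names, and the “child_activated” and “child_failed” events are all hypothetical; the only thing taken from the text is the idea that each level has its own table and that the levels stay in step by passing events up and down.

```python
from __future__ import annotations
from typing import List, Optional

class ModelElement:
    """One level of a service model with its own state/event table."""

    # (state, event) -> name of the handler method; hypothetical event names.
    TABLE = {
        ("ordered", "activate"): "_on_activate",
        ("activating", "provision_ok"): "_on_provision_ok",
        ("activating", "provision_failed"): "_on_failed",
        ("activating", "child_activated"): "_on_child_activated",
        ("activating", "child_failed"): "_on_failed",
    }

    def __init__(self, name: str, parent: Optional[ModelElement] = None) -> None:
        self.name = name
        self.state = "ordered"
        self.parent = parent
        self.children: List[ModelElement] = []
        if parent is not None:
            parent.children.append(self)

    def handle(self, event: str) -> None:
        action = self.TABLE.get((self.state, event))
        if action is not None:  # pairs not in the table are ignored
            getattr(self, action)()

    def _on_activate(self) -> None:
        self.state = "activating"
        for child in self.children:
            child.handle("activate")  # event passed down a layer

    def _on_provision_ok(self) -> None:
        self._become("activated")

    def _on_child_activated(self) -> None:
        # The higher level is activated only when every subordinate is.
        if all(c.state == "activated" for c in self.children):
            self._become("activated")

    def _on_failed(self) -> None:
        self._become("failed")

    def _become(self, state: str) -> None:
        self.state = state
        if self.parent is not None:
            self.parent.handle("child_" + state)  # event passed up a layer

# A VPN with two access points: three subordinate services under one master.
vpn = ModelElement("VPN")
core = ModelElement("VPNCore", vpn)
access_a = ModelElement("AccessA", vpn)
access_b = ModelElement("AccessB", vpn)
```

Sending “activate” to `vpn` moves all four elements to “activating”.  The master only reaches “activated” after each subordinate has reported “provision_ok”, and a single “provision_failed” from any subordinate moves the master to “failed”.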

Where do our new technologies come into this?  First, they come into it in the same way the old ones do.  You can’t have automated operations that sometimes work and sometimes don’t, depending on what you’ve translated to SDN or NFV and what’s still legacy.  All service and network operations have to be integrated or you lose the benefits of service automation.  Second, this illustrates that we have a level of modeling and orchestration that’s independent of technology: higher levels where we ask for “access” or a “VPN” and lower levels where we actually do the stuff needed based on the technology we have to manipulate to get the functionality required.

We could deploy SDN and NFV inside a “black box” that could also contain equivalent legacy equipment functionality.  “AccessNetwork” or “IPCore” could be realized many different ways, but could present a common high-level state/event process table and integrate with operations processes via that common table.  Any technology-specific stuff could then be created and managed inside the box.  Or, we could have a common architecture for state/event specification and service modeling that extended from the top to the bottom.  In this case, operations can be integrated at all levels, and service and network automation fully realized.
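The “black box” option can be sketched as follows.  Everything named here is hypothetical (`Realization`, `LegacyRealization`, `SdnRealization`, `BlackBox`, and the `deploy` method), and deployment is collapsed into a single synchronous call purely to keep the example short; a real realization would report success or failure through events.

```python
from abc import ABC, abstractmethod

class Realization(ABC):
    """Technology-specific behavior hidden inside the box (hypothetical API)."""

    @abstractmethod
    def deploy(self) -> bool:
        """Return True if the resources were committed."""

class LegacyRealization(Realization):
    def deploy(self) -> bool:
        return True  # stands in for configuring legacy devices

class SdnRealization(Realization):
    def deploy(self) -> bool:
        return True  # stands in for requests to an SDN controller

class BlackBox:
    """Presents the same high-level states whatever is inside."""

    def __init__(self, name: str, realization: Realization) -> None:
        self.name = name
        self.realization = realization
        self.state = "ordered"

    def handle(self, event: str) -> str:
        if (self.state, event) == ("ordered", "activate"):
            self.state = "activating"
            ok = self.realization.deploy()
            self.state = "activated" if ok else "failed"
        return self.state

legacy_access = BlackBox("AccessNetwork", LegacyRealization())
sdn_access = BlackBox("AccessNetwork", SdnRealization())
```

Operations processes that drive `legacy_access` and `sdn_access` see identical states and events.  What they cannot see is anything inside `deploy`, which is exactly the limitation of the black-box approach compared with a common model that extends from top to bottom.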

Our dilemma today is that every operator is looking for the benefits of event-driven operations, but there’s really nobody working on it from top to bottom.  If you are going to mandate operations integration into SDN or NFV state/event-modeled network processes, then you have to define how that’s done.  But the SDN and NFV efforts aren’t doing management.  Management bodies like the TMF are really not doing “SDN” or “NFV” either; they’re defining how SDN or NFV black boxes might integrate with them.

We can’t solve problems by sticking them inside an abstraction and then asserting that it’s someone else’s responsibility to peek through the veil.  We have to tear down the barriers and create a model of service automation that works for all our services, all our technology choices.