Can We Achieve Universal Service/Resource Orchestration in the Real World?

In many of my past blogs I’ve talked about the question of operations transformation, and proposed that it be considered co-equal to network infrastructure transformation.  I’ve also noted that most network operators are weighing how to go about operations transformation.  Perhaps because of this (or perhaps because operator views are driven by vendor offerings), vendors are also thinking about how to bring operations into the new age.  What exactly does that mean?  That’s what I’d like to consider here.

Operations systems have always been the top end of the operator business, the part that faces the customer.  This is the “business support system” or BSS part of the picture today, and it’s been largely responsible for things like billing and accounting.  In the past, the customer-facing side of the process, relating largely to orders and order status, was pushed down into service provisioning by supporting human (“craft”, as they say in the operator world) processes.  These were the operations support systems, or OSS.  Everything was happy until IP convergence created some cracks in this process, for two reasons.

First, IP services (like all packet services) are non-deterministic and thus require their own management processes (fault, configuration, accounting, performance, and security, or FCAPS) to sustain their operation.  These technical network processes were easier to deploy outside the OSS/BSS, and this created a kind of adapter-plugin model that let the OSS and the NMS/FCAPS processes coordinate their behavior to suit user needs.
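To picture that adapter-plugin notion, here’s a minimal sketch in Python.  The class and method names are hypothetical illustrations of the boundary, not any real OSS or NMS API: the OSS talks in service-level terms, and each adapter translates those into FCAPS-style actions for a particular network technology.

```python
from abc import ABC, abstractmethod

class FcapsAdapter(ABC):
    """Hypothetical plugin boundary between an OSS and an NMS.

    The OSS invokes service-level operations; each adapter translates
    them into FCAPS-style actions for one network technology.
    """

    @abstractmethod
    def provision(self, service_id: str, params: dict) -> None:
        """Configuration: push service parameters into the network."""

    @abstractmethod
    def get_status(self, service_id: str) -> dict:
        """Fault/performance: report the service's current health."""

class IpVpnAdapter(FcapsAdapter):
    def provision(self, service_id: str, params: dict) -> None:
        # A real NMS would drive router/VPN configuration here.
        print(f"configuring VPN {service_id} with {params}")

    def get_status(self, service_id: str) -> dict:
        return {"service": service_id, "state": "up", "loss_pct": 0.1}

# The OSS sees only the adapter interface, never the devices.
IpVpnAdapter().provision("vpn-1", {"bandwidth": "10M"})
```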

Second, IP ushered in the packet age and multiplied the number of services that could be provided, and the number of functional components inside each.  This encouraged both operators and OSS/BSS vendors to create service-specific higher-layer features, often paralleling some of the OSS/BSS elements and almost always overlapping each other in terms of functions and features.  One of the technical challenges that came out of this IP convergence period was a collision in the basic model of “provisioning”.

In the old TDM days, you provisioned a service by performing a bunch of largely manual tasks that could include running new access connections, installing CPE, and so forth.  These processes were undertaken in a nice orderly flow, a linear progression of steps.  If something broke, you had a tech fix it.  Shared tenancy was non-interfering and networks didn’t take their own steps to fix things.  Again, it was an easy linear flow to imagine.

In the IP world, services are more often coerced from in-place resources, and while the setup process could still be visualized as a flow of steps, the rest of service lifecycle management didn’t fit that model.  Self-healing adaptive networks do all kinds of things on their own and report issues (either ones that they’ve fixed but with some loss of service continuity or ones they could not fix) to the higher layer.  This is an “event model”, and it’s difficult to fit random asynchronous events into a nice linear flow.  This is what gave rise to the drive to make OSS/BSS “event-driven”.

SDN and NFV take things even further because they exacerbate two primary issues that IP introduced.  First, network infrastructure became even smarter and more autonomous than before, with all kinds of lifecycle management processes built in.  Many of these processes were intended to provide service assurance, something that had often been viewed as an OSS function, and many required changes to customer billing and other things that had always been BSS activities.  Second, the relationship between “services” and “resources” became so flexible that it might be impossible to create a fixed link between network management and service management, even through an adapter.  Does the orchestration and management of resources then have to rise up somehow to be visible to the OSS/BSS, or does the OSS/BSS somehow have to be orchestrated and managed by resource-oriented processes like NFV MANO?

The easiest way to frame the results of all these changes is to contrast a traditional, flow-driven operations structure with what someone would likely come up with today if there were no incumbent technology to worry about.

Today’s system could be likened to a service-bus workflow, where a work item like an order or a change moves along a pathway from a determined starting point to a determined completion point.  Along the way it would encounter data dropped off by asynchronous tasks, and based on this data it might pause or change course.  This sort of system is used routinely in transaction processing for enterprises, but there it faces a simpler set of asynchronous tasks and there are fewer requirements to create new rules for new services or new market conditions.
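For illustration, here’s a minimal Python sketch of that flow-driven model, with hypothetical pipeline steps.  Notice that an asynchronous network event has no natural place to land in it:

```python
# Hypothetical steps in a linear order flow; each runs once, in sequence.
def qualify(order):
    order["qualified"] = True
    return order

def provision(order):
    order["provisioned"] = True
    return order

def bill(order):
    order["billed"] = True
    return order

PIPELINE = [qualify, provision, bill]

def process_order(order):
    # The work item moves along a fixed path from a determined start to
    # a determined completion point.  A random asynchronous event (a
    # fault, a self-healing action) has nowhere to enter this flow.
    for step in PIPELINE:
        order = step(order)
    return order

print(process_order({"id": "ORD-1"}))
```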

A “modern” system would look more like a set of microservices coupled to a service data model.  The data model provides a set of state/event relationships, associating processes (which could be network/resource-linked or service-linked) with a specific lifecycle state and a specific event within that state.  If you get a CHANGE event in the DEPLOYING state, you change resources but perhaps make no billing adjustment because you aren’t billing yet.  In the OPERATING state you’d have to change resources and presumably also change the billing.
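To make that concrete, here’s a minimal sketch of a state/event table using the CHANGE/DEPLOYING/OPERATING example above.  The process names are purely illustrative, not drawn from any real OSS/BSS or NFV implementation; the point is that the same event triggers different processes depending on lifecycle state:

```python
# Illustrative processes that the data model could bind to.
def change_resources(svc):
    print(f"re-provisioning resources for {svc}")

def adjust_billing(svc):
    print(f"adjusting billing for {svc}")

# The state/event table: the same CHANGE event maps to different
# process sets depending on the service's lifecycle state.
STATE_EVENT_TABLE = {
    ("DEPLOYING", "CHANGE"): [change_resources],                  # not billing yet
    ("OPERATING", "CHANGE"): [change_resources, adjust_billing],  # billing is live
}

def handle_event(state, event, svc):
    # Dispatch whatever processes the model binds to this state/event pair.
    for process in STATE_EVENT_TABLE.get((state, event), []):
        process(svc)

handle_event("DEPLOYING", "CHANGE", "svc-42")
handle_event("OPERATING", "CHANGE", "svc-42")
```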

It sounds like we’re talking about oil and water here, but the differences aren’t as irreconcilable as they’d appear.  The majority of OSS/BSS systems have evolved over time to be highly “componentized,” meaning that their functionality has been divided into logical components rather than composed into one big monolithic application.  Workflow/service-bus systems use components too (though in a different way), and the TMF proposed to couple events to those componentized processes via the same service-oriented architecture (SOA) now used by many enterprise workflow/service-bus transaction processing systems.  It wouldn’t be impossible to simply “compose” the event-to-process linkage using a state/event table in a data model, without changing current operations systems much.
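Here’s a minimal sketch of what that composition might look like, assuming a hypothetical JSON service model and process registry (nothing here is a real TMF or SOA interface).  The key point: the event-to-process linkage is pure data, so recomposing it means editing the model, not rewriting the operations components themselves:

```python
import json

# Existing componentized OSS/BSS processes, registered by name.
# In a real system these might be SOA service endpoints.
PROCESS_REGISTRY = {
    "allocate_resources": lambda svc: print(f"allocating for {svc}"),
    "start_billing":      lambda svc: print(f"billing started for {svc}"),
}

# The state/event-to-process bindings travel as data in the service model.
SERVICE_MODEL = json.loads("""
{
  "bindings": [
    {"state": "ORDERED",   "event": "ACTIVATE", "processes": ["allocate_resources"]},
    {"state": "DEPLOYING", "event": "READY",    "processes": ["start_billing"]}
  ]
}
""")

def dispatch(state, event, svc):
    # Look up the binding for this state/event pair and run the named
    # processes; a new service just ships a different model.
    for b in SERVICE_MODEL["bindings"]:
        if b["state"] == state and b["event"] == event:
            for name in b["processes"]:
                PROCESS_REGISTRY[name](svc)

dispatch("ORDERED", "ACTIVATE", "svc-7")
```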

Why then hasn’t it been done?  Operators think that OSS/BSS vendors resist the notion of composed event-to-process coupling because it would allow buyers to shop for individual processes instead of entire OSS/BSS systems.  “It’s the classical opposition to best-of-breed or point competition,” one operator told me, meaning OSS/BSS vendors fear that competitors would eat off little pieces of their business when the competitor wouldn’t be credible to provide a total solution.  Others say that it’s impossible to map out who supports such a composed configuration, making operators rely on third-party integrators.

There are other issues to be addressed here too.  Many operations systems today have evolved service silos to a whole new level, to the point that they duplicate processes and even data elements.  If the components of a nice new compositional operations system all expect their own databases to be in a certain state when they run, then the integration complexity explodes, and if different vendors provide the components of such a system, all bets are off.  It’s not that this problem couldn’t be solved too, but that it’s hard to see who has the incentive to solve it.

Well, it was hard until some recent trends offered hope.  Two things might break the logjam here.  One is NFV orchestration and the other is open-source.  And yes, the two might even combine.

NFV orchestration could, in the right hands, generate a model-driven service architecture with event-to-process component coupling, just to make NFV lifecycles work out at the technical level.  If this framework were suitable for use by operations processes too, then operators could build operations processes into the framework on their own, using components supplied by third-party vendors or (you guessed it!) open source.

It’s also possible that NFV vendors who don’t have a horse in the legacy OSS/BSS race could use a microservice-based, model-driven, service/resource lifecycle process to gain traction.  These days, progress in infrastructure tends to be made not as much by startups as by non-aligned major vendors creeping in from other spaces.  NFV, promising as it does a shift from network appliances to software and hosting, is a perfect opportunity for that sort of thing.  I believe that all of the six vendors who can currently make a full NFV business case could support a completely orchestration-integrated operations/resources model.  Of the six, four have no entrenched position in OSS/BSS and the remaining two are not really OSS/BSS incumbents.

Open source is a way for operators to get functional pieces of operations software without vendors cooperating to make it happen.  If we had, for example, an entire OSS/BSS set up as a set of microservices, you could see how operators would be able to compose a lot of their future service, network, and SDN/NFV operations software from that inventory.  It’s not that farfetched either.  Sun Microsystems launched just such a project, known as OSS through Java, or OSS/J; it passed to Oracle and was then taken over by the TMF.  It hasn’t made a lot of progress there, but the concepts and even a basic structure are available to members.

If the opposite of “hard” is “easy,” then neither of these options really moves the ball.  But we should recognize that the real border isn’t the “hard/easy” boundary; it’s the boundary at which something that was highly improbable becomes likely.  It’s opportunity and need that create the pressure to test that boundary.

We have both aplenty.  Next-gen consumer and business services could add over a trillion dollars to somebody’s coffers.  Over half the current network equipment budget could shift to IT over time, and all the differentiation could be sucked out of Levels 2 and 3.  Every OSS/BSS could be rendered obsolete, along with all the NMS/EMS tools.  The new services that might come along could form a bridge between the “pure” OTT model of ad-sponsored experiences or high-price-pressure video and the traditional telco model, a bridge that Google could try to cross in one direction as operators try to cross in the other.

We are going to have virtualization in infrastructure, you can bet on it.  We’re going to have winners and losers, and you can bet on that too.  All we’re doing now is sorting out the details of both.