How Do We Orchestrate Complex Services?

One of the things that 5G and the notion of “virtual network functions” have done is demonstrate that “provisioning” is getting a lot more complicated. A virtual function that mirrors a physical device lives in two worlds, one the world of network operations and the other the world of application/cloud hosting. If we expand our goal to provisioning on-network elements (virtual functions are in-network), or if we create virtual functions that have to be composed into features, it gets more complicated yet. What, in other words, does “service orchestration” mean?

In the world of networking, services are created by coercing cooperative behaviors from systems of devices through parameterization. Over time, we’ve been simplifying the concept of network services, both by adopting an “IP dialtone” model overall and through our increasing reliance on the universality of the Internet. This model minimizes the provisioning associated with a specific service; Internet connectivity doesn’t require any provisioning other than getting access working, and corporate VPNs require only the use of a specific feature (MPLS) that’s pre-provisioned.

Application networking has also changed over time. The advent of the cloud, containers, and Kubernetes has combined with basic IP principles to create what could be called the subnet model. Applications deploy within a subnet that typically uses a private IP address space. Within this, all components are mutually addressable but from the outside the subnet is opaque. The on/off ramps are then selectively “exposed” to a higher-layer network through address translation.

If we apply this combination to the world of Network Functions Virtualization (NFV) using the ETSI ISG rules, we can see a “service” that is made up of discrete hosted functions that are “service chained” with each other, and which obviously also have to connect with real devices. NFV took a network-centric view in service chaining and internal service connectivity, so the virtual functions were hosted in the “application domain” and connected in the “network domain”. VNFs lived in the network domain as virtual forms of devices.

What happens when we move away from original-ETSI-ISG models to “containerized network functions”? It depends, and that’s the nub of our first challenge in orchestration. Do we create CNFs by containerizing VNFs but otherwise leave things as they were? If so, then we’ve not really moved away from the ETSI approach, only changed the hosting piece. VNFs are still devices in the network domain. Or do we adopt the cloud/container/Kubernetes model of hosting, and extend the network assumptions of an “application domain” to include devices? Or will neither work?

This isn’t a philosophical issue. The reason why the NFV ISG locked itself into its model of networking was to preserve existing management practices and tools. If a VNF is a virtual form of a physical device, then we could argue that a network service is created by device management as usual, and that the virtual-device piece is created by software hosting. Ships in the night, except of course that if a virtual device fails and can’t be restored at the “hosting” level, it has to be reported as failed to the network service layer.
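The one point where those ships do touch, the escalation of an unrecoverable hosting failure into a device failure at the network layer, can be sketched in a few lines. This is only an illustration under stated assumptions: the class names, the `redeploy` callback, and the retry count are all hypothetical and are not drawn from the ETSI specifications or from any real orchestrator.

```python
from typing import Callable, List


class NetworkServiceLayer:
    """Hypothetical network-side manager that sees VNFs only as devices."""

    def __init__(self) -> None:
        self.failed_devices: List[str] = []

    def report_device_failed(self, device_id: str) -> None:
        # From here on, ordinary device-management practices take over.
        self.failed_devices.append(device_id)


class HostingLayer:
    """Hypothetical hosting-side manager that tries to restore instances."""

    def __init__(self, network: NetworkServiceLayer,
                 redeploy: Callable[[str], bool],
                 max_attempts: int = 3) -> None:
        self.network = network
        self.redeploy = redeploy          # assumed callback: True on success
        self.max_attempts = max_attempts  # assumed retry policy

    def on_instance_failure(self, device_id: str) -> bool:
        # Try to fix the problem without the network layer ever knowing.
        for _ in range(self.max_attempts):
            if self.redeploy(device_id):
                return True
        # Restoration failed, so the failure crosses the boundary and is
        # reported as a failed (virtual) device.
        self.network.report_device_failed(device_id)
        return False
```

The only coupling between the two layers in this sketch is the single `report_device_failed` call, which is what makes the separate-orchestration model workable as long as nothing else has to cross the boundary.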

If ships in the night are still passing, so to speak, then the current management and orchestration practices associated with networks and applications can continue to be applied. There isn’t a compelling need to integrate them, so “service orchestration” may not be a big deal. In the world of NFV, this seems to be a safe bet.

But the world of NFV may not be the end of it. If we are deploying services, can we compose a service where a real device or a virtual device might be employed? If so, then our deployment orchestration process has to have the ability to recognize equivalence at the functional level but divergence at the realization level. We could even envision a situation where somebody wants “SASE-Device” as a part of the service, and that want might be fulfilled by 1) loading SASE functionality as a VNF into a uCPE element already in place, 2) shipping a uCPE or an SASE device and then setting it up, or 3) deploying SASE functionality inside a cloud data center.

That last point implies that we have to consider “local conditions” when setting up a service. Those conditions could include not only what’s locally available already, but also perhaps constraints on what can be used, such as cost. This in itself suggests that it may be necessary to mingle provisioning steps across the network/hosting boundary. Ships in the night may collide.
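To make the idea of functional equivalence with divergent realization concrete, here is a minimal sketch of how an abstract “SASE-Device” request might be resolved against local conditions. Everything in it is an assumption made for illustration: the `Site` and `Realization` types, the option names, and especially the cost figures are invented, and cheapest-affordable is just one possible selection policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Site:
    name: str
    has_ucpe: bool         # a uCPE element is already in place
    cloud_reachable: bool  # a cloud data center can serve this site
    max_cost: float        # local cost constraint (arbitrary units)


@dataclass(frozen=True)
class Realization:
    name: str
    cost: float  # invented figures, for illustration only
    domain: str  # which side of the boundary does the provisioning


def candidates(site: Site) -> List[Realization]:
    """List the realizations that local conditions permit at this site."""
    options = []
    if site.has_ucpe:
        options.append(Realization("load-vnf-on-existing-ucpe", 10.0, "hosting"))
    # Shipping a device is always possible, but it needs network-side setup.
    options.append(Realization("ship-and-configure-device", 100.0, "network"))
    if site.cloud_reachable:
        options.append(Realization("deploy-in-cloud", 40.0, "hosting"))
    return options


def resolve(function: str, site: Site) -> Optional[Realization]:
    """Map an abstract function to the cheapest realization within budget."""
    if function != "SASE-Device":
        raise ValueError(f"unknown abstract function: {function}")
    affordable = [r for r in candidates(site) if r.cost <= site.max_cost]
    return min(affordable, key=lambda r: r.cost, default=None)
```

Note that the `domain` of the result differs from site to site. A single order for “SASE-Device” can land on the hosting side at one location and on the network side at another, which is exactly where separate orchestration starts to strain.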

The dilemma that faced the NFV ISG, and the choice to go with ships in the night, now face Nephio, for the same reasons they faced the ISG. Service orchestration, in a unified sense, is a big challenge. I took the time to lay out a complete model in ExperiaSphere, and it took six sets of tutorial slides to describe it. The late John Reilly of the TMF (one of my few personal heroes) created a model (NGOSS Contract) that, if properly extended, could have done much the same thing a decade earlier. Implementation difficulties, then, may be the issue in service orchestration.

Or, it may be that we’ve not faced the issue because we’ve been able to avoid it, which is what I think is the case. From both the network side and from the hosting side, jumping between ships in the night seems unnecessary. The question is whether that’s true.

Currently, it probably is. As long as we stick with the NFV-established vision that a VNF is just a virtual device, then what we’re doing is codifying a vision where separation of orchestration doesn’t change just because we may add different virtual devices in the future. That’s because the NFV vision is not really limited to current physical devices; what we’re really saying is that the network behavior of a virtual function can be separated from the hosting. That’s true with 5G, which is the only truly standards-sanctioned use of NFV.

I’m not so sure that those ships can continue to pass, night or otherwise. There are a number of trends already visible that suggest we may need integrated provisioning.

First, the question of whether physical devices should be required to decompose themselves into microservices before being assembled into virtual devices was raised at the first US meeting of the ISG in 2013. At the time, there was great concern that requiring this sort of decomposition/recomposition would make vendors unwilling to submit their current device logic in VNF form, and I agreed with that. I wish I’d brought up the question of whether future VNFs might be better composed from microservices, but I didn’t. If we are to compose a VNF, then multiple software instances have to be selected and deployed to create the network-layer managed entity, and that may be beyond simple ships-in-the-night orchestration.

Second, operators worldwide are trying to automate service lifecycle management, from order entry through end of life. Automated lifecycle management has to be able to accommodate differences in “local conditions”, which for residential services means in a variety of areas, and for business services means in multiple locations. The more location differences there are, the more important it is to be able to change an abstract requirement (an SASE, to use my earlier example) into a locally optimal specific deployment.

Third, we are already seeing operator interest in “facilitating services” designed to bridge between their connection services and on-the-network or OTT services. It’s hard to imagine how these will be supported without a fairly comprehensive orchestration capability, not only because they’re likely to require feature deployment to accommodate OTT partners’ locations, but also because they’re a form of federation.

Which is the final reason. Even 5G and VNF-based services are already facing the question of how they are supported when an operator’s footprint won’t match the footprint of the customer. Every operator who has this problem (which is pretty much every operator) will conclude that they don’t want the customer to contract for each geographic segment independently, but if that isn’t done then the operator who owns the deal will have to accept federated contributions from other operators. How, without better orchestration, would those be lifecycle-managed?

Of course, logically deducing that we need an orchestration solution doesn’t create one. We could always just stay the current course, and if we do we’re likely heading for a future where “the Internet” subsumes every other notion of networking. If we presumed universal IP like that, we could use virtual-networking tools (which Kubernetes, for example, already supports) to build service connections both to the users and between feature elements. The problems with this are that it could take a long time to evolve, and that if it does it’s difficult to see how network connection features like 5G slicing could be differentiated without mandating paid prioritization on the Internet.

This is a complex issue, one that’s going to take some serious thinking to resolve, and it seems like there’s an almost-universal tendency to just assign the problem to a higher layer. That’s not been working to this point, and I don’t think it will work better in the future. Adding layers adds complexity, in orchestration, security, and pretty much everything. We seem to be complex enough as it is.