One very interesting thing I’ve started seeing is an increased operator focus on “service orchestration”, a topic that’s dear to me but has been as much tire-kicking as real commitment on the operator side. What’s behind this, and where might it be taking things?
I interact with a lot of operator people, but in most cases their service orchestration or lifecycle automation questions have been more in the nature of fact-finding. In the last month, I’ve had 11 operators open discussions on specific needs and projects, which is more than happened in the entire year prior. Clearly something new is going on, and (without compromising anyone’s confidences) I can give an idea of what that something is.
There are two drivers of interest, at least within my Group of Eleven. First, 5G deployment, which introduces an explicit requirement for function hosting. Second, service outages (often massive, like that of Rogers in Canada) that suggest that traditional operations centers may be too prone to errors. While both drivers are directly connected to the sudden interest, they differ considerably in the kind of project each is likely to drive, and in when that project will happen.
Let’s start with today’s challenge, as I think it’s reflected in the inquiries. Networks are cooperative systems where individual elements (devices, in most cases) are expected to collectively behave to support a set of user services. These devices have to be parameterized and managed to ensure service availability. That requirement has been getting more complex because of the growing dominance of consumer broadband services in the service mix. There are a lot more consumers, obviously, and consumers are unsophisticated with regard to network management. That combination is then compounded by operators’ shifting of service management tasks to portals and more automated systems. The result is complexity, which always generates errors and raises operations costs. Thus, we could expect some interest in at least service lifecycle automation even if nothing were happening in the hosting area.
Something is happening, though. 5G, as I’ve noted, introduces a specific need for hosted functions, and at least half of those 11 operators believe that future service evolution will broadly mandate hosted functions. That means that they see function hosting likely expanding beyond 5G, and that sub-group has a slightly different slant on orchestration.
What I’m seeing in the 5G-specific inquiries is a concern about the ETSI orchestration and management process overall. There seems to be some discomfort with the ETSI Management and Orchestration (MANO) implementation, specifically with the fact that it’s disconnected from Kubernetes, which more and more operators recognize as the direction cloud software overall has already taken. The obvious solution to this is something like Nephio, which would frame a function-hosting architecture around Kubernetes. I think operator interest in Nephio is emerging from this group, in part because some operator members of the project are among my Group of Eleven.
All this is good, but there is still an issue, IMHO. 5G raised the question of MANO’s credibility in comparison with a cloud approach like Kubernetes. The problem is that MANO’s issues run potentially deeper, and it’s too early to know whether Nephio will address them all.
MANO is a device-centric approach, by which I mean that the goal of MANO is to deploy a network function that looks like a device, and that can be managed using traditional device-management tools, both at the NMS level and at the OSS/BSS level. This approach is fine as long as presuming that a network function is a virtual form of a device creates no issues, but that’s not the case even for 5G functions, which were never associated with an appliance at all. For the sub-group of my Eleven who were already looking at functions beyond 5G, this is a problem already, but most of the 5G-specific sub-group are still getting their arms around the overall MANO approach, and don’t see the potential risks, of which two seem to dominate.
One such risk is the “ships-in-the-night” management model. Higher-level operations (NMS/OSS/BSS) don’t “know” about virtual functions at all, and lower-level operations (MANO) don’t know about services at all. Effective integration of management and orchestration across the two depends on creating a kind of shim that ties things together, which in the NFV model is the VNF Manager or VNFM. The VNFM is what makes a virtual function look like a device, by mirroring the “Physical Network Function” or PNF management model. There are a lot of potential slip-ups in this process, and it surely complicates the onboarding of functions.
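To make the shim idea concrete, here’s a minimal sketch of the VNFM’s translation role: mapping a VNF’s cloud-side lifecycle state into the device-style status that NMS/OSS/BSS tools expect from a PNF. The class, state names, and mapping here are my own illustrative assumptions, not anything defined in the ETSI NFV specifications.

```python
# Hypothetical sketch of the VNFM "shim": translating a VNF's
# cloud-side lifecycle into a device-style status that traditional
# management tools understand. Names are illustrative, not ETSI-defined.

from enum import Enum

class VnfLifecycleState(Enum):
    INSTANTIATING = "instantiating"
    RUNNING = "running"
    HEALING = "healing"
    TERMINATED = "terminated"

class VnfManagerShim:
    """Mirrors a PNF management model over a VNF's real lifecycle."""

    def __init__(self, vnf_id: str):
        self.vnf_id = vnf_id
        self.lifecycle = VnfLifecycleState.INSTANTIATING

    def set_lifecycle(self, state: VnfLifecycleState) -> None:
        self.lifecycle = state

    def device_status(self) -> str:
        # The mapping itself is where slip-ups creep in: a VNF that is
        # HEALING is neither cleanly "up" nor cleanly "down".
        mapping = {
            VnfLifecycleState.INSTANTIATING: "down",
            VnfLifecycleState.RUNNING: "up",
            VnfLifecycleState.HEALING: "degraded",
            VnfLifecycleState.TERMINATED: "down",
        }
        return mapping[self.lifecycle]
```

Even this toy version shows the mismatch: the richer cloud lifecycle has to be squeezed into the narrower device vocabulary, and every such compression is a place where management state can be lost or misreported.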
Another risk is the “composed function” problem. Recall that NFV presumes that all VNFs are created by virtualizing a real device. That means that VNFs are always “atomic”, and that a set of cooperating VNFs has to be treated as a network of devices. In 5G, this problem is (so far) minimized by the fact that the 3GPP sort-of-conceptualized even the VNF aspects of the 5G architecture as a device model. If we start to think more broadly, we can see that even 5G elements might benefit from being composed from lower-level ones—the “cloud-native” or “microservice” approach. Get beyond 5G and you surely need that capability.
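The atomic-versus-composed distinction can be sketched in a few lines. This is purely illustrative (the function names and structure are assumptions of mine, not 3GPP or ETSI definitions), but it shows why the composed model matters: deployment and management have to deal with a tree of pieces, not a single device-like unit.

```python
# Illustrative contrast between NFV's "atomic" VNF model and a
# cloud-native, composed-function model. Names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Function:
    name: str
    components: list = field(default_factory=list)  # empty => atomic

    def is_atomic(self) -> bool:
        return not self.components

    def flatten(self) -> list:
        """All the atomic pieces that would actually get deployed."""
        if self.is_atomic():
            return [self.name]
        out = []
        for c in self.components:
            out.extend(c.flatten())
        return out

# An atomic, NFV-style VNF that virtualizes a single appliance:
firewall = Function("firewall-vnf")

# A hypothetical composed element built microservice-style:
upf = Function("upf", [Function("packet-classifier"),
                       Function("qos-enforcer"),
                       Function("usage-reporter")])
```

In the atomic model, orchestration sees one thing per function; in the composed model, orchestrating “upf” really means orchestrating three cooperating pieces, which is exactly the capability NFV’s device-virtualization assumption leaves out.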
There’s a third potential problem that seems to cross all the potential solutions to function hosting, which is addressing. What you host functions on, the resource pool, has to be addressable. What you host there, the functions themselves, also need to be addressable. You don’t want your functions to be able to address the resource pool directly, nor do you want them to be able to address each other if they’re not part of the same service or at least part of a “service community” that represents the set of functions that are related to each other by ownership and cooperative properties. Finally, you probably don’t want functions or resource pools to be addressable by service users.
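The addressability rules just described can be written down as a simple policy check. The encoding below is my own illustrative assumption (“service community” is used as described above), not a standard from NFV or Kubernetes, but it makes the constraints explicit.

```python
# Sketch of the addressability policy described above. The rule
# encoding is an illustrative assumption, not a standardized model.

def may_address(src: str, dst: str, same_community: bool = False) -> bool:
    """src and dst are one of: 'function', 'node', 'user', 'service'."""
    if src == "function" and dst == "node":
        return False            # functions must not see the resource pool
    if src == "function" and dst == "function":
        return same_community   # only within the same service community
    if src == "user" and dst in ("function", "node"):
        return False            # users see services, never the internals
    if src == "user" and dst == "service":
        return True
    return False                # default-deny everything else
```

The default-deny stance is the design point: anything not explicitly justified by service relationships stays unreachable.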
NFV never really addressed this issue, never set up a strategy or policy on how addresses for functions would be assigned and controlled. Kubernetes presumes that nodes (what you host on) and pods (functions) are assigned addresses from specific CIDR blocks, with pod addressing most likely provided by some virtual-network framework. Thus, there are two levels of explicit “subnetworking”, node and pod. The service (application) level creates a third, and to get service elements to be visible in this address space, you have to expose them explicitly.
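The three levels can be illustrated with Kubernetes-style CIDR conventions. The specific CIDR values below are common illustrative defaults, not anything Kubernetes mandates; the point is that every address falls into exactly one layer, and classification has to be explicit.

```python
# Sketch of the three "subnetwork" levels using Kubernetes-style CIDR
# conventions. The CIDR values are illustrative defaults, not mandated.

import ipaddress

NODE_CIDR = ipaddress.ip_network("10.0.0.0/16")      # what you host on
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")     # what you host
SERVICE_CIDR = ipaddress.ip_network("10.96.0.0/12")  # what you expose

def layer_of(addr: str) -> str:
    """Classify an address into one of the three levels."""
    ip = ipaddress.ip_address(addr)
    for name, net in [("pod", POD_CIDR),
                      ("service", SERVICE_CIDR),
                      ("node", NODE_CIDR)]:
        if ip in net:
            return name
    return "external"
```

Anything that falls outside all three blocks is, by definition, in the outside world, which is exactly where service users live and where only explicitly exposed service addresses should ever appear.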
All this exposing is great, but easy to lose track of. Some cloud providers (Amazon, for example) have tools that allow for address management in public cloud infrastructure, and something like this could be adapted to managing addresses for function hosting. But you can see from my comments that where services are made up of a combination of real devices and virtual functions, there may be situations where device management at the network level and function management at the hosting level have to be coordinated. In a network made up of physical devices and hosted virtual functions, the top layer of my three subnetworks has to be exposed to the same address space as the real devices are using, and management of the real-device properties of those functions has to be done within that address space. The hosted piece, the three-layer subnetwork structure, starts with that space and dives down to the pod/function-subnetwork and node-network levels.
There’s an underlying (and usually ignored) lesson here for operators now getting serious about orchestration, and for all of us in networking. You can’t orchestrate something without understanding both the hosting and connection context, and there are differences between those things in the world of applications and the cloud, versus the world of services. The good news is that I’m seeing, for the first time, some serious thought being given to this issue, and that’s the first step toward identifying and solving problems.