In prior blogs I looked at the NFV deployment model and at how management, as ETSI defined it, would presumably work within a “typical” deployment. In this last of my more detailed explorations of NFV management, I want to look at how “NFV management” relates to management and operations in a broader sense. You can’t, after all, support services by managing only NFV infrastructure. You almost certainly can’t build them that way either.
There’s no single management and operations model in play among operators today, but whatever model is out there has to deal with those two areas in some way. “Management” is normally applied to the physical resources used to build services, and “operations” to the business processes and commercial tasks related to service sale and maintenance. It wouldn’t be unreasonable to say that operations is a customer-facing process and tool set, and management faces resources. Since the TMF links these two in its SID data model, it’s clear that many view management as sitting “under” operations. The fact that many services today are still provisioned through NMSs shows that many others see the two as separate.
Another TMF concept is useful in understanding management integration. The Enhanced Telecommunications Operations Map, or eTOM, is a picture of the steps associated with creating, selling, sustaining, and terminating a service. There are a number of eTOM references, and which ones you can see depends on whether you are a TMF member, but basic public versions are available. eTOM is divided into levels or layers, and at the most detailed level it’s a pretty comprehensive picture of what has to be done from soup to nuts, service-wise.
In the real world, most eTOM activities are intermingled between human and automated tasks, and between operations and management tools (using my previous division of the two). From low-level eTOM, one could almost picture service operations as a modular function, where different pieces might be implemented different ways and in different places. As part of a service, NFV has to integrate in some way with eTOM.
How? NFV, in the strict construction of the ETSI ISG, is a set of specifications that define how real network functions hosted in traditional devices could instead be deployed as cooperative software elements on some agile resource set. The operative part of “NFV” that threatens the traditional management/operations model is the “virtual” part. In effect, virtualization of any sort creates an intermediary. We used to have customer-facing and resource-facing pieces, remember? Well, now we have this “virtual” piece that might look like a resource from the customer side, a customer from the resource side, or all or none of the above.
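To make the “intermediary” idea concrete, here is a minimal Python sketch. The names VirtualWidget and Resource, and the rule that the virtual device reports “up” if any bound resource is healthy, are hypothetical illustrations rather than anything ETSI defines. The point is that one object has two faces: it answers like a device toward the customer side, and it behaves like a customer toward the resource pool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical pooled infrastructure resource (e.g. a server) in the NFVI."""
    name: str
    healthy: bool = True


@dataclass
class VirtualWidget:
    """Hypothetical virtual device: the intermediary virtualization creates."""
    name: str
    resources: List[Resource] = field(default_factory=list)

    def device_status(self) -> str:
        """Customer-facing face: report state as if this were one device.

        Assumption for illustration: the device is up if any bound
        resource is healthy (i.e. the components are redundant).
        """
        return "up" if any(r.healthy for r in self.resources) else "down"

    def request_capacity(self, pool: List[Resource], count: int) -> None:
        """Resource-facing face: act as a customer of the resource pool."""
        free = [r for r in pool if r.healthy and r not in self.resources]
        self.resources.extend(free[:count])
```

For example, binding a widget to a pool where one server is unhealthy picks up only the healthy ones, and the customer-facing status never mentions servers at all:

```python
pool = [Resource("srv1"), Resource("srv2", healthy=False), Resource("srv3")]
fw = VirtualWidget("fw-1")
fw.request_capacity(pool, 2)   # binds srv1 and srv3
fw.device_status()             # "up"
```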
In the ETSI E2E architecture, there is an implicit vision of how virtualization and management combine. We have an Element Manager that’s almost cohabiting with VNFs and is responsible for management of the VNFs themselves in the “customer direction”. We have a VNF Manager that is (via some intermediary elements) responsible for managing the resource relationships with the VNFs. Presumably, though this isn’t stated explicitly, we have resource management tools and practices aimed at the NFV Infrastructure as a pool of devices.
IMHO, the ETSI activity has focused most of its specification work on the VNF Manager piece as the “management” approach. This is consistent with what I’ve called a “black-box” view of network functionality. A VNF is a function. A function is managed as a function, not as a collection of chips (today) or software (under NFV). What happens to make software into the manageable function we expect is largely the VNFM’s problem, and largely what ETSI worries about. We could draw this out if you like. Make a box all the way on the right and call it “traditional management/operations”. Draw a box to the left of that with a bidirectional arrow connecting it, and call the new box “ETSI Element Manager”. Keep working leftward with another box called “VNFs”, then one more called “VNFM”, and finally one called “VIM/NFVI”, and you have the picture.
This picture doesn’t necessarily represent a break in any management model. If we assume that the ETSI EM depicts the functional model of the underlying structure completely and accurately, then we could substitute a VNF implementation for a real device 1:1 and nobody would care. The devil is in the details.
Here’s an example. We can horizontally scale components in NFV, right? That’s supposed to be one of the benefits. You don’t horizontally scale chips or devices on demand, so the current management model for Real_Widget wouldn’t have the properties of Virtual_Widget I’d like to sell, whatever a widget is. However, I could in theory build a new Widget-MIB that had the fields necessary to represent incremental NFV functionality, and if my management system could contend with that extra data I’d still be fine.
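Here is a minimal sketch of the Widget-MIB idea, using Python dataclasses as a stand-in for a MIB (this is not SNMP, and every field name is a hypothetical assumption). The extended record keeps the original fields, so a management system that only knows Real_Widget still works against Virtual_Widget, while an NFV-aware system can also read the scaling fields.

```python
from dataclasses import dataclass, fields


@dataclass
class WidgetMIB:
    """Hypothetical management record for a physical Real_Widget."""
    if_oper_status: str
    throughput_mbps: float


@dataclass
class VirtualWidgetMIB(WidgetMIB):
    """Hypothetical Widget-MIB extension for Virtual_Widget.

    Adds fields for NFV-only behavior such as horizontal scaling; the
    inherited fields keep the record readable by legacy tools.
    """
    instance_count: int = 1
    max_instances: int = 1


def legacy_poll(record: WidgetMIB) -> dict:
    """A management system that only understands the original fields."""
    return {f.name: getattr(record, f.name) for f in fields(WidgetMIB)}


def nfv_aware_poll(record: WidgetMIB) -> dict:
    """A management system that can contend with the extra data."""
    return {f.name: getattr(record, f.name) for f in fields(record)}
```

Polling a VirtualWidgetMIB with legacy_poll returns only the two original fields, which is the “I’d still be fine” case; nfv_aware_poll also returns instance_count and max_instances.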
Another issue, one less easily fixed, lies in the concept of FCAPS, which is traditionally seen as the high-level vision of “network management”. Each letter in the acronym represents something that had a single logical meaning in the old device days but has two meanings in the world of NFV. What’s a “fault”? Is it a failure of the virtual device, meaning that we’ve exhausted the automatic remedies for replacement/reconfiguration of VNFCs that NFV might offer, or a failure of an underlying resource?
We could assume operations integration with FCAPS would work if we applied the acronym to the virtual world. Looking downward to the real resources, though, we have a correlation problem, because the relationship between resource faults and virtual-device faults depends on how we’ve allocated resources and the extent to which we attempt automatic remediation.
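The correlation problem can be sketched in a few lines. This is a hypothetical model, not an ETSI-defined mechanism: a virtual device records which resource hosts each VNFC and holds a list of spare resources, and “automatic remediation” simply means moving an affected VNFC to a spare. The same resource fault then yields different answers depending on placement and on how much remediation capacity is left.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualDevice:
    """Hypothetical virtual device built from VNFCs hosted on resources."""
    name: str
    placement: Dict[str, str]  # VNFC name -> hosting resource
    spare_resources: List[str] = field(default_factory=list)


def correlate(device: VirtualDevice, failed_resource: str) -> str:
    """Classify a resource fault from the virtual device's point of view.

    Returns "no-impact" if no VNFC sits on the failed resource,
    "remediated" if every affected VNFC was moved to a spare, or
    "virtual-device-fault" once spares run out (any VNFCs already moved
    stay moved, so the device may be left partially remediated).
    """
    affected = [c for c, r in device.placement.items() if r == failed_resource]
    if not affected:
        return "no-impact"
    for vnfc in affected:
        if not device.spare_resources:
            return "virtual-device-fault"
        # Assumed remediation: redeploy the VNFC on the next spare resource.
        device.placement[vnfc] = device.spare_resources.pop(0)
    return "remediated"
```

With one spare, the first server failure is absorbed invisibly and only the second becomes a fault of the virtual device, which is exactly why a resource fault and a virtual-device fault can’t be mapped 1:1.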
That raises the real challenge of virtualization. If we want operations to know about real problems, real resources, real capacities and cost accounting, then we have to dip below the virtual. We have to somehow tie operations processes to the deeper reality. That’s also true of management processes, because as we travel down the traditional service-network-element-management stack in a virtual world, we find there’s a basement, which is the virtual-to-resource mapping.
ETSI talks in general terms about operations/management relationships with the NFV software, but the interfaces for these aren’t defined, nor are there any solid rules for how the relationships would be structured. The TMF has a good opening approach in its customer/resource-facing service model and in the NGOSS Contract notion (now part of GB942, the TMF Business Services Suite) of steering service events to suitable processes through the intermediary of a service contract data model. But the specifics of this aren’t very clear even for the real-device world, and that part of the TMF model is (according to my operator sources) rarely implemented.
In a standards sense, then, we’re not solving the problem yet. Unfortunately, we can’t just ignore management integration, because there will surely be no pure NFV service early on, and likely never will be, even down the line. There are going to be legacy devices in networks for a very long time, likely forever. Given that, and given that operations efficiency and service agility aren’t very meaningful if you confine either or both to just a piece of a service, we need to harmonize management completely. Here and there, federated and solo, NFV and legacy, applications and services, transport and connection. Services to users have few boundaries even now, and management can’t have them either.
So here’s where I think we are. There are only two ways to make a management connection from top to bottom. One is to build “virtual-device MIBs” that could be based on current “real” MIBs but that would reflect data elements that represented any new service features, costs, or conditions that would arise in an NFV world. We’d then have to populate these fields from real resource information as the service progressed through its lifecycle. The other is to provide operations/management coupling through the virtual layer into the real resources. My own work has always focused on the second of these approaches because I’m leery of having resources living behind a perpetual mask, but there’s no question that it would be easier to attack the former approach than the latter.
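The first approach, a virtual-device MIB populated from real resource data, might look something like this minimal sketch. The fields and aggregation rules (device “up” only if every host is healthy, costs summed, utilization reported from the busiest host) are illustrative assumptions. Note that operations sees only the aggregated row, never the hosts behind it, which is the “perpetual mask” concern.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceReading:
    """Hypothetical per-resource data collected from the NFVI."""
    host: str
    healthy: bool
    hourly_cost: float
    cpu_util: float  # 0.0 to 1.0


@dataclass
class VirtualDeviceMIB:
    """Hypothetical 'virtual-device MIB' row exposing NFV-era fields."""
    oper_status: str
    hourly_cost: float
    peak_cpu_util: float
    hosts_in_use: int


def populate(readings: List[ResourceReading]) -> VirtualDeviceMIB:
    """Derive virtual-device fields from the resources currently bound.

    Assumed rules: "up" if every host is healthy, "degraded" otherwise,
    "down" if nothing is bound; cost is summed and utilization reports
    the busiest host.
    """
    if not readings:
        return VirtualDeviceMIB("down", 0.0, 0.0, 0)
    return VirtualDeviceMIB(
        oper_status="up" if all(r.healthy for r in readings) else "degraded",
        hourly_cost=sum(r.hourly_cost for r in readings),
        peak_cpu_util=max(r.cpu_util for r in readings),
        hosts_in_use=len(readings),
    )
```

The populate step would have to rerun as the service moves through its lifecycle (scaling, healing, redeployment), since the set of bound resources changes underneath the unchanging virtual device.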
If this second approach is taken, then the service data model could be supplemented with the information collected when binding service components to each other and to resources. These bindings could be traversed to dive into more detail on service state. You could also, at any level of “object” in the model, describe the state/event relationships that would fulfill the TMF concept of mapping events to services. It’s obviously more complicated, but if you did this you could define any current or newly developed operations process at any state/event intersection, and provide full integration of management components from top to bottom.
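The second approach can be sketched as a service data model whose objects record their bindings and carry a state/event table. Everything here (ServiceObject, the handler signature, the event names) is a hypothetical illustration, not the TMF or ETSI model: walk() traverses bindings from service down to resources, and dispatch() steers an event to whatever operations or management process is registered for the object’s current state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical process signature: (object, event) -> result description.
Handler = Callable[..., str]


@dataclass
class ServiceObject:
    """Hypothetical node in a service data model (service, component, resource)."""
    name: str
    state: str = "active"
    bindings: List["ServiceObject"] = field(default_factory=list)
    # (state, event) -> operations or management process to invoke
    handlers: Dict[Tuple[str, str], Handler] = field(default_factory=dict)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "ServiceObject"]]:
        """Traverse bindings top to bottom, yielding (depth, object)."""
        yield depth, self
        for child in self.bindings:
            yield from child.walk(depth + 1)

    def dispatch(self, event: str) -> Optional[str]:
        """Steer an event to the process registered for the current state."""
        handler = self.handlers.get((self.state, event))
        return handler(self, event) if handler else None


def open_trouble_ticket(obj: ServiceObject, event: str) -> str:
    """Hypothetical operations process bound to one state/event intersection."""
    return f"ticket opened for {obj.name} on {event}"
```

A short usage example: a service bound to a VNFC bound to a host, with a trouble-ticket process registered for an SLA violation while the service is active.

```python
host = ServiceObject("srv1")
vnfc = ServiceObject("vnfc-a", bindings=[host])
svc = ServiceObject("vpn-42", bindings=[vnfc],
                    handlers={("active", "sla-violation"): open_trouble_ticket})
svc.dispatch("sla-violation")   # "ticket opened for vpn-42 on sla-violation"
host.state = "failed"
[o.name for _, o in svc.walk() if o.state != "active"]   # ["srv1"]
```

Because every object can carry its own table, new processes can be attached at any level without changing the traversal.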
We have to adopt either a virtual-device-MIB model or a data-coupled management model; I don’t believe any other options even exist. Unfortunately, I don’t think we have a convincing model for either in place, not in the ETSI ISG and not in the TMF. So I’d like to see operators and vendors cooperating (perhaps even in PoCs and lab trials) to explore the consequences of each approach and the alternatives for implementation.