One of the primary design goals of CloudNFV was to create a viable management framework for networks composed in whole or in part from virtual or cloud-hosted functions. Along the way we needed to deal with another of our principles: that the cloud, SDN, and NFV had to be united into a single concept. While management in this new world of virtual resources, virtual services, and virtual everything (except perhaps for payments!) isn’t easy, I think that picking the right approach is helpful.
One primary principle of virtual management is that you can’t let virtual functions poll real resources for management information. The IETF’s i2aex project has the right idea; there has to be a repository between resources and applications that need management data to decouple the two. Otherwise the growth in virtual functions creates an exploding load on devices and the network just to exchange control telemetry. In recent conferences, operators were already reporting an explosion in management traffic; it can’t get worse if we hope to reduce network costs with NFV.
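To make the decoupling argument concrete, here is a minimal sketch of a repository sitting between resources and the things that want management data. It is not CloudNFV or i2aex code; the `Device` and `ManagementRepository` names and the polling counter are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Device:
    """Hypothetical stand-in for a real resource; counts how often it is polled."""
    name: str
    status: str = "up"
    polls: int = 0

    def read_status(self) -> str:
        self.polls += 1
        return self.status


@dataclass
class ManagementRepository:
    """Holds the last collected status so consumers never touch the devices."""
    _state: Dict[str, str] = field(default_factory=dict)

    def collect(self, devices: Iterable[Device]) -> None:
        # One poll per device per collection cycle, however many readers exist.
        for device in devices:
            self._state[device.name] = device.read_status()

    def query(self, name: str) -> str:
        return self._state[name]


devices = [Device("router-1"), Device("switch-7")]
repo = ManagementRepository()
repo.collect(devices)

# A thousand virtual functions read status without adding any device load.
readings = [repo.query("router-1") for _ in range(1000)]
```

The point of the sketch is the poll counter: the load on `router-1` is set by the collection cycle, not by the number of virtual functions asking about it.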
A second primary principle is that while you want to be able to support legacy management practices in an evolution to virtual functions, SDN, and the cloud, you don’t want to constrain modern virtual-cloud environments by tying them to device-oriented management systems. I had a meeting with a Tier One recently, and two people sitting right next to each other had diametrically opposed views of whether traditional OSS/BSS frameworks had to be supported, or had to be discarded! Both are right in some situations, so you have to support both.
The third principle is that virtual management is all about context. Real devices have real, fixed, relationships both with each other and with services. Virtual devices can’t have real relationships at all, so you have to be able to create a framework for management using the same principles of virtualization that you applied to services. If it’s convenient for you to manage the network like it’s one enormous god-box device, so be it. If you want to manage it as virtual devices that map closely to previous real devices, that’s your call. Whatever you pick, though, the same resource status has to feed your view. You can’t mold real resources to suit virtual management dimensions.
In CloudNFV, we started off by saying that the context of all management is services. This management stuff is all being done so operators can sell something credible to consumers or enterprises, and so we have to reflect everything back to that context from the first. We structured our service model from top to bottom based on the TMF GB922 Services hierarchy, part of the Frameworx/SID model. To accommodate NFV and cloud principles, we added NFV-based elements at the “bottom” of the Service hierarchy, and we also added virtual-cloud-driven connectivity to the entire hierarchy so that cloud principles could deploy and connect elements of services—whether virtual or real.
To get resource data, CloudNFV has a concept we call Active Resource, which is an unstructured semantically overlaid repository of everything we know about any resource in the infrastructure. Things we call “Visualizers” take traditional management interfaces and MIBs and reflect them into Active Resource. There are no restrictions on either the source or nature of the data collected. Anything that can feed a Visualizer can contribute management state, and management state is stored and correlated in real time so any view of it is always up-to-date and in the format you expect, regardless of what you expect or how the information was collected to begin with.
But this doesn’t provide full context, so we’ve done even more. A virtual function or software application either has a management interface of its own (in which case we can connect to it through a Visualizer) or it is composed of components that have a collective, systemized management view. So we associate a Management Visualizer with each layer of the GB922 hierarchy, so that any time you group a bunch of devices or virtual functions or even combinations of the two into a cooperative system, you define how that system is supposed to appear in a management sense. And because you can browse up and down one of those GB922 service hierarchies from retail service to virtual function and back, you can also browse through the management views in a similar way.
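The browsing idea can be sketched as a simple tree in which every layer carries its own management view. This is only an illustration under assumed names (`ServiceNode`, `walk_down`, `walk_up`, and the example service names are all hypothetical); it does not reproduce the GB922 model or the actual Management Visualizer.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ServiceNode:
    """One layer of a service hierarchy with a management view attached."""
    name: str
    view: str  # label for the management view defined at this layer
    parent: Optional["ServiceNode"] = field(default=None, repr=False, compare=False)
    children: List["ServiceNode"] = field(default_factory=list)

    def add(self, child: "ServiceNode") -> "ServiceNode":
        child.parent = self
        self.children.append(child)
        return child

    def walk_down(self) -> Iterator["ServiceNode"]:
        # Depth-first browse from this layer toward the virtual functions.
        yield self
        for child in self.children:
            yield from child.walk_down()

    def walk_up(self) -> Iterator["ServiceNode"]:
        # Browse from this layer back toward the retail service.
        node: Optional["ServiceNode"] = self
        while node is not None:
            yield node
            node = node.parent


retail = ServiceNode("business-vpn", "retail service view")
access = retail.add(ServiceNode("secure-access", "service component view"))
firewall = access.add(ServiceNode("virtual-firewall", "virtual function view"))

down = [node.name for node in retail.walk_down()]
up = [node.view for node in firewall.walk_up()]
```

Here `down` lists the layers from retail service to virtual function, and `up` lists the management views met on the way back.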
Another way that context is critical is in management of hybrid services. If a word processor is included in a unified communications and collaboration service, do we manage it as a cloud application and manage the rest of the service as a network service? Hardly. The fact that our word processor is part of UC/UCC means it’s a carrier service and should be managed as one. Put that same word processor in a SaaS application, though, and it’s a cloud application, managed as software would be. That’s context-sensitive management.
That’s still not enough. Suppose you want to treat a hundred customer-specific iterations of a single set of virtual functions as one collective virtual device. We build VPNs not by deploying sets of customer-specific routers, but by inducing customer-specific behavior from a system of shared routers. So with CloudNFV we can collect virtual functions into horizontal systems based on common elements and manage those systems as units. And if you don’t like vertical service-based management or horizontal device-based management, we can go diagonal for you too (if you can figure out what that means!). Any view of resource/service relationships that creates efficient operations processes is fine; in fact, they’re all just different designs on the same coin to us.
When you ask CloudNFV whether a given service is meeting its SLA, what CloudNFV does is drill down through the GB922 layers, viewing the Management Visualizers along the way. Each one of these presents a virtual MIB that has elements whose values are set by the relationships of real resource states in the network and data centers. A virtual firewall’s state might be the sum of the state of the VM it’s hosted on, the data center network it runs in, and the WAN connectivity that links it to the user and to the network. All that’s defined in the Visualizer, and if you want a different management derivation you just change the rules. Same for presentation; an HTML-based GUI and a management API are just different presentation formats to CloudNFV.
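The virtual firewall example can be sketched as a derivation rule applied to real resource states. Everything here is an assumption for illustration: the state names, the severity ordering, the resource identifiers, and the two rules are hypothetical, and "sum of the state" is interpreted as "worst state wins" in the default rule.

```python
from typing import Callable, Dict, Iterable

# Hypothetical severity ordering; a higher number means a worse state.
SEVERITY = {"up": 0, "degraded": 1, "down": 2}


def worst_of(states: Iterable[str]) -> str:
    """Default rule: the derived state is the worst underlying state."""
    return max(states, key=lambda state: SEVERITY[state])


def ignore_degraded(states: Iterable[str]) -> str:
    """Alternative rule: only a hard failure changes the derived state."""
    return "down" if "down" in list(states) else "up"


def derive(
    resource_state: Dict[str, str],
    depends_on: Iterable[str],
    rule: Callable[[Iterable[str]], str] = worst_of,
) -> str:
    """Compute one virtual MIB element from the resources it depends on."""
    return rule(resource_state[name] for name in depends_on)


# Real resource states, as a repository might report them.
resource_state = {"vm-42": "up", "dc-net-east": "degraded", "wan-link-3": "up"}

# The virtual firewall depends on its VM, its data center network, and its WAN link.
firewall_deps = ["vm-42", "dc-net-east", "wan-link-3"]

strict = derive(resource_state, firewall_deps)
lenient = derive(resource_state, firewall_deps, rule=ignore_degraded)
```

Swapping `worst_of` for `ignore_degraded` changes what the operator sees without touching the underlying resource data, which is the sense in which you "just change the rules."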
This structure also allows CloudNFV to offer what could be called “management services” including configuration and reconfiguration. “Daemon” processes can use Visualizers to check the state of services, resources, etc. and activate management processes in response to the conditions they find. That means that you can use congestion at a given virtual function level to invoke management processes to spawn more copies of something and load-balance them—what the NFV ISG calls “horizontal scaling”. If applications can detect the need for scale-out or scale-back internally, they can also use a management services API to invoke this process themselves.
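A single pass of such a daemon might look like the sketch below. The `ScalingPolicy` class, its thresholds, and the `daemon_pass` function are assumptions for illustration only; the post does not describe how CloudNFV actually decides when to scale, and a real daemon would read load through a Visualizer rather than a lambda.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScalingPolicy:
    """Hypothetical policy: keep average per-instance load near a target."""
    target_load: float = 0.6
    min_instances: int = 1
    max_instances: int = 10

    def desired_instances(self, loads: List[float]) -> int:
        # Instances needed so the same total load averages out to the target.
        needed = math.ceil(sum(loads) / self.target_load)
        return max(self.min_instances, min(self.max_instances, needed))


def daemon_pass(
    read_loads: Callable[[], List[float]],
    policy: ScalingPolicy,
    scale: Callable[[int], None],
) -> int:
    """Check state once and invoke the scaling process if a change is needed."""
    loads = read_loads()
    delta = policy.desired_instances(loads) - len(loads)
    if delta != 0:
        scale(delta)  # positive means scale out, negative means scale back
    return delta


actions: List[int] = []
policy = ScalingPolicy()

# Three congested instances: the daemon asks for more copies.
congested = daemon_pass(lambda: [0.9, 0.9, 0.9], policy, actions.append)

# Three nearly idle instances: the daemon asks to scale back.
idle = daemon_pass(lambda: [0.1, 0.1, 0.1], policy, actions.append)
```

An application that detects its own need for scale-out could call the same `scale` callback directly, which corresponds to the management services API path described above.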
“Federation” or pan-provider services are important to both service providers and users, and so CloudNFV management is inherently federation-ready. Services are composed using either a provider’s own assets or assets made available by partners, and can span multiple networks. We support federation at the cloud infrastructure level (one provider hosts all service logic and deployment tools and uses IaaS services from other providers) up to the service level (partners provide wholesale service components that can be integrated into retail services as needed). To provide controlled management visibility, federation partners are linked through a Federation Connector, which is a form of Visualizer that allows for management interaction only on the components of a federated service where providers have agreed to visibility.
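The controlled-visibility idea can be sketched as a filter over a service's component states. This is a minimal illustration, not the Federation Connector itself: the class shape, the component names, and the use of `PermissionError` are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class FederationConnector:
    """Hypothetical connector exposing only components a partner may see."""
    partner: str
    visible: FrozenSet[str]

    def view(self, service_state: Dict[str, str]) -> Dict[str, str]:
        # Return only the agreed components; everything else stays hidden.
        return {
            name: state
            for name, state in service_state.items()
            if name in self.visible
        }

    def read(self, service_state: Dict[str, str], component: str) -> str:
        if component not in self.visible:
            raise PermissionError(
                f"{self.partner} has no agreed visibility into {component}"
            )
        return service_state[component]


service_state = {
    "wholesale-access": "up",
    "partner-iaas-vm": "degraded",
    "retail-billing": "up",
}

connector = FederationConnector(
    partner="partner-a",
    visible=frozenset({"wholesale-access", "partner-iaas-vm"}),
)

partner_view = connector.view(service_state)
```

In this sketch `partner_view` contains the two agreed components and nothing about `retail-billing`, and a direct `read` of a hidden component is refused rather than silently answered.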
A final point on CloudNFV management is that it’s itself based on fully scalable, fault-tolerant, distributed cloud technology. Resources, whether data storage or processing, can be added dynamically, and the logic is all distributed-state-based to allow for both fail-over and scale-in/out. CloudNFV management can even expand itself if needed using its own principles, to scale as necessary to support uniform, cloud-ready, NFV and cloud service management using a completely open and extensible platform.
Everyone agrees that management of virtual resources of any sort is a complicated problem. Because SDN, NFV, and the cloud are all based on a slightly different vision of virtualization, they create a compound challenge for planners who know the three must be one at some point. We’ve tried to face that challenge head-on, and to define an architecture that anticipates the multi-dimensional challenges of virtualization. We think that’s the only way to address them effectively.