A lot of important relationships in networking aren’t obvious, and that is the case with the relationship between management system boundaries, models, and elements of infrastructure. Those relationships can be critical in service lifecycle management, which in turn is critical to sustaining operator profit-per-bit and driving or supporting “innovations” like SDN or NFV. In my last blog I talked about the general trend in modeling services, and here I want to talk about the relationship between legacy management concepts and evolving notions of infrastructure and service lifecycle automation.
Networks have always been made up of discrete elements, “trunks” and “nodes”. Trunks are the physical media that carry traffic, and nodes are the switching/routing points that create end-to-end connectivity. Even in the age of SDN and NFV, you still end up with trunks and nodes, though the nodes may be hosted or virtual rather than real devices.
In the old days, you used to have to control nodes directly and individually, using a command-line interface (CLI). Over time, the complexity of that task became clear, and management systems evolved to simplify the process. Since the purpose of nodes was to create end-to-end connectivity, one of the management innovations was to provide tools to create the kind of cooperative node behavior that was essential in building routes through a network. In the OSI world, this built a management hierarchy—element management, then network management, and finally service management.
Three layers of management may simplify management tasks by standardizing them, but the layers themselves can be a bit complicated. To make matters worse, packet networks have almost always been built to offer adaptive behavior, learning topology to find addresses and also learning about conditions so that failures can be accommodated. Adaptive behavior generally takes place inside areas of the network which in routing are known as autonomous systems (ASs). So we have elements, networks, and ASs that all have to be factored in.
SDN and NFV add in their own stuff because SDN has a controller that handles route management on the devices under its jurisdiction, and NFV has to deploy and manage the virtual resources that host network functions. There is, in both SDN and NFV, a world “inside” a virtual box that has to be managed, so that creates another element.
We can’t keep listing all these things, so let’s adopt some terms here that are generally congruent with industry practice. Let’s say that a group of resources that work cooperatively is a domain. A network consists of one or more domains, and a domain has external behaviors that end up being something like what our old friends, nodes and trunks, would create. Domains are ingrained in current practice, and so it’s helpful to retain the concept as long as it doesn’t run at odds with progress.
The structure that we’ve built here is multi-layered by both nature and design. We have domains that interconnect with each other, and each of these is a “network”. Underneath the domains we have perhaps another domain layer (the NFV resource pool) that has to be managed to push virtual functions up into the domain layer. Above this, we have services that coordinate behavior across the domains.
This structure seems to naturally break down into an intent-model hierarchy. A network might be several domains, which means that a network “model” could be said to decompose into multiple domain models. Those in turn might decompose into models that define the process of “elevation” that makes nodes created by hosting something on a resource pool visible and functional.
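As a concrete illustration, here’s a minimal sketch of that hierarchy. The class and field names are my own invention, not any standard’s schema; the point is only that a network model holds domain models, and a domain model can hold the “elevation” mappings that turn hosted functions into visible nodes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ElevationModel:
    """Maps something hosted on a resource pool up to a visible node."""
    hosted_function: str      # e.g. "vCPE-image" running in the pool
    exposed_node_type: str    # what the network layer sees, e.g. "edge-router"

@dataclass
class DomainModel:
    """Intent model for one management domain; hides how behaviors are produced."""
    name: str
    exposed_behaviors: List[str] = field(default_factory=list)   # e.g. ["IP-VPN"]
    elevations: List[ElevationModel] = field(default_factory=list)

@dataclass
class NetworkModel:
    """A network is one or more interconnected domains."""
    name: str
    domains: List[DomainModel] = field(default_factory=list)

vpn_net = NetworkModel("metro-east", domains=[
    DomainModel("ip-core", exposed_behaviors=["IP-VPN"]),
    DomainModel("nfv-pool-1", elevations=[ElevationModel("vCPE-image", "edge-router")]),
])
print(len(vpn_net.domains), "domain models under the network model")
```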
One obvious question here is whether a given intent/domain model models the domain and all its services, or whether it models a specific service on a domain level. Do we have an “IP VPN” service, as well as perhaps a half-dozen others, exposed by an IP domain, or do we have separate domain models for each of the services, so that every domain that offers IP VPNs has a separate intent/domain model for it? There is no reason why you couldn’t do either of these two things, but the way the higher-layer processes work would obviously be different: one manipulates the same intent/domain models for all services, and the other manipulates service-specific models.
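The two options differ only in where the service distinction lives, which is easier to see in a sketch. The class and method names below are hypothetical, and the bodies are stubs.

```python
# Option 1: one intent/domain model per domain; the service is a parameter.
class IPDomain:
    def request(self, service_type: str, endpoints: list) -> str:
        # The domain decides internally how to realize "IP-VPN", "Internet", etc.
        return f"{service_type} across {len(endpoints)} endpoints ordered"

# Option 2: one intent/domain model per service per domain.
class IPVPNOnDomain:
    def request(self, endpoints: list) -> str:
        # This model only knows how to realize IP VPNs on its domain.
        return f"IP-VPN across {len(endpoints)} endpoints ordered"

# A higher layer manipulates generic models in Option 1,
# and service-specific models in Option 2.
print(IPDomain().request("IP-VPN", ["siteA", "siteB"]))
print(IPVPNOnDomain().request(["siteA", "siteB"]))
```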
Generally, network services (those that are made up of real or virtual nodes and trunks and offer connectivity) will manage cooperative behavior via a management API. An intent/domain model would envelop this API and use it to coerce the properties it needs from the underlying resources. We can say that network domains are represented by management APIs, and conversely we could say that a domain is created by every set of resources that has its own management API.
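Here’s a minimal sketch of that “envelop the API” idea. The management API client and its get/set methods are stand-ins I’ve invented, not any vendor’s interface; the intent model translates an abstract behavior and SLA into whatever parameters the API actually understands.

```python
class ManagementAPI:
    """Stand-in for a domain's real management interface (e.g. an NMS northbound API)."""
    def __init__(self):
        self._state = {}
    def set_param(self, key, value):
        self._state[key] = value
    def get_param(self, key):
        return self._state.get(key)

class DomainIntentModel:
    """Wraps the management API and coerces the behaviors the service needs."""
    def __init__(self, api: ManagementAPI):
        self.api = api
    def assure(self, behavior: str, sla: dict):
        # Translate an abstract intent ("IP-VPN with <= 30 ms latency") into
        # the parameters the underlying API exposes.
        self.api.set_param(f"{behavior}.latency_ms", sla.get("latency_ms"))
        self.api.set_param(f"{behavior}.admin_state", "up")
    def status(self, behavior: str) -> str:
        return self.api.get_param(f"{behavior}.admin_state") or "unknown"

domain = DomainIntentModel(ManagementAPI())
domain.assure("IP-VPN", {"latency_ms": 30})
print(domain.status("IP-VPN"))   # -> "up"
```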
Resource domains, meaning hosting domains, are different because there is no inherent functionality presented by one. You have to deploy some virtual network function onto such a domain, and when you do you essentially create a node. You also have to connect network functions to each other and to the outside world, so we could say that a hosting domain builds virtual boxes that become nodes. When this happens, the virtual boxes should be managed at the network level the way the real boxes would be.
This, I think, is a useful modeling exercise. What we’ve done is to say that the “network” is always nodes and trunks, and that the management of the network should look like node/trunk management with real boxes as the nodes. The hosting domain is responsible for node-building, and that process will always create a virtual box that looks like and is managed like some real box, and also will create ports on that node that link to real network trunks.
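A sketch of that node-building responsibility, with invented names and a trivial placement rule: what comes back from the hosting domain is a box with ports, which the network layer can then manage as if it were the real device.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VirtualBox:
    """What the hosting domain hands up to the network layer: a node with ports."""
    node_type: str                 # e.g. "firewall", managed like the real device
    host: str                      # where it actually landed
    ports: List[str] = field(default_factory=list)   # attach points for real trunks

class HostingDomain:
    def __init__(self, hosts: List[str]):
        self.hosts = hosts
    def build_node(self, function_image: str, node_type: str, port_count: int) -> VirtualBox:
        # function_image would be deployed on the chosen host here; deployment
        # details and real placement policy are elided in this sketch.
        host = self.hosts[0]
        ports = [f"{host}:{node_type}:p{i}" for i in range(port_count)]
        return VirtualBox(node_type=node_type, host=host, ports=ports)

pool = HostingDomain(hosts=["dc1-server7", "dc2-server3"])
vfirewall = pool.build_node("fw-image-2.1", node_type="firewall", port_count=2)
print(vfirewall.ports)   # trunks connect to these just as they would to a real box
```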
This can be loosely aligned with the ETSI model, in that it tends to cede management of the VNFs at the “functional” level to traditional management elements, but it’s less clear what the mapping is between these virtual nodes and the implementation down in the hosting domain. This is where the principles of intent modeling are helpful; you can assume that the virtual nodes are intent models that expose the management properties of the physical network functions they replace, and that the vertical implementation of the VNFs in the hosting domain harmonizes with these PNF management properties.
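One way to picture that harmonization is a sketch like the one below, with invented metric names: the virtual node exposes the same management properties the PNF would have, derived from whatever the hosting layer actually measures.

```python
class VNFIntentModel:
    """Exposes PNF-style management properties, mapped from hosting-layer metrics."""
    def __init__(self, hosting_metrics: dict):
        self.hosting = hosting_metrics     # e.g. VM/container telemetry

    def pnf_view(self) -> dict:
        # Harmonize hosting-layer facts into the properties a real box would report.
        return {
            "operational_state": "up" if self.hosting.get("vm_state") == "running" else "down",
            "throughput_mbps": self.hosting.get("vnic_rx_mbps", 0) + self.hosting.get("vnic_tx_mbps", 0),
            "cpu_util_pct": self.hosting.get("vcpu_util_pct", 0),
        }

vnf = VNFIntentModel({"vm_state": "running", "vnic_rx_mbps": 400,
                      "vnic_tx_mbps": 380, "vcpu_util_pct": 55})
print(vnf.pnf_view())
```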
The problem with this approach is the agility of virtual functions. Real nodes are somewhere specific, and so the topology of a network is firmly established by node locations; there’s nowhere else trunks can go and still connect to something. With virtual functions, any hosting point can support any type of node. When you decide where to host something, you create the equivalent of the PNF in the networking domain. But you have to decide, and when you do, you have to run a trunk to that point to connect it. This makes static VNF-for-PNF linkage difficult, because the trunk connections to a PNF would already be in place, period. For a VNF you have to build them.
It would seem that this argues for at least one other layer of abstraction, at least where you have to mingle VNFs and PNFs. A better approach is top-down, which is to say that you compose a service, map it to the fixed and hosting topologies, and then push down the VNF pieces to the hosting layer. This might suggest that there are three broad layers—a service domain, a network domain, and a hosting domain. At least, in a functional sense.
The purpose of a service domain would be to break out the service into functional pieces, like “Access” and “VPN”. The network domain might then map those pieces to a service topology that identifies the management domains that have to be involved, and the hosting domain then hosts the VNFs as needed, and makes the connections—the “trunks” and anything internal to the VNF hosting process.
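A sketch of that top-down flow, with illustrative function names and a toy mapping: the service domain breaks the order into functional pieces, the network domain maps those pieces onto management domains, and the hosting domain hosts whatever has to be hosted and makes the connections.

```python
def decompose_service(order: dict) -> list:
    # Service domain: break the order into functional pieces.
    return ["Access", "VPN"] if order.get("type") == "IP-VPN" else ["Access"]

def map_to_domains(pieces: list) -> dict:
    # Network domain: decide which management domains realize each piece.
    mapping = {"Access": "metro-domain", "VPN": "core-domain"}
    return {piece: mapping[piece] for piece in pieces}

def host_and_connect(domain_map: dict) -> list:
    # Hosting domain: host VNFs where needed and make the "trunk" connections.
    actions = []
    for piece, domain in domain_map.items():
        if piece == "Access":
            actions.append(f"host vCPE for {piece} in {domain} and connect to access trunk")
        else:
            actions.append(f"order {piece} behavior from {domain}")
    return actions

for step in host_and_connect(map_to_domains(decompose_service({"type": "IP-VPN"}))):
    print(step)
```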
This latter point raises its own issues. You can’t host functions without having connectivity, not only among the functional elements but also with the management framework itself. The industry-standard approach for this from the cloud computing side is the subnet model, where an application (in the NFV case, a virtual network function) is deployed in a specific IP subnet, which in most cases is drawn from the RFC 1918 private IP address space. That means the elements can talk with each other but not with the outside world. To make something visible (to make it into a trunk connection), you’d associate its private address with a public address. Docker, the most popular container architecture, works explicitly this way, and presumably so do VM platforms like OpenStack.
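A minimal sketch of that subnet model, using only Python’s standard ipaddress module: the functions get addresses from an RFC 1918 subnet and can reach each other, and nothing is reachable from outside until it is associated with a public address. The mapping table here is my own illustration, not how any particular platform stores the association.

```python
import ipaddress

# RFC 1918 space for the deployed functions; they can talk to each other here.
subnet = ipaddress.ip_network("10.10.1.0/24")
hosts = iter(subnet.hosts())

vnfs = {name: next(hosts) for name in ["firewall", "nat", "dpi"]}

# Only functions with a public association are visible to the outside world,
# which is the equivalent of turning an internal port into a trunk connection.
exposed = {"firewall": ipaddress.ip_address("203.0.113.10")}

for name, addr in vnfs.items():
    public = exposed.get(name)
    print(f"{name}: private {addr}, public {public if public else 'not exposed'}")
```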
Putting something in a subnet is one thing, but creating a service chain could be something else again. A service chain is an ordered sequence of functions, which means that the chains are created by tunnels that link the elements. Traditional container and cloud software doesn’t normally support that kind of relationship. You could surely set up tunnels from the individual functions, but in the first place they probably don’t know the order of functions in the chain, and in the second they probably don’t set up tunnels at all today; they expect to have their connections made from the outside. You could create tunnels at Level 2 with a virtual switch and something like OpenStack, but does that mean that we need to host service chains in a virtual LAN at Level 2? OpenStack could also be used at Level 3 to create tunnels, of course, provided you had some white-box switches that knew Level 3.
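Here’s a sketch of what the chaining step adds on top of simple subnet connectivity: an ordered list of functions plus a tunnel between each adjacent pair, built from the outside rather than by the functions themselves. The function names and the tunnel type are invented for illustration.

```python
def build_service_chain(functions: list) -> list:
    """Return the tunnels needed to link the functions in chain order."""
    tunnels = []
    for a, b in zip(functions, functions[1:]):
        # Each tunnel could be an L2 virtual-switch path or an L3 tunnel;
        # either way, the chain order has to be known here, outside the VNFs.
        tunnels.append({"from": a, "to": b, "type": "vxlan"})
    return tunnels

chain = ["classifier", "firewall", "nat", "egress-router"]
for t in build_service_chain(chain):
    print(f"{t['from']} -> {t['to']} via {t['type']}")
```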
In address terms, this isn’t the end of it. You generally won’t want to make the composed elements of a service visible and accessible except to the management processes. The user of a service has an address space that the high-level service fits into. The service provider will have a subnet address space for the virtual functions. They may also have an address space for the virtual trunks. Finally, the management structure, particularly for the hosting domain, will need an address space to connect to the lifecycle management processes. One of the things that’s essential in the management and modeling of services is accounting for all these address spaces: we have to deploy VNFs into an address space, we have to connect the virtual devices those VNFs create using an address space, and so the modeling steps have to be able to manage these spaces explicitly.
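As a bookkeeping sketch (the field names are mine), the model can carry each address space explicitly, so that deployment, trunking, and lifecycle management each know which space they are operating in.

```python
from dataclasses import dataclass
import ipaddress

@dataclass
class ServiceAddressPlan:
    user_space: ipaddress.IPv4Network     # the customer's view of the service
    vnf_subnet: ipaddress.IPv4Network     # private space the VNFs deploy into
    trunk_space: ipaddress.IPv4Network    # virtual trunks between elements
    mgmt_space: ipaddress.IPv4Network     # lifecycle-management connectivity

plan = ServiceAddressPlan(
    user_space=ipaddress.ip_network("198.51.100.0/24"),
    vnf_subnet=ipaddress.ip_network("10.20.0.0/24"),
    trunk_space=ipaddress.ip_network("10.30.0.0/24"),
    mgmt_space=ipaddress.ip_network("172.16.0.0/24"),
)

# A deployment step would draw from vnf_subnet; a trunking step from trunk_space, etc.
print(plan.vnf_subnet.num_addresses, "addresses available for VNF deployment")
```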
Finally, hosting adds complexity to both address space management and optimization. Remember that you can host a function wherever you have resources. What’s the best place, then? How do you factor in things like the location of other, non-portable elements of the service? Not to mention the need to avoid creating single points of failure, or violating policy by putting a service in or through a location whose regulations you’d have to worry about.
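A sketch of the kind of higher-level placement filtering that has to happen before anything is handed down to the hosting layer; the constraint names and site data are invented for illustration.

```python
def eligible_sites(sites: list, service: dict) -> list:
    """Filter candidate hosting sites against service-level placement constraints."""
    out = []
    for site in sites:
        if site["region"] in service.get("barred_regions", []):
            continue                       # regulatory constraint
        if site["site_id"] in service.get("already_used_sites", []):
            continue                       # avoid a single point of failure
        out.append(site)
    # Prefer sites close to the fixed (non-portable) elements of the service.
    return sorted(out, key=lambda s: s["km_to_fixed_elements"])

sites = [
    {"site_id": "dc1", "region": "EU", "km_to_fixed_elements": 40},
    {"site_id": "dc2", "region": "US", "km_to_fixed_elements": 15},
    {"site_id": "dc3", "region": "US", "km_to_fixed_elements": 90},
]
print(eligible_sites(sites, {"barred_regions": ["EU"], "already_used_sites": ["dc3"]}))
```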
You can’t push these kinds of decisions down into OpenStack, because the issues are at a higher level. In the real world, with limitations on the capacity of a single OpenStack domain, you have to at least divide hosting by domains. You have to connect across data centers, where a single domain won’t have both ends or all the components. We risk devaluing both SDN and NFV if we don’t think about the bigger picture here.