I hope that convincing you that NFV has to evolve in sync with the leading edge of the cloud hasn’t proven too difficult. If it has, then I hope the rest of this series will do the job. If you’re on board, then I hope the rest gives you a few action items to study. The next step in the series is going to be a philosophical challenge for some. I want you to stop thinking about services the way we’re accustomed to, to stop thinking about NFV as “service chains”. Instead, I want you to think about a service as a series of workflows, where functions are not sites through which we chain services but steps we put in the path of those flows. That applies to the data plane and to the management and control plane alike.
All services that matter today have an intrinsic topology, and there are essentially three such topologies recognized today: the LINE or point-to-point, the LAN or multipoint, and the TREE or point-to-multipoint. The workflows, or connection paths, that form these topologies are impressed on infrastructure based on the combination of what the flow needs and where those needs can be met. One of the fundamental challenges NFV faces is that unless you presume a fairly rich deployment of cloud data centers, particularly close to the service edge, you find yourself looking for a Point B that you can reach from a given service-flow Point A without unnecessary diversion. Longer paths consume more resources in flight, run a greater risk of failure because they traverse more things that can fail, and add more delay and packet loss.
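To make the workflow view concrete, here’s a minimal Python sketch of a service as flows over one of these three topologies. The type and site names are entirely my own illustration, not drawn from any NFV specification:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class Topology(Enum):
    LINE = "point-to-point"
    LAN = "multipoint"
    TREE = "point-to-multipoint"

@dataclass
class Flow:
    """One workflow (connection path) within a service."""
    source: str                # edge point where the flow originates
    destinations: List[str]    # edge point(s) the flow must reach
    functions: List[str]       # steps placed in the path of the flow

@dataclass
class Service:
    name: str
    topology: Topology
    flows: List[Flow]

# A LINE is a single flow, a TREE fans one source out to many
# destinations, and a LAN is a mesh of flows among all its sites.
vpn = Service("corporate-vpn", Topology.LAN, [
    Flow("site-a", ["site-b", "site-c"], ["firewall", "vpn-onramp"]),
])
```

The point of the model is that functions live inside a Flow’s path, not in a separate “chain” that the service is forced to visit.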
Ideally, you’d like to see an optimized path created, a path that transits the smallest number of resources and takes the most direct route. This is true with legacy technology, virtual functions, or anything in between. Where NFV complicates the picture is in the limitations created by where you can host something, relative to the structure of the network. This is no different from the cloud, where in most cases you have a small number of cloud data centers you must connect to in order to host functions. When the number of data center hosting points is limited relative to the service geography, the optimum routes are distorted relative to “pure” network routes because you have to detour to reach the data centers, then return to the best path. You could lay such a path onto a network map without knowing anything about hosting or virtual functions if you could presume a fairly dense distribution of hosting points. Yet nobody talks about NFV as starting with the formulation of a set of paths to define the optimum route.
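To put a number on that distortion, here’s a small Python sketch (my own illustration, assuming symmetric link costs) that picks the hosting point adding the least diversion to a flow from Point A to Point B:

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, link cost)]

def shortest_costs(graph: Graph, src: str) -> Dict[str, float]:
    """Dijkstra: cheapest path cost from src to every reachable node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def least_detour_host(graph: Graph, a: str, b: str,
                      hosting_points: List[str]) -> Tuple[str, float]:
    """Pick the hosting point that distorts the A-to-B flow the least.
    Assumes symmetric link costs, so costs from B equal costs to B."""
    from_a = shortest_costs(graph, a)
    from_b = shortest_costs(graph, b)
    inf = float("inf")
    best = min(hosting_points,
               key=lambda h: from_a.get(h, inf) + from_b.get(h, inf))
    detour = from_a[best] + from_b[best] - from_a[b]
    return best, detour  # detour == 0.0: the DC lies on the natural route
```

With a dense distribution of hosting points, the detour term tends toward zero; with only a few data centers, it is exactly the cost of nailing the flow to the hosting map.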
This is the first of many places where we nail NFV to the traditional ground of legacy technology. We are trying to define network functions that are as near to location-independent as cloud density permits, and then we start by defining where things go based on abstract policies. We should be going about this in a totally different way. The only fixed points in an NFV hosting plan are the service edge points, the places where users connect. What that means is that we would normally want to assume that primary hosting responsibility lies right at those edges. You put as much there as you can, hosted in CPE or in the edge office. You then work inward from those edge points to site additional elements, always aware that you want stuff to be where the natural service flows pass, and you want technology associated with specific features to be hosted where those features appear in service logic.
A VPN service is access devices linked to access pipes, linked to VPN onramps or gateways. You can see that as we start to move inward from the edge, we find places where geography and topology create concentrations of access pipes, which means that we could add some onramp or aggregation features there. We could also, if we had issues with features hosted closer to the edge, pull back some of those features along the path of the flow to the next natural hosting point, that point of aggregation or gateway. This approach presumes what I think a lot of operators have deduced for other reasons, which is that services will tend to be better if we can push features close to the user.
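Here’s a minimal sketch of that edge-first rule, with hypothetical site names and a crude capacity count standing in for real resource checks. The same logic pulls a feature back along the flow when the edge can’t host it:

```python
from typing import Dict, List, Optional

def place_function(edge_point: str,
                   inward_path: List[str],
                   capacity: Dict[str, int]) -> Optional[str]:
    """Edge-first placement: try the edge itself (CPE or edge office),
    then walk inward along the flow's natural path to the next point
    of aggregation that still has room. None means nothing can host it."""
    for site in [edge_point] + inward_path:
        if capacity.get(site, 0) > 0:
            capacity[site] -= 1
            return site
    return None

# Hypothetical example: the CPE is full, so the feature pulls back to
# the aggregation gateway on the same flow.
capacity = {"cpe-1": 0, "edge-office-1": 0, "agg-gateway-1": 4}
print(place_function("cpe-1", ["edge-office-1", "agg-gateway-1"], capacity))
# -> "agg-gateway-1"
```

Note that the candidate list is the flow’s own path, not an abstract policy zone; the placement never leaves the route the traffic would take anyway.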
If something breaks, and if you have to redeploy to get around the failure, the best strategy will be one that has the least impact on the commitments already made, which in most cases will be one where the substitute resource used is proximate to the original one. A minor break that impacts only one service flow (from edge to gateway or aggregator) won’t change service topology much at all, which means that you don’t have to recommission a bunch of things and worry about in-flight data and delays. A major fault that breaks a bunch of paths would probably mean you’d have to examine failure-mode topologies to find the choices that would result in the smallest number of impacted flows.
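As an illustration only (the impact counts and hop distances here are hypothetical inputs a real system would have to measure), the “least disruption” choice might look like this:

```python
from typing import Dict, List, Tuple

def choose_substitute(failed_site: str,
                      candidates: List[str],
                      hops: Dict[Tuple[str, str], int],
                      flows_moved: Dict[str, int]) -> str:
    """Rank substitutes first by how many already-committed flows each
    would force to re-route, then by proximity to the failed site, so
    the repaired topology departs as little as possible from the old one."""
    inf = float("inf")
    return min(candidates,
               key=lambda c: (flows_moved.get(c, inf),
                              hops.get((failed_site, c), inf)))
```

For the minor, single-flow break, the flow-impact term is the same everywhere and proximity decides; for the major fault, the flow-impact term is exactly the failure-mode comparison described above.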
If you have to add something to a service, the right place to put the addition would be based on the same service flow analysis. A single site, or a bunch of sites, wants a new feature? The first goal is to edge-host it. Failing that, pull it back to the primary aggregation level behind the user connection points, the place where natural traffic concentration brings multiple flows together.
To make this work, you’d have to assume (as I have already noted) a fairly rich deployment of cloud data centers. In fact, it would be best to have one in every facility that represented a concentration of physical media. Where fiber goes, so go service flows, and the junction points are where you’d find the ideal interior hosting locations. You’d also have to assume fairly uniform capabilities in each hosting point, so you didn’t risk needing a specialized resource that wasn’t available there. You’d also probably want to presume SDN deployment so you could steer paths explicitly, though if you follow a gateway-to-gateway hopping model across LAN and VPN infrastructure you can still place elements at the gateway points and the edge.
The special case of all of this comes back to that functional programming (Lambda) model. If we viewed all “VNFs” as pipelined Lambda processes, then we’d simply push them into a convenient hosting point along the paths and they’d run there. Unlike something we had to put into place and connect, a pipelined function is simply added to the service flow in any convenient data center it transits. You don’t really have to manage it much because it can be redeployed on a whim, but if you want to control its operation you could presume either that each Lambda/VNF had a management pipeline and a data pipeline and passed its control messages along, or that every hosting location had a management bus to which each could be connected.
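Here’s a sketch of that picture, with everything (the Packet type, the bus stub, the toy VNFs) invented purely for illustration: each VNF is just a function added to the flow, with control messages carried on a separate management path:

```python
from typing import Callable, List

Packet = bytes
VNF = Callable[[Packet], Packet]   # a stateless, pipelined function

def data_pipeline(vnfs: List[VNF], packet: Packet) -> Packet:
    """Data plane: the packet simply passes through each function,
    hosted in whatever convenient data center the flow transits."""
    for vnf in vnfs:
        packet = vnf(packet)
    return packet

def management_bus(vnf_name: str, message: dict) -> None:
    """Management plane: a stand-in for the per-site bus to which each
    Lambda/VNF could connect, kept separate from the data path."""
    print(f"[mgmt-bus] {vnf_name}: {message}")

# Two trivial VNFs: neither holds state, so either can be redeployed
# to another hosting point without ceremony.
mark = lambda p: p + b"|marked"
police = lambda p: p[:1500]
print(data_pipeline([mark, police], b"payload"))
management_bus("police", {"op": "set-limit", "bytes": 1500})
```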
The management bus comment is relevant because if we should be aware of a flow model of connectivity and function placement, we should be even more mindful of these concepts when we look at the distribution of the control software used in SDN and NFV. I remarked in my last blog that there might be benefits to a robust container approach to hosting VNFs, because containers seemed to lend themselves better to distributing the basic control functions, meaning the lifecycle management. Perhaps we can go even further.
A modeled service is a hierarchy of objects, starting with the “service” at the retail level at the top and decomposing into an upside-down tree whose branch tendrils touch the resources. In a model like this, only the lowest layer has to be made aware of resource state. The higher-level objects, if we follow the abstraction principles to their logical conclusion, would not see resources at all, because resources sit inside the black boxes of the subordinate objects and so are invisible. What these higher-level objects know about is the state of those subordinates. This implies that every object is a finite-state machine responding to events generated within the model tree.
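Here’s a toy Python rendering of that model tree, assuming a simple three-state lifecycle; the states and the roll-up rule are my own simplification, not any standard model:

```python
from enum import Enum
from typing import List, Optional

class State(Enum):
    ORDERED = 1
    ACTIVE = 2
    FAULT = 3

class ModelObject:
    """One node in the service model tree: a finite-state machine that
    sees only the state of its direct subordinates, never raw resources."""
    def __init__(self, name: str, children: Optional[List["ModelObject"]] = None):
        self.name = name
        self.state = State.ORDERED
        self.children = children or []
        self.parent: Optional["ModelObject"] = None
        for child in self.children:
            child.parent = self

    def on_child_event(self, child: "ModelObject") -> None:
        """State-event handling: derive our state from subordinate states,
        and if it changes, raise an event to our own parent in turn."""
        if any(c.state is State.FAULT for c in self.children):
            new_state = State.FAULT
        elif all(c.state is State.ACTIVE for c in self.children):
            new_state = State.ACTIVE
        else:
            new_state = State.ORDERED
        if new_state is not self.state:
            self.state = new_state
            if self.parent:
                self.parent.on_child_event(self)

# Only the leaves touch resources; a leaf fault ripples up the tree.
leaf = ModelObject("vpn-onramp")
service = ModelObject("retail-vpn", [ModelObject("access", [leaf])])
leaf.state = State.FAULT
leaf.parent.on_child_event(leaf)
print(service.state)  # State.FAULT
```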
If every object is a self-contained state-event process, we could in theory distribute the objects to places determined by the service topology. Objects close to the bottom might live closer to resources, and those at the top closer to services. In effect, all these objects could be serviced by interpretive Lambdas, pushed in a stateless way to wherever we need them, each drawing its data and state from the specific model element it represents. This model is a logical extension of how the cloud is evolving, and we need to look at it for NFV, lest we fall into the trap of trying to support dynamic virtualization with static software elements for our management and orchestration.
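And here’s a matching sketch of that stateless dispatch, where a dict stands in for whatever distributed repository would actually hold the model (all the names and transitions are hypothetical):

```python
# Hypothetical model repository: the model element, not the process,
# holds all durable data and state, so the handler can run anywhere.
MODEL_STORE = {
    "retail-vpn/access": {"state": "ACTIVE", "type": "access"},
}

def handle_event(element_id: str, event: str) -> str:
    """A stateless 'interpretive Lambda': load the element, apply the
    state-event transition, write the element back, and exit. The
    process keeps nothing, so it can be pushed to any hosting point."""
    element = MODEL_STORE[element_id]
    transitions = {
        ("ACTIVE", "fault"): "FAULT",
        ("FAULT", "repaired"): "ACTIVE",
    }
    element["state"] = transitions.get((element["state"], event),
                                       element["state"])
    MODEL_STORE[element_id] = element
    return element["state"]

print(handle_event("retail-vpn/access", "fault"))  # -> "FAULT"
```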
Nothing here is “new”; we already see these trends in the cloud. Remember that my thesis here is that from the first, NFV was expected to exploit the cloud, and that means understanding the best of cloud evolutionary trends. We need the best possible implementations to make the best business case, and to drive optimum deployment.