A New Policy-Managed Model for SDN (and NFV?)

One of the challenges that packet networks have faced from the first is the question of “services”.  Unlike TDM, which dedicates resources to services, packet networks multiplex traffic and thus rely more on what could be called “statistical guarantees” or “grade of service” than on specific SLAs.  Today, it’s fair to say that two different service management strategies are in play: one based on the older and more explicit SLAs, and one based on the packet norm of “statisticalism”.  Proponents of the latter have focused on policy management as a specific mechanism.

One reason this management debate has emerged recently is SDN (NFV has some impact on this too, which I’ll get to in a minute).  From the first it’s been clear that SDN, in its “purist” white-box-and-OpenFlow form, promises to displace traditional switching and routing.  It’s also been clear that software could define networks in a number of lower-touch ways, and that if you considered a network as my classic black box controlled from the outside, you’d be unable to tell how the actual networking was implemented: pure SDN or “adapted Ethernet/IP”.  Cisco’s whole SDN strategy has been based on providing API-level control without transitioning away from the usual switches and routers.

Policy management is a way to achieve that, and we’ve had it for years in the form of the Policy Control Point and Policy Enforcement Point (PCP/PEP) combination.  The notion is that you establish service policies, which the PCP then parses to direct network behavior so that it conforms to service goals.  Cisco, not surprisingly, has jumped on policy management as a kind of intermediary between the best-effort service native to packet networks and the explicit QoS and traffic management that SDN/OpenFlow promises.  Its OpFlex open policy protocol is the centerpiece of its approach.  OpFlex offers what Cisco likes in SDN, which is “declarative control”: you tell the network what you want to happen, and the network takes care of it.
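To make the PCP/PEP split concrete, here is a minimal Python sketch of declarative control. It is not OpFlex or any vendor's API. The class names (`ServicePolicy`, `PolicyControlPoint`, `PolicyEnforcementPoint`) and the latency-to-queue translation rule are hypothetical, chosen only to show a stated goal being turned into device directives.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServicePolicy:
    """Declarative goal: what the service needs, not how to deliver it."""
    service: str
    max_latency_ms: int
    min_bandwidth_mbps: int


@dataclass
class PolicyEnforcementPoint:
    """Hypothetical stand-in for a device agent that applies directives locally."""
    name: str
    applied: dict = field(default_factory=dict)

    def apply(self, service: str, directive: dict) -> None:
        self.applied[service] = directive


class PolicyControlPoint:
    """Parses declarative policies and pushes device-level directives to PEPs."""

    def __init__(self, peps: list[PolicyEnforcementPoint]):
        self.peps = list(peps)

    def translate(self, policy: ServicePolicy) -> dict:
        # Illustrative rule only: tight latency goals map to a priority queue.
        queue = "priority" if policy.max_latency_ms <= 50 else "best-effort"
        return {"queue": queue, "reserve_mbps": policy.min_bandwidth_mbps}

    def push(self, policy: ServicePolicy) -> dict:
        directive = self.translate(policy)
        for pep in self.peps:
            pep.apply(policy.service, directive)
        return directive


# The caller states the goal; the PCP decides how each PEP should behave.
edge = PolicyEnforcementPoint("edge-1")
core = PolicyEnforcementPoint("core-1")
pcp = PolicyControlPoint([edge, core])
pcp.push(ServicePolicy("voice", max_latency_ms=30, min_bandwidth_mbps=2))
print(core.applied["voice"])  # {'queue': 'priority', 'reserve_mbps': 2}
```

The caller never names a queue or a device; that separation is what “declarative” means here.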

How does this really fit in the modern world?  Well, it depends.

First, policy management isn’t necessarily far from purist SDN/OpenFlow.  With OpenFlow you have a controller element that manages forwarding.  While it’s not common to express the service goals the controller recognizes as formal policies, you could certainly see a controller as a PCP or as a PEP, depending on your perspective.  It “enforces” policies by translating service goals into forwarding instructions, but it also “controls” policies by pushing forwarding changes down to the physical devices that shuffle the bits.  If applications talked to services through APIs that set goals, you could map those APIs to either approach.
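One way to picture mapping a goal-setting API “to either approach” is a single service API with two interchangeable backends. In this hedged Python sketch, `ConnectivityGoal`, `ExplicitController`, `PolicyBackend`, and the precomputed path table are all illustrative assumptions; real controllers compute paths and emit protocol messages rather than strings.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ConnectivityGoal:
    src: str
    dst: str
    max_latency_ms: int


class ServiceBackend(Protocol):
    def realize(self, goal: ConnectivityGoal) -> list[str]: ...


class ExplicitController:
    """'Purist' style: the controller computes per-node forwarding entries itself."""

    def __init__(self, paths: dict[tuple[str, str], list[str]]):
        self.paths = paths  # precomputed paths, an assumption for brevity

    def realize(self, goal: ConnectivityGoal) -> list[str]:
        path = self.paths[(goal.src, goal.dst)]
        # One entry per hop: "node: match dst -> forward to next hop".
        return [f"{node}: dst={goal.dst} -> {nxt}" for node, nxt in zip(path, path[1:])]


class PolicyBackend:
    """Policy style: hand the goal to the network and let it decide how."""

    def realize(self, goal: ConnectivityGoal) -> list[str]:
        return [f"policy: {goal.src}<->{goal.dst} latency<={goal.max_latency_ms}ms"]


def request_service(backend: ServiceBackend, goal: ConnectivityGoal) -> list[str]:
    # The application sees the same API regardless of the backend.
    return backend.realize(goal)


goal = ConnectivityGoal("A", "C", 40)
explicit = ExplicitController({("A", "C"): ["A", "B", "C"]})
print(request_service(explicit, goal))         # ['A: dst=C -> B', 'B: dst=C -> C']
print(request_service(PolicyBackend(), goal))  # ['policy: A<->C latency<=40ms']
```

The calling code in `request_service` is identical in both cases, which is the black-box point: from outside, you can't tell which implementation sits underneath.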

The obvious difference between the purist model and the policy model as we usually see it is that the purist model presumes we have explicit control of devices, whereas the policy model says that we have coercive control.  We can make the network bend to our will in any number of ways, including simply exercising admission control to keep utilization at the levels needed to sustain SLAs.  That’s our second point of difference, and it leads to something that could be significant.
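Admission control is the simplest form of coercive control: rather than steering traffic, you refuse new demand that would push utilization past the level your SLAs assume. Below is a hypothetical single-trunk `AdmissionController`; the 70% ceiling is picked purely for illustration, and real admission control would work per path and per class of service.

```python
class AdmissionController:
    """Admits demand on one trunk only while utilization stays under a ceiling."""

    def __init__(self, capacity_mbps: float, max_utilization: float):
        self.ceiling_mbps = capacity_mbps * max_utilization
        self.admitted_mbps = 0.0

    def request(self, demand_mbps: float) -> bool:
        # Reject anything that would push the trunk past the ceiling.
        if self.admitted_mbps + demand_mbps > self.ceiling_mbps:
            return False
        self.admitted_mbps += demand_mbps
        return True

    def release(self, demand_mbps: float) -> None:
        self.admitted_mbps = max(0.0, self.admitted_mbps - demand_mbps)


trunk = AdmissionController(capacity_mbps=1000, max_utilization=0.7)
print(trunk.request(400))  # True  (400 of 700 Mbps used)
print(trunk.request(250))  # True  (650 of 700 Mbps used)
print(trunk.request(100))  # False (would reach 750 Mbps)
trunk.release(250)
print(trunk.request(100))  # True  (500 of 700 Mbps used)
```

Nothing in this controller knows which path or device actually carries the admitted traffic, which is exactly the opacity that coercive control brings with it.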

With explicit control, we have a direct link between resources and services.  Even though the control process may not be aware of individual services, it is aware of individual resources because it has to direct forwarding by nodes over trunks.  With coercive control, we know we’ve asked for some behavior or another, but how our desired behavior was obtained is normally opaque.  That’s a virtue in that it creates a nice black-box abstraction that can simplify service fulfillment, but it’s a vice in a management sense because it isolates management from services.

In an ordinary policy-managed process you have network services and you have offered services, with a policy controller translating between the two.  Your actual “network management” manages network services, so your management tendrils extend “horizontally” out of the network to operations processes.  Your offered services are consumers of network services, but it’s often impossible to know whether an offered service is working, or, if a network service breaks, whether the failure has broken some or all of the offered services.

What’s needed to connect network services and offered services is a mapping to a specific topology that can relate one to the other.  One possible solution is to provide topology maps and use them not only to support management decisions and create management visibility, but also to facilitate control.  A recent initiative from Huawei (primarily) and Juniper called SUPA (Shared Unified Policy Automation) is an interesting way of providing this topology coordination.

SUPA works by having three graphs (YANG models).  The lowest-level one models the actual network at the protocol level.  The highest one graphs the service abstractly as a connectivity relationship, and the middle one is a VPN/VLAN graph that relates network services to the physical topology.  The beauty of this is that you could envision something like SUPA mapping to legacy elements like VPNs and VLANs but also to purist OpenFlow/SDN elements.  In theory, you could also extend SUPA, by creating a new middle-level model to augment the current ones, to support new services whose forwarding and other behaviors are very different from those we have in Ethernet and IP networks.
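A rough way to picture the three layers is as linked lookup tables. This Python sketch is not SUPA's YANG modules; `ThreeLayerModel`, `NetworkService`, and the `kind` values (including the hypothetical `"openflow-path"`) are assumptions, meant only to show how a middle layer binds abstract services to physical elements, and how a new middle-layer kind can slot in without changing the other two layers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NetworkService:
    """Middle layer: a VPN, VLAN, or any new kind, plus the elements it uses."""
    name: str
    kind: str                  # e.g. "l3vpn", "vlan", or a new kind
    elements: tuple[str, ...]  # bottom-layer nodes it traverses


@dataclass
class ThreeLayerModel:
    topology: set[str] = field(default_factory=set)  # bottom layer
    network_services: dict[str, NetworkService] = field(default_factory=dict)
    # Top layer: offered service -> (endpoint A, endpoint B, network service name)
    offered: dict[str, tuple[str, str, str]] = field(default_factory=dict)

    def add_network_service(self, ns: NetworkService) -> None:
        # A middle-layer entry may only reference elements that exist below it.
        missing = set(ns.elements) - self.topology
        if missing:
            raise ValueError(f"{ns.name} uses unknown elements: {sorted(missing)}")
        self.network_services[ns.name] = ns

    def bind(self, service: str, a: str, b: str, ns_name: str) -> None:
        if ns_name not in self.network_services:
            raise KeyError(ns_name)
        self.offered[service] = (a, b, ns_name)

    def elements_for(self, service: str) -> tuple[str, ...]:
        """Walk top -> middle -> bottom to find what a service depends on."""
        _, _, ns_name = self.offered[service]
        return self.network_services[ns_name].elements


model = ThreeLayerModel(topology={"pe1", "p1", "pe2", "sw1", "sw2"})
model.add_network_service(NetworkService("vpn-blue", "l3vpn", ("pe1", "p1", "pe2")))
model.add_network_service(NetworkService("of-path-7", "openflow-path", ("sw1", "sw2")))
model.bind("branch-connect", "hq", "branch-12", "vpn-blue")
print(model.elements_for("branch-connect"))  # ('pe1', 'p1', 'pe2')
```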

Obviously, management coordination between services and networks demands that somebody understand the association.  In SUPA, the high-level controller binds an offered service to a topology, and that binding exposes the management-level detail because it exposes the graph of the elements.  If the underlying structure changes because something had to be rerouted, the change in the graph is propagated upward.  The graph then provides what’s needed to associate the management state of the service’s elements with the service itself.
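Upward propagation and state association can be sketched in a few lines. In this hypothetical `ServiceBindingRegistry` (the names and the "up"/"impaired" states are illustrative, not taken from SUPA), a reroute of a network service notifies every offered service bound to it, and an offered service's status is derived only from the elements its binding currently uses. For brevity, `bind` simply overwrites the stored path for a network service.

```python
from collections import defaultdict


class ServiceBindingRegistry:
    """Tracks which offered services ride on which network services."""

    def __init__(self):
        self.paths: dict[str, list[str]] = {}  # network service -> elements
        self.bound = defaultdict(set)          # network service -> offered services
        self.notifications: list[tuple[str, str, tuple[str, ...]]] = []

    def bind(self, offered: str, network_service: str, elements: list[str]) -> None:
        self.paths[network_service] = list(elements)
        self.bound[network_service].add(offered)

    def reroute(self, network_service: str, new_elements: list[str]) -> None:
        # A change below is propagated upward to every offered service bound to it.
        self.paths[network_service] = list(new_elements)
        for offered in sorted(self.bound[network_service]):
            self.notifications.append((offered, network_service, tuple(new_elements)))

    def service_status(self, offered: str, element_state: dict[str, str]) -> str:
        # "up" only if every element on the service's current paths is up.
        elements = [e for ns, services in self.bound.items()
                    if offered in services for e in self.paths[ns]]
        return "up" if all(element_state.get(e) == "up" for e in elements) else "impaired"


registry = ServiceBindingRegistry()
registry.bind("acme-voice", "vpn-blue", ["pe1", "p1", "pe2"])
registry.bind("acme-data", "vpn-blue", ["pe1", "p1", "pe2"])
state = {"pe1": "up", "p1": "down", "p2": "up", "pe2": "up"}
print(registry.service_status("acme-voice", state))  # impaired
registry.reroute("vpn-blue", ["pe1", "p2", "pe2"])
print(registry.service_status("acme-voice", state))  # up
print([n[0] for n in registry.notifications])        # ['acme-data', 'acme-voice']
```

The same failed element (`p1`) impairs the service before the reroute and is irrelevant after it; only the binding graph tells you which.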

This is an interesting approach, and it’s somewhat related to my own proposed structure for an “SDN”.  Recall that I had a high-level service model, a low-level topology model, and a place where the two combined.  It can also be related to an NFV management option, because you could say that the middle-level graph was “provisioned” and formed the binding between service and resources that you need in order to relate network conditions to service conditions.

I’m not saying that this is the final answer; SUPA is still in its early stages.  But it is a hopeful sign that we’re looking for a more abstract, and thus more generalizable, solution to the management problem of SDN.  I’d like to see less specificity in the middle-layer graphs: a network service should be any realizable relationship between endpoints and not just IP VPNs, VLANs, or tunnels.  I’d like to see the notion of hierarchy made explicit: a low-level element (an Application-Based Policy Decision, or ABPD) should be decomposable into its own “tree” of Network Service Agents and ABPDs, and each NSA should offer one or more abstract services.  I’d also like to see an explicit way of defining management of a service through a hierarchy of related graphs.
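The hierarchy suggested here can be expressed as a simple recursive structure. This sketches the wish list, not anything in the current SUPA work: `ABPD` and `NSA` borrow SUPA's terms, but the fields, the `healthy` flag, and the roll-up logic are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NSA:
    """Network Service Agent offering one or more abstract services."""
    name: str
    offers: set[str]
    healthy: bool = True


@dataclass
class ABPD:
    """Policy decision node that decomposes into NSAs and child ABPDs."""
    name: str
    agents: list[NSA] = field(default_factory=list)
    children: list["ABPD"] = field(default_factory=list)

    def services(self) -> set[str]:
        # Abstract services available anywhere in this subtree.
        offered = set().union(*(a.offers for a in self.agents))
        for child in self.children:
            offered |= child.services()
        return offered

    def unhealthy_agents(self) -> list[str]:
        # Management roll-up: walk the hierarchy and collect failing agents.
        bad = [a.name for a in self.agents if not a.healthy]
        for child in self.children:
            bad += child.unhealthy_agents()
        return bad


access = ABPD("access", agents=[NSA("vlan-agent", {"vlan"})])
core = ABPD("core", agents=[NSA("mpls-agent", {"l3vpn", "tunnel"}, healthy=False)])
root = ABPD("enterprise-vpn", children=[access, core])
print(sorted(root.services()))  # ['l3vpn', 'tunnel', 'vlan']
print(root.unhealthy_agents())  # ['mpls-agent']
```

Because each level only asks its own agents and children, the same code manages a service whether its tree is one level deep or many.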

We’re not there yet, but I think all this could be done, and so I think SUPA is worth watching.