APIs for NFV Operation: A High-Level Vision

There are a lot of technical questions raised by NFV and even the cloud, questions that we cannot yet answer and that are not even widely recognized.  One of the most obvious is how the component elements are stitched together.  In NFV, it’s called “service chaining”.  Traditionally you’d link devices using communications services, but how to link software virtual devices or features isn’t necessarily obvious from past practice.  I think we need to look at this problem, as we should look at all problems, from the top down.

A good generalization for network services is that every device operates at three levels.  First, it has a data plane, whose traffic it passes according to the functional commitments intrinsic to the device.  Second, it has a control/signaling plane that mediates pair-wise connections, and finally it has a management plane that controls its behavior and reports its status.

In NFV, I would contend that we must always have a management portal for every function we deploy, and also that every “connection” between functions must support the control/signaling interface.  A data-plane connection is required for functions that pass traffic, but is not a universal requirement.  Interesting, then, is the fact that we tend to think of service chaining only in terms of connecting the data paths of functions into a linear progression.

Because we have to be able to pass information for all three of our planes, we have to be able to support a network connection for whatever of the three are present.  This connection carries the information, but doesn’t define its structure, and that’s why the issue of application programming interfaces (APIs) is important.  An API defines the structure of the exchanges in “transactional” or request/response or notification form, more than it does the communications channel over which they are completed.

I believe that all management plane connections would be made via an API.  I also believe that all signaling/control connections should be made via APIs.  Data plane connections would not require an API, only a communications channel, but that channel would be mediated by a linked control interface.  Thus, we could draw a view of a “virtual function” as being a black box with a single management portal, and with a set of “interfaces” that would each have a control API port and an optional data port.  If the device recognized different types of interfaces (port and trunk, user and network, etc.) then we would have a grouping of interfaces by type.

Working through this model with an example might help.  Let’s suppose we have a function called “firewall” designed to pass traffic between Port and Trunk.  This function would then have a management port (Firewall-Management) with an API defined.  It would also have a Firewall-Port and Firewall-Trunk interface, each consisting of a control API and a data plane connection.
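To make the black-box model concrete, here’s a minimal sketch in Python.  All of the class and field names are my own illustrative inventions, not anything from an NFV standard; the point is just the shape of the model: one management portal per function, and a set of interfaces that each pair a control API with an optional data port.

```python
# A sketch of the black-box function model: every function has exactly one
# management portal, and each interface pairs a control API with an
# optional data-plane port.  Names here are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Interface:
    name: str                         # e.g. "Firewall-Port"
    control_api: str                  # control/signaling API endpoint (always present)
    data_port: Optional[str] = None   # data-plane connection, only if traffic is passed
    if_type: str = "port"             # grouping of interfaces by type: "port", "trunk", etc.

@dataclass
class VirtualFunction:
    name: str
    management_api: str               # the single management portal
    interfaces: List[Interface] = field(default_factory=list)

# The "firewall" example: a management port plus Port and Trunk interfaces,
# each a control-API / data-plane pairing.
firewall = VirtualFunction(
    name="firewall",
    management_api="Firewall-Management",
    interfaces=[
        Interface("Firewall-Port", control_api="port-ctl",
                  data_port="port-data", if_type="port"),
        Interface("Firewall-Trunk", control_api="trunk-ctl",
                  data_port="trunk-data", if_type="trunk"),
    ],
)
```

Note that `data_port` is optional by design; a pure control-services function would simply leave it unset.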

Let’s suppose we had such a thing in a catalog.  What would have to be known to let us stitch “firewall” into a service chain?  We’d need to know the API-level and connection-level information.  The latter would be a matter of knowing what OSI layer was supported for the data path (Ethernet, IP) and how we would address the ports to be connected, and this is a place where I think some foresight in design would be very helpful.

First, I think that logically we’d want to recognize specific classes of functions.  For example, we might say we have functions designed for data path chaining (DPC, I’ll call it), others to provide control services (CTLS), and so forth.  I’d contend that each function class should have two standards for APIs—one standard representing how that class is managed (the management portal) and one that defines the broad use of the control-plane API.  So our firewall function, as a DPC, would have management exchanges defined by a broad DPC format, with specificity added through an extension for “firewall” functions.  Think of this as being similar to how SNMP would work.
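The class-then-extension idea could be sketched as simple inheritance: a base management API shared by all DPC functions, extended with firewall-specific exchanges much the way an enterprise MIB extends standard SNMP.  Everything below (method names, the status fields) is an assumption for illustration.

```python
# Class-level management API standards: a broad DPC format, extended
# for the "firewall" functional type.  All names are illustrative.
class DPCManagement:
    """Management exchanges common to every data-path-chaining (DPC) function."""
    function_class = "DPC"

    def __init__(self):
        self.state = "DEPLOYED"

    def get_status(self):
        # The broad DPC-class management format.
        return {"class": self.function_class, "state": self.state}


class FirewallManagement(DPCManagement):
    """Firewall extension: adds specificity to the DPC format, much as an
    SNMP enterprise MIB extends the standard MIB."""
    functional_type = "FIREWALL"

    def __init__(self):
        super().__init__()
        self.rules = []

    def get_status(self):
        status = super().get_status()           # start from the broad DPC format...
        status["type"] = self.functional_type   # ...and add firewall specificity
        status["rule_count"] = len(self.rules)
        return status


fw = FirewallManagement()
print(fw.get_status())
# {'class': 'DPC', 'state': 'DEPLOYED', 'type': 'FIREWALL', 'rule_count': 0}
```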

The management plane should also have some of the TMF MTOSI (Multi-Technology Operations Systems Interface) flavor, in that it should be possible to make an “inventory request” of a function of any type and receive a list of its available interfaces and a description of its capabilities.  So our firewall would report, if queried, that it is a DPC device of functional class “FIREWALL”, and has a Port and Trunk interface both of which are a control/data pairing and supported via an IP address and port.
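A hypothetical inventory response for our firewall might look like the JSON below.  To be clear, these field names are mine, not taken from the actual MTOSI schema; the point is that a query of any function returns its class, its functional type, and how each interface is addressed.

```python
# A hypothetical MTOSI-flavored inventory response: the firewall reports
# its class, functional type, and interfaces with their addressing.
# Field names are illustrative, not from the real MTOSI schema.
import json

inventory = {
    "function_class": "DPC",
    "functional_type": "FIREWALL",
    "interfaces": [
        {"name": "Port",  "pairing": "control/data", "address": "10.0.0.5", "port": 8001},
        {"name": "Trunk", "pairing": "control/data", "address": "10.0.0.5", "port": 8002},
    ],
}

print(json.dumps(inventory, indent=2))
```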

This to me argues for a hierarchy of definitions, where we first define function classes, then subclasses, and so forth.  All DPC functions and all CTLS functions would have a few common management API functions (to report status) but more functions would be specific to the type of function.  A given implementation of a function might also have an “extended API” that adds capabilities, each of which would have to be specified as optional or required so the function could be chained properly.

An important point here is that the management APIs should be aimed at making the function manageable, not at managing the service lifecycle or service-linked resources.  Experience has shown that pooled assets cannot be managed by independent processes without colliding on actions or allocations.  That’s long been recognized with things like OpenStack, for example.  We need to centralize, which means that we need to reflect the state of functions to a central manager and not reflect resource state to the functions.

To continue with the example of the firewall, let’s look at the initial deployment.  When we deploy a virtual function, we’d check the catalog to determine what specific VNFs were available to match the service requirements, then use the catalog data on the function (which would in my view match the MTOSI-like inventory) to pick one.  We’d then use the catalog information to deploy the function and make the necessary connections.  Each link in the chain would require connecting the control and data planes for the functions.
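The deployment flow reads naturally as a two-step loop: match the catalog against service requirements, then wire each adjacent pair at both the control and data planes.  The sketch below is a toy, not a real orchestrator API; the catalog format and endpoint naming are assumptions.

```python
# Sketch of the deployment flow: consult the catalog, pick a VNF per
# required functional type, then connect control AND data planes for
# each link in the chain.  Hypothetical structures, not a real NFV API.
def deploy_chain(catalog, required_types):
    chain = []
    for ftype in required_types:
        # Match service requirements against catalog (inventory-like) data.
        candidates = [f for f in catalog if f["functional_type"] == ftype]
        if not candidates:
            raise LookupError(f"no VNF in catalog for {ftype}")
        chain.append(candidates[0])   # real selection policy elided

    links = []
    for upstream, downstream in zip(chain, chain[1:]):
        # Each chain link connects both planes between adjacent functions.
        links.append({
            "control": (upstream["name"] + ".trunk-ctl",
                        downstream["name"] + ".port-ctl"),
            "data":    (upstream["name"] + ".trunk-data",
                        downstream["name"] + ".port-data"),
        })
    return chain, links


catalog = [
    {"name": "fw1",  "functional_type": "FIREWALL"},
    {"name": "nat1", "functional_type": "NAT"},
]
chain, links = deploy_chain(catalog, ["FIREWALL", "NAT"])
```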

In our firewall, the control link on the PORT side would likely be something GUI-friendly (JSON, perhaps) while that on the TRUNK side would be designed to “talk” with the adjacent chained element, so that two functions could report their use of the interface or communicate their state to their partner.  We might envision this interface as JSON, as an XML payload exchange, and so forth, but there are potential issues that also could impact the management interface.
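A trunk-side state exchange could be as simple as the JSON payload below.  The message fields are invented purely for illustration; the point is that adjacent functions report interface usage and state to their partner in some agreed structure.

```python
# One way a trunk-side control exchange might look: adjacent functions
# report interface state to each other as a JSON payload.  All message
# fields here are illustrative assumptions.
import json

state_report = {
    "sender": "firewall",
    "interface": "Trunk",
    "layer": "IP",                 # OSI layer supported on the data path
    "state": "UP",
    "peer_hint": {"mtu": 1500},
}

wire = json.dumps(state_report)    # what goes over the control link
received = json.loads(wire)        # what the adjacent element sees
```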

Most control and management interfaces are envisioned as RESTful in some sense, meaning that they are client-server in nature and stateless in operation.  The latter is fine, but the former raises the question of duplex operation.  A function needs to accept management commands, which in REST terms would make it a server/resource.  It also needs to push notifications to the management system, which would make it a client.  We’d need to define either a pair of logical ports, one in each direction, or use an interface capable of bidirectional operation.
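The duplex point can be modeled abstractly with two logical channels, one per direction.  In the sketch below I stand in for the two ports with queues; a real implementation might use two HTTP listeners or a bidirectional transport like a WebSocket.  The class and method names are mine, for illustration only.

```python
# Modeling the duplex problem: inbound commands (function as server)
# and outbound notifications (function as client) travel on separate
# logical channels.  Queues stand in for the two ports; names are
# illustrative assumptions.
from queue import Queue

class ManagedFunction:
    def __init__(self):
        self.commands = Queue()        # inbound: function acts as REST server/resource
        self.notifications = Queue()   # outbound: function acts as client

    def handle_command(self, cmd):
        # Stateless handling: each command carries all the context it needs.
        if cmd == "GET_STATUS":
            return {"state": "ACTIVE"}
        return {"error": "unknown command"}

    def notify(self, event):
        # Push a notification toward the central management system.
        self.notifications.put({"event": event})


fn = ManagedFunction()
status = fn.handle_command("GET_STATUS")
fn.notify("LINK_UP")
```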

What interface fits the bill?  In my view, it’s not necessary or perhaps even useful to worry about that.  The key things that we have to support in any management or control API are a defined function set, a defined information model, and a consistent set of interactions.  We could use a large number of data models and APIs to accomplish the same goals, and that’s fine as long as we’re really working to the same goals.  To me, that mandates that our basic function classes (DPC and CTLS in this example) define states and events for their implementations, and that we map the events to API message exchanges that drive the process.

How might this work?  Once a function deploys on resources and is connected by the NFV management processes, we could say the function is in the DEPLOYED state.  In that state its management interface is looking for a command to INITIALIZE, which would trigger the function’s setup and parameterization, and might also result in the function sending a control message to its adjacent elements as it comes up.

This sounds complicated, doesn’t it?  It is.  However, the complexity is necessary if we want to build services from a library of open, agile, selectable functions.  The fact that we’ve done very little on this issue to date doesn’t mean that it’s not important, it just means we still have a lot of work to do in realizing the high-level goals set for NFV.