A Deeper Dive into ONAP

When I blogged about the ONAP Amsterdam release, I pointed out that the documentation then available on ONAP didn’t address the questions I had about its architecture and capabilities.  The ONAP people contacted me and offered a call to explain things, and also provided links and documentation.  As I said in my prior blog, there is a lot of material on ONAP, and there’s no way I could explain it all; it would also be difficult to answer my questions purely in terms of that documentation.

I had my call, and my proposal was to take three points that I believed (based on operator input) were critical to ONAP’s value in service lifecycle automation.  I asked them to respond to these points in their own terms, and it’s that set of questions and responses that I’m including here.  For the record, the two ONAP experts on the call were Arpit Joshipura, GM of Networking at the Linux Foundation, and Chris Donley, Senior Director of Open Source Technology at Huawei and chair of the ONAP Architecture Committee.

I got, as reference, a slide deck titled “ONAP Amsterdam Release” and a set of document links.

The ONAP people were very helpful here, and I want to thank them for taking the time to talk with me.  They pointed out at the start that their documentation was designed for developers, not surprising given that ONAP is an open-source project, and they were happy to cooperate in framing their approach at a higher level, which was the goal of my three points.  I framed these as “principles” that I believed had been broadly validated in the industry and by my own work, and I asked them to respond to each with their views and comments on ONAP support.

The first point is that Network Functions (NFs) are abstract components of a service that can be virtual (VNF), physical (PNF), or human (HNF).  This is an architectural principle that I think is demonstrably critical if the scope of ONAP is to include all the cost and agility elements of carrier operations.

My ONAP contacts said this was the path that ONAP was heading down, with their first priority being the VNF side of the picture.  In the architecture diagram on Page 4 of the Amsterdam Architecture White Paper referenced above, you’ll see a series of four gray boxes.  These represent the Amsterdam components that are responsible for framing the abstractions that represent service elements, and realizing them on the actual resources below.

The notion of an HNF is indirectly supported through the Closed Loop Automation Management Platform (CLAMP), which is the ONAP component responsible for (as the name suggests) closed-loop automation.  CLAMP provides for an escape from a series of automated steps into an external manual process either to check something or to provide an optional override.  These steps would be associated with any lifecycle process as defined in the TOSCA models, and so I think they could provide an acceptable alternative to composing an HNF into a service explicitly and separately.
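The CLAMP “escape to a manual process” idea can be sketched in a few lines.  This is purely illustrative: CLAMP’s real control loops are defined in TOSCA models and ONAP components, not Python, and every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical sketch of a closed-loop lifecycle with manual "escape" steps,
# in the spirit of CLAMP as described above. Not an actual ONAP API.

@dataclass
class LifecycleStep:
    name: str
    action: Callable[[], bool]                         # automated action; True on success
    manual_check: Optional[Callable[[], bool]] = None  # optional human check/override

def run_loop(steps: List[LifecycleStep]) -> str:
    """Run lifecycle steps, escaping to a manual process where one is attached."""
    for step in steps:
        ok = step.action()
        if step.manual_check is not None:
            # A human operator can confirm the automated result or override it
            ok = step.manual_check()
        if not ok:
            return f"halted at {step.name}"
    return "completed"
```

The point of the pattern is that the human step is just another step in the model-driven loop, which is why I think it can stand in for an explicitly composed HNF.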

An abstraction-driven, intent-based approach is absolutely critical to ONAP’s success.  I don’t think there’s any significant difference between how I see industry requirements in this area and what ONAP proposes to do.  Obviously, I think they should articulate this sort of thing somewhere, but articulation in terms that the industry at large could understand is a weakness with ONAP overall.  They appear to recognize that, and I think they’re eager to find a way to address it.

The second point is that all network functions of the same type (firewall, etc.) would be represented by the same abstraction, with implementation details and differences embedded within it.  Onboarding something means creating the implementation that will represent it within its abstraction.  Abstractions should form a class/inheritance structure to ensure that things common across NFs are done in a common way.

The ONAP people say they’re trying to do this with VNFs, and they have a VNF requirements project whose link reference I’ve provided above.  VNF development guidelines and an SDK project will ensure that VNF implementations map into a solid common abstraction.  This works well if you develop the VNF from scratch.  The architecture also supports the notion of a “wrapper” function that encapsulates either an existing software component to make it a VNF, or a PNF to make it an implementation of the same NF abstraction, but this hasn’t been a priority.  However, they note that there are running implementations of ONAP that contain no VNFs at all; the user has customized the abstractions/models to deploy software application elements.

I don’t see any technical reason why ONAP could not support the kind of structure my second point describes, but I don’t think they’ve established a specific project goal to identify and classify NFs by type and create a kind of library of these classes.  It could be done with some extensions to the open-source ONAP framework and some additional model definitions from another party.  Since most of the model properties are inherited from TOSCA/YAML, extending ONAP in this area is practical, but it is still an extension and not something currently provided.
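The class/inheritance structure my second point argues for can be sketched as follows.  ONAP does not ship such a library, and all the names here are invented for illustration; in practice the hierarchy would live in TOSCA models rather than code.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of NF-type abstractions with interchangeable
# virtual (VNF) and physical (PNF) realizations.

class NetworkFunction(ABC):
    """Abstract NF: the lifecycle verbs common to every implementation."""
    @abstractmethod
    def deploy(self) -> str: ...

class Firewall(NetworkFunction, ABC):
    """Type-level abstraction: every firewall, however it is implemented."""
    rule_capacity: int = 10_000   # a property shared by all firewalls

class VirtualFirewall(Firewall):
    def deploy(self) -> str:
        return "spun up VM/container image"    # VNF realization

class PhysicalFirewall(Firewall):
    def deploy(self) -> str:
        return "activated existing appliance"  # PNF "wrapper" realization

def compose_into_service(nf: Firewall) -> str:
    # Service processes see only the Firewall abstraction, never the
    # implementation details embedded within it.
    return nf.deploy()
```

Onboarding, in this picture, is simply writing a new subclass that satisfies the type-level abstraction.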

The final point is that lifecycle processes should operate on the abstractions, both within them and among them.  The former processes can be type-specific, implementation-specific, or both.  The latter should always be generalized across both NFs and the services created from them.

If we go back to that architecture diagram I referenced in my first point, you can see that the processes “above the line”, meaning above those four gray blocks, are general service processes that operate on abstractions (modeled elements) and not on the specific way a given network function is implemented.  That means that it’s a function of modeling (and assigning the abstraction to a gray box!) to provide the link between some NF implementation and the service processes, including closed-loop automation (CLAMP).

The key piece of lifecycle or closed-loop automation is the handling of events.  In ONAP, it’s OK for VNFs (or, presumably, PNFs) to operate on “private resources”, but they can access and control shared-tenant facilities only through the Data Collection, Analytics, and Events (DCAE) subsystem and the Active and Available Inventory (A&AI) subsystem.  There’s API access to the latter, and publish-and-subscribe access to DCAE.
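The access pattern described here can be sketched generically: processes subscribe to an event bus (standing in for DCAE’s publish-and-subscribe interface) and read shared-resource state through an inventory service (standing in for A&AI’s API), never touching multi-tenant resources directly.  All names are hypothetical; this is not the actual DCAE or A&AI interface.

```python
# Hypothetical sketch of the DCAE/A&AI access pattern described above.

class EventBus:
    """Stand-in for DCAE's publish-and-subscribe event distribution."""
    def __init__(self):
        self._subs = {}

    def subscribe(self, topic, handler):
        self._subs.setdefault(topic, []).append(handler)

    def publish(self, topic, event):
        for handler in self._subs.get(topic, []):
            handler(event)

class Inventory:
    """Stand-in for A&AI: a read-side view of multi-tenant resources,
    decoupling processes from real-time access to the resources themselves."""
    def __init__(self, records):
        self._records = records

    def lookup(self, resource_id):
        return self._records.get(resource_id)

# Usage: a handler reacts to a fault event by consulting inventory state
bus = EventBus()
inventory = Inventory({"fw-1": {"state": "degraded"}})
handled = []
bus.subscribe("fault", lambda e: handled.append(inventory.lookup(e["resource"])["state"]))
bus.publish("fault", {"resource": "fw-1"})
```

The decoupling is the point: the handler sees only the inventory’s view, which is why I like the A&AI notion.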

The workings of these two components are fairly complicated, but the combination appears to deal with the need to identify events (even if correlation is needed) and to pass them to the appropriate processes, where handling is presumably guided by the TOSCA models.  I like the A&AI notion because it decouples process elements from real-time access to actual multi-tenant resources.

In our discussions we touched on a couple of things not part of my list of points.  One was the issue of the relationship between open-source projects like ONAP and standards bodies tasked with creating something in the same area.  Obviously ONAP and the ETSI NFV ISG have such a relationship.  According to the ONAP people, the coders are supposed to follow standards where they are available and work properly, and to kick the problem upstairs for liaison with the appropriate body where that isn’t possible.

The tension here is created, in my view, by the fact that “standards” in the carrier space are developed by a team of people who are specialists in the standards process.  Open-source development is populated by programmers and software architects.  My own work in multiple standards groups has taught me that there is a real gulf between these two communities, and that it’s very difficult to bridge it.  I don’t think that the ETSI structural model for NFV software is optimal or even, at scale, workable, but I also don’t think that ONAP has been religious in enforcing it.  As long as they’re prepared to step outside the ETSI specs if it’s necessary to do good code, they should be OK.

Which is how I’d summarize the ONAP situation.  Remember that in my earlier blog I questioned whether ONAP did what operators needed it to do, and said that I wasn’t claiming it did not, only that I couldn’t tell.  With the combination of my conversation with the ONAP experts and my review of the material, I think they intend to follow a course that should lead them to a good place.  I can’t say whether it will, because by their own admission they are code/contribution-driven and they’re not in that good place yet.

There is a lot that ONAP is capable of but doesn’t yet do.  Some of it is just a matter of getting to things already decided on, but other things are expected to be provided by outside organizations or by the users themselves.  Remember that ONAP is a platform, not a product, and it’s always been expected that it would be customized.  Might it have been better to have brought more of that loosely structured work inside the project?  Perhaps, but god-boxes or god-packages are out of fashion.  ONAP is more extensible because of the way it’s conceptualized, but also more dependent on those extensions.

This is the second, and main, risk that ONAP faces.  The operators need a solution to what ONAP calls “closed-loop” automation of the operations processes, and they need it before any significant modernization of infrastructure is undertaken.  The advent of 5G creates such a modernization risk, and that means that ONAP will have to be ready in all respects for use by 2020.  The extensions to the basic ONAP platform will be critical in addressing the future, and it’s always difficult to say whether add-on processes can do the right thing fast enough to be helpful.