Is There a Role for Graph Databases in Lifecycle Automation and Event-Handling? – Welcome to CIMI Corporation's Public Blog

Everything old is new again, so they say, and that’s even more likely to be true when “new” means little more than “publicized”. Among the 177 enterprises I’ve chatted with this year, I was somewhat surprised to find that well over two-thirds believed that artificial intelligence was “new”, meaning that it emerged in the last decade. In fact, it goes back at least into the 1980s. The same is true for “graph databases”, which actually go back even further, into the ‘70s, but Amazon’s Neptune implementation is bringing both AI and graph databases into the light. It might even light up networking and cloud computing.

We’re used to thinking of databases in terms of the relational model, which uses the familiar “tables” and “joins” concept to reflect a one-to-many relationship set. RDBMSs are surely the most recognized model of database, but there have been other models around for at least forty or fifty years, and one of them is the “semantic” model. I actually worked with a semantic database in my early programming career.

Semantic databases are all about relationships, meaning context. Just as words in a sentence or conversation have meaning depending on context, so data in a semantic database has meaning based on its relationships with other data. The newly discussed “graph databases” are actually more properly called “semantic graph databases” because they extend that principle. Graph databases are one category of NoSQL database, and they’re increasingly popular for IoT applications.

The same notion was the root of a concept called the “Semantic Web”, which many saw as the logical evolutionary result of web usage. In fact, the Resource Description Framework (RDF) used in many graph databases came about through the Semantic Web and W3C.

Graph databases shine at storing things that are highly correlated, particularly when it may be the correlations rather than the value of a data element that really matters. Amazon’s Neptune and Microsoft Azure’s Cosmos DB, as well as a number of cloud-compatible software graph database products (Neo4j is arguably the leader), will usually perform much better in contextual applications than RDBMS databases would. That makes them an ideal foundation for applications like IoT, and also for things like network event-handling and lifecycle management. While you don’t need graph databases for AI/ML applications, there’s little doubt that most of those applications would work better with graph databases, and my notion of “contextual services” would as well.

Network service lifecycle automation, a topic dear to my heart and the hearts of anyone with a network, would seem a natural for graph database technology. Network events, since they reflect a state change in a highly interconnected community of cooperative elements, are handled properly when they’re handled in context, so obviously something that can reflect relationships would be a better way of storing and analyzing them. Why then don’t we see all manner of vendor representations on the power of their graph database technology in network management?

We do see an increased awareness of the contextual nature of lifecycle automation, and I’ve illustrated it through my blogs about finite-state machines (FSM) and state/event processing. You can also see it in the monolithic models of network automation, including the ONAP management framework, by the fact that the processing of an event often involves a query into the state of related elements. That raises the question of whether a graph database might serve as an alternative to both FSM and specific status polling.
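To make "a query into the state of related elements" concrete, here is a minimal sketch in Python that stands in for a graph query. The adjacency map, the element names, and the `related_states` helper are all hypothetical, invented for illustration; a real graph database would express the same walk in its own query language.

```python
from collections import deque

# Hypothetical topology: each element maps to the elements it touches.
adjacency = {
    "router-a": ["link-1", "link-2"],
    "router-b": ["link-1"],
    "router-c": ["link-2"],
    "link-1": ["router-a", "router-b"],
    "link-2": ["router-a", "router-c"],
}

# Hypothetical last-known state of each element.
states = {
    "router-a": "up",
    "router-b": "down",
    "router-c": "up",
    "link-1": "down",
    "link-2": "up",
}


def related_states(adjacency, states, start, max_hops=1):
    """Breadth-first walk: return {element: state} within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    result = {}
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                result[neighbor] = states.get(neighbor, "unknown")
                queue.append((neighbor, hops + 1))
    return result


# An event on link-1 is read in the context of what it connects.
context = related_states(adjacency, states, "link-1")
```

Here an event on `link-1` comes back with the states of `router-a` and `router-b`, which is the context an event handler would otherwise have to gather by polling.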

One barrier to graph database application to network or service lifecycle automation, and one that would apply to application lifecycle automation as well, is the tendency to rely on specific polling for status, rather than on a database-centric analysis of status. Polling for status has major issues in multi-tenant infrastructure because excessive polling of shared resources can almost look like a denial-of-service attack. Back in 2013, I did some work with some Tier Ones on what I called “derived operations”, which was an application of a proposal in the IETF called “i2aex”, which stood (in their whimsical manner of naming) for “infrastructure to application exposure”. The idea was that status information obtained either by a poll or a pushed event would be stored in a database, and applications like lifecycle automation would query the database rather than do their own polling. I2aex never took off, and I didn’t follow through with serious thought about just what kind of database we might want to store these infrastructure events in. I think that graph database storage is an option that should be considered now (and that I should have explored then, candidly).
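The database-centric idea can be sketched in a few lines of Python. This is only an illustration of the pattern described above, not the i2aex proposal itself: the `StatusStore` class, its method names, and the staleness rule are assumptions, and an in-memory dictionary stands in for whatever database would really be used.

```python
import time


class StatusStore:
    """Hypothetical in-memory stand-in for a shared status database."""

    def __init__(self):
        self._records = {}  # element name -> (status, timestamp)

    def record(self, element, status, timestamp=None):
        """Called by a collector when a poll result or pushed event arrives."""
        ts = time.time() if timestamp is None else timestamp
        self._records[element] = (status, ts)

    def query(self, element, max_age=None, now=None):
        """Return the last known status, or None if unknown or too old."""
        entry = self._records.get(element)
        if entry is None:
            return None
        status, ts = entry
        now = time.time() if now is None else now
        if max_age is not None and now - ts > max_age:
            return None  # assumed rule: stale data is treated as unknown
        return status


store = StatusStore()
# One collector touches the shared resource and writes what it learned...
store.record("router-1", "up", timestamp=100.0)
# ...and any number of applications read the store instead of polling.
fresh = store.query("router-1", max_age=10, now=105.0)
stale = store.query("router-1", max_age=2, now=105.0)
```

The design point is that the shared resource is touched once by the collector, no matter how many tenants or applications ask about it.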

Conceptually, the “state” of a community of cooperative elements (of any kind, network or application) can be determined from the sum of the states of the elements themselves. The relationships between the elements and their states can surely be represented in a graph database, and in fact you could use a graph to represent a FSM and “read” from it to determine what process to invoke for a given state/event intersection. Why not create a graph database of the network and the service, and use it for lifecycle automation?
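As a rough sketch of reading an FSM out of a graph, the Python below treats states as nodes and events as labeled edges, with each edge carrying the process to invoke and the next state. The state, event, and process names are invented for illustration, and a nested dictionary stands in for the graph database.

```python
class FsmGraph:
    """States are nodes; each outgoing edge is labeled with an event."""

    def __init__(self):
        # state -> {event: (process_name, next_state)}
        self._edges = {}

    def add_transition(self, state, event, process, next_state):
        self._edges.setdefault(state, {})[event] = (process, next_state)

    def lookup(self, state, event):
        """Read the state/event intersection; None if no such edge exists."""
        return self._edges.get(state, {}).get(event)


def handle(fsm, state, event):
    """Return (next_state, process_to_invoke) for a state/event pair."""
    hit = fsm.lookup(state, event)
    if hit is None:
        return state, None  # assumed rule: unknown events leave state as-is
    process, next_state = hit
    return next_state, process


# Hypothetical service lifecycle, for illustration only.
fsm = FsmGraph()
fsm.add_transition("ordered", "activate", "deploy_service", "activating")
fsm.add_transition("activating", "ready", "notify_operations", "active")
fsm.add_transition("active", "fault", "start_remediation", "degraded")
fsm.add_transition("degraded", "repaired", "notify_operations", "active")

next_state, process = handle(fsm, "active", "fault")
```

This is functionally the same lookup a state/event table performs; the difference in a real graph database is that the same structure can also hold the relationships among the elements whose states are being tracked.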

One potential issue is that the number of possible relationships among elements grows much faster than the number of elements themselves (quadratically, if you count only pairwise relationships), which means that a graph representing a large network, service, or application might be very large indeed, and that a query into it, even given the high performance of graph databases, might be time-consuming. Still, the concept might have real merit if we could tweak things a bit.
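To put rough numbers on that growth, the sketch below counts only pairwise relationships, which is an assumption made to keep the arithmetic simple: n elements allow at most n(n-1)/2 of them. Counting relationships among larger groups of elements would grow faster still. The function name is made up for the example.

```python
def max_pairwise_relationships(n):
    """Upper bound on distinct pairs among n elements: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000, 10000):
    print(n, max_pairwise_relationships(n))
# 10 45
# 100 4950
# 1000 499500
# 10000 49995000
```

A real network is far sparser than this upper bound, since most elements relate directly to only a few neighbors, but the bound shows why an unconstrained model can get out of hand.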

One possible tweak would be to use the same techniques I’ve discussed for creating a “service hierarchical state machine” or HSM from individual service-component FSMs. In the approach I discussed in my blogs, the components of a service or service element reported state changes back to their superior element, which then only had to know about the subordinate elements and not their own interior components. The model constrains the complexity.
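A minimal sketch of that hierarchical reporting, in Python, might look like the following. The `Element` class, the element names, and the rule that a parent is "degraded" whenever any direct child is not "active" are all assumptions made for illustration, not a description of any particular implementation.

```python
class Element:
    """A service element that knows only its parent and direct children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.state = "active"
        if parent is not None:
            parent.children.append(self)

    def set_state(self, new_state):
        """Change state and report upward only if the state really changed."""
        if new_state == self.state:
            return
        self.state = new_state
        if self.parent is not None:
            self.parent.on_child_change()

    def on_child_change(self):
        # Assumed rule: look at direct children only, never their interiors.
        all_ok = all(child.state == "active" for child in self.children)
        self.set_state("active" if all_ok else "degraded")


# Hypothetical three-level service model.
service = Element("vpn-service")
access = Element("access", parent=service)
core = Element("core", parent=service)
port = Element("port-7", parent=access)

port.set_state("failed")
after_fault = (access.state, service.state)
port.set_state("active")
after_repair = (access.state, service.state)
```

The service element never inspects `port-7`; it sees only that `access` changed state, so each level of the model deals with a handful of relationships rather than the whole graph.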

Another possible tweak would be to use AI principles. A service or an application, in the real world, is really a fusion of two models, one representing the resources themselves and another the way that service functionality is related to or impressed on those resources. I believe a single graph database could model that, but it might be easier to give each model its own graph database and use AI to bridge the correlations between them.

I’ve always been a fan of state/event tables, but I’m not wedded to an approach if something better comes along. I’d like to hear about any state/event applications mapped to a graph database versus the traditional table, and hear comments from some of my LinkedIn friends on their views on the use of graph databases in applications traditionally seen as state/event-driven. Please do not advertise your product in response; if you want to reference a product, post a link to a blog you’ve done that explains the graph-database connection to state/event handling.