Why Analytics Alone Isn't Power

Everyone knows that knowledge is power, but information is not “knowledge”, and that’s the general issue I have with the big data and analytics craze.  We’re presuming that collecting a whole lot of stuff is valuable, which could be true…if we can do something useful with what we collect.  In the areas of SDN and NFV, I think analytics is becoming the classic “universal constant”, which every college student knows is that number which, when multiplied by your answer, yields the correct answer.  Is there a role for data gathering and analytics in SDN or NFV?  Yes, but you don’t fill the role by asserting the inclusion of analytics.  It’s more complicated.

Most of the analytics claims made with regard to SDN or NFV deal with gathering information about network or resource state as a means of 1) making decisions on resource assignment (routing, hosting, etc.) or 2) invoking automated responses to conditions like congestion or failure.  The presumption is that by gathering “stuff” and running analysis against it, you can gain an accurate picture of resource/service state.  Maybe.

The first issue is one of availability of the data.  Telemetry can be collected and analyzed only if its source is addressable, and in a network failure that may not be true.  In SDN, for example, the network’s default state can be “no connections”, and you don’t have data paths until forwarding rules are created to support transport.  That can be done explicitly, from a central set of stored states, or on demand when packets are presented.  But in either case you don’t have forwarding paths until you set them up, so if something breaks you may be disconnected from the telemetry that carries resource state, both for the thing that’s broken and perhaps for other stuff as well.  There are advantages to adaptive networking where every device cooperates to seek a connectivity map, folks.
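To make the point concrete, here’s a minimal sketch (in Python, not drawn from any product) of why a “default-disconnected” SDN complicates data availability: the collector can only hear from a device if some chain of installed forwarding rules reaches it, so the broken element tends to be exactly the one you can no longer ask about.  The topology, the FORWARDING_RULES table, and the device names below are invented purely for illustration.

```python
# Toy topology: collector -> sw1 -> sw2 -> server4
LINKS = {
    "collector": {"sw1"},
    "sw1": {"collector", "sw2"},
    "sw2": {"sw1", "server4"},
    "server4": {"sw2"},
}

# Forwarding rules actually installed (per-device next hops that can carry
# telemetry).  Note sw2 has lost its rule toward server4, e.g. because the
# path through it is the thing that just broke.
FORWARDING_RULES = {
    "collector": {"sw1"},
    "sw1": {"sw2"},
    "sw2": set(),          # no rule -> telemetry from server4 can't get out
}

def reachable_telemetry_sources(start):
    """Walk only the links that installed rules permit, starting at the collector."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for nxt in FORWARDING_RULES.get(node, set()) & LINKS.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen - {start}

if __name__ == "__main__":
    visible = reachable_telemetry_sources("collector")
    print("telemetry visible from:", visible)                 # {'sw1', 'sw2'}
    print("server4 state observable?", "server4" in visible)  # False: the broken
    # element is exactly the one we can no longer ask about
```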

The second issue is timeliness.  There’s always a propagation delay in data transmission, so the bigger and more complicated a network is, the more likely it is that your knowledge of state is imperfect because the current data is still in flight.  But even if you have it, analytics is essentially the application of robot processes to data repositories, and the regularity with which those processes run will impact the timeliness of any alerts.
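A rough way to see the timeliness problem is to add the two delays: the data is already propagation-delay old when it lands in the repository, and an analysis pass that runs every N seconds can add up to N more seconds before anything reacts.  The sketch below uses made-up numbers (PROPAGATION_DELAY_S, ANALYTICS_INTERVAL_S) purely to illustrate the bound; it isn’t anyone’s real pipeline.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    source: str
    value: float        # e.g. link utilization
    emitted_at: float   # clock at the source when the reading was taken

PROPAGATION_DELAY_S = 0.5     # assumed transit + collection pipeline delay
ANALYTICS_INTERVAL_S = 30.0   # assumed batch/analysis cadence

def worst_case_staleness():
    """Upper bound on how old a reading can be when an alert finally fires."""
    return PROPAGATION_DELAY_S + ANALYTICS_INTERVAL_S

def usable(sample, now, max_age_s=5.0):
    """Reject readings too old to base a routing or placement decision on."""
    return (now - sample.emitted_at) <= max_age_s

if __name__ == "__main__":
    print(f"a condition can be {worst_case_staleness():.1f}s old before anything reacts")
    s = Sample("sw2-port3", value=0.97, emitted_at=100.0)
    print("still usable at t=103?", usable(s, now=103.0))   # True
    print("still usable at t=140?", usable(s, now=140.0))   # False
```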

Even if you get the data you need, when you need it, from the things that matter to you, there’s still the question of relativity.  Suppose Server Four is overloaded.  Great, but what services are associated with Server Four?  Virtualization means that abstract resources are committed for real when you provision/orchestrate something.  That means the relationship between server (in this example) and service is elastic.  If we presume that we can relocate something when its server breaks, or horizontally scale in response to load, then the resource commitments vary over time.  How does a real resource fault get tracked back to a virtual service commitment?
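One way to picture what tracking a real resource fault back to a virtual service commitment requires is a live, two-way binding map that gets updated every time something is provisioned, scaled, or relocated.  The sketch below is a hypothetical illustration of that idea, not CloudNFV’s model; the service and server names are invented.

```python
from collections import defaultdict

class BindingMap:
    """Two-way, time-varying map of service component <-> hosting resource."""
    def __init__(self):
        self.by_resource = defaultdict(set)   # resource -> {(service, component)}
        self.by_component = {}                # (service, component) -> resource

    def commit(self, service, component, resource):
        """Record (or move) a component's hosting decision."""
        old = self.by_component.get((service, component))
        if old:
            self.by_resource[old].discard((service, component))
        self.by_component[(service, component)] = resource
        self.by_resource[resource].add((service, component))

    def impacted_services(self, resource):
        """Which services are touched if this resource fails right now?"""
        return {svc for (svc, _comp) in self.by_resource.get(resource, set())}

if __name__ == "__main__":
    b = BindingMap()
    b.commit("vpn-acme", "firewall-vnf", "server4")
    b.commit("iptv-east", "transcoder-2", "server4")
    b.commit("iptv-east", "transcoder-2", "server9")   # horizontal move: binding changed
    print(b.impacted_services("server4"))  # {'vpn-acme'} -- iptv-east has already moved
```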

Then there’s the issue of actionability.  We know Server Four is broken.  What do we do about it?  If the answer is “send a human” or even “tell a human”, we’re defeating the whole notion of service automation and compromising our goals of operations efficiency.  But knowing something is broken and what services are impacted is still an intellectual exercise if we can’t remedy the problem in an automated way.  That means we have to be able to relate issues to services at the service level, and associate them there with the proper remedy based on the available alternatives.  That, in turn, means we have to be able to rethink what we did to create the service (instantiate or orchestrate or whatever term you like) and do it again, keeping in mind that this time some of the choices have already been made (the stuff that’s not broken).  Those choices might still be good, or good enough, or they may be inconsistent with our remaining options post-failure, in which case we have to do some or all of them over again.
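Here’s a minimal sketch of that re-orchestration step, under the assumption that the original placement decisions are recorded as component-to-resource bindings and that the placement policy can be re-evaluated on demand: on a fault, redo only the bindings that are now invalid, keep the surviving choices where they still fit, and escalate when no automated remedy exists.  The remediate function and the toy “fits” policy are illustrations, not a real orchestrator API.

```python
def remediate(bindings, failed_resource, candidates, fits):
    """
    bindings: {component: resource} as originally orchestrated
    failed_resource: the resource reported broken
    candidates: resources still believed healthy
    fits: callable(component, resource, bindings) -> bool, the placement policy
    Returns new bindings, or None if no automated remedy exists.
    """
    new = dict(bindings)
    to_replace = [c for c, r in bindings.items() if r == failed_resource]
    for comp in to_replace:
        choice = next((r for r in candidates if fits(comp, r, new)), None)
        if choice is None:
            return None                      # escalate: no automated remedy
        new[comp] = choice
    # surviving choices may also have become inconsistent (e.g. affinity rules);
    # re-check them and redo only the ones that no longer fit
    for comp, res in bindings.items():
        if res != failed_resource and not fits(comp, res, new):
            alt = next((r for r in candidates if fits(comp, r, new)), None)
            if alt is None:
                return None
            new[comp] = alt
    return new

if __name__ == "__main__":
    bindings = {"fw": "server4", "lb": "server7"}
    healthy = ["server8", "server9"]
    # toy policy: anything fits anywhere except two components sharing a host
    fits = lambda comp, res, b: all(r != res for c, r in b.items() if c != comp)
    print(remediate(bindings, "server4", healthy, fits))
    # {'fw': 'server8', 'lb': 'server7'} -- lb's original choice survives
```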

The net of all of this is that “data” in isolation is useless.  Status, in isolation, is useless.  We have to be able to contextualize our information and then evaluate it against the rules for how it should be interpreted, for every service impacted.  That, friends, is way beyond what people are generally talking about these days.

The TMF awarded the CloudNFV Catalyst the title of “most innovative” this week in Nice, and (noting my role in launching the concept, as fair disclosure) I think the award is deserved, in no small part because CloudNFV explicitly recognizes the need for service context in interpreting management events and taking meaningful action.  You could absorb analytics and big data into it in a number of ways, but the key point is that the critical service context is never lost, including when you change something in a deployment because of changes in resource state.

Part of the “contextualization” benefit offered by CloudNFV is architectural; it’s a byproduct of the way that services are modeled.  Part is also created by the implementation platform (EnterpriseWeb), which allows fully elastic binding of processes and data elements, so that forming and reforming the picture of what you’ve done and what you now have to do is straightforward and efficient.  The implementation also provides that second critical element, actionability, because the demonstration shows how network conditions can be fed back to create operational changes in a deployment.

The final aspect of this is the integration of all the pieces, something that the multiple-player nature of CloudNFV demonstrated, with its inclusion of a host of other founding contributors and integration partners.  The truth is that nobody is going to have a meaningful single-vendor solution to this problem because nobody has single-vendor networks and OSS/BSS/NMS tools that all come from that same vendor.  We’re living in a disorderly world, tech-wise, and we have to face that or we’ll never bring it to heel.

I’m not saying that CloudNFV is the only way to do what needs to be done (most of you know I have an open-source project with the same goals), or even the best way (that has to be proved by comparison, and there’s nothing yet to compare it with).  I am saying it’s a validated solution where, as far as I’m aware, there are no others.  It’s also a warning to us all.  When we throw out terms like “analytics” as though they were solutions, or even paths to a solution, we not only undervalue what the TMF at least has found highly valuable, we also submerge what might be the critical part of a debate over how to use information to sustain quality of experience in a virtual age.

 
