With IBM announcing a bunch of “C-suite” analytics tools designed for the cloud and GE getting into big-data analytics, it’s hard not to think that we’re deep in the “build the buzz”, meaning hype, phase of big data. Well, did we expect the market to be rational? After all, we’ve pretty much washed every erg of attention we could out of cloud and SDN already. As always, the hype cycle is helping reporters and editors and hurting buyers.
It’s not that analytics isn’t a good cloud application. In theory, it’s among the best, according to what users have told me in my surveys. Two years ago, analytics was the only horizontal application that gained significant attention as a target for the cloud. But today, analytics is back in the pack—having lost that early edge. The challenge is the big-data dimension, and that challenge is also the major problem with cloud adoption overall.
If you look at cloud service pricing, even the recent price reductions for data storage and access don’t change the fact that storing data in the cloud is very expensive. So given that, what’s all this cloud-big-data hype about? Hype, for one thing, but leaving that aside, there is a factual reality here that’s being swept (or, better yet, washed) under the rug. We really have two different clouds in play in today’s market. One is the resource cloud, which we have long been touting; resource clouds are about hosting applications. The second is the information cloud, which has nothing to do with resources and everything to do with processing architectures for managing distributed data. I would submit that Hadoop, the archetypal “cloud” architecture, is really an information cloud architecture. Further, I’d submit that because we don’t acknowledge the difference, we’re failing to support and encourage the developments that would really help big data and analytics advance.
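To see why Hadoop reads more like an information cloud than a resource cloud, consider its MapReduce model: the computation (the map and reduce functions) is shipped to wherever the data lives, and only small intermediate results move across the network. Here’s a toy, local-only sketch of that idea using the classic word count; the partitions standing in for data blocks on different nodes are invented for illustration.

```python
# Toy MapReduce word count: "map" runs against each data partition in
# place, "reduce" combines the small intermediate results centrally.
from collections import Counter
from itertools import chain

def map_phase(partition):
    """Runs where the data is stored: emit (word, 1) pairs."""
    return [(word, 1) for line in partition for word in line.split()]

def reduce_phase(pairs):
    """Combine the intermediate pairs into final counts."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Two "partitions" standing in for data blocks on different nodes.
partitions = [
    ["big data in the cloud", "the cloud"],
    ["big data analytics"],
]
result = reduce_phase(chain.from_iterable(map_phase(p) for p in partitions))
print(result["big"], result["cloud"])  # 2 2
```

The point of the sketch is that nothing about it depends on hosted compute resources; the architecture is organized around where the data sits, which is the information-cloud distinction.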
In many ways, what the market is looking for is the long-sought “semantic web”, the superInternet that somehow understands critical data relationships (the “semantics”) and can exploit them in a flexible way. We’ve not managed to get very far with the semantic web even though it’s been nearly 50 years since the idea emerged, but if we really want to make big data and analytics work, we need to be thinking about semantic networks, semantic knowledge storage, and semantics-based analytics that can be distributed to the data’s main storage points. It’s superHadoop to make the superInternet work.
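What “understanding data relationships” means in practice can be shown with the simplest semantic-network building block: facts stored as subject-predicate-object triples, queried by pattern matching and chained to answer a question no single fact contains. All of the names below are hypothetical, and this is a back-of-envelope sketch, not a real triple store.

```python
# Toy semantic store: facts are (subject, predicate, object) triples.
triples = [
    ("server42", "hosts", "analytics-app"),
    ("analytics-app", "reads", "sales-data"),
    ("sales-data", "stored-in", "datacenter-east"),
]

def query(pattern, facts):
    """Return triples matching a pattern; None acts as a wildcard."""
    s, p, o = pattern
    return [t for t in facts
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Chain two patterns: where is the data that analytics-app reads stored?
for _, _, data in query(("analytics-app", "reads", None), triples):
    for _, _, place in query((data, "stored-in", None), triples):
        print(place)  # datacenter-east
```

The chaining step is the “semantic” part: the relationship between the application and the datacenter is derived from the link structure rather than stored anywhere explicitly, which is exactly what flexible exploitation of data relationships requires.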
We have a lot of business problems that this sort of semantic model could help solve. Optimization of networking or production decisions, the target of some of the recent big-data announcements, is an example of the need. A simple Dijkstra algorithm for route optimization is fine if the only issue is network “cost”. Add in optimal server locations based on usage of a resource pool, policies that limit where things are put for reliability or performance reasons, and information availability, and you quickly get a problem that scales beyond current tools. We could solve that problem with superHadoop. We might even be able to evolve Hadoop into something that could solve this problem, and more, if we focused on Hadoop as an information cloud architecture.
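For reference, here is the easy baseline the argument starts from: Dijkstra’s algorithm over a single “cost” metric. The network and its link costs are invented for illustration; the point is that once policy constraints, resource-pool usage, and data availability all have to be folded into that one number, the single-metric model breaks down.

```python
# Minimal Dijkstra: cheapest path when link "cost" is the only concern.
import heapq

def dijkstra(graph, start, goal):
    """Return (total_cost, path) for the cheapest route, or (inf, []) if none."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical network: edge weights are abstract link costs.
network = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
}
print(dijkstra(network, "A", "D"))  # (3, ['A', 'B', 'C', 'D'])
```

Each extra dimension the article lists (placement policy, pool utilization, data locality) either distorts the single edge weight or forces a re-run per constraint, which is why the combined problem scales beyond tools like this.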
Do you believe in the future of the cloud? If you do, then there may be no single step that would advance it further than simply separating the information cloud from the resource cloud. We need an IT architecture, a semantic information architecture, for the new age. Yes, it will likely enhance the resource cloud and be enhanced by it in return, but it’s an interdependent overlay on the resource cloud, the traditional cloud, and not an example of how we implement it. Big data and analytics problems are solved by the information cloud, not the resource cloud, and what we need to be doing now is recognizing that and looking for information-cloud solutions that go beyond the obvious one, Hadoop.