Big data: What’s hot, what’s not according to the Twitter stream

By Tony Baer, Principal Analyst, Ovum IT Enterprise Solutions

Because of (or in spite of) the hype, sentiment about Big Data vendors was generally bullish in 2012, and the attention spilled over from IT into the business media. These were among the findings reported by DataSift, which conducted a retrospective analysis of vendor mentions on Twitter during 2012 for Ovum.

To some extent, the results were surprising. While Hadoop garners much of the spotlight as a Big Data platform, 10gen, the vendor that develops MongoDB, came in second in mentions, behind only Apache, which hosts the Hadoop project. And although it was only peripherally a Big Data story, HP's acquisition of Autonomy was the biggest negative story of the year.

The DataSift data is a good example of how social media mining offers a useful snapshot of popular thinking that supplements, or even replaces, the traditional role of marketing focus groups.

Mining Twitter for insights

Traditionally, brand recognition studies used focus groups, with data often correlated to actual sales, to qualify and quantify how a company or product is perceived, and why. The rise of social networks has provided a valuable new source of data that is selected not by scientific sampling but by the participants themselves: they vote with their keyboards on whether to say something for public consumption.

DataSift gained fame as one of a handful of companies authorized to syndicate the entire stream of public tweets, more than 400 million every day. To help companies mine that stream for insights, it built a platform that lets them create filters to extract and categorise large volumes of social data and deliver the results into business intelligence tools for further analysis.

Today, Twitter is one of several social media streams that DataSift analyses. It is a surprisingly rich source of information: although the 140-character messages are often cryptic, they are supplemented by more than 70 metadata tags, along with details about the URLs that are frequently shared in tweets.

DataSift conducted a retrospective analysis of Big Data vendor mentions during 2012 to quantitatively analyse brand recognition. By restricting the search to vendors, the analysis focused on perception of the Big Data market, as opposed to the perception of Big Data among the general public. In all, the analysis reflected 2.2 million Twitter interactions from more than 981,000 authors.
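DataSift's actual filters and counting method are not described in this report, so the following is only a hypothetical sketch of how a "share of vendor mentions" figure could be derived. It assumes simple case-insensitive keyword matching; the vendor keyword lists, the `mention_share` function, and the sample posts are all illustrative inventions, not DataSift's filter language or data.

```python
import re
from collections import Counter

# Hypothetical keyword filters (NOT DataSift's actual filter language):
# each tracked vendor is matched by a few case-insensitive terms.
VENDOR_TERMS = {
    "Apache": ["apache"],
    "10gen": ["10gen", "mongodb"],
    "IBM": ["ibm"],
}


def compile_filters(vendor_terms):
    """Build one case-insensitive, whole-word regex per vendor."""
    return {
        vendor: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for vendor, terms in vendor_terms.items()
    }


def mention_share(posts, vendor_terms):
    """Percentage of vendor-mentioning posts that cite each vendor.

    Posts matching no vendor are dropped, mirroring a search restricted
    to vendor mentions. A post naming two vendors counts for both, so
    the shares can sum to more than 100.
    """
    filters = compile_filters(vendor_terms)
    counts = Counter()
    matched = 0
    for text in posts:
        hits = [v for v, pattern in filters.items() if pattern.search(text)]
        if hits:
            matched += 1
            counts.update(hits)
    if matched == 0:
        return {}
    return {v: 100.0 * counts[v] / matched for v in vendor_terms}


# Illustrative sample posts (invented, not from the 2012 dataset).
posts = [
    "Hadoop 2.0 news from Apache",
    "Trying MongoDB from 10gen today",
    "IBM and Apache team up on analytics",
    "Nothing about vendors here",
]
print(mention_share(posts, VENDOR_TERMS))
```

Keyword matching is the simplest possible filter; a production pipeline would also need to handle ambiguous terms, retweets, and metadata fields, none of which this sketch attempts.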

Big Data spills over to the business world

With links present in 70% of Big Data posts mentioning vendors, media sites were frequent targets. Analysing link targets, DataSift confirmed what had been anecdotal evidence: awareness of the Big Data technology market has crossed over from IT to the business world.

The most frequently cited media source was the online site of Forbes magazine (a US business journal). While the technology news portals GigaOM and TechCrunch followed Forbes, another major business media source, the Harvard Business Review blog site, edged out the popular IT news portal ZDNet.

Which vendors are hot?

Given the hype around Hadoop, it shouldn’t be surprising that the Apache Software Foundation, which hosts the Hadoop open source project, was the most frequently cited “vendor,” accounting for 9.4% of the posts. But behind Apache was a sleeper: 10gen, which develops the popular MongoDB JSON-based document data store, came in second with 6.2% of all posts.

Although MongoDB is not known for storing high volumes of data, it is associated with variety, given its schemaless architecture. The popularity of the 10gen brand reflects the fact that MongoDB has become the document-database equivalent of MySQL for web developers: it is open source, it works with JSON-style documents and a JavaScript shell, both highly familiar to web developers, and it is relatively simple to develop against.

Following Apache and 10gen were (in order) IBM, HP, Teradata, Splunk, Oracle, Cloudera, Amazon – and then DataSift (SAP and Hortonworks ranked immediately behind DataSift).

Not all the attention was positive. While positive mentions of Big Data vendors outnumbered negative mentions by 3:1, negative sentiment spiked in November with headlines over HP’s troubled acquisition of Autonomy. Not surprisingly, given that vendors accelerated the pace of product announcements during 2012, 60% of Twitter activity occurred in the second half of the year.

Attention was not uniform across countries. While conventional wisdom holds that the US is the leading market for Big Data platform installs, Japanese, German, and French users were often far more vocal on Twitter.

The pattern also varied by company. While SAP, DataSift, and Splunk drew the most mentions in their home countries, the opposite was true for others: Japan was the most vocal market for the Apache Foundation and Cloudera, France and Japan were the most represented for 10gen, and IBM drew more mentions from France.

If social network chatter is indicative, investments by startups such as Cloudera and 10gen in less “sexy” (or stagnant) markets like Japan appear to be paying off.

Big Data is a global phenomenon

Ovum’s Big Data survey, conducted in 2011, showed that the US was leading the way in Big Data implementation. Since then, most vendors have reported to us that the US was also their most mature market.

Yet the regional discrepancies in vendor mentions suggest strong latent interest in the next tier of national markets. Although the Big Data market in the rest of the world may not be as well developed as in the US, the curiosity is clearly there.
