A week has gone by since the third annual and biggest-ever Hadoop Summit, but the interest and momentum continue unabated. Yahoo!'s communications team spoke to presenters and thought leaders from some of the participating companies about how they are using Hadoop. Here's a sampling of what we heard and some video conversations we captured:
"Facebook uses Hadoop and Hive extensively to process large data sets. This infrastructure is used for a variety of jobs, including ad hoc analysis, reporting, index generation, and many others. We have one of the largest clusters, with a total disk storage capacity of more than 20PB and more than 23,000 cores. We also use Hadoop and Scribe for log collection, bringing in more than 50TB of raw data per day. Hadoop has helped us scale with these tremendous data volumes," said Ashish Thusoo, Engineering Manager, Facebook.
"Hadoop is a key ingredient in allowing LinkedIn to build many of our most computationally difficult features, allowing us to harness our incredible data about the professional world for our users," said Jay Kreps, Principal Engineer, LinkedIn.
"Twitter's rapid growth means our users are generating more and more data each day. Hadoop enables us to store, process, and derive insights from our data in ways that wouldn't otherwise be possible. We are excited about the rate of progress that Hadoop is achieving, and will continue our contributions to its thriving open source community," said Kevin Weil, Analytics Lead, Twitter.
"Yahoo is a pioneer of the Apache Hadoop ecosystem. They contribute novel innovations and essential quality assurance alike," said Amr Awadallah, CTO and founder of Cloudera. "Cloudera is proud to work together with Yahoo building this important base while at the same time adding innovative capabilities that enable enterprise consumers to use Hadoop for large-scale storage and analytics."
"At Karmasphere, we believe Hadoop represents a quantum leap forward for organizations looking to unlock the power held within massive data sets. Hadoop helps big data professionals find creative solutions to tough problems, speed medical and scientific discovery and provide new commercial insights from what has previously been labeled data exhaust. At Karmasphere we are dedicated to harnessing the power of big data through providing front-end solutions that simplify working with Hadoop and MapReduce. Our aim is to make Hadoop accessible to anyone analyzing massive data sets," said Martin Hall, chief executive of Karmasphere.