Hadoop User Group meeting recap, November 2010

More than 100 Hadoop developers and enthusiasts congregated on the Yahoo campus for the monthly HUG meeting on November 17. As always, they were treated to some enlightening presentations in addition to good food and beverages.

After the usual 30 minutes of socializing and networking, James Dixon, the CTO of Pentaho, kicked off the presentations with an interesting talk on "Business Intelligence for Big Data." He described current Hadoop use cases and Hadoop's limitations when it comes to traditional BI use cases. He introduced the concept of "data lakes" and spoke in depth about Pentaho's approach to BI on Hadoop. He ended the presentation with a demo.

Here are the slides from Pentaho's talk.

Following is the video of the presentation.

This was followed by a talk on "Fuzzy Tables" by Ed Kohlwey from the strategy and technology consulting giant Booz Allen Hamilton. He introduced the audience to the concept of fuzzy matching and its application in the important field of biometrics. Biometrics databases are a big data problem and hence a natural fit for Hadoop. He presented an in-depth look at the architecture and design of the Fuzzy Table concept and how it helps biometrics databases.

Please see the link to the slides for more details.

The final presentation was by Ramkumar Vadali and his team from Facebook. They presented their implementation of HDFS RAID at Facebook. This is a very interesting and useful idea that has the potential to save a lot of storage space for large-scale users of Hadoop. The presentation explained the tradeoffs and benefits of this approach, which has apparently saved Facebook 5 PB of storage.

Following are the slides from Facebook's presentation.

Here's the video of the Facebook presentation on HDFS RAID.

This talk was a fitting end to an interesting HUG. Thanks to all the Hadoop users and presenters who attended the November HUG despite the imminent onset of the holiday season.