Pig, Cascalog & HBase Among Highlights of May Hadoop Meet-Up

Hi Hadoopers,

Thanks to the nearly 300 developers who came to Yahoo! this week for our monthly Hadoop User Group meeting. The energy in the packed room was phenomenal, and conversations continued long after the formal sessions.

Hundreds of Hadoop Fans Flock to Yahoo! for the May Hadoop User Group

A few lucky winners received free tickets to the upcoming Hadoop Summit 2010 (June 29th, at the Hyatt Regency, Santa Clara). Congratulations to the winners. Everyone else, please register here.

The event started with Alan Gates from Yahoo!, who described the new features and work done in Pig 0.6 and 0.7, including the Hadoop compatibility plan, described in more detail in this post.


Nathan Marz from BackType presented a cool demo of how easy it is to query existing data stores using Cascalog, a query language for Hadoop. Nathan described how queries can be written as regular Clojure code and combined with Cascading. Be sure to watch the demo as part of the video below.


Next was Dmitriy Ryaboy, an engineer at Twitter and a Pig committer. Dmitriy walked us through the extensive use of the Hadoop ecosystem at Twitter. He explained the challenges they face in processing 55 million tweets a day and why they chose to use Hadoop, Pig and HBase. Dmitriy introduced the Elephant Bird libraries and shared interesting tips for dealing with Big Data.


We concluded with Tom White from Cloudera, who walked us through the release plans for Apache Hadoop 0.21, including the Source Compatibility project described in the Yahoo! Hadoop blog.


We at Yahoo! are embracing Hadoop – we share the challenges presented by Twitter for processing massive data sets and continue to invest heavily in the technology and the community. We love to hear about the growing ecosystem and solutions like Cascalog.

Please join us at the Hadoop Summit to continue the conversation.

As always, we are looking for exciting technologies and experiences you want to share.
Please contact me via the Hadoop Bay Area User Group Meetup page.

Note that we will not have a meetup in June due to the Hadoop Summit. See you all on July 21st, 2010. Registration is available here, and the agenda will be published soon.


Dekel Tankel

Director, Product Management

Cloud Computing at Yahoo!