Thanks to the more than 250 developers who came to Yahoo! tonight for our monthly Hadoop User Group meeting. With Facebook's F8 developer conference and the downpour of April showers, it was nice to see such a turnout.
The event started with Vishwanath Ramarao, Director of anti-spam engineering for Yahoo! Mail. Vish described the intricate cat-and-mouse games played with spammers, and how Yahoo! uses Hadoop to abstract away the complexity of large-scale data analysis and provide deep insight into spammer campaigns.
Next was a presentation from John Sichi, lead engineer for Facebook's data infrastructure team. John provided an overview of Facebook's recent integration between Hadoop, HBase and Hive and the motivation for it - "Data, data, and more data".
We concluded with Ken Krugler, the founder of Bixo Labs. Ken described the Public Terabyte Dataset project - a large-scale web crawl that uses SimpleDB, Hadoop, Cascading and Bixo in Amazon's EMR cloud.
We will shortly publish video recordings of the sessions on this blog. Stay tuned!
We at Yahoo! are embracing Hadoop – as illustrated by the Yahoo! Mail case study. The ability to process massive data sets is core to our business and we are continuing to invest heavily in the technology and the community. We love to hear about the growing ecosystem of solutions and frameworks built around Hadoop.
Please join us at the Hadoop Summit to continue the conversation.
As always, we are looking for exciting technologies and experiences you want to share.
Please add presentation requests at the Hadoop Bay Area User Group Meetup page.
See you all on May 19th, 2010. Registration is available here, and the agenda will be published soon.
Director, Product Management
Cloud Computing at Yahoo!