Hadoop Summit '09

Join us at this year's Hadoop Summit

Focus:

The event will focus on the advancements made in the development and deployment of Hadoop and related technologies. It will also feature applications which use Hadoop in new and unique ways.

Registration:

Registration for the Summit is through EventBrite. Please register now, if you haven't already.

Location:

Santa Clara Marriott
2700 Mission College Boulevard
Santa Clara, CA 95054

Date and Time:

June 10, 2009
Doors open at 8 am. The first talk begins at 8:30. Lunch is from 12:30 to 1:30. Individual tracks start after lunch. Join us for an evening reception, with refreshments and an opportunity to network with other Hadoop enthusiasts, from 6:30 to 8:30 pm.
Please see the agenda below for more information.

Media Syndication:

Please use the tag hadoopsummit09 in your tweets, posts, and photos.

This event page will stay online before and during the Summit, and for an extended period afterward.

Hot Topics at the Hadoop Summit '09

State of Hadoop

Join Eric Baldeschwieler and the Yahoo! team to learn about the progress made with Hadoop over the last year, core capabilities and related sub-projects, deployment experiences, and future directions.

Pig

Alan Gates' talk on Pig will include an introduction to Pig, general information and performance tips for Pig users, and descriptions of current projects and planned development directions for Pig developers.

Amazon Elastic MapReduce

Many AWS customers currently run Hadoop jobs on Amazon's EC2. To give them a more friction-free path, AWS developed Elastic MapReduce. While Hadoop abstracts away the development complexity of running a massively distributed task in parallel, Amazon Elastic MapReduce abstracts away the operational complexity of running Hadoop on Amazon EC2. In his talk, Jinesh Varia will describe Elastic MapReduce and some of the interesting ways customers are using it.

Chukwa

"Chukwa is an open source data collection system for monitoring and analyzing large distributed systems. Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and MapReduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying monitoring and analyzing results, in order to make the best use of this collected data." (wiki.apache.org/hadoop/Chukwa) UC Berkeley's Ariel Rabkin will introduce this powerful tool in Track 2.

Genetic Sequence Analysis in the Clouds

Jimmy Lin, Michael Schatz, and Ben Langmead from the University of Maryland will discuss the computational challenges associated with processing and storing the vast quantities of data produced by next-generation genetic sequencers. They will present initial results from their use of MapReduce to meet these challenges. They will also speculate on the future of cloud computing technologies in the life sciences.

Towards Energy Efficient Hadoop

Yanpei Chen, Laura Keys, and Randy H. Katz from UC Berkeley will discuss the motivation for, and findings from, their research on the energy efficiency of the Hadoop implementation of MapReduce. In their work, they compared Hadoop energy consumption under realistic workloads, measured the energy consumption of different parts of the Hadoop datapath, and constructed a quantitative model to predict the energy consumption of a particular task. Preliminary results show a relation between decreasing job completion time and reducing energy consumption.

Agenda

General Session

Schedule | Topic | Speaker
8:30-9:00 am | Breakfast | Sponsored by IBM
9:00-9:15 am | Welcome and Kickoff | Shelton Shugar (Yahoo!)
9:15-10:00 am | State of Hadoop | Eric Baldeschwieler, Doug Cutting (Yahoo!)
10:00-10:30 am | Hadoop in the Enterprise | Rod Smith (IBM, VP Engineering)
10:30-10:50 am | Coffee Break | Sponsored by Amazon
10:50-11:10 am | Sun Cloud and Hadoop | Juan Carlos Soto (Sun)
11:10-11:40 am | Amazon Elastic MapReduce | Jinesh Varia (Amazon)
11:40-12:10 pm | The Growing Hadoop Community | Christophe Bisciglia (Cloudera)
12:10-1:30 pm | Lunch

Track 1 (Developers)

Schedule | Topic | Speaker
1:30-2:00 pm | Hbase Goes RealTime | Jonathan Gray, Jean-Daniel Cryans
2:00-2:30 pm | Hive | Zheng Shao, Namit Jain (Facebook)
2:30-3:00 pm | Getting more out of Pig | Alan Gates (Yahoo!)
3:00-3:30 pm | Future proofing Map-Reduce | Owen O’Malley (Yahoo!)
3:30-4:00 pm | Coffee Break | Sponsored by Sun
4:00-4:30 pm | Zookeeper | Mahadev Konar (Yahoo!)
4:30-5:00 pm | Automated diagnosis of problems in Hadoop | Priya Narasimhan (CMU)
5:00-5:30 pm | Workflow / Oozie | Alejandro Abdelnur (Yahoo!)
5:30-5:40 pm | Speeding up Hadoop: Winning the 2009 Sort Benchmarks | Arun C Murthy (Yahoo!)
5:40-6:30 pm | Futures Panel | Sanjay Radia, Owen O’Malley, Doug Cutting (Yahoo!), Ashish Thusoo (Facebook), Tom White (Cloudera)

Track 2 (Administration)

Schedule | Topic | Speaker
1:30-2:00 pm | Towards Energy Efficient Hadoop | Yanpei Chen, Laura Keys, Randy H. Katz (UC Berkeley)
2:00-2:30 pm | Hadoop scheduling in the OpenCirrus Cloud testbed | Thomas Sandholm and Dejan Milojicic (HP)
2:30-3:00 pm | Scheduler- Fairshare & Capacity | Matei Zaharia (UC Berkeley)
3:00-3:30 pm | Scaling Hadoop for multi-core and highly threaded systems | Jangwoo Kim (Sun)
3:30-4:00 pm | Coffee Break | Sponsored by Sun
4:00-4:30 pm | Running Hadoop in the Cloud | Tom White (Cloudera)
4:30-5:00 pm | Hadoop Configuration Management and Deployment | Matt Massie, Christophe Bisciglia (Cloudera)
5:00-5:30 pm | Chukwa | Ariel Rabkin (UC Berkeley)
5:30-6:00 pm | Hadoop and Condor | Jason Stowe
6:00-6:30 pm | Cascading at ShareThis | Paco Nathan

Track 3 (Applications)

Schedule | Topic | Speaker
1:30-2:00 pm | The Worldwide LHC Computing Grid: Data Processing on a Global Scale | Brian Bockelman (U Nebraska)
2:00-3:00 pm | Case studies on EC2 | Jinesh Varia (Amazon) with panel:
    Paco Nathan, Principal Scientist, Data insights (ShareThis)
    Ben Hardy, Senior Software Engineer (eHarmony.com)
    Dr. Ted Dunning, Committer of Mahout, CTO (DeepDyve)
    Elias Torres, Director of Engineering (Lookery)
    John Barr, COO and VP of Engineering (YieldEx)
3:00-3:30 pm | Genetic Sequence Analysis in the Clouds: Applications of MapReduce to the Life Science | Jimmy Lin, Michael Schatz, Ben Langmead (U Maryland)
3:30-4:00 pm | Coffee Break | Sponsored by Sun
4:00-4:20 pm | Lightning talk: Hadoop architecture and Application | Dhruba Borthakur, Ding Zhou (Facebook)
4:20-4:40 pm | Lightning talk: Anti-spam | Yahoo! Mail team
4:40-5:00 pm | Lightning talk: EMI Music’s next generation business intelligence platform | Stefan Groschupf
5:00-5:20 pm | Lightning talk: Mapping the World's Photos | David Crandall (Cornell)
5:20-5:40 pm | Lightning talk: Lydia: news, blog analysis | Mikhail Bautin (SUNY)
5:40-6:00 pm | Lightning talk: Parallel Data Mining in Telco | Zhiguo Luo (China Mobile)
6:00-6:20 pm | Lightning talk: Natural language learning with Hadoop | Kevin Gimpel (CMU)

Evening Reception

Schedule
6:30-8:30 pm Sponsored by Lightspeed Venture Partners

Copyright © 2010 Yahoo! Inc. All rights reserved.