Hadoop Summit '09
Join us at this year's Hadoop Summit
Focus:
The event will focus on the advancements made in the development and deployment of Hadoop and related technologies. It will also feature applications which use Hadoop in new and unique ways.
Registration:
Registration for the Summit is through EventBrite. Please register now, if you haven't already.
Location:
Santa Clara Marriott
2700 Mission College Boulevard
Santa Clara, CA 95054
Map it: Yahoo!, Google, MapQuest
Date and Time:
June 10, 2009
Doors open at 8 am. The first talk begins at 8:30. Lunch is from 12:30 to 1:30. Individual tracks start after lunch. Join us for an evening reception, with refreshments and an opportunity to network with other Hadoop enthusiasts, from 6:30 to 8:30 pm.
Please see the agenda below for more information.
Media Syndication:
Please use the tag hadoopsummit09 in your tweets, posts, and photos. This event page will be online up to and during the Summit, and will remain online after the Summit for an extended period of time.
Hot Topics at the Hadoop Summit '09
State of Hadoop
Join Eric Baldeschwieler and the Yahoo! team to learn about the progress made with Hadoop over the last year, core capabilities and related sub-projects, deployment experiences, and future directions.
Pig
Alan Gates' talk on Pig will include an introduction to Pig, general information and performance tips for Pig users, and descriptions of current projects and planned development directions for Pig developers.
Amazon Elastic MapReduce
A large number of AWS customers are currently running Hadoop jobs on Amazon's EC2. In an attempt to create a more friction-free path for them, AWS developed Elastic MapReduce. While Hadoop abstracts out all the development complexity in running a massively distributed task in parallel, Amazon Elastic MapReduce abstracts out all the operational complexity in running Hadoop on Amazon EC2. In his talk, Jinesh Varia will describe Elastic MapReduce and some of the interesting ways in which customers are using it.
Chukwa
"Chukwa is an open source data collection system for monitoring and analyzing large distributed systems. Chukwa is built on top of the Hadoop distributed filesystem (HDFS) and MapReduce framework and inherits Hadoop’s scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying monitoring and analyzing results, in order to make the best use of this collected data." (wiki.apache.org/hadoop/Chukwa) UC Berkeley's Ariel Rabkin will introduce this powerful tool in Track 2.
Genetic Sequence Analysis in the Clouds
Jimmy Lin, Michael Schatz, and Ben Langmead from the University of Maryland will discuss the computational challenges associated with processing and storing the vast quantities of data produced by next-generation genetic sequencers. They will present initial results from their use of MapReduce to meet these challenges. They will also speculate on the future of cloud computing technologies in the life sciences.
Towards Energy Efficient Hadoop
Yanpei Chen, Laura Keys, and Randy H. Katz of UC Berkeley will discuss the motivation for, and findings from, their research on the energy efficiency of the Hadoop implementation of MapReduce. In their work, they compared Hadoop energy consumption under realistic workloads, measured the energy consumption of different parts of the Hadoop datapath, and constructed a quantitative model to predict the energy consumption for a particular task. Preliminary results show that there is a relation between decreasing the job completion time and reducing the energy consumption.
Agenda
General Session
| Schedule | Topic | Speaker |
|---|---|---|
| 8:30-9:00 am | Breakfast | Sponsored by IBM |
| 9:00-9:15 am | Welcome and Kickoff | Shelton Shugar (Yahoo!) |
| 9:15-10:00 am | State of Hadoop | Eric Baldeschwieler, Doug Cutting (Yahoo!) |
| 10:00-10:30 am | Hadoop in the Enterprise | Rod Smith (IBM, VP Engineering) |
| 10:30-10:50 am | Coffee Break | Sponsored by Amazon |
| 10:50-11:10 am | Sun Cloud and Hadoop | Juan Carlos Soto (Sun) |
| 11:10-11:40 am | Amazon Elastic MapReduce | Jinesh Varia (Amazon) |
| 11:40-12:10 pm | The Growing Hadoop Community | Christophe Bisciglia (Cloudera) |
| 12:10-1:30 pm | Lunch | |
Track 1 (Developers)
| Schedule | Topic | Speaker |
|---|---|---|
| 1:30-2:00 pm | HBase Goes RealTime | Jonathan Gray, Jean-Daniel Cryans |
| 2:00-2:30 pm | Hive | Zheng Shao, Namit Jain (Facebook) |
| 2:30-3:00 pm | Getting more out of Pig | Alan Gates (Yahoo!) |
| 3:00-3:30 pm | Future proofing Map-Reduce | Owen O’Malley (Yahoo!) |
| 3:30-4:00 pm | Coffee Break | Sponsored by Sun |
| 4:00-4:30 pm | ZooKeeper | Mahadev Konar (Yahoo!) |
| 4:30-5:00 pm | Automated diagnosis of problems in Hadoop | Priya Narasimhan (CMU) |
| 5:00-5:30 pm | Workflow / Oozie | Alejandro Abdelnur (Yahoo!) |
| 5:30-5:40 pm | Speeding up Hadoop: Winning the 2009 Sort Benchmarks | Arun C Murthy (Yahoo!) |
| 5:40-6:30 pm | Futures Panel | Sanjay Radia, Owen O’Malley, Doug Cutting (Yahoo!), Ashish Thusoo (Facebook), Tom White (Cloudera) |
Track 2 (Administration)
| Schedule | Topic | Speaker |
|---|---|---|
| 1:30-2:00 pm | Towards Energy Efficient Hadoop | Yanpei Chen, Laura Keys, Randy H. Katz (UC Berkeley) |
| 2:00-2:30 pm | Hadoop scheduling in the OpenCirrus Cloud testbed | Thomas Sandholm and Dejan Milojicic (HP) |
| 2:30-3:00 pm | Scheduler - Fairshare & Capacity | Matei Zaharia (UC Berkeley) |
| 3:00-3:30 pm | Scaling Hadoop for multi-core and highly threaded systems | Jangwoo Kim (Sun) |
| 3:30-4:00 pm | Coffee Break | Sponsored by Sun |
| 4:00-4:30 pm | Running Hadoop in the Cloud | Tom White (Cloudera) |
| 4:30-5:00 pm | Hadoop Configuration Management and Deployment | Matt Massie, Christophe Bisciglia (Cloudera) |
| 5:00-5:30 pm | Chukwa | Ariel Rabkin (UC Berkeley) |
| 5:30-6:00 pm | Hadoop and Condor | Jason Stowe |
| 6:00-6:30 pm | Cascading at ShareThis | Paco Nathan |
Track 3 (Applications)
| Schedule | Topic | Speaker |
|---|---|---|
| 1:30-2:00 pm | The Worldwide LHC Computing Grid: Data Processing on a Global Scale | Brian Bockelman (U Nebraska) |
| 2:00-3:00 pm | Case studies on EC2 | Jinesh Varia (Amazon) with panel: Paco Nathan, Principal Scientist, Data insights (ShareThis); Ben Hardy, Senior Software Engineer (eHarmony.com); Dr. Ted Dunning, Committer of Mahout, CTO (DeepDyve); Elias Torres, Director of Engineering (Lookery); John Barr, COO and VP of Engineering (YieldEx) |
| 3:00-3:30 pm | Genetic Sequence Analysis in the Clouds: Applications of MapReduce to the Life Sciences | Jimmy Lin, Michael Schatz, Ben Langmead (U Maryland) |
| 3:30-4:00 pm | Coffee Break | Sponsored by Sun |
| 4:00-4:20 pm | Lightning talk: Hadoop architecture and Application | Dhruba Borthakur, Ding Zhou (Facebook) |
| 4:20-4:40 pm | Lightning talk: Anti-spam | Yahoo! Mail team |
| 4:40-5:00 pm | Lightning talk: EMI Music’s next generation business intelligence platform | Stefan Groschupf |
| 5:00-5:20 pm | Lightning talk: Mapping the World's Photos | David Crandall (Cornell) |
| 5:20-5:40 pm | Lightning talk: Lydia: news, blog analysis | Mikhail Bautin (SUNY) |
| 5:40-6:00 pm | Lightning talk: Parallel Data Mining in Telco | Zhiguo Luo (China Mobile) |
| 6:00-6:20 pm | Lightning talk: Natural language learning with Hadoop | Kevin Gimpel (CMU) |
Evening Reception
| Schedule | |
|---|---|
| 6:30-8:30 pm | Sponsored by Lightspeed Venture Partners |

