Hadoop Summit 2011 – A Different Approach

Hadoop Summit 2011 is over. If you saw this tweet, “#hadoopsummit planned for 1,500. upped on demand to 1,600. finally accommodated 1,700. ran out of space, good problem to have. :-)”, then you probably got an idea of how exciting and mobbed the conference was this year. With folks dropping by from coast to coast, and quite a few from around the world, Hadoop Summit 2011 will quite likely be the year’s largest Hadoop gathering. More than that, because of the passion of everyone who participated, it was also the best Hadoop gathering of the year, raising the bar yet again for Hadoop technical content and networking.

At the Summit and since it ended, I have received questions from folks who attended the show and from some who couldn’t make it. Many people were curious about what went into planning the Summit and the approach we took. I thought I’d take some time today to summarize my thoughts on this topic.

Obviously, much of a conference’s success comes down to logistics, and fitting 1,700 people into the Santa Clara Convention Center for one action-packed day definitely requires a lot of logistics. Beyond those details, though, I think the more important factor this year was the decision to change how we approached the Summit and to make sure the event focused on building the Hadoop community itself. That community will be at the heart of Hadoop’s continued innovation, so it is important that it keeps growing and that its members keep sharing with each other.

Here is what the Summit was really about for me and what I asked the team to focus on:

1. Content: This year we moved away from the “come as you please” style of presentation content that we had used in years past. The line I used to stress this to the team? “Content is to the Summit what Location is to Real Estate.” Everything. Therefore all technical content was first selected and then shepherded through a rigorous review and feedback process by the Program Committee. As a result, we heard fantastic feedback on the quality and usefulness of the presentations in the technical tracks this year. Raymie Stata, Yahoo!’s CTO, made the point numerous times that as a technology grows, you usually see the amount and depth of technical content at conferences dedicated to it erode, but that at Hadoop Summit the opposite was true. If anything, the technical content was even deeper, which bodes well for Hadoop and the community’s future. I cannot thank my Track Co-chairs enough for their support in making this happen. You can find out more about them at: www.hadoopsummit.org.

2. Sponsored Content: Related to keeping the overall quality of technical content high was the desire to keep the amount of sponsored content relatively low, ideally in the ballpark of 30 minutes out of the day. Mind you, the goal was never to eliminate sponsored content entirely. Sponsors offering Hadoop-related solutions have a significant role to play in the Hadoop ecosystem and its evolution. But with so many new attendees and folks just getting started with Hadoop, the most valuable content is the technically focused insight they can put to work as they ramp up projects and get up to speed. The results speak for themselves – check out fun fact number 3 below.

3. Ecosystem: There was some discussion and thinking early on in the conference planning that we should only include open source solutions in the program. However, in the end the majority of the Program Committee agreed that the Summit should represent the entire Hadoop ecosystem. Why? Because while technically it’s possible to separate Apache Hadoop from other Hadoop powered solutions, actual users don’t always make this distinction when putting Hadoop to work. Some use Apache, some use other distributions, but all of these folks are Hadoop users and are therefore part of the Hadoop ecosystem. Without users, there is no Hadoop, so who are we to leave some out?

4. Users: Speaking of users, they are the reason we added a brand-new track this year focused on them: the Operations and Experience track. The goal of this track was to provide a forum for sharing how different companies operate and manage Hadoop in the real world, or otherwise to talk about their experience with Hadoop. As we had expected, this content was particularly popular with Hadoop users. I believe that in the years to come, given the pace of Hadoop’s growth, interest in this track will continue to increase and its content will no doubt expand as well.

5. Developers: Finally, developers are still the core of the Hadoop Summit and still the engine of innovation in Hadoop. Many attendees complimented me on what a great “event” Hadoop Summit was. Interestingly, I never really thought of the Summit as an event. For me it was a gathering of Hadoop developers and users. As a developer myself, having been there and done that, I now enjoy helping showcase the amazing, high-quality work that comes from the Hadoop developer community. Helping great code get shared and adopted by developers and users is really the heart and soul of the Summit for me.

Looking forward, on the heels of a successful Summit, what’s next? Here at Yahoo! I’ve been saying that Hadoop the Software is maturing, while Hadoop the Product is still in its infancy. Do I mean that Hadoop the Software is done? Not at all. What I mean is that future work on Hadoop as software will focus on transforming it into Hadoop the Product. That means a lot of development on Usability, Manageability and Operability. These very broad areas cover a lot of ground, but collectively they all go towards making Hadoop more Enterprise-ready. Hadoop is already Enterprise-ready when it comes to Scalability, Availability and Reliability; at Yahoo! we probably know this better than anyone else. But it is the rest of these core “abilities” that will put Hadoop on the fast track to Enterprise deployment at companies that don’t have large Hadoop engineering teams in-house.

Before I wrap up, here are a few little-known facts about this year’s Summit that speak to all of the above:

1. The Summit had a 27% acceptance rate for submissions – it was very competitive, and a great sign that interest in Hadoop is driving high-quality technical content.
2. 50% of the ticket sales happened in the last two weeks. In other words, you all need to plan better so we can fit more of you. :-)
3. There were no sponsored tech talks this year, none at all – so if you’re looking for unbiased, useful technical content, the Summit tech talks were as pure as can be.
4. Multiple leading database vendors are currently evaluating Hadoop and HBase for internal use.
5. Hadoop is designed for use with non-reliable storage. ;-)

Looking forward to seeing the Northern California Hadoopers at the July HUG. If you would like to attend, you can sign up here: http://www.meetup.com/hadoop/.

These are exciting times for Hadoop; may you enjoy living through them.

/later