Posts in the Miscellaneous category

Fourth Annual Hadoop Summit: The Countdown Begins!

On June 29, Yahoo! will host the 4th annual Hadoop Summit at the Santa Clara Convention Center. Hadoop Summit 2011 brings together some of the most influential thought leaders in the space, from Yahoo, Facebook, IBM, NetApp, and others. Jay Rossiter, Senior Vice President of the Yahoo! Cloud Platform Group, will open the show [...]

Hadoop Summit CFP closing tomorrow!

Stack and I are the track organizers for the community track at the Hadoop Summit this year. The community track is for presentations on roadmap, developments and features in Apache Hadoop. So if you’ve added a new feature to Hadoop and want to publicize it at the world’s largest and most important Hadoop conference, please [...]

HCatalog, tables and metadata for Hadoop

Last month the HCatalog project (formerly known as Howl) was accepted into the Apache Incubator. We have already branched for a 0.1 release, which we hope to push in the next few weeks. Given all this activity, I thought it would be a good time to write a post on the motivation behind HCatalog, what [...]

Hadoop User Group meeting recap, March 2011

More than 200 Hadoop developers and enthusiasts congregated on the Yahoo campus for the monthly HUG meeting on March 16th. As always, they were treated to some enlightening presentations in addition to good food and beverages. After the usual 30 minutes of socializing and networking, Milind Bhandarkar from LinkedIn kicked off the evening with a [...]

Next Generation of Apache Hadoop MapReduce – The Scheduler

The previous post in this series covered the next generation of Apache Hadoop MapReduce in a broad sense, particularly its motivation, high-level architecture, goals, requirements, and aspects of its implementation. In this second post in the series, which unpacks details of the implementation, we’d like to present the protocol for resource allocation and scheduling that [...]


I’ll Take Hadoop for $400, Alex

See what Yahoo! and Jeopardy! have in common. This week, IBM’s supercomputer, Watson (named after IBM’s founder, Thomas J. Watson), took on two of the most successful Jeopardy! champions of all time in an exhilarating million-dollar Jeopardy! face-off between man and machine. Watson defeated Jeopardy! champions Ken Jennings and Brad Rutter, amassing $77,147 in winnings in [...]

Managing Big Data: Architectural Approaches for making batch data available online

This is the beginning of an ongoing series of blog posts on “Managing Big Data”. This series will focus on techniques that Yahoo uses to process large volumes of data, ranging from initial collection of data to the end usage of that data. Over the last several years, there have been two important trends that [...]

Hadoop and the fight against shape-shifting spam

At a recent Hadoop User Group meeting, I made a presentation on how we leverage Hadoop for spam mitigation in Yahoo! Mail. A number of people followed up requesting additional details of our architecture and engineering strategy. In this post, I am going to try to capture our antispam engineering story, how it came to [...]

Hadoop Bay Area User Group – Feb 17th at Yahoo! – RECAP

Hi Hadoopers, thanks, everyone, for joining us last night at Yahoo!’s Sunnyvale campus. There were more than 150 attendees; the community is growing! It was great to see many new faces and companies/solutions that are basing their business on Hadoop. For those of you who were unable to attend in person, the session’s details [...]

Hadoop Bay Area January 2010 User Group – Recap

Hi Hadoopers, thanks, everyone, for joining us last night at Yahoo!’s Sunnyvale campus. There were close to 150 attendees, a nice way to start the meetings for 2010. I was happy to see familiar and many new faces. It was also great to see the thriving conversations and solution sharing. For those of [...]