May 5, 2011
Stack and I are the track organizers for the community track at the Hadoop Summit this year. The community track is for presentations on the roadmap, developments, and features of Apache Hadoop. So if you’ve added a new feature to Hadoop and want to publicize it at the world’s largest and most important Hadoop conference, please [...]
March 24, 2011
The Hadoop project won the top MediaGuardian Innovation award. A groundbreaking open source project has won the top prize at the 2011 MediaGuardian Innovation Awards. The judging panel described the Apache Hadoop project as the Swiss army knife of the 21st Century, and as having the potential to completely change the face of media innovations across the [...]
February 24, 2011
See what Yahoo! and Jeopardy! have in common. This week, IBM’s supercomputer, Watson (named after IBM’s founder, Thomas J. Watson), took on two of the most successful Jeopardy! contestants of all time in an exhilarating million-dollar Jeopardy! face-off between man and machine. Watson defeated Jeopardy! defenders Ken Jennings and Brad Rutter, amassing $77,147 in winnings in [...]
February 23, 2011
We had a record turnout for the February 2011 Hadoop User Group at the main Sunnyvale Yahoo! campus with 336 people signed up. Next month and for the rest of the year, we’ll be in the larger Yahoo! cafeteria across the street that can hold up to 1000 people. If I remember correctly, the first [...]
October 23, 2009
About us: We are PhD students at Carnegie Mellon in the Machine Learning Department and the Language Technologies Institute, and our thesis work is part of the Read the Web project, which is led by Professor Tom Mitchell. The goal of our project is to build a system that can start from a limited amount [...]
September 30, 2009
As the world’s largest user of and contributor to Hadoop, Yahoo is excited to be sponsoring and presenting at the upcoming Hadoop World in New York City on Friday, October 2, 2009. Yahoo has been using Hadoop since the beginning of 2006 and has built up its Hadoop clusters from 20 machines up to a current [...]
August 27, 2009
Introduction: In a typical Hadoop MapReduce job, input files are read from HDFS. Data are usually compressed to reduce the file sizes. After decompression, serialized bytes are transformed into Java objects before being passed to a user-defined map() function. Conversely, output records are serialized, compressed, and eventually pushed back to HDFS. This seemingly simple, two-way [...]
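The two-way path described above (decompress then deserialize on input; serialize then compress on output) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the post's code or Hadoop's API: the `PageView` record and its `write`/`readFields` methods are hypothetical, modeled loosely on the shape of Hadoop's `Writable` interface, and GZIP from `java.util.zip` stands in for whatever compression codec a real job would be configured with.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SerDeRoundTrip {

    // Hypothetical record type. The write/readFields pair mirrors the shape of
    // Hadoop's Writable interface but does not depend on Hadoop.
    static final class PageView {
        String url;
        long timestamp;

        PageView() {}

        PageView(String url, long timestamp) {
            this.url = url;
            this.timestamp = timestamp;
        }

        void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeLong(timestamp);
        }

        void readFields(DataInput in) throws IOException {
            url = in.readUTF();
            timestamp = in.readLong();
        }
    }

    // Output path: serialize each record to bytes, compressing the stream as it is written.
    static byte[] serializeAndCompress(List<PageView> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the DataOutputStream also finishes the GZIP stream and writes its trailer.
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(records.size());
            for (PageView r : records) {
                r.write(out);
            }
        }
        return bytes.toByteArray();
    }

    // Input path: decompress the bytes, then turn them back into Java objects.
    static List<PageView> decompressAndDeserialize(byte[] data) throws IOException {
        List<PageView> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                PageView r = new PageView();
                r.readFields(in);
                records.add(r);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<PageView> input = List.of(
                new PageView("/index.html", 1000L),
                new PageView("/about", 2000L));
        byte[] compressed = serializeAndCompress(input);
        for (PageView r : decompressAndDeserialize(compressed)) {
            System.out.println(r.url + " " + r.timestamp);
        }
    }
}
```

Every record crosses both a compression step and an object-conversion step in each direction, which is why a path that looks simple on paper can carry real CPU cost per record.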