Posts by Owen OMalley

Owen OMalley: I am a software architect on Yahoo's Hadoop development team. I was the first committer added to Hadoop and the original chair of the Apache Hadoop Project Management Committee. My focus has been on MapReduce and Security.

Hadoop Summit CFP closing tomorrow!

Stack and I are the track organizers for the community track at the Hadoop Summit this year. The community track is for presentations on roadmap, developments and features in Apache Hadoop. So if you’ve added a new feature to Hadoop and want to publicize it at the world’s largest and most important Hadoop conference, please [...]

Apache Hadoop Innovation Award

Apache Hadoop project wins MediaGuardian Innovation award

The Hadoop project won the top MediaGuardian Innovation award. A groundbreaking open source project has won the top prize at the 2011 MediaGuardian Innovation Awards. The judging panel described the Apache Hadoop project as the Swiss army knife of the 21st Century, and having the potential to completely change the face of media innovations across the [...]

Watson playing Jeopardy

I’ll Take Hadoop for $400, Alex

See what Yahoo! and Jeopardy! have in common. This week, IBM’s supercomputer, Watson (named after IBM’s founder, Thomas J. Watson), took on two of the most championed Jeopardy! contestants of all time in an exhilarating million-dollar Jeopardy! face-off between man and machine. Watson defeated Jeopardy! defenders Ken Jennings and Brad Rutter, amassing $77,147 in winnings in [...]

Hadoop User Group (HUG) February 2011 recap

We had a record turnout for the February 2011 Hadoop User Group at the main Sunnyvale Yahoo! campus with 336 people signed up. Next month and for the rest of the year, we’ll be in the larger Yahoo! cafeteria across the street that can hold up to 1000 people. If I remember correctly, the first [...]

M45 Enables Web-Scale Information Extraction Research

About us: We are PhD students at Carnegie Mellon in the Machine Learning Department and the Language Technologies Institute, and our thesis work is part of the Read the Web project, which is led by Professor Tom Mitchell. The goal of our project is to build a system that can start from a limited amount [...]

Yahoo! at Hadoop World in New York

As the world’s largest user of and contributor to Hadoop, Yahoo is excited to be sponsoring and presenting at the upcoming Hadoop World in New York City on Friday, October 2, 2009. Yahoo has been using Hadoop since the beginning of 2006, and we have built up our Hadoop clusters from 20 machines up to a current [...]

The Anatomy of Hadoop I/O Pipeline

Introduction: In a typical Hadoop MapReduce job, input files are read from HDFS. Data are usually compressed to reduce the file sizes. After decompression, serialized bytes are transformed into Java objects before being passed to a user-defined map() function. Conversely, output records are serialized, compressed, and eventually pushed back to HDFS. This seemingly simple, two-way [...]