Hadoop and Distributed Computing at Yahoo!
Introduction
Apache Hadoop* is an open-source Java software framework for running data-intensive applications on large clusters of commodity hardware. Hadoop is a top-level Apache project, and it relies on an active community of contributors from all over the world for its success.
Hadoop implements two important elements. The first is a computational paradigm called Map/Reduce, which divides an application into multiple fragments of work, each of which can be executed on any node in the cluster. The second is a distributed file system called HDFS, which stores data on the nodes in the cluster with the goal of providing high aggregate bandwidth across the cluster.
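To make the Map/Reduce idea concrete, here is a minimal single-process sketch of a word count in Python. It does not use the Hadoop API; the function names (map_fragment, shuffle, reduce_counts, word_count) are hypothetical and chosen only for illustration. It assumes the input has already been split into fragments, and it runs the map calls sequentially where a real cluster could run them on different nodes.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key into a total."""
    return key, sum(values)


def word_count(fragments):
    # Each fragment is mapped on its own, with no shared state between calls.
    # That independence is what would let a cluster run them on any node;
    # in this sketch they simply run one after another.
    emitted = []
    for fragment in fragments:
        emitted.extend(map_fragment(fragment))
    return dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    print(word_count(["hadoop runs on clusters", "clusters run hadoop"]))
```

Because the map step for one fragment never depends on another fragment, the work can be spread across machines and only the grouped results need to be brought together for the reduce step.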
The Hadoop project is extremely important to us here at Yahoo!. We run the world's largest Hadoop clusters and work with academic institutions and other large corporations on advanced cloud computing research, and our engineers are active participants in the Hadoop community.
Yahoo! Distribution of Hadoop
Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop we test and deploy across our large Hadoop clusters. As a service to the Hadoop community, we are releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project. This source distribution includes code patches that we have added to improve the stability and performance of our clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop.
The Yahoo! Distribution of Hadoop is currently available for download here.
Learn More
- Follow the Yahoo! Hadoop blog
- Read the Apache Hadoop documentation
- Read the Apache Hadoop wiki
- View the Yahoo! Hadoop tutorial (highly recommended)
- View Practical Problem Solving with Hadoop, an introductory Hadoop course that Yahoo! Hadoop team member Milind Bhandarkar presented at UIUC on February 13, 2009
- View Crossing the Chasm: Sneaking a Parallel File System Into Hadoop, a lecture given by Wittawat Tantisiriroj of CMU on February 6, 2009
Get Involved
- Sign up for the Hadoop mailing lists
- Learn how to contribute code on the Hadoop wiki
* Apache and Hadoop are trademarks of the Apache Software Foundation

