Hadoop and Distributed Computing at Yahoo!
Introduction
Apache Hadoop* is an open source Java software framework for running data-intensive applications on large clusters of commodity hardware. Hadoop is a top-level Apache project, and it relies on an active community of contributors from all over the world for its success.
Hadoop implements two important elements. The first is a computational paradigm called Map/Reduce, which takes an application and divides it into multiple fragments of work, each of which can be executed on any node in the cluster. The second is a distributed file system called HDFS. HDFS stores data across the nodes in the cluster with the goal of providing high aggregate bandwidth across the cluster.
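To make the Map/Reduce idea concrete, here is a minimal single-process sketch in Python of a word count split into independent fragments of work. It is an illustration of the paradigm only, not Hadoop's API: the function names (`split_input`, `map_fragment`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and a real Hadoop job would be written against the Hadoop libraries and run across a cluster.

```python
from collections import defaultdict


def split_input(lines, num_fragments):
    """Divide the input lines into num_fragments independent fragments of work."""
    return [lines[i::num_fragments] for i in range(num_fragments)]


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one fragment."""
    return [(word.lower(), 1) for line in fragment for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all fragments."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_fragments=3):
    fragments = split_input(lines, num_fragments)
    # Each map_fragment call depends only on its own fragment, so in a
    # cluster setting each one could be executed on any node.
    mapped = [map_fragment(fragment) for fragment in fragments]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["a b a", "b c", "a"]))
```

Because no map call shares state with another, the result does not depend on how many fragments the input is divided into. That independence is what allows the fragments to be scheduled on any node.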
The Hadoop project is extremely important to us here at Yahoo!. We run the world's largest Hadoop clusters, we work with academic institutions and other large corporations on advanced cloud computing research, and our engineers are active participants in the Hadoop community.
Yahoo! Distribution of Hadoop
Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop we test and deploy across our large Hadoop clusters. As a service to the Hadoop community, we are releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project. This source distribution includes code patches that we have added to improve the stability and performance of our clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop.
The Yahoo! Distribution of Hadoop is currently available for download here.
Learn More
- Follow the Yahoo! Hadoop blog
- Read the Apache Hadoop documentation
- Read the Apache Hadoop wiki
- View the Yahoo! Hadoop tutorial (highly recommended)
- View Practical Problem Solving with Hadoop, an intro Hadoop course that Yahoo! Hadoop team member Milind Bhandarkar presented at UIUC on 2/13/09
- View Crossing the Chasm: Sneaking a Parallel File System Into Hadoop, a lecture by Wittawat Tantisiriroj from CMU on 2/6/09
Get Involved
- Sign up for the Hadoop mailing lists
- Learn how to contribute code on the Hadoop wiki
* Apache and Hadoop are trademarks of the Apache Software Foundation
Recent Blog Articles
Slides from Hadoop World and University Talks
Wed, 28 Oct 2009
Hadoop User Group (HUG) – Oct 21st at Yahoo!
Fri, 23 Oct 2009
M45 Enables Web-Scale Information Extraction Research
Fri, 23 Oct 2009
Slides of September 23rd Bay Area Hadoop User Group
Mon, 05 Oct 2009
New Update: Yahoo! Distribution of Hadoop
Thu, 01 Oct 2009

