Hadoop at Yahoo!
Introduction
Apache Hadoop* is an open source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. Hadoop is a top-level Apache project, initiated and led by Yahoo!. It relies on an active community of contributors from all over the world for its success.
With a significant technology investment by Yahoo!, Apache Hadoop has become an enterprise-ready cloud computing technology. It is becoming the de facto industry framework for big data processing.
The Hadoop project is an integral part of the Yahoo! cloud infrastructure and is at the heart of many of Yahoo!’s important business processes.
We run the world's largest Hadoop clusters and work with academic institutions and other large corporations on advanced cloud computing research. Our engineers are leading participants in the Hadoop community.
Yahoo! sponsors the annual Hadoop Summit and the monthly Hadoop User Group.
What’s new from Yahoo!?
Yahoo! Distribution of Hadoop with security
Hadoop with security is a significant update to the Yahoo! Distribution of Hadoop, previously contributed to Apache Hadoop. This update integrates Hadoop with Kerberos, a mature open source authentication standard.
Hadoop with security:
- Prevents unauthorized access to data on Hadoop clusters
- Authenticates users sharing business-sensitive data
- Reduces operational costs by consolidating Hadoop clusters
- Collocates data for new classes of applications
The Yahoo! Distribution of Hadoop with security is available for download here.
Oozie – Yahoo!'s workflow engine for Hadoop
Oozie, Yahoo!'s workflow engine for Hadoop, is an open-source workflow solution for managing and coordinating jobs running on Hadoop, including HDFS, Pig and MapReduce.
Oozie was designed for Yahoo!’s complex workflows and data pipelines at global scale. It is integrated with the Yahoo! Distribution of Hadoop with security and is a primary mechanism for managing complex data analysis workloads across Yahoo!.
Oozie is available for download here.
Learn More
- Follow the Yahoo! Hadoop blog.
- Read the Apache Hadoop wiki.
* Apache and Hadoop are trademarks of the Apache Software Foundation
Recent Blog Articles
Hadoop 0.20.S Virtual Machine Appliance
Tue, 29 Jun 2010
Managing Big Data: Architectural Approaches for making batch data available online
Thu, 24 Jun 2010
Hadoop and the fight against shape-shifting spam
Tue, 15 Jun 2010
Enabling Hadoop Batch Processing Systems to Consume Streaming Data
Wed, 09 Jun 2010
Hadoop Summit 2010 - Agenda is available!
Thu, 27 May 2010

