Posts by Arun C Murthy

Arun C Murthy leads the Apache Hadoop MapReduce Development Team at Yahoo. He has been a full-time contributor to Apache Hadoop since its inception in 2006. He is a long-term Committer and Member of the Apache Hadoop PMC and jointly holds the current world sorting records using Hadoop. He is currently responsible for every bit of MapReduce code and configuration deployed on over 40,000 machines running Hadoop at Yahoo. He has worked on web-scale infrastructure technology for Yahoo since 2003.

Next Generation of Apache Hadoop MapReduce – The Scheduler

Introduction: The previous post in this series covered the next generation of Apache Hadoop MapReduce in a broad sense, particularly its motivation, high-level architecture, goals, requirements, and aspects of its implementation. In this second post in a series unpacking details of the implementation, we’d like to present the protocol for resource allocation and scheduling that [...]

Architecture of Next Generation MapReduce

The Next Generation of Apache Hadoop MapReduce

Overview: In the Big Data business, running fewer, larger clusters is cheaper than running more small clusters. Larger clusters also process larger data sets and support more jobs and users. The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors [...]

The Hadoop Map-Reduce Capacity Scheduler

By Arun C. Murthy, Lead, Hadoop Map-Reduce Development Team, Yahoo. This blog post describes the Capacity Scheduler, a pluggable MapReduce scheduler for Apache Hadoop, which allows multiple tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under the constraints of allocated capacities. We have developed and deployed the [...]