March 16, 2011
Introduction
The previous post in this series covered the next generation of Apache Hadoop MapReduce in a broad sense, particularly its motivation, high-level architecture, goals, requirements, and aspects of its implementation. In the second post in a series unpacking details of the implementation, we’d like to present the protocol for resource allocation and scheduling that [...]
February 14, 2011
Overview
In the Big Data business, running fewer, larger clusters is cheaper than running many small clusters. Larger clusters also process larger data sets and support more jobs and users. The Apache Hadoop MapReduce framework has hit a scalability limit at around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors [...]
February 10, 2011
By Arun C. Murthy, Lead, Hadoop Map-Reduce Development Team, Yahoo
This blog post describes the Capacity Scheduler, a pluggable MapReduce scheduler for Apache Hadoop that allows multiple tenants to securely share a large cluster so that their applications are allocated resources in a timely manner under the constraints of their allocated capacities. We have developed and deployed the [...]