Hadoop 0.17 Preview

Apache Hadoop 0.17 is due for release any day now. Feature freeze for the release was on April 4th. The Hadoop dev community is currently actively fixing blocking issues discovered by users that have tried it out. This is a release we’re very excited about as it introduces many long awaited performance fixes to the platform. We’ve observed on the order of 30%(!) improvement in the runtime of some of the Hadoop benchmarks. As always, user feedback is invaluable and we urge folks to kick the tires on the release and help close it out. Here is a quick rundown of the important changes in the release.

HDFS

 

  • Syntax cleanup of Hadoop fs shell commands

    The syntax of many Hadoop fs shell commands has been revised. The goals have syntax consistency across commands and compatibility with POSIX syntax as far as possible. For examples, see HADOOP-1677, HADOOP-1792, HADOOP-1891
  • Block placement changes

    HDFS’ block placement strategy is now more tuned towards evenly distributing data across nodes. When a block is written from a node that is not running a Datanode, the first replica is written to a node on the same switch as the writer (instead of being written to the same node as the writer). The remaining two replicas are written to two nodes on a different switch. This strategy produces substantially better distributions in cases where data is loaded into HDFS from a small number of machines. More details at HADOOP-2559.
  • Append... getting closer

    Everyone’s favorite bug, HADOOP-1700, is still not closed, but a lot of progress has been made in this release. Append related work will make possible new flush() and sync() methods in the HDFS interface. The semantics are familiar. The flush() call plonks data into the ‘system buffers’ and returns immediately. The sync() call writes the data to the system, and returns only when it has hit disk.
  • More efficient replication

    Losing a rack of nodes usually means that the Namenode has to replicate around half a million blocks, and quickly. This would cause Namenode responsiveness to degrade and would increase the risk of data loss. A faster replication scheduling algorithm in the Namenode enables it to maintain quality of service to clients and replicate data faster in the event of losing many nodes at once.

Map/Reduce

 

  • Switch awareness

    Hadoop map/reduce is now capable of switch aware task placement. The framework attempts to place tasks on machines where their input data resides. In many cases (in particular with HoD), machines with input data are not available for running tasks, but machines that share a switch with such ‘input nodes’ are. Hadoop will now attempt to place tasks on machines that are switch local to input when machines with input data are unavailable.
  • Faster task scheduling

    HADOOP-2119 removes many inefficiencies in task placement and scheduling logic. The JobTracker would perform linear scans of the list of submitted tasks in cases where it did not find an obvious candidate task for a node. With better data structures for managing job state, all task placement operations now run in constant time.
  • Sort and shuffle improvements

    A couple of significant improvements to sort and shuffle are included in the form of HADOOP-910 and HADOOP-2919. HADOOP-910 has reducers performing merges of shuffle data (both in memory and on disk) while fetching map outputs. HADOOP-2919 improves memory management in sort on the map side, substantially decreases setup cost for the sort and uses quicksort instead of mergesort as the sorting algorithm.

Sameer Paranjpye
Yahoo! Grid Computing Team