Hadoop 0.18 Highlights

Apache Hadoop 0.18 was released on 8/22. This is the largest Hadoop release to date in terms of the number of patches committed (266). It also has the largest percentage of patches (20%) from contributors outside of Yahoo!. This is a great indicator of both the growth of the Hadoop community and its increasing involvement in the project's progress. The size of the release also meant a very large number of blocking bugs in the code base, which unfortunately created a long delay between the feature freeze on 6/4 and the final release.

Hadoop 0.18 has many improvements in the areas of performance, scalability, and reliability, in addition to new features. Some of the performance improvements contributed to Hadoop's first-place finish in the terabyte sort benchmark. Hadoop 0.18 runs the gridmix benchmark in ~45% of the time taken by Hadoop 0.15. There is lots of cool new stuff in this release, some of which is briefly described below.


Namespace auto-recovery
The HDFS Namenode can store the filesystem image and journal in multiple locations. Upon startup it automatically consults all configured locations of its state and reads the most up-to-date image and journal. If all of the Namenode's copies of this data are unavailable, state can be (mostly) recovered from the secondary Namenode using the ‘-importCheckpoint’ switch. More details can be found in HADOOP-2585.
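As a brief sketch of how this fits together (the directory paths, hostnames and values below are illustrative, not taken from the release):

```shell
# hadoop-site.xml: dfs.name.dir accepts a comma-separated list, so the image
# and journal are written to every listed directory (paths are illustrative):
#   <property>
#     <name>dfs.name.dir</name>
#     <value>/disk1/hdfs/name,/disk2/hdfs/name</value>
#   </property>

# If every configured copy is lost, start the Namenode from the secondary
# Namenode's most recent checkpoint instead:
hadoop namenode -importCheckpoint
```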
Fast restart
Namenode restart, particularly on large clusters, has been slow. Until Hadoop 0.17 it could take, for instance, over an hour to bring up a Namenode on a 2000 node cluster. Problems included inefficiencies in block report processing, getting stuck in safe mode, etc. Most of these have been addressed; HADOOP-3022 covers the Namenode startup improvements on clusters of up to 3000 nodes.
Namespace quotas and archives
HDFS now has directory-based quotas for namespace management. A quota set on a directory limits the number of entries in that sub-tree to the quota value. Only the super-user may set or change quotas. Quotas can be manipulated programmatically or via command line utilities. HADOOP-3187 describes quotas in detail.
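A brief sketch of the command-line side (the directory path and quota value are illustrative):

```shell
# Limit the sub-tree under /user/sameer to 10000 namespace entries
# (files and directories); must be run as the super-user:
hadoop dfsadmin -setQuota 10000 /user/sameer

# Remove the quota again:
hadoop dfsadmin -clrQuota /user/sameer
```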
RPC performance and scaling improvements
This release comes with a significant re-vamp of the RPC subsystem in the form of HADOOP-2188, HADOOP-2909 and HADOOP-2910. This includes the use of pings instead of timeouts, improvements in the management of idle connections, and client throttling when the server is under load. These improvements have the greatest effect on large clusters (>1000 nodes) and prevent jobs from failing when the Namenode or Jobtracker is under load.
Read/write performance improvements
HADOOP-1702 reduces buffer copies while writing to HDFS, bringing down Datanode CPU usage during writes by 30%. HADOOP-3164 uses sendfile on the Datanodes for reads from HDFS, resulting in an 80% reduction in CPU usage by the Datanodes.
Audit logging
HADOOP-3336 introduces audit logging for HDFS. The Namenode logs all file and directory accesses. An audit log entry includes the originating IP, action requested, pathname accessed, client user and group id and existing permissions on the accessed pathname.
Append … almost there
A lot more work went into supporting appends, which unfortunately did not make the cut for Hadoop 0.18. Notable changes include lease recovery for append (HADOOP-3310), Datanode generation stamp upgrades (HADOOP-3283) and lease management when open files get renamed (HADOOP-3176).
Mounting via FUSE
The oldest open Hadoop bug, HADOOP-4, is now closed. This work enables HDFS mounting via FUSE.
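A sketch of what mounting looks like with the contrib fuse-dfs module (the wrapper script name, Namenode host/port and mount point here are illustrative assumptions):

```shell
# Mount HDFS at /mnt/hdfs via FUSE; dfs://host:port points at the Namenode:
fuse_dfs_wrapper.sh dfs://namenode.example.com:9000 /mnt/hdfs

# HDFS now behaves like a local directory tree:
ls /mnt/hdfs/user
```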


Intermediate compression that just works
Compression of intermediate outputs in Hadoop Map/Reduce has long been a source of grief. When enabled, intermediate compression would frequently cause jobs to fail: tasktrackers would run out of memory, run slowly, or thrash their disks. HADOOP-3366 and HADOOP-2095 address these problems by introducing a different file format for shuffle data and improving memory management in reduce tasks. Intermediate compression can now be safely enabled with the supported codecs (gzip and lzo).
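For illustration, the relevant job configuration properties can be set per job; here is a sketch using a streaming job (the jar path, input/output paths, mapper and reducer commands are all illustrative):

```shell
# Enable compression of intermediate (map) outputs with the gzip codec:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -jobconf mapred.compress.map.output=true \
    -jobconf mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
    -input /user/sameer/in -output /user/sameer/out \
    -mapper /bin/cat -reducer /usr/bin/wc
```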
(Single) reduce optimizations
This release includes many important optimizations to sort/merge in reduce tasks. HADOOP-3297 improves the fetching of many small map outputs. HADOOP-3365 eliminates unnecessary buffer copies during the merge phase. HADOOP-3429 improves Hadoop Streaming performance by buffering the I/O paths to streaming processes.
Archive tool
Quotas are complemented by ‘Hadoop archives’, which are a tool for users to manage their namespace consumption. A large number of files can be converted into a Hadoop archive using a Map/Reduce utility. A Hadoop archive is basically an HDFS directory with a small number of data files that consist of files from the original set concatenated together. An index stores the location of each file from the original set. Individual files in an archive can be accessed using a special URI with the ‘har’ scheme. Archives and their use are discussed in HADOOP-3188.
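A sketch of creating and reading an archive (the paths and archive name are illustrative):

```shell
# Pack everything under /user/sameer/logs into logs.har; this runs a
# Map/Reduce job and writes the archive under /user/sameer/archives:
hadoop archive -archiveName logs.har /user/sameer/logs /user/sameer/archives

# Files inside the archive are addressed with the 'har' scheme:
hadoop fs -ls har:///user/sameer/archives/logs.har
```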

Sameer Paranjpye