We are proud to announce that we used Apache Hadoop to set a new Gray sort record in Jim Gray's sort benchmark. We nearly doubled the rate of the previous Gray sort entry by sorting at 1.42 terabytes per minute. The previous record was 0.725 terabytes per minute.
Jim Gray's sort benchmark is a set of related benchmarks, each with its own rules. All of the sort benchmarks measure the time to sort different numbers of 100-byte records. The first 10 bytes of each record are the key and the rest is the value. The Gray sort measures the sort rate achieved while sorting at least 100 terabytes of data. The Minute sort measures the amount of data that can be sorted in less than a minute. There are two benchmark categories. The Daytona category requires the sort code to be a general-purpose sort. The Indy category only needs to sort 100-byte records with 10-byte keys. We used Hadoop Terasort with slightly different configurations in both categories.
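To make the record layout concrete, here is a minimal single-machine sketch in Python of sorting 100-byte records by their 10-byte keys. It is not the Hadoop Terasort code and says nothing about how the record run was configured; the function names (`split_records`, `sort_records`, `sort_rate_tb_per_min`) are hypothetical and chosen only for illustration.

```python
RECORD_SIZE = 100  # bytes per record, per the benchmark rules
KEY_SIZE = 10      # the first 10 bytes of each record are the key


def split_records(data: bytes) -> list[bytes]:
    """Cut a byte string into fixed-size 100-byte records."""
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input length must be a multiple of 100 bytes")
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def sort_records(data: bytes) -> bytes:
    """Sort records by the bytewise order of their 10-byte keys."""
    records = split_records(data)
    # Python's sort is stable, so records with equal keys keep their input order.
    records.sort(key=lambda record: record[:KEY_SIZE])
    return b"".join(records)


def sort_rate_tb_per_min(terabytes: float, minutes: float) -> float:
    """Sort rate as the Gray sort reports it: data volume divided by elapsed time."""
    return terabytes / minutes
```

With the figures quoted above, 1.42 / 0.725 is about 1.96, which is where "nearly doubled" comes from. At 1.42 terabytes per minute, the 100-terabyte minimum would take roughly 70 minutes; this excerpt does not state the exact data size or elapsed time of the run, so treat that as back-of-the-envelope arithmetic only.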
From: Hadoop at Yahoo! Sets New Gray Sort Record – The Yellow Elephant is Getting Faster