<center><i>by Peter Cnudde, VP of Engineering</i></center><figure class="tmblr-full" data-orig-height="500" data-orig-width="500"><img src="https://66.media.tumblr.com/f4ca9d4bdd4c46f8bcccc15b6e9b0c22/tumblr_inline_o233esaSG41t17fny_540.jpg" data-orig-height="500" data-orig-width="500"/></figure><p>It is hard to believe that 10 years have already passed since Hadoop was started at Yahoo. We initially applied it to web search, but since then, Hadoop has become central to everything we do at the company. Today, Hadoop is the de facto platform for processing and storing big data for thousands of companies around the world, including most of the Fortune 500. It has also given birth to a thriving industry, comprising companies that have built their businesses on the platform and continue to invest and innovate to expand its capabilities.</p><p>At Yahoo, Hadoop remains a cornerstone technology on which virtually every part of our business relies to power our world-class products and deliver user experiences that delight more than a billion users worldwide. Whether it is content personalization for increasing engagement, ad targeting and optimization for serving the right ad to the right consumer, new revenue streams from native ads and mobile search monetization, data processing pipelines, mail anti-spam, or search assist and analytics – Hadoop touches them all.</p><p>When it comes to scale, Yahoo still boasts one of the largest Hadoop deployments in the world. From a footprint standpoint, we maintain over 35,000 Hadoop servers as a central hosted platform, running across 16 clusters with a combined 600 petabytes of HDFS storage capacity, which allows us to execute 34 million compute jobs per month on the platform.</p><p>But we aren’t stopping there: we actively collaborate with the Hadoop community to push the scalability boundaries further and advance technological innovation. 
Historically, we used MapReduce to power batch-oriented processing, but we continue to invest in and adopt low-latency data processing stacks on top of Hadoop, such as Storm for stream processing, and Tez and Spark for faster batch processing.</p><p>What’s more, the applications of these innovations have run the gamut – from cool and fun features, like <a href="http://yahoohadoop.tumblr.com/post/128806138136/building-flickrs-magic-view-with-hbase-and-the" target="_blank">Flickr’s Magic View</a>, to one of our most exciting recent <a href="http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop" target="_blank">projects</a>, which combines Apache Spark and <a href="http://caffe.berkeleyvision.org/" target="_blank">Caffe</a>, allowing us to leverage GPUs to power deep learning on Hadoop clusters. This custom deployment bridges the gap between HPC (High Performance Computing) and big data, and is helping position Yahoo as a frontrunner in the next generation of computing and machine learning.</p><p>We’re delighted by the impact the platform has had on the big data movement, and can’t wait to see what the next 10 years have in store.</p><p>Cheers!</p>