Hadoop2010: Online Content Optimization



One of the most interesting problems we work on at Yahoo! is providing the most relevant content to our users. This involves tracking our users' interests and mining the ever-changing content pool to find what is relevant and popular for them. We also have to normalize and de-duplicate content to avoid serving redundant items. To solve these problems, we make extensive use of the Hadoop technology stack. Using Hadoop, we can scale model building to millions of items and users in near real time. We use HBase for point lookups and stores of these models, and Pig to express our workflows, so that map-reduce parallelism is abstracted away from the core processing logic.
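To make the normalize-then-de-duplicate step more concrete, here is a minimal, hypothetical sketch in map-reduce style. It is not Yahoo!'s actual pipeline. It assumes that two items count as duplicates when their text matches exactly after simple normalization (lowercasing, dropping punctuation, collapsing whitespace). The function names (`normalize`, `map_phase`, `reduce_phase`, `dedupe`) and the SHA-1 fingerprint are illustrative choices. The "mapper" keys each item by a content fingerprint, and the "reducer" keeps one canonical item per key, which mirrors how such a job could be laid out on Hadoop.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Illustrative normalization rules: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def map_phase(items):
    """Emit (fingerprint, item) pairs, like a mapper keyed on normalized content."""
    for item_id, body in items:
        fingerprint = hashlib.sha1(normalize(body).encode("utf-8")).hexdigest()
        yield fingerprint, (item_id, body)


def reduce_phase(pairs):
    """Group pairs by fingerprint and keep one representative per group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Assumption: the item with the smallest id is treated as the canonical copy.
    return sorted(min(values) for values in groups.values())


def dedupe(items):
    """Run the hypothetical map and reduce phases in-process."""
    return reduce_phase(map_phase(items))


if __name__ == "__main__":
    articles = [
        (1, "Hadoop Summit 2010 opens today!"),
        (2, "hadoop summit 2010   opens today"),
        (3, "HBase adds faster point lookups."),
    ]
    for item_id, body in dedupe(articles):
        print(item_id, body)
```

Items 1 and 2 normalize to the same string, so only item 1 survives. This approach catches only exact duplicates after normalization. Catching near-duplicates would require a similarity technique such as shingling with MinHash, which this sketch does not attempt.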

Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.