Hadoop2010: Hadoop for Genomics



The field of genomics is of increasing importance to research and medicine. As the physical cost of DNA sequencing continues to drop, biologists are collecting ever-larger data sets that require more sophisticated data processing. Hadoop is an excellent platform on which to build a consistent set of tools for genomics research. In this talk, Jeremy presents a general framework for working with genomic data in Hadoop and provides details on implementations of many common operations, including a novel mechanism for de novo DNA sequence assembly. He discusses how this open source genomics platform can be leveraged by researchers to reduce repeated effort and increase collaboration.


Media Production by BAYCAT, a non-profit community media producer that educates and employs underserved youth and adults in the digital media arts.