Blog Posts by Yahoo! Developer Network

  • Jonathan Gray & J-D Cryans – HBase Goes RealTime

    Jonathan Gray and Jean-Daniel Cryans introduce their latest release of HBase.

    Jonathan Gray is the co-founder and CTO of Streamy.com, which has been using HBase in production for about nine months. Jean-Daniel (J-D) Cryans is an HBase committer and Universite de Quebec graduate student, currently working as an HBase consultant for OpenPlaces.com. Together they've released HBase 0.20, the performance release, rewritten and re-architected for the Web.

    HBase is a storage system that's built on top of HDFS. The guiding philosophy of their release: to unjava-fy everything. Some of the major changes: new key format, new file format (HFile), new query API, new result API and optimized serialization, new scanner abstractions, and new concurrent LRU block cache.

    See for yourself.



  • MObStor: Yahoo!’s Unstructured Data Cloud

    Introduction

    Over the past fourteen years, Yahoo!'s properties have served a broad swath of the Internet. With several hundred million users across the globe and tens of thousands of page views every second, we've generated and served petabytes of content to our users worldwide.

    A number of Yahoo! products are content-centric, whether the content is user-generated, editorial, or sourced from our partners. That means a lot of data, structured and unstructured. We talked recently about Sherpa, a service we've built in-house to efficiently manage structured storage. Unstructured storage deals with data composed of streams of bytes that are treated as complete objects (e.g. images, videos, CSS and JavaScript libraries), and is an equally interesting area at Yahoo!.

    Unstructured Storage for the Internet

    I'm the Product Manager for MObStor, Yahoo!'s unstructured storage cloud, and I'm writing this to share our approach to unstructured storage.

    MObStor grew out of the

    Read More »from MObStor: Yahoo!’s Unstructured Data Cloud
  • Jangwoo Kim & Denis Sheahan – Scaling Hadoop

    "Scaling Hadoop for multi-core and highly threaded systems" is the full title of this presentation from Sun's Jangwoo Kim and Denis Sheahan.

    Here they present the basic architecture of CMT (chip multi-threading) processors, designed by Sun for maximum throughput, and then describe the work the team did using Hadoop and other virtualization technologies to help scale CMT. They also present a case study of a small Washington, DC company they worked with that's using Hadoop for email discovery.



  • Tom White – Running Hadoop in the Cloud

    Tom White, Cloudera engineer, Hadoop committer, and author of Hadoop: The Definitive Guide from O’Reilly and Yahoo! Press, talks about running Hadoop in EC2.

    He opens with a discussion of the Berkeley RAD Lab paper on cloud computing and walks us through a set of definitions to a discussion of the public cloud. He sees a realm of interesting possibilities: an apparently infinite resource; the elimination of user commitment; and the pay-as-you-go model, which enables elasticity. Tom describes the implementation of Hadoop in this landscape.



  • Yanpei Chen & Laura Keys: Energy Efficient Hadoop

    There are compelling economic reasons to run datacenters with energy efficient software, as UC Berkeley researchers Laura Keys and Yanpei Chen describe in this talk. Typically, datacenters are measured using the Power Usage Effectiveness (PUE) metric, but until now the role of software in the equation has been neglected.

    The speakers include energy use as a dimension of software performance alongside productivity and resource consumption metrics, and describe their experiments and findings on changing Hadoop software parameters to improve energy efficiency.



  • Jinesh Varia on Amazon Elastic MapReduce

    Amazon Web Services (AWS) evangelist Jinesh Varia presents Amazon's Elastic MapReduce, a web service that simplifies large-scale data processing for a growing ecosystem of AWS users. The product uses a hosted Hadoop framework running on the infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3), and addresses use cases that involve data mining, bioinformatics, financial simulation, file processing, and web indexing.



  • Open Source Bridge 2009

    General Notes

    The Open Source Bridge conference was held at the Oregon Convention Center in Portland from June 17 to June 19. Sessions covered a range of topics: from building and growing open source businesses to yoga and meditation ... but the focus was decidedly technical, with some great sessions on a number of different OSS projects at varying levels of detail. The first two days featured talks of interest to the Open Source community, while the last day was an unconference. After the day was done, the Yahoo! Developer Network crew hosted a 24x7 hacker lounge with WiFi, to alleviate midnight hacking withdrawal symptoms.

    I had the opportunity to attend an eclectic mix of sessions, and a few common threads emerged. As a Product Manager in the cloud computing group at Yahoo!, I'll focus on subjects that relate to the cloud, although there was no shortage of interesting discussion on a wide range of other subjects too.

    The Cloud:

    People in the Open Source Community are interested in

    Read More »from Open Source Bridge 2009
  • Christophe Bisciglia on The Growing Hadoop Community

    Cloudera co-founder Christophe Bisciglia takes a detailed look at the growth and evolution of Hadoop technology and its community over the past year. He then focuses on the layers of tools and services now being built for a variety of implementations across several sectors.

    Thanks to Cloudera for their support for Hadoop Summit 2009.



  • Juan Carlos Soto on the Sun Cloud and Hadoop

    Juan Carlos Soto from Sun leads the business and marketing group for Cloud Computing. Here he speaks about the Sun Cloud and how it's an ideal service for Hadoop users. Juan Carlos presents Sun's perspective on the benefits of cloud computing for developers across the industry, in a startup or a large enterprise environment.



  • Rod Smith on Hadoop in the Enterprise

    Rod Smith is an IBM Fellow who leads IBM's Internet Emerging Technology team. In this presentation, Rod discusses Hadoop from the enterprise perspective. His team works with customers and evaluates emerging technologies to see whether they fit the problems those customers are trying to solve.

    He believes Hadoop can provide the framework for a new class of applications that are on the horizon, apps that unlock insights from vast quantities of data.



