Blog Posts by Sumeet Singh

  • Apache HBase at Yahoo! – Multi-tenancy at the Helm Again

    By Francis Liu and Sumeet Singh

    In 2009-2010, Yahoo! saw unprecedented growth in the number of users coming onboard to its Apache Hadoop platform for their data processing and analytics needs. We attribute most of that success and growth in the user base to the introduction of multi-tenancy, security, and partitioned namespaces in Hadoop.

    As Hadoop and ecosystem components such as Apache Pig and Apache Oozie grew popular at Yahoo!, we needed a way to store mutable data and support random access to it, complementing the Apache Hadoop platform. Yahoo! had been using Apache HBase in isolated instances, most notably for the CORE personalization platform and for the web crawl cache at the time. However, the use of Apache HBase was limited to large projects that had the resources to operate dedicated HBase clusters.

    In 2012, Yahoo! developed multi-tenancy in Apache HBase to cater to a growing number of use cases where HBase was an excellent fit as part of its

  • Join Us for the 6th Annual Hadoop Summit in San Jose, CA


    Hortonworks and Yahoo! are pleased to host the 6th Annual Hadoop Summit, the leading conference for the Apache Hadoop community, to be held on June 26-27, 2013 at the San Jose Convention Center. The two-day event will feature many Apache Hadoop thought leaders, who will showcase successful Hadoop use cases, share development and administration tips and tricks, and educate organizations about how best to leverage Apache Hadoop as a key component in their enterprise data architecture. It will also be an excellent networking opportunity for developers, architects, administrators, data analysts, and data scientists interested in advancing and extending Apache Hadoop.

    Popular sessions include:

    • Applied Hadoop
    • Scaling Big Data Mining Infrastructure: the Twitter Experience
    • HDFS, What's New and Future
    • Past, Present and Future of Data Processing in Apache Hadoop
    • Analysing 1.4 Trillion Events with Hadoop
    • Hadoop Operations at LinkedIn
    • Enterprise Integration of Disruptive
  • Hadoop at Yahoo!: More Than Ever Before

    A lot changed at Yahoo! last year. We have new leaders, we gained millions in new audience*, we saw engagement gains from Social Bar, and we released several successful mobile apps such as Flickr and Yahoo! Mail. But with all that change, one thing has remained constant: our commitment to pioneering new ground for Hadoop.

    I was well aware of the rich legacy behind Hadoop at Yahoo! when I started in the Cloud Engineering Group about eight months ago. What I was perhaps not fully aware of was the talent and energy of our engineering team (watch the 2 min Hadoop Summit 2012 video), a team eager to push the scale and efficiency boundaries of Hadoop to deliver tangible business results for Yahoo!. We have really come together as a customer-focused group with tight alignment on our strategy, vision, and roadmap, and with a continued commitment to stay true to the Apache Software Foundation and contribute 100% of our development work back to the community.

    Hadoop at
