• Hadoop Summit 2011 – A Different Approach

    Hadoop Summit 2011 is over. If you saw this tweet, “#hadoopsummit planned for 1,500. upped on demand to 1,600. finally accommodated 1,700. ran out of space, good problem to have. :-),” then you probably got an idea of how exciting and mobbed the conference was this year. With folks dropping by from coast to coast, and quite a few from around the world, Hadoop Summit 2011 will quite likely be the year’s largest Hadoop gathering. More importantly, because of the passion of everyone who participated, it was also the best Hadoop gathering of the year, raising the bar yet again for Hadoop technical content and networking.

    During the Summit and since it ended, I have received questions from folks who attended the show and from some who couldn’t make it. In general, a lot of people were curious about what went into developing the Summit and the approach we took. I thought I’d take some time today to summarize my thoughts on this topic.

    Obviously, in conference planning, a lot of the

  • Fourth Annual Hadoop Summit: The Countdown Begins!

    On June 29, Yahoo! will host the 4th annual Hadoop Summit at the Santa Clara Convention Center. Hadoop Summit 2011 brings together some of the most influential thought leaders in the space - from Yahoo, Facebook, IBM, NetApp, and others.

    Jay Rossiter, Senior Vice President of the Yahoo! Cloud Platform Group, will open the show with a keynote on how Yahoo! is developing the next generation of Hadoop applications to handle big data, the important role that Hadoop plays in Yahoo!’s integrated technology ecosystem, and how wide industry adoption of Hadoop is benefiting the entire community.

    Also on the main stage, Facebook will discuss its use of Hadoop to power the Facebook Messages infrastructure, and IBM will discuss how it used Hadoop to power the Watson supercomputer.

    Additional conference highlights include some key sessions:

    * Next Generation Apache Hadoop MapReduce: Arun Murthy, Yahoo!’s lead architect on the Hadoop Map-Reduce development team, will lead a discussion on the next

  • Slides from eric14 talks @ #IbmBigData

    Hi Folks,

    Here are my slides from the IBM big data symposium. This was a good event. IBM announced a new release of their Apache Hadoop-based BigInsights platform. It is great to hear their commitment to Apache. Yahoo was there talking about our experiences with and uses of Hadoop. I got a lot of questions about why we invest in Hadoop, so let me point you back to my posts on that and on our commitment to Apache Hadoop (http://yhoo.it/e8p3Dd and http://yhoo.it/i9Ww8W).

    Thanks,
    E14

  • Hadoop Summit CFP closing tomorrow!

    Stack and I are the track organizers for the community track at the Hadoop Summit this year. The community track is for presentations on the roadmap, developments, and features in Apache Hadoop. So if you've added a new feature to Hadoop and want to publicize it at the world's largest and most important Hadoop conference, please submit it!

    http://developer.yahoo.com/events/hadoopsummit2011/

    The deadline is 6 May, which is tomorrow!

  • Call for participation in the Hadoop Summit Research Track

    Hadoop Summit is a great annual gathering of developers to talk about all things Hadoop. The attendance is great (we are expecting 2,000 this year); the presentations are excellent; and the hallway conversations are a great way to meet new people and come up with new ideas.

    This environment is especially great if you have a great idea that you would like to share with the community. You will have a great audience of knowledgeable developers that you can try to convince to help you to take your work to the next level. Doesn't it sound ... great!?!

    Milind and I are organizing the research and application track. If you have built some new framework on top of Hadoop or made Hadoop better, let us know. We will be selecting the most interesting results for the research and application track.

    General information for the Hadoop Summit is at http://hadoopsummit.org. You can submit an abstract for your presentation at http://developer.yahoo.com/events/hadoopsummit2011/presentationguidelines.html

  • HCatalog, tables and metadata for Hadoop

    Last month the HCatalog project (formerly known as Howl) was accepted into the Apache Incubator. We have already branched for a 0.1 release, which we hope to push in the next few weeks. Given all this activity, I thought it would be a good time to write a post on the motivation behind HCatalog, what features it will provide, and who is working on it.

    Why Did We Create HCatalog?

    Out of the box, Hadoop provides the HDFS file system for users to store their data. File systems are nice because they provide a simple interface. Users can easily copy data into the file system and run jobs against that data. However, for more complex data processing tasks, the file system abstraction is not rich enough. It forces users to know where data is located, what format it is stored in, how it is compressed, and what its schema is. Consider, for example, a Pig Latin script used to do ETL on raw web logs:


    A = load '/data/raw/ds=20110225/region=us/property=news' using PigStorage()
          as

  • Hadoop User Group meeting recap, March 2011

    More than 200 Hadoop developers and enthusiasts congregated on the Yahoo campus for the monthly HUG meeting on March 16th. As always, they were treated to some enlightening presentations in addition to good food and beverages.

    After the usual 30 minutes of socializing and networking, Milind Bhandarkar from LinkedIn kicked off the evening with a really enlightening talk on "Scaling Hadoop Applications." As a well-respected Hadoop expert and a founding member of the Hadoop team at Yahoo in 2005, Milind was able to articulate the issues and solutions very succinctly. His talk was especially interesting because he tied well-known theorems and laws of scalability to the ground realities of Hadoop clusters today.

    Here are the slides from Milind's talk.

    Following is the video of the presentation.

    This was followed by an interesting talk on "HDFS Federation" by Yahoo's Suresh Srinivas. HDFS Federation is a major feature slated to come out in the

  • Hadoop Summit 2011 – Registration Now Open!

    Calling all Hadoopers

    Yahoo! is pleased to announce that this year’s Hadoop Summit is scheduled for June 29th at the Santa Clara Convention Center. Registration for the event is now open and offers an early bird special of $125, a savings of nearly 30% on the full ticket price of $175. This ends on May 1st, so register now to take advantage of this great offer.

    Whether you are already running and managing a Hadoop installation, developing Hadoop-based applications or exploring how to adopt Apache Hadoop for your business, the summit provides a unique opportunity to gain deep insights into the world of Hadoop from the company that pioneered it. Learn about interesting and relevant real-world applications and find out about the latest Big Data research.

    The summit brings together some of the most influential speakers in the Hadoop space. Our full agenda provides many informative tracks for developers, administrators, managers and researchers. A

  • Apache Hadoop Innovation Award

    The Hadoop project won the top MediaGuardian Innovation award.

    A groundbreaking open source project has won the top prize at the 2011 MediaGuardian Innovation Awards.

    The judging panel described the Apache Hadoop project as the Swiss army knife of the 21st century and as having the potential to completely change the face of media innovation across the globe. Overall, the project was seen as a greater catalyst for innovation than WikiLeaks, the iPad, and a host of other suggested nominees.

    All of the Hadoop contributors should be very proud of this award. Sanjay Radia, Jakob Homan, and I attended in person as members of the Hadoop Project Management Committee to receive the award on behalf of the project.

    I've been working on Hadoop full time since the beginning and it has been a pleasure working with such bright and dedicated engineers. It takes a village to raise an elephant from a prototype that runs on a few nodes to the project that is disrupting the big data industry.

  • Next Generation of Apache Hadoop MapReduce – The Scheduler

    ## Introduction

    The previous post in this series covered the next generation of Apache Hadoop MapReduce in a broad sense, particularly its motivation, high-level architecture, goals, requirements, and aspects of its implementation.

    In the second post in a series unpacking details of the implementation, we’d like to present the protocol for resource allocation and scheduling that drives application execution on a Next Generation Apache Hadoop MapReduce cluster.

    ## Background

    Apache Hadoop must scale reliably and transparently to handle the load of a modern, production cluster on commodity hardware. One of the most painful bottlenecks in the MapReduce framework has been the JobTracker, the daemon responsible not only for tracking and managing machine resources across the cluster, but also for enforcing the execution semantics for all the queued and running MapReduce jobs. The fundamental shift we hope to effect takes these two complex and interrelated concepts and re-factors them into

    Read More »from Next Generation of Apache Hadoop MapReduce – The Scheduler
