Hive, an open-source data warehouse infrastructure built on top of Hadoop, was started at Facebook. In this talk, Namit Jain and Zheng Shao discuss how and why Facebook uses Hive. They present Hive's progress and roadmap and describe how the open source community can contribute to the evolution of Hive.
Hive is a system for managing and querying structured data built on top of Hadoop: it uses MapReduce for execution, uses HDFS for storage, and layers metadata on top of raw files.
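The following is a toy, single-process Python sketch (not Hive code) of the two ideas in the sentence above. A schema kept as metadata is applied to raw tab-delimited text at read time, and a `GROUP BY` count is executed as map, shuffle and reduce phases. The schema, column names and function names are hypothetical and chosen only for illustration; real Hive stores table metadata in its metastore and compiles queries into distributed MapReduce jobs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical table metadata: column names for a raw, tab-delimited file.
# (Illustrative only; not an actual Hive table or API.)
PAGE_VIEWS_SCHEMA = ["user_id", "page", "country"]


def parse_rows(lines: Iterable[str], schema: List[str]) -> Iterator[Dict[str, str]]:
    """Apply the schema to raw text lines at read time."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield dict(zip(schema, fields))


def map_phase(rows: Iterable[Dict[str, str]], key_col: str) -> Iterator[Tuple[str, int]]:
    """Emit (key, 1) for every row, as a mapper for COUNT(*) would."""
    for row in rows:
        yield row[key_col], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in groups.items()}


def count_by(lines: Iterable[str], schema: List[str], key_col: str) -> Dict[str, int]:
    # Roughly what a query like
    #   SELECT country, COUNT(*) FROM page_views GROUP BY country
    # computes, simulated in memory.
    return reduce_phase(shuffle(map_phase(parse_rows(lines, schema), key_col)))


if __name__ == "__main__":
    raw = [
        "1\t/home\tUS\n",
        "2\t/search\tIN\n",
        "3\t/home\tUS\n",
    ]
    print(count_by(raw, PAGE_VIEWS_SCHEMA, "country"))  # {'US': 2, 'IN': 1}
```

The point of separating the schema from the data is that the same raw files can be queried as a table without first converting them into a database-specific format.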
Advanced data warehousing is a *huge* priority for Facebook: in March 2008 the service was generating about 1TB of data per day, and by mid-2009 data production had grown to 10TB per day.