Posts by Alan Gates

Alan Gates: Architect, Yahoo! Grid Team; Pig PMC member; HCatalog mentor and committer; Apache member

HCatalog, tables and metadata for Hadoop

Last month the HCatalog project (formerly known as Howl) was accepted into the Apache Incubator. We have already branched for a 0.1 release, which we hope to push in the next few weeks. Given all this activity, I thought it would be a good time to write a post on the motivation behind HCatalog, what [...]

Pig and Hive at Yahoo!

Yahoo! has begun evaluating Hive for use as part of its Hadoop stack. Since, in many people’s minds, Hive and Pig are roughly equivalent and Pig Latin is very close to SQL, this has led to some confusion. Why are we interested in using both technologies? As we have looked at our workloads and analyzed [...]

Comparing Pig Latin and SQL for Constructing Data Processing Pipelines

I have been asked by users who are going to construct a data pipeline whether they should use Pig Latin or SQL. For those of you who are not familiar with Pig, it is a platform for analyzing large data sets. It is built on Hadoop and provides ease of programming, optimization opportunities and extensibility. [...]