- Christopher Olston et al. Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD 2008 [PDF].
This paper describes a declarative language called Pig Latin and a system called Pig for data processing on top of Hadoop, an open-source implementation of MapReduce.
As you read the paper, consider the following questions:
- What are the differences between Pig Latin and SQL? Why invent a new language?
- How are Pig Latin programs compiled into MapReduce jobs? In particular, how would you implement a join?
- What are the performance tradeoffs of compiling a declarative query into MapReduce jobs versus executing it with a conventional iterator-based query plan?