6.830/6.814: Database Systems
Spring 2021

This paper describes a declarative language called Pig Latin and a system called Pig for data processing on top of Hadoop, an open source implementation of MapReduce.

As you read the paper, consider the following questions:

  1. What are the differences between Pig Latin and SQL? Why invent a new language?
  2. How are Pig Latin programs compiled into MapReduce jobs? In particular, how would you implement a join?
  3. What are the performance tradeoffs of compiling a declarative query into MapReduce jobs versus executing it with a conventional iterator-based query plan?