6.5830/6.5831: Database Systems
Fall 2022
In this lecture, we will discuss Spark, a "cluster computing" framework with design goals similar to those of MapReduce, but with better performance, support for in-memory caching, and a more programmable interface.

Some questions to consider as you read:

  • How is the Spark computation model like MapReduce? How is it different?
  • What is the notion of a "resilient distributed dataset"? How does it help programmers write fault-tolerant programs?
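To make the second question concrete before reading, here is a minimal single-process sketch of the idea behind an RDD. It assumes a toy model: a dataset split into partitions, lazy transformations that only record how each dataset is derived from its parent (its lineage), optional in-memory caching, and recovery of a lost cached partition by replaying the lineage. The class `ToyRDD` and its methods `parallelize`, `compute`, and `lose_partition` are hypothetical illustrations, not Spark's API; `map`, `filter`, `cache`, and `collect` echo the names of Spark RDD operations, but nothing here is distributed.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API).

    Each ToyRDD stores its parent and the per-partition function that derives
    it from that parent. This lineage is enough to rebuild a lost partition.
    """

    def __init__(self, parent: Optional["ToyRDD"], fn: Callable[[List], List],
                 source: Optional[List[List]] = None):
        self.parent = parent    # lineage: which dataset this one is derived from
        self.fn = fn            # lineage: how to derive one partition from the parent's
        self.source = source    # only the base dataset holds raw input partitions
        self.cached = False     # set by cache(): keep computed partitions in memory
        self._store: dict = {}  # partition index -> cached partition contents

    @staticmethod
    def parallelize(partitions: List[List]) -> "ToyRDD":
        # Base dataset: its partitions come straight from the input.
        return ToyRDD(None, lambda part: part, source=partitions)

    def num_partitions(self) -> int:
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def map(self, f: Callable) -> "ToyRDD":
        # Transformations are lazy: they only record lineage, no work happens yet.
        return ToyRDD(self, lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self, lambda part: [x for x in part if pred(x)])

    def cache(self) -> "ToyRDD":
        self.cached = True
        return self

    def compute(self, i: int) -> List:
        # Serve partition i from memory if it is cached; otherwise rebuild it
        # by recursively replaying the lineage back to the input.
        if i in self._store:
            return self._store[i]
        if self.parent is None:
            part = self.source[i]
        else:
            part = self.fn(self.parent.compute(i))
        if self.cached:
            self._store[i] = part
        return part

    def collect(self) -> List:
        # Action: forces evaluation of every partition and concatenates them.
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a worker failure that drops one cached partition.
        self._store.pop(i, None)


base = ToyRDD.parallelize([[1, 2, 3], [4, 5, 6]])
squares = base.map(lambda x: x * x).cache()
evens = squares.filter(lambda x: x % 2 == 0)
print(evens.collect())  # [4, 16, 36]

squares.lose_partition(1)  # "crash": cached partition 1 is gone
print(evens.collect())     # [4, 16, 36]; partition 1 is recomputed from lineage
```

The design choice to notice is that the program never writes replicas or checkpoints of intermediate data: because every dataset is defined by a deterministic transformation of its parent, losing a partition costs only the recomputation of that partition.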