6.5830/6.5831: Lecture 21

In this lecture, we will discuss Spark, a "cluster computing" framework whose design goals are similar to MapReduce's, but which offers improved performance, in-memory caching of data, and easier programmability.

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.

Some questions to consider as you read:

How is the Spark computation model like MapReduce? How is it different?
What is the notion of a "resilient distributed dataset"? How does it help programmers write fault-tolerant programs?
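To make the second question concrete, here is a toy, single-process model of the idea, written in the spirit of the paper's log-mining example. It assumes a dataset is a list of partitions plus its lineage (the parent dataset and the per-partition transformation that produced it), so a lost in-memory partition can be rebuilt rather than restored from a replica. Everything here is hypothetical and is not Spark's API: the names `ToyRDD`, `from_partitions`, `compute`, and `lose_partition` are invented for illustration, and `persist` is eager here for simplicity, whereas in Spark it only marks a dataset for caching.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical, single-process model of an RDD (not Spark's API)."""

    def __init__(self, num_partitions: int,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None,
                 source: Optional[List[list]] = None):
        self.num_partitions = num_partitions
        self.parent = parent      # lineage: the dataset this one derives from
        self.fn = fn              # lineage: transformation applied per partition
        self.source = source      # base data, assumed to sit on stable storage
        self.cache: Dict[int, list] = {}  # in-memory partitions; may be lost

    @staticmethod
    def from_partitions(parts: List[list]) -> "ToyRDD":
        return ToyRDD(len(parts), source=parts)

    # Transformations only record lineage; nothing is computed yet.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(self.num_partitions, self,
                      lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self.num_partitions, self,
                      lambda part: [x for x in part if pred(x)])

    def compute(self, i: int) -> list:
        """Return partition i from the cache, or rebuild it from lineage."""
        if i in self.cache:
            return self.cache[i]
        if self.source is not None:
            return list(self.source[i])
        return self.fn(self.parent.compute(i))

    def persist(self) -> "ToyRDD":
        """Materialize every partition in memory (eager in this toy)."""
        for i in range(self.num_partitions):
            self.cache[i] = self.compute(i)
        return self

    def lose_partition(self, i: int) -> None:
        """Simulate a worker failure that drops one cached partition."""
        self.cache.pop(i, None)

    # Action: forces evaluation and returns all elements to the caller.
    def collect(self) -> list:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]


lines = ToyRDD.from_partitions([["ERROR a", "INFO b"], ["ERROR c", "WARN d"]])
errors = lines.filter(lambda s: s.startswith("ERROR")).map(str.upper).persist()
before = errors.collect()
errors.lose_partition(1)   # partition 1 disappears from memory...
after = errors.collect()   # ...and is recomputed from lineage on demand
print(before, after, before == after)
```

Running it prints `['ERROR A', 'ERROR C'] ['ERROR A', 'ERROR C'] True`. The design point the toy tries to capture is that recovery only replays the transformations for the lost partition, so the program never has to checkpoint or replicate intermediate results explicitly.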