- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.
Some questions to consider as you read:
- How is the Spark computation model like MapReduce? How is it different?
- What is the notion of a "resilient distributed dataset"? How does it help programmers write fault-tolerant programs?