
Reading Questions, Lecture 8 (10/4)

Papers: Parallel Database Systems, Optimizer Validation and Performance Evaluation in R*

These papers deal with issues in multiprocessor database systems. The DeWitt and Gray paper is a high-level summary of database architectures for parallelism, illustrating some of the techniques that can be used to exploit multiple processors. The R* system (not to be confused with R*-trees!) is an extension of System R that runs on multiple machines; its designers are particularly concerned with the additional cost of network I/O in query processing.
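DeWitt and Gray discuss three common ways of spreading a relation's tuples across the nodes of a shared-nothing machine: round-robin, hash, and range partitioning. The sketch below is a minimal illustration that models each node as an in-memory Python list. The function names and the use of `zlib.crc32` as the hash function are illustrative assumptions, not taken from the paper.

```python
from bisect import bisect_right
from operator import itemgetter
from zlib import crc32


def round_robin(tuples, n):
    """Send the i-th tuple to node i mod n (even spread, ignores values)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Send each tuple to node hash(key) mod n, so equal keys land together."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        # crc32 is deterministic across runs (Python's built-in str hash is salted).
        h = crc32(str(key(t)).encode())
        parts[h % n].append(t)
    return parts


def range_partition(tuples, boundaries, key):
    """Node i holds keys in [boundaries[i-1], boundaries[i]); len(boundaries)+1 nodes."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_right(boundaries, key(t))].append(t)
    return parts


emp = [(1, "a"), (12, "b"), (25, "c"), (7, "d")]
by_id = itemgetter(0)
print(round_robin(emp, 2))                    # [[(1, 'a'), (25, 'c')], [(12, 'b'), (7, 'd')]]
print(range_partition(emp, [10, 20], by_id))  # [[(1, 'a'), (7, 'd')], [(12, 'b')], [(25, 'c')]]
print(hash_partition(emp, 3, by_id))
```

Hash and range partitioning let a query that selects on the partitioning attribute be sent to only the nodes that can hold matching tuples (range partitioning also helps range predicates), while round-robin balances data evenly but forces every node to take part in most queries.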

Questions to consider:

  1. What's the difference between a parallel and a distributed database? What issues are different in one architecture versus the other? In what ways are the two architectures alike?
  2. Why do DeWitt and Gray advocate a shared-nothing architecture?
  3. In what ways must existing database architectures be modified to support multiprocessor environments? What new data layout issues are introduced? What new query processing challenges must be addressed?
  4. In System R*, which distributed join algorithm performs the best? Why? Do you think it would still perform better than the others in a modern database system?
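The network cost that concerns the R* designers can be made concrete with a toy calculation. The sketch below compares two generic strategies for joining a relation R stored at one site with a relation S stored at another, assuming the result is needed at S's site: ship all of R, or perform a semijoin in which S's distinct join keys are sent to R's site and only the matching R tuples come back. The `Relation` class, the byte widths, and the relation sizes are hypothetical, and the model ignores per-message overhead and round-trip latency; it is not a reconstruction of the join methods measured in the R* paper.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    rows: list       # tuples stored at one site
    key_index: int   # position of the join column in each tuple
    row_bytes: int   # assumed fixed size of one tuple on the wire
    key_bytes: int   # assumed size of one join-key value on the wire


def ship_whole_bytes(r: Relation) -> int:
    """Bytes sent when every tuple of R is shipped to S's site and joined there."""
    return len(r.rows) * r.row_bytes


def semijoin_bytes(r: Relation, s: Relation) -> int:
    """Bytes sent when S's distinct keys go to R's site and only matching R tuples return."""
    s_keys = {row[s.key_index] for row in s.rows}
    matches = sum(1 for row in r.rows if row[r.key_index] in s_keys)
    return len(s_keys) * s.key_bytes + matches * r.row_bytes


R = Relation([(k, "r") for k in range(1000)], key_index=0, row_bytes=100, key_bytes=4)
S_small = Relation([(k, "s") for k in range(50)], key_index=0, row_bytes=100, key_bytes=4)
S_full = Relation([(k, "s") for k in range(1000)], key_index=0, row_bytes=100, key_bytes=4)

print(ship_whole_bytes(R), semijoin_bytes(R, S_small))  # 100000 5200
print(ship_whole_bytes(R), semijoin_bytes(R, S_full))   # 100000 104000
```

The semijoin saves bandwidth only when few R tuples match; when every R tuple matches, the extra key traffic makes it slightly more expensive than shipping R outright.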

Samuel Madden (madden at csail dot mit dot edu)
Last modified: Thu Sep 30 18:34:01 EDT 2004