Reading Questions, Lecture 17 (11/10)

Papers: Introduction to XML, XML Query Optimization, X is for XQuery

These papers focus on querying "semistructured data". Often, when people use this term they simply mean XML, though there are a number of other forms of semistructured data, including HTML and RDF. Semistructured data refers to data that combines some fields of known structure (like the columns of a traditional database table) with unstructured content, such as text, images, or sound files. Thus, many of the documents that you work with every day are in fact semistructured: MP3 files, for example, have ID3 tags that are regular and form a kind of structure; research papers have authors, titles, and publication venues that are also a form of structure.
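To make the idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The document, its element names (papers, paper, title, author, abstract), and the summarize helper are all made up for illustration and do not come from the assigned papers. It shows regular fields (title, author) sitting next to free-form text with inline markup, and records where some fields are simply absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<papers>
  <paper year="2004">
    <title>An Example Paper</title>
    <author>A. Author</author>
    <author>B. Author</author>
    <abstract>Free-form text with <em>inline markup</em> and no fixed schema.</abstract>
  </paper>
  <paper>
    <title>Another Example</title>
    <author>C. Author</author>
  </paper>
</papers>
"""


def summarize(xml_text):
    """Pull the regular fields out of each <paper>, tolerating missing ones."""
    root = ET.fromstring(xml_text)
    rows = []
    for paper in root.findall("paper"):
        abstract = paper.find("abstract")
        rows.append({
            # Structured part: fields with a known name and meaning.
            "title": paper.findtext("title"),
            # Repeated elements: a paper may have any number of authors.
            "authors": [a.text for a in paper.findall("author")],
            # Optional attribute: None when the record omits it.
            "year": paper.get("year"),
            # Unstructured part: mixed text and markup, flattened to a string.
            "abstract": "".join(abstract.itertext()) if abstract is not None else None,
        })
    return rows


rows = summarize(DOC)
for row in rows:
    print(row)
```

Note that the second record has no year and no abstract, and the first has two authors. A relational table would need NULLs and a separate authors table to represent the same data, which is one angle to keep in mind for the first question below.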

As you read the papers, consider the following questions:

  1. In what ways does the data model used in XML (the "semistructured data model") differ from the relational data model?
  2. In what ways are query processing and optimization of XML documents more difficult than relational query processing?
  3. How does the "XML Query Optimization" paper exploit structure to improve the performance of XML queries?
  4. Does the XQuery language seem like a good proposal to you? How does it compare to SQL in terms of expressiveness, complexity, ease of use, and optimizability?
  5. Go back and look at the comments about XML in the "What goes around comes around" paper. Do you agree with Stonebraker's thoughts?

Samuel Madden (madden at csail dot mit dot edu)
Last modified: Sun Nov 7 13:29:21 EST 2004