XML Interview exercises

Introduction

In this set of exercises we look at the effect of modelling decisions on the difficulty of carrying out standard XML tasks, such as writing schemas, evolving schemas, and writing transformations and XQuery queries. We model a well-known real-life situation: an interview.

Author: Maarten Marx
Date: 2010-09-20

(1) Model Interviews

An interview consists of a sequence of question-answer alternations. In our interviews, each question can be followed by several answers. In this exercise we create DTDs for different modellings of such interviews.

Your DTD consists of three elements: I, Q and A. I is the root, and Q and A are XML elements that store questions and answers, respectively. Both Q and A have an attribute text, in which the text of the questions and answers is stored.

There are several ways to model such interviews. In each model, the tree of an interview looks different. You have to find at least five different modellings. For each of them, write the DTD and make an example document. Parts of the DTD that are common to all models don't have to be repeated. DTD rules for elements can be given as follows: Element-name -> regular expression over element-names, like OL -> LI+. For convenience and inspiration, we have given suggestive names to the models we would like to see:

  1. flat
  2. hierarchical
  3. the stick
  4. hierarchical-stick hybrid
  5. nested hierarchical

Here is an example of an XML tree (Q and A are siblings under I, as in the flat model):
            <?xml version="1.0" encoding="UTF-8"?>
            
            <I>
                <Q text="q1"/>
                <A text="a11"/>
                <A text="a12"/>
                
                <Q text="q2"/>
                <A text="a21"/>
            </I>
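
As a minimal sketch of what a complete answer for one model could look like, the example tree above can be given an internal DTD subset. The sketch assumes the flat reading (Q and A are siblings under I), that every question has at least one answer, and that the text attribute is required; none of these choices is prescribed by the exercise. In the rule notation used above, the first declaration reads I -> (Q, A+)+.

            <?xml version="1.0" encoding="UTF-8"?>
            <!-- Sketch only: flat model, assuming each question has at least one answer -->
            <!DOCTYPE I [
                <!ELEMENT I (Q, A+)+>
                <!ELEMENT Q EMPTY>
                <!ELEMENT A EMPTY>
                <!-- Assumed common part: both elements carry a required text attribute -->
                <!ATTLIST Q text CDATA #REQUIRED>
                <!ATTLIST A text CDATA #REQUIRED>
            ]>
            <I>
                <Q text="q1"/>
                <A text="a11"/>
                <A text="a12"/>
                <Q text="q2"/>
                <A text="a21"/>
            </I>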
            

(2) XSLT Transformations I

You want to publish your interviews on the web as XHTML files and you use XSLT for that. Create stylesheets for all five models. From XSLT, you only need to use xsl:apply-templates select= and xsl:template match=. You can use XPath 2.0 expressions, and of course XHTML tags. The output should look as follows:

     <p class="q">text of first question</p>
     <p class="a">text of first answer to first question</p>
     <p class="a">text of second answer to first question</p>
     ...
     <p class="q">text of second question</p>
     <p class="a">text of first answer to second question</p>
     ...
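
For the flat tree shown in exercise (1), a stylesheet restricted to the two permitted XSLT instructions might look like the sketch below. It relies on the built-in template rule for attribute nodes, which copies the attribute value to the output, so no other instruction is needed to emit the text. This is one possible approach for one model only; the other four models need different select expressions.

            <?xml version="1.0" encoding="UTF-8"?>
            <!-- Sketch only: flat model; the XHTML namespace declaration is omitted for brevity -->
            <xsl:stylesheet version="2.0"
                            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

                <!-- Visit questions and answers in document order -->
                <xsl:template match="I">
                    <xsl:apply-templates select="Q | A"/>
                </xsl:template>

                <!-- The built-in rule for attributes outputs the value of @text -->
                <xsl:template match="Q">
                    <p class="q"><xsl:apply-templates select="@text"/></p>
                </xsl:template>

                <xsl:template match="A">
                    <p class="a"><xsl:apply-templates select="@text"/></p>
                </xsl:template>

            </xsl:stylesheet>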
 

(3) XSLT Transformations II

Now you want to publish again in XHTML, and again with XSLT, but now the output looks different. You want to make a table with two columns, with questions in the first column and their answers in the second, so that each question together with its answers occupies one row. It looks nice if you structure the answers as well, e.g. using a list. Again, make the transformations for all five models.
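
For the example interview of exercise (1), the target could look like the sketch below. The use of ul for the answer list and the absence of a header row are assumptions; the exercise only asks for two columns with one row per question.

            <!-- Sketch only: one possible target table for the example interview -->
            <table>
                <tr>
                    <td>q1</td>
                    <td>
                        <ul>
                            <li>a11</li>
                            <li>a12</li>
                        </ul>
                    </td>
                </tr>
                <tr>
                    <td>q2</td>
                    <td>
                        <ul>
                            <li>a21</li>
                        </ul>
                    </td>
                </tr>
            </table>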

The exercise might be easier if you first do the next XQuery exercise.

If you don't like this exercise, you can do the following transformation instead: add an id attribute to each question and an idref attribute to each answer. The value of the idref attribute of an answer should be the value of the id attribute of the corresponding question. An obvious candidate for the value of the id attribute of the i-th question is ...i. There are many ways to get that value, for instance by counting how many questions were asked before it.

(4) XQuery

You want to create an XML database of question-answer pairs, and do that with XQuery. Below is the skeleton of the query. You must give the two XPath 2.0 expressions that define the questions and their corresponding answers. They are indicated in the skeleton by Q-XPath and A-XPath.

            (: Q-XPath and A-XPath are placeholders for the expressions you must supply. :)
            for $q in Q-XPath return
            <question text='{$q/@text}'>{
                for $a in $q/A-XPath return
                <answer text='{$a/@text}'/>
            }</question>
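
Presumably the query should yield the same result whichever model stores the interview. For the example interview of exercise (1), that result would be the following sequence of elements (whitespace and quote style may differ between processors):

            <question text="q1">
                <answer text="a11"/>
                <answer text="a12"/>
            </question>
            <question text="q2">
                <answer text="a21"/>
            </question>

Comparing your query's output against this fragment for each of the five example documents is a quick way to check the two XPath expressions.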
                   
        

(5) Changing the model

We have a new interview corpus from an interviewer with a particular habit: she always wants to be the last one to speak. Thus her interviews satisfy the rule: the last question does not have an answer.

For each of the five modellings do the following:

  1. Change your DTD so that it enforces the new rule.
  2. If you can't, give an intuitive argument why not.
  3. For those models for which you couldn't write the DTD, give an argument why the new language is still a regular tree language.
  4. Check out the language RELAX NG on the compact syntax tutorial page. Create types and use them to specify the new rule in RELAX NG.
  5. Check out the notion of ancestor guarded subtree exchange (AGSE). Which of the modellings with the new feature are not closed under ancestor guarded subtree exchange? Provide the counterexamples.