Other XQuery benchmarks

Below is a complete (to our knowledge) list of the XQuery benchmarks proposed so far in the XQuery research community:

Application benchmarks:
  • XMach-1 (2000)
  • XMark (2001)
  • X007 (2001)
  • XBench (actually a collection of four benchmarks: TC/SD, DC/SD, TC/MD and DC/MD; 2002)
Micro-benchmarks:
  • Michigan benchmark (MBench)

We started running all these benchmarks on several XQuery engines and found that all of them, except XMark, contain syntactically incorrect queries. We fixed these errors and present the corrected queries below, together with a detailed description of the changes we made.

This being done, we ran the benchmarks with the help of XCheck on Galax, SaxonB, Qizx/Open and MonetDB/XQuery. Check the results.

For more details about this work read An Analysis of the Current XQuery Benchmarks.

Changes made to the benchmark queries

Fixing the syntax

In order to run the benchmarks (XMach-1, X007, XBench, Michigan) on current XQuery engines, we had to ensure that their queries conform to the current W3C specification of the XQuery language [XQuery 1.0: An XML Query Language. W3C Candidate Recommendation 3 November 2005. http://www.w3.org/TR/xquery/].

This implied:
  • rewriting the queries in the new XQuery syntax (XMach-1, Michigan),
  • fixing typos (Michigan),
  • fixing type checking errors (X007, XBench, Michigan).
In order to silence the (optional) static type checking we changed the queries to explicitly apply:
  • functions that test for the cardinality of a sequence (e.g. fn:zero-or-one),
  • type convertors.
Note that these modifications did not change the query semantics. We also tried to keep the query syntax as close as possible to the one given by the authors of each benchmark. Disclaimer: all the errors that we fixed were detected by running the benchmarks on four XQuery engines: SaxonB, Galax, MonetDB/XQuery, and Qizx/Open. There might still be other errors that these engines did not detect.

Specifying the input data

The input of an experiment run on XCheck consists of: (i) a list of XQuery engines; (ii) a list of benchmark documents (usually scaled by size); and (iii) a list of benchmark queries. For each engine and each document, XCheck runs all the provided queries. In this way XCheck implements the data scalability test. To fit this scenario and to keep the queries independent of local document names, the fn:doc() function is invoked in the queries with no argument. At query execution time, XCheck fills in the name of the queried document in the fn:doc() call. This is the only deviation from the current XQuery specification that we allow.
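The run scheme above can be sketched in a few lines of Python. This is only an illustration of the idea, not XCheck's actual code: the helper names fill_doc_call and plan_runs are hypothetical, and the sketch assumes that the document name is inserted by plain text substitution.

```python
import re
from itertools import product

# Matches an argument-less call such as fn:doc() or doc( ),
# but not a call that already has an argument, e.g. fn:doc("x.xml").
_EMPTY_DOC_CALL = re.compile(r"\b((?:fn:)?doc)\(\s*\)")


def fill_doc_call(query: str, document_uri: str) -> str:
    """Insert document_uri into every argument-less fn:doc() call.

    Hypothetical helper: assumes simple textual substitution.
    """
    return _EMPTY_DOC_CALL.sub(
        lambda m: f'{m.group(1)}("{document_uri}")', query
    )


def plan_runs(engines, documents, queries):
    """Yield one (engine, document, runnable query) triple per run.

    Engines form the outer loop, documents the middle loop and
    queries the inner loop, so every query runs on every document
    for every engine.
    """
    for engine, document, query in product(engines, documents, queries):
        yield engine, document, fill_doc_call(query, document)
```

Because the query text never names a concrete file, the same query list can be reused unchanged for every document size in the scalability test.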

Querying more documents (fixed number)

A query might access more than one document via the fn:doc() function. As in the case described above, we would like to keep the queries independent of local document names, which makes it easier to test the data scalability of the benchmark. The difference from the previous case is that we now iterate not over single documents but over groups of a fixed number of documents. For example, suppose a query accesses doc1 and doc2. When scaling, we iterate through a list of pairs (doc1, doc2) of documents. To do this, we create an XML input document for XCheck that contains the URIs (name, local directory path, etc.) of the documents that appear together in the same query. Each iteration has its own input document (which, in our example, contains the URIs of two documents: doc1 and doc2). We then query this input document to obtain the corresponding document nodes. This is computed in a preamble that we add to each query.
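A minimal Python sketch of this setup follows. It is an assumption-laden illustration rather than the format XCheck really uses: the element names input and doc, the variable names $doc1 and $doc2, the file names, and both helper functions are hypothetical. The generated preamble consists of XQuery let-clauses that would be followed by "return" and the benchmark query.

```python
import xml.etree.ElementTree as ET


def build_input_document(uris):
    """Serialize one group of document URIs as a small XML input document.

    The element names <input> and <doc> are assumptions of this sketch.
    """
    root = ET.Element("input")
    for uri in uris:
        ET.SubElement(root, "doc").text = uri
    return ET.tostring(root, encoding="unicode")


def build_preamble(group_size):
    """Return XQuery let-clauses binding $doc1..$docN to document nodes.

    fn:doc() is left without an argument so that the URI of the input
    document can be filled in at execution time.
    """
    clauses = ["let $input := fn:doc()/input"]
    for i in range(1, group_size + 1):
        clauses.append(f"let $doc{i} := fn:doc(fn:string($input/doc[{i}]))")
    return "\n".join(clauses)


# One input document per scaling step: pairs (doc1, doc2) of growing size.
# The file names are made up for the example.
pairs = [("a-1MB.xml", "b-1MB.xml"), ("a-10MB.xml", "b-10MB.xml")]
input_documents = [build_input_document(pair) for pair in pairs]
```

With this arrangement the query body refers only to $doc1 and $doc2, so scaling the experiment means supplying a different input document and leaving the query untouched.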

Multi-document scenario

XMach-1 and two of the XBench benchmarks (TC/MD and DC/MD) are multi-document scenario benchmarks, i.e. a query is executed against a large collection of documents (of unfixed size) at once, without explicitly accessing each of them in the query via the fn:doc() function. To run this scenario with XCheck, we create an XML input document that contains the list of documents in the collection. We then query this document to obtain a sequence of document nodes that the benchmark queries can use. The computation of this sequence is added as a preamble to each query.

Detailed descriptions of the benchmark queries and their changes are below.
  • MBench
    • queries in total: 46
    • modified queries: 9
  • XMach-1
    • queries in total: 8
    • modified queries: 8
  • X007
    • queries in total: 22
    • modified queries: 22