Resources

A collection of resources for the WiQA pilot task.

Links

XML Collections

The corpora available for download below are derived from Ludovic Denoyer's conversion of Wikipedia dumps (see the technical report here, which should be cited in any published work that uses the data). Like Wikipedia itself, the collections are distributed under the GNU Free Documentation License (GFDL). In essence, this means that you can use the collections freely, but any derivatives must be made available under the same license. By downloading the collections below you agree to these conditions.

An example article from the English collection: 113707.xml. The annotation includes the Named Entity class of each article's topic, the sections, paragraphs and sentences of the text, and the links between articles.

English, 939M compressed (.tar.gz)

Spanish, 117M compressed (.tar.gz)

Dutch, 127M compressed (.tar.gz)
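
Because the archives are large, it can be convenient to read the articles directly from the compressed file rather than unpacking everything first. The following is a minimal Python sketch of that idea, not part of the official distribution. The function name iter_articles and the archive path in the usage note are hypothetical, and the code assumes only that the archive contains files ending in .xml; it makes no assumption about element names inside the articles.

```python
import tarfile
import xml.etree.ElementTree as ET


def iter_articles(archive_path):
    """Yield (member_name, root_element) for each .xml file in a .tar.gz archive.

    Hypothetical helper: the internal layout of the archive is not assumed,
    only that article files carry an .xml extension.
    """
    # "r:gz" reads a gzip-compressed tar without unpacking it to disk first.
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".xml")):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            try:
                root = ET.parse(handle).getroot()
            except ET.ParseError:
                # Skip files that are not well-formed rather than abort the scan.
                continue
            yield member.name, root
```

For example, `for name, root in iter_articles("english.tar.gz"): print(name, root.tag)` would list each article file with its root element, where "english.tar.gz" stands in for whatever name the downloaded archive has.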

Test and development topics

Below are the topics for the official submissions (which are due on June 30). Each language task consists of 50 topics correctly tagged as PERSON, LOCATION or ORGANIZATION in the XML data collections, plus several optional topics that either fall outside these three categories or are not tagged correctly in the XML collections. The optional topics are marked as such in the topic files below and can be ignored by systems without penalty. If submitted, they will also be assessed, but evaluation measures for them will be calculated separately.

Development topics for the English monolingual task are also available:

Results of the WiQA 2006 assessments

The anonymized results of the assessment of the 20 submitted runs are available:

The format of the XML files in the archive is similar to the submission format: for each topic there is a ranked list of snippets, with assessments (supported, important, novel, not_repeated) indicated using attributes of the snippet element (values 0 and 1, meaning true and false).
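
As a rough illustration of working with these files, the Python sketch below tallies the raw values of the four assessment attributes over all snippet elements in one file. It relies only on what is described above (a snippet element carrying the attributes supported, important, novel and not_repeated); the function name tally_assessments and the file name in the usage note are hypothetical, and the names of any enclosing elements are deliberately not assumed. The values are counted as they appear, without interpreting them as true or false.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Attribute names as listed in the description of the assessment files.
ASSESSMENT_ATTRIBUTES = ("supported", "important", "novel", "not_repeated")


def tally_assessments(xml_path):
    """Count raw attribute values over all snippet elements in one file."""
    counts = {name: Counter() for name in ASSESSMENT_ATTRIBUTES}
    root = ET.parse(xml_path).getroot()
    # iter() walks the whole tree, so the nesting of snippets does not matter.
    for snippet in root.iter("snippet"):
        for name in ASSESSMENT_ATTRIBUTES:
            value = snippet.get(name)
            if value is not None:
                counts[name][value] += 1
    return counts
```

Calling `tally_assessments("run01.xml")` on one file from the archive (the file name here is a placeholder) returns, for each attribute, how many snippets carry each value.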