To support historic document retrieval in general, and retrieval in 17th-century Dutch texts in particular, we are making available a document collection, together with topics and relevance judgments (qrels) for those topics:
- The Braun corpus, consisting of 393 documents from two historic sources: the Antwerpse Compilatae (1609) and the Gelders Land- en Stadsrecht (1620). The main reference for the corpus is (Braun, 2002).
- A set of ad hoc topics plus assessments. The set has 60 topics, with relevance judgments for 25 of them (for 21 of these, at least one relevant document was found). The topics were written and assessed by historians familiar with the language of the period. The main reference for the ad hoc topics is (Braun, 2002).
- A set of known-item topics plus assessments. The set has 25 topics, each with a unique relevant page in the corpus. The topics were written by non-experts and formulated in modern Dutch. The main reference for the known-item topics is (Adriaans,
Details about the development of these resources and their use for evaluation purposes can be found in the following publications:
- F. Adriaans. Historic Document Retrieval: Exploring Strategies for 17th Century Dutch, M.Sc. Thesis, Universiteit van Amsterdam, 2005.
- L. Braun. Information Retrieval from Dutch Historic Corpora, M.Sc. Thesis, IKAT, Universiteit Maastricht, 2002.
- M. Koolen. Constructing Language Resources for Historic Document Retrieval, M.Sc. Thesis, Universiteit van Amsterdam, 2005.
- M. Koolen, F. Adriaans, J. Kamps, and M. de Rijke. A Cross-Language Approach to Historic Document Retrieval. In Advances in Information Retrieval: 28th European Conference on Information Retrieval (ECIR 2006), Lecture Notes in Computer Science. Springer Verlag, Heidelberg, 2006.
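
As a rough illustration of how the known-item topics might be used for evaluation: because each known-item topic has exactly one relevant page, a common measure for such topics is mean reciprocal rank (MRR). The sketch below is not taken from the publications above. The topic and page identifiers are hypothetical, and the distribution format of the topics and qrels is not assumed here, so both inputs are plain Python dictionaries.

```python
from typing import Dict, List


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, str],
) -> float:
    """Mean reciprocal rank over known-item topics.

    rankings: topic id -> ranked list of retrieved page ids (best first).
    relevant: topic id -> the single relevant page id for that topic.
    A topic whose relevant page is not retrieved contributes 0.
    """
    if not relevant:
        raise ValueError("no topics to evaluate")
    total = 0.0
    for topic_id, target in relevant.items():
        ranked = rankings.get(topic_id, [])
        for rank, page_id in enumerate(ranked, start=1):
            if page_id == target:
                total += 1.0 / rank
                break
    return total / len(relevant)


# Hypothetical identifiers, for illustration only.
relevant = {"ki-01": "page-017", "ki-02": "page-204"}
rankings = {
    "ki-01": ["page-003", "page-017"],  # relevant page at rank 2
    "ki-02": ["page-204", "page-090"],  # relevant page at rank 1
}
mrr = mean_reciprocal_rank(rankings, relevant)
print(mrr)  # 0.75
```

The ad hoc topics can have several relevant documents per topic, so they would need a different measure, such as average precision.
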
If you use these resources, please let us know. If you publish results obtained using the resources made available here, please cite the paper by Koolen, Adriaans, Kamps, and de Rijke (2006).