Resources for WebCLEF 2007
Data access
The data listed below can be accessed by submitting an online application form. Please contact jijkoun@science.uva.nl if you have any questions about the resources below.
2007 test topics
- WebCLEF2007_topics.xml - 30 test topics for the WebCLEF 2007 task
2007 test documents
- WebCLEF2007_docs.xml.zip (2.9M) - XML description of the web documents that make up the data collection for the WebCLEF 2007 task
- WebCLEF2007_docs.tgz (3G) - tar-gzipped archive containing originals (HTML, PDF, PS, etc.) and text versions of the web documents of the collection.
- WebCLEF2007_docs_txt_only.tgz (240M) - tar-gzipped archive containing only the text versions of the documents. (Note that this archive is a subset of WebCLEF2007_docs.tgz.)
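Since the document archives are large, it may be useful to look at the first few entries before unpacking everything. The following is a minimal Python sketch that does this with the standard tarfile module. The function name peek_archive and its limit parameter are illustrative assumptions, not part of the WebCLEF distribution, and nothing is assumed here about the internal directory layout of the archives.

```python
import tarfile


def peek_archive(archive_path, limit=10):
    """Return (name, size in bytes) for the first `limit` regular files.

    Hypothetical helper: reads the gzipped tar sequentially and stops
    early, so the archive does not have to be unpacked to disk.
    """
    entries = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            entries.append((member.name, member.size))
            if len(entries) >= limit:
                break
    return entries
```

For example, once the archive has been downloaded, peek_archive("WebCLEF2007_docs_txt_only.tgz", limit=5) would return the names and sizes of the first five files it contains.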
2007 anonymized submissions
- WebCLEF2007_submissions.tgz (3.5MB) - 12 anonymized runs submitted by the task participants, and a baseline "google run".
2007 manual assessments
- WebCLEF2007_assessments.xml.gz (86KB) - results of human assessments of the 13 runs (12 submissions and one "google run"). For each topic, the file lists nuggets and spans marked as "relevant" by human assessors.
2007 evaluation software
- WebCLEF2007_evaluate_run.pl (6KB) - Perl script used to evaluate runs. Takes one XML run file (see WebCLEF2007_submissions.tgz above) as parameter. Requires topic and assessment files (see above).
2007 system
- webclef2007_system_uva.zip (12KB) - Perl source code of the University of Amsterdam's system that took part in WebCLEF 2007.