Resources
This page contains information about some of the software and resources produced and released by members of the Information and Language Processing Systems group.
- Answer type classification: guidelines for answer type classification plus 1,371 classified questions
- Arabic blogs dataset: 12,000 Arabic blogs with over 120,300 posts
- Blogs and wishlists: a collection of blogs and the wishlists of their bloggers
- Comment spam in weblogs: a toy collection of comment spam in blogs
- Concept selection benchmarks for concept-based video retrieval
- DBpedia annotations: a set of user queries annotated with DBpedia concepts (in Dutch), along with the features extracted for and used in the paper "Learning Semantic Query Suggestions"
- Historic document retrieval resources for 17th-century Dutch: test collections to support historic document retrieval
- Information retrieval resources for Bahasa Indonesia: a stemmer, a stop word list, and two test collections
- Timex annotation system: a modular system for recognition and interpretation of temporal expressions in English text, producing TIMEX2 annotations
- Web FAQ data: the data used in the paper "Retrieving Answers from Frequently Asked Question Pages on the Web"
- Weblog post moods