Large volumes of written, image, and video material are stored across the world in archives, providing access to unique material about our history and culture.
However, users of archives have difficulty accessing this material because only sparse metadata descriptions are available for it. One promising direction is to create thematic groupings of archival material automatically.
In this project, we found that automatically grouping material by the similarity of its metadata descriptions does not produce groupings that reflect those preferred by human annotators.
These results are described in the paper:
"A Social Bookmarking System to Support Cluster Driven Archival Arrangement" by Marc Bron, Shenghui Wang, Titia van der Werf, and Maarten de Rijke, published at IIiX 2014.
To facilitate the development and evaluation of clustering algorithms for archival data, we make the annotations developed within the project available below. See the readme for details.