This page contains the dataset that was created for the WSDM 2012 paper "Adding Semantics to Microblog Posts" by Edgar Meij, Wouter Weerkamp, and Maarten de Rijke. In the paper, we evaluate various methods for automatically identifying concepts (in the form of Wikipedia articles) that are contained in or meant by a tweet.
This release contains the tweets that we used, as well as the manual annotations, i.e., links to Wikipedia articles. More information on this dataset can be found here. If there is sufficient interest, we will also release the extracted features that were used in the paper.