Dataset is available
The Entity track, like several other TREC tracks, will use the ClueWeb09 dataset, which has now been officially released. The full collection consists of about 1 billion web pages in 10 languages. For the first year of the Entity track we will use the smaller “Category B” subset, which contains about 50 million English pages.
Tags: Data