Dataset for Contrastive Theme Summarization

This collection consists of articles from The New York Times, assembled for the SIGIR 2015 paper “Summarizing Contrastive Themes via Hierarchical Non-Parametric Processes”. The New York Times Corpus dataset (http://data.nytimes.com/) contains over 1.8 million articles written and published between January 1, 1987 and June 19, 2007. Over 650,000 of these articles have manually written summaries. In this collection, we use only a subset of the Opinion column articles published during 2004–2007. Due to copyright restrictions, we release only the docID of each article in our dataset.

NYT dataset