Gen&Topic data

The Gen&Topic evaluation set is an Arabic-English evaluation set for machine translation (MT). It has controlled topic-genre distributions, which makes it possible to disentangle the effects of topic and genre on MT. The data set is described in the ACL 2015 short paper "What’s in a Domain? Analyzing Genre and Topic Differences in Statistical Machine Translation". The data set can be downloaded here.