Information Retrieval Resources for Bahasa Indonesia

To support document retrieval in Bahasa Indonesia, we are making available a Porter stemmer for the language, a stop word list, and two document collections, the Kompas online collection and the Tempo online collection, together with topics and relevance judgments (qrels) for those topics.
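
As a rough illustration of how a stop word list is typically applied before indexing or querying, the following Python sketch filters tokens against such a list. It assumes the list is plain text with one word per line, which is an assumption about the file format and not something stated here. The function names and the three sample words are hypothetical stand-ins, not the contents of the distributed list.

```python
import io


def load_stopwords(lines):
    """Build a stop word set from an iterable of lines (assumed: one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split on whitespace, and drop tokens in the stop word set."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


# Illustrative stand-in for the real list; in practice, pass an open file handle.
sample_list = io.StringIO("yang\ndan\ndi\n")
stopwords = load_stopwords(sample_list)

tokens = remove_stopwords("Harga beras dan gula naik di Jakarta", stopwords)
print(tokens)  # ['harga', 'beras', 'gula', 'naik', 'jakarta']
```

The whitespace tokenization is kept deliberately simple; real text would also need punctuation handling, and stemming would normally follow this step.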

Details about the development of these resources and their use for evaluation can be found in the following publications:

  • F.Z. Tala. A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia. M.Sc. Thesis, University of Amsterdam, 2003.
  • F. Tala, J. Kamps, K. Müller, and M. de Rijke. The Impact of Stemming on Information Retrieval in Bahasa Indonesia (Abstract). In: 14th Meeting of Computational Linguistics in the Netherlands (CLIN-2003), 2003.

If you use these resources, please let us know. If you publish results obtained using the resources made available here, please cite the Tala, Kamps, Müller and de Rijke paper listed above.