information and language processing systems
isla, university of amsterdam

WikiXML: Downloading and installing

The collections can be downloaded from here under the GNU Free Documentation License (GFDL).

Full installation

The full installation requires a MySQL database server; the database dumps were tested with MySQL 4.1 and MySQL 5. Note: the max_allowed_packet setting of the MySQL server should be increased to at least 16 MB.
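
For reference, 16 MB is 16 * 1024 * 1024 = 16777216 bytes. The limit can be raised permanently with max_allowed_packet=16M in the [mysqld] section of the server's my.cnf, or at runtime with SET GLOBAL. The sketch below is only an illustration and not part of the distribution: set_packet_statement and oversized_members are hypothetical helper names, and it assumes (this page does not say so) that each page is sent to the server as one statement, so an XML file larger than the limit would be rejected.

```python
import zipfile

# 16 MB expressed in bytes: the suggested minimum for max_allowed_packet.
MIN_PACKET_BYTES = 16 * 1024 * 1024  # 16777216


def set_packet_statement(limit_bytes=MIN_PACKET_BYTES):
    """Return a MySQL statement that raises max_allowed_packet at runtime.

    SET GLOBAL needs administrative privileges and only affects
    connections opened after it has been executed.
    """
    return "SET GLOBAL max_allowed_packet = %d;" % limit_bytes


def oversized_members(zip_paths, limit_bytes=MIN_PACKET_BYTES):
    """Yield (archive, member, size) for XML files larger than limit_bytes.

    Sizes are the uncompressed sizes recorded in each zip directory; the
    SQL statement actually sent to the server is somewhat larger after
    string escaping, so files close to the limit deserve attention too.
    """
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for info in archive.infolist():
                if info.filename.endswith(".xml") and info.file_size > limit_bytes:
                    yield path, info.filename, info.file_size


if __name__ == "__main__":
    import glob

    print(set_packet_statement())
    # Hypothetical check, run in the directory holding the downloaded archives.
    for path, name, size in oversized_members(sorted(glob.glob("p*.zip"))):
        print("%s: %s is %d bytes" % (path, name, size))
```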

Each language has its own collection; in addition, there is a table containing cross-lingual links and an auxiliary installation script.

In order to install all data for a given language:

  1. Download the zipped XML files for the selected language (e.g., wikixml-en-20061104/p00nnnnn.zip, ... , p77nnnnn.zip for English).
  2. Download the database files for the selected language (e.g., wikixml-en-20061104-index.sql.gz).
  3. Decompress the downloaded wikixml-en-20061104-index.sql.gz (e.g., with gunzip) to get the file wikixml-en-20061104-index.sql.
  4. Create a MySQL database (e.g., wikixml_en) and load the collection index into it by running the following commands in the mysql command-line client:
    create database wikixml_en;
    use wikixml_en;
    source wikixml-en-20061104-index.sql
  5. Download the installation script wikixml_insertxml.py and run it to insert the XML content of all pages into the database. The script writes SQL statements to standard output, which are piped into the mysql client:
    python wikixml_insertxml.py p*.zip | mysql wikixml_en
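
The pipe in step 5 works because the script prints SQL text that the mysql client then executes. The following is a minimal sketch of that pattern only; it is not the distributed wikixml_insertxml.py. The table name pages_xml and its columns page_id and xml are hypothetical (the real schema is defined by the index dump), and the XML files are assumed to be UTF-8 encoded.

```python
import os
import sys
import zipfile

# Hypothetical target table; the real schema comes from the index dump.
TABLE = "pages_xml"

# Characters that must be escaped inside a single-quoted MySQL string literal.
_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\x1a": "\\Z",
}


def sql_quote(text):
    """Return text as a single-quoted, escaped MySQL string literal."""
    return "'" + "".join(_ESCAPES.get(ch, ch) for ch in text) + "'"


def insert_statements(zip_paths):
    """Yield one INSERT statement per nnn.xml member of the given archives."""
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                stem, ext = os.path.splitext(os.path.basename(name))
                if ext != ".xml" or not stem.isdigit():
                    continue  # skip directory entries and unrelated files
                xml = archive.read(name).decode("utf-8")
                yield "INSERT INTO %s (page_id, xml) VALUES (%d, %s);" % (
                    TABLE, int(stem), sql_quote(xml))


def main(paths):
    # Tell the server that the statements that follow are UTF-8 text.
    # If the local terminal is not UTF-8, set PYTHONIOENCODING=utf-8.
    sys.stdout.write("SET NAMES utf8;\n")
    for statement in insert_statements(paths):
        sys.stdout.write(statement + "\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Under these assumptions it would be invoked like the real script, e.g. python insert_sketch.py p*.zip | mysql wikixml_en, where insert_sketch.py is a hypothetical file name.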

In order to install the table with cross-lingual links:

Installing only the XML content of the pages

Alternatively, you can download only the files with XML documents (e.g., wikixml-en-20061104/p00nnnnn.zip, ..., wikixml-en-20061104/p77nnnnn.zip). Each archive unzips into a set of files, one file nnn.xml per Wikipedia page, where nnn is the page id.
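
Since the page files are ordinary zip members, they can also be read straight from the archives without unpacking them. A minimal sketch using Python's standard zipfile module; iter_pages and find_page are hypothetical helper names, and the code assumes the members are named nnn.xml (possibly inside a subdirectory) and are UTF-8 encoded.

```python
import os
import zipfile


def iter_pages(zip_paths):
    """Yield (page_id, xml_text) for every nnn.xml member, without extracting."""
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                stem, ext = os.path.splitext(os.path.basename(name))
                if ext == ".xml" and stem.isdigit():
                    # Assumes the page files are UTF-8 encoded.
                    yield int(stem), archive.read(name).decode("utf-8")


def find_page(zip_paths, page_id):
    """Return the XML text of one page, or None if no archive contains it."""
    wanted = "%d.xml" % page_id
    for path in zip_paths:
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if os.path.basename(name) == wanted:
                    return archive.read(name).decode("utf-8")
    return None


if __name__ == "__main__":
    import glob

    # Print the id and size of every page in the archives of the current directory.
    for page_id, xml in iter_pages(sorted(glob.glob("p*.zip"))):
        print(page_id, len(xml))
```

Scanning every archive in find_page is slow for a single lookup; if the grouping of page ids into archives is confirmed, the search could be narrowed to one file.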