MemBeR XML Generator


About

The MemBeR generator is a synthetic XML data generator designed for benchmarking. It generates arbitrarily large XML files that obey certain statistical properties. It is written in Java and has three operational modes:
  1. Depth based mode
    If the generator is given a desired fanout and size, it generates a more or less balanced tree whose depth is calculated from the size and fanout. To obtain a tree of the desired size, subtrees are randomly inserted at the leaf level (without exceeding the maximum fanout). This mode also accepts the following parameters:
    • tags = the number of different tags used
    • distribution = the distribution of the tag names in the tree (one of normal, exponential, uniform)
    These parameters are passed to the generator in an XML file.

    EXAMPLE

         <Tree xmlns="http://microbenchmarks.org/treegen"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd" 
         fanout="6" size="1000000" distribution="normal" tags="50"/>
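The text above says the depth is calculated from the size and fanout but does not give the formula. As a rough illustration only, the following sketch assumes a complete tree in which every inner node has exactly `fanout` children and the root counts as level 1, and finds the smallest depth that can hold the requested number of nodes. The class and method names are hypothetical and are not part of MemBeR; the generator's actual calculation may differ.

```java
final class DepthEstimateSketch {

    /** Nodes in a complete tree: 1 + f + f^2 + ... + f^(depth-1). */
    static long completeTreeSize(int fanout, int depth) {
        long total = 0;
        long levelWidth = 1; // level 1 holds only the root
        for (int level = 1; level <= depth; level++) {
            total += levelWidth;
            levelWidth *= fanout; // each node on this level gets `fanout` children
        }
        return total;
    }

    /** Smallest depth whose complete tree holds at least `size` nodes (assumed rule). */
    static int minDepthFor(int fanout, long size) {
        if (fanout < 2 || size < 1) {
            throw new IllegalArgumentException("need fanout >= 2 and size >= 1");
        }
        int depth = 1;
        while (completeTreeSize(fanout, depth) < size) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        int fanout = 6;          // values taken from the example descriptor
        long size = 1_000_000L;
        int depth = minDepthFor(fanout, size);
        System.out.println("estimated depth: " + depth);
        System.out.println("capacity at that depth: " + completeTreeSize(fanout, depth));
    }
}
```

Under these assumptions the example descriptor (fanout 6, size 1000000) needs 9 levels: a complete tree of depth 8 holds 335923 nodes, while one of depth 9 holds 2015539.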
    

  2. Fanout based mode
    If the generator is given a desired depth and size, it generates a more or less balanced tree whose fanout is calculated from the size and depth. To obtain a tree of the desired size, subtrees are randomly inserted (without exceeding the maximum depth). This mode also accepts the following parameters:
    • tags = the number of different tags used
    • distribution = the distribution of the tag names in the tree (one of normal, exponential, uniform)
    These parameters are passed to the generator in an XML file.

    EXAMPLE

         <Tree
         xmlns="http://microbenchmarks.org/treegen"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd"
         depth="6" size="1000000" distribution="normal" tags="50"/>
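The fanout calculation is not spelled out in the text either. A minimal sketch, assuming a complete tree with the root on level 1, is to pick the smallest fanout whose complete tree of the given depth can hold the requested size. The class and method names are hypothetical, and the generator may use a different rule.

```java
final class FanoutEstimateSketch {

    /** Nodes in a complete tree: 1 + f + f^2 + ... + f^(depth-1). */
    static long completeTreeSize(int fanout, int depth) {
        long total = 0;
        long levelWidth = 1; // level 1 holds only the root
        for (int level = 1; level <= depth; level++) {
            total += levelWidth;
            levelWidth *= fanout;
        }
        return total;
    }

    /** Smallest fanout whose complete tree of `depth` levels holds at least `size` nodes (assumed rule). */
    static int minFanoutFor(int depth, long size) {
        if (depth < 2 || size < 1) {
            throw new IllegalArgumentException("need depth >= 2 and size >= 1");
        }
        int fanout = 1; // a fanout of 1 is a simple chain of `depth` nodes
        while (completeTreeSize(fanout, depth) < size) {
            fanout++;
        }
        return fanout;
    }

    public static void main(String[] args) {
        int depth = 6;           // values taken from the example descriptor
        long size = 1_000_000L;
        System.out.println("estimated fanout: " + minFanoutFor(depth, size));
    }
}
```

Under these assumptions the example descriptor (depth 6, size 1000000) needs a fanout of 16: a complete tree of depth 6 holds 813616 nodes with fanout 15 and 1118481 nodes with fanout 16.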
    

  3. Advanced mode
    Sometimes more control over the tree is needed. For that purpose there is an advanced mode, in which you have to specify each tag that occurs in the tree, along with the levels it should occur on and its frequency of occurrence at those levels.
    WARNING: You CANNOT control the tags and distribution attributes in this mode!
    In this mode you control the occurrence of tags in the tree by specifying each tag in a Tag element. The levels attribute sets the level(s) on which the tag should occur, and the fanout attribute sets the fanout for that tag. You are advised to set a default fanout for levels with a constant fanout. If the default fanout is not given, the program will complain.

    EXAMPLE

             <Tree
                    xmlns="http://microbenchmarks.org/treegen"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd"
                    depth="6" size="10000" distribution="normal">
    
               <Tag name="t01" levels="1"   frequency="1"    fanout="10"/>
               <Tag name="t02" levels="2"   frequency="0.5"  fanout="3"/>
    
               <Tag name="t03" levels="2"   frequency="0.5"  fanout="1"/>
               <Tag name="t04" levels="3-4" frequency="0.33" fanout="6"/>
               <Tag name="t05" levels="3-4" frequency="0.33" fanout="6"/>
               <Tag name="t06" levels="3-4" frequency="0.33" fanout="3"/>
               <Tag name="t06" levels="5"   frequency="1"    fanout="0"/>
             </Tree>
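When writing an advanced-mode descriptor by hand, it can help to check how the tag frequencies add up on each level. The sketch below parses a levels value in the two forms shown in the example ("5" and "3-4") and sums the frequencies per level. It assumes that frequencies on the same level are meant to add up to roughly 1, which matches the example but is not stated in the text. The class and method names are hypothetical and are not part of MemBeR.

```java
import java.util.Map;
import java.util.TreeMap;

final class TagLevelsSketch {

    /** Parses "5" or "3-4" into {first, last}, both inclusive. Other forms are rejected. */
    static int[] parseLevels(String levels) {
        String[] parts = levels.trim().split("-");
        int first = Integer.parseInt(parts[0].trim());
        int last = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : first;
        if (parts.length > 2 || first < 1 || last < first) {
            throw new IllegalArgumentException("bad levels value: " + levels);
        }
        return new int[] {first, last};
    }

    /** Adds one tag's frequency to every level that the tag covers. */
    static void addTag(Map<Integer, Double> sums, String levels, double frequency) {
        int[] range = parseLevels(levels);
        for (int level = range[0]; level <= range[1]; level++) {
            sums.merge(level, frequency, Double::sum);
        }
    }

    public static void main(String[] args) {
        // levels and frequency values copied from the example descriptor
        Map<Integer, Double> sums = new TreeMap<>();
        addTag(sums, "1", 1.0);
        addTag(sums, "2", 0.5);
        addTag(sums, "2", 0.5);
        addTag(sums, "3-4", 0.33);
        addTag(sums, "3-4", 0.33);
        addTag(sums, "3-4", 0.33);
        addTag(sums, "5", 1.0);
        sums.forEach((level, sum) ->
                System.out.println("level " + level + ": frequency sum " + sum));
    }
}
```

For the example descriptor, levels 1, 2 and 5 sum to 1, and levels 3 and 4 sum to about 0.99.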
    
    

Contribute

Feel free to suggest improvements to the generator. Even better, add the improvements yourself, or request CVS access if you want to cooperate.

Download

Download and unpack the tarball member-bin.tgz:

    $ tar -xzf member-bin.tgz

A new directory will be created containing the following files:

    member-bin/MemBeR.tgz
    member-bin/psimj.jar
    member-bin/xerces.jar

psimj.jar and xerces.jar are included for your convenience. Please visit the Psimj website (http://science.kennesaw.edu/~jgarrido/psimj.html) and the Xerces website (http://xerces.apache.org/) for more information.

You now have three options:
  1. Use the member.sh script without modification from the member-bin directory. This only works if the java binary is in your PATH or if the JAVA_HOME environment variable is set correctly.
  2. Modify the member.sh script, setting the PREFIX variable to the correct value (i.e., the location of the script), and optionally add the script to your PATH. You can then use the script from anywhere on your system.
  3. Add the paths to MemBeR.jar, psimj.jar and xerces.jar to your CLASSPATH variable (replace [path] with the location of the files):

    [tcsh]$ setenv CLASSPATH [path]/MemBeR.jar:[path]/psimj.jar:[path]/xerces.jar:.:$CLASSPATH
    [bash]$ export CLASSPATH=[path]/MemBeR.jar:[path]/psimj.jar:[path]/xerces.jar:.:$CLASSPATH

    For your convenience, these jars are included in the distribution package.

Running MemBeR

  • Using the script: $ member.sh [options] > output.xml
  • Otherwise: $ java /path/to/your/MemBeR/../MemBeR.Generator [options] > output.xml
Options

Here is a short description of the supported options:
  java MemBeR.Generator 
        -file  // file name (required, the XML file descriptor)
        -debug // level 1-5 (optional, debugging info, default 1)
        -help  // get help
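If the generator has to be started from another Java program, for example from a benchmark driver, the command line above can be reproduced with ProcessBuilder. This is only a sketch under several assumptions: that each option value follows its option name as a separate argument ("-file descriptor.xml"), that the java binary is on the PATH, and that the CLASSPATH is set as described in the Download section. The class name and the file names descriptor.xml and output.xml are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class RunGeneratorSketch {

    /** Builds the command line; the "-file <name>" and "-debug <level>" argument forms are assumptions. */
    static List<String> buildCommand(String descriptorFile, int debugLevel) {
        if (debugLevel < 1 || debugLevel > 5) {
            throw new IllegalArgumentException("debug level must be between 1 and 5");
        }
        return Arrays.asList("java", "MemBeR.Generator",
                "-file", descriptorFile,
                "-debug", Integer.toString(debugLevel));
    }

    /** Runs the generator and writes its standard output to `output`. */
    static int run(String descriptorFile, File output)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(descriptorFile, 1));
        builder.redirectOutput(output);                         // same role as "> output.xml"
        builder.redirectError(ProcessBuilder.Redirect.INHERIT); // pass error output through to the console
        return builder.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = run("descriptor.xml", new File("output.xml"));
        System.out.println("generator exited with code " + exitCode);
    }
}
```

Keeping the command construction in its own method makes it easy to inspect or log the exact arguments before anything is launched.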

Examples of MemBeR input files and XML schema