MemBeR XML Generator
About
The MemBeR generator is a synthetic XML data generator designed for benchmarking. It can generate arbitrarily large XML files that obey certain statistical properties. It is written in Java and has three operating modes:
-
Depth based mode
If the generator is given a desired fanout and size, it generates
a more or less balanced tree whose depth is calculated from
the size and fanout. To obtain a tree of the desired size,
subtrees are randomly inserted at the leaf level (without exceeding
the maximum fanout). This mode also accepts the parameters
- tags = the number of different tags used
- distribution = the distribution of the tag names in the tree
(one of normal, exponential, uniform)
These parameters are passed to the generator in an XML file.
EXAMPLE
<Tree xmlns="http://microbenchmarks.org/treegen"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd"
fanout="6" size="1000000" distribution="normal" tags="50"/>
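The description above does not give the formula the generator uses to derive the depth. As a rough illustration only, the Python sketch below computes the smallest number of levels at which a complete tree with the given fanout can hold the requested number of nodes. The function name `min_depth` and the convention that the root is level 1 are assumptions made for this example; MemBeR itself may compute or round the depth differently.

```python
def min_depth(size: int, fanout: int) -> int:
    """Smallest number of levels whose complete tree holds at least `size` nodes.

    Hypothetical helper, not part of MemBeR. The root counts as level 1.
    """
    if size < 1 or fanout < 1:
        raise ValueError("size and fanout must both be at least 1")
    levels = 0
    capacity = 0     # nodes in a complete tree with `levels` levels
    level_nodes = 1  # nodes on the next level to be added
    while capacity < size:
        capacity += level_nodes
        level_nodes *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Values taken from the example descriptor: fanout="6" size="1000000"
    print(min_depth(1_000_000, 6))  # 9
```

Under these assumptions a complete 8-level tree with fanout 6 has 335,923 nodes and a complete 9-level tree has 2,015,539, so a tree of exactly 1,000,000 nodes cannot be complete on every level.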
-
Fanout based mode
If the generator is given a desired depth and size, it generates
a more or less balanced tree whose fanout is calculated from
the size and depth. To obtain a tree of the desired size,
subtrees are randomly inserted (without exceeding the maximum depth).
This mode also accepts the parameters
- tags = the number of different tags used
- distribution = the distribution of the tag names in the tree
(one of normal, exponential, uniform)
These parameters are passed to the generator in an XML file.
EXAMPLE
<Tree
xmlns="http://microbenchmarks.org/treegen"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd"
depth="6" size="1000000" distribution="normal" tags="50"/>
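How the fanout is derived from the size and depth is not spelled out above. One plausible reading, sketched here in Python, is to take the smallest fanout for which a complete tree of the given depth can hold the requested number of nodes. The names `tree_capacity` and `min_fanout` are hypothetical, and the root is assumed to be level 1; the generator's own calculation may differ.

```python
def tree_capacity(fanout: int, depth: int) -> int:
    """Node count of a complete tree with `depth` levels (root is level 1)."""
    return sum(fanout ** level for level in range(depth))


def min_fanout(size: int, depth: int) -> int:
    """Smallest fanout whose complete `depth`-level tree holds `size` nodes.

    Hypothetical helper, not part of MemBeR.
    """
    if size < 1 or depth < 1:
        raise ValueError("size and depth must both be at least 1")
    if depth == 1 and size > 1:
        raise ValueError("a one-level tree holds only the root")
    fanout = 1
    while tree_capacity(fanout, depth) < size:
        fanout += 1
    return fanout


if __name__ == "__main__":
    # Values taken from the example descriptor: depth="6" size="1000000"
    print(min_fanout(1_000_000, 6))  # 16
```

With fanout 15 a complete 6-level tree holds 813,616 nodes, which is too few, while fanout 16 gives 1,118,481, so 16 is the smallest fanout that fits under these assumptions.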
-
Advanced mode
Sometimes more control over the tree is needed. For that purpose there
is an advanced mode, in which you specify each tag that occurs
in the tree, along with the levels it should occur on and its frequency
of occurrence at those levels.
WARNING: You CANNOT control the tags and distribution attributes
in this mode!
In this mode you control the occurrence of tags in the tree by
specifying those tags in Tag elements. The levels attribute sets
the level(s) on which a tag should occur, and the fanout attribute
sets the fanout for each tag.
You are advised to set a default fanout for levels with a constant fanout.
If the default fanout is not given, the program will complain.
EXAMPLE
<Tree
xmlns="http://microbenchmarks.org/treegen"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd"
depth="6" size="10000" distribution="normal">
<Tag name="t01" levels="1" frequency="1" fanout="10"/>
<Tag name="t02" levels="2" frequency="0.5" fanout="3"/>
<Tag name="t03" levels="2" frequency="0.5" fanout="1"/>
<Tag name="t04" levels="3-4" frequency="0.33" fanout="6"/>
<Tag name="t05" levels="3-4" frequency="0.33" fanout="6"/>
<Tag name="t06" levels="3-4" frequency="0.33" fanout="3"/>
<Tag name="t06" levels="5" frequency="1" fanout="0"/>
</Tree>
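When writing an advanced-mode descriptor by hand, it can help to see how the frequencies add up on each level. The Python sketch below parses the example above and sums the frequency values per level. It assumes that the levels attribute is either a single number or a range such as "3-4" (the only forms that appear in the example); whether the generator requires the frequencies on a level to sum to 1 is not stated here, so the sketch only reports the totals. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{http://microbenchmarks.org/treegen}"

DESCRIPTOR = """<Tree
  xmlns="http://microbenchmarks.org/treegen"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://microbenchmarks.org/treegen TreeGen.xsd"
  depth="6" size="10000" distribution="normal">
  <Tag name="t01" levels="1" frequency="1" fanout="10"/>
  <Tag name="t02" levels="2" frequency="0.5" fanout="3"/>
  <Tag name="t03" levels="2" frequency="0.5" fanout="1"/>
  <Tag name="t04" levels="3-4" frequency="0.33" fanout="6"/>
  <Tag name="t05" levels="3-4" frequency="0.33" fanout="6"/>
  <Tag name="t06" levels="3-4" frequency="0.33" fanout="3"/>
  <Tag name="t06" levels="5" frequency="1" fanout="0"/>
</Tree>"""


def parse_levels(spec: str) -> range:
    """Expand "3" to levels [3] and "3-4" to levels [3, 4] (assumed syntax)."""
    first, _, last = spec.partition("-")
    return range(int(first), int(last or first) + 1)


def frequency_per_level(xml_text: str) -> dict[int, float]:
    """Sum the frequency attribute of all Tag elements for each level."""
    totals = defaultdict(float)
    root = ET.fromstring(xml_text)
    for tag in root.iter(NS + "Tag"):
        for level in parse_levels(tag.get("levels")):
            totals[level] += float(tag.get("frequency"))
    return dict(totals)


if __name__ == "__main__":
    for level, total in sorted(frequency_per_level(DESCRIPTOR).items()):
        print(f"level {level}: total frequency {total:.2f}")
```

For the example descriptor, levels 3 and 4 total 0.99 because each of the three tags there has frequency 0.33.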
Contribute
Feel free to suggest improvements for the generator. Even better, add the improvements yourself, or request CVS access if you want to cooperate.
Download
Download and unpack the tarball member-bin.tgz:
$ tar -xzf member-bin.tgz
A new directory will be created containing the following files:
member-bin/MemBeR.tgz
member-bin/psimj.jar
member-bin/xerces.jar
psimj.jar and xerces.jar are included for your convenience. Please visit the
Psimj website (http://science.kennesaw.edu/~jgarrido/psimj.html) and the Xerces website (http://xerces.apache.org/) for more information.
You now have three options:
- Use the member.sh script without modification from the member-bin directory. This
only works if the java binary is in your PATH or if the JAVA_HOME environment variable
is set correctly.
- Modify the member.sh script, setting the PREFIX variable to the correct value (i.e., the location of the script), and optionally add it to your PATH. You can then use the script from anywhere on your system.
- Add the paths to MemBeR.jar, psimj.jar, and xerces.jar to your CLASSPATH variable (replace [path] with the location of the files):
[tcsh]$ setenv CLASSPATH [path]/MemBeR.jar:[path]/psimj.jar:[path]/xerces.jar:.:$CLASSPATH
[bash]$ export CLASSPATH=[path]/MemBeR.jar:[path]/psimj.jar:[path]/xerces.jar:.:$CLASSPATH
For your convenience we added these jars to the distribution package.
Running MemBeR
- Using the script:
$ member.sh [options] > output.xml
- Otherwise, with the CLASSPATH set as described above:
$ java MemBeR.Generator [options] > output.xml
Options
Here's a short description of the supported options:
java MemBeR.Generator
-file // file name (required, the XML file descriptor)
-debug // level 1-5 (optional, debugging info, default 1)
-help // get help
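For scripted benchmark runs it can be convenient to assemble the command line programmatically. The Python sketch below builds the argument list from the options listed above. It assumes that each option takes its value as the next, space-separated argument, which the option summary does not state explicitly; `build_command` and the file name `tree.xml` are hypothetical.

```python
import shlex


def build_command(descriptor, debug=None):
    """Return the argument list for one generator run.

    Hypothetical helper. Assumes "-file <path>" and "-debug <level>"
    are passed as separate arguments.
    """
    command = ["java", "MemBeR.Generator", "-file", str(descriptor)]
    if debug is not None:
        if not 1 <= debug <= 5:
            raise ValueError("debug level must be between 1 and 5")
        command += ["-debug", str(debug)]
    return command


if __name__ == "__main__":
    # Show the command as it would be typed in a shell.
    print(shlex.join(build_command("tree.xml", debug=3)))
```

The resulting list can be passed to `subprocess.run` with `stdout` set to an open output file, which mirrors the `> output.xml` redirection shown earlier. This only works if the CLASSPATH is set up as described in the download section.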
Examples of MemBeR input files and XML schema