MetaSim - A Sequencing Simulator for Genomics and Metagenomics

by Daniel H. Huson and Felix Ott,
with contributions from Ramona Schmid, Alexander F. Auch and Daniel C. Richter
Introduction
The new research field of metagenomics is providing exciting insights into a variety of previously
unclassified ecological systems. Next-generation sequencing technologies are rapidly increasing the amount
of environmental sequence data in public databases.
There is a great need for specialized software solutions and
statistical methods for dealing with complex metagenome data sets. To facilitate the development and
improvement of metagenomic tools, we introduce a sequencing simulator called MetaSim.
Our software can be used to generate collections of synthetic reads that reflect the
diverse taxonomic composition of typical metagenome data sets.
Based on a database of given genomes, the program allows the user to design a
metagenome by specifying the number of genomes present at different levels of
the NCBI taxonomy, and then to collect reads from the metagenome using a simulation
of a number of different sequencing technologies.
A population sampler optionally produces evolved sequences based on
source genomes and a given evolutionary tree.
The resulting data sets can be used as standardized test scenarios for planning
sequencing projects or for benchmarking metagenomic software.
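The idea of collecting reads from a designed metagenome can be illustrated with a minimal Python sketch. This is not MetaSim's actual algorithm or error model: the function name simulate_reads, the weighting of genomes by abundance times length, and the uniform substitution error rate are all assumptions made for illustration.

```python
import random

def simulate_reads(genomes, abundances, n_reads, read_len, error_rate=0.01, seed=0):
    """Draw reads from source genomes in proportion to their abundance.

    genomes:    dict mapping a genome name to its sequence (str),
                each at least read_len bases long
    abundances: dict mapping a genome name to a relative abundance value
    Returns a list of (header, sequence) tuples.
    """
    rng = random.Random(seed)
    names = list(genomes)
    # Assumption of this sketch: a genome is chosen with probability
    # proportional to abundance * length, so longer genomes yield more reads.
    weights = [abundances[name] * len(genomes[name]) for name in names]
    reads = []
    for i in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        start = rng.randrange(len(seq) - read_len + 1)
        read = list(seq[start:start + read_len])
        # Toy error model: independent substitutions at a fixed rate.
        for pos, base in enumerate(read):
            if rng.random() < error_rate:
                read[pos] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((f"r{i}|{name}|{start}", "".join(read)))
    return reads
```

With error_rate=0.0 every read is an exact substring of its source genome, which makes the sketch easy to check. Real sequencing error models (position-dependent errors, insertions and deletions, mate-pairs) are considerably richer than this.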
Feature List:
MetaSim
- integrates a database for source genome sequences
- generates sets of synthetic reads or mate-pairs based on adaptable sequencing error models (e.g. for Sanger chemistry, Roche's 454 and Illumina (formerly Solexa))
- enables the user to configure abundance values for each organism to model specific taxon compositions
- provides a population sampler to generate evolved sequences based on source genomes and a given evolutionary tree
- can be controlled via a graphical user interface or in command line mode
Publication:
Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008)
MetaSim—A Sequencing Simulator for Genomics and Metagenomics.
Download:
Use of the program is free for academic purposes.
The software requires Java 1.5.
If you use this program for your own research, please cite our software.
FAQ
- I installed MetaSim. After start up, I do not know how to begin.
Please refer to the section "Getting started" in the manual (found in the program folder or here).
- When clicking on the database item after the initial program start, an error message comes up.
The location of the database may have to be changed to a folder where you have write permission.
Change the default database location using
Edit -> Preferences -> Set Database Location.
- I have generated a taxon profile but MetaSim says: "Profile NOT saved".
Please check the syntax of your taxon profile. Refer to the manual, or adapt one of the example taxon profiles in the examples folder.
- I have generated/loaded a taxon profile but its icon in the project tree shows a red exclamation mark.
The syntax of your taxon profile seems to be correct, but at least one sequence entry
could not be found in the database.
First, check whether the genome sequence that is listed with a red exclamation mark
in the taxon profile has already been loaded into the database.
Second, check whether the spelling of the name or taxid in the taxon profile matches
the name or taxid in the database.
- I have selected a taxon profile and I wanted to open the taxonomy editor.
A window opens but nothing is displayed.
The taxonomy editor can only be used if the genome sequences in the database
have an NCBI taxon id.
Please check whether the database contains a taxon id for each genome sequence.
Database entries showing a '-1' in the taxid column are not assigned a taxon id.
Please download this file: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz and import it using
Database -> Get Taxon IDs by GI...
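To see what such a GI-to-taxid mapping contains, the following Python sketch reads a gi_taxid_nucl.dmp-style file, which holds one GI number and one taxon id per line in two whitespace-separated columns. The function name load_gi_to_taxid is hypothetical, and this illustrates only the file format, not MetaSim's own import routine.

```python
import gzip

def load_gi_to_taxid(path, wanted_gis):
    """Return {gi: taxid} for the GI numbers listed in wanted_gis.

    Only the requested GI numbers are kept, because the full NCBI
    mapping file is very large.
    """
    wanted = set(wanted_gis)
    mapping = {}
    # Read the file directly if it is still gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            gi, taxid = int(fields[0]), int(fields[1])
            if gi in wanted:
                mapping[gi] = taxid
    return mapping
```

A GI number that is missing from the returned dictionary has no taxon id in the mapping file, which corresponds to the '-1' entries described above.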
- I tried to download the taxon ids using Database -> Get Taxon IDs (NCBI ftp) but it did not work.
This seems to be a network problem with FTP. Alternatively, download the gi_taxid_nucl.dmp.gz file from the NCBI ftp server and import it
manually using
Database -> Get Taxon IDs by GI...
- I do not need ALL genome sequences that are contained in this huge all.fna.tar.gz file (~760MB).
Can I use my own files?
Of course. You can import any genome sequence (fasta format) into the database
using
Database -> Import Files...
Note that without a gi number, MetaSim is not able to assign unique taxon ids to
genome sequences.
Without taxon ids, the taxonomy editor cannot be used.
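Before importing your own files, it can help to check which fasta headers carry a gi number. The Python sketch below assumes the classic NCBI header layout ">gi|<number>|..."; the function names and the example headers are hypothetical, and MetaSim's own header parsing may differ.

```python
import re

# Assumed header layout: ">gi|12345|ref|XX_000001.1| description"
GI_PATTERN = re.compile(r"^>?gi\|(\d+)\|")

def gi_from_header(header):
    """Return the gi number of a fasta header as int, or None if absent."""
    match = GI_PATTERN.match(header.strip())
    return int(match.group(1)) if match else None

def headers_without_gi(fasta_lines):
    """List all fasta header lines that do not contain a gi number."""
    return [
        line.strip()
        for line in fasta_lines
        if line.startswith(">") and gi_from_header(line) is None
    ]
```

For example, gi_from_header(">gi|12345|ref|XX_000001.1| example organism") returns 12345, while a header such as ">my_genome" yields None and would be reported by headers_without_gi.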
- I can provide the community with an empirical error model from another sequencing technology.
Maybe this could support and motivate others to develop software and analysis tools
based on this error model.
Great! Please contact us so that we can provide this file to others.
- I started a simulation generating 10000 reads. In the result folder of the project tree
this file only contains 10 fasta entries. What went wrong?
The result file in the project tree only gives a short preview of a few generated reads.
The multifasta file with ALL reads can be found at the location
where the taxon profile has been saved.
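To confirm that the full multifasta file really contains all reads, you can count its header lines. A minimal Python sketch follows; the function name is hypothetical and the file name in the usage note is only a placeholder for your own output file.

```python
def count_fasta_entries(lines):
    """Count fasta entries by counting lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))
```

Usage, assuming a result file called reads.fna in the folder of the taxon profile: open it with `with open("reads.fna") as handle:` and call `count_fasta_entries(handle)`. For a simulation of 10000 reads, the count should then be 10000.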
- There are some bugs in the program. What shall I do with them?
Sorry for this. MetaSim is still under development.
We are looking forward to any user feedback.
If you notice any bugs, please let us know. Thanks!
- I want to use the command line version of MetaSim. It does not work properly.
We are currently trying to fix this problem (11/2008).
- I cannot find my question in the FAQs.
In the program folder of MetaSim, you can find a detailed manual.
It can also be found here.
Otherwise, send us a message.
Screenshots:

Main window with project tree, taxon abundance profile and message panel.
A second window shows the Taxonomy editor, which can alternatively be used to set the abundance values for the
source genomes.

Error model settings for Sanger reads.

View of the integrated database holding all loaded source genomes.