MetaSim - A Sequencing Simulator for
Genomics and Metagenomics
by Daniel H. Huson and Felix Ott,
with contributions from Ramona Schmid, Alexander F. Auch and Daniel C.
Richter
Introduction
The new research field of metagenomics is providing exciting insights
into various, previously
unclassified ecological systems. Next-generation sequencing
technologies are increasing the amount
of environmental data in public databases.
There is great need for specialized software solutions and
statistical methods for dealing with complex metagenome data sets. To
facilitate the development and
improvement of metagenomic tools, we introduce a sequencing simulator
called MetaSim.
Our software can be used to generate collections of synthetic
reads that reflect the diverse taxonomical composition of typical
metagenome data sets. Based on a database of given genomes, the program
allows the user to design a metagenome by specifying the number of
genomes present at different levels of the NCBI taxonomy, and then to
collect reads from the metagenome using a simulation of a number of
different sequencing technologies. A population sampler optionally
produces evolved sequences based on source genomes and a given
evolutionary tree.
The resulting data sets can be used as standardized test scenarios for
planning sequencing projects or for benchmarking metagenomic software.
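To make the idea of sampling reads from a designed metagenome concrete, here is a minimal Python sketch. It is not MetaSim's implementation: the genome names, sequences, abundance values and the function `sample_reads` are all hypothetical. It assumes that a source genome is chosen with probability proportional to its abundance value only, and that reads are error-free substrings of fixed length. MetaSim's error models and population sampler are not modelled.

```python
import random

def sample_reads(genomes, abundances, n_reads, read_len, seed=0):
    """Draw error-free reads; the source genome is chosen by abundance weight."""
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        # Pick a source genome with probability proportional to its abundance.
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Pick a uniformly distributed start position that fits the read.
        start = rng.randint(0, len(seq) - read_len)
        reads.append((name, start, seq[start:start + read_len]))
    return reads

# Hypothetical toy "genomes" and relative abundance values.
genomes = {"genome_A": "ACGT" * 50, "genome_B": "TTGACCA" * 30}
abundances = {"genome_A": 3.0, "genome_B": 1.0}

reads = sample_reads(genomes, abundances, n_reads=1000, read_len=25)
share_a = sum(1 for name, _, _ in reads if name == "genome_A") / len(reads)
print(f"fraction of reads from genome_A: {share_a:.2f}")
```

With these toy weights, roughly three quarters of the reads should come from genome_A. A real simulation would additionally apply a technology-specific error model to each read.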
Feature List:
MetaSim
- integrates a database for source genome sequences
- generates sets of synthetic reads or mate-pairs based on
adaptable sequencing error models (e.g. for Sanger chemistry, Roche's
454 and Illumina (formerly Solexa))
- enables the user to configure abundance values for each
organism to model specific taxon compositions
- provides a population sampler to generate modified sequences
- can be controlled via a graphical user interface or in
command line mode
Publication:
Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008)
MetaSim—A Sequencing Simulator for Genomics and Metagenomics.
PLoS ONE 3(10): e3373. doi:10.1371/journal.pone.0003373
Link
Download:
Use of the program is free for academic purposes.
The software requires Java 1.5.
Download from here
If you use this program for your own research, please cite our software.
FAQ
-
I installed MetaSim. After start up, I do not know how to begin.
Please refer to the section "Getting started" in the manual (found in the program folder or
here).
-
When clicking on the database item after initial program start, an error message comes up.
Maybe the location of the database has to be changed to a folder where you have write permission.
Change the default database location in your file system using
Edit -> Preferences -> Set Database Location.
-
I have generated a taxon profile but MetaSim says: "Profile NOT saved".
Please check the syntax of your taxon profile. Refer to the manual or use one of the example taxon profiles in the examples folder that can be easily adapted.
-
I have generated/loaded a taxon profile but its icon in the project tree shows a red
exclamation mark.
The syntax of your taxon profile seems to be correct but at least one sequence entry
could not be found in the database.
First, check whether the genome sequence that is listed with a red exclamation mark in the taxon profile has already been loaded into the database.
Second, check whether the spelling of the name or taxid in the taxon profile matches the name or taxid in the database.
-
I have selected a taxon profile and I wanted to open the taxonomy editor. A window opens but nothing is displayed.
The taxonomy editor can only be used if the genome sequences in the database have an NCBI taxon id.
Please check if the database contains the taxon ids for each genome sequence.
Database entries showing a '-1' in the taxid column are not assigned a taxon id.
Please download this file: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz and import it using
Database -> Get Taxon IDs by GI...
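If you want to check in advance which of your GI numbers have a taxon id in the downloaded file, a small Python sketch like the following may help. It assumes that gi_taxid_nucl.dmp contains two whitespace-separated columns per line (GI number, then taxon id). The function name `lookup_taxids` is hypothetical and not part of MetaSim, and returning -1 for unknown GIs is simply a convention borrowed from the taxid column described above.

```python
import gzip

def lookup_taxids(dmp_path, wanted_gis):
    """Return {gi: taxid} for the requested GIs; -1 marks GIs not in the file."""
    wanted = set(wanted_gis)
    found = {}
    # Read the file directly whether or not it is still gzip-compressed.
    opener = gzip.open if str(dmp_path).endswith(".gz") else open
    with opener(dmp_path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            gi, taxid = int(fields[0]), int(fields[1])
            if gi in wanted:
                found[gi] = taxid
    return {gi: found.get(gi, -1) for gi in wanted}
```

For example, `lookup_taxids("gi_taxid_nucl.dmp.gz", [12345])` would stream through the file once and report -1 if the (hypothetical) GI 12345 is not listed.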
-
I tried to download the taxon ids using Database -> Get Taxon IDs (NCBI ftp) but it did not work.
This is probably a network problem with FTP.
Alternatively, download the file from the NCBI ftp server (Link) and import the file manually using
Database -> Get Taxon IDs by GI...
-
I do not need ALL genome sequences that are contained in this huge all.fna.tar.gz file (~760MB).
Can I use my own files?
Of course. You can import any genome sequence (FASTA format) into the database using
Database -> Import Files...
Note that without a GI number, MetaSim cannot assign unique taxon ids to genome sequences.
Without taxon ids, the taxonomy editor cannot be used.
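To see which of your own sequences carry a GI number before importing them, you can scan the FASTA headers. The following Python sketch assumes NCBI-style headers of the form ">gi|NUMBER|...". The function name `gi_numbers` and the example records (including GI 12345 and the accession shown) are made up for illustration.

```python
import re

# NCBI-style header prefix, e.g. ">gi|12345|ref|..." (assumed format).
GI_PATTERN = re.compile(r"^>gi\|(\d+)\|")

def gi_numbers(fasta_lines):
    """Yield (header, gi) for each FASTA header; gi is None if absent."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = GI_PATTERN.match(line)
        gi = int(match.group(1)) if match else None
        yield line.rstrip("\n"), gi

# Hypothetical records: one with a GI number, one without.
example = [
    ">gi|12345|ref|NC_000000.1| hypothetical organism, complete genome\n",
    "ACGTACGT\n",
    ">my_own_contig_1\n",
    "GGCCTTAA\n",
]

for header, gi in gi_numbers(example):
    status = f"GI {gi}" if gi is not None else "no GI number"
    print(f"{header[:30]:<30} {status}")
```

Sequences reported without a GI number can still be imported, but they will not receive a taxon id via the GI mapping file.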
-
I can provide the community with an empirical error model from another sequencing technology.
Maybe this could support and motivate others to develop software and analysis tools
based on this error model.
Great! Please contact us
so that we can provide this file to others.
-
I started a simulation generating 10000 reads. In the result folder of the project tree
this file only contains 10 fasta entries. What went wrong?
The result file in the project tree only gives a short preview of a few of the generated reads.
The multi-FASTA file with ALL reads can be found at the location
where the taxon profile was saved.
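To confirm that the full output file really contains all reads, you can count its FASTA entries. This is a minimal Python sketch, and the function name `count_fasta_entries` is hypothetical. It simply counts header lines starting with ">".

```python
def count_fasta_entries(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Call it with the path of the multi-FASTA file next to your saved taxon profile. For a simulation of 10000 reads, the count should be 10000.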
-
There are some bugs in the program. What shall I do with them?
Sorry about that. MetaSim is still under development.
We look forward to any user feedback.
If you notice any bugs, please let us know. Thanks!
-
I cannot find my question in the FAQs.
In the program folder of MetaSim, you can find a detailed manual.
It can also be found here.
Otherwise send us a message.
Screenshots:
Main window with project tree, taxon abundance profile and
message panel.
A second window shows the Taxonomy editor, which can alternatively
be used to set the abundance values for the
source genomes.
Error model settings for Sanger reads.
View of the integrated database holding all loaded source
genomes.