Algorithms in Bioinformatics

MetaSim - A Sequencing Simulator for Genomics and Metagenomics




by Daniel H. Huson and Felix Ott,
with contributions from Ramona Schmid, Alexander F. Auch and Daniel C. Richter




Introduction

The new research field of metagenomics is providing exciting insights into various previously unclassified ecological systems. Next-generation sequencing technologies are rapidly increasing the amount of environmental sequence data in public databases.
There is a great need for specialized software solutions and statistical methods for dealing with complex metagenome data sets. To facilitate the development and improvement of metagenomic tools, we introduce a sequencing simulator called MetaSim.

Our software can be used to generate collections of synthetic reads that reflect the diverse taxonomic composition of typical metagenome data sets. Based on a database of given genomes, the program allows the user to design a metagenome by specifying the number of genomes present at different levels of the NCBI taxonomy, and then to sample reads from the metagenome by simulating one of several sequencing technologies. A population sampler optionally produces evolved sequences based on source genomes and a given evolutionary tree.
The resulting data sets can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software.
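
To make the idea of abundance-based read sampling concrete, here is a minimal Python sketch. It is not MetaSim's implementation: the function name `sample_reads`, the choice to weight each genome by abundance times sequence length, and the read header layout are all assumptions made for illustration, and no sequencing errors are simulated.

```python
import random

def sample_reads(genomes, abundances, num_reads, read_length, seed=0):
    """Draw error-free reads from source genomes (hypothetical helper).

    genomes:    dict mapping genome name -> sequence string
    abundances: dict mapping genome name -> relative abundance (copy number)
    """
    rng = random.Random(seed)
    names = [n for n in genomes if len(genomes[n]) >= read_length]
    if not names:
        raise ValueError("no genome is long enough for the requested read length")
    # Assumption: the chance of drawing a read from a genome is proportional
    # to its abundance multiplied by its length.
    weights = [abundances[n] * len(genomes[n]) for n in names]
    reads = []
    for i in range(num_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = genomes[name]
        # Pick a uniformly random start so the read fits inside the genome.
        start = rng.randint(0, len(seq) - read_length)
        header = f"read_{i}|source={name}|start={start}"
        reads.append((header, seq[start:start + read_length]))
    return reads
```

Calling `sample_reads({"A": "ACGT" * 50, "B": "TTGGCCAA" * 50}, {"A": 1, "B": 3}, 20, 30)` returns 20 reads of length 30, each an exact substring of its source genome.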

Feature List:

MetaSim

  • integrates a database for source genome sequences
  • generates sets of synthetic reads or mate-pairs based on adaptable sequencing error models (e.g. for Sanger chemistry, Roche's 454 and Illumina (formerly Solexa))
  • enables the user to configure abundance values for each organism to model specific taxon compositions
  • provides a population sampler to generate modified sequences
  • can be controlled via a graphical user interface or from the command line
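
As a rough illustration of what an adaptable sequencing error model can look like, the Python sketch below introduces substitution errors with a rate that changes linearly along the read. This is a simplified stand-in, not one of MetaSim's error models: the function name, the linear rate profile, and the restriction to substitutions (no insertions or deletions) are assumptions.

```python
import random

def add_substitution_errors(read, rate_start, rate_end, seed=0):
    """Return a copy of `read` with random substitutions (hypothetical helper).

    The per-base error probability is interpolated linearly from
    rate_start at the first base to rate_end at the last base.
    """
    rng = random.Random(seed)
    bases = "ACGT"
    n = len(read)
    out = []
    for i, base in enumerate(read):
        frac = i / (n - 1) if n > 1 else 0.0
        rate = rate_start + frac * (rate_end - rate_start)
        if rng.random() < rate:
            # Replace the base with a different nucleotide chosen at random.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

With both rates set to 0 the read is returned unchanged; with both set to 1 every position is substituted. Intermediate values give reads whose error density rises or falls along their length.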

Publication:

Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008)
MetaSim—A Sequencing Simulator for Genomics and Metagenomics.
PLoS ONE 3(10): e3373. doi:10.1371/journal.pone.0003373

Download:

Use of the program is free for academic purposes. The software requires Java 1.5.

Download from here

If you use this program for your own research, please cite our software.

FAQ

  • I installed MetaSim. After start up, I do not know how to begin.
    Please refer to the section "Getting started" in the manual (found in the program folder or here).
  • When clicking on the database item after initial program start, an error message comes up.
    The database may be located in a folder where you do not have write permission.
    Change the default database location to a writable folder in your file system using Edit -> Preferences -> Set Database Location.
  • I have generated a taxon profile but MetaSim says: "Profile NOT saved".
    Please check the syntax of your taxon profile. Refer to the manual or use one of the example taxon profiles in the examples folder that can be easily adapted.
  • I have generated/loaded a taxon profile but its icon in the project tree shows a red exclamation mark.
    The syntax of your taxon profile seems to be correct but at least one sequence entry could not be found in the database.
    First, check whether the genome sequence that is listed with a red exclamation mark in the taxon profile has already been loaded into the database.
    Second, check whether the spelling of the name or taxid in the taxon profile matches the name or taxid in the database.
  • I have selected a taxon profile and I wanted to open the taxonomy editor. A window opens but nothing is displayed.
    The taxonomy editor can only be used if the genome sequences in the database have an NCBI taxon ID.
    Please check whether the database contains a taxon ID for each genome sequence. Database entries showing a '-1' in the taxid column have not been assigned a taxon ID.
    Please download this file: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz and import it using Database -> Get Taxon IDs by GI...
  • I tried to download the taxon IDs using Database -> Get Taxon IDs (NCBI ftp) but it did not work.
    This is probably a network problem with FTP. Alternatively, download the file from the NCBI FTP server (Link) and import it manually using Database -> Get Taxon IDs by GI...
  • I do not need ALL genome sequences that are contained in this huge all.fna.tar.gz file (~760MB). Can I use my own files?
    Of course. You can import any genome sequence (FASTA format) into the database using Database -> Import Files...
    Note that without a gi number, MetaSim is not able to assign unique taxon IDs to genome sequences. Without taxon IDs, the taxonomy editor cannot be used.
  • I can provide the community with an empirical error model from another sequencing technology. Maybe this could support and motivate others to develop software and analysis tools based on this error model.
    Great! Please contact us, so that we can provide this file to others.
  • I started a simulation generating 10000 reads. In the result folder of the project tree this file only contains 10 fasta entries. What went wrong?
    The result file in the project tree only gives a short preview of a few of the generated reads.
    The multi-FASTA file with ALL reads can be found at the location where the taxon profile has been saved.
  • There are some bugs in the program. What should I do about them?
    Sorry for this. MetaSim is still under development.
    We are looking forward to any user feedback. So, if you notice any bugs, please let us know. Thanks!
  • I cannot find my question in the FAQ.
    In the program folder of MetaSim, you can find a detailed manual. It can also be found here. Otherwise, send us a message.
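
Two of the questions above (missing taxon IDs for imported sequences, and the number of reads in a result file) can be checked outside MetaSim with a few lines of Python. The sketch below counts the entries in a FASTA file and lists the headers that do not start with an NCBI-style `gi|<number>|` prefix. It is an illustrative helper, not part of MetaSim: the function name and the assumption that gi numbers appear at the very start of the header line are ours.

```python
import re

# Assumption: gi numbers appear as ">gi|<digits>|..." at the start of a header.
GI_PATTERN = re.compile(r"^>gi\|(\d+)\|")

def summarize_fasta(lines):
    """Return (number of FASTA entries, headers lacking a gi number)."""
    total = 0
    missing_gi = []
    for line in lines:
        if line.startswith(">"):
            total += 1
            if not GI_PATTERN.match(line):
                missing_gi.append(line[1:].strip())
    return total, missing_gi
```

To inspect a file on disk, pass an open file handle, for example `with open("my_genomes.fna") as handle: print(summarize_fasta(handle))`, where `my_genomes.fna` is a placeholder file name.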

Screenshots:

Profile Editor
Main window with project tree, taxon abundance profile and message panel.
A second window shows the taxonomy editor, which can be used as an alternative way to set the abundance values for the source genomes.

Simulation of Sanger reads
Error model settings for Sanger reads.

Simulation of 454 reads
Error model settings for 454 reads.

Database and run settings
View of the integrated database holding all loaded source genomes.


University of Tübingen