MEGAN - Metagenome Analysis Software
Software for analyzing metagenomes.
New: MEGAN 3 supports comparison of multiple datasets and uses a new file format, RMA, that makes it possible to process and interactively explore BLAST files up to 1 TB in size.

With contributions from Alexander F. Auch, Daniel C. Richter, Suparna Mitra and Qi Ji.
Metagenomics
Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity.
[Images omitted. Image credits: http://soils.usda.gov; Poinar et al. 2006]
The taxonomical content of such a sample is usually estimated by comparison against DNA and protein sequence databases of known sequences. Most published studies employ the analysis of paired-end reads, complete sequences of environmental fosmid and BAC clones, or environmental assemblies. Emerging very-high-throughput sequencing technologies are paving the way to low-cost random shotgun approaches.

Laptop Analysis
MEGAN (“MEtaGenome ANalyzer”) is a new computer program that allows laptop analysis of large metagenomic datasets. In a preprocessing step, the set of DNA reads (or contigs) is compared against databases of known sequences using BLAST or another comparison tool. MEGAN can then be used to compute and interactively explore the taxonomical content of the dataset, employing the NCBI taxonomy to summarize and order the results.

Assignment of Reads to Taxa
An LCA-based (lowest common ancestor) algorithm assigns reads to taxa in such a way that the taxonomical level of the assigned taxon reflects the level of conservation of the sampled sequence. The software allows dissection of large datasets without the need for assembly or the targeting of specific phylogenetic markers. It provides graphical and statistical output for the comparison of different datasets. We have successfully applied this approach to a number of datasets obtained by Sanger sequencing and sequencing-by-synthesis technology, including the Sargasso Sea dataset, a recently published metagenomic dataset sampled from a mammoth bone, and several complete microbial genomes.
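The idea behind LCA-based assignment can be illustrated with a minimal Python sketch. It is not MEGAN's implementation: the toy taxonomy, the function names and the input format (a mapping from read names to the taxa of their matches) are all assumptions made for illustration, and any filtering of matches by score is left out.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (child -> parent); the root has no parent.
# A real analysis would use the NCBI taxonomy instead.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia coli": "Gammaproteobacteria",
    "Vibrio cholerae": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}


def path_from_root(taxon: str) -> List[str]:
    """Return the lineage of `taxon`, ordered from the root downwards."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: Iterable[str]) -> str:
    """Return the deepest taxon shared by the lineages of all `taxa`."""
    lineages = [path_from_root(t) for t in taxa]
    if not lineages:
        raise ValueError("need at least one taxon")
    lca = lineages[0][0]  # every lineage starts at the root
    # Walk down all lineages in parallel until they disagree.
    for level in zip(*lineages):
        if all(name == level[0] for name in level):
            lca = level[0]
        else:
            break
    return lca


def assign_reads(hits: Dict[str, List[str]]) -> Counter:
    """Count reads per assigned taxon; reads without matches are skipped."""
    counts: Counter = Counter()
    for _read, taxa in hits.items():
        if taxa:
            counts[lowest_common_ancestor(taxa)] += 1
    return counts
```

In this sketch a read whose matches all fall within one species is assigned to that species, while a read matching both Escherichia coli and Bacillus subtilis is assigned to Bacteria. This mirrors the statement above that the level of the assigned taxon reflects how conserved the sequence is.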

Comparison and Analysis of Multiple Datasets

COG analysis
MEGAN 3 provides tools for analyzing the functional content of a metagenome using COGs.
GO analysis
A comparative analysis of the functional content of metagenome datasets, based on the Gene Ontology, is now available in version 3.7 of MEGAN!


Publications
MEGAN 1.0 was published in: D.H. Huson, A.F. Auch, Ji Qi and S.C. Schuster, MEGAN Analysis of Metagenomic Data, Genome Research. 17:377-386, 2007.
An example of the application of MEGAN can be found in: H. N. Poinar, C. Schwarz, Ji Qi, B. Shapiro, R. D. E. MacPhee, B. Buigues, A. Tikhonov, D. H. Huson, L. P. Tomsho, A. Auch, M. Rampp, W. Miller, S. C. Schuster, Metagenomics to Paleogenomics: Large-Scale Sequencing of Mammoth DNA, Science 311:392-394, 2006, where we used an early version of our software to analyze the taxonomical content of a collection of DNA reads sampled from a mammoth.
An example of using MEGAN to analyze RNA sequences from soil can be found here: T. Urich, A. Lanzén, Ji Qi, D.H. Huson, C. Schleper and Stephan C. Schuster, Simultaneous Assessment of Soil Microbial Community Structure and Function through Analysis of the Meta-Transcriptome, PLoS ONE 3(6): e2527, doi:10.1371/journal.pone.0002527.
To find out more about the program, please take a look at the current user manual.
Download
Use of the program requires a license. Academic licenses are freely available to all academic users. Usage in non-academic settings requires a commercial license. Obtain a license key online.
Download the latest version here.
(Download the original version 1.0 here.)
Have a look at our tutorial on how to set BLAST parameters for long/short read sequences.
How to use MEGAN
To analyze a set of reads using MEGAN, proceed as follows.
1) Put all your reads in one FASTA file and use BLASTX to compare your reads against the NCBI-nr database. (You can also use BLASTN to compare against NCBI-nt, or other variants.) Here are some hints on blasting metagenomic datasets.
2) Concatenate all the resulting blast files into one large file.
3) Once you have the BLAST results, MEGAN has to import the reads and BLAST file to generate its own archive, called an RMA file. The RMA file will only be about 10-20% of the size of the original input files, but will contain all your reads and the best 25 BLAST matches for each read. For very large datasets, this step may require a lot of memory. In this case, install MEGAN on a large-memory machine (8 GB should suffice even for one terabyte of input data) and then modify the MEGAN startup script as described below to allow MEGAN more memory. Even for relatively small datasets, running on a high-memory machine is recommended as this speeds up the program significantly.
4) The main computational bottleneck of the analysis is the BLAST run. This will usually be performed on a server. We recommend that the initial parsing of the resulting BLAST files also be performed on a server, whereas the interactive analysis can then take place on a desktop or laptop.
5) Once you have computed an RMA file for your data, the file can be downloaded, e.g. onto a laptop, and then explored and analyzed at ease (2 GB of memory recommended).
You will find that MEGAN allows you to open many different datasets at once and produce comparisons of them.
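Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and file names are hypothetical, and the size estimate simply applies the 10-20% rule of thumb quoted in step 3 rather than anything MEGAN reports itself.

```python
import shutil
from pathlib import Path
from typing import Iterable, Tuple


def concatenate_files(parts: Iterable[Path], combined: Path) -> int:
    """Write every file in `parts`, in order, into `combined`.

    Returns the total number of bytes copied. Files are streamed, so
    large BLAST outputs do not need to fit into memory.
    """
    total = 0
    with combined.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
            total += part.stat().st_size
    return total


def estimated_rma_size(input_bytes: int) -> Tuple[int, int]:
    """Return (low, high) estimates in bytes using the 10-20% rule of thumb."""
    return input_bytes // 10, input_bytes // 5


if __name__ == "__main__":
    # Hypothetical layout: one BLAST output file per batch in ./blast_out.
    parts = sorted(Path("blast_out").glob("*.blastx"))
    size = concatenate_files(parts, Path("all_reads.blastx"))
    low, high = estimated_rma_size(size)
    print(f"combined BLAST file: {size} bytes")
    print(f"expected RMA file: roughly {low} to {high} bytes")
```

The estimate ignores the size of the reads file, which step 3 also counts as input, so it is best treated as a rough lower bound when planning disk space.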
Example datasets
| Publication | MEGAN RMA file |
|---|---|
| T. Urich, A. Lanzén, Ji Qi, D.H. Huson, C. Schleper and Stephan C. Schuster, Simultaneous Assessment of Soil Microbial Community Structure and Function through Analysis of the Meta-Transcriptome, PLoS ONE 3(6): e2527, 2008, doi:10.1371/journal.pone.0002527. | RudSoil_vs_lssu_160807.rma (5.7 GB) |
| Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, Saar MO, Alexander S, Alexander EC Jr, Rohwer F. Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics. 2006 Mar 20;7:57 | Red.rma (1.7 GB) |
| Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, Bork P, Hugenholtz P, Rubin EM. Comparative metagenomics of microbial communities. Science. 2005 Apr 22;308(5721):554-7. | MinnesotaSoil.rma (2.8 GB) |
| Lo I, Denef VJ, Verberkmoes NC, Shah MB, Goltsman D, DiBartolo G, Tyson GW, Allen EE, Ram RJ, Detter JC, Richardson P, Thelen MP, Hettich RL, Banfield JF. Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature. 2007 Mar 29;446(7135):537-41. Epub 2007 Mar 7. | AcidMine.rma (4.8 GB) |
| Gut Microbiome of Mice with Diet-Induced Obesity project at Washington University | Mouse_gut_28789_west1.rma (417 MB), Mouse_gut_28793_west3.rma (545 MB), Mouse_gut_28795_fatr1.rma (457 MB), Mouse_gut_28799_carbr1.rma (461 MB) |
(Download old example datasets associated with MEGAN 1 from here).




