Algorithms in Bioinformatics

OSlay - Optimal Syntenic Layouter

Layout of unfinished assemblies of two related genomes using synteny

Abstract

As more and more sequencing projects are completed, it is becoming increasingly popular to use the genomic collinearity between closely related organisms to facilitate the assembly of an unfinished genome. The syntenic similarities exploited by this approach exist because the two organisms descend from a common ancestor. Usually, this leads to the desirable situation that the arrangement of one genome determines that of the other.
This project uses and improves the Optimal Syntenic Layout Algorithm [pdf], which predicts an ordering and orientation of the contigs of an unfinished, fragmented genome.

Strategy

Ideally, the reference genome is in a more advanced assembly state than the target genome and consists of longer genomic pieces. This approach does not need any mate-pair information. Instead, the sequence similarity information at the nucleotide level provided by sequence comparison is used to predict a layout of the contigs. Taking the visualization software CGViz as a framework, the algorithm is implemented as a plugin named OSLay (Optimal Syntenic Layouter).
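As an illustration of the kind of input this strategy works with, the following minimal Python sketch reads matches from BLAST tabular output (-outfmt 6, default columns), assuming the target contigs were used as the query and the reference contigs as the subject. The names Match and parse_blast_tabular are hypothetical and chosen for this example only; they are not part of OSLay or CGViz.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    """One nucleotide-level match (hypothetical record, not an OSLay type)."""
    target: str      # contig of the unfinished (target) assembly
    reference: str   # contig of the reference assembly
    t_start: int     # match start on the target contig (1-based)
    t_end: int       # match end on the target contig
    r_start: int     # leftmost matched position on the reference contig
    r_end: int       # rightmost matched position on the reference contig
    forward: bool    # False if the match lies on the reverse strand


def parse_blast_tabular(text: str) -> List[Match]:
    """Parse BLAST tabular output with the 12 default columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    matches = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        q_start, q_end, s_start, s_end = (int(fields[i]) for i in (6, 7, 8, 9))
        # For nucleotide searches, a reverse-strand hit is reported
        # with sstart greater than send.
        forward = s_start <= s_end
        matches.append(Match(fields[0], fields[1], q_start, q_end,
                             min(s_start, s_end), max(s_start, s_end), forward))
    return matches


SAMPLE = (
    "c1\tref1\t98.5\t1000\t15\t0\t1\t1000\t5001\t6000\t0.0\t1800\n"
    "c2\tref1\t97.0\t800\t24\t0\t1\t800\t4000\t3201\t0.0\t1400\n"
)
parsed = parse_blast_tabular(SAMPLE)
```

Each record keeps the strand of the match, since the orientation of a contig in the layout depends on it.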

OSLay's pipeline

Using the contigs of assemblies of two related organisms as input, a sequence comparison (e.g. with BLAST or MUMmer) yields a set of matches, which usually appear as diagonals in a dot plot. These diagonals can be used to compute a rearrangement of the contigs by maximizing the number of “local extensions”. This means that contigs are placed next to each other in a new layout if doing so extends the diagonal matches they contain.
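To make the idea of deriving a layout from matches more concrete, here is a deliberately simplified Python sketch. It is not the Optimal Syntenic Layout Algorithm and does not maximize local extensions. Instead, it anchors every target contig at the reference sequence and strand that account for most of its matched bases, and then sorts the contigs by their leftmost position on that reference. The function name layout and the tuple format of a match are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (target contig, reference contig, reference start, reference end, forward strand?)
Match = Tuple[str, str, int, int, bool]


def layout(matches: List[Match]) -> List[List[Tuple[str, str]]]:
    """Return super-contigs as lists of (contig, orientation) pairs.

    Simplified reference-anchored ordering, NOT the optimal syntenic layout.
    """
    # Matched bases per target contig for every (reference, strand) pair.
    weight: Dict[str, Dict[Tuple[str, bool], int]] = defaultdict(
        lambda: defaultdict(int))
    # Leftmost matched reference position per (target, reference, strand).
    leftmost: Dict[Tuple[str, str, bool], int] = {}
    for target, ref, r_start, r_end, forward in matches:
        weight[target][(ref, forward)] += r_end - r_start + 1
        key = (target, ref, forward)
        leftmost[key] = min(leftmost.get(key, r_start), r_start)

    # Anchor each contig at its best-supported reference and strand.
    anchors = []
    for target, options in weight.items():
        (ref, forward), _ = max(options.items(), key=lambda kv: kv[1])
        anchors.append((ref, leftmost[(target, ref, forward)], target, forward))
    anchors.sort()  # by reference sequence, then by position on it

    # Contigs anchored on the same reference sequence form one super-contig.
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for ref, _, target, forward in anchors:
        groups[ref].append((target, "+" if forward else "-"))
    return [groups[ref] for ref in sorted(groups)]


example = [
    ("c3", "refA", 100, 900, True),
    ("c1", "refA", 1000, 2500, False),
    ("c1", "refB", 10, 200, True),
    ("c2", "refB", 300, 1500, True),
]
super_contigs = layout(example)
# super_contigs == [[("c3", "+"), ("c1", "-")], [("c2", "+")]]
```

In the example, contig c1 matches both reference sequences but is anchored on refA, where most of its matched bases lie, and is placed in reverse orientation. A layout that maximizes local extensions would additionally examine how the matches of neighbouring contigs continue one another at the contig ends, which this sketch ignores.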

Several experiments were carried out with both prokaryotic and eukaryotic assemblies. The results are quite promising and show that this sequence-based approach is capable of detecting scaffolds without considering any mate-pair information.

Figure: Result of OSLay. Here, a total of 69 contigs of the target assembly (x-axis) is rearranged by OSLay, producing only 4 super-contigs.

Because most assemblers use only mate-pair information to determine a contig layout, additionally applying OSLay can significantly advance the scaffolding process. The tool is especially useful in comparative assembly projects, in which two or more syntenic genomes are sequenced in parallel, because the reference genome does not have to be finished.

Update 01/17/07: OSLay Software released!

OSLay 1.0 is now available. Please visit the project's homepage...

Contact: Daniel Richter

 

 

 


University of Tübingen