Layout of unfinished assemblies of two related genomes using synteny
Abstract
With the increasing number of sequencing projects successfully completed, the idea of
using the genomic collinearity between closely related organisms to facilitate the assembly of
an unfinished genome is becoming increasingly popular. The syntenic similarities
between two related genomes that this approach exploits arise because the two organisms
descend from a common ancestor. Usually this leads to the desirable situation
that the arrangement of one genome largely determines that of the other.
In this project, an algorithm called the
Optimal Syntenic Layout Algorithm [pdf]
is used and improved. It predicts an ordering
and orientation of the contigs of an unfinished, fragmented genome.
Strategy
Ideally, the reference genome should be in an advanced assembly state consisting of longer genomic
pieces than the target genome. This approach does not need any mate-pair information. Instead,
the sequence similarity information at the nucleotide level provided by sequence comparison is used
to predict a layout of the contigs. Taking the visualization software
CGViz as a framework,
the algorithm is implemented as a plugin named OSLay (Optimal Syntenic Layouter).
Using contigs of assemblies of two related organisms as input, a sequence alignment (e.g. using BLAST or
MUMmer)
results in a set of matches, usually visible as diagonals in a dot-plot. These diagonals can be used to
compute a rearrangement of contigs by maximizing the number of “local extensions”. This means that contigs are
placed next to each other in a new layout if this produces an extension of their contained diagonal matches.
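The idea of placing contigs next to each other when their matches line up can be illustrated with a small sketch. This is not the Optimal Syntenic Layout Algorithm itself, which maximizes local extensions over all matches; it is a much simpler greedy heuristic shown only for illustration. The `Match` record, the `greedy_layout` function, and the `max_gap` threshold are all hypothetical names and assumptions, not part of OSLay or CGViz. Each target contig is anchored by its longest match, contigs are sorted along the reference, and neighbours whose anchors lie close together on the same reference contig are chained into one super-contig. The strand of the anchor gives the orientation.

```python
from collections import namedtuple, defaultdict

# Hypothetical match record: one alignment hit (e.g. parsed from BLAST or
# MUMmer output) between a target contig and a reference contig.
# strand is +1 for a forward diagonal and -1 for a reverse diagonal.
Match = namedtuple("Match", "target ref ref_start ref_end strand")


def greedy_layout(matches, max_gap=1000):
    """Chain target contigs into super-contigs (simplified heuristic).

    Returns a list of chains; each chain is a list of
    (target_contig, orientation) pairs in layout order.
    """
    # Use the longest match of each target contig as its anchor.
    anchor = {}
    for m in matches:
        best = anchor.get(m.target)
        if best is None or (m.ref_end - m.ref_start) > (best.ref_end - best.ref_start):
            anchor[m.target] = m

    # Group anchors by the reference contig they fall on.
    by_ref = defaultdict(list)
    for m in anchor.values():
        by_ref[m.ref].append(m)

    layouts = []
    for ref in sorted(by_ref):
        hits = sorted(by_ref[ref], key=lambda m: m.ref_start)
        chain = [(hits[0].target, hits[0].strand)]
        for prev, cur in zip(hits, hits[1:]):
            if cur.ref_start - prev.ref_end <= max_gap:
                # The anchors are close on the reference: extend the chain.
                chain.append((cur.target, cur.strand))
            else:
                # Gap too large: close this super-contig and start a new one.
                layouts.append(chain)
                chain = [(cur.target, cur.strand)]
        layouts.append(chain)
    return layouts


# Made-up example data: four target contigs matched against two reference contigs.
example_matches = [
    Match("c3", "R1", 5200, 9000, +1),
    Match("c1", "R1", 0, 2500, +1),
    Match("c2", "R1", 2600, 5000, -1),
    Match("c2", "R1", 100, 300, +1),   # short secondary hit, ignored as anchor
    Match("c4", "R2", 0, 4000, +1),
]
layout = greedy_layout(example_matches)
```

With this made-up input, `c1`, `c2` (reversed) and `c3` end up in one chain and `c4` in a second. A real layout method has to weigh all matches of a contig against each other rather than trusting a single anchor, which is what the optimization over local extensions addresses.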
Several experiments were carried out with both prokaryotic and eukaryotic assemblies. The results are
promising and
show that this sequence-based approach is capable of detecting scaffolds without considering any mate-pair information.
In the example shown, a total of 69 contigs of the target assembly (x-axis) are rearranged by OSLay into only 4 super-contigs.
Because most assemblers use only mate-pair information to detect a contig layout,
the additional application of OSLay can significantly advance the scaffolding process.
The tool is especially useful in comparative assembly projects where two or more syntenic genomes are sequenced in parallel,
because the reference genome does not have to be finished.
Update 01/17/07: OSLay Software released!
OSLay 1.0 is now available. Please visit the project's homepage...