| Research
topics |
| |
|
| |
| Life is
complex, |
|
and biology seeks to understand life. To
address
this complexity, molecular biology is scaling up to a high-throughput
science
and more and more insights are gained by processing huge amounts of
data,
produced in increasingly automated facilities using more and more
advanced
technologies. In this context, many new questions arise that can best
be
addressed by scientists that have sufficient biological and computer
science
(or mathematical) training. Indeed, a new field has emerged that is
known
as bioinformatics or computational biology. The algorithmic side of
this
field deals with the development of new methods for addressing problems
that arise in this context.
The goal here is to develop algorithms that solve real world biological
problems
in a useful way. Our group works on the design, implementation and
experimental
validation of new algorithmic approaches to problems that arise in
biology.
Moreover, we believe that bioinformatics research makes most sense in
close
cooperation with biologists that are confronted with computational
problems
in their research and our goal is to help address such problems. Here
is
an overview of some of the areas in which we are working: |
| |
| Network-based
method for phylogenetic inference |
|
As speciation is a branching process, the
underlying
assumption of phylogenetic analysis is that the evolutionary
relationships
in a set of species can be represented by a phylogenetic tree. However,
real
world data sets usually containing a number of different phylogenetic
signals
and the question arises whether forcing such a data set onto a tree is
appropriate.
Methods such as split decomposition (Andreas Dress and Hans-Jürgen
Bandelt)
produce a tree, given tree-like data, but more generally, use a
network
to represent a set of species, showing conflicting signals that are
present.
Other methods pruduce split networks from sequences, distances and
trees.
Split networks are an example of
implicit networks, namely phylogenetic networks that aim at
visualizing incompatible signals. Explicit
networks aim at explicitly explaining a data set in terms of reticulate
evolution, examples being
hybridization networks, recombination networks or networks describing
evolution in terms of HGT. For more details, please see this tutorial on phylogenetic networks, slides.
Our
program SplitsTree4
provides
a user-friendly implementation many different network methods.
|
| |
| Experimental
evaluation of phylogenetic methods |
| |
The accuracy of tree
reconstruction
methods is hard to determine for large sets of real data. One approach
is
to use simulation studies to evaluate and compare different tree
reconstruction
methods. This requires a careful design of the experiments. Moreover,
such
studies give one the opportunity to design and evaluate new methods, or
new
variants of existing ones. This work is in cooperation with Tandy
Warnow. |
| |
| Sequence assembly algorithms |
|
With present technology, one cannot simply "read" a long
sequence
of DNA in a single experiment. Rather, "shotgun sequencing" is used to
collect
reads of average length 550
and the computation task remains to assemble the unknown sequence from
these
pieces.
Recent advances were fueled by the "race" to sequence the human genome
between
the Human Genome Project and Celera Genomics. Advances have been made
both
with respect to assemblers for huge whole genome shotgun reads and also
for
assemblers for small projects. We have developed methods for
scaffolding
contigs using mate pair data and are working on contigging methods
based
on the concept of a string graph, suggested by Gene Myers. Moreover, we
are
cooperating with Stephan Schuster at the MPI for Developmental Biology
in
this area.
|
| |
| Comparative gene finding |
|
Only
3% of the human genome consists of genes. Methods that predict genes in
silico are very useful for determine putative genes. However, the
problem
is not yet solved, as all existing methods produce substantial amounts
of
false positives and false negatives.
Comparatives methods are currently of great interest, such as the
Compared Exon Method developed in cooperation with Vineet Bafna or also
Twinscan,
a comparative version of Genscan. |
| |
| Genome comparison methods |
|
Given different genomes from
different
species, or from different individuals of the same species, one would
like
to compare the genomes in many different ways, on many different
levels.
Together with Aaron Halpern and Knut Reinert at Celera Genomics, we
have
developed a number of such methods in the context of comparing
different
assemblies of the human genome, and comparingthe human and mouse
genomes.
Again, in cooperation with Stephan Schuster at the MPI for
Developmental
Biology, we are working on new methods in the context of bacterial
genomes. |
| |
| Comparative
analysis- and visualization tools for genomes |
|
Genomic and biological data comes in many
forms
and formats, and in huge amounts. We need tools that can analysis such
data
quickly and that also provide easy access to visualization of the data
and
computed properties of it, in a fast and flexible way. We have designed
and are implementing such a system. Using graphical programing, the
system
allows one to configure a visualization on the fly and allows one to
bring
new data to a running visualization interactively, with total control
over
the layout and representation of the data. Of course the system allows
one
to interactively explore and query the displayed data. |
| |
| Analysis
and visualization tools for multiple alignments |
|
Correct multi-alignments of protein
sequences
are crucial in the context of structure prediction of proteins. We are
developing
and implementing tools that can be used in this context, in cooperation
with Andre Lupas at the MPI for Developmental Biology.  |
| |
| Periodic
tilings and the enumeration of crystals |
|
This research area is
not
related to bioinformatics. Instead, it brings together mathematics
(combinatorial
geometry) and crystal chemistry. The approach here is to use results
and
methods from combinatorial geometry to systematically enumerate all
possible
three-dimensional crystals of a given type. This research is led by
Olaf
Delgado. |
| |
|