Algorithms in Bioinformatics
Research
 
Welcome
People
Research
  Phylogenetic Networks
  Simulation Studies
  Sequence Assembly
  Genome Comparison
  Viewer
  Tilings
  miRNA
Teaching
Publications
Bachelor Thesis/ Student Projects
Master Thesis/ Diploma Projects
Studienkommission
Software
Workshops
Address
Webmaster
Available Positions
External Links
Internal Links
Contents
Search
ZBIT
CS Dept.
University
 
Phylogenetics Scientific Programme, Sept 3-Dec 21, 2007, Cambridge


Warning: this page is out of date.

Research topics
 
Network-based method for phylogenetic inference
Experimental evaluation of phylogenetic methods
Sequence assembly algorithms
Comparative gene finding
Genome comparison methods
Comparative analysis- and visualization tools for genomes
Analysis and visualization tools for multiple alignments
Periodic tilings and enumeration of possible crystals
 
Life is complex,

and biology seeks to understand life. To address this complexity, molecular biology is scaling up to a high-throughput science and more and more insights are gained by processing huge amounts of data, produced in increasingly automated facilities using more and more advanced technologies. In this context, many new questions arise that can best be addressed by scientists that have sufficient biological and computer science (or mathematical) training. Indeed, a new field has emerged that is known as bioinformatics or computational biology. The algorithmic side of this field deals with the development of new methods for addressing problems that arise in this context.
The goal here is to develop algorithms that solve real world biological problems in a useful way. Our group works on the design, implementation and experimental validation of new algorithmic approaches to problems that arise in biology. Moreover, we believe that bioinformatics research makes most sense in close cooperation with biologists that are confronted with computational problems in their research and our goal is to help address such problems. Here is an overview of some of the areas in which we are working:
 
Network-based method for phylogenetic inference

phyleogenetic tree generated with splitstreeAs speciation is a branching process, the underlying assumption of phylogenetic analysis is that the evolutionary relationships in a set of species can be represented by a phylogenetic tree. However, real world data sets usually containing a number of different phylogenetic signals and the question arises whether forcing such a data set onto a tree is appropriate. Methods such as split decomposition (Andreas Dress and Hans-Jürgen Bandelt) produce a tree, given tree-like data, but more generally, use a network to represent a set of species, showing conflicting signals that are present. Other methods pruduce split networks from sequences, distances and trees.
Split networks are an example of implicit networks, namely phylogenetic networks that aim at visualizing incompatible signals. Explicit networks aim at explicitly explaining a data set in terms of reticulate evolution, examples being
hybridization networks, recombination networks or networks describing evolution in terms of HGT. For more details, please see this  tutorial on phylogenetic networks, slides.

Our program SplitsTree4 provides a user-friendly implementation many different network methods.

 
Experimental evaluation of phylogenetic methods
  phylogenetic tree built with the Disk-Covering-Method (DCM)The accuracy of tree reconstruction methods is hard to determine for large sets of real data. One approach is to use simulation studies to evaluate and compare different tree reconstruction methods. This requires a careful design of the experiments. Moreover, such studies give one the opportunity to design and evaluate new methods, or new variants of existing ones. This work is in cooperation with Tandy Warnow.
 
Sequence assembly algorithms

merging 2 paths of a contig-mate graphWith present technology, one cannot simply "read" a long sequence of DNA in a single experiment. Rather, "shotgun sequencing" is used to collect reads of average length 550
and the computation task remains to assemble the unknown sequence from these pieces.
Recent advances were fueled by the "race" to sequence the human genome between the Human Genome Project and Celera Genomics. Advances have been made both with respect to assemblers for huge whole genome shotgun reads and also for assemblers for small projects. We have developed methods for scaffolding contigs using mate pair data and are working on contigging methods based on the concept of a string graph, suggested by Gene Myers. Moreover, we are cooperating with Stephan Schuster at the MPI for Developmental Biology in this area.
 
Comparative gene finding

eukariotic gene structureOnly 3% of the human genome consists of genes. Methods that predict genes in silico are very useful for determine putative genes. However, the problem is not yet solved, as all existing methods produce substantial amounts of false positives and false negatives.
Comparatives methods are currently of great interest, such as the Compared Exon Method developed in cooperation with Vineet Bafna or also Twinscan, a comparative version of Genscan.
 
Genome comparison methods

tblastx of two similar regions of the human and mouse genomeGiven different genomes from different species, or from different individuals of the same species, one would like to compare the genomes in many different ways, on many different levels. Together with Aaron Halpern and Knut Reinert at Celera Genomics, we have developed a number of such methods in the context of comparing different assemblies of the human genome, and comparingthe human and mouse genomes. Again, in cooperation with Stephan Schuster at the MPI for Developmental Biology, we are working on new methods in the context of bacterial genomes.
 
Comparative analysis- and visualization tools for genomes

comparison of the quality of 2 assembliesGenomic and biological data comes in many forms and formats, and in huge amounts. We need tools that can analysis such data quickly and that also provide easy access to visualization of the data and computed properties of it, in a fast and flexible way. We have designed and are implementing such a system. Using graphical programing, the system allows one to configure a visualization on the fly and allows one to bring new data to a running visualization interactively, with total control over the layout and representation of the data. Of course the system allows one to interactively explore and query the displayed data.
 
Analysis and visualization tools for multiple alignments

Correct multi-alignments of protein sequences are crucial in the context of structure prediction of proteins. We are developing and implementing tools that can be used in this context, in cooperation with Andre Lupas at the MPI for Developmental Biology. iPET
 
Periodic tilings and the enumeration of crystals

This research area is not related to bioinformatics. Instead, it brings together mathematics (combinatorial geometry) and crystal chemistry. The approach here is to use results and methods from combinatorial geometry to systematically enumerate all possible three-dimensional crystals of a given type. This research is led by Olaf Delgado.
 


University of Tübingen