Bioinformatics Tools
| Topic: | Bioinformatics Software Tools |
| Teachers: | Daniel Huson and Suparna Mitra |
| Time: | 30 March-10 April, 9:00-17:00 |
| Signup meeting: | Thursday, Oct 23, 17h, C311 |
| Pre-lab meeting: | Friday, March 27, 16-18h, C311 |
| Final due date for all materials: | April 24 |
| Credits: | Diploma students: 4 SWS, examinable: 2 SWS, MSc Students: 8 LP (Modules: Bioinformatik or Praktische Bioinformatik) |
| Location: | Computer lab C311 |
| Material and Downloads: | Introductory papers by email. Marine datasets from local nfs disk. MEGAN, MetaSim |
Description
The aim of this course is to learn how to analyze metagenomic datasets. The three main questions that biologists hope to answer using computational techniques are:
(1) What is the taxonomic profile of my sample?
(2) What is the functional profile of my sample?
(3) How do two samples differ?
The structure of the course is as follows:
- Introduction to metagenomics analysis using MEGAN.
- Metagenomic analysis of 8 Marine datasets using different techniques.
- Comparison of datasets and of the results obtained using different techniques.
Two weeks before the beginning of the course, each participant will be given a number of papers to study in preparation for the course. Each participant will be expected to give a 20-minute presentation on the content of specific papers.
During the actual course, which will run for two weeks, participants will work on their projects daily in the computer lab in C311. In addition to the lab work, participants will be expected to prepare short presentations on different topics related to the project.
After the course, participants will be expected to finish their projects and to write a five-to-ten-page report on the course. This report will be based on a lab logbook that each participant is expected to maintain.
Requirements for admission
This course is for MSc and Diploma students only (sorry, no BSc students). The lectures "Algorithms in Bioinformatics I and II" or "Bioinformatics I and II" are recommended. Knowledge of Java and a scripting language (Python, Perl, bash, ...) is required.
Grading
As this is the first time that students will receive grades for this course, the grading system will be developed within the course.
Course language
The teaching language is English.
Credits for this course
Diploma students: To obtain a "Schein" for this course, you are required to successfully complete all parts of the course.
Master students: You are required to successfully complete all parts of the course. You will be graded on your initial presentation, your participation in the course, other presentations, and your final report.
Participants
Till Helge Hedwig, Dominikus Krüger, Paul Rupek, Mario Stärk, Annette Treichel, Christian Zielke, Julian Zipperer
Schedule
| Date | Activities |
| 30.03.09 | Morning: Presentations by students on introductory papers. Afternoon: Presentations by students on introductory papers. Download of 8 Marine datasets. Extraction of first 10,000 reads from each dataset. Launch of BLASTX against NR on all datasets. Installation of MEGAN and MetaSim. Read MetaSim paper. Hand in: Presentation |
| 31.03.09 | Morning: Transform cDNA data from GenBank format to FastA. Launch BLASTX on remaining datasets. Run MEGAN on Marine DNA datasets. Solve tasks: for each dataset, perform analyses of taxa and of function that parallel the ones in the Marine paper. Explore use of different GO-slims. Afternoon: Compare all datasets and then compare the comparison with the one reported in the paper. Hand in: Code for converting GenBank to FastA, code for grabbing first 10,000 reads |
| 01.04.09 | Morning: Write a 2-page report on the comparison of the Marine DNA datasets as analysed using MEGAN with the results reported in the Marine paper. Explain why it is difficult to compare MEGAN's functional analysis with the one reported in the paper. Afternoon: Study JGI paper on simulated metagenome datasets. Design LC, MC and HC simulations in MetaSim. Simulate reads and launch BLASTX runs against NR. Produce CSV files for MEGAN to enable comparison of results against "truth". Hand in: Report on comparison |
| 02.04.09 | Morning: Compile a list of all online resources for performing metagenomic analyses. Launch analysis on small test sets extracted from the LC, MC and HC datasets. Where feasible, perform analyses of full LC, MC and HC datasets. Afternoon: 'Introduction to Geneious' by Melanie Hayr. Use Geneious to evaluate the performance of the LCA heuristic for taxonomic placement by comparing against phylogenetic placement. Work on performance evaluation of MEGAN on LC, MC and HC datasets. Hand in: Presentation on Marine dataset results. |
| 03.04.09 | Morning: Analyze results on simulated datasets and discuss how to evaluate them. Afternoon: Work on investigating a phylogenetic alternative to the LCA algorithm. Set up timed comparison runs of BLAST and MEGAN using NFS vs. local disks. Launch additional simulations on LC, MC and HC datasets so that we have results for Sanger sequencing, 454 sequencing and Solexa sequencing. |
| 06.04.09 | Morning: Compare all cDNA datasets and then compare the comparison with the one reported in the paper. Write a 1-page report comparing the Marine cDNA datasets, as analyzed using MEGAN, with the results reported in the Marine paper; include the DNA datasets and cover both taxonomic and functional aspects. Afternoon: Complete the report. Hand in: Report. |
| 07.04.09 | Morning: Complete performance evaluation of MEGAN on LC, MC and HC datasets. Compare MEGAN and MG-RAST results on the 4 DNA and 4 cDNA datasets. Afternoon: Prepare the presentation for tomorrow. Hand in: A short report on MEGAN and MG-RAST results. |
| 08.04.09 | Morning: Presentations of assigned topics. Afternoon: Use Geneious to study phylogenetic improvement of the LCA algorithm. Hand in: Presentations, discussion of phylogenetic method. |
| 09.04.09 | Morning: Write an ORF finder for prokaryotic genes. Afternoon: Run the ORF finder on different metagenome datasets. Compare predicted ORFs on assigned reads vs. "No hits". Compare performance for different sequencing technologies. Hand in: Code and comparisons. |
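The ORF-finder task from 09.04.09 could be approached roughly as sketched below. This is only an illustration, not the course's reference solution. It assumes ATG as the only start codon, the stop codons TAA, TAG and TGA, and a scan of all six reading frames; the function names and the `min_len` threshold are hypothetical choices made for this example. Metagenomic reads are short, so real gene fragments often lack a start or stop codon within the read, and this sketch does not handle such partial ORFs.

```python
# Minimal six-frame ORF finder (illustrative sketch, assumptions noted above).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative start codons (GTG, TTG) are ignored
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Return a list of (strand, start, end, orf_sequence) tuples.

    start/end are 0-based, end-exclusive coordinates on the scanned strand
    (for "-" they refer to the reverse-complemented sequence).
    min_len is the minimum ORF length in nucleotides, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # open an ORF at the first ATG in this frame
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start >= min_len:
                        orfs.append((strand, start, end, s[start:end]))
                    start = None  # look for the next ORF in the same frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATAGCC", min_len=9):
        print(orf)
```

Comparing ORF predictions on assigned reads vs. "No hits" reads then amounts to running `find_orfs` on both read sets and comparing, for instance, the fraction of reads with at least one ORF.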
After completion of the lab dates, each participant is expected to write a 10-15 page report. This report should be structured by the days of the course. For each topic studied in the course, please give a brief introduction to the topic, then describe what computations and analyses were performed. Perhaps most importantly, provide a discussion of each topic. Also, please write a section on problems with current approaches to metagenome analysis and provide some ideas on how to improve analysis techniques.
The due date for this report is April 24th.
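The two small scripts handed in on 31.03.09 (grabbing the first 10,000 reads of a dataset and converting GenBank records to FastA) might look roughly like the sketch below. It is a sketch under stated assumptions, not the code used in the course. All function names are hypothetical, and the GenBank parser is deliberately minimal: it only reads the name from the LOCUS line and the sequence between ORIGIN and the `//` terminator, ignoring all annotation.

```python
import io
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FastA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_reads(handle, n=10000):
    """Yield at most the first n FastA records from handle."""
    return islice(read_fasta(handle), n)


def genbank_to_fasta(handle):
    """Yield (name, sequence) for each record in a GenBank flat file.

    Assumption: only the LOCUS name and the ORIGIN block are used.
    """
    name, in_origin, chunks = None, False, []
    for line in handle:
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else "unknown"
            in_origin, chunks = False, []
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            if name is not None:
                yield name, "".join(chunks)
            name, in_origin, chunks = None, False, []
        elif in_origin:
            # Sequence lines look like "        1 acgtacgtac gtacgt";
            # keep letters only, dropping position numbers and spaces.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs as FastA, wrapping lines at width."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    demo = io.StringIO("LOCUS       READ1   12 bp\nORIGIN\n        1 acgtacgtac gt\n//\n")
    out = io.StringIO()
    write_fasta(genbank_to_fasta(demo), out)
    print(out.getvalue(), end="")
```

Because `first_reads` is lazy, it stops reading the input once the requested number of records has been produced, which matters for large read files on a shared NFS disk.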

