Algorithms in Sequence Analysis


Course Objective

Have you ever wondered how we can track a gene across 3 billion years of
evolution? Sequence alignment, computed with a dynamic programming
algorithm, lets us compare genes from humans and bacteria. In this course
we focus on algorithms for biological sequences that can be applied to
real scientific problems in biology.
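To make the dynamic programming idea concrete, here is a minimal Python sketch of global pairwise alignment in the style of the Needleman-Wunsch algorithm. It is an illustration, not course material: the function name `needleman_wunsch` and the scoring scheme (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for simplicity, whereas protein alignments typically use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
from __future__ import annotations


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `needleman_wunsch("GATTACA", "GATACA")` returns a score of 4 (six matches and one gap). Filling the table takes O(nm) time and memory, which is how dynamic programming finds an optimal alignment without enumerating every possible alignment.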

Students will gain in-depth knowledge of the theory behind sequence
analysis methods. They will also develop the understanding and skills
needed to apply these algorithms to protein and DNA sequences. We would
like to stress that no prior biological knowledge is required to enter
this course.

At the end of the course, the student will:
- be aware of the major issues, methodology and available algorithms in
sequence analysis;
- have hands-on experience in tackling biological problems using
sequence analysis algorithms and applying the general statistical
framework of Hidden Markov Models;
- be able to implement several of the most important algorithms in
sequence analysis.

Course Content

- Dynamic programming, database searching, pairwise and multiple
alignment, probabilistic methods including Hidden Markov Models, pattern
matching, entropy measures, evolutionary models, and phylogeny.

- Implementing (in Python) an alignment algorithm based on dynamic
programming
- Aligning sequencing data from tumors to the human genome and analysing
structural variants
- Implementing (in Python) Hidden Markov Models and using them to
predict protein domain structure
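As an illustration of the Hidden Markov Model practical, the sketch below implements the Viterbi algorithm, which finds the most probable sequence of hidden states for an observed sequence. The model is entirely hypothetical: two states `D` (domain) and `L` (linker), two observation symbols `h` (hydrophobic) and `p` (polar), and made-up probabilities. Actual domain prediction usually relies on profile HMMs over the 20 amino acids, with parameters estimated from data.

```python
import math

# Hypothetical toy model: D = domain, L = linker; h = hydrophobic, p = polar.
# The probabilities are illustrative only, not estimated from real proteins.
STATES = ("D", "L")
START = {"D": 0.5, "L": 0.5}
TRANS = {"D": {"D": 0.9, "L": 0.1}, "L": {"D": 0.1, "L": 0.9}}
EMIT = {"D": {"h": 0.8, "p": 0.2}, "L": {"h": 0.2, "p": 0.8}}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for obs and its log-probability."""
    # v[t][s]: best log-probability of any path that ends in state s at position t
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at position t
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((q, v[t - 1][q] + math.log(trans_p[q][s])) for q in states),
                key=lambda item: item[1],
            )
            v[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, v[-1][last]


path, logp = viterbi("hhhhpppp", STATES, START, TRANS, EMIT)
print("".join(path))  # DDDDLLLL
```

With these toy parameters the path for `hhhhpppp` switches once, from `D` to `L`, at the boundary between the two residue classes. Working with log probabilities avoids numerical underflow on long sequences.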

Teaching Methods

13 lectures: two 2-hour lectures per week
13 computer practicals with associated assignments: two 2-hour hands-on
sessions per week

Method of Assessment

The final grade for this course will consist of 50% practical work (see
above) and 50% theoretical assessment.
The theoretical assessment will be an oral and/or written exam
(depending on the number of students).

Entry Requirements

A bachelor's degree in any science discipline (including medicine).
Basic programming skills (Python) and an interest in biological


Course material on
Books: Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G. Biological
Sequence Analysis. Cambridge University Press, 1998, 350 pp., ISBN
Recommended reading: Zvelebil, M., Baum, J.O. Understanding
Bioinformatics. Garland Science, 2008, ISBN-10: 0-8153-4024-9.

Target Audience

mAI, mBio, mCS

Additional Information

BYOD policy (Bring Your Own Device)
We expect students in this course to use their own laptop. This laptop
should at the very least support an SSH client for remote shell access
to the VU Linux servers. Ideally, it also provides a command-line shell,
Python 3 and a text editor with syntax highlighting, either standalone
(e.g. Atom or Sublime Text) or as part of a simple IDE (e.g. Spyder). We
therefore recommend the Anaconda Python distribution on any operating
system, and additionally PuTTY or PowerShell for Windows users.

If you are considering purchasing new hardware, we recommend the
following specifications:
- Processor: Intel i5 / AMD Ryzen 5 or above
- Memory: At least 4 GB RAM
- Storage: At least 512 GB disk space
- Operating System: Ubuntu 16.04

The course is taught in English.

General Information

Course Code: X_405050
Credits: 6 EC
Period: P2
Course Level: 600
Language of Tuition: English
Faculty: Faculty of Science
Course Coordinator: prof. dr. J. Heringa
Examiner: prof. dr. J. Heringa
Teaching Staff: prof. dr. J. Heringa

Practical Information

You need to register for this course yourself.

Last-minute registration is available for this course.

Teaching Methods: Lecture, Computer lab