Statistics for High-Dimensional Data

Course objective

To teach students the adjustments to classical statistical methodology
that are necessary to tackle high-dimensional data.

Course content

This course gives an overview of statistical methods that are used for
analyzing high-dimensional data sets in which many variables (often
thousands) have been measured for a limited number of subjects. This
type of data arises in genomics, where genetic information is measured
for many thousands of genes simultaneously, in functional MRI of
the brain, and also in economic applications. The course covers some of
the most important statistical issues for high-dimensional data,
including: a) initial processing of the data; b) model-based
statistical inference for Gaussian and count data (classical and
Bayesian methods); c) multiple testing (family-wise error rate and false
discovery rate control); d) prediction of binary endpoints (e.g.
recurrence of a tumor) and survival; e) clustering of samples (e.g. to
find tumor subtypes). Several specific types of high-dimensional data
will be discussed and used during the course. In terms of applications,
the course focuses on cancer genomics, but the theoretical aspects
apply to other fields as well.
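
As an illustration of topic c), the sketch below contrasts a family-wise
error rate procedure (Bonferroni) with a false discovery rate procedure
(Benjamini-Hochberg). It is not taken from the course material: the
function names, the significance level of 0.05, and the ten p-values are
all assumptions made up for this example.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Controls the false discovery rate at level alpha for independent
    (or positively dependent) p-values.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the hypotheses with the k_max smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, e.g. one per gene (illustrative numbers only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

n_fwer = sum(bonferroni(pvals))
n_fdr = sum(benjamini_hochberg(pvals))
print(f"Bonferroni rejections: {n_fwer}")
print(f"Benjamini-Hochberg rejections: {n_fdr}")
```

On these made-up p-values Bonferroni rejects one hypothesis and
Benjamini-Hochberg rejects two, which reflects the general pattern that
false discovery rate control is less conservative than family-wise error
rate control when many hypotheses are tested.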


Teaching methods

Lectures and practical exercises


Assessment

Written exam


Literature

"Tutorial in biostatistics: multiple hypothesis testing in genomics" by
Goeman & Solari (article in Statistics in Medicine), plus handouts
provided by the lecturer


