### General Information

Course Code | X_401029 |
---|---|
Credits | 6 EC |
Period | P4+5 |
Course Level | 300 |
Language of Tuition | English |
Faculty | Faculty of Science |
Course Coordinator | dr. D. Dobler |
Examiner | prof. dr. M.C.M. de Gunst |
Teaching Staff | dr. D. Dobler |

### Practical Information

You need to register for this course yourself.

Last-minute registration is available for this course.

Teaching Methods | Seminar, Lecture |
---|---|


### Course Objective

After following the course, the student should be able to:

- find a suitable model distribution for a dataset at hand (e.g., a normal or exponential distribution) with the help of QQ-plots, symplots, histograms, box plots, goodness-of-fit tests, etc., and estimate the unknown (e.g., location and/or scale) parameters;
- describe the data distribution quantitatively and qualitatively (e.g., symmetry, presence of outliers) with the help of the statistical software R;
- decide, taking the characteristics of the dataset into account, which statistical method is preferred (e.g., whether to use a nonparametric test statistic, or how to trade off robustness against efficiency of an estimator) to draw conclusions about the population underlying the data, for example with the help of hypothesis tests and confidence intervals;
- apply tests for location parameters, stochastic order, or equality in distribution in two-sample problems, and assess the asymptotic relative efficiency of tests;
- apply resampling methods in R, such as the bootstrap or random permutation, to find characteristics of a statistic, even if no model assumptions are made;
- analyse the relationship between two or more variables in a given dataset with the help of rank-based correlation tests, chi-square tests for contingency tables, or multiple linear regression. In the context of multiple linear regression, the student should be able to identify influential observations and select variables for the linear regression model.

In all of the above, the student should be able to present the findings appropriately.

### Course Content

This is an advanced-level statistical data analysis course that builds on an introductory course in statistics, e.g., Statistics (Algemene Statistiek). The course introduces students to several widely used statistical models and methods, and teaches them how to apply these tools to real data using the statistical software package R. The following subjects are covered:

- summarizing data;

- investigating the distribution of data;

- robust methods;

- nonparametric methods;

- bootstrap;

- two-sample problems;

- contingency tables;

- multiple linear regression.

The course is a combination of theory (in the lectures) and practice (in the computer classes) in such a way that the theory is explicitly linked to the practice of statistical data analysis.

### Teaching Methods

Lectures (2h) and computer classes (2h). Attendance is not mandatory.

### Method of Assessment

Homework assignments in R and two written exams. 50% of the final grade consists of the average of the assignment grades; the other 50% consists of the exam grade. Both of these grades must be at least 5.5; otherwise, the course is failed.

The exam grade equals either the average of the grades of the two partial exams written during the semester (provided both are passed with a grade of at least 4.0), or the resit exam grade.

If the resit exam is written, the homework assignment grades still count towards the final course grade as explained above.
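The grading rule above can be written out as a short calculation. The sketch below is illustrative only and is not an official grading tool: the function names `exam_grade` and `final_grade` are hypothetical, grades are assumed to be on a 1 to 10 scale, no rounding is applied (the official rounding rules are not stated here), and a written resit is assumed to replace the partial-exam average.

```python
from statistics import mean
from typing import Optional, Sequence


def exam_grade(partial1: float, partial2: float,
               resit: Optional[float] = None) -> Optional[float]:
    """Exam component of the final grade (hypothetical helper).

    Returns None when there is no valid exam grade yet, i.e. a partial
    exam scored below 4.0 and no resit has been written.
    """
    if resit is not None:
        # Assumption: a written resit replaces the partial-exam average.
        return resit
    if partial1 >= 4.0 and partial2 >= 4.0:
        # Both partial exams passed with at least 4.0: take their average.
        return (partial1 + partial2) / 2
    return None


def final_grade(assignments: Sequence[float], partial1: float, partial2: float,
                resit: Optional[float] = None) -> Optional[float]:
    """Unrounded final grade, or None if the course is failed (hypothetical)."""
    homework = mean(assignments)  # average of the assignment grades
    exam = exam_grade(partial1, partial2, resit)
    # Both components must be at least 5.5, otherwise the course is failed.
    if exam is None or homework < 5.5 or exam < 5.5:
        return None
    # Homework and exam each count for 50% of the final grade.
    return 0.5 * homework + 0.5 * exam
```

For example, assignment grades of 7, 8 and 6 with partial exams of 6.0 and 7.0 give a homework grade of 7, an exam grade of 6.5, and an unrounded final grade of 6.75. The homework grades are kept when a resit is used, matching the rule that they still count after a resit.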

### Literature

Lecture notes.

### Target Audience

2BA, 2W, 2W-B, 3W, 3W-B, 3Ect.

### Additional Information

Language of tuition: English

### Recommended Background Knowledge

The required knowledge has been obtained if the student has previously passed the VU courses Statistics (X_400004) and Probability Theory (X_400622), or equivalent courses.