Statistical Data Analysis

2018-2019
This course is taught in English. Descriptions may therefore be available in English only.

Course objectives

After completing the course, the student should be able to:
⦁ use QQ-plots, symplots, histograms, box plots, goodness-of-fit tests,
etc., to find a suitable model distribution (e.g., a normal or
exponential distribution) for a given dataset, and estimate its unknown
parameters (e.g., location and/or scale),
⦁ describe the data distribution quantitatively and qualitatively (e.g.,
symmetry, presence of outliers) with the help of the computer software
R,
⦁ decide, taking the characteristics of the dataset into account, which
statistical method is preferable (e.g., whether to use a nonparametric
test statistic, or how to trade off the robustness and efficiency of an
estimator) for drawing conclusions about the population underlying the
data, for example by means of hypothesis tests and confidence intervals,
⦁ apply tests for location parameters, stochastic order, or equality in
distribution in two-sample problems, and assess the asymptotic relative
efficiency of tests,
⦁ apply (again in R) resampling methods such as the bootstrap or random
permutation to find characteristics of a statistic, even if no model
assumptions are made,
⦁ analyse the relationship between two or more variables in a given
dataset using rank-based correlation tests, chi-square tests for
contingency tables, or multiple linear regression. In the context of
multiple linear regression, the student should be able to identify
influential observations and select variables for the regression model.

In all the above, the student should be able to present the findings
appropriately.

Course content

This is an advanced-level statistical data analysis course that builds
on an introductory statistics course, e.g. Statistics (Algemene
Statistiek). The course introduces students to several widely used
statistical models and methods, and teaches them how to apply these
tools to real data using the statistical software package R. The
following subjects are covered:
- summarizing data;
- investigating the distribution of data;
- robust methods;
- nonparametric methods;
- bootstrap;
- two-sample problems;
- contingency tables;
- multiple linear regression.
The course is a combination of theory (in the lectures) and practice (in
the computer classes) in such a way that the theory is explicitly linked
to the practice of statistical data analysis.

Teaching format

Lectures (2h), computer classes (2h).
Attendance is not mandatory.

Assessment

Homework assignments in R and two written partial exams (with a resit
exam).
50% of the final grade is the average of the assignment grades; the
other 50% is the exam grade. Both of these grades must be at least 5.5;
otherwise, the course is failed.
The exam grade is either the average of the grades of the two partial
exams written during the semester (provided each is graded at least
4.0), or the resit exam grade.
If the resit exam is taken, the homework assignment grades still count
towards the final course grade as described above.
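The grading rule above can be sketched as a small Python function. This is an illustration only, not an official grade calculator: the function names `exam_grade` and `final_grade` are hypothetical, the assumption that a written resit replaces the partial exams is an interpretation of the text, and no rounding is applied because the rules above do not specify any.

```python
from statistics import mean
from typing import Optional, Sequence

PASS_MARK = 5.5    # minimum for both the assignment average and the exam grade
PARTIAL_MIN = 4.0  # minimum for each of the two partial exams


def exam_grade(partials: Sequence[float], resit: Optional[float] = None) -> Optional[float]:
    """Exam component of the final grade (hypothetical helper)."""
    # Assumption: if the resit is written, its grade is used as the exam grade.
    if resit is not None:
        return resit
    # Otherwise: average of the two partial exams, valid only if each is at least 4.0.
    if len(partials) == 2 and all(p >= PARTIAL_MIN for p in partials):
        return mean(partials)
    return None  # no valid exam grade; the resit is needed


def final_grade(assignments: Sequence[float], partials: Sequence[float],
                resit: Optional[float] = None) -> Optional[float]:
    """50% assignment average + 50% exam grade; None means the course is failed."""
    assignment_avg = mean(assignments)
    exam = exam_grade(partials, resit)
    # Both components must be at least 5.5, otherwise the course is failed.
    if exam is None or assignment_avg < PASS_MARK or exam < PASS_MARK:
        return None
    return 0.5 * assignment_avg + 0.5 * exam


if __name__ == "__main__":
    print(final_grade([7.0, 8.0], [6.0, 7.0]))             # 7.0
    print(final_grade([7.0, 8.0], [3.5, 9.0]))             # None (one partial below 4.0)
    print(final_grade([7.0, 8.0], [3.5, 9.0], resit=6.0))  # 6.75
```

Note that the assignment grades enter the final grade in the same way whether the exam grade comes from the partial exams or from the resit.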

Literature

Lecture notes.

Target audience

2BA, 2W, 2W-B, 3W, 3W-B, 3Ect.

Other information

Language of tuition: English

Recommended prior knowledge

Students have the required prior knowledge if they have passed the VU
courses Statistics (X_400004) and Probability Theory (X_400622), or
equivalent courses.

General information

Course code: X_401029
Credits: 6 EC
Period: P4+5
Course level: 300
Language of tuition: English
Faculty: Faculty of Science (Faculteit der Bètawetenschappen)
Course coordinator: dr. D. Dobler
Examiner: dr. D. Dobler
Lecturers: dr. D. Dobler

Practical information

You must register for this course yourself.

Last-minute registration is possible for this course.

Teaching methods: Tutorial, Partial exam (extra room capacity), Lecture
Target groups

This course is also available as: