Course Objective
The goal of the course is to gain insight into and experience with
algorithms and infrastructures for managing big data.
Course Content
This course confronts the students with some data management tasks,
where the challenge is that the mere size of this data causes naive
solutions, and/or solutions that work only on a single machine, to stop
being practical. Solving such tasks requires the computer scientist to
have insight into the main factors that underlie algorithm performance
(data access patterns, hardware latency/bandwidth), as well as to possess
certain skills and experience in managing large-scale computing
infrastructure. Besides the data being large in volume, it invariably
also comes in strange forms and formats, is
polluted, and needs to be transformed and cleaned. The main part of the
course is the second assignment: a large big data analysis project in which
each student team tackles a different problem, and while doing so gains
experience in multiple aspects of large-scale data engineering (critical
thinking, data management technologies, visualization techniques, paper
More information can be found at http://event.cwi.nl/lsde; also check the
"showcase" section where you can see past project results
(visualizations and papers).
Teaching Methods
There are two lectures per week, and the course requires significant
practical work. The practicals are done outside lecture hours, at the
discretion of the students, who are supported remotely through Skype.
Method of Assessment
In the first assignment the students can work either on their own
laptops via a prepared VM, or in the cloud using an Amazon EC2 Micro
Instance, and there is an online competition between practicum teams for
the best result. The second assignment is done
on the SurfSARA Hadoop cluster (90 machines, 720 cores, 1.2PB storage).
For this assignment, a report of 5-8 pages must be written. The students
also need to read two scientific papers of their choice, related to the second
assignment. There is no written exam; the grade is based on the two
assignment grades and the grade for the in-class presentation.
Entry Requirements
Hadoop environments consist of Linux machines, so some basic ability in
working with these comes in handy. Also, you must have some programming
skills in C, C++ or Java.
Literature
The course website http://event.cwi.nl/lsde provides all this
information. On this website, each lecture is provided, but also its
main points are summarized. Further, each lecture page provides links to
useful videos and presentations, but also the scientific papers to be
Target Audience
mCS, mPDCS
Recommended background knowledge
Programming proficiency in C/C++ or Java
Language of Tuition: English
Faculty: Faculty of Science
Course Coordinator: prof. dr. P.A. Boncz
Examiner: prof. dr. P.A. Boncz
You need to register for this course yourself
Last-minute registration is available for this course.