Course Objective
The goal of the course is to gain insight into and experience with
algorithms and infrastructures for managing big data.
Course Content
This course confronts students with data management tasks
where the sheer size of the data causes naive
solutions, and/or solutions that work only on a single machine, to stop
being practical. Solving such tasks requires the computer scientist to
have insight into the main factors that underlie algorithm performance
(data access patterns, hardware latency/bandwidth), as well as
skills and experience in managing large-scale computing
infrastructure. Apart from the data being of large volume, another
problem invariably is that data comes in strange forms and formats, is
polluted, and needs to be transformed and cleaned. The main part of the
course is the second assignment: a large big data analysis project in which
each student team tackles a different problem, and while doing so gains
experience in multiple aspects of large-scale data engineering (critical
thinking, data management technologies, visualization techniques, paper
More information is available at http://event.cwi.nl/lsde. Also check the
"showcase" section, where you can see past project results
(visualizations and papers).
Teaching Methods
There are two lectures per week, and the course requires significant
practical work. The practicals are done outside lecture hours, at the
discretion of the students, who are supported remotely through Slack,
Method of Assessment
In the first assignment (writing a hand-coded program that performs graph
analysis on a single machine), the students can work either on their own
laptops via a prepared VM, or in the cloud using an Amazon EC2 Micro
Instance; there is an online competition between practicum teams for
the best result. The second assignment is done on large datasets (TB
scale) using Spark on a cluster. For this assignment, a report of 5-8
pages must be written. The students also need to read two scientific
papers of their choice, related to the second
assignment. There is no written exam; the grade is based on the two
assignment grades and the grade for the in-class presentation.
Entry Requirements
Programming experience with Java as well as with C or C++.
Theoretical knowledge of database systems, computer architectures, and
Experience with Linux (to work with Hadoop environments) and SQL.
Literature
The course website http://event.cwi.nl/lsde provides all this
information. On the website, each lecture is provided and its
main points are summarized. Further, each lecture page provides links to
useful videos and presentations, but also the scientific papers to be
Target Audience
mCS, mPDCS
Recommended background knowledge
Knowing how to work with a debugger (gdb for C/C++) and git (GitHub).
Language of Tuition: English
Faculty: Faculty of Science
Course Coordinator: prof. dr. P.A. Boncz
Examiner: prof. dr. P.A. Boncz
You need to register for this course yourself.
Last-minute registration is available for this course.