Project Big Data


Course Objective

After completing this course:
1. the student can transform and explore data with the command line
2. the student can extract data with regular expressions
3. the student can import and process static and streaming data in
4. the student can store and retrieve semi-structured data in and from
a database
5. the student can parallelize tasks via MapReduce, threads and/or
queues in Python
6. the student can create appropriate and well-formatted visualizations
and tables
7. the student can address a research question and report on their

Course Content

This course integrates various aspects of data science and teaches the
fundamentals of working with big data (including an introduction to
Hadoop). Topics include data visualization; preparing data for
processing (machine learning or data mining); storing unstructured
data; and scaling techniques for working with large volumes of data.
Python is used throughout this hands-on course.

Teaching Methods

Lectures and Q&A sessions

Method of Assessment

Hand-in assignments, presentation, and a report.
Assignment week 1: 15%
Assignment week 2: 15%
Assignment week 3: 15%
Report: 35%
Presentation: 20%

The weighted average needs to be 5.5 or higher.
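
As an illustration of how the weights combine, the following is a minimal Python sketch of the weighted average. The weights and the 5.5 threshold come from the list above; the dictionary keys, the function names, and the example grades are hypothetical. The sketch also assumes a 1 to 10 grading scale, no intermediate rounding, and no minimum grade per component, none of which is stated here.

```python
# Weights taken from the assessment list; they sum to 100%.
WEIGHTS = {
    "assignment_week_1": 0.15,
    "assignment_week_2": 0.15,
    "assignment_week_3": 0.15,
    "report": 0.35,
    "presentation": 0.20,
}
PASS_MARK = 5.5  # threshold for the weighted average


def weighted_average(grades):
    """Return the weighted average of the component grades."""
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)


def passes(grades):
    """Return True if the weighted average reaches the pass mark."""
    return weighted_average(grades) >= PASS_MARK


# Hypothetical grades, for illustration only.
example = {
    "assignment_week_1": 6.0,
    "assignment_week_2": 5.0,
    "assignment_week_3": 7.0,
    "report": 5.0,
    "presentation": 6.0,
}
print(round(weighted_average(example), 2))
print(passes(example))
```

Because the report alone counts for 35%, a weak report has to be offset by clearly stronger results on the other components.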

Entry Requirements

Programming experience in any language



General Information

Course Code: X_400645
Credits: 6 EC
Period: P6
Course Level: 300
Language of Tuition: English
Faculty: Faculty of Science
Course Coordinator: prof. dr. S. Bhulai
Examiner: prof. dr. S. Bhulai
Teaching Staff: Q. Wang

Practical Information

You need to register for this course yourself

Last-minute registration is available for this course.

Teaching Methods: Lecture, Study Group