Course Objective
After completing this course:
1. the student can transform and explore data with the command line
2. the student can extract data with regular expressions
3. the student can import and process static and streaming data in
4. the student can store and retrieve semi-structured data in and from
5. the student can parallelize tasks via MapReduce, threads and/or
queues in Python.
6. the student can create appropriate and well-formatted visualizations
7. the student can address a research question and report on their
Course Content
This course aims to integrate various aspects involved with data science
and to teach the fundamentals of working with big data (including an
introduction to Hadoop). Topics include visualization of data; preparing
data for processing (machine learning or data mining); storing
unstructured data; and scaling techniques for working with big volumes
of data. Python is used throughout this hands-on course.
Teaching Methods
Lectures and Q&A sessions
Method of Assessment
Hand-in assignments, presentation, and a report.
Assignment week 1: 15%
Assignment week 2: 15%
Assignment week 3: 15%
The weighted average needs to be 5.5 or higher.
Entry Requirements
Programming experience in any language
|Language of Tuition|English|
|Faculty|Faculty of Science|
|Course Coordinator|prof. dr. S. Bhulai|
|Examiner|dr. R. Bekker|
You need to register for this course yourself.
Last-minute registration is available for this course.
|Teaching Methods|Lecture, Study Group|