Advanced Methods 1: Data Mining and Text Analysis


Course Objective

After taking this course, you will have acquired knowledge and understanding of:
- the formulation of research proposals, including design, methodology,
procedure and data analysis
- advanced issues in computational methods, specifically: data modeling
and visualization; machine learning;
text analysis.

Additionally, you will have acquired the competences to:
- conduct advanced analyses in computational research and analytical
methods, including: data modelling and visualization; text analysis;
machine learning.

Moreover, you will be able to:
- reflect critically on the validity and scientific and societal
relevance of text and data analysis results.

Finally, you will have acquired the skills to:
- communicate the results of data analysis clearly and accurately to an
academic audience, using appropriate visualizations, in a written
report and an oral presentation.

Course Content

This course provides a strong foundation for text and data-intensive
research either in academia or in businesses.

Our online and offline actions increasingly leave digital traces that
are a treasure trove for analysing social behaviour, both for academics
and companies. These traces are often in textual form, such as Facebook
and Twitter posts, product reviews, and online profiles; or in the form
of large semi-structured data sets such as communication logs and
purchasing records. The unstructured nature of these data poses a
challenge to the social scientist or analyst, as new techniques such as
text and network analysis are needed to explore, visualize, interpret,
and test hypotheses using these data.

Each week, students will work in small teams on a specific challenge
relating to text and data analysis. Near the end of the week, you
present the results to your peers and give each other feedback. This
results in a written research report submitted at the end of the week.

Teaching Methods

Each week, students will participate in four meetings, for which
attendance will be required:
- An interactive lecture introducing the main methodology taught in that
week;
- Two computer practicals in which students practice the main techniques
and work on their assignments;
- A closing workshop where students present their (draft) assignments
and give each other feedback.
See the daily schedule at the end of this document for more information.

Method of Assessment

Assessment is based on written assignments.

Entry Requirements

This course focuses on applying R to the analysis of text and quantitative
data. Before you start, you should make sure that you have knowledge
about the basics of R (especially tidyverse) and statistical modeling.
The course builds directly and specifically on Big Data, Small Data in P1-P2.
Students who have not followed that course will be required to master
relevant R skills taught in that course before starting this course.


van Atteveldt, W., Trilling, D., & Arcila Calderón, C. (in progress).
Computational Analysis of Communication: A practical introduction to the
analysis of texts, networks, and images with code examples in Python and
R.
Healy, K. (2018). Data Visualization: A Practical Introduction.
Princeton University Press. (selected chapters)
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An
introduction to statistical learning. New York: Springer; chapters 4 &
Welbers, K., Van Atteveldt, W., & Benoit, K. (2017). Text analysis in R.
Communication Methods and Measures, 11(4), 245-265.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and
pitfalls of automatic content analysis methods for political texts.
Political analysis, 21(3), 267-297.
Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder‐Luis, J.,
Gadarian, S. K., ... & Rand, D. G. (2014). Structural topic models for
open‐ended survey responses. American Journal of Political Science,
58(4), 1064-1082.
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M.
(2009). Reading tea leaves: How humans interpret topic models. In
Advances in neural information processing systems (pp. 288-296).

Background / additional reading
Wickham, H., & Grolemund, G. (2016). R for data science: import, tidy,
transform, visualize, and model data. O'Reilly Media, Inc.
Jurafsky, D., & Martin, J. H. (2014). Speech and language processing
(2nd Ed.). London: Pearson.
Manning, C. D., & Schütze, H. (1999). Foundations of
statistical natural language processing. MIT press.
Goldberg, Y. (2017). Neural network methods for natural language
processing. Synthesis Lectures on Human Language Technologies, 10(1),

General Information

Course Code S_AM1D
Credits 6 EC
Period P3
Course Level 500
Language of Tuition English
Faculty Faculty of Social Sciences
Course Coordinator dr. W.H. van Atteveldt
Examiner dr. W.H. van Atteveldt
Teaching Staff dr. K. Welbers
dr. W.H. van Atteveldt

Practical Information

You need to register for this course yourself.

Teaching Methods Study Group, Practical, Reading