Language as Data


Course Objective

After this course, students are able to find their way around different
linguistic data collections. They know what kinds of information can be
found in text and how this information is encoded. Students understand
the syntactic and semantic concepts that are needed for finding this

Course Content

Linguistics describes language as a cognitive system (or a cognitive
process) of form and meaning. In text mining, however, it is important
to find not just meaning as such, but the actual information in a text.
For example, from a purely linguistic point of view, a referring
expression like he, Henry or the terrorist may point to an individual,
but when mining information you want to know which individual is
meant, and where in the text the same individual is mentioned by
other means.
In Language as Data we study how information is stored in text. We
will also look at data collections: sources of written and spoken text
from different genres (newspapers, social media, etc.) and how these
data collections can be accessed.

Teaching Methods

There are two two-hour meetings per week for 7 weeks. In the
lectures, the relevant linguistic theory is explained and practical
skills are practised. Students are expected to show an interactive

Method of Assessment

The course is assessed by assignments (50%) and a final exam (50%).
Each component must be graded at least 5, and the average of the two
must be at least 5.5.

Entry Requirements

Linguistic Research, Programming in Python

Target Audience

MA students in Linguistics (specialisation Text Mining).

General Information

Course Code: L_PAMATLW001
Credits: 6 EC
Period: P2
Course Level: 400
Language of Tuition: English
Faculty: Faculty of Humanities
Course Coordinator: dr. H.D. van der Vliet
Examiner: dr. H.D. van der Vliet
Teaching Staff: dr. H.D. van der Vliet

Practical Information

You need to register for this course yourself.

Last-minute registration is available for this course.

Teaching Methods: Seminar