Text Mining


Course Objective

Students will become acquainted with the possibilities and problems of
the automatic analysis of natural language by computers. They will
obtain practical knowledge: they will learn to use existing technology
and experience the obstacles and options of the domain. They will also
learn about the theories behind language technology and its connection
to artificial intelligence, linguistics and the semantic web. Students
will choose their own project in which they apply the technologies
learned, evaluate the results and communicate their findings in a
report.

Course Content

It is estimated that about 80% of knowledge is captured in language:
think of news, wikis, social media and handbooks. Searching for
information is also largely done through language. The amount of
information is too large for humans to oversee, which is why
technologies are being developed to access and use this information.

Text Mining is a promising research domain whose goal is to extract
structured information from unstructured natural language. This is a big
challenge, as human language is a rich and complex medium that must be
understood in the context of social human interaction. Therefore,
language technology analyses language on different levels: the
grammatical level (e.g. word types and syntax) and the semantic level
(e.g. entities, events, opinions). During the course you will learn how
this information is encoded in text and how you can extract and present
it using computers.

Teaching Methods

Lectures (2 hours/week) and labs (2 hours/week).

Method of Assessment

Assignments and exam:
50% final assignment (group);
50% exam.

To pass the course, no grade may be lower than 5 and the average must
be 5.5 or higher. Attendance at the final assignment presentation
session is mandatory, and all but one of the practical assignments must
be passed.


Will be announced on Canvas

Target Audience


Additional Information

This course is also of interest to students from other faculties, as
many fields deal with text and can benefit from automated text analysis
(e.g. digital humanities, the financial domain). No specific prior
knowledge is required, but affinity with computers is needed, as the lab
sessions and the assignment require some Python programming. Students
need to work on their own laptops; Linux or Mac OS is preferred.

Recommended background knowledge

Information Retrieval and Python

General Information

Course Code L_PABAALG002
Credits 6 EC
Period P4
Course Level 300
Language of Tuition English
Faculty Faculty of Humanities
Course Coordinator prof. dr. P.T.J.M. Vossen
Examiner prof. dr. P.T.J.M. Vossen
Teaching Staff dr. H.D. van der Vliet
dr. E. Maks
prof. dr. P.T.J.M. Vossen

Practical Information

You need to register for this course yourself.

Last-minute registration is available for this course.

Teaching Methods Seminar, Lecture
Target audiences

This course is also available as: