Text Mining

2019-2020
This course is taught in English. As a result, descriptions may only be available in English.

Course objective

In this course you will become acquainted with the possibilities and
problems of the automatic analysis of natural language by computers.
You will gain practical knowledge: you will learn to use existing
technology and experience the obstacles and opportunities in the field.
You will also learn about the theories behind language technology and
its connections to artificial intelligence, linguistics and the
Semantic Web. You will choose your own project in which you apply the
technologies covered, evaluate the results and present your findings
in a report.

Course content

It is estimated that about 80% of knowledge is captured in language:
think of news, wikis, social media and handbooks. Searching for
information is also largely done through language. The amount of
information is far too large for humans to keep track of, which is why
technologies are being developed to access and use it more
efficiently.

Text Mining is a promising research domain whose goal is to extract
structured information from unstructured natural language. This is a
major challenge, as human language is a rich and complex medium that
must be understood in the context of social human interaction.
Language technology therefore analyses language at different levels:
the grammatical level (e.g. word types and syntax) and the semantic
level (e.g. entities, events, opinions). During the course you will
learn how this information is encoded in text and how you can extract
and present it using computers.

Teaching format

Lectures (2 hours/week) and labs (2 hours/week).

Assessment

Assignments and exam:
50% final assignment (group);
50% exam.

To pass the course, neither grade may be lower than 5, and the
average must be 5.5 or higher. Attendance at the final assignment
presentation session is mandatory, and all but one of the practical
assignments must be passed.
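The pass conditions above can be summarised as a single check. This is a minimal sketch, assuming grades on the Dutch 1-10 scale, the 50/50 weighting listed above and no rounding of the average; the function `passes_course` and its parameters are hypothetical illustrations, not part of any official grading system.

```python
def passes_course(assignment: float, exam: float,
                  practicals_passed: int, practicals_total: int,
                  attended_presentation: bool) -> bool:
    """Hypothetical helper encoding the pass rule as stated in the description.

    Assumes grades on the 1-10 scale and no rounding of the average.
    """
    # Neither component grade may be lower than 5.
    if assignment < 5 or exam < 5:
        return False
    # Weighted average: 50% final assignment, 50% exam; must be at least 5.5.
    average = 0.5 * assignment + 0.5 * exam
    if average < 5.5:
        return False
    # All but one of the practical assignments must be passed.
    if practicals_passed < practicals_total - 1:
        return False
    # Attendance at the final presentation session is mandatory.
    return attended_presentation


# A 5.0 and a 6.0 average to exactly 5.5; one practical missed is allowed.
print(passes_course(5.0, 6.0, practicals_passed=4, practicals_total=5,
                    attended_presentation=True))  # True
# The assignment grade is below 5, so a high exam grade cannot compensate.
print(passes_course(4.5, 8.0, 5, 5, True))  # False
```

Note that the minimum of 5 per component means a high grade on one part cannot compensate for a very low grade on the other, even when the average would otherwise reach 5.5.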

Required prior knowledge

None

Literature

Will be announced on Canvas

Target audience

BA 3IMM, BA 3LI

Other information

This course is also of interest to students from other faculties, as
many fields deal with text and can benefit from automated text analysis
(e.g. digital humanities, the financial domain). No specific prior
knowledge is required, but some familiarity with computers is needed,
as the lab sessions and the assignment require some Python programming.
Students work on their own laptops; Linux or Mac OS is preferred.

Recommended prior knowledge

Information Retrieval and Python

General information

Course code: L_PABAALG002
Credits: 6 EC
Period: P4
Course level: 300
Language of instruction: English
Faculty: Faculty of Humanities
Course coordinator: prof. dr. P.T.J.M. Vossen
Examiner: prof. dr. P.T.J.M. Vossen
Lecturers: dr. H.D. van der Vliet, dr. E. Maks, prof. dr. P.T.J.M. Vossen

Practical information

You must register for this course yourself.

Last-minute registration is possible for this course.

Teaching methods: Seminar, Lecture
Target groups

This course is also available as: