Web Data Processing Systems

2019-2020
This course is taught in English. Descriptions may therefore be available in English only.

Course objective

After taking this course, you will be able to:
- Understand the fundamentals of the most important problems that modern
Web companies face daily;
- Process large amounts of Web data efficiently using state-of-the-art
tools that are currently used in the Web industry;
- Extract useful insights from raw data available on the Web;
- Adapt or reuse techniques used on the Web to other fields (e.g. Data
Mining, Artificial Intelligence) where similar problems might occur.

This course has seven learning goals:
G1: Store and retrieve information from large repositories of
knowledge
G2: Represent and extract knowledge on the Web
G3: Connect structured and unstructured data
G4: Infer new knowledge from existing knowledge bases
G5: Process large amounts of Web data efficiently using state-of-the-art
tools
G6: Implement efficient prototypes that work with Web data
G7: Adapt or reuse techniques used on the Web to other fields

Course content

The Web constitutes the largest repository of knowledge that is
available to mankind, and its impact on modern society is unprecedented
at many levels. Many Web companies are valued at billions of dollars
and are now central to modern life.

The key players in the Web industry face numerous challenges arising
from the size, distribution, heterogeneity, and uncontrolled nature of
the Web. Systems that process Web data combine techniques spanning
databases, distributed systems, data mining, and artificial
intelligence.

The goal of this course is to introduce students to the most advanced
systems and techniques that deal with Web data. Important classes of
problems concern:

- the storage and retrieval of Web data (How can we store and retrieve
information from large social networks, graphs, or large volumes of
text?)
- efficient entity disambiguation (What is a particular web page talking
about?)
- large-scale knowledge extraction (What sort of knowledge can we
extract from web documents -- e.g. Wikipedia?)
- effective link prediction (Is there a connection between two
users/events/concepts?)
- expressive ontological inference (Can current knowledge lead to more
implicit knowledge?)
- trust (Can we trust the content on a certain blog post?)

This course describes techniques to perform these tasks, with a
particular emphasis on scalability. To better understand the challenges
and the effectiveness of current solutions, students will complete a
practical assignment on realistic Web data. The assignment counts
toward the final grade of the course.

Teaching methods

The course takes the form of lectures and practical assignments.

Assessment

The final grade is determined by a final written exam (60%) and one
practical group assignment (40%).

Literature

A mixture of scientific publications and other material available on the
Web.

Target audience

XM_CS

General information

Course code XM_40020
Credits 6 EC
Period P2
Course level 400
Language of instruction English
Faculty Faculteit der Bètawetenschappen (Faculty of Science)
Course coordinator J. Urbani
Examiner J. Urbani
Lecturers J. Urbani

Practical information

You must register for this course yourself.

Last-minute registration is available for this course.

Teaching formats Lecture
Target audiences

This course is also available as: