20879 - LANGUAGE TECHNOLOGY
Course taught in English
Class group: 29
Synchronous Blended: lessons are held synchronously in the classroom, with up to one hour per credit delivered in remote mode.
To feel comfortable in this course, you should have a good knowledge of programming in Python, as well as basic linear algebra (what vectors and matrices are and how they are multiplied) and probability theory (what a probability distribution is, what conditional probability is). Additional knowledge of data structures (classes, Counter, defaultdict) makes many of the applications easier to solve.
Natural Language Processing (NLP) and language technology tools are becoming ubiquitous: from everyday tools like machine translation and smart speakers to industry applications for hiring, customer analysis, and more. Machine-learning-based text analysis tools offer a wide range of possibilities and are a growing field of expertise. The advance of large language models (e.g., ChatGPT) has changed and greatly expanded NLP capabilities. This course provides an overview of, and hands-on experience with, the most relevant techniques.
Topics covered: information theory, basics and history of NLP, language models, representations, topic models, classification, NLP applications, and the ethics of AI and NLP.
By the end of the course, students will be able to:
- understand the power of large language models
- reason about the risks and benefits of various approaches
- come up with an appropriate method for a given problem
- implement various NLP methods
- develop, run, and analyze various tools
Teaching methods:
- Face-to-face lectures
- Guest speakers' talks (in class or remote)
- Exercises (exercises, databases, software, etc.)
- Individual assignments
- Group assignments
The course consists of lectures with slides and explanations, accompanied by practice Jupyter notebooks.
Each student completes individual assignments to gain experience with implementation details, and students work together in groups on a joint task. Where available, students have the option to participate in external competitions, such as Kaggle competitions or shared tasks in natural language processing.
| Continuous assessment | Partial exams | General exam |
|---|---|---|
| x | x | |
| x | | |
Best two out of three individual assignments (50%)
Final project (50%)
Projects are graded on the performance of the system and the models used, the quality and clarity of the report and its presentation, and the ambitiousness of the project.
Jupyter notebooks are provided for each class, as well as class notes for required reading.
OPTIONAL READING
Hovy, Dirk. Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge University Press, 2020.
Jurafsky, Dan, and James H. Martin. Speech and Language Processing. Vol. 3. London: Pearson, 2014.
Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
Marsland, Stephen. Machine Learning: An Algorithmic Perspective. CRC Press, 2015.
Chollet, François. Deep Learning with Python. Manning Publications, 2017.
Goldberg, Yoav. A Primer on Neural Network Models for Natural Language Processing. arXiv, 2015.
Eisenstein, Jacob. Introduction to Natural Language Processing. MIT Press, 2019.