
Course 2023-2024 a.y.

20879 - LANGUAGE TECHNOLOGY

AI
Department of Computing Sciences

Course taught in English


AI (8 credits - II sem. - OB  |  ING-INF/05)
Course Director:
DIRK HOVY

Classes: 29 (II sem.)
Instructors:
Class 29: DIRK HOVY


Synchronous Blended: Lessons in synchronous mode in the classroom (for a maximum of one hour per credit in remote mode)

Suggested background knowledge

To feel comfortable in this course, you should have a good knowledge of programming in Python, as well as basic linear algebra (what vectors and matrices are, and how they are multiplied) and probability theory (what a probability distribution is, what conditional probability is). Additional knowledge of data structures (classes, Counter, defaultdict) makes many of the applications easier to solve.
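As an illustration of the data-structure fluency assumed above, here is a short sketch using `Counter` and `defaultdict` on a toy corpus (an illustrative example, not official course material):

```python
from collections import Counter, defaultdict

# Count word frequencies in a toy corpus with Counter
tokens = "the cat sat on the mat the cat".split()
counts = Counter(tokens)
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]

# Group distinct words by their first letter with defaultdict
by_letter = defaultdict(list)
for word in set(tokens):
    by_letter[word[0]].append(word)
print(sorted(by_letter["m"]))  # ['mat']
```

Both classes come from the standard library's `collections` module and recur constantly in NLP code, e.g. for frequency tables and inverted indexes.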


Mission & Content Summary
MISSION

Natural Language Processing (NLP) and language technology tools are becoming ubiquitous: from everyday tools like machine translation and smart speakers to industry applications in hiring, customer analysis, and beyond. Machine-learning-based text analysis tools offer a wide range of possibilities and constitute a growing field of expertise. The advent of large language models such as ChatGPT has changed and greatly expanded NLP capabilities. This course provides an overview of, and hands-on experience with, all relevant techniques.

CONTENT SUMMARY

Information theory, basics and history of NLP, language models, representations, topic models, classification, NLP applications, ethics of AI and NLP.
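To give a flavor of one of the listed topics, here is a minimal bigram language model estimated by counting, a hypothetical sketch for illustration only, not material from the course itself:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Estimate P(next word | previous word) from a list of sentences."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    # Normalize each row of counts into a conditional probability distribution
    return {
        prev: {w: c / sum(counts.values()) for w, c in counts.items()}
        for prev, counts in following.items()
    }

model = train_bigram(["the cat sat", "the dog sat", "the cat ran"])
print(model["the"])  # {'cat': 0.666..., 'dog': 0.333...}
```

The same counting-and-normalizing idea, scaled up and combined with smoothing or neural parameterizations, underlies the language-model material the course covers.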


Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
At the end of the course, the student will be able to...

- understand the power of large language models

- reason about the risks and benefits of various approaches

- select an appropriate method for a given problem

APPLYING KNOWLEDGE AND UNDERSTANDING
At the end of the course, the student will be able to...

- implement various NLP methods

- develop, run, and analyze various tools


Teaching methods
  • Face-to-face lectures
  • Guest speakers' talks (in class or at a distance)
  • Exercises (exercises, database, software etc.)
  • Individual assignments
  • Group assignments
DETAILS

The course consists of lectures, with slides and explanations, and associated practice Jupyter notebooks.


Each student completes individual assignments to get experience in implementation details, and students work together in groups to solve a joint task. If applicable/available, students have the option to participate in external competitions such as Kaggle competitions or shared tasks in natural language processing.


Assessment methods
  • Individual assignment (report, exercise, presentation, project work etc.): continuous assessment, general exam
  • Active class participation (virtual, attendance): continuous assessment
ATTENDING AND NOT ATTENDING STUDENTS

Best two out of three individual assignments (50%)
Final project (50%)

Projects are graded on the performance of the system and the quality of the report, including clarity of presentation, the performance of the models used, and the ambitiousness of the project.


Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS

Jupyter notebooks are provided for each class, as well as class notes for required reading.

OPTIONAL READING
Hovy, Dirk. Text Analysis in Python for Social Scientists: Discovery and Exploration. Cambridge University Press, 2020.
Jurafsky, Dan, and James H. Martin. Speech and Language Processing. 3rd ed. Pearson, 2014.
Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
Marsland, Stephen. Machine Learning: An Algorithmic Perspective. CRC Press, 2015.
Chollet, François. Deep Learning with Python. Manning Publications Co., 2017.
Goldberg, Yoav. A Primer on Neural Network Models for Natural Language Processing. ArXiv, 2015.
Eisenstein, Jacob. Introduction to Natural Language Processing. MIT Press, 2019.

Last change 06/02/2024 10:45