20597 - NATURAL LANGUAGE PROCESSING
Department of Computing Sciences
Course taught in English
Course Director:
DEBORA NOZZA
Suggested background knowledge
To feel comfortable in this course, you should have a good knowledge of programming in Python, as well as basic linear algebra (what vectors and matrices are and how they are multiplied) and probability theory (what a probability distribution is, what conditional probability is). Additional knowledge of Python data structures (Counter, defaultdict) makes many of the applications easier to solve.
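As a rough self-check of this background, the sketch below uses Counter and defaultdict to estimate a conditional probability from word-pair counts. The toy corpus and the helper name `conditional_probability` are invented for illustration and are not taken from the course materials.

```python
from collections import Counter, defaultdict

# Toy corpus; the sentences are made up for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

unigram_counts = Counter()
# bigram_counts[w1][w2] holds how often w2 directly follows w1.
bigram_counts = defaultdict(Counter)

for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    for w1, w2 in zip(tokens, tokens[1:]):
        bigram_counts[w1][w2] += 1


def conditional_probability(w2, w1):
    """Maximum-likelihood estimate of P(w2 | w1) from the bigram counts."""
    total = sum(bigram_counts[w1].values())
    return bigram_counts[w1][w2] / total if total else 0.0


print(unigram_counts["the"])                  # 4
print(conditional_probability("cat", "the"))  # 0.25
print(conditional_probability("on", "sat"))   # 1.0
```

If reading this code is comfortable, the Python and probability prerequisites are probably in place.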
Mission & Content Summary
MISSION
Natural Language Processing tools are becoming ubiquitous: from everyday tools like ChatGPT, Siri, or Alexa to decision-making processes in industry or politics and text analysis tools in social science research. Machine-learning-based text analysis tools offer a range of possibilities, and the field of expertise around them is growing. Whether the goal is exploring text to find structures and topics or building a classifier to predict the sentiment or author characteristics of a text, this course provides an overview of, and hands-on experience with, the relevant techniques.
CONTENT SUMMARY
Preparation: working with text:
- Data formats.
- Preprocessing.
- Storage and retrieval.
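A minimal sketch of what the preparation steps can look like in practice, using only the Python standard library. The stopword list, the example documents, and the function names (`preprocess`, `build_index`, `search`) are assumptions made for this illustration; the course may well use different tools.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "on", "of", "and"}


def preprocess(text):
    """Lowercase the text, keep runs of letters as tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(documents):
    """Build an inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in preprocess(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = preprocess(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


documents = [
    "The cat sat on the mat.",
    "A dog chased the cat!",
    "Dogs and cats are pets.",
]
index = build_index(documents)
print(search(index, "cat"))      # {0, 1}
print(search(index, "cat dog"))  # {1}
```

Note that "cats" and "cat" are treated as different tokens here; handling such variants (for example by lemmatization) is part of what preprocessing covers.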
Exploration: exploring structure in the data:
- Clustering.
- Topic models.
- Static and contextual word embeddings.
Prediction: finding patterns to impute new values:
- Text classification (sentiment analysis, author attributes).
- Logistic Regression.
- Perceptron.
- Feed-forward Neural Nets.
- Transformer architecture.
- BERT.
- ChatGPT and (Large) Language Models.
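To give a flavour of the prediction part, here is a minimal sketch of a perceptron for binary sentiment classification over bag-of-words features, written in plain Python. The training sentences, labels, and function names are invented for illustration, and this is a textbook-style version rather than the implementation used in the course.

```python
from collections import defaultdict


def featurize(text):
    """Bag-of-words features: map each lowercase token to its count."""
    features = defaultdict(int)
    for token in text.lower().split():
        features[token] += 1
    return features


def train_perceptron(examples, epochs=10):
    """Train on (text, label) pairs, where label is +1 or -1."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = featurize(text)
            score = bias + sum(weights[f] * v for f, v in features.items())
            # Update only when the example is misclassified or on the boundary.
            if label * score <= 0:
                for f, v in features.items():
                    weights[f] += label * v
                bias += label
    return weights, bias


def predict(weights, bias, text):
    """Return +1 if the weighted feature sum is positive, otherwise -1."""
    features = featurize(text)
    score = bias + sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if score > 0 else -1


# Hypothetical toy training set: +1 = positive, -1 = negative sentiment.
training_data = [
    ("great movie loved it", 1),
    ("wonderful acting great plot", 1),
    ("terrible movie hated it", -1),
    ("awful plot terrible acting", -1),
]
weights, bias = train_perceptron(training_data)
print(predict(weights, bias, "great plot"))       # 1
print(predict(weights, bias, "terrible acting"))  # -1
```

Logistic regression and feed-forward networks replace the hard threshold and the mistake-driven update with a differentiable loss, but the feature-times-weight scoring idea stays the same.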
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
At the end of the course, students will be able to...
- Describe different text analysis problems.
- Talk about the linguistic foundations.
- Distinguish between exploration and prediction approaches.
- Know which algorithm to choose for a given problem.
- Understand the trade-offs between different approaches.
APPLYING KNOWLEDGE AND UNDERSTANDING
At the end of the course, students will be able to...
- Apply their knowledge to practical text analysis problems.
- Implement a variety of algorithms for text exploration and classification in Python.
Teaching methods
- Lectures
- Guest speaker talks (in class or remotely)
- Practical Exercises
- Individual work / Assignments
- Collaborative work / Assignments
DETAILS
- Each lecture features hands-on exercises in Jupyter notebooks.
- Each student completes several individual assignments to get experience in implementation details.
- Students work together in groups to solve a joint task.
- If available, guest speakers from data-science companies present their work on text and language processing.
Assessment methods
| Continuous assessment | Partial exams | General exam |
---|---|---|---|
| x | | |
| x | | |
| x | | |
ATTENDING AND NOT ATTENDING STUDENTS
There is no distinction between attending and non-attending students.
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
- Lecture slides and notes provided on Bboard.
- D. HOVY, Text Analysis in Python for Social Scientists: Discovery and Exploration, Cambridge University Press, 2020. (https://www.cambridge.org/core/elements/text-analysis-in-python-for-social-scientists/BFAB0A3604C7E29F6198EA2F7941DFF3)
- D. JURAFSKY, J.H. MARTIN, Speech and language processing, Vol. 3. London, Pearson, 2014.
- C.D. MANNING, H. SCHÜTZE, Foundations of statistical natural language processing, MIT Press, 1999.
- S. MARSLAND, Machine learning: an algorithmic perspective, CRC Press, 2015.
- F. CHOLLET, Deep learning with Python, Manning Publications Co., 2017.
Last change 20/11/2024 08:03