20597 - NATURAL LANGUAGE PROCESSING
Department of Marketing
DIRK HOVY
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
Preparation: working with text:
- Data formats.
- Preprocessing.
- Storage and retrieval.
Exploration: exploring structure in the data:
- Clustering.
- Topic models.
- Word embeddings.
Prediction: finding patterns to impute new values:
- Text classification (sentiment analysis, author attributes).
- Logistic Regression.
- Perceptron.
- Feed-forward Neural Nets.
- Convolutional Neural Nets.
- Structured perceptron.
- Recurrent Neural Nets.
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
- Describe different text analysis problems.
- Talk about the linguistic foundations.
- Distinguish between exploration and prediction approaches.
- Know which algorithm to choose for a given problem.
- Understand the trade-offs between different approaches.
APPLYING KNOWLEDGE AND UNDERSTANDING
- Apply their knowledge to a practical text analysis problem.
- Implement a variety of algorithms for text exploration and classification in Python.
Teaching methods
- Face-to-face lectures
- Online lectures
- Guest speaker talks (in class or remotely)
- Exercises (exercises, database, software etc.)
- Individual assignments
- Group assignments
- Participation in external competitions
DETAILS
- Each lecture features hands-on exercises in Jupyter notebooks.
- Each student completes several individual assignments to get experience in implementation details.
- Students work together in groups to solve a joint task.
- If applicable/available, students have the option to participate in external competitions such as Kaggle competitions or shared tasks in natural language processing.
- If available, guest speakers from data-science companies present their work on text and language processing.
Assessment methods
Continuous assessment | Partial exams | General exam
---|---|---
x | |
x | |
x | |
ATTENDING STUDENTS
- Individual Assignment (50%)
Individual midterm project (Jupyter Notebook) on material from the first half of the course.
Each student completes an individual assignment to get experience in implementation details.
- Final Group project (50%)
Group final project (Jupyter Notebook) on material from the second half of the course.
Students work together in groups to solve a joint task.
The project is graded on the performance of the system and the quality of the report.
Projects are assessed on clarity of presentation and the performance of the models used.
NOT ATTENDING STUDENTS
- Individual Assignment (100%)
Individual project (Jupyter Notebook) on material from the entire course.
Each student completes an individual assignment to get experience in implementation details.
The project is assessed on clarity of presentation and the performance of the models used.
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
- Lecture slides and notes provided on Bboard.
- D. HOVY, Text Analysis in Python for Social Scientists: Discovery and Exploration, Cambridge University Press, 2020. (https://www.cambridge.org/core/elements/text-analysis-in-python-for-social-scientists/BFAB0A3604C7E29F6198EA2F7941DFF3)
- D. JURAFSKY, J.H. MARTIN, Speech and language processing, Vol. 3. London, Pearson, 2014.
- C.D. MANNING, H. SCHÜTZE, Foundations of statistical natural language processing, MIT Press, 1999.
- S. MARSLAND, Machine learning: an algorithmic perspective, CRC Press, 2015.
- F. CHOLLET, Deep learning with Python, Manning Publications Co., 2017.