20596 - MACHINE LEARNING
Department of Decision Sciences
DANIELE DURANTE
Prerequisites
Mission & Content Summary
MISSION
CONTENT SUMMARY
- INTRODUCTION: A smooth introduction to Machine Learning
- LINEAR METHODS: High-dimensional linear regression; Logistic regression; Linear and quadratic discriminant analysis
- MODEL ASSESSMENT AND SELECTION: Bias-variance trade-off; Training, test and validation sets; Cross-validation; Bootstrap
- REGULARIZATION AND SHRINKAGE: Subset selection; Ridge regression; Lasso and related algorithms
- METHODS BEYOND LINEARITY: Regression and smoothing splines; Local linear regression; Kernel methods; Generalized additive models
- TREE-BASED METHODS: Regression and classification trees; Bagging; Random forests; Boosting
- BEYOND TREE-BASED METHODS: Support vector machines; Neural networks
The above methods will also be implemented during LAB SESSIONS on real-world case studies. Code and implementation in standard statistical software, with a main focus on R, are also part of the course topics.
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
- Explain the methodology and theory underlying the classical machine learning methods
- Illustrate the technical aspects related to the implementation of classical machine learning methods
- Recognize the distinctive properties of each machine learning technique
- Identify the most suitable machine learning technique for a given data analytic problem
- Summarize differences and similarities between multiple machine learning techniques
APPLYING KNOWLEDGE AND UNDERSTANDING
- Examine the relevant research questions underlying a real-data analytic problem
- Choose a machine learning technique coherent with the analytic question and apply it to the available data
- Identify relevant structures underlying the data and effectively predict unobserved events
- Discuss the empirical output produced by a machine learning technique
- Connect different machine learning techniques to improve predictive performance in complex analytic problems
Teaching methods
- Face-to-face lectures
- Exercises (exercises, database, software etc.)
- Case studies / Incidents (traditional, online)
- Individual assignments
- Group assignments
- Interactive class activities (role playing, business game, simulation, online forum, instant polls)
DETAILS
Classical face-to-face lectures will focus on the presentation and discussion of the machine learning techniques covered by the course, with particular attention to methodology, theory and computational methods. To improve the learning experience and encourage interaction, illustrative case studies and in-class exercises may also be included.
A series of lab sessions, with the students working on their own laptops, will also be provided. These classes will typically (but not always) consist of two main parts:
- The students will be guided in the implementation of the machine learning techniques in standard statistical software, with a main focus on R. Some Python code will also be made available as supplementary material. To download R or RStudio, see https://www.r-project.org/ and https://www.rstudio.com/.
- After the guided implementation, an in-class individual assignment (performed on a data competition platform) will ask the students to solve a specific predictive problem from a data analytic case study, leveraging suitable machine learning tools. This interactive class activity is expected to improve the autonomy of the students in answering a variety of real-world analytic questions, and will serve as a self-assessment opportunity. Some other online data competitions may be provided as individual or group homework (not compulsory), to offer additional training material for interested students.
Assessment methods
| | Continuous assessment | Partial exams | General exam |
|---|---|---|---|
| Written individual exam | | | x |
| Individual assignment | | | x |
ATTENDING AND NON-ATTENDING STUDENTS
Due to the nature of the course, only a final general exam will be considered to evaluate, with the same criteria, attending and non-attending students. This assessment will consist of two main parts.
- A traditional written individual exam consisting of open-ended questions and small exercises. The focus is on evaluating students based on their methodological, theoretical and computational understanding of the machine learning techniques presented in the face-to-face lectures.
- Individual assignment based on a data challenge where students are asked to develop and apply a data analytic strategy to answer a predictive problem. Such a data challenge will be a longer and more structured version of those proposed in the lab sessions, and will take place towards the end of the course. This assignment is managed via an online data competition platform, and the evaluation will consider the predictive performance of the analytic approach proposed by the student and the quality of a document describing the methods considered, the code, the final results and related comments.
Grading Rule: Let X denote the grade of the traditional written individual exam and let Y be the grade of the individual assignment. Then, if Y is greater than or equal to X, the final grade is 0.3*Y+0.7*X. Otherwise, if Y is less than X, the final grade is X.
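As an illustration only, the grading rule above can be sketched in Python. The function name `final_grade` and the example grades are hypothetical and are not part of the official course material; the sketch assumes both grades are expressed on the same numeric scale.

```python
def final_grade(x: float, y: float) -> float:
    """Combine the written exam grade x and the assignment grade y.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not an official grading tool.
    """
    if y >= x:
        # The assignment is at least as good as the written exam:
        # it contributes 30% and the written exam 70%.
        return 0.3 * y + 0.7 * x
    # The assignment is worse than the written exam:
    # only the written exam grade counts.
    return x


# Hypothetical example grades, chosen only to show both branches.
print(final_grade(26, 30))  # assignment raises the grade above 26
print(final_grade(28, 24))  # assignment is ignored, grade stays at 28
```

Under this rule the individual assignment can raise the final grade but never lower it: the result is always at least X, and equals X + 0.3*(Y - X) whenever Y is greater than or equal to X.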
Teaching materials
ATTENDING AND NON-ATTENDING STUDENTS
The course relies on two books which complement each other and are available online for free.
- Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition). Springer.
- James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
Slides summarizing the contents presented in class will also be provided. Students interested in exploring specific concepts in greater depth on their own will be given additional reading materials upon request. These additional materials will not be part of the final evaluation.