20596 - MACHINE LEARNING
Department of Decision Sciences
DANIELE DURANTE
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
- Introduction: An introduction to Machine Learning.
- Linear methods: High-dimensional linear regression; Logistic regression; Discriminant analysis.
- Model assessment and selection: Bias-variance trade-off; Training, test and validation sets; Cross-validation; Bootstrap.
- Regularization and shrinkage: Subset selection; Ridge regression; Lasso; Elastic-net and related algorithms.
- Methods beyond linearity: Regression and smoothing splines; K-nearest neighbors; Local linear regression; Kernel methods; GAM; MARS.
- Tree-based methods: Regression and classification trees; Bagging; Random forests; Gradient boosting methods; Stacking.
- Beyond tree-based methods: Support vector machines; Projection pursuit; Neural networks.
The above methods are also implemented during LAB sessions on real-world case studies. Code and implementation in standard statistical software are also part of the course topics.
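As an illustration of the kind of lab implementation mentioned above, the following is a minimal sketch of cross-validated lasso regression in Python with scikit-learn; the dataset is synthetic and the specific libraries, datasets and case studies used in class may differ.

```python
# Minimal sketch: lasso regression with cross-validated regularization strength
# (illustrative only; assumes scikit-learn, with a synthetic dataset as placeholder).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic high-dimensional regression problem standing in for a real case study.
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Lasso with the penalty chosen by 5-fold cross-validation on the training set.
model = LassoCV(cv=5, random_state=0).fit(X_train, y_train)

print("Selected alpha:", model.alpha_)
print("Non-zero coefficients:", np.sum(model.coef_ != 0))
print("Test R^2:", model.score(X_test, y_test))
```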
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
- Explain the methodology and theory underlying the classical Machine Learning methods.
- Illustrate the technical aspects related to the implementation of classical Machine Learning methods.
- Recognize the distinctive properties of each Machine Learning technique.
- Identify the most suitable Machine Learning technique for a given data analytic problem.
- Summarize differences and similarities between multiple Machine Learning techniques.
APPLYING KNOWLEDGE AND UNDERSTANDING
- Examine the relevant research questions underlying a real-world data analytic problem.
- Choose a Machine Learning technique coherent with the analytic question and apply it to the data at hand.
- Identify relevant structures underlying the data and effectively predict unobserved events.
- Discuss the empirical output produced by a Machine Learning technique.
- Connect different Machine Learning techniques to improve predictive performance in complex analytic problems.
Teaching methods
- Face-to-face lectures
- Exercises (exercises, database, software etc.)
- Case studies / Incidents (traditional, online)
- Individual assignments
- Interactive class activities (role playing, business game, simulation, online forum, instant polls)
DETAILS
Classical face-to-face lectures focus on the presentation and discussion of the Machine Learning techniques covered by the course, with primary attention to methodology, theory and computational methods. To improve the learning experience and encourage interaction, illustrative case studies and in-class exercises may also be included.
A series of lab sessions, with students working on their own laptops, is also provided. These classes typically (but not always) consist of two main parts:
- The students are guided in the implementation of the Machine Learning techniques in standard statistical software, such as R and Python.
- After the guided implementation, an in-class individual assignment (performed on the Bocconi Data Science Challenges platform) asks the students to solve a specific predictive problem from a data analytic case study, leveraging suitable Machine Learning tools; a sketch of this workflow is given after this list. This interactive class activity is expected to improve the students' autonomy in answering a variety of real-world analytic questions, and serves as a self-assessment opportunity. Additional online data competitions may be offered as optional individual homework, to provide extra training material for interested students.
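The following is a minimal sketch of the train-predict-submit workflow used in such data challenges, assuming Python with pandas and scikit-learn; the file names, column names and submission format are hypothetical placeholders, since the actual requirements are set by the Bocconi Data Science Challenges platform.

```python
# Sketch of a generic data-challenge workflow (file names and columns are hypothetical).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical inputs: a labeled training set and an unlabeled test set.
train = pd.read_csv("train.csv")   # columns: id, feature_1, ..., feature_p, target
test = pd.read_csv("test.csv")     # columns: id, feature_1, ..., feature_p

features = [c for c in train.columns if c not in ("id", "target")]

# Fit a random forest classifier on the training data.
model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(train[features], train["target"])

# Predict on the test set and write a submission file in a generic CSV format.
submission = pd.DataFrame({"id": test["id"],
                           "target": model.predict(test[features])})
submission.to_csv("submission.csv", index=False)
```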
Assessment methods
| | Continuous assessment | Partial exams | General exam |
|---|---|---|---|
| Written individual exam | | | x |
| Individual assignment | | | x |
ATTENDING AND NOT ATTENDING STUDENTS
Due to the nature of the course, attending and not attending students are evaluated with the same criteria, through a final general exam only. This assessment consists of two main parts.
- Traditional written individual exam, which consists of open- and closed-answer questions and short exercises. The focus is on evaluating students based on their methodological, theoretical and computational understanding of the Machine Learning techniques presented in the face-to-face lectures.
- Individual assignment based on a data challenge where students are asked to develop and apply a data analytic strategy to answer a predictive problem. This data challenge is a longer and more structured version of those proposed in the lab sessions, and takes place towards the end of the course. The assignment is managed via the Bocconi Data Science Challenges platform, and the evaluation considers both the predictive performance of the proposed analytic approach and the quality of a document describing the methods considered, the code, the final results and related comments.
Grading rule: let X denote the grade of the traditional written individual exam and Y the grade of the individual assignment. If Y is greater than or equal to X, the final grade is 0.3*Y + 0.7*X; otherwise, the final grade is X.
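For concreteness, a small worked illustration of this rule in Python (the function name and example grades are hypothetical):

```python
# Illustration of the grading rule stated above: x is the written exam grade,
# y is the individual assignment grade (names and example values are hypothetical).
def final_grade(x, y):
    # Blend 30/70 when the assignment grade is at least the exam grade;
    # otherwise the written exam grade stands, so the assignment can only help.
    return 0.3 * y + 0.7 * x if y >= x else x

print(round(final_grade(26, 30), 1))  # 0.3*30 + 0.7*26 = 27.2
print(round(final_grade(28, 24), 1))  # 28 (the assignment cannot lower the grade)
```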
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
The course relies mostly on two books, which complement each other and are freely available online.
- T. HASTIE, R. TIBSHIRANI, J. FRIEDMAN (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Second Edition.
- G. JAMES, D. WITTEN, T. HASTIE, R. TIBSHIRANI (2013). An Introduction to Statistical Learning with Applications in R. Springer.
Other useful secondary references are listed below:
- K.P. MURPHY (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
- C.M. BISHOP (2006). Pattern Recognition and Machine Learning. Springer.
Slides, videos and clarification notes summarizing the contents presented in class are also provided. Students interested in exploring specific concepts in greater depth individually are provided with additional reading materials upon request. These additional materials are not part of the final evaluation.