20596 - MACHINE LEARNING
Course taught in English
Class 23: DANIELE DURANTE
Prerequisites: For a fruitful and effective learning experience, basic preliminary knowledge of mathematics and linear algebra, descriptive statistics, probability and random variables, simple and multiple linear regression, likelihood-based inference, and generalized linear models is strongly recommended. Students should also be familiar with basic statistical software.
In 2009, the Chief Economist of Google, Hal Varian, predicted that Data Science would be the most attractive job of the following ten years. He also claimed that understanding, processing and extracting value from data were going to be hugely important skills in many careers. He was right: Data Scientist has been listed among the top jobs in the United States for several years now. The reason for this huge demand is simple and can be found in the words of Eric Schmidt, former CEO of Google: "we create as much information in two days now as we did from the dawn of man through 2003". But information (data) is not knowledge. Translating one into the other requires skills in database management, statistical learning, machine learning and computational statistics, along with good intuition, the ability to deal with data, and the capacity to understand the analytic goals and interpret the final outputs. The course in Machine Learning aims to foster these skills and to provide students with the instruments and the mindset to successfully deal with the wide range of data analytic problems they may encounter in their future jobs.
- INTRODUCTION: A smooth introduction to Machine Learning
- LINEAR METHODS: High-dimensional linear regression; Logistic regression; Linear and quadratic discriminant analysis
- MODEL ASSESSMENT AND SELECTION: Bias-variance trade-off; Training, test and validation sets; Cross-validation; Bootstrap
- REGULARIZATION AND SHRINKAGE: Subset selection; Ridge regression; Lasso and related algorithms
- METHODS BEYOND LINEARITY: Regression and smoothing splines; Local linear regression; Kernel methods; Generalized additive models
- TREE-BASED METHODS: Regression and classification trees; Bagging; Random forests; Boosting
- BEYOND TREE-BASED METHODS: Support vector machines; Neural networks
The above methods will also be implemented during LAB SESSIONS on real-world case studies. Code and implementation in classical statistical software, with a main focus on R, are also part of the course topics.
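As an illustration of the kind of implementation covered in the lab sessions, here is a minimal, hypothetical sketch (not official course material) of one of the listed techniques, ridge regression with a held-out validation set, written in Python with NumPy; the simulated data and the grid of penalty values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 100 observations, 10 predictors, only 3 truly active
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta + rng.normal(scale=0.5, size=n)

# Training / validation split (first 70 rows train, last 30 validate)
X_tr, X_te = X[:70], X[70:]
y_tr, y_te = y[:70], y[70:]

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Choose the penalty by mean squared error on the held-out set
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
errors = {lam: np.mean((y_te - X_te @ ridge_fit(X_tr, y_tr, lam)) ** 2)
          for lam in lambdas}
best_lam = min(errors, key=errors.get)
print(best_lam, errors[best_lam])
```

The closed-form solution makes the role of the penalty explicit; in the labs, equivalent fits would typically be obtained through dedicated R packages rather than hand-coded linear algebra.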
- Explain the methodology and theory underlying the classical machine learning methods
- Illustrate the technical aspects related to the implementation of classical machine learning methods
- Recognize the distinctive properties of each machine learning technique
- Identify the most suitable machine learning technique for a given data analytic problem
- Summarize differences and similarities between multiple machine learning techniques
- Examine the relevant research questions underlying a real-data analytic problem
- Choose a machine learning technique coherent with the analytic question and apply it to the data at hand
- Identify relevant structures underlying the data and effectively predict unobserved events
- Discuss the empirical output produced by a machine learning technique
- Connect different machine learning techniques to improve predictive performance in complex analytic problems
- Face-to-face lectures
- Exercises (exercises, database, software etc.)
- Case studies / Incidents (traditional, online)
- Individual assignments
- Group assignments
- Interactive class activities (role playing, business game, simulation, online forum, instant polls)
Classical face-to-face lectures will focus on the presentation and discussion of the machine learning techniques covered by the course, with particular attention to methodology, theory and computational methods. To improve the learning experience and encourage interaction, illustrative case studies and in-class exercises may also be considered.
A series of lab sessions, with students working on their own laptops, will also be offered. These classes will typically (but not always) consist of two main parts:
- The students will be guided in the implementation of the machine learning techniques in standard statistical software, with a main focus on R. Some Python code will also be made available as supplementary material. To download R or RStudio, see https://www.r-project.org/ and https://www.rstudio.com/.
- After the guided implementation, an in-class individual assignment (performed on a data competition platform) will ask the students to solve a specific predictive problem from a data analytic case study, leveraging suitable machine learning tools. This interactive class activity is expected to improve the students' autonomy in answering a variety of real-world analytic questions, and will serve as a self-assessment opportunity. Further online data competitions may be offered as optional individual or group homework, providing additional training material for interested students.
Due to the nature of the course, attending and non-attending students will be evaluated with the same criteria, through a final general exam only. This assessment will consist of two main parts.
- A traditional written individual exam consisting of open-ended questions and small exercises. The focus is on evaluating the students' methodological, theoretical and computational understanding of the machine learning techniques presented in the face-to-face lectures.
- An individual assignment based on a data challenge, in which students are asked to develop and apply a data analytic strategy to answer a predictive problem. This data challenge will be a longer and more structured version of those proposed in the lab sessions, and will take place towards the end of the course. The assignment is managed via an online data competition platform, and the evaluation will consider both the predictive performance of the analytic approach proposed by the student and the quality of a document describing the methods considered, the code, the final results and related comments.
Grading Rule: Let X denote the grade of the traditional written individual exam and let Y be the grade of the individual assignment. Then, if Y is greater than or equal to X, the final grade is 0.3*Y+0.7*X. Otherwise, if Y is less than X, the final grade is X.
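The grading rule above can be made concrete with a short, hypothetical Python snippet (the function name and the example grades are illustrative, not part of the course materials):

```python
def final_grade(x, y):
    """Combine written-exam grade x and assignment grade y per the stated rule."""
    # The assignment counts 30% only when it exceeds (or equals) the exam grade,
    # so it can never lower the final mark below the written-exam grade.
    return 0.3 * y + 0.7 * x if y >= x else x

print(final_grade(26, 30))  # assignment above exam: 0.3*30 + 0.7*26 = 27.2
print(final_grade(28, 24))  # assignment below exam: final grade stays 28
```

In other words, the assignment grade Y can only improve the final mark relative to the written exam grade X, never reduce it.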
The course relies on two books that complement each other and are freely available online.
- Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second Edition). Springer.
- James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
Slides summarizing the contents presented in class will also be provided. Students interested in individually deepening specific concepts will receive additional reading materials upon request. These additional materials will not be part of the final evaluation.