30607 - FOUNDATIONS OF DATA SCIENCE
Department of Decision Sciences
OMIROS PAPASPILIOPOULOS
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
Part A: The basics
+ Intro to course and case studies
+ (Less) Basic Python programming
+ Basic data management and visualization with Python
+ Messy data and feature engineering
Part B: Predictive modelling
+ Fundamentals: supervised learning and optimization
+ Lasso regression
+ Classification
+ Representation learning pt1: trees, bagging and boosting
+ Representation learning pt2: neural networks
Part C: Uncertainty quantification and causal inference
+ Stability
+ Split sample methods, bootstrap, conformal inference
+ Elements of causal inference
+ Treatment effect estimation and double machine learning
+ Causal forests
Part D: Wrap up
+ Student project presentations
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
- define data analysis methodology
- carry out basic data warehousing to represent, visualize and transform data
- build, train and evaluate machine learning models and algorithms
- integrate machine learning with uncertainty quantification and basic causal inference
- develop models, algorithms and code
- understand the fundamental machine learning methodologies
APPLYING KNOWLEDGE AND UNDERSTANDING
- apply appropriate data analysis methodologies
- choose appropriate machine learning algorithms and evaluate their performance
- produce measures of uncertainty associated with the statistical learning
- carry out causal inference using appropriate assumptions and algorithms
- develop and adapt Python code for all the above tasks
Teaching methods
- Face-to-face lectures
- Exercises (exercises, database, software etc.)
- Case studies / Incidents (traditional, online)
- Individual assignments
- Group assignments
- Interactive class activities (role playing, business game, simulation, online forum, instant polls)
DETAILS
A combination of five basic approaches:
0. Videos distributed before the course that review background knowledge in statistics, computing and Python
1. A few lectures on the foundations of the methodology
2. Most lectures are based on Jupyter notebooks, where models and algorithms are illustrated directly on data and students can interact with the code
3. Guided project sessions
4. TA sessions on more practical coding aspects
Assessment methods
Assessment categories: continuous assessment, partial exams, general exam. Three assessment components apply to this course; their weights are given below.
ATTENDING AND NOT ATTENDING STUDENTS
- 9/31 of the mark is based on exercises given at the end of each theme, which correspond to the guided project sessions.
- 13/31 of the mark is for a group project for Part B, done in groups of 4. This will take the form of a hackathon managed through the Bocconi Data Science Challenges Platform.
- 9/31 of the mark is based on an individual final exam.
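As a rough illustration of how the three weights combine, the sketch below adds up the points earned in each component. It assumes, hypothetically, that each component is scored as a fraction between 0 and 1 of its available points; the names `WEIGHTS` and `total_mark` are illustrative and are not part of the official grading rules.

```python
# Points available per component, as stated above (9 + 13 + 9 = 31).
WEIGHTS = {"exercises": 9, "group_project": 13, "final_exam": 9}


def total_mark(scores):
    """Combine per-component scores into a mark out of 31.

    Assumption (not from the syllabus): each score is a fraction in [0, 1]
    and contributes score * weight points to the total.
    """
    for name in WEIGHTS:
        if name not in scores:
            raise KeyError(f"missing score for component: {name}")
        if not 0 <= scores[name] <= 1:
            raise ValueError(f"score for {name} must lie in [0, 1]")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: full marks on exercises and exam, half on the project.
example = {"exercises": 1.0, "group_project": 0.5, "final_exam": 1.0}
print(total_mark(example))  # 9 + 6.5 + 9 = 24.5
```

The group project carries the largest single weight, so under this assumed scheme it moves the total more than either of the other two components.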
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
0. Videos distributed before the course
1. Jupyter notebooks
2. Lecture notes
Suggested references:
0. Art of Statistics
https://www.amazon.it/Art-Statistics-Learning-Data/dp/0241398630
This is an excellent book for understanding modern statistics, and it can serve as preparation before starting the course.
The following three books can be used to gain a deeper understanding of the machine learning methods we will cover:
1. Elements of Statistical Learning
https://www.amazon.it/Elements-Statistical-Learning-Inference-Prediction/dp/0387848576
2. Pattern recognition and machine learning
https://www.amazon.it/Pattern-Recognition-Machine-Learning-Christopher/dp/0387310738
3. Deep Learning
https://www.deeplearningbook.org/
Parts of the course will also be based on the forthcoming book:
4. Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making
5. There will be references to certain articles. The following four are particularly relevant for the aims of this course:
+ Statistical Modeling: The Two Cultures (2001) by Leo Breiman
+ Prediction, Estimation and Attribution (2020) by Brad Efron
+ Statistics in the big data era: Failures of the machine (2018) by David Dunson
+ 50 years of Data Science (2017) by David Donoho