Course 2024-2025 a.y.

20592 - STATISTICS AND PROBABILITY

Department of Decision Sciences

Course taught in English

DSBA (8 credits - I sem. - OB  |  2 credits MAT/06  |  6 credits SECS-S/01)
Course Director:
SONIA PETRONE

Classes: 23 (I sem.)
Instructors:
Class 23: SONIA PETRONE


Mission & Content Summary

MISSION

A solid background in probability and statistics is an absolute MUST for a data scientist, whatever field she or he chooses to work in. This course aims to provide such a solid methodological background. We start with a recap of fundamental notions of probability theory and stochastic processes (in particular, Markov chains), presented in a friendly but rigorous way. The core of the course then covers classical statistical inference, both parametric and nonparametric, consolidating the methodological basis of maximum likelihood estimation, confidence intervals and tests; we end with an introduction to Bayesian learning. All of this is treated in a modern way, keeping in mind the big "explain or predict" debate, that is, roughly speaking, the shift from "classical statistics" towards "modern statistics" and machine learning. The course is integrated with a "Computations and Applications" module that presents computational tools (stochastic integration and Monte Carlo, optimization, bootstrap, EM, Markov chain Monte Carlo) and applications to real data, particularly in regression contexts (lm and glm) and/or multiple testing, using R and Python. Teaching combines lectures, group work with periodic take-home assignments, coding and simulations, and real data analysis.

CONTENT SUMMARY

 

PART I: Probability recap (Prof. Sandra Fortini)

  • Probability: definition and basic properties
  • Random variables. Multivariate distributions
  • Expectation and conditional expectation.
  • Convergence of random variables.
  • Basic notions of stochastic processes. Random noise. Random walks. Markov chains. (Note: this topic may be postponed until inference on Markov chains is covered.)

 

PART II: Statistical inference (Prof. Sonia Petrone)

Models, Statistical Inference and Learning

Elements of nonparametric estimation.

The bootstrap.

Parametric Inference

MLE and asymptotics

Confidence intervals

Hypothesis testing and p-values

Regression.

 

+ Exercises

+ Computations and Applications (part I), with R and Python (Prof. Yichen Zhu)

Computations: Stochastic integration and Monte Carlo.

Optimization. EM algorithm.

Parametric bootstrap (if time permits)

Applications: Regression and generalized linear models.

Multiple testing: an application in genomics (if time permits)

 

PART III: Bayesian learning (Prof. Sonia Petrone)

  • Fundamentals of Bayesian learning
  • Bayes rule and examples.
  • Bayesian linear regression (if time permits)

 

PART IV: Computational methods (Prof. Yichen Zhu)

Bayesian regression

Markov Chain Monte Carlo.

Applications


Intended Learning Outcomes (ILO)

KNOWLEDGE AND UNDERSTANDING

At the end of the course, students will be able to...
  • Define, describe and explain rigorously the main notions of probability and statistical learning in the frequentist and Bayesian approaches.
  • Identify computational strategies for fundamental complex problems.
  • Recognize the role of probability and statistics in "data science" and related fields.

 

APPLYING KNOWLEDGE AND UNDERSTANDING

At the end of the course, students will be able to...
  • Estimate and predict, and quantify uncertainty, in fundamental problems
  • Acquire a solid methodological background in probability and statistics on which to build solid competence in data science
  • Carry out informed statistical analyses in basic applications (with R)
  • Write algorithms in Python that implement computational statistics techniques, namely optimization and integration techniques.

Teaching methods

  • Lectures
  • Practical Exercises
  • Individual works / Assignments
  • Collaborative Works / Assignments

DETAILS

Students will be given periodic group or individual assignments on the theory and applications (with R) and on the implementation of computational methods (with Python).


Assessment methods

  • Written individual exam (traditional/online): general exam
  • Collaborative works / assignments (report, exercise, presentation, project work, etc.): continuous assessment

ATTENDING AND NOT ATTENDING STUDENTS

ASSIGNMENTS:

Students will be given periodic take-home assignments on the theory and computational methods presented in class.

The assignments can be done individually or in groups of up to 4 people (exceptionally, 5 people upon a motivated request).

 

The assignments are meant to engage students and help them follow the lectures and check their understanding as the course progresses; students usually find them very helpful!

Accordingly, the assignments are not mandatory and are not formally graded; however, students' work on the assignments is taken into account in the final exam:


*** Students who did not submit the assignments will have additional questions in the written test, with up to 10 minutes of extra time. These questions do not affect the final grade if they are answered reasonably well, but they will lower the final grade if answered poorly.

 

*** Students who submitted the assignments will not have to answer those additional questions.

 

EXAM:

The exam consists of an individual written test, which counts for 70% of the final grade, and a final project on computational methods, which counts for 30%.

 

NOTE 1: The final project is done in groups, while the written test is individual. Therefore, if the written test is poorly done, it may count for 100% of the final grade.

 

NOTE 2: The exam structure might be slightly modified to accommodate unforeseen issues (as happened with the COVID-19 pandemic), taking into account the students' needs. In that case, students will be promptly informed through BBoard announcements and other channels.

 

NOTE 3: The assignments contribute to the achievement of all the learning objectives of the course. In particular, they help students to:
  • define, describe and explain rigorously the main notions of probability and statistical learning in the frequentist and Bayesian approaches, which is the necessary basis for being able to estimate and predict, and quantify uncertainty, in fundamental problems, and to recognize the role of probability and statistics in "data science" and related fields;
  • identify computational strategies for fundamental complex problems and implement the corresponding statistical techniques by writing algorithms in Python.


The EXAM aims to assess all the learning objectives. In particular:
  • the written test assesses the ILOs "define, describe and explain rigorously the main notions of probability and statistical learning in the frequentist and Bayesian approaches", "estimate and predict, and quantify uncertainty" and "recognize the role of probability and statistics in data science";
  • the final project assesses the ILOs "identify computational strategies for fundamental complex problems" and "write algorithms in Python that implement computational statistics techniques".



Teaching materials


ATTENDING AND NOT ATTENDING STUDENTS

TEXTBOOK

  • L. Wasserman, "All of Statistics", 2009, Springer.

 

Additional teaching material, including lecture notes and R and Python code, will be provided on BBoard.

Last change 26/05/2024 13:16