30401  MATHEMATICS AND STATISTICS  MODULE 2 (STATISTICS)
Department of Decision Sciences
OMIROS PAPASPILIOPOULOS
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
The course is organized in themes. Each theme starts with an overview, introduces some motivating data and associated scientific questions, and then develops the statistical tools (models, algorithms, mathematical concepts) needed to gather knowledge from the data and address the motivating questions. Each theme finishes with a summary and exercises.
The themes are:
1. Data visualization and summarization
Data: heart attack study, Shipman's dead patients, daily homicides, test results, jelly beans competition
Concepts: barplots, box plots, means and medians and variational formulation, logarithmic scale, correlation and distance correlation
2. From randomization to randomness
Data: chocolates and Nobel prizes, university admission data, death penalty data
Concepts: spurious correlations, experimental vs observational data, random numbers, randomized controlled trials, confounders, Simpson's paradox
3. What is probability and what is it useful for
Concepts: Bernoulli distribution, probability densities, Poisson distribution, series and limits, learning a model from the data
4. The calculus of probability
Concepts: events, basic set theory, axioms of probability
5. More models for more data
Data: birth weights, human heights, heart transplant survival data
Concepts: density functions, Gaussian distribution, survival analysis, exponential distribution, censoring, gamma distribution and special functions, uniform distribution, transformation of variables, simulation of random variables
6. Joint distributions, independence and combinatorics
Data: 10 year maturity bonds, heights of fathers and sons, the Sally Clark story
Concepts: joint and marginal distributions, independence, statistical arguments in Law, the binomial distribution, binomial coefficients, basic combinatorics
7. Expectation
Concepts: expected value and interpretation, properties of expectation, moments, variance, standard deviation and interpretation, the uncertainty rule of thumb, skewness and interpretation, sample and population moments
8. Elements of Network Science
Data: the Internet, employees communication network, the actor network
Concepts: Erdős-Rényi network model, degree distributions, six degrees of separation, heavy tails, scale-free property, power laws, the Student-t distribution
9. Concentration, inequalities and limit theorems
Concepts: Markov inequality, Chebyshev inequality, uncertainty quantification, weak law of large numbers, a basic understanding of the central limit theorem
10. Statistical learning
Data: cholesterol and heart disease, arm-folding and sex, bowel cancer rates in the UK
Concepts: quantifying evidence in data about a hypothesis, p-value, Fisher exact test, multiple testing, confidence intervals from concentration inequalities, bootstrap and confidence intervals, funnel plots
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
+ formulate statistical learning questions
+ identify appropriate data analysis methodologies
+ carry out uncertainty quantification
+ learn basic models from data
APPLYING KNOWLEDGE AND UNDERSTANDING
+ choose appropriate data summaries and visualization
+ carry out basic network analysis
+ derive basic probability calculations
+ use statistical learning tools
Teaching methods
 Face-to-face lectures
 Exercises (exercises, database, software etc.)
 Group assignments
 Interactive class activities (role playing, business game, simulation, online forum, instant polls)
DETAILS
Exercises (Exercises, database, software etc.):
Special sessions will be provided with exercises, examples and illustrations of concepts and methods, some of them using the statistical software R.
Group assignments:
Students will be given a project to work on in groups; it will involve both methodology and data analysis.
Assessment methods
Continuous assessment  Partial exams  General exam  


x  x  

x 
ATTENDING AND NOT ATTENDING STUDENTS
Students may choose between the following two options:
 Two partial written exams (a midterm and a final) that contribute to the final grade with a 50% weight each.
 A single general written exam (after the end of the course) that counts for 100% of the final mark.
The tests consist of exercises. They aim at ascertaining students' mastery of concepts and results discussed during lectures as well as an adequate knowledge of R.
In each test the maximum grade is 31.
The assessment method is the same for both attending and non-attending students.
Students who take the midterm exam may still take the general exam instead of taking the final exam.
Importantly, access to the final (or second partial) exam follows the rules indicated in Section 7.6 of the Guide to the University.
There will be an optional group project worth a maximum of 1.5 points out of 31. These points will be added to the total mark achieved through either of the two options above.
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
The teaching material will primarily be that developed during the classes and distributed to the students in PDF format after each class.
The course will use examples and extracts primarily from the first book listed below. It is advisable to acquire this book either in its original publication or its Italian translation (it is also available as an ebook), since it is an excellent modern resource to learn Probability and Statistics and why these are fundamental in anything that has to do with learning from data.
Early chapters from the second book provide an excellent, more technical introduction to Probability. The introduction and some Appendices of the third book provide an excellent and accessible introduction to statistical machine learning and the use of Probability and Statistics for designing and analyzing algorithms. The fourth is a textbook whose syllabus correlates highly with the contents of this course. For a number of basic concepts the corresponding Wikipedia pages are a great resource. Please use those instead of random blogs, web pages or videos posted on YouTube.

 Spiegelhalter, The Art of Statistics: How to Learn from Data, Penguin, 2019, ISBN 9781541618510 (available also in Italian translation)
 Grimmett and Stirzaker, Probability and Random Processes, Oxford, Fourth Edition, 2020, ISBN 9780198847595
 Bishop, Pattern Recognition and Machine Learning, Springer, 2006, ISBN 9780387310732
 Ross, Introduction to Probability and Statistics for Engineers and Scientists, Fourth Edition, Academic Press, 2014
 Barabasi, Network Science