30001 - STATISTICA / STATISTICS
For the instruction language of the course see class group/s below
Class group/s taught in English
To benefit fully from the course, students are strongly advised to have a basic understanding of the concepts of probability theory and random variables. These topics are covered in chapters 3, 4 and 5 of the course textbook. In particular, students should look carefully at the topics covered in sections 4.3 and 4.7.
In the last decade an unprecedented revolution has taken place in the collection and accessibility of all types of data: for example, it is estimated that 90% of the data present today was created in the last two years. The possibility of collecting such a huge mass of data does not, however, translate directly into greater knowledge of the phenomena involved; indeed, the opposite is possible. Besides the technical problems of processing huge datasets (big data), an accurate analysis of such data must take into account, for example, their different natures, their complexity and their inter-relationships. The course is therefore meant to provide the first essential theoretical and applied instruments for carrying out a rigorous statistical analysis. In particular, students learn not only to extract information from data, but also to assess the reliability of that information.
The course covers the following broad areas:
- Collection, management and summary of data using frequency distributions, graphical representations and indexes.
- Study of the relationship between two variables.
- Statistical inference and sampling variability.
- Theory of point estimation and confidence intervals.
- Hypothesis testing.
- Simple regression model and brief introduction to the multiple regression model.
- Recognize different types of data.
- Understand the difference between the tools of descriptive and inferential statistics, and identify the most suitable approach for the problem at hand.
- Recognize simple statistical models.
- Properly summarize a dataset.
- Estimate, and test hypotheses on, the unknown parameters of a population on the basis of sample data.
- Build simple statistical models, such as regression models, aimed at studying the relationships between variables of interest.
- Use the R software to find the solutions to the aforementioned problems.
- Face-to-face lectures
- Exercises (exercises, database, software etc.)
- Case studies / Incidents (traditional, online)
In addition to traditional classroom lectures, the course includes practical sessions in which the statistical software R is used to solve the problems previously illustrated. During these sessions students use their own PCs to work through several problems together with the instructor. A real-world dataset is used throughout the course, providing a complete example (with respect to the course contents) of a practical statistical analysis.
|Continuous assessment||Partial exams||General exam|
The assessment method, identical for attending and non-attending students, offers two alternatives: 1) three partial exams, or 2) a general exam.
Two of the partial exams (PE1, PE2) are traditional written exams (each graded up to 31/30). In the third (PR), taken using the R software, students conduct a short data analysis session to answer some questions. This last partial exam is worth at most 4 points, which are added to the average grade of the other two partial exams, rescaled by 27/31. The final mark is therefore: [(PE1 + PE2)/2] * (27/31) + PR.
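As an illustration only, the final-mark formula above can be sketched in a few lines of Python. The function name `final_mark` and the range checks are assumptions made for this example, and the sketch does not round the result, since the text does not state how rounding is applied.

```python
def final_mark(pe1: float, pe2: float, pr: float) -> float:
    """Final mark for the partial-exam route: [(PE1 + PE2) / 2] * (27/31) + PR.

    pe1, pe2: grades of the two written partial exams (each at most 31).
    pr: points from the R-based partial exam (at most 4).
    The range checks below are an assumption based on the stated maxima.
    """
    for name, grade in (("PE1", pe1), ("PE2", pe2)):
        if not 0 <= grade <= 31:
            raise ValueError(f"{name} must be between 0 and 31")
    if not 0 <= pr <= 4:
        raise ValueError("PR must be between 0 and 4")
    # Average the two written exams, rescale from a 31-point to a
    # 27-point scale, then add the R exam points.
    return (pe1 + pe2) / 2 * 27 / 31 + pr


# Full marks in all three partial exams give the maximum of 31/30.
print(final_mark(31, 31, 4))
```

The rescaling by 27/31 means that the two written exams contribute at most 27 points, so that together with the 4 points of the R exam the maximum remains 31/30.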
The general exam is a written exam (at most 31/30). It contains explicit questions on R code, on the working principles of the software and on the interpretation of its output. The R-related questions are worth 4 points. A total grade of 31/30 is equivalent to 30/30 cum laude.
Both forms of the exam aim at assessing:
- The ability to identify the proper methodology to solve a given problem.
- The understanding of the logic underlying a certain procedure.
- The ability to compute appropriate statistical measures with both a pocket calculator and statistical software.
- The ability to propose and implement in R a statistical model consistent with both the stated assumptions and the data at hand.
- The ability to understand the output from the software.
- P. NEWBOLD, W.L. CARLSON, B. THORNE, Statistics for Business and Economics, Pearson/Prentice Hall, 9th global edition.
- Additional document on frequency distributions, available on the Bboard platform.
- Specific material on the use of the R software is available on the Bboard platform from the beginning of the course.