# 20570 - DATA ANALYTICS AND VISUALIZATION

EMIT
Department of Decision Sciences

Course taught in English

EMIT (8 credits - II sem. - OB  |  SECS-S/01)
Course Director:
RAFFAELLA PICCARRETA

Classes: 22 (II sem.)
Instructors:
Class 22: RAFFAELLA PICCARRETA

Course Objectives
The key aim of the course is to provide students with basic skills in multivariate data analysis. In particular, students learn techniques and methods for analyzing and summarizing (relatively) large data sets and for visualizing the most relevant tendencies in the data. All methods are taught through hands-on classes, during which students analyze a number of databases relevant to their studies using the SAS software. Some lessons are specifically dedicated to analyzing and discussing the results obtained by applying the techniques considered.

Course Content Summary
Introduction: data matrices.
• Multivariate samples. Summary statistics for multivariate samples. Geometric interpretation of data matrices. Space of the cases and distances. Total and generalized variance and their geometric interpretation.
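
As an illustration of the summary statistics listed above, the following minimal sketch computes the sample covariance matrix of a small two-variable data matrix, together with its total variance (the trace) and generalized variance (the determinant). It is written in Python rather than in the SAS software used in class, and the data values and function names are hypothetical, chosen only for the example.

```python
from statistics import mean

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a data matrix given as a list of rows."""
    n = len(rows)
    p = len(rows[0])
    means = [mean(row[j] for row in rows) for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [
        [sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
        for j in range(p)
    ]

def total_variance(S):
    """Total variance: sum of the variances, i.e. the trace of S."""
    return sum(S[j][j] for j in range(len(S)))

def generalized_variance_2x2(S):
    """Generalized variance: determinant of S (written out for the 2 x 2 case only)."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

# Hypothetical data matrix: 4 cases (rows) measured on 2 variables (columns).
data = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0]]

S = covariance_matrix(data)
tot = total_variance(S)
gen = generalized_variance_2x2(S)
print("Covariance matrix:", S)
print("Total variance:", tot)
print("Generalized variance:", gen)
```

Geometrically, the total variance measures the overall spread of the cases around their centroid, while the generalized variance shrinks toward zero as the variables become more strongly correlated.
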
Factorial Techniques.
• Principal component analysis (PCA). PC transformation. Properties of PCs and their interpretation. Evaluation of results and graphical representations.
• Factor analysis (FA). The factor model: definition and assumptions. Parameter estimates: the principal component and the principal factor methods. Interpretation of factors: factor rotation. Factor scores and factorial maps.
• Simple correspondence analysis (SCA). Association between categorical variables. Profiles and Chi-square metric. Factors and their interpretation. Graphical representation and analysis of results.
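
As an illustration of the PC transformation mentioned above, the following minimal sketch carries out a principal component analysis of two variables by hand, using the closed-form eigenvalues of a 2 x 2 covariance matrix. It is written in Python rather than in the SAS software used in class; the data and the function name `pca_2d` are hypothetical and serve only the example.

```python
import math

def pca_2d(xs, ys):
    """PCA of two variables based on their sample covariance matrix.

    Returns the two eigenvalues (variances of the PCs), the unit-length
    loadings of the first PC, and the scores of the cases on the first PC.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector associated with the largest eigenvalue.
    if sxy != 0:
        v = (sxy, lam1 - sxx)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    loadings = (v[0] / norm, v[1] / norm)

    # Scores: projections of the centered cases on the first PC.
    scores = [(x - mx) * loadings[0] + (y - my) * loadings[1] for x, y in zip(xs, ys)]
    return lam1, lam2, loadings, scores

# Hypothetical data: 4 cases measured on 2 variables.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 6.0]

lam1, lam2, loadings, scores = pca_2d(xs, ys)
explained = lam1 / (lam1 + lam2)
print("Eigenvalues:", lam1, lam2)
print("Loadings of PC1:", loadings)
print("Share of total variance explained by PC1:", explained)
```

The eigenvalues sum to the total variance of the original variables, so the ratio printed last is the usual measure for evaluating how well the first component summarizes the data.
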
Dissimilarity matrices and clustering.
• Cluster analysis (CA). Distance and dissimilarity matrices. Hierarchical and partitioning clustering methods. Choice of the number of clusters. Criteria for the evaluation of a partition.
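
As an illustration of a hierarchical method, the following minimal sketch performs agglomerative clustering with single linkage on Euclidean distances, merging the two closest clusters until a chosen number of clusters remains. It is written in Python rather than in the SAS software used in class; the points, the function name `single_linkage`, and the choice of single linkage are assumptions made only for the example.

```python
import math

def single_linkage(points, k):
    """Agglomerative clustering with single linkage, stopped at k clusters.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical cases in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (9, 9)]

clusters = single_linkage(points, k=2)
print(clusters)
```

Other linkage rules (complete, average, Ward) differ only in how the distance between two clusters is defined, which is the single line inside `linkage` in this sketch.
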

Detailed Description of Assessment Methods
For attending students
The final grade for attending students is based on a practical and a theoretical exam, exactly as described for non-attending students. However, attending students can take the theoretical exam as two partial exams. The first partial exam covers PCA and FA; the second covers CA and SCA.

A student is considered to be attending if:
• He/she attended at least 4 of the lab sessions dedicated to PCA and FA (with at least one lab for each technique) and at least 4 of the lab sessions dedicated to CA and SCA (with at least one lab for each technique).

For non-attending students
The final grade is based on:
• A practical exam: analysis of a real data set (PC-lab session).
• A theoretical exam: a written exam on the methodological issues discussed during the course, possibly including comments on software output.
The practical and theoretical exams must be taken in the same session.

Textbooks
• The course slides are made available on Blackboard.
For details and a more in-depth description of the techniques covered in the course, students can refer to one of the following texts:
• R.A. JOHNSON, D.W. WICHERN, Applied Multivariate Statistical Analysis, Prentice Hall, 2002, 5th edition.
or:
• J. LATTIN, J.D. CARROLL, P.E. GREEN, Analyzing Multivariate Data, Thomson, 2003.

Prerequisites
• Basic notions of statistics: descriptive statistics (univariate and bivariate) and the most relevant inferential concepts (samples, statistics, estimators, hypothesis testing, p-values).
• Students are expected to be able to work with Excel and Word (basic skills).
Last change 05/06/2017 12:10