Course 2022-2023 a.y.

20570 - DATA ANALYTICS AND VISUALIZATION

Department of Decision Sciences

Course taught in English
EMIT (8 credits - II sem. - OB  |  SECS-S/01)
Course Director:
RAFFAELLA PICCARRETA

Classes: 22 (II sem.)
Instructors:
Class 22: RAFFAELLA PICCARRETA


Suggested background knowledge

For an effective learning experience, students are strongly recommended to have a basic knowledge of statistics, in particular of univariate and bivariate descriptive statistics and of the most relevant inferential concepts (samples, statistics, estimators, hypothesis testing, p-values). To this end, an online preparatory course (20354) is available, including online tests to verify the level of knowledge and understanding of the concepts used during the course. Online meetings are organized in September and at the end of January/beginning of February. Students are expected to be able to work with Excel and Word (basic skills).

Mission & Content Summary

MISSION

Modern graduates need to use data to a much greater extent than their predecessors. Data management (retrieving, filtering, or cleaning), exploratory data analysis, and appropriate data visualization are becoming more and more relevant in any field. In this course, students are introduced to problems related to the extraction of information from data collected on a relevant number of variables and cases, and gain an applied understanding of the most relevant techniques of data analytics, with specific reference to unsupervised learning. The key goal of the course is to illustrate methods useful to analyse and summarize the most salient features of data sets with respect to both the variables and the cases. The course features hands-on classes, where the application of each technique is discussed with reference to real datasets.

CONTENT SUMMARY

Data analytics is a broad term that defines the activities in the process of analysing data to draw meaningful and actionable insights. It involves a number of steps and procedures, including:

  • Data manipulation and analysis, aimed at discovering the salient patterns in data
  • Visualisation (e.g. effective presentation) of results, interpretation and communication to stakeholders, in order to drive business strategy and outcomes.

 

The course introduces exploratory techniques to efficiently analyse, summarize and visualize (relatively) large sets of data. The goal is to reduce the dimension of the data while preserving information about the most salient/distinctive features. Such simplification applies both to variables and to cases.

 

The course is articulated as follows:

  • Introduction to multivariate data
    In the first part of the course, summaries of data collected on many variables will be introduced, by extending central tendency and dispersion measures to the multivariate case.
  • Dimensionality reduction techniques
    We will introduce Principal Components Analysis and Factor Analysis, two techniques aimed at discovering low-dimensional indicators/summaries that capture some structure underlying the (possibly high-dimensional) input data.
  • Clustering techniques
    The last part of the course introduces techniques to group cases based on their similarities or differences.
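
As a minimal illustration of the first item above (extending central tendency and dispersion measures to the multivariate case), the sketch below computes a mean vector and a sample covariance matrix. It is not course material: the course itself uses R, Python is used here only for illustration, and the toy data and the function names `mean_vector` and `covariance_matrix` are assumptions made for the example.

```python
from statistics import mean


def mean_vector(rows):
    """Column-wise means: the multivariate analogue of the mean."""
    return [mean(column) for column in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1): the multivariate
    analogue of the variance. Entry [j][k] is the covariance between
    variables j and k; the diagonal holds the variances."""
    n = len(rows)
    centre = mean_vector(rows)
    # Subtract the mean vector from every case (row).
    centred = [[x - m for x, m in zip(row, centre)] for row in rows]
    p = len(centre)
    return [
        [sum(r[j] * r[k] for r in centred) / (n - 1) for k in range(p)]
        for j in range(p)
    ]


# Hypothetical toy data: 4 cases (rows) measured on 2 variables (columns).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

centre = mean_vector(data)
cov = covariance_matrix(data)
print("mean vector:", centre)
print("covariance matrix:", cov)
```

The covariance matrix is also the starting point of Principal Components Analysis, which summarises it through a small number of uncorrelated linear combinations of the original variables.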


Beyond traditional classes, the course features hands-on classes, where the statistical software R - and in particular the integrated development environment (IDE) RStudio - is used to apply the considered techniques (Principal Components Analysis, Factor Analysis, and Cluster Analysis) to real data, and to properly interpret and present results, via suitable visualisation tools.

 


Intended Learning Outcomes (ILO)

KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...
  • Identify the technique most suitable to simplify relevant information in a dataset with reference to a specific goal of analysis.
  • Recognize appropriate and inappropriate applications and approaches with reference to a specific goal of analysis.
  • Justify the adoption of a specific path of analysis and the choices made during the analysis.
  • Compare the results obtained using different approaches and evaluate the stability of results.
  • Write R scripts to analyze data.

APPLYING KNOWLEDGE AND UNDERSTANDING

At the end of the course students will be able to...
  • Design/develop a script in the R programming language that reads, manipulates, analyses and visualises data.

  • Interpret and critically analyze results, emphasizing the most relevant conclusions both from a technical and from an interpretative point of view.
  • Effectively present the output, using suitable visualization tools allowing an immediate and unbiased understanding of the most salient features in data.

Teaching methods

  • Face-to-face lectures
  • Exercises (exercises, database, software etc.)
  • Group assignments

DETAILS

The course combines different types of lessons:

  • Theory. Lessons introducing the most relevant theoretical concepts relative to each technique.
  • Theory&app. Lessons illustrating and discussing the appropriate application of the technique with reference to a specific problem and set of data. The choices left to the analyst and the possible available methods are presented. Criteria to evaluate and compare results are discussed.
  • R-labs. Lessons illustrating and discussing the scripts in the R-programming language employed to obtain the results discussed during the theory&app lessons.
  • Hands-on. Blocks of classes (3 for each technique) where teams of students work on an assignment addressing a substantive problem using data analysis. Details on the team assignments are provided in the section on Assessment methods for attending students.

Assessment methods

  Assessment type                                                            Continuous assessment   Partial exams   General exam
  Written individual exam (traditional/online)                               x                                       x
  Individual assignment (report, exercise, presentation, project work etc.)                                          x
  Group assignment (report, exercise, presentation, project work etc.)       x
  Active class participation (virtual, attendance)                           x
  Peer evaluation                                                            x

ATTENDING STUDENTS

Effective class participation includes attendance, preparation, making an active and constructive contribution to the class discussion, asking questions, making constructive comments, and having a positive attitude toward learning.

 

To be considered attending, students must participate in the activities described below.

  • During the course, there will be 3 blocks of hands-on classes, one for each of the three techniques taught during the course. For each technique, teams of students will work on an assignment concerning a substantive problem using data analysis.
    Such assignments aim at assessing the ability to design a work flow to analyse data using the software R, as well as the ability to draw substantive conclusions based on the software output.
    Students must be able and ready to contribute to their team’s assignment, both with respect to the R commands needed to perform the required analyses and with respect to the knowledge of the technique, in order to contribute both to the definition of the path of analysis and to the interpretation and critical evaluation of the obtained results. During each hands-on class, teams will answer the specific questions presented in class by writing a memorandum, to be uploaded on Bboard by the end of the class.

 

  • Each block of hands-on classes will be followed by a session where individual tests will be administered containing questions on the theoretical aspects of the considered technique, on the results obtained during the hands-on classes, and on the aspects taken into account to develop the analysis presented by the students with their team.
    Such tests aim at assessing the knowledge of the techniques introduced in the course, also with respect to the obtained output.

 

To measure the acquisition of the learning outcomes, the students’ assessment is based on three main components:

  • The team assignments will count for 30% of the final grade (9 points overall, 3 points for each block of hands-on classes). Students should be aware that a peer review process will be in place, and that critical situations reported by peers might imply a substantial reduction of the final grade.
  • The individual tests taken during the course will count for 30% of the final grade (9 points overall, 3 points for each individual test).
  • A final exam (denoted as S - scritto - on the Bocconi website) at the end of the course, counting for 40% of the final grade (12 points overall) and consisting of an in-class (lab) computer assignment.
    Students will use their own laptop to analyse a set of data using the techniques illustrated during the course, writing a script from scratch using the software R and preparing a short report with their analysis, also offering a substantive interpretation of the obtained results.
    The exam aims at assessing the individual ability to apply the techniques illustrated during the course, to coherently design a work flow to analyse data using the software R and to draw substantive conclusions on the data at hand based on the software output.

 

Important:

  • For the hands-on sessions, the class will be divided into two groups (classes scheduled on the same day but in different time slots) following the alphabetical order (group 51, first half, and group 52, second half). The same division into two groups is adopted for the course 20620 – ECONOMICS OF BUSINESS STRATEGY - MODULE II (TRANSACTIONS AND INCENTIVES). For organisational reasons, it is not possible to change group membership during the semester and students can attend only the hands-on classes dedicated to the group they are assigned to.
  • Students who skip more than one block of hands-on classes and more than one individual test cannot qualify as attending. Partial participation in team work will imply a proportional reduction of the grade on the team assignment.
  • There is no midterm exam.
  • To be admitted to the final exam it is mandatory to register for it. No exception will be made to this rule.
  • Students of the past years who already sat for the final (practical) exam and/or who participated in the team assignments in the past years cannot qualify as attending. This is in line with the rules stated in the syllabi of the past years. The same rule will apply to the students enrolled in the current academic year.

NOT ATTENDING STUDENTS

Non-attending students can take a final exam at the end of the course. This exam is organized as described for attending students but, besides the analysis of data using the software R, it includes some theoretical questions.

 

Important:

  • There is no midterm exam.
  • To be admitted to the final exam it is mandatory to register for it. No exception will be made to this rule.
  • Students of the past years who already sat for the final (practical) exam and/or who participated in the team assignments in the past years must take the exam as not attending. This is in line with the rules stated in the syllabi of the past years, and is coherent with the structure of the exam in the past years.

Teaching materials


ATTENDING AND NOT ATTENDING STUDENTS

Slides of the theoretical lessons are uploaded on Bboard. These notes are complete and cover the whole program. For more detailed discussions of the topics, students can refer to:
 

  • R.A. JOHNSON, D.W. WICHERN, Applied Multivariate Statistical Analysis, Prentice Hall, 2002, 5th ed. or subsequent editions.
     
    OR
     
  • J. LATTIN, J.D. CARROLL, P.E. GREEN, Analyzing Multivariate Data, Thomson, 2003.
Last change 14/12/2022 22:56