20840 - DATA MINING FOR MARKETING, BUSINESS, AND SOCIETY
Department of Marketing
Course taught in English
QIAONI SHI
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
- Foundations: law, ethics, and tools for web data
- Structured web data: XML, JSON and basic formats
- Web protocols, static pages, and HTML scraping
- Archived and dynamic web pages
- Documents and text from PDFs and Wikipedia
- APIs for economic and social data
- Automation and reproducible web data workflows
- AI & LLMs for market research
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
By the end of the course, students are expected to:
- Understand the legal and ethical principles that govern web data collection and reuse for research and marketing applications.
- Explain how web protocols (HTTP, DNS, URLs) and data formats (HTML, XML, JSON, PDFs) underpin modern web data pipelines.
- Describe the role of web scraping, APIs, and archives in constructing datasets for market research and economic analysis.
- Understand the basic principles behind large language models and AI APIs used in the course, including prompts, roles, and structured outputs.
- Explain how LLMs can be integrated into market research workflows (e.g. sentiment analysis, text classification, survey coding, document comparison).
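As a purely illustrative sketch of the first two points (not course material), the standard library alone shows how the formats listed above map to Python structures: JSON decodes directly into dicts and lists, while HTML must be walked to extract the text of interest. The payload and markup below are invented examples.

```python
# Illustrative sketch: how JSON and HTML map to Python structures
# using only the standard library. Data here is made up.
import json
from html.parser import HTMLParser

# JSON: a typical API payload decodes directly into dicts and lists.
payload = '{"product": "espresso machine", "reviews": [{"stars": 5}, {"stars": 3}]}'
data = json.loads(payload)
avg_stars = sum(r["stars"] for r in data["reviews"]) / len(data["reviews"])

# HTML: scraping means walking the markup and keeping the text you need.
class TitleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, text):
        if self.in_title:
            self.titles.append(text.strip())

page = "<html><body><h2>Best sellers</h2><p>...</p><h2>New arrivals</h2></body></html>"
extractor = TitleExtractor()
extractor.feed(page)

print(avg_stars)          # 4.0
print(extractor.titles)   # ['Best sellers', 'New arrivals']
```

In practice the course uses richer tooling, but the underlying idea is the same: structured formats parse into native data structures, and markup is traversed to pull out the relevant fields.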
APPLYING KNOWLEDGE AND UNDERSTANDING
Upon successful completion, students will be able to:
- Use Python in Jupyter notebooks to collect data from static and dynamic web pages, archives, PDFs, and APIs, and convert it into structured formats suitable for analysis.
- Design and implement scraping and API-calling routines that respect ethical and legal constraints, including robots.txt and rate limits.
- Build reproducible pipelines that automate web data collection and basic cleaning steps for ongoing monitoring tasks.
- Apply AI and LLM APIs to real-world text datasets (e.g. reviews, social media posts, open-ended responses) to:
- classify sentiment and topics,
- extract entities and key phrases,
- compare documents using embeddings,
- generate structured outputs (e.g. coded categories for market research).
- Develop small, end-to-end projects that connect web data collection to LLM-based analysis in order to answer concrete marketing or economic research questions, and communicate the results in a clear and reproducible way.
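One concrete piece of the "ethical and legal constraints" point can be sketched with the standard library's `urllib.robotparser`: before fetching anything, a scraper checks `robots.txt` and honors the site's crawl delay. The `robots.txt` content and URLs below are invented for illustration; real pipelines would download the file from the target site.

```python
# Hedged sketch of an ethics-aware scraping routine: consult robots.txt
# and read the crawl delay before planning any requests.
# The robots.txt text and URLs are hypothetical.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def polite_fetch_plan(urls, user_agent="course-bot"):
    """Return only the URLs robots.txt allows this user agent to fetch."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]

urls = [
    "https://example.com/products",
    "https://example.com/private/accounts",
]
print(polite_fetch_plan(urls))          # ['https://example.com/products']
print(parser.crawl_delay("course-bot"))  # 2 — seconds to wait between requests
```

A real routine would then sleep for the reported crawl delay between requests, which is the rate-limiting behavior the course treats as part of responsible data collection.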
Teaching methods
- Lectures
- Practical exercises
- Individual works / assignments
- Collaborative works / assignments
DETAILS
For each topic in the course, we combine short lectures with hands-on exercises in Jupyter notebooks. Students work directly with web data, APIs, and LLM services to implement the techniques covered in class. In the final part of the course, students develop a small project that connects web data collection with LLM-based analysis for a marketing or business application.
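The shape of such a final project can be sketched in a few lines: collected review text flows into an LLM-based classification step that returns structured output. Here `fake_llm_classify` is a stand-in for a real LLM API call (the actual service, model, and prompt design are covered in class); the reviews are invented.

```python
# Purely illustrative stub of the project pipeline: scraped text in,
# structured labels out. fake_llm_classify stands in for a real LLM
# API call returning JSON; its keyword-matching logic is a placeholder.
import json

def fake_llm_classify(review: str) -> str:
    """Stand-in for an LLM call that returns a structured JSON label."""
    label = "positive" if "love" in review.lower() else "negative"
    return json.dumps({"sentiment": label})

reviews = ["I love this blender!", "Broke after two days."]
labels = [json.loads(fake_llm_classify(r))["sentiment"] for r in reviews]
print(labels)  # ['positive', 'negative']
```

The point of the stub is the interface, not the model: requesting JSON from the LLM and parsing it into Python structures is what makes the analysis step reproducible downstream.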
Assessment methods
|  | Continuous assessment | Partial exams | General exam |
|---|---|---|---|
| Participation | x |  |  |
| Assignments | x |  |  |
| Written individual exam |  |  | x |
ATTENDING STUDENTS
- Participation (20%)
  Engagement in class activities, contribution to discussions, and completion of in-class exercises involving web data, APIs, and LLMs.
- Assignments (30%)
  Multiple assignments designed to practice web data collection and LLM techniques. These may include short reports or notebooks on scraping, API usage, text extraction, and LLM-based analysis of marketing-relevant data.
- Final Exam (50%)
Written individual exam assessing both conceptual understanding (ethics, web protocols, data formats, basic LLM concepts) and practical skills (interpreting code snippets, reasoning about scraping/API requests, understanding LLM-based workflows).
Attendance will be registered at the beginning of each session. To obtain attending-student status, students must attend at least 75% of the sessions.
NOT ATTENDING STUDENTS
Written individual exam covering the full set of topics, including web data collection, APIs, automation, and the application of LLMs to market research data.
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
- Jupyter notebooks and datasets provided by the instructor (web data collection, APIs, LLM examples).
- Course slides and additional readings posted on the learning platform (Blackboard / BBoard).
- Documentation and guides for selected APIs and AI services (links provided in class).
All essential teaching materials will be made available on the course website / Blackboard.