20840 - DATA MINING FOR MARKETING, BUSINESS, AND SOCIETY
Department of Marketing
Course taught in English
QIAONI SHI
Suggested background knowledge
Mission & Content Summary
MISSION
CONTENT SUMMARY
- Foundations: law, ethics, and tools for web data
- Structured web data: XML, JSON and basic formats
- Web protocols, static pages, and HTML scraping
- Archived and dynamic web pages
- Documents and text from PDFs and Wikipedia
- APIs for economic and social data
- Automation and reproducible web data workflows
- AI & LLMs for market research
Intended Learning Outcomes (ILO)
KNOWLEDGE AND UNDERSTANDING
By the end of the course, students are expected to:
- Understand the legal and ethical principles that govern web data collection and reuse for research and marketing applications.
- Explain how web protocols (HTTP, DNS, URLs) and data formats (HTML, XML, JSON, PDFs) underpin modern web data pipelines.
- Describe the role of web scraping, APIs, and archives in constructing datasets for market research and economic analysis.
- Understand the basic principles behind large language models and AI APIs used in the course, including prompts, roles, and structured outputs.
- Explain how LLMs can be integrated into market research workflows (e.g. sentiment analysis, text classification, survey coding, document comparison).
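As a purely illustrative sketch of the first two points (not course material), the standard library alone shows how the formats listed above map to Python structures: JSON decodes directly into dicts and lists, while HTML must be walked to extract the text of interest. The payload and markup below are invented examples.

```python
# Illustrative sketch: how JSON and HTML map to Python structures
# using only the standard library. Data here is made up.
import json
from html.parser import HTMLParser

# JSON: a typical API payload decodes directly into dicts and lists.
payload = '{"product": "espresso machine", "reviews": [{"stars": 5}, {"stars": 3}]}'
data = json.loads(payload)
avg_stars = sum(r["stars"] for r in data["reviews"]) / len(data["reviews"])

# HTML: scraping means walking the markup and keeping the text you need.
class TitleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, text):
        if self.in_title:
            self.titles.append(text.strip())

page = "<html><body><h2>Best sellers</h2><p>...</p><h2>New arrivals</h2></body></html>"
extractor = TitleExtractor()
extractor.feed(page)

print(avg_stars)          # 4.0
print(extractor.titles)   # ['Best sellers', 'New arrivals']
```

In practice the course uses richer tooling, but the underlying idea is the same: structured formats parse into native data structures, and markup is traversed to pull out the relevant fields.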
APPLYING KNOWLEDGE AND UNDERSTANDING
Upon successful completion, students will be able to:
- Use Python in Jupyter notebooks to collect data from static and dynamic web pages, archives, PDFs, and APIs, and convert it into structured formats suitable for analysis.
- Design and implement scraping and API-calling routines that respect ethical and legal constraints, including robots.txt and rate limits.
- Build reproducible pipelines that automate web data collection and basic cleaning steps for ongoing monitoring tasks.
- Apply AI and LLM APIs to real-world text datasets (e.g. reviews, social media posts, open-ended responses) to:
- classify sentiment and topics,
- extract entities and key phrases,
- compare documents using embeddings,
- generate structured outputs (e.g. coded categories for market research).
- Develop small, end-to-end projects that connect web data collection to LLM-based analysis in order to answer concrete marketing or economic research questions, and communicate the results in a clear and reproducible way.
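One concrete piece of the "ethical and legal constraints" point can be sketched with the standard library's `urllib.robotparser`: before fetching anything, a scraper checks `robots.txt` and honors the site's crawl delay. The `robots.txt` content and URLs below are invented for illustration; real pipelines would download the file from the target site.

```python
# Hedged sketch of an ethics-aware scraping routine: consult robots.txt
# and read the crawl delay before planning any requests.
# The robots.txt text and URLs are hypothetical.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def polite_fetch_plan(urls, user_agent="course-bot"):
    """Return only the URLs robots.txt allows this user agent to fetch."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]

urls = [
    "https://example.com/products",
    "https://example.com/private/accounts",
]
print(polite_fetch_plan(urls))          # ['https://example.com/products']
print(parser.crawl_delay("course-bot"))  # 2 — seconds to wait between requests
```

A real routine would then sleep for the reported crawl delay between requests, which is the rate-limiting behavior the course treats as part of responsible data collection.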
Teaching methods
- Lectures
- Practical exercises
- Individual works / assignments
- Collaborative works / assignments
DETAILS
For each topic in the course, we combine short lectures with hands-on exercises in Jupyter notebooks. Students work directly with web data, APIs, and LLM services to implement the techniques covered in class. In the final part of the course, students develop a small project that connects web data collection with LLM-based analysis for a marketing or business application.
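The shape of such a final project can be sketched in a few lines: collected review text flows into an LLM-based classification step that returns structured output. Here `fake_llm_classify` is a stand-in for a real LLM API call (the actual service, model, and prompt design are covered in class); the reviews are invented.

```python
# Purely illustrative stub of the project pipeline: scraped text in,
# structured labels out. fake_llm_classify stands in for a real LLM
# API call returning JSON; its keyword-matching logic is a placeholder.
import json

def fake_llm_classify(review: str) -> str:
    """Stand-in for an LLM call that returns a structured JSON label."""
    label = "positive" if "love" in review.lower() else "negative"
    return json.dumps({"sentiment": label})

reviews = ["I love this blender!", "Broke after two days."]
labels = [json.loads(fake_llm_classify(r))["sentiment"] for r in reviews]
print(labels)  # ['positive', 'negative']
```

The point of the stub is the interface, not the model: requesting JSON from the LLM and parsing it into Python structures is what makes the analysis step reproducible downstream.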
Assessment methods
|  | Continuous assessment | Partial exams | General exam |
|---|---|---|---|
| Participation | x |  |  |
| Assignments | x |  |  |
| Written individual exam |  |  | x |
ATTENDING STUDENTS
- Participation (20%)
  Engagement in class activities, contribution to discussions, and completion of in-class exercises involving web data, APIs, and LLMs.
- Assignments (30%)
  Multiple assignments designed to practice web data collection and LLM techniques. These may include short reports or notebooks on scraping, API usage, text extraction, and LLM-based analysis of marketing-relevant data.
- Final Exam (50%)
Written individual exam assessing both conceptual understanding (ethics, web protocols, data formats, basic LLM concepts) and practical skills (interpreting code snippets, reasoning about scraping/API requests, understanding LLM-based workflows).
Attendance will be registered at the beginning of each session. To obtain attending-student status, students must attend at least 75% of the sessions.
NOT ATTENDING STUDENTS
Written individual exam covering the full set of topics, including web data collection, APIs, automation, and the application of LLMs to market research data.
Teaching materials
ATTENDING AND NOT ATTENDING STUDENTS
- Jupyter notebooks and datasets provided by the instructor (web data collection, APIs, LLM examples).
- Course slides and additional readings posted on the learning platform (Blackboard / BBoard).
- Documentation and guides for selected APIs and AI services (links provided in class).
All essential teaching materials will be made available on the course website / Blackboard.