READ THE DAMN DOCUMENTATION (CAREFULLY). A case study using the PISA data.
5 Nov 2025
16:00-17:30, Lecture Theatre, Nuffield College
- Sociology Seminar
UCL
Abstract: When you get access to a new dataset, do you always carefully read the documentation first? We all know you should. But – let’s be honest – it’s a lot more fun to just start playing about with the data. This can, however, be a dangerous game to play. Even when survey instruments are taken at face value, things may not be as they first appear. This paper presents a case study of this matter using the OECD’s Programme for International Student Assessment (PISA). A survey question included in this study attempts to measure student truancy across countries over time. The international survey documentation suggests an identical question has been used across cycles. Yet the national documentation illustrates how a subtle – yet important – change to the wording was made in some countries in 2015. We demonstrate that researchers could easily miss this change, and show how doing so would lead to substantially different conclusions regarding the effect of the COVID-19 pandemic on levels of school truancy. Attempts to use artificial intelligence and large language models to spot this problem resulted in overconfident, incorrect advice. The findings thus serve as a reminder to even the most experienced data analysts (including ourselves) – ALWAYS READ THE SURVEY DOCUMENTATION CAREFULLY.
The Sociology Seminar Series for Trinity Term is convened by Jan O Jonsson, Ridhi Kashyap, Colin Mills and Christiaan Monden. For more information about this or any of the seminars in the series, please contact sociology.secretary@nuffield.ox.ac.uk.