Generative Large Language Models: Performance, Measurement and Biases
27 May 2025
12:30-14:00, SCR, Nuffield College
Political Science Seminars
Purdue University
Abstract: Generative large language models (LLMs) are increasingly used in the social sciences for data generation and text annotation, yet concerns remain about their biases and performance. This talk addresses these issues in two parts. First, we examine political biases in LLM output by analyzing responses to sensitive political questions across languages spoken in politically divergent societies. Focusing on OpenAI’s GPT-3.5 and GPT-4, we find that model outputs are more conservative in languages associated with conservative societies, and that GPT-4 tends to produce more left-leaning responses than GPT-3.5. Second, we evaluate LLM performance on complex annotation tasks using specialized political science texts. We propose a memory-based annotation approach, where the model retains its own prior classifications. This method significantly outperforms few-shot chain-of-thought prompting, suggesting a new direction for improving LLM-based annotation tasks.
The Political Science Seminar Series is convened by Rachel Bernhard and Tarik Abou-Chadi. For more information on this or any of the seminars in the series, please contact politics.secretary@nuffield.ox.ac.uk.