Leveraging LLM respondents for item evaluation: A psychometric analysis

Article Status
Published
Authors/contributors
Liu, Y., Bhandari, S., & Pardos, Z. A.
Title
Leveraging LLM respondents for item evaluation: A psychometric analysis
Abstract
Effective educational measurement relies heavily on the curation of well‐designed item pools. However, item calibration is time-consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT‐3.5, GPT‐4, Llama 2, Llama 3, Gemini‐Pro and Cohere Command R Plus) to generate responses with psychometric properties comparable to those of human respondents. Results indicate that some LLMs exhibit proficiency in College Algebra that is similar to or exceeds that of college students. However, we find the LLMs used in this study to have narrow proficiency distributions, limiting their ability to fully mimic the variability observed in human respondents, but that an ensemble of LLMs can better approximate the broader ability distribution typical of college students. Using item response theory, the item parameters calibrated from LLM respondents correlate highly (e.g., >0.8 for GPT‐3.5) with their human‐calibrated counterparts. Several augmentation strategies are evaluated for their relative performance, with resampling methods proving most effective, enhancing the Spearman correlation from 0.89 (human only) to 0.93 (augmented human).
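The comparison described in the abstract can be illustrated with a minimal sketch, not the authors' code: it assumes binary-scored response matrices, uses proportion-correct as a stand-in for IRT-calibrated item difficulty (the paper itself calibrates items with item response theory), and generates synthetic data; the resampling step only gestures at the augmentation strategies the study evaluates.

```python
# Illustrative sketch only (synthetic data, simplified difficulty proxy).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical response matrices: rows = respondents, columns = items.
human = rng.integers(0, 2, size=(200, 30))   # human respondents
llm = rng.integers(0, 2, size=(60, 30))      # pooled LLM respondents

# Resampling-style augmentation: draw LLM response vectors with replacement
# and append them to the human response pool.
idx = rng.choice(llm.shape[0], size=100, replace=True)
augmented = np.vstack([human, llm[idx]])

# Crude item "difficulty" proxy: 1 minus the proportion correct per item.
diff_human = 1 - human.mean(axis=0)
diff_augmented = 1 - augmented.mean(axis=0)

# Rank agreement between the two calibrations (the paper reports Spearman rho).
rho, _ = spearmanr(diff_human, diff_augmented)
print(f"Spearman rho between calibrations: {rho:.2f}")
```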
Publication
British Journal of Educational Technology
Volume
56
Issue
3
Pages
1028-1052
Date
2025-02-24
Journal Abbr
Brit. J. Educ. Technol.
Language
en
ISSN
0007-1013
Short Title
Leveraging LLM respondents for item evaluation
Accessed
09/07/2025, 15:31
Library Catalogue
DOI.org (Crossref)
Extra
Citation Key: liu2025. Title: Leveraging LLM respondents for item evaluation: A psychometric analysis
Citation
Liu, Y., Bhandari, S., & Pardos, Z. A. (2025). Leveraging LLM respondents for item evaluation: A psychometric analysis. British Journal of Educational Technology, 56(3), 1028–1052. https://doi.org/10.1111/bjet.13570