Leveraging LLM respondents for item evaluation: A psychometric analysis
Article Status
    Published
Authors/contributors
    - Liu, Yunting (Author)
    - Bhandari, Shreya (Author)
    - Pardos, Zachary A. (Author)
Title
    Leveraging LLM respondents for item evaluation: A psychometric analysis
Abstract
    Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time-consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4, Llama 2, Llama 3, Gemini-Pro and Cohere Command R Plus) to generate responses with psychometric properties comparable to those of human respondents. Results indicate that some LLMs exhibit proficiency in College Algebra that is similar to or exceeds that of college students. However, we find that the LLMs used in this study have narrow proficiency distributions, which limits their ability to fully mimic the variability observed in human respondents; an ensemble of LLMs can better approximate the broader ability distribution typical of college students. Under item response theory, the item parameters calibrated by LLM respondents correlate highly (e.g., >0.8 for GPT-3.5) with their human-calibrated counterparts. Several augmentation strategies are evaluated for their relative performance, with resampling methods proving most effective, enhancing the Spearman correlation from 0.89 (human only) to 0.93 (augmented human).
Publication
    British Journal of Educational Technology
Volume
    56
Issue
    3
Pages
    1028-1052
Date
    2025-02-24
Journal Abbr
    Brit. J. Educ. Technol.
Language
    en
ISSN
    0007-1013
Short Title
    Leveraging LLM respondents for item evaluation
Accessed
    09/07/2025, 15:31
Library Catalogue
    DOI.org (Crossref)
Extra
    Citation Key: liu2025
    Title (translated from the Chinese note): Leveraging LLM respondents for item evaluation: A psychometric analysis
Citation
    Liu, Y., Bhandari, S., & Pardos, Z. A. (2025). Leveraging LLM respondents for item evaluation: A psychometric analysis. British Journal of Educational Technology, 56(3), 1028–1052. https://doi.org/10.1111/bjet.13570