Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models

Article Status
Published
Authors/contributors
Morris, W., Holmes, L., Choi, J. S., & Crossley, S.
Title
Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models
Abstract
Recent developments in artificial intelligence have improved the automated assessment of extended response items in mathematics, potentially allowing these items to be scored cheaply and at scale. This study details the grand prize-winning approach to developing large language models (LLMs) to automatically score the ten items in the National Assessment of Educational Progress (NAEP) Math Scoring Challenge. The approach uses extensive preprocessing to balance the class labels for each item: over-represented classes were identified and filtered using a classifier trained on document-term matrices, and under-represented classes were augmented using a generative pre-trained large language model (Grammarly’s CoEdIT-XL; Raheja et al., 2023). We also use input modification schemes, hand-crafted for each item type, that incorporate information from parts of the multi-step math problem students had to solve. Finally, we fine-tune several pre-trained large language models on the modified input for each individual item in the NAEP automated math scoring challenge, with DeBERTa (He et al., 2021a) showing the best performance. This approach achieved human-like agreement, a quadratic weighted kappa (QWK) within 0.05 of human–human agreement, on nine of the ten items in a held-out test set.
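To make the class-balancing step concrete, the sketch below shows one way to filter an over-represented score class using a classifier trained on a document-term matrix, in the spirit of the preprocessing the abstract describes. The DataFrame columns, the scikit-learn classifier choice, and the confidence threshold are illustrative assumptions, not details from the paper.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Assumed input: a DataFrame `df` with columns "response" (student text)
# and "score" (human-assigned label). Column names are hypothetical.
vectorizer = CountVectorizer(min_df=2)           # build a document-term matrix
X = vectorizer.fit_transform(df["response"])
y = df["score"]

clf = LogisticRegression(max_iter=1000).fit(X, y)
confidence = clf.predict_proba(X).max(axis=1)    # DTM classifier's confidence per response

# Drop the most redundant (confidently classified) examples from the
# majority class, pulling the label distribution closer to balanced.
# The 0.95 threshold is an assumption for illustration.
majority = y.value_counts().idxmax()
keep = (y != majority) | (confidence < 0.95)
balanced = df[keep]
```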
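For the augmentation of under-represented classes, Grammarly’s CoEdIT-XL is available on the Hugging Face Hub as a T5-style sequence-to-sequence model, so one plausible minimal sketch looks like the following. The instruction prefix and sampling parameters are assumptions; the paper’s exact prompts are not reproduced here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("grammarly/coedit-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("grammarly/coedit-xl")

def augment(response: str, n: int = 3) -> list[str]:
    """Generate n paraphrases of a minority-class response.

    The instruction wording and sampling settings below are illustrative.
    """
    inputs = tokenizer(f"Paraphrase this: {response}", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.95,
        max_new_tokens=128,
        num_return_sequences=n,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```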
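Finally, a hedged sketch of per-item fine-tuning with a DeBERTa checkpoint, evaluated with QWK via scikit-learn’s quadratically weighted Cohen’s kappa. The base checkpoint, hyperparameters, number of score levels, and the `train_ds`/`eval_ds` dataset objects are all assumptions; the paper fine-tunes several pre-trained LLMs per item and reports DeBERTa as strongest.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

num_score_levels = 3  # assumed number of score categories for one item

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=num_score_levels)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Quadratic weighted kappa: the human-machine agreement metric
    # referenced in the abstract.
    return {"qwk": cohen_kappa_score(labels, preds, weights="quadratic")}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=train_ds,   # assumed: tokenized responses + labels
    eval_dataset=eval_ds,     # assumed: held-out split for the same item
    compute_metrics=compute_metrics,
)
trainer.train()
```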
Publication
International Journal of Artificial Intelligence in Education
Volume
35
Issue
2
Pages
559–586
Date
2024-07-18
Journal Abbr
Int. J. Artif. Intell. Educ.
Language
en
ISSN
1560-4292
Accessed
31/07/2024, 15:46
Library Catalogue
DOI.org (Crossref)
Extra
Citation Key: morris2024 <Title>: Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models
Citation
Morris, W., Holmes, L., Choi, J. S., & Crossley, S. (2024). Automated Scoring of Constructed Response Items in Math Assessment Using Large Language Models. International Journal of Artificial Intelligence in Education, 35(2), 559–586. https://doi.org/10.1007/s40593-024-00418-w