LLMs in Short Answer Scoring: Limitations and Promise of Zero-Shot and Few-Shot Approaches

Chamieh, Imran; Zesch, Torsten; Giebermann, Klaus

Article Status

Published

Authors/contributors

Chamieh, Imran (Author)
Zesch, Torsten (Author)
Giebermann, Klaus (Author)
Kochmar, Ekaterina (Editor)
Bexte, Marie (Editor)
Burstein, Jill (Editor)
Horbach, Andrea (Editor)
Laarmann-Quante, Ronja (Editor)
Tack, Anaïs (Editor)
Yaneva, Victoria (Editor)
Yuan, Zheng (Editor)

Title

LLMs in Short Answer Scoring: Limitations and Promise of Zero-Shot and Few-Shot Approaches

Abstract

In this work, we investigate the potential of Large Language Models (LLMs) for automated short answer scoring. We test zero-shot and few-shot settings and compare them with fine-tuned models and a supervised upper bound across three diverse datasets. Our results show that LLMs perform poorly in zero-shot and few-shot settings: they have difficulty with tasks that require complex reasoning or domain-specific knowledge, while showing promise on general knowledge tasks. The fine-tuned models come close to the supervised results but are still not feasible for application, highlighting potential overfitting issues. Overall, our study highlights the challenges and limitations of LLMs in short answer scoring and indicates that there currently seems to be no basis for applying LLMs to short answer scoring.

Proceedings Title

Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Publisher

Association for Computational Linguistics

Place

Mexico City, Mexico

Date

2024-06

Pages

309–315

Citation Key

chamieh2024

URL

https://aclanthology.org/2024.bea-1.25

Extra

<Title>: Large Language Models in Short Answer Scoring: Limitations and Potential of Zero-Shot and Few-Shot Methods
Read_Status: New
Read_Status_Date: 2026-01-26T11:33:57.130Z

Citation

Chamieh, I., Zesch, T., & Giebermann, K. (2024). LLMs in Short Answer Scoring: Limitations and Promise of Zero-Shot and Few-Shot Approaches. In E. Kochmar, M. Bexte, J. Burstein, A. Horbach, R. Laarmann-Quante, A. Tack, V. Yaneva, & Z. Yuan (Eds.), Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024) (pp. 309–315). Association for Computational Linguistics. https://aclanthology.org/2024.bea-1.25

Link to this record

https://aievidencehub.org/lib/MGGBQL4B