Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Article Status
Published
Authors/contributors
- Sami Baral (Author)
- Eamon Worden (Author)
- Wen-Chiang Lim (Author)
- Zhuang Luo (Author)
- Christopher Santorelli (Author)
- Ashish Gurung (Author)
- Benjamin Paaßen (Contributor)
- Carrie Demmans Epp (Contributor)
Title
Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Abstract
The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored a variety of methodologies for making feedback to students more effective. Recent developments in Large Language Models (LLMs) have extended their utility to enhancing automated feedback systems. This study explores the potential of LLMs to facilitate automated feedback in math education in the form of numeric assessment scores. We examine the effectiveness of LLMs in evaluating and scoring student responses to open-ended math problems by comparing three models: Llama, SBERT-Canberra, and GPT-4. Each model is asked to provide a quantitative score for a student's response. We employ Mistral, a version of Llama catered to math, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-provided scores for middle-school math problems. A similar approach was taken to train the SBERT-Canberra model, while the GPT-4 model used a zero-shot learning approach. We evaluate and compare the models' scoring accuracy. This study aims to further the ongoing development of automated assessment and feedback systems and to outline future directions for leveraging generative LLMs in building such systems.
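The SBERT-Canberra approach named in the abstract can be illustrated with a minimal sketch: embed a new student response, find the most similar teacher-scored response by Canberra distance, and reuse that response's score. The toy vectors below stand in for real SBERT sentence embeddings, and all function and variable names are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of a Canberra-distance nearest-neighbor scorer, assuming
# responses have already been embedded (e.g. with SBERT). Toy 3-d vectors
# replace real sentence embeddings here.

def canberra(x, y):
    """Canberra distance: sum over i of |x_i - y_i| / (|x_i| + |y_i|)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:  # skip terms where both coordinates are zero
            total += abs(a - b) / denom
    return total

def nearest_score(query_vec, scored_bank):
    """Return the teacher score of the bank entry closest to query_vec."""
    _, best_score = min(
        scored_bank, key=lambda item: canberra(query_vec, item[0])
    )
    return best_score

# Toy bank of (embedding, teacher score) pairs for one math problem.
bank = [
    ([0.9, 0.1, 0.0], 4),  # fully correct explanation
    ([0.1, 0.8, 0.2], 2),  # partially correct
    ([0.0, 0.1, 0.9], 0),  # incorrect / off-topic
]

print(nearest_score([0.85, 0.2, 0.05], bank))  # → 4 (closest to first entry)
```

A real system would compute the query and bank embeddings with a sentence-embedding model and likely aggregate over several nearest neighbors rather than a single one; this sketch only shows the distance-and-lookup step.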
Date
2024-07-12
Short Title
Automated Assessment in Math Education
Accessed
31/07/2024, 15:49
Library Catalogue
DOI.org (Datacite)
Rights
Creative Commons Attribution 4.0 International
Extra
Publisher: International Educational Data Mining Society
Citation Key: samibaral2024
Translated Title: Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Citation
Sami Baral, Eamon Worden, Wen-Chiang Lim, Zhuang Luo, Christopher Santorelli, & Ashish Gurung. (2024). Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses. https://doi.org/10.5281/ZENODO.12729932