Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Article Status
Published
Authors/contributors
- Sami Baral (Author)
- Eamon Worden (Author)
- Wen-Chiang Lim (Author)
- Zhuang Luo (Author)
- Christopher Santorelli (Author)
- Ashish Gurung (Author)
- Benjamin Paaßen (Contributor)
- Carrie Demmans Epp (Contributor)
Title
Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Abstract
The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored a variety of methodologies for making feedback to students more effective. Recent developments in Large Language Models (LLMs) have extended their utility to enhancing automated feedback systems. This study explores the potential of LLMs to facilitate automated feedback in math education in the form of numeric assessment scores. We examine the effectiveness of LLMs in evaluating and scoring student responses to open-ended math problems by comparing three models: Llama, SBERT-Canberra, and GPT-4. Each model is asked to provide a quantitative score for a student's response. We employ Mistral, a version of Llama catered to math, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-provided scores for middle-school math problems. A similar approach was taken to train the SBERT-Canberra model, while the GPT-4 model used a zero-shot learning approach. We evaluate and compare the models' scoring accuracy. This study aims to further the ongoing development of automated assessment and feedback systems and to outline future directions for leveraging generative LLMs in building such systems.
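The SBERT-Canberra approach named in the abstract can be illustrated with a minimal sketch: embed a new student response, find the most similar teacher-scored response by Canberra distance, and reuse that response's score. The toy vectors below stand in for real SBERT sentence embeddings, and all function and variable names are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of a Canberra-distance nearest-neighbor scorer, assuming
# responses have already been embedded (e.g. with SBERT). Toy 3-d vectors
# replace real sentence embeddings here.

def canberra(x, y):
    """Canberra distance: sum over i of |x_i - y_i| / (|x_i| + |y_i|)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:  # skip terms where both coordinates are zero
            total += abs(a - b) / denom
    return total

def nearest_score(query_vec, scored_bank):
    """Return the teacher score of the bank entry closest to query_vec."""
    _, best_score = min(
        scored_bank, key=lambda item: canberra(query_vec, item[0])
    )
    return best_score

# Toy bank of (embedding, teacher score) pairs for one math problem.
bank = [
    ([0.9, 0.1, 0.0], 4),  # fully correct explanation
    ([0.1, 0.8, 0.2], 2),  # partially correct
    ([0.0, 0.1, 0.9], 0),  # incorrect / off-topic
]

print(nearest_score([0.85, 0.2, 0.05], bank))  # → 4 (closest to first entry)
```

A real system would compute the query and bank embeddings with a sentence-embedding model and likely aggregate over several nearest neighbors rather than a single one; this sketch only shows the distance-and-lookup step.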
Date
2024-07-12
Short Title
Automated Assessment in Math Education
Accessed
31/07/2024, 15:49
Library Catalogue
DOI.org (Datacite)
Rights
Creative Commons Attribution 4.0 International
Extra
Publisher: International Educational Data Mining Society
Citation Key: samibaral2024
Translated Title: Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Citation
Sami Baral, Eamon Worden, Wen-Chiang Lim, Zhuang Luo, Christopher Santorelli, & Ashish Gurung. (2024). Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses. https://doi.org/10.5281/ZENODO.12729932