Enhancing Automated Scoring of Math Self-Explanation Quality Using LLM-Generated Datasets: A Semi-Supervised Approach
Article Status
Published
Authors/contributors
- Nakamoto, Ryosuke (Author)
- Flanagan, Brendan (Author)
- Yamauchi, Taisei (Author)
- Dai, Yiling (Author)
- Takami, Kyosuke (Author)
- Ogata, Hiroaki (Author)
Title
Enhancing Automated Scoring of Math Self-Explanation Quality Using LLM-Generated Datasets: A Semi-Supervised Approach
Abstract
In the realm of mathematics education, self-explanation stands as a crucial learning mechanism, allowing learners to articulate their comprehension of intricate mathematical concepts and strategies. As digital learning platforms grow in prominence, there are mounting opportunities to collect and utilize mathematical self-explanations. However, these opportunities are met with challenges in automated evaluation. Automatic scoring of mathematical self-explanations is crucial for preprocessing tasks, including the categorization of learner responses, identification of common misconceptions, and the creation of tailored feedback and model solutions. Nevertheless, this task is hindered by the scarcity of sufficiently large labeled sample sets. Our research introduces a semi-supervised technique using a large language model (LLM), specifically its Japanese variant, to enrich datasets for the automated scoring of mathematical self-explanations. We rigorously evaluated the quality of self-explanations across five datasets, ranging from human-evaluated originals to ones devoid of original content. Our results show that combining LLM-based explanations with mathematical material significantly improves the model’s accuracy. Interestingly, there is an optimal limit to how much synthetic self-explanation data can benefit the system. Exceeding this limit does not further improve outcomes. This study thus highlights the need for careful consideration when integrating synthetic data into scoring solutions, especially within the mathematics discipline.
Publication
Computers
Volume
12
Issue
11
Pages
217
Date
2023-10-24
Journal Abbr
Computers
Language
en
ISSN
2073-431X
Short Title
Enhancing Automated Scoring of Math Self-Explanation Quality Using LLM-Generated Datasets
Accessed
31/07/2024, 15:44
Library Catalogue
DOI.org (Crossref)
Extra
Citation Key: nakamoto2023
Citation
Nakamoto, R., Flanagan, B., Yamauchi, T., Dai, Y., Takami, K., & Ogata, H. (2023). Enhancing Automated Scoring of Math Self-Explanation Quality Using LLM-Generated Datasets: A Semi-Supervised Approach. Computers, 12(11), 217. https://doi.org/10.3390/computers12110217