Applying large language models for automated essay scoring for non-native Japanese

Li, Wenchao; Liu, Haitao

doi:10.1057/s41599-024-03209-9

Article Status

Published

Authors/contributors

Li, Wenchao (Author)
Liu, Haitao (Author)

Title

Applying large language models for automated essay scoring for non-native Japanese

Abstract

Recent advancements in artificial intelligence (AI) have led to an increased use of large language models (LLMs) for language assessment tasks such as automated essay scoring (AES), automated listening tests, and automated oral proficiency assessments. The application of LLMs for AES in the context of non-native Japanese, however, remains limited. This study explores the potential of LLM-based AES by comparing the efficiency of different models, i.e. two conventional machine training technology-based methods (Jess and JWriter), two LLMs (GPT and BERT), and one Japanese local LLM (Open-Calm large model). To conduct the evaluation, a dataset consisting of 1400 story-writing scripts authored by learners with 12 different first languages was used. Statistical analysis revealed that GPT-4 outperforms Jess and JWriter, BERT, and the Japanese language-specific trained Open-Calm large model in terms of annotation accuracy and predicting learning levels. Furthermore, by comparing 18 different models that utilize various prompts, the study emphasized the significance of prompts in achieving accurate and reliable evaluations using LLMs.

Publication

Humanities and Social Sciences Communications

Date

2024-06-03

Volume

11

Issue

1

Pages

723

Journal Abbr

Humanit Soc Sci Commun

DOI

10.1057/s41599-024-03209-9

Citation Key

li2024c

URL

https://www.nature.com/articles/s41599-024-03209-9

Accessed

31/07/2024, 15:48

ISSN

2662-9992

Language

en

Library Catalogue

DOI.org (Crossref)

Extra

<Title>: Applying large language models to automated essay scoring for non-native Japanese
Read_Status: New
Read_Status_Date: 2026-01-26T11:33:53.530Z

Citation

Li, W., & Liu, H. (2024). Applying large language models for automated essay scoring for non-native Japanese. Humanities and Social Sciences Communications, 11(1), 723. https://doi.org/10.1057/s41599-024-03209-9

Link to this record

https://aievidencehub.org/lib/MZP5AM3L