Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments
Article Status
Published
Authors/contributors
- Tang, Xiaoyi (Author)
- Chen, Hongwei (Author)
- Lin, Daoyu (Author)
- Li, Kexin (Author)
Title
Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments
Publication
Heliyon
Volume
10
Issue
14
Pages
e34262
Date
2024-07
Journal Abbr
Heliyon
Language
en
ISSN
2405-8440
Short Title
Harnessing LLMs for multi-dimensional writing assessment
Accessed
31/07/2024, 15:52
Library Catalogue
DOI.org (Crossref)
Extra
Citation Key: tang2024
<Title (translated from Chinese)>: Harnessing large language models for multi-dimensional writing assessment: Alignment with human judgments and reliability
<AI Smry>: Results indicate that prompt engineering significantly affects the reliability of LLMs, with GPT-4 showing marked improvement over GPT-3.5 and Claude 2, achieving 112% and 114% increases in scoring accuracy under the criteria- and sample-referenced justification prompt.
Citation
Tang, X., Chen, H., Lin, D., & Li, K. (2024). Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments. Heliyon, 10(14), e34262. https://doi.org/10.1016/j.heliyon.2024.e34262