Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments

Article Status
Published
Authors/contributors
Tang, X., Chen, H., Lin, D., & Li, K.
Title
Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments
Publication
Heliyon
Date
2024-07
Volume
10
Issue
14
Pages
e34262
Journal Abbr
Heliyon
Citation Key
tang2024
Accessed
31/07/2024, 15:52
ISSN
2405-8440
Short Title
Harnessing LLMs for multi-dimensional writing assessment
Language
en
Library Catalogue
DOI.org (Crossref)
Extra
<Title>: Harnessing large language models for multi-dimensional writing assessment: Alignment with human judgments and reliability <AI Smry>: Results indicate that prompt engineering significantly affects the reliability of LLMs, with GPT-4 showing marked improvement over GPT-3.5 and Claude 2, achieving 112% and 114% increases in scoring accuracy under the criteria- and sample-referenced justification prompt.
Citation
Tang, X., Chen, H., Lin, D., & Li, K. (2024). Harnessing LLMs for multi-dimensional writing assessment: Reliability and alignment with human judgments. Heliyon, 10(14), e34262. https://doi.org/10.1016/j.heliyon.2024.e34262