Comparing the quality of human and ChatGPT feedback of students’ writing

Steiss, Jacob; Tate, Tamara; Graham, Steve; Cruz, Jazmin; Hebert, Michael; Wang, Jiali; Moon, Youngsun; Tseng, Waverly; Warschauer, Mark; Olson, Carol Booth

doi:10.1016/j.learninstruc.2024.101894

Article Status

Published

Authors/contributors

Steiss, Jacob (Author)
Tate, Tamara (Author)
Graham, Steve (Author)
Cruz, Jazmin (Author)
Hebert, Michael (Author)
Wang, Jiali (Author)
Moon, Youngsun (Author)
Tseng, Waverly (Author)
Warschauer, Mark (Author)
Olson, Carol Booth (Author)

Title

Comparing the quality of human and ChatGPT feedback of students’ writing

Abstract

Background: Offering students formative feedback on their writing is an effective way to facilitate writing development. Recent advances in AI (i.e., ChatGPT) may function as an automated writing evaluation tool, increasing the amount of feedback students receive and diminishing the burden on teachers to provide frequent feedback to large classes.

Aims: We examined the ability of generative AI (ChatGPT) to provide formative feedback. We compared the quality of human and AI feedback by scoring the feedback each provided on secondary student essays. We scored the degree to which feedback (a) was criteria-based, (b) provided clear directions for improvement, (c) was accurate, (d) prioritized essential features, and (e) used a supportive tone.

Sample: 200 pieces of human-generated formative feedback and 200 pieces of AI-generated formative feedback for the same essays.

Methods: We examined whether ChatGPT and human feedback differed in quality for the whole sample, for compositions that differed in overall quality, and for native English speakers and English learners by comparing descriptive statistics and effect sizes.

Results: Human raters were better at providing high-quality feedback to students in all categories other than criteria-based. AI and humans showed differences in feedback quality based on essay quality. Feedback did not vary by language status for humans or AI.

Conclusion: Well-trained evaluators provided higher quality feedback than ChatGPT. Considering the ease of generating feedback through ChatGPT and its overall quality, generative AI may still be useful in some contexts, particularly in formative early drafts or instances where a well-trained educator is unavailable.

Publication

Learning and Instruction

Volume

91

Pages

101894

Date

2024-06

Journal Abbr

Learn. Instr.

Language

en

DOI

10.1016/j.learninstruc.2024.101894

ISSN

0959-4752

URL

https://linkinghub.elsevier.com/retrieve/pii/S0959475224000215

Accessed

11/04/2024, 07:55

Library Catalogue

ScienceDirect

Citation

Steiss, J., Tate, T., Graham, S., Cruz, J., Hebert, M., Wang, J., Moon, Y., Tseng, W., Warschauer, M., & Olson, C. B. (2024). Comparing the quality of human and ChatGPT feedback of students’ writing. Learning and Instruction, 91, 101894. https://doi.org/10.1016/j.learninstruc.2024.101894

Link to this record

https://aievidencehub.org/lib/82LTNZ6E