Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance
Article Status
Published
Author/contributor
- Khademi, Abdolvahab (Author)
Title
Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance
Abstract
ChatGPT and Bard are AI chatbots based on Large Language Models (LLMs) that promise applications in diverse areas. In education, these AI technologies have been tested for applications in assessment and teaching. In assessment, AI has long been used in automated essay scoring and automated item generation. One psychometric property that these tools must have in order to assist or replace humans in assessment is high reliability, in the sense of agreement between AI scores and human raters. In this paper, we measure the reliability of the OpenAI ChatGPT and Google Bard LLM tools against experienced and trained human raters in perceiving and rating the complexity of writing prompts. Using the intraclass correlation coefficient (ICC) as the performance metric, we found that the inter-rater reliability of both OpenAI ChatGPT and Google Bard was low against the gold standard of human ratings.
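The reliability analysis described in the abstract rests on the intraclass correlation coefficient (ICC). The record does not say which ICC form or software the author used, so the following is only a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-rater ICC(2,1) and invented example scores; it is not the paper's actual analysis.

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1): two-way random-effects, absolute-agreement, single-rater ICC.
    `ratings` is an (n_targets x k_raters) matrix of scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    # Two-way ANOVA sums of squares: between targets, between raters, residual
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((ratings - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical example: 5 writing prompts rated by a human rater and an AI rater.
human = np.array([3, 4, 2, 5, 3])
ai = np.array([2, 5, 1, 4, 4])
print(round(icc_2_1(np.column_stack([human, ai])), 3))  # ~0.737 for these made-up scores
```

An ICC near 1 would indicate that the AI ratings agree closely with the human gold standard; the low values reported in the paper indicate weak agreement.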
Publication
Journal of Applied Learning & Teaching
Volume
6
Issue
1
Date
2023-5-10
Journal Abbr
JALT
Language
en
ISSN
2591-801X
Short Title
Can ChatGPT and Bard Generate Aligned Assessment Items?
Accessed
07/01/2025, 21:01
Extra
arXiv:2304.05372 [cs]
Citation Key: khademi2023
Citation
Khademi, A. (2023). Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance. Journal of Applied Learning & Teaching, 6(1). https://doi.org/10.37074/jalt.2023.6.1.28