Evaluating the Quality of AI-Generated Items for a Certification Exam
Article Status
Published
Authors/contributors
- Mead, Alan (Author)
- Zhou, Chenxuan (Author)
Title
Evaluating the Quality of AI-Generated Items for a Certification Exam
Abstract
OpenAI’s GPT-3 model can write multiple-choice exam items. This paper reviewed the literature on automatic item generation, described the recent history and operation of OpenAI’s GPT models, and outlined a methodology for generating items with these models. The study then critically evaluated GPT-3 at the task of writing multiple-choice exam items for a hypothetical psychometrics exam, comparing two versions of the model (text-davinci-002 and text-davinci-003) on 90 generated items. The majority of items (71% and 90%, respectively) were judged useful, but the typical item required revision to address problems with the stem, key, or distractors. The most common error was a violation of the principles of multiple-choice item writing (e.g., having two correct responses).
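The record does not include the paper's prompts or generation settings. As a rough sketch only, querying the two legacy models named in the abstract through OpenAI's pre-1.0 Python library (the legacy openai.Completion.create endpoint) might have looked like the following; the prompt wording, temperature, and token limit are illustrative assumptions, and both models have since been retired by OpenAI.

import openai  # pre-1.0 SDK; uses the legacy Completion endpoint

openai.api_key = "sk-..."  # placeholder, not a real key

# Illustrative prompt; the paper's actual prompts are not given in this record.
prompt = (
    "Write a four-option multiple-choice exam item about test reliability "
    "for a psychometrics certification exam. Label the options A-D and "
    "mark the correct answer with an asterisk."
)

# text-davinci-002 and text-davinci-003 are the two GPT-3 versions compared
# in the paper; both are now deprecated.
for model in ("text-davinci-002", "text-davinci-003"):
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=256,   # assumed limit, enough for one item
        temperature=0.7,  # assumed sampling temperature
    )
    print(f"--- {model} ---")
    print(response["choices"][0]["text"].strip())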
Publication
Journal of Applied Testing Technology
Date
2024
Extra
Citation Key: mead2024
Citation
Mead, A., & Zhou, C. (2024). Evaluating the Quality of AI-Generated Items for a Certification Exam. Journal of Applied Testing Technology. https://jattjournal.net/index.php/atp/article/view/173204