Evaluating the Quality of AI-Generated Items for a Certification Exam
Article Status
Published
Authors/contributors
- Mead, Alan (Author)
- Zhou, Chenxuan (Author)
Title
Evaluating the Quality of AI-Generated Items for a Certification Exam
Abstract
OpenAI’s GPT-3 model can write multiple-choice exam items. This paper reviewed the literature on automatic item generation, described the recent history and operation of OpenAI’s GPT models, and outlined a methodology for generating items with these models. The study then critically evaluated GPT-3 at the task of writing multiple-choice exam items for a hypothetical psychometrics exam, comparing two versions of the model (text-davinci-002 and text-davinci-003) on 90 generated items. The majority of items (71% and 90%, respectively) were judged useful, but the typical item required revision to address problems with the stem, key, or distractors. The most common error was a violation of the principles of multiple-choice item writing (e.g., having two correct responses).
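The record does not include the paper's prompts or generation settings. As a rough sketch only, querying the two legacy models named in the abstract through OpenAI's pre-1.0 Python library (the legacy openai.Completion.create endpoint) might have looked like the following; the prompt wording, temperature, and token limit are illustrative assumptions, and both models have since been retired by OpenAI.

import openai  # pre-1.0 SDK; uses the legacy Completion endpoint

openai.api_key = "sk-..."  # placeholder, not a real key

# Illustrative prompt; the paper's actual prompts are not given in this record.
prompt = (
    "Write a four-option multiple-choice exam item about test reliability "
    "for a psychometrics certification exam. Label the options A-D and "
    "mark the correct answer with an asterisk."
)

# text-davinci-002 and text-davinci-003 are the two GPT-3 versions compared
# in the paper; both are now deprecated.
for model in ("text-davinci-002", "text-davinci-003"):
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=256,   # assumed limit, enough for one item
        temperature=0.7,  # assumed sampling temperature
    )
    print(f"--- {model} ---")
    print(response["choices"][0]["text"].strip())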
Publication
Journal of Applied Testing Technology
Date
2024
Extra
Citation Key: mead2024
Citation
Mead, A., & Zhou, C. (2024). Evaluating the Quality of AI-Generated Items for a Certification Exam. Journal of Applied Testing Technology. https://jattjournal.net/index.php/atp/article/view/173204