Evaluating the Quality of AI-Generated Items for a Certification Exam
Article Status
    Published
Authors/contributors
    - Mead, Alan (Author)
    - Zhou, Chenxuan (Author)
Title
    Evaluating the Quality of AI-Generated Items for a Certification Exam
Abstract
    OpenAI’s GPT-3 model can write multiple-choice exam items. This paper reviewed the literature on automatic item generation, described the recent history and operation of OpenAI’s GPT models, and presented a methodology for generating items with these models. The study then critically evaluated GPT-3 at writing multiple-choice items for a hypothetical psychometrics exam, comparing two versions of the model (text-davinci-002 and text-davinci-003) on 90 GPT-generated items. The majority of items (71% and 90%, respectively) were judged useful, but the typical item required revision to address problems with the stem, key, or distractors. The most common error was a violation of multiple-choice item-writing principles (e.g., having two correct responses).
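    The item-generation step the abstract describes can be illustrated with a minimal sketch. This is not the authors' actual prompt or pipeline: the prompt wording, function name, and parameters are assumptions, and because text-davinci-002/003 have since been retired, a current instruct-style completion model stands in for them.

```python
# Hypothetical sketch of GPT-based multiple-choice item generation,
# in the spirit of the methodology the abstract describes. The prompt
# wording and settings are assumptions, not the authors' materials.
# text-davinci-002/003 are retired, so gpt-3.5-turbo-instruct (a
# current completion-style model) is substituted here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Write one four-option multiple-choice exam item on the topic of "
    "{topic} for a psychometrics certification exam. Label the options "
    "A-D, mark exactly one option as correct, and make the distractors "
    "plausible but clearly wrong.\n\nItem:"
)

def generate_item(topic: str) -> str:
    """Ask the model for a single draft item; SMEs must still review it."""
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # stand-in for text-davinci-003
        prompt=PROMPT.format(topic=topic),
        max_tokens=300,
        temperature=0.7,  # some variety across a multi-item pool
    )
    return response.choices[0].text.strip()

if __name__ == "__main__":
    print(generate_item("classical test theory reliability"))
```

    As the study's results suggest, drafts produced this way still need expert review for flawed stems, keys, and distractors.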
Publication
    Journal of Applied Testing Technology
Date
    2024
Extra
    Citation Key: mead2024
Citation
    Mead, A., & Zhou, C. (2024). Evaluating the Quality of AI-Generated Items for a Certification Exam. Journal of Applied Testing Technology. https://jattjournal.net/index.php/atp/article/view/173204