Use of AI (GPT-4)-generated multiple-choice questions for the examination of surgical subspecialty residents: Report of feasibility and psychometric analysis
Article Status
Published
Authors/contributors
- Kim, Jin Kyu (Justin) (Author)
- Chua, Michael (Author)
- Lorenzo, Armando (Author)
- Rickard, Mandy (Author)
- Andreacchi, Laura (Author)
- Kim, Michael (Author)
- Cheung, Douglas (Author)
- Krakowsky, Yonah (Author)
- Lee, Jason Y. (Author)
Title
Use of AI (GPT-4)-generated multiple-choice questions for the examination of surgical subspecialty residents: Report of feasibility and psychometric analysis
Abstract
Introduction: Multiple-choice questions (MCQs) are essential in medical education and widely used by licensing bodies. They are traditionally created with intensive human effort to ensure validity. Recent advances in AI, particularly large language models (LLMs), offer the potential to streamline this process. This study aimed to develop and test a GPT-4 model with customized instructions for generating MCQs to assess urology residents.
Methods: A GPT-4 model was customized using guidelines from medical licensing bodies and reference materials specific to urology. This model was tasked with generating MCQs designed to mimic the format and content of the 2023 urology examination outlined by the Royal College of Physicians and Surgeons of Canada (RCPSC). Following generation, a selection of MCQs underwent expert review for validity and suitability.
Results: From an initial set of 123 generated MCQs, 60 were chosen for inclusion in an exam administered to 15 urology residents at the University of Toronto. Exam performance generally increased with level of training, suggesting the MCQs' ability to effectively discriminate knowledge levels among residents. The majority (33/60) of the questions had discriminatory value that appeared acceptable (discriminatory index 0.2–0.4) or excellent (discriminatory index >0.4).
Conclusions: This study highlights AI-driven models like GPT-4 as efficient tools to aid with MCQ generation in medical education assessments. By automating MCQ creation while maintaining quality standards, AI can expedite exam development. Future research should focus on refining AI applications in education to optimize assessments and enhance medical training and certification outcomes.
Publication
Canadian Urological Association Journal
Volume
19
Issue
6
Date
2025-02-24
Journal Abbr
Can. Urol. Assoc. J.
ISSN
1920-1214, 1911-6470
Short Title
Use of AI (GPT-4)-generated multiple-choice questions for the examination of surgical subspecialty residents
Accessed
08/10/2025, 23:09
Library Catalogue
DOI.org (Crossref)
Extra
Citation Key: kim2025
Citation
Kim, J. K. (Justin), Chua, M., Lorenzo, A., Rickard, M., Andreacchi, L., Kim, M., Cheung, D., Krakowsky, Y., & Lee, J. Y. (2025). Use of AI (GPT-4)-generated multiple-choice questions for the examination of surgical subspecialty residents: Report of feasibility and psychometric analysis. Canadian Urological Association Journal, 19(6). https://doi.org/10.5489/cuaj.9020