Qualitative Coding with GPT-4: Where it Works Better

Liu, Xiner; Zambrano, Andrés; Baker, Ryan; Barany, Amanda; Ocumpaugh, Jaclyn; Zhang, Jiayi; Pankiewicz, Maciej; Nasiar, Nidhi; Wei, Zhanlan

doi:10.18608/jla.2025.8575

Article Status

Published

Authors/contributors

Liu, Xiner (Author)
Zambrano, Andrés (Author)
Baker, Ryan (Author)
Barany, Amanda (Author)
Ocumpaugh, Jaclyn (Author)
Zhang, Jiayi (Author)
Pankiewicz, Maciej (Author)
Nasiar, Nidhi (Author)
Wei, Zhanlan (Author)

Title

Qualitative Coding with GPT-4: Where it Works Better

Abstract

This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three prompt engineering strategies (zero-shot, few-shot, and few-shot with contextual information) as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I semi-personalized tutoring session transcripts, student observations in a game-based learning environment, and debugging behaviors in an introductory programming course. We evaluated each approach's performance based on its inter-rater agreement with human coders and explored how different methods vary in effectiveness depending on a construct's degree of clarity, concreteness, objectivity, granularity, and specificity. Our findings suggest that while GPT-4 can code a broad range of constructs, no single method consistently outperforms the others, and the selection of a particular method should be tailored to the specific properties of the construct and context being analyzed. We also found that the constructs that GPT-4 has the most difficulty with are the same constructs that human coders find more difficult to reach inter-rater reliability on.

Notes for Practice (research paper)

- GPT-4 can be used to code qualitative data for educationally relevant constructs.
- Using embeddings and examples can improve agreement with humans. Examples are more useful for constructs that are more difficult to define.
- Constructs that human beings find difficult to agree on are also difficult for GPT-4.

Publication

Journal of Learning Analytics

Date

2025-03-05 12:08:51

DOI

10.18608/jla.2025.8575

Citation Key

zotero-12383

URL

https://www.researchgate.net/profile/Ryan-Baker-2/publication/389397778_Qualitative_Coding_with_GPT-4_Where_it_Works_Better/links/67c11f0cf5cb8f70d5c2ffb9/Qualitative-Coding-with-GPT-4-Where-it-Works-Better.pdf

Accessed

05/03/2025, 12:08

Extra

Read_Status: New
Read_Status_Date: 2026-01-26T11:33:07.206Z

Citation

Liu, X., Zambrano, A., Baker, R., Barany, A., Ocumpaugh, J., Zhang, J., Pankiewicz, M., Nasiar, N., & Wei, Z. (2025). Qualitative Coding with GPT-4: Where it Works Better. Journal of Learning Analytics. https://doi.org/10.18608/jla.2025.8575

Link to this record

https://aievidencehub.org/lib/KY4NWV8A