Evaluating and Optimizing Educational Content with Large Language Model Judgments

Article Status
Published
Authors/contributors
He-Yueya, J., Goodman, N. D., & Brunskill, E.
Title
Evaluating and Optimizing Educational Content with Large Language Model Judgments
Abstract
Creating effective educational materials generally requires expensive and time-consuming studies of student learning outcomes. To overcome this barrier, one idea is to build computational models of student learning and use them to optimize instructional materials. However, it is difficult to model the cognitive processes of learning dynamics. We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes. Specifically, we use GPT-3.5 to evaluate the overall effect of instructional materials on different student groups and find that it can replicate well-established educational findings such as the Expertise Reversal Effect and the Variability Effect. This demonstrates the potential of LMs as reliable evaluators of educational content. Building on this insight, we introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function. We apply this approach to create math word problem worksheets aimed at maximizing student learning gains. Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences. We conclude by discussing potential divergences between human and LM opinions and the resulting pitfalls of automating instructional design.
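The optimization approach the abstract describes (one LM generating instructional materials, another LM's judgments serving as the reward) can be sketched as a simple best-of-n selection loop. This is a hypothetical illustration, not the paper's implementation: `generate_variants` and `judge_score` are stubbed stand-ins for the generator and judge LMs, and all names are assumptions.

```python
import random
import zlib

def generate_variants(seed_prompt, n=4):
    """Stand-in for a generator LM: returns n candidate worksheets.
    A real system would sample n completions from an LM."""
    return [f"{seed_prompt} (variant {i})" for i in range(n)]

def judge_score(worksheet, student_group="novice"):
    """Stand-in for a judge LM: returns a scalar reward for a worksheet.
    A real system would prompt e.g. GPT-3.5 to rate the expected learning
    gain for the given student group. Here we derive a deterministic
    pseudo-random score so the sketch is runnable."""
    rng = random.Random(zlib.crc32(f"{worksheet}|{student_group}".encode()))
    return rng.random()

def optimize_worksheet(seed_prompt, student_group="novice", n=4):
    """Best-of-n selection: keep the candidate the judge scores highest,
    i.e. use the judge's output as the reward function."""
    candidates = generate_variants(seed_prompt, n)
    return max(candidates, key=lambda w: judge_score(w, student_group))

best = optimize_worksheet("Math word problems on fractions")
print(best)
```

Because the judge conditions on the student group, the same seed prompt can select different variants for novices versus experts, which is how a judge that reproduces the Expertise Reversal Effect would steer the generator toward group-appropriate materials.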
Repository
arXiv
Archive ID
arXiv:2403.02795
Date
May 6th, 2024
Accessed
June 20th, 2024, 08:21
Library Catalogue
Extra
arXiv:2403.02795 [cs]
Citation
He-Yueya, J., Goodman, N. D., & Brunskill, E. (2024). Evaluating and Optimizing Educational Content with Large Language Model Judgments (arXiv:2403.02795). arXiv. https://doi.org/10.48550/arXiv.2403.02795