Using Demographic Data as Predictor Variables: a Questionable Choice

Article Status
Published
Authors/contributors
Baker, R. S.; Esbenshade, L.; Vitale, J.; Karumbaiah, S.
Title
Using Demographic Data as Predictor Variables: a Questionable Choice
Abstract
Predictive analytics methods in education are seeing widespread use and are producing increasingly accurate predictions of students’ outcomes. With the increased use of predictive analytics comes increasing concern about fairness for specific subgroups of the population. One approach that has been proposed to increase fairness is using demographic variables directly in models, as predictors. In this paper we explore issues of fairness in the use of demographic variables as predictors of long-term student outcomes, studying the arguments for and against this practice in the contexts where this literature has been published. We analyze arguments for the inclusion of demographic variables, specifically claims that this approach improves model performance and charges that excluding such variables amounts to a form of ‘color-blind’ racism. We also consider arguments against including demographic variables as predictors, including reduced actionability of predictions, risk of reinforcing bias, and limits of categorization. We then discuss how contextual factors of predictive models should influence case-specific decisions for the inclusion or exclusion of demographic variables and discuss the role of proxy variables. We conclude that, on balance, there are greater benefits to fairness if demographic variables are used to validate fairness rather than as predictors within models.
Report Type
preprint
Institution
EdArXiv
Date
2022-12-19
Language
en
Short Title
Using Demographic Data as Predictor Variables
Accessed
2023-09-22, 23:35
Library Catalogue
DOI.org (Crossref)
Citation
Baker, R. S., Esbenshade, L., Vitale, J., & Karumbaiah, S. (2022). Using Demographic Data as Predictor Variables: a Questionable Choice [Preprint]. EdArXiv. https://doi.org/10.35542/osf.io/y4wvj
Empirical studies