Understanding Mean Score Differences Between the e‐rater ® Automated Scoring Engine and Humans for Demographically Based Groups in the GRE ® General Test

Ramineni, Chaitanya; Williamson, David

doi:10.1002/ets2.12192

Article Status

Published

Authors/contributors

Ramineni, Chaitanya (Author)
Williamson, David (Author)

Title

Understanding Mean Score Differences Between the e‐rater ® Automated Scoring Engine and Humans for Demographically Based Groups in the GRE ® General Test

Abstract

Notable mean score differences for the e‐rater® automated scoring engine and for humans for essays from certain demographic groups were observed for the GRE® General Test in use before the major revision of 2012, called rGRE. The use of e‐rater as a check‐score model with discrepancy thresholds prevented an adverse impact on the examinee score at the item or test level. Despite this control, there remains a need to understand the root causes of these demographically based score differences and to identify potential mechanisms for avoiding future instances of discrepancy. In this study, we used a combination of statistical methods and human review to propose hypotheses about the root cause of score differences and whether such discrepancies reflect inadequacies of e‐rater, human scoring, or both. The human rating process was found to be influenced strongly by the scale structure and did not fully correspond to the e‐rater scoring mechanism. The human raters appeared to be using conditional logic and a rule‐based approach to their scoring, while e‐rater uses linear weighting of all the features. These analyses have implications for future research and operational policies for the scoring of the rGRE.

Publication

ETS Research Report Series

Date

2018-04-27

Volume

2018

Issue

1

Pages

1-31

Journal Abbr

ETS Research Report Series

DOI

10.1002/ets2.12192

Citation Key

ramineni2018

URL

https://onlinelibrary.wiley.com/doi/10.1002/ets2.12192

Accessed

18/06/2024, 18:07

ISSN

2330-8516

Language

en

Library Catalogue

DOI.org (Crossref)

Extra

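<Title>: Understanding the Mean Score Differences Between the e‐rater ® Automated Scoring Engine and Humans for Different Demographic Groups in the GRE ® General Test
Read_Status: New
Read_Status_Date: 2026-01-26T11:33:53.530Z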

Citation

Ramineni, C., & Williamson, D. (2018). Understanding Mean Score Differences Between the e‐rater ® Automated Scoring Engine and Humans for Demographically Based Groups in the GRE ® General Test. ETS Research Report Series, 2018(1), 1–31. https://doi.org/10.1002/ets2.12192

Link to this record

https://aievidencehub.org/lib/4UASYZKD
