Results – Evidence Library – Artificial Intelligence in Measurement and Education

Automated Scoring for Reading Comprehension via In-context BERT Tuning

Nigel Fernandez, Aritra Ghosh, Naiming L... | Apr 24th, 2022 | Preprint

Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two...

Validity Arguments for AI‐Based Automated Scores: Essay Scoring as an Illustration

Steve Ferrara, Saed Qunbar | Sep 24th, 2022 | Journal article

In this article, we argue that automated scoring engines should be transparent and construct-relevant, as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without admitting some features that may not be easily explained and understood and may not be obviously and directly relevant to the target assessment construct. We address the current limitations on evidence and validity arguments for...

A Multilevel Multinomial Logit Approach to Bias Detection

Yong He, Shumin Jing, Y. Lu | Apr 24th, 2022 | Conference paper

Toward Argument‐Based Fairness with an Application to AI‐Enhanced Educational Assessments

A. Corinne Huggins‐Manley, Brandon M. Bo... | Sep 24th, 2022 | Journal article

The field of educational measurement treats validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument‐based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible approach to fairness arguments, one that operates outside of and complementary to validity arguments, is required to address many of the...

Psychometric Methods to Evaluate Measurement and Algorithmic Bias in Automated Scoring

Matthew S. Johnson, Xiang Liu, Daniel F.... | Sep 24th, 2022 | Journal article

Examining Bias in Automated Scoring of Reading Comprehension Items

Susan Lottridge, Mackenzie Young | Apr 24th, 2022 | Conference paper

The use of automated scoring (AS) of constructed responses has become increasingly common in K-12 formative, interim, and summative assessment programs. AS has been shown to perform well in essay writing, reading comprehension, and mathematics. However, less is known about how automated scoring engines perform for key subgroups defined by gender, race/ethnicity, English proficiency status, disability status, and economic status. Bias evaluations have focused primarily on mean score...

Mapping Between Hidden States and Features to Validate Automated Essay Scoring Using DeBERTa Models

Christopher Ormerod | Apr 24th, 2022 | Journal article

We introduce a regression-based framework to explore the dependence that global features have on score predictions from pretrained transformer-based language models used for Automated Essay Scoring (AES). We demonstrate that neural networks use approximations of rubric-relevant global features to determine a score prediction. By considering linear models on the hidden states, we can approximate global features and measure their importance to score predictions. This study uses DeBERTa models...

Artificial Intelligence in Education: 23rd International Conference, AIED 2022, Durham, UK, July 27–31, 2022, Proceedings, Part I

Maria Mercedes Rodrigo, Noboru Matsuda, ... | Apr 24th, 2022 | Book

Bipartite-play Dialogue Collection for Practical Automatic Evaluation of Dialogue Systems

Shiki Sato, Yosuke Kishinami, Hiroaki Su... | Apr 24th, 2022 | Conference paper

Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the...

Assessment in the age of artificial intelligence

Zachari Swiecki, Hassan Khosravi, Guanli... | Apr 24th, 2022 | Journal article

Automatic scoring of short answers using justification cues estimated by BERT

Shunya Takano, Osamu Ichikawa | Apr 24th, 2022 | Conference paper

Automated scoring of the autobiographical interview with natural language processing

Ruben van Genugten, Daniel L. Schacter | Apr 24th, 2022 | Journal article

Toward Efficient Automated Feature Engineering

Kafeng Wang, Pengyang Wang, Chengzhong x... | Apr 24th, 2022 | Journal article

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks and has achieved great success in real-world applications. Current AFE methods focus mainly on improving the effectiveness of the generated features but ignore their low efficiency, which becomes a problem in large-scale deployment. Therefore, in this work, we propose a generic framework to improve the efficiency of AFE. Specifically, we construct the AFE pipeline based on reinforcement...

Towards Human-Like Educational Question Generation with Large Language Models