Results – Evidence Library – Artificial Intelligence in Measurement and Education

Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities

Victoria Yaneva, Constantin Orasan, Rich... | Apr 24th, 2017 | Conference paper

Monitoring the performance of human and automated scores for spoken responses

Zhen Wang, Klaus Zechner, Yu Sun | Dec 19th, 2016 | Journal article

As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish rigorous procedures for monitoring the performance of both human and automated scoring processes during operational administrations. This paper...

Comparing Human and Automated Essay Scoring for Prospective Graduate Students With Learning Disabilities and/or ADHD

Heather Buzick, Maria Elena Oliveri, Yig... | Jul 2nd, 2016 | Journal article

Creating a Next‐Generation System of K–12 English Learner Language Proficiency Assessments

Maurice Cogan Hauck, Mikyung Kim Wolf, R... | Apr 4th, 2016 | Journal article

This paper is the first in a series from Educational Testing Service (ETS) concerning English language proficiency (ELP) assessments for K–12 English learners (ELs). The goal of this paper, and the series, is to present research‐based ideas, principles, and recommendations for consideration by those who are conceptualizing, developing, and implementing ELP assessments for K–12 ELs and by all stakeholders in their education and assessment. We also hope to contribute to the active current...

Analyzing the Behavior of Visual Question Answering Models

Aishwarya Agrawal, Dhruv Batra, Devi Par... | Apr 24th, 2016 | Journal article

Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA). The performance of most models is clustered around 60-70%. In this paper we propose systematic methods to analyze the behavior of these models as a first step towards recognizing their strengths and weaknesses, and identifying the most fruitful directions for progress. We analyze two models, one each from two major classes of VQA models -- with-attention and without-attention...

Automatic Generation of Context-Based Fill-in-the-Blank Exercises Using Co-occurrence Likelihoods and Google n-grams

Jennifer Hill, Rahul Simha | Apr 24th, 2016 | Conference paper

Automated Essay Evaluation for English Language Learners: A Case Study of MY Access

Giang Thi Linh Hoang, Antony John Kunnan... | Oct 24th, 2016 | Journal article

Human-level concept learning through probabilistic program induction

Brenden M. Lake, Ruslan Salakhutdinov, J... | Dec 11th, 2015 | Journal article

Handwritten characters drawn by a model. Not only do children learn effortlessly, they do so quickly and with a remarkable ability to use what they have learned as the raw material for creating new stuff. Lake et al. describe a computational model that learns in a similar fashion and does so better than current deep learning algorithms. The model classifies, parses, and recreates handwritten characters, and can generate new letters of the...

Comparing the Effectiveness of Self‐Paced and Collaborative Frame‐of‐Reference Training on Rater Accuracy in a Large‐Scale Writing Assessment

Kevin R. Raczynski, Allan S. Cohen, Geor... | Sep 24th, 2015 | Journal article

There is a large body of research on the effectiveness of rater training methods in the industrial and organizational psychology literature. Less has been reported in the measurement literature on large‐scale writing assessments. This study compared the effectiveness of two widely used rater training methods—self‐paced and collaborative frame‐of‐reference training—in the context of a large‐scale writing assessment. Sixty‐six raters were randomly assigned to the training methods. After...

Using Technology-Enhanced Processes to Generate Test Items in Multiple Languages

Mark J. Gierl, Hollis Lai, Karen Fung | Aug 20th, 2015 | Book section

CIDEr: Consensus-based Image Description Evaluation

Ramakrishna Vedantam, C. Lawrence Zitnic... | Jun 24th, 2015 | Preprint

Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new...

The Impact of Training Data on Automated Short Answer Scoring Performance

Michael Heilman, Nitin Madnani | Apr 24th, 2015 | Conference paper

LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages

Sami Virpioja, Stig-Arne Grönroos | Apr 24th, 2015 | Conference paper

The Eras and Trends of Automatic Short Answer Grading

Steven Burrows, Iryna Gurevych, Benno St... | Oct 23rd, 2014 | Journal article

Automated versus human scoring: A case study in an EFL context

S.-J. Huang | Jul 24th, 2014 | Journal article

Automated Scoring of Constructed-Response Science Items: Prospects and Obstacles

Ou Lydia Liu, Chris Brew, John Blackmore... | Mar 6th, 2014 | Journal article

Content‐based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept‐based scoring tool for content‐based scoring, c‐rater™, for four science items with rubrics aiming to differentiate among multiple levels of understanding. The items showed moderate to good agreement with human scores. The findings suggest...

Using Automated Essay Scores as an Anchor When Equating Constructed Response Writing Tests

Russell G. Almond | Dec 11th, 2013 | Journal article

Exploring the Assessment of Summaries: Using Latent Semantic Analysis to Grade Summaries Written by Spanish Students

J.A. León, R. Olmos, I. Escudero | Jul 24th, 2013 | Journal article

An Application of Reverse Engineering to Automatic Item Generation: A Proof of Concept Using Automatically Generated Figures

Questar Assessment, Inc. | Apr 24th, 2013 | Report

A reverse engineering approach to automatic item generation (AIG) was applied to a figure-based publicly released test item from the Organisation for Economic Co-operation and Development (OECD) Programme for International Student Assessment (PISA) mathematical literacy cognitive instrument as part of a proof of concept. The author created an item template from which three items were randomly generated from within each of six types defined by a feature deemed to be most likely to affect item...

English language learners and automated scoring of essays: Critical considerations

Sara Cushing Weigle | Jan 24th, 2013 | Journal article