74 resources
-
Victoria Yaneva, Constantin Orasan, Rich...|Dec 1st, 2017|conferencePaper
-
Zhen Wang, Klaus Zechner, Yu Sun|Dec 19th, 2016|journalArticle
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish rigorous procedures for monitoring the performance of both human and automated scoring processes during operational administrations. This paper...
-
Heather Buzick, Maria Elena Oliveri, Yig...|Jul 2nd, 2016|journalArticle
-
Maurice Cogan Hauck, Mikyung Kim Wolf, R...|Apr 4th, 2016|journalArticle
This paper is the first in a series from Educational Testing Service (ETS) concerning English language proficiency (ELP) assessments for K–12 English learners (ELs). The goal of this paper, and the series, is to present research‐based ideas, principles, and recommendations for consideration by those who are conceptualizing, developing, and implementing ELP assessments for K–12 ELs and by all stakeholders in their education and assessment. We also hope to contribute to the active current...
-
Aishwarya Agrawal, Dhruv Batra, Devi Par...|Dec 1st, 2016|journalArticle
Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA). The performance of most models is clustered around 60-70%. In this paper we propose systematic methods to analyze the behavior of these models as a first step towards recognizing their strengths and weaknesses, and identifying the most fruitful directions for progress. We analyze two models, one each from two major classes of VQA models -- with-attention and without-attention...
-
Jennifer Hill, Rahul Simha|Dec 1st, 2016|conferencePaper
-
Giang Thi Linh Hoang, Antony John Kunnan...|Oct 1st, 2016|journalArticle
-
Brenden M. Lake, Ruslan Salakhutdinov, J...|Dec 11th, 2015|journalArticle
Handwritten characters drawn by a model. Not only do children learn effortlessly, they do so quickly and with a remarkable ability to use what they have learned as the raw material for creating new stuff. Lake et al. describe a computational model that learns in a similar fashion and does so better than current deep learning algorithms. The model classifies, parses, and recreates handwritten characters, and can generate new letters of the...
-
Kevin R. Raczynski, Allan S. Cohen, Geor...|Sep 1st, 2015|journalArticle
There is a large body of research on the effectiveness of rater training methods in the industrial and organizational psychology literature. Less has been reported in the measurement literature on large‐scale writing assessments. This study compared the effectiveness of two widely used rater training methods—self‐paced and collaborative frame‐of‐reference training—in the context of a large‐scale writing assessment. Sixty‐six raters were randomly assigned to the training methods. After...
-
Mark J. Gierl, Hollis Lai, Karen Fung|Aug 20th, 2015|bookSection
-
Ramakrishna Vedantam, C. Lawrence Zitnic...|Jun 1st, 2015|preprint
Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new...
-
Michael Heilman, Nitin Madnani|Dec 1st, 2015|conferencePaper
-
Sami Virpioja, Stig-Arne Grönroos|Dec 1st, 2015|conferencePaper
-
Steven Burrows, Iryna Gurevych, Benno St...|Oct 23rd, 2014|journalArticle
-
S.-J Huang|Jul 1st, 2014|journalArticle
-
Ou Lydia Liu, Chris Brew, John Blackmore...|Mar 6th, 2014|journalArticle
Content‐based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept‐based scoring tool for content‐based scoring, c‐rater™, for four science items with rubrics aiming to differentiate among multiple levels of understanding. The items showed moderate to good agreement with human scores. The findings suggest...
-
Russell G. Almond|Dec 11th, 2013|journalArticle
-
J.A. León, R. Olmos, I. Escudero|Jul 1st, 2013|journalArticle
-
Questar Assessment, Inc.|Apr 1st, 2013|report
A reverse engineering approach to automatic item generation (AIG) was applied to a figure-based publicly released test item from the Organisation for Economic Cooperation and Development (OECD) Programme for International Student Assessment (PISA) mathematical literacy cognitive instrument as part of a proof of concept. The author created an item template from which three items were randomly generated from within each of six types defined by a feature deemed to be most likely to affect item...
-
Sara Cushing Weigle|Jan 1st, 2013|journalArticle