74 resources
-
Rahul R. Divekar, Jaimie Drozdal, Yalun ...|Jun 8th, 2018|conferencePaper
-
Su-Youn Yoon, Suma Bhat|May 22nd, 2018|journalArticle
-
Chaitanya Ramineni, David Williamson|Apr 27th, 2018|journalArticle
Notable mean score differences for the e‐rater® automated scoring engine and for humans for essays from certain demographic groups were observed for the GRE® General Test in use before the major revision of 2012, called rGRE. The use of e‐rater as a check‐score model with discrepancy thresholds prevented an adverse impact on the examinee score at the item or test level. Despite this control, there remains a need to understand the root causes of these demographically based score differences...
-
Kavita Ganesan|Mar 5th, 2018|preprint
Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human composed summaries, there are several limitations with the existing ROUGE measures in terms of capturing synonymous concepts and coverage of topics. Thus, often times...
-
Stefanie A. Wind, Edward W. Wolfe, Georg...|Jan 2nd, 2018|journalArticle
-
Evelin Amorim, Marcia Cançado, Adriano V...|Jan 22nd, 2018|conferencePaper
-
Akshay Chaturvedi, Onkar Pandit, Utpal G...|Jan 22nd, 2018|conferencePaper
-
Yaliang Li, Liuyi Yao, Nan Du|Jan 22nd, 2018|journalArticle
The past few years have witnessed the flourishing of crowdsourced medical question answering (Q&A) websites. Patients who have medical information demands tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users. However, we observe that a large portion of new medical questions cannot be answered in time or receive only a few answers from these websites. On the other hand, we notice that solved questions have great...
-
Nanyun Peng, Marjan Ghazvininejad, Jonat...|Jan 22nd, 2018|conferencePaper
-
Cynthia Rudin|Jan 22nd, 2018|journalArticle
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|Oct 2nd, 2017|journalArticle
-
Keelan Evanini, Maurice Cogan Hauck, Ken...|May 2nd, 2017|journalArticle
This report is the fifth in a series concerning English language proficiency (ELP) assessments for English learners (ELs) in kindergarten through 12th grade in the United States. The series, produced by Educational Testing Service (ETS), is intended to provide theory‐ and evidence‐based principles and recommendations for improving next‐generation ELP assessment systems, policies, and practices and to stimulate discussion on better serving K–12 EL students. The first report articulated a...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|May 22nd, 2017|journalArticle
-
Sanjeev Arora, Yingyu Liang, Tengyu Ma|Jan 22nd, 2017|conferencePaper
-
Ryan Lowe, Michael Noseworthy, Iulian Vl...|Jan 22nd, 2017|conferencePaper
-
Nitin Madnani, Anastassia Loukina, Alina...|Jan 22nd, 2017|conferencePaper
-
Amber Nigam|Jan 22nd, 2017|conferencePaper
Automated Essay Scoring (AES) has been quite popular and is being widely used. However, the lack of an appropriate methodology for rating nonnative English speakers' essays has meant a lopsided advancement in this field. In this paper, we report initial results of our experiments with nonnative AES that learns from manual evaluation of nonnative essays. For this purpose, we conducted an exercise in which essays written by nonnative English speakers in a test environment were rated both manually and...
-
Brian Riordan, Andrea Horbach, Aoife Cah...|Jan 22nd, 2017|conferencePaper
-
Ute Schmid, Christina Zeller, Tarek Beso...|Jan 22nd, 2017|bookSection
-
Joshua Wilson|Apr 22nd, 2017|journalArticle