74 resources
-
Rahul R. Divekar, Jaimie Drozdal, Yalun ...|Jun 8th, 2018|conferencePaper
-
Su-Youn Yoon, Suma Bhat|May 24th, 2018|journalArticle
-
Chaitanya Ramineni, David Williamson|Apr 27th, 2018|journalArticle
Notable mean score differences for the e‐rater® automated scoring engine and for humans for essays from certain demographic groups were observed for the GRE® General Test in use before the major revision of 2012, called rGRE. The use of e‐rater as a check‐score model with discrepancy thresholds prevented an adverse impact on the examinee score at the item or test level. Despite this control, there remains a need to understand the root causes of these demographically based score differences...
-
Kavita Ganesan|Mar 5th, 2018|preprint
Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human composed summaries, there are several limitations with the existing ROUGE measures in terms of capturing synonymous concepts and coverage of topics. Thus, often times...
-
Stefanie A. Wind, Edward W. Wolfe, Georg...|Jan 2nd, 2018|journalArticle
-
Evelin Amorim, Marcia Cançado, Adriano V...|Apr 24th, 2018|conferencePaper
-
Akshay Chaturvedi, Onkar Pandit, Utpal G...|Apr 24th, 2018|conferencePaper
-
Yaliang Li, Liuyi Yao, Nan Du|Apr 24th, 2018|journalArticle
The past few years have witnessed the flourishing of crowdsourced medical question answering (Q&A) websites. Patients who have medical information demands tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users. However, we observe that a large portion of new medical questions cannot be answered in time or receive only few answers from these websites. On the other hand, we notice that solved questions have great...
-
Nanyun Peng, Marjan Ghazvininejad, Jonat...|Apr 24th, 2018|conferencePaper
-
Cynthia Rudin|Apr 24th, 2018|journalArticle
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|Oct 2nd, 2017|journalArticle
-
Keelan Evanini, Maurice Cogan Hauck, Ken...|May 2nd, 2017|journalArticle
This report is the fifth in a series concerning English language proficiency (ELP) assessments for English learners (ELs) in kindergarten through 12th grade in the United States. The series, produced by Educational Testing Service (ETS), is intended to provide theory‐ and evidence‐based principles and recommendations for improving next‐generation ELP assessment systems, policies, and practices and to stimulate discussion on better serving K–12 EL students. The first report articulated a...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|May 24th, 2017|journalArticle
-
Sanjeev Arora, Yingyu Liang, Tengyu Ma|Apr 24th, 2017|conferencePaper
-
Ryan Lowe, Michael Noseworthy, Iulian Vl...|Apr 24th, 2017|conferencePaper
-
Nitin Madnani, Anastassia Loukina, Alina...|Apr 24th, 2017|conferencePaper
-
Amber Nigam|Apr 24th, 2017|conferencePaper
Automated Essay Scoring (AES) has been quite popular and is being widely used. However, lack of appropriate methodology for rating nonnative English speakers' essays has meant a lopsided advancement in this field. In this paper, we report initial results of our experiments with nonnative AES that learns from manual evaluation of nonnative essays. For this purpose, we conducted an exercise in which essays written by nonnative English speakers in test environment were rated both manually and...
-
Brian Riordan, Andrea Horbach, Aoife Cah...|Apr 24th, 2017|conferencePaper
-
Ute Schmid, Christina Zeller, Tarek Beso...|Apr 24th, 2017|bookSection
-
Joshua Wilson|Apr 24th, 2017|journalArticle