74 resources
-
Rahul R. Divekar, Jaimie Drozdal, Yalun ...|Jun 8th, 2018|conferencePaper
-
Su-Youn Yoon, Suma Bhat|May 10th, 2018|journalArticle
-
Chaitanya Ramineni, David Williamson|Apr 27th, 2018|journalArticle
Notable mean score differences for the e‐rater® automated scoring engine and for humans for essays from certain demographic groups were observed for the GRE® General Test in use before the major revision of 2012, called rGRE. The use of e‐rater as a check‐score model with discrepancy thresholds prevented an adverse impact on the examinee score at the item or test level. Despite this control, there remains a need to understand the root causes of these demographically based score differences...
-
Kavita Ganesan|Mar 5th, 2018|preprint
Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human composed summaries, there are several limitations with the existing ROUGE measures in terms of capturing synonymous concepts and coverage of topics. Thus, often times...
-
Stefanie A. Wind, Edward W. Wolfe, Georg...|Jan 2nd, 2018|journalArticle
-
Evelin Amorim, Marcia Cançado, Adriano V...|Mar 10th, 2018|conferencePaper
-
Akshay Chaturvedi, Onkar Pandit, Utpal G...|Mar 10th, 2018|conferencePaper
-
Yaliang Li, Liuyi Yao, Nan Du|Mar 10th, 2018|journalArticle
The past few years have witnessed the flourishing of crowdsourced medical question answering (Q&A) websites. Patients who have medical information demands tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users. However, we observe that a large portion of new medical questions cannot be answered in time or receive only few answers from these websites. On the other hand, we notice that solved questions have great...
-
Nanyun Peng, Marjan Ghazvininejad, Jonat...|Mar 10th, 2018|conferencePaper
-
Cynthia Rudin|Mar 10th, 2018|journalArticle
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|Oct 2nd, 2017|journalArticle
-
Keelan Evanini, Maurice Cogan Hauck, Ken...|May 2nd, 2017|journalArticle
This report is the fifth in a series concerning English language proficiency (ELP) assessments for English learners (ELs) in kindergarten through 12th grade in the United States. The series, produced by Educational Testing Service (ETS), is intended to provide theory‐ and evidence‐based principles and recommendations for improving next‐generation ELP assessment systems, policies, and practices and to stimulate discussion on better serving K–12 EL students. The first report articulated a...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|May 10th, 2017|journalArticle
-
Sanjeev Arora, Yingyu Liang, Tengyu Ma|Mar 10th, 2017|conferencePaper
-
Ryan Lowe, Michael Noseworthy, Iulian Vl...|Mar 10th, 2017|conferencePaper
-
Nitin Madnani, Anastassia Loukina, Alina...|Mar 10th, 2017|conferencePaper
-
Amber Nigam|Mar 10th, 2017|conferencePaper
Automated Essay Scoring (AES) has been quite popular and is being widely used. However, lack of appropriate methodology for rating nonnative English speakers' essays has meant a lopsided advancement in this field. In this paper, we report initial results of our experiments with nonnative AES that learns from manual evaluation of nonnative essays. For this purpose, we conducted an exercise in which essays written by nonnative English speakers in test environment were rated both manually and...
-
Brian Riordan, Andrea Horbach, Aoife Cah...|Mar 10th, 2017|conferencePaper
-
Ute Schmid, Christina Zeller, Tarek Beso...|Mar 10th, 2017|bookSection
-
Joshua Wilson|Apr 10th, 2017|journalArticle