74 resources
-
Rahul R. Divekar, Jaimie Drozdal, Yalun ...|Jun 8th, 2018|conferencePaper
-
Su-Youn Yoon, Suma Bhat|May 1st, 2018|journalArticle
-
Chaitanya Ramineni, David Williamson|Apr 27th, 2018|journalArticle
Notable mean score differences for the e‐rater® automated scoring engine and for humans for essays from certain demographic groups were observed for the GRE® General Test in use before the major revision of 2012, called rGRE. The use of e‐rater as a check‐score model with discrepancy thresholds prevented an adverse impact on the examinee score at the item or test level. Despite this control, there remains a need to understand the root causes of these demographically based score differences...
-
Kavita Ganesan|Mar 5th, 2018|preprint
Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human composed summaries, there are several limitations with the existing ROUGE measures in terms of capturing synonymous concepts and coverage of topics. Thus, often times...
-
Stefanie A. Wind, Edward W. Wolfe, Georg...|Jan 2nd, 2018|journalArticle
-
Evelin Amorim, Marcia Cançado, Adriano V...|Dec 1st, 2018|conferencePaper
-
Akshay Chaturvedi, Onkar Pandit, Utpal G...|Dec 1st, 2018|conferencePaper
-
Yaliang Li, Liuyi Yao, Nan Du|Dec 1st, 2018|journalArticle
The past few years have witnessed the flourishing of crowdsourced medical question answering (Q&A) websites. Patients who have medical information demands tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users. However, we observe that a large portion of new medical questions cannot be answered in time or receive only few answers from these websites. On the other hand, we notice that solved questions have great...
-
Nanyun Peng, Marjan Ghazvininejad, Jonat...|Dec 1st, 2018|conferencePaper
-
Cynthia Rudin|Dec 1st, 2018|journalArticle
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|Oct 2nd, 2017|journalArticle
-
Keelan Evanini, Maurice Cogan Hauck, Ken...|May 2nd, 2017|journalArticle
This report is the fifth in a series concerning English language proficiency (ELP) assessments for English learners (ELs) in kindergarten through 12th grade in the United States. The series, produced by Educational Testing Service (ETS), is intended to provide theory‐ and evidence‐based principles and recommendations for improving next‐generation ELP assessment systems, policies, and practices and to stimulate discussion on better serving K–12 EL students. The first report articulated a...
-
Mark D. Shermis, Liyang Mao, Matthew Mul...|May 1st, 2017|journalArticle
-
Sanjeev Arora, Yingyu Liang, Tengyu Ma|Dec 1st, 2017|conferencePaper
-
Ryan Lowe, Michael Noseworthy, Iulian Vl...|Dec 1st, 2017|conferencePaper
-
Nitin Madnani, Anastassia Loukina, Alina...|Dec 1st, 2017|conferencePaper
-
Amber Nigam|Dec 1st, 2017|conferencePaper
Automated Essay Scoring (AES) has been quite popular and is being widely used. However, lack of appropriate methodology for rating nonnative English speakers' essays has meant a lopsided advancement in this field. In this paper, we report initial results of our experiments with nonnative AES that learns from manual evaluation of nonnative essays. For this purpose, we conducted an exercise in which essays written by nonnative English speakers in test environment were rated both manually and...
-
Brian Riordan, Andrea Horbach, Aoife Cah...|Dec 1st, 2017|conferencePaper
-
Ute Schmid, Christina Zeller, Tarek Beso...|Dec 1st, 2017|bookSection
-
Joshua Wilson|Apr 1st, 2017|journalArticle