702 resources
-
Chris Van Der Lee, Albert Gatt, Emiel Va... | Oct 27th, 2019 | Conference paper
-
Yuni Susanti, Takenobu Tokunaga, Hitoshi... | Oct 1st, 2018 | Journal article
-
Riccardo Guidotti, Anna Monreale, Salvat... | Aug 22nd, 2018 | Journal article
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically...
-
Nitin Madnani, Aoife Cahill | Aug 27th, 2018 | Conference paper
In this position paper, we argue that building operational automated scoring systems is a task that has disciplinary complexity above and beyond standard competitive shared tasks, which usually involve applying the latest machine learning techniques to publicly available data in order to obtain the best accuracy. Automated scoring systems warrant significant cross-discipline collaboration, of which natural language processing and machine learning are just two of many important components. Such...
-
André A. Rupp | Jul 3rd, 2018 | Journal article
-
Rahul R. Divekar, Jaimie Drozdal, Yalun ... | Jun 8th, 2018 | Conference paper
-
Su-Youn Yoon, Suma Bhat | May 27th, 2018 | Journal article
-
Chaitanya Ramineni, David Williamson | Apr 27th, 2018 | Journal article
Notable mean score differences for the e‐rater® automated scoring engine and for humans for essays from certain demographic groups were observed for the GRE® General Test in use before the major revision of 2012, called rGRE. The use of e‐rater as a check‐score model with discrepancy thresholds prevented an adverse impact on the examinee score at the item or test level. Despite this control, there remains a need to understand the root causes of these demographically based score differences...
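A check-score model, as named in this abstract, can be sketched under the assumption that the human rating is the operational score and the engine score only flags discrepancies. The function name `check_score`, the threshold value, and the rule of averaging two human ratings when a discrepancy is flagged are illustrative assumptions, not the GRE's documented procedure.

```python
from typing import Callable


def check_score(human: float, engine: float,
                get_second_human: Callable[[], float],
                threshold: float = 0.5) -> float:
    """Hypothetical check-score rule (assumed, not the official GRE procedure).

    The engine score is never reported. It only decides whether a second
    human rating must be requested.
    """
    if abs(human - engine) <= threshold:
        # Human and engine agree closely enough: report the human score as is.
        return human
    # Discrepancy exceeds the (assumed) threshold: obtain a second human
    # rating and report the mean of the two human scores (assumed rule).
    second = get_second_human()
    return (human + second) / 2


print(check_score(4.0, 4.3, lambda: 3.0))  # within threshold
print(check_score(4.0, 2.5, lambda: 3.0))  # flagged, second rater used
```

Under this rule, the engine score affects only whether a second human rating is requested, never the reported value directly.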
-
Kavita Ganesan | Mar 5th, 2018 | Preprint
Evaluation of summarization tasks is extremely crucial to determining the quality of machine-generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human-composed summaries, there are several limitations with the existing ROUGE measures in terms of capturing synonymous concepts and coverage of topics. Thus, oftentimes...
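The n-gram overlap that ROUGE measures can be made concrete with a small sketch of ROUGE-N recall: reference n-grams also found in the system summary (clipped by count), divided by the total number of reference n-grams. This is a minimal illustration assuming whitespace tokenization and lowercasing, with hypothetical function names. It is not the official ROUGE toolkit, which adds options such as stemming and stopword removal and other variants such as ROUGE-L.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 2) -> float:
    """ROUGE-N recall: clipped n-gram matches / total reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram counts as matched at most as many times
        # as it occurs in the candidate summary.
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


cand = "the cat sat on the couch"
refs = ["the cat sat on the sofa"]
print(rouge_n_recall(cand, refs, n=1))
print(rouge_n_recall(cand, refs, n=2))
```

Here the candidate differs from the reference only by the synonym pair "couch"/"sofa", yet it loses one unigram match and one bigram match, the kind of limitation the abstract points to.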
-
Stefanie A. Wind, Edward W. Wolfe, Georg... | Jan 2nd, 2018 | Journal article
-
Evelin Amorim, Marcia Cançado, Adriano V... | Oct 27th, 2018 | Conference paper
-
Akshay Chaturvedi, Onkar Pandit, Utpal G... | Oct 27th, 2018 | Conference paper
-
Yaliang Li, Liuyi Yao, Nan Du | Oct 27th, 2018 | Journal article
The past few years have witnessed the flourishing of crowdsourced medical question answering (Q&A) websites. Patients who have medical information demands tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users. However, we observe that a large portion of new medical questions cannot be answered in time or receive only a few answers from these websites. On the other hand, we notice that solved questions have great...
-
Nanyun Peng, Marjan Ghazvininejad, Jonat... | Oct 27th, 2018 | Conference paper
-
Cynthia Rudin | Oct 27th, 2018 | Journal article
Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially...
-
Mark D. Shermis, Liyang Mao, Matthew Mul... | Oct 2nd, 2017 | Journal article
-
Keelan Evanini, Maurice Cogan Hauck, Ken... | May 2nd, 2017 | Journal article
This report is the fifth in a series concerning English language proficiency (ELP) assessments for English learners (ELs) in kindergarten through 12th grade in the United States. The series, produced by Educational Testing Service (ETS), is intended to provide theory‐ and evidence‐based principles and recommendations for improving next‐generation ELP assessment systems, policies, and practices and to stimulate discussion on better serving K–12 EL students. The first report articulated a...
-
Mark D. Shermis, Liyang Mao, Matthew Mul... | May 27th, 2017 | Journal article
-
Sanjeev Arora, Yingyu Liang, Tengyu Ma | Oct 27th, 2017 | Conference paper
-
Ryan Lowe, Michael Noseworthy, Iulian Vl... | Oct 27th, 2017 | Conference paper