-
Weizhe Yuan, Graham Neubig, Pengfei Liu,...|Oct 27th, 2021|journalArticle
A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference...
-
Anna Rogers, Olga Kovaleva, Anna Rumshis...|Dec 27th, 2020|journalArticle
Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
-
Joshua Wilson, Jessica Rodrigues|Oct 27th, 2020|journalArticle
-
Vivekanandan S. Kumar, David Boulanger|Sep 15th, 2020|journalArticle
-
Cong Wang, Xiufeng Liu, Lei Wang|Sep 9th, 2020|journalArticle
-
Makoto Sano, Doris Luft Baker, Marlen Co...|Sep 1st, 2020|journalArticle
Purpose: Explore how different automated scoring (AS) models score reliably the expressive language and vocabulary knowledge in depth of young second grade Latino English learners. Design/methodology/approach: Analyze a total of 13,471 English utterances from 217 Latino English learners with random forest, end-to-end memory networks, long short-term memory, and other AS models. Findings: Random forest outperformed the other AS models as measured by the mean of quadratic weighted kappa (QWK =...
-
Andreas Wahlen, Christiane Kuhn, Olga Zl...|Aug 25th, 2020|journalArticle
-
Jinnie Shin, Mark J. Gierl|Jul 4th, 2020|journalArticle
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness and the performance of two AES frameworks, each based on machine learning with deep language features, or complex language features, and deep neural...
-
Isaac I. Bejar, Chen Li, Daniel McCaffre...|Jul 2nd, 2020|journalArticle
-
Ourania Rotou, André A. Rupp|May 26th, 2020|journalArticle
This research report provides a description of the processes of evaluating the “deployability” of automated scoring (AS) systems from the perspective of large‐scale educational assessments in operational settings. It discusses a comprehensive psychometric evaluation that entails analyses that take into consideration the specific purpose of AS, the test design, the quality of human scores, the data collection design needed to train and evaluate the AS model, and the application of statistics...
-
Tianyi Zhang, Varsha Kishore, Felix Wu|Feb 24th, 2020|preprint
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection...
-
Eirini Ntoutsi, Pavlos Fafalios, Ujwal G...|Feb 3rd, 2020|journalArticle
Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still...
-
Zuraina Ali|Feb 1st, 2020|journalArticle
The use of Artificial Intelligence (AI) seems to be relevant in many fields nowadays due to its ability to simulate human intelligence processes on machines, in particular computer systems. This paper reviews the uses of AI in language teaching and learning, focusing on research into its application in the learning and teaching of language. Qualitative research method, specifically content analysis, is...
-
Jennifer Altavilla|Sep 27th, 2020|journalArticle
In response to federal policy and the COVID-19 pandemic, schools and districts are using technology to support students designated as English learners (ELs). However, school leaders and teachers have little guidance about how to implement technology effectively to foster these students’ language development and content instruction. To address this need, Jennifer Altavilla raises three concerns specific to technology use with ELs: (1) Technology accessibility and use are equally important,...
-
Esin Durmus, He He, Mona Diab|Oct 27th, 2020|conferencePaper
-
Elijah Mayfield, Alan W Black|Oct 27th, 2020|conferencePaper
-
Joshua Maynez, Shashi Narayan, Bernd Boh...|Oct 27th, 2020|preprint
It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation. In this paper we have analyzed limitations of these models for abstractive document summarization and found that these models are highly prone to hallucinate content that is unfaithful to the input document. We conducted a large scale human evaluation of several neural...
-
Shikib Mehri, Maxine Eskenazi|Oct 27th, 2020|conferencePaper
-
Shikib Mehri, Maxine Eskenazi|Oct 27th, 2020|conferencePaper