Full Library – Evidence Library – Artificial Intelligence in Measurement and Education

BARTScore: Evaluating Generated Text as Text Generation

Weizhe Yuan, Graham Neubig, Pengfei Liu,... | Jan 31st, 2021 | journalArticle

A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference...

A Primer in BERTology: What We Know About How BERT Works

Anna Rogers, Olga Kovaleva, Anna Rumshis... | Dec 31st, 2020 | journalArticle

Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.

Classification accuracy and efficiency of writing screening using automated essay scoring

Joshua Wilson, Jessica Rodrigues | Oct 31st, 2020 | journalArticle

Automated Essay Scoring and the Deep Learning Black Box: How Are Rubric Scores Determined?

Vivekanandan S. Kumar, David Boulanger | Sep 15th, 2020 | journalArticle

Automated Scoring of Chinese Grades 7–9 Students’ Competence in Interpreting and Arguing from Evidence

Cong Wang, Xiufeng Liu, Lei Wang | Sep 9th, 2020 | journalArticle

Measuring the Expressive Language and Vocabulary of Latino English Learners Using Hand Transcribed Speech Data and Automated Scoring

Makoto Sano, Doris Luft Baker, Marlen Co... | Sep 1st, 2020 | journalArticle

Purpose: Explore how different automated scoring (AS) models score reliably the expressive language and vocabulary knowledge in depth of young second grade Latino English learners. Design/methodology/approach: Analyze a total of 13,471 English utterances from 217 Latino English learners with random forest, end-to-end memory networks, long short-term memory, and other AS models. Findings: Random forest outperformed the other AS models as measured by the mean of quadratic weighted kappa (QWK =...
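For readers unfamiliar with the agreement statistic named in this abstract, the following is a minimal sketch of quadratic weighted kappa (QWK) between two sets of integer scores. It uses the standard textbook definition and is not taken from the study; the function name, arguments, and sample scores are illustrative assumptions.

```python
from collections import Counter


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement between two lists of integer scores, with quadratic penalties.

    Hypothetical helper for illustration; scores must lie in
    [min_score, max_score].
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and the same length")
    if max_score <= min_score:
        raise ValueError("need at least two score categories")

    n = len(human)
    span = max_score - min_score
    observed = Counter(zip(human, machine))  # joint counts of (human, machine)
    human_hist = Counter(human)              # marginal counts per rater
    machine_hist = Counter(machine)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = ((i - j) ** 2) / (span ** 2)
            weighted_observed += weight * observed[(i, j)] / n
            # Expected proportion if the two raters were independent.
            weighted_expected += weight * human_hist[i] * machine_hist[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters gave one identical constant score: treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Made-up scores on a 0-3 scale, purely for demonstration.
    print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance; larger score gaps are penalized more heavily than adjacent-score disagreements.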

Automated Scoring of Teachers’ Pedagogical Content Knowledge – A Comparison Between Human and Machine Scoring

Andreas Wahlen, Christiane Kuhn, Olga Zl... | Aug 25th, 2020 | journalArticle

More efficient processes for creating automated essay scoring frameworks: A demonstration of two algorithms

Jinnie Shin, Mark J. Gierl | Jul 4th, 2020 | journalArticle

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness and the performance of two AES frameworks, each based on machine learning with deep language features, or complex language features, and deep neural...

Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring

Isaac I. Bejar, Chen Li, Daniel McCaffre... | Jul 2nd, 2020 | journalArticle

Evaluations of Automated Scoring Systems in Practice

Ourania Rotou, André A. Rupp | May 26th, 2020 | journalArticle

This research report provides a description of the processes of evaluating the “deployability” of automated scoring (AS) systems from the perspective of large‐scale educational assessments in operational settings. It discusses a comprehensive psychometric evaluation that entails analyses that take into consideration the specific purpose of AS, the test design, the quality of human scores, the data collection design needed to train and evaluate the AS model, and the application of statistics...

BERTScore: Evaluating Text Generation with BERT

Tianyi Zhang, Varsha Kishore, Felix Wu | Feb 24th, 2020 | preprint

We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection...
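The token-level matching this abstract describes can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the greedy max-similarity matching and precision/recall/F1 aggregation, omits any token weighting, and uses tiny hand-made vectors in place of real contextual embeddings. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(cand_vecs, ref_vecs):
    """Precision, recall, and F1 from pairwise token similarities.

    cand_vecs / ref_vecs: one embedding vector per token (toy stand-ins for
    contextual embeddings produced by a model such as BERT).
    """
    sim = [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sim) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(sim[i][j] for i in range(len(cand_vecs)))
        for j in range(len(ref_vecs))
    ) / len(ref_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    candidate = [[1.0, 0.0]]               # one candidate token (toy vector)
    reference = [[1.0, 0.0], [0.0, 1.0]]   # two reference tokens (toy vectors)
    print(greedy_match_scores(candidate, reference))
```

In the toy example the single candidate token matches one reference token exactly, so precision is high while recall is pulled down by the unmatched reference token.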

Bias in data‐driven artificial intelligence systems—An introductory survey

Eirini Ntoutsi, Pavlos Fafalios, Ujwal G... | Feb 3rd, 2020 | journalArticle

Artificial Intelligence (AI)‐based systems are widely employed nowadays to make decisions that have far‐reaching impact on individuals and society. Their decisions might affect everyone, everywhere, and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training, and deployment to ensure social good while still...

Artificial Intelligence (AI): A Review of its Uses in Language Teaching and Learning

Zuraina Ali | Feb 1st, 2020 | journalArticle

Artificial Intelligence (AI) seems relevant in many fields nowadays because of its ability to simulate human intelligence processes on machines, in particular computer systems. This paper reviews the uses of AI in language teaching and learning, in particular the research on its application in the learning and teaching of language. A qualitative research method, specifically content analysis, is...

How technology affects instruction for English learners

Jennifer Altavilla | Sep 30th, 2020 | journalArticle

In response to federal policy and the COVID-19 pandemic, schools and districts are using technology to support students designated as English learners (ELs). However, school leaders and teachers have little guidance about how to implement technology effectively to foster these students’ language development and content instruction. To address this need, Jennifer Altavilla raises three concerns specific to technology use with ELs: (1) Technology accessibility and use are equally important,...

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization

Esin Durmus, He He, Mona Diab | Jan 31st, 2020 | conferencePaper

Should You Fine-Tune BERT for Automated Essay Scoring?

Elijah Mayfield, Alan W Black | Jan 31st, 2020 | conferencePaper

On Faithfulness and Factuality in Abstractive Summarization

Joshua Maynez, Shashi Narayan, Bernd Boh... | Jan 31st, 2020 | preprint

It is well known that the standard likelihood training and approximate decoding objectives in neural text generation models lead to less human-like responses for open-ended tasks such as language modeling and story generation. In this paper we have analyzed limitations of these models for abstractive document summarization and found that these models are highly prone to hallucinate content that is unfaithful to the input document. We conducted a large scale human evaluation of several neural...

USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation

Shikib Mehri, Maxine Eskenazi | Jan 31st, 2020 | conferencePaper
