Full Library – Evidence Library – Artificial Intelligence in Measurement and Education

Navigating the generative AI era: Introducing the AI assessment scale for ethical GenAI assessment

Mike Perkins, Leon Furze, Jasper Roe | Apr 25th, 2023 | journalArticle

Recent developments in Generative Artificial Intelligence (GenAI) have created a paradigm shift in multiple areas of society, and the use of these technologies is likely to become a defining feature of education in coming decades. GenAI offers transformative pedagogical opportunities, while simultaneously posing ethical and academic challenges. Against this backdrop, we outline a practical, simple, and sufficiently comprehensive tool to allow for the integration of GenAI tools into...

Sabiá: Portuguese Large Language Models

Ramon Pires, Hugo Abonizio, Thales Sales... | Apr 25th, 2023 | preprint

As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models...

ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope

Partha Pratim Ray | Apr 25th, 2023 | journalArticle

In recent years, artificial intelligence (AI) and machine learning have been transforming the landscape of scientific research. Within this field, chatbot technology has advanced tremendously, especially with ChatGPT emerging as a notable AI language model. This comprehensive review delves into the background, applications, key challenges, and future directions of ChatGPT. We begin by exploring its origins, development, and underlying technology, before examining...

CLASS: A Design Framework for Building Intelligent Tutoring Systems Based on Learning Science principles

Shashank Sonkar, Naiming Liu, Debshila M... | Apr 25th, 2023 | conferencePaper

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

Mirac Suzgun, Nathan Scales, Nathanael S... | Apr 25th, 2023 | preprint

BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language models fall short of average human-rater performance, and are those tasks actually unsolvable by...

Towards Generalizable Detection of Urgency of Discussion Forum Posts

Valdemar Švábenský, Ryan S. Baker, André... | Apr 25th, 2023 | conferencePaper

Approximating Online Human Evaluation of Social Chatbots with Prompting

Ekaterina Svikhnushina, Pearl Pu | Apr 25th, 2023 | preprint

As conversational models become increasingly available to the general public, users are engaging with this technology in social interactions. Such unprecedented interaction experiences may pose considerable social and psychological risks to the users unless the technology is properly controlled. This highlights the need for scalable and robust evaluation metrics for conversational chatbots. Existing evaluation metrics aim to automate offline user evaluation and approximate human judgment of...

Auditing and Mitigating Cultural Bias in LLMs

Yan Tao, Olga Viberg, Ryan S. Baker | Apr 25th, 2023 | journalArticle

Culture fundamentally shapes people's reasoning, behavior, and communication. Generative artificial intelligence (AI) technologies may cause a shift towards a dominant culture. As people increasingly use AI to expedite and even automate various professional and personal tasks, cultural values embedded in AI models may bias authentic expression. We audit large language models for cultural bias, comparing their responses to nationally representative survey data, and evaluate country-specific...

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Jiaan Wang, Yunlong Liang, Fandong Meng,... | Apr 25th, 2023 | journalArticle

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and that NLG metrics notoriously show poor correlation with...

CRASE+® for ACT Writing Technical Report

Scott Wood | Apr 25th, 2023 | report

Rating Short L2 Essays on the CEFR Scale with GPT-4

Kevin P. Yancey, Geoffrey Laflair, Antho... | Apr 25th, 2023 | conferencePaper

Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and...

Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency

Eric Zelikman, Wanjing Anya Ma, Jasmine ... | Apr 25th, 2023 | conferencePaper

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Shuyan Zhou, Uri Alon, Sumit Agarwal | Apr 25th, 2023 | conferencePaper

Learning by Analogy: Diverse Questions Generation in Math Word Problem

Zihao Zhou, Maizhen Ning, Qiufeng Wang | Apr 25th, 2023 | conferencePaper

Using Demographic Data as Predictor Variables: a Questionable Choice

EdArXiv | Dec 19th, 2022 | report

Predictive analytics methods in education are seeing widespread use and are producing increasingly accurate predictions of students’ outcomes. With the increased use of predictive analytics comes increasing concern about fairness for specific subgroups of the population. One approach that has been proposed to increase fairness is using demographic variables directly in models, as predictors. In this paper we explore issues of fairness in the use of demographic variables as predictors of...

Less is More: Parameter-Free Text Classification with Gzip

Zhiying Jiang, Matthew Y. R. Yang, Mikha... | Dec 19th, 2022 | preprint

Deep neural networks (DNNs) are often used for text classification tasks as they usually achieve high levels of accuracy. However, DNNs can be computationally intensive with billions of parameters and large amounts of labeled data, which can make them expensive to use, to optimize and to transfer to out-of-distribution (OOD) cases in practice. In this paper, we propose a non-parametric alternative to DNNs that's easy, light-weight and universal in text classification: a combination of a...

A Graph Convolutional Network Feature Learning Framework for Interpretable Geometry Problem Solving

Fucheng Guo, Pengpeng Jian | Dec 18th, 2022 | conferencePaper

MinMax fairness: from Rawlsian Theory of Justice to solution for algorithmic bias

Flavia Barsotti, Rüya Gökhan Koçer | Nov 30th, 2022 | journalArticle

This paper presents an intuitive explanation about why and how Rawlsian Theory of Justice (Rawls in A theory of justice, Harvard University Press, Harvard, 1971) provides the foundations to a solution for algorithmic bias. The contribution of the paper is to discuss and show why Rawlsian ideas in their original form (e.g. the veil of ignorance, original position, and allowing inequalities that serve the worst-off) are relevant to operationalize fairness for algorithmic decision making. The...

AI-assisted automated scoring of picture-cued writing tasks for language assessment

Ruibin Zhao, Yipeng Zhuang, Di Zou | Nov 28th, 2022 | journalArticle
