Full Library – Evidence Library – Artificial Intelligence in Measurement and Education

English grammar multiple-choice question generation using Text-to-Text Transfer Transformer

Peerawat Chomphooyod, Atiwong Suchato, N... | Jan 1st, 2023 | journalArticle

English grammar multiple-choice questions (MCQs) can be automatically generated to reduce preparation time. Previous studies have focused on semiautomated methods that transform human-made sentences/articles into MCQs, so the number of generated questions depends on the size of a given text corpus. This study proposes an artificial intelligence-assisted MCQ generation system that increases the number of generable questions using controllable text generation...

ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope

Partha Pratim Ray | Jan 1st, 2023 | journalArticle

In recent years, artificial intelligence (AI) and machine learning have been transforming the landscape of scientific research. Among these developments, chatbot technology has advanced tremendously, especially with ChatGPT emerging as a notable AI language model. This comprehensive review delves into the background, applications, key challenges, and future directions of ChatGPT. We begin by exploring its origins, development, and underlying technology, before examining...

Responsible AI Standards - Duolingo English Test

Jill Burstein, Kevin Yancey, Klinton Bic... | Oct 20th, 2023 | document

Artificial Intelligence in Education Technologies: New Development and Innovative Practices: Proceedings of 2022 3rd International Conference on Artificial Intelligence in Education Technology

Eric C. K. Cheng, Tianchong Wang, Tim Sc... | Oct 20th, 2023 | book

How Useful are Educational Questions Generated by Large Language Models?

Sabina Elkins, Ekaterina Kochmar, Jackie... | Oct 20th, 2023 | preprint

Controllable text generation (CTG) by large language models has huge potential to transform education for teachers and students alike. Specifically, high-quality and diverse question generation can dramatically reduce the load on teachers and improve the quality of their educational content. Recent work in this domain has made progress with generation, but fails to show that real teachers judge the generated questions as sufficiently useful for the classroom setting; or if instead the...

GPTScore: Evaluate as You Desire

Jinlan Fu, See-Kiong Ng, Zhengbao Jiang,... | Oct 20th, 2023 | journalArticle

Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless, assessing the quality of the generation is an even more arduous task than the generation itself, and this issue has not been given adequate consideration recently. This paper proposes a novel evaluation framework, GPTScore, which utilizes the emergent abilities...

Bias and Fairness in Large Language Models: A Survey

Isabel O. Gallegos, Ryan A. Rossi, Joe B... | Oct 20th, 2023 | preprint

Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural...

Math Education with Large Language Models: Peril or Promise?

Harsh Kumar, David M. Rothschild, Daniel... | Oct 20th, 2023 | preprint

The widespread availability of large language models (LLMs) has provoked both fear and excitement in the domain of education. On one hand, there is the concern that students will offload their coursework to LLMs, limiting what they themselves learn. On the other hand, there is the hope that LLMs might serve as scalable, personalized tutors. Here we conduct a large, pre-registered experiment involving 1200 participants to investigate how exposure to LLM-based explanations affects learning. In the...

Applying Large Language Models and Chain-of-Thought for Automatic Scoring

Gyeong-Geon Lee, Ehsan Latif, Xuansheng ... | Oct 20th, 2023 | journalArticle

This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment...

Guidance for generative AI in education and research

Fengchun Miao, Wayne Holmes | Oct 20th, 2023 | book

Sabiá: Portuguese Large Language Models

Ramon Pires, Hugo Abonizio, Thales Sales... | Oct 20th, 2023 | preprint

As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models...

CLASS: A Design Framework for Building Intelligent Tutoring Systems Based on Learning Science principles

Shashank Sonkar, Naiming Liu, Debshila M... | Oct 20th, 2023 | conferencePaper

Towards Generalizable Detection of Urgency of Discussion Forum Posts

Valdemar Švábenský, Ryan S. Baker, André... | Oct 20th, 2023 | conferencePaper

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Jiaan Wang, Yunlong Liang, Fandong Meng,... | Oct 20th, 2023 | journalArticle

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and that NLG metrics notoriously show poor correlation with...

Rating Short L2 Essays on the CEFR Scale with GPT-4

Kevin P. Yancey, Geoffrey Laflair, Antho... | Oct 20th, 2023 | conferencePaper

Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and...

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code

Shuyan Zhou, Uri Alon, Sumit Agarwal | Oct 20th, 2023 | conferencePaper

Learning by Analogy: Diverse Questions Generation in Math Word Problem

Zihao Zhou, Maizhen Ning, Qiufeng Wang | Oct 20th, 2023 | conferencePaper

Using Demographic Data as Predictor Variables: a Questionable Choice

EdArXiv | Dec 19th, 2022 | report

Predictive analytics methods in education are seeing widespread use and are producing increasingly accurate predictions of students’ outcomes. With the increased use of predictive analytics comes increasing concern about fairness for specific subgroups of the population. One approach that has been proposed to increase fairness is using demographic variables directly in models, as predictors. In this paper we explore issues of fairness in the use of demographic variables as predictors of...
