702 resources

  • Jinlan Fu, See-Kiong Ng, Zhengbao Jiang,... | Oct 28th, 2023 | journalArticle

    Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless, assessing the quality of the generation is an even more arduous task than the generation itself, and this issue has not been given adequate consideration recently. This paper proposes a novel evaluation framework, GPTScore, which utilizes the emergent abilities...
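
    The underlying mechanism of this style of LLM-based evaluation can be pictured as scoring a candidate text by the average log-probability a causal language model assigns to its tokens, conditioned on an instruction-style evaluation prompt. The snippet below is a minimal sketch under that assumption; GPT-2 stands in for the larger models, and the prompt wording is invented rather than taken from the paper.

      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      # GPT-2 is only a stand-in model for illustration.
      tokenizer = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2")
      model.eval()

      def conditional_score(prompt: str, candidate: str) -> float:
          """Average log-probability of the candidate tokens given the prompt."""
          prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
          cand_ids = tokenizer(candidate, return_tensors="pt").input_ids
          input_ids = torch.cat([prompt_ids, cand_ids], dim=1)
          with torch.no_grad():
              logits = model(input_ids).logits
          # Position i predicts token i + 1, so shift and keep candidate positions.
          log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
          cand_log_probs = log_probs[:, prompt_ids.shape[1] - 1 :, :]
          token_lp = cand_log_probs.gather(2, cand_ids.unsqueeze(-1)).squeeze(-1)
          return token_lp.mean().item()

      # Hypothetical fluency-style evaluation prompt; higher scores are better.
      prompt = "Rewrite the sentence fluently.\nSentence: cat sat mat\nRewrite:"
      print(conditional_score(prompt, " The cat sat on the mat."))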

  • Isabel O. Gallegos, Ryan A. Rossi, Joe B... | Oct 28th, 2023 | preprint

    Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural...

  • Lifeng Han, Gleb Erofeev, Irina Sorokina... | Oct 28th, 2023 | preprint

    Massively multilingual pre-trained language models (MMPLMs) have been developed in recent years, demonstrating strong capabilities and the pre-knowledge they acquire for downstream tasks. This work investigates whether MMPLMs can be applied to clinical domain machine translation (MT) towards entirely unseen languages via transfer learning. We carry out an experimental investigation using Meta-AI's MMPLMs "wmt21-dense-24-wide-en-X and X-en (WMT21fb)", which were pre-trained on 7 language pairs and 14...

  • Ehsan Latif, Xiaoming Zhai | Oct 28th, 2023 | journalArticle

    This study highlights the potential of fine-tuned ChatGPT (GPT-3.5) for automatically scoring student-written constructed responses using example assessment tasks in science education. Recent studies on OpenAI's generative model GPT-3.5 proved its superiority in predicting natural language with high accuracy and generating human-like responses. GPT-3.5 has been trained on enormous online language materials such as journals and Wikipedia; therefore, more than direct usage of pre-trained GPT-3.5 is...
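
    As a rough sketch of what such fine-tuning can look like in practice (not the study's actual data or pipeline), the snippet below builds chat-format JSONL training examples that pair a hypothetical assessment task and student response with an expert score, then submits a fine-tuning job through OpenAI's Python client.

      import json
      from openai import OpenAI

      # Hypothetical expert-scored examples; real data would come from the
      # assessment tasks and human raters described in the study.
      examples = [
          {"task": "Explain why ice floats on water.",
           "response": "Ice is less dense than liquid water.", "score": 2},
          {"task": "Explain why ice floats on water.",
           "response": "Because it is cold.", "score": 0},
      ]

      with open("scoring_train.jsonl", "w") as f:
          for ex in examples:
              record = {"messages": [
                  {"role": "system", "content": "Score the response from 0 to 2."},
                  {"role": "user",
                   "content": f"Task: {ex['task']}\nResponse: {ex['response']}"},
                  {"role": "assistant", "content": str(ex["score"])},
              ]}
              f.write(json.dumps(record) + "\n")

      client = OpenAI()  # assumes OPENAI_API_KEY is set
      uploaded = client.files.create(file=open("scoring_train.jsonl", "rb"),
                                     purpose="fine-tune")
      job = client.fine_tuning.jobs.create(training_file=uploaded.id,
                                           model="gpt-3.5-turbo")
      print(job.id)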

  • Gyeong-Geon Lee, Ehsan Latif, Xuansheng ... | Oct 28th, 2023 | journalArticle

    This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment...
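
    The prompting side of such a pipeline can be pictured as a rubric plus an explicit request for step-by-step reasoning before the final score. The sketch below makes that assumption with an invented rubric and the OpenAI chat API; it is not the authors' implementation.

      from openai import OpenAI

      client = OpenAI()  # assumes OPENAI_API_KEY is set

      # Hypothetical three-level rubric for a constructed-response item.
      RUBRIC = (
          "2: states the claim and supports it with correct evidence\n"
          "1: states the claim but the evidence is missing or flawed\n"
          "0: off-topic or scientifically incorrect"
      )

      def score_with_cot(task: str, student_response: str) -> str:
          prompt = (
              f"Assessment task:\n{task}\n\nRubric:\n{RUBRIC}\n\n"
              f"Student response:\n{student_response}\n\n"
              "Reason step by step about which rubric level fits, "
              "then end with a line 'Score: <0-2>'."
          )
          reply = client.chat.completions.create(
              model="gpt-4",
              messages=[{"role": "user", "content": prompt}],
              temperature=0,
          )
          return reply.choices[0].message.content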

  • Zachary Levonian, Chenglu Li, Wangda Zhu... | Oct 28th, 2023 | journalArticle

    For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) have led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a...

  • Euan D Lindsay, Aditya Johri, Johannes B... | Oct 28th, 2023 | journalArticle

    Providing rich feedback to students is essential for supporting student learning. Recent advances in generative AI, particularly within large language modelling (LLM), provide the opportunity to deliver repeatable, scalable and instant automatically generated feedback to students, making abundant a previously scarce and expensive learning resource. Such an approach is feasible from a technical perspective due to these recent advances in Artificial Intelligence (AI) and Natural Language...

  • Yang Liu, Dan Iter, Yichong Xu | Oct 28th, 2023 | preprint

    The quality of texts generated by natural language generation (NLG) systems is hard to measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human judgments, especially for tasks that require creativity and diversity. Recent studies suggest using large language models (LLMs) as reference-free metrics for NLG evaluation, which have the benefit of being applicable to new tasks that lack human references....
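
    As a rough illustration of the reference-free idea (not this paper's exact procedure), a chat model can be asked to rate a generated text against its source on a fixed scale; sampling several judgments and averaging the parsed scores reduces variance. The prompt wording below is invented.

      import re
      from openai import OpenAI

      client = OpenAI()  # assumes OPENAI_API_KEY is set

      def rate_coherence(source: str, summary: str, n_samples: int = 5) -> float:
          prompt = (
              "Rate the coherence of the summary with respect to the source on a "
              "scale from 1 (incoherent) to 5 (fully coherent). "
              "Answer with a single number.\n\n"
              f"Source:\n{source}\n\nSummary:\n{summary}"
          )
          reply = client.chat.completions.create(
              model="gpt-4",
              messages=[{"role": "user", "content": prompt}],
              temperature=1.0,
              n=n_samples,
          )
          scores = [int(m.group()) for c in reply.choices
                    if (m := re.search(r"[1-5]", c.message.content))]
          return sum(scores) / len(scores) if scores else float("nan")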

  • Weicheng Ma, Henry Scheible, Brian Wang,... | Oct 28th, 2023 | conferencePaper

    Warning: This paper contains content that is stereotypical and may be upsetting. This paper addresses the issue of demographic stereotypes present in Transformer-based pre-trained language models (PLMs) and aims to deepen our understanding of how these biases are encoded in these models. To accomplish this, we introduce an easy-to-use framework for examining the stereotype-encoding behavior of PLMs through a combination of model probing and textual analyses. Our findings reveal that a small...
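
    A much simpler probe in the same spirit (not the framework introduced in this paper) compares a masked language model's fill-in predictions for templates that differ only in a demographic term; the template and model below are illustrative.

      from transformers import pipeline

      # bert-base-uncased is an illustrative PLM; any masked LM would do.
      fill = pipeline("fill-mask", model="bert-base-uncased")

      templates = [
          "The man worked as a [MASK].",
          "The woman worked as a [MASK].",
      ]

      # Diverging top completions across templates hint at encoded stereotypes.
      for template in templates:
          top = fill(template, top_k=5)
          print(template, [(r["token_str"], round(r["score"], 3)) for r in top])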

  • Nick McKenna, Tianyi Li, Liang Cheng | Oct 28th, 2023 | preprint

    Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First,...

  • Hunter McNichols, Wanyong Feng, Jaewook ... | Oct 28th, 2023 | journalArticle

    Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer and grade, and are a reliable format in both assessments and practices. An important aspect of MCQs is the distractors, i.e., incorrect options that are designed to target specific misconceptions or insufficient knowledge among students. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content...

  • Fengchun Miao, Wayne Holmes | Oct 28th, 2023 | book
  • Ethan R. Mollick, Lilach Mollick | Oct 28th, 2023 | preprint

    This paper provides guidance for using AI to quickly and easily implement evidence-based teaching strategies that instructors can integrate into their teaching. We discuss five teaching strategies that have proven value but are hard to implement in practice due to time and effort constraints. We show how AI can help instructors create material that supports these strategies and improve student learning. The strategies include providing multiple examples and explanations; uncovering and...

  • Steven Moore, Huy A. Nguyen, Tianying Ch... | Oct 28th, 2023 | bookSection
  • Arun Balajiee Lekshmi Narayanan, Ligia E... | Oct 28th, 2023 | journalArticle

    When caregivers ask open-ended questions to motivate dialogue with children, it facilitates the child's reading comprehension skills. Although there is scope for using technological tools, referred to here as "intelligent tutoring systems", to scaffold this process, it is currently unclear whether existing intelligent systems that generate human-language-like questions are beneficial. Additionally, training data used in the development of these automated question generation systems is...

  • Arun Balajiee Lekshmi Narayanan, Rully A... | Oct 28th, 2023 | journalArticle

    Digital textbooks have become an integral part of everyday learning tasks. In this work, we consider the use of digital textbooks for programming classes. Generally, students struggle to make the most of programming textbooks, a possible reason being that the example programs provided to illustrate concepts in these textbooks don't offer sufficient interactivity, and are therefore not sufficiently motivating for students to explore or understand these programming examples...

  • Andy Nguyen, Ha Ngan Ngo, Yvonne Hong | Apr 28th, 2023 | journalArticle

    The advancement of artificial intelligence in education (AIED) has the potential to transform the educational landscape and influence the role of all involved stakeholders. In recent years, the applications of AIED have been gradually adopted to progress our understanding of students’ learning and enhance learning performance and experience. However, the adoption of AIED has led to increasing ethical risks and concerns regarding several aspects such as personal data and...

Last update from database: 28/10/2025, 20:15 (UTC)