702 resources

  • Jasmin Wachter, Michael Radloff, Maja Sm... | Mar 17th, 2025 | preprint

    We introduce an Item Response Theory (IRT)-based framework to detect and quantify socioeconomic bias in large language models (LLMs) without relying on subjective human judgments. Unlike traditional methods, IRT accounts for item difficulty, improving ideological bias estimation. We fine-tune two LLM families (Meta-LLaMa 3.2-1B-Instruct and ChatGPT 3.5) to represent distinct ideological positions and introduce a two-stage approach: (1) modeling response avoidance and (2) estimating...
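
    A minimal sketch, for orientation only, of what a two-stage setup like the one described here could look like, assuming a standard 2PL parameterization and a simple Bernoulli stage for response avoidance; the function names, parameterization, and data are illustrative, not the authors' actual model.

      # Hedged sketch: stage 1 models whether the LLM avoids answering an item,
      # stage 2 applies a two-parameter logistic (2PL) IRT model to answered items.
      import numpy as np

      def irt_2pl_prob(theta, a, b):
          # theta: latent position, a: item discrimination, b: item difficulty
          return 1.0 / (1.0 + np.exp(-a * (theta - b)))

      def two_stage_log_likelihood(responses, avoided, theta, a, b, pi_avoid):
          # Stage 1: Bernoulli likelihood for avoidance with probability pi_avoid.
          ll_avoid = np.sum(np.where(avoided, np.log(pi_avoid), np.log(1.0 - pi_avoid)))
          # Stage 2: 2PL likelihood over the items that were actually answered.
          answered = ~avoided
          p = irt_2pl_prob(theta, a[answered], b[answered])
          y = responses[answered]
          ll_items = np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p))
          return ll_avoid + ll_items

      # Toy usage with random item parameters and responses.
      rng = np.random.default_rng(0)
      a, b = rng.uniform(0.5, 2.0, 20), rng.normal(0.0, 1.0, 20)
      avoided = rng.random(20) < 0.1
      responses = rng.integers(0, 2, 20)
      print(two_stage_log_likelihood(responses, avoided, theta=0.3, a=a, b=b, pi_avoid=0.1))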

  • Mar 6th, 2025 | webpage
  • Xiner Liu, Andrés Zambrano, Ryan Baker | Mar 5th, 2025 | journalArticle

    This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information), as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I...
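
    For orientation, the three prompting strategies compared here might be assembled roughly as below; the construct, code labels, prompt wording, and examples are hypothetical, not taken from the study.

      # Illustrative prompt builders for Zero-shot, Few-shot, and Few-shot-with-context coding.
      def build_prompt(strategy, construct, excerpt, examples=None, context=None):
          prompt = (f"Code the following student excerpt for the construct "
                    f"'{construct}'. Answer 1 (present) or 0 (absent).\n")
          if strategy in ("few-shot", "few-shot+context") and examples:
              prompt += "Examples:\n"
              for text, label in examples:
                  prompt += f'- "{text}" -> {label}\n'
          if strategy == "few-shot+context" and context:
              prompt += f"Context: {context}\n"
          prompt += f'Excerpt: "{excerpt}"\nLabel:'
          return prompt

      # Example: a Few-shot prompt for a hypothetical "help-seeking" construct.
      print(build_prompt("few-shot", "help-seeking",
                         "I asked the tutor for a hint before trying again.",
                         examples=[("I gave up on the problem.", 0),
                                   ("Can someone explain step 2?", 1)]))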

  • Mar 4th, 2025 | webpage

    Find out how Turnitin Clarity brings transparency and integrity insights into the student writing process in Turnitin Feedback Studio.

  • Mar 3rd, 2025 | journalArticle
  • Changrong Xiao, Wenxing Ma, Qingping Son... | Mar 3rd, 2025 | preprint

    Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable...
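
    As context for how AES results like these are typically evaluated, here is a minimal sketch comparing hypothetical LLM-assigned scores against human ratings using quadratic weighted kappa (QWK), the standard AES agreement metric; all scores below are invented placeholders.

      # QWK between human rubric scores and LLM-assigned scores (toy data).
      from sklearn.metrics import cohen_kappa_score

      human_scores = [3, 2, 4, 1, 3, 4, 2, 3]  # hypothetical rubric scores (1-4)
      llm_scores   = [3, 2, 3, 1, 4, 4, 2, 2]  # hypothetical LLM scores for the same essays

      qwk = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")
      print(f"QWK between human and LLM scores: {qwk:.3f}")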

  • Hayri Eren Suna, Mahmut Özer | Mar 1st, 2025 | journalArticle

    In recent years, artificial intelligence (AI) and machine learning (ML) algorithms have played an influential role in advancing educational assessment. As a means of improving equal opportunities in education, assessing students' learning deficiencies and developing personalized learning suggestions are considered important. Furthermore, big data-based algorithms play an increasing role in assessing students' cognitive and social-emotional development and conducting research on...

  • Jin Kyu (Justin) Kim, Michael Chua, Arma... | Feb 24th, 2025 | journalArticle

    Introduction: Multiple-choice questions (MCQs) are essential in medical education and widely used by licensing bodies. They are traditionally created with intensive human effort to ensure validity. Recent advances in AI, particularly large language models (LLMs), offer the potential to streamline this process. This study aimed to develop and test a GPT-4 model with customized instructions for generating MCQs to assess urology residents. Methods: A GPT-4 model was embedded using guidelines...
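
    A hypothetical sketch of what a customized MCQ-generation request along these lines might look like; the instruction text, topic, and output schema are assumptions, not the study's actual GPT-4 configuration.

      # Assembling a structured MCQ-generation request (illustrative only).
      import json

      mcq_instructions = (
          "You write board-style multiple-choice questions for urology residents. "
          "Each question must have one correct answer, four plausible distractors, "
          "and a brief explanation citing the relevant guideline."
      )

      def build_mcq_request(topic):
          return {
              "system": mcq_instructions,
              "user": f"Generate one multiple-choice question on: {topic}",
              "expected_format": {
                  "stem": "...",
                  "options": ["A", "B", "C", "D", "E"],
                  "correct_option": "A",
                  "explanation": "...",
              },
          }

      print(json.dumps(build_mcq_request("management of renal colic"), indent=2))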

  • Yunting Liu, Shreya Bhandari, Zachary A.... | Feb 24th, 2025 | journalArticle

    Effective educational measurement relies heavily on the curation of well‐designed item pools. However, item calibration is time consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT‐3.5, GPT‐4, Llama 2, Llama 3, Gemini‐Pro and Cohere Command R Plus) to generate responses with psychometric properties comparable to those of human respondents....
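
    A rough sketch of one way such a comparison might be run, checking whether classical item difficulties computed from LLM-simulated respondents track those from human respondents; the response matrices below are random placeholders, and the study's psychometric analysis is more involved than this.

      # Correlate per-item difficulty (proportion correct) across the two groups.
      import numpy as np
      from scipy.stats import pearsonr

      rng = np.random.default_rng(0)
      human_resp = rng.integers(0, 2, size=(200, 30))  # 200 humans x 30 items (0/1)
      llm_resp   = rng.integers(0, 2, size=(200, 30))  # 200 LLM-simulated respondents

      human_diff = human_resp.mean(axis=0)
      llm_diff = llm_resp.mean(axis=0)
      r, p = pearsonr(human_diff, llm_diff)
      print(f"Correlation of item difficulties (human vs LLM): r={r:.2f}, p={p:.3f}")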

  • Xuansheng Wu, Padmaja Pravin Saraf, Gyeo... | Feb 21st, 2025 | preprint

    Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by which LLMs assign scores remain largely unclear. It is also uncertain how closely AI's scoring process mirrors that of humans or if it adheres to the same grading criteria. To address this gap, this paper uncovers the grading rubrics that LLMs used to score...
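
    One simple, hypothetical way to elicit the criteria an LLM appears to apply while scoring, in the spirit of this line of work; the prompt wording and output format are assumptions, not the paper's rubric-extraction method.

      # Ask the model to state its criteria before committing to a score.
      def rubric_elicitation_prompt(question, response):
          return (
              f"Question: {question}\n"
              f"Student response: {response}\n"
              "Before scoring, list the criteria you will grade against, one per line. "
              "Then give a score from 0 to 3 on the final line as 'Score: N'."
          )

      print(rubric_elicitation_prompt(
          "Explain why the Moon has phases.",
          "Because the Earth blocks sunlight from reaching it."))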

  • Zhaoyi Joey Hou, Alejandro Ciuba, Xiang ... | Feb 13th, 2025 | preprint

    Automatic Essay Scoring (AES) assigns scores to student essays, reducing the grading workload for instructors. Developing a scoring system capable of handling essays across diverse prompts is challenging due to the flexibility and diverse nature of the writing task. Existing methods typically fall into two categories: supervised feature-based approaches and large language model (LLM)-based methods. Supervised feature-based approaches often achieve higher performance but require...
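
    For contrast with LLM-based scoring, a toy sketch of the supervised feature-based family mentioned here, using a few hand-crafted essay features and ridge regression; the features, essays, and scores are invented.

      # Hand-crafted features -> linear model, the classic feature-based AES recipe.
      import numpy as np
      from sklearn.linear_model import Ridge

      def essay_features(text):
          words = text.split()
          return [len(words),                                        # essay length
                  len(set(words)) / max(len(words), 1),              # type-token ratio
                  sum(len(w) for w in words) / max(len(words), 1)]   # mean word length

      essays = ["Short essay about dogs.",
                "A considerably longer essay that develops an argument with varied "
                "vocabulary and several supporting details."]
      scores = [1.0, 3.0]  # hypothetical human scores

      X = np.array([essay_features(e) for e in essays])
      model = Ridge(alpha=1.0).fit(X, scores)
      print(model.predict(np.array([essay_features("A medium-length response with some detail.")])))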

  • Hotaka Maeda, Yikai Lu | Feb 10th, 2025 | preprint

    We fine-tuned and compared several encoder-based Transformer large language models (LLMs) to predict differential item functioning (DIF) from the item text. We then applied explainable artificial intelligence (XAI) methods to these models to identify specific words associated with DIF. The data included 42,180 items designed for English language arts and mathematics summative state assessments among students in grades 3 to 11. Prediction R² ranged from .04 to .32 among eight focal and...
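
    The study fine-tunes encoder-based Transformers; as a much lighter stand-in for the same basic idea (predicting a DIF statistic from item text), the sketch below uses TF-IDF features and ridge regression on invented data.

      # Text-to-DIF regression with a bag-of-words proxy for the fine-tuned encoders.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import Ridge
      from sklearn.pipeline import make_pipeline

      item_texts = ["Solve for x: 2x + 3 = 11.",
                    "A farmer sells produce at a county fair; estimate the profit.",
                    "Read the poem and identify the metaphor in line 4."]
      dif_values = [0.01, 0.12, 0.07]  # hypothetical DIF effect sizes per item

      model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0)).fit(item_texts, dif_values)
      print(model.predict(["Estimate the cost of groceries for a family picnic."]))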

  • Obed Boateng, Bright Boateng | Jan 30th, 2025 | journalArticle

    The increasing integration of artificial intelligence and algorithmic systems in educational settings has raised critical concerns about their impact on educational equity. This paper examines the manifestation and implications of algorithmic bias across various educational domains, including admissions processes, assessment systems, and learning management platforms. Through analysis of current research and studies, we investigate how these biases can perpetuate or exacerbate existing...

  • Lei Huang, Weijiang Yu, Weitao Ma | Jan 24th, 2025 | preprint

    The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended...

  • Hannah-Beth Clark, Margaux Dowland, Laur... | Jan 21st, 2025 | journalArticle

    Designing AI tools for use in educational settings presents distinct challenges; the need for accuracy is heightened, safety is imperative, and pedagogical rigor is crucial. As a publicly funded body in the UK, Oak National Academy is in a unique position to innovate within this field, as we have a comprehensive curriculum of approximately 13,000 open education resources (OER) for all National Curriculum subjects, designed and quality-assured by expert, human teachers. This has provided the...

  • Oct 27th, 2025 | book
  • Alejandro Andrade-Lotero, Lee Becker, Jo... | Oct 27th, 2025 | conferencePaper
  • Jessica Andrews-Todd, Edith Aurora Graf,... | Oct 27th, 2025 | conferencePaper
Last update from database: 27/10/2025, 21:15 (UTC)