161 resources
-
Learner-Oriented Tool Using Generative Language Model for Angeles Elementary School Primary Students | Jerald Guiwan, Leeron Lacson, Michaella ... | May 6th, 2024 | preprint
In the Philippines, elementary students face a significant educational hurdle, particularly in Grade 4, where foundational competencies prove challenging to grasp. This research aims to provide a possible solution for this issue by developing and investigating the functionality of an electronic GLM-powered learner-oriented tool (EGLOT), designed to act as an educational companion that leverages a generative language model to personalize the learning experience for Grade 3 students in...
-
Joy He-Yueya, Noah D. Goodman, Emma Brun... | May 6th, 2024 | preprint
Creating effective educational materials generally requires expensive and time-consuming studies of student learning outcomes. To overcome this barrier, one idea is to build computational models of student learning and use them to optimize instructional materials. However, it is difficult to model the cognitive processes of learning dynamics. We propose an alternative approach that uses Language Models (LMs) as educational experts to assess the impact of various instructions on learning...
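A rough illustration of the "LMs as educational experts" idea summarized above (a minimal sketch under assumed details, not the paper's actual pipeline): show the judge model a candidate instructional passage together with a test question and ask whether a student who studied the passage would answer correctly, then average those judgments to compare materials. The helper ask_llm is hypothetical and stands in for whatever chat-model client is available.

# Illustrative sketch only; `ask_llm` is a hypothetical helper, not from the paper.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your preferred LLM client")

def judge_material(material: str, questions: list[str]) -> float:
    """Fraction of questions the LM judge expects a student who studied
    `material` to answer correctly."""
    correct = 0
    for q in questions:
        reply = ask_llm(
            "You are an education expert.\n"
            f"Lesson the student studied:\n{material}\n\n"
            f"Test question:\n{q}\n\n"
            "Would a typical student who studied only this lesson answer it "
            "correctly? Reply YES or NO."
        )
        correct += reply.strip().upper().startswith("YES")
    return correct / len(questions)

# Usage: rank candidate materials by the judged score, e.g.
# best = max(candidates, key=lambda m: judge_material(m, exam_questions))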
-
Hojung Kim, Changkyung Song, Jiyoung Kim... | May 6th, 2024 | journalArticle
This study presents a modified version of the Korean Elicited Imitation (EI) test, designed to resemble natural spoken language, and validates its reliability as a measure of proficiency. The study assesses the correlation between average test scores and Test of Proficiency in Korean (TOPIK) levels, examining score distributions among beginner, intermediate, and advanced learner groups. Using item response theory (IRT), the study explores the influence of four key facets—learners, items,...
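For readers unfamiliar with facet analyses, the kind of IRT model implied above is typically a many-facet Rasch model. A generic form (an illustration of the model family, not necessarily the exact specification used in this study) writes the log-odds of a rating in category k versus category k-1 as

\log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

where B_n is learner ability, D_i item difficulty, C_j rater severity, and F_k the threshold for rating category k; each facet contributes additively to the modeled score.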
-
Sunder Ali Khowaja, Parus Khuwaja, Kapal... | May 5th, 2024 | journalArticle
ChatGPT is another large language model (LLM) widely available to consumers on their devices, and thanks to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published on the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on important aspects that are mostly overlooked, i.e....
-
Hugh Zhang, Jeff Da, Dean Lee | May 3rd, 2024 | preprint
Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k...
-
Ryan Heath | May 1st, 2024 | webpage
The Pentagon is hitting the brakes on the new technology even as business is charging forward.
-
Gloria Ashiya Katuka, Alexander Gain, Ye... | May 1st, 2024 | preprint
Automatic grading and feedback have long been studied using traditional machine learning and deep learning techniques built on language models. With the recent accessibility of high-performing large language models (LLMs) like LLaMA-2, there is an opportunity to investigate the use of these LLMs for automatic grading and feedback generation. Despite the increase in performance, LLMs require significant computational resources for fine-tuning and additional specific adjustments to enhance their...
-
Jaewook Lee, Digory Smith, Simon Woodhea... | May 1st, 2024 | preprint
Multiple choice questions (MCQs) are a popular method for evaluating students’ knowledge due to their efficiency in administration and grading. Crafting high-quality math MCQs is a labor-intensive process that requires educators to formulate precise stems and plausible distractors. Recent advances in large language models (LLMs) have sparked interest in automating MCQ creation, but challenges persist in ensuring mathematical accuracy and addressing student errors. This paper introduces a...
-
Mark D. Shermis, Joshua Wilson | May 1st, 2024 | book
"The Routledge International Handbook of Automated Essay Evaluation (AEE) is a definitive guide at the intersection of automation, artificial intelligence, and education. This volume encapsulates the ongoing advancement of AEE, reflecting its application in both large-scale and classroom-based assessments to support teaching and learning endeavours"--
-
Ishaan Watts, Varun Gumma, Aditya Yadava... | May 1st, 2024 | journalArticle
Evaluation of multilingual Large Language Models (LLMs) is challenging due to a variety of factors – the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks into LLM pre-training data and the lack of local, cultural nuances in translated benchmarks. Hence, it is difficult to do extensive evaluation of LLMs in the multilingual […]
-
Siddharth Dixit, Indermit S Gill | May 1st, 2024 | report
The study explores the potential benefits of implementing artificial intelligence (AI) in seven development sectors that receive significant funding from the World Bank. The study provides an overview of the challenges faced by these sectors, including agriculture, healthcare, education, finance, energy, infrastructure, and data. The findings reveal that AI can expedite the achievement of development goals in most of these sectors. The study shows that many organizations already utilize AI...
-
Rachmi Rachmi | Apr 30th, 2024 | journalArticle
-
Danielle R. Thomas, Erin Gatz, Shivang G... | Apr 29th, 2024 | preprint
Incorporating human tutoring with AI holds promise for supporting diverse math learners. In the U.S., approximately 15% of students receive special education services, with limited previous research within AIED on the impact of AI-assisted learning among students with disabilities. Previous work combining human tutors and AI suggests that students with lower prior knowledge, such as lacking basic skills, exhibit greater learning gains compared to their more knowledgeable peers. Building upon...
-
Iason Gabriel, Arianna Manzini, Geoff Ke... | Apr 28th, 2024 | preprint
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of...
-
Apr 25th, 2024 | webpage
Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" - nlpyang/geval
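For context, the G-Eval recipe is roughly: give an evaluator LLM the task description, an evaluation criterion, auto-generated chain-of-thought evaluation steps, and the text to be judged, then ask for a 1-5 score (the paper also weights scores by the model's token probabilities). Below is a minimal sketch of that prompting pattern; it does not call the nlpyang/geval code itself, and ask_llm is a hypothetical stand-in for a GPT-4 client.

# Sketch of a G-Eval-style prompt (illustrative; not the nlpyang/geval API).
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with your GPT-4 / chat-model client")

def g_eval_coherence(source: str, summary: str) -> int:
    """Ask the evaluator model for a 1-5 coherence score for `summary`."""
    prompt = (
        "You will be given a source article and a summary of it.\n"
        "Evaluation criterion: Coherence (1-5) - the summary should be "
        "well-structured and well-organized.\n"
        "Evaluation steps:\n"
        "1. Read the source article and identify its main points.\n"
        "2. Check whether the summary presents those points clearly and in a "
        "logical order.\n"
        "3. Assign a coherence score from 1 to 5.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n\n"
        "Reply with the score only."
    )
    return int(ask_llm(prompt).strip()[0])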
-
Sankalan Pal Chowdhury, Vilém Zouhar, Mr... | Apr 25th, 2024 | preprint
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providing no guarantees. We posit that while LLMs with certain guardrails can take the place of subject...
-
Allen Nie, Yash Chandak, Miroslav Suzara... | Apr 25th, 2024 | preprint
Large language models (LLMs) are quickly being adopted in a wide range of learning experiences, especially via ubiquitous and broadly accessible chat interfaces like ChatGPT and Copilot. This type of interface is readily available to students and teachers around the world, yet relatively little research has been done to assess the impact of such generic tools on student learning. Coding education is an interesting test case, both because LLMs have strong performance on coding tasks, and...
-
Nicolò Cosimo Albanese | Apr 20th, 2024 | conferencePaper
Ensuring fidelity to source documents is crucial for the responsible use of Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. We propose a lightweight method for real-time hallucination detection, with potential to be deployed as a model-agnostic microservice to bolster reliability. Using in-context learning, our approach evaluates response factuality at the sentence level without annotated data, promoting transparency and user trust. Compared to other...
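A minimal sketch of the general sentence-level grounding check described above (an illustration under assumed details, not the paper's implementation): split the generated answer into sentences and ask a judge model, with the retrieved context in the prompt, whether each sentence is supported. ask_llm is again a hypothetical helper for whatever model client is in use.

# Illustrative sentence-level hallucination check for a RAG answer.
import re

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with your LLM client")

def unsupported_sentences(context: str, answer: str) -> list[str]:
    """Return the answer sentences the judge model does not find supported
    by the retrieved context."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    flagged = []
    for s in sentences:
        verdict = ask_llm(
            f"Context:\n{context}\n\n"
            f"Claim: {s}\n"
            "Is the claim fully supported by the context above? Answer YES or NO."
        )
        if not verdict.strip().upper().startswith("YES"):
            flagged.append(s)
    return flagged

# A response with no flagged sentences is treated as grounded; flagged
# sentences can be surfaced to the user or trigger a retry.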
-
RAND | Apr 17th, 2024 | report