Results – Evidence Library – Artificial Intelligence in Measurement and Education

Hallucination is Inevitable: An Innate Limitation of Large Language Models

Ziwei Xu, Sanjay Jain, Mohan Kankanhalli...

Apr 24th, 2024

journalArticle

Hallucination has been widely recognized as a significant drawback of large language models (LLMs). Many works have attempted to reduce the extent of hallucination. These efforts have so far been mostly empirical and cannot answer the fundamental question of whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined...

One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering

Han Yang, Mingchen Li, Huixue Zhou

Dec 24th, 2023

preprint

To enhance the accuracy and reliability of diverse medical question-answering (QA) tasks and investigate efficient approaches to deploying Large Language Model (LLM) technologies, we developed a novel ensemble learning pipeline utilizing state-of-the-art LLMs, focusing on improving performance on diverse medical QA datasets. Materials and Methods: Our study employs three medical QA datasets: PubMedQA, MedQA-USMLE, and MedMCQA, each presenting unique challenges in biomedical...

NLP-Based Management of Large Multiple-Choice Test Item Repositories

Valentina Albano, Donatella Firmani, Lui...

Dec 15th, 2023

journalArticle

Multiple-choice questions (MCQs) are widely used in educational assessments and professional certification exams. Managing large repositories of MCQs, however, poses several challenges due to the high volume of questions and the need to maintain their quality and relevance over time. One of these challenges is the presence of questions that duplicate concepts but are formulated differently. Such questions can indeed elude syntactic controls but provide no added value to the repository. In...

Machine Learning Systems

Dec 14th, 2023

webpage

Meet LLM360: The First Fully Open-Source and Transparent Large Language Models (LLMs)

Sana Hassan

Dec 13th, 2023

blogPost

Open-source Large Language Models (LLMs) such as LLaMA, Falcon, and Mistral offer a range of choices for AI professionals and scholars. Yet, the majority of these LLMs have only made available select components like the end-model weights or inference scripts, with technical documents often narrowing their focus to broader design aspects and basic metrics. This approach restricts advances in the field by reducing clarity in the training methodologies of LLMs, leading to repeated efforts by...

Can AI Be Too Good to Use?

Andy Fell

Dec 12th, 2023

webpage

Much of the discussion around implementing artificial intelligence systems focuses on whether an AI application is “trustworthy”: Does it produce useful, reliable results, free of bias, while ensuring data privacy? But a new paper published Dec. 7 in Frontiers in Artificial Intelligence poses a different question: What if an AI is just too good?

Navigating the generative AI era: Introducing the AI assessment scale for ethical GenAI assessment

Mike Perkins, Leon Furze, Jasper Roe

Dec 12th, 2023

webpage

Recent developments in Generative Artificial Intelligence (GenAI) have created a paradigm shift in multiple areas of society, and the use of these technologies is likely to become a defining feature of education in coming decades. GenAI offers transformative pedagogical opportunities, while simultaneously posing ethical and academic challenges. Against this backdrop, we outline a practical, simple, and sufficiently comprehensive tool to allow for the integration of GenAI tools into...

Classification of Human- and AI-Generated Texts for English, French, German, and Spanish

Kristina Schaaff, Tim Schlippe, Lorenz M...

Dec 8th, 2023

preprint

In this paper, we analyze features to classify human- and AI-generated text for English, French, German, and Spanish and compare them across languages. We investigate two scenarios: (1) the detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. For training and testing the classifiers in this multilingual setting, we created a new text corpus covering 10 topics for each language. For the detection of AI-generated text, the combination of all proposed...

Can AI Provide Useful Holistic Essay Scoring?

Tamara Tate, Jacob Steiss, Drew Bailey

Dec 5th, 2023

preprint

Researchers have sought for decades to automate holistic essay scoring. Over the years, these programs have improved significantly. However, accuracy requires significant amounts of training on human-scored texts, reducing the expediency and usefulness of such programs for routine use by teachers across the nation on non-standardized prompts. This study analyzes the output of multiple versions of ChatGPT scoring secondary student essays from three extant corpora and compares it to quality...

What Do We Mean by GenAI? A Systematic Mapping of The Evolution, Trends, and Techniques Involved in Generative AI.

Francisco José García Peñalvo, Andrea Vá...

Dec 1st, 2023

journalArticle

Artificial Intelligence has become a focal point of interest across various sectors due to its ability to generate creative and realistic outputs. A specific subset, generative artificial intelligence, has seen significant growth, particularly in late 2022. Tools like ChatGPT, Dall-E, or Midjourney have democratized access to Large Language Models, enabling the creation of human-like content. However, the concept 'Generative Artificial Intelligence' lacks a universally accepted definition,...

Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design

Lyle Regenwetter, Akash Srivastava, Dan ...

Dec 24th, 2023

journalArticle

Human mobility and language: towards new multilingual approaches with AI

Marion Santorelli, Domenico Catullo

Dec 24th, 2023

conferencePaper

This study investigates the relationships between language and human mobility in terms of investment, accessibility, and inclusion, and how human-computer interactions and AI (Artificial Intelligence) speech translators might overcome language barriers from a multilingual perspective. After a brief analysis of population dynamics, demographic change, and migration based on European Union publications, the aim of this paper is to highlight the strong nexus between language and mobility and how it...

Math Education with Large Language Models: Peril or Promise?

Harsh Kumar, David M. Rothschild, Daniel...

Nov 22nd, 2023

preprint

The widespread availability of large language models (LLMs) has provoked both fear and excitement in the domain of education. On one hand, there is the concern that students will offload their coursework to LLMs, limiting what they themselves learn. On the other hand, there is the hope that LLMs might serve as scalable, personalized tutors. Here we conduct a large, pre-registered experiment involving 1200 participants to investigate how exposure to LLM-based explanations affects learning. In the...

AI and the Future of Skills, Volume 2: Methods for Evaluating AI Capabilities

OECD

Nov 16th, 2023

book

ChaTA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

Yann Hicke, Anmol Agarwal, Qianou Ma

Nov 13th, 2023

preprint

Responding to the thousands of student questions on online QA platforms each semester has a considerable human cost, particularly in computing courses with rapidly growing enrollments. To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) from the LLaMA-2 family to ensure data privacy. Our approach combines augmentation techniques such as retrieval augmented generation (RAG),...

Review on Neural Question Generation for Education Purposes

Said Al Faraby, Adiwijaya Adiwijaya, Ade...

Oct 31st, 2023

journalArticle

Questioning plays a vital role in education, directing knowledge construction and assessing students’ understanding. However, creating high-level questions requires significant creativity and effort. Automatic question generation is expected to facilitate the generation of not only fluent and relevant but also educationally valuable questions. While rule-based methods are intuitive for short inputs, they struggle with longer and more complex inputs. Neural question generation (NQG) has shown...

Towards social generative AI for education: theory, practices and ethics

Oct 26th, 2023

webpage

Using State-of-the-Art Speech Models to Evaluate Oral Reading Fluency in Ghana

Owen Henkel, Hannah Horne-Robinson, Libb...

Oct 26th, 2023

preprint

This paper reports on a set of three recent experiments utilizing large-scale speech models to evaluate the oral reading fluency (ORF) of students in Ghana. While ORF is a well-established measure of foundational literacy, assessing it typically requires one-on-one sessions between a student and a trained evaluator, a process that is time-consuming and costly. Automating the evaluation of ORF could support better literacy instruction, particularly in education contexts where formative...
