Results – Evidence Library – Artificial Intelligence in Measurement and Education

Ben Williamson

Feb 22nd, 2024

blogPost

Over the past year or so, a narrative that AI will inevitably transform education has become widespread. You can find it in the pronouncements of investors, tech ind…

Can AI-Generated Text be Reliably Detected?

Vinu Sankar Sadasivan, Aounon Kumar, Sri...

Feb 19th, 2024

preprint

The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Peiyi Wang, Lei Li, Zhihong Shao

Feb 19th, 2024

preprint

In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...

Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation

Siyuan Wang, Zhuohan Long, Zhihao Fan

Feb 17th, 2024

preprint

This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations...

Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education

Nischal Ashok Kumar, Andrew Lan

Feb 10th, 2024

preprint

In computer science education, test cases are an integral part of programming assignments since they can be used as assessment items to test students' programming knowledge and provide personalized feedback on student-written code. The goal of our work is to propose a fully automated approach for test case generation that can accurately measure student knowledge, which is important for two reasons. First, manually constructing test cases requires expert knowledge and is a labor-intensive...

Dora Demszky - Empowering Educators via Language Technology

Jan 29th, 2024

webpage

A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education

Jacob Doughty, Zipiao Wan, Anishka Bompe...

Jan 29th, 2024

conferencePaper

Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting

Masahiro Kaneko, Danushka Bollegala, Nao...

Jan 28th, 2024

preprint

There exist both scalable tasks, like reading comprehension and fact-checking, where model performance improves with model size, and unscalable tasks, like arithmetic reasoning and symbolic reasoning, where model performance does not necessarily improve with model size. Large language models (LLMs) equipped with Chain-of-Thought (CoT) prompting are able to make accurate incremental predictions even on unscalable tasks. Unfortunately, despite their exceptional reasoning abilities, LLMs tend...

An In-Depth Review of ChatGPT’s Pros and Cons for Learning and Teaching in Education

Agariadne Dwinggo Samala, Xiaoming Zhai,...

Jan 25th, 2024

journalArticle

As technology progresses, there has been an increasing interest in using Chatbot GPT (Generative Pre-trained Transformer) in education. Chatbot GPT, or ChatGPT, gained one million users within the first week of launching in November 2022 and had amassed over 100 million active users by February 2023. This type of artificial intelligence uses natural language processing to convert it into a user. This paper presents a comprehensive analysis and review of 34 articles published on ChatGPT and...

Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning

Hunter McNichols, Wanyong Feng, Jaewook ...

Jan 11th, 2024

preprint

Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education since they are easy to administer and grade, and are a reliable form of assessment. An important aspect of MCQs is the distractors, i.e., incorrect options that are designed to target specific misconceptions or insufficient knowledge among students. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers, which has limited...

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

S. M. Towhidul Islam Tonmoy, S. M. Mehed...

Jan 8th, 2024

preprint

As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key challenge remains around their tendency to hallucinate, generating content that appears factual but is ungrounded. This issue of hallucination is arguably the biggest hindrance to safely deploying these powerful LLMs into real-world production systems that impact people's lives. The journey toward widespread adoption of LLMs in practical settings heavily relies on addressing and mitigating...

The Information Age for Education via Artificial Intelligence and Machine Learning: A Bibliometric and Systematic Literature Analysis

Hassan Abuhassna, Fareed Awae, Mahyudin ...

May 14th, 2024

journalArticle

The integration of Artificial Intelligence (AI) and Machine Learning (ML) in education is a rapidly evolving field, yet the long-term implications and actual impacts on student learning outcomes require more in-depth study. To address this gap, our study offers a novel approach combining bibliometric analysis and a Systematic Literature Review (SLR), guided by the PRISMA methodology. The first phase, a comprehensive bibliometric analysis, identified key nations, educational institutions,...

Text-based Question Difficulty Prediction: A Systematic Review of Automatic Approaches

Samah AlKhuzaey, Floriana Grasso, Terry ...

Sep 14th, 2024

journalArticle

Designing and constructing pedagogical tests that contain items (i.e. questions) which measure various types of skills for different levels of students equitably is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent, if student evaluations are to be objective and effective. Assessment quality and validity are therefore heavily reliant on the quality of the...

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Simone Balloccu, Patrícia Schmidtová, Ma...

May 14th, 2024

preprint

Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the...

Generative AI Can Harm Learning

Hamsa Bastani, Osbert Bastani, Alp Sungu...

May 14th, 2024

preprint

Generative artificial intelligence (AI) is poised to revolutionize how humans work, and has already demonstrated promise in significantly improving human productivity. However, a key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs. We study the impact of...

The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges

Okan Bulut, Maggie Beiting-Parrish, Jodi...

May 14th, 2024

preprint

The integration of artificial intelligence (AI) in educational measurement has revolutionized assessment methods, enabling automated scoring, rapid content analysis, and personalized feedback through machine learning and natural language processing. These advancements provide timely, consistent feedback and valuable insights into student performance, thereby enhancing the assessment experience. However, the deployment of AI in education also raises significant ethical concerns regarding...

Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

Zheng Chu, Jingchang Chen, Qianglong Che...

May 14th, 2024

preprint

Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLMs' reasoning capabilities, which has attracted widespread attention from both academia and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives....

A Large-Scale Corpus for Assessing Written Argumentation: Persuade 2.0

Scott Andrew Crossley, Perpetual Baffour...

May 14th, 2024

preprint

Towards a New Artificial Intelligence-based Framework for Teachers’ Online Continuous Professional Development Programs: Systematic Review.

Hamza Fakhar, Mohammed Lamrabet, Nouredd...

May 14th, 2024

journalArticle

In recent years, the Artificial Intelligence (AI) field has witnessed rapid growth, affecting diverse sectors, including education. In this systematic review of literature, we aimed to analyze studies concerning the integration of AI in the continuous professional development (CPD) of teachers in order to generate a global vision of its potential to enhance the quality of CPD programs at the international level, and to provide recommendations for its application in the Moroccan context. To...