15 resources
- Lijuan Wang, Miaomiao Zhao | Jan 22nd, 2024 | Conference Paper
- Xinyi Lu, Xu Wang | Jul 9th, 2024 | Conference Paper
Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework. A generative student profile is a function of the list of knowledge components the student has...
- Renzhe Yu, Zhen Xu, Sky CH-Wang | Nov 2nd, 2024 | Preprint
The universal availability of ChatGPT and other similar tools since late 2022 has prompted tremendous public excitement about, and experimentation with, the potential of large language models (LLMs) to improve learning experiences and outcomes, especially for learners from disadvantaged backgrounds. However, little research has systematically examined the real-world impacts of LLM availability on educational equity beyond theoretical projections and controlled studies of innovative LLM...
- Siyuan Wang, Zhuohan Long, Zhihao Fan | Feb 17th, 2024 | Preprint
This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, generating new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust, and fine-grained evaluation, we implement six reframing operations...
- Rose E. Wang, Qingyang Zhang, Carly Robi... | Jan 22nd, 2024 | Preprint
Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...
- Sunder Ali Khowaja, Parus Khuwaja, Kapal... | May 5th, 2024 | Journal Article
ChatGPT is another large language model (LLM) widely available to consumers on their devices, but due to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published to show the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on the important aspects that are mostly overlooked, i.e....
- Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | Preprint
The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...
- Rose E. Wang, Ana T. Ribeiro, Carly D. R... | Oct 3rd, 2024 | Preprint
Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, particularly where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately harms students from under-served communities, who stand to gain the most from high-quality...
- Rose E. Wang, Ana T. Ribeiro, Carly D. Robi... | Nov 25th, 2024 | Journal Article
- Peiyi Wang, Lei Li, Zhihong Shao | Jan 22nd, 2024 | Preprint
In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...
- Zheng Chu, Jingchang Chen, Qianglong Che... | Jan 22nd, 2024 | Preprint
Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLMs' reasoning capabilities, which has attracted widespread attention from both academia and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives....
- Jacob Steiss, Tamara Tate, Steve Graham,... | Jun 22nd, 2024 | Journal Article
Background: Offering students formative feedback on their writing is an effective way to facilitate writing development. Recent advances in AI (i.e., ChatGPT) may function as an automated writing evaluation tool, increasing the amount of feedback students receive and diminishing the burden on teachers to provide frequent feedback to large classes. Aims: We examined the ability of generative AI (ChatGPT) to provide formative feedback. We compared the quality of human and AI...
- Jacob Doughty, Zipiao Wan, Anishka Bompe... | Jan 29th, 2024 | Conference Paper
- Abhimanyu Dubey, Abhinav Jauhri, Abhinav... | Aug 15th, 2024 | Preprint
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language...
- Irina Jurenka, Markus Kunesch, Kevin McK... | May 14th, 2024 | Report