Results – Evidence Library – Artificial Intelligence in Measurement and Education

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction

EdArXiv | Jun 2nd, 2023 | report

Coaching, which involves classroom observation and expert feedback, is a widespread and fundamental part of teacher training. However, the majority of teachers do not have access to consistent, high quality coaching due to limited resources and access to expertise. We explore whether generative AI could become a cost-effective complement to expert feedback by serving as an automated teacher coach. In doing so, we propose three teacher coaching tasks for generative AI: (A) scoring transcript...

Can Artificial Intelligence Technology Promote the Improvement of Student Learning Outcomes? A Meta-Analysis Based on 50 Experimental and Quasi-Experimental Studies

Lijuan Wang, Miaomiao Zhao | Jun 1st, 2024 | conferencePaper

Is ChatGPT a Good Teacher Coach? Measuring Zero-Shot Performance For Scoring and Providing Actionable Insights on Classroom Instruction

Rose Wang, Dorottya Demszky | Jun 2nd, 2023 | preprint

Coaching, which involves classroom observation and expert feedback, is a widespread and fundamental part of teacher training. However, the majority of teachers do not have access to consistent, high quality coaching due to limited resources and access to expertise. We explore whether generative AI could become a cost-effective complement to expert feedback by serving as an automated teacher coach. In doing so, we propose three teacher coaching tasks for generative AI: (A) scoring transcript...

Learning by Analogy: Diverse Questions Generation in Math Word Problem

Zihao Zhou, Maizhen Ning, Qiufeng Wang | Jun 1st, 2023 | conferencePaper

Artificial Intelligence in Education Technologies: New Development and Innovative Practices: Proceedings of 2022 3rd International Conference on Artificial Intelligence in Education Technology

Eric C. K. Cheng, Tianchong Wang, Tim Sc... | Jun 1st, 2023 | book

Can AI-Generated Text be Reliably Detected?

Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | preprint

The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...

Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes

Rose E. Wang, Qingyang Zhang, Carly Robi... | Jun 1st, 2024 | preprint

Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Rose E Wang, Ana T Ribeiro, Carly D Robi... | journalArticle

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

Yang Liu, Dan Iter, Yichong Xu | May 23rd, 2023 | preprint

The quality of texts generated by natural language generation (NLG) systems is hard to measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human judgments, especially for tasks that require creativity and diversity. Recent studies suggest using large language models (LLMs) as reference-free metrics for NLG evaluation, which have the benefit of being applicable to new tasks that lack human references....

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

Rose E. Wang, Ana T. Ribeiro, Carly D. R... | Oct 3rd, 2024 | preprint

Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, particularly where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately harms students from under-served communities, who stand to gain the most from high-quality...

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Jiaan Wang, Yunlong Liang, Fandong Meng,... | Jun 1st, 2023 | journalArticle

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with...

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans,... | Jan 10th, 2023 | preprint

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of...

Comparing the Quality of Human and ChatGPT Feedback on Students’ Writing

Jacob Steiss, Tamara Tate, Steve Graham,... | Sep 7th, 2023 | preprint

Offering students formative feedback on drafts of their writing is an effective way to facilitate writing development. This study examined the ability of generative AI (i.e., ChatGPT) to provide formative feedback on students’ compositions. We compared the quality of human and AI feedback by scoring the feedback each provided on secondary student essays (n=200) on five measures of feedback quality: the degree to which feedback (a) was criteria-based, (b) provided clear directions for...

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Peiyi Wang, Lei Li, Zhihong Shao | Feb 19th, 2024 | preprint

In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...

Comparing the quality of human and ChatGPT feedback of students’ writing

Jacob Steiss, Tamara Tate, Steve Graham,... | Jun 1st, 2024 | journalArticle

Structured abstract. Background: Offering students formative feedback on their writing is an effective way to facilitate writing development. Recent advances in AI (i.e., ChatGPT) may function as an automated writing evaluation tool, increasing the amount of feedback students receive and diminishing the burden on teachers to provide frequent feedback to large classes. Aims: We examined the ability of generative AI (ChatGPT) to provide formative feedback. We compared the quality of human and AI...

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Lei Huang, Weijiang Yu, Weitao Ma | Nov 9th, 2023 | preprint

The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of...
