Results – Evidence Library – Artificial Intelligence in Measurement and Education

Artificial Intelligence in Education in Cape Verde: Potential and Challenges

António P. M. Gomes, Bruno F. Gonçalves,... | Apr 24th, 2024 | conferencePaper

This research explores the potential and challenges of AI in education in Cape Verde, in order to understand how AI can be used in the teaching–learning context. A qualitative methodology was adopted, supported by a literature review. Drawing on the documents found, the benefits and potential of AI were analyzed, with reflection on the ethical implications of applying AI in education, with emphasis on privacy, data security and ethics. We...

Uncertainty in Language Models: Assessment through Rank-Calibration

Xinmeng Huang, Shuo Li, Mengxin Yu | Apr 24th, 2024 | conferencePaper

Detecting ChatGPT-generated essays in a large-scale writing assessment: Is there a bias against non-native English speakers?

Yang Jiang, Jiangang Hao, Michael Fauss,... | Aug 24th, 2024 | journalArticle

ChatGPT's ability or prompt quality: what determines the success of generating multiple-choice questions

Yavuz Selim Kıyak | Apr 24th, 2024 | journalArticle

ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam

Yavuz Selim Kıyak, Özlem Coşkun, Işıl İr... | May 24th, 2024 | journalArticle

Large Language Models in Medical Education: Comparing ChatGPT- to Human-Generated Exam Questions

Matthias Carl Laupichler, Johanna Flora ... | May 24th, 2024 | journalArticle

Problem: Creating medical exam questions is time-consuming, but well-written questions can be used for test-enhanced learning, which has been shown to have a positive effect on student learning. The automated generation of high-quality questions using large language models (LLMs), such as ChatGPT, would therefore be desirable. However, there are no current studies that compare students’ performance on LLM-generated questions to questions...

MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation

Yu Li, Shenyu Zhang, Rui Wu | Apr 24th, 2024 | preprint

Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of text generated by these models, especially open-ended text, has consistently presented a significant challenge. To address this, recent work has explored the possibility of using LLMs as evaluators. While using a single LLM as an evaluation agent shows potential, it is filled with...

Leveraging Large Language Models for NLG Evaluation: A Survey

Zhen Li, Xiaohan Xu, Tao Shen | Apr 24th, 2024 | journalArticle

In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to...

Generative AI and Its Educational Implications

Kacper Łodzikowski, Peter W. Foltz, John... | Apr 24th, 2024 | journalArticle

We discuss the implications of generative AI on education across four critical sections: the historical development of AI in education, its contemporary applications in learning, societal repercussions, and strategic recommendations for researchers. We propose ways in which generative AI can transform the educational landscape, primarily via its ability to conduct assessment of complex cognitive performances and create personalized content. We also address the challenges of effective...

Evaluating the Quality of AI-Generated Items for a Certification Exam

Alan Mead, Chenxuan Zhou | Apr 24th, 2024 | journalArticle

OpenAI’s GPT-3 model can write multiple-choice exam items. This paper reviewed the literature on automatic item generation and then described the recent history of OpenAI GPT models and their operation, and then described a methodology for generating items using these models. This study then critically evaluated GPT-3 at the task of writing multiple-choice exam items for a hypothetical psychometrics exam. We also compared two versions of the GPT-3 model (text-davinci-002 and...

An Automatic Question Usability Evaluation Toolkit

Steven Moore, Eamon Costello, Huy A. Ngu... | Apr 24th, 2024 | bookSection

ChatGPT 3.5 fails to write appropriate multiple choice practice exam questions

Alexander Ngo, Saumya Gupta, Oliver Perr... | Jan 24th, 2024 | journalArticle

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

Jon Saad-Falcon, Omar Khattab, Christoph... | Apr 24th, 2024 | preprint

Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential...

Adapting to AI: how to understand, prepare for, and innovate in a changing landscape

The Chronicle of Higher Educ... | Apr 24th, 2024 | report

Can Artificial Intelligence Technology Promote the Improvement of Student Learning Outcomes? Meta Analysis Based on 50 Experimental and Quasi Experimental Studies

Lijuan Wang, Miaomiao Zhao | Apr 24th, 2024 | conferencePaper

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

Peiyi Wang, Lei Li, Zhihong Shao | Apr 24th, 2024 | preprint

In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...

Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes

Rose E. Wang, Qingyang Zhang, Carly Robi... | Apr 24th, 2024 | preprint

Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...

The Social life of AI in Education

Ben Williamson | Mar 24th, 2024 | journalArticle

Elementary English learners’ engagement with automated feedback

Joshua Wilson, Corey Palermo, Arianto Wi... | Jun 24th, 2024 | journalArticle
