690 resources

  • Ryosuke Nakamoto, Brendan Flanagan, Tais... | Oct 24th, 2023 | journalArticle

    In the realm of mathematics education, self-explanation stands as a crucial learning mechanism, allowing learners to articulate their comprehension of intricate mathematical concepts and strategies. As digital learning platforms grow in prominence, there are mounting opportunities to collect and utilize mathematical self-explanations. However, these opportunities are met with challenges in automated evaluation. Automatic scoring of mathematical self-explanations is crucial for preprocessing...

  • Hyungjoo Chae, Yongho Song, Kai Tzu-iunn... | Oct 22nd, 2023 | preprint

    Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challenge. This complexity arises because such evidence is scattered across multiple turns in a...

  • Robert Friel, Atindriyo Sanyal | Oct 22nd, 2023 | preprint

    Large language models (LLMs) have experienced notable advancements in generating coherent and contextually relevant responses. However, hallucinations - incorrect or unfounded claims - are still prevalent, prompting the creation of automated metrics to detect these in LLM outputs. Our contributions include: introducing ChainPoll, an innovative hallucination detection method that excels compared to its counterparts, and unveiling RealHall, a refined collection of benchmark datasets to assess...
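    The abstract does not spell out ChainPoll's mechanics; purely as a hypothetical sketch of the polling idea behind such detectors (the judge prompt, YES/NO parsing, and poll count below are assumptions of this sketch, not the authors' specification), a hallucination score can be taken as the fraction of chain-of-thought judge runs that flag a claim:

    ```python
    def chainpoll_score(claim, context, judge, n_polls=5):
        """Poll an LLM judge `n_polls` times; return the fraction of runs
        that flag `claim` as unsupported by `context`.

        `judge` is any callable mapping a prompt string to a response string;
        the prompt wording and the naive YES/NO parsing are placeholders.
        """
        prompt = (
            f"Context: {context}\n"
            f"Claim: {claim}\n"
            "Think step by step, then answer YES if the claim is unsupported "
            "by the context, or NO if it is supported."
        )
        flags = sum(1 for _ in range(n_polls) if "YES" in judge(prompt).upper())
        return flags / n_polls
    ```

    Because `judge` is injectable, the scorer can be exercised with a stub before wiring in a real model call.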

  • Ziang Xiao, Susu Zhang, Vivian Lai | Oct 22nd, 2023 | preprint

    We address a fundamental challenge in Natural Language Generation (NLG) model evaluation -- the design and evaluation of evaluation metrics. Recognizing the limitations of existing automatic metrics and the noise introduced by how current human evaluation is conducted, we propose MetricEval, a framework informed by measurement theory, the foundation of educational test design, for conceptualizing and evaluating the reliability and validity of NLG evaluation metrics. The framework formalizes the source...
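    MetricEval's full framework is in the preprint; purely as an illustration of the measurement-theory notions the abstract invokes, validity of an automatic metric can be probed as the correlation between metric scores and human ratings on the same outputs (the toy numbers below are invented):

    ```python
    from statistics import mean

    def pearson(xs, ys):
        """Pearson correlation between two equal-length score lists."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Illustrative data: metric scores vs. human ratings for four outputs.
    metric_scores = [0.2, 0.5, 0.7, 0.9]
    human_ratings = [1, 2, 4, 5]
    validity_estimate = pearson(metric_scores, human_ratings)
    ```

    Reliability could be estimated the same way, e.g. by correlating two independent scoring runs of the same metric.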

  • Igor Nowacki | Oct 19th, 2023 | blogPost

    Stanford University and Research Institutions Release Model Transparency Index TS2 SPACE

  • Amos Azaria, Tom Mitchell | Oct 17th, 2023 | preprint

    While Large Language Models (LLMs) have shown exceptional performance in various tasks, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. In this paper, we provide evidence that the LLM's internal state can be used to reveal the truthfulness of statements. This includes both statements provided to the LLM, and statements that the LLM itself generates. Our approach is to train a classifier that outputs the probability that a statement...
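    The paper's classifier operates on real LLM hidden-layer activations; as a toy sketch of the probing idea only (the synthetic "activations", the logistic model, and the hyperparameters are stand-ins, not the authors' setup), a truthfulness probe can be fit like this:

    ```python
    import math
    import random

    def sigmoid(z):
        # Clamp the logit to avoid overflow in math.exp.
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

    def train_truth_probe(activations, labels, epochs=200, lr=0.5):
        """Fit a logistic-regression probe mapping an activation vector to
        P(statement is true), via plain stochastic gradient descent."""
        w = [0.0] * len(activations[0])
        b = 0.0
        for _ in range(epochs):
            for x, y in zip(activations, labels):
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                g = p - y  # gradient of the log-loss w.r.t. the logit
                w = [wi - lr * g * xi for wi, xi in zip(w, x)]
                b -= lr * g
        return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    # Synthetic stand-in activations: the first coordinate encodes truthfulness.
    random.seed(0)
    data = [([1.0 + random.random(), random.random()], 1) for _ in range(20)]
    data += [([-1.0 - random.random(), random.random()], 0) for _ in range(20)]
    probe = train_truth_probe([x for x, _ in data], [y for _, y in data])
    ```

    The probe then scores any new vector; in the paper's setting the inputs would be activations extracted from the LLM for a given statement.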

  • Klang E, Portugez S, Gross R | Oct 17th, 2023 | journalArticle

    Abstract Background The task of writing multiple-choice question examinations for medical students is complex, time-consuming and requires significant effort from clinical staff and faculty. Applying artificial intelligence algorithms in this field of medical education may be advisable. Methods During March to April 2023, we utilized GPT-4, an OpenAI application, to write 210 multiple-choice questions (MCQs)...

  • Dorottya Demszky, Diyi Yang, David S. Ye... | Oct 13th, 2023 | journalArticle

  • Surjodeep Sarkar, Manas Gaur, Lujie Kare... | Oct 12th, 2023 | journalArticle

    Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded global healthcare system, which receives approximately 60 million primary care visits and 6 million emergency room visits annually. These systems, developed by clinical psychologists, psychiatrists, and AI researchers, are designed to aid in Cognitive Behavioral Therapy (CBT). The main focus of VMHAs is to provide relevant information to mental health professionals (MHPs) and engage in meaningful...

  • Jon Chun, Katherine Elkins | Oct 11th, 2023 | journalArticle

  • Ryan Spring | Oct 11th, 2023 | preprint

    Abstract In order to solve the problem of teachers who forgo assigning and evaluating student writing yet do not completely trust AI raters, I created and tested a rating scheme in which an AI model rates students’ language use based on understandable criteria and humans quickly check the AI responses while rating content and structure. Teachers tried the scheme and improvements were made based on new data and newly available research. An online practice tool was also created...

  • Ning Miao, Yee Whye Teh, Tom Rainforth | Oct 5th, 2023 | preprint

    The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems that require non-linear thinking, even the strongest LLMs make mistakes. To address this, we explore whether LLMs are able to recognize errors in their own step-by-step reasoning, without resorting to external resources. To this end, we propose SelfCheck, a...
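    SelfCheck's actual checking and integration scheme is detailed in the preprint; the aggregation step can be caricatured as follows, where `check_step` stands in for the LLM-based per-step verdict, and the support/neutral/contradict scale and plain averaging are assumptions of this sketch rather than the paper's method:

    ```python
    def selfcheck_confidence(steps, check_step):
        """Combine per-step verdicts into a solution-level confidence in [0, 1].

        `check_step(step, previous_steps)` must return one of
        "support", "neutral", or "contradict" for that step.
        """
        value = {"support": 1.0, "neutral": 0.5, "contradict": 0.0}
        verdicts = [check_step(step, steps[:i]) for i, step in enumerate(steps)]
        return sum(value[v] for v in verdicts) / len(verdicts)
    ```

    A solution whose averaged confidence falls below some threshold could then be regenerated or down-weighted when voting over candidate answers.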

  • Jun Ho Choi, Oliver Garrod, Paul Atherto... | Oct 4th, 2023 | preprint

    Education systems in developing countries have few resources to serve large, poor populations. How might generative AI integrate into classrooms? This paper introduces an AI chatbot designed to assist teachers in Sierra Leone with professional development to improve their instruction. We describe initial findings from early implementation across 122 schools and 193 teachers, and analyze its use with qualitative observations and by analyzing queries. Teachers use the system for lesson...

  • Zachary Levonian, Chenglu Li, Wangda Zhu... | Oct 4th, 2023 | preprint

    For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) have led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a...

  • Luca Benedetto, Paolo Cremonesi, Andrew ... | Sep 30th, 2023 | journalArticle

    Question Difficulty Estimation from Text (QDET) is the application of Natural Language Processing techniques to the estimation of a value, either numerical or categorical, which represents the difficulty of questions in educational settings. We give an introduction to the field, build a taxonomy based on question characteristics, and present the various approaches that have been proposed in recent years, outlining opportunities for further research. This survey provides an introduction for...

  • Baphumelele Masikisiki, Vukosi Marivate,... | Sep 30th, 2023 | preprint

    Large Language Models, such as Generative Pre-trained Transformer 3 (GPT-3), have been developed to understand language through the analysis of extensive text data, allowing them to identify patterns and connections between words. While LLMs have demonstrated impressive performance across various text-related tasks, they encounter challenges in tasks associated with reasoning. To address this challenge, the Chain of Thought (CoT) prompting method has been proposed as a means to enhance LLMs'...
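    As a minimal illustration of what chain-of-thought prompting adds over direct prompting (the trigger phrase and exemplar format vary across the literature; these particular strings are placeholders), a CoT prompt prepends worked exemplars and ends with a step-by-step reasoning cue:

    ```python
    def cot_prompt(question, exemplars=()):
        """Build a chain-of-thought prompt: optional worked exemplars followed
        by the target question and a step-by-step reasoning trigger."""
        parts = []
        for q, reasoning, answer in exemplars:
            parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
        parts.append(f"Q: {question}\nA: Let's think step by step.")
        return "\n\n".join(parts)
    ```

    With an empty exemplar tuple this reduces to zero-shot CoT; with exemplars it mirrors the few-shot CoT setup the abstract alludes to.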

Last update from database: 01/12/2025, 19:15 (UTC)
Powered by Zotero and Kerko.