269 resources

  • Lijuan Wang, Miaomiao Zhao | Dec 26th, 2024 | conferencePaper
  • Rose E. Wang, Qingyang Zhang, Carly Robi... | Dec 26th, 2024 | preprint

    Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...
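
    The abstract is cut off before the method details, but the general shape it describes, eliciting expert-style decisions before generating a response, can be sketched. Below is a hedged Python illustration; the `llm` placeholder, the decision categories, and the prompt wording are assumptions made for illustration, not Bridge's actual taxonomy or code.

```python
# Hedged sketch of decision-structured remediation in the spirit of
# Bridge; decision categories and prompts are illustrative assumptions.
def llm(prompt):
    return "[model output]"  # placeholder for a real LLM call

def remediate(problem, student_answer):
    # Step 1: ask the model to make expert-style diagnostic decisions.
    diagnosis = llm(
        f"Problem: {problem}\nStudent answer: {student_answer}\n"
        "Decide: (1) what kind of error the student made, "
        "(2) which tutoring strategy to use, and (3) why.")
    # Step 2: condition the tutoring response on those decisions.
    return llm(
        f"Problem: {problem}\nStudent answer: {student_answer}\n"
        f"Expert decisions: {diagnosis}\n"
        "Write a short tutoring response that enacts these decisions.")

print(remediate("What is 1/2 + 1/3?", "2/5"))
```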

  • Ziwei Xu, Sanjay Jain, Mohan Kankanhalli... | Dec 26th, 2024 | journalArticle

    Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have so far been mostly empirical and cannot answer the fundamental question of whether hallucination can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined...
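
    To make the flavor of such an impossibility result concrete, here is a hedged LaTeX sketch of the kind of formalization the abstract alludes to; the notation is illustrative, not the paper's exact definitions.

```latex
% Illustrative formalization (assumed notation, not the authors' own).
% View an LLM as a computable function h : S -> T over input strings,
% and fix a ground-truth function f : S -> T. Then h hallucinates
% exactly on the inputs where it disagrees with the ground truth:
\[
  \mathrm{Hal}(h, f) \;=\; \{\, s \in S \;:\; h(s) \neq f(s) \,\}
\]
% Diagonalization-style sketch: given any computable enumeration
% h_1, h_2, \ldots of LLMs, choose f with f(s_i) \neq h_i(s_i) for
% some input s_i per model. Then \mathrm{Hal}(h_i, f) \neq \emptyset
% for every i, i.e., no model in the enumeration is hallucination-free.
```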

  • Sana Hassan | Dec 13th, 2023 | blogPost

    Open-source Large Language Models (LLMs) such as LLaMA, Falcon, and Mistral offer a range of choices for AI professionals and scholars. Yet, the majority of these LLMs have only made available select components like the end-model weights or inference scripts, with technical documents often narrowing their focus to broader design aspects and basic metrics. This approach restricts advances in the field by reducing clarity in the training methodologies of LLMs, leading to repeated efforts by...

  • Andy Fell | Dec 12th, 2023 | webpage

    Much of the discussion around implementing artificial intelligence systems focuses on whether an AI application is “trustworthy”: Does it produce useful, reliable results, free of bias, while ensuring data privacy? But a new paper published Dec. 7 in Frontiers in Artificial Intelligence poses a different question: What if an AI is just too good?

  • Mike Perkins, Leon Furze, Jasper Roe | Dec 12th, 2023 | webpage

    Recent developments in Generative Artificial Intelligence (GenAI) have created a paradigm shift in multiple areas of society, and the use of these technologies is likely to become a defining feature of education in coming decades. GenAI offers transformative pedagogical opportunities, while simultaneously posing ethical and academic challenges. Against this backdrop, we outline a practical, simple, and sufficiently comprehensive tool to allow for the integration of GenAI tools into...

  • Tamara Tate, Jacob Steiss, Drew Bailey | Dec 5th, 2023 | preprint

    Researchers have sought for decades to automate holistic essay scoring. Over the years, these programs have improved significantly. However, accuracy requires significant amounts of training on human-scored texts—reducing the expediency and usefulness of such programs for routine uses by teachers across the nation on non-standardized prompts. This study analyzes the output of multiple versions of ChatGPT scoring of secondary student essays from three extant corpora and compares it to quality...
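
    As a concrete illustration of how agreement between model and human essay scores is commonly quantified, here is a minimal Python sketch using quadratic weighted kappa, a standard statistic for ordinal ratings; the scores below are invented toy data, and this is not the study's code.

```python
# Toy comparison of human vs. LLM holistic essay scores via quadratic
# weighted kappa (QWK); all data here are hypothetical.
import numpy as np

def quadratic_weighted_kappa(a, b, n_levels):
    """Agreement between two raters on an ordinal 1..n_levels scale."""
    a, b = np.asarray(a) - 1, np.asarray(b) - 1
    observed = np.zeros((n_levels, n_levels))
    for i, j in zip(a, b):
        observed[i, j] += 1
    expected = np.outer(np.bincount(a, minlength=n_levels),
                        np.bincount(b, minlength=n_levels)) / len(a)
    weights = np.fromfunction(
        lambda i, j: ((i - j) ** 2) / (n_levels - 1) ** 2,
        (n_levels, n_levels))
    return 1 - (weights * observed).sum() / (weights * expected).sum()

human = [4, 3, 5, 2, 4, 3, 1, 5]   # hypothetical human holistic scores
model = [4, 3, 4, 2, 5, 3, 2, 5]   # hypothetical ChatGPT scores
print(quadratic_weighted_kappa(human, model, n_levels=6))
```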

  • Lyle Regenwetter, Akash Srivastava, Dan ... | Dec 26th, 2023 | journalArticle
  • Harsh Kumar, David M. Rothschild, Daniel... | Nov 22nd, 2023 | preprint

    The widespread availability of large language models (LLMs) has provoked both fear and excitement in the domain of education. On one hand, there is the concern that students will offload their coursework to LLMs, limiting what they themselves learn. On the other hand, there is the hope that LLMs might serve as scalable, personalized tutors. Here we conduct a large, pre-registered experiment involving 1200 participants to investigate how exposure to LLM-based explanations affects learning. In the...

  • Yann Hicke, Anmol Agarwal, Qianou Ma | Nov 13th, 2023 | preprint

    Responding to the thousands of student questions on online QA platforms each semester has a considerable human cost, particularly in computing courses with rapidly growing enrollments. To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) from the LLaMA-2 family to ensure data privacy. Our approach combines augmentation techniques such as retrieval augmented generation (RAG),...
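
    A minimal sketch of the retrieval-augmented generation pattern the abstract names is shown below. Everything in it is an assumption for illustration: the bag-of-words retriever stands in for a real embedding model, and `generate` stands in for a locally hosted LLaMA-2-family model kept on-premises for data privacy.

```python
# Toy RAG pipeline: retrieve the most relevant course documents for a
# question, then condition generation on them. The retriever and
# generator here are deliberately simplistic stand-ins.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())   # bag-of-words stand-in

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs, k=2):
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt):
    # Placeholder for a local LLaMA-2 call (e.g., via llama.cpp or vLLM).
    return f"[model answer conditioned on]\n{prompt}"

course_docs = [
    "Office hours are Tuesdays 3-5pm in Room 204.",
    "Assignment 2 covers recursion and is due Friday.",
    "The midterm is closed-book and covers weeks 1-6.",
]
question = "When is assignment 2 due?"
context = "\n".join(retrieve(question, course_docs))
print(generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```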

  • Lei Huang, Weijiang Yu, Weitao Ma | Nov 9th, 2023 | preprint

    The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of...

  • Nick McKenna, Tianyi Li, Liang Cheng | Oct 22nd, 2023 | preprint

    Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization. We present a series of behavioral studies on several LLM families (LLaMA, GPT-3.5, and PaLM) which probe their behavior using controlled experiments. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First,...
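
    The abstract truncates before naming the two biases, but the style of controlled probing it describes can be sketched: hold the premise fixed, vary whether the hypothesis matches likely pretraining knowledge, and see whether the model's judgment tracks the premise or its prior. The prompt wording and examples below are assumptions, not the paper's materials.

```python
# Toy controlled NLI probes: one entailed pair with novel (likely
# unattested) content, one non-entailed pair with well-attested world
# knowledge. A premise-faithful model should answer yes then no.
def nli_prompt(premise, hypothesis):
    return (f"Premise: {premise}\n"
            f"Hypothesis: {hypothesis}\n"
            "Question: Does the premise entail the hypothesis? "
            "Answer yes or no.")

probes = [
    # Entailed, but unlikely to appear in pretraining data:
    ("Zorblat City is the capital of Freedonia.",
     "Freedonia's capital is Zorblat City."),
    # Not entailed by the premise, but attested world knowledge:
    ("A city in France held a festival.",
     "Paris is the capital of France."),
]
for premise, hypothesis in probes:
    print(nli_prompt(premise, hypothesis), end="\n\n")
```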

  • Ziang Xiao, Susu Zhang, Vivian Lai | Oct 22nd, 2023 | preprint

    We address a fundamental challenge in Natural Language Generation (NLG) model evaluation -- the design and evaluation of evaluation metrics. Recognizing the limitations of existing automatic metrics and the noise introduced by how human evaluation is currently conducted, we propose MetricEval, a framework informed by measurement theory, the foundation of educational test design, for conceptualizing and evaluating the reliability and validity of NLG evaluation metrics. The framework formalizes the source...
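
    As a hedged illustration of the measurement-theory vocabulary the abstract invokes, the Python sketch below treats reliability as a metric's agreement with itself across repeated runs and validity as its correlation with human judgments, on synthetic data; this is not MetricEval's actual formalization.

```python
# Synthetic demonstration of test-retest reliability and convergent
# validity for an NLG evaluation metric; all quantities are simulated.
import numpy as np

rng = np.random.default_rng(0)
true_quality = rng.normal(size=50)                    # latent text quality
metric_run_1 = true_quality + rng.normal(0, 0.3, 50)  # noisy metric, run 1
metric_run_2 = true_quality + rng.normal(0, 0.3, 50)  # noisy metric, run 2
human_scores = true_quality + rng.normal(0, 0.5, 50)  # noisy human ratings

reliability = np.corrcoef(metric_run_1, metric_run_2)[0, 1]
validity = np.corrcoef(metric_run_1, human_scores)[0, 1]
print(f"test-retest reliability: {reliability:.2f}")
print(f"convergent validity vs. humans: {validity:.2f}")
```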

  • Igor Nowacki | Oct 19th, 2023 | blogPost

    Stanford University and Research Institutions Release Model Transparency Index

  • Amos Azaria, Tom Mitchell | Oct 17th, 2023 | preprint

    While Large Language Models (LLMs) have shown exceptional performance in various tasks, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. In this paper, we provide evidence that the LLM's internal state can be used to reveal the truthfulness of statements. This includes both statements provided to the LLM, and statements that the LLM itself generates. Our approach is to train a classifier that outputs the probability that a statement...
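
    The general recipe the abstract describes, training a lightweight classifier on hidden-state activations to predict truthfulness, can be sketched as below. The activations are synthetic stand-ins; in practice they would be extracted from an intermediate layer of the LLM for each statement.

```python
# Toy truthfulness probe: logistic regression over (synthetic)
# hidden-state activations labeled true/false. Real usage would
# replace `activations` with per-statement LLM layer activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim = 400, 64
direction = rng.normal(size=dim)        # pretend "truth direction"
labels = rng.integers(0, 2, n)          # 1 = true statement, 0 = false
activations = rng.normal(size=(n, dim)) + np.outer(labels - 0.5, direction)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("truthfulness probe accuracy:", clf.score(X_test, y_test))
```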

  • Surjodeep Sarkar, Manas Gaur, Lujie Kare... | Oct 12th, 2023 | journalArticle

    Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded global healthcare system, which receives approximately 60 million primary care visits and 6 million emergency room visits annually. These systems, developed by clinical psychologists, psychiatrists, and AI researchers, are designed to aid in Cognitive Behavioral Therapy (CBT). The main focus of VMHAs is to provide relevant information to mental health professionals (MHPs) and engage in meaningful...

  • Jon Chun, Katherine Elkins | Oct 11th, 2023 | journalArticle
  • Ning Miao, Yee Whye Teh, Tom Rainforth | Oct 5th, 2023 | preprint

    The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems that require non-linear thinking, even the strongest LLMs make mistakes. To address this, we explore whether LLMs are able to recognize errors in their own step-by-step reasoning, without resorting to external resources. To this end, we propose SelfCheck, a...
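
    A hedged sketch of stepwise self-verification in this spirit appears below; the prompts, the `llm` placeholder, and the simple fraction-of-verified-steps aggregation are assumptions for illustration, not the paper's exact procedure.

```python
# Toy stepwise self-check: ask the model whether each reasoning step
# follows from the preceding ones, then aggregate per-step verdicts
# into a confidence score for the whole solution.
def llm(prompt):
    return "yes"   # placeholder for a real LLM call

def check_step(question, steps, i):
    context = "\n".join(steps[:i])
    prompt = (f"Question: {question}\n"
              f"Previous steps:\n{context}\n"
              f"Proposed next step: {steps[i]}\n"
              "Is this step correct given the previous steps? "
              "Answer yes or no.")
    return llm(prompt).strip().lower().startswith("yes")

def solution_confidence(question, steps):
    checks = [check_step(question, steps, i) for i in range(len(steps))]
    return sum(checks) / len(steps)   # fraction of steps that verify

steps = ["Let x be the unknown number.",
         "2x + 3 = 11, so 2x = 8.",
         "Therefore x = 4."]
print(solution_confidence("Solve 2x + 3 = 11.", steps))
```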

  • Jun Ho Choi, Oliver Garrod, Paul Atherto... | Oct 4th, 2023 | preprint

    Education systems in developing countries have few resources to serve large, poor populations. How might generative AI integrate into classrooms? This paper introduces an AI chatbot designed to assist teachers in Sierra Leone with professional development to improve their instruction. We describe initial findings from early implementation across 122 schools and 193 teachers, and analyze its use through qualitative observations and an analysis of teacher queries. Teachers use the system for lesson...
