18 resources

  • Lijuan Wang, Miaomiao Zhao | Jan 22nd, 2024 | conferencePaper
  • Xinyi Lu, Xu Wang | Jul 9th, 2024 | conferencePaper

    Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework. A generative student profile is a function of the list of knowledge components the student has...

  • Renzhe Yu, Zhen Xu, Sky CH-Wang | Nov 2nd, 2024 | preprint

    The universal availability of ChatGPT and other similar tools since late 2022 has prompted tremendous public excitement and experimental effort about the potential of large language models (LLMs) to improve learning experience and outcomes, especially for learners from disadvantaged backgrounds. However, little research has systematically examined the real-world impacts of LLM availability on educational equity beyond theoretical projections and controlled studies of innovative LLM...

  • Siyuan Wang, Zhuohan Long, Zhihao Fan | Feb 17th, 2024 | preprint

    This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations...

  • Rose E. Wang, Qingyang Zhang, Carly Robi... | Jan 22nd, 2024 | preprint

    Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...

  • Sunder Ali Khowaja, Parus Khuwaja, Kapal... | May 5th, 2024 | journalArticle

    ChatGPT is another large language model (LLM) widely available to consumers on their devices, but owing to its performance and ability to converse effectively, it has gained huge popularity among the research and industrial communities. Recently, many studies have been published showing the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on the important aspects that are mostly overlooked, i.e....

  • Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | preprint

    The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...

  • Rose E. Wang, Ana T. Ribeiro, Carly D. R... | Oct 3rd, 2024 | preprint

    Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, particularly where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately harms students from under-served communities, who stand to gain the most from high-quality...

  • Rose E Wang, Ana T Ribeiro, Carly D Robi... | Nov 25th, 2024 | journalArticle
  • Peiyi Wang, Lei Li, Zhihong Shao | Jan 22nd, 2024 | preprint

    In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...

  • Zheng Chu, Jingchang Chen, Qianglong Che... | Jan 22nd, 2024 | preprint

    Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLMs' reasoning capabilities, attracting widespread attention from both academia and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives....

  • Jacob Steiss, Tamara Tate, Steve Graham,... | Jun 22nd, 2024 | journalArticle

    Structured abstract. Background: Offering students formative feedback on their writing is an effective way to facilitate writing development. Recent advances in AI (i.e., ChatGPT) may function as an automated writing evaluation tool, increasing the amount of feedback students receive and diminishing the burden on teachers to provide frequent feedback to large classes. Aims: We examined the ability of generative AI (ChatGPT) to provide formative feedback. We compared the quality of human and AI...

  • Jacob Doughty, Zipiao Wan, Anishka Bompe... | Jan 29th, 2024 | conferencePaper
  • Abhimanyu Dubey, Abhinav Jauhri, Abhinav... | Aug 15th, 2024 | preprint

    Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language...

  • Irina Jurenka, Markus Kunesch, Kevin McK... | May 14th, 2024 | report
  • Irina Jurenka, Markus Kunesch, Kevin McK... | May 14th, 2024 | document
Last update from database: 22/01/2026, 11:15 (UTC)