63 resources (search in authors or contributors)

  • Tianqi Wang, Hiroaki Funayama, Hiroki Ou... | Jan 22nd, 2021 | journalArticle
  • Renzhe Yu, Zhen Xu, Sky CH-Wang | Nov 2nd, 2024 | preprint

    The universal availability of ChatGPT and other similar tools since late 2022 has prompted tremendous public excitement and experimental effort about the potential of large language models (LLMs) to improve learning experience and outcomes, especially for learners from disadvantaged backgrounds. However, little research has systematically examined the real-world impacts of LLM availability on educational equity beyond theoretical projections and controlled studies of innovative LLM...

  • Eric C. K. Cheng, Tianchong Wang, Tim Sc... | Jan 22nd, 2023 | book
  • Siyuan Wang, Zhuohan Long, Zhihao Fan | Feb 17th, 2024 | preprint

    This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations...
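
    As a rough illustration of the reframing idea above, the sketch below (Python, with an assumed llm(prompt) -> str completion helper rather than any real API) paraphrases a benchmark question and lets a second call act as a verifier before the evolved instance is accepted; the paper's six reframing operations and multi-agent setup are more elaborate.

    def reframe_instance(llm, question, answer):
        """Paraphrase a benchmark question while keeping its gold answer fixed."""
        new_question = llm(
            "Rewrite the following question with different wording and surface context, "
            "keeping the correct answer unchanged.\n"
            f"Question: {question}\nRewritten question:"
        ).strip()

        # A second "verifier" call checks that the gold answer still holds for the
        # evolved instance; only high-confidence reframings are kept.
        verdict = llm(
            f"Question: {new_question}\nProposed answer: {answer}\n"
            "Is this answer correct for the question? Reply yes or no:"
        ).strip().lower()

        return {"question": new_question, "answer": answer} if verdict.startswith("yes") else None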

  • Rose E. Wang, Qingyang Zhang, Carly Robi... | Jan 22nd, 2024 | preprint

    Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...
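
    A hedged sketch of that decision-making idea: make the tutor's latent decisions explicit (error type, strategy, intention) before generating the reply. The decision labels and the llm(prompt) -> str helper below are illustrative assumptions, not the Bridge implementation.

    def remediate(llm, problem, student_answer):
        # Step 1: elicit the expert-style decisions that normally stay implicit.
        decisions = llm(
            f"Problem: {problem}\nStudent answer: {student_answer}\n"
            "Briefly state (1) the type of error, (2) a remediation strategy, "
            "and (3) the intention behind that strategy."
        )
        # Step 2: condition the tutor response on those explicit decisions.
        return llm(
            f"Problem: {problem}\nStudent answer: {student_answer}\n"
            f"Tutor decisions: {decisions}\n"
            "Write a short tutoring response that follows these decisions."
        )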

  • Sunder Ali Khowaja, Parus Khuwaja, Kapal... | May 5th, 2024 | journalArticle

    ChatGPT is another large language model (LLM) widely available to consumers on their devices, and thanks to its performance and ability to converse effectively it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published to show the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on the important aspects that are mostly overlooked, i.e....

  • Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | preprint

    The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...
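
    The kind of stress test described above can be outlined in a few lines; here paraphrase and detector_score are hypothetical stand-ins for a paraphrasing model and a detector that returns the probability that text is AI-generated.

    def paraphrase_stress_test(text, paraphrase, detector_score, rounds=3, threshold=0.5):
        """Repeatedly paraphrase a generated text and track whether the detector still flags it."""
        for _ in range(rounds):
            if detector_score(text) < threshold:
                break  # the detector no longer flags the text as AI-generated
            text = paraphrase(text)  # each pass rewrites the text while preserving its meaning
        return text, detector_score(text)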

  • Weicheng Ma, Henry Scheible, Brian Wang,... | Jan 22nd, 2023 | conferencePaper

    Warning: This paper contains content that is stereotypical and may be upsetting. This paper addresses the issue of demographic stereotypes present in Transformer-based pre-trained language models (PLMs) and aims to deepen our understanding of how these biases are encoded in these models. To accomplish this, we introduce an easy-to-use framework for examining the stereotype-encoding behavior of PLMs through a combination of model probing and textual analyses. Our findings reveal that a small...
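
    A minimal probe in this spirit (not the paper's actual framework) compares how a masked language model fills a pronoun slot across occupation templates, using the Hugging Face transformers fill-mask pipeline; bert-base-uncased is just an example model.

    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")

    for occupation in ["nurse", "engineer"]:
        preds = unmasker(f"The {occupation} said that [MASK] was tired.", targets=["he", "she"])
        scores = {p["token_str"]: round(p["score"], 4) for p in preds}
        print(occupation, scores)  # skewed he/she probabilities hint at encoded stereotypes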

  • Rose E. Wang, Ana T. Ribeiro, Carly D. R... | Oct 3rd, 2024 | preprint

    Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, particularly where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately harms students from under-served communities, who stand to gain the most from high-quality...

  • Rose E Wang, Ana T Ribeiro, Carly D Robi... | Nov 25th, 2024 | journalArticle
  • Yang Liu, Dan Iter, Yichong Xu | Jan 22nd, 2023 | preprint

    The quality of texts generated by natural language generation (NLG) systems is hard to measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human judgments, especially for tasks that require creativity and diversity. Recent studies suggest using large language models (LLMs) as reference-free metrics for NLG evaluation, which have the benefit of being applicable to new tasks that lack human references....
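
    In outline, the reference-free recipe is to prompt the model with an explicit rubric and parse a rating from its reply. The sketch below assumes a hypothetical llm(prompt) -> str helper and a single coherence dimension on a 1-5 scale; real setups typically score several dimensions and average over multiple samples.

    import re

    def judge_coherence(llm, source, summary):
        prompt = (
            "Rate the coherence of the summary on a 1-5 scale "
            "(5 = well-structured and logically ordered, 1 = incoherent).\n"
            f"Source text:\n{source}\n\nSummary:\n{summary}\n\n"
            "Reply with only the integer rating."
        )
        match = re.search(r"[1-5]", llm(prompt))
        return int(match.group()) if match else None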

  • Nigel Fernandez, Aritra Ghosh, Naiming L... | Jan 22nd, 2022 | preprint

    Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two...
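
    One common way to avoid a separate model per item is to encode the item and the response together with a shared pretrained encoder and regress a score from the pooled representation. The sketch below (PyTorch plus Hugging Face transformers, with bert-base-uncased as an example) illustrates that general architecture rather than the paper's exact method.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")
    # The head would be trained on human-scored responses; it is randomly initialized here.
    score_head = torch.nn.Linear(encoder.config.hidden_size, 1)

    def predict_score(question, response):
        # Encode the item text and the student response as a single sentence pair,
        # so one model can serve many different items.
        inputs = tokenizer(question, response, return_tensors="pt", truncation=True)
        with torch.no_grad():
            cls = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] representation
        return score_head(cls).item()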

  • Changrong Xiao, Wenxing Ma, Qingping Son... | Mar 3rd, 2025 | preprint

    Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable...

  • E Prihar, M Lee, M Hopman | journalArticle
  • Jiaan Wang, Yunlong Liang, Fandong Meng,... | Jan 22nd, 2023 | journalArticle

    Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with...

  • Jason Wei, Xuezhi Wang, Dale Schuurmans,... | Jan 10th, 2023 | preprint

    We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of...
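
    The method itself is just a prompting pattern: the few-shot exemplar spells out intermediate reasoning before the answer. The sketch below uses the paper's well-known tennis-ball exemplar and an assumed llm(prompt) -> str helper.

    COT_EXEMPLAR = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.\n\n"
    )

    def answer_with_cot(llm, question):
        # The exemplar's worked reasoning nudges the model to reason step by step
        # before committing to a final answer.
        return llm(COT_EXEMPLAR + f"Q: {question}\nA:")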

Last update from database: 22/01/2026, 14:15 (UTC)