Search in: authors or contributors

40 resources

  • Zhen Wang, Klaus Zechner, Yu Sun | Dec 19th, 2016 | journalArticle

    As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish rigorous procedures for monitoring the performance of both human and automated scoring processes during operational administrations. This paper...

  • Yu Wang, Madhumitha Gopalakrishnan, Yoav... | Dec 15th, 2025 | conferencePaper
  • Yu Wang, Madhu Gopalakrishnan, Yoav Berg... | Dec 15th, 2025 | presentation
  • Gloria Ashiya Katuka, Alexander Gain, Ye... | May 1st, 2024 | preprint

    Automatic grading and feedback have long been studied using traditional machine learning and deep learning techniques with language models. With the recent accessibility to high-performing large language models (LLMs) like LLaMA-2, there is an opportunity to investigate the use of these LLMs for automatic grading and feedback generation. Despite the increase in performance, LLMs require significant computational resources for fine-tuning and additional specific adjustments to enhance their...

  • S Christie, Baptiste Moreau-Pernet, Yu T... | Jul 24th, 2024 | conferencePaper

    Large language models (LLMs) are increasingly being deployed in user-facing applications in educational settings. Deployed applications often augment LLMs with fine-tuning, custom system prompts, and moderation layers to achieve particular goals. However, the behaviors of LLM-powered systems are difficult to guarantee, and most existing evaluations focus instead on the performance of unmodified 'foundation' models. Tools for evaluating such deployed systems are currently sparse, inflexible,...

  • Renzhe Yu, Zhen Xu, Sky CH-Wang | Nov 2nd, 2024 | preprint

    The universal availability of ChatGPT and other similar tools since late 2022 has prompted tremendous public excitement and experimental effort about the potential of large language models (LLMs) to improve learning experience and outcomes, especially for learners from disadvantaged backgrounds. However, little research has systematically examined the real-world impacts of LLM availability on educational equity beyond theoretical projections and controlled studies of innovative LLM...

  • Ruibin Zhao, Yipeng Zhuang, Di Zou | Nov 28th, 2022 | journalArticle
  • Scott Andrew Crossley, Perpetual Baffour... | Dec 15th, 2024 | preprint
  • Xinmeng Huang, Shuo Li, Mengxin Yu | Dec 15th, 2024 | conferencePaper
  • Yu Li, Shenyu Zhang, Rui Wu | Dec 15th, 2024 | preprint

    Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the quality of the text generated by these models often reveals persistent issues. Evaluating the quality of text generated by these models, especially in open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored the possibility of using LLMs as evaluators. While using a single LLM as an evaluation agent shows potential, it is filled with...

  • Chi-Min Chan, Weize Chen, Yusheng Su | Aug 14th, 2023 | preprint

    Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs' potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human...

  • Ying Xu, Dakuo Wang, Mo Yu | Dec 15th, 2022 | journalArticle

    Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is a scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on the reading education research, we introduce FairytaleQA, a dataset focusing on narrative...

  • Hyungjoo Chae, Yongho Song, Kai Tzu-iunn... | Oct 22nd, 2023 | preprint

    Human-like chatbots necessitate the use of commonsense reasoning in order to effectively comprehend and respond to implicit information present within conversations. Achieving such coherence and informativeness in responses, however, is a non-trivial task. Even for large language models (LLMs), the task of identifying and aggregating key evidence within a single hop presents a substantial challenge. This complexity arises because such evidence is scattered across multiple turns in a...

  • Zheng Chu, Jingchang Chen, Qianglong Che... | Dec 15th, 2024 | preprint

    Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLMs' reasoning capabilities, which attracts widespread attention from both academia and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives....

  • Isabel O. Gallegos, Ryan A. Rossi, Joe B... | Dec 15th, 2023 | preprint

    Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural...

  • Lei Huang, Weijiang Yu, Weitao Ma | Jan 24th, 2025 | preprint

    The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended...

  • Steven Moore, John Stamper, Richard Tong... | Jul 7th, 2023 | conferencePaper
  • Andrew M. Olney, Steven Moore, John Stam... | Jul 7th, 2023 | conferencePaper
Last update from database: 15/12/2025, 20:15 (UTC)