131 resources

  • Gyeong-Geon Lee, Ehsan Latif, Xuansheng ... | Oct 14th, 2023 | journalArticle

    This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment...

  • Fengchun Miao, Wayne Holmes | Oct 14th, 2023 | book
  • Ramon Pires, Hugo Abonizio, Thales Sales... | Oct 14th, 2023 | preprint

    As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models...

  • Shashank Sonkar, Naiming Liu, Debshila M... | Oct 14th, 2023 | conferencePaper
  • Valdemar Švábenský, Ryan S. Baker, André... | Oct 14th, 2023 | conferencePaper
  • Jiaan Wang, Yunlong Liang, Fandong Meng,... | Oct 14th, 2023 | journalArticle

    Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and that NLG metrics notoriously correlate poorly with...

  • Kevin P. Yancey, Geoffrey Laflair, Antho... | Oct 14th, 2023 | conferencePaper

    Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and...

  • Shuyan Zhou, Uri Alon, Sumit Agarwal | Oct 14th, 2023 | conferencePaper
  • Zihao Zhou, Maizhen Ning, Qiufeng Wang | Oct 14th, 2023 | conferencePaper
Last update from database: 14/10/2025, 21:15 (UTC)