5 resources

  • Jill Burstein, Kevin Yancey, Klinton Bic... | Dec 16th, 2023 | document
  • Kevin P. Yancey, Geoffrey Laflair, Antho... | Dec 16th, 2023 | conferencePaper

    Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and...

  • Jill Burstein, Geoffrey T. LaFlair, Anto... | Mar 23rd, 2022 | report

    The Duolingo English Test is a groundbreaking, digital-first, computer-adaptive English language proficiency test intended to support stakeholder admissions decisions at English-medium institutions. The test measures four key constructs for university English language proficiency: Speaking, Writing, Reading, and Listening (SWRL), and is aligned with the Common European Framework of Reference for Languages (CEFR) proficiency levels and descriptors. As a digital-first assessment, the test...

  • Jill Burstein, Geoffrey T. LaFlair, Kevi... | Aug 28th, 2024 | preprint

    Artificial intelligence (AI) creates opportunities for assessments, such as efficiencies for item generation and scoring of spoken and written responses. At the same time, it poses risks (such as bias in AI-generated item content). Responsible AI (RAI) practices aim to mitigate risks associated with AI. This chapter addresses the critical role of RAI practices in achieving test quality (appropriateness of test score inferences), and test equity (fairness to all test takers). To illustrate,...

  • Imran Chamieh, Torsten Zesch, Klaus Gieb... | Jun 16th, 2024 | conferencePaper

    In this work, we investigate the potential of Large Language Models (LLMs) for automated short answer scoring. We test zero-shot and few-shot settings, and compare with fine-tuned models and a supervised upper-bound, across three diverse datasets. Our results show that in zero-shot and few-shot settings, LLMs perform poorly: they have difficulty with tasks that require complex reasoning or domain-specific knowledge. While the models show promise on general knowledge tasks...

Last update from database: 16/12/2025, 15:15 (UTC)
Powered by Zotero and Kerko.