2 resources

  • Jill Burstein, Geoffrey T. LaFlair, Kevi... | Aug 28th, 2024 | preprint

    Artificial intelligence (AI) creates opportunities for assessments, such as efficiencies for item generation and scoring of spoken and written responses. At the same time, it poses risks (such as bias in AI-generated item content). Responsible AI (RAI) practices aim to mitigate risks associated with AI. This chapter addresses the critical role of RAI practices in achieving test quality (appropriateness of test score inferences), and test equity (fairness to all test takers). To illustrate,...

  • Imran Chamieh, Torsten Zesch, Klaus Gieb... | Jun 16th, 2024 | conference paper

    In this work, we investigate the potential of Large Language Models (LLMs) for automated short answer scoring. We test zero-shot and few-shot settings and compare them with fine-tuned models and a supervised upper bound across three diverse datasets. Our results show that LLMs perform poorly in zero-shot and few-shot settings: they have difficulty with tasks that require complex reasoning or domain-specific knowledge. While the models show promise on general knowledge tasks...
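
    As a rough illustration of the zero-shot setting described in this abstract (not the authors' actual prompts, models, or datasets), the sketch below asks an LLM to grade a single short answer against a reference answer. It assumes an OpenAI-style chat completion API; the model name, rubric, and prompt wording are placeholders.

        # Illustrative sketch only: zero-shot short answer scoring with an LLM.
        # The model name, rubric, and prompt wording are assumptions, not taken
        # from the paper.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment

        def score_short_answer(question: str, reference: str, answer: str) -> str:
            """Zero-shot: no graded examples are included in the prompt."""
            prompt = (
                "You are grading a short answer.\n"
                f"Question: {question}\n"
                f"Reference answer: {reference}\n"
                f"Student answer: {answer}\n"
                "Reply with a single integer score: 0 (wrong), 1 (partial), or 2 (correct)."
            )
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{"role": "user", "content": prompt}],
                temperature=0,
            )
            return response.choices[0].message.content.strip()

        print(score_short_answer(
            "What does photosynthesis produce?",
            "Glucose and oxygen.",
            "It makes oxygen and sugar for the plant.",
        ))

    A few-shot variant would simply prepend a handful of already-graded question/answer pairs to the prompt; the paper compares both settings against fine-tuned models and a supervised upper bound.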
