702 resources

  • Jasmin Wachter, Michael Radloff, Maja Sm... | Mar 17th, 2025 | preprint

    We introduce an Item Response Theory (IRT)-based framework to detect and quantify socioeconomic bias in large language models (LLMs) without relying on subjective human judgments. Unlike traditional methods, IRT accounts for item difficulty, improving ideological bias estimation. We fine-tune two LLM families (Meta-LLaMa 3.2-1B-Instruct and ChatGPT 3.5) to represent distinct ideological positions and introduce a two-stage approach: (1) modeling response avoidance and (2) estimating...
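
    A minimal sketch, for orientation only, of what a two-stage setup like the one described here could look like, assuming a standard 2PL parameterization and a simple Bernoulli stage for response avoidance; the function names, parameterization, and data are illustrative, not the authors' actual model.

      # Hedged sketch: stage 1 models whether the LLM avoids answering an item,
      # stage 2 applies a two-parameter logistic (2PL) IRT model to answered items.
      import numpy as np

      def irt_2pl_prob(theta, a, b):
          # theta: latent position, a: item discrimination, b: item difficulty
          return 1.0 / (1.0 + np.exp(-a * (theta - b)))

      def two_stage_log_likelihood(responses, avoided, theta, a, b, pi_avoid):
          # Stage 1: Bernoulli likelihood for avoidance with probability pi_avoid.
          ll_avoid = np.sum(np.where(avoided, np.log(pi_avoid), np.log(1.0 - pi_avoid)))
          # Stage 2: 2PL likelihood over the items that were actually answered.
          answered = ~avoided
          p = irt_2pl_prob(theta, a[answered], b[answered])
          y = responses[answered]
          ll_items = np.sum(y * np.log(p) + (1 - y) * np.log(1.0 - p))
          return ll_avoid + ll_items

      # Toy usage with random item parameters and responses.
      rng = np.random.default_rng(0)
      a, b = rng.uniform(0.5, 2.0, 20), rng.normal(0.0, 1.0, 20)
      avoided = rng.random(20) < 0.1
      responses = rng.integers(0, 2, 20)
      print(two_stage_log_likelihood(responses, avoided, theta=0.3, a=a, b=b, pi_avoid=0.1))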

  • Mar 6th, 2025 | webpage
  • Xiner Liu, Andrés Zambrano, Ryan Baker | Mar 5th, 2025 | journalArticle

    This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information), as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I...
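
    For orientation, the three prompting strategies compared here might be assembled roughly as below; the construct, code labels, prompt wording, and examples are hypothetical, not taken from the study.

      # Illustrative prompt builders for Zero-shot, Few-shot, and Few-shot-with-context coding.
      def build_prompt(strategy, construct, excerpt, examples=None, context=None):
          prompt = (f"Code the following student excerpt for the construct "
                    f"'{construct}'. Answer 1 (present) or 0 (absent).\n")
          if strategy in ("few-shot", "few-shot+context") and examples:
              prompt += "Examples:\n"
              for text, label in examples:
                  prompt += f'- "{text}" -> {label}\n'
          if strategy == "few-shot+context" and context:
              prompt += f"Context: {context}\n"
          prompt += f'Excerpt: "{excerpt}"\nLabel:'
          return prompt

      # Example: a Few-shot prompt for a hypothetical "help-seeking" construct.
      print(build_prompt("few-shot", "help-seeking",
                         "I asked the tutor for a hint before trying again.",
                         examples=[("I gave up on the problem.", 0),
                                   ("Can someone explain step 2?", 1)]))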

  • Mar 4th, 2025 | webpage

    Find out how Turnitin Clarity brings transparency and integrity insights into the student writing process in Turnitin Feedback Studio.

  • Mar 3rd, 2025 | journalArticle
  • Changrong Xiao, Wenxing Ma, Qingping Son... | Mar 3rd, 2025 | preprint

    Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable...
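
    As context for how AES results like these are typically evaluated, here is a minimal sketch comparing hypothetical LLM-assigned scores against human ratings using quadratic weighted kappa (QWK), the standard AES agreement metric; all scores below are invented placeholders.

      # QWK between human rubric scores and LLM-assigned scores (toy data).
      from sklearn.metrics import cohen_kappa_score

      human_scores = [3, 2, 4, 1, 3, 4, 2, 3]  # hypothetical rubric scores (1-4)
      llm_scores   = [3, 2, 3, 1, 4, 4, 2, 2]  # hypothetical LLM scores for the same essays

      qwk = cohen_kappa_score(human_scores, llm_scores, weights="quadratic")
      print(f"QWK between human and LLM scores: {qwk:.3f}")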

  • Hayri Eren Suna, Mahmut Özer | Mar 1st, 2025 | journalArticle

    In recent years, artificial intelligence (AI) and machine learning (ML) algorithms have played an influential role in advancing educational assessment. As a means of improving equal opportunities in education, assessing students' learning deficiencies and developing personalized learning suggestions are considered important. Furthermore, big data-based algorithms play an increasing role in assessing students' cognitive and social-emotional development and conducting research on...

  • Jin Kyu (Justin) Kim, Michael Chua, Arma... | Feb 24th, 2025 | journalArticle

    Introduction: Multiple-choice questions (MCQs) are essential in medical education and widely used by licensing bodies. They are traditionally created with intensive human effort to ensure validity. Recent advances in AI, particularly large language models (LLMs), offer the potential to streamline this process. This study aimed to develop and test a GPT-4 model with customized instructions for generating MCQs to assess urology residents. Methods: A GPT-4 model was embedded using guidelines...
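
    A hypothetical sketch of what a customized MCQ-generation request along these lines might look like; the instruction text, topic, and output schema are assumptions, not the study's actual GPT-4 configuration.

      # Assembling a structured MCQ-generation request (illustrative only).
      import json

      mcq_instructions = (
          "You write board-style multiple-choice questions for urology residents. "
          "Each question must have one correct answer, four plausible distractors, "
          "and a brief explanation citing the relevant guideline."
      )

      def build_mcq_request(topic):
          return {
              "system": mcq_instructions,
              "user": f"Generate one multiple-choice question on: {topic}",
              "expected_format": {
                  "stem": "...",
                  "options": ["A", "B", "C", "D", "E"],
                  "correct_option": "A",
                  "explanation": "...",
              },
          }

      print(json.dumps(build_mcq_request("management of renal colic"), indent=2))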

  • Yunting Liu, Shreya Bhandari, Zachary A.... | Feb 24th, 2025 | journalArticle

    Effective educational measurement relies heavily on the curation of well‐designed item pools. However, item calibration is time consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT‐3.5, GPT‐4, Llama 2, Llama 3, Gemini‐Pro and Cohere Command R Plus) to generate responses with psychometric properties comparable to those of human respondents....
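
    A rough sketch of one way such a comparison might be run, checking whether classical item difficulties computed from LLM-simulated respondents track those from human respondents; the response matrices below are random placeholders, and the study's psychometric analysis is more involved than this.

      # Correlate per-item difficulty (proportion correct) across the two groups.
      import numpy as np
      from scipy.stats import pearsonr

      rng = np.random.default_rng(0)
      human_resp = rng.integers(0, 2, size=(200, 30))  # 200 humans x 30 items (0/1)
      llm_resp   = rng.integers(0, 2, size=(200, 30))  # 200 LLM-simulated respondents

      human_diff = human_resp.mean(axis=0)
      llm_diff = llm_resp.mean(axis=0)
      r, p = pearsonr(human_diff, llm_diff)
      print(f"Correlation of item difficulties (human vs LLM): r={r:.2f}, p={p:.3f}")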

  • Xuansheng Wu, Padmaja Pravin Saraf, Gyeo... | Feb 21st, 2025 | preprint

    Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by which LLMs assign scores remain largely unclear. It is also uncertain how closely AI's scoring process mirrors that of humans or if it adheres to the same grading criteria. To address this gap, this paper uncovers the grading rubrics that LLMs used to score...
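
    One simple, hypothetical way to elicit the criteria an LLM appears to apply while scoring, in the spirit of this line of work; the prompt wording and output format are assumptions, not the paper's rubric-extraction method.

      # Ask the model to state its criteria before committing to a score.
      def rubric_elicitation_prompt(question, response):
          return (
              f"Question: {question}\n"
              f"Student response: {response}\n"
              "Before scoring, list the criteria you will grade against, one per line. "
              "Then give a score from 0 to 3 on the final line as 'Score: N'."
          )

      print(rubric_elicitation_prompt(
          "Explain why the Moon has phases.",
          "Because the Earth blocks sunlight from reaching it."))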

  • Zhaoyi Joey Hou, Alejandro Ciuba, Xiang ... | Feb 13th, 2025 | preprint

    Automatic Essay Scoring (AES) assigns scores to student essays, reducing the grading workload for instructors. Developing a scoring system capable of handling essays across diverse prompts is challenging due to the flexibility and diverse nature of the writing task. Existing methods typically fall into two categories: supervised feature-based approaches and large language model (LLM)-based methods. Supervised feature-based approaches often achieve higher performance but require...
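
    For contrast with LLM-based scoring, a toy sketch of the supervised feature-based family mentioned here, using a few hand-crafted essay features and ridge regression; the features, essays, and scores are invented.

      # Hand-crafted features -> linear model, the classic feature-based AES recipe.
      import numpy as np
      from sklearn.linear_model import Ridge

      def essay_features(text):
          words = text.split()
          return [len(words),                                        # essay length
                  len(set(words)) / max(len(words), 1),              # type-token ratio
                  sum(len(w) for w in words) / max(len(words), 1)]   # mean word length

      essays = ["Short essay about dogs.",
                "A considerably longer essay that develops an argument with varied "
                "vocabulary and several supporting details."]
      scores = [1.0, 3.0]  # hypothetical human scores

      X = np.array([essay_features(e) for e in essays])
      model = Ridge(alpha=1.0).fit(X, scores)
      print(model.predict(np.array([essay_features("A medium-length response with some detail.")])))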

  • Hotaka Maeda, Yikai Lu | Feb 10th, 2025 | preprint

    We fine-tuned and compared several encoder-based Transformer large language models (LLMs) to predict differential item functioning (DIF) from the item text. We then applied explainable artificial intelligence (XAI) methods to these models to identify specific words associated with DIF. The data included 42,180 items designed for English language arts and mathematics summative state assessments among students in grades 3 to 11. Prediction R² ranged from .04 to .32 among eight focal and...
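
    The study fine-tunes encoder-based Transformers; as a much lighter stand-in for the same basic idea (predicting a DIF statistic from item text), the sketch below uses TF-IDF features and ridge regression on invented data.

      # Text-to-DIF regression with a bag-of-words proxy for the fine-tuned encoders.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import Ridge
      from sklearn.pipeline import make_pipeline

      item_texts = ["Solve for x: 2x + 3 = 11.",
                    "A farmer sells produce at a county fair; estimate the profit.",
                    "Read the poem and identify the metaphor in line 4."]
      dif_values = [0.01, 0.12, 0.07]  # hypothetical DIF effect sizes per item

      model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0)).fit(item_texts, dif_values)
      print(model.predict(["Estimate the cost of groceries for a family picnic."]))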

  • Obed Boateng, Bright Boateng | Jan 30th, 2025 | journalArticle

    The increasing integration of artificial intelligence and algorithmic systems in educational settings has raised critical concerns about their impact on educational equity. This paper examines the manifestation and implications of algorithmic bias across various educational domains, including admissions processes, assessment systems, and learning management platforms. Through analysis of current research and studies, we investigate how these biases can perpetuate or exacerbate existing...

  • Lei Huang, Weijiang Yu, Weitao Ma | Jan 24th, 2025 | preprint

    The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended...

  • Hannah-Beth Clark, Margaux Dowland, Laur... | Jan 21st, 2025 | journalArticle

    Designing AI tools for use in educational settings presents distinct challenges; the need for accuracy is heightened, safety is imperative, and pedagogical rigor is crucial. As a publicly funded body in the UK, Oak National Academy is in a unique position to innovate within this field, as we have a comprehensive curriculum of approximately 13,000 open education resources (OER) for all National Curriculum subjects, designed and quality-assured by expert, human teachers. This has provided the...

  • Oct 27th, 2025 | book
  • Alejandro Andrade-Lotero, Lee Becker, Jo... | Oct 27th, 2025 | conferencePaper
  • Jessica Andrews-Todd, Edith Aurora Graf,... | Oct 27th, 2025 | conferencePaper
Last update from database: 27/10/2025, 21:15 (UTC)