2 resources

  • Yale Quan, Chun Wang | Oct 26th, 2025 | journalArticle

    This study introduces InterDIFNet, a multilabel classification neural network for detecting intersectional differential item functioning (DIF) in educational and psychological assessments, with a focus on small sample sizes. Unlike traditional marginal DIF methods, which often fail to capture the effects of intersecting identities and require large samples, InterDIFNet models uniform and non-uniform DIF across multiple intersectional groups simultaneously. The method utilizes an optimized...

  • Valentin Hofmann, David Heineman, Ian Ma... | Sep 14th, 2025 | preprint

    Language model (LM) benchmarking faces several challenges: comprehensive evaluations are costly, benchmarks often fail to measure the intended capabilities, and evaluation quality can degrade due to labeling errors and benchmark saturation. Although various strategies have been proposed to mitigate these issues, they tend to address individual aspects in isolation, neglecting broader questions about overall evaluation quality. Here, we introduce FLUID BENCHMARKING, a new evaluation approach...

Last update from database: 26/10/2025, 01:15 (UTC)