7 resources
- Kate Nolan, Youngsoon Kang, Ran Liu | Jan 22nd, 2025 | Presentation
- Yunting Liu, Shreya Bhandari, Zachary A.... | Feb 24th, 2025 | Journal article
  Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time-consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4, Llama 2, Llama 3, Gemini-Pro and Cohere Command R Plus) to generate responses with psychometric properties comparable to those of human respondents.... (A minimal item-calibration sketch appears after this list.)
- Yue Huang, Corey Palermo, Ruitao Liu | Aug 27th, 2025 | Journal article
- Xuansheng Wu, Padmaja Pravin Saraf, Gyeo... | Feb 21st, 2025 | Preprint
  Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed-response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by which LLMs assign scores remain largely unclear. It is also uncertain how closely AI's scoring process mirrors that of humans, or whether it adheres to the same grading criteria. To address this gap, this paper uncovers the grading rubrics that LLMs used to score... (A minimal rubric-scoring sketch appears after this list.)
- Jiangang Hao, Wenju Cui, Patrick C. Kyll... | Jan 22nd, 2025 | Conference paper
- Xiner Liu, Andrés Zambrano, Ryan Baker | Mar 5th, 2025 | Journal article
  This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, examining which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies (Zero-shot, Few-shot, and Few-shot with contextual information) as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I... (A minimal prompting-strategies sketch appears after this list.)
- Lei Huang, Weijiang Yu, Weitao Ma | Jan 24th, 2025 | Preprint
  The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended... (A minimal consistency-check sketch appears after this list.)
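Item calibration, as in the Liu, Bhandari et al. entry above, means estimating psychometric item properties from a matrix of respondent answers. As a loose illustration only, here is a minimal, self-contained sketch of Rasch-model calibration by joint maximum likelihood on a simulated 0/1 response matrix; the simulated data, step size, and iteration count are all assumptions of this sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 200 simulated respondents answering 20 items
# whose true behavior follows a Rasch model.
n_persons, n_items = 200, 20
theta_true = rng.normal(0, 1, n_persons)   # person abilities
b_true = rng.normal(0, 1, n_items)         # item difficulties
p_true = 1 / (1 + np.exp(-(theta_true[:, None] - b_true[None, :])))
X = rng.binomial(1, p_true)                # observed 0/1 response matrix

# Joint maximum-likelihood calibration: alternate gradient ascent steps
# on abilities and difficulties of the Rasch log-likelihood.
theta = np.zeros(n_persons)
b = np.zeros(n_items)
lr = 0.1
for _ in range(500):
    p_hat = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    resid = X - p_hat                      # residuals drive both gradients
    theta += lr * resid.sum(axis=1) / n_items
    b -= lr * resid.sum(axis=0) / n_persons
    b -= b.mean()                          # center difficulties to fix the scale

# If calibration worked, recovered difficulties track the true ones.
print("difficulty recovery r =", np.corrcoef(b, b_true)[0, 1].round(3))
```

In the paper's setting, the comparison would then be between parameters calibrated from human response matrices and from LLM-generated ones.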
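Wu et al. ask what rubrics LLMs actually apply when scoring constructed responses. A crude way to probe this, sketched below with the OpenAI Python client, is to have the model score a response and separately articulate the criteria it applied. The model name, rubric, and student response here are placeholders, and this is not the paper's method, only an illustration of the setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder rubric and student response, invented for illustration.
rubric = "2 pts: states the claim AND cites evidence; 1 pt: one of the two; 0 pts: neither."
student_response = "Plants grow toward light because they need it for photosynthesis."

prompt = (
    "Score the student response against the rubric.\n"
    f"Rubric: {rubric}\n"
    f"Response: {student_response}\n"
    "Give the score, then list the criteria you actually applied, "
    "including any not stated in the rubric."
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message.content)
```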
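Liu, Zambrano, and Baker compare Zero-shot, Few-shot, and Few-shot-with-context prompting for qualitative coding. The sketch below only illustrates how the three prompt variants differ structurally; the construct, utterance, and labeled examples are invented for this sketch.

```python
# Invented construct and data, purely to show the three prompt shapes.
construct = "student frustration"
utterance = "I keep getting this wrong no matter what I try."

# Zero-shot: the task description alone.
zero_shot = (
    f"Does the following utterance show {construct}? Answer yes or no.\n"
    f"Utterance: {utterance}"
)

# Few-shot: prepend labeled examples to the same task.
examples = [
    ("This is impossible, I give up.", "yes"),
    ("Oh nice, that worked!", "no"),
]
shots = "\n".join(f"Utterance: {u}\nLabel: {l}" for u, l in examples)
few_shot = (
    f"Label each utterance for {construct} (yes/no).\n"
    f"{shots}\nUtterance: {utterance}\nLabel:"
)

# Few-shot with contextual information: prepend dataset context as well.
context = "Utterances come from chat logs of students solving Algebra I problems."
few_shot_context = f"{context}\n{few_shot}"

print(zero_shot, few_shot, few_shot_context, sep="\n---\n")
```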
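Finally, among the families of hallucination detectors covered in literature like the Huang, Yu, and Ma survey, one common approach is consistency-based detection (in the style of SelfCheckGPT): sample several answers at nonzero temperature and flag low mutual agreement. The model, sample count, similarity measure, and threshold below are arbitrary choices for this sketch, not values from the survey.

```python
import itertools
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "When was the Eiffel Tower completed?"

# Sample several answers at nonzero temperature; a model that is
# hallucinating tends to answer inconsistently across samples.
samples = [
    client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": question}],
        temperature=1.0,
    ).choices[0].message.content
    for _ in range(5)
]

# Mean pairwise string similarity as a cheap stand-in for agreement.
ratios = [
    SequenceMatcher(None, a, b).ratio()
    for a, b in itertools.combinations(samples, 2)
]
agreement = sum(ratios) / len(ratios)
print(f"mean pairwise agreement: {agreement:.2f}")
if agreement < 0.5:  # arbitrary threshold for this sketch
    print("low consistency -- possible hallucination")
```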