6 resources
-
Ehsan Latif, Xiaoming Zhai|Oct 28th, 2023|journalArticle
This study highlights the potential of fine-tuned ChatGPT (GPT-3.5) for automatically scoring student-written constructed responses using example assessment tasks in science education. Recent studies on OpenAI's generative model GPT-3.5 proved its superiority in predicting natural language with high accuracy and human-like responses. GPT-3.5 has been trained on enormous online language materials such as journals and Wikipedia; therefore, more than direct usage of pre-trained GPT-3.5 is...
-
Gyeong-Geon Lee, Ehsan Latif, Xuansheng ...|Oct 28th, 2023|journalArticle
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment...
-
Gyeong-Geon Lee, Ehsan Latif, Xuansheng ...|Oct 28th, 2023|journalArticle
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment...
-
Gyeong-Geon Lee, Ehsan Latif, Xuansheng ...|Jun 28th, 2024|journalArticle
-
Gyeong-Geon Lee, Ehsan Latif, Xuansheng ...|Jun 28th, 2024|preprint
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment...
-
Xuansheng Wu, Padmaja Pravin Saraf, Gyeo...|Feb 21st, 2025|preprint
Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by which LLMs assign scores remain largely unclear. It is also uncertain how closely AI's scoring process mirrors that of humans or if it adheres to the same grading criteria. To address this gap, this paper uncovers the grading rubrics that LLMs used to score...