15 resources
- Lijuan Wang, Miaomiao Zhao | Jan 22nd, 2024 | Conference Paper
- Xinyi Lu, Xu Wang | Jul 9th, 2024 | Conference Paper
Evaluating the quality of automatically generated question items has been a long-standing challenge. In this paper, we leverage LLMs to simulate student profiles and generate responses to multiple-choice questions (MCQs). The generative students' responses to MCQs can further support question item evaluation. We propose Generative Students, a prompt architecture designed based on the KLI framework. A generative student profile is a function of the list of knowledge components the student has...
- Renzhe Yu, Zhen Xu, Sky CH-Wang | Nov 2nd, 2024 | Preprint
The universal availability of ChatGPT and other similar tools since late 2022 has prompted tremendous public excitement about, and experimentation with, the potential of large language models (LLMs) to improve learning experiences and outcomes, especially for learners from disadvantaged backgrounds. However, little research has systematically examined the real-world impacts of LLM availability on educational equity beyond theoretical projections and controlled studies of innovative LLM...
- Siyuan Wang, Zhuohan Long, Zhihao Fan | Feb 17th, 2024 | Preprint
This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, generating new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust, and fine-grained evaluation, we implement six reframing operations...
- Rose E. Wang, Qingyang Zhang, Carly Robi... | Jan 22nd, 2024 | Preprint
Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...
- Sunder Ali Khowaja, Parus Khuwaja, Kapal... | May 5th, 2024 | Journal Article
ChatGPT is another large language model (LLM) widely available to consumers on their devices, but due to its performance and ability to converse effectively, it has gained huge popularity in both the research and industrial communities. Recently, many studies have been published to show the effectiveness, efficiency, integration, and sentiments of ChatGPT and other LLMs. In contrast, this study focuses on the important aspects that are mostly overlooked, i.e....
- Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | Preprint
The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...
- Rose E. Wang, Ana T. Ribeiro, Carly D. R... | Oct 3rd, 2024 | Preprint
Generative AI, particularly Language Models (LMs), has the potential to transform real-world domains with societal impact, particularly where access to experts is limited. For example, in education, training novice educators with expert guidance is important for effectiveness but expensive, creating significant barriers to improving education quality at scale. This challenge disproportionately harms students from under-served communities, who stand to gain the most from high-quality...
- Rose E. Wang, Ana T. Ribeiro, Carly D. Robi... | Nov 25th, 2024 | Journal Article
- Peiyi Wang, Lei Li, Zhihong Shao | Jan 22nd, 2024 | Preprint
In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...
- Zheng Chu, Jingchang Chen, Qianglong Che... | Jan 22nd, 2024 | Preprint
Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLMs' reasoning capabilities, which has attracted widespread attention from both academia and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives....
- Jacob Steiss, Tamara Tate, Steve Graham,... | Jun 22nd, 2024 | Journal Article
Background: Offering students formative feedback on their writing is an effective way to facilitate writing development. Recent advances in AI (i.e., ChatGPT) may function as an automated writing evaluation tool, increasing the amount of feedback students receive and diminishing the burden on teachers to provide frequent feedback to large classes. Aims: We examined the ability of generative AI (ChatGPT) to provide formative feedback. We compared the quality of human and AI...
- Jacob Doughty, Zipiao Wan, Anishka Bompe... | Jan 29th, 2024 | Conference Paper
- Abhimanyu Dubey, Abhinav Jauhri, Abhinav... | Aug 15th, 2024 | Preprint
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language...
- Irina Jurenka, Markus Kunesch, Kevin McK... | May 14th, 2024 | Report