6 resources
-
Rose E. Wang, Qingyang Zhang, Carly Robi... | Dec 14th, 2024 | preprint
Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...
-
Yu Li, Shenyu Zhang, Rui Wu | Dec 14th, 2024 | preprint
Recent advancements in generative Large Language Models (LLMs) have been remarkable; however, the text these models generate often exhibits persistent quality issues. Evaluating the quality of generated text, especially open-ended text, has consistently presented a significant challenge. Addressing this, recent work has explored the possibility of using LLMs as evaluators. While using a single LLM as an evaluation agent shows potential, it is filled with...
-
Joshua Wilson, Fan Zhang, Corey Palermo,... | Apr 14th, 2024 | journal article
This study examined middle school students' perceptions of an automated writing evaluation (AWE) system, MI Write. We summarize students' perceptions of MI Write's usability, usefulness, and desirability both quantitatively and qualitatively. We then estimate hierarchical entry regression models that account for district context, classroom climate, demographic factors (i.e., gender, special education status, limited English proficiency status, socioeconomic status, grade), students'...
-
Hugh Zhang, Jeff Da, Dean Lee | May 3rd, 2024 | preprint
Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, rather than true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k...
-
Jacob Doughty, Zipiao Wan, Anishka Bompe... | Jan 29th, 2024 | conference paper
-
Abhimanyu Dubey, Abhinav Jauhri, Abhinav... | Aug 15th, 2024 | preprint
Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3: a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. The largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. The paper also presents an extensive empirical evaluation of Llama 3, finding that it delivers comparable quality to leading language...