265 resources
-
Siyu Zha, Yuehan Qiao, Qingyu Hu | Apr 5th, 2024 | preprint
Project-based learning (PBL) is an instructional method that is very helpful in nurturing students' creativity, but it requires significant time and energy from both students and teachers. Large language models (LLMs) have been proven to assist in creative tasks, yet much controversy exists regarding their role in fostering creativity. This paper explores the potential of LLMs in PBL settings, with a special focus on fostering creativity. We began with an exploratory study involving 12...
-
Ahmed Mohammed, Umar Faiza Bashir, Abuba... | Apr 3rd, 2024 | journalArticle
The study assessed the impact of artificial intelligence on curriculum implementation in public secondary schools in the Federal Capital Territory (FCT), Abuja, Nigeria. The study used a descriptive survey design. The population comprises all teachers in public secondary schools in the FCT, and the sample is 320 respondents. The researcher developed a questionnaire titled the Artificial Intelligence on Curriculum Implementation Questionnaire (AICIQ). The...
-
Ruikun Hou, Tim Fütterer, Babette Bühler... | Apr 1st, 2024 | preprint
Classroom observation protocols standardize the assessment of teaching effectiveness and facilitate comprehension of classroom interactions. Whereas these protocols offer teachers specific feedback on their teaching practices, the manual coding by human raters is resource-intensive and often unreliable. This has sparked interest in developing AI-driven, cost-effective methods for automating such holistic coding. Our work explores a multimodal approach to automatically estimating...
-
Joshua Wilson, Fan Zhang, Corey Palermo,... | Apr 1st, 2024 | journalArticle
This study examined middle school students' perceptions of an automated writing evaluation (AWE) system, MI Write. We summarize students' perceptions of MI Write's usability, usefulness, and desirability both quantitatively and qualitatively. We then estimate hierarchical entry regression models that account for district context, classroom climate, demographic factors (i.e., gender, special education status, limited English proficiency status, socioeconomic status, grade), students'...
-
Olanrewaju Lawal, Anthony Soronnadi, Olu... | Apr 23rd, 2024 | conferencePaper
In the rapidly evolving era of artificial intelligence, Large Language Models (LLMs) like ChatGPT-3.5, Llama, and PaLM 2 play a pivotal role in reshaping education. Trained on diverse language data with a predominant focus on English, these models exhibit remarkable proficiency in comprehending and generating intricate human language constructs, revolutionizing educational applications. This potential has prompted exploration into personalized and enriched educational experiences,...
-
Jon Saad-Falcon, Omar Khattab, Christoph... | Mar 31st, 2024 | preprint
Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential...
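The core idea in ARES — scoring each RAG component with a lightweight judge and averaging over examples — can be sketched as follows. Note this is a toy illustration only: the word-overlap judge below is a hand-written stand-in, not the finetuned LM judges ARES actually trains on synthetic data.

```python
import re

def content_words(text: str) -> set:
    """Lowercased words longer than three characters, punctuation stripped."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3}

def judge_context_relevance(query: str, passage: str) -> int:
    """Toy judge: 1 if the passage shares any content word with the query, else 0."""
    return int(bool(content_words(query) & content_words(passage)))

def evaluate_rag(examples) -> float:
    """Average the per-example context-relevance judge score over (query, passage) pairs."""
    return sum(judge_context_relevance(q, p) for q, p in examples) / max(len(examples), 1)

examples = [
    ("What causes tides?", "Tides are caused by the Moon's gravity."),
    ("What causes tides?", "Paris is the capital of France."),
]
score = evaluate_rag(examples)  # one relevant passage out of two -> 0.5
```

In the real system the same loop would also run judges for answer faithfulness and answer relevance, each a small finetuned LM rather than a heuristic.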
-
Yiqing Xie, Alex Xie, Divyanshu Sheth | Mar 31st, 2024 | preprint
To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework for creating scalable execution-based benchmarks that requires only light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples...
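What an "execution-based evaluation example" amounts to can be sketched in a few lines. This is a hand-written illustration of the format, not CodeBenchGen itself: in the paper an LLM generates both the example and its test cases, whereas here both are written by hand.

```python
# An evaluation example: a prompt plus executable test cases.
example = {
    "prompt": "def add(a, b):\n    ...",
    "tests": [("add(1, 2)", 3), ("add(-1, 1)", 0)],
}

def run_example(candidate_code: str, example: dict) -> bool:
    """Execute a candidate solution and check it against every test case."""
    env: dict = {}
    exec(candidate_code, env)          # define the candidate function(s)
    return all(eval(call, env) == expected
               for call, expected in example["tests"])

ok = run_example("def add(a, b):\n    return a + b", example)  # passes both tests
```

Grading by execution rather than by string match is what makes such benchmarks robust to superficial variation in generated code.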
-
Muhammad Athar Ganaie | Mar 16th, 2024 | blogPost
Understanding and manipulating neural models is essential in the evolving field of AI. This necessity stems from various applications, from refining models for enhanced robustness to unraveling their decision-making processes for greater interpretability. Amidst this backdrop, the Stanford University research team has introduced 'pyvene,' a groundbreaking open-source Python library that facilitates intricate interventions on PyTorch models. pyvene is ingeniously designed to overcome the...
-
Simone Balloccu, Patrícia Schmidtová, Ma... | Feb 22nd, 2024 | preprint
Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the...
-
Ben Williamson | Feb 22nd, 2024 | blogPost
Over the past year or so, a narrative that AI will inevitably transform education has become widespread. You can find it in the pronouncements of investors, tech ind…
-
Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | preprint
The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...
-
Peiyi Wang, Lei Li, Zhihong Shao | Feb 19th, 2024 | preprint
In this paper, we present Math-Shepherd, an innovative process-oriented reward model for math that assigns a reward score to each step of a problem solution. Math-Shepherd is trained on automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...
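The step-wise scoring idea can be sketched as follows. The rule-based arithmetic checker below is a toy stand-in for the trained reward model (Math-Shepherd learns step scores from automatically constructed supervision, it does not check arithmetic by rule); the min-aggregation is likewise just one plausible way to turn step scores into a solution score.

```python
import re

def step_reward(step: str) -> float:
    """Toy per-step reward: 1.0 if every 'a op b = c' equation in the step holds."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
    for a, op, b, c in re.findall(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)", step):
        if ops[op](int(a), int(b)) != int(c):
            return 0.0
    return 1.0

def solution_score(steps) -> float:
    """Aggregate step rewards; min() reflects that one bad step spoils the solution."""
    return min(step_reward(s) for s in steps)

good = ["2 + 3 = 5", "5 * 4 = 20"]
bad = ["2 + 3 = 6", "6 * 4 = 24"]  # first step is wrong, later steps follow from it
```

In the verification scenario, such a scorer ranks many sampled solutions and the highest-scoring one is returned.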
-
Hamsa Bastani, Osbert Bastani, Alp Sungu... | Jan 23rd, 2024 | preprint
Generative artificial intelligence (AI) is poised to revolutionize how humans work, and has already demonstrated promise in significantly improving human productivity. However, a key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs. We study the impact of...
-
Zhen Li, Xiaohan Xu, Tao Shen | Jan 23rd, 2024 | journalArticle
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to...
-
The Chronicle of Higher Educ... | Jan 23rd, 2024 | report
-
Lijuan Wang, Miaomiao Zhao | Jan 23rd, 2024 | conferencePaper
-
Rose E. Wang, Qingyang Zhang, Carly Robi... | Jan 23rd, 2024 | preprint
Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribute Bridge, a method that uses cognitive task analysis to translate an expert's latent thought...
-
Ziwei Xu, Sanjay Jain, Mohan Kankanhalli... | Jan 23rd, 2024 | journalArticle
Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined...