269 resources
-
Sankalan Pal Chowdhury, Vilém Zouhar, Mr...|Apr 25th, 2024|preprint
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providing no guarantees. We posit that while LLMs with certain guardrails can take the place of subject...
-
Nicolò Cosimo Albanese|Apr 20th, 2024|conferencePaper
Ensuring fidelity to source documents is crucial for the responsible use of Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. We propose a lightweight method for real-time hallucination detection, with potential to be deployed as a model-agnostic microservice to bolster reliability. Using in-context learning, our approach evaluates response factuality at the sentence level without annotated data, promoting transparency and user trust. Compared to other...
-
Ji Yoon Jung, Lillian Tyack, Matthias vo...|Apr 8th, 2024|journalArticle
Artificial intelligence (AI) is rapidly changing communication and technology-driven content creation and is also being used more frequently in education. Despite these advancements, AI-powered automated scoring in international large-scale assessments (ILSAs) remains largely unexplored due to the scoring challenges associated with processing large amounts of multilingual responses. However, due to their low-stakes nature, ILSAs are an ideal ground for innovations and exploring new methodologies.
-
Cade Metz, Cecilia Kang, Sheera Frenkel,...|Apr 6th, 2024|newspaperArticle
OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.
-
Siyu Zha, Yuehan Qiao, Qingyu Hu|Apr 5th, 2024|preprint
Project-based learning (PBL) is an instructional method that is very helpful in nurturing students' creativity, but it requires significant time and energy from both students and teachers. Large language models (LLMs) have been proven to assist in creative tasks, yet much controversy exists regarding their role in fostering creativity. This paper explores the potential of LLMs in PBL settings, with a special focus on fostering creativity. We began with an exploratory study involving 12...
-
Ahmed Mohammed, Umar Faiza Bashir, Abuba...|Apr 3rd, 2024|journalArticle
The study assessed the impact of artificial intelligence on curriculum implementation in public secondary schools in the Federal Capital Territory (FCT), Abuja, Nigeria. The research design used for the study is a descriptive survey. The population of the study comprises all the teachers in public secondary schools in the FCT. The sample for the study is 320 respondents. The researcher formulated a questionnaire titled Artificial Intelligence on Curriculum Implementation Questionnaire (AICIQ). The...
-
Ruikun Hou, Tim Fütterer, Babette Bühler...|Apr 1st, 2024|preprint
Classroom observation protocols standardize the assessment of teaching effectiveness and facilitate comprehension of classroom interactions. Whereas these protocols offer teachers specific feedback on their teaching practices, the manual coding by human raters is resource-intensive and often unreliable. This has sparked interest in developing AI-driven, cost-effective methods for automating such holistic coding. Our work explores a multimodal approach to automatically estimating...
-
Joshua Wilson, Fan Zhang, Corey Palermo,...|Apr 1st, 2024|journalArticle
This study examined middle school students' perceptions of an automated writing evaluation (AWE) system, MI Write. We summarize students' perceptions of MI Write's usability, usefulness, and desirability both quantitatively and qualitatively. We then estimate hierarchical entry regression models that account for district context, classroom climate, demographic factors (i.e., gender, special education status, limited English proficiency status, socioeconomic status, grade), students'...
-
Olanrewaju Lawal, Anthony Soronnadi, Olu...|Apr 26th, 2024|conferencePaper
In the rapidly evolving era of artificial intelligence, Large Language Models (LLMs) like ChatGPT-3.5, Llama, and PaLM 2 play a pivotal role in reshaping education. Trained on diverse language data with a predominant focus on English, these models exhibit remarkable proficiency in comprehending and generating intricate human language constructs, revolutionizing educational applications. This potential has prompted exploration into personalized and enriched educational experiences,...
-
Jon Saad-Falcon, Omar Khattab, Christoph...|Mar 31st, 2024|preprint
Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential...
-
Yiqing Xie, Alex Xie, Divyanshu Sheth|Mar 31st, 2024|preprint
To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks that only requires light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples...
-
Muhammad Athar Ganaie|Mar 16th, 2024|blogPost
Understanding and manipulating neural models is essential in the evolving field of AI. This necessity stems from various applications, from refining models for enhanced robustness to unraveling their decision-making processes for greater interpretability. Amidst this backdrop, the Stanford University research team has introduced 'pyvene,' a groundbreaking open-source Python library that facilitates intricate interventions on PyTorch models. pyvene is ingeniously designed to overcome the...
-
Simone Balloccu, Patrícia Schmidtová, Ma...|Feb 22nd, 2024|preprint
Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the...
-
Ben Williamson|Feb 22nd, 2024|blogPost
Over the past year or so, a narrative that AI will inevitably transform education has become widespread. You can find it in the pronouncements of investors, tech ind…
-
Vinu Sankar Sadasivan, Aounon Kumar, Sri...|Feb 19th, 2024|preprint
The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...
-
Peiyi Wang, Lei Li, Zhihong Shao|Feb 19th, 2024|preprint
In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...
-
Zhen Li, Xiaohan Xu, Tao Shen|Dec 26th, 2024|journalArticle
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to...
-
The Chronicle of Higher Educ...|Dec 26th, 2024|report