269 resources

  • Sankalan Pal Chowdhury, Vilém Zouhar, Mr... | Apr 25th, 2024 | preprint

    Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providing no guarantees. We posit that while LLMs with certain guardrails can take the place of subject...

  • Nicolò Cosimo Albanese | Apr 20th, 2024 | conferencePaper

    Ensuring fidelity to source documents is crucial for the responsible use of Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. We propose a lightweight method for real-time hallucination detection, with potential to be deployed as a model-agnostic microservice to bolster reliability. Using in-context learning, our approach evaluates response factuality at the sentence level without annotated data, promoting transparency and user trust. Compared to other...
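
    The approach summarized above checks factuality sentence by sentence using in-context learning. Below is a minimal sketch of that general pattern (not the authors' implementation): `judge` is a placeholder for whatever chat-completion call the deployment uses, and the prompt wording is an assumption.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter; a production system would use a real tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_response(context: str, response: str,
                   judge: Callable[[str], str]) -> List[Dict[str, object]]:
    """Ask an LLM judge, via in-context instructions only (no annotated data),
    whether each sentence of the generated response is supported by the
    retrieved context."""
    results = []
    for sentence in split_sentences(response):
        prompt = (
            "You are verifying a RAG answer against its source.\n"
            f"Source passage:\n{context}\n\n"
            f"Claim: {sentence}\n"
            "Reply with exactly one word: SUPPORTED or UNSUPPORTED."
        )
        verdict = judge(prompt).strip().upper()
        results.append({"sentence": sentence,
                        "supported": verdict.startswith("SUPPORTED")})
    return results
```

    Sentences flagged as unsupported could then be surfaced to the user or trigger a regeneration step, which is one way a model-agnostic microservice might use this output.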

  • Ji Yoon Jung, Lillian Tyack, Matthias vo... | Apr 8th, 2024 | journalArticle

    Artificial intelligence (AI) is rapidly changing communication and technology-driven content creation and is also being used more frequently in education. Despite these advancements, AI-powered automated scoring in international large-scale assessments (ILSAs) remains largely unexplored due to the scoring challenges associated with processing large amounts of multilingual responses. However, due to their low-stakes nature, ILSAs are an ideal ground for innovations and exploring new methodologies.

  • Cade Metz, Cecilia Kang, Sheera Frenkel,... | Apr 6th, 2024 | newspaperArticle

    OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.

  • Siyu Zha, Yuehan Qiao, Qingyu Hu | Apr 5th, 2024 | preprint

    Project-based learning (PBL) is an instructional method that is very helpful in nurturing students' creativity, but it requires significant time and energy from both students and teachers. Large language models (LLMs) have been proven to assist in creative tasks, yet much controversy exists regarding their role in fostering creativity. This paper explores the potential of LLMs in PBL settings, with a special focus on fostering creativity. We began with an exploratory study involving 12...

  • Ahmed Mohammed, Umar Faiza Bashir, Abuba... | Apr 3rd, 2024 | journalArticle

    The study assessed the impact of artificial intelligence on curriculum implementation in public secondary schools in the Federal Capital Territory (FCT), Abuja, Nigeria. The research design used for the study is a descriptive survey. The population of the study comprises all the teachers in public secondary schools in the FCT. The sample for the study is 320 respondents. The researcher formulated a questionnaire titled Artificial Intelligence on Curriculum Implementation Questionnaire (AICIQ). The...

  • Ruikun Hou, Tim Fütterer, Babette Bühler... | Apr 1st, 2024 | preprint

    Classroom observation protocols standardize the assessment of teaching effectiveness and facilitate comprehension of classroom interactions. Whereas these protocols offer teachers specific feedback on their teaching practices, the manual coding by human raters is resource-intensive and often unreliable. This has sparked interest in developing AI-driven, cost-effective methods for automating such holistic coding. Our work explores a multimodal approach to automatically estimating...

  • Joshua Wilson, Fan Zhang, Corey Palermo,... | Apr 1st, 2024 | journalArticle

    This study examined middle school students' perceptions of an automated writing evaluation (AWE) system, MI Write. We summarize students' perceptions of MI Write's usability, usefulness, and desirability both quantitatively and qualitatively. We then estimate hierarchical entry regression models that account for district context, classroom climate, demographic factors (i.e., gender, special education status, limited English proficiency status, socioeconomic status, grade), students'...

  • Olanrewaju Lawal, Anthony Soronnadi, Olu... | Apr 27th, 2024 | conferencePaper

    In the rapidly evolving era of artificial intelligence, Large Language Models (LLMs) like ChatGPT-3.5, Llama, and PaLM 2 play a pivotal role in reshaping education. Trained on diverse language data with a predominant focus on English, these models exhibit remarkable proficiency in comprehending and generating intricate human language constructs, revolutionizing educational applications. This potential has prompted exploration into personalized and enriched educational experiences,...

  • Jon Saad-Falcon, Omar Khattab, Christoph... | Mar 31st, 2024 | preprint

    Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential...
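
    The abstract above describes finetuned lightweight judges scoring individual RAG components. The snippet below is only a schematic of that judging stage, not the ARES codebase: each hypothetical `Judge` callable maps a (query, passage, answer) triple to a 0/1 label, and labels are averaged per dimension.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# A judge maps (query, retrieved passage, generated answer) to a 0/1 label.
Judge = Callable[[str, str, str], int]


def evaluate_rag_system(examples: List[Tuple[str, str, str]],
                        judges: Dict[str, Judge]) -> Dict[str, float]:
    """Aggregate per-dimension judge scores (e.g. context relevance, answer
    faithfulness, answer relevance) over the RAG system's outputs."""
    return {
        dimension: mean(judge(q, p, a) for q, p, a in examples)
        for dimension, judge in judges.items()
    }
```

    The paper additionally calibrates the judges' predictions against a small set of human annotations; that correction step is omitted from this sketch.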

  • Yiqing Xie, Alex Xie, Divyanshu Sheth | Mar 31st, 2024 | preprint

    To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks that only requires light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples...
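
    As a toy illustration of the execution-based evaluation such examples enable (the actual CodeBenchGen pipeline, which uses an LLM to build the examples, is more involved), the snippet below runs assertion-style tests against a candidate solution in a scratch namespace; the example `candidate` and `tests` values are made up.

```python
from typing import List


def run_execution_eval(candidate_code: str, test_cases: List[str]) -> float:
    """Execute candidate code and report the fraction of assertion-style
    test snippets that pass. Real harnesses sandbox this step."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)
    except Exception:
        return 0.0
    passed = 0
    for test in test_cases:
        try:
            exec(test, namespace)
            passed += 1
        except Exception:
            pass
    return passed / len(test_cases) if test_cases else 0.0


# Hypothetical example: a candidate function plus tests an LLM might generate.
candidate = "def add(a, b):\n    return a + b"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(run_execution_eval(candidate, tests))  # -> 1.0
```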

  • Muhammad Athar Ganaie | Mar 16th, 2024 | blogPost

    Understanding and manipulating neural models is essential in the evolving field of AI. This necessity stems from various applications, from refining models for enhanced robustness to unraveling their decision-making processes for greater interpretability. Amidst this backdrop, the Stanford University research team has introduced 'pyvene,' a groundbreaking open-source Python library that facilitates intricate interventions on PyTorch models. pyvene is ingeniously designed to overcome the...
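
    pyvene's own interfaces are documented in the library itself; the underlying idea of intervening on a model's intermediate activations can be illustrated with a plain PyTorch forward hook. The sketch below is generic PyTorch, not pyvene code, and the tiny model is a stand-in.

```python
import torch
import torch.nn as nn

# Stand-in model; real interventions target layers of a trained network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


def zero_first_units(module, inputs, output):
    """Intervention: clamp the first two hidden units to zero; returning a
    tensor from a forward hook replaces the layer's output downstream."""
    patched = output.clone()
    patched[:, :2] = 0.0
    return patched


handle = model[0].register_forward_hook(zero_first_units)
with torch.no_grad():
    print(model(torch.randn(3, 4)))  # forward pass with the intervention applied
handle.remove()                      # detach the hook to restore normal behaviour
```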

  • Simone Balloccu, Patrícia Schmidtová, Ma... | Feb 22nd, 2024 | preprint

    Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the...

  • Ben Williamson | Feb 22nd, 2024 | blogPost

    Over the past year or so, a narrative that AI will inevitably transform education has become widespread. You can find it in the pronouncements of investors, tech ind…

  • Vinu Sankar Sadasivan, Aounon Kumar, Sri... | Feb 19th, 2024 | preprint

    The unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, we show that these detectors are not...

  • Peiyi Wang, Lei Li, Zhihong Shao | Feb 19th, 2024 | preprint

    In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...
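
    Schematically (this is not the trained Math-Shepherd model), a process reward model can be treated as a function from a problem and a solution prefix to a step score, with candidate solutions ranked by an aggregate such as the minimum step reward; `step_reward` below is a placeholder for the learned scorer.

```python
from typing import Callable, List

# (problem, solution steps so far) -> score in [0, 1] for the latest step.
StepReward = Callable[[str, List[str]], float]


def score_solution(problem: str, steps: List[str],
                   step_reward: StepReward) -> float:
    """Score every prefix of the solution and aggregate with min(), so one
    clearly wrong step sinks the whole chain of reasoning."""
    scores = [step_reward(problem, steps[: i + 1]) for i in range(len(steps))]
    return min(scores) if scores else 0.0


def rank_solutions(problem: str, candidates: List[List[str]],
                   step_reward: StepReward) -> List[List[str]]:
    """Best-of-n verification: order candidate solutions by process reward."""
    return sorted(candidates,
                  key=lambda s: score_solution(problem, s, step_reward),
                  reverse=True)
```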

  • Zhen Li, Xiaohan Xu, Tao Shen | Dec 27th, 2024 | journalArticle

    In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to...

  • The Chronicle of Higher Educ... | Dec 27th, 2024 | report