
63 resources

  • Jacob Steiss, Tamara Tate, Steve Graham,... | Sep 7th, 2023 | preprint

    Offering students formative feedback on drafts of their writing is an effective way to facilitate writing development. This study examined the ability of generative AI (i.e., ChatGPT) to provide formative feedback on students’ compositions. We compared the quality of human and AI feedback by scoring the feedback each provided on secondary student essays (n=200) on five measures of feedback quality: the degree to which feedback (a) was criteria-based, (b) provided clear directions for...

  • Melissa Bond, Hassan Khosravi, Maarten D... | Jan 22nd, 2023 | journalArticle
  • Zichao Wang, Jakob Valdez, Debshila Basu... | Jan 22nd, 2022 | bookSection
  • Peiyi Wang, Lei Li, Zhihong Shao | Jan 22nd, 2024 | preprint

    In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...

  • Peiyi Wang, Lei Li, Zhihong Shao | Feb 19th, 2024 | preprint

    In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...

  • Zheng Chu, Jingchang Chen, Qianglong Che... | Jan 22nd, 2024 | preprint

    Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLM's reasoning capabilities, which attracts widespread attention from both academics and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives....

  • Jacob Steiss, Tamara Tate, Steve Graham,... | Jun 22nd, 2024 | journalArticle

    Background: Offering students formative feedback on their writing is an effective way to facilitate writing development. Recent advances in AI (i.e., ChatGPT) may function as an automated writing evaluation tool, increasing the amount of feedback students receive and diminishing the burden on teachers to provide frequent feedback to large classes. Aims: We examined the ability of generative AI (ChatGPT) to provide formative feedback. We compared the quality of human and AI...

  • Valentin Hofmann, David Heineman, Ian Ma... | Sep 14th, 2025 | preprint

    Language model (LM) benchmarking faces several challenges: comprehensive evaluations are costly, benchmarks often fail to measure the intended capabilities, and evaluation quality can degrade due to labeling errors and benchmark saturation. Although various strategies have been proposed to mitigate these issues, they tend to address individual aspects in isolation, neglecting broader questions about overall evaluation quality. Here, we introduce FLUID BENCHMARKING, a new evaluation approach...

  • Lei Huang, Weijiang Yu, Weitao Ma | Jan 24th, 2025 | preprint

    The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended...

  • Billy Ho Hung Cheung, Gary Kui Kai Lau, ... | Aug 29th, 2023 | journalArticle

    Large language models, in particular ChatGPT, have showcased remarkable language processing capabilities. Given the substantial workload of university medical staff, this study aims to assess the quality of multiple-choice questions (MCQs) produced by ChatGPT for use in graduate medical examinations, compared to questions written by university professoriate staffs based on standard medical textbooks.

  • Jacob Doughty, Zipiao Wan, Anishka Bompe... | Jan 29th, 2024 | conferencePaper
  • Susan Lottridge, Amy Burkhardt, Christop... | journalArticle

    Every year, millions of middle-school students write argumentative essays that are evaluated against a scoring rubric. However, the scores they receive don’t necessarily offer clear guidance on how to improve their essay or what they’ve done well. With advancements in natural language processing technology, we now have the capability to provide more detailed feedback. At this juncture, we’ve developed an artificial intelligence-supported editing tool to assist students in revising their...

  • Iddo Drori, Sarah Zhang, Reece Shuttlewo... | Aug 2nd, 2022 | journalArticle

    We demonstrate that a neural network pretrained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI’s Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a dataset of questions from Massachusetts Institute of Technology (MIT)’s largest mathematics courses (Single Variable and Multivariable Calculus,...

  • Ying Xu, Dakuo Wang, Mo Yu | Jan 22nd, 2022 | journalArticle

    Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on the reading education research, we introduce FairytaleQA, a dataset focusing on narrative...

  • Tejal Patwardhan, Rachel Dias, Elizabeth... | Oct 5th, 2025 | preprint

    We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and...

  • Rohan Anil, Andrew M. Dai, Orhan Firat | May 17th, 2023 | preprint

    We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more...

  • Abhimanyu Dubey, Abhinav Jauhri, Abhinav... | Aug 15th, 2024 | preprint

    Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language...

  • Rishi Bommasani, Drew A. Hudson, Ehsan A... | Jan 22nd, 2021 | journalArticle

    AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical...

Last update from database: 22/01/2026, 14:15 (UTC)