In authors or contributors

3 resources

  • Valentin Hofmann, David Heineman, Ian Ma... | Sep 14th, 2025 | preprint

    Language model (LM) benchmarking faces several challenges: comprehensive evaluations are costly, benchmarks often fail to measure the intended capabilities, and evaluation quality can degrade due to labeling errors and benchmark saturation. Although various strategies have been proposed to mitigate these issues, they tend to address individual aspects in isolation, neglecting broader questions about overall evaluation quality. Here, we introduce FLUID BENCHMARKING, a new evaluation approach...

  • Rishi Bommasani, Drew A. Hudson, Ehsan A... | Oct 24th, 2021 | journalArticle

    AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical...

  • Rishi Bommasani, Drew A. Hudson, Ehsan A... | Jul 12th, 2022 | preprint

    AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical...

Last update from database: 24/10/2025, 12:15 (UTC)