2 resources

  • Yiqing Xie, Alex Xie, Divyanshu Sheth | Mar 31st, 2024 | preprint

    To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks that only requires light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples...

  • Rylan Schaeffer, Brando Miranda, Sanmi K... | May 22nd, 2023 | preprint

    Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an alternative explanation for emergent abilities: that for a particular task and model family,...
