  • Hugh Zhang, Jeff Da, Dean Lee | May 3rd, 2024 | preprint

    Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). GSM1k is designed to mirror the style and complexity of the established GSM8k...

Last update from database: 27/12/2024, 16:15 (UTC)