GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Article Status

Published

Authors/contributors

Patwardhan, Tejal (Author)
Dias, Rachel (Author)
Proehl, Elizabeth (Author)
Kim, Grace (Author)
Wang, Michele (Author)
Watkins, Olivia (Author)
Fishman, Simón Posada (Author)
Aljubeh, Marwan (Author)
Thacker, Phoebe (Author)
Fauconnet, Laurance (Author)
Kim, Natalie S. (Author)
Chao, Patrick (Author)
Miserendino, Samuel (Author)
Chabot, Gildas (Author)
Li, David (Author)
Sharman, Michael (Author)
Barr, Alexandra (Author)
Glaese, Amelia (Author)
Tworek, Jerry (Author)

Title

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Abstract

We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and that the current best frontier models are approaching industry experts in deliverable quality. We analyze the potential for frontier models, when paired with human oversight, to perform GDPval tasks cheaper and faster than unaided experts. We also demonstrate that increased reasoning effort, increased task context, and increased scaffolding improve model performance on GDPval. Finally, we open-source a gold subset of 220 tasks and provide a public automated grading service at evals.openai.com to facilitate future research in understanding real-world model capabilities.

Repository

arXiv

Archive ID

arXiv:2510.04374

Date

2025-10-05

DOI

10.48550/arXiv.2510.04374

Citation Key

patwardhan2025

URL

http://arxiv.org/abs/2510.04374

Accessed

20/10/2025, 20:07

Short Title

GDPval

Library Catalogue

arXiv.org

Extra

arXiv:2510.04374 [cs] Read_Status: New Read_Status_Date: 2026-01-26T11:32:41.339Z

Citation

Patwardhan, T., Dias, R., Proehl, E., Kim, G., Wang, M., Watkins, O., Fishman, S. P., Aljubeh, M., Thacker, P., Fauconnet, L., Kim, N. S., Chao, P., Miserendino, S., Chabot, G., Li, D., Sharman, M., Barr, A., Glaese, A., & Tworek, J. (2025). GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks (arXiv:2510.04374). arXiv. https://doi.org/10.48550/arXiv.2510.04374

Link to this record

https://aievidencehub.org/lib/BJFUUIWJ