Whose Opinions Do Language Models Reflect? – Evidence Library – Artificial Intelligence in Measurement and Education

Whose Opinions Do Language Models Reflect?

Article Status

Published

Authors/contributors

Santurkar, Shibani (Author)
Durmus, Esin (Author)
Ladhak, Faisal (Author)
Lee, Cinoo (Author)
Liang, Percy (Author)
Hashimoto, Tatsunori (Author)

Title

Whose Opinions Do Language Models Reflect?

Abstract

Language models (LMs) are increasingly being used in open-ended contexts, where the opinions reflected by LMs in response to subjective queries can have a profound impact, both on user satisfaction, as well as shaping the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs -- by leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation. Across topics, we find substantial misalignment between the views reflected by current LMs and those of US demographic groups: on par with the Democrat-Republican divide on climate change. Notably, this misalignment persists even after explicitly steering the LMs towards particular demographic groups. Our analysis not only confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs, but also surfaces groups whose opinions are poorly reflected by current LMs (e.g., 65+ and widowed individuals). Our code and data are available at https://github.com/tatsu-lab/opinions_qa.

Repository

arXiv

Archive ID

arXiv:2303.17548

Date

2023-03-30

DOI

10.48550/arXiv.2303.17548

URL

http://arxiv.org/abs/2303.17548

Accessed

19/05/2023, 21:22

Extra

arXiv:2303.17548 [cs]. AI summary: This work creates OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic groups over topics ranging from abortion to automation, and finds substantial misalignment.

Citation

Santurkar, S., Durmus, E., Ladhak, F., Lee, C., Liang, P., & Hashimoto, T. (2023). Whose Opinions Do Language Models Reflect? (arXiv:2303.17548). arXiv. https://doi.org/10.48550/arXiv.2303.17548

Link to this record

https://aievidencehub.org/lib/79QPTNV7
