Towards a Unified Multi-Dimensional Evaluator for Text Generation

Article Status
Published
Authors/contributors
Zhong, Ming; Liu, Yang; Yin, Da; Mao, Yuning; Jiao, Yizhu; Liu, Pengfei; Zhu, Chenguang; Ji, Heng; Han, Jiawei
Title
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Abstract
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG evaluation as a Boolean Question Answering (QA) task, and by guiding the model with different questions, we can use one evaluator to evaluate from multiple dimensions. Furthermore, thanks to the unified Boolean QA format, we are able to introduce an intermediate learning phase that enables UniEval to incorporate external knowledge from multiple related tasks and gain further improvement. Experiments on three typical NLG tasks show that UniEval correlates substantially better with human judgments than existing metrics. Specifically, compared to the top-performing unified evaluators, UniEval achieves a 23% higher correlation on text summarization, and over 43% on dialogue response generation. Also, UniEval demonstrates a strong zero-shot learning ability for unseen evaluation dimensions and tasks. Source code, data and all pre-trained evaluators are available on our GitHub repository (https://github.com/maszhongming/UniEval).
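The abstract's core idea is to score each evaluation dimension by asking the model a Boolean question and reading off the probability of the answer "Yes" versus "No". A minimal sketch of that scoring step (a hypothetical illustration of the Boolean QA framing, not the authors' released code; prompt layout and function names are assumptions):

```python
import math

def boolean_qa_score(logit_yes: float, logit_no: float) -> float:
    """Normalize a model's "Yes"/"No" answer logits into a [0, 1] score.

    In the UniEval-style setup, a text-to-text model is asked a
    dimension-specific Boolean question and the score is the probability
    of "Yes" renormalized over the two answers. The logits here stand in
    for that model's output.
    """
    p_yes = math.exp(logit_yes)
    p_no = math.exp(logit_no)
    return p_yes / (p_yes + p_no)

def build_prompt(question: str, output_text: str, source_text: str) -> str:
    # Hypothetical prompt layout: one question per evaluation dimension
    # (e.g. coherence, fluency), concatenated with the generated text
    # and its source document. Swapping the question switches the
    # dimension being evaluated while keeping a single evaluator.
    return (
        f"question: {question} </s> "
        f"output: {output_text} </s> "
        f"source: {source_text}"
    )
```

Usage: `boolean_qa_score` applied to equal logits yields 0.5 (maximal uncertainty), and a different `question` string passed to `build_prompt` targets a different dimension, which is what lets one evaluator cover coherence, fluency, and other dimensions.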
Repository
arXiv
Archive ID
arXiv:2210.07197
Place
Abu Dhabi, United Arab Emirates
Date
2022
Citation Key
zhong2022
Accessed
27/10/2023, 17:34
Library Catalogue
Extra
arXiv:2210.07197 [cs] <Title>: Towards a Unified Multi-Dimensional Evaluator for Text Generation
Citation
Zhong, M., Liu, Y., Yin, D., Mao, Y., Jiao, Y., Liu, P., Zhu, C., Ji, H., & Han, J. (2022). Towards a Unified Multi-Dimensional Evaluator for Text Generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). Abu Dhabi, United Arab Emirates. arXiv:2210.07197. https://aclanthology.org/2022.emnlp-main.131