InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Article Status
Published
Authors/contributors
- Colombo, Pierre Jean A. (Author)
- Clavel, Chloé (Author)
- Piantanida, Pablo (Author)
Title
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Abstract
Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as string-based metrics that address the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, making it possible to adapt InfoLM to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvements and two-figure correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.
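The abstract describes the core idea: replace exact string matching with a comparison of the fill-in probability distributions produced by a masked language model, using an information measure between the candidate's and reference's distributions. The sketch below is a minimal, hypothetical illustration of that idea only: the smoothed one-hot `toy_mlm_distribution` stands in for a real pretrained masked LM (e.g., BERT queried per masked position), and KL divergence stands in for the several information measures the paper considers. All names here are illustrative, not the authors' implementation.

```python
import math

# Toy vocabulary; a real masked LM would use its own subword vocabulary.
VOCAB = ["the", "cat", "sat", "mat", "dog", "rug"]

def toy_mlm_distribution(token, smoothing=0.1):
    """Stand-in for a masked LM's fill-in distribution at one position.
    A real InfoLM implementation would mask the position and query a
    pretrained model; here we return a smoothed one-hot distribution."""
    n = len(VOCAB)
    return [smoothing / n + (1.0 - smoothing) * (w == token) for w in VOCAB]

def bag_of_distributions(tokens):
    """Aggregate per-position distributions into one summary
    distribution over the vocabulary for the whole sentence."""
    dists = [toy_mlm_distribution(t) for t in tokens]
    return [sum(col) / len(dists) for col in zip(*dists)]

def kl_divergence(p, q):
    """KL(p || q) -- one example of an information measure; the paper
    explores a family of such measures, not only KL."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def infolm_score(candidate, reference):
    """Lower score = candidate's distribution is closer to the reference's."""
    p = bag_of_distributions(reference.split())
    q = bag_of_distributions(candidate.split())
    return kl_divergence(p, q)
```

Because the comparison happens in distribution space, a synonym that the masked LM assigns high probability to would incur a small penalty, whereas an exact-match metric like BLEU would score it as a complete miss.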
Publication
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
36
Issue
10
Pages
10554-10562
Date
2022-06-28
Journal Abbr
AAAI
ISSN
2374-3468
Short Title
InfoLM
Accessed
12/06/2024, 18:18
Library Catalogue
DOI.org (Crossref)
Citation
Colombo, P. J. A., Clavel, C., & Piantanida, P. (2022). InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10554–10562. https://doi.org/10.1609/aaai.v36i10.21299