Leveraging Large Language Models for NLG Evaluation: A Survey
Article Status
Published
Authors/contributors
- Li, Zhen (Author)
- Xu, Xiaohan (Author)
- Shen, Tao (Author)
- Xu, Can (Author)
- Gu, Jia-Chen (Author)
- Tao, Chongyang (Author)
Title
Leveraging Large Language Models for NLG Evaluation: A Survey
Abstract
In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, the introduction of Large Language Models (LLMs) has opened new avenues for assessing the quality of generated content, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this survey seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.
Date
2024
Short Title
Leveraging Large Language Models for NLG Evaluation
Accessed
12/03/2024, 16:19
Library Catalogue
DOI.org (Datacite)
Rights
Creative Commons Attribution 4.0 International
Extra
Publisher: arXiv
Version Number: 1
Citation
Li, Z., Xu, X., Shen, T., Xu, C., Gu, J.-C., & Tao, C. (2024). Leveraging Large Language Models for NLG Evaluation: A Survey. https://doi.org/10.48550/ARXIV.2401.07103
Technical methods
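The survey's subject is prompt-based "LLM-as-a-judge" style metrics. As a rough illustration only (not a method taken from the paper), the Python sketch below scores a single quality dimension, coherence, by prompting a judge model with a rubric and parsing the returned integer; the rubric wording, the 1-5 scale, and the `call_llm` stand-in backend are assumptions made for this example.

```python
# Illustrative sketch of an "LLM-as-a-judge" evaluation metric of the kind the
# survey organizes into its taxonomy: the judge LLM receives a rubric plus the
# source and candidate texts and returns a score on one dimension (coherence).
# `call_llm` is a hypothetical stand-in for any chat-completion backend.
import re
from typing import Callable, Optional

RUBRIC = (
    "You are an evaluator of machine-generated text.\n"
    "Rate the COHERENCE of the candidate response on a 1-5 scale, where\n"
    "1 = incoherent and 5 = perfectly coherent.\n"
    "Reply with a single integer.\n\n"
    "Source: {source}\n"
    "Candidate: {candidate}\n"
    "Score:"
)

def coherence_score(source: str, candidate: str,
                    call_llm: Callable[[str], str]) -> Optional[int]:
    """Prompt the judge model and parse the first 1-5 digit in its reply."""
    reply = call_llm(RUBRIC.format(source=source, candidate=candidate))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else None

if __name__ == "__main__":
    # Dummy backend so the sketch runs without an API key; swap in a real model.
    dummy_llm = lambda prompt: "4"
    print(coherence_score("Summarize: The cat sat on the mat.",
                          "A cat was sitting on a mat.", dummy_llm))
```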