3 resources

  • Weizhe Yuan, Graham Neubig, Pengfei Liu,... | Apr 4th, 2021 | journalArticle

    A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference...
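The evaluation-as-generation idea described here can be illustrated as scoring a text by the average log-probability a seq2seq model assigns to it when conditioned on another text. A minimal toy sketch (not the paper's actual implementation), where `toy_logprob` stands in for the per-token log-probabilities a real pre-trained model such as BART would return:

```python
import math

def seq2seq_eval_score(token_logprob, source, target):
    """Toy sketch of evaluation-as-generation: score `target` by the
    average log-probability of its tokens conditioned on `source`.
    `token_logprob(source, token)` is a stand-in for a real
    pre-trained seq2seq model's per-token log-probs."""
    tokens = target.split()
    return sum(token_logprob(source, tok) for tok in tokens) / len(tokens)

# Stand-in "model": rewards tokens that also appear in the source.
def toy_logprob(source, token):
    return math.log(0.5) if token in source.split() else math.log(0.1)

ref = "the cat sat on the mat"
close = "the cat sat on a mat"
far = "dogs bark loudly at night"
print(seq2seq_eval_score(toy_logprob, ref, close) >
      seq2seq_eval_score(toy_logprob, ref, far))  # True: closer text scores higher
```

With a real model, the stand-in would be replaced by log-probs from the decoder, and scoring can be run in both directions (generated text given reference, and vice versa).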

  • Jinlan Fu, See-Kiong Ng, Zhengbao Jiang,... | Apr 4th, 2023 | journalArticle

    Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless, assessing the quality of the generation is an even more arduous task than the generation itself, and this issue has not been given adequate consideration recently. This paper proposes a novel evaluation framework, GPTScore, which utilizes the emergent abilities...
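The core computation behind this kind of framework can be sketched as: build an evaluation prompt from an instruction and an aspect definition, then score the candidate text by the log-probability the language model assigns to it given that prompt. A toy sketch under that reading of the abstract, with a hypothetical `toy_logprob` in place of a real LLM:

```python
import math

def prompt_conditioned_score(logprob, instruction, aspect, context, candidate):
    """Toy sketch of prompt-conditioned scoring: sum the log-probs
    the model assigns to the candidate's tokens, given an evaluation
    prompt built from the instruction and aspect.
    `logprob(prefix, token)` is a stand-in for a real LLM."""
    prompt = f"{instruction} ({aspect}) Context: {context} Response:"
    score, prefix = 0.0, prompt
    for tok in candidate.split():
        score += logprob(prefix, tok)
        prefix += " " + tok
    return score

# Stand-in "LLM": rewards tokens already seen in the growing prefix.
def toy_logprob(prefix, token):
    return math.log(0.5) if token in prefix.split() else math.log(0.1)

on_topic = prompt_conditioned_score(toy_logprob, "Rate the response.",
                                    "fluency", "the cat sat", "the cat sat")
off_topic = prompt_conditioned_score(toy_logprob, "Rate the response.",
                                     "fluency", "the cat sat", "dogs bark")
print(on_topic > off_topic)  # True
```

Changing the aspect string (e.g. fluency vs. relevance) changes the conditioning prompt, which is how a single model can score multiple evaluation dimensions.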

  • Ming Zhong, Yang Liu, Da Yin | Oct 13th, 2022 | preprint

    Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG...
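UniEval's re-framing casts each evaluation dimension as a Boolean question (e.g. "Is this a coherent summary?") and takes the evaluator's normalized probability of answering "yes" as the score. A toy sketch of that scoring step, with a lambda standing in for the trained evaluator model:

```python
def boolean_qa_score(yes_no_probs, dimension, source, output):
    """Toy sketch of Boolean-QA evaluation: pose a yes/no question
    about one dimension and return the normalized P(yes).
    `yes_no_probs(question)` stands in for the trained evaluator."""
    question = (f"question: Is this a {dimension} output? "
                f"source: {source} output: {output}")
    p_yes, p_no = yes_no_probs(question)
    return p_yes / (p_yes + p_no)

# Stand-in evaluator that is 80% confident the answer is "yes".
coherence = boolean_qa_score(lambda q: (0.8, 0.2),
                             "coherent", "source text", "generated text")
print(coherence)  # 0.8
```

One evaluator can then be queried once per dimension (coherence, fluency, and so on), which is what makes the multi-dimensional evaluation "unified".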

Last update from database: 04/04/2025, 20:15 (UTC)
Powered by Zotero and Kerko.