Evaluation of machine translation and its evaluation

Article Status
Published
Authors/contributors
Turian, J. P., Shen, L., & Melamed, I. D.
Title
Evaluation of machine translation and its evaluation
Abstract
Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved. The relevant software is publicly available from http://nlp.cs.nyu.edu/GTM/.
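The abstract's unigram-based F-measure combines precision and recall of word overlap between a candidate translation and a reference. A minimal sketch of that idea is below; it uses a simple multiset (bag-of-words) overlap and the harmonic mean, not the authors' GTM implementation, and the function name and tokenization-by-whitespace are assumptions for illustration.

```python
from collections import Counter

def unigram_f_measure(candidate: str, reference: str):
    """Unigram precision, recall, and F-measure for one sentence pair.

    Overlap is the multiset intersection of tokens: each candidate
    word is credited at most as many times as it occurs in the
    reference. (Illustrative sketch, not the GTM software.)
    """
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # matched unigram count
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())  # matched / candidate length
    recall = overlap / sum(ref.values())      # matched / reference length
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

For example, scoring the candidate "the cat sat" against the reference "the cat sat on the mat" gives precision 1.0 (every candidate word is matched) but recall 0.5 (half the reference words are covered), illustrating why both measures are needed.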
Proceedings Title
Proceedings of Machine Translation Summit IX: Papers
Place
New Orleans, USA
Date
2006-01-01
Citation Key
turian2006
Citation
Turian, J. P., Shen, L., & Melamed, I. D. (2006, January 1). Evaluation of machine translation and its evaluation. Proceedings of Machine Translation Summit IX: Papers. https://doi.org/10.21236/ada453509