Evaluation of machine translation and its evaluation

Article Status
Published
Authors/contributors
Turian, J. P.; Shen, L.; Melamed, I. D.
Title
Evaluation of machine translation and its evaluation
Abstract
Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved. The relevant software is publicly available from http://nlp.cs.nyu.edu/GTM/.
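The measures named in the abstract can be sketched concretely. Below is a minimal, hypothetical illustration of unigram precision, recall, and their harmonic mean (the F-measure) using clipped word counts; the actual GTM software at the URL above generalizes this idea (e.g., rewarding longer matched runs), so this is only a simplified sketch, not the authors' implementation.

```python
from collections import Counter

def unigram_f_measure(candidate: str, reference: str) -> float:
    """Unigram F-measure of a candidate translation against one reference.

    Simplified sketch: precision = matched words / candidate length,
    recall = matched words / reference length, F = their harmonic mean.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped matches: each word counts at most as often as it appears
    # in the reference (multiset intersection of the two bags of words).
    matches = sum((cand & ref).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(cand.values())
    recall = matches / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, a candidate that is a correct prefix of the reference scores perfect precision but reduced recall, and the F-measure balances the two.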
Date
2006-01-01
Proceedings Title
Proceedings of Machine Translation Summit IX: Papers
Place
New Orleans, USA
Citation
Turian, J. P., Shen, L., & Melamed, I. D. (2006, January 1). Evaluation of machine translation and its evaluation. Proceedings of Machine Translation Summit IX: Papers. https://doi.org/10.21236/ada453509