Modifications of Machine Translation Evaluation Metrics by Using Word Embeddings

Haozhou Wang, Paola Merlo


Abstract
Traditional machine translation evaluation metrics such as BLEU and WER have been widely used, but they correlate poorly with human judgements because they represent word similarity badly and impose strict identity matching. In this paper, we propose modifications to these two metrics based on word embeddings. The evaluation results show that our modifications significantly improve their correlation with human judgements.
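The abstract does not spell out the modified formulas, so the following is only a hypothetical sketch, not the authors' method. It contrasts standard WER, where any non-identical word costs a full substitution (strict identity matching), with an assumed embedding-based variant in which the substitution cost is 1 minus the cosine similarity of the two word vectors. The toy vectors in `EMB` and the names `cosine`, `sub_cost` and `wer` are illustrative assumptions.

```python
import math

# Hypothetical toy embeddings for illustration only; a real system would
# load pre-trained word vectors instead.
EMB = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.2],
    "kitten": [0.0, 0.9, 0.3],
    "sat": [0.2, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sub_cost(h, r, soft):
    """Cost of substituting hypothesis word h for reference word r."""
    if h == r:
        return 0.0
    if not soft or h not in EMB or r not in EMB:
        return 1.0  # strict identity matching: any mismatch is a full edit
    # Assumed soft variant: similar words are cheaper to substitute.
    return max(0.0, 1.0 - cosine(EMB[h], EMB[r]))


def wer(hyp, ref, soft=False):
    """Word error rate: minimal edit cost divided by reference length."""
    h, r = hyp.split(), ref.split()
    # d[i][j] = minimal cost to align the first i hyp tokens with the first j ref tokens
    d = [[0.0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = float(i)
    for j in range(len(r) + 1):
        d[0][j] = float(j)
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,  # insertion: extra hypothesis token
                d[i][j - 1] + 1.0,  # deletion: missing reference token
                d[i - 1][j - 1] + sub_cost(h[i - 1], r[j - 1], soft),
            )
    return d[len(h)][len(r)] / len(r)
```

For instance, `wer("the kitten sat", "the cat sat")` returns 1/3 under strict matching, while `soft=True` yields a value close to 0 because the toy vectors for "cat" and "kitten" are nearly parallel. This illustrates how strict identity matching penalizes near-synonyms as heavily as unrelated words.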
Anthology ID: W16-4505
Volume: Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)
Month: December
Year: 2016
Address: Osaka, Japan
Venues: HyTra | WS
Publisher: The COLING 2016 Organizing Committee
Pages: 33–41
URL: https://www.aclweb.org/anthology/W16-4505
PDF: http://aclanthology.lst.uni-saarland.de/W16-4505.pdf