BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection
Jihyung Moon | Won Ik Cho | Junbum Lee
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media
Quality estimation (QE), a task of evaluating the quality of translation automatically without human-translated reference, is one of the important challenges for machine translation (MT). As the QE methods, BLEU score for round-trip translation (RTT) had been considered. However, it was found to be a poor predictor of translation quality since BLEU was not an adequate metric to detect semantic similarity between input and RTT. Recently, the pre-trained language models have made breakthroughs in many NLP tasks by providing semantically meaningful word and sentence embeddings. In this paper, we employ the semantic embeddings to RTT-based QE metric. Our method achieves the highest correlations with human judgments compared to WMT 2019 quality estimation metric task submissions. Additionally, we observe that with semantic-level metrics, RTT-based QE is robust to the choice of a backward translation system and shows consistent performance on both SMT and NMT forward translation systems.