Boxing Chen


2020

Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences
Xiangyu Duan | Baijun Ji | Hao Jia | Min Tan | Min Zhang | Boxing Chen | Weihua Luo | Yue Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a new task of machine translation (MT), which uses no parallel sentences but can refer to a ground-truth bilingual dictionary. Motivated by the ability of a monolingual speaker to learn to translate by looking up words in a bilingual dictionary, we propose the task to see how much potential an MT system can attain using the bilingual dictionary and large-scale monolingual corpora, while being independent of parallel sentences. We propose anchored training (AT) to tackle the task. AT uses the bilingual dictionary to establish anchoring points for closing the gap between the source language and the target language. Experiments on various language pairs show that our approaches are significantly better than various baselines, including dictionary-based word-by-word translation, dictionary-supervised cross-lingual word embedding transformation, and unsupervised MT. On distant language pairs, where unsupervised MT struggles to perform well, AT performs remarkably better, achieving performance comparable to supervised SMT trained on more than 4M parallel sentences.

Domain Transfer based Data Augmentation for Neural Query Translation
Liang Yao | Baosong Yang | Haibo Zhang | Boxing Chen | Weihua Luo
Proceedings of the 28th International Conference on Computational Linguistics

Query translation (QT) serves as a critical factor in successful cross-lingual information retrieval (CLIR). Due to the lack of parallel query samples, neural QT models are usually optimized with synthetic data derived from large-scale monolingual queries. Nevertheless, this kind of pseudo corpus is mostly produced by a general-domain translation model, making it insufficient to guide the learning of the QT model. In this paper, we extend the data augmentation with a domain transfer procedure that revises synthetic candidates into search-aware examples. Specifically, the domain transfer model is built upon an advanced Transformer, in which layer coordination and mixed attention are exploited to speed up the refining process and leverage parameters from a pre-trained cross-lingual language model. To examine the effectiveness of the proposed method, we collected French-to-English and Spanish-to-English QT test sets, each of which consists of 10,000 parallel query pairs with careful manual checking. Qualitative and quantitative analyses reveal that our model significantly outperforms strong baselines and related domain transfer methods on both translation quality and retrieval accuracy.

Self-Paced Learning for Neural Machine Translation
Yu Wan | Baosong Yang | Derek F. Wong | Yikai Zhou | Lidia S. Chao | Haibo Zhang | Boxing Chen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent studies have proven that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the achievements of this kind of curriculum learning rely on the quality of an artificial schedule drawn up with handcrafted features, e.g., sentence length or word rarity. We improve this procedure in a more flexible manner by proposing self-paced learning, where the NMT model is allowed to 1) automatically quantify the learning confidence over training examples; and 2) flexibly govern its learning by regulating the loss in each iteration step. Experimental results over multiple translation tasks demonstrate that the proposed model yields better performance than strong baselines and models trained with human-designed curricula on both translation quality and convergence speed.

Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation
Pei Zhang | Boxing Chen | Niyu Ge | Kai Fan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually requiring an increasing number of parameters and greater computational complexity. However, little attention is paid to the baseline model. In this paper, we extensively study the pros and cons of the standard transformer in document-level translation, and find that the auto-regressive property brings both the advantage of consistency and the disadvantage of error accumulation. Therefore, we propose a surprisingly simple long-short term masking self-attention on top of the standard transformer to both effectively capture long-range dependencies and reduce the propagation of errors. We evaluate our approach on two publicly available document-level datasets, where it achieves strong BLEU results and captures discourse phenomena.

Iterative Domain-Repaired Back-Translation
Hao-Ran Wei | Zhirui Zhang | Boxing Chen | Weihua Luo
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we focus on low-resource domain-specific translation, where in-domain parallel corpora are scarce or nonexistent. One common and effective strategy for this case is exploiting in-domain monolingual data with the back-translation method. However, the synthetic parallel data is very noisy because it is generated by imperfect out-of-domain systems, resulting in poor domain adaptation performance. To address this issue, we propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair (DR) model to refine translations in synthetic bilingual data. To this end, we construct the corresponding training data for the DR model by round-trip translating the monolingual sentences, and then design a unified training framework to optimize paired DR and NMT models jointly. Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach, achieving average improvements of 15.79 and 4.47 BLEU over unadapted models and back-translation, respectively.

2019

Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention
Xiangyu Duan | Mingming Yin | Min Zhang | Boxing Chen | Weihua Luo
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Abstractive Sentence Summarization (ASSUM) aims at grasping the core idea of the source sentence and presenting it as the summary. It has been extensively studied using statistical or neural models based on large-scale monolingual source-summary parallel corpora. However, there is no cross-lingual parallel corpus, in which the source sentence language differs from the summary language, to directly train a cross-lingual ASSUM system. We propose to solve this zero-shot problem by using a resource-rich monolingual ASSUM system to teach a zero-shot cross-lingual ASSUM system on both summary word generation and attention. This teaching process is accompanied by a back-translation process that simulates source-summary pairs. Experiments on the cross-lingual ASSUM task show that our proposed method is significantly better than pipeline baselines and previous work, and brings cross-lingual performance much closer to monolingual performance.

Lattice Transformer for Speech Translation
Pei Zhang | Niyu Ge | Boxing Chen | Kai Fan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent advances in sequence modeling have highlighted the strengths of the transformer architecture, especially in achieving state-of-the-art machine translation results. However, depending on the upstream systems, e.g., speech recognition or word segmentation, the input to the translation system can vary greatly. The goal of this work is to extend the attention mechanism of the transformer to naturally consume a lattice in addition to the traditional sequential input. We first propose a general lattice transformer for speech translation where the input is the output of automatic speech recognition (ASR), which contains multiple paths and posterior scores. To leverage the extra information from the lattice structure, we develop a novel controllable lattice attention mechanism to obtain latent representations. On the LDC Spanish-English speech translation corpus, our experiments show that the lattice transformer generalizes significantly better and outperforms both a transformer baseline and a lattice LSTM. Additionally, we validate our approach on the WMT 2017 Chinese-English translation task with lattice inputs from different BPE segmentations. In this task, we also observe improvements over strong baselines.

2018

Alibaba’s Neural Machine Translation Systems for WMT18
Yongchao Deng | Shanbo Cheng | Jun Lu | Kai Song | Jingang Wang | Shenglan Wu | Liang Yao | Guchun Zhang | Haibo Zhang | Pei Zhang | Changfeng Zhu | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission systems of Alibaba for the WMT18 shared news translation task. We participated in 5 translation directions: English ↔ Russian and English ↔ Turkish in both directions, and English → Chinese. Our systems are based on Google’s Transformer model architecture, into which we integrated the most recent features from academic research. We also employed, at industrial scale, most techniques that have proven effective during past WMT years, such as BPE, back translation, data selection, model ensembling and reranking. For some morphologically rich languages, we also incorporated linguistic knowledge into our neural network. For the translation tasks in which we participated, our resulting systems achieved the best case-sensitive BLEU score in all 5 directions. Notably, our English → Russian system outperformed the second-ranked system by 5 BLEU points.

Alibaba Submission for WMT18 Quality Estimation Task
Jiayi Wang | Kai Fan | Bo Li | Fengming Zhou | Boxing Chen | Yangbin Shi | Luo Si
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The goal of the WMT 2018 Shared Task on Translation Quality Estimation is to investigate automatic methods for estimating the quality of machine translation results without reference translations. This paper presents the QE Brain system, which uses a neural Bilingual Expert model, a conditional target language model with a bidirectional transformer, as a feature extractor, and then processes the semantic representations of the source and the translation output with a Bi-LSTM predictive model for automatic quality estimation. The system has been applied to the sentence-level scoring and ranking tasks as well as the word-level tasks of finding errors for each word in translations. An extensive set of experimental results has shown that our system outperformed the best results in the WMT 2017 Quality Estimation tasks and obtained top results in WMT 2018.

Alibaba Submission to the WMT18 Parallel Corpus Filtering Task
Jun Lu | Xiaoyu Lv | Yangbin Shi | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Alibaba Machine Translation Group submissions to the WMT 2018 Shared Task on Parallel Corpus Filtering. To evaluate the quality of a parallel corpus, three characteristics of the corpus are investigated: 1) the bilingual/translation quality, 2) the monolingual quality and 3) the corpus diversity. Both rule-based and model-based methods are adapted to score the parallel sentence pairs. The final parallel corpus filtering system is reliable, easy to build, and easy to adapt to other language pairs.

2017

Cost Weighting for Neural Machine Translation Domain Adaptation
Boxing Chen | Colin Cherry | George Foster | Samuel Larkin
Proceedings of the First Workshop on Neural Machine Translation

In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation in order to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.

NRC Machine Translation System for WMT 2017
Chi-kiu Lo | Boxing Chen | Colin Cherry | George Foster | Samuel Larkin | Darlene Stewart | Roland Kuhn
Proceedings of the Second Conference on Machine Translation

2016

Semi-supervised Convolutional Networks for Translation Adaptation with Tiny Amount of In-domain Data
Boxing Chen | Fei Huang
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning

2015

Multi-level Evaluation for Machine Translation
Boxing Chen | Hongyu Guo | Roland Kuhn
Proceedings of the Tenth Workshop on Statistical Machine Translation

Representation Based Translation Evaluation Metrics
Boxing Chen | Hongyu Guo
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Bilingual Sentiment Consistency for Statistical Machine Translation
Boxing Chen | Xiaodan Zhu
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU
Boxing Chen | Colin Cherry
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

Vector Space Model for Adaptation in Statistical Machine Translation
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Adaptation of Reordering Models for Statistical Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
Boxing Chen | Roland Kuhn | Samuel Larkin
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improving AMBER, an MT Evaluation Metric
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

AMBER: A Modified BLEU, Enhanced Ranking Metric
Boxing Chen | Roland Kuhn
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

Fast Consensus Hypothesis Regeneration for Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin | Boxing Chen | George Foster | Ulrich Germann | Eric Joanis | Howard Johnson | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Phrase Clustering for Smoothing TM Probabilities - or, How to Extract Paraphrases from Phrase Tables
Roland Kuhn | Boxing Chen | George Foster | Evan Stratford
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Bilingual Sense Similarity for Statistical Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

2009

A Comparative Study of Hypothesis Alignment and its Improvement for Machine Translation System Combination
Boxing Chen | Min Zhang | Haizhou Li | Aiti Aw
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Exploiting N-best Hypotheses for SMT Self-Enhancement
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

Regenerating Hypotheses for Statistical Machine Translation
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2006

A Web-based Demonstrator of a Multi-lingual Phrase-based Translation System
Roldano Cattoni | Nicola Bertoldi | Mauro Cettolo | Boxing Chen | Marcello Federico
Demonstrations

2004

Combining clues for lexical level aligning using the Null hypothesis approach
Olivier Kraif | Boxing Chen
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Using a Word Sense Disambiguation system for translation disambiguation: the LIA-LIDILEM team experiment
Grégoire Moreau de Montcheuil | Marc El-Bèze | Boxing Chen | Olivier Kraif
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

2003

Preparatory Work on Automatic Extraction of Bilingual Multi-Word Units from Parallel Corpora
Boxing Chen | Limin Du
International Journal of Computational Linguistics & Chinese Language Processing, Volume 8, Number 2, August 2003