Jianwei Cui


pdf bib
Focus-Constrained Attention Mechanism for CVAE-based Response Generation
Zhi Cui | Yanran Li | Jiayi Zhang | Jianwei Cui | Chen Wei | Bin Wang
Findings of the Association for Computational Linguistics: EMNLP 2020

To model diverse responses for a given post, one promising way is to introduce a latent variable into Seq2Seq models. The latent variable is supposed to capture the discourse-level information and encourage the informativeness of target responses. However, such discourse-level information is often too coarse for the decoder to be utilized. To tackle it, our idea is to transform the coarse-grained discourse-level information into fine-grained word-level information. Specifically, we firstly measure the semantic concentration of corresponding target response on the post words by introducing a fine-grained focus signal. Then, we propose a focus-constrained attention mechanism to take full advantage of focus in well aligning the input to the target response. The experimental results demonstrate that by exploiting the fine-grained signal, our model can generate more diverse and informative responses compared with several state-of-the-art models.

pdf bib
Xiaomi’s Submissions for IWSLT 2020 Open Domain Translation Task
Yuhui Sun | Mengxue Guo | Xiang Li | Jianwei Cui | Bin Wang
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes the Xiaomi’s submissions to the IWSLT20 shared open domain translation task for Chinese<->Japanese language pair. We explore different model ensembling strategies based on recent Transformer variants. We also further strengthen our systems via some effective techniques, such as data filtering, data selection, tagged back translation, domain adaptation, knowledge distillation, and re-ranking. Our resulting Chinese->Japanese primary system ranked second in terms of character-level BLEU score among all submissions. Our resulting Japanese->Chinese primary system also achieved a competitive performance.

pdf bib
Modeling Discourse Structure for Document-level Neural Machine Translation
Junxuan Chen | Xiang Li | Jiarui Zhang | Chulun Zhou | Jianwei Cui | Bin Wang | Jinsong Su
Proceedings of the First Workshop on Automatic Simultaneous Translation

Recently, document-level neural machine translation (NMT) has become a hot topic in the community of machine translation. Despite its success, most of existing studies ignored the discourse structure information of the input document to be translated, which has shown effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN) (Miculicich et al., 2018). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model can significantly outperform both Transformer and Transformer+HAN.

pdf bib
Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism
Pan Xie | Zhi Cui | Xiuying Chen | XiaoHui Hu | Jianwei Cui | Bin Wang
Proceedings of the 28th International Conference on Computational Linguistics

Non-autoregressive models generate target words in a parallel way, which achieve a faster decoding speed but at the sacrifice of translation accuracy. To remedy a flawed translation by non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM), and refine the generated results within several iterations. Unfortunately, such approach hardly considers the sequential dependency among target words, which inevitably results in a translation degradation. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask to the same decoder of CMTM, and then induce it to autoregressively review whether each generated word from CMTM is supposed to be replaced or kept. The experimental results (WMT14 En ↔ De and WMT16 En ↔ Ro) demonstrate that our model uses dramatically less training computations than the typical CMTM, as well as outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.