Jun Lu


2018

pdf bib
Alibaba’s Neural Machine Translation Systems for WMT18
Yongchao Deng | Shanbo Cheng | Jun Lu | Kai Song | Jingang Wang | Shenglan Wu | Liang Yao | Guchun Zhang | Haibo Zhang | Pei Zhang | Changfeng Zhu | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the submission systems of Alibaba for WMT18 shared news translation task. We participated in 5 translation directions including English ↔ Russian, English ↔ Turkish in both directions and English → Chinese. Our systems are based on Google’s Transformer model architecture, into which we integrated the most recent features from the academic research. We also employed most techniques that have been proven effective during the past WMT years, such as BPE, back translation, data selection, model ensembling and reranking, at industrial scale. For some morphologically-rich languages, we also incorporated linguistic knowledge into our neural network. For the translation tasks in which we have participated, our resulting systems achieved the best case sensitive BLEU score in all 5 directions. Notably, our English → Russian system outperformed the second reranked system by 5 BLEU score.

pdf bib
Alibaba Submission to the WMT18 Parallel Corpus Filtering Task
Jun Lu | Xiaoyu Lv | Yangbin Shi | Boxing Chen
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Alibaba Machine Translation Group submissions to the WMT 2018 Shared Task on Parallel Corpus Filtering. While evaluating the quality of the parallel corpus, the three characteristics of the corpus are investigated, i.e. 1) the bilingual/translation quality, 2) the monolingual quality and 3) the corpus diversity. Both rule-based and model-based methods are adapted to score the parallel sentence pairs. The final parallel corpus filtering system is reliable, easy to build and adapt to other language pairs.

2014

pdf bib
An Iterative Link-based Method for Parallel Web Page Mining
Le Liu | Yu Hong | Jun Lu | Jun Lang | Heng Ji | Jianmin Yao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2012

pdf bib
Semi-supervised Chinese Word Segmentation for CLP2012
Saike He | Nan He | Songxiang Cen | Jun Lu
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2011

pdf bib
Factual or Satisfactory: What Search Results Are Better?
Yu Hong | Jun Lu | Shiqi Zhao
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation