Mu Li


2020

Emotion Classification by Jointly Learning to Lexiconize and Classify
Deyu Zhou | Shuangzhi Wu | Qing Wang | Jun Xie | Zhaopeng Tu | Mu Li
Proceedings of the 28th International Conference on Computational Linguistics

Emotion lexicons have been shown to be effective for emotion classification (Baziotis et al., 2018). Previous studies handle emotion lexicon construction and emotion classification separately. In this paper, we propose an emotional network (EmNet) to jointly learn sentence emotions and construct emotion lexicons which are dynamically adapted to a given context. The dynamic emotion lexicons are useful for handling words that carry different emotions in different contexts, which can effectively improve classification accuracy. We validate the approach on two representative architectures, LSTM and BERT, demonstrating its superiority in identifying emotions in Tweets. Our model outperforms several approaches proposed in previous studies and achieves a new state of the art on the benchmark Twitter dataset.

2018

Bidirectional Generative Adversarial Networks for Neural Machine Translation
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 22nd Conference on Computational Natural Language Learning

Generative Adversarial Networks (GANs) have been proposed to tackle the exposure bias problem of Neural Machine Translation (NMT). However, the discriminator typically makes GAN training unstable because of the inadequate training problem: the search space is so huge that sampled translations are not sufficient for discriminator training. To address this issue and stabilize GAN training, we propose a novel Bidirectional Generative Adversarial Network for Neural Machine Translation (BGAN-NMT), which introduces a generator model to act as the discriminator, whereby the discriminator naturally considers the entire translation space so that the inadequate training problem can be alleviated. To satisfy this property, the generator and the discriminator are both designed to model the joint probability of sentence pairs, with the difference that the generator decomposes the joint probability with a source language model and a source-to-target translation model, while the discriminator is formulated as a target language model and a target-to-source translation model. To further leverage their symmetry, an auxiliary GAN is introduced that adopts the generator and discriminator models of the original GAN as its own discriminator and generator, respectively. The two GANs are alternately trained to update the parameters. Experimental results on German-English and Chinese-English translation tasks demonstrate that our method not only stabilizes GAN training but also achieves significant improvements over baseline systems.
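
The two factorizations described in this abstract can be written out as a brief sketch. The notation is assumed here for illustration and is not taken from the paper: x is a source sentence, y is a target sentence, and P_G and P_D are the joint probabilities modeled by the generator and the discriminator.

```latex
\begin{align}
  % Generator: source language model times source-to-target translation model
  P_G(x, y) &= P(x)\, P(y \mid x) \\
  % Discriminator: target language model times target-to-source translation model
  P_D(x, y) &= P(y)\, P(x \mid y)
\end{align}
```

Both expressions factorize the same joint distribution over sentence pairs in opposite directions, which is what lets the auxiliary GAN swap the two roles.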

Generative Bridging Network for Neural Sequence Prediction
Wenhu Chen | Guanlin Li | Shuo Ren | Shujie Liu | Zhirui Zhang | Mu Li | Ming Zhou
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

To alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE, which directly maximizes the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize their KL-divergence. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve learning burden, respectively. Experiments conducted on two recognized sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong baselines. Furthermore, by analyzing samples drawn from different bridges, the expected influences on the generator are verified.
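
The generator objective described in this abstract can be sketched as follows. The symbols are assumptions made for illustration: x is the input, y^* the ground-truth output, p_\eta the bridge distribution, and p_\theta the generator. The direction of the KL-divergence is also an assumption, since the abstract does not state it.

```latex
\begin{align}
  \theta^{\ast}
    &= \arg\min_{\theta}\;
       \mathrm{KL}\!\left( p_{\eta}(y \mid y^{\ast}) \,\|\, p_{\theta}(y \mid x) \right) \\
    % For a fixed bridge, the entropy of p_eta does not depend on theta
    &= \arg\max_{\theta}\;
       \mathbb{E}_{y \sim p_{\eta}(\cdot \mid y^{\ast})}
       \left[ \log p_{\theta}(y \mid x) \right]
\end{align}
```

Under these assumptions, a bridge that puts all its mass on y^* reduces the objective to maximizing \log p_\theta(y^* \mid x), which is ordinary MLE.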

Triangular Architecture for Rare Language Translation
Shuo Ren | Wenhu Chen | Shujie Liu | Mu Li | Ming Zhou | Shuai Ma
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural Machine Translation (NMT) performs poorly on a low-resource language pair (X,Z), especially when Z is a rare language. By introducing another rich language Y, we propose a novel triangular training architecture (TA-NMT) that leverages bilingual data (Y,Z), which may be small, and (X,Y), which can be rich, to improve the translation performance of low-resource pairs. In this triangular architecture, Z is taken as the intermediate latent variable, and the translation models of Z are jointly optimized with a unified bidirectional EM algorithm under the goal of maximizing the translation likelihood of (X,Y). Empirical results demonstrate that our method significantly improves the translation quality of rare languages on the MultiUN and IWSLT2012 datasets, and achieves even better performance when combined with back-translation methods.

2017

Sequence-to-Dependency Neural Machine Translation
Shuangzhi Wu | Dongdong Zhang | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A typical Neural Machine Translation (NMT) model generates translations from left to right as a linear sequence, without explicitly considering the latent syntactic structures of the target sentences. Inspired by the success of using syntactic knowledge of the target language to improve statistical machine translation, we propose a novel Sequence-to-Dependency Neural Machine Translation (SD-NMT) method, in which the target word sequence and its corresponding dependency structure are jointly constructed and modeled, and this structure is used as context to facilitate word generation. Experimental results show that the proposed method significantly outperforms state-of-the-art baselines on Chinese-English and Japanese-English translation tasks.

Chunk-based Decoder for Neural Machine Translation
Shonosuke Ishiwatari | Jingtao Yao | Shujie Liu | Mu Li | Ming Zhou | Naoki Yoshinaga | Masaru Kitsuregawa | Weijia Jia
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chunks (or phrases) once played a pivotal role in machine translation. By using a chunk rather than a word as the basic translation unit, local (intra-chunk) and global (inter-chunk) word orders and dependencies can be easily modeled. The chunk structure, despite its importance, has not been considered in the decoders used for neural machine translation (NMT). In this paper, we propose chunk-based decoders for NMT, each of which consists of a chunk-level decoder and a word-level decoder. The chunk-level decoder models global dependencies while the word-level decoder decides the local word order in a chunk. To output a target sentence, the chunk-level decoder generates a chunk representation containing global information, which the word-level decoder then uses as a basis to predict the words inside the chunk. Experimental results show that our proposed decoders can significantly improve translation performance in a WAT '16 English-to-Japanese translation task.

Stack-based Multi-layer Attention for Transition-based Dependency Parsing
Zhirui Zhang | Shujie Liu | Mu Li | Ming Zhou | Enhong Chen
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Although sequence-to-sequence (seq2seq) networks have achieved significant success in many NLP tasks such as machine translation and text summarization, simply applying this approach to transition-based dependency parsing does not yield performance comparable to other state-of-the-art methods, such as stack-LSTM and head selection. In this paper, we propose a stack-based multi-layer attention model for seq2seq learning to better leverage structural linguistic information. In our method, two binary vectors are used to track the decoding stack in transition-based parsing, and multi-layer attention is introduced to capture multiple word dependencies in partial trees. We conduct experiments on the PTB and CTB datasets, and the results show that our proposed model achieves state-of-the-art accuracy and a significant improvement in labeled precision over the baseline seq2seq model.

2016

Improving Attention Modeling with Implicit Distortion and Fertility for Machine Translation
Shi Feng | Shujie Liu | Nan Yang | Mu Li | Ming Zhou | Kenny Q. Zhu
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In neural machine translation, the attention mechanism facilitates the translation process by producing a soft alignment between the source sentence and the target sentence. However, without the dedicated distortion and fertility models seen in traditional SMT systems, the learned alignment may not be accurate, which can lead to low translation quality. In this paper, we propose two novel models to improve attention-based neural machine translation: a recurrent attention mechanism as an implicit distortion model, and a fertility-conditioned decoder as an implicit fertility model. We conduct experiments on large-scale Chinese–English translation tasks. The results show that our models significantly improve both alignment and translation quality compared to the original attention mechanism and several other variations.

Knowledge-Based Semantic Embedding for Machine Translation
Chen Shi | Shujie Liu | Shuo Ren | Shi Feng | Mu Li | Ming Zhou | Xu Sun | Houfeng Wang
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Hierarchical Recurrent Neural Network for Document Modeling
Rui Lin | Shujie Liu | Muyun Yang | Mu Li | Ming Zhou | Sheng Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

Bilingually-constrained Phrase Embeddings for Machine Translation
Jiajun Zhang | Shujie Liu | Mu Li | Ming Zhou | Chengqing Zong
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning Topic Representation for SMT with Neural Networks
Lei Cui | Dongdong Zhang | Shujie Liu | Qiming Chen | Mu Li | Ming Zhou | Muyun Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Recursive Recurrent Neural Network for Statistical Machine Translation
Shujie Liu | Nan Yang | Mu Li | Ming Zhou
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Lexicalized Reordering Model for Hierarchical Phrase-based Translation
Hailong Cao | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Efficient Collective Entity Linking with Stacking
Zhengyan He | Shujie Liu | Yang Song | Mu Li | Ming Zhou | Houfeng Wang
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Multi-Domain Adaptation for SMT Using Multi-Task Learning
Lei Cui | Xilun Chen | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Word Alignment Modeling with Context Dependent Deep Neural Network
Nan Yang | Shujie Liu | Mu Li | Ming Zhou | Nenghai Yu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Punctuation Prediction with Transition-based Parsing
Dongdong Zhang | Shuangzhi Wu | Nan Yang | Mu Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning Entity Representation for Entity Disambiguation
Zhengyan He | Shujie Liu | Mu Li | Ming Zhou | Longkai Zhang | Houfeng Wang
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Bilingual Data Cleaning for SMT using Graph-based Random Walk
Lei Cui | Dongdong Zhang | Shujie Liu | Mu Li | Ming Zhou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Learning Translation Consensus with Structured Label Propagation
Shujie Liu | Chi-Ho Li | Mu Li | Ming Zhou
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Ranking-based Approach to Word Reordering for Statistical Machine Translation
Nan Yang | Mu Li | Dongdong Zhang | Nenghai Yu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Hierarchical Chunk-to-String Translation
Yang Feng | Dongdong Zhang | Mu Li | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
Seung-Wook Lee | Dongdong Zhang | Mu Li | Ming Zhou | Hae-Chang Rim
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Forced Derivation Tree based Model Training to Statistical Machine Translation
Nan Duan | Mu Li | Ming Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Re-training Monolingual Parser Bilingually for Syntactic SMT
Shujie Liu | Chi-Ho Li | Mu Li | Ming Zhou
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Hypothesis Mixture Decoding for Statistical Machine Translation
Nan Duan | Mu Li | Ming Zhou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Mixture Model-based Minimum Bayes Risk Decoding using Multiple Machine Translation Systems
Nan Duan | Mu Li | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Adaptive Development Data Selection for Log-linear Model in Statistical Machine Translation
Mu Li | Yinggong Zhao | Dongdong Zhang | Ming Zhou
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Hybrid Decoding: Decoding with Partial Hypotheses Combination over Multiple SMT Systems
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Coling 2010: Posters

A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
Lei Cui | Dongdong Zhang | Mu Li | Ming Zhou | Tiejun Zhao
Proceedings of the ACL 2010 Conference Short Papers

2009

Better Synchronous Binarization for Machine Translation
Tong Xiao | Mu Li | Dongdong Zhang | Jingbo Zhu | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

The Feature Subspace Method for SMT System Combination
Nan Duan | Mu Li | Tong Xiao | Ming Zhou
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Extracting Keyphrases from Chinese News Articles Using TextRank and Query Log Knowledge
Weiming Liang | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders
Mu Li | Nan Duan | Dongdong Zhang | Chi-Ho Li | Ming Zhou
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Measure Word Generation for English-Chinese SMT Systems
Dongdong Zhang | Mu Li | Nan Duan | Chi-Ho Li | Ming Zhou
Proceedings of ACL-08: HLT

An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
Chi-Ho Li | Hailei Zhang | Dongdong Zhang | Mu Li | Ming Zhou
Proceedings of the Third Workshop on Statistical Machine Translation

Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points
Ming Zhou | Bo Wang | Shujie Liu | Mu Li | Dongdong Zhang | Tiejun Zhao
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
Chi-Ho Li | Minghui Li | Dongdong Zhang | Mu Li | Ming Zhou | Yi Guan
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Improving Query Spelling Correction Using Web Search Results
Qing Chen | Mu Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Phrase Reordering Model Integrating Syntactic Knowledge for SMT
Dongdong Zhang | Mu Li | Chi-Ho Li | Ming Zhou
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Exploring Distributional Similarity Based Models for Query Spelling Correction
Mu Li | Muhua Zhu | Yang Zhang | Ming Zhou
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

An Improved Chinese Word Segmentation System with Conditional Random Field
Hai Zhao | Chang-Ning Huang | Mu Li
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

Discriminative Reranking for Spelling Correction
Yang Zhang | Pilian He | Wei Xiang | Mu Li
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling
Hai Zhao | Chang-Ning Huang | Mu Li | Bao-Liang Lu
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation

2005

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Jianfeng Gao | Mu Li | Andi Wu | Chang-Ning Huang
Computational Linguistics, Volume 31, Number 4, December 2005

Detecting Segmentation Errors in Chinese Annotated Corpus
Chengjie Sun | Chang-Ning Huang | Xiaolong Wang | Mu Li
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing

2004

Adaptive Chinese Word Segmentation
Jianfeng Gao | Andi Wu | Mu Li | Chang-Ning Huang | Hongqiao Li | Xinsong Xia | Haowei Qin
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Unsupervised Training for Overlapping Ambiguity Resolution in Chinese Word Segmentation
Mu Li | Jianfeng Gao | Chang-Ning Huang | Jianfeng Li
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

Single Character Chinese Named Entity Recognition
Xiaodan Zhu | Mu Li | Jianfeng Gao | Chang-Ning Huang
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

Improved Source-Channel Models for Chinese Word Segmentation
Jianfeng Gao | Mu Li | Chang-Ning Huang
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics