Longyue Wang


2020

pdf bib
Self-Attention with Cross-Lingual Position Representation
Liang Ding | Longyue Wang | Dacheng Tao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Position encoding (PE), an essential part of self-attention networks (SANs), is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences. However, in cross-lingual scenarios, machine translation, the PEs of source and target sentences are modeled independently. Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help SANs tackle this problem. In this paper, we augment SANs with cross-lingual position representations to model the bilingually aware latent structure for the input sentence. Specifically, we utilize bracketing transduction grammar (BTG)-based reordering information to encourage SANs to learn bilingual diagonal alignments. Experimental results on WMT’14 EnglishGerman, WAT’17 JapaneseEnglish, and WMT’17 ChineseEnglish translation tasks demonstrate that our approach significantly and consistently improves translation quality over strong baselines. Extensive analyses confirm that the performance gains come from the cross-lingual information.

pdf bib
How Does Selective Mechanism Improve Self-Attention Networks?
Xinwei Geng | Longyue Wang | Xing Wang | Bing Qin | Ting Liu | Zhaopeng Tu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Self-attention networks (SANs) with selective mechanism has produced substantial improvements in various NLP tasks by concentrating on a subset of input words. However, the underlying reasons for their strong performance have not been well explained. In this paper, we bridge the gap by assessing the strengths of selective SANs (SSANs), which are implemented with a flexible and universal Gumbel-Softmax. Experimental results on several representative NLP tasks, including natural language inference, semantic role labelling, and machine translation, show that SSANs consistently outperform the standard SANs. Through well-designed probing experiments, we empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly-cited weaknesses of SANs: word order encoding and structure modeling. Specifically, the selective mechanism improves SANs by paying more attention to content words that contribute to the meaning of the sentence.

pdf bib
On the Sub-layer Functionalities of Transformer Decoder
Yilin Yang | Longyue Wang | Shuming Shi | Prasad Tadepalli | Stefan Lee | Zhaopeng Tu
Findings of the Association for Computational Linguistics: EMNLP 2020

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During translation, the decoder must predict output tokens by considering both the source-language text from the encoder and the target-language prefix produced in previous steps. In this work, we study how Transformer-based decoders leverage information from the source and target languages – developing a universal probe task to assess how information is propagated through each module of each decoder layer. We perform extensive experiments on three major translation datasets (WMT En-De, En-Fr, and En-Zh). Our analysis provides insight on when and where decoders leverage different sources. Based on these insights, we demonstrate that the residual feed-forward module in each Transformer decoder layer can be dropped with minimal loss of performance – a significant reduction in computation and number of parameters, and consequently a significant boost to both training and inference speed.

pdf bib
Context-Aware Cross-Attention for Non-Autoregressive Translation
Liang Ding | Longyue Wang | Di Wu | Dacheng Tao | Zhaopeng Tu
Proceedings of the 28th International Conference on Computational Linguistics

Non-autoregressive translation (NAT) significantly accelerates the inference process by predicting the entire target sequence. However, due to the lack of target dependency modelling in the decoder, the conditional generation process heavily depends on the cross-attention. In this paper, we reveal a localness perception problem in NAT cross-attention, for which it is difficult to adequately capture source context. To alleviate this problem, we propose to enhance signals of neighbour source tokens into conventional cross-attention. Experimental results on several representative datasets show that our approach can consistently improve translation quality over strong NAT baselines. Extensive analyses demonstrate that the enhanced cross-attention achieves better exploitation of source contexts by leveraging both local and global information.

pdf bib
On the Sparsity of Neural Machine Translation Models
Yong Wang | Longyue Wang | Victor Li | Zhaopeng Tu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information.

2019

pdf bib
One Model to Learn Both: Zero Pronoun Prediction and Translation
Longyue Wang | Zhaopeng Tu | Xing Wang | Shuming Shi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Zero pronouns (ZPs) are frequently omitted in pro-drop languages, but should be recalled in non-pro-drop languages. This discourse phenomenon poses a significant challenge for machine translation (MT) when translating texts from pro-drop to non-pro-drop languages. In this paper, we propose a unified and discourse-aware ZP translation approach for neural MT models. Specifically, we jointly learn to predict and translate ZPs in an end-to-end manner, allowing both components to interact with each other. In addition, we employ hierarchical neural networks to exploit discourse-level context, which is beneficial for ZP prediction and thus translation. Experimental results on both Chinese-English and Japanese-English data show that our approach significantly and accumulatively improves both translation performance and ZP prediction accuracy over not only baseline but also previous works using external ZP prediction models. Extensive analyses confirm that the performance improvement comes from the alleviation of different kinds of errors especially caused by subjective ZPs.

pdf bib
Towards Understanding Neural Machine Translation with Word Importance
Shilin He | Zhaopeng Tu | Xing Wang | Longyue Wang | Michael Lyu | Shuming Shi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Although neural machine translation (NMT) has advanced the state-of-the-art on various language pairs, the interpretability of NMT remains unsatisfactory. In this work, we propose to address this gap by focusing on understanding the input-output behavior of NMT models. Specifically, we measure the word importance by attributing the NMT output to every input word through a gradient-based method. We validate the approach on a couple of perturbation operations, language pairs, and model architectures, demonstrating its superiority on identifying input words with higher influence on translation performance. Encouragingly, the calculated importance can serve as indicators of input words that are under-translated by NMT models. Furthermore, our analysis reveals that words of certain syntactic categories have higher importance while the categories vary across language pairs, which can inspire better design principles of NMT architectures for multi-lingual translation.

pdf bib
Self-Attention with Structural Position Representations
Xing Wang | Zhaopeng Tu | Longyue Wang | Shuming Shi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Although self-attention networks (SANs) have advanced the state-of-the-art on various NLP tasks, one criticism of SANs is their ability of encoding positions of input words (Shaw et al., 2018). In this work, we propose to augment SANs with structural position representations to model the latent structure of the input sentence, which is complementary to the standard sequential positional representations. Specifically, we use dependency tree to represent the grammatical structure of a sentence, and propose two strategies to encode the positional relationships among words in the dependency tree. Experimental results on NIST Chinese-to-English and WMT14 English-to-German translation tasks show that the proposed approach consistently boosts performance over both the absolute and relative sequential position representations.

pdf bib
Assessing the Ability of Self-Attention Networks to Learn Word Order
Baosong Yang | Longyue Wang | Derek F. Wong | Lidia S. Chao | Zhaopeng Tu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self-attention networks (SAN) have attracted a lot of interests due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Due to the lack of recurrence structure such as recurrent neural networks (RNN), SAN is ascribed to be weak at learning positional information of words for sequence modeling. However, neither this speculation has been empirically confirmed, nor explanations for their strong performances on machine translation tasks when “lacking positional information” have been explored. To this end, we propose a novel word reordering detection task to quantify how well the word order information learned by SAN and RNN. Specifically, we randomly move one word to another position, and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed has difficulty learning the positional information even with the position embedding; and 2) SAN trained on machine translation learns better positional information than its RNN counterpart, in which position embedding plays a critical role. Although recurrence structure make the model more universally-effective on learning word order, learning objectives matter more in the downstream tasks such as machine translation.

pdf bib
Exploiting Sentential Context for Neural Machine Translation
Xing Wang | Zhaopeng Tu | Longyue Wang | Shuming Shi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this work, we present novel approaches to exploit sentential context for neural machine translation (NMT). Specifically, we show that a shallow sentential context extracted from the top encoder layer only, can improve translation performance via contextualizing the encoding representations of individual words. Next, we introduce a deep sentential context, which aggregates the sentential context representations from all of the internal layers of the encoder to form a more comprehensive context representation. Experimental results on the WMT14 English-German and English-French benchmarks show that our model consistently improves performance over the strong Transformer model, demonstrating the necessity and effectiveness of exploiting sentential context for NMT.

pdf bib
Modeling Recurrence for Transformer
Jie Hao | Xing Wang | Baosong Yang | Longyue Wang | Jinfeng Zhang | Zhaopeng Tu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recently, the Transformer model that is based solely on attention mechanisms, has advanced the state-of-the-art on various machine translation tasks. However, recent studies reveal that the lack of recurrence modeling hinders its further improvement of translation capacity. In response to this problem, we propose to directly model recurrence for Transformer with an additional recurrence encoder. In addition to the standard recurrent neural network, we introduce a novel attentive recurrent network to leverage the strengths of both attention models and recurrent networks. Experimental results on the widely-used WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness of the proposed approach. Our studies also reveal that the proposed model benefits from a short-cut that bridges the source and target sequences with a single recurrent layer, which outperforms its deep counterpart.

pdf bib
Convolutional Self-Attention Networks
Baosong Yang | Longyue Wang | Derek F. Wong | Lidia S. Chao | Zhaopeng Tu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend to information from different representation subspaces. In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to 1) strengthen dependencies among neighboring elements, and 2) model the interaction between features extracted by multiple attention heads. Experimental results of machine translation on different language pairs and model settings show that our approach outperforms both the strong Transformer baseline and other existing models on enhancing the locality of SANs. Comparing with prior studies, the proposed model is parameter free in terms of introducing no more parameters.

2018

pdf bib
Chinese-Portuguese Machine Translation: A Study on Building Parallel Corpora from Comparable Texts
Siyou Liu | Longyue Wang | Chao-Hong Liu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Learning to Jointly Translate and Predict Dropped Pronouns with a Shared Reconstruction Mechanism
Longyue Wang | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. Recently, Wang et al. (2018) proposed a novel reconstruction-based approach to alleviating dropped pronoun (DP) translation problems for neural machine translation models. In this work, we improve the original model from two perspectives. First, we employ a shared reconstructor to better exploit encoder and decoder representations. Second, we jointly learn to translate and predict DPs in an end-to-end manner, to avoid the errors propagated from an external DP prediction model. Experimental results show that our approach significantly improves both translation performance and DP prediction accuracy.

2017

pdf bib
Semantics-Enhanced Task-Oriented Dialogue Translation: A Case Study on Hotel Booking
Longyue Wang | Jinhua Du | Liangyou Li | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the IJCNLP 2017, System Demonstrations

We showcase TODAY, a semantics-enhanced task-oriented dialogue translation system, whose novelties are: (i) task-oriented named entity (NE) definition and a hybrid strategy for NE recognition and translation; and (ii) a novel grounded semantic method for dialogue understanding and task-order management. TODAY is a case-study demo which can efficiently and accurately assist customers and agents in different languages to reach an agreement in a dialogue for the hotel booking.

pdf bib
Exploiting Cross-Sentence Context for Neural Machine Translation
Longyue Wang | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a cross-sentence context-aware approach and investigate the influence of historical contextual information on the performance of neural machine translation (NMT). First, this history is summarized in a hierarchical way. We then integrate the historical representation into NMT in two strategies: 1) a warm-start of encoder and decoder states, and 2) an auxiliary context source for updating decoder states. Experimental results on a large Chinese-English translation task show that our approach significantly improves upon a strong attention-based NMT system by up to +2.1 BLEU points.

2016

pdf bib
Automatic Construction of Discourse Corpora for Dialogue Translation
Longyue Wang | Xiaojun Zhang | Zhaopeng Tu | Andy Way | Qun Liu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, a novel approach is proposed to automatically construct parallel discourse corpus for dialogue machine translation. Firstly, the parallel subtitle data and its corresponding monolingual movie script data are crawled and collected from Internet. Then tags such as speaker and discourse boundary from the script data are projected to its subtitle data via an information retrieval approach in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show our proposed method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, and speaker-based language model adaptation can obtain around 0.5 BLEU points improvement in translation qualities. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.

pdf bib
A Novel Approach to Dropped Pronoun Translation
Longyue Wang | Zhaopeng Tu | Xiaojun Zhang | Hang Li | Andy Way | Qun Liu
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf bib
The DCU Discourse Parser: A Sense Classification Task
Tsuyoshi Okita | Longyue Wang | Qun Liu
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

pdf bib
The DCU Discourse Parser for Connective, Argument Identification and Explicit Sense Classification
Longyue Wang | Chris Hokamp | Tsuyoshi Okita | Xiaojun Zhang | Qun Liu
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

2014

pdf bib
Factored Statistical Machine Translation for Grammatical Error Correction
Yiming Wang | Longyue Wang | Xiaodong Zeng | Derek F. Wong | Lidia S. Chao | Yi Lu
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Domain Adaptation for Medical Text Translation using Web Resources
Yi Lu | Longyue Wang | Derek F. Wong | Lidia S. Chao | Yiming Wang
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Combining Domain Adaptation Approaches for Medical Text Translation
Longyue Wang | Yi Lu | Derek F. Wong | Lidia S. Chao | Yiming Wang | Francisco Oliveira
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
Liang Tian | Derek F. Wong | Lidia S. Chao | Paulo Quaresma | Francisco Oliveira | Yi Lu | Shuo Li | Yiming Wang | Longyue Wang
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Parallel corpus is a valuable resource for cross-language information retrieval and data-driven natural language processing systems, especially for Statistical Machine Translation (SMT). However, most existing parallel corpora to Chinese are subject to in-house use, while others are domain specific and limited in size. To a certain degree, this limits the SMT research. This paper describes the acquisition of a large scale and high quality parallel corpora for English and Chinese. The corpora constructed in this paper contain about 15 million English-Chinese (E-C) parallel sentences, and more than 2 million training data and 5,000 testing sentences are made publicly available. Different from previous work, the corpus is designed to embrace eight different domains. Some of them are further categorized into different topics. The corpus will be released to the research community, which is available at the NLP2CT website.

2013

pdf bib
UM-Checker: A Hybrid System for English Grammatical Error Correction
Junwen Xing | Longyue Wang | Derek F. Wong | Lidia S. Chao | Xiaodong Zeng
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Edit Distance: A New Data Selection Criterion for Domain Adaptation in SMT
Longyue Wang | Derek F. Wong | Lidia S. Chao | Junwen Xing | Yi Lu | Isabel Trancoso
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

pdf bib
CRFs-Based Chinese Word Segmentation for Micro-Blog with Small-Scale Data
Longyue Wang | Derek F. Wong | Lidia S. Chao | Junwen Xing
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
A Joint Chinese Named Entity Recognition and Disambiguation System
Longyue Wang | Shuo Li | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf bib
An Improvement in Cross-Language Document Retrieval Based on Statistical Models
Longyue Wang | Derek F. Wong | Lidia S. Chao
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)