Derek F. Wong


2020

Guiding Variational Response Generator to Exploit Persona
Bowen Wu | MengYuan Li | Zongsheng Wang | Yifu Chen | Derek F. Wong | Qihang Feng | Junhong Huang | Baoxun Wang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Leveraging persona information of users in Neural Response Generators (NRG) to perform personalized conversations has been considered an attractive and important topic in conversational agent research over the past few years. Despite the promising progress achieved by recent studies in this field, persona information tends to be incorporated into neural networks in the form of user embeddings, with the expectation that the persona can be captured via end-to-end learning. This paper proposes to incorporate the personality-related characteristics of human conversations into variational response generators, by designing a specific deep model based on a conditional variational autoencoder, with two new regularization terms added to the loss function, so as to guide the optimization towards generating responses that are both persona-aware and relevant. In addition, to reasonably evaluate the performance of various persona modeling approaches, this paper further presents three direct persona-oriented metrics from different perspectives. The experimental results show that our proposed methodology notably improves the performance of persona-aware response generation, and that the metrics are reasonable for evaluating the results.
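The abstract does not define the two persona regularization terms, so the Python sketch below shows only the generic conditional VAE objective such a model builds on: a reconstruction loss plus the KL divergence between a posterior (recognition) network and a prior network over diagonal Gaussian latents, sampled with the reparameterization trick. All function names, the `kl_weight` knob, and the numbers in the usage lines are hypothetical; this is not the authors' model.

```python
import math
import random
from typing import List


def gaussian_kl(mu_q: List[float], logvar_q: List[float],
                mu_p: List[float], logvar_p: List[float]) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q, var_p = math.exp(lq), math.exp(lp)
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total


def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def cvae_loss(recon_nll: float, mu_q: List[float], logvar_q: List[float],
              mu_p: List[float], logvar_p: List[float], kl_weight: float = 1.0) -> float:
    """Negative ELBO: reconstruction NLL plus a (possibly annealed) KL term."""
    return recon_nll + kl_weight * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)


# Hypothetical values: posterior q(z | context, response) vs. prior p(z | context).
rng = random.Random(0)
z = reparameterize([0.2, -0.1], [-1.0, -1.0], rng)
loss = cvae_loss(3.5, [0.2, -0.1], [-1.0, -1.0], [0.0, 0.0], [0.0, 0.0])
```

Persona-specific regularizers of the kind the abstract describes would be extra terms added to `cvae_loss`.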

Norm-Based Curriculum Learning for Neural Machine Translation
Xuebo Liu | Houtim Lai | Derek F. Wong | Lidia S. Chao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

A neural machine translation (NMT) system is expensive to train, especially in high-resource settings. As NMT architectures become deeper and wider, this issue gets worse. In this paper, we aim to improve the efficiency of training an NMT model by introducing a novel norm-based curriculum learning method. We use the norm (aka length or module) of a word embedding as a measure of 1) the difficulty of the sentence, 2) the competence of the model, and 3) the weight of the sentence. The norm-based sentence difficulty combines the advantages of both linguistically motivated and model-based sentence difficulties: it is easy to determine and contains learning-dependent features. The norm-based model competence makes NMT learn the curriculum in a fully automated way, while the norm-based sentence weight further enhances the learning of the vector representations of the NMT model. Experimental results on the WMT’14 English-German and WMT’17 Chinese-English translation tasks demonstrate that the proposed method outperforms strong baselines in terms of BLEU score (+1.17/+1.56) and training speedup (2.22x/3.33x).
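A minimal Python sketch of the curriculum idea, assuming sentence difficulty is the sum of the L2 norms of its word embeddings, normalized to (0, 1] by rank (an empirical CDF). The competence function here is a hypothetical square-root schedule substituted for illustration; the paper instead derives model competence from embedding norms, and the norm-based sentence weighting is not shown. The toy embeddings and all names are assumptions.

```python
import math
from typing import Dict, List

Sentence = List[str]


def l2_norm(vec: List[float]) -> float:
    return math.sqrt(sum(x * x for x in vec))


def sentence_difficulty(sentence: Sentence, embeddings: Dict[str, List[float]]) -> float:
    """Sum of word-embedding norms; out-of-vocabulary words are skipped (an assumption)."""
    return sum(l2_norm(embeddings[w]) for w in sentence if w in embeddings)


def difficulty_cdf(scores: List[float]) -> List[float]:
    """Map raw difficulties to (0, 1] by rank, i.e. the empirical CDF."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    cdf = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        cdf[i] = rank / len(scores)
    return cdf


def sqrt_competence(step: int, total_steps: int, c0: float = 0.01) -> float:
    """Hypothetical square-root schedule rising from c0 to 1.0 at total_steps."""
    if step >= total_steps:
        return 1.0
    return min(1.0, math.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2))


def curriculum_pool(corpus: List[Sentence], embeddings: Dict[str, List[float]],
                    step: int, total_steps: int) -> List[Sentence]:
    """Sentences whose normalized difficulty does not exceed the current competence."""
    cdf = difficulty_cdf([sentence_difficulty(s, embeddings) for s in corpus])
    competence = sqrt_competence(step, total_steps)
    return [s for s, d in zip(corpus, cdf) if d <= competence]
```

Training batches would then be sampled from `curriculum_pool(...)` at each step, so harder sentences enter only as competence grows.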

Uncertainty-Aware Curriculum Learning for Neural Machine Translation
Yikai Zhou | Baosong Yang | Derek F. Wong | Yu Wan | Lidia S. Chao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural machine translation (NMT) has been shown to benefit from curriculum learning, which presents examples in an easy-to-hard order at different training stages. The key lies in assessing data difficulty and model competence. We propose uncertainty-aware curriculum learning, which is motivated by the intuition that: 1) the higher the uncertainty of a translation pair, the more complex and rarer the information it contains; and 2) the end of the decline in model uncertainty indicates the completion of the current training stage. Specifically, we use the cross-entropy of an example as its data difficulty and exploit the variance of distributions over the weights of the network to represent the model uncertainty. Extensive experiments on various translation tasks reveal that our approach outperforms the strong baseline and related methods in both translation quality and convergence speed. Quantitative analyses reveal that the proposed strategy gives NMT the ability to automatically govern its learning schedule.
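The two ingredients named in the abstract can be sketched in Python under stated assumptions: data difficulty is the average per-token cross-entropy of a reference under some scoring model, and model uncertainty is approximated as the variance of a score across several stochastic forward passes (for example with dropout left on), standing in for the variance over weight distributions. The bucketing into stages, the stage-switching rule (uncertainty has stopped falling for `patience` checks), and all names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def cross_entropy(token_probs: Sequence[float]) -> float:
    """Data difficulty: average negative log-probability of the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def model_uncertainty(sampled_scores: Sequence[float]) -> float:
    """Variance of a score across K stochastic forward passes (e.g. dropout kept on)."""
    return statistics.pvariance(sampled_scores)


def split_into_stages(examples: List[str], difficulties: List[float],
                      num_stages: int) -> List[List[str]]:
    """Sort easy-to-hard and cut into roughly equal buckets, one per training stage."""
    order = sorted(range(len(examples)), key=lambda i: difficulties[i])
    size = math.ceil(len(order) / num_stages)
    return [[examples[i] for i in order[k:k + size]] for k in range(0, len(order), size)]


def stage_finished(uncertainty_history: List[float], patience: int = 2) -> bool:
    """Hypothetical rule: the stage ends once uncertainty has not improved for `patience` checks."""
    if len(uncertainty_history) <= patience:
        return False
    best_before = min(uncertainty_history[:-patience])
    return min(uncertainty_history[-patience:]) >= best_before
```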

新型冠状病毒肺炎相关的推特主题与情感研究(Exploring COVID-19-related Twitter Topic Dynamics across Countries)
Shuailong Liang (梁帅龙) | Derek F. Wong (黄辉) | Yue Zhang (张岳)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

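Based on 500,000 tweets posted from different countries and regions, collected from the Twitter social platform between January 22, 2020 and April 30, 2020, we study the topics and public opinions related to COVID-19. We find similarities and differences in the common concerns and views of Twitter users across countries, as well as differences in sentiment toward different topics. Most tweets contain strong emotions, and tweets expressing love and support are relatively common. Overall, people's sentiment becomes gradually more positive over time.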

Self-Paced Learning for Neural Machine Translation
Yu Wan | Baosong Yang | Derek F. Wong | Yikai Zhou | Lidia S. Chao | Haibo Zhang | Boxing Chen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent studies have shown that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the success of this kind of curriculum learning relies on the quality of an artificial schedule drawn up with handcrafted features, e.g. sentence length or word rarity. We improve this procedure in a more flexible manner by proposing self-paced learning, where the NMT model is allowed to 1) automatically quantify its learning confidence over training examples; and 2) flexibly govern its learning by regulating the loss at each iteration step. Experimental results on multiple translation tasks demonstrate that the proposed model yields better performance than strong baselines and than models trained with human-designed curricula, in both translation quality and convergence speed.
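A hedged Python sketch of confidence-regulated loss in the spirit of the abstract. The confidence estimate used here, exp(-variance) of an example's loss across stochastic passes, is an assumption chosen for illustration and not necessarily the paper's formulation; the weights are rescaled to average 1 so the overall loss magnitude stays comparable. Function names are hypothetical.

```python
import math
import statistics
from typing import List


def confidence(sampled_losses: List[float]) -> float:
    """Hypothetical confidence in (0, 1]: 1.0 when the loss is identical across passes."""
    return math.exp(-statistics.pvariance(sampled_losses))


def self_paced_loss(losses: List[float], confidences: List[float]) -> float:
    """Confidence-weighted mean loss; weights are rescaled to average 1."""
    mean_c = sum(confidences) / len(confidences)
    if mean_c == 0.0:
        return sum(losses) / len(losses)  # no confidence signal: fall back to the plain mean
    weights = [c / mean_c for c in confidences]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

Examples the model is unsure about thus contribute less to the update in a given step, rather than being ordered by a fixed handcrafted schedule.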

2019

Learning Deep Transformer Models for Machine Translation
Qiang Wang | Bei Li | Tong Xiao | Jingbo Zhu | Changliang Li | Derek F. Wong | Lidia S. Chao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising for improving models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto standard for developing Transformer systems, and the other uses deeper language representations but faces the difficulty of learning deep networks. Here, we continue the latter line of research. We claim that a truly deep Transformer model can surpass the Transformer-Big counterpart by 1) proper use of layer normalization and 2) a novel way of passing the combination of previous layers to the next. On the WMT’16 English-German and NIST OpenMT’12 Chinese-English tasks, our deep system (30/25-layer encoder) outperforms the shallow Transformer-Big/Base baseline (6-layer encoder) by 0.4-2.4 BLEU points. As another bonus, the deep model is 1.6x smaller in size and 3x faster in training than Transformer-Big.
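The two ingredients can be illustrated on plain Python lists, assuming "proper use of layer normalization" means placing it before each sublayer (pre-norm, x + F(LN(x))) rather than after the residual sum as in the original Transformer (post-norm, LN(x + F(x))). Each layer's input is a weighted sum of all earlier layer outputs; here the weights are fixed inputs rather than learned, and the paper's exact normalization of earlier outputs is not reproduced. All names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    """Original Transformer ordering: LN(x + F(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    """Pre-norm ordering: x + F(LN(x)); the identity path is left un-normalized."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def combine_previous(outputs: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum of all earlier outputs (embedding output included)."""
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(len(outputs[0]))]


def deep_encoder(x: Vector, sublayers: List[Sublayer], weight_rows: List[List[float]]) -> Vector:
    """weight_rows[l] holds l + 1 fixed weights over outputs[0..l] (learned in a real model)."""
    outputs = [x]
    for sublayer, weights in zip(sublayers, weight_rows):
        outputs.append(pre_norm_block(combine_previous(outputs, weights), sublayer))
    return outputs[-1]
```

The pre-norm identity path passes x through unchanged, which is the usual argument for why gradients stay well-behaved as the stack grows deeper.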

Leveraging Local and Global Patterns for Self-Attention Networks
Mingzhou Xu | Derek F. Wong | Baosong Yang | Yue Zhang | Lidia S. Chao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self-attention networks have received increasing research attention. By default, the hidden state of each word is hierarchically calculated by attending to all words in the sentence, which assembles global information. However, several studies have pointed out that taking all signals into account may lead to overlooking neighboring information (e.g. phrase patterns). To address this issue, we propose a hybrid attention mechanism that dynamically leverages both local and global information. Specifically, our approach uses a gating scalar to integrate both sources of information, which also makes it convenient to quantify their contributions. Experiments on various neural machine translation tasks demonstrate the effectiveness of the proposed method. Extensive analyses verify that the two types of context are complementary to each other, and that our method is highly effective at integrating them.
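A minimal Python sketch of gating between local and global context for a single position. The local context attends only within a fixed window, the global context attends to every position, and a scalar gate g in (0, 1) mixes them as g * local + (1 - g) * global, so g directly reads off the local share. The window size and the gate's parameterization (sigmoid of a hypothetical `gate_vector` dotted with the query) are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector],
           positions: Sequence[int]) -> Vector:
    """Scaled dot-product attention of one query over the listed key positions."""
    positions = list(positions)
    scale = math.sqrt(len(query))
    probs = softmax([dot(query, keys[j]) / scale for j in positions])
    return [sum(p * values[j][d] for p, j in zip(probs, positions)) for d in range(len(values[0]))]


def hybrid_attention(i: int, queries: List[Vector], keys: List[Vector], values: List[Vector],
                     window: int, gate_vector: Vector) -> Tuple[Vector, float]:
    """Return (gated context, gate) for position i."""
    n = len(keys)
    global_ctx = attend(queries[i], keys, values, range(n))
    local_ctx = attend(queries[i], keys, values, range(max(0, i - window), min(n, i + window + 1)))
    g = 1.0 / (1.0 + math.exp(-dot(gate_vector, queries[i])))  # hypothetical gate parameterization
    mixed = [g * loc + (1.0 - g) * glo for loc, glo in zip(local_ctx, global_ctx)]
    return mixed, g
```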

Shared-Private Bilingual Word Embeddings for Neural Machine Translation
Xuebo Liu | Derek F. Wong | Yang Liu | Lidia S. Chao | Tong Xiao | Jingbo Zhu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Word embedding is central to neural machine translation (NMT) and has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they interface only indirectly via a soft-attention mechanism, which makes them comparatively isolated. In this paper, we propose shared-private bilingual word embeddings, which establish a closer relationship between the source and target embeddings and also reduce the number of model parameters. For similar source and target words, their embeddings tend to share a part of their features and cooperatively learn these common representation units. Experiments on 5 language pairs belonging to 6 different language families and written in 5 different alphabets demonstrate that the proposed model provides a significant performance boost over strong baselines with dramatically fewer model parameters.
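A toy Python sketch of a shared-private layout, assuming each embedding is the concatenation of a shared block, owned jointly by a linked source/target word pair, and a private block owned by one word. The paper's criteria for linking words and for how many features they share are not reproduced; the class and method names are hypothetical. Because the shared block is stored once per pair, a linked pair costs fewer parameters than two independent embeddings.

```python
from typing import Dict, List, Tuple

Vector = List[float]


class SharedPrivateEmbedding:
    """Hypothetical layout: embedding = shared block (per linked pair) + private block (per word)."""

    def __init__(self, shared_dim: int, private_dim: int) -> None:
        self.shared_dim = shared_dim
        self.private_dim = private_dim
        self.shared: Dict[str, Vector] = {}               # keyed by pair id (or an unshared slot id)
        self.private: Dict[Tuple[str, str], Vector] = {}  # keyed by (side, word)
        self.pair_of: Dict[Tuple[str, str], str] = {}     # (side, word) -> pair id

    def link(self, src_word: str, tgt_word: str) -> None:
        """Let a source word and a target word use the same shared block."""
        pair_id = f"{src_word}|{tgt_word}"
        self.shared.setdefault(pair_id, [0.0] * self.shared_dim)
        self.pair_of[("src", src_word)] = pair_id
        self.pair_of[("tgt", tgt_word)] = pair_id

    def lookup(self, side: str, word: str) -> Vector:
        """Return the concatenated [shared | private] vector for `word` on `side` ("src"/"tgt")."""
        pair_id = self.pair_of.get((side, word), f"{side}:{word}")  # unlinked words get their own slot
        shared = self.shared.setdefault(pair_id, [0.0] * self.shared_dim)
        private = self.private.setdefault((side, word), [0.0] * self.private_dim)
        return shared + private

    def num_parameters(self) -> int:
        return len(self.shared) * self.shared_dim + len(self.private) * self.private_dim
```

In this toy setup, with `shared_dim=2` and `private_dim=1`, a linked pair costs 4 parameters instead of the 6 two independent 3-dimensional embeddings would need.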

Assessing the Ability of Self-Attention Networks to Learn Word Order
Baosong Yang | Longyue Wang | Derek F. Wong | Lidia S. Chao | Zhaopeng Tu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Self-attention networks (SAN) have attracted a lot of interest due to their high parallelization and strong performance on a variety of NLP tasks, e.g. machine translation. Due to the lack of a recurrence structure such as that of recurrent neural networks (RNN), SAN is assumed to be weak at learning positional information of words for sequence modeling. However, this speculation has not been empirically confirmed, nor have explanations been explored for SAN's strong performance on machine translation tasks despite “lacking positional information”. To this end, we propose a novel word reordering detection task to quantify how well word order information is learned by SAN and RNN. Specifically, we randomly move one word to another position, and examine whether a trained model can detect both the original and inserted positions. Experimental results reveal that: 1) SAN trained on word reordering detection indeed has difficulty learning positional information, even with the position embedding; and 2) SAN trained on machine translation learns better positional information than its RNN counterpart, in which the position embedding plays a critical role. Although a recurrence structure makes the model more universally effective at learning word order, learning objectives matter more in downstream tasks such as machine translation.
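The perturbation behind the word reordering detection task can be sketched in Python: remove one random word and reinsert it at a different index, recording where it came from and where it landed. Returning the original index in the unperturbed sentence and the insertion index in the perturbed one is a labeling convention assumed here; the paper's exact label format may differ, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def make_reordering_example(tokens: List[str],
                            rng: random.Random) -> Tuple[List[str], int, int]:
    """Move one random word to a different index.

    Returns (perturbed_tokens, original_index, inserted_index): original_index refers to
    the unperturbed sentence, inserted_index to the perturbed one (assumed convention).
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to reorder")
    i = rng.randrange(len(tokens))
    j = rng.choice([k for k in range(len(tokens)) if k != i])
    perturbed = tokens[:i] + tokens[i + 1:]
    perturbed.insert(j, tokens[i])
    return perturbed, i, j


rng = random.Random(13)
sentence = "the cat sat on the mat".split()
perturbed, orig_idx, new_idx = make_reordering_example(sentence, rng)
```

A detector trained on such triples must predict both indices, which probes whether the encoder actually represents word order.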

Convolutional Self-Attention Networks
Baosong Yang | Longyue Wang | Derek F. Wong | Lidia S. Chao | Zhaopeng Tu
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend to information from different representation subspaces. In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to 1) strengthen dependencies among neighboring elements, and 2) model the interaction between features extracted by multiple attention heads. Experimental results of machine translation on different language pairs and model settings show that our approach outperforms both the strong Transformer baseline and other existing models on enhancing the locality of SANs. Compared with prior studies, the proposed model is parameter-free, as it introduces no additional parameters.
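A minimal Python sketch of the locality restriction, assuming the two-dimensional variant lets a query in head h at position i attend to keys within `window` positions of i and within `head_window` heads of h; setting `head_window=0` gives the one-dimensional case where each head only sees its own neighborhood. Tensors are nested lists indexed [head][position], and all names are hypothetical. No new weights are introduced, consistent with the abstract's parameter-free claim.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def conv_self_attention(i: int, h: int, Q: List[List[Vector]], K: List[List[Vector]],
                        V: List[List[Vector]], window: int, head_window: int = 0) -> Vector:
    """Attend from (head h, position i) to keys in a (heads x positions) neighborhood."""
    num_heads, seq_len = len(K), len(K[0])
    cells = [(hh, j)
             for hh in range(max(0, h - head_window), min(num_heads, h + head_window + 1))
             for j in range(max(0, i - window), min(seq_len, i + window + 1))]
    query = Q[h][i]
    scale = math.sqrt(len(query))
    probs = softmax([sum(q * k for q, k in zip(query, K[hh][j])) / scale for hh, j in cells])
    dim = len(V[0][0])
    return [sum(p * V[hh][j][d] for p, (hh, j) in zip(probs, cells)) for d in range(dim)]
```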

2018

Modeling Localness for Self-Attention Networks
Baosong Yang | Zhaopeng Tu | Derek F. Wong | Fandong Meng | Lidia S. Chao | Tong Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Self-attention networks have proven to be of profound value for their strength in capturing global dependencies. In this work, we propose to model localness for self-attention networks, which enhances the ability to capture useful local context. We cast localness modeling as a learnable Gaussian bias, which indicates the center and scope of the local region to be paid more attention. The bias is then incorporated into the original attention distribution to form a revised distribution. To maintain the strength of capturing long-distance dependencies while enhancing the ability to capture short-range dependencies, we only apply localness modeling to the lower layers of self-attention networks. Quantitative and qualitative analyses on Chinese-English and English-German translation tasks demonstrate the effectiveness and universality of the proposed approach.
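The Gaussian bias can be sketched in Python as G_j = -(j - P)^2 / (2 sigma^2), added to the attention logits before the softmax, where P is the predicted center and the window D sets sigma = D / 2 (an assumption made here). Predicting P and D as the sequence length times a sigmoid of a dot product with the query is a hypothetical simplification of whatever parameterization the paper uses; all names are illustrative.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_center_and_window(query: List[float], w_center: List[float],
                              w_window: List[float], seq_len: int) -> Tuple[float, float]:
    """Hypothetical simplification: P = I * sigmoid(w_p . q), D = I * sigmoid(w_d . q)."""
    p = seq_len * sigmoid(sum(a * b for a, b in zip(w_center, query)))
    d = seq_len * sigmoid(sum(a * b for a, b in zip(w_window, query)))
    return p, d


def gaussian_local_attention(scores: List[float], center: float, window: float) -> List[float]:
    """softmax(scores + G) with G_j = -(j - center)^2 / (2 * sigma^2) and sigma = window / 2."""
    if window <= 0:
        raise ValueError("window must be positive")
    sigma = window / 2.0
    biased = [s - (j - center) ** 2 / (2.0 * sigma ** 2) for j, s in enumerate(scores)]
    m = max(biased)
    exps = [math.exp(b - m) for b in biased]
    z = sum(exps)
    return [e / z for e in exps]
```

Because the bias is zero at the center and grows more negative with distance, it only reshapes the existing distribution; following the abstract, it would be applied in the lower layers only.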

2017

Towards Bidirectional Hierarchical Representations for Attention-based Neural Machine Translation
Baosong Yang | Derek F. Wong | Tong Xiao | Lidia S. Chao | Jingbo Zhu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder. To maximize the predictive likelihood of target words, a weighted variant of an attention mechanism is used to balance the attentive information between lexical and phrase vectors. Using a tree-based rare word encoding, the proposed model is extended to sub-word level to alleviate the out-of-vocabulary (OOV) problem. Empirical results reveal that the proposed model significantly outperforms sequence-to-sequence attention-based and tree-based neural translation models in English-Chinese translation tasks.

2015

Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Aaron Li-Feng Han | Xiaodong Zeng | Derek F. Wong | Lidia S. Chao
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

Toward Better Chinese Word Segmentation for SMT via Bilingual Constraints
Xiaodong Zeng | Lidia S. Chao | Derek F. Wong | Isabel Trancoso | Liang Tian
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Factored Statistical Machine Translation for Grammatical Error Correction
Yiming Wang | Longyue Wang | Xiaodong Zeng | Derek F. Wong | Lidia S. Chao | Yi Lu
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

Domain Adaptation for Medical Text Translation using Web Resources
Yi Lu | Longyue Wang | Derek F. Wong | Lidia S. Chao | Yiming Wang
Proceedings of the Ninth Workshop on Statistical Machine Translation

Combining Domain Adaptation Approaches for Medical Text Translation
Longyue Wang | Yi Lu | Derek F. Wong | Lidia S. Chao | Yiming Wang | Francisco Oliveira
Proceedings of the Ninth Workshop on Statistical Machine Translation

UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
Liang Tian | Derek F. Wong | Lidia S. Chao | Paulo Quaresma | Francisco Oliveira | Yi Lu | Shuo Li | Yiming Wang | Longyue Wang
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

A parallel corpus is a valuable resource for cross-language information retrieval and data-driven natural language processing systems, especially for Statistical Machine Translation (SMT). However, most existing parallel corpora for Chinese are restricted to in-house use, while others are domain-specific and limited in size. To a certain degree, this limits SMT research. This paper describes the acquisition of a large-scale, high-quality parallel corpus for English and Chinese. The corpus constructed in this paper contains about 15 million English-Chinese (E-C) parallel sentences, and more than 2 million sentences of training data and 5,000 test sentences are made publicly available. Different from previous work, the corpus is designed to embrace eight different domains, some of which are further categorized into different topics. The corpus will be released to the research community and is available at the NLP2CT website.

2013

Influence of Part-of-Speech and Phrasal Category Universal Tag-set in Tree-to-Tree Translation Models
Francisco Oliveira | Derek F. Wong | Lidia S. Chao | Liang Tian | Liangye He
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Quality Estimation for Machine Translation Using the Joint Method of Evaluation Criteria and Statistical Modeling
Aaron Li-Feng Han | Yi Lu | Derek F. Wong | Lidia S. Chao | Liangye He | Junwen Xing
Proceedings of the Eighth Workshop on Statistical Machine Translation

A Description of Tunable Machine Translation Evaluation Systems in WMT13 Metrics Task
Aaron Li-Feng Han | Derek F. Wong | Lidia S. Chao | Yi Lu | Liangye He | Yiming Wang | Jiaji Zhou
Proceedings of the Eighth Workshop on Statistical Machine Translation

Experiments with POS-based restructuring and alignment-based reordering for statistical machine translation
Shuo Li | Derek F. Wong | Lidia S. Chao
Proceedings of the Second Workshop on Hybrid Approaches to Translation

UM-Checker: A Hybrid System for English Grammatical Error Correction
Junwen Xing | Longyue Wang | Derek F. Wong | Lidia S. Chao | Xiaodong Zeng
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task

Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Xiaodong Zeng | Derek F. Wong | Lidia S. Chao | Isabel Trancoso
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
Xiaodong Zeng | Derek F. Wong | Lidia S. Chao | Isabel Trancoso
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Edit Distance: A New Data Selection Criterion for Domain Adaptation in SMT
Longyue Wang | Derek F. Wong | Lidia S. Chao | Junwen Xing | Yi Lu | Isabel Trancoso
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Augmented Parsing of Unknown Word by Graph-Based Semi-Supervised Learning
Qiuping Huang | Derek F. Wong | Lidia S. Chao | Xiaodong Zeng | Liangye He
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

2012

CRFs-Based Chinese Word Segmentation for Micro-Blog with Small-Scale Data
Longyue Wang | Derek F. Wong | Lidia S. Chao | Junwen Xing
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

Rules Design in Word Segmentation of Chinese Micro-Blog
Hao Zong | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

A Template Based Hybrid Model for Chinese Personal Name Disambiguation
Hao Zong | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

A Joint Chinese Named Entity Recognition and Disambiguation System
Longyue Wang | Shuo Li | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

A Simplified Chinese Parser with Factored Model
Qiuping Huang | Liangye He | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

Adapting Multilingual Parsing Models to Sinica Treebank
Liangye He | Derek F. Wong | Lidia S. Chao
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors
Aaron L. F. Han | Derek F. Wong | Lidia S. Chao
Proceedings of COLING 2012: Posters

An Improvement in Cross-Language Document Retrieval Based on Statistical Models
Longyue Wang | Derek F. Wong | Lidia S. Chao
Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING 2012)

TQDL: Integrated Models for Cross-Language Document Retrieval
Long-Yue Wang | Derek F. Wong | Lidia S. Chao
International Journal of Computational Linguistics & Chinese Language Processing, Volume 17, Number 4, December 2012-Special Issue on Selected Papers from ROCLING XXIV