Jinsong Su


2020

A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin | Fandong Meng | Jinsong Su | Chulun Zhou | Zhengyuan Yang | Jie Zhou | Jiebo Luo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multi-modal neural machine translation (NMT) aims to translate source sentences paired with images into a target language. However, dominant multi-modal NMT models do not fully exploit fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To deal with this issue, in this paper, we propose a novel graph-based multi-modal fusion encoder for NMT. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, these representations provide an attention-based context vector for the decoder. We evaluate our proposed encoder on the Multi30K datasets. Experimental results and in-depth analysis show the superiority of our multi-modal NMT model.
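
To make the fusion idea concrete, here is a minimal sketch of one fusion layer in which word nodes and visual-object nodes exchange information through cross-modal attention. The dimensions, the multi-head attention modules, and the residual feed-forward updates are illustrative assumptions, not the paper's exact architecture.
```python
# Hypothetical sketch of a graph-based multi-modal fusion layer: word nodes and
# visual-object nodes update each other via cross-modal attention.
import torch
import torch.nn as nn

class CrossModalFusionLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.txt_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn_txt = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        self.ffn_img = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))

    def forward(self, words, objects):
        w, _ = self.txt_to_img(words, objects, objects)   # word nodes attend to object nodes
        o, _ = self.img_to_txt(objects, words, words)     # object nodes attend to word nodes
        return words + self.ffn_txt(w), objects + self.ffn_img(o)  # residual node updates

words, objects = torch.randn(2, 10, 512), torch.randn(2, 5, 512)  # 10 word / 5 object nodes
layer = CrossModalFusionLayer()
words, objects = layer(words, objects)  # several such layers would be stacked in practice
```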

Exploring Contextual Word-level Style Relevance for Unsupervised Style Transfer
Chulun Zhou | Liangyu Chen | Jiachen Liu | Xinyan Xiao | Jinsong Su | Sheng Guo | Hua Wu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unsupervised style transfer aims to change the style of an input sentence while preserving its original content, without using parallel training data. Current dominant approaches, owing to their lack of fine-grained control over the influence of the target style, are unable to yield desirable output sentences. In this paper, we propose a novel attentional sequence-to-sequence (Seq2seq) model that dynamically exploits the relevance of each output word to the target style for unsupervised style transfer. Specifically, we first pretrain a style classifier, where the relevance of each input word to the original style can be quantified via layer-wise relevance propagation. In a denoising auto-encoding manner, we then train an attentional Seq2seq model to reconstruct input sentences and simultaneously repredict the previously quantified word-level style relevance. In this way, the model is endowed with the ability to automatically predict the style relevance of each output word. We then equip the decoder of this model with a neural style component that exploits the predicted word-level style relevance for better style transfer. In particular, we fine-tune this model using a carefully designed objective function involving style transfer, style relevance consistency, content preservation, and fluency modeling loss terms. Experimental results show that our proposed model achieves state-of-the-art performance in terms of both transfer accuracy and content preservation.

Structural Information Preserving for Graph-to-Text Generation
Linfeng Song | Ante Wang | Jinsong Su | Yue Zhang | Kun Xu | Yubin Ge | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs. A crucial defect of current state-of-the-art models is that they may scramble or even drop the core structural information of input graphs when generating outputs. We propose to tackle this problem by leveraging richer training signals that guide our model to preserve input information. In particular, we introduce two types of autoencoding losses, each individually focusing on a different aspect (a.k.a. view) of the input graphs. The losses are then back-propagated to better calibrate our model via multi-task training. Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.

Modeling Discourse Structure for Document-level Neural Machine Translation
Junxuan Chen | Xiang Li | Jiarui Zhang | Chulun Zhou | Jianwei Cui | Bin Wang | Jinsong Su
Proceedings of the First Workshop on Automatic Simultaneous Translation

Recently, document-level neural machine translation (NMT) has become a hot topic in the machine translation community. Despite this success, most existing studies ignore the discourse structure of the input document to be translated, which has been shown to be effective in other tasks. In this paper, we propose to improve document-level NMT with the aid of discourse structure information. Our encoder is based on a hierarchical attention network (HAN) (Miculicich et al., 2018). Specifically, we first parse the input document to obtain its discourse structure. Then, we introduce a Transformer-based path encoder to embed the discourse structure information of each word. Finally, we combine the discourse structure information with the word embedding before it is fed into the encoder. Experimental results on the English-to-German dataset show that our model significantly outperforms both the Transformer and Transformer+HAN.
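
A small sketch of the combination step described above: each word's discourse-structure path is embedded by a (stand-in) Transformer path encoder and added to its word embedding before the sentence enters the NMT encoder. The additive fusion and all dimensions are assumptions for illustration.
```python
# Sketch: fuse a per-word discourse-path embedding with its word embedding
# before the encoder. The additive fusion is an assumed simplification.
import torch
import torch.nn as nn

class DiscourseAwareEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_path_symbols: int, d_model: int = 512):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.path_emb = nn.Embedding(num_path_symbols, d_model)
        self.path_encoder = nn.TransformerEncoder(  # stand-in for the paper's path encoder
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)

    def forward(self, word_ids, path_ids):
        # word_ids: (batch, seq); path_ids: (batch, seq, path_len) of discourse-path symbols
        b, s, p = path_ids.shape
        path = self.path_encoder(self.path_emb(path_ids).view(b * s, p, -1))
        path = path.mean(dim=1).view(b, s, -1)       # one path vector per word
        return self.word_emb(word_ids) + path        # discourse-aware word representation

emb = DiscourseAwareEmbedding(vocab_size=30000, num_path_symbols=50)
out = emb(torch.randint(0, 30000, (2, 12)), torch.randint(0, 50, (2, 12, 4)))
```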

2019

Semantic Neural Machine Translation Using AMR
Linfeng Song | Daniel Gildea | Yue Zhang | Zhiguo Wang | Jinsong Su
Transactions of the Association for Computational Linguistics, Volume 7

It is intuitive that semantic representations can be useful for machine translation, mainly because they help enforce meaning preservation and handle the data sparsity (many sentences correspond to one meaning) of machine translation models. However, little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of AMR (abstract meaning representation) for NMT. Experiments on a standard English-to-German dataset show that incorporating AMR as additional knowledge can significantly improve a strong attention-based sequence-to-sequence neural translation model.

Leveraging Dependency Forest for Neural Medical Relation Extraction
Linfeng Song | Yue Zhang | Daniel Gildea | Mo Yu | Zhiguo Wang | Jinsong Su
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Medical relation extraction discovers relations between entity mentions in text, such as research articles. For this task, dependency syntax has been recognized as a crucial source of features, yet in the medical domain, 1-best parse trees suffer from relatively low accuracy, diminishing their usefulness. We investigate a method to alleviate this problem by utilizing dependency forests. Forests contain more than one possible parsing decision and therefore have higher recall, but also more noise, than 1-best outputs. A graph neural network is used to represent the forests, automatically distinguishing useful syntactic information from parsing noise. Results on two benchmarks show that our method outperforms standard tree-based methods, giving state-of-the-art results in the literature.
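
As an illustration of the core idea, the sketch below runs one graph-convolution step over a dependency forest whose edges keep competing arcs weighted by parser probabilities. The specific GCN form (normalized adjacency, single ReLU layer) is an assumption; the paper's graph network may differ.
```python
# One graph-convolution step over a probability-weighted dependency forest (NumPy sketch).
import numpy as np

def forest_gcn_step(node_states, edges, d_out, seed=0):
    """node_states: (n, d) token vectors; edges: list of (head, dependent, probability)."""
    n, d = node_states.shape
    adj = np.eye(n)                                  # self loops
    for head, dep, prob in edges:
        adj[dep, head] = adj[head, dep] = prob       # keep competing arcs, weighted by confidence
    adj /= adj.sum(axis=1, keepdims=True)            # row-normalize
    W = np.random.default_rng(seed).standard_normal((d, d_out)) * 0.1
    return np.maximum(adj @ node_states @ W, 0.0)    # ReLU(A X W)

states = np.random.randn(4, 8)                                  # 4 tokens
edges = [(0, 1, 0.9), (0, 2, 0.6), (2, 3, 0.4), (1, 3, 0.5)]    # token 3 keeps two candidate heads
print(forest_gcn_step(states, edges, d_out=8).shape)            # (4, 8)
```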

Iterative Dual Domain Adaptation for Neural Machine Translation
Jiali Zeng | Yang Liu | Jinsong Su | Yubing Ge | Yaojie Lu | Yongjing Yin | Jiebo Luo
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Previous studies on domain adaptation for neural machine translation (NMT) mainly focus on one-pass transfer of out-of-domain translation knowledge to an in-domain NMT model. In this paper, we argue that such a strategy fails to fully extract the domain-shared translation knowledge, and that repeatedly utilizing corpora of different domains can lead to better distillation of domain-shared translation knowledge. To this end, we propose an iterative dual domain adaptation framework for NMT. Specifically, we first pretrain in-domain and out-of-domain NMT models on their own training corpora, and then iteratively perform bidirectional translation knowledge transfer (from in-domain to out-of-domain and then vice versa) based on knowledge distillation until the in-domain NMT model converges. Furthermore, we extend the proposed framework to the scenario of multiple out-of-domain training corpora, where the above-mentioned transfer is performed sequentially between the in-domain model and each out-of-domain NMT model in ascending order of their domain similarities. Empirical results on Chinese-English and English-German translation tasks demonstrate the effectiveness of our framework.
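
The alternating transfer can be summarized in a short schematic. The distillation loss below is standard teacher-student cross-entropy; `kd_finetune` and `converged` are hypothetical helpers standing in for a full training loop and a validation-based stopping criterion, so this is a sketch of the framework's control flow rather than the paper's implementation.
```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=1.0):
    """Knowledge distillation: the student matches the teacher's softened distribution."""
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    return -(teacher_probs * F.log_softmax(student_logits / T, dim=-1)).sum(dim=-1).mean()

def iterative_dual_adaptation(in_model, out_model, in_data, out_data, kd_finetune, converged):
    while not converged(in_model):
        # out-of-domain -> in-domain transfer: the out-of-domain model acts as the teacher
        in_model = kd_finetune(student=in_model, teacher=out_model, data=in_data)
        # in-domain -> out-of-domain transfer: the roles are swapped
        out_model = kd_finetune(student=out_model, teacher=in_model, data=out_data)
    return in_model

print(kd_loss(torch.randn(4, 100), torch.randn(4, 100)))  # scalar distillation loss
```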

Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis
Jialong Tang | Ziyao Lu | Jinsong Su | Yubin Ge | Linfeng Song | Le Sun | Jiebo Luo
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In aspect-level sentiment classification (ASC), it is prevalent to equip dominant neural models with attention mechanisms in order to capture the importance of each context word with respect to the given aspect. However, such mechanisms tend to excessively focus on a few frequent words with sentiment polarities while ignoring infrequent ones. In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. Specifically, we iteratively conduct sentiment predictions on all training instances. At each iteration, the context word with the maximum attention weight is extracted as having an active influence (if the prediction is correct) or a misleading one (if it is incorrect), and the word itself is then masked for subsequent iterations. Finally, we augment the conventional training objective with a regularization term, which enables ASC models to keep focusing equally on the extracted active context words while decreasing the weights of the misleading ones. Experimental results on multiple datasets show that our proposed approach yields better attention mechanisms, leading to substantial improvements over two state-of-the-art neural ASC models. Source code and trained models are available at https://github.com/DeepLearnXMU/PSSAttention.
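
The mining step can be sketched in a few lines: at each iteration, the context word with the largest attention weight is recorded as active (if the prediction is correct) or misleading (otherwise) and is then masked. `predict_with_attention` is a hypothetical interface returning a label and per-word attention weights for one instance; it is not part of the released code.
```python
# Sketch of progressive attention-supervision mining for one training instance.
import numpy as np

def mine_attention_supervision(predict_with_attention, words, gold_label,
                               n_iterations=5, mask_token="<mask>"):
    words = list(words)
    active, misleading = [], []
    for _ in range(n_iterations):
        label, attn = predict_with_attention(words)      # attn: attention weight per word
        attn = np.where([w == mask_token for w in words], -np.inf, np.asarray(attn, float))
        top = int(np.argmax(attn))                        # most attended, not-yet-masked word
        (active if label == gold_label else misleading).append(words[top])
        words[top] = mask_token                           # hide it in subsequent iterations
    return active, misleading                             # later used as attention supervision
```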

2018

Multi-Domain Neural Machine Translation with Word-Level Domain Context Discrimination
Jiali Zeng | Jinsong Su | Huating Wen | Yang Liu | Jun Xie | Yongjing Yin | Jianqiang Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

The study of multi-domain neural machine translation (NMT), which has great practical value, mainly focuses on using mixed-domain parallel sentences to construct a unified model that allows translation to switch between different domains. Intuitively, words in a sentence are related to its domain to varying degrees and thus exert disparate impacts on multi-domain NMT modeling. Based on this intuition, in this paper we focus on distinguishing and exploiting word-level domain contexts for multi-domain NMT. To this end, we jointly model NMT with monolingual attention-based domain classification tasks and improve NMT as follows: 1) based on the sentence representations produced by a domain classifier and an adversarial domain classifier, we generate two gating vectors and use them to construct domain-specific and domain-shared annotations for later translation predictions via different attention models; 2) we utilize the attention weights derived from the target-side domain classifier to adjust the weights of target words in the training objective, enabling domain-related words to have greater impact during model training. Experimental results on Chinese-English and English-French multi-domain translation tasks demonstrate the effectiveness of the proposed model. The source code is available on GitHub: https://github.com/DeepLearnXMU/WDCNMT.
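
A minimal sketch of point 1) above: sentence representations from the domain classifier and the adversarial domain classifier yield two gating vectors that split each encoder annotation into a domain-specific part and a domain-shared part. The sigmoid gates and linear projections are illustrative assumptions.
```python
import torch
import torch.nn as nn

class DomainContextGate(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.spec_gate = nn.Linear(2 * d_model, d_model)
        self.share_gate = nn.Linear(2 * d_model, d_model)

    def forward(self, annotations, spec_repr, share_repr):
        # annotations: (batch, seq, d); spec_repr / share_repr: (batch, d) sentence vectors
        spec = spec_repr.unsqueeze(1).expand_as(annotations)
        share = share_repr.unsqueeze(1).expand_as(annotations)
        g_spec = torch.sigmoid(self.spec_gate(torch.cat([annotations, spec], dim=-1)))
        g_share = torch.sigmoid(self.share_gate(torch.cat([annotations, share], dim=-1)))
        return g_spec * annotations, g_share * annotations  # domain-specific / domain-shared

gate = DomainContextGate()
specific, shared = gate(torch.randn(2, 9, 512), torch.randn(2, 512), torch.randn(2, 512))
```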

Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks
Biao Zhang | Deyi Xiong | Jinsong Su | Qian Lin | Huiji Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose an addition-subtraction twin-gated recurrent network (ATR) to simplify neural machine translation. The recurrent units of ATR are heavily simplified to have the smallest number of weight matrices among the units of all existing gated RNNs. With simple addition and subtraction operations, we introduce a twin-gated mechanism to build input and forget gates that are highly correlated. Despite this simplification, the essential non-linearities and the capability of modeling long-distance dependencies are preserved. Additionally, the proposed ATR is more transparent than LSTM/GRU due to the simplification; forward self-attention can be easily established in ATR, which makes the proposed network interpretable. Experiments on WMT14 translation tasks demonstrate that ATR-based neural machine translation can yield competitive performance on English-German and English-French language pairs in terms of both translation quality and speed. Further experiments on NIST Chinese-English translation, natural language inference, and Chinese word segmentation verify the generality and applicability of ATR to different natural language processing tasks.
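
The ATR cell is compact enough to write out in full. The equations below follow my reading of the paper (two weight matrices, twin gates built from the addition and subtraction of the same two terms) and should be treated as an assumption rather than a reference implementation.
```python
import torch
import torch.nn as nn

class ATRCell(nn.Module):
    def __init__(self, d_input: int, d_hidden: int):
        super().__init__()
        self.W = nn.Linear(d_input, d_hidden, bias=False)   # input projection
        self.U = nn.Linear(d_hidden, d_hidden, bias=False)  # history projection

    def forward(self, x_t, h_prev):
        p, q = self.W(x_t), self.U(h_prev)
        i = torch.sigmoid(p + q)       # input gate (addition)
        f = torch.sigmoid(p - q)       # forget gate (subtraction): the "twin" gate
        return i * p + f * h_prev      # new hidden state

cell, h = ATRCell(32, 64), torch.zeros(8, 64)
for x_t in torch.randn(5, 8, 32):      # 5 timesteps, batch of 8
    h = cell(x_t, h)
```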

Neural Machine Translation with Decoding History Enhanced Attention
Mingxuan Wang | Jun Xie | Zhixing Tan | Jinsong Su | Deyi Xiong | Chao Bian
Proceedings of the 27th International Conference on Computational Linguistics

Neural machine translation with source-side attention has achieved remarkable performance. However, there has been little work exploring attention over the target side, which could potentially enhance the memory capability of NMT. We reformulate a Decoding History Enhanced Attention mechanism (DHEA) to render NMT models better at selecting both source-side and target-side information. DHEA enables dynamic control of the ratios at which source and target contexts contribute to the generation of target words, offering a way to weakly induce structural relations among both source and target tokens. It also allows training errors to be directly back-propagated through short-cut connections, effectively alleviating the vanishing gradient problem. An empirical study on Chinese-English translation shows that, with a proper configuration, our model improves by 0.9 BLEU over the Transformer and over the best results reported on the dataset. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable with the state of the art.
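
A tiny sketch of the dynamic source/target mixing the abstract describes: a learned gate decides, per decoding step, how much the source-side attention context and the decoding-history context each contribute. The gate parameterization is an assumption.
```python
import torch
import torch.nn as nn

class HistoryEnhancedMix(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, src_context, history_context):
        # Both contexts: (batch, d_model), e.g. outputs of two attention blocks at one step.
        z = torch.sigmoid(self.gate(torch.cat([src_context, history_context], dim=-1)))
        return z * src_context + (1.0 - z) * history_context  # dynamic contribution ratio

mix = HistoryEnhancedMix()
context = mix(torch.randn(4, 512), torch.randn(4, 512))
```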

Deconvolution-Based Global Decoding for Neural Machine Translation
Junyang Lin | Xu Sun | Xuancheng Ren | Shuming Ma | Jinsong Su | Qi Su
Proceedings of the 27th International Conference on Computational Linguistics

A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt a Recurrent Neural Network (RNN) to generate the translation word by word in sequential order. As studies in linguistics have shown that language is not a linear word sequence but a sequence with complex structure, translation at each step should be conditioned on the whole target-side context. To tackle this problem, we propose a new NMT model that decodes the sequence with the guidance of a structural prediction of the target-side context. Our model generates the translation based on this structural prediction, so that the translation can be freed from the constraint of sequential order. Experimental results demonstrate that our model is competitive with state-of-the-art methods, and the analysis shows that it is also robust when translating sentences of different lengths and reduces repetition by following the target-side context during decoding.
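
One way to picture a "structural prediction of the target-side context" is a transposed convolution (deconvolution) that expands a global source representation into a fixed number of target-side guide vectors. The pooling step, kernel size, and output length here are illustrative assumptions, not the paper's exact configuration.
```python
import torch
import torch.nn as nn

d_model, target_len = 512, 16
encoder_states = torch.randn(2, 20, d_model)              # (batch, src_len, d)
global_repr = encoder_states.mean(dim=1, keepdim=True)    # (batch, 1, d) pooled source summary

deconv = nn.ConvTranspose1d(d_model, d_model, kernel_size=target_len)
structure = deconv(global_repr.transpose(1, 2)).transpose(1, 2)  # (batch, target_len, d)
print(structure.shape)  # torch.Size([2, 16, 512]): one guide vector per target position
```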

Accelerating Neural Transformer via an Average Attention Network
Biao Zhang | Deyi Xiong | Jinsong Su
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With parallelizable attention networks, the neural Transformer is very fast to train. However, due to the auto-regressive architecture and self-attention in the decoder, the decoding procedure becomes slow. To alleviate this issue, we propose an average attention network as an alternative to the self-attention network in the decoder of the neural Transformer. The average attention network consists of two layers, with an average layer that models dependencies on previous positions and a gating layer that is stacked over the average layer to enhance the expressiveness of the proposed attention network. We apply this network on the decoder part of the neural Transformer to replace the original target-side self-attention model. With masking tricks and dynamic programming, our model enables the neural Transformer to decode sentences over four times faster than its original version with almost no loss in training time and translation performance. We conduct a series of experiments on WMT17 translation tasks, where on 6 different language pairs, we obtain robust and consistent speed-ups in decoding.
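
The average layer has a simple closed form: a cumulative mean over the current and preceding target positions, which is what makes the masking and dynamic-programming tricks possible. The feed-forward and gating details in this sketch are simplified assumptions.
```python
import torch
import torch.nn as nn

class AverageAttention(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.gate = nn.Linear(2 * d_model, 2 * d_model)

    def forward(self, y):
        # y: (batch, tgt_len, d). The cumulative average equals masked uniform attention,
        # but needs no dot products and can be updated incrementally during decoding.
        steps = torch.arange(1, y.size(1) + 1, device=y.device, dtype=y.dtype).view(1, -1, 1)
        g = self.ffn(torch.cumsum(y, dim=1) / steps)
        i, f = self.gate(torch.cat([y, g], dim=-1)).chunk(2, dim=-1)
        return torch.sigmoid(i) * y + torch.sigmoid(f) * g   # gated combination

out = AverageAttention()(torch.randn(2, 7, 512))
```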

2017

Improving Implicit Discourse Relation Recognition with Discourse-specific Word Embeddings
Changxing Wu | Xiaodong Shi | Yidong Chen | Jinsong Su | Boli Wang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We introduce a simple and effective method to learn discourse-specific word embeddings (DSWE) for implicit discourse relation recognition. Specifically, DSWE are learned by performing connective classification on massive explicit discourse data, and are capable of capturing discourse relationships between words. On the PDTB dataset, using DSWE as features achieves significant improvements over baselines.
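
The training signal is easy to sketch: delete the connective from an explicit discourse instance and predict it from the two arguments, so that the learned embeddings pick up discourse relationships. The averaged bag-of-words argument encoder below is an illustrative assumption.
```python
import torch
import torch.nn as nn

class ConnectiveClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_connectives: int, d_emb: int = 100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)          # these embeddings become the DSWE
        self.out = nn.Linear(2 * d_emb, num_connectives)

    def forward(self, arg1_ids, arg2_ids):
        a1 = self.emb(arg1_ids).mean(dim=1)                 # average the words of Arg1
        a2 = self.emb(arg2_ids).mean(dim=1)                 # average the words of Arg2
        return self.out(torch.cat([a1, a2], dim=-1))        # logits over candidate connectives

model = ConnectiveClassifier(vocab_size=30000, num_connectives=100)
logits = model(torch.randint(0, 30000, (4, 20)), torch.randint(0, 30000, (4, 25)))
```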

2016

Variational Neural Discourse Relation Recognizer
Biao Zhang | Deyi Xiong | Jinsong Su | Qun Liu | Rongrong Ji | Hong Duan | Min Zhang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Variational Neural Machine Translation
Biao Zhang | Deyi Xiong | Jinsong Su | Hong Duan | Min Zhang
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Bilingually-constrained Synthetic Data for Implicit Discourse Relation Recognition
Changxing Wu | Xiaodong Shi | Yidong Chen | Yanzhou Huang | Jinsong Su
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Bilingual Autoencoders with Global Descriptors for Modeling Parallel Sentences
Biao Zhang | Deyi Xiong | Jinsong Su | Hong Duan | Min Zhang
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Parallel sentence representations are important for bilingual and cross-lingual tasks in natural language processing. In this paper, we explore a bilingual autoencoder approach to modeling parallel sentences. We extract sentence-level global descriptors (e.g., min, max) from word embeddings and construct two monolingual autoencoders over these descriptors for the source and target languages. In order to tightly connect the two autoencoders with bilingual correspondences, we force them to share the same decoding parameters and minimize a corpus-level semantic distance between the two languages. Being optimized towards a joint objective of reconstruction and semantic errors, our bilingual autoencoder is able to learn continuous-valued latent representations for parallel sentences. Both intrinsic evaluations and extrinsic evaluations on statistical machine translation tasks show that our autoencoder achieves substantial improvements over the baselines.
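
The model's three ingredients, min/max global descriptors, a shared decoder, and a semantic distance between the two latent codes, fit in a short sketch. Layer sizes, the tanh encoders, and the mean-squared distances are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilingualAutoencoder(nn.Module):
    def __init__(self, d_emb: int = 128, d_hidden: int = 256):
        super().__init__()
        self.enc_src = nn.Linear(2 * d_emb, d_hidden)
        self.enc_tgt = nn.Linear(2 * d_emb, d_hidden)
        self.dec = nn.Linear(d_hidden, 2 * d_emb)      # decoding parameters shared by both sides

    @staticmethod
    def descriptors(word_embs):
        # (batch, len, d_emb) -> concatenated min/max sentence-level global descriptors
        return torch.cat([word_embs.min(dim=1).values, word_embs.max(dim=1).values], dim=-1)

    def forward(self, src_embs, tgt_embs):
        x, y = self.descriptors(src_embs), self.descriptors(tgt_embs)
        hx, hy = torch.tanh(self.enc_src(x)), torch.tanh(self.enc_tgt(y))
        recon = F.mse_loss(self.dec(hx), x) + F.mse_loss(self.dec(hy), y)  # reconstruction error
        semantic = F.mse_loss(hx, hy)                  # pull parallel sentence codes together
        return recon + semantic

loss = BilingualAutoencoder()(torch.randn(8, 15, 128), torch.randn(8, 18, 128))
```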

Convolution-Enhanced Bilingual Recursive Neural Network for Bilingual Semantic Modeling
Jinsong Su | Biao Zhang | Deyi Xiong | Ruochen Li | Jianmin Yin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Estimating similarities at different levels of linguistic units, such as words, sub-phrases and phrases, is helpful for measuring semantic similarity of an entire bilingual phrase. In this paper, we propose a convolution-enhanced bilingual recursive neural network (ConvBRNN), which not only exploits word alignments to guide the generation of phrase structures but also integrates multiple-level information of the generated phrase structures into bilingual semantic modeling. In order to accurately learn the semantic hierarchy of a bilingual phrase, we develop a recursive neural network to constrain the learned bilingual phrase structures to be consistent with word alignments. Upon the generated source and target phrase structures, we stack a convolutional neural network to integrate vector representations of linguistic units on the structures into bilingual phrase embeddings. After that, we fully incorporate information of different linguistic units into a bilinear semantic similarity model. We introduce two max-margin losses to train the ConvBRNN model: one for the phrase structure inference and the other for the semantic similarity model. Experiments on NIST Chinese-English translation tasks demonstrate the high quality of the generated bilingual phrase structures with respect to word alignments and the effectiveness of learned semantic similarities on machine translation.

2015

Graph-Based Collective Lexical Selection for Statistical Machine Translation
Jinsong Su | Deyi Xiong | Shujian Huang | Xianpei Han | Junfeng Yao
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation
Jinsong Su | Deyi Xiong | Biao Zhang | Yang Liu | Junfeng Yao | Min Zhang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Shallow Convolutional Neural Network for Implicit Discourse Relation Recognition
Biao Zhang | Jinsong Su | Deyi Xiong | Yaojie Lu | Hong Duan | Junfeng Yao
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

A Context-Aware Topic Model for Statistical Machine Translation
Jinsong Su | Deyi Xiong | Yang Liu | Xianpei Han | Hongyu Lin | Junfeng Yao | Min Zhang
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Regularized Structured Perceptron: A Case Study on Chinese Word Segmentation, POS Tagging and Parsing
Kaixu Zhang | Jinsong Su | Changle Zhou
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

A Topic-Triggered Language Model for Statistical Machine Translation
Heng Yu | Jinsong Su | Yajuan Lv | Qun Liu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
Jinsong Su | Hua Wu | Haifeng Wang | Yidong Chen | Xiaodong Shi | Huailin Dong | Qun Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2010

Dependency-Based Bracketing Transduction Grammar for Statistical Machine Translation
Jinsong Su | Yang Liu | Haitao Mi | Hongmei Zhao | Yajuan Lv | Qun Liu
Coling 2010: Posters

Learning Lexicalized Reordering Models from Reordering Graphs
Jinsong Su | Yang Liu | Yajuan Lv | Haitao Mi | Qun Liu
Proceedings of the ACL 2010 Conference Short Papers