Jiajun Zhang


2020

Attend, Translate and Summarize: An Efficient Method for Neural Cross-Lingual Summarization
Junnan Zhu | Yu Zhou | Jiajun Zhang | Chengqing Zong
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Cross-lingual summarization aims at summarizing a document in one language (e.g., Chinese) into another language (e.g., English). In this paper, we propose a novel method inspired by the translation pattern in the process of obtaining a cross-lingual summary. We first attend to some words in the source text, then translate them into the target language, and summarize to get the final summary. Specifically, we first employ the encoder-decoder attention distribution to attend to the source words. Second, we present three strategies to acquire the translation probability, which helps obtain the translation candidates for each source word. Finally, each summary word is generated either from the neural distribution or from the translation candidates of source words. Experimental results on Chinese-to-English and English-to-Chinese summarization tasks show that our proposed method significantly outperforms the baselines, achieving performance comparable with the state of the art.
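
A minimal sketch (PyTorch) of the mixed output distribution this abstract describes: each summary word's probability combines the neural vocabulary distribution with translation candidates of attended source words. All names, shapes, and the gating variable p_gen are illustrative assumptions, not the authors' code.

import torch

def output_distribution(vocab_logits, attn, cand_ids, cand_probs, p_gen):
    """vocab_logits: (B, V) decoder logits over the target vocabulary;
    attn: (B, S) encoder-decoder attention over source words;
    cand_ids: (B, S, K) target-vocab ids of K translation candidates per source word;
    cand_probs: (B, S, K) translation probabilities of those candidates;
    p_gen: (B, 1) probability of generating from the neural distribution."""
    p_vocab = torch.softmax(vocab_logits, dim=-1)
    # route attention mass to each source word's translation candidates
    mass = attn.unsqueeze(-1) * cand_probs                 # (B, S, K)
    p_trans = torch.zeros_like(p_vocab)
    p_trans.scatter_add_(1, cand_ids.flatten(1), mass.flatten(1))
    # each summary word comes either from the neural distribution
    # or from the translation candidates of source words
    return p_gen * p_vocab + (1.0 - p_gen) * p_trans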

CASIA’s System for IWSLT 2020 Open Domain Translation
Qian Wang | Yuchen Liu | Cong Ma | Yu Lu | Yining Wang | Long Zhou | Yang Zhao | Jiajun Zhang | Chengqing Zong
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes CASIA's system for the IWSLT 2020 open domain translation task. This year we participate in both the Chinese→Japanese and Japanese→Chinese translation tasks. Our system is a neural machine translation system based on the Transformer model. We augment the training data with knowledge distillation and back-translation to improve translation performance. Domain data classification and weighted domain model ensembling are introduced to generate the final translation result. We compare and analyze the performance on development data with different model settings and different data processing techniques.

Improving Autoregressive NMT with Non-Autoregressive Model
Long Zhou | Jiajun Zhang | Chengqing Zong
Proceedings of the First Workshop on Automatic Simultaneous Translation

Autoregressive neural machine translation (NMT) models are often used to teach non-autoregressive models via knowledge distillation. However, there are few studies on improving the quality of autoregressive translation (AT) using non-autoregressive translation (NAT). In this work, we propose a novel Encoder-NAD-AD framework for NMT, aiming at boosting AT with global information produced by an NAT model. Specifically, under the semantic guidance of the source-side context captured by the encoder, the non-autoregressive decoder (NAD) first learns to generate the target-side hidden state sequence in parallel. Then the autoregressive decoder (AD) performs translation from left to right, conditioned on the source-side and target-side hidden states. Since the AD has global information generated by the low-latency NAD, it is more likely to produce a better translation with less time delay. Experiments on WMT14 En-De, WMT16 En-Ro, and IWSLT14 De-En translation tasks demonstrate that our framework achieves significant improvements with only an 8% slowdown relative to autoregressive NMT.
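
A minimal sketch (PyTorch) of the Encoder-NAD-AD flow described above: the NAD produces target-side states in parallel, and the AD then attends to both the encoder output and those states. Module sizes are placeholders, and reusing target embeddings as the NAD input is a simplification of real NAT inputs, not the paper's design.

import torch
import torch.nn as nn

class EncoderNADAD(nn.Module):
    def __init__(self, vocab=32000, d_model=512, nhead=8, layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.nad = nn.TransformerDecoder(dec_layer, layers)  # run without causal mask
        self.ad = nn.TransformerDecoder(dec_layer, layers)   # left-to-right decoding
        self.out = nn.Linear(d_model, vocab)

    def forward(self, src_ids, tgt_in_ids, causal_mask):
        src = self.encoder(self.embed(src_ids))
        # NAD: predict all target-side hidden states at once (no causal mask)
        nad_states = self.nad(self.embed(tgt_in_ids), src)
        # AD: decode left to right, attending to source AND the NAD's global states
        memory = torch.cat([src, nad_states], dim=1)
        ad_states = self.ad(self.embed(tgt_in_ids), memory, tgt_mask=causal_mask)
        return self.out(ad_states)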

Distill and Replay for Continual Language Learning
Jingyuan Sun | Shaonan Wang | Jiajun Zhang | Chengqing Zong
Proceedings of the 28th International Conference on Computational Linguistics

Accumulating knowledge to tackle new tasks without necessarily forgetting the old ones is a hallmark of human-like intelligence. But the current dominant paradigm of machine learning is still to train a model that works well on static datasets. When learning tasks in a stream where the data distribution may fluctuate, fitting on new tasks often leads to forgetting the previous ones. We propose a simple yet effective framework that continually learns natural language understanding tasks with one model. Our framework distills knowledge and replays experience from previous tasks when fitting on a new task, and is thus named DnR (distill and replay). The framework is based on language models and can be smoothly built with different language model architectures. Experimental results demonstrate that DnR outperforms previous state-of-the-art models in continually learning tasks of the same type but from different domains, as well as tasks of different types. With the distillation method, we further show that it is possible for DnR to incrementally compress the model size while still outperforming most of the baselines. We hope that DnR can promote the empirical application of continual language learning and contribute to building human-level language intelligence minimally bothered by catastrophic forgetting.
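
A minimal sketch of a distill-and-replay objective as described above: when fitting a new task, the model also replays stored examples from old tasks and matches the previous model's output distribution on them. The temperature, loss weighting, and batch dictionary keys are illustrative assumptions.

import torch
import torch.nn.functional as F

def dnr_loss(model, old_model, new_batch, replay_batch, T=2.0, alpha=0.5):
    # ordinary task loss on the new task's data
    logits = model(new_batch["input_ids"])
    task_loss = F.cross_entropy(logits, new_batch["labels"])
    # distillation on replayed examples: match the old model's soft targets
    with torch.no_grad():
        teacher = old_model(replay_batch["input_ids"]) / T
    student = model(replay_batch["input_ids"]) / T
    distill_loss = F.kl_div(F.log_softmax(student, dim=-1),
                            F.softmax(teacher, dim=-1),
                            reduction="batchmean") * (T * T)
    return task_loss + alpha * distill_loss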

Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity
Yang Zhao | Lu Xiang | Junnan Zhu | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of the 28th International Conference on Computational Linguistics

Previous studies combining knowledge graphs (KGs) with neural machine translation (NMT) have two problems: i) Knowledge under-utilization: they focus only on the entities that appear in both the KG and the training sentence pairs, leaving much of the knowledge in the KG unexploited. ii) Granularity mismatch: current KG methods use the entity as the basic granularity, while NMT uses the sub-word, making the KG difficult to utilize in NMT. To alleviate the above problems, we propose a multi-task learning method at sub-entity granularity. Specifically, we first split the entities in the KG and the sentence pairs into sub-entity granularity using joint BPE. Then we use multi-task learning to combine the machine translation task and a knowledge reasoning task. Extensive experiments on various translation tasks demonstrate that our method significantly outperforms the baseline models in both translation quality and entity handling.

Multimodal Sentence Summarization via Multimodal Selective Encoding
Haoran Li | Junnan Zhu | Jiajun Zhang | Xiaodong He | Chengqing Zong
Proceedings of the 28th International Conference on Computational Linguistics

This paper studies the problem of generating a summary for a given sentence-image pair. Existing multimodal sequence-to-sequence approaches mainly focus on enhancing the decoder with visual signals, ignoring that the image can also improve the encoder's ability to identify highlights of a news event or a document. Thus, we propose a multimodal selective gate network that considers reciprocal relationships between textual and multi-level visual features, including a global image descriptor, activation grids, and object proposals, to select highlights of the event when encoding the source sentence. In addition, we introduce a modality regularization to encourage the summary to capture the highlights embedded in the image more accurately. To verify the generalization of our model, we apply the multimodal selective gate to both a text-based decoder and a multimodal decoder. Experimental results on a public multimodal sentence summarization dataset demonstrate the advantage of our models over baselines. Further analysis suggests that our proposed multimodal selective gate network can effectively select important information in the input sentence.
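
A minimal sketch (PyTorch) of a multimodal selective gate as described above: a gate computed from each source hidden state and a visual feature decides how much of that state to keep. A single visual vector stands in here for the paper's multi-level features (global descriptor, activation grids, object proposals).

import torch
import torch.nn as nn

class MultimodalSelectiveGate(nn.Module):
    def __init__(self, d_text, d_image):
        super().__init__()
        self.gate = nn.Linear(d_text + d_image, d_text)

    def forward(self, enc_states, img_feat):
        # enc_states: (B, src_len, d_text); img_feat: (B, d_image)
        img = img_feat.unsqueeze(1).expand(-1, enc_states.size(1), -1)
        g = torch.sigmoid(self.gate(torch.cat([enc_states, img], dim=-1)))
        return g * enc_states   # selectively filtered source representations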

Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning
Xiaomian Kang | Yang Zhao | Jiajun Zhang | Chengqing Zong
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Document-level neural machine translation has yielded attractive improvements. However, the majority of existing methods simply use all context sentences within a fixed scope, neglecting the fact that different source sentences need different amounts of context. To address this problem, we propose an effective approach to select dynamic context so that the document-level translation model can exploit the most useful context sentences to produce better translations. Specifically, we introduce a selection module, independent of the translation module, that scores each candidate context sentence. Then, we propose two strategies to explicitly select a variable number of context sentences and feed them into the translation module. We train the two modules end-to-end via reinforcement learning, with a novel reward that encourages the selection and utilization of dynamic context sentences. Experiments demonstrate that our approach selects adaptive context sentences for different source sentences and significantly improves the performance of document-level translation methods.
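
A minimal sketch (PyTorch) of the two-module setup described above: an independent selection module scores candidate context sentences, and a variable number of them (here, those above a threshold) are passed to the translation module. The scorer and threshold are illustrative assumptions.

import torch
import torch.nn as nn

class ContextSelector(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.scorer = nn.Linear(2 * d_model, 1)

    def forward(self, src_vec, ctx_vecs, threshold=0.5):
        # src_vec: (d,) current sentence; ctx_vecs: (n_ctx, d) candidates
        pairs = torch.cat([src_vec.expand(ctx_vecs.size(0), -1), ctx_vecs], dim=-1)
        probs = torch.sigmoid(self.scorer(pairs)).squeeze(-1)   # (n_ctx,)
        keep = probs > threshold          # dynamic, per-sentence context size
        return ctx_vecs[keep], probs      # selected context; probs feed the RL reward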

2019

Synchronous Bidirectional Neural Machine Translation
Long Zhou | Jiajun Zhang | Chengqing Zong
Transactions of the Association for Computational Linguistics, Volume 7

Existing approaches to neural machine translation (NMT) generate the target language sequence token by token, from left to right. However, this kind of unidirectional decoding framework cannot make full use of the target-side future contexts that a right-to-left decoding direction would produce, and thus suffers from the issue of unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its previously generated outputs, but also on the future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on large-scale NIST Chinese–English, WMT14 English–German, and WMT18 Russian–English translation tasks. Experimental results demonstrate that our model achieves significant improvements over the strong Transformer model by 3.92, 1.49, and 1.04 BLEU points, respectively, and obtains state-of-the-art performance on the Chinese–English and English–German translation tasks.

Are You for Real? Detecting Identity Fraud via Dialogue Interactions
Weikang Wang | Jiajun Zhang | Qian Li | Chengqing Zong | Zhifei Li
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Identity fraud detection is of great importance in many real-world scenarios such as the financial industry. However, few studies have addressed this problem. In this paper, we focus on identity fraud detection in loan applications and propose to solve this problem with a novel interactive dialogue system consisting of two modules. One is a knowledge graph (KG) constructor that organizes the personal information of each loan applicant. The other is a structured dialogue manager that can dynamically generate a series of questions based on the personal KG to ask the applicants and determine their identity states. We also present a heuristic user simulator based on problem analysis to evaluate our method. Experiments show that the trainable dialogue system can effectively detect fraudsters and achieves higher recognition accuracy than rule-based systems. Furthermore, our learned dialogue strategies are interpretable and flexible, which can help promote real-world applications.

NCLS: Neural Cross-Lingual Summarization
Junnan Zhu | Qian Wang | Yining Wang | Yu Zhou | Jiajun Zhang | Shaonan Wang | Chengqing Zong
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Cross-lingual summarization (CLS) is the task of producing a summary in one particular language for a source document in a different language. Existing methods simply divide this task into two steps, summarization and translation, leading to the problem of error propagation. To handle this, we present an end-to-end CLS framework, which we refer to as Neural Cross-Lingual Summarization (NCLS), for the first time. Moreover, we propose to further improve NCLS by incorporating two related tasks, monolingual summarization and machine translation, into the training process of CLS under multi-task learning. Due to the lack of supervised CLS data, we propose a round-trip translation strategy to acquire two high-quality large-scale CLS datasets based on existing monolingual summarization datasets. Experimental results show that NCLS achieves remarkable improvement over traditional pipeline methods on both English-to-Chinese and Chinese-to-English human-corrected CLS test sets. In addition, NCLS with multi-task learning can further significantly improve the quality of generated summaries. We make our dataset and code publicly available here: http://www.nlpr.ia.ac.cn/cip/dataset.htm.
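
A minimal sketch of the round-trip translation strategy described above: translate monolingual summaries into the target language, translate them back, and keep pairs whose round trip stays close to the original. The similarity threshold and the mt_fwd/mt_back/sim functions are hypothetical hooks, not the paper's exact pipeline.

def build_cls_dataset(mono_pairs, mt_fwd, mt_back, sim, threshold=0.45):
    """mono_pairs: list of (document, summary) in the source language."""
    dataset = []
    for doc, summ in mono_pairs:
        tgt_summ = mt_fwd(summ)            # summary -> target language
        back = mt_back(tgt_summ)           # target summary -> source language
        if sim(back, summ) >= threshold:   # round trip preserved the content
            dataset.append((doc, tgt_summ))
    return dataset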

Synchronously Generating Two Languages with Interactive Decoding
Yining Wang | Jiajun Zhang | Long Zhou | Yuchen Liu | Chengqing Zong
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In this paper, we introduce a novel interactive approach to translate a source language into two different languages simultaneously and interactively. Specifically, the generation of one language relies not only on its own previously generated outputs, but also on the outputs predicted for the other language. Experimental results on IWSLT and WMT datasets demonstrate that our method obtains significant improvements over both a conventional neural machine translation (NMT) model and a multilingual NMT model.

A Compact and Language-Sensitive Multilingual Translation Method
Yining Wang | Long Zhou | Jiajun Zhang | Feifei Zhai | Jingfang Xu | Chengqing Zong
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multilingual neural machine translation (Multi-NMT) with one encoder-decoder model has made remarkable progress due to its simple deployment. However, this multilingual translation paradigm does not make full use of language commonality and parameter sharing between encoder and decoder. Furthermore, in most cases it cannot outperform individual models trained on bilingual corpora. In this paper, we propose a compact and language-sensitive method for multilingual translation. To maximize parameter sharing, we first present a universal representor to replace both the encoder and decoder models. To make the representor sensitive to specific languages, we further introduce a language-sensitive embedding, attention, and discriminator, each able to enhance model performance. We verify our methods on various translation scenarios, including one-to-many, many-to-many and zero-shot. Extensive experiments demonstrate that our proposed methods remarkably outperform strong standard multilingual translation systems on WMT and IWSLT datasets. Moreover, we find that our model is especially helpful in low-resource and zero-shot translation scenarios.

Incremental Learning from Scratch for Task-Oriented Dialogue Systems
Weikang Wang | Jiajun Zhang | Qian Li | Mei-Yuh Hwang | Chengqing Zong | Zhifei Li
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Clarifying user needs is essential for existing task-oriented dialogue systems. However, in real-world applications, developers can never guarantee that all possible user demands are taken into account in the design phase. Consequently, existing systems break down when encountering unconsidered user needs. To address this problem, we propose a novel incremental learning framework for designing task-oriented dialogue systems, the Incremental Dialogue System (IDS), without pre-defining an exhaustive list of user needs. Specifically, we introduce an uncertainty estimation module to evaluate the confidence of giving correct responses. If confidence is high, IDS provides the response to the user. Otherwise, a human is involved in the dialogue process, and IDS learns from the human intervention through an online learning module. To evaluate our method, we propose a new dataset that simulates unanticipated user needs in the deployment stage. Experiments show that IDS is robust to unconsidered user actions and can update itself online by smartly selecting only the most effective training data, and hence attains better performance with less annotation cost.
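
A minimal sketch of the uncertainty-gated policy described above: answer when confident, otherwise hand the turn to a human and store the correction for online learning. Using the maximum softmax probability as confidence is an illustrative stand-in for the paper's uncertainty module, and ask_human is a hypothetical human-in-the-loop hook.

import torch

def respond(model, dialogue_state, replay_pool, threshold=0.9):
    logits = model(dialogue_state)                 # scores over candidate responses
    probs = torch.softmax(logits, dim=-1)
    confidence, action = probs.max(dim=-1)
    if confidence.item() >= threshold:
        return action.item()                       # system answers by itself
    human_action = ask_human(dialogue_state)       # hypothetical human intervention
    replay_pool.append((dialogue_state, human_action))  # data for online updates
    return human_action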

Memory Consolidation for Contextual Spoken Language Understanding with Dialogue Logistic Inference
He Bai | Yu Zhou | Jiajun Zhang | Chengqing Zong
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Dialogue contexts have proven helpful in spoken language understanding (SLU) systems and are typically encoded with explicit memory representations. However, most previous models learn the context memory with only one objective, maximizing SLU performance, leaving the context memory under-exploited. In this paper, we propose a new dialogue logistic inference (DLI) task to consolidate the context memory jointly with SLU in a multi-task framework. DLI is defined as sorting a shuffled dialogue session into its original logical order and shares the same memory encoder and retrieval mechanism as the SLU model. Our experimental results show that various popular contextual SLU models benefit from our approach, and the improvements are quite impressive, especially in slot filling.
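
A minimal sketch of the DLI objective described above, cast as predicting each shuffled turn's original position (an illustrative simplification). memory_encoder and order_head are hypothetical modules: the encoder is the one shared with the SLU model, and order_head maps each turn's memory vector to scores over a fixed maximum number of positions.

import random
import torch
import torch.nn.functional as F

def dli_loss(memory_encoder, order_head, session_turns):
    order = list(range(len(session_turns)))
    random.shuffle(order)                          # shuffle the dialogue session
    shuffled = [session_turns[i] for i in order]
    mem = memory_encoder(shuffled)                 # (n_turns, d) shared memory
    logits = order_head(mem)                       # (n_turns, MAX_TURNS)
    target = torch.tensor(order)                   # each turn's original position
    return F.cross_entropy(logits, target)

# joint multi-task objective: total_loss = slu_loss + lambda_dli * dli_loss(...)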

2018

Exploiting Pre-Ordering for Neural Machine Translation
Yang Zhao | Jiajun Zhang | Chengqing Zong
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

One Sentence One Model for Neural Machine Translation
Xiaoqing Li | Jiajun Zhang | Chengqing Zong
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Associative Multichannel Autoencoder for Multimodal Word Representation
Shaonan Wang | Jiajun Zhang | Chengqing Zong
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper, we address the problem of learning multimodal word representations by integrating textual, visual and auditory inputs. Inspired by the reconstructive and associative nature of human memory, we propose a novel associative multichannel autoencoder (AMA). Our model first learns the associations between the textual and perceptual modalities, so as to predict the missing perceptual information of concepts. Then the textual and predicted perceptual representations are fused by reconstructing their original and associated embeddings. Using a gating mechanism, our model assigns different weights to each modality depending on the concept. Results on six benchmark concept similarity tests show that the proposed method significantly outperforms strong unimodal baselines and state-of-the-art multimodal models.
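
A minimal sketch (PyTorch) of the associative multichannel autoencoder described above: the text channel predicts (associates) a missing perceptual channel, and the gated fusion is trained to reconstruct both original and associated embeddings. The single image channel, layer sizes, and gate form are illustrative assumptions.

import torch
import torch.nn as nn

class AMA(nn.Module):
    def __init__(self, d_text, d_img, d_hidden):
        super().__init__()
        self.assoc = nn.Linear(d_text, d_img)            # text -> predicted image
        self.enc = nn.Linear(d_text + d_img, d_hidden)   # fuse channels
        self.dec_text = nn.Linear(d_hidden, d_text)
        self.dec_img = nn.Linear(d_hidden, d_img)
        self.gate = nn.Linear(d_text, 1)                 # concept-dependent weighting

    def forward(self, text_vec, img_vec=None):
        pred_img = self.assoc(text_vec)                  # associated perceptual vector
        img = pred_img if img_vec is None else img_vec   # fill in missing modality
        w = torch.sigmoid(self.gate(text_vec))
        h = torch.tanh(self.enc(torch.cat([text_vec, w * img], dim=-1)))
        return h, self.dec_text(h), self.dec_img(h), pred_img

# training: reconstruction losses on text and image outputs, plus an association
# loss ||pred_img - img_vec||^2 on concepts that do have perceptual data.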

Addressing Troublesome Words in Neural Machine Translation
Yang Zhao | Jiajun Zhang | Zhongjun He | Chengqing Zong | Hua Wu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

One of the weaknesses of Neural Machine Translation (NMT) is in handling low-frequency and ambiguous words, which we refer to as troublesome words. To address this problem, we propose a novel memory-enhanced NMT method. First, we investigate different strategies to define and detect the troublesome words. Then, a contextual memory is constructed to memorize which target words should be produced in what situations. Finally, we design a hybrid model that dynamically accesses the contextual memory so as to correctly translate the troublesome words. Extensive experiments on Chinese-to-English and English-to-German translation tasks demonstrate that our method significantly outperforms strong baseline models in translation quality, especially in handling troublesome words.

Three Strategies to Improve One-to-Many Multilingual Translation
Yining Wang | Jiajun Zhang | Feifei Zhai | Jingfang Xu | Chengqing Zong
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Due to the benefits of model compactness, multilingual translation (including many-to-one, many-to-many and one-to-many) based on a universal encoder-decoder architecture has attracted increasing attention. However, previous studies show that one-to-many translation based on this framework cannot perform on par with individually trained models. In this work, we introduce three strategies to improve one-to-many multilingual translation by balancing shared and unique features. Within the architecture of one decoder for all target languages, we first exploit unique initial states for different target languages. Then, we employ language-dependent positional embeddings. Finally, and especially, we propose to divide the hidden cells of the decoder into shared and language-dependent ones. Extensive experiments demonstrate that our proposed methods obtain remarkable improvements over strong baselines. Moreover, our strategies achieve comparable or even better performance than individually trained translation models.
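
A minimal sketch (PyTorch) of the first two strategies described above: a unique initial decoder state and a language-dependent positional embedding per target language. Sizes and the parameterization are illustrative assumptions; the third strategy (splitting hidden cells into shared and language-dependent blocks) is only noted in a comment.

import torch
import torch.nn as nn

class LangAwareDecoderInput(nn.Module):
    def __init__(self, n_langs, d_model, max_len=256):
        super().__init__()
        self.init_state = nn.Embedding(n_langs, d_model)                 # strategy 1
        self.pos = nn.Parameter(torch.zeros(n_langs, max_len, d_model))  # strategy 2
        nn.init.normal_(self.pos, std=0.02)

    def forward(self, tok_embs, lang_id):
        # tok_embs: (B, T, d); add the target language's own positional embedding
        x = tok_embs + self.pos[lang_id, :tok_embs.size(1)]
        h0 = self.init_state.weight[lang_id]     # language-specific initial state
        return x, h0

# Strategy 3 (not shown): partition each decoder hidden vector into a block of
# shared cells and a block of language-dependent cells.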

A Teacher-Student Framework for Maintainable Dialog Manager
Weikang Wang | Jiajun Zhang | Han Zhang | Mei-Yuh Hwang | Chengqing Zong | Zhifei Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Reinforcement learning (RL) is an attractive solution for task-oriented dialog systems. However, extending RL-based systems to handle new intents and slots requires a system redesign. The high maintenance cost makes it difficult to apply RL methods to practical systems on a large scale. To address this issue, we propose a practical teacher-student framework to extend RL-based dialog systems without retraining from scratch. Specifically, the “student” is an extended dialog manager based on a new ontology, and the “teacher” is the set of existing resources used to guide the learning process of the “student”. By specifying constraints held in the new dialog manager, we transfer knowledge from the “teacher” to the “student” without additional resources. Experiments show that the performance of the extended system is comparable to a system trained from scratch. More importantly, the proposed framework makes no assumptions about the unsupported intents and slots, which makes it possible to improve RL-based systems incrementally.

MSMO: Multimodal Summarization with Multimodal Output
Junnan Zhu | Haoran Li | Tianshang Liu | Yu Zhou | Jiajun Zhang | Chengqing Zong
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multimodal summarization has drawn much attention due to the rapid growth of multimedia data. The output of current multimodal summarization systems is usually text-only. However, we have found through experiments that multimodal output can significantly improve user satisfaction with the informativeness of summaries. In this paper, we propose a novel task, multimodal summarization with multimodal output (MSMO). To handle this task, we first collect a large-scale dataset for MSMO research. We then propose a multimodal attention model to jointly generate text and select the most relevant image from the multimodal input. Finally, to evaluate multimodal outputs, we construct a novel multimodal automatic evaluation (MMAE) method that considers both intra-modality salience and inter-modality relevance. The experimental results show the effectiveness of MMAE.
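
A minimal sketch of an MMAE-style score as described above: a combination of intra-modality salience (text quality, image salience) and inter-modality relevance (image-text agreement). The component metrics and fixed weights are illustrative stand-ins, not the learned MMAE combination.

def mmae_score(text_salience, image_salience, image_text_relevance,
               w=(0.4, 0.3, 0.3)):
    """All inputs are assumed normalized to [0, 1]."""
    return (w[0] * text_salience            # intra-modality: textual salience
            + w[1] * image_salience         # intra-modality: visual salience
            + w[2] * image_text_relevance)  # inter-modality relevance

print(mmae_score(0.42, 0.8, 0.65))  # -> roughly 0.603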

Ensure the Correctness of the Summary: Incorporate Entailment Knowledge into Abstractive Sentence Summarization
Haoran Li | Junnan Zhu | Jiajun Zhang | Chengqing Zong
Proceedings of the 27th International Conference on Computational Linguistics

In this paper, we investigate the sentence summarization task, which produces a summary from a source sentence. Neural sequence-to-sequence models have achieved considerable success on this task, but most existing approaches focus only on improving the informativeness of the summary and ignore its correctness, i.e., the summary should not contain information unrelated to the source sentence. We argue that correctness is an essential requirement for summarization systems. Considering that a correct summary is semantically entailed by the source sentence, we incorporate entailment knowledge into abstractive summarization models. We propose an entailment-aware encoder under a multi-task framework (i.e., summarization generation and entailment recognition) and an entailment-aware decoder trained with entailment Reward Augmented Maximum Likelihood (RAML). Experimental results demonstrate that our models significantly outperform baselines in terms of both informativeness and correctness.

Source Critical Reinforcement Learning for Transferring Spoken Language Understanding to a New Language
He Bai | Yu Zhou | Jiajun Zhang | Liang Zhao | Mei-Yuh Hwang | Chengqing Zong
Proceedings of the 27th International Conference on Computational Linguistics

To deploy a spoken language understanding (SLU) model to a new language, language transfer is desired to avoid the trouble of acquiring and labeling a new, large SLU corpus. An SLU corpus is a monolingual corpus with domain/intent/slot labels. Translating the original SLU corpus into the target language is an attractive strategy. However, SLU corpora contain plenty of semantic labels (slots), which general-purpose translators cannot handle well, not to mention additional cultural differences. This paper focuses on the language transfer task given a small in-domain parallel SLU corpus. The in-domain parallel corpus can be used for a first adaptation of the general translator. More importantly, we show how to use reinforcement learning (RL) to further adapt the adapted translator, where translated sentences with more proper slot tags receive higher rewards. Our reward is derived exclusively from the source input sentence, unlike rewards obtained via actor-critic methods or computed against a ground-truth target sentence. Hence we can adapt the translator a second time, using the large monolingual SLU corpus from the source language. We evaluate our approach on Chinese-to-English language transfer for SLU systems. The experimental results show that the English SLU corpus generated via adaptation and reinforcement learning achieves over 97% slot F1 score and over 84% accuracy in domain classification, demonstrating the effectiveness of the proposed language transfer method. Compared with naive translation, our proposed method improves domain classification accuracy by a relative 22%, and the slot filling F1 score by a relative 71% or more.

2017

Neural System Combination for Machine Translation
Long Zhou | Wenpeng Hu | Jiajun Zhang | Chengqing Zong
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Neural machine translation (NMT) has become a new approach to machine translation and generates much more fluent results than statistical machine translation (SMT). However, SMT is usually better than NMT in translation adequacy. It is therefore a promising direction to combine the advantages of both NMT and SMT. In this paper, we propose a neural system combination framework leveraging multi-source NMT, which takes as input the outputs of NMT and SMT systems and produces the final translation. Extensive experiments on the Chinese-to-English translation task show that our model achieves a significant improvement of 5.3 BLEU points over the best single system output and 3.4 BLEU points over state-of-the-art traditional system combination methods.

Towards Neural Machine Translation with Partially Aligned Corpora
Yining Wang | Yang Zhao | Jiajun Zhang | Chengqing Zong | Zhengshan Xue
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While neural machine translation (NMT) has become the new paradigm, its parameter optimization requires large-scale parallel data, which is scarce in many domains and language pairs. In this paper, we address a new translation scenario in which only monolingual corpora and phrase pairs exist. We propose a new method for translation with partially aligned sentence pairs, which are derived from the phrase pairs and monolingual corpora. To make full use of the partially aligned corpora, we adapt the conventional NMT training method in two respects. On the one hand, different generation strategies are designed for aligned and unaligned target words. On the other hand, a different objective function is designed to model the partially aligned parts. The experiments demonstrate that our method achieves a relatively good result in such a translation scenario, and that tiny bitexts can boost translation quality to a large extent.

Learning from Parenthetical Sentences for Term Translation in Machine Translation
Guoping Huang | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of the 9th SIGHAN Workshop on Chinese Language Processing

Terms are pervasive in specific domains, and term translation plays a critical role in domain-specific machine translation (MT) tasks. However, translating them correctly is challenging, given the huge number of pre-existing terms and the endless stream of new terms. To achieve better term translation quality, it is necessary to inject external term knowledge into the underlying MT system. Fortunately, there is plenty of term translation knowledge in parenthetical sentences on the Internet. In this paper, we propose a simple, straightforward and effective framework to improve term translation by learning from parenthetical sentences. This framework includes: (1) a focused web crawler; (2) a parenthetical sentence filter, acquiring parenthetical sentences that include bilingual term pairs; (3) a term translation knowledge extractor, extracting bilingual term translation candidates; (4) a probability learner, generating the term translation table for MT decoders. Extensive experiments demonstrate that our proposed framework significantly improves the translation quality of both terms and sentences.

Exploiting Word Internal Structures for Generic Chinese Sentence Representation
Shaonan Wang | Jiajun Zhang | Chengqing Zong
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce a novel mixed character-word architecture to improve Chinese sentence representations by utilizing the rich semantic information of word-internal structures. Our architecture uses two key strategies. The first is a mask gate on characters, learning the relations among the characters in a word. The second is a max-pooling operation on words, adaptively finding the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models, achieving substantial performance gains over baseline models on sentence similarity tasks.
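
A minimal sketch (PyTorch) of the two strategies described above: a mask gate over a word's characters, and an element-wise max mixing the atomic word embedding with the composed character representation. Dimensions and the mean composition are illustrative assumptions.

import torch
import torch.nn as nn

class CharWordMixer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mask_gate = nn.Linear(d, d)   # relates characters within a word

    def forward(self, word_emb, char_embs):
        # word_emb: (d,) atomic embedding; char_embs: (n_chars, d)
        g = torch.sigmoid(self.mask_gate(char_embs))     # per-character gates
        composed = (g * char_embs).mean(dim=0)           # compositional representation
        # max-pooling adaptively picks, per dimension, atomic vs. compositional
        return torch.max(word_emb, composed)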

Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video
Haoran Li | Junnan Zhu | Cong Ma | Jiajun Zhang | Chengqing Zong
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

The rapid increase of multimedia data over the Internet necessitates multi-modal summarization from collections of text, image, audio and video. In this work, we propose an extractive multi-modal summarization (MMS) method that automatically generates a textual summary given a set of documents, images, audios and videos related to a specific topic. The key idea is to bridge the semantic gaps between multi-modal content. For audio information, we design an approach to selectively use its transcription. For visual information, we learn joint representations of texts and images using a neural network. Finally, all the multi-modal aspects are considered to generate the textual summary by maximizing salience, non-redundancy, readability and coverage through budgeted optimization of submodular functions. We further introduce an MMS corpus in English and Chinese. The experimental results on this dataset demonstrate that our method outperforms other competitive baseline methods.
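
A minimal sketch of budgeted submodular maximization as described above: the standard cost-scaled greedy selection of sentences under a length budget. The gain function is assumed to fold together the salience, non-redundancy, readability and coverage terms.

def greedy_summarize(sentences, costs, gain, budget):
    """sentences: list of candidates; costs[i]: length cost of sentence i;
    gain(S, i): marginal gain of adding i to summary S (assumed submodular)."""
    summary, spent = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        # pick the best cost-scaled marginal gain among unexamined candidates
        best = max(remaining, key=lambda i: gain(summary, i) / costs[i])
        if spent + costs[best] <= budget and gain(summary, best) > 0:
            summary.append(best)
            spent += costs[best]
        remaining.discard(best)
    return [sentences[i] for i in summary]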

2016

A Bilingual Discourse Corpus and Its Applications
Yang Liu | Jiajun Zhang | Chengqing Zong | Yating Yang | Xi Zhou
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Existing discourse research focuses only on monolingual settings, and the inconsistency between languages limits the power of discourse theory in multilingual applications such as machine translation. To address this issue, we design and build a bilingual discourse corpus in which we are currently defining and annotating bilingual elementary discourse units (BEDUs). The BEDUs are then organized into hierarchical structures. Using this discourse style, we have annotated nearly 20K LDC sentences. Finally, we design a bilingual discourse-based method for machine translation evaluation and show the effectiveness of our bilingual discourse annotations.

Exploiting Source-side Monolingual Data in Neural Machine Translation
Jiajun Zhang | Chengqing Zong
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

An End-to-End Chinese Discourse Parser with Adaptation to Explicit and Non-explicit Relation Recognition
Xiaomian Kang | Haoran Li | Long Zhou | Jiajun Zhang | Chengqing Zong
Proceedings of the CoNLL-16 shared task

An Empirical Exploration of Skip Connections for Sequential Tagging
Huijia Wu | Jiajun Zhang | Chengqing Zong
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we empirically explore the effects of various kinds of skip connections in stacked bidirectional LSTMs for sequential tagging. We investigate three kinds of skip connections connecting to LSTM cells: (a) skip connections to the gates, (b) skip connections to the internal states and (c) skip connections to the cell outputs. We present comprehensive experiments showing that skip connections to the cell outputs outperform the remaining two. Furthermore, we observe that using gated identity functions as skip mappings works quite well. Based on these novel skip connections, we successfully train deep stacked bidirectional LSTM models and obtain state-of-the-art results on CCG supertagging and comparable results on POS tagging.
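
A minimal sketch (PyTorch) of the best-performing variant described above: a skip connection to the cell output with a gated identity mapping, letting lower-layer outputs pass directly upward. The gate parameterization is an illustrative assumption.

import torch
import torch.nn as nn

class GatedSkipLSTMLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.lstm = nn.LSTM(d, d, batch_first=True)
        self.skip_gate = nn.Linear(2 * d, d)

    def forward(self, x):
        h, _ = self.lstm(x)                 # this layer's cell outputs
        # gated identity mapping on the skip path from the layer below
        g = torch.sigmoid(self.skip_gate(torch.cat([x, h], dim=-1)))
        return h + g * x                    # skip connection to the cell output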

Different Contexts Lead to Different Word Embeddings
Wenpeng Hu | Jiajun Zhang | Nan Zheng
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recent work on learning word representations has been applied successfully to many NLP applications, such as sentiment analysis and question answering. However, most of these models assume a single vector per word type, without considering polysemy and homonymy. In this paper, we present an extension to the CBOW model that not only improves the quality of embeddings but also makes them suitable for polysemy. It differs from most related work in that it learns one semantic-center embedding and one context bias per word, instead of training multiple embeddings per word type. Different contexts lead to different biases, where a bias is defined as the weighted average embedding of the local context. Experimental results on similarity and analogy tasks show that the word representations learned by the proposed method outperform competitive baselines.
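
A minimal sketch (NumPy) of the idea described above: one semantic-center embedding per word plus a context bias computed as a weighted average of the local context embeddings, yielding context-dependent vectors without multiple embeddings per word type. Uniform weights and the mixing factor are illustrative assumptions.

import numpy as np

def contextual_embedding(center, context_vecs, weights=None, mix=0.5):
    """center: (d,) semantic-center embedding of the target word;
    context_vecs: (n, d) embeddings of the local context words."""
    if weights is None:
        weights = np.full(len(context_vecs), 1.0 / len(context_vecs))
    bias = weights @ context_vecs          # context bias: weighted average
    return center + mix * bias             # different contexts -> different vectors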

2014

Bilingually-constrained Phrase Embeddings for Machine Translation
Jiajun Zhang | Shujie Liu | Mu Li | Ming Zhou | Chengqing Zong
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

RNN-based Derivation Structure Prediction for SMT
Feifei Zhai | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Unsupervised Tree Induction for Tree-based Translation
Feifei Zhai | Jiajun Zhang | Yu Zhou | Chengqing Zong
Transactions of the Association for Computational Linguistics, Volume 1

In current research, most tree-based translation models are built directly from parse trees. In this study, we go in another direction and build a translation model with an unsupervised tree structure derived from a novel non-parametric Bayesian model. In the model, we utilize synchronous tree substitution grammars (STSG) to capture the bilingual mapping between language pairs. To train the model efficiently, we develop a Gibbs sampler with three novel Gibbs operators. The sampler is capable of exploring the infinite space of tree structures by performing local changes on the tree nodes. Experimental results show that the string-to-tree translation system using our Bayesian tree structures significantly outperforms the strong baseline string-to-tree system using parse trees.

Handling Ambiguities of Bilingual Predicate-Argument Structures for Statistical Machine Translation
Feifei Zhai | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Jiajun Zhang | Chengqing Zong
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Machine Translation by Modeling Predicate-Argument Structure Transformation
Feifei Zhai | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of COLING 2012

Tree-based Translation without using Parse Trees
Feifei Zhai | Jiajun Zhang | Yu Zhou | Chengqing Zong
Proceedings of COLING 2012

2011

Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
Jiajun Zhang | Feifei Zhai | Chengqing Zong
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2009

A Framework for Effectively Integrating Hard and Soft Syntactic Rules into Phrase Based Translation
Jiajun Zhang | Chengqing Zong
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2

2008

Sentence Type Based Reordering Model for Statistical Machine Translation
Jiajun Zhang | Chengqing Zong | Shoushan Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)