Jie Zhou


2020

Unsupervised Paraphrasing by Simulated Annealing
Xianggen Liu | Lili Mou | Fandong Meng | Hao Zhou | Jie Zhou | Sen Song
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose UPSA, a novel approach that accomplishes Unsupervised Paraphrasing by Simulated Annealing. We model paraphrase generation as an optimization problem and propose a sophisticated objective function involving the semantic similarity, expression diversity, and language fluency of paraphrases. UPSA searches the sentence space towards this objective by performing a sequence of local edits. We evaluate our approach on various datasets, namely Quora, Wikianswers, MSCOCO, and Twitter. Extensive results show that UPSA achieves state-of-the-art performance among previous unsupervised methods in terms of both automatic and human evaluations. Further, our approach outperforms most existing domain-adapted supervised models, showing the generalizability of UPSA.
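
As a rough illustration of the search procedure described above, the following is a minimal simulated-annealing loop over sentence edits; the scoring function and edit proposal are placeholders for illustration, not the paper's actual implementation:

```python
import math
import random

def upsa_sketch(sentence, score, propose_edit, steps=200, t0=1.0, cooling=0.98):
    """Minimal simulated-annealing search over sentence edits.

    `score(s)` stands in for UPSA's objective (semantic similarity,
    expression diversity, and fluency); `propose_edit(s)` applies one
    local edit (insert / replace / delete a word). Both are assumed
    interfaces here, not the paper's exact implementations.
    """
    current, temperature = sentence, t0
    for _ in range(steps):
        candidate = propose_edit(current)
        delta = score(candidate) - score(current)
        # Accept better candidates always; worse ones with probability
        # exp(delta / T), so the search can escape local optima.
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        temperature *= cooling  # anneal toward greedy search
    return current
```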

Hierarchy-Aware Global Model for Hierarchical Text Classification
Jie Zhou | Chunping Ma | Dingkun Long | Guangwei Xu | Ning Ding | Haoyu Zhang | Pengjun Xie | Gongshen Liu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Hierarchical text classification is an essential yet challenging subtask of multi-label text classification with a taxonomic hierarchy. Existing methods have difficulties in modeling the hierarchical label structure in a global view. Furthermore, they cannot make full use of the mutual interactions between the text feature space and the label space. In this paper, we formulate the hierarchy as a directed graph and introduce hierarchy-aware structure encoders for modeling label dependencies. Based on the hierarchy encoder, we propose a novel end-to-end hierarchy-aware global model (HiAGM) with two variants. A multi-label attention variant (HiAGM-LA) learns hierarchy-aware label embeddings through the hierarchy encoder and conducts inductive fusion of label-aware text features. A text feature propagation model (HiAGM-TP) is proposed as the deductive variant that directly feeds text features into hierarchy encoders. Compared with previous works, both HiAGM-LA and HiAGM-TP achieve significant and consistent improvements on three benchmark datasets.

A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin | Fandong Meng | Jinsong Su | Chulun Zhou | Zhengyuan Yang | Jie Zhou | Jiebo Luo
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Multi-modal neural machine translation (NMT) aims to translate source sentences paired with images into a target language. However, dominant multi-modal NMT models do not fully exploit fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To deal with this issue, in this paper we propose a novel graph-based multi-modal fusion encoder for NMT. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, these representations provide an attention-based context vector for the decoder. We evaluate our proposed encoder on the Multi30K dataset. Experimental results and in-depth analysis show the superiority of our multi-modal NMT model.

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation
Qiu Ran | Yankai Lin | Peng Li | Jie Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Non-autoregressive neural machine translation (NAT) predicts the entire target sequence simultaneously and significantly accelerates the inference process. However, NAT discards the dependency information in a sentence and thus inevitably suffers from the multi-modality problem: the target tokens may be drawn from different possible translations, often causing repeated or missing tokens. To alleviate this problem, we propose RecoverSAT, a novel semi-autoregressive model that generates a translation as a sequence of segments. The segments are generated simultaneously while each segment is predicted token-by-token. By dynamically determining segment length and deleting repetitive segments, RecoverSAT is capable of recovering from repeated- and missing-token errors. Experimental results on three widely used benchmark datasets show that our proposed model achieves a speedup of more than 4 times while maintaining performance comparable to the corresponding autoregressive model.

A Contextual Hierarchical Attention Network with Adaptive Objective for Dialogue State Tracking
Yong Shan | Zekang Li | Jinchao Zhang | Fandong Meng | Yang Feng | Cheng Niu | Jie Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Recent studies in dialogue state tracking (DST) leverage historical information to determine states, which are generally represented as slot-value pairs. However, most of them struggle to efficiently exploit relevant context due to the lack of a powerful mechanism for modeling interactions between the slot and the dialogue history. Besides, existing methods usually ignore the slot imbalance problem and treat all slots indiscriminately, which limits the learning of hard slots and eventually hurts overall performance. In this paper, we propose to enhance DST by employing a contextual hierarchical attention network to not only discern relevant information at both the word level and the turn level but also learn contextual representations. We further propose an adaptive objective to alleviate the slot imbalance problem by dynamically adjusting the weights of different slots during training. Experimental results show that our approach reaches 52.68% and 58.55% joint accuracy on the MultiWOZ 2.0 and MultiWOZ 2.1 datasets respectively, achieving new state-of-the-art performance with considerable improvements (+1.24% and +5.98%).
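
The adaptive objective can be pictured as a dynamically re-weighted loss. The sketch below up-weights slots the model currently gets wrong; the weighting scheme is an illustrative assumption, not the paper's exact formula:

```python
import torch
import torch.nn.functional as F

def adaptive_slot_loss(logits_per_slot, labels_per_slot, slot_accuracy, eps=1e-8):
    """Hypothetical adaptive objective: slots with lower current accuracy
    ("hard" slots) receive larger weights during training.

    logits_per_slot: {slot_name: (batch, num_values) tensor}
    labels_per_slot: {slot_name: (batch,) tensor of value indices}
    slot_accuracy:   {slot_name: float in [0, 1]}, tracked during training
    """
    losses, weights = [], []
    for slot, logits in logits_per_slot.items():
        w = 1.0 - slot_accuracy.get(slot, 0.0) + eps  # harder slot -> larger weight
        losses.append(w * F.cross_entropy(logits, labels_per_slot[slot]))
        weights.append(w)
    # Normalize so the overall loss scale stays stable as weights shift.
    return torch.stack(losses).sum() / sum(weights)
```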

Continual Relation Learning via Episodic Memory Activation and Reconsolidation
Xu Han | Yi Dai | Tianyu Gao | Yankai Lin | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Continual relation learning aims to continually train a model on new data to learn incessantly emerging novel relations while avoiding catastrophically forgetting old ones. Some pioneering work has shown that storing a handful of historical relation examples in episodic memory and replaying them in subsequent training is an effective solution to this challenging problem. However, these memory-based methods usually suffer from overfitting the few memorized examples of old relations, which may gradually cause inevitable confusion among existing relations. Inspired by the mechanism of human long-term memory formation, we introduce episodic memory activation and reconsolidation (EMAR) to continual relation learning. Every time neural models are activated to learn both new and memorized data, EMAR utilizes relation prototypes as a memory reconsolidation exercise to keep a stable understanding of old relations. The experimental results show that EMAR avoids catastrophic forgetting of old relations and outperforms the state-of-the-art continual learning models.
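
The prototype idea at the core of EMAR can be illustrated in a few lines: each relation's prototype is the mean embedding of its memorized examples, and a query is matched to the nearest prototype. This is only a sketch of one component; the full method alternates replay training with reconsolidation:

```python
import torch

def relation_prototypes(memory_embeddings):
    """Compute one prototype per relation as the mean embedding of its
    memorized examples. `memory_embeddings` maps a relation name to a
    (num_examples, dim) tensor of encoded memory instances.
    """
    return {rel: embs.mean(dim=0) for rel, embs in memory_embeddings.items()}

def classify_by_prototype(query, protos):
    """Assign a query embedding (dim,) to the relation whose prototype
    is most similar by cosine similarity."""
    sims = {rel: torch.cosine_similarity(query, p, dim=0) for rel, p in protos.items()}
    return max(sims, key=sims.get)
```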

Diversifying Dialogue Generation with Non-Conversational Text
Hui Su | Xiaoyu Shen | Sanqiang Zhao | Zhou Xiao | Pengwei Hu | Randy Zhong | Cheng Niu | Jie Zhou
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural network-based sequence-to-sequence (seq2seq) models strongly suffer from the low-diversity problem when it comes to open-domain dialogue generation. As bland and generic utterances usually dominate the frequency distribution of daily chitchat, avoiding them in order to generate more interesting responses requires complex data filtering, sampling techniques, or modifications to the training objective. In this paper, we propose a new perspective to diversify dialogue generation by leveraging non-conversational text. Compared with bilateral conversations, non-conversational text is easier to obtain, more diverse, and covers a much broader range of topics. We collect a large-scale non-conversational corpus from multiple sources including forum comments, idioms, and book snippets. We further present a training paradigm to effectively incorporate this text via iterative back-translation. The resulting model is tested on two conversational datasets from different domains and is shown to produce significantly more diverse responses without sacrificing relevance to the context.
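
The training paradigm can be sketched as a loop: a backward (response-to-context) model manufactures pseudo contexts for non-conversational sentences, and the resulting pseudo pairs augment training of the forward model. The `train`/`generate` interfaces below are assumed for illustration, not the paper's actual API:

```python
def iterative_back_translation(forward_model, backward_model, dialogue_pairs,
                               non_conversational_texts, rounds=3):
    """Sketch of iterative back-translation for dialogue diversification.

    dialogue_pairs: list of (context, response) pairs from real chats.
    non_conversational_texts: list of sentences treated as responses.
    Both models are assumed to expose `train(pairs)` and `generate(text)`.
    """
    for _ in range(rounds):
        # Train the backward model to map responses back to contexts.
        backward_model.train([(r, c) for c, r in dialogue_pairs])
        # Manufacture pseudo contexts for the non-conversational text.
        pseudo_pairs = [(backward_model.generate(t), t)
                        for t in non_conversational_texts]
        # Retrain the forward model on real plus pseudo pairs.
        forward_model.train(dialogue_pairs + pseudo_pairs)
    return forward_model
```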

ECNU-SenseMaker at SemEval-2020 Task 4: Leveraging Heterogeneous Knowledge Resources for Commonsense Validation and Explanation
Qian Zhao | Siyu Tao | Jie Zhou | Linlin Wang | Xin Lin | Liang He
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes our system for SemEval-2020 Task 4: Commonsense Validation and Explanation (Wang et al., 2020). We propose a novel Knowledge-enhanced Graph Attention Network (KEGAT) architecture for this task, leveraging heterogeneous knowledge from both a structured knowledge base (i.e., ConceptNet) and unstructured text to improve a machine’s commonsense understanding. The model attains a powerful commonsense inference capability by utilizing suitable commonsense incorporation methods and upgraded data augmentation techniques. Besides, an internal sharing mechanism is incorporated to keep the model from both insufficient and excessive commonsense reasoning. As a result, the model performs quite well in both validation and explanation; for instance, it achieves state-of-the-art accuracy in the subtask Commonsense Explanation (Multi-Choice). We officially name the system ECNU-SenseMaker. Code is publicly available at https://github.com/ECNU-ICA/ECNU-SenseMaker.

Document Sub-structure in Neural Machine Translation
Radina Dobreva | Jie Zhou | Rachel Bawden
Proceedings of the 12th Language Resources and Evaluation Conference

Current approaches to machine translation (MT) either translate sentences in isolation, disregarding the context they appear in, or model context at the level of the full document, without a notion of any internal structure the document may have. In this work we consider the fact that documents are rarely homogeneous blocks of text, but rather consist of parts covering different topics. Some documents, such as biographies and encyclopedia entries, have highly predictable, regular structures in which sections are characterised by different topics. We draw inspiration from Louis and Webber (2014), who use this information to improve statistical MT, and transfer their proposal into the framework of neural MT. We compare two different methods of including information about the topic of the section within which each sentence is found: one using side constraints and the other using a cache-based model. We create and release the data on which we run our experiments: parallel corpora for three language pairs (Chinese-English, French-English, Bulgarian-English) from Wikipedia biographies, which we extract automatically, preserving the boundaries of sections within the articles.

A Sentiment-Controllable Topic-to-Essay Generator with Topic Knowledge Graph
Lin Qiao | Jianhao Yan | Fandong Meng | Zhendong Yang | Jie Zhou
Findings of the Association for Computational Linguistics: EMNLP 2020

Generating a vivid, novel, and diverse essay from only a few given topic words is a promising task in natural language generation. Previous work on this task faces two challenging problems: neglect of the sentiment beneath the text and insufficient utilization of topic-related knowledge. Therefore, we propose a novel Sentiment-Controllable topic-to-essay generator with a Topic Knowledge Graph enhanced decoder, named SCTKG, which is based on the conditional variational auto-encoder (CVAE) framework. We first inject the sentiment information into the generator to control the sentiment of each sentence, which yields more varied essays. Then we design a Topic Knowledge Graph enhanced decoder. Unlike existing models that use knowledge entities separately, our model treats the knowledge graph as a whole and encodes more structured, connected semantic information in the graph to generate a more relevant essay. Experimental results show that SCTKG can generate sentiment-controllable essays and outperforms the state-of-the-art approach in terms of topic relevance, fluency, and diversity in both automatic and human evaluation.

SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
Jie Zhou | Junfeng Tian | Rui Wang | Yuanbin Wu | Wenming Xiao | Liang He
Proceedings of the 28th International Conference on Computational Linguistics

Pre-trained language models have been widely applied to cross-domain NLP tasks like sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users’ emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets, and utilize it for cross-domain sentiment analysis without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both the token and sentence levels, such as emoticons, sentiment words, and ratings, without human interference. A series of experiments is conducted and the results indicate the clear advantages of our model. We obtain new state-of-the-art results on all the cross-domain sentiment analysis tasks, and SentiX trained with only 1% of the samples (18 samples) achieves better performance than BERT trained with 90% of the samples.

Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack
Keqing He | Jinchao Zhang | Yuanmeng Yan | Weiran Xu | Cheng Niu | Jie Zhou
Proceedings of the 28th International Conference on Computational Linguistics

Zero-shot slot filling has been widely explored to cope with data scarcity in target domains. However, previous approaches often ignore constraints between slot value representations and the related slot description representations in the latent space, and lack sufficient model robustness. In this paper, we propose a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method for cross-domain slot filling. The contrastive loss aims to map slot value contextual representations to the corresponding slot description representations. We also introduce an adversarial attack training strategy to improve model robustness. Experimental results show that our model significantly outperforms state-of-the-art baselines under both zero-shot and few-shot settings.
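
The contrastive component can be illustrated with an InfoNCE-style loss that pulls a slot value's contextual representation toward its own slot description and away from the other descriptions; the exact loss in the paper may differ from this sketch:

```python
import torch
import torch.nn.functional as F

def slot_contrastive_loss(value_repr, desc_reprs, positive_idx, temperature=0.1):
    """InfoNCE-style sketch of the contrastive objective.

    value_repr:   (dim,) contextual representation of a slot value span.
    desc_reprs:   (num_slots, dim) encoded slot descriptions.
    positive_idx: index of the description the value actually belongs to.
    """
    # Similarity of the value to every slot description.
    sims = F.cosine_similarity(value_repr.unsqueeze(0), desc_reprs) / temperature
    target = torch.tensor(positive_idx)
    # Treat the correct description as the positive class.
    return F.cross_entropy(sims.unsqueeze(0), target.unsqueeze(0))
```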

One Comment from One Perspective: An Effective Strategy for Enhancing Automatic Music Comment
Tengfei Huo | Zhiqiang Liu | Jinchao Zhang | Jie Zhou
Proceedings of the 28th International Conference on Computational Linguistics

The automatic generation of music comments is of great significance for increasing the popularity of music and the activity of music platforms. Human comments on the same song are highly distinct and written from diverse musical perspectives. However, to date, this characteristic has not been well considered in research on automatic comment generation, and existing methods tend to generate common and meaningless comments. In this paper, we propose an effective multi-perspective strategy to enhance the diversity of the generated comments. Experimental results on two music comment datasets show that our proposed model can effectively generate a series of diverse music comments based on different perspectives, outperforming state-of-the-art baselines by a substantial margin.

Token-level Adaptive Training for Neural Machine Translation
Shuhao Gu | Jinchao Zhang | Fandong Meng | Yang Feng | Wanying Xie | Jie Zhou | Dong Yu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

There exists a token imbalance phenomenon in natural language, as different tokens appear with different frequencies, which leads to different learning difficulties for tokens in Neural Machine Translation (NMT). The vanilla NMT model usually adopts a trivial equal-weighted objective for target tokens of different frequencies and tends to generate more high-frequency tokens and fewer low-frequency tokens than the golden token distribution. However, low-frequency tokens may carry critical semantic information that will affect translation quality once they are neglected. In this paper, we explore target token-level adaptive objectives based on token frequencies to assign appropriate weights to each target token during training. We aim to assign larger weights to meaningful but relatively low-frequency words so as to encourage the model to pay more attention to them. Our method yields consistent improvements in translation quality on ZH-EN, EN-RO, and EN-DE translation tasks, especially on sentences that contain more low-frequency tokens, where we obtain BLEU increases of 1.68, 1.02, and 0.52 over the baseline, respectively. Further analyses show that our method can also improve the lexical diversity of translations.
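
A minimal version of such a frequency-based adaptive objective is sketched below; the particular weighting function (an inverse power of corpus frequency, normalized to mean one) is an illustrative assumption, as the paper explores several variants:

```python
import torch
import torch.nn.functional as F

def frequency_weighted_nll(log_probs, targets, token_counts, alpha=0.5, pad_id=0):
    """NMT loss that up-weights low-frequency target tokens.

    log_probs:    (batch, seq, vocab) log-probabilities from the decoder.
    targets:      (batch, seq) gold token ids.
    token_counts: (vocab,) corpus frequency of each token.
    """
    # Weight w = count^(-alpha): rarer tokens get larger weights.
    weights = token_counts.clamp(min=1).float() ** (-alpha)
    weights = weights / weights.mean()  # keep overall loss scale stable
    # Per-token negative log-likelihood, then apply the token weights.
    nll = F.nll_loss(log_probs.transpose(1, 2), targets, reduction="none")
    w = weights[targets] * (targets != pad_id).float()  # mask padding
    return (w * nll).sum() / w.sum()
```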

Multi-Unit Transformers for Neural Machine Translation
Jianhao Yan | Fandong Meng | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Transformer models achieve remarkable success in Neural Machine Translation. Many efforts have been devoted to deepening the Transformer by stacking several units (i.e., a combination of Multihead Attentions and FFN) in a cascade, while the investigation of multiple parallel units has drawn little attention. In this paper, we propose the Multi-Unit Transformer (MUTE), which aims to promote the expressiveness of the Transformer by introducing diverse and complementary units. Specifically, we use several parallel units and show that modeling with multiple units improves model performance and introduces diversity. Further, to better leverage the advantage of the multi-unit setting, we design a biased module and a sequential dependency that guide and encourage complementarity among different units. Experimental results on three machine translation tasks, NIST Chinese-to-English, WMT’14 English-to-German, and WMT’18 Chinese-to-English, show that the MUTE models significantly outperform Transformer-Base, by up to +1.52, +1.90, and +1.10 BLEU points, with only a mild drop in inference speed (about 3.1%). In addition, our method also surpasses the Transformer-Big model with only 54% of its parameters. These results demonstrate the effectiveness of MUTE, as well as its efficiency in both inference and parameter usage.
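
The basic multi-unit idea can be sketched as several parallel Transformer units whose outputs are combined; the paper's biased modules and sequential dependencies among units are omitted in this simplified sketch:

```python
import torch
import torch.nn as nn

class MultiUnitLayer(nn.Module):
    """Several Transformer encoder units applied in parallel, with their
    outputs averaged. A bare-bones sketch of the multi-unit idea only.
    """
    def __init__(self, d_model=512, nhead=8, num_units=3):
        super().__init__()
        self.units = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
             for _ in range(num_units)]
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        # Each unit sees the same input; combining diverse units is what
        # is meant to promote expressiveness.
        return torch.stack([unit(x) for unit in self.units]).mean(dim=0)
```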

MAVEN: A Massive General Domain Event Detection Dataset
Xiaozhi Wang | Ziqi Wang | Xu Han | Wangyi Jiang | Rong Han | Zhiyuan Liu | Juanzi Li | Peng Li | Yankai Lin | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Event detection (ED), which means identifying event trigger words and classifying event types, is the first and most fundamental step for extracting event knowledge from plain text. Most existing datasets exhibit the following issues that limit further development of ED: (1) Data scarcity. Existing small-scale datasets are not sufficient for training and stably benchmarking increasingly sophisticated modern neural methods. (2) Low coverage. The limited event types of existing datasets cannot well cover general-domain events, which restricts the applications of ED models. To alleviate these problems, we present a MAssive eVENt detection dataset (MAVEN), which contains 4,480 Wikipedia documents, 118,732 event mention instances, and 168 event types. MAVEN alleviates the data scarcity problem and covers much more general event types. We reproduce the recent state-of-the-art ED models and conduct a thorough evaluation on MAVEN. The experimental results show that existing ED methods cannot achieve promising results on MAVEN as they do on the small datasets, which suggests that ED in the real world remains a challenging task and requires further research efforts. We also discuss further directions for general domain ED with empirical analyses. The source code and dataset can be obtained from https://github.com/THU-KEG/MAVEN-dataset.

Disentangle-based Continual Graph Representation Learning
Xiaoyu Kou | Yankai Lin | Shaobo Liu | Peng Li | Jie Zhou | Yan Zhang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Graph embedding (GE) methods embed the nodes (and/or edges) of a graph into a low-dimensional semantic space and have shown their effectiveness in modeling multi-relational data. However, existing GE models are not practical in real-world applications since they overlook the streaming nature of incoming data. To address this issue, we study the problem of continual graph representation learning, which aims to continually train a GE model on new data to learn incessantly emerging multi-relational data while avoiding catastrophic forgetting of previously learned knowledge. Moreover, we propose a disentangle-based continual graph representation learning (DiCGRL) framework inspired by the human ability to learn procedural knowledge. The experimental results show that DiCGRL effectively alleviates the catastrophic forgetting problem and outperforms state-of-the-art continual learning models. The code and datasets are released at https://github.com/KXY-PUBLIC/DiCGRL.

Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation
Xiuyi Chen | Fandong Meng | Peng Li | Feilong Chen | Shuang Xu | Bo Xu | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Knowledge selection plays an important role in knowledge-grounded dialogue, a challenging task that aims to generate more informative responses by leveraging external knowledge. Recently, latent variable models have been proposed to deal with the diversity of knowledge selection by using both prior and posterior distributions over knowledge, and they achieve promising performance. However, these models suffer from a huge gap between prior and posterior knowledge selection. First, the prior selection module may not learn to select knowledge properly because it lacks the necessary posterior information. Second, latent variable models suffer from the exposure bias that dialogue generation is based on knowledge selected from the posterior distribution at training time but from the prior distribution at inference time. Here, we deal with these issues in two ways: (1) we enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); and (2) we propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection. Experimental results on two knowledge-grounded dialogue datasets show that both PIPM and KDBTS achieve performance improvements over the state-of-the-art latent variable model, and their combination shows further improvement.

Learning from Context or Names? An Empirical Study on Neural Relation Extraction
Hao Peng | Tianyu Gao | Xu Han | Yankai Lin | Peng Li | Zhiyuan Liu | Maosong Sun | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Neural models have achieved remarkable success on relation extraction (RE) benchmarks. However, there is no clear understanding of what information in the text leads existing RE models to their decisions, or of how to further improve the performance of these models. To this end, we empirically study the effect of two main information sources in text: textual context and entity mentions (names). We find that (i) while context is the main source supporting the predictions, RE models also heavily rely on information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks. Based on these analyses, we propose an entity-masked contrastive pre-training framework for RE to gain a deeper understanding of both textual context and type information while avoiding rote memorization of entities or the use of superficial cues in mentions. We carry out extensive experiments to support our views, and show that our framework can improve the effectiveness and robustness of neural models in different RE scenarios. All the code and datasets are released at https://github.com/thunlp/RE-Context-or-Names.
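
The entity-masking step of such pre-training can be illustrated simply: entity mentions are replaced by a mask token with some probability, so the encoder must rely on textual context rather than memorized names. The masking rate and strategy below are assumptions for illustration:

```python
import random

def mask_entity_mentions(tokens, entity_spans, mask_token="[MASK]", p=0.7):
    """Randomly mask entity mentions in a tokenized sentence.

    tokens:       list of token strings.
    entity_spans: list of (start, end) index pairs marking mentions.
    p:            probability of masking each mention (assumed value).
    """
    out = list(tokens)
    for start, end in entity_spans:
        if random.random() < p:
            for i in range(start, end):
                out[i] = mask_token  # hide the entity name from the encoder
    return out
```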

MovieChats: Chat like Humans in a Closed Domain
Hui Su | Xiaoyu Shen | Zhou Xiao | Zheng Zhang | Ernie Chang | Cheng Zhang | Cheng Niu | Jie Zhou
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Being able to perform in-depth chat with humans in a closed domain is a precondition before an open-domain chatbot can ever be claimed. In this work, we take a close look at the movie domain and present a large-scale high-quality corpus with fine-grained annotations in the hope of pushing the limit of movie-domain chatbots. We propose a unified, readily scalable neural approach which reconciles all subtasks such as intent prediction and knowledge retrieval. The model is first pretrained on huge general-domain data, then finetuned on our corpus. We show that this simple neural approach, trained on high-quality data, is able to outperform commercial systems relying on complex rules. In both static and interactive tests, we find that responses generated by our system exhibit remarkably good engagement and sensibleness close to human-written ones. We further analyze the limits of our work and point out potential directions for future work.

2019

CM-Net: A Novel Collaborative Memory Network for Spoken Language Understanding
Yijin Liu | Fandong Meng | Jinchao Zhang | Jie Zhou | Yufeng Chen | Jinan Xu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Spoken Language Understanding (SLU) mainly involves two tasks, intent detection and slot filling, which are generally modeled jointly in existing works. However, most existing models fail to fully utilize the co-occurrence relations between slots and intents, which restricts their potential performance. To address this issue, in this paper we propose a novel Collaborative Memory Network (CM-Net) built on a well-designed block named CM-block. The CM-block first captures slot-specific and intent-specific features from memories in a collaborative manner, and then uses these enriched features to enhance local context representations, based on which the sequential information flow leads to more specific (slot and intent) global utterance representations. By stacking multiple CM-blocks, our CM-Net is able to alternately perform information exchange among specific memories, local contexts, and the global utterance, thus incrementally enriching each of them. We evaluate the CM-Net on two standard benchmarks (ATIS and SNIPS) and a self-collected corpus (CAIS). Experimental results show that the CM-Net achieves state-of-the-art results on ATIS and SNIPS on most criteria, and significantly outperforms the baseline models on CAIS. Additionally, we make the CAIS dataset publicly available for the research community.

Enhancing Context Modeling with a Query-Guided Capsule Network for Document-level Translation
Zhengxin Yang | Jinchao Zhang | Fandong Meng | Shuhao Gu | Yang Feng | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Context modeling is essential for generating coherent and consistent translations in document-level Neural Machine Translation. The widely used method for document-level translation compresses the context information into a representation via hierarchical attention networks. However, this method neither considers the relationships between context words nor distinguishes the roles of context words. To address this problem, we propose a query-guided capsule network to cluster context information into the different perspectives that the target translation may be concerned with. Experimental results show that our method significantly outperforms strong baselines on multiple datasets from different domains.

NumNet: Machine Reading Comprehension with Numerical Reasoning
Qiu Ran | Yankai Lin | Peng Li | Jie Zhou | Zhiyuan Liu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Numerical reasoning, such as addition, subtraction, sorting, and counting, is a critical skill in human reading comprehension, yet it has not been well considered in existing machine reading comprehension (MRC) systems. To address this issue, we propose a numerical MRC model named NumNet, which utilizes a numerically-aware graph neural network to model comparison information and perform numerical reasoning over the numbers in the question and passage. Our system achieves an EM score of 64.56% on the DROP dataset, outperforming all existing machine reading comprehension models by considering the numerical relations among numbers.
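
The numerically-aware graph can be illustrated by its simplest ingredient: a directed edge from one number to another whenever the first is greater, so the graph network can propagate comparison information. This is a simplified reading of the design; the model also distinguishes question and passage numbers:

```python
def build_number_graph(numbers):
    """Build a directed comparison graph over number occurrences.

    numbers: list of numeric values extracted from question and passage.
    Returns edges (i, j) meaning numbers[i] > numbers[j].
    """
    edges = [(i, j)
             for i, u in enumerate(numbers)
             for j, v in enumerate(numbers)
             if i != j and u > v]
    return edges

# e.g. build_number_graph([3, 1, 4]) -> [(0, 1), (2, 0), (2, 1)]
```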

A Novel Aspect-Guided Deep Transition Model for Aspect Based Sentiment Analysis
Yunlong Liang | Fandong Meng | Jinchao Zhang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect-based sentiment analysis (ABSA) aims to identify the sentiment polarity towards a given aspect in a sentence, while previous models typically exploit an aspect-independent (weakly associative) encoder to generate sentence representations. In this paper, we propose a novel Aspect-Guided Deep Transition model, named AGDT, which utilizes the given aspect to guide the sentence encoding from scratch with a specially designed deep transition architecture. Furthermore, an aspect-oriented objective is designed to enforce AGDT to reconstruct the given aspect from the generated sentence representation. In doing so, AGDT can accurately generate aspect-specific sentence representations, and thus conduct more accurate sentiment predictions. Experimental results on multiple SemEval datasets demonstrate the effectiveness of our proposed approach, which significantly outperforms the best reported results under the same setting.

HMEAE: Hierarchical Modular Event Argument Extraction
Xiaozhi Wang | Ziqi Wang | Xu Han | Zhiyuan Liu | Juanzi Li | Peng Li | Maosong Sun | Jie Zhou | Xiang Ren
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Existing event extraction methods classify each argument role independently, ignoring the conceptual correlations between different argument roles. In this paper, we propose a Hierarchical Modular Event Argument Extraction (HMEAE) model, to provide effective inductive bias from the concept hierarchy of event argument roles. Specifically, we design a neural module network for each basic unit of the concept hierarchy, and then hierarchically compose relevant unit modules with logical operations into a role-oriented modular network to classify a specific argument role. As many argument roles share the same high-level unit module, their correlation can be utilized to extract specific event arguments better. Experiments on real-world datasets show that HMEAE can effectively leverage useful knowledge from the concept hierarchy and significantly outperform the state-of-the-art baselines. The source code can be obtained from https://github.com/thunlp/HMEAE.

FewRel 2.0: Towards More Challenging Few-Shot Relation Classification
Tianyu Gao | Xu Han | Hao Zhu | Zhiyuan Liu | Peng Li | Maosong Sun | Jie Zhou
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present FewRel 2.0, a more challenging task to investigate two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances? (2) Can they detect none-of-the-above (NOTA) relations? To construct FewRel 2.0, we build upon the FewRel dataset by adding a new test set in a quite different domain, and a NOTA relation choice. With the new dataset and extensive experimental analysis, we found (1) that the state-of-the-art few-shot relation classification models struggle on these two aspects, and (2) that the commonly-used techniques for domain adaptation and NOTA detection still cannot handle the two challenges well. Our research calls for more attention and further efforts to these two real-world issues. All details and resources about the dataset and baselines are released at https://github.com/thunlp/fewrel.

Incremental Transformer with Deliberation Decoder for Document Grounded Conversations
Zekang Li | Cheng Niu | Fandong Meng | Yang Feng | Qian Li | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Document Grounded Conversations is the task of generating dialogue responses when chatting about the content of a given document. Document knowledge obviously plays a critical role in this task, yet existing dialogue models do not exploit this kind of knowledge effectively enough. In this paper, we propose a novel Transformer-based architecture for multi-turn document-grounded conversations. In particular, we devise an Incremental Transformer to encode multi-turn utterances along with knowledge in related documents. Motivated by the human cognitive process, we design a two-pass decoder (Deliberation Decoder) to improve context coherence and knowledge correctness. Our empirical study on a real-world Document Grounded Dataset shows that responses generated by our model significantly outperform competitive baselines on both context coherence and knowledge relevance.

Improving Multi-turn Dialogue Modelling with Utterance ReWriter
Hui Su | Xiaoyu Shen | Rongzhi Zhang | Fei Sun | Pengwei Hu | Cheng Niu | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recent research has achieved impressive results in single-turn dialogue modelling. In the multi-turn setting, however, current models are still far from satisfactory. One major challenge is the frequently occurring coreference and information omission in daily conversation, which makes it hard for machines to understand the real intention. In this paper, we propose rewriting the human utterance as a pre-processing step to help multi-turn dialogue modelling. Each utterance is first rewritten to recover all coreferred and omitted information, and the subsequent processing steps are then performed on the rewritten utterance. To properly train the utterance rewriter, we collect a new dataset with human annotations and introduce a Transformer-based utterance rewriting architecture using a pointer network. We show the proposed architecture achieves remarkably good performance on the utterance rewriting task. The trained utterance rewriter can be easily integrated into online chatbots and brings general improvement across different domains.

DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Yuan Yao | Deming Ye | Peng Li | Xu Han | Yankai Lin | Zhenghao Liu | Zhiyuan Liu | Lixin Huang | Jie Zhou | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multiple entities in a document generally exhibit complex inter-sentence relations, and cannot be well handled by existing relation extraction (RE) methods that typically focus on extracting intra-sentence relations for single entity pairs. In order to accelerate the research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features: (1) DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text; (2) DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document; (3) along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios. In order to verify the challenges of document-level RE, we implement recent state-of-the-art methods for RE and conduct a thorough evaluation of these methods on DocRED. Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts. Based on the detailed analysis on the experiments, we discuss multiple promising directions for future research. We make DocRED and the code for our baselines publicly available at https://github.com/thunlp/DocRED.

GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification
Jie Zhou | Xu Han | Cheng Yang | Zhiyuan Liu | Lifeng Wang | Changcheng Li | Maosong Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Fact verification (FV) is a challenging task which requires retrieving relevant evidence from plain text and using the evidence to verify given claims. Many claims require simultaneously integrating and reasoning over several pieces of evidence for verification. However, previous work employs simple models that extract information from evidence without letting the pieces of evidence communicate with each other, e.g., by merely concatenating the evidence for processing. Such methods are therefore unable to grasp sufficient relational and logical information among the evidence. To alleviate this issue, we propose a graph-based evidence aggregating and reasoning (GEAR) framework which enables information to propagate over a fully-connected evidence graph and then utilizes different aggregators to collect multi-evidence information. We further employ BERT, an effective pre-trained language representation model, to improve performance. Experimental results on the large-scale benchmark dataset FEVER demonstrate that GEAR can leverage multi-evidence information for FV, achieving a promising test FEVER score of 67.10%. Our code is available at https://github.com/thunlp/GEAR.
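
One of GEAR's aggregators can be pictured as attention pooling over the per-evidence vectors produced by propagation on the fully-connected evidence graph; the sketch below simplifies the details:

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Collect per-evidence vectors into one claim-level vector with a
    learned attention score. A simplified sketch of an attention-style
    aggregator, not GEAR's exact parameterization.
    """
    def __init__(self, dim=768):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, evidence_reprs):  # (num_evidence, dim)
        # Softmax over the evidence pieces gives each a pooling weight.
        weights = torch.softmax(self.score(evidence_reprs), dim=0)  # (n, 1)
        return (weights * evidence_reprs).sum(dim=0)  # (dim,)
```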

Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation
Zhiqiang Liu | Zuohui Fu | Jie Cao | Gerard de Melo | Yik-Cheung Tam | Cheng Niu | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Rhetoric is a vital element in modern poetry, and plays an essential role in improving its aesthetics. However, to date, it has not been considered in research on automatic poetry generation. In this paper, we propose a rhetorically controlled encoder-decoder for modern Chinese poetry generation. Our model relies on a continuous latent variable as a rhetoric controller to capture various rhetorical patterns in an encoder, and then incorporates rhetoric-based mixtures while generating modern Chinese poetry. For metaphor and personification, an automated evaluation shows that our model outperforms state-of-the-art baselines by a substantial margin, while human evaluation shows that our model generates better poems than baseline methods in terms of fluency, coherence, meaningfulness, and rhetorical aesthetics.

Towards Fine-grained Text Sentiment Transfer
Fuli Luo | Peng Li | Pengcheng Yang | Jie Zhou | Yutong Tan | Baobao Chang | Zhifang Sui | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we focus on the task of fine-grained text sentiment transfer (FGST). This task aims to revise an input sequence to satisfy a given sentiment intensity while preserving the original semantic content. Unlike the conventional sentiment transfer task that only reverses the sentiment polarity (positive/negative) of text, the FGST task requires more nuanced and fine-grained control of sentiment. To this end, we propose a novel Seq2SentiSeq model. Specifically, a numeric sentiment intensity value is incorporated into the decoder via a Gaussian kernel layer to finely control the sentiment intensity of the output. Moreover, to tackle the lack of parallel data, we propose a cycle reinforcement learning algorithm to guide model training. In this framework, the elaborately designed rewards balance both sentiment transformation and content preservation, while not requiring any ground-truth output. Experimental results show that our approach outperforms existing methods by a large margin in both automatic and human evaluation.
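
The Gaussian kernel layer can be pictured as mapping the scalar intensity onto a set of anchor intensities, producing a soft embedding for the decoder. The sizes and the learnability of the anchors below are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class GaussianKernelLayer(nn.Module):
    """Turn a scalar sentiment intensity into a soft vector by measuring
    its Gaussian similarity to a set of learned anchor intensities.
    """
    def __init__(self, num_anchors=10, sigma=0.1):
        super().__init__()
        self.anchors = nn.Parameter(torch.linspace(0.0, 1.0, num_anchors))
        self.sigma = sigma

    def forward(self, intensity):  # (batch,) values in [0, 1]
        # Distance of the target intensity to each anchor ...
        diff = intensity.unsqueeze(-1) - self.anchors  # (batch, num_anchors)
        # ... mapped through a Gaussian kernel to a soft activation.
        return torch.exp(-diff ** 2 / (2 * self.sigma ** 2))
```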

Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation
Shuming Ma | Pengcheng Yang | Tianyu Liu | Peng Li | Jie Zhou | Xu Sun
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Table-to-text generation aims to translate structured data into unstructured text. Most existing methods adopt the encoder-decoder framework to learn the transformation, which requires large-scale training samples. However, the lack of large parallel data is a major practical problem for many domains. In this work, we consider the scenario of low-resource table-to-text generation, where only limited parallel data is available. We propose a novel model that separates generation into two stages: key fact prediction and surface realization. It first predicts the key facts from the tables, and then generates the text from the key facts. Training the key fact predictor needs much less annotated data, while the surface realization stage can be trained with a pseudo-parallel corpus. We evaluate our model on a biography generation dataset. Our model achieves a BLEU score of 27.34 with only 1,000 parallel examples, while the baseline model only obtains 9.71 BLEU.

GCDT: A Global Context Enhanced Deep Transition Architecture for Sequence Labeling
Yijin Liu | Fandong Meng | Jinchao Zhang | Jinan Xu | Yufeng Chen | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Current state-of-the-art systems for sequence labeling are typically based on the family of Recurrent Neural Networks (RNNs). However, the shallow connections between consecutive hidden states of RNNs and insufficient modeling of global information restrict the potential performance of those models. In this paper, we try to address these issues and propose a Global Context enhanced Deep Transition architecture for sequence labeling named GCDT. We deepen the state transition path at each position in a sentence, and further assign every token a global representation learned from the entire sentence. Experiments on two standard sequence labeling tasks show that, given only training data and the ubiquitous word embeddings (GloVe), our GCDT achieves 91.96 F1 on the CoNLL03 NER task and 95.43 F1 on the CoNLL2000 Chunking task, outperforming the best reported results under the same settings. Furthermore, by leveraging BERT as an additional resource, we establish new state-of-the-art results with 93.47 F1 on NER and 97.30 F1 on Chunking.

Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation
Chenze Shao | Yang Feng | Jinchao Zhang | Fandong Meng | Xilin Chen | Jie Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Non-Autoregressive Transformer (NAT) models aim to accelerate the Transformer by discarding the autoregressive mechanism and generating target words independently, which fails to exploit the target-side sequential information. Over-translation and under-translation errors therefore often occur, especially in long-sentence translation. In this paper, we propose two approaches to retrieve the target sequential information for NAT so as to enhance its translation ability while preserving the fast-decoding property. First, we propose a sequence-level training method based on a novel reinforcement algorithm for NAT (Reinforce-NAT) to reduce the variance and stabilize the training procedure. Second, we propose an innovative Transformer decoder named FS-decoder to fuse the target sequential information into the top layer of the decoder. Experimental results on three translation tasks show that Reinforce-NAT surpasses the baseline NAT system by a significant margin in BLEU without decelerating decoding, and that the FS-decoder achieves translation performance comparable to the autoregressive Transformer with considerable speedup.

2017

Deep Neural Machine Translation with Linear Associative Unit
Mingxuan Wang | Zhengdong Lu | Jie Zhou | Qun Liu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art in Neural Machine Translation (NMT) with their capability of modeling complex functions and capturing complex linguistic structures. However, NMT systems with deep architectures in their encoder or decoder RNNs often suffer from severe gradient diffusion due to the non-linear recurrent activations, which makes optimization much more difficult. To address this problem, we propose novel linear associative units (LAUs) to reduce the gradient propagation path inside the recurrent unit. Unlike conventional approaches (the LSTM unit and GRU), LAUs use linear associative connections between the input and output of the recurrent unit, which allow unimpeded information flow through both space and time. The model is quite simple, yet surprisingly effective. Our empirical study on Chinese-English translation shows that, with a proper configuration, our model improves upon Groundhog by 11.7 BLEU and surpasses the best reported results in the same setting. On the WMT14 English-German task and the larger WMT14 English-French task, our model achieves results comparable to the state-of-the-art.
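
A cell along the lines described, with a gated linear path from input to output alongside the usual non-linear candidate, might look as follows; the gate and weight layout are illustrative assumptions, not the paper's exact equations:

```python
import torch
import torch.nn as nn

class LAUCell(nn.Module):
    """Sketch of a linear associative unit: a gated *linear* transform of
    the input is carried straight to the output next to the non-linear
    recurrent candidate, shortening the gradient path.
    """
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.linear_x = nn.Linear(input_size, hidden_size)  # linear path
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):  # x: (batch, input), h: (batch, hidden)
        z = torch.sigmoid(self.gate(torch.cat([x, h], dim=-1)))
        h_tilde = torch.tanh(self.candidate(torch.cat([x, h], dim=-1)))
        # Mix the non-linear candidate with the unimpeded linear path.
        return z * h_tilde + (1 - z) * self.linear_x(x)
```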

Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation
Jinchao Zhang | Mingxuan Wang | Qun Liu | Jie Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper proposes three distortion models that explicitly incorporate word reordering knowledge into attention-based Neural Machine Translation (NMT) to further improve translation performance. Our proposed models enable the attention mechanism to attend to source words with regard to both the semantic requirement and the word reordering penalty. Experiments on Chinese-English translation show that the approaches improve word alignment quality and achieve significant translation improvements over a basic attention-based NMT system. Compared with previous works on identical corpora, our system achieves state-of-the-art translation quality.

2016

Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
Jie Zhou | Ying Cao | Xuguang Wang | Peng Li | Wei Xu
Transactions of the Association for Computational Linguistics, Volume 4

Neural machine translation (NMT) aims at solving machine translation (MT) problems using neural networks and has exhibited promising results in recent years. However, most of the existing NMT models are shallow and there is still a performance gap between a single NMT model and the best conventional MT system. In this work, we introduce a new type of linear connections, named fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks, and an interleaved bi-directional architecture for stacking the LSTM layers. Fast-forward connections play an essential role in propagating the gradients and building a deep topology of depth 16. On the WMT’14 English-to-French task, we achieve BLEU=37.7 with a single attention model, which outperforms the corresponding single shallow model by 6.2 BLEU points. This is the first time that a single NMT model achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points. We can still achieve BLEU=36.3 even without using an attention mechanism. After special handling of unknown words and model ensembling, we obtain the best score reported to date on this task with BLEU=40.4. Our models are also validated on the more difficult WMT’14 English-to-German task.
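
The fast-forward idea can be sketched as a stack of LSTM layers in which a linear transform of each layer's input is carried past the LSTM and concatenated with its output; this single-direction sketch omits the interleaved bi-directional arrangement described above:

```python
import torch
import torch.nn as nn

class FastForwardStack(nn.Module):
    """Deep LSTM stack where each layer's input also reaches the next
    layer through a linear ("fast-forward") path, helping gradients
    traverse a deep topology. A simplified sketch only.
    """
    def __init__(self, input_size, hidden_size, depth=4):
        super().__init__()
        self.layers = nn.ModuleList()
        self.ff = nn.ModuleList()
        size = input_size
        for _ in range(depth):
            self.layers.append(nn.LSTM(size, hidden_size, batch_first=True))
            self.ff.append(nn.Linear(size, hidden_size, bias=False))
            size = hidden_size * 2  # LSTM output + fast-forward path

    def forward(self, x):  # x: (batch, seq, input_size)
        for lstm, ff in zip(self.layers, self.ff):
            out, _ = lstm(x)
            x = torch.cat([out, ff(x)], dim=-1)  # linear path skips the LSTM
        return x
```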

2015

End-to-end learning of semantic role labeling using recurrent neural networks
Jie Zhou | Wei Xu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2013

Generalization of Words for Chinese Dependency Parsing
Xianchao Wu | Jie Zhou | Yu Sun | Zhanyi Liu | Dianhai Yu | Hua Wu | Haifeng Wang
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)
