Mo Yu


2020

pdf bib
MCMH: Learning Multi-Chain Multi-Hop Rules for Knowledge Graph Reasoning
Lu Zhang | Mo Yu | Tian Gao | Yue Yu
Findings of the Association for Computational Linguistics: EMNLP 2020

Multi-hop reasoning approaches over knowledge graphs infer a missing relationship between entities with a multi-hop rule, which corresponds to a chain of relationships. We extend existing works to consider a generalized form of multi-hop rules, where each rule is a set of relation chains. To learn such generalized rules efficiently, we propose a two-step approach that first selects a small set of relation chains as a rule and then evaluates the confidence of the target relationship by jointly scoring the selected chains. A game-theoretical framework is proposed to this end to simultaneously optimize the rule selection and prediction steps. Empirical results show that our multi-chain multi-hop (MCMH) rules result in superior results compared to the standard single-chain approaches, justifying both our formulation of generalized rules and the effectiveness of the proposed learning framework.

pdf bib
Frustratingly Hard Evidence Retrieval for QA Over Books
Xiangyang Mou | Mo Yu | Bingsheng Yao | Chenghao Yang | Xiaoxiao Guo | Saloni Potdar | Hui Su
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

A lot of progress has been made to improve question answering (QA) in recent years, but the special problem of QA over narrative book stories has not been explored in-depth. We formulate BookQA as an open-domain QA task given its similar dependency on evidence retrieval. We further investigate how state-of-the-art open-domain QA approaches can help BookQA. Besides achieving state-of-the-art on the NarrativeQA benchmark, our study also reveals the difficulty of evidence retrieval in books with a wealth of experiments and analysis - which necessitates future effort on novel solutions for evidence retrieval in BookQA.

pdf bib
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning
Xiaoxiao Guo | Mo Yu | Yupeng Gao | Chuang Gan | Murray Campbell | Shiyu Chang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Interactive Fiction (IF) games with real human-written natural language texts provide a new natural evaluation for language understanding techniques. In contrast to previous text games with mostly synthetic texts, IF games pose language understanding challenges on the human-written textual descriptions of diverse and sophisticated game worlds and language generation challenges on the action command generation from less restricted combinatorial space. We take a novel perspective of IF game solving and re-formulate it as Multi-Passage Reading Comprehension (MPRC) tasks. Our approaches utilize the context-query attention mechanisms and the structured prediction in MPRC to efficiently generate and evaluate action outputs and apply an object-centric historical observation retrieval strategy to mitigate the partial observability of the textual observations. Extensive experiments on the recent IF benchmark (Jericho) demonstrate clear advantages of our approaches achieving high winning rates and low data requirements compared to all previous approaches.

2019

pdf bib
Leveraging Dependency Forest for Neural Medical Relation Extraction
Linfeng Song | Yue Zhang | Daniel Gildea | Mo Yu | Zhiguo Wang | Jinsong Su
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Medical relation extraction discovers relations between entity mentions in text, such as research articles. For this task, dependency syntax has been recognized as a crucial source of features. Yet in the medical domain, 1-best parse trees suffer from relatively low accuracies, diminishing their usefulness. We investigate a method to alleviate this problem by utilizing dependency forests. Forests contain more than one possible decisions and therefore have higher recall but more noise compared with 1-best outputs. A graph neural network is used to represent the forests, automatically distinguishing the useful syntactic information from parsing noise. Results on two benchmarks show that our method outperforms the standard tree-based methods, giving the state-of-the-art results in the literature.

pdf bib
Out-of-Domain Detection for Low-Resource Text Classification Tasks
Ming Tan | Yang Yu | Haoyu Wang | Dakuo Wang | Saloni Potdar | Shiyu Chang | Mo Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Out-of-domain (OOD) detection for low-resource text classification is a realistic but understudied task. The goal is to detect the OOD cases with limited in-domain (ID) training data, since in machine learning applications we observe that training data is often insufficient. In this work, we propose an OOD-resistant Prototypical Network to tackle this zero-shot OOD detection and few-shot ID classification task. Evaluations on real-world datasets show that the proposed solution outperforms state-of-the-art methods in zero-shot OOD detection task, while maintaining a competitive performance on ID classification task.

pdf bib
Rethinking Cooperative Rationalization: Introspective Extraction and Complement Control
Mo Yu | Shiyu Chang | Yang Zhang | Tommi Jaakkola
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Selective rationalization has become a common mechanism to ensure that predictive models reveal how they use any available features. The selection may be soft or hard, and identifies a subset of input features relevant for prediction. The setup can be viewed as a co-operate game between the selector (aka rationale generator) and the predictor making use of only the selected features. The co-operative setting may, however, be compromised for two reasons. First, the generator typically has no direct access to the outcome it aims to justify, resulting in poor performance. Second, there’s typically no control exerted on the information left outside the selection. We revise the overall co-operative framework to address these challenges. We introduce an introspective model which explicitly predicts and incorporates the outcome into the selection process. Moreover, we explicitly control the rationale complement via an adversary so as not to leave any useful information out of the selection. We show that the two complementary mechanisms maintain both high predictive accuracy and lead to comprehensive rationales.

pdf bib
Context-Aware Conversation Thread Detection in Multi-Party Chat
Ming Tan | Dakuo Wang | Yupeng Gao | Haoyu Wang | Saloni Potdar | Xiaoxiao Guo | Shiyu Chang | Mo Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In multi-party chat, it is common for multiple conversations to occur concurrently, leading to intermingled conversation threads in chat logs. In this work, we propose a novel Context-Aware Thread Detection (CATD) model that automatically disentangles these conversation threads. We evaluate our model on four real-world datasets and demonstrate an overall im-provement in thread detection accuracy over state-of-the-art benchmarks.

pdf bib
Simple yet Effective Bridge Reasoning for Open-Domain Multi-Hop Question Answering
Wenhan Xiong | Mo Yu | Xiaoxiao Guo | Hong Wang | Shiyu Chang | Murray Campbell | William Yang Wang
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

A key challenge of multi-hop question answering (QA) in the open-domain setting is to accurately retrieve the supporting passages from a large corpus. Existing work on open-domain QA typically relies on off-the-shelf information retrieval (IR) techniques to retrieve answer passages, i.e., the passages containing the groundtruth answers. However, IR-based approaches are insufficient for multi-hop questions, as the topic of the second or further hops is not explicitly covered by the question. To resolve this issue, we introduce a new subproblem of open-domain multi-hop QA, which aims to recognize the bridge (i.e., the anchor that links to the answer passage) from the context of a set of start passages with a reading comprehension model. This model, the bridge reasoner, is trained with a weakly supervised signal and produces the candidate answer passages for the passage reader to extract the answer. On the full-wiki HotpotQA benchmark, we significantly improve the baseline method by 14 point F1. Without using any memory inefficient contextual embeddings, our result is also competitive with the state-of-the-art that applies BERT in multiple modules.

pdf bib
Do Multi-hop Readers Dream of Reasoning Chains?
Haoyu Wang | Mo Yu | Xiaoxiao Guo | Rajarshi Das | Wenhan Xiong | Tian Gao
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

General Question Answering (QA) systems over texts require the multi-hop reasoning capability, i.e. the ability to reason with information collected from multiple passages to derive the answer. In this paper we conduct a systematic analysis to assess such an ability of various existing models proposed for multi-hop QA tasks. Specifically, our analysis investigates that whether providing the full reasoning chain of multiple passages, instead of just one final passage where the answer appears, could improve the performance of the existing QA models. Surprisingly, when using the additional evidence passages, the improvements of all the existing multi-hop reading approaches are rather limited, with the highest error reduction of 5.8% on F1 (corresponding to 1.3% improvement) from the BERT model. To better understand whether the reasoning chains indeed could help find the correct answers, we further develop a co-matching-based method that leads to 13.1% error reduction with passage chains when applied to two of our base readers (including BERT). Our results demonstrate the existence of the potential improvement using explicit multi-hop reasoning and the necessity to develop models with better reasoning abilities.

pdf bib
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering
Rajarshi Das | Ameya Godbole | Dilip Kavarthapu | Zhiyu Gong | Abhishek Singhal | Mo Yu | Xiaoxiao Guo | Tian Gao | Hamed Zamani | Manzil Zaheer | Andrew McCallum
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Multi-hop question answering (QA) requires an information retrieval (IR) system that can find multiple supporting evidence needed to answer the question, making the retrieval process very challenging. This paper introduces an IR technique that uses information of entities present in the initially retrieved evidence to learn to ‘hop’ to other relevant evidence. In a setting, with more than 5 million Wikipedia paragraphs, our approach leads to significant boost in retrieval performance. The retrieved evidence also increased the performance of an existing QA model (without any training) on the benchmark by 10.59 F1.

pdf bib
Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers
Haoyu Wang | Ming Tan | Mo Yu | Shiyu Chang | Dakuo Wang | Kun Xu | Xiaoxiao Guo | Saloni Potdar
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Many approaches to extract multiple relations from a paragraph require multiple passes over the paragraph. In practice, multiple passes are computationally expensive and this makes difficult to scale to longer paragraphs and larger text corpora. In this work, we focus on the task of multiple relation extractions by encoding the paragraph only once. We build our solution upon the pre-trained self-attentive models (Transformer), where we first add a structured prediction layer to handle extraction between multiple entity pairs, then enhance the paragraph embedding to capture multiple relational information associated with each entity with entity-aware attention. We show that our approach is not only scalable but can also perform state-of-the-art on the standard benchmark ACE 2005.

pdf bib
Self-Supervised Learning for Contextualized Extractive Summarization
Hong Wang | Xin Wang | Wenhan Xiong | Mo Yu | Xiaoxiao Guo | Shiyu Chang | William Yang Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Existing models for extractive summarization are usually trained from scratch with a cross-entropy loss, which does not explicitly capture the global context at the document level. In this paper, we aim to improve this task by introducing three auxiliary pre-training tasks that learn to capture the document-level context in a self-supervised fashion. Experiments on the widely-used CNN/DM dataset validate the effectiveness of the proposed auxiliary tasks. Furthermore, we show that after pre-training, a clean model with simple building blocks is able to outperform previous state-of-the-art that are carefully designed.

pdf bib
Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network
Kun Xu | Liwei Wang | Mo Yu | Yansong Feng | Yan Song | Zhiguo Wang | Dong Yu
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in two KGs. In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities with their contextual information in KG. From this view, the KB-alignment task can be formulated as a graph matching problem; and we further propose a graph-attention based solution, which first matches all entities in two topic entity graphs, and then jointly model the local matching information to derive a graph-level matching vector. Experiments show that our model outperforms previous state-of-the-art methods by a large margin.

pdf bib
Improving Question Answering over Incomplete KBs with Knowledge-Aware Reader
Wenhan Xiong | Mo Yu | Shiyu Chang | Xiaoxiao Guo | William Yang Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a new end-to-end question answering model, which learns to aggregate answer evidence from an incomplete knowledge base (KB) and a set of retrieved text snippets.Under the assumptions that structured data is easier to query and the acquired knowledge can help the understanding of unstructured text, our model first accumulates knowledge ofKB entities from a question-related KB sub-graph; then reformulates the question in the latent space and reads the text with the accumulated entity knowledge at hand. The evidence from KB and text are finally aggregated to predict answers. On the widely-used KBQA benchmark WebQSP, our model achieves consistent improvements across settings with different extents of KB incompleteness.

pdf bib
Selection Bias Explorations and Debias Methods for Natural Language Sentence Matching Datasets
Guanhua Zhang | Bing Bai | Jian Liang | Kun Bai | Shiyu Chang | Mo Yu | Conghui Zhu | Tiejun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Natural Language Sentence Matching (NLSM) has gained substantial attention from both academics and the industry, and rich public datasets contribute a lot to this process. However, biased datasets can also hurt the generalization performance of trained models and give untrustworthy evaluation results. For many NLSM datasets, the providers select some pairs of sentences into the datasets, and this sampling procedure can easily bring unintended pattern, i.e., selection bias. One example is the QuoraQP dataset, where some content-independent naive features are unreasonably predictive. Such features are the reflection of the selection bias and termed as the “leakage features.” In this paper, we investigate the problem of selection bias on six NLSM datasets and find that four out of them are significantly biased. We further propose a training and evaluation framework to alleviate the bias. Experimental results on QuoraQP suggest that the proposed framework can improve the generalization ability of trained models, and give more trustworthy evaluation results for real-world adoptions.

pdf bib
TWEETQA: A Social Media Focused Question Answering Dataset
Wenhan Xiong | Jiawei Wu | Hong Wang | Vivek Kulkarni | Mo Yu | Shiyu Chang | Xiaoxiao Guo | William Yang Wang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

With social media becoming increasingly popular on which lots of news and real-time events are reported, developing automated question answering systems is critical to the effective-ness of many applications that rely on real-time knowledge. While previous datasets have concentrated on question answering (QA) for formal text like news and Wikipedia, we present the first large-scale dataset for QA over social media data. To ensure that the tweets we collected are useful, we only gather tweets used by journalists to write news articles. We then ask human annotators to write questions and answers upon these tweets. Unlike otherQA datasets like SQuAD in which the answers are extractive, we allow the answers to be abstractive. We show that two recently proposed neural models that perform well on formal texts are limited in their performance when applied to our dataset. In addition, even the fine-tuned BERT model is still lagging behind human performance with a large margin. Our results thus point to the need of improved QA systems targeting social media text.

pdf bib
Multi-Granular Text Encoding for Self-Explaining Categorization
Zhiguo Wang | Yue Zhang | Mo Yu | Wei Zhang | Lin Pan | Linfeng Song | Kun Xu | Yousef El-Kurdi
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Self-explaining text categorization requires a classifier to make a prediction along with supporting evidence. A popular type of evidence is sub-sequences extracted from the input text which are sufficient for the classifier to make the prediction. In this work, we define multi-granular ngrams as basic units for explanation, and organize all ngrams into a hierarchical structure, so that shorter ngrams can be reused while computing longer ngrams. We leverage the tree-structured LSTM to learn a context-independent representation for each unit via parameter sharing. Experiments on medical disease classification show that our model is more accurate, efficient and compact than the BiLSTM and CNN baselines. More importantly, our model can extract intuitive multi-granular evidence to support its predictions.

pdf bib
NE-Table: A Neural key-value table for Named Entities
Janarthanan Rajendran | Jatin Ganhotra | Xiaoxiao Guo | Mo Yu | Satinder Singh | Lazaros Polymenakos
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While this is easy for humans, the present neural methods that rely on learned word embeddings may not perform well for these NLP tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. In this paper, we propose a solution for this problem, and present empirical evaluations on: a) a structured Question-Answering task, b) three related Goal-Oriented dialog tasks, and c) a Reading-Comprehension task, which show that the proposed method can be effective in dealing with both in-vocabulary and OOV NEs. We create extended versions of dialog bAbI tasks 1,2 and 4 and OOV versions of the CBT test set which are available at - https://github.com/IBM/ne-table-datasets/

pdf bib
Imposing Label-Relational Inductive Bias for Extremely Fine-Grained Entity Typing
Wenhan Xiong | Jiawei Wu | Deren Lei | Mo Yu | Shiyu Chang | Xiaoxiao Guo | William Yang Wang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Existing entity typing systems usually exploit the type hierarchy provided by knowledge base (KB) schema to model label correlations and thus improve the overall performance. Such techniques, however, are not directly applicable to more open and practical scenarios where the type set is not restricted by KB schema and includes a vast number of free-form types. To model the underlying label correlations without access to manually annotated label structures, we introduce a novel label-relational inductive bias, represented by a graph propagation layer that effectively encodes both global label co-occurrence statistics and word-level similarities. On a large dataset with over 10,000 free-form types, the graph-enhanced model equipped with an attention-based matching module is able to achieve a much higher recall score while maintaining a high-level precision. Specifically, it achieves a 15.3% relative F1 improvement and also less inconsistency in the outputs. We further show that a simple modification of our proposed graph layer can also improve the performance on a conventional and widely-tested dataset that only includes KB-schema types.

pdf bib
Sentence Embedding Alignment for Lifelong Relation Extraction
Hong Wang | Wenhan Xiong | Mo Yu | Xiaoxiao Guo | Shiyu Chang | William Yang Wang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Conventional approaches to relation extraction usually require a fixed set of pre-defined relations. Such requirement is hard to meet in many real applications, especially when new data and relations are emerging incessantly and it is computationally expensive to store all data and re-train the whole model every time new data and relations come in. We formulate such challenging problem as lifelong relation extraction and investigate memory-efficient incremental learning methods without catastrophically forgetting knowledge learned from previous tasks. We first investigate a modified version of the stochastic gradient methods with a replay memory, which surprisingly outperforms recent state-of-the-art lifelong learning methods. We further propose to improve this approach to alleviate the forgetting problem by anchoring the sentence embedding space. Specifically, we utilize an explicit alignment model to mitigate the sentence embedding distortion of learned model when training on new data and new relations. Experiment results on multiple benchmarks show that our proposed method significantly outperforms the state-of-the-art lifelong learning approaches.

2018

pdf bib
Diverse Few-Shot Text Classification with Multiple Metrics
Mo Yu | Xiaoxiao Guo | Jinfeng Yi | Shiyu Chang | Saloni Potdar | Yu Cheng | Gerald Tesauro | Haoyu Wang | Bowen Zhou
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We study few-shot learning in natural language domains. Compared to many existing works that apply either metric-based or optimization-based meta-learning to image domain with low inter-task variance, we consider a more realistic setting, where tasks are diverse. However, it imposes tremendous difficulties to existing state-of-the-art metric-based algorithms since a single metric is insufficient to capture complex task variations in natural language domain. To alleviate the problem, we propose an adaptive metric learning approach that automatically determines the best weighted combination from a set of metrics obtained from meta-training tasks for a newly seen few-shot task. Extensive quantitative evaluations on real-world sentiment analysis and dialog intent classification datasets demonstrate that the proposed method performs favorably against state-of-the-art few shot learning algorithms in terms of predictive accuracy. We make our code and data available for further study.

pdf bib
Improving Reinforcement Learning Based Image Captioning with Natural Language Prior
Tszhang Guo | Shiyu Chang | Mo Yu | Kun Bai
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recently, Reinforcement Learning (RL) approaches have demonstrated advanced performance in image captioning by directly optimizing the metric used for testing. However, this shaped reward introduces learning biases, which reduces the readability of generated text. In addition, the large sample space makes training unstable and slow.To alleviate these issues, we propose a simple coherent solution that constrains the action space using an n-gram language prior. Quantitative and qualitative evaluations on benchmarks show that RL with the simple add-on module performs favorably against its counterpart in terms of both readability and speed of convergence. Human evaluation results show that our model is more human readable and graceful. The implementation will become publicly available upon the acceptance of the paper.

pdf bib
Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model
Kun Xu | Lingfei Wu | Zhiguo Wang | Mo Yu | Liwei Chen | Vadim Sheinin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Existing neural semantic parsers mainly utilize a sequence encoder, i.e., a sequential LSTM, to extract word order features while neglecting other valuable syntactic information such as dependency or constituent trees. In this paper, we first propose to use the syntactic graph to represent three types of syntactic information, i.e., word order, dependency and constituency features; then employ a graph-to-sequence model to encode the syntactic graph and decode a logical form. Experimental results on benchmark datasets show that our model is comparable to the state-of-the-art on Jobs640, ATIS, and Geo880. Experimental results on adversarial examples demonstrate the robustness of the model is also improved by encoding more syntactic information.

pdf bib
Deriving Machine Attention from Human Rationales
Yujia Bao | Shiyu Chang | Mo Yu | Regina Barzilay
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Attention-based models are successful when trained on large amounts of data. In this paper, we demonstrate that even in the low-resource scenario, attention can be learned effectively. To this end, we start with discrete human-annotated rationales and map them into continuous attention. Our central hypothesis is that this mapping is general across domains, and thus can be transferred from resource-rich domains to low-resource ones. Our model jointly learns a domain-invariant representation and induces the desired mapping between rationales and attention. Our empirical results validate this hypothesis and show that our approach delivers significant gains over state-of-the-art baselines, yielding over 15% average error reduction on benchmark datasets.

pdf bib
One-Shot Relational Learning for Knowledge Graphs
Wenhan Xiong | Mo Yu | Shiyu Chang | Xiaoxiao Guo | William Yang Wang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Knowledge graphs (KG) are the key components of various natural language processing applications. To further expand KGs’ coverage, previous studies on knowledge graph completion usually require a large number of positive examples for each relation. However, we observe long-tail relations are actually more common in KGs and those newly added relations often do not have many known triples for training. In this work, we aim at predicting new facts under a challenging setting where only one training instance is available. We propose a one-shot relational learning framework, which utilizes the knowledge distilled by embedding models and learns a matching metric by considering both the learned embeddings and one-hop graph structures. Empirically, our model yields considerable performance improvements over existing embedding models, and also eliminates the need of re-training the embedding models when dealing with newly added relations.

pdf bib
A Co-Matching Model for Multi-choice Reading Comprehension
Shuohang Wang | Mo Yu | Jing Jiang | Shiyu Chang
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Multi-choice reading comprehension is a challenging task, which involves the matching between a passage and a question-answer pair. This paper proposes a new co-matching approach to this problem, which jointly models whether a passage can match both a question and a candidate answer. Experimental results on the RACE dataset demonstrate that our approach achieves state-of-the-art performance.

2017

pdf bib
Improved Neural Relation Detection for Knowledge Base Question Answering
Mo Yu | Wenpeng Yin | Kazi Saidul Hasan | Cicero dos Santos | Bing Xiang | Bowen Zhou
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Relation detection is a core component of many NLP applications including Knowledge Base Question Answering (KBQA). In this paper, we propose a hierarchical recurrent neural network enhanced by residual learning which detects KB relations given an input question. Our method uses deep residual bidirectional LSTMs to compare questions and relation names via different levels of abstraction. Additionally, we propose a simple KBQA system that integrates entity linking and our proposed relation detector to make the two components enhance each other. Our experimental results show that our approach not only achieves outstanding relation detection performance, but more importantly, it helps our KBQA system achieve state-of-the-art accuracy for both single-relation (SimpleQuestions) and multi-relation (WebQSP) QA benchmarks.

2016

pdf bib
Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling
Gakuto Kurata | Bing Xiang | Bowen Zhou | Mo Yu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Embedding Lexical Features via Low-Rank Tensors
Mo Yu | Mark Dredze | Raman Arora | Matthew R. Gormley
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Simple Question Answering by Attentive Convolutional Neural Network
Wenpeng Yin | Mo Yu | Bing Xiang | Bowen Zhou | Hinrich Schütze
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This work focuses on answering single-relation factoid questions over Freebase. Each question can acquire the answer from a single fact of form (subject, predicate, object) in Freebase. This task, simple question answering (SimpleQA), can be addressed via a two-step pipeline: entity linking and fact selection. In fact selection, we match the subject entity in a fact candidate with the entity mention in the question by a character-level convolutional neural network (char-CNN), and match the predicate in that fact with the question by a word-level CNN (word-CNN). This work makes two main contributions. (i) A simple and effective entity linker over Freebase is proposed. Our entity linker outperforms the state-of-the-art entity linker over SimpleQA task. (ii) A novel attentive maxpooling is stacked over word-CNN, so that the predicate representation can be matched with the predicate-focused question representation more effectively. Experiments show that our system sets new state-of-the-art in this task.

2015

pdf bib
Improved Relation Extraction with Feature-Rich Compositional Embedding Models
Matthew R. Gormley | Mo Yu | Mark Dredze
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Combining Word Embeddings and Feature Embeddings for Fine-grained Relation Extraction
Mo Yu | Matthew R. Gormley | Mark Dredze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
A Concrete Chinese NLP Pipeline
Nanyun Peng | Francis Ferraro | Mo Yu | Nicholas Andrews | Jay DeYoung | Max Thomas | Matthew R. Gormley | Travis Wolfe | Craig Harman | Benjamin Van Durme | Mark Dredze
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf bib
An Empirical Study of Chinese Name Matching and Applications
Nanyun Peng | Mo Yu | Mark Dredze
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Learning Composition Models for Phrase Embeddings
Mo Yu | Mark Dredze
Transactions of the Association for Computational Linguistics, Volume 3

Lexical embeddings can serve as useful representations for words for a variety of NLP tasks, but learning embeddings for phrases can be challenging. While separate embeddings are learned for each word, this is infeasible for every phrase. We construct phrase embeddings by learning how to compose word embeddings using features that capture phrase structure and context. We propose efficient unsupervised and task-specific learning objectives that scale our model to large datasets. We demonstrate improvements on both language modeling and several phrase semantic similarity tasks with various phrase lengths. We make the implementation of our model and the datasets available for general use.

2014

pdf bib
Improving Lexical Embeddings with Semantic Knowledge
Mo Yu | Mark Dredze
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Cross-lingual Projections between Languages from Different Families
Mo Yu | Tiejun Zhao | Yalong Bai | Hao Tian | Dianhai Yu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Compound Embedding Features for Semi-supervised Learning
Mo Yu | Tiejun Zhao | Daxiang Dong | Hao Tian | Dianhai Yu
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

pdf bib
Locally Training the Log-Linear Model for SMT
Lemao Liu | Hailong Cao | Taro Watanabe | Tiejun Zhao | Mo Yu | Conghui Zhu
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Target-dependent Twitter Sentiment Classification
Long Jiang | Mo Yu | Ming Zhou | Xiaohua Liu | Tiejun Zhao
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies