Dian Yu


Dialogue-Based Relation Extraction
Dian Yu | Kai Sun | Claire Cardie | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue. We further offer DialogRE as a platform for studying cross-sentence RE as most facts span multiple sentences. We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks. Considering the timeliness of communication in a dialogue, we design a new metric to evaluate the performance of RE methods in a conversational setting and investigate the performance of several representative RE methods on DialogRE. Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings. DialogRE is available at https://dataset.org/dialogre/.

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension
Hongyu Gong | Yelong Shen | Dian Yu | Jianshu Chen | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as input a lengthy document and a query and extracts a text span from the document as an answer. State-of-the-art models (e.g., BERT) tend to use a stack of transformer layers that are pre-trained on a large number of unlabeled language corpora to encode the joint contextual information of query and document. However, these transformer models can only take as input a fixed-length text (e.g., 512 tokens). To deal with even longer text inputs, previous approaches usually chunk them into equally-spaced segments and predict answers based on each segment independently without considering the information from other segments. As a result, they may form segments that fail to cover complete answers or retain insufficient contexts around the correct answer required for question answering. Moreover, they are less capable of answering questions that need cross-segment information. We propose to let a model learn to chunk in a more flexible way via reinforcement learning: a model can decide the next segment that it wants to process in either direction. We also apply recurrent mechanisms to enable information to flow across segments. Experiments on three MRC tasks – CoQA, QuAC, and TriviaQA – demonstrate the effectiveness of our proposed recurrent chunking mechanisms: we can obtain segments that are more likely to contain complete answers and at the same time provide sufficient contexts around the ground truth answers for better predictions.

CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu | Hai Hu | Xuanwei Zhang | Lu Li | Chenjie Cao | Yudong Li | Yechen Xu | Kai Sun | Dian Yu | Cong Yu | Yin Tian | Qianqian Dong | Weitang Liu | Bo Shi | Yiming Cui | Junyi Li | Jun Zeng | Rongzhao Wang | Weijian Xie | Yanting Li | Yina Patterson | Zuoyu Tian | Yiwen Zhang | He Zhou | Shaoweihua Liu | Zhe Zhao | Qipeng Zhao | Cong Yue | Xinrui Zhang | Zhengliang Yang | Kyle Richardson | Zhenzhong Lan
Proceedings of the 28th International Conference on Computational Linguistics

The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To help remedy this issue, we introduce the first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark. CLUE is an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text. To establish results on these tasks, we report scores using an exhaustive set of current state-of-the-art pre-trained Chinese models (9 in total). We also introduce a number of supplementary datasets and additional tools to help facilitate further progress on Chinese NLU. Our benchmark is released at https://www.cluebenchmarks.com

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 8

Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and data augmentation based on translated relevant datasets for English on model performance. We expect C3 to present great challenges to existing systems as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C3 can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text. C3 is available at https://dataset.org/c3/.


DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension
Kai Sun | Dian Yu | Jianshu Chen | Dong Yu | Yejin Choi | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 7

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at https://dataset.org/dream/.

Dependency Parsing for Spoken Dialog Systems
Sam Davidson | Dian Yu | Zhou Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Dependency parsing of conversational input can play an important role in language understanding for dialog systems by identifying the relationships between entities extracted from user utterances. Additionally, effective dependency parsing can elucidate differences in language structure and usage for discourse analysis of human-human versus human-machine dialogs. However, models trained on datasets based on news articles and web data do not perform well on spoken human-machine dialog, and currently available annotation schemes do not adapt well to dialog data. Therefore, we propose the Spoken Conversation Universal Dependencies (SCUD) annotation scheme that extends the Universal Dependencies (UD) (Nivre et al., 2016) guidelines to spoken human-machine dialogs. We also provide ConvBank, a conversation dataset between humans and an open-domain conversational dialog system with SCUD annotation. Finally, to demonstrate the utility of the dataset, we train a dependency parser on the ConvBank dataset. We demonstrate that by pre-training a dependency parser on a set of larger public datasets and fine-tuning on ConvBank data, we achieved the best result, 85.05% unlabeled and 77.82% labeled attachment accuracy.

Gunrock: A Social Bot for Complex and Engaging Long Conversations
Dian Yu | Michelle Cohn | Yi Mang Yang | Chun Yen Chen | Weiming Wen | Jiaping Zhang | Mingyang Zhou | Kevin Jesse | Austin Chau | Antara Bhowmick | Shreenath Iyer | Giritheja Sreenivasulu | Sam Davidson | Ashwin Bhandare | Zhou Yu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Gunrock is the winner of the 2018 Amazon Alexa Prize, as evaluated by coherence and engagement from both real users and Amazon-selected expert conversationalists. We focus on understanding complex sentences and having in-depth conversations in open domains. In this paper, we introduce some innovative system designs and related validation analysis. Overall, we found that users produce longer sentences to Gunrock, which are directly related to users’ engagement (e.g., ratings, number of turns). Additionally, users’ backstory queries about Gunrock are positively correlated with user satisfaction. Finally, we found that dialog flows that interleave facts with personal opinions and stories lead to better user satisfaction.

Improving Question Answering with External Knowledge
Xiaoman Pan | Kai Sun | Dian Yu | Jianshu Chen | Heng Ji | Claire Cardie | Dong Yu
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: in comparison to the previous state-of-the-art, we obtain absolute gains in accuracy of up to 8.1%, 13.0%, and 12.8%, respectively. While we observe consistent gains when we introduce knowledge from Wikipedia, we find that employing additional QA training instances is not uniformly helpful: performance degrades when the added instances exhibit a higher level of difficulty than the original training data. As one of the first studies on exploiting unstructured external knowledge for subject-area QA, we hope our methods, observations, and discussion of the exposed limitations may shed light on further developments in the area.

Improving Pre-Trained Multilingual Model with Vocabulary Expansion
Hai Wang | Dian Yu | Kai Sun | Jianshu Chen | Dong Yu
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary size for each language in such a model is relatively small, especially for low-resource languages. This limitation inevitably hinders the performance of these multilingual models on tasks such as sequence labeling, wherein in-depth token-level or sentence-level understanding is essential. In this paper, inspired by previous methods designed for monolingual settings, we investigate two approaches (i.e., joint mapping and mixture mapping) based on the pre-trained multilingual model BERT for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension. Experimental results show that using mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings.

Evidence Sentence Extraction for Machine Reading Comprehension
Hai Wang | Dian Yu | Kai Sun | Jianshu Chen | Dong Yu | David McAllester | Dan Roth
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Remarkable success has been achieved in the last few years on some limited machine reading comprehension (MRC) tasks. However, it is still difficult to interpret the predictions of existing MRC models. In this paper, we focus on extracting evidence sentences that can explain or support the answers of multiple-choice MRC tasks, where the majority of answer options cannot be directly extracted from reference documents. Due to the lack of ground truth evidence sentence labels in most cases, we apply distant supervision to generate imperfect labels and then use them to train an evidence sentence extractor. To denoise the noisy labels, we apply a recently proposed deep probabilistic logic learning framework to incorporate both sentence-level and cross-sentence linguistic indicators for indirect supervision. We feed the extracted evidence sentences into existing MRC models and evaluate the end-to-end performance on three challenging multiple-choice MRC datasets: MultiRC, RACE, and DREAM, achieving comparable or better performance than the same models that take as input the full reference document. To the best of our knowledge, this is the first work extracting evidence sentences for multiple-choice MRC.

Improving Machine Reading Comprehension with General Reading Strategies
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources (just a pre-trained model and a fixed number of training instances), we propose three general strategies aimed at improving non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING, which considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT, which generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8% absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies. We further fine-tune the resulting model on a target MRC task, leading to an absolute improvement of 6.2% in average accuracy over previous state-of-the-art approaches on six representative non-extractive MRC datasets from different domains (i.e., ARC, OpenBookQA, MCTest, SemEval-2018 Task 11, ROCStories, and MultiRC). These results demonstrate the effectiveness of our proposed strategies and the versatility and general applicability of our fine-tuned models that incorporate these strategies. Core code is available at https://github.com/nlpdata/strategy/.

UC Davis at SemEval-2019 Task 1: DAG Semantic Parsing with Attention-based Decoder
Dian Yu | Kenji Sagae
Proceedings of the 13th International Workshop on Semantic Evaluation

We present an encoder-decoder model for semantic parsing with UCCA in SemEval 2019 Task 1. The encoder is a Bi-LSTM and the decoder uses recursive self-attention. The proposed model alleviates challenges and feature engineering in traditional transition-based and graph-based parsers. The resulting parser is simple and proved to be effective on the semantic parsing task.


Open Relation Extraction and Grounding
Dian Yu | Lifu Huang | Heng Ji
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Previous open Relation Extraction (open RE) approaches mainly rely on linguistic patterns and constraints to extract important relational triples from large-scale corpora. However, they lack the ability to cover diverse relation expressions or measure the relative importance of candidate triples within a sentence. It is also challenging to name the relation type of a relational triple merely based on context words, which could limit the usefulness of open RE in downstream applications. We propose a novel importance-based open RE approach by exploiting the global structure of a dependency tree to extract salient triples. We design an unsupervised relation type naming method by grounding relational triples to a large-scale Knowledge Base (KB) schema, leveraging KB triples and weighted context words associated with relational triples. Experiments on the English Slot Filling 2013 dataset demonstrate that our approach achieves an 8.1% higher F-score than state-of-the-art open RE methods.


Unsupervised Person Slot Filling based on Graph Mining
Dian Yu | Heng Ji
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Why Read if You Can Scan? Trigger Scoping Strategy for Biographical Fact Extraction
Dian Yu | Heng Ji | Sujian Li | Chin-Yew Lin
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Detecting Deceptive Groups Using Conversations and Network Analysis
Dian Yu | Yulia Tyshchuk | Heng Ji | William Wallace
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)


The Wisdom of Minority: Unsupervised Slot Filling Validation based on Multi-dimensional Truth-Finding
Dian Yu | Hongzhao Huang | Taylor Cassidy | Heng Ji | Chi Wang | Shi Zhi | Jiawei Han | Clare Voss | Malik Magdon-Ismail
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers


Resolving Entity Morphs in Censored Data
Hongzhao Huang | Zhen Wen | Dian Yu | Heng Ji | Yizhou Sun | Jiawei Han | He Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)