Claire Cardie

Also published as: C. Cardie


2020

Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings
Rishi Bommasani | Kelly Davis | Claire Cardie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Contextualized representations (e.g. ELMo, BERT) have become the default pretrained representations for downstream NLP applications. In some settings, this transition has rendered their static embedding predecessors (e.g. Word2Vec, GloVe) obsolete. As a side-effect, we observe that older interpretability methods for static embeddings — while more diverse and mature than those available for their dynamic counterparts — are underutilized in studying newer contextualized representations. Consequently, we introduce simple and fully general methods for converting from contextualized representations to static lookup-table embeddings which we apply to 5 popular pretrained models and 9 sets of pretrained weights. Our analysis of the resulting static embeddings notably reveals that pooling over many contexts significantly improves representational quality under intrinsic evaluation. Complementary to analyzing representational quality, we consider social biases encoded in pretrained representations with respect to gender, race/ethnicity, and religion and find that bias is encoded disparately across pretrained models and internal layers even for models with the same training data. Concerningly, we find dramatic inconsistencies between social bias estimators for word embeddings.
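
A minimal sketch of the context-pooling idea, assuming a hypothetical `encode` function that returns one vector per token (standing in for a pretrained model such as ELMo or BERT) and assuming mean pooling over contexts; the paper's own pooling choices and subword handling are not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def pool_static_embeddings(
    contexts: Sequence[List[str]],
    encode: Callable[[List[str]], List[Vector]],
) -> Dict[str, Vector]:
    """Mean-pool each word type's contextual vectors over every context it occurs in."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for tokens in contexts:
        vectors = encode(tokens)  # one contextual vector per token
        for word, vec in zip(tokens, vectors):
            if word not in sums:
                sums[word] = [0.0] * len(vec)
            sums[word] = [s + v for s, v in zip(sums[word], vec)]
            counts[word] += 1
    # Divide each running sum by the number of occurrences to get the mean.
    return {word: [s / counts[word] for s in total] for word, total in sums.items()}


def toy_encode(tokens: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a pretrained encoder: [word length, position]."""
    return [[float(len(tok)), float(i)] for i, tok in enumerate(tokens)]


static = pool_static_embeddings([["the", "bank"], ["bank", "closed"]], toy_encode)
print(static["bank"])  # [4.0, 0.5]
```

The resulting lookup table can then be fed to interpretability and bias tools written for static embeddings; pooling over more contexts averages out context-specific variation.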

Dialogue-Based Relation Extraction
Dian Yu | Kai Sun | Claire Cardie | Dong Yu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue. We further offer DialogRE as a platform for studying cross-sentence RE as most facts span multiple sentences. We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks. Considering the timeliness of communication in a dialogue, we design a new metric to evaluate the performance of RE methods in a conversational setting and investigate the performance of several representative RE methods on DialogRE. Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings. DialogRE is available at https://dataset.org/dialogre/.

Document-Level Event Role Filler Extraction using Multi-Granularity Contextualized Encoding
Xinya Du | Claire Cardie
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Few works in the literature of event extraction have gone beyond individual sentences to make extraction decisions. This is problematic when the information needed to recognize an event argument is spread across multiple sentences. We argue that document-level event extraction is a difficult task since it requires a view of a larger context to determine which spans of text correspond to event role fillers. We first investigate how end-to-end neural sequence models (with pre-trained language model representations) perform on document-level role filler extraction, as well as how the length of context captured affects the models’ performance. To dynamically aggregate information captured by neural representations learned at different levels of granularity (e.g., the sentence- and paragraph-level), we propose a novel multi-granularity reader. We evaluate our models on the MUC-4 event extraction dataset, and show that our best system performs substantially better than prior work. We also report findings on the relationship between context length and neural model performance on the task.
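
One common way to aggregate token representations from two granularities is a learned gate; the sketch below assumes a scalar sigmoid gate over the concatenated sentence- and paragraph-level vectors. The gating form and the names `gated_fusion` and `gate_weights` are illustrative assumptions, not a description of the paper's exact reader.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    sent_vec: Vector, para_vec: Vector, gate_weights: Vector, gate_bias: float = 0.0
) -> Vector:
    """Mix sentence- and paragraph-level vectors for one token with a scalar gate."""
    features = sent_vec + para_vec  # list concatenation: [sentence; paragraph]
    g = sigmoid(sum(w * f for w, f in zip(gate_weights, features)) + gate_bias)
    # g near 1 favours the sentence view, g near 0 the paragraph view.
    return [g * s + (1.0 - g) * p for s, p in zip(sent_vec, para_vec)]


# With all-zero gate weights, g = 0.5 and the result is a plain average.
print(gated_fusion([1.0, 3.0], [3.0, 5.0], [0.0, 0.0, 0.0, 0.0]))  # [2.0, 4.0]
```

Because the gate is computed per token from both views, the model can lean on local sentence context for some tokens and on wider paragraph context for others.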

Improving Event Duration Prediction via Time-aware Pre-training
Zonglin Yang | Xinya Du | Alexander Rush | Claire Cardie
Findings of the Association for Computational Linguistics: EMNLP 2020

End-to-end models in NLP rarely encode external world knowledge about length of time. We introduce two effective models for duration prediction, which incorporate external knowledge by reading temporal-related news sentences (time-aware pre-training). Specifically, one model predicts the range/unit in which the duration value falls (R-PRED), and the other predicts the exact duration value (E-PRED). Our best model, E-PRED, substantially outperforms previous work and captures duration information more accurately than R-PRED. We also demonstrate that our models are capable of duration prediction in the unsupervised setting, outperforming the baselines.

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
Faisal Ladhak | Esin Durmus | Claire Cardie | Kathleen McKeown
Findings of the Association for Computational Linguistics: EMNLP 2020

We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems. We extract article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors. We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We further propose a method for direct cross-lingual summarization (i.e., without requiring translation at inference time) by leveraging synthetic data and Neural Machine Translation as a pre-training step. Our method significantly outperforms the baseline approaches, while being more cost efficient during inference.
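
The image-based alignment can be illustrated as a join on a shared image key. This is a minimal sketch assuming each step is a hypothetical `(image identifier, step text)` pair; real WikiHow data would need its own parsing and identifier normalization.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (image identifier, step text); a hypothetical record layout


def align_steps(source: List[Step], target: List[Step]) -> List[Tuple[str, str]]:
    """Pair source- and target-language steps illustrated by the same image."""
    target_by_image: Dict[str, str] = {image: text for image, text in target}
    return [
        (text, target_by_image[image])
        for image, text in source
        if image in target_by_image  # steps without a shared image are dropped
    ]


english = [("img1.jpg", "Boil the water."), ("img2.jpg", "Add the pasta.")]
spanish = [("img2.jpg", "Añada la pasta."), ("img1.jpg", "Hierva el agua.")]
print(align_steps(english, spanish))
```

Keying on images rather than on step order makes the alignment robust to articles whose steps are reordered across language editions.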

SUMSUM@FNS-2020 Shared Task
Siyan Zheng | Anneliese Lu | Claire Cardie
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

This paper describes the SUMSUM systems submitted to the Financial Narrative Summarization Shared Task (FNS-2020). We explore a section-based extractive summarization method tailored to the structure of financial reports: our best system parses the report Table of Contents (ToC), splits the report into narrative sections based on the ToC, and applies a BERT-based classifier to each section to determine whether it should be included in the summary. Our best system ranks 4th, 1st, 2nd and 17th on the Rouge-1, Rouge-2, Rouge-SU4, and Rouge-L official metrics, respectively. We also report results on the validation set using an alternative set of Rouge-based metrics that measure performance with respect to the best-matching of the available gold summaries.
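
The ToC-driven pipeline can be sketched in a few lines. This assumes section headings appear verbatim as lines in the report text, and uses a hypothetical keyword rule in place of the BERT-based section classifier.

```python
from typing import Callable, Dict, List, Optional


def split_by_toc(lines: List[str], toc: List[str]) -> Dict[str, str]:
    """Start a new section whenever a line matches a Table of Contents heading."""
    headings = set(toc)
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        if line.strip() in headings:
            current = line.strip()
            sections[current] = []
        elif current is not None:  # text before the first heading is skipped
            sections[current].append(line)
    return {heading: "\n".join(body) for heading, body in sections.items()}


def summarize(
    lines: List[str], toc: List[str], include: Callable[[str, str], bool]
) -> str:
    """Keep, in report order, the sections that the classifier selects."""
    sections = split_by_toc(lines, toc)
    return "\n".join(body for heading, body in sections.items() if include(heading, body))


report = [
    "Contents",
    "Chairman's statement",
    "Revenue grew strongly.",
    "Corporate governance",
    "The board met four times.",
]
toc = ["Chairman's statement", "Corporate governance"]
# Hypothetical keyword rule standing in for the BERT-based section classifier.
print(summarize(report, toc, lambda heading, body: "statement" in heading.lower()))
```

Classifying whole sections rather than individual sentences keeps the summary coherent, at the cost of coarser length control.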

Event Extraction by Answering (Almost) Natural Questions
Xinya Du | Claire Cardie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The problem of event extraction requires detecting the event trigger and extracting its corresponding arguments. Existing work in event argument extraction typically relies heavily on entity recognition as a preprocessing/concurrent step, causing the well-known problem of error propagation. To avoid this issue, we introduce a new paradigm for event extraction by formulating it as a question answering (QA) task that extracts the event arguments in an end-to-end manner. Empirical results demonstrate that our framework outperforms prior methods substantially; in addition, it is capable of extracting event arguments for roles not seen at training time (i.e., in a zero-shot learning setting).
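
The QA formulation reduces to two pieces: turning a role into a question, and decoding an answer span from a reader's start/end scores. In this sketch, the template wording is a hypothetical stand-in for the paper's question templates, and the scores would come from a QA reader run on the (question, sentence) pair. Span decoding by maximizing start plus end score is a standard extractive-QA technique.

```python
from typing import List, Tuple


def make_question(role: str, trigger: str) -> str:
    """Hypothetical generic template: works for any role name, including unseen ones."""
    return f"What is the {role.lower()} in {trigger}?"


def best_span(
    start_scores: List[float], end_scores: List[float], max_len: int = 10
) -> Tuple[int, int]:
    """Return the (start, end) token indices maximizing start + end score, with end >= start."""
    best_score, best = float("-inf"), (0, 0)
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            if s + end_scores[j] > best_score:
                best_score, best = s + end_scores[j], (i, j)
    return best


print(make_question("Attacker", "attacked"))  # What is the attacker in attacked?
print(best_span([0.1, 2.0, 0.3], [0.0, 0.5, 1.5]))  # (1, 2)
```

Since the question is built from the role name alone, a role never seen in training can still be queried, which is what makes the zero-shot setting possible.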

Intrinsic Evaluation of Summarization Datasets
Rishi Bommasani | Claire Cardie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

High-quality data forms the bedrock for building meaningful statistical models in NLP. Consequently, data quality must be evaluated either during dataset construction or post hoc. Almost all popular summarization datasets are drawn from natural sources and do not come with inherent quality assurance guarantees. In spite of this, data quality has gone largely unquestioned for many of these recent datasets. We perform the first large-scale evaluation of summarization datasets by introducing 5 intrinsic metrics and applying them to 10 popular datasets. We find that data usage in recent summarization research is sometimes inconsistent with the underlying properties of the data. Further, we discover that our metrics can serve the additional purpose of being inexpensive heuristics for detecting generically low quality examples.
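
The abstract does not name the five metrics. Purely as an illustration of what a cheap, reference-free dataset statistic can look like, the sketch below computes a compression ratio and a novel n-gram fraction (a rough abstractivity proxy). Whether these match the paper's definitions is an assumption, and whitespace tokenization is a simplification.

```python
from typing import List, Set, Tuple


def compression_ratio(document: str, summary: str) -> float:
    """Document length divided by summary length, in whitespace tokens."""
    return len(document.split()) / len(summary.split())


def novel_ngram_fraction(document: str, summary: str, n: int = 2) -> float:
    """Share of summary n-grams that never appear in the document."""

    def ngrams(tokens: List[str]) -> Set[Tuple[str, ...]]:
        return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}

    doc_grams = ngrams(document.lower().split())
    summ_grams = ngrams(summary.lower().split())
    if not summ_grams:
        return 0.0
    return len(summ_grams - doc_grams) / len(summ_grams)


doc = "the cat sat on the mat"
print(compression_ratio(doc, "the cat sat"))  # 2.0
print(novel_ngram_fraction(doc, "a cat sat"))  # 0.5
```

Statistics like these can be computed over every example in a dataset; extreme values (for instance, a "summary" longer than its document) are natural candidates for manual inspection.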

Exploring the Role of Argument Structure in Online Debate Persuasion
Jialu Li | Esin Durmus | Claire Cardie
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Online debate forums provide users a platform to express their opinions on controversial topics while being exposed to opinions from a diverse set of viewpoints. Existing work in Natural Language Processing (NLP) has shown that linguistic features extracted from the debate text and features encoding the characteristics of the audience are both critical in persuasion studies. In this paper, we aim to further investigate the role of discourse structure of the arguments from online debates in their persuasiveness. In particular, we use the factor graph model to obtain features for the argument structure of debates from an online debating platform and incorporate these features into an LSTM-based model to predict the debater that makes the most convincing arguments. We find that incorporating argument structure features plays an essential role in achieving the best predictive performance in assessing the persuasiveness of the arguments on online debates.

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 8

Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and data augmentation based on translated relevant datasets for English on model performance. We expect C3 to present great challenges to existing systems as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C3 can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text. C3 is available at https://dataset.org/c3/.

2019

DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension
Kai Sun | Dian Yu | Jianshu Chen | Dong Yu | Yejin Choi | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 7

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at https://dataset.org/dream/.

The Role of Pragmatic and Discourse Context in Determining Argument Impact
Esin Durmus | Faisal Ladhak | Claire Cardie
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Research in the social sciences and psychology has shown that the persuasiveness of an argument depends not only on the language employed, but also on attributes of the source/communicator, the audience, and the appropriateness and strength of the argument’s claims given the pragmatic and discourse context of the argument. Among these characteristics of persuasive arguments, prior work in NLP does not explicitly investigate the effect of the pragmatic and discourse context when determining argument quality. This paper presents a new dataset to initiate the study of this aspect of argumentation: it consists of a diverse collection of arguments covering 741 controversial topics and comprising over 47,000 claims. We further propose predictive models that incorporate the pragmatic and discourse context of argumentative claims and show that they outperform models that rely only on claim-specific linguistic features for predicting the perceived impact of individual claims within a particular line of argument.

Improving Question Answering with External Knowledge
Xiaoman Pan | Kai Sun | Dian Yu | Jianshu Chen | Heng Ji | Claire Cardie | Dong Yu
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

We focus on multiple-choice question answering (QA) tasks in subject areas such as science, where we require both broad background knowledge and the facts from the given subject-area reference corpus. In this work, we explore simple yet effective methods for exploiting two sources of external knowledge for subject-area QA. The first enriches the original subject-area reference corpus with relevant text snippets extracted from an open-domain resource (i.e., Wikipedia) that cover potentially ambiguous concepts in the question and answer options. As in other QA research, the second method simply increases the amount of training data by appending additional in-domain subject-area instances. Experiments on three challenging multiple-choice science QA tasks (i.e., ARC-Easy, ARC-Challenge, and OpenBookQA) demonstrate the effectiveness of our methods: in comparison to the previous state-of-the-art, we obtain absolute gains in accuracy of up to 8.1%, 13.0%, and 12.8%, respectively. While we observe consistent gains when we introduce knowledge from Wikipedia, we find that employing additional QA training instances is not uniformly helpful: performance degrades when the added instances exhibit a higher level of difficulty than the original training data. As one of the first studies on exploiting unstructured external knowledge for subject-area QA, we hope our methods, observations, and discussion of the exposed limitations may shed light on further developments in the area.

A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
Esin Durmus | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Existing argumentation datasets have succeeded in allowing researchers to develop computational methods for analyzing the content, structure and linguistic features of argumentative text. They have been much less successful in fostering studies of the effect of “user” traits — characteristics and beliefs of the participants — on the debate/argument outcome as this type of user information is generally not available. This paper presents a dataset of 78,376 debates generated over a 10-year period along with surprisingly comprehensive participant profiles. We also complete an example study using the dataset to analyze the effect of selected user traits on the debate outcome in comparison to the linguistic features typically employed in studies of this kind.

Multi-Source Cross-Lingual Model Transfer: Learning What to Share
Xilun Chen | Ahmed Hassan Awadallah | Hany Hassan | Wei Wang | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Modern NLP applications have enjoyed a great boost utilizing neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training data in multiple source languages is leveraged to further boost target language performance. Unlike most existing methods that rely only on language-invariant features for CLTL, our approach coherently utilizes both language-invariant and language-specific features at instance level. Our model leverages adversarial networks to learn language-invariant features, and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language. This enables our model to learn effectively what to share between various languages in the multilingual setup. Moreover, when coupled with unsupervised multilingual embeddings, our model can operate in a zero-resource setting where neither target language training data nor cross-lingual resources are available. Our model achieves significant performance gains over prior art, as shown in an extensive set of experiments over multiple text classification and sequence tagging tasks including a large-scale industry dataset.

Keeping Notes: Conditional Natural Language Generation with a Scratchpad Encoder
Ryan Benmalek | Madian Khabsa | Suma Desu | Claire Cardie | Michele Banko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. By enabling the decoder at each time step to write to all of the encoder output layers, Scratchpad can employ the encoder as a “scratchpad” memory to keep track of what has been generated so far and thereby guide future generation. We evaluate Scratchpad in the context of three well-studied natural language generation tasks — Machine Translation, Question Generation, and Text Summarization — and obtain state-of-the-art or comparable performance on standard datasets for each task. Qualitative assessments in the form of human judgements (question generation), attention visualization (MT), and sample output (summarization) provide further evidence of the ability of Scratchpad to generate fluent and expressive output.

Determining Relative Argument Specificity and Stance for Complex Argumentative Structures
Esin Durmus | Faisal Ladhak | Claire Cardie
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Systems for automatic argument generation and debate require the ability to (1) determine the stance of any claims employed in the argument and (2) assess the specificity of each claim relative to the argument context. Existing work on understanding claim specificity and stance, however, has been limited to the study of argumentative structures that are relatively shallow, most often consisting of a single claim that directly supports or opposes the argument thesis. In this paper, we tackle these tasks in the context of complex arguments on a diverse set of topics. In particular, our dataset consists of manually curated argument trees for 741 controversial topics covering 95,312 unique claims; lines of argument are generally of depth 2 to 6. We find that as the distance between a pair of claims increases along the argument path, determining the relative specificity of a pair of claims becomes easier and determining their relative stance becomes harder.

SPARSE: Structured Prediction using Argument-Relative Structured Encoding
Rishi Bommasani | Arzoo Katiyar | Claire Cardie
Proceedings of the Third Workshop on Structured Prediction for NLP

We propose structured encoding as a novel approach to learning representations for relations and events in neural structured prediction. Our approach explicitly leverages the structure of available relation and event metadata to generate these representations, which are parameterized by both the attribute structure of the metadata as well as the learned representation of the arguments of the relations and events. We consider affine, biaffine, and recurrent operators for building hierarchical representations and modelling underlying features. We apply our approach to the second-order structured prediction task studied in the 2016/2017 Belief and Sentiment analysis evaluations (BeSt): given a document and its entities, relations, and events (including metadata and mentions), determine the sentiment of each entity towards every relation and event in the document. Without task-specific knowledge sources or domain engineering, we significantly improve over systems and baselines that neglect the available metadata or its hierarchical structure. We observe across-the-board improvements on the BeSt 2016/2017 sentiment analysis task of at least 2.3 (absolute) and 10.6% (relative) F-measure over the previous state-of-the-art.
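
Of the operators listed above, the biaffine one is a standard building block. A scalar version might look like the sketch below, where `x` and `y` are hypothetical argument representations and `U`, `W`, `b` are learned parameters. How the paper composes these operators hierarchically over the metadata is not reproduced here; a vector-valued output would use one `U` slice per output dimension.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def biaffine_score(x: Vector, y: Vector, U: Matrix, W: Vector, b: float) -> float:
    """s = x^T U y + W [x; y] + b for two argument representations x and y."""
    bilinear = sum(x[i] * U[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
    linear = sum(w * f for w, f in zip(W, x + y))  # W acts on the concatenation [x; y]
    return bilinear + linear + b


x, y = [1.0, 0.0], [0.0, 1.0]
U = [[0.0, 2.0], [0.0, 0.0]]
print(biaffine_score(x, y, U, [0.0, 0.0, 0.0, 0.0], 0.5))  # 2.5
```

Dropping the bilinear term leaves an affine operator; the bilinear term is what lets the score depend on interactions between the two arguments rather than on each one separately.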

Persuasion of the Undecided: Language vs. the Listener
Liane Longpre | Esin Durmus | Claire Cardie
Proceedings of the 6th Workshop on Argument Mining

This paper examines the factors that govern persuasion for a priori UNDECIDED versus DECIDED audience members in the context of on-line debates. We separately study two types of influences: linguistic factors — features of the language of the debate itself; and audience factors — features of an audience member encoding demographic information, prior beliefs, and debate platform behavior. In a study of users of a popular debate platform, we find first that different combinations of linguistic features are critical for predicting persuasion outcomes for UNDECIDED versus DECIDED members of the audience. We additionally find that audience factors have more influence on predicting the side (PRO/CON) that persuaded UNDECIDED users than for DECIDED users that flip their stance to the opposing side. Our results emphasize the importance of considering the undecided and decided audiences separately when studying linguistic factors of persuasion.

Be Consistent! Improving Procedural Text Comprehension using Label Consistency
Xinya Du | Bhavana Dalvi | Niket Tandon | Antoine Bosselut | Wen-tau Yih | Peter Clark | Claire Cardie
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Our goal is procedural text comprehension, namely tracking how the properties of entities (e.g., their location) change with time given a procedural text (e.g., a paragraph about photosynthesis, a recipe). This task is challenging as the world is changing throughout the text, and despite recent advances, current systems still struggle with this task. Our approach is to leverage the fact that, for many procedural texts, multiple independent descriptions are readily available, and that predictions from them should be consistent (label consistency). We present a new learning framework that leverages label consistency during training, allowing consistency bias to be built into the model. Evaluation on a standard benchmark dataset for procedural text, ProPara (Dalvi et al., 2018), shows that our approach significantly improves prediction performance (F1) over prior state-of-the-art systems.

Improving Machine Reading Comprehension with General Reading Strategies
Kai Sun | Dian Yu | Dong Yu | Claire Cardie
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Reading strategies have been shown to improve comprehension levels, especially for readers lacking adequate prior knowledge. Just as the process of knowledge accumulation is time-consuming for human readers, it is resource-demanding to impart rich general domain knowledge into a deep language model via pre-training. Inspired by reading strategies identified in cognitive science, and given limited computational resources (just a pre-trained model and a fixed number of training instances), we propose three general strategies aimed at improving non-extractive machine reading comprehension (MRC): (i) BACK AND FORTH READING, which considers both the original and reverse order of an input sequence, (ii) HIGHLIGHTING, which adds a trainable embedding to the text embedding of tokens that are relevant to the question and candidate answers, and (iii) SELF-ASSESSMENT, which generates practice questions and candidate answers directly from the text in an unsupervised manner. By fine-tuning a pre-trained language model (Radford et al., 2018) with our proposed strategies on the largest general domain multiple-choice MRC dataset RACE, we obtain a 5.8% absolute increase in accuracy over the previous best result achieved by the same pre-trained model fine-tuned on RACE without the use of strategies. We further fine-tune the resulting model on a target MRC task, leading to an absolute improvement of 6.2% in average accuracy over previous state-of-the-art approaches on six representative non-extractive MRC datasets from different domains (i.e., ARC, OpenBookQA, MCTest, SemEval-2018 Task 11, ROCStories, and MultiRC). These results demonstrate the effectiveness of our proposed strategies and the versatility and general applicability of our fine-tuned models that incorporate these strategies. Core code is available at https://github.com/nlpdata/strategy/.
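
Back and forth reading can be pictured as an ensemble of two readers that see the input segments in opposite orders. This sketch assumes the reversal swaps the segment order (document and question versus candidate option), and that two separately fine-tuned models' option probabilities are averaged. Both choices, and the `forward`/`backward` scorers, are illustrative assumptions; the official code is at the repository above.

```python
import math
from typing import Callable, List

Scorer = Callable[[List[str]], float]  # hypothetical: ordered input segments -> logit


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def back_and_forth(
    document: str, question: str, options: List[str], forward: Scorer, backward: Scorer
) -> List[float]:
    """Average option probabilities from a forward-order and a reversed-order model."""
    fwd = softmax([forward([document, question, option]) for option in options])
    bwd = softmax([backward([option, question, document]) for option in options])
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]


# Toy scorers that simply prefer longer options, standing in for fine-tuned models.
probs = back_and_forth(
    "doc", "q", ["a", "bbb"], lambda segs: len(segs[-1]), lambda segs: len(segs[0])
)
print(probs.index(max(probs)))  # 1
```

Averaging probabilities rather than raw logits keeps the two readers on a common scale even if their logits are calibrated differently.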

2018

A Corpus of eRulemaking User Comments for Measuring Evaluability of Arguments
Joonsuk Park | Claire Cardie
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Nested Named Entity Recognition Revisited
Arzoo Katiyar | Claire Cardie
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We propose a novel recurrent neural network-based approach to simultaneously handle nested named entity recognition and nested entity mention detection. The model learns a hypergraph representation for nested entities using features extracted from a recurrent neural network. In evaluations on three standard data sets, we show that our approach significantly outperforms existing state-of-the-art methods, which are feature-based. The approach is also efficient: it operates linearly in the number of tokens and the number of possible output labels at any token. Finally, we present an extension of our model that jointly learns the head of each entity mention.

Exploring the Role of Prior Beliefs for Argument Persuasion
Esin Durmus | Claire Cardie
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Public debate forums provide a common platform for exchanging opinions on a topic of interest. While recent studies in natural language processing (NLP) have provided empirical evidence that the language of the debaters and their patterns of interaction play a key role in changing the mind of a reader, research in psychology has shown that prior beliefs can affect our interpretation of an argument and could therefore constitute a competing alternative explanation for resistance to changing one’s stance. To study the actual effect of language use vs. prior beliefs on persuasion, we provide a new dataset and propose a controlled setting that takes into consideration two reader-level factors: political and religious ideology. We find that prior beliefs affected by these reader-level factors play a more important role than language use effects and argue that it is important to account for them in NLP studies of persuasion.

Multinomial Adversarial Networks for Multi-Domain Text Classification
Xilun Chen | Claire Cardie
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Many text classification tasks are known to be highly domain-dependent. Unfortunately, the availability of training data can vary drastically across domains. Worse still, for some domains there may not be any annotated data at all. In this work, we propose a multinomial adversarial network (MAN) to tackle this real-world problem of multi-domain text classification (MDTC) in which labeled data may exist for multiple domains, but in insufficient amounts to train effective classifiers for one or more of the domains. We provide theoretical justifications for the MAN framework, proving that different instances of MANs are essentially minimizers of various f-divergence metrics (Ali and Silvey, 1966) among multiple probability distributions. MANs are thus a theoretically sound generalization of traditional adversarial networks that discriminate over two distributions. More specifically, for the MDTC task, MAN learns features that are invariant across multiple domains by resorting to its ability to reduce the divergence among the feature distributions of each domain. We present experimental results showing that MANs significantly outperform the prior art on the MDTC task. We also show that MANs achieve state-of-the-art performance for domains with no labeled data.

Unsupervised Multilingual Word Embeddings
Xilun Chen | Claire Cardie
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Multilingual Word Embeddings (MWEs) represent words from multiple languages in a single distributional vector space. Unsupervised MWE (UMWE) methods acquire multilingual embeddings without cross-lingual supervision, which is a significant advantage over traditional supervised approaches and opens many new possibilities for low-resource languages. Prior art for learning UMWEs, however, merely relies on a number of independently trained Unsupervised Bilingual Word Embeddings (UBWEs) to obtain multilingual embeddings. These methods fail to leverage the interdependencies that exist among many languages. To address this shortcoming, we propose a fully unsupervised framework for learning MWEs that directly exploits the relations between all language pairs. Our model substantially outperforms previous approaches in the experiments on multilingual word translation and cross-lingual word similarity. In addition, our model even beats supervised approaches trained with cross-lingual resources.

pdf bib
Towards Dynamic Computation Graphs via Sparse Latent Structure
Vlad Niculae | André F. T. Martins | Claire Cardie
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep NLP models benefit from underlying structures in the data—e.g., parse trees—typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.

pdf bib
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification
Xilun Chen | Yu Sun | Ben Athiwaratkun | Claire Cardie | Kilian Weinberger
Transactions of the Association for Computational Linguistics, Volume 6

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial language discriminator. Both branches take input from a shared feature extractor to learn hidden representations that are simultaneously indicative of the classification task and invariant across languages. Experiments on Chinese and Arabic sentiment classification demonstrate that ADAN significantly outperforms state-of-the-art systems.

pdf bib
Harvesting Paragraph-level Question-Answer Pairs from Wikipedia
Xinya Du | Claire Cardie
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the task of generating from Wikipedia articles question-answer pairs that cover content beyond a single sentence. We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. Compared with models that only take into account sentence-level information (Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the linguistic knowledge introduced by the coreference representation aids question generation significantly, producing models that outperform the current state-of-the-art. We apply our system (composed of an answer span extraction system and the passage-level QG system) to the 10,000 top ranking Wikipedia articles and create a corpus of over one million question-answer pairs. We provide qualitative analysis for this large-scale generated corpus from Wikipedia.

pdf bib
Understanding the Effect of Gender and Stance in Opinion Expression in Debates on “Abortion”
Esin Durmus | Claire Cardie
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

In this paper, we focus on understanding linguistic differences across groups with different self-identified gender and stance in expressing opinions about ABORTION. We provide a new dataset consisting of users' gender, stance on ABORTION as well as the debates on ABORTION drawn from debate.org. We use the gender and stance information to identify significant linguistic differences across individuals with different gender and stance. We show the importance of considering the stance information along with the gender, since we observe significant linguistic differences across individuals with different stance even within the same gender group.

2017

pdf bib
Going out on a limb: Joint Extraction of Entity Mentions and Relations without Dependency Trees
Arzoo Katiyar | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a novel attention-based recurrent neural network for joint extraction of entity mentions and relations. We show that attention along with a long short-term memory (LSTM) network can extract semantic relations between entity mentions without having access to dependency trees. Experiments on Automatic Content Extraction (ACE) corpora show that our model significantly outperforms the feature-based joint model of Li and Ji (2014). We also compare our model with an end-to-end tree-based LSTM model (SPTree) by Miwa and Bansal (2016) and show that our model performs within 1% on entity mentions and 2% on relations. Our fine-grained analysis also shows that our model performs significantly better on Agent-Artifact relations, while SPTree performs better on Physical and Part-Whole relations.

pdf bib
Argument Mining with Structured SVMs and RNNs
Vlad Niculae | Joonsuk Park | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.

pdf bib
Learning to Ask: Neural Question Generation for Reading Comprehension
Xinya Du | Junru Shao | Claire Cardie
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study automatic question generation for sentences from text passages in reading comprehension. We introduce an attention-based sequence learning model for the task and investigate the effect of encoding sentence- vs. paragraph-level information. In contrast to all previous work, our model does not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead trainable end-to-end via sequence-to-sequence learning. Automatic evaluation results show that our system significantly outperforms the state-of-the-art rule-based system. In human evaluations, questions generated by our system are also rated as being more natural (i.e., grammaticality, fluency) and as more difficult to answer (in terms of syntactic and lexical divergence from the original text and reasoning needed to answer).

pdf bib
Proceedings of the 4th Workshop on Argument Mining
Ivan Habernal | Iryna Gurevych | Kevin Ashley | Claire Cardie | Nancy Green | Diane Litman | Georgios Petasis | Chris Reed | Noam Slonim | Vern Walker
Proceedings of the 4th Workshop on Argument Mining

pdf bib
Identifying Where to Focus in Reading Comprehension for Neural Question Generation
Xinya Du | Claire Cardie
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

A first step in the task of automatically generating questions for testing reading comprehension is to identify question-worthy sentences, i.e., sentences in a text passage that humans find it worthwhile to ask questions about. We propose a hierarchical neural sentence-level sequence tagging model for this task, which existing approaches to question generation have ignored. The approach is fully data-driven (with no sophisticated NLP pipelines or any hand-crafted rules/features) and compares favorably to a number of baselines when evaluated on the SQuAD data set. When incorporated into an existing neural question generation system, the resulting end-to-end system achieves state-of-the-art performance for paragraph-level question generation for reading comprehension.

2016

pdf bib
Investigating LSTMs for Joint Extraction of Opinion Entities and Relations
Arzoo Katiyar | Claire Cardie
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Proceedings of the 2nd Workshop on Argumentation Mining
Claire Cardie
Proceedings of the 2nd Workshop on Argumentation Mining

pdf bib
Socially-Informed Timeline Generation for Complex Events
Lu Wang | Claire Cardie | Galen Marchetti
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Iñigo Lopez-Gazpio | Montse Maritxalar | Rada Mihalcea | German Rigau | Larraitz Uria | Janyce Wiebe
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
A Hierarchical Distance-dependent Bayesian Model for Event Coreference Resolution
Bishan Yang | Claire Cardie | Peter Frazier
Transactions of the Association for Computational Linguistics, Volume 3

We present a novel hierarchical distance-dependent Bayesian model for event coreference resolution. While existing generative models for event coreference resolution are completely unsupervised, our model allows for the incorporation of pairwise distances between event mentions (information that is widely used in supervised coreference models) to guide the generative clustering process for better event clustering both within and across documents. We model the distances between event mentions using a feature-rich learnable distance function and encode them as Bayesian priors for nonparametric clustering. Experiments on the ECB+ corpus show that our model outperforms state-of-the-art methods for both within- and cross-document event coreference resolution.

2014

pdf bib
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Rada Mihalcea | German Rigau | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
SimCompass: Using Deep Learning Word Embeddings to Assess Cross-level Similarity
Carmen Banea | Di Chen | Rada Mihalcea | Claire Cardie | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Book Reviews: Sentiment Analysis and Opinion Mining by Bing Liu
Claire Cardie
Computational Linguistics, Volume 40, Issue 2 - June 2014

pdf bib
Context-aware Learning for Sentence-level Sentiment Analysis with Posterior Regularization
Bishan Yang | Claire Cardie
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Towards a General Rule for Identifying Deceptive Opinion Spam
Jiwei Li | Myle Ott | Claire Cardie | Eduard Hovy
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection
Lu Wang | Claire Cardie
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Identifying Appropriate Support for Propositions in Online User Comments
Joonsuk Park | Claire Cardie
Proceedings of the First Workshop on Argumentation Mining

pdf bib
Overview of the 2014 NLP Unshared Task in PoliInformatics
Noah A. Smith | Claire Cardie | Anne Washington | John Wilkerson
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
Improving Agreement and Disagreement Identification in Online Discussions with A Socially-Tuned Sentiment Lexicon
Lu Wang | Claire Cardie
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
The Enrollment Effect: A Study of Amazon’s Vine Program
Dinesh Puranam | Claire Cardie
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

pdf bib
Joint Modeling of Opinion Expression Extraction and Attribute Classification
Bishan Yang | Claire Cardie
Transactions of the Association for Computational Linguistics, Volume 2

In this paper, we study the problems of opinion expression extraction and expression-level polarity and intensity classification. Traditional fine-grained opinion analysis systems address these problems in isolation and thus cannot capture interactions among the textual spans of opinion expressions and their opinion-related properties. We present two types of joint approaches that can account for such interactions during 1) both learning and inference or 2) only during inference. Extensive experiments on a standard dataset demonstrate that our approaches provide substantial improvements over previously published results. By analyzing the results, we gain some insight into the advantages of different joint models.

pdf bib
Query-Focused Opinion Summarization for User-Generated Content
Lu Wang | Hema Raghavan | Claire Cardie | Vittorio Castelli
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf bib
Opinion Mining with Deep Recurrent Neural Networks
Ozan İrsoy | Claire Cardie
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Major Life Event Extraction from Twitter based on Congratulations/Condolences Speech Acts
Jiwei Li | Alan Ritter | Claire Cardie | Eduard Hovy
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Identifying Manipulated Offerings on Review Portals
Jiwei Li | Myle Ott | Claire Cardie
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Lu Wang | Hema Raghavan | Vittorio Castelli | Radu Florian | Claire Cardie
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Domain-Independent Abstract Generation for Focused Meeting Summarization
Lu Wang | Claire Cardie
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Joint Inference for Fine-grained Opinion Extraction
Bishan Yang | Claire Cardie
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
TopicSpam: a Topic-Model based approach for spam detection
Jiwei Li | Claire Cardie | Sujian Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Negative Deceptive Opinion Spam
Myle Ott | Claire Cardie | Jeffrey T. Hancock
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
CPN-CORE: A Text Semantic Similarity System Infused with Opinion Knowledge
Carmen Banea | Yoonjung Choi | Lingjia Deng | Samer Hassan | Michael Mohler | Bishan Yang | Claire Cardie | Rada Mihalcea | Jan Wiebe
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

pdf bib
In Search of a Gold Standard in Studies of Deception
Stephanie Gokhman | Jeff Hancock | Poornima Prabhu | Myle Ott | Claire Cardie
Proceedings of the Workshop on Computational Approaches to Deception Detection

pdf bib
Unsupervised Topic Modeling Approaches to Decision Summarization in Spoken Meetings
Lu Wang | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Improving Implicit Discourse Relation Recognition Through Feature Set Optimization
Joonsuk Park | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Focused Meeting Summarization via Unsupervised Relation Extraction
Lu Wang | Claire Cardie
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Extracting Opinion Expressions with semi-Markov Conditional Random Fields
Bishan Yang | Claire Cardie
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

pdf bib
Automatically Creating General-Purpose Opinion Summaries from Text
Veselin Stoyanov | Claire Cardie
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Compositional Matrix-Space Models for Sentiment Analysis
Ainur Yessenalina | Claire Cardie
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Summarizing Decisions in Spoken Meetings
Lu Wang | Claire Cardie
Proceedings of the Workshop on Automatic Summarization for Different Genres, Media, and Languages

pdf bib
Reconciling OntoNotes: Unrestricted Coreference Resolution in OntoNotes with Reconcile.
Veselin Stoyanov | Uday Babbar | Pracheer Gupta | Claire Cardie
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
Finding Deceptive Opinion Spam by Any Stretch of the Imagination
Myle Ott | Yejin Choi | Claire Cardie | Jeffrey T. Hancock
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora
Bin Lu | Chenhao Tan | Claire Cardie | Benjamin K. Tsou
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Coreference Resolution with Reconcile
Veselin Stoyanov | Claire Cardie | Nathan Gilbert | Ellen Riloff | David Buttler | David Hysom
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
Yejin Choi | Claire Cardie
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Automatically Generating Annotator Rationales to Improve Sentiment Classification
Ainur Yessenalina | Yejin Choi | Claire Cardie
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Multi-Level Structured Models for Document-Level Sentiment Classification
Ainur Yessenalina | Yisong Yue | Claire Cardie
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Adapting a Polarity Lexicon using Integer Linear Programming for Domain-Specific Sentiment Classification
Yejin Choi | Claire Cardie
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf bib
Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-Art
Veselin Stoyanov | Nathan Gilbert | Claire Cardie | Ellen Riloff
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

pdf bib
Annotating Topics of Opinions
Veselin Stoyanov | Claire Cardie
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Fine-grained subjectivity analysis has been the subject of much recent research attention. As a result, the field has gained a number of working definitions, technical approaches and manually annotated corpora that cover many facets of subjectivity. Little work has been done, however, on one aspect of fine-grained opinions: the specification and identification of opinion topics. In particular, due to the difficulty of manual opinion topic annotation, no general-purpose opinion corpus with information about topics of fine-grained opinions currently exists. In this paper, we propose a methodology for the manual annotation of opinion topics and use it to annotate a portion of an existing general-purpose opinion corpus with opinion topic information. Inter-annotator agreement results according to a number of metrics suggest that the annotations are reliable.

pdf bib
An eRulemaking Corpus: Identifying Substantive Issues in Public Comments
Claire Cardie | Cynthia Farina | Matt Rawding | Adil Aijaz
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task and of the eRulemaking domain engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning task. Interannotator agreement results are presented for a group of six annotators. We also briefly describe the results of experiments that apply standard and hierarchical text categorization techniques to the eRulemaking data sets. The corpus is the first in a series of related sentence-level text categorization corpora to be developed in the eRulemaking domain.

pdf bib
Topic Identification for Fine-Grained Opinion Analysis
Veselin Stoyanov | Claire Cardie
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf bib
The Power of Negative Thinking: Exploiting Label Disagreement in the Min-cut Classification Framework
Mohit Bansal | Claire Cardie | Lillian Lee
Coling 2008: Companion volume: Posters

pdf bib
Learning with Compositional Semantics as Structural Inference for Subsentential Sentiment Analysis
Yejin Choi | Claire Cardie
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Structured Local Training and Biased Potential Functions for Conditional Random Fields with Application to Coreference Resolution
Yejin Choi | Claire Cardie
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

pdf bib
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Nicoletta Calzolari | Claire Cardie | Pierre Isabelle
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
Toward Opinion Summarization: Linking the Sources
Veselin Stoyanov | Claire Cardie
Proceedings of the Workshop on Sentiment and Subjectivity in Text

pdf bib
Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning
Veselin Stoyanov | Claire Cardie
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib
Joint Extraction of Entities and Relations for Opinion Recognition
Yejin Choi | Eric Breck | Claire Cardie
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

pdf bib
Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns
Yejin Choi | Claire Cardie | Ellen Riloff | Siddharth Patwardhan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Optimizing to Arbitrary NLP Metrics using Ensemble Selection
Art Munson | Claire Cardie | Rich Caruana
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
Multi-Perspective Question Answering Using the OpQA Corpus
Veselin Stoyanov | Claire Cardie | Janyce Wiebe
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

pdf bib
OpinionFinder: A System for Subjectivity Analysis
Theresa Wilson | Paul Hoffmann | Swapna Somasundaran | Jason Kessler | Janyce Wiebe | Yejin Choi | Claire Cardie | Ellen Riloff | Siddharth Patwardhan
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

pdf bib
Playing the Telephone Game: Determining the Hierarchical Structure of Perspective and Speech Expressions
Eric Breck | Claire Cardie
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

pdf bib
Bootstrapping Coreference Classifiers with Multiple Machine Learning Algorithms
Vincent Ng | Claire Cardie
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing

pdf bib
Weakly Supervised Natural Language Learning Without Redundant Views
Vincent Ng | Claire Cardie
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

2002

pdf bib
Selecting sentences for multidocument summaries using randomized local search
Michael White | Claire Cardie
Proceedings of the ACL-02 Workshop on Automatic Summarization

pdf bib
Combining Sample Selection and Error-Driven Pruning for Machine Learning of Coreference Rules
Vincent Ng | Claire Cardie
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

pdf bib
Improving Machine Learning Approaches to Coreference Resolution
Vincent Ng | Claire Cardie
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

pdf bib
Identifying Anaphoric and Non-Anaphoric Noun Phrases to Improve Coreference Resolution
Vincent Ng | Claire Cardie
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
Multidocument Summarization via Information Extraction
Michael White | Tanya Korelsky | Claire Cardie | Vincent Ng | David Pierce | Kiri Wagstaff
Proceedings of the First International Conference on Human Language Technology Research

pdf bib
Limitations of Co-Training for Natural Language Learning from Large Datasets
David Pierce | Claire Cardie
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

2000

pdf bib
Towards Translingual Information Access using Portable Information Extraction
Michael White | Claire Cardie | Chung-hye Han | Nari Kim | Benoit Lavoie | Martha Palmer | Owen Rambow | Juntae Yoon
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems

pdf bib
Examining the Role of Statistical and Linguistic Knowledge Sources in a General-Knowledge Question-Answering System
Claire Cardie | Vincent Ng | David Pierce | Chris Buckley
Sixth Applied Natural Language Processing Conference

1999

pdf bib
Noun Phrase Coreference as Clustering
Claire Cardie | Kiri Wagstaff
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

pdf bib
The Smart/Empire TIPSTER IR System
Chris Buckley | Janet Walz | Claire Cardie | Scott Mardis | Mandar Mitra | David Pierce | Kiri Wagstaff
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998

pdf bib
Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Claire Cardie | David Pierce
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

pdf bib
Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification
Claire Cardie | David Pierce
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1996

pdf bib
Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge
Claire Cardie
Conference on Empirical Methods in Natural Language Processing

1993

pdf bib
UMass/Hughes: Description of the CIRCUS System Used for MUC-5
W. Lehnert | J. McCarthy | S. Soderland | E. Riloff | C. Cardie | J. Peterson | F. Feng
Fifth Message Understanding Conference (MUC-5): Proceedings of a Conference Held in Baltimore, Maryland, August 25-27, 1993

pdf bib
UMass/Hughes: Description of the CIRCUS System Used for TIPSTER Text
W. Lehnert | J. McCarthy | S. Soderland | E. Riloff | C. Cardie | J. Peterson | F. Feng
TIPSTER TEXT PROGRAM: PHASE I: Proceedings of a Workshop held at Fredericksburg, Virginia, September 19-23, 1993

1992

pdf bib
University of Massachusetts: MUC-4 Test Results and Analysis
W. Lehnert | C. Cardie | D. Fisher | J. McCarthy | E. Riloff | S. Soderland
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

pdf bib
University of Massachusetts: Description of the CIRCUS System as Used for MUC-4
W. Lehnert | C. Cardie | D. Fisher | J. McCarthy | E. Riloff | S. Soderland
Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia, June 16-18, 1992

pdf bib
Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics
Claire Cardie
30th Annual Meeting of the Association for Computational Linguistics

1991

pdf bib
University of Massachusetts: MUC-3 Test Results and Analysis
Wendy Lehnert | Claire Cardie | David Fisher | Ellen Riloff | Robert Williams
Third Message Understanding Conference (MUC-3): Proceedings of a Conference Held in San Diego, California, May 21-23, 1991

pdf bib
University of Massachusetts: Description of the CIRCUS System as Used for MUC-3
Wendy Lehnert | Claire Cardie | David Fisher | Ellen Riloff | Robert Williams
Third Message Understanding Conference (MUC-3): Proceedings of a Conference Held in San Diego, California, May 21-23, 1991
