Dan Goldwasser


2020

Understanding the Language of Political Agreement and Disagreement in Legislative Texts
Maryam Davoodi | Eric Waltenburg | Dan Goldwasser
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While national politics often receive the spotlight, the overwhelming majority of legislation proposed, discussed, and enacted is done at the state level. Despite this fact, there is little awareness of the dynamics that lead to adopting these policies. In this paper, we take the first step towards a better understanding of these processes and the underlying dynamics that shape them, using data-driven methods. We build a new large-scale dataset, from multiple data sources, connecting state bills and legislator information, geographical information about their districts, and donations and donors’ information. We suggest a novel task, predicting the legislative body’s vote breakdown for a given bill, according to different criteria of interest, such as gender, rural-urban and ideological splits. Finally, we suggest a shared relational embedding model, representing the interactions between the text of the bill and the legislative context in which it is presented. Our experiments show that providing this context helps improve the prediction over strong text-based models.

Weakly-Supervised Modeling of Contextualized Event Embedding for Discourse Relations
I-Ta Lee | Maria Leonor Pacheco | Dan Goldwasser
Findings of the Association for Computational Linguistics: EMNLP 2020

Representing, and reasoning over, long narratives requires models that can deal with complex event structures connected through multiple relationship types. This paper suggests representing this type of information as a narrative graph and learning contextualized event representations over it using a relational graph neural network model. We train our model to capture event relations, derived from the Penn Discourse Treebank, on a huge corpus, and show that our multi-relational contextualized event representation can improve performance when learning script knowledge without direct supervision and provide a better representation for the implicit discourse sense classification task.

Predicting Stance Change Using Modular Architectures
Aldo Porco | Dan Goldwasser
Proceedings of the 28th International Conference on Computational Linguistics

The ability to change a person’s mind on a given issue depends both on the arguments they are presented with and on their underlying perspectives and biases on that issue. Predicting stance changes requires characterizing both aspects and the interaction between them, especially in realistic settings in which stance changes are very rare. In this paper, we suggest a modular learning approach, which decomposes the task into multiple modules, focusing on different aspects of the interaction between users, their beliefs, and the arguments they are exposed to. Our experiments show that our modular approach achieves significantly better results compared to the end-to-end approach using BERT over the same inputs.

Cross-Lingual Document Retrieval with Smooth Learning
Jiapeng Liu | Xiao Zhang | Dan Goldwasser | Xiao Wang
Proceedings of the 28th International Conference on Computational Linguistics

Cross-lingual document search is an information retrieval task in which the queries’ language and the documents’ language are different. In this paper, we study the instability of neural document search models and propose a novel end-to-end robust framework that achieves improved performance in cross-lingual search with different documents’ languages. This framework includes a novel measure of relevance between queries and documents, smooth cosine similarity, and a novel loss function, Smooth Ordinal Search Loss, as the objective function. We further provide a theoretical guarantee on the generalization error bound for the proposed framework. We conduct experiments to compare our approach with other document search models, and observe significant gains under commonly used ranking metrics on the cross-lingual document retrieval task in a variety of languages.

Semi-supervised Autoencoding Projective Dependency Parsing
Xiao Zhang | Dan Goldwasser
Proceedings of the 28th International Conference on Computational Linguistics

We describe two end-to-end autoencoding models for semi-supervised graph-based projective dependency parsing. The first model is a Locally Autoencoding Parser (LAP) encoding the input using continuous latent variables in a sequential manner; the second model is a Globally Autoencoding Parser (GAP) encoding the input into dependency trees as latent variables, with exact inference. Both models consist of two parts: an encoder enhanced by deep neural networks (DNN) that can utilize the contextual information to encode the input into latent variables, and a decoder which is a generative model able to reconstruct the input. Both LAP and GAP admit a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We conducted experiments on WSJ and UD dependency parsing data sets, showing that our models can exploit the unlabeled data to improve the performance given a limited amount of labeled data, and outperform a previously proposed semi-supervised model.

Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media
Shamik Roy | Dan Goldwasser
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we suggest a minimally supervised approach for identifying nuanced frames in news article coverage of politically divisive topics. We suggest breaking the broad policy frames proposed by Boydstun et al. (2014) into fine-grained subframes that better capture differences in political ideology. We evaluate the suggested subframes and their embedding, learned using minimal supervision, over three topics, namely, immigration, gun-control, and abortion. We demonstrate the ability of the subframes to capture ideological differences and analyze political discourse in news media.

Identifying Collaborative Conversations using Latent Discourse Behaviors
Ayush Jain | Maria Leonor Pacheco | Steven Lancette | Mahak Goindani | Dan Goldwasser
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this work, we study collaborative online conversations. Such conversations are rich in content, constructive and motivated by a shared goal. Automatically identifying such conversations requires modeling complex discourse behaviors, which characterize the flow of information, sentiment and community structure within discussions. To help capture these behaviors, we define a hybrid relational model in which relevant discourse behaviors are formulated as discrete latent variables and scored using neural networks. These variables provide the information needed for predicting the overall collaborative characterization of the entire conversational thread. We show that adding inductive bias in the form of latent variables results in performance improvement, while providing a natural way to explain the decision.

Semi-supervised Parsing with a Variational Autoencoding Parser
Xiao Zhang | Dan Goldwasser
Proceedings of the 16th International Conference on Parsing Technologies and the IWPT 2020 Shared Task on Parsing into Enhanced Universal Dependencies

We propose an end-to-end variational autoencoding parsing (VAP) model for semi-supervised graph-based projective dependency parsing. It encodes the input using continuous latent variables in a sequential manner by deep neural networks (DNN) that can utilize the contextual information, and reconstructs the input using a generative model. The VAP model admits a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We conducted experiments on the WSJ data sets, showing that the proposed model can use the unlabeled data to increase the performance on a limited amount of labeled data, on a par with a recently proposed semi-supervised parser with faster inference.

“where is this relationship going?”: Understanding Relationship Trajectories in Narrative Text
Keen You | Dan Goldwasser
Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics

We examine a new commonsense reasoning task: given a narrative describing a social interaction that centers on two protagonists, systems make inferences about the underlying relationship trajectory. Specifically, we propose two evaluation tasks: Relationship Outlook Prediction MCQ and Resolution Prediction MCQ. In Relationship Outlook Prediction, a system maps an interaction to a relationship outlook that captures how the interaction is expected to change the relationship. In Resolution Prediction, a system attributes a given relationship outlook to a particular resolution that explains the outcome. These two tasks parallel two real-life questions that people frequently ponder upon as they navigate different social situations: “where is this relationship going?” and “how did we end up here?”. To facilitate the investigation of human social relationships through these two tasks, we construct a new dataset, Social Narrative Tree, which consists of 1250 stories documenting a variety of daily social interactions. The narratives encode a multitude of social elements that interweave to give rise to rich commonsense knowledge of how relationships evolve with respect to social interactions. We establish baseline performances using language models and the accuracies are significantly lower than human performance. The results demonstrate that models need to look beyond syntactic and semantic signals to comprehend complex human relationships.

2019

Sentiment Tagging with Partial Labels using Modular Architectures
Xiao Zhang | Dan Goldwasser
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Many NLP learning tasks can be decomposed into several distinct sub-tasks, each associated with a partial label. In this paper we focus on a popular class of learning problems, sequence prediction applied to several sentiment analysis tasks, and suggest a modular learning approach in which different sub-tasks are learned using separate functional modules, combined to perform the final task while sharing information. Our experiments show this approach helps constrain the learning process and can alleviate some of the supervision efforts.

Encoding Social Information with Graph Convolutional Networks for Political Perspective Detection in News Media
Chang Li | Dan Goldwasser
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Identifying the political perspective shaping the way news events are discussed in the media is an important and challenging task. In this paper, we highlight the importance of contextualizing social information, capturing how this information is disseminated in social networks. We use Graph Convolutional Networks, a recently proposed neural architecture for representing relational information, to capture the documents’ social context. We show that social information can be used effectively as a source of distant supervision, and when direct supervision is available, even little social information can significantly improve performance.

Multi-Relational Script Learning for Discourse Relations
I-Ta Lee | Dan Goldwasser
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Modeling script knowledge can be useful for a wide range of NLP tasks. Current statistical script learning approaches embed the events, such that their relationships are indicated by their similarity in the embedding. While intuitive, these approaches fall short of representing nuanced relations, needed for downstream tasks. In this paper, we suggest viewing the learning of event embeddings as a multi-relational problem, which allows us to capture different aspects of event pairs. We model a rich set of event relations, such as Cause and Contrast, derived from the Penn Discourse Treebank. We evaluate our model on three types of tasks: the popular Multi-Choice Narrative Cloze and its variants, several multi-relational prediction tasks, and a related downstream task, implicit discourse sense classification.

Modeling Behavioral Aspects of Social Media Discourse for Moral Classification
Kristen Johnson | Dan Goldwasser
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

Political discourse on social media microblogs, specifically Twitter, has become an undeniable part of mainstream U.S. politics. Given the length constraint of tweets, politicians must carefully word their statements to ensure their message is understood by their intended audience. This constraint often eliminates the context of the tweet, making automatic analysis of social media political discourse a difficult task. To overcome this challenge, we propose simultaneous modeling of high-level abstractions of political language, such as political slogans and framing strategies, with abstractions of how politicians behave on Twitter. These behavioral abstractions can be further leveraged as forms of supervision in order to increase prediction accuracy, while reducing the burden of annotation. In this work, we use Probabilistic Soft Logic (PSL) to build relational models to capture the similarities in language and behavior that obfuscate political messages on Twitter. When combined, these descriptors reveal the moral foundations underlying the discourse of U.S. politicians online, across differing governing administrations, showing how party talking points remain cohesive or change over time.

Improving Natural Language Interaction with Robots Using Advice
Nikhil Mehta | Dan Goldwasser
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Over the last few years, there has been growing interest in learning models for physically grounded language understanding tasks, such as the popular blocks world domain. These works typically view this problem as a single-step process, in which a human operator gives an instruction and an automated agent is evaluated on its ability to execute it. In this paper we take the first step towards increasing the bandwidth of this interaction, and suggest a protocol for including advice, high-level observations about the task, which can help constrain the agent’s prediction. We evaluate our approach on the blocks world task, and show that even simple advice can help lead to significant performance improvements. To help reduce the effort involved in supplying the advice, we also explore model self-generated advice which can still improve results.

Using Natural Language Relations between Answer Choices for Machine Comprehension
Rajkumar Pujari | Dan Goldwasser
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

When evaluating an answer choice for a Reading Comprehension task, other answer choices available for the question and the answers of related questions about the same paragraph often provide valuable information. In this paper, we propose a method to leverage the natural language relations between the answer choices, such as entailment and contradiction, to improve the performance of machine comprehension. We use a stand-alone question answering (QA) system to perform the QA task and a Natural Language Inference (NLI) system to identify the relations between the choice pairs. Then we perform inference using an Integer Linear Programming (ILP)-based relational framework to re-evaluate the decisions made by the stand-alone QA system in light of the relations identified by the NLI system. We also propose a multitask learning model that learns both tasks jointly.

2018

Structured Representation Learning for Online Debate Stance Prediction
Chang Li | Aldo Porco | Dan Goldwasser
Proceedings of the 27th International Conference on Computational Linguistics

Online debates can help provide valuable information about various perspectives on a wide range of issues. However, understanding the stances expressed in these debates is a highly challenging task, which requires modeling both textual content and users’ conversational interactions. Current approaches take a collective classification approach, which ignores the relationships between different debate topics. In this work, we suggest viewing this task as a representation learning problem, and embed the text and authors jointly based on their interactions. We evaluate our model over the Internet Argumentation Corpus, and compare different approaches for structural information embedding. Experimental results show that our model can achieve significantly better results compared to previous competitive models.

Classification of Moral Foundations in Microblog Political Discourse
Kristen Johnson | Dan Goldwasser
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Previous works in computer science, as well as political and social science, have shown correlation in text between political ideologies and the moral foundations expressed within that text. Additional work has shown that policy frames, which are used by politicians to bias the public towards their stance on an issue, are also correlated with political ideology. Based on these associations, this work takes a first step towards modeling both the language and how politicians frame issues on Twitter, in order to predict the moral foundations that are used by politicians to express their stances on issues. The contributions of this work include a dataset annotated for the moral foundations, annotation guidelines, and probabilistic graphical models which show the usefulness of jointly modeling abstract political slogans, as opposed to the unigrams of previous works, with policy frames for the prediction of the morality underlying political tweets.

2017

Leveraging Behavioral and Social Information for Weakly Supervised Collective Classification of Political Discourse on Twitter
Kristen Johnson | Di Jin | Dan Goldwasser
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Framing is a political strategy in which politicians carefully word their statements in order to control public perception of issues. Previous works exploring political framing typically analyze frame usage in longer texts, such as congressional speeches. We present a collection of weakly supervised models which harness collective classification to predict the frames used in political discourse on the microblogging platform, Twitter. Our global probabilistic models show that by combining both lexical features of tweets and network-based behavioral features of Twitter, we are able to increase the average, unsupervised F1 score by 21.52 points over a lexical baseline alone.

PurdueNLP at SemEval-2017 Task 1: Predicting Semantic Textual Similarity with Paraphrase and Event Embeddings
I-Ta Lee | Mahak Goindani | Chang Li | Di Jin | Kristen Marie Johnson | Xiao Zhang | Maria Leonor Pacheco | Dan Goldwasser
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes our proposed solution for SemEval 2017 Task 1: Semantic Textual Similarity (Cer et al., 2017). The task aims at measuring the degree of equivalence between sentences given in English. Performance is evaluated by computing Pearson correlation scores between the predicted scores and human judgements. Our proposed system consists of two subsystems and one regression model for predicting STS scores. The two subsystems are designed to learn Paraphrase and Event Embeddings that bring paraphrasing characteristics and sentence structures into our system. The regression model combines these embeddings to make the final predictions. The experimental results show that our system achieves a Pearson correlation score of 0.8 on this task.

Ideological Phrase Indicators for Classification of Political Discourse Framing on Twitter
Kristen Johnson | I-Ta Lee | Dan Goldwasser
Proceedings of the Second Workshop on NLP and Computational Social Science

Politicians carefully word their statements in order to influence how others view an issue, a political strategy called framing. Simultaneously, these frames may also reveal the politician’s beliefs or positions on an issue. Simple language features such as unigrams, bigrams, and trigrams are important indicators for identifying the general frame of a text, for both longer congressional speeches and shorter tweets of politicians. However, tweets may contain multiple unigrams across different frames, which limits the effectiveness of this approach. In this paper, we present a joint model which uses both linguistic features of tweets and ideological phrase indicators extracted from a state-of-the-art embedding-based model to predict the general frame of political tweets.

Semi-supervised Structured Prediction with Neural CRF Autoencoder
Xiao Zhang | Yong Jiang | Hao Peng | Kewei Tu | Dan Goldwasser
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we propose an end-to-end neural CRF autoencoder (NCRF-AE) model for semi-supervised learning of sequential structured prediction problems. Our NCRF-AE consists of two parts: an encoder which is a CRF model enhanced by deep neural networks, and a decoder which is a generative model trying to reconstruct the input. Our model has a unified structure with different loss functions for labeled and unlabeled data with shared parameters. We developed a variation of the EM algorithm for optimizing both the encoder and the decoder simultaneously by decoupling their parameters. Our experimental results on the Part-of-Speech (POS) tagging task in eight different languages show that our model can outperform competitive systems in both supervised and semi-supervised scenarios.

2016

Adapting Event Embedding for Implicit Discourse Relation Recognition
Maria Leonor Pacheco | I-Ta Lee | Xiao Zhang | Abdullah Khan Zehady | Pranjal Daga | Di Jin | Ayush Parolia | Dan Goldwasser
Proceedings of the CoNLL-16 shared task

“All I know about politics is what I read in Twitter”: Weakly Supervised Models for Extracting Politicians’ Stances From Twitter
Kristen Johnson | Dan Goldwasser
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

During the 2016 United States presidential election, politicians have increasingly used Twitter to express their beliefs, stances on current political issues, and reactions concerning national and international events. Given the limited length of tweets and the scrutiny politicians face for what they choose or neglect to say, they must craft and time their tweets carefully. The content and delivery of these tweets are therefore highly indicative of a politician’s stances. We present a weakly supervised method for extracting how issues are framed and temporal activity patterns on Twitter for popular politicians and issues of the 2016 election. These behavioral components are combined into a global model which collectively infers the most likely stance and agreement patterns among politicians, with respective accuracies of 86.44% and 84.6% on average.

Better Together: Combining Language and Social Interactions into a Shared Representation
Yi-Yu Lai | Chang Li | Dan Goldwasser | Jennifer Neville
Proceedings of TextGraphs-10: the Workshop on Graph-based Methods for Natural Language Processing

Identifying Stance by Analyzing Political Discourse on Twitter
Kristen Johnson | Dan Goldwasser
Proceedings of the First Workshop on NLP and Computational Social Science

Introducing DRAIL – a Step Towards Declarative Deep Relational Learning
Xiao Zhang | Maria Leonor Pacheco | Chang Li | Dan Goldwasser
Proceedings of the Workshop on Structured Prediction for NLP

Understanding Satirical Articles Using Common-Sense
Dan Goldwasser | Xiao Zhang
Transactions of the Association for Computational Linguistics, Volume 4

Automatic satire detection is a subtle text classification task, for machines and, at times, even for humans. In this paper we argue that satire detection should be approached using common-sense inferences, rather than traditional text classification methods. We present a highly structured latent variable model capturing the required inferences. The model abstracts over the specific entities appearing in the articles, grouping them into generalized categories, thus allowing the model to adapt to previously unseen situations.

2014

Predicting Instructor’s Intervention in MOOC forums
Snigdha Chaturvedi | Dan Goldwasser | Hal Daumé III
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

“I Object!” Modeling Latent Pragmatic Effects in Courtroom Dialogues
Dan Goldwasser | Hal Daumé III
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Understanding MOOC Discussion Forums using Seeded LDA
Arti Ramesh | Dan Goldwasser | Bert Huang | Hal Daumé | Lise Getoor
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

2013

Leveraging Domain-Independent Information in Semantic Parsing
Dan Goldwasser | Dan Roth
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Predicting Structures in NLP: Constrained Conditional Models and Integer Linear Programming in NLP
Dan Goldwasser | Vivek Srikumar | Dan Roth
Tutorial Abstracts at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Proceedings of the Second Workshop on Semantic Interpretation in an Actionable Context
Dan Goldwasser | Regina Barzilay | Dan Roth
Proceedings of the Second Workshop on Semantic Interpretation in an Actionable Context

2011

Confidence Driven Unsupervised Semantic Parsing
Dan Goldwasser | Roi Reichart | James Clarke | Dan Roth
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Discriminative Learning over Constrained Latent Representations
Ming-Wei Chang | Dan Goldwasser | Dan Roth | Vivek Srikumar
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Driving Semantic Parsing from the World’s Response
James Clarke | Dan Goldwasser | Ming-Wei Chang | Dan Roth
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

2009

Reading to Learn: Constructing Features from Semantic Abstracts
Jacob Eisenstein | James Clarke | Dan Goldwasser | Dan Roth
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Unsupervised Constraint Driven Learning For Transliteration Discovery
Ming-Wei Chang | Dan Goldwasser | Dan Roth | Yuancheng Tu
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

Active Sample Selection for Named Entity Transliteration
Dan Goldwasser | Dan Roth
Proceedings of ACL-08: HLT, Short Papers

Transliteration as Constrained Optimization
Dan Goldwasser | Dan Roth
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing