David Jurgens


2020

Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science
David Bamman | Dirk Hovy | David Jurgens | Brendan O'Connor | Svitlana Volkova
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Condolence and Empathy in Online Communities
Naitian Zhou | David Jurgens
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Offering condolence is a natural reaction to hearing someone’s distress. Individuals frequently express distress in social media, where some communities can provide support. However, not all condolence is equal: trite responses offer little actual support despite their good intentions. Here, we develop computational tools to create a massive dataset of 11.4M expressions of distress and 2.8M corresponding offerings of condolence in order to examine the dynamics of condolence online. Our study reveals widespread disparity in what types of distress receive supportive condolence rather than just engagement. Building on studies from social psychology, we analyze the language of condolence and develop a new dataset for quantifying the empathy in a condolence using appraisal theory. Finally, we demonstrate that the features of condolence individuals find most helpful online differ substantially from those seen in interpersonal settings.

Quantifying Intimacy in Language
Jiaxin Pei | David Jurgens
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Intimacy is a fundamental aspect of how we relate to others in social settings. Language encodes the social information of intimacy through both topics and other more subtle cues (such as linguistic hedging and swearing). Here, we introduce a new computational framework for studying expressions of intimacy in language with an accompanying dataset and deep learning model for accurately predicting the intimacy level of questions (Pearson r = 0.87). Through analyzing a dataset of 80.5M questions across social media, books, and films, we show that individuals employ interpersonal pragmatic moves in their language to align their intimacy with social settings. Then, in three studies, we further demonstrate how individuals modulate their intimacy to match social norms around gender, social distance, and audience, each validating key findings from studies in social psychology. Our work demonstrates that intimacy is a pervasive and impactful social dimension of language.

2019

Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts
Luke Breitfeller | Emily Ahn | David Jurgens | Yulia Tsvetkov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Microaggressions are subtle, often veiled, manifestations of human biases. These uncivil interactions can have a powerful negative impact on people by marginalizing minorities and disadvantaged groups. The linguistic subtlety of microaggressions in communication has made it difficult for researchers to analyze their exact nature, and to quantify and extract microaggressions automatically. Specifically, the lack of a corpus of real-world microaggressions and of objective criteria for annotating them has prevented researchers from addressing these problems at scale. In this paper, we devise a general but nuanced, computationally operationalizable typology of microaggressions based on a small subset of data that we have. We then create two datasets: one with examples of diverse types of microaggressions recollected by their targets, and another with gender-based microaggressions in public conversations on social media. We introduce a new, more objective criterion for annotation and an active-learning-based procedure that increases the likelihood of surfacing posts containing microaggressions. Finally, we analyze the trends that emerge from these new datasets.

A Just and Comprehensive Strategy for Using NLP to Address Online Abuse
David Jurgens | Libby Hemphill | Eshwar Chandrasekharan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Online abusive behavior affects millions, and the NLP community has attempted to mitigate this problem by developing technologies to detect abuse. However, current methods have largely focused on a narrow definition of abuse, to the detriment of victims who seek both validation and solutions. In this position paper, we argue that the community needs to make three substantive changes: (1) expanding our scope of problems to tackle both more subtle and more serious forms of abuse, (2) developing proactive technologies that counter or inhibit abuse before it harms, and (3) reframing our effort within a framework of justice to promote healthy communities.

Wetin dey with these comments? Modeling Sociolinguistic Factors Affecting Code-switching Behavior in Nigerian Online Discussions
Innocent Ndubuisi-Obi | Sayan Ghosh | David Jurgens
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Multilingual individuals code switch between languages as a part of a complex communication process. However, most computational studies have examined only one or a handful of contextual factors predictive of switching. Here, we examine Naija-English code switching in a rich contextual environment to understand the social and topical factors eliciting a switch. We introduce a new corpus of 330K articles and accompanying 389K comments labeled for code switching behavior. In modeling whether a comment will switch, we show that topic-driven variation, tribal affiliation, emotional valence, and audience design all play complementary roles in behavior.

Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
Svitlana Volkova | David Jurgens | Dirk Hovy | David Bamman | Oren Tsur
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

2018

RtGender: A Corpus for Studying Differential Responses to Gender
Rob Voigt | David Jurgens | Vinodkumar Prabhakaran | Dan Jurafsky | Yulia Tsvetkov
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

It’s going to be okay: Measuring Access to Support in Online Communities
Zijian Wang | David Jurgens
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

People use online platforms to seek out support for their informational and emotional needs. Here, we ask what effect revealing one’s gender has on receiving support. To answer this, we create (i) a new dataset and method for identifying supportive replies and (ii) new methods for inferring gender from text and name. We apply these methods to create a new massive corpus of 102M online interactions with gender-labeled users, each rated by degree of supportiveness. Our analysis shows widespread and consistent disparity in support: identifying as a woman is associated with higher rates of support, but also higher rates of disparagement.

Measuring the Evolution of a Scientific Field through Citation Frames
David Jurgens | Srijan Kumar | Raine Hoover | Dan McFarland | Dan Jurafsky
Transactions of the Association for Computational Linguistics, Volume 6

Citations have long been used to characterize the state of a scientific field and to identify influential works. However, writers use citations for different purposes, and this varied purpose influences uptake by future scholars. Unfortunately, our understanding of how scholars use and frame citations has been limited to small-scale manual citation analysis of individual papers. We perform the largest behavioral study of citations to date, analyzing how scientific works frame their contributions through different types of citations and how this framing affects the field as a whole. We introduce a new dataset of nearly 2,000 citations annotated for their function, and use it to develop a state-of-the-art classifier and label the papers of an entire field: Natural Language Processing. We then show how differences in framing affect scientific uptake and reveal the evolution of the publication venues and the field as a whole. We demonstrate that authors are sensitive to discourse structure and publication venue when citing, and that how a paper frames its work through citations is predictive of the citation count it will receive. Finally, we use changes in citation framing to show that the field of NLP is undergoing a significant increase in consensus.

2017

Incorporating Dialectal Variability for Socially Equitable Language Identification
David Jurgens | Yulia Tsvetkov | Dan Jurafsky
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Language identification (LID) is a critical first step for processing multilingual text. Yet most LID systems are not designed to handle the linguistic diversity of global platforms like Twitter, where local dialects and rampant code-switching lead language classifiers to systematically miss minority dialect speakers and multilingual speakers. We propose a new dataset and a character-based sequence-to-sequence model for LID designed to support dialectal and multilingual language varieties. Our model achieves state-of-the-art performance on multiple LID benchmarks. Furthermore, in a case study using Twitter for health tracking, our method substantially increases the availability of texts written by underrepresented populations, enabling the development of “socially inclusive” NLP tools.

Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
Steven Bethard | Marine Carpuat | Marianna Apidianaki | Saif M. Mohammad | Daniel Cer | David Jurgens
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Proceedings of the Second Workshop on NLP and Computational Social Science
Dirk Hovy | Svitlana Volkova | David Bamman | David Jurgens | Brendan O’Connor | Oren Tsur | A. Seza Doğruöz
Proceedings of the Second Workshop on NLP and Computational Social Science

2016

Annotating Characters in Literary Corpora: A Scheme, the CHARLES Tool, and an Annotated Novel
Hardik Vala | Stefan Dimitrov | David Jurgens | Andrew Piper | Derek Ruths
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Characters form the focus of various studies of literary works, including social network analysis, archetype induction, and plot comparison. The recent rise in the computational modelling of literary works has produced a proportional rise in the demand for character-annotated literary corpora. However, automatically identifying characters is an open problem and there is low availability of literary texts with manually labelled characters. To address the latter problem, this work presents three contributions: (1) a comprehensive scheme for manually resolving mentions to characters in texts; (2) a novel collaborative annotation tool, CHARLES (CHAracter Resolution Label-Entry System), for character annotation and similar cross-document tagging tasks; and (3) the character annotations resulting from a pilot study on the novel Pride and Prejudice, demonstrating that the scheme and tool facilitate the efficient production of high-quality annotations. We expect this work to motivate the further production of annotated literary corpora to help meet the demand of the community.

Proceedings of the First Workshop on NLP and Computational Social Science
David Bamman | A. Seza Doğruöz | Jacob Eisenstein | Dirk Hovy | David Jurgens | Brendan O’Connor | Alice Oh | Oren Tsur | Svitlana Volkova
Proceedings of the First Workshop on NLP and Computational Social Science

Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)
Steven Bethard | Marine Carpuat | Daniel Cer | David Jurgens | Preslav Nakov | Torsten Zesch
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

SemEval-2016 Task 14: Semantic Taxonomy Enrichment
David Jurgens | Mohammad Taher Pilehvar
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Mr. Bennet, his coachman, and the Archbishop walk into a bar but only one of them gets recognized: On The Difficulty of Detecting Characters in Literary Texts
Hardik Vala | David Jurgens | Andrew Piper | Derek Ruths
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Semantic Similarity Frontiers: From Concepts to Documents
David Jurgens | Mohammad Taher Pilehvar
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Semantic similarity forms a central component in many NLP systems, from lexical semantics, to part of speech tagging, to social media analysis. Recent years have seen a renewed interest in developing new similarity techniques, buoyed in part by work on embeddings and by SemEval tasks in Semantic Textual Similarity and Cross-Level Semantic Similarity. The increased interest has led to hundreds of techniques for measuring semantic similarity, which makes it difficult for practitioners to identify which state-of-the-art techniques are applicable and easily integrated into projects and for researchers to identify which aspects of the problem require future research. This tutorial synthesizes the current state of the art for measuring semantic similarity for all types of conceptual or textual pairs and presents a broad overview of current techniques, what resources they use, and the particular inputs or domains to which the methods are most applicable. We survey methods ranging from corpus-based approaches operating on massive or domain-specific corpora to those leveraging structural information from expert-based or collaboratively-constructed lexical resources. Furthermore, we review work on multiple similarity tasks from sense-based comparisons to word, sentence, and document-sized comparisons and highlight general-purpose methods capable of comparing multiple types of inputs. Where possible, we also identify techniques that have been demonstrated to successfully operate in multilingual or cross-lingual settings. Our tutorial provides a clear overview of currently-available tools and their strengths for practitioners who need out-of-the-box solutions and provides researchers with an understanding of the limitations of the current state of the art and what open problems remain in the field.
Given the breadth of available approaches, participants will also receive a detailed bibliography of approaches (including those not directly covered in the tutorial), annotated according to the approaches' abilities, and pointers to where open-source implementations of the algorithms may be obtained.

Reserating the awesometastic: An automatic extension of the WordNet taxonomy for novel terms
David Jurgens | Mohammad Taher Pilehvar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Reading Between the Lines: Overcoming Data Sparsity for Accurate Classification of Lexical Relationships
Silvia Necşulescu | Sara Mendes | David Jurgens | Núria Bel | Roberto Navigli
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
Preslav Nakov | Torsten Zesch | Daniel Cer | David Jurgens
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

SemEval-2014 Task 3: Cross-Level Semantic Similarity
David Jurgens | Mohammad Taher Pilehvar | Roberto Navigli
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose
Daniele Vannella | David Jurgens | Daniele Scarfini | Domenico Toscani | Roberto Navigli
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Twitter Users #CodeSwitch Hashtags! #MoltoImportante #wow
David Jurgens | Stefan Dimitrov | Derek Ruths
Proceedings of the First Workshop on Computational Approaches to Code Switching

It’s All Fun and Games until Someone Annotates: Video Games with a Purpose for Linguistic Annotation
David Jurgens | Roberto Navigli
Transactions of the Association for Computational Linguistics, Volume 2

Annotated data is a prerequisite for many NLP applications. Acquiring large-scale annotated corpora is a major bottleneck, requiring significant time and resources. Recent work has proposed turning annotation into a game to increase its appeal and lower its cost; however, current games are largely text-based and closely resemble traditional annotation tasks. We propose a new linguistic annotation paradigm that produces annotations from playing graphical video games. The effectiveness of this design is demonstrated using two video games: one to create a mapping from WordNet senses to images, and a second game that performs Word Sense Disambiguation. Both games produce accurate results. The first game yields annotation quality equal to that of experts and a cost reduction of 73% over equivalent crowdsourcing; the second game provides a 16.3% improvement in accuracy over current state-of-the-art sense disambiguation games with WordNet.

An analysis of ambiguity in word sense annotations
David Jurgens
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Word sense annotation is a challenging task where annotators distinguish which meaning of a word is present in a given context. In some contexts, a word usage may elicit multiple interpretations, resulting either in annotators disagreeing or in allowing the usage to be annotated with multiple senses. While some works have allowed the latter, the extent to which multiple sense annotations are needed has not been assessed. The present work analyzes a dataset of instances annotated with multiple WordNet senses to assess the causes of the multiple interpretations and their relative frequencies, along with the effect of the multiple senses on the contextual interpretation. We show that contextual underspecification is the primary cause of multiple interpretations but that syllepsis still accounts for more than a third of the cases. In addition, we show that sense coarsening can only partially remove the need for labeling instances with multiple senses and we provide suggestions for how future sense annotation guidelines might be developed to account for this need.

2013

Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity
Mohammad Taher Pilehvar | David Jurgens | Roberto Navigli
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels
David Jurgens
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

SemEval-2013 Task 12: Multilingual Word Sense Disambiguation
Roberto Navigli | David Jurgens | Daniele Vannella
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses
David Jurgens | Ioannis Klapaftis
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

An Evaluation of Graded Sense Disambiguation using Word Sense Induction
David Jurgens
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

SemEval-2012 Task 2: Measuring Degrees of Relational Similarity
David Jurgens | Saif Mohammad | Peter Turney | Keith Holyoak
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Word Sense Induction by Community Detection
David Jurgens
Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing

Measuring the Impact of Sense Similarity on Word Sense Induction
David Jurgens | Keith Stevens
Proceedings of the First workshop on Unsupervised Learning in NLP

2010

Capturing Nonlinear Structure in Word Spaces through Dimensionality Reduction
David Jurgens | Keith Stevens
Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics

The S-Space Package: An Open Source Package for Word Space Models
David Jurgens | Keith Stevens
Proceedings of the ACL 2010 System Demonstrations

HERMIT: Flexible Clustering for the SemEval-2 WSI Task
David Jurgens | Keith Stevens
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Event Detection in Blogs using Temporal Random Indexing
David Jurgens | Keith Stevens
Proceedings of the Workshop on Events in Emerging Text Types