Dorottya Demszky


2020

GoEmotions: A Dataset of Fine-Grained Emotions
Dorottya Demszky | Dana Movshovitz-Attias | Jeongwoo Ko | Alan Cowen | Gaurav Nemade | Sujith Ravi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.

Pártélet: A Hungarian Corpus of Propaganda Texts from the Hungarian Socialist Era
Zoltán Kmetty | Veronika Vincze | Dorottya Demszky | Orsolya Ring | Balázs Nagy | Martina Katalin Szabó
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we present Pártélet, a digitized Hungarian corpus of Communist propaganda texts. Pártélet was the official journal of the governing party during the Hungarian socialist era, from 1956 to 1989, and hence represents the direct political agitation and propaganda of the dictatorial system in question. The paper has a dual purpose: first, to present a general review of the corpus compilation process and the basic statistical data of the corpus, and second, to demonstrate through two case studies what the dataset can be used for. We show that our corpus provides a unique opportunity for conducting research on Hungarian propaganda discourse, as well as for analyzing changes in this discourse over a 35-year period with computer-assisted methods.

Analyzing the Framing of 2020 Presidential Candidates in the News
Audrey Acken | Dorottya Demszky
Proceedings of the Fourth Widening Natural Language Processing Workshop

In this study, we apply NLP methods to learn about the framing of the 2020 Democratic Presidential candidates in news media. We use both a lexicon-based approach and word embeddings to analyze how candidates are discussed in news sources with different political leanings. Our results show significant differences in the framing of candidates across the news sources along several dimensions, such as sentiment and agency, paving the way for a deeper investigation.

2019

Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings
Dorottya Demszky | Nikhil Garg | Rob Voigt | James Zou | Jesse Shapiro | Matthew Gentzkow | Dan Jurafsky
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. We quantify these aspects with existing lexical methods, and propose clustering of tweet embeddings as a means to identify salient topics for analysis across events; human evaluations show that our approach generates more cohesive topics than traditional LDA-based models. We apply our methods to study 4.4M tweets on 21 mass shootings. We provide evidence that the discussion of these events is highly polarized politically and that this polarization is primarily driven by partisan differences in framing rather than topic choice. We identify framing devices, such as grounding and the contrasting use of the terms “terrorist” and “crazy”, that contribute to polarization. Results pertaining to topic choice, affect and illocutionary force suggest that Republicans focus more on the shooter and event-specific facts (news) while Democrats focus more on the victims and call for policy changes. Our work contributes to a deeper understanding of the way group divisions manifest in language and to computational methods for studying them.