Aparna Garimella


2020

Understanding and Explicitly Measuring Linguistic and Stylistic Properties of Deception via Generation and Translation
Emily Saldanha | Aparna Garimella | Svitlana Volkova
Proceedings of the 13th International Conference on Natural Language Generation

Massive digital disinformation is one of the main risks of modern society. Hundreds of models and linguistic analyses have been done to compare and contrast misleading and credible content online. However, most models do not remove the confounding factor of a topic or narrative when training, so the resulting models learn a clear topical separation for misleading versus credible content. We study the feasibility of using two strategies to disentangle the topic bias from the models to understand and explicitly measure linguistic and stylistic properties of content from misleading versus credible content. First, we develop conditional generative models to create news content that is characteristic of different credibility levels. We perform multi-dimensional evaluation of model performance on mimicking both the style and linguistic differences that distinguish news of different credibility using machine translation metrics and classification models. We show that even though generative models are able to imitate both the style and language of the original content, additional conditioning on both the news category and the topic leads to reduced performance. In a second approach, we perform deception style “transfer” by translating deceptive content into the style of credible content and vice versa. Extending earlier studies, we demonstrate that, when conditioned on a topic, deceptive content is shorter, less readable, more biased, and more subjective than credible content, and transferring the style from deceptive to credible content is more challenging than the opposite direction.

“Judge me by my size (noun), do you?” YodaLib: A Demographic-Aware Humor Generation Framework
Aparna Garimella | Carmen Banea | Nabil Hossain | Rada Mihalcea
Proceedings of the 28th International Conference on Computational Linguistics

The subjective nature of humor makes computerized humor generation a challenging task. We propose an automatic humor generation framework for filling the blanks in Mad Libs® stories, while accounting for the demographic backgrounds of the desired audience. We collect a dataset consisting of such stories, which are filled in and judged by carefully selected workers on Amazon Mechanical Turk. We build upon the BERT platform to predict location-biased word fillings in incomplete sentences, and we fine-tune BERT to classify location-specific humor in a sentence. We leverage these components to produce YodaLib, a fully-automated Mad Libs style humor generation framework, which selects and ranks appropriate candidate words and sentences in order to generate a coherent and funny story tailored to certain demographics. Our experimental results indicate that YodaLib outperforms a previous semi-automated approach proposed for this task, while also surpassing human annotators in both qualitative and quantitative analyses.

2019

Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing
Aparna Garimella | Carmen Banea | Dirk Hovy | Rada Mihalcea
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Several linguistic studies have shown the prevalence of various lexical and grammatical patterns in texts authored by a person of a particular gender, but models for part-of-speech tagging and dependency parsing have still not adapted to account for these differences. To address this, we annotate the Wall Street Journal part of the Penn Treebank with the gender information of the articles’ authors, and build taggers and parsers trained on this data that show performance differences in text written by men and women. Further analyses reveal numerous part-of-speech tags and syntactic relations whose prediction performances benefit from the prevalence of a specific gender in the training data. The results underscore the importance of accounting for gendered differences in syntactic tasks, and outline future avenues for developing more accurate taggers and parsers. We release our data to the research community.

2017

Demographic-aware word associations
Aparna Garimella | Carmen Banea | Rada Mihalcea
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Variations of word associations across different groups of people can provide insights into people’s psychologies and their world views. To capture these variations, we introduce the task of demographic-aware word associations. We build a new gold standard dataset consisting of word association responses for approximately 300 stimulus words, collected from more than 800 respondents of different gender (male/female) and from different locations (India/United States), and show that there are significant variations in the word associations made by these groups. We also introduce a new demographic-aware word association model based on a neural net skip-gram architecture, and show how computational methods for measuring word associations that specifically account for writer demographics can outperform generic methods that are agnostic to such information.

2016

Identifying Cross-Cultural Differences in Word Usage
Aparna Garimella | Rada Mihalcea | James Pennebaker
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Personal writings have inspired researchers in the fields of linguistics and psychology to study the relationship between language and culture to better understand the psychology of people across different cultures. In this paper, we explore this relation by developing cross-cultural word models to identify words with cultural bias – i.e., words that are used in significantly different ways by speakers from different cultures. Focusing specifically on two cultures, the United States and Australia, we identify a set of words with significant usage differences, and further investigate these words through feature analysis and topic modeling, shedding light on the attributes of language that contribute to these differences.

Zooming in on Gender Differences in Social Media
Aparna Garimella | Rada Mihalcea
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Men are from Mars and women are from Venus - or so the genre of relationship literature would have us believe. But there is some truth in this idea, and researchers in fields as diverse as psychology, sociology, and linguistics have explored ways to better understand the differences between genders. In this paper, we take another look at the problem of gender discrimination and attempt to move beyond the typical surface-level text classification approach, by (1) identifying semantic and psycholinguistic word classes that reflect systematic differences between men and women and (2) finding differences between genders in the ways they use the same words. We describe several experiments and report results on a large collection of blogs authored by men and women.