Arijit Ghosh Chowdhury


2020

pdf bib
Augmenting NLP models using Latent Feature Interpolations
Amit Jindal | Arijit Ghosh Chowdhury | Aniket Didolkar | Di Jin | Ramit Sawhney | Rajiv Ratn Shah
Proceedings of the 28th International Conference on Computational Linguistics

Models with a large number of parameters are prone to over-fitting and often fail to capture the underlying input distribution. We introduce Emix, a data augmentation method that uses interpolations of word embeddings and hidden layer representations to construct virtual examples. We show that Emix shows significant improvements over previously used interpolation based regularizers and data augmentation techniques. We also demonstrate how our proposed method is more robust to sparsification. We highlight the merits of our proposed methodology by performing thorough quantitative and qualitative assessments.

2019

pdf bib
#YouToo? Detection of Personal Recollections of Sexual Harassment on Social Media
Arijit Ghosh Chowdhury | Ramit Sawhney | Rajiv Ratn Shah | Debanjan Mahata
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The availability of large-scale online social data, coupled with computational methods can help us answer fundamental questions relat- ing to our social lives, particularly our health and well-being. The #MeToo trend has led to people talking about personal experiences of harassment more openly. This work at- tempts to aggregate such experiences of sex- ual abuse to facilitate a better understanding of social media constructs and to bring about social change. It has been found that disclo- sure of abuse has positive psychological im- pacts. Hence, we contend that such informa- tion can leveraged to create better campaigns for social change by analyzing how users react to these stories and to obtain a better insight into the consequences of sexual abuse. We use a three part Twitter-Specific Social Media Lan- guage Model to segregate personal recollec- tions of sexual harassment from Twitter posts. An extensive comparison with state-of-the-art generic and specific models along with a de- tailed error analysis explores the merit of our proposed model.

pdf bib
ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic
Arijit Ghosh Chowdhury | Aniket Didolkar | Ramit Sawhney | Rajiv Ratn Shah
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

The rapid widespread of social media has lead to some undesirable consequences like the rapid increase of hateful content and offensive language. Religious Hate Speech, in particular, often leads to unrest and sometimes aggravates to violence against people on the basis of their religious affiliations. The richness of the Arabic morphology and the limited available resources makes this task especially challenging. The current state-of-the-art approaches to detect hate speech in Arabic rely entirely on textual (lexical and semantic) cues. Our proposed methodology contends that leveraging Community-Interaction can better help us profile hate speech content on social media. Our proposed ARHNet (Arabic Religious Hate Speech Net) model incorporates both Arabic Word Embeddings and Social Network Graphs for the detection of religious hate speech.

pdf bib
Speak up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment
Arijit Ghosh Chowdhury | Ramit Sawhney | Puneet Mathur | Debanjan Mahata | Rajiv Ratn Shah
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

The #MeToo movement is an ongoing prevalent phenomenon on social media aiming to demonstrate the frequency and widespread of sexual harassment by providing a platform to speak narrate personal experiences of such harassment. The aggregation and analysis of such disclosures pave the way to development of technology-based prevention of sexual harassment. We contend that the lack of specificity in generic sentence classification models may not be the best way to tackle text subtleties that intrinsically prevail in a classification task as complex as identifying disclosures of sexual harassment. We propose the Disclosure Language Model, a three part ULMFiT architecture, consisting of a Language model, a Medium-Specific (Twitter) model and a Task-Specific classifier to tackle this problem and create a manually annotated real-world dataset to test our technique on this, to show that using a Discourse Language Model often yields better classification performance over (i) Generic deep learning based sentence classification models (ii) existing models that rely on handcrafted stylistic features. An extensive comparison with state-of-the-art generic and specific models along with a detailed error analysis presents the case for our proposed methodology.