Semi-Supervised Iterative Approach for Domain-Specific Complaint Detection in Social Media

Akash Gautam, Debanjan Mahata, Rakesh Gosangi, Rajiv Ratn Shah


Abstract
In this paper, we present a semi-supervised bootstrapping approach to detect product- or service-related complaints in social media. Our approach begins with a small collection of annotated samples, which are used to identify a preliminary set of linguistic indicators pertinent to complaints. These indicators are then used to expand the dataset, and the expanded dataset is in turn used to extract more indicators. This process repeats until no new indicators can be found. We evaluated this approach on a Twitter corpus, specifically to detect complaints about transportation services. We started with an annotated set of 326 samples of transportation complaints, and after four iterations of the approach, we collected 2,840 indicators and over 3,700 tweets. We annotated a random sample of 700 tweets from the final dataset and observed that nearly half of the samples were actual transportation complaints. Lastly, we also studied how different features based on semantics, orthographic properties, and sentiment contribute to the prediction of complaints.
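The iterative loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how indicators are scored or how tweets are matched, so here an indicator is simply any non-stopword token seen in at least `min_df` samples of the current dataset, and a tweet joins the dataset if it contains any indicator. The names `tokenize`, `extract_indicators`, `bootstrap`, and `min_df` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption, not the paper's preprocessing).
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def extract_indicators(dataset, stopwords, min_df):
    # Count in how many samples each token occurs (document frequency).
    df = Counter(tok for text in dataset for tok in set(tokenize(text)))
    return {tok for tok, n in df.items() if n >= min_df and tok not in stopwords}


def bootstrap(seed, pool, stopwords, min_df=1, max_iters=10):
    dataset = list(seed)      # starts as the small annotated seed set
    remaining = list(pool)    # unlabeled tweets not yet added
    indicators = set()
    for _ in range(max_iters):
        new = extract_indicators(dataset, stopwords, min_df) - indicators
        if not new:
            break             # stop when no new indicators are found
        indicators |= new
        # Expand the dataset with tweets that contain any known indicator.
        matched = [t for t in remaining if indicators & set(tokenize(t))]
        remaining = [t for t in remaining if t not in matched]
        dataset.extend(matched)
    return indicators, dataset


# Toy data, invented purely for illustration.
seed = ["bus delayed again", "train delayed by an hour"]
pool = [
    "flight delayed and luggage lost",
    "lost my luggage at the station",
    "lovely sunny day",
]
stopwords = {"an", "by", "and", "my", "at", "the"}
indicators, dataset = bootstrap(seed, pool, stopwords)
```

On this toy data the second pool tweet is reached only through indicators learned from the first, which is the point of iterating. Matching on any single token is deliberately permissive and would admit many non-complaints on real data, so a stricter scoring rule or a higher `min_df` would be needed in practice.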
Anthology ID:
2020.ecnlp-1.7
Volume:
Proceedings of The 3rd Workshop on e-Commerce and NLP
Month:
July
Year:
2020
Address:
Seattle, WA, USA
Venues:
ACL | ECNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
46–53
URL:
https://www.aclweb.org/anthology/2020.ecnlp-1.7
DOI:
10.18653/v1/2020.ecnlp-1.7
PDF:
http://aclanthology.lst.uni-saarland.de/2020.ecnlp-1.7.pdf
Video:
http://slideslive.com/38931247