Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS

Wejdene Khiari, Mathieu Roche, Asma Bouhafs Hafsia


Abstract
With the explosive growth of online social media (forums, blogs, and social networks), exploitation of these new information sources has become essential. Our work is based on the sud4science project. The goal of this project is to perform multidisciplinary work on a corpus of authentic SMS, in French, collected in 2011 and anonymised (88milSMS corpus: http://88milsms.huma-num.fr). This paper highlights a new method to integrate opinion detection knowledge from an SMS corpus by combining lexical and semantic information. More precisely, our approach gives more weight to words with a sentiment (i.e. presence of words in a dedicated dictionary) for a classification task based on three classes: positive, negative, and neutral. The experiments were conducted on two corpora: an elongated SMS corpus (i.e. repetitions of characters in messages) and a non-elongated SMS corpus. We noted that non-elongated SMS were much better classified than elongated SMS. Overall, this study highlighted that the integration of semantic knowledge always improves classification.
Anthology ID:
L16-1188
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
1185–1189
Language:
URL:
https://www.aclweb.org/anthology/L16-1188
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/L16-1188.pdf