Early Text Classification Using Multi-Resolution Concept Representations

Adrian Pastor López-Monroy, Fabio A. González, Manuel Montes, Hugo Jair Escalante, Thamar Solorio


Abstract
The intensive use of e-communications in everyday life has given rise to new threats and risks. When the vulnerable asset is the user, detecting these potential attacks before they cause serious damages is extremely important. This paper proposes a novel document representation to improve the early detection of risks in social media sources. The goal is to effectively identify the potential risk using as few text as possible and with as much anticipation as possible. Accordingly, we devise a Multi-Resolution Representation (MulR), which allows us to generate multiple “views” of the analyzed text. These views capture different semantic meanings for words and documents at different levels of detail, which is very useful in early scenarios to model the variable amounts of evidence. Intuitively, the representation captures better the content of short documents (very early stages) in low resolutions, whereas large documents (medium/large stages) are better modeled with higher resolutions. We evaluate the proposed ideas in two different tasks where anticipation is critical: sexual predator detection and depression detection. The experimental evaluation for these early tasks revealed that the proposed approach outperforms previous methodologies by a considerable margin.
Anthology ID:
N18-1110
Volume:
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Month:
June
Year:
2018
Address:
New Orleans, Louisiana
Venue:
NAACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1216–1225
Language:
URL:
https://www.aclweb.org/anthology/N18-1110
DOI:
10.18653/v1/N18-1110
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/N18-1110.pdf