Unsupervised Stemmer for Arabic Tweets

Fahad Albogamy, Allan Ramsay


Abstract
Stemming, which reduces words to their stems, is an essential processing step in a wide range of high-level text processing applications such as information extraction, machine translation, and sentiment analysis. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although Arabic tweets and MSA are closely related and share many characteristics, they differ substantially in lexicon and syntax. In this paper, we introduce a light stemmer for Arabic tweets. Our results show that it outperforms a number of well-known stemmers for Arabic.
Anthology ID: W16-3912
Volume: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Month: December
Year: 2016
Address: Osaka, Japan
Venues: WNUT | WS
Publisher: The COLING 2016 Organizing Committee
Pages: 78–84
URL: https://www.aclweb.org/anthology/W16-3912
PDF: http://aclanthology.lst.uni-saarland.de/W16-3912.pdf