Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach

Fahad Albogamy, Allan Ramsay, Hanady Ahmed


Abstract
In this paper, we propose a “bootstrapping” method for constructing a dependency treebank of Arabic tweets. The method uses a rule-based parser to create a small treebank of one thousand Arabic tweets, then uses a data-driven parser, trained on that small treebank as a seed set, to create a larger treebank. We are able to create a dependency treebank from unlabelled tweets without any manual intervention. Experimental results show that this method can improve the speed of training the parser and the accuracy of the resulting parsers.
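The two-stage procedure described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function names (`bootstrap_treebank`, `toy_rule_parser`, `toy_train`), the tree representation, and the toy parsers are all hypothetical stand-ins chosen so that the example runs on its own.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Assumed representation: a tree is a list of (token, head) pairs,
# where head is a 1-based token position and 0 marks the root.
Tree = List[Tuple[str, int]]
Parser = Callable[[List[str]], Tree]


def bootstrap_treebank(
    seed_tweets: List[List[str]],
    unlabelled_tweets: List[List[str]],
    rule_parser: Parser,
    train: Callable[[List[Tree]], Parser],
) -> List[Tree]:
    """Build a larger treebank from a rule-parsed seed set (hypothetical sketch)."""
    # Stage 1: the rule-based parser produces the small seed treebank.
    seed_treebank = [rule_parser(tweet) for tweet in seed_tweets]
    # Stage 2: a data-driven parser is trained on the seed treebank...
    data_driven_parser = train(seed_treebank)
    # ...and used to parse the remaining unlabelled tweets.
    auto_trees = [data_driven_parser(tweet) for tweet in unlabelled_tweets]
    return seed_treebank + auto_trees


def toy_rule_parser(tokens: List[str]) -> Tree:
    # Toy rule (an assumption for illustration): the first token is the
    # root and every other token attaches to it.
    return [(tok, 0 if i == 0 else 1) for i, tok in enumerate(tokens)]


def toy_train(treebank: List[Tree]) -> Parser:
    # Toy "learner": find the most frequent non-root head position in the
    # training trees and attach every non-initial token to that position.
    heads = Counter(head for tree in treebank for _, head in tree if head != 0)
    best_head = heads.most_common(1)[0][0] if heads else 1

    def parser(tokens: List[str]) -> Tree:
        return [(tok, 0 if i == 0 else best_head) for i, tok in enumerate(tokens)]

    return parser


treebank = bootstrap_treebank(
    seed_tweets=[["a", "b", "c"]],
    unlabelled_tweets=[["x", "y"]],
    rule_parser=toy_rule_parser,
    train=toy_train,
)
```

In a real setting, `rule_parser` and `train` would be replaced by an actual rule-based parser and a trainable dependency parser; the sketch only shows how the seed treebank feeds the second stage.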
Anthology ID:
W17-1312
Volume:
Proceedings of the Third Arabic Natural Language Processing Workshop
Month:
April
Year:
2017
Address:
Valencia, Spain
Venues:
WANLP | WS
SIG:
SEMITIC
Publisher:
Association for Computational Linguistics
Pages:
94–99
URL:
https://www.aclweb.org/anthology/W17-1312
DOI:
10.18653/v1/W17-1312
PDF:
http://aclanthology.lst.uni-saarland.de/W17-1312.pdf