Supervised Event Coding from Text Written in Arabic: Introducing Hadath

Javier Osorio, Alejandro Reyes, Alejandro Beltrán, Atal Ahmadzai


Abstract
This article introduces Hadath, a supervised protocol for coding event data from Arabic text. Hadath contributes to recent efforts to advance multi-language event coding with computer-based methods. In this application, we extract event data on the conflict in Afghanistan from 2008 to 2018 using Arabic-language information sources. The implementation first uses a machine learning algorithm to identify news stories relevant to the Afghan conflict. Hadath then performs the natural language processing step, coding events from Arabic script. The output database contains daily, geo-referenced information at the district level on who did what to whom, when, and where in the Afghan conflict. The data help identify trends in the dynamics of violence, the provision of governance, and traditional conflict resolution in Afghanistan for different actors, over time, and across space.
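The two-stage workflow described in the abstract (relevance filtering, then coding who did what to whom, when, and where) can be illustrated with a minimal Python sketch. This is not Hadath's implementation: the keyword filter merely stands in for the machine learning classifier, and the dictionaries, the codes, the `Event` record, and the "first actor is the source, second actor is the target" rule are all hypothetical simplifications introduced for illustration. The example sentence and date are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy dictionaries (assumptions, not Hadath's actual dictionaries).
ACTORS: Dict[str, str] = {"طالبان": "TALIBAN", "الجيش الأفغاني": "AFG_ARMY"}
ACTIONS: Dict[str, str] = {"هاجمت": "ATTACK"}
LOCATIONS: Dict[str, str] = {"قندهار": "Kandahar", "كابول": "Kabul"}

# Stand-in for the relevance classifier: a plain keyword filter (assumption).
RELEVANT_TERMS = ("أفغانستان", "طالبان", "كابول", "قندهار")


@dataclass(frozen=True)
class Event:
    date: str                 # when
    source: str               # who
    action: str               # did what
    target: str               # to whom
    location: Optional[str]   # where


def is_relevant(text: str) -> bool:
    """Return True if the story mentions any conflict-related term."""
    return any(term in text for term in RELEVANT_TERMS)


def find_mentions(text: str, dictionary: Dict[str, str]) -> List[str]:
    """Return the codes of dictionary phrases found in text, in order of first appearance."""
    hits = []
    for phrase, code in dictionary.items():
        pos = text.find(phrase)
        if pos != -1:
            hits.append((pos, code))
    return [code for _, code in sorted(hits)]


def code_event(date: str, text: str) -> Optional[Event]:
    """Code one event from one sentence, or return None if nothing can be coded."""
    if not is_relevant(text):
        return None
    actors = find_mentions(text, ACTORS)
    actions = find_mentions(text, ACTIONS)
    places = find_mentions(text, LOCATIONS)
    # Naive assumption: the first actor mentioned is the source, the second the target.
    if len(actors) < 2 or not actions:
        return None
    return Event(date, actors[0], actions[0], actors[1], places[0] if places else None)


# Invented example sentence: "The Taliban attacked the Afghan army in Kandahar."
event = code_event("2015-06-01", "هاجمت طالبان الجيش الأفغاني في قندهار")
print(event)
```

A real system would replace the keyword filter with a trained classifier and resolve locations to district-level coordinates; the sketch only shows the shape of the output record.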
Anthology ID: 2020.aespen-1.9
Volume: Proceedings of the Workshop on Automated Extraction of Socio-political Events from News 2020
Month: May
Year: 2020
Address: Marseille, France
Venues: AESPEN | LREC | WS
Publisher: European Language Resources Association (ELRA)
Pages: 49–56
Language: English
URL: https://www.aclweb.org/anthology/2020.aespen-1.9
PDF: http://aclanthology.lst.uni-saarland.de/2020.aespen-1.9.pdf