Towards a Computational Lexicon for Moroccan Darija: Words, Idioms, and Constructions

Jamal Laoudi, Claire Bonial, Lucia Donatelli, Stephen Tratz, Clare Voss


Abstract
In this paper, we explore the challenges of building a computational lexicon for Moroccan Darija (MD), an Arabic dialect spoken by over 32 million people worldwide but which only recently has begun appearing frequently in written form in social media. We raise the question of what belongs in such a lexicon and start by describing our work building traditional word-level lexicon entries with their English translations. We then discuss challenges in translating idiomatic MD text that led to creating multi-word expression lexicon entries whose meanings could not be fully derived from the individual words. Finally, we provide a preliminary exploration of constructions to be considered for inclusion in an MD constructicon by translating examples of English constructions and examining their MD counterparts.
Anthology ID:
W18-4910
Volume:
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Venues:
COLING | LAW | MWE | WS
SIGs:
SIGLEX | SIGANN
Publisher:
Association for Computational Linguistics
Note:
Pages:
74–85
Language:
URL:
https://www.aclweb.org/anthology/W18-4910
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-4910.pdf