Learning from Relatives: Unified Dialectal Arabic Segmentation
Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer
Abstract
Arabic dialects share more than a common koiné: they also exhibit pan-dialectal linguistic phenomena that allow computational models for different dialects to learn from each other. In this paper we build a unified segmentation model in which the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We find that linguistic relatedness is contingent on geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.
- Anthology ID:
- K17-1043
- Volume:
- Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Venue:
- CoNLL
- SIG:
- SIGNLL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 432–441
- URL:
- https://www.aclweb.org/anthology/K17-1043
- DOI:
- 10.18653/v1/K17-1043
- PDF:
- http://aclanthology.lst.uni-saarland.de/K17-1043.pdf