Endangered Languages meet Modern NLP

Antonios Anastasopoulos, Christopher Cox, Graham Neubig, Hilaria Cruz


Abstract
This tutorial will focus on NLP for endangered language documentation and revitalization. First, we will acquaint attendees with the process and challenges of language documentation, showing how the needs of language communities and documentary linguists map to specific NLP tasks. We will then present the state of the art in NLP applied to this particularly challenging setting (extremely low-resource datasets, noisy transcriptions, limited annotations, non-standard orthographies). In doing so, we will also analyze the challenges of working in this domain and expand on both the capabilities and the limitations of current NLP approaches. Our ultimate goal is to motivate more NLP practitioners to work in this important direction, and to provide them with the tools and an understanding of the limitations and challenges, both of which are needed to have an impact.
Anthology ID:
2020.coling-tutorials.7
Volume:
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee for Computational Linguistics
Pages:
39–45
URL:
https://www.aclweb.org/anthology/2020.coling-tutorials.7
PDF:
http://aclanthology.lst.uni-saarland.de/2020.coling-tutorials.7.pdf