From the attic to the cloud: mobilization of endangered language resources with linked data

Sebastian Nordhoff


Abstract
This paper describes a collection of 20k ELAN annotation files harvested from five different endangered language archives. The ELAN files form a very heterogeneous set, but the hierarchical configuration of their tiers, in conjunction with the tier content, makes it possible to identify transcriptions, translations, and glosses. These transcriptions, translations, and glosses are queryable across archives. Small analyses of graphemes (transcription tier), grammatical and lexical glosses (gloss tier), and semantic concepts (translation tier) show the viability of the approach. The use of identifiers from OLAC, Wikidata and Glottolog allows for a better integration of the data from these archives into the Linguistic Linked Open Data Cloud.
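As a rough illustration of what the "hierarchical configuration of tiers" refers to, the Python sketch below reads the tier hierarchy from an ELAN (.eaf) document. It relies only on the `TIER_ID` and `PARENT_REF` attributes of the EAF format. The sample document, the tier names, and the function name `tier_depths` are invented for illustration. This is not the paper's classification procedure, which also takes tier content into account.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced EAF document: one top-level tier
# and two tiers that depend on it via PARENT_REF.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="tx" LINGUISTIC_TYPE_REF="utterance"/>
  <TIER TIER_ID="gl" LINGUISTIC_TYPE_REF="gloss" PARENT_REF="tx"/>
  <TIER TIER_ID="ft" LINGUISTIC_TYPE_REF="translation" PARENT_REF="tx"/>
</ANNOTATION_DOCUMENT>"""


def tier_depths(eaf_xml):
    """Map each tier ID to its depth in the tier hierarchy (0 = top level)."""
    root = ET.fromstring(eaf_xml)
    # PARENT_REF is absent on top-level tiers, so .get() returns None there.
    parents = {t.get("TIER_ID"): t.get("PARENT_REF") for t in root.iter("TIER")}
    depths = {}
    for tier_id in parents:
        depth, current, seen = 0, tier_id, set()
        # Walk up the parent chain; `seen` guards against malformed cycles.
        while parents.get(current) is not None and current not in seen:
            seen.add(current)
            current = parents[current]
            depth += 1
        depths[tier_id] = depth
    return depths


if __name__ == "__main__":
    for tier, depth in tier_depths(SAMPLE_EAF).items():
        print("  " * depth + tier)
```

A depth map of this kind only captures the structural half of the picture. Deciding which tier is a transcription, a gloss, or a translation would additionally require looking at the annotation values themselves.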
Anthology ID:
2020.lr4sshoc-1.3
Volume:
Proceedings of the Workshop about Language Resources for the SSH Cloud
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LR4SSHOC | LREC | WS
Publisher:
European Language Resources Association
Pages:
10–18
Language:
English
URL:
https://www.aclweb.org/anthology/2020.lr4sshoc-1.3
PDF:
http://aclanthology.lst.uni-saarland.de/2020.lr4sshoc-1.3.pdf