Annohub – Annotation Metadata for Linked Data Applications

Frank Abromeit, Christian Fäth, Luis Glaser


Abstract
We introduce a new dataset for the Linguistic Linked Open Data (LLOD) cloud that will provide metadata about annotation and language information harvested from annotated language resources, such as corpora, that are freely available on the internet. To our knowledge, no metadata provider (e.g. linghub, datahub or CLARIN) has offered annotation metadata so far. Moreover, the language metadata found on such portals is rarely provided in machine-readable form, especially as Linked Data. In this paper, we describe the harvesting process, the content and structure of the new dataset, and its application in the Lin|gu|is|tik portal, a research platform for linguists. In addition, we introduce tools for converting XML-encoded language resources to the CoNLL format. The generated RDF data and the XML converter application are made public under an open license.
Anthology ID:
2020.ldl-1.6
Volume:
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LDL | LREC | WS
Publisher:
European Language Resources Association
Pages:
36–44
Language:
English
URL:
https://www.aclweb.org/anthology/2020.ldl-1.6
PDF:
http://aclanthology.lst.uni-saarland.de/2020.ldl-1.6.pdf