Hinrik Hafsteinsson


2020

pdf bib
A Universal Dependencies Conversion Pipeline for a Penn-format Constituency Treebank
Þórunn Arnardóttir | Hinrik Hafsteinsson | Einar Freyr Sigurðsson | Kristín Bjarnadóttir | Anton Karl Ingason | Hildur Jónsdóttir | Steinþór Steingrímsson
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

The topic of this paper is a rule-based pipeline for converting constituency treebanks based on the Penn Treebank format to Universal Dependencies (UD). We describe an Icelandic constituency treebank, its annotation scheme and the UD scheme. The conversion is discussed, the methods used to deliver a fully automated UD corpus and complications involved. To show its applicability to corpora in different languages, we extend the pipeline and convert a Faroese constituency treebank to a UD corpus. The result is an open-source conversion tool, published under an Apache 2.0 license, applicable to a Penn-style treebank for conversion to a UD corpus, along with the two new UD corpora.