Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format

Alina Wróblewska


Abstract
The paper presents PDBUD, the largest Polish Dependency Bank in Universal Dependencies format, with 22K trees and 352K tokens. PDBUD builds on its previous version, the Polish UD treebank (PL-SZ), and contains all 8K PL-SZ trees, which were checked and, where necessary, corrected for the current edition. A further 14K trees are automatically converted from a new version of the Polish Dependency Bank. The PDBUD trees are extended with enhanced edges that encode the shared dependents and shared governors of coordinated conjuncts, and with the semantic roles of some dependents. The evaluation experiments show that PDBUD is large enough for training a high-quality graph-based dependency parser for Polish.
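To illustrate what enhanced edges for coordination add on top of a basic tree, here is a minimal Python sketch. It is not the conversion procedure used for PDBUD; the `Token` type, the `enhanced_edges` function, and the `shared_dependents` argument are hypothetical names introduced for this example, and the English sentence is invented. The sketch assumes that the annotation already says which dependents of the first conjunct are shared, and it does not propagate the artificial root relation.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    head: int    # 0 marks the artificial root
    deprel: str


def enhanced_edges(tokens, shared_dependents=frozenset()):
    """Return (head, dependent, relation) edges: the basic tree plus
    edges for shared governors and shared dependents of conjuncts.

    `shared_dependents` holds ids of dependents of a first conjunct
    that are marked as shared (an assumption of this sketch).
    """
    by_id = {t.id: t for t in tokens}
    edges = {(t.head, t.id, t.deprel) for t in tokens}  # basic tree
    for t in tokens:
        if t.deprel != "conj":
            continue
        first = by_id[t.head]  # the first conjunct governs later ones
        # Shared governor: attach the conjunct to the first conjunct's head.
        if first.head != 0:
            edges.add((first.head, t.id, first.deprel))
        # Shared dependents: copy marked dependents of the first conjunct.
        for d in tokens:
            if d.head == first.id and d.id in shared_dependents:
                edges.add((t.id, d.id, d.deprel))
    return edges


# "She likes red apples and pears", reading "red" as modifying both nouns.
sentence = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "likes", 0, "root"),
    Token(3, "red", 4, "amod"),
    Token(4, "apples", 2, "obj"),
    Token(5, "and", 6, "cc"),
    Token(6, "pears", 4, "conj"),
]
edges = enhanced_edges(sentence, shared_dependents={3})
```

With these inputs the basic tree gains two edges: "pears" becomes a second object of "likes" (shared governor), and "red" also modifies "pears" (shared dependent). Passing the shared dependents explicitly reflects that whether a modifier applies to all conjuncts is an annotation decision and cannot be read off the basic tree alone.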
Anthology ID:
W18-6020
Volume:
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Month:
November
Year:
2018
Address:
Brussels, Belgium
Venues:
EMNLP | UDW | WS
Publisher:
Association for Computational Linguistics
Pages:
173–182
URL:
https://www.aclweb.org/anthology/W18-6020
DOI:
10.18653/v1/W18-6020
PDF:
http://aclanthology.lst.uni-saarland.de/W18-6020.pdf