A Fully Expanded Dependency Treebank for Telugu

Sneha Nallani, Manish Shrivastava, Dipti Sharma


Abstract
Treebanks are an essential resource for syntactic parsing. The available Paninian dependency treebank(s) for Telugu are annotated only with inter-chunk dependency relations, so not all words of a sentence are part of the parse tree. In this paper, we automatically annotate the intra-chunk dependencies in the treebank using a shift-reduce parser based on context-free grammar rules for Telugu chunks. We also propose a few additional intra-chunk dependency relations for Telugu beyond those used in the Hindi treebank. Annotating intra-chunk dependencies provides a complete parse tree for every sentence in the treebank. A fully expanded treebank is crucial for developing end-to-end parsers that produce complete trees. We present a fully expanded dependency treebank for Telugu consisting of 3220 sentences. We also convert the treebank from the Anncorra part-of-speech tagset to the latest BIS tagset, a hierarchical tagset adopted as a unified part-of-speech standard across all Indian languages. The final treebank is made publicly available.
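
The idea of rule-based shift-reduce expansion inside a chunk can be illustrated with a minimal sketch. It is not the authors' implementation: the rule table, the relation labels, the POS tags, and the romanized example words below are all assumptions chosen for illustration, and the abstract does not list the actual Telugu grammar rules. Each rule says which of two adjacent tokens is the head and which label the dependent receives.

```python
# Hypothetical rule table (an assumption, not taken from the paper):
# (left POS, right POS) -> (which side is the head, dependency label).
RULES = {
    ("JJ", "NN"): ("right", "nmod__adj"),
    ("INTF", "JJ"): ("right", "jjmod__intf"),
    ("NN", "PSP"): ("left", "lwg__psp"),
    ("VM", "VAUX"): ("left", "lwg__vaux"),
}


def parse_chunk(tokens):
    """Attach tokens inside one chunk using shift and reduce steps.

    tokens: list of (word, pos) pairs for a single chunk.
    Returns (stack, arcs): the indices left unattached (ideally just the
    chunk head) and a list of (dependent_index, head_index, label) arcs.
    """
    stack = []
    arcs = []
    for index in range(len(tokens)):
        stack.append(index)  # shift the next token onto the stack
        # Reduce while the top two stack items match a rule.
        while len(stack) >= 2:
            left, right = stack[-2], stack[-1]
            rule = RULES.get((tokens[left][1], tokens[right][1]))
            if rule is None:
                break
            head_side, label = rule
            if head_side == "right":
                arcs.append((left, right, label))
                del stack[-2]  # dependent leaves the stack, head stays
            else:
                arcs.append((right, left, label))
                stack.pop()  # dependent leaves the stack, head stays
    return stack, arcs


# Illustrative noun chunk: intensifier + adjective + noun.
noun_chunk = [("cAlA", "INTF"), ("maMci", "JJ"), ("pustakaM", "NN")]
noun_head, noun_arcs = parse_chunk(noun_chunk)

# Illustrative verb chunk: main verb + auxiliary.
verb_chunk = [("vacci", "VM"), ("uMdi", "VAUX")]
verb_head, verb_arcs = parse_chunk(verb_chunk)
```

In this sketch, the single index left on the stack is the chunk head, which is the token that would carry the chunk's existing inter-chunk relation in the expanded tree.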
Anthology ID:
2020.wildre-1.8
Volume:
Proceedings of the WILDRE5 – 5th Workshop on Indian Language Data: Resources and Evaluation
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
LREC | WILDRE | WS
Publisher:
European Language Resources Association (ELRA)
Pages:
39–44
Language:
English
URL:
https://www.aclweb.org/anthology/2020.wildre-1.8
PDF:
http://aclanthology.lst.uni-saarland.de/2020.wildre-1.8.pdf