Subjecthood and annotation: The cases of French and Wolof

Olivier Bondéelle, Sylvain Kahane


Abstract
This article considers the annotation of subjects in UD treebanks. The identification of the subject poses a particular problem in Wolof, due to pronominal indices whose status as a pronoun or a pronominal affix is uncertain. In the UD treebank available for Wolof (Dione, 2019), these have been annotated, depending on the construction, either as true subjects or as morphosyntactic features agreeing with the verb. The study of this corpus of 40 000 words allows us to show that the problem is indeed difficult to solve, especially since Wolof has a rich system of auxiliaries and several basic constructions with different properties. Before addressing the case of Wolof, we will present the simpler, but partly comparable, case of French, where subject clitics also tend to behave like affixes, and subjecthood can move from the preverbal to the detached position. We will also make several annotation recommendations that would avoid overwriting information regarding subjecthood.
Anthology ID:
2020.udw-1.5
Volume:
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | UDW
Publisher:
Association for Computational Linguistics
Pages:
34–45
URL:
https://www.aclweb.org/anthology/2020.udw-1.5
PDF:
http://aclanthology.lst.uni-saarland.de/2020.udw-1.5.pdf