Combining the output of two coreference resolution systems for two source languages to improve annotation projection

Yulia Grishina


Abstract
Although parallel coreference corpora could substantially support the development of SMT systems, no large-scale parallel datasets are available, owing to the complexity of the annotation task and the variability in annotation schemes. In this study, we use an annotation projection method to combine the output of two coreference resolution systems for two different source languages (English and German) in order to create an annotated corpus for a third language (Russian). We show that our technique is superior to projecting annotations from a single source language, and we provide an in-depth analysis of the projected annotations to assess the prospects of our approach.
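The abstract does not spell out how the two projections are combined, so the following is only a minimal sketch of the general idea, not the paper's actual method. It assumes that mentions are inclusive token spans, that word alignments are given as a mapping from source token indices to target token indices, and that the two projected chain sets are merged either by union or by intersection. All function names (`project_span`, `project_chains`, `combine_union`, `combine_intersection`) are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]            # inclusive (start, end) token indices
Alignment = Dict[int, List[int]]  # source token index -> target token indices


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map a source mention to the smallest target span covering its aligned tokens."""
    start, end = span
    targets = [t for s in range(start, end + 1) for t in alignment.get(s, [])]
    if not targets:
        return None  # no aligned tokens: the mention cannot be projected
    return (min(targets), max(targets))


def project_chains(chains: List[List[Span]], alignment: Alignment) -> List[Set[Span]]:
    """Project each coreference chain; drop chains left with fewer than two mentions."""
    projected = []
    for chain in chains:
        spans = {p for p in (project_span(m, alignment) for m in chain) if p is not None}
        if len(spans) >= 2:
            projected.append(spans)
    return projected


def combine_union(chains_a: List[Set[Span]], chains_b: List[Set[Span]]) -> List[Set[Span]]:
    """Assumed strategy 1: merge chains from either source that share a mention."""
    merged: List[Set[Span]] = []
    for chain in list(chains_a) + list(chains_b):
        current = set(chain)
        for other in [m for m in merged if m & current]:
            current |= other
            merged.remove(other)
        merged.append(current)
    return merged


def combine_intersection(chains_a: List[Set[Span]], chains_b: List[Set[Span]]) -> List[Set[Span]]:
    """Assumed strategy 2: keep only mentions that both sources place in the same chain."""
    result = []
    for a in chains_a:
        for b in chains_b:
            common = set(a) & set(b)
            if len(common) >= 2:
                result.append(common)
    return result
```

In this sketch, the union favors coverage (a mention found by either source system survives), while the intersection favors precision (only mentions confirmed by both survive). Which trade-off fits a given corpus is an empirical question that the sketch does not answer.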
Anthology ID:
W17-4809
Volume:
Proceedings of the Third Workshop on Discourse in Machine Translation
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venues:
DiscoMT | WS
Publisher:
Association for Computational Linguistics
Pages:
67–72
URL:
https://www.aclweb.org/anthology/W17-4809
DOI:
10.18653/v1/W17-4809
PDF:
http://aclanthology.lst.uni-saarland.de/W17-4809.pdf