Automatic Alignment and Annotation Projection for Literary Texts

Uli Steinbach, Ines Rehbein


Abstract
This paper presents a modular NLP pipeline for the creation of a parallel literature corpus, followed by annotation transfer from the source to the target language. The test case we use to evaluate our pipeline is the automatic transfer of quote and speaker mention annotations from English to German. We evaluate the different components of the pipeline and discuss challenges specific to literary texts. Our experiments show that, after applying a reasonable amount of semi-automatic postprocessing, we can obtain high-quality aligned and annotated resources for a new language.
Anthology ID:
W19-2505
Volume:
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
June
Year:
2019
Address:
Minneapolis, USA
Venues:
LaTeCH | NAACL | WS
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
35–45
URL:
https://www.aclweb.org/anthology/W19-2505
DOI:
10.18653/v1/W19-2505
PDF:
http://aclanthology.lst.uni-saarland.de/W19-2505.pdf