Towards Coreference for Literary Text: Analyzing Domain-Specific Phenomena

Ina Roesiger, Sarah Schulz, Nils Reiter


Abstract
Coreference resolution is the task of grouping together references to the same discourse entity. Resolving coreference in literary texts could benefit a number of Digital Humanities (DH) tasks, such as analyzing the depiction of characters and/or their relations. Domain-dependent training data has shown to improve coreference resolution for many domains, e.g. the biomedical domain, as its properties differ significantly from news text or dialogue, on which automatic systems are typically trained. Literary texts could also benefit from corpora annotated with coreference. We therefore analyze the specific properties of coreference-related phenomena on a number of texts and give directions for the adaptation of annotation guidelines. As some of the adaptations have profound impact, we also present a new annotation tool for coreference, with a focus on enabling annotation of long texts with many discourse entities.
Anthology ID:
W18-4515
Volume:
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico
Venues:
COLING | LaTeCH | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
129–138
Language:
URL:
https://www.aclweb.org/anthology/W18-4515
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-4515.pdf