A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News

Corbèn Poot, Andreas van Cranenburgh


Abstract
We evaluate a rule-based (Lee et al., 2013) and a neural (Lee et al., 2018) coreference system on Dutch datasets from two domains: literary novels and news/Wikipedia text. The results provide insight into the relative strengths of data-driven and knowledge-driven systems, as well as the influence of domain, document length, and annotation schemes. The neural system performs best on news/Wikipedia text, while the rule-based system performs best on literature. The neural system shows weaknesses with limited training data and long documents, while the rule-based system is affected by annotation differences. The code and models used in this paper are available at https://github.com/andreasvc/crac2020
Anthology ID:
2020.crac-1.9
Volume:
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
December
Year:
2020
Address:
Barcelona, Spain (online)
Venues:
COLING | CRAC
Publisher:
Association for Computational Linguistics
Pages:
79–90
URL:
https://www.aclweb.org/anthology/2020.crac-1.9
PDF:
http://aclanthology.lst.uni-saarland.de/2020.crac-1.9.pdf