Exploring Coreference Features in Heterogeneous Data

Ekaterina Lapshinova-Koltunski, Kerstin Kunz


Abstract
This paper focuses on variation phenomena in coreference chains. We address the hypothesis that the degree of structural variation between chain elements depends on language-specific constraints and preferences and, even more, on the communicative situation of language production. We define coreference features that also include reference to abstract entities and events. These features are inspired by several sources: cognitive parameters, pragmatic factors and typological status. We examine the distributions of these features in a dataset of English and German texts in spoken and written discourse modes, which can be classified into seven different registers. We apply text classification and feature selection to find out how these variational dimensions (language, mode and register) affect coreference features. Knowledge of the variation under analysis is valuable for contrastive linguistics, translation studies and multilingual natural language processing (NLP), e.g. machine translation or cross-lingual coreference resolution.
Anthology ID:
2020.codi-1.6
Volume:
Proceedings of the First Workshop on Computational Approaches to Discourse
Month:
November
Year:
2020
Address:
Online
Venues:
CODI | EMNLP
Publisher:
Association for Computational Linguistics
Pages:
53–64
URL:
https://www.aclweb.org/anthology/2020.codi-1.6
DOI:
10.18653/v1/2020.codi-1.6
PDF:
http://aclanthology.lst.uni-saarland.de/2020.codi-1.6.pdf