A Recursive Annotation Scheme for Referential Information Status

Arndt Riester, David Lorenz, Nina Seemann


Abstract
We provide a robust and detailed annotation scheme for information status, which is easy to use, follows a semantic rather than a cognitive motivation, and achieves reasonable inter-annotator agreement. Our annotation scheme rests on two main assumptions: first, that information status strongly depends on (in)definiteness, and second, that it ought to be understood as a property of referents rather than of words. Therefore, our scheme relies on overt (in)definiteness marking and provides different categories for definite and indefinite expressions. Definites are grouped according to the information source by which the referent is identified. A special aspect of the scheme is that non-anaphoric expressions (e.g. names) are classified according to whether their referents are likely to be known or unknown to an expected audience. The annotation scheme also provides a solution for annotating complex nominal expressions, which may recursively contain embedded expressions. In annotating a corpus of German radio news bulletins, we achieved a kappa score of .66 for the full scheme; a core scheme of six top-level categories yields kappa = .78.
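The abstract reports agreement as kappa scores for the full scheme and for a coarser core scheme, but it does not say which kappa variant was used. The following minimal sketch assumes Cohen's kappa for two annotators labelling the same set of referring expressions, and shows how agreement can be computed both on fine-grained labels and after collapsing them into top-level categories. The labels `X.1`, `X.2`, `Y`, `Z` and the helper `top_level` are hypothetical placeholders, not the scheme's actual category names.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def cohen_kappa(
    labels_a: Sequence[str],
    labels_b: Sequence[str],
    collapse: Optional[Callable[[str], str]] = None,
) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    If `collapse` is given, each label is first mapped to a coarser
    category (e.g. a top-level class) before agreement is computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    if collapse is not None:
        labels_a = [collapse(x) for x in labels_a]
        labels_b = [collapse(x) for x in labels_b]

    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one and the same label
    return (p_o - p_e) / (1 - p_e)


def top_level(label: str) -> str:
    """Hypothetical mapping to a top-level class: 'X.1' -> 'X'."""
    return label.split(".")[0]


# Toy annotations with hypothetical labels (not data from the corpus).
ann1 = ["X.1", "X.2", "Y", "Z", "X.1", "Y"]
ann2 = ["X.2", "X.2", "Y", "Z", "X.1", "Z"]

print(round(cohen_kappa(ann1, ann2), 3))                      # 0.571
print(round(cohen_kappa(ann1, ann2, collapse=top_level), 3))  # 0.739
```

In this toy data, merging the two `X` subcategories removes one disagreement, so kappa rises from 4/7 to 17/23; this need not hold for every data set, since collapsing also raises the expected chance agreement.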
Anthology ID:
L10-1528
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/764_Paper.pdf