Daniela Goecke


Influence of Text Type and Text Length on Anaphoric Annotation
Daniela Goecke | Maik Stührenberg | Andreas Witt
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We report the results of a study that investigates the agreement of anaphoric annotations. The study focuses on the influence of two factors, text length and text type, on a corpus of scientific articles and newspaper texts. To measure inter-annotator agreement, we compare existing approaches and propose measuring each step of the annotation process separately instead of measuring only the resulting anaphoric relations. A total of 3,642 anaphoric relations were annotated in a corpus of 53,038 tokens (12,327 markables). The results of the study show that text type has more influence on inter-annotator agreement than text length. Furthermore, well-defined annotation instructions and coder training are crucial for obtaining good annotation results.


Web-based Annotation of Anaphoric Relations and Lexical Chains
Maik Stührenberg | Daniela Goecke | Nils Diewald | Alexander Mehler | Irene Cramer
Proceedings of the Linguistic Annotation Workshop


Exploiting logical document structure for anaphora resolution
Daniela Goecke | Andreas Witt
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

The aim of the paper is twofold. First, an approach is presented for selecting the correct antecedent for an anaphoric element according to the kind of text segments in which both of them occur. Basically, information on logical text structure (e.g. chapters, sections, paragraphs) is used to determine the antecedent life span of a linguistic expression, i.e. some linguistic expressions are more likely than others to be chosen as an antecedent throughout the whole text. In addition, an appropriate search scope for an anaphoric expression can be defined according to the document structuring elements that contain it. Corpus investigations suggest that logical text structure influences the search scope of candidates for antecedents. Second, a solution is presented for integrating the resources used for anaphora resolution. In this approach, multi-layered XML annotation is used to make a set of resources accessible to the anaphora resolution system.

Multidimensional markup and heterogeneous linguistic resources
Maik Stührenberg | Andreas Witt | Daniela Goecke | Dieter Metzing | Oliver Schonefeld
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing