Diana Steffen


Beyond Citations: Corpus-based Methods for Detecting the Impact of Research Outcomes on Society
Rezvaneh Rezapour | Jutta Bopp | Norman Fiedler | Diana Steffen | Andreas Witt | Jana Diesner
Proceedings of the 12th Language Resources and Evaluation Conference

This paper proposes, implements and evaluates a novel, corpus-based approach for identifying categories indicative of the impact of research via a deductive (top-down, from theory to data) and an inductive (bottom-up, from data to theory) approach. The resulting categorization schemes differ in substance. Research outcomes are typically assessed by using bibliometric methods, such as citation counts and patterns, or alternative metrics, such as references to research in the media. Both have shortcomings: bibliometrics cannot identify the impact of research beyond academia, and altmetrics do not consider text-based impact indicators beyond those that capture attention. We address these limitations by leveraging a mixed-methods approach for eliciting impact categories from experts and project personnel (deductive) and from texts (inductive). Using these categories, we label a corpus of project reports per category schema, and apply supervised machine learning to infer these categories from project reports. The classification results show that we can predict deductively and inductively derived impact categories with 76.39% and 78.81% accuracy (F1-score), respectively. Our approach can complement solutions from bibliometrics and scientometrics for assessing the impact of research and studying the scope and types of advancements transferred from academia to society.


A Corpus of Literal and Idiomatic Uses of German Infinitive-Verb Compounds
Andrea Horbach | Andrea Hensler | Sabine Krome | Jakob Prange | Werner Scholze-Stubenrecht | Diana Steffen | Stefan Thater | Christian Wellner | Manfred Pinkal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present an annotation study on a representative dataset of literal and idiomatic uses of German infinitive-verb compounds in newspaper and journal texts. Infinitive-verb compounds pose a challenge for writers of German, because spelling regulations are different for literal and idiomatic uses. Through the participation of expert lexicographers we were able to obtain a high-quality corpus resource which offers itself as a testbed for automatic idiomaticity detection and coarse-grained word-sense disambiguation. A classifier trained on the corpus was able to distinguish literal and idiomatic uses with an accuracy of 85%.


Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain
Paul Buitelaar | Diana Steffen | Martin Volk | Dominic Widdows | Bogdan Sacaleanu | Špela Vintar | Stanley Peters | Hans Uszkoreit
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)


Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical Documents using UMLS
Dominic Widdows | Stanley Peters | Scott Cederberg | Chiu-Ki Chan | Diana Steffen | Paul Buitelaar
Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine