Günter Neumann

Also published as: Gunter Neumann, Guenter Neumann


Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus
Eleni Metheniti | Guenter Neumann
Proceedings of the 12th Language Resources and Evaluation Conference

Multilingual, inflectional corpora are a scarce resource in the NLP community, especially corpora with annotated morpheme boundaries. We evaluate a generated, multilingual inflectional corpus with morpheme boundaries, generated from the English Wiktionary (Metheniti and Neumann, 2018), against the largest, multilingual, high-quality inflectional corpus of the UniMorph project (Kirov et al., 2018). We confirm that the generated Wikinflection corpus is not of the same quality as UniMorph, but we were able to extract a significant number of words from the intersection of the two corpora. Our Wikinflection corpus benefits from the morpheme segmentations of Wiktionary/Wikinflection and from the manually-evaluated morphological feature tags of the UniMorph project, and has 216K lemmas and 5.4M word forms, in a total of 68 languages.

A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction
Saadullah Amin | Katherine Ann Dunfield | Anna Vechkaeva | Guenter Neumann
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Fact triples are a common form of structured knowledge used within the biomedical domain. As the amount of unstructured scientific texts continues to grow, manual annotation of these texts for the task of relation extraction becomes increasingly expensive. Distant supervision offers a viable approach to combat this by quickly producing large amounts of labeled, but considerably noisy, data. We aim to reduce such noise by extending an entity-enriched relation classification BERT model to the problem of multiple instance learning, and defining a simple data encoding scheme that significantly reduces noise, reaching state-of-the-art performance for distantly-supervised biomedical relation extraction. Our approach further encodes knowledge about the direction of relation triples, allowing for increased focus on relation learning by reducing noise and alleviating the need for joint learning with knowledge graph completion.

CopyBERT: A Unified Approach to Question Generation with Self-Attention
Stalin Varanasi | Saadullah Amin | Guenter Neumann
Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI

Contextualized word embeddings provide better initialization for neural networks that deal with various natural language understanding (NLU) tasks, including Question Answering (QA) and, more recently, Question Generation (QG). Apart from providing meaningful word representations, pre-trained transformer models (Vaswani et al., 2017), such as BERT (Devlin et al., 2019), also provide self-attentions which encode syntactic information that can be probed for dependency parsing (Hewitt and Manning, 2019) and POS tagging (Coenen et al., 2019). In this paper, we show that the information from the self-attentions of BERT is useful for language modeling of questions conditioned on paragraph and answer phrases. To control the attention span, we use a semi-diagonal mask and utilize a shared model for encoding and decoding, unlike sequence-to-sequence. We further employ a copy mechanism over self-attentions to achieve state-of-the-art results for Question Generation on SQuAD v1.1 (Rajpurkar et al., 2016).


Team DOMLIN: Exploiting Evidence Enhancement for the FEVER Shared Task
Dominik Stammbach | Guenter Neumann
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

This paper contains our system description for the second Fact Extraction and VERification (FEVER) challenge. We propose a two-staged sentence selection strategy to account for examples in the dataset where evidence is not only conditioned on the claim, but also on previously retrieved evidence. We use a publicly available document retrieval module and have fine-tuned BERT checkpoints for sentence selection and as the entailment classifier. We report a FEVER score of 68.46% on the blind test set.

Identifying Grammar Rules for Language Education with Dependency Parsing in German
Eleni Metheniti | Pomi Park | Kristina Kolesova | Günter Neumann
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

DOMLIN at SemEval-2019 Task 8: Automated Fact Checking exploiting Ratings in Community Question Answering Forums
Dominik Stammbach | Stalin Varanasi | Guenter Neumann
Proceedings of the 13th International Workshop on Semantic Evaluation

In the following, we describe our system developed for SemEval-2019 Task 8. We fine-tuned a BERT checkpoint on the Qatar Living forum dump and used this checkpoint to train a number of models. Our submission for subtask A consists of a classifier fine-tuned from this BERT checkpoint. For subtask B, we first have a classifier decide whether a comment is factual or non-factual. If it is factual, we retrieve intra-forum evidence and, using this evidence, have a classifier decide the comment’s veracity. We trained this classifier on ratings which we crawled from qatarliving.com.


LightRel at SemEval-2018 Task 7: Lightweight and Fast Relation Classification
Tyler Renslow | Günter Neumann
Proceedings of The 12th International Workshop on Semantic Evaluation

We present LightRel, a lightweight and fast relation classifier. Our goal is to develop a high baseline for different relation extraction tasks. By defining only very few data-internal, word-level features and external knowledge sources in the form of word clusters and word embeddings, we train a fast and simple linear classifier.

An Interactive Web-Interface for Visualizing the Inner Workings of the Question Answering LSTM
Ekaterina Loginova | Günter Neumann
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present a visualisation tool which aims to illuminate the inner workings of an LSTM model for question answering. It plots heatmaps of neurons’ firings and allows a user to check the dependency between neurons and manual features. The system possesses an interactive web-interface and can be adapted to other models and domains.

How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?
Georg Heigold | Stalin Varanasi | Günter Neumann | Josef van Genabith
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Code-Mixed Question Answering Challenge: Crowd-sourcing Data and Techniques
Khyathi Chandu | Ekaterina Loginova | Vishal Gupta | Josef van Genabith | Günter Neumann | Manoj Chinnakotla | Eric Nyberg | Alan W. Black
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

Code-Mixing (CM) is the phenomenon of alternating between two or more languages which is prevalent in bi- and multi-lingual communities. Most NLP applications today are still designed with the assumption of a single interaction language and are most likely to break given a CM utterance with multiple languages mixed at a morphological, phrase or sentence level. For example, popular commercial search engines do not yet fully understand the intents expressed in CM queries. As a first step towards fostering research which supports CM in NLP applications, we systematically crowd-sourced and curated an evaluation dataset for factoid question answering in three CM languages - Hinglish (Hindi+English), Tenglish (Telugu+English) and Tamlish (Tamil+English) which belong to two language families (Indo-Aryan and Dravidian). We share the details of our data collection process, techniques which were used to avoid inducing lexical bias amongst the crowd workers and other CM specific linguistic properties of the dataset. Our final dataset, which is available freely for research purposes, has 1,694 Hinglish, 2,848 Tamlish and 1,391 Tenglish factoid questions and their answers. We discuss the techniques used by the participants for the first edition of this ongoing challenge.


An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages
Georg Heigold | Guenter Neumann | Josef van Genabith
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive as they can handle rare and unseen words gracefully. We evaluate on 14 languages and observe consistent gains over a state-of-the-art morphological tagger across all languages except for English and French, where we match the state-of-the-art. We compare two architectures for computing character-based word vectors using recurrent (RNN) and convolutional (CNN) nets. We show that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach. Small but systematic gains are observed when combining the two architectures by ensembling.


An analysis of textual inference in German customer emails
Kathrin Eichler | Aleksandra Gabryszak | Günter Neumann
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

The Excitement Open Platform for Textual Inferences
Bernardo Magnini | Roberto Zanoli | Ido Dagan | Kathrin Eichler | Guenter Neumann | Tae-Gil Noh | Sebastian Pado | Asher Stern | Omer Levy
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations


Exploiting User Search Sessions for the Semantic Categorization of Question-like Informational Search Queries
Alejandro Figueroa | Guenter Neumann
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Design and Realization of the EXCITEMENT Open Platform for Textual Entailment
Günter Neumann | Sebastian Padó
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora


An Adaptive Framework for Named Entity Combination
Bogdan Sacaleanu | Günter Neumann
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We have developed a new OSGi-based platform for Named Entity Recognition (NER) which uses a voting strategy to combine the results produced by several existing NER systems (currently OpenNLP, LingPipe and Stanford). The different NER systems have been systematically decomposed and modularized into the same pipeline of preprocessing components in order to support a flexible selection and ordering of the NER processing flow. This highly modular and component-based design makes it possible to set up different constellations of chained processing steps, including alternative voting strategies for combining the results of components running in parallel.
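
The abstract above does not say which voting strategy the platform uses. As a rough illustration only, the sketch below assumes a plain per-token majority vote over the label sequences of several systems, with ties falling back to the outside label "O". The function names and the toy label sequences are hypothetical and do not call OpenNLP, LingPipe or Stanford NER.

```python
from collections import Counter


def majority_vote(labels, default="O"):
    """Return the label proposed by most systems; a tie for first place yields `default`."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default
    return counts[0][0]


def combine(predictions):
    """Combine equal-length label sequences (one per system) token by token."""
    return [majority_vote(token_labels) for token_labels in zip(*predictions)]


# Toy outputs of three hypothetical NER systems for the same three tokens.
system_a = ["PER", "O", "LOC"]
system_b = ["PER", "O", "ORG"]
system_c = ["O", "O", "LOC"]

combined = combine([system_a, system_b, system_c])
print(combined)
```

Swapping `majority_vote` for another function with the same signature (for example, a weighted vote) would correspond to the "alternative voting strategies" the abstract mentions, again under the assumptions stated here.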

Parsing Hindi with MDParser
Alexander Volokh | Günter Neumann
Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages


Automatic Detection and Correction of Errors in Dependency Treebanks
Alexander Volokh | Günter Neumann
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
Günter Neumann | Sven Schmeier
Proceedings of the ACL-HLT 2011 System Demonstrations


DFKI KeyWE: Ranking Keyphrases Extracted from Scientific Articles
Kathrin Eichler | Günter Neumann
Proceedings of the 5th International Workshop on Semantic Evaluation

372: Comparing the Benefit of Different Dependency Parsers for Textual Entailment Using Syntactic Constraints Only
Alexander Volokh | Günter Neumann
Proceedings of the 5th International Workshop on Semantic Evaluation


Unsupervised Relation Extraction From Web Documents
Kathrin Eichler | Holmer Hemsen | Günter Neumann
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The IDEX system is a prototype of an interactive dynamic Information Extraction (IE) system. A user of the system expresses an information request in the form of a topic description, which is used for an initial search in order to retrieve a relevant set of documents. On the basis of this set of documents, the system performs unsupervised relation extraction and clustering. The results of these operations can then be interactively inspected by the user. In this paper we describe the relation extraction and clustering components of the IDEX system. We present preliminary evaluation results for these components and give an overview of possible enhancements to improve them.

The QALL-ME Benchmark: a Multilingual Resource of Annotated Spoken Requests for Question Answering
Elena Cabrio | Milen Kouylekov | Bernardo Magnini | Matteo Negri | Laura Hasler | Constantin Orasan | David Tomás | Jose Luis Vicedo | Guenter Neumann | Corinna Weber
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the QALL-ME benchmark, a multilingual resource of annotated spoken requests in the tourism domain, freely available for research purposes. The languages currently involved in the project are Italian, English, Spanish and German. It introduces a semantic annotation scheme for spoken information access requests, specifically derived from Question Answering (QA) research. In addition to pragmatic and semantic annotations, we propose three QA-based annotation levels: the Expected Answer Type, the Expected Answer Quantifier and the Question Topical Target of a request, to fully capture the content of a request and extract the sought-after information. The QALL-ME benchmark is developed under the EU-FP6 QALL-ME project, which aims at the realization of a shared and distributed infrastructure for Question Answering (QA) systems on mobile devices (e.g. mobile phones). Questions are formulated by the users in free natural language input, and the system returns the actual sequence of words which constitutes the answer from a collection of information sources (e.g. documents, databases). Within this framework, the benchmark has the twofold purpose of training machine learning based applications for QA, and testing their actual performance with a rapid turnaround in a controlled laboratory setting.

A Puristic Approach for Joint Dependency Parsing and Semantic Role Labeling
Alexander Volokh | Günter Neumann
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning


Recognizing Textual Entailment Using Sentence Similarity based on Dependency Tree Skeletons
Rui Wang | Günter Neumann
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

DFKI2: An Information Extraction Based Approach to People Disambiguation
Andrea Heyl | Günter Neumann
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)


Exploring HPSG-based Treebanks for Probabilistic Parsing HPSG grammar extraction
Günter Neumann | Berthold Crysmann
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We describe a method for the automatic extraction of a Stochastic Lexicalized Tree Insertion Grammar from a linguistically rich HPSG Treebank. The extraction method is strongly guided by HPSG-based head and argument decomposition rules. The tree anchors correspond to lexical labels encoding fine-grained information. The approach has been tested with a German corpus, achieving a labeled recall of 77.33% and a labeled precision of 78.27%, which is competitive with recent results reported for German parsing using the Negra Treebank.

Cross-Cutting Aspects of Cross-Language Question Answering Systems
Bogdan Sacaleanu | Günter Neumann
Proceedings of the Workshop on Multilingual Question Answering - MLQA ‘06


An Integrated Architecture for Shallow and Deep Processing
Berthold Crysmann | Anette Frank | Bernd Kiefer | Stefan Mueller | Guenter Neumann | Jakub Piskorski | Ulrich Schaefer | Melanie Siegel | Hans Uszkoreit | Feiyu Xu | Markus Becker | Hans-Ulrich Krieger
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics


A Divide-and-Conquer Strategy for Shallow Parsing of German Free Texts
Gunter Neumann | Christian Braun | Jakub Piskorski
Sixth Applied Natural Language Processing Conference


Automatic extraction of stochastic lexicalized tree grammars from treebanks
Günter Neumann
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)


Applying Explanation-based Learning to Control and Speeding-up Natural Language Generation
Gunter Neumann
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

An Information Extraction Core System for Real World German Text Processing
Gunter Neumann | Rolf Backofen | Judith Baur | Markus Becker | Christian Braun
Fifth Conference on Applied Natural Language Processing


DISCO-An HPSG-based NLP System and its Application for Appointment Scheduling Project Note
Hans Uszkoreit | Rolf Backofen | Stephan Busemann | Abdel Kader Diagne | Elizabeth A. Hinkelman | Walter Kasper | Bernd Kiefer | Hans-Ulrich Krieger | Klaus Netter | Gunter Neumann | Stephan Oepen | Stephen P. Spackman
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics


Self-Monitoring with Reversible Grammars
Gunter Neumann | Gertjan van Noord
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics


A Bidirectional Model for Natural Language Processing
Gunter Neumann
Fifth Conference of the European Chapter of the Association for Computational Linguistics

Reversibility and Modularity in Natural Language Generation
Gunter Neumann
Reversible Grammar in Natural Language Processing


A Head-Driven Approach to Incremental and Parallel Generation of Syntactic Structures
Gunter Neumann | Wolfgang Finkler
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics