Nora Hollenstein


2020

ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation
Nora Hollenstein | Marius Troendle | Ce Zhang | Nicolas Langer
Proceedings of the 12th Language Resources and Evaluation Conference

We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography during natural reading and during annotation. This corpus contains gaze and brain activity data of 739 English sentences, 349 in a normal reading paradigm and 390 in a task-specific paradigm, in which the 18 participants actively search for a semantic relation type in the given sentences as a linguistic annotation task. This new dataset complements ZuCo 1.0 by providing experiments designed to analyze the differences in cognitive processing between natural reading and annotation. The data is freely available here: https://osf.io/2urht/.

Control, Generate, Augment: A Scalable Framework for Multi-Attribute Text Generation
Giuseppe Russo | Nora Hollenstein | Claudiu Cristian Musat | Ce Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

We introduce CGA, a conditional VAE architecture, to control, generate, and augment text. CGA is able to generate natural English sentences controlling multiple semantic and syntactic attributes by combining adversarial learning with a context-aware loss and a cyclical word dropout routine. We demonstrate the value of the individual model components in an ablation study. The scalability of our approach is ensured through a single discriminator, independently of the number of attributes. We show high quality, diversity and attribute control in the generated sentences through a series of automatic and human assessments. As the main application of our work, we test the potential of this new NLG model in a data augmentation scenario. In a downstream NLP task, the sentences generated by our CGA model show significant improvements over a strong baseline, and a classification performance often comparable to adding the same amount of additional real data.

CogniVal in Action: An Interface for Customizable Cognitive Word Embedding Evaluation
Nora Hollenstein | Adrian van der Lek | Ce Zhang
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

We demonstrate the functionalities of the new user interface for CogniVal. CogniVal is a framework for the cognitive evaluation of English word embeddings, which evaluates the quality of the embeddings based on how well they predict human lexical representations from cognitive language processing signals from various sources. In this paper, we present an easy-to-use command line interface for CogniVal with multiple improvements over the original work, including the possibility to evaluate custom embeddings against custom cognitive data sources.

Towards Best Practices for Leveraging Human Language Processing Signals for Natural Language Processing
Nora Hollenstein | Maria Barrett | Lisa Beinborn
Proceedings of the Second Workshop on Linguistic and Neurocognitive Resources

NLP models are imperfect and lack intricate capabilities that humans access automatically when processing speech or reading a text. Human language processing data can be leveraged to increase the performance of models and to pursue explanatory research for a better understanding of the differences between human and machine language processing. We review recent studies leveraging different types of cognitive processing signals, namely eye-tracking, M/EEG and fMRI data recorded during language understanding. We discuss the role of cognitive data for machine learning-based NLP methods and identify fundamental challenges for processing pipelines. Finally, we propose practical strategies for using these types of cognitive signals to enhance NLP models.

2019

CogniVal: A Framework for Cognitive Word Embedding Evaluation
Nora Hollenstein | Antonio de la Torre | Nicolas Langer | Ce Zhang
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

One interesting way to evaluate word representations is by how well they reflect the semantic representations in the human brain. However, most, if not all, previous work focuses only on small datasets and a single modality. In this paper, we present the first multi-modal framework for evaluating English word representations based on cognitive lexical semantics. Six types of word embeddings are evaluated by fitting them to 15 datasets of eye-tracking, EEG and fMRI signals recorded during language processing. To achieve a global score over all evaluation hypotheses, we apply statistical significance testing accounting for the multiple comparisons problem. This framework is easily extensible to include other intrinsic and extrinsic evaluation methods. We find strong correlations in the results between cognitive datasets, across recording modalities and with the embeddings' performance on extrinsic NLP tasks.

Entity Recognition at First Sight: Improving NER with Eye Movement Information
Nora Hollenstein | Ce Zhang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Previous research shows that eye-tracking data contains information about the lexical and syntactic properties of text, which can be used to improve natural language processing models. In this work, we leverage eye movement features from three corpora with recorded gaze information to augment a state-of-the-art neural model for named entity recognition (NER) with gaze embeddings. These corpora were manually annotated with named entity labels. Moreover, we show how gaze features, generalized on word type level, eliminate the need for recorded eye-tracking data at test time. The gaze-augmented models for NER using token-level and type-level features outperform the baselines. We present the benefits of eye-tracking features by evaluating the NER models both on individual datasets and in cross-domain settings.

2018

ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction
Jonathan Rotsztejn | Nora Hollenstein | Ce Zhang
Proceedings of The 12th International Workshop on Semantic Evaluation

Reliably detecting relevant relations between entities in unstructured text is valuable for knowledge extraction, which is why it has awakened significant interest in the field of Natural Language Processing. In this paper, we present a system for relation classification and extraction based on an ensemble of convolutional and recurrent neural networks that ranked first in 3 out of the 4 subtasks at SemEval 2018 Task 7. We provide detailed explanations and grounds for the design choices behind the most relevant features and analyze their importance.

Sequence Classification with Human Attention
Maria Barrett | Joachim Bingel | Nora Hollenstein | Marek Rei | Anders Søgaard
Proceedings of the 22nd Conference on Computational Natural Language Learning

Learning attention functions requires large volumes of data, but many NLP tasks simulate human behavior, and in this paper, we show that human attention really does provide a good inductive bias on many attention functions in NLP. Specifically, we use estimated human attention derived from eye-tracking corpora to regularize attention functions in recurrent neural networks. We show substantial improvements across a range of tasks, including sentiment analysis, grammatical error detection, and detection of abusive language.

Patient Risk Assessment and Warning Symptom Detection Using Deep Attention-Based Neural Networks
Ivan Girardi | Pengfei Ji | An-phi Nguyen | Nora Hollenstein | Adam Ivankay | Lorenz Kuhn | Chiara Marchiori | Ce Zhang
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

We present an operational component of a real-world patient triage system. Given a specific patient presentation, the system is able to assess the level of medical urgency and issue the most appropriate recommendation in terms of best point of care and time to treat. We use an attention-based convolutional neural network architecture trained on 600,000 doctor notes in German. We compare two approaches, one that uses the full text of the medical notes and one that uses only a selected list of medical entities extracted from the text. These approaches achieve 79% and 66% precision, respectively, but at a confidence threshold of 0.6, precision increases to 85% and 75%, respectively. In addition, a method to detect warning symptoms is implemented to render the classification task transparent from a medical perspective. The method is based on the learning of attention scores and a method of automatic validation using the same data.

2016

Inconsistency Detection in Semantic Annotation
Nora Hollenstein | Nathan Schneider | Bonnie Webber
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Inconsistencies are part of any manually annotated corpus. Automatically finding these inconsistencies and correcting them (even manually) can increase the quality of the data. Past research has focused mainly on detecting inconsistency in syntactic annotation. This work explores new approaches to detecting inconsistency in semantic annotation. Two ranking methods are presented in this paper: a discrepancy ranking and an entropy ranking. These methods are then tested and evaluated on multiple corpora annotated with multiword expressions and supersense labels. The results show considerable improvements in detecting inconsistency candidates over a random baseline. Possible applications of methods for inconsistency detection include improving the annotation procedure and guidelines, as well as correcting errors in completed annotations.

2014

SA-UZH: Verb-based Sentiment Analysis
Nora Hollenstein | Michael Amsler | Martina Bachmann | Manfred Klenner
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Inducing Domain-specific Noun Polarity Guided by Domain-independent Polarity Preferences of Adjectives
Manfred Klenner | Michael Amsler | Nora Hollenstein
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Compilation of a Swiss German Dialect Corpus and its Application to PoS Tagging
Nora Hollenstein | Noëmi Aepli
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects