Ce Zhang


pdf bib
ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation
Nora Hollenstein | Marius Troendle | Ce Zhang | Nicolas Langer
Proceedings of the 12th Language Resources and Evaluation Conference

We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography during natural reading and during annotation. This corpus contains gaze and brain activity data of 739 English sentences, 349 in a normal reading paradigm and 390 in a task-specific paradigm, in which the 18 participants actively search for a semantic relation type in the given sentences as a linguistic annotation task. This new dataset complements ZuCo 1.0 by providing experiments designed to analyze the differences in cognitive processing between natural reading and annotation. The data is freely available here: https://osf.io/2urht/.

pdf bib
Control, Generate, Augment: A Scalable Framework for Multi-Attribute Text Generation
Giuseppe Russo | Nora Hollenstein | Claudiu Cristian Musat | Ce Zhang
Findings of the Association for Computational Linguistics: EMNLP 2020

We introduce CGA, a conditional VAE architecture, to control, generate, and augment text. CGA is able to generate natural English sentences controlling multiple semantic and syntactic attributes by combining adversarial learning with a context-aware loss and a cyclical word dropout routine. We demonstrate the value of the individual model components in an ablation study. The scalability of our approach is ensured through a single discriminator, independently of the number of attributes. We show high quality, diversity and attribute control in the generated sentences through a series of automatic and human assessments. As the main application of our work, we test the potential of this new NLG model in a data augmentation scenario. In a downstream NLP task, the sentences generated by our CGA model show significant improvements over a strong baseline, and a classification performance often comparable to adding same amount of additional real data.

pdf bib
CogniVal in Action: An Interface for Customizable Cognitive Word Embedding Evaluation
Nora Hollenstein | Adrian van der Lek | Ce Zhang
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

We demonstrate the functionalities of the new user interface for CogniVal. CogniVal is a framework for the cognitive evaluation of English word embeddings, which evaluates the quality of the embeddings based on their performance to predict human lexical representations from cognitive language processing signals from various sources. In this paper, we present an easy-to-use command line interface for CogniVal with multiple improvements over the original work, including the possibility to evaluate custom embeddings against custom cognitive data sources.


pdf bib
CogniVal: A Framework for Cognitive Word Embedding Evaluation
Nora Hollenstein | Antonio de la Torre | Nicolas Langer | Ce Zhang
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

An interesting method of evaluating word representations is by how much they reflect the semantic representations in the human brain. However, most, if not all, previous works only focus on small datasets and a single modality. In this paper, we present the first multi-modal framework for evaluating English word representations based on cognitive lexical semantics. Six types of word embeddings are evaluated by fitting them to 15 datasets of eye-tracking, EEG and fMRI signals recorded during language processing. To achieve a global score over all evaluation hypotheses, we apply statistical significance testing accounting for the multiple comparisons problem. This framework is easily extensible and available to include other intrinsic and extrinsic evaluation methods. We find strong correlations in the results between cognitive datasets, across recording modalities and to their performance on extrinsic NLP tasks.

pdf bib
Entity Recognition at First Sight: Improving NER with Eye Movement Information
Nora Hollenstein | Ce Zhang
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Previous research shows that eye-tracking data contains information about the lexical and syntactic properties of text, which can be used to improve natural language processing models. In this work, we leverage eye movement features from three corpora with recorded gaze information to augment a state-of-the-art neural model for named entity recognition (NER) with gaze embeddings. These corpora were manually annotated with named entity labels. Moreover, we show how gaze features, generalized on word type level, eliminate the need for recorded eye-tracking data at test time. The gaze-augmented models for NER using token-level and type-level features outperform the baselines. We present the benefits of eye-tracking features by evaluating the NER models on both individual datasets as well as in cross-domain settings.


pdf bib
ETH-DS3Lab at SemEval-2018 Task 7: Effectively Combining Recurrent and Convolutional Neural Networks for Relation Classification and Extraction
Jonathan Rotsztejn | Nora Hollenstein | Ce Zhang
Proceedings of The 12th International Workshop on Semantic Evaluation

Reliably detecting relevant relations between entities in unstructured text is a valuable resource for knowledge extraction, which is why it has awaken significant interest in the field of Natural Language Processing. In this paper, we present a system for relation classification and extraction based on an ensemble of convolutional and recurrent neural networks that ranked first in 3 out of the 4 Subtasks at SemEval 2018 Task 7. We provide detailed explanations and grounds for the design choices behind the most relevant features and analyze their importance.

pdf bib
Patient Risk Assessment and Warning Symptom Detection Using Deep Attention-Based Neural Networks
Ivan Girardi | Pengfei Ji | An-phi Nguyen | Nora Hollenstein | Adam Ivankay | Lorenz Kuhn | Chiara Marchiori | Ce Zhang
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

We present an operational component of a real-world patient triage system. Given a specific patient presentation, the system is able to assess the level of medical urgency and issue the most appropriate recommendation in terms of best point of care and time to treat. We use an attention-based convolutional neural network architecture trained on 600,000 doctor notes in German. We compare two approaches, one that uses the full text of the medical notes and one that uses only a selected list of medical entities extracted from the text. These approaches achieve 79% and 66% precision, respectively, but on a confidence threshold of 0.6, precision increases to 85% and 75%, respectively. In addition, a method to detect warning symptoms is implemented to render the classification task transparent from a medical perspective. The method is based on the learning of attention scores and a method of automatic validation using the same data.


pdf bib
Understanding Tables in Context Using Standard NLP Toolkits
Vidhya Govindaraju | Ce Zhang | Christopher Ré
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


pdf bib
Big Data versus the Crowd: Looking for Relationships in All the Right Places
Ce Zhang | Feng Niu | Christopher Ré | Jude Shavlik
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)