Fabio Rinaldi


2020

SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces
Vani Kanjirangat | Sandra Mitrovic | Alessandro Antonucci | Fabio Rinaldi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Lexical semantic change detection (also known as semantic shift tracing) is the task of identifying words that have changed their meaning over time. Unsupervised semantic shift tracing, the focal point of SemEval-2020, is particularly challenging. Given the unsupervised setup, in this work, we propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings. As such, disagreements in the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages. To leverage this idea, clustering is performed on contextualized (BERT-based) embeddings of word occurrences. The obtained results show that our approach performs well both when measured separately (per language) and overall, where we surpass all provided SemEval baselines.

Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis
Eben Holderness | Antonio Jimeno Yepes | Alberto Lavelli | Anne-Lyse Minard | James Pustejovsky | Fabio Rinaldi
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

Annotating the Pandemic: Named Entity Recognition and Normalisation in COVID-19 Literature
Nico Colic | Lenz Furrer | Fabio Rinaldi
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We present a publicly available pipeline that performs named entity recognition and normalisation in parallel to help find relevant publications and to aid in downstream NLP tasks such as text summarisation. In our approach, we use a dictionary-based system for its high recall in conjunction with two models based on BioBERT for their accuracy. Their outputs are combined according to different strategies depending on the entity type. In addition, we use a manually crafted dictionary to increase performance for new concepts related to COVID-19. We have previously evaluated our work on the CRAFT corpus, and make the output of our pipeline available on two visualisation platforms.

COVID-19 Twitter Monitor: Aggregating and Visualizing COVID-19 Related Trends in Social Media
Joseph Cornelius | Tilia Ellendorff | Lenz Furrer | Fabio Rinaldi
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Social media platforms offer extensive information about the development of the COVID-19 pandemic and the current state of public health. In recent years, the Natural Language Processing community has developed a variety of methods to extract health-related information from posts on social media platforms. In order for these techniques to be used by a broad public, they must be aggregated and presented in a user-friendly way. We have aggregated ten methods to analyze tweets related to the COVID-19 pandemic, and present interactive visualizations of the results on our online platform, the COVID-19 Twitter Monitor. In the current version of our platform, we offer distinct methods for inspecting the dataset at different levels: corpus-wide, single post, and spans within each post. In addition, we allow the combination of different methods to enable a more selective acquisition of knowledge. Through the visual and interactive combination of various methods, interconnections in the different outputs can be revealed.

2019

UZH@CRAFT-ST: a Sequence-labeling Approach to Concept Recognition
Lenz Furrer | Joseph Cornelius | Fabio Rinaldi
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

As our submission to the CRAFT shared task 2019, we present two neural approaches to concept recognition. We propose two different systems for joint named entity recognition (NER) and normalization (NEN), both of which model the task as a sequence labeling problem. Our first system is a BiLSTM network with two separate outputs for NER and NEN trained from scratch, whereas the second system is an instance of BioBERT fine-tuned on the concept-recognition task. We exploit two strategies for extending concept coverage: ontology pretraining and backoff with a dictionary lookup. Our results show that the backoff strategy effectively tackles the problem of unseen concepts, addressing a major limitation of the chosen design. In the cross-system comparison, BioBERT proves to be a strong basis for creating a concept-recognition system, although some entity types are predicted more accurately by the BiLSTM-based system.

Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
Eben Holderness | Antonio Jimeno Yepes | Alberto Lavelli | Anne-Lyse Minard | James Pustejovsky | Fabio Rinaldi
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)

Approaching SMM4H with Merged Models and Multi-task Learning
Tilia Ellendorff | Lenz Furrer | Nicola Colic | Noëmi Aepli | Fabio Rinaldi
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.

2018

Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Alberto Lavelli | Anne-Lyse Minard | Fabio Rinaldi
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

UZH@SMM4H: System Descriptions
Tilia Ellendorff | Joseph Cornelius | Heath Gordon | Nicola Colic | Fabio Rinaldi
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Our team at the University of Zürich participated in the first 3 of the 4 sub-tasks at the Social Media Mining for Health Applications (SMM4H) shared task. We experimented with different approaches for text classification, namely traditional feature-based classifiers (Logistic Regression and Support Vector Machines), shallow neural networks, RCNNs, and CNNs. This system description paper provides details regarding the different system architectures and the achieved results.

2016

The PsyMine Corpus - A Corpus annotated with Psychiatric Disorders and their Etiological Factors
Tilia Ellendorff | Simon Foster | Fabio Rinaldi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the first version of a corpus annotated for psychiatric disorders and their etiological factors. The paper describes the choice of text, annotated entities and events/relations as well as the annotation scheme and procedure applied. The corpus features a selection of focus psychiatric disorders including depressive disorder, anxiety disorder, obsessive-compulsive disorder, phobic disorders and panic disorder. Etiological factors for these focus disorders are widespread and include genetic, physiological, sociological and environmental factors among others. Etiological events, including annotated evidence text, represent the interactions between the focus disorders and their etiological factors. In addition to these core events, symptomatic and treatment events have been annotated. The current version of the corpus includes 175 scientific abstracts. All entities and events/relations have been manually annotated by domain experts and scores of inter-annotator agreement are presented. The aim of the corpus is to provide a first gold standard to support the development of biomedical text mining applications for the specific area of mental disorders, which are among the main contributors to the contemporary burden of disease.

Author Name Disambiguation in MEDLINE Based on Journal Descriptors and Semantic Types
Dina Vishnyakova | Raul Rodriguez-Esteban | Khan Ozol | Fabio Rinaldi
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

Author name disambiguation (AND) in publication and citation resources is a well-known problem. Often, information about email address and other details in the affiliation is missing. In cases where such information is not available, identifying the authorship of publications becomes very challenging. Consequently, there have been attempts to resolve such cases by utilizing external resources as references. However, such external resources are heterogeneous and are not always reliable regarding the correctness of information. To solve the AND task, especially when information about an author is incomplete, we suggest the use of new features such as journal descriptors (JD) and semantic types (ST). The evaluation of different feature models shows that their inclusion has an impact equivalent to that of other important features such as email address. Using such features, we show that our system outperforms the state of the art.

2014

Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction
Tilia Ellendorff | Fabio Rinaldi | Simon Clematide
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We show how to use large biomedical databases in order to obtain a gold standard for training a machine learning system over a corpus of biomedical text. As an example, we use the Comparative Toxicogenomics Database (CTD) and describe by means of a short case study how the obtained data can be applied. We explain how we exploit the structure of the database for compiling training material and a test set. Using a Naive Bayes document classification approach based on words, stem bigrams and MeSH descriptors, we achieve a macro-average F-score of 61% on a subset of 8 action terms. This outperforms a baseline system based on a lookup of stemmed keywords by more than 20%. Furthermore, we present directions of future work, taking the described system as a vantage point. Future work will aim towards a weakly supervised system capable of discovering complete biomedical interactions and events.

2013

UZH in BioNLP 2013
Gerold Schneider | Simon Clematide | Tilia Ellendorff | Don Tuggener | Fabio Rinaldi | Gintarė Grigonytė
Proceedings of the BioNLP Shared Task 2013 Workshop

2012

Dependency parsing for interaction detection in pharmacogenomics
Gerold Schneider | Fabio Rinaldi | Simon Clematide
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We give an overview of our approach to the extraction of interactions between pharmacogenomic entities like drugs, genes and diseases, and suggest classes of interaction types driven by data from PharmGKB and partly following the top-level ontology WordNet and biomedical types from BioNLP. Our text mining approach to the extraction of interactions is based on syntactic analysis. We use syntactic analyses to explore domain events and to suggest a set of interaction labels for the pharmacogenomics domain.

2011

An Incremental Model for the Coreference Resolution Task of BioNLP 2011
Don Tuggener | Manfred Klenner | Gerold Schneider | Simon Clematide | Fabio Rinaldi
Proceedings of BioNLP Shared Task 2011 Workshop

2009

TX Task: Automatic Detection of Focus Organisms in Biomedical Publications
Thomas Kappeler | Kaarel Kaljurand | Fabio Rinaldi
Proceedings of the BioNLP 2009 Workshop

UZurich in the BioNLP 2009 Shared Task
Kaarel Kaljurand | Gerold Schneider | Fabio Rinaldi
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

Dependency-Based Relation Mining for Biomedical Literature
Fabio Rinaldi | Gerold Schneider | Kaarel Kaljurand | Michael Hess
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe techniques for the automatic detection of relationships among domain entities (e.g. genes, proteins, diseases) mentioned in the biomedical literature. Our approach is based on the adaptive selection of candidate interaction sentences, which are then parsed using our own dependency parser. Specific syntax-based filters are used to limit the number of possible candidate interacting pairs. The approach has been implemented as a demonstrator over a corpus of 2000 richly annotated MedLine abstracts, and later tested by participation in a text mining competition. In both cases, the results obtained have proved the adequacy of the proposed approach to the task of interaction detection.

2007

Pro3Gres Parser in the CoNLL Domain Adaptation Shared Task
Gerold Schneider | Kaarel Kaljurand | Fabio Rinaldi | Tobias Kuhn
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2004

The Role of MultiWord Terminology in Knowledge Management
James Dowdall | Will Lowe | Jeremy Ellman | Fabio Rinaldi | Michael Hess
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

One of the major obstacles for knowledge management remains MultiWord Terminology (MWT). This paper explores the difficulties that arise and describes real world solutions implemented as part of the Parmenides project. Parmenides is being built as an integrated knowledge management package that combines information, MWT and ontology extraction methods in a semi-automated framework. The focus of this paper is on eliciting ontological fragments based on dedicated MWT processing.

Steps Towards Semantically Annotated Language Resources
Manfred Klenner | Fabio Rinaldi | Michael Hess
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Exploiting Language Resources for Semantic Web Annotations
Kaarel Kaljurand | Fabio Rinaldi | James Dowdall | Michael Hess
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Answering Questions in the Genomics Domain
Fabio Rinaldi | James Dowdall | Gerold Schneider | Andreas Persidis
Proceedings of the Conference on Question Answering in Restricted Domains

Fast, Deep-Linguistic Statistical Dependency Parsing
Gerold Schneider | Fabio Rinaldi | James Dowdall
Proceedings of the Workshop on Recent Advances in Dependency Grammar

A robust and hybrid deep-linguistic theory applied to large-scale parsing
Gerold Schneider | James Dowdall | Fabio Rinaldi
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)

2003

Exploiting Paraphrases in a Question Answering System
Fabio Rinaldi | James Dowdall | Kaarel Kaljurand | Michael Hess | Diego Mollá
Proceedings of the Second International Workshop on Paraphrasing

Complex Structuring of Term Variants for Question Answering
James Dowdall | Fabio Rinaldi | Fidelia Ibekwe-SanJuan | Eric SanJuan
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment

Parmenides: An Opportunity for ISO TC37 SC4?
Fabio Rinaldi | James Dowdall | Michael Hess | Kaarel Kaljurand | Andreas Persidis
Proceedings of the ACL 2003 Workshop on Linguistic Annotation: Getting the Model Right

2002

Technical Terminology as a Critical Resource
James Dowdall | Michael Hess | Neeme Kahusk | Kaarel Kaljurand | Mare Koit | Fabio Rinaldi | Kadri Vider
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

1998

FACILE: Description of the NE System Used for MUC-7
William J Black | Fabio Rinaldi | David Mowatt
Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998