Elena Tutubalina


2020

pdf bib
Ad Lingua: Text Classification Improves Symbolism Prediction in Image Advertisements
Andrey Savchenko | Anton Alekseev | Sejeong Kwon | Elena Tutubalina | Evgeny Myasnikov | Sergey Nikolenko
Proceedings of the 28th International Conference on Computational Linguistics

Understanding image advertisements is a challenging task, often requiring non-literal interpretation. We argue that standard image-based predictions are insufficient for symbolism prediction. Following the intuition that texts and images are complementary in advertising, we introduce a multimodal ensemble of a state of the art image-based classifier, a classifier based on an object detection architecture, and a fine-tuned language model applied to texts extracted from ads by OCR. The resulting system establishes a new state of the art in symbolism prediction.

pdf bib
Fair Evaluation in Concept Normalization: a Large-scale Comparative Analysis for BERT-based Models
Elena Tutubalina | Artur Kadurin | Zulfat Miftahutdinov
Proceedings of the 28th International Conference on Computational Linguistics

Linking of biomedical entity mentions to various terminologies of chemicals, diseases, genes, adverse drug reactions is a challenging task, often requiring non-syntactic interpretation. A large number of biomedical corpora and state-of-the-art models have been introduced in the past five years. However, there are no general guidelines regarding the evaluation of models on these corpora in single- and cross-terminology settings. In this work, we perform a comparative evaluation of various benchmarks and study the efficiency of state-of-the-art neural architectures based on Bidirectional Encoder Representations from Transformers (BERT) for linking of three entity types across three domains: research abstracts, drug labels, and user-generated texts on drug therapy in English. We have made the source code and results available at https://github.com/insilicomedicine/Fair-Evaluation-BERT.

pdf bib
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
Graciela Gonzalez-Hernandez | Ari Z. Klein | Ivan Flores | Davy Weissenbacher | Arjun Magge | Karen O'Connor | Abeed Sarker | Anne-Lyse Minard | Elena Tutubalina | Zulfat Miftahutdinov | Ilseyar Alimova
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

pdf bib
Overview of the Fifth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2020
Ari Klein | Ilseyar Alimova | Ivan Flores | Arjun Magge | Zulfat Miftahutdinov | Anne-Lyse Minard | Karen O’Connor | Abeed Sarker | Elena Tutubalina | Davy Weissenbacher | Graciela Gonzalez-Hernandez
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

The vast amount of data on social media presents significant opportunities and challenges for utilizing it as a resource for health informatics. The fifth iteration of the Social Media Mining for Health Applications (#SMM4H) shared tasks sought to advance the use of Twitter data (tweets) for pharmacovigilance, toxicovigilance, and epidemiology of birth defects. In addition to re-runs of three tasks, #SMM4H 2020 included new tasks for detecting adverse effects of medications in French and Russian tweets, characterizing chatter related to prescription medication abuse, and detecting self reports of birth defect pregnancy outcomes. The five tasks required methods for binary classification, multi-class classification, and named entity recognition (NER). With 29 teams and a total of 130 system submissions, participation in the #SMM4H shared tasks continues to grow.

pdf bib
KFU NLP Team at SMM4H 2020 Tasks: Cross-lingual Transfer Learning with Pretrained Language Models for Drug Reactions
Zulfat Miftahutdinov | Andrey Sakhovskiy | Elena Tutubalina
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

This paper describes neural models developed for the Social Media Mining for Health (SMM4H) 2020 shared tasks. Specifically, we participated in two tasks. We investigate the use of a language representation model BERT pretrained on a large-scale corpus of 5 million health-related user reviews in English and Russian. The ensemble of neural networks for extraction and normalization of adverse drug reactions ranked first among 7 teams at the SMM4H 2020 Task 3 and obtained a relaxed F1 of 46%. The BERT-based multilingual model for classification of English and Russian tweets that report adverse reactions ranked second among 16 and 7 teams at two first subtasks of the SMM4H 2019 Task 2 and obtained a relaxed F1 of 58% on English tweets and 51% on Russian tweets.

2019

pdf bib
Deep Neural Models for Medical Concept Normalization in User-Generated Texts
Zulfat Miftahutdinov | Elena Tutubalina
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

In this work, we consider the medical concept normalization problem, i.e., the problem of mapping a health-related entity mention in a free-form text to a concept in a controlled vocabulary, usually to the standard thesaurus in the Unified Medical Language System (UMLS). This is a challenging task since medical terminology is very different when coming from health care professionals or from the general public in the form of social media texts. We approach it as a sequence learning problem with powerful neural networks such as recurrent neural networks and contextualized word representation models trained to obtain semantic representations of social media expressions. Our experimental evaluation over three different benchmarks shows that neural architectures leverage the semantic meaning of the entity mention and significantly outperform existing state of the art models.

pdf bib
Detecting Adverse Drug Reactions from Biomedical Texts with Neural Networks
Ilseyar Alimova | Elena Tutubalina
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Detection of adverse drug reactions in postapproval periods is a crucial challenge for pharmacology. Social media and electronic clinical reports are becoming increasingly popular as a source for obtaining health related information. In this work, we focus on extraction information of adverse drug reactions from various sources of biomedical textbased information, including biomedical literature and social media. We formulate the problem as a binary classification task and compare the performance of four state-of-the-art attention-based neural networks in terms of the F-measure. We show the effectiveness of these methods on four different benchmarks.

pdf bib
KFU NLP Team at SMM4H 2019 Tasks: Want to Extract Adverse Drugs Reactions from Tweets? BERT to The Rescue
Zulfat Miftahutdinov | Ilseyar Alimova | Elena Tutubalina
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

This paper describes a system developed for the Social Media Mining for Health (SMM4H) 2019 shared tasks. Specifically, we participated in three tasks. The goals of the first two tasks are to classify whether a tweet contains mentions of adverse drug reactions (ADR) and extract these mentions, respectively. The objective of the third task is to build an end-to-end solution: first, detect ADR mentions and then map these entities to concepts in a controlled vocabulary. We investigate the use of a language representation model BERT trained to obtain semantic representations of social media texts. Our experiments on a dataset of user reviews showed that BERT is superior to state-of-the-art models based on recurrent neural networks. The BERT-based system for Task 1 obtained an F1 of 57.38%, with improvements up to +7.19% F1 over a score averaged across all 43 submissions. The ensemble of neural networks with a voting scheme for named entity recognition ranked first among 9 teams at the SMM4H 2019 Task 2 and obtained a relaxed F1 of 65.8%. The end-to-end model based on BERT for ADR normalization ranked first at the SMM4H 2019 Task 3 and obtained a relaxed F1 of 43.2%.

bib
AspeRa: Aspect-Based Rating Prediction Based on User Reviews
Elena Tutubalina | Valentin Malykh | Sergey Nikolenko | Anton Alekseev | Ilya Shenbin
Proceedings of the 2019 Workshop on Widening NLP

We propose a novel Aspect-based Rating Prediction model (AspeRa) that estimates user rating based on review texts for the items. It is based on aspect extraction with neural networks and combines the advantages of deep learning and topic modeling. It is mainly designed for recommendations, but an important secondary goal of AspeRa is to discover coherent aspects of reviews that can be used to explain predictions or for user profiling. We conduct a comprehensive empirical study of AspeRa, showing that it outperforms state-of-the-art models in terms of recommendation quality and produces interpretable aspects. This paper is an abridged version of our work (Nikolenko et al., 2019)

bib
Entity-level Classification of Adverse Drug Reactions: a Comparison of Neural Network Models
Ilseyar Alimova | Elena Tutubalina
Proceedings of the 2019 Workshop on Widening NLP

This paper presents our experimental work on exploring the potential of neural network models developed for aspect-based sentiment analysis for entity-level adverse drug reaction (ADR) classification. Our goal is to explore how to represent local context around ADR mentions and learn an entity representation, interacting with its context. We conducted extensive experiments on various sources of text-based information, including social media, electronic health records, and abstracts of scientific articles from PubMed. The results show that Interactive Attention Neural Network (IAN) outperformed other models on four corpora in terms of macro F-measure. This work is an abridged version of our recent paper accepted to Programming and Computer Software journal in 2019.

pdf bib
Distant Supervision for Sentiment Attitude Extraction
Nicolay Rusnachenko | Natalia Loukachevitch | Elena Tutubalina
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

News articles often convey attitudes between the mentioned subjects, which is essential for understanding the described situation. In this paper, we describe a new approach to distant supervision for extracting sentiment attitudes between named entities mentioned in texts. Two factors (pair-based and frame-based) were used to automatically label an extensive news collection, dubbed as RuAttitudes. The latter became a basis for adaptation and training convolutional architectures, including piecewise max pooling and full use of information across different sentences. The results show that models, trained with RuAttitudes, outperform ones that were trained with only supervised learning approach and achieve 13.4% increase in F1-score on RuSentRel collection.

2015

pdf bib
Clustering-based Approach to Multiword Expression Extraction and Ranking
Elena Tutubalina
Proceedings of the 11th Workshop on Multiword Expressions

2014

pdf bib
Unsupervised Approach to Extracting Problem Phrases from User Reviews of Products
Elena Tutubalina | Vladimir Ivanov
Proceedings of the First AHA!-Workshop on Information Discovery in Text