Graciela Gonzalez

Also published as: Graciela Gonzalez-Hernandez


2020

UPennHLP at WNUT-2020 Task 2 : Transformer models for classification of COVID19 posts on Twitter
Arjun Magge | Varad Pimpalkhute | Divya Rallapalli | David Siguenza | Graciela Gonzalez-Hernandez
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Increasing usage of social media presents new non-traditional avenues for monitoring disease outbreaks, virus transmissions and disease progressions through user posts describing test results or disease symptoms. However, the discussions on the topic of infectious diseases that are informative in nature also span various topics such as news, politics and humor, which makes the data mining challenging. We present a system to identify tweets about the COVID19 disease outbreak that are deemed to be informative on Twitter for use in downstream applications. The system achieved an F1-score of 0.8941, Precision of 0.9028, Recall of 0.8856 and Accuracy of 0.9010. In the shared task organized as part of the 6th Workshop on Noisy User-generated Text (WNUT), the system was ranked 18th by F1-score and 13th by Accuracy.
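
As a quick consistency check on the scores reported above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the stated precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
computed_f1 = f1_score(0.9028, 0.8856)
print(round(computed_f1, 4))  # 0.8941, matching the reported F1-score
```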

Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task
Graciela Gonzalez-Hernandez | Ari Z. Klein | Ivan Flores | Davy Weissenbacher | Arjun Magge | Karen O’Connor | Abeed Sarker | Anne-Lyse Minard | Elena Tutubalina | Zulfat Miftahutdinov | Ilseyar Alimova
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Overview of the Fifth Social Media Mining for Health Applications (#SMM4H) Shared Tasks at COLING 2020
Ari Klein | Ilseyar Alimova | Ivan Flores | Arjun Magge | Zulfat Miftahutdinov | Anne-Lyse Minard | Karen O’Connor | Abeed Sarker | Elena Tutubalina | Davy Weissenbacher | Graciela Gonzalez-Hernandez
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

The vast amount of data on social media presents significant opportunities and challenges for utilizing it as a resource for health informatics. The fifth iteration of the Social Media Mining for Health Applications (#SMM4H) shared tasks sought to advance the use of Twitter data (tweets) for pharmacovigilance, toxicovigilance, and epidemiology of birth defects. In addition to re-runs of three tasks, #SMM4H 2020 included new tasks for detecting adverse effects of medications in French and Russian tweets, characterizing chatter related to prescription medication abuse, and detecting self-reports of birth defect pregnancy outcomes. The five tasks required methods for binary classification, multi-class classification, and named entity recognition (NER). With 29 teams and a total of 130 system submissions, participation in the #SMM4H shared tasks continues to grow.

2019

Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
Davy Weissenbacher | Graciela Gonzalez-Hernandez
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

Overview of the Fourth Social Media Mining for Health (SMM4H) Shared Tasks at ACL 2019
Davy Weissenbacher | Abeed Sarker | Arjun Magge | Ashlynn Daughton | Karen O’Connor | Michael J. Paul | Graciela Gonzalez-Hernandez
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

The number of users of social media continues to grow, with nearly half of adults worldwide and two-thirds of all American adults using social networking. Advances in automated data processing, machine learning and NLP present the possibility of utilizing this massive data source for biomedical and public health applications, if researchers address the methodological challenges unique to this medium. We present the Social Media Mining for Health Shared Tasks co-located with ACL in Florence in 2019, which address these challenges for health monitoring and surveillance, utilizing state-of-the-art techniques for processing noisy, real-world, and substantially creative language expressions from social media users. For the fourth execution of this challenge, we proposed four different tasks. Task 1 asked participants to distinguish tweets reporting an adverse drug reaction (ADR) from those that do not. Task 2, a follow-up to Task 1, asked participants to identify the span of text in tweets reporting ADRs. Task 3 was an end-to-end task in which the goal was to first detect tweets mentioning an ADR and then map the extracted colloquial mentions of ADRs in the tweets to their corresponding standard concept IDs in the MedDRA vocabulary. Finally, Task 4 asked participants to classify whether a tweet contains a personal mention of one’s health, a more general discussion of the health issue, or is an unrelated mention. A total of 34 teams from around the world registered and 19 teams from 12 countries submitted a system run. We summarize here the corpora for this challenge, which are freely available at https://competitions.codalab.org/competitions/22521, and present an overview of the methods and the results of the competing systems.

SemEval-2019 Task 12: Toponym Resolution in Scientific Papers
Davy Weissenbacher | Arjun Magge | Karen O’Connor | Matthew Scotch | Graciela Gonzalez-Hernandez
Proceedings of the 13th International Workshop on Semantic Evaluation

We present the SemEval-2019 Task 12 which focuses on toponym resolution in scientific articles. Given an article from PubMed, the task consists of detecting mentions of names of places, or toponyms, and mapping the mentions to their corresponding entries in GeoNames.org, a database of geospatial locations. We proposed three subtasks. In Subtask 1, we asked participants to detect all toponyms in an article. In Subtask 2, given toponym mentions as input, we asked participants to disambiguate them by linking them to entries in GeoNames. In Subtask 3, we asked participants to perform both the detection and the disambiguation steps for all toponyms. A total of 29 teams registered, and 8 teams submitted a system run. We summarize the corpus and the tools created for the challenge. They are freely available at https://competitions.codalab.org/competitions/19948. We also analyze the methods, the results and the errors made by the competing systems with a focus on toponym disambiguation.

2018

Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
Graciela Gonzalez-Hernandez | Davy Weissenbacher | Abeed Sarker | Michael Paul
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Overview of the Third Social Media Mining for Health (SMM4H) Shared Tasks at EMNLP 2018
Davy Weissenbacher | Abeed Sarker | Michael J. Paul | Graciela Gonzalez-Hernandez
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

The goals of the SMM4H shared tasks are to release annotated, social media-based, health-related datasets to the research community, and to compare the performances of natural language processing and machine learning systems on tasks involving these datasets. The third execution of the SMM4H shared tasks, co-hosted with EMNLP-2018, comprised four subtasks. These subtasks involve annotated user posts from Twitter (tweets) and focus on the (i) automatic classification of tweets mentioning a drug name, (ii) automatic classification of tweets containing reports of first-person medication intake, (iii) automatic classification of tweets presenting self-reports of adverse drug reaction (ADR) detection, and (iv) automatic classification of vaccine behavior mentions in tweets. A total of 14 teams participated and 78 system runs were submitted (23 for task 1, 20 for task 2, 18 for task 3, 17 for task 4).

Dealing with Medication Non-Adherence Expressions in Twitter
Takeshi Onishi | Davy Weissenbacher | Ari Klein | Karen O’Connor | Graciela Gonzalez-Hernandez
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Through a semi-automatic analysis of tweets, we show that Twitter users express not only Medication Non-Adherence (MNA) in social media but also their reasons for not complying; further research is necessary to extract and analyze this information fully automatically, in order to facilitate the use of this data in epidemiological studies.

2017

HLP@UPenn at SemEval-2017 Task 4A: A simple, self-optimizing text classification system combining dense and sparse vectors
Abeed Sarker | Graciela Gonzalez
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We present a simple supervised text classification system that combines sparse and dense vector representations of words, and generalized representations of words via clusters. The sparse vectors are generated from word n-gram sequences (1-3). The dense vector representations of words (embeddings) are learned by training a neural network to predict neighboring words in a large unlabeled dataset. To classify a text segment, the different representations of it are concatenated, and the classification is performed using Support Vector Machines (SVM). Our system is particularly intended for use by non-experts of natural language processing and machine learning, and, therefore, the system does not require any manual tuning of parameters or weights. Given a training set, the system automatically generates the training vectors, optimizes the relevant hyper-parameters for the SVM classifier, and trains the classification model. We evaluated this system on the SemEval-2017 English sentiment analysis task. In terms of average F1-score, our system obtained 8th position out of 39 submissions (F1-score: 0.632, average recall: 0.637, accuracy: 0.646).
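
The feature construction described above (sparse word n-gram vectors concatenated with dense embedding-based vectors) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy vocabulary and embeddings are made up, averaging word embeddings over the text is an assumption (the abstract does not say how word vectors are aggregated), and the cluster features and the SVM training step are omitted.

```python
from collections import Counter


def word_ngrams(tokens, n_max=3):
    """Yield all word n-grams of length 1..n_max, joined by spaces."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def sparse_vector(tokens, vocabulary):
    """Count how often each vocabulary n-gram occurs in the text."""
    counts = Counter(word_ngrams(tokens))
    return [float(counts[gram]) for gram in vocabulary]


def dense_vector(tokens, embeddings, dim):
    """Average the embeddings of known words (assumed aggregation)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combined_features(text, vocabulary, embeddings, dim):
    """Concatenate the sparse and dense representations of a text."""
    tokens = text.lower().split()
    return sparse_vector(tokens, vocabulary) + dense_vector(tokens, embeddings, dim)


# Hypothetical toy data, for illustration only.
vocabulary = ["good", "movie", "good movie"]
embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}

features = combined_features("Good movie", vocabulary, embeddings, dim=2)
print(features)  # [1.0, 1.0, 1.0, 0.5, 0.5]
```

In a full system, a vector like `features` would be computed for every training text and passed to an SVM classifier whose hyper-parameters are tuned automatically, as the abstract describes.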

Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System
Ari Klein | Abeed Sarker | Masoud Rouhizadeh | Karen O’Connor | Graciela Gonzalez
BioNLP 2017

Social media sites (e.g., Twitter) have been used for surveillance of drug safety at the population level, but studies that focus on the effects of medications on specific sets of individuals have had to rely on other sources of data. Mining social media data for this information would require the ability to distinguish indications of personal medication intake in this media. Towards that end, this paper presents an annotated corpus that can be used to train machine learning systems to determine whether a tweet that mentions a medication indicates that the individual posting has taken that medication at a specific time. To demonstrate the utility of the corpus as a training set, we present baseline results of supervised classification.

2016

Automatic Prediction of Linguistic Decline in Writings of Subjects with Degenerative Dementia
Davy Weissenbacher | Travis A. Johnson | Laura Wojtulewicz | Amylou Dueck | Dona Locke | Richard Caselli | Graciela Gonzalez
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Data, tools and resources for mining social media drug chatter
Abeed Sarker | Graciela Gonzalez
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

Social media has emerged as a crucial resource for obtaining population-based signals for various public health monitoring and surveillance tasks, such as pharmacovigilance. There is an abundance of knowledge hidden within social media data, and the volume is growing. Drug-related chatter on social media can include user-generated information that can provide insights into public health problems such as abuse, adverse reactions, long-term effects, and multi-drug interactions. Our objective in this paper is to present to the biomedical natural language processing, data science, and public health communities data sets (annotated and unannotated), tools and resources that we have collected and created from social media. The data we present was collected from Twitter using the generic and brand names of drugs as keywords, along with their common misspellings. Following the collection of the data, annotation guidelines were created over several iterations, which detail important aspects of social media data annotation and can be used by future researchers for developing similar data sets. The annotation guidelines were followed to prepare data sets for text classification, information extraction and normalization. In this paper, we discuss the preparation of these guidelines, outline the data sets prepared, and present an overview of our state-of-the-art systems for data collection, supervised classification, and information extraction. In addition to the development of supervised systems for classification and extraction, we developed and released unlabeled data and language models. We discuss the potential uses of these language models in data mining and the large volumes of unlabeled data from which they were generated. We believe that the summaries and repositories we present here of our data, annotation guidelines, models, and tools will be beneficial to the research community as a single-point entry for all these resources, and will promote further research in this area.

DiegoLab16 at SemEval-2016 Task 4: Sentiment Analysis in Twitter using Centroids, Clusters, and Sentiment Lexicons
Abeed Sarker | Graciela Gonzalez
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

DIEGOLab: An Approach for Message-level Sentiment Classification in Twitter
Abeed Sarker | Azadeh Nikfarjam | Davy Weissenbacher | Graciela Gonzalez
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

Natural Language Processing Methods for Enhancing Geographic Metadata for Phylogeography of Zoonotic Viruses
Tasnia Tahsin | Robert Rivera | Rachel Beard | Rob Lauder | Davy Weissenbacher | Matthew Scotch | Garrick Wallstrom | Graciela Gonzalez
Proceedings of BioNLP 2014

2013

Evaluating the Use of Empirically Constructed Lexical Resources for Named Entity Recognition
Siddhartha Jonnalagadda | Trevor Cohen | Stephen Wu | Hongfang Liu | Graciela Gonzalez
Proceedings of the IWCS 2013 Workshop on Computational Semantics in Clinical Text (CSCT 2013)

2012

Automatic Approaches for Gene-Drug Interaction Extraction from Biomedical Text: Corpus and Comparative Evaluation
Nate Sutton | Laura Wojtulewicz | Neel Mehta | Graciela Gonzalez
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

Double Layered Learning for Biological Event Extraction from Text
Ehsan Emadzadeh | Azadeh Nikfarjam | Graciela Gonzalez
Proceedings of BioNLP Shared Task 2011 Workshop

2010

Towards Internet-Age Pharmacovigilance: Extracting Adverse Drug Reactions from User Posts in Health-Related Social Networks
Robert Leaman | Laura Wojtulewicz | Ryan Sullivan | Annie Skariah | Jian Yang | Graciela Gonzalez
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing

2009

Molecular event extraction from Link Grammar parse trees
Jörg Hakenberg | Illés Solt | Domonkos Tikk | Luis Tari | Astrid Rheinländer | Nguyen Quang Long | Graciela Gonzalez | Ulf Leser
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text
Siddhartha Jonnalagadda | Luis Tari | Jörg Hakenberg | Chitta Baral | Graciela Gonzalez
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers