Aitor González-Agirre

Also published as: Aitor Gonzalez-Agirre


2019

PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track
Aitor Gonzalez-Agirre | Montserrat Marimon | Ander Intxaurrondo | Obdulia Rabal | Marta Villegas | Martin Krallinger
Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

Chemical compounds and drugs are among the biomedical entity types of relevance for medicine and biosciences. The correct detection of these entities is critical for other text mining applications building on them, such as adverse drug-reaction detection, medication-related fake news or drug-target extraction. Although a significant effort was made to detect mentions of drugs/chemicals in English texts, so far only very limited attempts were made to recognize them in medical documents in other languages. Taking into account the growing amount of medical publications and clinical records written in Spanish, we have organized the first shared task on detecting drug and chemical entities in Spanish medical documents. Additionally, we included a clinical concept-indexing sub-track asking teams to return SNOMED-CT identifiers related to drugs/chemicals for a collection of documents. For this task, named PharmaCoNER, we generated annotation guidelines together with a corpus of 1,000 manually annotated clinical case studies. A total of 22 teams participated in sub-track 1 (77 system runs), and 7 teams in sub-track 2 (19 system runs). Top scoring teams used sophisticated deep learning approaches yielding very competitive results with F-measures above 0.91. These results indicate that there is a real interest in promoting biomedical text mining efforts beyond English. We foresee that the PharmaCoNER annotation guidelines, corpus and participant systems will foster the development of new resources for clinical and biomedical text mining systems of Spanish medical data.

Medical Word Embeddings for Spanish: Development and Evaluation
Felipe Soares | Marta Villegas | Aitor Gonzalez-Agirre | Martin Krallinger | Jordi Armengol-Estapé
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Word embeddings are representations of words in a dense vector space. Although they are not a recent phenomenon in Natural Language Processing (NLP), they have gained momentum after the recent developments of neural methods and Word2Vec. Regarding their applications in medical and clinical NLP, they are invaluable resources when training in-domain named entity recognition systems, classifiers or taggers, for instance. Thus, the development of tailored word embeddings for medical NLP is of great interest. However, we identified a gap in the literature which we aim to fill in this paper: the availability of embeddings for medical NLP in Spanish, as well as a standardized form of intrinsic evaluation. Since most work has been done for English, some established datasets for intrinsic evaluation are already available. In this paper, we show the steps we employed to adapt such datasets for the first time to Spanish, of particular relevance due to the considerable volume of EHRs in this language, as well as the creation of in-domain medical word embeddings for Spanish using the state-of-the-art FastText model. We performed intrinsic evaluation with our adapted datasets, as well as extrinsic evaluation with a named entity recognition system, using general-domain embeddings as a baseline. Both experiments showed that our embeddings are suitable for use in medical NLP in the Spanish language, and are more accurate than general-domain ones.

2016

SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
Eneko Agirre | Carmen Banea | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Rada Mihalcea | German Rigau | Janyce Wiebe
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

SemEval-2016 Task 2: Interpretable Semantic Textual Similarity
Eneko Agirre | Aitor Gonzalez-Agirre | Iñigo Lopez-Gazpio | Montse Maritxalar | German Rigau | Larraitz Uria
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

UBC: Cubes for English Semantic Textual Similarity and Supervised Approaches for Interpretable STS
Eneko Agirre | Aitor Gonzalez-Agirre | Iñigo Lopez-Gazpio | Montse Maritxalar | German Rigau | Larraitz Uria
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Iñigo Lopez-Gazpio | Montse Maritxalar | Rada Mihalcea | German Rigau | Larraitz Uria | Janyce Wiebe
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
Eneko Agirre | Carmen Banea | Claire Cardie | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo | Rada Mihalcea | German Rigau | Janyce Wiebe
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

*SEM 2013 shared task: Semantic Textual Similarity
Eneko Agirre | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre | Weiwei Guo
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

UBC_UOS-TYPED: Regression for typed-similarity
Eneko Agirre | Nikolaos Aletras | Aitor Gonzalez-Agirre | German Rigau | Mark Stevenson
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
Eneko Agirre | Daniel Cer | Mona Diab | Aitor Gonzalez-Agirre
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Multilingual Central Repository version 3.0
Aitor Gonzalez-Agirre | Egoitz Laparra | German Rigau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes the upgrading process of the Multilingual Central Repository (MCR). The new MCR uses WordNet 3.0 as Interlingual-Index (ILI). The current version of the MCR integrates, in the same EuroWordNet framework, wordnets from five different languages: English, Spanish, Catalan, Basque and Galician. In order to provide ontological coherence to all the integrated wordnets, the MCR has also been enriched with a disparate set of ontologies: Base Concepts, Top Ontology, WordNet Domains and Suggested Upper Merged Ontology. The whole content of the MCR is freely available.

A proposal for improving WordNet Domains
Aitor González-Agirre | Mauro Castillo | German Rigau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

WordNet Domains (WND) is a lexical resource where synsets have been semi-automatically annotated with one or more domain labels from a set of 165 hierarchically organized domains. Uses of WND include reducing the degree of polysemy of words by grouping those senses that belong to the same domain. But the semi-automatic method used to develop this resource was far from perfect. By cross-checking the content of the Multilingual Central Repository (MCR) it is possible to find some errors and inconsistencies. Many are very subtle. Others, however, leave no doubt. Moreover, it is very difficult to quantify the number of errors in the original version of WND. This paper presents a novel semi-automatic method to propagate domain information through the MCR. We also compare both labellings (the original and the new one), allowing us to detect anomalies in the original WND labels.