Manuel Carlos Díaz-Galiano


2020

Transfer learning applied to text classification in Spanish radiological reports
Pilar López Úbeda | Manuel Carlos Díaz-Galiano | L. Alfonso Urena Lopez | Maite Martin | Teodoro Martín-Noguerol | Antonio Luna
Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)

Pre-trained text encoders have rapidly advanced the state of the art on many Natural Language Processing tasks. This paper presents the use of transfer learning methods applied to the automatic detection of codes in radiological reports in Spanish. Assigning codes to a clinical document is a popular task in NLP and in the biomedical domain. These codes can be of two types: standard classifications (e.g. ICD-10) or codes specific to each clinic or hospital. In this study we present a system that uses clinic-specific radiology codes. The dataset is composed of 208,167 radiology reports labeled with 89 different codes. The corpus has been evaluated with three BERT-based models applicable to Spanish: Multilingual BERT, BETO and XLM. The best result is an F1-score of 70%, obtained with a pre-trained multilingual model.

2019

SINAI-DL at SemEval-2019 Task 5: Recurrent networks and data augmentation by paraphrasing
Arturo Montejo-Ráez | Salud María Jiménez-Zafra | Miguel A. García-Cumbreras | Manuel Carlos Díaz-Galiano
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the participation of the SINAI-DL team in Task 5 of SemEval 2019, called HatEval. We have applied some classic neural network layers, such as word embeddings and LSTM, to build a neural classifier for both proposed tasks. Because the amount of training data provided is small compared to what deep architectures need for an adequate learning stage, we explore the use of paraphrasing tools as a source of data augmentation. Our results show that this method is promising, as some improvement has been found over non-augmented training sets.

SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions
Miguel A. García-Cumbreras | Salud María Jiménez-Zafra | Arturo Montejo-Ráez | Manuel Carlos Díaz-Galiano | Estela Saquete
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the participation of the SINAI-DL team in RumourEval (Task 7 in SemEval 2019, subtask A: SDQC). SDQC addresses the challenge of rumour stance classification as an indirect way of identifying potential rumours. Given a tweet with several replies, our system classifies each reply as supporting, denying, questioning or commenting on the underlying rumour. We have applied data augmentation, temporal expression labelling and transfer learning with a four-layer neural classifier. Our official run achieves an accuracy of 0.715 over reply tweets.

2016

Pictogrammar: an AAC device based on a semantic grammar
Fernando Martínez-Santiago | Miguel Ángel García-Cumbreras | Arturo Montejo-Ráez | Manuel Carlos Díaz-Galiano
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications