Léo Bouscarrat


2020

Multilingual enrichment of disease biomedical ontologies
Léo Bouscarrat | Antoine Bonnefoy | Cécile Capponi | Carlos Ramisch
Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020)

Translating biomedical ontologies is an important challenge, but doing it manually is time-consuming and costly. We study the possibility of using open-source knowledge bases to translate biomedical ontologies, focusing on two aspects: coverage and quality. We measure the coverage of two disease-focused biomedical ontologies with respect to Wikidata. For both ontologies we cover 9 European languages (Czech, Dutch, English, French, German, Italian, Polish, Portuguese and Spanish), and for the second one we also cover Arabic, Chinese and Russian. We first use direct links between Wikidata and the studied ontologies, and then use second-order links that go through other intermediate ontologies. Finally, we compare the quality of the translations obtained from Wikidata with that of a commercial machine translation tool, Google Cloud Translation.
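The coverage measurement with direct and second-order links can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that a concept counts as covered in a language when at least one Wikidata item linked to it has a label in that language. The mapping names (`direct`, `via_intermediate`, `intermediate_to_wikidata`, `labels`) and all identifiers in the example are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical data shapes (assumptions, not the paper's format):
#   direct:                   ontology concept -> Wikidata items linked to it directly
#   via_intermediate:         ontology concept -> concepts in intermediate ontologies
#   intermediate_to_wikidata: intermediate concept -> Wikidata items
#   labels:                   Wikidata item -> {language code: label}


def linked_items(concept: str,
                 direct: Dict[str, Set[str]],
                 via_intermediate: Dict[str, Set[str]],
                 intermediate_to_wikidata: Dict[str, Set[str]],
                 second_order: bool = True) -> Set[str]:
    """Collect Wikidata items reachable from a concept by direct and, optionally, second-order links."""
    items = set(direct.get(concept, set()))
    if second_order:
        # Second-order link: concept -> intermediate ontology -> Wikidata item.
        for mid in via_intermediate.get(concept, set()):
            items |= intermediate_to_wikidata.get(mid, set())
    return items


def coverage(concepts: Iterable[str],
             lang: str,
             direct: Dict[str, Set[str]],
             via_intermediate: Dict[str, Set[str]],
             intermediate_to_wikidata: Dict[str, Set[str]],
             labels: Dict[str, Dict[str, str]],
             second_order: bool = True) -> float:
    """Fraction of concepts with at least one linked Wikidata item labelled in `lang`."""
    concepts = list(concepts)
    if not concepts:
        return 0.0
    covered = 0
    for c in concepts:
        items = linked_items(c, direct, via_intermediate,
                             intermediate_to_wikidata, second_order)
        if any(lang in labels.get(q, {}) for q in items):
            covered += 1
    return covered / len(concepts)


# Toy example with made-up identifiers.
concepts = ["D1", "D2", "D3"]
direct = {"D1": {"Q1"}}
via_intermediate = {"D2": {"X9"}}
intermediate_to_wikidata = {"X9": {"Q2"}}
labels = {"Q1": {"en": "disease A", "fr": "maladie A"}, "Q2": {"en": "disease B"}}

print(coverage(concepts, "en", direct, via_intermediate, intermediate_to_wikidata, labels))
print(coverage(concepts, "en", direct, via_intermediate, intermediate_to_wikidata, labels,
               second_order=False))
```

In the toy data, adding second-order links raises English coverage from one concept in three to two in three. The point of the sketch is that an intermediate ontology can fill gaps left by direct links.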

2019

STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings
Léo Bouscarrat | Antoine Bonnefoy | Thomas Peel | Cécile Pereira
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

This paper introduces STRASS: Summarization by TRAnsformation Selection and Scoring. It is an extractive text summarization method that leverages the semantic information in existing sentence embedding spaces. Our method creates an extractive summary by selecting the sentences whose embeddings are closest to the document embedding. The model learns a transformation of the document embedding to maximize the similarity between the extractive summary and the ground truth summary. As the transformation consists of a single dense layer, training can be done on a CPU and is therefore inexpensive. Moreover, inference is fast, and its time grows linearly with the number of sentences. As a second contribution, we introduce the French CASS dataset, composed of judgments from the French Court of Cassation and their corresponding summaries. On this dataset, our results show that our method performs comparably to state-of-the-art extractive methods while keeping training and inference times low.
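The selection step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes that sentence and document embeddings are already computed, that the dense layer is an affine map y = Wx + b whose weights are given (training is not shown), and that a sentence is kept when its cosine similarity to the transformed document embedding reaches a fixed threshold. The threshold rule and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dense(weights: Sequence[Sequence[float]], bias: Sequence[float],
          x: Sequence[float]) -> Vector:
    """Apply a single dense (affine) layer: y = W x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_sentences(sentence_embs: Sequence[Sequence[float]],
                     doc_emb: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     bias: Sequence[float],
                     threshold: float) -> List[int]:
    """Indices (in document order) of sentences close to the transformed document embedding."""
    target = dense(weights, bias, doc_emb)
    # One similarity computation per sentence, so the cost is linear in the number of sentences.
    return [i for i, s in enumerate(sentence_embs) if cosine(s, target) >= threshold]


# Toy 2-d embeddings; identity weights stand in for a trained layer.
sentences = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
doc = [1.0, 0.2]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(select_sentences(sentences, doc, identity, [0.0, 0.0], threshold=0.8))
```

Because the only learned component is the small `weights`/`bias` pair, fitting it is cheap. Scoring is a single pass over the sentences, which matches the linear inference time claimed in the abstract.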

2018

Towards Language Technology for Mi’kmaq
Anant Maheshwari | Léo Bouscarrat | Paul Cook
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)