Mariana Neves


2020

pdf bib
MEDLINE as a Parallel Corpus: a Survey to Gain Insight on French-, Spanish- and Portuguese-speaking Authors’ Abstract Writing Practice
Aurélie Névéol | Antonio Jimeno Yepes | Mariana Neves
Proceedings of the 12th Language Resources and Evaluation Conference

Background: Parallel corpora are used to train and evaluate machine translation systems. To alleviate the cost of producing parallel resources for evaluation campaigns, existing corpora are leveraged. However, little information may be available about the methods used for producing the corpus, including translation direction. Objective: To gain insight on MEDLINE parallel corpus used in the biomedical task at the Workshop on Machine Translation in 2019 (WMT 2019). Material and Methods: Contact information for the authors of MEDLINE articles included in the English/Spanish (EN/ES), English/French (EN/FR), and English/Portuguese (EN/PT) WMT 2019 test sets was obtained from PubMed and publisher websites. The authors were asked about their abstract writing practices in a survey. Results: The response rate was above 20%. Authors reported that they are mainly native speakers of languages other than English. Although manual translation, sometimes via professional translation services, was commonly used for abstract translation, authors of articles in the EN/ES and EN/PT sets also relied on post-edited machine translation. Discussion: This study provides a characterization of MEDLINE authors’ language skills and abstract writing practices. Conclusion: The information collected in this study will be used to inform test set design for the next WMT biomedical task.

2019

pdf bib
Evaluation of Scientific Elements for Text Similarity in Biomedical Publications
Mariana Neves | Daniel Butzke | Barbara Grune
Proceedings of the 6th Workshop on Argument Mining

Rhetorical elements from scientific publications provide a more structured view of the document and allow algorithms to focus on particular parts of the text. We surveyed the literature for previously proposed schemes for rhetorical elements and present an overview of its current state of the art. We also searched for available tools using these schemes and applied four tools for our particular task of ranking biomedical abstracts based on text similarity. Comparison of the tools with two strong baselines shows that the predictions provided by the ArguminSci tool can support our use case of mining alternative methods for animal experiments.

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

pdf bib
Findings of the WMT 2019 Biomedical Translation Shared Task: Evaluation for MEDLINE Abstracts and Biomedical Terminologies
Rachel Bawden | Kevin Bretonnel Cohen | Cristian Grozea | Antonio Jimeno Yepes | Madeleine Kittner | Martin Krallinger | Nancy Mah | Aurelie Neveol | Mariana Neves | Felipe Soares | Amy Siu | Karin Verspoor | Maika Vicente Navarro
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

In the fourth edition of the WMT Biomedical Translation task, we considered a total of six languages, namely Chinese (zh), English (en), French (fr), German (de), Portuguese (pt), and Spanish (es). We performed an evaluation of automatic translations for a total of 10 language directions, namely, zh/en, en/zh, fr/en, en/fr, de/en, en/de, pt/en, en/pt, es/en, and en/es. We provided training data based on MEDLINE abstracts for eight of the 10 language pairs and test sets for all of them. In addition to that, we offered a new sub-task for the translation of terms in biomedical terminologies for the en/es language direction. Higher BLEU scores (close to 0.5) were obtained for the es/en, en/es and en/pt test sets, as well as for the terminology sub-task. After manual validation of the primary runs, some submissions were judged to be better than the reference translations, for instance, for de/en, en/es and es/en.

2018

pdf bib
Bf3R at SemEval-2018 Task 7: Evaluating Two Relation Extraction Tools for Finding Semantic Relations in Biomedical Abstracts
Mariana Neves | Daniel Butzke | Gilbert Schönfelder | Barbara Grune
Proceedings of The 12th International Workshop on Semantic Evaluation

Automatic extraction of semantic relations from text can support finding relevant information from scientific publications. We describe our participation in Task 7 of SemEval-2018 for which we experimented with two relations extraction tools - jSRE and TEES - for the extraction and classification of six relation types. The results we obtained with TEES were significantly superior than those with jSRE (33.4% vs. 30.09% and 20.3% vs. 16%). Additionally, we utilized the model trained with TEES for extracting semantic relations from biomedical abstracts, for which we present a preliminary evaluation.

pdf bib
Parallel Corpora for the Biomedical Domain
Aurélie Névéol | Antonio Jimeno Yepes | Mariana Neves | Karin Verspoor
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Diego Moussallem | Thiago Ferreira | Marcos Zampieri | Maria Claudia Cavalcanti | Geraldo Xexéo | Mariana Neves | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Semantic role labeling tools for biomedical question answering: a study of selected tools on the BioASQ datasets
Fabian Eckert | Mariana Neves
Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering

Question answering (QA) systems usually rely on advanced natural language processing components to precisely understand the questions and extract the answers. Semantic role labeling (SRL) is known to boost performance for QA, but its use for biomedical texts has not yet been fully studied. We analyzed the performance of three SRL tools (BioKIT, BIOSMILE and PathLSTM) on 1776 questions from the BioASQ challenge. We compared the systems regarding the coverage of the questions and snippets, as well as based on pre-defined criteria, such as easiness of installation, supported formats and usability. Finally, we integrated two of the tools in a simple QA system to further evaluate their performance over the official BioASQ test sets.

pdf bib
Proceedings of the Third Conference on Machine Translation: Research Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Research Papers

bib
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

pdf bib
Findings of the WMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Cristian Grozea | Amy Siu | Madeleine Kittner | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Machine translation enables the automatic translation of textual documents between languages and can facilitate access to information only available in a given language for non-speakers of this language, e.g. research results presented in scientific publications. In this paper, we provide an overview of the Biomedical Translation shared task in the Workshop on Machine Translation (WMT) 2018, which specifically examined the performance of machine translation systems for biomedical texts. This year, we provided test sets of scientific publications from two sources (EDP and Medline) and for six language pairs (English with each of Chinese, French, German, Portuguese, Romanian and Spanish). We describe the development of the various test sets, the submissions that we received and the evaluations that we carried out. We obtained a total of 39 runs from six teams and some of this year’s BLEU scores were somewhat higher that last year’s, especially for teams that made use of biomedical resources or state-of-the-art MT algorithms (e.g. Transformer). Finally, our manual evaluation scored automatic translations higher than the reference translations for German and Spanish.

2017

pdf bib
Olelo: A Question Answering Application for Biomedicine
Mariana Neves | Hendrik Folkerts | Marcel Jankrift | Julian Niedermeier | Toni Stachewicz | Sören Tietböhl | Milena Kraus | Matthias Uflacker
Proceedings of ACL 2017, System Demonstrations

pdf bib
Neural Domain Adaptation for Biomedical Question Answering
Georg Wiese | Dirk Weissenborn | Mariana Neves
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Factoid question answering (QA) has recently benefited from the development of deep learning (DL) systems. Neural network models outperform traditional approaches in domains where large datasets exist, such as SQuAD (ca. 100,000 questions) for Wikipedia articles. However, these systems have not yet been applied to QA in more specific domains, such as biomedicine, because datasets are generally too small to train a DL system from scratch. For example, the BioASQ dataset for biomedical QA comprises less then 900 factoid (single answer) and list (multiple answers) QA instances. In this work, we adapt a neural QA system trained on a large open-domain dataset (SQuAD, source) to a biomedical dataset (BioASQ, target) by employing various transfer learning techniques. Our network architecture is based on a state-of-the-art QA system, extended with biomedical word embeddings and a novel mechanism to answer list questions. In contrast to existing biomedical QA systems, our system does not rely on domain-specific ontologies, parsers or entity taggers, which are expensive to create. Despite this fact, our systems achieve state-of-the-art results on factoid questions and competitive results on list questions.

pdf bib
Neural Question Answering at BioASQ 5B
Georg Wiese | Dirk Weissenborn | Mariana Neves
BioNLP 2017

This paper describes our submission to the 2017 BioASQ challenge. We participated in Task B, Phase B which is concerned with biomedical question answering (QA). We focus on factoid and list question, using an extractive QA model, that is, we restrict our system to output substrings of the provided text snippets. At the core of our system, we use FastQA, a state-of-the-art neural QA system. We extended it with biomedical word embeddings and changed its answer layer to be able to answer list questions in addition to factoid questions. We pre-trained the model on a large-scale open-domain QA dataset, SQuAD, and then fine-tuned the parameters on the BioASQ training set. With our approach, we achieve state-of-the-art results on factoid questions and competitive results on list questions.

pdf bib
Assessing the performance of Olelo, a real-time biomedical question answering application
Mariana Neves | Fabian Eckert | Hendrik Folkerts | Matthias Uflacker
BioNLP 2017

Question answering (QA) can support physicians and biomedical researchers to find answers to their questions in the scientific literature. Such systems process large collections of documents in real time and include many natural language processing (NLP) procedures. We recently developed Olelo, a QA system for biomedicine which includes various NLP components, such as question processing, document and passage retrieval, answer processing and multi-document summarization. In this work, we present an evaluation of our system on the the fifth BioASQ challenge. We participated with the current state of the application and with an extension based on semantic role labeling that we are currently investigating. In addition to the BioASQ evaluation, we compared our system to other on-line biomedical QA systems in terms of the response time and the quality of the answers.

pdf bib
A parallel collection of clinical trials in Portuguese and English
Mariana Neves
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

Parallel collections of documents are crucial resources for training and evaluating machine translation (MT) systems. Even though large collections are available for certain domains and language pairs, these are still scarce in the biomedical domain. We developed a parallel corpus of clinical trials in Portuguese and English. The documents are derived from the Brazilian Clinical Trials Registry and the corpus currently contains a total of 1188 documents. In this paper, we describe the corpus construction and discuss the quality of the translation and the sentence alignment that we obtained.

pdf bib
Findings of the WMT 2017 Biomedical Translation Shared Task
Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Karin Verspoor | Ondřej Bojar | Arthur Boyer | Cristian Grozea | Barry Haddow | Madeleine Kittner | Yvonne Lichtblau | Pavel Pecina | Roland Roller | Rudolf Rosa | Amy Siu | Philippe Thomas | Saskia Trescher
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The biomedical scientific literature is a rich source of information not only in the English language, for which it is more abundant, but also in other languages, such as Portuguese, Spanish and French. We present the first freely available parallel corpus of scientific publications for the biomedical domain. Documents from the ”Biological Sciences” and ”Health Sciences” categories were retrieved from the Scielo database and parallel titles and abstracts are available for the following language pairs: Portuguese/English (about 86,000 documents in total), Spanish/English (about 95,000 documents) and French/English (about 2,000 documents). Additionally, monolingual data was also collected for all four languages. Sentences in the parallel corpus were automatically aligned and a manual analysis of 200 documents by native experts found that a minimum of 79% of sentences were correctly aligned in all language pairs. We demonstrate the utility of the corpus by running baseline machine translation experiments. We show that for all language pairs, a statistical machine translation system trained on the parallel corpora achieves performance that rivals or exceeds the state of the art in the biomedical domain. Furthermore, the corpora are currently being used in the biomedical task in the First Conference on Machine Translation (WMT’16).

pdf bib
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

bib
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Findings of the 2016 Conference on Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Martin Popel | Matt Post | Raphael Rubino | Carolina Scarton | Lucia Specia | Marco Turchi | Karin Verspoor | Marcos Zampieri
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
HPI Question Answering System in BioASQ 2016
Frederik Schulze | Ricarda Schüler | Tim Draeger | Daniel Dummer | Alexander Ernst | Pedro Flemming | Cindy Perscheid | Mariana Neves
Proceedings of the Fourth BioASQ workshop

pdf bib
BioMedLAT Corpus: Annotation of the Lexical Answer Type for Biomedical Questions
Mariana Neves | Milena Kraus
Proceedings of the Open Knowledge Base and Question Answering Workshop (OKBQA 2016)

Question answering (QA) systems need to provide exact answers for the questions that are posed to the system. However, this can only be achieved through a precise processing of the question. During this procedure, one important step is the detection of the expected type of answer that the system should provide by extracting the headword of the questions and identifying its semantic type. We have annotated the headword and assigned UMLS semantic types to 643 factoid/list questions from the BioASQ training data. We present statistics on the corpus and a preliminary evaluation in baseline experiments. We also discuss the challenges on both the manual annotation and the automatic detection of the headwords and the semantic types. We believe that this is a valuable resource for both training and evaluation of biomedical QA systems. The corpus is available at: https://github.com/mariananeves/BioMedLAT.

pdf bib
Entity-Supported Summarization of Biomedical Abstracts
Frederik Schulze | Mariana Neves
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

The increasing amount of biomedical information that is available for researchers and clinicians makes it harder to quickly find the right information. Automatic summarization of multiple texts can provide summaries specific to the user’s information needs. In this paper we look into the use named-entity recognition for graph-based summarization. We extend the LexRank algorithm with information about named entities and present EntityRank, a multi-document graph-based summarization algorithm that is solely based on named entities. We evaluate our system on a datasets of 1009 human written summaries provided by BioASQ and on 1974 gene summaries, fetched from the Entrez Gene database. The results show that the addition of named-entity information increases the performance of graph-based summarizers and that the EntityRank significantly outperforms the other methods with regard to the ROUGE measures.

2013

pdf bib
WBI-DDI: Drug-Drug Interaction Extraction using Majority Voting
Philippe Thomas | Mariana Neves | Tim Rocktäschel | Ulf Leser
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2009

pdf bib
Extraction of biomedical events using case-based reasoning
Mariana Neves | José-María Carazo | Alberto Pascual-Montano
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2008

pdf bib
CBR-Tagger: a case-based reasoning approach to the gene/protein mention problem
Mariana Neves | Monica Chagoyen | José María Carazo | Alberto Pascual-Montano
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing