Amy Siu


2020

TrainX – Named Entity Linking with Active Sampling and Bi-Encoders
Tom Oberhauser | Tim Bischoff | Karl Brendel | Maluna Menke | Tobias Klatt | Amy Siu | Felix Alexander Gers | Alexander Löser
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

We demonstrate TrainX, a system for Named Entity Linking for medical experts. It combines state-of-the-art entity recognition and linking architectures, such as Flair and fine-tuned Bi-Encoders based on BERT, with an easy-to-use interface for healthcare professionals. We support medical experts in annotating training data by using active sampling strategies to forward informative samples to the annotator. We demonstrate that our model is capable of linking against large knowledge bases, such as UMLS (3.6 million entities), and supporting zero-shot cases, where the linker has never seen the entity before. Those zero-shot capabilities help to mitigate the problem of rare and expensive training data that is a common issue in the medical domain.

2019

Findings of the WMT 2019 Biomedical Translation Shared Task: Evaluation for MEDLINE Abstracts and Biomedical Terminologies
Rachel Bawden | Kevin Bretonnel Cohen | Cristian Grozea | Antonio Jimeno Yepes | Madeleine Kittner | Martin Krallinger | Nancy Mah | Aurélie Névéol | Mariana Neves | Felipe Soares | Amy Siu | Karin Verspoor | Maika Vicente Navarro
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

In the fourth edition of the WMT Biomedical Translation task, we considered a total of six languages, namely Chinese (zh), English (en), French (fr), German (de), Portuguese (pt), and Spanish (es). We performed an evaluation of automatic translations for a total of 10 language directions, namely, zh/en, en/zh, fr/en, en/fr, de/en, en/de, pt/en, en/pt, es/en, and en/es. We provided training data based on MEDLINE abstracts for eight of the 10 language pairs and test sets for all of them. In addition to that, we offered a new sub-task for the translation of terms in biomedical terminologies for the en/es language direction. Higher BLEU scores (close to 0.5) were obtained for the es/en, en/es and en/pt test sets, as well as for the terminology sub-task. After manual validation of the primary runs, some submissions were judged to be better than the reference translations, for instance, for de/en, en/es and es/en.

2018

Findings of the WMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Cristian Grozea | Amy Siu | Madeleine Kittner | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Machine translation enables the automatic translation of textual documents between languages and can facilitate access to information only available in a given language for non-speakers of this language, e.g. research results presented in scientific publications. In this paper, we provide an overview of the Biomedical Translation shared task in the Workshop on Machine Translation (WMT) 2018, which specifically examined the performance of machine translation systems for biomedical texts. This year, we provided test sets of scientific publications from two sources (EDP and Medline) and for six language pairs (English with each of Chinese, French, German, Portuguese, Romanian and Spanish). We describe the development of the various test sets, the submissions that we received and the evaluations that we carried out. We obtained a total of 39 runs from six teams and some of this year’s BLEU scores were somewhat higher than last year’s, especially for teams that made use of biomedical resources or state-of-the-art MT algorithms (e.g. Transformer). Finally, our manual evaluation scored automatic translations higher than the reference translations for German and Spanish.

2017

Findings of the WMT 2017 Biomedical Translation Shared Task
Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Karin Verspoor | Ondřej Bojar | Arthur Boyer | Cristian Grozea | Barry Haddow | Madeleine Kittner | Yvonne Lichtblau | Pavel Pecina | Roland Roller | Rudolf Rosa | Amy Siu | Philippe Thomas | Saskia Trescher
Proceedings of the Second Conference on Machine Translation

2016

DeepLife: An Entity-aware Search, Analytics and Exploration Platform for Health and Life Sciences
Patrick Ernst | Amy Siu | Dragan Milchevski | Johannes Hoffart | Gerhard Weikum
Proceedings of ACL-2016 System Demonstrations

Disambiguation of entities in MEDLINE abstracts by combining MeSH terms with knowledge
Amy Siu | Patrick Ernst | Gerhard Weikum
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

Semantic Type Classification of Common Words in Biomedical Noun Phrases
Amy Siu | Gerhard Weikum
Proceedings of BioNLP 15