BioMRC: A Dataset for Biomedical Machine Reading Comprehension
Dimitris Pappas | Petros Stavropoulos | Ion Androutsopoulos | Ryan McDonald
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. Care was taken to reduce noise, compared to the previous BIOREAD dataset of Pappas et al. (2018). Experiments show that simple heuristics do not perform well on the new dataset and that two neural MRC models that had been tested on BIOREAD perform much better on BIOMRC, indicating that the new dataset is indeed less noisy or at least that its task is more feasible. Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better. We also introduce a new BERT-based MRC model, the best version of which substantially outperforms all other methods tested, reaching or surpassing the accuracy of biomedical experts in some experiments. We make the new dataset available in three different sizes, also releasing our code, and providing a leaderboard.

Research & Innovation Activities’ Impact Assessment: The Data4Impact System
Ioanna Grypari | Dimitris Pappas | Natalia Manola | Haris Papageorgiou
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

We present the Data4Impact (D4I) platform, a novel end-to-end system for evidence-based, timely and accurate monitoring and evaluation of research and innovation (R&I) activities. Using the latest technological advances in Human Language Technology (HLT) and our data-driven methodology, we build a novel set of indicators in order to track funded projects and their impact on science, the economy and society as a whole, during and after the project life-cycle. We develop our methodology by targeting Health-related EC projects from 2007 to 2019 to produce solutions that meet the needs of stakeholders (mainly policy-makers and research funders). Various D4I text analytics workflows process datasets and their metadata, extract valuable insights and estimate intermediate results and metrics, culminating in a set of robust indicators that the users can interact with through our dashboard, the D4I Monitor (available at monitor.data4impact.eu). Therefore, our approach, which can be generalized to different contexts, is multidimensional (technology, tools, indicators, dashboard) and the resulting system can provide an innovative solution for public administrators in their policy-making needs related to RDI funding allocation.


Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors
Sotiris Kotitsas | Dimitris Pappas | Ion Androutsopoulos | Ryan McDonald | Marianna Apidianaki
Proceedings of the 18th BioNLP Workshop and Shared Task

Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work.


BioRead: A New Dataset for Biomedical Reading Comprehension
Dimitris Pappas | Ion Androutsopoulos | Haris Papageorgiou
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

AUEB at BioASQ 6: Document and Snippet Retrieval
George Brokos | Polyvios Liosis | Ryan McDonald | Dimitris Pappas | Ion Androutsopoulos
Proceedings of the 6th BioASQ Workshop: A challenge on large-scale biomedical semantic indexing and question answering

We present AUEB’s submissions to the BioASQ 6 document and snippet retrieval tasks (parts of Task 6b, Phase A). Our models use novel extensions to deep learning architectures that operate solely over the text of the query and the candidate documents or snippets. Our systems scored at the top or near the top for all batches of the challenge, highlighting the effectiveness of deep learning for these tasks.