Karin Verspoor

Also published as: Cornelia Maria Verspoor, Karin M. Verspoor


2020

pdf bib
Learning from Unlabelled Data for Clinical Semantic Textual Similarity
Yuxia Wang | Karin Verspoor | Timothy Baldwin
Proceedings of the 3rd Clinical Natural Language Processing Workshop

Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but requires in-domain labelled data for task fine-tuning. To overcome this, we propose to utilise domain unlabelled data by assigning pseudo labels from a general model. We evaluate the approach on two clinical STS datasets, and achieve r= 0.80 on N2C2-STS. Further investigation reveals that if the data distribution of unlabelled sentence pairs is closer to the test data, we can obtain better performance. By leveraging a large general-purpose STS dataset and small-scale in-domain training data, we obtain further improvements to r= 0.90, a new SOTA.

pdf bib
Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity
Yuxia Wang | Fei Liu | Karin Verspoor | Timothy Baldwin
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain. In low-resource setting of clinical STS, these large models tend to be impractical and prone to overfitting. Building on BERT, we study the impact of a number of model design choices, namely different fine-tuning and pooling strategies. We observe that the impact of domain-specific fine-tuning on clinical STS is much less than that in the general domain, likely due to the concept richness of the domain. Based on this, we propose two data augmentation techniques. Experimental results on N2C2-STS 1 demonstrate substantial improvements, validating the utility of the proposed methods.

pdf bib
Domain Adaptation and Instance Selection for Disease Syndrome Classification over Veterinary Clinical Notes
Brian Hur | Timothy Baldwin | Karin Verspoor | Laura Hardefeldt | James Gilkerson
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Identifying the reasons for antibiotic administration in veterinary records is a critical component of understanding antimicrobial usage patterns. This informs antimicrobial stewardship programs designed to fight antimicrobial resistance, a major health crisis affecting both humans and animals in which veterinarians have an important role to play. We propose a document classification approach to determine the reason for administration of a given drug, with particular focus on domain adaptation from one drug to another, and instance selection to minimize annotation effort.

pdf bib
WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking
Afshin Rahimi | Timothy Baldwin | Karin Verspoor
Proceedings of the 28th International Conference on Computational Linguistics

We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources. We propose a cross-lingual neural reranking model to match a UMLS concept with a Wikipedia page, which achieves a recall@1of 72%, a substantial improvement of 20% over word- and char-level BM25, enabling manual alignment with minimal effort. We release our resources, including ranked Wikipedia pages for 700k UMLSconcepts, and WikiUMLS, a dataset for training and evaluation of alignment models between UMLS and Wikipedia collected from Wikidata. This will provide easier access to Wikipedia for health professionals, patients, and NLP systems, including in multilingual settings.

pdf bib
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020
Karin Verspoor | Kevin Bretonnel Cohen | Mark Dredze | Emilio Ferrara | Jonathan May | Robert Munro | Cecile Paris | Byron Wallace
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

pdf bib
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020
Karin Verspoor | Kevin Bretonnel Cohen | Michael Conway | Berry de Bruijn | Mark Dredze | Rada Mihalcea | Byron Wallace
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

pdf bib
Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration
Yulia Otmakhova | Karin Verspoor | Timothy Baldwin | Simon Šuster
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

Efficient discovery and exploration of biomedical literature has grown in importance in the context of the COVID-19 pandemic, and topic-based methods such as latent Dirichlet allocation (LDA) are a useful tool for this purpose. In this study we compare traditional topic models based on word tokens with topic models based on medical concepts, and propose several ways to improve topic coherence and specificity.

2019

pdf bib
Detecting Chemical Reactions in Patents
Hiyori Yoshikawa | Dat Quoc Nguyen | Zenan Zhai | Christian Druckenbrodt | Camilo Thorne | Saber A. Akhondi | Timothy Baldwin | Karin Verspoor
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association

Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration. In this paper we introduce the novel task of detecting the textual spans that describe or refer to chemical reactions within patents. We formulate this task as a paragraph-level sequence tagging problem, where the system is required to return a sequence of paragraphs which contain a description of a reaction. To address this new task, we construct an annotated dataset from an existing proprietary database of chemical reactions manually extracted from patents. We introduce several baseline methods for the task and evaluate them over our dataset. Through error analysis, we discuss what makes the task complex and challenging, and suggest possible directions for future research.

pdf bib
Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings
Zenan Zhai | Dat Quoc Nguyen | Saber Akhondi | Camilo Thorne | Christian Druckenbrodt | Trevor Cohn | Michelle Gregory | Karin Verspoor
Proceedings of the 18th BioNLP Workshop and Shared Task

Chemical patents are an important resource for chemical information. However, few chemical Named Entity Recognition (NER) systems have been evaluated on patent documents, due in part to their structural and linguistic complexity. In this paper, we explore the NER performance of a BiLSTM-CRF model utilising pre-trained word embeddings, character-level word representations and contextualized ELMo word representations for chemical patents. We compare word embeddings pre-trained on biomedical and chemical patent corpora. The effect of tokenizers optimized for the chemical domain on NER performance in chemical patents is also explored. The results on two patent corpora show that contextualized word representations generated from ELMo substantially improve chemical NER performance w.r.t. the current state-of-the-art. We also show that domain-specific resources such as word embeddings trained on chemical patents and chemical-specific tokenizers, have a positive impact on NER performance.

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

pdf bib
Findings of the WMT 2019 Biomedical Translation Shared Task: Evaluation for MEDLINE Abstracts and Biomedical Terminologies
Rachel Bawden | Kevin Bretonnel Cohen | Cristian Grozea | Antonio Jimeno Yepes | Madeleine Kittner | Martin Krallinger | Nancy Mah | Aurelie Neveol | Mariana Neves | Felipe Soares | Amy Siu | Karin Verspoor | Maika Vicente Navarro
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

In the fourth edition of the WMT Biomedical Translation task, we considered a total of six languages, namely Chinese (zh), English (en), French (fr), German (de), Portuguese (pt), and Spanish (es). We performed an evaluation of automatic translations for a total of 10 language directions, namely, zh/en, en/zh, fr/en, en/fr, de/en, en/de, pt/en, en/pt, es/en, and en/es. We provided training data based on MEDLINE abstracts for eight of the 10 language pairs and test sets for all of them. In addition to that, we offered a new sub-task for the translation of terms in biomedical terminologies for the en/es language direction. Higher BLEU scores (close to 0.5) were obtained for the es/en, en/es and en/pt test sets, as well as for the terminology sub-task. After manual validation of the primary runs, some submissions were judged to be better than the reference translations, for instance, for de/en, en/es and es/en.

pdf bib
A Bag-of-concepts Model Improves Relation Extraction in a Narrow Knowledge Domain with Limited Data
Jiyu Chen | Karin Verspoor | Zenan Zhai
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

This paper focuses on a traditional relation extraction task in the context of limited annotated data and a narrow knowledge domain. We explore this task with a clinical corpus consisting of 200 breast cancer follow-up treatment letters in which 16 distinct types of relations are annotated. We experiment with an approach to extracting typed relations called window-bounded co-occurrence (WBC), which uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on the state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing with methods of higher complexity, and explore its effectiveness on this small dataset.

2018

pdf bib
Parallel Corpora for the Biomedical Domain
Aurélie Névéol | Antonio Jimeno Yepes | Mariana Neves | Karin Verspoor
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
An Improved Neural Network Model for Joint POS Tagging and Dependency Parsing
Dat Quoc Nguyen | Karin Verspoor
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

We propose a novel neural network model for joint part-of-speech (POS) tagging and dependency parsing. Our model extends the well-known BIST graph-based dependency parser (Kiperwasser and Goldberg, 2016) by incorporating a BiLSTM-based tagging component to produce automatically predicted POS tags for the parser. On the benchmark English Penn treebank, our model obtains strong UAS and LAS scores at 94.51% and 92.87%, respectively, producing 1.5+% absolute improvements to the BIST graph-based parser, and also obtaining a state-of-the-art POS tagging accuracy at 97.97%. Furthermore, experimental results on parsing 61 “big” Universal Dependencies treebanks from raw texts show that our model outperforms the baseline UDPipe (Straka and Strakova, 2017) with 0.8% higher average POS tagging score and 3.6% higher average LAS score. In addition, with our model, we also obtain state-of-the-art downstream task scores for biomedical event extraction and opinion analysis applications. Our code is available together with all pre-trained models at: https://github.com/datquocnguyen/jPTDP

pdf bib
Convolutional neural networks for chemical-disease relation extraction are improved with character-based word embeddings
Dat Quoc Nguyen | Karin Verspoor
Proceedings of the BioNLP 2018 workshop

We investigate the incorporation of character-based word representations into a standard CNN-based relation extraction model. We experiment with two common neural architectures, CNN and LSTM, to learn word vector representations from character embeddings. Through a task on the BioCreative-V CDR corpus, extracting relationships between chemicals and diseases, we show that models exploiting the character-based word representations improve on models that do not use this information, obtaining state-of-the-art result relative to previous neural approaches.

pdf bib
Comparing CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition
Zenan Zhai | Dat Quoc Nguyen | Karin Verspoor
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

We compare the use of LSTM-based and CNN-based character-level word embeddings in BiLSTM-CRF models to approach chemical and disease named entity recognition (NER) tasks. Empirical results over the BioCreative V CDR corpus show that the use of either type of character-level word embeddings in conjunction with the BiLSTM-CRF models leads to comparable state-of-the-art performance. However, the models using CNN-based character-level word embeddings have a computational performance advantage, increasing training time over word-based models by 25% while the LSTM-based character-level word embeddings more than double the required training time.

pdf bib
Proceedings of the Third Conference on Machine Translation: Research Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Research Papers

bib
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

pdf bib
Findings of the WMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Cristian Grozea | Amy Siu | Madeleine Kittner | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Machine translation enables the automatic translation of textual documents between languages and can facilitate access to information only available in a given language for non-speakers of this language, e.g. research results presented in scientific publications. In this paper, we provide an overview of the Biomedical Translation shared task in the Workshop on Machine Translation (WMT) 2018, which specifically examined the performance of machine translation systems for biomedical texts. This year, we provided test sets of scientific publications from two sources (EDP and Medline) and for six language pairs (English with each of Chinese, French, German, Portuguese, Romanian and Spanish). We describe the development of the various test sets, the submissions that we received and the evaluations that we carried out. We obtained a total of 39 runs from six teams and some of this year’s BLEU scores were somewhat higher that last year’s, especially for teams that made use of biomedical resources or state-of-the-art MT algorithms (e.g. Transformer). Finally, our manual evaluation scored automatic translations higher than the reference translations for German and Spanish.

2017

pdf bib
SemEval-2017 Task 3: Community Question Answering
Preslav Nakov | Doris Hoogeveen | Lluís Màrquez | Alessandro Moschitti | Hamdy Mubarak | Timothy Baldwin | Karin Verspoor
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe SemEval–2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question–Comment Similarity, (B) Question–Question Similarity, (C) Question–External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task, and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A–D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A–C.

pdf bib
Automatic Negation and Speculation Detection in Veterinary Clinical Text
Katherine Cheng | Timothy Baldwin | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2017

pdf bib
Findings of the WMT 2017 Biomedical Translation Shared Task
Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Karin Verspoor | Ondřej Bojar | Arthur Boyer | Cristian Grozea | Barry Haddow | Madeleine Kittner | Yvonne Lichtblau | Pavel Pecina | Roland Roller | Rudolf Rosa | Amy Siu | Philippe Thomas | Saskia Trescher
Proceedings of the Second Conference on Machine Translation

2016

pdf bib
Syndromic Surveillance through Measuring Lexical Shift in Emergency Department Chief Complaint Texts
Hafsah Aamer | Bahadorreza Ofoghi | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2016

pdf bib
ASM Kernel: Graph Kernel using Approximate Subgraph Matching for Relation Extraction
Nagesh C. Panyam | Karin Verspoor | Trevor Cohn | Rao Kotagiri
Proceedings of the Australasian Language Technology Association Workshop 2016

pdf bib
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

bib
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Findings of the 2016 Conference on Machine Translation
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Varvara Logacheva | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Martin Popel | Matt Post | Raphael Rubino | Carolina Scarton | Lucia Specia | Marco Turchi | Karin Verspoor | Marcos Zampieri
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
SeeDev Binary Event Extraction using SVMs and a Rich Feature Set
Nagesh C. Panyam | Gitansh Khirbat | Karin Verspoor | Trevor Cohn | Kotagiri Ramamohanarao
Proceedings of the 4th BioNLP Shared Task Workshop

pdf bib
Rev at SemEval-2016 Task 2: Aligning Chunks by Lexical, Part of Speech and Semantic Equivalence
Ping Tan | Karin Verspoor | Timothy Miller
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment
Julio Cesar Salinas Alvarado | Karin Verspoor | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2015

pdf bib
Structural Alignment as the Basis to Improve Significant Change Detection in Versioned Sentences
Ping Ping Tan | Karin Verspoor | Tim Miller
Proceedings of the Australasian Language Technology Association Workshop 2015

2014

pdf bib
Automated Generation of Test Suites for Error Analysis of Concept Recognition Systems
Tudor Groza | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2014

pdf bib
Exploring Temporal Patterns in Emergency Department Triage Notes with Topic Models
Simon Kocbek | Karin Verspoor | Wray Buntine
Proceedings of the Australasian Language Technology Association Workshop 2014

pdf bib
Analysis of Coreference Relations in the Biomedical Literature
Miji Choi | Karin Verspoor | Justin Zobel
Proceedings of the Australasian Language Technology Association Workshop 2014

pdf bib
Integrating UIMA with Alveo, a human communication science virtual laboratory
Dominique Estival | Steve Cassidy | Karin Verspoor | Andrew MacKinlay | Denis Burnham
Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT

pdf bib
What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages
Long Duong | Trevor Cohn | Karin Verspoor | Steven Bird | Paul Cook
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

pdf bib
Earlier Identification of Epilepsy Surgery Candidates Using Natural Language Processing
Pawel Matykiewicz | Kevin Cohen | Katherine D. Holland | Tracy A. Glauser | Shannon M. Standridge | Karin M. Verspoor | John Pestian
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

pdf bib
Extracting Biomedical Events and Modifications Using Subgraph Matching with Noisy Training Data
Andrew MacKinlay | David Martinez | Antonio Jimeno Yepes | Haibin Liu | W. John Wilbur | Karin Verspoor
Proceedings of the BioNLP Shared Task 2013 Workshop

pdf bib
Generalizing an Approximate Subgraph Matching-based System to Extract Events in Molecular Biology and Cancer Genetics
Haibin Liu | Karin Verspoor | Donald C. Comeau | Andrew MacKinlay | W. John Wilbur
Proceedings of the BioNLP Shared Task 2013 Workshop

pdf bib
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)
Sarvnaz Karimi | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

pdf bib
Impact of Corpus Diversity and Complexity on NER Performance
Tatyana Shmanina | Ingrid Zukerman | Antonio Jimeno Yepes | Lawrence Cavedon | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

pdf bib
e-Learning with Kaggle in Class: Adapting the ALTA Shared Task 2013 to a Class Project
Karin Verspoor | Jeremy Nicholson
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

2012

pdf bib
Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web
Karin Verspoor | Kevin Livingston
Proceedings of the Sixth Linguistic Annotation Workshop

2011

pdf bib
Fast and simple semantic class assignment for biomedical text
K. Bretonnel Cohen | Thomas Christiansen | William Baumgartner Jr. | Karin Verspoor | Lawrence Hunter
Proceedings of BioNLP 2011 Workshop

pdf bib
From Graphs to Events: A Subgraph Matching Approach for Information Extraction from Biomedical Text
Haibin Liu | Ravikumar Komandur | Karin Verspoor
Proceedings of BioNLP Shared Task 2011 Workshop

2010

pdf bib
Test Suite Design for Biomedical Ontology Concept Recognition Systems
K. Bretonnel Cohen | Christophe Roeder | William A. Baumgartner Jr. | Lawrence E. Hunter | Karin Verspoor
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Systems that locate mentions of concepts from ontologies in free text are known as ontology concept recognition systems. This paper describes an approach to the evaluation of the workings of ontology concept recognition systems through use of a structured test suite and presents a publicly available test suite for this purpose. It is built using the principles of descriptive linguistic fieldwork and of software testing. More broadly, we also seek to investigate what general principles might inform the construction of such test suites. The test suite was found to be effective in identifying performance errors in an ontology concept recognition system. The system could not recognize 2.1% of all canonical forms and no non-canonical forms at all. Regarding the question of general principles of test suite construction, we compared this test suite to a named entity recognition test suite constructor. We found that they had twenty features in total and that seven were shared between the two models, suggesting that there is a core of feature types that may be applicable to test suite construction for any similar type of application.

2009

pdf bib
High-precision biological event extraction with a concept recognizer
K. Bretonnel Cohen | Karin Verspoor | Helen Johnson | Chris Roeder | Philip Ogren | William Baumgartner | Elizabeth White | Lawrence Hunter
Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task

2006

pdf bib
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
Karin Verspoor | Kevin Bretonnel Cohen | Ben Goertzel | Inderjeet Mani
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

1998

pdf bib
Predictivity vs. Stipulativity in the Lexicon
Cornelia Maria Verspoor
Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation

pdf bib
Automatic English-Chinese name transliteration for development of multilingual resources
Stephen Wan | Cornelia Maria Verspoor
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
Automatic English-Chinese name transliteration for development of multilingual resources
Stephen Wan | Cornelia Maria Verspoor
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Search
Co-authors