Ken Barker


2019

Combining Unsupervised Pre-training and Annotator Rationales to Improve Low-shot Text Classification
Oren Melamud | Mihaela Bornea | Ken Barker
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Supervised learning models often perform poorly at low-shot tasks, i.e., tasks for which little labeled data is available for training. One prominent approach for improving low-shot learning is to use unsupervised pre-trained neural models. Another approach is to obtain richer supervision by collecting annotator rationales (explanations supporting label annotations). In this work, we combine these two approaches to improve low-shot text classification with two novel methods: a simple bag-of-words embedding approach; and a more complex context-aware method, based on the BERT model. In experiments with two English text classification datasets, we demonstrate substantial performance gains from combining pre-training with rationales. Furthermore, our investigation of a range of train-set sizes reveals that the simple bag-of-words approach is the clear top performer when there are only a few dozen training instances or fewer, while more complex models, such as BERT or CNN, require more training data to shine.

Leveraging Medical Literature for Section Prediction in Electronic Health Records
Sara Rosenthal | Ken Barker | Zhicheng Liang
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Electronic Health Records (EHRs) contain both structured content and unstructured (text) content about a patient’s medical history. In the unstructured text parts, there are common sections such as Assessment and Plan, Social History, and Medications. These sections help physicians find information easily and can be used by an information retrieval system to return specific information sought by a user. However, the exact format of sections in a particular EHR often does not adhere to known patterns. Therefore, being able to predict sections and headers in EHRs automatically is beneficial to physicians. Prior approaches in EHR section prediction have only used text data from EHRs and have required significant manual annotation. We propose using sections from medical literature (e.g., textbooks, journals, web content) that contain content similar to that found in EHR sections. Our approach uses data from a different kind of source where labels are provided without the need for a time-consuming annotation effort. We use this data to train two models: an RNN and a BERT-based model. We apply the learned models along with source data via transfer learning to predict sections in EHRs. Our results show that medical literature can provide a helpful supervision signal for this classification task.

2017

Stacking With Auxiliary Features for Entity Linking in the Medical Domain
Nazneen Fatema Rajani | Mihaela Bornea | Ken Barker
BioNLP 2017

Linking spans of natural language text to concepts in a structured source is an important task for many problems. It allows intelligent systems to leverage rich knowledge available in those sources (such as concept properties and relations) to enhance the semantics of the mentions of these concepts in text. In the medical domain, it is common to link text spans to medical concepts in large, curated knowledge repositories such as the Unified Medical Language System. Different approaches have different strengths: some are precision-oriented, some recall-oriented; some better at considering context but more prone to hallucination. The variety of techniques suggests that ensembling could outperform component technologies at this task. In this paper, we describe our process for building a Stacking ensemble using additional, auxiliary features for Entity Linking in the medical domain. We report experiments that show that naive ensembling does not always outperform component Entity Linking systems, that stacking usually outperforms naive ensembling, and that auxiliary features added to the stacker further improve its performance on three distinct datasets. Our best model produces state-of-the-art results on several medical datasets.
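The difference between naive ensembling and stacking with auxiliary features can be illustrated with a minimal, generic sketch. This is not the authors' implementation. It assumes that each component entity linker returns a confidence score per candidate concept (0.0 when it does not propose that concept). It also assumes a hypothetical auxiliary feature (`aux_overlap`, a mention/concept-name similarity score) and swaps in a plain logistic-regression meta-classifier trained by gradient descent. The function names (`naive_vote`, `train_stacker`, `stack_predict`) and the toy numbers are illustrative only.

```python
import math
from collections import Counter

# Generic stacking sketch (not the paper's system). Each candidate concept for a
# mention is described by one confidence score per component entity linker
# (0.0 if that linker did not propose it) plus hypothetical auxiliary features.


def naive_vote(top_choices):
    """Naive ensemble: majority vote over each linker's top-1 concept ID."""
    votes = Counter(c for c in top_choices if c is not None)
    return votes.most_common(1)[0][0] if votes else None


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(model, x):
    """Stacked probability that a candidate with feature vector x is correct."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_stacker(examples, epochs=500, lr=0.1):
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    examples: list of (feature_vector, label) pairs with label in {0, 1}.
    """
    dim = len(examples[0][0])
    model = ([0.0] * dim, 0.0)
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in examples:
            err = score(model, x) - y  # derivative of log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w, b = model
        model = ([wi - lr * g / n for wi, g in zip(w, grad_w)], b - lr * grad_b / n)
    return model


def stack_predict(model, candidates):
    """Return the candidate concept ID with the highest stacked probability."""
    return max(candidates, key=lambda cid: score(model, candidates[cid]))


# Toy data, invented for illustration. Features: [linker_A, linker_B, linker_C,
# aux_overlap]. linker_A is precision-oriented, B and C are recall-oriented,
# and aux_overlap is a hypothetical auxiliary feature.
TRAIN = [
    ([0.9, 0.1, 0.2, 1.0], 1),
    ([0.8, 0.6, 0.5, 0.9], 1),
    ([0.7, 0.2, 0.1, 0.8], 1),
    ([0.0, 0.8, 0.7, 0.1], 0),
    ([0.1, 0.9, 0.8, 0.0], 0),
    ([0.2, 0.7, 0.6, 0.2], 0),
]
CANDIDATES = {"C1": [0.85, 0.2, 0.1, 0.9], "C2": [0.1, 0.9, 0.9, 0.1]}

if __name__ == "__main__":
    # Linker A picks C1; linkers B and C both pick C2.
    print("naive vote:", naive_vote(["C1", "C2", "C2"]))
    model = train_stacker(TRAIN)
    print("stacked:", stack_predict(model, CANDIDATES))
```

On this toy data, the majority vote follows the two recall-oriented linkers to C2. The learned stacker puts more weight on the precision-oriented linker and the auxiliary feature, so it prefers C1. In practice, a stacker is usually trained on held-out (e.g., cross-validated) component outputs so that it does not simply learn to trust whichever linker overfit the training data.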

2010

Building an end-to-end text reading system based on a packed representation
Doo Soon Kim | Ken Barker | Bruce Porter
Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading

Improving the Quality of Text Understanding by Delaying Ambiguity Resolution
Doo Soon Kim | Ken Barker | Bruce Porter
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

1998

Semi-Automatic Recognition of Noun Modifier Relationships
Ken Barker | Stan Szpakowicz
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Semi-Automatic Recognition of Noun Modifier Relationships
Ken Barker | Stan Szpakowicz
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1996

Book Reviews: Natural Language Processing for Prolog Programmers
Ken Barker | Stan Szpakowicz
Computational Linguistics, Volume 22, Number 1, March 1996