Ongoing Developments in Automatically Adapting Lexical Resources to the Biomedical Domain
Dominic Widdows | Adil Toumouh | Beate Dorow | Ahmed Lehireche
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper describes a range of experiments using empirical methods to adapt the WordNet noun ontology for specific use in the biomedical domain. Our basic technique is to extract relationships between terms using the Ohsumed corpus, a large collection of abstracts from PubMed, and to compare the relationships extracted with those that would be expected for medical terms, given the structure of the WordNet ontology. The linguistic methods involve the use of a variety of lexicosyntactic patterns that enable us to extract pairs of coordinate noun terms, and also related groups of adjectives and nouns, using Markov clustering. This enables us in many cases to analyse ambiguous words and select the correct meaning for the biomedical domain. While results are often encouraging, the paper also highlights evident problems and drawbacks with the method, and outlines suggestions for future work.
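As a rough illustration of the pattern-based extraction step, the sketch below counts word pairs joined by "and" or "or" in a few invented sentences. The abstract does not specify the patterns actually used, so the regular expression, the function name `coordinate_pairs`, and the sample sentences are all assumptions made for illustration; in particular, this simplified version does no part-of-speech tagging, so it cannot restrict matches to nouns as a real system would.

```python
import re
from collections import Counter

# Hypothetical pattern: two single-word terms joined by "and" or "or".
# Assumption: no part-of-speech filtering, so non-nouns can also match.
COORD = re.compile(r"\b([a-z]+) (?:and|or) ([a-z]+)\b")


def coordinate_pairs(sentences):
    """Count unordered word pairs that occur as 'X and Y' or 'X or Y'."""
    counts = Counter()
    for sentence in sentences:
        for left, right in COORD.findall(sentence.lower()):
            if left != right:
                # Sort so that (liver, kidney) and (kidney, liver) are merged.
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Invented example sentences, not taken from the Ohsumed corpus.
sentences = [
    "Patients with liver or kidney disease were excluded.",
    "Damage to the kidney and liver was observed.",
    "Both heart and lung function improved.",
]
pairs = coordinate_pairs(sentences)
for pair, count in pairs.most_common():
    print(pair, count)
```

The resulting pair counts can be read as weighted edges of a co-occurrence graph, which is the kind of input a graph clustering method such as Markov clustering operates on.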