Learning Patterns for Building Resources about Semantic Relations in the Medical Domain

Mehdi Embarek, Olivier Ferret


Abstract
In this article, we present a method for extracting automatically semantic relations from texts in the medical domain using linguistic patterns. These patterns refer to three levels of information about words: inflected form, lemma and part-of-speech. The method we present consists first in identifying the entities that are part of the relations to extract, that is to say diseases, exams, treatments, drugs or symptoms. Thereafter, sentences that contain couples of entities are extracted and the presence of a semantic relation is validated by applying linguistic patterns. These patterns were previously learnt automatically from a manually annotated corpus by relying onan algorithm based on the edit distance. We first report the results of an evaluation of our medical entity tagger for the five types of entities we have mentioned above and then, more globally, the results of an evaluation of our extraction method for four relations between these entities. Both evaluations were done for French.
Anthology ID:
L08-1238
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/514_paper.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/514_paper.pdf