Ayla Rigouts Terryn


2020

pdf bib
TermEval 2020: Shared Task on Automatic Term Extraction Using the Annotated Corpora for Term Extraction Research (ACTER) Dataset
Ayla Rigouts Terryn | Veronique Hoste | Patrick Drouin | Els Lefever
Proceedings of the 6th International Workshop on Computational Terminology

The TermEval 2020 shared task provided a platform for researchers to work on automatic term extraction (ATE) with the same dataset: the Annotated Corpora for Term Extraction Research (ACTER). The dataset covers three languages (English, French, and Dutch) and four domains, of which the domain of heart failure was kept as a held-out test set on which final f1-scores were calculated. The aim was to provide a large, transparent, qualitatively annotated, and diverse dataset to the ATE research community, with the goal of promoting comparative research and thus identifying strengths and weaknesses of various state-of-the-art methodologies. The results show a lot of variation between different systems and illustrate how some methodologies reach higher precision or recall, how different systems extract different types of terms, how some are exceptionally good at finding rare terms, or are less impacted by term length. The current contribution offers an overview of the shared task with a comparative evaluation, which complements the individual papers by all participants.

2019

pdf bib
Analysing the Impact of Supervised Machine Learning on Automatic Term Extraction: HAMLET vs TermoStat
Ayla Rigouts Terryn | Patrick Drouin | Veronique Hoste | Els Lefever
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Traditional approaches to automatic term extraction do not rely on machine learning (ML) and select the top n ranked candidate terms or candidate terms above a certain predefined cut-off point, based on a limited number of linguistic and statistical clues. However, supervised ML approaches are gaining interest. Relatively little is known about the impact of these supervised methodologies; evaluations are often limited to precision, and sometimes recall and f1-scores, without information about the nature of the extracted candidate terms. Therefore, the current paper presents a detailed and elaborate analysis and comparison of a traditional, state-of-the-art system (TermoStat) and a new, supervised ML approach (HAMLET), using the results obtained for the same, manually annotated, Dutch corpus about dressage.

2018

pdf bib
A Gold Standard for Multilingual Automatic Term Extraction from Comparable Corpora: Term Structure and Translation Equivalents
Ayla Rigouts Terryn | Véronique Hoste | Els Lefever
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)