Towards Automatic Thesaurus Construction and Enrichment.

Amir Hazem, Beatrice Daille, Lanza Claudia


Abstract
Thesaurus construction with minimal human effort often relies on automatic methods to discover terms and their relations. Hence, the quality of a thesaurus heavily depends on the methodologies chosen for (i) building its content (the terminology extraction task) and (ii) designing its structure (the semantic similarity task). Automatically constructed thesauri are still less accurate than handcrafted ones, which makes it important to highlight the drawbacks of existing methods so that new strategies can build more accurate thesaurus models. In this paper, we provide a systematic analysis of existing methods for both tasks and discuss their feasibility based on an Italian cybersecurity corpus. In particular, we provide a detailed analysis of how the semantic relationship network of a thesaurus can be built automatically, and investigate ways to enrich the terminological scope of a thesaurus by taking into account the information contained in external domain-oriented semantic sets.
Anthology ID:
2020.computerm-1.9
Volume:
Proceedings of the 6th International Workshop on Computational Terminology
Month:
May
Year:
2020
Address:
Marseille, France
Venues:
CompuTerm | LREC | WS
Publisher:
European Language Resources Association
Pages:
62–71
Language:
English
URL:
https://www.aclweb.org/anthology/2020.computerm-1.9
PDF:
http://aclanthology.lst.uni-saarland.de/2020.computerm-1.9.pdf