JRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task

Guillaume Jacquet, Jakub Piskorski, Hristo Tanev, Ralf Steinberger


Abstract
We report on the participation of the JRC Text Mining and Analysis Competence Centre (TMA-CC) in the BSNLP-2019 Shared Task, which focuses on named-entity recognition, lemmatisation and cross-lingual linking. We propose a hybrid system combining a rule-based approach and light ML techniques. We use multilingual lexical resources such as JRC-NAMES and BABELNET together with a named entity guesser to recognise names. In a second step, we combine known names with wild cards to increase recognition recall by also capturing inflection variants. In a third step, we increase precision by filtering these name candidates with automatically learnt inflection patterns derived from name occurrences in large news article collections. Our major requirement is to achieve high precision. We achieved an average of 65% F-measure with 93% precision on the four languages.
Anthology ID:
W19-3714
Volume:
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | BSNLP | WS
SIG:
SIGSLAV
Publisher:
Association for Computational Linguistics
Note:
Pages:
100–104
Language:
URL:
https://www.aclweb.org/anthology/W19-3714
DOI:
10.18653/v1/W19-3714
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W19-3714.pdf