Effects of Graph Generation for Unsupervised Non-Contextual Single Document Keyword Extraction

Natalie Schluter


Abstract
This paper presents an exhaustive study on the generation of graph input to unsupervised graph-based non-contextual single document keyword extraction systems. A concrete hypothesis on concept coordination for documents that are scientific articles is put forward, consistent with two separate graph models : one which is based on word adjacency in the linear text–an approach forming the foundation of all previous graph-based keyword extraction methods, and a novel one that is based on word adjacency modulo their modifiers. In doing so, we achieve a best reported NDCG score to date of 0.431 for any system on the same data. In terms of a best parameter f-score, we achieve the highest reported to date (0.714) at a reasonable ranked list cut-off of n = 6, which is also the best reported f-score for any keyword extraction or generation system in the literature on the same data. The best-parameter f-score corresponds to a reduction in error of 12.6% conservatively.
Anthology ID:
2015.jeptalnrecital-court.10
Volume:
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
Month:
June
Year:
2015
Address:
Caen, France
Venue:
JEP/TALN/RECITAL
SIG:
Publisher:
ATALA
Note:
Pages:
61–67
Language:
URL:
https://www.aclweb.org/anthology/2015.jeptalnrecital-court.10
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/2015.jeptalnrecital-court.10.pdf