Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction

Emmanuel Giguet, Gaël Lejeune, Jean-Baptiste Tanguy


Abstract
We present our contributions to the 2020 FinTOC Shared Tasks: Title Detection and Table of Contents Extraction. For the Structure Extraction task, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For the Title Detection task, we compare surface features with character-based features across several training configurations. We show that title detection results are highly sensitive to the choice of training dataset.
Anthology ID:
2020.fnp-1.30
Volume:
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | FNP
Publisher:
COLING
Pages:
174–180
URL:
https://www.aclweb.org/anthology/2020.fnp-1.30
PDF:
http://aclanthology.lst.uni-saarland.de/2020.fnp-1.30.pdf