Comparison of Representations of Named Entities for Document Classification

Lidia Pivovarova, Roman Yangarber


Abstract
We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
Anthology ID:
W18-3008
Volume:
Proceedings of The Third Workshop on Representation Learning for NLP
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venues:
ACL | RepL4NLP | WS
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Note:
Pages:
64–68
Language:
URL:
https://www.aclweb.org/anthology/W18-3008
DOI:
10.18653/v1/W18-3008
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W18-3008.pdf