Comparison of Representations of Named Entities for Document Classification
Lidia Pivovarova, Roman Yangarber
Abstract
We explore representations for multi-word names in text classification tasks, on Reuters (RCV1) topic and sector classification. We find that: the best way to treat names is to split them into tokens and use each token as a separate feature; NEs have more impact on sector classification than topic classification; replacing NEs with entity types is not an effective strategy; representing tokens by different embeddings for proper names vs. common nouns does not improve results. We highlight the improvements over state-of-the-art results that our CNN models yield.
- Anthology ID: W18-3008
- Volume: Proceedings of The Third Workshop on Representation Learning for NLP
- Month: July
- Year: 2018
- Address: Melbourne, Australia
- Venues: ACL | RepL4NLP | WS
- SIG: SIGREP
- Publisher: Association for Computational Linguistics
- Pages: 64–68
- URL: https://www.aclweb.org/anthology/W18-3008
- DOI: 10.18653/v1/W18-3008
- PDF: http://aclanthology.lst.uni-saarland.de/W18-3008.pdf
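
As an illustration of the name representations compared in the abstract, the following minimal Python sketch turns a pre-tagged sentence into feature tokens in three ways: splitting each name into its tokens, keeping each name as one joined feature, and replacing each name with its entity type. The input format, the function names, and the example sentence are assumptions made for this sketch, and the joined variant is included only as a contrast; none of this is the authors' code or preprocessing pipeline.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: a sentence is a list of spans, each span being
# (tokens, entity_type). entity_type is None for ordinary words.
Span = Tuple[List[str], Optional[str]]


def split_names(spans: List[Span]) -> List[str]:
    """Each token of a name becomes its own feature."""
    return [tok for toks, _ in spans for tok in toks]


def join_names(spans: List[Span]) -> List[str]:
    """Each multi-word name becomes a single joined feature (assumed contrast)."""
    out: List[str] = []
    for toks, etype in spans:
        if etype is None:
            out.extend(toks)
        else:
            out.append("_".join(toks))
    return out


def replace_with_types(spans: List[Span]) -> List[str]:
    """Each name is replaced by its entity type label."""
    out: List[str] = []
    for toks, etype in spans:
        if etype is None:
            out.extend(toks)
        else:
            out.append(etype)
    return out


# Made-up example sentence, not taken from RCV1.
example: List[Span] = [
    (["shares", "of"], None),
    (["Goldman", "Sachs"], "ORG"),
    (["rose"], None),
]

print(split_names(example))         # ['shares', 'of', 'Goldman', 'Sachs', 'rose']
print(join_names(example))          # ['shares', 'of', 'Goldman_Sachs', 'rose']
print(replace_with_types(example))  # ['shares', 'of', 'ORG', 'rose']
```

According to the abstract, the first variant (one feature per name token) worked best, while replacing names with entity types was not effective.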