Subcharacter Information in Japanese Embeddings: When Is It Worth It?
Marzena Karpinska, Bofang Li, Anna Rogers, Aleksandr Drozd
Abstract
Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging the subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies could be applied for Japanese, and contribute a new analogy dataset for this language.
- Anthology ID:
- W18-2905
- Volume:
- Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Venues:
- ACL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 28–37
- URL:
- https://www.aclweb.org/anthology/W18-2905
- DOI:
- 10.18653/v1/W18-2905
- PDF:
- http://aclanthology.lst.uni-saarland.de/W18-2905.pdf