Subcharacter Information in Japanese Embeddings: When Is It Worth It?

Marzena Karpinska, Bofang Li, Anna Rogers, Aleksandr Drozd


Abstract
Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies can be applied to Japanese, and contribute a new analogy dataset for this language.
Anthology ID:
W18-2905
Volume:
Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP
Month:
July
Year:
2018
Address:
Melbourne, Australia
Venues:
ACL | WS
Publisher:
Association for Computational Linguistics
Pages:
28–37
URL:
https://www.aclweb.org/anthology/W18-2905
DOI:
10.18653/v1/W18-2905
PDF:
http://aclanthology.lst.uni-saarland.de/W18-2905.pdf