Learning Multilingual Word Embeddings Using Image-Text Data
Karan Singhal, Karthik Raman, Balder ten Cate
Abstract
There has been significant interest recently in learning multilingual word embeddings – in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representation of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.
- Anthology ID: W19-1807
- Volume: Proceedings of the Second Workshop on Shortcomings in Vision and Language
- Month: June
- Year: 2019
- Address: Minneapolis, Minnesota
- Venues: NAACL | WS
- Publisher: Association for Computational Linguistics
- Pages: 68–77
- URL: https://www.aclweb.org/anthology/W19-1807
- DOI: 10.18653/v1/W19-1807
- PDF: http://aclanthology.lst.uni-saarland.de/W19-1807.pdf
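Below is a minimal sketch (not the authors' code) of the kind of training objective the abstract describes: a bag-of-words text encoder whose caption embedding is pushed toward the embedding of the paired image. The vocabulary size, embedding dimension, use of precomputed image features, and the in-batch contrastive loss are illustrative assumptions; the paper's exact architecture and loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 10_000    # shared multilingual word vocabulary (assumed size)
EMBED_DIM = 300        # shared embedding dimension (assumed)
IMAGE_FEAT_DIM = 2048  # precomputed image features, e.g. from a pretrained CNN (assumed)

class BowTextEncoder(nn.Module):
    """Bag-of-words encoder: averages word embeddings over the caption."""
    def __init__(self):
        super().__init__()
        self.word_emb = nn.EmbeddingBag(VOCAB_SIZE, EMBED_DIM, mode="mean")

    def forward(self, token_ids, offsets):
        return self.word_emb(token_ids, offsets)

class ImageProjection(nn.Module):
    """Projects precomputed image features into the shared embedding space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(IMAGE_FEAT_DIM, EMBED_DIM)

    def forward(self, image_feats):
        return self.proj(image_feats)

def similarity_loss(text_vecs, image_vecs):
    # Push each caption embedding toward its own image and away from the other
    # images in the batch (a standard in-batch contrastive objective, used here
    # as a stand-in for whatever similarity loss the paper employs).
    text_vecs = F.normalize(text_vecs, dim=-1)
    image_vecs = F.normalize(image_vecs, dim=-1)
    logits = text_vecs @ image_vecs.t()      # pairwise cosine similarities
    targets = torch.arange(logits.size(0))   # i-th caption matches i-th image
    return F.cross_entropy(logits, targets)

# Toy batch: two captions (in any language) paired with two image feature vectors.
text_encoder, image_encoder = BowTextEncoder(), ImageProjection()
token_ids = torch.tensor([3, 15, 27, 42, 8])  # concatenated caption token ids
offsets = torch.tensor([0, 3])                # start index of each caption
image_feats = torch.randn(2, IMAGE_FEAT_DIM)

loss = similarity_loss(text_encoder(token_ids, offsets), image_encoder(image_feats))
loss.backward()  # gradients flow into the shared multilingual word embeddings
```

Because captions in different languages describing the same image are pulled toward the same image embedding, semantically similar words across languages end up with similar embeddings without any crosslingual labels, which is the weak supervision the abstract refers to.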