Learning Bilingual Word Embeddings Using Lexical Definitions

Weijia Shi, Muhao Chen, Yingtao Tian, Kai-Wei Chang


Abstract
Bilingual word embeddings, which represent lexicons of different languages in a shared embedding space, are essential for supporting semantic and knowledge transfers in a variety of cross-lingual NLP tasks. Existing approaches to training bilingual word embeddings require either large collections of pre-defined seed lexicons that are expensive to obtain, or parallel sentences that comprise coarse and noisy alignment. In contrast, we propose BiLex that leverages publicly available lexical definitions for bilingual word embedding learning. Without the need of predefined seed lexicons, BiLex comprises a novel word pairing strategy to automatically identify and propagate the precise fine-grain word alignment from lexical definitions. We evaluate BiLex in word-level and sentence-level translation tasks, which seek to find the cross-lingual counterparts of words and sentences respectively. BiLex significantly outperforms previous embedding methods on both tasks.
Anthology ID:
W19-4316
Volume:
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | RepL4NLP | WS
SIG:
SIGREP
Publisher:
Association for Computational Linguistics
Note:
Pages:
142–147
Language:
URL:
https://www.aclweb.org/anthology/W19-4316
DOI:
10.18653/v1/W19-4316
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/W19-4316.pdf