Improving Word Alignment of Rare Words with Word Embeddings

Masoud Jalili Sabet, Heshaam Faili, Gholamreza Haffari


Abstract
We address the problem of inducing word alignment for language pairs by developing an unsupervised model with the capability of getting applied to other generative alignment models. We approach the task by: i)proposing a new alignment model based on the IBM alignment model 1 that uses vector representation of words, and ii)examining the use of similar source words to overcome the problem of rare source words and improving the alignments. We apply our method to English-French corpora and run the experiments with different sizes of sentence pairs. Our results show competitive performance against the baseline and in some cases improve the results up to 6.9% in terms of precision.
Anthology ID:
C16-1302
Volume:
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
Month:
December
Year:
2016
Address:
Osaka, Japan
Venue:
COLING
SIG:
Publisher:
The COLING 2016 Organizing Committee
Note:
Pages:
3209–3215
Language:
URL:
https://www.aclweb.org/anthology/C16-1302
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/C16-1302.pdf