Cross-Lingual Syntactic Transfer with Limited Resources

Mohammad Sadegh Rasooli, Michael Collins


Abstract
We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers in the scenario where a large amount of translation data is not available. The method has three steps: 1) deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) transferring lexical information from a target language to source-language treebanks; 3) integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state of the art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.
Anthology ID:
Q17-1020
Volume:
Transactions of the Association for Computational Linguistics, Volume 5
Year:
2017
Venue:
TACL
Pages:
279–293
URL:
https://www.aclweb.org/anthology/Q17-1020
DOI:
10.1162/tacl_a_00061
PDF:
http://aclanthology.lst.uni-saarland.de/Q17-1020.pdf
Video:
https://vimeo.com/276419865