Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

Chenchen Ding, Masao Utiyama, Eiichiro Sumita


Abstract
This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated using metrics of token correspondence and token order, based on several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonized annotation and processing of these language pairs in future development.
Anthology ID:
W16-4614
Volume:
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
Month:
December
Year:
2016
Address:
Osaka, Japan
Venues:
WAT | WS
Publisher:
The COLING 2016 Organizing Committee
Pages:
149–156
URL:
https://www.aclweb.org/anthology/W16-4614
PDF:
http://aclanthology.lst.uni-saarland.de/W16-4614.pdf