Estimating the Resource Adaption Cost from a Resource Rich Language to a Similar Resource Poor Language

Anil Kumar Singh, Kiran Pala, Harshit Surana


Abstract
Developing resources which can be used for Natural Language Processing is an extremely difficult task for any language, but is even more so for less privileged (or less computerized) languages. One way to overcome this difficulty is to adapt the resources of a linguistically close resource rich language. In this paper we discuss how the cost of such adaption can be estimated using subjective and objective measures of linguistic similarity for allocating financial resources, time, manpower etc. Since this is the first work of its kind, the method described in this paper should be seen as only a preliminary method, indicative of how better methods can be developed. Corpora of several less computerized languages had to be collected for the work described in the paper, which was difficult because for many of these varieties there is not much electronic data available. Even if it is, it is in non-standard encodings, which means that we had to build encoding converters for these varieties. The varieties we have focused on are some of the varieties spoken in the South Asian region.
Anthology ID:
L08-1067
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/894_paper.pdf
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/894_paper.pdf