Irina Stenger


2020

The INCOMSLAV Platform: Experimental Website with Integrated Methods for Measuring Linguistic Distances and Asymmetries in Receptive Multilingualism
Irina Stenger | Klara Jagrova | Tania Avgustinova
Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"

We report on a web-based resource for conducting intercomprehension experiments with native speakers of Slavic languages and present our methods for measuring linguistic distances and asymmetries in receptive multilingualism. Through a website which serves as a platform for online testing, a large number of participants with different linguistic backgrounds can be targeted. A statistical language model is used to measure information density and to gauge how language users master various degrees of (un)intelligibility. The key idea is that intercomprehension should be better when the model adapted for understanding the unknown language exhibits relatively low average distance and surprisal. All obtained intelligibility scores, together with distance and asymmetry measures for the different language pairs and processing directions, are made available as an integrated online resource in the form of a Slavic intercomprehension matrix (SlavMatrix).
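The surprisal idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that words of the two languages have already been aligned character by character, estimates P(native character | foreign character) by relative frequency, and sums the negative log probabilities over a word. The function names and the toy alignment data are hypothetical and are not taken from the INCOMSLAV platform.

```python
import math
from collections import Counter


def adaptation_probs(aligned_pairs):
    """Estimate P(native_char | foreign_char) by relative frequency.

    `aligned_pairs` is a list of (foreign_char, native_char) tuples.
    """
    pair_counts = Counter(aligned_pairs)
    foreign_counts = Counter(foreign for foreign, _ in aligned_pairs)
    return {
        (foreign, native): count / foreign_counts[foreign]
        for (foreign, native), count in pair_counts.items()
    }


def word_surprisal(alignment, probs):
    """Sum of -log2 P(native_char | foreign_char) over one aligned word, in bits."""
    return sum(-math.log2(probs[pair]) for pair in alignment)


# Made-up training alignments (assumption, for illustration only):
# foreign "a" maps to native "a" twice and to native "o" once.
training = [("a", "a"), ("a", "a"), ("a", "o"), ("b", "b")]
probs = adaptation_probs(training)

# Surprisal of a two-character word aligned as a->a, b->b.
total_bits = word_surprisal([("a", "a"), ("b", "b")], probs)
```

A lower total means the mapping from the foreign word to the native word is more predictable under the estimated probabilities; unseen character pairs would need smoothing, which this sketch omits.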

2019

incom.py - A Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages
Marius Mosbach | Irina Stenger | Tania Avgustinova | Dietrich Klakow
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Languages may be differently distant from each other and their mutual intelligibility may be asymmetric. In this paper we introduce incom.py, a toolbox for calculating linguistic distances and asymmetries between related languages. incom.py allows linguists to quickly and easily perform statistical analyses and compare them with experimental results. We demonstrate the efficacy of incom.py in an intercomprehension experiment on two Slavic languages: Bulgarian and Russian. Using incom.py we were able to validate three methods for measuring linguistic distances and asymmetries (Levenshtein distance, word adaptation surprisal, and conditional entropy) as predictors of success in a reading intercomprehension experiment.
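Two of the measures named in this abstract can be sketched in a few lines. The code below is an independent illustration and does not reproduce the incom.py API: the function names are hypothetical, normalizing by the length of the longer word is an assumption, and the character alignments used for the entropy example are made up.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def conditional_entropy(pairs):
    """H(target | source) in bits for a list of (source_char, target_char) pairs."""
    joint = Counter(pairs)
    source = Counter(s for s, _ in pairs)
    total = len(pairs)
    return -sum(
        (count / total) * math.log2(count / source[s])
        for (s, _), count in joint.items()
    )


# Bulgarian and Russian words for "milk".
distance = levenshtein("мляко", "молоко")  # 2

# Made-up alignments (assumption): source "a" is ambiguous, source "b" is not.
forward = [("a", "a"), ("a", "o"), ("b", "b"), ("b", "b")]
backward = [(t, s) for s, t in forward]
h_forward = conditional_entropy(forward)
h_backward = conditional_entropy(backward)
```

Levenshtein distance is symmetric, whereas conditional entropy depends on the direction: in the toy data the forward mapping is uncertain (0.5 bits) while the reverse mapping is fully predictable (0 bits), which is the kind of asymmetry the abstract refers to.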

2016

Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility
Andrea Fischer | Klára Jágrová | Irina Stenger | Tania Avgustinova | Dietrich Klakow | Roland Marti
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In an intercomprehension scenario, typically a native speaker of language L1 is confronted with output from an unknown, but related language L2. In this setting, the degree to which the receiver recognizes the unfamiliar words greatly determines communicative success. Despite exhibiting great string-level differences, cognates may be recognized very successfully if the receiver is aware of regular correspondences that allow the unknown word to be transformed into its familiar form. Modeling L1-L2 intercomprehension then requires the identification of all the regular correspondences between languages L1 and L2. Here we present a set of linguistic orthographic correspondences manually compiled from comparative linguistics literature, along with a set of statistically inferred suggestions for correspondence rules. For the statistical inference, we followed the Minimum Description Length principle, which proposes choosing those rules which are most effective at describing the data. Our statistical model was able to reproduce most of our linguistic correspondences (88.5% for Czech-Polish and 75.7% for Bulgarian-Russian) and furthermore made it easy to identify many more non-trivial correspondences which also cover aspects of morphology.
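How regular correspondences can turn an unknown word into its familiar form may be sketched as a simple rewriting step. The two Czech-to-Polish rules below (ř to rz, v to w) are well-known textbook correspondences chosen for illustration; they are not claimed to be drawn from the correspondence sets described in the paper, and the function name is hypothetical. The sketch does not cover the Minimum Description Length inference itself.

```python
def apply_correspondences(word, rules):
    """Rewrite `word` left to right, preferring the longest matching source string.

    `rules` maps a source-language string to its target-language counterpart.
    Characters without a matching rule are copied unchanged.
    """
    sources = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for src in sources:
            if word.startswith(src, i):
                out.append(rules[src])
                i += len(src)
                break
        else:
            # No rule applies at this position: keep the character as it is.
            out.append(word[i])
            i += 1
    return "".join(out)


# Illustrative Czech -> Polish orthographic correspondences (assumption).
RULES = {"ř": "rz", "v": "w"}

river = apply_correspondences("řeka", RULES)   # Czech "řeka" -> Polish "rzeka"
water = apply_correspondences("voda", RULES)   # Czech "voda" -> Polish "woda"
```

A rule set of this kind could be evaluated by counting how many cognate pairs it rewrites correctly; context-sensitive correspondences would need a richer rule format than this plain string mapping.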