Jack Rueter


2020

pdf bib
FST Morphology for the Endangered Skolt Sami Language
Jack Rueter | Mika Hämäläinen
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

We present advances in the development of a FST-based morphological analyzer and generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami exhibits a rich morphology, on the one hand, and there is little golden standard material for it, on the other. This makes NLP approaches for its study difficult without a solid morphological analysis. The language is severely endangered and the work presented in this paper forms a part of a greater whole in its revitalization efforts. Furthermore, we intersperse our description with facilitation and description practices not well documented in the infrastructure. Currently, the analyzer covers over 30,000 Skolt Sami words in 148 inflectional paradigms and over 12 derivational forms.

pdf bib
On the questions in developing computational infrastructure for Komi-Permyak
Jack Rueter | Niko Partanen | Larisa Ponomareva
Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages

pdf bib
On Editing Dictionaries for Uralic Languages in an Online Environment
Khalid Alnajjar | Mika Hämäläinen | Jack Rueter
Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages

pdf bib
Ve’rdd. Narrowing the Gap between Paper Dictionaries, Low-Resource NLP and Community Involvement
Khalid Alnajjar | Mika Hämäläinen | Jack Rueter | Niko Partanen
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

We present an open-source online dictionary editing system, Ve′rdd, that offers a chance to re-evaluate and edit grassroots dictionaries that have been exposed to multiple amateur editors. The idea is to incorporate community activities into a state-of-the-art finite-state language description of a seriously endangered minority language, Skolt Sami. Problems involve getting the community to take part in things above the pencil-and-paper level. At times, it seems that the native speakers and the dictionary oriented are lacking technical understanding to utilize the infrastructures which might make their work more meaningful in the future, i.e. multiple reuse of all of their input. Therefore, our system integrates with the existing tools and infrastructures for Uralic language masking the technical complexities behind a user-friendly UI.

pdf bib
Open-Source Morphology for Endangered Mordvinic Languages
Jack Rueter | Mika Hämäläinen | Niko Partanen
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)

This document describes shared development of finite-state description of two closely related but endangered minority languages, Erzya and Moksha. It touches upon morpholexical unity and diversity of the two languages and how this provides a motivation for shared open-source FST development. We describe how we have designed the transducers so that they can benefit from existing open-source infrastructures and are as reusable as possible.

2019

pdf bib
Revisiting NMT for Normalization of Early English Letters
Mika Hämäläinen | Tanja Säily | Jack Rueter | Jörg Tiedemann | Eetu Mäkelä
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus. The corpus has previously been normalized so that only less frequent deviant forms are left out without normalization. This paper discusses different methods for improving the normalization of these deviant forms by using different approaches. Adding features to the training data is found to be unhelpful, but using a lexicographical resource to filter the top candidates produced by the NMT model together with lemmatization improves results.

pdf bib
Finding Sami Cognates with a Character-Based NMT Approach
Mika Hämäläinen | Jack Rueter
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

pdf bib
Morphosyntactic Disambiguation in an Endangered Language Setting
Jeff Ens | Mika Hämäläinen | Jack Rueter | Philippe Pasquier
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Endangered Uralic languages present a high variety of inflectional forms in their morphology. This results in a high number of homonyms in inflections, which introduces a lot of morphological ambiguity in sentences. Previous research has employed constraint grammars to address this problem, however CGs are often unable to fully disambiguate a sentence, and their development is labour intensive. We present an LSTM based model for automatically ranking morphological readings of sentences based on their quality. This ranking can be used to evaluate the existing CG disambiguators or to directly morphologically disambiguate sentences. Our approach works on a morphological abstraction and it can be trained with a very small dataset.

pdf bib
Survey of Uralic Universal Dependencies development
Niko Partanen | Jack Rueter
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)

2018

pdf bib
Combining Concepts and Their Translations from Structured Dictionaries of Uralic Minority Languages
Mika Hämäläinen | Liisa Lotta Tarvainen | Jack Rueter
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages
Tommi A. Pirinen | Michael Rießler | Jack Rueter | Trond Trosterud | Francis M. Tyers
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

pdf bib
Development of an Open Source Natural Language Generation Tool for Finnish
Mika Hämäläinen | Jack Rueter
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

pdf bib
Towards an open-source universal-dependency treebank for Erzya
Jack Rueter | Francis Tyers
Proceedings of the Fourth International Workshop on Computational Linguistics of Uralic Languages

pdf bib
Normalizing Early English Letters to Present-day English Spelling
Mika Hämäläinen | Tanja Säily | Jack Rueter | Jörg Tiedemann | Eetu Mäkelä
Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century. The methods include machine translation (neural and statistical), edit distance and rule-based FST. Different normalization methods are compared and evaluated. All of the methods have their own strengths in word normalization. This calls for finding ways of combining the results from these methods to leverage their individual strengths.

2017

pdf bib
Building and using language resources and infrastructure to develop e-learning programs for a minority language
Heli Uibo | Jack Rueter | Sulev Iva
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition

pdf bib
Synchronized Mediawiki based analyzer dictionary development
Jack Rueter | Mika Hämäläinen
Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages

pdf bib
DEMO: Giellatekno Open-source click-in-text dictionaries for bringing closely related languages into contact.
Jack Rueter
Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages

2015

pdf bib
Oahpa! Õpi! Opiq! Developing free online programs for learning Estonian and Võro
Heli Uibo | Jaak Pruulmann-Vengerfeldt | Jack Rueter | Sulev Iva
Proceedings of the fourth workshop on NLP for computer-assisted language learning