Eric Joanis


2020

The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results
Eric Joanis | Rebecca Knowles | Roland Kuhn | Samuel Larkin | Patrick Littell | Chi-kiu Lo | Darlene Stewart | Jeffrey Micher
Proceedings of the 12th Language Resources and Evaluation Conference

The Inuktitut language, a member of the Inuit-Yupik-Unangan language family, is spoken across Arctic Canada and noted for its morphological complexity. It is an official language of two territories, Nunavut and the Northwest Territories, and has recognition in additional regions. This paper describes a newly released sentence-aligned Inuktitut–English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017. With approximately 1.3 million aligned sentence pairs, this is, to our knowledge, the largest parallel corpus of a polysynthetic language or an Indigenous language of the Americas released to date. The paper describes the alignment methodology used, the evaluation of the alignments, and preliminary experiments on statistical and neural machine translation (SMT and NMT) between Inuktitut and English, in both directions.

The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software
Roland Kuhn | Fineen Davis | Alain Désilets | Eric Joanis | Anna Kazantseva | Rebecca Knowles | Patrick Littell | Delaney Lothian | Aidan Pine | Caroline Running Wolf | Eddie Santos | Darlene Stewart | Gilles Boulianne | Vishwa Gupta | Brian Maracle Owennatékha | Akwiratékha’ Martin | Christopher Cox | Marie-Odile Junker | Olivia Sammons | Delasie Torkornoo | Nathan Thanyehténhas Brinklow | Sara Child | Benoît Farley | David Huggins-Daines | Daisy Rosenblum | Heather Souter
Proceedings of the 28th International Conference on Computational Linguistics

This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals is central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen’kéha, in the Iroquoian language family), release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences and experiments with machine translation (MT) systems trained on this corpus, free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages), software for implementing text prediction and read-along audiobooks for Indigenous languages, and several other subprojects.

2010

Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin | Boxing Chen | George Foster | Ulrich Germann | Eric Joanis | Howard Johnson | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too
Ulrich Germann | Eric Joanis | Samuel Larkin
Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009)

2007

Integration of an Arabic Transliteration Module into a Statistical Machine Translation System
Mehdi M. Kashani | Eric Joanis | Roland Kuhn | George Foster | Fred Popowich
Proceedings of the Second Workshop on Statistical Machine Translation

2006

PORTAGE: with Smoothed Phrase Tables and Segment Choice Models
Howard Johnson | Fatiha Sadat | George Foster | Roland Kuhn | Michel Simard | Eric Joanis | Samuel Larkin
Proceedings on the Workshop on Statistical Machine Translation

Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation
Roland Kuhn | Denis Yuen | Michel Simard | Patrick Paul | George Foster | Eric Joanis | Howard Johnson
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2003

Semi-supervised Verb Class Discovery Using Noisy Features
Suzanne Stevenson | Eric Joanis
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

A General Feature Space for Automatic Verb Classification
Eric Joanis | Suzanne Stevenson
10th Conference of the European Chapter of the Association for Computational Linguistics