Lucas F.E. Ashby


2020

pdf bib
Massively Multilingual Pronunciation Modeling with WikiPron
Jackson L. Lee | Lucas F.E. Ashby | M. Elizabeth Garza | Yeonju Lee-Sikka | Sean Miller | Alan Wong | Arya D. McCarthy | Kyle Gorman
Proceedings of the 12th Language Resources and Evaluation Conference

We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced scaling this tool to create an automatically-generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluating a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.

pdf bib
The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion
Kyle Gorman | Lucas F.E. Ashby | Aaron Goyzueta | Arya McCarthy | Shijie Wu | Daniel You
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. Participants were asked to submit systems which take in a sequence of graphemes in a given language as input, then output a sequence of phonemes representing the pronunciation of that grapheme sequence. Nine teams submitted a total of 23 systems, at best achieving a 18% relative reduction in word error rate (macro-averaged over languages), versus strong neural sequence-to-sequence baselines. To facilitate error analysis, we publicly release the complete outputs for all systems—a first for the SIGMORPHON workshop.