Kyle Gorman


2020

UniMorph 3.0: Universal Morphology
Arya D. McCarthy | Christo Kirov | Matteo Grella | Amrit Nidhi | Patrick Xia | Kyle Gorman | Ekaterina Vylomova | Sabrina J. Mielke | Garrett Nicolai | Miikka Silfverberg | Timofey Arkhangelskiy | Nataly Krizhanovsky | Andrew Krizhanovsky | Elena Klyachko | Alexey Sorokin | John Mansfield | Valts Ernštreits | Yuval Pinter | Cassandra L. Jacobs | Ryan Cotterell | Mans Hulden | David Yarowsky
Proceedings of the 12th Language Resources and Evaluation Conference

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.

Massively Multilingual Pronunciation Modeling with WikiPron
Jackson L. Lee | Lucas F.E. Ashby | M. Elizabeth Garza | Yeonju Lee-Sikka | Sean Miller | Alan Wong | Arya D. McCarthy | Kyle Gorman
Proceedings of the 12th Language Resources and Evaluation Conference

We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced scaling this tool to create an automatically-generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluate a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.

Detecting Objectifying Language in Online Professor Reviews
Angie Waller | Kyle Gorman
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Student reviews often make reference to professors’ physical appearances. Until recently RateMyProfessors.com, the website of this study’s focus, used a design feature to encourage a “hot or not” rating of college professors. In the wake of recent #MeToo and #TimesUp movements, social awareness of the inappropriateness of these reviews has grown; however, objectifying comments remain and continue to be posted in this online context. We describe two supervised text classifiers for detecting objectifying commentary in professor reviews. We then ensemble these classifiers and use the resulting model to track objectifying commentary at scale. We measure correlations between objectifying commentary, changes to the review website interface, and teacher gender across a ten-year period.

Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing
Piotr Szymański | Kyle Gorman
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Recent work raises concerns about the use of standard splits to compare natural language processing models. We propose a Bayesian statistical model comparison technique which uses k-fold cross-validation across multiple data sets to estimate the likelihood that one model will outperform the other, or that the two will produce practically equivalent results. We use this technique to rank six English part-of-speech taggers across two data sets and three evaluation metrics.

Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Kyle Gorman | Ryan Cotterell
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion
Kyle Gorman | Lucas F.E. Ashby | Aaron Goyzueta | Arya McCarthy | Shijie Wu | Daniel You
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. Participants were asked to submit systems which take in a sequence of graphemes in a given language as input, then output a sequence of phonemes representing the pronunciation of that grapheme sequence. Nine teams submitted a total of 23 systems, at best achieving an 18% relative reduction in word error rate (macro-averaged over languages), versus strong neural sequence-to-sequence baselines. To facilitate error analysis, we publicly release the complete outputs for all systems, a first for the SIGMORPHON workshop.
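
The evaluation quantities named in this abstract can be illustrated with a minimal sketch. It assumes that word error rate is the percentage of words whose predicted pronunciation differs from the gold pronunciation, that the macro-average is the unweighted mean of per-language rates, and that relative reduction is measured against a baseline rate. The function names and data layout are hypothetical and are not taken from the shared task's own evaluation code.

```python
from statistics import mean


def word_error_rate(gold, predicted):
    """Percentage of words whose predicted pronunciation differs from gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    errors = sum(g != p for g, p in zip(gold, predicted))
    return 100.0 * errors / len(gold)


def macro_average_wer(per_language):
    """Unweighted mean of per-language WER.

    per_language maps a language code to a (gold, predicted) pair of lists
    (a hypothetical layout chosen for this sketch).
    """
    return mean(word_error_rate(g, p) for g, p in per_language.values())


def relative_reduction(baseline_wer, system_wer):
    """Relative reduction in WER, in percent, versus a baseline."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer
```

Because the macro-average weights every language equally, a language with few test words influences the overall figure as much as one with many.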

2019

Weird Inflects but OK: Making Sense of Morphological Generation Errors
Kyle Gorman | Arya D. McCarthy | Ryan Cotterell | Ekaterina Vylomova | Miikka Silfverberg | Magdalena Markowska
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection. This task involves natural language generation: systems are given a word in citation form (e.g., hug) and asked to produce the corresponding inflected form (e.g., the simple past hugged). We propose an error taxonomy and use it to annotate errors made by the top two systems across twelve languages. Many of the observed errors are related to inflectional patterns sensitive to inherent linguistic properties such as animacy or affect; many others are failures to predict truly unpredictable inflectional behaviors. We also find nearly one quarter of the residual “errors” reflect errors in the gold data.

We Need to Talk about Standard Splits
Kyle Gorman | Steven Bedrick
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

It is standard practice in speech & language technology to rank systems according to their performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which claimed state-of-the-art performance on a widely-used “standard split”. While we replicate results on the standard split, we fail to reliably reproduce some rankings when we repeat this analysis with randomly generated training-testing splits. We argue that randomly generated splits should be used in system evaluation.
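
The idea of checking a ranking across randomly generated training-testing splits can be sketched as follows. This is only an illustration under stated assumptions, not the authors' experimental code: `random_split` and `win_rate` are hypothetical helpers, and `evaluate_a` and `evaluate_b` stand in for any procedure that trains a system on the training portion and returns its score on the test portion.

```python
import random


def random_split(items, test_fraction=0.1, seed=0):
    """Shuffle items with a seeded RNG and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def win_rate(items, evaluate_a, evaluate_b, n_splits=20, test_fraction=0.1):
    """Fraction of random splits on which system A scores higher than B.

    evaluate_a and evaluate_b are hypothetical callables taking
    (train, test) and returning a score where higher is better.
    """
    wins = 0
    for seed in range(n_splits):
        train, test = random_split(items, test_fraction, seed)
        if evaluate_a(train, test) > evaluate_b(train, test):
            wins += 1
    return wins / n_splits
```

A win rate near 0.5 would suggest that the ranking observed on any single split, including a standard one, is not stable.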

What Kind of Language Is Hard to Language-Model?
Sabrina J. Mielke | Ryan Cotterell | Kyle Gorman | Brian Roark | Jason Eisner
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that “translationese” is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.

Neural Models of Text Normalization for Speech Applications
Hao Zhang | Richard Sproat | Axel H. Ng | Felix Stahlberg | Xiaochang Peng | Kyle Gorman | Brian Roark
Computational Linguistics, Volume 45, Issue 2 - June 2019

Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.

2018

Improving homograph disambiguation with supervised machine learning
Kyle Gorman | Gleb Mazovetskiy | Vitaly Nikolaev
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Target word prediction and paraphasia classification in spoken discourse
Joel Adams | Steven Bedrick | Gerasimos Fergadiotis | Kyle Gorman | Jan van Santen
BioNLP 2017

We present a system for automatically detecting and classifying phonologically anomalous productions in the speech of individuals with aphasia. Working from transcribed discourse samples, our system identifies neologisms, and uses a combination of string alignment and language models to produce a lattice of plausible words that the speaker may have intended to produce. We then score this lattice according to various features, and attempt to determine whether the anomalous production represented a phonemic error or a genuine neologism. This approach has the potential to be expanded to consider other types of paraphasic errors, and could be applied to a wide variety of screening and therapeutic applications.

2016

Pynini: A Python library for weighted finite-state grammar compilation
Kyle Gorman
Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata

Minimally Supervised Number Normalization
Kyle Gorman | Richard Sproat
Transactions of the Association for Computational Linguistics, Volume 4

We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.

2015

Automated morphological analysis of clinical language samples
Kyle Gorman | Steven Bedrick | Géza Kiss | Eric Morley | Rosemary Ingham | Metrah Mohammed | Katina Papadakis | Jan van Santen
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality