Lane Schwartz


2020

Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology
Emily Chen | Hyunji Hayley Park | Lane Schwartz
Proceedings of the 12th Language Resources and Evaluation Conference

St. Lawrence Island Yupik is an endangered polysynthetic language of the Bering Strait region. While conducting linguistic fieldwork between 2016 and 2019, we observed substantial support within the Yupik community for language revitalization and for resource development to support Yupik education. To that end, Chen & Schwartz (2018) implemented a finite-state morphological analyzer as a critical enabling technology for use in Yupik language education and technology. That analyzer achieved a morphological analysis coverage rate of approximately 75% on a dataset of 60K Yupik tokens, leaving considerable room for improvement. In this work, we present a re-implementation of the Chen & Schwartz (2018) finite-state morphological analyzer for St. Lawrence Island Yupik that incorporates new linguistic insights; in particular, this implementation makes use of the Paradigm Function Morphology (PFM) theory of morphology. We evaluate this new PFM-based morphological analyzer and demonstrate that it consistently outperforms the existing analyzer of Chen & Schwartz (2018) in accuracy and coverage rate across multiple datasets.
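The core PFM idea the abstract invokes can be sketched in a few lines: a paradigm function realizes a morphosyntactic property set on a root by applying ordered blocks of realization rules. The suffixes below are toy placeholders loosely patterned on agglutinative nominal inflection, not attested Yupik analyses, and `paradigm_function` is a hypothetical name, not code from the paper.

```python
# Minimal sketch of Paradigm Function Morphology (PFM): a paradigm
# function maps a <root, property set> pair to a surface form by
# applying rule blocks in a fixed order. Toy data, purely illustrative.

# Block I realizes number; Block II realizes case (hypothetical forms).
BLOCK_I = {"sg": "", "pl": "-t"}
BLOCK_II = {"abs": "", "loc": "-mi"}

def paradigm_function(root, props):
    """Realize `props` (e.g. {"num": "pl", "case": "abs"}) on `root`
    by applying each realization-rule block in sequence."""
    form = root
    form += BLOCK_I[props["num"]]
    form += BLOCK_II[props["case"]]
    return form.replace("-", "")  # drop morpheme-boundary markers

print(paradigm_function("angya", {"num": "pl", "case": "abs"}))
```

Because each block is a total function over its property dimension, the paradigm is generated exhaustively by iterating over the property sets, which is what makes a PFM specification straightforward to compile into a finite-state transducer.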

2019

Unsupervised Learning of PCFGs with Normalizing Flow
Lifeng Jin | Finale Doshi-Velez | Timothy Miller | Lane Schwartz | William Schuler
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Unsupervised PCFG inducers hypothesize sets of compact context-free rules as explanations for sentences. PCFG induction not only provides tools for low-resource languages, but also plays an important role in modeling language acquisition (Bannard et al., 2009; Abend et al., 2017). However, current PCFG induction models, using word tokens as input, are unable to incorporate semantics and morphology into induction, and may encounter issues of sparse vocabulary when facing morphologically rich languages. This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information. Linguistically motivated sparsity and categorical distance constraints are imposed on the inducer as regularization. Experiments show that the PCFG induction model with normalizing flow produces grammars with state-of-the-art accuracy on a variety of different languages. Ablation further shows a positive effect of normalizing flow, context embeddings, and the proposed regularizers.

Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)
Antti Arppe | Jeff Good | Mans Hulden | Jordan Lachler | Alexis Palmer | Lane Schwartz | Miikka Silfverberg
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

Bootstrapping a Neural Morphological Analyzer for St. Lawrence Island Yupik from a Finite-State Transducer
Lane Schwartz | Emily Chen | Benjamin Hunt | Sylvia L.R. Schreiner
Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages Volume 1 (Papers)

Community lexical access for an endangered polysynthetic language: An electronic dictionary for St. Lawrence Island Yupik
Benjamin Hunt | Emily Chen | Sylvia L.R. Schreiner | Lane Schwartz
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)

In this paper, we introduce a morphologically-aware electronic dictionary for St. Lawrence Island Yupik, an endangered language of the Bering Strait region. Implemented using HTML, JavaScript, and CSS, the dictionary presents an uncluttered interface and permits users to search in Yupik or in English for Yupik root words and Yupik derivational suffixes. For each matching result, our electronic dictionary presents the user with the corresponding entry from the Badten (2008) Yupik-English paper dictionary. Because Yupik is a polysynthetic language, handling of multimorphemic word forms is critical. If a user searches for an inflected Yupik word form, we perform a morphological analysis and return entries for the root word and for any derivational suffixes present in the word. This electronic dictionary should serve as a valuable resource not only for students and speakers of Yupik, but also for field linguists working towards documentation and conservation of the language.
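The lookup flow described above (direct match first, otherwise morphologically analyze the query and return an entry per morpheme) can be sketched as follows. Everything here is placeholder data: `LEXICON`, `analyze`, and `lookup` are hypothetical stand-ins, not the dictionary's actual code or the Badten (2008) entries, and the real system would call a finite-state analyzer rather than a hard-coded function.

```python
# Sketch of a morphologically-aware dictionary lookup: if the query is
# not itself a headword, analyze it into morphemes and return the entry
# for the root and for each derivational suffix. Placeholder data only.

LEXICON = {
    "root1": "root1 (noun): example root entry",
    "-sfx1": "-sfx1 (postbase): example derivational suffix entry",
}

def analyze(word):
    """Stand-in for the finite-state morphological analyzer: returns
    the morphemes of one hard-coded example word, else nothing."""
    if word == "root1sfx1":
        return ["root1", "-sfx1"]
    return []

def lookup(query):
    # Direct hit: the query is itself a root or suffix headword.
    if query in LEXICON:
        return [LEXICON[query]]
    # Otherwise analyze the inflected form and look up each morpheme.
    return [LEXICON[m] for m in analyze(query) if m in LEXICON]
```

The design point is that the dictionary never stores inflected forms: coverage of multimorphemic words comes entirely from composing the analyzer with the headword lexicon at query time.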

2018

A Morphological Analyzer for St. Lawrence Island / Central Siberian Yupik
Emily Chen | Lane Schwartz
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction
Lifeng Jin | Finale Doshi-Velez | Timothy Miller | William Schuler | Lane Schwartz
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model. Modern depth-bounded grammar inducers have been shown to be more accurate than early unbounded PCFG inducers, but this technique has never been compared against unbounded induction within the same system, in part because most previous depth-bounding models are built around sequence models, the complexity of which grows exponentially with the maximum allowed depth. The present work instead applies depth bounds within a chart-based Bayesian PCFG inducer, where bounding can be switched on and off, and then samples trees with or without bounding. Results show that depth-bounding is indeed significantly effective in limiting the search space of the inducer and thereby increasing the accuracy of the resulting parsing model, independent of the contribution of modern Bayesian induction techniques. Moreover, parsing results on English, Chinese and German show that this bounded model is able to produce parse trees more accurately than or competitively with state-of-the-art constituency grammar induction models.

Unsupervised Grammar Induction with Depth-bounded PCFG
Lifeng Jin | Finale Doshi-Velez | Timothy Miller | William Schuler | Lane Schwartz
Transactions of the Association for Computational Linguistics, Volume 6

There has been recent interest in applying cognitively- or empirically-motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.

2017

Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages
Antti Arppe | Jeff Good | Mans Hulden | Jordan Lachler | Alexis Palmer | Lane Schwartz
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

2016

Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input
Cory Shain | William Bryce | Lifeng Jin | Victoria Krakovna | Finale Doshi-Velez | Timothy Miller | William Schuler | Lane Schwartz
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM). We deploy this algorithm to shed light on the extent to which human language learners can discover hierarchical syntax through distributional statistics alone, by modeling two widely-accepted features of human language acquisition and sentence processing that have not been simultaneously modeled by any existing grammar induction algorithm: (1) a left-corner parsing strategy and (2) limited working memory capacity. To model realistic input to human language learners, we evaluate our system on a corpus of child-directed speech rather than typical newswire corpora. Results beat or closely match those of three competing systems.

Normalized Log-Linear Interpolation of Backoff Language Models is Efficient
Kenneth Heafield | Chase Geigle | Sean Massung | Lane Schwartz
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Automated Translation of a Literary Work: A Pilot Study
Laurent Besacier | Lane Schwartz
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

The University of Illinois submission to the WMT 2015 Shared Translation Task
Lane Schwartz | Bill Bryce | Chase Geigle | Sean Massung | Yisi Liu | Haoruo Peng | Vignesh Raja | Subhro Roy | Shyam Upadhyay
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Machine Translation and Monolingual Postediting: The AFRL WMT-14 System
Lane Schwartz | Timothy Anderson | Jeremy Gwinnup | Katherine Young
Proceedings of the Ninth Workshop on Statistical Machine Translation

2011

Incremental Syntactic Language Models for Phrase-based Translation
Lane Schwartz | Chris Callison-Burch | William Schuler | Stephen Wu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Broad-Coverage Parsing Using Human-Like Memory Constraints
William Schuler | Samir AbdelRahman | Tim Miller | Lane Schwartz
Computational Linguistics, Volume 36, Number 1, March 2010

Joshua 2.0: A Toolkit for Parsing-Based Machine Translation with Syntax, Semirings, Discriminative Training and Other Goodies
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Ann Irvine | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Ziyuan Wang | Jonathan Weese | Omar Zaidan
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Reproducible Results in Parsing-Based Machine Translation: The JHU Shared Task Submission
Lane Schwartz
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2009

Joshua: An Open Source Toolkit for Parsing-Based Machine Translation
Zhifei Li | Chris Callison-Burch | Chris Dyer | Sanjeev Khudanpur | Lane Schwartz | Wren Thornton | Jonathan Weese | Omar Zaidan
Proceedings of the Fourth Workshop on Statistical Machine Translation

A Framework for Fast Incremental Interpretation during Speech Decoding
William Schuler | Stephen Wu | Lane Schwartz
Computational Linguistics, Volume 35, Number 3, September 2009

Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation
Zhifei Li | Chris Callison-Burch | Chris Dyer | Juri Ganitkevitch | Sanjeev Khudanpur | Lane Schwartz | Wren N. G. Thornton | Jonathan Weese | Omar F. Zaidan
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

2008

Toward a Psycholinguistically-Motivated Model of Language Processing
William Schuler | Samir AbdelRahman | Tim Miller | Lane Schwartz
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)