Janet Pierrehumbert

Also published as: Janet B. Pierrehumbert


A Graph Auto-encoder Model of Derivational Morphology
Valentin Hofmann | Hinrich Schütze | Janet Pierrehumbert
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

There has been little work on modeling the morphological well-formedness (MWF) of derivatives, a problem judged to be complex and difficult in linguistics. We present a graph auto-encoder that learns embeddings capturing information about the compatibility of affixes and stems in derivation. The auto-encoder models MWF in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.

Predicting the Growth of Morphological Families from Social and Linguistic Factors
Valentin Hofmann | Janet Pierrehumbert | Hinrich Schütze
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as “trump”, “antitrumpism”, and “detrumpify”, in social media. We introduce the novel task of Morphological Family Expansion Prediction (MFEP) as predicting the increase in the size of a morphological family. We create a ten-year Reddit corpus as a benchmark for MFEP and evaluate a number of baselines on this benchmark. Our experiments demonstrate very good performance on MFEP.

DagoBERT: Generating Derivational Morphology with a Pretrained Language Model
Valentin Hofmann | Janet Pierrehumbert | Hinrich Schütze
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Can pretrained language models (PLMs) generate derivationally complex words? We present the first study investigating this question, taking BERT as the example PLM. We examine BERT’s derivational capabilities in different settings, ranging from using the unmodified pretrained model to full finetuning. Our best model, DagoBERT (Derivationally and generatively optimized BERT), clearly outperforms the previous state of the art in derivation generation (DG). Furthermore, our experiments show that the input segmentation crucially impacts BERT’s derivational knowledge, suggesting that the performance of PLMs could be further improved if a morphologically informed vocabulary of units were used.


On Hapax Legomena and Morphological Productivity
Janet Pierrehumbert | Ramon Granell
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Quantifying and predicting morphological productivity is a long-standing challenge in corpus linguistics and psycholinguistics. The same challenge reappears in natural language processing in the context of handling words that were not seen in the training set (out-of-vocabulary, or OOV, words). Prior research showed that a good indicator of the productivity of a morpheme is the number of words involving it that occur exactly once (the hapax legomena). A technical connection was adduced between this result and Good-Turing smoothing, which assigns probability mass to unseen events on the basis of the simplifying assumption that word frequencies are stationary. In a large-scale study of 133 affixes in Wikipedia, we develop evidence that success in fact depends on tapping the frequency range in which the assumptions of Good-Turing are violated.


Rules, Analogy, and Social Factors Codetermine Past-tense Formation Patterns in English
Péter Rácz | Clayton Beckner | Jennifer B. Hay | Janet B. Pierrehumbert
Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM

Using Resource-Rich Languages to Improve Morphological Analysis of Under-Resourced Languages
Peter Baumann | Janet Pierrehumbert
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The world-wide proliferation of digital communications has created the need for language and speech processing systems for under-resourced languages. Developing such systems is challenging if only small data sets are available, and the problem is exacerbated for languages with highly productive morphology. However, many under-resourced languages are spoken in multi-lingual environments together with at least one resource-rich language and thus have numerous borrowings from resource-rich languages. Based on this insight, we argue that readily available resources from resource-rich languages can be used to bootstrap the morphological analyses of under-resourced languages with complex and productive morphological systems. In a case study of two such languages, Tagalog and Zulu, we show that an easily obtainable English wordlist can be deployed to seed a morphological analysis algorithm from a small training set of conversational transcripts. Our method achieves a precision of 100% and identifies 28 and 66 of the most productive affixes in Tagalog and Zulu, respectively.


Much ado about nothing: A social network model of Russian paradigmatic gaps
Robert Daland | Andrea D. Sims | Janet Pierrehumbert
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics


Stochastic phonological grammars and acceptability
John Coleman | Janet Pierrehumbert
Computational Phonology: Third Meeting of the ACL Special Interest Group in Computational Phonology


The Intonational Structuring of Discourse
Julia Hirschberg | Janet Pierrehumbert
24th Annual Meeting of the Association for Computational Linguistics

Japanese Prosodic Phrasing and Intonation Synthesis
Mary E. Beckman | Janet B. Pierrehumbert
24th Annual Meeting of the Association for Computational Linguistics


Automatic Recognition of Intonation Patterns
Janet B. Pierrehumbert
21st Annual Meeting of the Association for Computational Linguistics