Micha Elsner


2020

Acquiring language from speech by learning to remember and predict
Cory Shain | Micha Elsner
Proceedings of the 24th Conference on Computational Natural Language Learning

Classical accounts of child language learning invoke memory limits as a pressure to discover sparse, language-like representations of speech, while more recent proposals stress the importance of prediction for language learning. In this study, we propose a broad-coverage unsupervised neural network model to test memory and prediction as sources of signal by which children might acquire language directly from the perceptual stream. Our model embodies several likely properties of real-time human cognition: it is strictly incremental, it encodes speech into hierarchically organized labeled segments, it allows interactive top-down and bottom-up information flow, it attempts to model its own sequence of latent representations, and its objective function only recruits local signals that are plausibly supported by human working memory capacity. We show that much phonemic structure is learnable from unlabeled speech on the basis of these local signals. We further show that remembering the past and predicting the future both contribute to the linguistic content of acquired representations, and that these contributions are at least partially complementary.

The Paradigm Discovery Problem
Alexander Erdmann | Micha Elsner | Shijie Wu | Ryan Cotterell | Nizar Habash
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This work treats the paradigm discovery problem (PDP), the task of learning an inflectional morphological system from unannotated sentences. We formalize the PDP and develop evaluation metrics for judging systems. Using currently available resources, we construct datasets for the task. We also devise a heuristic benchmark for the PDP and report empirical results on five diverse languages. Our benchmark system first makes use of word embeddings and string similarity to cluster forms by cell and by paradigm. Then, we bootstrap a neural transducer on top of the clustered data to predict words to realize the empty paradigm slots. An error analysis of our system suggests clustering by cell across different inflection classes is the most pressing challenge for future work.

Stop the Morphological Cycle, I Want to Get Off: Modeling the Development of Fusion
Micha Elsner | Martha Johnson | Stephanie Antetomaso | Andrea Sims
Proceedings of the Society for Computation in Linguistics 2020

Interpreting Sequence-to-Sequence Models for Russian Inflectional Morphology
David King | Andrea Sims | Micha Elsner
Proceedings of the Society for Computation in Linguistics 2020

2019

Measuring the perceptual availability of phonological features during language acquisition using unsupervised binary stochastic autoencoders
Cory Shain | Micha Elsner
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we deploy binary stochastic neural autoencoder networks as models of infant language learning in two typologically unrelated languages (Xitsonga and English). We show that the drive to model auditory percepts leads to latent clusters that partially align with theory-driven phonemic categories. We further evaluate the degree to which theory-driven phonological features are encoded in the latent bit patterns, finding that some (e.g. [±approximant]) are well represented by the network in both languages, while others (e.g. [±spread glottis]) are less so. Together, these findings suggest that many reliable cues to phonemic structure are immediately available to infants from bottom-up perceptual characteristics alone, but that these cues must eventually be supplemented by top-down lexical and phonotactic information to achieve adult-like phone discrimination. Our results also suggest differences in degree of perceptual availability between features, yielding testable predictions as to which features might depend more or less heavily on top-down cues during child language acquisition.

Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities
Alexander Erdmann | David Joseph Wrisley | Benjamin Allen | Christopher Brown | Sophie Cohen-Bodénès | Micha Elsner | Yukun Feng | Brian Joseph | Béatrice Joyeux-Prunel | Marie-Catherine de Marneffe
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Scholars in inter-disciplinary fields like the Digital Humanities are increasingly interested in semantic annotation of specialized corpora. Yet, under-resourced languages, imperfect or noisily structured data, and user-specific classification tasks make it difficult to meet their needs using off-the-shelf models. Manual annotation of large corpora from scratch, meanwhile, can be prohibitively expensive. Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. Our system robustly handles any domain or user-defined label set and requires no external resources, enabling quality named entity recognition for Humanities corpora where such resources are not available. Evaluating on typologically disparate languages and datasets, we reduce required annotation by 20-60% and greatly outperform a competitive active learning baseline.

2018

Lexical Networks in !Xung
Syed-Amad Hussain | Micha Elsner | Amanda Miller
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

We investigate the lexical network properties of the large phoneme inventory Southern African language Mangetti Dune !Xung as it compares to English and other commonly-studied languages. Lexical networks are graphs in which nodes (words) are linked to their minimal pairs; global properties of these networks are believed to mediate lexical access in the minds of speakers. We show that the network properties of !Xung are within the range found in previously-studied languages. By simulating data (“pseudolexicons”) with varying levels of phonotactic structure, we find that the lexical network properties of !Xung diverge from previously-studied languages when fewer phonotactic constraints are retained. We conclude that lexical network properties are representative of an underlying cognitive structure which is necessary for efficient word retrieval and that the phonotactics of !Xung may be shaped by a selective pressure which preserves network properties within this cognitively useful range.

2017

Click reduction in fluent speech: a semi-automated analysis of Mangetti Dune !Xung
Amanda Miller | Micha Elsner
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

Breaking NLP: Using Morphosyntax, Semantics, Pragmatics and World Knowledge to Fool Sentiment Analysis Systems
Taylor Mahler | Willy Cheung | Micha Elsner | David King | Marie-Catherine de Marneffe | Cory Shain | Symon Stevens-Guille | Michael White
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

This paper describes our “breaker” submission to the 2017 EMNLP “Build It Break It” shared task on sentiment analysis. In order to cause the “builder” systems to make incorrect predictions, we edited items in the blind test data according to linguistically interpretable strategies that allow us to assess the ease with which the builder systems learn various components of linguistic structure. On the whole, our submitted pairs break all systems at a high rate (72.6%), indicating that sentiment analysis as an NLP task may still have a lot of ground to cover. Of the breaker strategies that we consider, we find our semantic and pragmatic manipulations to pose the most substantial difficulties for the builder systems.

Speech segmentation with a neural encoder model of working memory
Micha Elsner | Cory Shain
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present the first unsupervised LSTM speech segmenter as a cognitive model of the acquisition of words from unsegmented input. Cognitive biases toward phonological and syntactic predictability in speech are rooted in the limitations of human memory (Baddeley et al., 1998); compressed representations are easier to acquire and retain in memory. To model the biases introduced by these memory limitations, our system uses an LSTM-based encoder-decoder with a small number of hidden units, then searches for a segmentation that minimizes autoencoding loss. Linguistically meaningful segments (e.g. words) should share regular patterns of features that facilitate decoder performance in comparison to random segmentations, and we show that our learner discovers these patterns when trained on either phoneme sequences or raw acoustics. To our knowledge, ours is the first fully unsupervised system to be able to segment both symbolic and acoustic representations of speech.

2016

Joint Word Segmentation and Phonetic Category Induction
Micha Elsner | Stephanie Antetomaso | Naomi Feldman
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Micha Elsner | Sandra Kuebler
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

Automatic discovery of Latin syntactic changes
Micha Elsner | Emily Lane
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Challenges and Solutions for Latin Named Entity Recognition
Alexander Erdmann | Christopher Brown | Brian Joseph | Mark Janse | Petra Ajaka | Micha Elsner | Marie-Catherine de Marneffe
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists, for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.

2015

Abstract Representations of Plot Structure
Micha Elsner
Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Lifts up Computational Linguistics

Since the 18th century, the novel has been one of the defining forms of English writing, a mainstay of popular entertainment and academic criticism. Despite its importance, however, there are few computational studies of the large-scale structure of novels—and many popular representations for discourse modeling do not work very well for novelistic texts. This paper describes a high-level representation of plot structure which tracks the frequency of mentions of different characters, topics and emotional words over time. The representation can distinguish with high accuracy between real novels and artificially permuted surrogates; characters are important for eliminating random permutations, while topics are effective at distinguishing beginnings from ends.

2014

Bootstrapping into Filler-Gap: An Acquisition Story
Marten van Schijndel | Micha Elsner
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

POS induction with distributional and morphological information using a distance-dependent Chinese restaurant process
Kairit Sirts | Jacob Eisenstein | Micha Elsner | Sharon Goldwater
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Information Structure Prediction for Visual-world Referring Expressions
Micha Elsner | Hannah Rohde | Alasdair Clarke
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
Micha Elsner | Sharon Goldwater | Naomi Feldman | Frank Wood
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
Micha Elsner | Sharon Goldwater | Jacob Eisenstein
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Character-based kernels for novelistic plot structure
Micha Elsner
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Learning to Fuse Disparate Sentences
Micha Elsner | Deepak Santhanam
Proceedings of the Workshop on Monolingual Text-To-Text Generation

Disentangling Chat with Local Coherence Models
Micha Elsner | Eugene Charniak
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Extending the Entity Grid with Entity-Specific Features
Micha Elsner | Eugene Charniak
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Disentangling Chat
Micha Elsner | Eugene Charniak
Computational Linguistics, Volume 36, Issue 3 - September 2010

The Same-Head Heuristic for Coreference
Micha Elsner | Eugene Charniak
Proceedings of the ACL 2010 Conference Short Papers

2009

Bounding and Comparing Methods for Correlation Clustering Beyond ILP
Micha Elsner | Warren Schudy
Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing

EM Works for Pronoun Anaphora Resolution
Eugene Charniak | Micha Elsner
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Structured Generative Models for Unsupervised Named-Entity Clustering
Micha Elsner | Eugene Charniak | Mark Johnson
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2008

You Talking to Me? A Corpus and Algorithm for Conversation Disentanglement
Micha Elsner | Eugene Charniak
Proceedings of ACL-08: HLT

Coreference-inspired Coherence Modeling
Micha Elsner | Eugene Charniak
Proceedings of ACL-08: HLT, Short Papers

2007

A Unified Local and Global Model for Discourse Coherence
Micha Elsner | Joseph Austerweil | Eugene Charniak
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

Multilevel Coarse-to-Fine PCFG Parsing
Eugene Charniak | Mark Johnson | Micha Elsner | Joseph Austerweil | David Ellis | Isaac Haxton | Catherine Hill | R. Shrivaths | Jeremy Moore | Michael Pozar | Theresa Vu
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

Online Statistics for a Unification-Based Dialogue Parser
Micha Elsner | Mary Swift | James Allen | Daniel Gildea
Proceedings of the Ninth International Workshop on Parsing Technology