Lori Levin

Also published as: Lori S. Levin


2020

A Resource for Computational Experiments on Mapudungun
Mingjun Duan | Carlos Fasola | Sai Krishna Rallabandi | Rodolfo Vega | Antonios Anastasopoulos | Lori Levin | Alan W Black
Proceedings of the 12th Language Resources and Evaluation Conference

We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers. We provide 142 hours of culturally significant conversations in the domain of medical treatment. The conversations are fully transcribed and translated into Spanish. The transcriptions also include annotations for code-switching and non-standard pronunciations. We also provide baseline results on three core NLP tasks: speech recognition, speech synthesis, and machine translation between Spanish and Mapudungun. We further explore other applications for which the corpus will be suitable, including the study of code-switching, historical orthography change, linguistic structure, and sociological and anthropological studies.

An Empirical Exploration of Local Ordering Pre-training for Structured Prediction
Zhisong Zhang | Xiang Kong | Lori Levin | Eduard Hovy
Findings of the Association for Computational Linguistics: EMNLP 2020

Recently, pre-training contextualized encoders with language model (LM) objectives has been shown to be an effective semi-supervised method for structured prediction. In this work, we empirically explore an alternative pre-training method for contextualized encoders. Instead of predicting words in LMs, we “mask out” and predict word order information, with a local ordering strategy and word-selecting objectives. With evaluations on three typical structured prediction tasks (dependency parsing, POS tagging, and NER) over four languages (English, Finnish, Czech, and Italian), we show that our method is consistently beneficial. We further conduct detailed error analysis, including one that examines a specific type of parsing error where the head is misidentified. The results show that pre-trained contextual encoders can bring improvements in a structured way, suggesting that they may be able to capture higher-order patterns and feature combinations from unlabeled data.

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations
Xingyuan Zhao | Satoru Ozaki | Antonios Anastasopoulos | Graham Neubig | Lori Levin
Proceedings of the 28th International Conference on Computational Linguistics

Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We attempt to address this issue by creating automatic glossing models, using modern multi-source neural models that additionally leverage easy-to-collect translations. We further explore cross-lingual transfer and a simple output length control mechanism, further refining our models. Evaluated on three challenging low-resource scenarios, our approach significantly outperforms a recent, state-of-the-art baseline, particularly improving on overall accuracy as well as lemma and tag recall.

Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings
Naoki Otani | Satoru Ozaki | Xingyuan Zhao | Yucen Li | Micaelah St Johns | Lori Levin
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Cross-lingual word embedding (CWE) algorithms represent words in multiple languages in a unified vector space. Multi-Word Expressions (MWEs) are common in every language. When training word embeddings, each component word of an MWE gets its own separate embedding, and thus MWEs are not translated by CWEs. We propose a simple method for word translation of MWEs to and from English in ten languages: we first compile lists of MWEs in each language and then tokenize the MWEs as single tokens before training word embeddings. CWEs are trained on a word-translation task using dictionaries that contain only single words. To evaluate MWE translation, we created bilingual word lists from multilingual WordNet that include single-token words and MWEs and, most importantly, MWEs that correspond to single words in another language. We release these dictionaries to the research community. We show that pre-tokenizing MWEs as single tokens performs better than averaging the embeddings of the individual tokens of the MWE. We can translate MWEs at a top-10 precision of 30-60%. Tokenizing MWEs makes the occurrences of single words in a training corpus sparser, but we show that it does not negatively affect single-word translation.
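The pre-tokenization step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how MWE occurrences are matched or joined, so the greedy longest-match strategy, the underscore joiner, and the names build_mwe_index and pretokenize are assumptions made for illustration, not the authors' implementation.

```python
from typing import Iterable, List, Set, Tuple


def build_mwe_index(mwes: Iterable[str]) -> Tuple[Set[Tuple[str, ...]], int]:
    """Turn an MWE list into a set of token tuples plus the longest MWE length."""
    entries = {tuple(m.split()) for m in mwes if len(m.split()) > 1}
    max_len = max((len(e) for e in entries), default=1)
    return entries, max_len


def pretokenize(tokens: List[str], entries: Set[Tuple[str, ...]],
                max_len: int, joiner: str = "_") -> List[str]:
    """Merge known MWEs into single tokens (assumed greedy longest match)."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest candidate first so "new york city" beats "new york".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            span = tuple(tokens[i:i + n])
            if span in entries:
                out.append(joiner.join(span))
                i += n
                merged = True
                break
        if not merged:
            out.append(tokens[i])
            i += 1
    return out


# Toy MWE list and sentence, invented for this example.
mwe_list = ["hot dog", "new york", "new york city"]
entries, max_len = build_mwe_index(mwe_list)
sentence = "she ate a hot dog in new york city".split()
merged = pretokenize(sentence, entries, max_len)
print(merged)
```

A corpus rewritten this way could then be passed to any word-embedding trainer, so that each merged MWE receives its own vector instead of one vector per component word.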

2018

Parser combinators for Tigrinya and Oromo morphology
Patrick Littell | Tom McCoy | Na-Rae Han | Shruti Rijhwani | Zaid Sheikh | David Mortensen | Teruko Mitamura | Lori Levin
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

DeepCx: A transition-based approach for shallow semantic parsing with complex constructional triggers
Jesse Dunietz | Jaime Carbonell | Lori Levin
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

This paper introduces the surface construction labeling (SCL) task, which expands the coverage of Shallow Semantic Parsing (SSP) to include frames triggered by complex constructions. We present DeepCx, a neural, transition-based system for SCL. As a test case for the approach, we apply DeepCx to the task of tagging causal language in English, which relies on a wider variety of constructions than are typically addressed in SSP. We report substantial improvements over previous tagging efforts on a causal language dataset. We also propose ways DeepCx could be extended to still more difficult constructions and to other semantic domains once appropriate datasets become available.

Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
Aditi Chaudhary | Chunting Zhou | Lori Levin | Graham Neubig | David R. Mortensen | Jaime Carbonell
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Much work in Natural Language Processing (NLP) has been for resource-rich languages, making generalization to new, less-resourced languages challenging. We present two approaches for improving generalization to low-resourced languages by adapting continuous word representations using linguistically motivated subword units: phonemes, morphemes and graphemes. Our method requires neither parallel corpora nor bilingual dictionaries and provides a significant gain in performance over previous methods relying on these resources. We demonstrate the effectiveness of our approaches on Named Entity Recognition for four languages, namely Uyghur, Turkish, Bengali and Hindi, of which Uyghur and Bengali are low-resource languages, and also perform experiments on Machine Translation. Exploiting subwords with transfer learning gives us a boost of +15.2 NER F1 for Uyghur and +9.7 F1 for Bengali. We also show improvements in the monolingual setting, where we achieve average gains of +3 F1 and +1.35 BLEU.

Annotation Schemes for Surface Construction Labeling
Lori Levin
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

In this talk I will describe the interaction of linguistics and language technologies in Surface Construction Labeling (SCL) from the perspective of corpus annotation tasks such as definiteness, modality, and causality. Linguistically, following Construction Grammar, SCL recognizes that meaning may be carried by morphemes, words, or arbitrary constellations of morpho-lexical elements. SCL is like Shallow Semantic Parsing in that it does not attempt a full compositional analysis of meaning, but rather identifies only the main elements of a semantic frame, where the frames may be invoked by constructions as well as lexical items. Computationally, SCL is different from tasks such as information extraction in that it deals only with meanings that are expressed in a conventional, grammaticalized way and does not address inferred meanings. I review the work of Dunietz (2018) on the labeling of causal frames including causal connectives and cause and effect arguments. I will describe how to design an annotation scheme for SCL, including isolating basic units of form and meaning and building a “constructicon”. I will conclude with remarks about the nature of universal categories and universal meaning representations in language technologies. This talk describes joint work with Jaime Carbonell, Jesse Dunietz, Nathan Schneider, and Miriam Petruck.

2017

URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors
Patrick Littell | David R. Mortensen | Ke Lin | Katherine Kairis | Carlisle Turner | Lori Levin
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics. The goal of URIEL and lang2vec is to enable multilingual NLP, especially on less-resourced languages, and to make possible types of experiments (especially but not exclusively related to NLP tasks) that are otherwise difficult or impossible due to the sparsity and incommensurability of the data sources. lang2vec vectors have been shown to reduce perplexity in multilingual language modeling when compared to one-hot language identification vectors.

Automatically Tagging Constructions of Causation and Their Slot-Fillers
Jesse Dunietz | Lori Levin | Jaime Carbonell
Transactions of the Association for Computational Linguistics, Volume 5

This paper explores extending shallow semantic parsing beyond lexical-unit triggers, using causal relations as a test case. Semantic parsing becomes difficult in the face of the wide variety of linguistic realizations that causation can take on. We therefore base our approach on the concept of constructions from the linguistic paradigm known as Construction Grammar (CxG). In CxG, a construction is a form/function pairing that can rely on arbitrary linguistic and semantic features. Rather than codifying all aspects of each construction’s form, as some attempts to employ CxG in NLP have done, we propose methods that offload that problem to machine learning. We describe two supervised approaches for tagging causal constructions and their arguments. Both approaches combine automatically induced pattern-matching rules with statistical classifiers that learn the subtler parameters of the constructions. Our results show that these approaches are promising: they significantly outperform naïve baselines for both construction recognition and cause and effect head matches.

The BECauSE Corpus 2.0: Annotating Causality and Overlapping Relations
Jesse Dunietz | Lori Levin | Jaime Carbonell
Proceedings of the 11th Linguistic Annotation Workshop

Language of cause and effect captures an essential component of the semantics of a text. However, causal language is also intertwined with other semantic relations, such as temporal precedence and correlation. This makes it difficult to determine when causation is the primary intended meaning. This paper presents BECauSE 2.0, a new version of the BECauSE corpus that exhaustively annotates not only expressions of causal language but also seven semantic relations that are frequently co-present with causation. The new corpus shows high inter-annotator agreement, and yields insights both about the linguistic expressions of causation and about the process of annotating co-present semantic relations.

Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages
Michael Yoder | Shruti Rijhwani | Carolyn Rosé | Lori Levin
Proceedings of the Second Workshop on NLP and Computational Social Science

Code-switching has been found to have social motivations in addition to syntactic constraints. In this work, we explore the social effect of code-switching in an online community. We present a task from the Arabic Wikipedia to capture language choice, in this case code-switching between Arabic and other languages, as a predictor of social influence in collaborative editing. We find that code-switching is positively associated with Wikipedia editor success, particularly borrowing technical language on pages with topics less directly related to Arabic-speaking regions.

2016

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik
Patrick Littell | David R. Mortensen | Kartik Goyal | Chris Dyer | Lori Levin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition – capitalization – is absent, as the language’s Perso-Arabic script does not make a distinction between uppercase and lowercase letters. We describe a system for deriving an inferred capitalization value from closely related languages by phonological similarity, and illustrate the system using several related Western Iranian languages.

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
Yulia Tsvetkov | Sunayana Sitaram | Manaal Faruqui | Guillaume Lample | Patrick Littell | David Mortensen | Alan W Black | Lori Levin | Chris Dyer
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik
Patrick Littell | Kartik Goyal | David R. Mortensen | Alexa Little | Chris Dyer | Lori Levin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of “Linguistic Rapid Response” to potential emergency humanitarian relief situations. In the absence of large annotated corpora, parallel corpora, treebanks, bilingual lexica, etc., we found the following to be effective: exploiting distributional regularities in monolingual data, projecting information across closely related languages, and utilizing human linguist judgments. We show promising results on both a four-month exercise in Sorani and a two-day exercise in Tajik, achieved with minimal annotation costs.

PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors
David R. Mortensen | Patrick Littell | Akash Bharadwaj | Kartik Goyal | Chris Dyer | Lori Levin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data. In particular, we show that phonological features outperform character-based models. PanPhon is a database relating over 5,000 IPA segments to 21 subsegmental articulatory features. We show that this database boosts performance in various NER-related tasks. Phonologically aware, neural CRF models built on PanPhon features are able to perform better on monolingual Spanish and Turkish NER tasks than character-based models. They have also been shown to work well in transfer models (as between Uzbek and Turkish). PanPhon features also contribute measurably to Orthography-to-IPA conversion tasks.

2015

Annotating Causal Language Using Corpus Lexicography of Constructions
Jesse Dunietz | Lori Levin | Jaime Carbonell
Proceedings of The 9th Linguistic Annotation Workshop

Proceedings of the Grammar Engineering Across Frameworks (GEAF) 2015 Workshop
Emily M. Bender | Lori Levin | Stefan Müller | Yannick Parmentier | Aarne Ranta
Proceedings of the Grammar Engineering Across Frameworks (GEAF) 2015 Workshop

Unsupervised POS Induction with Word Embeddings
Chu-Cheng Lin | Waleed Ammar | Chris Dyer | Lori Levin
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

The CMU Submission for the Shared Task on Language Identification in Code-Switched Data
Chu-Cheng Lin | Waleed Ammar | Lori Levin | Chris Dyer
Proceedings of the First Workshop on Computational Approaches to Code Switching

Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
Lori Levin | Manfred Stede
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

Keynote Lecture 3: Modeling Non-Propositional Semantics
Lori Levin
Proceedings of the 11th International Conference on Natural Language Processing

Automatic Classification of Communicative Functions of Definiteness
Archna Bhatia | Chu-Cheng Lin | Nathan Schneider | Yulia Tsvetkov | Fatima Talib Al-Raisi | Laleh Roostapour | Jordan Bender | Abhimanu Kumar | Lori Levin | Mandy Simons | Chris Dyer
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness
Archna Bhatia | Mandy Simons | Lori Levin | Yulia Tsvetkov | Chris Dyer | Jordan Bender
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information, which we call communicative functions, associated with linguistic descriptions such as “a story about my speech”, “the story”, “every time I give it”, “this slideshow”. A survey of the literature suggests that definiteness does not express a single communicative function but is a grammaticalization of many such functions, for example, identifiability, familiarity, uniqueness, specificity. Our annotation scheme unifies ideas from previous research on definiteness while attempting to remove redundancy and make it easily annotatable. This annotation scheme encodes the communicative functions of definiteness rather than the grammatical forms of definiteness. We assume that the communicative functions are largely maintained across languages while the grammaticalization of this information may vary. One of the final goals is to use our semantically annotated corpora to discover how definiteness is grammaticalized in different languages. We release our annotated corpora for English and Hindi, and sample annotations for Hebrew and Russian, together with an annotation manual.

Resources for the Detection of Conventionalized Metaphors in Four Languages
Lori Levin | Teruko Mitamura | Brian MacWhinney | Davida Fromm | Jaime Carbonell | Weston Feely | Robert Frederking | Anatole Gershman | Carlos Ramirez
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes a suite of tools for extracting conventionalized metaphors in English, Spanish, Farsi, and Russian. The method depends on three significant resources for each language: a corpus of conventionalized metaphors, a table of conventionalized conceptual metaphors (CCM table), and a set of extraction rules. Conventionalized metaphors are expressions like “escape from poverty” and “burden of taxation”. For each metaphor, the CCM table contains the metaphorical source domain word (such as “escape”), the target domain word (such as “poverty”), and the grammatical construction in which they can be found. The extraction rules operate on the output of a dependency parser and identify the grammatical configurations (such as a verb with a prepositional phrase complement) that are likely to contain conventional metaphors. We present results on detection rates for conventional metaphors and an analysis of the similarities and differences of source domains for conventional metaphors in the four languages.

The CMU METAL Farsi NLP Approach
Weston Feely | Mehdi Manshadi | Robert Frederking | Lori Levin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

While many high-quality tools are available for analyzing major languages such as English, equivalent freely-available tools for important but lower-resourced languages such as Farsi are more difficult to acquire and integrate into a useful NLP front end. We report here on an accurate and efficient Farsi analysis front end that we have assembled, which may be useful to others who wish to work with written Farsi. The pre-existing components and resources that we incorporated include the Carnegie Mellon TurboParser and TurboTagger (Martins et al., 2010) trained on the Dadegan Treebank (Rasooli et al., 2013), the Uppsala Farsi text normalizer PrePer (Seraji, 2013), the Uppsala Farsi tokenizer (Seraji et al., 2012a), and Jon Dehdari’s PerStem (Jadidinejad et al., 2010). This set of tools (combined with additional normalization and tokenization modules that we have developed and made available) achieves a dependency parsing labeled attachment score of 89.49%, unlabeled attachment score of 92.19%, and label accuracy score of 91.38% on a held-out parsing test data set. All of the components and resources used are freely available. In addition to describing the components and resources, we also explain the rationale for our choices.
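For readers unfamiliar with the three parsing metrics reported in this abstract, the sketch below shows their standard token-level definitions. The function name attachment_scores and the toy gold and predicted arcs are invented for illustration. Evaluation details such as punctuation handling are not stated in the abstract and are not modeled here.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Compute UAS, LAS, and label accuracy over aligned token sequences."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    pairs = list(zip(gold, pred))
    # Unlabeled attachment score: the predicted head is correct.
    uas = sum(g[0] == p[0] for g, p in pairs) / n
    # Labeled attachment score: both head and label are correct.
    las = sum(g == p for g, p in pairs) / n
    # Label accuracy: the label is correct regardless of the head.
    la = sum(g[1] == p[1] for g, p in pairs) / n
    return {"UAS": uas, "LAS": las, "LA": la}


# Toy four-token sentence; head 0 denotes the artificial root.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "nmod")]
scores = attachment_scores(gold, pred)
print(scores)
```

Because a labeled match requires both the head and the label to be right, the labeled attachment score can never exceed either of the other two, which is consistent with the ordering of the figures reported above.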

Morphological parsing of Swahili using crowdsourced lexical resources
Patrick Littell | Kaitlyn Price | Lori Levin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources. Our analyzer was supplemented extensively with data from the Kamusi Project (kamusi.org), a user-contributed multilingual dictionary. Making use of this resource allowed us to achieve wide lexical coverage quickly, but the heterogeneous nature of user-contributed content also poses some challenges when adapting it for use in an expert system.

2013

Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options
Yulia Tsvetkov | Chris Dyer | Lori Levin | Archna Bhatia
Proceedings of the Eighth Workshop on Statistical Machine Translation

Introducing Computational Concepts in a Linguistics Olympiad
Patrick Littell | Lori Levin | Jason Eisner | Dragomir Radev
Proceedings of the Fourth Workshop on Teaching NLP and CL

The Effects of Lexical Resource Quality on Preference Violation Detection
Jesse Dunietz | Lori Levin | Jaime Carbonell
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Use of Modality and Negation in Semantically-Informed Syntactic MT
Kathryn Baker | Michael Bloodgood | Bonnie J. Dorr | Chris Callison-Burch | Nathaniel W. Filardo | Christine Piatko | Lori Levin | Scott Miller
Computational Linguistics, Volume 38, Issue 2 - June 2012

Statistical Modality Tagging from Rule-based Annotations and Crowdsourcing
Vinodkumar Prabhakaran | Michael Bloodgood | Mona Diab | Bonnie Dorr | Lori Levin | Christine D. Piatko | Owen Rambow | Benjamin Van Durme
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

2010

Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground
Fei Xia | William Lewis | Lori Levin
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

A Modality Lexicon and its use in Automatic Tagging
Kathryn Baker | Michael Bloodgood | Bonnie Dorr | Nathaniel W. Filardo | Lori Levin | Christine Piatko
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme. Our annotation scheme is based on identifying three components of modality: a trigger, a target and a holder. We describe how our modality lexicon was produced semi-automatically, expanding from an initial hand-selected list of modality trigger words and phrases. The resulting expanded modality lexicon is being made publicly available. We demonstrate that one tagger, a structure-based tagger, achieves precision around 86% (depending on genre) for tagging of a standard LDC data set. In a machine translation application, using the structure-based tagger to annotate English modalities on an English-Urdu training corpus improved the translation quality score for Urdu by 0.3 BLEU points in the face of sparse training data.

2009

Proceedings of the First Workshop on Language Technologies for African Languages
Lori Levin | John Kiango | Judith Klavans | Guy De Pauw | Gilles-Maurice de Schryver | Peter Waiganjo Wagacha
Proceedings of the First Workshop on Language Technologies for African Languages

Committed Belief Annotation and Tagging
Mona Diab | Lori Levin | Teruko Mitamura | Owen Rambow | Vinodkumar Prabhakaran | Weiwei Guo
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Adaptable, Community-Controlled, Language Technologies for Language Maintenance
Lori Levin
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

2008

Toward Active Learning in Data Selection: Automatic Discovery of Language Features During Elicitation
Jonathan Clark | Robert Frederking | Lori Levin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Data Selection has emerged as a common issue in language technologies. We define Data Selection as the choosing of a subset of training data that is most effective for a given task. This paper describes deductive feature detection, one component of a data selection system for machine translation. Feature detection determines whether features such as tense, number, and person are expressed in a language. The database of The World Atlas of Language Structures provides a gold standard against which to evaluate feature detection. The discovered features can be used as input to a Navigator, which uses active learning to determine which piece of language data is the most important to acquire next.

Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages
Christian Monson | Ariadna Font Llitjós | Vamshi Ambati | Lori Levin | Alon Lavie | Alison Alvarez | Roberto Aranovich | Jaime Carbonell | Robert Frederking | Erik Peterson | Katharina Probst
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minority languages there is little machine-readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on leveraging to the maximum extent two resources that are available for minority languages: linguistic structure and bilingual informants. All natural languages contain linguistic structure. Although the details of that linguistic structure vary from language to language, language universals such as context-free syntactic structure and the paradigmatic structure of inflectional morphology allow us to learn the specific details of a minority language. Similarly, most minority languages possess speakers who are bilingual with the major language of the area. This paper discusses our efforts to utilize linguistic structure and the translation information that bilingual informants can provide in three sub-areas of our rapid development MT program: morphology induction, syntactic transfer rule learning, and refinement of imperfect learned rules.

The North American Computational Linguistics Olympiad (NACLO)
Dragomir R. Radev | Lori Levin | Thomas E. Payne
Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics

Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation
Jonathan H. Clark | Robert Frederking | Lori Levin
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

Evaluating an Agglutinative Segmentation Model for ParaMor
Christian Monson | Alon Lavie | Jaime Carbonell | Lori Levin
Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology

2007

ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis
Christian Monson | Jaime Carbonell | Alon Lavie | Lori Levin
Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology

2006

Parallel Syntactic Annotation of Multiple Languages
Owen Rambow | Bonnie Dorr | David Farwell | Rebecca Green | Nizar Habash | Stephen Helmreich | Eduard Hovy | Lori Levin | Keith J. Miller | Teruko Mitamura | Florence Reeder | Advaith Siddharthan
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

This paper describes an effort to investigate the incrementally deepening development of an interlingua notation, validated by human annotation of texts in English plus six languages. We begin with deep syntactic annotation, and in this paper present a series of annotation manuals for six different languages at the deep-syntactic level of representation. Many syntactic differences between languages are removed in the proposed syntactic annotation, making them useful resources for multilingual NLP projects with semantic components.

Understanding Temporal Expressions in Emails
Benjamin Han | Donna Gates | Lori Levin
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

The MILE Corpus for Less Commonly Taught Languages
Alison Alvarez | Lori Levin | Robert Frederking | Simon Fung | Donna Gates | Jeff Good
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2004

A trainable transfer-based MT approach for languages with limited resources
Alon Lavie | Katharina Probst | Erik Peterson | Stephan Vogel | Lori Levin | Ariadna Font-Llitjos | Jaime Carbonell
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

Data Collection and Analysis of Mapudungun Morphology for Spelling Correction
Christian Monson | Lori Levin | Rodolfo Vega | Ralf Brown | Ariadna Font Llitjos | Alon Lavie | Jaime Carbonell | Eliseo Cañulef | Rosendo Huisca
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)

Unsupervised Induction of Natural Language Morphology Inflection Classes
Christian Monson | Alon Lavie | Jaime Carbonell | Lori Levin
Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology: Current Themes in Computational Phonology and Morphology

Interlingual Annotation of Multilingual Text Corpora
Stephen Helmreich | David Farwell | Bonnie Dorr | Nizar Habash | Lori Levin | Teruko Mitamura | Florence Reeder | Keith Miller | Eduard Hovy | Owen Rambow | Advaith Siddharthan
Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004

2003

Domain Specific Speech Acts for Spoken Language Translation
Lori Levin | Chad Langley | Alon Lavie | Donna Gates | Dorcas Wallace | Kay Peterson
Proceedings of the Fourth SIGdial Workshop of Discourse and Dialogue

Speechalator: Two-Way Speech-to-Speech Translation in Your Hand
Alex Waibel | Ahmed Badran | Alan W. Black | Robert Frederking | Donna Gates | Alon Lavie | Lori Levin | Kevin Lenzo | Laura Mayfield Tomokiyo | Juergen Reichert | Tanja Schultz | Dorcas Wallace | Monika Woszczyna | Jing Zhang
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

2002

Spoken Language Parsing Using Phrase-Level Grammars and Trainable Classifiers
Chad Langley | Alon Lavie | Lori Levin | Dorcas Wallace | Donna Gates | Kay Peterson
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

Balancing Expressiveness and Simplicity in an Interlingua for Task Based Dialogue
Lori Levin | Donna Gates | Dorcas Pianta | Roldano Cattoni | Nadia Mana | Kay Peterson | Alon Lavie | Fabio Pianesi
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

2001

Domain Portability in Speech-to-Speech Translation
Alon Lavie | Lori Levin | Tanja Schultz | Chad Langley | Benjamin Han | Alicia Tribble | Donna Gates | Dorcas Wallace | Kay Peterson
Proceedings of the First International Conference on Human Language Technology Research

2000

Lessons Learned from a Task-based Evaluation of Speech-to-Speech Machine Translation
Lori Levin | Boris Bartlog | Ariadna Font Llitjos | Donna Gates | Alon Lavie | Dorcas Wallace | Taro Watanabe | Monika Woszczyna
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Shallow Discourse Genre Annotation in CallHome Spanish
Klaus Ries | Lori Levin | Liza Valle | Alon Lavie | Alex Waibel
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Evaluation of a Practical Interlingua for Task-Oriented Dialogue
Lori Levin | Donna Gates | Alon Lavie | Fabio Pianesi | Dorcas Wallace | Taro Watanabe
NAACL-ANLP 2000 Workshop: Applied Interlinguas: Practical Applications of Interlingual Approaches to NLP

1999

Tagging of Speech Acts and Dialogue Games in Spanish Call Home
Lori Levin | Klaus Ries | Ann Thyme-Gobbel | Alon Lavie
Towards Standards and Tools for Discourse Tagging

1998

An Interactive Domain Independent Approach to Robust Dialogue Interpretation
Carolyn Penstein Rose | Lori S. Levin
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

An Interactive Domain Independent Approach to Robust Dialogue Interpretation
Carolyn Penstein Rose | Lori S. Levin
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Expanding the Domain of a Multi-lingual Speech-to-Speech Translation System
Alon Lavie | Lori Levin | Puming Zhan | Maite Taboada | Donna Gates | Mirella Lapata | Cortis Clark | Matthew Broadhead | Alex Waibel
Spoken Language Translation

1996

Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain
Alon Lavie | Donna Gates | Marsal Gavalda | Laura Mayfield | Alex Waibel | Lori Levin
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

JANUS: multi-lingual translation of spontaneous speech in limited domain
Alon Lavie | Lori Levin | Alex Waibel | Donna Gates | Marsal Gavalda | Laura Mayfield
Conference of the Association for Machine Translation in the Americas

1995

Discourse Processing of Dialogues with Multiple Threads
Carolyn Penstein Rosé | Barbara Di Eugenio | Lori S. Levin | Carol Van Ess-Dykema
33rd Annual Meeting of the Association for Computational Linguistics

1994

PANGLOSS
Jaime Carbonell | David Farwell | Robert Frederking | Steven Helmreich | Eduard Hovy | Kevin Knight | Lori Levin | Sergei Nirenburg
Proceedings of the First Conference of the Association for Machine Translation in the Americas

The Correct Place of Lexical Semantics in Interlingual MT
Lori Levin | Sergei Nirenburg
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1991

Syntax-Driven and Ontology-Driven Lexical Semantics
Sergei Nirenburg | Lori Levin
Lexical Semantics and Knowledge Representation

1989

Ambiguity Resolution in the DMTRANS PLUS
Hiroaki Kitano | Hideto Tomabechi | Lori Levin
Fourth Conference of the European Chapter of the Association for Computational Linguistics
