Colin Cherry


2020

Re-translation versus Streaming for Simultaneous Translation
Naveen Arivazhagan | Colin Cherry | Wolfgang Macherey | George Foster
Proceedings of the 17th International Conference on Spoken Language Translation

There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning of an audio feed. In this setting, we compare custom streaming approaches to re-translation, a straightforward strategy where each new source token triggers a distinct translation from scratch. We find re-translation to be as good as or better than state-of-the-art streaming systems, even when operating under constraints that allow very few revisions. We attribute much of this success to a previously proposed data-augmentation technique that adds prefix-pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming translation. We also highlight re-translation’s ability to wrap arbitrarily powerful MT systems with an experiment showing large improvements from an upgrade to its base model.

Inference Strategies for Machine Translation with Conditional Masking
Julia Kreutzer | George Foster | Colin Cherry
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard “mask-predict” algorithm, and provide analyses of its behavior on machine translation tasks.

Simultaneous Translation
Liang Huang | Colin Cherry | Mingbo Ma | Naveen Arivazhagan | Zhongjun He
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Simultaneous translation, which performs translation concurrently with the source speech, is widely useful in many scenarios such as international conferences, negotiations, press releases, legal proceedings, and medicine. This problem has long been considered one of the hardest problems in AI and one of its holy grails. Recently, with rapid improvements in machine translation, speech recognition, and speech synthesis, there has been exciting progress towards simultaneous translation. This tutorial will focus on the design and evaluation of policies for simultaneous translation, to leave attendees with a deep technical understanding of the history, the recent advances, and the remaining challenges in this field.

2019

Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Colin Cherry | Greg Durrett | George Foster | Reza Haffari | Shahram Khadivi | Nanyun Peng | Xiang Ren | Swabha Swayamdipta
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Naveen Arivazhagan | Colin Cherry | Wolfgang Macherey | Chung-Cheng Chiu | Semih Yavuz | Ruoming Pang | Wei Li | Colin Raffel
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic Infinite Lookback (MILk) attention, which maintains both a hard, monotonic attention head to schedule the reading of the source sentence, and a soft attention head that extends from the monotonic head back to the beginning of the source. We show that MILk’s adaptive schedule allows it to arrive at latency-quality trade-offs that compare favorably to those of a recently proposed wait-k strategy for many latency values.

Reinforcement Learning based Curriculum Optimization for Neural Machine Translation
Gaurav Kumar | George Foster | Colin Cherry | Maxim Krikun
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT). Specifically, given a training dataset with a sentence-level feature such as noise, we seek an optimal curriculum, or order for presenting examples to the system during training. Our curriculum framework allows examples to appear an arbitrary number of times, and thus generalizes data weighting, filtering, and fine-tuning schemes. Rather than relying on prior knowledge to design a curriculum, we use reinforcement learning to learn one automatically, jointly with the NMT system, in the course of a single training run. We show that this approach can beat uniform baselines on Paracrawl and WMT English-to-French datasets by +3.4 and +1.3 BLEU respectively. Additionally, we match the performance of strong filtering baselines and hand-designed, state-of-the-art curricula.

2018

Revisiting Character-Based Neural Machine Translation with Capacity and Compression
Colin Cherry | George Foster | Ankur Bapna | Orhan Firat | Wolfgang Macherey
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering. However, it results in longer sequences in which each symbol contains less information, creating both modeling and computational challenges. In this paper, we show that the modeling problem can be solved by standard sequence-to-sequence architectures of sufficient depth, and that deep models operating at the character level outperform identical models operating over word fragments. This result implies that alternative architectures for handling character input are better viewed as methods for reducing computation time than as improved ways of modeling longer sequences. From this perspective, we evaluate several techniques for character-level NMT, verify that they do not match the performance of our deep character baseline model, and evaluate the performance versus computation time tradeoffs they offer. Within this framework, we also perform the first evaluation for NMT of conditional computation over time, in which the model learns which timesteps can be skipped, rather than having them be dictated by a fixed schedule specified before training begins.

Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Colin Cherry | Graham Neubig
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Reza Haffari | Colin Cherry | George Foster | Shahram Khadivi | Bahar Salehi
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

2017

Cost Weighting for Neural Machine Translation Domain Adaptation
Boxing Chen | Colin Cherry | George Foster | Samuel Larkin
Proceedings of the First Workshop on Neural Machine Translation

In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation in order to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.

NRC Machine Translation System for WMT 2017
Chi-kiu Lo | Boxing Chen | Colin Cherry | George Foster | Samuel Larkin | Darlene Stewart | Roland Kuhn
Proceedings of the Second Conference on Machine Translation

A Challenge Set Approach to Evaluating Machine Translation
Pierre Isabelle | Colin Cherry | George Foster
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Neural machine translation represents an exciting leap forward in translation quality. But what longstanding weaknesses does it resolve, and which remain? We address these questions with a challenge set approach to translation evaluation and error analysis. A challenge set consists of a small set of sentences, each hand-designed to probe a system’s capacity to bridge a particular structural divergence between languages. To exemplify this approach, we present an English-French challenge set, and use it to analyze phrase-based and neural systems. The resulting analysis provides not only a more fine-grained picture of the strengths of neural systems, but also insight into which linguistic phenomena remain out of reach.

2016

A Dataset for Detecting Stance in Tweets
Saif Mohammad | Svetlana Kiritchenko | Parinaz Sobhani | Xiaodan Zhu | Colin Cherry
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We can often detect from a person’s utterances whether he/she is in favor of or against a given target entity (a product, topic, another person, etc.). Here for the first time we present a dataset of tweets annotated for the tweeter's stance, that is, whether the tweeter is in favor of or against pre-chosen targets of interest. The targets of interest may or may not be referred to in the tweets, and they may or may not be the target of opinion in the tweets. The data pertains to six targets of interest commonly known and debated in the United States. Apart from stance, the tweets are also annotated for whether the target of interest is the target of opinion in the tweet. The annotations were performed by crowdsourcing. Several techniques were employed to encourage high-quality annotations (for example, providing clear and simple instructions) and to identify and discard poor annotations (for example, using a small set of check questions annotated by the authors). This Stance Dataset, which was subsequently also annotated for sentiment, can be used to better understand the relationship between stance, sentiment, entity relationships, and textual inference.

An Empirical Evaluation of Noise Contrastive Estimation for the Neural Network Joint Model of Translation
Colin Cherry
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Integrating Morphological Desegmentation into Phrase-based Decoding
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

NRC Russian-English Machine Translation System for WMT 2016
Chi-kiu Lo | Colin Cherry | George Foster | Darlene Stewart | Rabib Islam | Anna Kazantseva | Roland Kuhn
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

SemEval-2016 Task 6: Detecting Stance in Tweets
Saif Mohammad | Svetlana Kiritchenko | Parinaz Sobhani | Xiaodan Zhu | Colin Cherry
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

What Matters Most in Morphologically Segmented SMT Models?
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

Morpho-syntactic Regularities in Continuous Word Representations: A multilingual study.
Garrett Nicolai | Colin Cherry | Grzegorz Kondrak
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing

NRC: Infused Phrase Vectors for Named Entity Recognition in Twitter
Colin Cherry | Hongyu Guo | Chengbi Dai
Proceedings of the Workshop on Noisy User-generated Text

The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition
Colin Cherry | Hongyu Guo
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Inflection Generation as Discriminative String Transduction
Garrett Nicolai | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews
Svetlana Kiritchenko | Xiaodan Zhu | Colin Cherry | Saif Mohammad
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Lattice Desegmentation for Statistical Machine Translation
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU
Boxing Chen | Colin Cherry
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

Regularized Minimum Error Rate Training
Michel Galley | Chris Quirk | Colin Cherry | Kristina Toutanova
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Improved Reordering for Phrase-Based Translation using Sparse Features
Colin Cherry
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Reversing Morphological Tokenization in English-to-Arabic SMT
Mohammad Salameh | Colin Cherry | Grzegorz Kondrak
Proceedings of the 2013 NAACL HLT Student Research Workshop

2012

Batch Tuning Strategies for Statistical Machine Translation
Colin Cherry | George Foster
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

MSR SPLAT, a language analysis toolkit
Chris Quirk | Pallavi Choudhury | Jianfeng Gao | Hisami Suzuki | Kristina Toutanova | Michael Gamon | Wen-tau Yih | Colin Cherry | Lucy Vanderwende
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding
Colin Cherry | Robert C. Moore | Chris Quirk
Proceedings of the Seventh Workshop on Statistical Machine Translation

Paraphrasing for Style
Wei Xu | Alan Ritter | Bill Dolan | Ralph Grishman | Colin Cherry
Proceedings of COLING 2012

2011

Data-Driven Response Generation in Social Media
Alan Ritter | Colin Cherry | William B. Dolan
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Indexing Spoken Documents with Hierarchical Semantic Structures: Semantic Tree-to-string Alignment Models
Xiaodan Zhu | Colin Cherry | Gerald Penn
Proceedings of 5th International Joint Conference on Natural Language Processing

Lexically-Triggered Hidden Markov Models for Clinical Document Coding
Svetlana Kiritchenko | Colin Cherry
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
Colin Cherry | Shane Bergsma
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Unsupervised Modeling of Twitter Conversations
Alan Ritter | Colin Cherry | Bill Dolan
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Integrating Joint n-gram Features into a Discriminative Training Framework
Sittichai Jiampojamarn | Colin Cherry | Grzegorz Kondrak
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Book Review: Statistical Machine Translation by Philipp Koehn
Colin Cherry
Computational Linguistics, Volume 36, Issue 4 - December 2010

Fast and Accurate Arc Filtering for Dependency Parsing
Shane Bergsma | Colin Cherry
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Imposing Hierarchical Browsing Structures onto Spoken Documents
Xiaodan Zhu | Colin Cherry | Gerald Penn
Coling 2010: Posters

2009

Discriminative Substring Decoding for Transliteration
Colin Cherry | Hisami Suzuki
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

NEWS 2009 Machine Transliteration Shared Task System Description: Transliteration with Letter-to-Phoneme Technology
Colin Cherry | Hisami Suzuki
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

A global model for joint lemmatization and part-of-speech prediction
Kristina Toutanova | Colin Cherry
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Unsupervised Morphological Segmentation with Log-Linear Models
Hoifung Poon | Colin Cherry | Kristina Toutanova
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

On the Syllabification of Phonemes
Susan Bartlett | Grzegorz Kondrak | Colin Cherry
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Cohesive Constraints in A Beam Search Phrase-based Decoder
Nguyen Bach | Stephan Vogel | Colin Cherry
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

2008

Cohesive Phrase-Based Decoding for Statistical Machine Translation
Colin Cherry
Proceedings of ACL-08: HLT

Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion
Susan Bartlett | Grzegorz Kondrak | Colin Cherry
Proceedings of ACL-08: HLT

Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion
Sittichai Jiampojamarn | Colin Cherry | Grzegorz Kondrak
Proceedings of ACL-08: HLT

2007

Inversion Transduction Grammar for Joint Phrasal Translation Modeling
Colin Cherry | Dekang Lin
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

2006

Soft Syntactic Constraints for Word Alignment through Discriminative Training
Colin Cherry | Dekang Lin
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Improved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization
Qin Iris Wang | Colin Cherry | Dan Lizotte | Dale Schuurmans
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

Biomedical Term Recognition with the Perceptron HMM Algorithm
Sittichai Jiampojamarn | Grzegorz Kondrak | Colin Cherry
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

A Comparison of Syntactically Motivated Word Alignment Spaces
Colin Cherry | Dekang Lin
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

Dependency Treelet Translation: Syntactically Informed Phrasal SMT
Chris Quirk | Arul Menezes | Colin Cherry
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

An Expectation Maximization Approach to Pronoun Resolution
Colin Cherry | Shane Bergsma
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2003

ProAlign: Shared Task System Description
Dekang Lin | Colin Cherry
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

A Probability Model to Improve Word Alignment
Colin Cherry | Dekang Lin
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Word Alignment with Cohesion Constraint
Dekang Lin | Colin Cherry
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers