George Foster


2020

Re-translation versus Streaming for Simultaneous Translation
Naveen Arivazhagan | Colin Cherry | Wolfgang Macherey | George Foster
Proceedings of the 17th International Conference on Spoken Language Translation

There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning an audio feed. In this setting, we compare custom streaming approaches to re-translation, a straightforward strategy where each new source token triggers a distinct translation from scratch. We find re-translation to be as good or better than state-of-the-art streaming systems, even when operating under constraints that allow very few revisions. We attribute much of this success to a previously proposed data-augmentation technique that adds prefix-pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming translation. We also highlight re-translation’s ability to wrap arbitrarily powerful MT systems with an experiment showing large improvements from an upgrade to its base model.
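The re-translation strategy described above can be sketched in a few lines: each incoming source token triggers a fresh translation of the full prefix, and the amount of revision between consecutive hypotheses can be measured as "erasure". This is a toy illustration, not the paper's system; `translate` is a hypothetical stand-in for a real MT model.

```python
# Toy sketch of re-translation for simultaneous MT: every new source token
# triggers a distinct translation from scratch, so earlier output may be
# revised. `translate` is a hypothetical stand-in for the base MT system.

def translate(source_tokens):
    # Hypothetical base model: a trivial token-by-token "translation"
    # (uppercasing) standing in for a real seq2seq decoder.
    return [tok.upper() for tok in source_tokens]

def retranslate_stream(source_tokens):
    """Return the full hypothesis emitted after each incoming source token."""
    return [translate(source_tokens[:i]) for i in range(1, len(source_tokens) + 1)]

def erasure(prev, curr):
    """Number of tokens of the previous hypothesis revised (not just appended to)."""
    common = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        common += 1
    return len(prev) - common
```

Constraints that allow "very few revisions" correspond to keeping the total `erasure` across consecutive hypotheses small relative to output length.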

Inference Strategies for Machine Translation with Conditional Masking
Julia Kreutzer | George Foster | Colin Cherry
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard “mask-predict” algorithm, and provide analyses of its behavior on machine translation tasks.
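The thresholding strategy contrasted above with standard "mask-predict" can be sketched as follows: rather than re-masking a fixed fraction of positions per iteration, accept any prediction whose probability clears a threshold. This is a minimal illustrative sketch, not the paper's implementation; `predict` is a hypothetical stand-in for a trained CMLM forward pass.

```python
# Hedged sketch of thresholded CMLM inference: keep predictions whose
# probability exceeds a threshold tau, instead of unmasking on a fixed
# per-iteration schedule as in standard mask-predict.

MASK = "<mask>"

def predict(tokens):
    # Hypothetical CMLM scores: position -> (token, probability). In reality
    # this is a forward pass over the partially masked target sequence.
    scores = {0: ("the", 0.95), 1: ("cat", 0.40), 2: ("sat", 0.90)}
    return {i: scores[i] for i, t in enumerate(tokens) if t == MASK}

def threshold_decode(length, tau=0.5, max_iters=10):
    tokens = [MASK] * length
    for _ in range(max_iters):
        masked = predict(tokens)
        if not masked:
            break
        # Always accept at least the single most confident prediction,
        # so each iteration is guaranteed to make progress.
        best = max(masked, key=lambda i: masked[i][1])
        for i, (tok, p) in masked.items():
            if p >= tau or i == best:
                tokens[i] = tok
    return tokens
```

Accepting the best prediction unconditionally guarantees termination even when no position clears the threshold.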

2019

Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Colin Cherry | Greg Durrett | George Foster | Reza Haffari | Shahram Khadivi | Nanyun Peng | Xiang Ren | Swabha Swayamdipta
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)

Reinforcement Learning based Curriculum Optimization for Neural Machine Translation
Gaurav Kumar | George Foster | Colin Cherry | Maxim Krikun
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT). Specifically, given a training dataset with a sentence-level feature such as noise, we seek an optimal curriculum, or order for presenting examples to the system during training. Our curriculum framework allows examples to appear an arbitrary number of times, and thus generalizes data weighting, filtering, and fine-tuning schemes. Rather than relying on prior knowledge to design a curriculum, we use reinforcement learning to learn one automatically, jointly with the NMT system, in the course of a single training run. We show that this approach can beat uniform baselines on Paracrawl and WMT English-to-French datasets by +3.4 and +1.3 BLEU respectively. Additionally, we match the performance of strong filtering baselines and hand-designed, state-of-the-art curricula.
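The curriculum-learning setup above can be viewed as a bandit problem: an agent picks which data bin (e.g. a noise-level bucket) to draw the next training batch from, and is rewarded by the change in a dev-set metric. The EXP3-style update and the toy reward below are illustrative assumptions for that framing, not the paper's exact reinforcement-learning setup.

```python
import math
import random

# Illustrative bandit over data bins: the "arm" is which bin to sample the
# next batch from; the reward would be the resulting change in dev BLEU.
def exp3_curriculum(num_bins, reward_fn, steps, gamma=0.1):
    weights = [1.0] * num_bins
    counts = [0] * num_bins
    for _ in range(steps):
        total = sum(weights)
        # Mix the weight-proportional policy with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_bins for w in weights]
        arm = random.choices(range(num_bins), weights=probs)[0]
        counts[arm] += 1
        r = reward_fn(arm)  # e.g. delta dev-set BLEU after training on this bin
        weights[arm] *= math.exp(gamma * r / (probs[arm] * num_bins))
    return counts

random.seed(0)
# Toy reward: bin 2 (say, the cleanest data) always helps; the learned
# curriculum concentrates its samples there.
counts = exp3_curriculum(3, lambda arm: 1.0 if arm == 2 else 0.0, steps=200)
```

Because the bandit learns online, the curriculum is discovered jointly with training in a single run, as the abstract describes.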

2018

Revisiting Character-Based Neural Machine Translation with Capacity and Compression
Colin Cherry | George Foster | Ankur Bapna | Orhan Firat | Wolfgang Macherey
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering. However, it results in longer sequences in which each symbol contains less information, creating both modeling and computational challenges. In this paper, we show that the modeling problem can be solved by standard sequence-to-sequence architectures of sufficient depth, and that deep models operating at the character level outperform identical models operating over word fragments. This result implies that alternative architectures for handling character input are better viewed as methods for reducing computation time than as improved ways of modeling longer sequences. From this perspective, we evaluate several techniques for character-level NMT, verify that they do not match the performance of our deep character baseline model, and evaluate the performance versus computation time tradeoffs they offer. Within this framework, we also perform the first evaluation for NMT of conditional computation over time, in which the model learns which timesteps can be skipped, rather than having them be dictated by a fixed schedule specified before training begins.

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mia Xu Chen | Orhan Firat | Ankur Bapna | Melvin Johnson | Wolfgang Macherey | George Foster | Llion Jones | Mike Schuster | Noam Shazeer | Niki Parmar | Ashish Vaswani | Jakob Uszkoreit | Lukasz Kaiser | Zhifeng Chen | Yonghui Wu | Macduff Hughes
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then out-performed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all of the three fundamental architectures on the benchmark WMT’14 English to French and English to German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.

Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Reza Haffari | Colin Cherry | George Foster | Shahram Khadivi | Bahar Salehi
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

2017

Cost Weighting for Neural Machine Translation Domain Adaptation
Boxing Chen | Colin Cherry | George Foster | Samuel Larkin
Proceedings of the First Workshop on Neural Machine Translation

In this paper, we propose a new domain adaptation technique for neural machine translation called cost weighting, which is appropriate for adaptation scenarios in which a small in-domain data set and a large general-domain data set are available. Cost weighting incorporates a domain classifier into the neural machine translation training algorithm, using features derived from the encoder representation in order to distinguish in-domain from out-of-domain data. Classifier probabilities are used to weight sentences according to their domain similarity when updating the parameters of the neural translation model. We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting. Experiments on two large-data tasks show that both the traditional techniques and our novel proposal lead to significant gains, with cost weighting outperforming the traditional methods.
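The core of the cost-weighting idea above is simple to express: a domain classifier's in-domain probability scales each sentence's contribution to the training loss. The sketch below is a toy illustration under stated assumptions; the classifier and per-sentence losses are stand-ins for the real encoder-feature classifier and NMT cross-entropy.

```python
# Hedged sketch of cost weighting for domain adaptation: each sentence's
# loss is weighted by P(in-domain | sentence) from a domain classifier.

def weighted_loss(batch, domain_prob, nmt_loss):
    """Average per-sentence loss, each scaled by its in-domain probability."""
    total = 0.0
    for sent in batch:
        total += domain_prob(sent) * nmt_loss(sent)
    return total / len(batch)

# Toy usage: sentences carry a fake in-domain score and a fake loss value.
batch = [("news sentence", 0.9, 2.0), ("web noise", 0.2, 5.0)]
loss = weighted_loss(
    batch,
    domain_prob=lambda s: s[1],  # stand-in for the domain classifier
    nmt_loss=lambda s: s[2],     # stand-in for NMT cross-entropy
)
```

Out-of-domain sentences are thus down-weighted rather than discarded, which is what distinguishes cost weighting from hard data selection.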

NRC Machine Translation System for WMT 2017
Chi-kiu Lo | Boxing Chen | Colin Cherry | George Foster | Samuel Larkin | Darlene Stewart | Roland Kuhn
Proceedings of the Second Conference on Machine Translation

A Challenge Set Approach to Evaluating Machine Translation
Pierre Isabelle | Colin Cherry | George Foster
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Neural machine translation represents an exciting leap forward in translation quality. But what longstanding weaknesses does it resolve, and which remain? We address these questions with a challenge set approach to translation evaluation and error analysis. A challenge set consists of a small set of sentences, each hand-designed to probe a system’s capacity to bridge a particular structural divergence between languages. To exemplify this approach, we present an English-French challenge set, and use it to analyze phrase-based and neural systems. The resulting analysis provides not only a more fine-grained picture of the strengths of neural systems, but also insight into which linguistic phenomena remain out of reach.

2016

NRC Russian-English Machine Translation System for WMT 2016
Chi-kiu Lo | Colin Cherry | George Foster | Darlene Stewart | Rabib Islam | Anna Kazantseva | Roland Kuhn
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2014

Book Reviews: Semi-Supervised Learning and Domain Adaptation in Natural Language Processing by Anders Søgaard
George Foster
Computational Linguistics, Volume 40, Issue 2 - June 2014

Linear Mixture Models for Robust Machine Translation
Marine Carpuat | Cyril Goutte | George Foster
Proceedings of the Ninth Workshop on Statistical Machine Translation

2013

Vector Space Model for Adaptation in Statistical Machine Translation
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Adaptation of Reordering Models for Statistical Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Batch Tuning Strategies for Statistical Machine Translation
Colin Cherry | George Foster
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Mixing Multiple Translation Models in Statistical Machine Translation
Majid Razmara | George Foster | Baskaran Sankaran | Anoop Sarkar
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Improving AMBER, an MT Evaluation Metric
Boxing Chen | Roland Kuhn | George Foster
Proceedings of the Seventh Workshop on Statistical Machine Translation

2010

Fast Consensus Hypothesis Regeneration for Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Lessons from NRC’s Portage System at WMT 2010
Samuel Larkin | Boxing Chen | George Foster | Ulrich Germann | Eric Joanis | Howard Johnson | Roland Kuhn
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Phrase Clustering for Smoothing TM Probabilities - or, How to Extract Paraphrases from Phrase Tables
Roland Kuhn | Boxing Chen | George Foster | Evan Stratford
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Bilingual Sense Similarity for Statistical Machine Translation
Boxing Chen | George Foster | Roland Kuhn
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation
George Foster | Cyril Goutte | Roland Kuhn
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Stabilizing Minimum Error Rate Training
George Foster | Roland Kuhn
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

Tighter Integration of Rule-Based and Statistical MT in Serial System Combination
Nicola Ueffing | Jens Stephan | Evgeny Matusov | Loïc Dugast | George Foster | Roland Kuhn | Jean Senellart | Jin Yang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

Integration of an Arabic Transliteration Module into a Statistical Machine Translation System
Mehdi M. Kashani | Eric Joanis | Roland Kuhn | George Foster | Fred Popowich
Proceedings of the Second Workshop on Statistical Machine Translation

Mixture-Model Adaptation for SMT
George Foster | Roland Kuhn
Proceedings of the Second Workshop on Statistical Machine Translation

Improving Translation Quality by Discarding Most of the Phrasetable
Howard Johnson | Joel Martin | George Foster | Roland Kuhn
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Phrasetable Smoothing for Statistical Machine Translation
George Foster | Roland Kuhn | Howard Johnson
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

PORTAGE: with Smoothed Phrase Tables and Segment Choice Models
Howard Johnson | Fatiha Sadat | George Foster | Roland Kuhn | Michel Simard | Eric Joanis | Samuel Larkin
Proceedings on the Workshop on Statistical Machine Translation

Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation
Roland Kuhn | Denis Yuen | Michel Simard | Patrick Paul | George Foster | Eric Joanis | Howard Johnson
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference

2005

Automatic Detection of Translation Errors: The State of the Art
Graham Russell | George Foster | Ngoc Tran Nguyen
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

PORTAGE: A Phrase-Based Machine Translation System
Fatiha Sadat | Howard Johnson | Akakpo Agbago | George Foster | Roland Kuhn | Joel Martin | Aaron Tikuisis
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

Confidence Estimation for Machine Translation
John Blatz | Erin Fitzgerald | George Foster | Simona Gandrabur | Cyril Goutte | Alex Kulesza | Alberto Sanchis | Nicola Ueffing
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Adaptive Language and Translation Models for Interactive Machine Translation
Laurent Nepveu | Guy Lapalme | Philippe Langlais | George Foster
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Confidence estimation for translation prediction
Simona Gandrabur | George Foster
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

User-Friendly Text Prediction For Translators
George Foster | Philippe Langlais | Guy Lapalme
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2000

Evaluation of TRANSTYPE, a Computer-aided Translation Typing System: A Comparison of a Theoretical- and a User-oriented Evaluation Procedures
Philippe Langlais | Sébastien Sauvé | George Foster | Elliott Macklovitch | Guy Lapalme
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

A Maximum Entropy/Minimum Divergence Translation Model
George Foster
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

TransType: a Computer-Aided Translation Typing System
Philippe Langlais | George Foster | Guy Lapalme
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems

Incorporating Position Information into a Maximum Entropy/Minimum Divergence Translation Model
George Foster
Fourth Conference on Computational Natural Language Learning and the Second Learning Language in Logic Workshop

Unit Completion for a Computer-aided Translation Typing System
Philippe Langlais | George Foster | Guy Lapalme
Sixth Applied Natural Language Processing Conference

1998

Using a Probabilistic Translation Model for Cross-Language Information Retrieval
Jian-Yun Nie | Pierre Isabelle | George Foster
Sixth Workshop on Very Large Corpora

1996

Word Completion: A First Step Toward Target-Text Mediated IMT
George Foster | Pierre Isabelle | Pierre Plamondon
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics