Jakob Uszkoreit


An Empirical Study of Generation Order for Machine Translation
William Chan | Mitchell Stern | Jamie Kiros | Jakob Uszkoreit
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this work, we present an empirical study of generation order for machine translation. Building on recent advances in insertion-based modeling, we first introduce a soft order-reward framework that enables us to train models to follow arbitrary oracle generation policies. We then make use of this framework to explore a large variety of generation orders, including uninformed orders, location-based orders, frequency-based orders, content-based orders, and model-based orders. Curiously, we find that for the WMT’14 English → German and WMT’18 English → Chinese translation tasks, order does not have a substantial impact on output quality. Moreover, for English → German, we even discover that unintuitive orderings such as alphabetical and shortest-first can match the performance of a standard Transformer, suggesting that traditional left-to-right generation may not be necessary to achieve high performance.

Towards End-to-End In-Image Neural Machine Translation
Elman Mansimov | Mitchell Stern | Mia Chen | Orhan Firat | Jakob Uszkoreit | Puneet Jain
Proceedings of the First International Workshop on Natural Language Processing Beyond Text

In this paper, we offer a preliminary investigation into the task of in-image machine translation: transforming an image containing text in one language into an image containing the same text in another language. We propose an end-to-end neural model for this task inspired by recent approaches to neural machine translation, and demonstrate promising initial results based purely on pixel-level supervision. We then offer a quantitative and qualitative evaluation of our system outputs and discuss some common failure modes. Finally, we conclude with directions for future work.


Natural Questions: A Benchmark for Question Answering Research
Tom Kwiatkowski | Jennimaria Palomaki | Olivia Redfield | Michael Collins | Ankur Parikh | Chris Alberti | Danielle Epstein | Illia Polosukhin | Jacob Devlin | Kenton Lee | Kristina Toutanova | Llion Jones | Matthew Kelcey | Ming-Wei Chang | Andrew M. Dai | Jakob Uszkoreit | Quoc Le | Slav Petrov
Transactions of the Association for Computational Linguistics, Volume 7

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotations sequestered as test data. We present experiments validating the quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.


Self-Attention with Relative Position Representations
Peter Shaw | Jakob Uszkoreit | Ashish Vaswani
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mia Xu Chen | Orhan Firat | Ankur Bapna | Melvin Johnson | Wolfgang Macherey | George Foster | Llion Jones | Mike Schuster | Noam Shazeer | Niki Parmar | Ashish Vaswani | Jakob Uszkoreit | Lukasz Kaiser | Zhifeng Chen | Yonghui Wu | Macduff Hughes
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first out-performed by the convolutional seq2seq model, which was then out-performed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques, and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all of the three fundamental architectures on the benchmark WMT’14 English to French and English to German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.

Tensor2Tensor for Neural Machine Translation
Ashish Vaswani | Samy Bengio | Eugene Brevdo | Francois Chollet | Aidan Gomez | Stephan Gouws | Llion Jones | Łukasz Kaiser | Nal Kalchbrenner | Niki Parmar | Ryan Sepassi | Noam Shazeer | Jakob Uszkoreit
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)


Coarse-to-Fine Question Answering for Long Documents
Eunsol Choi | Daniel Hewlett | Jakob Uszkoreit | Illia Polosukhin | Alexandre Lacoste | Jonathan Berant
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a framework for question answering that can efficiently scale to longer documents while maintaining or even improving performance of state-of-the-art models. While most successful approaches for reading comprehension rely on recurrent neural networks (RNNs), running them over long documents is prohibitively slow because it is difficult to parallelize over sequences. Inspired by how people first skim the document, identify relevant parts, and carefully read these parts to produce an answer, we combine a coarse, fast model for selecting relevant sentences and a more expensive RNN for producing the answer from those sentences. We treat sentence selection as a latent variable trained jointly from the answer only using reinforcement learning. Experiments demonstrate state-of-the-art performance on a challenging subset of the WikiReading dataset and on a new dataset, while speeding up the model by 3.5x-6.7x.

Neural Paraphrase Identification of Questions with Noisy Pretraining
Gaurav Singh Tomar | Thyago Duque | Oscar Täckström | Jakob Uszkoreit | Dipanjan Das
Proceedings of the First Workshop on Subword and Character Level Models in NLP

We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (replacing the word embeddings of the decomposable attention model of Parikh et al. 2016 with character n-gram representations) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.


A Decomposable Attention Model for Natural Language Inference
Ankur Parikh | Oscar Täckström | Dipanjan Das | Jakob Uszkoreit
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


Language-Independent Discriminative Parsing of Temporal Expressions
Gabor Angeli | Jakob Uszkoreit
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)


Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
Oscar Täckström | Ryan McDonald | Jakob Uszkoreit
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

A Feature-Rich Constituent Context Model for Grammar Induction
Dave Golland | John DeNero | Jakob Uszkoreit
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)


Inducing Sentence Structure from Parallel Corpora for Reordering
John DeNero | Jakob Uszkoreit
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.
Ashish Venugopal | Jakob Uszkoreit | David Talbot | Franz Och | Juri Ganitkevitch
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing


Large Scale Parallel Document Mining for Machine Translation
Jakob Uszkoreit | Jay Ponte | Ashok Popat | Moshe Dubiner
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

“Poetic” Statistical Machine Translation: Rhyme and Meter
Dmitriy Genzel | Jakob Uszkoreit | Franz Och
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing


Distributed Word Clustering for Large Scale Class-Based Language Modeling in Machine Translation
Jakob Uszkoreit | Thorsten Brants
Proceedings of ACL-08: HLT

Lattice-based Minimum Error Rate Training for Statistical Machine Translation
Wolfgang Macherey | Franz Och | Ignacio Thayer | Jakob Uszkoreit
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing