André F. T. Martins

Also published as: Andre Martins, André Martins


2020

Revisiting Higher-Order Dependency Parsers
Erick Fonseca | André F. T. Martins
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural encoders have allowed dependency parsers to shift from higher-order structured models to simpler first-order ones, making decoding faster and still achieving better accuracy than non-neural parsers. This has led to a belief that neural encoders can implicitly encode structural constraints, such as siblings and grandparents in a tree. We tested this hypothesis and found that neural parsers may benefit from higher-order features, even when employing a powerful pre-trained encoder, such as BERT. While the gains of higher-order features are small in the presence of a powerful encoder, they are consistent for long-range dependencies and long sentences. In particular, higher-order models are more accurate on full sentence parses and on the exact match of modifier lists, indicating that they deal better with larger, more complex structures.

Proceedings of the Fourth Workshop on Structured Prediction for NLP
Priyanka Agrawal | Zornitsa Kozareva | Julia Kreutzer | Gerasimos Lampouras | André Martins | Sujith Ravi | Andreas Vlachos
Proceedings of the Fourth Workshop on Structured Prediction for NLP

Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
Tsvetomila Mihaylova | Vlad Niculae | André F. T. Martins
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT – a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases.

Sparse Text Generation
Pedro Henrique Martins | Zita Marinho | André F. T. Martins
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Current state-of-the-art text generators build on powerful language models such as GPT-2, achieving impressive performance. However, to avoid degenerate text, they require sampling from a modified softmax, via temperature parameters or ad-hoc truncation techniques, as in top-k or nucleus sampling. This creates a mismatch between training and testing conditions. In this paper, we use the recently introduced entmax transformation to train and sample from a natively sparse language model, avoiding this mismatch. The result is a text generator with favorable performance in terms of fluency and consistency, fewer repetitions, and n-gram diversity closer to human text. In order to evaluate our model, we propose three new metrics for comparing sparse or truncated distributions: 𝜖-perplexity, sparsemax score, and Jensen-Shannon divergence. Human-evaluated experiments in story completion and dialogue generation show that entmax sampling leads to more engaging and coherent stories and conversations.
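
The truncation heuristics this abstract contrasts with entmax sampling can be illustrated with a minimal sketch. It is not the authors' code: the function names (`renormalize`, `top_k_filter`, `nucleus_filter`) and the toy distribution are assumptions made for illustration, and the sketch operates on an already-computed probability vector rather than on model logits.

```python
def renormalize(probs, keep):
    """Zero out every index not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Top-k truncation: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def nucleus_filter(probs, p):
    """Nucleus (top-p) truncation: keep the smallest set of most probable
    tokens whose cumulative mass reaches at least p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return renormalize(probs, keep)


# Toy next-token distribution (hypothetical values).
probs = [0.5, 0.3, 0.15, 0.05]
top2 = top_k_filter(probs, 2)
nucleus = nucleus_filter(probs, 0.9)
```

Both filters are applied only at sampling time, so the model is trained on a dense distribution but sampled from a truncated one, which is the train/test mismatch the abstract refers to.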

The Explanation Game: Towards Prediction Explainability through Sparse Communication
Marcos Treviso | André F. T. Martins
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier’s decision. We use this framework to compare several explainers, including gradient methods, erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods in the light of classical feature selection, and use this as inspiration for new embedded explainers, through the use of selective, sparse attention. Experiments in text classification and natural language inference, using different configurations of explainers and laypeople (including both machines and humans), reveal an advantage of attention-based explainers over gradient and erasure methods, and show that selective attention is a simpler alternative to stochastic rationalizers. Human experiments show strong results on text classification with post-hoc explainers trained to optimize communication success.

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
André Martins | Helena Moniz | Sara Fumega | Bruno Martins | Fernando Batista | Luisa Coheur | Carla Parra | Isabel Trancoso | Marco Turchi | Arianna Bisazza | Joss Moorkens | Ana Guerberof | Mary Nurminen | Lena Marg | Mikel L. Forcada
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Learning Non-Monotonic Automatic Post-Editing of Translations from Human Orderings
António Góis | Kyunghyun Cho | André Martins
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Recent research in neural machine translation has explored flexible generation orders, as an alternative to left-to-right generation. However, training non-monotonic models brings a new complication: how to search for a good ordering when there is a combinatorial explosion of orderings arriving at the same final result? Also, how do these automatic orderings compare with the actual behaviour of human translators? Current models rely on manually built biases or are left to explore all possibilities on their own. In this paper, we analyze the orderings produced by human post-editors and use them to train an automatic post-editing system. We compare the resulting system with those trained with left-to-right and random post-editing orderings. We observe that humans tend to follow a nearly left-to-right order, but with interesting deviations, such as preferring to start by correcting punctuation or verbs.

Document-level Neural MT: A Systematic Comparison
António Lopes | M. Amin Farajian | Rachel Bawden | Michael Zhang | André F. T. Martins
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

In this paper we provide a systematic comparison of existing and new document-level neural machine translation solutions. As part of this comparison, we introduce and evaluate a document-level variant of the recently proposed Star Transformer architecture. In addition to using the traditional metric BLEU, we report the accuracy of the models in handling anaphoric pronoun translation as well as coherence and cohesion using contrastive test sets. Finally, we report the results of human evaluation in terms of Multidimensional Quality Metrics (MQM) and analyse the correlation of the results obtained by the automatic metrics with human judgments.

DeepSPIN: Deep Structured Prediction for Natural Language Processing
André F. T. Martins
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

DeepSPIN is a research project funded by the European Research Council (ERC) whose goal is to develop new neural structured prediction methods, models, and algorithms for improving the quality, interpretability, and data-efficiency of natural language processing (NLP) systems, with special emphasis on machine translation and quality estimation applications.

Project MAIA: Multilingual AI Agent Assistant
André F. T. Martins | Joao Graca | Paulo Dimas | Helena Moniz | Graham Neubig
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper presents the Multilingual Artificial Intelligence Agent Assistant (MAIA), a project led by Unbabel with the collaboration of CMU, INESC-ID and IT Lisbon. MAIA will employ cutting-edge machine learning and natural language processing technologies to build multilingual AI agent assistants, eliminating language barriers. MAIA’s translation layer will empower human agents to provide customer support in real-time, in any language, with human quality.

One-Size-Fits-All Multilingual Models
Ben Peters | André F. T. Martins
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents DeepSPIN’s submissions to Tasks 0 and 1 of the SIGMORPHON 2020 Shared Task. For both tasks, we present multilingual models, training jointly on data in all languages. We perform no language-specific hyperparameter tuning – each of our submissions uses the same model for all languages. Our basic architecture is the sparse sequence-to-sequence model with entmax attention and loss, which allows our models to learn sparse, local alignments while still being trainable with gradient-based techniques. For Task 1, we achieve strong performance with both RNN- and transformer-based sparse models. For Task 0, we extend our RNN-based model to a multi-encoder set-up in which separate modules encode the lemma and inflection sequences. Despite our models’ lack of language-specific tuning, they tie for first in Task 0 and place third in Task 1.

2019

Adaptively Sparse Transformers
Gonçalo M. Correia | Vlad Niculae | André F. T. Martins
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with alpha-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the alpha parameter – which controls the shape and sparsity of alpha-entmax – allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations.

Sparse Sequence-to-Sequence Models
Ben Peters | Vlad Niculae | André F. T. Martins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of 𝛼-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any 𝛼 > 1. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models.
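
Sparsemax, one of the particular cases of 𝛼-entmax named in this abstract (it corresponds to 𝛼 = 2), can be sketched in a few lines. This is a minimal illustration using the standard sort-based simplex projection, not the fast algorithms the paper contributes; the function name `sparsemax` and the example scores are assumptions.

```python
def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex.

    Returns a probability vector in which low-scoring entries can be
    exactly zero, unlike softmax.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z, start=1):
        cumsum += z_k
        # The k-th largest score is in the support while this holds;
        # the threshold tau comes from the largest such k.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical attention or output scores.
peaked = sparsemax([2.0, 1.0, 0.1])
spread = sparsemax([1.0, 0.8, -1.0])
```

With a large gap between the top score and the rest, all probability mass lands on a single entry; with closer scores, the mass is shared among a few entries while the lowest-scoring one still receives exactly zero.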

A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning
Gonçalo M. Correia | André F. T. Martins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits. APE systems are usually trained by complementing human post-edited data with large, artificial data generated through back-translations, a time-consuming process often no easier than training an MT system from scratch. In this paper, we propose an alternative where we fine-tune pre-trained BERT models on both the encoder and decoder of an APE system, exploring several parameter sharing strategies. By only training on a dataset of 23K sentences for 3 hours on a single GPU, we obtain results that are competitive with systems that were trained on 5M artificial sentences. When we add this artificial data, our method obtains state-of-the-art results.

Joint Learning of Named Entity Recognition and Entity Linking
Pedro Henrique Martins | Zita Marinho | André F. T. Martins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Named entity recognition (NER) and entity linking (EL) are two fundamentally related tasks, since in order to perform EL, the mentions of entities first have to be detected. However, most entity linking approaches disregard the mention detection part, assuming that the correct mentions have been previously detected. In this paper, we perform joint learning of NER and EL to leverage their relatedness and obtain a more robust and generalisable system. For that, we introduce a model inspired by the Stack-LSTM approach. We observe that multi-task learning of NER and EL improves the performance in both tasks compared with models trained with individual objectives. Furthermore, we achieve results competitive with the state-of-the-art in both NER and EL.

Scheduled Sampling for Transformers
Tsvetomila Mihaylova | André F. T. Martins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Scheduled sampling is a technique for avoiding one of the known problems in sequence-to-sequence generation: exposure bias. It consists of feeding the model a mix of the teacher-forced embeddings and the model predictions from the previous step at training time. The technique has been used for improving model performance with recurrent neural networks (RNN). In the Transformer model, unlike the RNN, the generation of a new word attends to the full sentence generated so far, not only to the last word, so it is not straightforward to apply the scheduled sampling technique. We propose some structural changes to allow scheduled sampling to be applied to Transformer architectures, via a two-pass decoding strategy. Experiments on two language pairs achieve performance close to a teacher-forcing baseline and show that this technique is promising for further exploration.

OpenKiwi: An Open Source Framework for Quality Estimation
Fabio Kepler | Jonay Trénous | Marcos Treviso | Miguel Vera | André F. T. Martins
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We introduce OpenKiwi, a PyTorch-based open source framework for translation quality estimation. OpenKiwi supports training and testing of word-level and sentence-level quality estimation systems, implementing the winning systems of the WMT 2015–18 quality estimation campaigns. We benchmark OpenKiwi on two datasets from WMT 2018 (English-German SMT and NMT), yielding state-of-the-art performance on the word-level tasks and near state-of-the-art in the sentence-level tasks.

Latent Structure Models for Natural Language Processing
André F. T. Martins | Tsvetomila Mihaylova | Nikita Nangia | Vlad Niculae
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Latent structure models are a powerful tool for modeling compositional data, discovering linguistic structure, and building NLP pipelines. They are appealing for two main reasons: they allow incorporating structural bias during training, leading to more accurate models; and they allow discovering hidden linguistic structure, which provides better interpretability. This tutorial will cover recent advances in discrete latent structure models. We discuss their motivation, potential, and limitations, then explore in detail three strategies for designing such models: gradient approximation, reinforcement learning, and end-to-end differentiable methods. We highlight connections among all these methods, enumerating their strengths and weaknesses. The models we present and analyze have been applied to a wide variety of NLP tasks, including sentiment analysis, natural language inference, language modeling, machine translation, and semantic parsing. Examples and evaluation will be covered throughout. After attending the tutorial, a practitioner will be better informed about which method is best suited for their problem.

Proceedings of the Third Workshop on Structured Prediction for NLP
Andre Martins | Andreas Vlachos | Zornitsa Kozareva | Sujith Ravi | Gerasimos Lampouras | Vlad Niculae | Julia Kreutzer
Proceedings of the Third Workshop on Structured Prediction for NLP

ITIST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection
Ben Peters | André F. T. Martins
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents the Instituto de Telecomunicações–Instituto Superior Técnico submission to Task 1 of the SIGMORPHON 2019 Shared Task. Our models combine sparse sequence-to-sequence models with a two-headed attention mechanism that learns separate attention distributions for the lemma and inflectional tags. Among submissions to Task 1, our models rank second and third. Despite the low data setting of the task (only 100 in-language training examples), they learn plausible inflection patterns and often concentrate all probability mass into a small set of hypotheses, making beam search exact.

Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

Findings of the WMT 2019 Shared Tasks on Quality Estimation
Erick Fonseca | Lisa Yankovskaya | André F. T. Martins | Mark Fishel | Christian Federmann
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

We report the results of the WMT19 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems given just the source text and the hypothesis translations. The task includes estimation at three granularity levels: word, sentence and document. A novel addition is evaluating sentence-level QE against human judgments: in other words, designing MT metrics that do not need a reference translation. This year we include three language pairs, produced solely by neural machine translation systems. Participating teams from eleven institutions submitted a variety of systems to different task variants and language pairs.

Unbabel’s Participation in the WMT19 Translation Quality Estimation Shared Task
Fabio Kepler | Jonay Trénous | Marcos Treviso | Miguel Vera | António Góis | M. Amin Farajian | António V. Lopes | André F. T. Martins
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

We present the contribution of the Unbabel team to the WMT 2019 Shared Task on Quality Estimation. We participated in the word, sentence, and document-level tracks, encompassing 3 language pairs: English-German, English-Russian, and English-French. Our submissions build upon the recent OpenKiwi framework: we combine linear, neural, and predictor-estimator systems with new transfer learning approaches using BERT and XLM pre-trained models. We compare systems individually and propose new ensemble techniques for word and sentence-level predictions. We also propose a simple technique for converting word labels into document-level predictions. Overall, our submitted systems achieve the best results on all tracks and language pairs by a considerable margin.

Unbabel’s Submission to the WMT2019 APE Shared Task: BERT-Based Encoder-Decoder for Automatic Post-Editing
António V. Lopes | M. Amin Farajian | Gonçalo M. Correia | Jonay Trénous | André F. T. Martins
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

This paper describes Unbabel’s submission to the WMT2019 APE Shared Task for the English-German language pair. Following the recent rise of large, powerful, pre-trained models, we adapt the BERT pretrained model to perform Automatic Post-Editing in an encoder-decoder framework. Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pretrained BERT encoder receives both the source (src) and machine translation (mt) strings. Furthermore, we explore a conservativeness factor to constrain the APE system to perform fewer edits. As the official results show, when trained on a weighted combination of in-domain and artificial training data, our BED system with the conservativeness penalty significantly improves the translations of a strong NMT system, by -0.78 and +1.23 in terms of TER and BLEU, respectively. Finally, our submission achieves a new state-of-the-art, ex-aequo, in English-German APE of NMT.

Translator2Vec: Understanding and Representing Human Post-Editors
António Góis | André F. T. Martins
Proceedings of Machine Translation Summit XVII Volume 1: Research Track

Pivot Machine Translation in INTERACT Project
Chao-Hong Liu | Andy Way | Catarina Silva | André Martins
Proceedings of Machine Translation Summit XVII Volume 2: Translator, Project and User Tracks

Selective Attention for Context-aware Neural Machine Translation
Sameen Maruf | André F. T. Martins | Gholamreza Haffari
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Despite the progress made in sentence-level NMT, current systems still fall short at achieving fluent, good quality translation for a full document. Recent works in context-aware NMT consider only a few previous sentences as context and may not scale to entire documents. To this end, we propose a novel and scalable top-down approach to hierarchical attention for context-aware NMT which uses sparse attention to selectively focus on relevant sentences in the document context and then attends to key words in those sentences. We also propose single-level attention approaches based on sentence or word-level information in the context. The document-level context representation, produced from these attention modules, is integrated into the encoder or decoder of the Transformer model depending on whether we use monolingual or bilingual context. Our experiments and evaluation on English-German datasets in different document MT settings show that our selective attention approach not only significantly outperforms context-agnostic baselines but also surpasses context-aware baselines in most cases.

Jointly Extracting and Compressing Documents with Summary State Representations
Afonso Mendes | Shashi Narayan | Sebastião Miranda | Zita Marinho | André F. T. Martins | Shay B. Cohen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We present a new neural model for text summarization that first extracts sentences from a document and then compresses them. The proposed model offers a balance that sidesteps the difficulties in abstractive methods while generating more concise summaries than extractive methods. In addition, our model dynamically determines the length of the output summary based on the gold summaries it observes during training and does not require length constraints typical to extractive summarization. The model achieves state-of-the-art results on the CNN/DailyMail and Newsroom datasets, improving over current extractive and abstractive methods. Human evaluations demonstrate that our model generates concise and informative summaries. We also make available a new dataset of oracle compressive summaries derived automatically from the CNN/DailyMail reference summaries.

2018

Towards Dynamic Computation Graphs via Sparse Latent Structure
Vlad Niculae | André F. T. Martins | Claire Cardie
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep NLP models benefit from underlying structures in the data—e.g., parse trees—typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribution over latent structures, we propose a novel approach for end-to-end learning of latent structure predictors jointly with a downstream predictor. To the best of our knowledge, our method is the first to enable unrestricted dynamic computation graph construction from the global latent structure, while maintaining differentiability.

Sparse and Constrained Attention for Neural Machine Translation
Chaitanya Malaviya | Pedro Ferreira | André F. T. Martins
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In neural machine translation, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address the coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. Empirical evaluation is provided in three language pairs.

Marian: Fast Neural Machine Translation in C++
Marcin Junczys-Dowmunt | Roman Grundkiewicz | Tomasz Dwojak | Hieu Hoang | Kenneth Heafield | Tom Neckermann | Frank Seide | Ulrich Germann | Alham Fikri Aji | Nikolay Bogoychev | André F. T. Martins | Alexandra Birch
Proceedings of ACL 2018, System Demonstrations

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing
Ramón Astudillo | João Graça | André Martins
Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing

Interpretable Structure Induction via Sparse Attention
Ben Peters | Vlad Niculae | André F. T. Martins
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural network methods are experiencing wide adoption in NLP, thanks to their empirical performance on many tasks. Modern neural architectures go way beyond simple feedforward and recurrent models: they are complex pipelines that perform soft, differentiable computation instead of discrete logic. The price of such soft computing is the introduction of dense dependencies, which make it hard to disentangle the patterns that trigger a prediction. Our recent work on sparse and structured latent computation presents a promising avenue for enhancing interpretability of such neural pipelines. Through this extended abstract, we aim to discuss and explore the potential and impact of our methods.

Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations
Sameen Maruf | André F. T. Martins | Gholamreza Haffari
Proceedings of the Third Conference on Machine Translation: Research Papers

Recent works in neural machine translation have begun to explore document translation. However, translating online multi-speaker conversations is still an open problem. In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task. To initiate an evaluation for this task, we introduce datasets extracted from Europarl v7 and OpenSubtitles2016. Our experiments on four language pairs confirm the significance of leveraging conversation history, both in terms of BLEU and manual evaluation.

Findings of the WMT 2018 Shared Task on Quality Estimation
Lucia Specia | Frédéric Blain | Varvara Logacheva | Ramón Astudillo | André F. T. Martins
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We report the results of the WMT18 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document. This year we include four language pairs, three text domains, and translations produced by both statistical and neural machine translation systems. Participating teams from ten institutions submitted a variety of systems to different task variants and language pairs.

2017

Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics
André Martins | Anselmo Peñas
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Pushing the Limits of Translation Quality Estimation
André F. T. Martins | Marcin Junczys-Dowmunt | Fabio N. Kepler | Ramón Astudillo | Chris Hokamp | Roman Grundkiewicz
Transactions of the Association for Computational Linguistics, Volume 5

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level F1-MULT score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).

pdf bib
Unbabel’s Participation in the WMT17 Translation Quality Estimation Shared Task
André F. T. Martins | Fabio Kepler | José Monteiro
Proceedings of the Second Conference on Machine Translation

pdf bib
Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers
André F. T. Martins | Julia Kreutzer
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce a novel neural easy-first decoder that learns to solve sequence tagging tasks in a flexible order. In contrast to previous easy-first decoders, our models are end-to-end differentiable. The decoder iteratively updates a “sketch” of the predictions over the sequence. At its core is an attention mechanism that controls which parts of the input are strategically the best to process next. We present a new constrained softmax transformation that ensures the same cumulative attention to every word, and show how to efficiently evaluate and backpropagate over it. Our models compare favourably to BiLSTM taggers on three sequence tagging tasks.

2016

pdf bib
Semi-Supervised Learning of Sequence Models with Method of Moments
Zita Marinho | André F. T. Martins | Shay B. Cohen | Noah A. Smith
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib
Jointly Learning to Embed and Predict with Multiple Languages
Daniel C. Ferreira | André F. T. Martins | Mariana S. C. Almeida
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Unbabel’s Participation in the WMT16 Word-Level Translation Quality Estimation Shared Task
André F. T. Martins | Ramón Astudillo | Chris Hokamp | Fabio Kepler
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

pdf bib
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies
Mariana S. C. Almeida | Cláudia Pinto | Helena Figueira | Pedro Mendes | André F. T. Martins
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Transferring Coreference Resolvers with Posterior Regularization
André F. T. Martins
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Parsing as Reduction
Daniel Fernández-González | André F. T. Martins
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Lisbon: Evaluating TurboSemanticParser on Multiple Languages and Out-of-Domain Data
Mariana S. C. Almeida | André F. T. Martins
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
Priberam: A Turbo Semantic Parser with Second Order Features
André F. T. Martins | Mariana S. C. Almeida
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Frame-Semantic Parsing
Dipanjan Das | Desai Chen | André F. T. Martins | Nathan Schneider | Noah A. Smith
Computational Linguistics, Volume 40, Issue 1 - March 2014

pdf bib
A Joint Model for Quotation Attribution and Coreference Resolution
Mariana S. C. Almeida | Miguel B. Almeida | André F. T. Martins
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

bib
Linear Programming Decoders in Natural Language Processing: From Integer Programming to Message Passing and Dual Decomposition
André F. T. Martins
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

This tutorial will cover the theory and practice of linear programming decoders. This class of decoders encompasses a variety of techniques that have enjoyed great success in devising structured models for natural language processing (NLP). Along the tutorial, we provide a unified view of different algorithms and modeling techniques, including belief propagation, dual decomposition, integer linear programming, Markov logic, and constrained conditional models. Various applications in NLP will serve as a motivation.
There is a long string of work using integer linear programming (ILP) formulations in NLP, for example in semantic role labeling, machine translation, summarization, dependency parsing, coreference resolution, and opinion mining, to name just a few. At the heart of these approaches is the ability to encode logic and budget constraints (common in NLP and information retrieval) as linear inequalities. Thanks to general purpose solvers (such as Gurobi, CPLEX, or GLPK), the practitioner can abstract away from the decoding algorithm and focus on developing a powerful model. A disadvantage, however, is that general solvers do not scale well to large problem instances, since they fail to exploit the structure of the problem.
This is where graphical models come into play. In this tutorial, we show that most logic and budget constraints that arise in NLP can be cast in this framework. This opens the door for the use of message-passing algorithms, such as belief propagation and variants thereof. An alternative are algorithms based on dual decomposition, such as the subgradient method or AD3. These algorithms have achieved great success in a variety of applications, such as parsing, corpus-wide tagging, machine translation, summarization, joint coreference resolution and quotation attribution, and semantic role labeling. Interestingly, most decoders used in these works can be regarded as structure-aware solvers for addressing relaxations of integer linear programs.
All these algorithms have a similar consensus-based architecture: they repeatedly perform certain "local" operations in the graph, until some form of local agreement is achieved. The local operations are performed at each factor, and they range between computing marginals, max-marginals, an optimal configuration, or a small quadratic problem, all of which are commonly tractable and efficient in a wide range of problems.
As a companion of this tutorial, we provide an open-source implementation of some of the algorithms described above, available at http://www.ark.cs.cmu.edu/AD3.

pdf bib
Priberam Compressive Summarization Corpus: A New Multi-Document Summarization Corpus for European Portuguese
Miguel B. Almeida | Mariana S. C. Almeida | André F. T. Martins | Helena Figueira | Pedro Mendes | Cláudia Pinto
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we introduce the Priberam Compressive Summarization Corpus, a new multi-document summarization corpus for European Portuguese. The corpus follows the format of the summarization corpora for English in recent DUC and TAC conferences. It contains 80 manually chosen topics referring to events occurred between 2010 and 2013. Each topic contains 10 news stories from major Portuguese newspapers, radio and TV stations, along with two human generated summaries up to 100 words. Apart from the language, one important difference from the DUC/TAC setup is that the human summaries in our corpus are compressive: the annotators performed only sentence and word deletion operations, as opposed to generating summaries from scratch. We use this corpus to train and evaluate learning-based extractive and compressive summarization systems, providing an empirical comparison between these two approaches. The corpus is made freely available in order to facilitate research on automatic summarization.

2013

pdf bib
Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Miguel Almeida | André Martins
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
André Martins | Miguel Almeida | Noah A. Smith
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Structured Sparsity in Natural Language Processing: Models, Algorithms and Applications
André F. T. Martins | Mário A. T. Figueiredo | Noah A. Smith
Tutorial Abstracts at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints
Dipanjan Das | André F. T. Martins | Noah A. Smith
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

pdf bib
Dual Decomposition with Many Overlapping Components
André Martins | Noah Smith | Mário Figueiredo | Pedro Aguiar
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Structured Sparsity in Structured Prediction
André Martins | Noah Smith | Mário Figueiredo | Pedro Aguiar
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf bib
Turbo Parsers: Dependency Parsing by Approximate Variational Inference
André Martins | Noah Smith | Eric Xing | Pedro Aguiar | Mário Figueiredo
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

pdf bib
Summarization with a Joint Model for Sentence Extraction and Compression
André Martins | Noah A. Smith
Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing

pdf bib
Concise Integer Linear Programming Formulations for Dependency Parsing
André Martins | Noah Smith | Eric Xing
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2004

pdf bib
Design and Implementation of a Semantic Search Engine for Portuguese
Carlos Amaral | Dominique Laurent | André Martins | Afonso Mendes | Cláudia Pinto
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
