Iacer Calixto


Can Wikipedia Categories Improve Masked Language Model Pretraining?
Diksha Meghwal | Katharina Kann | Iacer Calixto | Stanislaw Jastrzebski
Proceedings of the Fourth Widening Natural Language Processing Workshop

Pretrained language models have obtained impressive results on a large set of natural language understanding tasks. However, training these models is computationally expensive and requires huge amounts of data. It would therefore be desirable to automatically detect groups of more or less important training examples. Here, we investigate whether we can leverage a commonly overlooked source of information, Wikipedia categories as listed in DBpedia, to identify useful or harmful data points during pretraining. We define an experimental setup in which we analyze correlations between language model perplexity on specific clusters and downstream NLP task performance during pretraining. Our experiments show that Wikipedia categories are not a good indicator of the importance of specific sentences for pretraining.


Latent Variable Model for Multi-modal Translation
Iacer Calixto | Miguel Rios | Wilker Aziz
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formulation utilises visual and textual inputs during training but does not require that images be available at test time. We show that our latent variable MMT formulation improves considerably over strong baselines, including a multi-task learning approach (Elliott and Kadar, 2017) and a conditional variational auto-encoder approach (Toyama et al., 2016). Finally, we show improvements due to (i) predicting image features in addition to only conditioning on them, (ii) imposing a constraint on the KL term to promote models with non-negligible mutual information between inputs and latent variable, and (iii) training on additional target-language image descriptions (i.e. synthetic data).


Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
Iacer Calixto | Qun Liu | Nick Campbell
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural networks, bridging the gap between image description and translation. Our decoder learns to attend to source-language words and parts of an image independently by means of two separate attention mechanisms as it generates words in the target language. We find that our model can efficiently exploit not just back-translated in-domain multi-modal data but also large general-domain text-only MT corpora. We also report state-of-the-art results on the Multi30k data set.
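The idea of two separate attention mechanisms can be illustrated with a minimal sketch of a single decoder step. It assumes parameter-free dot-product scoring purely for brevity; a trained model would use learned, separately parameterised scoring functions for the text and image attentions, and the function names here are hypothetical rather than taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys):
    """Dot-product attention: returns weights over keys and their weighted sum."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context

def doubly_attentive_step(decoder_state, source_annotations, image_regions):
    """One decoder step with two independent attentions: one over source-word
    annotations and one over spatial image features. In a full model both
    context vectors would feed the prediction of the next target word."""
    text_weights, text_context = attend(decoder_state, source_annotations)
    img_weights, img_context = attend(decoder_state, image_regions)
    return text_weights, text_context, img_weights, img_context

# Toy example: 2-dimensional decoder state, two source words, three image regions.
state = [1.0, 0.0]
source = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
tw, tc, iw, ic = doubly_attentive_step(state, source, regions)
```

Because the two attentions are computed independently, the decoder can focus on one source word while attending to an unrelated image region at the same step.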

Using Images to Improve Machine-Translating E-Commerce Product Listings.
Iacer Calixto | Daniel Stein | Evgeny Matusov | Pintu Lohar | Sheila Castilho | Andy Way
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper we study the impact of using images to machine-translate user-generated e-commerce product listings. We study how a multi-modal Neural Machine Translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attentional NMT and a Statistical Machine Translation (SMT) model. User-generated product listings often do not constitute grammatical or well-formed sentences. More often than not, they consist of the juxtaposition of short phrases or keywords. We train our models end-to-end and also use text-only and multi-modal NMT models to re-rank n-best lists generated by an SMT model. We qualitatively evaluate our user-generated training data and also analyse how adding synthetic data impacts the results. We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a general positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.

Sentence-Level Multilingual Multi-modal Embedding for Natural Language Processing
Iacer Calixto | Qun Liu
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We propose a novel discriminative ranking model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality. To that end, we introduce an objective function that uses pairwise ranking adapted to the case of three or more input sources. We compare our model against different baselines, and evaluate the robustness of our embeddings on image–sentence ranking (ISR), semantic textual similarity (STS), and neural machine translation (NMT). We find that the additional multilingual signals lead to improvements on all three tasks, and we highlight that our model can be used to consistently improve the adequacy of translations generated with NMT models when re-ranking n-best lists.
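One way to picture a pairwise ranking objective "adapted to the case of three or more input sources" is to sum a bidirectional hinge ranking loss over every pair of aligned sources (for example images, English descriptions and German descriptions). This is a minimal sketch under stated assumptions: cosine similarity, a margin of 0.2, and using every other item in the batch as a contrastive example are illustrative choices, and the function names are hypothetical; the paper's exact objective may weight or sample pairs differently.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pairwise_hinge(xs, ys, margin):
    """Bidirectional hinge ranking loss between two aligned lists of embeddings.
    xs[i] and ys[i] match; every mismatched item in the batch is a contrastive example."""
    loss = 0.0
    n = len(xs)
    for i in range(n):
        pos = cosine(xs[i], ys[i])
        for j in range(n):
            if j == i:
                continue
            loss += max(0.0, margin - pos + cosine(xs[i], ys[j]))  # rank ys for xs[i]
            loss += max(0.0, margin - pos + cosine(xs[j], ys[i]))  # rank xs for ys[i]
    return loss

def multi_source_ranking_loss(sources, margin=0.2):
    """Sum the pairwise hinge loss over every unordered pair of input sources."""
    return sum(pairwise_hinge(a, b, margin) for a, b in combinations(sources, 2))

# Toy batch of two items embedded in 2-D for three aligned sources.
images = [[1.0, 0.0], [0.0, 1.0]]
english = [[1.0, 0.0], [0.0, 1.0]]
german = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.0, 1.0], [1.0, 0.0]]  # deliberately misaligned descriptions
```

With well-aligned sources the loss is zero, while the misaligned `swapped` source incurs a positive loss against both of the other sources.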

Human Evaluation of Multi-modal Neural Machine Translation: A Case-Study on E-Commerce Listing Titles
Iacer Calixto | Daniel Stein | Evgeny Matusov | Sheila Castilho | Andy Way
Proceedings of the Sixth Workshop on Vision and Language

In this paper, we study how humans perceive the use of images as an additional knowledge source to machine-translate user-generated product listings in an e-commerce company. We conduct a human evaluation where we assess how a multi-modal neural machine translation (NMT) model compares to two text-only approaches: a conventional state-of-the-art attention-based NMT and a phrase-based statistical machine translation (PBSMT) model. We evaluate translations obtained with different systems and also discuss the data set of user-generated product listings, which in our case comprises both product listings and associated images. We found that humans preferred translations obtained with a PBSMT system to both text-only and multi-modal NMT over 56% of the time. Nonetheless, human evaluators ranked translations from a multi-modal NMT model as better than those of a text-only NMT over 88% of the time, which suggests that images do help NMT in this use-case.

Linguistic realisation as machine translation: Comparing different MT models for AMR-to-text generation
Thiago Castro Ferreira | Iacer Calixto | Sander Wubben | Emiel Krahmer
Proceedings of the 10th International Conference on Natural Language Generation

In this paper, we study AMR-to-text generation, framing it as a translation task and comparing two different MT approaches (Phrase-based and Neural MT). We systematically study the effects of 3 AMR preprocessing steps (Delexicalisation, Compression, and Linearisation) applied before the MT phase. Our results show that preprocessing indeed helps, although the benefits differ for the two MT models.

DCU System Report on the WMT 2017 Multi-modal Machine Translation Task
Iacer Calixto | Koel Dutta Chowdhury | Qun Liu
Proceedings of the Second Conference on Machine Translation

Incorporating Global Visual Features into Attention-based Neural Machine Translation.
Iacer Calixto | Qun Liu
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder. Global image features are extracted using a pre-trained convolutional neural network and are incorporated (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. In our experiments, we evaluate translations into English and German and compare the different strategies for incorporating global image features to determine which perform best. We also study the impact of adding synthetic multi-modal, multilingual data and find that the additional data have a positive impact on multi-modal NMT models. We report new state-of-the-art results, and our best models also significantly improve on a comparable phrase-based Statistical MT (PBSMT) model trained on the Multi30k data set according to all metrics evaluated. To the best of our knowledge, it is the first time a purely neural model significantly improves over a PBSMT model on all metrics evaluated on this data set.


Developing a Dataset for Evaluating Approaches for Document Expansion with Images
Debasis Ganguly | Iacer Calixto | Gareth Jones
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Motivated by the adage that “a picture is worth a thousand words”, it can be reasoned that automatically enriching the textual content of a document with relevant images can increase its readability. Moreover, features extracted from the image data inserted into the textual content of a document may, in principle, also be used by a retrieval engine to better match the topic of a document with that of a given query. In this paper, we describe our approach to building a ground-truth dataset to enable further research into the automatic addition of relevant images to text documents. The dataset comprises the official ImageCLEF 2010 collection (a collection of images with textual metadata), which serves as the pool of images available for automatic enrichment of text; a set of 25 benchmark documents to be enriched, which in this case are children’s short stories; and a set of manually judged relevant images for each query story, obtained by the standard procedure of depth pooling. We use this benchmark dataset to evaluate the effectiveness of standard information retrieval methods as simple baselines for this task. The results indicate that using the whole story as a weighted query, where the weight of each query term is its tf-idf value, achieves a precision of 0.1714 within the top 5 retrieved images, on average.
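The tf-idf weighted query baseline and the top-5 precision measure can be sketched as follows. This is a minimal illustration, not the paper's retrieval setup: it assumes whitespace tokenisation, raw term counts as tf, idf = log(N / df) computed over the image metadata, and a simple sum-of-matching-weights score; the toy collection and all function names are hypothetical.

```python
import math
from collections import Counter

def document_frequencies(collection):
    """Number of image-metadata documents that contain each term."""
    df = Counter()
    for tokens in collection.values():
        df.update(set(tokens))
    return df

def tfidf_query(story_tokens, df, num_docs):
    """Weight each story term by its raw count (tf) times log(N / df) (idf)."""
    weights = {}
    for term, tf in Counter(story_tokens).items():
        if df[term] > 0:  # skip terms that never occur in the collection
            weights[term] = tf * math.log(num_docs / df[term])
    return weights

def rank_images(story_tokens, collection):
    """Rank images by the summed query weight of the terms in their metadata."""
    df = document_frequencies(collection)
    weights = tfidf_query(story_tokens, df, len(collection))

    def score(tokens):
        present = set(tokens)
        return sum(w for term, w in weights.items() if term in present)

    return sorted(collection, key=lambda img: score(collection[img]), reverse=True)

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k retrieved images that are judged relevant."""
    return sum(1 for img in ranked[:k] if img in relevant) / k

# Hypothetical toy collection: image id -> tokenised textual metadata.
collection = {
    "img1": "a dog runs in the park".split(),
    "img2": "a cat sleeps on the sofa".split(),
    "img3": "dog and cat playing".split(),
}
story = "the dog chased the cat and the dog barked".split()
ranked = rank_images(story, collection)
print(ranked[0])  # img3
```

Here `img3` ranks first because it matches several story terms, including the rarer term "and", which receives a higher idf than terms shared by more images.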

DCU-UvA Multimodal MT System Report
Iacer Calixto | Desmond Elliott | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers


Automatic Text Simplification for Spanish: Comparative Evaluation of Various Simplification Strategies
Sanja Štajner | Iacer Calixto | Horacio Saggion
Proceedings of the International Conference Recent Advances in Natural Language Processing


Target-Centric Features for Translation Quality Estimation
Chris Hokamp | Iacer Calixto | Joachim Wagner | Jian Zhang
Proceedings of the Ninth Workshop on Statistical Machine Translation