Elia Bruni


2020

pdf bib
Location Attention for Extrapolation to Longer Sequences
Yann Dubois | Gautier Dagan | Dieuwke Hupkes | Elia Bruni
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural networks are surprisingly good at interpolating and perform remarkably well when the training set examples resemble those in the test set. However, they are often unable to extrapolate patterns beyond the seen data, even when the abstractions required for such patterns are simple. In this paper, we first review the notion of extrapolation, why it is important and how one could hope to tackle it. We then focus on a specific type of extrapolation which is especially useful for natural language processing: generalization to sequences that are longer than the training ones. We hypothesize that models with a separate content- and location-based attention are more likely to extrapolate than those with common attention mechanisms. We empirically support our claim for recurrent seq2seq models with our proposed attention on variants of the Lookup Table task. This sheds light on some striking failures of neural models for sequences and on possible methods to approaching such issues.

pdf bib
Internal and external pressures on language emergence: least effort, object constancy and frequency
Diana Rodríguez Luna | Edoardo Maria Ponti | Dieuwke Hupkes | Elia Bruni
Findings of the Association for Computational Linguistics: EMNLP 2020

In previous work, artificial agents were shown to achieve almost perfect accuracy in referential games where they have to communicate to identify images. Nevertheless, the resulting communication protocols rarely display salient features of natural languages, such as compositionality. In this paper, we propose some realistic sources of pressure on communication that avert this outcome. More specifically, we formalise the principle of least effort through an auxiliary objective. Moreover, we explore several game variants, inspired by the principle of object constancy, in which we alter the frequency, position, and luminosity of the objects in the images. We perform an extensive analysis on their effect through compositionality metrics, diagnostic classifiers, and zero-shot evaluation. Our findings reveal that the proposed sources of pressure result in emerging languages with less redundancy, more focus on high-level conceptual information, and better abilities of generalisation. Overall, our contributions reduce the gap between emergent and natural languages.

pdf bib
The Grammar of Emergent Languages
Oskar van der Wal | Silvan de Boer | Elia Bruni | Dieuwke Hupkes
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

In this paper, we consider the syntactic properties of languages emerged in referential games, using unsupervised grammar induction (UGI) techniques originally designed to analyse natural language. We show that the considered UGI techniques are appropriate to analyse emergent languages and we then study if the languages that emerge in a typical referential game setup exhibit syntactic structure, and to what extent this depends on the maximum message length and number of symbols that the agents are allowed to use. Our experiments demonstrate that a certain message length and vocabulary size are required for structure to emerge, but they also illustrate that more sophisticated game scenarios are required to obtain syntactic properties more akin to those observed in human language. We argue that UGI techniques should be part of the standard toolkit for analysing emergent languages and release a comprehensive library to facilitate such analysis for future researchers.

2019

pdf bib
Learning to request guidance in emergent language
Benjamin Kolb | Leon Lang | Henning Bartsch | Arwin Gansekoele | Raymond Koopmanschap | Leonardo Romor | David Speck | Mathijs Mul | Elia Bruni
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Previous research into agent communication has shown that a pre-trained guide can speed up the learning process of an imitation learning agent. The guide achieves this by providing the agent with discrete messages in an emerged language about how to solve the task. We extend this one-directional communication by a one-bit communication channel from the learner back to the guide: It is able to ask the guide for help, and we limit the guidance by penalizing the learner for these requests. During training, the agent learns to control this gate based on its current observation. We find that the amount of requested guidance decreases over time and guidance is requested in situations of high uncertainty. We investigate the agent’s performance in cases of open and closed gates and discuss potential motives for the observed gating behavior.

pdf bib
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Janosch Haber | Tim Baumgärtner | Ece Takmaz | Lieke Gelderloos | Elia Bruni | Raquel Fernández
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. Taking inspiration from seminal work on dialogue analysis, we propose a data-collection task formulated as a collaborative game prompting two online participants to refer to images utilising both their visual context as well as previously established referring expressions. We provide a detailed description of the task setup and a thorough analysis of the 2,500 dialogues collected. To further illustrate the novel features of the dataset, we propose a baseline model for reference resolution which uses a simple method to take into account shared information accumulated in a reference chain. Our results show that this information is particularly important to resolve later descriptions and underline the need to develop more sophisticated models of common ground in dialogue interaction.

pdf bib
The Fast and the Flexible: Training Neural Networks to Learn to Follow Instructions from Small Data
Rezka Leonandya | Dieuwke Hupkes | Elia Bruni | Germán Kruszewski
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Learning to follow human instructions is a long-pursued goal in artificial intelligence. The task becomes particularly challenging if no prior knowledge of the employed language is assumed while relying only on a handful of examples to learn from. Work in the past has relied on hand-coded components or manually engineered features to provide strong inductive biases that make learning in such situations possible. In contrast, here we seek to establish whether this knowledge can be acquired automatically by a neural network system through a two phase training procedure: A (slow) offline learning stage where the network learns about the general structure of the task and a (fast) online adaptation phase where the network learns the language of a new given speaker. Controlled experiments show that when the network is exposed to familiar instructions but containing novel words, the model adapts very efficiently to the new vocabulary. Moreover, even for human speakers whose language usage can depart significantly from our artificial training language, our network can still make use of its automatically acquired inductive bias to learn to follow instructions more effectively.

pdf bib
Assessing Incrementality in Sequence-to-Sequence Models
Dennis Ulmer | Dieuwke Hupkes | Elia Bruni
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

Since their inception, encoder-decoder models have successfully been applied to a wide array of problems in computational linguistics. The most recent successes are predominantly due to the use of different variations of attention mechanisms, but their cognitive plausibility is questionable. In particular, because past representations can be revisited at any point in time, attention-centric methods seem to lack an incentive to build up incrementally more informative representations of incoming sentences. This way of processing stands in stark contrast with the way in which humans are believed to process language: continuously and rapidly integrating new information as it is encountered. In this work, we propose three novel metrics to assess the behavior of RNNs with and without an attention mechanism and identify key differences in the way the different model types process sentences.

pdf bib
Transcoding Compositionally: Using Attention to Find More Generalizable Solutions
Kris Korrel | Dieuwke Hupkes | Verna Dankers | Elia Bruni
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

While sequence-to-sequence models have shown remarkable generalization power across several natural language tasks, their construct of solutions are argued to be less compositional than human-like generalization. In this paper, we present seq2attn, a new architecture that is specifically designed to exploit attention to find compositional patterns in the input. In seq2attn, the two standard components of an encoder-decoder model are connected via a transcoder, that modulates the information flow between them. We show that seq2attn can successfully generalize, without requiring any additional supervision, on two tasks which are specifically constructed to challenge the compositional skills of neural networks. The solutions found by the model are highly interpretable, allowing easy analysis of both the types of solutions that are found and potential causes for mistakes. We exploit this opportunity to introduce a new paradigm to test compositionality that studies the extent to which a model overgeneralizes when confronted with exceptions. We show that seq2attn exhibits such overgeneralization to a larger degree than a standard sequence-to-sequence model.

pdf bib
On the Realization of Compositionality in Neural Networks
Joris Baan | Jana Leible | Mitja Nikolaus | David Rau | Dennis Ulmer | Tim Baumgärtner | Dieuwke Hupkes | Elia Bruni
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

We present a detailed comparison of two types of sequence to sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has shown to be an effective method for encouraging more compositional solutions. We first confirm that the models with attentive guidance indeed infer more compositional solutions than the baseline, by training them on the lookup table task presented by Liska et al. (2019). We then do an in-depth analysis of the structural differences between the two model types, focusing in particular on the organisation of the parameter space and the hidden layer activations and find noticeable differences in both these aspects. Guided networks focus more on the components of the input rather than the sequence as a whole and develop small functional groups of neurons with specific purposes that use their gates more selectively. Results from parameter heat maps, component swapping and graph analysis also indicate that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons.

pdf bib
Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat
Ravi Shekhar | Aashish Venkatesh | Tim Baumgärtner | Elia Bruni | Barbara Plank | Raffaella Bernardi | Raquel Fernández
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We propose a grounded dialogue state encoder which addresses a foundational issue on how to integrate visual grounding with dialogue system components. As a test-bed, we focus on the GuessWhat?! game, a two-player game where the goal is to identify an object in a complex visual scene by asking a sequence of yes/no questions. Our visually-grounded encoder leverages synergies between guessing and asking questions, as it is trained jointly using multi-task learning. We further enrich our model via a cooperative learning regime. We show that the introduction of both the joint architecture and cooperative learning lead to accuracy improvements over the baseline system. We compare our approach to an alternative system which extends the baseline with reinforcement learning. Our in-depth analysis shows that the linguistic skills of the two models differ dramatically, despite approaching comparable performance levels. This points at the importance of analyzing the linguistic output of competing systems beyond numeric comparison solely based on task success.

2018

pdf bib
Ask No More: Deciding when to guess in referential visual dialogue
Ravi Shekhar | Tim Baumgärtner | Aashish Venkatesh | Elia Bruni | Raffaella Bernardi | Raquel Fernandez
Proceedings of the 27th International Conference on Computational Linguistics

Our goal is to explore how the abilities brought in by a dialogue manager can be included in end-to-end visually grounded conversational agents. We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess. Our analyses show that adding a decision making component produces dialogues that are less repetitive and that include fewer unnecessary questions, thus potentially leading to more efficient and less unnatural interactions.

2017

pdf bib
Adversarial evaluation for open-domain dialogue generation
Elia Bruni | Raquel Fernández
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We investigate the potential of adversarial evaluation methods for open-domain dialogue generation systems, comparing the performance of a discriminative agent to that of humans on the same task. Our results show that the task is hard, both for automated models and humans, but that a discriminative agent can learn patterns that lead to above-chance performance.

2014

pdf bib
Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world
Angeliki Lazaridou | Elia Bruni | Marco Baroni
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf bib
Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts
Andrew J. Anderson | Elia Bruni | Ulisse Bordignon | Massimo Poesio | Marco Baroni
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
VSEM: An open library for visual semantics representation
Elia Bruni | Ulisse Bordignon | Adam Liska | Jasper Uijlings | Irina Sergienya
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

pdf bib
Visual Features for Linguists: Basic image analysis techniques for multimodally-curious NLPers
Elia Bruni | Marco Baroni
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Tutorials)

2012

pdf bib
Distributional Semantics in Technicolor
Elia Bruni | Gemma Boleda | Marco Baroni | Nam-Khanh Tran
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2011

pdf bib
Distributional semantics from text and images
Elia Bruni | Giang Binh Tran | Marco Baroni
Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics