Mauricio Mazuecos


2020

They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies
Alberto Testoni | Claudio Greco | Tobias Bianchi | Mauricio Mazuecos | Agata Marcante | Luciana Benotti | Raffaella Bernardi
Proceedings of the Third International Workshop on Spatial Language Understanding

In this paper, we study the grounding skills required to answer spatial questions asked by humans while playing the GuessWhat?! game. We propose a classification for spatial questions dividing them into absolute, relational, and group questions. We build a new answerer model based on the LXMERT multimodal transformer and we compare a baseline with and without visual features of the scene. We are interested in studying how the attention mechanisms of LXMERT are used to answer spatial questions, since they require attending to more than one region simultaneously and spotting the relation holding among them. We show that our proposed model outperforms the baseline by a large margin (9.70% on spatial questions and 6.27% overall). By analyzing LXMERT errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.

On the role of effective and referring questions in GuessWhat?!
Mauricio Mazuecos | Alberto Testoni | Raffaella Bernardi | Luciana Benotti
Proceedings of the First Workshop on Advances in Language and Vision Research

Task success is the standard metric used to evaluate referential visual dialogue systems. In this paper we propose two new metrics that evaluate how each question contributes to the goal. First, we measure how effective each question is by evaluating whether the question discards objects that are not the referent. Second, we define referring questions as those that uniquely identify one object in the image. We report the new metrics for human dialogues and for state-of-the-art publicly available models on GuessWhat?!. Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models. With respect to the second metric, humans ask referring questions at the end of the dialogue, confirming their guess before guessing. Human dialogues that use this strategy have higher task success, but models do not seem to learn it.
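
The two metrics described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that, for each question, the set of objects compatible with its answer is already known (the hypothetical `consistent_sets` argument), and it treats a question as referring when that set contains exactly one object. The names `label_questions` and `effective_ratio` are invented for illustration.

```python
from typing import Iterable, List, Set, Tuple


def label_questions(
    objects: Iterable[str],
    referent: str,
    consistent_sets: List[Set[str]],
) -> List[Tuple[bool, bool]]:
    """Label each question as (effective, referring).

    consistent_sets[i] is assumed to hold the objects compatible with the
    answer to question i; how it is computed is outside this sketch.
    """
    candidates = set(objects)
    labels = []
    for consistent in consistent_sets:
        remaining = candidates & consistent
        discarded = candidates - remaining
        # Effective: the question discards at least one non-referent object.
        effective = any(obj != referent for obj in discarded)
        # Referring (assumed reading): the question singles out one object.
        referring = len(consistent) == 1
        labels.append((effective, referring))
        candidates = remaining
    return labels


def effective_ratio(labels: List[Tuple[bool, bool]]) -> float:
    """Fraction of questions in a dialogue that are effective."""
    if not labels:
        return 0.0
    return sum(1 for effective, _ in labels if effective) / len(labels)


# Toy dialogue over four objects; the referent is "dog".
labels = label_questions(
    objects={"cat", "dog", "car", "tree"},
    referent="dog",
    consistent_sets=[{"cat", "dog"}, {"cat", "dog"}, {"dog"}],
)
```

In this toy dialogue the second question repeats information already known, so it discards nothing and is not effective, while the last question is both effective and referring.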

Effective questions in referential visual dialogue
Mauricio Mazuecos | Alberto Testoni | Raffaella Bernardi | Luciana Benotti
Proceedings of the Fourth Widening Natural Language Processing Workshop

An interesting challenge for situated dialogue systems is referential visual dialog: by asking questions, the system has to identify the referent to which the user refers. Task success is the standard metric used to evaluate these systems. However, it does not consider how effective each question is, that is, how much each question contributes to the goal. We propose a new metric that measures question effectiveness. As a preliminary study, we report the new metric for state-of-the-art publicly available models on GuessWhat?!. Surprisingly, successful dialogues do not have a higher percentage of effective questions than failed dialogues. This suggests that a system with high task success is not necessarily one that generates good questions.