On the role of effective and referring questions in GuessWhat?!

Mauricio Mazuecos, Alberto Testoni, Raffaella Bernardi, Luciana Benotti


Abstract
Task success is the standard metric used to evaluate referential visual dialogue systems. In this paper we propose two new metrics that evaluate how each question contributes to the goal. First, we measure how effective each question is by evaluating whether the question discards objects that are not the referent. Second, we define referring questions as those that unambiguously identify a single object in the image. We report the new metrics for human dialogues and for state-of-the-art, publicly available models on GuessWhat?!. Regarding the first metric, we find that, for most models, successful dialogues do not have a higher percentage of effective questions. With respect to the second metric, humans ask referring questions at the end of the dialogue, confirming their guess before making it. Human dialogues that use this strategy have higher task success, but models do not seem to learn it.
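The two metrics can be illustrated with a minimal sketch. The abstract does not give the exact operationalization, so everything below is an assumption made for illustration: a question is treated as effective if its answer removes at least one object from the set of candidates still consistent with the dialogue so far, and as referring if exactly one object in the image would receive a "yes" answer. The names `classify_questions`, `toy_oracle`, `TOY_SCENE`, and `QUESTION_ATTRIBUTES` are hypothetical and do not come from the paper or the GuessWhat?! codebase.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical oracle: given a question and an object, returns the answer
# ("yes" or "no") that would be given if that object were the target.
Oracle = Callable[[str, Hashable], str]


def classify_questions(
    objects: Sequence[Hashable],
    dialogue: Sequence[Tuple[str, str]],
    oracle: Oracle,
) -> List[Tuple[bool, bool]]:
    """Return one (effective, referring) pair per question.

    Assumed definitions (not taken verbatim from the paper):
    - effective: the answer discards at least one remaining candidate;
    - referring: exactly one object in the image would get a "yes".
    """
    candidates = set(objects)
    results = []
    for question, answer in dialogue:
        # Keep only candidates whose hypothetical answer matches the real one.
        consistent = {o for o in candidates if oracle(question, o) == answer}
        effective = len(consistent) < len(candidates)
        # Referring is checked against the whole image, not the candidates.
        yes_objects = {o for o in objects if oracle(question, o) == "yes"}
        referring = len(yes_objects) == 1
        results.append((effective, referring))
        candidates = consistent
    return results


# Toy scene: each object is described by a set of attributes (illustrative).
TOY_SCENE = {
    "cat_left": {"cat", "left"},
    "cat_right": {"cat", "right"},
    "dog": {"dog", "left"},
}

# Each toy question is mapped to the attributes it asks about (illustrative).
QUESTION_ATTRIBUTES = {
    "Is it a cat?": {"cat"},
    "Is it the cat on the left?": {"cat", "left"},
}


def toy_oracle(question: str, obj: Hashable) -> str:
    """Answer "yes" if the object has every attribute the question asks for."""
    return "yes" if QUESTION_ATTRIBUTES[question] <= TOY_SCENE[obj] else "no"


# Dialogue about the target "cat_left"; the second question is a repeat.
toy_dialogue = [
    ("Is it a cat?", "yes"),
    ("Is it a cat?", "yes"),
    ("Is it the cat on the left?", "yes"),
]

labels = classify_questions(list(TOY_SCENE), toy_dialogue, toy_oracle)
```

In this toy dialogue the first question is effective (it discards the dog), the repeated question is not, and only the final question is referring, which mirrors the confirm-before-guessing strategy the abstract attributes to humans.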
Anthology ID:
2020.alvr-1.4
Volume:
Proceedings of the First Workshop on Advances in Language and Vision Research
Month:
July
Year:
2020
Address:
Online
Venues:
ACL | ALVR | WS
Publisher:
Association for Computational Linguistics
Pages:
19–25
URL:
https://www.aclweb.org/anthology/2020.alvr-1.4
DOI:
10.18653/v1/2020.alvr-1.4
PDF:
http://aclanthology.lst.uni-saarland.de/2020.alvr-1.4.pdf