A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction

Kalin Stefanov, Jonas Beskow


Abstract
This papers describes a data collection setup and a newly recorded dataset. The main purpose of this dataset is to explore patterns in the focus of visual attention of humans under three different conditions - two humans involved in task-based interaction with a robot; same two humans involved in task-based interaction where the robot is replaced by a third human, and a free three-party human interaction. The dataset contains two parts - 6 sessions with duration of approximately 3 hours and 9 sessions with duration of approximately 4.5 hours. Both parts of the dataset are rich in modalities and recorded data streams - they include the streams of three Kinect v2 devices (color, depth, infrared, body and face data), three high quality audio streams, three high resolution GoPro video streams, touch data for the task-based interactions and the system state of the robot. In addition, the second part of the dataset introduces the data streams from three Tobii Pro Glasses 2 eye trackers. The language of all interactions is English and all data streams are spatially and temporally aligned.
Anthology ID:
L16-1703
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
4440–4444
Language:
URL:
https://www.aclweb.org/anthology/L16-1703
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/L16-1703.pdf