This papers describes a data collection setup and a newly recorded dataset. The main purpose of this dataset is to explore patterns in the focus of visual attention of humans under three different conditions - two humans involved in task-based interaction with a robot; same two humans involved in task-based interaction where the robot is replaced by a third human, and a free three-party human interaction. The dataset contains two parts - 6 sessions with duration of approximately 3 hours and 9 sessions with duration of approximately 4.5 hours. Both parts of the dataset are rich in modalities and recorded data streams - they include the streams of three Kinect v2 devices (color, depth, infrared, body and face data), three high quality audio streams, three high resolution GoPro video streams, touch data for the task-based interactions and the system state of the robot. In addition, the second part of the dataset introduces the data streams from three Tobii Pro Glasses 2 eye trackers. The language of all interactions is English and all data streams are spatially and temporally aligned.