Automatic identification of head movements in video-recorded conversations: can words help?

Patrizia Paggio, Costanza Navarretta, Bart Jongejan


Abstract
We present an approach in which an SVM classifier learns to classify head movements from measurements of velocity, acceleration, and jerk (the third derivative of position with respect to time). The trained classifier is then used to annotate head movements in new video data. The automatic annotations are evaluated against manual annotations of the same data, showing an accuracy of 68%; the results also show that including jerk improves accuracy. We then investigate the overlap between temporal sequences classified as movement or non-movement and the speech stream of the person performing the gesture. The statistics derived from this analysis suggest that word features may help increase the accuracy of the model.
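The pipeline described in the abstract can be illustrated with a minimal sketch: derive per-frame velocity, acceleration, and jerk magnitudes from head-position traces, then train an SVM to separate movement from non-movement frames. This is not the authors' code; the frame rate, the synthetic data, and the two-class framing are assumptions made for illustration only.

```python
# Illustrative sketch (not the authors' implementation): kinematic
# features (velocity, acceleration, jerk) for SVM-based head-movement
# classification. All data below is synthetic.
import numpy as np
from sklearn.svm import SVC

FPS = 25  # assumed video frame rate

def kinematic_features(positions, fps=FPS):
    """Per-frame velocity, acceleration, and jerk magnitudes
    from a (frames, 2) array of head positions."""
    dt = 1.0 / fps
    vel = np.gradient(positions, dt, axis=0)    # first derivative
    acc = np.gradient(vel, dt, axis=0)          # second derivative
    jerk = np.gradient(acc, dt, axis=0)         # third derivative
    return np.column_stack([
        np.linalg.norm(vel, axis=1),
        np.linalg.norm(acc, axis=1),
        np.linalg.norm(jerk, axis=1),
    ])

# Synthetic example: a still head (small jitter) vs. a nodding head
# (a 2 Hz vertical oscillation).
rng = np.random.default_rng(0)
t = np.arange(100) / FPS
still = rng.normal(0.0, 0.2, size=(100, 2))
nod = np.column_stack([np.zeros(100), 20 * np.sin(2 * np.pi * 2 * t)])

X = np.vstack([kinematic_features(still), kinematic_features(nod)])
y = np.array([0] * 100 + [1] * 100)  # 0 = non-movement, 1 = movement

clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))  # training accuracy on the synthetic data
```

On real video the positions would come from a head tracker, and frame-level labels would come from the manual annotations used for evaluation; the feature derivation and classifier call are the same.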
Anthology ID: W17-2006
Volume: Proceedings of the Sixth Workshop on Vision and Language
Month: April
Year: 2017
Address: Valencia, Spain
Venues: VL | WS
Publisher: Association for Computational Linguistics
Pages: 40–42
URL: https://www.aclweb.org/anthology/W17-2006
DOI: 10.18653/v1/W17-2006
PDF: http://aclanthology.lst.uni-saarland.de/W17-2006.pdf