Comparing Approaches for Automatic Question Identification

Angel Maredia, Kara Schechtman, Sarah Ita Levitan, Julia Hirschberg


Abstract
Collecting spontaneous speech corpora that are open-ended, yet topically constrained, is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowd-sourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and find accuracy that significantly exceeds the baseline. Finally, we conduct a cross-corpus evaluation by testing the best performing approach on the Columbia/SRI/Colorado corpus.
Anthology ID:
S17-1013
Volume:
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)
Month:
August
Year:
2017
Address:
Vancouver, Canada
Venue:
*SEMEVAL
SIGs:
SIGLEX | SIGSEM
Publisher:
Association for Computational Linguistics
Pages:
110–114
URL:
https://www.aclweb.org/anthology/S17-1013
DOI:
10.18653/v1/S17-1013
PDF:
http://aclanthology.lst.uni-saarland.de/S17-1013.pdf