Parsing transcripts of speech

Andrew Caines, Michael McCarthy, Paula Buttery


Abstract
We present an analysis of parser performance on speech data, comparing word type and token frequency distributions with written data, and evaluating parse accuracy by length of input string. We find that parser performance tends to deteriorate with increasing length of string, more so for spoken than for written texts. We train an alternative parsing model with added speech data and demonstrate improvements in accuracy on speech-units, with no deterioration in performance on written text.
Anthology ID:
W17-4604
Volume:
Proceedings of the Workshop on Speech-Centric Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
27–36
URL:
https://www.aclweb.org/anthology/W17-4604
DOI:
10.18653/v1/W17-4604
PDF:
http://aclanthology.lst.uni-saarland.de/W17-4604.pdf