Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels

Ted Briscoe, John Carroll


Abstract
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
Anthology ID:
1995.iwpt-1.8
Volume:
Proceedings of the Fourth International Workshop on Parsing Technologies
Month:
September 20-24
Year:
1995
Address:
Prague and Karlovy Vary, Czech Republic
Venues:
IWPT | WS
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Note:
Pages:
48–58
Language:
URL:
https://www.aclweb.org/anthology/1995.iwpt-1.8
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/1995.iwpt-1.8.pdf