Parsing Without Grammar

Shinsuke Mori, Makoto Nagao


Abstract
We describe and experimentally evaluate a method for parsing a tagged corpus without a grammar, modeling natural language as a context-free language. The method rests on three hypotheses. 1) Part-of-speech sequences that form the right-hand side of a rewriting rule are less constrained as to which parts of speech precede and follow them than non-constituent sequences are. 2) Part-of-speech sequences directly derived from the same non-terminal symbol have similar environments. 3) The most suitable set of rewriting rules is the one that reduces the corpus size the most. Based on these hypotheses, the system finds a set of constituent-like part-of-speech sequences and replaces them with a new symbol. Repeating this process yields a set of rewriting rules (that is, a grammar) and a bracketed corpus.
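The replace-and-repeat loop described in the abstract can be illustrated with a rough sketch. It is not the authors' procedure: it models only hypothesis 3 (pick the sequence whose replacement shrinks the corpus most) and ignores the environment-based hypotheses 1 and 2. The function names, the `X0`, `X1`, ... symbol naming, the tag set, the length limit, and the tie-breaking rule are all assumptions made for this illustration.

```python
from collections import Counter


def count_ngrams(corpus, max_len=3):
    """Count every POS n-gram of length 2..max_len in the corpus."""
    counts = Counter()
    for sent in corpus:
        for n in range(2, max_len + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts


def replace_all(sent, seq, symbol):
    """Replace non-overlapping occurrences of seq, scanning left to right."""
    out, i, n = [], 0, len(seq)
    while i < len(sent):
        if tuple(sent[i:i + n]) == seq:
            out.append(symbol)
            i += n
        else:
            out.append(sent[i])
            i += 1
    return out


def induce(corpus, max_rules=10, max_len=3):
    """Greedily introduce new symbols for the sequences that save the most.

    Returns (rules, reduced_corpus); each rule is (new_symbol, sequence).
    The input corpus is not modified.
    """
    corpus = [list(sent) for sent in corpus]
    rules = []
    for k in range(max_rules):
        counts = count_ngrams(corpus, max_len)
        # Estimated saving: each occurrence of an n-gram removes n - 1 symbols.
        # Overlapping occurrences make this an upper bound, not an exact figure.
        # Ties are broken by higher count, then by tuple order (an assumption).
        candidates = [
            (count * (len(seq) - 1), count, seq)
            for seq, count in counts.items()
            if count >= 2
        ]
        if not candidates:
            break
        _, _, seq = max(candidates)
        symbol = f"X{k}"  # hypothetical name for the new non-terminal
        rules.append((symbol, seq))
        corpus = [replace_all(sent, seq, symbol) for sent in corpus]
    return rules, corpus


if __name__ == "__main__":
    tagged = [
        ["DET", "NOUN", "VERB", "DET", "NOUN"],
        ["DET", "NOUN", "VERB"],
        ["PRON", "VERB", "DET", "NOUN"],
    ]
    rules, reduced = induce(tagged)
    for symbol, seq in rules:
        print(symbol, "->", " ".join(seq))
    print(reduced)
```

On this toy input the first symbol introduced stands for `DET NOUN`, a noun-phrase-like sequence, and the nesting of later rules over earlier symbols is what gives the bracketing. A closer approximation of the described method would also score candidates by how freely their left and right contexts vary (hypothesis 1) and merge sequences with similar contexts under one symbol (hypothesis 2).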
Anthology ID: 1995.iwpt-1.22
Volume: Proceedings of the Fourth International Workshop on Parsing Technologies
Month: September 20-24
Year: 1995
Address: Prague and Karlovy Vary, Czech Republic
Venues: IWPT | WS
SIG: SIGPARSE
Publisher: Association for Computational Linguistics
Pages: 174–185
URL: https://www.aclweb.org/anthology/1995.iwpt-1.22
PDF: http://aclanthology.lst.uni-saarland.de/1995.iwpt-1.22.pdf