Parsing with Traces: An O(n4) Algorithm and a Structural Representation

Jonathan K. Kummerfeld, Dan Klein


Abstract
General treebank analyses are graph structured, but parsers are typically restricted to tree structures for efficiency and modeling reasons. We propose a new representation and algorithm for a class of graph structures that is flexible enough to cover almost all treebank structures, while still admitting efficient learning and inference. In particular, we consider directed, acyclic, one-endpoint-crossing graph structures, which cover most long-distance dislocation, shared argumentation, and similar tree-violating linguistic phenomena. We describe how to convert phrase structure parses, including traces, to our new representation in a reversible manner. Our dynamic program uniquely decomposes structures, is sound and complete, and covers 97.3% of the Penn English Treebank. We also implement a proof-of-concept parser that recovers a range of null elements and trace types.
Anthology ID:
Q17-1031
Volume:
Transactions of the Association for Computational Linguistics, Volume 5
Month:
Year:
2017
Address:
Venue:
TACL
SIG:
Publisher:
Note:
Pages:
441–454
Language:
URL:
https://www.aclweb.org/anthology/Q17-1031
DOI:
10.1162/tacl_a_00072
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
http://aclanthology.lst.uni-saarland.de/Q17-1031.pdf
Video:
 https://vimeo.com/238235203