Proceedings of the Second International Workshop on Parsing Technologies (IWPT ’91)
Masaru Tomita

Martin Kay

Robert Berwick

Eva Hajicova

Aravind Joshi

Ronald Kaplan

Makoto Nagao

Yorick Wilks
Parsing without Parser
Kôiti Hasida

Hiroshi Tsuda
In the domain of artificial intelligence, the pattern of information flow varies drastically from one context to another. To capture this diversity of information flow, a natural-language processing (NLP) system should consist of modules of constraints and one general constraint solver to process all of them; there should be no specialized procedural module such as a parser or a generator. This paper presents how to implement such a constraint-based approach to NLP. Dependency Propagation (DP) is a constraint solver which transforms the program (= constraint) represented in terms of logic programs. Constraint Unification (CU) is a unification method incorporating DP. cu-Prolog is an extended Prolog which employs CU instead of the standard unification. cu-Prolog can treat some lexical and grammatical knowledge as constraints on the structure of grammatical categories, enabling a very straightforward implementation of a parser using constraint-based grammars. By extending DP, one can deal efficiently with phrase structures in terms of constraints. Computation on category structures and phrase structures is naturally integrated in an extended DP. The computation strategies to do all this are totally attributed to a very abstract, task-independent principle: prefer computation using denser information. Efficient parsing is hence possible without any parser.
Parsing = Parsimonious Covering?
Venu Dasigi
Many researchers believe that certain aspects of natural language processing, such as word sense disambiguation and plan recognition in stories, constitute abductive inferences. We have been working with a specific model of abduction, called parsimonious covering, applied in diagnostic problem solving, word sense disambiguation and logical form generation in some restricted settings. Diagnostic parsimonious covering has been extended into a dual-route model to account for syntactic and semantic aspects of natural language. The two routes of covering are integrated by defining “open class” linguistic concepts, aiding each other. The diagnostic model has dealt with sets, while the extended version, where syntactic considerations dictate word order, deals with sequences of linguistic concepts. Here we briefly describe the original model and the extended version, and briefly characterize the notions of covering and different criteria of parsimony. Finally we examine the question of whether parsimonious covering can serve as a general framework for parsing.
The Valid Prefix Property and Left to Right Parsing of Tree-Adjoining Grammar
Yves Schabes
The valid prefix property (VPP), the capability of a left-to-right parser to detect errors as soon as possible, often goes unnoticed in parsing CFGs. Earley’s parser for CFGs (Earley, 1968; Earley, 1970) maintains the valid prefix property and obtains an O(n^3)-time worst-case complexity, as good as that of parsers that do not maintain it, such as the CKY parser (Younger, 1967; Kasami, 1965). Contrary to CFGs, maintaining the valid prefix property for TAGs is costly. In 1988, Schabes and Joshi proposed an Earley-type parser for TAGs. It maintains the valid prefix property at the expense of its worst-case complexity (O(n^9)-time). To our knowledge, it is the only known polynomial-time parser for TAGs that maintains the valid prefix property. In this paper, we explain why the valid prefix property is expensive to maintain for TAGs and we introduce a predictive left-to-right parser for TAGs that does not maintain the valid prefix property but that achieves O(n^6)-time worst-case behavior, O(n^4)-time for unambiguous grammars and linear time for a large class of grammars.
Preprocessing and lexicon design for parsing technical text
Robert P. Futrelle

Christopher E. Dunn

Debra S. Ellis

Maurice J. Pescitelli, Jr.
Technical documents with complex structures and orthography present special difficulties for current parsing technology. These include technical notation such as subscripts, superscripts and numeric and algebraic expressions, as well as Greek letters, italics, small capitals, brackets and punctuation marks. Structural elements such as references to figures, tables and bibliographic items also cause problems. We first hand-code documents in Standard Generalized Markup Language (SGML) to specify the document’s logical structure (paragraphs, sentences, etc.) and capture significant orthography. Next, a regular expression analyzer produced by LEX is used to tokenize the SGML text. Then a token-based phrasal lexicon is used to identify the longest token sequences in the input that represent single lexical items. This lookup is efficient because limits on lookahead are precomputed for every item. After this, the Alvey Tools parser with specialized subgrammars is used to discover items such as floating-point numbers. The product of these preprocessing stages is a text that is acceptable to a full natural language parser. This work is directed towards automating the building of knowledge bases from research articles in the field of bacterial chemotaxis, but the techniques should be of wide applicability.
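The longest-match lookup with precomputed lookahead limits can be sketched as follows. The phrase inventory and function names here are illustrative assumptions, not the authors' actual lexicon or implementation:

```python
# Sketch of token-based phrasal lexicon lookup. The phrase set below is a
# hypothetical example, not the authors' actual lexicon.
PHRASES = {
    ("wild", "type"),
    ("Escherichia", "coli"),
    ("signal", "transduction", "pathway"),
}

# Precompute, for each first token, the longest phrase starting with it,
# so lookup never scans further ahead than necessary.
MAXLEN = {}
for phrase in PHRASES:
    MAXLEN[phrase[0]] = max(MAXLEN.get(phrase[0], 1), len(phrase))

def lexify(tokens):
    """Greedily merge the longest token sequence that forms a single lexical item."""
    out, i = [], 0
    while i < len(tokens):
        limit = min(MAXLEN.get(tokens[i], 1), len(tokens) - i)
        best = 1
        for n in range(limit, 1, -1):          # try the longest match first
            if tuple(tokens[i:i + n]) in PHRASES:
                best = n
                break
        out.append(" ".join(tokens[i:i + best]))
        i += best
    return out

print(lexify(["the", "wild", "type", "strain"]))  # ['the', 'wild type', 'strain']
```

The per-token `MAXLEN` table is what makes the lookahead bounded: a token that starts no phrase costs a single dictionary probe.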
Incremental LL(1) Parsing in Language-Based Editors
John J. Shilling
This paper introduces an efficient incremental LL(1) parsing algorithm for use in language-based editors that use the structure recognition approach. It features very fine-grained analysis and a unique approach to parse control and error recovery. It also presents incomplete LL(1) grammars as a way of dealing with the complexity of full language grammars and as a mechanism for providing structured editor support for task languages that are only partially structured. The semantics of incomplete grammars are presented and it is shown how incomplete LL(1) grammars can be transformed into complete LL(1) grammars. The algorithms presented have been implemented in the fred language-based editor.
Linguistic Information in the Databases as a Basis for Linguistic Parsing Algorithms
Tatiana A. Apollonskaya

Larissa N. Beliaeva

Raimund G. Piotrowski
The focus of this paper is the investigation of linguistic database design in conjunction with parsing algorithms. The structure of the linguistic database in natural language processing systems, the structure of lexicon items, and the structure and volume of linguistic information in the automatic dictionary form the basis for the organization of linguistic parsing.
Binding Pronominals with an LFG Parser
Rodolfo Delmonte

Dario Bianchi
This paper describes an implemented algorithm for handling pronominal reference and anaphoric control within an LFG framework. At first there is a brief description of the grammar, implemented in Prolog using XGs (extraposition grammars) introduced by Pereira (1981, 1983). Then the algorithm mapping binding equations is discussed at length. In particular, the algorithm makes use of f-command together with the obviation principle, rather than c-command, which is shown to be insufficient to explain the facts of binding in both English and Italian. Previous work (Ingria, 1989; Hobbs, 1978) was based on English, and the classes of pronominals to account for were two: personal and possessive pronouns, and anaphors (reflexives and reciprocals). In Italian, and in other languages of the world, the classes are many more. We dealt with four: a. pronouns (personal and independent pronouns, epithets, possessive pronouns); b. clitic pronouns and Morphologically Unexpressed PRO/pros; c. long-distance anaphors; d. short-distance anaphors. Binding of anaphors and coreference of pronouns are extensively shown to depend on structural properties of f-structures, on thematic roles and grammatical functions associated with the antecedents or controllers, on definiteness of NPs and on mood of clausal f-structures. The algorithm uses feature matrices to tell pronominal classes apart and scores to determine the ranking of candidates for antecedenthood, as well as to restrict the behaviour of proforms and anaphors.
A Hybrid Model of Human Sentence Processing: Parsing Right-Branching, Center-Embedded and Cross-Serial Dependencies
Theo Vosse

Gerard Kempen
A new cognitive architecture for the syntactic aspects of human sentence processing (called Unification Space) is tested against experimental data from human subjects. The data, originally collected by Bach, Brown and Marslen-Wilson (1986), concern the comprehensibility of verb dependency constructions in Dutch and German: right-branching, center-embedded, and cross-serial dependencies of one to four levels deep. A satisfactory fit is obtained between comprehensibility data and parsability scores in the model.
Using Inheritance in Object-Oriented Programming to Combine Syntactic Rules and Lexical Idiosyncrasies
Benoît Habert
In parsing idioms and frozen expressions in French, one needs to combine general syntactic rules and idiosyncratic constraints. The inheritance structure provided by object-oriented programming languages, and more specifically the combination of methods present in CLOS, the Common Lisp Object System, appears as an elegant and efficient approach to deal with such a complex interaction.
An LR(k) Error Diagnosis and Recovery Method
Philippe Charles
In this paper, a new practical, efficient and language-independent syntactic error recovery method for LR(k) parsers is presented. This method is similar to and builds upon the three-level approach of Burke and Fisher. However, it is more time- and space-efficient and fully automatic.
Adaptive Probabilistic Generalized LR Parsing
Jerry Wright

Ave Wrigley

Richard Sharman
Various issues in the implementation of generalized LR parsing with probability are discussed. A method for preventing the generation of infinite numbers of states is described and the space requirements of the parsing tables are assessed for a substantial natural-language grammar. Because of a high degree of ambiguity in the grammar, there are many multiple entries and the tables are rather large. A new method for grammar adaptation is introduced which may help to reduce this problem. A probabilistic version of the Tomita parse forest is also described.
Phonological Analysis and Opaque Rule Orders
Michael Maxwell
General morphological/phonological analysis using ordered phonological rules has appeared to be computationally expensive, because ambiguities in feature values arising when phonological rules are “unapplied” multiply with additional rules. But in fact those ambiguities can be largely ignored until lexical lookup, since the underlying values of altered features are needed only in the case of rare opaque rule orderings, and not always then.
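The ambiguity that arises when a rule is "unapplied" can be made concrete with a small sketch. The rule (intervocalic voicing, t → d between vowels) and the lexicon are invented for illustration and are not taken from the paper:

```python
# Illustrative sketch of "unapplying" one phonological rule: t -> d / V_V.
# A surface "d" between vowels is ambiguous between underlying /t/ and /d/;
# the ambiguity is carried along and resolved only at lexical lookup.
# The rule and lexicon are invented examples, not the paper's.
VOWELS = set("aeiou")

def unapply_voicing(surface):
    """Return every candidate underlying form of `surface` under the rule."""
    def expand(i, prefix):
        if i == len(surface):
            yield prefix
            return
        c = surface[i]
        if (c == "d" and 0 < i < len(surface) - 1
                and surface[i - 1] in VOWELS and surface[i + 1] in VOWELS):
            yield from expand(i + 1, prefix + "t")   # maybe underlying /t/
        yield from expand(i + 1, prefix + c)          # or unchanged
    return list(expand(0, ""))

LEXICON = {"water"}  # hypothetical lexicon

candidates = unapply_voicing("wader")                 # ['water', 'wader']
entries = [c for c in candidates if c in LEXICON]     # ['water']
```

Each additional rule multiplies such candidate sets, which is why deferring the disambiguation to a single lexical lookup, as the abstract suggests, keeps the analysis tractable.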
An Efficient Connectionist Context-Free Parser
Klaas Sikkel

Anton Nijholt
A connectionist network is defined that parses a grammar in Chomsky Normal Form in logarithmic time, based on a modification of Rytter’s recognition algorithm. A similar parsing network can be defined for an arbitrary context-free grammar. Such networks can be integrated into a connectionist parsing environment for interactive distributed processing of syntactic, semantic and pragmatic information.
Slow and Fast Parallel Recognition
Hans de Vreught

Job Honig
In the first part of this paper a slow parallel recognizer is described for general CFGs. The recognizer runs in Θ(n^3/p(n)) time with p(n) = O(n^2) processors. It generalizes the items of the Earley algorithm to double-dotted items, which are more suited to parallel parsing. In the second part a fast parallel recognizer is given for general CFGs. The recognizer runs in O(log n) time using O(n^6) processors. It is a generalisation of the Gibbons and Rytter algorithm for grammars in CNF.
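A double-dotted item (A → α•β•γ, i, j) asserts that β derives the input span w[i:j]; two adjacent items of the same production combine symmetrically, which is what makes them suited to parallel evaluation. The sequential agenda-based closure below is my own sketch of that item calculus (not the authors' parallel algorithm), with invented names:

```python
# Sequential sketch of recognition with double-dotted items
# (A -> alpha . beta . gamma, i, j): rhs[l:r] derives w[i:j].
def recognize(grammar, start, w):
    n = len(w)
    prods = [(A, tuple(rhs)) for A, rhss in grammar.items() for rhs in rhss]
    items, agenda = set(), []

    def add(it):
        if it not in items:
            items.add(it)
            agenda.append(it)

    for A, rhs in prods:
        for l in range(len(rhs) + 1):
            for i in range(n + 1):
                add((A, rhs, l, l, i, i))            # empty dotted part
        for l, sym in enumerate(rhs):
            for i in range(n):
                if sym == w[i]:
                    add((A, rhs, l, l + 1, i, i + 1))  # terminal match

    while agenda:
        A, rhs, l, r, i, j = agenda.pop()
        # combine with adjacent items of the same production (either side)
        for B, rhs2, l2, r2, i2, j2 in list(items):
            if B == A and rhs2 == rhs:
                if l2 == r and i2 == j:
                    add((A, rhs, l, r2, i, j2))
                if r2 == l and j2 == i:
                    add((A, rhs, l2, r, i2, j))
        # a fully dotted rhs licenses its lhs wherever that lhs occurs
        if l == 0 and r == len(rhs):
            for B, rhs2 in prods:
                for m, sym in enumerate(rhs2):
                    if sym == A:
                        add((B, rhs2, m, m + 1, i, j))

    return any((start, tuple(rhs), 0, len(rhs), 0, n) in items
               for rhs in grammar[start])

GRAMMAR = {"E": [["E", "+", "T"], ["T"]], "T": [["a"]]}
```

In the parallel setting, the pairwise combination step is what is distributed over processors; this sketch performs the same closure on one agenda.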
Processing Unknown Words in Continuous Speech Recognition
Kenji Kita

Terumasa Ehara

Tsuyoshi Morimoto
Current continuous speech recognition systems essentially ignore unknown words. Systems are designed to recognize words in the lexicon. However, for using speech recognition systems in real applications of spoken-language processing, it is very important to process unknown words. This paper proposes a continuous speech recognition method which accepts any utterance that might include unknown words. In this method, words not in the lexicon are transcribed as phone sequences, while words in the lexicon are recognized correctly. The HMM-LR speech recognition system, which is an integration of Hidden Markov Models and generalized LR parsing, is used as the baseline system, and enhanced with the trigram model of syllables to take into account the stochastic characteristics of a language. Preliminary results indicate that our approach is very promising.
The Specification and Implementation of ConstraintBased Unification Grammars
Robert Carpenter

Carl Pollard

Alex Franz
Our aim is to motivate and provide a specification for a unification-based natural language processing system where grammars are expressed in terms of principles which constrain linguistic representations. Using typed feature structures with multiple inheritance for our linguistic representations and definite attribute-value logic clauses to express constraints, we will develop the bare essentials required for an implementation of a parser and generator for the Head-driven Phrase Structure Grammar (HPSG) formalism of Pollard and Sag (1987).
Probabilistic LR Parsing for General ContextFree Grammars
See-Kiong Ng

Masaru Tomita
To combine the advantages of probabilistic grammars and generalized LR parsing, an algorithm for constructing a probabilistic LR parser given a probabilistic context-free grammar is needed. In this paper, implementation issues in adapting Tomita’s generalized LR parser with graph-structured stack to perform probabilistic parsing are discussed. Wright and Wrigley (1989) have proposed a probabilistic LR-table construction algorithm for non-left-recursive context-free grammars. To account for left recursion, a method for computing item probabilities using the generation of systems of linear equations is presented. The notion of deferred probabilities is proposed as a means of dealing with similar item sets with differing probability assignments.
Quasi-Destructive Graph Unification
Hideto Tomabechi
Graph unification is the most expensive part of unification-based grammar parsing. It often takes over 90% of the total parsing time of a sentence. We focus on two speed-up elements in the design of unification algorithms: 1) eliminating excessive copying by only copying successful unifications, and 2) finding unification failures as soon as possible. We have developed a scheme that attains these two criteria without expensive overhead by temporarily modifying graphs during unification to eliminate copying. The temporary modification is invalidated in constant time, and therefore unification can continue looking for a failure without the overhead associated with copying. After a successful unification, because the nodes are already temporarily prepared for copying, a fast copy can be performed without overhead for handling reentrancy, loops and variables. We found that parsing relatively long sentences (requiring about 500 unifications during a parse) using our algorithm is 100 to 200 percent faster than parsing the same sentences using Wroblewski’s algorithm.
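The constant-time invalidation idea can be sketched with a global generation counter: temporary changes live in "scratch" fields stamped with the current generation, so discarding them costs one increment. The node layout and names below are a simplification for illustration, not the paper's actual data structures:

```python
# Sketch of quasi-destructive unification with generation-stamped scratch
# fields. Nodes are either atomic (type set) or complex (arcs only); this
# restriction is a simplification for the sketch.
import itertools

_GEN = itertools.count(1)
_current = 0                       # generation whose scratch fields are valid

class Node:
    def __init__(self, type_=None, arcs=None):
        self.type = type_          # atomic value, or None for a complex node
        self.arcs = dict(arcs or {})
        self.forward = None        # scratch: temporary forwarding pointer
        self.comp_arcs = {}        # scratch: complement arcs added by unify
        self.gen = 0               # generation stamp of the scratch fields

def _deref(n):
    while n.gen == _current and n.forward is not None:
        n = n.forward
    return n

def _scratch(n):
    """Make n's scratch fields valid for the current generation."""
    if n.gen != _current:
        n.forward, n.comp_arcs, n.gen = None, {}, _current

def _arcs(n):
    a = dict(n.arcs)
    if n.gen == _current:          # scratch arcs count only in this generation
        a.update(n.comp_arcs)
    return a

def _unify1(a, b):
    a, b = _deref(a), _deref(b)
    if a is b:
        return True
    if a.type is not None or b.type is not None:
        if a.type != b.type:
            return False           # clash found without having copied anything
        _scratch(b); b.forward = a
        return True
    _scratch(a); _scratch(b)
    b.forward = a                  # temporary forward of b into a
    for f, child in _arcs(b).items():
        mine = _arcs(a).get(f)
        if mine is None:
            a.comp_arcs[f] = child # record b's extra arc in scratch only
        elif not _unify1(mine, child):
            return False
    return True

def _copy(n, memo):
    n = _deref(n)
    if id(n) in memo:              # memo table preserves reentrancy and loops
        return memo[id(n)]
    new = Node(n.type)
    memo[id(n)] = new
    for f, child in _arcs(n).items():
        new.arcs[f] = _copy(child, memo)
    return new

def unify(a, b):
    """Return a fresh unified graph, or None; a and b are left unchanged."""
    global _current
    _current = next(_GEN)          # fresh generation for this unification
    result = _copy(a, {}) if _unify1(a, b) else None
    _current = next(_GEN)          # constant-time invalidation of all scratch
    return result

# Example: [AGR [NUM sg]] unified with [AGR [PER 3]]
f1 = Node(arcs={"AGR": Node(arcs={"NUM": Node("sg")})})
f2 = Node(arcs={"AGR": Node(arcs={"PER": Node("3")})})
```

On failure nothing is undone node by node; bumping `_current` makes every forward pointer and complement arc stale at once, which is the constant-time invalidation the abstract refers to.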
Unification Algorithms for Massively Parallel Computers
Hiroaki Kitano
This paper describes unification algorithms for fine-grained massively parallel computers. The algorithms are based on a parallel marker-passing scheme. The marker-passing scheme in our algorithms carries only bit-vectors, address pointers and values. Because of their simplicity, our algorithms can be implemented on various architectures of massively parallel machines without losing the inherent benefits of parallel computation. We also describe two augmentations of the unification algorithms: multiple unification and fuzzy unification. Experimental results indicate that our algorithm attains more than 500 unifications per second (for DAGs of an average depth of 4) and has a linear time complexity. This leads to possible implementations of massively parallel natural language parsing with full linguistic analysis.
Unification-Based Dependency Parsing of Governor-Final Languages
Hyuk-Chul Kwon

Aesun Yoon
This paper describes a unification-based dependency parsing method for governor-final languages. Our method can parse not only projective sentences but also non-projective sentences. Feature structures in the tradition of the unification-based formalism are used for writing dependency relations. We use structure sharing and local ambiguity packing to save storage.
Pearl: A Probabilistic Chart Parser
David M. Magerman

Mitchell P. Marcus
This paper describes a natural language parsing algorithm for unrestricted text which uses a probability-based scoring function to select the “best” parse of a sentence. The parser, Pearl, is a time-asynchronous bottom-up chart parser with Earley-type top-down prediction which pursues the highest-scoring theory in the chart, where the score of a theory represents the extent to which the context of the sentence predicts that interpretation. This parser differs from previous attempts at stochastic parsers in that it uses a richer form of conditional probabilities based on context to predict likelihood. Pearl also provides a framework for incorporating the results of previous work in part-of-speech assignment, unknown word models, and other probabilistic models of linguistic features into one parsing tool, interleaving these techniques instead of using the traditional pipeline architecture. In preliminary tests, Pearl has been successful at resolving part-of-speech and word (in speech processing) ambiguity, determining categories for unknown words, and selecting correct parses first using a very loosely fitting covering grammar.
Local Syntactic Constraints
Jacky Herz

Mori Rimon
A method to reduce ambiguity at the level of word tagging, on the basis of local syntactic constraints, is described. Such “short context” constraints are easy to process and can remove most of the ambiguity at that level, which is otherwise a source of great difficulty for parsers and other applications in certain natural languages. The use of local constraints is also very effective for quick invalidation of a large set of ill-formed inputs. While in some approaches local constraints are defined manually or discovered by processing of large corpora, we extract them directly from a grammar (typically context-free) of the given language. We focus on deterministic constraints, but later extend the method for a probabilistic language model.
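Pruning by such "short context" constraints can be sketched as a fixed-point filter over adjacent-tag pairs. The allowed-pair table below is hand-written purely for illustration; the paper derives such constraints from a grammar:

```python
# Sketch of tag disambiguation by local constraints: each word's tag set is
# pruned until every surviving tag has a permitted neighbour on both sides.
# The ALLOWED table is an invented example, not extracted from a grammar.
ALLOWED = {
    ("DET", "ADJ"), ("DET", "NOUN"), ("ADJ", "NOUN"),
    ("NOUN", "VERB"), ("PRON", "VERB"), ("VERB", "DET"),
}

def filter_tags(ambiguous):
    """Prune tag sets to a fixed point; an emptied set flags an ill-formed input."""
    tags = [set(t) for t in ambiguous]
    changed = True
    while changed:
        changed = False
        for i, cur in enumerate(tags):
            for t in list(cur):
                left_ok = i == 0 or any((l, t) in ALLOWED for l in tags[i - 1])
                right_ok = (i == len(tags) - 1
                            or any((t, r) in ALLOWED for r in tags[i + 1]))
                if not (left_ok and right_ok):
                    cur.discard(t)
                    changed = True
    return tags

# "the can rusts": the noun/verb/aux ambiguity of "can" resolves to NOUN.
result = filter_tags([{"DET"}, {"NOUN", "VERB", "AUX"}, {"VERB", "NOUN"}])
```

If any position's set empties, the input can be rejected immediately, which is the quick invalidation of ill-formed inputs mentioned above.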
Stochastic Context-Free Grammars for Island-Driven Probabilistic Parsing
Anna Corazza

Renato De Mori

Roberto Gretter

Giorgio Satta
In automatic speech recognition the use of language models improves performance. Stochastic language models fit the uncertainty created by the acoustic pattern matching rather well. These models are used to score theories corresponding to partial interpretations of sentences. Algorithms have been developed to compute probabilities for theories that grow in a strictly left-to-right fashion. In this paper we consider new relations to compute probabilities of partial interpretations of sentences. We introduce theories containing a gap corresponding to an uninterpreted signal segment. Algorithms can be easily obtained from these relations. The computational complexity of these algorithms is also derived.
Substring Parsing for Arbitrary Context-Free Grammars
Jan Rekers

Wilco Koorn
A substring recognizer for a language L determines whether a string s is a substring of a sentence in L, i.e., substring-recognize(s) succeeds if and only if ∃ v,w: vsw ∈ L. The algorithm for substring recognition presented here accepts general context-free grammars and uses the same parse tables as the parsing algorithm from which it was derived. Substring recognition is useful for non-correcting syntax error recovery and for incremental parsing. By extending the substring recognizer with the ability to generate trees for the possible contextual completions of the substring, we obtain a substring parser, which can be used in a syntax-directed editor to complete fragments of sentences.
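The defining property ∃ v,w: vsw ∈ L can be illustrated by brute force over a depth-bounded fragment of L. This is only a specification-level sketch with an invented toy grammar, not the table-driven algorithm of the paper:

```python
# Brute-force illustration of substring recognition: s is accepted iff it
# occurs inside some sentence generated within a derivation-depth bound.
# The paper's algorithm instead works directly from LR parse tables.
from itertools import product

def sentences(grammar, symbol, depth):
    """Yield strings derivable from `symbol` in at most `depth` expansions."""
    if symbol not in grammar:              # terminal symbol
        yield symbol
        return
    if depth == 0:
        return
    for rhs in grammar[symbol]:
        parts = [list(sentences(grammar, s, depth - 1)) for s in rhs]
        for combo in product(*parts):      # the empty rhs yields ""
            yield "".join(combo)

def substring_recognize(grammar, start, s, depth=5):
    """True iff s occurs inside some sentence generated within the bound."""
    return any(s in sent for sent in sentences(grammar, start, depth))

# Balanced parentheses: S -> ( S ) S | empty
PARENS = {"S": [["(", "S", ")", "S"], []]}
```

For example, `substring_recognize(PARENS, "S", ")(")` holds because ")(" occurs inside "()()" even though ")(" is not itself a sentence of the language, which is exactly why substring recognition enables non-correcting error recovery.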
Parsing with Relational Unification Grammars
Kent Wittenburg
In this paper we present a unification-based grammar formalism and parsing algorithm for the purposes of defining and processing non-concatenative languages. In order to encompass languages that are characterized by relations beyond simple string concatenation, we introduce relational constraints into a linguistically based unification grammar formalism and extend bottom-up chart parsing methods. This work is currently being applied in the interpretation of hand-sketched mathematical expressions and structured flowcharts on notebook computers and interactive work surfaces.
Parsing 2D Languages with Positional Grammars
Gennaro Costagliola

Shi-Kuo Chang
In this paper we will present a way to parse two-dimensional languages using LR parsing tables. To do this we describe two-dimensional (positional) grammars as a generalization of context-free string grammars. The main idea behind this is to allow a traditional LR parser to choose the next symbol to parse from a two-dimensional space. Cases of ambiguity are analyzed and some ways to avoid them are presented. Finally, we construct a parser for the two-dimensional arithmetic expression language and implement it by using the tool Yacc.