Linguistic Issues in Language Technology, Volume 9, 2014 - Perspectives on Semantic Representations for Textual Inference
- CSLI Publications
Recent progress in research on the Recognizing Textual Entailment (RTE) task shows a constantly increasing level of complexity in this research field. One way to keep this complexity from becoming a barrier for researchers, especially newcomers to the field, is to provide a freely available RTE system with a high level of flexibility and extensibility. In this paper, we introduce our RTE system, BiuTee, and propose it as an effective research framework for RTE. In particular, BiuTee follows the prominent transformation-based paradigm for RTE and offers an accessible platform for research within this approach. We describe each of BiuTee’s components and point out the mechanisms and properties that directly support the adaptation and integration of new components. In addition, we describe BiuTee’s visual tracing tool, which provides notable assistance to researchers in refining and “debugging” their knowledge resources and inference components.
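To make the transformation-based paradigm concrete, here is a minimal, hypothetical sketch. The text is rewritten step by step into the hypothesis by applying entailment rules, and the cheapest sequence of rewrites counts as the proof. BiuTee itself operates on parse trees with learned costs. This sketch simplifies to token sequences with hand-set costs, and the rule list, costs, and function names are all invented for illustration.

```python
import heapq

# Hypothetical rewrite rules (lhs, rhs, cost); an empty rhs deletes the token.
# Real systems draw such rules from knowledge resources; these are made up.
RULES = [
    ("bought", "purchased", 0.2),
    ("car", "vehicle", 0.3),
    ("red", "", 0.5),
]


def apply_rules(tokens):
    """Yield (new_tokens, cost) for every single-token rewrite of tokens."""
    for i, tok in enumerate(tokens):
        for lhs, rhs, cost in RULES:
            if tok == lhs:
                replacement = (rhs,) if rhs else ()
                yield tokens[:i] + replacement + tokens[i + 1:], cost


def cheapest_proof(text, hypothesis, max_cost=2.0):
    """Uniform-cost search from text to hypothesis; return the cost or None."""
    start, goal = tuple(text.split()), tuple(hypothesis.split())
    frontier = [(0.0, start)]
    best = {start: 0.0}
    while frontier:
        cost, tokens = heapq.heappop(frontier)
        if tokens == goal:
            return cost
        if cost > best.get(tokens, float("inf")):
            continue  # stale queue entry
        for nxt, step in apply_rules(tokens):
            new_cost = cost + step
            if new_cost <= max_cost and new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt))
    return None  # no proof within the cost budget
```

For example, `cheapest_proof("the red car was bought", "the vehicle was purchased")` finds a proof of cost about 1.0. Changing the hypothesis to "the boat was purchased" yields `None`, because no rule connects "car" to "boat".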
From a purely theoretical point of view, it makes sense to approach recognizing textual entailment (RTE) with the help of logic. After all, entailment matters are all about logic. In practice, only a few RTE systems follow the bumpy road from words to logic. This is probably because it requires a combination of robust, deep semantic analysis and logical inference, and why develop something this complex if you can perhaps get away with something simpler? In this article, with the help of an RTE system based on Combinatory Categorial Grammar, Discourse Representation Theory, and first-order theorem proving, we make an empirical assessment of the logic-based approach. High precision paired with low recall is a key characteristic of this system. The bottleneck in achieving high recall is the lack of a systematic way to produce relevant background knowledge. There is a place for logic in RTE, but it is (still) overshadowed by the knowledge acquisition problem.
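The knowledge acquisition bottleneck can be illustrated with a deliberately tiny example. The system described above uses a first-order theorem prover. This sketch swaps in something much simpler: forward chaining over ground atoms with axioms of the form "p(args) implies q(args)". It also assumes, as a simplification, that text and hypothesis already share their constants. With the right background axioms the entailment goes through; without them it fails. This mirrors, in miniature, high precision paired with low recall. All predicate names and data are hypothetical.

```python
def forward_chain(facts, axioms):
    """Close a set of ground atoms under predicate-implication axioms.

    Atoms are tuples (predicate, arg1, ...). Each axiom (p, q) stands for
    'for all args: p(args) -> q(args)' (a simplifying assumption)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for atom in list(known):
            for p, q in axioms:
                if atom[0] == p:
                    derived = (q,) + atom[1:]
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


def entails(text_atoms, hyp_atoms, axioms):
    """Hypothesis follows if all its atoms are derivable from the text."""
    return set(hyp_atoms) <= forward_chain(text_atoms, axioms)


# Hypothetical pair: "A man bought a car" / "A man purchased a vehicle".
TEXT = [("man", "x"), ("car", "y"), ("buy", "e", "x", "y")]
HYP = [("man", "x"), ("vehicle", "y"), ("purchase", "e", "x", "y")]
AXIOMS = [("car", "vehicle"), ("buy", "purchase")]  # background knowledge
```

`entails(TEXT, HYP, AXIOMS)` returns `True`, while `entails(TEXT, HYP, [])` returns `False`. The inference machinery is identical in both calls; only the background knowledge differs.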
Besides formal approaches to semantic inference that rely on logical representations of meaning, the notion of Textual Entailment (TE) has been proposed as an applied framework to capture major semantic inference needs across applications in Computational Linguistics. Although several approaches have been tried and evaluation campaigns have shown improvements in TE, there is renewed interest in the research community in a deeper and better understanding of the core phenomena involved in textual inference. Pursuing this direction, we are convinced that crucial progress will come from decomposing the complexity of the TE task into basic phenomena and studying how they combine. In this paper, we carry out a deep analysis of TE data sets, investigating the relations between two relevant aspects of semantic inferences: the logical dimension, i.e. the capacity of the inference to prove the conclusion from its premises, and the linguistic dimension, i.e. the linguistic devices used to accomplish the goal of the inference. We propose a decomposition approach over TE pairs, where single linguistic phenomena are isolated in what we have called atomic inference pairs, and we show that at this granularity level the actual correlation between the linguistic and the logical dimensions of semantic inferences emerges and can be empirically observed.
The lexicon of any natural language encodes a huge number of distinct word meanings. Just to understand this article, you will need to know what thousands of words mean. The space of possible sentential meanings is infinite: in this article alone, you will encounter many sentences that express ideas you have never heard before, we hope. Statistical semantics has addressed the vastness of word meaning by proposing methods to harvest meaning automatically from large collections of text (corpora). Formal semantics in the Fregean tradition has developed methods to account for the infinity of sentential meaning based on the crucial insight of compositionality, the idea that the meaning of a sentence is built incrementally by combining the meanings of its constituents. This article sketches a new approach to semantics that brings together ideas from statistical and formal semantics to account, in parallel, for the richness of lexical meaning and the combinatorial power of sentential semantics. In particular, we adopt from statistical semantics the idea that word meaning can be approximated by the patterns of co-occurrence of words in corpora, and from formal semantics the idea that compositionality can be captured in terms of a syntax-driven calculus of function application.
Classical intensional semantic frameworks, like Montague’s Intensional Logic (IL), identify intensional identity with logical equivalence. This criterion of co-intensionality is excessively coarse-grained, and it gives rise to several well-known difficulties. Theories of fine-grained intensionality have been proposed to avoid this problem. Several of these provide a formal solution to the problem, but they do not ground this solution in a substantive account of intensional difference. Applying the distinction between operational and denotational meaning, developed for the semantics of programming languages, to the interpretation of natural language expressions offers the basis for such an account. It permits us to escape some of the complications generated by the traditional modal characterization of intensions.
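The operational/denotational distinction borrowed from programming languages can be seen in two small functions. They have the same denotation, since they return the same value for every input. Their operational meaning differs, because they reach that value by different computations. The step counter is a hypothetical device added only to make the operational difference visible. This is an analogy for the distinction, not the paper's formal theory.

```python
def sum_by_loop(n):
    """Return (value, steps): add 0..n one term at a time."""
    total, steps = 0, 0
    for k in range(n + 1):
        total += k
        steps += 1
    return total, steps


def sum_by_formula(n):
    """Return (value, steps): Gauss's closed form n(n+1)/2 in a single step."""
    return n * (n + 1) // 2, 1
```

A criterion that looks only at input-output behaviour identifies the two functions. Their step counts, for example `(55, 11)` versus `(55, 1)` at `n = 10`, show that they are operationally distinct.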
This paper serves two purposes. It summarizes a large body of work on formal treatments of monotonicity and polarity in natural language, and it also discusses connections to related work on exclusion relations and to psycholinguistics and computational linguistics. The second part of the paper presents a summary of some new work on a formal Monotonicity Calculus.
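The core mechanics of monotonicity reasoning can be sketched briefly. The polarity of a position is the product of the monotonicity signs of the argument slots above it. In an upward (+) position a word may be replaced by a more general one; in a downward (−) position, by a more specific one. The sign table uses the standard textbook profiles (e.g. "every" is downward in its restrictor and upward in its scope). The toy hyponymy data and function names are hypothetical, and this is not the new calculus presented in the paper.

```python
# Monotonicity of each (functor, argument slot): +1 upward, -1 downward.
MONOTONICITY = {
    ("every", 0): -1, ("every", 1): +1,
    ("some", 0): +1, ("some", 1): +1,
    ("no", 0): -1, ("no", 1): -1,
    ("not", 0): -1,
}

# Hypothetical toy taxonomy: word -> immediate hypernym.
HYPERNYM = {"poodle": "dog", "dog": "animal"}


def polarity(path):
    """Polarity of a position given the (functor, slot) pairs above it."""
    sign = +1
    for functor, slot in path:
        sign *= MONOTONICITY[(functor, slot)]
    return sign


def is_below(a, b):
    """True if a equals b or is a (transitive) hyponym of b."""
    while a is not None:
        if a == b:
            return True
        a = HYPERNYM.get(a)
    return False


def substitution_licensed(path, old, new):
    """Truth-preserving replacement: generalize upward, specialize downward."""
    if polarity(path) > 0:
        return is_below(old, new)
    return is_below(new, old)
```

For instance, "dog" in "every dog barks" sits at path `[("every", 0)]`, which has polarity −1. The inference to "every poodle barks" is therefore licensed, but the inference to "every animal barks" is not. Under "not every" the two negative signs cancel and the position becomes upward.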
The relational syllogistic is an extension of the language of classical syllogisms in which predicates are allowed to feature transitive verbs with quantified objects. It is known that the relational syllogistic does not admit a finite set of syllogism-like rules whose associated (direct) derivation relation is sound and complete. We present a modest extension of this language that does.
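To show what "transitive verbs with quantified objects" adds to the classical syllogistic, here is a minimal model checker for relational sentences such as "every dog sees some cat". It evaluates sentences only in a given finite model; it does not implement the sound and complete rule system the paper is about. The model encoding (unary predicates as sets, verbs as sets of pairs) and all names are assumptions of this sketch.

```python
def holds(model, q1, noun, verb, q2, obj):
    """Evaluate 'Q1 noun verb Q2 obj' (e.g. 'every dog sees some cat') in a finite model."""
    subjects, objects, rel = model[noun], model[obj], model[verb]
    if q2 not in ("every", "some"):
        raise ValueError(f"unsupported object quantifier: {q2}")

    def verb_phrase(x):
        related = [(x, y) in rel for y in objects]
        return all(related) if q2 == "every" else any(related)

    if q1 == "every":
        return all(verb_phrase(x) for x in subjects)
    if q1 == "some":
        return any(verb_phrase(x) for x in subjects)
    if q1 == "no":
        return not any(verb_phrase(x) for x in subjects)
    raise ValueError(f"unsupported subject quantifier: {q1}")


# Hypothetical model: two dogs, two cats, and a 'see' relation.
MODEL = {
    "dog": {"d1", "d2"},
    "cat": {"c1", "c2"},
    "see": {("d1", "c1"), ("d1", "c2"), ("d2", "c2")},
}
```

In this model "every dog sees some cat" is true. "Every dog sees every cat" is false, because d2 does not see c1. "Some dog sees every cat" is true, witnessed by d1.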
Recent implementations of Natural Logic (NLog) have shown that NLog provides a quite direct means of going from sentences in ordinary language to many of the obvious entailments of those sentences. We show here that Episodic Logic (EL) and its Epilog implementation are well-adapted to capturing NLog-like inferences, but beyond that, also support inferences that require a combination of lexical knowledge and world knowledge. However, broad language understanding and commonsense reasoning are still thwarted by the “knowledge acquisition bottleneck”, and we summarize some of our ongoing and contemplated attacks on that persistent difficulty.
We introduce a new formal semantic model for annotating textual entailments that describes restrictive, intersective, and appositive modification. The model contains a formally defined interpreted lexicon, which specifies the inventory of symbols and the supported semantic operators, and an informally defined annotation scheme that instructs annotators how to bind words and constructions from a given pair of premise and hypothesis to the interpreted lexicon. We explore the applicability of the proposed model to the Recognizing Textual Entailment (RTE) 1–4 corpora and describe a first-stage annotation scheme on which we based the manual annotation work. The constructions we annotated were found to occur in 80.65% of the entailments in RTE 1–4 and were annotated with an average cross-annotator agreement of 68%. The annotated parts of the RTE corpora are publicly available for further research.
The role of inference as it relates to natural language (NL) semantics has often been neglected. Recently, some NL semanticists have moved away from the heavy machinery of, say, Montagovian-style semantics toward a more proof-based approach. Although researchers tend to study each type of system independently, MacCartney (2009) and MacCartney and Manning (2009) (henceforth M&M) recently developed an algorithmic approach to natural logic that attempts to combine insights from both monotonicity calculi and various syllogistic fragments to derive compositionally the relation between two NL sentences from the relations of their parts. At the heart of their system, M&M begin with seven intuitive lexical-semantic relations that NL expressions can stand in, e.g., synonymy and antonymy, and then ask the question: if φ stands in some lexical-semantic relation to ψ, and ψ stands in a (possibly different) lexical-semantic relation to θ, what lexical-semantic relation (if any) can be concluded about the relation between φ and θ? This type of reasoning has the familiar shape of a logical inference rule. However, the logical properties of their join table have not been explored in any real detail. The purpose of this paper is to give M&M’s table a proper logical treatment. As I will show, the table has the underlying form of a syllogistic fragment and relies on a sort of generalized transitive reasoning.
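The question M&M ask can be explored computationally. Their seven relations are usually defined set-theoretically over non-empty, non-universal sets:

- equivalence ≡
- forward entailment ⊏
- reverse entailment ⊐
- negation ^
- alternation |
- cover ‿
- independence #

As a minimal sketch, assuming those definitions, the join of two relations can be approximated by brute force. Enumerate triples of subsets of a small finite universe and collect every relation observed between the outer sets. The four-element universe and the function names are assumptions of this illustration, not M&M's implementation. A larger universe could in principle reveal further possibilities for the non-deterministic entries.

```python
from itertools import combinations

UNIVERSE = frozenset(range(4))  # small finite universe (an assumption)
RELATIONS = ["≡", "⊏", "⊐", "^", "|", "‿", "#"]


def nonvacuous_subsets(universe):
    """All subsets that are neither empty nor the whole universe."""
    items = sorted(universe)
    return [frozenset(c) for r in range(1, len(items)) for c in combinations(items, r)]


def relation(x, y, universe=UNIVERSE):
    """Classify a pair of sets into one of the seven basic relations."""
    if x == y:
        return "≡"
    if x < y:
        return "⊏"
    if x > y:
        return "⊐"
    disjoint = not (x & y)
    exhaustive = (x | y) == universe
    if disjoint and exhaustive:
        return "^"
    if disjoint:
        return "|"
    if exhaustive:
        return "‿"
    return "#"


SETS = nonvacuous_subsets(UNIVERSE)


def join(r, s):
    """Relations T such that x r y and y s z hold with x T z for some x, y, z."""
    result = set()
    for x in SETS:
        for y in SETS:
            if relation(x, y) != r:
                continue
            for z in SETS:
                if relation(y, z) == s:
                    result.add(relation(x, z))
    return result
```

Some joins come out deterministic, which is the transitive-inference shape the paragraph describes. For example, `join("⊏", "⊏")` is `{"⊏"}`, `join("^", "^")` is `{"≡"}`, and `join("⊏", "^")` is `{"|"}`. Others, such as `join("#", "#")`, return a set of several relations, reflecting the cases where no single relation follows.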