Syntactic parsers have dominated natural language understanding for decades. Yet their syntactic interpretations are losing centrality in downstream tasks due to the success of large-scale textual representation learners. In this paper, we propose KERMIT (Kernel-inspired Encoder with Recursive Mechanism for Interpretable Trees) to embed symbolic syntactic parse trees into artificial neural networks and to visualize how syntax is used in inference. We experimented with KERMIT paired with two state-of-the-art transformer-based universal sentence encoders (BERT and XLNet), and we showed that KERMIT can indeed boost their performance by effectively embedding human-coded universal syntactic representations in neural networks.
In ontology learning from texts, some domains are ontology-rich: either large structured domain knowledge repositories are available, or large general corpora come with large general structured knowledge repositories such as WordNet (Miller, 1995). Ontology learning methods are more useful in ontology-poor domains. Yet in these conditions, the methods do not perform particularly well because training material is insufficient. In this paper, we present an LSP ontology learning method that can exploit models learned from a generic domain to extract new information in a specific domain. In our approach, we first learn a model from training data in the generic domain and then use the learned model to discover knowledge in the specific domain. We tested our model adaptation strategy by using a background domain to learn the isa networks of the Earth Observation domain, which serves as the specific domain. We demonstrate that our method captures domain knowledge better than other generic models: it matches what domain experts expect better than a baseline method based only on WordNet. The baseline, in turn, correlates better with non-domain annotators who were asked to produce the ontology for the specific domain.
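The two-step adaptation described above (learn on a generic domain, then apply to a specific one) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: that LSP stands for lexico-syntactic patterns, that each pattern receives a single weight, and that the weight is a smoothed log-odds estimate, which is swapped in here only for simplicity. The function names `learn_pattern_weights` and `score_pairs`, the pattern strings, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def learn_pattern_weights(observations, smoothing=1.0):
    """Learn one weight per pattern from generic-domain data.

    `observations` is a list of (pattern, is_isa) pairs, where `is_isa`
    says whether the word pair matched by the pattern is a known isa pair
    (for example, according to a general resource).
    The smoothed log-odds weight is an assumption of this sketch.
    """
    positive, negative = Counter(), Counter()
    for pattern, is_isa in observations:
        if is_isa:
            positive[pattern] += 1
        else:
            negative[pattern] += 1
    patterns = set(positive) | set(negative)
    return {
        p: math.log((positive[p] + smoothing) / (negative[p] + smoothing))
        for p in patterns
    }


def score_pairs(weights, domain_matches):
    """Apply generic-domain weights to pattern matches from a specific domain.

    `domain_matches` is a list of (hyponym, hypernym, pattern) triples.
    Patterns never seen in the generic domain contribute nothing.
    """
    scores = defaultdict(float)
    for hyponym, hypernym, pattern in domain_matches:
        scores[(hyponym, hypernym)] += weights.get(pattern, 0.0)
    return dict(scores)


# Toy generic-domain observations (hypothetical).
generic = [
    ("HYPER such as HYPO", True),
    ("HYPER such as HYPO", True),
    ("HYPER such as HYPO", False),
    ("HYPO and HYPER", False),
    ("HYPO and HYPER", False),
    ("HYPO and HYPER", True),
]
weights = learn_pattern_weights(generic)

# Toy specific-domain matches (hypothetical).
domain = [
    ("landsat", "satellite", "HYPER such as HYPO"),
    ("landsat", "orbit", "HYPO and HYPER"),
]
scores = score_pairs(weights, domain)
```

In this sketch, pairs whose score is positive would be the candidate isa relations for the specific domain; a real system would likely use a richer learner and a tuned threshold.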
The research field of extracting knowledge bases from text collections seems to be mature: its target and its working hypotheses are clear. In this paper, we propose a platform, YAPEK (Yet Another Platform for Extracting Knowledge from corpora), intended as a common base that collects the majority of algorithms for extracting knowledge bases from corpora. The idea is that, when many knowledge extraction algorithms are collected under the same platform, relative comparisons become clearer and many algorithms can be leveraged to extract more valuable knowledge for final tasks such as Textual Entailment Recognition. Because we want to collect many knowledge extraction algorithms, YAPEK is based on the three working hypotheses of the area: the basic hypothesis, the distributional hypothesis, and the point-wise assertion patterns. In YAPEK, these three hypotheses define two spaces: the space of the target textual forms and the space of the contexts. The platform makes it possible to rapidly implement many models for extracting knowledge from corpora, because it gives clear entry points for modeling what actually differs among the algorithms: the feature spaces, the distances in these spaces, and the actual algorithm.
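To make the three entry points concrete (feature space, distance, algorithm), here is a minimal Python sketch in a purely distributional setting. It is not YAPEK's API: the function names `context_features`, `cosine_distance`, and `nearest_forms`, the window-based contexts, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_features(sentences, window=2):
    """Feature space: map each target form to a bag of context words."""
    space = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    space[target][tokens[j]] += 1
    return dict(space)


def cosine_distance(u, v):
    """Distance: 1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(count * v.get(feature, 0) for feature, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def nearest_forms(space, target, k=3, distance=cosine_distance):
    """Algorithm: rank the other target forms by distance to `target`."""
    target_vector = space.get(target, Counter())
    scored = [
        (distance(target_vector, vector), form)
        for form, vector in space.items()
        if form != target
    ]
    return [form for _, form in sorted(scored)[:k]]


# Toy corpus (hypothetical).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
]
space = context_features(corpus)
neighbours = nearest_forms(space, "cat", k=1)
```

Because the distance is passed as a parameter and the feature space is built by a separate function, each of the three components can be replaced independently, which is the kind of separation the platform description suggests.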