André Freitas

Also published as: Andre Freitas


2020

pdf bib
Premise Selection in Natural Language Mathematical Texts
Deborah Ferreira | André Freitas
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The discovery of supporting evidence for addressing complex mathematical problems is a semantically challenging task, which is still unexplored in the field of natural language processing for mathematical text. The natural language premise selection task consists in using conjectures written in both natural language and mathematical formulae to recommend premises that most likely will be useful to prove a particular statement. We propose an approach to solve this task as a link prediction problem, using Deep Convolutional Graph Neural Networks. This paper also analyses how different baselines perform in this task and shows that a graph structure can provide higher F1-score, especially when considering multi-hop premise selection.

pdf bib
Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text
Deborah Ferreira | André Freitas
Proceedings of the 12th Language Resources and Evaluation Conference

Mathematical text is written using a combination of words and mathematical expressions. This combination, along with a specific way of structuring sentences makes it challenging for state-of-art NLP tools to understand and reason on top of mathematical discourse. In this work, we propose a new NLP task, the natural premise selection, which is used to retrieve supporting definitions and supporting propositions that are useful for generating an informal mathematical proof for a particular statement. We also make available a dataset, NL-PS, which can be used to evaluate different approaches for the natural premise selection task. Using different baselines, we demonstrate the underlying interpretation challenges associated with the task.

pdf bib
A Framework for Evaluation of Machine Reading Comprehension Gold Standards
Viktor Schlegel | Marco Valentino | Andre Freitas | Goran Nenadic | Riza Batista-Navarro
Proceedings of the 12th Language Resources and Evaluation Conference

Machine Reading Comprehension (MRC) is the task of answering a question over a paragraph of text. While neural MRC systems gain popularity and achieve noticeable performance, issues are being raised with the methodology used to establish their performance, particularly concerning the data design of gold standards that are used to evaluate them. There is but a limited understanding of the challenges present in this data, which makes it hard to draw comparisons and formulate reliable hypotheses. As a first step towards alleviating the problem, this paper proposes a unifying framework to systematically investigate the present linguistic features, required reasoning and background knowledge and factual correctness on one hand, and the presence of lexical cues as a lower bound for the requirement of understanding on the other hand. We propose a qualitative annotation schema for the first and a set of approximative metrics for the latter. In a first application of the framework, we analyse modern MRC gold standards and present our findings: the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.

2019

pdf bib
Identifying and Explaining Discriminative Attributes
Armins Stepanjans | André Freitas
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Identifying what is at the center of the meaning of a word and what discriminates it from other words is a fundamental natural language inference task. This paper describes an explicit word vector representation model (WVM) to support the identification of discriminative attributes. A core contribution of the paper is a quantitative and qualitative comparative analysis of different types of data sources and Knowledge Bases in the construction of explainable and explicit WVMs: (i) knowledge graphs built from dictionary definitions, (ii) entity-attribute-relationships graphs derived from images and (iii) commonsense knowledge graphs. Using a detailed quantitative and qualitative analysis, we demonstrate that these data sources have complementary semantic aspects, supporting the creation of explicit semantic vector spaces. The explicit vector spaces are evaluated using the task of discriminative attribute identification, showing comparable performance to the state-of-the-art systems in the task (F1-score = 0.69), while delivering full model transparency and explainability.

pdf bib
Identifying Supporting Facts for Multi-hop Question Answering with Document Graph Networks
Mokanarangan Thayaparan | Marco Valentino | Viktor Schlegel | André Freitas
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Recent advances in reading comprehension have resulted in models that surpass human performance when the answer is contained in a single, continuous passage of text. However, complex Question Answering (QA) typically requires multi-hop reasoning - i.e. the integration of supporting facts from different sources, to infer the correct answer. This paper proposes Document Graph Network (DGN), a message passing architecture for the identification of supporting facts over a graph-structured representation of text. The evaluation on HotpotQA shows that DGN obtains competitive results when compared to a reading comprehension baseline operating on raw text, confirming the relevance of structured representations for supporting multi-hop reasoning.

pdf bib
DBee: A Database for Creating and Managing Knowledge Graphs and Embeddings
Viktor Schlegel | André Freitas
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

This paper describes DBee, a database to support the construction of data-intensive AI applications. DBee provides a unique data model which operates jointly over large-scale knowledge graphs (KGs) and embedding vector spaces (VSs). This model supports queries which exploit the semantic properties of both types of representations (KGs and VSs). Additionally, DBee aims to facilitate the construction of KGs and VSs, by providing a library of generators, which can be used to create, integrate and transform data into KGs and VSs.

pdf bib
Transforming Complex Sentences into a Semantic Hierarchy
Christina Niklaus | Matthias Cetto | André Freitas | Siegfried Handschuh
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We present an approach for recursively splitting and rephrasing complex English sentences into a novel semantic hierarchy of simplified sentences, with each of them presenting a more regular structure that may facilitate a wide variety of artificial intelligence tasks, such as machine translation (MT) or information extraction (IE). Using a set of hand-crafted transformation rules, input sentences are recursively transformed into a two-layered hierarchical representation in the form of core sentences and accompanying contexts that are linked via rhetorical relations. In this way, the semantic relationship of the decomposed constituents is preserved in the output, maintaining its interpretability for downstream applications. Both a thorough manual analysis and automatic evaluation across three datasets from two different domains demonstrate that the proposed syntactic simplification approach outperforms the state of the art in structural text simplification. Moreover, an extrinsic evaluation shows that when applying our framework as a preprocessing step the performance of state-of-the-art Open IE systems can be improved by up to 346% in precision and 52% in recall. To enable reproducible research, all code is provided online.

pdf bib
MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions
Christina Niklaus | André Freitas | Siegfried Handschuh
Proceedings of the 12th International Conference on Natural Language Generation

We compiled a new sentence splitting corpus that is composed of 203K pairs of aligned complex source and simplified target sentences. Contrary to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset where each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances with each of them presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn how to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences that present a simple and more regular structure which is easier to process for downstream applications and thus facilitates and improves their performance.

pdf bib
DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German
Christina Niklaus | Matthias Cetto | André Freitas | Siegfried Handschuh
Proceedings of the 12th International Conference on Natural Language Generation

We introduce DisSim, a discourse-aware sentence splitting framework for English and German whose goal is to transform syntactically complex sentences into an intermediate representation that presents a simple and more regular structure which is easier to process for downstream semantic applications. For this purpose, we turn input sentences into a two-layered semantic hierarchy in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them. In that way, we preserve the coherence structure of the input and, hence, its interpretability for downstream tasks.

2018

pdf bib
Indra: A Word Embedding and Semantic Relatedness Server
Juliano Efson Sales | Leonardo Souza | Siamak Barzegar | Brian Davis | André Freitas | Siegfried Handschuh
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Multilingual Test Collection for the Semantic Search of Entity Categories
Juliano Efson Sales | Siamak Barzegar | Wellington Franco | Bernhard Bermeitinger | Tiago Cunha | Brian Davis | André Freitas | Siegfried Handschuh
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs
Thomas Gaillat | Manel Zarrouk | André Freitas | Brian Davis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition
Vivian Silva | André Freitas | Siegfried Handschuh
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages
Siamak Barzegar | Brian Davis | Manel Zarrouk | Siegfried Handschuh | Andre Freitas
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Graphene: Semantically-Linked Propositions in Open Information Extraction
Matthias Cetto | Christina Niklaus | André Freitas | Siegfried Handschuh
Proceedings of the 27th International Conference on Computational Linguistics

We present an Open Information Extraction (IE) approach that uses a two-layered transformation stage consisting of a clausal disembedding layer and a phrasal disembedding layer, together with rhetorical relation identification. In that way, we convert sentences that present a complex linguistic structure into simplified, syntactically sound sentences, from which we can extract propositions that are represented in a two-layered hierarchy in the form of core relational tuples and accompanying contextual information which are semantically linked via rhetorical relations. In a comparative evaluation, we demonstrate that our reference implementation Graphene outperforms state-of-the-art Open IE systems in the construction of correct n-ary predicate-argument structures. Moreover, we show that existing Open IE approaches can benefit from the transformation process of our framework.

pdf bib
A Survey on Open Information Extraction
Christina Niklaus | Matthias Cetto | André Freitas | Siegfried Handschuh
Proceedings of the 27th International Conference on Computational Linguistics

We provide a detailed overview of the various approaches that were proposed to date to solve the task of Open Information Extraction. We present the major challenges that such systems face, show the evolution of the suggested approaches over time and depict the specific issues they address. In addition, we provide a critique of the commonly applied evaluation procedures for assessing the performance of Open IE systems and highlight some directions for future work.

pdf bib
Graphene: a Context-Preserving Open Information Extraction System
Matthias Cetto | Christina Niklaus | André Freitas | Siegfried Handschuh
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

We introduce Graphene, an Open IE system whose goal is to generate accurate, meaningful and complete propositions that may facilitate a variety of downstream semantic applications. For this purpose, we transform syntactically complex input sentences into clean, compact structures in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them in order to maintain their semantic relationship. In that way, we preserve the context of the relational tuples extracted from a source sentence, generating a novel lightweight semantic representation for Open IE that enhances the expressiveness of the extracted propositions.

2017

pdf bib
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Keith Cortis | André Freitas | Tobias Daudert | Manuela Huerlimann | Manel Zarrouk | Siegfried Handschuh | Brian Davis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.

pdf bib
SemEval-2017 Task 11: End-User Development using Natural Language
Juliano Sales | Siegfried Handschuh | André Freitas
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This task proposes a challenge to support the interaction between users and applications, micro-services and software APIs using natural language. The task aims for supporting the evaluation and evolution of the discussions surrounding the natural language processing approaches within the context of end-user natural language programming, under scenarios of high semantic heterogeneity/gap.

2016

pdf bib
NNBlocks: A Deep Learning Framework for Computational Linguistics Neural Network Models
Frederico Tommasi Caroli | André Freitas | João Carlos Pereira da Silva | Siegfried Handschuh
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Lately, with the success of Deep Learning techniques in some computational linguistics tasks, many researchers want to explore new models for their linguistics applications. These models tend to be very different from what standard Neural Networks look like, limiting the possibility to use standard Neural Networks frameworks. This work presents NNBlocks, a new framework written in Python to build and train Neural Networks that are not constrained by a specific kind of architecture, making it possible to use it in computational linguistics.

pdf bib
A Sentence Simplification System for Improving Relation Extraction
Christina Niklaus | Bernhard Bermeitinger | Siegfried Handschuh | André Freitas
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We present a text simplification approach that is directed at improving the performance of state-of-the-art Open Relation Extraction (RE) systems. As syntactically complex sentences often pose a challenge for current Open RE approaches, we have developed a simplification framework that performs a pre-processing step by taking a single sentence as input and using a set of syntactic-based transformation rules to create a textual input that is easier to process for subsequently applied Open RE systems.

pdf bib
Semantic Relation Classification: Task Formalisation and Refinement
Vivian Santos | Manuela Huerliman | Brian Davis | Siegfried Handschuh | André Freitas
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique on the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms which are found in a domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, allows to capture a wide range of relations and could thus be used as a foundation for automatic classification of semantic relations.

pdf bib
Categorization of Semantic Roles for Dictionary Definitions
Vivian Silva | Siegfried Handschuh | André Freitas
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

Understanding the semantic relationships between terms is a fundamental task in natural language processing applications. While structured resources that can express those relationships in a formal way, such as ontologies, are still scarce, a large number of linguistic resources gathering dictionary definitions is becoming available, but understanding the semantic structure of natural language definitions is fundamental to make them useful in semantic interpretation tasks. Based on an analysis of a subset of WordNet’s glosses, we propose a set of semantic roles that compose the semantic structure of a dictionary definition, and show how they are related to the definition’s syntactic configuration, identifying patterns that can be used in the development of information extraction frameworks and semantic models.

pdf bib
A Compositional-Distributional Semantic Model for Searching Complex Entity Categories
Juliano Efson Sales | André Freitas | Brian Davis | Siegfried Handschuh
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

2015

pdf bib
How hard is this query? Measuring the Semantic Complexity of Schema-agnostic Queries
André Freitas | Juliano Efson Sales | Siegfried Handschuh | Edward Curry
Proceedings of the 11th International Conference on Computational Semantics