Brian Davis


2020

Proceedings of the 13th International Conference on Natural Language Generation
Brian Davis | Yvette Graham | John Kelleher | Yaji Sripada
Proceedings of the 13th International Conference on Natural Language Generation

Towards the Ontologization of the Outsider Art Domain: Position Paper
John Roberto | Brian Davis
16th Joint ACL - ISO Workshop on Interoperable Semantic Annotation PROCEEDINGS

The purpose of this paper is to present a prospective and interdisciplinary research project seeking to ontologize knowledge of the domain of Outsider Art, that is, the art created outside the boundaries of official culture. The goal is to combine ontology engineering methodologies to develop a knowledge base which i) examines the relation between social exclusion and cultural productions, ii) standardizes the terminology of Outsider Art and iii) enables semantic interoperability between cultural metadata relevant to Outsider Art. The Outsider Art ontology will integrate some existing ontologies and terminologies, such as the CIDOC - Conceptual Reference Model (CRM), the Art & Architecture Thesaurus and the Getty Union List of Artist Names, among other resources. Natural Language Processing and Machine Learning techniques will be fundamental instruments for knowledge acquisition and elicitation. NLP techniques will be used to annotate bibliographies of relevant outsider artists and descriptions of outsider artworks with linguistic information. Machine Learning techniques will be leveraged to acquire knowledge from linguistic features embedded in both types of texts.

Toward the Automatic Retrieval and Annotation of Outsider Art images: A Preliminary Statement
John Roberto | Diego Ortego | Brian Davis
Proceedings of the 1st International Workshop on Artificial Intelligence for Historical Image Enrichment and Access

The aim of this position paper is to establish an initial approach to the automatic classification of digital images in the Outsider Art style of painting. Specifically, we explore whether it is possible to classify non-traditional artistic styles by using the same features that are used for classifying traditional styles. Our research question is motivated by two facts. First, art historians state that non-traditional styles are influenced by factors “outside” of the world of art. Second, some studies have shown that several artistic styles confound certain classification techniques. Following current approaches to style prediction, this paper utilises Deep Learning methods to encode image features. Our preliminary experiments suggest that, as is the case with traditional styles, Outsider Art can be computationally modelled with objective means by using training datasets and CNN models. Nevertheless, our results are not conclusive due to the lack of a large available dataset on Outsider Art. Therefore, at the end of the paper, we have mapped future lines of action, which include the compilation of a large dataset of Outsider Art images and the creation of an ontology of Outsider Art.

2019

A Social Opinion Gold Standard for the Malta Government Budget 2018
Keith Cortis | Brian Davis
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

We present a gold standard of annotated social opinion for the Malta Government Budget 2018. It consists of over 500 online posts in English and/or Maltese, a less-resourced language, gathered from social media platforms, specifically social networking services and newswires, and annotated with information about opinions expressed by the general public and other entities, in terms of sentiment polarity, emotion, sarcasm/irony, and negation. This dataset is a resource for opinion mining based on social data, within the context of politics. It is the first opinion-annotated social dataset from Malta, which has very limited language resources available.

CoSACT: A Collaborative Tool for Fine-Grained Sentiment Annotation and Consolidation of Text
Tobias Daudert | Manel Zarrouk | Brian Davis
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

2018

Indra: A Word Embedding and Semantic Relatedness Server
Juliano Efson Sales | Leonardo Souza | Siamak Barzegar | Brian Davis | André Freitas | Siegfried Handschuh
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Multilingual Test Collection for the Semantic Search of Entity Categories
Juliano Efson Sales | Siamak Barzegar | Wellington Franco | Bernhard Bermeitinger | Tiago Cunha | Brian Davis | André Freitas | Siegfried Handschuh
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs
Thomas Gaillat | Manel Zarrouk | André Freitas | Brian Davis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages
Siamak Barzegar | Brian Davis | Manel Zarrouk | Siegfried Handschuh | André Freitas
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Implicit and Explicit Aspect Extraction in Financial Microblogs
Thomas Gaillat | Bernardo Stearns | Gopal Sridhar | Ross McDermott | Manel Zarrouk | Brian Davis
Proceedings of the First Workshop on Economics and Natural Language Processing

This paper focuses on aspect extraction, which is a sub-task of Aspect-based Sentiment Analysis. The goal is to report a method for extracting financial aspects from microblog messages. Our approach uses a stock-investment taxonomy for the identification of explicit and implicit aspects. We compare supervised and unsupervised methods to assign predefined categories at message level. Results on 7 aspect classes show 0.71 accuracy, while the 32-class classification gives 0.82 accuracy for messages containing explicit aspects and 0.35 for implicit aspects.

FinSentiA: Sentiment Analysis in English Financial Microblogs
Thomas Gaillat | Annanda Sousa | Manel Zarrouk | Brian Davis
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

The objective of this paper is to report on the building of a Sentiment Analysis (SA) system dedicated to financial microblogs in English. The purpose of our work is to build a financial classifier that predicts the sentiment of stock investors on microblog platforms such as StockTwits and Twitter. Our contribution shows that it is possible to conduct such tasks in order to provide fine-grained SA of financial microblogs. We extracted financial entities with relevant contexts and assigned scores on a continuous scale by adopting a deep learning method for the classification.

2017

SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Keith Cortis | André Freitas | Tobias Daudert | Manuela Huerlimann | Manel Zarrouk | Siegfried Handschuh | Brian Davis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.

2016

Semantic Relation Classification: Task Formalisation and Refinement
Vivian Santos | Manuela Huerlimann | Brian Davis | Siegfried Handschuh | André Freitas
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms which are found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations and could thus be used as a foundation for automatic classification of semantic relations.

A Compositional-Distributional Semantic Model for Searching Complex Entity Categories
Juliano Efson Sales | André Freitas | Brian Davis | Siegfried Handschuh
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

2014

Proceedings of the Third Workshop on Semantic Web and Information Extraction
Diana Maynard | Marieke van Erp | Brian Davis
Proceedings of the Third Workshop on Semantic Web and Information Extraction

2013

Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction
Diana Maynard | Marieke van Erp | Brian Davis | Petya Osenova | Kiril Simov | Georgi Georgiev | Preslav Nakov
Proceedings of the Joint Workshop on NLP&LOD and SWAIE: Semantic Web, Linked Open Data and Information Extraction

2010

Classifying Action Items for Semantic Email
Simon Scerri | Gerhard Gossen | Brian Davis | Siegfried Handschuh
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Email can be considered as a virtual working environment in which users are constantly struggling to manage the vast amount of exchanged data. Although most of this data belongs to well-defined workflows, these are implicit and largely unsupported by existing email clients. Semanta provides this support by enabling Semantic Email, that is, email enhanced with machine-processable metadata about specific types of email Action Items (e.g. Task Assignment, Meeting Proposal). In the larger picture, these items form part of ad-hoc workflows (e.g. Task Delegation, Meeting Scheduling). Semanta is faced with a knowledge-acquisition bottleneck, as users cannot be expected to annotate each action item, and their automatic recognition proves difficult. This paper focuses on applying computationally treatable aspects of speech act theory for the classification of email action items. A rule-based classification model is employed, based on the presence or form of a number of linguistic features. The technology’s evaluation suggests that whereas full automation is not feasible, the results are good enough to be presented as suggestions for the user to review. In addition, the rule-based system will bootstrap a machine learning system that is currently in development by generating the initial training sets, which are then improved through the user’s reviewing.

A Use Case for Controlled Languages as Interfaces to Semantic Web Applications
Pradeep Dantuluri | Brian Davis | Siegfried Handschuh
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Although the Semantic Web is steadily gaining in popularity, it remains a mystery to a large percentage of Internet users. This can be attributed to the complexity of the technologies that form its core. Creating intuitive interfaces which completely abstract the technologies underneath is one way to solve this problem. A contrasting approach is to ease the user into understanding the technologies. We propose a solution which anchors on using controlled languages as interfaces to Semantic Web applications. This paper describes one such approach for the domain of meeting minutes, status reports and other project-specific documents. A controlled language is developed along with an ontology to handle semi-automatic knowledge extraction. The contributions of this paper include an ontology designed for the domain of meeting minutes and status reports, and a controlled language grammar tailored for the above domain to perform the semi-automatic knowledge acquisition and generate RDF triples. This paper also describes two grammar prototypes, which were developed and evaluated prior to the development of the final grammar, as well as the Link grammar, which was the grammar formalism of choice.

2008

Evaluating the Ontology underlying sMail - the Conceptual Framework for Semantic Email Communication
Simon Scerri | Myriam Mencke | Brian Davis | Siegfried Handschuh
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The lack of structure in the content of email messages makes it very hard for data channelled between the sender and the recipient to be correctly interpreted and acted upon. As a result, the purposes of messages frequently end up not being fulfilled, prompting prolonged communication and stalling the disconnected workflow that is characteristic of email. This problem could be partially solved by extending the current email model to support light-weight semantics pertaining to the intents of the sender and the expectations from the recipient(s), thus leaving no room for ambiguity. Semantically-aware email clients will then be able to support the user with the workflow of email-generated tasks. In line with this thinking, we present the sMail Conceptual Framework. At its core, this framework has an Email Speech Act Model. Given this model, email content can be categorized into a set of speech acts, each carrying specific expectations. In this paper we present and discuss the methodology and results of this model's statistical evaluation. By performing the same evaluation on another existing model, we demonstrate our model's higher sophistication. After careful observations, we perform changes to the model and subsequently accommodate the changes in the revised sMail Conceptual Framework.

Linguistically Light Lexical Extensions for Ontologies
Brian Davis | Siegfried Handschuh | Alexander Troussov | John Judge | Mikhail Sogrin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The identification of class instances within unstructured text, for the purposes of either Ontology population or semantic annotation, is usually limited to term mentions of Proper Noun and Personal Noun or fixed Key Phrases within Text Analytics or Ontology based Information Extraction (OBIE) applications. These systems do not generalize to cope with compound nominal classes of multi word expressions. Computational Linguistics approaches involving deep analysis tend to suffer from idiomaticity and overgeneration problems, while the shallower “words with spaces” approach frequently employed in Information Extraction (IE) and Industrial Text Analytics systems lacks flexibility and is prone to lexical proliferation. We outline a representation for encoding light linguistic features of Compound Nominal term mentions of Concepts within an Ontology, as well as a lightweight semantic annotator which compiles the above linguistic information into efficient Dictionary formats to drive large scale identification and semantic annotation of the aforementioned concepts.