Proceedings of the Student Research Workshop Associated with RANLP 2017
- Anthology ID:
- INCOMA Ltd.
The present paper considers the problem of dietary conflict detection from dish titles. The proposed method explores the semantics associated with the dish title in order to discover a certain or possible incompatibility of a particular dish with a particular diet. Dish titles are parts of the elusive and metaphoric gastronomy language, their processing can be viewed as a combination of short text and domain-specific texts analysis. We build our algorithm on the basis of a common knowledge lexical semantic network and show how such network can be used for domain specific short text processing.
In today’s world, globalisation is not only affecting inter-culturalism but also linking markets across the globe. Given that all markets are affecting each other and are not only driven by fundamental data but also by sentiments, sentiment analysis regarding the markets becomes a tool to predict, anticipate, and milden future economic crises such as the one we faced in 2008. In this paper, an approach to improve sentiment analysis by exploiting relationships among different kinds of sentiment, together with supplementary information, from and across various data sources is proposed.
There is no agreed upon standard for the evaluation of conversational dialog systems, which are well-known to be hard to evaluate due to the difficulty in pinning down metrics that will correspond to human judgements and the subjective nature of human judgment itself. We explored the possibility of using Grice’s Maxims to evaluate effective communication in conversation. We collected some system generated dialogs from popular conversational chatbots across the spectrum and conducted a survey to see how the human judgements based on Gricean maxims correlate, and if such human judgments can be used as an effective evaluation metric for conversational dialog.
This paper presents a neural network architecture for word sense disambiguation (WSD). The architecture employs recurrent neural layers and more specifically LSTM cells, in order to capture information about word order and to easily incorporate distributed word representations (embeddings) as features, without having to use a fixed window of text. The paper demonstrates that the architecture is able to compete with the most successful supervised systems for WSD and that there is an abundance of possible improvements to take it to the current state of the art. In addition, it explores briefly the potential of combining different types of embeddings as input features; it also discusses possible ways for generating “artificial corpora” from knowledge bases – for the purpose of producing training data and in relation to possible applications of embedding lemmas and word senses in the same space.
A multi-document summarizer finds the key topics from multiple textual sources and organizes information around them. In this paper we propose a summarization method for Persian text using paragraph vectors that can represent textual units of arbitrary lengths. We use these vectors to calculate the semantic relatedness between documents, cluster them to a number of predetermined groups, weight them based on their distance to the centroids and the intra-cluster homogeneity and take out the key paragraphs. We compare the final summaries with the gold-standard summaries of 21 digital topics using the ROUGE evaluation metric. Experimental results show the advantages of using paragraph vectors over earlier attempts at developing similar methods for a low resource language like Persian.
Over the past few years a lot of research has been done on sentiment analysis, however, the emotional analysis, being so subjective, is not a well examined dis-cipline. The main focus of this proposal is to categorize a given sentence in two dimensions - sentiment and arousal. For this purpose two techniques will be com-bined – Machine Learning approach and Lexicon-based approach. The first di-mension will give the sentiment value – positive versus negative. This will be re-solved by using Naïve Bayes Classifier. The second and more interesting dimen-sion will determine the level of arousal. This will be achieved by evaluation of given a phrase or sentence based on lexi-con with affective ratings for 14 thousand English words.
The paper aims to achieve the legal question answering information retrieval (IR) task at Competition on Legal Information Extraction/Entailment (COLIEE) 2017. Our proposal methodology for the task is to utilize deep neural network, natural language processing and word2vec. The system was evaluated using training and testing data from the competition on legal information extraction/entailment (COLIEE). Our system mainly focuses on giving relevant civil law articles for given bar exams. The corpus of legal questions is drawn from Japanese Legal Bar exam queries. We implemented a combined deep neural network with additional features NLP and word2vec to gain the corresponding civil law articles based on a given bar exam ‘Yes/No’ questions. This paper focuses on clustering words-with-relation in order to acquire relevant civil law articles. All evaluation processes were done on the COLIEE 2017 training and test data set. The experimental result shows a very promising result.