François Rousseau


2017

Multivariate Gaussian Document Representation from Word Embeddings for Text Categorization
Giannis Nikolentzos | Polykarpos Meladianos | François Rousseau | Yannis Stavrakas | Michalis Vazirgiannis
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Recently, there has been a lot of activity in learning distributed representations of words in vector spaces. Although existing models can learn high-quality distributed representations of words, generating vector representations of the same quality for phrases or documents remains a challenge. In this paper, we propose to model each document as a multivariate Gaussian distribution based on the distributed representations of its words. We then measure the similarity between two documents based on the similarity of their distributions. Experiments on eight standard text categorization datasets demonstrate the effectiveness of the proposed approach in comparison with state-of-the-art methods.
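
The idea in this abstract can be illustrated with a minimal sketch: each document becomes the mean vector and covariance matrix of its word vectors, and two documents are compared through those parameters. The abstract does not say how the distributions are compared, so the `similarity` function below (a weighted mix of cosine similarities between the means and between the flattened covariance matrices), the `alpha` weight, and the toy two-dimensional embeddings are all assumptions made for illustration, not necessarily the paper's formulation.

```python
import math

def gaussian_of(vectors):
    """Return (mean, covariance) of a list of equal-length word vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    # Maximum-likelihood covariance (divides by n, not n - 1).
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

def cosine(a, b):
    """Cosine similarity of two flat vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity(doc_a, doc_b, alpha=0.5):
    """Assumed similarity: mix of mean and covariance cosine similarities."""
    mean_a, cov_a = gaussian_of(doc_a)
    mean_b, cov_b = gaussian_of(doc_b)
    flat_a = [x for row in cov_a for x in row]
    flat_b = [x for row in cov_b for x in row]
    return alpha * cosine(mean_a, mean_b) + (1 - alpha) * cosine(flat_a, flat_b)

# Hypothetical toy embeddings; real word vectors have hundreds of dimensions.
emb = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.0, 1.0]}
doc_pets = [emb["cat"], emb["dog"]]
doc_mixed = [emb["cat"], emb["car"]]
score = similarity(doc_pets, doc_mixed)
```

A document compared with itself scores 1 (up to floating-point error) as long as its covariance matrix is non-zero, which gives a quick sanity check for the sketch.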

Shortest-Path Graph Kernels for Document Similarity
Giannis Nikolentzos | Polykarpos Meladianos | François Rousseau | Yannis Stavrakas | Michalis Vazirgiannis
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we present a novel document similarity measure based on a graph kernel defined between pairs of documents. The proposed measure takes into account both the terms contained in the documents and the relationships between them. By representing each document as a graph-of-words, we are able to model these relationships and then determine how similar two documents are by using a modified shortest-path graph kernel. We evaluate our approach on two tasks and compare it against several baseline approaches using performance metrics such as DET curves and macro-average F1-score. Experimental results on a range of datasets show that the proposed approach outperforms traditional techniques and measures the similarity between two documents more accurately.
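
A rough sketch of the pipeline described in this abstract follows: build a graph-of-words for each document, compute shortest-path lengths between terms, and compare the two documents through the term pairs they share. The abstract does not spell out the paper's modified kernel, so the sliding-window construction, the inverse-length weighting, and the normalisation in `sp_kernel` are simplifying assumptions chosen for illustration only.

```python
import math
from collections import deque

def graph_of_words(tokens, window=2):
    """Undirected graph: terms are linked if they co-occur within `window` tokens."""
    adj = {t: set() for t in tokens}
    for i, t in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != t:
                adj[t].add(u)
                adj[u].add(t)
    return adj

def shortest_paths(adj):
    """Map each connected unordered term pair to its shortest-path length (BFS)."""
    lengths = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, d in dist.items():
            if dst != src:
                lengths[frozenset((src, dst))] = d
    return lengths

def sp_kernel(tokens_a, tokens_b, window=2):
    """Assumed kernel: shared term pairs weighted by inverse path lengths, normalised."""
    pa = shortest_paths(graph_of_words(tokens_a, window))
    pb = shortest_paths(graph_of_words(tokens_b, window))

    def raw(p, q):
        return sum(1.0 / (p[k] * q[k]) for k in p.keys() & q.keys())

    denom = math.sqrt(raw(pa, pa) * raw(pb, pb))
    return raw(pa, pb) / denom if denom else 0.0

doc_a = "graph kernels compare documents".split()
doc_b = "graph kernels measure similarity".split()
doc_c = "weather forecast tomorrow".split()
shared = sp_kernel(doc_a, doc_b)    # overlap only on the pair (graph, kernels)
disjoint = sp_kernel(doc_a, doc_c)  # no shared terms
```

Because the score depends on paths between terms rather than on term counts alone, two documents that use the same words in different arrangements can receive different scores, which is the property the abstract emphasises.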

2016

Regularizing Text Categorization with Clusters of Words
Konstantinos Skianis | François Rousseau | Michalis Vazirgiannis
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Convolutional Sentence Kernel from Word Embeddings for Short Text Categorization
Jonghoon Kim | François Rousseau | Michalis Vazirgiannis
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Text Categorization as a Graph Classification Problem
François Rousseau | Emmanouil Kiagias | Michalis Vazirgiannis
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)