Fragkiskos D. Malliaros


pdf bib
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)
Dmitry Ustalov | Swapna Somasundaran | Alexander Panchenko | Fragkiskos D. Malliaros | Ioana Hulpuș | Peter Jansen | Abhik Jana
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)


Graph-based Text Representations: Boosting Text Mining, NLP and Information Retrieval with Graphs
Fragkiskos D. Malliaros | Michalis Vazirgiannis
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Graphs or networks have been widely used as modeling tools in Natural Language Processing (NLP), Text Mining (TM) and Information Retrieval (IR). Traditionally, the unigram bag-of-words representation is applied; that way, a document is represented as a multiset of its terms, disregarding dependencies between the terms. Although several variants and extensions of this modeling approach have been proposed (e.g., the n-gram model), the main weakness comes from the underlying term independence assumption. The order of the terms within a document is completely disregarded and any relationship between terms is not taken into account in the final task (e.g., text categorization). Nevertheless, as the heterogeneity of text collections is increasing (especially with respect to document length and vocabulary), the research community has started exploring different document representations aiming to capture more fine-grained contexts of co-occurrence between different terms, challenging the well-established unigram bag-of-words model. To this direction, graphs constitute a well-developed model that has been adopted for text representation. The goal of this tutorial is to offer a comprehensive presentation of recent methods that rely on graph-based text representations to deal with various tasks in NLP and IR. We will describe basic as well as novel graph theoretic concepts and we will examine how they can be applied in a wide range of text-related application domains.All the material associated to the tutorial will be available at: