Henrique Ferraz de Arruda
On the “Calligraphy” of Books
Vanessa Queiroz Marinho | Henrique Ferraz de Arruda | Thales Sinelli | Luciano da Fontoura Costa | Diego Raphael Amancio
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing
Authorship attribution is a widely studied natural language processing task, often approached through low-order statistics. In this paper, we explore a complex network approach that assigns the authorship of texts based on their mesoscopic representation, in an attempt to capture the flow of the narrative. As reported in this work, this approach allowed the identification of the dominant narrative structure of the studied authors. This was possible because the mesoscopic approach takes into account relationships between different, not necessarily adjacent, parts of the text, and can thereby capture the story flow. The potential of the proposed approach is illustrated through principal component analysis, a comparison with the chance baseline, and network visualization. These visualizations reveal individual characteristics of the authors, which can be understood as a kind of calligraphy.
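The abstract does not spell out how the mesoscopic network is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the text is cut into consecutive fixed-size blocks of words, that each block becomes a node, and that two blocks are linked when the cosine similarity of their word counts reaches a threshold. The function names, `block_size`, and `threshold` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def split_into_blocks(text, block_size):
    """Split a text into consecutive blocks of `block_size` words (assumed scheme)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mesoscopic_network(text, block_size=50, threshold=0.3):
    """Return (number of nodes, set of edges) for a block-level text network.

    Nodes are block indices; an edge (i, j) with i < j links any two blocks,
    adjacent or not, whose similarity is at least `threshold`.
    Both parameters are illustrative assumptions, not values from the paper.
    """
    counts = [Counter(block) for block in split_into_blocks(text, block_size)]
    edges = set()
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            if cosine(counts[i], counts[j]) >= threshold:
                edges.add((i, j))
    return len(counts), edges
```

In this sketch, a passage that returns to an earlier theme produces an edge between blocks that are far apart in the text, which is the kind of non-adjacent relationship the abstract refers to. Features computed on such a network could then be passed to principal component analysis, although the specific features used in the paper are not stated here.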