André Blessing

Also published as: Andre Blessing


pdf bib
DEbateNet-mig15:Tracing the 2015 Immigration Debate in Germany Over Time
Gabriella Lapesa | Andre Blessing | Nico Blokker | Erenay Dayanik | Sebastian Haunss | Jonas Kuhn | Sebastian Padó
Proceedings of the 12th Language Resources and Evaluation Conference

DEbateNet-migr15 is a manually annotated dataset for German which covers the public debate on immigration in 2015. The building block of our annotation is the political science notion of a claim, i.e., a statement made by a political actor (a politician, a party, or a group of citizens) that a specific action should be taken (e.g., vacant flats should be assigned to refugees). We identify claims in newspaper articles, assign them to actors and fine-grained categories and annotate their polarity and date. The aim of this paper is two-fold: first, we release the full DEbateNet-mig15 corpus and document it by means of a quantitative and qualitative analysis; second, we demonstrate its application in a discourse network analysis framework, which enables us to capture the temporal dynamics of the political debate


pdf bib
Who Sides with Whom? Towards Computational Construction of Discourse Networks for Political Debates
Sebastian Padó | Andre Blessing | Nico Blokker | Erenay Dayanik | Sebastian Haunss | Jonas Kuhn
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Understanding the structures of political debates (which actors make what claims) is essential for understanding democratic political decision making. The vision of computational construction of such discourse networks from newspaper reports brings together political science and natural language processing. This paper presents three contributions towards this goal: (a) a requirements analysis, linking the task to knowledge base population; (b) an annotated pilot corpus of migration claims based on German newspaper reports; (c) initial modeling results.

pdf bib
An Environment for Relational Annotation of Political Debates
Andre Blessing | Nico Blokker | Sebastian Haunss | Jonas Kuhn | Gabriella Lapesa | Sebastian Padó
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This paper describes the MARDY corpus annotation environment developed for a collaboration between political science and computational linguistics. The tool realizes the complete workflow necessary for annotating a large newspaper text collection with rich information about claims (demands) raised by politicians and other actors, including claim and actor spans, relations, and polarities. In addition to the annotation GUI, the tool supports the identification of relevant documents, text pre-processing, user management, integration of external knowledge bases, annotation comparison and merging, statistical analysis, and the incorporation of machine learning models as “pseudo-annotators”.


pdf bib
The GermaParl Corpus of Parliamentary Protocols
Andreas Blätte | Andre Blessing
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


pdf bib
An End-to-end Environment for Research Question-Driven Entity Extraction and Network Analysis
Andre Blessing | Nora Echelmeyer | Markus John | Nils Reiter
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper presents an approach to extract co-occurrence networks from literary texts. It is a deliberate decision not to aim for a fully automatic pipeline, as the literary research questions need to guide both the definition of the nature of the things that co-occur as well as how to decide co-occurrence. We showcase the approach on a Middle High German romance, Parzival. Manual inspection and discussion shows the huge impact various choices have.


pdf bib
Towards a text analysis system for political debates
Dieu-Thu Le | Ngoc Thang Vu | Andre Blessing
Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities


pdf bib
Textual Emigration Analysis (TEA)
Andre Blessing | Jonas Kuhn
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a web-based application which is called TEA (Textual Emigration Analysis) as a showcase that applies textual analysis for the humanities. The TEA tool is used to transform raw text input into a graphical display of emigration source and target countries (under a global or an individual perspective). It provides emigration-related frequency information, and gives access to individual textual sources, which can be downloaded by the user. Our application is built on top of the CLARIN infrastructure which targets researchers of the humanities. In our scenario, we focus on historians, literary scientists, and other social scientists that are interested in the semantic interpretation of text. Our application processes a large set of documents to extract information about people who emigrated. The current implementation integrates two data sets: A data set from the Global Migrant Origin Database, which does not need additional processing, and a data set which was extracted from the German Wikipedia edition. The TEA tool can be accessed by using the following URL:

pdf bib
The eIdentity Text Exploration Workbench
Fritz Kliche | André Blessing | Ulrich Heid | Jonathan Sonntag
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We work on tools to explore text contents and metadata of newspaper articles as provided by news archives. Our tool components are being integrated into an “Exploration Workbench” for Digital Humanities researchers. Next to the conversion of different data formats and character encodings, a prominent feature of our design is its “Wizard” function for corpus building: Researchers import raw data and define patterns to extract text contents and metadata. The Workbench also comprises different tools for data cleaning. These include filtering of off-topic articles, duplicates and near-duplicates, corrupted and empty articles. We currently work on ca. 860.000 newspaper articles from different media archives, provided in different data formats. We index the data with state-of-the-art systems to allow for large scale information retrieval. We extract metadata on publishing dates, author names, newspaper sections, etc., and split articles into segments such as headlines, subtitles, paragraphs, etc. After cleaning the data and compiling a thematically homogeneous corpus, the sample can be used for quantitative analyses which are not affected by noise. Users can retrieve sets of articles on different topics, issues or otherwise defined research questions (“subcorpora”) and investigate quantitatively their media attention on the timeline (“Issue Cycles”).


pdf bib
Towards a Tool for Interactive Concept Building for Large Scale Analysis in the Humanities
Andre Blessing | Jonathan Sonntag | Fritz Kliche | Ulrich Heid | Jonas Kuhn | Manfred Stede
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities


pdf bib
Self-Annotation for fine-grained geospatial relation extraction
Andre Blessing | Hinrich Schütze
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

pdf bib
Fine-Grained Geographical Relation Extraction from Wikipedia
Andre Blessing | Hinrich Schütze
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present work on enhancing the basic data resource of a context-aware system. Electronic text offers a wealth of information about geospatial data and can be used to improve the completeness and accuracy of geospatial resources (e.g., gazetteers). First, we introduce a supervised approach to extracting geographical relations on a fine-grained level. Second, we present a novel way of using Wikipedia as a corpus based on self-annotation. A self-annotation is an automatically created high-quality annotation that can be used for training and evaluation. Wikipedia contains two types of different context: (i) unstructured text and (ii) structured data: templates (e.g., infoboxes about cities), lists and tables. We use the structured data to annotate the unstructured text. Finally, the extracted fine-grained relations are used to complete gazetteer data. The precision and recall scores of more than 97 percent confirm that a statistical IE pipeline can be used to improve the data quality of community-based resources.