Gabriella Lapesa


DEbateNet-mig15: Tracing the 2015 Immigration Debate in Germany Over Time
Gabriella Lapesa | Andre Blessing | Nico Blokker | Erenay Dayanik | Sebastian Haunss | Jonas Kuhn | Sebastian Padó
Proceedings of the 12th Language Resources and Evaluation Conference

DEbateNet-mig15 is a manually annotated dataset for German which covers the public debate on immigration in 2015. The building block of our annotation is the political science notion of a claim, i.e., a statement made by a political actor (a politician, a party, or a group of citizens) that a specific action should be taken (e.g., vacant flats should be assigned to refugees). We identify claims in newspaper articles, assign them to actors and fine-grained categories, and annotate their polarity and date. The aim of this paper is two-fold: first, we release the full DEbateNet-mig15 corpus and document it by means of a quantitative and qualitative analysis; second, we demonstrate its application in a discourse network analysis framework, which enables us to capture the temporal dynamics of the political debate.

Swimming with the Tide? Positional Claim Detection across Political Text Types
Nico Blokker | Erenay Dayanik | Gabriella Lapesa | Sebastian Padó
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Manifestos are official documents of political parties, providing a comprehensive topical overview of the electoral programs. Voters, however, seldom read them and often prefer other channels, such as newspaper articles, to understand the party positions on various policy issues. The natural question to ask is how compatible these two formats (manifesto and newspaper reports) are in their representation of party positioning. We address this question with an approach that combines political science (manual annotation and analysis) and natural language processing (supervised claim identification) in a cross-text type setting: we train a classifier on annotated newspaper data and test its performance on manifestos. Our findings show a) strong performance for supervised classification even across text types and b) a substantive overlap between the two formats in terms of party positioning, with differences regarding the salience of specific issues.


An Environment for Relational Annotation of Political Debates
Andre Blessing | Nico Blokker | Sebastian Haunss | Jonas Kuhn | Gabriella Lapesa | Sebastian Padó
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This paper describes the MARDY corpus annotation environment developed for a collaboration between political science and computational linguistics. The tool realizes the complete workflow necessary for annotating a large newspaper text collection with rich information about claims (demands) raised by politicians and other actors, including claim and actor spans, relations, and polarities. In addition to the annotation GUI, the tool supports the identification of relevant documents, text pre-processing, user management, integration of external knowledge bases, annotation comparison and merging, statistical analysis, and the incorporation of machine learning models as “pseudo-annotators”.


Large-scale evaluation of dependency-based DSMs: Are they worth the effort?
Gabriella Lapesa | Stefan Evert
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper presents a large-scale evaluation study of dependency-based distributional semantic models. We evaluate dependency-filtered and dependency-structured DSMs in a number of standard semantic similarity tasks, systematically exploring their parameter space in order to give them a “fair shot” against window-based models. Our results show that properly tuned window-based DSMs still outperform the dependency-based models in most tasks. There appears to be little need for the language-dependent resources and computational cost associated with syntactic analysis.

Are doggies really nicer than dogs? The impact of morphological derivation on emotional valence in German
Gabriella Lapesa | Sebastian Padó | Tillmann Pross | Antje Rossdeutscher
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

Modeling Derivational Morphology in Ukrainian
Mariia Melymuka | Gabriella Lapesa | Max Kisselew | Sebastian Padó
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers


SemantiKLUE: Semantic Textual Similarity with Maximum Weight Matching
Nataliia Plotnikova | Gabriella Lapesa | Thomas Proisl | Stefan Evert
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)


Contrasting Syntagmatic and Paradigmatic Relations: Insights from Distributional Semantic Models
Gabriella Lapesa | Stefan Evert | Sabine Schulte im Walde
Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)

NaDiR: Naive Distributional Response Generation
Gabriella Lapesa | Stefan Evert
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

A Large Scale Evaluation of Distributional Semantic Models: Parameters, Interactions and Model Selection
Gabriella Lapesa | Stefan Evert
Transactions of the Association for Computational Linguistics, Volume 2

This paper presents the results of a large-scale evaluation study of window-based Distributional Semantic Models on a wide variety of tasks. Our study combines a broad coverage of model parameters with a model selection methodology that is robust to overfitting and able to capture parameter interactions. We show that our strategy allows us to identify parameter configurations that achieve good performance across different datasets and tasks.


Evaluating Neighbor Rank and Distance Measures as Predictors of Semantic Priming
Gabriella Lapesa | Stefan Evert
Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)


LexIt: A Computational Resource on Italian Argument Structure
Alessandro Lenci | Gabriella Lapesa | Giulia Bonansinga
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The aim of this paper is to introduce LexIt, a computational framework for the automatic acquisition and exploration of distributional information about Italian verbs, nouns and adjectives, freely available through a web interface. LexIt is the first large-scale resource for Italian in which subcategorization and semantic selection properties are characterized fully on distributional grounds: in the paper we describe both the process of data extraction and the evaluation of the subcategorization frames extracted with LexIt.


Building an Italian FrameNet through Semi-automatic Corpus Analysis
Alessandro Lenci | Martina Johnson | Gabriella Lapesa
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we outline the methodology we adopted to develop a FrameNet for Italian. The main element of novelty with respect to the original FrameNet is represented by the fact that the creation and annotation of Lexical Units is strictly grounded in distributional information (statistical distribution of verbal subcategorization frames, lexical and semantic preferences of each frame) automatically acquired from a large, dependency-parsed corpus. We claim that this approach allows us to overcome some of the shortcomings of the classical lexicographic method used to create FrameNet, by complementing the accuracy of manual annotation with the robustness of data on the global distributional patterns of a verb. In the paper, we describe our method for extracting distributional data from the corpus and the way we used it for the encoding and annotation of LUs. The long-term goal of our project is to create an electronic lexicon for Italian similar to the original English FrameNet. For the moment, we have developed a database of syntactic valences that will be made freely accessible via a web interface. This represents an autonomous resource besides the FrameNet lexicon, of which we have a beginning nucleus consisting of 791 annotated sentences.