Dominik Schlechtweg


2020

Predicting Degrees of Technicality in Automatic Terminology Extraction
Anna Hätty | Dominik Schlechtweg | Michael Dorna | Sabine Schulte im Walde
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

While automatic term extraction is a well-researched area, computational approaches to distinguish between degrees of technicality are still understudied. We semi-automatically create a German gold standard of technicality across four domains, and illustrate the impact of a web-crawled general-language corpus on technicality prediction. When defining a classification approach that combines general-language and domain-specific word embeddings, we go beyond previous work and align vector spaces to gain comparative embeddings. We suggest two novel models to exploit general- vs. domain-specific comparisons: a simple neural network model with pre-computed comparative-embedding information as input, and a multi-channel model computing the comparison internally. Both models outperform previous approaches, with the multi-channel model performing best.

SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
Dominik Schlechtweg | Barbara McGillivray | Simon Hengchen | Haim Dubossarsky | Nina Tahmasebi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders progress. We present the results of the first shared task that addresses this gap by providing researchers with an evaluation framework and manually annotated, high-quality datasets for English, German, Latin, and Swedish. 33 teams submitted 186 systems, which were evaluated on two subtasks.

IMS at SemEval-2020 Task 1: How Low Can You Go? Dimensionality in Lexical Semantic Change Detection
Jens Kaiser | Dominik Schlechtweg | Sean Papay | Sabine Schulte im Walde
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the results of our system for SemEval-2020 Task 1 that exploits a commonly used lexical semantic change detection model based on Skip-Gram with Negative Sampling. Our system focuses on Vector Initialization (VI) alignment, compares VI to the currently top-ranking models for Subtask 2 and demonstrates that these can be outperformed if we optimize VI dimensionality. We demonstrate that differences in performance can largely be attributed to model-specific sources of noise, and we reveal a strong relationship between dimensionality and frequency-induced noise in VI alignment. Our results suggest that lexical semantic change models integrating vector space alignment should pay more attention to the role of the dimensionality parameter.

CCOHA: Clean Corpus of Historical American English
Reem Alatrash | Dominik Schlechtweg | Jonas Kuhn | Sabine Schulte im Walde
Proceedings of the 12th Language Resources and Evaluation Conference

Modelling language change is an increasingly important area of interest within the fields of sociolinguistics and historical linguistics. In recent years, there has been a growing number of publications whose main concern is studying changes that have occurred within the past centuries. The Corpus of Historical American English (COHA) is one of the most commonly used large corpora in diachronic studies in English. This paper describes methods applied to the downloadable version of the COHA corpus in order to overcome its main limitations, such as inconsistent lemmas and malformed tokens, without compromising its qualitative and distributional properties. The resulting corpus CCOHA contains a larger number of cleaned word tokens which can offer better insights into language change and allow for a larger variety of tasks to be performed.

2019

Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change
Haim Dubossarsky | Simon Hengchen | Nina Tahmasebi | Dominik Schlechtweg
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We empirically test the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as on a manual test set. We introduce a principled way to simulate lexical semantic change and systematically control for possible biases.
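
The core idea of avoiding alignment can be illustrated with a small preprocessing sketch: target words are tagged with the time period of the sub-corpus they occur in, and a single embedding model is then trained on the merged corpus, so all period-specific vectors live in one space. The function name `temporal_reference`, the `word_period` tag format, and the toy sentences are assumptions made for illustration, not the authors' code. The sketch is also a simplification: rewriting the corpus tags a target word when it serves as a context word too, which a full implementation may want to avoid.

```python
def temporal_reference(sentences, targets, period):
    """Return tokenised sentences in which every target word carries a period tag.

    sentences: list of token lists from one time period
    targets:   set of words whose meaning change is to be measured
    period:    label of the time period, e.g. "t1" (hypothetical format)
    """
    return [
        [f"{token}_{period}" if token in targets else token for token in sentence]
        for sentence in sentences
    ]


# Toy sub-corpora for two time periods (made-up data).
corpus_t1 = [["the", "mouse", "ate", "cheese"]]
corpus_t2 = [["click", "the", "mouse", "button"]]

# One merged corpus: a single model trained on it yields directly
# comparable vectors for "mouse_t1" and "mouse_t2" without any alignment step.
merged = (
    temporal_reference(corpus_t1, {"mouse"}, "t1")
    + temporal_reference(corpus_t2, {"mouse"}, "t2")
)
print(merged)
```

Non-target words stay untagged, so they act as shared anchors between the periods.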

A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
Dominik Schlechtweg | Anna Hätty | Marco Del Tredici | Sabine Schulte im Walde
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficiality and lack of comparison in assessing models of diachronic lexical change by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can successfully be utilised for the synchronic detection of domain-specific sense divergences in the field of term extraction.

Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling
Dominik Schlechtweg | Cennet Oguz | Sabine Schulte im Walde
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. We support the results with an empirical study finding that the models react differently when provided with additional second-order information. Our findings reveal a basic property of Skip-Gram with Negative Sampling and point towards an explanation of its success on a variety of tasks.
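
The contrast between first- and second-order information can be made concrete with a toy example. The sketch below builds sparse positive-PMI vectors (a common PMI variant, chosen here as an assumption) from made-up word-context pairs. Two words that never share a context get similarity 0 in this space, even though their contexts are themselves similar. All names and data are hypothetical and this is not the simulation used in the paper.

```python
import math
from collections import Counter


def pmi_vectors(pairs):
    """Map each target word to a sparse {context: positive PMI} vector."""
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    context_counts = Counter(context for _, context in pairs)
    total = len(pairs)
    vectors = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log2(n * total / (word_counts[word] * context_counts[context]))
        if pmi > 0:  # keep only positive associations
            vectors.setdefault(word, {})[context] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# "doctor" and "physician" share no context (no first-order overlap),
# but their contexts "hospital" and "clinic" both co-occur with "patient"
# (second-order overlap).
pairs = [
    ("doctor", "hospital"),
    ("physician", "clinic"),
    ("hospital", "patient"),
    ("clinic", "patient"),
]
vectors = pmi_vectors(pairs)
first_order_sim = cosine(vectors["doctor"], vectors["physician"])
context_sim = cosine(vectors["hospital"], vectors["clinic"])
print(first_order_sim, context_sim)
```

A model that is sensitive to second-order co-occurrence would be expected to place "doctor" and "physician" closer together than the purely first-order PMI vectors do.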

SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction
Anna Hätty | Dominik Schlechtweg | Sabine Schulte im Walde
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce SURel, a novel dataset with human-annotated meaning shifts between general-language and domain-specific contexts. We show that meaning shifts of term candidates cause errors in term extraction, and demonstrate that the SURel annotation reflects these errors. Furthermore, we illustrate that SURel enables us to assess optimisations of term extraction techniques when incorporating meaning shifts.

2018

Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change
Dominik Schlechtweg | Sabine Schulte im Walde | Stefanie Eckmann
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

We propose a framework that extends synchronic polysemy annotation to diachronic changes in lexical meaning, to counteract the lack of resources for evaluating computational models of lexical semantic change. Our framework exploits an intuitive notion of semantic relatedness, and distinguishes between innovative and reductive meaning changes with high inter-annotator agreement. The resulting test set for German comprises ratings from five annotators for the relatedness of 1,320 use pairs across 22 target words.

2017

German in Flux: Detecting Metaphoric Change via Word Entropy
Dominik Schlechtweg | Stefanie Eckmann | Enrico Santus | Sabine Schulte im Walde | Daniel Hole
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper explores the information-theoretic measure entropy to detect metaphoric change, transferring ideas from hypernym detection to research on language change. We build the first diachronic test set for German as a standard for metaphoric change annotation. Our model is unsupervised, language-independent and generalizable to other processes of semantic change.
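
As a rough illustration of the measure, the sketch below computes the Shannon entropy of a word's context distribution in two time periods. The underlying intuition, stated here as an assumption, is that a word whose meaning broadens (for example through metaphoric extension) occurs in more varied contexts, which raises its entropy. The function name and the toy context lists are hypothetical and do not reproduce the paper's model or data.

```python
import math
from collections import Counter


def word_entropy(contexts):
    """Shannon entropy (in bits) of the distribution over observed context words."""
    counts = Counter(contexts)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up context words observed for one target word in two periods.
contexts_early = ["river", "river", "river", "river"]  # one concrete use only
contexts_late = ["river", "money", "loan", "idea"]     # more varied uses

entropy_early = word_entropy(contexts_early)
entropy_late = word_entropy(contexts_late)
print(entropy_early, entropy_late)
```

An increase in entropy between the two periods would then be read as a signal of possible meaning generalisation, to be checked against annotated data.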

Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection
Vered Shwartz | Enrico Santus | Dominik Schlechtweg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The fundamental role of hypernymy in NLP has motivated the development of many methods for the automatic identification of this relation, most of which rely on word distribution. We investigate an extensive number of such unsupervised measures, using several distributional semantic models that differ by context type and feature weighting. We analyze the performance of the different methods based on their linguistic motivation. Comparison to the state-of-the-art supervised methods shows that while supervised methods generally outperform the unsupervised ones, the former are sensitive to the distribution of training instances, which hurts their reliability. Being based on general linguistic hypotheses and independent of training data, unsupervised measures are more robust, and are therefore still useful artillery for hypernymy detection.

2016

Exploitation of Co-reference in Distributional Semantics
Dominik Schlechtweg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The aim of distributional semantics is to model the similarity of the meaning of words via the words they occur with. It thereby relies on the distributional hypothesis, which implies that similar words have similar contexts. Deducing meaning from the distribution of words is interesting as it can be done automatically on large amounts of freely available raw text. It is because of this convenience that most current state-of-the-art models of distributional semantics operate on raw text, although there have been successful attempts to integrate other kinds of information (e.g., syntactic information) to improve distributional semantic models. In contrast, less attention has been paid to semantic information in the research community. One reason for this is that the extraction of semantic information from raw text is a complex, elaborate matter and in large part not yet satisfactorily solved. Recently, however, there have been successful attempts to integrate a certain kind of semantic information, i.e., co-reference. Two fundamentally different kinds of information contributed by co-reference with respect to the distribution of words will be identified. We will then focus on one of these and examine its general potential to improve distributional semantic models as well as certain more specific hypotheses.