Christin Beck


2020

DiaSense at SemEval-2020 Task 1: Modeling Sense Change via Pre-trained BERT Embeddings
Christin Beck
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes DiaSense, a system developed for Task 1 ‘Unsupervised Lexical Semantic Change Detection’ of SemEval-2020. In DiaSense, contextualized word embeddings are used to model word sense changes. This allows the calculation of metrics that mimic human intuitions about the semantic relatedness between individual use pairs of a target word, which are then used to assess lexical semantic change. DiaSense detects lexical semantic change in English, German, Latin and Swedish (accuracy = 0.728). Moreover, DiaSense differentiates between weak and strong change.

Representation Problems in Linguistic Annotations: Ambiguity, Variation, Uncertainty, Error and Bias
Christin Beck | Hannah Booth | Mennatallah El-Assady | Miriam Butt
Proceedings of the 14th Linguistic Annotation Workshop

The development of linguistic corpora is fraught with various problems of annotation and representation. These constitute a very real challenge for the development and use of annotated corpora, but little literature yet exists on how to address the underlying problems. In this paper, we identify and discuss five sources of representation problems, which are independent though interrelated: ambiguity, variation, uncertainty, error and bias. We outline and characterize these sources, discussing how their improper treatment can have stark consequences for research outcomes. Finally, we discuss how an adequate treatment can inform corpus-related linguistic research, both computational and theoretical, improving the reliability of research results and NLP models, and informing the broader issue of reproducibility.