Disambiguation of Potentially Idiomatic Expressions with Contextual Embeddings

Murathan Kurfalı, Robert Östling


Abstract
The majority of multiword expressions (MWEs) can be interpreted either figuratively or literally depending on the context, which poses challenges for a number of downstream tasks. Most previous work addresses this ambiguity by building on the observation that MWEs with different usages occur in distinctly different contexts. Following this insight, we explore the usefulness of contextual embeddings by means of both supervised and unsupervised classification. The results show that, in the supervised setting, the state of the art can be substantially improved for all expressions in the experiments. The unsupervised classification similarly yields very impressive results, comparing favorably to the supervised classifier for the majority of the expressions. We also show that multilingual contextual embeddings can be employed for this task without any significant loss in performance; hence, the proposed methodology has the potential to be extended to a number of languages.
Anthology ID:
2020.mwe-1.11
Volume:
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Month:
December
Year:
2020
Address:
online
Venues:
COLING | MWE
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
85–94
URL:
https://www.aclweb.org/anthology/2020.mwe-1.11
PDF:
http://aclanthology.lst.uni-saarland.de/2020.mwe-1.11.pdf