Ehsaneddin Asgari


2020

pdf bib
EmbLexChange at SemEval-2020 Task 1: Unsupervised Embedding-based Detection of Lexical Semantic Changes
Ehsaneddin Asgari | Christoph Ringlstetter | Hinrich Schütze
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes EmbLexChange, a system introduced by the “Life-Language” team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames t_1 and t_2). The underlying assumption is that the lexical-semantic change of word w would affect its co-occurring words and subsequently alters the neighborhoods in the embedding spaces. We show that using a resampling framework for the selection of reference words (with conserved senses), we can more reliably detect lexical-semantic changes in English, German, Swedish, and Latin. EmbLexChange achieved second place in the binary detection of semantic changes in the SemEval-2020.

pdf bib
UniSent: Universal Adaptable Sentiment Lexica for 1000+ Languages
Ehsaneddin Asgari | Fabienne Braune | Benjamin Roth | Christoph Ringlstetter | Mohammad Mofrad
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we introduce UniSent universal sentiment lexica for 1000+ languages. Sentiment lexica are vital for sentiment analysis in absence of document-level annotations, a very common scenario for low-resource languages. To the best of our knowledge, UniSent is the largest sentiment resource to date in terms of the number of covered languages, including many low resource ones. In this work, we use a massively parallel Bible corpus to project sentiment information from English to other languages for sentiment analysis on Twitter data. We introduce a method called DomDrift to mitigate the huge domain mismatch between Bible and Twitter by a confidence weighting scheme that uses domain-specific embeddings to compare the nearest neighbors for a candidate sentiment word in the source (Bible) and target (Twitter) domain. We evaluate the quality of UniSent in a subset of languages for which manually created ground truth was available, Macedonian, Czech, German, Spanish, and French. We show that the quality of UniSent is comparable to manually created sentiment resources when it is used as the sentiment seed for the task of word sentiment prediction on top of embedding representations. In addition, we show that emoticon sentiments could be reliably predicted in the Twitter domain using only UniSent and monolingual embeddings in German, Spanish, French, and Italian. With the publication of this paper, we release the UniSent sentiment lexica at http://language-lab.info/unisent.

2017

pdf bib
Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Ehsaneddin Asgari | Hinrich Schütze
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting – to the best of our knowledge – the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.

2016

pdf bib
Text Analysis and Automatic Triage of Posts in a Mental Health Forum
Ehsaneddin Asgari | Soroush Nasiriany | Mohammad R.K. Mofrad
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

pdf bib
Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance
Ehsaneddin Asgari | Mohammad R.K. Mofrad
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

2013

pdf bib
Linguistic Resources and Topic Models for the Analysis of Persian Poems
Ehsaneddin Asgari | Jean-Cédric Chappelier
Proceedings of the Workshop on Computational Linguistics for Literature