Alexander Gelbukh

Also published as: Alexander F. Gelbukh


2020

pdf bib
NLP-CIC at SemEval-2020 Task 9: Analysing Sentiment in Code-switching Language Using a Simple Deep-learning Classifier
Jason Angel | Segun Taofeek Aroyehun | Antonio Tamayo | Alexander Gelbukh
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Code-switching is a phenomenon in which two or more languages are used in the same message. Nowadays, it is quite common to find messages with languages mixed in social media. This phenomenon presents a challenge for sentiment analysis. In this paper, we use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages. Our simple approach achieved a F1-score of 0:71 on test set on the competition. We analyze our best model capabilities and perform error analysis to expose important difficulties for classifying sentiment in a code-switching setting.

pdf bib
COSMIC: COmmonSense knowledge for eMotion Identification in Conversations
Deepanway Ghosal | Navonil Majumder | Alexander Gelbukh | Rada Mihalcea | Soujanya Poria
Findings of the Association for Computational Linguistics: EMNLP 2020

In this paper, we address the task of utterance level emotion recognition in conversations using commonsense knowledge. We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations, and build upon them to learn interactions between interlocutors participating in a conversation. Current state-of-theart methods often encounter difficulties in context propagation, emotion shift detection, and differentiating between related emotion classes. By learning distinct commonsense representations, COSMIC addresses these challenges and achieves new state-of-the-art results for emotion recognition on four different benchmark conversational datasets. Our code is available at https://github.com/declare-lab/conv-emotion.

pdf bib
MIME: MIMicking Emotions for Empathetic Response Generation
Navonil Majumder | Pengfei Hong | Shanshan Peng | Jiankun Lu | Deepanway Ghosal | Alexander Gelbukh | Rada Mihalcea | Soujanya Poria
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Current approaches to empathetic response generation view the set of emotions expressed in the input text as a flat structure, where all the emotions are treated uniformly. We argue that empathetic responses often mimic the emotion of the user to a varying degree, depending on its positivity or negativity and content. We show that the consideration of these polarity-based emotion clusters and emotional mimicry results in improved empathy and contextual relevance of the response as compared to the state-of-the-art. Also, we introduce stochasticity into the emotion mixture that yields emotionally more varied empathetic responses than the previous work. We demonstrate the importance of these factors to empathetic response generation using both automatic- and human-based evaluations. The implementation of MIME is publicly available at https://github.com/declare-lab/MIME.

2019

pdf bib
DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation
Deepanway Ghosal | Navonil Majumder | Soujanya Poria | Niyati Chhaya | Alexander Gelbukh
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Emotion recognition in conversation (ERC) has received much attention, lately, from researchers due to its potential widespread applications in diverse areas, such as health-care, education, and human resources. In this paper, we present Dialogue Graph Convolutional Network (DialogueGCN), a graph neural network based approach to ERC. We leverage self and inter-speaker dependency of the interlocutors to model conversational context for emotion recognition. Through the graph network, DialogueGCN addresses context propagation issues present in the current RNN-based methods. We empirically show that this method alleviates such issues, while outperforming the current state of the art on a number of benchmark emotion classification datasets.

pdf bib
Detection of Adverse Drug Reaction in Tweets Using a Combination of Heterogeneous Word Embeddings
Segun Taofeek Aroyehun | Alexander Gelbukh
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

This paper details our approach to the task of detecting reportage of adverse drug reaction in tweets as part of the 2019 social media mining for healthcare applications shared task. We employed a combination of three types of word representations as input to a LSTM model. With this approach, we achieved an F1 score of 0.5209.

pdf bib
CIC at SemEval-2019 Task 5: Simple Yet Very Efficient Approach to Hate Speech Detection, Aggressive Behavior Detection, and Target Classification in Twitter
Iqra Ameer | Muhammad Hammad Fahim Siddiqui | Grigori Sidorov | Alexander Gelbukh
Proceedings of the 13th International Workshop on Semantic Evaluation

In recent years, the use of social media has in-creased incredibly. Social media permits Inter-net users a friendly platform to express their views and opinions. Along with these nice and distinct communication chances, it also allows bad things like usage of hate speech. Online automatic hate speech detection in various aspects is a significant scientific problem. This paper presents the Instituto Politécnico Nacional (Mexico) approach for the Semeval 2019 Task-5 [Hateval 2019] (Basile et al., 2019) competition for Multilingual Detection of Hate Speech on Twitter. The goal of this paper is to detect (A) Hate speech against immigrants and women, (B) Aggressive behavior and target classification, both for English and Spanish. In the proposed approach, we used a bag of words model with preprocessing (stem-ming and stop words removal). We submitted two different systems with names: (i) CIC-1 and (ii) CIC-2 for Hateval 2019 shared task. We used TF values in the first system and TF-IDF for the second system. The first system, CIC-1 got 2nd rank in subtask B for both English and Spanish languages with EMR score of 0.568 for English and 0.675 for Spanish. The second system, CIC-2 was ranked 4th in sub-task A and 1st in subtask B for Spanish language with a macro-F1 score of 0.727 and EMR score of 0.705 respectively.

2018

pdf bib
IARM: Inter-Aspect Relation Modeling with Memory Networks in Aspect-Based Sentiment Analysis
Navonil Majumder | Soujanya Poria | Alexander Gelbukh | Md. Shad Akhtar | Erik Cambria | Asif Ekbal
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Sentiment analysis has immense implications in e-commerce through user feedback mining. Aspect-based sentiment analysis takes this one step further by enabling businesses to extract aspect specific sentimental information. In this paper, we present a novel approach of incorporating the neighboring aspects related information into the sentiment classification of the target aspect using memory networks. We show that our method outperforms the state of the art by 1.6% on average in two distinct domains: restaurant and laptop.

pdf bib
Complex Word Identification: Convolutional Neural Network vs. Feature Engineering
Segun Taofeek Aroyehun | Jason Angel | Daniel Alejandro Pérez Alvarez | Alexander Gelbukh
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We describe the systems of NLP-CIC team that participated in the Complex Word Identification (CWI) 2018 shared task. The shared task aimed to benchmark approaches for identifying complex words in English and other languages from the perspective of non-native speakers. Our goal is to compare two approaches: feature engineering and a deep neural network. Both approaches achieved comparable performance on the English test set. We demonstrated the flexibility of the deep-learning approach by using the same deep neural network setup in the Spanish track. Our systems achieved competitive results: all our systems were within 0.01 of the system with the best macro-F1 score on the test sets except on Wikipedia test set, on which our best system is 0.04 below the best macro-F1 score.

pdf bib
Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling
Segun Taofeek Aroyehun | Alexander Gelbukh
Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)

With the advent of the read-write web which facilitates social interactions in online spaces, the rise of anti-social behaviour in online spaces has attracted the attention of researchers. In this paper, we address the challenge of automatically identifying aggression in social media posts. Our team, saroyehun, participated in the English track of the Aggression Detection in Social Media Shared Task. On this task, we investigate the efficacy of deep neural network models of varying complexity. Our results reveal that deep neural network models require more data points to do better than an NBSVM linear baseline based on character n-grams. Our improved deep neural network models were trained on augmented data and pseudo labeled examples. Our LSTM classifier receives a weighted macro-F1 score of 0.6425 to rank first overall on the Facebook subtask of the shared task. On the social media sub-task, our CNN-LSTM model records a weighted macro-F1 score of 0.5920 to place third overall.

pdf bib
Automatic Identification of Drugs and Adverse Drug Reaction Related Tweets
Segun Taofeek Aroyehun | Alexander Gelbukh
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

We describe our submissions to the Third Social Media Mining for Health Applications Shared Task. We participated in two tasks (tasks 1 and 3). For both tasks, we experimented with a traditional machine learning model (Naive Bayes Support Vector Machine (NBSVM)), deep learning models (Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM)), and the combination of deep learning model with SVM. We observed that the NBSVM reaches superior performance on both tasks on our development split of the training data sets. Official result for task 1 based on the blind evaluation data shows that the predictions of the NBSVM achieved our team’s best F-score of 0.910 which is above the average score received by all submissions to the task. On task 3, the combination of of BiLSTM and SVM gives our best F-score for the positive class of 0.394.

2017

pdf bib
NITMZ-JU at IJCNLP-2017 Task 4: Customer Feedback Analysis
Somnath Banerjee | Partha Pakray | Riyanka Manna | Dipankar Das | Alexander Gelbukh
Proceedings of the IJCNLP 2017, Shared Tasks

In this paper, we describe a deep learning framework for analyzing the customer feedback as part of our participation in the shared task on Customer Feedback Analysis at the 8th International Joint Conference on Natural Language Processing (IJCNLP 2017). A Convolutional Neural Network (CNN) based deep neural network model was employed for the customer feedback task. The proposed system was evaluated on two languages, namely, English and French.

2016

pdf bib
JUNITMZ at SemEval-2016 Task 1: Identifying Semantic Similarity Using Levenshtein Ratio
Sandip Sarkar | Dipankar Das | Partha Pakray | Alexander Gelbukh
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Deep Convolutional Neural Network Textual Features and Multiple Kernel Learning for Utterance-level Multimodal Sentiment Analysis
Soujanya Poria | Erik Cambria | Alexander Gelbukh
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

pdf bib
UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment
Sergio Jimenez | George Dueñas | Julia Baquero | Alexander Gelbukh
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Open Information Extraction for Spanish Language based on Syntactic Constraints
Alisa Zhila | Alexander Gelbukh
Proceedings of the ACL 2014 Student Research Workshop

pdf bib
A Rule-Based Approach to Aspect Extraction from Product Reviews
Soujanya Poria | Erik Cambria | Lun-Wei Ku | Chen Gui | Alexander Gelbukh
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)

2013

pdf bib
Using Factual Density to Measure Informativeness of Web Documents
Christopher Horn | Alisa Zhila | Alexander Gelbukh | Roman Kern | Elisabeth Lex
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

pdf bib
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity
Sergio Jimenez | Claudia Becerra | Alexander Gelbukh
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

pdf bib
SOFTCARDINALITY: Learning to Identify Directional Cross-Lingual Entailment from Cardinalities and SMT
Sergio Jimenez | Claudia Becerra | Alexander Gelbukh
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distributional Statistics and POS tags
Sergio Jimenez | Claudia Becerra | Alexander Gelbukh
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

pdf bib
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
Sergio Jimenez | Claudia Becerra | Alexander Gelbukh
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib
Soft Cardinality: A Parameterized Similarity Function for Text Comparison
Sergio Jimenez | Claudia Becerra | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
JU_CSE_NLP: Multi-grade Classification of Semantic Similarity between Text Pairs
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
Soft Cardinality + ML: Learning Adaptive Similarity Functions for Cross-lingual Textual Entailment
Sergio Jimenez | Claudia Becerra | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

pdf bib
JU_CSE_NLP: Language Independent Cross-lingual Textual Entailment System
Snehasis Neogi | Partha Pakray | Sivaji Bandyopadhyay | Alexander Gelbukh
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2004

pdf bib
A Very Large Dictionary with Paradigmatic, Syntagmatic, and Paronymic
Igor Bolshakov | Alexander Gelbukh
Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries

2000

pdf bib
Book Reviews: Foundations of Computational Linguistics: Man-Machine Communication in Natural Language
Alexander F. Gelbukh
Computational Linguistics, Volume 26, Number 3, September 2000