Mohammed Hasanuzzaman


CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media
Sayanta Paul | Sriparna Saha | Mohammed Hasanuzzaman
Proceedings of the Fourteenth Workshop on Semantic Evaluation

The SemEval-2020 Task 12 (OffensEval) challenge focuses on detecting signs of offensiveness in posts and comments on social media. The task was organized for several languages, e.g., Arabic, Danish, English, Greek and Turkish. It featured three related sub-tasks for English: sub-task A was to discriminate between offensive and non-offensive posts, sub-task B focused on the type of offensive content in the post, and in sub-task C, systems had to identify the target of the offensive posts. The corpus for each language was developed from posts and comments on Twitter, a popular social media platform. We participated in this challenge and submitted results for different languages. The current work presents different machine learning and deep learning techniques, covering various classifiers and feature engineering schemes, and analyzes their performance for offensiveness prediction. The experimental analysis on the training set shows that an SVM using language-specific pre-trained word embeddings (fastText) outperforms the other methods. Our system achieves a macro-averaged F1 score of 0.45 for Arabic, 0.43 for Greek and 0.54 for Turkish.
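For readers unfamiliar with the metric, a macro-averaged F1 score is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The sketch below is illustrative only: the function name `macro_f1`, the labels `OFF`/`NOT`, and the toy data are assumptions made here, not the authors' evaluation script or data.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # Count true positives, false positives, and false negatives for this label.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: OFF = offensive, NOT = not offensive.
gold = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
print(f"macro-F1 = {score:.3f}")
```

Here the per-class F1 is 0.75 for `NOT` and 0.5 for `OFF`, so the macro average is 0.625 even though `NOT` is twice as frequent.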


Fine-Grained Temporal Orientation and its Relationship with Psycho-Demographic Correlates
Sabyasachi Kamila | Mohammed Hasanuzzaman | Asif Ekbal | Pushpak Bhattacharyya | Andy Way
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Temporal orientation refers to an individual’s tendency to connect to the psychological concepts of past, present or future, and it affects personality, motivation, emotion, decision making and stress coping processes. Studying social media users’ psycho-demographic attributes from the perspective of human temporal orientation can be of great interest to business and administrative decision makers, as it provides valuable additional information for making informed decisions. In this paper, we propose the first study to demonstrate the association between the sentiment view of users’ temporal orientation and their different psycho-demographic attributes by analyzing their tweets. We first create a temporal orientation classifier in a minimally supervised way, which classifies each tweet into one of three temporal categories, namely past, present, and future. A deep Bi-directional Long Short-Term Memory (BLSTM) network is used for the tweet classification task. Our tweet classifier achieves an accuracy of 78.27% when tested on a manually created test set. We then determine the users’ overall temporal orientation based on their tweets. Sentiment is added to the tweets at a fine-grained level, where each temporal tweet is assigned a positive, negative or neutral sentiment. Our experiment reveals that a user’s attributes vary depending on the sentiment view of temporal orientation. Finally, we measure the correlation between the users’ sentiment view of temporal orientation and their different psycho-demographic factors using regression.

Incorporating Deep Visual Features into Multiobjective based Multi-view Search Results Clustering
Sayantan Mitra | Mohammed Hasanuzzaman | Sriparna Saha | Andy Way
Proceedings of the 27th International Conference on Computational Linguistics

The current paper explores the use of multi-view learning for search result clustering. A web-snippet can be represented using multiple views. Apart from a textual view capturing both semantic and syntactic information, a complementary view extracted from the images contained in the web-snippets is also utilized in the current framework. A single consensus partitioning is finally obtained after consulting these two individual views through a multiobjective based clustering technique. Several objective functions are optimized simultaneously using AMOSA, including the values of a cluster quality measure, which measures the goodness of the partitionings obtained using different views, and an agreement-disagreement index, which quantifies the degree of agreement among multiple views in generating partitionings. In order to detect the number of clusters automatically, concepts of variable length solutions and a vast range of permutation operators are introduced in the clustering process. Finally, a set of alternative partitionings is obtained on the final Pareto front by the proposed multi-view based multiobjective technique. Experimental results of the proposed approach on several benchmark test datasets of SRC, with respect to different performance metrics, clearly establish the power of visual and text-based views in achieving better search result clustering.

Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data
Koel Dutta Chowdhury | Mohammed Hasanuzzaman | Qun Liu
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP

In this paper, we investigate the effectiveness of training a multimodal neural machine translation (MNMT) system with image features for a low-resource language pair, Hindi and English, using synthetic data. A three-way parallel corpus which contains bilingual texts and corresponding images is required to train an MNMT system with image features. However, such a corpus is not available for low-resource language pairs. To address this, we developed both a synthetic training dataset and a manually curated development/test dataset for Hindi based on an existing English-image parallel corpus. We used these datasets to build our image description translation system by adopting state-of-the-art MNMT models. Our results show that it is possible to train an MNMT system for low-resource language pairs through the use of synthetic data and that such a system can benefit from image features.


Temporal Orientation of Tweets for Predicting Income of Users
Mohammed Hasanuzzaman | Sabyasachi Kamila | Mandeep Kaur | Sriparna Saha | Asif Ekbal
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Automatically estimating a user’s socio-economic profile from their language use in social media can significantly help social science research and various downstream applications ranging from business to politics. The current paper presents the first study where user cognitive structure is used to build a predictive model of income. In particular, we first develop a classifier using a weakly supervised learning framework to automatically time-tag tweets as past, present, or future. We quantify a user’s overall temporal orientation based on their distribution of tweets, and use it to build a predictive model of income. Our analysis uncovers a correlation between future temporal orientation and income. Finally, we measure the predictive power of future temporal orientation on income by performing regression.
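As a rough illustration of quantifying a user's overall temporal orientation from their distribution of tweets, the sketch below turns per-tweet temporal tags into per-category proportions. The function name `temporal_orientation` and the toy tags are assumptions made here; the abstract does not specify the exact formula the authors use, and the tagging classifier itself is not shown.

```python
from collections import Counter

# The three temporal categories named in the abstract.
CATEGORIES = ("past", "present", "future")


def temporal_orientation(tweet_labels):
    """Return the share of a user's tagged tweets in each temporal category."""
    counts = Counter(tweet_labels)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        # No tagged tweets: report zero orientation in every category.
        return {c: 0.0 for c in CATEGORIES}
    return {c: counts[c] / total for c in CATEGORIES}


# Hypothetical tags produced by some upstream tweet classifier.
labels = ["past", "future", "future", "present", "future"]
orientation = temporal_orientation(labels)
print(orientation)
```

A per-user proportion such as `orientation["future"]` could then serve as a feature in a regression on income, which is the kind of analysis the abstract describes.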

Demographic Word Embeddings for Racism Detection on Twitter
Mohammed Hasanuzzaman | Gaël Dias | Andy Way
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is one such example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models.

ADAPT at IJCNLP-2017 Task 4: A Multinomial Naive Bayes Classification Approach for Customer Feedback Analysis task
Pintu Lohar | Koel Dutta Chowdhury | Haithem Afli | Mohammed Hasanuzzaman | Andy Way
Proceedings of the IJCNLP 2017, Shared Tasks

In this age of the digital economy, organisations do their best to engage customers in the feedback provisioning process. With the assistance of customer insights, an organisation can develop a better product and provide a better service to its customers. In this paper, we analyse real-world samples of customer feedback from Microsoft Office customers in four languages, i.e., English, French, Spanish and Japanese, and conclude a five-plus-one-class categorisation (comment, request, bug, complaint, meaningless and undetermined) for meaning classification. The task is to access multilingual corpora annotated by the proposed meaning categorization scheme and develop a system to determine what class(es) the customer feedback sentences should be annotated as in four languages. We propose the following approaches to accomplish this task: (i) a multinomial naive Bayes (MNB) approach for multi-label classification, (ii) MNB with a one-vs-rest classifier approach, and (iii) the combination of the multi-label classification-based and the sentiment classification-based approaches. Our best system produces F-scores of 0.67, 0.83, 0.72 and 0.7 for English, Spanish, French and Japanese, respectively. The results are competitive with the best ones for all languages and secure 3rd and 5th position for Japanese and French, respectively, among all submitted systems.


Building Tempo-HindiWordNet: A resource for effective temporal information access in Hindi
Dipawesh Pawar | Mohammed Hasanuzzaman | Asif Ekbal
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we put forward a strategy that supplements Hindi WordNet entries with information on the temporality of their word senses. Each synset of Hindi WordNet is automatically annotated with one of five dimensions: past, present, future, neutral and atemporal. We use a semi-supervised learning strategy to build temporal classifiers over the glosses of manually selected initial seed synsets. The classification process is iterated, based on repeated confidence-based expansion of the initial seed list, until cross-validation accuracy drops. The resource is unique in that, to the best of our knowledge, no such resource is yet available for Hindi.

Identifying Temporal Orientation of Word Senses
Mohammed Hasanuzzaman | Gaël Dias | Stéphane Ferrari | Yann Mathet | Andy Way
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning


Propagation Strategies for Building Temporal Ontologies
Mohammed Hasanuzzaman | Gaël Dias | Stéphane Ferrari | Yann Mathet
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers