Muhammad Abdul-Mageed


pdf bib
AraNet: A Deep Learning Toolkit for Arabic Social Media
Muhammad Abdul-Mageed | Chiyu Zhang | Azadeh Hashemi | El Moatez Billah Nagoudi
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

We describe AraNet, a collection of deep learning Arabic social media processing tools. Namely, we exploit an extensive host of both publicly available and novel social media datasets to train bidirectional encoders from transformers (BERT) focused at social meaning extraction. AraNet models predict age, dialect, gender, emotion, irony, and sentiment. AraNet either delivers state-of-the-art performance on a number of these tasks and performs competitively on others. AraNet is exclusively based on a deep learning framework, giving it the advantage of being feature-engineering free. To the best of our knowledge, AraNet is the first to performs predictions across such a wide range of tasks for Arabic NLP. As such, AraNet has the potential to meet critical needs. We publicly release AraNet to accelerate research, and to facilitate model-based comparisons across the different tasks

pdf bib
Understanding and Detecting Dangerous Speech in Social Media
Ali Alshehri | El Moatez Billah Nagoudi | Muhammad Abdul-Mageed
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

Social media communication has become a significant part of daily activity in modern societies. For this reason, ensuring safety in social media platforms is a necessity. Use of dangerous language such as physical threats in online environments is a somewhat rare, yet remains highly important. Although several works have been performed on the related issue of detecting offensive and hateful language, dangerous speech has not previously been treated in any significant way. Motivated by these observations, we report our efforts to build a labeled dataset for dangerous speech. We also exploit our dataset to develop highly effective models to detect dangerous content. Our best model performs at 59.60% macro F1, significantly outperforming a competitive baseline.

pdf bib
Leveraging Affective Bidirectional Transformers for Offensive Language Detection
AbdelRahim Elmadany | Chiyu Zhang | Muhammad Abdul-Mageed | Azadeh Hashemi
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

Social media are pervasive in our life, making it necessary to ensure safe online experiences by detecting and removing offensive and hate speech. In this work, we report our submission to the Offensive Language and hate-speech Detection shared task organized with the 4th Workshop on Open-Source Arabic Corpora and Processing Tools Arabic (OSACT4). We focus on developing purely deep learning systems, without a need for feature engineering. For that purpose, we develop an effective method for automatic data augmentation and show the utility of training both offensive and hate speech models off (i.e., by fine-tuning) previously trained affective models (i.e., sentiment and emotion). Our best models are significantly better than a vanilla BERT model, with 89.60% acc (82.31% macro F1) for hate speech and 95.20% acc (70.51% macro F1) on official TEST data.

pdf bib
Growing Together: Modeling Human Language Learning With n-Best Multi-Checkpoint Machine Translation
El Moatez Billah Nagoudi | Muhammad Abdul-Mageed | Hasan Cavusoglu
Proceedings of the Fourth Workshop on Neural Generation and Translation

We describe our submission to the 2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE). We view MT models at various training stages (i.e., checkpoints) as human learners at different levels. Hence, we employ an ensemble of multi-checkpoints from the same model to generate translation sequences with various levels of fluency. From each checkpoint, for our best model, we sample n-Best sequences (n=10) with a beam width =100. We achieve an 37.57 macro F1 with a 6 checkpoint model ensemble on the official shared task test data, outperforming a baseline Amazon translation system of 21.30 macro F1 and ultimately demonstrating the utility of our intuitive method.

pdf bib
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Imed Zitouni | Muhammad Abdul-Mageed | Houda Bouamor | Fethi Bougares | Mahmoud El-Haj | Nadi Tomeh | Wajdi Zaghouani
Proceedings of the Fifth Arabic Natural Language Processing Workshop

pdf bib
Machine Generation and Detection of Arabic Manipulated and Fake News
El Moatez Billah Nagoudi | AbdelRahim Elmadany | Muhammad Abdul-Mageed | Tariq Alhindi
Proceedings of the Fifth Arabic Natural Language Processing Workshop

Fake news and deceptive machine-generated text are serious problems threatening modern societies, including in the Arab world. This motivates work on detecting false and manipulated stories online. However, a bottleneck for this research is lack of sufficient data to train detection models. We present a novel method for automatically generating Arabic manipulated (and potentially fake) news stories. Our method is simple and only depends on availability of true stories, which are abundant online, and a part of speech tagger (POS). To facilitate future work, we dispense with both of these requirements altogether by providing AraNews, a novel and large POS-tagged news dataset that can be used off-the-shelf. Using stories generated based on AraNews, we carry out a human annotation study that casts light on the effects of machine manipulation on text veracity. The study also measures human ability to detect Arabic machine manipulated text generated by our method. Finally, we develop the first models for detecting manipulated Arabic news and achieve state-of-the-art results on Arabic fake news detection (macro F1=70.06). Our models and data are publicly available.

pdf bib
NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed | Chiyu Zhang | Houda Bouamor | Nizar Habash
Proceedings of the Fifth Arabic Natural Language Processing Workshop

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This Shared Task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain. As such, NADI is the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.

pdf bib
Automatic Detection of Machine Generated Text: A Critical Survey
Ganesh Jawahar | Muhammad Abdul-Mageed | Laks Lakshmanan, V.S.
Proceedings of the 28th International Conference on Computational Linguistics

Text generative models (TGMs) excel in producing text that matches the style of human language reasonably well. Such TGMs can be misused by adversaries, e.g., by automatically generating fake news and fake product reviews that can look authentic and fool humans. Detectors that can distinguish text generated by TGM from human written text play a vital role in mitigating such misuse of TGMs. Recently, there has been a flurry of works from both natural language processing (NLP) and machine learning (ML) communities to build accurate detectors for English. Despite the importance of this problem, there is currently no work that surveys this fast-growing literature and introduces newcomers to important research challenges. In this work, we fill this void by providing a critical survey and review of this literature to facilitate a comprehensive understanding of this problem. We conduct an in-depth error analysis of the state-of-the-art detector and discuss research directions to guide future work in this exciting area.

pdf bib
Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments
Muhammad Abdul-Mageed | Chiyu Zhang | AbdelRahim Elmadany | Lyle Ungar
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Although prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties. Inspired by geolocation research, we propose the novel task of Micro-Dialect Identification (MDI) and introduce MARBERT, a new language model with striking abilities to predict a fine-grained variety (as small as that of a city) given a single, short message. For modeling, we offer a range of novel spatially and linguistically-motivated multi-task learning models. To showcase the utility of our models, we introduce a new, large-scale dataset of Arabic micro-varieties (low-resource) suited to our tasks. MARBERT predicts micro-dialects with 9.9% F1, 76 better than a majority class baseline. Our new language model also establishes new state-of-the-art on several external tasks.

pdf bib
One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble
Kaili Vesik | Muhammad Abdul-Mageed | Miikka Silfverberg
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis. Similar to other speech and language processing tasks, in a scenario where only small-sized training data are available, learning G2P models is challenging. We describe a simple approach of exploiting model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models are developed as part of our participation in the SIGMORPHON 2020 Shared Task 1 focused at G2P. Our best models achieve 14.99 word error rate (WER) and 3.30 phoneme error rate (PER), a sizeable improvement over the shared task competitive baselines.


pdf bib
No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects
Chiyu Zhang | Muhammad Abdul-Mageed
Proceedings of the Fourth Arabic Natural Language Processing Workshop

We present our deep leaning system submitted to MADAR shared task 2 focused on twitter user dialect identification. We develop tweet-level identification models based on GRUs and BERT in supervised and semi-supervised set-tings. We then introduce a simple, yet effective, method of porting tweet-level labels at the level of users. Our system ranks top 1 in the competition, with 71.70% macro F1 score and 77.40% accuracy.

pdf bib
Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation
Michael Przystupa | Muhammad Abdul-Mageed
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

We present our contribution to the WMT19 Similar Language Translation shared task. We investigate the utility of neural machine translation on three low-resource, similar language pairs: Spanish – Portuguese, Czech – Polish, and Hindi – Nepali. Since state-of-the-art neural machine translation systems still require large amounts of bitext, which we do not have for the pairs we consider, we focus primarily on incorporating monolingual data into our models with backtranslation. In our analysis, we found Transformer models to work best on Spanish – Portuguese and Czech – Polish translation, whereas LSTMs with global attention worked best on Hindi – Nepali translation.

pdf bib
UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content With Enhanced Training Data
Arun Rajendran | Chiyu Zhang | Muhammad Abdul-Mageed
Proceedings of the 13th International Workshop on Semantic Evaluation

We examine learning offensive content on Twitter with limited, imbalanced data. For the purpose, we investigate the utility of using various data enhancement methods with a host of classical ensemble classifiers. Among the 75 participating teams in SemEval-2019 sub-task B, our system ranks 6th (with 0.706 macro F1-score). For sub-task C, among the 65 participating teams, our system ranks 9th (with 0.587 macro F1-score).

pdf bib
UBC-NLP at SemEval-2019 Task 4: Hyperpartisan News Detection With Attention-Based Bi-LSTMs
Chiyu Zhang | Arun Rajendran | Muhammad Abdul-Mageed
Proceedings of the 13th International Workshop on Semantic Evaluation

We present our deep learning models submitted to the SemEval-2019 Task 4 competition focused at Hyperpartisan News Detection. We acquire best results with a Bi-LSTM network equipped with a self-attention mechanism. Among 33 participating teams, our submitted system ranks top 7 (65.3% accuracy) on the ‘labels-by-publisher’ sub-task and top 24 out of 44 teams (68.3% accuracy) on the ‘labels-by-article’ sub-task (65.3% accuracy). We also report a model that scores higher than the 8th ranking system (78.5% accuracy) on the ‘labels-by-article’ sub-task.


pdf bib
You Tweet What You Speak: A City-Level Dataset of Arabic Dialects
Muhammad Abdul-Mageed | Hassan Alhuzali | Mohamed Elaraby
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Enabling Deep Learning of Emotion With First-Person Seed Expressions
Hassan Alhuzali | Muhammad Abdul-Mageed | Lyle Ungar
Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

The computational treatment of emotion in natural language text remains relatively limited, and Arabic is no exception. This is partly due to lack of labeled data. In this work, we describe and manually validate a method for the automatic acquisition of emotion labeled data and introduce a newly developed data set for Modern Standard and Dialectal Arabic emotion detection focused at Robert Plutchik’s 8 basic emotion types. Using a hybrid supervision method that exploits first person emotion seeds, we show how we can acquire promising results with a deep gated recurrent neural network. Our best model reaches 70% F-score, significantly (i.e., 11%, p < 0.05) outperforming a competitive baseline. Applying our method and data on an external dataset of 4 emotions released around the same time we finalized our work, we acquire 7% absolute gain in F-score over a linear SVM classifier trained on gold data, thus validating our approach.

pdf bib
Deep Models for Arabic Dialect Identification on Benchmarked Data
Mohamed Elaraby | Muhammad Abdul-Mageed
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

The Arabic Online Commentary (AOC) (Zaidan and Callison-Burch, 2011) is a large-scale repos-itory of Arabic dialects with manual labels for4varieties of the language. Existing dialect iden-tification models exploiting the dataset pre-date the recent boost deep learning brought to NLPand hence the data are not benchmarked for use with deep learning, nor is it clear how much neural networks can help tease the categories in the data apart. We treat these two limitations:We (1) benchmark the data, and (2) empirically test6different deep learning methods on thetask, comparing peformance to several classical machine learning models under different condi-tions (i.e., both binary and multi-way classification). Our experimental results show that variantsof (attention-based) bidirectional recurrent neural networks achieve best accuracy (acc) on thetask, significantly outperforming all competitive baselines. On blind test data, our models reach87.65%acc on the binary task (MSA vs. dialects),87.4%acc on the 3-way dialect task (Egyptianvs. Gulf vs. Levantine), and82.45%acc on the 4-way variants task (MSA vs. Egyptian vs. Gulfvs. Levantine). We release our benchmark for future work on the dataset

pdf bib
UBC-NLP at IEST 2018: Learning Implicit Emotion With an Ensemble of Language Models
Hassan Alhuzali | Mohamed Elaraby | Muhammad Abdul-Mageed
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

We describe UBC-NLP contribution to IEST-2018, focused at learning implicit emotion in Twitter data. Among the 30 participating teams, our system ranked the 4th (with 69.3% F-score). Post competition, we were able to score slightly higher than the 3rd ranking system (reaching 70.7%). Our system is trained on top of a pre-trained language model (LM), fine-tuned on the data provided by the task organizers. Our best results are acquired by an average of an ensemble of language models. We also offer an analysis of system performance and the impact of training data size on the task. For example, we show that training our best model for only one epoch with < 40% of the data enables better performance than the baseline reported by Klinger et al. (2018) for the task.


pdf bib
EmoNet: Fine-Grained Emotion Detection with Gated Recurrent Neural Networks
Muhammad Abdul-Mageed | Lyle Ungar
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Accurate detection of emotion from natural language has applications ranging from building emotional chatbots to better understanding individuals and their lives. However, progress on emotion detection has been hampered by the absence of large labeled datasets. In this work, we build a very large dataset for fine-grained emotions and develop deep learning models on it. We achieve a new state-of-the-art on 24 fine-grained types of emotions (with an average accuracy of 87.58%). We also extend the task beyond emotion types to model Robert Plutick’s 8 primary emotion dimensions, acquiring a superior accuracy of 95.68%.

pdf bib
Not All Segments are Created Equal: Syntactically Motivated Sentiment Analysis in Lexical Space
Muhammad Abdul-Mageed
Proceedings of the Third Arabic Natural Language Processing Workshop

Although there is by now a considerable amount of research on subjectivity and sentiment analysis on morphologically-rich languages, it is still unclear how lexical information can best be modeled in these languages. To bridge this gap, we build effective models exploiting exclusively gold- and machine-segmented lexical input and successfully employ syntactically motivated feature selection to improve classification. Our best models achieve significantly above the baselines, with 67.93% and 69.37% accuracies for subjectivity and sentiment classification respectively.


pdf bib
Does ‘well-being’ translate on Twitter?
Laura Smith | Salvatore Giorgi | Rishi Solanki | Johannes Eichstaedt | H. Andrew Schwartz | Muhammad Abdul-Mageed | Anneke Buffone | Lyle Ungar
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing


pdf bib
SANA: A Large Scale Multi-Genre, Multi-Dialect Lexicon for Arabic Subjectivity and Sentiment Analysis
Muhammad Abdul-Mageed | Mona Diab
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The computational treatment of subjectivity and sentiment in natural language is usually significantly improved by applying features exploiting lexical resources where entries are tagged with semantic orientation (e.g., positive, negative values). In spite of the fair amount of work on Arabic sentiment analysis over the past few years (e.g., (Abbasi et al., 2008; Abdul-Mageed et al., 2014; Abdul-Mageed et al., 2012; Abdul-Mageed and Diab, 2012a; Abdul-Mageed and Diab, 2012b; Abdul-Mageed et al., 2011a; Abdul-Mageed and Diab, 2011)), the language remains under-resourced as to these polarity repositories compared to the English language. In this paper, we report efforts to build and present SANA, a large-scale, multi-genre, multi-dialect multi-lingual lexicon for the subjectivity and sentiment analysis of the Arabic language and dialects.


pdf bib
ASMA: A System for Automatic Segmentation and Morpho-Syntactic Disambiguation of Modern Standard Arabic
Muhammad Abdul-Mageed | Mona Diab | Sandra Kübler
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013


pdf bib
AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis
Muhammad Abdul-Mageed | Mona Diab
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level. The corpus is labeled using both regular as well as crowd sourcing methods under three different conditions with two types of annotation guidelines. We describe the sub-corpora constituting the corpus and provide examples from the various SSA categories. In the process, we present our linguistically-motivated and genre-nuanced annotation guidelines and provide evidence showing their impact on the labeling task.

pdf bib
SAMAR: A System for Subjectivity and Sentiment Analysis of Arabic Social Media
Muhammad Abdul-Mageed | Sandra Kuebler | Mona Diab
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis


pdf bib
“Yes we can?”: Subjectivity Annotation and Tagging for the Health Domain
Muhammad Abdul-Mageed | Mohammed Korayem | Ahmed YoussefAgha
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

pdf bib
Subjectivity and Sentiment Annotation of Modern Standard Arabic Newswire
Muhammad Abdul-Mageed | Mona Diab
Proceedings of the 5th Linguistic Annotation Workshop

pdf bib
Subjectivity and Sentiment Analysis of Modern Standard Arabic
Muhammad Abdul-Mageed | Mona Diab | Mohammed Korayem
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies