Behrouz Minaei-Bidgoli

Also published as: Behrouz Minaei, Behrouz Minaei-bidgoli


2020

IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using Deep Neural Networks and Linear Baselines
Soroush Javdan | Taha Shangipour ataei | Behrouz Minaei-Bidgoli
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Sentiment Analysis is a well-studied field of Natural Language Processing. However, the rapid growth of social media and the noisy content within it pose significant challenges to addressing this problem with well-established methods and tools. One of these challenges is code-mixing, which means using different languages to convey thoughts in social media texts. Our group, named IUST (username: TAHA), participated in SemEval-2020 shared task 9 on Sentiment Analysis for Code-Mixed Social Media Text, and we attempted to develop a system to predict the sentiment of a given code-mixed tweet. We used different preprocessing techniques and propose methods that range from NBSVM to more complicated deep neural network models. Our best-performing method obtains an F1 score of 0.751 on the Spanish-English sub-task and 0.706 on the Hindi-English sub-task.

Applying Transformers and Aspect-based Sentiment Analysis approaches on Sarcasm Detection
Taha Shangipour ataei | Soroush Javdan | Behrouz Minaei-Bidgoli
Proceedings of the Second Workshop on Figurative Language Processing

Sarcasm is a type of figurative language broadly adopted in social media and daily conversations. Sarcasm can completely alter the meaning of a sentence, which makes the opinion analysis process error-prone. In this paper, we propose to employ Bidirectional Encoder Representations from Transformers (BERT) and aspect-based sentiment analysis approaches in order to extract the relation between the context dialogue sequence and the response, and to determine whether or not the response is sarcastic. Our best-performing method obtains an F1 score of 0.73 on the Twitter dataset and 0.734 on the Reddit dataset in the Shared Task of the Second Workshop on Figurative Language Processing (2020).

2013

An Empirical Study on the Effect of Morphological and Lexical Features in Persian Dependency Parsing
Mojtaba Khallash | Ali Hadian | Behrouz Minaei-Bidgoli
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

A Framework for Spelling Correction in Persian Language Using Noisy Channel Model
Mohammad Hoseyn Sheykholeslam | Behrouz Minaei-Bidgoli | Hossein Juzi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Several methods have been proposed for spelling correction in the Farsi (Persian) language. Unfortunately, no powerful framework has been implemented because Farsi lacks a large training set on which to build an accurate model. A training set of erroneous strings paired with their corrections has been obtained from a large number of books, each of which was typed twice at the Computer Research Center of Islamic Sciences. We trained our error model using this large set. In the testing phase, after finding erroneous words in a sample text, our program proposes candidate corrections. The paper focuses on describing the method for ranking these corrections. This method is a customized version of Noisy Channel Spelling Correction for Farsi. It attempts to find the intended correction c for a typo t that maximizes P(c) P(t | c). Different methods are described and analyzed in this paper to give a wide overview of the field. Our evaluation results show that the Noisy Channel Model, using our corpus and training set in this framework, works more accurately and efficiently than the other methods.
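
The ranking rule in the abstract above, choosing the correction c that maximizes P(c) P(t | c), can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function name `rank_corrections`, the `floor` smoothing constant, and the toy English probabilities are all hypothetical, and in the paper the error model P(t | c) is trained on the double-typed Farsi book corpus.

```python
import math

def rank_corrections(typo, candidates, prior, channel, floor=1e-9):
    """Sort candidate corrections by log P(c) + log P(t | c), best first.

    prior   maps a candidate c to P(c)         (language model, assumed given)
    channel maps a pair (t, c) to P(t | c)     (error model, assumed given)
    floor   is a hypothetical smoothing value for unseen entries
    """
    def score(c):
        p_c = prior.get(c, floor)
        p_t_given_c = channel.get((typo, c), floor)
        # Work in log space so small probabilities do not underflow.
        return math.log(p_c) + math.log(p_t_given_c)
    return sorted(candidates, key=score, reverse=True)

# Toy, made-up probabilities purely for illustration.
prior = {"the": 0.05, "then": 0.005, "than": 0.004}
channel = {
    ("teh", "the"): 0.01,
    ("teh", "then"): 0.0001,
    ("teh", "than"): 0.00005,
}
ranked = rank_corrections("teh", ["then", "than", "the"], prior, channel)
```

With these toy numbers, "the" ranks first because the product of its prior and channel probability is the largest of the three.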

Improving K-Nearest Neighbor Efficacy for Farsi Text Classification
Mohammad Hossein Elahimanesh | Behrouz Minaei | Hossein Malekinezhad
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Text classification is one of the common processes in the field of text mining. Because of the complex nature of the Farsi language, with its multi-part words and compound verbs, most text classification systems are not applicable to Farsi texts. K-Nearest Neighbors (KNN) is one of the most popular methods for text classification and performs well in experiments on different datasets. This paper proposes a method to improve the classification performance of KNN. The effects of removing or keeping stop words and of applying N-grams of different lengths are also studied. For this study, a portion of a standard Farsi corpus called Hamshahri1 and articles from some archived newspapers are used. As the results indicate, classification efficiency improves with this approach, especially when eight-gram indexing and stop-word removal are applied. Using N-grams longer than 3 characters gave very encouraging results for Farsi text classification. The classification results of our method are compared with the results reported in the related work.
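
The general idea of KNN classification over character N-gram indexing can be sketched as below. This is only an illustration of the baseline technique, not the improvement proposed in the paper: the helper names (`char_ngrams`, `cosine`, `knn_classify`), the cosine similarity measure, the majority vote, and the toy English training texts standing in for Farsi documents are all assumptions.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Count the overlapping character n-grams of a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, train, k=3, n=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = char_ngrams(text, n)
    scored = sorted(
        ((cosine(query, char_ngrams(doc, n)), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical; English stands in for Farsi news text).
train = [
    ("football match score goal", "sport"),
    ("the team won the football cup", "sport"),
    ("stock market prices fell", "economy"),
    ("bank interest rates and market", "economy"),
]
label = knn_classify("football goal", train, k=3, n=3)
```

The N-gram length `n` is left as a parameter so that longer settings, such as the eight-grams highlighted in the abstract, can be tried on longer documents.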

2010

A Persian Part-Of-Speech Tagger Based on Morphological Analysis
Mahdi Mohseni | Behrouz Minaei-bidgoli
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech (POS) tagging system. It is a main part of a process for expanding a large Persian corpus called Peykare (or Textual Corpus of Persian Language). Peykare is arranged into two parts: an annotated part and an unannotated part. We use the annotated part to create an automatic morphological analyzer, a main component of the system. The morphosyntactic features of Persian words cause two problems: the number of tags in the corpus increases (586 tags), and the forms of words change. The high number of tags prevents any tagger from working efficiently. On the other hand, the change in word forms reduces the frequency of words with the same lemma, and the number of words belonging to a specific tag falls as well. This also has a negative effect on statistical taggers. By removing these problems, the morphological analyzer helps the tagger cover a large number of tags in the corpus. The method is evaluated on the corpus using a Markov tagger. The experiments show the efficiency of the method for Persian POS tagging.