Lamia Hadrich Belguith

Also published as: Lamia Hadrich-Belguith, Lamia Belguith, Lamia Belguith Hadrich, Lamia Hadrich Belguith


2020

Toward Qualitative Evaluation of Embeddings for Arabic Sentiment Analysis
Amira Barhoumi | Nathalie Camelin | Chafik Aloulou | Yannick Estève | Lamia Hadrich Belguith
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we propose several protocols to evaluate specific embeddings for the Arabic sentiment analysis (SA) task. Arabic is characterized by agglutination and morphological richness, which contribute to a sparsity that could affect embedding quality. This work presents a study that compares word-based and lemma-based embeddings in the SA setting. We first study the evolution of embedding models trained on different types of corpora (polar and non-polar) and explore the variation between embeddings by observing the sentiment stability of neighbors in the embedding spaces. Then, we evaluate the embeddings with a neural architecture based on a convolutional neural network (CNN). We make our pre-trained embeddings freely available to the Arabic NLP research community, along with the resources used to evaluate them. Experiments are conducted on the Large Arabic-Book Reviews (LABR) corpus in a binary (positive/negative) classification setting. Our best result reaches 91.9%, which is higher than the best previously published result (91.5%).

Parallel resources for Tunisian Arabic Dialect Translation
Saméh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Fifth Arabic Natural Language Processing Workshop

The difficulty of processing dialects is clearly seen in the high cost of building a representative corpus, in particular for machine translation. Machine translation systems require a large amount of well-managed training data, which is a challenge in a low-resource setting such as the Tunisian Arabic dialect. In this paper, we present a data augmentation technique to create a parallel corpus of the Tunisian Arabic dialect as written on social media and standard Arabic, in order to build a Machine Translation (MT) model. The created corpus was used to build a sentence-based translation model. This model reached a BLEU score of 15.03% on a test set, whereas it was limited to 13.27% when using the corpus without augmentation.

2019

Plongements lexicaux spécifiques à la langue arabe : application à l’analyse d’opinions (Arabic-specific embeddings: application in Sentiment Analysis)
Amira Barhoumi | Nathalie Camelin | Chafik Aloulou | Yannick Estève | Lamia Hadrich Belguith
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

In this paper, we address the task of sentiment analysis in Arabic. We study the specificity of the Arabic language for polarity detection, focusing on its agglutination and morphological richness. In particular, we studied different representations of the lexical unit: token, lemma, and light stem. We built and tested continuous spaces for these lexical representations and measured the contribution of such vector representations in our specific setting. The performance of the CNN shows a significant gain of 2% over the state of the art.

Proceedings of the Fourth Arabic Natural Language Processing Workshop
Wassim El-Hajj | Lamia Hadrich Belguith | Fethi Bougares | Walid Magdy | Imed Zitouni | Nadi Tomeh | Mahmoud El-Haj
Proceedings of the Fourth Arabic Natural Language Processing Workshop

LIUM-MIRACL Participation in the MADAR Arabic Dialect Identification Shared Task
Saméh Kchaou | Fethi Bougares | Lamia Hadrich-Belguith
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper describes the joint participation of the LIUM and MIRACL Laboratories in the Arabic dialect identification challenge of the MADAR Shared Task (Bouamor et al., 2019), conducted during the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). We participated in the Travel Domain Dialect Identification subtask. We built several systems and explored different techniques, including conventional machine learning methods and deep learning algorithms. Deep learning approaches did not perform well on this task. We experimented with several classification systems and were able to identify the dialect of an input sentence with an F1-score of 65.41% on the official test set, using only the training data supplied by the shared task organizers.

Semantic Language Model for Tunisian Dialect
Abir Masmoudi | Rim Laatar | Mariem Ellouze | Lamia Hadrich Belguith
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

In this paper, we describe the process of creating a statistical Language Model (LM) for the Tunisian Dialect. This work is part of building an Automatic Speech Recognition (ASR) system for the Tunisian Railway Transport Network. Since our field of work is limited, there are several words with similar behaviors (semantic, for example) that do not have the same probability of appearance; grouping them into classes is therefore possible. For these reasons, we propose to build an n-class LM based mainly on the integration of purely semantic data. Each class represents an abstraction of similar labels. To improve the sequence labeling task, we propose using a discriminative algorithm based on the Conditional Random Field (CRF) model. To assess our choice of an n-class word model, we compared it with a 3-gram model on the same evaluation test corpus. Additionally, to assess the impact of using the CRF model for the semantic labelling task that constructs the semantic classes, we compared the n-class model built with CRF-based semantic labelling against the n-class model built without it. The comparison shows that the n-class model obtained with CRF-based semantic labelling has better predictive power than the other two models, which have higher perplexity.

2017

Machine Learning Approach to Evaluate MultiLingual Summaries
Samira Ellouze | Maher Jaoua | Lamia Hadrich Belguith
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres

This paper introduces a new MultiLing text summary evaluation method. The method relies on a machine learning approach that combines multiple features to build models that predict the human score (overall responsiveness) of a new summary. We tried several single and “ensemble learning” classifiers to build the best model. We tested our method on summary-level evaluation, where each text summary is evaluated separately. The correlation between the built models and the human score is better than the correlation between the baselines and the manual score.

Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments
Salima Medhaffar | Fethi Bougares | Yannick Estève | Lamia Hadrich-Belguith
Proceedings of the Third Arabic Natural Language Processing Workshop

Dialectal Arabic (DA) is significantly different from the Arabic language taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). There is much existing research in the field of Arabic Sentiment Analysis (SA); however, it is generally restricted to Modern Standard Arabic (MSA) or to some dialects of economic or political interest. In this paper, we are interested in the SA of the Tunisian Dialect. We use Machine Learning techniques to determine the polarity of comments written in the Tunisian Dialect. First, we evaluate the performance of SA systems with models trained on freely available MSA and multi-dialectal data sets. We then collect and annotate a Tunisian Dialect corpus of 17,000 comments from Facebook. This corpus yields a significant accuracy improvement compared to the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working in the field of Tunisian Sentiment Analysis and similar areas.

2016

Impact de l’agglutination dans l’extraction de termes en arabe standard moderne (Adaptation of a term extractor to the Modern Standard Arabic language)
Wafa Neifar | Thierry Hamon | Pierre Zweigenbaum | Mariem Ellouze | Lamia Hadrich Belguith
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Posters)

In this paper, we present an adaptation to Modern Standard Arabic of a term extractor designed for French and English. The adaptation first consisted of describing the term extraction process in a manner similar to that defined for English and French, taking into account certain morpho-syntactic particularities of the Arabic language. We then considered the phenomenon of agglutination in Arabic. The evaluation was carried out on a corpus of medical texts. The results show that, of 400 maximal candidate terms analyzed, 288 are judged correct with respect to the domain (72.1%). Extraction errors are due to part-of-speech tagging and the lack of vowelization in the texts, as well as to agglutination phenomena.

2015

Détection automatique de l’ironie dans les tweets en français (Automatic irony detection in French tweets)
Jihen Karoui | Farah Benamara Zitoune | Véronique Moriceau | Nathalie Aussenac-Gilles | Lamia Hadrich Belguith
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This paper presents a supervised learning method for detecting irony in French tweets. A binary classifier uses state-of-the-art features with proven performance, as well as new features derived from our corpus study. In particular, we focused on negation and on explicit/implicit oppositions between opinion expressions with different polarities. The results obtained are encouraging.

Towards a Contextual Pragmatic Model to Detect Irony in Tweets
Jihen Karoui | Farah Benamara Zitoune | Véronique Moriceau | Nathalie Aussenac-Gilles | Lamia Hadrich Belguith
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Sentiment Classification of Arabic Documents: Experiments with multi-type features and ensemble algorithms
Amine Bayoudhi | Hatem Ghorbel | Lamia Hadrich Belguith
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

A Conventional Orthography for Tunisian Arabic
Inès Zribi | Rahma Boujelbane | Abir Masmoudi | Mariem Ellouze | Lamia Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia. Tunisian Arabic is an under-resourced language. It has neither a standard orthography nor large collections of written text and dictionaries. Actually, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Tunisian Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Tunisian Arabic, following a previous effort on developing a conventional orthography for Dialectal Arabic (or CODA) demonstrated for Egyptian Arabic. We explain the design principles of CODA and provide a detailed description of its guidelines as applied to Tunisian Arabic.

A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition
Abir Masmoudi | Mariem Ellouze Khmekhem | Yannick Estève | Lamia Hadrich Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we describe an effort to create a corpus and phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR). The corpus, named TARIC (Tunisian Arabic Railway Interaction Corpus), is a collection of audio recordings and transcriptions of dialogues in the Tunisian Railway Transport Network. The phonetic (or pronunciation) dictionary is an important ASR component that serves as an intermediary between acoustic models and language models in ASR systems. The method proposed in this paper to automatically generate a phonetic dictionary is rule-based. To that end, we define a set of pronunciation rules and a lexicon of exceptions. To determine the performance of our phonetic rules, we evaluated our pronunciation dictionary on two types of corpora. The word error rate of the word grapheme-to-phoneme mapping is around 9%.

2013

Mapping Rules for Building a Tunisian Dialect Lexicon and Generating Corpora
Rahma Boujelbane | Mariem Ellouze Khemekhem | Lamia Hadrich Belguith
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Morphological Analysis of Tunisian Dialect
Inès Zribi | Mariem Ellouze Khemakhem | Lamia Hadrich Belguith
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Building bilingual lexicon to create Dialect Tunisian corpora and adapt language model
Rahma Boujelbane | Mariem Ellouze khemekhem | Siwar BenAyed | Lamia Hadrich Belguith
Proceedings of the Second Workshop on Hybrid Approaches to Translation

An Evaluation Summary Method Based on a Combination of Content and Linguistic Metrics
Samira Ellouze | Maher Jaoua | Lamia Hadrich Belguith
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Segmenting Arabic Texts into Elementary Discourse Units (Segmentation de textes arabes en unités discursives minimales) [in French]
Iskandar Keskes | Farah Benamara | Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 1: Long Papers)

An evaluation summary method based on combination of automatic and textual complexity metrics (Une méthode d’évaluation des résumés basée sur la combinaison de métriques automatiques et de complexité textuelle) [in French]
Samira Walha Ellouze | Maher Jaoua | Lamia Hadrich Belguith
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

Étude comparative entre trois approches de résumé automatique de documents arabes (Comparative Study of Three Approaches to Automatic Summarization of Arabic Documents) [in French]
Iskandar Keskes | Mohamed Mahdi Boudabous | Mohamed Hédi Maaloul | Lamia Hadrich Belguith
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

La reconnaissance automatique de la fonction des pronoms démonstratifs en langue arabe (Automatic recognition of demonstrative pronouns function in Arabic) [in French]
Yacine Ben Yahia | Souha Mezghani Hammami | Lamia Hadrich Belguith
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Clause-based Discourse Segmentation of Arabic Texts
Iskandar Keskes | Farah Benamara | Lamia Hadrich Belguith
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes a rule-based approach to segmenting Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first using only punctuation marks, then relying only on lexical cues, and finally using both punctuation marks and lexical cues. The results were compared with manual segmentations produced by experts.

Extraction de lexiques bilingues à partir de Wikipédia (Bilingual lexicon extraction from Wikipedia) [in French]
Rahma Sellami | Fatiha Sadat | Lamia Hadrich Belguith
JEP-TALN-RECITAL 2012, Workshop TALAf 2012: Traitement Automatique des Langues Africaines (TALAf 2012: African Language Processing)

2003

Implémentation du système MASPAR selon une approche multi-agent (Implementation of the MASPAR system using a multi-agent approach)
Chafik Aloulou | Lamia Hadrich Belguith | Ahmed Hadj Kacem | Souha Hammami Mezghani
Proceedings of the Eighth International Conference on Parsing Technologies

Natural language processing is a research area that sees new theories and approaches every day. Automatic analysis systems based on a sequential approach have several drawbacks. To overcome these limitations, we set out to build a syntactic parsing system for Arabic texts based on the multi-agent approach: MASPAR, “Multi-Agent System for Parsing ARabic”.

1998

Multilingual Robust Anaphora Resolution
Ruslan Mitkov | Lamia Belguith | Malgorzata Stys
Proceedings of the Third Conference on Empirical Methods for Natural Language Processing

1997

An Agreement Error Correction Method Based on a Multicriteria Approach: An Application to Arabic Language
Lamia Belguith Hadrich | Abdelmajid Ben Hamadou | Chafik Aloulou
Proceedings of the 10th Research on Computational Linguistics International Conference