Mehdi Embarek


2016

STAM : traduction des textes non structurés (dialectes du Maghreb) (STAM: Translation of unstructured text (Maghreb dialects))
Mehdi Embarek | Soumya Embarek
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

The use of communication platforms (social networks, discussion forums, etc.) has grown considerably. These platforms let internet users express their opinion on a topic, request or exchange information, comment on an event, and so on. As a result, these information sources contain a large amount of text written in the local dialects of the regions their authors come from. However, such unstructured texts make it very difficult to apply natural language processing tools. STAM addresses this problem with a system that automatically transcribes texts written in a dialect spoken in the Maghreb countries into text that is easy to interpret and understand (French or English).

2014

The STAM System (Le système STAM) [in French]
Mehdi Embarek
Proceedings of TALN 2014 (Volume 3: System Demonstrations)

2008

Learning Patterns for Building Resources about Semantic Relations in the Medical Domain
Mehdi Embarek | Olivier Ferret
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this article, we present a method for automatically extracting semantic relations from texts in the medical domain using linguistic patterns. These patterns refer to three levels of information about words: inflected form, lemma and part-of-speech. The method first identifies the entities that take part in the relations to extract, namely diseases, exams, treatments, drugs or symptoms. Sentences that contain pairs of entities are then extracted, and the presence of a semantic relation is validated by applying linguistic patterns. These patterns were previously learnt automatically from a manually annotated corpus by relying on an algorithm based on the edit distance. We first report the results of an evaluation of our medical entity tagger for the five types of entities mentioned above and then, more globally, the results of an evaluation of our extraction method for four relations between these entities. Both evaluations were done for French.