Olivier Kraif

Also published as: O. Kraif


2020

Multi-Task Sequence Prediction For Tunisian Arabizi Multi-Level Annotation
Elisa Gugliotta | Marco Dinarelli | Olivier Kraif
Proceedings of the Fifth Arabic Natural Language Processing Workshop

In this paper we propose a multi-task sequence prediction system, based on recurrent neural networks, which we use to annotate a Tunisian Arabizi corpus on multiple levels. The annotations performed are text classification, tokenization, PoS tagging and encoding of Tunisian Arabizi into CODA* Arabic orthography. The system is trained to predict all the annotation levels in cascade, starting from Arabizi input. We evaluate the system on the TIGER German corpus, suitably converting the data into a multi-task problem, in order to show the effectiveness of our neural architecture. We also show how we used the system to annotate a Tunisian Arabizi corpus, which was afterwards manually corrected and used to further evaluate sequence models on Tunisian data. Our system is developed for the Fairseq framework, which allows fast and easy reuse for any other sequence prediction problem.

2018

Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
Ali Can Kocabiyikoglu | Laurent Besacier | Olivier Kraif
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Comparing Recurring Lexico-Syntactic Trees (RLTs) and Ngram Techniques for Extended Phraseology Extraction
Agnès Tutin | Olivier Kraif
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

This paper aims at assessing to what extent a syntax-based method (Recurring Lexico-syntactic Trees (RLT) extraction) allows us to extract large phraseological units such as prefabricated routines, e.g. “as previously said” or “as far as we/I know” in scientific writing. In order to evaluate this method, we compare it to the classical ngram extraction technique, on a subset of recurring segments including speech verbs in a French corpus of scientific writing. Results show that the RLT extraction technique is far more efficient for extended MWEs such as routines or collocations but performs more poorly for surface phenomena such as syntactic constructions or fully frozen expressions.

2015

Multialignement vs bialignement : à plusieurs, c’est mieux !
Olivier Kraif
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we propose an original method for aligning a multiparallel corpus, i.e. one comprising more than two languages, by taking all languages into account simultaneously (rather than composing a series of independent bialignments). To do so, we rely on the networks of lexical correspondences formed by transfuges (identical strings) and cognates (related words), and we show how various tilings of the language pairs make it possible to best exploit the surface similarities arising from genetic relationships between languages. We evaluate our method against a classical bialignment method, and show how multialignment yields results that are both more precise and more robust.

2013

The constitution of an Arabic semantic resource from a multilingual aligned corpus (Constitution d’une ressource sémantique arabe à partir de corpus multilingue aligné) [in French]
Authoul Abdul Hay | Olivier Kraif
Proceedings of TALN 2013 (Volume 1: Long Papers)

2012

Le Lexicoscope : un outil pour l’étude de profils combinatoires et l’extraction de constructions lexico-syntaxiques (The Lexicoscope: an integrated tool for combinatoric profiles observation and lexico-syntactic constructs extraction) [in French]
Olivier Kraif | Sascha Diwersy
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

ScienQuest: a Treebank Exploitation Tool for non NLP-Specialists
Achille Falaise | Olivier Kraif | Agnès Tutin | David Rouquet
Proceedings of COLING 2012: Demonstration Papers

2006

Evaluation of multilingual text alignment systems: the ARCADE II project
Yun-Chuang Chiao | Olivier Kraif | Dominique Laurent | Thi Minh Huyen Nguyen | Nasredine Semmar | François Stuck | Jean Véronis | Wajdi Zaghouani
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the ARCADE II project, concerned with the evaluation of parallel text alignment systems. The ARCADE II project aims at exploring the techniques of multilingual text alignment through a fine-grained evaluation of the existing techniques and the development of new alignment methods. The evaluation campaign consists of two tracks devoted to the evaluation of alignment at sentence and word level respectively. It differs from ARCADE I in its multilingual aspect and its investigation of lexical alignment.

2004

Combining clues for lexical level aligning using the Null hypothesis approach
Olivier Kraif | Boxing Chen
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Using a Word Sense Disambiguation system for translation disambiguation: the LIA-LIDILEM team experiment
Grégoire Moreau de Montcheuil | Marc El-Bèze | Boxing Chen | Olivier Kraif
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

NLP-based scripting for CALL activities
G. Antoniadis | S. Echinard | O. Kraif | T. Lebarbé | M. Loiseau | C. Ponton
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning