Claude Roux


Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness
Alexandre Berard | Ioan Calapodescu | Marc Dymetman | Claude Roux | Jean-Luc Meunier | Vassilina Nikoulina
Proceedings of the 3rd Workshop on Neural Generation and Translation

We share a French-English parallel corpus of Foursquare restaurant reviews, and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over existing online systems. Finally, we propose task-specific metrics based on sentiment analysis or translation accuracy of domain-specific polysemous words.

Naver Labs Europe’s Systems for the WMT19 Machine Translation Robustness Task
Alexandre Berard | Ioan Calapodescu | Claude Roux
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper describes the systems that we submitted to the WMT19 Machine Translation robustness task. This task aims to improve MT’s robustness to noise found on social media, like informal language, spelling mistakes and other orthographic variations. The organizers provide parallel data extracted from a social media website in two language pairs: French-English and Japanese-English (one for each language direction). The goal is to obtain the best scores on unseen test sets from the same source, according to automatic metrics (BLEU) and human evaluation. We propose one single and one ensemble system for each translation direction. Our ensemble models ranked first in all language pairs, according to BLEU evaluation. We discuss the pre-processing choices that we made, and present our solutions for robustness to noise and domain adaptation.


XRCE at SemEval-2016 Task 5: Feedbacked Ensemble Modeling on Syntactico-Semantic Knowledge for Aspect Based Sentiment Analysis
Caroline Brun | Julien Perez | Claude Roux
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)


Un système hybride pour l’analyse de sentiments associés aux aspects
Caroline Brun | Diana Nicoleta Popa | Claude Roux
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article presents in detail our participation in Task 4 of SemEval 2014 (Aspect-Based Sentiment Analysis). We introduce the task and describe our system, which combines linguistic components with classification modules. We then report its evaluation results alongside those of the best-performing systems. We conclude by presenting some new experiments carried out with a view to improving the system.


XRCE: Hybrid Classification for Aspect-based Sentiment Analysis
Caroline Brun | Diana Nicoleta Popa | Claude Roux
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

Part of Speech Tagging for French Social Media Data
Farhad Nooralahzadeh | Caroline Brun | Claude Roux
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Decomposing Hashtags to Improve Tweet Polarity Classification (Décomposition des « hash tags » pour l’amélioration de la classification en polarité des « tweets ») [in French]
Caroline Brun | Claude Roux
Proceedings of TALN 2014 (Volume 2: Short Papers)

Investigating the Image of Entities in Social Media: Dataset Design and First Results
Julien Velcin | Young-Min Kim | Caroline Brun | Jean-Yves Dormagen | Eric SanJuan | Leila Khouas | Anne Peradotto | Stephane Bonnevay | Claude Roux | Julien Boyadjian | Alejandro Molina | Marie Neihouser
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The objective of this paper is to describe the design of a dataset that deals with the image (i.e., representation, web reputation) of various entities populating the Internet: politicians, celebrities, companies, brands, etc. Our main contribution is to build and provide an original annotated French dataset. This dataset consists of 11527 manually annotated tweets expressing opinions on specific facets (e.g., ethics, communication, economic project) describing two French politicians over time. We believe that other researchers might benefit from this experience, since designing and implementing such a dataset has proven quite an interesting challenge. This design comprises different processes such as data selection, formal definition and instantiation of an image. We have set up a full open-source annotation platform. In addition to the dataset design, we present the first results that we obtained by applying clustering methods to the annotated dataset in order to extract the entity images.


Coupling a Linguistic Formalism and a Script Language
Claude Roux
Proceedings of the Third Workshop on Constraints and Language Processing


Towards an International Standard on Feature Structure Representation
Kiyong Lee | Lou Burnard | Laurent Romary | Eric de la Clergerie | Thierry Declerck | Syd Bauman | Harry Bunt | Lionel Clément | Tomaž Erjavec | Azim Roussanaly | Claude Roux
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)


A Robust and Flexible Platform for Dependency Extraction
Caroline Hagège | Claude Roux
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)


A Multi-Input Dependency Parser
Salah Aït-Mokhtar | Jean-Pierre Chanod | Claude Roux
Proceedings of the Seventh International Workshop on Parsing Technologies


A Step toward Semantic Indexing of an Encyclopedic Corpus
Philippe Alcouffe | Nicolas Gacon | Claude Roux | Frédérique Segond
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)