Jean Senellart


2020

pdf bib
Boosting Neural Machine Translation with Similar Translations
Jitao Xu | Josep Crego | Jean Senellart
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper explores data augmentation methods for training Neural Machine Translation to make use of similar translations, in a comparable way a human translator employs fuzzy matches. In particular, we show how we can simply present the neural model with information of both source and target sides of the fuzzy matches, we also extend the similarity to include semantically related translations retrieved using sentence distributed representations. We show that translations based on fuzzy matching provide the model with “copy” information while translations based on embedding similarities tend to extend the translation “context”. Results indicate that the effect from both similar sentences are adding up to further boost accuracy, combine naturally with model fine-tuning and are providing dynamic adaptation for unseen translation pairs. Tests on multiple data sets and domains show consistent accuracy improvements. To foster research around these techniques, we also release an Open-Source toolkit with efficient and flexible fuzzy-match implementation.

pdf bib
Efficient and High-Quality Neural Machine Translation with OpenNMT
Guillaume Klein | Dakun Zhang | Clément Chouteau | Josep Crego | Jean Senellart
Proceedings of the Fourth Workshop on Neural Generation and Translation

This paper describes the OpenNMT submissions to the WNGT 2020 efficiency shared task. We explore training and acceleration of Transformer models with various sizes that are trained in a teacher-student setup. We also present a custom and optimized C++ inference engine that enables fast CPU and GPU decoding with few dependencies. By combining additional optimizations and parallelization techniques, we create small, efficient, and high-quality neural machine translation models.

pdf bib
The OpenNMT Neural Machine Translation Toolkit: 2020 Edition
Guillaume Klein | François Hernandez | Vincent Nguyen | Jean Senellart
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

pdf bib
Integrating Domain Terminology into Neural Machine Translation
Elise Michon | Josep Crego | Jean Senellart
Proceedings of the 28th International Conference on Computational Linguistics

This paper extends existing work on terminology integration into Neural Machine Translation, a common industrial practice to dynamically adapt translation to a specific domain. Our method, based on the use of placeholders complemented with morphosyntactic annotation, efficiently taps into the ability of the neural network to deal with symbolic knowledge to surpass the surface generalization shown by alternative techniques. We compare our approach to state-of-the-art systems and benchmark them through a well-defined evaluation framework, focusing on actual application of terminology and not just on the overall performance. Results indicate the suitability of our method in the use-case where terminology is used in a system trained on generic data only.

2019

pdf bib
SYSTRAN @ WAT 2019: Russian-Japanese News Commentary task
Jitao Xu | TuAnh Nguyen | MinhQuang Pham | Josep Crego | Jean Senellart
Proceedings of the 6th Workshop on Asian Translation

This paper describes Systran’s submissions to WAT 2019 Russian-Japanese News Commentary task. A challenging translation task due to the extremely low resources available and the distance of the language pair. We have used the neural Transformer architecture learned over the provided resources and we carried out synthetic data generation experiments which aim at alleviating the data scarcity problem. Results indicate the suitability of the data augmentation experiments, enabling our systems to rank first according to automatic evaluations.

pdf bib
Enhanced Transformer Model for Data-to-Text Generation
Li Gong | Josep Crego | Jean Senellart
Proceedings of the 3rd Workshop on Neural Generation and Translation

Neural models have recently shown significant progress on data-to-text generation tasks in which descriptive texts are generated conditioned on database records. In this work, we present a new Transformer-based data-to-text generation model which learns content selection and summary generation in an end-to-end fashion. We introduce two extensions to the baseline transformer model: First, we modify the latent representation of the input, which helps to significantly improve the content correctness of the output summary; Second, we include an additional learning objective that accounts for content selection modelling. In addition, we propose two data augmentation methods that succeed to further improve performance of the resulting generation models. Evaluation experiments show that our final model outperforms current state-of-the-art systems as measured by different metrics: BLEU, content selection precision and content ordering. We made publicly available the transformer extension presented in this paper.

pdf bib
SYSTRAN @ WNGT 2019: DGT Task
Li Gong | Josep Crego | Jean Senellart
Proceedings of the 3rd Workshop on Neural Generation and Translation

This paper describes SYSTRAN participation to the Document-level Generation and Trans- lation (DGT) Shared Task of the 3rd Workshop on Neural Generation and Translation (WNGT 2019). We participate for the first time using a Transformer network enhanced with modified input embeddings and optimising an additional objective function that considers content selection. The network takes in structured data of basketball games and outputs a summary of the game in natural language.

2018

pdf bib
Fixing Translation Divergences in Parallel Corpora for Neural MT
MinhQuang Pham | Josep Crego | Jean Senellart | François Yvon
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Corpus-based approaches to machine translation rely on the availability of clean parallel corpora. Such resources are scarce, and because of the automatic processes involved in their preparation, they are often noisy. This paper describes an unsupervised method for detecting translation divergences in parallel sentences. We rely on a neural network that computes cross-lingual sentence similarity scores, which are then used to effectively filter out divergent translations. Furthermore, similarity scores predicted by the network are used to identify and fix some partial divergences, yielding additional parallel segments. We evaluate these methods for English-French and English-German machine translation tasks, and show that using filtered/corrected corpora actually improves MT performance.

pdf bib
OpenNMT: Neural Machine Translation Toolkit
Guillaume Klein | Yoon Kim | Yuntian Deng | Vincent Nguyen | Jean Senellart | Alexander Rush
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)

pdf bib
OpenNMT System Description for WNMT 2018: 800 words/sec on a single-core CPU
Jean Senellart | Dakun Zhang | Bo Wang | Guillaume Klein | Jean-Pierre Ramatchandirin | Josep Crego | Alexander Rush
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

We present a system description of the OpenNMT Neural Machine Translation entry for the WNMT 2018 evaluation. In this work, we developed a heavily optimized NMT inference model targeting a high-performance CPU system. The final system uses a combination of four techniques, all of them lead to significant speed-ups in combination: (a) sequence distillation, (b) architecture modifications, (c) precomputation, particularly of vocabulary, and (d) CPU targeted quantization. This work achieves the fastest performance of the shared task, and led to the development of new features that have been integrated to OpenNMT and available to the community.

pdf bib
Neural Network Architectures for Arabic Dialect Identification
Elise Michon | Minh Quang Pham | Josep Crego | Jean Senellart
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

SYSTRAN competes this year for the first time to the DSL shared task, in the Arabic Dialect Identification subtask. We participate by training several Neural Network models showing that we can obtain competitive results despite the limited amount of training data available for learning. We report our experiments and detail the network architecture and parameters of our 3 runs: our best performing system consists in a Multi-Input CNN that learns separate embeddings for lexical, phonetic and acoustic input features (F1: 0.5289); we also built a CNN-biLSTM network aimed at capturing both spatial and sequential features directly from speech spectrograms (F1: 0.3894 at submission time, F1: 0.4235 with later found parameters); and finally a system relying on binary CNN-biLSTMs (F1: 0.4339).

pdf bib
SYSTRAN Participation to the WMT2018 Shared Task on Parallel Corpus Filtering
MinhQuang Pham | Josep Crego | Jean Senellart
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the participation of SYSTRAN to the shared task on parallel corpus filtering at the Third Conference on Machine Translation (WMT 2018). We participate for the first time using a neural sentence similarity classifier which aims at predicting the relatedness of sentence pairs in a multilingual context. The paper describes the main characteristics of our approach and discusses the results obtained on the data sets published for the shared task.

2017

pdf bib
Adaptation incrémentale de modèles de traduction neuronaux (Incremental adaptation of neural machine translation models)
Christophe Servan | Josep Crego | Jean Senellart
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

L’adaptation au domaine est un verrou scientifique en traduction automatique. Il englobe généralement l’adaptation de la terminologie et du style, en particulier pour la post-édition humaine dans le cadre d’une traduction assistée par ordinateur. Avec la traduction automatique neuronale, nous étudions une nouvelle approche d’adaptation au domaine que nous appelons “spécialisation” et qui présente des résultats prometteurs tant dans la vitesse d’apprentissage que dans les scores de traduction. Dans cet article, nous proposons d’explorer cette approche.

pdf bib
Conception d’une solution de détection d’événements basée sur Twitter (Design of a solution for event detection from Tweeter)
Christophe Servan | Catherine Kobus | Yongchao Deng | Cyril Touffet | Jungi Kim | Inès Kapp | Djamel Mostefa | Josep Crego | Aurélien Coquard | Jean Senellart
Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Volume 3 - Démonstrations

Cet article présente un système d’alertes fondé sur la masse de données issues de Tweeter. L’objectif de l’outil est de surveiller l’actualité, autour de différents domaines témoin incluant les événements sportifs ou les catastrophes naturelles. Cette surveillance est transmise à l’utilisateur sous forme d’une interface web contenant la liste d’événements localisés sur une carte.

pdf bib
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein | Yoon Kim | Yuntian Deng | Jean Senellart | Alexander Rush
Proceedings of ACL 2017, System Demonstrations

pdf bib
Boosting Neural Machine Translation
Dakun Zhang | Jungi Kim | Josep Crego | Jean Senellart
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Training efficiency is one of the main problems for Neural Machine Translation (NMT). Deep networks need for very large data as well as many training iterations to achieve state-of-the-art performance. This results in very high computation cost, slowing down research and industrialisation. In this paper, we propose to alleviate this problem with several training methods based on data boosting and bootstrap with no modifications to the neural network. It imitates the learning process of humans, which typically spend more time when learning “difficult” concepts than easier ones. We experiment on an English-French translation task showing accuracy improvements of up to 1.63 BLEU while saving 20% of training time.

pdf bib
Domain Control for Neural Machine Translation
Catherine Kobus | Josep Crego | Jean Senellart
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Machine translation systems are very sensitive to the domains they were trained on. Several domain adaptation techniques have already been deeply studied. We propose a new technique for neural machine translation (NMT) that we call domain control which is performed at runtime using a unique neural network covering multiple domains. The presented approach shows quality improvements when compared to dedicated domains translating on any of the covered domains and even on out-of-domain data. In addition, model parameters do not need to be re-estimated for each domain, making this effective to real use cases. Evaluation is carried out on English-to-French translation for two different testing scenarios. We first consider the case where an end-user performs translations on a known domain. Secondly, we consider the scenario where the domain is not known and predicted at the sentence level before translating. Results show consistent accuracy improvements for both conditions.

pdf bib
SYSTRAN Purely Neural MT Engines for WMT2017
Yongchao Deng | Jungi Kim | Guillaume Klein | Catherine Kobus | Natalia Segal | Christophe Servan | Bo Wang | Dakun Zhang | Josep Crego | Jean Senellart
Proceedings of the Second Conference on Machine Translation

2012

pdf bib
Joint WMT 2012 Submission of the QUAERO Project
Markus Freitag | Stephan Peitz | Matthias Huck | Hermann Ney | Jan Niehues | Teresa Herrmann | Alex Waibel | Hai-son Le | Thomas Lavergne | Alexandre Allauzen | Bianka Buschbeck | Josep Maria Crego | Jean Senellart
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
Collaborative Machine Translation Service for Scientific texts
Patrik Lambert | Jean Senellart | Laurent Romary | Holger Schwenk | Florian Zipser | Patrice Lopez | Frédéric Blain
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

pdf bib
Joint WMT Submission of the QUAERO Project
Markus Freitag | Gregor Leusch | Joern Wuebker | Stephan Peitz | Hermann Ney | Teresa Herrmann | Jan Niehues | Alex Waibel | Alexandre Allauzen | Gilles Adda | Josep Maria Crego | Bianka Buschbeck | Tonio Wandmacher | Jean Senellart
Proceedings of the Sixth Workshop on Statistical Machine Translation

2009

pdf bib
Statistical Post Editing and Dictionary Extraction: Systran/Edinburgh Submissions for ACL-WMT2009
Loic Dugast | Jean Senellart | Philipp Koehn
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
SMT and SPE Machine Translation Systems for WMT‘09
Holger Schwenk | Sadaf Abdul-Rauf | Loïc Barrault | Jean Senellart
Proceedings of the Fourth Workshop on Statistical Machine Translation

2008

pdf bib
First Steps towards a General Purpose French/English Statistical Machine Translation System
Holger Schwenk | Jean-Baptiste Fouet | Jean Senellart
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Can we Relearn an RBMT System?
Loïc Dugast | Jean Senellart | Philipp Koehn
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Tighter Integration of Rule-Based and Statistical MT in Serial System Combination
Nicola Ueffing | Jens Stephan | Evgeny Matusov | Loïc Dugast | George Foster | Roland Kuhn | Jean Senellart | Jin Yang
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf bib
Statistical Post-Editing on SYSTRAN‘s Rule-Based Translation System
Loïc Dugast | Jean Senellart | Philipp Koehn
Proceedings of the Second Workshop on Statistical Machine Translation

2003

pdf bib
SYSTRAN’s Chinese Word Segmentation
Jin Yang | Jean Senellart | Remi Zajac
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

1999

bib
Semi-automatic acquisition of lexical resources for new languages or new domains
Jean Senellart
EAMT Workshop: EU and the new languages

1998

pdf bib
Locating noun phrases with finite state transducers.
Jean Senellart
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
Locating Noun Phrases with Finite State Transducers
Jean Senellart
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
Tools for locating noun phrases with finite state transducers
Jean Senellart
The Computational Treatment of Nominals