Aurélien Max


2018

Construction of a Multilingual Corpus Annotated with Translation Relations
Yuming Zhai | Aurélien Max | Anne Vilnat
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

Translation relations, which distinguish literal translation from other translation techniques, constitute an important subject of study for human translators (Chuquet and Paillard, 1989). However, automatic processing techniques based on interlingual relations, such as machine translation or paraphrase generation exploiting translational equivalence, have not yet exploited these relations explicitly. In this work, we present a categorisation of translation relations and annotate them in a parallel multilingual (English, French, Chinese) corpus of oral presentations, the TED Talks. Our long-term objective is to detect these relations automatically in order to integrate them as important features when searching for monolingual segments related by equivalence (paraphrases) or by entailment. The annotated corpus resulting from our work will be made available to the community.

2016

LIMSI’s Contribution to the WMT’16 Biomedical Translation Task
Julia Ive | Aurélien Max | François Yvon
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

Sentence alignment for literary texts: The state-of-the-art and beyond
Yong Xu | Aurélien Max | François Yvon
Linguistic Issues in Language Technology, Volume 12, 2015 - Literature Lifts up Computational Linguistics

Literary works are becoming increasingly available in electronic formats, thus quickly transforming editorial processes and reading habits. In the context of the global enthusiasm for multilingualism, the rapid spread of e-book readers, such as Amazon Kindle® or Kobo Touch®, fosters the development of a new generation of reading tools for bilingual books. In particular, literary works, when available in several languages, offer an attractive perspective for self-development or everyday leisure reading, but also for activities such as language learning, translation or literary studies. An important issue in the automatic processing of multilingual e-books is the alignment between textual units. Alignment could help identify corresponding text units in different languages, which would be particularly beneficial to bilingual readers and translation professionals. Computing automatic alignments for literary works, however, is more challenging than for better-behaved corpora such as parliamentary proceedings or technical manuals. In this paper, we revisit the problem of computing high-quality alignments for literary works. We first perform a large-scale evaluation of automatic alignment for literary texts, which provides a fair assessment of the actual difficulty of this task. We then introduce a two-pass approach based on a maximum entropy model. Experimental results for novels available in English and French or in English and Spanish demonstrate the effectiveness of our method.

Touch-Based Pre-Post-Editing of Machine Translation Output
Benjamin Marie | Aurélien Max
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Multi-Pass Decoding With Complex Feature Guidance for Statistical Machine Translation
Benjamin Marie | Aurélien Max
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

LIMSI @ WMT’14 Medical Translation Task
Nicolas Pécheux | Li Gong | Quoc Khanh Do | Benjamin Marie | Yulia Ivanishcheva | Alexander Allauzen | Thomas Lavergne | Jan Niehues | Aurélien Max | François Yvon
Proceedings of the Ninth Workshop on Statistical Machine Translation

Confidence-based Rewriting of Machine Translation Output
Benjamin Marie | Aurélien Max
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Towards a More Efficient Development of Statistical Machine Translation Systems (Vers un développement plus efficace des systèmes de traduction statistique : un peu de vert dans un monde de BLEU) [in French]
Li Gong | Aurélien Max | François Yvon
Proceedings of TALN 2014 (Volume 2: Short Papers)

(Much) Faster Construction of SMT Phrase Tables from Large-scale Parallel Corpora (Construction (très) rapide de tables de traduction à partir de grands bi-textes) [in French]
Li Gong | Aurélien Max | François Yvon
Proceedings of TALN 2014 (Volume 3: System Demonstrations)

2013

LIMSI’s participation to the 2013 shared task on Native Language Identification
Thomas Lavergne | Gabriel Illouz | Aurélien Max | Ryo Nagata
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

LIMSI @ WMT13
Alexander Allauzen | Nicolas Pécheux | Quoc Khanh Do | Marco Dinarelli | Thomas Lavergne | Aurélien Max | Hai-Son Le | François Yvon
Proceedings of the Eighth Workshop on Statistical Machine Translation

2012

Extraction d’information automatique en domaine médical par projection inter-langue : vers un passage à l’échelle (Automatic Information Extraction in the Medical Domain by Cross-Lingual Projection) [in French]
Asma Ben Abacha | Pierre Zweigenbaum | Aurélien Max
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Validation sur le Web de reformulations locales: application à la Wikipédia (Assisted Rephrasing for Wikipedia Contributors through Web-based Validation) [in French]
Houda Bouamor | Aurélien Max | Gabriel Illouz | Anne Vilnat
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

Une étude en 3D de la paraphrase: types de corpus, langues et techniques (A Study of Paraphrase along 3 Dimensions : Corpus Types, Languages and Techniques) [in French]
Houda Bouamor | Aurélien Max | Anne Vilnat
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

A contrastive review of paraphrase acquisition techniques
Houda Bouamor | Aurélien Max | Gabriel Illouz | Anne Vilnat
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper addresses the issue of what approach should be used for building a corpus of sententential paraphrases depending on one's requirements. Six strategies are studied: (1) multiple translations into a single language from another language; (2) multiple translations into a single language from different other languages; (3) multiple descriptions of short videos; (4) multiple subtitles for the same language; (5) headlines for similar news articles; and (6) sub-sentential paraphrasing in the context of a Web-based game. We report results on French for 50 paraphrase pairs collected for all these strategies, where corpora were manually aligned at the finest possible level to define oracle performance in terms of accessible sub-sentential paraphrases. The differences observed will be used as criteria for motivating the choice of a given approach before attempting to build a new paraphrase corpus.

Aligning Bilingual Literary Works: a Pilot Study
Qian Yu | Aurélien Max | François Yvon
Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature

LIMSI @ WMT12
Hai-Son Le | Thomas Lavergne | Alexandre Allauzen | Marianna Apidianaki | Li Gong | Aurélien Max | Artem Sokolov | Guillaume Wisniewski | François Yvon
Proceedings of the Seventh Workshop on Statistical Machine Translation

WSD for n-best reranking and local language modeling in SMT
Marianna Apidianaki | Guillaume Wisniewski | Artem Sokolov | Aurélien Max | François Yvon
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Validation of sub-sentential paraphrases acquired from parallel monolingual corpora
Houda Bouamor | Aurélien Max | Anne Vilnat
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs
Aurélien Max | Houda Bouamor | Anne Vilnat
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Web-based Validation for Contextual Targeted Paraphrasing
Houda Bouamor | Aurélien Max | Gabriel Illouz | Anne Vilnat
Proceedings of the Workshop on Monolingual Text-To-Text Generation

LIMSI @ WMT11
Alexandre Allauzen | Hélène Bonneau-Maynard | Hai-Son Le | Aurélien Max | Guillaume Wisniewski | François Yvon | Gilles Adda | Josep Maria Crego | Adrien Lardilleux | Thomas Lavergne | Artem Sokolov
Proceedings of the Sixth Workshop on Statistical Machine Translation

Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
Houda Bouamor | Aurélien Max | Anne Vilnat
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT
Josep Maria Crego | Aurélien Max | François Yvon
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Contrastive Lexical Evaluation of Machine Translation
Aurélien Max | Josep Maria Crego | François Yvon
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper advocates a complementary measure of translation performance that focuses on the contrastive ability of two or more systems or system versions to adequately translate source words. This is motivated by three main reasons: (1) existing automatic metrics sometimes do not show significant differences that can be revealed by fine-grained, focused human evaluation; (2) these metrics are based on direct comparisons between system hypotheses and the corresponding reference translations, thus ignoring the input words that were actually translated; and (3) as these metrics do not take hypotheses from several systems at once as input, fine-grained contrastive evaluation can only be done indirectly. This proposal is illustrated on a multi-source Machine Translation scenario where multiple translations of a source text are available. Significant gains (up to +1.3 BLEU points) are achieved in these experiments, and contrastive lexical evaluation is shown to provide new information that can help to better analyse a system's performance.

Mining Naturally-occurring Corrections and Paraphrases from Wikipedia’s Revision History
Aurélien Max | Guillaume Wisniewski
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also prove to be interesting material for linguistic studies. In this article, we present WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), a new freely-available resource built by automatically mining Wikipedia’s revision history. The WiCoPaCo corpus focuses on local modifications made by human revisers and includes various types of corrections (such as spelling-error or typographical corrections) and rewritings, which can be broadly categorized into meaning-preserving and meaning-altering revisions. We present an initial hand-built typology of these revisions, but the resource allows for any annotation scheme. We discuss the main motivations for building such a resource and describe the main technical details guiding its construction. We also present applications and data analysis on French and report initial results on spelling error correction and morphosyntactic rewriting. The WiCoPaCo corpus can be freely downloaded from http://wicopaco.limsi.fr.

Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation
Aurélien Max
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

LIMSI’s Statistical Translation Systems for WMT’09
Alexandre Allauzen | Josep Crego | Aurélien Max | François Yvon
Proceedings of the Fourth Workshop on Statistical Machine Translation

Sub-sentencial Paraphrasing by Contextual Pivot Translation
Aurélien Max
Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer)

2008

An Evaluation of Spoken and Textual Interaction in the RITEL Interactive Question Answering System
Dave Toney | Sophie Rosset | Aurélien Max | Olivier Galibert | Eric Bilinski
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The RITEL project aims to integrate a spoken language dialogue system and an open-domain information retrieval system in order to enable human users to ask a general question and to refine their search for information interactively. This type of system is often referred to as an Interactive Question Answering (IQA) system. In this paper, we present an evaluation of how the performance of the RITEL system differs when users interact with it using spoken versus textual input and output. Our results indicate that while users do not perceive the two versions to perform significantly differently, many more questions are asked in a typical text-based dialogue.

Looking up phrase rephrasings via a pivot language
Aurélien Max | Michael Zock
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

Explorations in using grammatical dependencies for contextual phrase translation disambiguation
Aurélien Max | Rafik Makhloufi | Philippe Langlais
Proceedings of the 12th Annual conference of the European Association for Machine Translation

2004

From Controlled Document Authoring to Interactive Document Normalization
Aurélien Max
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Interpreting communicative goals in constrained domains using generation and interactive negotiation
Aurélien Max
Proceedings of the 2nd Workshop on Text Meaning and Interpretation

The Syntax Student’s Companion: an eLearning Tool designed for (Computational) Linguistics Students
Aurélien Max
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning

2003

Multi-language Machine Translation through Interactive Document Normalization
Aurélien Max
Proceedings of the 7th International EAMT workshop on MT and other language technology tools, Improving MT through other language technology tools, Resource and tools for building MT at EACL 2003

Reversing Controlled Document Authoring to Normalize Documents
Aurélien Max
Student Research Workshop

Towards Interactive Text Understanding
Marc Dymetman | Aurélien Max | Kenji Yamada
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics