Christian Boitet

Also published as: Ch. Boitet


2020

Démo de AMALD-serveur et AMALD-corpus, dédiés à l’analyse morphologique de l’allemand (Demonstration of AMALD-serveur and AMALD-corpus, dedicated to the morphological analysis of German)
Christian Boitet | Vincent Berment | Jean-Philippe Guilbaud | Claire Lemaire
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

The AMALDarium project aims to offer, on the lingwarium.org platform, (1) a wide-coverage, high-quality morphological analysis service for German (AMALD-serveur) that handles inflection, derivation, and compounding, as well as verbs whose separable particle is separated (or attached), (2) a high-quality reference corpus giving all possible results of morphological analysis, before filtering by a statistical or syntactic method, and (3) a platform (AMALD-éval) for organizing comparative evaluations, with a view to improving the performance of learning algorithms in morphology. We present here an online demonstration of AMALD-serveur and AMALD-corpus only. The corpus is an anonymized and verified subset of a German corpus of texts on breast cancer, containing many technical compound words.

2018

Towards an Automatic Classification of Illustrative Examples in a Large Japanese-French Dictionary Obtained by OCR
Christian Boitet | Mathieu Mangeot | Mutsuko Tomokiyo
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

We are working on improving the Cesselin, a large open-source Japanese-French bilingual dictionary digitized by OCR, available on the web, and open to collaborative improvement online. Labelling its examples (about 226,000) would significantly enhance their usefulness for language learners. Examples are proverbs, idiomatic constructions, normal usage examples, and, for nouns, phrases containing a quantifier. Proverbs are easy to spot, but examples of the other types are not. To find a method for annotating them automatically, or at least semi-automatically, we studied many entries and hypothesized that the degree of lexical similarity between the results of MT into a third language might give good cues. To test that hypothesis, we sampled 500 examples and used Google Translate to translate both their Japanese expressions and their French translations into English. The hypothesis holds well, in particular for distinguishing examples of normal usage from idiomatic examples. Finally, we propose a detailed annotation procedure and discuss its future automation.
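As a rough illustration of the cue described in this abstract, the sketch below scores the lexical similarity of two English translations of the same dictionary example. The abstract does not say which similarity measure was used; Jaccard overlap on word sets, the crude tokenizer, and the two sample sentences are all assumptions made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens from a deliberately crude tokenizer (an assumption)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two translations."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


# Hypothetical English MT outputs for one example:
# one obtained from the Japanese expression, one from its French translation.
from_japanese = "He drinks two cups of coffee every morning"
from_french = "He drinks two cups of coffee each morning"

score = lexical_similarity(from_japanese, from_french)
print(f"{score:.3f}")
```

Under the abstract's hypothesis, a score like this would be one cue among others for separating normal usage examples from idiomatic ones; any decision threshold would have to be set on annotated data.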

2016

An Aligned French-Chinese corpus of 10K segments from university educational material
Ruslan Kalitvianski | Lingxiao Wang | Valérie Bellynck | Christian Boitet
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine translated computer science courseware. This corpus was built from 2013 to 2016 within the PROJECT_NAME project, by native Chinese students. The quality, as judged by native speakers, is adequate for understanding (far better than by reading only the original French) and for getting better marks. This corpus is annotated at the segment level with a self-assessed quality score. It has been directly used as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.

Corpus and dictionary development for classifiers/quantifiers towards a French-Japanese machine translation
Mutsuko Tomokiyo | Christian Boitet
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

Although quantifier/classifier expressions occur frequently in everyday communication and in written documents, they are described neither in classical bilingual paper dictionaries nor in machine-readable dictionaries. The paper describes the development of a corpus and a dictionary for quantifiers/classifiers, and their use in the framework of French-Japanese machine translation (MT). These expressions often cause problems of lexical ambiguity and of set phrase recognition during analysis, in particular for a distant language pair like French and Japanese. To develop a dictionary aimed at resolving the ambiguity of expressions that include quantifiers and classifiers, which may be ambiguous with common nouns, we have annotated our corpus with UWs (interlingual lexemes) of UNL (Universal Networking Language) found in the UNL-jp dictionary. Potential classifiers/quantifiers are extracted from the corpus by the UNLexplorer web service. Keywords: classifiers, quantifiers, phraseology study, corpus annotation, UNL (Universal Networking Language), UWs dictionary, Tori Bank, French-Japanese machine translation (MT).

Héloïse, une plate-forme pour développer des systèmes de TA compatibles Ariane en réseau (Heloise, a platform for collaborative development of Ariane-compatible MT systems)
Vincent Berment | Christian Boitet | Guillaume de Malézieux
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

In this demo, we show how to use Héloïse to develop MT systems.

2015

Post-editing a chapter of a specialized textbook into 7 languages: importance of terminological proximity with English for productivity
Ritesh Shah | Christian Boitet | Pushpak Bhattacharyya | Mithun Padmakumar | Leonardo Zilio | Ruslan Kalitvianski | Mohammad Nasiruddin | Mutsuko Tomokiyo | Sandra Castellanos Páez
Proceedings of the 12th International Conference on Natural Language Processing

2014

Jibiki-LINKS: a tool between traditional dictionaries and lexical networks for modelling lexical resources
Ying Zhang | Mathieu Mangeot | Valérie Bellynck | Christian Boitet
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing
Christian Boitet | M.G. Abbas Malik
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

On-going Cooperative Research towards Developing Economy-Oriented Chinese-French SMT Systems with a New SMT Framework
Yidong Chen | Lingxiao Wang | Christian Boitet | Xiaodong Shi
Proceedings of TALN 2014 (Volume 2: Short Papers)

2013

Urdu Hindi Machine Transliteration using SMT
M. G. Abbas Malik | Christian Boitet | Laurent Besacier | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

An extended morphological analyzer of German handling verbal forms with separated separable particles (Un analyseur morphologique étendu de l’allemand traitant les formes verbales à particule séparée) [in French]
Jean-Philippe Guilbaud | Christian Boitet | Vincent Berment
Proceedings of TALN 2013 (Volume 2: Short Papers)

2012

Proceedings of COLING 2012
Martin Kay | Christian Boitet
Proceedings of COLING 2012

Proceedings of COLING 2012: Posters
Martin Kay | Christian Boitet
Proceedings of COLING 2012: Posters

Heloise — A Reengineering of Ariane-G5 SLLPs for Application to π-languages
Vincent Berment | Christian Boitet
Proceedings of COLING 2012: Posters

Proceedings of COLING 2012: Demonstration Papers
Martin Kay | Christian Boitet
Proceedings of COLING 2012: Demonstration Papers

Heloise — An Ariane-G5 Compatible Environment for Developing Expert MT Systems Online
Vincent Berment | Christian Boitet
Proceedings of COLING 2012: Demonstration Papers

An In-Context and Collaborative Software Localisation Model
Amel Fraisse | Christian Boitet | Valérie Bellynck
Proceedings of COLING 2012: Demonstration Papers

Collaborative Computer-Assisted Translation Applied to Pedagogical Documents and Literary Works
Ruslan Kalitvianski | Christian Boitet | Valérie Bellynck
Proceedings of COLING 2012: Demonstration Papers

Demo of iMAG Possibilities: MT-postediting, Translation Quality Evaluation, Parallel Corpus Production
Lingxiao Wang | Ying Zhang | Christian Boitet | Valérie Bellynck
Proceedings of COLING 2012: Demonstration Papers

2011

Learning-to-Translate Based on the S-SSTC Annotation Schema
Enya Kong Tang | Zaharin Yusoff | Christian Boitet
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2010

Multilingual Lexical Network from the Archives of the Digital Silk Road
Hans-Mohammad Daoud | Kyo Kageura | Christian Boitet | Asanobu Kitamoto | Mathieu Mangeot
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

Ontology driven content extraction using interlingual annotation of texts in the OMNIA project
Achille Falaise | David Rouquet | Didier Schwab | Hervé Blanchon | Christian Boitet
Proceedings of the 4th Workshop on Cross Lingual Information Access

Multilinguization and Personalization of NL-based Systems
Najeh Hajlaoui | Christian Boitet
Proceedings of the 4th Workshop on Cross Lingual Information Access

Finite-state Scriptural Translation
M. G. Abbas Malik | Christian Boitet | Pushpak Bhattacharyya
Coling 2010: Posters

Web-based and combined language models: a case study on noun compound identification
Carlos Ramisch | Aline Villavicencio | Christian Boitet
Coling 2010: Posters

Multiword Expressions in the wild? The mwetoolkit comes in handy
Carlos Ramisch | Aline Villavicencio | Christian Boitet
Coling 2010: Demonstrations

mwetoolkit: a Framework for Multiword Expression Identification
Carlos Ramisch | Aline Villavicencio | Christian Boitet
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the Multiword Expression Toolkit (mwetoolkit), an environment for type and language-independent MWE identification from corpora. The mwetoolkit provides a targeted list of MWE candidates, extracted and filtered according to a number of user-defined criteria and a set of standard statistical association measures. For generating corpus counts, the toolkit provides both a corpus indexation facility and a tool for integration with web search engines, while for evaluation, it provides validation and annotation facilities. The mwetoolkit also allows easy integration with a machine learning tool for the creation and application of supervised MWE extraction models if annotated data is available. In our experiment, the mwetoolkit was tested and evaluated in the context of MWE extraction in the biomedical domain. Our preliminary results show that the toolkit performs better than other approaches, especially concerning recall. Moreover, this first version can also be extended in several ways in order to improve the quality of the results.
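To make "statistical association measures" concrete, here is a minimal sketch of one widely used measure, pointwise mutual information (PMI), computed from corpus counts for a two-word candidate. The abstract does not list which measures the toolkit implements, so the choice of PMI, the function name, and the counts are assumptions for illustration; this is not the mwetoolkit API.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) of a two-word candidate.

    count_xy: occurrences of the pair; count_x, count_y: occurrences of
    each word; total: corpus size. All counts must be positive.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: the pair occurs 8 times in a 1024-token corpus,
# its words 16 and 32 times respectively.
score = pmi(count_xy=8, count_x=16, count_y=32, total=1024)
print(score)
```

A positive score means the two words co-occur more often than independence would predict, which is the kind of evidence used to rank and filter MWE candidates.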

2009

A Hybrid Model for Urdu Hindi Transliteration
Abbas Malik | Laurent Besacier | Christian Boitet | Pushpak Bhattacharyya
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2008

SECTra_w.1: an Online Collaborative System for Evaluating, Post-editing and Presenting MT Translation Corpora
Cong-Phap Huynh | Christian Boitet | Hervé Blanchon
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

SECTra_w is a web-oriented system mainly dedicated to the evaluation of MT systems. After importing a source corpus, and possibly reference translations, one can call various MT systems, store their results, and have a group of human judges perform subjective evaluation online (fluency, adequacy). It is also possible to perform objective, task-oriented evaluation by letting humans post-edit the MT results, using a web translation editor, and measuring an edit distance and/or the post-editing time. The post-edited results can be added to the set of reference translations, or constitute it if there were no references. SECTra_w makes it possible to show not only tables of figures as results of an evaluation campaign, but also the real data (source, MT outputs, references, post-edited outputs), and to make the post-editing effort visible by transforming the trace of the edit distance computation into an intuitive presentation, much like a “revision” presentation in Word. The system is written in Java under XWiki and uses the Ajax technique. It can handle large, multilingual and multimedia corpora: EuroParl, BTEC, ERIM (bilingual interpreted dialogues with audio and text), Unesco-B@bel, and a test corpus by France Telecom have been loaded together and used in tests.
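The edit distance mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the variant used in SECTra_w; the version below assumes a plain word-level Levenshtein distance (unit cost for insertion, deletion, and substitution), and the sample sentences are hypothetical.

```python
def edit_distance(source: list[str], target: list[str]) -> int:
    """Word-level Levenshtein distance, keeping one row of the table at a time."""
    prev = list(range(len(target) + 1))  # distances from the empty source prefix
    for i, s in enumerate(source, start=1):
        curr = [i]  # deleting the first i source words reaches the empty target
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(prev[j] + 1,          # delete s
                            curr[j - 1] + 1,      # insert t
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


# Hypothetical MT output and its post-edited version
mt_output = "the cat sit on mat".split()
post_edited = "the cat sits on the mat".split()

distance = edit_distance(mt_output, post_edited)
print(distance)
```

Dividing the distance by the length of the post-edited segment gives a length-normalized effort estimate; whether SECTra_w normalizes in this way is not stated in the abstract.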

Hindi Urdu Machine Transliteration using Finite-State Transducers
M G Abbas Malik | Christian Boitet | Pushpak Bhattacharyya
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

BEYTrans: A Free Online Collaborative Wiki-Based CAT Environment Designed for Online Translation Communities
Youcef Bey | Kyo Kageura | Christian Boitet
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

2006

Data Management in QRLex, an Online Aid System for Volunteer Translators
Youcef Bey | Kyo Kageura | Christian Boitet
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 4, December 2006

2005

A Framework for Data Management for the Online Volunteer Translators’ Aid System QRLex
Youcef Bey | Kyo Kageura | Christian Boitet
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation

2004

Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
Georges Fafiotte | Christian Boitet | Mark Seligman | Chengqing Zong
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

PolyphraZ: a Tool for the Management of Parallel Corpora
Najeh Hajlaoui | Christian Boitet
Proceedings of the Workshop on Multilingual Linguistic Resources

2002

Coedition to Share Text Revision across Languages and Improve MT a Posteriori
Christian Boitet | Wang-Ju Tsai
COLING-02: Machine Translation in Asia

The PAPILLON Project: Cooperatively Building a Multilingual Lexical Database to Derive Open Source Dictionaries & Lexicons
Christian Boitet | Mathieu Mangeot | Gilles Sérasset
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

UNL Lexical Selection with Conceptual Vectors
Mathieu Lafourcade | Christian Boitet
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

On UNL as the future “html of the linguistic content” & the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter
Gilles Sérasset | Christian Boitet
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1998

Transforming Lattices into Non-deterministic Automata with Optional Null Arcs
Mark Seligman | Christian Boitet | Boubaker Meddeb-Hamrouni
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Transforming Lattices into Non-deterministic Automata with Optional Null Arcs
Mark Seligman | Christian Boitet | Boubaker Meddeb-Hamrouni
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1996

Theory and practice of ambiguity labelling with a view to interactive disambiguation in text and speech MT
Christian Boitet | Mutsuko Tomokiyo
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1994

The “Whiteboard” Architecture: A Way to Integrate Heterogeneous Components of NLP Systems
Christian Boitet | Mark Seligman
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1992

About these proceedings
Christian Boitet
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

Multilinguisation d’un éditeur de documents structurés. Application à un dictionnaire trilingue (Multilingualization of a structured document editor. Application to a trilingual dictionary)
Huy Khanh Phan | Christian Boitet
COLING 1992 Volume 3: The 14th International Conference on Computational Linguistics

1990

Towards Personal MT: general design, dialogue structure, potential role of speech
Christian Boitet
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

Towards Personal MT: general design, dialogue structure, potential role of speech
Christian Boitet
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics

1988

Representation Trees and String-Tree Correspondences
Ch. Boitet | Y. Zaharin
Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics

1986

TOWARD INTEGRATED DICTIONARIES FOR M(a)T: motivations and linguistic organisation
Ch. Boitet | N. Nedobejkine
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics

1985

Automated Translation at Grenoble University
Bernard Vauquois | Christian Boitet
Computational Linguistics (formerly the American Journal of Computational Linguistics), Volume 11, Number 1, January-March 1985

Various Representations of Text Proposed for Eurotra
Christian Boitet | Nelson Verastegui | Daniel Bachut
Second Conference of the European Chapter of the Association for Computational Linguistics

1984

Expert Systems and Other New Techniques in MT Systems
Christian Boitet | Rene Gerber
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics

1982

Implementation and Conversational Environment of ARIANE 78.4, An Integrated System for Automated Translation and Human Revision
Ch. Boitet | P. Guillaume | M. Quezel-Ambrunaz
Coling 1982: Proceedings of the Ninth International Conference on Computational Linguistics

1980

Present and Future Paradigms in the Automatized Translation of Natural Languages
Ch. Boitet | P. Chatelin | P. Daun Fraga
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics

Russian-French at GETA: Outline of the Method and Detailed Example
Ch. Boitet | N. Nedobejkine
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics