Eric Gaussier

Also published as: Éric Gaussier


2020

Word Representations Concentrate and This is Good News!
Romain Couillet | Yagmur Gizem Cinar | Eric Gaussier | Muhammad Imran
Proceedings of the 24th Conference on Computational Natural Language Learning

This article establishes that, unlike the legacy tf*idf representation, recent natural language representations (word embedding vectors) tend to exhibit a so-called concentration of measure phenomenon, in the sense that, as the representation size p and database size n are both large, their behavior is similar to that of large dimensional Gaussian random vectors. This phenomenon may have important consequences as machine learning algorithms for natural language data could be amenable to improvement, thereby providing new theoretical insights into the field of natural language processing.
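As a quick sanity check of what "concentration" means here: the Euclidean norms of large-dimensional standard Gaussian vectors cluster tightly around √p. The numpy snippet below is a toy sketch of that textbook phenomenon, not the paper's own analysis:

```python
import numpy as np

# Toy illustration of concentration of measure: the norms of
# p-dimensional standard Gaussian vectors cluster around sqrt(p),
# with fluctuations that stay of constant order as p grows.
rng = np.random.default_rng(0)
p, n = 1000, 5000            # representation dimension, sample size
X = rng.standard_normal((n, p))
norms = np.linalg.norm(X, axis=1)
print(norms.mean())          # close to sqrt(1000), about 31.6
print(norms.std())           # small relative to the mean
```

Observing comparable behaviour on the rows of an embedding matrix is, in essence, what the paper establishes for modern word representations.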

2019

Terminology-based Text Embedding for Computing Document Similarities on Technical Content
Hamid Mirisaee | Eric Gaussier | Cedric Lagnier | Agnes Guerraz
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Terminologie et Intelligence Artificielle (atelier TALN-RECITAL & IC)

We propose in this paper a new, hybrid document embedding approach to address the problem of document similarity with respect to technical content. To do so, we employ state-of-the-art graph techniques to first extract the keyphrases (composite keywords) of documents and then use them to score the sentences. Using the ranked sentences, we propose two approaches to embed documents and compare their performance against two baselines. With domain-expert annotations, we show that the proposed methods find more relevant documents and outperform the baselines by up to 27% in terms of NDCG.
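The sentence-scoring step can be pictured with a minimal sketch: given keyphrases and weights (e.g. from a graph-based ranker), score each sentence by the total weight of the keyphrases it contains. The function and data below are hypothetical, purely for illustration:

```python
def score_sentences(sentences, keyphrases):
    """Score each sentence by the total weight of the keyphrases it
    contains; keyphrases maps phrase -> weight (e.g. from a graph ranker)."""
    scores = []
    for sentence in sentences:
        text = sentence.lower()
        scores.append(sum(w for kp, w in keyphrases.items() if kp in text))
    return scores

sents = ["The turbine blade uses a nickel superalloy.",
         "See chapter 2 for ordering information."]
kps = {"turbine blade": 1.0, "nickel superalloy": 0.8}
print(score_sentences(sents, kps))   # the technical sentence scores 1.8
```

In the paper's setting, the top-ranked sentences would then be embedded and aggregated into the document representation.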

2018

Char2char Generation with Reranking for the E2E NLG Challenge
Shubham Agarwal | Marc Dymetman | Éric Gaussier
Proceedings of the 11th International Conference on Natural Language Generation

This paper describes our submission to the E2E NLG Challenge. Recently, neural seq2seq approaches have become mainstream in NLG, often resorting to pre- (respectively post-) processing delexicalization (relexicalization) steps at the word-level to handle rare words. By contrast, we train a simple character level seq2seq model, which requires no pre/post-processing (delexicalization, tokenization or even lowercasing), with surprisingly good results. For further improvement, we explore two re-ranking approaches for scoring candidates. We also introduce a synthetic dataset creation procedure, which opens up a new way of creating artificial datasets for Natural Language Generation.

2017

Topical Coherence in LDA-based Models through Induced Segmentation
Hesam Amoualian | Wei Lu | Eric Gaussier | Georgios Balikas | Massih R. Amini | Marianne Clausel
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. The coherence between topics is ensured through a copula, binding the topics associated with the words of a segment. In addition, this model relies on both document- and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Furthermore, our experiments, conducted on six different publicly available datasets, show the effectiveness of our model in terms of perplexity, Normalized Pointwise Mutual Information, which captures the coherence between the generated topics, and the Micro F1 measure for text classification.

2016

Natural Language Generation through Character-based RNNs with Finite-state Prior Knowledge
Raghav Goyal | Marc Dymetman | Eric Gaussier
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Recently Wen et al. (2015) have proposed a Recurrent Neural Network (RNN) approach to the generation of utterances from dialog acts, and shown that although their model requires less effort to develop than a rule-based system, it is able to improve certain aspects of the utterances, in particular their naturalness. However their system employs generation at the word-level, which requires one to pre-process the data by substituting named entities with placeholders. This pre-processing prevents the model from handling some contextual effects and from managing multiple occurrences of the same attribute. Our approach uses a character-level model, which unlike the word-level model makes it possible to learn to “copy” information from the dialog act to the target without having to pre-process the input. In order to avoid generating non-words and inventing information not present in the input, we propose a method for incorporating prior knowledge into the RNN in the form of a weighted finite-state automaton over character sequences. Automatic and human evaluations show improved performance over baselines on several evaluation criteria.

Modeling topic dependencies in semantically coherent text spans with copulas
Georgios Balikas | Hesam Amoualian | Marianne Clausel | Eric Gaussier | Massih R. Amini
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The exchangeability assumption in topic models like Latent Dirichlet Allocation (LDA) often results in inferring inconsistent topics for the words of text spans like noun phrases, which are usually expected to be topically coherent. We propose copulaLDA, which extends LDA by integrating part of the text structure into the model and relaxing the conditional independence assumption between the word-specific latent topics given the per-document topic distributions. To this end, we assume that the words of text spans like noun phrases are topically bound, and we model this dependence with copulas. We demonstrate empirically the effectiveness of copulaLDA on both intrinsic and extrinsic evaluation tasks on several publicly available corpora.
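The binding idea can be sketched with an equicorrelated Gaussian copula: draw correlated latent normals for the L words of a span, map them to uniforms, and invert the topic distribution's CDF. With the correlation near 1, the words of a span mostly agree on a topic. Everything below (seed, parameters, variable names) is an illustrative toy, not the paper's implementation:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
theta = np.array([0.5, 0.3, 0.2])   # span-level topic distribution
L, rho = 4, 0.95                    # words per span, copula correlation

# Equicorrelated Gaussian copula: correlated normals -> uniforms.
cov = rho * np.ones((L, L)) + (1 - rho) * np.eye(L)
z = rng.multivariate_normal(np.zeros(L), cov, size=10_000)
u = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))(z)

# Inverse CDF of the categorical: each uniform picks a topic index.
topics = np.searchsorted(np.cumsum(theta), u)
coherent = (topics == topics[:, :1]).all(axis=1).mean()
print(coherent)   # high: most simulated spans share a single topic
```

For independent draws the same statistic would be only Σ_k θ_k⁴ ≈ 0.07, so the within-span dependence really is carried by the copula.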

2011

Clustering Comparable Corpora For Bilingual Lexicon Extraction
Bo Li | Eric Gaussier | Akiko Aizawa
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Book Review: Statistical Language Models for Information Retrieval by ChengXiang Zhai
Eric Gaussier
Computational Linguistics, Volume 36, Number 2, June 2010

Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora
Bo Li | Eric Gaussier
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2006

Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Dan Jurafsky | Eric Gaussier
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Translating with Non-contiguous Phrases
Michel Simard | Nicola Cancedda | Bruno Cavestro | Marc Dymetman | Eric Gaussier | Cyril Goutte | Kenji Yamada | Philippe Langlais | Arne Mauser
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Aligning words using matrix factorisation
Cyril Goutte | Kenji Yamada | Eric Gaussier
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora
Eric Gaussier | J.M. Renders | I. Matveeva | C. Goutte | H. Dejean
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Reducing Parameter Space for Word Alignment
Herve Dejean | Eric Gaussier | Cyril Goutte | Kenji Yamada
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

2002

Combining Labelled and Unlabelled Data: A Case Study on Fisher Kernels and Transductive Inference for Biological Entity Recognition
Cyril Goutte | Hervé Déjean | Eric Gaussier | Nicola Cancedda | Jean-Michel Renders
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

An Approach Based on Multilingual Thesauri and Model Combination for Bilingual Lexicon Extraction
Hervé Déjean | Éric Gaussier | Fatiha Sadat
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Probabilistic models for PP-attachment resolution and NP analysis
Eric Gaussier | Nicola Cancedda
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (CoNLL)

2000

Coreference Resolution Evaluation Based on Descriptive Specificity
François Trouilleux | Eric Gaussier | Gabriel G. Bès | Annie Zaenen
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

Unsupervised learning of derivational morphology from inflectional lexicons
Eric Gaussier
Unsupervised Learning in Natural Language Processing

1998

Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora
Eric Gaussier
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora
Eric Gaussier
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1994

Towards Automatic Extraction of Monolingual and Bilingual Terminology
Beatrice Daille | Eric Gaussier | Jean-Marc Lange
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics