Christophe Gravier


2020

Few-shot Pseudo-Labeling for Intent Detection
Thomas Dopierre | Christophe Gravier | Julien Subercaze | Wilfried Logerais
Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we introduce a state-of-the-art pseudo-labeling technique for few-shot intent detection. We devise a folding/unfolding hierarchical clustering algorithm which assigns weighted pseudo-labels to unlabeled user utterances. We show that our two-step method yields significant improvement over existing solutions. This performance is achieved on multiple intent detection datasets, even in more challenging situations where the number of classes is large or when the dataset is highly imbalanced. Moreover, we confirm these results on the more general text classification task. We also demonstrate that our approach nicely complements existing solutions, thereby providing an even stronger state-of-the-art ensemble method.

2018

T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
Hady Elsahar | Pavlos Vougiouklis | Arslen Remaci | Christophe Gravier | Jonathon Hare | Frederique Laforest | Elena Simperl
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types
Hady Elsahar | Christophe Gravier | Frederique Laforest
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a neural model for question generation from knowledge graph triples in a “Zero-shot” setup, that is, generating questions for predicates, subject types or object types that were not seen at training time. Our model leverages triple occurrences in the natural language corpus in an encoder-decoder architecture, paired with an original part-of-speech copy action mechanism to generate questions. Benchmark and human evaluation show that our model outperforms the state of the art on this task.

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
Lucie-Aimée Kaffee | Hady Elsahar | Pavlos Vougiouklis | Christophe Gravier | Frédérique Laforest | Jonathon Hare | Elena Simperl
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.

2017

High Recall Open IE for Relation Discovery
Hady Elsahar | Christophe Gravier | Frederique Laforest
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Relation Discovery discovers predicates (relation types) from a text corpus relying on the co-occurrence of two named entities in the same sentence. This is a very restrictive constraint: it represents only a small fraction of all relation mentions in practice. In this paper, we propose a high recall approach for Open IE, which enables covering up to 16 times more sentences in a large corpus. Comparison against OpenIE systems shows that our proposed approach achieves a 28% improvement over the highest recall OpenIE system and a 6% improvement in precision over the same system.

Dict2vec : Learning Word Embeddings using Lexical Dictionaries
Julien Tissier | Christophe Gravier | Amaury Habrard
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Learning word embeddings on large unlabeled corpora has been shown to be successful in improving many natural language tasks. The most efficient and popular approaches learn or retrofit such representations using additional external data. Resulting embeddings are generally better than their corpus-only counterparts, although such resources cover only a fraction of words in the vocabulary. In this paper, we propose a new approach, Dict2vec, based on one of the largest yet refined data sources for describing words: natural language dictionaries. Dict2vec builds new word pairs from dictionary entries so that semantically-related words are moved closer, and negative sampling filters out pairs whose words are unrelated in dictionaries. We evaluate the word representations obtained using Dict2vec on eleven datasets for the word similarity task and on four datasets for a text classification task.

2015

On metric embedding for boosting semantic similarity computations
Julien Subercaze | Christophe Gravier | Frederique Laforest
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)