Leonardo Campillos Llanos


2017

Automatic classification of doctor-patient questions for a virtual patient record query task
Leonardo Campillos Llanos | Sophie Rosset | Pierre Zweigenbaum
BioNLP 2017

We present work in progress on automating the classification of doctor-patient questions in the context of a simulated consultation with a virtual patient. We classify questions according to the computational strategy (rule-based or other) needed for looking up data in the clinical record. We compare ‘traditional’ machine learning methods (Gaussian and Multinomial Naive Bayes, and Support Vector Machines) and a neural network classifier (FastText). We obtained the best results with the SVM using semantic annotations, whereas the neural classifier achieved promising results without them.

2016

Transfer-Based Learning-to-Rank Assessment of Medical Term Technicality
Dhouha Bouamor | Leonardo Campillos Llanos | Anne-Laure Ligozat | Sophie Rosset | Pierre Zweigenbaum
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

While measuring the readability of texts has been a long-standing research topic, assessing the technicality of terms has only been addressed more recently and mostly for the English language. In this paper, we train a learning-to-rank model to determine a specialization degree for each term found in a given list. Since no training data for this task exist for French, we train our system with non-lexical features on English data, namely, the Consumer Health Vocabulary, then apply it to French. The features include the likelihood ratio of the term based on specialized and lay language models, and tests of whether the term contains morphologically complex words. The evaluation of this approach is conducted on 134 terms from the UMLS Metathesaurus and 868 terms from the Eugloss thesaurus. The Normalized Discounted Cumulative Gain obtained by our system is over 0.8 on both test sets. In addition, thanks to the learning-to-rank approach, adding morphological features to the language model features improves the results on the Eugloss thesaurus.
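
As an illustration of the evaluation metric named in this abstract, the following is a minimal sketch of Normalized Discounted Cumulative Gain. It assumes a linear gain and a log2 rank discount, which is one common formulation; the abstract does not say which variant the authors used. The function names and the technicality grades in the example are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain and log2 discount (assumed variant)."""
    # Position 0 is the top of the ranking, so its discount is log2(2) = 1.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical technicality grades, listed in the order a system ranked the terms.
perfect_ranking = [3, 2, 1, 0]
reversed_ranking = [0, 1, 2, 3]

print(ndcg(perfect_ranking))   # 1.0, since the ranking is already ideal
print(ndcg(reversed_ranking))  # strictly below 1.0
```

A score of 1.0 means the system ordered the terms exactly as the reference grades would; lower values penalize highly graded terms that appear far down the ranking.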

Managing Linguistic and Terminological Variation in a Medical Dialogue System
Leonardo Campillos Llanos | Dhouha Bouamor | Pierre Zweigenbaum | Sophie Rosset
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We introduce a dialogue task between a virtual patient and a doctor where the dialogue system, playing the patient part in a simulated consultation, must reconcile a specialized level, to understand what the doctor says, and a lay level, to output realistic patient-language utterances. This increases the challenges in the analysis and generation phases of the dialogue. This paper proposes methods to manage linguistic and terminological variation in that situation and illustrates how they help produce realistic dialogues. Our system makes use of lexical resources for processing synonyms, inflectional and derivational variants, or pronoun/verb agreement. In addition, specialized knowledge is used for processing medical roots and affixes, ontological relations and concept mapping, and for generating lay variants of terms according to the patient’s non-expert discourse. We also report the results of a first evaluation carried out by 11 users interacting with the system. We evaluated the non-contextual analysis module, which supports the Spoken Language Understanding step. The annotation of task domain entities achieved 91.8% Precision, 82.5% Recall, 86.9% F-measure, 19.0% Slot Error Rate, and 32.9% Sentence Error Rate.
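
The reported F-measure can be checked against the reported Precision and Recall. The sketch below assumes the balanced F-measure (F1, the harmonic mean of the two), which the abstract does not state explicitly; the function name is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and Recall as reported in the abstract above.
f1 = f_measure(0.918, 0.825)
print(f"{100 * f1:.1f}%")  # prints "86.9%"
```

Under that assumption the result agrees with the 86.9% given in the abstract.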

2015

Description of the PatientGenesys Dialogue System
Leonardo Campillos Llanos | Dhouha Bouamor | Éric Bilinski | Anne-Laure Ligozat | Pierre Zweigenbaum | Sophie Rosset
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2012

Designing a search interface for a Spanish learner spoken corpus: the end-user’s evaluation
Leonardo Campillos Llanos
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This article summarizes the evaluation process of an interface under development to consult an oral corpus of learners of Spanish as a Foreign Language. The databank comprises 40 interviews with students with over 9 different mother tongues collected for Error Analysis. XML mark-up is used to code the information about the learners and their errors (with an explanation), and the search tool makes it possible to look up these errors and to listen to the utterances where they appear. The formative evaluation was performed to improve the interface during the design stage by means of a questionnaire which addressed issues related to the teachers' beliefs about languages, their opinion about the Error Analysis methodology, and specific points about the interface design and usability. The results unveiled some deficiencies of the current prototype as well as the interests of the teaching professionals which should be considered to bridge the gap between technology development and its pedagogical applications.

Spontaneous Speech Corpora for language learners of Spanish, Chinese and Japanese
Antonio Moreno-Sandoval | Leonardo Campillos Llanos | Yang Dong | Emi Takamori | José M. Guirao | Paula Gozalo | Chieko Kimura | Kengo Matsui | Marta Garrote-Salazar
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a method for designing, compiling and annotating corpora intended for language learners. In particular, we focus on spoken corpora to be used as complementary material in the classroom as well as in examinations. We describe the three corpora (Spanish, Chinese and Japanese) compiled by the Laboratorio de Lingüística Informática at the Autonomous University of Madrid (LLI-UAM). A web-based concordance tool has been used to search for examples in the corpus, and it provides the text along with the corresponding audio. Teaching materials from the corpus, consisting of the texts, the audio files and exercises on them, are currently under development.