Juan-Manuel Torres-Moreno

Also published as: Juan-Manuel Torres, Juan-Manuel Torres Moreno


A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Elvys Linhares Pontes | Juan-Manuel Torres-Moreno | Stéphane Huet | Andréa Carneiro Linhares
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Multi-Sentence Compression with Word Vertex-Labeled Graphs and Integer Linear Programming
Elvys Linhares Pontes | Stéphane Huet | Thiago Gouveia da Silva | Andréa Carneiro Linhares | Juan-Manuel Torres-Moreno
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)

Multi-Sentence Compression (MSC) aims to generate a short sentence with key information from a cluster of closely related sentences. MSC enables summarization and question-answering systems to generate outputs combining fully formed sentences from one or several documents. This paper describes a new Integer Linear Programming method for MSC using a vertex-labeled graph to select different keywords, and novel 3-gram scores to generate more informative sentences while maintaining their grammaticality. Our system is of good quality and outperforms the state of the art in evaluations conducted on news datasets. We conducted both automatic and manual evaluations to determine the informativeness and the grammaticality of compressions for each dataset. Additional tests, which take advantage of the fact that the length of compressions can be modulated, still improve ROUGE scores with shorter output sentences.

Cyberbullying Detection Task: the EBSI-LIA-UNAM System (ELU) at COLING’18 TRAC-1
Ignacio Arroyo-Fernández | Dominic Forest | Juan-Manuel Torres-Moreno | Mauricio Carrasco-Ruiz | Thomas Legeleux | Karen Joannette
Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)

The phenomenon of cyberbullying has grown to worrying proportions with the development of social networks. Forums and chat rooms are spaces where serious damage can now be done to others, while the tools for avoiding online spills are still limited. This study aims to assess how well both classical and state-of-the-art vector space modeling methods enable well-known learning machines to identify aggression levels in social network cyberbullying (i.e., social network posts manually labeled as Overtly Aggressive, Covertly Aggressive and Non-aggressive). To this end, an exploratory stage was performed first to find relevant settings to test: using training and development samples, we trained multiple learning machines with multiple vector space modeling methods and discarded the less informative configurations. Finally, we selected the two best settings and their voting combination to form three competing systems. These systems were submitted to the competition of the TRAC-1 task of the Workshop on Trolling, Aggression and Cyberbullying. Our voting combination system placed second in predicting aggression levels on a test set of untagged social network posts.

Predicting the Semantic Textual Similarity with Siamese CNN and LSTM
Elvys Linhares Pontes | Stéphane Huet | Andréa Carneiro Linhares | Juan-Manuel Torres-Moreno
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Semantic Textual Similarity (STS) is the basis of many applications in Natural Language Processing (NLP). Our system combines convolutional and recurrent neural networks to measure the semantic similarity of sentences. It uses a convolutional network to take account of the local context of words and an LSTM to consider the global context of sentences. This combination of networks helps to preserve the relevant information of sentences and improves the calculation of the similarity between sentences. Our model achieves good results and is competitive with the best state-of-the-art systems.

DEFT2018 : recherche d’information et analyse de sentiments dans des tweets concernant les transports en Île de France (DEFT2018: Information Retrieval and Sentiment Analysis in Tweets about Public Transportation in Île de France Region)
Patrick Paroubek | Cyril Grouin | Patrice Bellot | Vincent Claveau | Iris Eshkol-Taravella | Amel Fraisse | Agata Jackiewicz | Jihen Karoui | Laura Monceaux | Juan-Manuel Torres-Moreno
Actes de la Conférence TALN. Volume 2 - Démonstrations, articles des Rencontres Jeunes Chercheurs, ateliers DeFT

This article presents the 2018 edition of the DEFT (Défi Fouille de Textes) evaluation campaign. Based on a corpus of tweets, four tasks were proposed: identifying the tweets about transportation; then, among those, identifying the polarity (negative, neutral, positive, mixed); identifying the sentiment markers and their target; and finally, fully annotating each tweet with the source and target of the sentiments expressed. Twelve teams took part, mostly in the first two tasks. For identifying the transportation topic, the micro F-measure ranges from 0.827 to 0.908. For identifying the overall polarity, the micro F-measure ranges from 0.381 to 0.823.
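As an illustration of the micro F-measure reported in the DEFT 2018 abstract above, here is a minimal Python sketch of the standard micro-averaged definition. The abstract does not describe the campaign's exact scoring protocol, so the function name `micro_f1`, the label-set input format, and the toy labels are assumptions made for this example only.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F-measure over parallel lists of label sets.

    Counts true positives, false positives and false negatives
    across all items before computing precision and recall once.
    (Standard definition; not taken from the DEFT 2018 scorer.)
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels correctly predicted
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four tweets, one polarity label each.
gold = [{"negative"}, {"neutral"}, {"positive"}, {"mixed"}]
pred = [{"negative"}, {"neutral"}, {"negative"}, {"mixed"}]
score = micro_f1(gold, pred)
```

With exactly one label per item, as in this toy data, the micro F-measure coincides with plain accuracy.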


Classification and Optimization Algorithms: the LIA/ADOC participation at DEFT’14 (Algorithmes de classification et d’optimisation : participation du LIA/ADOC à DEFT’14) [in French]
Luis Adrián Cabrera-Diego | Stéphane Huet | Bassam Jabaian | Alejandro Molina | Juan-Manuel Torres-Moreno | Marc El-Bèze | Barthélémy Durette
TALN-RECITAL 2014 Workshop DEFT 2014 : DÉfi Fouille de Textes (DEFT 2014 Workshop: Text Mining Challenge)


SegCV : Efficient parsing of résumés with analysis and correction of errors (SegCV : traitement efficace de CV avec analyse et correction d’erreurs) [in French]
Luis Adrián Cabrera-Diego | Juan-Manuel Torres-Moreno | Marc El-Bèze
Proceedings of TALN 2013 (Volume 2: Short Papers)

Search and usage of named conceptual entities in a categorisation task (Recherche et utilisation d’entités nommées conceptuelles dans une tâche de catégorisation) [in French]
Jean-Valère Cossu | Juan-Manuel Torres-Moreno | Marc El-Bèze
Proceedings of TALN 2013 (Volume 2: Short Papers)


The RST Spanish Treebank On-line Interface
Iria da Cunha | Juan-Manuel Torres-Moreno | Gerardo Sierra | Luis-Adrián Cabrera-Diego | Brenda-Gabriela Castro-Rolón | Juan-Miguel Rolland Bartilotti
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

On the Development of the RST Spanish Treebank
Iria da Cunha | Juan-Manuel Torres-Moreno | Gerardo Sierra
Proceedings of the 5th Linguistic Annotation Workshop


Multilingual Summarization Evaluation without Human Models
Horacio Saggion | Juan-Manuel Torres-Moreno | Iria da Cunha | Eric SanJuan | Patricia Velázquez-Morales
Coling 2010: Posters

Automatic Summarization Using Terminological and Semantic Resources
Jorge Vivaldi | Iria da Cunha | Juan-Manuel Torres-Moreno | Patricia Velázquez-Morales
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents a new algorithm for automatic summarization of specialized texts combining terminological and semantic resources: a term extractor and an ontology. The term extractor provides the list of the terms that are present in the text together with their corresponding termhood. The ontology is used to calculate the semantic similarity between the terms found in the main body and those present in the document title. The general idea is to obtain a relevance score for each sentence, taking into account both the "termhood" of the terms found in that sentence and the similarity between those terms and the terms present in the title of the document. The sentences with the highest scores are chosen to form part of the final summary. We evaluate the algorithm with ROUGE, comparing the resulting summaries with the summaries of other summarizers. The sentence selection algorithm was also tested as part of a standalone summarizer. In both cases it obtains quite good results, although the perception is that there is room for improvement.
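The sentence-scoring idea in the abstract above can be sketched roughly as follows. The abstract does not give the actual formula, so the combination used here (each term's termhood multiplied by its best similarity to any title term, summed over the sentence) is an assumption, as are the names `sentence_score`, `select_sentences`, and the caller-supplied `similarity` function standing in for the ontology-based measure.

```python
def sentence_score(sentence_terms, title_terms, termhood, similarity):
    """Relevance of one sentence: sum over its terms of
    termhood(term) * best similarity to a title term.
    (Assumed combination; the paper's formula is not in the abstract.)
    """
    score = 0.0
    for term in sentence_terms:
        best = max((similarity(term, t) for t in title_terms), default=0.0)
        score += termhood.get(term, 0.0) * best
    return score


def select_sentences(sentences, title_terms, termhood, similarity, k=2):
    """Return the indices of the k highest-scoring sentences,
    restored to document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i], title_terms, termhood, similarity),
        reverse=True,
    )
    return sorted(ranked[:k])


# Hypothetical toy inputs: each sentence is already reduced to its terms.
termhood = {"gene": 0.9, "protein": 0.8, "cell": 0.4}
exact_match = lambda a, b: 1.0 if a == b else 0.0  # stand-in for an ontology measure
sentences = [["cell"], ["gene", "protein"], ["protein"]]
chosen = select_sentences(sentences, ["gene"], termhood, exact_match, k=1)
```

Returning the selected indices in document order keeps the extracted summary readable; a real system would replace `exact_match` with a similarity computed over the ontology.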

NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems
Eric Charton | Juan-Manuel Torres-Moreno
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Availability of labeled language resources, such as annotated corpora and domain-dependent labeled language resources, is crucial for experiments in the field of Natural Language Processing. Most often, due to lack of resources, manual verification and annotation of electronic text material is a prerequisite for the development of NLP tools. In the context of under-resourced languages, the lack of corpora becomes a crucial problem because most of the research efforts are supported by organizations with limited funds. Using free, multilingual and highly structured corpora like Wikipedia to produce automatically labeled language resources can be an answer to those needs. This paper introduces NLGbAse, a multilingual linguistic resource built from the Wikipedia encyclopedic content. This system produces structured metadata which make possible the automatic annotation of corpora with syntactic and semantic labels. Each metadata record contains semantic and statistical information related to an encyclopedic document. To validate our approach, we built and evaluated a Named Entity Recognition tool, trained with Wikipedia corpora annotated by our system.

A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression
Claude de Loupy | Marie Guégan | Christelle Ayache | Somara Seng | Juan-Manuel Torres Moreno
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French. The first one is the only one we know of in this language. It contains 20 topics with 20 documents each. A first set of 10 documents per topic is summarized, and then the second set is used to produce an update summary (new information). Four annotators were involved and produced a total of 160 abstracts. The second corpus contains all the sentences of the first one. Four annotators were asked to compress the 8432 sentences. This is the biggest corpus of compressed sentences we know of, whatever the language. The paper provides some figures in order to compare the different annotators: compression rates, number of tokens per sentence, percentage of tokens kept according to their POS, position of dropped tokens in the sentence compression phase, etc. These figures show important differences from one annotator to another. Another point is the different compression strategies used according to the length of the sentence.


Proceedings of the 1st Workshop on Definition Extraction
Gerardo Sierra | Mara Pozzi | Juan-Manuel Torres
Proceedings of the 1st Workshop on Definition Extraction


A Scalable MMR Approach to Sentence Scoring for Multi-Document Update Summarization
Florian Boudin | Marc El-Bèze | Juan-Manuel Torres-Moreno
Coling 2008: Companion volume: Posters