Diego Moussallem


2019

pdf bib
A Holistic Natural Language Generation Framework for the Semantic Web
Axel-Cyrille Ngonga Ngomo | Diego Moussallem | Lorenz Bühmann
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

With the ever-growing generation of data for the Semantic Web comes an increasing demand for this data to be made available to non-semantic Web experts. One way of achieving this goal is to translate the languages of the Semantic Web into natural language. We present LD2NL, a framework that allows verbalizing the three key languages of the Semantic Web, i.e., RDF, OWL, and SPARQL. Our framework is based on a bottom-up approach to verbalization. We evaluated LD2NL in an open survey with 86 persons. Our results suggest that our framework can generate verbalizations that are close to natural languages and that can be easily understood by non-experts. Therewith, it enables non-domain experts to interpret Semantic Web data with more than 91% of the accuracy of domain experts.

2018

pdf bib
LIdioms: A Multilingual Linked Idioms Data Set
Diego Moussallem | Mohamed Ahmed Sherif | Diego Esteves | Marcos Zampieri | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
RDF2PT: Generating Brazilian Portuguese Texts from RDF Data
Diego Moussallem | Thiago Ferreira | Marcos Zampieri | Maria Claudia Cavalcanti | Geraldo Xexéo | Mariana Neves | Axel-Cyrille Ngonga Ngomo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
NeuralREG: An end-to-end approach to referring expression generation
Thiago Castro Ferreira | Diego Moussallem | Ákos Kádár | Sander Wubben | Emiel Krahmer
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extraction. Using a delexicalized version of the WebNLG corpus, we show that the neural model substantially improves over two strong baselines.

pdf bib
Enriching the WebNLG corpus
Thiago Castro Ferreira | Diego Moussallem | Emiel Krahmer | Sander Wubben
Proceedings of the 11th International Conference on Natural Language Generation

This paper describes the enrichment of WebNLG corpus (Gardent et al., 2017a,b), with the aim to further extend its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches to other languages than English. The enriched corpus is publicly available.

pdf bib
BENGAL: An Automatic Benchmark Generator for Entity Recognition and Linking
Axel-Cyrille Ngonga Ngomo | Michael Röder | Diego Moussallem | Ricardo Usbeck | René Speck
Proceedings of the 11th International Conference on Natural Language Generation

The manual creation of gold standards for named entity recognition and entity linking is time- and resource-intensive. Moreover, recent works show that such gold standards contain a large proportion of mistakes in addition to being difficult to maintain. We hence present Bengal, a novel automatic generation of such gold standards as a complement to manually created benchmarks. The main advantage of our benchmarks is that they can be readily generated at any time. They are also cost-effective while being guaranteed to be free of annotation errors. We compare the performance of 11 tools on benchmarks in English generated by Bengal and on 16 benchmarks created manually. We show that our approach can be ported easily across languages by presenting results achieved by 4 tools on both Brazilian Portuguese and Spanish. Overall, our results suggest that our automatic benchmark generation approach can create varied benchmarks that have characteristics similar to those of existing benchmarks. Our approach is open-source. Our experimental results are available at http://faturl.com/bengalexpinlg and the code at https://github.com/dice-group/BENGAL.