Julian Moreno Schneider

Also published as: Julián Moreno-Schneider, Julian Moreno-Schneider, Julián Moreno Schneider


2020

Orchestrating NLP Services for the Legal Domain
Julian Moreno-Schneider | Georg Rehm | Elena Montiel-Ponsoda | Víctor Rodriguez-Doncel | Artem Revenko | Sotirios Karampatakis | Maria Khvalchik | Christian Sageder | Jorge Gracia | Filippo Maganza
Proceedings of the 12th Language Resources and Evaluation Conference

Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which includes partners from industry and research. The key contribution of this paper is a workflow manager that enables the flexible orchestration of workflows based on a portfolio of Natural Language Processing and Content Curation services as well as a Multilingual Legal Knowledge Graph that contains semantic information and meaningful references to legal documents. We also describe different use cases with which we experiment and develop prototypical solutions.

A Dataset of German Legal Documents for Named Entity Recognition
Elena Leitner | Georg Rehm | Julian Moreno-Schneider
Proceedings of the 12th Language Resources and Evaluation Conference

We describe a dataset developed for Named Entity Recognition in German federal court decisions. It consists of approx. 67,000 sentences with over 2 million tokens. The resource contains 54,000 manually annotated entities, mapped to 19 fine-grained semantic classes: person, judge, lawyer, country, city, street, landscape, organization, company, institution, court, brand, law, ordinance, European legal norm, regulation, contract, court decision, and legal literature. The legal documents were, furthermore, automatically annotated with more than 35,000 TimeML-based time expressions. The dataset, which is available under a CC-BY 4.0 license in the CoNLL-2002 format, was developed for training an NER service for German legal documents in the EU project Lynx.
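
As a rough illustration of the file format mentioned in this abstract, the sketch below reads CoNLL-2002 style data (one token and its tag per line, sentences separated by blank lines, BIO tags) and collects entity spans. The function names, the sample sentences, and the labels `COURT` and `PER` are hypothetical and chosen only for illustration; they are not taken from the dataset itself, and the reader assumes well-formed two-column input.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def read_conll2002(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-2002 style lines ("token tag", blank line = sentence break)."""
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # The tag is the last whitespace-separated column.
        token, tag = line.rsplit(None, 1)
        sentence.append((token, tag))
    if sentence:
        yield sentence


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from BIO tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label = None
    for token, tag in sentence:
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            tokens.append(token)
            continue
        # Any other tag ends the entity collected so far.
        if tokens:
            entities.append((" ".join(tokens), label))
            tokens, label = [], None
        if tag.startswith("B-") or tag.startswith("I-"):
            tokens, label = [token], tag[2:]
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Hypothetical sample; the labels are illustrative, not the dataset's tag set.
SAMPLE = """Der O
Bundesgerichtshof B-COURT
entschied O

Max B-PER
Mustermann I-PER
klagte O
"""

sentences = list(read_conll2002(SAMPLE.splitlines()))
for sent in sentences:
    print(extract_entities(sent))
```

An `I-` tag that does not continue the current label is treated as the start of a new entity, which is a lenient design choice for slightly inconsistent annotations.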

Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Dmitrii Aksenov | Julian Moreno-Schneider | Peter Bourgonje | Robert Schwarzenberg | Leonhard Hennig | Georg Rehm
Proceedings of the 12th Language Resources and Evaluation Conference

We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modeling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on the two datasets and show their superiority in a manual qualitative analysis.
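
The chunk-wise processing idea mentioned in this abstract can be sketched generically as splitting a long token sequence into fixed-size, overlapping windows. This is only an illustration of windowing in general: the abstract does not state the window size, the overlap, or how per-window outputs are combined, so the function name `split_into_windows` and the parameters `window_size` and `stride` are assumptions, not the authors' method.

```python
from typing import List, Sequence


def split_into_windows(token_ids: Sequence[int], window_size: int, stride: int) -> List[List[int]]:
    """Split token_ids into windows of at most window_size, advancing by stride each time.

    A stride smaller than window_size makes neighbouring windows overlap.
    (Hypothetical helper; parameter choices are assumptions.)
    """
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("need window_size > 0 and 0 < stride <= window_size")
    if not token_ids:
        return []
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window_size]))
        # Stop once the current window reaches the end of the sequence.
        if start + window_size >= len(token_ids):
            break
        start += stride
    return windows


# Ten dummy token ids, windows of 4 tokens that overlap by 1 token.
windows = split_into_windows(list(range(10)), window_size=4, stride=3)
for w in windows:
    print(w)
```

Each window could then be passed through an encoder with a fixed input limit; how the resulting representations are merged is left open here.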

A Workflow Manager for Complex NLP and Content Curation Workflows
Julian Moreno-Schneider | Peter Bourgonje | Florian Kintzel | Georg Rehm
Proceedings of the 1st International Workshop on Language Technology Platforms

We present a workflow manager for the flexible creation and customisation of NLP processing pipelines. The workflow manager addresses challenges in interoperability across different NLP tasks and in hardware-based resource usage. Based on the four key principles of generality, flexibility, scalability and efficiency, we present the first version of the workflow manager by providing details on its custom definition language, explaining the communication components and the general system architecture and setup. We are currently implementing the system, which is grounded in and motivated by real-world industry use cases in several innovation and transfer projects.

Towards an Interoperable Ecosystem of AI and LT Platforms: A Roadmap for the Implementation of Different Levels of Interoperability
Georg Rehm | Dimitris Galanis | Penny Labropoulou | Stelios Piperidis | Martin Welß | Ricardo Usbeck | Joachim Köhler | Miltos Deligiannis | Katerina Gkirtzou | Johannes Fischer | Christian Chiarcos | Nils Feldhus | Julian Moreno-Schneider | Florian Kintzel | Elena Montiel | Víctor Rodríguez Doncel | John Philip McCrae | David Laqua | Irina Patricia Theile | Christian Dittmar | Kalina Bontcheva | Ian Roberts | Andrejs Vasiļjevs | Andis Lagzdiņš
Proceedings of the 1st International Workshop on Language Technology Platforms

With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider federation of AI/LT platforms. We illustrate the approach using the five emerging AI/LT platforms AI4EU, ELG, Lynx, QURATOR and SPEAKER.

2019

Developing and Orchestrating a Portfolio of Natural Legal Language Processing and Document Curation Services
Georg Rehm | Julián Moreno-Schneider | Jorge Gracia | Artem Revenko | Victor Mireles | Maria Khvalchik | Ilan Kernerman | Andis Lagzdins | Marcis Pinnis | Artus Vasilevskis | Elena Leitner | Jan Milde | Pia Weißenhorn
Proceedings of the Natural Legal Language Processing Workshop 2019

We present a portfolio of natural legal language processing and document curation services currently under development in a collaborative European project. First, we give an overview of the project and the different use cases, while, in the main part of the article, we focus upon the 13 different processing services that are being deployed in different prototype applications using a flexible and scalable microservices architecture. Their orchestration is operationalised using a content and document curation workflow manager.

2018

Automatic and Manual Web Annotations in an Infrastructure to handle Fake News and other Online Media Phenomena
Georg Rehm | Julian Moreno-Schneider | Peter Bourgonje
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

DFKI-DKT at SemEval-2017 Task 8: Rumour Detection and Classification using Cascading Heuristics
Ankit Srivastava | Georg Rehm | Julian Moreno Schneider
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe our submissions for SemEval-2017 Task 8, Determining Rumour Veracity and Support for Rumours. The Digital Curation Technologies (DKT) team at the German Research Center for Artificial Intelligence (DFKI) participated in two subtasks: Subtask A (determining the stance of a message) and Subtask B (determining veracity of a message, closed variant). In both cases, our implementation consisted of a Multivariate Logistic Regression (Maximum Entropy) classifier coupled with hand-written patterns and rules (heuristics) applied in a post-process cascading fashion. We provide a detailed analysis of the system performance and report on variants of our systems that were not part of the official submission.

Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters
Georg Rehm | Julian Moreno Schneider | Peter Bourgonje | Ankit Srivastava | Jan Nehring | Armin Berger | Luca König | Sören Räuchle | Jens Gerth
Proceedings of the Events and Stories in the News Workshop

We present an approach to identifying a specific class of events, movement action events (MAEs), in a data set that consists of ca. 2,800 personal letters exchanged by the German architect Erich Mendelsohn and his wife, Luise. A backend system uses these and other semantic analysis results as input for an authoring environment that digital curators can use to produce new pieces of digital content. In our example case, the human expert will receive recommendations from the system with the goal of putting together a travelogue, i.e., a description of the trips and journeys undertaken by the couple. We describe the components and architecture and also apply the system to news data.

Semantic Storytelling, Cross-lingual Event Detection and other Semantic Services for a Newsroom Content Curation Dashboard
Julian Moreno-Schneider | Ankit Srivastava | Peter Bourgonje | David Wabnitz | Georg Rehm
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

We present a prototypical content curation dashboard, to be used in the newsroom, and several of its underlying semantic content analysis components (such as named entity recognition, entity linking, summarisation and temporal expression analysis). The idea is to enable journalists (a) to process incoming content (agency reports, Twitter feeds, reports, blog posts, social media etc.) and (b) to create new articles more easily and more efficiently. The prototype system also allows the automatic annotation of events in incoming content, both to support journalists in identifying important, relevant or meaningful events and to adapt the content currently in production accordingly in a semi-automatic way. One of our long-term goals is to support journalists in building up entire storylines with automatic means. In the present prototype they are generated in a backend service using clustering methods that operate on the extracted events.

From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles
Peter Bourgonje | Julian Moreno Schneider | Georg Rehm
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

We present a system for the detection of the stance of headlines with regard to their corresponding article bodies. The approach can be applied in fake news detection scenarios, especially clickbait detection. The component is part of a larger platform for the curation of digital content; we consider veracity and relevancy an increasingly important part of curating online information. We want to contribute to the debate on how to deal with fake news and related online phenomena with technological means, by providing means to separate related from unrelated headlines and to further classify the related headlines. On a publicly available data set annotated for the stance of headlines with regard to their corresponding article bodies, we achieve a (weighted) accuracy score of 89.59.

2016

Processing Document Collections to Automatically Extract Linked Data: Semantic Storytelling Technologies for Smart Curation Workflows
Peter Bourgonje | Julian Moreno Schneider | Georg Rehm | Felix Sasaki
Proceedings of the 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016)

2010

UC3M System: Determining the Extent, Type and Value of Time Expressions in TempEval-2
María Teresa Vicente-Díez | Julián Moreno Schneider | Paloma Martínez
Proceedings of the 5th International Workshop on Semantic Evaluation