Leon Derczynski


2020

pdf bib
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Marcos Zampieri | Preslav Nakov | Sara Rosenthal | Pepa Atanasova | Georgi Karadzhov | Hamdy Mubarak | Leon Derczynski | Zeses Pitenis | Çağrı Çöltekin
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We present the results and the main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval-2020). The task included three subtasks corresponding to the hierarchical taxonomy of the OLID schema from OffensEval-2019, and it was offered in five languages: Arabic, Danish, English, Greek, and Turkish. OffensEval-2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages: a total of 528 teams signed up to participate in the task, 145 teams submitted official runs on the test data, and 70 teams submitted system description papers.

pdf bib
Accelerated High-Quality Mutual-Information Based Word Clustering
Manuel R. Ciosici | Ira Assent | Leon Derczynski
Proceedings of the 12th Language Resources and Evaluation Conference

Word clustering groups words that exhibit similar properties. One popular method for this is Brown clustering, which uses short-range distributional information to construct clusters. Specifically, this is a hard hierarchical clustering with a fixed-width beam that employs bi-grams and greedily minimizes global mutual information loss. The result is word clusters that tend to outperform or complement other word representations, especially when constrained by small datasets. However, Brown clustering has high computational complexity and does not lend itself to parallel computation. This, together with the lack of efficient implementations, limits their applicability in NLP. We present efficient implementations of Brown clustering and the alternative Exchange clustering as well as a number of methods to accelerate the computation of both hierarchical and flat clusters. We show empirically that clusters obtained with the accelerated method match the performance of clusters computed using the original methods.

pdf bib
Offensive Language and Hate Speech Detection for Danish
Gudbjartur Ingi Sigurbergsson | Leon Derczynski
Proceedings of the 12th Language Resources and Evaluation Conference

The presence of offensive language on social media platforms and the implications this poses is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most of the research has focused on solving the problem for the English language, while the problem is multilingual. We construct a Danish dataset DKhate containing user-generated comments from various social media platforms, and to our knowledge, the first of its kind, annotated for various types and target of offensive language. We develop four automatic classification systems, each designed to work for both the English and the Danish language. In the detection of offensive language in English, the best performing system achieves a macro averaged F1-score of 0.74, and the best performing system for Danish achieves a macro averaged F1-score of 0.70. In the detection of whether or not an offensive post is targeted, the best performing system for English achieves a macro averaged F1-score of 0.62, while the best performing system for Danish achieves a macro averaged F1-score of 0.73. Finally, in the detection of the target type in a targeted offensive post, the best performing system for English achieves a macro averaged F1-score of 0.56, and the best performing system for Danish achieves a macro averaged F1-score of 0.63. Our work for both the English and the Danish language captures the type and targets of offensive language, and present automatic methods for detecting different kinds of offensive language such as hate speech and cyberbullying.

pdf bib
Maintaining Quality in FEVER Annotation
Leon Derczynski | Julie Binau | Henri Schulte
Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER)

We propose two measures for measuring the quality of constructed claims in the FEVER task. Annotating data for this task involves the creation of supporting and refuting claims over a set of evidence. Automatic annotation processes often leave superficial patterns in data, which learning systems can detect instead of performing the underlying task. Humans also can leave these superficial patterns, either voluntarily or involuntarily (due to e.g. fatigue). The two measures introduced attempt to detect the impact of these superficial patterns. One is a new information-theoretic and distributionality based measure, DCI; and the other an extension of neural probing work over the ARCT task, utility. We demonstrate these measures over a recent major dataset, that from the English FEVER task in 2019.

pdf bib
Detection and Resolution of Rumors and Misinformation with NLP
Leon Derczynski | Arkaitz Zubiaga
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Detecting and grounding false and misleading claims on the web has grown to form a substantial sub-field of NLP. The sub-field addresses problems at multiple different levels of misinformation detection: identifying check-worthy claims; tracking claims and rumors; rumor collection and annotation; grounding claims against knowledge bases; using stance to verify claims; and applying style analysis to detect deception. This half-day tutorial presents the theory behind each of these steps as well as the state-of-the-art solutions.

2019

pdf bib
Political Stance in Danish
Rasmus Lehmann | Leon Derczynski
Proceedings of the 22nd Nordic Conference on Computational Linguistics

The task of stance detection consists of classifying the opinion within a text towards some target. This paper seeks to generate a dataset of quotes from Danish politicians, label this dataset to allow the task of stance detection to be performed, and present annotation guidelines to allow further expansion of the generated dataset. Furthermore, three models based on an LSTM architecture are designed, implemented and optimized to perform the task of stance detection for the generated dataset. Experiments are performed using conditionality and bi-directionality for these models, and using either singular word embeddings or averaged word embeddings for an entire quote, to determine the optimal model design. The simplest model design, applying neither conditionality or bi-directionality, and averaged word embeddings across quotes, yields the strongest results. Furthermore, it was found that inclusion of the quotes politician, and the party affiliation of the quoted politician, greatly improved performance of the strongest model.

pdf bib
Joint Rumour Stance and Veracity Prediction
Anders Edelbo Lillie | Emil Refsgaard Middelboe | Leon Derczynski
Proceedings of the 22nd Nordic Conference on Computational Linguistics

The net is rife with rumours that spread through microblogs and social media. Not all the claims in these can be verified. However, recent work has shown that the stances alone that commenters take toward claims can be sufficiently good indicators of claim veracity, using e.g. an HMM that takes conversational stance sequences as the only input. Existing results are monolingual (English) and mono-platform (Twitter). This paper introduces a stance-annotated Reddit dataset for the Danish language, and describes various implementations of stance classification models. Of these, a Linear SVM provides predicts stance best, with 0.76 accuracy / 0.42 macro F1. Stance labels are then used to predict veracity across platforms and also across languages, training on conversations held in one language and using the model on conversations held in another. In our experiments, monolinugal scores reach stance-based veracity accuracy of 0.83 (F1 0.68); applying the model across languages predicts veracity of claims with an accuracy of 0.82 (F1 0.67). This demonstrates the surprising and powerful viability of transferring stance-based veracity prediction across languages.

pdf bib
Bornholmsk Natural Language Processing: Resources and Tools
Leon Derczynski | Alex Speed Kjeldsen
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This paper introduces language processing resources and tools for Bornholmsk, a language spoken on the island of Bornholm, with roots in Danish and closely related to Scanian. This presents an overview of the language and available data, and the first NLP models for this living, minority Nordic language. Sammenfattnijng på borrijnholmst: Dæjnna artikkelijn introduserer natursprågsresurser å varktoi for borrijnholmst, ed språg a dær snakkes på ön Borrijnholm me rødder i danst å i nær familia me skånst. Artikkelijn gjer ed âuersyn âuer språged å di datan som fijnnes, å di fosste NLP modællarna for dætta læwenes nordiska minnretâlsspråged.

pdf bib
The Lacunae of Danish Natural Language Processing
Andreas Kirkedal | Barbara Plank | Leon Derczynski | Natalie Schluter
Proceedings of the 22nd Nordic Conference on Computational Linguistics

Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation. However, the language has received relatively little attention from a technological perspective. In this paper, we review Natural Language Processing (NLP) research, digital resources and tools which have been developed for Danish. We find that availability of models and tools is limited, which calls for work that lifts Danish NLP a step closer to the privileged languages. Dansk abstrakt: Dansk er et nordgermansk sprog, talt primært i kongeriget Danmark, et land med stærk tradition for teknologisk og videnskabelig innovation. Det danske sprog har imidlertid været genstand for relativt begrænset opmærksomhed, teknologisk set. I denne artikel gennemgår vi sprogteknologi-forskning, -ressourcer og -værktøjer udviklet for dansk. Vi konkluderer at der eksisterer et fåtal af modeller og værktøjer, hvilket indbyder til forskning som løfter dansk sprogteknologi i niveau med mere priviligerede sprog.

pdf bib
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing
Joakim Nivre | Leon Derczynski | Filip Ginter | Bjørn Lindi | Stephan Oepen | Anders Søgaard | Jörg Tidemann
Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

pdf bib
Quantifying the morphosyntactic content of Brown Clusters
Manuel R. Ciosici | Leon Derczynski | Ira Assent
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Brown and Exchange word clusters have long been successfully used as word representations in Natural Language Processing (NLP) systems. Their success has been attributed to their seeming ability to represent both semantic and syntactic information. Using corpora representing several language families, we test the hypothesis that Brown and Exchange word clusters are highly effective at encoding morphosyntactic information. Our experiments show that word clusters are highly capable at distinguishing Parts of Speech. We show that increases in Average Mutual Information, the clustering algorithms’ optimization goal, are highly correlated with improvements in encoding of morphosyntactic information. Our results provide empirical evidence that downstream NLP systems addressing tasks dependent on morphosyntactic information can benefit from word cluster features.

pdf bib
SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours
Genevieve Gorrell | Elena Kochkina | Maria Liakata | Ahmet Aker | Arkaitz Zubiaga | Kalina Bontcheva | Leon Derczynski
Proceedings of the 13th International Workshop on Semantic Evaluation

Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the danger of “fake news” has become a mainstream concern. However automated support for rumour verification remains in its infancy. It is therefore important that a shared task in this area continues to provide a focus for effort, which is likely to increase. Rumour verification is characterised by the need to consider evolving conversations and news updates to reach a verdict on a rumour’s veracity. As in RumourEval 2017 we provided a dataset of dubious posts and ensuing conversations in social media, annotated both for stance and veracity. The social media rumours stem from a variety of breaking news stories and the dataset is expanded to include Reddit as well as new Twitter posts. There were two concrete tasks; rumour stance prediction and rumour verification, which we present in detail along with results achieved by participants. We received 22 system submissions (a 70% increase from RumourEval 2017) many of which used state-of-the-art methodology to tackle the challenges involved.

2018

pdf bib
IUCM at SemEval-2018 Task 11: Similar-Topic Texts as a Comprehension Knowledge Source
Sofia Reznikova | Leon Derczynski
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the IUCM entry at SemEval-2018 Task 11, on machine comprehension using commonsense knowledge. First, clustering and topic modeling are used to divide given texts into topics. Then, during the answering phase, other texts of the same topic are retrieved and used as commonsense knowledge. Finally, the answer is selected. While clustering itself shows good results, finding an answer proves to be more challenging. This paper reports the results of system evaluation and suggests potential improvements.

pdf bib
Proceedings of the 27th International Conference on Computational Linguistics
Emily M. Bender | Leon Derczynski | Pierre Isabelle
Proceedings of the 27th International Conference on Computational Linguistics

2017

pdf bib
SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours
Leon Derczynski | Kalina Bontcheva | Maria Liakata | Rob Procter | Geraldine Wong Sak Hoi | Arkaitz Zubiaga
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Media is full of false claims. Even Oxford Dictionaries named “post-truth” as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics – each having their own families of claims and replies – and use these to pose two concrete challenges as well as the results achieved by participants on these challenges.

pdf bib
Simple Open Stance Classification for Rumour Analysis
Ahmet Aker | Leon Derczynski | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Stance classification determines the attitude, or stance, in a (typically short) text. The task has powerful applications, such as the detection of fake news or the automatic extraction of attitudes toward entities or events in the media. This paper describes a surprisingly simple and efficient classification approach to open stance classification in Twitter, for rumour and veracity classification. The approach profits from a novel set of automatically identifiable problem-specific features, which significantly boost classifier accuracy and achieve above state-of-the-art results on recent benchmark datasets. This calls into question the value of using complex sophisticated models for stance classification without first doing informed feature extraction.

pdf bib
Proceedings of the 3rd Workshop on Noisy User-generated Text
Leon Derczynski | Wei Xu | Alan Ritter | Tim Baldwin
Proceedings of the 3rd Workshop on Noisy User-generated Text

pdf bib
Results of the WNUT2017 Shared Task on Novel and Emerging Entity Recognition
Leon Derczynski | Eric Nichols | Marieke van Erp | Nut Limsopatham
Proceedings of the 3rd Workshop on Noisy User-generated Text

This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarization), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet “so.. kktny in 30 mins?!” – even human experts find the entity ‘kktny’ hard to detect and resolve. The goal of this task is to provide a definition of emerging and of rare entities, and based on that, also datasets for detecting these entities. The task as described in this paper evaluated the ability of participating entries to detect and classify novel and emerging named entities in noisy text.

2016

pdf bib
Complementarity, F-score, and NLP Evaluation
Leon Derczynski
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper addresses the problem of quantifying the differences between entity extraction systems, where in general only a small proportion a document should be selected. Comparing overall accuracy is not very useful in these cases, as small differences in accuracy may correspond to huge differences in selections over the target minority class. Conventionally, one may use per-token complementarity to describe these differences, but it is not very useful when the set is heavily skewed. In such situations, which are common in information retrieval and entity recognition, metrics like precision and recall are typically used to describe performance. However, precision and recall fail to describe the differences between sets of objects selected by different decision strategies, instead just describing the proportional amount of correct and incorrect objects selected. This paper presents a method for measuring complementarity for precision, recall and F-score, quantifying the difference between entity extraction approaches.

pdf bib
GATE-Time: Extraction of Temporal Expressions and Events
Leon Derczynski | Jannik Strötgen | Diana Maynard | Mark A. Greenwood | Manuel Jung
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

GATE is a widely used open-source solution for text processing with a large user community. It contains components for several natural language processing tasks. However, temporal information extraction functionality within GATE has been rather limited so far, despite being a prerequisite for many application scenarios in the areas of natural language processing and information retrieval. This paper presents an integrated approach to temporal information processing. We take state-of-the-art tools in temporal expression and event recognition and bring them together to form an openly-available resource within the GATE infrastructure. GATE-Time provides annotation in the form of TimeML events and temporal expressions complying with this mature ISO standard for temporal semantic annotation of documents. Major advantages of GATE-Time are (i) that it relies on HeidelTime for temporal tagging, so that temporal expressions can be extracted and normalized in multiple languages and across different domains, (ii) it includes a modern, fast event recognition and classification tool, and (iii) that it can be combined with different linguistic pre-processing annotations, and is thus not bound to license restricted preprocessing components.

pdf bib
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Leon Derczynski | Kalina Bontcheva | Ian Roberts
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

One of the main obstacles, hampering method development and comparative evaluation of named entity recognition in social media, is the lack of a sizeable, diverse, high quality annotated corpus, analogous to the CoNLL’2003 news dataset. For instance, the biggest Ritter tweet corpus is only 45,000 tokens – a mere 15% the size of CoNLL’2003. Another major shortcoming is the lack of temporal, geographic, and author diversity. This paper introduces the Broad Twitter Corpus (BTC), which is not only significantly bigger, but sampled across different regions, temporal periods, and types of Twitter users. The gold-standard named entity annotations are made by a combination of NLP experts and crowd workers, which enables us to harness crowd recall while maintaining high quality. We also measure the entity drift observed in our dataset (i.e. how entity representation varies over time), and compare to newswire. The corpus is released openly, including source text and intermediate annotations.

pdf bib
Representation and Learning of Temporal Relations
Leon Derczynski
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Determining the relative order of events and times described in text is an important problem in natural language processing. It is also a difficult one: general state-of-the-art performance has been stuck at a relatively low ceiling for years. We investigate the representation of temporal relations, and empirically evaluate the effect that various temporal relation representations have on machine learning performance. While machine learning performance decreases with increased representational expressiveness, not all representation simplifications have equal impact.

pdf bib
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)
Bo Han | Alan Ritter | Leon Derczynski | Wei Xu | Tim Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

pdf bib
Twitter Geolocation Prediction Shared Task of the 2016 Workshop on Noisy User-generated Text
Bo Han | Afshin Rahimi | Leon Derczynski | Timothy Baldwin
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

This paper presents the shared task for English Twitter geolocation prediction in WNUT 2016. We discuss details of task settings, data preparations and participant systems. The derived dataset and performance figures from each system provide baselines for future research in this realm.

pdf bib
SemEval-2016 Task 12: Clinical TempEval
Steven Bethard | Guergana Savova | Wei-Te Chen | Leon Derczynski | James Pustejovsky | Marc Verhagen
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf bib
Tune Your Brown Clustering, Please
Leon Derczynski | Sean Chester | Kenneth Bøgh
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Temporal Relation Classification using a Model of Tense and Aspect
Leon Derczynski | Robert Gaizauskas
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Efficient Named Entity Annotation through Pre-empting
Leon Derczynski | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing

pdf bib
Analysis of Temporal Expressions Annotated in Clinical Notes
Hegler Tissot | Angus Roberts | Leon Derczynski | Genevieve Gorrell | Marcus Didonet Del Fabro
Proceedings of the 11th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (ISA-11)

pdf bib
USFD: Twitter NER with Drift Compensation and Linked Data
Leon Derczynski | Isabelle Augenstein | Kalina Bontcheva
Proceedings of the Workshop on Noisy User-generated Text

pdf bib
Handling and Mining Linguistic Variation in UGC
Leon Derczynski
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

pdf bib
Swiss-Chocolate: Combining Flipout Regularization and Random Forests with Artificially Built Subsystems to Boost Text-Classification for Sentiment
Fatih Uzdilli | Martin Jaggi | Dominic Egger | Pascal Julmy | Leon Derczynski | Mark Cieliebak
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
SemEval-2015 Task 6: Clinical TempEval
Steven Bethard | Leon Derczynski | Guergana Savova | James Pustejovsky | Marc Verhagen
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval
Hegler Tissot | Genevieve Gorrell | Angus Roberts | Leon Derczynski | Marcos Didonet Del Fabro
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

pdf bib
DKIE: Open Source Information Extraction for Danish
Leon Derczynski | Camilla Vilhelmsen Field | Kenneth S. Bøgh
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy
Kalina Bontcheva | Ian Roberts | Leon Derczynski | Samantha Alexander-Eames
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets
Leon Derczynski | Kalina Bontcheva
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines
Marta Sabou | Kalina Bontcheva | Leon Derczynski | Arno Scharl
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources. Although the use of this approach is intensifying in all its key genres (paid-for crowdsourcing, games with a purpose, volunteering-based approaches), the community still lacks a set of best-practice guidelines similar to the annotation best practices for traditional, expert-based corpus acquisition. In this paper we focus on the use of crowdsourcing methods for corpus acquisition and propose a set of best practice guidelines based in our own experiences in this area and an overview of related literature. We also introduce GATE Crowd, a plugin of the GATE platform that relies on these guidelines and offers tool support for using crowdsourcing in a more principled and efficient manner.

2013

pdf bib
Empirical Validation of Reichenbach’s Tense Framework
Leon Derczynski | Robert Gaizauskas
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

pdf bib
Temporal Signals Help Label Temporal Relations
Leon Derczynski | Robert Gaizauskas
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
Kalina Bontcheva | Leon Derczynski | Adam Funk | Mark Greenwood | Diana Maynard | Niraj Aswani
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Recognising and Interpreting Named Temporal Expressions
Matteo Brucato | Leon Derczynski | Hector Llorens | Kalina Bontcheva | Christian S. Jensen
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data
Leon Derczynski | Alan Ritter | Sam Clark | Kalina Bontcheva
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

pdf bib
SemEval-2013 Task 1: TempEval-3: Evaluating Time Expressions, Events, and Temporal Relations
Naushad UzZaman | Hector Llorens | Leon Derczynski | James Allen | Marc Verhagen | James Pustejovsky
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

pdf bib
TIMEN: An Open Temporal Expression Normalisation Resource
Hector Llorens | Leon Derczynski | Robert Gaizauskas | Estela Saquete
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Temporal expressions are words or phrases that describe a point, duration or recurrence in time. Automatically annotating these expressions is a research goal of increasing interest. Recognising them can be achieved with minimally supervised machine learning, but interpreting them accurately (normalisation) is a complex task requiring human knowledge. In this paper, we present TIMEN, a community-driven tool for temporal expression normalisation. TIMEN is derived from current best approaches and is an independent tool, enabling easy integration in existing systems. We argue that temporal expression normalisation can only be effectively performed with a large knowledge base and set of rules. Our solution is a framework and system with which to capture this knowledge for different languages. Using both existing and newly-annotated data, we present results showing competitive performance and invite the IE community to contribute to a knowledge base in order to solve the temporal expression normalisation problem.

pdf bib
Massively Increasing TIMEX3 Resources: A Transduction Approach
Leon Derczynski | Héctor Llorens | Estela Saquete
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Automatic annotation of temporal expressions is a research challenge of great interest in the field of information extraction. Gold standard temporally-annotated resources are limited in size, which makes research using them difficult. Standards have also evolved over the past decade, so not all temporally annotated data is in the same format. We vastly increase available human-annotated temporal expression resources by converting older format resources to TimeML/TIMEX3. This task is difficult due to differing annotation methods. We present a robust conversion tool and a new, large temporal expression resource. Using this, we evaluate our conversion process by using it as training data for an existing TimeML annotation tool, achieving a 0.87 F1 measure - better than any system in the TempEval-2 timex recognition exercise.

2010

pdf bib
USFD2: Annotating Temporal Expresions and TLINKs for TempEval-2
Leon Derczynski | Robert Gaizauskas
Proceedings of the 5th International Workshop on Semantic Evaluation

pdf bib
Analysing Temporally Annotated Corpora with CAVaT
Leon Derczynski | Robert Gaizauskas
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility for statistical analysis of features specific to temporally-annotated natural language corpora. It provides reporting, highlights salient links between a variety of general and time-specific linguistic features, and also validates a temporal annotation to ensure that it is logically consistent and sufficiently annotated. Uniquely, CAVaT provides analysis specific to TimeML-annotated temporal information. TimeML is a standard for annotating temporal information in natural language text. In this paper, we present the reporting part of CAVaT, and then its error-checking ability, including the workings of several novel TimeML document verification methods. This is followed by the execution of some example tasks using the tool to show relations between times, events, signals and links. We also demonstrate inconsistencies in a TimeML corpus (TimeBank) that have been detected with CAVaT.

2008

pdf bib
A Data Driven Approach to Query Expansion in Question Answering
Leon Derczynski | Jun Wang | Robert Gaizauskas | Mark A. Greenwood
Coling 2008: Proceedings of the 2nd workshop on Information Retrieval for Question Answering