Filip Ilievski


2020

Large-scale Cross-lingual Language Resources for Referencing and Framing
Piek Vossen | Filip Ilievski | Marten Postma | Antske Fokkens | Gosse Minnema | Levi Remijnse
Proceedings of the 12th Language Resources and Evaluation Conference

In this article, we lay out the basic ideas and principles of the project Framing Situations in the Dutch Language. We provide our first results of data acquisition, together with the first data release. We introduce the notion of cross-lingual referential corpora. These corpora consist of texts that make reference to exactly the same incidents. The referential grounding allows us to analyze the framing of these incidents in different languages and across different texts. During the project, we will use the automatically generated data to study linguistic framing as a phenomenon and to build framing resources such as lexicons and corpora. We expect to capture larger variation in framing compared to traditional approaches for building such resources. Our first data release, which contains structured data about a large number of incidents and reference texts, can be found at http://dutchframenet.nl/data-releases/.

Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering
Peifeng Wang | Nanyun Peng | Filip Ilievski | Pedro Szekely | Xiang Ren
Findings of the Association for Computational Linguistics: EMNLP 2020

Commonsense question answering (QA) requires background knowledge which is not explicitly stated in a given context. Prior works use commonsense knowledge graphs (KGs) to obtain this knowledge for reasoning. However, relying entirely on these KGs may not suffice, considering their limited coverage and the contextual dependence of their knowledge. In this paper, we augment a general commonsense QA framework with a knowledgeable path generator. By extrapolating over existing paths in a KG with a state-of-the-art language model, our generator learns to connect a pair of entities in text with a dynamic, and potentially novel, multi-hop relational path. Such paths can provide structured evidence for solving commonsense questions without fine-tuning the path generator. Experiments on two datasets show the superiority of our method over previous works which fully rely on knowledge from KGs (with up to 6% improvement in accuracy), across various amounts of training data. Further evaluation suggests that the generated paths are typically interpretable, novel, and relevant to the task.

Combining Conceptual and Referential Annotation to Study Variation in Framing
Marten Postma | Levi Remijnse | Filip Ilievski | Antske Fokkens | Sam Titarsolej | Piek Vossen
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet

We introduce an annotation tool whose purpose is to gain insights into variation of framing by combining FrameNet annotation with referential annotation. English FrameNet enables researchers to study variation in framing at the conceptual level as well as through its packaging in language. We enrich FrameNet annotations in two ways. First, we introduce the referential aspect. Second, we annotate on complete texts to encode connections between mentions. As a result, we can analyze the variation of framing for one particular event across multiple mentions and (cross-lingual) documents. We can examine how an event is framed over time and how core frame elements are expressed throughout a complete text. The data model starts with a representation of an event type. Each event type has many incidents linked to it, and each incident has several reference texts describing it as well as structured data about the incident. The user can apply two types of annotations: (1) mappings from expressions to frames and frame elements, and (2) reference relations from mentions to events and participants of the structured data.

2018

SemEval-2018 Task 5: Counting Events and Participants in the Long Tail
Marten Postma | Filip Ilievski | Piek Vossen
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper discusses SemEval-2018 Task 5: a referential quantification task of counting events and participants in local, long-tail news documents with high ambiguity. The complexity of this task challenges systems to establish meaning, reference, and identity across documents. The task consists of three subtasks and spans three domains. We detail the design of this referential quantification task, describe the participating systems, and present additional analysis to gain deeper insight into their performance.

Don’t Annotate, but Validate: a Data-to-Text Method for Capturing Event Data
Piek Vossen | Filip Ilievski | Marten Postma | Roxane Segers
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Systematic Study of Long Tail Phenomena in Entity Linking
Filip Ilievski | Piek Vossen | Stefan Schlobach
Proceedings of the 27th International Conference on Computational Linguistics

State-of-the-art entity linkers achieve high accuracy scores with probabilistic methods. However, these scores should be considered in relation to the properties of the datasets they are evaluated on. Until now, there has not been a systematic investigation of the properties of entity linking datasets and their impact on system performance. In this paper we report on a series of hypotheses regarding the long tail phenomena in entity linking datasets, their interaction, and their impact on system performance. Our systematic study of these hypotheses shows that evaluation datasets mainly capture head entities and only incidentally cover data from the tail, thus encouraging systems to overfit to popular/frequent and non-ambiguous cases. We find the most difficult cases of entity linking among the infrequent candidates of ambiguous forms. With our findings, we hope to inspire future designs of both entity linking systems and evaluation datasets. To support this goal, we provide a list of recommended actions for better inclusion of tail cases.

2016

Context-enhanced Adaptive Entity Linking
Filip Ilievski | Giuseppe Rizzo | Marieke van Erp | Julien Plu | Raphaël Troncy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

More and more knowledge bases are publicly available as linked data. Since these knowledge bases contain structured descriptions of real-world entities, they can be exploited by entity linking systems that anchor entity mentions from text to the most relevant resources describing those entities. In this paper, we investigate adaptation of the entity linking task using contextual knowledge. The key intuition is that entity linking can be customized depending on the textual content, as well as on the application that would make use of the extracted information. We present an adaptive approach that relies on contextual knowledge from text to enhance the performance of ADEL, a hybrid linguistic and graph-based entity linking system. We evaluate our approach on a domain-specific corpus consisting of annotated WikiNews articles.

Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Marieke van Erp | Pablo Mendes | Heiko Paulheim | Filip Ilievski | Julien Plu | Giuseppe Rizzo | Joerg Waitelonis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Entity linking has become a popular task in both natural language processing and semantic web communities. However, we find that the benchmark datasets for entity linking tasks do not accurately evaluate entity linking systems. In this paper, we aim to chart the strengths and weaknesses of current benchmark datasets and sketch a roadmap for the community to devise better benchmark datasets.

Semantic overfitting: what ‘world’ do we consider when evaluating disambiguation of text?
Filip Ilievski | Marten Postma | Piek Vossen
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Semantic text processing faces the challenge of defining the relation between lexical expressions and the world to which they make reference within a period of time. It is unclear whether the current test sets used to evaluate disambiguation tasks are representative of the full complexity of this time-anchored relation, which results in semantic overfitting to a specific period and the frequent phenomena within it. We conceptualize and formalize a set of metrics that evaluate this complexity of datasets. We provide evidence for their applicability on five different disambiguation tasks. To challenge semantic overfitting of disambiguation systems, we propose a time-based, metric-aware method for developing datasets in a systematic and semi-automated manner, as well as an event-based QA task.

Moving away from semantic overfitting in disambiguation datasets
Marten Postma | Filip Ilievski | Piek Vossen | Marieke van Erp
Proceedings of the Workshop on Uphill Battles in Language Processing: Scaling Early Achievements to Robust Methods