Carole Tiberius


2020

A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi | John Philip McCrae | Sanni Nimb | Fahad Khan | Monica Monachini | Bolette Pedersen | Thierry Declerck | Tanja Wissik | Andrea Bellandi | Irene Pisani | Thomas Troelsgård | Sussi Olsen | Simon Krek | Veronika Lipp | Tamás Váradi | László Simon | András Gyorffy | Carole Tiberius | Tanneke Schoonheim | Yifat Ben Moshe | Maya Rudich | Raya Abu Ahmad | Dorielle Lonke | Kira Kovalenko | Margit Langemets | Jelena Kallas | Oksana Dereza | Theodorus Fransen | David Cillessen | David Lindemann | Mikel Alonso | Ana Salgado | José Luis Sancho | Rafael-J. Ureña-Ruiz | Jordi Porta Zamorano | Kiril Simov | Petya Osenova | Zara Kancheva | Ivaylo Radev | Ranka Stanković | Andrej Perdih | Dejan Gabrovsek
Proceedings of the 12th Language Resources and Evaluation Conference

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Stella Markantonatou | John McCrae | Jelena Mitrović | Carole Tiberius | Carlos Ramisch | Ashwini Vaidya | Petya Osenova | Agata Savary
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

2014

Taalportaal: an online grammar of Dutch and Frisian
Frank Landsbergen | Carole Tiberius | Roderik Dernison
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present the Taalportaal project. Taalportaal will create an online portal containing an exhaustive and fully searchable electronic reference of Dutch and Frisian phonology, morphology and syntax. Its content will be in English. The main aim of the project is to serve the scientific community by organizing, integrating and completing the grammatical knowledge of both languages, and to make this data accessible in an innovative way. The project is carried out by a consortium of four universities and research institutions. Content is generated in two ways: (1) by a group of authors who, starting from existing grammatical resources, write text directly in XML, and (2) by integrating the full Syntax of Dutch into the portal, after an automatic conversion from Word to XML. We discuss the project’s workflow, content creation and management, the actual web application, and the way in which we plan to enrich the portal’s content, such as by crosslinking between topics and linking to external resources.

2008

Standardising Bilingual Lexical Resources According to the Lexicon Markup Framework
Isa Maks | Carole Tiberius | Remco van Veenendaal
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The Dutch HLT agency for language and speech technology (known as TST-centrale) at the Institute for Dutch Lexicology is responsible for the maintenance, distribution and accessibility of (Dutch) digital language resources. In this paper we present a project which aims to standardise the format of a set of bilingual lexicons in order to make them available to potential users, to facilitate the exchange of data (among the resources and with other (monolingual) resources) and to enable reuse of these lexicons for NLP applications like machine translation and multilingual information retrieval. We pay special attention to the methods and tools we used and to some of the problematic issues we encountered during the conversion process. As these problems are mainly caused by the fact that the standard LMF model fails in representing the detailed semantic and pragmatic distinctions made in our bilingual data, we propose some modifications to the standard. In general, we think that a standard for lexicons should provide a model for bilingual lexicons that is able to represent all detailed and fine-grained translation information which is generally found in these types of lexicons.

Accessing the ANW Dictionary
Fons Moerdijk | Carole Tiberius | Jan Niestadt
Coling 2008: Proceedings of the Workshop on Cognitive Aspects of the Lexicon (COGALEX 2008)

2004

Inflectional Syncretism and Corpora
Dunstan Brown | Carole Tiberius | Greville G. Corbett
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

2003

A Large-scale Inheritance-based Morphological Lexicon for Russian
Roger Evans | Carole Tiberius | Dunstan Brown | Greville G. Corbett
Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages

2002

How to build a multilingual inheritance-based lexicon
Carole Tiberius
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

A typological database of agreement
Carole Tiberius | Dunstan Brown | Greville Corbett
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Cross Linguistic Phoneme Correspondences
Lynne Cahill | Carole Tiberius
COLING 2002: The 19th International Conference on Computational Linguistics: Project Notes

2000

Incorporating Metaphonemes in a Multilingual Lexicon
Carole Tiberius | Lynne Cahill
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics